CN110414543A - Method, device, and computer storage medium for determining the danger level of a telephone number - Google Patents

Info

Publication number
CN110414543A
Authority
CN
China
Prior art keywords
telephone number
formula
feature index
doubtful violation
identified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810404735.9A
Other languages
Chinese (zh)
Inventor
张滨
赵刚
袁捷
冯运波
于乐
江为强
王言青
彭刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
Application filed by China Mobile Communications Group Co Ltd
Priority to CN201810404735.9A
Publication of CN110414543A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Embodiments of the invention disclose a method, a device, and a computer storage medium for determining the danger level of a telephone number. The method includes: inputting telephone numbers to be detected; identifying suspected violating telephone numbers among the telephone numbers to be detected according to a preset suspected violating telephone number identification model; and determining the danger level of each identified suspected violating telephone number according to a preset telephone number danger level decision model. With the method of this embodiment, different interception and governance schemes can be formulated for violating numbers of different danger levels, which avoids unnecessary waste of governance resources.

Description

Method, device, and computer storage medium for determining the danger level of a telephone number
Technical field
Embodiments of the present invention relate to the field of communication security, and in particular to a method, a device, and a computer storage medium for determining the danger level of a telephone number.
Background
At present, violating calls come in many varieties and are increasingly rampant. They push promotional content to users, such as financial products and real estate sales, and some are organized, premeditated attempts to defraud users of their money. Such cases occur frequently and seriously affect people's daily lives. Some telecom operators have therefore proposed control measures for different types of harassing and fraudulent calls, such as international fraud calls, inter-network number-spoofing calls, and "one-ring" calls. However, existing control measures for violating calls have the following shortcomings:
1. There is no systematic, intelligent analysis of violating calls, so the danger level of a violating number cannot be determined;
2. Targeted interception and governance schemes cannot be formulated for violating numbers.
Summary of the invention
To solve the above technical problems, embodiments of the present invention aim to provide a method for determining the danger level of a telephone number. The method distinguishes violating numbers of different danger levels, so that interception and governance schemes can be differentiated and numbers can be governed by category. This supports avoiding unnecessary waste of governance resources and prioritizing governance work by importance and urgency.
The technical solution of the present invention is realized as follows:
In a first aspect, an embodiment of the invention provides a method for determining the danger level of a telephone number, the method including:
inputting telephone numbers to be detected;
identifying suspected violating telephone numbers among the telephone numbers to be detected according to a preset suspected violating telephone number identification model;
determining the danger level of each identified suspected violating telephone number according to a preset telephone number danger level decision model.
In the above solution, before identifying suspected violating telephone numbers among the telephone numbers to be detected according to the preset suspected violating telephone number identification model, the method further includes:
constructing the suspected violating telephone number identification model.
In the above solution, constructing the suspected violating telephone number identification model includes:
obtaining a training sample set from communication signaling data collected from historical call events;
determining a first feature index value set of the training sample set;
constructing a mathematical model based on the first feature index value set;
comparing the accuracy and the recall of the mathematical model with their preset thresholds;
if either the accuracy or the recall does not reach its preset threshold, optimizing the mathematical model;
if both the accuracy and the recall reach their preset thresholds, determining the mathematical model to be the suspected violating telephone number identification model.
In the above solution, constructing the mathematical model based on the first feature index value set includes:
determining the first feature index value set as the input parameters of the mathematical model and the violation flag of each telephone number as the output parameter of the mathematical model, where the violation flag indicates whether the telephone number is a suspected violating telephone number;
normalizing the input parameters;
determining the mathematical model according to the support vector machine algorithm based on the normalized input parameters and the output parameters.
In the above solution, determining the mathematical model according to the support vector machine algorithm based on the normalized input parameters and the output parameters includes:
establishing the Lagrangian function according to formula 1:
$$L(\omega, b, a) = \frac{1}{2}\|\omega\|^2 - \sum_{i} a_i \{ [(\omega \cdot x_i) + b]\, y_i - 1 \} \quad (1)$$
where $a_i \ge 0$ denotes a Lagrange multiplier, $x_i$ denotes the i-th normalized input parameter, and $y_i$ denotes the i-th output parameter;
setting the partial derivatives of formula 1 with respect to $\omega$ and $b$ to 0 to obtain formula 2 and formula 3:
$$\omega = \sum_{i} a_i y_i x_i \quad (2)$$
$$\sum_{i} a_i y_i = 0 \quad (3)$$
substituting formula 2 and formula 3 into formula 1 to obtain formula 4:
$$\max_{a}\; Q(a) = \sum_{i} a_i - \frac{1}{2} \sum_{i} \sum_{j} a_i a_j y_i y_j (x_i \cdot x_j) \quad (4)$$
where $a_i \ge 0$ and $\sum_{i} a_i y_i = 0$;
substituting the normalized input parameters into formula 4 and formula 5 and determining, subject to $a_i \ge 0$, the support vectors $x_p$ that satisfy formula 5, where formula 5 is:
$$a_i \cdot \{ [(\omega \cdot x_i) + b]\, y_i - 1 \} = 0 \quad (5);$$
substituting the support vectors $x_p$ into formula 5 to obtain the value of the threshold $b$;
substituting the support vectors $x_p$, the value of the threshold $b$, and formula 2 into formula 6 to obtain the support vector machine classification prediction function $f(x) = \operatorname{sgn}\{ \sum_{i} a_i y_i (x_i \cdot x) + b \}$, where formula 6 is:
$$g(x) = (\omega \cdot x) + b \quad (6).$$
In the above solution, before comparing the accuracy and the recall of the mathematical model with the preset thresholds, the method further includes:
obtaining a validation sample set;
identifying test suspected violating telephone numbers from the validation sample set based on the mathematical model;
calculating the accuracy and the recall of the mathematical model based on the test suspected violating telephone numbers and the validation sample set.
In the above solution, the first feature index value set is the set of statistical values of the first feature indexes, where the first feature indexes include:
calling frequency, number of called parties, call duration, ring duration, number of active releases, number of passive releases, called-party dispersion, correlation coefficient between the called numbers of the same caller, maximum frequency of calling the same ten-thousand number block, calling ratio, and standard deviation of call-time intervals.
In the above solution, determining the danger level of each identified suspected violating telephone number according to the preset telephone number danger level decision model includes:
obtaining a second feature index value set of the identified suspected violating telephone numbers;
applying a smoothing transformation to the second feature index value set, then calculating the danger level score of each identified suspected violating telephone number from the smoothed second feature index value set according to the entropy method;
classifying the identified suspected violating telephone numbers according to the K-Means clustering algorithm based on the danger level scores;
determining the danger level of the classified suspected violating telephone numbers according to radar chart analysis.
In the above solution, the second feature index value set is the set of statistical values of the second feature indexes, where the second feature indexes include: calling frequency, number of called parties, call duration, ring duration, number of active releases, number of passive releases, called-party dispersion, correlation coefficient between the called numbers of the same caller, number of complaints received, and number of times marked.
In the above solution, applying the smoothing transformation to the second feature index value set includes:
applying the smoothing transformation to the second feature index value set according to formula 7, where formula 7 is:
$$x'_{ij} = \log(x_{ij} + 1) \quad (7),$$
where $x_{ij}$ denotes the j-th feature index value in the second feature index value set of the i-th identified suspected violating telephone number, and $x'_{ij}$ denotes the same value after the smoothing transformation.
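As a minimal sketch of formula 7, the transformation can be applied element-wise to a table of feature index values. The function name `smooth_transform` and the sample values are hypothetical, and the natural logarithm is an assumption, since the text does not state the base of the logarithm.

```python
import math


def smooth_transform(feature_rows):
    """Apply x' = log(x + 1) (formula 7) to every feature index value.

    Assumption: natural logarithm; the source does not specify the base.
    Each row holds the non-negative index values of one telephone number.
    """
    return [[math.log1p(x) for x in row] for row in feature_rows]


# Hypothetical index values for two identified numbers.
rows = [[0, 9], [99, 999]]
smoothed = smooth_transform(rows)
```

The transformation keeps zero counts at zero and compresses large counts, so a single extreme index value dominates the later normalization less.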
In the above solution, calculating the danger level score of each identified suspected violating telephone number from the smoothed second feature index value set according to the entropy method includes:
normalizing the smoothed second feature index value set according to formula 8, where formula 8 is:
$$x''_{ij} = \frac{x'_{ij} - \min(x'_j)}{\max(x'_j) - \min(x'_j)} \quad (8)$$
where $x'_{ij}$ denotes the j-th feature index value in the smoothed second feature index value set of the i-th identified suspected violating telephone number, $\min(x'_j)$ and $\max(x'_j)$ denote the minimum and maximum of the j-th second feature index over the identified suspected violating telephone numbers, and $x''_{ij}$ denotes the j-th feature index value of the i-th identified suspected violating telephone number after smoothing and normalization;
calculating the probability of occurrence of each second feature index according to formula 9, where formula 9 is:
$$p_{ij} = \frac{x''_{ij}}{\sum_{i=1}^{m} x''_{ij}} \quad (9)$$
where $m$ denotes the total number of identified suspected violating telephone numbers;
calculating the weight of each second feature index according to formula 10 and formula 11, where formula 10 and formula 11 are:
$$e_j = -\frac{1}{\ln m} \sum_{i=1}^{m} p_{ij} \ln p_{ij} \quad (10)$$
$$w_j = \frac{1 - e_j}{\sum_{k=1}^{n} (1 - e_k)} \quad (11)$$
where $w_j$ denotes the weight of the j-th second feature index, $n$ denotes the total number of second feature indexes, and $e_j$ denotes the entropy of the j-th second feature index;
calculating the danger level score of each identified suspected violating telephone number according to formula 12, where formula 12 is:
$$F_i = \sum_{j=1}^{n} w_j\, x''_{ij} \quad (12)$$
where $w_j$ denotes the weight of the j-th second feature index, $x''_{ij}$ denotes the j-th feature index value of the i-th identified suspected violating telephone number after smoothing and normalization, and $F_i$ denotes the danger level score of the i-th identified suspected violating telephone number.
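The scoring steps above can be sketched with the standard entropy weight method: min-max normalization, per-index proportions, entropy, weights, and a weighted sum. This is an illustration under stated assumptions, not the patent's reference implementation. The function name `entropy_scores`, the convention that a term with proportion 0 contributes 0 to the entropy, and the zero weight given to a constant index are all assumptions.

```python
import math


def entropy_scores(smoothed):
    """Return (weights, scores) for rows of already-smoothed index values."""
    m = len(smoothed)
    if m < 2:
        raise ValueError("need at least two telephone numbers")
    normalized = []   # normalized[j][i] holds x''_ij for index j, number i
    raw_weights = []  # 1 - e_j for each index j
    for col in zip(*smoothed):
        lo, span = min(col), max(col) - min(col)
        # Min-max normalization; a constant index is mapped to 0 (assumption).
        norm = [(v - lo) / span if span else 0.0 for v in col]
        normalized.append(norm)
        total = sum(norm)
        if total == 0:
            raw_weights.append(0.0)  # assumption: constant index gets no weight
            continue
        p = [v / total for v in norm]  # proportion of each number in index j
        # Entropy of index j, treating 0 * ln(0) as 0 (assumption).
        e = -sum(q * math.log(q) for q in p if q > 0) / math.log(m)
        raw_weights.append(1.0 - e)
    weight_sum = sum(raw_weights)
    if weight_sum == 0:
        raise ValueError("all indexes are constant; weights are undefined")
    weights = [w / weight_sum for w in raw_weights]
    # Weighted sum of normalized index values per telephone number.
    scores = [sum(w * normalized[j][i] for j, w in enumerate(weights))
              for i in range(m)]
    return weights, scores


# Hypothetical smoothed values: three numbers, two indexes.
weights, scores = entropy_scores([[0, 0], [1, 2], [2, 2]])
```

In this toy input the first index spreads the three numbers more evenly across its range than the second, so it has lower entropy and receives the larger weight.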
In the above solution, classifying the identified suspected violating telephone numbers according to the K-Means clustering algorithm based on the danger level scores includes:
Step 1: determine the number of clusters K based on the preset number of classes;
Step 2: determine the classification sample set $X = \{x_i\}$, $x_i \in R^n$, based on the danger level scores, and treat the whole classification sample set as one cluster;
Step 3: randomly select 2 cluster center points $\mu_1, \mu_2 \in R^n$ in the cluster;
Step 4: calculate the cluster to which each classification sample in the cluster belongs according to formula 13, where formula 13 is:
$$C_l = \arg\min_{l} \|x_i - \mu_l\|^2, \quad l = 1, 2 \quad (13),$$
where $x_i$ denotes a classification sample and $\mu_l$ denotes a cluster center point;
Step 5: update the center $\mu_l$ of each cluster according to formula 14, where formula 14 is:
$$\mu_l = \frac{1}{s} \sum_{t=1}^{s} x_t^{(l)} \quad (14)$$
where $x_t^{(l)}$ denotes the t-th classification sample of the l-th cluster and $s$ denotes the total number of classification samples in the l-th cluster;
Step 6: repeat step 4 and step 5 until the distortion function $J = \sum_{l=1}^{2} \sum_{t=1}^{s} \|x_t^{(l)} - \mu_l\|^2$ converges;
Step 7: calculate the sum of squared errors of each cluster according to formula 15 and choose the cluster with the largest sum of squared errors as the next cluster to split, where formula 15 is:
$$SSE_l = \sum_{i=1}^{z} \|x_i - u_l\|^2 \quad (15)$$
where $x_i$ denotes the i-th classification sample, $u_l$ denotes the center point of the l-th cluster, and $z$ denotes the number of classification samples in the cluster;
Step 8: repeat step 3 to step 7 until the number of clusters is K.
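Steps 1 to 8 describe a bisecting form of K-Means: the whole sample set starts as one cluster, and the cluster with the largest sum of squared errors is repeatedly split in two. The sketch below follows those steps for scalar danger level scores. Treating each sample as one number, the function names, and the `seed` parameter are assumptions made for illustration.

```python
import random


def two_means(points, rng, max_iter=100):
    """Split one cluster into two with 2-means (steps 3 to 6)."""
    # Step 3: pick two distinct values as the initial centers.
    centers = rng.sample(sorted(set(points)), 2)
    groups = ([], [])
    for _ in range(max_iter):
        groups = ([], [])
        for x in points:
            # Step 4: assign each sample to its nearest center.
            nearest = 0 if (x - centers[0]) ** 2 <= (x - centers[1]) ** 2 else 1
            groups[nearest].append(x)
        if not groups[0] or not groups[1]:
            break
        # Step 5: move each center to the mean of its cluster.
        new_centers = [sum(g) / len(g) for g in groups]
        if new_centers == centers:
            break  # Step 6: centers are stable, so the distortion has converged.
        centers = new_centers
    return [groups[0], groups[1]]


def sse(cluster):
    """Sum of squared errors of a cluster around its mean (step 7)."""
    mu = sum(cluster) / len(cluster)
    return sum((x - mu) ** 2 for x in cluster)


def bisecting_kmeans(scores, k, seed=0):
    rng = random.Random(seed)
    clusters = [list(scores)]  # Step 2: all samples start in one cluster.
    while len(clusters) < k:   # Step 8: stop once there are K clusters.
        splittable = [c for c in clusters if len(set(c)) >= 2]
        if not splittable:
            break
        target = max(splittable, key=sse)  # Step 7: largest SSE is split next.
        clusters.remove(target)
        clusters.extend(part for part in two_means(target, rng) if part)
    return sorted(sorted(c) for c in clusters)


# Hypothetical danger level scores forming three visible groups.
result = bisecting_kmeans([0.1, 0.12, 0.5, 0.52, 0.95, 0.97], 3)
```

Splitting only the cluster with the largest sum of squared errors directs each new split at the least compact group, which is the purpose of step 7.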
In the above solution, determining the danger level of the classified suspected violating telephone numbers according to radar chart analysis includes:
determining third feature index values of the classified suspected violating telephone numbers, where the third feature indexes include: average calling count, average dispersion, average number of caller-side releases, average ring duration, and average call duration;
drawing, based on the third feature index values, the radar chart of the telephone number corresponding to each cluster center point according to radar chart analysis;
determining the danger level of the classified suspected violating telephone numbers according to the radar charts.
In the above solution, the danger levels include: high-risk number, medium-risk number, low-risk number, and safe number.
In the above solution, the method further includes:
formulating targeted interception and governance schemes according to local policy, based on the danger level of each suspected violating telephone number.
In a second aspect, an embodiment of the invention provides a determination device, including a network interface, a memory, and a processor;
wherein,
the network interface is configured to receive and send signals while exchanging information with other external network elements;
the memory is configured to store a computer program that can run on the processor;
the processor is configured to perform the steps of the method of any item of the first aspect when running the computer program.
In a third aspect, an embodiment of the invention provides a computer storage medium that stores a determination program, where the steps of the method of any item of the first aspect are implemented when the determination program is executed by at least one processor.
Embodiments of the invention provide a method, a device, and a computer storage medium for determining the danger level of a telephone number. Telephone numbers to be detected are input, suspected violating telephone numbers are identified among them according to a preset suspected violating telephone number identification model, and the danger level of each identified suspected violating telephone number is determined according to a preset telephone number danger level decision model. Different interception and governance schemes can thus be formulated for violating numbers of different danger levels, which avoids unnecessary waste of governance resources.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a method for determining the danger level of a telephone number provided by an embodiment of the invention;
Fig. 2 is a schematic radar chart of the telephone numbers corresponding to four cluster center points provided by an embodiment of the invention;
Fig. 3 is a schematic structural diagram of a determination device provided by an embodiment of the invention;
Fig. 4 is a schematic structural diagram of another determination device provided by an embodiment of the invention;
Fig. 5 is a schematic structural diagram of another determination device provided by an embodiment of the invention;
Fig. 6 is a schematic structural diagram of another determination device provided by an embodiment of the invention;
Fig. 7 is a schematic hardware structure diagram of a determination device provided by an embodiment of the invention.
Detailed description of the embodiments
In embodiments of the invention, the number that is dialed is the called number, the party that is dialed and answers is the called party, and the party that actively dials the called party's number is the calling party.
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings.
Embodiment one
Referring to Fig. 1, which shows a method for determining the danger level of a telephone number provided by an embodiment of the invention, the method includes:
S101: inputting telephone numbers to be detected;
S102: identifying suspected violating telephone numbers among the telephone numbers to be detected according to a preset suspected violating telephone number identification model;
S103: determining the danger level of each identified suspected violating telephone number according to a preset telephone number danger level decision model.
In this embodiment, suspected violating telephone numbers are identified among the telephone numbers to be detected according to the preset suspected violating telephone number identification model, and the danger level of each identified number is then determined according to the preset telephone number danger level decision model. Targeted interception and governance schemes can thus be formulated for violating numbers of different danger levels, which avoids unnecessary waste of governance resources.
It should be noted that, before identifying suspected violating telephone numbers among the telephone numbers to be detected according to the preset suspected violating telephone number identification model, the method further includes:
constructing the suspected violating telephone number identification model.
Specifically, constructing the suspected violating telephone number identification model includes:
obtaining a training sample set from communication signaling data collected from historical call events;
determining a first feature index value set of the training sample set;
constructing a mathematical model based on the first feature index value set;
comparing the accuracy and the recall of the mathematical model with their preset thresholds;
if either the accuracy or the recall does not reach its preset threshold, optimizing the mathematical model;
if both the accuracy and the recall reach their preset thresholds, determining the mathematical model to be the suspected violating telephone number identification model.
It can be understood that suspected violating telephone numbers are identified on the basis of a preset mathematical model. The modeling process collects a large amount of training sample data to form a training sample set, analyzes the training sample set to obtain the first feature indexes, and computes the first feature index values of the training samples to form the first feature index value set. A mathematical model is then built from this set according to a specific algorithm. Because the mathematical model is built from a limited number of training samples, its performance must be verified against known information to confirm that the model is reasonable. In this embodiment, the performance of the mathematical model is verified using accuracy and recall. In a specific implementation, the accuracy and the recall of the mathematical model are compared with preset thresholds. When both meet their preset thresholds, the mathematical model passes verification and is determined to be the suspected violating telephone number identification model. When either the accuracy or the recall fails to meet its preset threshold, the mathematical model fails verification and must be optimized. Preferably, the parameters of the mathematical model can be optimized with a particle swarm algorithm until the accuracy and the recall of the model meet the preset threshold requirements.
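The acceptance check described above can be sketched as follows. This excerpt does not define how the two metrics are computed, so the sketch assumes that "accuracy" means the share of flagged numbers that are truly violating (precision) and that "recall" means the share of truly violating numbers that are flagged. The function names and threshold values are hypothetical.

```python
def precision_recall(predicted, actual):
    """Labels: +1 = suspected violating number, -1 = non-violating number."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)   # correctly flagged
    fp = sum(1 for p, a in pairs if p == 1 and a == -1)  # wrongly flagged
    fn = sum(1 for p, a in pairs if p == -1 and a == 1)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def model_accepted(predicted, actual, precision_threshold, recall_threshold):
    """The model passes only if both metrics reach their thresholds."""
    precision, recall = precision_recall(predicted, actual)
    return precision >= precision_threshold and recall >= recall_threshold


# Hypothetical validation labels for four numbers.
predicted = [1, 1, -1, -1]
actual = [1, -1, 1, -1]
metrics = precision_recall(predicted, actual)
```

If `model_accepted` returns `False`, the text calls for optimizing the model parameters and checking again.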
Specifically, obtaining the training sample set from communication signaling data collected from historical call events includes:
collecting call event signaling ticket data and obtaining the communication information corresponding to blacklist telephone numbers and whitelist telephone numbers, where the blacklist and whitelist telephone numbers are used to obtain the training sample set, the blacklist telephone numbers include confirmed violating telephone numbers, and the whitelist telephone numbers include telephone numbers extracted according to set operating rules;
deleting invalid and erroneous information from the communication information to obtain the training sample set.
In a specific implementation, call event signaling ticket data can be collected from the provincial signaling monitoring system by signaling acquisition. Violating numbers confirmed in the past serve as the source of blacklist telephone numbers, and whitelist telephone numbers are extracted according to the whitelist operating rules. The blacklist telephone numbers and whitelist telephone numbers are matched against the signaling tickets, and the communication information of these numbers over N consecutive days is extracted, where N is a number greater than or equal to 1 that can be determined according to local policy. The extracted subscriber numbers are labeled to distinguish blacklist telephone numbers from whitelist telephone numbers. Records whose key fields are malformed or contain missing values are removed, and the remaining records form the training sample set.
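The matching, labeling, and cleaning steps above can be sketched as a single filter over signaling ticket records. The record layout, the field names in `KEY_FIELDS`, and the digits-only check on the calling number are assumptions made for illustration. The text says only that records with malformed or missing key fields are removed.

```python
# Hypothetical key fields of a signaling ticket record.
KEY_FIELDS = ("caller", "callee", "start_time", "duration")


def build_training_set(records, blacklist, whitelist):
    """Return (record, label) pairs: +1 for blacklist, -1 for whitelist."""
    samples = []
    for rec in records:
        # Remove records whose key fields are missing or empty.
        if any(rec.get(field) in (None, "") for field in KEY_FIELDS):
            continue
        # Remove records with a malformed calling number (assumed digits only).
        if not str(rec["caller"]).isdigit():
            continue
        if rec["caller"] in blacklist:
            samples.append((rec, 1))
        elif rec["caller"] in whitelist:
            samples.append((rec, -1))
        # Numbers in neither list are not part of the training sample set.
    return samples


records = [
    {"caller": "12345678911", "callee": "555", "start_time": "t1", "duration": 12},
    {"caller": "32546789541", "callee": "556", "start_time": "t2", "duration": 0},
    {"caller": "12345678911", "callee": "", "start_time": "t3", "duration": 3},
    {"caller": "99999999999", "callee": "557", "start_time": "t4", "duration": 8},
]
samples = build_training_set(records, {"12345678911"}, {"32546789541"})
labels = [label for _, label in samples]
```

The third record is dropped because its called number is empty, and the fourth is dropped because its calling number is in neither list.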
After the training sample set has been determined, it is analyzed to obtain the first feature indexes, and the first feature index value set of the training sample set is then computed. The first feature index value set is the set of statistical values of the first feature indexes, where the first feature indexes include: calling frequency, number of called parties, call duration, ring duration, number of active releases, number of passive releases, called-party dispersion, correlation coefficient between the called numbers of the same caller, maximum frequency of calling the same ten-thousand number block, calling ratio, and standard deviation of call-time intervals. Computing the standard deviation of call-time intervals requires three or more called telephone numbers. For example, let the training sample set be $A = \{a_i\}$, where $a_i$ denotes the i-th training sample. The first feature index value set of A is $X = \{x_i\}$, where $x_i$ denotes the first feature index value of the i-th training sample. Each first feature index value is a vector whose dimension equals the total number of first feature indexes. As a further example, let the training sample set be A = {12345678911, 32546789541}, where 12345678911 is a blacklist telephone number and 32546789541 is a whitelist telephone number. Based on the collected communication signaling and the eleven first feature indexes listed above, taken in that order, the first feature index value set of A is X = {4, 6, 1, 1, 0, 10, 2, 0.8, 5, 0.2, 2; 2, 3, 20, 4, 4, 1, 0.2, 0.8, 0, 0.9, 2}.
Further, constructing the mathematical model based on the first feature index value set includes:
determining the first feature index value set as the input parameters of the mathematical model and the violation flag of each telephone number as the output parameter of the mathematical model, where the violation flag indicates whether the telephone number is a suspected violating telephone number;
normalizing the input parameters;
determining the mathematical model according to the support vector machine algorithm based on the normalized input parameters and the output parameters.
In a specific implementation, the first feature index value set $X = \{x_i\}$ is obtained, where $x_i$ denotes the first feature index value of the i-th training sample and is a vector. The set $Y = \{y_i\}$ holds the flags of the telephone numbers and indicates whether each telephone number is a suspected violating telephone number, where $y_i \in \{-1, +1\}$: $y_i = +1$ indicates that the i-th training sample is a suspected violating telephone number, and $y_i = -1$ indicates that it is a non-violating telephone number. In this embodiment, X is the input parameter of the mathematical model and Y is its output parameter. Because the support vector machine algorithm only accepts data in the range [-1, +1], the input parameter X must be normalized, which maps it into the range [0, 1]. The mathematical model is then built according to the support vector machine algorithm from the normalized input parameters and the output parameters.
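The mapping of the input parameters into [0, 1] can be sketched with per-index min-max scaling. The text states only the target range, so the min-max formula, the function name `min_max_normalize`, and the handling of a constant index are assumptions. The sample rows are the two first feature index vectors from the example above.

```python
def min_max_normalize(rows):
    """Scale every feature index (column) into [0, 1] across all samples."""
    cols = list(zip(*rows))
    lows = [min(col) for col in cols]
    spans = [max(col) - min(col) for col in cols]
    # A constant index has span 0 and is mapped to 0.0 (assumption).
    return [[(v - lo) / span if span else 0.0
             for v, lo, span in zip(row, lows, spans)]
            for row in rows]


X = [
    [4, 6, 1, 1, 0, 10, 2, 0.8, 5, 0.2, 2],    # blacklist number 12345678911
    [2, 3, 20, 4, 4, 1, 0.2, 0.8, 0, 0.9, 2],  # whitelist number 32546789541
]
X_norm = min_max_normalize(X)
```

With only two samples every varying index collapses to 0 or 1. A realistic training set would produce intermediate values.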
Further, determining the mathematical model according to the support vector machine algorithm based on the normalized input parameters and the output parameters includes:
establishing the Lagrangian function according to formula 1:
$$L(\omega, b, a) = \frac{1}{2}\|\omega\|^2 - \sum_{i} a_i \{ [(\omega \cdot x_i) + b]\, y_i - 1 \} \quad (1)$$
where $a_i \ge 0$ denotes a Lagrange multiplier, $x_i$ denotes the i-th normalized input parameter, and $y_i$ denotes the i-th output parameter;
setting the partial derivatives of formula 1 with respect to $\omega$ and $b$ to 0 to obtain formula 2 and formula 3:
$$\omega = \sum_{i} a_i y_i x_i \quad (2)$$
$$\sum_{i} a_i y_i = 0 \quad (3)$$
substituting formula 2 and formula 3 into formula 1 to obtain formula 4:
$$\max_{a}\; Q(a) = \sum_{i} a_i - \frac{1}{2} \sum_{i} \sum_{j} a_i a_j y_i y_j (x_i \cdot x_j) \quad (4)$$
where $a_i \ge 0$ and $\sum_{i} a_i y_i = 0$;
substituting the normalized input parameters into formula 4 and formula 5 and determining, subject to $a_i \ge 0$, the support vectors $x_p$ that satisfy formula 5, where formula 5 is:
$$a_i \cdot \{ [(\omega \cdot x_i) + b]\, y_i - 1 \} = 0 \quad (5);$$
substituting the support vectors $x_p$ into formula 5 to obtain the value of the threshold $b$;
substituting the support vectors $x_p$, the value of the threshold $b$, and formula 2 into formula 6 to obtain the support vector machine classification prediction function $f(x) = \operatorname{sgn}\{ \sum_{i} a_i y_i (x_i \cdot x) + b \}$, where formula 6 is:
$$g(x) = (\omega \cdot x) + b \quad (6).$$
It is appreciated that support vector machines theory is originally sourced from the research to data classification problem, meet classification by finding It is required that optimizing decision hyperplane also maximize the white space of hyperplane two sides while guaranteeing nicety of grading, from And guarantee to solve the optimization of linear separability problem.The process of support vector machines training is exactly to find the mistake of optimal classification line Journey.
In a specific implementation, the training process of the support vector machine algorithm may include, but is not limited to, the following steps:
Step 1: Set the linear discriminant function for two-class classification as $g(x) = (\omega \cdot x) + b$, where $x$ is the normalized input parameter, $\omega$ is the weight vector, $b$ is the classification threshold, and $(\omega \cdot x)$ denotes the inner product of the vectors $\omega$ and $x$;
Step 2: Judge whether the training samples satisfy the constraint condition $y_i[(\omega \cdot x_i) + b] - 1 \ge 0$ for every training sample $(x_i, y_i)$. If the constraint condition is satisfied, the training samples admit an optimal classification line (linear discriminant function) $g(x) = (\omega \cdot x) + b = 0$ that classifies all samples correctly with the maximum class margin. Because $x$ is the normalized input parameter and is therefore known, finding the discriminant function $g(x)$ amounts to finding $\omega$ and $b$. It should be noted that if the training samples do not satisfy the constraint condition, they can be mapped by a kernel function into a high-dimensional space in which they are linearly separable;
Step 3: The samples nearest to the classification line are the support vectors $x_p$, which satisfy $|g(x_p)| - 1 = 0$. From geometry, the Euclidean distance from a support vector to the classification line is $\frac{y_p\, g(x_p)}{\|\omega\|} = \frac{1}{\|\omega\|}$, where $x_p$ is a support vector, $y_p$ is the desired output of the support vector, and $g(x)$ is the discriminant function;
Step 4: Under the condition of linear separability, finding the optimal classification line requires maximizing the class margin $\frac{2}{\|\omega\|}$, which is equivalent to minimizing $\frac{1}{2}\|\omega\|^2$. Finding this extremum subject to the constraint condition is a convex quadratic programming problem, so a unique optimal solution exists, and the problem is solved with the method of Lagrange multipliers;
Step 5: First, establish the Lagrangian function $L(\omega, b, a) = \frac{1}{2}\|\omega\|^2 - \sum_i a_i \{ y_i[(\omega \cdot x_i) + b] - 1 \}$, where $a_i \ge 0$ are the Lagrange multipliers;
Step 6: The saddle point of the Lagrangian function is the optimal solution, and at the saddle point the partial derivatives with respect to $\omega$ and $b$ are 0, that is, $\omega = \sum_i a_i y_i x_i$ and $\sum_i a_i y_i = 0$. The problem is thereby converted into the dual problem, that is, maximizing $W(a) = \sum_i a_i - \frac{1}{2}\sum_i \sum_j a_i a_j y_i y_j (x_i \cdot x_j)$ subject to $a_i \ge 0$ and $\sum_i a_i y_i = 0$;
Step 7: The optimal solution of the dual problem must satisfy the condition $a_i \cdot \{[(\omega \cdot x_i) + b] y_i - 1\} = 0$. The vectors whose $a_i$ is nonzero and which satisfy this condition are the support vectors. Substituting a support vector into the condition yields the discriminant function threshold $b$;
Step 8: From the support vectors found and the threshold $b$ obtained, compute the solution $\omega(a)$ of the dual problem and thus the optimal classification line (discriminant function) $g(x) = \sum_i a_i y_i (x_i \cdot x) + b$. The optimal classification function is then converted into the classification prediction function $f(x) = \mathrm{sgn}\left(\sum_i a_i y_i (x_i \cdot x) + b\right)$, which is the support vector classification function.
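As a minimal sketch of how Steps 6 to 8 fit together, the following Python code computes the weight vector from given Lagrange multipliers, recovers the threshold from one support vector, and evaluates the prediction function. The function names and the two-sample data are illustrative assumptions, and the multipliers are taken as already solved rather than obtained from a quadratic programming solver.

```python
def dot(u, v):
    """Inner product (u . v) of two equal-length vectors."""
    return sum(p * q for p, q in zip(u, v))

def weight_vector(alphas, labels, vectors):
    """omega = sum_i a_i * y_i * x_i (zero partial derivative with respect to omega)."""
    dim = len(vectors[0])
    return [sum(a * y * x[k] for a, y, x in zip(alphas, labels, vectors))
            for k in range(dim)]

def threshold(omega, x_p, y_p):
    """Solve [(omega . x_p) + b] * y_p - 1 = 0 for b, using y_p in {+1, -1}."""
    return y_p - dot(omega, x_p)

def discriminant(omega, b, x):
    """g(x) = (omega . x) + b."""
    return dot(omega, x) + b

def predict(omega, b, x):
    """Sign of g(x): +1 marks a suspected violating number, -1 otherwise."""
    return 1 if discriminant(omega, b, x) >= 0 else -1

# Hypothetical two-sample problem whose multipliers can be solved by hand.
alphas = [0.5, 0.5]
labels = [-1, 1]
vectors = [(0.0, 0.0), (2.0, 0.0)]
omega = weight_vector(alphas, labels, vectors)
b = threshold(omega, vectors[0], labels[0])
```

Both samples here are support vectors, so either one can be used to recover the threshold.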
After the mathematical model is established, its performance needs to be verified against known information, and before verification the performance indicators of the mathematical model need to be calculated. On this basis, in the technical solution shown in Fig. 1, before the accuracy rate and recall ratio of the mathematical model are compared with the preset thresholds, the method further includes:
Obtaining a verification sample set;
Identifying, based on the mathematical model, test suspected violating telephone numbers from the verification sample set;
Calculating the accuracy rate and recall ratio of the mathematical model based on the test suspected violating telephone numbers and the verification sample set.
In a specific implementation, a historically confirmed blacklist of telephone numbers and a historically confirmed whitelist of telephone numbers can be obtained from a third party, and the test first feature index value set corresponding to the two lists is computed according to the first feature indices, thereby obtaining the verification sample set. It can be understood that the verification sample set includes the historically confirmed blacklist, the historically confirmed whitelist, and the test first feature index value set. The test first feature index value set is input into the mathematical model to obtain a verification result. The telephone numbers whose verification result is +1 are determined to be test suspected violating telephone numbers, and these are compared with the blacklist in the verification sample set to obtain the number of correctly identified numbers among them. The accuracy rate and recall ratio of the mathematical model are then calculated as follows: accuracy rate = number of correct numbers among the test suspected violating telephone numbers / total number of test suspected violating telephone numbers; recall ratio = number of correct numbers among the test suspected violating telephone numbers / number of telephone numbers in the blacklist of the verification sample set. If both the accuracy rate and the recall ratio meet the preset threshold requirements, the process proceeds to the determination of the telephone number danger level. Otherwise, if either of them fails to meet its preset threshold, the mathematical model needs to be optimized until both meet the preset thresholds.
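The two ratios above can be sketched in Python as follows. The function names, the threshold values, and the sample numbers are illustrative assumptions. The accuracy rate as defined here (correct numbers over all flagged numbers) corresponds to what is commonly called precision.

```python
def accuracy_and_recall(flagged, blacklist):
    """Return (accuracy rate, recall ratio) as defined in the text.

    flagged:   numbers the model marked +1 (test suspected violating numbers)
    blacklist: historically confirmed violating numbers in the verification set
    """
    flagged, blacklist = set(flagged), set(blacklist)
    correct = len(flagged & blacklist)
    accuracy = correct / len(flagged) if flagged else 0.0
    recall = correct / len(blacklist) if blacklist else 0.0
    return accuracy, recall

def model_is_acceptable(accuracy, recall, min_accuracy, min_recall):
    """Both indicators must reach their preset thresholds."""
    return accuracy >= min_accuracy and recall >= min_recall

# Hypothetical verification data.
flagged = ["A001", "A002", "A003", "A004"]
blacklist = ["A001", "A002", "A009"]
accuracy, recall = accuracy_and_recall(flagged, blacklist)
```

If `model_is_acceptable` returns `False`, the model would be optimized and verified again, as described above.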
After the suspected violating telephone numbers are obtained, the embodiment of the present invention further determines their danger level. On this basis, in the technical solution shown in Fig. 1, the determining the danger level of the identified suspected violating telephone numbers according to the preset telephone number danger level determination model includes:
Obtaining the second feature index value set of the identified suspected violating telephone numbers;
Performing a smooth transformation on the second feature index value set, and then calculating the danger level score of the identified suspected violating telephone numbers according to the entropy method based on the smoothed second feature index value set;
Classifying the identified suspected violating telephone numbers according to the K-Means clustering algorithm based on the danger level score;
Determining the danger level of the classified identified suspected violating telephone numbers according to the radar map analysis method.
It should be noted that the second feature index value set is the set of statistical values of the second feature indices, where the second feature indices include: call frequency, number of called parties, call duration, ringing duration, number of active releases, number of passive releases, called-party dispersion, correlation coefficient between the called numbers of the same caller, number of complaints received, and number of times marked. In a specific implementation, the indices in the first feature indices that most clearly characterize suspected violating calls can be chosen, including call frequency, number of called parties, call duration, ringing duration, number of active releases, number of passive releases, called-party dispersion, and correlation coefficient between the called numbers of the same caller. In addition, user complaint information from the operator's various channels and third-party marking data are extracted and counted to obtain indices including the number of complaints received and the number of times marked, thereby obtaining the second feature indices. The identified suspected violating numbers are matched against the second feature indices, and the second feature index value set of the identified suspected violating numbers is obtained by statistics.
Further, the performing a smooth transformation on the second feature index value set includes:
Performing a smooth transformation on the second feature index value set according to formula 7, where formula 7 is:
$x'_{ij} = \log(x_{ij} + 1)$ (7),
where $x_{ij}$ denotes the j-th feature index value in the second feature index value set of the i-th identified suspected violating telephone number, and $x'_{ij}$ denotes the j-th feature index value in the second feature index value set of the i-th identified suspected violating telephone number after the smooth transformation.
It can be understood that, to reduce the influence of extreme values and to bring the data as close to a normal distribution as possible, the data need to be smoothed.
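The smoothing step can be sketched in Python as below. The function name is hypothetical, and the natural logarithm is an assumption, since formula 7 does not state the base.

```python
import math

def smooth(values_by_number):
    """Apply x' = log(x + 1) to every second feature index value.

    values_by_number[i][j] is the j-th index value of the i-th number.
    """
    return [[math.log1p(v) for v in row] for row in values_by_number]

# Hypothetical counts: the extreme value 999 is pulled much closer to the others.
smoothed = smooth([[0, 9], [3, 999]])
```

Adding 1 before taking the logarithm keeps zero counts at zero instead of producing an undefined value.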
Further, the calculating the danger level score of the identified suspected violating telephone numbers according to the entropy method based on the smoothed second feature index value set includes:
Normalizing the smoothed second feature index value set according to formula 8, where formula 8 is:
$x''_{ij} = \frac{x'_{ij} - \min(x'_j)}{\max(x'_j) - \min(x'_j)}$ (8),
where $x'_{ij}$ denotes the j-th feature index value in the second feature index value set of the i-th identified suspected violating telephone number after the smooth transformation, $\min(x'_j)$ denotes the minimum value of the j-th second feature index over the identified suspected violating telephone numbers, $\max(x'_j)$ denotes the maximum value of the j-th second feature index over the identified suspected violating telephone numbers, and $x''_{ij}$ denotes the j-th feature index value of the i-th identified suspected violating telephone number after smooth transformation and normalization;
Calculating the probability of occurrence of each of the second feature indices according to formula 9, where formula 9 is:
$p_{ij} = \frac{x''_{ij}}{\sum_{i=1}^{m} x''_{ij}}$ (9),
where $m$ denotes the total number of identified suspected violating telephone numbers;
Calculating the weight of each of the second feature indices according to formula 10 and formula 11, where formula 10 and formula 11 are:
$e_j = -\frac{1}{\ln m} \sum_{i=1}^{m} p_{ij} \ln p_{ij}$ (10),
$w_j = \frac{1 - e_j}{\sum_{k=1}^{n} (1 - e_k)}$ (11),
where $w_j$ denotes the weight of the j-th second feature index, $n$ denotes the total number of second feature indices, and $e_j$ denotes the entropy of the j-th second feature index;
Calculating the danger level score of the identified suspected violating telephone numbers according to formula 12, where formula 12 is:
$F_i = \sum_{j=1}^{n} w_j x''_{ij}$ (12),
where $w_j$ denotes the weight of the j-th second feature index, $x''_{ij}$ denotes the j-th feature index value of the i-th identified suspected violating telephone number after smooth transformation and normalization, and $F_i$ denotes the danger level score of the i-th identified suspected violating telephone number.
It can be understood that, in information theory, entropy is a measure of uncertainty. The greater the amount of information, the smaller the uncertainty and the smaller the entropy; the smaller the amount of information, the greater the uncertainty and the greater the entropy. According to this property, the entropy value can be used to judge the degree of dispersion of an index: the greater the dispersion of an index, the greater its influence on the overall evaluation. In the embodiment of the present invention, the entropy of each second feature index is calculated, and the weight of each index is then calculated from its entropy. The weight of each second feature index characterizes the influence of that index on the danger level score of a violating telephone number. It should be noted that the weights of all the indices sum to 1.
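The entropy weighting described above can be sketched in Python as follows, assuming the standard entropy weight method (min-max normalization, column-wise proportions, entropy scaled by 1/ln m, weights proportional to 1 minus entropy). The function names and sample values are hypothetical, and treating a constant index as carrying zero weight is an assumption of this sketch.

```python
import math

def min_max_normalize(rows):
    """Scale every index (column) to [0, 1]; a constant column becomes all zeros."""
    cols = list(zip(*rows))
    return [[(v - min(c)) / (max(c) - min(c)) if max(c) > min(c) else 0.0
             for v, c in zip(row, cols)] for row in rows]

def entropy_weights(norm):
    """Weight each index by 1 - entropy, scaled so the weights sum to 1."""
    m, n = len(norm), len(norm[0])
    if m < 2:
        raise ValueError("at least two telephone numbers are required")
    spread = []
    for j in range(n):
        col = [row[j] for row in norm]
        total = sum(col)
        if total == 0:
            spread.append(0.0)  # assumption: a constant index gets no weight
            continue
        entropy = -sum((v / total) * math.log(v / total) for v in col if v > 0)
        spread.append(1.0 - entropy / math.log(m))
    s = sum(spread)
    return [d / s for d in spread] if s > 0 else [1.0 / n] * n

def danger_scores(norm, weights):
    """F_i = sum_j w_j * x''_ij for every telephone number i."""
    return [sum(w * v for w, v in zip(weights, row)) for row in norm]

# Hypothetical smoothed values: three numbers, two indices (second one constant).
norm = min_max_normalize([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
weights = entropy_weights(norm)
scores = danger_scores(norm, weights)
```

Because the second index is identical for all three numbers, it cannot separate them, so all of the weight falls on the first index.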
Based on the determined danger level scores, the suspected violating telephone numbers can be further classified according to a clustering algorithm. On this basis, the classifying the identified suspected violating telephone numbers according to the K-Means clustering algorithm based on the danger level score includes:
Step 1: determining the number of clusters $K$ based on the preset number of classes;
Step 2: determining the classification sample set $X = \{x_i\}$, $x_i \in R^n$, based on the danger level scores, and treating the classification sample set as one cluster;
Step 3: randomly selecting 2 cluster center points $\mu_1, \mu_2 \in R^n$ in the cluster;
Step 4: calculating the cluster to which each classification sample in the cluster belongs according to formula 13, where formula 13 is:
$C_l = \arg\min \|x_i - \mu_l\|^2,\ l = 1, 2$ (13),
where $x_i$ denotes a classification sample and $\mu_l$ denotes a cluster center point;
Step 5: updating the center $\mu_l$ of each cluster according to formula 14, where formula 14 is:
$\mu_l = \frac{1}{s} \sum_{t=1}^{s} x_t^{(l)}$ (14),
where $x_t^{(l)}$ denotes the t-th classification sample of the l-th cluster, and $s$ denotes the total number of classification samples in the l-th cluster;
Step 6: repeating Step 4 and Step 5 until the distortion function converges;
Step 7: calculating the sum of squared errors according to formula 15 and choosing the cluster with the largest sum of squared errors as the next cluster to be split, where formula 15 is:
$SSE_l = \sum_{i=1}^{z} \|x_i - u_l\|^2$ (15),
where $x_i$ denotes the i-th classification sample, $u_l$ denotes the center point of the l-th cluster, and $z$ denotes the number of classification samples in each cluster;
Step 8: repeating Step 3 to Step 7 until the number of clusters is $K$.
It can be understood that the K-means algorithm is a typical distance-based clustering algorithm. It uses distance as the evaluation index of similarity: the closer two objects are, the greater their similarity. In the K-means algorithm a cluster is composed of objects that are close to one another, and the final goal of the algorithm is to obtain compact and independent clusters. In its execution, the first step is to randomly select any k objects as the initial cluster centers, each of which initially represents one cluster. Then, in each iteration, every remaining object in the data set is reassigned to the nearest cluster according to its distance from each cluster center. Once all data objects have been examined, one iteration is complete and new cluster centers are computed. If the value of the distortion function no longer changes from one iteration to the next, the K-means algorithm has converged. It should be noted that, in the embodiment of the present invention, the number of clusters K can be set to 4, so that after processing by the K-means algorithm the suspected violating telephone numbers are divided into 4 classes.
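Steps 1 to 8 describe a bisecting form of K-means, which can be sketched in Python as below. The function names, the fixed random seed, and the sample scores are assumptions of this sketch. It stops each two-way split when the centers no longer move, which is used here in place of an explicit check on the distortion function.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def center(points):
    """Mean of the samples in one cluster (the update of Step 5)."""
    return tuple(sum(p[k] for p in points) / len(points)
                 for k in range(len(points[0])))

def sse(points):
    """Sum of squared errors of one cluster around its center (Step 7)."""
    c = center(points)
    return sum(sq_dist(p, c) for p in points)

def split_in_two(points, rng, max_iter=100):
    """Steps 3 to 6: 2-means on a single cluster."""
    if len(points) < 2:
        return [points]
    centers = rng.sample(points, 2)
    for _ in range(max_iter):
        groups = ([], [])
        for p in points:
            nearer = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            groups[nearer].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split, for example all points identical
        updated = [center(g) for g in groups]
        if updated == centers:
            break  # centers stopped moving
        centers = updated
    return [g for g in groups if g]

def bisecting_kmeans(points, k, seed=0):
    """Split the cluster with the largest SSE until there are k clusters."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=sse)
        worst = clusters.pop()
        parts = split_in_two(worst, rng)
        if len(parts) < 2:
            clusters.append(worst)
            break  # nothing left that can be split
        clusters.extend(parts)
    return clusters

# Hypothetical one-dimensional danger level scores, wrapped as 1-tuples.
scores = [(0.0,), (0.1,), (5.0,), (5.1,), (10.0,), (10.1,), (20.0,), (20.1,)]
clusters = bisecting_kmeans(scores, k=4, seed=1)
```

Because the initial centers are chosen at random, the exact grouping can differ between seeds, while the number of clusters and the set of samples stay the same.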
Further, the danger level is determined for the classified suspected violating telephone numbers. In the technical solution shown in Fig. 1, the determining the danger level of the classified identified suspected violating telephone numbers according to the radar map analysis method includes:
For each class of the classified identified suspected violating telephone numbers, determining the third feature index values of the telephone number corresponding to the cluster center point, where the third feature indices include: caller average, average dispersion, average number of caller releases, average ringing duration, and average call duration;
Drawing, based on the third feature index values, the radar map of the telephone numbers corresponding to the cluster center points according to the radar map analysis method;
Determining the danger level of the classified identified suspected violating telephone numbers according to the radar map.
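One way to turn the radar map comparison into a computation is sketched below in Python. Ranking the cluster centers by the area their polygon encloses on the radar map is an assumption of this sketch, not a criterion stated in the text. The level labels, function names, and index values are likewise hypothetical, and the values are assumed to be scaled to a common range.

```python
import math

LEVELS = ["high-risk", "medium-risk", "low-risk", "safe"]

def radar_area(values):
    """Area of the radar polygon whose axes are equally spaced in angle.

    Adjacent axes with radii r_k and r_(k+1) span a triangle of area
    0.5 * r_k * r_(k+1) * sin(2*pi/n).
    """
    n = len(values)
    return 0.5 * math.sin(2 * math.pi / n) * sum(
        values[k] * values[(k + 1) % n] for k in range(n))

def rank_cluster_centers(centers):
    """Map each cluster to a level, largest radar polygon first.

    centers maps a cluster name to its five third feature index values;
    at most len(LEVELS) clusters are supported.
    """
    ordered = sorted(centers, key=lambda name: radar_area(centers[name]),
                     reverse=True)
    return {name: LEVELS[i] for i, name in enumerate(ordered)}

# Hypothetical scaled third feature index values of four cluster centers.
centers = {
    "class 1": [0.9, 0.8, 0.9, 0.7, 0.8],
    "class 2": [0.6, 0.5, 0.6, 0.5, 0.6],
    "class 3": [0.3, 0.3, 0.2, 0.3, 0.3],
    "class 4": [0.1, 0.1, 0.1, 0.1, 0.1],
}
levels = rank_cluster_centers(centers)
```

A polygon that lies further out on every axis always encloses a larger area, so this ordering agrees with reading the radar map from the outside inward.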
In one possible implementation, the danger levels include: high-risk number, medium-risk number, low-risk number, and safe number.
It can be understood that the radar map drawn from the third feature index values distinguishes the four classes of identified suspected violating telephone numbers with a certain degree of discrimination. The third feature index values of high-risk numbers are higher overall, and those of the other levels decrease in turn. For example, the telephone numbers corresponding to the four cluster center points are: first class: 18123569843; second class: 13723766983; third class: 15523764583; fourth class: 18126539873. Table 1 shows the third feature index values of the telephone numbers corresponding to the four cluster center points. A radar map is drawn from the third feature index values shown in Table 1, and Fig. 2 shows the resulting radar map of the telephone numbers corresponding to the four cluster center points. From Fig. 2 it can be seen that the four classes of telephone numbers are clearly distinguished, and that from the outside of the radar map inward they are, in order, high-risk numbers, medium-risk numbers, low-risk numbers, and safe numbers. It follows that the first class of telephone numbers, with 18123569843 as its cluster center point, consists of high-risk numbers; the second class, with 13723766983 as its cluster center point, consists of medium-risk numbers; the third class, with 15523764583 as its cluster center point, consists of low-risk numbers; and the fourth class, with 18126539873 as its cluster center point, consists of safe numbers.
Table 1
For telephone numbers whose danger level has been determined, a reasonable handling scheme can be set according to local policy. On this basis, the technical solution shown in Fig. 1 further includes: formulating a targeted interception and handling scheme according to local policy based on the danger level of the suspected violating telephone numbers.
It can be understood that, in the embodiment of the present invention, high-risk numbers can be, but are not limited to being, intercepted or refused access; medium-risk numbers can be, but are not limited to being, added to a blacklist; for low-risk numbers, a specific handling scheme can be formulated according to local policy; and safe numbers can be handled according to the normal procedure.
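The mapping from danger level to handling action can be sketched as a simple lookup table in Python. The level labels and action names below are hypothetical placeholders, and an operator would substitute the actions required by its own local policy.

```python
# Hypothetical handling actions, one per danger level.
HANDLING_POLICY = {
    "high-risk": "intercept_or_refuse_access",
    "medium-risk": "add_to_blacklist",
    "low-risk": "apply_local_policy",
    "safe": "normal_processing",
}

def handling_for(level, policy=HANDLING_POLICY):
    """Look up the handling action for a danger level."""
    if level not in policy:
        raise ValueError(f"unknown danger level: {level}")
    return policy[level]
```

Keeping the policy in a table rather than in branching logic lets each locality change its handling scheme without touching the classification steps.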
The embodiment of the present invention identifies suspected violating telephone numbers by constructing a suspected violating telephone number identification model, and further determines their danger level according to the set algorithms, so that the operator can formulate targeted handling schemes and avoid unnecessary waste of handling resources.
Embodiment two
Based on the same inventive concept as the foregoing embodiment, refer to Fig. 3, which shows a schematic structural diagram of a discrimination device 30 provided by an embodiment of the present invention. As shown in Fig. 3, the discrimination device 30 includes an input part 301, an identification part 302, and a determination part 303, wherein:
The input part 301 is configured to input telephone numbers to be detected;
The identification part 302 is configured to identify suspected violating telephone numbers from the telephone numbers to be detected according to a preset suspected violating telephone number identification model;
The determination part 303 is configured to determine the danger level of the identified suspected violating telephone numbers according to a preset telephone number danger level determination model.
In the above solution, as shown in Fig. 4, the discrimination device 30 further includes a construction part 304, which is configured to construct the suspected violating telephone number identification model.
In the above solution, the construction part 304 is specifically configured to:
Obtain a training sample set by collecting communication signaling data from historical call events;
Determine the first feature index value set of the training sample set;
Construct a mathematical model based on the first feature index value set;
Compare the accuracy rate and recall ratio of the mathematical model with the preset thresholds;
In response to either the accuracy rate or the recall ratio not reaching its preset threshold, optimize the mathematical model;
In response to both the accuracy rate and the recall ratio reaching their preset thresholds, determine the mathematical model to be the suspected violating telephone number identification model.
In the above solution, the construction part 304 is specifically configured to:
Determine the first feature index value set to be the input parameter of the mathematical model and the violation flag of a telephone number to be the output parameter of the mathematical model, where the violation flag identifies whether the telephone number is a suspected violating telephone number;
Normalize the input parameter;
Determine the mathematical model according to the support vector machine algorithm based on the normalized input parameter and the output parameter.
In the above solution, the construction part 304 is specifically configured to:
Establish the Lagrangian function according to formula 1:
$L(\omega, b, a) = \frac{1}{2}\|\omega\|^2 - \sum_i a_i \{ y_i[(\omega \cdot x_i) + b] - 1 \}$ (1),
where $a_i \ge 0$ denotes a Lagrange multiplier, $x_i$ denotes the i-th normalized input parameter, and $y_i$ denotes the i-th output parameter;
Based on formula 1, set the partial derivatives with respect to $\omega$ and $b$ to 0 to obtain formula 2 and formula 3, respectively, namely $\omega = \sum_i a_i y_i x_i$ (2) and $\sum_i a_i y_i = 0$ (3);
Substitute formula 2 and formula 3 into formula 1 to obtain formula 4:
$W(a) = \sum_i a_i - \frac{1}{2} \sum_i \sum_j a_i a_j y_i y_j (x_i \cdot x_j)$ (4),
where $a_i \ge 0$ and $\sum_i a_i y_i = 0$;
Substitute the normalized input parameters into formula 4 and formula 5, and determine the support vector $x_p$ that satisfies formula 5 with $a_i \ge 0$, where formula 5 is:
$a_i \cdot \{[(\omega \cdot x_i) + b] y_i - 1\} = 0$ (5);
Substitute the support vector $x_p$ into formula 5 to obtain the value of the threshold $b$;
Substitute the support vector $x_p$, the value of the threshold $b$, and formula 2 into formula 6 to obtain the support vector classification prediction function, where formula 6 is:
$g(x) = (\omega \cdot x) + b$ (6).
In the above solution, as shown in Fig. 5, the discrimination device 30 further includes a calculation part 305, which is configured to:
Obtain a verification sample set;
Identify, based on the mathematical model, test suspected violating telephone numbers from the verification sample set;
Calculate the accuracy rate and recall ratio of the mathematical model based on the test suspected violating telephone numbers and the verification sample set.
In the above solution, the first feature index value set is the set of statistical values of the first feature indices, where the first feature indices include:
Call frequency, number of called parties, call duration, ringing duration, number of active releases, number of passive releases, called-party dispersion, correlation coefficient between the called numbers of the same caller, maximum frequency of calls to the same ten-thousand-number segment, caller proportion, and standard deviation of the call time interval.
In the above solution, the determination part 303 is specifically configured to:
Obtain the second feature index value set of the identified suspected violating telephone numbers;
Perform a smooth transformation on the second feature index value set, and then calculate the danger level score of the identified suspected violating telephone numbers according to the entropy method based on the smoothed second feature index value set;
Classify the identified suspected violating telephone numbers according to the K-Means clustering algorithm based on the danger level score;
Determine the danger level of the classified identified suspected violating telephone numbers according to the radar map analysis method.
In the above solution, the second feature index value set is the set of statistical values of the second feature indices, where the second feature indices include: call frequency, number of called parties, call duration, ringing duration, number of active releases, number of passive releases, called-party dispersion, correlation coefficient between the called numbers of the same caller, number of complaints received, and number of times marked.
In the above solution, the determination part 303 is specifically configured to:
Perform a smooth transformation on the second feature index value set according to formula 7, where formula 7 is:
$x'_{ij} = \log(x_{ij} + 1)$ (7),
where $x_{ij}$ denotes the j-th feature index value in the second feature index value set of the i-th identified suspected violating telephone number, and $x'_{ij}$ denotes the j-th feature index value in the second feature index value set of the i-th identified suspected violating telephone number after the smooth transformation.
In the above solution, the determination part 303 is specifically configured to:
Normalize the smoothed second feature index value set according to formula 8, where formula 8 is:
$x''_{ij} = \frac{x'_{ij} - \min(x'_j)}{\max(x'_j) - \min(x'_j)}$ (8),
where $x'_{ij}$ denotes the j-th feature index value in the second feature index value set of the i-th identified suspected violating telephone number after the smooth transformation, $\min(x'_j)$ denotes the minimum value of the j-th second feature index over the identified suspected violating telephone numbers, $\max(x'_j)$ denotes the maximum value of the j-th second feature index over the identified suspected violating telephone numbers, and $x''_{ij}$ denotes the j-th feature index value of the i-th identified suspected violating telephone number after smooth transformation and normalization;
Calculate the probability of occurrence of each of the second feature indices according to formula 9, where formula 9 is:
$p_{ij} = \frac{x''_{ij}}{\sum_{i=1}^{m} x''_{ij}}$ (9),
where $m$ denotes the total number of identified suspected violating telephone numbers;
Calculate the weight of each of the second feature indices according to formula 10 and formula 11, where formula 10 and formula 11 are:
$e_j = -\frac{1}{\ln m} \sum_{i=1}^{m} p_{ij} \ln p_{ij}$ (10),
$w_j = \frac{1 - e_j}{\sum_{k=1}^{n} (1 - e_k)}$ (11),
where $w_j$ denotes the weight of the j-th second feature index, $n$ denotes the total number of second feature indices, and $e_j$ denotes the entropy of the j-th second feature index;
Calculate the danger level score of the identified suspected violating telephone numbers according to formula 12, where formula 12 is:
$F_i = \sum_{j=1}^{n} w_j x''_{ij}$ (12),
where $w_j$ denotes the weight of the j-th second feature index, $x''_{ij}$ denotes the j-th feature index value of the i-th identified suspected violating telephone number after smooth transformation and normalization, and $F_i$ denotes the danger level score of the i-th identified suspected violating telephone number.
In the above solution, the determination part 303 is specifically configured to perform:
Step 1: determining the number of clusters $K$ based on the preset number of classes;
Step 2: determining the classification sample set $X = \{x_i\}$, $x_i \in R^n$, based on the danger level scores, and treating the classification sample set as one cluster;
Step 3: randomly selecting 2 cluster center points $\mu_1, \mu_2 \in R^n$ in the cluster;
Step 4: calculating the cluster to which each classification sample in the cluster belongs according to formula 13, where formula 13 is:
$C_l = \arg\min \|x_i - \mu_l\|^2,\ l = 1, 2$ (13),
where $x_i$ denotes a classification sample and $\mu_l$ denotes a cluster center point;
Step 5: updating the center $\mu_l$ of each cluster according to formula 14, where formula 14 is:
$\mu_l = \frac{1}{s} \sum_{t=1}^{s} x_t^{(l)}$ (14),
where $x_t^{(l)}$ denotes the t-th classification sample of the l-th cluster, and $s$ denotes the total number of classification samples in the l-th cluster;
Step 6: repeating Step 4 and Step 5 until the distortion function converges;
Step 7: calculating the sum of squared errors according to formula 15 and choosing the cluster with the largest sum of squared errors as the next cluster to be split, where formula 15 is:
$SSE_l = \sum_{i=1}^{z} \|x_i - u_l\|^2$ (15),
where $x_i$ denotes the i-th classification sample, $u_l$ denotes the center point of the l-th cluster, and $z$ denotes the number of classification samples in each cluster;
Step 8: repeating Step 3 to Step 7 until the number of clusters is $K$.
In the above solution, the determination part 303 is specifically configured to:
Determine the third feature index values of the classified identified suspected violating telephone numbers, where the third feature indices include: caller average, average dispersion, average number of caller releases, average ringing duration, and average call duration;
Draw, based on the third feature index values, the radar map of the classified identified suspected violating telephone numbers according to the radar map analysis method;
Determine the danger level of the classified identified suspected violating telephone numbers according to the radar map.
In the above solution, the danger levels include: high-risk number, medium-risk number, low-risk number, and safe number.
In the above solution, as shown in Fig. 6, the discrimination device further includes a formulation part 306, which is configured to:
Formulate a targeted interception and handling scheme according to local policy based on the danger level of the suspected violating telephone numbers.
It can be understood that, in this embodiment, a "part" may be part of a circuit, part of a processor, part of a program or software, and so on. It may of course also be a unit, and it may be modular or non-modular.
In addition, the component parts in this embodiment may be integrated in one processing unit, each unit may exist physically on its own, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional module. If the integrated unit is implemented in the form of a software functional module and is not sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method described in this embodiment. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
Therefore, this embodiment provides a computer storage medium that stores a discrimination program, and the discrimination program, when executed by at least one processor, implements the steps of the method described in Embodiment One above.
Based on the above discrimination device 30 and computer storage medium, refer to Fig. 7, which shows a specific hardware structure of the discrimination device 30 provided by an embodiment of the present invention. The structure may include a network interface 701, a memory 702, and a processor 703, with the components coupled together by a bus system 704. It can be understood that the bus system 704 is used to realize connection and communication between these components. In addition to a data bus, the bus system 704 includes a power bus, a control bus, and a status signal bus. For clarity of description, however, the various buses are all designated as the bus system 704 in Fig. 7. The network interface 701 is configured to receive and send signals in the course of exchanging information with other external network elements;
The memory 702 is configured to store a computer program that can run on the processor 703;
The processor 703 is configured to execute the following when running the computer program:
Inputting telephone numbers to be detected;
Identifying suspected violating telephone numbers from the telephone numbers to be detected according to a preset suspected violating telephone number identification model;
Determining the danger level of the identified suspected violating telephone numbers according to a preset telephone number danger level determination model.
It can be understood that the memory 702 in the embodiment of the present invention may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories. The non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDRSDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (Synchlink DRAM, SLDRAM), and direct Rambus random access memory (Direct Rambus RAM, DRRAM). The memory 702 of the systems and methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.
The processor 703 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 703 or by instructions in the form of software. The processor 703 may be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps and logical block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in the embodiments of the present invention may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory or a register. The storage medium is located in the memory 702; the processor 703 reads the information in the memory 702 and completes the steps of the above method in combination with its hardware.
It will be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode or a combination thereof. For a hardware implementation, the processing unit may be implemented in one or more application-specific integrated circuits (Application Specific Integrated Circuits, ASIC), digital signal processors (Digital Signal Processing, DSP), digital signal processing devices (DSP Device, DSPD), programmable logic devices (Programmable Logic Device, PLD), field programmable gate arrays (Field-Programmable Gate Array, FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described herein, or combinations thereof.
For a software implementation, the techniques described herein may be implemented by modules (such as procedures and functions) that perform the functions described herein. The software code may be stored in a memory and executed by a processor. The memory may be implemented inside or outside the processor.
Specifically, the processor 703 in the discriminating device 30 is further configured to execute, when running the computer program, the method steps described in the foregoing Embodiment One, which are not repeated here.
Those skilled in the art will appreciate that embodiments of the present invention may be provided as a method, a system or a computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
The foregoing is only a preferred embodiment of the present invention, is not intended to limit the scope of the present invention.

Claims (17)

1. A method for discriminating the danger level of a telephone number, characterized in that the method comprises:
inputting telephone numbers to be detected;
identifying doubtful violation telephone numbers from the telephone numbers to be detected according to a preset doubtful violation telephone number identification model;
determining the danger level of each identified doubtful violation telephone number according to a preset telephone number danger level decision model.
2. The method according to claim 1, characterized in that, before identifying doubtful violation telephone numbers from the telephone numbers to be detected according to the preset doubtful violation telephone number identification model, the method further comprises:
constructing the doubtful violation telephone number identification model.
3. The method according to claim 2, characterized in that constructing the doubtful violation telephone number identification model comprises:
obtaining a training sample set by collecting communication signaling data from historical call events;
determining a first feature index value set of the training sample set;
constructing a mathematical model based on the first feature index value set;
comparing the accuracy rate and the recall rate of the mathematical model with their preset thresholds;
if either the accuracy rate or the recall rate does not reach its preset threshold, optimizing the mathematical model;
if both the accuracy rate and the recall rate reach their preset thresholds, determining the mathematical model as the doubtful violation telephone number identification model.
4. The method according to claim 3, characterized in that constructing the mathematical model based on the first feature index value set comprises:
determining the first feature index value set as the input parameters of the mathematical model and the violation flag of each telephone number as the output parameter of the mathematical model, the violation flag identifying whether the telephone number is a doubtful violation telephone number;
normalizing the input parameters;
determining the mathematical model according to a support vector machine algorithm based on the normalized input parameters and the output parameters.
5. The method according to claim 4, characterized in that determining the mathematical model according to the support vector machine algorithm based on the normalized input parameters and the output parameters comprises:
establishing a Lagrangian function according to formula 1:
where $a_i \geq 0$ denotes a Lagrange multiplier, $x_i$ denotes the i-th normalized input parameter, and $y_i$ denotes the i-th output parameter;
obtaining formula 2 and formula 3 by setting the partial derivatives of formula 1 with respect to $\omega$ and $b$, respectively, to 0:
substituting formula 2 and formula 3 into formula 1 to obtain formula 4:
where $a_i \geq 0$,
substituting the normalized input parameters into formula 4 and formula 5, and determining the vector $x_p$ that satisfies formula 5 when $a_i \geq 0$, formula 5 being:
$a_i \cdot \{[(\omega \cdot x_i) + b]\, y_i - 1\} = 0$ (5);
substituting the support vector $x_p$ into formula 5 to obtain the value of the threshold $b$;
substituting the support vector $x_p$, the value of the threshold $b$ and formula 2 into formula 6 to obtain the support vector classification prediction function, formula 6 being:
$G(x) = (\omega \cdot x) + b$ (6).
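The images for formulas 1 to 4 of claim 5 are not reproduced in this text. As a reading aid only, the standard hard-margin support vector machine derivation that is consistent with formula 5 and formula 6 would look like the sketch below. It assumes labels $y_i \in \{+1, -1\}$; the name $W(a)$ for the dual objective is an assumption, and the patent's own formulas may differ in notation.

```latex
% Assumed reconstruction, not taken from the original formula images.
\begin{align}
L(\omega, b, a) &= \tfrac{1}{2}\lVert \omega \rVert^{2}
  - \sum_{i} a_i \left\{ \left[ (\omega \cdot x_i) + b \right] y_i - 1 \right\} \tag{1} \\
\frac{\partial L}{\partial \omega} = 0 \;&\Rightarrow\; \omega = \sum_{i} a_i y_i x_i \tag{2} \\
\frac{\partial L}{\partial b} = 0 \;&\Rightarrow\; \sum_{i} a_i y_i = 0 \tag{3} \\
W(a) &= \sum_{i} a_i - \tfrac{1}{2} \sum_{i} \sum_{j} a_i a_j y_i y_j \, (x_i \cdot x_j) \tag{4}
\end{align}
```

Under this reading, a support vector $x_p$ with $a_p > 0$ makes the braces in formula 5 vanish, which gives $b = y_p - (\omega \cdot x_p)$.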
6. The method according to claim 3, characterized in that, before comparing the accuracy rate and the recall rate of the mathematical model with the preset thresholds, the method further comprises:
obtaining a verification sample set;
identifying test doubtful violation telephone numbers from the verification sample set based on the mathematical model;
calculating the accuracy rate and the recall rate of the mathematical model based on the test doubtful violation telephone numbers and the verification sample set.
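The evaluation in claim 6 and the acceptance rule in claim 3 can be sketched as follows. The claims do not define the two metrics, so this sketch assumes that the accuracy rate means the share of flagged numbers that are truly violating (precision) and that the recall rate means the share of truly violating numbers that were flagged. The function names, thresholds and sample identifiers are hypothetical.

```python
def evaluate(predicted, actual):
    """Return (accuracy_rate, recall_rate) for two collections of numbers.

    predicted: numbers the model flagged in the verification sample set
    actual:    numbers labeled as violating in the verification sample set
    """
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    # Assumption: "accuracy rate" is read as precision (hits / flagged).
    accuracy_rate = hits / len(predicted) if predicted else 0.0
    recall_rate = hits / len(actual) if actual else 0.0
    return accuracy_rate, recall_rate


def model_accepted(accuracy_rate, recall_rate, min_accuracy, min_recall):
    """Acceptance rule of claim 3: both metrics must reach their thresholds."""
    return accuracy_rate >= min_accuracy and recall_rate >= min_recall


# Made-up identifiers standing in for telephone numbers.
flagged = {"A", "B", "C", "D"}
labeled = {"A", "B", "C", "E", "F"}
acc, rec = evaluate(flagged, labeled)
```

If `model_accepted` returns false, claim 3 calls for optimizing the model and evaluating it again.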
7. The method according to any one of claims 1 to 6, characterized in that the first feature index value set is a set of statistical values of first feature indexes, the first feature indexes comprising:
calling frequency, called count, call duration, ringing duration, number of active releases, number of passive releases, called dispersion, correlation coefficient between the called numbers of the same caller, maximum calling frequency within the same ten-thousand number segment, calling proportion, and standard deviation of call time intervals.
8. The method according to claim 1, characterized in that determining the danger level of the identified doubtful violation telephone number according to the preset telephone number danger level decision model comprises:
obtaining a second feature index value set of the identified doubtful violation telephone number;
after applying a smoothing transformation to the second feature index value set, calculating a danger level score of the identified doubtful violation telephone number according to an entropy method based on the smoothed second feature index value set;
classifying the identified doubtful violation telephone numbers according to the K-Means clustering algorithm based on the danger level scores;
determining the danger level of the classified identified doubtful violation telephone numbers according to a radar chart analysis method.
9. The method according to claim 8, characterized in that the second feature index value set is a set of statistical values of second feature indexes, the second feature indexes comprising: calling frequency, called count, call duration, ringing duration, number of active releases, number of passive releases, called dispersion, correlation coefficient between the called numbers of the same caller, number of complaints received, and number of times marked.
10. The method according to claim 9, characterized in that applying the smoothing transformation to the second feature index value set comprises:
applying the smoothing transformation to the second feature index value set according to formula 7, formula 7 being:
$x'_{ij} = \log(x_{ij} + 1)$ (7),
where $x_{ij}$ denotes the j-th feature index value in the second feature index value set of the i-th identified doubtful violation telephone number, and $x'_{ij}$ denotes the j-th feature index value in the second feature index value set of the i-th identified doubtful violation telephone number after the smoothing transformation.
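Formula 7 of claim 10 can be illustrated with a few lines of Python. The claim does not state the base of the logarithm, so the natural logarithm used here is an assumption; the function name `smooth` is hypothetical.

```python
import math


def smooth(values):
    """Apply x' = log(x + 1) to each index value (formula 7).

    Assumption: natural logarithm; the claim leaves the base open.
    """
    return [math.log1p(v) for v in values]


# Call counts of 0, 9 and 999 end up far closer together than the raw values.
smoothed = smooth([0, 9, 999])
```

Adding 1 before taking the logarithm keeps a raw value of 0 at 0 and compresses the long tail of very active numbers.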
11. The method according to claim 9, characterized in that calculating the danger level score of the identified doubtful violation telephone number according to the entropy method based on the smoothed second feature index value set comprises:
normalizing the smoothed second feature index value set according to formula 8, formula 8 being:
where $x'_{ij}$ denotes the j-th feature index value in the second feature index value set of the i-th identified doubtful violation telephone number after the smoothing transformation, $\min(x'_j)$ denotes the minimum value of the j-th index of the second feature index set over the identified doubtful violation telephone numbers, $\max(x'_j)$ denotes the maximum value of the j-th index of the second feature index set over the identified doubtful violation telephone numbers, and $x''_{ij}$ denotes the j-th feature index value in the second feature index value set of the i-th identified doubtful violation telephone number after the smoothing transformation and normalization;
calculating the probability of occurrence of each index among the second feature indexes according to formula 9, formula 9 being:
where $m$ denotes the total number of identified doubtful violation telephone numbers;
calculating the weight of each index among the second feature indexes according to formula 10 and formula 11, formula 10 and formula 11 being:
where $w_j$ denotes the weight of the j-th index among the second feature indexes, $n$ denotes the total number of second feature indexes, and $e_j$ denotes the entropy of the j-th index among the second feature indexes;
calculating the danger level score of the identified doubtful violation telephone number according to formula 12, formula 12 being:
where $w_j$ denotes the weight of the j-th index among the second feature indexes, $x''_{ij}$ denotes the j-th feature index value in the second feature index value set of the i-th identified doubtful violation telephone number after the smoothing transformation and normalization, and $F_i$ denotes the danger level score of the i-th identified doubtful violation telephone number.
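The images for formulas 8 to 12 of claim 11 are not reproduced in this text. The sketch below therefore assumes the usual entropy weight method: min-max normalization $x''_{ij} = (x'_{ij} - \min(x'_j)) / (\max(x'_j) - \min(x'_j))$, probabilities $p_{ij} = x''_{ij} / \sum_i x''_{ij}$, entropy $e_j = -\frac{1}{\ln m} \sum_i p_{ij} \ln p_{ij}$, weights $w_j = (1 - e_j) / \sum_j (1 - e_j)$ and score $F_i = \sum_j w_j x''_{ij}$. These forms, the function name and the handling of constant columns are assumptions rather than the patent's confirmed formulas.

```python
import math


def entropy_scores(rows):
    """rows[i][j]: smoothed value of index j for number i.

    Returns (weights, scores) under the assumed entropy weight formulas.
    """
    m, n = len(rows), len(rows[0])
    # Assumed formula 8: min-max normalization per index.
    norm_cols = []
    for col in zip(*rows):
        lo, span = min(col), max(col) - min(col)
        norm_cols.append([(v - lo) / span if span else 0.0 for v in col])
    # Assumed formulas 9 and 10: per-index probabilities, then entropy.
    entropies = []
    for col in norm_cols:
        total = sum(col)
        if total == 0 or m < 2:
            entropies.append(1.0)  # assumption: a constant index carries no information
            continue
        probs = [v / total for v in col]
        h = -sum(p * math.log(p) for p in probs if p > 0)
        entropies.append(h / math.log(m))
    # Assumed formula 11: lower entropy means higher weight.
    spread = [1.0 - e for e in entropies]
    denom = sum(spread)
    weights = [d / denom if denom else 1.0 / n for d in spread]
    # Assumed formula 12: weighted sum of normalized values per number.
    scores = [sum(w * v for w, v in zip(weights, row)) for row in zip(*norm_cols)]
    return weights, scores


# Three numbers, two indexes, values already smoothed (made-up data).
weights, scores = entropy_scores([[0.0, 0.0], [1.0, 0.0], [2.0, 4.0]])
```

In this made-up example the second index separates the numbers more sharply than the first, so it receives the larger weight.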
12. The method according to claim 8, characterized in that classifying the identified doubtful violation telephone numbers according to the K-Means clustering algorithm based on the danger level scores comprises:
Step 1: determining the number of clusters K based on a preset number of classes;
Step 2: determining a classification sample set $X = \{x_i\}$, $x_i \in R^n$, based on the danger level scores, and treating the classification sample set as one cluster;
Step 3: randomly selecting 2 cluster center points $\mu_1, \mu_2 \in R^n$ in the cluster;
Step 4: calculating the cluster to which each classification sample in the cluster belongs according to formula 13, formula 13 being:
$C_l = \arg\min \lVert x_i - \mu_l \rVert^2,\ l = 1, 2$ (13),
where $x_i$ denotes a classification sample and $\mu_l$ denotes a cluster center point;
Step 5: updating the center $\mu_l$ of each cluster according to formula 14, formula 14 being:
where the sample term denotes the t-th classification sample of the l-th cluster, and $s$ denotes the total number of classification samples in the l-th cluster;
Step 6: repeating step 4 and step 5 until the distortion function converges;
Step 7: calculating the error sum of squares according to formula 15 and choosing the cluster with the largest error sum of squares as the next cluster to be split, formula 15 being:
where $x_i$ denotes the i-th classification sample, $\mu_l$ is the center point of the l-th cluster, and $z$ denotes the number of classification samples in each cluster;
Step 8: repeating step 3 to step 7 until the number of clusters is K.
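The steps of claim 12 describe a bisecting form of K-Means: start with one cluster, split a cluster in two, and keep splitting the cluster with the largest error sum of squares until K clusters exist. The sketch below works on scalar danger level scores for brevity, whereas the claim allows samples in $R^n$. Formula 14 is assumed to be the cluster mean and formula 15 the sum of squared distances to that mean, since their images are not reproduced; all function names and the sample data are hypothetical.

```python
import random


def two_means(points, rng, max_iter=100):
    """Steps 3 to 6: split one cluster of scalar scores into two."""
    # Step 3: two random, distinct initial centers taken from the cluster.
    c1, c2 = rng.sample(sorted(set(points)), 2)
    for _ in range(max_iter):
        # Step 4 (formula 13): assign each sample to the nearest center.
        left = [p for p in points if (p - c1) ** 2 <= (p - c2) ** 2]
        right = [p for p in points if (p - c1) ** 2 > (p - c2) ** 2]
        # Step 5 (assumed formula 14): move each center to its cluster mean.
        n1, n2 = sum(left) / len(left), sum(right) / len(right)
        if (n1, n2) == (c1, c2):  # Step 6: centers no longer move
            break
        c1, c2 = n1, n2
    return left, right


def sse(cluster):
    """Assumed formula 15: squared distances to the cluster mean, summed."""
    mu = sum(cluster) / len(cluster)
    return sum((p - mu) ** 2 for p in cluster)


def bisecting_kmeans(scores, k, seed=0):
    rng = random.Random(seed)
    clusters = [list(scores)]  # Step 2: all samples start in one cluster
    while len(clusters) < k:   # Step 8: stop once K clusters exist
        # Step 7: the cluster with the largest error sum of squares is split next.
        target = max(clusters, key=sse)
        if len(set(target)) < 2:
            break              # no cluster can be split any further
        clusters.remove(target)
        clusters.extend(two_means(target, rng))
    return sorted(clusters, key=min)


# Made-up danger level scores forming three visible groups.
groups = bisecting_kmeans([0.0, 0.1, 0.2, 5.0, 5.1, 10.0], k=3)
```

Splitting only the cluster with the largest error keeps each round to a simple two-center problem, which is presumably why the claim fixes the number of random centers at 2 in step 3.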
13. The method according to claim 12, characterized in that determining the danger level of the classified identified doubtful violation telephone numbers according to the radar chart analysis method comprises:
determining third feature index values of the classified identified doubtful violation telephone numbers, the third feature indexes comprising: average calling count, average dispersion, average number of caller releases, average ringing duration, and average call duration;
drawing, based on the third feature index values, a radar chart of the telephone number corresponding to each cluster center point according to the radar chart analysis method;
determining the danger level of the classified identified doubtful violation telephone numbers according to the radar chart.
14. The method according to claim 8, characterized in that the danger level comprises: high-risk number, medium-risk number, low-risk number, and safe number.
15. The method according to claim 1, characterized in that the method further comprises:
formulating a targeted interception and governance scheme according to local policy, based on the danger level of the doubtful violation telephone number.
16. A discriminating device, comprising: a network interface, a memory and a processor;
wherein
the network interface is configured to receive and send signals while exchanging information with other external network elements;
the memory is configured to store a computer program that can run on the processor;
the processor is configured to execute the steps of the method according to any one of claims 1 to 15 when running the computer program.
17. A computer storage medium, wherein the computer storage medium stores a discriminating program, and the discriminating program, when executed by at least one processor, implements the steps of the method according to any one of claims 1 to 15.
CN201810404735.9A 2018-04-28 2018-04-28 A kind of method of discrimination, equipment and the computer storage medium of telephone number danger level Pending CN110414543A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810404735.9A CN110414543A (en) 2018-04-28 2018-04-28 A kind of method of discrimination, equipment and the computer storage medium of telephone number danger level

Publications (1)

Publication Number Publication Date
CN110414543A true CN110414543A (en) 2019-11-05

Family

ID=68357447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810404735.9A Pending CN110414543A (en) 2018-04-28 2018-04-28 A kind of method of discrimination, equipment and the computer storage medium of telephone number danger level

Country Status (1)

Country Link
CN (1) CN110414543A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112434214A (en) * 2020-11-03 2021-03-02 中国南方电网有限责任公司 Redis-based operation event pushing method
CN113361807A (en) * 2021-06-30 2021-09-07 中国电信股份有限公司 Number recognition model optimization method and device and electronic equipment
CN114067834A (en) * 2020-07-30 2022-02-18 ***通信集团有限公司 Bad preamble recognition method and device, storage medium and computer equipment

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004077369A1 (en) * 2003-02-26 2004-09-10 Steven Anderson Real time mobile telephony system and process for remote payment, credit granting transactions
CN101902523A (en) * 2010-07-09 2010-12-01 中兴通讯股份有限公司 Mobile terminal and filtering method of short messages thereof
CN103049642A (en) * 2012-11-19 2013-04-17 浙江工商大学 Urban viaduct traffic flow detection method based on K-Means algorithm improved through entropy evaluation method and dynamic programming
CN104469025A (en) * 2014-11-26 2015-03-25 杭州东信北邮信息技术有限公司 Clustering-algorithm-based method and system for intercepting fraud phone in real time
CN105763713A (en) * 2016-01-19 2016-07-13 浙江鹏信信息科技股份有限公司 Harassing call intercepting method based on combination of Internet technology and communication technology
CN105809174A (en) * 2016-03-29 2016-07-27 北京小米移动软件有限公司 Method and device for identifying image
CN106255116A (en) * 2016-08-24 2016-12-21 王瀚辰 A kind of recognition methods harassing number
CN106791220A (en) * 2016-11-04 2017-05-31 国家计算机网络与信息安全管理中心 Prevent the method and system of telephone fraud
CN106954218A (en) * 2017-03-15 2017-07-14 中国联合网络通信集团有限公司 The number sorted methods, devices and systems of one kind harassing and wrecking
CN107273531A (en) * 2017-06-28 2017-10-20 百度在线网络技术(北京)有限公司 Telephone number classifying identification method, device, equipment and storage medium
CN107331385A (en) * 2017-07-07 2017-11-07 重庆邮电大学 A kind of identification of harassing call and hold-up interception method

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
周志华: ""机器学习"", 《清华大学出版社》 *
张大斌: "《数据挖掘与商务智能实验教程》", 31 January 2015 *
张立达等: "《上市公司大股东股权减持问题研究 基于盈余管理的视角》", 30 September 2017 *
张良均等: "《Python数据分析与挖掘实战》", 31 January 2016 *
张良均等著: ""Python数据分析与挖掘实战"", 《机械工业出版社》 *
辛阳等: ""大数据技术原理与实践"", 《北京邮电大学出版社》 *

Similar Documents

Publication Publication Date Title
CN110619535B (en) Data processing method and device
CN111932269B (en) Equipment information processing method and device
CN110414543A (en) A kind of method of discrimination, equipment and the computer storage medium of telephone number danger level
CN111028016A (en) Sales data prediction method and device and related equipment
CN110348471B (en) Abnormal object identification method, device, medium and electronic equipment
CN110930218A (en) Method and device for identifying fraudulent customer and electronic equipment
CN109242658B (en) Suspicious transaction report generation method, suspicious transaction report generation system, suspicious transaction report generation computer device and suspicious transaction report storage medium
CN111709603A (en) Service request processing method, device and system based on wind control
CN114169439A (en) Abnormal communication number identification method and device, electronic equipment and readable medium
CN114372884A (en) Risk identification method and risk identification device for transaction data
CN105991574A (en) Risk behavior monitoring method and apparatus thereof
CN114139931A (en) Enterprise data evaluation method and device, computer equipment and storage medium
CN112085588B (en) Method and device for determining safety of rule model and data processing method
KR102332997B1 (en) Server, method and program that determines the risk of financial fraud
CN114363082B (en) Network attack detection method, device, equipment and computer readable storage medium
CN113011503B (en) Data evidence obtaining method of electronic equipment, storage medium and terminal
CN114970495A (en) Name disambiguation method and device, electronic equipment and storage medium
CN114519520A (en) Model evaluation method, model evaluation device and storage medium
CN114742655A (en) Anti-money laundering behavior recognition system based on machine learning
CN110458707B (en) Behavior evaluation method and device based on classification model and terminal equipment
CN113486933A (en) Model training method, user identity information prediction method and device
CN112529303A (en) Risk prediction method, device, equipment and storage medium based on fuzzy decision
CN111160011A (en) Organization unit standardization method, device, equipment and storage medium
CN115378856B (en) Communication detection method, device and storage medium
CN110598799A (en) Target detection result evaluation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191105