CN101389085A - Rubbish short message recognition system and method based on sending behavior - Google Patents

Info

Publication number
CN101389085A
Authority
CN
China
Prior art keywords
short message
content
calling number
hashed value
junk
Legal status
Granted
Application number
CNA2008102242531A
Other languages
Chinese (zh)
Other versions
CN101389085B (en)
Inventor
张尼
张智江
张范
顾旻霞
贾川
Current Assignee
China United Communication Co Ltd
Original Assignee
China United Communication Co Ltd
Application filed by China United Communication Co Ltd
Priority to CN2008102242531A
Publication of CN101389085A
Application granted
Publication of CN101389085B
Legal status: active

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention relates to a system and method for identifying spam short messages based on sending behavior. The method comprises: deciding whether a message belongs to a suspicious-message class according to its hash value, its length, and the number of messages sharing that hash value; if it does, recording the calling number that sent it; otherwise, handling the message according to whether its calling number is appearing for the first time; when the number of messages in a suspicious class reaches a preset value, retrieving all calling numbers associated with that class; and, if the number of distinct hash values associated with those calling numbers minus the number of calling numbers is not greater than a preset value, judging the suspicious messages to be spam. The invention allows spam received by the short message service center to be identified efficiently and in real time, and to be intercepted in real time.

Description

Spam short message recognition system and method based on sending behavior
Technical field
The present invention relates to the field of spam short messages, and in particular to a spam short message recognition system and method based on sending behavior.
Background art
In recent years spam short messages have become increasingly rampant, and almost every mobile user has been affected. According to a survey published by the Internet Society of China, Chinese mobile users receive on average 8.29 spam messages per week.
Spam can be divided into two modes according to how it is sent. The first uses a mobile operator's SMS gateway: when the user receives such a message, the displayed sender is the SMS access number rather than an ordinary user's mobile number. Spam sent this way is fast and simple to send but requires the operator's permission, and it consists mostly of commercial advertisements and service-type messages.
In the second mode, SIM cards are inserted into a mass-sending device, which is connected to a computer through a serial cable, and bulk-messaging software on the computer sends the messages (hereinafter "sending by mass-sending device"). Senders either buy large numbers of SIM cards that require no registration (such as M-Zone or Shenzhouxing) and run up overdrafts on them, or exploit loopholes in discounted tariff packages to send messages at very high volume. Such a device can drive 16 to 20 or more ports at once and send tens of thousands of messages in a short time, so the operator often cannot bill the charges before the account has been maliciously overdrawn. When the user receives such a message, the displayed sender is an ordinary mobile number. Spam sent this way involves many numbers, is fast, and requires no operator permission. In addition, bulk sending generates heavy traffic that consumes considerable radio resources, so to maintain throughput the sender usually transmits in parallel from several sending points located under different base stations.
As public and media attention to spam has grown, mobile operators have stepped up action against spam sent through SMS gateways and introduced some simple, effective measures, such as stricter content supervision of SMS sending ports, requiring the company's real signature in message content, raising the tariff for sending messages through ports, and closing ports that attract many complaints.
Since these measures were implemented, spam sent through SMS gateways has decreased noticeably. However, there is still no effective countermeasure against spam sent by offenders using mass-sending devices.
Against spam sent with mass-sending devices, operators currently rely mainly on four mechanisms: content recognition, blacklists and whitelists, traffic statistics, and call detail record (CDR) analysis.
1. Content recognition mechanism
A commonly used content recognition technique is the rule-based method: a set of rules is defined (for example, phrases such as "won a prize" or "make a fortune"), and any message matching one or more of the rules is treated as spam. Rule-based recognition is simple in principle and implementation, and cheap to deploy. Its drawbacks are: 1) the rules are specified manually, so people must continually discover, summarize, and update them, which incurs high maintenance costs; 2) good rules are hard to choose, and keyword matching alone can hardly establish whether a message's content is legitimate, so misjudgments are common; 3) spammers can easily evade the rule list by writing words in pinyin or replacing them with homophones.
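The rule-based approach, and its third weakness, can be illustrated with a minimal Python sketch. The keyword list `RULES` and the function `matches_rules` are hypothetical; a single pinyin substitution is enough for a message to slip past exact keyword matching.

```python
# Hypothetical rule list: a message matching any rule keyword is treated as spam.
RULES = ["中奖", "发财"]  # "won a prize", "make a fortune"


def matches_rules(text: str, rules: list[str] = RULES) -> bool:
    """Return True if the message contains at least one rule keyword."""
    return any(rule in text for rule in rules)


print(matches_rules("恭喜您中奖了"))     # True: contains the keyword 中奖
print(matches_rules("恭喜您zhong奖了"))  # False: a pinyin substitution evades the rule
```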
Hashing techniques, Bayesian algorithms, support vector machines, and similar methods have also been used. These methods learn word frequencies and patterns, which are then associated with spam and normal messages to reach a judgment. Compared with keyword matching they are more complex and more intelligent content recognition techniques, but their drawbacks are also clear: they are slow, users must continually update the rule base or training set, and as spammers' techniques improve these methods easily become ineffective.
2. Blacklist and whitelist mechanism
Blacklist/whitelist techniques identify spam by the sender's mobile number. Messages sent from numbers on the whitelist are passed directly without any processing, while any message sent from a number on the blacklist is intercepted and not delivered downstream. The method is simple, has very little impact on the existing system, and requires essentially no modification of the existing short message service center. However, both lists must be updated in real time, and the recognition capability is limited.
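A minimal Python sketch of the blacklist/whitelist decision follows. The `Verdict` values, the function name, and the sample numbers are hypothetical, and checking the whitelist before the blacklist is an assumption the text does not settle.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "pass"            # whitelisted: deliver without any processing
    BLOCK = "block"          # blacklisted: intercept, do not deliver downstream
    UNDECIDED = "undecided"  # on neither list: other mechanisms must decide


def check_lists(sender: str, whitelist: set[str], blacklist: set[str]) -> Verdict:
    # Assumption: the whitelist wins if a number appears on both lists.
    if sender in whitelist:
        return Verdict.PASS
    if sender in blacklist:
        return Verdict.BLOCK
    return Verdict.UNDECIDED


white = {"13800000000"}  # hypothetical numbers
black = {"13900000001"}
print(check_lists("13900000001", white, black))  # Verdict.BLOCK
```

The "undecided" outcome reflects the limited recognition capability noted above: a fresh number that is on neither list passes through this mechanism untouched.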
3. Short message traffic statistics mechanism
This mechanism counts the messages sent or received per unit time by individual phones or service providers (SPs) and raises an alarm as soon as a count exceeds a threshold; any individual or SP found sending large amounts of spam is immediately placed under supervision. The first method counts the messages sent per unit time: a counter is kept for each user and incremented for every message sent, and when the count reaches the specified number the counter raises an alarm automatically. The second method measures the interval between two consecutive messages, that is, it monitors sending frequency: if the interval between two messages is too short, the user is sending too frequently and an alarm is raised automatically.
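Both counting methods can be sketched in a few lines of Python. The class `RateMonitor`, the use of fixed (non-sliding) counting periods, and the threshold values are assumptions for illustration only.

```python
from collections import defaultdict


class RateMonitor:
    """Sketch of both traffic-statistics methods for per-user alarms.

    window:         length of a counting period in seconds (method 1).
    max_per_window: alarm once a user's count in the current period reaches this value.
    min_interval:   alarm when two consecutive messages from a user are closer than this (method 2).
    """

    def __init__(self, window: float, max_per_window: int, min_interval: float):
        self.window = window
        self.max_per_window = max_per_window
        self.min_interval = min_interval
        self.counters: dict[tuple[str, int], int] = defaultdict(int)  # (user, period) -> count
        self.last_sent: dict[str, float] = {}                          # user -> previous timestamp

    def on_message(self, user: str, t: float) -> list[str]:
        alarms = []
        # Method 1: one counter per user and period, incremented for every message.
        key = (user, int(t // self.window))
        self.counters[key] += 1
        if self.counters[key] >= self.max_per_window:
            alarms.append("count")
        # Method 2: compare the gap to this user's previous message.
        prev = self.last_sent.get(user)
        if prev is not None and t - prev < self.min_interval:
            alarms.append("interval")
        self.last_sent[user] = t
        return alarms


m = RateMonitor(window=60, max_per_window=3, min_interval=1.0)
print(m.on_message("u1", 0.0))   # []
print(m.on_message("u1", 0.5))   # ['interval']
print(m.on_message("u1", 10.0))  # ['count']
```

The sketch never discards old counters; a real deployment would expire them. A mass-sending device spreading traffic over many SIM cards keeps each individual counter low, which is one reason this mechanism struggles against it.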
4. CDR analysis mechanism
This mechanism uses the original billing files on the accounting server as its statistics source. For each number it counts, within a certain period, the mobile-originated messages and the sending success rate; when these statistics cross certain thresholds, the number is considered suspicious and is submitted to operators, who decide whether to add it to the blacklist. The weakness of this mechanism is that it works offline: more than 15 minutes pass between sending a message and collecting its billing record, and offenders can exploit this delay, for example by massively duplicating SIM cards, to send tens of thousands of spam messages.
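An offline pass of this kind might look like the Python sketch below. The `CdrRecord` fields and `cdr_report` are hypothetical stand-ins for real billing files. Because the text does not say in which direction the success-rate threshold applies, the sketch flags numbers only by message count and reports the success rate for the operator to review.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CdrRecord:
    """Hypothetical minimal billing record for one mobile-originated message."""
    caller: str
    timestamp: float  # seconds
    success: bool     # whether the message was delivered successfully


def cdr_report(records: list[CdrRecord], start: float, end: float,
               min_count: int) -> dict[str, tuple[int, float]]:
    """Per-number message count and success rate in [start, end);
    numbers reaching min_count are returned for manual review."""
    sent: dict[str, int] = defaultdict(int)
    ok: dict[str, int] = defaultdict(int)
    for r in records:
        if start <= r.timestamp < end:
            sent[r.caller] += 1
            ok[r.caller] += int(r.success)
    return {c: (n, ok[c] / n) for c, n in sent.items() if n >= min_count}


records = [CdrRecord("a", 0, True), CdrRecord("a", 1, False),
           CdrRecord("b", 2, True), CdrRecord("a", 100, True)]
print(cdr_report(records, 0, 60, min_count=2))  # {'a': (2, 0.5)}
```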
The recognition techniques above have the following shortcomings: 1) content recognition incurs high maintenance costs, requires continual updating of the rule base or training set, cannot discover the features of new kinds of spam, has difficulty recognizing partially altered spam, and intrudes on user privacy; 2) blacklist/whitelist techniques have limited recognition capability; 3) traffic statistics and CDR analysis have poor real-time performance.
Summary of the invention
To solve the above technical problems, the invention provides a spam short message recognition system and method based on sending behavior. Its purpose is to determine whether the sender of a message is a mass-sending device or a normal user, meeting real-time and accuracy requirements without intruding on user privacy.
The invention provides a spam short message recognition method based on sending behavior, comprising:
Step 1: compute the hash value of each received message and, using this hash value, record the number of already-sent messages with identical content and the length of the message content;
Step 2: if the number of already-sent messages with identical content reaches a first threshold and the length of the message content is greater than a second threshold, record the calling number of the message and the correspondence between this calling number and the message's hash value; otherwise, if the calling number appears for the first time, do nothing, and if it has appeared before, record the correspondence between this calling number and the message's hash value;
Step 3: if the number of already-sent messages with identical content whose length is greater than the second threshold reaches a third threshold, obtain all calling numbers corresponding to these messages; if the number of distinct hash values corresponding to all these calling numbers minus the number of these calling numbers is not greater than a fourth threshold, judge the messages with this identical content to be spam.
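The step 3 criterion is easiest to see with numbers. The Python sketch below assumes a `directory` dict mapping each calling number to the set of hash values it has sent; `judge_spam` and the threshold values are hypothetical. Bulk-sending numbers all contribute the same one or few hash values, so the count of distinct hashes falls far below the count of numbers, whereas ordinary users who forward the same text also send many unrelated messages.

```python
def judge_spam(class_count: int, callers: set[str],
               directory: dict[str, set[str]], t3: int, t4: int) -> bool:
    """Step 3 decision for one suspicious class.

    class_count: messages counted so far in the class (compared with the third threshold t3).
    callers:     all calling numbers recorded for the class.
    directory:   calling number -> set of hash values that number has sent.
    """
    if class_count < t3:
        return False  # not enough messages yet; keep collecting
    distinct_hashes: set[str] = set()
    for number in callers:
        distinct_hashes |= directory.get(number, set())
    # Bulk senders produce few distinct contents per number, so this difference stays small.
    return len(distinct_hashes) - len(callers) <= t4


# Hypothetical data: 50 numbers that only ever sent the same content "h1".
bulk = {f"num{i}": {"h1"} for i in range(50)}
print(judge_spam(50, set(bulk), bulk, t3=20, t4=0))    # True: 1 - 50 = -49 <= 0
# 3 ordinary users who forwarded "h1" but also sent 4 unrelated messages each.
users = {f"user{i}": {"h1"} | {f"u{i}_{j}" for j in range(4)} for i in range(3)}
print(judge_spam(25, set(users), users, t3=20, t4=0))  # False: 13 - 3 = 10 > 0
```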
In step 1, a directory and a contents table are also set up;
The directory records the calling numbers of messages and, for each calling number, the set of hash values of the messages it has sent;
The contents table records the hash value of each message, the length of the message content, the number of messages with identical content, and the set of all calling numbers that sent messages with this content;
The hash value of a received message, the number of already-sent messages with identical content, and the length of the message content are recorded in the contents table.
In step 1, computing the hash value of a received message comprises:
If the length of the received message content is greater than the second threshold, determine the positions of the first and last Chinese characters in the message, and compute the hash value over the content from the first Chinese character through the last Chinese character; or
If the length of the received message content is less than or equal to the second threshold, compute the hash value directly over the whole message content.
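A minimal Python sketch of this hash computation follows. Several details are assumptions not fixed by the text: a "Chinese character" is any code point in the CJK Unified Ideographs block (U+4E00 to U+9FFF), length is measured in UTF-16 bytes (as for UCS-2-encoded Chinese SMS), SHA-256 stands in for the unspecified hash function, and a long message containing no Chinese character is hashed whole.

```python
import hashlib


def is_cjk(ch: str) -> bool:
    # Assumption: "Chinese character" = CJK Unified Ideographs block U+4E00..U+9FFF.
    return "\u4e00" <= ch <= "\u9fff"


def content_length(text: str) -> int:
    # Assumption: length in UTF-16 bytes, matching UCS-2-encoded Chinese SMS bodies.
    return len(text.encode("utf-16-be"))


def message_hash(text: str, k: int) -> str:
    """Hash the whole body if its length is <= K bytes; otherwise hash only the span
    from the first to the last Chinese character, dropping random head/tail padding."""
    body = text
    if content_length(text) > k:
        positions = [i for i, ch in enumerate(text) if is_cjk(ch)]
        if positions:  # assumption: fall back to the whole body if there is no CJK character
            body = text[positions[0]: positions[-1] + 1]
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


a = "A001本店大促销全场五折欢迎光临x9"  # hypothetical spam variants with different padding
b = "B002本店大促销全场五折欢迎光临y7"
print(message_hash(a, 10) == message_hash(b, 10))    # True: padding trimmed away
print(message_hash(a, 200) == message_hash(b, 200))  # False: short path hashes everything
```

Trimming lets variants that differ only in their head and tail padding fall into the same class, which is exactly what whole-content hashing fails to do.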
Step 2 comprises:
Step 41: judge whether the following condition is met: the number of already-sent messages with identical content reaches the first threshold and the length of the message content is greater than the second threshold; if so, go to step 42, otherwise go to step 43;
Step 42: write the calling number of the received message to the directory. If the calling number appears for the first time, save the calling number and the message's hash value in the directory, and record the calling number in the contents table. If the number already exists, compare the message's hash value with the hash values recorded for this calling number in the directory: if it is new, record the hash value in the directory and record the number in the contents table; if it is already present, do nothing;
Step 43: if the calling number of the received message appears for the first time, do nothing; if it has appeared before, compare the message's hash value with the hash values recorded for this calling number in the directory: if it is new, record it in the directory; if it is already present, do nothing.
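Steps 41 to 43 can be sketched in Python as below, assuming plain dicts for both tables: `directory` maps a calling number to its set of hash values, and each `contents` entry holds `count`, `length`, `category`, and `callers`. The function name `step2`, the dict layout, and the category strings are illustrative choices, and the contents entry is assumed to have been updated by step 1 already.

```python
def step2(number: str, h: str, contents: dict[str, dict],
          directory: dict[str, set[str]], t1: int, k: int) -> None:
    """Steps 41-43 for one received message whose hash value h is already in contents."""
    entry = contents[h]
    # Step 41: enough identical copies, and long enough to look like bulk content?
    if entry["count"] >= t1 and entry["length"] > k:
        if entry["category"] == "normal":
            entry["category"] = "suspicious"
        # Step 42: register the number (new or existing) and link it to this class.
        hashes = directory.setdefault(number, set())
        if h not in hashes:
            hashes.add(h)
            entry["callers"].add(number)
    elif number in directory:
        # Step 43, known number: also track its non-suspicious contents.
        directory[number].add(h)  # no change if h is already recorded
    # Step 43, first-time number with a non-suspicious message: do nothing.


contents = {"h1": {"count": 5, "length": 100, "category": "normal", "callers": set()},
            "h2": {"count": 1, "length": 20, "category": "normal", "callers": set()}}
directory: dict[str, set[str]] = {}
step2("n1", "h1", contents, directory, t1=5, k=80)  # suspicious class: n1 registered
step2("n1", "h2", contents, directory, t1=5, k=80)  # known number: h2 tracked too
step2("n2", "h2", contents, directory, t1=5, k=80)  # first-time number: ignored
print(sorted(directory["n1"]), "n2" in directory)   # ['h1', 'h2'] False
```

Tracking only numbers that have already sent something suspicious keeps the directory small, while still capturing everything else those numbers send, which step 3 needs.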
The contents table also records a message class attribute, which takes one of three values: suspicious, spam, or normal.
In step 2, if the number of already-sent messages with identical content reaches the first threshold and the length of the message content is greater than the second threshold, the class attribute of the messages with this content is marked as suspicious in the contents table.
In step 3, the class attribute corresponding to the hash value of the spam messages is also marked as spam.
The method further comprises step 4: sending the calling numbers of the spam messages to the short message service center, which uses them to filter spam.
The method further comprises step 5: periodically clearing the normal-class entries from the contents table.
The invention provides a spam short message recognition system based on sending behavior, comprising:
A sending-content processing module, configured to compute the hash value of each received message and, using this hash value, record the number of already-sent messages with identical content and the length of the message content; and, if the number of already-sent messages with identical content reaches the first threshold and the length of the message content is greater than the second threshold, to record the calling number of the message and the correspondence between this calling number and the message's hash value; otherwise, if the calling number appears for the first time, to do nothing, and if it has appeared before, to record the correspondence between this calling number and the message's hash value;
A sending-mode statistics module, configured to obtain, when the number of already-sent messages with identical content whose length is greater than the second threshold reaches the third threshold, all calling numbers corresponding to these messages; and, if the number of distinct hash values corresponding to all these calling numbers minus the number of these calling numbers is not greater than the fourth threshold, to judge the messages with this identical content to be spam.
The sending-content processing module is further configured to set up the directory and the contents table;
The directory records the calling numbers of messages and, for each calling number, the set of hash values of the messages it has sent;
The contents table records the hash value of each message, the length of the message content, the number of messages with identical content, and the set of all calling numbers that sent messages with this content;
The hash value of a received message, the number of already-sent messages with identical content, and the length of the message content are recorded in the contents table.
Computing the hash value of a received message comprises:
If the length of the received message content is greater than the second threshold, determine the positions of the first and last Chinese characters in the message, and compute the hash value over the content from the first Chinese character through the last Chinese character; or
If the length of the received message content is less than or equal to the second threshold, compute the hash value directly over the whole message content.
The sending-content processing module is further configured to judge whether the following condition is met: the number of already-sent messages with identical content reaches the first threshold and the length of the message content is greater than the second threshold;
If so, it writes the calling number of the received message to the directory: if the calling number appears for the first time, it saves the calling number and the message's hash value in the directory and records the calling number in the contents table; if the calling number already exists, it compares the message's hash value with the hash values recorded for this calling number in the directory: if the hash value is new, it records it in the directory and records the calling number in the contents table; otherwise it does nothing;
Otherwise: if the calling number of the received message appears for the first time, it does nothing; if it has appeared before, it compares the message's hash value with the hash values recorded for this calling number in the directory: if the hash value is new, it records it in the directory; otherwise it does nothing.
The contents table also records a message class attribute, which takes one of three values: suspicious, spam, or normal.
The sending-content processing module is further configured to mark the class attribute of the messages with identical content as suspicious in the contents table when the number of already-sent messages with this content reaches the first threshold and the content length is greater than the second threshold.
The sending-mode statistics module is further configured to mark the class attribute corresponding to the hash value of the spam messages as spam in the contents table.
The system further comprises a calling number sending module, configured to send the calling numbers of the spam messages to the short message service center, which uses them to filter spam.
The system further comprises a management module, configured to periodically clear the normal-class entries from the contents table.
The invention also provides a mobile communication system comprising a short message service center and the spam short message recognition system based on sending behavior;
The recognition system is either deployed in bypass mode alongside the short message service center or integrated into it.
The invention enables spam received by the short message service center to be identified efficiently and in real time, and to be intercepted in real time.
Description of drawings
Fig. 1 is a structural diagram of the spam short message recognition system provided by the invention;
Fig. 2 shows the data structures provided by the invention;
Fig. 3 is the spam recognition flowchart provided by the invention;
Fig. 4 shows a network structure provided by the invention.
Embodiment
Short message communication between normal users is random and independent. Specifically:
1) The messages generated by a given calling number are random: in each communication, the number may generate messages of different lengths and content.
2) The content and length of messages generated by different calling numbers generally differ.
To achieve high throughput, messages sent with a mass-sending device usually have fixed content and length, and each recipient is sent the message only once. Specifically:
A spammer often uses multiple calling numbers, and these numbers produce only messages of fixed length or content.
In the invention, sending-behavior analysis is further divided into two parts: sending-content analysis and sending-mode analysis.
1) The spam a given spammer sends over a period of time follows a pattern and forms highly clustered structures. After clustering, statistical analysis no longer operates on a massive volume of individual messages but on a much smaller number of message classes; a class in which the number of identical messages exceeds the first threshold and whose content length is greater than the second threshold is marked as a suspicious class.
2) The sending modes of the calling numbers of messages subsequently assigned to a suspicious class are then analyzed statistically: if the total number of messages assigned to the suspicious class exceeds the third threshold, and the number of hash values produced by all calling numbers of the class minus the number of those calling numbers is not greater than the fourth threshold, the class is a spam class.
As shown in Fig. 1, the spam short message recognition system of the invention consists mainly of four parts: a sending-content processing module 101, a sending-mode statistics module 102, a calling number sending module 103, and a management module 104. The sending-content processing module 101 uses an efficient hashing algorithm to convert each original message into a hash value that is easy to compute with and store, writes it to the contents table, and classifies the message stream by comparing hash values; a class whose number of identical messages exceeds the first threshold and whose content length is greater than the second threshold is a suspicious class. The sending-mode statistics module 102 collects statistics on the sending behavior of the calling numbers of messages subsequently assigned to a suspicious class, in order to identify spam. The calling number sending module 103 extracts the calling numbers corresponding to spam classes and sends them to the short message service center, which uses them to filter spam. The calling number sending module 103 and the management module 104 are optional: the sending-content processing module 101 and the sending-mode statistics module 102 alone are sufficient to identify spam.
While processing messages, the invention must frequently retrieve and compare large numbers of calling numbers and hash values held in memory, and continually eliminate the calling numbers and hash values of normal classes. To support these operations, the invention provides a set of data structures consisting of two parts, a contents table and a directory, as shown in Fig. 2.
(1) Contents table C, organized as a hash table, is responsible for storing, retrieving, and organizing message hash values. Each unit of the table corresponds to one message class and contains the following fields: the hash value of the message content (III), the content length (IV), the number of identical messages in the class (V), the class attribute (VI; one of spam, suspicious, or normal), and the set of all calling numbers that sent this message (VII).
(2) Directory N, organized as a hash table, is responsible for statistics on sending modes. Each unit of the table has two parts: 1) the descriptor of the calling number, comprising the calling number value (I); 2) the set of hash values of the messages produced by the calling number (II).
As described above, the directory and the contents table are linked: the contents table gives the calling numbers of all messages in a class, and the directory gives all message contents sent by a number.
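The fields listed above map naturally onto two record types. The following Python sketch is one possible in-memory layout, with each field annotated by its Roman-numeral label; the class names, the `link` helper, and the default category are hypothetical, and Python `dict`s stand in for the two hash tables.

```python
from dataclasses import dataclass, field
from enum import Enum


class MessageClass(Enum):  # field VI
    SPAM = "spam"
    SUSPICIOUS = "suspicious"
    NORMAL = "normal"


@dataclass
class ContentUnit:
    """One unit of contents table C, i.e. one message class."""
    hash_value: str                                 # III
    length: int                                     # IV
    count: int = 0                                  # V
    category: MessageClass = MessageClass.NORMAL    # VI (default is an assumption)
    callers: set[str] = field(default_factory=set)  # VII


@dataclass
class NumberUnit:
    """One unit of directory N."""
    calling_number: str                             # I
    hashes: set[str] = field(default_factory=set)   # II


# Both tables are dicts (hash tables) keyed by their identifying field.
contents: dict[str, ContentUnit] = {}
directory: dict[str, NumberUnit] = {}


def link(number: str, h: str) -> None:
    """Record that `number` sent content `h`, updating both sides of the association."""
    contents[h].callers.add(number)
    directory.setdefault(number, NumberUnit(number)).hashes.add(h)


contents["h1"] = ContentUnit("h1", length=100, count=3)
link("n1", "h1")
link("n2", "h1")
print(sorted(contents["h1"].callers))  # ['n1', 'n2']: numbers that sent class h1
print(sorted(directory["n1"].hashes))  # ['h1']: contents sent by n1
```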
The directory holds one type of data, the calling numbers of messages: only when a message falls into a suspicious class is its calling number extracted and written to the directory.
The contents table holds one type of data: each message is hashed, and its hash value and length are added to the contents table.
The sending-content processing module 101 scans the message stream in real time, groups messages with repeated content into a class, and counts the messages in each class; if the count exceeds the preset first threshold f and the content length is greater than the second threshold, the class is a suspicious class.
A common way to identify spam by content repetition is to hash the message content into a hash value and then perform comparisons and other operations on that value. Compared with keyword recognition and the blacklist/whitelist mechanism, this method performs association analysis on content sent by multiple calling numbers and achieves better recognition rates and real-time performance.
The content-repetition problem can be stated as follows. Regard the body of a message (hereinafter simply "the message") as a byte sequence $M = b_1 b_2 \cdots b_x$ of length $x$, and write the length of $M$ as $\mathrm{length}(M)$. As one aspect of studying how messages cluster, the question of interest is whether, given $k$ messages, any of their contents repeat.
One feasible approach is therefore to compare the byte sequences of the messages one by one. To make comparison efficient, the contents of messages already seen are kept in a data structure T. When a new message arrives, it is first compared with the elements of T; if it is not present, it is added to T; otherwise the input is discarded and the count of messages with identical content is updated. To perform retrieval, comparison, and counting quickly while keeping memory cost low, so that the algorithm is practical, organizing T as a hash table is the most natural choice.
There are usually two hashing approaches. The first hashes the entire message content, so that each message corresponds to one hash value; this works well for short objects. The second hashes several byte subsequences of the content, so that each message corresponds to a set of hash values; this works better for long objects. Because message content is short (at most 140 bytes), and, to stay readable, content sent over a period of time can vary only in limited, random ways, the invention uses the first approach. The range of the hash value must also be large enough for it to represent the original message uniquely: if two hash values differ, the original messages differ; if two hash values are equal, the probability that the original messages differ is extremely small.
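The requirement that the hash range be "large enough" can be quantified with the standard birthday bound; this calculation is an illustration, not part of the patent. Assuming the hash behaves like a uniformly random $b$-bit function and the contents table holds $n$ distinct message contents (both illustrative parameters), a union bound over all pairs gives:

$$
\Pr[\text{two distinct contents share a hash value}] \;\le\; \binom{n}{2}\,2^{-b} \;=\; \frac{n(n-1)}{2^{b+1}}
$$

For a hypothetical $n = 10^{8}$ distinct contents, a 64-bit hash gives a bound of about $10^{16} / (3.7 \times 10^{19}) \approx 2.7 \times 10^{-4}$, while a 128-bit hash lowers it to roughly $1.5 \times 10^{-23}$.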
However, recent statistics show that the spam content produced by most bulk-sending devices has the following characteristics:
1) Its length is generally greater than 80 bytes.
2) The spammer adds random characters at the head and tail of the short message, so that the content of each message differs by a few bytes.
The spammer's main goal is for users to be able to read the message. Because message content is very short, the spammer can modify it only to a limited extent. The usual method is to add automatically incrementing strings (such as letters or digits) at the head and tail of the message. If the content were generated completely at random, nobody could tell what the message says, and the spammer's purpose would not be achieved.
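This small illustration assumes a hypothetical bulk sender that wraps a fixed Chinese text in auto-incrementing head and tail strings. It also assumes messages are encoded as UTF-16BE before hashing. Every variant lands in its own whole-content hash class, so duplicate counts on whole-message hashes do not accumulate for this kind of traffic.

```python
from __future__ import annotations

import hashlib


def spam_variants(text: str, n: int) -> list[str]:
    """Hypothetical bulk sender: fixed text wrapped in auto-incrementing head/tail strings."""
    return [f"A{i:03d}{text}#{i}" for i in range(n)]


variants = spam_variants("低息贷款，欢迎来电咨询", 5)
# Whole-content MD5 of each variant (UTF-16BE is an assumed encoding).
digests = {hashlib.md5(v.encode("utf-16-be")).hexdigest() for v in variants}
print(len(digests))  # 5: each variant starts its own hash class
```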
Clearly, hashing the entire message content cannot handle such short messages. The present invention solves the problem as follows.
1) For a short message longer than the second threshold of K bytes, perform the following operations:
When a short message arrives, first scan its entire content to find the position s of the first Chinese character and the end position e of the last Chinese character. Take the substring that starts at position s and ends at position e, and hash it to obtain the corresponding hash value. Look the hash value up in the contents table. If it appears for the first time, store it in the contents table, creating a new short message class whose message count is 1. If the short message class for this hash value already exists in the contents table, increase its message count by 1.
2) For a short message whose length is less than or equal to K bytes, hash the content directly to obtain the corresponding hash value, then look it up in the contents table the same way. If it appears for the first time, store it and create a new short message class with a message count of 1. If the class already exists, increase its message count by 1.
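The normalization in steps 1) and 2) can be sketched as follows. Several details are assumptions that the text does not fix:

- Message length is measured in UCS-2 (UTF-16BE) bytes, as is common for Chinese short messages.
- A "Chinese character" is a code point in the CJK Unified Ideographs block U+4E00 to U+9FFF.
- The trimmed text is hashed as UTF-16BE bytes.
- A long message with no Chinese character is hashed whole.

K defaults to 80 bytes, the value this invention uses. The function names are hypothetical.

```python
from __future__ import annotations

import hashlib


def is_chinese(ch: str) -> bool:
    # Assumption: only the basic CJK Unified Ideographs block counts.
    return "\u4e00" <= ch <= "\u9fff"


def message_length(content: str) -> int:
    # Assumption: length in bytes of the UCS-2 / UTF-16BE encoding.
    return len(content.encode("utf-16-be"))


def hash_input(content: str, k: int) -> str:
    """Return the part of the message that is hashed."""
    if message_length(content) <= k:
        return content  # Short message: hash the whole content.
    positions = [i for i, ch in enumerate(content) if is_chinese(ch)]
    if not positions:
        return content  # Assumption: no Chinese character, fall back to the whole content.
    s, e = positions[0], positions[-1]
    return content[s : e + 1]  # From the first through the last Chinese character.


def content_hash(content: str, k: int = 80) -> str:
    return hashlib.md5(hash_input(content, k).encode("utf-16-be")).hexdigest()


body = "低息贷款" * 12  # 48 Chinese characters = 96 bytes in UTF-16BE
print(content_hash("A001" + body + "x1") == content_hash("A002" + body + "x2"))  # True
```

Trimming to the Chinese core collapses the auto-incremented head and tail, so spam variants fall into one class. Short messages are hashed unchanged.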
A short message class in the contents table is marked as a suspicious short message class when both conditions hold:

- its message count exceeds the preset first threshold $f_0$;
- its message length is greater than K bytes.

In the present invention the hashing algorithm is MD5 and K is 80 bytes. For a short message of at most 80 bytes, the hash input is the whole message. For a short message longer than 80 bytes, the hash input is the content from the first Chinese character through the last Chinese character. The contents table stores the corresponding hash values.
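Below is a minimal sketch of the contents-table update and the suspicious-class rule. It uses f_0 = 100 (the value in the flow of Figure 3) and K = 80 bytes. The row layout, the string labels, and keeping the length of the latest message are assumptions. Hash values are taken as already computed.

```python
from __future__ import annotations

from dataclasses import dataclass, field

F0 = 100  # First threshold f_0 (value used in the flow of Figure 3).
K = 80    # Second threshold K, in bytes.


@dataclass
class MessageClass:
    """One contents-table row; the field names are hypothetical."""
    count: int = 0
    length: int = 0
    callers: set[str] = field(default_factory=set)  # calling-number set field
    label: str = "normal"  # "normal", "suspicious" or "spam"


class ContentsTable:
    def __init__(self, f0: int = F0, k: int = K) -> None:
        self.f0, self.k = f0, k
        self.rows: dict[str, MessageClass] = {}

    def record(self, digest: str, length: int) -> MessageClass:
        """Count one message under its hash value and update the class label."""
        row = self.rows.get(digest)
        if row is None:
            # First occurrence of this hash value: new short message class with count 1.
            row = self.rows[digest] = MessageClass(count=1, length=length)
        else:
            row.count += 1
            row.length = length  # Keep the length of the latest message.
        # Suspicious once the count exceeds f_0 for messages longer than K bytes.
        if row.label == "normal" and row.count > self.f0 and length > self.k:
            row.label = "suspicious"
        return row


table = ContentsTable(f0=2)  # Small threshold for a quick demo.
for _ in range(3):
    row = table.record("hash-1", length=96)
print(row.count, row.label)  # 3 suspicious
```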
If the current short message is classified into a suspicious short message class, its calling number is written into the directory:

- **First appearance of the number.** Store the number in the directory, write the hash value of the short message into the number's hash-value-set field, and record the number in the calling-number-set field of the short message class in the contents table.
- **Number already exists.** Compare the hash value of the current short message with the hash values already stored for that number in the directory. If it is different, add the new hash value to the corresponding entry and record the number in the calling-number-set field of the short message class in the contents table. Otherwise do nothing.

If the current short message is not classified into a suspicious short message class:

- **First appearance of the calling number.** Do nothing.
- **Number has appeared before.** Compare the hash value of the current short message with the hash values already stored for that number in the directory. If it is different, add the new hash value to the corresponding entry. If it is identical, do nothing.
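A minimal sketch of the directory bookkeeping in the two cases above follows. It assumes the directory is a `dict` that maps each calling number to the set of hash values it has sent. It also assumes the class's calling-number set field is a plain `set`. `update_directory` and the sample numbers are hypothetical.

```python
from __future__ import annotations


def update_directory(
    directory: dict[str, set[str]],
    class_callers: set[str],
    caller: str,
    digest: str,
    suspicious: bool,
) -> None:
    """Hypothetical helper: update the directory for one message.

    directory maps each calling number to the set of hash values it has sent;
    class_callers is the calling-number set field of the message's class.
    """
    hashes = directory.get(caller)
    if suspicious:
        if hashes is None:
            # First appearance: create the entry and register the number with the class.
            directory[caller] = {digest}
            class_callers.add(caller)
        elif digest not in hashes:
            # Known number, new hash value: store it and register the number with the class.
            hashes.add(digest)
            class_callers.add(caller)
        # Known number, hash value already stored: nothing to do.
    elif hashes is not None and digest not in hashes:
        # Non-suspicious message from a number already in the directory: record the hash only.
        hashes.add(digest)
    # Non-suspicious message from a first-time number: nothing to do.
```

Numbers enter the directory only through suspicious classes. Afterwards, their other traffic is still tracked, which is what the sending-behavior test relies on.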
The send-mode statistics module 102 identifies spam short messages.
Suppose the number of duplicate short messages in a suspicious short message class exceeds the third threshold $f_1$. The send-mode statistics module 102 then obtains the set of all calling numbers for that class and tallies these numbers in the directory. Next it compares the number of hash values produced by these numbers with the number of calling numbers. If the difference is not greater than the fourth threshold $f_2$, the module marks the suspicious class as a spam short message class and submits all its calling numbers to the calling number sending module 103.
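The sending-behavior test can be sketched as follows, with f_1 = 1000 and f_2 = 0 as in the flow of Figure 3. One point is an assumption: "the number of hash values produced by these numbers" is read as the number of distinct hash values across all of the class's calling numbers. Summing each number's own count is another plausible reading. `is_spam_class` is hypothetical.

```python
from __future__ import annotations

F1 = 1000  # Third threshold f_1 (value used in the flow of Figure 3).
F2 = 0     # Fourth threshold f_2.


def is_spam_class(
    count: int,
    class_callers: set[str],
    directory: dict[str, set[str]],
    f1: int = F1,
    f2: int = F2,
) -> bool:
    """Hypothetical helper: sending-behaviour test for one suspicious class.

    Assumption: the hash-value count is the number of distinct hash values
    across all calling numbers of the class (the union of their sets).
    """
    if count <= f1 or not class_callers:
        return False
    distinct: set[str] = set()
    for caller in class_callers:
        distinct |= directory.get(caller, set())
    return len(distinct) - len(class_callers) <= f2
```

Two worked cases under this reading:

- **Single-purpose senders.** Three numbers have each sent only this one text. That gives 1 distinct hash value for 3 numbers, a difference of 1 - 3 = -2 ≤ 0, so the class is marked as spam.
- **Ordinary users.** The same three numbers have also sent other texts, say 2, 1 and 1 more respectively. That gives 5 distinct values for 3 numbers, a difference of 2 > 0, so the class stays suspicious.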
The calling number sending module 103 sends the calling numbers to the short message service center, which uses them to filter spam short messages.
The management module 104 periodically deletes normal short message classes from the contents table to keep memory available. The management module 104 normally stays idle. Once every period t, it automatically purges the normal short message classes from the system.
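One way to sketch the management module follows. It assumes contents-table rows are dicts carrying a `"label"` key. A daemon thread waiting on `threading.Event.wait(t)` stands in for the idle/wake-up cycle of period t. Both helpers are hypothetical. A lock is passed in because the recognizer and the purge would otherwise touch the table concurrently.

```python
from __future__ import annotations

import threading


def purge_normal_classes(rows: dict[str, dict]) -> int:
    """Delete every contents-table row still labelled 'normal'; return how many were removed."""
    stale = [digest for digest, row in rows.items() if row["label"] == "normal"]
    for digest in stale:
        del rows[digest]
    return len(stale)


def start_manager(rows: dict[str, dict], lock: threading.Lock, period_t: float) -> threading.Event:
    """Purge every period_t seconds in a daemon thread until the returned event is set."""
    stop = threading.Event()

    def loop() -> None:
        # Event.wait returns False on timeout (time to purge) and True once stop is set.
        while not stop.wait(period_t):
            with lock:
                purge_normal_classes(rows)

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

In this sketch, counts for content that never became suspicious restart after each purge. As a result, f_0 effectively applies within each period t.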
The present invention identifies, classifies and handles the spam short messages received by the short message service center efficiently and in real time, which enables real-time interception of spam. It first exploits the heavy repetition typical of bulk-sent spam: an effective hashing scheme stores message content, and the short message stream is then clustered by content, which makes online spam identification possible. It further uses the sending behavior of calling numbers to identify spam produced by bulk-sending devices. As a result, short messages that certain individual users send in bulk do not cause false positives.
The flow of the present invention is shown in Figure 3, with parameters $f_0 = 100$, $f_1 = 1000$ and $f_2 = 0$ (other values, such as 1, 2, -1 or -2, may also be used). It comprises the following steps:
Step 301: initialize by building the contents table that stores hash values and the directory that stores number information. Receive a new short message.
Step 302: determine whether the short message length is greater than K bytes; if so, go to step 303, otherwise go to step 304.
Step 303: determine the positions of the first and last Chinese characters of the short message; go to step 304.
Step 304: for a short message of at most K bytes, compute the hash value of the received short message directly; for a short message longer than K bytes, compute the hash value of its content from the first Chinese character through the last Chinese character.
Step 305: determine whether the hash value is present in the contents table; if so, go to step 306, otherwise go to step 307.
Step 306: update the contents table by increasing the occurrence count of the hash value by 1; go to step 308.
Step 307: add the hash value to the contents table and set its occurrence count to 1.
Step 308: determine whether the hash value's occurrence count is greater than $f_0$ and, at the same time, the short message length is greater than K bytes; if both hold, go to step 309, otherwise go to step 312.
Step 309: determine whether the calling number appears for the first time; if so, go to step 310, otherwise go to step 311.
Step 310: write the calling number and the corresponding hash value into the directory, and add the calling number to the calling-number-set field in the contents table; go to step 314.
Step 311: write any hash value not yet recorded in the directory into the directory, and add the calling number to the calling-number-set field in the contents table; go to step 314.
Step 312: determine whether the calling number appears for the first time; if so, go to step 317, otherwise go to step 313.
Step 313: write any hash value not yet recorded in the directory into the directory; go to step 317.
Step 314: if the number of duplicate short messages in the short message class exceeds threshold $f_1$, go to step 315; otherwise go to step 317.
Step 315: determine whether the difference between the number of hash values produced by all the calling numbers and the number of calling numbers is not greater than the fourth threshold; if so, go to step 316, otherwise go to step 317.
Step 316: judge the short message class of this short message to be a spam short message class, and send the corresponding calling numbers to the short message service center, which uses them to filter spam short messages; go to step 317.
Step 317: finish processing the current short message and prepare to receive the next one.
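The whole flow can be condensed into one class for experimentation. This sketch rests on several assumptions:

- Length is measured in UTF-16BE bytes.
- "Chinese character" means U+4E00 to U+9FFF.
- MD5 is computed over UTF-16BE bytes.
- Step 315 counts distinct hash values across the class's calling numbers.

Defaults are K = 80, f_0 = 100, f_1 = 1000 and f_2 = 0, as given in the text. `SpamRecognizer` and its methods are hypothetical, and step numbers in comments refer to Figure 3.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


def is_chinese(ch: str) -> bool:
    return "\u4e00" <= ch <= "\u9fff"  # Assumption: basic CJK Unified Ideographs only.


@dataclass
class MessageClass:
    count: int = 0
    length: int = 0
    callers: set[str] = field(default_factory=set)
    label: str = "normal"


class SpamRecognizer:
    """Hypothetical sketch of steps 301-317 of Figure 3."""

    def __init__(self, k: int = 80, f0: int = 100, f1: int = 1000, f2: int = 0) -> None:
        # Step 301: build the contents table and the directory.
        self.k, self.f0, self.f1, self.f2 = k, f0, f1, f2
        self.contents: dict[str, MessageClass] = {}
        self.directory: dict[str, set[str]] = {}

    def digest(self, text: str) -> tuple[str, int]:
        length = len(text.encode("utf-16-be"))  # Assumption: UCS-2 byte length.
        if length > self.k:  # Steps 302-303: locate first and last Chinese characters.
            idx = [i for i, ch in enumerate(text) if is_chinese(ch)]
            if idx:
                text = text[idx[0] : idx[-1] + 1]
        # Step 304: MD5 over the whole or trimmed content.
        return hashlib.md5(text.encode("utf-16-be")).hexdigest(), length

    def process(self, caller: str, text: str) -> set[str]:
        """Handle one message; return the calling numbers to report, if any."""
        h, length = self.digest(text)
        row = self.contents.get(h)
        if row is None:  # Steps 305 and 307: new class with count 1.
            row = self.contents[h] = MessageClass(count=1, length=length)
        else:  # Step 306: increment the count.
            row.count += 1
            row.length = length
        known = self.directory.get(caller)
        if row.count > self.f0 and length > self.k:  # Step 308.
            if row.label == "normal":
                row.label = "suspicious"
            if known is None:  # Steps 309-310: new number.
                self.directory[caller] = {h}
                row.callers.add(caller)
            elif h not in known:  # Step 311: known number, new hash value.
                known.add(h)
                row.callers.add(caller)
            # Steps 314-315: sending-behaviour test (distinct-hash reading).
            if row.count > self.f1 and row.callers and row.label != "spam":
                distinct = set().union(*(self.directory[c] for c in row.callers))
                if len(distinct) - len(row.callers) <= self.f2:
                    row.label = "spam"  # Step 316: report the calling numbers.
                    return set(row.callers)
        elif known is not None and h not in known:  # Steps 312-313.
            known.add(h)
        return set()  # Step 317: done with this message.


recognizer = SpamRecognizer(f0=2, f1=4)  # Small thresholds for a quick demo.
body = "低息贷款" * 12
reported = set()
for i in range(6):
    reported |= recognizer.process(f"1390000000{i % 3}", f"A{i:03d}{body}x{i}")
print(sorted(reported))  # ['13900000000', '13900000001', '13900000002']
```

In the demo, three numbers rotate through auto-incremented variants of one text. The class becomes suspicious after the third message. On the fifth message the three numbers share a single distinct hash value, so they are reported once.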
Figure 4 shows the network configuration of the present invention. The spam short message recognition system is connected to the short message service center as an independent network element, deployed in bypass mode alongside the center. It obtains a mirror of the short message traffic from the short message service center and does not interfere with the center's normal processing. Once spam is found, the system passes the corresponding calling numbers to the short message service center, so that further spam from the bulk-sending device can be intercepted promptly.
The system of the present invention may also be implemented directly as a software module within the short message service center, identifying the short message traffic that passes through it.
Those skilled in the art can make various modifications to the above without departing from the spirit and scope of the invention as defined by the claims. The scope of the invention is therefore not limited to the above description but is determined by the scope of the claims.

Claims (19)

1. A spam short message identification method based on sending behavior, characterized by comprising:
Step 1: calculating the hash value of a received short message, and recording, according to the hash value, the number of already-sent short messages having identical content and the length of the message content;
Step 2: if the number of already-sent short messages having identical content reaches a first threshold and the content length is greater than a second threshold, recording the calling number of the short message and the correspondence between that calling number and the hash value of the short message; otherwise, doing nothing if the calling number of the short message appears for the first time, and recording the correspondence between the calling number and the hash value of the short message if the calling number has appeared before;
Step 3: if the number of already-sent short messages having identical content and a content length greater than the second threshold reaches a third threshold, obtaining all calling numbers corresponding to those short messages; and, if the difference between the number of distinct hash values corresponding to all these calling numbers and the number of these calling numbers is not greater than a fourth threshold, determining that the short messages having identical content are spam short messages.
2. The spam short message identification method according to claim 1, characterized in that a directory and a contents table are further set up in step 1;
the directory records the calling number of each short message and the set of hash values of the short messages sent by that calling number;
the contents table records the hash value of a short message, the content length, the number of short messages having identical content, and the set of all calling numbers that sent short messages having identical content;
the hash value of the received short message, the number of already-sent short messages having identical content, and the content length are recorded in the contents table.
3. The spam short message identification method according to claim 2, characterized in that, in step 1, calculating the hash value of the received short message comprises:
if the content length of the received short message is greater than the second threshold, determining the positions of the first and last Chinese characters of the received short message and calculating the hash value over the content from the first Chinese character through the last Chinese character; or
if the content length of the received short message is less than or equal to the second threshold, calculating the hash value directly over the content of the received short message.
4. The spam short message identification method according to claim 2, characterized in that step 2 comprises:
Step 41: determining whether the following condition is met: the number of already-sent short messages having identical content reaches the first threshold and the content length is greater than the second threshold; if so, performing step 42, otherwise performing step 43;
Step 42: writing the calling number of the received short message into the directory: if the calling number appears for the first time, storing the calling number and the hash value of the short message in the directory and recording the calling number in the contents table; if the number already exists, comparing the hash value of the short message with the hash values recorded for that calling number in the directory: if different, recording the hash value of the short message in the directory and recording the number in the contents table; if identical, doing nothing;
Step 43: if the calling number of the received short message appears for the first time, doing nothing; if it has appeared before, comparing the hash value of the received short message with the hash values recorded for that calling number in the directory: if different, recording the hash value of the short message in the directory; if identical, doing nothing.
5. The spam short message identification method according to claim 2, characterized in that the contents table further records a short message class attribute, which is one of: suspicious short message class, spam short message class, and normal short message class.
6. The spam short message identification method according to claim 5, characterized in that, in step 2, if the number of already-sent short messages having identical content reaches the first threshold and the content length is greater than the second threshold, the class attribute of the short messages having identical content is marked as suspicious short message class in the contents table.
7. The spam short message identification method according to claim 6, characterized in that, in step 3, the class attribute corresponding to the hash value of the spam short message is further marked as spam short message class.
8. The spam short message identification method according to claim 1, characterized by further comprising step 4: sending the calling numbers of the spam short messages to the short message service center, which uses them to filter spam short messages.
9. The spam short message identification method according to claim 5, characterized by further comprising step 5: periodically clearing the normal short message classes from the contents table.
10. A spam short message identification system based on sending behavior, characterized by comprising:
a sent-content processing module, configured to calculate the hash value of a received short message and to record, according to the hash value, the number of already-sent short messages having identical content and the length of the message content; if the number of already-sent short messages having identical content reaches a first threshold and the content length is greater than a second threshold, to record the calling number of the short message and the correspondence between that calling number and the hash value of the short message; otherwise, to do nothing if the calling number of the short message appears for the first time, and to record the correspondence between the calling number and the hash value of the short message if the calling number has appeared before;
a send-mode statistics module, configured to obtain, when the number of already-sent short messages having identical content and a content length greater than the second threshold reaches a third threshold, all calling numbers corresponding to those short messages; and, if the difference between the number of distinct hash values corresponding to all these calling numbers and the number of these calling numbers is not greater than a fourth threshold, to determine that the short messages having identical content are spam short messages.
11. The spam short message identification system based on sending behavior according to claim 10, characterized in that the sent-content processing module is further configured to set up a directory and a contents table;
the directory records the calling number of each short message and the set of hash values of the short messages sent by that calling number;
the contents table records the hash value of a short message, the content length, the number of short messages having identical content, and the set of all calling numbers that sent short messages having identical content;
the hash value of the received short message, the number of already-sent short messages having identical content, and the content length are recorded in the contents table.
12. The spam short message identification system based on sending behavior according to claim 11, characterized in that calculating the hash value of the received short message comprises:
if the content length of the received short message is greater than the second threshold, determining the positions of the first and last Chinese characters of the received short message and calculating the hash value over the content from the first Chinese character through the last Chinese character; or
if the content length of the received short message is less than or equal to the second threshold, calculating the hash value directly over the content of the received short message.
13. The spam short message identification system based on sending behavior according to claim 11, characterized in that
the sent-content processing module is further configured to determine whether the following condition is met: the number of already-sent short messages having identical content reaches the first threshold and the content length is greater than the second threshold;
if so: to write the calling number of the received short message into the directory; if the calling number appears for the first time, to store the calling number and the hash value of the short message in the directory and record the calling number in the contents table; if the calling number already exists, to compare the hash value of the short message with the hash values recorded for that calling number in the directory: if different, to record the hash value of the short message in the directory and record the calling number in the contents table; if identical, to do nothing;
otherwise: if the calling number of the received short message appears for the first time, to do nothing; if it has appeared before, to compare the hash value of the received short message with the hash values recorded for that calling number in the directory: if different, to record the hash value of the short message in the directory; if identical, to do nothing.
14. The spam short message identification system based on sending behavior according to claim 11, characterized in that the contents table further records a short message class attribute, which is one of: suspicious short message class, spam short message class, and normal short message class.
15. The spam short message identification system based on sending behavior according to claim 14, characterized in that the sent-content processing module is further configured to mark, in the contents table, the class attribute of the short messages having identical content as suspicious short message class when the number of already-sent short messages having identical content reaches the first threshold and the content length is greater than the second threshold.
16. The spam short message identification system based on sending behavior according to claim 15, characterized in that the send-mode statistics module is further configured to mark, in the contents table, the class attribute corresponding to the hash value of the spam short message as spam short message class.
17. The spam short message identification system based on sending behavior according to claim 10, characterized by further comprising: a calling number sending module, configured to send the calling numbers of the spam short messages to the short message service center, which uses them to filter spam short messages.
18. The spam short message identification system based on sending behavior according to claim 14, characterized by further comprising a management module, configured to periodically clear the normal short message classes from the contents table.
19. A mobile communication system comprising a short message service center, characterized by further comprising the spam short message identification system based on sending behavior according to claim 10;
the spam short message identification system based on sending behavior is deployed in bypass mode alongside the short message service center, or is arranged within the short message service center.
Priority Applications (1)

Application number: CN2008102242531A
Priority date: 2008-10-14
Filing date: 2008-10-14
Title: Rubbish short message recognition system and method based on sending behavior
Status: Active (granted as CN101389085B)

Publications (2)

CN101389085A, published 2009-03-18
CN101389085B, published 2012-03-21

Family

ID=40478201

Country Status (1)

Country Link
CN (1) CN101389085B (en)

Legal Events

C06 / PB01: Publication
C10 / SE01: Entry into substantive examination (request for substantive examination in force)
C14 / GR01: Grant of patent or utility model