CN112380323A - Junk information removing system and method based on Chinese word segmentation recognition technology - Google Patents
- Publication number
- CN112380323A (also listed: CN202011391134.2A)
- Authority
- CN
- China
- Prior art keywords
- short message
- short
- module
- messages
- short messages
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Databases & Information Systems (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses a junk information removing system and method based on a Chinese word segmentation recognition technology, relates to the technical field of junk information recognition, and addresses the technical problems of the prior art that the spam short message recognition rate is low and the working efficiency is poor. Short messages received by an intelligent terminal are first preliminarily screened according to their sending numbers to obtain preliminarily screened short messages; verification keywords are then extracted from these short messages by a Chinese word segmentation technology and matched against a sensitive word bank; finally, the short messages are judged by an intelligent model. This triple detection improves the accuracy of junk information identification and the working efficiency of the invention. The short message preprocessing module helps improve the efficiency of removing spam short messages, and the short message analysis module helps improve the recognition rate of spam short messages and keeps the intelligent terminal from being disturbed by them.
Description
Technical Field
The invention belongs to the field of junk information identification, relates to a Chinese word segmentation identification technology, and particularly relates to a junk information removing system and method based on the Chinese word segmentation identification technology.
Background
The short message service, as a basic service of the mobile communication network, provides users with convenient message communication, but it has also become a channel for sending illegal short messages, which causes considerable harm, for example fraud and other crimes committed by short message and the spreading of false information and rumors by short message.
The invention patent with publication number CN103874033A discloses a method for recognizing irregularly arranged spam short messages based on Chinese word segmentation. For a given short message, the method first performs Chinese word segmentation on the content read horizontally in the normal way and calculates a weight from the number of words in the segmentation result. It then determines the extent of the irregularly arranged content, relying on the fact that an irregularly arranged short message must control the number of characters on each line, converts the characters in that extent from vertical to horizontal arrangement, performs Chinese word segmentation again, and calculates a second weight from the number of words in the overall segmentation result. Finally, it compares the two weights to judge whether the short message is arranged normally or irregularly.
That scheme analyzes the content according to its arrangement type, matches keywords, and identifies whether the short message is spam, which reduces missed spam short messages and improves recall and precision. However, it mainly targets irregularly arranged spam short messages, so the forms of short message it can identify are limited, which reduces its practicability. The scheme therefore still needs improvement.
Disclosure of Invention
In order to solve the problems existing in the scheme, the invention provides a junk information removing system and method based on a Chinese word segmentation recognition technology.
The purpose of the invention can be realized by the following technical scheme: a junk information removing system based on Chinese word segmentation recognition technology comprises a processor, an IP analysis module, an information publishing module, a data storage module, a short message preprocessing module, an intelligent model module and a short message analysis module;
the short message preprocessing module is used for preprocessing the short message received by the intelligent terminal to obtain a primary screened short message and sending the primary screened short message to the short message analysis module through the processor;
the short message analysis module analyzes the primary screened short messages through the intelligent model and the keyword analysis technology in sequence, screens out spam short messages according to the analysis result, and sends IP analysis signals to the IP analysis module through the processor;
the intelligent model module is used for acquiring an intelligent model;
the IP analysis module is used for analyzing the IP address of the junk short message.
Preferably, the short message preprocessing module is configured to perform preliminary screening on the short messages, and includes:
the intelligent terminal receives the short message and then sends the short message to the short message preprocessing module; the intelligent terminal comprises an intelligent mobile phone and a tablet personal computer;
the short message preprocessing module acquires the sending number of the short message after receiving the short message, and acquires, through the processor, the short message mark database stored in the data storage module;
matching the sending number against the numbers in the short message mark database; when a match is found, intercepting the short message corresponding to the sending number and automatically removing it from the intelligent terminal; when no match is found, marking the short message as a preliminarily screened short message, and sending the preliminarily screened short message and a short message analysis signal to the short message analysis module through the processor;
and sending the sending record of the short message analysis signal to a data storage module for storage through a processor.
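The preliminary screening described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `ShortMessage` type, the function name `preprocess`, and the use of an in-memory set for the short message mark database are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass
class ShortMessage:
    sender: str  # sending number of the short message
    text: str    # message body


def preprocess(
    messages: Iterable[ShortMessage], mark_database: Set[str]
) -> Tuple[List[ShortMessage], List[ShortMessage]]:
    """Split messages into (intercepted, preliminarily screened).

    A message is intercepted when its sending number appears in the
    mark database; otherwise it is passed on for further analysis.
    """
    intercepted: List[ShortMessage] = []
    screened: List[ShortMessage] = []
    for message in messages:
        if message.sender in mark_database:
            intercepted.append(message)
        else:
            screened.append(message)
    return intercepted, screened
```

A set is used here only because membership tests on it are cheap; the source does not say how the mark database is stored or queried.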
Preferably, the short message analysis module is configured to analyze the preliminarily screened short message, and includes:
acquiring the sensitive word bank in the data storage module through the processor; the sensitive word bank contains keywords of at least one sensitive word type, and the sensitive word types include drug-related and pornography-related;
extracting verification keywords from the preliminarily screened short message by a Chinese word segmentation technology and matching the verification keywords against the keywords in the sensitive word bank; when a verification keyword matches a keyword in the sensitive word bank, judging the preliminarily screened short message to be a spam short message and automatically removing it from the intelligent terminal; when no verification keyword matches the sensitive word bank, acquiring the intelligent model in the data storage module;
converting the preliminarily screened short message into an input array, marking the input array as the verification input array, and inputting the verification input array into the intelligent model to judge the preliminarily screened short message;
and when the preliminarily screened short message is judged to be a spam short message, automatically removing it from the intelligent terminal.
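The keyword stage can be illustrated with a small sketch. The source does not name a particular segmentation algorithm, so forward maximum matching against a word list is used here purely as a stand-in; the function names, the `max_len` parameter, and the layout of the sensitive word bank (a mapping from type name to keyword set) are assumptions for the example.

```python
from typing import Dict, List, Set


def segment(text: str, vocabulary: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: at each position take the longest
    vocabulary word, falling back to a single character."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                words.append(piece)
                i += size
                break
    return words


def match_sensitive(
    text: str, sensitive_bank: Dict[str, Set[str]], vocabulary: Set[str]
) -> Set[str]:
    """Return the sensitive word types whose keywords occur among the
    segmented words (the 'verification keywords')."""
    words = set(segment(text, vocabulary))
    return {kind for kind, keywords in sensitive_bank.items() if words & keywords}


# Hypothetical example data.
bank = {"drug-related": {"毒品"}, "pornography-related": {"色情"}}
vocab = {"低价", "出售", "毒品", "色情"}
print(match_sensitive("低价出售毒品", bank, vocab))
```

A message for which `match_sensitive` returns an empty set would fall through to the intelligent model, as the text describes.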
Preferably, the intelligent model module is configured to train the neural network model to obtain an intelligent model, and includes:
acquiring a spam short message database through the Internet, and numbering the spam short messages; each number has 5 digits, for example 1+01+00; the first digit (1 in the example) represents the arrangement rule of the spam short message, which is either horizontal or vertical; the next two digits (01 in the example) represent the sensitive word type, where 00 represents a mixed type, 01 represents drug-related, and 02 represents pornography-related; the last two digits (00 in the example) represent the number of sensitive words;
preprocessing the junk short messages, converting the preprocessed junk short messages into an input array of a neural network model, and taking the serial numbers corresponding to the junk short messages as an output array of the neural network to train the neural network model; the neural network model comprises an error feedforward neural network and an RBF neural network;
and marking the trained neural network model as an intelligent model, and sending the intelligent model to a data storage module for storage through a processor.
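The 5-digit numbering used as the training target can be made concrete with a short encode/decode sketch. The digit layout (1 + 2 + 2) and the type codes 00, 01 and 02 come from the text; the `SpamLabel` type and function names are hypothetical, and the text does not say which arrangement digit stands for horizontal and which for vertical, so the arrangement is kept as an uninterpreted digit.

```python
from typing import NamedTuple


class SpamLabel(NamedTuple):
    arrangement: int  # 1 digit: arrangement rule (horizontal or vertical)
    word_type: int    # 2 digits: 0 mixed, 1 drug-related, 2 pornography-related
    word_count: int   # 2 digits: number of sensitive words


def encode(label: SpamLabel) -> str:
    """Pack a label into the 5-digit number, e.g. 1+01+00 -> '10100'."""
    if not (0 <= label.arrangement <= 9
            and 0 <= label.word_type <= 99
            and 0 <= label.word_count <= 99):
        raise ValueError("label field out of range for a 5-digit number")
    return f"{label.arrangement:d}{label.word_type:02d}{label.word_count:02d}"


def decode(code: str) -> SpamLabel:
    """Unpack a 5-digit number back into its three fields."""
    if len(code) != 5 or not (code.isascii() and code.isdigit()):
        raise ValueError("expected exactly 5 decimal digits")
    return SpamLabel(int(code[0]), int(code[1:3]), int(code[3:5]))
```

Before training, each digit or field would still have to be turned into a numeric output array for the network; the source does not describe that step.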
Preferably, the IP analysis module analyzes the IP address of the spam short message after receiving the IP analysis signal, and adds the IP address to an IP blacklist when the number of spam short messages sent from that IP address exceeds a preset threshold.
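A minimal sketch of this counting rule, assuming the counts are kept in memory: the class name `IPAnalyzer`, its method `record_spam`, and the parameter `max_spam_count` (standing in for the preset threshold) are illustrative names, not taken from the source.

```python
from collections import Counter
from typing import Set


class IPAnalyzer:
    """Counts spam short messages per IP address and blacklists an
    address once its count exceeds the preset threshold."""

    def __init__(self, max_spam_count: int) -> None:
        self.max_spam_count = max_spam_count
        self.spam_counts: Counter = Counter()
        self.blacklist: Set[str] = set()

    def record_spam(self, ip: str) -> bool:
        """Record one spam message from `ip`; return True if the
        address is on the blacklist after this message."""
        self.spam_counts[ip] += 1
        if self.spam_counts[ip] > self.max_spam_count:
            self.blacklist.add(ip)
        return ip in self.blacklist
```

With a threshold of 2, the third spam message from the same address is the one that puts it on the blacklist, because the rule in the text is "exceeds" rather than "reaches".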
Preferably, the information issuing module is used for issuing a rejection result of the spam short messages and periodically issuing a rejection record of the spam short messages to the intelligent terminal.
Preferably, the short message mark database is generated through a third-party platform, and includes:
generating an empty short message mark database through the processor;
acquiring a harassment number statistical table through the third-party platform; the third-party platform includes China Mobile, China Unicom and China Telecom, and the numbers in the harassment number statistical table are numbers marked as harassing calls by users of the third-party platform;
acquiring the number of times each number in the harassment number statistical table has been marked, and denoting it as the marking count BC;
when the marking count BC is greater than L1, storing the corresponding number in the short message mark database; wherein L1 is a preset marking count threshold, and L1 > 0;
and sending the short message mark database to the data storage module for storage through the processor.
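The threshold rule for building the mark database is simple enough to sketch directly. Here the harassment number statistical table is assumed to be a mapping from number to marking count BC; the function name and parameter names are hypothetical.

```python
from typing import Dict, Set


def build_mark_database(harassment_table: Dict[str, int], l1: int) -> Set[str]:
    """Keep every number whose marking count BC is greater than L1."""
    if l1 <= 0:
        raise ValueError("L1 must be greater than 0")
    return {number for number, bc in harassment_table.items() if bc > l1}
```

For example, with `l1 = 3`, a number marked 3 times is left out and a number marked 4 times is included, since the comparison is strict.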
Preferably, the processor is respectively in communication connection with the IP analysis module, the information release module, the data storage module, the short message preprocessing module, the intelligent model module and the short message analysis module, and the data storage module is in communication connection with the information release module.
A junk information removing method based on a Chinese word segmentation recognition technology comprises the following steps:
step one: the intelligent terminal receives the short message and then sends the short message to the short message preprocessing module; the short message preprocessing module acquires the sending number of the short message after receiving the short message, and acquires, through the processor, the short message mark database stored in the data storage module; the sending number is matched against the numbers in the short message mark database; when a match is found, the short message corresponding to the sending number is intercepted and automatically removed from the intelligent terminal; when no match is found, the short message is marked as a preliminarily screened short message, and the preliminarily screened short message and a short message analysis signal are sent to the short message analysis module through the processor;
step two: acquiring a sensitive word bank in a data storage module through a processor; extracting the preliminary screened short messages by a Chinese word segmentation technology to obtain verification keywords, matching the verification keywords with the keywords in the sensitive word bank, judging the preliminary screened short messages to be spam short messages when the verification keywords are matched with results in the sensitive word bank, and automatically removing the spam short messages from the intelligent terminal; when the verification keyword cannot be matched with the result in the sensitive word bank, acquiring an intelligent model in the data storage module; converting the primary screened short message into an input array, marking the input array as a verification input array, and inputting the verification input array into an intelligent model to judge the primary screened short message; and when the primary screened short messages are judged to be spam short messages, the primary screened short messages are automatically removed from the intelligent terminal.
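Taken together, the two steps amount to three checks applied in order: sending number, sensitive keywords, then the intelligent model. The sketch below shows only that control flow. The segmenter and the model are passed in as callables because the source does not fix either of them, and the function name `classify` and the returned stage names are assumptions for the example.

```python
from typing import Callable, Iterable, Optional, Set


def classify(
    sender: str,
    text: str,
    mark_database: Set[str],
    sensitive_words: Set[str],
    segment: Callable[[str], Iterable[str]],
    model_says_spam: Callable[[str], bool],
) -> Optional[str]:
    """Return the name of the check that flagged the message, or None."""
    # Check 1 (step one): sending number found in the mark database.
    if sender in mark_database:
        return "number"
    # Check 2 (step two): a segmented word matches the sensitive word bank.
    if sensitive_words.intersection(segment(text)):
        return "keyword"
    # Check 3 (step two): the intelligent model judges the message.
    if model_says_spam(text):
        return "model"
    return None
```

Returning the stage name rather than a plain boolean is a design choice made here so that a caller can log which check removed a message; a message for which `classify` returns `None` stays on the intelligent terminal.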
Compared with the prior art, the invention has the beneficial effects that:
1. short messages received by an intelligent terminal are first preliminarily screened according to their sending numbers to obtain preliminarily screened short messages; verification keywords are then extracted from the short messages by a Chinese word segmentation technology and matched against a sensitive word bank; finally, the short messages are judged by an intelligent model; this triple detection improves the accuracy of junk information identification and the working efficiency of the invention;
2. the invention is provided with a short message preprocessing module, which is used for preliminarily screening short messages; the intelligent terminal receives the short message and then sends the short message to the short message preprocessing module; the short message preprocessing module acquires a sending number of the short message after receiving the short message, and acquires a short message mark database stored in the storage module through the processor; matching the sending number with the number in the short message database, intercepting the short message corresponding to the sending number when the matching result is obtained, and automatically removing the short message from the intelligent terminal; when the matching result is not obtained, the short message is marked as a primary screened short message, and the primary screened short message and the short message analysis signal are sent to a short message analysis module through a processor; the short message preprocessing module realizes the preliminary screening of the short messages by screening the numbers of the sent short messages, and is beneficial to improving the removal efficiency of the spam short messages;
3. the invention is provided with a short message analysis module, which is used for analyzing the primary selected short message; acquiring a sensitive word bank in a data storage module through a processor; extracting the preliminary screened short messages by a Chinese word segmentation technology to obtain verification keywords, matching the verification keywords with the keywords in the sensitive word bank, judging the preliminary screened short messages to be spam short messages when the verification keywords are matched with results in the sensitive word bank, and automatically removing the spam short messages from the intelligent terminal; when the verification keyword cannot be matched with the result in the sensitive word bank, acquiring an intelligent model in the data storage module; converting the primary screened short message into an input array, marking the input array as a verification input array, and inputting the verification input array into an intelligent model to judge the primary screened short message; when the primary screened short messages are judged to be spam short messages, the primary screened short messages are automatically removed from the intelligent terminal; the short message analysis module further screens the primary screened short messages through a Chinese word segmentation technology and an intelligent model in sequence, and automatically eliminates the screened spam short messages, so that the method is beneficial to improving the recognition rate of the spam short messages and ensuring that the intelligent terminal is not disturbed by the spam short messages.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the principle of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to Fig. 1, a junk information removing system based on a Chinese word segmentation recognition technology includes a processor, an IP analysis module, an information publishing module, a data storage module, a short message preprocessing module, an intelligent model module, and a short message analysis module;
the short message preprocessing module is used for preprocessing the short message received by the intelligent terminal to obtain a primary screened short message and sending the primary screened short message to the short message analysis module through the processor;
the short message analysis module analyzes the primary screened short messages through the intelligent model and the keyword analysis technology in sequence, screens out spam short messages according to the analysis result, and sends IP analysis signals to the IP analysis module through the processor;
the intelligent model module is used for acquiring an intelligent model;
the IP analysis module is used for analyzing the IP address of the junk short message.
Further, the short message preprocessing module is used for primarily screening the short messages, and comprises:
the intelligent terminal receives the short message and then sends the short message to the short message preprocessing module; the intelligent terminal comprises an intelligent mobile phone and a tablet computer;
the short message preprocessing module acquires the sending number of the short message after receiving the short message, and acquires, through the processor, the short message mark database stored in the data storage module;
matching the sending number against the numbers in the short message mark database; when a match is found, intercepting the short message corresponding to the sending number and automatically removing it from the intelligent terminal; when no match is found, marking the short message as a preliminarily screened short message, and sending the preliminarily screened short message and a short message analysis signal to the short message analysis module through the processor;
and sending the sending record of the short message analysis signal to a data storage module for storage through a processor.
Further, the short message analysis module is used for analyzing the preliminarily screened short message, and comprises:
acquiring the sensitive word bank in the data storage module through the processor; the sensitive word bank contains keywords of at least one sensitive word type, and the sensitive word types include drug-related and pornography-related;
extracting the preliminary screened short messages by a Chinese word segmentation technology to obtain verification keywords, matching the verification keywords with the keywords in the sensitive word bank, judging the preliminary screened short messages to be spam short messages when the verification keywords are matched with results in the sensitive word bank, and automatically removing the spam short messages from the intelligent terminal; when the verification keyword cannot be matched with the result in the sensitive word bank, acquiring an intelligent model in the data storage module;
converting the primary screened short message into an input array, marking the input array as a verification input array, and inputting the verification input array into an intelligent model to judge the primary screened short message;
and when the primary screened short messages are judged to be spam short messages, the primary screened short messages are automatically removed from the intelligent terminal.
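The text does not say how a short message is converted into an input array. One common option, shown here only as an assumption, is a bag-of-words count vector over a fixed vocabulary; the function name `to_input_array` and the choice of float counts are illustrative.

```python
from typing import Iterable, List, Sequence


def to_input_array(words: Iterable[str], vocabulary: Sequence[str]) -> List[float]:
    """Count how often each vocabulary word occurs among the segmented
    words; words outside the vocabulary are ignored."""
    index = {word: position for position, word in enumerate(vocabulary)}
    array = [0.0] * len(vocabulary)
    for word in words:
        position = index.get(word)
        if position is not None:
            array[position] += 1.0
    return array
```

The resulting fixed-length array is the kind of input a feedforward or RBF network expects, which is why a fixed vocabulary order matters: the same position must mean the same word during training and during verification.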
Further, the intelligent model module is used for training the neural network model to obtain the intelligent model, and comprises:
acquiring a spam short message database through the Internet, and numbering the spam short messages; each number has 5 digits, for example 1+01+00; the first digit (1 in the example) represents the arrangement rule of the spam short message, which is either horizontal or vertical; the next two digits (01 in the example) represent the sensitive word type, where 00 represents a mixed type, 01 represents drug-related, and 02 represents pornography-related; the last two digits (00 in the example) represent the number of sensitive words;
preprocessing the junk short messages, converting the preprocessed junk short messages into an input array of a neural network model, and taking the serial numbers corresponding to the junk short messages as an output array of the neural network to train the neural network model; the neural network model comprises an error feedforward neural network and an RBF neural network;
and marking the trained neural network model as an intelligent model, and sending the intelligent model to a data storage module for storage through a processor.
Further, the IP analysis module analyzes the IP address of the spam short message after receiving the IP analysis signal, and adds the IP address to an IP blacklist when the number of spam short messages sent from that IP address exceeds a preset threshold.
Further, the information issuing module is used for issuing a rejection result of the spam short messages and periodically issuing a rejection record of the spam short messages to the intelligent terminal.
Further, the short message mark database is generated through a third-party platform, and comprises the following steps:
generating an empty short message mark library through a processor;
acquiring a harassment number statistical table through the third-party platform; the third-party platform comprises China Mobile, China Unicom and China Telecom, and the numbers in the harassment number statistical table are numbers marked as harassing calls by users of the third-party platform;
acquiring the marking times of the numbers in the harassment number statistical table, and marking the marking times as BC;
when the marking times BC is larger than L1, the number corresponding to the marking times is stored in a short message marking library; wherein L1 is a preset marking time threshold;
and sending the short message mark library to a data storage module for storage through a processor.
Further, the processor is respectively in communication connection with the IP analysis module, the information release module, the data storage module, the short message preprocessing module, the intelligent model module and the short message analysis module, and the data storage module is in communication connection with the information release module.
A junk information removing method based on Chinese word segmentation recognition technology comprises the following steps:
step one: the intelligent terminal receives the short message and then sends the short message to the short message preprocessing module; the short message preprocessing module acquires the sending number of the short message after receiving the short message, and acquires, through the processor, the short message mark database stored in the data storage module; the sending number is matched against the numbers in the short message mark database; when a match is found, the short message corresponding to the sending number is intercepted and automatically removed from the intelligent terminal; when no match is found, the short message is marked as a preliminarily screened short message, and the preliminarily screened short message and a short message analysis signal are sent to the short message analysis module through the processor;
step two: acquiring a sensitive word bank in a data storage module through a processor; extracting the preliminary screened short messages by a Chinese word segmentation technology to obtain verification keywords, matching the verification keywords with the keywords in the sensitive word bank, judging the preliminary screened short messages to be spam short messages when the verification keywords are matched with results in the sensitive word bank, and automatically removing the spam short messages from the intelligent terminal; when the verification keyword cannot be matched with the result in the sensitive word bank, acquiring an intelligent model in the data storage module; converting the primary screened short message into an input array, marking the input array as a verification input array, and inputting the verification input array into an intelligent model to judge the primary screened short message; and when the primary screened short messages are judged to be spam short messages, the primary screened short messages are automatically removed from the intelligent terminal.
The above formulas are all calculated by removing dimensions and taking values thereof, the formula is one closest to the real situation obtained by collecting a large amount of data and performing software simulation, and the preset parameters in the formula are set by the technical personnel in the field according to the actual situation.
The working principle of the invention is as follows:
the intelligent terminal receives the short message and then sends the short message to the short message preprocessing module; the short message preprocessing module acquires a sending number of the short message after receiving the short message, and acquires a short message mark database stored in the storage module through the processor; matching the sending number with the number in the short message database, intercepting the short message corresponding to the sending number when the matching result is obtained, and automatically removing the short message from the intelligent terminal; when the matching result is not obtained, the short message is marked as a primary screened short message, and the primary screened short message and the short message analysis signal are sent to a short message analysis module through a processor;
acquiring a sensitive word bank in a data storage module through a processor; extracting the preliminary screened short messages by a Chinese word segmentation technology to obtain verification keywords, matching the verification keywords with the keywords in the sensitive word bank, judging the preliminary screened short messages to be spam short messages when the verification keywords are matched with results in the sensitive word bank, and automatically removing the spam short messages from the intelligent terminal; when the verification keyword cannot be matched with the result in the sensitive word bank, acquiring an intelligent model in the data storage module; converting the primary screened short message into an input array, marking the input array as a verification input array, and inputting the verification input array into an intelligent model to judge the primary screened short message; and when the primary screened short messages are judged to be spam short messages, the primary screened short messages are automatically removed from the intelligent terminal.
In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing is merely exemplary and illustrative of the present invention and various modifications, additions and substitutions may be made by those skilled in the art to the specific embodiments described without departing from the scope of the invention as defined in the following claims.
Claims (7)
1. A junk information removing system based on Chinese word segmentation recognition technology, characterized by comprising a processor, an IP analysis module, an information publishing module, a data storage module, a short message preprocessing module, an intelligent model module and a short message analysis module;
the short message preprocessing module is used for preprocessing the short message received by the intelligent terminal to obtain a preliminarily screened short message, and for sending the preliminarily screened short message to the short message analysis module through the processor;
the short message analysis module analyzes the preliminarily screened short messages through the intelligent model and the keyword analysis technology in sequence, screens out spam short messages according to the analysis result, and sends an IP analysis signal to the IP analysis module through the processor;
the intelligent model module is used for acquiring an intelligent model;
the IP analysis module is used for analyzing the IP address of the junk short message.
2. The system of claim 1, wherein the short message preprocessing module is configured to perform preliminary screening on short messages, and comprises:
the intelligent terminal receives the short message and then sends the short message to the short message preprocessing module; the intelligent terminal comprises a smartphone and a tablet computer;
the short message preprocessing module acquires the sending number of the short message after receiving the short message, and acquires the short message mark database stored in the data storage module through the processor;
the sending number is matched against the numbers in the short message mark database; when a match is found, the short message corresponding to the sending number is intercepted and automatically removed from the intelligent terminal; when no match is found, the short message is marked as a preliminarily screened short message, and the preliminarily screened short message and a short message analysis signal are sent to the short message analysis module through the processor;
and a sending record of the short message analysis signal is sent to the data storage module for storage through the processor.
3. The system of claim 1, wherein the short message analysis module is configured to analyze the preliminarily screened short message, and comprises:
acquiring a sensitive word bank in the data storage module through the processor; the sensitive word bank comprises keywords of at least one sensitive word type, and the sensitive word types comprise drug-related and pornography-related;
extracting verification keywords from the preliminarily screened short message by a Chinese word segmentation technology, and matching the verification keywords against the keywords in the sensitive word bank; when a verification keyword matches an entry in the sensitive word bank, judging the preliminarily screened short message to be a spam short message and automatically removing the spam short message from the intelligent terminal; when no verification keyword matches an entry in the sensitive word bank, acquiring the intelligent model in the data storage module;
converting the preliminarily screened short message into an input array, marking the input array as a verification input array, and inputting the verification input array into the intelligent model to judge the preliminarily screened short message;
and when the preliminarily screened short message is judged to be a spam short message, automatically removing the preliminarily screened short message from the intelligent terminal.
4. The system of claim 1, wherein the intelligent model module is configured to train a neural network model to obtain an intelligent model, and the system comprises:
acquiring a spam message database through the Internet, and numbering spam messages;
preprocessing the junk short messages, converting the preprocessed junk short messages into an input array of a neural network model, and taking the serial numbers corresponding to the junk short messages as an output array of the neural network to train the neural network model; the neural network model comprises an error feedforward neural network and an RBF neural network;
and marking the trained neural network model as an intelligent model, and sending the intelligent model to a data storage module for storage through a processor.
5. The junk information removal system according to claim 1, wherein the IP analysis module analyzes the IP address of the junk short message after receiving the IP analysis signal, and adds the IP address to an IP blacklist when the number of junk short messages sent from the IP address exceeds a preset number.
6. The system of claim 1, wherein the information publishing module is configured to publish the removal result of the spam short messages and to periodically send the removal record of the spam short messages to the intelligent terminal.
7. A junk information removing method based on a Chinese word segmentation recognition technology is characterized by comprising the following steps:
step one: the intelligent terminal receives the short message and then sends the short message to the short message preprocessing module; the short message preprocessing module acquires the sending number of the short message after receiving the short message, and acquires the short message mark database stored in the data storage module through the processor; the sending number is matched against the numbers in the short message mark database; when a match is found, the short message corresponding to the sending number is intercepted and automatically removed from the intelligent terminal; when no match is found, the short message is marked as a preliminarily screened short message, and the preliminarily screened short message and a short message analysis signal are sent to the short message analysis module through the processor;
step two: acquiring a sensitive word bank in the data storage module through the processor; extracting verification keywords from the preliminarily screened short message by a Chinese word segmentation technology, and matching the verification keywords against the keywords in the sensitive word bank; when a verification keyword matches an entry in the sensitive word bank, judging the preliminarily screened short message to be a spam short message and automatically removing the spam short message from the intelligent terminal; when no verification keyword matches an entry in the sensitive word bank, acquiring the intelligent model in the data storage module; converting the preliminarily screened short message into an input array, marking the input array as a verification input array, and inputting the verification input array into the intelligent model to judge the preliminarily screened short message; and when the preliminarily screened short message is judged to be a spam short message, automatically removing the preliminarily screened short message from the intelligent terminal.
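The blacklist rule of claim 5 can be illustrated with a short sketch. The class name `IpBlacklist`, the method `record_spam`, and the in-memory counter are assumptions made for illustration; the claims do not specify how sending counts are stored or how the threshold is chosen.

```python
from collections import Counter
from typing import Set


class IpBlacklist:
    """Counts spam messages per IP address and blacklists an address
    once its count exceeds a preset threshold (hypothetical helper)."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.spam_counts: Counter = Counter()
        self.blacklist: Set[str] = set()

    def record_spam(self, ip: str) -> bool:
        """Record one spam message from `ip`; return True if the
        address is on the blacklist after this message."""
        self.spam_counts[ip] += 1
        # "Exceeds" is read as a strict comparison against the threshold.
        if self.spam_counts[ip] > self.threshold:
            self.blacklist.add(ip)
        return ip in self.blacklist
```

With a threshold of 2, the third spam message from the same address is the one that puts it on the blacklist.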
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011391134.2A CN112380323A (en) | 2020-12-01 | 2020-12-01 | Junk information removing system and method based on Chinese word segmentation recognition technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112380323A (en) | 2021-02-19 |
Family
ID=74589618
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011391134.2A Pending CN112380323A (en) | 2020-12-01 | 2020-12-01 | Junk information removing system and method based on Chinese word segmentation recognition technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112380323A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103874033A (en) * | 2012-12-12 | 2014-06-18 | 上海粱江通信***股份有限公司 | Method for identifying irregular spam short message on the basis of Chinese word segmentation |
CN104168548A (en) * | 2014-08-21 | 2014-11-26 | 北京奇虎科技有限公司 | Short message intercepting method and device and cloud server |
CN104794125A (en) * | 2014-01-20 | 2015-07-22 | 中国科学院深圳先进技术研究院 | Method and device for recognizing junk short message |
CN108062303A (en) * | 2017-12-06 | 2018-05-22 | 北京奇虎科技有限公司 | The recognition methods of refuse messages and device |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116566677A (en) * | 2023-05-15 | 2023-08-08 | 深圳市智联物联科技有限公司 | Short message receiving and transmitting system of serial server |
CN116566677B (en) * | 2023-05-15 | 2024-02-13 | 深圳市智联物联科技有限公司 | Short message receiving and transmitting system of serial server |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106550155B (en) | Swindle sample is carried out to suspicious number and screens the method and system sorted out and intercepted | |
CN109600752B (en) | Deep clustering fraud detection method and device | |
CN103024746B (en) | System and method for processing spam short messages for telecommunication operator | |
CN101784022A (en) | Method and system for filtering and classifying short messages | |
CN109451182B (en) | Detection method and device for fraud telephone | |
CN111405562B (en) | Mobile malicious user identification method and system based on communication behavior rules | |
CN107517463A (en) | A kind of recognition methods of telephone number and device | |
CN103037339A (en) | Short message filtering method based on user creditworthiness and short message spam degree | |
CN111045847A (en) | Event auditing method and device, terminal equipment and storage medium | |
CN111104521A (en) | Anti-fraud detection method and detection system based on graph analysis | |
CN101389085B (en) | Rubbish short message recognition system and method based on sending behavior | |
CN107895122A (en) | A kind of special sensitive information active defense method, apparatus and system | |
CN109684157A (en) | Alarm method, equipment, storage medium and device based on the log that reports an error | |
Singh et al. | Email spam classification by support vector machine | |
CN115759640A (en) | Public service information processing system and method for smart city | |
CN111131627B (en) | Method, device and readable medium for detecting personal harmful call based on streaming data atlas | |
CN109151229A (en) | Abnormal call automatic identification early warning system and its working method, call center system | |
CN115222303A (en) | Industry risk data analysis method and system based on big data and storage medium | |
CN110213152A (en) | Identify method, apparatus, server and the storage medium of spam | |
CN109274834B (en) | Express number identification method based on call behavior | |
CN112380323A (en) | Junk information removing system and method based on Chinese word segmentation recognition technology | |
CN105163296A (en) | Multi-dimensional spam message filtering method and system | |
CN111861733B (en) | Fraud prevention and control system and method based on address fuzzy matching | |
CN113112323A (en) | Abnormal order identification method, device, equipment and medium based on data analysis | |
CN110059189B (en) | Game platform message classification system and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||