CN112380323A - Junk information removing system and method based on Chinese word segmentation recognition technology - Google Patents

Junk information removing system and method based on Chinese word segmentation recognition technology

Info

Publication number
CN112380323A
Authority
CN
China
Prior art keywords
short message
short
module
messages
short messages
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011391134.2A
Other languages
Chinese (zh)
Inventor
杨奚诚
王诚
熊瑛
卢倩
夏洋阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei D2s Soft Information Technology Co ltd
Original Assignee
Hefei D2s Soft Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei D2s Soft Information Technology Co ltd
Priority to CN202011391134.2A
Publication of CN112380323A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a junk information removing system and method based on a Chinese word segmentation recognition technology, relates to the technical field of junk information recognition, and solves the technical problems of a low spam short message recognition rate and low working efficiency in the prior art. The method first preliminarily screens the short messages received by an intelligent terminal according to their sending numbers to obtain primary screened short messages, then extracts verification keywords from those short messages by a Chinese word segmentation technology and matches them against a sensitive word bank, and finally judges the remaining short messages with an intelligent model. This triple detection improves both the accuracy of junk information identification and the working efficiency of the invention. The short message preprocessing module helps improve the removal efficiency of spam short messages; the short message analysis module helps improve the recognition rate of spam short messages and ensures that the intelligent terminal is not disturbed by them.

Description

Junk information removing system and method based on Chinese word segmentation recognition technology
Technical Field
The invention belongs to the field of junk information identification, relates to a Chinese word segmentation identification technology, and particularly relates to a junk information removing system and method based on the Chinese word segmentation identification technology.
Background
The short message service, as a basic service of the mobile communication network, provides users with a convenient means of communication; at the same time, it has become a channel for sending illegal short messages, which causes considerable harm, such as fraud and other crimes committed via short messages and the spreading of false information and rumors via short messages.
The invention patent with publication number CN103874033A discloses a method for recognizing irregularly arranged spam short messages based on Chinese word segmentation. For a given short message, the method performs Chinese word segmentation on the content read in the normal horizontal order and calculates a weight from the number of words in the segmentation result. It then determines the extent of the irregularly arranged content, relying on the fact that an irregularly arranged short message must control the number of characters in each line, converts the characters within that extent from vertical arrangement to horizontal arrangement, performs Chinese word segmentation again, and calculates a second weight from the number of words in the overall segmentation result. Finally, the two weights are compared to judge whether the short message is arranged normally or irregularly.
The above scheme analyzes the content according to its arrangement type, matches keywords, and identifies whether a short message is spam, which helps avoid missed spam short messages and improves the recall and precision of spam detection. However, the scheme mainly targets irregularly arranged spam short messages, so the forms of short message it can identify are limited, which reduces its practicability; the scheme therefore still needs further improvement.
Disclosure of Invention
In order to solve the problems existing in the above scheme, the invention provides a junk information removing system and method based on a Chinese word segmentation recognition technology.
The purpose of the invention can be realized by the following technical scheme: a junk information removing system based on Chinese word segmentation recognition technology comprises a processor, an IP analysis module, an information publishing module, a data storage module, a short message preprocessing module, an intelligent model module and a short message analysis module;
the short message preprocessing module is used for preprocessing the short message received by the intelligent terminal to obtain a primary screened short message and sending the primary screened short message to the short message analysis module through the processor;
the short message analysis module analyzes the primary screened short messages through the keyword analysis technology and the intelligent model in sequence, screens out spam short messages according to the analysis result, and sends an IP analysis signal to the IP analysis module through the processor;
the intelligent model module is used for acquiring an intelligent model;
the IP analysis module is used for analyzing the IP address of the spam short message.
Preferably, the short message preprocessing module is configured to preliminarily screen the short messages, which includes:
the intelligent terminal receives a short message and sends it to the short message preprocessing module; the intelligent terminal comprises a smartphone and a tablet computer;
after receiving the short message, the short message preprocessing module acquires the sending number of the short message and acquires, through the processor, the short message mark database stored in the data storage module;
the sending number is matched against the numbers in the short message mark database; when a match is found, the short message corresponding to the sending number is intercepted and automatically removed from the intelligent terminal; when no match is found, the short message is marked as a primary screened short message, and the primary screened short message and a short message analysis signal are sent to the short message analysis module through the processor;
the sending record of the short message analysis signal is sent to the data storage module through the processor for storage.
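The preliminary screening above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the names ShortMessage, prescreen and split_messages are hypothetical, and the short message mark database is assumed to be an in-memory set of number strings.

```python
from dataclasses import dataclass


@dataclass
class ShortMessage:
    sender: str  # sending number of the short message
    text: str    # message body


def prescreen(message, mark_database):
    """Return True when the sending number is in the mark database (intercept)."""
    return message.sender in mark_database


def split_messages(messages, mark_database):
    """Separate intercepted messages from primary screened short messages."""
    intercepted, primary_screened = [], []
    for message in messages:
        if prescreen(message, mark_database):
            intercepted.append(message)       # removed from the terminal
        else:
            primary_screened.append(message)  # passed on for analysis
    return intercepted, primary_screened
```

Using a set keeps each number lookup at constant average cost, which suits a check that runs on every incoming short message.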
Preferably, the short message analysis module is configured to analyze the primary screened short messages, which includes:
acquiring the sensitive word bank in the data storage module through the processor; the sensitive word bank comprises keywords of at least one sensitive word type, and the sensitive word types comprise drug-related and pornography-related;
extracting verification keywords from the primary screened short message by a Chinese word segmentation technology and matching the verification keywords against the keywords in the sensitive word bank; when a verification keyword matches a keyword in the sensitive word bank, judging the primary screened short message to be a spam short message and automatically removing it from the intelligent terminal; when no verification keyword matches the sensitive word bank, acquiring the intelligent model in the data storage module;
converting the primary screened short message into an input array, marking the input array as a verification input array, and inputting the verification input array into the intelligent model to judge the primary screened short message;
when the primary screened short message is judged to be a spam short message, automatically removing it from the intelligent terminal.
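The keyword stage above can be illustrated with a small sketch. The text does not name a particular segmentation algorithm, so forward maximum matching is used here purely as an assumption; the functions segment and match_sensitive_words and the sample vocabulary are hypothetical.

```python
def segment(text, vocabulary):
    """Forward maximum matching: at each position take the longest known word.

    Characters that start no vocabulary word are emitted one at a time.
    """
    max_len = max((len(word) for word in vocabulary), default=1)
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


def match_sensitive_words(text, sensitive_words, vocabulary):
    """Return the verification keywords that appear in the sensitive word bank."""
    # Sensitive words are added to the dictionary so they segment as whole words.
    words = segment(text, vocabulary | sensitive_words)
    return [word for word in words if word in sensitive_words]
```

A non-empty result corresponds to the case where the short message is judged to be spam directly; an empty result corresponds to the case where the message is handed on to the intelligent model.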
Preferably, the intelligent model module is configured to train the neural network model to obtain an intelligent model, and includes:
acquiring a spam short message database through the Internet, and numbering the spam short messages; each number consists of 5 digits in three fields, for example 1+01+00; the first field (1 in the example) represents the arrangement rule of the spam short message, which is either horizontal or vertical; the second field (01 in the example) represents the sensitive word type, where 00 represents a mixed type, 01 represents drug-related and 02 represents pornography-related; the third field (00 in the example) represents the number of sensitive words;
preprocessing the spam short messages, converting the preprocessed spam short messages into input arrays of the neural network model, and taking the numbers corresponding to the spam short messages as output arrays of the neural network model to train it; the neural network model comprises an error feedforward neural network and an RBF neural network;
and marking the trained neural network model as an intelligent model, and sending the intelligent model to a data storage module for storage through a processor.
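The 5-digit numbering used as the training target can be sketched as below. The field widths (1, 2 and 2 digits) follow the example given above; the names SpamLabel, encode_label, decode_label and to_input_array are hypothetical, and the bag-of-words conversion is only one assumed way of turning a segmented short message into an input array, since the text does not specify the conversion.

```python
from typing import NamedTuple


class SpamLabel(NamedTuple):
    arrangement: int  # 1 digit: arrangement rule of the message
    word_type: int    # 2 digits: sensitive word type (00 is the mixed type)
    word_count: int   # 2 digits: number of sensitive words


def encode_label(label):
    """Pack the three fields into a 5-digit string."""
    if not (0 <= label.arrangement <= 9
            and 0 <= label.word_type <= 99
            and 0 <= label.word_count <= 99):
        raise ValueError("field out of range for a 5-digit number")
    return f"{label.arrangement:d}{label.word_type:02d}{label.word_count:02d}"


def decode_label(code):
    """Split a 5-digit string back into its three fields."""
    if len(code) != 5 or not code.isdigit():
        raise ValueError("expected exactly 5 digits")
    return SpamLabel(int(code[0]), int(code[1:3]), int(code[3:5]))


def to_input_array(words, feature_words):
    """Assumed conversion: count how often each feature word occurs."""
    return [words.count(word) for word in feature_words]
```

Which digit value denotes a horizontal or a vertical arrangement is not stated in the text, so the sketch leaves the arrangement field as a plain digit.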
Preferably, after receiving the IP analysis signal, the IP analysis module analyzes the IP address of the spam short message, and adds the IP address to an IP blacklist when the number of spam short messages sent from that IP address exceeds a preset threshold.
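The blacklist rule can be sketched as a simple counter. The class IpAnalyzer, its method record_spam and the parameter max_spam_count are hypothetical names chosen for illustration; how the IP address is obtained from a short message is outside the scope of this sketch.

```python
from collections import Counter


class IpAnalyzer:
    def __init__(self, max_spam_count):
        self.max_spam_count = max_spam_count  # preset threshold
        self.spam_counts = Counter()          # spam messages seen per IP address
        self.blacklist = set()

    def record_spam(self, ip_address):
        """Count one spam message and report whether the IP is now blacklisted."""
        self.spam_counts[ip_address] += 1
        if self.spam_counts[ip_address] > self.max_spam_count:
            self.blacklist.add(ip_address)
        return ip_address in self.blacklist
```

With a threshold of 2, for example, the third spam short message from the same address is the one that places it on the blacklist.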
Preferably, the information publishing module is used for publishing the removal results of spam short messages and periodically publishing the removal records of spam short messages to the intelligent terminal.
Preferably, the short message mark database is generated through a third-party platform, which includes:
generating an empty short message mark database through the processor;
acquiring a harassment number statistical table through the third-party platform; the third-party platform comprises China Mobile, China Unicom and China Telecom, and the numbers in the harassment number statistical table are numbers marked as harassing calls by users of the third-party platform;
acquiring the number of times each number in the harassment number statistical table has been marked, and denoting it as BC;
when the marking count BC is greater than L1, storing the corresponding number in the short message mark database; wherein L1 is a preset marking count threshold, and L1 > 0;
sending the short message mark database to the data storage module through the processor for storage.
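The threshold rule BC > L1 can be sketched in a few lines. The function name build_mark_database is hypothetical, and the harassment number statistical table is assumed to be a mapping from number to marking count.

```python
def build_mark_database(harassment_table, l1):
    """Keep every number whose marking count BC is strictly greater than L1."""
    if l1 <= 0:
        raise ValueError("L1 must be greater than 0")
    return {number for number, bc in harassment_table.items() if bc > l1}
```

For instance, with L1 = 3 a number marked exactly 3 times is not stored, while a number marked 4 times is.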
Preferably, the processor is in communication connection with each of the IP analysis module, the information publishing module, the data storage module, the short message preprocessing module, the intelligent model module and the short message analysis module, and the data storage module is in communication connection with the information publishing module.
A junk information removing method based on a Chinese word segmentation recognition technology comprises the following steps:
step one: the intelligent terminal receives a short message and sends it to the short message preprocessing module; after receiving the short message, the short message preprocessing module acquires the sending number of the short message and acquires, through the processor, the short message mark database stored in the data storage module; the sending number is matched against the numbers in the short message mark database; when a match is found, the short message corresponding to the sending number is intercepted and automatically removed from the intelligent terminal; when no match is found, the short message is marked as a primary screened short message, and the primary screened short message and a short message analysis signal are sent to the short message analysis module through the processor;
step two: the sensitive word bank in the data storage module is acquired through the processor; verification keywords are extracted from the primary screened short message by a Chinese word segmentation technology and matched against the keywords in the sensitive word bank; when a verification keyword matches a keyword in the sensitive word bank, the primary screened short message is judged to be a spam short message and is automatically removed from the intelligent terminal; when no verification keyword matches the sensitive word bank, the intelligent model in the data storage module is acquired; the primary screened short message is converted into an input array, the input array is marked as a verification input array, and the verification input array is input into the intelligent model to judge the primary screened short message; when the primary screened short message is judged to be a spam short message, it is automatically removed from the intelligent terminal.
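The two steps amount to a three-stage cascade in which each stage runs only if the previous one did not flag the message. A minimal sketch follows; the function classify and its return labels are hypothetical, and the keyword matcher and the intelligent model are passed in as plain callables so that no particular segmentation library or network is assumed.

```python
def classify(sender, text, mark_database, find_sensitive_words, model_predicts_spam):
    """Return the stage that flagged the message as spam, or None if it passes."""
    # Stage 1: sending number found in the short message mark database.
    if sender in mark_database:
        return "number"
    # Stage 2: a verification keyword matches the sensitive word bank.
    if find_sensitive_words(text):
        return "keyword"
    # Stage 3: the intelligent model judges the remaining messages.
    if model_predicts_spam(text):
        return "model"
    return None
```

Ordering the stages from cheapest to most expensive means the model is consulted only for messages that the number check and the keyword check could not decide.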
Compared with the prior art, the invention has the beneficial effects that:
1. the invention first preliminarily screens the short messages received by the intelligent terminal according to their sending numbers to obtain primary screened short messages, then extracts verification keywords from those short messages by a Chinese word segmentation technology and matches them against the sensitive word bank, and finally judges the remaining short messages with the intelligent model; this triple detection improves both the accuracy of junk information identification and the working efficiency of the invention;
2. the invention is provided with a short message preprocessing module for preliminarily screening short messages: the intelligent terminal receives a short message and sends it to the short message preprocessing module; after receiving the short message, the short message preprocessing module acquires the sending number of the short message and acquires, through the processor, the short message mark database stored in the data storage module; the sending number is matched against the numbers in the short message mark database; when a match is found, the short message is intercepted and automatically removed from the intelligent terminal; when no match is found, the short message is marked as a primary screened short message, and the primary screened short message and a short message analysis signal are sent to the short message analysis module through the processor; by screening on the sending number, the short message preprocessing module realizes the preliminary screening of short messages and helps improve the removal efficiency of spam short messages;
3. the invention is provided with a short message analysis module for analyzing the primary screened short messages: the sensitive word bank in the data storage module is acquired through the processor; verification keywords are extracted from the primary screened short message by a Chinese word segmentation technology and matched against the keywords in the sensitive word bank; when a verification keyword matches a keyword in the sensitive word bank, the primary screened short message is judged to be a spam short message and is automatically removed from the intelligent terminal; when no verification keyword matches the sensitive word bank, the intelligent model in the data storage module is acquired, the primary screened short message is converted into a verification input array, and the verification input array is input into the intelligent model to judge the primary screened short message; when the primary screened short message is judged to be a spam short message, it is automatically removed from the intelligent terminal; the short message analysis module thus further screens the primary screened short messages through the Chinese word segmentation technology and the intelligent model in sequence and automatically removes the spam short messages found, which helps improve the recognition rate of spam short messages and ensures that the intelligent terminal is not disturbed by them.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the principle of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to Fig. 1, a junk information removing system based on a Chinese word segmentation recognition technology includes a processor, an IP analysis module, an information publishing module, a data storage module, a short message preprocessing module, an intelligent model module, and a short message analysis module;
the short message preprocessing module is used for preprocessing the short message received by the intelligent terminal to obtain a primary screened short message and sending the primary screened short message to the short message analysis module through the processor;
the short message analysis module analyzes the primary screened short messages through the keyword analysis technology and the intelligent model in sequence, screens out spam short messages according to the analysis result, and sends an IP analysis signal to the IP analysis module through the processor;
the intelligent model module is used for acquiring an intelligent model;
the IP analysis module is used for analyzing the IP address of the spam short message.
Further, the short message preprocessing module is used for preliminarily screening the short messages, which includes:
the intelligent terminal receives a short message and sends it to the short message preprocessing module; the intelligent terminal comprises a smartphone and a tablet computer;
after receiving the short message, the short message preprocessing module acquires the sending number of the short message and acquires, through the processor, the short message mark database stored in the data storage module;
the sending number is matched against the numbers in the short message mark database; when a match is found, the short message corresponding to the sending number is intercepted and automatically removed from the intelligent terminal; when no match is found, the short message is marked as a primary screened short message, and the primary screened short message and a short message analysis signal are sent to the short message analysis module through the processor;
the sending record of the short message analysis signal is sent to the data storage module through the processor for storage.
Further, the short message analysis module is used for analyzing the primary screened short messages, which includes:
acquiring the sensitive word bank in the data storage module through the processor; the sensitive word bank comprises keywords of at least one sensitive word type, and the sensitive word types comprise drug-related and pornography-related;
extracting verification keywords from the primary screened short message by a Chinese word segmentation technology and matching the verification keywords against the keywords in the sensitive word bank; when a verification keyword matches a keyword in the sensitive word bank, judging the primary screened short message to be a spam short message and automatically removing it from the intelligent terminal; when no verification keyword matches the sensitive word bank, acquiring the intelligent model in the data storage module;
converting the primary screened short message into an input array, marking the input array as a verification input array, and inputting the verification input array into the intelligent model to judge the primary screened short message;
when the primary screened short message is judged to be a spam short message, automatically removing it from the intelligent terminal.
Further, the intelligent model module is used for training the neural network model to obtain the intelligent model, and comprises:
acquiring a spam short message database through the Internet, and numbering the spam short messages; each number consists of 5 digits in three fields, for example 1+01+00; the first field (1 in the example) represents the arrangement rule of the spam short message, which is either horizontal or vertical; the second field (01 in the example) represents the sensitive word type, where 00 represents a mixed type, 01 represents drug-related and 02 represents pornography-related; the third field (00 in the example) represents the number of sensitive words;
preprocessing the spam short messages, converting the preprocessed spam short messages into input arrays of the neural network model, and taking the numbers corresponding to the spam short messages as output arrays of the neural network model to train it; the neural network model comprises an error feedforward neural network and an RBF neural network;
and marking the trained neural network model as an intelligent model, and sending the intelligent model to a data storage module for storage through a processor.
Further, after receiving the IP analysis signal, the IP analysis module analyzes the IP address of the spam short message, and adds the IP address to an IP blacklist when the number of spam short messages sent from that IP address exceeds a preset threshold.
Further, the information publishing module is used for publishing the removal results of spam short messages and periodically publishing the removal records of spam short messages to the intelligent terminal.
Further, the short message mark database is generated through a third-party platform, which includes:
generating an empty short message mark database through the processor;
acquiring a harassment number statistical table through the third-party platform; the third-party platform comprises China Mobile, China Unicom and China Telecom, and the numbers in the harassment number statistical table are numbers marked as harassing calls by users of the third-party platform;
acquiring the number of times each number in the harassment number statistical table has been marked, and denoting it as BC;
when the marking count BC is greater than L1, storing the corresponding number in the short message mark database; wherein L1 is a preset marking count threshold;
sending the short message mark database to the data storage module through the processor for storage.
Further, the processor is in communication connection with each of the IP analysis module, the information publishing module, the data storage module, the short message preprocessing module, the intelligent model module and the short message analysis module, and the data storage module is in communication connection with the information publishing module.
A junk information removing method based on a Chinese word segmentation recognition technology comprises the following steps:
step one: the intelligent terminal receives a short message and sends it to the short message preprocessing module; after receiving the short message, the short message preprocessing module acquires the sending number of the short message and acquires, through the processor, the short message mark database stored in the data storage module; the sending number is matched against the numbers in the short message mark database; when a match is found, the short message corresponding to the sending number is intercepted and automatically removed from the intelligent terminal; when no match is found, the short message is marked as a primary screened short message, and the primary screened short message and a short message analysis signal are sent to the short message analysis module through the processor;
step two: the sensitive word bank in the data storage module is acquired through the processor; verification keywords are extracted from the primary screened short message by a Chinese word segmentation technology and matched against the keywords in the sensitive word bank; when a verification keyword matches a keyword in the sensitive word bank, the primary screened short message is judged to be a spam short message and is automatically removed from the intelligent terminal; when no verification keyword matches the sensitive word bank, the intelligent model in the data storage module is acquired; the primary screened short message is converted into an input array, the input array is marked as a verification input array, and the verification input array is input into the intelligent model to judge the primary screened short message; when the primary screened short message is judged to be a spam short message, it is automatically removed from the intelligent terminal.
The above formulas are all computed on dimensionless numerical values; each formula is the one closest to the real situation, obtained by collecting a large amount of data and performing software simulation, and the preset parameters in the formulas are set by those skilled in the art according to the actual situation.
The working principle of the invention is as follows:
the intelligent terminal receives a short message and sends it to the short message preprocessing module; after receiving the short message, the short message preprocessing module acquires the sending number of the short message and acquires, through the processor, the short message mark database stored in the data storage module; the sending number is matched against the numbers in the short message mark database; when a match is found, the short message corresponding to the sending number is intercepted and automatically removed from the intelligent terminal; when no match is found, the short message is marked as a primary screened short message, and the primary screened short message and a short message analysis signal are sent to the short message analysis module through the processor;
the sensitive word bank in the data storage module is acquired through the processor; verification keywords are extracted from the primary screened short message by a Chinese word segmentation technology and matched against the keywords in the sensitive word bank; when a verification keyword matches a keyword in the sensitive word bank, the primary screened short message is judged to be a spam short message and is automatically removed from the intelligent terminal; when no verification keyword matches the sensitive word bank, the intelligent model in the data storage module is acquired; the primary screened short message is converted into an input array, the input array is marked as a verification input array, and the verification input array is input into the intelligent model to judge the primary screened short message; when the primary screened short message is judged to be a spam short message, it is automatically removed from the intelligent terminal.
In this description, references to "one embodiment," "an example," "a specific example," or the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing is merely exemplary and illustrative of the present invention and various modifications, additions and substitutions may be made by those skilled in the art to the specific embodiments described without departing from the scope of the invention as defined in the following claims.

Claims (7)

1. A junk information removing system based on Chinese word segmentation recognition technology is characterized by comprising a processor, an IP analysis module, an information publishing module, a data storage module, a short message preprocessing module, an intelligent model module and a short message analysis module;
the short message preprocessing module is used for preprocessing the short message received by the intelligent terminal to obtain a primary screened short message and sending the primary screened short message to the short message analysis module through the processor;
the short message analysis module analyzes the primary screened short messages through the intelligent model and the keyword analysis technology in sequence, screens out spam short messages according to the analysis result, and sends IP analysis signals to the IP analysis module through the processor;
the intelligent model module is used for acquiring an intelligent model;
the IP analysis module is used for analyzing the IP address of the junk short message.
2. The system of claim 1, wherein the short message preprocessing module is configured to perform preliminary screening on short messages, and comprises:
the intelligent terminal receives the short message and then sends the short message to the short message preprocessing module; the intelligent terminal comprises an intelligent mobile phone and a tablet personal computer;
the short message preprocessing module acquires the sending number of the short message after receiving the short message, and acquires the short message mark database stored in the data storage module through the processor;
matching the sending number with the numbers in the short message mark database, intercepting the short message corresponding to the sending number when a matching result is obtained, and automatically removing the short message from the intelligent terminal; when no matching result is obtained, the short message is marked as a primary screened short message, and the primary screened short message and a short message analysis signal are sent to the short message analysis module through the processor;
and sending the sending record of the short message analysis signal to a data storage module for storage through a processor.
3. The system of claim 1, wherein the short message analysis module is configured to analyze the primary screened short message, and comprises:
acquiring a sensitive word bank in the data storage module through the processor; the sensitive word bank comprises keywords of at least one sensitive word type, and the sensitive word types comprise drug-related and pornography-related;
extracting verification keywords from the primary screened short message by a Chinese word segmentation technology, matching the verification keywords with the keywords in the sensitive word bank, judging the primary screened short message to be a spam short message when a verification keyword matches an entry in the sensitive word bank, and automatically removing the spam short message from the intelligent terminal; when no verification keyword matches an entry in the sensitive word bank, acquiring the intelligent model in the data storage module;
converting the primary screened short message into an input array, marking the input array as a verification input array, and inputting the verification input array into an intelligent model to judge the primary screened short message;
and when the primary screened short messages are judged to be spam short messages, the primary screened short messages are automatically removed from the intelligent terminal.
4. The system of claim 1, wherein the intelligent model module is configured to train a neural network model to obtain an intelligent model, and the system comprises:
acquiring a spam message database through the Internet, and numbering spam messages;
preprocessing the junk short messages, converting the preprocessed junk short messages into an input array of a neural network model, and taking the serial numbers corresponding to the junk short messages as an output array of the neural network to train the neural network model; the neural network model comprises an error feedforward neural network and an RBF neural network;
and marking the trained neural network model as an intelligent model, and sending the intelligent model to a data storage module for storage through a processor.
5. The junk information removing system according to claim 1, wherein the IP analysis module analyzes the IP address of the spam short message after receiving the IP analysis signal, and adds the IP address to an IP blacklist when the number of spam short messages sent from the IP address exceeds a preset number.
6. The system of claim 1, wherein the information publishing module is configured to publish the removal result of the spam short messages and to periodically publish the removal record of the spam short messages to the intelligent terminal.
7. A junk information removing method based on a Chinese word segmentation recognition technology is characterized by comprising the following steps:
step one: the intelligent terminal receives the short message and then sends the short message to the short message preprocessing module; the short message preprocessing module acquires the sending number of the short message after receiving the short message, and acquires the short message mark database stored in the data storage module through the processor; the sending number is matched with the numbers in the short message mark database, and when a matching result is obtained, the short message corresponding to the sending number is intercepted and automatically removed from the intelligent terminal; when no matching result is obtained, the short message is marked as a primary screened short message, and the primary screened short message and a short message analysis signal are sent to the short message analysis module through the processor;
step two: acquiring the sensitive word bank in the data storage module through the processor; extracting verification keywords from the primary screened short message by a Chinese word segmentation technology, matching the verification keywords with the keywords in the sensitive word bank, judging the primary screened short message to be a spam short message when a verification keyword matches an entry in the sensitive word bank, and automatically removing the spam short message from the intelligent terminal; when no verification keyword matches an entry in the sensitive word bank, acquiring the intelligent model in the data storage module; converting the primary screened short message into an input array, marking the input array as a verification input array, and inputting the verification input array into the intelligent model to judge the primary screened short message; and when the primary screened short message is judged to be a spam short message, automatically removing the primary screened short message from the intelligent terminal.
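The IP blacklist rule recited in claim 5 (blacklist an IP address once its spam count exceeds a preset number) can be illustrated with a minimal Python sketch. The class name, method name, and `max_spam_count` parameter are hypothetical, and how the IP address is obtained from a spam short message is not specified in the claims and is left outside the sketch.

```python
from collections import Counter
from typing import Set


class IpBlacklist:
    """Counts spam messages per IP and blacklists an IP once its count
    exceeds a preset number (strictly greater than the threshold)."""

    def __init__(self, max_spam_count: int) -> None:
        self.max_spam_count = max_spam_count   # assumed preset number
        self.spam_counts: Counter = Counter()
        self.blacklist: Set[str] = set()

    def record_spam(self, ip: str) -> bool:
        """Record one spam message from `ip`; return True if the IP is blacklisted."""
        self.spam_counts[ip] += 1
        if self.spam_counts[ip] > self.max_spam_count:
            self.blacklist.add(ip)
        return ip in self.blacklist
```

With `max_spam_count=2`, for example, the first two spam messages from an address leave it off the blacklist and the third one adds it.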

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011391134.2A CN112380323A (en) 2020-12-01 2020-12-01 Junk information removing system and method based on Chinese word segmentation recognition technology

Publications (1)

Publication Number Publication Date
CN112380323A true CN112380323A (en) 2021-02-19

Family

ID=74589618

Country Status (1)

Country Link
CN (1) CN112380323A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103874033A (en) * 2012-12-12 2014-06-18 上海粱江通信***股份有限公司 Method for identifying irregular spam short message on the basis of Chinese word segmentation
CN104168548A (en) * 2014-08-21 2014-11-26 北京奇虎科技有限公司 Short message intercepting method and device and cloud server
CN104794125A (en) * 2014-01-20 2015-07-22 中国科学院深圳先进技术研究院 Method and device for recognizing junk short message
CN108062303A (en) * 2017-12-06 2018-05-22 北京奇虎科技有限公司 The recognition methods of refuse messages and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116566677A (en) * 2023-05-15 2023-08-08 深圳市智联物联科技有限公司 Short message receiving and transmitting system of serial server
CN116566677B (en) * 2023-05-15 2024-02-13 深圳市智联物联科技有限公司 Short message receiving and transmitting system of serial server

Similar Documents

Publication Publication Date Title
CN106550155B (en) Swindle sample is carried out to suspicious number and screens the method and system sorted out and intercepted
CN109600752B (en) Deep clustering fraud detection method and device
CN103024746B (en) System and method for processing spam short messages for telecommunication operator
CN101784022A (en) Method and system for filtering and classifying short messages
CN109451182B (en) Detection method and device for fraud telephone
CN111405562B (en) Mobile malicious user identification method and system based on communication behavior rules
CN107517463A (en) A kind of recognition methods of telephone number and device
CN103037339A (en) Short message filtering method based on user creditworthiness and short message spam degree
CN111045847A (en) Event auditing method and device, terminal equipment and storage medium
CN111104521A (en) Anti-fraud detection method and detection system based on graph analysis
CN101389085B (en) Rubbish short message recognition system and method based on sending behavior
CN107895122A (en) A kind of special sensitive information active defense method, apparatus and system
CN109684157A (en) Alarm method, equipment, storage medium and device based on the log that reports an error
Singh et al. Email spam classification by support vector machine
CN115759640A (en) Public service information processing system and method for smart city
CN111131627B (en) Method, device and readable medium for detecting personal harmful call based on streaming data atlas
CN109151229A (en) Abnormal call automatic identification early warning system and its working method, call center system
CN115222303A (en) Industry risk data analysis method and system based on big data and storage medium
CN110213152A (en) Identify method, apparatus, server and the storage medium of spam
CN109274834B (en) Express number identification method based on call behavior
CN112380323A (en) Junk information removing system and method based on Chinese word segmentation recognition technology
CN105163296A (en) Multi-dimensional spam message filtering method and system
CN111861733B (en) Fraud prevention and control system and method based on address fuzzy matching
CN113112323A (en) Abnormal order identification method, device, equipment and medium based on data analysis
CN110059189B (en) Game platform message classification system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination