CN111669757A - Terminal fraud call identification method based on conversation text word vector - Google Patents

Terminal fraud call identification method based on conversation text word vector

Info

Publication number
CN111669757A
Authority
CN
China
Prior art keywords
text
vector
word
participle
fraud
Prior art date
Legal status
Granted
Application number
CN202010542362.9A
Other languages
Chinese (zh)
Other versions
CN111669757B (en)
Inventor
孙晓晨
宁珊
林格平
张之含
侯炜
洪永婷
倪善金
周书敏
万辛
沈亮
Current Assignee
Xinxun Digital Technology Hangzhou Co ltd
National Computer Network and Information Security Management Center
Original Assignee
EB INFORMATION TECHNOLOGY Ltd
National Computer Network and Information Security Management Center
Priority date
Filing date
Publication date
Application filed by EB INFORMATION TECHNOLOGY Ltd and National Computer Network and Information Security Management Center
Priority to CN202010542362.9A
Publication of CN111669757A
Application granted
Publication of CN111669757B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W12/00 Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/12 Detection or prevention of fraud
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A terminal fraud call identification method based on conversation text word vectors comprises the following steps: a user marks an incoming call in a terminal App; when the call is marked as a fraud category, the call is converted into text with the user's authorization, the text is reviewed and desensitized by the user, and, with the user's further authorization, the text is uploaded to a server and stored as a text sample; word segmentation and part-of-speech tagging are performed on each text sample to obtain a syntactic dependency label and a word combination vector for each segmented word, the word combination vector, part-of-speech tag and syntactic dependency label are concatenated to form a content vector of the segmented word, and the scene element label to which the segmented word belongs is calculated, yielding a semantic vector of the text sample; a fraud classification recognition model is constructed and trained using the text samples in the server, and the trained model is pushed from the server to the App; after receiving a new call to be identified, the App obtains the fraud category to which the call belongs according to the model and prompts the user. The invention belongs to the technical field of information and can accurately identify fraud calls based on conversation text.

Description

Terminal fraud call identification method based on conversation text word vector
Technical Field
The invention relates to a terminal fraud call identification method based on a conversation text word vector, and belongs to the technical field of information.
Background
Telecommunication fraud cases launched from overseas are increasing day by day, and mobile phone users increasingly need fraud calls to be filtered. However, more and more fraudulent communication behavior is concealed and its behavioral characteristics are weakened, so the accuracy and recall with which a mobile phone system identifies bad calls can be further improved only by analyzing and identifying the conversation text.
At present, the fraud call filtering methods available on the market for mobile phone terminal systems are rather primitive. Mainstream manufacturers generally rely on user marking: the user actively marks the category of a call and uploads the number to a server to form a library of marked fraud numbers, which is then used to filter fraud numbers. The drawback of this approach is that fraud calls cannot be detected in real time; they are often discovered only after the victim has already been deceived.
Therefore, how to accurately identify fraud calls based on conversation text has become a technical problem of general concern to mobile phone manufacturers and mobile phone system developers.
Disclosure of Invention
In view of the above, the present invention provides a terminal fraud call identification method based on conversation text word vectors, which can accurately identify fraud calls based on conversation text.
To achieve the above object, the present invention provides a terminal fraud call identification method based on conversation text word vectors, comprising:
step one, a user marks an incoming call in a mobile phone terminal App; for an incoming call marked by the user as a fraud category, the call content is extracted and converted into text after the user grants authorization, the converted text is then presented to the user for review and desensitization, and finally, after the user grants authorization, the reviewed and desensitized text is uploaded to a server and stored as a text sample;
step two, word segmentation and part-of-speech tagging are performed on each text sample in the server and a syntactic dependency label is obtained for each segmented word; a word vector, a character vector, a pinyin vector and a stroke vector are then calculated for each segmented word in the text sample to form a word combination vector of that segmented word; the word combination vector, part-of-speech tag and syntactic dependency label of each segmented word are concatenated to form a content vector of that segmented word; the scene element label to which each segmented word belongs is calculated from its content vector; and finally the content vectors and scene element labels of all segmented words in the text sample are averaged to obtain the semantic vector corresponding to the text sample;
step three, a fraud classification recognition model is constructed whose input is the semantic vector corresponding to a text and whose output is the fraud category to which the text belongs; the model is trained using the text samples uploaded by users to the server as training samples, and the trained model is then pushed from the server to the users' mobile phone terminal App as a model update;
step four, after receiving a new call to be identified, the user's mobile phone terminal App extracts the content text of the call, performs word segmentation, and generates the part-of-speech tags, syntactic dependency labels and word combination vectors of all segmented words in the text; it then obtains the fraud category to which the calling number belongs according to the fraud classification recognition model in the App and prompts the user through an App message.
Compared with the prior art, the invention has the following beneficial effects: it provides a method for identifying conversation text that quickly converts the text into numerical vectors, fuses word vectors, pinyin vectors and stroke vectors, constructs event elements for various fraud scenes on the basis of part-of-speech recognition, and analyzes each fraud scene in a targeted manner from several angles such as event description, subsequent actions and the attitudes of both parties. The method protects user privacy, resolves the semantic deviation caused by homophones written with different characters and by polyphonic characters, and improves the accuracy and recall with which users and manufacturers identify bad calls.
Drawings
Fig. 1 is a flowchart of the terminal fraud call identification method based on conversation text word vectors of the present invention.
Fig. 2 is a flowchart of the specific steps, in step two of Fig. 1, of performing word segmentation and part-of-speech tagging on each text sample to obtain the syntactic dependency label of each segmented word.
Fig. 3 is a flowchart of the specific steps, in step two of Fig. 1, of concatenating the word combination vector, part-of-speech tag and syntactic dependency label of each segmented word in the text sample to form its content vector, calculating the scene element label to which each segmented word belongs from its content vector, and averaging the content vectors and scene element labels of all segmented words in the text sample to obtain the semantic vector corresponding to the text sample.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the accompanying drawings.
As shown in Fig. 1, the present invention provides a terminal fraud call identification method based on conversation text word vectors, comprising:
step one, a user marks an incoming call in a mobile phone terminal App; for an incoming call marked by the user as a fraud category, the call content is extracted and converted into text after the user grants authorization, the converted text is then presented to the user for review and desensitization, and finally, after the user grants authorization, the reviewed and desensitized text is uploaded to a server and stored as a text sample, wherein desensitization means removing sensitive information related to personal identity, such as identity card numbers, names and mobile phone numbers;
step two, word segmentation and part-of-speech tagging are performed on each text sample in the server and a syntactic dependency label is obtained for each segmented word; a word vector, a character vector, a pinyin vector and a stroke vector are then calculated for each segmented word in the text sample to form a word combination vector of that segmented word; the word combination vector, part-of-speech tag and syntactic dependency label of each segmented word are concatenated to form a content vector of that segmented word; the scene element label to which each segmented word belongs is calculated from its content vector; and finally the content vectors and scene element labels of all segmented words in the text sample are averaged to obtain the semantic vector corresponding to the text sample;
step three, a fraud classification recognition model is constructed whose input is the semantic vector corresponding to a text and whose output is the fraud category to which the text belongs; the model is trained using the text samples uploaded by users to the server as training samples, and the trained model is then pushed from the server to the users' mobile phone terminal App as a model update;
step four, after receiving a new call to be identified, the user's mobile phone terminal App extracts the content text of the call, performs word segmentation, and generates the part-of-speech tags, syntactic dependency labels and word combination vectors of all segmented words in the text; it then obtains the fraud category to which the calling number belongs according to the fraud classification recognition model in the App and prompts the user through an App message.
Step one may further comprise:
step 11, after the user installs the mobile phone terminal App, a function for marking incoming calls becomes available; when the user uses this function to mark the current incoming call as a fraud category, the App extracts the content of the first 60 seconds of the call using an HMM algorithm to generate a content text, then removes personal-identity-related information from the content text based on general rules, and finally presents the desensitized text in the App for the user to view;
step 12, the user views the text, may edit it to further complete the desensitization, and then chooses whether to upload the desensitized text that the user has marked as a fraud category to the server; if so, the text and its fraud category mark are uploaded to the server with the user's authorization;
step 13, the text received by the server is cleaned, which comprises removing abnormal characters other than Chinese characters, English letters and digits, uniformly replacing line feeds and placeholders with spaces, and collapsing runs of multiple spaces into a single space;
step 14, the text is cleaned again: the first 180 characters of the text are retained, and texts shorter than 15 characters are discarded.
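Steps 11 to 14 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the two regular expressions standing in for the "general rules" of desensitization (18-character identity numbers and 11-digit mobile numbers) and the function names `desensitize` and `clean_text` are hypothetical. The source specifies only the permitted character classes, the whitespace handling, the 180-character cut-off and the 15-character minimum.

```python
import re

# Assumed stand-ins for the "general rules"; the source does not list them.
ID_CARD = re.compile(r"\d{17}[\dXx]")          # 18-character identity number
MOBILE = re.compile(r"(?<!\d)1\d{10}(?!\d)")   # 11-digit mobile number

MAX_CHARS = 180   # step 14: keep the first 180 characters
MIN_CHARS = 15    # step 14: discard texts shorter than 15 characters


def desensitize(text: str) -> str:
    """Rule-based removal of identity and mobile numbers (step 11)."""
    text = ID_CARD.sub("", text)
    return MOBILE.sub("", text)


def clean_text(text: str):
    """Apply steps 13 and 14; return None when the text is too short."""
    # Line feeds, tabs and other whitespace placeholders become spaces.
    text = re.sub(r"\s", " ", text)
    # Keep only Chinese characters, English letters, digits and spaces.
    text = re.sub(r"[^\u4e00-\u9fffA-Za-z0-9 ]", "", text)
    # Collapse runs of spaces into a single space.
    text = re.sub(r" {2,}", " ", text).strip()
    text = text[:MAX_CHARS]
    return text if len(text) >= MIN_CHARS else None
```

Whitespace is normalized before the character filter so that removing punctuation between two spaces cannot leave a double space behind.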
As shown in Fig. 2, in step two, performing word segmentation and part-of-speech tagging on each text sample to obtain the syntactic dependency label of each segmented word may further comprise:
step 21, generating a stop-word dictionary based on Chinese grammar;
step 22, manually adding common words of the fraud scene as a custom dictionary;
step 23, performing word segmentation and part-of-speech tagging on the text sample using an HMM (hidden Markov model) algorithm based on a DAG (directed acyclic graph) word graph, with the custom dictionary supplied as input to optimize the segmentation result;
step 24, performing syntactic dependency analysis on each segmented word using a fast Offset-based algorithm, and outputting the syntactic dependency label of each segmented word, as shown in the following table:
[Table of syntactic dependency labels: given as images in the original publication and not reproduced in this text]
step 25, filtering stop words out of the text sample using the stop-word dictionary.
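The dictionary-driven part of steps 21 to 25 can be illustrated with a small, dependency-free Python sketch. It builds the DAG of all dictionary words found in a sentence and picks the most probable path by dynamic programming, then filters stop words. The HMM stage for out-of-vocabulary words, the part-of-speech tagging and the dependency analysis are omitted, and the word frequencies, the custom-dictionary entry and the stop words are made-up illustrative values, not data from the source.

```python
import math

# Illustrative (assumed) data: a base dictionary, a custom fraud-scene
# dictionary (step 22) and a stop-word dictionary (step 21).
BASE_FREQ = {"请": 100, "把": 100, "钱": 80, "转": 60, "到": 90,
             "转到": 30, "安全": 50, "账户": 40}
CUSTOM_DICT = {"安全账户": 200}
STOP_WORDS = {"请", "把"}


def build_dag(sentence, freq):
    """Map each start index to the end indices of dictionary words."""
    n = len(sentence)
    dag = {}
    for i in range(n):
        ends = [j for j in range(i, n) if sentence[i:j + 1] in freq]
        dag[i] = ends or [i]          # unknown characters stand alone
    return dag


def best_route(sentence, freq):
    """Right-to-left dynamic programming over the DAG (max log-probability)."""
    log_total = math.log(sum(freq.values()))
    n = len(sentence)
    dag = build_dag(sentence, freq)
    route = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(freq.get(sentence[i:j + 1], 1)) - log_total
             + route[j + 1][0], j)
            for j in dag[i]
        )
    return route


def segment(sentence, freq, stop_words=frozenset()):
    """Segment a sentence, then drop stop words (step 25)."""
    route = best_route(sentence, freq)
    words, i = [], 0
    while i < len(sentence):
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return [w for w in words if w not in stop_words]
```

For example, `segment("请把钱转到安全账户", {**BASE_FREQ, **CUSTOM_DICT}, STOP_WORDS)` keeps "安全账户" as one segmented word, whereas the base dictionary alone splits it into "安全" and "账户". This is the effect the custom dictionary of step 22 is meant to have.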
In step two, calculating the word vector, character vector, pinyin vector and stroke vector of each segmented word to form the word combination vector of each segmented word in the text sample further comprises:
outputting, by the skip-gram method, a word vector $C_{w0}$, a character vector $C_c$, a pinyin vector $C_p$ and a stroke vector $C_b$ for each segmented word, and then constructing the word combination vectors of each segmented word:
[Equations defining the word combination vectors: given as images in the original publication and not reproduced in this text]
where the quantities defined by these equations are several word combination vectors obtained by different combination modes, and sum denotes the summation operation.
The invention uses the skip-gram model to convert words into numerical vectors. The core of skip-gram is a Huffman tree: each word travels from the root of the tree to a leaf node, and a word in its context can be predicted. Each word is iterated N-1 times, yielding predictions for all words in its context. That is, assume a text sample S consists of n words $w_1, \dots, w_n$; for a word $w_t$ and a context window of size k, the occurrence probabilities of the 2k surrounding words can be predicted.
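To make the window of size k concrete, the following Python sketch enumerates the (center, context) training pairs that a skip-gram model learns from. It shows only the standard pair-generation step; the Huffman-tree (hierarchical softmax) training itself is not shown, and the function name `skipgram_pairs` and the sample sentence are illustrative assumptions.

```python
def skipgram_pairs(words, k):
    """Return (center, context) pairs for a context window of size k.

    A word with at least k neighbours on each side contributes 2k pairs;
    words near either end of the sample contribute fewer.
    """
    pairs = []
    for t, center in enumerate(words):
        lo = max(0, t - k)
        hi = min(len(words), t + k + 1)
        for j in range(lo, hi):
            if j != t:
                pairs.append((center, words[j]))
    return pairs


sample = ["您", "的", "账户", "存在", "异常"]
pairs = skipgram_pairs(sample, k=2)
# The middle word "账户" is paired with its 2k = 4 neighbours.
middle = [ctx for center, ctx in pairs if center == "账户"]
```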
As shown in Fig. 3, in step two, concatenating the word combination vector, part-of-speech tag and syntactic dependency label of each segmented word in the text sample to form its content vector, calculating the scene element label to which each segmented word belongs from its content vector, and averaging the content vectors and scene element labels of all segmented words in the text sample to obtain the semantic vector corresponding to the text may further comprise:
step A1, setting a plurality of scene elements and labeling the scene element corresponding to each segmented word in combination with the specific event scene; 12 types of scene elements, shown in the following table, are labeled, and anything not belonging to these 12 types is classified as "other";
[Table of the 12 scene element types: given as an image in the original publication and not reproduced in this text]
step A2, inputting the word combination vector, part-of-speech tag and syntactic dependency label of each segmented word in the text sample into an LSTM model for encoding to obtain the content vector corresponding to each segmented word;
step A3, using self-attention to calculate, from the word combination vector of each segmented word, a weighted influence factor of that segmented word relative to the other segmented words;
step A4, combining the content vector obtained in step A2 and the weighted influence factor obtained in step A3 into a new content vector for each segmented word, and inputting the new content vector into a CNN model, whose output is the scene element corresponding to each segmented word;
step A5, inputting the new content vector and scene element of each segmented word in the text sample into an LSTM model for encoding, combining the LSTM outputs corresponding to all segmented words in the text sample into a vector matrix, and taking the mean along the second dimension of the vector matrix as the semantic vector of the text sample.
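Two of the steps above, the weighted influence factors of step A3 and the averaging of step A5, can be illustrated in plain Python without a deep-learning framework. The sketch rests on assumptions the source does not settle: it uses scaled dot-product attention without learned projections, and it reads the averaged dimension as the segmented-word axis, so that the result is one vector of the hidden size. The LSTM and CNN stages are not shown, and the function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_weights(vectors):
    """Row i holds the influence weights of every word on word i (step A3).

    Assumed variant: scaled dot-product attention, no learned projections.
    """
    d = len(vectors[0])
    weights = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in vectors]
        weights.append(softmax(scores))
    return weights


def mean_pool(matrix):
    """Average the per-word output vectors into one semantic vector (step A5)."""
    n = len(matrix)
    return [sum(column) / n for column in zip(*matrix)]
```

Each row of `self_attention_weights` sums to 1, so it can be read as a distribution of influence over the other segmented words, and `mean_pool` yields a fixed-length vector regardless of how many segmented words the text contains.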
In step three, the fraud classification recognition model may be constructed based on a CNN model.
In step four, the fraud classification recognition model in the mobile phone terminal App works as follows:
the word combination vector, part-of-speech tag and syntactic dependency label of each segmented word in the call text are concatenated to form the content vector of that segmented word; the scene element label to which each segmented word belongs is calculated from its content vector; the content vectors and scene element labels of all segmented words in the call text are averaged to obtain the semantic vector corresponding to the call text; the semantic vector is input into the fraud classification recognition model in the mobile phone terminal App to obtain the fraud category to which the calling number to be identified belongs; the recognized label is pushed to the user as an App message by way of reminder; the user may choose whether to correct the label and to edit and desensitize the text again; and, if the user grants authorization, the text and the label are uploaded to the server for retraining.
It is worth mentioning that, in step two, several word combination vectors may be calculated for each segmented word in the training samples, for example
[Example word combination vectors: given as images in the original publication and not reproduced in this text]
Then, in step three, the semantic vectors obtained from the different word combination vectors are each input into a fraud classification recognition model for training, and, according to the recognition accuracy of the models corresponding to the different word combination vectors, the model with the highest recognition accuracy and its corresponding word combination vector are selected; in step four, the selected model and word combination vector are used to calculate the fraud category to which the calling number to be identified belongs.
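The selection procedure described in this paragraph amounts to training one model per word combination vector and keeping the most accurate pair. A minimal Python sketch follows; the function `select_best`, the callback `train_and_score` and the combination names used in the usage note are hypothetical placeholders, since the source gives the combination formulas only as images and does not specify how accuracy is measured.

```python
def select_best(combination_names, train_and_score):
    """Train one model per word combination and keep the most accurate.

    `train_and_score(name)` is an assumed callback that trains a fraud
    classification model on semantic vectors built from the named word
    combination vector and returns (model, accuracy).
    """
    best_name, best_model, best_acc = None, None, float("-inf")
    for name in combination_names:
        model, acc = train_and_score(name)
        if acc > best_acc:
            best_name, best_model, best_acc = name, model, acc
    return best_name, best_model, best_acc
```

With three candidate combinations, for instance, the caller would pass their names and a training routine, then push the returned model to the App together with the name of the word combination vector it was trained on, so that step four builds its input the same way.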
The above description merely illustrates preferred embodiments of the present invention and is not to be construed as limiting it; any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims (8)

1. A terminal fraud call identification method based on conversation text word vectors, characterized by comprising:
step one, a user marks an incoming call in a mobile phone terminal App; for an incoming call marked by the user as a fraud category, the call content is extracted and converted into text after the user grants authorization, the converted text is then presented to the user for review and desensitization, and finally, after the user grants authorization, the reviewed and desensitized text is uploaded to a server and stored as a text sample;
step two, word segmentation and part-of-speech tagging are performed on each text sample in the server and a syntactic dependency label is obtained for each segmented word; a word vector, a character vector, a pinyin vector and a stroke vector are then calculated for each segmented word in the text sample to form a word combination vector of that segmented word; the word combination vector, part-of-speech tag and syntactic dependency label of each segmented word are concatenated to form a content vector of that segmented word; the scene element label to which each segmented word belongs is calculated from its content vector; and finally the content vectors and scene element labels of all segmented words in the text sample are averaged to obtain the semantic vector corresponding to the text sample;
step three, a fraud classification recognition model is constructed whose input is the semantic vector corresponding to a text and whose output is the fraud category to which the text belongs; the model is trained using the text samples uploaded by users to the server as training samples, and the trained model is then pushed from the server to the users' mobile phone terminal App as a model update;
step four, after receiving a new call to be identified, the user's mobile phone terminal App extracts the content text of the call, performs word segmentation, and generates the part-of-speech tags, syntactic dependency labels and word combination vectors of all segmented words in the text; it then obtains the fraud category to which the calling number belongs according to the fraud classification recognition model in the App and prompts the user through an App message.
2. The method of claim 1, wherein step one further comprises:
step 11, after the user installs the mobile phone terminal App, a function for marking incoming calls becomes available; when the user uses this function to mark the current incoming call as a fraud category, the App extracts the content of the first 60 seconds of the call using an HMM algorithm to generate a content text, then removes personal-identity-related information from the content text based on general rules, and finally presents the desensitized text in the App for the user to view;
step 12, the user views the text, edits it to further complete the desensitization, and then chooses whether to upload the desensitized text that the user has marked as a fraud category to the server; if so, the text and its fraud category mark are uploaded to the server with the user's authorization;
step 13, the text received by the server is cleaned, which comprises removing abnormal characters other than Chinese characters, English letters and digits, uniformly replacing line feeds and placeholders with spaces, and collapsing runs of multiple spaces into a single space;
step 14, the text is cleaned again: the first 180 characters of the text are retained, and texts shorter than 15 characters are discarded.
3. The method according to claim 1, wherein in step two, performing word segmentation and part-of-speech tagging on each text sample to obtain the syntactic dependency label of each segmented word further comprises:
step 21, generating a stop-word dictionary based on Chinese grammar;
step 22, manually adding common words of the fraud scene as a custom dictionary;
step 23, performing word segmentation and part-of-speech tagging on the text sample using an HMM (hidden Markov model) algorithm based on a DAG (directed acyclic graph) word graph, with the custom dictionary supplied as input to optimize the segmentation result;
step 24, performing syntactic dependency analysis on each segmented word using a fast Offset-based algorithm, and outputting the syntactic dependency label of each segmented word;
step 25, filtering stop words out of the text sample using the stop-word dictionary.
4. The method of claim 1, wherein in step two, calculating the word vector, character vector, pinyin vector and stroke vector of each segmented word to form the word combination vector of each segmented word in the text sample further comprises:
outputting, by the skip-gram method, a word vector $C_{w0}$, a character vector $C_c$, a pinyin vector $C_p$ and a stroke vector $C_b$ for each segmented word, and then constructing the word combination vectors of each segmented word:
[Equations defining the word combination vectors: given as images in the original publication and not reproduced in this text]
where the quantities defined by these equations are several word combination vectors obtained by different combination modes, and sum denotes the summation operation.
5. The method according to claim 1, wherein in step two, concatenating the word combination vector, part-of-speech tag and syntactic dependency label of each segmented word in the text sample to form its content vector, calculating the scene element label to which each segmented word belongs from its content vector, and averaging the content vectors and scene element labels of all segmented words in the text sample to obtain the semantic vector corresponding to the text further comprises:
step A1, setting a plurality of scene elements;
step A2, inputting the word combination vector, part-of-speech tag and syntactic dependency label of each segmented word in the text sample into an LSTM model for encoding to obtain the content vector corresponding to each segmented word;
step A3, using self-attention to calculate, from the word combination vector of each segmented word, a weighted influence factor of that segmented word relative to the other segmented words;
step A4, combining the content vector obtained in step A2 and the weighted influence factor obtained in step A3 into a new content vector for each segmented word, and inputting the new content vector into a CNN model, whose output is the scene element corresponding to each segmented word;
step A5, inputting the new content vector and scene element of each segmented word in the text sample into an LSTM model for encoding, combining the LSTM outputs corresponding to all segmented words in the text sample into a vector matrix, and taking the mean along the second dimension of the vector matrix as the semantic vector of the text sample.
6. The method of claim 1, wherein in step three, the fraud classification recognition model is constructed based on a CNN model.
7. The method of claim 1, wherein in step four, the fraud classification recognition model in the mobile phone terminal App works as follows:
the word combination vector, part-of-speech tag and syntactic dependency label of each segmented word in the call text are concatenated to form the content vector of that segmented word; the scene element label to which each segmented word belongs is calculated from its content vector; the content vectors and scene element labels of all segmented words in the call text are averaged to obtain the semantic vector corresponding to the call text; the semantic vector is input into the fraud classification recognition model in the mobile phone terminal App to obtain the fraud category to which the calling number to be identified belongs; the recognized label is pushed to the user as an App message by way of reminder; the user may choose whether to correct the label and to edit and desensitize the text again; and, if the user grants authorization, the text and the label are uploaded to the server for retraining.
8. The method of claim 1, wherein in step two several word combination vectors are calculated for each segmented word in the training samples; in step three, the semantic vectors obtained from the different word combination vectors are each input into a fraud classification recognition model for training, and, according to the recognition accuracy of the models corresponding to the different word combination vectors, the model with the highest recognition accuracy and its corresponding word combination vector are selected; and in step four, the selected model and word combination vector are used to calculate the fraud category to which the calling number to be identified belongs.
CN202010542362.9A 2020-06-15 2020-06-15 Terminal fraud call identification method based on conversation text word vector Active CN111669757B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010542362.9A CN111669757B (en) 2020-06-15 2020-06-15 Terminal fraud call identification method based on conversation text word vector

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010542362.9A CN111669757B (en) 2020-06-15 2020-06-15 Terminal fraud call identification method based on conversation text word vector

Publications (2)

Publication Number Publication Date
CN111669757A true CN111669757A (en) 2020-09-15
CN111669757B CN111669757B (en) 2023-03-14

Family

ID=72387708

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010542362.9A Active CN111669757B (en) 2020-06-15 2020-06-15 Terminal fraud call identification method based on conversation text word vector

Country Status (1)

Country Link
CN (1) CN111669757B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106294845A (en) * 2016-08-19 2017-01-04 清华大学 The many emotions sorting technique extracted based on weight study and multiple features and device
CN108566627A (en) * 2017-11-27 2018-09-21 浙江鹏信信息科技股份有限公司 A kind of method and system identifying fraud text message using deep learning
CN108153727A (en) * 2017-12-18 2018-06-12 浙江鹏信信息科技股份有限公司 Utilize the method for semantic mining algorithm mark sales calls and the system of improvement sales calls
CN110309299A (en) * 2018-04-12 2019-10-08 腾讯科技(深圳)有限公司 Communicate anti-swindle method, apparatus, computer-readable medium and electronic equipment
CN108428447A (en) * 2018-06-19 2018-08-21 科大讯飞股份有限公司 A kind of speech intention recognition methods and device
CN109388801A (en) * 2018-09-30 2019-02-26 阿里巴巴集团控股有限公司 The determination method, apparatus and electronic equipment of similar set of words
CN109451182A (en) * 2018-10-19 2019-03-08 北京邮电大学 A kind of detection method and device of fraudulent call
CN110427608A (en) * 2019-06-24 2019-11-08 浙江大学 A kind of Chinese word vector table dendrography learning method introducing layering ideophone feature

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022121183A1 (en) * 2020-12-11 2022-06-16 平安科技(深圳)有限公司 Text model training method, recognition method, apparatus, device and storage medium
CN112766903A (en) * 2021-01-18 2021-05-07 阿斯利康投资(中国)有限公司 Method, apparatus, device and medium for identifying adverse events
CN112766903B (en) * 2021-01-18 2024-02-06 阿斯利康投资(中国)有限公司 Method, device, equipment and medium for identifying adverse event
WO2022166613A1 (en) * 2021-02-02 2022-08-11 北京有竹居网络技术有限公司 Method and apparatus for recognizing role in text, and readable medium and electronic device
CN113254595A (en) * 2021-06-22 2021-08-13 北京沃丰时代数据科技有限公司 Chatting recognition method and device, electronic equipment and storage medium
CN114091476A (en) * 2021-11-18 2022-02-25 北京淘友天下科技发展有限公司 Dialog recognition method and device, electronic equipment and computer readable storage medium
CN114021564A (en) * 2022-01-06 2022-02-08 成都无糖信息技术有限公司 Segmentation word-taking method and system for social text
CN114021564B (en) * 2022-01-06 2022-04-01 成都无糖信息技术有限公司 Segmentation word-taking method and system for social text
CN117891926A (en) * 2024-03-15 2024-04-16 环球数科集团有限公司 Text feature fraud early warning system based on artificial intelligence
CN117891926B (en) * 2024-03-15 2024-05-14 环球数科集团有限公司 Text feature fraud early warning system based on artificial intelligence

Also Published As

Publication number Publication date
CN111669757B (en) 2023-03-14

Similar Documents

Publication Publication Date Title
CN111669757B (en) Terminal fraud call identification method based on conversation text word vector
CN112804400B (en) Customer service call voice quality inspection method and device, electronic equipment and storage medium
CN107729309B (en) Deep learning-based Chinese semantic analysis method and device
CN110444198B (en) Retrieval method, retrieval device, computer equipment and storage medium
CN111695352A (en) Grading method and device based on semantic analysis, terminal equipment and storage medium
CN110362822B (en) Text labeling method, device, computer equipment and storage medium for model training
CN112380853B (en) Service scene interaction method and device, terminal equipment and storage medium
CN108536654A (en) Identify textual presentation method and device
CN112992125B (en) Voice recognition method and device, electronic equipment and readable storage medium
CN107967250B (en) Information processing method and device
CN111223476B (en) Method and device for extracting voice feature vector, computer equipment and storage medium
CN112699683A (en) Named entity identification method and device fusing neural network and rule
CN113240510A (en) Abnormal user prediction method, device, equipment and storage medium
CN112235470B (en) Incoming call client follow-up method, device and equipment based on voice recognition
CN110633475A (en) Natural language understanding method, device and system based on computer scene and storage medium
CN112784573A (en) Text emotion content analysis method, device and equipment and storage medium
CN116610772A (en) Data processing method, device and server
KR101440887B1 (en) Method and apparatus of recognizing business card using image and voice information
CN116741155A (en) Speech recognition method, training method, device and equipment of speech recognition model
CN110728145A (en) Method for establishing natural language understanding model based on recording conversation
CN113420549B (en) Abnormal character string identification method and device
CN113470617B (en) Speech recognition method, electronic equipment and storage device
CN111464687A (en) Strange call request processing method and device
CN114595332A (en) Text classification prediction method and device and electronic equipment
CN113590828A (en) Method and device for acquiring call key information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100029 Beijing city Chaoyang District Yumin Road No. 3

Patentee after: NATIONAL COMPUTER NETWORK AND INFORMATION SECURITY MANAGEMENT CENTER

Patentee after: Xinxun Digital Technology (Hangzhou) Co.,Ltd.

Address before: 100029 Beijing city Chaoyang District Yumin Road No. 3

Patentee before: NATIONAL COMPUTER NETWORK AND INFORMATION SECURITY MANAGEMENT CENTER

Patentee before: EB Information Technology Ltd.
