CN117119104B - Telecom fraud active detection processing method based on virtual character orientation training - Google Patents

Info

Publication number
CN117119104B
Authority
CN
China
Prior art keywords
fraud
call
words
preset
keywords
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311384816.4A
Other languages
Chinese (zh)
Other versions
CN117119104A (en)
Inventor
王骕
王凯
鲁兆聪
陈瑞
王礼胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Zhiyu Information Technology Co ltd
Original Assignee
Nanjing Zhiyu Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Zhiyu Information Technology Co ltd
Priority to CN202311384816.4A
Publication of CN117119104A
Application granted
Publication of CN117119104B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/22 Arrangements for supervision, monitoring or testing
    • H04M3/2281 Call monitoring, e.g. for law enforcement purposes; Call tracing; Detection or prevention of malicious calls
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/005 Language recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/22 Arrangements for supervision, monitoring or testing
    • H04M3/2218 Call detail recording
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/22 Arrangements for supervision, monitoring or testing
    • H04M3/2254 Arrangements for supervision, monitoring or testing in networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/487 Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4936 Speech interaction details
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L2015/088 Word spotting
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Technology Law (AREA)
  • Computer Security & Cryptography (AREA)
  • Artificial Intelligence (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention relates to the technical field of telecommunication fraud detection and discloses a telecom fraud active detection processing method based on virtual character orientation training, comprising the following steps: conducting a call in response to a call request, identifying fraudulent behavior through speech recognition of the call, and replying with designated information corresponding to the fraudulent behavior based on a preset reply corpus; recording the call as evidence, recording information related to the call, and sending the recording and the related information to an encrypted evidence database and a suspicious-person information database, respectively. Using speech recognition and intelligent response technology, the method determines in real time whether the current call is telecommunication fraud, based on a first similarity value computed for individual words against the preset keywords and a second similarity value computed over the multiple words matched to preset keywords during the call. Detection is therefore timely, and information related to the telecommunication fraud is actively collected and provided to the relevant departments to prevent fraud.

Description

Telecom fraud active detection processing method based on virtual character orientation training
Technical Field
The invention relates to the technical field of telecommunication fraud detection, in particular to a telecommunication fraud active detection processing method based on virtual character directional training.
Background
In the prior art, telecommunication fraud calls are not identified in real time: fraudulent behavior can be confirmed only after the call has ended, so detection is not timely.
Disclosure of Invention
An embodiment of the invention aims to provide a telecom fraud active detection processing method based on virtual character orientation training, which identifies telecommunication fraud in progress through speech recognition and intelligent response technology, actively collects information and evidence related to the fraud, and provides them to the relevant departments to prevent fraudulent activity.
In order to solve the above technical problems, a first aspect of the embodiments of the present invention provides a telecom fraud active detection processing method based on virtual character orientation training, including the following steps:
conducting a call in response to a call request, identifying fraudulent behavior through speech recognition of the call, and replying with designated information corresponding to the fraudulent behavior based on a preset reply corpus: extracting and recognizing words in the speech of the call, comparing each recognized word with the keyword speech data in a preset keyword list, and calculating a first similarity value expressing how similar the word is to a keyword in the preset keyword list; when the first similarity value of the word is greater than a first preset proportion value, determining that the word is a keyword in the preset keyword list; when a plurality of words in the call have been determined to be keywords, calculating, according to the weight values of those keywords in the preset keyword list, a second similarity value expressing how likely the call is to be a telecommunication fraud call; when the second similarity value is greater than a second preset proportion value, determining that the current call is a telecommunication fraud call and calling the preset reply corpus to reply;
recording the call as evidence, recording information related to the call, and sending the recording and the related information to an encrypted evidence database and a suspicious-person information database, respectively, wherein the recording and evidence collection is authorized;
wherein the related information includes: voice, text, personnel, and/or network address information; the preset keyword list comprises a plurality of sub-lists, wherein the sub-lists respectively correspond to common words used in corresponding preset types of telecommunication fraud activities; the preset corpus comprises a plurality of audio data respectively corresponding to the telecommunication fraud types.
Further, the calculating a first similarity value of the similarity degree between the word and the keyword in the preset keyword list includes:
acquiring the acoustic score and syllable number of the words;
obtaining the standard syllable number of the corresponding keyword;
calculating a first similarity value of the word and the keyword, wherein the calculation formula is as follows:
[formula not reproduced in the source text]
wherein the quantities in the formula are: the acoustic score at the i-th occurrence of the word; n, the number of times the word has appeared; the number of syllables contained in the word; the number of syllables contained in the keyword; and the number of the keyword corresponding to the word.
Further, the second similarity value is calculated by the following formula:
[formula not reproduced in the source text]
wherein m is the number of words determined to be keywords during the call; j is the sequence number of a word determined to be a keyword during the call; the formula further uses the first similarity value between the j-th word determined to be a keyword and that keyword, and the weight value, in the preset keyword list, of the keyword corresponding to the j-th word, wherein the weight value is proportional to the frequency with which the word occurs in telecommunication fraud.
Further, the calculating process of the second similarity value further includes:
calculating the similarity between every two occurrences of the same word that appears repeatedly in the current call:
[formula not reproduced in the source text]
wherein s and t are occurrence indices of the j-th word in the current call, and the values of s and t are different; k is a positive real number; the formula further uses the signal-to-noise ratio values of the audio frames corresponding to the s-th and t-th occurrences of the word, the variances of those signal-to-noise ratio values, and the covariance of the signal-to-noise ratio values of the s-th and t-th occurrences;
obtaining the average value of the pairwise similarities of the j-th word in the current call:
[formula not reproduced in the source text]
calibrating the second similarity value according to the average value of the similarities, wherein the calibrated second similarity value is calculated by the following formula:
[formula not reproduced in the source text]
Further, before the call is conducted in response to the call request, the method further includes:
generating a virtual victim role based on the virtual victim role model, carrying out asynchronous requests of different network platforms through a web crawler technology, and carrying out distributed registration, login, reply and/or sharing operation according to the identity of the virtual victim role.
Further, before generating the virtual victim role based on the virtual victim role model, the method further comprises:
constructing a virtual victim role model, and generating a standardized victim role model according to preset values, wherein the victim role model comprises the following components: character attribute, behavior attribute and communication attribute;
acquiring telecom fraud history data, and training the virtual victim character model based on the telecom fraud history data;
acquiring real victim information in a new telecom fraud case, and continuously training the virtual victim role model through a machine learning algorithm;
wherein the telecom fraud history data and the real victim information in the new telecom fraud case are both legal data that have been authorized for use.
Further, before the constructing the virtual victim role model, the method further comprises:
obtaining a modeling data list from the different network platforms;
the modeling data list includes, for each network platform: the request method, the Cookie and Session, the result returned by a request, the operation interface, the transaction interface, the logout interface and the account-switching interface.
Further, the extracting and identifying the words in the call audio further includes:
acquiring the audio signal of the word, and recognizing the audio signal with a multi-dialect speech recognition model to obtain a plurality of text forms of the word, one for each dialect;
searching the preset keyword list for the plurality of text forms to obtain the text forms that are keywords in the preset keyword list;
comparing the text forms found in the preset keyword list with the keyword obtained through audio signal comparison and, if they are consistent, determining that the word is a keyword in the preset keyword list.
Further, the recording the related information of the call includes:
converting the voice information of the call into corresponding text, organizing, escaping and formatting the text using natural language processing, and storing the voice information and the text information separately in the encrypted evidence database.
Accordingly, a second aspect of an embodiment of the present invention provides an electronic device, including: at least one processor; and a memory coupled to the at least one processor; the memory stores instructions executable by the at least one processor, so that the at least one processor performs the telecommunication fraud active detection processing method based on virtual character orientation training.
Accordingly, a third aspect of embodiments of the present invention provides a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the above-described telecommunications fraud active detection processing method based on virtual character orientation training.
The technical scheme provided by the embodiment of the invention has the following beneficial technical effects:
1. Accurate detection: the invention accurately identifies sensitive words during the call, replies accordingly based on the preset reply corpus, and, once the fraudulent behavior is confirmed, collects related information such as voice, text, personnel and network addresses.
2. Strong adaptability: compared with traditional telecom fraud detection, the invention uses virtual characters for orientation training, provides victim role models of many clearly separated types, and creates a variety of roles from the corresponding models. The roles can include victims of order-brushing rebate fraud, investment fraud, fake loan fraud, and fake romance and dating fraud, so the method can adapt to different fraud scenarios.
3. Continuous optimization: the invention keeps learning from the training of the virtual roles, improves with new interaction data, and adapts to continuously evolving new forms of telecommunication fraud.
4. Automated processing: training and interaction based on the virtual roles identify telecommunication fraud automatically, which reduces manual intervention and improves efficiency.
Drawings
FIG. 1 is a flow chart of a telecommunication fraud active detection processing method based on virtual character orientation training provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of a telecom fraud active detection processing system architecture based on virtual character orientation training according to an embodiment of the present invention.
Detailed Description
The objects, technical solutions and advantages of the present invention will become more apparent by the following detailed description of the present invention with reference to the accompanying drawings. It should be understood that the description is only illustrative and is not intended to limit the scope of the invention. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention.
In this telecommunication fraud active detection processing method, the user can only answer calls passively during actual detection and cannot place calls. While a call is being answered, the method identifies in real time whether the other party is engaged in telecommunication fraud; this conduct complies with the laws and regulations of the relevant country and region. After a call is confirmed to involve telecommunication fraud, the voice and text information obtained is provided only to the relevant departments as evidence for identification and verification. It is not disclosed to the public and is stored in a dedicated encrypted database to prevent leakage of personal privacy.
Referring to fig. 1, a first aspect of the present invention provides a telecom fraud active detection processing method based on virtual character orientation training, which includes the following steps:
Step S200, a call is conducted in response to the call request, fraudulent behavior is identified through speech recognition of the call, and designated information corresponding to the fraudulent behavior is replied based on a preset reply corpus. Words in the speech of the call are extracted and recognized, each recognized word is compared with the keyword speech data in a preset keyword list, and a first similarity value expressing how similar the word is to a keyword in the preset keyword list is calculated; when the first similarity value of the word is greater than a first preset proportion value, the word is determined to be a keyword in the preset keyword list; when a plurality of words in the call have been determined to be keywords, a second similarity value expressing how likely the call is to be a telecommunication fraud call is calculated according to the weight values of those keywords in the preset keyword list; when the second similarity value is greater than a second preset proportion value, the current call is determined to be a telecommunication fraud call, and the preset reply corpus is called to reply.
When the first similarity value between a word appearing in the call and a keyword in the preset keyword list is greater than the first preset proportion value, the word can be confirmed to be one of the common telecom fraud terms in the preset keyword list, and the call is then likely to be an act of telecommunication fraud.
A single word that may be a telecom fraud term is not enough to confirm that a call is fraudulent, because such words also occur in everyday conversation; the other words in the call that are keywords in the preset keyword list must be considered together. The keywords appearing in the call are therefore judged as a whole, and the likelihood that the current call is a fraud call is determined by calculating the second similarity value.
Different types of telecommunication fraud use different phone scripts, so the keywords that appear frequently differ from type to type. Depending on the type of fraud and the weight of each keyword, the second preset proportion value can be set in the range of 50% to 70%, with the specific value adjusted according to the type of fraud.
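The two-stage decision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented formula: the function name `detect_fraud`, the threshold defaults, and the example keywords are hypothetical, and the second similarity value is assumed here to be the weight-normalized sum of the first similarity values of the confirmed keywords within one sub-list.

```python
def detect_fraud(first_similarities, sublist_weights,
                 first_threshold=0.8, second_threshold=0.6):
    """Return (is_fraud, second_similarity) for one call.

    first_similarities: word -> first similarity value in [0, 1]
    sublist_weights:    keyword -> weight for one fraud-type sub-list
    Both thresholds are assumed values; the text only states that the
    second one may lie between 50% and 70%.
    """
    # Stage 1: a word counts as a keyword only if its first similarity
    # exceeds the first preset proportion value.
    confirmed = {
        word: sim
        for word, sim in first_similarities.items()
        if word in sublist_weights and sim > first_threshold
    }
    # The text requires several keywords before judging the whole call.
    if len(confirmed) < 2:
        return False, 0.0
    # Stage 2 (assumed form): weighted sum normalized by the total weight
    # of the sub-list, so the score stays in [0, 1].
    total_weight = sum(sublist_weights.values())
    score = sum(sublist_weights[w] * s for w, s in confirmed.items()) / total_weight
    return score > second_threshold, score


if __name__ == "__main__":
    weights = {"safe account": 0.4, "transfer": 0.3, "verification code": 0.3}
    sims = {"safe account": 0.9, "transfer": 0.9, "verification code": 0.5}
    print(detect_fraud(sims, weights))
```

With this normalization, a call scores highly only when the confirmed keywords carry most of the weight of a sub-list, which is one way to make a 50% to 70% threshold meaningful.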
Step S400, the call is recorded as evidence, information related to the call is recorded, and the recording and the related information are sent to an encrypted evidence database and a suspicious-person information database, respectively.
The call must be recorded in order to retain the original data of the call. The recording and evidence collection during the call has been authorized by the departments responsible for handling telecommunication fraud, and the recorded audio data is restricted to use in the work of those departments.
Wherein the related information includes: voice, text, personnel, and/or network address information; the preset keyword list comprises a plurality of sub-lists, and the sub-lists respectively correspond to common words used in corresponding preset types of telecommunication fraud activities; the preset corpus comprises a plurality of audio data respectively corresponding to the telecommunication fraud types.
The preset keyword list in the invention comprises a plurality of sub-lists, one for each type of telecommunication fraud. Each sub-list holds the keywords commonly used in one fraud type, and the keywords with the highest frequency of occurrence for that type are listed in the corresponding sub-list. When words are extracted from a call in real time, they are compared with all keywords in the preset keyword list to determine whether they are common telecom fraud keywords, and thus whether the call is telecommunication fraud or merely an ordinary call or a call caused by another user misdialing.
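One possible in-memory layout for such a keyword list is a mapping from fraud type to a sub-list of weighted keywords. The sketch below is illustrative only: the fraud-type names, keywords and weights are hypothetical placeholders, not values from the patent.

```python
# Hypothetical preset keyword list: fraud type -> {keyword: weight}.
PRESET_KEYWORDS = {
    "impersonating_authorities": {"arrest warrant": 0.5, "safe account": 0.5},
    "fake_loan": {"processing fee": 0.6, "credit limit": 0.4},
}


def find_fraud_types(word):
    """Return the fraud types whose sub-list contains the given word."""
    return sorted(
        fraud_type
        for fraud_type, sublist in PRESET_KEYWORDS.items()
        if word in sublist
    )


if __name__ == "__main__":
    print(find_fraud_types("safe account"))
    print(find_fraud_types("hello"))
```

Keeping the weights inside each sub-list lets the same keyword carry a different weight in different fraud types.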
The preset reply corpus stores audio prerecorded by real people, organized by gender, age, education level and other conditions. The recordings cover the various questions that may arise in current telecommunication fraud and the emotional reactions a real victim would show. After the fraud type and the specific fraud technique have been identified by speech recognition during the call, the corresponding real-person audio in the preset reply corpus is called and played as the reply.
In addition, to make the playback of the real-person audio in the preset reply corpus more convincing, filler words such as "mm" and "oh" can be added where appropriate during the reply, following the way ordinary people speak on the phone.
Specifically, the calculating the first similarity value of the similarity degree between the word and the keyword in the preset keyword list in step S200 includes:
Step S211, obtaining the acoustic score and the syllable number of the word.
Acoustic scoring refers to the process of analyzing and evaluating speech signals to determine the quality or confidence of speech. The acoustic score may be used in quality assessment, noise detection, speaker recognition, etc. of the speech recognition system. In general, the higher the acoustic score, the better the quality of the speech signal, thereby helping to improve the accuracy of speech recognition.
To obtain the acoustic score of a word, factors that affect speech quality, such as distortion, noise and dropouts, are evaluated using the root-mean-square error or the signal-to-noise ratio. The speech signal is then modeled and compared using a Gaussian mixture model or an artificial neural network, which outputs the words in the audio together with the acoustic score corresponding to each word.
Syllables are the basic phonetic units that make up a word and can be used to measure the length or phonological structure of a word. The number of syllables refers to the number of syllables included in one word. There are a number of ways to determine the number of syllables of a word, one common way is to determine it based on spelling rules of the word. In general, each vowel letter (a, e, i, o, u and y) typically corresponds to a syllable. Consonant letters can appear alone in a syllable or can be combined with previous syllables to form syllables.
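The spelling-based approach described above can be approximated with a short heuristic. The sketch below is an assumption-laden simplification for words written in Latin letters: it counts runs of consecutive vowel letters rather than individual vowels, and the function name `count_syllables` is hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate the syllable count of a word from its spelling.

    Heuristic: each run of consecutive vowel letters (a, e, i, o, u, y)
    is treated as one syllable; every word has at least one syllable.
    """
    vowel_groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(vowel_groups))


if __name__ == "__main__":
    for w in ("bank", "transfer", "account"):
        print(w, count_syllables(w))
```

Silent letters and similar exceptions are not handled, so the result is only an estimate.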
The words recognized during the call are examined to determine the syllable number of each word actually detected.
Step S212, obtaining the standard syllable number of the corresponding keyword.
The standard syllable number of each keyword in the preset keyword list is calculated in advance and stored alongside the keyword. When the first similarity value is calculated, the pre-stored standard syllable number is retrieved directly and compared with the syllable number of the recognized word.
Step S213, calculating a first similarity value between the word and the keyword, wherein the calculation formula is as follows:
[formula not reproduced in the source text]
wherein the quantities in the formula are: the acoustic score at the i-th occurrence of the word; n, the number of times the word has appeared; the number of syllables contained in the word; the number of syllables contained in the keyword; and the number of the keyword corresponding to the word.
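Because the formula itself does not survive in this text, the following is only a hypothetical illustration of how the listed quantities could be combined: the mean acoustic score over the n occurrences of the word, scaled by the ratio of the smaller to the larger syllable count. Both the function name `first_similarity` and this particular combination are assumptions, not the patent's formula.

```python
def first_similarity(acoustic_scores, word_syllables, keyword_syllables):
    """Hypothetical first similarity value in [0, 1].

    acoustic_scores:   acoustic score of each occurrence of the word,
                       assumed to be normalized to [0, 1]
    word_syllables:    syllable count of the recognized word
    keyword_syllables: pre-stored standard syllable count of the keyword
    """
    if not acoustic_scores:
        raise ValueError("at least one occurrence is required")
    if word_syllables <= 0 or keyword_syllables <= 0:
        raise ValueError("syllable counts must be positive")
    # Average confidence over all occurrences of the word so far.
    mean_score = sum(acoustic_scores) / len(acoustic_scores)
    # Penalize a mismatch between detected and standard syllable counts.
    syllable_ratio = (min(word_syllables, keyword_syllables)
                      / max(word_syllables, keyword_syllables))
    return mean_score * syllable_ratio


if __name__ == "__main__":
    print(first_similarity([0.9, 0.8], 3, 3))
    print(first_similarity([0.9], 2, 3))
```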
Specifically, the second similarity value in step S200 is calculated by the following formula:
[formula not reproduced in the source text]
wherein m is the number of words determined to be keywords during the call; j is the sequence number of a word determined to be a keyword during the call; the formula further uses the first similarity value between the j-th word determined to be a keyword and that keyword, and the weight value, in the preset keyword list, of the keyword corresponding to the j-th word, wherein the weight value is proportional to the frequency with which the word occurs in telecommunication fraud.
In addition, in order to further improve the accuracy of the second similarity value, its calculation further includes:
Step S221, calculating the similarity between every two occurrences of the same word that appears repeatedly in the current call:
[formula not reproduced in the source text]
wherein s and t are occurrence indices of the j-th word in the current call, and the values of s and t are different; k is a positive real number; the formula further uses the signal-to-noise ratio values of the audio frames corresponding to the s-th and t-th occurrences of the word, the variances of those signal-to-noise ratio values, and the covariance of the signal-to-noise ratio values of the s-th and t-th occurrences.
Step S222, obtaining the average value of the pairwise similarities of the j-th word in the current call:
[formula not reproduced in the source text]
Step S223, calibrating the second similarity value according to the average value of the similarities, wherein the calibrated second similarity value is calculated by the following formula:
[formula not reproduced in the source text]
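The quantities named for the pairwise similarity (signal-to-noise ratio values, their variances, their covariance, and a positive constant k) resemble the ingredients of a structural-similarity-style index. The sketch below assumes that form purely for illustration; it is not confirmed by the text, the function names are hypothetical, and the per-frame SNR sequences of two occurrences are assumed to have equal length.

```python
from itertools import combinations


def _mean(values):
    return sum(values) / len(values)


def pair_similarity(snr_s, snr_t, k=1.0):
    """Assumed SSIM-style similarity between two occurrences of a word.

    snr_s, snr_t: per-frame signal-to-noise ratio values (equal length)
    k:            positive constant that stabilizes the ratios
    """
    if len(snr_s) != len(snr_t) or not snr_s:
        raise ValueError("SNR sequences must be non-empty and equal in length")
    mu_s, mu_t = _mean(snr_s), _mean(snr_t)
    var_s = _mean([(a - mu_s) ** 2 for a in snr_s])
    var_t = _mean([(b - mu_t) ** 2 for b in snr_t])
    cov = _mean([(a - mu_s) * (b - mu_t) for a, b in zip(snr_s, snr_t)])
    numerator = (2 * mu_s * mu_t + k) * (2 * cov + k)
    denominator = (mu_s ** 2 + mu_t ** 2 + k) * (var_s + var_t + k)
    return numerator / denominator


def mean_pairwise_similarity(occurrences, k=1.0):
    """Average pair_similarity over every pair of distinct occurrences."""
    pairs = list(combinations(occurrences, 2))
    if not pairs:
        raise ValueError("at least two occurrences are required")
    return _mean([pair_similarity(a, b, k) for a, b in pairs])


if __name__ == "__main__":
    x = [10.0, 12.0, 11.0]
    y = [10.0, 11.0, 12.0]
    print(pair_similarity(x, x), pair_similarity(x, y))
```

How the average is then used to calibrate the second similarity value is not recoverable from this text, so that step is deliberately left out of the sketch.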
In addition, in another implementation of the embodiment of the present invention, extracting and recognizing the words in the speech of the call in step S200 further includes:
Step S201, acquiring the audio signal of the word, and recognizing the audio signal with a multi-dialect speech recognition model to obtain a plurality of text forms of the word, one for each dialect.
Step S202, searching the preset keyword list for the plurality of text forms to obtain the text forms that are keywords in the preset keyword list.
Step S203, comparing the text forms found in the preset keyword list with the keyword obtained through audio signal comparison and, if they are consistent, determining that the word is a keyword in the preset keyword list.
At present, the people who carry out telecommunication fraud show fairly pronounced regional characteristics. The audio signal of the call is recognized by a multi-dialect speech recognition model that covers the relevant regions, and the possible text forms of a word's audio signal are determined for the different dialects. These candidate text forms are looked up in the text content of the preset keyword list to obtain the keywords they match, and those keywords are compared with the keyword recognized through audio comparison to obtain the confirmed keyword. This avoids returning wrong information because of recognition errors caused by fraudsters speaking in regional dialects during the call.
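The cross-check between the dialect transcriptions and the audio comparison can be sketched as below. The function name `confirm_keyword`, the dialect labels and the example strings are hypothetical; the multi-dialect recognizer itself is outside the scope of the sketch and is represented only by its output.

```python
def confirm_keyword(dialect_transcripts, audio_keyword, keyword_list):
    """Confirm a keyword only if text search and audio comparison agree.

    dialect_transcripts: dialect name -> text form produced for the word
    audio_keyword:       keyword matched through audio signal comparison
    keyword_list:        set of keywords in the preset keyword list
    Returns the confirmed keyword, or None if the two results disagree.
    """
    # Keep only the candidate text forms that are keywords in the list.
    text_hits = {
        text for text in dialect_transcripts.values() if text in keyword_list
    }
    # The word is accepted only when the audio-based match is among them.
    if audio_keyword in text_hits:
        return audio_keyword
    return None


if __name__ == "__main__":
    transcripts = {"dialect_a": "safe account", "dialect_b": "save a count"}
    keywords = {"safe account", "transfer"}
    print(confirm_keyword(transcripts, "safe account", keywords))
    print(confirm_keyword(transcripts, "transfer", keywords))
```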
In one specific implementation manner of the embodiment of the present invention, before the call according to the call request in step S200, the method further includes:
Step S120, generating a virtual victim role based on the virtual victim role model, issuing asynchronous requests to different network platforms through web crawler technology, and carrying out distributed registration, login, reply and/or sharing operations under the identity of the virtual victim role.
Asynchronous requests to multiple network platforms based on web crawler technology can be implemented with a programming language (e.g. Python) and associated crawler libraries (e.g. Scrapy, BeautifulSoup), retrieving data from multiple web pages at the same time. Asynchronous requests allow several requests to be sent in parallel without one blocking another, so responses are processed more efficiently and crawling throughput improves.
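The idea of issuing several requests concurrently can be illustrated with Python's standard `asyncio` module. In this minimal sketch the network call is simulated with `asyncio.sleep`, so no real platform is contacted; the coroutine names and the example URLs are hypothetical, and a real crawler would replace `fetch` with an actual HTTP client call.

```python
import asyncio


async def fetch(url: str, delay: float = 0.01) -> str:
    """Stand-in for a network request; sleeps instead of doing real I/O."""
    await asyncio.sleep(delay)
    return f"response from {url}"


async def fetch_all(urls):
    """Start all requests at once and wait for every response.

    asyncio.gather returns results in the same order as the inputs,
    regardless of which simulated request finishes first.
    """
    return await asyncio.gather(*(fetch(url) for url in urls))


if __name__ == "__main__":
    platforms = ["https://platform-a.example", "https://platform-b.example"]
    for line in asyncio.run(fetch_all(platforms)):
        print(line)
```

Because the waits overlap, the total time is close to the slowest single request rather than the sum of all requests.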
The purpose of publishing this information is to make the virtual role appear as an ordinary user on suspicious platforms. A potential fraud victim generally attracts the attention of telecommunication fraud personnel only after carrying out such normal operations on certain platforms beforehand.
Further, before the virtual victim character is generated and uploaded to the internet in step S120, the method further includes:
step S111, constructing a virtual victim role model, and generating a standardized victim role model according to preset values, wherein the victim role model comprises the following components: character attributes, behavior attributes, and communication attributes.
According to the classification of real fraud victims, the virtual victim roles may correspond to the following fraud types: order-brushing rebate fraud; fake online investment and wealth-management fraud; fake online loan fraud; impersonation of logistics customer service; impersonation of public security and judicial authorities; fake credit-reporting fraud; fake shopping and service fraud; impersonation of leaders and acquaintances; fraudulent trading of online game products; romance and dating fraud; and the like.
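One possible way to represent a standardized victim role with its three attribute groups is sketched below. The class names, the `PRESETS` table, and every field inside it are hypothetical; the source names only the three attribute groups and the fraud types, not their concrete contents.

```python
from dataclasses import dataclass, field
from enum import Enum


class FraudType(Enum):
    # A few of the fraud types listed above; the full list would be added the same way.
    REBATE = "order-brushing rebate fraud"
    FAKE_LOAN = "fake online loan fraud"
    ROMANCE = "romance and dating fraud"


# Hypothetical preset values; the source does not list concrete fields.
PRESETS = {
    "character": {"age_band": "unspecified", "occupation": "unspecified"},
    "behavior": {"platform_activity": "low"},
    "communication": {"preferred_channel": "phone"},
}


@dataclass
class VictimRole:
    fraud_type: FraudType
    character: dict = field(default_factory=dict)      # character attributes
    behavior: dict = field(default_factory=dict)       # behavior attributes
    communication: dict = field(default_factory=dict)  # communication attributes


def standard_role(fraud_type: FraudType) -> VictimRole:
    # Copy each preset group so that roles never share mutable state.
    return VictimRole(
        fraud_type=fraud_type,
        character=dict(PRESETS["character"]),
        behavior=dict(PRESETS["behavior"]),
        communication=dict(PRESETS["communication"]),
    )
```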
Step S112, obtaining telecom fraud history data, and training the virtual victim role model based on the telecom fraud history data.
Historical telecommunication fraud data is usually stored in the databases of the agencies that handle fraud cases. After authorization is obtained from those agencies, the historical data on telecommunication fraud is acquired and the virtual victim role model is trained on it, improving the accuracy and reliability of the model.
Step S113, new real victim information is acquired, and the virtual victim character model is continuously trained through a machine learning algorithm.
Given the current high incidence of telecommunication fraud, and in order to identify and judge new types of fraud accurately, update data from the relevant agencies' databases can be obtained periodically after the corresponding authorization is granted, and the virtual victim role model is then continuously trained on that update data to further improve its accuracy.
The telecom fraud history data and the real victim information in new telecom fraud cases are both legal data whose use has been authorized. The telecom fraud history data is non-public; during use it is stored in a non-public encrypted database, which ensures data security and ensures that the real victim information in the history data is not disclosed.
In addition, real victims and virtual victim roles can be compared side by side: after the real victim information is learned, a list with the same fields is established for both, and the items missing from the virtual role are checked one by one.
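The field-by-field comparison can be sketched as follows. The function `missing_fields` and the example field names are assumptions made for illustration; records are modeled as plain dictionaries sharing one field list.

```python
def missing_fields(real_victim: dict, virtual_role: dict) -> list[str]:
    """List fields filled in for the real victim but absent or empty in the virtual role."""
    empty = (None, "")
    return [
        name
        for name, value in real_victim.items()
        # Only fields that carry information in the real record are worth copying over.
        if value not in empty and virtual_role.get(name) in empty
    ]
```

For example, `missing_fields({"age_band": "30-39", "platforms": "forum"}, {"age_band": "30-39"})` reports `["platforms"]`, the one item the virtual role still lacks.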
Further, before the virtual victim role model is constructed, the method further comprises:
step S101, obtaining a modeling data list from a plurality of different network platforms. Specifically, the modeling data list includes: the request method, Cookie and Session, request response, operation interface, transaction interface, logout interface, and account switching interface.
Before the model is built, the technical scheme of the invention needs to acquire the data list required for modeling from different network platforms. After authorization is obtained from each network platform, the information required for modeling, such as that platform's request method, Cookie and Session, request response, operation interface, transaction interface, logout interface, and account switching interface, is acquired. This information and the model are stored in a non-public encrypted database to ensure data security.
The modeling metadata is obtained from the network platforms most commonly used by ordinary users, and the virtual victim role model is constructed from the modeling data list described above.
According to the above technical scheme, the telecom fraud active detection method based on virtual character orientation training uses the generated virtual characters to interact with telecommunication fraud personnel. On the one hand, the virtual characters can be trained in a simulated environment to acquire various data and information, which saves labor cost; on the other hand, continuous learning allows timely automatic updating, so that continuously evolving new types of fraud can be handled. The approach turns the fight against telecommunication fraud from passive to active: instead of tracing the dialing source of a fraud through large volumes of communication data and call records, as in the traditional approach, the system plays along with the fraud personnel during the call to obtain specific information about the fraud, so that the fraud can be struck in advance. In addition, the interaction data between the virtual characters and the fraud personnel is continuously collected, identified, and judged, and the resulting evidence is handed over to the relevant departments for processing.
Further, the recording of the call related information in step S400 includes:
step S410, converting the voice information of the call into corresponding text content, sorting, escaping and formatting through natural language processing technology, and respectively storing the voice information and the text information into an encryption evidence database.
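The sorting, escaping, and formatting part of step S410 might look like the sketch below. It assumes speech recognition has already produced transcript segments as dictionaries with hypothetical `start`, `speaker`, and `text` keys; encryption and database storage are outside the sketch.

```python
import html
import json


def format_transcript(segments: list[dict]) -> str:
    """Sort transcript segments by start time, escape their text, and emit JSON."""
    ordered = sorted(segments, key=lambda seg: seg["start"])  # sorting
    cleaned = [
        {
            "start": seg["start"],
            "speaker": seg["speaker"],
            # Collapse stray whitespace, then escape markup-significant characters.
            "text": html.escape(" ".join(seg["text"].split())),
        }
        for seg in ordered
    ]
    # A stable, human-readable layout for the evidence record (formatting).
    return json.dumps(cleaned, ensure_ascii=False, indent=2)
```

Escaping before storage keeps recognized text from being interpreted as markup if the record is later rendered in a web-based display layer.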
The technical scheme provides a set of victim role models that are rich in type and clearly divided, and automates behaviors such as registration, login, reply, and sharing. Victim roles that are susceptible to order-brushing rebate fraud, fake loan and wealth-management fraud, fake romance and dating fraud, information leakage, and the like are created from the corresponding models, and the roles are continuously trained through user behavior simulation so that they more closely resemble real victims. Incoming call requests received by a virtual victim role that has been uploaded to the Internet are read, and the fraudsters are answered automatically through speech recognition and intelligent response technology. During this process the fraudsters' language is continuously integrated, described, processed, searched, and updated, fusing data, information, methods, and experience into a high-quality response to the fraud script, so that the fraudsters believe the fraud is succeeding. Data on the suspected fraudsters thus identified is obtained and recorded.
The real data used to train the virtual victim role model may come from fraud cases collected from publicly available network information. It may also come from cooperation with the relevant departments: on the premise that data security and confidentiality are ensured, the model is trained and verified on the departments' internal data and is continuously retrained on their updated data to improve its accuracy. After information about telecommunication fraud personnel is obtained, it can be fed back to the relevant departments to improve the various databases related to telecommunication fraud.
In order to implement the above-mentioned telecom fraud active detection processing method based on virtual character orientation training, please refer to fig. 2, the present invention further provides a telecom fraud active detection processing system for virtual character orientation training, which includes:
the data layer, which stores the features of various typical fraud models and the metadata that supports anti-fraud work; the data layer uses a MySQL relational database as its storage middleware, with Redis as the cache for frequently accessed data;
the processing layer, which simulates the various fraud models and virtualizes character personas that are susceptible to fraud so that criminals are induced to attempt fraud against them, and which continuously trains the models in order to collect information about the criminals, block fraud activities, and so on;
the display layer, which guides the user from entering basic anti-fraud metadata through to the automatic simulation of the various models, and displays model simulation information, anti-fraud device management information, anti-fraud answer records, and other information.
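The data layer's pairing of a relational store with a cache for frequently used data is commonly realized as a cache-aside read path. The sketch below shows that pattern with in-memory dictionaries standing in for MySQL and Redis; the class `CachedStore` is hypothetical, and the source does not state which caching policy the system actually uses.

```python
from typing import Optional


class CachedStore:
    """Cache-aside reads: check the cache first, fall back to the database on a miss."""

    def __init__(self, database: dict[str, str]):
        self._database = database          # stand-in for the MySQL store
        self._cache: dict[str, str] = {}   # stand-in for the Redis cache
        self.db_reads = 0                  # counts how often the slow path is taken

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:
            return self._cache[key]
        self.db_reads += 1
        value = self._database.get(key)
        if value is not None:
            self._cache[key] = value       # populate the cache for later reads
        return value
```

Repeated reads of the same key reach the database only once, which is the benefit the data layer seeks for high-frequency data.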
Accordingly, a second aspect of an embodiment of the present invention provides an electronic device, including: at least one processor; and a memory coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor performs the telecommunication fraud active detection processing method based on the virtual character orientation training.
Accordingly, a third aspect of embodiments of the present invention provides a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the above-described telecommunications fraud active detection processing method based on virtual character orientation training.
The embodiment of the invention aims to protect a telecom fraud active detection processing method based on virtual character orientation training, which comprises the following steps: conducting a call according to a call request, identifying fraud through speech recognition of the call, and replying with the designated information corresponding to the fraud based on a preset reply corpus, comprising: extracting and recognizing words in the speech of the call, comparing the recognized words with keyword speech data in a preset keyword list, and calculating a first similarity value representing the degree of similarity between the words and the keywords in the preset keyword list; when the first similarity value of a word is greater than a first preset proportion value, judging that the word belongs to the keywords in the preset keyword list; when a plurality of words in the call are judged to be keywords, calculating a second similarity value indicating that the call is a telecommunication fraud call according to the weight values of the keywords in the preset keyword list; when the second similarity value is greater than a second preset proportion value, judging that the current call is a telecommunication fraud call, and invoking the preset reply corpus to answer; recording the call for evidence collection, recording related information of the call, and sending the recording and the related information to the encrypted evidence database and the suspicious personnel information database, respectively. The technical scheme has the following effects:
1. Accurate detection: the invention accurately identifies sensitive words during the call, replies accordingly based on the preset reply corpus, and, after the fraud is confirmed, collects related information such as voice, text, personnel, and network addresses.
2. Strong adaptability: compared with traditional telecom fraud detection, the invention uses virtual characters for directional training and provides victim role models that are rich in type and clearly divided. Various roles are created from the corresponding models, for example roles resembling real victims of order-brushing rebate fraud, fake loan and wealth-management fraud, and fake romance and dating fraud, so that different fraud scenarios can be handled.
3. Continuous optimization: the invention learns continuously from the training of the virtual characters, improves based on new interaction data, and adapts to continuously evolving new types of telecommunication fraud.
4. Automated processing: training and interaction based on virtual characters identify telecommunication fraud automatically, reducing manual intervention and improving efficiency.
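The two-threshold decision summarized in this embodiment can be sketched as below. The first similarity values are taken as given inputs, and the second similarity value is computed here as a weighted average, which is only an assumed stand-in: the patent's own formulas are not reproduced in this text. The function name, parameter names, and the `min_keywords` setting are likewise hypothetical.

```python
def classify_call(first_similarity: dict[str, float],
                  weights: dict[str, float],
                  first_threshold: float,
                  second_threshold: float,
                  min_keywords: int = 2) -> tuple[bool, float]:
    """Return (is_fraud_call, second_similarity) for one call.

    first_similarity maps each recognized word to its first similarity value;
    weights maps each keyword in the preset list to a positive weight.
    """
    # Stage 1: a word counts as a keyword when its first similarity exceeds the threshold.
    hits = {word: score for word, score in first_similarity.items()
            if word in weights and score > first_threshold}
    # The method scores a call only when a plurality of words are judged to be keywords.
    if len(hits) < min_keywords:
        return False, 0.0
    # Stage 2 (assumed form): weighted average of the first similarity values.
    total_weight = sum(weights[word] for word in hits)
    second = sum(weights[word] * score for word, score in hits.items()) / total_weight
    return second > second_threshold, second
```

Keeping the two thresholds separate lets the word-level test stay strict while the call-level test reflects how strongly each keyword is associated with fraud.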
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims (6)

1. A telecom fraud active detection processing method based on virtual character orientation training is characterized by comprising the following steps:
conducting a call according to a call request, identifying fraud through speech recognition of the call, and replying with the designated information corresponding to the fraud based on a preset reply corpus, comprising: extracting and recognizing words in the speech of the call, comparing the recognized words with keyword speech data in a preset keyword list, and calculating a first similarity value representing the degree of similarity between the words and the keywords in the preset keyword list; when the first similarity value of a word is greater than a first preset proportion value, judging that the word belongs to the keywords in the preset keyword list; when a plurality of words in the call are judged to be keywords, calculating a second similarity value indicating that the call is a telecommunication fraud call according to the weight values of the keywords in the preset keyword list; when the second similarity value is greater than a second preset proportion value, judging that the current call is a telecommunication fraud call, and invoking the preset reply corpus to answer;
recording the call for evidence collection, recording related information of the call, and sending the recording and the related information to an encrypted evidence database and a suspicious personnel information database, respectively, wherein the recording and evidence collection are authorized;
wherein the related information includes: voice, text, personnel, and/or network address information; the preset keyword list comprises a plurality of sub-lists, wherein the sub-lists respectively correspond to common words used in corresponding preset types of telecommunication fraud activities; the preset corpus comprises a plurality of audio data respectively corresponding to the telecommunication fraud types;
the calculating a first similarity value of the similarity degree of the words and the keywords in the preset keyword list comprises the following steps:
acquiring the acoustic score and syllable number of the words;
obtaining the standard syllable number of the corresponding keyword;
calculating a first similarity value of the words and the keywords, wherein the calculation formula is as follows:
[Formula image not reproduced in this text.]
wherein the variables of the formula denote, respectively: the acoustic score at the i-th occurrence of the word; N, the number of times the word has appeared; the number of syllables contained in the word; the number of syllables contained in the keyword; and the number of the keyword corresponding to the word;
the second similarity value is calculated by the following formula:
[Formula image not reproduced in this text.]
wherein m is the number of words judged to be keywords during the call, and j is the sequence number of a word judged to be a keyword during the call; the formula further uses the first similarity value between the j-th word judged to be a keyword and its keyword, and the weight value, in the preset keyword list, of the keyword corresponding to the j-th word judged to be a keyword, the weight value being proportional to the frequency with which the word occurs in telecommunication fraud;
before the call is carried out according to the call request, the method further comprises the following steps:
constructing a virtual victim role model, and generating a standardized victim role model according to preset values, wherein the victim role model comprises the following components: character attribute, behavior attribute and communication attribute;
acquiring telecom fraud history data, and training the virtual victim role model based on the telecom fraud history data;
acquiring new real victim information, continuously training the virtual victim role model through a machine learning algorithm, and updating the virtual victim role model;
interacting with telecommunication fraud personnel by using the generated virtual character, and replying with the designated information corresponding to the fraud based on the preset reply corpus;
wherein the telecom fraud history data and the real victim information in the new telecom fraud case are both legal data that have been authorized for use.
2. The telecom fraud active detection processing method based on virtual character orientation training according to claim 1, wherein the process of calculating the second similarity value further comprises:
calculating the similarity between every two occurrences of the same word that appears repeatedly in the current call:
[Formula image not reproduced in this text.]
wherein s and t are the ordinal numbers of two occurrences of the j-th word in the current call and take different values; k is a positive real number; and the remaining symbols denote, respectively, the signal-to-noise values of the audio signals of the frames corresponding to the s-th and t-th occurrences of the word, the variances of those signal-to-noise values, and the covariance of the signal-to-noise values of the audio signals of the frames corresponding to the s-th and t-th occurrences of the word;
obtaining the average value of the similarity among the occurrences of the j-th word in the call:
[Formula image not reproduced in this text.]
calibrating the second similarity value according to the average value of the similarity, wherein the calibrated second similarity value is calculated by the following formula:
[Formula image not reproduced in this text.]
3. The telecom fraud active detection processing method based on virtual character orientation training according to claim 1, further comprising, before the call is conducted according to the call request:
generating a virtual victim role based on the virtual victim role model, carrying out asynchronous requests of different network platforms through a web crawler technology, and carrying out distributed registration, login, reply and/or sharing operation according to the identity of the virtual victim role.
4. The method for proactive detection of telecommunications fraud based on virtual character orientation training of claim 1, further comprising, prior to constructing the virtual victim character model:
obtaining a modeling data list from the different network platforms;
the modeling data list includes: the request method, Cookie and Session, request response, operation interface, transaction interface, logout interface, and account switching interface of each network platform.
5. The telecom fraud active detection processing method based on virtual character orientation training according to claim 4, wherein the extracting and recognizing words in the speech of the call further comprises:
acquiring an audio signal of the word, and recognizing the audio signal based on a multi-dialect speech recognition model to obtain a plurality of word description forms of the word corresponding to the various dialects;
searching the plurality of word description forms against the preset keyword list to obtain the word description forms that belong to keywords in the preset keyword list;
comparing the word description forms that belong to keywords in the preset keyword list with the keyword in the preset keyword list obtained through audio signal comparison, and if they are consistent, judging that the word is that keyword in the preset keyword list.
6. The method for actively detecting and processing telecom fraud based on virtual character orientation training according to any one of claims 1 to 5, wherein recording the relevant information of the call includes:
and converting the voice information of the call into corresponding text information, sorting, escaping and formatting through a natural language processing technology, and respectively storing the voice information and the text information into the encryption evidence database.
CN202311384816.4A 2023-10-25 2023-10-25 Telecom fraud active detection processing method based on virtual character orientation training Active CN117119104B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311384816.4A CN117119104B (en) 2023-10-25 2023-10-25 Telecom fraud active detection processing method based on virtual character orientation training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311384816.4A CN117119104B (en) 2023-10-25 2023-10-25 Telecom fraud active detection processing method based on virtual character orientation training

Publications (2)

Publication Number Publication Date
CN117119104A CN117119104A (en) 2023-11-24
CN117119104B true CN117119104B (en) 2024-01-30

Family

ID=88811447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311384816.4A Active CN117119104B (en) 2023-10-25 2023-10-25 Telecom fraud active detection processing method based on virtual character orientation training

Country Status (1)

Country Link
CN (1) CN117119104B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106303058A (en) * 2016-08-24 2017-01-04 成都中英锐达科技有限公司 Anti-swindle audio recognition method and system
CN111901473A (en) * 2020-09-04 2020-11-06 中国平安人寿保险股份有限公司 Incoming call processing method, device, equipment and storage medium
CN113112992A (en) * 2019-12-24 2021-07-13 ***通信集团有限公司 Voice recognition method and device, storage medium and server
CN114257688A (en) * 2021-12-28 2022-03-29 深圳云天励飞技术股份有限公司 Telephone fraud identification method and related device
US11463582B1 (en) * 2021-07-09 2022-10-04 T-Mobile Usa, Inc. Detecting scam callers using conversational agent and machine learning systems and methods
CN116939616A (en) * 2023-09-15 2023-10-24 中关村科学城城市大脑股份有限公司 Equipment control method and device applied to telecommunication fraud prevention and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012042200A1 (en) * 2010-09-30 2012-04-05 British Telecommunications Public Limited Company Speech comparison

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant