CN110648670B - Fraud identification method and device, electronic equipment and computer-readable storage medium - Google Patents


Info

Publication number
CN110648670B
CN110648670B (application CN201911006499.6A)
Authority
CN
China
Prior art keywords
target object
cluster
customer service
feature
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911006499.6A
Other languages
Chinese (zh)
Other versions
CN110648670A (en)
Inventor
赖勇铨
贺亚运
林春
李美玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Citic Bank Corp Ltd
Original Assignee
China Citic Bank Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Citic Bank Corp Ltd filed Critical China Citic Bank Corp Ltd
Priority to CN201911006499.6A priority Critical patent/CN110648670B/en
Publication of CN110648670A publication Critical patent/CN110648670A/en
Application granted granted Critical
Publication of CN110648670B publication Critical patent/CN110648670B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L17/04 Training, enrolment or model building
    • G10L17/18 Artificial neural networks; Connectionist approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Accounting & Taxation (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application provides a fraud identification method, a fraud identification device, an electronic device and a computer-readable storage medium, applied in the technical field of audio processing. The method clusters the voiceprint features in a voiceprint feature pool through a clustering algorithm to obtain at least one cluster, ranks the clusters by their intra-cluster similarity, and determines fraud identification objects from the ranking result. This solves the problem of identifying one person who impersonates the identities of multiple persons to file fraudulent applications, and realizes automated fraud identification. In addition, by jointly clustering the voiceprint features of current applicants with those of historical applicants, fraudulent applications among the historical applications can also be identified, improving the accuracy of fraud identification.

Description

Fraud identification method and device, electronic equipment and computer-readable storage medium
Technical Field
The present application relates to the field of audio processing technologies, and in particular, to a fraud identification method and apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development of the credit card business in China and the sharp increase in the number of cards issued, credit card risk has become increasingly prominent. The bad-debt rate of credit cards is rising, and fraudulent applications account for a large share of those bad debts, so effectively identifying fraudulent applications has become a pressing problem.
At present, fraudulent credit card applications are identified by identity authentication: a service person checks the information provided by the credit card applicant, and if false information is found, the application is identified as fraudulent. However, this identity-authentication approach cannot identify the case in which one person impersonates the identities of multiple persons to apply for credit cards fraudulently.
Disclosure of Invention
The application provides a fraud identification method, a fraud identification device, an electronic device and a computer-readable storage medium, intended to solve the problem of identifying fraudulent applications filed by one person impersonating the identities of multiple persons, as well as the problem of identifying fraudulent applications among historical applications. The following technical scheme is adopted:
in a first aspect, there is provided a fraud identification method, the method comprising,
acquiring audio information of a plurality of target objects;
based on the audio information of a plurality of target objects, extracting the voiceprint features of each target object through a pre-trained first deep learning model to obtain a voiceprint feature pool;
clustering the voiceprint features in the voiceprint feature pool through a clustering algorithm to obtain at least one cluster, and ranking the clusters according to the similarity of each cluster;
and determining a fraudulent target object based on the ranking result of the clusters.
Wherein, the clustering algorithm is a greedy clustering algorithm.
Optionally, before acquiring the audio information of the plurality of target objects, the method comprises:
acquiring call voice information between a target object and the corresponding customer service;
separating the voices of the target object and the corresponding customer service in the call voice information based on a corresponding voice separation algorithm;
and determining the audio information of the target object based on the audio obtained after voice separation of the call voice information between the target object and the corresponding customer service.
Optionally, separating the voices of the target object and the corresponding customer service in the call voice information based on a corresponding voice separation algorithm includes at least one of:
performing voice separation on the call voice information of each target object and the corresponding customer service through a convolutional neural network and a recurrent neural network;
and performing voice separation on the call voice information of each target object and the corresponding customer service through a sliding window method and a K-Means clustering algorithm.
Optionally, performing voice separation on the call voice information of each target object and the corresponding customer service through a convolutional neural network and a recurrent neural network includes:
acquiring a spectrogram corresponding to the call voice information of the target object and the corresponding customer service;
extracting three-dimensional features of a spectrogram based on a pre-trained convolutional neural network, wherein the three-dimensional features comprise time dimension features, frequency dimension features and channel dimension features;
performing average pooling on the three-dimensional features along the frequency dimension to obtain pooled features;
inputting the pooled features into a pre-trained recurrent neural network to obtain a segmentation label on a time dimension;
and performing voice separation on the target call audio based on the segmentation label on the time dimension.
Optionally, performing voice separation on the call voice information of each target object and the corresponding customer service through a sliding window method and a K-Means clustering algorithm includes:
extracting the voice features of the call voice information of the target object and the corresponding customer service through a sliding window method, based on a pre-trained second deep learning model, to obtain a feature vector array;
performing K-Means clustering on the feature vector array to obtain a label of each feature vector;
and performing voice separation on the call voice information of each target object and the corresponding customer service based on the label of each feature vector.
Optionally, determining the audio information of the target object based on the audio obtained after voice separation of the call voice information between the target object and the corresponding customer service includes:
extracting the voiceprint features of the audio obtained after voice separation of the call voice information between the target object and the corresponding customer service;
calculating the similarity between the voiceprint characteristics of the voice-separated audio and the voiceprint characteristics of at least one pre-stored customer service;
based on the similarity calculation result, audio information of the target object is determined.
In a second aspect, there is provided a fraud identification apparatus, the apparatus comprising,
the first acquisition module is used for acquiring audio information of a plurality of target objects;
the extraction module is used for extracting the voiceprint features of each target object through a pre-trained first deep learning model based on the audio information of a plurality of target objects to obtain a voiceprint feature pool;
the sorting module is used for clustering the voiceprint features in the voiceprint feature pool through a clustering algorithm to obtain at least one cluster, and ranking the clusters according to the similarity of each cluster;
and the first determining module is used for determining the fraudulent target object based on the ranking result of the clusters.
Wherein, the clustering algorithm is a greedy clustering algorithm.
Optionally, the apparatus further comprises:
the second acquisition module is used for acquiring the call voice information between the target object and the corresponding customer service;
the voice separation module is used for separating the voices of the target object and the corresponding customer service in the call voice information based on a corresponding voice separation algorithm;
and the second determining module is used for determining the audio information of the target object based on the audio obtained after voice separation of the call voice information between the target object and the corresponding customer service.
Optionally, the voice separation module is configured to perform voice separation on the call voice information of each target object and the corresponding customer service through a convolutional neural network and a recurrent neural network; and/or through a sliding window method and a K-Means clustering algorithm.
Optionally, the voice separation module comprises:
the acquisition unit is used for acquiring a spectrogram of the target object corresponding to the call voice information of the corresponding customer service;
the first extraction unit is used for extracting three-dimensional features of the spectrogram based on a pre-trained convolutional neural network, wherein the three-dimensional features comprise time dimension features, frequency dimension features and channel dimension features;
the pooling unit is used for carrying out average pooling treatment on the three-dimensional characteristics in a frequency dimension to obtain pooled characteristics;
the first label unit is used for inputting the pooled features into a pre-trained recurrent neural network to obtain a segmentation label on a time dimension;
and the first separation unit is used for carrying out voice separation on the target call audio based on the segmentation label on the time dimension.
Optionally, the voice separation module comprises:
the second extraction unit is used for extracting the voice characteristics of the call voice information of the target object and the corresponding customer service on the basis of a pre-trained second deep learning model through a sliding window method to obtain a feature vector array;
the second label unit is used for carrying out K-Means clustering on the feature vector array to obtain labels of all feature vectors;
and the second separation unit is used for performing voice separation on the call voice information of each target object and the corresponding customer service based on the label of each feature vector.
Optionally, the second determining module includes:
the third extraction unit is used for extracting the voiceprint features of the audio obtained after voice separation of the call voice information between the target object and the corresponding customer service;
the calculating unit is used for calculating the similarity between the voiceprint characteristics of the voice-separated audio and the voiceprint characteristics of at least one pre-stored customer service;
a third determination unit configured to determine audio information of the target object based on the similarity calculation result.
In a third aspect, an electronic device is provided, which includes:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors to perform the fraud identification method of the first aspect.
In a fourth aspect, there is provided a computer-readable storage medium for storing computer instructions which, when run on a computer, cause the computer to perform the fraud identification method of the first aspect.
Compared with the prior art, in which fraudulent applications are identified through identity authentication, the fraud identification method, device, electronic device and computer-readable storage medium of the application acquire the audio information of a plurality of target objects; extract the voiceprint feature of each target object through a pre-trained first deep learning model, based on that audio information, to obtain a voiceprint feature pool; cluster the voiceprint features in the pool through a clustering algorithm to obtain at least one cluster; rank the clusters according to their intra-cluster similarity; and finally determine the fraudulent target objects based on the ranking result. This solves the problem of identifying one person who impersonates the identities of multiple persons to file fraudulent applications, and realizes automated fraud identification; in addition, by jointly clustering the voiceprint features of current applicants with those of historical applicants, fraudulent applications among the historical applications can also be identified, improving the accuracy of fraud identification.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow chart illustrating a fraud identification method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a fraud identification apparatus according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of another fraud identification apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
An embodiment of the present application provides a fraud identification method, as shown in fig. 1, the method may include the following steps:
step S101, acquiring audio information of a plurality of target objects;
specifically, audio information of a plurality of target objects is acquired, wherein the target objects can be credit card applicants, the audio information is dialogue information with customer service when the credit card applicants apply for the credit card, but the audio information only comprises audio of the credit card applicants and does not comprise an audio part of the customer service.
Step S102, extracting the voiceprint features of each target object through a pre-trained first deep learning model based on the audio information of a plurality of target objects to obtain a voiceprint feature pool;
specifically, the voiceprint feature of each target object is extracted through the pre-trained first deep learning model to obtain a voiceprint feature pool, wherein the audio information of the plurality of target objects can be subjected to standardization processing, the spectrogram of the plurality of target objects is obtained through a sliding window method and FFT (fast Fourier transform), and the voiceprint feature of each target object is extracted through the pre-trained first deep learning model to obtain the voiceprint feature pool based on the obtained plurality of spectrograms.
Step S103, clustering each voiceprint feature in the voiceprint feature pool through a clustering algorithm to obtain at least one cluster, and sequencing each cluster according to the similarity of each cluster;
specifically, clustering each voiceprint feature in the voiceprint feature pool through a corresponding clustering algorithm to obtain at least one cluster, wherein the clustering algorithm can be a greedy clustering algorithm, a K-Means clustering algorithm, a hierarchical clustering algorithm and the like; after the clustering process, the clusters can be sorted according to the similarity values respectively corresponding to the clusters obtained by calculation, wherein the clusters can be sorted from high to low according to the similarity values.
Step S104, determining a fraudulent target object based on the ranking result of the clusters.
Illustratively, with the clusters ranked by similarity from high to low, the applicants contained in the top N clusters may be flagged as fraudulent applicants, or the applicants contained in any cluster whose similarity exceeds a predetermined threshold may be flagged as fraudulent applicants. The higher the intra-cluster similarity, the higher the probability that one person within that cluster is impersonating multiple persons to file fraudulent applications.
Compared with the prior art, in which fraudulent applications are identified through identity authentication, the fraud identification method acquires the audio information of a plurality of target objects; extracts the voiceprint feature of each target object through a pre-trained first deep learning model, based on that audio information, to obtain a voiceprint feature pool; clusters the voiceprint features in the pool through a clustering algorithm to obtain at least one cluster; ranks the clusters according to their intra-cluster similarity; and finally determines the fraudulent target objects based on the ranking result. This solves the problem of identifying one person who impersonates the identities of multiple persons to file fraudulent applications, and realizes automated fraud identification; in addition, by jointly clustering the voiceprint features of current applicants with those of historical applicants, fraudulent applications among the historical applications can also be identified, improving the accuracy of fraud identification.
The embodiment of the application provides a possible implementation manner, wherein the clustering algorithm is a greedy clustering algorithm.
Specifically, the clustering algorithm is a greedy clustering algorithm: the voiceprint features of the plurality of target applicants form a feature vector pool, and the feature points are clustered with a greedy algorithm. The greedy clustering procedure is as follows:
Step 1, randomly draw, without replacement, one unclustered feature point from the feature vector pool as the initial member of a cluster;
Step 2, sequentially check the average distance between each remaining feature point in the feature vector pool and all current members of the cluster; add a feature point to the cluster if this distance is smaller than a specified threshold, and if a feature point's distance to all other feature points is larger than the specified threshold, it forms a cluster by itself;
Step 3, randomly draw, without replacement, another unclustered feature point from the feature vector pool and repeat steps 1 and 2 until every member of the feature vector pool has been clustered.
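Steps 1 to 3 can be sketched as follows; cosine distance and the fixed random seed are assumptions, since the application does not fix the distance metric:

```python
import random
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity; the metric choice is an assumption."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def greedy_cluster(pool, threshold, seed=0):
    """Greedy clustering of a voiceprint feature pool, following steps 1-3."""
    rng = random.Random(seed)
    unclustered = set(range(len(pool)))
    clusters = []
    while unclustered:
        # Step 1: randomly draw (without replacement) an unclustered point
        # as the initial member of a new cluster.
        seed_idx = rng.choice(sorted(unclustered))
        unclustered.remove(seed_idx)
        cluster = [seed_idx]
        # Step 2: sweep the remaining points; a point joins the cluster when
        # its average distance to all current members is below the threshold.
        for idx in sorted(unclustered):
            avg = sum(cosine_distance(pool[idx], pool[m]) for m in cluster) / len(cluster)
            if avg < threshold:
                cluster.append(idx)
        unclustered -= set(cluster)
        clusters.append(cluster)
        # Step 3: repeat until every point in the pool has been clustered.
    return clusters
```

Note the algorithm makes a single pass per cluster and needs no preset number of categories, which is exactly the advantage claimed over K-Means and spectral clustering.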
For the embodiment of the application, clustering the voiceprint features with a greedy clustering algorithm has, compared with algorithms such as K-Means or spectral clustering, lower computational complexity and does not require the number of categories to be specified in advance, making it more convenient to operate.
The embodiment of the present application provides a possible implementation manner, and further, before step S101, the method includes:
Step 105 (not shown in the figure), acquiring the call voice information between the target object and the corresponding customer service;
specifically, when the target object applies for a credit card, the audio acquisition device of the terminal device such as a mobile phone may be used to acquire the call voice information between the target object and the corresponding customer service. The voice information of the call can be mute-cut by VAD, and then the music part such as mobile phone ring in the voice information of the call can be cut by pre-training a deep learning model capable of identifying music segments.
Step 106 (not shown in the figure), performing voice separation on the target object and the call voice information of the corresponding customer service based on a corresponding voice separation algorithm;
specifically, the target object and the call voice information of the corresponding customer service are subjected to voice separation through a corresponding voice segmentation algorithm; the voice separation technique distinguishes the speaking sections of different speakers from one section of audio data without any prior knowledge, and labels the sections one by one.
Step 107 (not shown in the figure), determining the audio information of the target object based on the audio after the voice separation of the call voice information between the target object and the corresponding customer service.
Illustratively, after voice separation of the call voice information between the target object and the corresponding customer service, two audio tracks are obtained: one is the audio of the target object and the other is the audio of the customer service. Which track is the audio of the target object is then determined from the two.
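One way to decide which separated track belongs to the target object, sketched under the assumption that the track less similar to the pre-stored customer-service voiceprints is the applicant:

```python
import numpy as np

def pick_target_audio(track_prints, service_prints):
    """Return the index of the track LEAST similar to the stored
    customer-service voiceprints; that track is taken as the applicant.
    Hypothetical helper: the application only specifies that a
    similarity comparison is made."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    # Score each track by its best match against any stored service voiceprint
    scores = [max(cos(t, s) for s in service_prints) for t in track_prints]
    return int(np.argmin(scores))
```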
With the embodiment of the application, the problem of determining the audio information of the target object is solved.
The embodiment of the present application provides a possible implementation manner, and step S106 (not shown in the figure) includes:
Step S1061 (not shown in the figure), performing voice separation on the call voice information of each target object and the corresponding customer service through a convolutional neural network and a recurrent neural network;
Step S1062 (not shown in the figure), performing voice separation on the call voice information of each target object and the corresponding customer service through a sliding window method and a K-Means clustering algorithm.
Specifically, the voice separation may be realized either by a convolutional-plus-recurrent neural network method, or by a sliding window method combined with a K-Means clustering algorithm.
The embodiment of the application solves the problem of voice separation of the call voice information between the target object and the corresponding customer service.
The embodiment of the present application provides a possible implementation manner, and specifically, step S1061 (not shown in the figure) includes:
Step S10611 (not shown in the figure), acquiring a spectrogram corresponding to the call voice information of the target object and the corresponding customer service;
The spectrogram corresponding to the call voice information of the target object and the corresponding customer service is acquired through a corresponding data acquisition method; the spectrogram may be obtained by normalizing the call voice information, applying sliding-window framing, and performing the FFT.
Step S10612 (not shown in the figure), extracting three-dimensional features of the spectrogram based on the pre-trained convolutional neural network, the three-dimensional features including time dimension features, frequency dimension features, and channel dimension features;
Specifically, the three-dimensional features of the spectrogram are extracted through a pre-trained convolutional neural network (such as ResNet-18 or VGG-16): the two-dimensional spectrogram (f, t) is input into the convolutional neural network to obtain a three-dimensional feature map (f, t, c), where f is the frequency dimension, t is the time dimension and c is the channel dimension; the channel dimension is determined by the number of convolution kernels of the convolutional layer.
Step S10613 (not shown), performing average pooling on the three-dimensional features in the frequency dimension to obtain pooled features;
specifically, the three-dimensional features are average-pooled along the frequency dimension to obtain pooled features, so that the three-dimensional feature map is converted into a two-dimensional feature map. The length of the time dimension is preserved, while the size of the other dimension is determined by the number of convolution kernels producing the feature map output by the convolutional network; as a result, call voices of different lengths can be processed.
For example, with channel = 6, f = 28 and t = 28, the process of converting the three-dimensional feature map into the two-dimensional feature map may be: mean pooling is carried out along the frequency direction f, i.e., the average of the 28 values at each time point is taken along the frequency direction, yielding a one-dimensional vector of 28 elements; the one-dimensional vectors from the 6 channels are then stacked together to obtain the final two-dimensional features (6 rows and 28 columns).
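The frequency-dimension average pooling of step S10613, including the 6 × 28 × 28 example above, can be sketched as:

```python
def pool_frequency(fmap):
    """Average-pool a channels-first (c, f, t) feature map along the
    frequency axis, as in step S10613: each channel's (f, t) map
    collapses to a length-t row, and the c rows stack into a (c, t)
    two-dimensional map."""
    pooled = []
    for channel in fmap:               # channel: f rows x t columns
        f, t = len(channel), len(channel[0])
        pooled.append([sum(channel[row][col] for row in range(f)) / f
                       for col in range(t)])
    return pooled

# the channel = 6, f = 28, t = 28 example from the text
fmap = [[[float(ch)] * 28 for _ in range(28)] for ch in range(6)]
pooled = pool_frequency(fmap)          # 6 rows x 28 columns
```

Only the frequency axis is collapsed, so the time length is preserved regardless of how long the call is.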
Step S10614 (not shown), inputting the pooled features into a pre-trained recurrent neural network to obtain a segmentation label in a time dimension;
specifically, the pooled features are input into a pre-trained recurrent neural network to obtain segmentation labels in the time dimension. The recurrent neural network performs sequence analysis on the extracted time-dimension features; by correlating the input over a period of time before and after each point, it can effectively and accurately output the speaker label at the corresponding time point, so that the purpose of speaker voice separation can be achieved through these labels. The recurrent neural network may be an LSTM (Long Short-Term Memory) network, or any other recurrent neural network capable of implementing the present application; the present application is not limited herein.
Step S10615 (not shown), performing voice separation on the target call audio based on the split tag in the time dimension.
Specifically, the target call audio is voice-separated based on the segmentation labels in the time dimension; for example, the target call audio may be voice-separated based on the mapping relationship between the segmentation labels and the corresponding audio segments.
For the embodiment of the application, extracting the three-dimensional features of the spectrogram corresponding to the call voice information of the target object and the corresponding customer service makes full use of the speaker information, so the accuracy of voice segmentation can be improved. In addition, the segmentation labels of the target call audio are output automatically by the pre-trained recurrent neural network, making the judgment of change points more accurate and further improving the accuracy of voice segmentation; moreover, since no additional clustering algorithm is needed, end-to-end voice separation can be realized.
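Under the assumption that the recurrent network emits one speaker label per time step, the separation of step S10615 reduces to grouping consecutive identical labels into time segments, for example:

```python
def labels_to_segments(labels, frame_hop=0.5):
    """Sketch of step S10615: group consecutive frames that share a
    speaker label into (speaker, start_s, end_s) segments.  frame_hop,
    the assumed time between consecutive labels in seconds, is an
    illustrative value; the text does not fix the label rate."""
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start * frame_hop, i * frame_hop))
            start = i
    return segments

# speaker 0 talks, then speaker 1, then speaker 0 again
segs = labels_to_segments([0, 0, 1, 1, 1, 0], frame_hop=0.5)
```

Each resulting segment can then be cut out of the original call audio, which is the mapping between segmentation labels and audio segments mentioned above.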
The embodiment of the present application provides a possible implementation manner, and specifically, step S1062 (not shown in the figure) includes:
step S10621 (not shown in the figure) of extracting the voice features of the communication voice information of the target object and the corresponding customer service based on the pre-trained second deep learning model by a sliding window method to obtain a feature vector array;
specifically, a sliding window method (window length n seconds, step length s seconds) is used together with the trained deep learning model to perform feature extraction on the audio data, obtaining a feature vector array. As it slides, the window covers the individual voice segments of the target object and of the customer service; it also crosses regions where the two speakers' voices overlap (i.e., where the window covers both voices at the same time). The sliding of the window appears in feature space as a feature point moving back and forth between two feature clusters.
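The windowed feature extraction of step S10621 can be sketched as follows. The sampling rate, window and step lengths, and the mean-value `embed` stand-in for the second deep learning model are all illustrative assumptions.

```python
def sliding_window_features(samples, rate, win_s, hop_s, embed):
    """Sketch of step S10621: slide an n-second window (step s seconds)
    over the call audio and run each window through an embedding model,
    giving one feature vector per window.  `embed` stands in for the
    pre-trained second deep learning model, whose architecture the text
    does not specify."""
    win, hop = int(win_s * rate), int(hop_s * rate)
    return [embed(samples[i:i + win])
            for i in range(0, len(samples) - win + 1, hop)]

# toy run: 16 samples at 4 Hz, 1 s windows, 0.5 s steps, mean "embedding"
feats = sliding_window_features(list(range(16)), rate=4, win_s=1.0,
                                hop_s=0.5, embed=lambda w: [sum(w) / len(w)])
```

Adjacent windows overlap by half, which is what lets the window straddle the region where both speakers' voices are present.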
Step S10622 (not shown in the figure) performs K-Means clustering on the feature vector array to obtain the label of each feature vector;
step S10623 (not shown) performs voice separation on each target object from the call voice information of the corresponding customer service based on the label of each feature vector.
Specifically, K-Means clustering is performed on the extracted feature vectors, labeling each feature point 0 or 1; as the feature points move, the boundary points where the label transitions between 0 and 1 are the segmentation points between the voices of different speakers. Each such segmentation point corresponds to one window of the sliding window, i.e., a segment of audio n seconds long; the midpoint of that n-second segment is taken as the actual segmentation point. Once the segmentation points are available, the customer's audio can be extracted in the same way as in the first method.
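Steps S10622 and S10623 can be sketched with a minimal two-cluster K-Means and a transition detector. The centroid seeding and the toy one-dimensional features are illustrative assumptions.

```python
def kmeans_labels(vectors, k=2, iters=20):
    """Minimal K-Means (Lloyd's algorithm) for step S10622: assign each
    windowed feature vector a cluster label (0 or 1 for two speakers)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # seed the two centroids from the ends of the call, a simple heuristic
    centroids = [vectors[0], vectors[-1]] if k == 2 else vectors[:k]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels

def turn_points(labels):
    # step S10623: a label transition between windows i-1 and i marks a
    # change of speaker; the text then takes the midpoint of that
    # window as the true segmentation point in the audio.
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]

# three windows of one speaker, then three of the other
labels = kmeans_labels([[0.0], [0.1], [0.05], [5.0], [5.1], [4.9]])
```

The returned indices identify the windows whose midpoints become the segmentation points described above.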
For the embodiment of the application, the problem of voice separation of the call voice of the target object and the customer service is solved through a K-means clustering algorithm.
The embodiment of the present application provides a possible implementation manner, and specifically, step S107 (not shown in the figure) includes:
step S1071 (not shown in the figure) extracts voiceprint features of the audio after the speech separation of the call speech information between the target object and the corresponding customer service;
step S1072 (not shown in the figure) calculates a similarity between the voiceprint feature of the voice-separated audio and a voiceprint feature of at least one pre-stored customer service;
step S1073 (not shown in the figure) determines the audio information of the target object based on the similarity calculation result.
Specifically, after the voice separation of the call voice information of the target object and the corresponding customer service, the audio of two persons (the target object and the customer service) is obtained, and the voiceprint features of the separated audio are extracted through a corresponding voiceprint feature extraction method (such as a deep learning method).
Specifically, when the voiceprint feature of only one customer service is prestored: if the similarity between one person's voiceprint feature and the prestored customer service voiceprint feature is smaller than a preset threshold, that person is the target object (i.e., that audio is the credit card applicant's); if the similarity is greater than the preset threshold, that person's audio is the customer service's audio, and the other person's audio is the target object's (i.e., the credit card applicant's).
Specifically, when the voiceprint features of a plurality of customer services are prestored, the similarity between one person's voiceprint feature and each prestored customer service voiceprint feature is calculated, giving a plurality of similarity values. If any similarity value is greater than the preset threshold, that person's audio is a customer service's audio; if all the similarity values are smaller than the preset threshold, the audio corresponds to the target object (i.e., the credit card applicant).
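Steps S1071 to S1073 can be sketched as follows. Cosine similarity, the 0.8 threshold, and the toy voiceprints are illustrative assumptions; the text leaves the similarity measure and threshold value open.

```python
import math

def identify_applicant(separated, service_prints, threshold=0.8):
    """Sketch of steps S1071-S1073: compare each separated speaker's
    voiceprint with the pre-stored customer-service voiceprints.  A
    speaker whose similarity to every stored print stays below the
    threshold is taken to be the target object (the applicant)."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(x * x for x in b)))
    return [speaker for speaker, vp in separated.items()
            if all(cosine(vp, sp) <= threshold for sp in service_prints)]

# speaker "a" matches the stored customer-service print; "b" does not
applicants = identify_applicant({"a": [1.0, 0.0], "b": [0.0, 1.0]},
                                [[1.0, 0.1]])
```

With several prestored customer-service voiceprints, a single similarity above the threshold is enough to classify a speaker as customer service, exactly as the paragraph above describes.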
With the embodiment of the application, the problem of determining the audio frequency of the target object is solved.
Fig. 2 shows a fraud identification apparatus provided in an embodiment of the present application, where the apparatus 20 includes: a first obtaining module 201, an extracting module 202, a sorting module 203, and a first determining module 204, wherein,
a first obtaining module 201, configured to obtain audio information of a plurality of target objects;
the extraction module 202 is configured to extract voiceprint features of each target object through a pre-trained first deep learning model based on audio information of a plurality of target objects to obtain a voiceprint feature pool;
the sorting module 203 is configured to perform clustering processing on each voiceprint feature in the voiceprint feature pool through a clustering algorithm to obtain at least one cluster, and sort each cluster according to the similarity of each cluster;
a first determining module 204, configured to determine a fraud target object based on the sorting result of each cluster.
The embodiment of the application provides a fraud identification apparatus. Compared with the prior art, in which fraud applications are identified through identity authentication, the embodiment acquires audio information of a plurality of target objects; based on that audio information, extracts the voiceprint features of each target object through a pre-trained first deep learning model to obtain a voiceprint feature pool; performs clustering processing on each voiceprint feature in the voiceprint feature pool through a clustering algorithm to obtain at least one cluster; sorts the clusters according to their similarity; and finally determines the fraud target object based on the sorting result of the clusters. By clustering the voiceprint features, sorting the resulting clusters by similarity and determining the fraud identification object from the sorting result, the embodiment solves the problem of identifying one person impersonating multiple identities to make fraudulent applications and realizes automatic fraud identification. In addition, by cluster analysis of the voiceprint features of the current applicant together with those of historical applicants, it also identifies fraudulent applications among historical applications, improving the accuracy of fraud-application identification.
The fraud recognition apparatus of this embodiment can execute the fraud recognition method provided in the above embodiments of this application, and the implementation principles thereof are similar and will not be described herein again.
As shown in fig. 3, the present embodiment provides another fraud identification apparatus, where the apparatus 30 includes: a first obtaining module 301, an extracting module 302, a sorting module 303, and a first determining module 304, wherein,
a first obtaining module 301, configured to obtain audio information of multiple target objects;
the first obtaining module 301 in fig. 3 has the same or similar function as the first obtaining module 201 in fig. 2.
The extraction module 302 is configured to extract voiceprint features of each target object through a pre-trained first deep learning model based on audio information of a plurality of target objects to obtain a voiceprint feature pool;
wherein the extraction module 302 in fig. 3 has the same or similar function as the extraction module 202 in fig. 2.
The sorting module 303 is configured to perform clustering classification processing on each voiceprint feature in the voiceprint feature pool through a clustering algorithm to obtain at least one cluster, and sort each cluster according to the similarity of each cluster;
wherein the sorting module 303 in fig. 3 has the same or similar function as the sorting module 203 in fig. 2.
A first determining module 304, configured to determine a fraud target object based on the sorting result of each cluster.
Wherein the first determining module 304 in fig. 3 has the same or similar function as the first determining module 204 in fig. 2.
The embodiment of the application provides a possible implementation manner, wherein the clustering algorithm is a greedy clustering algorithm.
The embodiment of the present application provides a possible implementation manner, and the apparatus 30 further includes:
a second obtaining module 305, configured to obtain call voice information between the target object and the corresponding customer service;
the voice separation module 306 is used for performing voice separation on the target object and the call voice information of the corresponding customer service based on a corresponding voice separation algorithm;
and a second determining module 307, configured to determine audio information of the target object based on the audio obtained by separating the call voice information between the target object and the corresponding customer service.
With the embodiment of the application, the problem of determining the audio information of the target object is solved.
The embodiment of the present application provides a possible implementation manner, and specifically, the voice separation module 306 is configured to perform voice separation on the call voice information of each target object and the corresponding customer service through a convolutional neural network and a recurrent neural network; and/or to perform voice separation on the call voice information of each target object and the corresponding customer service through a sliding window method and a K-Means clustering algorithm.
The embodiment of the application solves the problem of voice separation of the call voice information between the target object and the corresponding customer service.
The embodiment of the present application provides a possible implementation manner, and specifically, the voice separation module 306 includes:
an obtaining unit 3061 (not shown in the figure), configured to acquire a spectrogram corresponding to the call voice information of the target object and the corresponding customer service;
a first extraction unit 3062 (not shown in the figure), configured to extract three-dimensional features of the spectrogram based on a pre-trained convolutional neural network, where the three-dimensional features include a time dimension feature, a frequency dimension feature, and a channel dimension feature;
a pooling unit 3063 (not shown in the figure), configured to perform average pooling on the three-dimensional features in the frequency dimension to obtain pooled features;
a first label unit 3064 (not shown in the figure), configured to input the pooled features into a pre-trained recurrent neural network to obtain segmentation labels in the time dimension;
a first separation unit 3065 (not shown in the figure), configured to perform voice separation on the target call audio based on the segmentation labels in the time dimension.
For the embodiment of the application, the three-dimensional characteristics of the spectrogram corresponding to the call voice information of the target object and the corresponding customer service are extracted, so that the information of a speaker is fully utilized, and the accuracy of voice segmentation can be improved; in addition, the segmentation labels of the target call audio are automatically output through the pre-trained recurrent neural network, the judgment of the conversion points is more accurate, the accuracy of voice segmentation is further improved, and moreover, the clustering processing is not needed by an additional clustering algorithm, so that the end-to-end processing of voice separation can be realized.
The embodiment of the present application provides a possible implementation manner, and specifically, the voice separation module 306 includes:
a second extraction unit 3066 (not shown in the figure), which extracts the voice features of the call voice information of the target object and the corresponding customer service based on a pre-trained second deep learning model by a sliding window method to obtain a feature vector array;
a second label unit 3067 (not shown in the figure), configured to perform K-Means clustering on the feature vector array to obtain labels of the feature vectors;
a second separation unit 3068 (not shown in the figure) for separating the call voice information of the corresponding customer service from each target object based on the label of each feature vector.
For the embodiment of the application, the problem of voice separation of the call voice of the target object and the customer service is solved through a K-means clustering algorithm.
The embodiment of the present application provides a possible implementation manner, and specifically, the second determining module 307 includes:
a third extracting unit 3071 (not shown in the figure), configured to extract the voiceprint features of the audio obtained after the voice separation of the call voice information between the target object and the corresponding customer service;
a calculating unit 3072 (not shown in the figure), configured to calculate the similarity between the voiceprint features of the voice-separated audio and the voiceprint feature of at least one pre-stored customer service;
a third determining unit 3073 (not shown in the figure), configured to determine the audio information of the target object based on the similarity calculation result.
With the embodiment of the application, the problem of determining the audio frequency of the target object is solved.
The embodiment of the application provides a fraud identification apparatus. Compared with the prior art, in which fraud applications are identified through identity authentication, the embodiment acquires audio information of a plurality of target objects; extracts the voiceprint features of each target object through a pre-trained first deep learning model to obtain a voiceprint feature pool; performs clustering processing on each voiceprint feature in the pool through a clustering algorithm to obtain at least one cluster; sorts the clusters according to their similarity; and finally determines the fraud target object based on the sorting result. By clustering the voiceprint features, sorting the clusters by similarity and determining the fraud identification object from the sorting result, it solves the problem of identifying one person impersonating multiple identities to make fraudulent applications and realizes automatic fraud identification; furthermore, cluster analysis of the voiceprint features of current and historical applicants identifies fraudulent applications among historical applications, improving the accuracy of fraud-application identification.
The embodiment of the present application provides a fraud identification apparatus, which is suitable for the method shown in the foregoing embodiment, and is not described herein again.
An embodiment of the present application provides an electronic device. As shown in fig. 4, the electronic device 40 includes a processor 401 and a memory 403, with the processor 401 coupled to the memory 403, for example via a bus 402. The electronic device 40 may further include a transceiver 404. It should be noted that, in practical applications, there may be more than one transceiver 404, and the structure of the electronic device 40 does not limit the embodiment of the present application. In this embodiment, the processor 401 is used to implement the functions of the first obtaining module, the extracting module, the sorting module, and the first determining module shown in fig. 2 or fig. 3, and the functions of the second obtaining module, the voice separation module, and the second determining module shown in fig. 3. The transceiver 404 includes a receiver and a transmitter.
The processor 401 may be a CPU, general purpose processor, DSP, ASIC, FPGA or other programmable logic device, transistor logic device, hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 401 may also be a combination of computing functions, e.g., comprising one or more microprocessors, a combination of a DSP and a microprocessor, or the like.
Bus 402 may include a path that transfers information between the above components. The bus 402 may be a PCI bus or an EISA bus, etc. The bus 402 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 4, but this does not indicate only one bus or one type of bus.
The memory 403 may be, but is not limited to, a ROM or other type of static storage device capable of storing static information and instructions, a RAM or other type of dynamic storage device capable of storing information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact disc, laser disc, digital versatile disc, Blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 403 is used for storing application program codes for executing the scheme of the application, and the execution is controlled by the processor 401. The processor 401 is configured to execute application program code stored in the memory 403 to implement the functions of the fraud identification apparatus provided by the embodiment shown in fig. 2 or fig. 3.
The embodiment of the application provides an electronic device. Compared with the prior art, in which fraud applications are identified through identity authentication, the embodiment acquires audio information of a plurality of target objects; extracts the voiceprint features of each target object through a pre-trained first deep learning model to obtain a voiceprint feature pool; performs clustering processing on each voiceprint feature in the pool through a clustering algorithm to obtain at least one cluster; sorts the clusters according to their similarity; and finally determines the fraud target object based on the sorting result. By clustering the voiceprint features, sorting the clusters by similarity and determining the fraud identification object from the sorting result, it solves the problem of identifying one person impersonating multiple identities to make fraudulent applications and realizes automatic fraud identification; furthermore, cluster analysis of the voiceprint features of current and historical applicants identifies fraudulent applications among historical applications, improving the accuracy of fraud-application identification.
The embodiment of the application provides an electronic device suitable for the method embodiment. And will not be described in detail herein.
The present application provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method shown in the above embodiments is implemented.
Compared with the prior art, in which fraud applications are identified through identity authentication, the present scheme acquires audio information of a plurality of target objects; extracts the voiceprint features of each target object through a pre-trained first deep learning model to obtain a voiceprint feature pool; performs clustering processing on each voiceprint feature in the pool through a clustering algorithm to obtain at least one cluster; sorts the clusters according to their similarity; and finally determines the fraud target object based on the sorting result. By clustering the voiceprint features, sorting the clusters by similarity and determining the fraud identification object from the sorting result, it solves the problem of identifying one person impersonating multiple identities to make fraudulent applications and realizes automatic fraud identification; furthermore, cluster analysis of the voiceprint features of current and historical applicants identifies fraudulent applications among historical applications, improving the accuracy of fraud-application identification.
The embodiment of the application provides a computer-readable storage medium which is suitable for the method embodiment. And will not be described in detail herein.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly ordered and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps, or with at least a portion of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present application. It should be noted that, for those skilled in the art, various improvements and refinements can be made without departing from the principle of the present application, and these improvements and refinements shall also fall within the protection scope of the present application.

Claims (9)

1. A fraud identification method, comprising:
acquiring audio information of a plurality of target objects;
based on the audio information of the target objects, extracting the voiceprint features of the target objects through a pre-trained first deep learning model to obtain a voiceprint feature pool;
clustering each voiceprint feature in the voiceprint feature pool through a clustering algorithm to obtain at least one cluster, calculating a similarity value for each cluster taken as a whole, and sorting the clusters according to their similarity;
determining the target objects contained in the corresponding cluster as fraud target objects based on the sorting result of the clusters;
wherein the clustering processing comprises the following steps: step 1, randomly extracting, without replacement, a non-clustered feature point from the feature vector pool as an initial member of a cluster; step 2, sequentially checking the average distance between each feature point in the feature vector pool and all members of the cluster, adding a feature point to the cluster if the distance is smaller than a specified threshold, and placing a feature point in a cluster of its own if its distances to all other feature points are larger than the specified threshold; and step 3, randomly extracting, without replacement, another non-clustered feature point from the feature vector pool and repeating steps 1 and 2 until all members of the feature vector pool have been clustered, whereupon the clustering is complete.
2. The method of claim 1, wherein before the acquiring of the audio information of the plurality of target objects, the method further comprises:
acquiring the communication voice information between the target object and the corresponding customer service;
carrying out voice separation on the target object and the call voice information of the corresponding customer service based on a corresponding voice separation algorithm;
and determining the audio information of the target object based on the audio of the call voice information between the target object and the corresponding customer service after voice separation.
3. The method of claim 2, wherein the voice separating the target object from the call voice information of the corresponding customer service based on the corresponding voice separation algorithm comprises at least one of:
carrying out voice separation on the call voice information of each target object and the corresponding customer service through a convolutional neural network and a recurrent neural network;
and carrying out voice separation on each target object and the call voice information of the corresponding customer service through a sliding window method and a K-Means clustering algorithm.
4. The method of claim 3, wherein the voice separating of the call voice information of each target object and the corresponding customer service through the convolutional neural network and the recurrent neural network comprises:
acquiring a spectrogram corresponding to the call voice information of the target object and the corresponding customer service;
extracting three-dimensional features of the spectrogram based on a pre-trained convolutional neural network, wherein the three-dimensional features comprise time dimension features, frequency dimension features and channel dimension features;
carrying out average pooling treatment on the three-dimensional characteristics in a frequency dimension to obtain pooled characteristics;
inputting the pooled features into a pre-trained recurrent neural network to obtain a segmentation label on a time dimension;
and performing voice separation on the target call audio based on the segmentation label on the time dimension.
5. The method of claim 3, wherein the voice separating each target object from the call voice information of the corresponding customer service through a sliding window method and a K-Means clustering algorithm comprises:
extracting voice features of the communication voice information of the target object and the corresponding customer service through a sliding window method based on a pre-trained second deep learning model to obtain a feature vector array;
performing K-Means clustering on the feature vector array to obtain a label of each feature vector;
and carrying out voice separation on the communication voice information of each target object and the corresponding customer service based on the label of each feature vector.
6. The method according to any one of claims 2-5, wherein determining the audio information of the target object based on the voice-separated audio of the call voice information between the target object and the corresponding customer service comprises:
extracting voiceprint features from the voice-separated audio of the call voice information between the target object and the corresponding customer service;
calculating the similarity between the voiceprint features of the voice-separated audio and the pre-stored voiceprint features of at least one customer service;
and determining the audio information of the target object based on the similarity calculation result.
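The similarity test can be sketched as follows. Everything concrete here is invented for illustration: the 4-dimensional voiceprints, the agent names, and the 0.9 cosine threshold (real embeddings have hundreds of dimensions and a tuned threshold).

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two voiceprint vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical pre-stored customer-service voiceprints.
agent_prints = {"agent_01": np.array([0.9, 0.1, 0.0, 0.1]),
                "agent_02": np.array([0.1, 0.8, 0.2, 0.0])}

seg_a = np.array([0.88, 0.12, 0.05, 0.1])   # separated channel A
seg_b = np.array([0.2, 0.1, 0.9, 0.4])      # separated channel B

def is_customer(seg, prints, threshold=0.9):
    """Claim 6's rule, restated: a separated segment that matches no
    stored agent voiceprint is attributed to the target object."""
    return max(cosine_sim(seg, p) for p in prints.values()) < threshold

print(is_customer(seg_a, agent_prints))  # False: closely matches agent_01
print(is_customer(seg_b, agent_prints))  # True: the target object's audio
```

The design choice worth noting is that only the agents' voiceprints need to be enrolled in advance; the customer side is identified by elimination.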
7. An apparatus for fraud identification, comprising:
the first acquisition module is used for acquiring audio information of a plurality of target objects;
the extraction module is used for extracting the voiceprint features of the target objects through a pre-trained first deep learning model based on the audio information of the target objects to obtain a voiceprint feature pool;
the sorting module is used for clustering each voiceprint feature in the voiceprint feature pool through a clustering algorithm to obtain at least one cluster, calculating a similarity value for each cluster taken as a whole, and sorting the clusters by their similarity; the clustering process comprises the following steps: step 1, randomly selecting, without replacement, an unclustered feature point from the feature vector pool as the initial member of a cluster; step 2, sequentially computing the average distance between each feature point in the feature vector pool and all current members of the cluster, adding a feature point to the cluster if this average distance is smaller than a specified threshold, and placing a feature point in its own cluster if its distances to all other feature points exceed the specified threshold; step 3, randomly selecting, without replacement, another unclustered feature point from the feature vector pool and repeating steps 1 and 2 until every member of the feature vector pool has been clustered, at which point clustering is complete;
and the first determining module is used for determining the target object contained in the corresponding cluster as a fraudulent target object based on the sorting result of each cluster.
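The three clustering steps in the sorting module can be sketched as below. One deliberate deviation: the seed point is drawn deterministically (`pop(0)`) rather than randomly, so the example is reproducible; the Euclidean distance metric and the threshold value are illustrative assumptions.

```python
import numpy as np

def cluster_voiceprints(pool, threshold):
    """Threshold clustering per the sorting module: take an unclustered
    feature point as a seed (step 1), then greedily absorb every point
    whose average distance to the current cluster members is below the
    threshold (step 2); repeat until the pool is exhausted (step 3).
    Points far from everything end up in singleton clusters."""
    pool = [np.asarray(p, dtype=float) for p in pool]
    unclustered = list(range(len(pool)))
    clusters = []
    while unclustered:
        seed = unclustered.pop(0)       # step 1 (deterministic stand-in for a random draw)
        members = [seed]
        changed = True
        while changed:                  # step 2: average-distance test, rescanned after growth
            changed = False
            for i in list(unclustered):
                avg = np.mean([np.linalg.norm(pool[i] - pool[m]) for m in members])
                if avg < threshold:
                    members.append(i)
                    unclustered.remove(i)
                    changed = True
        clusters.append(members)        # step 3: repeat with the next seed
    return clusters

pts = [[0, 0], [0.1, 0], [5, 5], [5.1, 5], [10, 0]]
print(cluster_voiceprints(pts, threshold=1.0))
# [[0, 1], [2, 3], [4]]
```

Large clusters then correspond to the same voice recurring across many applications, which is why the first determining module can flag their members as fraudulent target objects.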
8. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors to perform the fraud identification method according to any one of claims 1 to 6.
9. A computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to perform the fraud identification method according to any one of claims 1 to 6.
CN201911006499.6A 2019-10-22 2019-10-22 Fraud identification method and device, electronic equipment and computer-readable storage medium Active CN110648670B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911006499.6A CN110648670B (en) 2019-10-22 2019-10-22 Fraud identification method and device, electronic equipment and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911006499.6A CN110648670B (en) 2019-10-22 2019-10-22 Fraud identification method and device, electronic equipment and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN110648670A CN110648670A (en) 2020-01-03
CN110648670B true CN110648670B (en) 2021-11-26

Family

ID=69013425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911006499.6A Active CN110648670B (en) 2019-10-22 2019-10-22 Fraud identification method and device, electronic equipment and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN110648670B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111429920B (en) * 2020-03-30 2024-01-23 北京奇艺世纪科技有限公司 User distinguishing method, user behavior library determining method, device and equipment
CN111641756B (en) * 2020-05-13 2021-11-16 广州国音智能科技有限公司 Fraud identification method, device and computer readable storage medium
CN111968650B (en) * 2020-08-17 2024-04-30 科大讯飞股份有限公司 Voice matching method and device, electronic equipment and storage medium
CN113393318A (en) * 2021-06-10 2021-09-14 中国工商银行股份有限公司 Bank card application wind control method and device, electronic equipment and medium
CN113571090A (en) * 2021-07-23 2021-10-29 中信银行股份有限公司 Voiceprint feature validity detection method and device and electronic equipment
CN114155080B (en) * 2021-09-29 2024-06-18 东方微银科技股份有限公司 Fraud identification method, equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1283843A (en) * 1999-08-10 2001-02-14 国际商业机器公司 Speech datas extraction
CN102231277A (en) * 2011-06-29 2011-11-02 电子科技大学 Method for protecting mobile terminal privacy based on voiceprint recognition
EP2808866A1 (en) * 2013-05-31 2014-12-03 Nuance Communications, Inc. Method and apparatus for automatic speaker-based speech clustering
CN106157135A (en) * 2016-07-14 2016-11-23 微额速达(上海)金融信息服务有限公司 Antifraud system and method based on Application on Voiceprint Recognition Sex, Age
US10003688B1 (en) * 2018-02-08 2018-06-19 Capital One Services, Llc Systems and methods for cluster-based voice verification
CN108417226A (en) * 2018-01-09 2018-08-17 平安科技(深圳)有限公司 Speech comparison method, terminal and computer readable storage medium
CN109493882A (en) * 2018-11-04 2019-03-19 国家计算机网络与信息安全管理中心 A kind of fraudulent call voice automatic marking system and method
CN110223165A (en) * 2019-06-14 2019-09-10 哈尔滨哈银消费金融有限责任公司 A kind of anti-fraud and credit risk forecast method and system based on related network
CN110718228A (en) * 2019-10-22 2020-01-21 中信银行股份有限公司 Voice separation method and device, electronic equipment and computer readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11039012B2 (en) * 2019-07-30 2021-06-15 Nice Ltd Method and system for proactive fraudster exposure in a customer service channel

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Deep application of AI-enabled voiceprint recognition technology in the public security field; Liu Hongyin; China Security & Protection; 2019-06-01; full text *
Voiceprint identification based on model clustering; Jian Hua; 2013 3rd International Conference on Consumer Electronics, Communications and Networks; 2014-01-09; full text *
Research on pattern matching of voiceprint recognition in WeChat; Yu Xian; China Master's Theses Full-text Database, Information Science and Technology; 2016-03-15; full text *

Also Published As

Publication number Publication date
CN110648670A (en) 2020-01-03

Similar Documents

Publication Publication Date Title
CN110648670B (en) Fraud identification method and device, electronic equipment and computer-readable storage medium
CN110718228B (en) Voice separation method and device, electronic equipment and computer readable storage medium
CN110265037B (en) Identity verification method and device, electronic equipment and computer readable storage medium
US5917941A (en) Character segmentation technique with integrated word search for handwriting recognition
US20040260550A1 (en) Audio processing system and method for classifying speakers in audio data
US20070233484A1 (en) Method for Automatic Speaker Recognition
CN110390946A (en) A kind of audio signal processing method, device, electronic equipment and storage medium
CN111126396A (en) Image recognition method and device, computer equipment and storage medium
WO2004088632A2 (en) Speaker recognition using local models
CN111428028A (en) Information classification method based on deep learning and related equipment
EP1005019B1 (en) Segment-based similarity measurement method for speech recognition
CN111538809A (en) Voice service quality detection method, model training method and device
CN112735437A (en) Voiceprint comparison method, system and device and storage mechanism
US6243695B1 (en) Access control system and method therefor
CN112632248A (en) Question answering method, device, computer equipment and storage medium
CN112700766B (en) Training method and device of voice recognition model, and voice recognition method and device
CN105006231A (en) Distributed large population speaker recognition method based on fuzzy clustering decision tree
CN104166855B (en) Visual speech recognition methods
CN110738985A (en) Cross-modal biometric feature recognition method and system based on voice signals
US20200387691A1 (en) A quick match algorithm for biometric data
CN111933153B (en) Voice segmentation point determining method and device
CN104166837A (en) Method of visual voice recognition with selection of groups of most relevant points of interest
CN106971725B (en) Voiceprint recognition method and system with priority
Divakar Multimodal biometric system using index based algorithm for fast search
CN112820323B (en) Method and system for adjusting response queue priority based on client voice

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant