CN115063612A - Fraud early warning method, device, equipment and storage medium based on face-examination video - Google Patents

Fraud early warning method, device, equipment and storage medium based on face-examination video

Info

Publication number
CN115063612A
CN115063612A (application CN202210585741.5A)
Authority
CN
China
Prior art keywords: image, background, face, examination, video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210585741.5A
Other languages
Chinese (zh)
Inventor
袁宏进
曾凡涛
刘玉宇
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202210585741.5A
Publication of CN115063612A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761: Proximity, similarity or dissimilarity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of big data and discloses a fraud early-warning method, device, equipment and storage medium based on a face-examination video. The method comprises the following steps: performing background segmentation on the obtained face-examination image to obtain a background face-examination image, and performing feature extraction on the background face-examination image to obtain a feature background image; calculating a first similarity between the feature background image and the historical image features in a face-examination black-background image library, and a second similarity between the feature background image and the images in a preset face-examination whitelist feature library, and obtaining a first early-warning background image and a second early-warning background image from the feature background image based on the first and second similarities; and aggregating the first early-warning background image and the second early-warning background image, identifying and detecting the face-examination scene according to the aggregation result, and issuing an early warning according to the detection result. The scheme searches historical data for the scenes of potential group fraud cases, solves the technical problem of low accuracy in fraud-scene identification, and reduces the false-alarm rate for invalid scenes.

Description

Fraud early warning method, device, equipment and storage medium based on face-examination video
Technical Field
The invention relates to the technical field of big data, and in particular to a fraud early-warning method, device, equipment and storage medium based on a face-examination video.
Background
In the field of financial risk control, the development of online network technology has led in recent years to a rise in group fraud that exploits loopholes in online remote video auditing. For example, multiple people commit fraud in the same site, a large number of service applications are concentrated at one location, and the same site and scene appear in the video background of different applications.
To search historical data for the scenes of potential or unknown group fraud cases, existing image retrieval technology uses a convolutional neural network to obtain global and local features, so that visually similar images can be matched. In field application, however, many similar false alarms irrelevant to the business appear, for example white walls, ceilings, car interiors and chain stores. These greatly reduce the recall performance of the retrieval system for fraud cases, and large amounts of manual review are consumed to find potential background fraud risks, a huge loss in financial fraud control. How to effectively and promptly discover the black-background fraud case images that manual review misses, improve the accuracy of fraud-scene identification, and thereby reduce the false-alarm rate for invalid scenes, is therefore a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The main aim of the invention is to search historical data for the scenes of potential or unknown group fraud cases, solve the technical problem of low accuracy in fraud-scene identification, and reduce the false-alarm rate for invalid scenes.
A first aspect of the present invention provides a fraud early-warning method based on a face-examination video, which comprises the following steps: acquiring a face-examination video, and extracting the face-examination video to obtain a face-examination image; inputting the face-examination image into a preset portrait background segmentation model for background segmentation to obtain a background face-examination image, and performing feature extraction on the background face-examination image to obtain a feature background image of the background face-examination image; calculating a first similarity between the feature background image and the historical image features in a preset face-examination black-background image library, and selecting images whose first similarity is greater than a preset threshold from the feature background image to obtain a first early-warning background image; calculating a second similarity between the feature background image and the images in a preset face-examination whitelist feature library, and obtaining a second early-warning background image from the feature background image based on the second similarity; and aggregating the first early-warning background image and the second early-warning background image to obtain a multi-modal feature vector, inputting the multi-modal feature vector into a pre-trained fraud detection model for fraud detection, and issuing an early warning according to the fraud detection result.
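The two similarity screens in the steps above can be sketched end to end. The code below is an illustrative sketch only: the patent does not specify the similarity measure or the whitelist selection rule, so cosine similarity and the "keep images that match no whitelisted benign scene" rule are assumptions, and all function names are hypothetical.

```python
import numpy as np

def cosine_sim(vec, bank):
    # cosine similarity between one feature vector and each row of a bank
    v = vec / np.linalg.norm(vec)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    return b @ v

def early_warning_images(feature_bg, black_bank, white_bank, thresh=0.8):
    """Return (first_warning, second_warning) index lists.

    first_warning : images whose best match in the black-background
                    library exceeds the threshold (step of the first
                    similarity)
    second_warning: images that resemble no whitelisted benign scene
                    (assumed rule; the patent only says selection is
                    "based on" the second similarity)
    """
    first, second = [], []
    for i, f in enumerate(feature_bg):
        if cosine_sim(f, black_bank).max() > thresh:
            first.append(i)
        if cosine_sim(f, white_bank).max() < thresh:
            second.append(i)
    return first, second
```

With a black-library vector [1, 0] and a whitelist vector [0, 1], a query background [1, 0] triggers both screens while [0, 1] triggers neither.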
Optionally, in a first implementation manner of the first aspect of the present invention, the acquiring a face-examination video and extracting the face-examination video to obtain a face-examination image comprises: when a face examination is performed, receiving the face-examination video through preset audio and video acquisition equipment, and acquiring the total number of frames of the face-examination video; and extracting the image corresponding to each frame from the face-examination video based on the total number of frames to obtain the face-examination image.
Optionally, in a second implementation manner of the first aspect of the present invention, the portrait background segmentation model comprises an image segmentation network and a background image recognition network, and the inputting the face-examination image into a preset portrait background segmentation model for background segmentation to obtain a background face-examination image comprises: inputting the face-examination image into the image segmentation network in the preset portrait background segmentation model, and performing portrait segmentation on the face-examination image through the image segmentation network to obtain a portrait segmentation image, wherein the portrait segmentation image comprises portrait images and non-portrait images; performing face recognition on the portrait segmentation image to obtain portrait images containing a face and non-portrait images containing no face; and inputting the non-portrait images containing no face into the background image recognition network in the portrait background segmentation model, and recognizing them through the background image recognition network to obtain a background image.
Optionally, in a third implementation manner of the first aspect of the present invention, the performing portrait segmentation on the face-examination image through the image segmentation network to obtain a portrait segmentation image comprises: inputting the face-examination image into a convolution layer in the image segmentation network, and performing convolution processing on it through the convolution layer to generate a convolution image; performing dimensionality reduction on the convolution image based on a pyramid pooling layer in the image segmentation network; and outputting the reduced convolution image through a fully connected layer in the image segmentation network to obtain the portrait segmentation image of the face-examination image, wherein the portrait segmentation image comprises portrait images containing a face and non-portrait images containing no face.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the recognizing, through the background image recognition network, the non-portrait images containing no face to obtain a background image comprises: inputting the non-portrait images containing no face into a sampling layer in the background image recognition network, and up-sampling them through the sampling layer to obtain a sampled image; background-encoding the sampled image based on an encoder in the background image recognition network to obtain a background-encoded image; and sequence-decoding the background-encoded image through a decoder in the background image recognition network to obtain the background image.
Optionally, in a fifth implementation manner of the first aspect of the present invention, the performing feature extraction on the background face-examination image to obtain a feature background image of the background face-examination image comprises: inputting the background face-examination image into a convolution layer of a preset image feature extraction model, and performing a convolution operation on it through the convolution layer to obtain an initial background feature map; normalizing the initial background feature map through a batch normalization layer of the image feature extraction model to obtain a standard background feature map; outputting the standard background feature map based on an activation function of the image feature extraction model; and performing multi-scale feature fusion on the standard background feature map to obtain the feature background image.
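The feature-extraction chain of this implementation manner (batch normalization, activation, multi-scale fusion) can be sketched as follows. The per-channel normalization axes, the ReLU activation and the mean-pooling fusion rule are illustrative assumptions, and all function names are hypothetical.

```python
import numpy as np

def batch_norm(feat, eps=1e-5):
    # normalize a (channels, H, W) feature map to zero mean and
    # unit variance per channel, as a batch normalization layer does
    mu = feat.mean(axis=(1, 2), keepdims=True)
    var = feat.var(axis=(1, 2), keepdims=True)
    return (feat - mu) / np.sqrt(var + eps)

def relu(x):
    # a common activation choice (the patent does not name one)
    return np.maximum(x, 0.0)

def multiscale_fuse(feat, scales=(1, 2)):
    # pool each channel over grids of several sizes and concatenate
    # the results into one multi-scale feature vector
    out = []
    for s in scales:
        rows = np.array_split(np.arange(feat.shape[1]), s)
        cols = np.array_split(np.arange(feat.shape[2]), s)
        for r in rows:
            for c in cols:
                out.append(feat[:, r][:, :, c].mean(axis=(1, 2)))
    return np.concatenate(out)
```

For a 2-channel 4x4 map and scales (1, 2), the fused vector has 2 x (1 + 4) = 10 entries.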
Optionally, in a sixth implementation manner of the first aspect of the present invention, before the inputting the multi-modal feature vectors into a pre-trained fraud detection model for fraud detection and performing early warning according to a fraud detection result, the method further includes: extracting an image data sample from the video sample, and acquiring an image sample characteristic vector of the extracted image data sample; merging the image sample characteristic vector and the sample image characteristic vector to obtain a sample video characteristic vector; and training a machine learning model according to the sample video feature vector and a fraud label corresponding to the video sample to obtain a fraud detection model.
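The training step of the sixth implementation manner (merge feature vectors, then fit a model against fraud labels) can be sketched as follows. The patent does not name the machine learning model, so a minimal logistic-regression stand-in is used; `merge_features` and `train_fraud_detector` are hypothetical names.

```python
import numpy as np

def merge_features(image_vec, sample_vec):
    # concatenate per-image features with sample-level features
    return np.concatenate([image_vec, sample_vec])

def train_fraud_detector(X, y, lr=0.5, epochs=500):
    # minimal logistic-regression stand-in for the unnamed ML model
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted fraud probability
        g = p - y                               # gradient of the log-loss
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b
```

After training on labelled sample video feature vectors, the scores `X @ w + b` rank fraudulent samples above non-fraudulent ones.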
A second aspect of the present invention provides a fraud early-warning device based on a face-examination video, which comprises: an extraction module for acquiring a face-examination video and extracting it to obtain a face-examination image; a segmentation module for inputting the face-examination image into a preset portrait background segmentation model for background segmentation to obtain a background face-examination image, and extracting features of the background face-examination image to obtain a feature background image of the background face-examination image; a first calculation module for calculating a first similarity between the feature background image and the historical image features in a preset face-examination black-background image library, and selecting images whose first similarity is greater than a preset threshold from the feature background image to obtain a first early-warning background image; a second calculation module for calculating a second similarity between the feature background image and the images in a preset face-examination whitelist feature library, and obtaining a second early-warning background image from the feature background image based on the second similarity; and an aggregation module for aggregating the first early-warning background image and the second early-warning background image to obtain a multi-modal feature vector, inputting the multi-modal feature vector into a pre-trained fraud detection model for fraud detection, and issuing an early warning according to the fraud detection result.
Optionally, in a first implementation manner of the second aspect of the present invention, the extraction module is specifically configured to: when a face examination is performed, receive the face-examination video through preset audio and video acquisition equipment, and acquire the total number of frames of the face-examination video; and extract the image corresponding to each frame from the face-examination video based on the total number of frames to obtain the face-examination image.
Optionally, in a second implementation manner of the second aspect of the present invention, the segmentation module comprises: a segmentation unit for inputting the face-examination image into an image segmentation network in the preset portrait background segmentation model, and performing portrait segmentation on it through the image segmentation network to obtain a portrait segmentation image comprising portrait images and non-portrait images; a face recognition unit for performing face recognition on the portrait segmentation image to obtain portrait images containing a face and non-portrait images containing no face; and a background recognition unit for inputting the non-portrait images containing no face into a background image recognition network in the portrait background segmentation model, and recognizing them through the background image recognition network to obtain a background image.
Optionally, in a third implementation manner of the second aspect of the present invention, the segmentation unit is specifically configured to: input the face-examination image into a convolution layer in the image segmentation network, and perform convolution processing on it through the convolution layer to generate a convolution image; perform dimensionality reduction on the convolution image based on a pyramid pooling layer in the image segmentation network; and output the reduced convolution image through a fully connected layer in the image segmentation network to obtain the portrait segmentation image of the face-examination image, wherein the portrait segmentation image comprises portrait images containing a face and non-portrait images containing no face.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the background recognition unit is specifically configured to: input the non-portrait images containing no face into a sampling layer in the background image recognition network, and up-sample them through the sampling layer to obtain a sampled image; background-encode the sampled image based on an encoder in the background image recognition network to obtain a background-encoded image; and sequence-decode the background-encoded image through a decoder in the background image recognition network to obtain the background image.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the segmentation module is further specifically configured to: input the background face-examination image into a convolution layer of a preset image feature extraction model, and perform a convolution operation on it through the convolution layer to obtain an initial background feature map; normalize the initial background feature map through a batch normalization layer of the image feature extraction model to obtain a standard background feature map; output the standard background feature map based on an activation function of the image feature extraction model; and perform multi-scale feature fusion on the standard background feature map to obtain the feature background image.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the fraud early-warning device based on a face-examination video further comprises: an acquisition module for extracting an image data sample from a video sample and acquiring an image sample feature vector of the extracted image data sample; a merging module for merging the image sample feature vector and the sample image feature vector to obtain a sample video feature vector; and a training module for training a machine learning model according to the sample video feature vector and the fraud label corresponding to the video sample to obtain the fraud detection model.
A third aspect of the present invention provides fraud early-warning equipment based on a face-examination video, which comprises: a memory storing instructions and at least one processor, the memory and the at least one processor being interconnected by a line;
the at least one processor invokes the instructions in the memory to cause the fraud early-warning equipment based on a face-examination video to perform the steps of the fraud early-warning method based on a face-examination video described above.
A fourth aspect of the present invention provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to perform the steps of the fraud early-warning method based on a face-examination video described above.
According to the technical scheme provided by the invention, background segmentation is performed on the obtained face-examination image to obtain a background face-examination image, and feature extraction is performed on the background face-examination image to obtain a feature background image; a first similarity between the feature background image and the historical image features in the face-examination black-background image library and a second similarity between the feature background image and the images in the preset face-examination whitelist feature library are calculated, and a first early-warning background image and a second early-warning background image are obtained from the feature background image based on the first and second similarities; and the first and second early-warning background images are aggregated, the face-examination scene is identified and detected according to the aggregation result, and an early warning is issued according to the detection result. The scheme searches historical data for the scenes of potential group fraud cases, solves the technical problem of low accuracy in fraud-scene identification, and reduces the false-alarm rate for invalid scenes.
Drawings
FIG. 1 is a schematic diagram of a first embodiment of the fraud early-warning method based on a face-examination video provided by the invention;
FIG. 2 is a schematic diagram of a second embodiment of the fraud early-warning method based on a face-examination video provided by the invention;
FIG. 3 is a schematic diagram of a third embodiment of the fraud early-warning method based on a face-examination video provided by the invention;
FIG. 4 is a schematic diagram of a fourth embodiment of the fraud early-warning method based on a face-examination video provided by the invention;
FIG. 5 is a schematic diagram of a fifth embodiment of the fraud early-warning method based on a face-examination video provided by the invention;
FIG. 6 is a schematic diagram of a first embodiment of the fraud early-warning device based on a face-examination video provided by the invention;
FIG. 7 is a schematic diagram of a second embodiment of the fraud early-warning device based on a face-examination video provided by the invention;
FIG. 8 is a schematic diagram of an embodiment of the fraud early-warning equipment based on a face-examination video provided by the invention.
Detailed Description
The embodiment of the invention provides a fraud early-warning method, device, equipment and storage medium based on a face-examination video. In the technical scheme of the invention, background segmentation is first performed on the obtained face-examination image to obtain a background face-examination image, and feature extraction is performed on the background face-examination image to obtain a feature background image; a first similarity between the feature background image and the historical image features in the face-examination black-background image library and a second similarity between the feature background image and the images in the preset face-examination whitelist feature library are calculated, and a first early-warning background image and a second early-warning background image are obtained from the feature background image based on the first and second similarities; and the first and second early-warning background images are aggregated, the face-examination scene is identified and detected according to the aggregation result, and an early warning is issued according to the detection result. The scheme searches historical data for the scenes of potential group fraud cases, solves the technical problem of low accuracy in fraud-scene identification, and reduces the false-alarm rate for invalid scenes.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a detailed flow of an embodiment of the present invention is described below. Referring to fig. 1, a first embodiment of the fraud early-warning method based on a face-examination video according to an embodiment of the present invention comprises:
101. acquiring a face examination video, and extracting the face examination video to obtain a face examination image;
in this embodiment, a face examination video is obtained, and the face examination video is extracted to obtain a face examination image.
Specifically, the face-examination video is generated from different users' online face-examination scenes, for example a video stream generated by user A's online face examination at company B, or the online face-examination video generated when user C applies for a start-up loan from company D. Further, different face-examination images may exist in the face-examination video; to better identify whether a user commits fraud in the video, the invention extracts face-examination images from the video so as to accurately locate the background image in the face-examination video and thereby identify fraudulent behavior.
Illustratively, if the total number of frames of the face-examination video is N, the start frame of an image in the video is denoted S, and its end frame is denoted E, the corresponding image sampling method is: starting from frame S, attempt image extraction on each frame; if extraction fails (no image can be detected), continue to the next frame; if extraction succeeds, take the extracted frame as the face-examination image and end the traversal.
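The traversal just described can be sketched as follows; a list of decoded frames stands in for the real video stream, and `detect` is a placeholder for the image detector, both assumptions not fixed by the patent.

```python
def extract_review_image(frames, start, end, detect=None):
    """Traverse frames[start..end] inclusive; return the first frame on
    which image extraction (detection) succeeds, else None.

    frames : list of decoded frames (stand-in for a real video stream)
    detect : callable returning True when a usable image is detected;
             defaults to treating any non-None frame as usable
    """
    detect = detect or (lambda f: f is not None)
    for i in range(start, min(end + 1, len(frames))):
        if detect(frames[i]):   # extraction succeeded: stop the traversal
            return frames[i]
    return None                 # no usable frame between start and end
```

In practice the frame list would come from a video decoder such as OpenCV's `VideoCapture`, which also reports the total frame count N.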
Further, in this embodiment, the pre-trained portrait background segmentation model includes a DeepLabv3+ neural network, which is used to segment the portrait and the background in the image so as to locate the background image in the face-examination video more accurately.
102. Inputting the face examination image into a preset portrait background segmentation model for background segmentation to obtain a background face examination image, and performing feature extraction on the background face examination image to obtain a feature background image of the background face examination image;
In this embodiment, the face-examination image is input into a preset portrait background segmentation model for background segmentation to obtain a background face-examination image. It should be noted that a portrait image is an image containing only a face; a non-portrait image includes background images and portrait-background images, where a background image contains no face and a portrait-background image contains both a portrait and background. Based on portrait segmentation, portrait images can be screened out of the face-examination image, guaranteeing the accuracy of subsequent background-image extraction.
In an alternative embodiment, the image segmentation network comprises a convolution layer, a pyramid pooling layer and a fully connected layer. The face-examination image is segmented using the image segmentation network in the portrait background segmentation model to obtain a portrait segmentation image comprising portrait images and non-portrait images: the convolution layer convolves the face-examination image to generate a convolution image, the pyramid pooling layer reduces the dimensionality of the convolution image, and the fully connected layer outputs the reduced convolution image to obtain the portrait segmentation image of the face-examination image.
Further, convolution of the face-examination image is realised by the convolution kernels of the convolution layer so as to extract its feature image; dimensionality reduction of the convolution image is realised by the pooling operation of the pyramid pooling layer; and the reduced convolution image is output through the activation function of the fully connected layer, such as a softmax function.
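A minimal single-channel sketch of this forward path (convolution, pyramid pooling, softmax output) is shown below. Kernel sizes, the max-pooling choice inside the pyramid and all function names are illustrative assumptions, not details given by the patent.

```python
import numpy as np

def conv2d(img, kernel):
    # 'valid' 2-D cross-correlation, single channel, stride 1
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

def pyramid_pool(feat, bins=(1, 2)):
    # pool the feature map over several grid sizes and concatenate:
    # this is the dimensionality reduction of the pyramid pooling layer
    pooled = []
    for b in bins:
        rows = np.array_split(np.arange(feat.shape[0]), b)
        cols = np.array_split(np.arange(feat.shape[1]), b)
        for r in rows:
            for c in cols:
                pooled.append(feat[np.ix_(r, c)].max())
    return np.array(pooled)

def softmax(z):
    # output activation of the fully connected layer
    e = np.exp(z - z.max())
    return e / e.sum()
```

A fully connected layer would then map the pooled vector to class logits, e.g. `softmax(W @ pyramid_pool(conv2d(img, k)))` for a learned weight matrix `W`.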
In an optional embodiment, the background image recognition network comprises a sampling layer, an encoder and a decoder. Selecting the background image from the non-portrait images using the background image recognition network in the portrait background segmentation model comprises: up-sampling the portrait segmentation image through the sampling layer in the background image recognition network to obtain a sampled image, background-encoding the sampled image through the encoder in the background image recognition network to obtain a background-encoded image, and sequence-decoding the background-encoded image through the decoder in the background image recognition network to obtain the background image.
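The sampling-encoding-decoding chain can be sketched with toy stand-ins: nearest-neighbour up-sampling for the sampling layer, a linear projection for the encoder, and its least-squares inverse for the decoder. None of these concrete choices come from the patent; they only illustrate the data flow.

```python
import numpy as np

def upsample(img, factor=2):
    # nearest-neighbour up-sampling: stand-in for the sampling layer
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def encode(img, W):
    # toy "background encoding": project the flattened image with W
    return W @ img.ravel()

def decode(code, W, shape):
    # toy "sequence decoding": least-squares inverse of the encoder
    flat, *_ = np.linalg.lstsq(W, code, rcond=None)
    return flat.reshape(shape)
```

When `W` has full column rank, `decode(encode(img, W), W, img.shape)` recovers the image, mirroring how the decoder reconstructs the background from the encoded representation.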
103. Calculating a first similarity between the feature background image and the historical image features in the preset face-examination black-background image library, and selecting images whose first similarity is greater than a preset threshold from the feature background image to obtain a first early-warning background image;
in this embodiment, a first similarity between the characteristic background image and the historical image features in the preset face-examination blacklist background image library is calculated, and images whose first similarity is greater than a preset threshold are selected from the characteristic background images to obtain a first early-warning background image. The historical image features in the face-examination blacklist background image library are obtained by extracting features from fraud background images collected in fraud scenes; that is, the images in the blacklist library are all fraud background images. Therefore, in this embodiment, by calculating the first similarity between the characteristic background image and the images in the blacklist library and selecting the images whose first similarity is greater than the preset threshold, known fraud backgrounds are screened out of the face-examination video, so that fraud present in the video can be flagged early and the user is helped to make a better judgment.
In this embodiment, before calculating the first similarity between the feature background image and the images in the face-examination blacklist background image library, the method further includes: clustering the images in the blacklist library, that is, grouping images of the same type into one class, so as to speed up subsequent retrieval from the library.
104. Calculating a second similarity between the characteristic background image and an image in a preset face examination white list characteristic library, and obtaining a second early warning background image from the characteristic background image based on the second similarity;
in this embodiment, a second similarity between the feature background image and an image in the preset face examination white list feature library is calculated, and a second early warning background image is obtained from the feature background image based on the second similarity.
Specifically, the face-examination whitelist feature library is obtained by collecting non-fraud background images in non-fraud scenes; that is, the images in the whitelist library are all non-fraud background images. Therefore, in this embodiment of the present invention, by calculating the second similarity between the feature background image and the images in the whitelist library and selecting from the feature background images those whose second similarity is not greater than a preset threshold, backgrounds that resemble no known legitimate scene are screened out of the face-examination video, so that fraud present in the video can be flagged early and the user is helped to make a better judgment.
In this embodiment, before calculating the second similarity between the feature background image and the images in the face-examination whitelist feature library, the method further includes: clustering the images in the whitelist library, that is, grouping images of the same type into one class, so as to speed up subsequent retrieval from the library.
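The whitelist check inverts the blacklist logic: an image is suspicious when it matches *no* known non-fraud background. A sketch, again assuming cosine similarity and an illustrative threshold:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two feature vectors (assumed measure)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def screen_whitelist(features, whitelist, threshold=0.8):
    """Return indices of images whose second similarity to every known
    non-fraud background is NOT greater than the threshold
    (the second early-warning images)."""
    return [i for i, f in enumerate(features)
            if max(cosine_sim(f, w) for w in whitelist) <= threshold]

whitelist = [np.array([1.0, 0.0])]           # known legitimate background feature
features = [np.array([0.99, 0.01]),          # matches a clean background -> not flagged
            np.array([0.0, 1.0])]            # unknown background -> flagged
flagged = screen_whitelist(features, whitelist)
```

Combining both screens, an image similar to the blacklist or dissimilar to the whole whitelist ends up in the early-warning set.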
105. And aggregating the first early warning background image and the second early warning background image to obtain a multi-modal feature vector, inputting the multi-modal feature vector into a pre-trained fraud detection model for fraud detection, and early warning according to a fraud detection result.
In this embodiment, the first early warning background image and the second early warning background image are aggregated to obtain a multi-modal feature vector, the multi-modal feature vector is input into a pre-trained fraud detection model to perform fraud detection, and early warning is performed according to a fraud detection result.
Specifically, the fraud detection model may be a machine learning model such as a binary classifier or an SVM model. In this case, the input of the fraud detection model is set in advance to the multi-modal feature vector, and the output is the fraud detection result of the face-examination video, where the result may be either a fraud video or a non-fraud video.
Taking as an example a fraud detection model built as a classifier with a softmax output, the server inputs the multi-modal feature vector into the pre-trained fraud detection model for face-examination video, and the model outputs the probability that the video to be detected is a fraud video and the probability that it is a non-fraud video, so that the fraud detection result of the face-examination video is determined according to these probabilities.
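Such a softmax head can be sketched as follows; the weight matrix and feature vector below are made-up illustrative values, standing in for a trained model:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def detect_fraud(multimodal_vec, W, b):
    """Binary softmax head over the aggregated multi-modal feature vector.
    Returns the label and the fraud probability."""
    p_fraud, p_clean = softmax(W @ multimodal_vec + b)
    return ("fraud" if p_fraud > p_clean else "non-fraud"), float(p_fraud)

vec = np.array([0.2, 0.9, 0.4])        # toy multi-modal feature vector
W = np.array([[1.0, 2.0, 0.0],         # hypothetical trained weights
              [0.5, 0.1, 0.3]])
b = np.array([0.0, 0.0])
label, p = detect_fraud(vec, W, b)
```

The early warning is then raised whenever the fraud-class probability dominates.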
According to the fraud detection method for face-examination video, the image data and the voice data in the face-examination video are obtained and the feature vectors corresponding to each are determined, so that after the image feature vectors and the voice feature vectors are combined, the fraud detection result of the face-examination video to be detected is obtained by feeding the combined multi-modal feature vector to the fraud detection model. By fusing the image feature vectors and the voice feature vectors in the video, the scheme effectively represents the feature information of facial expressions or body movements, of the speech emotion of the interviewed subject, and/or of the spoken content; this increases the amount of feature information, improves its comprehensiveness and diversity, and effectively improves the accuracy of fraud detection on face-examination video.
In the embodiment of the invention, background segmentation is performed on the obtained face-examination image to obtain a background face-examination image, and features of the background face-examination image are extracted to obtain a characteristic background image; a first similarity between the characteristic background image and the historical image features in the face-examination blacklist background image library, and a second similarity between the characteristic background image and the images in the preset face-examination whitelist feature library, are calculated, and a first early-warning background image and a second early-warning background image are obtained from the characteristic background images based on the two similarities; the first and second early-warning background images are then aggregated, the face-examination scene is identified and checked according to the aggregation result, and an early warning is raised according to the detection result. By searching historical data for the scenes of potential gang-fraud cases, the scheme addresses the technical problem of low accuracy in fraud scene recognition and reduces the false-alarm rate on invalid scenes.
Referring to fig. 2, a fraud warning method based on a face-check video according to a second embodiment of the present invention includes:
201. acquiring a face examination video, and extracting the face examination video to obtain a face examination image;
202. inputting the face examination image into an image segmentation network in a preset portrait background segmentation model, and performing portrait segmentation on the face examination image through the image segmentation network to obtain a portrait segmentation image;
in this embodiment, the face-examination image is input into the image segmentation network in a preset portrait background segmentation model, and the face-examination image is segmented by the image segmentation network to obtain a portrait segmentation image. The image segmentation network comprises a convolution layer, a pyramid pooling layer and a fully connected layer, and segmenting the face-examination image by using the image segmentation network in the portrait background segmentation model comprises: convolving the face-examination image by using the convolution layer in the image segmentation network to generate a convolved image, reducing the dimension of the convolved image by using the pyramid pooling layer in the image segmentation network, and outputting the dimension-reduced convolved image by using the fully connected layer in the image segmentation network to obtain the portrait segmentation image of the face-examination image, wherein the portrait segmentation image comprises a portrait image and a non-portrait image.
Further, the convolution of the face-examination image is performed by the convolution kernels of the convolution layer so as to extract the feature map of the face-examination image; the dimension reduction of the convolved image is performed by the pooling operation in the pyramid pooling layer, for example max pooling; and the output of the dimension-reduced convolved image is performed by the activation function of the fully connected layer, for example a softmax function.
203. Carrying out face recognition on the portrait segmentation image to obtain a portrait image containing a face and a non-portrait image not containing the face;
in this embodiment, face recognition is performed on the portrait segmentation image to obtain a portrait image containing a face and a non-portrait image not containing a face. Face recognition is a computer technology that identifies human faces through analysis and comparison. It is an active research field of computer technology and includes face tracking and detection, automatic image magnification adjustment, night infrared detection, automatic exposure adjustment and other techniques. Face recognition belongs to biometric identification, which distinguishes individual organisms (usually people) by their biological features.
Specifically, the portrait image is an image containing only a face; the non-portrait image includes a background image and a portrait-background image, where the background image contains no face and the portrait-background image contains both a portrait and a background. Based on portrait segmentation, the portrait image can be screened out of the face-examination image, which guarantees the accuracy of the subsequent background image extraction.
204. Inputting the non-portrait image without the face into a background image recognition network in a portrait background segmentation model, and recognizing the non-portrait image without the face through the background image recognition network to obtain a background image;
in this embodiment, the non-portrait image not containing a face is input into the background image recognition network in the portrait background segmentation model, and the non-portrait image is recognized by the background image recognition network to obtain a background image. Specifically, the background image recognition network comprises a sampling layer, an encoder and a decoder, and selecting the background image from the non-portrait image by using the background image recognition network in the portrait background segmentation model comprises: up-sampling the portrait segmentation image by using the sampling layer in the background image recognition network to obtain a sampled image, background-encoding the sampled image by using the encoder in the background image recognition network to obtain a background-encoded image, and sequence-decoding the background-encoded image by using the decoder in the background image recognition network to obtain the background image.
Further, up-sampling refers to restoring the portrait segmentation map to a specified resolution. For example, after a (416, 416, 3) face-examination image undergoes the portrait segmentation operation, a (13, 13, 16) portrait segmentation map is obtained; to compare this map with the corresponding face-examination image, it must be brought back to the (416, 416, 3) size, and this operation is called up-sampling. Background encoding refers to masking out the non-background region of the sampled image, and sequence decoding restores the background region from the image whose background has been extracted by the mask.
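For instance, the spatial dimensions of a (13, 13, …) segmentation map can be brought back to 416 × 416 by nearest-neighbour repetition, since 13 × 32 = 416; the channel count is left unchanged in this sketch (in practice a learned layer would map the channels back as well):

```python
import numpy as np

seg = np.zeros((13, 13, 16))   # toy portrait segmentation map from the example above
# repeat each spatial cell 32x along height and width: 13 * 32 = 416
up = np.repeat(np.repeat(seg, 32, axis=0), 32, axis=1)
```

This is the simplest form of the up-sampling step; bilinear interpolation or learned deconvolution are common alternatives.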
205. Inputting the background face-examination image into the convolution layer of a preset image feature extraction model, and performing a convolution operation on it through the convolution layer to obtain an initial background feature map;
in this embodiment, the background face-examination image is input into the convolution layer of the preset image feature extraction model, and a convolution operation is performed on it through the convolution layer to obtain an initial background feature map. Each convolutional layer in a convolutional neural network consists of several convolution units, whose parameters are optimized by the back-propagation algorithm. The purpose of the convolution operation is to extract different input features: the first convolution layer can only extract low-level features such as edges, lines and corners, while deeper networks iteratively extract more complex features from these low-level ones.
206. Normalizing the initial background feature map through the batch normalization layer of the image feature extraction model to obtain a standard background feature map;
in this embodiment, the initial background feature map is normalized by the batch normalization layer of the image feature extraction model to obtain a standard background feature map. Specifically, when the network is deep, if the initial input data are small, for example in [0, 1], forward propagation makes the data ever smaller until they tend to 0; during backward propagation the gradient may then vanish, making the model untrainable. If the input data are large, forward propagation makes them larger and larger, and when the gradient is computed by backward propagation it may explode, which is likewise harmful to training.
One of the main assumptions when training a learning system is that the distribution of the inputs remains constant throughout training. This condition always holds for a linear model that simply maps input data to some appropriate output, but not for a neural network built from multiple layers. In such an architecture the input to each layer is affected by the parameters of all previous layers (small changes to network parameters are amplified as the network gets deeper). Thus a small change made in the back-propagation step within one layer can produce a large change in the input to another layer and, in the end, shift the feature map distribution. During training, each layer must keep adapting to the new distribution produced by the previous layer, which slows down convergence. Batch normalization overcomes this problem by reducing the internal covariate shift during training (the change in the distribution of network activations caused by changes in the network parameters).
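The transform applied by the batch normalization layer can be written out directly; this is the standard per-feature formula, shown here over a toy batch with poorly scaled inputs:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch axis to zero mean and unit
    variance, then apply the learnable scale (gamma) and shift (beta)."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(1)
batch = rng.normal(loc=50.0, scale=10.0, size=(64, 8))  # badly scaled activations
normed = batch_norm(batch)   # mean ~0, std ~1 per feature
```

After normalization, each layer sees inputs with a stable distribution regardless of how the previous layer's parameters drift.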
207. Outputting a standard background feature map based on an activation function of the image feature extraction model;
in this embodiment, the standard background feature map is output based on an activation function of the image feature extraction model. Specifically, an activation function is a function that runs on a neuron of an artificial neural network and maps the neuron's input to its output.
In this embodiment, activation functions play an important role in enabling the artificial neural network model to learn and represent very complex, nonlinear functions: they introduce nonlinearity into the network. In a neuron, as shown in fig. 1, the inputs are weighted and summed and a function is then applied to the result; this function is the activation function. The activation function is introduced to increase the nonlinearity of the neural network model. Commonly used activation functions are the Sigmoid, Tanh and ReLU functions. The Sigmoid function is the S-shaped function common in biology, also called the S-shaped growth curve. In information science, because it is monotonically increasing and has a monotonically increasing inverse, the Sigmoid function is often used as the threshold function of a neural network, mapping variables into the interval (0, 1). The Tanh function is one of the hyperbolic functions: tanh() is the hyperbolic tangent, which in mathematics is derived from the basic hyperbolic sine and hyperbolic cosine.
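The three activation functions named above have one-line definitions:

```python
import numpy as np

def sigmoid(x):
    """S-shaped curve mapping any input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Hyperbolic tangent mapping any input into (-1, 1)."""
    return np.tanh(x)

def relu(x):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return np.maximum(0.0, x)
```

Their ranges explain the typical uses: sigmoid for probabilities, tanh for zero-centred activations, ReLU for deep hidden layers where gradient flow matters.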
208. Performing multi-scale feature fusion on the standard background feature image to obtain a feature background image;
in this embodiment, multi-scale feature fusion is performed on the standard background feature map to obtain a feature background image. Specifically, the most challenging problem in object detection is the scale variation of targets: objects differ in shape and size, and tiny, extremely large or extreme shapes (slender, narrow-and-tall, and so on) may appear, which makes accurate identification and localization difficult. Among the existing algorithms proposed for this problem, the more effective ones are mainly the image pyramid and the feature pyramid; their common idea is to detect objects of different sizes with multi-scale features.
The image pyramid scales the image to different resolutions and extracts features of different sizes from each resolution with the same CNN. Multi-scale feature fusion fuses the low-level features into the output standard background feature map so as to reduce the influence of grey-level changes caused by different gains. The low-level features are the basic features of the standard background feature map, such as colour, length and width. In an embodiment of the present invention, the multi-scale feature fusion may be implemented by a CSP (Cross-Stage-Partial-connections) module in the atrous spatial pyramid.
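The core fusion idea, bringing a coarse high-level map to the resolution of a fine low-level map and combining the two, can be sketched as follows; element-wise addition is one assumed fusion rule (concatenation is equally common):

```python
import numpy as np

def upsample_nn(f, factor):
    """Nearest-neighbour up-sampling of a 2-D feature map."""
    return np.repeat(np.repeat(f, factor, axis=0), factor, axis=1)

def fuse_scales(low_level, high_level):
    """Up-sample the coarse high-level map to the low-level resolution
    and fuse the two maps element-wise."""
    factor = low_level.shape[0] // high_level.shape[0]
    return low_level + upsample_nn(high_level, factor)

low = np.ones((8, 8))            # fine-grained features (colour, edges, ...)
high = np.full((2, 2), 2.0)      # coarse semantic features
fused = fuse_scales(low, high)   # 8x8 map carrying both levels of detail
```

The fused map retains the fine spatial detail of the low-level features while carrying the semantic context of the deeper layer.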
209. Calculating a first similarity between the characteristic background image and the historical image features in a preset face-examination blacklist background image library, and selecting images whose first similarity is greater than a preset threshold from the characteristic background images to obtain a first early-warning background image;
210. calculating a second similarity between the characteristic background image and an image in a preset face examination white list characteristic library, and obtaining a second early warning background image from the characteristic background image based on the second similarity;
211. and aggregating the first early warning background image and the second early warning background image to obtain a multi-modal feature vector, inputting the multi-modal feature vector into a pre-trained fraud detection model for fraud detection, and early warning according to a fraud detection result.
Steps 201 and 209–211 in this embodiment are similar to steps 101 and 103–105 in the first embodiment and are not described here again.
In the embodiment of the invention, background segmentation is performed on the obtained face-examination image to obtain a background face-examination image, and features of the background face-examination image are extracted to obtain a characteristic background image; a first similarity between the characteristic background image and the historical image features in the face-examination blacklist background image library, and a second similarity between the characteristic background image and the images in the preset face-examination whitelist feature library, are calculated, and a first early-warning background image and a second early-warning background image are obtained from the characteristic background images based on the two similarities; the first and second early-warning background images are then aggregated, the face-examination scene is identified and checked according to the aggregation result, and an early warning is raised according to the detection result. By searching historical data for the scenes of potential gang-fraud cases, the scheme addresses the technical problem of low accuracy in fraud scene recognition and reduces the false-alarm rate on invalid scenes.
Referring to fig. 3, a third embodiment of the fraud warning method based on the review video in the embodiment of the present invention includes:
301. acquiring a face examination video, and extracting the face examination video to obtain a face examination image;
302. inputting the face-examination image into a convolution layer in an image segmentation network, and performing convolution processing on the face-examination image through the convolution layer to generate a convolution image;
in this embodiment, the face-examination image is input into the convolution layer in the image segmentation network, and the face-examination image is convolved by the convolution layer to generate a convolved image. Specifically, the image segmentation network comprises a convolution layer, a pyramid pooling layer and a fully connected layer, and segmenting the face-examination image by using the image segmentation network in the portrait background segmentation model comprises: convolving the face-examination image by using the convolution layer in the image segmentation network to generate a convolved image, reducing the dimension of the convolved image by using the pyramid pooling layer in the image segmentation network, and outputting the dimension-reduced convolved image by using the fully connected layer in the image segmentation network to obtain the portrait segmentation image of the face-examination image, wherein the portrait segmentation image comprises a portrait image and a non-portrait image.
Further, the convolution of the face-examination image is performed by the convolution kernels of the convolution layer so as to extract the feature map of the face-examination image; the dimension reduction of the convolved image is performed by the pooling operation in the pyramid pooling layer, for example max pooling; and the output of the dimension-reduced convolved image is performed by the activation function of the fully connected layer, for example a softmax function.
303. Performing dimensionality reduction on the convolution image based on a pyramid pooling layer in the image segmentation network;
in this embodiment, dimension-reduction processing is performed on the convolved image based on the pyramid pooling layer in the image segmentation network. The role of the pyramid pooling layer is to let the CNN accept images of any size: an SPP layer is added after the last convolution layer of the CNN, so that feature maps of different, arbitrary sizes all output a vector of fixed length after passing through it. This fixed-length vector is then fed into the fully connected layer for the subsequent classification and detection tasks.
In this embodiment, a set of feature maps is obtained after the image passes through several convolution layers, and spatial pyramid pooling acts on this set. Spatial pyramid pooling comprises several pyramid levels. The first level comprises 4 × 4 spatial bins; the feature map is pooled with a sliding window of size 224/4 = 56 and stride 224/4 = 56, so the window takes 4 × 4 positions on the feature map, and each position yields one value through the pooling operation, so the feature map produces 4 × 4 = 16 output values. Since the feature map has 256 channels, this level produces 16 × 256 outputs after acting on the feature maps. The other levels produce their outputs in the same way; because different levels have different numbers of spatial bins, they produce outputs of different lengths: the second level outputs 4 × 256 values and the third 1 × 256. Concatenating the outputs of all levels yields an output of length 21 × 256.
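The bin arithmetic in the paragraph above can be re-derived in a few lines:

```python
# SPP output-size arithmetic for pyramid levels of 4x4, 2x2 and 1x1 bins
# over a 256-channel feature map (the example in the text).
levels = [4, 2, 1]                             # bins per side at each level
channels = 256
window = 224 // 4                              # sliding-window size and stride
bins = sum(b * b for b in levels)              # 16 + 4 + 1 spatial bins
outputs = [b * b * channels for b in levels]   # per-level output lengths
total = bins * channels                        # concatenated output length
```

The fixed totals (21 bins, 21 × 256 values) hold for any input resolution, which is exactly why SPP decouples the fully connected layer from the input size.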
304. Outputting the convolution image after dimensionality reduction through a full connection layer in the image segmentation network to obtain a portrait segmentation image of the face-examination image;
in this embodiment, the dimension-reduced convolved image is output through the fully connected layer in the image segmentation network to obtain the portrait segmentation image of the face-examination image. In a fully connected layer, each node is connected to all nodes of the previous layer and integrates the extracted features. Because of this full connectivity, the fully connected layers usually hold the most parameters. For example, in VGG16, the first fully connected layer FC1 has 4096 nodes and the preceding layer POOL2 has 7 × 7 × 512 = 25088 nodes, so 4096 × 25088 weight values are needed for this connection, which consumes a large amount of memory.
Specifically, in a CNN structure, after several convolution and pooling layers, one or more fully connected layers are attached, similarly to an MLP, and each neuron in a fully connected layer is fully connected to all neurons in the preceding layer. The fully connected layer can integrate the class-discriminative local information in the convolution or pooling layers; to improve the performance of the CNN, the ReLU function is generally adopted as the excitation function of each neuron in the fully connected layer. The output of the last fully connected layer is passed to an output which may be classified with softmax logistic regression; this output layer may also be called the softmax layer.
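The VGG16 parameter count mentioned above checks out directly:

```python
# Weight-count arithmetic for the VGG16 FC1 example in the text.
pool_nodes = 7 * 7 * 512          # POOL2 output flattened: 25088 nodes
fc1_nodes = 4096                  # FC1 nodes
weights = pool_nodes * fc1_nodes  # weights in this single layer alone
```

Over a hundred million weights in one layer is why fully connected layers dominate the memory footprint of such networks.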
305. Carrying out face recognition on the portrait segmentation image to obtain a portrait image containing a face and a non-portrait image not containing the face;
306. inputting the non-portrait images without faces into a sampling layer in a background image recognition network, and performing up-sampling on the non-portrait images without faces through the sampling layer to obtain sampling images;
in this embodiment, the non-portrait image not containing a face is input into the sampling layer in the background image recognition network and up-sampled by the sampling layer to obtain a sampled image. Specifically, the background image recognition network comprises a sampling layer, an encoder and a decoder, and selecting the background image from the non-portrait image by using the background image recognition network in the portrait background segmentation model comprises: up-sampling the portrait segmentation image by using the sampling layer in the background image recognition network to obtain a sampled image, background-encoding the sampled image by using the encoder in the background image recognition network to obtain a background-encoded image, and sequence-decoding the background-encoded image by using the decoder in the background image recognition network to obtain the background image.
Further, up-sampling refers to restoring the portrait segmentation map to a specified resolution. For example, after a (416, 416, 3) face-examination image undergoes the portrait segmentation operation, a (13, 13, 16) portrait segmentation map is obtained; to compare this map with the corresponding face-examination image, it must be brought back to the (416, 416, 3) size, and this operation is called up-sampling. Background encoding refers to masking out the non-background region of the sampled image, and sequence decoding restores the background region from the image whose background has been extracted by the mask.
307. Carrying out background coding on the sampled image based on an encoder in a background image identification network to obtain a background coded image;
in this embodiment, the encoder in the background image recognition network performs background encoding on the sampled image to obtain a background-encoded image. In general terms, an encoder is a component that compiles or converts a signal or data into a form that can be communicated, transmitted and stored; in this network, it is the sub-network that compresses the sampled image into a compact background code by masking out the non-background region, so that only the background information is carried forward.
308. Performing sequence decoding on the background coded image through a decoder in the background image identification network to obtain a background image;
in this embodiment, the decoder in the background image recognition network performs sequence decoding on the background-encoded image to obtain the background image. In general terms, a decoder converts encoded data back into its original form; in this network, it is the sub-network that reconstructs the background image from the background code produced by the encoder.
309. Calculating a first similarity between the characteristic background image and the historical image characteristics in the preset face examination black background image library, and selecting an image with the first similarity being greater than a preset threshold value from the characteristic background image to obtain a first early warning background image;
310. calculating a second similarity between the characteristic background image and an image in a preset face examination white list characteristic library, and obtaining a second early warning background image from the characteristic background image based on the second similarity;
311. and aggregating the first early warning background image and the second early warning background image to obtain a multi-modal feature vector, inputting the multi-modal feature vector into a pre-trained fraud detection model for fraud detection, and early warning according to a fraud detection result.
Steps 301 and 309-311 in this embodiment are similar to steps 101 and 103-105 in the first embodiment, and are not described herein again.
In the embodiment of the invention, background segmentation is performed on the obtained face examination image to obtain a background face examination image, and feature extraction is performed on the background face examination image to obtain a feature background image; a first similarity between the feature background image and the historical image features in the face examination black background image library and a second similarity between the feature background image and the images in the preset face examination white list feature library are calculated, and a first early warning background image and a second early warning background image are obtained from the feature background image based on the first similarity and the second similarity; and the first early warning background image and the second early warning background image are aggregated, the face examination scene is identified and detected according to the aggregation result, and early warning is performed according to the detection result. According to this scheme, the scenes of potential gang fraud cases are searched for through historical data, the technical problem of low accuracy in fraud scene recognition is solved, and the false alarm rate for invalid scenes is reduced.
Referring to fig. 4, a fraud early warning method based on the face examination video according to a fourth embodiment of the present invention includes:
401. when performing face examination, receiving a face examination video based on preset audio acquisition equipment, and acquiring the total frame number of the face examination video;
in this embodiment, a face examination video is received, the total frame number of the face examination video is obtained, and the image corresponding to each frame number in the face examination video is extracted based on the total frame number to obtain the face examination image.
The face examination video is generated from different users' online face examination scenes, for example, a video stream generated when user A conducts an online face examination with company B, or an online face examination video generated when user C applies to company D for a start-up loan. It should be understood that different face examination images exist in the face examination video; in order to better identify whether the user commits fraud in the face examination video, the invention extracts the face examination images from the face examination video so as to accurately locate the background image in the face examination video, thereby realizing identification of the user's fraudulent behavior.
402. Based on the total frame number, extracting an image corresponding to each frame number from a face-checking video to obtain a face-checking image;
in this embodiment, based on the total frame number, the image corresponding to each frame number is extracted from the face examination video to obtain the face examination image. For example, if the total frame number of the face examination video is N, the start frame of an image in the face examination video is denoted S, and the end frame is denoted E, the corresponding image sampling method is as follows: starting from the S-th frame, image extraction is performed on each image frame; if the extraction fails (no image can be detected), the traversal continues with the next frame; if the extraction succeeds, the corresponding image frame is extracted as the face examination image and the traversal ends.
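The frame traversal just described can be sketched as follows. This is a minimal illustration under assumptions: `frames` models the decoded video as an indexable sequence, and `detect` is a hypothetical stand-in for whatever check decides that image extraction succeeded.

```python
def sample_review_image(frames, start_frame, end_frame, detect):
    """Traverse from frame S to frame E; return (index, frame) for the first
    frame whose extraction succeeds, or None if every extraction fails."""
    for idx in range(start_frame, end_frame + 1):
        frame = frames[idx]
        if detect(frame):  # extraction succeeded: use this frame as the face examination image
            return idx, frame
    return None  # no extractable image between S and E

frames = ["blur", "blur", "face", "face"]  # toy stand-ins for decoded frames
result = sample_review_image(frames, 0, 3, lambda f: f == "face")  # picks frame 2
```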
Further, in the embodiment of the present invention, the pre-trained portrait background segmentation model includes a DeepLabV3+ neural network, which is used to segment the portrait and the background in the image so as to more accurately locate the background image in the face examination video.
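A segmentation network of this kind outputs a per-pixel mask; applying such a mask to split an image into its portrait and background parts can be sketched as follows. This is a minimal NumPy illustration, and the binary mask convention (1 = portrait, 0 = background) is an assumption, not a detail fixed by the embodiment.

```python
import numpy as np

def split_portrait_background(image, mask):
    """Split an image into portrait and background pixels using a binary
    per-pixel mask such as a DeepLabV3+ model would produce
    (assumed convention: 1 = portrait, 0 = background)."""
    portrait = np.where(mask == 1, image, 0.0)
    background = np.where(mask == 0, image, 0.0)
    return portrait, background

img = np.full((2, 2), 9.0)         # toy 2x2 face examination image
mask = np.array([[1, 0], [0, 1]])  # toy segmentation mask
portrait, background = split_portrait_background(img, mask)
```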
403. Inputting the face examination image into a preset portrait background segmentation model for background segmentation to obtain a background face examination image, and performing feature extraction on the background face examination image to obtain a feature background image of the background face examination image;
404. calculating a first similarity between the characteristic background image and the historical image characteristics in the preset face examination black background image library, and selecting an image with the first similarity being greater than a preset threshold value from the characteristic background image to obtain a first early warning background image;
405. calculating a second similarity between the characteristic background image and an image in a preset face examination white list characteristic library, and obtaining a second early warning background image from the characteristic background image based on the second similarity;
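The similarity screening in steps 404 and 405 can be sketched as follows. This is a minimal NumPy illustration: the embodiment does not fix a particular similarity measure, so cosine similarity, the toy vectors, and the threshold value are all assumptions.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_warning_images(features, library, threshold):
    """Keep every feature background image whose best similarity against
    the library features exceeds the preset threshold."""
    selected = []
    for feat in features:
        best = max(cosine_similarity(feat, ref) for ref in library)
        if best > threshold:
            selected.append(feat)
    return selected

black_library = [np.array([1.0, 0.0])]                     # historical black-background features
candidates = [np.array([0.9, 0.1]), np.array([0.0, 1.0])]  # feature background images
first_warning = select_warning_images(candidates, black_library, threshold=0.8)
```

Step 405 would run the same screening against the face examination white list feature library to obtain the second early warning background image.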
406. and aggregating the first early warning background image and the second early warning background image to obtain a multi-modal feature vector, inputting the multi-modal feature vector into a pre-trained fraud detection model for fraud detection, and early warning according to a fraud detection result.
Steps 403-406 in this embodiment are similar to steps 102-105 in the first embodiment, and are not described herein again.
In the embodiment of the invention, background segmentation is performed on the obtained face examination image to obtain a background face examination image, and feature extraction is performed on the background face examination image to obtain a feature background image; a first similarity between the feature background image and the historical image features in the face examination black background image library and a second similarity between the feature background image and the images in the preset face examination white list feature library are calculated, and a first early warning background image and a second early warning background image are obtained from the feature background image based on the first similarity and the second similarity; and the first early warning background image and the second early warning background image are aggregated, the face examination scene is identified and detected according to the aggregation result, and early warning is performed according to the detection result. According to this scheme, the scenes of potential gang fraud cases are searched for through historical data, the technical problem of low accuracy in fraud scene recognition is solved, and the false alarm rate for invalid scenes is reduced.
Referring to fig. 5, a fifth embodiment of the fraud early warning method based on the face examination video in the embodiment of the present invention includes:
501. acquiring a face examination video, and extracting the face examination video to obtain a face examination image;
502. inputting the face examination image into a preset portrait background segmentation model for background segmentation to obtain a background face examination image, and performing feature extraction on the background face examination image to obtain a feature background image of the background face examination image;
503. calculating a first similarity between the characteristic background image and the historical image characteristics in the preset face examination black background image library, and selecting an image with the first similarity being greater than a preset threshold value from the characteristic background image to obtain a first early warning background image;
504. calculating a second similarity between the characteristic background image and an image in a preset face examination white list characteristic library, and obtaining a second early warning background image from the characteristic background image based on the second similarity;
505. extracting an image data sample from the video sample, and acquiring an image sample characteristic vector of the extracted image data sample;
in this embodiment, an image data sample is extracted from the video sample, and the image sample feature vector of the extracted image data sample is obtained. Specifically, after the image data sample is acquired, the electronic device may extract the global image feature vector and the local image feature vector of the image data sample. Then, the image sample feature vector of the image data sample is determined based on the extracted global image feature vector and local image feature vector.
506. Merging the image sample characteristic vector and the sample image characteristic vector to obtain a sample video characteristic vector;
in this embodiment, the image sample feature vector and the sample image feature vector are merged to obtain the sample video feature vector. Feature vector merging is a transform that merges a given series of columns into a single vector column: it can combine the original features and the features generated by different feature transformers into a single feature vector for training machine learning models such as logistic regression and decision trees. A vector assembler of this kind (for example, VectorAssembler in Spark ML) accepts all numeric types, the boolean type, and the vector type as input, and the values of the input columns are appended to the output vector in the order specified.
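The merge in step 506 can be sketched as follows. This is a minimal NumPy illustration of the order-preserving concatenation a vector assembler performs; the toy feature values are assumptions.

```python
import numpy as np

def assemble_vectors(*columns):
    """Merge several feature columns into a single vector, preserving the
    order in which the columns are given."""
    return np.concatenate([np.atleast_1d(np.asarray(c, dtype=float)) for c in columns])

image_sample_vec = [0.2, 0.8]  # toy image sample feature vector
sample_image_vec = [0.5]       # toy sample image feature vector
sample_video_vec = assemble_vectors(image_sample_vec, sample_image_vec)
```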
507. Training the machine learning model according to the sample video feature vector and a fraud label corresponding to the video sample to obtain a fraud detection model;
in this embodiment, the machine learning model is trained according to the sample video feature vector and the fraud label corresponding to the video sample, and the trained machine learning model is obtained and used as the fraud detection model.
In the training process of the fraud detection model, after the 3D convolutional neural network model is trained with the video samples, the server extracts the image sample data in the video sample data, obtains the image sample feature vector of the image sample data, and merges the image sample feature vector with the sample image feature vector to obtain the sample video feature vector; the machine learning model is then trained according to the sample video feature vector and the fraud label corresponding to the video sample, and the trained machine learning model is used as the fraud detection model. This improves the accuracy of fraud detection.
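The supervised training in step 507 can be sketched as follows. This is a minimal NumPy logistic-regression trainer, one of the model families the text mentions; the toy feature vectors, labels, learning rate, and epoch count are all assumptions.

```python
import numpy as np

def train_fraud_detector(X, y, lr=0.5, epochs=500):
    """Minimal logistic-regression trainer: X holds sample video feature
    vectors, y holds fraud labels (1 = fraud, 0 = normal)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid probabilities
        grad = p - y                             # cross-entropy gradient
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def predict_fraud(w, b, x):
    """Classify a new sample video feature vector."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b))) > 0.5

X = np.array([[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]])  # toy feature vectors
y = np.array([1, 1, 0, 0])                                       # toy fraud labels
w, b = train_fraud_detector(X, y)
```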
508. And aggregating the first early warning background image and the second early warning background image to obtain a multi-modal feature vector, inputting the multi-modal feature vector into a pre-trained fraud detection model for fraud detection, and early warning according to a fraud detection result.
Steps 501-504 and 508 in this embodiment are similar to steps 101-105 in the first embodiment, and are not described herein again.
In the embodiment of the invention, background segmentation is performed on the obtained face examination image to obtain a background face examination image, and feature extraction is performed on the background face examination image to obtain a feature background image; a first similarity between the feature background image and the historical image features in the face examination black background image library and a second similarity between the feature background image and the images in the preset face examination white list feature library are calculated, and a first early warning background image and a second early warning background image are obtained from the feature background image based on the first similarity and the second similarity; and the first early warning background image and the second early warning background image are aggregated, the face examination scene is identified and detected according to the aggregation result, and early warning is performed according to the detection result. According to this scheme, the scenes of potential gang fraud cases are searched for through historical data, the technical problem of low accuracy in fraud scene recognition is solved, and the false alarm rate for invalid scenes is reduced.
The fraud early warning method based on the face examination video in the embodiment of the present invention is described above. Referring to fig. 6, a fraud early warning apparatus based on the face examination video in the embodiment of the present invention is described below; a first embodiment of the fraud early warning apparatus based on the face examination video in the embodiment of the present invention includes:
the extracting module 601 is configured to obtain a face examination video, and extract the face examination video to obtain a face examination image;
a segmentation module 602, configured to input the face examination image into a preset portrait background segmentation model for background segmentation to obtain a background face examination image, and perform feature extraction on the background face examination image to obtain a feature background image of the background face examination image;
a first calculating module 603, configured to calculate a first similarity between the feature background image and a historical image feature in a preset face examination black background image library, and select an image with the first similarity greater than a preset threshold from the feature background image to obtain a first early warning background image;
a second calculating module 604, configured to calculate a second similarity between the feature background image and an image in a preset face examination white list feature library, and obtain a second early warning background image from the feature background image based on the second similarity;
and the aggregation module 605 is configured to aggregate the first early warning background image and the second early warning background image to obtain a multi-modal feature vector, input the multi-modal feature vector into a pre-trained fraud detection model for fraud detection, and perform early warning according to a fraud detection result.
In the embodiment of the invention, background segmentation is performed on the obtained face examination image to obtain a background face examination image, and feature extraction is performed on the background face examination image to obtain a feature background image; a first similarity between the feature background image and the historical image features in the face examination black background image library and a second similarity between the feature background image and the images in the preset face examination white list feature library are calculated, and a first early warning background image and a second early warning background image are obtained from the feature background image based on the first similarity and the second similarity; and the first early warning background image and the second early warning background image are aggregated, the face examination scene is identified and detected according to the aggregation result, and early warning is performed according to the detection result. According to this scheme, the scenes of potential gang fraud cases are searched for through historical data, the technical problem of low accuracy in fraud scene recognition is solved, and the false alarm rate for invalid scenes is reduced.
Referring to fig. 7, a fraud warning apparatus based on a face-examination video according to a second embodiment of the present invention specifically includes:
the extracting module 601 is configured to obtain a face examination video, and extract the face examination video to obtain a face examination image;
a segmentation module 602, configured to input the face examination image into a preset portrait background segmentation model for background segmentation to obtain a background face examination image, and perform feature extraction on the background face examination image to obtain a feature background image of the background face examination image;
a first calculating module 603, configured to calculate a first similarity between the feature background image and a historical image feature in a preset face examination black background image library, and select an image with the first similarity greater than a preset threshold from the feature background image to obtain a first early warning background image;
a second calculating module 604, configured to calculate a second similarity between the feature background image and an image in a preset face examination white list feature library, and obtain a second early warning background image from the feature background image based on the second similarity;
and the aggregation module 605 is configured to aggregate the first early warning background image and the second early warning background image to obtain a multi-modal feature vector, input the multi-modal feature vector into a pre-trained fraud detection model for fraud detection, and perform early warning according to a fraud detection result.
In this embodiment, the extracting module 601 is specifically configured to:
when performing face examination, receiving a face examination video based on preset audio acquisition equipment, and acquiring the total frame number of the face examination video;
and extracting the image corresponding to each frame number from the face examination video based on the total frame number to obtain a face examination image.
In this embodiment, the segmentation module 602 includes:
a segmentation unit 6021, configured to input the face-examination image into an image segmentation network in a preset portrait background segmentation model, and perform portrait segmentation on the face-examination image through the image segmentation network to obtain a portrait segmentation image, where the portrait segmentation image includes a portrait image and a non-portrait image;
a face recognition unit 6022, configured to perform face recognition on the portrait segmentation image to obtain a portrait image including a face and a non-portrait image not including a face;
a background identifying unit 6023, configured to input the non-portrait image without a face into the background image recognition network in the portrait background segmentation model, and identify the non-portrait image without a face through the background image recognition network to obtain a background image.
In this embodiment, the dividing unit 6021 is specifically configured to:
inputting the face-examination image into a convolutional layer in the image segmentation network, and performing convolution processing on the face-examination image through the convolutional layer to generate a convolution image;
performing dimensionality reduction on the convolution image based on a pyramid pooling layer in the image segmentation network;
and outputting the convolution image after dimensionality reduction through a full connection layer in the image segmentation network to obtain a portrait segmentation image of the face examination image, wherein the portrait segmentation image comprises a portrait image containing a face and a non-portrait image not containing a face.
In this embodiment, the background identification unit 6022 is specifically configured to:
inputting the non-portrait images without faces into a sampling layer in the background image recognition network, and performing up-sampling on the non-portrait images without faces through the sampling layer to obtain sampling images;
carrying out background coding on the sampling image based on an encoder in the background image recognition network to obtain a background coding image;
and carrying out sequence decoding on the background coding image through a decoder in the background image identification network to obtain a background image.
In this embodiment, the segmentation module 602 is further specifically configured to:
inputting the background surface examination image into a convolution layer of a preset image feature extraction model, and performing convolution operation on the background surface examination image through the convolution layer of the image feature extraction model to obtain an initial background feature map;
standardizing the initial background feature map through a batch normalization layer of the image feature extraction model to obtain a standard background feature map;
outputting the standard background feature map based on an activation function of the image feature extraction model; and performing multi-scale feature fusion on the standard background feature map to obtain a feature background image.
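The batch normalization and activation just described can be sketched as follows. This is a minimal NumPy illustration: the learned scale and shift parameters of a real batch-normalization layer are omitted, ReLU is assumed as the activation function, and the toy feature map is an assumption.

```python
import numpy as np

def batch_normalize(feature_map, eps=1e-5):
    """Standardize an initial background feature map to zero mean and unit
    variance, as the batch normalization layer does (scale/shift omitted)."""
    return (feature_map - feature_map.mean()) / np.sqrt(feature_map.var() + eps)

def relu(x):
    """Assumed activation function applied before the feature map is output."""
    return np.maximum(x, 0.0)

initial = np.array([[1.0, 2.0], [3.0, 4.0]])  # toy initial background feature map
standard = relu(batch_normalize(initial))      # standard background feature map
```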
In this embodiment, the fraud early warning apparatus based on the face-check video further includes:
an obtaining module 606, configured to extract an image data sample from the video sample, and obtain an image sample feature vector of the extracted image data sample;
a merging module 607, configured to merge the image sample feature vector and the sample image feature vector to obtain a sample video feature vector;
the training module 608 is configured to train a machine learning model according to the sample video feature vector and the fraud label corresponding to the video sample, so as to obtain a fraud detection model.
In the embodiment of the invention, background segmentation is performed on the obtained face examination image to obtain a background face examination image, and feature extraction is performed on the background face examination image to obtain a feature background image; a first similarity between the feature background image and the historical image features in the face examination black background image library and a second similarity between the feature background image and the images in the preset face examination white list feature library are calculated, and a first early warning background image and a second early warning background image are obtained from the feature background image based on the first similarity and the second similarity; and the first early warning background image and the second early warning background image are aggregated, the face examination scene is identified and detected according to the aggregation result, and early warning is performed according to the detection result. According to this scheme, the scenes of potential gang fraud cases are searched for through historical data, the technical problem of low accuracy in fraud scene recognition is solved, and the false alarm rate for invalid scenes is reduced.
Fig. 6 and fig. 7 describe in detail the fraud early warning apparatus based on the face examination video in the embodiment of the present invention from the perspective of modular functional entities; the following describes in detail the fraud early warning apparatus based on the face examination video in the embodiment of the present invention from the perspective of hardware processing.
Fig. 8 is a schematic structural diagram of a video-based fraud warning apparatus 800 according to an embodiment of the present invention. The video-based fraud warning apparatus 800 may vary greatly in configuration or performance, and may include one or more processors (CPUs) 810, a memory 820, and one or more storage media 830 (e.g., one or more mass storage devices) storing applications 833 or data 832. The memory 820 and the storage medium 830 may be transient or persistent storage. The program stored in the storage medium 830 may include one or more modules (not shown), and each module may include a series of instruction operations for the video-based fraud warning apparatus 800. Further, the processor 810 may be configured to communicate with the storage medium 830 and execute the series of instruction operations in the storage medium 830 on the video-based fraud warning apparatus 800 to implement the steps of the video-based fraud warning method provided by the above method embodiments.
The video-based fraud warning apparatus 800 may also include one or more power supplies 840, one or more wired or wireless network interfaces 850, one or more input/output interfaces 860, and/or one or more operating systems 831, such as Windows Server, Mac OS X, Unix, Linux, or FreeBSD. Those skilled in the art will appreciate that the configuration shown in fig. 8 does not constitute a limitation of the video-based fraud warning apparatus provided herein, which may include more or fewer components than those shown, combine some components, or arrange the components differently.
The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium, and may also be a volatile computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are executed on a computer, the instructions cause the computer to execute the steps of the above-mentioned fraud early warning method based on a face-examination video.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A fraud early warning method based on a face examination video, characterized by comprising the following steps:
acquiring a face examination video, and extracting the face examination video to obtain a face examination image;
inputting the face examination image into a preset portrait background segmentation model for background segmentation to obtain a background face examination image, and performing feature extraction on the background face examination image to obtain a feature background image of the background face examination image;
calculating a first similarity between the characteristic background image and the historical image characteristics in a preset face examination black background image library, and selecting an image with the first similarity being greater than a preset threshold value from the characteristic background image to obtain a first early warning background image;
calculating a second similarity between the characteristic background image and an image in a preset face examination white list characteristic library, and obtaining a second early warning background image from the characteristic background image based on the second similarity;
and aggregating the first early warning background image and the second early warning background image to obtain a multi-modal feature vector, inputting the multi-modal feature vector into a pre-trained fraud detection model for fraud detection, and early warning according to a fraud detection result.
2. The fraud early warning method based on the face examination video of claim 1, wherein the obtaining the face examination video and extracting the face examination video to obtain a face examination image comprises:
when performing face examination, receiving a face examination video based on preset audio acquisition equipment, and acquiring the total frame number of the face examination video;
and extracting an image corresponding to each frame number from the face examination video based on the total frame number to obtain a face examination image.
3. The fraud early warning method based on face examination video according to claim 1, wherein the portrait background segmentation model comprises an image segmentation network and a background image recognition network, and the background segmentation is performed by inputting the face examination image into a preset portrait background segmentation model to obtain a background face examination image, comprising:
inputting the face-examination image into an image segmentation network in a preset portrait background segmentation model, and performing portrait segmentation on the face-examination image through the image segmentation network to obtain a portrait segmentation image, wherein the portrait segmentation image comprises a portrait image and a non-portrait image;
carrying out face recognition on the portrait segmentation image to obtain a portrait image containing a face and a non-portrait image not containing the face;
and inputting the non-portrait image without the face into a background image recognition network in the portrait background segmentation model, and recognizing the non-portrait image without the face through the background image recognition network to obtain a background image.
4. The fraud early warning method based on the face examination video of claim 1, wherein the extracting the features of the background face examination image to obtain the feature background image of the background face examination image comprises:
inputting the background surface examination image into a convolution layer of a preset image feature extraction model, and performing convolution operation on the background surface examination image through the convolution layer of the image feature extraction model to obtain an initial background feature map;
standardizing the initial background feature map through a batch normalization layer of the image feature extraction model to obtain a standard background feature map;
outputting the standard background feature map based on an activation function of the image feature extraction model;
and performing multi-scale feature fusion on the standard background feature map to obtain a feature background image.
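The batch-normalization step named in claim 4 can be written out directly: subtract the per-channel batch mean and divide by the per-channel batch standard deviation. This numpy sketch (learnable scale/shift parameters omitted for brevity) is only illustrative; the patent does not fix the layer's exact form.

```python
import numpy as np

def batch_normalize(feature_maps, eps=1e-5):
    """Standardize a batch of feature maps (N, C, H, W) per channel:
    each channel of the output has approximately zero mean and unit
    variance across the batch, mirroring claim 4's normalization step."""
    mean = feature_maps.mean(axis=(0, 2, 3), keepdims=True)
    var = feature_maps.var(axis=(0, 2, 3), keepdims=True)
    return (feature_maps - mean) / np.sqrt(var + eps)
```

The epsilon term guards against division by zero for constant channels, matching standard batch-normalization practice.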
5. The fraud early warning method based on a face-examination video according to claim 3, wherein performing portrait segmentation on the face-examination image through the image segmentation network to obtain a portrait segmentation image comprises:
inputting the face-examination image into a convolutional layer in the image segmentation network, and performing convolution processing on the face-examination image through the convolutional layer to generate a convolution image;
performing dimension reduction processing on the convolution image based on a pyramid pooling layer in the image segmentation network;
and outputting the dimension-reduced convolution image through a fully connected layer in the image segmentation network to obtain a portrait segmentation image of the face-examination image, wherein the portrait segmentation image comprises a portrait image containing a face and a non-portrait image not containing a face.
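The pyramid pooling layer of claim 5 reduces a feature map by averaging it over grids of several sizes and concatenating the results. The single-channel numpy sketch below illustrates the idea (real networks such as PSPNet pool per channel and follow with convolutions; the bin sizes here are assumptions).

```python
import numpy as np

def pyramid_pool(feature_map, bin_sizes=(1, 2, 4)):
    """Reduce an (H, W) feature map into average-pooled grids of several
    sizes and flatten them into one vector - a simplified sketch of the
    pyramid pooling (dimension reduction) step in claim 5."""
    h, w = feature_map.shape
    pooled = []
    for bins in bin_sizes:
        for i in range(bins):
            for j in range(bins):
                # average over cell (i, j) of a bins x bins grid
                rows = slice(i * h // bins, (i + 1) * h // bins)
                cols = slice(j * w // bins, (j + 1) * w // bins)
                pooled.append(feature_map[rows, cols].mean())
    return np.array(pooled)
```

For bin sizes (1, 2, 4) the output has 1 + 4 + 16 = 21 values regardless of the input resolution, which is exactly the dimension-reduction property the claim relies on.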
6. The fraud early warning method based on a face-examination video according to claim 3, wherein identifying the non-portrait image not containing a face through the background image recognition network to obtain a background image comprises:
inputting the non-portrait image not containing a face into a sampling layer in the background image recognition network, and up-sampling the non-portrait image through the sampling layer to obtain a sampling image;
carrying out background coding on the sampling image based on a coder in the background image identification network to obtain a background coding image;
and carrying out sequence decoding on the background coding image through a decoder in the background image identification network to obtain a background image.
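The up-sampling stage of claim 6 can be illustrated with the simplest scheme, nearest-neighbour interpolation; the encoder and decoder stages are network-specific and omitted here. The interpolation method is an assumption, as the claim does not specify one.

```python
import numpy as np

def upsample_nearest(image, scale=2):
    """Nearest-neighbour up-sampling of an (H, W, C) image: each pixel is
    repeated `scale` times along both spatial axes. Illustrates the
    sampling layer of claim 6."""
    return image.repeat(scale, axis=0).repeat(scale, axis=1)
```

A 2x2 input becomes 4x4, with each source pixel expanded into a 2x2 block of identical values.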
7. The fraud early warning method based on a face-examination video according to claim 1, wherein before inputting the multi-modal feature vector into a pre-trained fraud detection model for fraud detection and performing early warning according to the fraud detection result, the method further comprises:
extracting an image data sample from a video sample, and acquiring an image sample feature vector of the extracted image data sample;
merging the image sample feature vector and the sample image feature vector to obtain a sample video feature vector;
and training a machine learning model according to the sample video feature vector and a fraud label corresponding to the video sample to obtain the fraud detection model.
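The training step of claim 7 merges per-sample feature vectors and fits a classifier against fraud labels. The claim leaves the model open; the sketch below uses a toy logistic-regression classifier trained by gradient descent purely to make the merge-then-train flow concrete. All names and the model choice are assumptions.

```python
import numpy as np

def train_fraud_detector(image_vecs, other_vecs, labels, lr=0.1, epochs=500):
    """Concatenate two per-sample feature vectors into one sample video
    feature vector and fit a logistic-regression fraud classifier.
    A toy stand-in for the machine-learning model of claim 7."""
    x = np.concatenate([image_vecs, other_vecs], axis=1)
    y = np.asarray(labels, dtype=float)
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # sigmoid probabilities
        grad = p - y                            # gradient of log loss w.r.t. logits
        w -= lr * x.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def predict_fraud(w, b, image_vec, other_vec):
    """Score a new sample with the trained weights."""
    z = np.concatenate([image_vec, other_vec]) @ w + b
    return 1.0 / (1.0 + np.exp(-z))
```

On a linearly separable toy set the trained weights rank fraud samples above non-fraud ones, which is all this sketch is meant to show.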
8. A fraud early warning device based on a face-examination video, characterized in that the device comprises:
the extraction module is used for acquiring a face examination video and extracting the face examination video to obtain a face examination image;
the segmentation module is used for inputting the face examination image into a preset portrait background segmentation model for background segmentation to obtain a background face examination image, and extracting the characteristics of the background face examination image to obtain a characteristic background image of the background face examination image;
the first calculation module is used for calculating a first similarity between the feature background image and historical image features in a preset face-examination blacklist background image library, and selecting images whose first similarity is greater than a preset threshold from the feature background images to obtain a first early warning background image;
the second calculation module is used for calculating a second similarity between the feature background image and images in a preset face-examination whitelist feature library, and obtaining a second early warning background image from the feature background image based on the second similarity;
and the aggregation module is used for aggregating the first early warning background image and the second early warning background image to obtain a multi-modal feature vector, inputting the multi-modal feature vector into a pre-trained fraud detection model for fraud detection, and early warning according to a fraud detection result.
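The first calculation module above compares feature background images against a library of known-fraud backgrounds and keeps those above a similarity threshold. The claims fix neither the similarity metric nor the threshold; the sketch below assumes cosine similarity with a 0.9 cutoff, both illustrative choices.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def flag_against_blacklist(features, blacklist, threshold=0.9):
    """Sketch of the first calculation module: return indices of feature
    background images whose similarity to any known-fraud (blacklist)
    background exceeds the preset threshold."""
    flagged = []
    for i, feat in enumerate(features):
        if any(cosine_similarity(feat, ref) > threshold for ref in blacklist):
            flagged.append(i)
    return flagged
```

The whitelist comparison of the second calculation module would invert the logic: a background too dissimilar from every approved environment is the one flagged for early warning.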
9. A fraud early warning device based on a face-examination video, comprising: a memory storing instructions and at least one processor, the memory and the at least one processor being interconnected by a communication line;
wherein the at least one processor invokes the instructions in the memory to cause the fraud early warning device based on a face-examination video to perform the steps of the fraud early warning method based on a face-examination video according to any one of claims 1-7.
10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1-7.
CN202210585741.5A 2022-05-27 2022-05-27 Fraud early warning method, device, equipment and storage medium based on face-check video Pending CN115063612A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210585741.5A CN115063612A (en) 2022-05-27 2022-05-27 Fraud early warning method, device, equipment and storage medium based on face-check video

Publications (1)

Publication Number Publication Date
CN115063612A 2022-09-16

Family

ID=83197804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210585741.5A Pending CN115063612A (en) 2022-05-27 2022-05-27 Fraud early warning method, device, equipment and storage medium based on face-check video

Country Status (1)

Country Link
CN (1) CN115063612A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781916A (en) * 2019-09-18 2020-02-11 平安科技(深圳)有限公司 Video data fraud detection method and device, computer equipment and storage medium
CN112906671A (en) * 2021-04-08 2021-06-04 平安科技(深圳)有限公司 Face examination false picture identification method and device, electronic equipment and storage medium
CN113221835A (en) * 2021-06-01 2021-08-06 平安科技(深圳)有限公司 Scene classification method, device, equipment and storage medium for face-check video

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN116071089A (en) * 2023-02-10 2023-05-05 成都新希望金融信息有限公司 Fraud identification method and device, electronic equipment and storage medium
CN116071089B (en) * 2023-02-10 2023-12-05 成都新希望金融信息有限公司 Fraud identification method and device, electronic equipment and storage medium
CN117132391A (en) * 2023-10-16 2023-11-28 杭银消费金融股份有限公司 Human-computer interaction-based trust approval method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination