WO2021237570A1 - Image review method and apparatus, device, and storage medium - Google Patents

Image review method and apparatus, device, and storage medium

Info

Publication number
WO2021237570A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
file
feature vector
threshold
image file
Prior art date
Application number
PCT/CN2020/092923
Other languages
English (en)
French (fr)
Inventor
罗茂
Original Assignee
深圳市欢太科技有限公司
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市欢太科技有限公司 and Oppo广东移动通信有限公司
Priority to CN202080100202.7A
Priority to PCT/CN2020/092923
Publication of WO2021237570A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition

Definitions

  • The embodiments of this application relate to Internet technology, and in particular, but not exclusively, to image review methods, apparatuses, devices, and storage media.
  • The image review method provided by an embodiment of this application includes: performing feature extraction on an image file to be reviewed using a target classification model to obtain a corresponding feature vector, where the target classification model is obtained by training with multiple sample image files and corresponding multiple image transformation files; determining the similarity between the feature vector of the image file to be reviewed and at least one reference feature vector in a review set; and determining, according to the relationship between the determined similarity and a first threshold, whether the image file to be reviewed is a violation file.
  • The image review device provided by an embodiment of this application includes: a feature extraction module configured to perform feature extraction on an image file to be reviewed using a target classification model to obtain a corresponding feature vector, where the target classification model is obtained by training with a plurality of sample image files and corresponding multiple image transformation files; a first determining module configured to determine the similarity between the feature vector of the image file to be reviewed and at least one reference feature vector in a review set; and a review module configured to determine, according to the relationship between the determined similarity and a first threshold, whether the image file to be reviewed is a violation file.
  • The electronic device provided by an embodiment of this application includes a memory and a processor.
  • The memory stores a computer program that can run on the processor.
  • When the processor executes the program, the steps of the image review method described in any embodiment of this application are implemented.
  • The computer-readable storage medium provided by an embodiment of this application stores a computer program that, when executed by a processor, implements the steps of the image review method described in any embodiment of this application.
  • The electronic device uses the target classification model to perform feature extraction on the image file to be reviewed to obtain the corresponding feature vector, where the target classification model is obtained by training with multiple sample image files and corresponding multiple image transformation files.
  • FIG. 1 is a schematic diagram of an exemplary application scenario of an image review method according to an embodiment of this application;
  • FIG. 2 is a schematic diagram of the implementation process of an image review method according to an embodiment of this application;
  • FIG. 3 is a schematic diagram of the training process of the target classification model according to an embodiment of this application;
  • FIG. 4 is a schematic diagram of the implementation process of a method for generating a review set according to an embodiment of this application;
  • FIG. 5 is a schematic diagram of the implementation process of a method for determining a first threshold according to an embodiment of this application;
  • FIG. 6 is a schematic diagram of the implementation process of another image review method according to an embodiment of this application;
  • FIG. 7A is a schematic structural diagram of MobileNetV2 according to an embodiment of this application;
  • FIG. 7B is a schematic structural diagram of a feature extraction structure according to an embodiment of this application;
  • FIG. 8 is a schematic diagram of the implementation process of another image review method according to an embodiment of this application;
  • FIG. 9 is a schematic diagram of the implementation process of yet another image review method according to an embodiment of this application;
  • FIG. 10 is a schematic diagram of the implementation process of another image review method according to an embodiment of this application;
  • FIG. 11 is a schematic diagram of the implementation process of another image review method according to an embodiment of this application;
  • FIG. 12 is a schematic diagram of transformation operations performed on an original picture according to an embodiment of this application;
  • FIG. 13 is a simplified structural diagram of MobileNetV2 according to an embodiment of this application;
  • FIG. 14 is a schematic diagram of the curve of the sigmoid function;
  • FIG. 15 is a schematic diagram of an image matching process according to an embodiment of this application;
  • FIG. 16 shows the recall and wrong_recall corresponding to candidate thresholds from 35 to 70 according to an embodiment of this application;
  • FIG. 17 shows the recall and wrong_recall corresponding to candidate thresholds from 50 to 55 according to an embodiment of this application;
  • FIG. 18 is a schematic flowchart of a picture review system according to an embodiment of this application;
  • FIG. 19 is a schematic diagram of the Mobilehashnet algorithm flow in the picture review system according to an embodiment of this application;
  • FIG. 20A is a schematic structural diagram of an image file review device according to an embodiment of this application;
  • FIG. 20B is a schematic structural diagram of another image file review device according to an embodiment of this application;
  • FIG. 21 is a schematic diagram of a hardware entity of an electronic device according to an embodiment of this application.
  • The terms "first", "second", and "third" in the embodiments of this application merely distinguish similar or different objects and do not represent a specific order of objects. Understandably, where permitted, "first", "second", and "third" may be interchanged in a specific order or sequence, so that the embodiments of this application described herein can be implemented in an order other than that illustrated or described herein.
  • FIG. 1 is a schematic diagram of an exemplary application scenario 100 of an image review method provided by an embodiment of the present application.
  • the scene 100 includes a terminal 101, an image review device 102 and a second database 103.
  • The image review device 102 is used to review the image file 104 input by the user at the terminal 101 to determine whether the file is a violation file. If it is a violation file, storing the file in the second database 103 is forbidden; otherwise, that is, if the file is a compliant file, the file is allowed to be stored in the second database 103 so that this user or other users can retrieve, browse, or download it.
  • The terminal 101 may be a mobile terminal with wireless communication capability, such as a mobile phone, a tablet computer, or a notebook computer, or may be a device with computing capability that is inconvenient to move, such as a desktop computer.
  • the image review device 102 may be configured in the terminal 101, or may be configured independently of the terminal 101. There may be one or more image review devices 102 in the application scene 100. Multiple image review devices 102 can review the image files input by different users in parallel, thereby increasing the data processing speed.
  • the second database 103 can also be configured in the image reviewing device 102 when the image reviewing device 102 is configured on the network side.
  • When the terminal 101, the image review device 102, and the second database 103 are deployed on separate devices, the terminal 101 and the image review device 102 can communicate through a network, and the image review device 102 and the second database 103 can also communicate through a network. The network may be a wireless network or a wired network; the embodiment of this application does not specifically limit the communication mode here.
  • the embodiment of the application provides an image review method, which can be applied to electronic equipment with an image review device.
  • The electronic equipment can be a computer device, a notebook computer, any node server in a distributed computing architecture, a mobile terminal, or the like.
  • the functions implemented by the image review method can be implemented by invoking program codes by the processor in the electronic device.
  • the program codes can be stored in a computer storage medium. It can be seen that the electronic device at least includes a processor and a storage medium.
  • FIG. 2 is a schematic diagram of the implementation process of the image review method according to the embodiment of the application. As shown in FIG. 2, the method may include the following steps 201 to 203:
  • Step 201 Use the target classification model to perform feature extraction on the image file to be reviewed to obtain the corresponding feature vector; wherein the target classification model is obtained through training of multiple sample image files and corresponding multiple image transformation files.
  • the target classification model may be a deep learning model, for example, a neural network model.
  • the model can be a lightweight neural network model, such as MobileNetV2.
  • the model can also be a non-lightweight neural network model.
  • the electronic device can be implemented through steps 301 to 304 in the following embodiment.
  • The so-called image transformation file refers to a file obtained by applying transformation processing to a sample image file, such as flipping, rotation, liquefaction, scaling, cropping, mosaic, noise, color change, or occlusion, or a combination of these transformation methods.
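A few of the basic transformations named above can be sketched on a plain image array. This is a minimal illustration, not the patent's implementation: function names and parameters are assumptions, and liquefaction, mosaic, color change, and occlusion are omitted.

```python
import numpy as np

def flip(img):
    """Mirror an H x W x C image array horizontally."""
    return img[:, ::-1]

def rotate90(img):
    """Rotate the image 90 degrees counter-clockwise."""
    return np.rot90(img)

def center_crop(img, frac=0.8):
    """Keep the central `frac` portion of each spatial dimension."""
    h, w = img.shape[:2]
    ch, cw = int(h * frac), int(w * frac)
    top, left = (h - ch) // 2, (w - cw) // 2
    return img[top:top + ch, left:left + cw]

def add_noise(img, scale=10.0, seed=0):
    """Add Gaussian pixel noise, clipped back to the valid [0, 255] range."""
    rng = np.random.default_rng(seed)
    noisy = img.astype(np.float64) + rng.normal(0.0, scale, img.shape)
    return np.clip(noisy, 0, 255).astype(img.dtype)

def transform_file(img):
    """Produce one image transformation file per basic rule (cf. step 302)."""
    return [flip(img), rotate90(img), center_crop(img), add_noise(img)]
```

Combined transformation rules would simply compose these callables, e.g. `add_noise(rotate90(img))`.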
  • the image file to be reviewed may be of various types.
  • the image file to be reviewed is an image or a piece of video (for example, a short video, a live video, a movie, a TV series, etc.).
  • the electronic device can randomly sample one or more video frame images from the video, and then perform feature extraction on these images through the target classification model to obtain the feature vector corresponding to the video.
  • Step 202 Determine the similarity between the feature vector of the image file to be reviewed and at least one reference feature vector in the review set.
  • It should be noted that the corresponding review set can differ. That is, when the image file to be reviewed is an image, each reference feature vector in the corresponding review set is extracted by the electronic device from a single image; when the image file to be reviewed is a piece of video, a reference feature vector in the corresponding review set is extracted by the electronic device from multiple images. In general, the dimension of the feature vector of the image file to be reviewed is consistent with the dimension of the reference feature vectors; of course, this rule is not mandatory, and the dimensions of the two feature vectors can also differ.
  • the parameter types that characterize the similarity can be varied, for example, it can be Hamming distance, Euclidean distance, or cosine similarity.
  • Step 203 Determine whether the to-be-reviewed image file is a violation file according to the determined relationship between the similarity and the first threshold.
  • It should be noted that the review set generated from compliant reference image files (referred to for brevity as the compliance set) and the review set generated from violating reference image files (hereinafter referred to as the violation set) correspond to different judgment criteria.
  • the similarity is characterized by the Hamming distance.
  • the Hamming distance between two strings of equal length refers to the number of different characters at the corresponding positions of the two strings. Therefore, the smaller the Hamming distance, the more similar the two feature vectors, and the more similar the corresponding two image files.
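The Hamming-distance comparison of step 202 can be sketched as follows, assuming the feature vectors have already been binarized to equal-length 0/1 arrays (the helper name is illustrative, not from the patent):

```python
import numpy as np

def hamming_distance(a, b):
    """Count the positions at which two equal-length vectors differ;
    a smaller count means more similar feature vectors."""
    a, b = np.asarray(a), np.asarray(b)
    if a.shape != b.shape:
        raise ValueError("Hamming distance requires equal-length vectors")
    return int(np.count_nonzero(a != b))
```

For example, `hamming_distance([1, 0, 1, 1], [1, 1, 1, 0])` is 2, since the vectors differ at the second and fourth positions.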
  • In an example in which the review set is a violation set, the ratio of the number of similarities less than the first threshold to the total number of similarities is determined, and when the ratio is greater than the second threshold, the image file to be reviewed is determined to be a violation file.
  • In an example in which the review set is a compliance set, by contrast, when the ratio is greater than the second threshold, the image file to be reviewed is determined to be a compliant file.
  • the electronic device can be implemented through step 604 to step 606 in the following embodiment.
  • the electronic device can also be implemented through step 802 to step 809 in the following embodiment.
  • the similarity characterizes the number of different features between two feature vectors.
  • In another example, the review set is a violation set. Each time the electronic device determines the similarity with a reference feature vector, it counts how many similarities so far are less than the first threshold. If this count is greater than or equal to the third threshold, the calculation of similarity is stopped, the image file to be reviewed is determined to be a violation file, and this is output as the review result.
  • the electronic device can also determine whether the image file to be reviewed is a violation file through steps 902 to 904 in the following embodiment.
  • In the embodiment of this application, the electronic device uses the target classification model to perform feature extraction on the image file to be reviewed to obtain the corresponding feature vector, where the target classification model is trained with multiple sample image files and corresponding multiple image transformation files. In this way, even if the image file to be reviewed is a file obtained from the original file through multiple transformation processes such as rotation, liquefaction, and deformation, the extracted feature vector is still consistent with that of the original file, so that arbitrarily transformed image files can be accurately identified, which enhances the robustness of the image review method.
  • the electronic device may pre-train to obtain the target classification model, generate the review set, and determine the first threshold; wherein,
  • the following steps 301 to 304 may be included. It should be noted that the electronic device may perform the following steps 301 to 304 before performing feature extraction on the image file to be reviewed. The electronic device may also execute the following steps 301 to 304 when it is configured to have an image review function.
  • Step 301 Obtain the type label of each sample image file.
  • the sample image files include illegal image files and compliant image files.
  • Violating image files, for example, may be files related to terror, violence, pornography, and gambling.
  • Compliant image files for example, may be files related to natural scenery and buildings.
  • In an embodiment, the electronic device can sample some violating sample files from a first database that collects a variety of violating image files, and sample some compliant sample files from a second database that collects a variety of compliant image files.
  • a certain number of image files are selected from the first database and the second database as the sample files. For example, select 100 illegal images and 100 compliant images from these two databases as sample image files.
  • Step 302 Perform transformation processing on each of the sample image files according to multiple transformation rules to obtain a set of image transformation files corresponding to the files.
  • the transformation rules can be various.
  • the basic transformation rules include flip, rotation, liquefaction, zoom, crop, mosaic, noise, color change, and occlusion.
  • The combined transformation rule is a combination of at least two basic transformation rules. Taking the above 9 basic transformation rules as an example, there are 502 combined transformation rules in total. In an example, the electronic device may perform transformation processing on a sample image file according to 100 different transformation rules, to obtain 100 image transformation files corresponding to that file.
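The count of 502 follows from elementary combinatorics: choosing at least two of the 9 basic rules gives C(9,2) + C(9,3) + ... + C(9,9) = 2^9 - 9 - 1 = 502 combinations. A quick check (the helper name is illustrative):

```python
from math import comb

def combined_rule_count(n_basic):
    """Number of ways to combine at least two distinct basic rules."""
    return sum(comb(n_basic, k) for k in range(2, n_basic + 1))
```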
  • Step 303 Assign the type label of each sample image file to each image transformation file in the corresponding image transformation file set.
  • It should be noted that the type labels of an image file after transformation and before transformation should be consistent. For example, if a violating image file is liquefied, the liquefied file is still violating; its nature remains unchanged. Therefore, the type label of each image transformation file corresponding to a sample image file can be consistent with the type label of that sample image file.
  • Step 304 Train a specific neural network model according to each of the sample image files, each of the image transformation files, and respective corresponding type labels to obtain the target classification model.
  • each sample image file is transformed according to multiple transformation rules to obtain the image transformation file set of the corresponding file; the type label of each sample image file is assigned to the corresponding image transformation file set Each of the image transformation files; according to each of the sample image files, each of the image transformation files and respective corresponding type tags, a specific neural network model is trained to obtain the target classification model.
  • the training samples include image transformation files obtained by performing multiple transformations on the sample image files, which can enrich the diversity of training samples and make the target classification model obtained by training have better robustness.
  • In this way, the model can accurately extract the feature vector of a transformed file, so as to accurately identify whether the file is a violation file.
  • The feature vectors extracted by this model from the image files before and after transformation processing are essentially the same; therefore, even if the input image file is a file that has undergone transformation processing, the electronic device can accurately identify whether the file is a violation file.
  • In the embodiment of this application, the type label of each sample image file is assigned to each image transformation file in the corresponding image transformation file set. In this way, on the premise of ensuring the diversity of training samples, manual labeling cost is reduced, since there is no need to manually assign a type label to each image transformation file.
  • the electronic device can automatically obtain a large number of rich and diverse training samples by transforming and processing the sample image files.
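Steps 302 and 303 together amount to expanding each sample image file with its transformation files and letting every transformation file inherit the parent's type label. A minimal sketch, where `samples` (a list of (image, type_label) pairs) and `transforms` (a list of callables) are illustrative names, not from the patent:

```python
def build_training_set(samples, transforms):
    """Expand each sample with its image transformation files; each
    transformation file inherits the parent sample's type label, so no
    manual labeling of the transformed files is needed."""
    training = []
    for image, label in samples:
        training.append((image, label))                  # original sample file
        for transform in transforms:
            training.append((transform(image), label))   # label is inherited
    return training
```

The resulting list of (file, label) pairs is what step 304 feeds to the neural network trainer.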
  • the electronic device may load the generated audit set into the cache in advance. There is no restriction on the timing of loading.
  • For example, the electronic device can load the generated review set before using the target classification model to extract features of the image file to be reviewed; as another example, it can load the generated review set after extracting the features of the image file to be reviewed but before determining the similarity between the feature vector of that file and at least one reference feature vector in the review set; as yet another example, it can load the generated review set when it is configured with the image review function.
  • the method for generating an audit set may include the following steps 401 and 402:
  • Step 401 Using the target classification model, perform feature extraction on multiple reference image files to obtain feature vectors of corresponding files.
  • the multiple reference image files may be violation files, for example, all or part of the files in the first database, and the audit set obtained based on this is the violation set.
  • the multiple reference image files may be compliance files, for example, all or part of the files in the second database.
  • Depending on whether the review set is a compliance set or a violation set, the corresponding judgment criteria in the image review stage also differ.
  • the multiple reference image files are part of the files in the database, they may be files randomly extracted from the database by the electronic device, or some representative files in the database, such as some files with higher priority.
  • Step 402 Use the feature vector of each reference image file as a reference feature vector to generate the review set.
  • the review set is loaded into the buffer area in advance.
  • the electronic device does not need to perform feature extraction on the multiple reference image files to generate the review set; instead, it can directly use the pre-generated review set to perform the image review. In this way, the time consumption of the feature extraction process can be saved, so that the time for reviewing the image can be saved.
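Steps 401 and 402 can be sketched as follows: run each reference image file through the model's feature extractor and stack the resulting reference feature vectors into the review set. Here `extract_features` stands in for the trained target classification model and is an assumption for illustration:

```python
import numpy as np

def build_review_set(reference_files, extract_features):
    """Step 401/402 sketch: one reference feature vector per reference
    image file, stacked into a single array that can be cached."""
    return np.stack([extract_features(f) for f in reference_files])
```

The stacked array is what would be loaded into the cache ahead of review time.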
  • In some embodiments, the electronic device may load the determined first threshold into the cache in advance; the timing of loading is not restricted. For example, the electronic device may load the determined first threshold before determining whether the image file to be reviewed is a violation file; as another example, it may load the determined first threshold before performing feature extraction on the image file to be reviewed; as yet another example, it may load the determined first threshold when it is configured with the image review function.
  • the method for determining the first threshold may include the following steps 501 to 503:
  • Step 501: Setting the first threshold to each of a plurality of different candidate thresholds in turn, determine according to the image review method whether each of a plurality of verification image files is a violation file, so as to obtain a review result set corresponding to each candidate threshold.
  • the plurality of verification image files may include a violation image file and a compliance image file.
  • the verification image file is different from the file used to train the neural network model.
  • the multiple verification image files may also include files obtained after the electronic device performs various transformation processes on the original image files.
  • the transformation rules used in the transformation processing may be the same as the transformation rules used in the model training stage.
  • Through step 501, a review result set based on each candidate threshold can be obtained.
  • For example, the review result set corresponding to candidate threshold 1 is the content of the second column of Table 1 (where, for example, 1 indicates the file was reviewed as a violation file and 0 as a compliant file):
  • Table 1 (columns: candidate threshold 1, candidate threshold 2, ..., candidate threshold N):
    Verification image file 1: 1, 1, ..., 1
    Verification image file 2: 0, 1, ..., 1
    ...
    Verification image file M: 1, 1, ..., 0
  • Step 502 Determine the correct recall rate and the error recall rate under the corresponding candidate threshold according to each audit result set and the type label of each verified image file.
  • Wherein, TP represents the number of violation files reviewed as violation files, FN represents the number of violation files reviewed as compliant files, FP represents the number of compliant files reviewed as violation files, and TN represents the number of compliant files reviewed as compliant files. The correct recall rate may then be computed as recall = TP / (TP + FN), and the error recall rate as wrong_recall = FP / (FP + TN).
  • Step 503 Determine the candidate thresholds corresponding to the correct recall rate and the false recall rate that meet specific conditions as the first threshold.
  • the candidate threshold corresponding to the minimum error recall rate is selected as the first threshold.
  • the electronic device may adopt a grid search method to gradually approach the optimal value, so as to select the first threshold from a plurality of candidate thresholds.
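Steps 501 to 503 can be sketched as a simple threshold sweep over labeled verification files. The per-file distances, the 0/1 labels (1 = violation), and the `min_recall` condition are illustrative assumptions, not the patent's exact procedure:

```python
import numpy as np

def recall_rates(distances, labels, threshold):
    """A file is flagged as a violation when its distance is below
    the candidate threshold; returns (recall, wrong_recall)."""
    d, y = np.asarray(distances), np.asarray(labels)
    flagged = d < threshold
    tp = int(np.sum(flagged & (y == 1)))   # violations caught
    fn = int(np.sum(~flagged & (y == 1)))  # violations missed
    fp = int(np.sum(flagged & (y == 0)))   # compliant files wrongly flagged
    tn = int(np.sum(~flagged & (y == 0)))  # compliant files passed
    recall = tp / max(tp + fn, 1)
    wrong_recall = fp / max(fp + tn, 1)
    return recall, wrong_recall

def pick_first_threshold(distances, labels, candidates, min_recall=0.95):
    """Step 503 sketch: among candidates meeting the recall condition,
    keep the one with the smallest error recall rate."""
    best, best_wrong = None, float("inf")
    for t in candidates:
        r, w = recall_rates(distances, labels, t)
        if r >= min_recall and w < best_wrong:
            best, best_wrong = t, w
    return best
```

A grid search would simply call `pick_first_threshold` on successively finer candidate ranges (e.g. 35 to 70, then 50 to 55).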
  • FIG. 6 is a schematic diagram of the implementation process of the image review method according to the embodiment of the application. As shown in FIG. 6, the method may include the following steps 601 to 606:
  • Step 601 Obtain a feature vector extraction structure of the target classification model.
  • In some embodiments, the feature vector extraction structure includes the layers from the input layer to the nonlinear activation layer of the target classification model, where the target classification model is obtained by training with a plurality of sample image files and corresponding image transformation files.
  • the target classification model can be a lightweight neural network model MobileNetV2.
  • The structure of the network includes a bottleneck structure, a conv2d layer, a sigmoid activation layer, an n×1-dimensional fully connected layer (Dense), and a normalized exponential layer (softmax).
  • In an embodiment, the bottleneck structure, the conv2d layer, and the sigmoid activation layer may be used as the feature vector extraction structure.
  • Step 602 Use the feature vector extraction structure to perform feature extraction on the image file to be reviewed to obtain a corresponding feature vector.
  • the output of the nonlinear activation layer of the feature vector extraction structure is the feature vector corresponding to the file.
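One way the sigmoid layer's output can serve as a comparable feature vector is binarization: sigmoid squashes each component into (0, 1), so thresholding at 0.5 yields a hash-like 0/1 code that can be compared with the Hamming distance. The 0.5 cutoff is an assumption for illustration; the text above does not state the exact binarization rule:

```python
import numpy as np

def sigmoid(x):
    """Squash raw activations into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=np.float64)))

def binary_feature_vector(logits):
    """Binarize the nonlinear activation layer's output into a 0/1 vector
    suitable for Hamming-distance comparison."""
    return (sigmoid(logits) > 0.5).astype(np.uint8)
```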
  • Step 603 Determine the similarity between the feature vector of the image file to be reviewed and each reference feature vector in the review set; wherein the similarity is used to represent the number of different features between the two feature vectors;
  • Step 604: Determine the number of similarities that are less than the first threshold.
  • the similarity is the Hamming distance.
  • Step 605: Determine the ratio of the number to the total number of similarities.
  • Step 606 Determine whether the to-be-reviewed image file is a violation file according to the relationship between the ratio and the second threshold.
  • In an example in which the review set is a violation set, when the ratio is greater than the second threshold, the image file to be reviewed is determined to be a violation file; when the ratio is less than or equal to the second threshold, the file is determined to be a compliant file.
  • In an example in which the review set is a compliance set, when the ratio is greater than the second threshold, the image file to be reviewed is determined to be a compliant file; when the ratio is less than or equal to the second threshold, the file is determined to be a violation file.
  • In the embodiment of this application, the number of similarities less than the first threshold is counted, the ratio of that number to the total number of similarities is determined, and whether the image file to be reviewed is a violation file is determined according to the relationship between the ratio and the second threshold. Compared with obtaining the review result based on the similarity with only one reference feature vector, the review result obtained in this way is more reliable and the recognition accuracy is higher.
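Steps 604 to 606 against a violation review set can be sketched as follows: count the reference vectors within `first_threshold` Hamming distance of the file's feature vector and flag the file when that fraction exceeds `second_threshold`. Parameter names are illustrative, not from the patent:

```python
import numpy as np

def hamming(a, b):
    """Number of differing positions between two equal-length vectors."""
    return int(np.count_nonzero(np.asarray(a) != np.asarray(b)))

def is_violation(feature, review_set, first_threshold, second_threshold):
    """Steps 604-606 sketch for a violation review set: True means the
    file is determined to be a violation file."""
    distances = [hamming(feature, ref) for ref in review_set]
    close = sum(1 for d in distances if d < first_threshold)
    return close / len(distances) > second_threshold
```

For a compliance set the same ratio would instead indicate a compliant file, per the judgment criteria above.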
  • FIG. 8 is a schematic diagram of the implementation process of the image review method according to the embodiment of the application. As shown in FIG. 8, the method may include the following steps 801 to 809:
  • Step 801: Use the target classification model to perform feature extraction on the image file to be reviewed to obtain the corresponding feature vector, where the target classification model is obtained by training with multiple sample image files and corresponding multiple image transformation files.
  • a target classification model usually consists of multiple sequentially connected layers.
  • the first layer generally takes an image as input, and extracts features from the image through specific operations.
  • Each subsequent layer takes the features extracted by the previous layer as input and, by transforming them in a specific form, obtains more complex features.
  • This hierarchical feature extraction process can be accumulated, which gives the neural network powerful feature extraction capabilities.
  • the neural network can transform the initial input image into higher-level abstract features.
  • With the image review method of the embodiment of this application, when feature extraction is performed on the image file to be reviewed through the target classification model, no matter how complex the transformations applied to the original file to obtain the image file to be reviewed, the extracted feature vector is essentially unchanged. The image review method is therefore highly robust: even if a violation file is transformed before being uploaded to the network, it can still be accurately identified.
  • Step 802: Determine the similarity between the feature vector of the image file to be reviewed and the i-th reference feature vector in the review set, where i is greater than 0 and less than or equal to the total number of reference feature vectors in the review set;
  • Step 803 Determine whether the image file to be reviewed is a violation file according to the relationship between the similarity corresponding to the i-th reference feature vector and the first threshold; if so, go to step 804; otherwise, go to step 807;
  • the so-called similarity corresponding to the i-th reference feature vector refers to the similarity between the feature vector of the image file to be reviewed and the i-th reference feature vector.
  • Step 804 Count the first determined number of times the image file to be reviewed is a violation file
  • Step 805 Determine whether the first number of determinations is greater than the third threshold; if yes, go to step 806; otherwise, i+1, go back to step 802;
  • Step 806 Output that the image file to be reviewed is a violation file.
  • When the first determination number is greater than the third threshold, it can already be reliably determined that the image file to be reviewed is a violation file, and there is no need to continue computing the similarity between its feature vector and the remaining reference feature vectors, thereby reducing computation and shortening the review time.
  • For example, suppose the review set is a violation set containing 10,000 reference feature vectors, the third threshold is 900, and the similarity is represented by the Hamming distance.
  • Suppose that after comparison with the first 1000 reference feature vectors, the first determination number is 901; that is, among the similarities corresponding to the 1st to 1000th reference feature vectors, 901 are less than the first threshold.
  • In that case, the image review process can end and output the review result that the image file to be reviewed is a violation file; there is no need to continue computing the similarities with the remaining 9000 reference feature vectors.
  • Step 807: Increment the second determination count, that is, the number of times the image file to be reviewed has been determined to be a compliant file.
  • Step 808: Determine whether the second determination count is greater than the fourth threshold; if yes, go to step 809; otherwise, set i = i + 1 and go back to step 802.
  • The fourth threshold is greater than the third threshold; in this way, the false detection rate for violation files can be reduced.
  • Step 809: Output that the image file to be reviewed is a compliant file.
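The early-stopping traversal of steps 802 to 809 above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes binary hash vectors, the Hamming distance as the similarity, and toy threshold values, and all names are ours.

```python
def hamming(a, b):
    """Number of differing positions between two equal-length binary vectors."""
    return sum(x != y for x, y in zip(a, b))

def review(file_vec, review_set, first_threshold, third_threshold, fourth_threshold):
    """Traverse the violation review set with early stopping (steps 802-809)."""
    violation_count = 0   # first determination count (step 804)
    compliant_count = 0   # second determination count (step 807)
    for ref_vec in review_set:
        if hamming(file_vec, ref_vec) < first_threshold:
            violation_count += 1
            if violation_count > third_threshold:    # steps 805/806
                return "violation"
        else:
            compliant_count += 1
            if compliant_count > fourth_threshold:   # steps 808/809
                return "compliant"
    # library exhausted without a confident early decision
    return "violation" if violation_count > compliant_count else "compliant"

# toy example: 8-bit hashes and tiny thresholds
ref_set = [[1, 0, 1, 1, 0, 0, 1, 0]] * 5 + [[0, 1, 0, 0, 1, 1, 0, 1]] * 5
result = review([1, 0, 1, 1, 0, 0, 1, 1], ref_set,
                first_threshold=2, third_threshold=3, fourth_threshold=8)
```

With these numbers the loop stops at the fourth reference vector, long before the whole set is traversed, which is exactly the computation saving the step description claims.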
  • FIG. 9 is a schematic diagram of the implementation process of the image review method of the embodiment of the application. As shown in FIG. 9, the method may include the following steps 901 to 904:
  • Step 901 using the target classification model to perform feature extraction on the image file to be reviewed to obtain the corresponding feature vector; wherein the target classification model is obtained through training of multiple sample image files and corresponding multiple image transformation files;
  • Step 902: Determine the similarity between the feature vector of the image file to be reviewed and the i-th reference feature vector in the review set, where i is greater than 0 and less than or equal to the total number of reference feature vectors in the review set, and the reference image file corresponding to each reference feature vector is a violation file; the similarity characterizes the number of differing features between the two feature vectors.
  • Step 903: Determine whether the similarity corresponding to the i-th reference feature vector is less than the first threshold; if yes, go to step 904; otherwise, set i = i + 1 and go back to step 902.
  • Step 904 Determine that the image file to be reviewed is a violation file, and output the review result.
  • If the similarity corresponding to the i-th reference feature vector is less than the first threshold, the review process is ended and the review result that the image file to be reviewed is a violation file is output; otherwise, the next reference feature vector is traversed until it is determined whether the image file to be reviewed is a violation file.
  • If no reference feature vector matches, the review result that the image file to be reviewed is a compliant file is output.
  • The input picture is the picture to be reviewed.
  • The pictures in the violation gallery are an example of the first database.
  • Commonly used similarity algorithms such as the perceptual hash (pHash) algorithm and the Scale-Invariant Feature Transform (SIFT) algorithm.
  • the pHash algorithm is a rule algorithm designed manually.
  • The basic principle of the algorithm is to obtain the hash value of the input picture, and then calculate the hash "distance" between the input picture and a picture in the violation library to obtain the similarity of the two pictures; when the similarity is greater than the set threshold, the match is considered successful.
  • the implementation process of the algorithm is as follows:
  • Reduce the size of the input picture; simplify the colors of the reduced picture; calculate the average value of the simplified picture; compare the grayscale of each pixel against the average; compute the hash value from the grayscale comparison; and compute, from the hash values, the Hamming distance between the input picture and a picture in the violation gallery. When the Hamming distance is less than the set threshold, the match is determined to be successful, and the input picture is a violation picture.
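The hash-and-compare pipeline just described can be sketched in a few lines. This is an illustrative average-hash-style reduction over a plain grayscale matrix with made-up pixel values, not the exact pHash implementation; no image library is used.

```python
def mean_hash(gray, size=4):
    """Reduce a grayscale matrix to size x size by block averaging, then emit
    bit 1 for each cell above the global mean and 0 otherwise."""
    h, w = len(gray), len(gray[0])
    bh, bw = h // size, w // size
    reduced = [
        sum(gray[y][x] for y in range(i * bh, (i + 1) * bh)
                       for x in range(j * bw, (j + 1) * bw)) / (bh * bw)
        for i in range(size) for j in range(size)
    ]
    mean = sum(reduced) / len(reduced)
    return [1 if v > mean else 0 for v in reduced]

def hamming(a, b):
    """Hamming distance between two equal-length bit vectors."""
    return sum(x != y for x, y in zip(a, b))

# toy 8x8 "image": bright left half, dark right half
img = [[200] * 4 + [20] * 4 for _ in range(8)]
h1 = mean_hash(img)
h2 = mean_hash([[v + 5 for v in row] for row in img])  # slightly brightened copy
match = hamming(h1, h2) < 3  # matched when the distance is below a set threshold
```

A mild brightness shift leaves the hash unchanged, which illustrates why this family of algorithms tolerates color changes but, as noted below, fails under flips, rotation, and scaling.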
  • the SIFT algorithm is used to detect and describe the local features in the picture. It finds extreme points in the spatial scale and extracts its position, scale, and rotation invariants. The description and detection of local features can help to identify objects. SIFT features are based on some local appearance points of interest on the object and have nothing to do with the size and rotation of the picture.
  • the algorithm factors (ie, image feature extraction operators) of the pHash algorithm and the SIFT algorithm are both artificially designed, so they can only meet specific matching scenarios.
  • The pHash algorithm is invariant only to scaling and color changes;
  • The SIFT algorithm is invariant only to rotation, scaling, brightness changes, affine transformation, and noise.
  • the neural network model is mainly used to directly calculate whether the two pictures match.
  • the implementation process is shown in Figure 10, which is divided into a training phase and a prediction phase.
  • the basic process of the training phase includes the following steps 1001 to 1004:
  • Step 1001 design a model structure (including convolutional layer, fully connected layer, pooling layer, etc.) to obtain an initial similarity model, that is, a neural network model;
  • Step 1002 prepare a large amount of image data as training samples
  • Step 1003: Perform data enhancement on each picture in the training samples, for example rotating, mirroring, and rendering the pictures; combine two pictures obtained from different transformations of the same picture into a positive sample (label 1), and treat pictures transformed from different originals as negative samples (label 0).
  • Step 1004: Update the initial similarity model using a gradient-descent-family optimization algorithm and the augmented training samples to obtain the trained similarity model, that is, the target classification model.
  • the basic process of the prediction phase includes steps 1005 to 1007:
  • Step 1005: Calculate the similarity between the input picture and each picture in the violation library.
  • Step 1006 Determine whether the ratio of the number of similarities less than the first threshold to the total number of similarities is greater than the second threshold; if so, go to step 1007;
  • step 1007 it is considered that the matching is successful, and it is determined that the input picture is a violation picture.
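Steps 1005 to 1007 amount to a simple ratio test: count how many library distances fall below the first threshold and compare the fraction against the second threshold. A minimal sketch with illustrative numbers (the distances and thresholds are made up):

```python
def ratio_review(library_distances, first_threshold, second_threshold):
    """Flag the input picture as a violation when the fraction of library
    entries whose distance falls below the first threshold exceeds the
    second threshold (steps 1005-1007)."""
    below = sum(d < first_threshold for d in library_distances)
    ratio = below / len(library_distances)
    return "violation" if ratio > second_threshold else "normal"

# 10 library distances; 4 of them fall under the first threshold of 50
dists = [12, 30, 45, 48, 60, 75, 80, 90, 110, 130]
verdict = ratio_review(dists, first_threshold=50, second_threshold=0.3)
```

Here the below-threshold ratio is 0.4, which exceeds the second threshold of 0.3, so the picture is flagged; raising the second threshold to 0.5 would pass it as normal.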
  • the deep learning model contains multiple convolution kernels obtained through gradient descent.
  • the convolution kernel has a strong ability to express image features and basically meets all image transformation scenarios.
  • However, it is necessary to cyclically perform matching calculations against all pictures in the gallery; combined with the computational cost of the neural network model itself, the resource consumption is unacceptable.
  • A deep neural network is used to extract image features to obtain the image hash, which is an example of a feature vector; the similarity of two image hashes is then compared to determine whether the match is successful.
  • The process may include the following Step 1 to Step 4:
  • Step 1: Data preparation. Prepare 200 original pictures, as shown in Figure 12; perform picture transformation operations such as flipping, rotation, scaling, cropping, liquefying, mosaic, noise, discoloration, and occlusion on each original picture, or combinations thereof. Perform 100 different transformation operations on each picture, so that a total of 20,000 samples are obtained.
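The sample count in Step 1 follows directly: 200 originals times 100 transformation plans each gives 20,000 samples. A sketch of enumerating distinct transformation plans from the nine basic operations follows; the operation names are placeholder strings standing in for real image operations, not an executable augmentation pipeline.

```python
from itertools import combinations

BASIC_OPS = ["flip", "rotate", "scale", "crop", "liquefy",
             "mosaic", "noise", "discolor", "occlude"]

def transform_plans(limit=100):
    """Enumerate the 9 single operations first, then their combinations,
    stopping once `limit` distinct plans have been collected."""
    plans = [(op,) for op in BASIC_OPS]
    for r in range(2, len(BASIC_OPS) + 1):
        for combo in combinations(BASIC_OPS, r):
            plans.append(combo)
            if len(plans) >= limit:
                return plans[:limit]
    return plans[:limit]

plans = transform_plans(100)
total_samples = 200 * len(plans)  # 200 originals x 100 plans = 20,000
```

Note also that the nine basic operations admit exactly 2^9 - 9 - 1 = 502 multi-operation combinations, so there are far more than enough distinct plans to pick 100 per picture.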
  • Step 2: Design the model.
  • the lightweight deep neural network MobileNetV2 is selected as the feature extractor. Before training the model, modify the MobileNetV2 network structure.
  • the original structure of MobileNetV2 is shown in Table 2 below.
  • the header "Input” is the input size of the structure layer
  • “Operator” is the structure type of the layer.
  • C is the dimension of the output feature layer of this layer
  • n is the number of repetitions of this layer
  • s is the stride of the depthwise convolution kernel.
  • The input size of the 11th layer of MobileNetV2 is fixed at 1 × 1 × 1280, and k convolution kernels of size 1 × 1 are used for the convolution calculation, so as to output a 1-dimensional vector of length k. Finally, a softmax activation layer is connected to calculate the probabilities of the k categories.
  • the MobileNetV2 structure is modified as follows: between the conv2d layer and the softmax layer, a sigmoid activation layer and an n ⁇ 1 dimensional fully connected layer (Dense) are added.
  • the modified MobileNetV2 structure is shown in Figure 7A.
  • Step 3: Model training stage.
  • The training produces a picture classification model, that is, a specific neural network model.
  • k is 200, and n is the dimension of the hash to be encoded (for example, 300).
  • The model loss function is multi-class cross-entropy (categorical_crossentropy), the optimization algorithm is Adam, the learning rate is fixed at 0.001, and the accuracy of the trained model exceeds 99.5%.
  • Step 4: Matching stage.
  • The output of the model is a 1-dimensional vector with a length of n (for example, 300).
  • the activation function is sigmoid
  • the value range of the sigmoid output is (0, 1).
  • The output is binarized according to the rule: an output value ≤ 0.5 becomes 0, and an output value > 0.5 becomes 1. This finally yields a hash vector of length 300 whose values are 0 or 1, that is, a feature vector.
  • The extracted feature vector is called a hash vector because, even if the input image is a transformed version of the original image, the feature vector extracted by Mobilehashnet remains consistent with the feature vector of the original image.
  • the Hamming distance of the two pictures can be calculated according to the hash vector of the picture. The smaller the distance, the more similar the two pictures.
  • For matching, a first threshold can be specified: when the Hamming distance is lower than the first threshold, the two pictures are considered to be the same picture and the match succeeds; otherwise, the match fails.
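The binarization and matching rule above, thresholding sigmoid outputs at 0.5 into a 0/1 hash and then comparing the Hamming distance against the first threshold, can be sketched as follows. The activation values and thresholds are illustrative only.

```python
def binarize(sigmoid_outputs):
    """Map each sigmoid activation in (0, 1) to a bit: > 0.5 -> 1, else 0."""
    return [1 if v > 0.5 else 0 for v in sigmoid_outputs]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def is_same_picture(out_a, out_b, first_threshold):
    """Match succeeds when the Hamming distance is below the first threshold."""
    return hamming(binarize(out_a), binarize(out_b)) < first_threshold

a = [0.91, 0.12, 0.77, 0.03, 0.64, 0.49]  # activations for the original
b = [0.88, 0.20, 0.81, 0.07, 0.55, 0.51]  # transformed copy: values shift slightly
```

The point of binarizing is visible here: small shifts in the activations mostly do not cross the 0.5 boundary, so the hash of a transformed copy stays within a small Hamming distance of the original's hash.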
  • the preparation process of the validation set is the same as the above training set. Prepare several pictures in the non-training set, perform data enhancement, and calculate the correct recall rate (recall) and wrong recall rate (wrong_recall) of the matching model under different candidate thresholds.
  • a grid search method can be used to gradually approach the optimal value.
  • the grid search results are shown in Figure 16 and Figure 17; among them, Figure 16 shows that when the candidate threshold is 35 to 70, the corresponding recall And wrong_recall. Figure 17 shows the corresponding recall and wrong_recall when the candidate threshold is 50 to 55.
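The grid search over candidate thresholds can be sketched as follows: for each candidate, count how many transformed-violation pairs are recalled (distance below the candidate) and how many unrelated pairs are wrongly recalled, then keep the candidate with the lowest wrong_recall among those meeting a recall floor. The distances and the 0.85 floor here are made-up illustrations, not the patent's validation data.

```python
def rates(threshold, violation_dists, benign_dists):
    """recall: fraction of true-violation-pair distances under the threshold;
    wrong_recall: fraction of unrelated-pair distances under it."""
    recall = sum(d < threshold for d in violation_dists) / len(violation_dists)
    wrong = sum(d < threshold for d in benign_dists) / len(benign_dists)
    return recall, wrong

def grid_search(candidates, violation_dists, benign_dists, min_recall=0.85):
    """Pick the candidate with the lowest wrong_recall among those whose
    recall meets the floor; returns (threshold, wrong_recall, recall)."""
    best = None
    for t in candidates:
        recall, wrong = rates(t, violation_dists, benign_dists)
        if recall >= min_recall and (best is None or wrong < best[1]):
            best = (t, wrong, recall)
    return best

violation_dists = [10, 20, 30, 40, 55, 60]   # distances for transformed copies
benign_dists = [45, 70, 80, 90, 120, 150]    # distances for unrelated pictures
best_threshold, best_wrong, best_recall = grid_search(
    range(35, 71, 5), violation_dists, benign_dists)
```

On this toy data the search settles on 65: every candidate below it misses too many transformed copies, and 70 recalls the same violations while admitting no fewer benign pairs.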
  • The hash dimension directly determines the number of convolution kernels of the 2d convolutional layer (conv2d 1×1) in the modified MobileNetV2 structure and the output dimension n of the activation layer. Since it sits at the end of the network structure, its size directly affects the learning ability of the model. If the hash dimension is too small, the model underfits and the supported library size is limited; if it is too large, both the time to generate a hash and the time to compute the Hamming distance increase. A reasonable hash dimension therefore needs to be chosen.
  • Mobilehashnet uses deep neural networks to extract image features, which theoretically has performance advantages.
  • the matching performance of the Mobilehashnet algorithm, the Phash algorithm and the SIFT algorithm is compared under different image transformation methods. The experimental results are shown in Table 3.
  • The pHash algorithm essentially fails to match under image transformations such as flipping, rotation, and scaling; the SIFT algorithm scores low across all types of image transformation.
  • The Mobilehashnet algorithm achieves 100% recall under the flipping, distortion, cropping, mosaic, and noise transformations, and under the other image transformations its recall value is higher and its wrong_recall value is lower.
  • training can be performed without manually labeling a large number of samples, and a large number of training samples are automatically obtained through image data enhancement technology.
  • the Mobilehashnet algorithm provided by the embodiments of this application extracts image features by using a deep neural network, generates image hashes based on these features, and performs image matching. Compared with the related image matching/similarity algorithm, it effectively improves the correct recall rate, reduces the false recall rate, and does not require a large amount of manual data annotation.
  • the picture review system reviews the pictures uploaded by users to prevent the spread of a large number of illegal pictures. Due to the complexity of image content, as shown in Figure 18, the process of the image review system includes an illegal library matching model, an image classification model, a face recognition model, a text recognition model, and a text classification model. The pictures to be reviewed are reviewed by each model in turn. When the results of all models are "normal”, the review result can be "normal", that is, a compliant picture; otherwise, it is a violating picture.
  • the illegal library matching model in the image review system can be implemented by the Mobilehashnet algorithm provided in the embodiment of this application, which ensures a high correct recall rate and a low error recall rate for matching.
  • The implementation process of this algorithm is shown in Figure 19: extract the hash vector of the picture to be reviewed; determine the Hamming distance between this hash vector and each hash vector in the violation hash library corresponding to the violation library, that is, calculate the Hamming distances in batches; and judge whether each Hamming distance is less than the first threshold, so as to obtain the recall result, that is, the correct recall rate and the false recall rate.
  • The violation hash library can be built when the system is initialized, and matching then requires only one hash computation, that is, only feature extraction on the picture to be reviewed.
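The deployment pattern just described, building the violation hash library once at initialization and then performing a single feature extraction plus batched Hamming comparisons per picture, might look like the following. The class and the byte-parity "extractor" are pure-Python stand-ins of our own; a real system would plug in the trained model.

```python
class ViolationMatcher:
    def __init__(self, extract, violation_files):
        self.extract = extract
        # the violation hash library is built once, at system initialization
        self.hash_library = [extract(f) for f in violation_files]

    def review(self, picture, first_threshold):
        """One hash computation per picture, then batch Hamming distances."""
        h = self.extract(picture)
        distances = [sum(a != b for a, b in zip(h, ref))
                     for ref in self.hash_library]
        return ("violation"
                if any(d < first_threshold for d in distances) else "normal")

# stand-in extractor: parity pattern of byte values (NOT the real model)
def toy_extract(data):
    return [b % 2 for b in data]

matcher = ViolationMatcher(toy_extract, [bytes([1, 2, 3, 4]), bytes([9, 8, 7, 6])])
verdict = matcher.review(bytes([1, 2, 3, 4]), first_threshold=1)  # exact copy
```

The design point is that the per-picture cost is one forward pass plus cheap bit comparisons, instead of running a pairwise neural similarity model against every gallery picture.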
  • The image file review device provided by the embodiments of the present application, including the modules it contains and the units contained in each module, can be implemented by a processor in a terminal; of course, it can also be implemented by a specific logic circuit. In the implementation process, the processor can be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or the like.
  • FIG. 20A is a schematic structural diagram of an image file review device according to an embodiment of the application.
  • The device 200 includes a feature extraction module 201, a first determination module 202, and a review module 203, wherein:
  • the feature extraction module 201 is configured to use the target classification model to perform feature extraction on the image file to be reviewed to obtain the corresponding feature vector; wherein the target classification model is obtained through training of multiple sample image files and corresponding multiple image transformation files ;
  • the first determining module 202 is configured to determine the similarity between the feature vector of the image file to be reviewed and at least one reference feature vector in the review set;
  • the review module 203 is configured to determine whether the image file to be reviewed is a violation file according to the determined relationship between the similarity and the first threshold.
  • the feature extraction module 201 is configured to obtain a feature vector extraction structure of the target classification model, and the feature vector extraction structure includes the input layer to the non-linear activation layer of the target classification model;
  • the type of the target classification model is a neural network model; the feature vector extraction structure is used to perform feature extraction on the image file to be reviewed to obtain a corresponding feature vector.
  • The image review device 200 further includes: a tag acquisition module 204 configured to acquire the type tag of each sample image file; a transformation processing module 205 configured to perform transformation processing on each sample image file according to multiple transformation rules to obtain a set of image transformation files for the corresponding file; a tag labeling module 206 configured to assign the type label of each sample image file to each image transformation file in the corresponding set of image transformation files; and a model training module 207 configured to train a specific neural network model according to each sample image file, each image transformation file, and their corresponding type labels to obtain the target classification model.
  • The review module 203 is configured to: determine the number of similarities less than the first threshold, where the similarity characterizes the number of differing features between two feature vectors; determine the ratio of that number to the total number of similarities; and determine whether the image file to be reviewed is a violation file according to the relationship between the ratio and the second threshold.
  • the first determining module 202 is configured to determine the similarity between the feature vector of the image file to be reviewed and the i-th reference feature vector in the review set; where i is greater than 0 and Less than or equal to the total number of reference feature vectors in the review set; the similarity is used to characterize the number of different features between two feature vectors, and the reference image file corresponding to the reference feature vector is a violation file; accordingly ,
  • the review module 203 is configured to determine that the image file to be reviewed is a violation file when the similarity corresponding to the i-th reference feature vector is less than the first threshold.
  • the first determining module 202 is further configured to: when the similarity corresponding to the i-th reference feature vector is greater than or equal to the first threshold, determine the feature vector of the image file to be reviewed and The similarity between the i+1th reference feature vector in the review set is used to determine whether the image file to be reviewed is a violation file.
  • the image review device 200 further includes: a loading module 208 configured to load the generated review set; correspondingly, the feature extraction module 201 is further configured to: use the The target classification model performs feature extraction on multiple reference image files to obtain the feature vector of the corresponding file; and uses the feature vector of each reference image file as a reference feature vector to generate the review set.
  • the loading module 208 is configured to load the determined first threshold
  • The device further includes a second determination module configured to, under the assumption that the first threshold takes each of a plurality of different candidate thresholds in turn, use the feature extraction module, the first determination module, and the review module of the device to determine whether multiple verification image files are violation files, so as to determine the first threshold.
  • In the embodiments of the present application, if the above-mentioned image review method is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
  • The technical solutions of the embodiments of the present application, in essence or in the part contributing to the related technology, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes a number of instructions to enable an electronic device to execute all or part of the method described in each embodiment of the present application.
  • the aforementioned storage media include: U disk, mobile hard disk, read only memory (Read Only Memory, ROM), magnetic disk or optical disk and other media that can store program codes. In this way, the embodiments of the present application are not limited to any specific combination of hardware and software.
  • FIG. 21 is a schematic diagram of the hardware entity of the electronic device according to an embodiment of the application.
  • the electronic device 210 includes a memory 211 and a processor 212.
  • The memory 211 stores a computer program that can be run on the processor 212, and the processor 212 implements the steps of the image review method provided in the foregoing embodiments when it executes the program.
  • The memory 211 is configured to store instructions and applications executable by the processor 212, and can also cache data to be processed or already processed by the processor 212 and each module in the electronic device 210 (for example, image data, audio data, voice communication data, and video communication data), which can be implemented by flash memory (FLASH) or random access memory (RAM).
  • an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps in the image review method provided in the above-mentioned embodiments are implemented.
  • the disclosed device and method can be implemented in other ways.
  • The device embodiments described above are merely illustrative; for example, the division of the modules is only a logical function division, and there may be other division methods in actual implementation, for example: multiple modules or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • The coupling, direct coupling, or communication connection between the components shown or discussed can be an indirect coupling or communication connection through some interfaces, devices, or modules, and can be electrical, mechanical, or in other forms.
  • The modules described above as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical modules; they may be located in one place or distributed across multiple network units; some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • The functional modules in the embodiments of the present application may all be integrated into one processing unit, or each module may serve as a unit individually, or two or more modules may be integrated into one unit; the above integrated module can be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • The foregoing program can be stored in a computer-readable storage medium; when the program is executed, the steps of the foregoing method embodiments are performed; and the foregoing storage medium includes various media that can store program code, such as a removable storage device, a read-only memory (ROM), a magnetic disk, or an optical disk.
  • If the aforementioned integrated unit of this application is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
  • The technical solutions of the embodiments of the present application, in essence or in the part contributing to the related technology, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes a number of instructions to enable an electronic device to execute all or part of the method described in each embodiment of the present application.
  • the aforementioned storage media include: removable storage devices, ROMs, magnetic disks, or optical disks and other media that can store program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The image review method includes: performing feature extraction on an image file to be reviewed using a target classification model to obtain a corresponding feature vector, where the target classification model is obtained by training with multiple sample image files and corresponding multiple image transformation files; determining the similarity between the feature vector of the image file to be reviewed and at least one reference feature vector in a review set; and determining whether the image file to be reviewed is a violation file according to the relationship between the determined similarity and a first threshold. An image review apparatus, a device, and a storage medium are also provided.

Description

Image review method and apparatus, device, and storage medium. Technical Field
The embodiments of this application relate to Internet technology, and relate to, but are not limited to, image review methods and apparatuses, devices, and storage media.
Background
In Internet content review services, "bad actors" deliberately transform violating image files in various ways in order to "fool" the image review apparatus and then spread the violating image files on the Internet. Image files can be transformed in many ways, for example by basic transformations such as rotation, liquefying, deformation, noise, and rendering, or by combinations of these. It can be seen that "bad actors" uploading transformed violating image files to the Internet poses a very large technical challenge to image review apparatuses.
Summary
The image review method and apparatus, device, and storage medium provided by the embodiments of this application are implemented as follows:
The image review method provided by the embodiments of this application includes: performing feature extraction on an image file to be reviewed using a target classification model to obtain a corresponding feature vector, where the target classification model is obtained by training with multiple sample image files and corresponding multiple image transformation files; determining the similarity between the feature vector of the image file to be reviewed and at least one reference feature vector in a review set; and determining whether the image file to be reviewed is a violation file according to the relationship between the determined similarity and a first threshold.
The image review apparatus provided by the embodiments of this application includes: a feature extraction module configured to perform feature extraction on an image file to be reviewed using a target classification model to obtain a corresponding feature vector, where the target classification model is obtained by training with multiple sample image files and corresponding multiple image transformation files; a first determination module configured to determine the similarity between the feature vector of the image file to be reviewed and at least one reference feature vector in a review set; and a review module configured to determine whether the image file to be reviewed is a violation file according to the relationship between the determined similarity and a first threshold.
The electronic device provided by the embodiments of this application includes a memory and a processor, where the memory stores a computer program runnable on the processor, and the processor, when executing the program, implements the steps of any image review method of the embodiments of this application.
The computer-readable storage medium provided by the embodiments of this application stores a computer program which, when executed by a processor, implements the steps of any image review method of the embodiments of this application.
In the embodiments of this application, the electronic device performs feature extraction on the image file to be reviewed using the target classification model to obtain the corresponding feature vector, where the target classification model is obtained by training with multiple sample image files and corresponding multiple image transformation files. In this way, even if the image file to be reviewed is a file obtained by rotating, liquefying, deforming, or otherwise transforming an original file, a feature vector consistent with that of the original file can still be extracted, so that an arbitrarily transformed image file can be accurately identified and the robustness of the image review method is enhanced.
Brief Description of the Drawings
Figure 1 is a schematic diagram of an exemplary application scenario of the image review method of an embodiment of this application;
Figure 2 is a schematic flowchart of the implementation of the image review method of an embodiment of this application;
Figure 3 is a schematic diagram of the training process of the target classification model of an embodiment of this application;
Figure 4 is a schematic flowchart of the implementation of the review set generation method of an embodiment of this application;
Figure 5 is a schematic flowchart of the implementation of the first-threshold determination method of an embodiment of this application;
Figure 6 is a schematic flowchart of the implementation of another image review method of an embodiment of this application;
Figure 7A is a schematic structural diagram of MobileNetV2 of an embodiment of this application;
Figure 7B is a schematic structural diagram of the feature extraction structure of an embodiment of this application;
Figure 8 is a schematic flowchart of the implementation of yet another image review method of an embodiment of this application;
Figure 9 is a schematic flowchart of the implementation of still another image review method of an embodiment of this application;
Figure 10 is a schematic flowchart of the implementation of another image review method of an embodiment of this application;
Figure 11 is a schematic flowchart of the implementation of yet another image review method of an embodiment of this application;
Figure 12 is a schematic diagram of transformation operations performed on original pictures in an embodiment of this application;
Figure 13 is a schematic diagram of the simplified MobileNetV2 structure of an embodiment of this application;
Figure 14 is a schematic curve of the sigmoid function;
Figure 15 is a schematic flowchart of picture matching in an embodiment of this application;
Figure 16 shows the recall and wrong_recall corresponding to candidate thresholds of 35 to 70 in an embodiment of this application;
Figure 17 shows the recall and wrong_recall corresponding to candidate thresholds of 50 to 55 in an embodiment of this application;
Figure 18 is a schematic flowchart of the picture review system of an embodiment of this application;
Figure 19 is a schematic flowchart of the Mobilehashnet algorithm in the picture review system of an embodiment of this application;
Figure 20A is a schematic structural diagram of an image file review device of an embodiment of this application;
Figure 20B is a schematic structural diagram of another image file review device of an embodiment of this application;
Figure 21 is a schematic diagram of the hardware entity of an electronic device of an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the specific technical solutions of this application are described in further detail below with reference to the drawings in the embodiments of this application. The following embodiments are used to illustrate this application, but not to limit its scope.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field to which this application belongs. The terms used herein are only for the purpose of describing the embodiments of this application and are not intended to limit this application.
In the following description, reference to "some embodiments" describes a subset of all possible embodiments; it should be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and they may be combined with each other without conflict.
It should be noted that the terms "first/second/third" in the embodiments of this application merely distinguish similar or different objects and do not represent a specific ordering of objects. It should be understood that "first/second/third" may, where permitted, be interchanged in a specific order or sequence, so that the embodiments of this application described herein can be implemented in an order other than that illustrated or described herein.
An exemplary application scenario of the image review method provided by the embodiments of this application is first described below.
Figure 1 is a schematic diagram of an exemplary application scenario 100 of the image review method provided by an embodiment of this application. As shown in Figure 1, the scenario 100 includes a terminal 101, an image review apparatus 102, and a second database 103. The image review apparatus 102 reviews an image file 104 input by a user at the terminal 101 to determine whether the file is a violation file. If it is a violation file, storing the file in the second database 103 is prohibited; conversely, if it is not a violation file, that is, it is a compliant file, storing it in the second database 103 is allowed so that this user or other users can retrieve, browse, or download the file.
It should be noted that the terminal 101 may be a mobile terminal with wireless communication capability, such as a mobile phone, a tablet computer, or a notebook computer, or a less-portable device with computing capability, such as a desktop computer.
The image review apparatus 102 may be configured in the terminal 101 or configured independently of the terminal 101. The application scenario 100 may contain one or more image review apparatuses 102. Multiple image review apparatuses 102 can review image files input by different users in parallel, thereby increasing the data processing speed.
Besides being configured independently of the image review apparatus 102 and the terminal 101, the second database 103 may also be configured inside the image review apparatus 102 when the image review apparatus 102 is deployed on the network side.
When the terminal 101, the image review apparatus 102, and the second database 103 are deployed on different devices independently of each other, the terminal 101 and the image review apparatus 102 may communicate over a network, and so may the image review apparatus 102 and the second database 103; the network may be wireless or wired, and the embodiments of this application do not specifically limit the communication method here.
The embodiments of this application provide an image review method. The method can be applied to an electronic device having an image review apparatus, and the electronic device may be a computer device, a notebook computer, any node server in a distributed computing architecture, a mobile terminal, or the like. The functions implemented by the image review method can be realized by a processor in the electronic device invoking program code, and the program code can of course be stored in a computer storage medium. It can be seen that the electronic device includes at least a processor and a storage medium.
Figure 2 is a schematic flowchart of the implementation of the image review method of an embodiment of this application. As shown in Figure 2, the method may include the following steps 201 to 203:
Step 201: Perform feature extraction on the image file to be reviewed using a target classification model to obtain a corresponding feature vector, where the target classification model is obtained by training with multiple sample image files and corresponding multiple image transformation files.
It should be noted that the target classification model may be a deep learning model, for example a neural network model; the number of layers contained in the model is not limited. The model may be a lightweight neural network model, for example MobileNetV2; of course, it may also be a non-lightweight neural network model. The electronic device can carry out the training process of the target classification model through steps 301 to 304 of the following embodiment.
It can be understood that a so-called image transformation file is a file obtained by applying transformations such as flipping, rotation, liquefying, scaling, cropping, mosaic, noise, discoloration, or occlusion to a sample image file, or a combination of these transformations.
The image file to be reviewed can take many forms; for example, it may be an image or a video (such as a short video, a live-stream video, a movie, or a TV series). When the image file to be reviewed is a video, the electronic device can randomly sample one or more video frames from the video and then perform feature extraction on these frames through the target classification model to obtain the feature vector corresponding to the video.
Step 202: Determine the similarity between the feature vector of the image file to be reviewed and at least one reference feature vector in a review set.
Usually, to guarantee review accuracy, the review sets corresponding to a single image and to a video may differ. That is, when the file to be reviewed is an image, each reference feature vector in the corresponding review set is extracted by the electronic device from a single image; when the file to be reviewed is a video, a reference feature vector in the corresponding review set is extracted by the electronic device from multiple images. In short, the dimensionality of the feature vector of the file to be reviewed matches that of the reference feature vectors. Of course, this rule is not mandatory, and the two feature vectors may also have different dimensionalities.
The parameter type used to characterize the similarity can vary, for example the Hamming distance, the Euclidean distance, or the cosine similarity.
Step 203: Determine whether the image file to be reviewed is a violation file according to the relationship between the determined similarity and a first threshold.
It can be understood that a review set generated from compliant reference image files (for brevity, hereafter a compliance set) and a review set generated from violating reference image files (hereafter a violation set) use different decision criteria.
Taking similarity characterized by the Hamming distance as an example, the Hamming distance between two equal-length strings is the number of positions at which the corresponding characters differ. Therefore, the smaller the Hamming distance, the more similar the two feature vectors, and the more similar the corresponding image files. For a violation set, in one example, the ratio of the number of similarities less than the first threshold to the total number of similarities is determined; when this ratio is greater than a second threshold, the file to be reviewed is determined to be a violation file. For a compliance set, in one example, when this ratio is greater than the second threshold, the file to be reviewed is determined to be a compliant file.
There are many ways to determine whether the file to be reviewed is a violation file. For example, the electronic device can implement this through steps 604 to 606 of the following embodiment, or through steps 802 to 809 of the following embodiment. Since the similarity characterizes the number of differing features between two feature vectors and the review set is a violation set, each time the electronic device determines a similarity to a reference feature vector, it can count the number of similarities currently below the first threshold; if this number is greater than or equal to a third threshold, the similarity calculation is stopped, the file to be reviewed is determined to be a violation file, and this is output as the review result.
As another example, the electronic device can also determine whether the file to be reviewed is a violation file through steps 902 to 904 of the following embodiment.
In the embodiments of this application, the electronic device performs feature extraction on the image file to be reviewed using the target classification model to obtain the corresponding feature vector, where the target classification model is obtained by training with multiple sample image files and corresponding multiple image transformation files. In this way, even if the file to be reviewed is the result of rotating, liquefying, deforming, or otherwise transforming an original file, a feature vector consistent with that of the original file can still be extracted, so that arbitrarily transformed image files can be accurately identified and the robustness of the image review method is enhanced.
In some embodiments, before reviewing the image file to be reviewed, the electronic device may train the target classification model, generate the review set, and determine the first threshold in advance, where:
The training process of the target classification model, as shown in Figure 3, may include the following steps 301 to 304. It should be noted that the electronic device may execute steps 301 to 304 before performing feature extraction on the file to be reviewed, or when it is configured with the image review function.
Step 301: Acquire the type label of each sample image file.
It can be understood that the sample image files include violating image files and compliant image files. Violating image files may, for example, relate to terror, violence, pornography, or gambling; compliant image files may, for example, relate to natural scenery or buildings. The electronic device can sample some violating sample files from a first database that collects a wide variety of violating image files, and some compliant sample files from a second database that collects a wide variety of compliant image files.
To reduce the labeling workload for each sample image file, a certain number of image files are usually selected from the first and second databases as sample files; for example, 100 violating images and 100 compliant images are selected from the two databases as sample image files.
Step 302: Transform each sample image file according to multiple transformation rules to obtain a set of image transformation files for the corresponding file.
The transformation rules can vary; for example, the basic transformation rules include flipping, rotation, liquefying, scaling, cropping, mosaic, noise, discoloration, and occlusion, and a combined transformation rule is a combination of at least two basic transformation rules. Taking the above nine basic transformation rules as an example, there are 502 combined transformation rules, that is,
C(9,2) + C(9,3) + ... + C(9,9) = 2^9 - 9 - 1 = 502
In one example, the electronic device can transform a sample image file according to 100 different transformation rules to obtain 100 image transformation files corresponding to that file.
Step 303: Assign the type label of each sample image file to each image transformation file in the corresponding set of image transformation files.
It can be understood that the type label of an image file after transformation should match that of the file before transformation. For example, a violating image file that has been liquefied is still violating; its nature is unchanged. Therefore, the type label of each image transformation file corresponding to a sample image file can match the type label of that sample image file.
Step 304: Train a specific neural network model according to each sample image file, each image transformation file, and their corresponding type labels to obtain the target classification model.
In the embodiments of this application, each sample image file is transformed according to multiple transformation rules to obtain a set of image transformation files for the corresponding file; the type label of each sample image file is assigned to each image transformation file in the corresponding set; and a specific neural network model is trained according to each sample image file, each image transformation file, and their corresponding type labels to obtain the target classification model.
In this way, on the one hand, the training samples include image transformation files obtained by applying multiple transformations to the sample image files, which enriches the diversity of the training samples and gives the trained target classification model good robustness. When the file to be reviewed is reviewed based on this target classification model, transformed files can be handled: even if the user flips, rotates, scales, crops, or mosaics an image file before inputting it, the model can still accurately extract the feature vector of the transformed file and thus accurately identify whether the file is a violation file. Simply put, the feature vectors extracted by the model before and after transformation are essentially the same, so even if the input image file has been transformed, the electronic device can accurately identify whether it is a violation file.
On the other hand, in the embodiments of this application, the type label of each sample image file is assigned to each image transformation file in the corresponding set; in this way, on the premise of guaranteeing training sample diversity, the cost of manual labeling is reduced, and there is no need to manually assign a type label to each image transformation file. By transforming the sample image files, the electronic device can automatically obtain a large number of rich and diverse training samples.
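The label-propagation idea of steps 302 and 303, where every transform of a sample inherits the sample's own label, can be sketched as follows. The "files" here are toy strings and the transforms toy string operations of our own; a real pipeline would apply image operations to image data.

```python
def build_training_set(samples, transforms):
    """samples: list of (file, type_label); transforms: callables applied to
    each file. Each transformed file inherits the original file's label,
    so no manual labeling of transformed files is needed."""
    dataset = list(samples)  # keep the originals
    for file, label in samples:
        for t in transforms:
            dataset.append((t(file), label))  # label assigned automatically
    return dataset

# toy "files" are strings; toy transforms stand in for flip/rotate/etc.
samples = [("catpic", "compliant"), ("badpic", "violation")]
transforms = [lambda s: s[::-1], lambda s: s.upper()]
dataset = build_training_set(samples, transforms)
```

With 100 originals per class and 100 transformation rules, the same loop yields the large automatically labeled training set the embodiment describes.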
在一些实施例中,电子设备可以预先将已生成的审核集合加载至缓存中。对于加载的时机不做限定。例如,电子设备可以在利用目标分类模型对待审影像文件进行特征提取之前,加载已生成的审核集合;再如,电子设备还可以在对待审影像文件进行特征提取之后,且在确定待审影像文件的特征向量与审核集合中的至少一个参考特征向量之间的相似度之前,加载已生成的审核集合;又如,电子设备还可以在被配置为具有影像审核功能时,加载已生成的审核集合。
在一些实施例中,对于审核集合的生成方法,如图4所示,可以包括以下步骤401和步骤402:
步骤401,利用所述目标分类模型,对多个参考影像文件进行特征提取,得到对应文件的特征向量。
在一些实施例中,所述多个参考影像文件可以是违规文件,例如为第一数据库中的全部或部分文件,基于此得到的审核集合为违规集合。在另一些实施例中,所述多个参考影像文件可以是合规文件,例如为第二数据库中的全部或部分文件。如上文提到的,审核集合的性质不同(即合规集合和违规集合),在影像审核阶段,对应的判断准则也是不同的。
所述多个参考影像文件是数据库中的部分文件时,可以是电子设备从数据库中随机抽取的文件,还可以是数据库中一些具有代表性的文件,比如优先级比较高的一些文件。
步骤402,将每一所述参考影像文件的特征向量作为参考特征向量,生成所述审核集合。
在本申请实施例中,预先将审核集合加载至缓存区。这样,电子设备在对待审影像文件进行审核的过程中,无需对所述多个参考影像文件进行特征提取,以生成审核集合;而是,直接使用预先生成的审核集合进行影像审核即可。如此,能够节约特征提取处理的时间消耗,从而能够节约影像的审核时长。
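审核集合的生成过程(图4的步骤401和步骤402)可以用如下Python片段示意;其中extract_feature为假设的特征提取函数,代表目标分类模型的特征提取能力:

```python
def build_review_set(extract_feature, reference_files):
    """对每一参考影像文件提取特征向量,将其作为参考特征向量,生成审核集合。
    生成结果可在初始化时一次性加载至缓存,审核阶段直接复用。"""
    return [extract_feature(f) for f in reference_files]

# 用法示意:用toy特征函数(文件名长度的奇偶位)代替真实的目标分类模型
toy_extract = lambda f: [len(f) % 2, 1]
review_set = build_review_set(toy_extract, ["a.jpg", "bb.jpg"])
```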
在一些实施例中,电子设备可以预先将已确定的第一阈值加载至缓存中。对于加载的时机不做限定。例如,电子设备可以在确定所述待审影像文件是否是违规文件之前,加载已确定的第一阈值;再如,电子设备还可以在对待审影像文件进行特征提取之前,加载已确定的第一阈值;又如,电子设备还可以在被配置为具有影像审核功能时,加载已确定的第一阈值。
在一些实施例中,所述第一阈值的确定方法,如图5所示,可以包括以下步骤501至步骤503:
步骤501,在假设所述第一阈值分别为多个不同候选阈值的情况下,根据所述影像审核方法,确定多个验证影像文件是否是违规文件,从而得到每一所述候选阈值对应的审核结果集合。
在一些实施例中,所述多个验证影像文件可以包括违规影像文件和合规影像文件。验证影像文件与用于训练神经网络模型的文件不同。所述多个验证影像文件中还可以包括电子设备对原始影像文件进行多种变换处理后的文件。变换处理采用的变换规则可以与模型训练阶段采用的变换规则相同。
可以理解地,通过实施步骤501,能够得到基于每一候选阈值获得的审核结果集合。如表1所示,其中候选阈值1对应的审核结果集合为表1中的第2列的内容。
表1

| 验证影像文件 | 候选阈值1 | 候选阈值2 | …… | 候选阈值N |
| 验证影像文件1 | 1 | 1 | …… | 1 |
| 验证影像文件2 | 0 | 1 | …… | 1 |
| …… | …… | …… | …… | …… |
| 验证影像文件M | 1 | 1 | …… | 0 |

其中,候选阈值所属列中的“1”表示对应的文件的审核结果为合规文件,“0”表示对应的文件的审核结果为违规文件。
步骤502,根据每一审核结果集合和每一所述验证影像文件的类型标签,确定在对应候选阈值下的正确召回率和错误召回率。
在一个示例中,正确召回率的计算公式如下式(1)所示:

recall = TN / (TN + FP)    (1)

错误召回率的计算公式如下式(2)所示:

wrong_recall = FN / (FN + TP)    (2)

在式(1)和式(2)中,TN表示将违规文件审核为违规文件的数量;FP表示将违规文件审核为合规文件的数量;FN表示将合规文件审核为违规文件的数量;TP表示将合规文件审核为合规文件的数量。
步骤503,将满足特定条件的正确召回率和错误召回率所对应的候选阈值,确定为所述第一阈值。
可以理解地,选择哪个候选阈值作为第一阈值,直接决定了影像审核方法的识别准确率。因此,应该在保证较高正确召回率的前提下,尽量地降低错误召回率,从而选择对应的候选阈值作为第一阈值。举例来说,在保证正确召回率大于或等于最小正确召回率(比如0.85)的情况下,选择最小错误召回率对应的候选阈值,作为第一阈值。
在一些实施例中,电子设备可以采用网格搜索法,逐渐逼近最佳值,从而从多个候选阈值中选择第一阈值。
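上述“在保证正确召回率的前提下选择错误召回率最小的候选阈值”的过程,可以用如下Python片段示意;其中recall与wrong_recall的定义按上下文假设为TN/(TN+FP)与FN/(FN+TP),并非本申请给定的唯一实现:

```python
def choose_first_threshold(candidates, results, labels, min_recall=0.85):
    """results[t][j]为候选阈值t下第j个验证文件的审核结果(0=违规,1=合规),
    labels[j]为对应的真实标签。在recall不低于min_recall的候选阈值中,
    选取wrong_recall最小者作为第一阈值。"""
    best_t, best_wrong = None, None
    for t in candidates:
        tn = sum(r == 0 and y == 0 for r, y in zip(results[t], labels))
        fp = sum(r == 1 and y == 0 for r, y in zip(results[t], labels))
        fn = sum(r == 0 and y == 1 for r, y in zip(results[t], labels))
        tp = sum(r == 1 and y == 1 for r, y in zip(results[t], labels))
        recall = tn / (tn + fp) if tn + fp else 0.0
        wrong_recall = fn / (fn + tp) if fn + tp else 0.0
        if recall >= min_recall and (best_wrong is None or wrong_recall < best_wrong):
            best_t, best_wrong = t, wrong_recall
    return best_t

labels = [0, 0, 1, 1]                         # 两个违规、两个合规的验证文件
results = {50: [0, 0, 1, 1], 60: [0, 1, 1, 1]}
best = choose_first_threshold([50, 60], results, labels)
```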
本申请实施例再提供一种影像审核方法,图6为本申请实施例影像审核方法的实现流程示意图,如图6所示,所述方法可以包括以下步骤601至步骤606:
步骤601,获取所述目标分类模型的特征向量提取结构,所述特征向量提取结构包括所述目标分类模型的输入层至非线性激活层;其中,所述目标分类模型是通过多个样本影像文件和对应的多种影像变换文件训练得到的。
举例来说,目标分类模型可以为轻量级的神经网络模型MobileNetV2。该网络的结构,如图7A所示,包括“bottleneck结构”、conv2d层、sigmoid激活层、n×1维的全连接层(Dense)和归一化指数层(softmax)。在一些实施例中,如图7B所示,可以将“bottleneck结构”、conv2d层和sigmoid激活层作为特征向量提取结构。
步骤602,利用所述特征向量提取结构,对所述待审影像文件进行特征提取,得到对应的特征向量。
也就是说,特征向量提取结构的非线性激活层的输出即为该文件对应的特征向量。
步骤603,确定所述待审影像文件的特征向量与审核集合中的每一参考特征向量之间的相似度;其中,所述相似度用于表征两个特征向量之间的不同的特征数目;
步骤604,确定小于所述第一阈值的相似度的数目。
例如,相似度为汉明距离。
步骤605,确定所述数目与相似度总数目的比值;
步骤606,根据所述比值与第二阈值之间的关系,确定所述待审影像文件是否是违规文件。
可以理解地,在审核集合为违规集合的情况下,所述比值大于第二阈值时,确定待审影像文件为违规文件;所述比值小于或等于第二阈值时,确定该文件为合规文件。
在审核集合为合规集合的情况下,所述比值大于第二阈值时,确定待审影像文件为合规文件;所述比值小于或等于第二阈值时,确定该文件为违规文件。
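步骤604至步骤606描述的“占比判决”逻辑可以用如下Python片段示意(仅为示意实现,相似度以汉明距离表示,越小越相似):

```python
def review_by_ratio(distances, first_threshold, second_threshold,
                    violation_set=True):
    """distances为待审文件与审核集合中各参考特征向量的汉明距离。
    统计小于第一阈值的距离数目占总数目的比值,再与第二阈值比较。"""
    count = sum(d < first_threshold for d in distances)
    ratio = count / len(distances)
    matched = ratio > second_threshold
    # 违规集合:占比超过第二阈值判为违规;合规集合:占比超过第二阈值判为合规
    if violation_set:
        return "违规" if matched else "合规"
    return "合规" if matched else "违规"

result = review_by_ratio([3, 5, 40], first_threshold=10, second_threshold=0.5)
```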
在本申请实施例中,统计小于第一阈值的相似度的数目,确定该数目与确定的相似度总数目之间的比值;根据比值与第二阈值的关系,确定待审影像文件是否是违规文件;如此,相比于仅根据与一个参考特征向量的相似度,获得审核结果,这种方式获得的审核结果更为可靠,识别准确率更高。
本申请实施例再提供一种影像审核方法,图8为本申请实施例影像审核方法的实现流程示意图,如图8所示,所述方法可以包括以下步骤801至步骤809:
步骤801,利用目标分类模型对待审影像文件进行特征提取,得到对应的特征向量;其中,所述目标分类模型是通过多个样本影像文件和对应的多种影像变换文件训练得到的。
可以理解地,一个目标分类模型通常由多个顺序连接的层(layer)组成。第一层一般以图像为输入,通过特定的运算从图像中提取特征。接下来,每一层以前一层提取的特征作为输入,对其进行特定形式的变换,便可以得到更复杂一些的特征。这种层次化的特征提取过程可以累加,从而赋予了神经网络强大的特征提取能力。经过很多层的变换之后,神经网络就可以将初始输入的图像变换为更高层次的抽象的特征。
这种由简单到复杂、由低级到高级的抽象过程可以通过生活中的例子来体会。例如,在英语学习过程中,通过字母的组合,可以得到单词;通过单词的组合,可以得到句子;通过对句子的分析,可以了解语义;通过对语义的分析,可以获得表达的思想或目的。而这种语义、思想等,就是更高级别的抽象。
因此,在本申请实施例中,通过目标分类模型对待审影像文件进行特征提取时,无论待审影像文件是原始文件经过多么复杂的变换处理得到的,其提取的特征向量基本是不变的。这样,使得所述影像审核方法具有较强的鲁棒性,即使违规文件被变换处理后上传至网络,仍然能够被准确识别。
步骤802,确定所述待审影像文件的特征向量与所述审核集合中的第i个参考特征向量之间的相似度;其中,i大于0且小于或等于所述审核集合中的参考特征向量总数目;
步骤803,根据所述第i个参考特征向量对应的相似度与第一阈值之间的关系,确定所述待审影像文件是否是违规文件;如果是,执行步骤804;否则,执行步骤807;
所谓第i个参考特征向量对应的相似度,指的是待审影像文件的特征向量与第i个参考特征向量之间的相似度。
步骤804,统计所述待审影像文件是违规文件的第一确定次数;
步骤805,确定所述第一确定次数是否大于第三阈值;如果是,执行步骤806;否则,令i加1,返回执行步骤802;
步骤806,输出所述待审影像文件是违规文件。
可以理解地,如果第一确定次数大于第三阈值,则足以可靠地确定待审影像文件是违规文件,此时无需再继续计算待审影像文件的特征向量与剩余参考特征向量之间的相似度了,从而节约运算量,缩短审核时长。
举例来说,假设审核集合包括10000个参考特征向量,第三阈值为900,相似度通过汉明距离表征。那么,在计算至第1000个参考特征向量对应的相似度时,第一确定次数为901。即,在第1个至第1000个参考特征向量对应的相似度中,有901个相似度小于第一阈值。至此可以结束影像审核流程,输出待审影像文件为违规文件的审核结果。而无需再继续计算与剩余的9000个参考特征向量之间的相似度了。
步骤807,统计所述待审影像文件是合规文件的第二确定次数;
步骤808,确定所述第二确定次数是否大于第四阈值;如果是,执行步骤809;否则,令i加1,返回执行步骤802;
在一些实施例中,第四阈值大于第三阈值。这样,可以降低违规文件的误检率。
步骤809,输出所述待审影像文件是合规文件。
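步骤802至步骤809的“提前终止”流程可以用如下Python片段示意;其中,遍历结束后仍未达到任一阈值时的兜底策略原文未明确,此处按多数表决假设:

```python
def review_with_early_stop(distances, first_threshold,
                           third_threshold, fourth_threshold):
    """逐个计算与违规集合中参考特征向量的汉明距离:
    第一确定次数(判违规次数)大于第三阈值时,提前输出"违规";
    第二确定次数(判合规次数)大于第四阈值时,提前输出"合规"。"""
    first_count, second_count = 0, 0
    for d in distances:
        if d < first_threshold:
            first_count += 1
            if first_count > third_threshold:
                return "违规"          # 无需再计算剩余相似度
        else:
            second_count += 1
            if second_count > fourth_threshold:
                return "合规"
    # 兜底策略为假设,原文未明确
    return "违规" if first_count > second_count else "合规"

# 示意:前4个距离即满足"第一确定次数>3",无需遍历剩余距离
result = review_with_early_stop([1, 1, 1, 1, 99, 99], 10, 3, 100)
```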
本申请实施例再提供一种影像审核方法,图9为本申请实施例影像审核方法的实现流程示意图,如图9所示,所述方法可以包括以下步骤901至步骤904:
步骤901,利用目标分类模型对待审影像文件进行特征提取,得到对应的特征向量;其中,所述目标分类模型是通过多个样本影像文件和对应的多种影像变换文件训练得到的;
步骤902,确定所述待审影像文件的特征向量与所述审核集合中的第i个参考特征向量之间的相似度;其中,i大于0且小于或等于所述审核集合中的参考特征向量总数目,所述参考特征向量对应的参考影像文件为违规文件;所述相似度用于表征两个特征向量之间的不同的特征数目;
步骤903,确定所述第i个参考特征向量对应的相似度是否小于第一阈值;如果是,执行步骤904;否则,令i加1,返回执行步骤902;
步骤904,确定所述待审影像文件是违规文件,并输出该审核结果。
相比于上述步骤802至步骤809,这里,如果第i个参考特征向量对应的相似度小于第一阈值,则结束审核流程,输出待审影像文件是违规文件的审核结果;否则,继续遍历下一参考特征向量,直至确定待审影像文件是违规文件为止。当然,在一些实施例中,如果遍历审核集合中的每一参考特征向量,结果均为对应的相似度大于或等于第一阈值,则输出待审影像文件是合规文件的审核结果。
在相关技术中,通过将输入图片(即待审图片)与违规图库(即第一数据库的一种示例)中的图片进行相似度计算,以判断该输入图片是否违规。常用的相似度算法,比如感知哈希(pHash)算法和尺度不变特征转换(Scale-Invariant Feature Transform,SIFT)算法。
pHash算法是一种基于人工设计规则的算法,其基本原理是:计算输入图片的hash值,再计算该输入图片与违规图库中一张图片的hash“距离”,从而得到这两张图片的相似度;当相似度大于设定的阈值时,则认为匹配成功。该算法的实现过程如下:
缩小输入图片的尺寸;简化缩小后的图片的色彩;计算简化后的图片的平均值;基于平均值,比较像素的灰度;基于灰度,计算哈希值;基于哈希值,计算与违规图库中的一张图片的汉明距离;当汉明距离小于设定的阈值时,则确定匹配成功,输入图片为违规图片。
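上述“基于平均值比较像素灰度、再计算汉明距离”的思路,可以用如下Python片段示意(对已缩小并灰度化的像素序列计算均值hash,并非pHash的完整实现):

```python
def average_hash(gray_pixels):
    """像素灰度不低于均值取1、否则取0,得到hash位串。"""
    avg = sum(gray_pixels) / len(gray_pixels)
    return [1 if p >= avg else 0 for p in gray_pixels]

def hamming_distance(h1, h2):
    """两个hash位串中不同位的数目,越小表示两张图片越相似。"""
    return sum(a != b for a, b in zip(h1, h2))

h = average_hash([10, 20, 200, 250])  # 均值为120
```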
SIFT算法,用来侦测和描述图片中的局部性特征,它在空间尺度中寻找极值点,并提取出其位置、尺度、旋转不变量。局部特征的描述和侦测可以帮助辨识物体,SIFT特征是基于物体上的一些局部外观的兴趣点而与图片的大小和旋转无关。
然而,pHash算法和SIFT算法的算法因子(即图片特征抽取算子)均由人为设计,因此只能满足特定的匹配场景。pHash算法只能保持尺度缩放、变色的不变性;SIFT算法只能保持旋转、尺度缩放、亮度变化、仿射、噪声的不变性。
基于此,下面将说明本申请实施例在一个实际的应用场景中的示例性应用。
对于端到端的深度学习匹配算法,主要通过神经网络模型,直接计算两张图片是否匹配。实现流程如图10所示,分为训练阶段和预测阶段。训练阶段的基本流程包括以下步骤1001至步骤1004:
步骤1001,设计模型结构(包括卷积层、全连接层和池化层等),得到初始的相似度模型,即神经网络模型;
步骤1002,准备大量图片数据作为训练样本;
步骤1003,对训练样本中的每张图片进行数据增强处理,比如,对图片分别进行旋转、镜像和渲染等;将同一张图片经过不同数据变换后得到的两张图片组合为正样本(1),将不同图片变换后得到的两张图片组合为负样本(0)。
步骤1004,通过梯度下降系列优化算法和数据增强后的训练样本,更新初始的相似度模型,得到训练好的相似度模型,即目标分类模型。
预测阶段的基本流程,如图10所示,包括步骤1005至步骤1007:
步骤1005,输入图片与违规图库中的每张图片进行相似度计算;
步骤1006,确定小于第一阈值的相似度数目与相似度总数目的比值是否大于第二阈值;如果是,执行步骤1007;
步骤1007,认为匹配成功,确定输入图片为违规图片。
在端到端的深度学习匹配算法中,深度学习模型含有多个通过梯度下降学习得到的卷积核,卷积核对图片特征的表达能力极强,基本满足所有图片变换场景。但是,在预测阶段,对于一个输入图片,需要循环地与图库中的所有图片进行匹配计算,再加上神经网络模型本身的计算消耗,其资源消耗是无法接受的。
在本申请实施例中,结合hash和深度学习的特点,采用深度神经网络抽取图片特征,获得图片hash,即特征向量的一种示例;比较两张图片hash的相似度,判断是否匹配成功。
以下详细描述本申请实施例提供的影像审核方法的实现流程,如图11所示,该流程可以包括以下Step1)至Step4):
Step1)数据准备。准备200张原始图片,如图12所示,对每张原始图片进行翻转、旋转、缩放、裁剪、液化、马赛克、噪声、变色、遮挡等图片变换操作,或者它们的组合变换。对每张图片进行100次不同的变换操作,这样总共获得20000个样本。
Step2)设计模型。选用轻量级的深度神经网络MobileNetV2作为特征提取器。在对该模型进行训练之前,对MobileNetV2网络结构进行修改。MobileNetV2的原结构如下表2所示,其中,表头“Input”为该层输入的大小,“Operator”为该层的结构类型,“c”为该层的输出特征维度,“n”为该层的重复次数,“s”为深度卷积的步长(stride)。
表2

| Num | Input | Operator | c | n | s |
| 1 | 224²×3 | Conv2d | 32 | 1 | 2 |
| 2 | 112²×32 | bottleneck | 16 | 1 | 1 |
| 3 | 112²×16 | bottleneck | 24 | 2 | 2 |
| 4 | 56²×24 | bottleneck | 32 | 3 | 2 |
| 5 | 28²×32 | bottleneck | 64 | 4 | 2 |
| 6 | 14²×64 | bottleneck | 96 | 3 | 1 |
| 7 | 14²×96 | bottleneck | 160 | 3 | 2 |
| 8 | 7²×160 | bottleneck | 320 | 1 | 1 |
| 9 | 7²×320 | Conv2d 1×1 | 1280 | 1 | 1 |
| 10 | 7²×1280 | Avgpool 7×7 | - | 1 | - |
| 11 | 1×1×1280 | Conv2d 1×1 | k | - | - |
| 12 | k×1 | Active-Softmax | k | - | - |
MobileNetV2的第11层的输入大小固定为1×1×1280,采用k个1×1大小的卷积核进行卷积计算,从而输出长度为k的1维向量。最后,连接softmax激活层,从而计算得到k个类别的概率。
为了便于描述,将表2所示的第1至10层简称为“bottleneck结构”,简化后的MobileNetV2结构如图13所示。
对MobileNetV2结构进行如下修改:在conv2d层与softmax层中间,添加一层sigmoid激活层与n×1维的全连接层(Dense)。修改后的MobileNetV2结构如图7A所示。
Step3)模型训练阶段。
将Step1中得到的20000张图片作为训练样本,200张原始图片作为训练样本的标签,训练一个图片分类模型,即特定的神经网络模型。对应到图7A中,k=200,n为需要编码hash的维度(例如取为300)。训练图7A中所示的修改后的MobileNetV2分类模型。
模型损失函数为多分类的交叉熵损失(categorical_crossentropy),优化算法为Adam,学习率固定为0.001,训练得到的模型准确率>99.5%。
Step4)匹配阶段。
加载Step3得到的模型参数,为了得到图片的hash值,删除模型的最后两层,即Dense层与softmax层,修改后的模型如图7B所示。为便于描述,将此模型称为“Mobilehashnet”,即特征向量提取结构的一种示例。将基于该模型实现的图片审核方法称为Mobilehashnet算法。
此时模型的输出为一个长度为n(例如为300)的1维向量,如图14所示,由于激活函数为sigmoid,sigmoid输出的取值范围为(0,1)。然后,根据输出<0.5则取0,输出>0.5则取1的原则,对输出进行过滤,最终得到长度为300、取值为0或1的hash向量,即特征向量。
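上述“按0.5为界将sigmoid输出量化为0/1”的过滤过程可以用如下Python片段示意:

```python
def binarize_hash(sigmoid_outputs, boundary=0.5):
    """sigmoid输出的取值范围为(0,1):大于boundary取1,否则取0,
    最终得到取值为0或1的hash向量,即特征向量。"""
    return [1 if v > boundary else 0 for v in sigmoid_outputs]

hash_vec = binarize_hash([0.12, 0.93, 0.51, 0.07])
```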
需要说明的是,之所以将提取的特征向量称为hash向量,是因为即使输入图片是原始图片被变换处理后的图片,Mobilehashnet提取的特征向量仍然与原始图片的特征向量一致。
如图15所示,在获得图片1和图片2的hash向量之后,即可根据两者的hash向量计算两张图片的汉明距离,距离越小两张图片越相似。匹配时可规定一个第一阈值:当汉明距离低于第一阈值时,认为两张图片为同一张图片,匹配成功;否则,匹配失败。
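基于汉明距离与第一阈值的匹配判决可以用如下Python片段示意(仅为示意实现):

```python
def is_match(hash1, hash2, first_threshold):
    """汉明距离低于第一阈值则认为两张图片为同一张图片,匹配成功。"""
    distance = sum(a != b for a, b in zip(hash1, hash2))
    return distance < first_threshold
```

例如,两个长度为4、仅有1位不同的hash向量,在第一阈值为2时匹配成功。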
需要说明的是,这里对于第一阈值的选取,需要预先通过验证获得。其中验证集的准备过程与上述训练集相同。准备若干非训练集中的图片,进行数据增强,计算不同候选阈值下,匹配模型的正确召回率(recall)和错误召回率(wrong_recall)。
一个好的匹配模型,应在保证正确召回率的前提下,尽量降低错误召回率。在一些实施例中,可以采用网格搜索法,逐渐逼近最佳值,网格搜索结果如图16和图17所示;其中,图16示出了候选阈值为35至70时,对应的recall和wrong_recall。图17示出了候选阈值为50至55时,对应的recall和wrong_recall。
在一个示例中,在候选阈值=52处,recall=0.85,wrong_recall=0.15,是一个好的取值。这是因为,在保证recall的值大于或等于0.85的前提下,wrong_recall的值越小越好,因此可以将最小wrong_recall对应的候选阈值确定为第一阈值。
hash维度直接决定了修改后的MobileNetV2结构中2d卷积层(conv2d 1×1)的卷积核个数以及激活层的输出维度n;由于该层处于网络结构的末端,其大小直接影响模型的学习能力。hash维度过小,将导致模型欠拟合,并限制图库可容纳的图片数量;维度过大不仅增加了生成hash的耗时,还增加了计算汉明距离的耗时,所以需要选择一个合理的hash维度。
在一个示例中,hash维度n取为原始图片数量(分类种类)的1.5倍,即,n=1.5×200=300。
可以理解地,相对于依靠纯人工设计的计算因子,Mobilehashnet采用深度神经网络提取图片特征,理论上具有性能优势。为了更直观地说明其高性能特点,在不同图片变换方式下,进行Mobilehashnet算法与Phash算法、SIFT算法的匹配性能对比,实验结果如表3所示。
表3
(表3以图片形式给出对比结果:在各种图片变换方式下,Mobilehashnet算法、Phash算法与SIFT算法各自的recall与wrong_recall。)
从表3所示的对比结果中可以看出,Phash算法在翻转、旋转、缩放等图片变换中基本无法进行匹配;SIFT算法在所有图片变化种类中,recall均处于较低值。而本申请实施例中,Mobilehashnet算法在翻转、扭曲、剪切、马赛克、噪声的图片变换中,能达到100%的recall,且在其余图片变换中,recall值均较高,wrong_recall值均较低。
相比于相关技术中通过人工标注大量样本训练一个图片分类模型,在本申请实施例所提供的Mobilehashnet算法中,无需通过人工大量标注样本即可进行训练,通过图片数据增强技术自动获取大量训练样本。
本申请实施例所提供的Mobilehashnet算法,通过采用深度神经网络提取图片特征,基于这些特征生成图片hash,并进行图片匹配。相比于相关图片匹配/相似度算法,有效地提高了正确召回率,降低了错误召回率,且无需人工大量标注数据。
图片审核***对用户上传的图片进行审核,防止大量违法违规图片的传播。由于图片内容的复杂性,如图18所示,图片审核***流程包括了违规图库匹配模型、图片分类模型、人脸识别模型、文字识别模型、文本分类模型。待审图片依次经过各个模型进行审核,当所有模型结果均为“正常”时,其审核结果才能是“正常”,即是合规图片;否则,则为违规图片。
其中,图片审核***中的违规图库匹配模型,可由本申请实施例提供的Mobilehashnet算法实现,保证匹配的较高正确召回率与较低错误召回率。该算法的实现流程如图19所示:提取待审图片的hash向量;确定该hash向量与违规图库对应的违规hash库中的每一hash向量的汉明距离,即批量计算汉明距离;判断每一汉明距离是否小于第一阈值,从而获得召回结果。
在一些实施例中,违规hash库在***初始化时即可获得,匹配时仅需进行一次hash计算,即仅需对待审图片进行特征提取即可。
基于前述的实施例,本申请实施例提供一种影像文件审核装置,该装置包括的各模块、以及各模块包括的各单元,可以通过终端中的处理器来实现;当然也可通过具体的逻辑电路实现。在实施的过程中,处理器可以为中央处理器(CPU)、微处理器(MPU)、数字信号处理器(DSP)或现场可编程门阵列(FPGA)等。
图20A为本申请实施例影像文件审核装置的结构示意图,如图20A所示,所述装置200包括特征提取模块201、第一确定模块202和审核模块203,其中:
特征提取模块201,配置为利用目标分类模型对待审影像文件进行特征提取,得到对应的特征向量;其中,所述目标分类模型是通过多个样本影像文件和对应的多种影像变换文件训练得到的;
第一确定模块202,配置为确定所述待审影像文件的特征向量与审核集合中的至少一个参考特征向量之间的相似度;
审核模块203,配置为根据确定的所述相似度与第一阈值之间的关系,确定所述待审影像文件是否是违规文件。
在一些实施例中,特征提取模块201,配置为:获取所述目标分类模型的特征向量提取结构,所述特征向量提取结构包括所述目标分类模型的输入层至非线性激活层;其中,所述目标分类模型的类型为神经网络模型;利用所述特征向量提取结构,对所述待审影像文件进行特征提取,得到对应的特征向量。
在一些实施例中,如图20B所示,影像审核装置200还包括:标签获取模块204,配置为获取每一所述样本影像文件的类型标签;变换处理模块205,配置为按照多种变换规则,对每一所述样本影像文件进行变换处理,得到对应文件的影像变换文件集合;标签标注模块206,配置为将每一所述样本影像文件的类型标签,赋予给对应影像变换文件集合中的每一影像变换文件;模型训练模块207,配置为根据每一所述样本影像文件、每一所述影像变换文件和各自对应的类型标签,对特定的神经网络模型进行训练,得到所述目标分类模型。
在一些实施例中,审核模块203,配置为:确定小于所述第一阈值的相似度的数目,所述相似度用于表征两个特征向量之间的不同的特征数目;确定所述数目与相似度总数目的比值;根据所述比值与第二阈值之间的关系,确定所述待审影像文件是否是违规文件。
在一些实施例中,第一确定模块202,配置为:确定所述待审影像文件的特征向量与所述审核集合中的第i个参考特征向量之间的相似度;其中,i大于0且小于或等于所述审核集合中的参考特征向量总数目;所述相似度用于表征两个特征向量之间的不同的特征数目,所述参考特征向量对应的参考影像文件为违规文件;相应地,审核模块203,配置为在所述第i个参考特征向量对应的相似度小于所述第一阈值的情况下,确定所述待审影像文件是违规文件。
在一些实施例中,第一确定模块202,还配置为:在所述第i个参考特征向量对应的相似度大于或等于所述第一阈值时,确定所述待审影像文件的特征向量与所述审核集合中的第i+1个参考特征向量之间的相似度,以确定所述待审影像文件是否是违规文件。
在一些实施例中,如图20B所示,影像审核装置200,还包括:加载模块208,配置为加载已生成的所述审核集合;相应地,特征提取模块201,还配置为:利用所述目标分类模型,对多个参考影像文件进行特征提取,得到对应文件的特征向量;将每一所述参考影像文件的特征向量作为参考特征向量,生成所述审核集合。
在一些实施例中,加载模块208,配置为加载已确定的所述第一阈值;
相应地,所述装置还包括第二确定模块,配置为:在假设所述第一阈值分别为多个不同候选阈值的情况下,利用所述装置的特征提取模块、第一确定模块和审核模块,确定多个验证影像文件是否是违规文件,从而得到每一所述候选阈值对应的审核结果集合;根据每一审核结果集合和每一所述验证影像文件的类型标签,确定在对应候选阈值下的正确召回率和错误召回率;将满足特定条件的正确召回率和错误召回率所对应的候选阈值,确定为所述第一阈值。
以上装置实施例的描述,与上述方法实施例的描述是类似的,具有同方法实施例相似的有益效果。对于本申请装置实施例中未披露的技术细节,请参照本申请方法实施例的描述而理解。
需要说明的是,本申请实施例中,如果以软件功能模块的形式实现上述的影像审核方法,并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对相关技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得电子设备执行本申请各个实施例所述方法的全部或部分。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read Only Memory,ROM)、磁碟或者光盘等各种可以存储程序代码的介质。这样,本申请实施例不限制于任何特定的硬件和软件结合。
对应地,本申请实施例提供一种电子设备,图21为本申请实施例的电子设备的硬件实体示意图,如图21所示,所述电子设备210包括存储器211和处理器212,所述存储器211存储有可在处理器212上运行的计算机程序,所述处理器212执行所述程序时实现上述实施例中提供的影像审核方法中的步骤。
需要说明的是,存储器211配置为存储由处理器212可执行的指令和应用,还可以缓存处理器212以及电子设备210中各模块待处理或已经处理的数据(例如,图像数据、音频数据、语音通信数据和视频通信数据),可以通过闪存(FLASH)或随机访问存储器(Random Access Memory,RAM)实现。
对应地,本申请实施例提供一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现上述实施例中提供的影像审核方法中的步骤。
这里需要指出的是:以上存储介质、芯片和终端设备实施例的描述,与上述方法实施例的描述是类似的,具有同方法实施例相似的有益效果。对于本申请存储介质、芯片和终端设备实施例中未披露的技术细节,请参照本申请方法实施例的描述而理解。
应理解,说明书通篇中提到的“一个实施例”或“一实施例”或“一些实施例”意味着与实施例有关的特定特征、结构或特性包括在本申请的至少一个实施例中。因此,在整个说明书各处出现的“在一个实施例中”或“在一实施例中”或“在一些实施例中”未必一定指相同的实施例。此外,这些特定的特征、结构或特性可以任意适合的方式结合在一个或多个实施例中。应理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。
需要说明的是,在本文中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、方法、物品或者设备中还存在另外的相同要素。
在本申请所提供的几个实施例中,应该理解到,所揭露的设备和方法,可以通过其它的方式实现。以上所描述的触摸屏***的实施例仅仅是示意性的,例如,所述模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,如:多个模块或组件可以结合,或可以集成到另一个***,或一些特征可以忽略,或不执行。另外,所显示或讨论的各组成部分相互之间的耦合、或直接耦合、或通信连接可以是通过一些接口,设备或模块的间接耦合或通信连接,可以是电性的、机械的或其它形式的。
上述作为分离部件说明的模块可以是、或也可以不是物理上分开的,作为模块显示的部件可以是、或也可以不是物理模块;既可以位于一个地方,也可以分布到多个网络单元上;可以根据实际的需要选择其中的部分或全部模块来实现本实施例方案的目的。
另外,在本申请各实施例中的各功能模块可以全部集成在一个处理单元中,也可以是各模块分别单独作为一个单元,也可以两个或两个以上模块集成在一个单元中;上述集成的模块既可以采用硬件的形式实现,也可以采用硬件加软件功能单元的形式实现。
本领域普通技术人员可以理解:实现上述方法实施例的全部或部分步骤可以通过程序指令相关的硬件来完成,前述的程序可以存储于计算机可读取存储介质中,该程序在执行时,执行包括上述方法实施例的步骤;而前述的存储介质包括:移动存储设备、只读存储器(Read Only Memory,ROM)、磁碟或者光盘等各种可以存储程序代码的介质。
或者,本申请上述集成的单元如果以软件功能模块的形式实现并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对相关技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得电子设备执行本申请各个实施例所述方法的全部或部分。而前述的存储介质包括:移动存储设备、ROM、磁碟或者光盘等各种可以存储程序代码的介质。
本申请所提供的几个方法实施例中所揭露的方法,在不冲突的情况下可以任意组合,得到新的方法实施例。
本申请所提供的几个产品实施例中所揭露的特征,在不冲突的情况下可以任意组合,得到新的产品实施例。
本申请所提供的几个方法或设备实施例中所揭露的特征,在不冲突的情况下可以任意组合,得到新的方法实施例或设备实施例。
以上所述,仅为本申请的实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (18)

  1. 影像审核方法,所述方法包括:
    利用目标分类模型对待审影像文件进行特征提取,得到对应的特征向量;其中,所述目标分类模型是通过多个样本影像文件和对应的多种影像变换文件训练得到的;
    确定所述待审影像文件的特征向量与审核集合中的至少一个参考特征向量之间的相似度;
    根据确定的所述相似度与第一阈值之间的关系,确定所述待审影像文件是否是违规文件。
  2. 根据权利要求1所述的方法,其中,所述目标分类模型的类型为神经网络模型,所述利用目标分类模型对待审影像文件进行特征提取,得到对应的特征向量,包括:
    获取所述目标分类模型的特征向量提取结构,所述特征向量提取结构包括所述目标分类模型的输入层至非线性激活层;
    利用所述特征向量提取结构,对所述待审影像文件进行特征提取,得到对应的特征向量。
  3. 根据权利要求1或2所述的方法,其中,所述目标分类模型的训练过程,包括:
    获取每一所述样本影像文件的类型标签;
    按照多种变换规则,对每一所述样本影像文件进行变换处理,得到对应文件的影像变换文件集合;
    将每一所述样本影像文件的类型标签,赋予给对应影像变换文件集合中的每一影像变换文件;
    根据每一所述样本影像文件、每一所述影像变换文件和各自对应的类型标签,对特定的神经网络模型进行训练,得到所述目标分类模型。
  4. 根据权利要求1所述的方法,其中,所述根据确定的所述相似度与第一阈值之间的关系,确定所述待审影像文件是否是违规文件,包括:
    确定小于所述第一阈值的相似度的数目,所述相似度用于表征两个特征向量之间的不同的特征数目;
    确定所述数目与相似度总数目的比值;
    根据所述比值与第二阈值之间的关系,确定所述待审影像文件是否是违规文件。
  5. 根据权利要求1所述的方法,其中,所述相似度用于表征两个特征向量之间的不同的特征数目,所述参考特征向量对应的参考影像文件为违规文件;
    所述确定所述待审影像文件的特征向量与审核集合中的至少一个参考特征向量之间的相似度,包括:
    确定所述待审影像文件的特征向量与所述审核集合中的第i个参考特征向量之间的相似度;其中,i大于0且小于或等于所述审核集合中的参考特征向量总数目;
    相应地,所述根据确定的所述相似度与第一阈值之间的关系,确定所述待审影像文件是否是违规文件,包括:
    在所述第i个参考特征向量对应的相似度小于所述第一阈值的情况下,确定所述待审影像文件是违规文件。
  6. 根据权利要求5所述的方法,其中,还包括:
    在所述第i个参考特征向量对应的相似度大于或等于所述第一阈值时,确定所述待审影像文件的特征向量与所述审核集合中的第i+1个参考特征向量之间的相似度,以确定所述待审影像文件是否是违规文件。
  7. 根据权利要求1至6任一项所述的方法,其中,还包括:加载已生成的所述审核集合;
    所述审核集合的生成方法,包括:
    利用所述目标分类模型,对多个参考影像文件进行特征提取,得到对应文件的特征向量;
    将每一所述参考影像文件的特征向量作为参考特征向量,生成所述审核集合。
  8. 根据权利要求1至6任一项所述的方法,其中,还包括:加载已确定的所述第一阈值;其中,所述第一阈值的确定方法包括:
    在假设所述第一阈值分别为多个不同候选阈值的情况下,根据所述影像审核方法,确定多个验证影像文件是否是违规文件,从而得到每一所述候选阈值对应的审核结果集合;
    根据每一审核结果集合和每一所述验证影像文件的类型标签,确定在对应候选阈值下的正确召回率和错误召回率;
    将满足特定条件的正确召回率和错误召回率所对应的候选阈值,确定为所述第一阈值。
  9. 影像审核装置,包括:
    特征提取模块,配置为利用目标分类模型对待审影像文件进行特征提取,得到对应的特征向量;其中,所述目标分类模型是通过多个样本影像文件和对应的多种影像变换文件训练得到的;
    第一确定模块,配置为确定所述待审影像文件的特征向量与审核集合中的至少一个参考特征向量之间的相似度;
    审核模块,配置为根据确定的所述相似度与第一阈值之间的关系,确定所述待审影像文件是否是违规文件。
  10. 根据权利要求9所述的装置,其中,所述特征提取模块,配置为:
    获取所述目标分类模型的特征向量提取结构,所述特征向量提取结构包括所述目标分类模型的输入层至非线性激活层;其中,所述目标分类模型的类型为神经网络模型;
    利用所述特征向量提取结构,对所述待审影像文件进行特征提取,得到对应的特征向量。
  11. 根据权利要求9或10所述的装置,其中,还包括:
    标签获取模块,配置为获取每一所述样本影像文件的类型标签;
    变换处理模块,配置为按照多种变换规则,对每一所述样本影像文件进行变换处理,得到对应文件的影像变换文件集合;
    标签标注模块,配置为将每一所述样本影像文件的类型标签,赋予给对应影像变换文件集合中的每一影像变换文件;
    模型训练模块,配置为根据每一所述样本影像文件、每一所述影像变换文件和各自对应的类型标签,对特定的神经网络模型进行训练,得到所述目标分类模型。
  12. 根据权利要求9所述的装置,其中,所述审核模块,配置为:
    确定小于所述第一阈值的相似度的数目,所述相似度用于表征两个特征向量之间的不同的特征数目;
    确定所述数目与相似度总数目的比值;
    根据所述比值与第二阈值之间的关系,确定所述待审影像文件是否是违规文件。
  13. 根据权利要求9所述的装置,其中,
    所述第一确定模块,配置为:确定所述待审影像文件的特征向量与所述审核集合中的第i个参考特征向量之间的相似度;
    其中,i大于0且小于或等于所述审核集合中的参考特征向量总数目;所述相似度用于表征两个特征向量之间的不同的特征数目,所述参考特征向量对应的参考影像文件为违规文件;
    相应地,所述审核模块,配置为在所述第i个参考特征向量对应的相似度小于所述第一阈值的情况下,确定所述待审影像文件是违规文件。
  14. 根据权利要求13所述的装置,其中,所述第一确定模块,还配置为:
    在所述第i个参考特征向量对应的相似度大于或等于所述第一阈值时,确定所述待审影像文件的特征向量与所述审核集合中的第i+1个参考特征向量之间的相似度,以确定所述待审影像文件是否是违规文件。
  15. 根据权利要求9至14任一所述的装置,其中,还包括:
    加载模块,配置为加载已生成的所述审核集合;
    相应地,所述特征提取模块,还配置为:利用所述目标分类模型,对多个参考影像文件进行特征提取,得到对应文件的特征向量;将每一所述参考影像文件的特征向量作为参考特征向量,生成所述审核集合。
  16. 根据权利要求9至14任一所述的装置,其中,还包括:
    加载模块,配置为加载已确定的所述第一阈值;
    相应地,所述装置还包括第二确定模块,配置为:在假设所述第一阈值分别为多个不同候选阈值的情况下,利用所述装置的特征提取模块、第一确定模块和审核模块,确定多个验证影像文件是否是违规文件,从而得到每一所述候选阈值对应的审核结果集合;根据每一审核结果集合和每一所述验证影像文件的类型标签,确定在对应候选阈值下的正确召回率和错误召回率;将满足特定条件的正确召回率和错误召回率所对应的候选阈值,确定为所述第一阈值。
  17. 电子设备,包括存储器和处理器,所述存储器存储有可在处理器上运行的计算机程序,所述处理器执行所述程序时实现权利要求1至8任一项所述影像审核方法中的步骤。
  18. 计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现权利要求1至8任一项所述影像审核方法中的步骤。
PCT/CN2020/092923 2020-05-28 2020-05-28 影像审核方法及装置、设备、存储介质 WO2021237570A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080100202.7A CN115443490A (zh) 2020-05-28 2020-05-28 影像审核方法及装置、设备、存储介质
PCT/CN2020/092923 WO2021237570A1 (zh) 2020-05-28 2020-05-28 影像审核方法及装置、设备、存储介质

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/092923 WO2021237570A1 (zh) 2020-05-28 2020-05-28 影像审核方法及装置、设备、存储介质

Publications (1)

Publication Number Publication Date
WO2021237570A1 true WO2021237570A1 (zh) 2021-12-02

Family

ID=78745395

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/092923 WO2021237570A1 (zh) 2020-05-28 2020-05-28 影像审核方法及装置、设备、存储介质

Country Status (2)

Country Link
CN (1) CN115443490A (zh)
WO (1) WO2021237570A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114443880A (zh) * 2022-01-24 2022-05-06 南昌市安厦施工图设计审查有限公司 一种装配式建筑的大样图审图方法及审图***
CN114612839A (zh) * 2022-03-18 2022-06-10 壹加艺术(武汉)文化有限公司 一种短视频分析处理方法、***及计算机存储介质
CN115297360A (zh) * 2022-09-14 2022-11-04 百鸣(北京)信息技术有限公司 一种多媒体软件视频上传智能审核***
CN115994772A (zh) * 2023-02-22 2023-04-21 中信联合云科技有限责任公司 图书资料处理方法及***、图书快速铺货方法、电子设备
CN116452836A (zh) * 2023-05-10 2023-07-18 武汉精阅数字传媒科技有限公司 一种基于图像数据处理的新媒体素材内容采集***

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116866666B (zh) * 2023-09-05 2023-12-08 天津市北海通信技术有限公司 轨道交通环境下的视频流画面处理方法及装置
CN117292395B (zh) * 2023-09-27 2024-05-24 自然资源部地图技术审查中心 审图模型的训练方法和训练装置及审图的方法和装置
CN118193772A (zh) * 2024-05-13 2024-06-14 飞狐信息技术(天津)有限公司 一种图像分析处理方法、装置、存储介质及电子设备

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101359372A (zh) * 2008-09-26 2009-02-04 腾讯科技(深圳)有限公司 分类器的训练方法及装置、识别敏感图片的方法及装置
CN108960782A (zh) * 2018-07-10 2018-12-07 北京木瓜移动科技股份有限公司 内容审核方法以及装置
CN109561322A (zh) * 2018-12-27 2019-04-02 广州市百果园信息技术有限公司 一种视频审核的方法、装置、设备和存储介质
US10402699B1 (en) * 2015-12-16 2019-09-03 Hrl Laboratories, Llc Automated classification of images using deep learning—back end
CN110377775A (zh) * 2019-07-26 2019-10-25 Oppo广东移动通信有限公司 一种图片审核方法及装置、存储介质
CN110738697A (zh) * 2019-10-10 2020-01-31 福州大学 基于深度学习的单目深度估计方法
CN111079816A (zh) * 2019-12-11 2020-04-28 北京金山云网络技术有限公司 图像的审核方法、装置和服务器

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114443880A (zh) * 2022-01-24 2022-05-06 南昌市安厦施工图设计审查有限公司 一种装配式建筑的大样图审图方法及审图***
CN114612839A (zh) * 2022-03-18 2022-06-10 壹加艺术(武汉)文化有限公司 一种短视频分析处理方法、***及计算机存储介质
CN114612839B (zh) * 2022-03-18 2023-10-31 壹加艺术(武汉)文化有限公司 一种短视频分析处理方法、***及计算机存储介质
CN115297360A (zh) * 2022-09-14 2022-11-04 百鸣(北京)信息技术有限公司 一种多媒体软件视频上传智能审核***
CN115994772A (zh) * 2023-02-22 2023-04-21 中信联合云科技有限责任公司 图书资料处理方法及***、图书快速铺货方法、电子设备
CN115994772B (zh) * 2023-02-22 2024-03-08 中信联合云科技有限责任公司 图书资料处理方法及***、图书快速铺货方法、电子设备
CN116452836A (zh) * 2023-05-10 2023-07-18 武汉精阅数字传媒科技有限公司 一种基于图像数据处理的新媒体素材内容采集***
CN116452836B (zh) * 2023-05-10 2023-11-28 杭州元媒科技有限公司 一种基于图像数据处理的新媒体素材内容采集***

Also Published As

Publication number Publication date
CN115443490A (zh) 2022-12-06

Similar Documents

Publication Publication Date Title
WO2021237570A1 (zh) 影像审核方法及装置、设备、存储介质
WO2020119350A1 (zh) 视频分类方法、装置、计算机设备和存储介质
WO2020199468A1 (zh) 图像分类方法、装置及计算机可读存储介质
CN107463605B (zh) 低质新闻资源的识别方法及装置、计算机设备及可读介质
US10831814B2 (en) System and method for linking multimedia data elements to web pages
Hua et al. Clickage: Towards bridging semantic and intent gaps via mining click logs of search engines
Thyagharajan et al. A review on near-duplicate detection of images using computer vision techniques
US9576221B2 (en) Systems, methods, and devices for image matching and object recognition in images using template image classifiers
RU2668717C1 (ru) Генерация разметки изображений документов для обучающей выборки
US20230376527A1 (en) Generating congruous metadata for multimedia
Murray et al. A deep architecture for unified aesthetic prediction
CN110427895A (zh) 一种基于计算机视觉的视频内容相似度判别方法及***
US10380267B2 (en) System and method for tagging multimedia content elements
KR101647691B1 (ko) 하이브리드 기반의 영상 클러스터링 방법 및 이를 운용하는 서버
CN111651636A (zh) 视频相似片段搜索方法及装置
WO2021179631A1 (zh) 卷积神经网络模型压缩方法、装置、设备及存储介质
CN110163061B (zh) 用于提取视频指纹的方法、装置、设备和计算机可读介质
WO2021012493A1 (zh) 短视频关键词提取方法、装置及存储介质
Phadikar et al. Content-based image retrieval in DCT compressed domain with MPEG-7 edge descriptor and genetic algorithm
CN113221918B (zh) 目标检测方法、目标检测模型的训练方法及装置
US10504002B2 (en) Systems and methods for clustering of near-duplicate images in very large image collections
US20230222762A1 (en) Adversarially robust visual fingerprinting and image provenance models
US11537636B2 (en) System and method for using multimedia content as search queries
Kapadia et al. Improved CBIR system using Multilayer CNN
Thirani et al. Enhancing performance evaluation for video plagiarism detection using local feature through SVM and KNN algorithm

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20937851

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 25.04.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20937851

Country of ref document: EP

Kind code of ref document: A1