WO2020239015A1 - Image recognition method and apparatus, image classification method and apparatus, electronic device, and storage medium - Google Patents

Image recognition method and apparatus, image classification method and apparatus, electronic device, and storage medium

Info

Publication number
WO2020239015A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
picture
category
classified
model
Prior art date
Application number
PCT/CN2020/092898
Other languages
French (fr)
Chinese (zh)
Inventor
苏驰
李凯
刘弘也
Original Assignee
北京金山云网络技术有限公司
北京金山云科技有限公司
Application filed by 北京金山云网络技术有限公司, 北京金山云科技有限公司

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Definitions

  • This application relates to the field of image processing technology, in particular to an image recognition and classification method, device, electronic equipment and storage medium.
  • In some scenes a picture-in-picture image appears. A picture-in-picture image means that a first image is contained in a second image, and the area of the first image is smaller than the area of the second image.
  • For example, display A displays an image B.
  • The content of image B is two girls having a conversation in a room.
  • A display C placed in the room is also shown in image B, and display C itself displays an image D.
  • In this scene, image B contains image D. Therefore, image D is called a picture-in-picture image.
  • the purpose of the embodiments of the present application is to provide an image recognition and classification method, device, electronic device, and storage medium, so as to realize accurate recognition of the content of a live broadcast image containing vulgar pornography.
  • the specific technical solutions are as follows:
  • an embodiment of the present application provides a method for recognizing a picture-in-picture image, and the method includes:
  • the picture-in-picture image recognition model is obtained through machine learning training using the first data set; the first data set includes multiple sets of data, each set of data includes a first image and a corresponding first label, and the first label is used to indicate whether the first image is a picture-in-picture image.
  • an embodiment of the present application provides a method for training a picture-in-picture image recognition model, the method including:
  • the initial neural network model is trained based on the first data set to obtain the picture-in-picture image recognition model.
  • the first data set includes multiple sets of data, and each set of data includes a first image and a corresponding first label.
  • the first label is used to indicate whether the first image is a picture-in-picture image.
  • an embodiment of the present application provides a method for classifying the pornographic level of an image, and the method includes:
  • the picture-in-picture image recognition model is obtained through machine learning training using the first data set; the first data set includes multiple sets of data, each set of data includes a first image and a corresponding first label, where the first label is used to indicate whether the first image is a picture-in-picture image;
  • if the image to be classified is not a picture-in-picture image, analyze the image to be classified based on the first classification model to determine the pornographic level category of the image to be classified.
  • the first classification model is obtained through machine learning training using the second data set.
  • the second data set includes multiple sets of data, and each set of data includes a second image and a corresponding second label, and the second label is used to indicate the pornographic level category of the second image;
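  • if the image to be classified is a picture-in-picture image, analyze the image to be classified based on the second classification model to determine the pornographic level category of the image to be classified.
  • the second classification model is obtained by training a model through machine learning using the second data set and then changing the global average pooling layer and convolution kernel of the trained model.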
  • an embodiment of the present application provides a device for recognizing a picture-in-picture image, the device including:
  • the acquisition module is used to acquire the image to be recognized
  • the picture-in-picture image recognition module is used to analyze the image to be recognized based on the picture-in-picture image recognition model to determine whether the image to be recognized is a picture-in-picture image.
  • the picture-in-picture image recognition model is obtained through machine learning training using the first data set. The first data set includes multiple sets of data, and each set of data includes a first image and a corresponding first label. The first label is used to indicate whether the first image is a picture-in-picture image.
  • an embodiment of the present application provides a training device for a picture-in-picture image recognition model, which includes:
  • the acquisition module is used to acquire the pre-created initial neural network model
  • the training module is used to train the initial neural network model based on the first data set to obtain the picture-in-picture image recognition model.
  • the first data set includes multiple sets of data, and each set of data includes a first image and a corresponding first label; the first label is used to indicate whether the first image is a picture-in-picture image.
  • an image pornography grade classification device which includes:
  • the picture-in-picture image recognition module is used to analyze the image to be classified based on the picture-in-picture image recognition model to determine whether the image to be classified is a picture-in-picture image.
  • the picture-in-picture image recognition model is obtained through machine learning training using the first data set; the first data set includes multiple sets of data, each set of data includes a first image and a corresponding first label, and the first label is used to indicate whether the first image is a picture-in-picture image;
  • the first category determination module is used to, when the image to be classified is not a picture-in-picture image, analyze the image to be classified based on the first classification model to determine the pornographic level category of the image to be classified, wherein the first classification model is obtained through machine learning training using the second data set, the second data set includes multiple sets of data, each set of data includes a second image and a corresponding second label, and the second label is used to indicate the pornographic level category of the second image;
  • the second category determination module is used to, when the image to be classified is a picture-in-picture image, analyze the image to be classified based on the second classification model to determine the pornographic level category of the image to be classified, wherein the second classification model is obtained by training a model through machine learning using the second data set and then changing the global average pooling layer and convolution kernel of the trained model.
  • an embodiment of the present application provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete mutual communication through the communication bus;
  • Memory used to store computer programs
  • the processor is configured to implement the method provided in the first aspect, the second aspect, or the third aspect of the embodiments of the present application when executing the program stored in the memory.
  • the embodiments of the present application provide a computer-readable storage medium.
  • when the instructions in the storage medium are executed by the processor of the electronic device, the electronic device can execute the method provided by the first aspect, the second aspect, or the third aspect of the embodiments of the present application.
  • an embodiment of the present application provides an application program that, when run, executes the method provided in the first, second, or third aspect of the embodiments of the present application.
  • the image to be classified is obtained and analyzed based on the picture-in-picture image recognition model to determine whether it is a picture-in-picture image. If the image to be classified is not a picture-in-picture image, it is analyzed based on the first classification model to determine its pornographic level category. If the image to be classified is a picture-in-picture image, it is analyzed based on the second classification model to determine its pornographic level category.
  • the picture-in-picture image recognition model is obtained through machine learning training using a first data set.
  • the first data set includes multiple sets of data.
  • Each set of data includes a first image and a corresponding first label.
  • the first label is used to indicate whether the first image is a picture-in-picture image;
  • the first classification model is obtained through machine learning training using a second data set.
  • the second data set includes multiple sets of data, and each set of data includes a second image and a corresponding second label; the second label is used to indicate the pornographic level category of the second image;
  • the second classification model is obtained by training a model through machine learning using the second data set and then changing the global average pooling layer and convolution kernel of the trained model.
  • The second classification model, which is used for picture-in-picture images, is therefore different from the first classification model. As a result, the pornographic level category of an image to be classified that is a picture-in-picture image can be identified accurately, which realizes accurate identification of picture-in-picture images containing vulgar or pornographic content.
  • FIG. 1 is a flowchart of a method for recognizing a picture-in-picture image provided by an embodiment of the application
  • FIG. 2 is a flowchart of a training method for a picture-in-picture image recognition model provided by an embodiment of the application
  • FIG. 3 is a flowchart of a method for classifying the pornographic level of an image provided by an embodiment of the application
  • FIG. 4 is a specific flowchart of step S304 in the embodiment shown in FIG. 3;
  • FIG. 5 is a specific flowchart of step S404 in the embodiment shown in FIG. 4;
  • FIG. 6 is a flowchart of the training method of the second classification model in the method for classifying image pornography levels provided by an embodiment of the application;
  • FIG. 7 is a schematic structural diagram of an apparatus for recognizing picture-in-picture images provided by an embodiment of the application.
  • FIG. 8 is a schematic structural diagram of a training device for a picture-in-picture image recognition model provided by an embodiment of the application.
  • FIG. 9 is a schematic structural diagram of an image pornography grade classification device provided by an embodiment of the application.
  • FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
  • embodiments of the present application provide an image recognition and classification method, device, electronic equipment, and computer-readable storage medium.
  • an embodiment of the present application provides a method for recognizing a picture-in-picture image.
  • the method may include the following steps.
  • S101 Acquire an image to be recognized.
  • S102 Analyze the image to be recognized based on the picture-in-picture image recognition model, and determine whether the image to be recognized is a picture-in-picture image.
  • the picture-in-picture image recognition model is obtained through machine learning training using a first data set.
  • the first data set includes multiple sets of data.
  • Each set of data includes a first image and a corresponding first label.
  • the first label is used to indicate whether the first image is a picture-in-picture image.
  • the image to be recognized is analyzed based on the picture-in-picture image recognition model to determine whether the image to be recognized is a picture-in-picture image.
  • the picture-in-picture image recognition model is obtained through machine learning training using a first data set.
  • the first data set includes multiple sets of data. Each set of data includes a first image and a corresponding first label. The first label is used to indicate whether the first image is a picture-in-picture image.
  • the picture-in-picture image recognition model is a neural network model, which can realize the end-to-end output of the recognition result of whether the image to be recognized is a picture-in-picture image, and realize the automatic recognition of the picture-in-picture image.
  • the method for recognizing picture-in-picture images provided by the embodiments of the present application can be applied to electronic devices with image processing functions, for example, servers of live broadcast platforms, image processing devices, etc., which are not specifically limited herein.
  • the image to be recognized is the image that requires picture-in-picture recognition.
  • the electronic device can use its own image acquisition device to obtain the image to be recognized, or it can obtain the image to be recognized from other electronic devices, which is reasonable.
  • the image to be recognized may be a live image of a live broadcast platform, or an image that requires picture-in-picture recognition in other scenes, and is not specifically limited here.
  • the electronic device can analyze the image to be recognized based on the picture-in-picture image recognition model to obtain the recognition result of whether the image to be recognized is a picture-in-picture image.
  • the picture-in-picture image recognition model is obtained through machine learning training using the first data set.
  • the first data set includes multiple sets of data. Each set of data includes a first image and a corresponding first label.
  • the first label can be obtained by manual labeling and is used to indicate whether the first image is a picture-in-picture image.
  • the training process of the picture-in-picture image recognition model is detailed in the embodiment shown in FIG. 2.
  • the picture-in-picture image recognition model can be a neural network model such as a convolutional neural network, which can specifically include a feature extraction part, a global average pooling layer, and an output layer.
  • the feature extraction part is composed of a series of convolution, batch normalization and activation functions to extract image features;
  • the global average pooling layer performs a global average pooling operation on the image features extracted by the feature extraction part to obtain the global features of the input image;
  • the output layer is a fully connected layer, which performs fully connected processing on global features, and the resulting category vector is used to determine whether the input image to be recognized is a picture-in-picture image.
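  • The last two parts of this structure can be illustrated with a minimal sketch in plain Python. It assumes the feature extraction part has already produced a small feature map; the function names, the 2-channel feature map, and the weight values are hypothetical and are not taken from the application.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def global_average_pool(features: FeatureMap) -> List[float]:
    """Average each channel over all spatial positions, giving one value per channel."""
    pooled = []
    for channel in features:
        total = sum(sum(row) for row in channel)
        count = sum(len(row) for row in channel)
        pooled.append(total / count)
    return pooled


def fully_connected(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Dense layer: each row of `weights` produces one component of the category vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


# Hypothetical 2-channel, 2x2 feature map standing in for the feature extraction output.
features = [
    [[1.0, 3.0], [5.0, 7.0]],  # channel 0, mean 4.0
    [[0.0, 2.0], [2.0, 4.0]],  # channel 1, mean 2.0
]
weights = [[0.5, 0.25], [-0.5, 1.0]]  # illustrative values, not trained weights
bias = [0.0, 1.0]

global_features = global_average_pool(features)                   # one value per channel
category_vector = fully_connected(global_features, weights, bias)  # two raw scores
```

  • Because global average pooling collapses every channel to a single number, the size of the category vector does not depend on the spatial size of the input image.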
  • the output result of the picture-in-picture image recognition model may be a probability vector, that is, the probability that the image to be recognized is a picture-in-picture image and the probability that the image to be recognized is not a picture-in-picture image.
  • the output result of the picture-in-picture image recognition model may be a label, which identifies that the image to be recognized is a picture-in-picture image or not a picture-in-picture image. For example, label 1 indicates that the image to be recognized is a picture-in-picture image, and label 2 indicates that the image to be recognized is not a picture-in-picture image.
  • S102 may be specifically implemented through the following steps:
  • The first step is to input the image to be recognized into the picture-in-picture image recognition model to obtain a first output result, where the first output result is a two-dimensional vector that includes a first component and a second component. The first component represents the probability that the image to be recognized is a picture-in-picture image, and the second component represents the probability that the image to be recognized is not a picture-in-picture image.
  • The second step is to determine that the image to be recognized is a picture-in-picture image if the first component is greater than the second component.
  • the output of the picture-in-picture image recognition model is a two-dimensional vector, which is called the first output result.
  • the first output result includes two components. One component represents the probability that the input image to be recognized is a picture-in-picture image, and the other component represents the probability that the input image to be recognized is not a picture-in-picture image. Based on the size of the two components in the first output result, it can be determined whether the image to be recognized is a picture-in-picture image.
  • the picture-in-picture image recognition model can specifically include two parts.
  • the first part outputs the first vector.
  • the size of the first vector is 2, which means it contains two components.
  • the two components respectively represent the unnormalized probability that the image to be recognized is a picture-in-picture image and the unnormalized probability that the image to be recognized is not a picture-in-picture image.
  • the second part is a preset normalization function.
  • the normalization function can be a softmax function. Inputting the obtained first vector into the normalization function yields the first output result. The normalization function is shown in equation (1):
  • $$p_i = \frac{e^{X_i}}{e^{X_1} + e^{X_2}}, \quad i = 1, 2 \tag{1}$$
  • Here $X$ is the first vector, whose two components $X_1$ and $X_2$ respectively represent the unnormalized probability that the image to be recognized is a picture-in-picture image and the unnormalized probability that it is not; $p$ is the second vector, whose two components $p_1$ and $p_2$ respectively represent the probability that the image to be recognized is a picture-in-picture image and the probability that the image to be recognized is not a picture-in-picture image.
  • the sizes of $p_1$ and $p_2$ can then be compared. If $p_1 > p_2$, that is, the probability that the image to be recognized is a picture-in-picture image is greater than the probability that it is not, it can be determined that the image to be recognized is a picture-in-picture image. Similarly, if $p_1 \le p_2$, that is, the probability that the image to be recognized is a picture-in-picture image is not greater than the probability that it is not, it can be determined that the image to be recognized is not a picture-in-picture image.
  • For example, if the probability that the image to be recognized is a picture-in-picture image is 0.7 and the probability that it is not is 0.3, the electronic device can determine that 0.7 is greater than 0.3 and therefore that the image to be recognized is a picture-in-picture image.
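  • The normalization and comparison described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the input scores are chosen so that the normalized probabilities come out close to the 0.7 and 0.3 of the example.

```python
import math
from typing import List, Sequence


def softmax(x: Sequence[float]) -> List[float]:
    """Normalize raw scores into probabilities that sum to 1."""
    m = max(x)  # subtracting the maximum improves numerical stability without changing the result
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def is_picture_in_picture(first_vector: Sequence[float]) -> bool:
    """Return True only when the first probability is strictly greater than the second."""
    p1, p2 = softmax(first_vector)
    return p1 > p2


# Hypothetical raw scores whose softmax is approximately (0.7, 0.3).
first_vector = [math.log(0.7), math.log(0.3)]
p = softmax(first_vector)
decision = is_picture_in_picture(first_vector)
```

  • When the two probabilities are equal, the strict comparison treats the image as not being a picture-in-picture image, which matches the "not greater than" case in the text.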
  • an embodiment of the present application provides a method for training a picture-in-picture image recognition model.
  • the method may include the following steps.
  • the first data set includes multiple sets of data, and each set of data includes a first image and a corresponding first label, and the first label is used to indicate whether the first image is a picture-in-picture image.
  • S202 Train the initial neural network model based on the first data set to obtain a picture-in-picture image recognition model.
  • the picture-in-picture image recognition model obtained through training is a neural network model, which can realize end-to-end output of the recognition result of whether the image to be recognized is a picture-in-picture image.
  • the training method of the picture-in-picture image recognition model provided by the embodiments of the present application can be applied to electronic equipment with image processing functions, for example a server of a live broadcast platform or image processing equipment, and can also be applied to a training server that provides model training functions, which is not specifically limited here.
  • the initial neural network model consists of three parts: the feature extraction part, the global average pooling layer and the output layer.
  • the feature extraction part is composed of a series of convolution, batch normalization and activation functions to extract image features;
  • the global average pooling layer performs a global average pooling operation on the image features extracted by the feature extraction part to obtain the global features of the input image;
  • the output layer is a fully connected layer, which performs fully connected processing on global features, and the resulting category vector is used to determine whether the input image is a picture-in-picture.
  • S202 can be specifically implemented by using the following steps:
  • The first step is to obtain a first image from the first data set and input the first image to the initial neural network model to obtain a second output result, where the second output result is a two-dimensional vector that includes a third component and a fourth component.
  • the third component represents the probability that the first image is a picture-in-picture image
  • the fourth component represents the probability that the first image is not a picture-in-picture image.
  • The second step is to determine the loss amount according to the second output result and the first label corresponding to the first image, where the loss amount represents the difference between the second output result and the first label.
  • The third step is to update the weight parameters in the initial neural network model according to the loss amount.
  • the first data set includes multiple sets of data, and each set of data includes a first image and a corresponding first label.
  • the first label is manually calibrated and indicates whether the first image, which serves as a sample image, is a picture-in-picture image.
  • the data in the data set can be divided into training set and test set according to the ratio of K:1, and the training set is used to train the neural network model.
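  • A K:1 split of this kind can be sketched as below. The helper name, the shuffling with a fixed seed, and the sample file names are assumptions made for illustration; the application does not specify how the samples are assigned to the two sets.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_k_to_1(samples: Sequence[T], k: int, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the samples and split them into (training set, test set) at a ratio of k:1."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n_test = len(shuffled) // (k + 1)      # one part in every (k + 1) goes to the test set
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data set of (image path, label) pairs; label 1 marks a picture-in-picture image.
dataset = [(f"image_{i}.jpg", i % 2) for i in range(10)]
train_set, test_set = split_k_to_1(dataset, k=4)
```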
  • Obtain a first image from the first data set, input it to the initial neural network model, and obtain the second output result. Then, according to the second output result and the first label corresponding to the first image, a preset loss function $L = -\log(p_y)$ is used to calculate the loss amount $L$, where $y$ is the first label.
  • the loss amount $L$ decreases monotonically as $p_y$ increases. That is to say, adjusting the network parameters of the neural network model so that the value of the loss function $L$ becomes smaller and smaller makes the value of $p_y$ larger and larger, approaching 1. The probability that the model's output assigns to the labeled category therefore approaches 1, which makes the recognition result more and more accurate.
  • the back propagation algorithm can be used to calculate the partial derivative of the loss amount $L$ with respect to the network parameters, $\frac{\partial L}{\partial W}$, where $W$ is a network parameter. The stochastic gradient descent algorithm is then used to update the network parameters, that is, the new network parameters are calculated according to the following formula:
  • $$W^{*} = W - \alpha \frac{\partial L}{\partial W}$$
  • where $W^{*}$ is the new network parameter and $\alpha$ is a preset adjustment parameter. Its specific value can be set according to factors such as training requirements and the accuracy of the target neural network model. For example, it can be 0.001, 0.0015, or 0.002, which is not specifically limited here.
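  • One training step of this kind can be sketched for a single fully connected output layer. This is a simplified illustration, not the application's training procedure: only the output layer is updated, the gradient of $-\log(p_y)$ for a softmax output is written out by hand instead of being obtained by a back propagation library, the label is a 0-based index (0 meaning picture-in-picture), and all names and values are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(x: List[float]) -> List[float]:
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_grad(W: Matrix, g: List[float], y: int) -> Tuple[float, Matrix]:
    """Return L = -log(p_y) and dL/dW for logits X = W g followed by softmax."""
    logits = [sum(w * v for w, v in zip(row, g)) for row in W]
    p = softmax(logits)
    loss = -math.log(p[y])
    # For softmax with this loss, dL/dX_i = p_i - 1[i == y], so dL/dW_ij = (p_i - 1[i == y]) * g_j.
    grad = [[(p[i] - (1.0 if i == y else 0.0)) * gj for gj in g] for i in range(len(W))]
    return loss, grad


def sgd_step(W: Matrix, grad: Matrix, alpha: float) -> Matrix:
    """Apply W* = W - alpha * dL/dW element by element."""
    return [[w - alpha * d for w, d in zip(w_row, g_row)] for w_row, g_row in zip(W, grad)]


W = [[0.0, 0.0], [0.0, 0.0]]  # hypothetical initial output-layer weights
g = [1.0, 2.0]                # hypothetical global features of one first image
y = 0                         # first label as a 0-based index
alpha = 0.001                 # one of the example values given in the text

loss_before, grad = loss_and_grad(W, g, y)
W_new = sgd_step(W, grad, alpha)
loss_after, _ = loss_and_grad(W_new, g, y)
```

  • With all-zero initial weights both probabilities start at 0.5, so the initial loss is $\log 2$, and a single update already lowers the loss slightly.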
  • an embodiment of the present application provides a method for classifying the pornographic level of an image, and the method may include the following steps.
  • S302 Analyze the image to be classified based on the picture-in-picture image recognition model to determine whether the image to be classified is a picture-in-picture image, if it is, execute S304, otherwise execute S303.
  • the picture-in-picture image recognition model is obtained through machine learning training using a first data set.
  • the first data set includes multiple sets of data.
  • Each set of data includes a first image and a corresponding first label.
  • the first label is used to indicate whether the first image is a picture-in-picture image.
  • S303 Analyze the image to be classified based on the first classification model, and determine the pornographic level category of the image to be classified.
  • the first classification model is obtained through machine learning training using a second data set.
  • the second data set includes multiple sets of data.
  • Each set of data includes a second image and a corresponding second label.
  • the second label is used to represent the pornographic level category of the second image.
  • S304 Analyze the image to be classified based on the second classification model, and determine the pornographic level category of the image to be classified.
  • the second classification model is obtained by using the second data set through machine learning training, and changing the global average pooling layer and convolution kernel from the trained model.
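  • The branching between S302, S303, and S304 can be sketched as a small routing function. The three models are passed in as plain callables, and the stub models below return fixed vectors purely for illustration; none of these names or values come from the application.

```python
from typing import Callable, List, Sequence

Model = Callable[[object], Sequence[float]]


def classify_pornographic_level(
    image: object, pip_model: Model, first_model: Model, second_model: Model
) -> List[float]:
    """Route the image to the first or second classification model based on the S302 result."""
    p_pip, p_not_pip = pip_model(image)   # S302: is the image a picture-in-picture image?
    if p_pip > p_not_pip:
        return list(second_model(image))  # S304: picture-in-picture image
    return list(first_model(image))       # S303: ordinary image


# Stub models standing in for the trained networks (hypothetical outputs).
def pip_stub(image: object) -> Sequence[float]:
    return (0.7, 0.3) if image == "pip_frame" else (0.2, 0.8)


def first_stub(image: object) -> Sequence[float]:
    return (0.8, 0.1, 0.1)


def second_stub(image: object) -> Sequence[float]:
    return (0.1, 0.2, 0.7)


result_plain = classify_pornographic_level("plain_frame", pip_stub, first_stub, second_stub)
result_pip = classify_pornographic_level("pip_frame", pip_stub, first_stub, second_stub)
```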
  • the picture-in-picture image recognition model is obtained through machine learning training using a first data set.
  • the first data set includes multiple sets of data.
  • Each set of data includes a first image and a corresponding first label.
  • the first label is used to indicate whether the first image is a picture-in-picture image;
  • the first classification model is obtained through machine learning training using a second data set.
  • the second data set includes multiple sets of data, and each set of data includes a second image and a corresponding second label; the second label is used to indicate the pornographic level category of the second image;
  • the second classification model is obtained by training a model through machine learning using the second data set and then changing the global average pooling layer and convolution kernel of the trained model.
  • The second classification model, which is used for picture-in-picture images, is therefore different from the first classification model. As a result, the pornographic level category of an image to be classified that is a picture-in-picture image can be identified accurately, which realizes accurate identification of picture-in-picture images containing vulgar or pornographic content.
  • the image pornographic level classification method provided in the embodiments of the application can be applied to any electronic device that needs to classify the pornographic level of images, for example a server of a live broadcast platform or image processing equipment, which is not specifically limited here. For convenience of description, it is hereinafter referred to as the electronic device.
  • the image to be classified obtained by the electronic device is an image that needs to be classified. It is reasonable that the electronic device can use its own image acquisition device to obtain the image to be classified, or obtain the image to be classified from other electronic devices.
  • the image to be classified may be a live image of a live broadcast platform, or an image that needs to be classified in other scenes, and is not specifically limited here.
  • the electronic device can input the image to be classified into a pre-trained picture-in-picture image recognition model to obtain the recognition result of whether the image to be classified is a picture-in-picture image .
  • the picture-in-picture image recognition model is obtained through machine learning training using the first data set.
  • the first data set includes multiple sets of data. Each set of data includes a first image and a corresponding first label. The first label is manually calibrated and is used to indicate whether the first image is a picture-in-picture image.
  • the training process of the picture-in-picture image recognition model is shown in the embodiment shown in Fig. 2, which will not be repeated here.
  • the electronic device may analyze the image to be classified based on the first classification model to determine the pornographic level category of the image to be classified.
  • the first classification model is obtained through machine learning training using a second data set.
  • the second data set includes multiple sets of data. Each set of data includes a second image and a corresponding second label. The second label is manually calibrated and is used to indicate the pornographic level category of the second image. In this way, the trained first classification model can recognize the pornographic level category of an image according to its image features and then output the recognition result.
  • the first classification model may be a neural network model such as a convolutional neural network, including a feature extraction part, a global average pooling layer, and an output layer.
  • the feature extraction part is used to extract features in the image to be classified;
  • the global average pooling layer is used to perform global average pooling operations on the features extracted by the feature extraction part to obtain the global features of the image to be classified;
  • the output layer is used to perform fully connected processing on the global features to obtain a category vector used to determine the pornographic level category of the image to be classified.
  • the output result of the first classification model may be a probability vector, that is, the probability that the image to be classified is each preset pornographic level category.
  • the output result of the first classification model may be a label, which identifies the pornographic level category of the image to be classified. For example, label a indicates that the category of the image to be classified is a normal category, label b indicates that the category of the image to be classified is a vulgar category, and label c indicates that the category of the image to be classified is a pornographic category.
  • the electronic device can compare the probabilities in the probability vector, and determine the preset pornographic level category corresponding to the largest probability as the pornographic level category of the image to be classified.
  • the output of the first classification model is a probability vector, which includes the probabilities that the image to be classified belongs to the normal category, the vulgar category, and the pornographic category. If the output result of the first classification model is {0.8, 0.1, 0.1}, then the probabilities that the image to be classified belongs to the normal category, the vulgar category, and the pornographic category are 0.8, 0.1, and 0.1, respectively, and the electronic device can determine that the pornographic level category of the image to be classified is the category with the highest probability, that is, the normal category.
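  • As a minimal sketch of the comparison described above, the following selects the preset category with the largest probability. The function name and the category ordering (normal, vulgar, pornographic) are illustrative assumptions taken from the example, not part of the described embodiment.

```python
# Category order follows the example in the text; all names here are hypothetical.
CATEGORIES = ("normal", "vulgar", "pornographic")

def category_from_probabilities(probabilities, categories=CATEGORIES):
    """Return the preset category whose probability is the largest."""
    if len(probabilities) != len(categories):
        raise ValueError("one probability per preset category is required")
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return categories[best_index]

print(category_from_probabilities([0.8, 0.1, 0.1]))  # normal
```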
  • when the image to be classified is not a picture-in-picture image, the electronic device can input the image to be classified into the first classification model to obtain the category of the image to be classified. In this way, it can be ensured that the category of the image to be classified can be accurately determined when the image to be classified is not a picture-in-picture image.
  • the electronic device may analyze the image to be classified based on the second classification model to determine the pornographic level category of the image to be classified.
  • the second classification model is obtained by training a model through machine learning using the second data set and then changing the global average pooling layer and the convolution kernel of the trained model.
  • the second classification model may be a neural network model such as a convolutional neural network, including a feature extraction part, a non-global average pooling layer, and a convolutional layer.
  • the feature extraction part is used to extract features in the image to be classified;
  • the non-global average pooling layer is used to pool the features extracted by the feature extraction part to obtain the pooling result;
  • the convolutional layer is used to perform a convolution operation on the pooling result to obtain a category matrix used to determine the pornographic level category of the image to be classified.
  • the second classification model may be obtained by modifying the global average pooling layer and the output layer of the first classification model after the first classification model is trained.
  • the second classification model may also be obtained through training.
  • the network parameters of the second classification model may be adjusted so that the second classification model can learn the correspondence between the image features of the sample image and the preset category. In this way, the trained second classification model can also recognize the image category according to the image characteristics, and then output the recognition result.
  • the second classification model is obtained by using the second data set through machine learning training and changing the trained model.
  • the non-global average pooling layer in the second classification model is obtained by taking the global average pooling layer of the model trained on the second data set and changing to No the parameter that determines whether the pooling is global; the convolutional layer in the second classification model is obtained by changing the output layer of the model trained on the second data set to a convolutional layer with a kernel size of 1×1.
  • after the parameter that determines whether the pooling is global is set to No in the global average pooling layer of the trained first classification model, the non-global average pooling layer in the second classification model is obtained. Specifically, the parameter "whether it is global pooling" can be changed from True to False.
  • the second classification model needs to extract the image features of the sub-regions in the image to be classified and output the corresponding category matrix. Therefore, in order for the second classification model to extract the image features of the sub-regions in the image to be classified, the output layer of the first classification model can be modified to a 1×1 convolutional layer.
  • the trained first classification model can be appropriately modified to obtain the second classification model without retraining, which reduces the training time of the deep learning model and further improves the efficiency of image classification.
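  • The modification described above can be illustrated with a small pure-Python sketch. It assumes the feature map is a nested list indexed as feature_map[y][x][channel] and that the fully connected weights are reused unchanged as the 1×1 convolution kernel; the function names, window size, stride, and the example numbers are all hypothetical. The outputs here are raw scores, and any normalization into probabilities is omitted.

```python
def global_average_pool(feature_map):
    """Average every channel over all positions; returns one value per channel."""
    cells = [cell for row in feature_map for cell in row]
    channels = len(cells[0])
    return [sum(cell[c] for cell in cells) / len(cells) for c in range(channels)]

def fully_connected(features, weights, bias):
    """weights[k][c] maps C features to K category scores."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def average_pool(feature_map, window, stride):
    """Non-global average pooling over window x window patches."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, height - window + 1, stride):
        pooled_row = []
        for left in range(0, width - window + 1, stride):
            patch = [row[left:left + window]
                     for row in feature_map[top:top + window]]
            pooled_row.append(global_average_pool(patch))
        pooled.append(pooled_row)
    return pooled

def conv_1x1(pooled, weights, bias):
    """A 1x1 convolution applies the same linear map at every position."""
    return [[fully_connected(cell, weights, bias) for cell in row]
            for row in pooled]

# Hypothetical 4x4 feature map with 2 channels and 3 output categories.
feature_map = [[[float(y + x), float(y * x)] for x in range(4)] for y in range(4)]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
bias = [0.0, 0.5, -0.5]

category_vector = fully_connected(global_average_pool(feature_map), weights, bias)
category_matrix = conv_1x1(average_pool(feature_map, 2, 2), weights, bias)
whole_image = conv_1x1(average_pool(feature_map, 4, 4), weights, bias)
```

  • In this sketch, `category_matrix` holds one score vector per 2×2 sub-region, and when the pooling window covers the whole feature map the 1×1 convolution reproduces the fully connected output, which is why no retraining is needed under these assumptions.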
  • the output result of the second classification model is a multi-dimensional category matrix, that is, a matrix composed of the probability that the image to be classified is each preset category.
  • the output result of the second classification model may also be a label, which identifies the pornographic level category of the image to be classified. For example, label A indicates that the category of the image to be classified is a normal category, label B indicates that the category of the image to be classified is a vulgar category, and label C indicates that the category of the image to be classified is a pornographic category.
  • the number of elements in the category matrix is related to the preset category, the network structure of the second classification model, network parameters, and the processing of the image to be classified before inputting the second classification model.
  • when the second classification model processes the image to be classified, it can extract the image features of each sub-region, determine the probability that each sub-region belongs to each preset category according to the correspondence between image features and categories, and form these probabilities into a probability vector.
  • the probability vectors corresponding to all the sub-regions constitute the above-mentioned multi-dimensional category matrix, and each element in the category matrix is the probability vector of the sub-region at the corresponding position.
  • the output result of the second classification model is a t×t×3 category matrix Z, where the specific value of t is related to the network structure and network parameters of the second classification model and to the processing of the image to be classified before it is input into the second classification model, and t×t is the number of sub-regions of the image to be classified.
  • Each element (t_a, t_b) in the category matrix Z corresponds to the sub-region at the corresponding position in the image to be classified, where a ∈ (1, t), b ∈ (1, t).
  • each subregion corresponds to an element in the category matrix Z.
  • Each element (t_a, t_b) corresponds to a three-dimensional probability vector (p1, p2, p3), where p1 represents the probability that the category of the sub-region corresponding to (t_a, t_b) is the normal category, p2 represents the probability that the category of the sub-region corresponding to (t_a, t_b) is the vulgar category, and p3 represents the probability that the category of the sub-region corresponding to (t_a, t_b) is the pornographic category.
  • the electronic device can determine the categories of all sub-regions in the image to be classified, and further, the electronic device can determine the pornographic level category of the image to be classified according to the category of each sub-region.
  • if the categories of the sub-regions include the pornographic category, the pornographic level category of the image to be classified is determined as the pornographic category.
  • other methods can also be used to determine the pornographic level category of the image to be classified. For example, the category to which the largest number of sub-regions belong is determined as the pornographic level category of the image to be classified. This is reasonable, and no specific limitation is made here.
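  • The two ways of deriving an image-level category from sub-region categories mentioned above can be sketched as follows. The function names are hypothetical, and the tie-breaking behaviour of the majority rule is an assumption, since the text does not specify it.

```python
from collections import Counter

def contains_category(subregion_categories, category="pornographic"):
    """True if at least one sub-region belongs to the given category."""
    return category in subregion_categories

def majority_category(subregion_categories):
    """Category shared by the largest number of sub-regions.

    Ties resolve to the category seen first, which is an assumption.
    """
    if not subregion_categories:
        raise ValueError("at least one sub-region is required")
    return Counter(subregion_categories).most_common(1)[0][0]

regions = ["normal"] * 6 + ["vulgar"] * 3 + ["pornographic"]
print(contains_category(regions), majority_category(regions))
```

  • The example shows that the two rules can disagree on the same image: the first rule flags it because one sub-region is pornographic, while the majority rule returns the normal category.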
  • the above-mentioned S304 can be specifically implemented through the following steps:
  • S401 Enlarge the image to be classified according to a predetermined ratio.
  • because the second classification model needs to extract image features from the sub-regions of the image to be classified, the accuracy of its processing will be reduced if a sub-region is small. Therefore, in order to ensure the accuracy of the output results of the second classification model, the image to be classified can be enlarged, for example, by enlarging its length and width by K times, where the specific value of K can be preset according to the actual classification requirements and the size of the image to be classified, which is not specifically limited here.
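  • A minimal sketch of enlarging the length and width by K times is given below. It assumes an integer K and nearest-neighbour replication on an image stored as a list of pixel rows; the text does not specify the interpolation method, so this choice and the function name are assumptions.

```python
def enlarge(image, k):
    """Enlarge a 2-D image (list of rows) by an integer factor k in both
    directions by repeating each pixel k times horizontally and vertically."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return [[pixel for pixel in row for _ in range(k)]
            for row in image
            for _ in range(k)]

enlarged = enlarge([[1, 2], [3, 4]], 2)
print(len(enlarged), len(enlarged[0]))
```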
  • S402 Input the enlarged image to be classified into a second classification model to obtain a class matrix.
  • the category matrix includes multiple sets of elements, each set of elements corresponds to a subregion of the image to be classified, and each element in each set of elements represents the probability of a preset category corresponding to the subregion.
  • the electronic device can input the enlarged image to be classified into the second classification model, and the second classification model can extract the image features of the sub-regions of the enlarged image to be classified, so that image feature extraction is more accurate.
  • the electronic device before inputting the image to be classified into the second classification model, the electronic device can enlarge the image to be classified, so that the second classification model can more accurately determine the category of the image to be classified.
  • S403 Determine the preset category corresponding to the element with the largest value in each group of elements of the category matrix as the category of the subregion of the image to be classified represented by the group of elements.
  • since each element in the category matrix is a probability vector composed of the probabilities that the corresponding sub-region belongs to each preset category, the electronic device can determine the preset category corresponding to the maximum value in each element as the category of the sub-region of the image to be classified corresponding to that element.
  • the electronic device can determine the preset category corresponding to the largest one of p1, p2, and p3 as the category of the sub-region corresponding to the position (t_a, t_b).
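  • Step S403 can be sketched as an element-wise maximum search over the category matrix. The nested-list layout category_matrix[a][b] = (p1, p2, p3), the function name, and the example values are illustrative assumptions.

```python
# Category order follows (p1, p2, p3) in the text; names are hypothetical.
CATEGORIES = ("normal", "vulgar", "pornographic")

def subregion_categories(category_matrix, categories=CATEGORIES):
    """Map each probability vector in a t x t matrix to its most likely category."""
    return [[categories[max(range(len(vector)), key=vector.__getitem__)]
             for vector in row]
            for row in category_matrix]

# Hypothetical 2 x 2 x 3 category matrix.
Z = [[[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]],
     [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]]
print(subregion_categories(Z))
```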
  • S404 Determine the pornographic level category of the image to be classified according to the category of each subregion.
  • the electronic device can determine the categories of all sub-regions in the image to be classified, and further, the electronic device can determine the pornographic level category of the image to be classified according to the category of each sub-region.
  • if the categories of the sub-regions include the pornographic category, the pornographic level category of the image to be classified is determined as the pornographic category.
  • other methods can also be used to determine the pornographic level category of the image to be classified. For example, the category to which the largest number of sub-regions belong is determined as the pornographic level category of the image to be classified. This is reasonable, and no specific limitation is made here.
  • when the output result of the above-mentioned second classification model is a multi-dimensional category matrix, the electronic device can determine the preset category corresponding to the maximum value of each element in the category matrix as the category of the sub-region of the image to be classified corresponding to that element, and then determine the pornographic level category of the image to be classified according to the category of each sub-region.
  • the second classification model can extract the image features of the subregions in the image to be classified, and then output a category matrix representing the category of each subregion, so that the electronic device can accurately determine the category of each subregion and the pornographic level category of the image to be classified.
  • the foregoing S404 may be specifically implemented through the following steps:
  • S501 According to the category of each sub-region, respectively determine the ratio of the number of sub-regions belonging to each abnormal category to the total number of sub-regions.
  • the electronic device can determine the ratio of the number of sub-regions belonging to each abnormal category to the total number of sub-regions according to the category of each sub-region.
  • the abnormal categories mentioned here can be pornographic categories, vulgar categories, illegal categories, etc.
  • for example, suppose the total number of sub-regions is 100, of which 35 belong to the vulgar category, 40 belong to the pornographic category, and the remaining sub-regions belong to the normal category; the ratios for the vulgar category and the pornographic category are then 0.35 and 0.40, respectively.
  • S502 Determine whether each ratio is less than a preset threshold.
  • the electronic device can determine whether each ratio is less than the preset threshold.
  • the preset thresholds corresponding to different abnormal categories can be the same or different, and the specific preset thresholds can be set based on actual classification requirements and other factors.
  • if the requirements for a certain abnormal category are strict, the corresponding preset threshold can be set lower. In order to eliminate a certain type of abnormal content, the corresponding preset threshold can be set to 0. If the requirements for a certain abnormal category are relatively loose, the corresponding preset threshold can be set higher.
  • if the ratios are all less than the preset threshold, it means that very few sub-regions in the image to be classified belong to abnormal categories, and the pornographic level category of the image to be classified can be determined as the normal category.
  • if there is a ratio that is not less than the preset threshold, the electronic device can compare the above-mentioned ratios to find which ratio is the largest, and determine the pornographic level category of the image to be classified as the category of the sub-regions with the largest ratio. For example, if the comparison shows that the ratio of the vulgar category is the largest, the number of sub-regions of the vulgar category is greater than the number of sub-regions of the other abnormal categories, and the pornographic level category of the image to be classified can be determined as the vulgar category.
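  • Steps S501 and S502 and the decision that follows them can be sketched as below. The function name, the per-category threshold dictionary, and the threshold values are hypothetical; the sub-region counts reuse the example of 100 sub-regions given earlier.

```python
def classify_by_ratio(subregion_categories, thresholds):
    """Decide the image-level category from abnormal-category ratios.

    thresholds maps each abnormal category to its preset threshold.
    """
    total = len(subregion_categories)
    if total == 0:
        raise ValueError("at least one sub-region is required")
    # S501: ratio of sub-regions in each abnormal category to all sub-regions.
    ratios = {category: sum(1 for c in subregion_categories if c == category) / total
              for category in thresholds}
    # S502: if every ratio is below its threshold, the image is normal.
    if all(ratios[category] < thresholds[category] for category in thresholds):
        return "normal"
    # Otherwise take the abnormal category with the largest ratio.
    return max(ratios, key=ratios.get)

regions = ["vulgar"] * 35 + ["pornographic"] * 40 + ["normal"] * 25
print(classify_by_ratio(regions, {"vulgar": 0.3, "pornographic": 0.3}))
print(classify_by_ratio(regions, {"vulgar": 0.5, "pornographic": 0.5}))
```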
  • the foregoing preset categories may include a normal category, a vulgar category, and a pornographic category.
  • the electronic device can determine the first ratio of the number of sub-regions belonging to the vulgar category to the total number of sub-regions, and the second ratio of the number of sub-regions belonging to the pornographic category to the total number of sub-regions, and then determine, according to the first ratio and the second ratio, whether the category of the image to be classified is the normal category, the vulgar category or the pornographic category.
  • electronic equipment can accurately identify vulgar and pornographic images, and the classification accuracy and efficiency are improved.
  • the second classification model may be obtained by modifying the trained model as described above, or may be obtained by pre-training.
  • the training method of the second classification model may include the following steps:
  • S601 Obtain a neural network model and a second data set.
  • the neural network model includes a feature extraction part, a non-global average pooling layer, and a convolutional layer.
  • the second data set includes multiple sets of data. Each set of data includes a second image and a corresponding second label. The second label is manually calibrated and is used to indicate the pornographic level category of the second image.
  • the second data set can be divided into a training set and a test set according to a K:1 ratio, and the training set is used to train the neural network model.
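  • A K:1 split of the second data set can be sketched as follows. Shuffling before splitting, the fixed seed, and the function name are assumptions; the text only states the ratio.

```python
import random

def split_dataset(samples, k, seed=0):
    """Split samples into a training set and a test set at a K:1 ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # shuffling is an assumption
    test_size = len(shuffled) // (k + 1)
    return shuffled[test_size:], shuffled[:test_size]

train_set, test_set = split_dataset(range(100), k=4)
print(len(train_set), len(test_set))
```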
  • S602 Obtain a second image from the second data set, and input the second image into the neural network model, and obtain a category matrix through sequential operations of the feature extraction part, the non-global average pooling layer, and the convolutional layer.
  • a second image can be obtained from the second data set, and the second image can be input into the aforementioned neural network model, and the neural network model can process the second image to obtain the output result.
  • the output result is a category matrix, which can represent the category of the second image.
  • S603 Input the category matrix into a preset loss function to obtain a probability distribution vector.
  • n is the number of pornographic level categories.
  • if the pornographic level category includes the normal category, the vulgar category and the pornographic category, then n is 3; assuming that the category vector X corresponding to the second image T is {1, 3, 6}, then the probability vector corresponding to the second image T
  • S604 Determine the loss amount according to the probability distribution vector and the second label corresponding to the second image.
  • for example, if the pornographic level category includes a normal category, a vulgar category and a pornographic category, and the second label corresponding to the second image is the pornographic category, then p_y is the element p_3 of the probability vector p corresponding to the second image.
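  • The formulas for steps S603 and S604 are not reproduced in this text. The sketch below therefore assumes a softmax function for the probability vector and a negative log-likelihood (cross-entropy) for the loss amount, which is a common choice for this kind of classifier but is an assumption here, as are the function names. It uses the category vector {1, 3, 6} from the example with the pornographic category as the label.

```python
import math

def softmax(scores):
    """Assumed form of the probability vector: p_i = exp(x_i) / sum_j exp(x_j)."""
    shift = max(scores)  # subtracting the maximum avoids overflow
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probabilities, label_index):
    """Assumed form of the loss amount: L = -log(p_y)."""
    return -math.log(probabilities[label_index])

p = softmax([1, 3, 6])       # n = 3 categories
loss = cross_entropy(p, 2)   # label is the third (pornographic) category
print(p, loss)
```

  • Under these assumptions the loss shrinks toward 0 as p_y approaches 1, which matches the qualitative relationship between the loss amount and p_y described in the text.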
  • S605 Determine whether the loss function corresponding to the loss amount has converged, if it does not converge, execute S606, and if it converges, it is determined to complete the training and obtain the second classification model.
  • S606 Update the network parameters of the neural network model according to the loss amount, and return to execute S602-S605.
  • the loss amount L is inversely related to the value of p_y; that is to say, updating the network parameters of the neural network model so that the loss amount L becomes smaller and smaller makes the value of p_y larger and larger, even approaching 1, so that the probability of the labeled category in the output result of the neural network model approaches 1, which makes the classification result more and more accurate.
  • the backpropagation algorithm can be used to calculate the derivative ∂L/∂W of the loss amount L with respect to the network parameters, where W is a network parameter. Furthermore, the stochastic gradient descent algorithm is used to update the network parameters. That is, the new network parameters are calculated according to the following formula: W* = W − α · ∂L/∂W
  • W* is the new network parameter
  • α is a preset adjustment parameter. Its specific value can be set according to factors such as the training requirements and the accuracy of the target neural network model. For example, it can be 0.001, 0.0015, 0.002, etc., which is not specifically limited here.
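  • A single stochastic gradient descent update can be sketched as below, assuming the standard form in which each parameter is moved against its gradient by the preset adjustment parameter. The flat list of parameters, the function name, and the example gradients are hypothetical; in practice the gradients would come from backpropagation.

```python
def sgd_step(weights, gradients, adjustment=0.001):
    """Return updated parameters: new_w = w - adjustment * dL/dw."""
    if len(weights) != len(gradients):
        raise ValueError("one gradient per parameter is required")
    return [w - adjustment * g for w, g in zip(weights, gradients)]

new_weights = sgd_step([1.0, -2.0], [10.0, -20.0], adjustment=0.001)
print(new_weights)
```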
  • whether the neural network model has met the requirements is determined by judging whether the loss function converges. If the loss function converges, it means that the accuracy of the output result of the current neural network model has reached the requirements and images can be accurately classified, so the training can be stopped at this time and the second classification model is obtained.
  • the training process ensures that the output result of the second classification model is accurate.
  • the second classification model has a larger number of layers and can extract more accurate image features, with better classification performance, strong generalization capability, and robustness.
  • the structure of the model is the same as that of the picture-in-picture image classification model, namely the feature extraction part, the global average pooling layer and the output layer, but the output classification results are different.
  • for the specific training process, reference can be made to the training process of the picture-in-picture image classification model shown in Figure 2, which uses the backpropagation algorithm and the stochastic gradient descent algorithm to update the weights of the model until convergence. The specific process is not repeated here.
  • an embodiment of the present application also provides a device for recognizing a picture-in-picture image.
  • the following describes a device for recognizing picture-in-picture images provided by embodiments of the present application.
  • an apparatus for recognizing picture-in-picture images may include:
  • the obtaining module 710 is used to obtain the image to be recognized
  • the picture-in-picture image recognition module 720 is used to analyze the image to be recognized based on the picture-in-picture image recognition model to determine whether the image to be recognized is a picture-in-picture image.
  • the picture-in-picture image recognition model is obtained through machine learning training using the first data set. The first data set includes multiple sets of data, and each set of data includes a first image and a corresponding first label. The first label is used to indicate whether the first image is a picture-in-picture image.
  • the picture-in-picture image recognition module 720 may include:
  • the first recognition unit is used to input the image to be recognized into the picture-in-picture image recognition model to obtain a first output result, where the first output result is a two-dimensional vector that includes a first component and a second component. The first component represents the probability that the image to be recognized is a picture-in-picture image, and the second component represents the probability that the image to be recognized is not a picture-in-picture image;
  • the first determining unit is used to determine whether the first component is greater than the second component, and if it is greater, determine that the image to be recognized is a picture-in-picture image.
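  • The comparison performed by the first determining unit can be sketched as follows. The function name is hypothetical, and the two-element sequence stands in for the first output result of the recognition model.

```python
def is_picture_in_picture(first_output):
    """first_output = (probability of picture-in-picture, probability of not)."""
    first_component, second_component = first_output
    return first_component > second_component

print(is_picture_in_picture((0.7, 0.3)))
```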
  • the image to be recognized is analyzed based on the picture-in-picture image recognition model to determine whether the image to be recognized is a picture-in-picture image.
  • the picture-in-picture image recognition model is obtained through machine learning training using a first data set.
  • the first data set includes multiple sets of data. Each set of data includes a first image and a corresponding first label. The first label is used to indicate whether the first image is a picture-in-picture image.
  • the picture-in-picture image recognition model is a neural network model, which can realize the end-to-end output of the recognition result of whether the image to be recognized is a picture-in-picture image, and realize the automatic recognition of the picture-in-picture image.
  • an embodiment of the present application also provides a device for training the picture-in-picture image recognition model.
  • the following describes a training device for a picture-in-picture image recognition model provided by an embodiment of the present application.
  • a training device for a picture-in-picture image recognition model may include:
  • the obtaining module 810 is used to obtain the pre-created initial neural network model
  • the training module 820 is used to train the initial neural network model based on the first data set to obtain the picture-in-picture image recognition model, where the first data set includes multiple sets of data, each set of data includes a first image and a corresponding first label, and the first label is used to indicate whether the first image is a picture-in-picture image.
  • the training module 820 may include:
  • the second recognition unit is used to obtain a first image from the first data set, and input the first image into the initial neural network model to obtain a second output result, where the second output result is a two-dimensional vector that includes a third component and a fourth component.
  • the third component represents the probability that the first image is a picture-in-picture image
  • the fourth component represents the probability that the first image is not a picture-in-picture image
  • a loss calculation unit configured to determine the loss amount according to the second output result and the first label corresponding to the first image, where the loss amount represents the difference between the second output result and the first label;
  • the weight update unit is used to update the weight parameters in the initial neural network model according to the loss
  • the convergence condition judgment unit is configured to send a stop instruction to the second recognition unit when the loss function corresponding to the loss amount converges, so that the second recognition unit stops inputting the first image in the first data set to the initial neural network model.
  • the initial neural network model is trained based on the first data set to obtain a picture-in-picture image recognition model.
  • the picture-in-picture image recognition model obtained through training is a neural network model, which can realize end-to-end output of the recognition result of whether the image to be recognized is a picture-in-picture image.
  • an embodiment of the present application also provides an image pornography level classification device.
  • the following describes a device for classifying image pornography provided by an embodiment of the present application.
  • an image pornography grade classification device may include:
  • the obtaining module 910 is used to obtain the image to be classified
  • the picture-in-picture image recognition module 920 is used to analyze the image to be classified based on the picture-in-picture image recognition model to determine whether the image to be classified is a picture-in-picture image.
  • the picture-in-picture image recognition model is obtained through machine learning training using the first data set. The first data set includes multiple sets of data, each set of data includes a first image and a corresponding first label, and the first label is used to indicate whether the first image is a picture-in-picture image;
  • the first category determining module 930 is used to analyze the image to be classified based on the first classification model when the image to be classified is not a picture-in-picture image to determine the pornographic level category of the image to be classified, wherein the first classification model is obtained through machine learning training using the second data set, the second data set includes multiple sets of data, each set of data includes a second image and a corresponding second label, and the second label is used to indicate the pornographic level category of the second image;
  • the second category determining module 940 is configured to analyze the image to be classified based on the second classification model to determine the pornographic level category of the image to be classified when the image to be classified is a picture-in-picture image, wherein the second classification model is obtained by training a model through machine learning using the second data set and then changing the global average pooling layer and the convolution kernel of the trained model.
  • the first classification model includes a feature extraction part, a global average pooling layer, and an output layer; the feature extraction part is used to extract features in the image to be classified; the global average pooling layer is used to perform a global average pooling operation on the features extracted by the feature extraction part to obtain the global features of the image to be classified; the output layer is used to perform full connection processing on the global features to obtain a category vector used to determine the pornographic level category of the image to be classified.
  • the non-global average pooling layer in the second classification model is obtained by taking the global average pooling layer of the model trained on the second data set and changing to No the parameter that determines whether the pooling is global; the convolutional layer in the second classification model is obtained by changing the output layer of the model trained on the second data set to a convolutional layer with a kernel size of 1×1.
  • the second classification model includes a feature extraction part, a non-global average pooling layer, and a convolutional layer;
  • the feature extraction part is used to extract features in the image to be classified;
  • the non-global average pooling layer is used to perform a pooling operation on the features extracted by the feature extraction part to obtain a pooling result;
  • the convolutional layer is used to perform a convolution operation on the pooling result to obtain a category matrix used to determine the pornographic level category of the image to be classified.
  • the second category determining module 940 may include:
  • the image enlargement unit is used to enlarge the image to be classified according to a predetermined ratio
  • the category probability generating unit is used to input the enlarged image to be classified into the second classification model to obtain a category matrix, where the category matrix includes multiple groups of elements, each group of elements corresponds to a sub-region of the image to be classified, and each element in each group of elements represents the probability of a preset category corresponding to the sub-region;
  • the sub-region category confirmation unit is used to determine the preset category corresponding to the element with the largest value in each group of elements of the category matrix as the category of the sub-region of the image to be classified represented by the group of elements;
  • the pornographic level category confirmation unit is used to determine the pornographic level category of the image to be classified according to the category of each subregion.
  • the second category determining module 940 may include:
  • the ratio confirmation unit is used to determine the ratio of the number of sub-regions belonging to each abnormal category to the total number of sub-regions according to the category of each sub-region;
  • the threshold judgment unit is used to judge whether each ratio is less than a preset threshold; if each ratio is less than the preset threshold, determine that the pornographic level category of the image to be classified is the normal category; if there is a ratio greater than the preset threshold, compare the ratios and determine the pornographic level category of the image to be classified as the category of the sub-regions with the largest ratio.
  • the pornographic level category of the second image includes a normal category, a vulgar category, and a pornographic category.
  • the picture-in-picture image recognition model is obtained through machine learning training using a first data set.
  • the first data set includes multiple sets of data.
  • Each set of data includes a first image and a corresponding first label.
  • the first label is used to indicate whether the first image is a picture-in-picture image;
  • the first classification model is obtained through machine learning training using a second data set.
  • the second data set includes multiple sets of data, and each set of data includes a second image and a corresponding second label, where the second label is used to indicate the pornographic level category of the second image;
  • the second classification model is obtained by using the second data set to train through machine learning, and changing the global average pooling layer and convolution kernel from the trained model.
  • the second classification model is different from the first classification model.
  • the second classification model is obtained by training a model through machine learning using the second data set and then changing the global average pooling layer and the convolution kernel of the trained model. Therefore, it is possible to accurately identify the pornographic level category of an image to be classified that is a picture-in-picture image, thereby realizing accurate identification of picture-in-picture images containing vulgar or pornographic content.
  • the electronic device may include a processor 1001, a communication interface 1002, a memory 1003, and a communication bus 1004.
  • the processor 1001, the communication interface 1002, and the memory 1003 communicate with one another through the communication bus 1004
  • the memory 1003 is used to store computer programs
  • the processor 1001 is configured to implement the method for recognizing a picture-in-picture image, the method for training a picture-in-picture image recognition model, or the method for classifying image pornography provided by any of the above embodiments when executing the program stored in the memory 1003.
  • the electronic device can obtain the image to be classified, analyze it based on the picture-in-picture image recognition model, and determine whether it is a picture-in-picture image. If the image to be classified is not a picture-in-picture image, the device analyzes it based on the first classification model to determine its pornographic level category. If the image to be classified is a picture-in-picture image, the device analyzes it based on the second classification model to determine its pornographic level category.
  • the communication bus mentioned in the above electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus.
  • the communication bus can be divided into address bus, data bus, control bus and so on. For ease of representation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
  • the communication interface is used for communication between the aforementioned electronic device and other devices.
  • the memory may include random access memory (RAM), and may also include non-volatile memory (NVM), such as at least one disk storage.
  • the memory may also be at least one storage device located remotely from the foregoing processor.
  • the above-mentioned processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program.
  • when the computer program is executed by a processor, the method for recognizing a picture-in-picture image, the method for training a picture-in-picture image recognition model, or the method for classifying image pornography provided by any of the above embodiments is implemented.
  • the embodiments of the present application also provide an application program for executing at runtime: the method for recognizing picture-in-picture images, the method for training picture-in-picture image recognition models, or the method for classifying image pornography provided by any of the above embodiments.
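
The sub-region aggregation performed by the ratio confirmation unit and the threshold judgment unit above can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the function name, the string category names, and the tie-breaking order are assumptions, and the text does not say what happens when a ratio equals the threshold, so this sketch treats that case as normal.

```python
from collections import Counter

NORMAL = "normal"
# Abnormal categories named in the text; their order here is only used for ties.
ABNORMAL = ("vulgar", "pornographic")


def aggregate_subregion_categories(subregion_categories, threshold):
    """Derive an image-level category from the per-sub-region categories."""
    if not subregion_categories:
        raise ValueError("at least one sub-region category is required")
    total = len(subregion_categories)
    counts = Counter(subregion_categories)
    # Ratio of sub-regions in each abnormal category to all sub-regions.
    ratios = {category: counts[category] / total for category in ABNORMAL}
    # Assumption: a ratio exactly equal to the threshold counts as normal.
    if all(ratio <= threshold for ratio in ratios.values()):
        return NORMAL
    # Otherwise pick the abnormal category with the largest ratio.
    return max(ABNORMAL, key=lambda category: ratios[category])


if __name__ == "__main__":
    regions = ["normal"] * 4 + ["pornographic"] * 4 + ["vulgar"] * 2
    print(aggregate_subregion_categories(regions, threshold=0.2))
```

The threshold is passed in explicitly because the text only calls it a preset value.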

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application provide an image recognition method and apparatus, an image classification method and apparatus, an electronic device, and storage medium. The image recognition method comprises: acquiring an image to be classified; analyzing said image on the basis of a picture-in-picture image recognition model to determine whether said image is a picture-in-picture image; and if not, analyzing said image on the basis of a first classification model to determine the pornographic level category of said image, or if yes, analyzing said image on the basis of a second classification model to determine the pornographic level category of said image. The second classification model is different from the first classification model. The second classification model is obtained by machine learning training using a second data set and then changing the global average pooling layer and convolution kernel on the basis of the trained model, and can accurately recognize the pornographic level category of said image which is a picture-in-picture image, thereby achieving accurate recognition of the picture-in-picture image comprising vulgar pornography.

Description

Image recognition and classification method, apparatus, electronic device, and storage medium
This application claims priority to Chinese patent application No. 201910469236.2, filed with the Chinese Patent Office on May 31, 2019 and entitled "An image classification method, apparatus, electronic device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of image processing technology, and in particular to an image recognition and classification method, apparatus, electronic device, and storage medium.
Background
In some video playback scenarios, picture-in-picture images appear. A picture-in-picture image arises when a first image is contained in a second image and the area of the first image is smaller than the area of the second image. For example, display A shows an image B in which two girls are talking in a room. A display C placed in the room also appears in image B, and display C shows an image D. In this scene, image B contains image D, so image D is called a picture-in-picture image.
In the related art, there are methods that use a neural network model to identify whether image content such as videos or photos contains pornographic content. However, if the image being recognized is a picture-in-picture image, and the picture-in-picture image occupies less than thirty percent of the area of the entire image, it is difficult for the related art to accurately identify whether the picture-in-picture image contains pornographic content.
In addition, in some live broadcast scenarios, the number of live broadcast platforms is so large that manually monitoring whether live video contains pornographic images is very costly. A method is therefore needed that recognizes live image content and identifies image content containing vulgar or pornographic material, so that live broadcast platforms carrying such content can be managed.
Summary of the invention
The purpose of the embodiments of the present application is to provide an image recognition and classification method, apparatus, electronic device, and storage medium, so as to accurately recognize live image content containing vulgar or pornographic material. The specific technical solutions are as follows:
In a first aspect, an embodiment of the present application provides a method for recognizing a picture-in-picture image, the method including:
acquiring an image to be recognized;
analyzing the image to be recognized based on a picture-in-picture image recognition model to determine whether the image to be recognized is a picture-in-picture image, where the picture-in-picture image recognition model is obtained through machine learning training using a first data set, the first data set includes multiple sets of data, each set of data includes a first image and a corresponding first label, and the first label indicates whether the first image is a picture-in-picture image.
In a second aspect, an embodiment of the present application provides a method for training a picture-in-picture image recognition model, the method including:
obtaining a pre-created initial neural network model;
training the initial neural network model based on a first data set to obtain the picture-in-picture image recognition model, where the first data set includes multiple sets of data, each set of data includes a first image and a corresponding first label, and the first label indicates whether the first image is a picture-in-picture image.
In a third aspect, an embodiment of the present application provides a method for classifying image pornography levels, the method including:
acquiring an image to be classified;
analyzing the image to be classified based on a picture-in-picture image recognition model to determine whether the image to be classified is a picture-in-picture image, where the picture-in-picture image recognition model is obtained through machine learning training using a first data set, the first data set includes multiple sets of data, each set of data includes a first image and a corresponding first label, and the first label indicates whether the first image is a picture-in-picture image;
if the image to be classified is not a picture-in-picture image, analyzing the image to be classified based on a first classification model to determine the pornographic level category of the image to be classified, where the first classification model is obtained through machine learning training using a second data set, the second data set includes multiple sets of data, each set of data includes a second image and a corresponding second label, and the second label indicates the pornographic level category of the second image;
if the image to be classified is a picture-in-picture image, analyzing the image to be classified based on a second classification model to determine the pornographic level category of the image to be classified, where the second classification model is obtained by machine learning training using the second data set and then changing the global average pooling layer and convolution kernel of the trained model.
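The two-branch flow of the third aspect can be sketched as a small dispatch function. This is an illustrative sketch only: the three models are passed in as plain callables, and the names `classify_pornographic_level`, `is_picture_in_picture`, `first_model`, and `second_model` are hypothetical stand-ins for the trained models, which are not reproduced here.

```python
from typing import Any, Callable

Image = Any  # placeholder type; the text does not fix an image representation


def classify_pornographic_level(
    image: Image,
    is_picture_in_picture: Callable[[Image], bool],
    first_model: Callable[[Image], str],
    second_model: Callable[[Image], str],
) -> str:
    """Route the image to the classifier that matches its picture-in-picture status."""
    if is_picture_in_picture(image):
        # Picture-in-picture images go to the second classification model.
        return second_model(image)
    # All other images go to the first classification model.
    return first_model(image)


if __name__ == "__main__":
    # Stub callables standing in for trained models (assumptions for the demo).
    category = classify_pornographic_level(
        "frame.png",
        is_picture_in_picture=lambda img: True,
        first_model=lambda img: "normal",
        second_model=lambda img: "vulgar",
    )
    print(category)
```

Keeping the recognizer and both classifiers as injected callables mirrors the text, where the three models are trained separately and only combined at classification time.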
In a fourth aspect, an embodiment of the present application provides an apparatus for recognizing a picture-in-picture image, the apparatus including:
an acquisition module, configured to acquire an image to be recognized;
a picture-in-picture image recognition module, configured to analyze the image to be recognized based on a picture-in-picture image recognition model to determine whether the image to be recognized is a picture-in-picture image, where the picture-in-picture image recognition model is obtained through machine learning training using a first data set, the first data set includes multiple sets of data, each set of data includes a first image and a corresponding first label, and the first label indicates whether the first image is a picture-in-picture image.
In a fifth aspect, an embodiment of the present application provides an apparatus for training a picture-in-picture image recognition model, the apparatus including:
an acquisition module, configured to obtain a pre-created initial neural network model;
a training module, configured to train the initial neural network model based on a first data set to obtain the picture-in-picture image recognition model, where the first data set includes multiple sets of data, each set of data includes a first image and a corresponding first label, and the first label indicates whether the first image is a picture-in-picture image.
In a sixth aspect, an embodiment of the present application provides an apparatus for classifying image pornography levels, the apparatus including:
an acquisition module, configured to acquire an image to be classified;
a picture-in-picture image recognition module, configured to analyze the image to be classified based on a picture-in-picture image recognition model to determine whether the image to be classified is a picture-in-picture image, where the picture-in-picture image recognition model is obtained through machine learning training using a first data set, the first data set includes multiple sets of data, each set of data includes a first image and a corresponding first label, and the first label indicates whether the first image is a picture-in-picture image;
a first category determination module, configured to, when the image to be classified is not a picture-in-picture image, analyze the image to be classified based on a first classification model to determine the pornographic level category of the image to be classified, where the first classification model is obtained through machine learning training using a second data set, the second data set includes multiple sets of data, each set of data includes a second image and a corresponding second label, and the second label indicates the pornographic level category of the second image;
a second category determination module, configured to, when the image to be classified is a picture-in-picture image, analyze the image to be classified based on a second classification model to determine the pornographic level category of the image to be classified, where the second classification model is obtained by machine learning training using the second data set and then changing the global average pooling layer and convolution kernel of the trained model.
In a seventh aspect, an embodiment of the present application provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method provided in the first, second, or third aspect of the embodiments of the present application when executing the program stored in the memory.
In an eighth aspect, an embodiment of the present application provides a computer-readable storage medium. When the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to execute the method provided in the first, second, or third aspect of the embodiments of the present application.
In a ninth aspect, an embodiment of the present application provides an application program which, at runtime, executes the method provided in the first, second, or third aspect of the embodiments of the present application.
In the solution provided by the embodiments of the present application, an image to be classified is acquired and analyzed based on the picture-in-picture image recognition model to determine whether it is a picture-in-picture image. If the image to be classified is not a picture-in-picture image, it is analyzed based on the first classification model to determine its pornographic level category; if it is a picture-in-picture image, it is analyzed based on the second classification model to determine its pornographic level category. The picture-in-picture image recognition model is obtained through machine learning training using a first data set; the first data set includes multiple sets of data, each set of data includes a first image and a corresponding first label, and the first label indicates whether the first image is a picture-in-picture image. The first classification model is obtained through machine learning training using a second data set; the second data set includes multiple sets of data, each set of data includes a second image and a corresponding second label, and the second label indicates the pornographic level category of the second image. The second classification model is obtained by machine learning training using the second data set and then changing the global average pooling layer and convolution kernel of the trained model. The solution first determines whether the image to be classified is a picture-in-picture image, then uses the first classification model for images that are not picture-in-picture images and the second classification model for images that are. Because the second classification model differs from the first in that its global average pooling layer and convolution kernel are changed after training, it can accurately identify the pornographic level category of an image to be classified that is a picture-in-picture image, thereby achieving accurate recognition of picture-in-picture images containing vulgar or pornographic content.
Description of the drawings
To explain the technical solutions of the embodiments of the present application and of the related art more clearly, the drawings used in the embodiments and the related art are briefly introduced below. The drawings described below show only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for recognizing a picture-in-picture image provided by an embodiment of the application;
FIG. 2 is a flowchart of a method for training a picture-in-picture image recognition model provided by an embodiment of the application;
FIG. 3 is a flowchart of a method for classifying image pornography levels provided by an embodiment of the application;
FIG. 4 is a specific flowchart of step S304 in the embodiment shown in FIG. 3;
FIG. 5 is a specific flowchart of step S404 in the embodiment shown in FIG. 4;
FIG. 6 is a flowchart of the training of the second classification model in the method for classifying image pornography levels provided by an embodiment of the application;
FIG. 7 is a schematic structural diagram of an apparatus for recognizing a picture-in-picture image provided by an embodiment of the application;
FIG. 8 is a schematic structural diagram of an apparatus for training a picture-in-picture image recognition model provided by an embodiment of the application;
FIG. 9 is a schematic structural diagram of an apparatus for classifying image pornography levels provided by an embodiment of the application;
FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
Detailed description
To make the purpose, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application without creative effort fall within the protection scope of this application.
To accurately recognize live image content containing vulgar or pornographic material, the embodiments of the present application provide an image recognition and classification method, apparatus, electronic device, and computer-readable storage medium.
As shown in FIG. 1, an embodiment of the present application provides a method for recognizing a picture-in-picture image. The method may include the following steps.
S101: acquire an image to be recognized.
S102: analyze the image to be recognized based on the picture-in-picture image recognition model, and determine whether the image to be recognized is a picture-in-picture image.
The picture-in-picture image recognition model is obtained through machine learning training using a first data set. The first data set includes multiple sets of data, each set of data includes a first image and a corresponding first label, and the first label indicates whether the first image is a picture-in-picture image.
By applying this embodiment, after the image to be recognized is acquired, it is analyzed based on the picture-in-picture image recognition model to determine whether it is a picture-in-picture image. The picture-in-picture image recognition model is obtained through machine learning training using the first data set, in which each set of data includes a first image and a corresponding first label indicating whether the first image is a picture-in-picture image. The picture-in-picture image recognition model is thus a neural network model that outputs, end to end, the recognition result of whether the image to be recognized is a picture-in-picture image, so picture-in-picture images are recognized automatically.
The method for recognizing picture-in-picture images provided by the embodiments of the present application can be applied to an electronic device with image processing capability, for example a server of a live broadcast platform or an image processing device, which is not specifically limited here.
The image to be recognized is the image that requires picture-in-picture recognition. The electronic device may acquire the image to be recognized with its own image acquisition component or obtain it from another electronic device; either is acceptable. The image to be recognized may be a live image from a live broadcast platform or an image from another scenario that requires picture-in-picture recognition, which is not specifically limited here.
After acquiring the image to be recognized, the electronic device can analyze it based on the picture-in-picture image recognition model to obtain the recognition result of whether the image to be recognized is a picture-in-picture image.
The picture-in-picture image recognition model is obtained through machine learning training using the first data set. The first data set includes multiple sets of data, and each set of data includes a first image and a corresponding first label. The first label can be obtained through manual annotation and indicates whether the first image is a picture-in-picture image. The training process of the picture-in-picture image recognition model is described in the embodiment shown in FIG. 2.
The picture-in-picture image recognition model may be a neural network model such as a convolutional neural network, and may specifically include a feature extraction part, a global average pooling layer, and an output layer. The feature extraction part consists of a series of convolutions, batch normalizations, and activation functions and extracts image features. The global average pooling layer applies global average pooling to the image features extracted by the feature extraction part to obtain the global features of the input image. The output layer is a fully connected layer that processes the global features; the resulting category vector is used to determine whether the input image to be recognized is a picture-in-picture image.
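The last two stages of the architecture described above, global average pooling followed by a fully connected output layer, can be illustrated with plain Python lists. This is a sketch under stated assumptions, not the patented network: the feature extraction part is omitted, the feature maps are taken as a list of channels (each an H x W nested list), and the weights and biases in the demo are made-up values.

```python
def global_average_pool(feature_maps):
    """Average each channel's H x W map down to a single value."""
    pooled = []
    for channel in feature_maps:
        values = [value for row in channel for value in row]
        pooled.append(sum(values) / len(values))
    return pooled


def fully_connected(features, weights, biases):
    """Return one unnormalized score per class (one weight row per class)."""
    return [
        sum(w * f for w, f in zip(row, features)) + bias
        for row, bias in zip(weights, biases)
    ]


if __name__ == "__main__":
    # Two channels of 2 x 2 features (illustrative numbers only).
    feature_maps = [
        [[1.0, 3.0], [5.0, 7.0]],
        [[0.0, 0.0], [2.0, 2.0]],
    ]
    global_features = global_average_pool(feature_maps)
    # Two output classes: picture-in-picture and not picture-in-picture.
    scores = fully_connected(
        global_features,
        weights=[[1.0, 0.0], [0.0, 1.0]],
        biases=[0.5, -0.5],
    )
    print(global_features, scores)
```

Because global average pooling collapses every spatial position into one number per channel, a small inset image contributes little to the pooled feature, which is consistent with the text's motivation for a separate second classification model.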
作为本申请实施例的一种实施方式,画中画图像识别模型的输出结果可以为概率向量,也就是待识别图像为画中画图像的概率以及不为画中画图像的概率。作为本申请实施例的另一种实施方式,画中画图像识别模型的输出结果可以为标签,该标签标识待识别图像为画中画图像或者不为画中画图像。例如,标签1表示待识别图像为画中画图像,标签2表示待识别图像不为画中画图像。As an implementation manner of the embodiments of the present application, the output result of the picture-in-picture image recognition model may be a probability vector, that is, the probability that the image to be recognized is a picture-in-picture image and the probability that the image to be recognized is not a picture-in-picture image. As another implementation manner of the embodiments of the present application, the output result of the picture-in-picture image recognition model may be a label, which identifies that the image to be recognized is a picture-in-picture image or not a picture-in-picture image. For example, label 1 indicates that the image to be recognized is a picture-in-picture image, and label 2 indicates that the image to be recognized is not a picture-in-picture image.
作为本申请实施例的一种实施方式,S102具体可以通过如下步骤实现:As an implementation manner of the embodiment of the present application, S102 may be specifically implemented through the following steps:
第一步,将待识别图像输入画中画图像识别模型,得到第一输出结果,其中,第一输出结果为二维向量,第一输出结果包括第一分量和第二分量,第一分量表示待识别图像为画中画图像的概率,第二分量表示待识别图像不为画中画图像的概率。The first step is to input the image to be recognized into the picture-in-picture image recognition model to obtain a first output result, where the first output result is a two-dimensional vector, and the first output result includes a first component and a second component, and the first component represents The probability that the image to be recognized is a picture-in-picture image, and the second component represents the probability that the image to be recognized is not a picture-in-picture image.
Step 2: if the first component is greater than the second component, determine that the image to be recognized is a picture-in-picture image.
The picture-in-picture image recognition model outputs a two-dimensional vector, referred to here as the first output result. It has two components: one represents the probability that the input image to be recognized is a picture-in-picture image, and the other represents the probability that it is not. Comparing the two components of the first output result determines whether the image to be recognized is a picture-in-picture image.
The picture-in-picture image recognition model may specifically include two parts. The first part outputs a first vector of size 2, that is, a vector with two components, which represent the unnormalized score that the image to be recognized is a picture-in-picture image and the unnormalized score that it is not. The second part is a preset normalization function, which may be the softmax function. Feeding the first vector into the normalization function yields the first output result. The normalization function is shown in equation (1).
$$p_i = \frac{e^{X_i}}{e^{X_1} + e^{X_2}}, \quad i = 1, 2 \tag{1}$$
Here $X$ is the first vector, with two components $X_1$ and $X_2$ that represent the unnormalized score that the image to be recognized is a picture-in-picture image and the unnormalized score that it is not; $p$ is the second vector, with two components $p_1$ and $p_2$ that represent the probability that the image to be recognized is a picture-in-picture image and the probability that it is not.
After the first output result is computed, $p_1$ and $p_2$ can be compared. If $p_1 > p_2$, that is, the probability that the image to be recognized is a picture-in-picture image is greater than the probability that it is not, the image is determined to be a picture-in-picture image. Likewise, if $p_1 \le p_2$, the image is determined not to be a picture-in-picture image.
For example, if the first output result is {0.7, 0.3}, the electronic device finds that the probability 0.7 that the image to be recognized is a picture-in-picture image is greater than the probability 0.3 that it is not, and therefore determines that the image is a picture-in-picture image.
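The normalization and comparison steps above can be sketched in a few lines. This is a minimal illustration, not the applicant's implementation: the function names and the raw scores are hypothetical, and the maximum is subtracted before exponentiation only for numerical stability (it does not change the result of equation (1)).

```python
import math

def softmax(x):
    """Normalize a list of raw scores into probabilities, as in equation (1)."""
    m = max(x)  # subtracting the maximum avoids overflow and leaves the result unchanged
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def is_picture_in_picture(first_vector):
    """Return True only when p1 > p2; p1 <= p2 means 'not picture-in-picture'."""
    p1, p2 = softmax(first_vector)
    return p1 > p2

# Hypothetical raw scores (X1, X2) produced by the first part of the model.
print(softmax([1.2, 0.3]))
print(is_picture_in_picture([1.2, 0.3]))
print(is_picture_in_picture([0.0, 0.0]))  # tie, so the image is treated as not picture-in-picture
```

Note that a tie is resolved as "not picture-in-picture", matching the $p_1 \le p_2$ case in the text.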
As shown in FIG. 2, an embodiment of the present application provides a method for training a picture-in-picture image recognition model. The method may include the following steps.
S201: obtain a pre-created initial neural network model.
S202: train the initial neural network model on the first data set to obtain the picture-in-picture image recognition model.
The first data set includes multiple groups of data. Each group includes a first image and a corresponding first label, and the first label indicates whether the first image is a picture-in-picture image.
With this embodiment, an initial neural network model is constructed and trained on the first data set to obtain the picture-in-picture image recognition model. The trained picture-in-picture image recognition model is a neural network model that outputs, end to end, the recognition result of whether the image to be recognized is a picture-in-picture image.
The training method for the picture-in-picture image recognition model provided by the embodiments of the present application may be applied to an electronic device with image processing capability, for example a server of a live broadcast platform or an image processing device, or to a training server that provides model training. No specific limitation is made here.
To obtain the picture-in-picture image recognition model, an initial neural network model must first be created. The initial neural network model consists of three parts: a feature extraction part, a global average pooling layer, and an output layer. The feature extraction part consists of a series of convolution, batch normalization, and activation operations and is used to extract image features. The global average pooling layer applies global average pooling to the extracted image features to obtain the global features of the input image. The output layer is a fully connected layer that processes the global features; the resulting category vector is used to determine whether the input image is a picture-in-picture image.
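The last two parts of this structure can be illustrated without a deep learning framework. The sketch below assumes the feature extraction part has already produced a feature map indexed as `feature_map[channel][row][column]`; the feature values, weights, and biases are made-up numbers chosen only to make the arithmetic easy to follow.

```python
def global_average_pool(feature_map):
    """Reduce feature_map[c][h][w] to one average value per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]

def fully_connected(features, weights, bias):
    """Map C global features to K class scores; weights[k][c] is a K x C matrix."""
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]

# Hypothetical 2-channel, 2x2 feature map from the feature extraction part.
feature_map = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 2.0], [2.0, 0.0]]]
weights = [[0.5, 1.0], [-0.5, 0.0]]  # 2 classes x 2 channels, illustrative values
bias = [0.0, 1.0]

global_features = global_average_pool(feature_map)
scores = fully_connected(global_features, weights, bias)
print(global_features)  # [4.0, 1.0]
print(scores)           # [3.0, -1.0]
```

The two scores correspond to the unnormalized category vector that is later normalized into probabilities.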
In one implementation of the embodiments of the present application, S202 may be implemented through the following steps:
Step 1: obtain a first image from the first data set and input it into the initial neural network model to obtain a second output result. The second output result is a two-dimensional vector comprising a third component and a fourth component; the third component represents the probability that the first image is a picture-in-picture image, and the fourth component represents the probability that it is not.
Step 2: determine a loss value from the second output result and the first label corresponding to the first image, where the loss value represents the difference between the second output result and the first label.
Step 3: update the weight parameters of the initial neural network model according to the loss value.
Return to step 1 and repeat steps 1 to 3, iteratively updating the weight parameters of the initial neural network model until the loss function converges, which yields the picture-in-picture image recognition model.
The first data set includes multiple groups of data. Each group includes a first image and a corresponding first label; the first label is annotated manually and indicates whether the first image is a picture-in-picture image. The data in the first data set may be divided into a training set and a test set at a ratio of K:1, and the training set is used to train the neural network model.
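A K:1 split can be sketched as follows. The text does not say how samples are assigned to each set, so the shuffling, the fixed seed, the file names, and the function name `split_dataset` are all assumptions made for illustration.

```python
import random

def split_dataset(samples, k, seed=0):
    """Shuffle the samples and split them into training and test sets at a ratio of k:1."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # shuffling is an assumption, not stated in the text
    n_test = len(shuffled) // (k + 1)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical (image path, label) pairs; label 1 marks a picture-in-picture image.
samples = [(f"img_{i}.jpg", i % 2) for i in range(10)]
train_set, test_set = split_dataset(samples, k=4)
print(len(train_set), len(test_set))  # 8 2
```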
A first image is obtained from the first data set and input into the initial neural network model to obtain the second output result. Then, from the second output result and the first label corresponding to the first image, the loss value $L$ is computed with the preset loss function $L = -\log(p_y)$, where $y$ is the first label and $p_y$ is the component of the second output result corresponding to that label.
The loss value $L$ decreases monotonically as $p_y$ increases. In other words, adjusting the network parameters of the neural network model so that $L$ becomes smaller makes $p_y$ larger, approaching 1. The probability that the model assigns to the correct label therefore approaches 1, and the recognition result becomes increasingly accurate.
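The behaviour of this loss is easy to check numerically. In the sketch below, the label `y` is used as a zero-based index into the output vector (index 0 for "picture-in-picture"); that indexing convention and the function name are assumptions for illustration.

```python
import math

def negative_log_likelihood(p, y):
    """L = -log(p_y), where y is the index of the correct label in the output vector p."""
    return -math.log(p[y])

# The loss shrinks toward 0 as the probability of the correct label approaches 1.
for p_y in (0.5, 0.9, 0.99):
    print(p_y, negative_log_likelihood([p_y, 1.0 - p_y], 0))
```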
Specifically, the back-propagation algorithm may be used to compute the derivative of the loss value $L$ with respect to the network parameters, $\frac{\partial L}{\partial W}$, where $W$ denotes the network parameters. The network parameters are then updated with the stochastic gradient descent algorithm, that is, the new network parameters are computed according to the following formula:
$$W^{*} = W - \alpha \frac{\partial L}{\partial W}$$
Here $W^{*}$ denotes the new network parameters and $\alpha$ is a preset adjustment parameter (the learning rate). Its value may be set according to factors such as the training requirements and the accuracy required of the target neural network model, for example 0.001, 0.0015, or 0.002. No specific limitation is made here.
The parameters of the neural network model are updated iteratively until convergence, at which point training ends and the final picture-in-picture image recognition model is obtained.
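The iterate-until-convergence procedure can be illustrated on a toy problem. The sketch below replaces the convolutional network with a two-class linear model on a single scalar feature so that the gradient can be written by hand; the data, the learning rate of 0.1, and the fixed count of 200 epochs (used in place of a convergence test) are all assumptions. For softmax followed by $L = -\log(p_y)$, the derivative with respect to score $k$ is $p_k - 1$ if $k = y$ and $p_k$ otherwise.

```python
import math

def softmax(x):
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def sgd_step(w, b, x, y, alpha):
    """One update W* = W - alpha * dL/dW on a single sample, with L = -log(p_y)."""
    p = softmax([w[k] * x + b[k] for k in range(2)])
    loss = -math.log(p[y])
    for k in range(2):
        grad_score = p[k] - (1.0 if k == y else 0.0)  # dL/d(score_k)
        w[k] -= alpha * grad_score * x
        b[k] -= alpha * grad_score
    return loss

# Toy data: (scalar feature, label index); positive features belong to class 0.
data = [(1.0, 0), (2.0, 0), (-1.0, 1), (-2.0, 1)]
w, b = [0.0, 0.0], [0.0, 0.0]
losses = []
for epoch in range(200):
    epoch_loss = sum(sgd_step(w, b, x, y, alpha=0.1) for x, y in data)
    losses.append(epoch_loss / len(data))

print(losses[0], losses[-1])  # the average loss falls as training proceeds
```

A real training run would compute the gradients by back-propagation through all layers and would usually stop on a convergence criterion rather than a fixed epoch count.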
As shown in FIG. 3, an embodiment of the present application provides a method for classifying the pornographic level of an image. The method may include the following steps.
S301: obtain an image to be classified.
S302: analyze the image to be classified with the picture-in-picture image recognition model to determine whether it is a picture-in-picture image; if so, execute S304, otherwise execute S303.
The picture-in-picture image recognition model is obtained through machine learning training on a first data set. The first data set includes multiple groups of data; each group includes a first image and a corresponding first label, and the first label indicates whether the first image is a picture-in-picture image.
S303: analyze the image to be classified with the first classification model to determine its pornographic level category.
The first classification model is obtained through machine learning training on a second data set. The second data set includes multiple groups of data; each group includes a second image and a corresponding second label, and the second label indicates the pornographic level category of the second image.
S304: analyze the image to be classified with the second classification model to determine its pornographic level category.
The second classification model is obtained by training on the second data set through machine learning and then modifying the global average pooling layer and the convolution kernel of the trained model.
With this embodiment, an image to be classified is obtained and analyzed with the picture-in-picture image recognition model to determine whether it is a picture-in-picture image. If it is not, the image is analyzed with the first classification model to determine its pornographic level category; if it is, the image is analyzed with the second classification model to determine its pornographic level category. The picture-in-picture image recognition model is obtained through machine learning training on a first data set, which includes multiple groups of data, each comprising a first image and a corresponding first label indicating whether the first image is a picture-in-picture image. The first classification model is obtained through machine learning training on a second data set, which includes multiple groups of data, each comprising a second image and a corresponding second label indicating the pornographic level category of the second image. The second classification model is obtained by training on the second data set and then modifying the global average pooling layer and the convolution kernel of the trained model. Because the second classification model differs from the first in this way, it can accurately identify the pornographic level category of an image to be classified that is a picture-in-picture image, which enables accurate recognition of picture-in-picture images containing vulgar or pornographic content.
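The branching between S303 and S304 amounts to a simple dispatch. In the sketch below the three models are passed in as plain callables; the stand-in functions and the dictionary-based "images" are hypothetical placeholders, not real models.

```python
def classify_pornographic_level(image, is_picture_in_picture, first_model, second_model):
    """S302: picture-in-picture images go to the second model (S304), all others to the first (S303)."""
    if is_picture_in_picture(image):
        return second_model(image)
    return first_model(image)

# Hypothetical stand-ins that only record which branch was taken.
def fake_pip_detector(image):
    return image["pip"]

def fake_first_model(image):
    return "first model: " + image["name"]

def fake_second_model(image):
    return "second model: " + image["name"]

for image in ({"name": "a.jpg", "pip": False}, {"name": "b.jpg", "pip": True}):
    print(classify_pornographic_level(image, fake_pip_detector, fake_first_model, fake_second_model))
```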
The image pornographic level classification method provided by the embodiments of the present application may be applied to any electronic device that needs to classify the pornographic level of images, for example a server of a live broadcast platform or an image processing device. No specific limitation is made here. For convenience, such a device is referred to below as the electronic device.
The image to be classified is an image that needs to be classified. The electronic device may capture it with its own image acquisition component or obtain it from another electronic device; either is acceptable. The image to be classified may be a live image from a live broadcast platform or an image from another scenario that needs classification. No specific limitation is made here.
After obtaining the image to be classified, and in order to determine its pornographic level category accurately, the electronic device may input the image into the pre-trained picture-in-picture image recognition model to obtain the recognition result of whether it is a picture-in-picture image. The picture-in-picture image recognition model is obtained through machine learning training on the first data set, which includes multiple groups of data, each comprising a first image and a corresponding, manually annotated first label indicating whether the first image is a picture-in-picture image. The training process is described in the embodiment shown in FIG. 2 and is not repeated here.
If the image to be classified is not a picture-in-picture image, the electronic device may analyze it with the first classification model to determine its pornographic level category. The first classification model is obtained through machine learning training on the second data set, which includes multiple groups of data, each comprising a second image and a corresponding, manually annotated second label indicating the pornographic level category of the second image. The trained first classification model can therefore identify the pornographic level category of an image from its image features and output the recognition result.
In one implementation of the embodiments of the present application, the first classification model may be a neural network model such as a convolutional neural network, including a feature extraction part, a global average pooling layer, and an output layer. The feature extraction part extracts features from the image to be classified. The global average pooling layer applies global average pooling to the extracted features to obtain the global features of the image to be classified. The output layer applies a fully connected operation to the global features to obtain the category vector used to determine the pornographic level category of the image to be classified.
In one implementation of the embodiments of the present application, the output of the first classification model may be a probability vector, that is, the probability that the image to be classified belongs to each preset pornographic level category. In another implementation, the output may be a label that identifies the pornographic level category of the image to be classified. For example, label a indicates the normal category, label b indicates the vulgar category, and label c indicates the pornographic category.
When the first classification model outputs a probability vector, the electronic device may compare the probabilities in the vector and take the preset pornographic level category with the largest probability as the pornographic level category of the image to be classified.
For example, suppose the preset pornographic level categories are the normal category, the vulgar category, and the pornographic category, and the first classification model outputs a probability vector containing the probabilities that the image to be classified belongs to each of them. If the output is {0.8, 0.1, 0.1}, the probabilities of the normal, vulgar, and pornographic categories are 0.8, 0.1, and 0.1 respectively, so the electronic device determines that the pornographic level category of the image is the one with the highest probability, namely the normal category.
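Selecting the category with the largest probability is a plain arg-max. The sketch below reproduces the {0.8, 0.1, 0.1} example; the English category names and the function name are illustrative choices.

```python
CATEGORIES = ("normal", "vulgar", "pornographic")

def category_from_probabilities(probabilities, categories=CATEGORIES):
    """Return the category whose probability is largest (the first one wins on ties)."""
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return categories[best_index]

print(category_from_probabilities([0.8, 0.1, 0.1]))  # normal
```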
Thus, in this embodiment, when the image to be classified is not a picture-in-picture image, the electronic device can input it into the first classification model to obtain its category. This ensures that the category of an image that is not a picture-in-picture image can be determined accurately.
If the image to be classified is a picture-in-picture image, the electronic device may analyze it with the second classification model to determine its pornographic level category. The second classification model is obtained by training on the second data set through machine learning and then modifying the global average pooling layer and the convolution kernel of the trained model.
In one implementation of the embodiments of the present application, the second classification model may be a neural network model such as a convolutional neural network, including a feature extraction part, a non-global average pooling layer, and a convolutional layer. The feature extraction part extracts features from the image to be classified. The non-global average pooling layer pools the extracted features to obtain a pooling result. The convolutional layer convolves the pooling result to obtain the category matrix used to determine the pornographic level category of the image to be classified.
The second classification model may be obtained, after the first classification model has been trained, by modifying the global average pooling layer and the output layer of the first classification model. The second classification model may also be obtained through training, during which its network parameters are adjusted so that it learns the correspondence between the image features of sample images and the preset categories. The trained second classification model can then identify the category of an image from its image features and output the recognition result. In short, the second classification model is obtained by training on the second data set through machine learning and modifying the trained model.
In one implementation of the embodiments of the present application, the pooling layer of the second classification model is obtained by setting to "no" the parameter that controls whether pooling is global in the global average pooling layer of the model trained on the second data set. The convolutional layer of the second classification model is obtained by replacing the output layer of the model trained on the second data set with a convolutional layer whose kernel size is 1×1.
Because the image to be classified is usually enlarged before it is input into the second classification model, the second classification model must be able to process images of any size. To that end, the parameter that controls whether pooling is global in the global average pooling layer of the trained first classification model is set to "no", which yields the non-global average pooling layer of the second classification model. Specifically, the "global pooling" parameter is changed from True to False.
The second classification model needs to extract the image features of sub-regions of the image to be classified and output the corresponding category matrix. To make this possible, the output layer of the first classification model is replaced with a 1×1 convolutional layer.
Thus, in this embodiment, the second classification model can be obtained by suitably modifying the trained first classification model to meet the classification needs. The second classification model does not have to be retrained, which reduces the training time of the deep learning model and further improves the efficiency of image classification.
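Why the trained weights can be reused is easiest to see with a small numeric sketch. A 1×1 convolution applies the same class weights at every spatial position, so it behaves like the original fully connected layer evaluated once per sub-region. The pooling window, the stride, the feature values, and the weights below are assumptions chosen for illustration; the text does not specify them.

```python
def average_pool(feature_map, size, stride):
    """Non-global average pooling: feature_map[c][h][w] -> pooled[c][h'][w']."""
    pooled = []
    for channel in feature_map:
        height, width = len(channel), len(channel[0])
        rows = []
        for i in range(0, height - size + 1, stride):
            row = []
            for j in range(0, width - size + 1, stride):
                window = [channel[i + di][j + dj] for di in range(size) for dj in range(size)]
                row.append(sum(window) / len(window))
            rows.append(row)
        pooled.append(rows)
    return pooled

def conv1x1(pooled, weights, bias):
    """Apply the former fully connected weights[k][c] at every position; returns scores[i][j][k]."""
    height, width = len(pooled[0]), len(pooled[0][0])
    return [
        [
            [sum(wk[c] * pooled[c][i][j] for c in range(len(pooled))) + bk
             for wk, bk in zip(weights, bias)]
            for j in range(width)
        ]
        for i in range(height)
    ]

# Hypothetical 1-channel, 4x4 feature map and 2-class weights.
feature_map = [[[1.0, 1.0, 0.0, 0.0],
                [1.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 2.0, 2.0],
                [0.0, 0.0, 2.0, 2.0]]]
weights = [[1.0], [-1.0]]
bias = [0.0, 0.5]

region_scores = conv1x1(average_pool(feature_map, size=2, stride=2), weights, bias)
global_scores = conv1x1(average_pool(feature_map, size=4, stride=4), weights, bias)
print(region_scores)  # one score vector per 2x2 sub-region
print(global_scores)  # [[[0.75, -0.25]]], same as global pooling followed by the fully connected layer
```

When the pooling window covers the whole feature map, the result collapses to a single score vector, which is exactly what the first classification model computes.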
The output of the second classification model is a multi-dimensional category matrix, that is, a matrix of the probabilities that the image to be classified belongs to each preset category. The output of the second classification model may also be a label that identifies the pornographic level category of the image to be classified. For example, label A indicates the normal category, label B indicates the vulgar category, and label C indicates the pornographic category.
The number of elements in the category matrix depends on the preset categories, the network structure and network parameters of the second classification model, and the processing applied to the image before it is input into the second classification model. When processing the image to be classified, the second classification model extracts the image features of each sub-region and, from the correspondence between sub-region image features and categories, determines the probability that each sub-region belongs to each preset category, forming a probability vector. The probability vectors of all sub-regions make up the multi-dimensional category matrix, in which each element is the probability vector of the sub-region at the corresponding position.
For example, if the preset categories are the normal category, the vulgar category, and the pornographic category, the output of the second classification model is a t×t×3 category matrix Z. The value of t depends on the network structure and network parameters of the second classification model and on the processing applied to the image before it is input, and t×t is the number of sub-regions of the image to be classified. Each element (t_a, t_b) of Z, where 1 ≤ a ≤ t and 1 ≤ b ≤ t, corresponds to the sub-region at the corresponding position in the image. In other words, the image to be classified is in effect divided into t×t sub-regions, each corresponding to one element of Z. Each element (t_a, t_b) holds a three-dimensional probability vector (p1, p2, p3), where p1, p2, and p3 are the probabilities that the corresponding sub-region belongs to the normal, vulgar, and pornographic categories respectively.
In this way the electronic device determines the categories of all sub-regions of the image to be classified and can then determine the pornographic level category of the image from the category of each sub-region. In one implementation, if the sub-region categories include the pornographic category, the pornographic level category of the image is determined to be pornographic. Other approaches are also acceptable, for example taking the category that occurs most often among the sub-regions as the pornographic level category of the image. No specific limitation is made here.
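Both aggregation rules mentioned above can be sketched on a small category matrix. The 2×2×3 matrix, the category names, and the function names are hypothetical; each sub-region is first assigned its most probable category, and the two rules are then applied to the resulting labels.

```python
from collections import Counter

CATEGORIES = ("normal", "vulgar", "pornographic")

def region_labels(category_matrix, categories=CATEGORIES):
    """Assign each sub-region the category with the highest probability."""
    labels = []
    for row in category_matrix:
        for probabilities in row:
            best = max(range(len(probabilities)), key=probabilities.__getitem__)
            labels.append(categories[best])
    return labels

def contains_category(category_matrix, target="pornographic"):
    """Rule 1: flag the image if any sub-region is assigned the target category."""
    return target in region_labels(category_matrix)

def majority_category(category_matrix):
    """Rule 2: take the category assigned to the largest number of sub-regions."""
    return Counter(region_labels(category_matrix)).most_common(1)[0][0]

# Hypothetical t x t x 3 category matrix with t = 2.
Z = [[[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]],
     [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]]

print(region_labels(Z))      # ['normal', 'normal', 'normal', 'pornographic']
print(contains_category(Z))  # True
print(majority_category(Z))  # normal
```

The example shows why the two rules can disagree on a picture-in-picture image: a single small pornographic sub-region triggers the first rule but is outvoted under the second.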
In one implementation of the embodiments of the present application, as shown in FIG. 4, S304 may be implemented through the following steps:
S401,对待分类图像按照预定比例进行放大。S401, the image to be classified is enlarged according to a predetermined ratio.
由于第二分类模型需要对待分类图像的子区域进行图像特征的提取,如果子区域较小会降低第二分类模型处理的准确度,所以为了保证第二分类模型输出结果的准确率,可以将待分类图像进行放大处理,例如,将待分类图像的长宽各放大K倍等,其中,K的具体值可以根据实际分类要求及待分类图像的大小等因素预先设定,在此不做具体限定。Since the second classification model needs to extract image features from the sub-regions of the image to be classified, if the sub-region is small, the accuracy of the second classification model processing will be reduced. Therefore, in order to ensure the accuracy of the output results of the second classification model, you can The classified image is enlarged, for example, the length and width of the image to be classified are enlarged by K times, etc., where the specific value of K can be preset according to the actual classification requirements and the size of the image to be classified, which is not specifically limited here .
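The enlargement in S401 can be sketched as follows. The text does not say which interpolation method is used, so this minimal example assumes nearest-neighbour upscaling by an integer factor; the function name `enlarge` and the nested-list image representation are illustrative assumptions only.

```python
def enlarge(image, k):
    """Nearest-neighbour upscaling of a 2-D grid of pixels by an integer factor k.

    Assumption: the interpolation method and integer K are illustrative
    choices; the source only states that length and width are scaled by K.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    out = []
    for row in image:
        # Repeat every pixel k times horizontally ...
        wide = [px for px in row for _ in range(k)]
        # ... and repeat the widened row k times vertically.
        out.extend([list(wide) for _ in range(k)])
    return out


small = [[1, 2],
         [3, 4]]
big = enlarge(small, 2)
for row in big:
    print(row)
```

With K = 2 a 2×2 grid becomes a 4×4 grid, so each sub-region seen by the model covers more pixels of the original content.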
S402: Input the enlarged image to be classified into the second classification model to obtain a category matrix.
The category matrix includes multiple groups of elements. Each group corresponds to one sub-region of the image to be classified, and each element in a group represents the probability that the sub-region belongs to one preset category.
After enlarging the image to be classified, the electronic device inputs the enlarged image into the second classification model, which extracts image features from the sub-regions of the enlarged image; feature extraction is more accurate as a result.
As can be seen, in this embodiment the electronic device can enlarge the image to be classified before inputting it into the second classification model, which allows the model to determine the category of the image more accurately.
S403: For each group of elements in the category matrix, determine the preset category corresponding to the element with the largest value as the category of the sub-region represented by that group.
Because each element of the category matrix is a probability vector made up of the probabilities that the corresponding sub-region belongs to each preset category, the electronic device can take the preset category corresponding to the largest value in each element as the category of the corresponding sub-region.
For example, if the element (t_a, t_b) of the category matrix corresponds to the three-dimensional probability vector (p1, p2, p3), the electronic device takes the preset category corresponding to the largest of p1, p2, and p3 as the category of the sub-region at (t_a, t_b).
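As a minimal sketch of S403, the per-sub-region decision is an argmax over each probability vector. The category names, their order, and the sample values in `Z` below are assumptions made for illustration.

```python
# Assumed ordering: index 0 = normal (p1), 1 = vulgar (p2), 2 = pornographic (p3).
CATEGORIES = ("normal", "vulgar", "pornographic")


def subregion_categories(Z):
    """Map a t x t grid of probability vectors to a t x t grid of category names."""
    return [
        # On ties, max() keeps the first index; the source does not address ties.
        [CATEGORIES[max(range(len(p)), key=p.__getitem__)] for p in row]
        for row in Z
    ]


# Hypothetical 2 x 2 x 3 category matrix (t = 2).
Z = [
    [(0.7, 0.2, 0.1), (0.1, 0.6, 0.3)],
    [(0.2, 0.3, 0.5), (0.9, 0.05, 0.05)],
]
labels = subregion_categories(Z)
print(labels)
```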
S404: Determine the pornographic level category of the image to be classified according to the category of each sub-region.
In this way, the electronic device can determine the category of every sub-region in the image to be classified, and then determine the pornographic level category of the image from those sub-region categories. In one implementation, if any sub-region belongs to the pornographic category, the image to be classified is assigned the pornographic category. Other rules are also possible, for example assigning the image the category that occurs most often among its sub-regions; any such rule is reasonable, and no specific limitation is made here.
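The two aggregation rules mentioned here can be sketched as below. The function names and sample labels are hypothetical, and how ties are broken in the majority rule is not specified in the text.

```python
from collections import Counter


def contains_pornographic(labels):
    """Rule 1: the image is pornographic if any sub-region is pornographic."""
    return any(c == "pornographic" for row in labels for c in row)


def majority_category(labels):
    """Rule 2: the image takes the most frequent sub-region category."""
    counts = Counter(c for row in labels for c in row)
    return counts.most_common(1)[0][0]


# Hypothetical 2 x 2 grid of sub-region categories.
labels = [["normal", "normal"],
          ["vulgar", "pornographic"]]
print(contains_pornographic(labels))
print(majority_category(labels))
```

On this sample the two rules disagree: the first flags the image because of a single sub-region, while the second reports the dominant category. That difference is why the choice of rule is left to the classification requirements.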
As can be seen, in this embodiment the second classification model outputs a multi-dimensional category matrix. The electronic device takes the preset category corresponding to the largest value in each element of the matrix as the category of the corresponding sub-region, and then determines the pornographic level category of the image to be classified from the sub-region categories. Because the second classification model extracts image features from the sub-regions and outputs a category matrix describing each one, the electronic device can accurately determine both the category of each sub-region and the pornographic level category of the image to be classified.
As an implementation of the embodiments of the present application, as shown in FIG. 5, S404 above may be implemented through the following steps:
S501: According to the category of each sub-region, determine, for each abnormal category, the ratio of the number of sub-regions in that category to the total number of sub-regions.
Once the sub-region categories are known, the electronic device can compute, for each abnormal category, the ratio of the number of sub-regions in that category to the total number of sub-regions. The abnormal categories may be the pornographic category, the vulgar category, the illegal category, and so on. For example, suppose there are 100 sub-regions in total, of which 35 are vulgar, 40 are pornographic, and the rest are normal. The ratio for the vulgar category is then 35/100 = 0.35, and the ratio for the pornographic category is 40/100 = 0.4.
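The ratio computation in S501 can be reproduced with a few lines. This sketch uses the counts from the example above; the function name and the flat list of labels are assumptions.

```python
from collections import Counter


def abnormal_ratios(labels, abnormal=("vulgar", "pornographic")):
    """Return, for each abnormal category, its share of all sub-regions."""
    counts = Counter(labels)
    total = len(labels)
    return {c: counts[c] / total for c in abnormal}


# 100 sub-regions: 35 vulgar, 40 pornographic, 25 normal.
labels = ["vulgar"] * 35 + ["pornographic"] * 40 + ["normal"] * 25
ratios = abnormal_ratios(labels)
print(ratios)
```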
S502: Determine whether each ratio is less than a preset threshold.
Having determined the ratios, the electronic device checks whether each one is less than its preset threshold. The thresholds for different abnormal categories may be the same or different, and can be set according to factors such as the actual classification requirements.
If the requirements for an abnormal category are strict, its preset threshold can be set lower; to exclude content of that category entirely, the threshold can be set to 0. If the requirements are looser, the threshold can be set higher.
S503: If every ratio is less than the preset threshold, determine that the pornographic level category of the image to be classified is the normal category.
If every ratio is less than the preset threshold, very few sub-regions of the image belong to abnormal categories, so the image to be classified can be assigned the normal category.
S504: If any ratio is greater than the preset threshold, compare the ratios and determine that the pornographic level category of the image to be classified is the category with the largest ratio.
If any ratio is greater than the preset threshold, the electronic device compares the ratios to decide which abnormal category the image belongs to: the image is assigned the category whose ratio is largest. For example, if the vulgar category has the largest ratio, there are more vulgar sub-regions than sub-regions of any other abnormal category, so the image to be classified is assigned the vulgar category.
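Steps S502 to S504 amount to a small decision rule, sketched here under stated assumptions. The text does not say what happens when a ratio exactly equals its threshold; this sketch treats that case as abnormal. The function name and threshold values are illustrative.

```python
def image_category(ratios, thresholds):
    """Decide the image-level category from per-category ratios.

    ratios / thresholds: dicts keyed by abnormal category name.
    Assumption: a ratio equal to its threshold counts as not below it.
    """
    # S503: every abnormal ratio is below its threshold -> normal.
    if all(ratios[c] < thresholds[c] for c in ratios):
        return "normal"
    # S504: otherwise pick the abnormal category with the largest ratio.
    return max(ratios, key=ratios.get)


ratios = {"vulgar": 0.35, "pornographic": 0.4}
strict = image_category(ratios, {"vulgar": 0.3, "pornographic": 0.3})
loose = image_category(ratios, {"vulgar": 0.5, "pornographic": 0.5})
print(strict, loose)
```

The same ratios lead to different results under strict and loose thresholds, which mirrors the remark that thresholds are tuned to the classification requirements.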
To detect whether an image contains vulgar or pornographic content, in an implementation of the embodiments of the present application the preset categories may include the normal, vulgar, and pornographic categories.
The electronic device can determine, from the sub-region categories, a first ratio of the number of vulgar sub-regions to the total number of sub-regions and a second ratio of the number of pornographic sub-regions to the total number of sub-regions, and then use the two ratios to decide whether the image to be classified is normal, vulgar, or pornographic. For picture-in-picture images, the electronic device can thus accurately identify vulgar and pornographic images, improving classification accuracy and efficiency.
Besides being obtained by modifying a trained model as described above, the second classification model may also be trained directly in advance. As an implementation of the embodiments of the present application, as shown in FIG. 6, training the second classification model may include the following steps:
S601: Obtain a neural network model and a second data set.
The neural network model includes a feature extraction part, a non-global average pooling layer, and a convolutional layer. The second data set includes multiple groups of data, each consisting of a second image and a corresponding second label; the second label is assigned manually and indicates the pornographic level category of the second image. The second data set can be split into a training set and a test set at a ratio of K:1, and the training set is used to train the neural network model.
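A K:1 split of the second data set could look like the following sketch. Shuffling before the split, the fixed seed, and the placeholder file names are assumptions; the text only states the ratio.

```python
import random


def split_dataset(samples, k, seed=0):
    """Split samples into (train, test) at a ratio of roughly k:1.

    Assumption: samples are shuffled first with a fixed seed.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = len(shuffled) // (k + 1)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical (image, label) pairs; labels 0/1/2 stand for the three categories.
data = [(f"img_{i}.jpg", i % 3) for i in range(100)]
train, test = split_dataset(data, k=4)
print(len(train), len(test))
```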
S602: Obtain a second image from the second data set and input it into the neural network model; the feature extraction part, the non-global average pooling layer, and the convolutional layer are applied in sequence to produce a category matrix.
After the second data set is obtained, a second image can be taken from it and input into the neural network model, which processes the image and produces an output. The output is a category matrix that represents the category of the second image.
S603: Input the category matrix into a preset loss function to obtain a probability distribution vector.
Inputting the category matrix into the loss function yields the probability distribution vector p.
Figure PCTCN2020092898-appb-000004
Here the vector X has size n, p_i and X_i are the i-th elements of p and X respectively, i ∈ (1, n), and n is the number of pornographic level categories.
For example, if the pornographic level categories are the normal, vulgar, and pornographic categories, then n = 3. Suppose the category vector X corresponding to the second image T is {1, 3, 6}; the probability vector corresponding to T is then
Figure PCTCN2020092898-appb-000005
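The formula that maps X to p is available only as an image reference. Given that p is a probability distribution and the loss is later computed as -log(p_y), a softmax is a plausible reading; the sketch below works through the example X = {1, 3, 6} under that assumption and is not confirmed by the text.

```python
import math


def softmax(x):
    """Assumed mapping from category scores X to probabilities p."""
    m = max(x)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


p = softmax([1, 3, 6])
print([round(v, 3) for v in p])  # approximately [0.006, 0.047, 0.946]
```

Under this reading the largest score dominates, so the third category (pornographic in the example ordering) receives most of the probability mass.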
S604: Determine the loss according to the probability distribution vector and the second label corresponding to the second image.
Next, the electronic device computes the loss L for the second image as L = -log(p_y), where p_y is the element of the image's probability vector p that corresponds to its second label.
For example, if the pornographic level categories are the normal, vulgar, and pornographic categories and the second label of the second image is the pornographic category, then p_y is the element p_3 of the probability vector p for that image.
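The loss L = -log(p_y) can be checked numerically. In this sketch the probability vector is made up for illustration, and labels are 0-based indices, so the pornographic category (p_3 in the text) is index 2.

```python
import math


def loss(p, y):
    """Negative log-probability of the labeled category (natural logarithm)."""
    return -math.log(p[y])


p = [0.1, 0.2, 0.7]              # hypothetical probability vector
L_correct = loss(p, 2)           # label = pornographic, p_y = 0.7
L_wrong = loss(p, 0)             # label = normal, p_y = 0.1
print(round(L_correct, 3), round(L_wrong, 3))
```

A confident correct prediction gives a small loss, while a low probability on the labeled category gives a large one.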
S605: Determine whether the loss function corresponding to the loss has converged. If it has not, execute S606; if it has, training is complete and the second classification model is obtained.
S606: Update the network parameters of the neural network model according to the loss, and return to S602-S605.
The loss L decreases as p_y increases. Updating the network parameters so that L becomes smaller and smaller therefore makes p_y larger and larger, approaching 1; the probability that the neural network model assigns to the labeled category approaches 1, and the classification results become more and more accurate.
Specifically, the backpropagation algorithm can be used to compute the derivative of the loss L with respect to the network parameters, $\frac{\partial L}{\partial W}$ (Figure PCTCN2020092898-appb-000006), where W denotes the network parameters. The network parameters are then updated with the stochastic gradient descent algorithm; that is, the new network parameters are computed as
$$W^{*} = W - \alpha \frac{\partial L}{\partial W}$$
(Figure PCTCN2020092898-appb-000007)
Here, W* denotes the new network parameters and α is a preset adjustment parameter whose value can be set according to factors such as the training requirements and the required accuracy of the neural network model, for example 0.001, 0.0015, or 0.002; no specific limitation is made here.
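Assuming the update takes the usual stochastic-gradient-descent form, in which each parameter moves against its gradient by a step of size α, one update step can be sketched as follows. The parameter and gradient values are invented for illustration; in practice the gradients come from backpropagation.

```python
def sgd_step(weights, grads, alpha=0.001):
    """Return updated parameters: each weight minus alpha times its gradient."""
    return [w - alpha * g for w, g in zip(weights, grads)]


W = [0.5, -1.0]          # hypothetical current parameters
dL_dW = [2.0, -4.0]      # hypothetical gradients of the loss
W_new = sgd_step(W, dL_dW, alpha=0.001)
print(W_new)
```

With α = 0.001 each parameter moves only slightly per step, which is why training repeats S602 to S606 many times before the loss converges.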
Whether the neural network model meets the requirements for use is determined by checking whether the loss function has converged. If it has, the accuracy of the model's output meets the requirements and the model can classify images accurately, so training can stop and the second classification model is obtained.
As can be seen, in this embodiment the training process ensures that the second classification model produces accurate output. The second classification model is deeper and can extract more precise image features, giving better classification results, strong generalization, and high robustness.
The first classification model has the same structure as the picture-in-picture image classification model: a feature extraction part, a global average pooling layer, and an output layer; only the classification output differs. For its training, refer to the training process of the picture-in-picture image classification model shown in FIG. 2: the backpropagation algorithm and the stochastic gradient descent algorithm are used to update the model weights until convergence. The details are not repeated here.
Corresponding to the above method for recognizing picture-in-picture images, an embodiment of the present application also provides an apparatus for recognizing picture-in-picture images.
The apparatus for recognizing picture-in-picture images provided by the embodiments of the present application is described below.
As shown in FIG. 7, an apparatus for recognizing picture-in-picture images may include:
an obtaining module 710, configured to obtain an image to be recognized;
a picture-in-picture image recognition module 720, configured to analyze the image to be recognized based on a picture-in-picture image recognition model and determine whether the image is a picture-in-picture image, where the picture-in-picture image recognition model is obtained through machine learning training on a first data set, the first data set includes multiple groups of data, each group includes a first image and a corresponding first label, and the first label indicates whether the first image is a picture-in-picture image.
As an implementation of the embodiments of the present application, the picture-in-picture image recognition module 720 may include:
a first recognition unit, configured to input the image to be recognized into the picture-in-picture image recognition model to obtain a first output result, where the first output result is a two-dimensional vector consisting of a first component and a second component, the first component represents the probability that the image to be recognized is a picture-in-picture image, and the second component represents the probability that it is not;
a first judgment unit, configured to determine whether the first component is greater than the second component and, if so, determine that the image to be recognized is a picture-in-picture image.
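The comparison performed by the first judgment unit is simple enough to state in code. The function name and sample outputs below are hypothetical.

```python
def is_picture_in_picture(output):
    """output: (p_pip, p_not_pip), the two components of the first output result."""
    p_pip, p_not_pip = output
    # The image is treated as picture-in-picture only if the first
    # component is strictly greater than the second.
    return p_pip > p_not_pip


print(is_picture_in_picture((0.8, 0.2)))
print(is_picture_in_picture((0.3, 0.7)))
```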
With the embodiments of the present application, after the image to be recognized is obtained, it is analyzed based on the picture-in-picture image recognition model to determine whether it is a picture-in-picture image. The model is obtained through machine learning training on a first data set, which includes multiple groups of data, each consisting of a first image and a corresponding first label indicating whether the first image is a picture-in-picture image. The picture-in-picture image recognition model is thus a neural network model that outputs, end to end, whether the image to be recognized is a picture-in-picture image, realizing automatic recognition of picture-in-picture images.
Corresponding to the above training method for the picture-in-picture image recognition model, an embodiment of the present application also provides a training apparatus for the picture-in-picture image recognition model.
The training apparatus for the picture-in-picture image recognition model provided by the embodiments of the present application is described below.
As shown in FIG. 8, a training apparatus for a picture-in-picture image recognition model may include:
an obtaining module 810, configured to obtain a pre-created initial neural network model;
a training module 820, configured to train the initial neural network model based on a first data set to obtain the picture-in-picture image recognition model, where the first data set includes multiple groups of data, each group includes a first image and a corresponding first label, and the first label indicates whether the first image is a picture-in-picture image.
As an implementation of the embodiments of the present application, the training module 820 may include:
a second recognition unit, configured to obtain a first image from the first data set and input it into the initial neural network model to obtain a second output result, where the second output result is a two-dimensional vector consisting of a third component and a fourth component, the third component represents the probability that the first image is a picture-in-picture image, and the fourth component represents the probability that it is not;
a loss calculation unit, configured to determine a loss according to the second output result and the first label corresponding to the first image, where the loss represents the difference between the second output result and the first label;
a weight update unit, configured to update the weight parameters of the initial neural network model according to the loss;
a convergence condition judgment unit, configured to send a stop instruction to the second recognition unit when the loss function corresponding to the loss converges, so that the second recognition unit stops inputting first images from the first data set into the initial neural network model.
With the embodiments of the present application, an initial neural network model is constructed and trained on the first data set to obtain the picture-in-picture image recognition model. The trained model is a neural network model that outputs, end to end, whether the image to be recognized is a picture-in-picture image.
Corresponding to the above image pornography level classification method, an embodiment of the present application also provides an image pornography level classification apparatus.
The image pornography level classification apparatus provided by the embodiments of the present application is described below.
As shown in FIG. 9, an image pornography level classification apparatus may include:
an obtaining module 910, configured to obtain an image to be classified;
a picture-in-picture image recognition module 920, configured to analyze the image to be classified based on a picture-in-picture image recognition model and determine whether the image is a picture-in-picture image, where the picture-in-picture image recognition model is obtained through machine learning training on a first data set, the first data set includes multiple groups of data, each group includes a first image and a corresponding first label, and the first label indicates whether the first image is a picture-in-picture image;
a first category determining module 930, configured to, when the image to be classified is not a picture-in-picture image, analyze it based on a first classification model and determine its pornographic level category, where the first classification model is obtained through machine learning training on a second data set, the second data set includes multiple groups of data, each group includes a second image and a corresponding second label, and the second label indicates the pornographic level category of the second image;
a second category determining module 940, configured to, when the image to be classified is a picture-in-picture image, analyze it based on a second classification model and determine its pornographic level category, where the second classification model is obtained through machine learning training on the second data set followed by changing the global average pooling layer and the convolution kernel of the trained model.
As an implementation of the embodiments of the present application, the first classification model includes a feature extraction part, a global average pooling layer, and an output layer. The feature extraction part extracts features from the image to be classified; the global average pooling layer performs a global average pooling operation on the extracted features to obtain the global features of the image; and the output layer applies a fully connected operation to the global features to obtain a category vector used to determine the pornographic level category of the image.
As an implementation of the embodiments of the present application, the global average pooling layer in the second classification model is obtained by setting to "no" the parameter that controls whether pooling is global in the global average pooling layer of the model trained on the second data set; the convolutional layer in the second classification model is obtained by replacing the output layer of that trained model with a convolutional layer whose kernel size is 1×1.
As an implementation of the embodiments of the present application, the second classification model includes a feature extraction part, a non-global average pooling layer, and a convolutional layer. The feature extraction part extracts features from the image to be classified; the non-global average pooling layer pools the extracted features to obtain a pooling result; and the convolutional layer convolves the pooling result to obtain a category matrix used to determine the pornographic level category of the image.
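To illustrate why non-global pooling followed by a 1×1 convolution yields a category matrix rather than a single category vector, here is a dependency-free sketch. The feature-map values, pooling window, stride, and weights are invented; the output holds raw scores, and any conversion to probabilities is not shown.

```python
def avg_pool(fmap, size, stride):
    """Non-global average pooling over an H x W x C feature map (nested lists)."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = []
    for i in range(0, h - size + 1, stride):
        row = []
        for j in range(0, w - size + 1, stride):
            window = [fmap[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append([sum(v[ch] for v in window) / len(window)
                        for ch in range(c)])
        out.append(row)
    return out


def conv1x1(grid, weights, bias):
    """1x1 convolution: the same linear map (n_classes x C) at every position."""
    return [[[sum(w * x for w, x in zip(wk, cell)) + b
              for wk, b in zip(weights, bias)]
             for cell in row]
            for row in grid]


# Hypothetical 4 x 4 feature map with 2 channels.
fmap = [[[float(i), float(j)] for j in range(4)] for i in range(4)]
pooled = avg_pool(fmap, size=2, stride=2)            # 2 x 2 x 2
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]       # 3 categories x 2 channels
Z = conv1x1(pooled, weights, bias=[0.0, 0.0, 0.0])   # 2 x 2 x 3
print(len(Z), len(Z[0]), len(Z[0][0]))
```

Because a 1×1 convolution applies one weight matrix at every spatial position, it can reuse the shape of a fully connected output layer, which is consistent with replacing the output layer of the trained model by a 1×1 convolutional layer. Global pooling would collapse the grid to a single position and give back a single category vector.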
作为本申请实施例的一种实施方式,第二类别确定模块940,可以包括:As an implementation manner of the embodiment of the present application, the second category determining module 940 may include:
图像放大单元,用于对待分类图像按照预定比例进行放大;The image enlargement unit is used to enlarge the image to be classified according to a predetermined ratio;
类别概率生成单元,用于将放大后的待分类图像输入第二分类模型,得到类别矩阵,其中,类别矩阵包括多组元素,每组元素对应待分类图像的一个子区域,每组元素中的每个元素代表子区域对应的一个预设类别的概率;The category probability generating unit is used to input the enlarged image to be classified into the second classification model to obtain a category matrix, where the category matrix includes multiple groups of elements, each group of elements corresponds to a sub-region of the image to be classified, and the Each element represents the probability of a preset category corresponding to the subregion;
子区域类别确认单元,用于将类别矩阵的每组元素中值最大的元素对应的预设类别,确定为该组元素代表的待分类图像的子区域的类别;The sub-region category confirmation unit is used to determine the preset category corresponding to the element with the largest value in each group of elements of the category matrix as the category of the sub-region of the image to be classified represented by the group of elements;
色情等级类别确认单元,用于根据每个子区域的类别,确定待分类图像的色情等级类别。The pornographic level category confirmation unit is used to determine the pornographic level category of the image to be classified according to the category of each subregion.
作为本申请实施例的一种实施方式,第二类别确定模块940,可以包括:As an implementation manner of the embodiment of the present application, the second category determining module 940 may include:
比值确认单元,用于根据每个子区域的类别,分别确定属于各异常类别的子区域的数量与子区域总数量的比值;The ratio confirmation unit is used to determine the ratio of the number of sub-regions belonging to each abnormal category to the total number of sub-regions according to the category of each sub-region;
阈值判断单元,用于分别判断各比值是否小于预设阈值;如果各比值均小于预设阈值,则确定待分类图像的色情等级类别为正常类别;如果存在大 于预设阈值的比值,则比较各比值的大小,确定待分类图像的色情等级类别为比值最大的子区域的类别。The threshold judgment unit is used to judge whether each ratio is less than a preset threshold; if each ratio is less than a preset threshold, determine that the pornographic level category of the image to be classified is a normal category; if there is a ratio greater than the preset threshold, compare each The size of the ratio determines the pornographic level category of the image to be classified as the category of the sub-region with the largest ratio.
As an implementation of the embodiment of the present application, the pornographic level categories of the second image include a normal category, a vulgar category, and a pornographic category.
By applying the embodiment of the present application, an image to be classified is obtained and analyzed based on the picture-in-picture image recognition model to determine whether it is a picture-in-picture image. If the image to be classified is not a picture-in-picture image, it is analyzed based on the first classification model to determine its pornographic level category; if it is a picture-in-picture image, it is analyzed based on the second classification model to determine its pornographic level category. The picture-in-picture image recognition model is obtained through machine learning training using a first data set; the first data set includes multiple sets of data, each set including a first image and a corresponding first label, and the first label indicates whether the first image is a picture-in-picture image. The first classification model is obtained through machine learning training using a second data set; the second data set includes multiple sets of data, each set including a second image and a corresponding second label, and the second label indicates the pornographic level category of the second image. The second classification model is obtained by training on the second data set through machine learning and then modifying the global average pooling layer and the convolution kernel of the trained model.
Because the second classification model differs from the first classification model in its modified global average pooling layer and convolution kernel, it can accurately identify the pornographic level category of an image to be classified that is a picture-in-picture image, thereby achieving accurate identification of picture-in-picture images that contain vulgar or pornographic content.
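As a non-limiting illustration, the routing between the two classification models might be sketched as follows. The three models are represented by hypothetical callables, and the name `classify_image` and the two-component output of the picture-in-picture model are assumptions made for this sketch.

```python
from typing import Any, Callable, Tuple

# Hypothetical model interfaces, assumed for illustration only.
PipModel = Callable[[Any], Tuple[float, float]]  # returns (p_pip, p_not_pip)
Classifier = Callable[[Any], str]                # returns a pornographic level category


def classify_image(image: Any, pip_model: PipModel,
                   first_model: Classifier, second_model: Classifier) -> str:
    """Route the image to the first or the second classification model."""
    p_pip, p_not_pip = pip_model(image)
    if p_pip > p_not_pip:
        # Picture-in-picture image: use the second classification model.
        return second_model(image)
    # Ordinary image: use the first classification model.
    return first_model(image)
```

Keeping the picture-in-picture decision separate from the classifiers means either classifier can be replaced without touching the routing logic.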
An embodiment of the present application also provides an electronic device. As shown in FIG. 10, the electronic device may include a processor 1001, a communication interface 1002, a memory 1003, and a communication bus 1004, where the processor 1001, the communication interface 1002, and the memory 1003 communicate with one another through the communication bus 1004;
The memory 1003 is configured to store a computer program;
The processor 1001 is configured to, when executing the program stored in the memory 1003, implement the method for recognizing a picture-in-picture image, the method for training a picture-in-picture image recognition model, or the image pornographic level classification method provided by any of the above embodiments.
It can be seen that, in the solution provided by the embodiment of the present application, the electronic device can obtain an image to be classified and analyze it based on the picture-in-picture image recognition model to determine whether it is a picture-in-picture image. If the image to be classified is not a picture-in-picture image, it is analyzed based on the first classification model to determine its pornographic level category; if it is a picture-in-picture image, it is analyzed based on the second classification model to determine its pornographic level category. The picture-in-picture image recognition model is obtained through machine learning training using a first data set; the first data set includes multiple sets of data, each set including a first image and a corresponding first label, and the first label indicates whether the first image is a picture-in-picture image. The first classification model is obtained through machine learning training using a second data set; the second data set includes multiple sets of data, each set including a second image and a corresponding second label, and the second label indicates the pornographic level category of the second image.
The second classification model is obtained by training on the second data set through machine learning and then modifying the global average pooling layer and the convolution kernel of the trained model. Because the second classification model differs from the first classification model in this way, it can accurately identify the pornographic level category of an image to be classified that is a picture-in-picture image, thereby achieving accurate identification of picture-in-picture images that contain vulgar or pornographic content.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include random access memory (RAM) and may also include non-volatile memory (NVM), such as at least one disk storage. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
An embodiment of the present application also provides a computer-readable storage medium in which a computer program is stored. When the computer program is executed by a processor, it implements the method for recognizing a picture-in-picture image, the method for training a picture-in-picture image recognition model, or the image pornographic level classification method provided by any of the above embodiments.
An embodiment of the present application also provides an application program configured to execute, at runtime, the method for recognizing a picture-in-picture image, the method for training a picture-in-picture image recognition model, or the image pornographic level classification method provided by any of the above embodiments.
It should be noted that the above apparatus, electronic device, computer-readable storage medium, and application program embodiments are described relatively briefly because they are substantially similar to the method embodiments; for related details, refer to the corresponding descriptions of the method embodiments.
It should be further noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a..." does not exclude the existence of additional identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in an interrelated manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments.
The above are only preferred embodiments of the present application and are not intended to limit its protection scope. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application falls within its protection scope.

Claims (22)

  1. A method for recognizing a picture-in-picture image, characterized in that the method comprises:
    obtaining an image to be recognized;
    analyzing the image to be recognized based on a picture-in-picture image recognition model to determine whether the image to be recognized is a picture-in-picture image, wherein the picture-in-picture image recognition model is obtained through machine learning training using a first data set, the first data set includes multiple sets of data, each set of data includes a first image and a corresponding first label, and the first label is used to indicate whether the first image is a picture-in-picture image.
  2. The method according to claim 1, characterized in that the analyzing the image to be recognized based on a picture-in-picture image recognition model to determine whether the image to be recognized is a picture-in-picture image comprises:
    inputting the image to be recognized into the picture-in-picture image recognition model to obtain a first output result, wherein the first output result is a two-dimensional vector including a first component and a second component, the first component represents the probability that the image to be recognized is a picture-in-picture image, and the second component represents the probability that the image to be recognized is not a picture-in-picture image;
    if the first component is greater than the second component, determining that the image to be recognized is a picture-in-picture image.
  3. A method for training a picture-in-picture image recognition model, characterized in that the method comprises:
    obtaining a pre-created initial neural network model;
    training the initial neural network model based on a first data set to obtain the picture-in-picture image recognition model, wherein the first data set includes multiple sets of data, each set of data includes a first image and a corresponding first label, and the first label is used to indicate whether the first image is a picture-in-picture image.
  4. The method according to claim 3, characterized in that the training the initial neural network model based on the first data set to obtain the picture-in-picture image recognition model comprises:
    obtaining a first image from the first data set and inputting the first image into the initial neural network model to obtain a second output result, wherein the second output result is a two-dimensional vector including a third component and a fourth component, the third component represents the probability that the first image is a picture-in-picture image, and the fourth component represents the probability that the first image is not a picture-in-picture image;
    determining a loss amount according to the second output result and the first label corresponding to the first image, wherein the loss amount represents the difference between the second output result and the first label;
    updating the weight parameters in the initial neural network model according to the loss amount;
    obtaining another first image from the first data set and inputting it into the initial neural network model with the updated weight parameters, and repeating the above steps to iteratively update the weight parameters of the initial neural network model until the loss function corresponding to the loss amount converges, thereby obtaining the picture-in-picture image recognition model.
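As a non-limiting illustration, the iterative training of claim 4 might be sketched as follows. To keep the sketch self-contained, a two-output linear softmax model over precomputed feature vectors stands in for the neural network; the cross-entropy loss, the learning rate, and the convergence test (change in mean loss below a tolerance) are all assumptions, since the claim does not fix them.

```python
import math
import random
from typing import List, Sequence, Tuple

# (image features, first label): label 1 means picture-in-picture, 0 means not.
Sample = Tuple[Sequence[float], int]


def forward(weights: List[List[float]], bias: List[float],
            features: Sequence[float]) -> List[float]:
    """Return [p_pip, p_not_pip] as a softmax over two linear scores."""
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(dataset: Sequence[Sample], learning_rate: float = 0.5,
          tolerance: float = 1e-4, max_epochs: int = 1000, seed: int = 0):
    """Update the weights sample by sample until the mean loss stops changing."""
    rng = random.Random(seed)
    size = len(dataset[0][0])
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(size)] for _ in range(2)]
    bias = [0.0, 0.0]
    previous_loss = float("inf")
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for features, label in dataset:
            probs = forward(weights, bias, features)
            target = [1.0, 0.0] if label == 1 else [0.0, 1.0]
            # Loss amount: cross-entropy between the output and the label.
            epoch_loss -= sum(t * math.log(max(p, 1e-12))
                              for t, p in zip(target, probs))
            # For softmax with cross-entropy, d(loss)/d(score_k) = p_k - t_k.
            for k in range(2):
                grad = probs[k] - target[k]
                for j in range(size):
                    weights[k][j] -= learning_rate * grad * features[j]
                bias[k] -= learning_rate * grad
        epoch_loss /= len(dataset)
        if abs(previous_loss - epoch_loss) < tolerance:
            break  # assumed convergence criterion
        previous_loss = epoch_loss
    return weights, bias
```

In practice the forward pass and the weight update would be provided by a deep learning framework; the loop structure (forward pass, loss, update, repeat until convergence) is the part that corresponds to the claim.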
  5. An image pornographic level classification method, characterized in that the method comprises:
    obtaining an image to be classified;
    analyzing the image to be classified based on a picture-in-picture image recognition model to determine whether the image to be classified is a picture-in-picture image, wherein the picture-in-picture image recognition model is obtained through machine learning training using a first data set, the first data set includes multiple sets of data, each set of data includes a first image and a corresponding first label, and the first label is used to indicate whether the first image is a picture-in-picture image;
    if the image to be classified is not a picture-in-picture image, analyzing the image to be classified based on a first classification model to determine the pornographic level category of the image to be classified, wherein the first classification model is obtained through machine learning training using a second data set, the second data set includes multiple sets of data, each set of data includes a second image and a corresponding second label, and the second label is used to indicate the pornographic level category of the second image;
    if the image to be classified is a picture-in-picture image, analyzing the image to be classified based on a second classification model to determine the pornographic level category of the image to be classified, wherein the second classification model is obtained by training on the second data set through machine learning and then modifying the global average pooling layer and the convolution kernel of the trained model.
  6. The method according to claim 5, characterized in that the first classification model includes a feature extraction part, a global average pooling layer, and an output layer; the feature extraction part is used to extract features from the image to be classified; the global average pooling layer is used to perform a global average pooling operation on the features extracted by the feature extraction part to obtain global features of the image to be classified; and the output layer is used to perform fully connected processing on the global features to obtain a category vector used to determine the pornographic level category of the image to be classified.
  7. The method according to claim 5, characterized in that:
    the global average pooling layer in the second classification model is obtained by taking the global average pooling layer of the model trained on the second data set and changing its parameter that indicates whether pooling is global to "no";
    the convolutional layer in the second classification model is obtained by changing the output layer of the model trained on the second data set into a convolutional layer with a kernel size of 1×1.
  8. The method according to claim 5, characterized in that the second classification model includes a feature extraction part, a non-global average pooling layer, and a convolutional layer; the feature extraction part is used to extract features from the image to be classified; the non-global average pooling layer is used to perform a pooling operation on the features extracted by the feature extraction part to obtain a pooling result; and the convolutional layer is used to perform a convolution operation on the pooling result to obtain a category matrix used to determine the pornographic level category of the image to be classified.
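As a non-limiting illustration, the effect of replacing global average pooling and a fully connected output layer with non-global average pooling and a 1×1 convolution might be sketched as follows. The pooling window and stride, the reuse of the fully connected weights as the 1×1 kernel, and the function names are assumptions made for this sketch. The outputs are raw scores; normalizing them over the categories (for example with a softmax, also an assumption) would turn them into probabilities.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def average_pool(features: FeatureMap, window: int, stride: int) -> FeatureMap:
    """Non-global average pooling with a square window."""
    rows, cols = len(features[0]), len(features[0][0])
    pooled = []
    for channel in features:
        out = []
        for r in range(0, rows - window + 1, stride):
            out_row = []
            for c in range(0, cols - window + 1, stride):
                total = sum(channel[r + i][c + j]
                            for i in range(window) for j in range(window))
                out_row.append(total / (window * window))
            out.append(out_row)
        pooled.append(out)
    return pooled


def conv1x1(pooled: FeatureMap, weights: List[List[float]],
            bias: List[float]) -> FeatureMap:
    """Apply the same per-category weights at every spatial position.

    `weights[k]` holds one weight per input channel for category k, which is
    the shape of a fully connected layer that followed global pooling.
    Returns scores indexed as [category][row][column].
    """
    rows, cols = len(pooled[0]), len(pooled[0][0])
    return [
        [
            [sum(w * pooled[ch][r][c] for ch, w in enumerate(kernel)) + b
             for c in range(cols)]
            for r in range(rows)
        ]
        for kernel, b in zip(weights, bias)
    ]
```

When the pooling window covers the whole feature map, the result collapses to a single position and matches global average pooling followed by the fully connected layer; on a larger feature map each position yields the scores of one sub-region, which is what makes a category matrix possible.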
  9. The method according to claim 5, characterized in that the analyzing the image to be classified based on a second classification model to determine the pornographic level category of the image to be classified comprises:
    enlarging the image to be classified according to a predetermined ratio;
    inputting the enlarged image to be classified into the second classification model to obtain a category matrix, wherein the category matrix includes multiple groups of elements, each group of elements corresponds to one sub-region of the image to be classified, and each element in a group represents the probability that the sub-region belongs to one preset category;
    determining, for each group of elements in the category matrix, the preset category corresponding to the element with the largest value as the category of the sub-region of the image to be classified that the group represents;
    determining the pornographic level category of the image to be classified according to the category of each sub-region.
  10. The method according to claim 9, characterized in that the determining the pornographic level category of the image to be classified according to the category of each sub-region comprises:
    determining, according to the category of each sub-region, the ratio of the number of sub-regions belonging to each abnormal category to the total number of sub-regions;
    judging whether each ratio is less than a preset threshold;
    if every ratio is less than the preset threshold, determining that the pornographic level category of the image to be classified is the normal category;
    if any ratio is greater than the preset threshold, comparing the ratios and determining that the pornographic level category of the image to be classified is the category of the sub-regions with the largest ratio.
  11. The method according to any one of claims 5-10, characterized in that the pornographic level categories of the second image include a normal category, a vulgar category, and a pornographic category.
  12. An apparatus for recognizing a picture-in-picture image, characterized in that the apparatus comprises:
    an acquisition module, configured to obtain an image to be recognized;
    a picture-in-picture image recognition module, configured to analyze the image to be recognized based on a picture-in-picture image recognition model to determine whether the image to be recognized is a picture-in-picture image, wherein the picture-in-picture image recognition model is obtained through machine learning training using a first data set, the first data set includes multiple sets of data, each set of data includes a first image and a corresponding first label, and the first label is used to indicate whether the first image is a picture-in-picture image.
  13. The apparatus according to claim 12, characterized in that the picture-in-picture image recognition module comprises:
    a first recognition unit, configured to input the image to be recognized into the picture-in-picture image recognition model to obtain a first output result, wherein the first output result is a two-dimensional vector including a first component and a second component, the first component represents the probability that the image to be recognized is a picture-in-picture image, and the second component represents the probability that the image to be recognized is not a picture-in-picture image;
    a first judgment unit, configured to judge whether the first component is greater than the second component and, if so, determine that the image to be recognized is a picture-in-picture image.
  14. An apparatus for training a picture-in-picture image recognition model, characterized in that the apparatus comprises:
    an acquisition module, configured to obtain a pre-created initial neural network model;
    a training module, configured to train the initial neural network model based on a first data set to obtain the picture-in-picture image recognition model, wherein the first data set includes multiple sets of data, each set of data includes a first image and a corresponding first label, and the first label is used to indicate whether the first image is a picture-in-picture image.
  15. The apparatus according to claim 14, characterized in that the training module comprises:
    a second recognition unit, configured to obtain a first image from the first data set and input the first image into the initial neural network model to obtain a second output result, wherein the second output result is a two-dimensional vector including a third component and a fourth component, the third component represents the probability that the first image is a picture-in-picture image, and the fourth component represents the probability that the first image is not a picture-in-picture image;
    a loss calculation unit, configured to determine a loss amount according to the second output result and the first label corresponding to the first image, wherein the loss amount represents the difference between the second output result and the first label;
    a weight update unit, configured to update the weight parameters in the initial neural network model according to the loss amount;
    a convergence condition judgment unit, configured to send a stop instruction to the second recognition unit when the loss function corresponding to the loss amount converges, so that the second recognition unit stops inputting first images from the first data set into the initial neural network model.
  16. An image pornographic level classification apparatus, characterized in that the apparatus comprises:
    an acquisition module, configured to obtain an image to be classified;
    a picture-in-picture image recognition module, configured to analyze the image to be classified based on a picture-in-picture image recognition model to determine whether the image to be classified is a picture-in-picture image, wherein the picture-in-picture image recognition model is obtained through machine learning training using a first data set, the first data set includes multiple sets of data, each set of data includes a first image and a corresponding first label, and the first label is used to indicate whether the first image is a picture-in-picture image;
    a first category determining module, configured to, when the image to be classified is not a picture-in-picture image, analyze the image to be classified based on a first classification model to determine the pornographic level category of the image to be classified, wherein the first classification model is obtained through machine learning training using a second data set, the second data set includes multiple sets of data, each set of data includes a second image and a corresponding second label, and the second label is used to indicate the pornographic level category of the second image;
    a second category determining module, configured to, when the image to be classified is a picture-in-picture image, analyze the image to be classified based on a second classification model to determine the pornographic level category of the image to be classified, wherein the second classification model is obtained by training on the second data set through machine learning and then modifying the global average pooling layer and the convolution kernel of the trained model.
  17. The apparatus according to claim 16, characterized in that:
    the global average pooling layer in the second classification model is obtained by taking the global average pooling layer of the model trained on the second data set and changing its parameter that indicates whether pooling is global to "no";
    the convolutional layer in the second classification model is obtained by changing the output layer of the model trained on the second data set into a convolutional layer with a kernel size of 1×1.
  18. The apparatus according to claim 16, characterized in that the second category determining module comprises:
    an image enlargement unit, configured to enlarge the image to be classified according to a predetermined ratio;
    a category probability generating unit, configured to input the enlarged image to be classified into the second classification model to obtain a category matrix, wherein the category matrix includes multiple groups of elements, each group of elements corresponds to one sub-region of the image to be classified, and each element in a group represents the probability that the sub-region belongs to one preset category;
    a sub-region category confirmation unit, configured to determine, for each group of elements in the category matrix, the preset category corresponding to the element with the largest value as the category of the sub-region of the image to be classified that the group represents;
    a pornographic level category confirmation unit, configured to determine the pornographic level category of the image to be classified according to the category of each sub-region.
  19. The device according to claim 18, wherein the second category determining module comprises:
    a ratio confirmation unit, configured to determine, according to the category of each sub-region, the ratio of the number of sub-regions belonging to each abnormal category to the total number of sub-regions; and
    a threshold judgment unit, configured to judge whether each ratio is less than a preset threshold; if all the ratios are less than the preset threshold, determine that the pornographic level category of the image to be classified is the normal category; and if any ratio is greater than the preset threshold, compare the ratios and determine that the pornographic level category of the image to be classified is the category of the sub-regions with the largest ratio.
  20. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
    the memory is configured to store a computer program; and
    the processor is configured to implement the method according to any one of claims 1-2, 3-4, or 5-11 when executing the program stored in the memory.
  21. A computer-readable storage medium, wherein, when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to execute the method according to any one of claims 1-2, 3-4, or 5-11.
  22. An application program, configured to execute, when run, the method according to any one of claims 1-2, 3-4, or 5-11.
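
The processing recited in claims 17 to 19 can be illustrated with a minimal sketch. It is not the applicant's implementation: the category names, the function names, the toy weights, and the threshold values are all assumptions chosen for illustration, and the image enlargement and the non-global average pooling are omitted. The sketch shows an output layer applied as a 1×1 convolution (so that every sub-region gets its own score vector), the per-sub-region selection of the most probable category, and the ratio-versus-threshold decision. The claims do not say how a ratio exactly equal to the threshold is handled; the sketch treats it as not normal.

```python
import math
from collections import Counter

# Hypothetical labels: the claims only distinguish a normal category
# from one or more abnormal categories.
CATEGORIES = ["normal", "sexy", "porn"]
NORMAL = "normal"


def conv1x1(feature_map, weights, bias):
    """Apply the same linear map at every spatial position (a 1x1 convolution).

    feature_map: H x W grid of length-C feature vectors.
    weights: K rows of length C, bias: length K.
    Returns an H x W grid of length-K score vectors.
    """
    return [
        [
            [sum(w * x for w, x in zip(row, vec)) + b
             for row, b in zip(weights, bias)]
            for vec in line
        ]
        for line in feature_map
    ]


def softmax(scores):
    """Turn a score vector into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def subregion_categories(category_matrix):
    """For each sub-region, pick the category with the largest probability."""
    return [
        CATEGORIES[max(range(len(probs)), key=probs.__getitem__)]
        for line in category_matrix
        for probs in line
    ]


def image_category(labels, threshold):
    """Decide the image-level category from the per-sub-region labels."""
    counts = Counter(labels)
    ratios = {c: counts[c] / len(labels) for c in CATEGORIES if c != NORMAL}
    if all(r < threshold for r in ratios.values()):
        return NORMAL
    # Assumption: a ratio equal to the threshold also lands here.
    return max(ratios, key=ratios.get)


# Toy input: a 2 x 2 grid of sub-regions with 2-dimensional features.
feature_map = [[[1.0, 0.0], [0.0, 1.0]],
               [[0.0, 1.0], [0.0, 0.0]]]
weights = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]  # one row per category
bias = [1.0, 0.0, 0.0]

scores = conv1x1(feature_map, weights, bias)
category_matrix = [[softmax(s) for s in line] for line in scores]
labels = subregion_categories(category_matrix)
result = image_category(labels, threshold=0.3)
print(labels, result)
```

With these toy values, two of the four sub-regions fall into the same abnormal category, so its ratio of 0.5 exceeds the assumed threshold of 0.3 and determines the image-level category; raising the threshold above 0.5 would make the same image normal.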
PCT/CN2020/092898 2019-05-31 2020-05-28 Image recognition method and apparatus, image classification method and apparatus, electronic device, and storage medium WO2020239015A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910469236.2A CN110163300B (en) 2019-05-31 2019-05-31 Image classification method and device, electronic equipment and storage medium
CN201910469236.2 2019-05-31

Publications (1)

Publication Number Publication Date
WO2020239015A1 (en) 2020-12-03

Family

ID=67630464

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/092898 WO2020239015A1 (en) 2019-05-31 2020-05-28 Image recognition method and apparatus, image classification method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN110163300B (en)
WO (1) WO2020239015A1 (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163300B (en) * 2019-05-31 2021-04-23 北京金山云网络技术有限公司 Image classification method and device, electronic equipment and storage medium
CN110781834A (en) * 2019-10-28 2020-02-11 上海眼控科技股份有限公司 Traffic abnormality image detection method, device, computer device and storage medium
CN110909803B (en) * 2019-11-26 2023-04-18 腾讯科技(深圳)有限公司 Image recognition model training method and device and computer readable storage medium
CN111104874B (en) * 2019-12-03 2024-02-20 北京金山云网络技术有限公司 Face age prediction method, training method and training device for model, and electronic equipment
CN112926608A (en) * 2019-12-05 2021-06-08 北京金山云网络技术有限公司 Image classification method and device, electronic equipment and storage medium
CN113033545B (en) * 2019-12-24 2023-11-03 同方威视技术股份有限公司 Empty tray identification method and device
CN111291819B (en) * 2020-02-19 2023-09-15 腾讯科技(深圳)有限公司 Image recognition method, device, electronic equipment and storage medium
CN111695453B (en) * 2020-05-27 2024-02-09 深圳市优必选科技股份有限公司 Drawing recognition method and device and robot
CN111767959B (en) * 2020-06-30 2023-10-31 创新奇智(广州)科技有限公司 Plush fiber classifying method and device
CN111898658B (en) * 2020-07-15 2023-03-24 北京字节跳动网络技术有限公司 Image classification method and device and electronic equipment
CN114065826A (en) * 2020-07-28 2022-02-18 紫东信息科技(苏州)有限公司 Construction method, classification method and device of image classification model and electronic equipment
CN112598016A (en) * 2020-09-17 2021-04-02 北京小米松果电子有限公司 Image classification method and device, communication equipment and storage medium
CN112348083A (en) * 2020-11-06 2021-02-09 北京钠纬智能科技有限公司 Image classification method and device
CN113705686B (en) * 2021-08-30 2023-09-15 平安科技(深圳)有限公司 Image classification method, device, electronic equipment and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070064983A1 (en) * 2005-09-16 2007-03-22 Wen-Chen Huang Method for automatically detecting nasal tumor
CN101137011A (en) * 2006-08-29 2008-03-05 索尼株式会社 Image processing apparatus, image processing method and computer program
CN107330453A (en) * 2017-06-19 2017-11-07 中国传媒大学 The Pornographic image recognizing method of key position detection is recognized and merged based on substep
CN108520229A (en) * 2018-04-04 2018-09-11 北京旷视科技有限公司 Image detecting method, device, electronic equipment and computer-readable medium
CN110163300A (en) * 2019-05-31 2019-08-23 北京金山云网络技术有限公司 A kind of image classification method, device, electronic equipment and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8358837B2 (en) * 2008-05-01 2013-01-22 Yahoo! Inc. Apparatus and methods for detecting adult videos
CN105654059A (en) * 2015-12-31 2016-06-08 武汉鸿瑞达信息技术有限公司 Detection method for vulgar adult content of Internet video image
CN107871314B (en) * 2016-09-23 2022-02-18 商汤集团有限公司 Sensitive image identification method and device
CN107341518A (en) * 2017-07-07 2017-11-10 东华理工大学 A kind of image classification method based on convolutional neural networks
CN108154134B (en) * 2018-01-11 2019-07-23 天格科技(杭州)有限公司 Pornographic image detection method is broadcast live in internet based on depth convolutional neural networks
CN108764374B (en) * 2018-06-11 2022-07-19 杭州网易智企科技有限公司 Image classification method, system, medium, and electronic device
CN109101523A (en) * 2018-06-14 2018-12-28 北京搜狗科技发展有限公司 A kind of image processing method, device and electronic equipment
CN109145979B (en) * 2018-08-15 2022-06-21 上海嵩恒网络科技股份有限公司 Sensitive image identification method and terminal system

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733912A (en) * 2020-12-31 2021-04-30 华侨大学 Fine-grained image recognition method based on multi-grained countermeasure loss
CN112733912B (en) * 2020-12-31 2023-06-09 华侨大学 Fine granularity image recognition method based on multi-granularity countering loss
CN114760484A (en) * 2021-01-08 2022-07-15 腾讯科技(深圳)有限公司 Live video identification method and device, computer equipment and storage medium
CN114760484B (en) * 2021-01-08 2023-11-07 腾讯科技(深圳)有限公司 Live video identification method, live video identification device, computer equipment and storage medium
CN112837345B (en) * 2021-01-29 2023-12-08 北京农业智能装备技术研究中心 Method and system for detecting deposition distribution of plant canopy liquid medicine
CN112837345A (en) * 2021-01-29 2021-05-25 北京农业智能装备技术研究中心 Method and system for detecting deposition distribution of liquid medicine of plant canopy
CN112949693B (en) * 2021-02-02 2024-04-26 北京嘀嘀无限科技发展有限公司 Training method of image classification model, image classification method, device and equipment
CN112949693A (en) * 2021-02-02 2021-06-11 北京嘀嘀无限科技发展有限公司 Training method of image classification model, image classification method, device and equipment
CN113239804B (en) * 2021-05-13 2023-06-02 杭州睿胜软件有限公司 Image recognition method, readable storage medium, and image recognition system
CN113239804A (en) * 2021-05-13 2021-08-10 杭州睿胜软件有限公司 Image recognition method, readable storage medium, and image recognition system
CN113344102A (en) * 2021-06-23 2021-09-03 昆山星际舟智能科技有限公司 Target image identification method based on image HOG characteristics and ELM model
CN113344102B (en) * 2021-06-23 2023-07-25 昆山星际舟智能科技有限公司 Target image recognition method based on image HOG features and ELM model
CN113744161A (en) * 2021-09-16 2021-12-03 北京顺势兄弟科技有限公司 Enhanced data acquisition method and device, data enhancement method and electronic equipment
CN113744161B (en) * 2021-09-16 2024-03-29 北京顺势兄弟科技有限公司 Enhanced data acquisition method and device, data enhancement method and electronic equipment
CN115827880B (en) * 2023-02-10 2023-05-16 之江实验室 Business execution method and device based on emotion classification
CN115827880A (en) * 2023-02-10 2023-03-21 之江实验室 Service execution method and device based on emotion classification
CN116910296B (en) * 2023-09-08 2023-12-08 上海任意门科技有限公司 Method, system, electronic device and medium for identifying transport content
CN116910296A (en) * 2023-09-08 2023-10-20 上海任意门科技有限公司 Method, system, electronic device and medium for identifying transport content
CN117245672A (en) * 2023-11-20 2023-12-19 南昌工控机器人有限公司 Intelligent motion control system and method for modularized assembly of camera support
CN117245672B (en) * 2023-11-20 2024-02-02 南昌工控机器人有限公司 Intelligent motion control system and method for modularized assembly of camera support

Also Published As

Publication number Publication date
CN110163300B (en) 2021-04-23
CN110163300A (en) 2019-08-23

Similar Documents

Publication Publication Date Title
WO2020239015A1 (en) Image recognition method and apparatus, image classification method and apparatus, electronic device, and storage medium
CN108921206B (en) Image classification method and device, electronic equipment and storage medium
CN108898086B (en) Video image processing method and device, computer readable medium and electronic equipment
CN109583489B (en) Defect classification identification method and device, computer equipment and storage medium
JP7051267B2 (en) Image detection methods, equipment, electronic equipment, storage media, and programs
WO2020253127A1 (en) Facial feature extraction model training method and apparatus, facial feature extraction method and apparatus, device, and storage medium
CN109308463B (en) Video target identification method, device and equipment
CN109002766B (en) Expression recognition method and device
WO2019020049A1 (en) Image retrieval method and apparatus, and electronic device
CN109145299A (en) Text similarity determination method, device, equipment and storage medium
WO2020056999A1 (en) Picture recommendation method and apparatus, computer device, and storage medium
CN110135505B (en) Image classification method and device, computer equipment and computer readable storage medium
US8542912B2 (en) Determining the uniqueness of a model for machine vision
JP2022512065A (en) Image classification model training method, image processing method and equipment
CN109598298B (en) Image object recognition method and system
US8542905B2 (en) Determining the uniqueness of a model for machine vision
CN112232506A (en) Network model training method, image target recognition method, device and electronic equipment
CN115797735A (en) Target detection method, device, equipment and storage medium
CN114998679A (en) Online training method, device and equipment for deep learning model and storage medium
CN112784494A (en) Training method of false positive recognition model, target recognition method and device
CN110880018B (en) Convolutional neural network target classification method
CN111340140A (en) Image data set acquisition method and device, electronic equipment and storage medium
CN116645719A (en) Pupil and iris positioning method and device, electronic equipment and storage medium
CN115661564A (en) Training method and device of image processing model, electronic equipment and storage medium
CN113344994B (en) Image registration method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20813587

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20813587

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 30.03.2022)
