WO2019101021A1 - Image recognition method and apparatus, and electronic device (图像识别方法、装置及电子设备)

Info

Publication number: WO2019101021A1
Authority: WO - WIPO (PCT)
Prior art keywords: image, target, candidate region, layer, target candidate
Application number: PCT/CN2018/116044
Other languages: English (en), French (fr)
Inventors: 李峰, 左小祥, 陈家君, 李昊沅, 曾维亿
Original Assignee: 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2019101021A1 publication Critical patent/WO2019101021A1/zh

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/24 - Aligning, centring, orientation detection or correction of the image

Definitions

  • The embodiments of the present application relate to the field of machine learning technologies, and in particular, to an image recognition method, apparatus, and electronic device.
  • Image recognition technology refers to the technique of recognizing objects included in an image, and is a common method of image processing.
  • The terminal first uses a sample set to train a Convolutional Neural Network (CNN) to obtain an image recognition model, then inputs the image to be recognized into the trained image recognition model, and the image recognition model recognizes the image and outputs the recognition result.
  • The embodiments of the present application provide an image recognition method, apparatus, and electronic device, which can be used to solve the problem in the related art that a recognition error or an unrecognizable situation occurs when the object to be identified occupies a small proportion of the image. The technical solutions are as follows:
  • In a first aspect, an embodiment of the present application provides an image recognition method, applied to an electronic device, where the method includes: detecting a target candidate region in a target image by using an image detection model, where the target candidate region is an image block that includes a target object; extracting the target candidate region when the target candidate region is detected from the target image; and performing image recognition based on the target candidate region by using an image recognition model to obtain a recognition result of the target image.
  • an embodiment of the present application provides an image recognition apparatus, which is applied to an electronic device, and the apparatus includes:
  • An image detecting module configured to detect, by using an image detection model, a target candidate region in the target image, where the target candidate region is an image block that includes the target object;
  • a region extracting module configured to extract the target candidate region when the target candidate region is detected from the target image
  • An image recognition module is configured to perform image recognition based on the target candidate region by using an image recognition model to obtain a recognition result of the target image.
  • Optionally, the image detecting module is configured to: acquire the probability that each pixel in the target image belongs to the target object; and determine the target candidate region according to the probability corresponding to each pixel, where the target candidate region includes pixels whose probability is greater than a preset threshold.
  • Optionally, the image detecting module is further configured to: acquire, according to the probability corresponding to each pixel, an image block that meets a first preset condition and determine it as a target image block; and determine, as the target candidate region, a rectangular area that includes the target image block and meets a second preset condition.
  • Optionally, the image recognition module is configured to: perform feature extraction on the target candidate region by using the image recognition model to obtain image features of the target candidate region; determine, according to the image features, a first probability distribution of the target object over a plurality of recognition results; and determine the recognition result corresponding to the maximum value in the first probability distribution as the recognition result of the target image.
  • Optionally, the image recognition module is configured to: preprocess the target candidate region to obtain a processed target candidate region whose resolution reaches a preset resolution; perform feature extraction on the processed target candidate region; determine, according to the image features of the processed target candidate region, a second probability distribution of the target object over the plurality of recognition results; and determine the recognition result corresponding to the maximum value in the second probability distribution as the recognition result of the target image.
  • Optionally, the image detection model includes an input layer, a convolution layer, a pooling layer, an up-convolution layer, a concatenation layer, a normalization layer, and an output layer;
  • the input layer is used to input the target image;
  • a convolution layer is configured to convert the target image into a feature map;
  • the pooling layer is configured to perform a pooling process on a feature map output by the convolution layer to reduce a number of features in the feature map;
  • the up-convolution layer is configured to perform an up-convolution operation on the feature map output by the convolution layer;
  • the concatenation layer is configured to concatenate the feature maps processed by the pooling layer and the up-convolution layer to obtain a concatenated feature map;
  • the normalization layer is configured to perform normalization processing on the concatenated feature map to obtain location information of the target candidate region; and the output layer is configured to output the location information of the target candidate region.
  • Optionally, the image recognition model includes an input layer, a convolution layer, a pooling layer, a normalization layer, and an output layer. The input layer is used to input the target candidate region; the convolution layer is used to convert the target candidate region into a feature map; the pooling layer is configured to perform pooling processing on the feature map to reduce the number of features in the feature map; the normalization layer is used to perform normalization processing on the feature map processed by the convolution layer and the pooling layer to obtain the recognition result; and the output layer is configured to output the recognition result.
  • the device further includes:
  • a ratio acquisition module configured to acquire a ratio of the target candidate area to the target image
  • the image recognition module is further configured to directly perform the step of performing image recognition by using the image recognition model to obtain the recognition result of the target image, if the ratio is greater than a preset threshold.
  • the device further includes:
  • a first acquiring module configured to acquire a first training sample set, where the first training sample set includes a plurality of first training samples, and each of the first training samples is marked with an area including the target object and/or an area not including the target object;
  • the first training module is configured to train the convolutional neural network CNN by using the first training sample set to obtain the image detection model.
  • the device further includes:
  • a second acquiring module configured to acquire a second training sample set, where the second training sample set includes a plurality of second training samples, and each of the second training samples corresponds to a recognition result;
  • the second training module is configured to train the convolutional neural network CNN by using the second training sample set to obtain the image recognition model.
  • An embodiment of the present application provides an electronic device, where the electronic device includes a processor and a memory, the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the image recognition method of the first aspect.
  • An embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the image recognition method of the first aspect.
  • An embodiment of the present application provides a computer program product, which, when executed, is used to perform the image recognition method of the first aspect.
  • In the technical solutions provided by the embodiments of the present application, the target candidate region that may contain the target object is detected by the image detection model and extracted, and the image recognition model then performs recognition based on the extracted target candidate region to obtain the recognition result. Because the electronic device extracts from the image a target candidate region in which the target object occupies a large proportion, recognizing the target candidate region with the image recognition model avoids the recognition errors or failures that occur in the related art when the target object occupies a small proportion of the image, and improves the success rate of image recognition.
  • FIG. 1 is a flowchart of an image recognition method illustrated by an exemplary embodiment of the present application.
  • FIG. 2 is a schematic diagram of the embodiment shown in FIG. 1;
  • FIG. 3 is a schematic diagram of a first training sample shown in an exemplary embodiment of the present application.
  • FIG. 4 is a schematic diagram of a detection process illustrated by an exemplary embodiment of the present application.
  • FIG. 5 is a schematic diagram of a second training sample set shown in an exemplary embodiment of the present application.
  • FIG. 6 is a schematic diagram of an identification process illustrated by an exemplary embodiment of the present application.
  • FIG. 7 is a flowchart of an image recognition method illustrated by another exemplary embodiment of the present application.
  • FIG. 8 is a schematic diagram of an interface for pattern recognition according to an exemplary embodiment of the present application.
  • FIG. 9 is a schematic diagram of an interface for pattern recognition according to an exemplary embodiment of the present application.
  • FIG. 10 is a block diagram showing the structure of an image recognition apparatus according to an exemplary embodiment of the present application.
  • FIG. 11 is a block diagram showing the structure of an image recognition apparatus according to another embodiment of the present application.
  • FIG. 12 is a block diagram showing the structure of an electronic device according to an exemplary embodiment of the present application.
  • When image recognition is performed by a model in the related art, the model generally divides an image into a plurality of regions according to the degree of interest, learns features from the regions with a higher degree of interest, and determines the recognition result based on the learned features. When the object to be identified occupies a small proportion of the image, the probability that the region containing it is selected by the model as a region of interest is low; the model then performs image recognition based on regions other than the region containing the object to be identified, and a recognition error or an unrecognizable situation may occur.
  • Therefore, the embodiments of the present application provide an image recognition method, apparatus, and electronic device: the target candidate region that may contain the target object is first detected in the image by the image detection model and extracted, and the image recognition model then performs recognition based on the extracted target candidate region.
  • The method provided by the embodiments of the present application may be performed by an electronic device having image processing capability. The electronic device may be a terminal such as a personal computer, a mobile phone, or a tablet computer, or may be a server.
  • FIG. 1 shows a flowchart of an image recognition method according to an embodiment of the present application.
  • the method can include the following steps:
  • Step 101: An image detection model is used to detect a target candidate region in the target image.
  • the target candidate area is an image block containing the target object.
  • The target object refers to the object to be identified in the target image, and may be a face, an object, a gesture, or the like; this is not limited in this embodiment of the present application.
  • the target image is an image to be detected, which may be a picture or a certain frame image in the video.
  • the image detection model is used to detect whether a target object is included in the target image.
  • the image detection model is further configured to detect a rough area of the target object in the target image, that is, a target candidate area.
  • the image detection model is a model obtained by training the CNN. The training process and network architecture for the image detection model will be described in the following examples.
  • step 101 may include the following sub-steps:
  • Step 101a using an image detection model to acquire a probability that each pixel in the target image belongs to the target object;
  • The image detection model can perform feature extraction on each pixel in the target image and match the feature extraction result corresponding to each pixel against a preset image feature; the degree of matching between the feature extraction result and the preset image feature can be used to measure the probability that the pixel corresponding to the feature extraction result belongs to the target object. The greater the degree of matching, the greater the probability that the pixel belongs to the target object; the smaller the degree of matching, the smaller that probability.
  • The preset image feature may be an image feature corresponding to the pixels constituting the target object, and may be obtained after the image detection model is trained.
  • Optionally, a probability matrix may be used to represent the above probabilities. The probabilities included in the probability matrix correspond one-to-one with the pixels included in the target image; for example, the value in the fourth row and third column of the probability matrix indicates the probability corresponding to the pixel in the fourth row and third column of the target image, as in the toy example below.
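  • As an illustration only (the 5x5 values below are invented for the example and are not from the patent), such a probability matrix can be held in a numpy array and indexed exactly as described above:

```python
import numpy as np

# probs[i, j] is the probability that the pixel in row i, column j of the
# target image belongs to the target object (made-up example values).
probs = np.array([
    [0.1, 0.1, 0.2, 0.1, 0.1],
    [0.1, 0.6, 0.8, 0.7, 0.1],
    [0.2, 0.8, 0.9, 0.8, 0.2],
    [0.1, 0.7, 0.9, 0.8, 0.1],
    [0.1, 0.1, 0.2, 0.1, 0.1],
])

# The value in the fourth row and third column (1-based, as in the text)
# is the probability for the pixel in the fourth row and third column of
# the target image.
print(probs[3, 2])  # 0.9
```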
  • Step 101b: Determine the target candidate region according to the probability corresponding to each pixel, where the target candidate region includes pixels whose probability is greater than a preset threshold.
  • The preset threshold may be determined according to the image recognition model's requirement on the proportion of the target object in the image to be recognized. When the image recognition model requires the target object to occupy a larger proportion of the image, the preset threshold is also larger. Illustratively, the preset threshold is 0.7.
  • Optionally, the terminal performs binarization processing on the probability matrix: probabilities greater than or equal to the preset threshold are set to 1, and probabilities less than the preset threshold are set to 0. In this way, the probabilities greater than or equal to the preset threshold are distinguished from those less than the preset threshold.
  • Optionally, determining the target candidate region may be performed as follows: acquire, according to the probability corresponding to each pixel, an image block that meets a first preset condition and determine it as a target image block, where the first preset condition is to include more than a preset number of contiguous target pixels, a target pixel being a pixel whose probability is greater than the preset threshold; and determine, as the target candidate region, a rectangular area that includes the target image block and meets a second preset condition, where the second preset condition is that the proportion of the target image block in the rectangular area is greater than a preset ratio.
  • the preset number, the preset threshold, and the preset ratio may be set according to actual requirements, which is not limited in this embodiment of the present application.
  • Optionally, the second preset condition may also be that the proportion of the target image block is maximized, that is, the rectangular area is the smallest rectangular area that includes the target image block. In this way, the proportion of the target object in the target candidate region is as large as possible, which improves both the efficiency and the accuracy of subsequent recognition by the image recognition model; a sketch of this step follows.
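  • The following is a minimal sketch of step 101b under the assumptions above: binarize the probability matrix with the 0.7 example threshold, then take the smallest rectangle enclosing the surviving pixels as the target candidate region. The connected-component check of the first preset condition (more than a preset number of contiguous target pixels) is omitted for brevity:

```python
import numpy as np

def candidate_region(probs, threshold=0.7):
    """Return ((left, top, right, bottom), ratio) for the smallest rectangle
    enclosing all target pixels, or None if no pixel passes the threshold."""
    binary = (probs >= threshold).astype(np.uint8)  # binarization: 1 / 0
    rows, cols = np.nonzero(binary)
    if rows.size == 0:
        return None                                 # no target candidate region
    top, bottom = rows.min(), rows.max() + 1
    left, right = cols.min(), cols.max() + 1
    # Proportion of target pixels inside the rectangle (second preset condition).
    ratio = binary[top:bottom, left:right].mean()
    return (left, top, right, bottom), ratio
```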
  • Referring to FIG. 2, a schematic diagram of the embodiment of FIG. 1 is shown. After the image detection model 11 detects the input target image 10, the target image 10 marked with the target candidate region 12 is output.
  • Step 102 When the target candidate region is detected from the target image, the target candidate region is extracted.
  • Extracting the target candidate region from the target image means cropping the target candidate region out of the target image.
  • the terminal extracts the target candidate region 12 from the target image 10.
  • Optionally, when the proportion of the target candidate region in the target image is large, the terminal can directly recognize the target image without performing step 102, that is, without extracting the target candidate region from the target image. Therefore, before step 102, the terminal can obtain the ratio of the target candidate region to the target image: if the ratio is greater than a preset threshold, step 103 is performed directly; if the ratio is less than or equal to the preset threshold, step 102 is performed.
  • The preset threshold may be determined according to the recognition accuracy of the image recognition model; illustratively, the preset threshold is 30%. In this way, the time required to extract the target candidate region can be saved, improving the efficiency of image recognition.
  • Step 103 Perform image recognition based on the target candidate region by using an image recognition model to obtain a recognition result of the target image.
  • The recognition result of the target image refers to the classification to which the target object included in the target image belongs. For example, if the target image is an image including a gesture, the recognition result of the target image refers to the classification to which the gesture belongs.
  • Image recognition models are used to identify targets and classify them.
  • the image recognition model is also a model obtained by training the CNN. The training process and network architecture for the image recognition model will be explained in the following examples.
  • In a possible implementation, the target candidate region may be recognized directly, or the target candidate region may be preprocessed first and the processed target candidate region then recognized. The two approaches are described separately below.
  • step 103 may include the following sub-steps:
  • Step 103a performing feature extraction on the target candidate region by using an image recognition model to obtain image features of the target candidate region;
  • Step 103b Determine, according to image features of the target candidate region, a first probability distribution of the target object in the target candidate region in the plurality of recognition results;
  • step 103c the recognition result corresponding to the maximum value in the first probability distribution is determined as the recognition result of the target image.
  • The first probability distribution refers to the probability that the target object in the target candidate region belongs to each of the plurality of recognition results. For example, if the probability that the target object belongs to the gesture "Good" is 0.95 and the probability that it belongs to the gesture "Yeah" is 0.05, the electronic device determines the gesture "Good" as the recognition result of the target image, as in the toy sketch below.
  • Step 103 may include the following sub-steps:
  • Step 103d Perform pre-processing on the target candidate region to obtain the processed target candidate region, and the resolution of the processed target candidate region reaches a preset resolution;
  • The preset resolution is the resolution that the image recognition model requires of the image to be recognized; illustratively, the preset resolution is 440*360. If the resolution of the input does not meet this requirement, the image recognition model has to handle resolution conversion during recognition, which requires more computation and takes longer. By converting the resolution of the image to be recognized to the required resolution in advance, the workload of subsequent image recognition is reduced, saving time and improving the efficiency of image recognition.
  • the terminal first acquires the resolution of the target candidate region, and then performs resolution improvement processing on the resolution of the target candidate region, and the resolution of the processed target candidate region reaches a preset resolution.
  • The algorithm used for the resolution enhancement may be a nearest-neighbor interpolation algorithm, a bilinear interpolation algorithm, a cubic convolution interpolation algorithm, or the like, which is not limited in this embodiment of the present application; a sketch follows.
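  • A minimal sketch of the preprocessing in step 103d using OpenCV's resize (an assumption for illustration; the patent names the interpolation algorithms but no library). The 440*360 example resolution is used, and any of the listed algorithms can be selected via the interpolation flag:

```python
import cv2

def preprocess(region, width=440, height=360):
    """Upscale the extracted target candidate region to the preset resolution.
    INTER_NEAREST = nearest-neighbor, INTER_LINEAR = bilinear,
    INTER_CUBIC = cubic convolution interpolation."""
    return cv2.resize(region, (width, height), interpolation=cv2.INTER_CUBIC)
```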
  • Step 103e performing feature extraction on the processed target candidate region by using an image recognition model to obtain an image feature of the processed target candidate region;
  • Step 103f Determine, according to the image feature of the processed target candidate region, a second probability distribution of the target object in the target candidate region in the plurality of recognition results;
  • step 103g the recognition result corresponding to the maximum value in the second probability distribution is determined as the recognition result of the target image.
  • Steps 103e to 103g are similar to steps 103a to 103c and are not described herein again.
  • With reference to FIG. 2, the image recognition model 13 recognizes the target candidate region 12 and outputs the recognition result 14 of the target image 10, which is the gesture "Good" shown in the figure, that is, a thumbs-up gesture.
  • In summary, in the method provided by this embodiment of the present application, the image detection model first detects the target candidate region that may contain the target object in the image, the target candidate region is extracted, and the image recognition model then performs recognition based on the extracted target candidate region to obtain the recognition result. Recognizing the extracted target candidate region with the image recognition model avoids recognition failures or errors when the target object occupies a small proportion of the image, improving the success rate of image recognition.
  • In addition, because each sub-network (that is, the image detection model and the image recognition model) can be flexibly reused or replaced, combinations of models can provide different optimization preferences for different users. For example, for users with higher accuracy requirements, the image recognition model can be optimized to obtain more accurate image recognition results.
  • the training process of the image detection model is as follows: the first training sample set is acquired, and the CNN is trained by using the first training sample set to obtain an image detection model.
  • the first training sample set includes a plurality of first training samples.
  • the number of first training samples included in the first training sample set may be determined according to actual needs.
  • Each first training sample is marked with an area including the target object and/or an area not including the target object.
  • the process of marking the first training sample can be done manually. Referring to FIG. 3, a schematic diagram of a first training sample 20 shown in an exemplary embodiment of the present application is shown.
  • the first training sample 20 includes a contour 21 composed of black lines, the inside of the contour 21 is an area including a target object, and the outside of the contour 21 is an area not including the target object.
  • In different first training samples, the proportion of the target object may be the same or different. For example, the proportion of the target object in first training sample A is 0.3, and the proportion of the target object in first training sample B is 0.6. Likewise, the types of target objects included in different first training samples may be the same or different. For example, the target object included in first training sample A is the gesture "Good", and the target object included in first training sample B is the gesture "Yeah".
  • Optionally, the CNN may be an AlexNet network, a VGG-16 network, or the like.
  • The algorithm used to train the CNN to obtain the image detection model may be a Region-based Convolutional Neural Network (RCNN) algorithm, a Faster RCNN algorithm, or the like.
  • the embodiments of the present application do not specifically limit the CNN and the algorithm for training the CNN.
  • the image detection model may also be tested using the first test sample set.
  • The first test sample set includes a plurality of first test samples, each of which corresponds to a test result. After inputting a first test sample into the image detection model, the terminal checks whether the detection result output by the image detection model matches the test result corresponding to the test sample, so as to determine whether the image detection model has been trained to the set accuracy.
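  • For illustration, a generic supervised training loop is sketched below in PyTorch. It assumes each first training sample is an (image, mask) pair whose mask marks the area containing the target object; the RCNN and Faster RCNN algorithms named above add region-proposal machinery that this sketch does not reproduce:

```python
import torch
from torch.utils.data import DataLoader

def train_detection_model(model, train_set, epochs=10, lr=1e-3):
    """model: maps an image batch to a per-pixel probability map in [0, 1].
    train_set: yields (image, mask) pairs; mask is 1 inside the marked area."""
    loader = DataLoader(train_set, batch_size=16, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.BCELoss()  # per-pixel "target / not target" loss
    for _ in range(epochs):
        for images, masks in loader:
            optimizer.zero_grad()
            probs = model(images)          # probability each pixel is the target
            loss = loss_fn(probs, masks)
            loss.backward()
            optimizer.step()
    return model
```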
  • the network architecture of the image detection model is described below.
  • The image detection model includes an input layer, a convolution layer, a pooling layer, an up-convolution layer, a concatenation layer, a normalization layer, and an output layer.
  • The embodiment of the present application does not limit the number of layers included in the image detection model. Generally speaking, the more layers the image detection model has, the better the effect but the longer the computation time. In practical applications, an image detection model with an appropriate number of layers can be designed according to the requirements for detection accuracy and efficiency.
  • the input layer is used to input the target image.
  • a convolution layer is used to convert the target image into a feature map.
  • Specifically, the convolution layer is used to perform a convolution operation on the target image, the output of the activation layer, the output of the pooling layer, or the output of the concatenation layer.
  • the purpose of the convolution operation is to extract image features and map the input data to the feature space.
  • Each convolution layer is used to perform one or more convolution operations.
  • The input data of each convolution layer is determined by the position of the convolution layer in the image detection model: when the convolution layer is the first layer, its input data is the target image; when the convolution layer follows an activation layer, its input data is the output data of the activation layer; when the convolution layer follows a pooling layer, its input data is the output data of the pooling layer; and when the convolution layer follows a concatenation layer, its input data is the output data of the concatenation layer.
  • the pooling layer is used to pool the feature map outputted by the convolutional layer to reduce the number of features in the feature map.
  • the pooling process can be a maximum pooling process or a mean pooling process. Among them, the function of the maximum pooling operation is to reduce the size of the feature map and increase the receptive field of the next layer.
  • The receptive field is the size of the area on the original image onto which a pixel on the feature map output by each layer of the image detection model is mapped.
  • the input data of the pooling layer is usually the output data of the convolution layer, and the output data of the pooling layer is usually the input data of the convolution layer.
  • The up-convolution layer is used to perform an up-convolution operation on the feature map output by the convolution layer. The effect of the up-convolution operation is to increase the size of the feature map and map the learned features to a larger size. The input data of the up-convolution layer is usually the output data of the activation layer, and the output data of the up-convolution layer is usually the input data of the concatenation layer.
  • The concatenation layer is used to concatenate the feature maps processed by the pooling layer and the up-convolution layer to obtain a concatenated feature map. The function of the concatenation operation is to splice different feature maps together, facilitating the fusion of information from different feature dimensions so that more robust features can be learned. The input data of the concatenation layer is usually the output data of the pooling layer and the output data of the up-convolution layer, and the output data of the concatenation layer is usually the input data of the convolution layer.
  • The normalization layer is used to normalize the concatenated feature map to obtain the location information of the target candidate region. The function of the normalization process is to obtain the probability that each pixel in the concatenated feature map belongs to the target object and to determine the location information of the target candidate region according to these probabilities.
  • Optionally, the image detection model may further include an activation layer. The activation layer may be located before the pooling layer and/or the up-convolution layer and is used to perform an activation operation on the output of the convolution layer. Since the feature space obtained by the convolution operation is limited, the activation operation processes the feature space so that it can represent more features. The input data of the activation layer is usually the output data of the convolution layer. The output data of the activation layer is determined by the position of the activation layer in the image detection model: when the activation layer is the last layer in the image detection model, its output data is the target image marked with the target candidate region.
  • Referring to FIG. 4, a schematic diagram of a detection process shown in an exemplary embodiment of the present application is shown (only the convolution layers, activation layers, pooling layers, up-convolution layers, and concatenation layers are shown).
  • 1 represents a convolution operation
  • 2 represents an activation operation
  • 3 represents a maximum pooling operation
  • 4 represents an up-convolution operation
  • 5 represents a concatenation operation
  • the leftmost rectangle represents the target image, and the rightmost rectangle represents the target image marked with the target candidate region; the other rectangles represent multi-channel feature maps. The height of a rectangle indicates the size of the feature map: the larger the feature map, the taller the rectangle. The thickness of a rectangle indicates the number of channels of the feature map: the more channels, the thicker the rectangle.
  • The black rectangles indicate copies of the output data of an activation layer, and the rectangles concatenated with the black rectangles indicate the output data of an up-convolution layer.
  • In FIG. 4, the image detection model performs a total of 15 convolution operations, 15 activation operations, 3 maximum pooling operations, 3 up-convolution operations, and 3 concatenation operations, that is, the image detection model includes 15 convolution layers, 15 activation layers, 3 pooling layers, 3 up-convolution layers, and 3 concatenation layers.
  • The layers in the image detection model are connected from left to right in the order in which the operations in FIG. 4 are performed, and the input end of each concatenation layer is connected to both an up-convolution layer and an activation layer. The input data of the first convolution layer is the target image, the input data of each subsequent layer is the output data of the previous layer (except that the input data of a concatenation layer is the output data of an activation layer together with the output data of an up-convolution layer), and the output data of the last activation layer is the target image marked with the target candidate region. A simplified sketch of such a network is given below.
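  • The following PyTorch sketch shows the down/up structure described above (convolution, activation, and pooling on the way down; up-convolution, concatenation, and a final per-pixel normalization on the way up). It uses far fewer layers than the fifteen convolutions of FIG. 4, and the channel widths are invented for the example:

```python
import torch
import torch.nn as nn

class DetectionNet(nn.Module):
    """Simplified U-Net-style detection network (illustrative only)."""
    def __init__(self, in_ch=3):
        super().__init__()
        def block(cin, cout):  # convolution layer followed by activation layer
            return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU())
        self.down1 = block(in_ch, 16)
        self.down2 = block(16, 32)
        self.pool = nn.MaxPool2d(2)  # pooling layer: shrink the feature map
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)  # up-convolution layer
        self.fuse = block(32, 16)    # convolution layer after the concatenation
        self.head = nn.Conv2d(16, 1, 1)  # per-pixel score for the target object

    def forward(self, x):
        f1 = self.down1(x)               # full-resolution feature map
        f2 = self.down2(self.pool(f1))   # pooled, deeper feature map
        u = self.up(f2)                  # map learned features to a larger size
        cat = torch.cat([u, f1], dim=1)  # concatenation layer: fuse feature maps
        # Normalization: each pixel gets a probability of belonging to the
        # target object, from which the candidate region's location is derived.
        return torch.sigmoid(self.head(self.fuse(cat)))
```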
  • the training process of the image recognition model is as follows: the second training sample set is acquired, and the second training sample set is used to train the convolutional neural network CNN to obtain an image recognition model.
  • the second training sample set includes a plurality of second training samples.
  • The number of second training samples included in the second training sample set may be determined according to actual needs. The more second training samples there are, the higher the recognition accuracy of the image recognition model; the fewer there are, the lower the accuracy.
  • Each second training sample corresponds to a recognition result.
  • The recognition result corresponding to a second training sample is determined according to the type of the target object included in that second training sample.
  • Optionally, the terminal can also classify the second training samples according to their corresponding recognition results.
  • FIG. 5 a schematic diagram of a second training sample set shown in one embodiment of the present application is shown.
  • The second training sample set includes two recognition results, namely the gesture "Good" 31 and the gesture "Yeah" 32. The gesture "Good" 31 corresponds to a plurality of second training samples 311 including a thumbs-up gesture, and the gesture "Yeah" 32 corresponds to a plurality of second training samples 321 including gestures with the index finger and middle finger raised.
  • Optionally, the CNN may be an AlexNet network, a VGG-16 network, or the like.
  • The algorithm used to train the CNN to obtain the image recognition model may be an RCNN algorithm, a Faster RCNN algorithm, or the like.
  • the embodiments of the present application do not specifically limit the CNN and the algorithm for training the CNN.
  • the image recognition model may also be tested using the second test sample set.
  • The second test sample set includes a plurality of second test samples, each of which corresponds to a recognition result. After inputting a second test sample into the image recognition model, the terminal checks whether the recognition result output by the model matches the recognition result corresponding to the test sample, so as to determine whether the image recognition model has been trained to the set accuracy.
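  • A hedged sketch of this testing step (the callable model API and the target accuracy value are assumptions for illustration, not the patent's exact interface):

```python
def test_recognition_model(model, test_samples, target_accuracy=0.95):
    """test_samples: list of (image, expected_result) pairs.
    Returns True once the model reaches the set accuracy."""
    correct = sum(1 for image, expected in test_samples
                  if model(image) == expected)
    return correct / len(test_samples) >= target_accuracy
```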
  • the network architecture of the image recognition model is described below.
  • The image recognition model includes an input layer, a convolution layer, a pooling layer, a normalization layer, and an output layer.
  • The embodiment of the present application does not limit the number of layers included in the image recognition model. Generally speaking, the more layers the image recognition model has, the better the effect but the longer the computation time. In practical applications, an image recognition model with an appropriate number of layers can be designed according to the requirements for recognition accuracy and efficiency.
  • the input layer is used to input a target candidate area.
  • the convolution layer is used to convert the target candidate region into a feature map.
  • Specifically, the convolution layer is used to perform a convolution operation on the target candidate region or the output of the pooling layer.
  • the purpose of the convolution operation is to extract image features and map the input data to the feature space.
  • Each convolution layer is used to perform one or more convolution operations.
  • The input data of each convolution layer is determined by the position of the convolution layer in the image recognition model: when the convolution layer is the first layer, its input data is the target candidate region or the processed target candidate region; when the convolution layer follows an activation layer, its input data is the output data of the activation layer; and when the convolution layer follows a pooling layer, its input data is the output data of the pooling layer.
  • the pooling layer is used to pool the feature map outputted by the convolutional layer to reduce the number of features in the feature map.
  • The pooling process can be a maximum pooling process or a mean pooling process. The effect of the maximum pooling operation is to reduce the size of the feature map and increase the receptive field of the next layer.
  • The receptive field is the size of the area on the original image onto which a pixel on the feature map output by each layer of the image recognition model is mapped.
  • the input data of the pooling layer is usually the output data of the active layer, and the output data of the pooling layer is usually the input data of the convolution layer.
  • The normalization layer is used to normalize the feature map processed by the convolution layer and the pooling layer to obtain the recognition result. The effect of the normalization process is to obtain the probability distribution of the target object over the plurality of recognition results and to determine the recognition result based on that probability distribution.
  • Optionally, the image recognition model may further include an activation layer. The activation layer may be located before the pooling layer and after the convolution layer, and is used to perform an activation operation on the output of the convolution layer. Since the feature space obtained by the convolution operation is limited, the activation operation processes the feature space so that it can represent more features. The input data of the activation layer is usually the output data of the convolution layer. The output data of the activation layer is determined by the position of the activation layer in the image recognition model: when the activation layer is the last layer in the image recognition model, its output data is the recognition result of the target image.
  • Referring to FIG. 6, a schematic diagram of an identification process shown in an exemplary embodiment of the present application is shown (only the convolution layers, activation layers, and pooling layers are shown).
  • 1 represents a convolution operation
  • 2 represents an activation operation
  • 3 represents a maximum pooling operation
  • the leftmost rectangle represents the target candidate region or the processed target candidate region, and the rightmost rectangle represents the recognition result of the target image; the other rectangles represent multi-channel feature maps.
  • The height of a rectangle indicates the size of the feature map: the larger the feature map, the taller the rectangle. The thickness of a rectangle indicates the number of channels of the feature map: the more channels, the thicker the rectangle.
  • the image recognition model performs a total of 9 convolution operations, 9 activation operations, and 3 maximum pooling operations, that is, the image recognition model includes 9 convolution layers, 9 activation layers, and 3 Pooling layer.
  • The layers in the image recognition model are connected from left to right in the order in which the operations in FIG. 6 are performed. The input data of the first convolution layer is the target candidate region (or the processed target candidate region), the input data of each subsequent layer is the output data of the previous layer, and the output data of the last activation layer is the recognition result of the target image. A simplified sketch of such a network follows.
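  • A minimal PyTorch sketch of this kind of recognition network (stacks of convolution and activation with occasional max pooling, ending in a softmax normalization over the gesture classes). The channel widths and the two-class head are invented for the example:

```python
import torch
import torch.nn as nn

class RecognitionNet(nn.Module):
    """Simplified classification network (illustrative only)."""
    def __init__(self, num_classes=2, in_ch=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # pooling layer: fewer features, larger receptive field
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # collapse the feature map to one vector
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        f = self.features(x).flatten(1)
        # Normalization layer: softmax yields the probability distribution of
        # the target object over the recognition results (e.g. "Good", "Yeah").
        return torch.softmax(self.classifier(f), dim=1)
```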
  • FIG. 7 shows a flowchart of an image recognition method according to another embodiment of the present application.
  • the method can include the following steps:
  • Step 401 Acquire a first training sample set.
  • the first training sample set includes a plurality of first training samples, each of which is marked with an area including the target object and/or an area not including the target object.
  • Step 402 Train the CNN with the first training sample set to obtain an image detection model.
  • Step 403 Acquire a second training sample set.
  • the second training sample set includes a plurality of second training samples, each of which corresponds to a recognition result.
  • Step 404 The CNN is trained by using the second training sample set to obtain an image recognition model.
  • The order of the training process of the image detection model and the training process of the image recognition model is not limited in this embodiment of the present application. That is, the terminal may perform steps 401 and 402 before steps 403 and 404, or may perform steps 403 and 404 before steps 401 and 402.
  • Step 405 The target candidate region in the target image is detected by using an image detection model.
  • the target candidate area is an image block containing the target object.
  • Step 406 Obtain a ratio of the target candidate area to the target image.
  • If the ratio is less than or equal to the preset threshold, step 407 is performed; if the ratio is greater than the preset threshold, step 408 is performed.
  • Step 407 when the target candidate region is detected from the target image, the target candidate region is extracted.
  • Step 408 Identify the target candidate region by using an image recognition model, and obtain a recognition result of the target image.
  • In summary, in the method provided by this embodiment of the present application, the image detection model first detects the target candidate region in the image that may contain the target object, and the image recognition model then performs recognition based on the detected target candidate region. The two models are combined so that the target object can be accurately recognized even when it occupies a small proportion of the image, improving the accuracy of image recognition.
  • The image recognition method provided by the embodiments of the present application can be applied to a scenario in which the terminal needs to authenticate the user. For example, the terminal requires the user to perform a specified action, such as making the gesture "Good" or the gesture "Yeah". The terminal collects an image through the camera, recognizes the collected image to obtain a recognition result, and then compares the recognition result with the specified action. If they are consistent, the identity verification succeeds; if not, the identity verification fails. A sketch of this flow follows.
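  • A sketch of this verification flow under illustrative assumptions: the detect and recognize callables stand in for the trained models, the image is a numpy array, and the 30% extraction threshold from the earlier example is reused:

```python
def verify_user(image, expected_gesture, detect, recognize,
                extract_ratio_threshold=0.3):
    """image: H x W x 3 numpy array. detect returns the candidate-region box
    (x0, y0, x1, y1) or None; recognize returns a gesture label string."""
    box = detect(image)
    if box is None:
        return False                     # no target object detected
    x0, y0, x1, y1 = box
    ratio = ((x1 - x0) * (y1 - y0)) / (image.shape[0] * image.shape[1])
    if ratio <= extract_ratio_threshold:
        image = image[y0:y1, x0:x1]      # extract the target candidate region
    return recognize(image) == expected_gesture  # compare with specified action
```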
  • FIG. 8 a schematic diagram of an interface for image recognition provided by an embodiment of the present application is shown.
  • the terminal recognizes the image collected by the electronic device, and the recognition result of the image is the gesture “Good” shown in the figure, that is, the gesture of raising the thumb.
  • FIG. 9 a schematic diagram of an interface for image recognition provided by another embodiment of the present application is shown.
  • the terminal recognizes the image collected by the electronic device, and the recognition result of the image is the gesture “Yeah” shown in the figure, that is, the gesture of erecting the index finger and the middle finger.
  • FIG. 10 shows a block diagram of an image recognition apparatus provided by an embodiment of the present application.
  • The device is applied to an electronic device and has the functions of the above method example; the functions may be implemented by hardware, or by hardware executing corresponding software.
  • the apparatus may include an image detection module 501, an area extraction module 502, and an image recognition module 503.
  • the image detecting module 501 is configured to detect a target candidate region in the target image by using an image detection model, where the target candidate region is an image block that includes the target object.
  • the region extraction module 502 is configured to extract the target candidate region when the target candidate region is detected from the target image.
  • the image recognition module 503 is configured to perform image recognition based on the target candidate region by using an image recognition model to obtain a recognition result of the target image.
  • Optionally, the image detecting module 501 is configured to: acquire the probability that each pixel in the target image belongs to the target object; and determine the target candidate region according to the probability corresponding to each pixel, where the target candidate region includes pixels whose probability is greater than a preset threshold.
  • Optionally, the image detecting module 501 is further configured to: acquire, according to the probability corresponding to each pixel, an image block that meets a first preset condition and determine it as a target image block, where the first preset condition is to include more than a preset number of contiguous target pixels, a target pixel being a pixel whose probability is greater than the preset threshold; and determine, as the target candidate region, a rectangular area that includes the target image block and meets a second preset condition, where the second preset condition is that the proportion of the target image block in the rectangular area is greater than a preset ratio.
  • Optionally, the image recognition module 503 is configured to: perform feature extraction on the target candidate region by using the image recognition model to obtain image features of the target candidate region; determine, according to the image features, a first probability distribution of the target object over a plurality of recognition results; and determine the recognition result corresponding to the maximum value in the first probability distribution as the recognition result of the target image.
  • Optionally, the image recognition module 503 is configured to: preprocess the target candidate region to obtain a processed target candidate region whose resolution reaches a preset resolution; perform feature extraction on the processed target candidate region; determine, according to the image features of the processed target candidate region, a second probability distribution of the target object over the plurality of recognition results; and determine the recognition result corresponding to the maximum value in the second probability distribution as the recognition result of the target image.
  • Optionally, the image detection model includes an input layer, a convolution layer, a pooling layer, an up-convolution layer, a concatenation layer, a normalization layer, and an output layer. The input layer is configured to input the target image; the convolution layer is configured to convert the target image into a feature map; the pooling layer is configured to perform pooling processing on the feature map output by the convolution layer to reduce the number of features in the feature map; the up-convolution layer is configured to perform an up-convolution operation on the feature map output by the convolution layer; the concatenation layer is configured to concatenate the feature maps processed by the pooling layer and the up-convolution layer to obtain a concatenated feature map; the normalization layer is configured to normalize the concatenated feature map to obtain the location information of the target candidate region; and the output layer is configured to output the location information of the target candidate region.
  • Optionally, the image recognition model includes an input layer, a convolution layer, a pooling layer, a normalization layer, and an output layer. The input layer is used to input the target candidate region; the convolution layer is configured to convert the target candidate region into a feature map; the pooling layer is configured to perform pooling processing on the feature map to reduce the number of features in the feature map; the normalization layer is configured to perform normalization processing on the feature map processed by the convolution layer and the pooling layer to obtain the recognition result; and the output layer is configured to output the recognition result.
  • the apparatus further includes a ratio acquisition module 504 (not shown).
  • the ratio acquisition module 504 is configured to acquire a ratio of the target candidate area to the target image.
  • The image recognition module 503 is further configured to directly perform the step of performing image recognition by using the image recognition model to obtain the recognition result of the target image, if the ratio is greater than a preset threshold.
  • the apparatus further includes: a first obtaining module 505 and a first training module 506 (not shown).
  • a first obtaining module 505 configured to acquire a first training sample set, where the first training sample set includes a plurality of first training samples, and each of the first training samples is marked with an area including the target object and/or an area not including the target object.
  • the first training module 506 is configured to train the convolutional neural network CNN by using the first training sample set to obtain the image detection model.
  • the apparatus further includes: a second acquisition module 507 and a second training module 508 (not shown).
  • the second obtaining module 507 is configured to obtain a second training sample set, where the second training sample set includes a plurality of second training samples, and each of the second training samples corresponds to a recognition result.
  • the second training module 508 is configured to train the convolutional neural network CNN by using the second training sample set to obtain the image recognition model.
  • In summary, the apparatus provided by this embodiment of the present application first detects, by using the image detection model, the target candidate region that may contain the target object, and then uses the image recognition model to perform recognition based on the detected target candidate region. The two models are combined so that the target object can be accurately recognized even when it occupies a small proportion of the image, improving the accuracy of image recognition.
  • FIG. 12 is a structural block diagram of an electronic device 600 provided by an exemplary embodiment of the present application.
  • The electronic device 600 can be a terminal such as a smartphone, a tablet computer, a laptop, or a desktop computer, or can be a server. In the embodiment of the present application, the electronic device 600 is used as an example for description.
  • the electronic device 600 includes a processor 601 and a memory 602.
  • Processor 601 can include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like.
  • The processor 601 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array).
  • the processor 601 may also include a main processor and a coprocessor.
  • The main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state.
  • In some embodiments, the processor 601 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display needs to display.
  • the processor 601 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
  • Memory 602 can include one or more computer-readable storage media, which can be non-transitory. Memory 602 can also include high-speed random access memory as well as non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in memory 602 is configured to store at least one instruction for execution by processor 601 to implement the image recognition method provided by the method embodiments of the present application.
  • the electronic device 600 also optionally includes a peripheral device interface 603 and at least one peripheral device.
  • the processor 601, the memory 602, and the peripheral device interface 603 can be connected by a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface 603 via a bus, signal line or circuit board.
  • the peripheral device includes at least one of a radio frequency circuit 604, a touch display screen 605, a camera 606, an audio circuit 607, a positioning component 608, and a power source 609.
  • the peripheral device interface 603 can be used to connect at least one peripheral device associated with an I/O (Input/Output) to the processor 601 and the memory 602.
  • In some embodiments, the processor 601, the memory 602, and the peripheral device interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 601, the memory 602, and the peripheral device interface 603 can be implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • the RF circuit 604 is configured to receive and transmit an RF (Radio Frequency) signal, also referred to as an electromagnetic signal.
  • Radio frequency circuit 604 communicates with the communication network and other communication devices via electromagnetic signals.
  • the RF circuit 604 converts the electrical signal into an electromagnetic signal for transmission, or converts the received electromagnetic signal into an electrical signal.
  • the radio frequency circuit 604 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like.
  • Radio frequency circuitry 604 can communicate with other electronic devices via at least one wireless communication protocol.
  • the wireless communication protocols include, but are not limited to, the World Wide Web, a metropolitan area network, an intranet, generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks.
  • the radio frequency circuit 604 may also include a circuit related to NFC (Near Field Communication), which is not limited in this application.
  • the display screen 605 is used to display a UI (User Interface).
  • the UI can include graphics, text, icons, video, and any combination thereof.
  • When the display 605 is a touch display, the display 605 also has the ability to collect touch signals on or above the surface of the display 605. The touch signal can be input to the processor 601 as a control signal for processing.
  • the display screen 605 can also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards.
  • In some embodiments, there may be one display screen 605, disposed on the front panel of the electronic device 600; in other embodiments, there may be at least two display screens 605, respectively disposed on different surfaces of the electronic device 600 or in a folded design. In still other embodiments, the display screen 605 can be a flexible display screen disposed on a curved surface or a folded surface of the electronic device 600; the display screen 605 can even be set to a non-rectangular irregular shape, that is, a shaped screen.
  • the display screen 605 can be prepared by using a material such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
  • Camera component 606 is used to capture images or video.
  • camera assembly 606 includes a front camera and a rear camera.
  • the front camera is placed on the front panel of the electronic device and the rear camera is placed on the back of the electronic device.
  • In some embodiments, there are at least two rear cameras, each being one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so as to realize a background blur function by fusing the main camera and the depth-of-field camera, panoramic shooting and VR (Virtual Reality) shooting by fusing the main camera and the wide-angle camera, or other fusion shooting functions.
  • camera assembly 606 can also include a flash.
  • the flash can be a monochrome temperature flash or a two-color temperature flash.
  • the two-color temperature flash is a combination of a warm flash and a cool flash that can be used for light compensation at different color temperatures.
  • the audio circuit 607 can include a microphone and a speaker.
  • the microphone is used to collect sound waves of the user and the environment, and convert the sound waves into electrical signals that are input to the processor 601 for processing, or input to the radio frequency circuit 604 for voice communication.
  • for stereo capture or noise reduction, there may be multiple microphones, disposed at different parts of the electronic device 600.
  • the microphone can also be an array microphone or an omnidirectional acquisition microphone.
  • the speaker is then used to convert electrical signals from the processor 601 or the RF circuit 604 into sound waves.
  • the speaker can be a conventional film speaker or a piezoelectric ceramic speaker.
  • audio circuit 607 can also include a headphone jack.
  • the location component 608 is used to locate the current geographic location of the electronic device 600 to implement navigation or LBS (Location Based Service).
  • the positioning component 608 can be a positioning component based on the GPS (Global Positioning System) of the United States, the Beidou system of China, or the Galileo system of the European Union.
  • Power source 609 is used to power various components in electronic device 600.
  • the power source 609 can be an alternating current, a direct current, a disposable battery, or a rechargeable battery.
  • the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery.
  • a wired rechargeable battery is a battery charged through a wired line, and a wireless rechargeable battery is a battery charged through a wireless coil.
  • the rechargeable battery can also be used to support fast charging technology.
  • electronic device 600 also includes one or more sensors 610.
  • the one or more sensors 610 include, but are not limited to, an acceleration sensor 611, a gyro sensor 612, a pressure sensor 613, a fingerprint sensor 614, an optical sensor 615, and a proximity sensor 616.
  • the acceleration sensor 611 can detect the magnitude of the acceleration on the three coordinate axes of the coordinate system established by the electronic device 600.
  • the acceleration sensor 611 can be used to detect components of gravity acceleration on three coordinate axes.
  • the processor 601 can control the touch display screen 605 to display the user interface in a landscape view or a portrait view according to the gravity acceleration signal collected by the acceleration sensor 611.
  • the acceleration sensor 611 can also be used for the acquisition of game or user motion data.
  • the gyro sensor 612 can detect the body direction and the rotation angle of the electronic device 600, and the gyro sensor 612 can cooperate with the acceleration sensor 611 to collect the 3D action of the user on the electronic device 600.
  • the processor 601 can realize functions such as motion sensing (such as changing the UI according to the user's tilting operation), image stabilization at the time of shooting, game control, and inertial navigation, based on the data collected by the gyro sensor 612.
  • the pressure sensor 613 may be disposed on a side frame of the electronic device 600 and/or a lower layer of the touch display screen 605.
  • when the pressure sensor 613 is disposed on the side frame of the electronic device 600, a user's holding signal to the electronic device 600 can be detected, and the processor 601 performs left/right hand recognition or shortcut operations according to the holding signal collected by the pressure sensor 613.
  • when the pressure sensor 613 is disposed on the lower layer of the touch display screen 605, the processor 601 controls the operability controls on the UI according to the user's pressure operation on the touch display screen 605.
  • the operability control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.
  • the fingerprint sensor 614 is used to collect the fingerprint of the user.
  • the processor 601 identifies the user's identity according to the fingerprint collected by the fingerprint sensor 614, or the fingerprint sensor 614 identifies the user's identity according to the collected fingerprint. Upon identifying that the user's identity is a trusted identity, the processor 601 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like.
  • the fingerprint sensor 614 can be disposed on the front, back, or side of the electronic device 600. When a physical button or a manufacturer logo is disposed on the electronic device 600, the fingerprint sensor 614 can be integrated with the physical button or the manufacturer logo.
  • Optical sensor 615 is used to collect ambient light intensity.
  • the processor 601 can control the display brightness of the touch display 605 according to the ambient light intensity acquired by the optical sensor 615. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 605 is raised; when the ambient light intensity is low, the display brightness of the touch display screen 605 is lowered.
  • the processor 601 can also dynamically adjust the shooting parameters of the camera assembly 606 according to the ambient light intensity collected by the optical sensor 615.
  • Proximity sensor 616, also referred to as a distance sensor, is typically disposed on the front panel of electronic device 600. Proximity sensor 616 is used to collect the distance between the user and the front of electronic device 600. In one embodiment, when the proximity sensor 616 detects that the distance between the user and the front side of the electronic device 600 gradually decreases, the processor 601 controls the touch display screen 605 to switch from the screen-on state to the screen-off state; when the proximity sensor 616 detects that the distance between the user and the front side of the electronic device 600 gradually increases, the processor 601 controls the touch display screen 605 to switch from the screen-off state to the screen-on state.
  • those skilled in the art will understand that the structure shown in FIG. 12 does not constitute a limitation to the electronic device 600, and more or fewer components than those illustrated may be included, or some components may be combined, or a different component arrangement may be employed.
  • in an exemplary embodiment, a computer readable storage medium is further provided, having stored therein at least one instruction, at least one program, a code set or an instruction set, which is loaded and executed by a processor of the terminal to implement the image recognition method in the above method embodiments.
  • the computer readable storage medium described above may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
  • a plurality as referred to herein means two or more.
  • "and/or” describing the association relationship of the associated objects, indicating that there may be three relationships, for example, A and/or B, which may indicate that there are three cases where A exists separately, A and B exist at the same time, and B exists separately.
  • the character “/” generally indicates that the contextual object is an “or” relationship.
  • the words “first,” “second,” and similar terms used herein do not denote any order, quantity, or importance, but are used to distinguish different components.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

An image recognition method and apparatus, and an electronic device. The method comprises: detecting a target candidate region in a target image by using an image detection model (101); when the target candidate region is detected from the target image, extracting the target candidate region (102); and performing image recognition based on the target candidate region by using an image recognition model, to obtain a recognition result of the target image (103). The method first preliminarily detects, by means of the image detection model, a target candidate region in the image that may include the target, and then performs recognition based on the detected target candidate region by means of the image recognition model. Combining the two models allows the target in the image to be accurately recognized even when the target occupies a small proportion of the image, which improves the accuracy of image recognition.

Description

Image recognition method, apparatus and electronic device
This application claims priority to Chinese Patent Application No. 201711180320.X, entitled "Image recognition method, apparatus and terminal", filed on November 23, 2017, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the field of machine learning technologies, and in particular to an image recognition method, an image recognition apparatus, and an electronic device.
Background
Image recognition technology refers to the technology of recognizing the objects included in an image, and is a common way of processing images.
In the related art, a terminal first trains a Convolutional Neural Network (CNN) with a sample set to obtain an image recognition model, then inputs the image to be recognized into the trained image recognition model, and the image recognition model recognizes the image and outputs a recognition result.
In the related art, when the object to be recognized occupies a small proportion of the image, recognition errors or recognition failures occur.
Summary
The embodiments of the present application provide an image recognition method, an image recognition apparatus, and an electronic device, which can be used to solve the problem in the related art that recognition errors or recognition failures occur when the object to be recognized occupies a small proportion of the image. The technical solutions are as follows:
In one aspect, an embodiment of the present application provides an image recognition method, applied to an electronic device, the method including:
detecting a target candidate region in a target image by using an image detection model, the target candidate region being an image block containing a target object;
extracting the target candidate region when the target candidate region is detected from the target image;
performing image recognition based on the target candidate region by using an image recognition model, to obtain a recognition result of the target image.
In another aspect, an embodiment of the present application provides an image recognition apparatus, applied to an electronic device, the apparatus including:
an image detection module, configured to detect a target candidate region in a target image by using an image detection model, the target candidate region being an image block containing a target object;
a region extraction module, configured to extract the target candidate region when the target candidate region is detected from the target image;
an image recognition module, configured to perform image recognition based on the target candidate region by using an image recognition model, to obtain a recognition result of the target image.
Optionally, the image detection module is configured to:
obtain, by using the image detection model, a probability that each pixel in the target image belongs to the target object; and
determine the target candidate region according to the probability corresponding to each pixel, the target candidate region including pixels whose probabilities are greater than a preset threshold.
Optionally, the image detection module is configured to:
obtain, according to the probability corresponding to each pixel, an image block that meets a first preset condition, and determine the image block meeting the first preset condition as a target image block, where the first preset condition means containing more than a preset number of contiguous target pixels, a target pixel being a pixel whose probability is greater than a preset threshold; and
determine a rectangular region that contains the target image block and meets a second preset condition as the target candidate region, the second preset condition being that the proportion of the target image block within the rectangular region is greater than a preset ratio.
Optionally, the image recognition module is configured to:
perform feature extraction on the target candidate region by using the image recognition model, to obtain image features of the target candidate region;
determine, according to the image features of the target candidate region, a first probability distribution of the target object in the target candidate region over a plurality of recognition results; and
determine the recognition result corresponding to the maximum value in the first probability distribution as the recognition result of the target image.
Optionally, the image recognition module is configured to:
preprocess the target candidate region to obtain a processed target candidate region, a resolution of the processed target candidate region reaching a preset resolution;
perform feature extraction on the processed target candidate region by using the image recognition model, to obtain image features of the processed target candidate region;
determine, according to the image features of the processed target candidate region, a second probability distribution of the target object in the target candidate region over a plurality of recognition results; and
determine the recognition result corresponding to the maximum value in the second probability distribution as the recognition result of the target image.
Optionally, the image detection model includes an input layer, a convolutional layer, a pooling layer, an up-convolutional layer, a concatenation layer, a normalization layer, and an output layer; the input layer is used to input the target image; the convolutional layer is used to convert the target image into feature maps; the pooling layer is used to pool the feature maps output by the convolutional layer, so as to reduce the number of features in the feature maps; the up-convolutional layer is used to perform an up-convolution operation on the feature maps output by the convolutional layer; the concatenation layer is used to concatenate the feature maps processed by the pooling layer and the up-convolutional layer, to obtain concatenated feature maps; the normalization layer is used to normalize the concatenated feature maps, to obtain position information of the target candidate region; and the output layer is used to output the position information of the target candidate region.
Optionally, the image recognition model includes an input layer, a convolutional layer, a pooling layer, a normalization layer, and an output layer; the input layer is used to input the target candidate region; the convolutional layer is used to convert the target candidate region into feature maps; the pooling layer is used to pool the feature maps, so as to reduce the number of features in the feature maps; the normalization layer is used to normalize the feature maps processed by the convolutional layer and the pooling layer, to obtain the recognition result; and the output layer is used to output the recognition result.
Optionally, the apparatus further includes:
a ratio obtaining module, configured to obtain the proportion of the target candidate region in the target image;
the image recognition module is further configured to, if the proportion is greater than a preset limit, directly perform the step of recognizing the target candidate region by using the image recognition model to obtain the recognition result of the target image.
Optionally, the apparatus further includes:
a first obtaining module, configured to obtain a first training sample set, the first training sample set containing a plurality of first training samples, each first training sample being marked with a region that includes the target and/or a region that does not include the target;
a first training module, configured to train a convolutional neural network (CNN) with the first training sample set, to obtain the image detection model.
Optionally, the apparatus further includes:
a second obtaining module, configured to obtain a second training sample set, the second training sample set containing a plurality of second training samples, each second training sample corresponding to a recognition result;
a second training module, configured to train a convolutional neural network (CNN) with the second training sample set, to obtain the image recognition model.
In yet another aspect, an embodiment of the present application provides an electronic device, the electronic device including a processor and a memory, the memory storing at least one instruction, at least one program, a code set or an instruction set, the at least one instruction, the at least one program, the code set or the instruction set being loaded and executed by the processor to implement the image recognition method according to the first aspect.
In still another aspect, an embodiment of the present application provides a computer readable storage medium, the computer readable storage medium storing at least one instruction, at least one program, a code set or an instruction set, the at least one instruction, the at least one program, the code set or the instruction set being loaded and executed by a processor to implement the image recognition method according to the first aspect.
In still another aspect, an embodiment of the present application provides a computer program product, which, when executed, is configured to perform the image recognition method according to the first aspect.
The technical solutions provided by the embodiments of the present application can bring the following beneficial effects:
A target candidate region that may include the target object is first preliminarily detected in the image by the image detection model and extracted; recognition is then performed by the image recognition model based on the extracted target candidate region to obtain a recognition result. When the target object occupies a small proportion of the image, since the electronic device has extracted from the image the target candidate region containing the target object, the target object occupies a relatively large proportion of the target candidate region; recognizing the target candidate region with the image recognition model at this point avoids the recognition failures and even recognition errors that occur in the related art when the target object occupies a small proportion of the image, and improves the success rate of image recognition.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Apparently, the accompanying drawings in the following description show merely some embodiments of the present application, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.
FIG. 1 is a flowchart of an image recognition method according to an exemplary embodiment of the present application;
FIG. 2 is a schematic diagram related to the embodiment shown in FIG. 1;
FIG. 3 is a schematic diagram of a first training sample according to an exemplary embodiment of the present application;
FIG. 4 is a schematic diagram of a detection process according to an exemplary embodiment of the present application;
FIG. 5 is a schematic diagram of a second training sample set according to an exemplary embodiment of the present application;
FIG. 6 is a schematic diagram of a recognition process according to an exemplary embodiment of the present application;
FIG. 7 is a flowchart of an image recognition method according to another exemplary embodiment of the present application;
FIG. 8 is a schematic diagram of an image recognition interface according to an exemplary embodiment of the present application;
FIG. 9 is a schematic diagram of an image recognition interface according to an exemplary embodiment of the present application;
FIG. 10 is a structural block diagram of an image recognition apparatus according to an exemplary embodiment of the present application;
FIG. 11 is a structural block diagram of an image recognition apparatus according to another embodiment of the present application;
FIG. 12 is a structural block diagram of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the implementations of the present application are further described in detail below with reference to the accompanying drawings.
In the related art, when image recognition is performed by a relevant model, the model usually divides an image into multiple regions according to the degree of interest in the image, then learns relevant features from the regions of higher interest, and determines the image recognition result according to the learned features. When the object to be recognized occupies a small proportion of the image, the region containing the object to be recognized has a low probability of being determined as a region of interest during subsequent recognition; the model then performs image recognition based on the regions of the image other than the region containing the object to be recognized, and recognition errors or recognition failures may occur.
On this basis, the embodiments of the present application provide an image recognition method, an image recognition apparatus, and an electronic device. In the embodiments of the present application, a target candidate region that may include the target object is first preliminarily detected in the image by an image detection model and extracted; recognition is then performed by an image recognition model based on the extracted target candidate region to obtain a recognition result. When the target object occupies a small proportion of the image, since the electronic device has extracted from the image the target candidate region containing the target object, the target object occupies a relatively large proportion of the target candidate region; recognizing the target candidate region with the image recognition model at this point avoids the recognition failures and even recognition errors that occur in the related art when the target object occupies a small proportion of the image, and improves the success rate of image recognition.
In the method provided by the embodiments of the present application, each step may be performed by an electronic device having image processing capability. Optionally, the electronic device may be a terminal such as a personal computer, a mobile phone or a tablet computer, or may be a server.
Please refer to FIG. 1, which shows a flowchart of an image recognition method according to an embodiment of the present application. The method may include the following steps:
Step 101: Detect a target candidate region in a target image by using an image detection model.
The target candidate region is an image block containing a target object. The target object refers to the object to be recognized in the target image; it may be a human face, an object, a gesture, and so on, which is not limited in the embodiments of the present application. The target image is the image to be detected; it may be a picture, or a frame of a video.
The image detection model is used to detect whether the target image includes the target object. Optionally, the image detection model is also used to detect the approximate region of the target object in the target image, that is, the target candidate region. Optionally, the image detection model is a model obtained by training a CNN. The training process and the network architecture of the image detection model will be introduced in the embodiments below.
Optionally, step 101 may include the following sub-steps:
Step 101a: Obtain, by using the image detection model, a probability that each pixel in the target image belongs to the target object.
The image detection model can perform feature extraction on each pixel in the target image and match the feature extraction result corresponding to each pixel against preset image features; the degree of matching between a feature extraction result and the preset image features can be used to measure the probability that the corresponding pixel belongs to the target object. The greater the degree of matching between the feature extraction result and the preset image features, the greater the probability that the corresponding pixel belongs to the target object; the smaller the degree of matching, the smaller that probability. The preset image features may be the image features corresponding to the pixels that make up the target, and may be obtained after the image detection model has been trained.
In addition, after the probability that each pixel of the target image belongs to the target object has been obtained, a probability matrix may be used to represent these probabilities. The probabilities included in the probability matrix correspond one-to-one to the pixels included in the target image. For example, the value in row 4, column 3 of the probability matrix indicates the probability corresponding to the pixel in row 4, column 3 of the target image.
Step 101b: Determine the target candidate region according to the probability corresponding to each pixel.
The target candidate region includes pixels whose probabilities are greater than a preset threshold. The preset threshold can be determined in practice according to the image recognition model's requirement on the proportion of the target in the target image; for example, when the image recognition model requires the target to occupy a larger proportion of the target image, the preset threshold is also larger. Illustratively, the preset threshold is 0.7. Optionally, the terminal binarizes the probability matrix, setting probabilities greater than or equal to the preset threshold to 1 and probabilities less than the preset threshold to 0. In this way, probabilities greater than or equal to the preset threshold are distinguished from probabilities less than the preset threshold.
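By way of illustration only, the binarization just described can be sketched in a few lines of NumPy; the probability matrix `prob` below is a hypothetical stand-in for the detection model's output, and the threshold 0.7 follows the example in the text:

```python
import numpy as np

# Hypothetical probability matrix output by the detection model:
# one probability per pixel of the target image (values in [0, 1]),
# here for a 440*360 image (height 360, width 440).
prob = np.random.rand(360, 440)

PRESET_THRESHOLD = 0.7  # example threshold from the text

# Binarize: probabilities >= threshold become 1, the rest become 0.
mask = (prob >= PRESET_THRESHOLD).astype(np.uint8)
```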
Optionally, the target candidate region may be determined as follows: obtain, according to the probability corresponding to each pixel, an image block that meets a first preset condition, and determine the image block meeting the first preset condition as a target image block, where the first preset condition means containing more than a preset number of contiguous target pixels, a target pixel being a pixel whose probability is greater than the preset threshold; then determine a rectangular region that contains the target image block and meets a second preset condition as the target candidate region, where the second preset condition is that the proportion of the target image block within the rectangular region is greater than a preset ratio. The preset number, the preset threshold, and the preset ratio can all be set according to actual requirements, which is not limited in the embodiments of the present application.
Further, the second preset condition may also be that the proportion of the target image block is maximized, that is, the rectangular region is the smallest rectangular region containing the target image block. In this way, the target occupies as large a proportion of the target candidate region as possible, which improves both the efficiency and the accuracy of subsequent recognition with the image recognition model.
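A minimal sketch of this region-determination step, using SciPy connected-component labelling on the binary mask from the previous sketch; `min_pixels` is a hypothetical stand-in for the preset number of contiguous target pixels, and the returned rectangle is the smallest one containing the block:

```python
import numpy as np
from scipy import ndimage

def candidate_region(mask, min_pixels=50):
    """Return the minimal bounding rectangle (top, bottom, left, right)
    of the largest contiguous block of target pixels, or None.

    mask: binary array from the thresholding step above.
    min_pixels: assumed preset number of contiguous target pixels
    required by the first preset condition.
    """
    labeled, num = ndimage.label(mask)           # connected components
    if num == 0:
        return None
    sizes = ndimage.sum(mask, labeled, range(1, num + 1))
    best = int(np.argmax(sizes)) + 1             # label of largest block
    if sizes[best - 1] < min_pixels:
        return None                              # first condition not met
    rows, cols = np.where(labeled == best)
    # Smallest rectangle containing the target image block, so the
    # block's share of the rectangle is as large as possible.
    return rows.min(), rows.max(), cols.min(), cols.max()
```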
With reference to FIG. 2, which shows a schematic diagram related to the embodiment shown in FIG. 1: after detecting the input target image 10, the image detection model 11 outputs the target image 10 marked with the target candidate region 12.
Step 102: When the target candidate region is detected from the target image, extract the target candidate region.
Extracting the target candidate region from the target image means cropping the target candidate region out of the target image. With reference to FIG. 2, the terminal extracts the target candidate region 12 from the target image 10.
When no target candidate region is detected from the target image, this indicates that the target image does not include the target object, and the process can end.
In addition, when the target object occupies a large proportion of the target image, the terminal can recognize the target image directly without performing step 102, that is, without extracting the target candidate region from the target image. Therefore, before step 102, the terminal can obtain the proportion of the target candidate region in the target image: if the proportion is greater than a preset limit, step 103 is performed directly; if the proportion is less than or equal to the preset limit, step 102 is performed. The preset limit can be determined in practice according to the recognition accuracy of the image recognition model. Illustratively, the preset limit is 30%. In this way, the time needed to extract the target candidate region is saved and the efficiency of image recognition is improved; a small gating sketch is given below.
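Illustratively, the gating decision can be sketched as follows, reusing the rectangle returned by the `candidate_region` sketch above; the 30% value is the example limit from the text, not a fixed parameter of the method:

```python
def should_crop(region, image_shape, gate=0.30):
    """Decide whether to extract the candidate region before recognition.

    region: (top, bottom, left, right) from candidate_region above.
    image_shape: (height, width) of the target image.
    gate: assumed preset limit; the text gives 30% as an example.
    Returns True when the region is small relative to the image and
    cropping is worthwhile; False when the whole image can be
    recognized directly.
    """
    top, bottom, left, right = region
    region_area = (bottom - top + 1) * (right - left + 1)
    image_area = image_shape[0] * image_shape[1]
    return region_area / image_area <= gate
```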
Step 103: Perform image recognition based on the target candidate region by using an image recognition model, to obtain a recognition result of the target image.
The recognition result of the target image refers to the class to which the target object included in the target image belongs. For example, if the target image is an image including a gesture, the recognition result of the target image refers to the class to which the gesture belongs. The image recognition model is used to recognize the target and classify it. Optionally, the image recognition model is also a model obtained by training a CNN. The training process and the network architecture of the image recognition model will be explained in the embodiments below.
In addition, after obtaining the target candidate region, the terminal can recognize the target candidate region directly, or preprocess the target candidate region first and then recognize the processed target candidate region. The two approaches are explained separately below.
In a first possible implementation, the terminal recognizes the target candidate region directly, and step 103 may include the following sub-steps:
Step 103a: Perform feature extraction on the target candidate region by using the image recognition model, to obtain image features of the target candidate region.
Step 103b: Determine, according to the image features of the target candidate region, a first probability distribution of the target object in the target candidate region over a plurality of recognition results.
Step 103c: Determine the recognition result corresponding to the maximum value in the first probability distribution as the recognition result of the target image.
The first probability distribution of the target object over the plurality of recognition results refers to the probability that the target object belongs to each of the plurality of recognition results. Illustratively, if the probability that the target object is the gesture "Good" is 0.95 and the probability that the target object is the gesture "Yeah" is 0.05, the electronic device determines the gesture "Good" as the recognition result of the target image.
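A minimal sketch of picking the recognition result from the first probability distribution, reusing the "Good"/"Yeah" example above (the label list and probabilities are illustrative):

```python
# Hypothetical recognition results and the probability distribution
# output by the recognition model for the target object.
labels = ["Good", "Yeah"]
first_distribution = [0.95, 0.05]

# The recognition result is the label with the maximum probability.
best_index = max(range(len(labels)), key=first_distribution.__getitem__)
print(labels[best_index])  # -> "Good"
```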
In a second possible implementation, the terminal preprocesses the target candidate region and then recognizes the processed target candidate region. In this case, step 103 may include the following sub-steps:
Step 103d: Preprocess the target candidate region to obtain a processed target candidate region, a resolution of the processed target candidate region reaching a preset resolution.
The preset resolution is the image recognition model's requirement on the resolution of the image to be recognized. Illustratively, the preset resolution is 440*360. Since the image recognition model places requirements on the resolution of the image to be recognized, if the resolution does not meet the requirement, the model has to account for resolution conversion during recognition, which requires more computation and takes longer. In this example, when performing image recognition with the image recognition model, the resolution of the image to be recognized is converted in advance to the resolution required by the image recognition model, which reduces the workload of subsequent recognition, saves the time required for image recognition, and improves its efficiency. The terminal first obtains the resolution of the target candidate region, then performs resolution enhancement on the target candidate region so that the resolution of the processed target candidate region reaches the preset resolution. The algorithm used for resolution enhancement may be nearest-neighbor interpolation, bilinear interpolation, cubic convolution interpolation, or the like, which is not limited in the embodiments of the present application.
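As a sketch of this preprocessing step, assuming OpenCV is available: the text lists nearest-neighbor, bilinear, and cubic interpolation as options, and bilinear is used here; the 440*360 resolution is the example value from the text.

```python
import cv2

PRESET_W, PRESET_H = 440, 360  # example preset resolution from the text

def preprocess(candidate):
    """Rescale the extracted candidate region to the resolution the
    recognition model expects, using bilinear interpolation."""
    return cv2.resize(candidate, (PRESET_W, PRESET_H),
                      interpolation=cv2.INTER_LINEAR)
```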
Step 103e: Perform feature extraction on the processed target candidate region by using the image recognition model, to obtain image features of the processed target candidate region.
Step 103f: Determine, according to the image features of the processed target candidate region, a second probability distribution of the target object in the target candidate region over a plurality of recognition results.
Step 103g: Determine the recognition result corresponding to the maximum value in the second probability distribution as the recognition result of the target image.
Steps 103e to 103g are the same as steps 103a to 103c, and are not repeated here.
With reference to FIG. 2, the image recognition model 13 recognizes the target candidate region 12 and outputs the recognition result 14 of the target image 10; the recognition result 14 is the gesture "Good" shown in the figure, that is, a thumbs-up gesture.
In summary, in the method provided by the embodiments of the present application, a target candidate region that may include the target object is first preliminarily detected in the image by the image detection model and extracted; recognition is then performed by the image recognition model based on the extracted target candidate region to obtain a recognition result. When the target object occupies a small proportion of the image, since the electronic device has extracted from the image the target candidate region containing the target object, the target object occupies a relatively large proportion of the target candidate region; recognizing the target candidate region with the image recognition model at this point avoids the recognition failures and even recognition errors that occur in the related art when the target object occupies a small proportion of the image, and improves the success rate of image recognition.
In addition, in the embodiments of the present application, since each sub-network in the cascaded network (that is, the image detection model and the image recognition model) is independent and decoupled, each sub-network can be flexibly reused or replaced, making it convenient to provide model combinations with different optimization preferences for different users. For example, for users with higher accuracy requirements, the image recognition model can be optimized so as to obtain more accurate image recognition results.
The training process and the network architecture of the image detection model are explained below.
The training process of the image detection model is as follows: obtain a first training sample set, and train a CNN with the first training sample set to obtain the image detection model.
The first training sample set contains a plurality of first training samples, and the number of first training samples it includes can be determined according to actual requirements. Each first training sample is marked with a region that includes the target object and/or a region that does not include the target object; the marking of the first training samples can be done manually. With reference to FIG. 3, which shows a schematic diagram of a first training sample 20 according to an exemplary embodiment of the present application: the first training sample 20 includes a contour 21 composed of black lines; the inside of the contour 21 is the region that includes the target object, and the outside of the contour 21 is the region that does not include the target object.
It should be noted that the proportion of the target object may be the same or different in different first training samples. Illustratively, the target object occupies 0.3 of first training sample A, and the target occupies 0.6 of first training sample B. In addition, the types of target objects included in the first training samples may be the same or different. Illustratively, first training sample A includes the gesture "Good", and first training sample B includes the gesture "Yeah".
In addition, the CNN may be an alexNet network, a VGG-16 network, or the like, and the algorithm used to train the CNN and obtain the image detection model may be the Regions with Convolutional Neural Network (RCNN) algorithm, the faster RCNN algorithm, or the like. The embodiments of the present application do not specifically limit the CNN or the algorithm used to train the CNN.
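By way of a hedged illustration: the text above names RCNN and faster RCNN as training algorithms; the following is not either of those pipelines, only a minimal generic supervised training loop in PyTorch of the kind such trainings build on. The names `model` and `loader` are placeholders, and the loss assumes the model outputs raw class scores.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=10, lr=1e-3):
    """Generic supervised training loop (illustrative only).

    model: a CNN producing raw class scores (logits).
    loader: an iterable of (sample, label) batches built from the
    training sample set.
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for samples, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(samples), labels)  # compare to marks
            loss.backward()                          # backpropagate
            opt.step()                               # update weights
```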
In addition, after the image detection model has been trained, it can also be tested with a first test sample set. The first test sample set includes a plurality of first test samples, each corresponding to a test result. After inputting a first test sample into the image detection model, the terminal checks whether the detection result output by the image detection model is the same as the test result corresponding to that test sample, so as to check whether the image detection model has been trained to a set accuracy.
The network architecture of the image detection model is introduced below.
The image detection model includes an input layer, a convolutional layer, a pooling layer, an up-convolutional layer, a concatenation layer, a normalization layer, and an output layer. The embodiments of the present application do not limit the number of each kind of layer included in the image detection model. Generally speaking, the more layers the image detection model has, the better the effect but the longer the computation time; in practical applications, an image detection model with an appropriate number of layers can be designed in view of the requirements on detection accuracy and efficiency.
The input layer is used to input the target image.
The convolutional layer is used to convert the target image into feature maps. In the embodiments of the present application, convolutional layers perform convolution operations on the target image, the output of an activation layer, the output of a pooling layer, or the output of a concatenation layer. The role of the convolution operation is to extract image features and map the input data to a feature space. Each convolutional layer performs one or more convolution operations. The input data of each convolutional layer can be determined according to the position of the convolutional layer in the image detection model: when the convolutional layer is the first layer of the image detection model, its input data is the target image; when the convolutional layer is the layer after an activation layer, its input data is the output data of that activation layer; when it is the layer after a pooling layer, its input data is the output data of that pooling layer; when it is the layer after a concatenation layer, its input data is the output data of that concatenation layer.
The pooling layer is used to pool the feature maps output by the convolutional layer, so as to reduce the number of features in the feature maps. The pooling may be max pooling or average pooling. The role of the max pooling operation is to reduce the size of the feature map and enlarge the receptive field of the next layer. The receptive field is the size of the region on the original image onto which a pixel of the feature map output by each layer of the image detection model is mapped. The input data of a pooling layer is usually the output data of a convolutional layer, and the output data of a pooling layer is usually the input data of a convolutional layer.
The up-convolutional layer is used to perform up-convolution operations on the feature maps output by the convolutional layer. The role of the up-convolution operation is to enlarge the size of the feature map and map the learned features onto a larger size. The input data of an up-convolutional layer is usually the output data of an activation layer, and the output data of an up-convolutional layer is usually the input data of a concatenation layer.
The concatenation layer is used to concatenate the feature maps processed by the pooling layer and the up-convolutional layer, to obtain concatenated feature maps. The role of the concatenation operation is to join different feature maps together, which facilitates fusing information of different feature dimensions and thus learning more robust features. The input data of a concatenation layer is usually the output data of a pooling layer and the output data of an up-convolutional layer, and the output data of a concatenation layer is usually the input data of a convolutional layer.
The normalization layer is used to normalize the concatenated feature maps, to obtain position information of the target candidate region. The role of normalization is to obtain the probability that each pixel in the concatenated feature maps belongs to the target object, and to determine the position information of the target candidate region according to these probabilities.
Optionally, the image detection model may further include activation layers. An activation layer may be located before a pooling layer and/or an up-convolutional layer, and after a convolutional layer; it is used to perform an activation operation on the output of the convolutional layer, and to output the target image marked with the target candidate region. Since the feature space obtained by the convolution operation is limited, the activation operation can process the feature space so that it can represent more features. The input data of an activation layer is usually the output data of a convolutional layer. The output data of an activation layer can be determined according to the position of the activation layer in the image detection model: when the activation layer is the last layer of the image detection model, its output data is the target image marked with the target candidate region.
The detection process of the image detection model is explained below with reference to its network architecture. With reference to FIG. 4, which shows a schematic diagram of a detection process according to an exemplary embodiment of the present application (only the convolutional layers, activation layers, pooling layers, up-convolutional layers, and concatenation layers are shown): ① denotes a convolution operation, ② denotes an activation operation, ③ denotes a max pooling operation, ④ denotes an up-convolution operation, and ⑤ denotes a concatenation operation. The leftmost rectangle represents the target image, the rightmost rectangle represents the target image marked with the target candidate region, and the other rectangles represent multi-channel feature maps. The height of a rectangle indicates the size of the feature map: the larger the feature map, the taller the rectangle. The thickness of a rectangle indicates the number of channels of the feature map: the more channels the feature map has, the thicker the rectangle. The black rectangles represent copies of the output data of activation layers, and the rectangles concatenated with the black rectangles represent the output data of the up-convolutional layers.
In the embodiments of the present application, the explanation assumes that each layer of the image detection model performs only one operation. In FIG. 4, the image detection model performs 15 convolution operations, 15 activation operations, 3 max pooling operations, 3 up-convolution operations, and 3 concatenation operations in total; that is, under this assumption the image detection model includes 15 convolutional layers, 15 activation layers, 3 pooling layers, 3 up-convolutional layers, and 3 concatenation layers. The layers of the image detection model are connected from left to right in the order in which the operations in FIG. 4 are performed, with the input of each concatenation layer connected to both an up-convolutional layer and an activation layer. The input data of the first convolutional layer is the target image; thereafter, the input data of each layer is the output data of the previous layer, the input data of a concatenation layer is the output data of an activation layer and the output data of an up-convolutional layer, and the output data of the last activation layer is the target image marked with the target candidate region.
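By way of illustration only, here is a minimal PyTorch sketch of the layer types and data flow just described (convolution, activation, max pooling, up-convolution, concatenation, per-pixel normalization). It is not the 15-convolution network of FIG. 4; the depth, the channel counts, and the sigmoid normalization are illustrative assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

class DetectionSketch(nn.Module):
    """Minimal encoder-decoder sketch with one pooling stage, one
    up-convolution stage, and one concatenation (skip) link; assumes
    the input height and width are even."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)                       # shrink feature map
        self.conv2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.upconv = nn.ConvTranspose2d(32, 16, 2, stride=2)  # enlarge map
        self.conv3 = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(16, 1, 1)                   # per-pixel score

    def forward(self, image):
        f1 = self.conv1(image)              # feature map at full size
        f2 = self.conv2(self.pool(f1))      # pooled, deeper features
        up = self.upconv(f2)                # up-convolution back to full size
        cat = torch.cat([f1, up], dim=1)    # concatenation (skip connection)
        f3 = self.conv3(cat)
        # Normalization: per-pixel probability of belonging to the target.
        return torch.sigmoid(self.head(f3))
```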
The training process of the image recognition model is explained below. It is as follows: obtain a second training sample set, and train a convolutional neural network (CNN) with the second training sample set to obtain the image recognition model.
The second training sample set contains a plurality of second training samples, and the number of second training samples it includes can be determined according to actual requirements. The more second training samples there are, the higher the accuracy of the image recognition model; the fewer second training samples there are, the lower the accuracy of the image recognition model.
Each second training sample corresponds to a recognition result, which can be determined in practice according to the type of the target object included in the second training sample. In addition, the terminal can also classify the training samples according to their recognition results. With reference to FIG. 5, which shows a schematic diagram of a second training sample set according to an embodiment of the present application: the second training sample set includes two recognition results, the gesture "Good" 31 and the gesture "Yeah" 32; the gesture "Good" 31 corresponds to a plurality of second training samples 311 containing thumbs-up gestures, and the gesture "Yeah" 32 corresponds to a plurality of second training samples 321 containing gestures with the index and middle fingers raised.
In addition, the CNN may be an alexNet network, a VGG-16 network, or the like, and the algorithm used to train the CNN and obtain the image recognition model may be the faster RCNN algorithm, the RCNN algorithm, or the like. The embodiments of the present application do not specifically limit the CNN or the algorithm used to train the CNN.
In addition, after the image recognition model has been trained, it can also be tested with a second test sample set. The second test sample set includes a plurality of second test samples, each corresponding to a recognition result. After inputting a second test sample into the image recognition model, the terminal checks whether the recognition result output by the image recognition model is the same as the recognition result corresponding to that test sample, so as to check whether the image recognition model has been trained to a set accuracy.
The network architecture of the image recognition model is introduced below.
Optionally, the image recognition model includes an input layer, a convolutional layer, a pooling layer, a normalization layer, and an output layer. The embodiments of the present application do not limit the number of each kind of layer included in the image recognition model. Generally speaking, the more layers the image recognition model has, the better the effect but the longer the computation time; in practical applications, an image recognition model with an appropriate number of layers can be designed in view of the requirements on recognition accuracy and efficiency.
The input layer is used to input the target candidate region.
The convolutional layer is used to convert the target candidate region into feature maps. In the embodiments of the present application, convolutional layers perform convolution operations on the target candidate region and on the output of a pooling layer. The role of the convolution operation is to extract image features and map the input data to a feature space. Each convolutional layer performs one or more convolution operations. The input data of each convolutional layer can be determined according to the position of the convolutional layer in the image recognition model: when the convolutional layer is the first layer of the image recognition model, its input data is the target candidate region or the processed target candidate region; when the convolutional layer is the layer after an activation layer, its input data is the output data of that activation layer; when it is the layer after a pooling layer, its input data is the output data of that pooling layer.
The pooling layer is used to pool the feature maps output by the convolutional layer, so as to reduce the number of features in the feature maps. The pooling may be max pooling or average pooling. The role of the max pooling operation is to reduce the size of the feature map and enlarge the receptive field of the next layer. The receptive field is the size of the region on the original image onto which a pixel of the feature map output by each layer of the image recognition model is mapped. The input data of a pooling layer is usually the output data of an activation layer, and the output data of a pooling layer is usually the input data of a convolutional layer.
The normalization layer is used to normalize the feature maps processed by the convolutional layer and the pooling layer, to obtain the recognition result. In this embodiment, the role of normalization is to obtain the probability distribution of the target object over the plurality of recognition results, and to determine the recognition result according to that probability distribution.
Optionally, the image recognition model may further include activation layers. An activation layer may be located before a pooling layer and after a convolutional layer; it is used to perform an activation operation on the output of the convolutional layer. Since the feature space obtained by the convolution operation is limited, the activation operation can process the feature space so that it can represent more features. The input data of an activation layer is usually the output data of a convolutional layer. The output data of an activation layer can be determined according to the position of the activation layer in the image recognition model: when the activation layer is the last layer of the image recognition model, its output data is the recognition result of the target image.
The recognition process of the image recognition model is explained below with reference to its network architecture. With reference to FIG. 6, which shows a schematic diagram of a recognition process according to an exemplary embodiment of the present application (only the convolutional layers, activation layers, and pooling layers are shown): ① denotes a convolution operation, ② denotes an activation operation, and ③ denotes a max pooling operation. The leftmost rectangle represents the target candidate region or the processed target candidate region, the rightmost rectangle represents the recognition result of the target image, and the other rectangles represent multi-channel feature maps. The height of a rectangle indicates the size of the feature map: the larger the feature map, the taller the rectangle. The thickness of a rectangle indicates the number of channels of the feature map: the more channels the feature map has, the thicker the rectangle.
In the embodiments of the present application, the explanation assumes that each layer of the image recognition model performs only one operation. In FIG. 6, the image recognition model performs 9 convolution operations, 9 activation operations, and 3 max pooling operations in total; that is, the image recognition model includes 9 convolutional layers, 9 activation layers, and 3 pooling layers. The layers of the image recognition model are connected from left to right in the order in which the operations in FIG. 6 are performed. The input data of the first convolutional layer is the target candidate region; thereafter, the input data of each layer is the output data of the previous layer, and the output data of the last activation layer is the recognition result of the target image.
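To make the data flow just described concrete, here is a minimal, illustrative PyTorch sketch of a recognition network (repeated convolution, activation, and max pooling, followed by a normalization step yielding a probability distribution over the recognition results). It is not the 9-convolution network of FIG. 6; the depth and the channel counts are assumptions.

```python
import torch
import torch.nn as nn

class RecognitionSketch(nn.Module):
    """Minimal convolution/activation/pooling stack followed by a
    softmax normalization over the recognition results."""

    def __init__(self, num_results=2):      # e.g. "Good" and "Yeah"
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64, num_results)

    def forward(self, region):
        f = self.features(region)           # shrinking feature maps
        f = f.mean(dim=(2, 3))              # global average over the map
        # Normalization: probability of each recognition result.
        return torch.softmax(self.classifier(f), dim=1)
```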
Please refer to FIG. 7, which shows a flowchart of an image recognition method according to another embodiment of the present application. The method may include the following steps:
Step 401: Obtain a first training sample set.
The first training sample set contains a plurality of first training samples, and each first training sample is marked with a region that includes the target object and/or a region that does not include the target object.
Step 402: Train a CNN with the first training sample set, to obtain an image detection model.
Step 403: Obtain a second training sample set.
The second training sample set contains a plurality of second training samples, and each second training sample corresponds to a recognition result.
Step 404: Train a CNN with the second training sample set, to obtain an image recognition model.
The embodiments of the present application do not limit the order between the training process of the image detection model and that of the image recognition model. That is, the terminal may perform steps 401 and 402 first and then steps 403 and 404, or may perform steps 403 and 404 first and then steps 401 and 402.
Step 405: Detect a target candidate region in a target image by using the image detection model.
The target candidate region is an image block containing a target object.
Step 406: Obtain the proportion of the target candidate region in the target image.
If the proportion is less than or equal to a preset limit, step 407 is performed; if the proportion is greater than the preset limit, step 408 is performed.
Step 407: When the target candidate region is detected from the target image, extract the target candidate region.
Step 408: Recognize the target candidate region by using the image recognition model, to obtain a recognition result of the target image.
In summary, in the method provided by the embodiments of the present application, a target candidate region that may include the target is first preliminarily detected in the image by the image detection model, and recognition is then performed by the image recognition model based on the detected target candidate region. Combining the two models allows the target in the image to be accurately recognized even when the target occupies a small proportion of the image, which improves the accuracy of image recognition.
In practical applications, a terminal may need to verify a user's identity. For example, the terminal asks the user to make a specified action, such as the gesture "Good" or the gesture "Yeah"; the terminal captures an image through a camera, recognizes the captured image to obtain a recognition result, and then compares the recognition result with the required specified action. If they match, the identity verification succeeds; if they do not match, the identity verification fails.
With reference to FIG. 8, which shows a schematic diagram of an image recognition interface according to an embodiment of the present application: the terminal recognizes an image captured by the electronic device, and the recognition result of the image is the gesture "Good" shown in the figure, that is, a thumbs-up gesture.
With reference to FIG. 9, which shows a schematic diagram of an image recognition interface according to another embodiment of the present application: the terminal recognizes an image captured by the electronic device, and the recognition result of the image is the gesture "Yeah" shown in the figure, that is, a gesture with the index and middle fingers raised.
The following are apparatus embodiments of the present application, which can be used to perform the method embodiments of the present application. For details not disclosed in the apparatus embodiments, please refer to the method embodiments of the present application.
Please refer to FIG. 10, which shows a block diagram of an image recognition apparatus according to an embodiment of the present application. The apparatus is applied to an electronic device and has the functions of the above method examples; the functions may be implemented by hardware, or by hardware executing corresponding software. The apparatus may include an image detection module 501, a region extraction module 502, and an image recognition module 503.
The image detection module 501 is configured to detect a target candidate region in a target image by using an image detection model, the target candidate region being an image block containing a target object.
The region extraction module 502 is configured to extract the target candidate region when the target candidate region is detected from the target image.
The image recognition module 503 is configured to perform image recognition based on the target candidate region by using an image recognition model, to obtain a recognition result of the target image.
In an optional embodiment based on the embodiment shown in FIG. 10, the image detection module 501 is configured to:
obtain, by using the image detection model, a probability that each pixel in the target image belongs to the target object; and
determine the target candidate region according to the probability corresponding to each pixel, the target candidate region including pixels whose probabilities are greater than a preset threshold.
In another optional embodiment based on the embodiment shown in FIG. 10, the image detection module 501 is configured to:
obtain, according to the probability corresponding to each pixel, an image block that meets a first preset condition, and determine the image block meeting the first preset condition as a target image block, where the first preset condition means containing more than a preset number of contiguous target pixels, a target pixel being a pixel whose probability is greater than a preset threshold; and
determine a rectangular region that contains the target image block and meets a second preset condition as the target candidate region, the second preset condition being that the proportion of the target image block within the rectangular region is greater than a preset ratio.
In another optional embodiment based on the embodiment shown in FIG. 10, the image recognition module 503 is configured to:
perform feature extraction on the target candidate region by using the image recognition model, to obtain image features of the target candidate region;
determine, according to the image features of the target candidate region, a first probability distribution of the target object in the target candidate region over a plurality of recognition results; and
determine the recognition result corresponding to the maximum value in the first probability distribution as the recognition result of the target image.
In another optional embodiment based on the embodiment shown in FIG. 10, the image recognition module 503 is configured to:
preprocess the target candidate region to obtain a processed target candidate region, a resolution of the processed target candidate region reaching a preset resolution;
perform feature extraction on the processed target candidate region by using the image recognition model, to obtain image features of the processed target candidate region;
determine, according to the image features of the processed target candidate region, a second probability distribution of the target object in the target candidate region over a plurality of recognition results; and
determine the recognition result corresponding to the maximum value in the second probability distribution as the recognition result of the target image.
In another optional embodiment based on the embodiment shown in FIG. 10, the image detection model includes an input layer, a convolutional layer, a pooling layer, an up-convolutional layer, a concatenation layer, a normalization layer, and an output layer; the input layer is used to input the target image; the convolutional layer is used to convert the target image into feature maps; the pooling layer is used to pool the feature maps output by the convolutional layer, so as to reduce the number of features in the feature maps; the up-convolutional layer is used to perform an up-convolution operation on the feature maps output by the convolutional layer; the concatenation layer is used to concatenate the feature maps processed by the pooling layer and the up-convolutional layer, to obtain concatenated feature maps; the normalization layer is used to normalize the concatenated feature maps, to obtain position information of the target candidate region; and the output layer is used to output the position information of the target candidate region.
In another optional embodiment based on the embodiment shown in FIG. 10, the image recognition model includes an input layer, a convolutional layer, a pooling layer, a normalization layer, and an output layer; the input layer is used to input the target candidate region; the convolutional layer is used to convert the target candidate region into feature maps; the pooling layer is used to pool the feature maps, so as to reduce the number of features in the feature maps; the normalization layer is used to normalize the feature maps processed by the convolutional layer and the pooling layer, to obtain the recognition result; and the output layer is used to output the recognition result.
In another optional embodiment based on the embodiment shown in FIG. 10, please refer to FIG. 11: the apparatus further includes a ratio obtaining module 504 (not shown in the figure).
The ratio obtaining module 504 is configured to obtain the proportion of the target candidate region in the target image.
The image recognition module 503 is further configured to, if the proportion is greater than a preset limit, directly perform the step of recognizing the target candidate region by using the image recognition model to obtain the recognition result of the target image.
In another optional embodiment based on the embodiment shown in FIG. 10, please refer to FIG. 11: the apparatus further includes a first obtaining module 505 and a first training module 506 (not shown in the figure).
The first obtaining module 505 is configured to obtain a first training sample set, the first training sample set containing a plurality of first training samples, each first training sample being marked with a region that includes the target and/or a region that does not include the target.
The first training module 506 is configured to train a convolutional neural network (CNN) with the first training sample set, to obtain the image detection model.
In another optional embodiment based on the embodiment shown in FIG. 10, please refer to FIG. 11: the apparatus further includes a second obtaining module 507 and a second training module 508 (not shown in the figure).
The second obtaining module 507 is configured to obtain a second training sample set, the second training sample set containing a plurality of second training samples, each second training sample corresponding to a recognition result.
The second training module 508 is configured to train a convolutional neural network (CNN) with the second training sample set, to obtain the image recognition model.
In summary, the apparatus provided by the embodiments of the present application first preliminarily detects, by means of the image detection model, a target candidate region in the image that may include the target, and then performs recognition based on the detected target candidate region by means of the image recognition model. Combining the two models allows the target in the image to be accurately recognized even when the target occupies a small proportion of the image, which improves the accuracy of image recognition.
FIG. 12 shows a structural block diagram of an electronic device 600 according to an exemplary embodiment of the present application. The electronic device 600 may be a terminal such as a smartphone, a tablet computer, a notebook computer or a desktop computer, or may be a server. In the embodiments of the present application, the description takes the electronic device 600 being a terminal as an example.
Generally, the electronic device 600 includes a processor 601 and a memory 602.
The processor 601 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 601 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 601 may also include a main processor and a coprocessor; the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 601 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 601 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 602 may include one or more computer readable storage media, which may be non-transitory. The memory 602 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash storage devices. In some embodiments, the non-transitory computer readable storage medium in the memory 602 is used to store at least one instruction, which is executed by the processor 601 to implement the image recognition method provided by the method embodiments of the present application.
In some embodiments, the electronic device 600 optionally further includes a peripheral interface 603 and at least one peripheral. The processor 601, the memory 602, and the peripheral interface 603 may be connected by buses or signal lines, and each peripheral may be connected to the peripheral interface 603 by a bus, a signal line, or a circuit board. Specifically, the peripherals include at least one of a radio frequency circuit 604, a touch display screen 605, a camera 606, an audio circuit 607, a positioning component 608, and a power source 609.
The peripheral interface 603 can be used to connect at least one I/O (Input/Output) related peripheral to the processor 601 and the memory 602. In some embodiments, the processor 601, the memory 602, and the peripheral interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 601, the memory 602, and the peripheral interface 603 can be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 604 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 604 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 604 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 604 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 604 can communicate with other electronic devices via at least one wireless communication protocol, which includes, but is not limited to, the World Wide Web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 604 may also include circuits related to NFC (Near Field Communication), which is not limited in the present application.
The display screen 605 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 605 is a touch display screen, the display screen 605 also has the ability to collect touch signals on or above the surface of the display screen 605; the touch signals may be input to the processor 601 as control signals for processing. At this point, the display screen 605 may also be used to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 605, disposed on the front panel of the electronic device 600; in other embodiments, there may be at least two display screens 605, disposed on different surfaces of the electronic device 600 or in a folded design; in still other embodiments, the display screen 605 may be a flexible display screen disposed on a curved surface or a folded surface of the electronic device 600. The display screen 605 may even be set to a non-rectangular irregular shape, that is, an irregularly-shaped screen. The display screen 605 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 606 is used to capture images or video. Optionally, the camera assembly 606 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the electronic device, and the rear camera is disposed on the back of the electronic device. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blur function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting or other fused shooting functions. In some embodiments, the camera assembly 606 may also include a flash. The flash may be a monochrome temperature flash or a dual color temperature flash; a dual color temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.
The audio circuit 607 may include a microphone and a speaker. The microphone is used to collect sound waves of the user and the environment, and convert the sound waves into electrical signals that are input to the processor 601 for processing, or input to the radio frequency circuit 604 for voice communication. For stereo capture or noise reduction, there may be multiple microphones, disposed at different parts of the electronic device 600. The microphone may also be an array microphone or an omnidirectional collection microphone. The speaker is used to convert electrical signals from the processor 601 or the radio frequency circuit 604 into sound waves. The speaker may be a conventional film speaker or a piezoelectric ceramic speaker; when the speaker is a piezoelectric ceramic speaker, it can convert electrical signals not only into sound waves audible to humans, but also into sound waves inaudible to humans for purposes such as ranging. In some embodiments, the audio circuit 607 may also include a headphone jack.
The positioning component 608 is used to locate the current geographic location of the electronic device 600, to implement navigation or LBS (Location Based Service). The positioning component 608 may be a positioning component based on the GPS (Global Positioning System) of the United States, the Beidou system of China, or the Galileo system of the European Union.
The power source 609 is used to supply power to the components in the electronic device 600. The power source 609 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power source 609 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery; a wired rechargeable battery is a battery charged through a wired line, and a wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery can also be used to support fast charging technology.
In some embodiments, the electronic device 600 further includes one or more sensors 610, including but not limited to an acceleration sensor 611, a gyro sensor 612, a pressure sensor 613, a fingerprint sensor 614, an optical sensor 615, and a proximity sensor 616.
The acceleration sensor 611 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established by the electronic device 600. For example, the acceleration sensor 611 can be used to detect the components of the gravitational acceleration on the three coordinate axes. The processor 601 can control the touch display screen 605 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 611. The acceleration sensor 611 can also be used to collect game or user motion data.
The gyro sensor 612 can detect the body direction and rotation angle of the electronic device 600, and can cooperate with the acceleration sensor 611 to collect the user's 3D actions on the electronic device 600. Based on the data collected by the gyro sensor 612, the processor 601 can realize functions such as motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 613 may be disposed on the side frame of the electronic device 600 and/or the lower layer of the touch display screen 605. When the pressure sensor 613 is disposed on the side frame of the electronic device 600, the user's holding signal to the electronic device 600 can be detected, and the processor 601 performs left/right hand recognition or shortcut operations according to the holding signal collected by the pressure sensor 613. When the pressure sensor 613 is disposed on the lower layer of the touch display screen 605, the processor 601 controls the operability controls on the UI according to the user's pressure operation on the touch display screen 605. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 614 is used to collect the user's fingerprint. The processor 601 identifies the user's identity according to the fingerprint collected by the fingerprint sensor 614, or the fingerprint sensor 614 identifies the user's identity according to the collected fingerprint. Upon identifying that the user's identity is a trusted identity, the processor 601 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 614 may be disposed on the front, back, or side of the electronic device 600. When a physical button or a manufacturer logo is disposed on the electronic device 600, the fingerprint sensor 614 may be integrated with the physical button or the manufacturer logo.
The optical sensor 615 is used to collect the ambient light intensity. In one embodiment, the processor 601 can control the display brightness of the touch display screen 605 according to the ambient light intensity collected by the optical sensor 615: when the ambient light intensity is high, the display brightness of the touch display screen 605 is raised; when the ambient light intensity is low, the display brightness of the touch display screen 605 is lowered. In another embodiment, the processor 601 can also dynamically adjust the shooting parameters of the camera assembly 606 according to the ambient light intensity collected by the optical sensor 615.
The proximity sensor 616, also called a distance sensor, is usually disposed on the front panel of the electronic device 600. The proximity sensor 616 is used to collect the distance between the user and the front of the electronic device 600. In one embodiment, when the proximity sensor 616 detects that the distance between the user and the front of the electronic device 600 gradually decreases, the processor 601 controls the touch display screen 605 to switch from the screen-on state to the screen-off state; when the proximity sensor 616 detects that the distance between the user and the front of the electronic device 600 gradually increases, the processor 601 controls the touch display screen 605 to switch from the screen-off state to the screen-on state.
Those skilled in the art will understand that the structure shown in FIG. 12 does not constitute a limitation to the electronic device 600; more or fewer components than those illustrated may be included, some components may be combined, or a different component arrangement may be employed.
In an exemplary embodiment, a computer readable storage medium is also provided, the storage medium storing at least one instruction, at least one program, a code set or an instruction set, the at least one instruction, the at least one program, the code set or the instruction set being loaded and executed by a processor of a terminal to implement the image recognition method in the above method embodiments.
Optionally, the above computer readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
It should be understood that "a plurality of" mentioned herein means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate three cases: A exists alone, both A and B exist, and B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it. The terms "first", "second", and similar words used herein do not denote any order, quantity, or importance, but are merely used to distinguish different components.
The sequence numbers of the above embodiments of the present application are merely for description and do not represent the superiority or inferiority of the embodiments.
The above are merely exemplary embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims (18)

  1. An image recognition method, applied to an electronic device, the method comprising:
    detecting a target candidate region in a target image by using an image detection model, the target candidate region being an image block containing a target object;
    extracting the target candidate region when the target candidate region is detected from the target image; and
    performing image recognition based on the target candidate region by using an image recognition model, to obtain a recognition result of the target image.
  2. The method according to claim 1, wherein the detecting a target candidate region in a target image by using an image detection model comprises:
    obtaining, by using the image detection model, a probability that each pixel in the target image belongs to the target object; and
    determining the target candidate region according to the probability corresponding to each pixel, the target candidate region comprising pixels whose probabilities are greater than a preset threshold.
  3. The method according to claim 2, wherein the determining the target candidate region according to the probability corresponding to each pixel comprises:
    obtaining, according to the probability corresponding to each pixel, an image block that meets a first preset condition, and determining the image block meeting the first preset condition as a target image block, wherein the first preset condition means containing more than a preset number of contiguous target pixels, a target pixel being a pixel whose probability is greater than a preset threshold; and
    determining a rectangular region that contains the target image block and meets a second preset condition as the target candidate region, the second preset condition being that the proportion of the target image block within the rectangular region is greater than a preset ratio.
  4. The method according to claim 1, wherein the performing image recognition based on the target candidate region by using an image recognition model, to obtain a recognition result of the target image, comprises:
    performing feature extraction on the target candidate region by using the image recognition model, to obtain image features of the target candidate region;
    determining, according to the image features of the target candidate region, a first probability distribution of the target object in the target candidate region over a plurality of recognition results; and
    determining the recognition result corresponding to the maximum value in the first probability distribution as the recognition result of the target image.
  5. The method according to claim 1, wherein the performing image recognition based on the target candidate region by using an image recognition model, to obtain a recognition result of the target image, comprises:
    preprocessing the target candidate region to obtain a processed target candidate region, a resolution of the processed target candidate region reaching a preset resolution;
    performing feature extraction on the processed target candidate region by using the image recognition model, to obtain image features of the processed target candidate region;
    determining, according to the image features of the processed target candidate region, a second probability distribution of the target object in the target candidate region over a plurality of recognition results; and
    determining the recognition result corresponding to the maximum value in the second probability distribution as the recognition result of the target image.
  6. The method according to claim 1, wherein the image detection model comprises an input layer, a convolutional layer, a pooling layer, an up-convolutional layer, a concatenation layer, a normalization layer, and an output layer;
    the input layer is used to input the target image;
    the convolutional layer is used to convert the target image into feature maps;
    the pooling layer is used to pool the feature maps output by the convolutional layer, so as to reduce the number of features in the feature maps;
    the up-convolutional layer is used to perform an up-convolution operation on the feature maps output by the convolutional layer;
    the concatenation layer is used to concatenate the feature maps processed by the pooling layer and the up-convolutional layer, to obtain concatenated feature maps;
    the normalization layer is used to normalize the concatenated feature maps, to obtain position information of the target candidate region; and
    the output layer is used to output the position information of the target candidate region.
  7. The method according to claim 1, wherein the image recognition model comprises an input layer, a convolutional layer, a pooling layer, a normalization layer, and an output layer;
    the input layer is used to input the target candidate region;
    the convolutional layer is used to convert the target candidate region into feature maps;
    the pooling layer is used to pool the feature maps, so as to reduce the number of features in the feature maps;
    the normalization layer is used to normalize the feature maps processed by the convolutional layer and the pooling layer, to obtain the recognition result; and
    the output layer is used to output the recognition result.
  8. The method according to any one of claims 1 to 7, wherein before the extracting the target candidate region, the method further comprises:
    obtaining the proportion of the target candidate region in the target image; and
    if the proportion is greater than a preset limit, directly performing the step of recognizing the target candidate region by using the image recognition model to obtain the recognition result of the target image.
  9. An image recognition apparatus, applied to an electronic device, the apparatus comprising:
    an image detection module, configured to detect a target candidate region in a target image by using an image detection model, the target candidate region being an image block containing a target object;
    a region extraction module, configured to extract the target candidate region when the target candidate region is detected from the target image; and
    an image recognition module, configured to perform image recognition based on the target candidate region by using an image recognition model, to obtain a recognition result of the target image.
  10. The apparatus according to claim 9, wherein the image detection module is configured to:
    obtain, by using the image detection model, a probability that each pixel in the target image belongs to the target object; and
    determine the target candidate region according to the probability corresponding to each pixel, the target candidate region comprising pixels whose probabilities are greater than a preset threshold.
  11. The apparatus according to claim 10, wherein the image detection module is configured to:
    obtain, according to the probability corresponding to each pixel, an image block that meets a first preset condition, and determine the image block meeting the first preset condition as a target image block, wherein the first preset condition means containing more than a preset number of contiguous target pixels, a target pixel being a pixel whose probability is greater than a preset threshold; and
    determine a rectangular region that contains the target image block and meets a second preset condition as the target candidate region, the second preset condition being that the proportion of the target image block within the rectangular region is greater than a preset ratio.
  12. The apparatus according to claim 9, wherein the image recognition module is configured to:
    perform feature extraction on the target candidate region by using the image recognition model, to obtain image features of the target candidate region;
    determine, according to the image features of the target candidate region, a first probability distribution of the target object in the target candidate region over a plurality of recognition results; and
    determine the recognition result corresponding to the maximum value in the first probability distribution as the recognition result of the target image.
  13. The apparatus according to claim 9, wherein the image recognition module is configured to:
    preprocess the target candidate region to obtain a processed target candidate region, a resolution of the processed target candidate region reaching a preset resolution;
    perform feature extraction on the processed target candidate region by using the image recognition model, to obtain image features of the processed target candidate region;
    determine, according to the image features of the processed target candidate region, a second probability distribution of the target object in the target candidate region over a plurality of recognition results; and
    determine the recognition result corresponding to the maximum value in the second probability distribution as the recognition result of the target image.
  14. The apparatus according to claim 9, wherein the image detection model comprises an input layer, a convolutional layer, a pooling layer, an up-convolutional layer, a concatenation layer, a normalization layer, and an output layer;
    the input layer is used to input the target image;
    the convolutional layer is used to convert the target image into feature maps;
    the pooling layer is used to pool the feature maps output by the convolutional layer, so as to reduce the number of features in the feature maps;
    the up-convolutional layer is used to perform an up-convolution operation on the feature maps output by the convolutional layer;
    the concatenation layer is used to concatenate the feature maps processed by the pooling layer and the up-convolutional layer, to obtain concatenated feature maps;
    the normalization layer is used to normalize the concatenated feature maps, to obtain position information of the target candidate region; and
    the output layer is used to output the position information of the target candidate region.
  15. The apparatus according to claim 9, wherein the image recognition model comprises an input layer, a convolutional layer, a pooling layer, a normalization layer, and an output layer;
    the input layer is used to input the target candidate region;
    the convolutional layer is used to convert the target candidate region into feature maps;
    the pooling layer is used to pool the feature maps, so as to reduce the number of features in the feature maps;
    the normalization layer is used to normalize the feature maps processed by the convolutional layer and the pooling layer, to obtain the recognition result; and
    the output layer is used to output the recognition result.
  16. The apparatus according to any one of claims 9 to 15, wherein the apparatus further comprises:
    a ratio obtaining module, configured to obtain the proportion of the target candidate region in the target image;
    the image recognition module is further configured to, if the proportion is greater than a preset limit, directly perform the step of recognizing the target candidate region by using the image recognition model to obtain the recognition result of the target image.
  17. An electronic device, comprising a processor and a memory, the memory storing at least one instruction, at least one program, a code set or an instruction set, the at least one instruction, the at least one program, the code set or the instruction set being loaded and executed by the processor to implement the image recognition method according to any one of claims 1 to 8.
  18. A computer readable storage medium, the computer readable storage medium storing at least one instruction, at least one program, a code set or an instruction set, the at least one instruction, the at least one program, the code set or the instruction set being loaded and executed by a processor to implement the image recognition method according to any one of claims 1 to 8.
PCT/CN2018/116044 2017-11-23 2018-11-16 图像识别方法、装置及电子设备 WO2019101021A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711180320.X 2017-11-23
CN201711180320.XA CN109829456B (zh) 2017-11-23 2017-11-23 图像识别方法、装置及终端

Publications (1)

Publication Number Publication Date
WO2019101021A1 true WO2019101021A1 (zh) 2019-05-31

Family

ID=66631339

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/116044 WO2019101021A1 (zh) 2017-11-23 2018-11-16 图像识别方法、装置及电子设备

Country Status (2)

Country Link
CN (1) CN109829456B (zh)
WO (1) WO2019101021A1 (zh)

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110335224A (zh) * 2019-07-05 2019-10-15 腾讯科技(深圳)有限公司 图像处理方法、装置、计算机设备及存储介质
CN110400304A (zh) * 2019-07-25 2019-11-01 腾讯科技(深圳)有限公司 基于深度学习的物体检测方法、装置、设备及存储介质
CN110517261A (zh) * 2019-08-30 2019-11-29 上海眼控科技股份有限公司 安全带状态检测方法、装置、计算机设备和存储介质
CN110647881A (zh) * 2019-09-19 2020-01-03 腾讯科技(深圳)有限公司 确定图像对应的卡片类型的方法、装置、设备及存储介质
CN110765525A (zh) * 2019-10-18 2020-02-07 Oppo广东移动通信有限公司 生成场景图片的方法、装置、电子设备及介质
CN110807361A (zh) * 2019-09-19 2020-02-18 腾讯科技(深圳)有限公司 人体识别方法、装置、计算机设备及存储介质
CN110991491A (zh) * 2019-11-12 2020-04-10 苏州智加科技有限公司 图像标注方法、装置、设备及存储介质
CN110991298A (zh) * 2019-11-26 2020-04-10 腾讯科技(深圳)有限公司 图像的处理方法和装置、存储介质及电子装置
CN111144408A (zh) * 2019-12-24 2020-05-12 Oppo广东移动通信有限公司 一种图像识别方法、图像识别装置、电子设备和存储介质
CN111161195A (zh) * 2020-01-02 2020-05-15 重庆特斯联智慧科技股份有限公司 一种特征图处理方法、装置、存储介质及终端
CN111178126A (zh) * 2019-11-20 2020-05-19 北京迈格威科技有限公司 目标检测方法、装置、计算机设备和存储介质
CN111242070A (zh) * 2020-01-19 2020-06-05 上海眼控科技股份有限公司 目标物体检测方法、计算机设备和存储介质
CN111292377A (zh) * 2020-03-11 2020-06-16 南京旷云科技有限公司 目标检测方法、装置、计算机设备和存储介质
CN111325258A (zh) * 2020-02-14 2020-06-23 腾讯科技(深圳)有限公司 特征信息获取方法、装置、设备及存储介质
CN111428806A (zh) * 2020-04-03 2020-07-17 北京达佳互联信息技术有限公司 图像标签确定方法、装置、电子设备及存储介质
CN111444906A (zh) * 2020-03-24 2020-07-24 腾讯科技(深圳)有限公司 基于人工智能的图像识别方法和相关装置
CN111611947A (zh) * 2020-05-25 2020-09-01 济南博观智能科技有限公司 一种车牌检测方法、装置、设备及介质
CN111711750A (zh) * 2020-06-05 2020-09-25 腾讯科技(深圳)有限公司 基于人工智能的图像处理方法、装置、设备及介质
CN111797754A (zh) * 2020-06-30 2020-10-20 上海掌门科技有限公司 图像检测的方法、装置、电子设备及介质
CN111860485A (zh) * 2020-07-24 2020-10-30 腾讯科技(深圳)有限公司 图像识别模型的训练方法、图像的识别方法、装置、设备
CN112288345A (zh) * 2019-07-25 2021-01-29 顺丰科技有限公司 装卸口状态检测方法、装置、服务器及存储介质
CN112700494A (zh) * 2019-10-23 2021-04-23 北京灵汐科技有限公司 定位方法、装置、电子设备及计算机可读存储介质
CN112766257A (zh) * 2019-10-21 2021-05-07 阿里巴巴集团控股有限公司 一种目标区域的选定方法、装置、以及电子设备
CN112785567A (zh) * 2021-01-15 2021-05-11 北京百度网讯科技有限公司 地图检测方法、装置、电子设备和存储介质
CN112818979A (zh) * 2020-08-26 2021-05-18 腾讯科技(深圳)有限公司 文本识别方法、装置、设备及存储介质
CN113034427A (zh) * 2019-12-25 2021-06-25 合肥欣奕华智能机器有限公司 图像识别方法及图像识别装置
CN113221920A (zh) * 2021-05-20 2021-08-06 北京百度网讯科技有限公司 图像识别方法、装置、设备、存储介质以及计算机程序产品
CN113704554A (zh) * 2021-07-13 2021-11-26 湖南中惠旅智能科技有限责任公司 基于电子地图的视频检索方法及***
CN113807410A (zh) * 2021-08-27 2021-12-17 北京百度网讯科技有限公司 图像识别方法、装置以及电子设备
CN114049518A (zh) * 2021-11-10 2022-02-15 北京百度网讯科技有限公司 图像分类方法、装置、电子设备和存储介质
US11294047B2 (en) * 2019-12-23 2022-04-05 Sensetime International Pte. Ltd. Method, apparatus, and system for recognizing target object
CN115209032A (zh) * 2021-04-09 2022-10-18 美智纵横科技有限责任公司 基于清洁机器人的图像采集方法、装置、电子设备及介质
WO2023045602A1 (zh) * 2021-09-27 2023-03-30 杭州海康威视***技术有限公司 一种图像识别方法及电子设备
CN116188919A (zh) * 2023-04-25 2023-05-30 之江实验室 一种测试方法、装置、可读存储介质及电子设备
CN117576490A (zh) * 2024-01-16 2024-02-20 口碑(上海)信息技术有限公司 一种后厨环境检测方法和装置、存储介质和电子设备

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110390261B (zh) * 2019-06-13 2022-06-17 北京汽车集团有限公司 目标检测方法、装置、计算机可读存储介质及电子设备
CN112115748B (zh) * 2019-06-21 2023-08-25 腾讯科技(深圳)有限公司 证件图像识别方法、装置、终端及存储介质
CN112183158B (zh) * 2019-07-03 2023-07-21 九阳股份有限公司 一种谷物烹饪设备的谷物种类识别方法和谷物烹饪设备
CN110516636A (zh) * 2019-08-30 2019-11-29 盈盛智创科技(广州)有限公司 一种工序的监测方法、装置、计算机设备和存储介质
CN110705633B (zh) * 2019-09-27 2022-06-07 北京猎户星空科技有限公司 目标物检测、目标物检测模型的建立方法及装置
CN111091147B (zh) * 2019-12-10 2024-01-19 东软集团股份有限公司 一种图像分类方法、装置及设备
CN111563086B (zh) * 2020-01-13 2023-09-19 杭州海康威视***技术有限公司 信息关联方法、装置、设备及存储介质
CN111368682B (zh) * 2020-02-27 2023-12-12 上海电力大学 一种基于faster RCNN台标检测与识别的方法及***
CN111626208B (zh) * 2020-05-27 2023-06-13 阿波罗智联(北京)科技有限公司 用于检测小目标的方法和装置
CN111783878B (zh) * 2020-06-29 2023-08-04 北京百度网讯科技有限公司 目标检测方法、装置、电子设备以及可读存储介质
CN112902987B (zh) * 2021-02-02 2022-07-15 北京三快在线科技有限公司 一种位姿修正的方法及装置
CN113011418B (zh) * 2021-02-09 2024-02-23 杭州海康慧影科技有限公司 确定图像中待处理区域的方法、装置、设备
CN112990387B (zh) * 2021-05-17 2021-07-20 腾讯科技(深圳)有限公司 模型优化方法、相关设备及存储介质
CN114489549B (zh) * 2022-01-30 2023-04-25 深圳创维-Rgb电子有限公司 投屏图像处理方法、装置、电子设备及存储介质
CN115994947B (zh) * 2023-03-22 2023-06-02 万联易达物流科技有限公司 基于定位的智能打卡的估算方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104850845A (zh) * 2015-05-30 2015-08-19 大连理工大学 一种基于非对称卷积神经网络的交通标志识别方法
CN105320945A (zh) * 2015-10-30 2016-02-10 小米科技有限责任公司 图像分类的方法及装置
CN106446784A (zh) * 2016-08-30 2017-02-22 东软集团股份有限公司 一种图像检测方法及装置
CN107194393A (zh) * 2016-03-15 2017-09-22 杭州海康威视数字技术股份有限公司 一种检测临时车牌的方法及装置

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9514381B1 (en) * 2013-03-15 2016-12-06 Pandoodle Corporation Method of identifying and replacing an object or area in a digital image with another object or area
CN106504233B (zh) * 2016-10-18 2019-04-09 国网山东省电力公司电力科学研究院 基于Faster R-CNN的无人机巡检图像电力小部件识别方法及***
CN107273836A (zh) * 2017-06-07 2017-10-20 深圳市深网视界科技有限公司 一种行人检测识别方法、装置、模型和介质

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104850845A (zh) * 2015-05-30 2015-08-19 大连理工大学 一种基于非对称卷积神经网络的交通标志识别方法
CN105320945A (zh) * 2015-10-30 2016-02-10 小米科技有限责任公司 图像分类的方法及装置
CN107194393A (zh) * 2016-03-15 2017-09-22 杭州海康威视数字技术股份有限公司 一种检测临时车牌的方法及装置
CN106446784A (zh) * 2016-08-30 2017-02-22 东软集团股份有限公司 一种图像检测方法及装置

Cited By (58)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110335224B (zh) * 2019-07-05 2022-12-13 腾讯科技(深圳)有限公司 图像处理方法、装置、计算机设备及存储介质
CN110335224A (zh) * 2019-07-05 2019-10-15 腾讯科技(深圳)有限公司 图像处理方法、装置、计算机设备及存储介质
CN110400304A (zh) * 2019-07-25 2019-11-01 腾讯科技(深圳)有限公司 基于深度学习的物体检测方法、装置、设备及存储介质
CN110400304B (zh) * 2019-07-25 2023-12-12 腾讯科技(深圳)有限公司 基于深度学习的物体检测方法、装置、设备及存储介质
CN112288345A (zh) * 2019-07-25 2021-01-29 顺丰科技有限公司 装卸口状态检测方法、装置、服务器及存储介质
CN110517261A (zh) * 2019-08-30 2019-11-29 上海眼控科技股份有限公司 安全带状态检测方法、装置、计算机设备和存储介质
CN110647881A (zh) * 2019-09-19 2020-01-03 腾讯科技(深圳)有限公司 确定图像对应的卡片类型的方法、装置、设备及存储介质
CN110807361B (zh) * 2019-09-19 2023-08-08 腾讯科技(深圳)有限公司 人体识别方法、装置、计算机设备及存储介质
CN110647881B (zh) * 2019-09-19 2023-09-05 腾讯科技(深圳)有限公司 确定图像对应的卡片类型的方法、装置、设备及存储介质
CN110807361A (zh) * 2019-09-19 2020-02-18 腾讯科技(深圳)有限公司 人体识别方法、装置、计算机设备及存储介质
CN110765525B (zh) * 2019-10-18 2023-11-10 Oppo广东移动通信有限公司 生成场景图片的方法、装置、电子设备及介质
CN110765525A (zh) * 2019-10-18 2020-02-07 Oppo广东移动通信有限公司 生成场景图片的方法、装置、电子设备及介质
CN112766257A (zh) * 2019-10-21 2021-05-07 阿里巴巴集团控股有限公司 一种目标区域的选定方法、装置、以及电子设备
CN112766257B (zh) * 2019-10-21 2024-04-12 阿里巴巴集团控股有限公司 一种目标区域的选定方法、装置、以及电子设备
CN112700494A (zh) * 2019-10-23 2021-04-23 北京灵汐科技有限公司 定位方法、装置、电子设备及计算机可读存储介质
CN110991491A (zh) * 2019-11-12 2020-04-10 苏州智加科技有限公司 图像标注方法、装置、设备及存储介质
CN111178126A (zh) * 2019-11-20 2020-05-19 北京迈格威科技有限公司 目标检测方法、装置、计算机设备和存储介质
CN110991298B (zh) * 2019-11-26 2023-07-14 腾讯科技(深圳)有限公司 图像的处理方法和装置、存储介质及电子装置
CN110991298A (zh) * 2019-11-26 2020-04-10 腾讯科技(深圳)有限公司 图像的处理方法和装置、存储介质及电子装置
US11294047B2 (en) * 2019-12-23 2022-04-05 Sensetime International Pte. Ltd. Method, apparatus, and system for recognizing target object
CN111144408A (zh) * 2019-12-24 2020-05-12 Oppo广东移动通信有限公司 一种图像识别方法、图像识别装置、电子设备和存储介质
CN113034427A (zh) * 2019-12-25 2021-06-25 合肥欣奕华智能机器有限公司 图像识别方法及图像识别装置
CN113034427B (zh) * 2019-12-25 2024-01-23 合肥欣奕华智能机器股份有限公司 图像识别方法及图像识别装置
CN111161195B (zh) * 2020-01-02 2023-10-13 重庆特斯联智慧科技股份有限公司 一种特征图处理方法、装置、存储介质及终端
CN111161195A (zh) * 2020-01-02 2020-05-15 重庆特斯联智慧科技股份有限公司 一种特征图处理方法、装置、存储介质及终端
CN111242070A (zh) * 2020-01-19 2020-06-05 上海眼控科技股份有限公司 目标物体检测方法、计算机设备和存储介质
CN111325258A (zh) * 2020-02-14 2020-06-23 腾讯科技(深圳)有限公司 特征信息获取方法、装置、设备及存储介质
CN111325258B (zh) * 2020-02-14 2023-10-24 腾讯科技(深圳)有限公司 特征信息获取方法、装置、设备及存储介质
CN111292377A (zh) * 2020-03-11 2020-06-16 南京旷云科技有限公司 目标检测方法、装置、计算机设备和存储介质
CN111292377B (zh) * 2020-03-11 2024-01-23 南京旷云科技有限公司 目标检测方法、装置、计算机设备和存储介质
CN111444906A (zh) * 2020-03-24 2020-07-24 腾讯科技(深圳)有限公司 基于人工智能的图像识别方法和相关装置
CN111444906B (zh) * 2020-03-24 2023-09-29 腾讯科技(深圳)有限公司 基于人工智能的图像识别方法和相关装置
CN111428806A (zh) * 2020-04-03 2020-07-17 北京达佳互联信息技术有限公司 图像标签确定方法、装置、电子设备及存储介质
CN111428806B (zh) * 2020-04-03 2023-10-10 北京达佳互联信息技术有限公司 图像标签确定方法、装置、电子设备及存储介质
CN111611947B (zh) * 2020-05-25 2024-04-09 济南博观智能科技有限公司 一种车牌检测方法、装置、设备及介质
CN111611947A (zh) * 2020-05-25 2020-09-01 济南博观智能科技有限公司 一种车牌检测方法、装置、设备及介质
CN111711750B (zh) * 2020-06-05 2023-11-07 腾讯科技(深圳)有限公司 基于人工智能的图像处理方法、装置、设备及介质
CN111711750A (zh) * 2020-06-05 2020-09-25 腾讯科技(深圳)有限公司 基于人工智能的图像处理方法、装置、设备及介质
CN111797754A (zh) * 2020-06-30 2020-10-20 上海掌门科技有限公司 图像检测的方法、装置、电子设备及介质
CN111860485B (zh) * 2020-07-24 2024-04-26 腾讯科技(深圳)有限公司 图像识别模型的训练方法、图像的识别方法、装置、设备
CN111860485A (zh) * 2020-07-24 2020-10-30 腾讯科技(深圳)有限公司 图像识别模型的训练方法、图像的识别方法、装置、设备
CN112818979B (zh) * 2020-08-26 2024-02-02 腾讯科技(深圳)有限公司 文本识别方法、装置、设备及存储介质
CN112818979A (zh) * 2020-08-26 2021-05-18 腾讯科技(深圳)有限公司 文本识别方法、装置、设备及存储介质
CN112785567A (zh) * 2021-01-15 2021-05-11 北京百度网讯科技有限公司 地图检测方法、装置、电子设备和存储介质
CN112785567B (zh) * 2021-01-15 2023-09-22 北京百度网讯科技有限公司 地图检测方法、装置、电子设备和存储介质
CN115209032B (zh) * 2021-04-09 2024-04-16 美智纵横科技有限责任公司 基于清洁机器人的图像采集方法、装置、电子设备及介质
CN115209032A (zh) * 2021-04-09 2022-10-18 美智纵横科技有限责任公司 基于清洁机器人的图像采集方法、装置、电子设备及介质
CN113221920B (zh) * 2021-05-20 2024-01-12 北京百度网讯科技有限公司 图像识别方法、装置、设备、存储介质以及计算机程序产品
CN113221920A (zh) * 2021-05-20 2021-08-06 北京百度网讯科技有限公司 图像识别方法、装置、设备、存储介质以及计算机程序产品
CN113704554B (zh) * 2021-07-13 2024-03-29 湖南中惠旅智能科技有限责任公司 基于电子地图的视频检索方法及***
CN113704554A (zh) * 2021-07-13 2021-11-26 湖南中惠旅智能科技有限责任公司 基于电子地图的视频检索方法及***
CN113807410A (zh) * 2021-08-27 2021-12-17 北京百度网讯科技有限公司 图像识别方法、装置以及电子设备
CN113807410B (zh) * 2021-08-27 2023-09-05 北京百度网讯科技有限公司 图像识别方法、装置以及电子设备
WO2023045602A1 (zh) * 2021-09-27 2023-03-30 杭州海康威视***技术有限公司 一种图像识别方法及电子设备
CN114049518A (zh) * 2021-11-10 2022-02-15 北京百度网讯科技有限公司 图像分类方法、装置、电子设备和存储介质
CN116188919A (zh) * 2023-04-25 2023-05-30 之江实验室 一种测试方法、装置、可读存储介质及电子设备
CN117576490A (zh) * 2024-01-16 2024-02-20 口碑(上海)信息技术有限公司 一种后厨环境检测方法和装置、存储介质和电子设备
CN117576490B (zh) * 2024-01-16 2024-04-05 口碑(上海)信息技术有限公司 一种后厨环境检测方法和装置、存储介质和电子设备

Also Published As

Publication number Publication date
CN109829456A (zh) 2019-05-31
CN109829456B (zh) 2022-05-17

Similar Documents

Publication Publication Date Title
WO2019101021A1 (zh) 图像识别方法、装置及电子设备
CN109034102B (zh) 人脸活体检测方法、装置、设备及存储介质
US20210349940A1 (en) Video clip positioning method and apparatus, computer device, and storage medium
CN111079576B (zh) 活体检测方法、装置、设备及存储介质
CN108594997B (zh) 手势骨架构建方法、装置、设备及存储介质
WO2019105285A1 (zh) 人脸属性识别方法、电子设备及存储介质
EP3779883A1 (en) Method and device for repositioning in camera orientation tracking process, and storage medium
WO2019219065A1 (zh) 视频分析的方法和装置
CN110647865A (zh) 人脸姿态的识别方法、装置、设备及存储介质
CN110059652B (zh) 人脸图像处理方法、装置及存储介质
US20220309836A1 (en) Ai-based face recognition method and apparatus, device, and medium
CN109684980B (zh) 自动阅卷方法及装置
CN110807361A (zh) 人体识别方法、装置、计算机设备及存储介质
CN110490179B (zh) 车牌识别方法、装置及存储介质
CN110795019B (zh) 软键盘的按键识别方法、装置及存储介质
CN109360222B (zh) 图像分割方法、装置及存储介质
CN109522863B (zh) 耳部关键点检测方法、装置及存储介质
WO2022042425A1 (zh) 视频数据处理方法、装置、计算机设备及存储介质
CN111027490B (zh) 人脸属性识别方法及装置、存储介质
CN110991457A (zh) 二维码处理方法、装置、电子设备及存储介质
CN110991445B (zh) 竖排文字识别方法、装置、设备及介质
CN111754386A (zh) 图像区域屏蔽方法、装置、设备及存储介质
KR20230071720A (ko) 얼굴 이미지의 랜드마크 좌표 예측 방법 및 장치
CN111860064B (zh) 基于视频的目标检测方法、装置、设备及存储介质
CN110163192B (zh) 字符识别方法、装置及可读介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18881277

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18881277

Country of ref document: EP

Kind code of ref document: A1