WO2018228375A1 - Method and Apparatus for Target Recognition in a Deformed Image - Google Patents

Method and apparatus for target recognition in a deformed image

Info

Publication number
WO2018228375A1
Authority: WIPO (PCT)
Prior art keywords: image, recognized, identified, preset, corrected
Application number: PCT/CN2018/090826
Other languages: English (en), French (fr)
Inventors: 许昀璐, 郑钢, 程战战, 钮毅
Original Assignee: 杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.)
Application filed by 杭州海康威视数字技术股份有限公司
Priority to US16/622,197 (granted as US11126888B2)
Priority to EP18817876.8A (published as EP3640844A4)
Publication of WO2018228375A1

Classifications

    • G06V10/443 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections, by matching or filtering
    • G06V10/454 — Biologically inspired filters integrated into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06F18/2136 — Feature extraction by transforming the feature space, based on sparsity criteria, e.g. with an overcomplete basis
    • G06F18/24 — Classification techniques
    • G06N3/02 — Neural networks
    • G06V10/764 — Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/82 — Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06V10/247 — Aligning, centring, orientation detection or correction of the image by affine transforms, e.g. correction due to perspective effects; quadrilaterals, e.g. trapezoids
Definitions

  • the present application relates to the field of image recognition, and in particular to a method and apparatus for object recognition of a deformed image.
  • Target recognition methods based on neural networks use the autonomous learning ability of a neural network to extract image features and obtain a classification result for the target, which serves as the recognition result of the target.
  • With neural networks, the accuracy of target recognition can be improved, and a wider range of target types can be recognized, such as people, animals, plants, buildings, vehicles, characters, and the like.
  • However, existing neural network-based target recognition methods apply a deep neural network model directly to the image; the recognition process does not account for complex scenes in which the target is deformed due to shooting conditions or other causes, which degrades target recognition.
  • Such deformations include the tilt, zoom, and perspective transformation of the target caused by changes of viewing angle during image capture, as well as artificial deformations of the target in natural scenes, such as the tilt and distortion caused by font design and variation encountered in character recognition.
  • Because existing neural network-based target recognition methods classify a heavily deformed target directly, the accuracy of target recognition is low.
  • the purpose of the embodiments of the present application is to provide a target recognition method and device for a deformed image, which can improve the accuracy of target recognition for a deformed image.
  • the specific technical solutions are as follows:
  • The embodiment of the present application discloses a target recognition method for a deformed image, including:
  • inputting an image to be recognized into a preset positioning network and acquiring a plurality of positioning parameters of the image to be recognized, where the preset positioning network includes a preset convolution layer, and the plurality of positioning parameters are obtained by performing regression on the image features in a feature map obtained by convolving the image to be recognized.
  • The inputting the image to be recognized into the preset positioning network and acquiring the plurality of positioning parameters of the image to be recognized includes:
  • obtaining positioning parameters, where each positioning parameter is the coordinate, in the image to be recognized, of the pixel point whose image features match those of one of a preset number of reference points in the corrected image to be recognized.
  • The spatially transforming the image to be recognized according to the plurality of positioning parameters to obtain the corrected image to be recognized includes:
  • acquiring a spatial transformation relationship between the image to be recognized and the corrected image to be recognized according to the positioning parameters corresponding to the preset number of reference points and the coordinates of the preset number of reference points in the corrected image to be recognized.
  • The inputting the corrected image to be recognized into the preset recognition network and acquiring the target classification result of the image to be recognized includes:
  • classifying the image features in the feature map of the corrected image to be recognized by using the fully connected layer in the preset recognition network, and obtaining the target classification result of the image to be recognized.
  • the embodiment of the present application further discloses a target recognition device for a deformed image, including:
  • a positioning module configured to input an image to be recognized into a preset positioning network and acquire a plurality of positioning parameters of the image to be recognized, where the preset positioning network includes a preset convolution layer, and the plurality of positioning parameters are obtained by performing regression on the image features in the feature map obtained by convolving the image to be recognized;
  • a spatial transformation module configured to spatially transform the image to be recognized according to the plurality of positioning parameters to obtain a corrected image to be recognized;
  • a recognition module configured to input the corrected image to be recognized into a preset recognition network and obtain a target classification result of the image to be recognized.
  • the positioning module includes:
  • a feature map obtaining sub-module configured to extract image features of the image to be recognized by using the preset convolution layer, and obtain a feature map of the image to be recognized that contains the image features;
  • a locating sub-module configured to perform regression processing on the image features in the feature map of the image to be recognized by using a fully connected layer in the preset positioning network, and obtain a plurality of positioning parameters of the image to be recognized,
  • where each positioning parameter is the coordinate, in the image to be recognized, of the pixel point whose image features match those of one of a preset number of reference points in the corrected image to be recognized.
  • the space transformation module includes:
  • a transformation relationship acquisition sub-module configured to acquire a spatial transformation relationship between the image to be recognized and the corrected image to be recognized according to the positioning parameters corresponding to the preset number of reference points and the coordinates of the preset number of reference points in the corrected image to be recognized;
  • a correction sub-module configured to calculate, according to the spatial transformation relationship, the coordinates in the corrected image to be recognized corresponding to all pixels in the image to be recognized, and thereby obtain the corrected image to be recognized.
  • the transformation relationship acquisition submodule is specifically configured to:
  • the correction sub-module is specifically configured to:
  • the identifying module includes:
  • a feature acquiring sub-module configured to extract, by using the convolution layer in the preset recognition network, image features of the corrected image to be recognized, and obtain a feature map of the corrected image to be recognized that contains the image features;
  • a classification sub-module configured to perform classification processing on the image features in the feature map of the corrected image to be recognized by using the fully connected layer in the preset recognition network, and obtain a target classification result of the image to be recognized.
  • an embodiment of the present application further discloses an electronic device, including: a processor and a memory,
  • a memory for storing a computer program
  • a processor configured to input an image to be recognized into a preset positioning network and obtain a plurality of positioning parameters of the image to be recognized, where the preset positioning network includes a preset convolution layer, and the plurality of positioning parameters are obtained by performing regression on the image features in the feature map obtained by convolving the image to be recognized;
  • the inputting the image to be recognized into the preset positioning network and acquiring the plurality of positioning parameters of the image to be recognized may include:
  • obtaining positioning parameters, where each positioning parameter is the coordinate, in the image to be recognized, of the pixel point whose image features match those of one of a preset number of reference points in the corrected image to be recognized.
  • The spatially transforming the image to be recognized according to the plurality of positioning parameters to obtain the corrected image to be recognized may include:
  • acquiring a spatial transformation relationship between the image to be recognized and the corrected image to be recognized according to the positioning parameters corresponding to the preset number of reference points and the coordinates of the preset number of reference points in the corrected image to be recognized.
  • The inputting the corrected image to be recognized into the preset recognition network and acquiring the target classification result of the image to be recognized may include:
  • classifying the image features in the feature map of the corrected image to be recognized by using the fully connected layer in the preset recognition network, and obtaining the target classification result of the image to be recognized.
  • The embodiment of the present application further discloses a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to input an image to be recognized into a preset positioning network and acquire a plurality of positioning parameters of the image to be recognized, where the preset positioning network includes a preset convolution layer, and the plurality of positioning parameters are obtained by performing regression on the image features in the feature map obtained by convolving the image to be recognized;
  • the inputting the image to be recognized into the preset positioning network and acquiring the plurality of positioning parameters of the image to be recognized may include:
  • obtaining positioning parameters, where each positioning parameter is the coordinate, in the image to be recognized, of the pixel point whose image features match those of one of a preset number of reference points in the corrected image to be recognized.
  • The spatially transforming the image to be recognized according to the plurality of positioning parameters to obtain the corrected image to be recognized may include:
  • acquiring a spatial transformation relationship between the image to be recognized and the corrected image to be recognized according to the positioning parameters corresponding to the preset number of reference points and the coordinates of the preset number of reference points in the corrected image to be recognized.
  • The inputting the corrected image to be recognized into the preset recognition network and acquiring the target classification result of the image to be recognized may include:
  • classifying the image features in the feature map of the corrected image to be recognized by using the fully connected layer in the preset recognition network, and obtaining the target classification result of the image to be recognized.
  • In the target recognition method and device for a deformed image provided by the embodiment of the present application, an image to be recognized is first input into a preset positioning network, and a plurality of positioning parameters of the image to be recognized are acquired, where the positioning network includes a preset convolution layer and the plurality of positioning parameters are obtained by performing regression on the image features in the feature map obtained by convolving the image to be recognized. Secondly, the image to be recognized is spatially transformed according to the plurality of positioning parameters to obtain a corrected image to be recognized. Finally, the corrected image to be recognized is input into a preset recognition network, and the target classification result of the image to be recognized is acquired.
  • In this way, the deformed image is corrected first, and target recognition is performed on the corrected image, which reduces the interference of the deformation with target recognition. Therefore, the embodiment of the present application can improve the accuracy of target recognition for a deformed image. Of course, implementing any product or method of the present application does not necessarily require achieving all of the above advantages at the same time.
  • FIG. 1 is a flowchart of a method for target recognition of a deformed image according to an embodiment of the present application.
  • FIG. 2 is a flowchart of the training of a neural network according to an embodiment of the present application.
  • FIG. 3 is a structural diagram of a neural network according to an embodiment of the present application.
  • FIG. 4 is another flowchart of a method for target recognition of a deformed image according to an embodiment of the present application.
  • FIG. 5 is a structural diagram of a target recognition apparatus for a deformed image according to an embodiment of the present application.
  • FIG. 6 is another structural diagram of a target recognition apparatus for a deformed image according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • the embodiment of the present application discloses a target recognition method and device for a deformed image, which can improve the accuracy of target recognition for the deformed image.
  • Target recognition technology based on neural network is widely used in many fields, such as intelligent monitoring field and character recognition field.
  • A pre-built neural network is trained on a large number of sample images together with the known target recognition results of those sample images.
  • The autonomous learning ability of the neural network is used to extract the image features of the sample images and obtain recognition results for the targets in the sample images; by comparing these with the known target recognition results, the parameters of the neural network are adjusted automatically, finally yielding a neural network with high target recognition accuracy.
  • With the trained neural network, a target recognition result with high accuracy can be obtained for any image to be recognized, and the result reflects the target information in the image.
  • Neural network-based target recognition technology can identify a wide variety of target types, such as license plates, characters, faces, animals, plants, and the like.
  • Neural network-based text recognition technology detects and identifies string information in an image, including license plate numbers, container numbers, train numbers, express waybill numbers, and bar code numbers that may appear in the image.
  • The existing neural network-based target recognition method applies a deep neural network model directly to the image, but the target recognition process does not consider the deformation that the target may undergo in complex scenes due to shooting or other causes, such as tilt, zoom, and perspective transformation caused by changes of viewing angle during image capture, or artificial deformation of targets in natural scenes, such as the tilt and distortion caused by font design and variation encountered in character recognition.
  • the existing neural network-based target recognition method directly classifies the target with large deformation, resulting in a lower accuracy of target recognition.
  • the embodiment of the present application proposes a target recognition method for a deformed image, and the method provides an end-to-end deep neural network model.
  • The end-to-end deep neural network model of the embodiment of the present application can be trained on a large number of sample images of deformed images together with the known target recognition results of those sample images.
  • the method of the embodiment of the present application mainly includes:
  • The first step is to obtain a deformation-corrected image, which mainly includes: first inputting the image to be recognized into the trained neural network of the embodiment of the present application, and convolving the image to be recognized with the multi-layer convolution layers of the correction network to obtain the feature map of the image to be recognized; then performing regression processing on the image features in the feature map of the image to be recognized to obtain a plurality of positioning parameters of the image to be recognized; and then spatially transforming the image to be recognized according to the plurality of positioning parameters to obtain the corrected image to be recognized.
  • The second step is to obtain the target classification result using the deformation-corrected image, which mainly includes: inputting the corrected image to be recognized into a preset recognition network, and obtaining a target classification result of the image to be recognized, where the preset recognition network may be any of a variety of existing target recognition networks.
  • the deformation image is corrected first, and the target image is corrected based on the corrected image, which can reduce the interference of the deformation on the target recognition. Therefore, the embodiment of the present application can be applied to the deformation image. Improve the accuracy of target recognition.
  • FIG. 1 is a flowchart of a method for identifying a target of a deformed image according to an embodiment of the present application, including the following steps:
  • Step 101 Input the image to be identified into a preset positioning network, and obtain a plurality of positioning parameters of the image to be identified.
  • The positioning network includes a preset convolution layer, and the plurality of positioning parameters are obtained after regression on the image features in the feature map obtained by convolving the image to be recognized.
  • The image to be recognized is an image containing a target, captured by any image capturing device, such as an image taken by a still camera, a video camera, or a mobile phone.
  • The target may be of many types, such as a person, an animal, a plant, a building, a vehicle, or characters.
  • the image to be identified in the embodiment of the present application may be an undeformed image or a deformed image.
  • the following describes the method of the embodiment of the present application by using the image to be recognized as a deformed image.
  • Deformation refers to the morphological changes of objects in the image such as translation, scaling, rotation, and distortion.
  • The deformation in the embodiment of the present application may be the tilt, zoom, and perspective transformation of the target caused by changes of the shooting angle during image capture, or may be artificial deformation of the target in a complex natural scene, such as the tilt and distortion caused by font design and variation.
  • the embodiment of the present application provides an end-to-end deep neural network model.
  • For different deformation types, the embodiment of the present application can perform image correction and recognition with a specific network corresponding to each deformation type; the different specific networks are based on the same model idea, but the network structure and parameters of the image correction part may differ slightly.
  • The specific networks corresponding to different deformation types can be obtained by fine-tuning a basic network, adjusting the network structure and parameters of the image correction part.
  • The deformation types addressed by the embodiment of the present application may be various preset deformation types, such as deformations including one or any combination of rotation, translation, and scaling, possibly with multi-angle stretching deformation on top of these.
  • the embodiments of the present application may separately train and obtain corresponding specific networks in advance for various preset deformation types.
  • The embodiment of the present application can predict the deformation type of the image to be recognized for different image tasks and requirements, or for different scenes in which images are generated, such as images captured at different shooting angles.
  • In that case the deformation is essentially a perspective problem caused by the shooting angle.
  • The deformation of the perspective problem caused by the shooting angle may contain not only one or any combination of rotation, translation, and scaling, but also multi-angle stretching deformation on top of these.
  • The embodiment of the present application then adopts a trained specific network corresponding to this deformation type for image correction and target recognition. The specific network is trained on a large number of sample images deformed by rotation, translation, and scaling, with multi-angle stretching deformation on top of these; the parameters and transformation algorithms in the network have been optimized and adjusted for sample images of this deformation type, so that the trained specific network is well suited to images of this deformation type.
  • The embodiment of the present application assumes a blank image as the corrected image to be recognized: if all the pixels in this blank corrected image can be filled, a specific corrected image to be recognized is obtained.
  • Based on this idea, a preset number of pixel points are set as reference points for the corrected image to be obtained; the positions and the number of the reference points are determined by the trained preset positioning network, which automatically analyzes the image features and outputs the selected points.
  • The selected reference points provide the parameters needed to correct the deformation, so that deformed pictures can be corrected.
  • The principle for setting these reference points is to reflect, as far as possible, the shape information of the corrected image to be recognized, so that the shape contour of the corrected image to be recognized can be determined from the preset number of reference points.
  • the preset number of reference points may be a plurality of pixel points uniformly distributed at the edges of the corrected image to be recognized.
  • The corrected image to be recognized, that is, the image input to the recognition network, is a regular rectangle, to reduce the calculation amount and computational complexity of the recognition network; it is also desired that the characters in the corrected image to be recognized fill the whole rectangular frame, so that the pixels in the corrected image carry the maximum available image information.
  • Therefore, the preset number of reference points in the embodiment of the present application are a preset number of pixels evenly distributed on the edges of the rectangular frame, so that the shape information of the corrected image to be recognized is reflected by the reference points uniformly distributed on the outer edges of the rectangular frame.
  • The embodiment of the present application uses the preset positioning network corresponding to the deformation type; the positions and number of the reference points corresponding to the deformation type are output directly by the trained corresponding preset positioning network, and the positions and numbers of the reference points of the corrected image to be recognized output by the preset positioning networks corresponding to different deformation types may differ.
  • For a simple deformation, the reference point may be any pixel in the image, while for other, more complicated deformations, at least 4 corner points of the image edge are required as reference points. It can be understood that the more complex the deformation type, the more reference points are needed and the stricter the requirements on their positions. An illustrative sketch of uniformly distributed reference points follows.
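  • The following minimal sketch, not taken from the patent itself, generates reference points uniformly distributed along the top and bottom edges of the rectangular corrected image; the helper name, edge choice, and point counts are assumptions for illustration only.

```python
import numpy as np

def border_reference_points(width, height, points_per_edge=10):
    """Hypothetical helper: place reference points uniformly along the
    top and bottom edges of the rectangular corrected image."""
    xs = np.linspace(0, width - 1, points_per_edge)
    top = np.stack([xs, np.zeros_like(xs)], axis=1)                 # points (x, 0)
    bottom = np.stack([xs, np.full_like(xs, height - 1)], axis=1)   # points (x, h-1)
    return np.concatenate([top, bottom], axis=0)

# 10 points per edge on a 100x32 rectangle -> 20 reference points (40 coordinates)
ref = border_reference_points(100, 32)
print(ref.shape)  # (20, 2)
```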
  • The preset positioning network in the embodiment of the present application is a preset neural network, and includes multiple convolution layers and at least one fully connected layer.
  • Each convolution layer contains multiple convolution kernels with weights, and the fully connected layer contains a weight matrix.
  • The convolution kernels can extract different image features in the image to be recognized, such as image edges and acute angles, and by extracting image features a feature map of the image to be recognized containing those features is obtained. The weight matrix in the fully connected layer contains multiple weight values, and a weight value reflects the linear relationship between the input data and the corresponding classification result.
  • After training, the weights in the weight matrix can represent the linear relationship between the feature map of the image to be recognized and the positions of the preset number of reference points in the image to be recognized.
  • The main process of step 101 is therefore: extract different image features in the image to be recognized by using the multi-layer convolution layers, and then use the fully connected layer to search, among the image features of the image to be recognized, for those matching the image features of the preset number of reference points, that is, perform regression processing according to the image features of the reference points; the positions of the pixel points in the image to be recognized whose image features match those of the reference points are acquired as the positioning parameters. That is, through step 101, the embodiment of the present application obtains, for the preset number of reference points in the corrected image to be recognized, the corresponding positions in the image to be recognized. A minimal sketch of such a positioning network is given below.
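  • The sketch below assumes PyTorch; the layer sizes, channel counts, and pooling choices are illustrative assumptions, not the patent's exact configuration.

```python
import torch
import torch.nn as nn

class LocalizationNet(nn.Module):
    """Sketch of a preset positioning network: convolution layers extract
    the feature map, and a fully connected layer regresses 2*K values,
    one (x, y) positioning parameter per reference point."""
    def __init__(self, num_ref_points=20):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 8)),   # fixed-size feature map for the FC layer
        )
        self.regressor = nn.Linear(64 * 4 * 8, 2 * num_ref_points)

    def forward(self, x):
        f = self.features(x)                    # feature map of the image
        params = self.regressor(f.flatten(1))   # regression to 2*K values
        return params.view(x.size(0), -1, 2)    # (N, K, 2) coordinates

coords = LocalizationNet()(torch.rand(1, 3, 64, 128))   # -> shape (1, 20, 2)
```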
  • Step 102 Perform spatial transformation on the image to be recognized according to the plurality of positioning parameters, to obtain a corrected image to be identified.
  • A spatial transformation relationship from the image to be recognized to the corrected image to be recognized, satisfying the correspondences of all the reference points, is obtained.
  • The spatial transformation relationship is then applied to all the pixels in the image to be recognized to obtain the positions of all these pixels in the corrected image to be recognized, and thus the corrected image to be recognized itself.
  • For different types of deformation, the specific transformation algorithm used for the spatial transformation in the embodiment of the present application differs.
  • For example, an affine transformation algorithm is adopted for rotation, translation, and scaling.
  • For the affine transformation, perspective transformation, and thin-plate spline transformation algorithms, the complexity of the deformation types they handle increases in turn, and the complexity of the corresponding spatial transformation relationships increases in turn.
  • For example, the spatial transformation relationship of the affine transformation algorithm may be a coordinate transformation matrix relating the coordinates of the reference point positions.
  • The thin-plate spline transformation algorithm contains multiple complex transformation steps, involving multiple transformation parameters or formulas.
  • In the embodiment of the present application, the spatial transformation relationship is obtained from the coordinates of the reference points in the corrected image to be recognized, that is, the preset positions of the reference points, together with the positioning parameters, that is, the positions of the reference points in the image to be recognized. A sketch of estimating an affine transformation from such correspondences follows.
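  • A minimal sketch of the affine case, assuming NumPy, with made-up sample coordinates:

```python
import numpy as np

def solve_affine(src_pts, dst_pts):
    """Estimate the 2x3 affine matrix M such that dst ≈ M @ [x, y, 1] for
    each reference point: src are the positioning parameters (coordinates in
    the image to be recognized), dst the preset coordinates in the corrected
    image. Three non-collinear points determine M; more are fit by least squares."""
    A = np.hstack([src_pts, np.ones((src_pts.shape[0], 1))])   # rows [x, y, 1]
    X, *_ = np.linalg.lstsq(A, dst_pts, rcond=None)            # solve A @ X ≈ dst
    return X.T                                                  # shape (2, 3)

src = np.array([[10., 12.], [90., 20.], [80., 60.], [15., 55.]])  # in deformed image
dst = np.array([[0., 0.], [99., 0.], [99., 31.], [0., 31.]])      # in corrected image
M = solve_affine(src, dst)
```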
  • Step 103 Input the corrected image to be recognized into a preset recognition network, and obtain a target classification result of the image to be identified.
  • the preset identification network is one or more neural networks that have been trained.
  • The preset recognition network extracts image features from the corrected image to be recognized by using its convolution layer, obtains the feature map of the corrected image to be recognized, and then uses its fully connected layer to classify the image features in that feature map, obtaining the classification result of the target.
  • The preset recognition network in the embodiment of the present application may be any of a variety of existing target recognition networks. According to the type of target to be recognized, the preset recognition network may be a recognition network corresponding to that target type, such as a network for recognizing characters, a network for recognizing faces, and so on.
  • the embodiments of the present application can combine and replace a plurality of existing target recognition networks to achieve the purpose of identifying multiple types of targets.
  • In the embodiment of the present application, the image to be recognized is first input into a preset positioning network, and multiple positioning parameters of the image to be recognized are acquired; the positioning network includes a preset convolution layer, and the multiple positioning parameters are obtained after regression on the image features in the feature map obtained by convolving the image to be recognized.
  • The image to be recognized is then spatially transformed according to the positioning parameters to obtain a corrected image to be recognized.
  • Finally, the corrected image to be recognized is input into a preset recognition network, and the target classification result of the image to be recognized is obtained.
  • In this way, the deformed image is corrected first and target recognition is performed on the corrected image, which reduces the interference of the deformation with target recognition; therefore, the embodiment of the present application can improve the accuracy of target recognition for a deformed image.
  • FIG. 2 is a flowchart of training of a neural network according to an embodiment of the present application, including:
  • Step 201 Construct the structure of the initial neural network, and set the parameter values of the initial neural network.
  • the neural network has been widely used in the field of image recognition and the like, and there are many existing neural network structures.
  • the embodiment of the present application can be combined with the existing neural network structure to construct the structure of the initial neural network.
  • By function, the network may include a correction network and a recognition network.
  • The correction network includes a positioning network and a spatial transformation network.
  • Both the correction network and the recognition network include at least one convolution layer and at least one fully connected layer, and the parameter values include the number of convolution kernels of each convolution layer, the size of the convolution kernels, the weights of the convolution kernels, the values of the weight matrix of the fully connected layer, and so on.
  • The embodiment of the present application sets the parameter values of the initial neural network while constructing its structure; the parameter values include the number of convolution kernels of a convolution layer, such as 32 or 64, the convolution kernel size, such as 3*3 or 5*5, the convolution kernel weight values, the values of the weight matrix of the fully connected layer, and so on.
  • Each matrix value of the initial neural network may be assigned an arbitrary value as an initial value, or a random number may be generated as an initial value for each matrix value by a method such as the msra initialization method; the values are all real numbers.
  • the embodiment of the present application may adjust the structure, parameters, and the like of the correction network for the preset deformation type, and obtain a plurality of specific initial neural networks for different preset deformation types.
  • Step 202 Acquire each sample image of the deformed image, and a target recognition result known for each sample image.
  • the embodiment of the present application pre-acquires a large number of sample images containing the deformation image of the target, and the target recognition result known for each sample image.
  • the deformation image is an image in which an object in the image has deformation such as translation, scaling, rotation, distortion, etc., and the object in the embodiment of the present application may be a person, an animal, a plant, a building, a vehicle, a character, or the like.
  • The source of a sample image may be any image capturing device, such as a still camera, a video camera, or a mobile phone, and the sample image may be an image acquired in real time or a stored historical image.
  • the embodiment of the present application presets a plurality of deformation types, obtains a sample image containing the target corresponding to the preset deformation type, and trains the respective neural networks with the corresponding sample images for different preset deformation types.
  • Step 203 Input each sample image and the known target recognition result of each sample image into the initial neural network, and obtain the corresponding target recognition result of each sample image output by the initial neural network.
  • Specifically, each sample image corresponding to a preset deformation type and the known target recognition result of each sample image are input into the specific initial neural network for that preset deformation type, and the target recognition result of each sample image output by that initial neural network is obtained.
  • Step 204 Obtain a response value of the loss function according to the corresponding target recognition result of each sample image obtained through the initial neural network and the known target recognition result of each sample image.
  • One or more loss functions may be preset, and each loss function measures, from a certain angle, the difference between the target recognition result of a sample image obtained through the initial neural network and the known target recognition result; for example, a loss function may be a function that subtracts the known target recognition result from the result obtained through the initial neural network, or a function that computes the Euclidean distance between the two, and so on. In the embodiment of the present application, the response values of multiple loss functions may be weighted and combined so as to measure the difference from multiple angles, and thus more accurately measure the degree of difference between the target recognition result of each sample image obtained through the initial neural network and the known target recognition result.
  • Step 205 Continuously adjust the structure or parameter values of the initial neural network according to the response value of the loss function until the neural network satisfies a preset condition, and obtain the trained neural network.
  • The embodiment of the present application aims to train the neural network so that the loss function approaches a minimum value. Therefore, a target value may be set for the response value of the loss function, and the structure or parameter values of the initial neural network are adjusted continuously until the neural network satisfies the preset condition that the response value of the loss function reaches the target value, yielding the trained neural network.
  • Alternatively, the target recognition results of images output by the neural network may be tested: the target recognition results output by the neural network are compared with the known target recognition results of the samples to obtain the recognition accuracy, and when the recognition accuracy reaches a preset accuracy, for example 98%, the training is stopped and the trained neural network is obtained.
  • This process is guided by the accuracy of the target recognition results rather than by a set response value of the loss function, which better matches the actual purpose of use.
  • Adjusting the structure of the initial neural network may change the type and number of the layers in the network, and may also add or remove other components. Adjusting the parameter values of the initial neural network may modify the number of convolution kernels of a convolution layer, the convolution kernel size, the convolution kernel weight values, the values of the weight matrix of the fully connected layer, and the like. Adjusting the parameter values of the initial neural network may employ a gradient descent method or the like; a minimal training-loop sketch is given below.
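  • The sketch assumes PyTorch; `model` and `loader` are assumed to be supplied by the caller, cross-entropy is one possible choice of loss function, and `kaiming_normal_` corresponds to the msra initialization mentioned above.

```python
import torch.nn as nn
import torch.optim as optim

def train(model, loader, target_accuracy=0.98, max_epochs=100):
    """Steps 201-205 in miniature: adjust parameter values by gradient
    descent on a loss until recognition accuracy reaches a preset value."""
    for m in model.modules():                      # msra (Kaiming) initialization
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight)
    criterion = nn.CrossEntropyLoss()              # one possible loss function
    optimizer = optim.SGD(model.parameters(), lr=0.01)   # gradient descent
    for _ in range(max_epochs):
        correct, total = 0, 0
        for images, labels in loader:              # sample images + known results
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)      # response value of the loss
            loss.backward()
            optimizer.step()                       # adjust parameter values
            correct += (outputs.argmax(1) == labels).sum().item()
            total += labels.numel()
        if correct / total >= target_accuracy:     # e.g. stop at 98% accuracy
            break
    return model
```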
  • FIG. 3 is a structural diagram of a neural network according to an embodiment of the present application, including a correction network and an identification network, where the correction network further includes a positioning network and a spatial transformation network.
  • FIG. 4 is another flowchart of a method for identifying a target of a deformed image according to an embodiment of the present application, including the following steps:
  • Step 401 Extract image features of the image to be recognized by using the preset convolution layer, and obtain a feature map of the image to be recognized that contains the image features.
  • Common images have three channels: red, green, and blue.
  • "Channel" is a conventional term that refers to a particular component of the image.
  • The image data corresponding to each color is a two-dimensional matrix, and each value in the matrix is the value of a pixel, ranging from 0 to 255; the two-dimensional matrices of the three channels superimposed are the matrix corresponding to the pixel points of the original image, that is, the original image data.
  • The image to be recognized is input into the trained neural network, and the matrix corresponding to the pixel points of the image to be recognized is convolved with the multiple convolution kernels in the preset convolution layer of the positioning network.
  • A convolution kernel can be thought of as a matrix containing weights, and the convolution kernel uses its weights to extract image features.
  • A typical convolution kernel of the first convolution layer may have a size of 5x5x3, that is, a matrix of weights with width and height of 5 and a depth of 3; the depth is 3 because the input image, that is, the image to be recognized, has three color channels: red, green, and blue.
  • Each convolution kernel, that is, each filter, slides over the width and height of the input data, that is, the matrix corresponding to the pixels of the image to be recognized, by a sliding step size; when the step size is 1, the filter moves by 1 pixel at a time.
  • At each position, the inner product of the entire filter and the input data is calculated, that is, the inner product of the weights of the filter and the values of the pixels at the corresponding positions.
  • As the filter slides, a two-dimensional activation map is generated, that is, the feature map of the image to be recognized in the embodiment of the present application; the activation map gives, at each spatial position, the response of the filter, which is the image feature extracted by the filter.
  • The convolutional neural network lets each filter learn to be activated when it sees a certain type of image feature.
  • The specific visual image features may be boundaries in certain orientations, spots of certain colors, and the like.
  • Each filter looks for something different in the input data, that is, different image features, and the different image features obtained are superimposed to give the feature map of the image to be recognized. A worked example of the inner-product computation follows.
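  • A minimal sketch, assuming NumPy; the sizes match the 5x5x3 filter example above.

```python
import numpy as np

patch = np.random.rand(5, 5, 3)        # pixel values under the filter's position
weights = np.random.randn(5, 5, 3)     # the filter (convolution kernel)
activation = np.sum(patch * weights)   # one entry of the 2D activation map

# Sliding with step size 1 over a 32x32x3 input gives a 28x28 activation map
# per filter (32 - 5 + 1 = 28); stacking all filters' maps yields the feature map.
```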
  • the embodiment of the present application may further add a pooling layer Pool to perform down-sampling processing on the feature map obtained by the convolution layer.
  • The main processing of the pooling layer is to divide the feature map obtained by the convolution layer into a plurality of preset regions, and downsample the plurality of pixel values in each preset region into one pixel value, reducing the data amount and obtaining the downsampled feature map.
  • When the input of the first convolution layer is the original image, different neurons along the depth dimension will likely be activated by boundaries in different orientations, or by color spots.
  • The sets of neurons arranged along the depth direction that receive the same input region are referred to as depth columns or depth slices.
  • The role of the pooling layer (Pool) is to reduce the spatial size of the data volume, which reduces the number of parameters in the network, lowers the consumption of computing resources, and effectively controls over-fitting.
  • the pooling layer can operate independently on each depth slice of the input data volume using the maximum MAX operation, changing its spatial size.
  • the most common form is that the pooling layer uses a 2 ⁇ 2 size filter, and each depth slice is downsampled with a step size of 2.
  • For example, suppose the convolution layer outputs a 32*32*12 data volume.
  • The pooling layer divides each 32*32 slice into 16*16 regions of size 2*2, then selects the maximum value within each 2*2 region, finally obtaining a 16*16*12 data volume after downsampling.
  • Compared with the original 32*32*12 data volume, the data volume is reduced in width and height, but the depth is unchanged.
  • Max pooling (MaxPool) discards 75% of the activation information in the original data volume, which reduces the amount of data.
  • The pooling layer can also use other pooling methods, such as average pooling (AvgPool). The numeric example above is sketched in code below.
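  • A minimal sketch, assuming NumPy:

```python
import numpy as np

def max_pool_2x2(volume):
    """2x2 max pooling with step size 2 on an (H, W, D) data volume:
    width and height are halved, depth is unchanged."""
    h, w, d = volume.shape
    return volume.reshape(h // 2, 2, w // 2, 2, d).max(axis=(1, 3))

x = np.random.rand(32, 32, 12)
print(max_pool_2x2(x).shape)   # (16, 16, 12): 75% of activations discarded
```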
  • Step 402 Perform regression processing on the image features in the feature map of the image to be recognized by using the fully connected layer in the preset positioning network, and obtain a plurality of positioning parameters of the image to be recognized, where each positioning parameter is the coordinate, in the image to be recognized, of the pixel point matching the image features of one of the preset number of reference points in the corrected image to be recognized.
  • A preset number of reference points are obtained by the preset positioning network in the pre-trained neural network corresponding to the deformation type, that is, the coordinates of a preset number of pixel points are obtained, and these coordinates can be understood as coordinates in the blank corrected image to be recognized.
  • The positions and the number of the reference points are such that they can provide the parameters required for correcting the deformation of that deformation type, that is, the positions and number of pixel points needed for the deformed image to be corrected.
  • For example, for a pure rotation, the deformation correction parameter of the rotation angle can be obtained from at least one reference point, and the corrected image to be recognized can then be obtained.
  • A perspective transformation requires at least four corner points of the image edge as reference points to obtain the parameters required for the deformation correction and thus the corrected image to be recognized.
  • The reference points of the embodiment of the present application are thus related to the deformation type, and the principle for setting the reference points is that, with the provided reference point positions and number, the whole image to be recognized can achieve the expected correction effect.
  • The image features in the image to be recognized are classified to determine which of them match the image features of the preset reference points in the corrected image to be recognized; that is, for each preset reference point, a pixel point whose image features correspond to those of the reference point is found in the image to be recognized, and the coordinates of that pixel point are used as the positioning parameter corresponding to the reference point.
  • The fully connected layer in the preset positioning network of the embodiment of the present application is trained so that the weights in its weight matrix reflect the linear relationship between the feature map of the image to be recognized and the positions of the preset number of reference points in the image to be recognized. Multiplying the weight matrix of the trained fully connected layer by the matrix of pixel points corresponding to the feature map of the image to be recognized yields the preset number of positioning parameters, that is, the coordinates in the image to be recognized corresponding to each of the preset number of reference points in the corrected image to be recognized.
  • For example, if 20 reference points are selected, the positioning parameters obtained in step 402 are the coordinates of 20 points, a total of 40 coordinate values counting the x and y coordinate components.
  • Step 403 Acquire the spatial transformation relationship between the image to be recognized and the corrected image to be recognized according to the positioning parameters corresponding to the preset number of reference points and the coordinates of the preset number of reference points in the corrected image to be recognized.
  • The embodiment of the present application computes the whole spatial transformation relationship from the coordinate correspondences of the reference points between the image to be recognized and the corrected image to be recognized; this spatial transformation relationship, obtained from a predetermined number of reference points, is then applied to the coordinates of all the pixels in the image to be recognized to obtain the coordinates in the corrected image to be recognized corresponding to all those pixels, thereby filling the whole corrected image to be recognized.
  • The spatial transformation relationship of the embodiment of the present application is related to the deformation type.
  • For a simple transformation, such as a pure translation, only the coordinate change amount of the displacement is required as the deformation parameter to complete the spatial transformation; if there is only a scaling deformation, the scaling factor is sufficient to complete the spatial transformation.
  • For very complex deformations, such as those combining translation, rotation, scaling, and distortion, the pixel coordinates or parameters of one or two simple deformations are no longer enough to complete the whole spatial transformation, so the number of reference points must be increased accordingly to obtain the larger and more complex set of deformation parameters needed to estimate the whole spatial transformation relationship.
  • The spatial transformation relationship may contain various steps, parameters, calculation formulas, or other mathematical forms, depending on the complexity of the deformation type.
  • The simplest spatial transformation relationship may be a coordinate transformation matrix between the coordinates of the reference points in the image to be recognized and in the corrected image to be recognized. The coordinate transformation matrix is summarized from the positioning parameters corresponding to the preset number of reference points, that is, the coordinates of the reference points in the image to be recognized, together with the coordinates of the preset number of reference points in the corrected image to be recognized; it applies to all the pixels in the image to be recognized and represents the coordinate mapping of pixel points from the image to be recognized to the corrected image to be recognized.
  • The coordinate transformation matrix is one example of a spatial transformation relationship; the positioning parameters carry the information necessary for the spatial transformation.
  • For each deformation type, the specific transformation parameters may involve different steps, multiple parameters, calculation methods, and so on; by means of a transformation algorithm using these specific steps, parameters, and calculation methods, the corresponding deformation type is corrected.
  • Step 403 can therefore be summarized as: according to the positioning parameters, that is, the coordinates of the reference points in the image to be recognized, and the coordinates of the reference points in the corrected image to be recognized, obtain the transformation parameters required by a preset transformation algorithm, where the preset transformation algorithm includes one of an affine transformation algorithm, a perspective transformation algorithm, and a thin-plate spline transformation algorithm. A sketch of the perspective case is given below.
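  • The sketch assumes OpenCV; the file name and all coordinates are made up for illustration. Four corner correspondences determine the transformation parameters (a 3x3 homography), which are then applied to all pixels.

```python
import cv2
import numpy as np

img = cv2.imread("deformed.jpg")                # hypothetical image to be recognized
src = np.float32([[12, 8], [118, 20], [110, 70], [5, 60]])  # 4 corners located in it
dst = np.float32([[0, 0], [127, 0], [127, 63], [0, 63]])    # preset corrected corners
H = cv2.getPerspectiveTransform(src, dst)       # transformation parameters
corrected = cv2.warpPerspective(img, H, (128, 64))  # fill the corrected image
```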
  • Step 404 According to the spatial transformation relationship, obtain the coordinates in the corrected image to be recognized corresponding to all the pixels in the image to be recognized, and obtain the corrected image to be recognized.
  • As described above, the spatial transformation relationship may involve different steps, multiple parameters, calculation methods, and so on; according to the spatial transformation relationship corresponding to the preset deformation type, the corresponding transformation algorithm obtains, with those steps, parameters, and calculation methods, the coordinates in the corrected image to be recognized corresponding to all the pixels in the image to be recognized, and the corrected image to be recognized is obtained.
  • The three transformation algorithms above are specific transformation algorithms for deformation types of different complexity.
  • Each specific network selects a transformation algorithm according to its deformation type, and the three transformation algorithms are used separately for their respective deformation types; for example, for the perspective problem caused by the shooting angle, the embodiment of the present application only needs to use the perspective transformation algorithm.
  • The affine transformation algorithm cannot solve the perspective problem, so it is not used for the deformation type of the perspective problem caused by the shooting angle.
  • The thin-plate spline algorithm can also handle perspective transformation, and could of course replace the perspective transformation algorithm in the network; but besides perspective, the thin-plate spline algorithm can solve various distortion problems such as warping and bending, and this computational power comes with a correspondingly large calculation and time overhead. Therefore, when only the perspective problem needs to be solved, the thin-plate spline algorithm is generally unnecessary and the perspective transformation algorithm is sufficient.
  • the embodiment of the present application may use an affine transformation algorithm to multiply the coordinate matrix of the pixel corresponding to the image to be identified, and multiply the coordinate matrix by the coordinate transformation matrix to obtain the pixel corresponding to the corrected image to be identified.
  • the coordinate matrix is obtained according to the coordinate matrix of the pixel corresponding to the corrected image to be recognized, the coordinates of all the pixels of the image to be recognized in the corrected image to be recognized are obtained, and finally the corrected image to be recognized is obtained.
  • Therefore, step 404 can be further summarized as: according to the transformation parameters required by the preset transformation algorithm, using the preset transformation algorithm to calculate the coordinates of all the pixels in the image to be recognized, obtaining the coordinates in the corrected image to be recognized corresponding to all those pixels, and obtaining the corrected image to be recognized.
  • Step 405: Extract image features from the corrected image to be recognized using the convolution layers in the preset recognition network, and obtain a feature map of the corrected image to be recognized containing the image features.
  • The preset recognition network in this embodiment is a trained neural network and may be any of several existing target recognition networks corresponding to the target type. For example, in character recognition it may be a recognition network composed of a convolutional neural network (CNN) and a recurrent neural network (RNN).
  • In this embodiment, the pixels of the corrected image to be recognized are convolved with the convolution kernels of the convolution layers in the preset recognition network, yielding the feature map extracted by the kernels, which contains the image feature information of the corrected image to be recognized.
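The convolution step can be pictured with the short sketch below; the framework (PyTorch), channel counts, and kernel sizes are illustrative assumptions, not values taken from the patent.

```python
import torch
import torch.nn as nn

# Stand-in for the convolution layers of the preset recognition network.
features = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(2),
)

corrected = torch.randn(1, 3, 32, 100)  # a corrected image to be recognized
feature_map = features(corrected)       # feature map, shape (1, 64, 8, 25)
```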
  • Step 406: Classify the image features in the feature map of the corrected image to be recognized using the fully connected layer in the preset recognition network, and obtain the target classification result of the image to be recognized.
  • In this embodiment, the pixel matrix corresponding to the feature map of the corrected image to be recognized is multiplied by the weight matrix of the fully connected layer in the preset recognition network to obtain the classification result of the target in the feature map of the corrected image to be recognized.
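In code, that weight-matrix multiplication corresponds to a linear (fully connected) layer over the flattened feature map; the sketch below reuses the shapes assumed above, and the class count of 37 (digits, letters, and a blank) is an assumption.

```python
import torch
import torch.nn as nn

feature_map = torch.randn(1, 64, 8, 25)      # feature map from the sketch above
classifier = nn.Linear(64 * 8 * 25, 37)      # fully connected layer, 37 classes assumed
logits = classifier(feature_map.flatten(1))  # multiply by the weight matrix
target_class = logits.argmax(dim=1)          # target classification result
```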
  • In character recognition, for example, a convolutional neural network (CNN) and a recurrent neural network (RNN) are first used to extract features from the corrected image to be recognized and obtain its feature map. The fully connected layer then classifies the feature map to obtain a feature sequence, which is still data corresponding to pixel values. Finally, a sequence decoder preset with the correspondence between feature sequences and character strings converts the feature sequence into the string result, yielding the recognized string. For example, an image containing the string "hello" is processed by the convolution layers and the fully connected layer to obtain a 1*60 feature sequence, which contains data corresponding to the image features, such as different values like 0 and 1. The feature sequence is input to the sequence decoder, which outputs 8, 5, 12, 12, and 15; further, according to the correspondence between feature sequences and character strings preset in the sequence decoder, the sequence decoder obtains the string "hello".
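The decoding step of the "hello" example can be reproduced with a trivial sketch, assuming the correspondence a=1, ..., z=26 that the example implies:

```python
indices = [8, 5, 12, 12, 15]  # sequence decoder output from the example
decoded = "".join(chr(ord("a") + i - 1) for i in indices)
print(decoded)  # -> "hello"
```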
  • It can be seen that, in the target recognition method for a deformed image provided by this embodiment, the image to be recognized is first input into the preset positioning network: image features are extracted using the preset convolution layers to obtain a feature map of the image to be recognized containing the image features, and the fully connected layer in the preset positioning network performs regression processing on the image features in that feature map to obtain a plurality of positioning parameters of the image to be recognized, where the positioning parameters are the coordinates of the pixels in the image to be recognized that match the image features of a preset number of reference points in the corrected image to be recognized. Next, according to the positioning parameters corresponding to the preset number of reference points and the coordinates of those reference points in the corrected image to be recognized, the spatial transformation relationship of the reference points between the image to be recognized and the corrected image to be recognized is acquired, and according to that relationship the coordinates in the corrected image to be recognized corresponding to all the pixels in the image to be recognized are obtained, yielding the corrected image to be recognized. Finally, the convolution layers in the preset recognition network extract image features from the corrected image to be recognized to obtain its feature map, and the fully connected layer in the preset recognition network classifies the image features in that feature map to obtain the target classification result of the image to be recognized. In the neural-network-based target recognition process, the deformed image is corrected first and target recognition is performed on the corrected image, which reduces the interference of deformation with target recognition; therefore, this embodiment can improve the accuracy of target recognition for deformed images.
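The overall flow described above (a localization network that regresses positioning parameters, a spatial transformation, then a recognition network) closely resembles a spatial-transformer-style pipeline. The sketch below covers the affine case only and uses PyTorch's affine_grid and grid_sample; all layer sizes are assumptions, and a real network would be trained end to end as described earlier in this document.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformedImageRecognizer(nn.Module):
    """Correction-then-recognition sketch (affine case)."""

    def __init__(self, num_classes=37):
        super().__init__()
        # Localization network: convolutions plus a fully connected layer
        # regressing the 6 affine parameters from the feature map.
        self.loc_features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 10)),
        )
        self.loc_fc = nn.Linear(32 * 4 * 10, 6)
        # Initialize the regression to the identity transform so training
        # starts from an uncorrected image.
        nn.init.zeros_(self.loc_fc.weight)
        self.loc_fc.bias.data = torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float)
        # Recognition network: convolutions plus a fully connected classifier.
        self.rec_features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 10)),
        )
        self.rec_fc = nn.Linear(64 * 4 * 10, num_classes)

    def forward(self, x):
        theta = self.loc_fc(self.loc_features(x).flatten(1)).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        corrected = F.grid_sample(x, grid, align_corners=False)  # corrected image
        return self.rec_fc(self.rec_features(corrected).flatten(1))

model = DeformedImageRecognizer()
logits = model(torch.randn(2, 3, 32, 100))  # a batch of deformed images
```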
  • FIG. 5 is a structural diagram of a target recognition apparatus for a deformed image according to an embodiment of the present application, including:
  • The positioning module 501 is configured to input the image to be recognized into a preset positioning network and acquire a plurality of positioning parameters of the image to be recognized, where the positioning network includes preset convolution layers and the plurality of positioning parameters are obtained by regressing the image features in the feature map obtained by convolving the image to be recognized.
  • The spatial transformation module 502 is configured to perform spatial transformation on the image to be recognized according to the plurality of positioning parameters to obtain a corrected image to be recognized.
  • The recognition module 503 is configured to input the corrected image to be recognized into a preset recognition network and acquire the target classification result of the image to be recognized.
  • The target recognition apparatus for a deformed image provided by this embodiment first inputs the image to be recognized into a preset positioning network and acquires a plurality of positioning parameters of the image to be recognized, where the positioning network includes preset convolution layers and the positioning parameters are obtained by regressing the image features in the feature map obtained by convolving the image to be recognized. Next, according to the plurality of positioning parameters, the image to be recognized is spatially transformed to obtain a corrected image to be recognized. Finally, the corrected image to be recognized is input into a preset recognition network to acquire the target classification result of the image to be recognized. In the neural-network-based target recognition process, the deformed image is corrected first and target recognition is performed on the corrected image, which reduces the interference of deformation with target recognition; therefore, this embodiment can improve the accuracy of target recognition for deformed images.
  • It should be noted that the apparatus in this embodiment is an apparatus applying the foregoing target recognition method for a deformed image; all embodiments of that method are applicable to the apparatus, and all can achieve the same or similar beneficial effects.
  • FIG. 6 is another structural diagram of a target recognition apparatus for a deformed image according to an embodiment of the present application, including:
  • the positioning module 601 includes:
  • The feature map obtaining sub-module 6011 is configured to extract image features from the image to be recognized using the preset convolution layers, and obtain a feature map of the image to be recognized containing the image features.
  • The positioning sub-module 6012 is configured to perform regression processing on the image features in the feature map of the image to be recognized using the fully connected layer in the preset positioning network, and acquire a plurality of positioning parameters of the image to be recognized, where the positioning parameters are the coordinates of the pixels in the image to be recognized that match the image features of a preset number of reference points in the corrected image to be recognized.
  • the spatial transformation module 602 includes:
  • The transformation relationship acquisition sub-module 6021 is configured to acquire, according to the positioning parameters corresponding to the preset number of reference points and the coordinates of those reference points in the corrected image to be recognized, the spatial transformation relationship of the reference points between the image to be recognized and the corrected image to be recognized.
  • The correction sub-module 6022 is configured to obtain, according to the spatial transformation relationship, the coordinates in the corrected image to be recognized corresponding to all the pixels in the image to be recognized, and obtain the corrected image to be recognized.
  • the transformation relationship acquisition sub-module 6021 is specifically configured to:
  • according to the positioning parameters corresponding to the preset number of reference points and the coordinates of the preset number of reference points in the corrected image to be recognized, obtain the transformation parameters required by a preset transformation algorithm for transforming the coordinates of the reference points in the image to be recognized into their coordinates in the corrected image to be recognized, where the preset transformation algorithm is one of an affine transformation algorithm, a perspective transformation algorithm, and a thin plate spline transformation algorithm.
  • The correction sub-module 6022 is specifically configured to:
  • according to the transformation parameters required by the preset transformation algorithm, use the preset transformation algorithm to calculate the coordinates of all the pixels in the image to be recognized, obtain the coordinates in the corrected image to be recognized corresponding to all those pixels, and obtain the corrected image to be recognized.
  • the identification module 603 includes:
  • The feature map obtaining sub-module 6031 is configured to extract image features from the corrected image to be recognized using the convolution layers in the preset recognition network, and obtain a feature map of the corrected image to be recognized containing the image features.
  • The classification sub-module 6032 is configured to classify the image features in the feature map of the corrected image to be recognized using the fully connected layer in the preset recognition network, and acquire the target classification result of the image to be recognized.
  • It can be seen that the target recognition apparatus for a deformed image provided by this embodiment first inputs the image to be recognized into the preset positioning network, extracts image features using the preset convolution layers to obtain a feature map of the image to be recognized containing the image features, and uses the fully connected layer in the preset positioning network to perform regression processing on the image features in that feature map, acquiring a plurality of positioning parameters of the image to be recognized, where the positioning parameters are the coordinates of the pixels in the image to be recognized that match the image features of a preset number of reference points in the corrected image to be recognized. Next, according to the positioning parameters corresponding to the preset number of reference points and the coordinates of those reference points in the corrected image to be recognized, the apparatus acquires the spatial transformation relationship of the reference points between the image to be recognized and the corrected image to be recognized, and according to that relationship obtains the coordinates in the corrected image to be recognized corresponding to all the pixels in the image to be recognized, yielding the corrected image to be recognized. Finally, the convolution layers in the preset recognition network extract image features from the corrected image to be recognized to obtain its feature map, and the fully connected layer in the preset recognition network classifies the image features in that feature map to acquire the target classification result of the image to be recognized. In the neural-network-based target recognition process, the deformed image is corrected first and target recognition is performed on the corrected image, which reduces the interference of deformation with target recognition; therefore, this embodiment can improve the accuracy of target recognition for deformed images.
  • An embodiment of the present invention further provides an electronic device, as shown in FIG. 7, including a processor 701 and a memory 702.
  • the memory 702 is configured to store a computer program;
  • the processor 701 is configured to, when executing the computer program stored on the memory 702, implement the target recognition method for a deformed image, the method including: inputting an image to be recognized into a preset positioning network and acquiring a plurality of positioning parameters of the image to be recognized, where the preset positioning network includes preset convolution layers and the plurality of positioning parameters are obtained by regressing the image features in the feature map obtained by convolving the image to be recognized; performing spatial transformation on the image to be recognized according to the plurality of positioning parameters to obtain a corrected image to be recognized; and inputting the corrected image to be recognized into a preset recognition network and acquiring the target classification result of the image to be recognized.
  • In this embodiment, the electronic device first inputs the image to be recognized into the preset positioning network and acquires a plurality of positioning parameters of the image to be recognized, where the positioning network includes preset convolution layers and the positioning parameters are obtained by regressing the image features in the feature map obtained by convolving the image to be recognized. Next, the image to be recognized is spatially transformed according to the plurality of positioning parameters to obtain a corrected image to be recognized. Finally, the corrected image to be recognized is input into the preset recognition network to acquire the target classification result of the image to be recognized. In the neural-network-based target recognition process, the deformed image is corrected first and target recognition is performed on the corrected image, which reduces the interference of deformation with target recognition; therefore, this embodiment can improve the accuracy of target recognition for deformed images.
  • In one implementation of the embodiment of the present invention, inputting the image to be recognized into the preset positioning network and acquiring the plurality of positioning parameters of the image to be recognized may include: extracting image features from the image to be recognized using the preset convolution layers to obtain a feature map of the image to be recognized containing the image features; and performing regression processing on the image features in that feature map using the fully connected layer in the preset positioning network to acquire the plurality of positioning parameters, where the positioning parameters are the coordinates of the pixels in the image to be recognized that match the image features of a preset number of reference points in the corrected image to be recognized.
  • In one implementation, performing spatial transformation on the image to be recognized according to the plurality of positioning parameters to obtain the corrected image to be recognized may include: according to the positioning parameters corresponding to the preset number of reference points and the coordinates of those reference points in the corrected image to be recognized, acquiring the spatial transformation relationship of the reference points between the image to be recognized and the corrected image to be recognized; and according to the spatial transformation relationship, obtaining the coordinates in the corrected image to be recognized corresponding to all the pixels in the image to be recognized, thereby obtaining the corrected image to be recognized.
  • In one implementation, acquiring the spatial transformation relationship may include: obtaining the transformation parameters required by a preset transformation algorithm for transforming the coordinates of the reference points in the image to be recognized into their coordinates in the corrected image to be recognized, the preset transformation algorithm being one of an affine transformation algorithm, a perspective transformation algorithm, and a thin plate spline transformation algorithm; accordingly, obtaining the corrected image to be recognized may include: using the preset transformation algorithm with those transformation parameters to calculate the coordinates of all the pixels in the image to be recognized and obtain their corresponding coordinates in the corrected image to be recognized.
  • In one implementation, inputting the corrected image to be recognized into the preset recognition network and acquiring the target classification result may include: extracting image features from the corrected image to be recognized using the convolution layers in the preset recognition network to obtain a feature map containing the image features; and classifying the image features in that feature map using the fully connected layer in the preset recognition network to acquire the target classification result of the image to be recognized.
  • An embodiment of the present invention further provides a computer readable storage medium storing a computer program; when the computer program is executed by a processor, the target recognition method for a deformed image is implemented, the method including: inputting an image to be recognized into a preset positioning network and acquiring a plurality of positioning parameters of the image to be recognized, where the preset positioning network includes preset convolution layers and the plurality of positioning parameters are obtained by regressing the image features in the feature map obtained by convolving the image to be recognized; performing spatial transformation on the image to be recognized according to the plurality of positioning parameters to obtain a corrected image to be recognized; and inputting the corrected image to be recognized into a preset recognition network and acquiring the target classification result of the image to be recognized.
  • In this embodiment, when the computer program is executed by the processor, the image to be recognized is first input into the preset positioning network and a plurality of positioning parameters of the image to be recognized are acquired, where the positioning network includes preset convolution layers and the positioning parameters are obtained by regressing the image features in the feature map obtained by convolving the image to be recognized. Next, the image to be recognized is spatially transformed to obtain a corrected image to be recognized. Finally, the corrected image to be recognized is input into the preset recognition network to acquire the target classification result of the image to be recognized. In the neural-network-based target recognition process, the deformed image is corrected first and target recognition is performed on the corrected image, which reduces the interference of deformation with target recognition; therefore, this embodiment can improve the accuracy of target recognition for deformed images.
  • In one implementation of the embodiment of the present invention, inputting the image to be recognized into the preset positioning network and acquiring the plurality of positioning parameters of the image to be recognized may include: extracting image features from the image to be recognized using the preset convolution layers to obtain a feature map of the image to be recognized containing the image features; and performing regression processing on the image features in that feature map using the fully connected layer in the preset positioning network to acquire the plurality of positioning parameters, where the positioning parameters are the coordinates of the pixels in the image to be recognized that match the image features of a preset number of reference points in the corrected image to be recognized.
  • In one implementation, performing spatial transformation on the image to be recognized according to the plurality of positioning parameters to obtain the corrected image to be recognized may include: according to the positioning parameters corresponding to the preset number of reference points and the coordinates of those reference points in the corrected image to be recognized, acquiring the spatial transformation relationship of the reference points between the image to be recognized and the corrected image to be recognized; and according to the spatial transformation relationship, obtaining the coordinates in the corrected image to be recognized corresponding to all the pixels in the image to be recognized, thereby obtaining the corrected image to be recognized.
  • In one implementation, acquiring the spatial transformation relationship may include: obtaining the transformation parameters required by a preset transformation algorithm for transforming the coordinates of the reference points in the image to be recognized into their coordinates in the corrected image to be recognized, the preset transformation algorithm being one of an affine transformation algorithm, a perspective transformation algorithm, and a thin plate spline transformation algorithm; accordingly, obtaining the corrected image to be recognized may include: using the preset transformation algorithm with those transformation parameters to calculate the coordinates of all the pixels in the image to be recognized and obtain their corresponding coordinates in the corrected image to be recognized.
  • In one implementation, inputting the corrected image to be recognized into the preset recognition network and acquiring the target classification result may include: extracting image features from the corrected image to be recognized using the convolution layers in the preset recognition network to obtain a feature map containing the image features; and classifying the image features in that feature map using the fully connected layer in the preset recognition network to acquire the target classification result of the image to be recognized.


Abstract

Embodiments of the present application provide a target recognition method and apparatus for a deformed image. The method includes: inputting an image to be recognized into a preset positioning network and acquiring a plurality of positioning parameters of the image to be recognized, where the positioning network includes preset convolution layers and the plurality of positioning parameters are obtained by regressing the image features in the feature map obtained by convolving the image to be recognized; performing spatial transformation on the image to be recognized according to the plurality of positioning parameters to obtain a corrected image to be recognized; and inputting the corrected image to be recognized into a preset recognition network to acquire a target classification result of the image to be recognized. In the neural-network-based target recognition process, the embodiments of the present application first correct the deformed image and perform target recognition on the corrected image, which reduces the interference of deformation with target recognition; therefore, the embodiments of the present application can improve the accuracy of target recognition for deformed images.

Description

一种对形变图像的目标识别方法及装置
本申请要求于2017年6月16日提交中国专利局、申请号为201710457725.7发明名称为“一种对形变图像的目标识别方法及装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及图像识别领域,特别是涉及一种对形变图像的目标识别方法及装置。
背景技术
随着神经网络技术的发展,基于图像的目标识别技术得到了迅速发展。基于神经网络的目标识别方法是利用神经网络的自主学习特性,提取图像特征,获得目标的分类结果,也就是目标的识别结果。相比于传统的目标识别方法,能够提高目标识别的准确率,且能够识别的目标的类型也更广泛,如人、动物、植物、建筑物、车辆、字符等等。
现有的基于神经网络的目标识别方法,使用深度神经网络模型对图像进行目标识别,但目标识别的过程中未考虑在复杂场景中,由于拍摄或其他原因导致的目标的形变对目标识别带来的影响,如图像拍摄过程中拍摄视角变化带来的目标的倾斜、缩放和透视变换等,或自然场景中目标的人为形变,如字符识别中遇到的字体设计和变化带来的倾斜、扭曲等。针对形变图像,现有的基于神经网络的目标识别方法将形变大的目标直接进行分类,导致目标识别的准确率降低。
发明内容
本申请实施例的目的在于提供一种对形变图像的目标识别方法及装置,能够针对形变图像,提高目标识别的准确性。具体技术方案如下:
一方面,本申请实施例公开了一种对形变图像的目标识别方法,包括:
将待识别图像输入预设定位网络,获取所述待识别图像的多个定位参数, 所述预设定位网络包括预设个卷积层,所述多个定位参数是所述待识别图像卷积后得到的特征图中的图像特征回归后得到的;
根据所述多个定位参数,对所述待识别图像进行空间变换,得到校正后的待识别图像;
将所述校正后的待识别图像输入预设识别网络,获取所述待识别图像的目标分类结果。
可选的,所述将待识别图像输入预设定位网络,获取所述待识别图像的多个定位参数,包括:
利用所述预设个卷积层对所述待识别图像提取图像特征,获得含有图像特征的待识别图像的特征图;
利用所述预设定位网络中的全连接层,对所述待识别图像的特征图中的图像特征进行回归处理,获取所述待识别图像的多个定位参数,所述定位参数为,所述待识别图像中,与校正后的待识别图像中预设数量个基准点的图像特征匹配的像素点的坐标。
可选的,所述根据所述多个定位参数,对所述待识别图像进行空间变换,得到校正后的待识别图像,包括:
根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获取基准点在所述待识别图像和校正后的待识别图像之间的空间变换关系;
根据所述空间变换关系,获得所述待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像。
可选的,所述根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获取基准点在所述待识别图像和校正后的待识别图像之间的空间变换关系,包括:
根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获得将基准点在所述待识别图像中的坐标变换为基准点在校正后的待识别图像中的坐标的预设变换算法所需要的变换参数,所述 预设变换算法包括仿射变换算法、透视变换算法、薄板样条变换算法之一;
所述根据所述空间变换关系,获得待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像,包括:
根据所述预设变换算法所需要的变换参数,利用所述预设变换算法,计算所述待识别图像中所有像素点在所述待识别图像中的坐标,获得所述待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像。
可选的,所述将所述校正后的待识别图像输入预设识别网络,获取所述待识别图像的目标分类结果,包括:
利用所述预设识别网络中的卷积层,对所述校正后的待识别图像提取图像特征,获得含有图像特征的校正后的待识别图像的特征图;
利用所述预设识别网络中的全连接层,对所述校正后的待识别图像的特征图中的图像特征进行分类处理,获取待识别图像的目标分类结果。
另一方面,本申请实施例还公开了一种对形变图像的目标识别装置,包括:
定位模块,用于将待识别图像输入预设定位网络,获取所述待识别图像的多个定位参数,所述预设定位网络包括预设个卷积层,所述多个定位参数是所述待识别图像卷积后得到的特征图中的图像特征回归后得到的;
空间变换模块,用于根据所述多个定位参数,对所述待识别图像进行空间变换,得到校正后的待识别图像;
识别模块,用于将所述校正后的待识别图像输入预设识别网络,获取所述待识别图像的目标分类结果。
可选的,所述定位模块,包括:
特征图获取子模块,用于利用所述预设个卷积层对所述待识别图像提取图像特征,获得含有图像特征的待识别图像的特征图;
定位子模块,用于利用所述预设定位网络中的全连接层,对所述待识别 图像的特征图中的图像特征进行回归处理,获取所述待识别图像的多个定位参数,所述定位参数为,所述待识别图像中,与校正后的待识别图像中预设数量个基准点的图像特征匹配的像素点的坐标。
可选的,所述空间变换模块,包括:
变换关系获取子模块,用于根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获取基准点在所述待识别图像和校正后的待识别图像之间的空间变换关系;
校正子模块,用于根据所述空间变换关系,计算所述待识别图像中所有像素点在待识别图像中的坐标,获得待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像。
可选的,所述变换关系获取子模块,具体用于:
根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获得将基准点在所述待识别图像中的坐标变换为基准点在校正后的待识别图像中的坐标的预设变换算法所需要的变换参数,所述预设变换算法包括仿射变换算法、透视变换算法、薄板样条变换算法之一;
所述校正子模块,具体用于:
根据所述预设变换算法所需要的变换参数,利用所述预设变换算法,计算所述待识别图像中所有像素点在所述待识别图像中的坐标,获得所述待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像。
可选的,所述识别模块,包括:
特征图获取子模块,用于利用所述预设识别网络中的卷积层,对所述校正后的待识别图像提取图像特征,获得含有图像特征的校正后的待识别图像的特征图;
分类子模块,用于利用所述预设识别网络中的全连接层,对所述校正后的待识别图像的特征图中的图像特征进行分类处理,获取待识别图像的目标分类结果。
为达到上述目的,本申请实施例还公开了一种电子设备,包括:处理器和存储器,
存储器,用于存放计算机程序;
处理器,用于执行存储器上所存放的计算机程序时,实现将待识别图像输入预设定位网络,获取所述待识别图像的多个定位参数,所述预设定位网络包括预设个卷积层,所述多个定位参数是所述待识别图像卷积后得到的特征图中的图像特征回归后得到的;
根据所述多个定位参数,对所述待识别图像进行空间变换,得到校正后的待识别图像;
将所述校正后的待识别图像输入预设识别网络,获取所述待识别图像的目标分类结果。
可选的,所述将待识别图像输入预设定位网络,获取所述待识别图像的多个定位参数,可以包括:
利用所述预设个卷积层对所述待识别图像提取图像特征,获得含有图像特征的待识别图像的特征图;
利用所述预设定位网络中的全连接层,对所述待识别图像的特征图中的图像特征进行回归处理,获取所述待识别图像的多个定位参数,所述定位参数为,所述待识别图像中,与校正后的待识别图像中预设数量个基准点的图像特征匹配的像素点的坐标。
可选的,所述根据所述多个定位参数,对所述待识别图像进行空间变换,得到校正后的待识别图像,可以包括:
根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获取基准点在所述待识别图像和校正后的待识别图像之间的空间变换关系;
根据所述空间变换关系,获得所述待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像。
可选的,所述根据预设数量个基准点对应的定位参数、预设数量个基准 点在校正后的待识别图像中的坐标,获取基准点在所述待识别图像和校正后的待识别图像之间的空间变换关系,可以包括:
根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获得将基准点在所述待识别图像中的坐标变换为基准点在校正后的待识别图像中的坐标的预设变换算法所需要的变换参数,所述预设变换算法包括仿射变换算法、透视变换算法、薄板样条变换算法之一;
所述根据所述空间变换关系,获得待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像,包括:
根据所述预设变换算法所需要的变换参数,利用所述预设变换算法,计算所述待识别图像中所有像素点在所述待识别图像中的坐标,获得所述待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像。
可选的,所述将所述校正后的待识别图像输入预设识别网络,获取所述待识别图像的目标分类结果,可以包括:
利用所述预设识别网络中的卷积层,对所述校正后的待识别图像提取图像特征,获得含有图像特征的校正后的待识别图像的特征图;
利用所述预设识别网络中的全连接层,对所述校正后的待识别图像的特征图中的图像特征进行分类处理,获取待识别图像的目标分类结果。
为达到上述目的,本申请实施例还公开了一种计算机可读存储介质,所述计算机可读存储介质内存储有计算机程序,所述计算机程序被处理器执行时实现将待识别图像输入预设定位网络,获取所述待识别图像的多个定位参数,所述预设定位网络包括预设个卷积层,所述多个定位参数是所述待识别图像卷积后得到的特征图中的图像特征回归后得到的;
根据所述多个定位参数,对所述待识别图像进行空间变换,得到校正后的待识别图像;
将所述校正后的待识别图像输入预设识别网络,获取所述待识别图像的目标分类结果。
可选的,所述将待识别图像输入预设定位网络,获取所述待识别图像的多个定位参数,可以包括:
利用所述预设个卷积层对所述待识别图像提取图像特征,获得含有图像特征的待识别图像的特征图;
利用所述预设定位网络中的全连接层,对所述待识别图像的特征图中的图像特征进行回归处理,获取所述待识别图像的多个定位参数,所述定位参数为,所述待识别图像中,与校正后的待识别图像中预设数量个基准点的图像特征匹配的像素点的坐标。
可选的,所述根据所述多个定位参数,对所述待识别图像进行空间变换,得到校正后的待识别图像,可以包括:
根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获取基准点在所述待识别图像和校正后的待识别图像之间的空间变换关系;
根据所述空间变换关系,获得所述待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像。
可选的,所述根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获取基准点在所述待识别图像和校正后的待识别图像之间的空间变换关系,可以包括:
根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获得将基准点在所述待识别图像中的坐标变换为基准点在校正后的待识别图像中的坐标的预设变换算法所需要的变换参数,所述预设变换算法包括仿射变换算法、透视变换算法、薄板样条变换算法之一;
所述根据所述空间变换关系,获得待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像,包括:
根据所述预设变换算法所需要的变换参数,利用所述预设变换算法,计算所述待识别图像中所有像素点在所述待识别图像中的坐标,获得所述待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待 识别图像。
可选的,所述将所述校正后的待识别图像输入预设识别网络,获取所述待识别图像的目标分类结果,可以包括:
利用所述预设识别网络中的卷积层,对所述校正后的待识别图像提取图像特征,获得含有图像特征的校正后的待识别图像的特征图;
利用所述预设识别网络中的全连接层,对所述校正后的待识别图像的特征图中的图像特征进行分类处理,获取待识别图像的目标分类结果。
本申请实施例提供的对形变图像的目标识别方法及装置,首先将待识别图像输入预设定位网络,获取所述待识别图像的多个定位参数,所述定位网络包括预设个卷积层,所述多个定位参数是所述待识别图像卷积后得到的特征图中的图像特征回归后得到的。其次根据所述多个定位参数,对所述待识别图像进行空间变换,得到校正后的待识别图像。最后将所述校正后的待识别图像输入预设识别网络,获取所述待识别图像的目标分类结果。本申请实施例在基于神经网络的目标识别过程中,先对形变图像进行校正,基于校正后的图像进行目标识别,能够减少形变对目标识别的干扰,因此本申请实施例能够针对形变图像,提高目标识别的准确性。当然,实施本申请的任一产品或方法必不一定需要同时达到以上所述的所有优点。
附图说明
为了更清楚地说明本申请实施例和现有技术的技术方案,下面对实施例和现有技术中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为本申请实施例的对形变图像的目标识别方法的一种流程图;
图2为本申请实施例的神经网络的训练流程图;
图3为本申请实施例的神经网络的结构图;
图4为本申请实施例的对形变图像的目标识别方法的另一种流程图;
图5为本申请实施例的对形变图像的目标识别装置的一种结构图;
图6为本申请实施例的对形变图像的目标识别装置的另一种结构图;
图7本申请实施例提供的电子设备的结构示意图。
具体实施方式
为使本申请的目的、技术方案、及优点更加清楚明白,以下参照附图并举实施例,对本申请进一步详细说明。显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请实施例公开了一种对形变图像的目标识别方法及装置,能够针对形变图像,提高目标识别的准确性。
下面首先对本申请实施例提供的一种对形变图像的目标识别方法进行介绍。
神经网络的发展极大提高了目标识别的准确率,基于神经网络的目标识别技术被广泛应用于多个领域,如智能监控领域、字符识别领域等。
基于神经网络的目标识别技术,首先通过大量样本图像、样本图像中已知目标识别结果,对预先构建的神经网络进行训练,利用神经网络自主学习特性,提取样本图像的图像特征,获得样本图像中目标的识别结果,并通过对比样本图像中已知目标识别结果,自动调整神经网络的参数,最终获得目标识别准确度高的神经网络。之后利用训练好的神经网络,针对任意待识别图像即可获得准确度高的目标识别结果,目标的识别结果体现图像中的目标信息。
通过基于神经网络的目标识别技术能够识别的目标类型广泛,如车牌、字符、人脸、动物、植物等等。比如,基于神经网络的文字识别技术即对图像中字符串信息进行检测和识别,包括图像中可能出现的车牌号、集装箱号、火车号、快递单号、条码号等。
现有的基于神经网络的目标识别方法,虽然使用深度神经网络模型对图像进行目标识别,但目标识别的过程中未考虑在复杂场景中,由于拍摄或其他原因导致的目标的形变对目标识别带来的影响,如图像拍摄过程中拍摄视角变化带来的目标的倾斜、缩放和透视变换等,或自然场景中目标的人为形变,如字符识别中遇到的字体设计和变化带来的倾斜、扭曲等。针对形变图像,现有的基于神经网络的目标识别方法将形变大的目标直接进行分类,导致目标识别的准确率降低。
本申请实施例提出一种对形变图像的目标识别方法,该方法提出一种端到端的深度神经网络模型,本申请实施例的端到端的深度神经网络模型可以根据大量形变图像的样本图像及已知的样本图像中目标识别结果训练得到。本申请实施例方法主要包括:
第一步,获得形变校正后的图像,主要包括:首先将待识别图像输入本申请实施例训练好的神经网络,利用其中校正网络的多层卷积层卷积处理待识别图像,获得待识别图像的特征图,其次对待识别图像的特征图中的图像特征进行回归处理,获得待识别图像的多个定位参数,然后根据多个定位参数,对待识别图像进行空间变换,得到校正后的待识别图像。
第二步,利用形变校正后的图像获得目标分类结果,主要包括:将校正后的待识别图像输入预设识别网络,获取待识别图像的目标分类结果,其中,预设识别网络可以为多种现有的目标识别网络。
本申请实施例在基于神经网络的目标识别过程中,先对形变图像进行校正,基于校正后的图像再进行目标识别,能够减少形变对目标识别的干扰,因此本申请实施例能够针对形变图像,提高目标识别的准确性。
参见图1,图1为本申请实施例的对形变图像的目标识别方法的一种流程图,包括如下步骤:
步骤101,将待识别图像输入预设定位网络,获取待识别图像的多个定位参数,定位网络包括预设个卷积层,多个定位参数是待识别图像卷积后得到的特征图中的图像特征回归后得到的。
本申请实施例中,待识别图像为任意图像拍摄设备拍摄的含有目标的图 像,如摄像头拍摄的图像、相机拍摄的图像、手机拍摄的图像等,目标可以为人、动物、植物、建筑物、车辆、字符等多种类型。
本申请实施例中的待识别图像可以为无形变图像,也可以为形变图像,以下均以待识别图像为形变图像说明本申请实施例方法。形变是指图像中的目标存在如平移、放缩、旋转、扭曲等形态变化。本申请实施例中的形变可以是图像拍摄过程中由于拍摄视角变化带来的目标的倾斜、缩放和透视变换等,也可以是复杂自然场景中由于人为造成的目标的形变,如字体设计和变化带来的倾斜、扭曲等。
本申请实施例提出一种端到端的深度神经网络模型,在该模型思想下,本申请实施例可以针对不同的形变类型,使用形变类型对应的具体网络进行图像校正、识别,不同的具体网络基于同一模型思想,但图像校正部分的网络结构、参数等可能略有不同,不同形变类型对应的具体网络可以基于一基本网络,对图像校正部分的网络结构、参数等进行微调获得。具体来说,本申请实施例针对的形变类型可以有多种预设形变类型,如含有旋转、平移、放缩之一或任意组合的形变,及在此基础上含有多角度拉伸形变等。本申请实施例可以针对各种预设形变类型,预先分别训练获得对应的具体的网络。
在对待识别图像进行识别时,本申请实施例针对不同的图像任务、需求,或者产生图像的不同场景,可以预判出一待识别图像的形变类型,如在不同拍摄角度下产生的图像存在的形变基本为拍摄角度带来的透视问题,该拍摄角度带来的透视问题的形变可能不仅含有旋转、平移、放缩之一或任意组合,且在此基础上还可能存在多角度拉伸形变,针对该拍摄角度带来的透视问题类型的形变。本申请实施例采用与该形变类型对应的训练好的一具体网络进行图像校正、目标识别,该具体网络是根据大量的含有旋转、平移、放缩之一或任意组合形变,及在此基础上还存在的多角度拉伸形变的形变样本图像训练得到的,网络中的参数及变换算法等均已针对该形变类型的样本图像进行过优化调整,使得应用该训练好的具体网络能够针对一具有拍摄角度带来的透视问题的形变图像,校正旋转、平移、放缩、之一或任意组合,及在此基础上的多角度拉伸形变,获得针对该形变类型的识别正确率高的目标识别结果。
本申请实施例针对一待识别图像,可以假想有一空白的图像为校正后的待识别图像,如果能够将该空白的校正后的待识别图像中的像素点全部填充完毕,即可获得一具体的校正后的待识别图像,本申请实施例正是基于该种想法,针对要获得的校正后的待识别图像,设置预设数量个像素点作为基准点,这些基准点的位置和数量均是由训练好的预设定位网络自动分析图像特征后输出的,选取的基准点能够提供校正形变所需要的参数,使得形变图片能够得以校正,这些基准点的设置原则是能够尽可能反映校正后的待识别图像的形状信息,以使得利用预设数量个基准点能够获得校正后的待识别图像的形状轮廓。
比如预设数量个基准点可以是校正后的待识别图像的边缘均匀分布的多个像素点。具体来说,比如对于通常的基于图像的字符识别,期望校正后的待识别图像,也就是输入给识别网络的图像为规则的矩形,以减小识别网络的计算量及计算复杂度,而且期望校正后的待识别图像中的字符能填充整个矩形框的范围,以使得校正后的待识别图像中的像素点具有最大限度的可用图像信息,则本申请实施例的预设数量个基准点就设置为均匀分布在矩形框边缘的预设数量个像素点,使得校正后的待识别图像的形状信息就通过均匀分布在这个矩形框外轮廓边缘的预设数量个基准点来反映。
本申请实施例针对不同的形变类型,使用形变类型对应的预设定位网络,与形变类型对应的基准点的位置、数量是通过训练好的对应的预设定位网络直接输出的,不同的形变类型对应的预设定位网络输出的校正后的待识别图像的基准点的位置和数量可以是不同的,如仅针对平移形变,基准点可以为图像中任一像素点,而对于其他较复杂的形变如透视变换,至少需要图像边缘4个角点作为基准点,可以理解的是,形变类型越复杂,需要的基准点的数量会越多,基准点的位置要求也会越高。
本申请实施例的预设定位网络为预设的神经网络,包含多个卷积层和至少一层全连接层。每个卷积层中包含多个含有权值的卷积核,全连接层含有权值矩阵。
卷积核能够提取待识别图像中不同的图像特征,如图像边缘、锐角等特征,通过提取图像特征获得含有图像特征的待识别图像的特征图,全连接层 中的权值矩阵含有多个权值,权值体现输入数据和对应分类结果的线性关系。
本申请实施例的预设定位网络中的全连接层通过训练后,权值矩阵中的权值就可以体现待识别图像的特征图,与预设数量个基准点在待识别图像中的位置的线性关系。本步骤101主要过程为,利用本申请实施例的多层卷积层提取待识别图像中的不同的图像特征,然后利用全连接层在待识别图像的图像特征中,寻找与预设数量个基准点的图像特征匹配的图像特征,也就是按照基准点的图像特征进行回归处理,将获取到的待识别图像中,与基准点的图像特征匹配的图像特征所在的像素点的位置称之为定位参数。也就是通过步骤101,本申请实施例可以获得待识别图像的预设数量个基准点在校正后的待识别图像中的位置、及预设数量个基准点对应在待识别图像中的位置。
步骤102,根据多个定位参数,对待识别图像进行空间变换,得到校正后的待识别图像。
本申请实施例根据基准点在待识别图像中的位置、基准点在校正后的待识别图像中的位置,获得一个满足于所有基准点的从待识别图像到校正后的待识别图像的空间变换关系,该空间变换关系适用于待识别图像中所有像素点,利用该空间变换关系获得待识别图像中所有像素点对应在校正后的待识别图像中的位置,继而得到校正后的待识别图像。
本申请实施例针对不同的形变类型,空间变换采用的具体变换算法不同,如针对旋转、平移、放缩之一或任意组合的形变类型,采用仿射变换算法,针对在旋转、平移、放缩之一或任意组合之外还存在多角度拉伸形变,采用透视变换算法,而针对透视变换之上的空间扭曲形变,采用薄板样条变换算法,以上例举的针对三种变换算法的基准点不同,变换算法针对的形变类型的复杂度依次上升,对应的空间变换关系的复杂度也依次上升,如仿射变换算法的空间变换关系可能为一个含有基准点位置坐标变化关系的坐标变换矩阵,而薄板样条变换算法含有多重的复杂变换步骤,其中含有多个变换参数或者公式等。
不同的变换算法均是以定位参数为基础数据,继而获得对应变换算法各自需要的变换参数进行空间变换的。具体来说,本申请实施例根据基准点在校正后的待识别图像中的位置,也就是基准点的预设位置,并根据定位参数, 也就是基准点在待识别图像中的位置,获得对应的变换算法所需要的变换参数,然后利用获得的变换参数,使用对应的变换算法计算待识别图像的像素点,得到校正后的待识别图像的像素点位置信息等,获得校正后的待识别图像的所有像素点位置信息就能够获得校正后的待识别图像。
步骤103,将校正后的待识别图像输入预设识别网络,获取待识别图像的目标分类结果。
预设识别网络为一个或多个已经训练完成的神经网络,预设识别网络利用卷积层对校正后的待识别图像提取图像特征,获得校正后的待识别图像的特征图,再利用全连接层对校正后的待识别图像的特征图中的图像特征进行分类,获得目标的分类结果。
本申请实施例中的预设识别网络可以是多种现有的目标识别网络,根据识别目标的类型不同,预设识别网络可以为,与目标的类型对应的识别网络,如识别字符的网络、识别人脸的网络等等。
本申请实施例可以结合、替换多种现有的目标识别网络,实现对多种类型目标进行识别的目的。
可见,本申请实施例提供的对形变图像的目标识别方法,首先将待识别图像输入预设定位网络,获取待识别图像的多个定位参数,定位网络包括预设个卷积层,多个定位参数是待识别图像卷积后得到的特征图中的图像特征回归后得到的。其次根据多个定位参数,对待识别图像进行空间变换,得到校正后的待识别图像。最后将校正后的待识别图像输入预设识别网络,获取待识别图像的目标分类结果。本申请实施例在基于神经网络的目标识别过程中,先对形变图像进行校正,基于校正后的图像进行目标识别,能够减少形变对目标识别的干扰,因此本申请实施例能够针对形变图像,提高目标识别的准确性。
本申请实施例可以利用图1所示方法建立并训练一个神经网络,以利用建立并训练的神经网络具体执行本申请实施例方法。参见图2,图2为本申请实施例的神经网络的训练流程图,包括:
步骤201,构建初始神经网络的结构,并设置初始神经网络的参数值。
神经网络目前已被广泛应用于图像识别等领域,存在多种已有的神经网络结构,本申请实施例可以预先结合已有的神经网络结构,构建初始神经网络的结构,本申请实施例的神经网络按照功能可以包括校正网络及识别网络,校正网络包括定位网络及空间变换网络,校正网络及识别网络包括至少一层卷积层、至少一层全连接层,参数值包括卷积层的卷积核数量、卷积核尺寸、卷积核权重值、全连接层的权值矩阵值等。
本申请实施例在构建初始神经网络的结构的同时,设置初始神经网络的参数值,参数值包括卷积层的卷积核数量,如32个、64个等、卷积核尺寸,如3*3、5*5等、卷积核权重值、全连接层的权矩阵值等。本申请实施例可以对初始神经网络的各个矩阵值赋以任意已知值作为初始值,或者可以利用如初始化方法msra等方法,对初始神经网络的各个矩阵值产生随机数作为初始值,这些随机数均是以实数的形式存在。
至此为止,本申请实施例的初始神经网络构建完成。本申请实施例可以针对预设形变类型,对校正网络的结构、参数等进行调整,获得多个针对不同预设形变类型的具体的初始神经网络。
步骤202,获取形变图像的各样本图像,以及各样本图像已知的目标识别结果。
本申请实施例预先获取大量的含有目标的形变图像的各样本图像,以及各样本图像已知的目标识别结果。形变图像为图像中的目标存在如平移、放缩、旋转、扭曲等形变的图像,本申请实施例中的目标可以为人、动物、植物、建筑物、车辆、字符等等。图像的来源可以是任意图像拍摄设备拍摄的图像,如摄像头拍摄的图像、相机拍摄的图像、手机拍摄的图像等,样本图像既可以是实时获取的图像,也可以是已存储的历史图像。
本申请实施例预设多种形变类型,获取预设形变类型对应的含有目标的样本图像,并针对不同的预设形变类型,利用对应的样本图像训练各自的神经网络。
步骤203,将各样本图像及各样本图像已知的目标识别结果,输入初始神经网络,得到各样本图像经初始神经网络得到的对应的目标识别结果。
本申请实施例将预设形变类型对应的每个样本图像、每个样本图像已知的目标识别结果输入预设形变类型的具体的初始神经网络,得到该初始神经网络输出的,每个样本图像经初始神经网络得到的目标识别结果。
步骤204,根据各样本图像经初始神经网络得到的对应的目标识别结果,与各样本图像已知的目标识别结果,获得损失函数的响应值。
本申请实施例中可以预设一个或多个损失函数,每个损失函数以一定角度衡量样本图像经初始神经网络得到的对应的目标识别结果,与已知的目标识别结果的差异,如损失函数可以是样本图像经初始神经网络得到的对应的目标识别结果,与已知的目标识别结果的减函数,或者是求取两者欧式距离的函数等,本申请实施例可以将多个损失函数的响应值进行加权,获得多个角度综合衡量两者差异的结果,以此更加准确地衡量每个样本图像经初始神经网络得到的对应的目标识别结果,与已知的目标识别结果的差异程度。
步骤205,根据损失函数的响应值,不断调整初始神经网络的结构或参数值,直至神经网络满足预设条件,获得训练得到的神经网络。
本申请实施例旨在训练神经网络以逼近损失函数达到极小值,因此本申请实施例可以对损失函数的响应值设置目标值,不断调整初始神经网络的结构或参数值,直至神经网络满足预设条件为损失函数的响应值达到目标值,获得训练得到的神经网络。
在实际训练过程中,通常可以采用,多次抽检并对比损失函数的响应值,在损失函数的响应值的减小程度达到预设值时,测试一样本图像经神经网络输出的目标识别结果,将神经网络输出的目标识别结果,与该样本已知目标识别结果对比获得识别正确率,当识别正确率达到预设正确率时,如预设正确率为98%等,此时停止训练,获得训练得到的神经网络。该过程以目标识别结果的识别正确率为导向,不用设置损失函数的响应值,更贴合使用目的及实际使用情况。
其中,调整初始神经网络的结构可以为,更改网络中各层类型、数量等,还可以增加或者减少其他组件等。调整初始神经网络的参数值可以为,修改卷积层的卷积核数量、卷积核尺寸、卷积核权重值、全连接层的权值矩阵值 等。调整初始神经网络的参数值可以采用梯度下降法等。
通过步骤201至步骤205,完成了本申请实施例的神经网络的建立及训练过程,训练得到的神经网络针对输入的含有目标的任意图像,都能够自动提取图像特征,输出获得图像中目标识别结果,具体来说,本申请实施例针对多种预设形变类型,训练得到形变类型对应的具体神经网络,在之后的基于图像的目标识别中,针对一预设形变类型的图像,采用预设形变类型对应的具体神经网络进行计算能够获得目标识别正确率高的识别结果。参见图3,图3为本申请实施例的神经网络的结构图,包括校正网络和识别网络,其中,校正网络又包括定位网络和空间变换网络。
在图1、图2、图3的基础上,作为优选的实施例,参见图4,图4为本申请实施例的对形变图像的目标识别方法的另一种流程图,包括如下步骤:
步骤401,利用预设个卷积层对待识别图像提取图像特征,获得含有图像特征的待识别图像的特征图。
根据现有图像知识可知,常见的图像有红、绿、蓝三个通道,通道是代指图像特定成分的习语,每种颜色对应的图像数据为一个二维矩阵,每个矩阵中的值为像素点的值,数值范围在0-255之间,将三个通道的二维矩阵进行叠加,就是原图的像素点对应的矩阵,也就是原图像数据。
本申请实施例中,将待识别图像输入训练好的神经网络,利用定位网络预设个卷积层中的多个卷积核,对待识别图像像素点对应的矩阵进行卷积运算,卷积核可以理解为一些可学习的滤波器,每个滤波器在空间上的宽度尺寸和高度尺寸小于待识别图像像素点对应的矩阵的尺寸,但是深度和待识别图像像素点对应的矩阵的尺寸一致,卷积核可以视为一个含有权值的矩阵,卷积核就是利用其权值提取图像特征。
举例来说,第一层卷积层的一个典型的卷积核的尺寸可以是5x5x3,也就是宽、高为5,深度为3的一个含有权值的矩阵,深度为3是因为输入图像,也就是待识别图像有红、绿、蓝3个颜色通道。在卷积运算时,每个卷积核,也就是滤波器,都在输入数据,也就是在待识别图像像素点对应的矩阵的宽度和高度上滑动,在滑动滤波器的时候,需要预设滑动步长,当步长为1时,滤 波器每次移动1个像素。然后计算整个滤波器和输入数据任一处的内积,也就是计算滤波器的权值和对应位置的像素点的值的内积。
当滤波器沿着输入数据的宽度和高度滑动完成后,会生成一个二维的激活图,也就是本申请实施例的待识别图像的特征图,该激活图给出了在每个空间位置处滤波器的反应,也就是滤波器提取的图像特征。直观地来说,卷积神经网络会让滤波器学习到,当它看到某些类型的图像特征时就激活,具体的视觉图像特征可能是某些方位上的边界,或者某些颜色的斑点等。每个滤波器在输入数据中寻找一些不同的东西,也就是不同的图像特征,将获得的不同的图像特征进行叠加得到待识别图像的特征图。
本申请实施例在卷积层处理后,还可以加入池化层Pool,对卷积层得到的特征图进行降采样处理。池化层的主要处理过程为将卷积层得到的特征图划分为多个预设区域,将每个预设区域内的多个像素值,降采样处理为一个像素值,以减小数据量,获得降采样后的特征图。
举例来说,如果第一层卷积层的输入是原始图像,那么在深度维度上的不同神经元将可能被不同方向的边界,或者是颜色斑点激活。将这些沿着深度方向排列、接受区域相同的神经元集合称为深度列或者深度切片。池化层Pool的作用是降低数据体的空间尺寸,这样就能减少网络中参数的数量,使得计算资源耗费变少,也能有效控制过拟合。
池化层可以使用最大MAX操作,对输入数据体的每一个深度切片独立进行操作,改变它的空间尺寸。最常见的形式是池化层使用尺寸2×2的滤波器,以步长为2对每个深度切片进行降采样,举例说明,如果卷积层输出的是32*32*12的数据体,池化层将32*32分成16*16个2*2数据体,然后在每个2*2的数据体里面,也就是2*2的4个数字中选取一个最大值,最后得到一个采样过后的16*16*12的数据体。该数据体相比于原来的32*32*12的数据体,宽高缩小,但深度不变。最大池化MaxPool将原数据体中可能75%的激活信息都丢掉,能够减小数据量。当然池化层也可以使用平均池化meanpool等其他池化方式。
步骤402,利用预设定位网络中的全连接层,对待识别图像的特征图中的图像特征进行回归处理,获取待识别图像的多个定位参数,定位参数为,待 识别图像中,与校正后的待识别图像中预设数量个基准点的图像特征匹配的像素点的坐标。
本申请实施例针对预设形变类型,通过与形变类型对应的预先训练好的神经网络中的预设定位网络获得预设数量个基准点,也就是获得预设数量个像素点的坐标,该坐标可以理解为在空白的校正后的待识别图像中的坐标。基准点的位置和个数是,能够满足提供该形变类型的形变校正所需要的参数使得形变图像得以校正所需要的像素点的位置和个数。如对于形变类型为旋转的待识别图像,通过至少一个基准点获得旋转角度这一个形变校正的参数即可获得校正后的待识别图像,对于形变类型为透视变换至少需要图像边缘四个角点作为基准点,获得形变校正所需要的参数以得到校正后的待识别图像,对于空间扭曲等复杂形变,需要获取图像中20个或者更多个像素点作为基准点,获得形变校正所需要的参数来得到校正后的待识别图像。
因此,本申请实施例的基准点与形变类型相关,基准点的设置原则为能够使待识别图像的整张图像在所提供的基准点位置、个数下能够得到预期的校正效果。
本申请实施例对待识别图像中的图像特征进行分类,识别出哪些图像特征是与校正后的待识别图像中预设的基准点的图像特征匹配的,也就是对于一个预设的基准点,在待识别图像中寻找与该预设的基准点的图像特征一致的一个像素点,将该像素点的坐标作为该基准点对应的定位参数。
具体为,本申请实施例的预设定位网络中的全连接层通过训练,使得权值矩阵中的权值体现待识别图像的特征图,与预设数量个基准点在待识别图像中的位置的线性关系。将训练好的全连接层的权值矩阵,与待识别图像的特征图对应的像素点矩阵相乘,获得预设数量个定位参数,也就是校正后的待识别图像中,预设数量个基准点分别对应在待识别图像中的坐标。比如选取20个基准点,通过步骤402获得的定位参数为20个基准点的坐标,为含有x坐标分量、y坐标分量的一共40个坐标值。
步骤403,根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获取基准点在待识别图像和校正后的待识别图像之间的空间变换关系。
本申请实施例根据预设数量个基准点对应的定位参数,也就是基准点在待识别图像中的坐标,并根据预设数量个基准点在校正后的待识别图像中的坐标,获取基准点在待识别图像和校正后的待识别图像之间的空间变换关系,本申请实施例就是通过基准点在待识别图像与校正后的待识别图像中的坐标对应关系计算出整个空间变换关系,继而通过由少数的预设数量个基准点得到的整个空间变换关系校正整张待识别图像中所有像素点的坐标,获得待识别图像中所有像素点对应在校正后的待识别图像中的坐标,也就实现填充整张校正后的待识别图像的目的。
本申请实施例的空间变换关系与形变类型相关,如对于简单的变换比如单纯的平移形变,仅需要位移的坐标变化量作为形变所需要的参数就足以完成空间变换,对于无平移、无旋转,仅存在放缩的形变,仅需要放缩的倍数作为形变所需要的参数就足以完成空间变换,而对于非常复杂的形变,比如包括平移、旋转、放缩、扭曲在内的形变,仅仅提供个别像素点坐标或者一两个形变所需要的参数已经无法完成整个空间变换,所以需要相应地增加基准点,以获得更多、更复杂的形变所需要的参数来推算整个空间变换关系。
空间变换关系根据形变类型的不同复杂程度可能含有多种步骤、参数、计算公式或者数学计算形式等,如最简单的空间变换方式可以为基准点的坐标在待识别图像和校正后的待识别图像中的一个坐标变换矩阵,该坐标变换矩阵是根据预设数量个基准点对应的定位参数,也就是预设数量个基准点在待识别图像中的坐标,并根据预设数量个基准点在校正后的待识别图像中的坐标总结出的,是适合于待识别图像中所有像素点的,表述像素点坐标从待识别图像变换到校正后的待识别图像的一个坐标变换矩阵。
需要说明的是,坐标变换矩阵是空间变换关系的一种举例,统一概括来讲,定位参数是空间变换所必须的信息,对于一种形变类型,一旦获得定位参数,就能够获得校正该形变类型的对应变换算法所需的具体变换参数,具体变换参数可能包括具体的不同的步骤、多种参数及计算方式等,利用该具体的不同的步骤、多种参数及计算方式等就能够利用对应的变换算法对该形变类型进行对应的校正。
因此,步骤403可以进一步具体概括为:
根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获得将基准点在待识别图像中的坐标变换为基准点在校正后的待识别图像中的坐标的预设变换算法所需要的变换参数,预设变换算法包括仿射变换算法、透视变换算法、薄板样条变换算法之一。
步骤404,根据空间变换关系,获得待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像。
本申请实施例中针对不同复杂程度的形变类型,空间变换关系可以具有不同的步骤、多种参数及计算方式等,本申请实施例根据校正预设形变类型对应的空间变换关系,利用对应的变换算法,以不同步骤、参数及计算方式等获得待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像。
如前述的三种变换算法:仿射变换算法、透视变换算法、薄板样条变换算法,是针对不同复杂度的形变类型的具体的变换算法,在本申请实施例的具体网络中,每个具体网络根据形变类型选择一种变换算法,三种变换算法针对各自对应的形变类型单独使用,比如针对拍摄角度带来的透视问题,本申请实施例仅需要使用透视变换算法即可;因为针对拍摄角度带来的透视问题,仿射变换算法不能解决透视问题,在针对拍摄角度带来的透视问题的形变类型时不使用仿射变换算法;另外,薄板样条算法也可以解决透视变换,当然也可以在网络中替换透视变换算法来使用,但薄板样条算法除了解决透视还能解决扭曲、弯折等各种各样的形变问题,功能强大的同时所需要的计算量和时间开销也相应的更大,所以一般当仅需要解决透视问题的时候不需要采用薄板样条算法,采用透视变换算法足以。
以上述的坐标变换矩阵举例来说,本申请实施例可以采用仿射变换算法将待识别图像对应的像素点的坐标矩阵,与坐标变换矩阵相乘,获得校正后的待识别图像对应的像素点的坐标矩阵。并根据校正后的待识别图像对应的像素点的坐标矩阵,获得待识别图像所有像素点在校正后的待识别图像中的坐标,最终获得校正后的待识别图像。
因此,步骤404可以进一步具体概括为,根据预设变换算法所需要的变换参数,利用预设变换算法,计算待识别图像中所有像素点在待识别图像中的 坐标,获得待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像。
步骤405,利用预设识别网络中的卷积层,对校正后的待识别图像提取图像特征,获得含有图像特征的校正后的待识别图像的特征图。
本申请实施例的预设识别网络为训练好的神经网络,可以为多种现有的与目标类型对应的目标识别网络,如在字符识别中,可以为由卷积神经网络CNN和循环神经网络RNN组成的识别网络等。
本申请实施例将校正后的待识别图像的各个像素点,与预设识别网络中的卷积层的卷积核进行卷积运算,获得卷积核提取的,含有校正后的待识别图像的图像特征信息的特征图。
步骤406,利用预设识别网络中的全连接层,对校正后的待识别图像的特征图中的图像特征进行分类处理,获取待识别图像的目标分类结果。
本申请实施例将校正后的待识别图像的特征图对应的像素点矩阵,与预设识别网络中的全连接层的权值矩阵相乘,获得对校正后的待识别图像的特征图中目标的分类结果。
如在字符识别中,首先采用卷积神经网络CNN和循环神经网络RNN对校正后的待识别图像进行特征提取获得校正后的待识别图像的特征图,再利用全连接层对校正后的待识别图像的特征图分类获得特征序列,特征序列仍是像素点的值对应的数据信息,然后利用预设有特征序列与字符串对应关系的序列解码器,完成特征序列与字符串结果的转换,得到识别后的字符串。如将一张包含字符串“hello”的图像经过卷积层及全连接层的处理,得到一个1*60的特征序列,该特征序列含有图像特征对应的数据信息,如0、1等不同数值,将该特征序列输入序列解码器,序列解码器输出8、5、12、12、15,进一步的,根据序列解码器中预设的特征序列与字符串的对应关系,序列解码器就能得到“hello”这个字符串。
可见,本申请实施例提供的对形变图像的目标识别方法,首先,将待识别图像输入预设定位网络,利用预设个卷积层对待识别图像提取图像特征,获得含有图像特征的待识别图像的特征图,利用预设定位网络中的全连接层, 对待识别图像的特征图中的图像特征进行回归处理,获取待识别图像的多个定位参数,定位参数为,待识别图像中,与校正后的待识别图像中预设数量个基准点的图像特征匹配的像素点的坐标。其次,根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获取基准点在待识别图像和校正后的待识别图像之间的空间变换关系,并根据空间变换关系,获得待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像。最后,利用预设识别网络中的卷积层,对校正后的待识别图像提取图像特征,获得含有图像特征的校正后的待识别图像的特征图,再利用预设识别网络中的全连接层,对校正后的待识别图像的特征图中的图像特征进行分类处理,获取待识别图像的目标分类结果。本申请实施例在基于神经网络的目标识别过程中,先对形变图像进行校正,基于校正后的图像进行目标识别,能够减少形变对目标识别的干扰,因此本申请实施例能够针对形变图像,提高目标识别的准确性。
参见图5,图5为本申请实施例的对形变图像的目标识别装置的一种结构图,包括:
定位模块501,用于将待识别图像输入预设定位网络,获取待识别图像的多个定位参数,定位网络包括预设个卷积层,多个定位参数是待识别图像卷积后得到的特征图中的图像特征回归后得到的。
空间变换模块502,用于根据多个定位参数,对待识别图像进行空间变换,得到校正后的待识别图像。
识别模块503,用于将校正后的待识别图像输入预设识别网络,获取待识别图像的目标分类结果。
可见,本申请实施例提供的对形变图像的目标识别装置,首先将待识别图像输入预设定位网络,获取待识别图像的多个定位参数,定位网络包括预设个卷积层,多个定位参数是待识别图像卷积后得到的特征图中的图像特征回归后得到的。其次根据多个定位参数,对待识别图像进行空间变换,得到校正后的待识别图像。最后将校正后的待识别图像输入预设识别网络,获取待识别图像的目标分类结果。本申请实施例在基于神经网络的目标识别过程中,先对形变图像进行校正,基于校正后的图像进行目标识别,能够减少形 变对目标识别的干扰,因此本申请实施例能够针对形变图像,提高目标识别的准确性。
需要说明的是,本申请实施例的装置是应用上述对形变图像的目标识别方法的装置,则上述应用对形变图像的目标识别方法的所有实施例均适用于该装置,且均能达到相同或相似的有益效果。
在图5基础上,作为优选的实施例,参见图6,图6为本申请实施例的对形变图像的目标识别装置的另一种结构图,包括:
本申请实施例中,定位模块601,包括:
特征图获取子模块6011,用于利用预设个卷积层对待识别图像提取图像特征,获得含有图像特征的待识别图像的特征图。
定位子模块6012,用于利用预设定位网络中的全连接层,对待识别图像的特征图中的图像特征进行回归处理,获取待识别图像的多个定位参数,定位参数为,待识别图像中,与校正后的待识别图像中预设数量个基准点的图像特征匹配的像素点的坐标。
本申请实施例中,空间变换模块602,包括:
变换关系获取子模块6021,用于根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获取基准点在待识别图像和校正后的待识别图像之间的空间变换关系。
校正子模块6022,用于根据空间变换关系,获得待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像。
本申请实施例中,变换关系获取子模块6021,具体用于:
根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获得将基准点在待识别图像中的坐标变换为基准点在校正后的待识别图像中的坐标的预设变换算法所需要的变换参数,预设变换算法为仿射变换算法、透视变换算法、薄板样条变换算法之一。
校正子模块6022,具体用于:
根据预设变换算法所需要的变换参数,利用预设变换算法,计算待识别图像中所有像素点在待识别图像中的坐标,获得待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像。
本申请实施例中,识别模块603,包括:
特征图获取子模块6031,用于利用预设识别网络中的卷积层,对校正后的待识别图像提取图像特征,获得含有图像特征的校正后的待识别图像的特征图。
分类子模块6032,用于利用预设识别网络中的全连接层,对校正后的待识别图像的特征图中的图像特征进行分类处理,获取待识别图像的目标分类结果。
可见,本申请实施例提供的对形变图像的目标识别装置,首先,将待识别图像输入预设定位网络,利用预设个卷积层对待识别图像提取图像特征,获得含有图像特征的待识别图像的特征图,利用预设定位网络中的全连接层,对待识别图像的特征图中的图像特征进行回归处理,获取待识别图像的多个定位参数,定位参数为,待识别图像中,与校正后的待识别图像中预设数量个基准点的图像特征匹配的像素点的坐标。其次,根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获取基准点在待识别图像和校正后的待识别图像之间的空间变换关系,并根据空间变换关系,获得待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像。最后,利用预设识别网络中的卷积层,对校正后的待识别图像提取图像特征,获得含有图像特征的校正后的待识别图像的特征图,再利用预设识别网络中的全连接层,对校正后的待识别图像的特征图中的图像特征进行分类处理,获取待识别图像的目标分类结果。本申请实施例在基于神经网络的目标识别过程中,先对形变图像进行校正,基于校正后的图像进行目标识别,能够减少形变对目标识别的干扰,因此本申请实施例能够针对形变图像,提高目标识别的准确性。
本发明实施例还提供了一种电子设备,如图7所示,包括处理器701和存 储器702,
存储器702,用于存放计算机程序;
处理器701,用于执行存储器702上所存放的计算机程序时,实现对形变图像的目标识别方法,该方法包括:
将待识别图像输入预设定位网络,获取所述待识别图像的多个定位参数,所述预设定位网络包括预设个卷积层,所述多个定位参数是所述待识别图像卷积后得到的特征图中的图像特征回归后得到的;
根据所述多个定位参数,对所述待识别图像进行空间变换,得到校正后的待识别图像;
将所述校正后的待识别图像输入预设识别网络,获取所述待识别图像的目标分类结果。
本申请实施例中,电子设备首先将待识别图像输入预设定位网络,获取待识别图像的多个定位参数,定位网络包括预设个卷积层,多个定位参数是待识别图像卷积后得到的特征图中的图像特征回归后得到的。其次根据多个定位参数,对待识别图像进行空间变换,得到校正后的待识别图像。最后将校正后的待识别图像输入预设识别网络,获取待识别图像的目标分类结果。本申请实施例在基于神经网络的目标识别过程中,先对形变图像进行校正,基于校正后的图像进行目标识别,能够减少形变对目标识别的干扰,因此本申请实施例能够针对形变图像,提高目标识别的准确性。
在本发明实施例的一种实现方式中,所述将待识别图像输入预设定位网络,获取所述待识别图像的多个定位参数,可以包括:
利用所述预设个卷积层对所述待识别图像提取图像特征,获得含有图像特征的待识别图像的特征图;
利用所述预设定位网络中的全连接层,对所述待识别图像的特征图中的图像特征进行回归处理,获取所述待识别图像的多个定位参数,所述定位参数为,所述待识别图像中,与校正后的待识别图像中预设数量个基准点的图像特征匹配的像素点的坐标。
在本发明实施例的一种实现方式中,所述根据所述多个定位参数,对所述待识别图像进行空间变换,得到校正后的待识别图像,可以包括:
根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获取基准点在所述待识别图像和校正后的待识别图像之间的空间变换关系;
根据所述空间变换关系,获得所述待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像。
在本发明实施例的一种实现方式中,所述根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获取基准点在所述待识别图像和校正后的待识别图像之间的空间变换关系,可以包括:
根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获得将基准点在所述待识别图像中的坐标变换为基准点在校正后的待识别图像中的坐标的预设变换算法所需要的变换参数,所述预设变换算法包括仿射变换算法、透视变换算法、薄板样条变换算法之一;
所述根据所述空间变换关系,获得待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像,包括:
根据所述预设变换算法所需要的变换参数,利用所述预设变换算法,计算所述待识别图像中所有像素点在所述待识别图像中的坐标,获得所述待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像。
在本发明实施例的一种实现方式中,所述将所述校正后的待识别图像输入预设识别网络,获取所述待识别图像的目标分类结果,可以包括:
利用所述预设识别网络中的卷积层,对所述校正后的待识别图像提取图像特征,获得含有图像特征的校正后的待识别图像的特征图;
利用所述预设识别网络中的全连接层,对所述校正后的待识别图像的特征图中的图像特征进行分类处理,获取待识别图像的目标分类结果。
本发明实施例还提供一种计算机可读存储介质,所述计算机可读存储介质内存储有计算机程序,所述计算机程序被处理器执行时实现对形变图像的目标识别方法,该方法包括:
将待识别图像输入预设定位网络,获取所述待识别图像的多个定位参数,所述预设定位网络包括预设个卷积层,所述多个定位参数是所述待识别图像卷积后得到的特征图中的图像特征回归后得到的;
根据所述多个定位参数,对所述待识别图像进行空间变换,得到校正后的待识别图像;
将所述校正后的待识别图像输入预设识别网络,获取所述待识别图像的目标分类结果。
本申请实施例中,计算机程序被处理器执行时首先将待识别图像输入预设定位网络,获取待识别图像的多个定位参数,定位网络包括预设个卷积层,多个定位参数是待识别图像卷积后得到的特征图中的图像特征回归后得到的。其次根据多个定位参数,对待识别图像进行空间变换,得到校正后的待识别图像。最后将校正后的待识别图像输入预设识别网络,获取待识别图像的目标分类结果。本申请实施例在基于神经网络的目标识别过程中,先对形变图像进行校正,基于校正后的图像进行目标识别,能够减少形变对目标识别的干扰,因此本申请实施例能够针对形变图像,提高目标识别的准确性。
在本发明实施例的一种实现方式中,所述将待识别图像输入预设定位网络,获取所述待识别图像的多个定位参数,可以包括:
利用所述预设个卷积层对所述待识别图像提取图像特征,获得含有图像特征的待识别图像的特征图;
利用所述预设定位网络中的全连接层,对所述待识别图像的特征图中的图像特征进行回归处理,获取所述待识别图像的多个定位参数,所述定位参数为,所述待识别图像中,与校正后的待识别图像中预设数量个基准点的图像特征匹配的像素点的坐标。
在本发明实施例的一种实现方式中,所述根据所述多个定位参数,对所述待识别图像进行空间变换,得到校正后的待识别图像,可以包括:
根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获取基准点在所述待识别图像和校正后的待识别图像之间的空间变换关系;
根据所述空间变换关系,获得所述待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像。
在本发明实施例的一种实现方式中,所述根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获取基准点在所述待识别图像和校正后的待识别图像之间的空间变换关系,可以包括:
根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获得将基准点在所述待识别图像中的坐标变换为基准点在校正后的待识别图像中的坐标的预设变换算法所需要的变换参数,所述预设变换算法包括仿射变换算法、透视变换算法、薄板样条变换算法之一;
所述根据所述空间变换关系,获得待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像,包括:
根据所述预设变换算法所需要的变换参数,利用所述预设变换算法,计算所述待识别图像中所有像素点在所述待识别图像中的坐标,获得所述待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像。
在本发明实施例的一种实现方式中,所述将所述校正后的待识别图像输入预设识别网络,获取所述待识别图像的目标分类结果,可以包括:
利用所述预设识别网络中的卷积层,对所述校正后的待识别图像提取图像特征,获得含有图像特征的校正后的待识别图像的特征图;
利用所述预设识别网络中的全连接层,对所述校正后的待识别图像的特征图中的图像特征进行分类处理,获取待识别图像的目标分类结果。
需要说明的是,在本文中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示 这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括要素的过程、方法、物品或者设备中还存在另外的相同要素。
本说明书中的各个实施例均采用相关的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于***实施例而言,由于其基本相似于方法实施例,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
以上所述仅为本申请的较佳实施例而已,并不用以限制本申请,凡在本申请的精神和原则之内,所做的任何修改、等同替换、改进等,均应包含在本申请保护的范围之内。

Claims (16)

  1. 一种对形变图像的目标识别方法,其特征在于,包括:
    将待识别图像输入预设定位网络,获取所述待识别图像的多个定位参数,所述预设定位网络包括预设个卷积层,所述多个定位参数是所述待识别图像卷积后得到的特征图中的图像特征回归后得到的;
    根据所述多个定位参数,对所述待识别图像进行空间变换,得到校正后的待识别图像;
    将所述校正后的待识别图像输入预设识别网络,获取所述待识别图像的目标分类结果。
  2. 根据权利要求1所述的方法,其特征在于,所述将待识别图像输入预设定位网络,获取所述待识别图像的多个定位参数,包括:
    利用所述预设个卷积层对所述待识别图像提取图像特征,获得含有图像特征的待识别图像的特征图;
    利用所述预设定位网络中的全连接层,对所述待识别图像的特征图中的图像特征进行回归处理,获取所述待识别图像的多个定位参数,所述定位参数为,所述待识别图像中,与校正后的待识别图像中预设数量个基准点的图像特征匹配的像素点的坐标。
  3. 根据权利要求2所述的方法,其特征在于,所述根据所述多个定位参数,对所述待识别图像进行空间变换,得到校正后的待识别图像,包括:
    根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获取基准点在所述待识别图像和校正后的待识别图像之间的空间变换关系;
    根据所述空间变换关系,获得所述待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像。
  4. 根据权利要求3所述的方法,其特征在于,所述根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获取基准点在所述待识别图像和校正后的待识别图像之间的空间变换关系,包 括:
    根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获得将基准点在所述待识别图像中的坐标变换为基准点在校正后的待识别图像中的坐标的预设变换算法所需要的变换参数,所述预设变换算法包括仿射变换算法、透视变换算法、薄板样条变换算法之一;
    所述根据所述空间变换关系,获得待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像,包括:
    根据所述预设变换算法所需要的变换参数,利用所述预设变换算法,计算所述待识别图像中所有像素点在所述待识别图像中的坐标,获得所述待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像。
  5. 根据权利要求1所述的方法,其特征在于,所述将所述校正后的待识别图像输入预设识别网络,获取所述待识别图像的目标分类结果,包括:
    利用所述预设识别网络中的卷积层,对所述校正后的待识别图像提取图像特征,获得含有图像特征的校正后的待识别图像的特征图;
    利用所述预设识别网络中的全连接层,对所述校正后的待识别图像的特征图中的图像特征进行分类处理,获取待识别图像的目标分类结果。
  6. 一种对形变图像的目标识别装置,其特征在于,包括:
    定位模块,用于将待识别图像输入预设定位网络,获取所述待识别图像的多个定位参数,所述预设定位网络包括预设个卷积层,所述多个定位参数是所述待识别图像卷积后得到的特征图中的图像特征回归后得到的;
    空间变换模块,用于根据所述多个定位参数,对所述待识别图像进行空间变换,得到校正后的待识别图像;
    识别模块,用于将所述校正后的待识别图像输入预设识别网络,获取所述待识别图像的目标分类结果。
  7. 根据权利要求6所述的装置,其特征在于,所述定位模块,包括:
    特征图获取子模块,用于利用所述预设个卷积层对所述待识别图像提取图像特征,获得含有图像特征的待识别图像的特征图;
    定位子模块,用于利用所述预设定位网络中的全连接层,对所述待识别图像的特征图中的图像特征进行回归处理,获取所述待识别图像的多个定位参数,所述定位参数为,所述待识别图像中,与校正后的待识别图像中预设数量个基准点的图像特征匹配的像素点的坐标。
  8. 根据权利要求7所述的装置,其特征在于,所述空间变换模块,包括:
    变换关系获取子模块,用于根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获取基准点在所述待识别图像和校正后的待识别图像之间的空间变换关系;
    校正子模块,用于根据所述空间变换关系,计算所述待识别图像中所有像素点在待识别图像中的坐标,获得待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像。
  9. 根据权利要求8所述的装置,其特征在于,所述变换关系获取子模块,具体用于:
    根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获得将基准点在所述待识别图像中的坐标变换为基准点在校正后的待识别图像中的坐标的预设变换算法所需要的变换参数,所述预设变换算法包括仿射变换算法、透视变换算法、薄板样条变换算法之一;
    所述校正子模块,具体用于:
    根据所述预设变换算法所需要的变换参数,利用所述预设变换算法,计算所述待识别图像中所有像素点在所述待识别图像中的坐标,获得所述待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像。
  10. 根据权利要求6所述的装置,其特征在于,所述识别模块,包括:
    特征图获取子模块,用于利用所述预设识别网络中的卷积层,对所述校正后的待识别图像提取图像特征,获得含有图像特征的校正后的待识别图像 的特征图;
    分类子模块,用于利用所述预设识别网络中的全连接层,对所述校正后的待识别图像的特征图中的图像特征进行分类处理,获取待识别图像的目标分类结果。
  11. 一种电子设备,其特征在于,包括处理器和存储器,
    存储器,用于存放计算机程序;
    处理器,用于执行存储器上所存放的计算机程序时,实现将待识别图像输入预设定位网络,获取所述待识别图像的多个定位参数,所述预设定位网络包括预设个卷积层,所述多个定位参数是所述待识别图像卷积后得到的特征图中的图像特征回归后得到的;
    根据所述多个定位参数,对所述待识别图像进行空间变换,得到校正后的待识别图像;
    将所述校正后的待识别图像输入预设识别网络,获取所述待识别图像的目标分类结果。
  12. 根据权利要求11所述的电子设备,其特征在于,所述将待识别图像输入预设定位网络,获取所述待识别图像的多个定位参数,包括:
    利用所述预设个卷积层对所述待识别图像提取图像特征,获得含有图像特征的待识别图像的特征图;
    利用所述预设定位网络中的全连接层,对所述待识别图像的特征图中的图像特征进行回归处理,获取所述待识别图像的多个定位参数,所述定位参数为,所述待识别图像中,与校正后的待识别图像中预设数量个基准点的图像特征匹配的像素点的坐标。
  13. 根据权利要求12所述的电子设备,其特征在于,所述根据所述多个定位参数,对所述待识别图像进行空间变换,得到校正后的待识别图像,包括:
    根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获取基准点在所述待识别图像和校正后的待识别图像 之间的空间变换关系;
    根据所述空间变换关系,获得所述待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像。
  14. 根据权利要求13所述的电子设备,其特征在于,所述根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获取基准点在所述待识别图像和校正后的待识别图像之间的空间变换关系,包括:
    根据预设数量个基准点对应的定位参数、预设数量个基准点在校正后的待识别图像中的坐标,获得将基准点在所述待识别图像中的坐标变换为基准点在校正后的待识别图像中的坐标的预设变换算法所需要的变换参数,所述预设变换算法包括仿射变换算法、透视变换算法、薄板样条变换算法之一;
    所述根据所述空间变换关系,获得待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像,包括:
    根据所述预设变换算法所需要的变换参数,利用所述预设变换算法,计算所述待识别图像中所有像素点在所述待识别图像中的坐标,获得所述待识别图像中所有像素点对应在校正后的待识别图像中的坐标,获得校正后的待识别图像。
  15. 根据权利要求11所述的电子设备,其特征在于,所述将所述校正后的待识别图像输入预设识别网络,获取所述待识别图像的目标分类结果,包括:
    利用所述预设识别网络中的卷积层,对所述校正后的待识别图像提取图像特征,获得含有图像特征的校正后的待识别图像的特征图;
    利用所述预设识别网络中的全连接层,对所述校正后的待识别图像的特征图中的图像特征进行分类处理,获取待识别图像的目标分类结果。
  16. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质内存储有计算机程序,所述计算机程序被处理器执行时实现权利要求1-5任一所述的方法步骤。
PCT/CN2018/090826 2017-06-16 2018-06-12 一种对形变图像的目标识别方法及装置 WO2018228375A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/622,197 US11126888B2 (en) 2017-06-16 2018-06-12 Target recognition method and apparatus for a deformed image
EP18817876.8A EP3640844A4 (en) 2017-06-16 2018-06-12 TARGET RECOGNITION METHOD AND APPARATUS FOR DEFORMED IMAGE

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710457725.7A CN109145927A (zh) 2017-06-16 2017-06-16 一种对形变图像的目标识别方法及装置
CN201710457725.7 2017-06-16

Publications (1)

Publication Number Publication Date
WO2018228375A1 true WO2018228375A1 (zh) 2018-12-20

Family

ID=64660030

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/090826 WO2018228375A1 (zh) 2017-06-16 2018-06-12 一种对形变图像的目标识别方法及装置

Country Status (4)

Country Link
US (1) US11126888B2 (zh)
EP (1) EP3640844A4 (zh)
CN (1) CN109145927A (zh)
WO (1) WO2018228375A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111476247A (zh) * 2019-01-23 2020-07-31 斯特拉德视觉公司 利用了1xK或Kx1卷积运算的CNN方法及装置
CN112396082A (zh) * 2019-08-19 2021-02-23 北京中关村科金技术有限公司 图像认证的方法、装置以及存储介质

Families Citing this family (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112805750A (zh) 2018-08-13 2021-05-14 奇跃公司 跨现实***
US11227435B2 (en) 2018-08-13 2022-01-18 Magic Leap, Inc. Cross reality system
EP3861387A4 (en) 2018-10-05 2022-05-25 Magic Leap, Inc. RENDERING LOCATION-SPECIFIC VIRTUAL CONTENT IN ANY LOCATION
CN109800294B (zh) * 2019-01-08 2020-10-13 中国科学院自动化研究所 基于物理环境博弈的自主进化智能对话方法、***、装置
CN109829848A (zh) * 2019-01-17 2019-05-31 柳州康云互联科技有限公司 一种用于互联网检测中基于神经网络的图像空间变换的***及方法
CN109829437B (zh) * 2019-02-01 2022-03-25 北京旷视科技有限公司 图像处理方法、文本识别方法、装置和电子***
CN111695380B (zh) * 2019-03-13 2023-09-26 杭州海康威视数字技术股份有限公司 目标检测方法及装置
CN110009747B (zh) * 2019-04-11 2023-03-31 武汉轻工大学 单叶双曲面方程识别方法、设备、存储介质及装置
CN110717430A (zh) * 2019-09-27 2020-01-21 聚时科技(上海)有限公司 基于目标检测与rnn的长物体识别方法及识别***
WO2021076754A1 (en) 2019-10-15 2021-04-22 Magic Leap, Inc. Cross reality system with localization service
EP4046401A4 (en) 2019-10-15 2023-11-01 Magic Leap, Inc. CROSS-REALLY SYSTEM WITH WIRELESS FINGERPRINTS
CN112668600B (zh) * 2019-10-16 2024-05-21 商汤国际私人有限公司 一种文本识别方法及装置
JP2023504775A (ja) 2019-11-12 2023-02-07 マジック リープ, インコーポレイテッド 位置特定サービスおよび共有場所ベースのコンテンツを伴うクロスリアリティシステム
CN112825141B (zh) * 2019-11-21 2023-02-17 上海高德威智能交通***有限公司 识别文本的方法、装置、识别设备和存储介质
CN110942064B (zh) * 2019-11-25 2023-05-09 维沃移动通信有限公司 图像处理方法、装置和电子设备
CN111062396B (zh) * 2019-11-29 2022-03-25 深圳云天励飞技术有限公司 车牌号码识别方法、装置、电子设备及存储介质
WO2021118962A1 (en) 2019-12-09 2021-06-17 Magic Leap, Inc. Cross reality system with simplified programming of virtual content
US11830149B2 (en) 2020-02-13 2023-11-28 Magic Leap, Inc. Cross reality system with prioritization of geolocation information for localization
JP2023514208A (ja) 2020-02-13 2023-04-05 マジック リープ, インコーポレイテッド マルチ分解能フレーム記述子を使用したマップ処理を伴うクロスリアリティシステム
WO2021163306A1 (en) 2020-02-13 2021-08-19 Magic Leap, Inc. Cross reality system with accurate shared maps
JP2023515524A (ja) 2020-02-26 2023-04-13 マジック リープ, インコーポレイテッド 高速位置特定を伴うクロスリアリティシステム
CN111246113B (zh) * 2020-03-05 2022-03-18 上海瑾盛通信科技有限公司 图像处理方法、装置、设备及存储介质
CN111583099A (zh) * 2020-04-14 2020-08-25 上海联影智能医疗科技有限公司 图像摆正方法、计算机设备和存储介质
CN111401326B (zh) * 2020-04-21 2023-04-18 招商局金融科技有限公司 基于图片识别的目标身份识别方法、服务器及存储介质
CN115803788A (zh) 2020-04-29 2023-03-14 奇跃公司 用于大规模环境的交叉现实***
JP2021196951A (ja) * 2020-06-16 2021-12-27 キヤノン株式会社 画像処理装置、画像処理方法、プログラム、学習済みモデルの製造方法、および画像処理システム
CN113077391B (zh) 2020-07-22 2024-01-26 同方威视技术股份有限公司 校正扫描图像的方法和装置以及图像扫描***
CN112149442B (zh) * 2020-09-15 2022-12-06 浙江大华技术股份有限公司 畸变二维码的识别方法和装置、存储介质及电子装置
CN112580544A (zh) * 2020-12-24 2021-03-30 上海依图网络科技有限公司 图像识别方法、装置和介质及其电子设备
CN113012136A (zh) * 2021-03-24 2021-06-22 中国民航大学 一种基于目标检测的机场行李计数方法及计数***
CN112990046B (zh) * 2021-03-25 2023-08-04 北京百度网讯科技有限公司 差异信息获取方法、相关装置及计算机程序产品
CN113705386A (zh) * 2021-08-12 2021-11-26 北京有竹居网络技术有限公司 视频分类方法、装置、可读介质和电子设备
CN113780286A (zh) * 2021-09-27 2021-12-10 浙江大华技术股份有限公司 对象识别方法及装置、存储介质、电子装置
CN114396877B (zh) * 2021-11-19 2023-09-26 重庆邮电大学 面向材料力学性能的智能三维位移场及应变场测量方法
CN117036665B (zh) * 2023-09-04 2024-03-08 南京航空航天大学 一种基于孪生神经网络的旋钮开关状态识别方法

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599878A (zh) * 2016-12-28 2017-04-26 深圳市捷顺科技实业股份有限公司 一种基于深度学习的人脸重建矫正方法及装置
CN106778737A (zh) * 2016-11-24 2017-05-31 北京文安智能技术股份有限公司 一种车牌矫正方法、装置和一种视频采集装置
CN106845487A (zh) * 2016-12-30 2017-06-13 佳都新太科技股份有限公司 一种端到端的车牌识别方法

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100643305B1 (ko) * 2005-02-14 2006-11-10 삼성전자주식회사 컨볼루션 커널을 이용한 라인 패턴 처리 방법 및 장치
US8463025B2 (en) * 2011-04-26 2013-06-11 Nec Laboratories America, Inc. Distributed artificial intelligence services on a cell phone
CN104298976B (zh) * 2014-10-16 2017-09-26 电子科技大学 基于卷积神经网络的车牌检测方法
US9892301B1 (en) * 2015-03-05 2018-02-13 Digimarc Corporation Localization of machine-readable indicia in digital capture systems
EP3262569A1 (en) * 2015-06-05 2018-01-03 Google, Inc. Spatial transformer modules
US9858496B2 (en) * 2016-01-20 2018-01-02 Microsoft Technology Licensing, Llc Object detection and classification in images
CN105740909B (zh) * 2016-02-02 2017-06-13 华中科技大学 一种基于空间变换的自然场景下文本识别方法
CN106778659B (zh) * 2016-12-28 2020-10-27 深圳市捷顺科技实业股份有限公司 一种车牌识别方法及装置
US10426442B1 (en) * 2019-06-14 2019-10-01 Cycle Clarity, LLC Adaptive image processing in assisted reproductive imaging modalities

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778737A (zh) * 2016-11-24 2017-05-31 北京文安智能技术股份有限公司 一种车牌矫正方法、装置和一种视频采集装置
CN106599878A (zh) * 2016-12-28 2017-04-26 深圳市捷顺科技实业股份有限公司 一种基于深度学习的人脸重建矫正方法及装置
CN106845487A (zh) * 2016-12-30 2017-06-13 佳都新太科技股份有限公司 一种端到端的车牌识别方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3640844A4

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111476247A (zh) * 2019-01-23 2020-07-31 斯特拉德视觉公司 利用了1xK或Kx1卷积运算的CNN方法及装置
CN111476247B (zh) * 2019-01-23 2023-09-26 斯特拉德视觉公司 利用了1xK或Kx1卷积运算的CNN方法及装置
CN112396082A (zh) * 2019-08-19 2021-02-23 北京中关村科金技术有限公司 图像认证的方法、装置以及存储介质

Also Published As

Publication number Publication date
US11126888B2 (en) 2021-09-21
US20200134366A1 (en) 2020-04-30
EP3640844A4 (en) 2020-06-10
CN109145927A (zh) 2019-01-04
EP3640844A1 (en) 2020-04-22

Similar Documents

Publication Publication Date Title
WO2018228375A1 (zh) 一种对形变图像的目标识别方法及装置
CN109360171B (zh) 一种基于神经网络的视频图像实时去模糊方法
US11281925B2 (en) Method and terminal for recognizing object node in image, and computer-readable storage medium
CN108108764B (zh) 一种基于随机森林的视觉slam回环检测方法
KR102629380B1 (ko) 실제 3차원 객체를 실제 객체의 2-차원 스푸프로부터 구별하기 위한 방법
CN108764041B (zh) 用于下部遮挡人脸图像的人脸识别方法
CN112446383B (zh) 车牌识别方法及装置、存储介质、终端
CN110287846A (zh) 一种基于注意力机制的人脸关键点检测方法
CN111783748B (zh) 人脸识别方法、装置、电子设备及存储介质
CN111091075B (zh) 人脸识别方法、装置、电子设备及存储介质
CN107016646A (zh) 一种基于改进的逼近投影变换图像拼接方法
CN109919971B (zh) 图像处理方法、装置、电子设备及计算机可读存储介质
CN112381061B (zh) 一种面部表情识别方法及***
CN110674759A (zh) 一种基于深度图的单目人脸活体检测方法、装置及设备
CN111797882A (zh) 图像分类方法及装置
CN113724379B (zh) 融合图像与激光点云的三维重建方法及装置
CN113095470A (zh) 神经网络的训练方法、图像处理方法及装置、存储介质
CN111652910A (zh) 一种基于对象空间关系的目标跟踪算法
CN116681636A (zh) 基于卷积神经网络的轻量化红外与可见光图像融合方法
CN110503002B (zh) 一种人脸检测方法和存储介质
CN113096023A (zh) 神经网络的训练方法、图像处理方法及装置、存储介质
CN110647813A (zh) 一种基于无人机航拍的人脸实时检测识别方法
KR20220098895A (ko) 인체 포즈 추정 장치 및 방법
CN115049827B (zh) 目标物体检测分割方法、装置、设备及存储介质
US11797854B2 (en) Image processing device, image processing method and object recognition system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18817876

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2018817876

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2018817876

Country of ref document: EP

Effective date: 20200116