WO2019011249A1 - Method, apparatus, device, and storage medium for determining an object pose in an image - Google Patents

Method, apparatus, device, and storage medium for determining an object pose in an image

Info

Publication number
WO2019011249A1
WO2019011249A1 (PCT/CN2018/095191)
Authority
WO
WIPO (PCT)
Prior art keywords
image
image block
block
determining
target object
Prior art date
Application number
PCT/CN2018/095191
Other languages
English (en)
French (fr)
Inventor
李佳
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 filed Critical 腾讯科技(深圳)有限公司
Priority to KR1020197030205A priority Critical patent/KR102319177B1/ko
Priority to JP2019541339A priority patent/JP6789402B2/ja
Priority to EP18832199.6A priority patent/EP3576017A4/en
Publication of WO2019011249A1 publication Critical patent/WO2019011249A1/zh
Priority to US16/531,434 priority patent/US11107232B2/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • G06V10/17Image acquisition using hand-held instruments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/20Scenes; Scene-specific elements in augmented reality scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20076Probabilistic image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • the embodiments of the present invention relate to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for determining an object posture in an image.
  • the augmented reality technology organically integrates virtual information, such as computer-generated graphics and text, into the real scene seen by the user, enhancing or extending the scene perceived by the human visual system.
  • the basis for implementing augmented reality technology is the ability to obtain the observation angle of the real scene. For example, when an image of a real scene is acquired by a camera, the pose of the three-dimensional object needs to be estimated from the two-dimensional observation image, and virtual content is then added and displayed in the real scene according to the pose of the three-dimensional object.
  • in the related art, a commonly used method is to detect artificially designed features and then compare them between different images.
  • however, such methods require additional steps such as accurate scale selection, rotation correction, and density normalization, which are computationally complex and time-consuming.
  • when augmented reality technology is applied to terminal devices such as mobile devices or wearable devices, the above method is no longer applicable, because such terminal devices have limited resources and limited information input and computing capabilities.
  • the embodiments of the present application provide a method, an apparatus, a device, and a storage medium for determining an object pose in an image, which can improve the time efficiency of image processing, consume fewer memory resources, and improve the resource utilization of the terminal device.
  • the present application provides a method for determining an object pose in an image, the method being applied to a terminal device, the method comprising: acquiring, from a server, training model parameters of a convolutional neural network for a target object; acquiring a real-time image of the target object and identifying at least one first image block from the real-time image, the first image block being a partial image of the real-time image; determining, by the convolutional neural network according to the training model parameters, a label image block matching each first image block, the label image block being a partial image of a standard image of the target object; and determining a pose of the target object according to each first image block and the label image block matched to each first image block.
  • the present application provides a method for determining an object pose in an image, the method being applied to a server, the method comprising: acquiring a standard image of a target object and a plurality of distortion images of the target object; inputting the standard image and the plurality of distortion images into a convolutional neural network for training to obtain training model parameters; and transmitting the training model parameters to a terminal device, so that the terminal device acquires a real-time image of the target object, identifies at least one first image block from the real-time image, the first image block being a partial image of the real-time image, determines, by the convolutional neural network according to the training model parameters, a label image block matching each first image block, the label image block being a partial image of the standard image, determines a pose of the target object according to each first image block and its matched label image block, and adds virtual content to the real-time image according to the pose.
  • the present application provides a device for determining an attitude of an object in an image, the device comprising:
  • An offline receiving module configured to acquire, from a server, a training model parameter of a convolutional neural network of a target object
  • An online receiving module configured to acquire a real-time image of the target object
  • An identification module configured to identify at least one first image block from the real-time image, the first image block being a partial image of the real-time image;
  • a matching module configured to determine, by the convolutional neural network according to the training model parameters, a label image block that matches each first image block, the label image block being a partial image of a standard image of the target object;
  • a pose determining module configured to determine a pose of the target object according to each first image block and the label image block matched to each first image block.
  • the present application provides a device for determining an attitude of an object in an image, the device comprising:
  • An acquisition module configured to acquire a standard image of the target object and a plurality of distortion images of the target object
  • a training module configured to input the standard image and the multiple distortion images into a convolutional neural network for training, to obtain training model parameters
  • a sending module configured to send the training model parameters to the terminal device, so that the terminal device acquires a real-time image of the target object, identifies at least one first image block from the real-time image, the first image block being a partial image of the real-time image, determines, by the convolutional neural network according to the training model parameters, a label image block matching each first image block, the label image block being a partial image of the standard image, and determines a pose of the target object according to each first image block and its matched label image block.
  • the present application provides a terminal device including a processor and a memory, the memory storing at least one instruction that is loaded and executed by the processor to implement the method, described above, for determining an object pose in an image applied to a terminal device.
  • the present application provides a server including a processor and a memory, the memory storing at least one instruction that is loaded and executed by the processor to implement the method, described above, for determining an object pose in an image applied to a server.
  • the present application provides a computer readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the method for determining an object pose in an image described above.
  • the method provided by the embodiments of the present application uses a convolutional neural network for offline training and then uses the trained model parameters when determining the object pose online, so that the computational complexity of image processing is greatly reduced, the time efficiency is high, and fewer memory resources are occupied, while the accuracy of the method is still guaranteed.
  • This method is especially suitable for providing augmented reality services on resource-constrained devices, and improves the resource utilization of terminal devices.
  • FIG. 1 is a schematic diagram of an implementation environment involved in an embodiment of the present application.
  • FIG. 2 is a schematic flow chart of a method for determining an attitude of an object in an image according to an embodiment of the present application
  • 3a is a schematic diagram of a standard image of a target object in an embodiment of the present application.
  • FIG. 3b is a schematic diagram of a distortion image of a target object in an embodiment of the present application.
  • 4a is a schematic diagram of a standard image of a target object in another embodiment of the present application.
  • 4b is a schematic diagram of a distortion image of a target object in another embodiment of the present application.
  • FIG. 5 is a schematic flow chart of a method for determining an attitude of an object in an image according to another embodiment of the present application.
  • FIG. 6 is a schematic flow chart of a method for determining an attitude of an object in an image according to an embodiment of the present application
  • FIG. 7 is a schematic structural diagram of a convolutional neural network according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a client in an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a client in another embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a server according to an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a server in another embodiment of the present application.
  • FIG. 1 is a schematic diagram of an augmented reality implementation environment according to an embodiment of the present application.
  • the augmented reality application system 100 includes a target object 101, a terminal device 102, and a server 103.
  • the terminal device 102 is equipped with an imaging device 1021, a screen 1023, and the object pose determining client according to the embodiments of the present application, and runs an augmented reality application.
  • the user photographs an image 1022 of the target object 101 in real time using the imaging device 1021, and the image is displayed on the screen 1023.
  • according to the method described in the embodiments of the present application, the pose of the target object 101 is estimated from the captured image 1022; the pose allows the position of the target object 101 in the captured image 1022 to be determined, and the virtual content 1024 is then added at the same position according to the pose, so that the real world and the virtual information are superimposed in the same picture.
  • the terminal device 102 first obtains an offline training result for the target object 101 from the server 103 before the online detection of the real-time image at the terminal device.
  • a large number of image samples of the target object 101 are stored in the database 1031 of the server 103, and the offline training sub-server 1032 performs offline training on these image samples using a convolutional neural network.
  • after the training is completed, the training model parameters are determined and then sent to the terminal device 102 for online detection of real-time images.
  • the above-described terminal device 102 refers to a terminal device having an image capturing and processing function, including but not limited to a smartphone, a palmtop computer, a tablet computer, and the like.
  • Operating systems are installed on these terminal devices, including but not limited to: Android operating system, Symbian operating system, Windows mobile operating system, and Apple iPhone OS operating system.
  • Communication between the terminal device 102 and the server 103 can be through a wireless network.
  • FIG. 2 is a schematic flow chart of a method for determining an attitude of an object in an image according to an embodiment of the present application.
  • the method can be applied to a separate client or to a client with augmented reality functionality, which can be installed in the terminal device 102 in the embodiment of FIG.
  • the method includes, but is not limited to, the following steps.
  • Step 201 Acquire a convolutional neural network training model parameter of the target object from the server.
  • the server acquires a standard image of a target object in a scene and a plurality of distorted images, and inputs a standard image and a plurality of distorted images into a convolutional neural network for training to obtain training model parameters. Then, the server sends the training model parameters to the client, and the terminal device installed with the client receives the training model parameters through the client.
  • the trained training model parameters are associated with a particular scene and are directed to a single target object.
  • the so-called standard image refers to a clear image taken for a target object in a specific scene, and the distorted image is obtained by introducing various perspective distortions on the basis of the standard image.
  • Figure 3a shows a standard image for a target object in a city scene
  • Figure 3b shows the corresponding three distortion images.
  • the scene is a city complex near the river, with the target object being the tallest building, as shown by the oval in Figure 3a.
  • the three distorted images are obtained by rotating and translating the standard image in Fig. 3a.
  • the target object (the building) is visible in each distorted image, while the background portion is filled with some random numbers.
  • Figures 4a and 4b show a standard image and three distorted images for one target object in another scene, respectively.
  • the target object is the bridge over the river, as shown by the box in Figure 4a.
  • the three distorted images are also obtained by rotating and translating the standard image; in each distorted image, the target object (the bridge) is visible in whole or in part.
  • This step is performed before the user uses the augmented reality service, and the obtained training model parameters are stored in the client in advance.
  • when the user uses the augmented reality service, the training model parameters are read for determining the pose of the target object.
  • Step 202 Acquire a real-time image of the target object, and identify at least one first image block from the real-time image.
  • in this step, the user is in the above scene and wants to use the augmented reality service.
  • first, a real-time image of the target object is captured by the imaging device on the terminal device where the client is located, and the real-time image is passed to the client.
  • the client identifies at least one first image block from the real-time image, wherein the first image block is a partial image of the real-time image, and the method for identifying includes but is not limited to the following steps:
  • Step 2021 Perform feature detection on the real-time image to acquire a plurality of local features.
  • a local feature refers to a part of the image that differs from its surroundings; it describes a region that is highly distinguishable.
  • Step 2022 For each local feature, if it is determined that the image contrast of the local feature is higher than a preset contrast threshold and the local feature is not an edge of the image, the local feature is determined as the first image block.
  • contrast refers to the measurement of the different brightness levels between the brightest white and the darkest black in the light and dark areas of an image, that is, the magnitude of the grayscale difference of an image.
  • in this way, the identified first image block stands out from its surroundings, reducing positional ambiguity.
  • for example, the real-time image may be a facial image, and the first image blocks may be the tip of the nose, the corners of the eyes, and the like.
  • for example, methods such as the scale-invariant feature transform (SIFT), speeded-up robust features (SURF), or features from accelerated segment test (FAST) may be used for detection; these methods differ in accuracy and speed, and in practice a tradeoff between processing complexity and time efficiency can be chosen according to the hardware capability of the terminal device.
  • in other embodiments, a local feature may also be selected based on a single criterion. For example, if the image contrast of the local feature is higher than a preset contrast threshold, the local feature is determined as a first image block; alternatively, if the local feature is not an edge of the image, it is determined as a first image block. Here, the recognition accuracy of the local features will affect the subsequent matching and the determined pose result; a minimal sketch of this identification step is given below.
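  • As an illustration only (not part of the patent), the following Python sketch shows how such block identification could be implemented, with OpenCV's FAST detector standing in for the contrast and edge tests; the detector choice, threshold value, and 27×27 block size are assumptions, the last taken from the experimental setup mentioned later.

```python
import cv2
import numpy as np

def extract_first_image_blocks(live_image_gray, patch_size=27, max_blocks=200):
    """Detect high-contrast local features and cut out square image blocks
    around them (illustrative detector and thresholds; the patent does not
    fix a specific detector)."""
    detector = cv2.FastFeatureDetector_create(threshold=40)  # contrast-like threshold
    keypoints = detector.detect(live_image_gray, None)
    keypoints = sorted(keypoints, key=lambda k: -k.response)[:max_blocks]

    half = patch_size // 2
    blocks, centers = [], []
    h, w = live_image_gray.shape
    for kp in keypoints:
        x, y = int(round(kp.pt[0])), int(round(kp.pt[1]))
        if half <= x < w - half and half <= y < h - half:
            patch = live_image_gray[y - half:y + half + 1, x - half:x + half + 1]
            blocks.append(patch.astype(np.float32))
            centers.append((x, y))
    return blocks, centers
```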
  • Step 203: Determine, by the convolutional neural network according to the training model parameters, a label image block matching each first image block.
  • the terminal device inputs each first image block into the convolutional neural network, and the convolutional neural network outputs, based on the training model parameters, the label image block matching each first image block.
  • the label image block is a partial image of a standard image that matches the first image block.
  • the training model parameters include weights and second image blocks identified from the standard image, and the second image block is a partial image of the standard image.
  • the convolutional neural network includes multiple convolutional layers, and the weights refer to the values of the individual elements in the convolution matrix used by each convolutional layer.
  • the matching method includes but is not limited to the following steps:
  • Step 2031: Input the first image block into the convolutional neural network, and output, based on the weights, the probability that the first image block matches each second image block.
  • the convolutional neural network classifies the first image block, with each second image block representing a category label; the first image block is processed with the weights, and the output is the probability that the first image block matches each second image block. This probability value represents the similarity between the first image block and the second image block.
  • Step 2032 Determine a second image block corresponding to the maximum probability value as a label image block.
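  • A minimal sketch of this matching step, assuming a trained classifier `model` that returns a 1×M probability vector for an image block; the model, its interface, and the variable names are illustrative and not taken from the patent.

```python
import numpy as np

def match_label_blocks(first_blocks, model, second_blocks):
    """For each first image block, run the CNN (assumed to output an M-dimensional
    probability vector over the M second image blocks) and keep the second image
    block with the highest probability as its label image block (step 2032)."""
    matches = []
    for block in first_blocks:
        probs = model(block)              # shape (M,), values in [0, 1]
        j = int(np.argmax(probs))         # index of the maximum probability
        matches.append((block, second_blocks[j], float(probs[j])))
    return matches
```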
  • Step 204 Determine a posture of the target object according to each of the first image block and the label image block that matches each of the first image blocks.
  • the pose of the target object is represented by an affine transformation; that is, each label image block is matched to its first image block via an affine transformation. The affine transformation can be represented by an affine transformation matrix, and the matrices corresponding to each first image block and its matched label image block form a set of affine transformations. If the first image blocks are denoted q_i, i = 1, …, N, where N is the total number of first image blocks, the label image block matching q_i is denoted p_i, and the affine transformation is represented by a matrix A, then q_i = A·p_i (formula (1)).
  • the affine transformation can reflect the translation and rotation of the target object relative to the camera lens, and can describe the imaging process of the target object in the 3D space to the 2D plane image.
  • An affine transformation belongs to a linear transformation, that is, a general characteristic of transforming parallel lines into parallel lines and finite points mapping to finite points.
  • the affine transformation on the two-dimensional Euclidean space can be expressed as (x′, y′)^T = M·(x, y)^T + (a0, a5)^T, where (x, y) and (x′, y′) are the coordinates of corresponding points (i.e., pixels) in the standard image and the real-time image, respectively, M is the 2×2 matrix formed by the parameters a1, a2, a3, a4 and represents the composite transformation of rotation, scaling, and shear, (a0, a5)^T is the translation vector, and each a_i is a real number.
  • the vector a = (a0, a1, a2, a3, a4, a5)^T of six parameters represents the affine transformation and determines the coordinate mapping between the two points, including three-dimensional rotation and translation.
  • the affine transformation has 6 degrees of freedom, and the attitude estimated according to the affine transformation is also often referred to as a 6D pose.
  • Translation, rotation, scaling, reflection, and shearing are all special cases of the affine transformation, depending on the specific values of the parameters in the vector.
  • when determining the pose of the target object, a matrix estimate of the set of affine transformation matrices can be determined from that set according to the least-squares principle; that is, the matrix estimate is the argument that minimizes the matching error over the set of affine transformation matrices.
  • the matrix estimate Â can be calculated by the following formula: Â = argmin_{A∈G} Σ_{i=1..N} ||q_i − A·p_i||  (formula (3)), where ||·|| denotes the squared modulus and G is the set of affine transformation matrices.
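  • A possible least-squares computation of this estimate, under the assumption that each matched pair is represented by the centre coordinates of p_i (in the standard image) and q_i (in the real-time image); this is a sketch of the least-squares principle stated above, not the patent's exact procedure.

```python
import numpy as np

def estimate_affine_pose(p_points, q_points):
    """Least-squares estimate of the 2x3 affine matrix A_hat minimizing
    sum_i ||q_i - A p_i||^2, with p_i the label-block centres in the standard
    image and q_i the matched block centres in the real-time image."""
    p = np.asarray(p_points, dtype=np.float64)   # shape (N, 2)
    q = np.asarray(q_points, dtype=np.float64)   # shape (N, 2)
    ones = np.ones((p.shape[0], 1))
    P = np.hstack([p, ones])                     # (N, 3): [x, y, 1]
    # Solve P @ A.T ~= q in the least-squares sense; A is the 2x3 affine matrix.
    A_T, *_ = np.linalg.lstsq(P, q, rcond=None)
    return A_T.T                                 # 2x3 matrix estimate
```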
  • after the pose represented by Â is determined, any virtual content to be added to the real-time image can be transformed by Â so that its observation angle is consistent with the real-time image, thereby adding the virtual content to the real-time image and presenting the mixed, augmented-reality image to the user.
  • in this embodiment, the trained training model parameters of the convolutional neural network are received from the server, a real-time image obtained by the user photographing the target object is received, at least one first image block is identified from the real-time image, and the image blocks are used as the input of the convolutional neural network.
  • the benefit is that, compared with the entire image, such small image blocks are robust to transformations, especially translations; moreover, no segmentation or any other prior semantic interpretation of the image is required. In addition, the weights of a convolutional neural network are shared across many connections, so the method has low computational complexity, high time efficiency, and low memory usage, which makes it especially suitable for augmented reality services on resource-constrained devices such as battery-limited mobile terminals and wearable devices.
  • FIG. 5 is a schematic flow chart of a method for determining an attitude of an object in an image according to another embodiment of the present application. As shown in FIG. 5, the method includes but is not limited to the following steps:
  • Step 501 Receive and store training model parameters of the trained convolutional neural network from the server.
  • the server performs offline training for the target object in a specific scenario. After the training is completed, the training model parameters are sent to the client for storage, and then the client invokes the training model parameters during real-time monitoring.
  • Step 502 Acquire a real-time image of the target object.
  • Step 503 Identify at least one first image block from the real-time image, and input each first image block into the convolutional neural network.
  • Step 504 For each first image block, output a probability that the first image block matches each second image block based on the weight, and determine a second image block corresponding to the maximum probability value as the label image block.
  • Step 505 Determine, according to each of the first image blocks and the respective matched label image blocks, a matrix estimation value of the affine transformation to represent the geometric posture of the target object.
  • the first image block and its matched label image block form a matching pair, that is, (q_i, p_i).
  • before the pose is determined, the matching pairs may be further filtered. For each first image block, this includes, but is not limited to, the following steps (see the sketch after this list):
  • Step 5051 Input the first image block into a convolutional neural network, and output a probability that the first image block matches each of the second image blocks based on the weight.
  • for example, if the total number of second image blocks is M, the output layer of the convolutional neural network outputs a 1×M classification vector whose elements take values in [0, 1], representing the above probabilities.
  • Step 5052: If the number of second image blocks whose probability is greater than a preset probability threshold is greater than a preset count threshold, the first image block and its matched label image block are used for determining the pose of the target object. For example, with a probability threshold of 0.6 and a count threshold of 200, the matching pair is kept for pose determination if more than 200 elements of the vector exceed 0.6; matching pairs selected in this way conform to the majority pose. A first image block that passes this test is referred to as a target image block, and the terminal device determines the pose of the target object according to the target image blocks and their corresponding label image blocks.
  • a random sample consensus (RANSAC) strategy may also be used to filter out mismatched pairs.
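  • A sketch of this filtering rule using the example thresholds above (0.6 and 200), with OpenCV's RANSAC affine fit as an optional mismatch filter; the function names and thresholds are illustrative only.

```python
import numpy as np
import cv2

def filter_matching_pairs(blocks, prob_vectors, prob_threshold=0.6, count_threshold=200):
    """Keep only 'target image blocks': first image blocks whose classification
    vector has more than count_threshold entries above prob_threshold
    (thresholds taken from the example values in this embodiment)."""
    kept = []
    for block, probs in zip(blocks, prob_vectors):
        if int(np.sum(np.asarray(probs) > prob_threshold)) > count_threshold:
            kept.append(block)
    return kept

# Optionally, mismatched pairs can be filtered with a RANSAC affine fit on the
# block centre coordinates:
#   A_hat, inlier_mask = cv2.estimateAffine2D(p_pts, q_pts, method=cv2.RANSAC)
```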
  • Step 506: Add the virtual content to the real-time image according to the matrix estimate.
  • after the matrix estimate of the affine transformation has been determined from the standard image, the inverse process can be performed: the virtual content is converted into the reference frame of the real-time image by the affine transformation, so that the two can be superimposed and the augmented reality function is realized.
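  • A simple compositing sketch, assuming the virtual content is an image authored in the standard image's coordinate frame and warped into the real-time image by the estimated 2×3 affine matrix; the blending rule is illustrative, not the patent's exact rendering path.

```python
import cv2
import numpy as np

def overlay_virtual_content(live_bgr, virtual_bgr, A_hat):
    """Warp the virtual content into the live image with the estimated 2x3
    affine A_hat and composite non-black pixels over the live image."""
    h, w = live_bgr.shape[:2]
    warped = cv2.warpAffine(virtual_bgr, A_hat.astype(np.float32), (w, h))
    mask = warped.any(axis=2, keepdims=True)   # non-black pixels of the virtual layer
    return np.where(mask, warped, live_bgr)
```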
  • in the above embodiment, by filtering the matching pairs (first image block, label image block), the effective value of N in formula (3) is reduced, which lowers the computational complexity while improving the accuracy of the determined pose.
  • in addition, the geometric pose of the target object is characterized by the matrix estimate of the affine transformation; the processing is simple and easy to compute, which further improves the time efficiency of the algorithm.
  • FIG. 6 is a schematic flow chart of a method for determining an attitude of an object in an image according to an embodiment of the present application. This method can be applied to the server 103 in FIG. The method includes, but is not limited to, the following steps.
  • Step 601 Acquire a standard image of the target object and multiple distortion images of the target object.
  • when the server performs offline training, a large number of training samples need to be acquired first. The standard image is required in order to determine the plurality of label image blocks used in the classification.
  • the distortion images can be acquired in various ways: for example, a plurality of distortion images are obtained by randomly photographing the same target object with an imaging device, or a plurality of distortion images are obtained by applying various kinds of distortion to the standard image. For the latter, in one embodiment, the distortion can be introduced by affine transformation. The method for obtaining distortion images from the standard image includes, but is not limited to, the following steps (a code sketch follows the noise model below):
  • Step 6011: Randomly generate a plurality of affine transformation matrices. A matrix A representing the affine transformation is defined, and a plurality of affine transformation matrices are randomly generated according to formula (4), where the parameter θ and one further angle parameter are generated uniformly from (−π, π], the parameters t_x and f_x are generated uniformly from [0, w] (w being the width of the standard image), the parameters t_y and f_y are generated uniformly from [0, h] (h being the height of the standard image), and the parameters λ1 and λ2 are generated uniformly from [0.5, 1.5].
  • Step 6012: For each affine transformation matrix, perform an affine transformation on the standard image using that matrix to obtain one distortion image: I′ = A(I) + N (formula (5)), where I is the input standard image, I′ is the generated distortion image, and N is Gaussian white noise with mean μ and variance σ, satisfying 0 ≤ μ ≤ 5 (formula (6)) and σ = 0.3 × (μ/2 − 1) + 0.8 (formula (7)).
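  • A sketch of steps 6011–6012 under stated assumptions: the exact composition of formula (4) is not reproduced in this text, so a common rotation/scale/shear decomposition is assumed, the second angle parameter is named `phi` here purely for illustration, and the parameters f_x, f_y are omitted.

```python
import numpy as np
import cv2

def random_distortion(standard_img, rng=np.random.default_rng()):
    """Generate one distortion image from the standard image: sample affine
    parameters with the ranges given above, warp the image, and add Gaussian
    white noise (I' = A(I) + N)."""
    h, w = standard_img.shape[:2]
    theta = rng.uniform(-np.pi, np.pi)
    phi = rng.uniform(-np.pi, np.pi)           # assumed name for the second angle
    lam1, lam2 = rng.uniform(0.5, 1.5, size=2)
    tx, ty = rng.uniform(0, w), rng.uniform(0, h)

    def rot(a):
        return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

    # Assumed decomposition: rotation, anisotropic scaling, and translation.
    M = rot(theta) @ rot(-phi) @ np.diag([lam1, lam2]) @ rot(phi)
    A = np.hstack([M, [[tx], [ty]]]).astype(np.float32)   # 2x3 affine matrix

    warped = cv2.warpAffine(standard_img, A, (w, h))
    mu = rng.uniform(0, 5)
    sigma = 0.3 * (mu / 2 - 1) + 0.8                       # variance of the noise
    noise = rng.normal(mu, np.sqrt(sigma), warped.shape)
    return np.clip(warped.astype(np.float64) + noise, 0, 255).astype(np.uint8)
```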
  • Step 602 Input the standard image and the plurality of distortion images into the convolutional neural network for training, and obtain training model parameters.
  • step 603 the training model parameters are sent to the client.
  • in this way, the terminal device receives, through the client, the real-time image obtained by the user photographing the target object, and identifies at least one first image block from the real-time image; for each first image block, it determines, according to the training model parameters, the label image block matching that first image block; and, according to each first image block and its matched label image block, it determines the pose of the target object and adds virtual content to the real-time image according to the pose.
  • the server constructs a convolutional neural network and then performs training.
  • the convolutional neural network performs feature extraction by convolution operations and then performs feature mapping.
  • Each computational layer of the convolutional neural network consists of multiple feature maps.
  • Each feature map is a plane, and the weights of all neurons on the plane are equal, thus reducing the number of network free parameters.
  • FIG. 7 is a schematic structural diagram of a convolutional neural network in an embodiment of the present application. As shown in Figure 7, the convolutional neural network consists of multiple layers of processing, namely:
  • in the convolution layer, a convolution matrix is used as a filter.
  • the filter is convolved with the input image block 700: the weights in the filter are multiplied by the corresponding pixel values in the image block, and all the results are summed to obtain a single value. This process is repeated over each region of the image block, left to right and top to bottom, producing one value at each step; the resulting matrix is the feature image.
  • the pooling layer is usually used after the convolution layer. Its function is to simplify the information output by the convolution layer, reduce the data dimensionality and the computational overhead, and control over-fitting.
  • a convolved feature image has a "stationarity" property, which means that a feature that is useful in one image region is very likely to be equally applicable in another region. Therefore, in order to describe a large image, statistics at different positions are aggregated, which is the pooling process; for example, the average or maximum value of a particular feature is computed over a region of the image (see the sketch below).
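  • A tiny numpy sketch of the convolution and pooling operations described above; this is illustrative only, since real implementations use optimized library routines.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the filter over the image left-to-right, top-to-bottom; at each
    position multiply the weights by the underlying pixels and sum them,
    producing one value of the feature image ('valid' convolution)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

def pool2d(feature, size=2, mode="max"):
    """Aggregate statistics over non-overlapping size x size regions
    (max or average pooling)."""
    h, w = feature.shape
    f = feature[:h - h % size, :w - w % size].reshape(h // size, size, w // size, size)
    return f.max(axis=(1, 3)) if mode == "max" else f.mean(axis=(1, 3))
```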
  • the output of the network is a 1 × M-dimensional classification vector.
  • the elements of the vector take values in [0, 1], and each dimension of the output gives the probability that the image block belongs to the corresponding category.
  • in practice, multiple convolution layers are usually used, followed by a fully connected layer for training; that is, in Fig. 7, the convolutional layer 701 and the pooling layer 702 form one combination, and a plurality of such combinations are applied in sequence.
  • such a network is called a deep convolutional neural network.
  • the purpose of multi-layer convolution is that the features learned by a single convolution layer are often local; the more layers there are, the more global the learned features become.
  • for example, the method for determining the number of convolution layers includes, but is not limited to, the following steps: presetting a correspondence between the number of image blocks and the number of convolution layers; identifying at least one second image block from the standard image; and determining the number of convolution layers in the convolutional neural network according to the number of second image blocks and the correspondence.
  • the total number of second image blocks is 400 and the entire network includes 13 layers.
  • there are four convolutional layers, located at the 1st, 4th, 7th, and 10th layers. The 1st convolutional layer is followed by a max pooling layer and a rectified linear unit (ReLU) excitation layer;
  • the 4th convolutional layer is followed by a ReLU excitation layer and an average pooling layer, and the 7th convolutional layer is likewise followed by a ReLU excitation layer and an average pooling layer;
  • the 10th convolutional layer is followed by a ReLU excitation layer, and finally there are a fully connected layer and a soft-max output layer (a sketch of this architecture is given below).
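  • A PyTorch sketch of the 13-layer architecture described above; the layer order follows the text, while the channel widths, kernel sizes, and the 27×27 single-channel input are assumptions (the input size is taken from the experimental setup below).

```python
import torch
import torch.nn as nn

class PatchClassifier(nn.Module):
    """Sketch of the described network: conv layers at positions 1, 4, 7, 10;
    max pooling after the first conv; average pooling after the 4th and 7th;
    ReLU excitations; fully connected + soft-max over M = 400 classes."""
    def __init__(self, num_classes=400):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1),    # layer 1: convolution
            nn.MaxPool2d(2),                   # layer 2: max pooling
            nn.ReLU(inplace=True),             # layer 3: ReLU excitation
            nn.Conv2d(16, 32, 3, padding=1),   # layer 4: convolution
            nn.ReLU(inplace=True),             # layer 5: ReLU excitation
            nn.AvgPool2d(2),                   # layer 6: average pooling
            nn.Conv2d(32, 64, 3, padding=1),   # layer 7: convolution
            nn.ReLU(inplace=True),             # layer 8: ReLU excitation
            nn.AvgPool2d(2),                   # layer 9: average pooling
            nn.Conv2d(64, 128, 3, padding=1),  # layer 10: convolution
            nn.ReLU(inplace=True),             # layer 11: ReLU excitation
        )
        self.classifier = nn.Linear(128 * 3 * 3, num_classes)  # layer 12: fully connected

    def forward(self, x):                      # x: (batch, 1, 27, 27) image blocks
        x = self.features(x)
        x = torch.flatten(x, 1)
        logits = self.classifier(x)
        return torch.softmax(logits, dim=1)    # layer 13: soft-max output (1 x M probabilities)
```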
  • an excitation function is applied in the excitation layer to introduce non-linear factors and solve problems that are not linearly separable.
  • the selected excitation function is the ReLU, whose expression is f(x) = max(0, x);
  • values less than zero are set to 0, which makes the training of the convolutional neural network faster and alleviates the vanishing gradient problem.
  • the convolutional neural network also needs input samples and ideal output samples to be determined during training, and the weights are then adjusted iteratively.
  • for example, at least one second image block is identified from the standard image, and each distortion image is processed in the same way to obtain at least one third image block; when the convolutional neural network is trained, the third image blocks are used as input samples, the second image blocks are used as ideal output samples, and the weights are trained.
  • the weight is adjusted by a back propagation algorithm.
  • the backpropagation algorithm can be divided into four parts: the forward pass, the loss function, the backward pass, and the weight update.
  • the image block is input and passed through a convolutional neural network.
  • initially, all weights are randomly initialized, for example to random values such as [0.3, 0.1, 0.4, 0.2, 0.3, …]. Since the convolutional neural network cannot extract accurate feature images with these initial weights, it cannot draw any reasonable conclusion about which category the image block belongs to.
  • during backpropagation, the loss function helps the convolutional neural network update its weights so as to find the desired feature images.
  • a commonly used loss function here is the mean squared error (MSE).
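  • A training-loop sketch under these assumptions: third image blocks are the inputs, the index of the matching second image block is the ideal output, the loss is MSE against one-hot targets (as the text suggests; cross-entropy would be an alternative), and the weights are updated by backpropagation. The data loader and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

def train_patch_classifier(model, loader, epochs=10, lr=1e-3, num_classes=400):
    """Offline training sketch: each batch contains third image blocks and the
    index of the second image block they correspond to."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for blocks, labels in loader:          # blocks: (B, 1, 27, 27), labels: (B,)
            target = nn.functional.one_hot(labels, num_classes).float()
            probs = model(blocks)              # forward pass
            loss = loss_fn(probs, target)      # loss function
            optimizer.zero_grad()
            loss.backward()                    # backward pass
            optimizer.step()                   # weight update
    return model
```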
  • Table 2 gives the values of the accuracy and the occupied memory of the two methods.
  • the experimental data is set as follows:
  • the convolutional neural network architecture given in Table 1 is used, and the size of each image block is 27 × 27, that is, 27 rows and 27 columns of pixels.
  • each image block is preprocessed to have a mean of 0 and a variance of one.
  • 2000 affine transformation matrices are randomly generated according to formula (4) for generating the distortion images.
  • the number of second image blocks is 400, and the output vector is a 1 × 400-dimensional classification vector.
  • the number of Fern in the Ferns method is 30, and the number of features in each Fern is 12.
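  • A one-function sketch of the block preprocessing mentioned in the setup above (zero mean, unit variance); the epsilon guard is an implementation detail, not from the patent.

```python
import numpy as np

def normalize_block(block, eps=1e-8):
    """Preprocess a 27 x 27 image block to zero mean and unit variance."""
    block = block.astype(np.float32)
    return (block - block.mean()) / (block.std() + eps)
```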
  • for the scene of Figures 3a and 3b, the accuracy of the method given in the embodiments of the present application is 86%, while the accuracy of the Ferns method is 88%; for the scene of Figures 4a and 4b, the accuracy of the method given in the embodiments of the present application is 87%, while the accuracy of the Ferns method is 88%.
  • the method given in the embodiments of the present application therefore has substantially the same accuracy as the Ferns method.
  • however, the method provided in the embodiments of the present application, which uses a convolutional neural network, occupies only 0.5557M of memory, whereas the Ferns method occupies 93.75M. It can be seen that the method given in the embodiments of the present application has low memory resource consumption.
  • FIG. 8 is a schematic structural diagram of a client 800 in an embodiment of the present application.
  • the client 800 may be a virtual device that performs the determining method of the pose of the object in the image in the embodiment of FIG. 2 and FIG. 5, and the device includes:
  • An offline receiving module 810 configured to acquire, from a server, a training model parameter of a convolutional neural network of a target object;
  • An online receiving module 820 configured to acquire a real-time image of the target object
  • the identification module 830 is configured to identify at least one first image block from the real-time image
  • a matching module 840 configured to determine, according to the training model parameter, a label image block that matches each of the first image blocks by using a convolutional neural network
  • the pose determining module 850 is configured to determine the pose of the target object according to each first image block and the label image block matched to each first image block.
  • the adding module 860 is configured to add virtual content to the real-time image according to the pose; the adding module 860 is an optional module.
  • the identification module 830 includes:
  • the detecting unit 831 is configured to perform feature detection on the real-time image to obtain a plurality of local features.
  • the determining unit 832 is configured to determine, among the plurality of local features, a local feature whose image contrast is higher than a preset contrast threshold and which is not an edge of the image as a first image block.
  • the training model parameters include weights and second image blocks identified from the standard image
  • the matching module 840 is further configured to: input each first image block into the convolutional neural network and output, based on the weights, the probability that each first image block matches each second image block; obtain, for each first image block, the number of probabilities greater than a probability threshold; determine a first image block for which this number is greater than a preset count threshold as a target image block; and determine the pose according to the target image blocks and the label image blocks matched to the target image blocks.
  • the matching module 840 is further configured to: obtain the probability that the target image block matches each second image block; determine the second image block corresponding to the largest of these probabilities as the label image block matched to the target image block; and determine the pose according to the target image block and the label image block matched to the target image block.
  • optionally, each label image block is mapped to its matching first image block by an affine transformation using an affine transformation matrix, and these affine transformation matrices form a set of affine transformation matrices;
  • the pose determination module 850 is further configured to determine a matrix estimate of the set of affine transformation matrices from that set according to the least-squares principle.
  • the pose determination module 850 is further configured to calculate the matrix estimate by formula (3) above, where q_i is a first image block, p_i is the label image block matching q_i, A is an affine transformation matrix, ||·|| denotes the squared modulus, and G is the set of affine transformation matrices.
  • FIG. 9 is a schematic structural diagram of a client 900 according to another embodiment of the present application.
  • the client 900 may be the terminal device 102 shown in FIG. 1.
  • the client 900 includes a processor 910, a memory 920, a port 930, and a bus 940.
  • Processor 910 and memory 920 are interconnected by a bus 940.
  • Processor 910 can receive and transmit data through port 930. among them,
  • the processor 910 is configured to execute a machine readable instruction module stored by the memory 920.
  • Memory 920 stores machine readable instruction modules executable by processor 910.
  • the instruction modules executable by the processor 910 include an offline receiving module 921, an online receiving module 922, an identifying module 923, a matching module 924, a posture determining module 925, and an adding module 926. among them,
  • when the offline receiving module 921 is executed by the processor 910, it may: acquire the training model parameters of the convolutional neural network of the target object from the server;
  • when the online receiving module 922 is executed by the processor 910, it may: acquire a real-time image of the target object;
  • the identification module 923 may be executed by the processor 910 to: identify at least one first image block from the real-time image;
  • the matching module 924 may be executed by the processor 910 to: determine, according to the training model parameters, a label image block that matches each of the first image blocks by using a convolutional neural network;
  • the pose determining module 925, when executed by the processor 910, may: determine the pose of the target object according to each first image block and the label image block matched to each first image block;
  • the adding module 926, when executed by the processor 910, may: add virtual content to the real-time image according to the pose. The adding module 926 is an optional module.
  • FIG. 10 is a schematic structural diagram of a server 1000 according to an embodiment of the present application.
  • the server 1000 includes a virtual device for performing a method for determining an attitude of an object in an image in the embodiment of FIG. 6, the device comprising:
  • the acquiring module 1010 is configured to acquire a standard image of the target object and multiple distortion images of the target object;
  • a training module 1020 configured to input a standard image and a plurality of distorted images into a convolutional neural network for training, and obtain training model parameters of the convolutional neural network;
  • the sending module 1030 is configured to send the training model parameters to the client, so that the terminal device acquires a real-time image of the target object through the client and identifies at least one first image block from the real-time image;
  • and so that, according to the training model parameters, the convolutional neural network determines a label image block matching each first image block, and the pose of the target object is determined according to each first image block and the label image block matched to each first image block.
  • the obtaining module 1010 is further configured to randomly generate a plurality of affine transformation matrices; and perform affine transformation on the standard image using each affine transformation matrix to obtain each distorted image.
  • the convolutional neural network includes a plurality of convolution layers
  • the training module 1020 is further configured to: identify at least one second image block from the standard image; and determine the number of convolution layers in the convolutional neural network according to the number of second image blocks and a preset correspondence between the number of second image blocks and the number of convolution layers.
  • the training module 1020 is further configured to: identify at least one second image block from the standard image; separately process each distortion image to obtain at least one third image block; and, when the convolutional neural network is trained, use the third image blocks as input samples and the second image blocks as ideal output samples to train the weights.
  • FIG. 11 is a schematic structural diagram of a server 1100 according to another embodiment of the present application.
  • the server 1100 includes a processor 1110, a memory 1120, a port 1130, and a bus 1140.
  • the processor 1110 and the memory 1120 are interconnected by a bus 1140.
  • the processor 1110 can receive and transmit data through the port 1130. among them,
  • the processor 1110 is configured to execute a machine readable instruction module stored by the memory 1120.
  • the memory 1120 stores a machine readable instruction module executable by the processor 1110.
  • the instruction module executable by the processor 1110 includes an acquisition module 1121, a training module 1122, and a transmission module 1123. among them,
  • the acquiring module 1121 may be executed by the processor 1110 to: acquire a standard image of the target object and multiple distortion images;
  • when the training module 1122 is executed by the processor 1110, the standard image and the plurality of distortion images are input into the convolutional neural network for training, and the training model parameters are obtained;
  • when the sending module 1123 is executed by the processor 1110, it may: send the training model parameters to the client, so that the terminal device acquires a real-time image of the target object through the client and identifies at least one first image block from the real-time image;
  • and so that the convolutional neural network determines, according to the training model parameters, a label image block matching each first image block, and the pose of the target object is determined according to each first image block and the label image block matched to each first image block.
  • each functional module in each embodiment of the present application may be integrated into one processing unit, or each module may exist physically separately, or two or more modules may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the storage medium can use any type of recording method, such as a paper storage medium (e.g., paper tape), a magnetic storage medium (e.g., floppy disk, hard disk, flash memory), an optical storage medium (e.g., CD-ROM), or a magneto-optical storage medium (e.g., MO).
  • the present application also discloses a storage medium having stored therein at least one piece of data processing program for performing any of the above-described embodiments of the present application.
  • the storage medium has at least one instruction, code set or instruction set, and the at least one instruction, code set or instruction set is loaded and executed by the processor to implement any one of the foregoing methods of the present application.
  • a person skilled in the art may understand that all or part of the steps of implementing the above embodiments may be completed by hardware, or may be instructed by a program to execute related hardware, and the program may be stored in a computer readable storage medium.
  • the storage medium mentioned may be a read only memory, a magnetic disk or an optical disk or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

This application discloses a method, apparatus, device, and storage medium for determining an object pose in an image. The method includes: acquiring, from a server, training model parameters of a convolutional neural network for a target object; acquiring a real-time image of the target object and identifying at least one first image block from the real-time image; determining, by the convolutional neural network according to the training model parameters, a label image block matching the first image block; and determining the pose of the target object according to each first image block and the label image block matched to each first image block. The method of this application can improve the time efficiency of image processing, consume fewer memory resources, and improve the resource utilization of terminal devices.

Description

Method, apparatus, device, and storage medium for determining an object pose in an image
This application claims priority to Chinese Patent Application No. 201710573908.5, filed with the State Intellectual Property Office of China on July 14, 2017 and entitled "Method, client, and server for determining an object pose in an image", which is incorporated herein by reference in its entirety.
Technical Field
The embodiments of this application relate to the field of image processing technologies, and in particular to a method, apparatus, device, and storage medium for determining an object pose in an image.
Background
With the rapid development of computer graphics technology, augmented reality technology organically integrates virtual information such as computer-generated graphics and text into the real scene seen by the user, enhancing or extending the scene perceived by the human visual system. The basis for implementing augmented reality technology is the ability to obtain the observation angle of the real scene. For example, when an image of a real scene is acquired by a camera, the pose of a three-dimensional object needs to be estimated from the two-dimensional observation image, and virtual content is then added and displayed in the real scene according to the pose of the three-dimensional object.
In the related art, a commonly used method is to detect artificially designed features and then compare them between different images. However, such methods require additional steps such as accurate scale selection, rotation correction, and density normalization; they are computationally complex and time-consuming. When augmented reality technology is applied to terminal devices such as mobile devices or wearable devices, the above method is no longer applicable, because such terminal devices have limited resources and limited information input and computing capabilities.
Summary
In view of this, the embodiments of this application provide a method, apparatus, device, and storage medium for determining an object pose in an image, which can improve the time efficiency of image processing, consume fewer memory resources, and improve the resource utilization of terminal devices.
In one aspect, this application provides a method for determining an object pose in an image, applied to a terminal device, the method including:
acquiring, from a server, training model parameters of a convolutional neural network for a target object;
acquiring a real-time image of the target object, and identifying at least one first image block from the real-time image, the first image block being a partial image of the real-time image;
determining, by the convolutional neural network according to the training model parameters, a label image block matching each first image block, the label image block being a partial image of a standard image of the target object;
determining a pose of the target object according to each first image block and the label image block matched to each first image block.
In one aspect, this application provides a method for determining an object pose in an image, applied to a server, the method including:
acquiring a standard image of a target object and a plurality of distortion images of the target object;
inputting the standard image and the plurality of distortion images into a convolutional neural network for training, to obtain training model parameters;
sending the training model parameters to a terminal device, so that the terminal device acquires a real-time image of the target object, identifies at least one first image block from the real-time image, the first image block being a partial image of the real-time image, determines, by the convolutional neural network according to the training model parameters, a label image block matching each first image block, the label image block being a partial image of the standard image, determines a pose of the target object according to each first image block and its matched label image block, and adds virtual content to the real-time image according to the pose.
In one aspect, this application provides an apparatus for determining an object pose in an image, the apparatus including:
an offline receiving module, configured to acquire, from a server, training model parameters of a convolutional neural network for a target object;
an online receiving module, configured to acquire a real-time image of the target object;
an identification module, configured to identify at least one first image block from the real-time image, the first image block being a partial image of the real-time image;
a matching module, configured to determine, by the convolutional neural network according to the training model parameters, a label image block matching each first image block, the label image block being a partial image of a standard image of the target object;
a pose determining module, configured to determine a pose of the target object according to each first image block and the label image block matched to each first image block.
In one aspect, this application provides an apparatus for determining an object pose in an image, the apparatus including:
an acquiring module, configured to acquire a standard image of a target object and a plurality of distortion images of the target object;
a training module, configured to input the standard image and the plurality of distortion images into a convolutional neural network for training, to obtain training model parameters;
a sending module, configured to send the training model parameters to a terminal device, so that the terminal device acquires a real-time image of the target object, identifies at least one first image block from the real-time image, the first image block being a partial image of the real-time image, determines, by the convolutional neural network according to the training model parameters, a label image block matching each first image block, the label image block being a partial image of the standard image, and determines a pose of the target object according to each first image block and the label image block matched to each first image block.
In one aspect, this application provides a terminal device, including a processor and a memory, the memory storing at least one instruction that is loaded and executed by the processor to implement the method, described above, for determining an object pose in an image applied to a terminal device.
In one aspect, this application provides a server, including a processor and a memory, the memory storing at least one instruction that is loaded and executed by the processor to implement the method, described above, for determining an object pose in an image applied to a server.
In one aspect, this application provides a computer readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the method for determining an object pose in an image described above.
As can be seen from the above technical solutions, the method provided by the embodiments of this application uses a convolutional neural network for offline training and then uses the trained model parameters when determining the object pose online, so that the computational complexity of image processing is greatly reduced, the time efficiency is high, and fewer memory resources are occupied, while the accuracy of the determination method is still guaranteed. The method is especially suitable for providing augmented reality services on resource-constrained devices, and improves the resource utilization of terminal devices.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Apparently, the accompanying drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an implementation environment involved in an embodiment of this application;
FIG. 2 is a schematic flowchart of a method for determining an object pose in an image according to an embodiment of this application;
FIG. 3a is a schematic diagram of a standard image of a target object in an embodiment of this application;
FIG. 3b is a schematic diagram of distortion images of the target object in an embodiment of this application;
FIG. 4a is a schematic diagram of a standard image of a target object in another embodiment of this application;
FIG. 4b is a schematic diagram of distortion images of the target object in another embodiment of this application;
FIG. 5 is a schematic flowchart of a method for determining an object pose in an image according to another embodiment of this application;
FIG. 6 is a schematic flowchart of a method for determining an object pose in an image according to an embodiment of this application;
FIG. 7 is a schematic structural diagram of a convolutional neural network in an embodiment of this application;
FIG. 8 is a schematic structural diagram of a client in an embodiment of this application;
FIG. 9 is a schematic structural diagram of a client in another embodiment of this application;
FIG. 10 is a schematic structural diagram of a server in an embodiment of this application;
FIG. 11 is a schematic structural diagram of a server in another embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, this application is further described in detail below with reference to the accompanying drawings and embodiments.
FIG. 1 is a schematic diagram of an augmented reality implementation environment involved in an embodiment of this application. As shown in FIG. 1, the augmented reality application system 100 includes a target object 101, a terminal device 102, and a server 103. The terminal device 102 is equipped with an imaging device 1021, a screen 1023, and the object pose determining client provided in the embodiments of this application, and runs an augmented reality application.
For example, the user photographs an image 1022 of the target object 101 in real time using the imaging device 1021, and the image is displayed on the screen 1023. According to the method described in the embodiments of this application, the pose of the target object 101 is estimated from the captured image 1022; the pose allows the position of the target object 101 in the captured image 1022 to be determined, and the virtual content 1024 is then added at the same position according to the pose, so that the real world and the virtual information are superimposed in the same picture.
According to the embodiments of this application, before performing online detection on real-time images at the terminal device, the terminal device 102 first obtains the offline training result for the target object 101 from the server 103. A large number of image samples of the target object 101 are stored in the database 1031 of the server 103; the offline training sub-server 1032 then performs offline training on these image samples using a convolutional neural network, determines the training model parameters after the training is completed, and sends them to the terminal device 102 for online detection of real-time images.
Here, the above terminal device 102 refers to a terminal device with image capturing and processing functions, including but not limited to smartphones, palmtop computers, tablet computers, and the like. Operating systems are installed on these terminal devices, including but not limited to the Android operating system, the Symbian operating system, the Windows Mobile operating system, and the Apple iPhone OS operating system. The terminal device 102 and the server 103 can communicate through a wireless network.
图2为本申请一个实施例中图像中物体姿态的确定方法的流程示意图。该方法可以应用于单独的客户端,或者应用于具备增强现实功能的客户端,该客户端可安装于图1实施例中的终端设备102中。该方法包括但不限于以下步骤。
步骤201,从服务器获取目标物体的卷积神经网络训练模型参数。
例如,服务器获取一个场景中目标物体的标准图像以及多张畸变图像,将标准图像和多张畸变图像输入到卷积神经网络进行训练,获得训练模型参数。然后,服务器将训练模型参数发送给客户端,安装有该客户端的终端设备通过客户端接收到该训练模型参数。
在本申请的实施例中,训练出来的训练模型参数与特定的场景相关,针对的是单一目标物体。所谓标准图像是指在一个特定场景中针对一个目标物体拍摄得到的清晰图像,而畸变图像是在该标准图像的基础上引入各种透视失真而得到的。
图3a给出了一个城市场景中针对一个目标物体的标准图像,图3b则给出了相应的3张畸变图像。该场景是河边的城市建筑群,目标物体是其中最高的楼,如图3a中椭圆所示。3张畸变图像是对图3a中的标准图像进行旋转、平移得到的,在每张畸变图像中都可以看到目标物体—楼,而在背景部分则填充了一些随机数。
图4a和图4b分别给出了另一个场景中针对一个目标物体的标准图像和3张畸变图像。目标物体是河上的桥,如图4a中方框所示。3张畸变图像也是对标准图像进行旋转、平移得到的,在每张畸变图像中都可以看到完整的或者部分的目标物体—桥。
此步骤是在用户使用增强现实服务之前执行,在客户端中事先存储获得的训练模型参数。在用户使用增强现实服务时,则读取该训练模型参数用于目标物体的姿态确定。
步骤202,获取目标物体的实时图像,从实时图像中识别出至少一个第一图像块。
此步骤中,用户处于上述场景中,希望使用增强现实服务,首先通过客户 端所在的终端设备上的摄像装置拍摄得到目标物体的实时图像,将实时图像传递给客户端。然后,客户端从实时图像中识别出至少一个第一图像块,其中,第一图像块是实时图像的局部图像,识别的方法包括但不限于如下步骤:
步骤2021,对实时图像进行特征检测,获取多个局部特征。
局部特征是指图像中一些有别于其周围的地方,描述的是一块区域,使其能具有高可区分度。
步骤2022,针对每个局部特征,若判断出该局部特征的图像对比度高于预设的对比度阈值并且该局部特征并非图像的边缘,则将该局部特征确定为第一图像块。
例如,对比度指的是一幅图像中明暗区域最亮的白和最暗的黑之间不同亮度层级的测量,即指一幅图像灰度反差的大小。这样,识别出的第一图像块,能从周围环境中凸显,减少位置上的歧义。例如,实时图像为一个脸部图像,第一图像块为脸部的鼻尖、眼角等。
例如,可以使用尺度不变特征变换(SIFT)、加速鲁棒特征(SURF)识别算法、加速分段测试的特征识别(FAST)等方法。这些方法检测的准确性和速度各有不同。在实际应用时,可以根据终端设备的硬件能力在处理复杂度和时间效率之间进行折中选择。
在其他实施例中,也可以根据单个判断结果确定局部特征。例如,若判断出该局部特征的图像对比度高于预设的对比度阈值,则将该局部特征确定为第一图像块。或者,该局部特征并非图像的边缘,则将该局部特征确定为第一图像块。这里,局部特征的识别准确度将会影响后续的匹配和确定出的姿态结果。
步骤203,根据训练模型参数,通过卷积神经网络确定与该第一图像块相匹配的标签图像块。
终端设备将每个第一图像块输入至卷积神经网络中,卷积神经网络基于训练模型参数,输出每个第一图像块相匹配的标签图像块。其中,标签图像块是与第一图像块相匹配的标准图像的局部图像。
可选的,训练模型参数包括权值和从标准图像中识别出来的第二图像块,第二图像块是标准图像的局部图像。卷积神经网络包括多个卷积层,权值是指每个卷积层所使用的卷积矩阵中的各个元素值。
此步骤中,匹配的方法包括但不限于如下步骤:
步骤2031,将该第一图像块输入卷积神经网络,基于权值输出该第一图像 块与每个第二图像块相匹配的概率。
卷积神经网络能够对第一图像块进行分类,每个第二图像块代表了类别标签,通过权值对第一图像块进行处理,输出的结果是第一图像块与每个第二图像块相匹配的概率。这个概率数值代表了第一图像块和第二图像块的相似度。
步骤2032,将最大概率值所对应的第二图像块确定为标签图像块。
例如,在客户端和服务器侧预先设置目标物体的标识,训练模型参数中包括该标识。那么,当客户端接收到该训练模型参数后,获知上述标识。在执行步骤202时,根据获取到的实时图像或者终端的当前定位信息,判断出该实时图像对应了哪个目标物体,那么根据该目标物体的标识就能获知在执行步骤203时使用哪个训练模型参数进行匹配。
步骤204,根据每个第一图像块和每个第一图像块相匹配的标签图像块,确定目标物体的姿态。
可选的，目标物体的姿态由仿射变换来表示，也就是说，每个标签图像块经由仿射变换与第一图像块相匹配。其中，仿射变换可以由仿射变换矩阵的形式来表示，由每个第一图像块和其相匹配的标签图像块之间对应的仿射变换矩阵构成仿射变换矩阵集合。若第一图像块为q_i，i=1,…,N，N为第一图像块的总数，与q_i匹配的标签图像块为p_i，仿射变换由矩阵A来表示，那么有：
q_i = A p_i        (1)
仿射变换能够体现出目标物体相对于摄像镜头的平移和旋转量，可以描述3D空间中的目标物体到2D平面图像的成像过程。仿射变换属于线性变换，即具有将平行线变换成平行线、有限点映射到有限点的一般特性。二维欧氏空间上的仿射变换可以表示为：
\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} a_0 \\ a_5 \end{pmatrix}        (2)
其中，(x,y)和(x′,y′)分别是指标准图像和实时图像中两个点（即像素）的坐标，矩阵 \begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix} 为旋转、伸缩、切变的合成变换的矩阵表示，(a_0, a_5)^T为平移矢量，a_i均为实数。其中，6个参数组成的向量a=(a_0, a_1, a_2, a_3, a_4, a_5)^T代表了仿射变换，决定了两个点之间的坐标转换关系，包括三维旋转和平移。
可见,仿射变换具有6个自由度,根据仿射变换估计出的姿态也常称为6D姿态。根据向量中参数的具体数值,平移、旋转、缩放、反射和剪切等都是仿射变换的一种情况。
在确定目标物体的姿态时，可根据最小二乘原则从仿射变换矩阵集合中确定出仿射变换矩阵集合的矩阵估计值，其中，矩阵估计值是仿射变换矩阵集合对应的逆变换的幅角。例如，矩阵估计值 \hat{A} 可以通过以下公式计算：
\hat{A} = \arg\min_{A \in G} \sum_{i=1}^{N} \left\| q_i - A\,p_i \right\|        (3)
其中，||·||表示·的模值的平方，G为仿射变换矩阵集合。
确定出由 \hat{A} 表示的姿态后，任何希望添加于实时图像中的虚拟内容都可以由 \hat{A} 进行变换，与实时图像保持一致的观测角度，从而实现了在实时图像中增加虚拟内容，为用户展示增强现实后的混合图像效果。
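按照式(3)的最小二乘原则，一个常见的做法是用匹配图像块的中心坐标建立线性方程并求解 2×3 仿射矩阵。下面是一个基于 NumPy 的示意草图（输入的数据形式、函数名均为假设）：

```python
import numpy as np

def estimate_affine_lstsq(p_points, q_points):
    """根据匹配点对 (p_i -> q_i) 用最小二乘估计 2×3 仿射变换矩阵（示意实现）。

    p_points, q_points: 形状为 (N, 2) 的数组，分别是标准图像和实时图像中
    匹配图像块中心的坐标，N >= 3。
    """
    p = np.asarray(p_points, dtype=np.float64)
    q = np.asarray(q_points, dtype=np.float64)
    ones = np.ones((p.shape[0], 1))
    P = np.hstack([p, ones])                 # (N, 3)，齐次坐标
    # 求使 ||q - P @ A^T||^2 最小的 A，对应式(3)的最小二乘原则
    A_T, residuals, rank, _ = np.linalg.lstsq(P, q, rcond=None)
    return A_T.T                             # 2×3 仿射矩阵 [线性部分 | 平移]
```

在实际工程中，也可以直接使用 OpenCV 的 cv2.estimateAffine2D 完成类似的估计，并借助其内置的 RANSAC 进一步剔除误匹配。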
本实施例中,通过从服务器接收已训练完的卷积神经网络训练模型参数,接收用户拍摄目标物体得到的实时图像,从实时图像中识别出至少一个第一图像块,将图像块作为卷积神经网络的输入,好处在于相比整幅图像,这种图像小块抗变换能力强,尤其是平移变换;并且,不需要做分割或者其它任何预先的图像语义解释。
然后,针对每个第一图像块,根据训练模型参数确定与该第一图像块相匹配的标签图像块,根据各个第一图像块和各自匹配的标签图像块,确定目标物体的姿态,根据姿态在实时图像中增加虚拟内容。使用卷积神经网络用于姿态确定的好处在于,这种网络中权值数据在多个连接中可以共享,使得上述方法的计算复杂度低,时间效率高,占用内存资源少,尤其适用于资源受限设备上应用增强现实服务,例如,电池能力受限的移动终端、可穿戴式设备等。
图5为本申请另一个实施例中图像中物体姿态的确定方法的流程示意图。如图5所示,该方法包括但不限于如下步骤:
步骤501,从服务器接收并存储已训练完的卷积神经网络的训练模型参数。
服务器针对某个特定场景下的目标物体进行离线训练,训练完毕后,将训练模型参数发送给客户端进行存储,然后客户端在实时监测时调用该训练模型参数。
步骤502,获取目标物体的实时图像。
例如，实时图像可以是用户拍摄的静态图片或者视频中的一帧图像。当接收到的是视频流时，每隔固定间隔从视频流中抽取出一帧图像作为待处理的实时图像。例如，视频流每秒包括24帧图像，可以每隔一秒从中抽取出一帧图像。
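下面的 OpenCV 草图示意了按固定间隔从视频流中抽帧的做法，其中帧率按每秒24帧、间隔按1秒仅作为示例假设：

```python
import cv2

def sample_frames(video_path, fps=24, interval_sec=1.0):
    """每隔 interval_sec 秒从视频中抽取一帧作为待处理的实时图像（示意实现）。"""
    cap = cv2.VideoCapture(video_path)
    step = max(1, int(fps * interval_sec))   # 每隔多少帧取一帧
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)
        index += 1
    cap.release()
    return frames
```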
步骤503,从实时图像中识别出至少一个第一图像块,将每个第一图像块输入卷积神经网络。
步骤504,对于每个第一图像块,基于权值输出该第一图像块与每个第二图像块相匹配的概率,将最大概率值所对应的第二图像块确定为标签图像块。
参见上述步骤202、203中的描述,此处不再赘述。
步骤505,根据各个第一图像块和各自匹配的标签图像块,确定出仿射变换的矩阵估计值来表征目标物体的几何姿态。
本步骤中，第一图像块和与其匹配的标签图像块组成一个匹配对，即(q_i, p_i)。在确定姿态之前，可以进一步包括对匹配对的取舍。对于每个第一图像块，包括但不限于如下步骤：
步骤5051,将该第一图像块输入卷积神经网络,基于权值输出该第一图像块与每个第二图像块相匹配的概率。
例如,若第二图像块的总数为M,卷积神经网络的输出层输出一个1×M维的分类向量,向量中的元素取值为[0,1],代表了上述概率。
步骤5052,若概率大于预设概率阈值的第二图像块的总数大于预设个数阈值,则将该第一图像块和与其匹配的标签图像块用于确定目标物体的姿态。
例如,预设概率阈值为0.6,预设个数阈值为200,则若数值大于0.6的元素个数大于200个,则保留该匹配对,用于姿态确定。这样选择出来的匹配对能够服从大多数的姿态。
其中，将概率大于预设概率阈值的第二图像块总数大于预设个数阈值的第一图像块称为目标图像块，终端设备根据目标图像块，以及目标图像块对应的标签图像块确定目标物体的姿态。
例如,也可以使用随机抽样一致性的策略,来滤除误匹配对。
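步骤5051至5052中对匹配对的取舍可以用如下草图示意：统计概率向量中大于概率阈值（示例取0.6）的元素个数，仅当该个数大于个数阈值（示例取200）时保留该匹配对；随机抽样一致性等策略可以在此基础上进一步滤除误匹配。输入数据的组织方式为假设。

```python
import numpy as np

def select_reliable_pairs(prob_vectors, first_patches, label_patches,
                          prob_threshold=0.6, count_threshold=200):
    """按概率阈值与个数阈值筛选可用于姿态确定的匹配对（示意实现）。

    prob_vectors: 形状 (N, M)，第 i 行是第 i 个第一图像块与 M 个
                  第二图像块匹配的概率向量。
    """
    kept = []
    for i, probs in enumerate(np.asarray(prob_vectors)):
        # 概率大于阈值的第二图像块个数超过个数阈值时，保留该匹配对
        if np.count_nonzero(probs > prob_threshold) > count_threshold:
            kept.append((first_patches[i], label_patches[i]))
    return kept
```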
步骤506,根据矩阵估计值将虚拟内容添加在实时图像中。
通过标准图像确定出仿射变换的矩阵估计值后,可以再执行逆过程,将虚拟内容通过仿射变换再转换到实时图像的参照系中,从而可以将二者叠加在一起,实现增强现实的功能。
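步骤506的逆过程可以用如下草图示意：用估计出的 2×3 仿射矩阵把虚拟内容图像变换到实时图像的参照系中，再做简单的加权叠加（假设两幅图像均为3通道，alpha 叠加方式仅为示例）。

```python
import cv2
import numpy as np

def overlay_virtual_content(live_image, virtual_image, affine_2x3, alpha=0.7):
    """按矩阵估计值将虚拟内容变换到实时图像坐标系并叠加（示意实现）。"""
    h, w = live_image.shape[:2]
    warped = cv2.warpAffine(virtual_image, affine_2x3.astype(np.float64), (w, h))
    mask = warped.sum(axis=2, keepdims=True) > 0          # 虚拟内容的有效区域
    blended = np.where(mask,
                       alpha * warped + (1 - alpha) * live_image,
                       live_image)
    return blended.astype(np.uint8)
```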
在上述实施例中，通过对匹配对(第一图像块，标签图像块)的取舍，由公式(3)可见，N的有效数值减少，因此，降低了计算的复杂度，同时还能提高姿态确定的准确性。此外，通过仿射变换的矩阵估计值来表征目标物体的几何姿态，处理简单，易于计算，进一步提高了算法的时间效率。
图6为本申请一个实施例中图像中物体姿态的确定方法的流程示意图。该方法可以应用于图1中的服务器103。该方法包括但不限于以下步骤。
步骤601,获取目标物体的标准图像以及目标物体的多张畸变图像。
服务器侧执行离线训练时,首先需要获取大量的训练样本。其中,标准图像是必需的,用于确定分类时使用的多个标签图像块。而畸变图像的获取方式可以有多种,例如,使用摄像装置针对同一目标物体随机拍摄获得多个畸变图像,或者,从标准图像进行各类失真处理获得多个畸变图像。对于后者,在一实施例中,图像的失真也可以通过仿射变换引入。根据标准图像获取畸变图像的方法包括但不限于如下步骤:
步骤6011,随机产生多个仿射变换矩阵。
定义矩阵A表示仿射变换，按照式(4)随机产生多个仿射变换矩阵，即由参数φ、θ、t_x、f_x、t_y、f_y、λ_1和λ_2随机构造矩阵A。其中，参数φ和θ是从(-π,π]中均匀产生，参数t_x和f_x是从[0,w]中均匀生成，w为标准图像的宽度，参数t_y和f_y是从[0,h]中均匀生成，h为标准图像的高度，参数λ_1和λ_2是从[0.5,1.5]中均匀生成。
步骤6012,针对每个仿射变换矩阵,使用该仿射变换矩阵对标准图像进行仿射变换,得到一张畸变图像。
执行变换的表达式如下：
I′ = A(I) + N        (5)
其中，I为输入的标准图像，I′为生成的畸变图像，N为高斯白噪声，其均值为μ、方差为σ，μ在
0 ≤ μ ≤ 5        (6)
范围内取值，且
σ = 0.3×(μ/2-1) + 0.8        (7)
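离线生成畸变图像的过程可以用如下草图示意。其中随机仿射矩阵采用“旋转、缩放、再旋转”的常见分解方式构造，并把 (f_x, f_y) 作为变换中心、(t_x, t_y) 作为目标位置，这只是对式(4)的一种假设性实现；噪声部分按式(5)至(7)添加，且此处将 σ 作为标准差使用（属于实现上的假设）。

```python
import cv2
import numpy as np

def random_affine(w, h):
    """按文中参数范围随机构造一个 2×3 仿射矩阵（对式(4)的假设性实现）。"""
    phi = np.random.uniform(-np.pi, np.pi)
    theta = np.random.uniform(-np.pi, np.pi)
    lam1, lam2 = np.random.uniform(0.5, 1.5, size=2)
    tx, fx = np.random.uniform(0, w, size=2)
    ty, fy = np.random.uniform(0, h, size=2)

    def rot(a):
        return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

    # 线性部分：R_theta * R_{-phi} * diag(lam1, lam2) * R_phi（假设的分解形式）
    L = rot(theta) @ rot(-phi) @ np.diag([lam1, lam2]) @ rot(phi)
    # 平移部分：把 (fx, fy) 映射到 (tx, ty)（同为假设）
    t = np.array([tx, ty]) - L @ np.array([fx, fy])
    return np.hstack([L, t[:, None]])        # 2×3 矩阵

def make_distorted_image(standard_image):
    """由标准图像生成一张畸变图像：仿射变换 + 高斯白噪声，见式(5)至(7)。"""
    h, w = standard_image.shape[:2]
    A = random_affine(w, h)
    warped = cv2.warpAffine(standard_image, A, (w, h))
    mu = np.random.uniform(0, 5)             # 0 <= mu <= 5
    sigma = 0.3 * (mu / 2 - 1) + 0.8         # 式(7)
    noise = np.random.normal(mu, sigma, warped.shape)
    return np.clip(warped.astype(np.float64) + noise, 0, 255).astype(np.uint8)
```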
步骤602,将标准图像和多张畸变图像输入到卷积神经网络进行训练,获得训练模型参数。
步骤603,将训练模型参数发送给客户端。
这样,终端设备通过客户端接收用户拍摄目标物体得到的实时图像,从实时图像中识别出至少一个第一图像块;针对每个第一图像块,根据训练模型参数确定与该第一图像块相匹配的标签图像块;及,根据各个第一图像块和各自匹配的标签图像块,确定目标物体的姿态,根据姿态在实时图像中增加虚拟内容。
在上述步骤602中,服务器构建卷积神经网络,然后进行训练。卷积神经网络通过卷积操作进行特征提取,然后进行特征映射。卷积神经网络的每个计算层由多个特征映射组成,每个特征映射是一个平面,平面上所有神经元的权值相等,因而可以减少网络自由参数的个数。
图7为本申请一个实施例中卷积神经网络的结构示意图。如图7所示,卷积神经网络包括多层处理,分别为:
701卷积层:通过一个卷积矩阵作为过滤器,当过滤器卷积输入的图像块700时,把过滤器里面的权值和图像块里对应的像素值相乘,把所有结果加和,得到一个加和值。然后重复这个过程,从左到右、从上到下卷积图像块的每一个区域,每一步都可以得到一个值,最后的矩阵为特征图像。
702池化层:池化层通常用在卷积层之后,其作用就是简化卷积层里输出的信息,减少数据维度,降低计算开销,控制过拟合。
例如，卷积后的特征图像具有一种“静态性”的属性，这表明在一个图像区域有用的特征极有可能在另一个区域同样适用。因此，为了描述一幅大的图像，对不同位置的特征进行聚合统计，即池化过程。例如，计算图像一个区域上的某个特定特征的平均值或最大值。相比使用所有提取得到的特征，这些统计特征不仅具有低得多的维度，同时还会改善结果，不容易过拟合。
703全连接层:检测获取到的这些特征图像与哪种类别更相近。这里的类别即由M个第二图像块代表的各种可能标签。
704输出层:输出为1×M维的分类向量,向量中的元素取值为[0,1],输出的每一维都是指该图像块属于该类别的概率。
在实际应用中，通常使用多层卷积，然后再使用全连接层进行训练。即在图7中，将701卷积层和702池化层作为一个组合，将依次执行多个该组合，这种网络被称为深度卷积神经网络。多层卷积的目的是考虑到一层卷积学到的特征往往是局部的，层数越高，学到的特征就越全局化。
当卷积神经网络包括多个卷积层时,确定卷积层的个数的方法,包括但不限于如下步骤:预设图像块个数与卷积层个数的对应关系;从标准图像中识别出至少一个第二图像块;根据第二图像块的个数和对应关系确定卷积神经网络中卷积层的个数。
例如，表1给出的实施例中，第二图像块的总数为400，整个网络包括了13层。其中，有4个卷积层，第1、4、7、10层是卷积层；在第1层卷积层之后紧跟着最大池化层和线性整流函数（英文：Rectified Linear Unit，ReLU）激励层，在第4层卷积层之后紧跟着ReLU激励层和平均池化层，在第7层卷积层之后紧跟着ReLU激励层和平均池化层，在第10层卷积层之后紧跟着ReLU激励层，最后是全连接层和软最大值（英文：soft-max）输出层。
层数 类型 输入矩阵 输出矩阵
1 卷积 27×27 32×27×27
2 最大池化 32×27×27 32×14×14
3 ReLU 32×14×14 32×14×14
4 卷积 32×14×14 32×14×14
5 ReLU 32×14×14 32×14×14
6 平均池化 32×14×14 32×7×7
7 卷积 32×7×7 64×7×7
8 ReLU 64×7×7 64×7×7
9 平均池化 64×7×7 64×4×4
10 卷积 64×4×4 64×1×1
11 ReLU 64×1×1 64×1×1
12 全连接 64×1×1 1×400
13 Soft-max输出 1×400 1×400
表1深度卷积神经网络的结构
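按照表1的层次结构，可以写出如下的 PyTorch 示意网络定义。输入假设为单通道 27×27 的图像块、类别数为400；卷积核尺寸、填充方式以及池化的取整方式等表1未给出的细节均为推测，仅用于说明结构。

```python
import torch
import torch.nn as nn

class PatchClassifier(nn.Module):
    """按表1结构搭建的示意性卷积神经网络（细节为假设）。"""

    def __init__(self, num_classes=400):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),     # 1: 卷积 27×27 -> 32×27×27
            nn.MaxPool2d(2, stride=2, ceil_mode=True),      # 2: 最大池化 -> 32×14×14
            nn.ReLU(inplace=True),                          # 3: ReLU
            nn.Conv2d(32, 32, kernel_size=3, padding=1),    # 4: 卷积 -> 32×14×14
            nn.ReLU(inplace=True),                          # 5: ReLU
            nn.AvgPool2d(2, stride=2),                      # 6: 平均池化 -> 32×7×7
            nn.Conv2d(32, 64, kernel_size=3, padding=1),    # 7: 卷积 -> 64×7×7
            nn.ReLU(inplace=True),                          # 8: ReLU
            nn.AvgPool2d(2, stride=2, ceil_mode=True),      # 9: 平均池化 -> 64×4×4
            nn.Conv2d(64, 64, kernel_size=4),               # 10: 卷积 -> 64×1×1
            nn.ReLU(inplace=True),                          # 11: ReLU
        )
        self.classifier = nn.Linear(64, num_classes)        # 12: 全连接 -> 1×400

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        logits = self.classifier(x)
        return torch.softmax(logits, dim=1)                 # 13: soft-max 输出
```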
其中，激励层中将调用一种激励函数来加入非线性因素，以解决线性不可分的问题。如表1所示，选择的激励函数方式叫做ReLU，其表达式为：
f(x)=max(0,x)    (8)
即把小于零的值都归为0,这样,卷积神经网络训练的速度会更快,减少梯度消失的问题出现。
此外,卷积神经网络在训练的过程中也需要确定输入样本和理想的输出样本,然后迭代进行权值的调整。在一实施例中,从标准图像中识别出至少一个第二图像块;分别对每张畸变图像进行识别,得到至少一个第三图像块;在卷积神经网络进行训练时,将第三图像块作为输入样本,将各个第二图像块作为理想的输出样本,训练得到权值。
卷积神经网络训练时,通过反向传播算法来调整权值。反向传播算法可以分成4个不同的部分:向前传递,损失函数,反向传递,更新权重。
向前传播过程中，输入图像块，通过卷积神经网络传递它。起初，所有的权值都被随机初始化，如随机值[0.3,0.1,0.4,0.2,0.3…]。由于卷积神经网络通过初始化的权值无法提取准确的特征图像，因此无法就图像块属于哪种类别给出任何合理的结论。此时，通过反向传播中的损失函数来帮助卷积神经网络更新权值，找到想要的特征图像。损失函数的定义方式有很多种，例如，均方误差（英文：mean squared error，MSE）。在卷积神经网络刚开始训练的时候，由于权值都是随机初始化出来的，这个损失值可能会很高。而训练的目的是希望预测值和真实值一样，为此，需要尽量减少损失值，损失值越小就说明预测结果越接近真实值。在这个过程中，将不断地调整权值，来寻找出哪些权值能使网络的损失减小，例如，采用梯度下降算法。
每次训练,将会完成多次的前向传递、损失函数、反向传递和参数更新的过程。当训练结束后,就得到了训练出来的一些权值。
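与上述训练过程对应的一个 PyTorch 示意训练循环如下：以第三图像块为输入样本、其对应的第二图像块编号为类别标签，损失采用文中提到的均方误差（对 one-hot 理想输出计算），并用梯度下降迭代更新权值。数据加载方式、学习率、迭代轮数等均为假设。

```python
import torch
import torch.nn as nn

def train(model, loader, num_classes=400, epochs=10, lr=1e-3):
    """示意性训练循环：前向传递 -> 损失函数 -> 反向传递 -> 更新权重。

    loader 假设每次产出 (patches, labels)：patches 形状为 (B, 1, 27, 27)
    且已做均值0、方差1的预处理，labels 为对应第二图像块的编号。
    """
    criterion = nn.MSELoss()                                  # 文中提到的均方误差损失
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)    # 梯度下降
    model.train()
    for _ in range(epochs):
        for patches, labels in loader:
            targets = torch.zeros(labels.size(0), num_classes)
            targets.scatter_(1, labels.unsqueeze(1), 1.0)     # one-hot 的理想输出样本
            probs = model(patches)                            # 向前传递
            loss = criterion(probs, targets)                  # 损失函数
            optimizer.zero_grad()
            loss.backward()                                   # 反向传递
            optimizer.step()                                  # 更新权重
    return model
```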
根据本申请上述实施例给出的物体姿态确定方法，和相关技术中使用随机蕨（英文：Random Ferns）方法确定姿态相比，表2给出了两种方法在准确率和占用内存方面的数值对比。
首先，实验数据是这样设置的：本申请实施例给出的方法中，使用表1给出的卷积神经网络架构，图像块的大小为27×27，共有27行27列个像素，对该图像块进行预处理，使其均值为0，方差为1。离线训练时根据公式(4)随机产生了2000个仿射变换矩阵，用于生成畸变图像。第二图像块的个数为400，输出向量为1×400维的分类向量。Ferns方法中Fern的个数为30，每个Fern中特征的个数为12。
如表2所示,对于图3a、图3b给出的图像,本申请实施例给出的方法的准确率为86%,而Ferns方法的准确率为88%;对于图4a、图4b给出的图像,本申请实施例给出的方法的准确率为87%,而Ferns方法的准确率为88%。可见,本申请实施例给出的方法与Ferns方法的准确率大致相同。但是就占用内存来看,本申请实施例给出的方法由于使用卷积神经网络,占用内存仅为0.5557M,而Ferns方法占用内存93.75M,可见,本申请实施例给出的方法具有很低的内存资源消耗。
方法 图3a/3b准确率 图4a/4b准确率 占用内存
本申请实施例的方法 86% 87% 0.5557M
Ferns方法 88% 88% 93.75M
表2实验数据对比
图8为本申请一个实施例中客户端800的结构示意图。如图8所示,客户端800可以是执行图2和图5实施例中图像中物体姿态的确定方法的虚拟装置,该装置包括:
离线接收模块810,用于从服务器获取目标物体的卷积神经网络的训练模型参数;
在线接收模块820,用于获取目标物体的实时图像;
识别模块830,用于从实时图像中识别出至少一个第一图像块;
匹配模块840,用于根据训练模型参数,通过卷积神经网络确定与每个第一图像块相匹配的标签图像块;
姿态确定模块850,用于根据每个第一图像块和每个第一图像块相匹配的标签图像块,确定目标物体的姿态。
增加模块860,用于根据该姿态在实时图像中增加虚拟内容。其中,增加模块860是可选的模块。
在一个可选的实施例中,识别模块830包括:
检测单元831,用于对实时图像进行特征检测,获取多个局部特征;
判断单元832,用于将多个局部特征中图像对比度高于预设的对比度阈值,且并非图像的边缘的局部特征确定为第一图像块。
在一个可选的实施例中,训练模型参数包括权值和从标准图像中识别出来的第二图像块,匹配模块840还用于,将每个第一图像块输入卷积神经网络,基于权值输出每个第一图像块与每个第二图像块相匹配的概率;获取每个第一图像块对应的概率中大于概率阈值的数量;将数量大于预设个数的第一图像块确定为目标图像块;根据目标图像块和与目标图像块相匹配的标签图像块,确定姿态。
在一个可选的实施例中,匹配模块840还用于,获取目标图像块与每个第二图像块相匹配的概率;将概率中最大的概率对应的第二图像块确定为目标图像块相匹配的标签图像块;根据目标图像块和目标图像块相匹配的标签图像块,确定姿态。
在一个可选的实施例中,每个第一图像块是每个第一图像块相匹配的标签图像块使用仿射变换矩阵进行仿射变换得到的,每个仿射变换矩阵构成仿射变换矩阵集合;
姿态确定模块850还用于,根据最小二乘原则从仿射变换矩阵集合中确定出仿射变换矩阵集合的矩阵估计值。
在一个可选的实施例中，姿态确定模块850还用于通过以下公式计算矩阵估计值：
\hat{A} = \arg\min_{A \in G} \sum_{i=1}^{N} \left\| q_i - A\,p_i \right\|
其中，\hat{A}为矩阵估计值，q_i为第一图像块，i=1,…,N，N为第一图像块的总数，p_i为与q_i匹配的标签图像块，A为仿射变换矩阵，||·||表示·的模值的平方，G为仿射变换矩阵集合。
图9为本申请另一个实施例中客户端900的结构示意图，该客户端900可以是图1中所示的终端设备102。如图9所示，客户端900包括：处理器910、存储器920、端口930以及总线940。处理器910和存储器920通过总线940互联。处理器910可通过端口930接收和发送数据。其中，
处理器910用于执行存储器920存储的机器可读指令模块。
存储器920存储有处理器910可执行的机器可读指令模块。处理器910可执行的指令模块包括：离线接收模块921、在线接收模块922、识别模块923、匹配模块924、姿态确定模块925和增加模块926。其中，
离线接收模块921被处理器910执行时可以为:从服务器获取目标物体的卷积神经网络的训练模型参数;
在线接收模块922被处理器910执行时可以为:获取目标物体的实时图像;
识别模块923被处理器910执行时可以为:从实时图像中识别出至少一个第一图像块;
匹配模块924被处理器910执行时可以为:根据训练模型参数,通过卷积神经网络确定与每个第一图像块相匹配的标签图像块;
姿态确定模块925被处理器910执行时可以为:根据每个第一图像块和每个第一图像块相匹配的标签图像块,确定目标物体的姿态;
增加模块926被处理器910执行时可以为:根据姿态在实时图像中增加虚拟内容。其中,增加模块926为可选的模块。
由此可以看出,当存储在存储器920中的指令模块被处理器910执行时,可实现前述各个实施例中离线接收模块、在线接收模块、识别模块、匹配模块、姿态确定模块和增加模块的各种功能。
图10为本申请一个实施例中服务器1000的结构示意图。如图10所示,服务器1000包括执行图6实施例中图像中物体姿态的确定方法的虚拟装置,该装置包括:
获取模块1010,用于获取目标物体的标准图像以及目标物体的多张畸变图像;
训练模块1020,用于将标准图像和多张畸变图像输入到卷积神经网络进行训练,获得卷积神经网络的训练模型参数;
发送模块1030,用于将训练模型参数发送给客户端,以使终端设备通过客户端获取目标物体的实时图像,从实时图像中识别出至少一个第一图像块;根据训练模型参数,通过卷积神经网络确定与每个第一图像块相匹配的标签图像块;根据每个第一图像块和每个第一图像块相匹配的标签图像块,确定目标物体的姿态。
在一个可选的实施例中,获取模块1010还用于,随机产生多个仿射变换矩阵;使用每个仿射变换矩阵对标准图像进行仿射变换,得到每张畸变图像。
在一个可选的实施例中，卷积神经网络包括多个卷积层，训练模块1020还用于，从标准图像中识别出至少一个第二图像块；根据第二图像块的个数，以及预设的第二图像块与卷积层个数的对应关系，确定卷积神经网络中卷积层的个数。
在一个可选的实施例中，训练模块1020还用于，从标准图像中识别出至少一个第二图像块；分别对每张畸变图像进行识别，得到至少一个第三图像块；在卷积神经网络进行训练时，将第三图像块作为输入样本，将第二图像块作为理想的输出样本，训练得到权值。
图11为本申请另一个实施例中服务器1100的结构示意图。如图11所示,服务器1100包括:处理器1110、存储器1120、端口1130以及总线1140。处理器1110和存储器1120通过总线1140互联。处理器1110可通过端口1130接收和发送数据。其中,
处理器1110用于执行存储器1120存储的机器可读指令模块。
存储器1120存储有处理器1110可执行的机器可读指令模块。处理器1110可执行的指令模块包括:获取模块1121、训练模块1122和发送模块1123。其中,
获取模块1121被处理器1110执行时可以为:获取目标物体的标准图像以及多张畸变图像;
训练模块1122被处理器1110执行时可以为:将标准图像和多张畸变图像输入到卷积神经网络进行训练,获得训练模型参数;
发送模块1123被处理器1110执行时可以为:将训练模型参数发送给客户端,以使终端设备通过客户端获取目标物体的实时图像,从实时图像中识别出至少一个第一图像块;根据训练模型参数,通过卷积神经网络确定与每个第一图像块相匹配的标签图像块;根据每个第一图像块和每个第一图像块相匹配的标签图像块,确定目标物体的姿态。
由此可以看出,当存储在存储器1120中的指令模块被处理器1110执行时,可实现前述各个实施例中获取模块、训练模块和发送模块的各种功能。
上述装置实施例中,各个模块及单元实现自身功能的示例性的方法在方法实施例中均有描述,这里不再赘述。
另外，在本申请各个实施例中的各功能模块可以集成在一个处理单元中，也可以是各个模块单独物理存在，也可以两个或两个以上模块集成在一个单元中。上述集成的单元既可以采用硬件的形式实现，也可以采用软件功能单元的形式实现。
另外，本申请的每一个实施例可以通过由数据处理设备如计算机执行的数据处理程序来实现。显然，数据处理程序构成了本申请。此外，通常存储在一个存储介质中的数据处理程序通过直接将程序读取出存储介质或者通过将程序安装或复制到数据处理设备的存储设备（如硬盘和/或内存）中执行。因此，这样的存储介质也构成了本申请。存储介质可以使用任何类别的记录方式，例如纸张存储介质（如纸带等）、磁存储介质（如软盘、硬盘、闪存等）、光存储介质（如CD-ROM等）、磁光存储介质（如MO等）等。
因此,本申请还公开了一种存储介质,其中存储有至少一段数据处理程序,该数据处理程序用于执行本申请上述方法的任何一种实施例。可选的,该存储介质中有至少一条指令、代码集或指令集,该至少一条指令、代码集或指令集由处理器加载并执行以实现本申请上述方法的任何一种实施例。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
以上所述仅为本申请的较佳实施例而已,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (25)

  1. 一种图像中物体姿态的确定方法,其特征在于,所述方法应用于终端设备中,所述方法包括:
    从服务器获取目标物体的卷积神经网络的训练模型参数;
    获取所述目标物体的实时图像,从所述实时图像中识别出至少一个第一图像块,所述第一图像块是所述实时图像的局部图像;
    根据所述训练模型参数,通过所述卷积神经网络确定与每个所述第一图像块相匹配的标签图像块,所述标签图像块是所述目标物体的标准图像的局部图像;
    根据所述每个第一图像块和所述每个第一图像块相匹配的标签图像块,确定所述目标物体的姿态。
  2. 根据权利要求1所述的方法,其特征在于,所述从所述实时图像中识别出至少一个第一图像块,包括:
    对所述实时图像进行特征检测,获取多个局部特征;
    将所述多个局部特征中图像对比度高于预设的对比度阈值,且并非图像的边缘的局部特征确定为所述第一图像块。
  3. 根据权利要求1所述的方法,其特征在于,所述训练模型参数包括权值和从所述标准图像中识别出来的第二图像块,所述第二图像块是所述标准图像的局部图像,所述根据每个所述第一图像块和所述第一图像块相匹配的标签图像块,确定所述目标物体的姿态,包括:
    将每个所述第一图像块输入所述卷积神经网络,基于所述权值输出所述每个第一图像块与每个所述第二图像块相匹配的概率;
    获取所述每个第一图像块对应的概率中大于概率阈值的数量;
    将所述数量大于预设个数的第一图像块确定为目标图像块;
    根据所述目标图像块和所述目标图像块相匹配的标签图像块,确定所述姿态。
  4. 根据权利要求3所述的方法，其特征在于，所述根据所述目标图像块和所述目标图像块相匹配的标签图像块，确定所述姿态，包括：
    获取所述目标图像块与所述每个第二图像块相匹配的概率;
    将所述概率中最大的概率对应的第二图像块确定为所述目标图像块的标签图像块;
    根据所述目标图像块和所述目标图像块相匹配的标签图像块,确定所述姿态。
  5. 根据权利要求1至4任一所述的方法,其特征在于,所述每个第一图像块是所述每个第一图像块相匹配的标签图像块使用仿射变换矩阵进行仿射变换得到的,每个所述仿射变换矩阵构成仿射变换矩阵集合;
    所述根据所述每个第一图像块和所述每个第一图像块相匹配的标签图像块,确定所述目标物体的姿态,包括:
    根据最小二乘原则从所述仿射变换矩阵集合中确定出所述仿射变换矩阵集合的矩阵估计值,所述矩阵估计值是所述仿射变换矩阵对应的逆变换的幅角。
  6. 根据权利要求5所述的方法,其特征在于,所述根据最小二乘原则从所述仿射变换矩阵集合中确定出所述仿射变换的矩阵估计值,包括:
    通过以下公式计算所述矩阵估计值:
    \hat{A} = \arg\min_{A \in G} \sum_{i=1}^{N} \left\| q_i - A\,p_i \right\|
    其中，\hat{A}为所述矩阵估计值，q_i为所述第一图像块，i=1,…,N，N为所述第一图像块的总数，p_i为与q_i匹配的标签图像块，A为所述仿射变换矩阵，||·||表示·的模值的平方，G为所述仿射变换矩阵集合。
  7. 根据权利要求1至4任一所述的方法,其特征在于,所述方法还包括:
    根据所述姿态在所述实时图像中增加并显示虚拟内容。
  8. 一种图像中物体姿态的确定方法,其特征在于,所述方法应用于服务器中,所述方法包括:
    获取目标物体的标准图像以及多张所述目标物体的畸变图像;
    将所述标准图像和所述多张畸变图像输入到卷积神经网络进行训练，获得所述卷积神经网络的训练模型参数；
    将所述训练模型参数发送给终端设备,以使所述终端设备获取所述目标物体的实时图像,从所述实时图像中识别出至少一个第一图像块,所述第一图像块是所述实时图像的局部的图像;根据所述训练模型参数,通过所述卷积神经网络确定与每个所述第一图像块相匹配的标签图像块,所述标签图像块是所述标准图像的局部图像;根据所述每个第一图像块和所述每个第一图像块相匹配的标签图像块,确定所述目标物体的姿态。
  9. 根据权利要求8所述的方法,其特征在于,所述获取目标物体的标准图像以及多张所述目标物体的畸变图像,包括:
    随机产生多个仿射变换矩阵;
    使用每个所述仿射变换矩阵对所述标准图像进行仿射变换,得到每张所述畸变图像。
  10. 根据权利要求8所述的方法,其特征在于,所述卷积神经网络包括多个卷积层,所述方法还包括:
    从所述标准图像中识别出至少一个第二图像块,所述第二图像块是所述标准图像的局部图像;根据所述第二图像块的个数,以及预设的第二图像块与卷积层个数的对应关系,确定所述卷积神经网络中卷积层的个数。
  11. 根据权利要求8至10中任一所述的方法,其特征在于,所述将所述标准图像和所述多张畸变图像输入到所述卷积神经网络进行训练,包括:
    从所述标准图像中识别出至少一个第二图像块,所述第二图像块是所述标准图像的局部图像;
    分别对所述每张畸变图像进行识别,得到至少一个第三图像块,所述第三图像块是所述畸变图像的局部图像;
    在所述卷积神经网络进行训练时,将所述第三图像块作为输入样本,将所述第二图像块作为理想的输出样本,训练得到所述权值。
  12. 一种图像中物体姿态的确定装置,其特征在于,所述装置包括:
    离线接收模块，用于从服务器获取目标物体的卷积神经网络的训练模型参数；
    在线接收模块,用于获取所述目标物体的实时图像;
    识别模块,用于从所述实时图像中识别出至少一个第一图像块,所述第一图像块是所述实时图像的局部图像;
    匹配模块,用于根据所述训练模型参数,通过所述卷积神经网络确定与每个所述第一图像块相匹配的标签图像块,所述标签图像块是所述目标物体的标准图像的局部图像;
    姿态确定模块,用于根据所述每个第一图像块和所述每个第一图像块相匹配的标签图像块,确定所述目标物体的姿态。
  13. 根据权利要求12所述的装置,其特征在于,所述识别模块,还用于对所述实时图像进行特征检测,获取多个局部特征;将所述多个局部特征中图像对比度高于预设的对比度阈值,且并非图像的边缘的局部特征确定为所述第一图像块。
  14. 根据权利要求12所述的装置,其特征在于,所述训练模型参数包括权值和从所述标准图像中识别出来的第二图像块,所述第二图像块是所述标准图像的局部图像,所述匹配模块还用于,将每个所述第一图像块输入所述卷积神经网络,基于所述权值输出所述每个第一图像块与每个所述第二图像块相匹配的概率;获取所述每个第一图像块对应的概率中大于概率阈值的数量;将所述数量大于预设个数的第一图像块确定为目标图像块;根据所述目标图像块和与所述目标图像块相匹配的标签图像块,确定所述姿态。
  15. 根据权利要求14所述的装置,其特征在于,所述匹配模块还用于获取所述目标图像块与所述每个第二图像块相匹配的概率;将所述概率中最大的概率对应的第二图像块确定为所述目标图像块相匹配的标签图像块;根据所述目标图像块和所述目标图像块相匹配的标签图像块,确定所述姿态。
  16. 根据权利要求12至15任一所述的装置,其特征在于,所述每个第一图像块是所述每个第一图像块相匹配的标签图像块使用仿射变换矩阵进行仿射变换得到的,每个所述仿射变换矩阵构成仿射变换矩阵集合;
    所述姿态确定模块还用于,根据最小二乘原则从所述仿射变换矩阵集合中确定出所述仿射变换矩阵集合的矩阵估计值,所述矩阵估计值是所述仿射变换矩阵对应的逆变换的幅角。
  17. 根据权利要求16所述的装置,其特征在于,所述姿态确定模块还用于通过以下公式计算所述矩阵估计值:
    \hat{A} = \arg\min_{A \in G} \sum_{i=1}^{N} \left\| q_i - A\,p_i \right\|
    其中，\hat{A}为所述矩阵估计值，q_i为所述第一图像块，i=1,…,N，N为所述第一图像块的总数，p_i为与q_i匹配的标签图像块，A为所述仿射变换矩阵，||·||表示·的模值的平方，G为所述仿射变换矩阵集合。
  18. 根据权利要求12至15任一所述的装置,其特征在于,所述装置还包括增加模块;
    所述增加模块,用于根据所述姿态在所述实时图像中增加并显示虚拟内容。
  19. 一种图像中物体姿态的确定装置,其特征在于,所述装置包括:
    获取模块，用于获取目标物体的标准图像以及所述目标物体的多张畸变图像；
    训练模块,用于将所述标准图像和所述多张畸变图像输入到卷积神经网络进行训练,获得所述卷积神经网络的训练模型参数;
    发送模块,用于将所述训练模型参数发送给终端设备,以使所述终端设备获取所述目标物体的实时图像,从所述实时图像中识别出至少一个第一图像块,所述第一图像块是所述实时图像的局部的图像;根据所述训练模型参数,通过所述卷积神经网络确定与每个所述第一图像块相匹配的标签图像块,所述标签图像块是所述标准图像的局部图像;根据所述每个第一图像块和所述每个第一图像块相匹配的标签图像块,确定所述目标物体的姿态。
  20. 根据权利要求19所述的装置,其特征在于,所述获取模块还用于,随机产生多个仿射变换矩阵;使用每个所述仿射变换矩阵对所述标准图像进行仿射变换,得到每张所述畸变图像。
  21. 根据权利要求19所述的装置,其特征在于,所述卷积神经网络包括多个卷积层,所述训练模块,还用于从所述标准图像中识别出至少一个第二图像块,所述第二图像块是所述标准图像的局部图像;根据所述第二图像块的个数,以及预设的第二图像块与卷积层个数的对应关系,确定所述卷积神经网络中卷积层的个数。
  22. 根据权利要求19至21所述的装置,其特征在于,所述训练模块,还用于从所述标准图像中识别出至少一个第二图像块,所述第二图像块是所述标准图像的局部图像;分别对所述每张畸变图像进行识别,得到至少一个第三图像块,所述第三图像块是所述畸变图像的局部图像;在所述卷积神经网络进行训练时,将所述第三图像块作为输入样本,将所述第二图像块作为理想的输出样本,训练得到所述权值。
  23. 一种终端设备,其特征在于,所述终端设备包括处理器和存储器,所述存储器中存储有至少一条指令,所述指令由所述处理器加载并执行以实现如权利要求1至7任一所述的图像中物体姿态的确定方法。
  24. 一种服务器,其特征在于,所述服务器包括处理器和存储器,所述存储器中存储有至少一条指令,所述指令由所述处理器加载并执行以实现如权利要求8至11任一所述的图像中物体姿态的确定方法。
  25. 一种计算机可读存储介质，其特征在于，所述存储介质中存储有至少一条指令、至少一段程序、代码集或指令集，所述至少一条指令、所述至少一段程序、所述代码集或指令集由处理器加载并执行以实现如权利要求1至9任一所述的图像中物体姿态的确定方法。