WO2019011249A1 - Method, apparatus, device, and storage medium for determining an object pose in an image - Google Patents
Method, apparatus, device, and storage medium for determining an object pose in an image
- Publication number
- WO2019011249A1 (PCT/CN2018/095191)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- image block
- block
- determining
- target object
- Prior art date
Links
- method (title, claims, abstract, description; 92)
- training (claims, abstract, description; 99)
- convolutional neural network (claims, abstract, description; 77)
- transformation (claims, description; 74)
- matrix (claims, description; 50)
- detection (claims, description; 6)
- processing (abstract, description; 13)
- diagram (description; 16)
- augmented (description; 14)
- function (description; 12)
- pooling (description; 9)
- process (description; 8)
- excitation (description; 7)
- technology (description; 5)
- benefit (description; 3)
- imaging (description; 3)
- translation (description; 3)
- artificial neural network (description; 2)
- transmission (description; 2)
- mapping (description; 2)
- optical (description; 2)
- segmentation (description; 2)
- transfer (description; 2)
- communication, correction, development, effect, extraction, facial, measurement, modification, monitoring, neuron, normalization, sampling, static, substitution, testing, transforming, visual (description; 1 each)
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/17—Image acquisition using hand-held instruments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/20—Scenes; Scene-specific elements in augmented reality scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20021—Dividing image into blocks, subimages or windows
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20076—Probabilistic image processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Definitions
- the embodiments of the present invention relate to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for determining an object posture in an image.
- augmented reality technology organically integrates computer-generated virtual information, such as graphics and text, into the real scene seen by the user, enhancing or extending what the human visual system perceives.
- the basis for implementing augmented reality is the ability to obtain the observation angle of the real scene. For example, when an image of a real scene is acquired by a camera, the pose of a three-dimensional object must be estimated from the two-dimensional observation image, and virtual content is then added and displayed in the real scene according to that pose.
- a commonly used method is to detect hand-crafted features and then compare them between different images.
- such methods require additional steps such as accurate scale selection, rotation correction, and density normalization, which are computationally complex and time-consuming.
- when augmented reality technology is applied to a terminal device such as a mobile or wearable device, the above method is no longer suitable, because such devices have limited resources and limited input and computing capabilities.
- the embodiments of the present application provide a method, an apparatus, a device, and a storage medium for determining an object pose in an image, which can improve the time efficiency of image processing, consume fewer memory resources, and improve the resource utilization of the terminal device.
- the present application provides a method for determining an object pose in an image, the method being applied to a terminal device, the method comprising: acquiring, from a server, training model parameters of a convolutional neural network for a target object; acquiring a real-time image of the target object, and identifying at least one first image block from the real-time image, the first image block being a partial image of the real-time image; determining, by the convolutional neural network according to the training model parameters, a label image block matching each of the first image blocks, the label image block being a partial image of a standard image of the target object; and determining a pose of the target object according to each of the first image blocks and the respective matched label image blocks.
- the present application provides a method for determining an object pose in an image, the method being applied to a server, the method comprising: acquiring a standard image of a target object and a plurality of distorted images of the target object; inputting the standard image and the plurality of distorted images into a convolutional neural network for training to obtain training model parameters; and transmitting the training model parameters to the terminal device, so that the terminal device acquires a real-time image of the target object, identifies at least one first image block from the real-time image, the first image block being a partial image of the real-time image, determines, by the convolutional neural network according to the training model parameters, a label image block matching each of the first image blocks, the label image block being a partial image of the standard image, determines a pose of the target object according to each of the first image blocks and the respective matched label image blocks, and adds virtual content to the real-time image according to the pose.
- the present application provides a device for determining an attitude of an object in an image, the device comprising:
- An offline receiving module configured to acquire, from a server, a training model parameter of a convolutional neural network of a target object
- An online receiving module configured to acquire a real-time image of the target object
- An identification module configured to identify at least one first image block from the real-time image, the first image block being a partial image of the real-time image;
- a matching module configured to determine, according to the training model parameter, a label image block that matches each of the first image blocks by the convolutional neural network, the label image block being a standard image of the target object Partial image
- a pose determining module configured to determine the pose of the target object according to each first image block and the label image block matched with each first image block.
- the present application provides a device for determining an attitude of an object in an image, the device comprising:
- An acquisition module configured to acquire a standard image of the target object and a plurality of distortion images of the target object
- a training module configured to input the standard image and the multiple distortion images into a convolutional neural network for training, to obtain training model parameters
- a sending module configured to send the training model parameter to the terminal device, so that the terminal device acquires a real-time image of the target object, and identifies at least one first image block from the real-time image, the first An image block is a partial image of the real-time image; and according to the training model parameter, a label image block matching each of the first image blocks is determined by the convolutional neural network, the label image block is a partial image of the standard image; determining a pose of the target object according to the label image block that each of the first image block and the first image block matches.
- the present application provides a terminal device including a processor and a memory, the memory storing at least one instruction that is loaded and executed by the processor to implement the above method for determining an object pose in an image applied to a terminal device.
- the present application provides a server including a processor and a memory, the memory storing at least one instruction that is loaded and executed by the processor to implement the above method for determining an object pose in an image applied to a server.
- the present application provides a computer readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the method for determining an object pose in an image as described above.
- the method provided by the embodiments of the present application uses a convolutional neural network for offline training and then uses the trained model parameters when determining the object pose online, so that the computational complexity of image processing is greatly reduced, time efficiency is high, fewer memory resources are occupied, and the accuracy of the method is guaranteed.
- this makes the method especially suitable for augmented reality services on resource-constrained devices, improving the resource utilization of terminal devices.
- FIG. 1 is a schematic diagram of an implementation environment involved in an embodiment of the present application.
- FIG. 2 is a schematic flow chart of a method for determining an attitude of an object in an image according to an embodiment of the present application
- 3a is a schematic diagram of a standard image of a target object in an embodiment of the present application.
- FIG. 3b is a schematic diagram of a distortion image of a target object in an embodiment of the present application.
- 4a is a schematic diagram of a standard image of a target object in another embodiment of the present application.
- 4b is a schematic diagram of a distortion image of a target object in another embodiment of the present application.
- FIG. 5 is a schematic flow chart of a method for determining an attitude of an object in an image according to another embodiment of the present application.
- FIG. 6 is a schematic flow chart of a method for determining an attitude of an object in an image according to an embodiment of the present application
- FIG. 7 is a schematic structural diagram of a convolutional neural network according to an embodiment of the present application.
- FIG. 8 is a schematic structural diagram of a client in an embodiment of the present application.
- FIG. 9 is a schematic structural diagram of a client in another embodiment of the present application.
- FIG. 10 is a schematic structural diagram of a server according to an embodiment of the present application.
- FIG. 11 is a schematic structural diagram of a server in another embodiment of the present application.
- FIG. 1 is a schematic diagram of an augmented reality implementation environment according to an embodiment of the present application.
- a target object 101, a terminal device 102, and a server 103 are included in the augmented reality application system 100.
- the terminal device 102 is equipped with a camera device 1021 and a screen 1023, and runs an object pose determination client according to the embodiments of the present application together with an augmented reality application.
- the user photographs an image 1022 of the target object 101 in real time using the camera device 1021, and the image is displayed on the screen 1023.
- the pose of the target object 101 is estimated from the captured image 1022; with this pose, the position of the target object 101 in the captured image 1022 can be determined, and the virtual content 1024 is then added at the same location, so that the real world and the virtual information are superimposed in the same picture.
- the terminal device 102 first obtains an offline training result for the target object 101 from the server 103 before the online detection of the real-time image at the terminal device.
- a large number of image samples of the target object 101 are stored in the database 1031 in the server 103, and the offline training sub-server 1032 performs offline training on these image samples using a convolutional neural network.
- once the training model parameters are determined, they are sent to the terminal device 102.
- the terminal device 102 then uses them for online detection of real-time images.
- the above-described terminal device 102 refers to a terminal device having an image capturing and processing function, including but not limited to a smartphone, a palmtop computer, a tablet computer, and the like.
- Operating systems are installed on these terminal devices, including but not limited to: Android operating system, Symbian operating system, Windows mobile operating system, and Apple iPhone OS operating system.
- Communication between the terminal device 102 and the server 103 can be through a wireless network.
- FIG. 2 is a schematic flow chart of a method for determining an attitude of an object in an image according to an embodiment of the present application.
- the method can be applied to a stand-alone client or to a client with augmented reality functionality, which can be installed in the terminal device 102 in the embodiment of FIG. 1.
- the method includes, but is not limited to, the following steps.
- Step 201 Acquire a convolutional neural network training model parameter of the target object from the server.
- the server acquires a standard image of a target object in a scene and a plurality of distorted images, and inputs a standard image and a plurality of distorted images into a convolutional neural network for training to obtain training model parameters. Then, the server sends the training model parameters to the client, and the terminal device installed with the client receives the training model parameters through the client.
- the trained training model parameters are associated with a particular scene and are directed to a single target object.
- the so-called standard image refers to a clear image taken for a target object in a specific scene, and the distorted image is obtained by introducing various perspective distortions on the basis of the standard image.
- Figure 3a shows a standard image for a target object in a city scene
- Figure 3b shows the corresponding three distortion images.
- the scene is a city complex near the river, with the target object being the tallest building, as shown by the oval in Figure 3a.
- the three distorted images are obtained by rotating and translating the standard image in Fig. 3a.
- the target object, i.e., the building, is visible in each distorted image, and the background portion is filled with random values.
- Figures 4a and 4b show a standard image and three distorted images for one target object in another scene, respectively.
- the target object is the bridge over the river, as shown by the box in Figure 4a.
- the three distorted images are also obtained by rotating and translating the standard image. In each distorted image, the target object, i.e., the bridge, is visible in whole or in part.
- This step is performed before the user uses the augmented reality service, and the obtained training model parameters are stored in advance in the client.
- when the user subsequently uses the augmented reality service, the training model parameters are read for pose determination of the target object.
- Step 202 Acquire a real-time image of the target object, and identify at least one first image block from the real-time image.
- the user is in the above scenario, and wants to use the augmented reality service.
- the real-time image of the target object is captured by the camera device on the terminal device where the client is located, and the real-time image is transmitted to the client.
- the client identifies at least one first image block from the real-time image, wherein the first image block is a partial image of the real-time image, and the method for identifying includes but is not limited to the following steps:
- Step 2021 Perform feature detection on the real-time image to acquire a plurality of local features.
- a local feature refers to a part of the image that differs from its surroundings; it describes an area that is highly distinguishable from its neighborhood.
- Step 2022 For each local feature, if it is determined that the image contrast of the local feature is higher than a preset contrast threshold and the local feature is not an edge of the image, the local feature is determined as the first image block.
- contrast refers to the measured difference in brightness between the brightest white and the darkest black in the light and dark regions of an image, that is, the magnitude of the grayscale difference of the image.
- a first image block identified in this way stands out from its surroundings, reducing ambiguity about its position.
- for example, if the real-time image is a facial image, the first image blocks may be regions such as the tip of the nose or the corner of an eye.
- feature detection methods that may be used include SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features), and FAST (Features from Accelerated Segment Test).
- alternatively, a local feature may be determined as a first image block based on a single criterion. For example, if the image contrast of the local feature is higher than a preset contrast threshold, the local feature is determined as a first image block; or, if the local feature is not an edge of the image, it is determined as a first image block. The recognition accuracy of the local features affects the subsequent matching and the determined pose result.
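- as an illustrative sketch of this identification step, the contrast and edge tests can be applied to patches around detected keypoints. The following Python example assumes OpenCV's FAST detector and SIFT-style thresholds, none of which are prescribed by the embodiment:

```python
import cv2
import numpy as np

def extract_first_image_blocks(live_gray, patch_size=27,
                               contrast_threshold=20.0, edge_ratio=10.0):
    """live_gray: uint8 grayscale real-time image. Returns candidate
    first image blocks (patch_size x patch_size partial images)."""
    keypoints = cv2.FastFeatureDetector_create().detect(live_gray)
    half = patch_size // 2
    blocks = []
    for kp in keypoints:
        x, y = int(round(kp.pt[0])), int(round(kp.pt[1]))
        if not (half <= x < live_gray.shape[1] - half and
                half <= y < live_gray.shape[0] - half):
            continue  # patch would fall outside the image
        patch = live_gray[y - half:y + half + 1, x - half:x + half + 1]
        # Contrast test: spread between darkest and brightest pixel.
        if float(patch.max()) - float(patch.min()) < contrast_threshold:
            continue
        # Edge test (SIFT-style): ratio of principal curvatures via the
        # Hessian at the patch center; large ratios indicate an edge.
        dxx = cv2.Sobel(patch, cv2.CV_64F, 2, 0)[half, half]
        dyy = cv2.Sobel(patch, cv2.CV_64F, 0, 2)[half, half]
        dxy = cv2.Sobel(patch, cv2.CV_64F, 1, 1)[half, half]
        tr, det = dxx + dyy, dxx * dyy - dxy * dxy
        r = edge_ratio
        if det <= 0 or tr * tr / det >= (r + 1) ** 2 / r:
            continue
        blocks.append(patch)
    return blocks
```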
- the terminal device inputs each first image block into a convolutional neural network, and the convolutional neural network outputs a label image block that matches each of the first image blocks based on the training model parameters.
- the label image block is a partial image of a standard image that matches the first image block.
- the training model parameters include weights and second image blocks identified from the standard image, and the second image block is a partial image of the standard image.
- the convolutional neural network includes multiple convolutional layers, and the weights refer to the values of the individual elements in the convolution matrix used by each convolutional layer.
- the matching method includes but is not limited to the following steps:
- step 2031 the first image block is input to the convolutional neural network, and the probability that the first image block matches each of the second image blocks is output based on the weight.
- the convolutional neural network classifies the first image block: each second image block represents a category label, the first image block is processed using the weights, and the output is the probability that the first image block matches each second image block. This probability value represents the similarity between the first image block and the second image block.
- Step 2032 Determine a second image block corresponding to the maximum probability value as a label image block.
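- a minimal sketch of steps 2031 and 2032, assuming an illustrative callable `classify` that wraps the trained network and returns the 1 × M classification vector (the wrapper and the list `second_image_blocks` are assumptions, not the embodiment's exact interface):

```python
import numpy as np

def match_label_block(classify, first_block, second_image_blocks):
    """Return the label image block matching `first_block` and its probability."""
    probs = np.asarray(classify(first_block)).ravel()  # 1 x M probabilities
    best = int(np.argmax(probs))   # step 2032: index of the maximum probability
    return second_image_blocks[best], float(probs[best])
```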
- Step 204 Determine a posture of the target object according to each of the first image block and the label image block that matches each of the first image blocks.
- the pose of the target object is represented by an affine transformation, that is, each label image block matches the first image block via an affine transformation.
- the affine transformation can reflect the translation and rotation of the target object relative to the camera lens, and can describe the imaging process of the target object in the 3D space to the 2D plane image.
- an affine transformation is a linear-type transformation with the general property that parallel lines map to parallel lines and finite points map to finite points.
- the affine transformation on the two-dimensional Euclidean space can be expressed as:

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} a_0 \\ a_5 \end{pmatrix}$$

- $(x, y)$ and $(x', y')$ refer to the coordinates of two matched points (i.e., pixels) in the standard image and the real-time image, respectively; $(a_0, a_5)^\mathsf{T}$ is a translation vector, and each $a_i$ is a real number.
- the affine transformation has 6 degrees of freedom, and the attitude estimated according to the affine transformation is also often referred to as a 6D pose.
- translation, rotation, scaling, reflection, and shearing are all special cases of the affine transformation, depending on the specific values of its parameters.
- the matrix estimate of the affine transformation matrix set may be determined from the affine transformation matrix set according to the least squares principle, where the matrix estimate is the argument of the inverse transformation corresponding to the affine transformation matrix set.
- the matrix estimate $\hat{A}$ can be calculated by the following formula:

$$\hat{A} = \arg\min_{A \in G} \sum_{i=1}^{N} \left\| q_i - A\, p_i \right\|^2$$

where $q_i$ is a first image block, $p_i$ is the label image block matching $q_i$, $\|\cdot\|^2$ denotes the square of the modulus, and $G$ is the affine transformation matrix set.
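- the least-squares fit above can be sketched as follows, under the simplifying assumption that each matching pair is represented by the center coordinates of its two blocks; the function name and this point-based formulation are illustrative, not the embodiment's exact computation:

```python
import numpy as np

def estimate_affine(p, q):
    """p, q: N x 2 arrays of matched coordinates (standard -> live image).
    Returns the 2 x 3 affine matrix minimizing sum ||q_i - A p_i||^2."""
    n = p.shape[0]
    # Each correspondence contributes two rows of the linear system M a = b,
    # with a = (a1, a2, a0, a3, a4, a5) the six affine parameters.
    M = np.zeros((2 * n, 6))
    b = q.reshape(-1)
    M[0::2, 0:2] = p   # x' = a1*x + a2*y + a0
    M[0::2, 2] = 1.0
    M[1::2, 3:5] = p   # y' = a3*x + a4*y + a5
    M[1::2, 5] = 1.0
    a, *_ = np.linalg.lstsq(M, b, rcond=None)
    return np.array([[a[0], a[1], a[2]],
                     [a[3], a[4], a[5]]])
```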
- any virtual content to be added to the real-time image can be transformed by the matrix estimate so that its observation angle is consistent with that of the real-time image, thereby adding the virtual content to the real-time image and presenting the mixed augmented reality picture to the user.
- in summary, the client obtains the trained convolutional neural network model parameters from the server, receives a real-time image obtained by the user photographing the target object, identifies at least one first image block from the real-time image, and uses the image blocks as the input to the convolutional neural network.
- compared with the entire image, an image block is highly robust to transformations, especially translation; moreover, no segmentation or any other prior semantic interpretation of the image is required.
- FIG. 5 is a schematic flow chart of a method for determining an attitude of an object in an image according to another embodiment of the present application. As shown in FIG. 5, the method includes but is not limited to the following steps:
- Step 501 Receive and store training model parameters of the trained convolutional neural network from the server.
- the server performs offline training for the target object in a specific scene. After the training is completed, the training model parameters are sent to the client for storage, and the client then invokes these training model parameters during online detection.
- Step 502 Acquire a real-time image of the target object.
- Step 503 Identify at least one first image block from the real-time image, and input each first image block into the convolutional neural network.
- Step 504 For each first image block, output a probability that the first image block matches each second image block based on the weight, and determine a second image block corresponding to the maximum probability value as the label image block.
- Step 505 Determine, according to each of the first image blocks and the respective matched label image blocks, a matrix estimation value of the affine transformation to represent the geometric posture of the target object.
- the first image block and its matched label image block form a matching pair, that is, $(q_i, p_i)$.
- a screening of the matching pairs may additionally be performed before the pose is determined. For each first image block, this includes but is not limited to the following steps:
- Step 5051 Input the first image block into a convolutional neural network, and output a probability that the first image block matches each of the second image blocks based on the weight.
- the output layer of the convolutional neural network outputs a 1 × M-dimensional classification vector whose elements take values in [0, 1], representing the above probabilities.
- a first image block for which the number of second image blocks whose matching probability is greater than the preset probability threshold exceeds the preset count threshold is called a target image block, and the terminal device determines the pose of the target object according to the target image blocks and their corresponding label image blocks.
- a random sample consensus (RANSAC) strategy can also be used to filter out mismatched pairs.
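- one possible filtering sketch uses OpenCV's RANSAC-based affine estimator, which returns an inlier mask alongside the estimated 2 × 3 matrix; the threshold value and point-based inputs here are illustrative assumptions:

```python
import cv2
import numpy as np

def filter_matches_ransac(p, q, reproj_thresh=3.0):
    """p, q: N x 2 arrays of matched label-block / first-block coordinates."""
    A, inlier_mask = cv2.estimateAffine2D(
        p.astype(np.float32), q.astype(np.float32),
        method=cv2.RANSAC, ransacReprojThreshold=reproj_thresh)
    if A is None:  # estimation failed, e.g. too few matches
        return None, np.zeros(len(p), dtype=bool)
    return A, inlier_mask.ravel().astype(bool)
```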
- Step 506 adding virtual content to the real-time image according to the matrix estimation value.
- the inverse process can be performed, and the virtual content is converted into the reference frame of the real-time image by affine transformation, so that the two can be superimposed to realize the augmented reality.
- by screening the matching pairs (first image block, label image block), the effective value of N is reduced, which lowers the computational complexity and improves the accuracy of the pose.
- the geometric pose of the target object is characterized by the matrix estimate of the affine transformation; this processing is simple and easy to compute, further improving the time efficiency of the algorithm.
- FIG. 6 is a schematic flow chart of a method for determining an object pose in an image according to an embodiment of the present application. This method can be applied to the server 103 in FIG. 1. The method includes, but is not limited to, the following steps.
- Step 601 Acquire a standard image of the target object and multiple distortion images of the target object.
- a standard image is required to determine a plurality of label image blocks used in the classification.
- the distorted images can be acquired in various ways. For example, multiple distorted images may be obtained by photographing the same target object at random with a camera device, or by applying various kinds of distortion processing to the standard image. For the latter, in one embodiment, the distortion can be introduced by affine transformation. The method for acquiring distorted images from the standard image includes, but is not limited to, the following steps:
- step 6011 a plurality of affine transformation matrices are randomly generated.
- define a matrix $A$ representing an affine transformation, and randomly generate a plurality of affine transformation matrices according to the following formula:

$$A = \left[\, R_\theta \, R_{-\phi} \, \operatorname{diag}(\lambda_1, \lambda_2) \, R_\phi \;\middle|\; t \,\right], \quad t = (t_x, t_y)^\mathsf{T} \quad (4)$$

where $R_\alpha$ denotes a two-dimensional rotation by angle $\alpha$.
- the parameters $\theta$ and $\phi$ are uniformly generated from $(-\pi, \pi)$; the parameters $t_x$ and $f_x$ are uniformly generated from $[0, w]$, where $w$ is the width of the standard image; the parameters $t_y$ and $f_y$ are uniformly generated from $[0, h]$, where $h$ is the height of the standard image; and the parameters $\lambda_1$ and $\lambda_2$ are uniformly generated from $[0.5, 1.5]$.
- Step 6012 For each affine transformation matrix, perform an affine transformation on the standard image using that matrix to obtain a distorted image:

$$I' = A(I) + N$$

where $I$ is the input standard image, $I'$ is the generated distorted image, $A(\cdot)$ denotes the affine warp, and $N$ is Gaussian white noise with mean $\mu$ and variance $\sigma^2$.
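- a sketch of steps 6011 and 6012 in Python, assuming the rotation-scaling decomposition written above for formula (4) and a Gaussian noise model; the exact roles of the parameters f_x and f_y are not recoverable from the text and are omitted here:

```python
import cv2
import numpy as np

def rotation(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def random_distortion(standard, noise_mean=0.0, noise_std=5.0):
    """standard: uint8 standard image; returns one distorted image."""
    h, w = standard.shape[:2]
    theta, phi = np.random.uniform(-np.pi, np.pi, size=2)
    lam1, lam2 = np.random.uniform(0.5, 1.5, size=2)
    t = np.array([np.random.uniform(0, w), np.random.uniform(0, h)])
    # Linear part: R_theta * R_{-phi} * diag(lambda1, lambda2) * R_phi.
    lin = rotation(theta) @ rotation(-phi) @ np.diag([lam1, lam2]) @ rotation(phi)
    A = np.hstack([lin, t[:, None]]).astype(np.float32)  # 2 x 3 matrix
    warped = cv2.warpAffine(standard, A, (w, h))
    # I' = A(I) + N: add Gaussian white noise with mean mu, std sigma.
    noisy = warped.astype(np.float64) + np.random.normal(
        noise_mean, noise_std, warped.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)
```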
- Step 602 Input the standard image and the plurality of distortion images into the convolutional neural network for training, and obtain training model parameters.
- step 603 the training model parameters are sent to the client.
- the terminal device receives, through the client, the real-time image obtained by the user photographing the target object, and identifies at least one first image block from the real-time image; for each first image block, it determines the matching label image block according to the training model parameters; and it determines the pose of the target object according to each first image block and its matched label image block, and adds virtual content to the real-time image according to the pose.
- the server constructs a convolutional neural network and then performs training.
- the convolutional neural network performs feature extraction by convolution operations and then performs feature mapping.
- Each computational layer of the convolutional neural network consists of multiple feature maps.
- Each feature map is a plane, and the weights of all neurons on the plane are equal, thus reducing the number of network free parameters.
- FIG. 7 is a schematic structural diagram of a convolutional neural network in an embodiment of the present application. As shown in Figure 7, the convolutional neural network consists of multiple layers of processing, namely:
- a convolution matrix is used as a filter.
- the filter is convolved with the input image block 700: each weight in the filter is multiplied by the corresponding pixel value in the image block, and all the results are summed to obtain a single value. This process is repeated for each region of the image block, moving from left to right and top to bottom, producing one value per step; the resulting matrix is the feature image.
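- the sliding-window computation just described can be illustrated with a small NumPy sketch (a naive "valid" convolution, not the embodiment's optimized implementation):

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image`, multiply weights by pixels, and sum."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):           # top to bottom
        for j in range(ow):       # left to right
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out                    # the feature image
```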
- the pooling layer is usually used after the convolution layer. Its function is to simplify the output information in the convolution layer, reduce the data dimension, reduce the computational overhead, and control over-fitting.
- a convolved feature image has a stationarity property, meaning that a feature that is useful in one image region is very likely to be equally useful in another region. Therefore, to describe a large image, statistics are aggregated over different positions; this is the pooling process, for example computing the average or the maximum value of a particular feature over a region of the image.
- the output is a 1 ⁇ M-dimensional classification vector.
- the elements in the vector take values in [0, 1], and each dimension of the output is the probability that the image block belongs to the corresponding category.
- in practice, multiple convolutional layers are used, followed by a fully connected layer for training. That is, in FIG. 7, the convolutional layer 701 and the pooling layer 702 form one combination, and a plurality of such combinations are applied in sequence.
- This network is called a deep convolutional neural network.
- the purpose of multi-layer convolution is to take into account that the features learned by a layer of convolution are often local, and the higher the number of layers, the more global the learned features are.
- the method for determining the number of convolutional layers includes, but is not limited to, the following steps: presetting the correspondence between the number of image blocks and the number of convolution layers; from the standard image Identifying at least one second image block; determining the number of convolutional layers in the convolutional neural network according to the number and correspondence of the second image blocks.
- the total number of second image blocks is 400 and the entire network includes 13 layers.
- there are four convolutional layers: the 1st, 4th, 7th, and 10th layers. The 1st convolutional layer is followed by a max pooling layer and a rectified linear unit (ReLU) excitation layer; the 4th convolutional layer is followed by a ReLU excitation layer and an average pooling layer; the 7th convolutional layer is followed by a ReLU excitation layer and an average pooling layer; the 10th convolutional layer is followed by a ReLU excitation layer; and finally come the fully connected layer and the soft-max output layer.
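- for illustration, the 13-layer architecture of Table 1 could be written in PyTorch as follows; the kernel sizes, padding, and pooling modes are assumptions chosen so that the layer output sizes match the table:

```python
import torch
import torch.nn as nn

class PatchClassifier(nn.Module):
    """Classifies a 1 x 27 x 27 grayscale image block into 400 categories."""
    def __init__(self, num_classes=400):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1),   # 1:  conv, 27x27 -> 32x27x27
            nn.MaxPool2d(2, ceil_mode=True),  # 2:  max pool -> 32x14x14
            nn.ReLU(),                        # 3:  ReLU
            nn.Conv2d(32, 32, 3, padding=1),  # 4:  conv -> 32x14x14
            nn.ReLU(),                        # 5:  ReLU
            nn.AvgPool2d(2),                  # 6:  avg pool -> 32x7x7
            nn.Conv2d(32, 64, 3, padding=1),  # 7:  conv -> 64x7x7
            nn.ReLU(),                        # 8:  ReLU
            nn.AvgPool2d(2, ceil_mode=True),  # 9:  avg pool -> 64x4x4
            nn.Conv2d(64, 64, 4),             # 10: conv -> 64x1x1
            nn.ReLU(),                        # 11: ReLU
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64, num_classes),       # 12: fully connected -> 1x400
            nn.Softmax(dim=1),                # 13: soft-max output
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```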
- an excitation function will be called in the excitation layer to add nonlinear factors to solve the problem of linear inseparability.
- the selected excitation function is the ReLU, whose expression is $f(x) = \max(0, x)$.
- values less than zero are set to 0, which makes convolutional neural network training faster and mitigates the vanishing gradient problem.
- the convolutional neural network also needs to determine the input samples and the ideal output samples during the training process, and then iteratively adjust the weights.
- at least one second image block is identified from the standard image; each distorted image is recognized separately to obtain at least one third image block; when the convolutional neural network is trained, the third image blocks are used as input samples and the second image blocks as ideal output samples, and the weights are trained.
- the weight is adjusted by a back propagation algorithm.
- the backpropagation algorithm can be divided into four parts: the forward pass, the loss function, the backward pass, and the weight update.
- in the forward pass, the image block is input and passed through the convolutional neural network.
- all weights are randomly initialized at first, e.g., random values [0.3, 0.1, 0.4, 0.2, 0.3, ...]. Because the convolutional neural network cannot extract accurate feature images with these initialized weights, it cannot yet give a reasonable conclusion about which category an image block belongs to.
- in backpropagation, the loss function helps the convolutional neural network update its weights so as to find the desired feature images.
- a commonly used loss function is the mean squared error (MSE) between the network output and the ideal output.
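- a minimal training-loop sketch for this process, assuming the network outputs logits (the final soft-max layer is folded into the loss) and using cross-entropy, a common alternative to the MSE mentioned above:

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=10, lr=1e-3):
    """loader yields (third_image_blocks, class_indices) batches, where the
    class index of a block is its matching second image block."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for blocks, labels in loader:
            opt.zero_grad()
            logits = model(blocks)          # forward pass
            loss = loss_fn(logits, labels)  # loss computation
            loss.backward()                 # backward pass
            opt.step()                      # weight update
```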
- Table 2 gives the values of the accuracy and the occupied memory of the two methods.
- the experimental data is set as follows:
- the convolutional neural network architecture given in Table 1 is used, and the image blocks have a size of 27 × 27, that is, 27 rows and 27 columns of pixels.
- the image block is preprocessed to have a mean of 0 and a variance of one.
- 2000 affine transformation matrices are randomly generated according to formula (4) for generating a distorted image.
- the number of second image blocks is 400, and the output vector is a classification vector of 1 ⁇ 400 dimensions.
- the number of Ferns in the Ferns method is 30, and the number of features in each Fern is 12.
- for the scene of Figures 3a and 3b, the accuracy of the method given in the embodiments of the present application is 86%, and the accuracy of the Ferns method is 88%;
- for the scene of Figures 4a and 4b, the accuracy of the method given in the embodiments of the present application is 87%, and the accuracy of the Ferns method is 88%.
- the method given in the embodiment of the present application has substantially the same accuracy as the Ferns method.
- the method provided in the embodiments of the present application uses a convolutional neural network and occupies only 0.5557 MB of memory, whereas the Ferns method occupies 93.75 MB of memory. It can be seen that the method given in the embodiments of the present application has low memory resource consumption.
- FIG. 8 is a schematic structural diagram of a client 800 in an embodiment of the present application.
- the client 800 may be a virtual device that performs the determining method of the pose of the object in the image in the embodiment of FIG. 2 and FIG. 5, and the device includes:
- An offline receiving module 810 configured to acquire, from a server, a training model parameter of a convolutional neural network of a target object;
- An online receiving module 820 configured to acquire a real-time image of the target object
- the identification module 830 is configured to identify at least one first image block from the real-time image
- a matching module 840 configured to determine, according to the training model parameter, a label image block that matches each of the first image blocks by using a convolutional neural network
- the pose determining module 850 is configured to determine the pose of the target object according to each first image block and the label image block matched with each first image block.
- the adding module 860 is configured to add virtual content to the real-time image according to the pose. The adding module 860 is an optional module.
- the identification module 830 includes:
- the detecting unit 831 is configured to perform feature detection on the real-time image to obtain a plurality of local features.
- the determining unit 832 is configured to determine, as the first image block, a local feature among the plurality of local features whose image contrast is higher than a preset contrast threshold and which is not an edge of the image.
- the training model parameters include weights and second image blocks identified from the standard image
- the matching module 840 is further configured to: input each first image block into the convolutional neural network and, based on the weights, output the probability that each first image block matches each second image block; obtain, among the probabilities corresponding to each first image block, the number greater than a probability threshold; determine first image blocks for which the number is greater than a preset count as target image blocks; and determine the pose according to the target image blocks and the label image blocks matching the target image blocks.
- the matching module 840 is further configured to: obtain the probability that the target image block matches each second image block; determine the second image block corresponding to the largest of these probabilities as the label image block matched with the target image block; and determine the pose according to the target image block and its matched label image block.
- each first image block is obtained from its matched label image block by an affine transformation using an affine transformation matrix, and the affine transformation matrices form an affine transformation matrix set;
- the attitude determination module 850 is further configured to determine a matrix estimation value of the affine transformation matrix set from the affine transformation matrix set according to a least squares principle.
- the pose determination module 850 is further configured to calculate the matrix estimate by the following formula:

$$\hat{A} = \arg\min_{A \in G} \sum_{i=1}^{N} \left\| q_i - A\, p_i \right\|^2$$

where $q_i$ is a first image block, $p_i$ is the label image block matching $q_i$, $A$ is an affine transformation matrix, $\|\cdot\|^2$ represents the square of the modulus value, and $G$ is the set of affine transformation matrices.
- FIG. 9 is a schematic structural diagram of a client 900 according to another embodiment of the present application.
- the client 900 may be the terminal device 102 shown in FIG. 1.
- the client 900 includes a processor 910, a memory 920, a port 930, and a bus 940.
- Processor 910 and memory 920 are interconnected by a bus 940.
- Processor 910 can receive and transmit data through port 930.
- the processor 910 is configured to execute a machine readable instruction module stored by the memory 920.
- Memory 920 stores machine readable instruction modules executable by processor 910.
- the instruction modules executable by the processor 910 include an offline receiving module 921, an online receiving module 922, an identifying module 923, a matching module 924, a posture determining module 925, and an adding module 926.
- when the offline receiving module 921 is executed by the processor 910, it may: acquire the training model parameters of the convolutional neural network of the target object from the server;
- when the online receiving module 922 is executed by the processor 910, it may: acquire a real-time image of the target object;
- when the identification module 923 is executed by the processor 910, it may: identify at least one first image block from the real-time image;
- when the matching module 924 is executed by the processor 910, it may: determine, by the convolutional neural network according to the training model parameters, a label image block matching each first image block;
- when the posture determining module 925 is executed by the processor 910, it may: determine the pose of the target object according to each first image block and the label image block matching each first image block;
- when the adding module 926 is executed by the processor 910, it may: add virtual content to the real-time image according to the pose. The adding module 926 is an optional module.
- FIG. 10 is a schematic structural diagram of a server 1000 according to an embodiment of the present application.
- the server 1000 includes a virtual device for performing a method for determining an attitude of an object in an image in the embodiment of FIG. 6, the device comprising:
- the acquiring module 1010 is configured to acquire a standard image of the target object and multiple distortion images of the target object;
- a training module 1020 configured to input a standard image and a plurality of distorted images into a convolutional neural network for training, and obtain training model parameters of the convolutional neural network;
- the sending module 1030 is configured to send the training model parameters to the client, so that the terminal device acquires a real-time image of the target object through the client, identifies at least one first image block from the real-time image, determines, by the convolutional neural network according to the training model parameters, a label image block matching each first image block, and determines the pose of the target object according to each first image block and the label image block matching each first image block.
- the obtaining module 1010 is further configured to randomly generate a plurality of affine transformation matrices; and perform affine transformation on the standard image using each affine transformation matrix to obtain each distorted image.
- the convolutional neural network includes a plurality of convolution layers
- the training module 1020 is further configured to: identify at least one second image block from the standard image; and determine the number of convolutional layers in the convolutional neural network according to the number of second image blocks and the preset correspondence between the number of second image blocks and the number of convolutional layers.
- the training module 1020 is further configured to: identify at least one second image block from the standard image; recognize each distorted image separately to obtain at least one third image block; and, when the convolutional neural network is trained, use the third image blocks as input samples and the second image blocks as ideal output samples to train the weights.
- FIG. 11 is a schematic structural diagram of a server 1100 according to another embodiment of the present application.
- the server 1100 includes a processor 1110, a memory 1120, a port 1130, and a bus 1140.
- the processor 1110 and the memory 1120 are interconnected by a bus 1140.
- the processor 1110 can receive and transmit data through the port 1130.
- the processor 1110 is configured to execute a machine readable instruction module stored by the memory 1120.
- the memory 1120 stores a machine readable instruction module executable by the processor 1110.
- the instruction modules executable by the processor 1110 include an acquisition module 1121, a training module 1122, and a sending module 1123.
- when the acquisition module 1121 is executed by the processor 1110, it may: acquire a standard image of the target object and multiple distorted images;
- when the training module 1122 is executed by the processor 1110, it may: input the standard image and the plurality of distorted images into the convolutional neural network for training, to obtain the training model parameters;
- when the sending module 1123 is executed by the processor 1110, it may: send the training model parameters to the client, so that the terminal device acquires a real-time image of the target object through the client, identifies at least one first image block from the real-time image, determines, by the convolutional neural network according to the training model parameters, a label image block matching each first image block, and determines the pose of the target object according to each first image block and the label image block matching each first image block.
- each functional module in each embodiment of the present application may be integrated into one processing unit, or each module may exist physically separately, or two or more modules may be integrated into one unit.
- the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
- the storage medium can use any type of recording method, such as paper storage medium (such as paper tape, etc.), magnetic storage medium (such as floppy disk, hard disk, flash memory, etc.), optical storage medium (such as CD-ROM, etc.), magneto-optical storage medium (such as MO, etc.).
- the present application also discloses a storage medium having stored therein at least one piece of data processing program for performing any of the above-described embodiments of the present application.
- the storage medium has at least one instruction, code set or instruction set, and the at least one instruction, code set or instruction set is loaded and executed by the processor to implement any one of the foregoing methods of the present application.
- a person skilled in the art may understand that all or part of the steps of implementing the above embodiments may be completed by hardware, or may be instructed by a program to execute related hardware, and the program may be stored in a computer readable storage medium.
- the storage medium mentioned may be a read only memory, a magnetic disk or an optical disk or the like.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Molecular Biology (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Human Computer Interaction (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
Description
Layer | Type | Input size | Output size |
1 | Convolution | 27×27 | 32×27×27 |
2 | Max pooling | 32×27×27 | 32×14×14 |
3 | ReLU | 32×14×14 | 32×14×14 |
4 | Convolution | 32×14×14 | 32×14×14 |
5 | ReLU | 32×14×14 | 32×14×14 |
6 | Average pooling | 32×14×14 | 32×7×7 |
7 | Convolution | 32×7×7 | 64×7×7 |
8 | ReLU | 64×7×7 | 64×7×7 |
9 | Average pooling | 64×7×7 | 64×4×4 |
10 | Convolution | 64×4×4 | 64×1×1 |
11 | ReLU | 64×1×1 | 64×1×1 |
12 | Fully connected | 64×1×1 | 1×400 |
13 | Soft-max output | 1×400 | 1×400 |
Claims (25)
- A method for determining an object pose in an image, wherein the method is applied to a terminal device and comprises: acquiring, from a server, training model parameters of a convolutional neural network for a target object; acquiring a real-time image of the target object, and identifying at least one first image block from the real-time image, the first image block being a partial image of the real-time image; determining, by the convolutional neural network according to the training model parameters, a label image block matching each first image block, the label image block being a partial image of a standard image of the target object; and determining a pose of the target object according to each first image block and the label image block matching each first image block.
- The method according to claim 1, wherein identifying at least one first image block from the real-time image comprises: performing feature detection on the real-time image to obtain a plurality of local features; and determining, as the first image block, a local feature among the plurality of local features whose image contrast is higher than a preset contrast threshold and which is not an edge of the image.
- The method according to claim 1, wherein the training model parameters comprise weights and second image blocks identified from the standard image, the second image block being a partial image of the standard image, and determining the pose of the target object according to each first image block and the label image block matching the first image block comprises: inputting each first image block into the convolutional neural network, and outputting, based on the weights, the probability that each first image block matches each second image block; obtaining, among the probabilities corresponding to each first image block, the number greater than a probability threshold; determining a first image block for which the number is greater than a preset count as a target image block; and determining the pose according to the target image block and the label image block matching the target image block.
- The method according to claim 3, wherein determining the pose according to the target image block and the label image block matching the target image block comprises: obtaining the probability that the target image block matches each second image block; determining the second image block corresponding to the largest of the probabilities as the label image block of the target image block; and determining the pose according to the target image block and the label image block matching the target image block.
- The method according to any one of claims 1 to 4, wherein each first image block is obtained from its matching label image block by an affine transformation using an affine transformation matrix, the affine transformation matrices forming an affine transformation matrix set, and determining the pose of the target object according to each first image block and the label image block matching each first image block comprises: determining, from the affine transformation matrix set according to the least squares principle, a matrix estimate of the affine transformation matrix set, the matrix estimate being the argument of the inverse transformation corresponding to the affine transformation matrix.
- The method according to any one of claims 1 to 4, wherein the method further comprises: adding and displaying virtual content in the real-time image according to the pose.
- A method for determining an object pose in an image, wherein the method is applied to a server and comprises: acquiring a standard image of a target object and a plurality of distorted images of the target object; inputting the standard image and the plurality of distorted images into a convolutional neural network for training, to obtain training model parameters of the convolutional neural network; and sending the training model parameters to a terminal device, so that the terminal device acquires a real-time image of the target object, identifies at least one first image block from the real-time image, the first image block being a partial image of the real-time image, determines, by the convolutional neural network according to the training model parameters, a label image block matching each first image block, the label image block being a partial image of the standard image, and determines a pose of the target object according to each first image block and the label image block matching each first image block.
- The method according to claim 8, wherein acquiring the standard image of the target object and the plurality of distorted images of the target object comprises: randomly generating a plurality of affine transformation matrices; and performing an affine transformation on the standard image using each affine transformation matrix, to obtain each distorted image.
- The method according to claim 8, wherein the convolutional neural network comprises a plurality of convolutional layers, and the method further comprises: identifying at least one second image block from the standard image, the second image block being a partial image of the standard image; and determining the number of convolutional layers in the convolutional neural network according to the number of second image blocks and a preset correspondence between the number of second image blocks and the number of convolutional layers.
- The method according to any one of claims 8 to 10, wherein inputting the standard image and the plurality of distorted images into the convolutional neural network for training comprises: identifying at least one second image block from the standard image, the second image block being a partial image of the standard image; recognizing each distorted image separately to obtain at least one third image block, the third image block being a partial image of the distorted image; and, when the convolutional neural network is trained, using the third image blocks as input samples and the second image blocks as ideal output samples, to train the weights.
- An apparatus for determining an object pose in an image, the apparatus comprising: an offline receiving module configured to acquire, from a server, training model parameters of a convolutional neural network for a target object; an online receiving module configured to acquire a real-time image of the target object; an identification module configured to identify at least one first image block from the real-time image, the first image block being a partial image of the real-time image; a matching module configured to determine, by the convolutional neural network according to the training model parameters, a label image block matching each first image block, the label image block being a partial image of a standard image of the target object; and a pose determining module configured to determine a pose of the target object according to each first image block and the label image block matching each first image block.
- The apparatus according to claim 12, wherein the identification module is further configured to perform feature detection on the real-time image to obtain a plurality of local features, and to determine, as the first image block, a local feature among the plurality of local features whose image contrast is higher than a preset contrast threshold and which is not an edge of the image.
- The apparatus according to claim 12, wherein the training model parameters comprise weights and second image blocks identified from the standard image, the second image block being a partial image of the standard image, and the matching module is further configured to: input each first image block into the convolutional neural network, and output, based on the weights, the probability that each first image block matches each second image block; obtain, among the probabilities corresponding to each first image block, the number greater than a probability threshold; determine a first image block for which the number is greater than a preset count as a target image block; and determine the pose according to the target image block and the label image block matching the target image block.
- The apparatus according to claim 14, wherein the matching module is further configured to: obtain the probability that the target image block matches each second image block; determine the second image block corresponding to the largest of the probabilities as the label image block matching the target image block; and determine the pose according to the target image block and the label image block matching the target image block.
- The apparatus according to any one of claims 12 to 15, wherein each first image block is obtained from its matching label image block by an affine transformation using an affine transformation matrix, the affine transformation matrices forming an affine transformation matrix set, and the pose determining module is further configured to determine, from the affine transformation matrix set according to the least squares principle, a matrix estimate of the affine transformation matrix set, the matrix estimate being the argument of the inverse transformation corresponding to the affine transformation matrix.
- The apparatus according to any one of claims 12 to 15, wherein the apparatus further comprises an adding module configured to add and display virtual content in the real-time image according to the pose.
- An apparatus for determining an object pose in an image, the apparatus comprising: an acquiring module configured to acquire a standard image of a target object and a plurality of distorted images of the target object; a training module configured to input the standard image and the plurality of distorted images into a convolutional neural network for training, to obtain training model parameters of the convolutional neural network; and a sending module configured to send the training model parameters to a terminal device, so that the terminal device acquires a real-time image of the target object, identifies at least one first image block from the real-time image, the first image block being a partial image of the real-time image, determines, by the convolutional neural network according to the training model parameters, a label image block matching each first image block, the label image block being a partial image of the standard image, and determines a pose of the target object according to each first image block and the label image block matching each first image block.
- The apparatus according to claim 19, wherein the acquiring module is further configured to randomly generate a plurality of affine transformation matrices, and to perform an affine transformation on the standard image using each affine transformation matrix, to obtain each distorted image.
- The apparatus according to claim 19, wherein the convolutional neural network comprises a plurality of convolutional layers, and the training module is further configured to identify at least one second image block from the standard image, the second image block being a partial image of the standard image, and to determine the number of convolutional layers in the convolutional neural network according to the number of second image blocks and a preset correspondence between the number of second image blocks and the number of convolutional layers.
- The apparatus according to any one of claims 19 to 21, wherein the training module is further configured to: identify at least one second image block from the standard image, the second image block being a partial image of the standard image; recognize each distorted image separately to obtain at least one third image block, the third image block being a partial image of the distorted image; and, when the convolutional neural network is trained, use the third image blocks as input samples and the second image blocks as ideal output samples, to train the weights.
- A terminal device, comprising a processor and a memory, the memory storing at least one instruction that is loaded and executed by the processor to implement the method for determining an object pose in an image according to any one of claims 1 to 7.
- A server, comprising a processor and a memory, the memory storing at least one instruction that is loaded and executed by the processor to implement the method for determining an object pose in an image according to any one of claims 8 to 11.
- A computer-readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, the at least one instruction, the at least one program, the code set, or the instruction set being loaded and executed by a processor to implement the method for determining an object pose in an image according to any one of claims 1 to 9.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020197030205A KR102319177B1 (ko) | 2017-07-14 | 2018-07-10 | Method and apparatus for determining object pose in an image, device, and storage medium |
JP2019541339A JP6789402B2 (ja) | 2017-07-14 | 2018-07-10 | 画像内の物体の姿の確定方法、装置、設備及び記憶媒体 |
EP18832199.6A EP3576017A4 (en) | 2017-07-14 | 2018-07-10 | METHOD, DEVICE AND DEVICE FOR DETERMINING THE POSITION OF AN OBJECT ON AN IMAGE AND STORAGE MEDIUM |
US16/531,434 US11107232B2 (en) | 2017-07-14 | 2019-08-05 | Method and apparatus for determining object posture in image, device, and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710573908.5A CN107330439B (zh) | 2017-07-14 | 2017-07-14 | Method for determining object posture in image, client, and server |
CN201710573908.5 | 2017-07-14 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/531,434 Continuation US11107232B2 (en) | 2017-07-14 | 2019-08-05 | Method and apparatus for determining object posture in image, device, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019011249A1 (zh) | 2019-01-17 |
Family
ID=60227213
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/095191 WO2019011249A1 (zh) | 2018-07-10 | Method, apparatus, device, and storage medium for determining object posture in image |
Country Status (6)
Country | Link |
---|---|
US (1) | US11107232B2 (zh) |
EP (1) | EP3576017A4 (zh) |
JP (1) | JP6789402B2 (zh) |
KR (1) | KR102319177B1 (zh) |
CN (1) | CN107330439B (zh) |
WO (1) | WO2019011249A1 (zh) |
Families Citing this family (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10586111B2 (en) * | 2017-01-13 | 2020-03-10 | Google Llc | Using machine learning to detect which part of the screen includes embedded frames of an uploaded video |
CN107330439B (zh) | 2017-07-14 | 2022-11-04 | 腾讯科技(深圳)有限公司 | Method for determining object posture in image, client, and server |
CN108012156B (zh) * | 2017-11-17 | 2020-09-25 | 深圳市华尊科技股份有限公司 | Video processing method and control platform |
US10977755B2 (en) * | 2017-11-21 | 2021-04-13 | International Business Machines Corporation | Cognitive screening for prohibited items across multiple checkpoints by using context aware spatio-temporal analysis |
US20210374476A1 (en) * | 2017-11-24 | 2021-12-02 | Truemed Oy | Method and system for identifying authenticity of an object |
CN108449489B (zh) * | 2018-01-31 | 2020-10-23 | 维沃移动通信有限公司 | Flexible screen control method, mobile terminal, and server |
DE102018207977B4 (de) * | 2018-05-22 | 2023-11-02 | Zf Friedrichshafen Ag | Interior monitoring for seat belt adjustment |
US10789696B2 (en) * | 2018-05-24 | 2020-09-29 | Tfi Digital Media Limited | Patch selection for neural network based no-reference image quality assessment |
CN109561210A (zh) * | 2018-11-26 | 2019-04-02 | 努比亚技术有限公司 | Interaction regulation method, device, and computer-readable storage medium |
CN109903332A (zh) * | 2019-01-08 | 2019-06-18 | 杭州电子科技大学 | Target posture estimation method based on deep learning |
CN110097087B (zh) * | 2019-04-04 | 2021-06-11 | 浙江科技学院 | Automatic rebar binding position recognition method |
CN110232411B (zh) * | 2019-05-30 | 2022-08-23 | 北京百度网讯科技有限公司 | Model distillation implementation method, apparatus, system, computer device, and storage medium |
CN110263918A (zh) * | 2019-06-17 | 2019-09-20 | 北京字节跳动网络技术有限公司 | Method, apparatus, electronic device, and computer-readable storage medium for training a convolutional neural network |
US10922877B2 (en) * | 2019-07-01 | 2021-02-16 | Samsung Electronics Co., Ltd. | Higher-order function networks for learning composable three-dimensional (3D) object and operating method thereof |
US11576794B2 (en) | 2019-07-02 | 2023-02-14 | Wuhan United Imaging Healthcare Co., Ltd. | Systems and methods for orthosis design |
CN110327146A (zh) * | 2019-07-02 | 2019-10-15 | 武汉联影医疗科技有限公司 | Orthosis design method, apparatus, and server |
CN110443149A (zh) * | 2019-07-10 | 2019-11-12 | 安徽万维美思信息科技有限公司 | Target object search method, system, and storage medium |
CN112308103B (zh) * | 2019-08-02 | 2023-10-20 | 杭州海康威视数字技术股份有限公司 | Method and apparatus for generating training samples |
CN110610173A (zh) * | 2019-10-16 | 2019-12-24 | 电子科技大学 | MobileNet-based badminton action analysis system and method |
CN111194000B (zh) * | 2020-01-07 | 2021-01-26 | 东南大学 | Ranging method and system based on Bluetooth fused with hybrid filtering and a neural network |
CN111734974B (zh) * | 2020-01-22 | 2022-06-03 | 中山明易智能家居科技有限公司 | Smart desk lamp with sitting-posture reminder function |
CN111402399B (zh) * | 2020-03-10 | 2024-03-05 | 广州虎牙科技有限公司 | Face driving and live-streaming method, apparatus, electronic device, and storage medium |
CN111462239B (zh) * | 2020-04-03 | 2023-04-14 | 清华大学 | Posture encoder training and posture estimation method and apparatus |
KR102466978B1 (ko) | 2020-04-23 | 2022-11-14 | 엔에이치엔클라우드 주식회사 | Deep-learning-based virtual image generation method and system |
CN111553419B (zh) * | 2020-04-28 | 2022-09-09 | 腾讯科技(深圳)有限公司 | Image recognition method, apparatus, device, and readable storage medium |
CN111553420B (zh) * | 2020-04-28 | 2023-08-15 | 北京邮电大学 | Neural-network-based X-ray image recognition method and apparatus |
CN111638797A (zh) * | 2020-06-07 | 2020-09-08 | 浙江商汤科技开发有限公司 | Display control method and apparatus |
CN112288816B (zh) * | 2020-11-16 | 2024-05-17 | Oppo广东移动通信有限公司 | Pose optimization method, pose optimization apparatus, storage medium, and electronic device |
CN112200862B (zh) * | 2020-12-01 | 2021-04-13 | 北京达佳互联信息技术有限公司 | Training method for target detection model, target detection method, and apparatus |
CN113034439B (zh) * | 2021-03-03 | 2021-11-23 | 北京交通大学 | High-speed railway sound barrier defect detection method and apparatus |
CN114819149B (zh) * | 2022-06-28 | 2022-09-13 | 深圳比特微电子科技有限公司 | Data processing method, apparatus, and medium based on transformer neural networks |
CN116051486B (zh) * | 2022-12-29 | 2024-07-02 | 抖音视界有限公司 | Training method for endoscope image recognition model, image recognition method, and apparatus |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4126541B2 (ja) * | 2002-11-28 | 2008-07-30 | 富士ゼロックス株式会社 | Image processing apparatus, image processing method, image processing program, and storage medium |
JP4196302B2 (ja) * | 2006-06-19 | 2008-12-17 | ソニー株式会社 | Information processing apparatus and method, and program |
EP2553657A4 (en) * | 2010-04-02 | 2015-04-15 | Nokia Corp | METHODS AND APPARATUS FOR FACIAL DETECTION |
CN102324043B (zh) * | 2011-09-07 | 2013-12-18 | 北京邮电大学 | Image matching method based on DCT feature descriptors and optimized spatial quantization |
AU2011253779A1 (en) * | 2011-12-01 | 2013-06-20 | Canon Kabushiki Kaisha | Estimation of shift and small image distortion |
US9235780B2 (en) * | 2013-01-02 | 2016-01-12 | Samsung Electronics Co., Ltd. | Robust keypoint feature selection for visual search with self matching score |
US20140204013A1 (en) * | 2013-01-18 | 2014-07-24 | Microsoft Corporation | Part and state detection for gesture recognition |
KR102221152B1 (ko) * | 2014-03-18 | 2021-02-26 | 에스케이플래닛 주식회사 | Apparatus and method for providing staging effects based on object posture, and recording medium storing a computer program therefor |
KR102449533B1 (ko) * | 2015-05-28 | 2022-10-04 | 삼성전자주식회사 | Electronic device and method for controlling execution of an application in the electronic device |
JP2017059207A (ja) * | 2015-09-18 | 2017-03-23 | Panasonic Intellectual Property Corporation of America | Image recognition method |
CN105718960B (zh) * | 2016-01-27 | 2019-01-04 | 北京工业大学 | Image ranking model based on convolutional neural networks and spatial pyramid matching |
CN106845440B (zh) * | 2017-02-13 | 2020-04-10 | 山东万腾电子科技有限公司 | Augmented reality image processing method and system |
CN107038681B (zh) * | 2017-05-31 | 2020-01-10 | Oppo广东移动通信有限公司 | Image blurring method, apparatus, computer-readable storage medium, and computer device |
US10706535B2 (en) * | 2017-09-08 | 2020-07-07 | International Business Machines Corporation | Tissue staining quality determination |
2017
- 2017-07-14 CN CN201710573908.5A patent/CN107330439B/zh active Active
2018
- 2018-07-10 KR KR1020197030205A patent/KR102319177B1/ko active IP Right Grant
- 2018-07-10 EP EP18832199.6A patent/EP3576017A4/en active Pending
- 2018-07-10 JP JP2019541339A patent/JP6789402B2/ja active Active
- 2018-07-10 WO PCT/CN2018/095191 patent/WO2019011249A1/zh unknown
2019
- 2019-08-05 US US16/531,434 patent/US11107232B2/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103927534A (zh) * | 2014-04-26 | 2014-07-16 | 无锡信捷电气股份有限公司 | Online visual detection method for inkjet-printed characters based on a convolutional neural network |
CN104268538A (zh) * | 2014-10-13 | 2015-01-07 | 江南大学 | Online visual detection method for dot-matrix inkjet characters on beverage cans |
US20160170492A1 (en) * | 2014-12-15 | 2016-06-16 | Aaron DeBattista | Technologies for robust two-dimensional gesture recognition |
CN105512676A (zh) * | 2015-11-30 | 2016-04-20 | 华南理工大学 | Food recognition method on an intelligent terminal |
CN106683091A (zh) * | 2017-01-06 | 2017-05-17 | 北京理工大学 | Target classification and posture detection method based on a deep convolutional neural network |
CN107330439A (zh) * | 2017-07-14 | 2017-11-07 | 腾讯科技(深圳)有限公司 | Method for determining object posture in image, client, and server |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109903375A (zh) * | 2019-02-21 | 2019-06-18 | Oppo广东移动通信有限公司 | Model generation method, apparatus, storage medium, and electronic device |
CN109903375B (zh) * | 2019-02-21 | 2023-06-06 | Oppo广东移动通信有限公司 | Model generation method, apparatus, storage medium, and electronic device |
CN110751223A (zh) * | 2019-10-25 | 2020-02-04 | 北京达佳互联信息技术有限公司 | Image matching method, apparatus, electronic device, and storage medium |
CN110751223B (zh) * | 2019-10-25 | 2022-09-30 | 北京达佳互联信息技术有限公司 | Image matching method, apparatus, electronic device, and storage medium |
WO2021098831A1 (zh) * | 2019-11-22 | 2021-05-27 | 乐鑫信息科技(上海)股份有限公司 | Target detection system suitable for embedded devices |
CN111507908A (zh) * | 2020-03-11 | 2020-08-07 | 平安科技(深圳)有限公司 | Image rectification processing method, apparatus, storage medium, and computer device |
CN111507908B (zh) * | 2020-03-11 | 2023-10-20 | 平安科技(深圳)有限公司 | Image rectification processing method, apparatus, storage medium, and computer device |
CN112446433A (zh) * | 2020-11-30 | 2021-03-05 | 北京数码视讯技术有限公司 | Method, apparatus, and electronic device for determining the accuracy of a training posture |
CN114037740A (zh) * | 2021-11-09 | 2022-02-11 | 北京字节跳动网络技术有限公司 | Image data stream processing method, apparatus, and electronic device |
Also Published As
Publication number | Publication date |
---|---|
JP6789402B2 (ja) | 2020-11-25 |
KR20190128686A (ko) | 2019-11-18 |
CN107330439B (zh) | 2022-11-04 |
EP3576017A4 (en) | 2020-12-30 |
US20190355147A1 (en) | 2019-11-21 |
KR102319177B1 (ko) | 2021-10-28 |
JP2020507850A (ja) | 2020-03-12 |
CN107330439A (zh) | 2017-11-07 |
US11107232B2 (en) | 2021-08-31 |
EP3576017A1 (en) | 2019-12-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019011249A1 (zh) | Method, apparatus, device, and storage medium for determining object posture in image | |
US11481869B2 (en) | Cross-domain image translation | |
US10769411B2 (en) | Pose estimation and model retrieval for objects in images | |
CN108229296B (zh) | Facial skin attribute recognition method and apparatus, electronic device, and storage medium | |
US10204423B2 (en) | Visual odometry using object priors | |
US10373380B2 (en) | 3-dimensional scene analysis for augmented reality operations | |
CN109683699B (zh) | Method, apparatus, and mobile terminal for implementing augmented reality based on deep learning | |
TWI587205B (zh) | Method and system of three-dimensional interaction based on identification code | |
WO2018137623A1 (zh) | Image processing method and apparatus, and electronic device | |
WO2019020075A1 (zh) | Image processing method, apparatus, storage medium, computer program, and electronic device | |
US10726599B2 (en) | Realistic augmentation of images and videos with graphics | |
US20200111234A1 (en) | Dual-view angle image calibration method and apparatus, storage medium and electronic device | |
WO2020134818A1 (zh) | Image processing method and related products | |
CN113688907B (zh) | Model training and video processing method, apparatus, device, and storage medium | |
CN111459269B (zh) | Augmented reality display method, system, and computer-readable storage medium | |
US20210334569A1 (en) | Image depth determining method and living body identification method, circuit, device, and medium | |
CN113793370B (zh) | Three-dimensional point cloud registration method, apparatus, electronic device, and readable medium | |
CN114627173A (zh) | Data augmentation for object detection via differentiable neural rendering | |
WO2022052782A1 (zh) | Image processing method and related device | |
EP4107650A1 (en) | Systems and methods for object detection including pose and size estimation | |
CN113436251B (zh) | Pose estimation system and method based on an improved YOLO6D algorithm | |
WO2022063321A1 (zh) | Image processing method, apparatus, device, and storage medium | |
CN112102145A (zh) | Image processing method and apparatus | |
CN111260544B (zh) | Data processing method and apparatus, electronic device, and computer storage medium | |
US11769263B2 (en) | Three-dimensional scan registration with deformable models |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application |
Ref document number: 18832199 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2019541339 Country of ref document: JP Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 2018832199 Country of ref document: EP Effective date: 20190829 |
|
ENP | Entry into the national phase |
Ref document number: 20197030205 Country of ref document: KR Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |