WO2024113290A1 - Image processing method, apparatus, interactive device, electronic device and storage medium - Google Patents

Image processing method, apparatus, interactive device, electronic device and storage medium

Info

Publication number
WO2024113290A1
Authority
WO
WIPO (PCT)
Prior art keywords
dimensional
target
key points
face
coordinate system
Prior art date
Application number
PCT/CN2022/135733
Other languages
English (en)
French (fr)
Inventor
马思研
张�浩
李鑫恺
吕耀宇
李言
Original Assignee
京东方科技集团股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司
Priority to PCT/CN2022/135733
Publication of WO2024113290A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G3/00 Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes
    • G09G3/20 Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes for presentation of an assembly of a number of characters, e.g. a page, by composing the assembly by combination of individual elements arranged in a matrix no fixed position being assigned to or needed to be assigned to the individual characters or partial characters
    • G09G3/34 Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes for presentation of an assembly of a number of characters, e.g. a page, by composing the assembly by combination of individual elements arranged in a matrix no fixed position being assigned to or needed to be assigned to the individual characters or partial characters by control of light from an independent source
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/30 Image reproducers
    • H04N13/302 Image reproducers for viewing without the aid of special glasses, i.e. using autostereoscopic displays

Definitions

  • the present disclosure relates to the field of image processing technology, and in particular to an image processing method, apparatus, interactive device, electronic device, storage medium and computer program product.
  • Augmented reality (AR) devices, virtual reality (VR) devices, and three-dimensional (3D) screen interactive devices usually need to obtain the relative position of the user's face in real time, and adjust the display effect according to the relative position of the face to provide users with a more realistic experience.
  • a naked-eye 3D screen can adjust the opening and closing of the screen's internal grating according to the real-time position of the user's face, head, or pupil, so as to present the user with the best naked-eye 3D viewing effect under the current viewing position.
  • the present disclosure provides a method, apparatus, interactive device, electronic device, storage medium and computer program product for image processing of a display panel.
  • the present disclosure provides an image processing method, including: constructing an initial three-dimensional face template using multiple sample face images; iteratively optimizing the initial three-dimensional face template using the face image of a target object to obtain a target three-dimensional face template; and determining the face posture of the target object at the current moment based on the correspondence between the face image of the target object at the current moment and the target three-dimensional face template.
  • the method of using multiple sample face images to construct an initial three-dimensional face template includes: obtaining multiple three-dimensional sample key points from each sample face image of the multiple sample face images; determining an average three-dimensional face template based on the multiple three-dimensional sample key points of the multiple sample face images; determining a feature matrix of the multiple face sample images using the average three-dimensional face template; and constructing an initial three-dimensional face template based on an iteration parameter, the average three-dimensional face template and the feature matrix.
  • determining a feature matrix of multiple face sample images includes: using the average three-dimensional face template to decentralize multiple three-dimensional sample key points of multiple sample face images to obtain a covariance matrix; calculating multiple eigenvalues of the covariance matrix and multiple eigenvectors corresponding to the multiple eigenvalues; determining multiple valid eigenvectors from the multiple eigenvectors based on contribution values of the multiple eigenvalues to the linear projection in the covariance matrix, wherein the sum of contribution values of the multiple eigenvalues corresponding to the multiple valid eigenvectors is greater than a preset contribution value; and constructing a feature matrix based on the multiple valid eigenvectors.
  • the initial three-dimensional face template is iteratively optimized using the face image of the target object to obtain the target three-dimensional face template, including: obtaining multiple two-dimensional target key points from the face image of the target object; determining multiple three-dimensional key points from the initial three-dimensional face template; projecting the multiple three-dimensional key points into multiple two-dimensional projection key points; calculating the average error between the multiple two-dimensional projection key points and the multiple two-dimensional target key points; and iteratively optimizing the initial three-dimensional face template based on the average error to obtain the target three-dimensional face template.
  • projecting multiple three-dimensional key points into multiple two-dimensional projection key points includes: constructing a weak perspective projection model according to the coordinate values of the three-dimensional key points, the scaling scale, the coordinate system rotation matrix and the center point offset vector of the pixel coordinate system; and projecting the multiple three-dimensional key points into multiple two-dimensional projection key points through the weak perspective projection model.
  • the weak perspective projection model includes projecting a plurality of three-dimensional key points into a plurality of two-dimensional projection key points according to the following formula:
  • x and y are the coordinate values of the x-axis and y-axis of the two-dimensional projection key point in the pixel coordinate system
  • X, Y and Z are the coordinate values of the x-axis, y-axis and z-axis of the three-dimensional key point in the coordinate system of the target object
  • tx and ty are the offset vectors of the pixel coordinate system origin relative to the camera coordinate system origin on the x-axis and y-axis respectively.
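  • The formula itself is reproduced only as an image in the source text. Consistent with the definitions of x, y, X, Y, Z, tx and ty above and the scaling scale and rotation matrix named in the preceding paragraph, the standard weak perspective projection has the form below, where r11 through r23 denote the first two rows of the rotation matrix; this LaTeX reconstruction is an editorial assumption rather than the verbatim patent formula.

        \begin{aligned}
        x &= \mathrm{scale}\,(r_{11}X + r_{12}Y + r_{13}Z) + t_x \\
        y &= \mathrm{scale}\,(r_{21}X + r_{22}Y + r_{23}Z) + t_y
        \end{aligned}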
  • the initial three-dimensional face template is iteratively optimized to obtain a target three-dimensional face template, including: constructing an iterative model based on a weak perspective projection model and iteration parameters; determining a mapping function between the iterative model and multiple two-dimensional projection key points; calculating the Jacobian matrix of the mapping function to obtain the iteratively optimized two-dimensional iteration key points; calculating the average error based on the two-dimensional iteration key points and multiple two-dimensional target key points from the face image; when it is determined that the average error does not meet the convergence conditions, updating the parameters of the iterative model along the descending gradient direction of the Jacobian matrix to obtain the updated iterative model, and returning the operation of determining the mapping function between the iterative model and multiple two-dimensional projection key points; and when it is determined that the average error meets the convergence conditions, determining the iteration parameters, and constructing the target three-dimensional face template based on the iteration parameters.
  • mapping function includes the following formula:
  • updating the parameters of the iterative model along the descending gradient direction of the Jacobian matrix to obtain the updated iterative model includes: calculating the parameter change of the iterative model according to the descending gradient direction of the Jacobian matrix and the average error; and updating the parameters of the iterative model according to the parameter change to obtain the updated iterative model.
  • updating the parameters of the iterative model according to the parameter change amount to obtain the updated iterative model includes updating the iterative model according to the following formula:
  • calculating the average error between multiple two-dimensional projection key points and multiple two-dimensional target key points includes: calculating reprojection errors based on the multiple two-dimensional projection key points and the multiple two-dimensional target key points; and calculating the average error based on the reprojection errors.
  • calculating the average error includes calculating the average error according to the following formula:
  • error is the average error
  • proj_err is the reprojection error
  • proj_err = landmarks_2D - current_shape_2D, where landmarks_2D is the coordinate value of the two-dimensional target key point, and current_shape_2D is the coordinate value of the two-dimensional projection key point.
  • determining the current facial posture of the target object includes: determining multiple preset three-dimensional key points of the target object from multiple three-dimensional key points of the target three-dimensional facial template, the multiple preset three-dimensional key points are in the target coordinate system, and the multiple preset three-dimensional key points correspond to multiple specified two-dimensional key points of the current facial image of the target object; according to the correspondence between the pixel coordinate system of the facial image from the camera and the target coordinate system, determining the transformation matrix between the camera coordinate system and the target coordinate system; and according to the transformation matrix, converting the multiple preset three-dimensional key points into multiple target three-dimensional key points, the multiple target three-dimensional key points are in the camera coordinate system; and determining the current facial posture of the target object according to the multiple target three-dimensional key points.
  • the transformation matrix between the camera coordinate system and the target coordinate system is determined, including determining the transformation matrix according to the following formula:
  • c is the scale of the camera
  • x and y are the coordinate values of the two-dimensional projection key point on the x-axis and y-axis of the pixel coordinate system
  • X, Y and Z are the coordinate values of the preset three-dimensional key point on the x-axis, y-axis and z-axis of the target coordinate system
  • K is the camera intrinsic parameter matrix
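  • The formula itself is reproduced only as an image in the source text. A standard pinhole projection consistent with these definitions (camera scale c, intrinsic matrix K, and the rotation R and translation t that make up the transformation matrix) would read as follows; this is an editorial reconstruction, not the verbatim patent formula.

        c\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = K\,[\,R \mid t\,]\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}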
  • obtaining multiple two-dimensional target key points from a face image of a target object includes: performing distortion correction on the face image to obtain a corrected face image; and determining multiple two-dimensional target key points from the corrected face image using a key point detection algorithm.
  • performing distortion correction on a face image to obtain a corrected face image includes:
  • the distortion of the face image is corrected according to the following formula:
  • x0 and y0 are the coordinate values of any coordinate point on the face image on the x-axis and y-axis
  • x and y are the coordinate values of any coordinate point on the face image after correction on the x-axis and y-axis
  • r is the distance between the center point of the face image and the coordinate point (x, y)
  • k1 , k2 and k3 are radial distortion coefficients
  • p1 and p2 are tangential distortion coefficients.
  • the present disclosure provides an image processing device, including: a construction module, used to construct an initial three-dimensional face template using multiple sample face images; an iteration module, used to iteratively optimize the initial three-dimensional face template using the face image of the target object to obtain the target three-dimensional face template; and a determination module, used to determine the current face posture of the target object based on the correspondence between the current face image of the target object and the target three-dimensional face template.
  • the present disclosure provides an interactive device, including: a camera, used to obtain a facial image of a target object; a processor, electrically connected to the camera, used to: use the facial image to iteratively optimize an initial three-dimensional facial template to obtain a target three-dimensional facial template; use the target three-dimensional facial template to estimate the facial posture to obtain the pupil coordinates of the target object; and calculate a grating opening and closing sequence based on the pupil coordinates; a driving circuit, electrically connected to the processor, used to control an output interface to output a grating opening and closing sequence; and a screen, electrically connected to the driving circuit, used to control the opening and closing of the grating in the screen according to the grating opening and closing sequence.
  • the present disclosure provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in the embodiments of the present disclosure.
  • the present disclosure provides a computer-readable storage medium having executable instructions stored thereon, which, when executed by a processor, enables the processor to implement the method described in an embodiment of the present disclosure.
  • the present disclosure provides a computer program product, including a computer program, which implements the method described in the embodiments of the present disclosure when the computer program is executed by a processor.
  • FIG1 shows a flow chart of an image processing method according to an embodiment of the present disclosure
  • FIG2A shows an application scenario diagram of an image processing method according to an embodiment of the present disclosure
  • FIG2B shows a schematic flow chart of an image processing method according to an embodiment of the present disclosure
  • FIG2C is a schematic diagram showing the distribution of key points of a face image according to an embodiment of the present disclosure.
  • FIG3 shows a flowchart of constructing an initial three-dimensional face template according to an embodiment of the present disclosure
  • FIG4 shows a flowchart of iterating an initial 3D face template according to an embodiment of the present disclosure
  • FIG5 shows a flowchart of iterating an initial 3D face template according to another embodiment of the present disclosure
  • FIG6A shows a flowchart of determining the facial pose of a target object according to an embodiment of the present disclosure
  • FIG6B shows a schematic diagram of transformation from a target coordinate system to a camera coordinate system according to an embodiment of the present disclosure
  • FIG7 shows a structural block diagram of an image processing apparatus according to an embodiment of the present disclosure.
  • FIG8 shows a block diagram of an electronic device suitable for implementing an image processing method according to an embodiment of the present disclosure.
  • the user's authorization or consent is obtained before obtaining or collecting the user's personal information.
  • FIG. 1 shows a flow chart of an image processing method according to an embodiment of the present disclosure.
  • the image processing method may include the following steps S110 to S130. It should be noted that the sequence numbers of the steps in the following method are only used as representations of the steps for the purpose of description, and should not be regarded as representing the execution order of the steps. Unless explicitly stated, the method does not need to be executed in the order shown.
  • step S110 an initial three-dimensional face template is constructed using a plurality of sample face images.
  • the multiple sample face images may be face images of multiple users collected in advance.
  • the multiple sample face images may be derived from an open source Large Scale 3D Faces in-the-Wild (LS3D-W) dataset.
  • the multiple sample face images include two-dimensional face images and three-dimensional face images. From the multiple sample face images, two-dimensional coordinate points and three-dimensional coordinate points located on the sample face images may be obtained.
  • an initial 3D face template may be constructed based on multiple 2D coordinate points and 3D coordinate points from multiple sample face data.
  • the initial 3D face template may be a general 3D face template used to characterize average features of face images.
  • step S120 the initial three-dimensional face template is iteratively optimized using the face image of the target object to obtain a target three-dimensional face template.
  • the target object is a user of a head-mounted display device (e.g., AR or VR) or a 3D screen interactive device.
  • a camera is provided on the head-mounted display device or the 3D screen interactive device.
  • the head-mounted display device or the 3D screen interactive device captures the user's facial image through the camera.
  • the camera may be a depth sensor (depth camera), a binocular camera, a monocular camera, a laser radar, and the like.
  • a user's face image is captured by a monocular camera, and the face image is a two-dimensional face image.
  • the two-dimensional face image describes the facial features of the user.
  • step S130 the current face posture of the target object is determined according to the correspondence between the current face image of the target object and the target three-dimensional face template.
  • the current face image of the target object is a two-dimensional face image that can be taken by a camera at the current moment.
  • the current face position of the target object is estimated.
  • the position and posture of the corresponding pupil part is estimated in the target three-dimensional face template.
  • the face pose determined by the target three-dimensional face template is the face pose in the target coordinate system where the user is located.
  • the head mounted display device or 3D screen interactive device estimates the face pose of the user using the target face template to determine the face pose in the camera coordinate system where the camera is located.
  • a head-mounted display device or a 3D screen interactive device can determine the user's current facial posture based on the facial posture data in the camera coordinate system, such as facial orientation and facial expression information, and provide interactive services to the user based on the user's current facial posture.
  • a head-mounted display device or a 3D screen interactive device can use a pose estimation algorithm to estimate the pose of a face using a target face template.
  • pose estimation algorithms include 3D target detection based on point clouds, template matching based on point clouds, and perspective-n-points (PNP) pose estimation algorithms based on a single image.
  • the real-time pose of the pupil in the face is determined.
  • the 3D screen interactive device can adjust the opening and closing of the grating inside the screen based on the real-time pose of the pupil to present the best naked-eye 3D viewing effect to the user in the current viewing pose.
  • the user's real-time face image is used to iteratively optimize the initial 3D face template to obtain a target 3D face template that can characterize the user's current face posture characteristics.
  • Through the target 3D face template, the user's current face posture can be accurately determined, and the face posture error can be reduced, thereby providing the user with a better 3D visual effect.
  • Fig. 2A shows an application scenario diagram of an image processing method according to an embodiment of the present disclosure.
  • Fig. 2B shows a flow chart of an image processing method according to an embodiment of the present disclosure.
  • Fig. 2C shows a schematic diagram of the key point distribution of a face image according to an embodiment of the present disclosure.
  • the camera 230 installed on the 3D screen interaction device 220 captures the facial image of the user 210.
  • the facial image captured by the camera 230 is a two-dimensional facial image.
  • the camera 230 sends the two-dimensional facial image to the 3D screen interaction device 220, and the 3D screen interaction device 220 iteratively optimizes the initial three-dimensional facial template according to the two-dimensional facial image to obtain a target three-dimensional facial template corresponding to the user 210.
  • the 3D screen interaction device 220 estimates the facial pose of the user 210 using the target three-dimensional facial template, obtains the three-dimensional facial pose in the camera coordinate system, and provides 3D visual services for the user 210 according to the three-dimensional facial pose.
  • FIG. 2B shows a schematic diagram of image processing performed by, for example, a 3D screen interaction device 220 .
  • the 3D screen interactive device 220 obtains a two-dimensional face image of the user from the camera 230, and processes the two-dimensional face image by the mainboard of the 3D screen interactive device 220 to obtain a three-dimensional face posture in the camera coordinate system.
  • a control signal is sent to the screen grating based on the three-dimensional face posture in the camera coordinate system to control the opening and closing of the screen grating.
  • the main board may include an application processor (AP) main board, in which a CPU performs the processing operations.
  • the CPU converts the two-dimensional face image from the camera from the NV21 format to the Mat format.
  • the CPU performs distortion correction on the Mat format image to obtain the corrected face image, and then uses the key point detection algorithm to determine multiple two-dimensional target key points from the corrected face image.
  • pose estimation is performed under a camera model without distortion.
  • the camera has tangential distortion because the lens is not perfectly parallel to the image plane, and radial distortion due to the bending of light.
  • the present invention provides a method for correcting image distortion.
  • the distortion of the face image is corrected according to the following formula (1):
  • x0 and y0 are the x-axis and y-axis coordinate values of any coordinate point on the face image before correction in the pixel coordinate system.
  • x and y are the x-axis and y-axis coordinate values of any coordinate point on the face image after correction in the pixel coordinate system.
  • the pixel coordinate system is the coordinate system of the two-dimensional face image captured by the camera.
  • k1, k2 and k3 are radial distortion coefficients
  • p1 and p2 are tangential distortion coefficients.
  • k1, k2, k3, p1 and p2 are fixed parameters of the camera. Those skilled in the art can obtain the fixed parameters of the camera by any method in the art.
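  • As an illustrative sketch only (not the patent's reference implementation), the NV21-to-Mat conversion and the distortion correction described above can be carried out with OpenCV; camera_matrix and dist_coeffs are assumed to come from a prior calibration, with dist_coeffs in OpenCV's (k1, k2, p1, p2, k3) order.

        import cv2
        import numpy as np

        def correct_distortion(nv21_bytes, width, height, camera_matrix, dist_coeffs):
            """Convert an NV21 frame to a BGR Mat and remove lens distortion."""
            # NV21 stores the Y plane followed by interleaved VU samples; OpenCV
            # expects it as a (height * 3 / 2) x width single-channel image.
            yuv = np.frombuffer(nv21_bytes, dtype=np.uint8).reshape(height * 3 // 2, width)
            bgr = cv2.cvtColor(yuv, cv2.COLOR_YUV2BGR_NV21)

            # Remove radial (k1, k2, k3) and tangential (p1, p2) distortion.
            corrected = cv2.undistort(bgr, camera_matrix, dist_coeffs)
            return corrected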
  • When determining the two-dimensional target key points, the CPU needs to first perform face detection on the Mat format image to determine the face area in the Mat format image, and then perform key point detection on the image of the face area through the face key point detection algorithm to obtain multiple two-dimensional target key points.
  • the number of two-dimensional target key points can be 5 key points, 21 key points, 49 key points or 68 key points. According to the actual detection requirements, other numbers of key points can also be selected, such as tens of thousands of key points.
  • the key points are distributed in areas such as the eyes, the nose tip, the left and right corners of the mouth, and the eyebrows. Multiple key points need to be distributed in multiple planes so that the key points can more accurately describe the facial features.
  • 68 facial key points are used to ensure both the real-time requirements and the accuracy requirements of the algorithm.
  • the distribution and order of the 68 facial key points are shown in FIG2C .
  • the face detection algorithm and the key point detection algorithm may include the implementation of the CascadeClassifier class and the Facemark class in the OpenCV library.
  • CascadeClassifier can perform face detection based on a cascade classifier of features such as Haar, LBP, and HOG.
  • Facemark can perform key point detection based on local binary features LBF and cascaded random forest global linear regression.
  • the face detection algorithm and the key point detection algorithm modules can also be replaced by other algorithms (faster or more accurate algorithms) according to actual needs.
  • the present disclosure does not limit the face detection algorithm and the key point detection algorithm.
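  • A minimal sketch of this detection pipeline with the OpenCV classes named above is given below; it assumes the opencv-contrib build (which provides the cv2.face module) and pretrained model files whose paths are illustrative.

        import cv2

        # Illustrative paths to pretrained models distributed with OpenCV / opencv-contrib.
        face_detector = cv2.CascadeClassifier("haarcascade_frontalface_alt2.xml")
        facemark = cv2.face.createFacemarkLBF()
        facemark.loadModel("lbfmodel.yaml")

        def detect_68_landmarks(bgr_image):
            """Detect the face area, then 68 two-dimensional target key points."""
            gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
            faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            if len(faces) == 0:
                return None
            ok, landmarks = facemark.fit(bgr_image, faces)  # list of 1 x 68 x 2 arrays
            return landmarks[0][0] if ok else None          # 68 x 2 array of (x, y) pixels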
  • the CPU iteratively optimizes the initial 3D face template to obtain the optimal target 3D face template.
  • the target 3D face template has the best matching effect with the user's current facial features.
  • the CPU can estimate the user's face pose based on the target 3D face template and the PNP pose estimation algorithm, obtain the 3D face pose of the face image in the camera coordinate system, and calculate the coordinates of the user's left and right pupils based on the pose in the camera coordinate system.
  • the 3D face pose can be described by 68 3D key points corresponding to 68 2D target key points.
  • the 68 facial key points do not include the positions of the centers of the pupils of both eyes.
  • the CPU calculates the centroid (mean) of the 3D coordinates of the six key points (key points 36 to 41) around the left eye of the face to determine the 3D coordinates P l of the user's left pupil in the camera coordinate system.
  • the CPU calculates the centroid of the 3D coordinates of the six key points (key points 42 to 47) around the right eye of the face to determine the coordinates P r of the user's right pupil in the camera coordinate system.
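  • A sketch of this centroid computation, assuming the 68 key points in the camera coordinate system are stored as a 68 x 3 array ordered as in FIG. 2C (left-eye contour at indices 36 to 41, right-eye contour at indices 42 to 47):

        import numpy as np

        def pupil_centers(keypoints_3d_cam):
            """keypoints_3d_cam: (68, 3) array of 3D key points in the camera coordinate system."""
            p_left = keypoints_3d_cam[36:42].mean(axis=0)   # centroid of key points 36..41
            p_right = keypoints_3d_cam[42:48].mean(axis=0)  # centroid of key points 42..47
            return p_left, p_right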
  • the grating opening and closing sequence that presents the best naked eye 3D effect to the user at this time is calculated.
  • the grating opening and closing sequence may include a sequence of 0s and 1s, where 0 represents the corresponding grating being closed and 1 represents the corresponding grating being open.
  • the CPU transmits the grating opening and closing sequence to the whole machine driver of the mainboard, and the whole machine driver controls the output interface of the mainboard to output high and low levels to control the opening and closing of the grating on the screen.
  • the output interface may be a general-purpose input/output (GPIO) interface.
  • FIG. 3 shows a flow chart of constructing an initial three-dimensional face template according to an embodiment of the present disclosure.
  • step S110 uses a plurality of sample face images to construct an initial three-dimensional face template, which may include the following steps S310 to S340 .
  • step S310 a plurality of three-dimensional sample key points are obtained from each sample face image of the plurality of sample face images.
  • step S320 an average 3D face template is determined based on a plurality of 3D sample key points of a plurality of sample face images.
  • step S330 the feature matrix of the plurality of face sample images is determined using the average three-dimensional face template.
  • step S340 an initial three-dimensional face template is constructed according to the iteration parameters, the average three-dimensional face template and the feature matrix.
  • the LS3D-W data set can be used as the data source of the average face model to obtain multiple sample face images.
  • 68 three-dimensional sample key points are obtained from each sample face image. It should be noted that the distribution of the 68 three-dimensional sample key points from each sample face image is the same. For example, among the 68 three-dimensional sample key points from each sample face image, the periphery of the left eye of the face includes 6 key points (key points 36 to 41), and the periphery of the right eye of the face includes 6 key points (key points 42 to 47).
  • Each key point includes three coordinate values (X, Y and Z) of the key point in the target coordinate system. The average value of all key points located at the same face position across the sample images is calculated to determine the average 3D face template mean_shape.
  • the average 3D face template mean_shape includes three coordinate values of 68 average key points, and the coordinate unit can be mm.
  • the average 3D face template mean_shape may be a column vector [X0, X1, ..., X67, Y0, Y1, ..., Y67, Z0, Z1, ..., Z67]^T with a dimension of 204*1, where X0, Y0 and Z0 are the coordinate values of the first average key point respectively.
  • an average three-dimensional face template is used to determine a feature matrix of multiple face sample images, including: using the average three-dimensional face template, decentralized processing is performed on multiple three-dimensional sample key points of the multiple sample face images to obtain a covariance matrix; multiple eigenvalues of the covariance matrix and multiple eigenvectors corresponding to the multiple eigenvalues are calculated; multiple valid eigenvectors are determined from the multiple eigenvectors based on contribution values of linear projections of the multiple eigenvalues in the covariance matrix, and the sum of contribution values of the multiple eigenvalues corresponding to the multiple valid eigenvectors is greater than a preset contribution value; and a feature matrix is constructed based on the multiple valid eigenvectors.
  • the principal component analysis algorithm is used to analyze the facial key point data in the LS3D-W dataset to reduce the linear dimension of the key point data. For example, by linear projection, high-dimensional data is mapped to a low-dimensional space, so as to use fewer data dimensions and retain more original data point characteristics.
  • the difference between the 68 key points of each sample face image and the 68 average key points of the average 3D face template mean_shape is calculated to achieve decentralization (removing the mean).
  • a covariance matrix is constructed from the 68 key points of each decentralized sample face image, and multiple eigenvalues of the covariance matrix and eigenvectors corresponding to the eigenvalues are solved.
  • Each eigenvalue of the covariance matrix represents a contribution value to the linear projection.
  • valid eigenvalues are selected from the multiple eigenvalues. For example, the eigenvalues are sorted in descending order and the sum of the largest N eigenvalues is calculated. When it is determined that the proportion of the sum of the first N eigenvalues to the sum of all eigenvalues is greater than or equal to 99%, these N eigenvalues are selected as valid eigenvalues.
  • the preset contribution value can be 99%. Those skilled in the art can also set other contribution values according to actual needs. The present disclosure does not limit this.
  • N is recorded as num, and the first num eigenvalues are selected as valid eigenvalues, where N is a positive integer.
  • num represents the minimum feature dimension of the 204-dimensional feature of the three-dimensional face image.
  • the eigenvectors corresponding to the first num eigenvalues are valid eigenvectors, and num valid eigenvectors form a feature matrix pv, and the dimension of the feature matrix is 204*num.
  • the initial three-dimensional face template current_shape_3D can be expressed by formula (2):
  • params is an iteration parameter, which represents the changing characteristics of the face image at different times.
  • the dimension of the initial 3D face template current_shape_3D is 204*1.
  • the dimension of the iteration parameter params is num*1.
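  • Formula (2) is reproduced only as an image in the source text; from the stated dimensions (mean_shape and current_shape_3D are 204*1, pv is 204*num, params is num*1), it is presumably of the form current_shape_3D = mean_shape + pv × params (matrix-vector product). The sketch below illustrates the described construction with NumPy under that assumption; sample_shapes is a hypothetical (M, 204) array holding one flattened [X0...X67, Y0...Y67, Z0...Z67] key-point set per sample face image.

        import numpy as np

        def build_initial_template(sample_shapes, preset_contribution=0.99):
            """sample_shapes: (M, 204) array, one flattened 68-point shape per sample image."""
            # Average 3D face template (204-dimensional).
            mean_shape = sample_shapes.mean(axis=0)

            # Decentralize (remove the mean) and build the covariance matrix.
            centered = sample_shapes - mean_shape
            cov = np.cov(centered, rowvar=False)             # 204 x 204

            # Eigen-decomposition, sorted by descending eigenvalue (contribution value).
            eigvals, eigvecs = np.linalg.eigh(cov)
            order = np.argsort(eigvals)[::-1]
            eigvals, eigvecs = eigvals[order], eigvecs[:, order]

            # Keep the first num eigenvectors whose cumulative contribution reaches the preset value.
            ratio = np.cumsum(eigvals) / eigvals.sum()
            num = int(np.searchsorted(ratio, preset_contribution) + 1)
            pv = eigvecs[:, :num]                            # feature matrix, 204 x num

            params = np.zeros(num)                           # iteration parameters
            current_shape_3d = mean_shape + pv @ params      # assumed form of formula (2)
            return mean_shape, pv, params, current_shape_3d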
  • FIG. 4 shows a flowchart of iterating an initial 3D face template according to an embodiment of the present disclosure.
  • step S120 iteratively optimizes the initial three-dimensional face template using the face image of the target object to obtain the target three-dimensional face template, which may include the following steps S410 to S440 .
  • step S410 a plurality of two-dimensional target key points are obtained from a face image of a target object.
  • step S420 a plurality of three-dimensional key points are determined from the initial three-dimensional face template.
  • step S430 the plurality of three-dimensional key points are projected into a plurality of two-dimensional projection key points.
  • step S440 the average error between the plurality of two-dimensional projection key points and the plurality of two-dimensional target key points is calculated.
  • step S450 the initial three-dimensional face template is iteratively optimized according to the average error to obtain a target three-dimensional face template.
  • the two-dimensional target key points may be obtained from the face image after distortion correction.
  • 68 two-dimensional target key points are obtained from the face image after distortion correction.
  • the multiple three-dimensional key points determined from the initial three-dimensional face template may be unknown key points.
  • the coordinate value of the three-dimensional key point may be a function (X, Y, Z) of the iteration parameter params.
  • multiple 3D key points from the initial 3D face template can be expressed by formula (3):
  • projecting multiple three-dimensional key points into multiple two-dimensional projection key points may include constructing a weak perspective projection model params_global according to the coordinate values of the three-dimensional key points, the scaling scale, the coordinate system rotation matrix, and the center point offset vector of the pixel coordinate system; and projecting the multiple three-dimensional key points into multiple two-dimensional projection key points through the weak perspective projection model.
  • the weak perspective projection model can be expressed by equation (5):
  • params_global = [scale, Rx, Ry, Rz, tx, ty]   (5)
  • scale is the scaling scale
  • Rx is the rotation of the x-axis of the target coordinate system relative to the x-axis of the camera coordinate system
  • Ry is the rotation of the y-axis of the target coordinate system relative to the y-axis of the camera coordinate system
  • Rz is the rotation of the z-axis of the target coordinate system relative to the z-axis of the camera coordinate system
  • tx and ty are the offset vectors of the origin of the pixel coordinate system relative to the origin of the camera coordinate system on the x-axis and y-axis respectively.
  • the origin of the pixel coordinate system may be located at the upper left corner of the two-dimensional face image, and the origin of the camera coordinate system is located at the center of the camera optical axis. Therefore, the pixel coordinates of a coordinate point on the camera optical axis, such as the point (0, 0) of the camera coordinate system, are (tx, ty) in the pixel coordinate system.
  • x and y are the coordinate values of the x-axis and y-axis of the two-dimensional projection key point in the pixel coordinate system.
  • X, Y and Z are the coordinate values of the x-axis, y-axis and z-axis of the three-dimensional key point in the target coordinate system.
  • X, Y and Z can be the coordinate values of the 68 average key points of the initial three-dimensional face template.
  • Formula (7) indicates that the three-dimensional key points from the initial three-dimensional face template current_shape_3D are projected onto the pixel coordinate system plane to obtain a two-dimensional projection key point matrix current_shape_2D, and the dimension of the two-dimensional projection key point matrix current_shape_2D is 2*68. Since the initial three-dimensional face template current_shape_3D is a function (X, Y, Z) of the iteration parameter params, the two-dimensional projection key points in the two-dimensional projection key point matrix current_shape_2D are also functions (x, y) of the iteration parameter params.
  • the two-dimensional projection key point matrix current_shape_2D can be expressed by equation (8):
  • the two-dimensional target key point matrix landmarks_2D consisting of 68 two-dimensional target key points obtained from the distortion-corrected face image can be expressed by formula (9):
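  • Formulas (6) to (9) are reproduced only as images in the source text. The sketch below illustrates the projection step they describe, under the standard weak perspective model assumed earlier (Euler angles Rx, Ry, Rz composed into a rotation matrix, scaling by scale, and translation by tx, ty in pixels); the Euler composition order is an assumption.

        import numpy as np

        def rotation_matrix(rx, ry, rz):
            """Rotations about the x-, y- and z-axes, composed in one common convention."""
            cx, sx = np.cos(rx), np.sin(rx)
            cy, sy = np.cos(ry), np.sin(ry)
            cz, sz = np.cos(rz), np.sin(rz)
            Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
            Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
            Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
            return Rz @ Ry @ Rx

        def project_weak_perspective(current_shape_3d, scale, rx, ry, rz, tx, ty):
            """Project the 204-dimensional template into a 2 x 68 matrix of pixel coordinates."""
            pts = current_shape_3d.reshape(3, 68)            # rows: X0..X67, Y0..Y67, Z0..Z67
            R2 = rotation_matrix(rx, ry, rz)[:2, :]          # first two rows of the rotation matrix
            current_shape_2d = scale * (R2 @ pts) + np.array([[tx], [ty]])
            return current_shape_2d                          # row 0: x values, row 1: y values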
  • an average error between multiple two-dimensional projection key points and multiple two-dimensional target key points is calculated, including calculating a reprojection error based on the multiple two-dimensional projection key points and the multiple two-dimensional target key points; and calculating an average error based on the reprojection error.
  • the reprojection error proj_err of multiple two-dimensional projection key points and multiple two-dimensional target key points is calculated according to the two-dimensional projection key point matrix current_shape_2D and the two-dimensional target key point matrix landmarks_2D, which can be expressed by formula (10):
  • Error_X i represents the reprojection error of the coordinate values of the i-th two-dimensional projection key point and the i-th two-dimensional target key point matrix on the x-axis
  • Error_Y i represents the reprojection error of the coordinate values of the i-th two-dimensional projection key point and the i-th two-dimensional target key point matrix on the y-axis.
  • the average error error is a function of the iteration parameter params.
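  • Formula (10) and the average-error formula are reproduced only as images in the source text. One plausible form consistent with the definitions of proj_err, Error_Xi, Error_Yi and the 68 key points (an editorial assumption, not the verbatim patent formulas) is:

        \mathrm{proj\_err} = \mathrm{landmarks\_2D} - \mathrm{current\_shape\_2D}, \qquad
        \mathrm{error} = \frac{1}{68}\sum_{i=0}^{67}\sqrt{\mathrm{Error\_X}_i^{2} + \mathrm{Error\_Y}_i^{2}}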
  • the accuracy of the target 3D face template can be measured by the reprojection error, and the reprojection error is related to the iteration parameter params and the parameters of the weak perspective projection model params_global.
  • when the reprojection error is minimized, the iteration parameter params is the optimal iteration parameter, and the target 3D face template current_shape_3D is the optimal target 3D face template.
  • FIG. 5 shows a flowchart of iterating an initial 3D face template according to another embodiment of the present disclosure.
  • step S450 iteratively optimizes the initial three-dimensional face template according to the average error to obtain the target three-dimensional face template, which may include the following steps S551 to S556 .
  • step S551 an iterative model is constructed according to the weak perspective projection model and iterative parameters.
  • step S552 a mapping function between the iterative model and a plurality of two-dimensional projection key points is determined.
  • step S553 the Jacobian matrix of the mapping function is calculated to obtain the two-dimensional iterative key points after iterative optimization.
  • step S554 an average error is calculated based on the two-dimensional iterative key points and multiple two-dimensional target key points from the face image.
  • step S555 when it is determined that the average error does not meet the convergence condition, the parameters of the iterative model are updated along the descending gradient direction of the Jacobian matrix to obtain an updated iterative model, and the operation returns to step S552.
  • step S556 when it is determined that the average error satisfies the convergence condition, an iteration parameter is determined, and a target three-dimensional face template is constructed according to the iteration parameter.
  • mapping function can be expressed by formula (12):
  • current_shape_2D_x is the x-axis coordinate value matrix of the 68 two-dimensional projection key points
  • current_shape_2D_y is the y-axis coordinate value matrix of the 68 two-dimensional projection key points.
  • the iterative model is iteratively optimized through the Jacobian matrix J.
  • the two-dimensional iterative key points after iterative optimization can be determined.
  • the dimension of the Jacobian matrix J of the iterative model is 136*(num+6).
  • the Jacobian matrix J is obtained by taking partial derivatives with respect to the weak perspective projection model parameters scale, Rx, Ry, Rz, tx and ty on the right side of equation (7), and taking partial derivatives with respect to the iteration parameters params on the right side of equation (12).
  • the Jacobian matrix J can be expressed by equations (13) to (20):
  • Formula (13) and formula (14) constitute the first column of the Jacobian matrix J
  • formula (15) and formula (16) constitute the second to fourth columns of the Jacobian matrix J
  • formula (17) and formula (18) constitute the fifth and sixth columns of the Jacobian matrix J
  • formula (19) and formula (20) constitute the seventh to (num+6)th columns of the Jacobian matrix J.
  • pv_X is the 1st to 68th rows of the feature matrix pv
  • pv_Y is the 69th to 136th rows of the feature matrix pv
  • pv_Z is the 137th to 204th rows of the feature matrix pv.
  • the dimensions of pv_X, pv_Y, and pv_Z are all 68*num.
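  • Formulas (13) to (20) are reproduced only as images in the source text. Under the weak perspective model assumed earlier they follow by differentiation; as an editorial reconstruction, writing r_1 = (r11, r12, r13) for the first row of the rotation matrix, P_i = (X_i, Y_i, Z_i) for the i-th three-dimensional key point, and pv_X_i, pv_Y_i, pv_Z_i for the i-th rows of pv_X, pv_Y, pv_Z, representative entries of the x-coordinate rows are shown below; the y-coordinate rows use the second row of the rotation matrix, and the columns for Rx, Ry and Rz are obtained by differentiating the rotation matrix with respect to each angle.

        \frac{\partial x_i}{\partial \mathrm{scale}} = r_1 \cdot P_i,\qquad
        \frac{\partial x_i}{\partial t_x} = 1,\qquad
        \frac{\partial x_i}{\partial t_y} = 0,\qquad
        \frac{\partial x_i}{\partial \mathrm{params}} = \mathrm{scale}\,\bigl(r_{11}\,\mathrm{pv\_X}_i + r_{12}\,\mathrm{pv\_Y}_i + r_{13}\,\mathrm{pv\_Z}_i\bigr)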
  • initial values can be set for the weak perspective projection model parameters.
  • scale is the ratio of the size of the average three-dimensional face template to the face area detected in the face detection algorithm module.
  • tx and ty are determined by the center coordinates of the face area on the plane of the two-dimensional face image, and the initial values of tx and ty can both be 0.
  • the reprojection error proj_err after each iteration of optimization is calculated, and the iterative model can be updated according to the reprojection error proj_err after each iteration.
  • Updating the iterative model includes: calculating the parameter change of the iterative model according to the descending gradient direction of the Jacobian matrix and the average error; and updating the parameters of the iterative model according to the parameter change to obtain an updated iterative model.
  • the moving direction (gradient descent direction) of the Jacobian matrix J is determined according to the gradient descent principle, and the parameter change of the iterative model is calculated.
  • the iterative model is updated according to the parameter change, and the mapping function between the iterative model and the two-dimensional projection key point is re-determined.
  • parameter change delta of the iterative model can be expressed by formula (21):
  • update iteration model can be expressed by formula (22):
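  • Formulas (21) and (22) are reproduced only as images in the source text. A standard Gauss-Newton style step consistent with moving along the descending gradient direction of the Jacobian matrix would be the following, where p stacks the weak perspective parameters params_global and the iteration parameters params; this is an editorial assumption, not the verbatim patent formula.

        \Delta = -\,\bigl(J^{\mathsf T} J\bigr)^{-1} J^{\mathsf T}\,\mathrm{proj\_err}
        \qquad\text{(parameter change, cf. formula (21))}

        p \leftarrow p + \Delta
        \qquad\text{(updated iterative model, cf. formula (22))}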
  • the average error error is calculated according to the reprojection error proj_err.
  • when the average error error hardly changes any more, it is considered that the average error error has converged.
  • the iteration parameter params' is determined, and the target three-dimensional face template is constructed according to the iteration parameter params'.
  • Fig. 6A shows a flow chart of determining the face pose of a target object according to an embodiment of the present disclosure.
  • Fig. 6B shows a schematic diagram of transforming a target coordinate system to a camera coordinate system according to an embodiment of the present disclosure.
  • step S130, determining the current facial posture of the target object based on the correspondence between the target object's current facial image and the target 3D facial template, may include the following steps S610 to S640.
  • step S610 a plurality of preset three-dimensional key points of the target object are determined from the plurality of three-dimensional key points of the target three-dimensional face template.
  • step S620 a transformation matrix between the camera coordinate system and the target coordinate system is determined according to the correspondence between the pixel coordinate system of the face image from the camera and the target coordinate system.
  • step S630 the plurality of preset three-dimensional key points are converted into a plurality of target three-dimensional key points according to the transformation matrix, and the plurality of target three-dimensional key points are in a camera coordinate system.
  • step S640 the current facial posture of the target object is determined based on the multiple target three-dimensional key points.
  • multiple preset three-dimensional key points correspond to multiple specified two-dimensional key points of the current face image of the target object.
  • the multiple specified two-dimensional key points of the current face image can be located in a specified face area where face pose estimation is required.
  • multiple specified two-dimensional key points are located in the user's eye area.
  • multiple preset three-dimensional key points located around the user's eyes are determined in the corresponding area of the target three-dimensional face template.
  • the position of the user's pupil is determined by the multiple preset key points located around the user's eyes.
  • the multiple preset three-dimensional key points are in the target coordinate system.
  • the PNP pose estimation algorithm can correspond the two-dimensional coordinates in the pixel coordinate system to the three-dimensional coordinates in the target coordinate system one by one, thereby solving the transformation matrix from the target coordinate system W to the camera coordinate system C.
  • the coordinate origin OW of the target coordinate system W is transformed into the coordinate origin OC of the camera coordinate system by the transformation matrix
  • the coordinate point Pi of the target coordinate system W is transformed into the coordinate point pi of the camera coordinate system by the transformation matrix.
  • the transformation matrix between the camera coordinate system and the target coordinate system can be determined by equation (24):
  • c is the scale of the camera
  • x and y are the coordinate values of the two-dimensional projection key point on the x-axis and y-axis in the pixel coordinate system
  • X, Y and Z are the coordinate values of the preset three-dimensional key point on the x-axis, y-axis and z-axis in the target coordinate system
  • K is the camera intrinsic parameter matrix.
  • a least squares solution is computed for the overdetermined equations to prevent the final face pose estimation result from being biased by detection errors or excessive errors in individual key points.
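  • A minimal OpenCV sketch of this PNP step (not the patent's reference implementation): object_points are the preset three-dimensional key points in the target coordinate system, image_points the corresponding specified two-dimensional key points, and K and dist_coeffs the calibrated camera parameters.

        import cv2
        import numpy as np

        def estimate_pose(object_points, image_points, K, dist_coeffs):
            """Solve the transformation from the target coordinate system to the camera coordinate system."""
            ok, rvec, tvec = cv2.solvePnP(
                object_points.astype(np.float64),   # (N, 3) preset 3D key points
                image_points.astype(np.float64),    # (N, 2) specified 2D key points
                K, dist_coeffs,
                flags=cv2.SOLVEPNP_ITERATIVE,       # iterative least-squares reprojection fit
            )
            R, _ = cv2.Rodrigues(rvec)              # 3 x 3 rotation matrix
            # Convert the preset 3D key points into target 3D key points in the camera coordinate system.
            points_cam = (R @ object_points.T + tvec).T
            return R, tvec, points_cam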
  • the present disclosure also provides a verification method for verifying the accuracy of the target three-dimensional face template of the present disclosure.
  • a depth sensor is fixedly installed on a 3D screen interactive device, and the transformation matrix T between the depth sensor and an ordinary monocular camera of the 3D screen interactive device is calibrated through tools such as the Stereo Camera Calibrator toolbox of MATLAB or the stereoCalibrate function of the OpenCV library.
  • the 3D coordinates of the user's pupil in the camera coordinate system of the 3D screen interactive device are [x, y, z]^T
  • the corresponding coordinates in the depth sensor coordinate system are [x′, y′, z′]^T.
  • the transformation relationship between the two is:
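  • The relationship itself is shown only as an image in the source text; assuming T is a 4 x 4 homogeneous transformation matrix (an editorial assumption), it would read:

        \begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} = T \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}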
  • the pupil 3D coordinates obtained by the depth sensor are taken as the true value
  • the pupil 3D coordinates determined according to the target three-dimensional face template determined according to the embodiment of the present disclosure are converted into 3D coordinates in the depth sensor coordinate system through matrix T, and the converted 3D coordinates are compared with the true value, so as to verify the accuracy of the target three-dimensional face template determined according to the embodiment of the present disclosure.
  • the verification results show that the pupil 3D coordinates obtained by the target three-dimensional face template determined by the embodiment of the present disclosure at different viewing distances have small errors in the x, y and z directions, and the errors are relatively stable.
  • the error may be introduced by the calibration error of the transformation matrix between the depth sensor and the camera or the error of the facial key point detection. Since the error of the target 3D face template determined by the embodiment of the present disclosure is relatively stable, the error can be compensated by adding a fixed bias in actual use. The accuracy and stability of the target 3D face template determined by the embodiment of the present disclosure are relatively reliable and have practical value.
  • the present disclosure further provides an image processing device, which will be described in detail below in conjunction with FIG. 7.
  • FIG. 7 shows a structural block diagram of an image processing apparatus according to an embodiment of the present disclosure.
  • the image processing device 700 of this embodiment includes a construction module 710 , an iteration module 720 and a determination module 730 .
  • the construction module 710 is used to construct an initial 3D face template using a plurality of sample face images.
  • the construction module 710 can be used to perform the operation S110 described above, which will not be described in detail herein.
  • the iteration module 720 is used to iteratively optimize the initial 3D face template using the face image of the target object to obtain the target 3D face template.
  • the iteration module 720 can be used to perform the operation S120 described above, which will not be repeated here.
  • the determination module 730 is used to determine the current face pose of the target object according to the correspondence between the current face image of the target object and the target 3D face template. In one embodiment, the determination module 730 can be used to perform the operation S130 described above, which will not be repeated here.
  • any multiple modules in the construction module 710, the iteration module 720 and the determination module 730 can be combined in one module for implementation, or any one of the modules can be split into multiple modules. Alternatively, at least part of the functions of one or more of these modules can be combined with at least part of the functions of other modules and implemented in one module.
  • At least one of the construction module 710, the iteration module 720 and the determination module 730 can be at least partially implemented as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system on a package, an application specific integrated circuit (ASIC), or can be implemented by hardware or firmware such as any other reasonable way of integrating or packaging the circuit, or implemented in any one of the three implementation methods of software, hardware and firmware or in any appropriate combination of any of them.
  • at least one of the construction module 710, the iteration module 720 and the determination module 730 can be at least partially implemented as a computer program module, and when the computer program module is run, the corresponding function can be performed.
  • the present disclosure also provides an interactive device, including a camera, a processor, a driving circuit, an input/output interface, and a screen.
  • the camera, the processor, the driving circuit, the input/output interface, and the screen are electrically connected in sequence.
  • the camera acquires a face image of the target object and sends the face image to the processor.
  • the processor uses the face image to iteratively optimize the initial three-dimensional face template to obtain the target three-dimensional face template.
  • the processor uses the target three-dimensional face template to estimate the face posture and obtain the pupil coordinates of the target object.
  • the processor can calculate the grating opening and closing sequence based on the pupil coordinates.
  • the driving circuit receives the grating opening and closing sequence from the processor, and controls the output interface to output the grating opening and closing sequence.
  • the screen is provided with a grating array.
  • the screen controls the opening and closing of the gratings in the grating array according to the grating opening and closing sequence.
  • the interaction device is similar to the 3D screen interaction device 220 shown in FIG. 2B of the present disclosure.
  • the present disclosure will not be repeated here.
  • FIG8 shows a block diagram of an electronic device suitable for implementing an image processing method according to an embodiment of the present disclosure.
  • the electronic device 800 includes a processor 801, which can perform various appropriate actions and processes according to the program stored in the read-only memory (ROM) 802 or the program loaded from the storage part 808 to the random access memory (RAM) 803.
  • the processor 801 may include, for example, a general-purpose microprocessor (such as a CPU), an instruction set processor and/or a related chipset and/or a special-purpose microprocessor (for example, an application-specific integrated circuit (ASIC)), etc.
  • the processor 801 may also include an onboard memory for caching purposes.
  • the processor 801 may include a single processing unit or multiple processing units for performing different actions of the method flow according to the embodiment of the present disclosure.
  • In the RAM 803, various programs and data required for the operation of the electronic device 800 are stored.
  • the processor 801, ROM 802 and RAM 803 are connected to each other through a bus 804.
  • the processor 801 performs various operations of the method flow according to the embodiment of the present disclosure by executing the program in ROM 802 and/or RAM 803. It should be noted that the program can also be stored in one or more memories other than ROM 802 and RAM 803.
  • the processor 801 can also perform various operations of the method flow according to the embodiment of the present disclosure by executing the program stored in the one or more memories.
  • the electronic device 800 may further include an input/output (I/O) interface 805, which is also connected to the bus 804.
  • the electronic device 800 may further include one or more of the following components connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, etc.; an output portion 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker, etc.; a storage portion 808 including a hard disk, etc.; and a communication portion 809 including a network interface card such as a LAN card, a modem, etc.
  • the communication portion 809 performs communication processing via a network such as the Internet.
  • a drive 810 is also connected to the I/O interface 805 as needed.
  • a removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 810 as needed, so that a computer program read therefrom is installed into the storage portion 808 as needed.
  • the present disclosure also provides a computer-readable storage medium, which may be included in the device/apparatus/system described in the above embodiments; or may exist independently without being assembled into the device/apparatus/system.
  • the above computer-readable storage medium carries one or more programs, and when the above one or more programs are executed, the method according to the embodiment of the present disclosure is implemented.
  • a computer-readable storage medium may be a non-volatile computer-readable storage medium, such as but not limited to: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium containing or storing a program that may be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable storage medium may include the ROM 802 and/or RAM 803 described above and/or one or more memories other than ROM 802 and RAM 803.
  • the embodiment of the present disclosure also includes a computer program product, which includes a computer program, and the computer program contains program code for executing the method shown in the flowchart.
  • when the computer program product runs in a computer system, the program code is used to enable the computer system to implement the image processing method provided by the embodiment of the present disclosure.
  • the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device.
  • the computer program may also be transmitted and distributed in the form of a signal on a network medium, and downloaded and installed through the communication part 809, and/or installed from a removable medium 811.
  • the program code contained in the computer program may be transmitted using any appropriate network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the above.
  • the computer program can be downloaded and installed from the network through the communication part 809, and/or installed from the removable medium 811.
  • when the computer program is executed by the processor 801, the above functions defined in the system of the embodiment of the present disclosure are performed.
  • the system, device, means, module, unit, etc. described above can be implemented by a computer program module.
  • the program code for executing the computer program provided by the embodiment of the present disclosure can be written in any combination of one or more programming languages.
  • these computer programs can be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages.
  • Programming languages include, but are not limited to, Java, C++, Python, the "C" language, or similar programming languages.
  • the program code can be executed entirely on the user computing device, partially on the user device, partially on the remote computing device, or entirely on the remote computing device or server.
  • the remote computing device can be connected to the user computing device through any type of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computing device (e.g., using an Internet service provider to connect through the Internet).
  • each box in the flowchart or block diagram can represent a module, subcircuit, program segment, or part of the code, and the above-mentioned module, subcircuit, program segment, or part of the code contains one or more executable instructions for realizing the specified logical function.
  • the functions marked in the box can also occur in a different order from the order marked in the accompanying drawings. For example, two boxes represented in succession can actually be executed substantially in parallel, and they can sometimes be executed in the opposite order, depending on the functions involved.
  • each box in the block diagram or flowchart, and the combination of boxes in the block diagram or flowchart can be implemented with a dedicated hardware-based system that performs a specified function or operation, or can be implemented with a combination of dedicated hardware and computer instructions.


Abstract

The present disclosure provides an image processing method, an apparatus, an interactive device, an electronic device, a storage medium and a computer program product, relating to the field of image processing technology. The image processing method includes: constructing an initial three-dimensional face template by using a plurality of sample face images; iteratively optimizing the initial three-dimensional face template by using a face image of a target object to obtain a target three-dimensional face template; and determining a current face pose of the target object according to a correspondence between a current face image of the target object and the target three-dimensional face template.

Description

图像处理方法、装置、交互设备、电子设备和存储介质 技术领域
本公开涉及图像处理技术领域,尤其涉及一种图像处理方法、装置、交互设备、电子设备、存储介质和计算机程序产品。
背景技术
增强现实(Augmented Reality,AR)设备、虚拟现实(Virtual Reality,VR)设备和三维(three-dimensional,3D)屏交互设备通常需要实时获取用户的人脸相对位姿,根据人脸相对位姿调整显示效果从而为用户提供更逼真的体验效果。例如,裸眼3D屏可根据用户脸部、头部或瞳孔的实时位姿,调整屏幕内部光栅的开合情况,从而为用户呈现当前观看位姿下最好的裸眼3D观看效果。
目前,传统的位姿估计算法是基于刚体目标和固定模板进行位姿估计。但人脸图像为典型的非刚体目标,且每个人脸图像都存在个体差异。AR设备、VR设备或3D屏交互设备适用的用户范围广泛,因此传统的位姿估计算法难以针对不同用户获取准确的人脸位姿估计结果。
发明内容
本公开提供了一种显示面板的图像处理方法、装置、交互设备、电子设备、存储介质和计算机程序产品。
根据第一方面,本公开提供了一种图像处理方法,包括:利用多个样本人脸图像,构建初始三维人脸模板;利用目标对象的人脸图像对初始三维人脸模板进行迭代优化,得到目标三维人脸模板;以及根据目标对象的当前时刻人脸图像与目标三维人脸模板的对应关系,确定目标对象的当前时刻人脸位姿。
例如，利用多个样本人脸图像，构建初始三维人脸模板，包括：分别从多个样本人脸图像的每个样本人脸图像中获取多个三维样本关键点；根据多个样本人脸图像的多个三维样本关键点，确定平均三维人脸模板；利用平均三维人脸模板，确定多个人脸样本图像的特征矩阵；以及根据迭代参数、平均三维人脸模板和特征矩阵，构建初始三维人脸模板。
例如,利用平均三维人脸模板,确定多个人脸样本图像的特征矩阵,包括:利用平 均三维人脸模板,对多个样本人脸图像的多个三维样本关键点进行去中心化处理,得到协方差矩阵;计算协方差矩阵的多个特征值和与多个特征值对应的多个特征向量;根据多个特征值在协方差矩阵中对线性投影的贡献值,从多个特征向量中确定多个有效特征向量,多个有效特征向量对应的多个特征值的贡献值之和大于预设贡献值;以及根据多个有效特征向量,构建特征矩阵。
例如,利用目标对象的人脸图像对初始三维人脸模板进行迭代优化,得到目标三维人脸模板,包括:从目标对象的人脸图像中获取多个二维目标关键点;从初始三维人脸模板中确定多个三维关键点;将多个三维关键点投影为多个二维投影关键点;计算多个二维投影关键点与多个二维目标关键点的平均误差;以及根据平均误差,对初始三维人脸模板进行迭代优化,得到目标三维人脸模板。
例如,将多个三维关键点投影为多个二维投影关键点,包括:根据三维关键点的坐标值、缩放尺度、坐标系旋转矩阵和像素坐标系的中心点偏移向量,构建弱透视投影模型;以及通过弱透视投影模型,将多个三维关键点投影为多个二维投影关键点。
例如，弱透视投影模型包括根据以下公式来将多个三维关键点投影为多个二维投影关键点：
[x, y]^T = scale·R_2×3·[X, Y, Z]^T + [t_x, t_y]^T
其中，x和y分别为二维投影关键点在像素坐标系的x轴和y轴的坐标值，X、Y和Z分别为三维关键点在目标对象所在坐标系的x轴、y轴和z轴的坐标值，scale为缩放尺度，R_2×3为目标对象所在坐标系相对于相机坐标系的旋转矩阵的前两行，t_x和t_y分别为像素坐标系原点相对于相机坐标系原点在x轴和y轴的偏移向量。
例如,根据平均误差,对初始三维人脸模板进行迭代优化,得到目标三维人脸模板,包括:根据弱透视投影模型和迭代参数,构建迭代模型;确定迭代模型与多个二维投影关键点之间的映射函数;计算映射函数的雅克比矩阵,得到迭代优化后的二维迭代关键点;根据二维迭代关键点和来自人脸图像的多个二维目标关键点,计算平均误差;在确定平均误差不满足收敛条件的情况下,沿着雅克比矩阵的下降梯度方向,对迭代模型的参数进行更新,得到更新后的迭代模型,并返回确定迭代模型与多个二维投影关键点之间的映射函数的操作;以及在确定平均误差满足收敛条件的情况下,确定迭代参数,并根据迭代参数构建目标三维人脸模板。
例如，映射函数包括以下公式：
[current_shape_2D_x, current_shape_2D_y]^T = f([scale, R_x, R_y, R_z, t_x, t_y, params])
其中，[current_shape_2D_x, current_shape_2D_y]^T为多个二维投影关键点的坐标值矩阵，[scale, R_x, R_y, R_z, t_x, t_y, params]为迭代模型，scale为缩放尺度，R_x、R_y、R_z为目标对象所在坐标系相对于相机坐标系的旋转量，t_x和t_y分别为像素坐标系原点相对于相机坐标系原点在x轴和y轴的偏移向量，params为迭代参数。
例如,沿着雅克比矩阵的下降梯度方向,对迭代模型的参数进行更新,得到更新后的迭代模型包括:根据雅克比矩阵的下降梯度方向和平均误差,计算迭代模型的参数变化量;以及根据参数变化量,更新迭代模型的参数,得到更新后的迭代模型。
例如，根据参数变化量，更新迭代模型的参数，得到更新后的迭代模型包括根据以下公式更新迭代模型：
[scale, R_x, R_y, R_z, t_x, t_y, params](更新后) = [scale, R_x, R_y, R_z, t_x, t_y, params](更新前) + delta
其中，等号左侧为更新后的迭代模型，等号右侧第一项为更新前的迭代模型，delta为参数变化量。
例如,计算多个二维投影关键点与多个二维目标关键点的平均误差,包括:根据多个二维投影关键点和多个二维目标关键点,计算重投影误差;以及根据重投影误差,计算平均误差。
例如，根据重投影误差，计算平均误差，即对重投影误差proj_err中的全部残差分量取平均得到error。其中，error为平均误差，proj_err为重投影误差，proj_err=landmarks_2D-current_shape_2D，其中landmarks_2D为二维目标关键点的坐标值，current_shape_2D为二维投影关键点的坐标值。
例如,根据目标对象的当前时刻人脸图像与目标三维人脸模板的对应关系,确定目标对象的当前时刻人脸位姿包括:从目标三维人脸模板的多个三维关键点中确定目标对象的多个预设三维关键点,多个预设三维关键点处于目标坐标系,多个预设三维关键点与目标对象的当前人脸图像的多个指定二维关键点对应;根据来自相机的人脸图像所处的像素坐标系和目标坐标系之间的对应关系,确定相机坐标系与目标坐标系之间的变换矩阵;以及根据变换矩阵,将多个预设三维关键点转换成多个目标三维关键点,多个目标三维关键点处于相机坐标系;以及根据多个目标三维关键点,确定目标对象的当前时刻人脸位姿。
例如，根据人脸图像所处的像素坐标系和目标坐标系之间的对应关系，确定相机坐标系与目标坐标系之间的变换矩阵，包括根据以下公式确定变换矩阵：
c·[x, y, 1]^T = K·[R|t]·[X, Y, Z, 1]^T
其中，c为相机的尺度，x和y分别为二维投影关键点在像素坐标系的x轴和y轴的坐标值，X、Y和Z分别为预设三维关键点在目标坐标系的x轴、y轴和z轴的坐标值，K为相机内参矩阵，[R|t]为变换矩阵。
例如,从目标对象的人脸图像中获取多个二维目标关键点,包括:对人脸图像进行畸变校正,得到校正后人脸图像;利用关键点检测算法,从校正后人脸图像中确定多个二维目标关键点。
例如，对人脸图像进行畸变校正，得到校正后人脸图像，包括根据以下公式来对人脸图像进行畸变校正：
x = x_0·(1 + k_1·r^2 + k_2·r^4 + k_3·r^6) + 2·p_1·x_0·y_0 + p_2·(r^2 + 2·x_0^2)
y = y_0·(1 + k_1·r^2 + k_2·r^4 + k_3·r^6) + p_1·(r^2 + 2·y_0^2) + 2·p_2·x_0·y_0
其中，x_0和y_0为人脸图像上的任意一坐标点在x轴和y轴的坐标值，x和y为校正后人脸图像上的对应坐标点在x轴和y轴的坐标值，r为人脸图像的中心点与坐标点(x,y)的距离，k_1、k_2和k_3为径向畸变系数，p_1和p_2为切向畸变系数。
根据第二方面,本公开提供了一种图像处理装置,包括:构建模块,用于利用多个样本人脸图像,构建初始三维人脸模板;迭代模块,用于利用目标对象的人脸图像对初始三维人脸模板进行迭代优化,得到目标三维人脸模板;以及确定模块,用于根据目标对象的当前时刻人脸图像与目标三维人脸模板的对应关系,确定目标对象的当前时刻人脸位姿。
根据第三方面,本公开提供了一种交互设备,包括:相机,用于获取目标对象的人脸图像;处理器,与相机电连接,用于:利用人脸图像对初始三维人脸模板进行迭代优化,得到目标三维人脸模板;利用目标三维人脸模板进行人脸位姿估计,得到目标对象的瞳孔坐标;以及根据瞳孔坐标,计算光栅开合序列;驱动电路,与处理器电连接,用于控制输出接口输出光栅开合序列;以及屏幕,与驱动电路电连接,用于根据光栅开合序列控制屏幕中光栅的开合。
根据第四方面,本公开提供了一种电子设备,包括:一个或多个处理器;存储器,用于存储一个或多个程序,其中,当一个或多个程序被一个或多个处理器执行时,使得一个或多个处理器实现本公开实施例所述的方法。
根据第五方面,本公开提供了一种计算机可读存储介质,其上存储有可执行指令,该指令被处理器执行时使处理器实现本公开实施例所述的方法。
根据第六方面,本公开提供了一种计算机程序产品,包括计算机程序,所述计算机程序被处理器执行时实现本公开实施例所述的方法。
附图说明
图1示出了根据本公开实施例的图像处理方法的流程图;
图2A示出了根据本公开实施例的图像处理方法的应用场景图;
图2B示出了根据本公开实施例的图像处理方法的流程示意图;
图2C示出了根据本公开实施例的人脸图像的关键点分布示意图；
图3示出了根据本公开实施例的构建初始三维人脸模板的流程图;
图4示出了根据本公开实施例的迭代初始三维人脸模板的流程图;
图5示出了根据本公开另一实施例的迭代初始三维人脸模板的流程图;
图6A示出了根据本公开实施例的确定目标对象的人脸位姿的流程图;
图6B示出了根据本公开实施例的目标坐标系到相机坐标系的变换示意图;
图7示出了根据本公开实施例的图像处理装置的结构框图;以及
图8示出了根据本公开实施例的适于实现图像处理方法的电子设备的方框图。
具体实施方式
为使本公开实施例的目的、技术方案和优点更加清楚,下面将结合本公开实施例中的附图,对本公开实施例中的技术方案进行清楚、完整的描述。显然,所描述的实施例是本公开的一部分实施例,而不是全部。基于所描述的本公开实施例,本领域普通技术人员在无需创造性劳动的前提下获得的所有其他实施例都属于本公开保护的范围。应注意,贯穿附图,相同的元素由相同或相近的附图标记来表示。在以下描述中,一些具体实施例仅用于描述目的,而不应该理解为对本公开有任何限制,而只是本公开实施例的示例。在可能导致对本公开的理解造成混淆时,将省略常规结构或配置。应注意,图中各部件的形状和尺寸不反映真实大小和比例,而仅示意本公开实施例的内容。
除非另外定义,本公开实施例使用的技术术语或科学术语应当是本领域技术人员所理解的通常意义。本公开实施例中使用的“第一”、“第二”以及类似词语并不表示任何顺序、数量或重要性,而只是用于区分不同的组成部分。
在本公开的技术方案中,所涉及的用户个人信息的收集、存储、使用、加工、传输、提供、公开和应用等处理,均符合相关法律法规的规定,采取了必要保密措施,且不违背公序良俗。
在本公开的技术方案中,在获取或采集用户个人信息之前,均获取了用户的授权或同意。
下面,将参照附图详细描述根据本公开的各个实施例。需要注意的是,在附图中,将相同的附图标记赋予基本上具有相同或类似结构和功能的组成部分,并且将省略关于它们的重复描述。
图1示出了根据本公开实施例的图像处理方法的流程图。
如图1所示,根据本公开实施例的图像处理方法可以包括以下步骤S110~步骤S130。应注意,以下方法中各个步骤的序号仅作为该步骤的表示以便描述,而不应被看作表示该各个步骤的执行顺序。除非明确指出,否则该方法不需要完全按照所示顺序来执行。
在步骤S110,利用多个样本人脸图像,构建初始三维人脸模板。
例如,多个样本人脸图像可以为预先采集到的多个用户的人脸图像。例如,多个样本人脸图像可以来源于开源的大规模人脸对齐3D(Large Scale 3D Faces in-the-Wild,LS3D-W)数据集。多个样本人脸图像包括二维人脸图像和三维人脸图像。从多个样本人脸图像中可以获取位于样本人脸图像上的二维坐标点和三维坐标点。
在本公开实施例中,根据来自多个样本人脸数据的多个二维坐标点和三维坐标点,可以构建初始三维人脸模板。初始三维人脸模板可以为通用的三维人脸模板,用于表征人脸图像的平均特征。
在步骤S120,利用目标对象的人脸图像对初始三维人脸模板进行迭代优化,得到目标三维人脸模板。
例如,目标对象为适用头戴式显示设备(例如,AR或VR)或3D屏交互设备的用户。在头戴式显示设备或3D屏交互设备上设有相机。在获取用户同意或授权的情况下,头戴式显示设备或3D屏交互设备通过相机捕获用户的人脸图像。例如,相机可以为深度传感器(深度相机)、双目相机、单目相机和激光雷达等等。
例如,通过单目相机捕获用户的人脸图像,人脸图像为二维人脸图像。二维人脸图像描述了属于该用户的人脸特征。利用来自该二维人脸图像的人脸特征对初始三维人脸模板进行迭代优化,可以得到能够表征该用户人脸特征的目标三维人脸模板。
在步骤S130,根据目标对象的当前时刻人脸图像与目标三维人脸模板的对应关系,确定目标对象的当前时刻人脸位姿。
例如,目标对象的当前时刻人脸图像为可以为当前时刻相机拍摄的二维人脸图像。根据二维人脸图像与目标三维人脸模板中相同部分的对应关系,对目标对象的当前人脸位置进行估计。例如,根据目标对象的当前人脸图像中关于目标对象瞳孔部分的图像信息,在目标三维人脸模板中,对相应的瞳孔部分的位姿进行估计。
目标三维人脸模板确定的人脸位姿为用户所处的目标坐标系中的人脸位姿。头戴式显示设备或3D屏交互设备利用目标人脸模板对该用户进行人脸位姿估计,确定在相机所处的相机坐标系内人脸位姿。
例如,头戴式显示设备或3D屏交互设备可以根据相机坐标系内的人脸位姿数据确定用户当前的人脸位姿情况,例如面部朝向和面部表情的信息,并基于用户当前的人脸位姿情况为用户提供互动服务。
例如,头戴式显示设备或3D屏交互设备可以通过位姿估计算法,利用目标人脸模板进行人脸位姿估计。例如,位姿估计算法包括基于点云的3D目标检测、基于点云的模板匹配和基于单张图像的透视N点投影(Perspective-n-Points,PNP)位姿估计算法等。
例如，在相机坐标系内的人脸位姿中，确定人脸中瞳孔的实时位姿。3D屏交互设备可以根据瞳孔的实时位姿调整屏幕内部光栅的开合情况，为用户呈现当前观看位姿下最好的裸眼3D观看效果。
根据本公开实施例，利用用户的实时人脸图像，对初始三维人脸模板进行迭代优化，得到能够表征用户当前人脸位姿特征的目标三维人脸模板。通过目标三维人脸模板，可以准确地确定用户当前的人脸位姿，减小人脸位姿误差，从而为用户提供更好的3D视觉效果。
图2A示出了根据本公开实施例的图像处理方法的应用场景图。图2B示出了根据本公开实施例的图像处理方法的流程示意图。图2C示出了根据本公开实施例的人脸图像的关键点分布示意图。
如图2A所示，用户210在观看3D屏交互设备220时，安装在3D屏交互设备220上的相机230捕获用户210的人脸图像。相机230捕获的人脸图像为二维人脸图像。相机230将二维人脸图像发送到3D屏交互设备220，由3D屏交互设备220根据二维人脸图像对初始三维人脸模板进行迭代优化，得到与用户210对应的目标三维人脸模板。3D屏交互设备220利用目标三维人脸模板对用户210进行人脸位姿估计，得到相机坐标系内的三维人脸位姿，并根据三维人脸位姿为用户210提供3D视觉服务。
结合图2B,图2B示出了例如3D屏交互设备220进行图像处理的示意图。
如图2B所示，3D屏交互设备220从相机230获取用户的二维人脸图像，由3D屏交互设备220的主板对二维人脸图像进行处理，得到相机坐标系内的三维人脸位姿，再基于相机坐标系内的三维人脸位姿向屏幕光栅发出控制信号，控制屏幕光栅的开合情况。
例如,主板可以包括无线接入(Access Point,AP)主板。在主板中由处理器CPU执行处理操作。
例如,CPU对来自相机的二维人脸图像进行格式转换,由NV21格式图像转换为Mat格式图像。CPU对Mat格式图像进行畸变矫正,得到校正后人脸图像,再利用关键点检测算法,从校正后人脸图像中确定多个二维目标关键点。
由于位姿估计算法的解算原理建立在理想相机模型之上，需要在不存在畸变的相机模型下进行位姿估计。通常相机会由于透镜不完全平行于图像平面而产生切向畸变，以及由于光线弯曲而产生径向畸变。
本公开提供一种图像畸变矫正的方法。
例如，根据以下式(1)来对人脸图像进行畸变校正：
x = x_0·(1 + k_1·r^2 + k_2·r^4 + k_3·r^6) + 2·p_1·x_0·y_0 + p_2·(r^2 + 2·x_0^2)
y = y_0·(1 + k_1·r^2 + k_2·r^4 + k_3·r^6) + p_1·(r^2 + 2·y_0^2) + 2·p_2·x_0·y_0     (1)
x_0和y_0为校正前人脸图像上的任意一坐标点在像素坐标系的x轴和y轴的坐标值。x和y为校正后人脸图像上的对应坐标点在像素坐标系的x轴和y轴的坐标值。像素坐标系为相机拍摄的二维人脸图像所处的坐标系。
r为人脸图像的中心点与该坐标点的距离，r^2=x^2+y^2。人脸图像上距离中心点越远处的点畸变越大。
k_1、k_2和k_3为径向畸变系数，p_1和p_2为切向畸变系数。例如，k_1、k_2、k_3、p_1和p_2为相机的固定参数。本领域技术人员可以根据本领域任何方式获取相机的固定参数。
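例如，以下Python代码给出一种基于OpenCV的畸变校正示意（仅为示意：相机内参矩阵K、畸变系数与图像路径均为假设的占位值，实际数值应通过相机标定获得）：

import cv2
import numpy as np

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])                       # 假设的相机内参矩阵
dist = np.array([-0.12, 0.05, 0.001, 0.001, 0.0])     # 畸变系数(k_1, k_2, p_1, p_2, k_3)，占位值
img = cv2.imread("face.jpg")                          # 占位的人脸图像路径
undistorted = cv2.undistort(img, K, dist)             # 得到校正后人脸图像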
在确定二维目标关键点时,CPU需要先对Mat格式图像进行人脸检测,以确定Mat格式图像中的人脸区域,再通过人脸关键点检测算法对人脸区域的图像进行关键点检测,得到多个二维目标关键点。
二维目标关键点的数量可以为5个关键点、21关键点、49关键点或68关键点。根据实际的检测需求,还可以选择其他数量的关键点,例如上万个关键点。例如,关键点分布在双眼、鼻尖、左右嘴角和眉毛等区域。多个关键点需要分布在多个平面内,使关键点可以更准确地描述人脸特征。
例如,更多的关键点可以提高位姿估计求解的结果的准确性,但也会增加检测和求解耗时。在本公开实施例中,采用68个人脸关键点,可以同时确保算法的实时性要求和精度要求。68个人脸关键点分布和顺序如图2C所示。
例如,人脸检测算法与关键点检测算法可以包括采用OpenCV库中的CascadeClassifier类和Facemark类实现。CascadeClassifier可以基于Haar、LBP和HOG等特征的级联分类器进行关键点检测。Facemark可以基于局部二值特征LBF和级联的随机森林全局线性回归进行关键点检测。人脸检测算法和关键点检测算法模块还可根据实际需求采用其他算法替换(更快或检测精度更高的算法),本公开对人脸检测算法和关键点检测算法不做限定。
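例如，以下代码给出基于OpenCV（需安装opencv-contrib-python）的人脸检测与68点关键点检测的示意写法，其中级联分类器文件、LBF模型文件与图像路径均为占位值：

import cv2

face_detector = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
facemark = cv2.face.createFacemarkLBF()
facemark.loadModel("lbfmodel.yaml")                   # 预训练LBF关键点模型，文件名为占位值
img = cv2.imread("face.jpg")                          # 占位的人脸图像路径
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)   # 人脸区域检测
ok, landmarks = facemark.fit(img, faces)              # landmarks[0]为该人脸的68个二维目标关键点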
由于不同的人类个体（不同种族、年龄、性别等）的三维人脸关键点坐标存在明显差异，并且人脸为典型的非刚体目标，人脸关键点在目标坐标系下的三维坐标随时存在变化。因此在获得68个二维目标关键点后，CPU再对初始三维人脸模板进行迭代优化，得到最优的目标三维人脸模板。目标三维人脸模板与用户当前的人脸特征的匹配效果最佳。
CPU可以根据目标三维人脸模板和PNP位姿估计算法对用户进行人脸位姿估计,得到人脸图像在相机坐标系下的三维人脸位姿,并根据相机坐标系下的位姿计算用户左右瞳孔的坐标。例如,三维人脸位姿可以由与68个二维目标关键点对应的68个三维关键点描述。
例如,如图2C所示,68个人脸关键点中不包含双眼瞳孔中心的位置。CPU计算人脸左眼眼周6个关键点(关键点36~41)的3D坐标的形心(均值)来确定用户左眼瞳孔在相机坐标系下的3D坐标P l。CPU计算人脸右眼眼周6个关键点(关键点42~47)的3D坐标的形心来确定用户右眼瞳孔在相机坐标系下的坐标P r。根据瞳孔坐标P l、P r和屏幕上的光栅排布计算出此时为用户呈现最佳裸眼3D效果的光栅开合序列。光栅开合序列可以包括一串由0和1组成的序列,0代表对应光栅关闭,1代表对应光栅打开。CPU将光栅开合序列传递给主板的整机驱动,由整机驱动控制主板的输出接口对应输出高低电平,控制屏幕中的光栅开合。例如输出接口可以为通用输入输出(General-purpose input/output,GPIO)接口。
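例如，由68个三维关键点计算双眼瞳孔3D坐标的过程可用如下NumPy代码示意（关键点数据用随机数占位）：

import numpy as np

keypoints_3d = np.random.rand(68, 3)            # 相机坐标系下的68个三维关键点(占位数据)
P_l = keypoints_3d[36:42].mean(axis=0)          # 左眼眼周6个关键点(36~41)的形心，即左瞳孔3D坐标
P_r = keypoints_3d[42:48].mean(axis=0)          # 右眼眼周6个关键点(42~47)的形心，即右瞳孔3D坐标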
图3示出了根据本公开实施例的构建初始三维人脸模板的流程图。
如图3所示,步骤S110利用多个样本人脸图像,构建初始三维人脸模板可以包括以下步骤S310~步骤S340。
在步骤S310,分别从多个样本人脸图像的每个样本人脸图像中获取多个三维样本关键点。
在步骤S320,根据多个样本人脸图像的多个三维样本关键点,确定平均三维人脸模板。
在步骤S330,利用平均三维人脸模板,确定多个人脸样本图像的特征矩阵。
在步骤S340,根据迭代参数、平均三维人脸模板和特征矩阵,构建初始三维人脸模板。
在本公开实施例中,可以将LS3D-W数据集作为平均人脸模型的数据来源,获取多个样本人脸图像。从每个样本人脸图像中分别获取68个三维样本关键点。需要说明的是,来自每个样本人脸图像的68个三维样本关键点的分布情况为相同的。例如,来自每个样本人脸图像的68个三维样本关键点中,人脸左眼眼周包括6个关键点(关键点36~41),人脸右眼眼周包括6个关键点(关键点42~47)。
每个关键点包括该关键点位于目标坐标系的三个坐标值(X、Y和Z)。计算位于同 一人脸图像位置的所有关键点的平均值,确定平均三维人脸模板mean_shape。平均三维人脸模板mean_shape包括68个平均关键点的三个坐标值,坐标单位可以为mm。
三维人脸模板mean_shape可以为一个维度为204*1的列向量[X0,X1,...,X67,Y0,Y1,...,Y67,Z0,Z1,...,Z67] T。其中X0、Y0和Z0分别为第一个平均关键点的坐标值。
在本公开实施例中,利用平均三维人脸模板,确定多个人脸样本图像的特征矩阵,包括:利用平均三维人脸模板,对多个样本人脸图像的多个三维样本关键点进行去中心化处理,得到协方差矩阵;计算协方差矩阵的多个特征值和与多个特征值对应的多个特征向量;根据多个特征值在协方差矩阵中线性投影的贡献值,从多个特征向量中确定多个有效特征向量,多个有效特征向量对应的多个特征值的贡献值之和大于预设贡献值;以及根据多个有效特征向量,构建特征矩阵。
例如,利用主成分分析算法(Principal component analysis algorithm,PCA)对LS3D-W数据集中的人脸关键点数据进行分析,以降低关键点数据的线性维度。例如通过线性投影,将高维的数据映射到低维的空间中,以此实现使用较少的数据维度,保留较多的原始数据点特性。
例如,对每个样本人脸图像的68个关键点与平均三维人脸模板mean_shape的68个平均关键点做差值计算,实现去中心化处理(去均值)。由去中心化后的每个样本人脸图像的68个关键点构建一个协方差矩阵,并求解该协方差矩阵的多个特征值和与特征值对应的特征向量。
协方差矩阵的每个特征值表示对线性投影的贡献值。从多个特征值中选择有效特征值。例如，根据特征值的大小顺序，计算数值位于前N个的特征值之和。在确定前N个特征值之和占全部特征值之和的比例大于等于99%的情况下，选择该N个特征值为有效特征值。预设贡献值可以为99%。本领域技术人员还可以根据实际的需求，设置其他贡献值。本公开对此不做限定。此时，记N=num，选择前num个特征值为有效特征值，N为正整数。num表示三维人脸图像的204维特征的最小特征维度。前num个特征值对应的特征向量为有效特征向量，num个有效特征向量组成特征矩阵pv，特征矩阵的维度为204*num。
在本公开实施例中,初始三维人脸模板current_shape_3D可以由式(2)表示:
current_shape_3D=mean_shape+pv·params
params为迭代参数,通过迭代参数params表示不同时刻下,人脸图像的变化特征。
初始三维人脸模板current_shape_3D的维度为204*1。迭代参数params的维度为num*1。
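例如，上述由样本关键点构建平均三维人脸模板mean_shape与特征矩阵pv的过程，可用如下NumPy代码示意（样本数据用随机数占位，预设贡献值取99%）：

import numpy as np

samples = np.random.rand(1000, 204)                   # 每行为一个样本人脸的204维关键点向量(占位数据)
mean_shape = samples.mean(axis=0)                     # 平均三维人脸模板，维度204
centered = samples - mean_shape                       # 去中心化
cov = np.cov(centered, rowvar=False)                  # 204*204协方差矩阵
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]                     # 按特征值(贡献值)从大到小排序
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
ratio = np.cumsum(eigvals) / eigvals.sum()
num = int(np.searchsorted(ratio, 0.99) + 1)           # 贡献值之和首次达到99%所需的特征维度
pv = eigvecs[:, :num]                                 # 特征矩阵，维度204*num
params = np.zeros(num)                                # 迭代参数初值
current_shape_3D = mean_shape + pv @ params           # 式(2)：初始三维人脸模板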
图4示出了根据本公开实施例的迭代初始三维人脸模板的流程图。
如图4所示,步骤S120利用目标对象的人脸图像对初始三维人脸模板进行迭代优化,得到目标三维人脸模板可以包括以下步骤S410~步骤S440。
在步骤S410,从目标对象的人脸图像中获取多个二维目标关键点。
在步骤S420,从初始三维人脸模板中确定多个三维关键点。
在步骤S430,将多个三维关键点投影为多个二维投影关键点。
在步骤S440,计算多个二维投影关键点与多个二维目标关键点的平均误差。
在步骤S450,根据平均误差,对初始三维人脸模板进行迭代优化,得到目标三维人脸模板。
在本公开实施例中,二维目标关键点可以为从进行畸变校正后的人脸图像中获得。例如,从畸变校正后的人脸图像中获取68个二维目标关键点。从初始三维人脸模板中确定的多个三维关键点可以为未知的关键点。例如,三维关键点的坐标值可以为关于迭代参数params的函数(X,Y,Z)。
例如,来自初始三维人脸模板的多个三维关键点可以由式(3)表示:
current_shape_3D=[X0 X1 ... X67 Y0 Y1 ... Y67 Z0 Z1 ... Z67]   (3)
对式(3)进行变换得到维度为3*68的矩阵，由式(4)表示：
current_shape_3D = [X0 X1 ... X67; Y0 Y1 ... Y67; Z0 Z1 ... Z67]     (4)
在本公开实施例中，将多个三维关键点投影为多个二维投影关键点，可以包括根据三维关键点的坐标值、缩放尺度、坐标系旋转矩阵和像素坐标系的中心点偏移向量，构建弱透视投影模型params_global；以及通过弱透视投影模型，将多个三维关键点投影为多个二维投影关键点。
例如，弱透视投影模型可以由式(5)表示：
params_global=[scale R_x R_y R_z t_x t_y]     (5)
scale为缩放尺度，R_x为目标坐标系的x轴相对于相机坐标系的x轴的旋转量，R_y为目标坐标系的y轴相对于相机坐标系的y轴的旋转量，R_z为目标坐标系的z轴相对于相机坐标系的z轴的旋转量，t_x和t_y分别为像素坐标系原点相对于相机坐标系原点在x轴和y轴的偏移向量。
例如，像素坐标系的原点可以位于二维人脸图像的左上角，相机坐标系的原点位于相机光轴的中心。因此位于相机光轴上的坐标点，例如坐标点(0,0)在像素坐标系上的像素坐标为(t_x,t_y)。
在本公开实施例中，将来自初始三维人脸模板的多个三维关键点投影为多个二维投影关键点可以由式(6)表示：
[x, y]^T = scale·R_2×3·[X, Y, Z]^T + [t_x, t_y]^T     (6)
例如，x和y分别为二维投影关键点在像素坐标系的x轴和y轴的坐标值。X、Y和Z分别为三维关键点在目标坐标系的x轴、y轴和z轴的坐标值。例如X、Y和Z可以为初始三维人脸模板的68个平均关键点的坐标值。R_2×3为目标坐标系相对于相机坐标系的旋转矩阵的前两行。
将式(4)代入式(6)，得到式(7)：
current_shape_2D = scale·R_2×3·current_shape_3D + T     (7)
其中current_shape_3D为式(4)所示的3*68矩阵，T为每一列均为[t_x, t_y]^T的2*68平移矩阵。
式(7)表示将来自初始三维人脸模板current_shape_3D的三维关键点投影至像素坐标系平面上，得到二维投影关键点矩阵current_shape_2D，二维投影关键点矩阵current_shape_2D维度为2*68。由于来自初始三维人脸模板current_shape_3D为关于迭代参数params的函数(X,Y,Z)，此时二维投影关键点矩阵current_shape_2D中的二维投影关键点也为关于迭代参数params的函数(x,y)。
例如，二维投影关键点矩阵current_shape_2D可以由式(8)表示：
current_shape_2D = [x_0 x_1 ... x_67; y_0 y_1 ... y_67]     (8)
例如，从畸变校正后的人脸图像中获取68个二维目标关键点构成的二维目标关键点矩阵landmarks_2D可以由式(9)表示：
landmarks_2D = [u_0 u_1 ... u_67; v_0 v_1 ... v_67]     (9)
其中u_i和v_i分别为第i个二维目标关键点在像素坐标系的x轴和y轴的坐标值。
在本公开实施例中，计算多个二维投影关键点与多个二维目标关键点的平均误差，包括根据多个二维投影关键点和多个二维目标关键点，计算重投影误差；以及根据重投影误差，计算平均误差。
根据二维投影关键点矩阵current_shape_2D和二维目标关键点矩阵landmarks_2D计算多个二维投影关键点与多个二维目标关键点的重投影误差proj_err，可以由式(10)表示：
proj_err = landmarks_2D - current_shape_2D     (10)
根据重投影误差proj_err，计算平均误差error，即式(11)：error为proj_err中全部残差分量Error_X_i和Error_Y_i的平均值。
Error_X_i表示第i个二维投影关键点与第i个二维目标关键点在x轴的坐标值的重投影误差，Error_Y_i表示第i个二维投影关键点与第i个二维目标关键点在y轴的坐标值的重投影误差。此时，平均误差error为关于迭代参数params的函数。
在本公开实施例中,目标三维人脸模板的准确性可以由重投影误差衡量,而重投影误差与迭代参数params和弱透视投影模型params global的参数相关。通过不断的迭代优化,使error达到收敛条件时,迭代参数params为最优迭代参数,此时的目标三维人脸模板current_shape_3D为最优目标三维人脸模板。
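例如，式(6)~式(11)所述的弱透视投影与重投影误差计算可用如下代码示意（关键点与投影参数均为占位值，旋转顺序取x-y-z、平均误差取全部残差绝对值的均值，均仅为示意）：

import numpy as np
from scipy.spatial.transform import Rotation

shape_3d = np.random.rand(3, 68)                                # 式(4)形式的三维关键点矩阵(占位数据)
landmarks_2d = np.random.rand(2, 68)                            # 68个二维目标关键点(占位数据)
scale, Rx, Ry, Rz, tx, ty = 1.0, 0.0, 0.0, 0.0, 320.0, 240.0    # 弱透视投影模型参数(占位初值)

R = Rotation.from_euler('xyz', [Rx, Ry, Rz]).as_matrix()        # 目标坐标系到相机坐标系的旋转矩阵
current_shape_2d = scale * (R[:2] @ shape_3d) + np.array([[tx], [ty]])   # 式(6)(7)：弱透视投影
proj_err = landmarks_2d - current_shape_2d                      # 式(10)：重投影误差
error = np.abs(proj_err).mean()                                 # 平均误差(取绝对残差均值，仅为示意)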
图5示出了根据本公开另一实施例的迭代初始三维人脸模板的流程图。
如图5所示,步骤S450根据平均误差,对初始三维人脸模板进行迭代优化,得到目标三维人脸模板可以包括以下步骤S551~步骤S556。
在步骤S551,根据弱透视投影模型和迭代参数,构建迭代模型。
在步骤S552,确定迭代模型与多个二维投影关键点之间的映射函数。
在步骤S553,计算映射函数的雅克比矩阵,得到迭代优化后的二维迭代关键点。
在步骤S554,根据二维迭代关键点和来自人脸图像的多个二维目标关键点,计算平均误差。
在步骤S555,在确定平均误差不满足收敛条件的情况下,沿着雅克比矩阵的下降梯度方向,对迭代模型的参数进行更新,得到更新后的迭代模型,并返回步骤S552的操作。
在步骤S556,在确定平均误差满足收敛条件的情况下,确定迭代参数,并根据迭代参数构建目标三维人脸模板。
在本公开实施例中，映射函数可以由式(12)表示：
[current_shape_2D_x, current_shape_2D_y]^T = f([scale, R_x, R_y, R_z, t_x, t_y, params])     (12)
其中，[scale, R_x, R_y, R_z, t_x, t_y, params]为关于弱透视投影模型和迭代参数的迭代模型，[current_shape_2D_x, current_shape_2D_y]^T为二维投影关键点的坐标值矩阵。current_shape_2D_x为68个二维投影关键点的x轴坐标值矩阵，current_shape_2D_y为68个二维投影关键点的y轴坐标值矩阵。例如，映射函数f由式(2)与式(7)复合得到：先由迭代参数params按式(2)得到current_shape_3D，再按式(7)经弱透视投影得到current_shape_2D。
例如,通过雅克比矩阵J(Jacobi Matrix)对迭代模型进行迭代优化。在优化后的迭代模型中可以确定迭代优化后的二维迭代关键点。
例如，迭代模型的雅克比矩阵J的维度为136*(num+6)，对式(7)的等式右侧中的弱透视投影模型参数scale、R_x、R_y、R_z、t_x和t_y分别求偏导，以及对式(12)的等式右侧中的迭代参数params求偏导，得到雅克比矩阵J。
雅克比矩阵J可以由式(13)~式(20)表示，式(13)~式(20)分别给出二维投影关键点的x轴坐标和y轴坐标对上述各参数的偏导数。
式(13)和式(14)构成雅克比矩阵J的第1列，式(15)和式(16)构成雅克比矩阵J的第2~4列，式(17)和式(18)构成雅克比矩阵J的第5和第6列，式(19)和式(20)构成雅克比矩阵J的第7~num+6列。
在式(19)和式(20)中，pv_X为特征矩阵pv的第1~68行，pv_Y为特征矩阵pv的第69~136行，pv_Z为特征矩阵pv的第137~204行，pv_X、pv_Y和pv_Z的维度都是68*num。
例如,在对迭代模型进行第一次迭代时,可以为弱透视投影模型参数设置初始值。例如R x=0,R y=0,R z=0。scale为平均三维人脸模板的尺寸和人脸检测算法模块中检测到的人脸区域的比值。t x和t y由二维人脸图像的平面上人脸区域的中心坐标决定,t x和t y初始值可以均为0。
计算每一次迭代优化后的重投影误差proj err,可以根据每一次迭代优化后的重投影误差proj err更新迭代模型。更新迭代模型包括:根据雅克比矩阵的下降梯度方向和平均误差,计算迭代模型的参数变化量;以及根据参数变化量,更新迭代模型的参数,得到更新后的迭代模型。
例如,在平均误差不满足收敛条件的情况下,根据梯度下降原理,确定雅克比矩阵J的移动方向(下降梯度方向),并计算迭代模型的参数变化量。根据参数变化量更新迭代模型,并重新确定迭代模型与二维投影关键点的映射函数。
例如，计算迭代模型的参数变化量delta可以由式(21)表示：
delta = 0.75·J^T·proj_err         (21)
例如，更新迭代模型可以由式(22)表示：
[scale, R_x, R_y, R_z, t_x, t_y, params](更新后) = [scale, R_x, R_y, R_z, t_x, t_y, params](更新前) + delta     (22)
其中，等号左侧为更新后的迭代模型，等号右侧第一项为更新前的迭代模型，delta为参数变化量。
例如，在计算每一次迭代优化后的重投影误差proj_err后，根据重投影误差proj_err计算平均误差error。在确定平均误差error几乎不再发生变化的情况下，认为平均误差error收敛。
例如,在完成迭代优化后的平均投影误差current_error与完成上一次迭代优化后的平均投影误差last_error满足式(23)时,认为平均误差error收敛:
current_error>0.999·last_error     (23)
在确定平均误差满足收敛条件的情况下,确定迭代参数params’,并根据迭代参数params’构建目标三维人脸模板。目标三维人脸模板可以为current_shape_3D=mean_shape+pv·params’。
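例如，上述迭代优化在工程上也可借助现成的最小二乘求解器完成。以下代码用scipy.optimize.least_squares对重投影残差进行优化，仅为一种等价的示意实现，并非本公开所述沿雅克比矩阵下降梯度逐步更新的具体迭代格式；其中mean_shape、pv与landmarks_2d均用随机数占位：

import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

mean_shape = np.random.rand(204)                 # 平均三维人脸模板(占位数据)
pv = np.random.rand(204, 10)                     # 特征矩阵，num=10(占位数据)
landmarks_2d = np.random.rand(2, 68)             # 68个二维目标关键点(占位数据)

def residuals(p, mean_shape, pv, landmarks_2d):
    # p = [scale, Rx, Ry, Rz, tx, ty, params...]，返回136维重投影残差
    scale, Rx, Ry, Rz, tx, ty = p[:6]
    params = p[6:]
    shape_3d = (mean_shape + pv @ params).reshape(3, 68)     # 式(2)(4)
    R = Rotation.from_euler('xyz', [Rx, Ry, Rz]).as_matrix()
    proj = scale * (R[:2] @ shape_3d) + np.array([[tx], [ty]])
    return (landmarks_2d - proj).ravel()

p0 = np.zeros(6 + pv.shape[1])
p0[0] = 1.0                                       # scale初值(示意)
result = least_squares(residuals, p0, args=(mean_shape, pv, landmarks_2d))
best_params = result.x[6:]
target_shape_3D = mean_shape + pv @ best_params   # 目标三维人脸模板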
图6A示出了根据本公开实施例的确定目标对象的人脸位姿的流程图。图6B示出了根据本公开实施例的目标坐标系到相机坐标系的变换示意图。
如图6A所示,步骤S130根据目标对象的当前时刻人脸图像与目标三维人脸模板的对应关系,确定目标对象的当前时刻人脸位姿可以包括以下步骤S610~步骤S640。
在步骤S610,从目标三维人脸模板的多个三维关键点中确定目标对象的多个预设三维关键点。
在步骤S620,根据来自相机的人脸图像所处的像素坐标系和目标坐标系之间的对应关系,确定相机坐标系与目标坐标系之间的变换矩阵。
在步骤S630,根据变换矩阵,将多个预设三维关键点转换成多个目标三维关键点,多个目标三维关键点处于相机坐标系。
在步骤S640,根据多个目标三维关键点,确定目标对象的当前时刻人脸位姿。
例如,多个预设三维关键点与目标对象的当前人脸图像的多个指定二维关键点对应。当前人脸图像的多个指定二维关键点可以位于需要进行人脸位姿估计的指定人脸区域。例如,多个指定二维关键点位于用户的眼周区域。根据位于眼周的多个指定二维关键点,在目标三维人脸模板的相应区域中,确定位于用户眼周的多个预设三维关键点。通过位于用户眼周的多个预设关键点确定用户瞳孔的位置。多个预设三维关键点处于目标坐标系。
通过PNP位姿估计算法可以将像素坐标系下的二维坐标与目标坐标系下的三维坐标一一对应，从而求解出目标坐标系W变换到相机坐标系C的变换矩阵[R|t]。
如图6B所示，通过变换矩阵将目标坐标系W的坐标原点O_W变换为相机坐标系的坐标原点O_C，通过变换矩阵将目标坐标系W的坐标点P_i变换为相机坐标系的坐标点p_i。
例如，相机坐标系与目标坐标系之间的变换矩阵可由式(24)确定：
c·[x, y, 1]^T = K·[R|t]·[X, Y, Z, 1]^T     (24)
其中，c为相机的尺度，x和y分别为二维投影关键点在像素坐标系的x轴和y轴的坐标值，X、Y和Z分别为预设三维关键点在目标坐标系的x轴、y轴和z轴的坐标值，K为相机内参矩阵。
根据变换矩阵的特性可以得知，变换矩阵包括3个轴(x轴、y轴和z轴)的旋转角度R_t和沿3个轴方向的平移量t。由于二维关键点包括68个关键点，因此可以用68个关键点构建包括2*68=136个方程的超定方程组，通过136个方程求解3个轴(x轴、y轴和z轴)的旋转角度R_t和沿3个轴方向的平移量t。
例如，通过公式 x=(A^T·A)^(-1)·A^T·b 求解包括136个方程的超定方程组。其中，A为由式(24)展开整理得到的136行系数矩阵，b为对应的136维常数向量，x为由待求的旋转与平移参数构成的向量。
对超定方程组求解最小二乘解,以防止个别关键点检测错误或误差过大使得最终人脸的位置估计结果产生偏差。
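例如，求解目标坐标系到相机坐标系的变换也可直接调用OpenCV的solvePnP函数，以下为示意代码（三维、二维关键点及相机内参均为占位数据，仅为一种可选的工程实现）：

import cv2
import numpy as np

object_points = np.random.rand(68, 3).astype(np.float32)   # 目标坐标系下的68个预设三维关键点(占位)
image_points = np.random.rand(68, 2).astype(np.float32)    # 对应的68个二维关键点(占位)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]], dtype=np.float32)           # 相机内参矩阵(占位)
dist = np.zeros(5, dtype=np.float32)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist, flags=cv2.SOLVEPNP_ITERATIVE)
R, _ = cv2.Rodrigues(rvec)                                   # 3*3旋转矩阵
camera_points = (R @ object_points.T + tvec).T               # 相机坐标系下的目标三维关键点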
本公开还提供了一种验证本公开目标三维人脸模板准确性的验证方法。
例如,在3D屏交互设备上固定安装深度传感器,通过matlab的Stereo Camera Calibrator工具箱或OpenCV库的stereoCalibrate函数等工具标定出深度传感器与3D屏交互设备的普通单目相机之间的变换矩阵T。
假设某时刻下，用户瞳孔在3D屏交互设备的相机坐标系中的3D坐标为[x,y,z]^T，其在深度传感器坐标系下的坐标为[x′,y′,z′]^T。两者的变换关系为：
[x′, y′, z′, 1]^T = T·[x, y, z, 1]^T
将深度传感器获取得到的瞳孔3D坐标作为真值,将根据本公开实施例确定的目标三维人脸模板确定的瞳孔3D坐标通过矩阵T转换为深度传感器坐标系下的3D坐标,并将转换后的3D坐标与真值对比,从而对本公开实施例确定的目标三维人脸模板的准确性进行验证。
验证结果表示,本公开实施例确定的目标三维人脸模板在不同观看距离上获取到的瞳孔3D坐标在x、y和z方向上误差均较小,且误差较为稳定。
误差可能由于深度传感器和相机之间变换矩阵的标定误差或人脸关键点检测误差 引入。由于本公开实施例确定的目标三维人脸模板的误差较为稳定,可在实际使用过程中,通过增加固定偏置弥补误差。本公开实施例确定的目标三维人脸模板的准确性和稳定性都较为可靠,具有实用价值。
基于上述图像处理方法,本公开还提供了一种图像处理装置。以下将结合图7对该装置进行详细描述。
图7示出了根据本公开实施例的图像处理装置的结构框图。
如图7所示,该实施例的图像处理装置700包括构建模块710、迭代模块720和确定模块730。
构建模块710用于利用多个样本人脸图像,构建初始三维人脸模板。在一实施例中,构建模块710可以用于执行前文描述的操作S110,在此不再赘述。
迭代模块720用于利用目标对象的人脸图像对初始三维人脸模板进行迭代优化,得到目标三维人脸模板。在一实施例中,迭代模块720可以用于执行前文描述的操作S120,在此不再赘述。
确定模块730用于根据目标对象的当前时刻人脸图像与目标三维人脸模板的对应关系,确定目标对象的当前时刻人脸位姿。在一实施例中,确定模块730可以用于执行前文描述的操作S130,在此不再赘述。
根据本公开的实施例，构建模块710、迭代模块720和确定模块730中的任意多个模块可以合并在一个模块中实现，或者其中的任意一个模块可以被拆分成多个模块。或者，这些模块中的一个或多个模块的至少部分功能可以与其他模块的至少部分功能相结合，并在一个模块中实现。根据本公开的实施例，构建模块710、迭代模块720和确定模块730中的至少一个可以至少被部分地实现为硬件电路，例如现场可编程门阵列（FPGA）、可编程逻辑阵列（PLA）、片上系统、基板上的系统、封装上的系统、专用集成电路（ASIC），或可以通过对电路进行集成或封装的任何其他的合理方式等硬件或固件来实现，或以软件、硬件以及固件三种实现方式中任意一种或以其中任意几种的适当组合来实现。或者，构建模块710、迭代模块720和确定模块730中的至少一个可以至少被部分地实现为计算机程序模块，当该计算机程序模块被运行时，可以执行相应的功能。
本公开还提供了一种交互设备,包括相机、处理器、驱动电路、输入/输出接口和屏幕。相机、处理器、驱动电路、输入/输出接口和屏幕依次电连接。
相机获取目标对象的人脸图像,并将人脸图像发送给处理器。
处理器利用人脸图像对初始三维人脸模板进行迭代优化,得到目标三维人脸模板。处理器再利用目标三维人脸模板进行人脸位姿估计,得到目标对象的瞳孔坐标。处理器可以根据瞳孔坐标,计算光栅开合序列。
驱动电路接收来自处理器的光栅开合序列,并控制输出接口输出光栅开合序列。
屏幕上设置由光栅阵列。屏幕根据光栅开合序列控制光栅阵列中光栅的开合。
在本公开实施例中,交互设备与本公开图2B所示的3D屏交互设备220类似。为了简明,本公开在此处不再赘述。
图8示出了根据本公开实施例的适于实现图像处理方法的电子设备的方框图。
如图8示,根据本公开实施例的电子设备800包括处理器801,其可以根据存储在只读存储器(ROM)802中的程序或者从存储部分808加载到随机访问存储器(RAM)803中的程序而执行各种适当的动作和处理。处理器801例如可以包括通用微处理器(例如CPU)、指令集处理器和/或相关芯片组和/或专用微处理器(例如,专用集成电路(ASIC))等等。处理器801还可以包括用于缓存用途的板载存储器。处理器801可以包括用于执行根据本公开实施例的方法流程的不同动作的单一处理单元或者是多个处理单元。
在RAM 803中,存储有电子设备800操作所需的各种程序和数据。处理器801、ROM 802以及RAM 803通过总线804彼此相连。处理器801通过执行ROM 802和/或RAM 803中的程序来执行根据本公开实施例的方法流程的各种操作。需要注意,所述程序也可以存储在除ROM 802和RAM 803以外的一个或多个存储器中。处理器801也可以通过执行存储在所述一个或多个存储器中的程序来执行根据本公开实施例的方法流程的各种操作。
根据本公开的实施例,电子设备800还可以包括输入/输出(I/O)接口805,输入/输出(I/O)接口805也连接至总线804。电子设备800还可以包括连接至I/O接口805的以下部件中的一项或多项:包括键盘、鼠标等的输入部分806;包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分807;包括硬盘等的存储部分808;以及包括诸如LAN卡、调制解调器等的网络接口卡的通信部分809。通信部分809经由诸如因特网的网络执行通信处理。驱动器810也根据需要连接至I/O接口805。可拆卸介质811,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器810上,以便于从其上读出的计算机程序根据需要被安装入存储部分808。
本公开还提供了一种计算机可读存储介质，该计算机可读存储介质可以是上述实施例中描述的设备/装置/系统中所包含的；也可以是单独存在，而未装配入该设备/装置/系统中。上述计算机可读存储介质承载有一个或者多个程序，当上述一个或者多个程序被执行时，实现根据本公开实施例的方法。
根据本公开的实施例,计算机可读存储介质可以是非易失性的计算机可读存储介质,例如可以包括但不限于:便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本公开中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行***、装置或者器件使用或者与其结合使用。例如,根据本公开的实施例,计算机可读存储介质可以包括上文描述的ROM 802和/或RAM 803和/或ROM 802和RAM 803以外的一个或多个存储器。
本公开的实施例还包括一种计算机程序产品,其包括计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。当计算机程序产品在计算机***中运行时,该程序代码用于使计算机***实现本公开实施例所提供的图像处理方法。
在该计算机程序被处理器801执行时执行本公开实施例的系统/装置中限定的上述功能。根据本公开的实施例，上文描述的系统、装置、模块、单元等可以通过计算机程序模块来实现。
在一种实施例中,该计算机程序可以依托于光存储器件、磁存储器件等有形存储介质。在另一种实施例中,该计算机程序也可以在网络介质上以信号的形式进行传输、分发,并通过通信部分809被下载和安装,和/或从可拆卸介质811被安装。该计算机程序包含的程序代码可以用任何适当的网络介质传输,包括但不限于:无线、有线等等,或者上述的任意合适的组合。
在这样的实施例中，该计算机程序可以通过通信部分809从网络上被下载和安装，和/或从可拆卸介质811被安装。在该计算机程序被处理器801执行时，执行本公开实施例的系统中限定的上述功能。根据本公开的实施例，上文描述的系统、设备、装置、模块、单元等可以通过计算机程序模块来实现。
根据本公开的实施例,可以以一种或多种程序设计语言的任意组合来编写用于执行本公开实施例提供的计算机程序的程序代码,具体地,可以利用高级过程和/或面向对象的编程语言、和/或汇编/机器语言来实施这些计算程序。程序设计语言包括但不限于诸如Java,C++,python,“C”语言或类似的程序设计语言。程序代码可以完全地在用户 计算设备上执行、部分地在用户设备上执行、部分在远程计算设备上执行、或者完全在远程计算设备或服务器上执行。在涉及远程计算设备的情形中,远程计算设备可以通过任意种类的网络,包括局域网(LAN)或广域网(WAN),连接到用户计算设备,或者,可以连接到外部计算设备(例如利用因特网服务提供商来通过因特网连接)。
附图中的流程图和框图，图示了按照本公开各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上，流程图或框图中的每个方框可以代表一个模块、子电路、程序段、或代码的一部分，上述模块、子电路、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意，在有些作为替换的实现中，方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如，两个接连地表示的方框实际上可以基本并行地执行，它们有时也可以按相反的顺序执行，这依所涉及的功能而定。也要注意的是，框图或流程图中的每个方框、以及框图或流程图中的方框的组合，可以用执行规定的功能或操作的专用的基于硬件的系统来实现，或者可以用专用硬件与计算机指令的组合来实现。
本领域技术人员可以理解,本公开的各个实施例和/或权利要求中记载的特征可以进行多种组合和/或结合,即使这样的组合或结合没有明确记载于本公开中。特别地,在不脱离本公开精神和教导的情况下,本公开的各个实施例和/或权利要求中记载的特征可以进行多种组合和/或结合。所有这些组合和/或结合均落入本公开的范围。
以上对本公开的实施例进行了描述。但是,这些实施例仅仅是为了说明的目的,而并非为了限制本公开的范围。尽管在以上分别描述了各实施例,但是这并不意味着各个实施例中的措施不能有利地结合使用。本公开的范围由所附权利要求及其等同物限定。不脱离本公开的范围,本领域技术人员可以做出多种替代和修改,这些替代和修改都应落在本公开的范围之内。

Claims (21)

  1. 一种图像处理方法,包括:
    利用多个样本人脸图像,构建初始三维人脸模板;
    利用目标对象的人脸图像对所述初始三维人脸模板进行迭代优化,得到目标三维人脸模板;以及
    根据所述目标对象的当前时刻人脸图像与所述目标三维人脸模板的对应关系,确定所述目标对象的当前时刻人脸位姿。
  2. 根据权利要求1所述的图像处理方法,其中,所述利用多个样本人脸图像,构建初始三维人脸模板,包括:
    分别从多个样本人脸图像的每个样本人脸图像中获取多个三维样本关键点;
    根据所述多个样本人脸图像的多个三维样本关键点,确定平均三维人脸模板;
    利用所述平均三维人脸模板,确定所述多个人脸样本图像的特征矩阵;以及
    根据迭代参数、所述平均三维人脸模板和所述特征矩阵,构建初始三维人脸模板。
  3. 根据权利要求2所述的图像处理方法,其中,所述利用所述平均三维人脸模板,确定所述多个人脸样本图像的特征矩阵,包括:
    利用所述平均三维人脸模板,对所述多个样本人脸图像的多个三维样本关键点进行去中心化处理,得到协方差矩阵;
    计算所述协方差矩阵的多个特征值和与所述多个特征值对应的多个特征向量;
    根据所述多个特征值在所述协方差矩阵中对线性投影的贡献值,从所述多个特征向量中确定的多个有效特征向量,所述多个有效特征向量对应的多个特征值的贡献值之和大于预设贡献值;以及
    根据所述多个有效特征向量,构建所述特征矩阵。
  4. 根据权利要求1所述的图像处理方法,其中,所述利用目标对象的人脸图像对所述初始三维人脸模板进行迭代优化,得到目标三维人脸模板,包括:
    从所述目标对象的人脸图像中获取多个二维目标关键点;
    从所述初始三维人脸模板中确定多个三维关键点;
    将所述多个三维关键点投影为多个二维投影关键点;
    计算所述多个二维投影关键点与所述多个二维目标关键点的平均误差;以及
    根据所述平均误差,对所述初始三维人脸模板进行迭代优化,得到所述目标三维人脸模板。
  5. 根据权利要求4所述的图像处理方法,其中,所述将所述多个三维关键点投影为多个二维投影关键点,包括:
    根据所述三维关键点的坐标值、缩放尺度、坐标系旋转矩阵和像素坐标系的中心点偏移向量,构建弱透视投影模型;以及
    通过所述弱透视投影模型,将所述多个三维关键点投影为多个二维投影关键点。
  6. 根据权利要求5所述的图像处理方法,其中,所述弱透视投影模型包括根据以下公式来将所述多个三维关键点投影为多个二维投影关键点:
    [x, y]^T = scale·R_2×3·[X, Y, Z]^T + [t_x, t_y]^T
    其中，x和y分别为二维投影关键点在像素坐标系的x轴和y轴的坐标值，X、Y和Z分别为三维关键点在目标对象所在坐标系的x轴、y轴和z轴的坐标值，scale为缩放尺度，R_2×3为目标对象所在坐标系相对于相机坐标系的旋转矩阵的前两行，t_x和t_y分别为所述像素坐标系原点相对于所述相机坐标系原点在x轴和y轴的偏移向量。
  7. 根据权利要求5所述的图像处理方法,其中,所述根据所述平均误差,对所述初始三维人脸模板进行迭代优化,得到目标三维人脸模板,包括:
    根据所述弱透视投影模型和迭代参数,构建迭代模型;
    确定所述迭代模型与多个二维投影关键点之间的映射函数;
    计算所述映射函数的雅克比矩阵,得到迭代优化后的二维迭代关键点;
    根据所述二维迭代关键点和来自所述人脸图像的多个二维目标关键点,计算平均误差;
    在确定所述平均误差不满足收敛条件的情况下,沿着所述雅克比矩阵的下降梯度方向,对所述迭代模型的参数进行更新,得到更新后的迭代模型,并返回所述确定所 述迭代模型与多个二维投影关键点之间的映射函数的操作;以及
    在确定所述平均误差满足收敛条件的情况下,确定迭代参数,并根据所述迭代参数构建目标三维人脸模板。
  8. 根据权利要求7所述的图像处理方法,其中,所述映射函数包括以下公式:
    [current_shape_2D_x, current_shape_2D_y]^T = f([scale, R_x, R_y, R_z, t_x, t_y, params])
    其中，[current_shape_2D_x, current_shape_2D_y]^T为多个二维投影关键点的坐标值矩阵，[scale, R_x, R_y, R_z, t_x, t_y, params]为迭代模型，scale为缩放尺度，R_x、R_y、R_z为目标对象所在坐标系相对于相机坐标系的旋转量，t_x和t_y分别为像素坐标系原点相对于所述相机坐标系原点在x轴和y轴的偏移向量，params为迭代参数。
  9. 根据权利要求7所述的图像处理方法,其中,所述沿着所述雅克比矩阵的下降梯度方向,对所述迭代模型的参数进行更新,得到更新后的迭代模型包括:
    根据所述雅克比矩阵的下降梯度方向和所述平均误差,计算所述迭代模型的参数变化量;以及
    根据所述参数变化量,更新所述迭代模型的参数,得到更新后的迭代模型。
  10. 根据权利要求9所述的图像处理方法,其中,所述根据所述参数变化量,更新所述迭代模型的参数,得到更新后的迭代模型包括根据以下公式更新迭代模型:
    [scale, R_x, R_y, R_z, t_x, t_y, params](更新后) = [scale, R_x, R_y, R_z, t_x, t_y, params](更新前) + delta
    其中，等号左侧为更新后的迭代模型，等号右侧第一项为更新前的迭代模型，delta为参数变化量。
  11. 根据权利要求4所述的图像处理方法,其中,所述计算所述多个二维投影关键点与所述多个二维目标关键点的平均误差,包括:
    根据所述多个二维投影关键点和所述多个二维目标关键点,计算重投影误差;以及
    根据所述重投影误差,计算平均误差。
  12. 根据权利要求11所述的图像处理方法,其中,所述根据所述重投影误差,计算平均误差包括根据以下公式计算平均误差:
    error取为重投影误差proj_err中全部残差分量的平均值，
    其中，error为平均误差，proj_err为重投影误差，proj_err=landmarks_2D-current_shape_2D，其中landmarks_2D为二维目标关键点的坐标值，current_shape_2D为二维投影关键点的坐标值。
  13. 根据权利要求1所述的图像处理方法,其中,所述根据所述目标对象的当前时刻人脸图像与所述目标三维人脸模板的对应关系,确定所述目标对象的当前时刻人脸位姿,包括:
    从所述目标三维人脸模板的多个三维关键点中确定所述目标对象的多个预设三维关键点,所述多个预设三维关键点处于目标坐标系,所述多个预设三维关键点与所述目标对象的当前人脸图像的多个指定二维关键点对应;
    根据来自相机的所述人脸图像所处的像素坐标系和所述目标坐标系之间的对应关系,确定相机坐标系与所述目标坐标系之间的变换矩阵;
    根据所述变换矩阵,将所述多个预设三维关键点转换成多个目标三维关键点,所述多个目标三维关键点处于所述相机坐标系;以及
    根据所述多个目标三维关键点,确定所述目标对象的当前时刻人脸位姿。
  14. 根据权利要求13所述的图像处理方法,其中,所述根据所述人脸图像所处的像素坐标系和所述目标坐标系之间的对应关系,确定相机坐标系与所述目标坐标系之间的变换矩阵,包括根据以下公式确定变换矩阵:
    c·[x, y, 1]^T = K·[R|t]·[X, Y, Z, 1]^T
    其中，c为所述相机的尺度，x和y分别为二维投影关键点在所述像素坐标系的x轴和y轴的坐标值，X、Y和Z分别为预设三维关键点在所述目标坐标系的x轴、y轴和z轴的坐标值，K为所述相机内参矩阵，[R|t]为变换矩阵。
  15. 根据权利要求4所述的图像处理方法,其中,所述从所述目标对象的人脸图像中获取多个二维目标关键点,包括:
    对所述人脸图像进行畸变校正,得到校正后人脸图像;以及
    利用关键点检测算法,从所述校正后人脸图像中确定多个二维目标关键点。
  16. 根据权利要求15所述的图像处理方法,其中,所述对所述人脸图像进行畸变校正,得到校正后人脸图像,包括:
    根据以下公式来对所述人脸图像进行畸变校正:
    x = x_0·(1 + k_1·r^2 + k_2·r^4 + k_3·r^6) + 2·p_1·x_0·y_0 + p_2·(r^2 + 2·x_0^2)
    y = y_0·(1 + k_1·r^2 + k_2·r^4 + k_3·r^6) + p_1·(r^2 + 2·y_0^2) + 2·p_2·x_0·y_0
    其中，x_0和y_0为所述人脸图像上的任意一坐标点在x轴和y轴的坐标值，x和y为所述校正后人脸图像上的对应坐标点在x轴和y轴的坐标值，r为所述人脸图像的中心点与坐标点(x,y)的距离，k_1、k_2和k_3为径向畸变系数，p_1和p_2为切向畸变系数。
  17. 一种图像处理装置,包括:
    构建模块,用于利用多个样本人脸图像,构建初始三维人脸模板;
    迭代模块,用于利用目标对象的人脸图像对所述初始三维人脸模板进行迭代优化, 得到目标三维人脸模板;以及
    确定模块,用于根据所述目标对象的当前时刻人脸图像与所述目标三维人脸模板的对应关系,确定所述目标对象的当前时刻人脸位姿。
  18. 一种交互设备,包括:
    相机,用于获取目标对象的人脸图像;
    处理器,与所述相机电连接,用于:
    利用所述人脸图像对初始三维人脸模板进行迭代优化,得到目标三维人脸模板;
    利用所述目标三维人脸模板进行人脸位姿估计,得到所述目标对象的瞳孔坐标;以及
    根据所述瞳孔坐标,计算光栅开合序列;
    驱动电路,与所述处理器电连接,用于控制输出接口输出所述光栅开合序列;以及
    屏幕,与所述驱动电路电连接,用于根据所述光栅开合序列控制屏幕中光栅的开合。
  19. 一种电子设备,包括:
    一个或多个处理器;
    存储器,用于存储一个或多个程序,
    其中,当所述一个或多个程序被所述一个或多个处理器执行时,使得所述一个或多个处理器实现权利要求1至16中任一项所述的方法。
  20. 一种计算机可读存储介质,其上存储有可执行指令,该指令被处理器执行时使处理器实现权利要求1至16中任一项所述的方法。
  21. 一种计算机程序产品,包括计算机程序,所述计算机程序被处理器执行时实现权利要求1至16中任一项所述的方法。
PCT/CN2022/135733 2022-12-01 2022-12-01 图像处理方法、装置、交互设备、电子设备和存储介质 WO2024113290A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/135733 WO2024113290A1 (zh) 2022-12-01 2022-12-01 图像处理方法、装置、交互设备、电子设备和存储介质

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/135733 WO2024113290A1 (zh) 2022-12-01 2022-12-01 图像处理方法、装置、交互设备、电子设备和存储介质

Publications (1)

Publication Number Publication Date
WO2024113290A1 true WO2024113290A1 (zh) 2024-06-06

Family

ID=91322825

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/135733 WO2024113290A1 (zh) 2022-12-01 2022-12-01 图像处理方法、装置、交互设备、电子设备和存储介质

Country Status (1)

Country Link
WO (1) WO2024113290A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108495116A (zh) * 2018-03-29 2018-09-04 京东方科技集团股份有限公司 3d显示装置及其控制方法、计算机设备
CN111768477A (zh) * 2020-07-06 2020-10-13 网易(杭州)网络有限公司 三维人脸表情基建立方法及装置、存储介质及电子设备
US20210012523A1 (en) * 2018-12-25 2021-01-14 Zhejiang Sensetime Technology Development Co., Ltd. Pose Estimation Method and Device and Storage Medium
CN114266860A (zh) * 2021-12-22 2022-04-01 西交利物浦大学 三维人脸模型建立方法、装置、电子设备及存储介质
CN115049738A (zh) * 2021-03-08 2022-09-13 广东博智林机器人有限公司 人与相机之间距离的估计方法及系统
CN115335865A (zh) * 2021-01-07 2022-11-11 广州视源电子科技股份有限公司 虚拟图像构建方法、装置、设备及存储介质



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22966881

Country of ref document: EP

Kind code of ref document: A1