CN105469018B - Method and equipment for positioning human eyes

Publication number: CN105469018B (granted patent for application CN201410388103.XA; legal status: Active)
Other versions: CN105469018A (Chinese)
Inventors: 王勃飞, 邓伟洪, 张殿凯, 雷晨雨, 瞿广财
Assignee (original and current): ZTE Corp
Priority applications: CN201410388103.XA; PCT/CN2015/072055 (WO2016019715A1)
Classifications: G06V10/00 (arrangements for image or video recognition or understanding); G06V10/10 (image acquisition); G06V10/20 (image preprocessing)

Abstract

The embodiments of the invention disclose a method and equipment for positioning human eyes. The method comprises the following steps: preprocessing an original face image to obtain an image to be processed; filtering each pixel point of the image to be processed according to a pre-trained Average of Synthetic Exact Filters (ASEF) template to obtain a filtering response value corresponding to each pixel point of the image to be processed; selecting a preset number of human eye candidate points from all pixel points of the image to be processed in descending order of their corresponding filtering response values; and determining the human eye position point from the preset number of human eye candidate points through the parameters of a pre-trained fast local linear support vector machine (SVM).

Description

Method and equipment for positioning human eyes
Technical Field
The present invention relates to image processing technologies, and in particular, to a method and an apparatus for positioning human eyes.
Background
In facial image processing tasks such as face tracking, face recognition, expression analysis, and eye-based control, eye localization is a critical step: the eyes are the most stable and prominent features of the face, and they provide enough gradient information for accurate localization.
Currently, among the many methods for locating human eyes, classifier methods based on the support vector machine (SVM) achieve high precision, but their computational time complexity is O(n²). Because the eyes move quickly, such high time complexity makes it difficult to meet the requirement of fast eye localization in practical applications, so positioning efficiency is low.
Disclosure of Invention
In order to solve the above technical problems, embodiments of the present invention are expected to provide a method and an apparatus for positioning human eyes, which can reduce the time complexity of human eye positioning under a high-precision condition, so as to achieve higher positioning efficiency while maintaining high-precision human eye positioning.
The technical scheme of the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides a method for positioning a human eye, where the method may include:
preprocessing an original face image to obtain an image to be processed;
filtering each pixel point of the image to be processed according to a pre-trained Average of Synthetic Exact Filters (ASEF) template to obtain a filtering response value corresponding to each pixel point of the image to be processed;
selecting a preset number of human eye candidate points from all pixel points of the image to be processed in descending order of their corresponding filtering response values;
and determining the human eye position point from the preset number of human eye candidate points through the parameters of a pre-trained fast local linear support vector machine (SVM).
Further, the method further comprises:
carrying out face detection on a preset number of sample images, normalizing the detected face images to a preset size, and carrying out Gaussian smoothing to obtain a preset number of face sample images;
and training the parameters of the ASEF template and the rapid local linear SVM through a preset number of face sample images.
Further, the training of the ASEF template through a preset number of face sample images includes:
windowing the ith face sample image Im_orign(x_i, y_i) and performing a two-dimensional fast Fourier transform on the windowed image to obtain the frequency-domain image Im_orign_FFT(u_i, v_i) of the ith face sample image, where i is the index of the face sample image, a positive integer running from 1 to N, N being the preset number; (x_i, y_i) denotes the ith face sample image in the image domain and (u_i, v_i) denotes the ith face sample image in the frequency domain;
according to the actual eye position (x_0, y_0) in the ith face sample image Im_orign(x_i, y_i), obtaining the Gaussian impulse response that the frequency-domain image Im_orign_FFT(u_i, v_i) should yield after passing through the filter template,

Response_im(x_i, y_i) = e^(−((x_i − x_0)² + (y_i − y_0)²) / σ²)

and performing a two-dimensional fast Fourier transform on the Gaussian impulse response Response_im(x_i, y_i) to obtain its frequency-domain response Response_im_FFT(u_i, v_i), where e is the base of the natural logarithm and σ is the variance of the Gaussian impulse response Response_im(x_i, y_i);
obtaining the filter template Filter_FFT(u_i, v_i) corresponding to the ith face sample image Im_orign(x_i, y_i) according to the following formula,

Filter_FFT(u_i, v_i) = Response_im_FFT(u_i, v_i) / Im_orign_FFT(u_i, v_i)

and averaging the filter templates corresponding to all face sample images according to the following formula to obtain the ASEF template Filter_FFT(u, v),

Filter_FFT(u, v) = (1/N) · Σ_{i=1}^{N} Filter_FFT(u_i, v_i)

where the symbol Σ is the summation operator.
Further, obtaining, according to the actual eye position (x_0, y_0) in the ith face sample image Im_orign(x_i, y_i), the Gaussian impulse response that the frequency-domain image Im_orign_FFT(u_i, v_i) should yield after passing through the filter template, and performing a two-dimensional fast Fourier transform on the Gaussian impulse response Response_im(x_i, y_i) to obtain its frequency-domain response Response_im_FFT(u_i, v_i), includes:
according to the actual position (x_0L, y_0L) of the left eye in the ith face sample image Im_orign(x_i, y_i), obtaining the left-eye Gaussian impulse response corresponding to the frequency-domain image Im_orign_FFT(u_i, v_i) after passing through the filter template,

L_Response_im(x_i, y_i) = e^(−((x_i − x_0L)² + (y_i − y_0L)²) / σ²)

and performing a two-dimensional fast Fourier transform on the left-eye Gaussian impulse response L_Response_im(x_i, y_i) to obtain its frequency-domain response L_Response_im_FFT(u_i, v_i);
according to the actual position (x_0R, y_0R) of the right eye in the ith face sample image Im_orign(x_i, y_i), obtaining the right-eye Gaussian impulse response corresponding to the frequency-domain image Im_orign_FFT(u_i, v_i) after passing through the filter template,

R_Response_im(x_i, y_i) = e^(−((x_i − x_0R)² + (y_i − y_0R)²) / σ²)

and performing a two-dimensional fast Fourier transform on the right-eye Gaussian impulse response R_Response_im(x_i, y_i) to obtain its frequency-domain response R_Response_im_FFT(u_i, v_i);
accordingly, the filter template Filter_FFT(u_i, v_i) corresponding to the ith face sample image Im_orign(x_i, y_i) comprises: a left-eye filter template L_Filter_FFT(u_i, v_i) and a right-eye filter template R_Filter_FFT(u_i, v_i), where

L_Filter_FFT(u_i, v_i) = L_Response_im_FFT(u_i, v_i) / Im_orign_FFT(u_i, v_i)
R_Filter_FFT(u_i, v_i) = R_Response_im_FFT(u_i, v_i) / Im_orign_FFT(u_i, v_i)

and the ASEF template Filter_FFT(u, v) comprises: a left-eye ASEF template L_Filter_FFT(u, v) and a right-eye ASEF template R_Filter_FFT(u, v), where

L_Filter_FFT(u, v) = (1/N) · Σ_{i=1}^{N} L_Filter_FFT(u_i, v_i)
R_Filter_FFT(u, v) = (1/N) · Σ_{i=1}^{N} R_Filter_FFT(u_i, v_i).
further, training the parameters of the fast local linear SVM through a preset number of face sample images, including:
generating corresponding human eye sample images from a preset number of human face sample images, and forming a sample matrix X from all the human eye sample images through dimensionality reduction;
obtaining a correlation matrix K between any two human eye sample images according to the sample matrix X;
acquiring a support vector of the sample matrix X and a weight corresponding to the support vector through the correlation matrix K and an SVM optimization algorithm;
and calculating parameters of the fast local linear SVM according to the support vector of the sample matrix X and the weight corresponding to the support vector.
Further, generating corresponding eye sample images from a preset number of face sample images and forming a sample matrix X from all the eye sample images through dimensionality reduction includes:
setting the ith face sample image as Im_orign(x_i, y_i), where i is the index of the face sample image, a positive integer running from 1 to N, N being the preset number, and (x_i, y_i) denotes the ith face sample image in the image domain;
after acquiring the positive eye samples and negative eye samples of all face sample images, removing the mean of each sample and normalizing its ℓ2 norm to generate the eye sample images corresponding to each face sample image;
and stretching the ith eye sample image into a column vector and taking that column vector as the ith column of the sample matrix X.
Further, the obtaining of the correlation matrix K between any two eye sample images according to the sample matrix X includes:
performing singular value decomposition (SVD) on the sample matrix X to obtain the result shown in the following formula:

X = V Σ U^H

where the first decomposition matrix V and the third decomposition matrix U are unitary matrices, the second decomposition matrix Σ is a positive semi-definite diagonal matrix, and H denotes the conjugate transpose;
obtaining the code generation matrix G from the sub-matrix V_t, composed of the first t column vectors of the first decomposition matrix V, and the sub-matrix Σ_t, composed of the first t rows and first t columns of the second decomposition matrix Σ:

G = V_t · Σ_t^(−1)

where the superscript −1 denotes inversion of the sub-matrix Σ_t;
according to the code generation matrix G, the column vector x_i corresponding to the ith eye sample image, and the column vector x_j corresponding to the jth eye sample image, obtaining the element K_ij in the ith row and jth column of the correlation matrix K by the following formula:

K_ij = (G^T x_i)^T (G^T x_j) · x_i^T x_j

where K_ij represents the degree of correlation between the column vector x_i corresponding to the ith eye sample image and the column vector x_j corresponding to the jth eye sample image, and T denotes the transposition operation.
Further, the calculating of the parameters of the fast local linear SVM according to the support vectors of the sample matrix X and the weights corresponding to the support vectors includes:
obtaining an intermediate matrix A from the support vectors of the sample matrix X, the weights corresponding to the support vectors, and the code generation matrix G by the following formula:

A = Σ_{j=1}^{M} α_j · y_j · (G G^T s_j) s_j^T

where M is the number of support vectors of the sample matrix X, s_j denotes the jth support vector of the sample matrix X, α_j denotes the weight corresponding to the jth support vector, and y_j denotes the training sample label corresponding to the jth support vector;
obtaining a symmetric matrix A′ from the intermediate matrix A by A′ = (A + A^T);
performing eigenvalue decomposition on the symmetric matrix A′ to obtain its eigenvalues and the eigenvectors corresponding to the eigenvalues;
and selecting, in descending order of the eigenvalues of the symmetric matrix A′, P selected eigenvalues and the eigenvectors corresponding to them, taking the P selected eigenvalues and their corresponding eigenvectors as the parameters of the fast local linear SVM.
Further, the determining of the human eye position point from the preset number of eye candidate points through the parameters of the pre-trained fast local linear support vector machine SVM includes:
stretching the pixel block in the neighborhood of the kth eye candidate point into a to-be-processed vector z_k;
obtaining a candidate decision value V1 of the kth eye candidate point from the to-be-processed vector z_k, the P selected eigenvalues, and the eigenvectors corresponding to the P selected eigenvalues by the following formula:

V1 = Σ_{i=1}^{P} λ_i · (q_i^T z_k)²

where λ_i denotes the ith selected eigenvalue and q_i denotes the eigenvector corresponding to the ith selected eigenvalue;
adding the candidate decision value V1 of the kth eye candidate point and the filtering response value corresponding to the kth eye candidate point to obtain the final decision value of the kth eye candidate point;
and selecting the eye candidate point with the highest final decision value among the preset number of eye candidate points as the human eye position point.
In a second aspect, an embodiment of the present invention provides an apparatus for positioning a human eye, where the apparatus includes: a preprocessing unit, a filtering unit, a selecting unit and a determining unit, wherein,
the preprocessing unit is used for preprocessing the original face image to obtain an image to be processed;
the filtering unit is used for filtering each pixel point of the image to be processed according to a pre-trained Average of Synthetic Exact Filters (ASEF) template to obtain a filtering response value corresponding to each pixel point of the image to be processed;
the selecting unit is used for selecting a preset number of human eye candidate points from all pixel points of the image to be processed in descending order of the filtering response values corresponding to the pixel points;
and the determining unit is used for determining the human eye position points from the preset number of human eye candidate points through the parameters of the pre-trained fast local linear support vector machine SVM.
Further, the apparatus further comprises: a detection unit, a first training unit and a second training unit, wherein,
the detection unit is used for carrying out face detection on a preset number of sample images, normalizing the detected face images to a preset size, and then carrying out Gaussian smoothing to obtain a preset number of face sample images;
the first training unit is used for training the ASEF template through a preset number of face sample images;
the second training unit is used for training the parameters of the fast local linear SVM through a preset number of face sample images.
Further, the first training unit is configured to:
window the ith face sample image Im_orign(x_i, y_i) and perform a two-dimensional fast Fourier transform on the windowed image to obtain the frequency-domain image Im_orign_FFT(u_i, v_i) of the ith face sample image, where i is the index of the face sample image, a positive integer running from 1 to N, N being the preset number; (x_i, y_i) denotes the ith face sample image in the image domain and (u_i, v_i) denotes the ith face sample image in the frequency domain;
obtain, according to the actual eye position (x_0, y_0) in the ith face sample image Im_orign(x_i, y_i), the Gaussian impulse response that the frequency-domain image Im_orign_FFT(u_i, v_i) should yield after passing through the filter template,

Response_im(x_i, y_i) = e^(−((x_i − x_0)² + (y_i − y_0)²) / σ²)

and perform a two-dimensional fast Fourier transform on the Gaussian impulse response Response_im(x_i, y_i) to obtain its frequency-domain response Response_im_FFT(u_i, v_i), where e is the base of the natural logarithm and σ is the variance of the Gaussian impulse response Response_im(x_i, y_i);
obtain the filter template Filter_FFT(u_i, v_i) corresponding to the ith face sample image Im_orign(x_i, y_i) according to the following formula,

Filter_FFT(u_i, v_i) = Response_im_FFT(u_i, v_i) / Im_orign_FFT(u_i, v_i)

and average the filter templates corresponding to all face sample images according to the following formula to obtain the ASEF template Filter_FFT(u, v),

Filter_FFT(u, v) = (1/N) · Σ_{i=1}^{N} Filter_FFT(u_i, v_i)

where the symbol Σ is the summation operator.
Specifically, the first training unit is configured to:
obtain, according to the actual position (x_0L, y_0L) of the left eye in the ith face sample image Im_orign(x_i, y_i), the left-eye Gaussian impulse response corresponding to the frequency-domain image Im_orign_FFT(u_i, v_i) after passing through the filter template,

L_Response_im(x_i, y_i) = e^(−((x_i − x_0L)² + (y_i − y_0L)²) / σ²)

and perform a two-dimensional fast Fourier transform on the left-eye Gaussian impulse response L_Response_im(x_i, y_i) to obtain its frequency-domain response L_Response_im_FFT(u_i, v_i);
and obtain, according to the actual position (x_0R, y_0R) of the right eye in the ith face sample image Im_orign(x_i, y_i), the right-eye Gaussian impulse response corresponding to the frequency-domain image Im_orign_FFT(u_i, v_i) after passing through the filter template,

R_Response_im(x_i, y_i) = e^(−((x_i − x_0R)² + (y_i − y_0R)²) / σ²)

and perform a two-dimensional fast Fourier transform on the right-eye Gaussian impulse response R_Response_im(x_i, y_i) to obtain its frequency-domain response R_Response_im_FFT(u_i, v_i);
accordingly, the filter template Filter_FFT(u_i, v_i) corresponding to the ith face sample image Im_orign(x_i, y_i) comprises: a left-eye filter template L_Filter_FFT(u_i, v_i) and a right-eye filter template R_Filter_FFT(u_i, v_i), where

L_Filter_FFT(u_i, v_i) = L_Response_im_FFT(u_i, v_i) / Im_orign_FFT(u_i, v_i)
R_Filter_FFT(u_i, v_i) = R_Response_im_FFT(u_i, v_i) / Im_orign_FFT(u_i, v_i)

and the ASEF template Filter_FFT(u, v) comprises: a left-eye ASEF template L_Filter_FFT(u, v) and a right-eye ASEF template R_Filter_FFT(u, v), where

L_Filter_FFT(u, v) = (1/N) · Σ_{i=1}^{N} L_Filter_FFT(u_i, v_i)
R_Filter_FFT(u, v) = (1/N) · Σ_{i=1}^{N} R_Filter_FFT(u_i, v_i).
further, the second training unit includes: a dimension reduction subunit, an acquisition subunit, and a computation subunit, wherein,
the dimension reduction subunit is used for generating corresponding human eye sample images from a preset number of human face sample images and forming a sample matrix X from all the human eye sample images through dimension reduction;
the obtaining subunit is configured to obtain a correlation matrix K between any two human eye sample images according to the sample matrix X; and the number of the first and second groups,
acquiring a support vector of the sample matrix X and a weight corresponding to the support vector through the correlation matrix K and an SVM optimization algorithm;
and the calculating subunit is configured to calculate a parameter of the fast local linear SVM according to the support vector of the sample matrix X and the weight corresponding to the support vector.
Further, the dimension reduction subunit is configured to:
set the ith face sample image as Im_orign(x_i, y_i), where i is the index of the face sample image, a positive integer running from 1 to N, N being the preset number, and (x_i, y_i) denotes the ith face sample image in the image domain;
after acquiring the positive eye samples and negative eye samples of all face sample images, remove the mean of each sample and normalize its ℓ2 norm to generate the eye sample images corresponding to each face sample image;
and stretch the ith eye sample image into a column vector and take that column vector as the ith column of the sample matrix X.
Further, the obtaining subunit is configured to:
perform singular value decomposition (SVD) on the sample matrix X to obtain the result shown in the following formula:

X = V Σ U^H

where the first decomposition matrix V and the third decomposition matrix U are unitary matrices, the second decomposition matrix Σ is a positive semi-definite diagonal matrix, and H denotes the conjugate transpose;
obtain the code generation matrix G from the sub-matrix V_t, composed of the first t column vectors of the first decomposition matrix V, and the sub-matrix Σ_t, composed of the first t rows and first t columns of the second decomposition matrix Σ:

G = V_t · Σ_t^(−1)

where the superscript −1 denotes inversion of the sub-matrix Σ_t;
and, according to the code generation matrix G, the column vector x_i corresponding to the ith eye sample image, and the column vector x_j corresponding to the jth eye sample image, obtain the element K_ij in the ith row and jth column of the correlation matrix K by the following formula:

K_ij = (G^T x_i)^T (G^T x_j) · x_i^T x_j

where K_ij represents the degree of correlation between the column vector x_i corresponding to the ith eye sample image and the column vector x_j corresponding to the jth eye sample image, and T denotes the transposition operation.
Further, the calculating subunit is configured to:
obtain an intermediate matrix A from the support vectors of the sample matrix X, the weights corresponding to the support vectors, and the code generation matrix G by the following formula:

A = Σ_{j=1}^{M} α_j · y_j · (G G^T s_j) s_j^T

where M is the number of support vectors of the sample matrix X, s_j denotes the jth support vector of the sample matrix X, α_j denotes the weight corresponding to the jth support vector, and y_j denotes the training sample label corresponding to the jth support vector;
obtain a symmetric matrix A′ from the intermediate matrix A by A′ = (A + A^T);
perform eigenvalue decomposition on the symmetric matrix A′ to obtain its eigenvalues and the eigenvectors corresponding to the eigenvalues;
and select, in descending order of the eigenvalues of the symmetric matrix A′, P selected eigenvalues and the eigenvectors corresponding to them, taking the P selected eigenvalues and their corresponding eigenvectors as the parameters of the fast local linear SVM.
Further, the determining unit is configured to:
stretch the pixel block in the neighborhood of the kth eye candidate point into a to-be-processed vector z_k;
obtain a candidate decision value V1 of the kth eye candidate point from the to-be-processed vector z_k, the P selected eigenvalues, and the eigenvectors corresponding to the P selected eigenvalues by the following formula:

V1 = Σ_{i=1}^{P} λ_i · (q_i^T z_k)²

where λ_i denotes the ith selected eigenvalue and q_i denotes the eigenvector corresponding to the ith selected eigenvalue;
add the candidate decision value V1 of the kth eye candidate point and the filtering response value corresponding to the kth eye candidate point to obtain the final decision value of the kth eye candidate point;
and select the eye candidate point with the highest final decision value among the preset number of eye candidate points as the human eye position point.
The embodiments of the invention provide a method and equipment for positioning human eyes, in which the eye positions in an image to be processed are determined through a pre-trained ASEF template and the parameters of a pre-trained fast local linear SVM; this reduces the time complexity of eye localization under high-precision conditions, achieving high positioning efficiency while maintaining high-precision eye localization.
Drawings
Fig. 1 is a schematic flow chart of a method for positioning human eyes according to an embodiment of the present invention;
fig. 2 is a schematic flow chart illustrating filtering performed on each pixel point of an image to be processed according to a pre-trained ASEF template according to an embodiment of the present invention;
fig. 3 is a schematic flowchart of determining a position point of a human eye according to an embodiment of the present invention;
fig. 4 is a schematic flow chart of a method for training parameters of an ASEF template and a fast local linear SVM according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a Gaussian impulse response provided by an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an apparatus for positioning human eyes according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of another human eye positioning device according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
Referring to fig. 1, a flow chart of a method for positioning a human eye according to an embodiment of the present invention is shown, where the method may include:
S101: preprocessing an original face image to obtain an image to be processed;
exemplarily, since the face image is obtained under natural conditions, the original face image needs to be preprocessed before being processed, so as to remove interference caused by uneven illumination, noise and other factors in the original face image;
in this embodiment, the process of preprocessing the original face image may include:
firstly, the original face image is detected from the originally captured image by an ordinary face detector;
then, the original face image is normalized to an image of a preset size; in this embodiment the preset size is 100 × 100 pixels;
next, Gaussian smoothing is applied to the normalized image, and the pixel value at each position of the normalized image is divided by the pixel value at the corresponding position of the Gaussian-smoothed image, which completes the preprocessing of the original face image and yields an image to be processed that is free of the influence of uneven illumination, noise, and the like.
It should be noted that, because the preprocessing is used to remove the influence of uneven illumination, noise, etc. in the original image, the embodiment of the present invention is not limited to use only one preprocessing method, and other preprocessing methods that can achieve corresponding effects may also be applied to the embodiment of the present invention.
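For illustration only, the preprocessing described above can be sketched in a few lines of Python with numpy and OpenCV; the helper name, the 5 × 5 smoothing kernel, and the stabilizing constant eps are assumptions of this sketch, not details taken from the patent:

    import cv2
    import numpy as np

    def preprocess_face(face_img, size=100, eps=1.0):
        """Normalize a detected face crop and suppress uneven illumination.

        Follows the embodiment: resize the detected face (a BGR crop) to the
        preset size (100 x 100 pixels), apply Gaussian smoothing, then divide
        each pixel of the normalized image by the corresponding pixel of its
        smoothed version; eps guards against division by zero.
        """
        gray = cv2.cvtColor(face_img, cv2.COLOR_BGR2GRAY).astype(np.float64)
        norm = cv2.resize(gray, (size, size))
        smooth = cv2.GaussianBlur(norm, (5, 5), sigmaX=2.0)
        return norm / (smooth + eps)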
S102: filtering each pixel point of the image to be processed according to a pre-trained ASEF template to obtain a corresponding filtering response value of each pixel point of the image to be processed;
it should be noted that the pre-trained ASEF template essentially belongs to a filter, and therefore, a specific process of filtering each pixel point of the image to be processed according to the pre-trained ASEF template may be as shown in fig. 2: taking each pixel point of the image to be processed as an input value IN of the pre-trained ASEF template, and obtaining an output value OUT which is a filter response value corresponding to each pixel point of the image to be processed after the filtering processing of the pre-trained ASEF template;
as for the filtering process shown in fig. 2, it can be understood that, in the image domain, the convolution operation is performed on the image to be processed and the pre-trained ASEF template; in the frequency domain, the frequency spectrum of the image to be processed is multiplied by the frequency spectrum of the ASEF template which is trained in advance.
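As a minimal sketch of this frequency-domain view (assuming the trained template is stored as its frequency-domain array filter_fft; the function name is illustrative):

    import numpy as np

    def filter_responses(img, filter_fft):
        """Correlate the preprocessed image with the ASEF template.

        Per the text above, multiplying the spectrum of the image to be
        processed by the spectrum of the pre-trained ASEF template in the
        frequency domain corresponds to the image-domain convolution; the
        real part of the inverse FFT gives one filtering response value per
        pixel (IN -> OUT in fig. 2).
        """
        return np.real(np.fft.ifft2(np.fft.fft2(img) * filter_fft))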
S103: selecting a preset number of human eye candidate points from all pixel points of the image to be processed in descending order of the filtering response values corresponding to the pixel points;
S104: determining human eye position points from the preset number of human eye candidate points through the parameters of a pre-trained fast local linear SVM;
for example, in step S104, a corresponding determination value may be determined for each eye candidate point through a pre-trained parameter of the fast local linear SVM, so as to determine a human eye position point in the human eye candidate points; however, in order to improve the accuracy of determining the eye position points, the determination value corresponding to each eye candidate point may be considered in combination with the filter response value corresponding to the eye candidate point obtained in step S102 to obtain a final determination value corresponding to each eye candidate point, and the eye candidate point with the highest final determination value is used as the eye position point;
specifically, referring to fig. 3, determining the human eye position point from the preset number of human eye candidate points through the parameters of the pre-trained fast local linear support vector machine SVM may include S1041 to S1044:
S1041: stretching the pixel block in the neighborhood of the kth eye candidate point into a to-be-processed vector z_k;
S1042: obtaining a candidate decision value V1 of the kth eye candidate point from the to-be-processed vector z_k, the P selected eigenvalues, and the eigenvectors corresponding to the P selected eigenvalues by the following formula:

V1 = Σ_{i=1}^{P} λ_i · (q_i^T z_k)²

where the P selected eigenvalues and their corresponding eigenvectors are the parameters of the pre-trained fast local linear support vector machine SVM, λ_i denotes the ith selected eigenvalue, q_i denotes the eigenvector corresponding to the ith selected eigenvalue, and T is the transposition operation;
S1043: adding the candidate decision value V1 of the kth eye candidate point and the filtering response value corresponding to the kth eye candidate point to obtain the final decision value of the kth eye candidate point;
S1044: selecting the eye candidate point with the highest final decision value among the preset number of eye candidate points as the human eye position point.
It can be understood that, in the prior art, a method that locates the eyes directly with an SVM must evaluate every pixel point of the face image, so its time complexity is O(n²). In the eye localization method provided in this embodiment, the fast local linear SVM only evaluates the preset number of eye candidate points, so the time complexity of the method is O(n); the time complexity of the eye localization process is thus reduced and the efficiency improved.
It should be noted that, since the pre-trained ASEF template and the pre-trained fast local linear SVM parameters are used in step S102 and step S104, respectively, the parameters of the ASEF template and the fast local linear SVM need to be trained before the method shown in fig. 1 locates the human eye. Therefore, on the basis of the embodiment shown in fig. 1, referring to fig. 4, it shows a flow of a training method for parameters of an ASEF template and a fast local linear SVM provided in an embodiment of the present invention, where the training method may be obtained by training a preset number of existing sample images, and the specific process may include:
s401: carrying out face detection on a preset number of sample images, normalizing the detected face images to a preset size, and carrying out Gaussian smoothing to obtain a preset number of face sample images;
optionally, the specific process of S401 may be the same as the specific process of S101, and is not described herein again.
S402: training parameters of the ASEF template and the fast local linear SVM through a preset number of face sample images;
exemplarily, the parameters of the ASEF template and the fast local linear SVM may be trained through a preset number of face sample images, respectively; therefore, the following describes in detail the training process of the parameters of the ASEF template and the fast local linear SVM respectively through a preset number of face sample images according to the embodiment of the present invention.
Preferably, the training of the ASEF template through a preset number of face sample images may include:
First, the ith face sample image Im_orign(x_i, y_i) is windowed, and a two-dimensional fast Fourier transform is performed on the windowed image to obtain the frequency-domain image Im_orign_FFT(u_i, v_i) of the ith face sample image;
where i is the index of the face sample image, a positive integer running from 1 to N, N being the preset number; (x_i, y_i) denotes the ith face sample image in the image domain and (u_i, v_i) denotes the ith face sample image in the frequency domain. In this embodiment, the ith face sample image Im_orign(x_i, y_i) may first be windowed with a cosine window before the two-dimensional fast Fourier transform is performed.
Secondly, according to the actual eye position (x_0, y_0) in the ith face sample image Im_orign(x_i, y_i), the Gaussian impulse response that the frequency-domain image Im_orign_FFT(u_i, v_i) should yield after passing through the filter template is obtained,

Response_im(x_i, y_i) = e^(−((x_i − x_0)² + (y_i − y_0)²) / σ²)

and a two-dimensional fast Fourier transform is performed on the Gaussian impulse response Response_im(x_i, y_i) to obtain its frequency-domain response Response_im_FFT(u_i, v_i), where e is the base of the natural logarithm and σ is the variance of the Gaussian impulse response Response_im(x_i, y_i); the Gaussian impulse response may be as shown in fig. 5.
As will be appreciated, since the eyes divide into a left eye and a right eye, this step may specifically comprise:
according to the actual position (x_0L, y_0L) of the left eye in the ith face sample image Im_orign(x_i, y_i), obtaining the left-eye Gaussian impulse response corresponding to the frequency-domain image Im_orign_FFT(u_i, v_i) after passing through the filter template,

L_Response_im(x_i, y_i) = e^(−((x_i − x_0L)² + (y_i − y_0L)²) / σ²)

and performing a two-dimensional fast Fourier transform on the left-eye Gaussian impulse response L_Response_im(x_i, y_i) to obtain its frequency-domain response L_Response_im_FFT(u_i, v_i); and,
according to the actual position (x_0R, y_0R) of the right eye in the ith face sample image Im_orign(x_i, y_i), obtaining the right-eye Gaussian impulse response corresponding to the frequency-domain image Im_orign_FFT(u_i, v_i) after passing through the filter template,

R_Response_im(x_i, y_i) = e^(−((x_i − x_0R)² + (y_i − y_0R)²) / σ²)

and performing a two-dimensional fast Fourier transform on the right-eye Gaussian impulse response R_Response_im(x_i, y_i) to obtain its frequency-domain response R_Response_im_FFT(u_i, v_i).
Then, the filter template Filter_FFT(u_i, v_i) corresponding to the ith face sample image Im_orign(x_i, y_i) is obtained according to the following formula,

Filter_FFT(u_i, v_i) = Response_im_FFT(u_i, v_i) / Im_orign_FFT(u_i, v_i)

Corresponding to the previous step, with the left-eye frequency-domain response L_Response_im_FFT(u_i, v_i) and the right-eye frequency-domain response R_Response_im_FFT(u_i, v_i), the filter template Filter_FFT(u_i, v_i) obtained in this step comprises: the left-eye filter template L_Filter_FFT(u_i, v_i) and the right-eye filter template R_Filter_FFT(u_i, v_i) corresponding to the ith face sample image Im_orign(x_i, y_i), where

L_Filter_FFT(u_i, v_i) = L_Response_im_FFT(u_i, v_i) / Im_orign_FFT(u_i, v_i)
R_Filter_FFT(u_i, v_i) = R_Response_im_FFT(u_i, v_i) / Im_orign_FFT(u_i, v_i)

Finally, the filter templates corresponding to all face sample images are averaged according to the following formula to obtain the ASEF template Filter_FFT(u, v),

Filter_FFT(u, v) = (1/N) · Σ_{i=1}^{N} Filter_FFT(u_i, v_i)

where the symbol Σ is the summation operator.
Corresponding to the left-eye and right-eye filter templates obtained in the previous step, the ASEF template Filter_FFT(u, v) obtained in this step may comprise: a left-eye ASEF template L_Filter_FFT(u, v) and a right-eye ASEF template R_Filter_FFT(u, v), where

L_Filter_FFT(u, v) = (1/N) · Σ_{i=1}^{N} L_Filter_FFT(u_i, v_i)
R_Filter_FFT(u, v) = (1/N) · Σ_{i=1}^{N} R_Filter_FFT(u_i, v_i)
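A compact sketch of this training loop for a single eye, using the cosine window and the division-based exact filter as reconstructed above; the small regularizer reg added to the denominator is an assumption of the sketch (to keep the spectral division stable), not part of the patent text:

    import numpy as np

    def train_asef(faces, eye_positions, sigma=3.0, reg=1e-2):
        """Average the per-image exact filters into one ASEF template.

        faces: N preprocessed face sample images of identical size;
        eye_positions: the ground-truth (x0, y0) eye coordinate per image.
        """
        h, w = faces[0].shape
        win = np.outer(np.hanning(h), np.hanning(w))  # cosine window
        ys, xs = np.mgrid[0:h, 0:w]
        acc = np.zeros((h, w), dtype=complex)
        for img, (x0, y0) in zip(faces, eye_positions):
            resp = np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / sigma ** 2)
            acc += np.fft.fft2(resp) / (np.fft.fft2(img * win) + reg)  # exact filter i
        return acc / len(faces)  # ASEF template Filter_FFT(u, v)

Running the same routine twice, once with the left-eye and once with the right-eye coordinates, yields the left-eye and right-eye ASEF templates.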
Preferably, the training of the parameters of the fast local linear SVM through a preset number of face sample images may include:
Firstly, corresponding eye sample images are generated from a preset number of face sample images, and a sample matrix X is formed from all the eye sample images through dimensionality reduction;
further, this step may include:
setting the ith face sample image as Im_orign(x_i, y_i), where i is the index of the face sample image, a positive integer running from 1 to N, N being the preset number (in this embodiment N is 10000), and (x_i, y_i) denotes the ith face sample image in the image domain; it should be noted that, in this embodiment, a rectangular pixel block of fixed size centered on the eye may be taken as a positive eye sample, and a rectangular pixel block of the eye region offset from the eye center by a fixed number of pixels may be taken as a negative eye sample; it can be understood that a common region exists between the positive and negative eye samples.
After the positive eye samples and negative eye samples of all face sample images are acquired, the mean of each sample is removed and its ℓ2 norm is normalized to generate the eye sample images corresponding to each face sample image; in this embodiment, the size of the eye sample image corresponding to each face sample image may be 31 × 31 pixels;
the ith eye sample image is stretched into a column vector, and that column vector is taken as the ith column of the sample matrix X; in this embodiment, the dimension of the column vector corresponding to the ith eye sample image is 31 × 31 = 961;
and the above processing is performed on every eye sample image, finally yielding the sample matrix X, so in this embodiment the sample matrix is a 961 × 10000 matrix.
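A sketch of assembling the sample matrix X from the eye sample patches (mean removal and unit ℓ2 norm as in the embodiment; the function name is illustrative):

    import numpy as np

    def build_sample_matrix(eye_patches):
        """Stack 31 x 31 eye samples as de-meaned, L2-normalized columns.

        eye_patches: the N positive and negative eye samples; the result is a
        961 x N matrix whose ith column is the stretched ith eye sample image.
        """
        cols = []
        for patch in eye_patches:
            v = patch.astype(np.float64).ravel()  # stretch into a column vector
            v -= v.mean()                         # remove the mean
            v /= np.linalg.norm(v) + 1e-12        # normalize the L2 norm
            cols.append(v)
        return np.stack(cols, axis=1)             # 961 x 10000 in the embodiment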
Secondly, the correlation matrix K between any two eye sample images is obtained according to the sample matrix X;
further, this step may include:
performing singular value decomposition (SVD) on the sample matrix X to obtain the result shown in the following formula:

X = V Σ U^H

where the first decomposition matrix V and the third decomposition matrix U are unitary matrices, the second decomposition matrix Σ is a positive semi-definite diagonal matrix, and H denotes the conjugate transpose; in this embodiment, the first decomposition matrix V is a 961 × 961 matrix, the second decomposition matrix Σ is a 961 × 10000 matrix, and the third decomposition matrix U is a 10000 × 10000 matrix.
The code generation matrix G is obtained from the sub-matrix V_t, composed of the first t column vectors of the first decomposition matrix V, and the sub-matrix Σ_t, composed of the first t rows and first t columns of the second decomposition matrix Σ, by the following formula:

G = V_t · Σ_t^(−1)

where the superscript −1 denotes inversion of the sub-matrix Σ_t; in this embodiment, the value of t may be set to 150, so the sub-matrix V_t is a 961 × 150 matrix, the sub-matrix Σ_t is a 150 × 150 matrix, and the code generation matrix G is a 961 × 150 matrix.
According to the code generation matrix G, the column vector x_i corresponding to the ith eye sample image, and the column vector x_j corresponding to the jth eye sample image, the element K_ij in the ith row and jth column of the correlation matrix K is obtained by the following formula:

K_ij = (G^T x_i)^T (G^T x_j) · x_i^T x_j

where K_ij represents the degree of correlation between the column vector x_i corresponding to the ith eye sample image and the column vector x_j corresponding to the jth eye sample image, and T denotes the transposition operation; it can be understood that all elements of the correlation matrix K can be obtained through this step, and that the correlation matrix K is a symmetric matrix.
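A numpy sketch of the code generation matrix and the correlation matrix as reconstructed above; computing K for all pairs at once via two Gram matrices is an implementation choice of this sketch, not a step stated in the patent:

    import numpy as np

    def correlation_matrix(X, t=150):
        """Build G = V_t * inv(Sigma_t) and K_ij = (G^T x_i)^T (G^T x_j) x_i^T x_j."""
        V, s, Uh = np.linalg.svd(X, full_matrices=False)
        G = V[:, :t] / s[:t]               # same as V_t @ inv(diag(s[:t]))
        coded = G.T @ X                    # the coded samples G^T x_i, one per column
        K = (coded.T @ coded) * (X.T @ X)  # elementwise product of the two Gram matrices
        return G, K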
Then, the support vectors of the sample matrix X and the weights corresponding to the support vectors are acquired through the correlation matrix K and an SVM optimization algorithm. In this embodiment, the support vectors of the sample matrix X may be represented by a label column vector whose elements are the column indices of the support vectors within the sample matrix X; the corresponding weights may likewise be represented by a weight column vector whose elements are the weights of the support vectors indicated at the corresponding positions of the label column vector.
Finally, the parameters of the fast local linear SVM are calculated according to the support vectors of the sample matrix X and the weights corresponding to the support vectors;
further, this step may include:
obtaining an intermediate matrix A from the support vectors of the sample matrix X, the weights corresponding to the support vectors, and the code generation matrix G by the following formula:

A = Σ_{j=1}^{M} α_j · y_j · (G G^T s_j) s_j^T

where M is the number of support vectors of the sample matrix X, s_j denotes the jth support vector of the sample matrix X, α_j denotes the weight corresponding to the jth support vector, and y_j denotes the training sample label corresponding to the jth support vector;
obtaining a symmetric matrix A′ from the intermediate matrix A by A′ = (A + A^T);
performing eigenvalue decomposition on the symmetric matrix A′ to obtain its eigenvalues and the eigenvectors corresponding to the eigenvalues;
and selecting, in descending order of the eigenvalues of the symmetric matrix A′, P selected eigenvalues and the eigenvectors corresponding to them, taking the P selected eigenvalues and their corresponding eigenvectors as the parameters of the fast local linear SVM.
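The quadratic form behind these parameters can be sketched as follows; the expression for A used here follows from expanding the decision function sum_j alpha_j y_j K(s_j, z) with the kernel K defined above, so treat it as a reconstruction consistent with the surrounding formulas rather than the patent's literal statement:

    import numpy as np

    def svm_parameters(S, alpha, y, G, P=10):
        """Collapse the support vectors into P eigenvalue/eigenvector pairs.

        S: d x M matrix whose columns are the support vectors s_j; alpha, y:
        length-M weights and labels; G: the code generation matrix.
        """
        A = (G @ (G.T @ S)) * (alpha * y) @ S.T  # sum_j alpha_j y_j (G G^T s_j) s_j^T
        A_sym = A + A.T                          # the symmetric matrix A'
        vals, vecs = np.linalg.eigh(A_sym)
        order = np.argsort(vals)[::-1][:P]       # keep the P largest eigenvalues
        return vals[order], vecs[:, order]

At test time the candidate decision value is then V1 = sum_i vals[i] * (vecs[:, i] @ z_k) ** 2, matching S1042 above.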
Through the two preferred training methods, the parameters of the ASEF template and the fast local linear SVM can be obtained, and then the human eyes in the image can be located through the embodiment shown in fig. 1 according to the parameters of the ASEF template and the fast local linear SVM obtained through training.
The embodiment of the invention provides a method for positioning human eyes, in which the eye positions in a face image are determined through a pre-trained ASEF template and the parameters of a pre-trained fast local linear SVM; the time complexity of eye localization can thus be reduced under high-precision conditions, achieving high positioning efficiency while maintaining high-precision eye localization.
Based on the same technical concept of the foregoing embodiment, referring to fig. 6, which illustrates an apparatus 60 for positioning human eyes provided by an embodiment of the present invention, the apparatus may include: a preprocessing unit 601, a filtering unit 602, a selecting unit 603, and a determining unit 604, wherein,
the preprocessing unit 601 is configured to preprocess an original face image to obtain an image to be processed;
the filtering unit 602 is configured to filter each pixel point of the to-be-processed image obtained by the preprocessing unit 601 according to a pre-trained Average of Synthetic Exact Filters (ASEF) template, obtaining a filtering response value corresponding to each pixel point of the image to be processed;
the selecting unit 603 is configured to sequentially select a preset number of human eye candidate points from all pixel points of the image to be processed according to a descending order of the filtering response values corresponding to each pixel point of the image to be processed, which are obtained by the filtering unit 602;
the determining unit 604 is configured to determine a human eye position point from a preset number of human eye candidate points selected by the selecting unit 603 according to a parameter of a pre-trained fast local linear support vector machine SVM.
Exemplarily, referring to fig. 7, the apparatus further comprises: a detection unit 605, a first training unit 606 and a second training unit 607, wherein,
the detection unit 605 is configured to perform face detection on a preset number of sample images, normalize the detected face images to a preset size, and perform gaussian smoothing to obtain a preset number of face sample images;
the first training unit 606 is configured to train the ASEF template through a preset number of face sample images obtained by the detection unit 605;
the second training unit 607 is configured to train the parameters of the fast local linear SVM through the preset number of face sample images obtained by the detection unit 605.
Preferably, the first training unit 606 may be configured to:
window the ith face sample image Im_orign(x_i, y_i) and perform a two-dimensional fast Fourier transform on the windowed image to obtain the frequency-domain image Im_orign_FFT(u_i, v_i) of the ith face sample image, where i is the index of the face sample image, a positive integer running from 1 to N, N being the preset number; (x_i, y_i) denotes the ith face sample image in the image domain and (u_i, v_i) denotes the ith face sample image in the frequency domain;
obtain, according to the actual eye position (x_0, y_0) in the ith face sample image Im_orign(x_i, y_i), the Gaussian impulse response that the frequency-domain image Im_orign_FFT(u_i, v_i) should yield after passing through the filter template,

Response_im(x_i, y_i) = e^(−((x_i − x_0)² + (y_i − y_0)²) / σ²)

and perform a two-dimensional fast Fourier transform on the Gaussian impulse response Response_im(x_i, y_i) to obtain its frequency-domain response Response_im_FFT(u_i, v_i), where e is the base of the natural logarithm and σ is the variance of the Gaussian impulse response Response_im(x_i, y_i);
obtain the filter template Filter_FFT(u_i, v_i) corresponding to the ith face sample image Im_orign(x_i, y_i) according to the following formula,

Filter_FFT(u_i, v_i) = Response_im_FFT(u_i, v_i) / Im_orign_FFT(u_i, v_i)

and average the filter templates corresponding to all face sample images according to the following formula to obtain the ASEF template Filter_FFT(u, v),

Filter_FFT(u, v) = (1/N) · Σ_{i=1}^{N} Filter_FFT(u_i, v_i)

where the symbol Σ is the summation operator.
Specifically, the first training unit 606 is configured to:
obtain, according to the actual position (x_0L, y_0L) of the left eye in the ith face sample image Im_orign(x_i, y_i), the left-eye Gaussian impulse response corresponding to the frequency-domain image Im_orign_FFT(u_i, v_i) after passing through the filter template,

L_Response_im(x_i, y_i) = e^(−((x_i − x_0L)² + (y_i − y_0L)²) / σ²)

and perform a two-dimensional fast Fourier transform on the left-eye Gaussian impulse response L_Response_im(x_i, y_i) to obtain its frequency-domain response L_Response_im_FFT(u_i, v_i);
and obtain, according to the actual position (x_0R, y_0R) of the right eye in the ith face sample image Im_orign(x_i, y_i), the right-eye Gaussian impulse response corresponding to the frequency-domain image Im_orign_FFT(u_i, v_i) after passing through the filter template,

R_Response_im(x_i, y_i) = e^(−((x_i − x_0R)² + (y_i − y_0R)²) / σ²)

and perform a two-dimensional fast Fourier transform on the right-eye Gaussian impulse response R_Response_im(x_i, y_i) to obtain its frequency-domain response R_Response_im_FFT(u_i, v_i);
accordingly, the filter template Filter_FFT(u_i, v_i) corresponding to the ith face sample image Im_orign(x_i, y_i) comprises: a left-eye filter template L_Filter_FFT(u_i, v_i) and a right-eye filter template R_Filter_FFT(u_i, v_i), where

L_Filter_FFT(u_i, v_i) = L_Response_im_FFT(u_i, v_i) / Im_orign_FFT(u_i, v_i)
R_Filter_FFT(u_i, v_i) = R_Response_im_FFT(u_i, v_i) / Im_orign_FFT(u_i, v_i)

and the ASEF template Filter_FFT(u, v) comprises: a left-eye ASEF template L_Filter_FFT(u, v) and a right-eye ASEF template R_Filter_FFT(u, v), where

L_Filter_FFT(u, v) = (1/N) · Σ_{i=1}^{N} L_Filter_FFT(u_i, v_i)
R_Filter_FFT(u, v) = (1/N) · Σ_{i=1}^{N} R_Filter_FFT(u_i, v_i).
preferably, referring to fig. 7, the second training unit 607 includes: a dimension reduction sub-unit 6071, an acquisition sub-unit 6072 and a calculation sub-unit 6073, wherein,
the dimension reduction subunit 6071 is configured to generate corresponding human eye sample images from a preset number of human face sample images, and form a sample matrix X from all the human eye sample images through dimension reduction;
the obtaining subunit 6072 is configured to obtain a correlation matrix K between any two human eye sample images according to a sample matrix X formed by the dimension reduction subunit 6071; and the number of the first and second groups,
acquiring a support vector of the sample matrix X and a weight corresponding to the support vector through the correlation matrix K and an SVM optimization algorithm;
the calculating subunit 6073 is configured to calculate a parameter of the fast local linear SVM according to the support vector of the sample matrix X obtained by the obtaining subunit 6072 and a weight corresponding to the support vector.
Specifically, the dimension reduction subunit 6071 is configured to:
set the ith face sample image as Im_orign(x_i, y_i), where i is the index of the face sample image, a positive integer running from 1 to N, N being the preset number, and (x_i, y_i) denotes the ith face sample image in the image domain;
after acquiring the positive eye samples and negative eye samples of all face sample images, remove the mean of each sample and normalize its ℓ2 norm to generate the eye sample images corresponding to each face sample image;
and stretch the ith eye sample image into a column vector and take that column vector as the ith column of the sample matrix X.
Specifically, the acquiring subunit 6072 is configured to:
performing singular value SVD decomposition on the sample matrix X to obtain a result shown as the following formula:
X=V∑UH
the first decomposition matrix V and the third decomposition matrix U are unitary matrixes, the second decomposition matrix sigma is a semi-positive determined diagonal matrix, and H represents a conjugate transpose;
and obtaining a code generation matrix G from the sub-matrix V_t of the first decomposition matrix V, composed of the first t column vectors of the first decomposition matrix V, and the sub-matrix Σ_t of the second decomposition matrix Σ, composed of the first t rows and the first t columns of the second decomposition matrix Σ, by:
G = V_t · Σ_t^(-1),
where the superscript -1 denotes the inversion operation on the sub-matrix Σ_t of the second decomposition matrix Σ;
and obtaining the element K_ij in the ith row and jth column of the correlation matrix K from the code generation matrix G, the column vector x_i corresponding to the ith human eye sample image and the column vector x_j corresponding to the jth human eye sample image, by the following formula:
K_ij = (G^T x_i)^T (G^T x_j) · (x_i^T x_j),
where K_ij represents the degree of correlation between the column vector x_i corresponding to the ith human eye sample image and the column vector x_j corresponding to the jth human eye sample image, and the superscript T denotes the transposition operation.
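For illustration only: a NumPy sketch of the code generation matrix G and the correlation matrix K following the SVD-based formulas above; the truncation order t is left to the caller.

```python
import numpy as np

def code_matrix_and_kernel(X, t):
    """G = V_t * inv(Sigma_t) from a thin SVD of X, then
    K_ij = (G^T x_i)^T (G^T x_j) * (x_i^T x_j) for all sample pairs."""
    V, s, Uh = np.linalg.svd(X, full_matrices=False)  # X = V @ diag(s) @ Uh
    G = V[:, :t] / s[:t]          # divide each of the first t columns by s_j
    C = G.T @ X                   # column i holds the code G^T x_i
    K = (C.T @ C) * (X.T @ X)     # element-wise product of the two Gram matrices
    return G, K
```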
In particular, the computing subunit 6073 is configured to:
obtaining an intermediate matrix A according to the support vectors of the sample matrix X, the weights corresponding to the support vectors and the code generation matrix G by the following formula:
A = Σ_{j=1}^{M} α_j · y_j · (G G^T s_j) s_j^T,
where M is the number of support vectors of the sample matrix X, s_j denotes the jth support vector of the sample matrix X, α_j denotes the weight corresponding to the jth support vector, and y_j denotes the training sample label corresponding to the jth support vector;
and, according to the intermediate matrix A, obtaining a symmetric matrix A' by A' = A + A^T;
performing eigenvalue decomposition on the symmetric matrix A 'to obtain eigenvalues of the symmetric matrix A' and eigenvectors corresponding to the eigenvalues;
and sequentially selecting P selected eigenvalues and eigenvectors corresponding to the P selected eigenvalues from the eigenvalues of the symmetric matrix A 'according to the descending order of the eigenvalues of the symmetric matrix A', and taking the P selected eigenvalues and the eigenvectors corresponding to the P selected eigenvalues as the parameters of the fast local linear SVM.
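For illustration only: a sketch that turns the SVM dual solution into the P (eigenvalue, eigenvector) parameters, under the reconstruction of the intermediate matrix A given above.

```python
import numpy as np

def fast_svm_parameters(G, support_vectors, alphas, labels, P):
    """A = sum_j alpha_j * y_j * (G G^T s_j) s_j^T, symmetrized as A' = A + A^T,
    then the P eigenpairs with the largest eigenvalues are kept."""
    GGt = G @ G.T
    d, M = support_vectors.shape          # columns are the support vectors s_j
    A = np.zeros((d, d))
    for j in range(M):
        s_j = support_vectors[:, j]
        A += alphas[j] * labels[j] * np.outer(GGt @ s_j, s_j)
    A_sym = A + A.T                       # A' = A + A^T
    eigvals, eigvecs = np.linalg.eigh(A_sym)
    keep = np.argsort(eigvals)[::-1][:P]  # largest eigenvalues first
    return eigvals[keep], eigvecs[:, keep]
```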
Further, the determining unit 604 is configured to:
stretching the pixel block in the neighborhood of the kth human eye candidate point into a vector to be processed z_k;
and obtaining the candidate decision value V1 of the kth human eye candidate point according to the vector to be processed z_k, the P selected eigenvalues and the eigenvectors corresponding to the P selected eigenvalues, by the following formula:
V1 = Σ_{i=1}^{P} λ_i · (q_i^T z_k)^2,
where λ_i denotes the ith selected eigenvalue and q_i denotes the eigenvector corresponding to the ith selected eigenvalue;
adding the candidate decision value V1 of the kth human eye candidate point to the filter response value corresponding to the kth human eye candidate point to obtain the final decision value of the kth human eye candidate point;
and selecting the human eye candidate point with the highest final decision value from the preset number of human eye candidate points as the human eye position point.
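For illustration only: a sketch of the determining step, assuming candidate points lie far enough from the image border and that candidate patches are normalized the same way as the training samples; the neighborhood half-size is an assumed parameter.

```python
import numpy as np

def pick_eye_position(candidates, response_map, image, eigvals, eigvecs, half):
    """Score each candidate k by V1 = sum_i lambda_i * (q_i^T z_k)^2, add the
    ASEF filter response at that pixel, and keep the best-scoring candidate."""
    best, best_score = None, -np.inf
    for (x, y) in candidates:
        patch = image[y - half:y + half + 1, x - half:x + half + 1]
        z = patch.astype(np.float64).ravel()
        z -= z.mean()
        n = np.linalg.norm(z)
        if n > 0:
            z /= n
        proj = eigvecs.T @ z                       # q_i^T z_k for every i
        v1 = float(np.sum(eigvals * proj ** 2))    # candidate decision value
        score = v1 + response_map[y, x]            # final decision value
        if score > best_score:
            best, best_score = (x, y), score
    return best
```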
This embodiment provides a device 60 for positioning human eyes, which determines human eye positions through an ASEF template and fast local linear SVM parameters pre-trained on human face images; it reduces the time complexity of human eye positioning while maintaining high accuracy, thereby achieving high positioning efficiency together with high-precision positioning.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

Claims (18)

1. A method of eye localization, the method comprising:
preprocessing an original face image to obtain an image to be processed;
filtering each pixel point of the image to be processed according to a pre-trained average synthesis correlation filter (ASEF) template to obtain a corresponding filtering response value of each pixel point of the image to be processed;
sequentially selecting a preset number of human eye candidate points from all pixel points of the image to be processed according to the sequence of the filtering response values corresponding to each pixel point of the image to be processed from large to small;
and determining human eye position points from the preset number of human eye candidate points through parameters of a pre-trained fast local linear Support Vector Machine (SVM).
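For illustration only (not part of the claims): a NumPy sketch of the filtering and candidate-selection steps of claim 1, applying a trained frequency-domain ASEF template and keeping the pixels with the largest filter responses; the Hanning window is an assumption.

```python
import numpy as np

def top_candidates(image, filter_fft, num_candidates):
    """Filter the image with the ASEF template in the frequency domain, then
    select candidate points in descending order of filter response."""
    h, w = image.shape
    window = np.outer(np.hanning(h), np.hanning(w))
    response = np.real(np.fft.ifft2(np.fft.fft2(image * window) * filter_fft))
    flat = np.argsort(response.ravel())[::-1][:num_candidates]
    ys, xs = np.unravel_index(flat, response.shape)
    return list(zip(xs, ys)), response  # (x, y) candidates plus response map
```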
2. The method of claim 1, further comprising:
carrying out face detection on a preset number of sample images, normalizing the detected face images to a preset size, and carrying out Gaussian smoothing to obtain a preset number of face sample images;
and training the parameters of the ASEF template and the rapid local linear SVM through a preset number of face sample images.
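For illustration only (not part of the claims): a preprocessing sketch using OpenCV, which the patent does not mandate; the Haar cascade, the 128x128 target size, and the 5x5 Gaussian kernel are all assumptions.

```python
import cv2

def make_face_samples(images, size=(128, 128)):
    """Detect a face in each raw image, normalize it to a preset size,
    and Gaussian-smooth it to produce a face sample image."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    samples = []
    for img in images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 0:
            continue                      # skip images with no detected face
        x, y, w, h = faces[0]
        face = cv2.resize(gray[y:y + h, x:x + w], size)
        samples.append(cv2.GaussianBlur(face, (5, 5), 0))
    return samples
```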
3. The method of claim 2, wherein the training of the ASEF template by a preset number of face sample images comprises:
windowing the ith human face sample image Im_orign(x_i, y_i), and performing a two-dimensional fast Fourier transform on the ith windowed human face sample image to obtain a frequency-domain image Im_orign_FFT(u_i, v_i) of the ith human face sample image, where i represents the number of the human face sample image, the value of i is a positive integer greater than zero and less than N, N is the preset number, (x_i, y_i) represents the ith human face sample image in the image domain, and (u_i, v_i) represents the ith human face sample image in the frequency domain;
obtaining, according to the actual human eye position (x_0, y_0) in the ith human face sample image Im_orign(x_i, y_i), the Gaussian impulse response corresponding to the frequency-domain image Im_orign_FFT(u_i, v_i) of the ith human face sample image after passing through the filter template:
Response_im(x_i, y_i) = e^(-((x_i - x_0)^2 + (y_i - y_0)^2) / σ^2),
and performing a two-dimensional fast Fourier transform on the Gaussian impulse response Response_im(x_i, y_i) to obtain the frequency-domain response Response_im_FFT(u_i, v_i) of the Gaussian impulse response Response_im(x_i, y_i), where e is the natural base and σ^2 is the variance of the Gaussian impulse response Response_im(x_i, y_i);
acquiring the filter template Filter_FFT(u_i, v_i) corresponding to the ith human face sample image Im_orign(x_i, y_i) according to the following formula:
Filter_FFT(u_i, v_i) = Response_im_FFT(u_i, v_i) / Im_orign_FFT(u_i, v_i);
and averaging the filter templates corresponding to all the human face sample images according to the following formula to obtain the ASEF template Filter_FFT(u, v):
Filter_FFT(u, v) = (1/N) · Σ_{i=1}^{N} Filter_FFT(u_i, v_i),
where the symbol Σ is the sum operator.
4. The method of claim 3, wherein obtaining, according to the actual human eye position (x_0, y_0) in the ith human face sample image Im_orign(x_i, y_i), the Gaussian impulse response Response_im(x_i, y_i) corresponding to the frequency-domain image Im_orign_FFT(u_i, v_i) of the ith human face sample image after passing through the filter template, and performing a two-dimensional fast Fourier transform on the Gaussian impulse response Response_im(x_i, y_i) to obtain the frequency-domain response Response_im_FFT(u_i, v_i) of the Gaussian impulse response Response_im(x_i, y_i), comprises:
obtaining, according to the actual left-eye position (x_0L, y_0L) in the ith human face sample image Im_orign(x_i, y_i), the left-eye Gaussian impulse response corresponding to the frequency-domain image Im_orign_FFT(u_i, v_i) of the ith human face sample image after passing through the filter template:
L_Response_im(x_i, y_i) = e^(-((x_i - x_0L)^2 + (y_i - y_0L)^2) / σ^2),
and performing a two-dimensional fast Fourier transform on the left-eye Gaussian impulse response L_Response_im(x_i, y_i) to obtain its frequency-domain response L_Response_im_FFT(u_i, v_i);
obtaining, according to the actual right-eye position (x_0R, y_0R) in the ith human face sample image Im_orign(x_i, y_i), the right-eye Gaussian impulse response corresponding to the frequency-domain image Im_orign_FFT(u_i, v_i) of the ith human face sample image after passing through the filter template:
R_Response_im(x_i, y_i) = e^(-((x_i - x_0R)^2 + (y_i - y_0R)^2) / σ^2),
and performing a two-dimensional fast Fourier transform on the right-eye Gaussian impulse response R_Response_im(x_i, y_i) to obtain its frequency-domain response R_Response_im_FFT(u_i, v_i);
accordingly, the filter template Filter_FFT(u_i, v_i) corresponding to the ith human face sample image Im_orign(x_i, y_i) comprises:
the left-eye filter template L_Filter_FFT(u_i, v_i) corresponding to the ith human face sample image Im_orign(x_i, y_i) and the right-eye filter template R_Filter_FFT(u_i, v_i) corresponding to the ith human face sample image Im_orign(x_i, y_i), where
L_Filter_FFT(u_i, v_i) = L_Response_im_FFT(u_i, v_i) / Im_orign_FFT(u_i, v_i)
R_Filter_FFT(u_i, v_i) = R_Response_im_FFT(u_i, v_i) / Im_orign_FFT(u_i, v_i);
and the ASEF template Filter_FFT(u, v) comprises: a left-eye ASEF template L_Filter_FFT(u, v) and a right-eye ASEF template R_Filter_FFT(u, v), where
L_Filter_FFT(u, v) = (1/N) · Σ_{i=1}^{N} L_Filter_FFT(u_i, v_i)
R_Filter_FFT(u, v) = (1/N) · Σ_{i=1}^{N} R_Filter_FFT(u_i, v_i).
5. the method of claim 2, wherein the training of the parameters of the fast local linear SVM is performed by using a preset number of face sample images, and comprises:
generating corresponding human eye sample images from a preset number of human face sample images, and forming a sample matrix X from all the human eye sample images through dimensionality reduction;
obtaining a correlation matrix K between any two human eye sample images according to the sample matrix X;
acquiring a support vector of the sample matrix X and a weight corresponding to the support vector through the correlation matrix K and an SVM optimization algorithm;
and calculating parameters of the fast local linear SVM according to the support vector of the sample matrix X and the weight corresponding to the support vector.
6. The method of claim 5, wherein generating corresponding human eye sample images from a preset number of human face sample images, and forming a sample matrix X from all human eye sample images by dimension reduction comprises:
setting the ith human face sample image as Im_orign(x_i, y_i), where i represents the number of the human face sample image, the value of i is a positive integer greater than zero and less than N, N is the preset number, and (x_i, y_i) represents the ith human face sample image in the image domain;
after acquiring the human eye positive samples and human eye negative samples of all the human face sample images, performing mean removal and 2-norm normalization on all the human eye positive samples and human eye negative samples to generate the human eye sample image corresponding to each human face sample image;
stretching the ith human eye sample image into a column vector corresponding to the ith human eye sample image, and taking the column vector corresponding to the ith human eye sample image as the ith column vector in the sample matrix X.
7. The method according to claim 6, wherein the obtaining a correlation matrix K between any two human eye sample images according to the sample matrix X comprises:
performing singular value decomposition (SVD) on the sample matrix X to obtain a result shown in the following formula:
X = V Σ U^H,
where the first decomposition matrix V and the third decomposition matrix U are unitary matrices, the second decomposition matrix Σ is a positive semi-definite diagonal matrix, and H denotes the conjugate transpose;
obtaining a code generation matrix G from the sub-matrix V_t of the first decomposition matrix V, composed of the first t column vectors of the first decomposition matrix V, and the sub-matrix Σ_t of the second decomposition matrix Σ, composed of the first t rows and the first t columns of the second decomposition matrix Σ, by:
G = V_t · Σ_t^(-1),
where the superscript -1 denotes the inversion operation on the sub-matrix Σ_t of the second decomposition matrix Σ;
and obtaining the element K_ij in the ith row and jth column of the correlation matrix K from the code generation matrix G, the column vector x_i corresponding to the ith human eye sample image and the column vector x_j corresponding to the jth human eye sample image, by the following formula:
K_ij = (G^T x_i)^T (G^T x_j) · (x_i^T x_j),
where K_ij represents the degree of correlation between the column vector x_i corresponding to the ith human eye sample image and the column vector x_j corresponding to the jth human eye sample image, and the superscript T denotes the transposition operation.
8. The method according to claim 7, wherein the obtaining parameters of the fast local linear SVM according to the support vector of the sample matrix X and the corresponding weight of the support vector comprises:
obtaining an intermediate matrix A according to the support vectors of the sample matrix X, the weights corresponding to the support vectors and the code generation matrix G by the following formula:
A = Σ_{j=1}^{M} α_j · y_j · (G G^T s_j) s_j^T,
where M is the number of support vectors of the sample matrix X, s_j denotes the jth support vector of the sample matrix X, α_j denotes the weight corresponding to the jth support vector, and y_j denotes the training sample label corresponding to the jth support vector;
obtaining, according to the intermediate matrix A, a symmetric matrix A' by A' = A + A^T;
performing eigenvalue decomposition on the symmetric matrix A 'to obtain eigenvalues of the symmetric matrix A' and eigenvectors corresponding to the eigenvalues;
according to the sequence of the eigenvalues of the symmetric matrix A 'from large to small, P selected eigenvalues and eigenvectors corresponding to the P selected eigenvalues are sequentially selected from the eigenvalues of the symmetric matrix A', and the P selected eigenvalues and the eigenvectors corresponding to the P selected eigenvalues are used as the parameters of the fast local linear SVM.
9. The method of claim 8, wherein determining human eye location points from the preset number of human eye candidate points through parameters of a pre-trained fast local linear Support Vector Machine (SVM) comprises:
stretching the pixel block in the neighborhood of the kth human eye candidate point into a vector to be processed z_k;
obtaining the candidate decision value V1 of the kth human eye candidate point according to the vector to be processed z_k, the P selected eigenvalues and the eigenvectors corresponding to the P selected eigenvalues, by the following formula:
V1 = Σ_{i=1}^{P} λ_i · (q_i^T z_k)^2,
where λ_i denotes the ith selected eigenvalue and q_i denotes the eigenvector corresponding to the ith selected eigenvalue;
adding the candidate decision value V1 of the kth human eye candidate point to the filter response value corresponding to the kth human eye candidate point to obtain the final decision value of the kth human eye candidate point;
and selecting the human eye candidate point with the highest final decision value from the preset number of human eye candidate points as the human eye position point.
10. An apparatus for eye positioning, the apparatus comprising: a preprocessing unit, a filtering unit, a selecting unit and a determining unit, wherein,
the preprocessing unit is used for preprocessing the original face image to obtain an image to be processed;
the filtering unit is used for filtering each pixel point of the image to be processed according to a pre-trained average synthesis correlation filter (ASEF) template to obtain a filtering response value corresponding to each pixel point of the image to be processed;
the selecting unit is used for sequentially selecting a preset number of human eye candidate points from all pixel points of the image to be processed according to the sequence that the filtering response value corresponding to each pixel point of the image to be processed is from large to small;
and the determining unit is used for determining the human eye position points from the preset number of human eye candidate points through the parameters of the pre-trained fast local linear support vector machine SVM.
11. The apparatus of claim 10, further comprising: a detection unit, a first training unit and a second training unit, wherein,
the detection unit is used for carrying out face detection on a preset number of sample images, normalizing the detected face images to a preset size, and then carrying out Gaussian smoothing to obtain a preset number of face sample images;
the first training unit is used for training the ASEF template through a preset number of face sample images;
the second training unit is used for training the parameters of the fast local linear SVM through a preset number of face sample images.
12. The apparatus of claim 11, wherein the first training unit is configured to:
windowing the ith human face sample image Im_orign(x_i, y_i), and performing a two-dimensional fast Fourier transform on the ith windowed human face sample image to obtain a frequency-domain image Im_orign_FFT(u_i, v_i) of the ith human face sample image, where i represents the number of the human face sample image, the value of i is a positive integer greater than zero and less than N, N is the preset number, (x_i, y_i) represents the ith human face sample image in the image domain, and (u_i, v_i) represents the ith human face sample image in the frequency domain;
and obtaining, according to the actual human eye position (x_0, y_0) in the ith human face sample image Im_orign(x_i, y_i), the Gaussian impulse response corresponding to the frequency-domain image Im_orign_FFT(u_i, v_i) of the ith human face sample image after passing through the filter template:
Response_im(x_i, y_i) = e^(-((x_i - x_0)^2 + (y_i - y_0)^2) / σ^2),
and performing a two-dimensional fast Fourier transform on the Gaussian impulse response Response_im(x_i, y_i) to obtain the frequency-domain response Response_im_FFT(u_i, v_i) of the Gaussian impulse response Response_im(x_i, y_i), where e is the natural base and σ^2 is the variance of the Gaussian impulse response Response_im(x_i, y_i);
and acquiring the filter template Filter_FFT(u_i, v_i) corresponding to the ith human face sample image Im_orign(x_i, y_i) according to the following formula:
Filter_FFT(u_i, v_i) = Response_im_FFT(u_i, v_i) / Im_orign_FFT(u_i, v_i);
and averaging the filter templates corresponding to all the human face sample images according to the following formula to obtain the ASEF template Filter_FFT(u, v):
Filter_FFT(u, v) = (1/N) · Σ_{i=1}^{N} Filter_FFT(u_i, v_i),
where the symbol Σ is the sum operator.
13. The apparatus of claim 12, wherein the first training unit is further configured to:
obtain, according to the actual left-eye position (x_0L, y_0L) in the ith human face sample image Im_orign(x_i, y_i), the left-eye Gaussian impulse response corresponding to the frequency-domain image Im_orign_FFT(u_i, v_i) of the ith human face sample image after passing through the filter template:
L_Response_im(x_i, y_i) = e^(-((x_i - x_0L)^2 + (y_i - y_0L)^2) / σ^2),
and perform a two-dimensional fast Fourier transform on the left-eye Gaussian impulse response L_Response_im(x_i, y_i) to obtain its frequency-domain response L_Response_im_FFT(u_i, v_i);
and obtain, according to the actual right-eye position (x_0R, y_0R) in the ith human face sample image Im_orign(x_i, y_i), the right-eye Gaussian impulse response corresponding to the frequency-domain image Im_orign_FFT(u_i, v_i) of the ith human face sample image after passing through the filter template:
R_Response_im(x_i, y_i) = e^(-((x_i - x_0R)^2 + (y_i - y_0R)^2) / σ^2),
and perform a two-dimensional fast Fourier transform on the right-eye Gaussian impulse response R_Response_im(x_i, y_i) to obtain its frequency-domain response R_Response_im_FFT(u_i, v_i);
accordingly, the filter template Filter_FFT(u_i, v_i) corresponding to the ith human face sample image Im_orign(x_i, y_i) comprises:
the left-eye filter template L_Filter_FFT(u_i, v_i) corresponding to the ith human face sample image Im_orign(x_i, y_i) and the right-eye filter template R_Filter_FFT(u_i, v_i) corresponding to the ith human face sample image Im_orign(x_i, y_i), where
L_Filter_FFT(u_i, v_i) = L_Response_im_FFT(u_i, v_i) / Im_orign_FFT(u_i, v_i)
R_Filter_FFT(u_i, v_i) = R_Response_im_FFT(u_i, v_i) / Im_orign_FFT(u_i, v_i);
and the ASEF template Filter_FFT(u, v) comprises: a left-eye ASEF template L_Filter_FFT(u, v) and a right-eye ASEF template R_Filter_FFT(u, v), where
L_Filter_FFT(u, v) = (1/N) · Σ_{i=1}^{N} L_Filter_FFT(u_i, v_i)
R_Filter_FFT(u, v) = (1/N) · Σ_{i=1}^{N} R_Filter_FFT(u_i, v_i).
14. the apparatus of claim 11, wherein the second training unit comprises: a dimension reduction subunit, an acquisition subunit, and a computation subunit, wherein,
the dimension reduction subunit is used for generating corresponding human eye sample images from a preset number of human face sample images and forming a sample matrix X from all the human eye sample images through dimension reduction;
the obtaining subunit is configured to obtain a correlation matrix K between any two human eye sample images according to the sample matrix X; and,
acquiring a support vector of the sample matrix X and a weight corresponding to the support vector through the correlation matrix K and an SVM optimization algorithm;
and the calculating subunit is configured to calculate a parameter of the fast local linear SVM according to the support vector of the sample matrix X and the weight corresponding to the support vector.
15. The apparatus of claim 14, wherein the dimension reduction subunit is configured to:
setting the ith human face sample image as Im_orign(x_i, y_i), where i represents the number of the human face sample image, the value of i is a positive integer greater than zero and less than N, N is the preset number, and (x_i, y_i) represents the ith human face sample image in the image domain;
after acquiring the human eye positive samples and human eye negative samples of all the human face sample images, performing mean removal and 2-norm normalization on all the human eye positive samples and human eye negative samples to generate the human eye sample image corresponding to each human face sample image;
and stretching the ith human eye sample image into a column vector corresponding to the ith human eye sample image, and taking the column vector corresponding to the ith human eye sample image as the ith column vector in the sample matrix X.
16. The apparatus of claim 15, wherein the obtaining subunit is configured to:
performing singular value decomposition (SVD) on the sample matrix X to obtain a result shown in the following formula:
X = V Σ U^H,
where the first decomposition matrix V and the third decomposition matrix U are unitary matrices, the second decomposition matrix Σ is a positive semi-definite diagonal matrix, and H denotes the conjugate transpose;
and obtaining a code generation matrix G from the sub-matrix V_t of the first decomposition matrix V, composed of the first t column vectors of the first decomposition matrix V, and the sub-matrix Σ_t of the second decomposition matrix Σ, composed of the first t rows and the first t columns of the second decomposition matrix Σ, by:
G = V_t · Σ_t^(-1),
where the superscript -1 denotes the inversion operation on the sub-matrix Σ_t of the second decomposition matrix Σ;
and obtaining the element K_ij in the ith row and jth column of the correlation matrix K from the code generation matrix G, the column vector x_i corresponding to the ith human eye sample image and the column vector x_j corresponding to the jth human eye sample image, by the following formula:
K_ij = (G^T x_i)^T (G^T x_j) · (x_i^T x_j),
where K_ij represents the degree of correlation between the column vector x_i corresponding to the ith human eye sample image and the column vector x_j corresponding to the jth human eye sample image, and the superscript T denotes the transposition operation.
17. The apparatus of claim 16, wherein the computing subunit is configured to:
obtaining an intermediate matrix A according to the support vectors of the sample matrix X, the weights corresponding to the support vectors and the code generation matrix G by the following formula:
A = Σ_{j=1}^{M} α_j · y_j · (G G^T s_j) s_j^T,
where M is the number of support vectors of the sample matrix X, s_j denotes the jth support vector of the sample matrix X, α_j denotes the weight corresponding to the jth support vector, and y_j denotes the training sample label corresponding to the jth support vector;
and, according to the intermediate matrix A, obtaining a symmetric matrix A' by A' = A + A^T;
performing eigenvalue decomposition on the symmetric matrix A 'to obtain eigenvalues of the symmetric matrix A' and eigenvectors corresponding to the eigenvalues;
and sequentially selecting P selected eigenvalues and eigenvectors corresponding to the P selected eigenvalues from the eigenvalues of the symmetric matrix A 'according to the descending order of the eigenvalues of the symmetric matrix A', and taking the P selected eigenvalues and the eigenvectors corresponding to the P selected eigenvalues as the parameters of the fast local linear SVM.
18. The apparatus of claim 17, wherein the determining unit is configured to:
stretching the pixel block in the neighborhood of the kth human eye candidate point into a vector to be processed z_k;
and obtaining the candidate decision value V1 of the kth human eye candidate point according to the vector to be processed z_k, the P selected eigenvalues and the eigenvectors corresponding to the P selected eigenvalues, by the following formula:
V1 = Σ_{i=1}^{P} λ_i · (q_i^T z_k)^2,
where λ_i denotes the ith selected eigenvalue and q_i denotes the eigenvector corresponding to the ith selected eigenvalue;
adding the candidate decision value V1 of the kth human eye candidate point to the filter response value corresponding to the kth human eye candidate point to obtain the final decision value of the kth human eye candidate point;
and selecting the human eye candidate point with the highest final decision value from the preset number of human eye candidate points as the human eye position point.
CN201410388103.XA 2014-08-07 2014-08-07 Method and equipment for positioning human eyes Active CN105469018B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201410388103.XA CN105469018B (en) 2014-08-07 2014-08-07 Method and equipment for positioning human eyes
PCT/CN2015/072055 WO2016019715A1 (en) 2014-08-07 2015-01-30 Human eye locating method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410388103.XA CN105469018B (en) 2014-08-07 2014-08-07 Method and equipment for positioning human eyes

Publications (2)

Publication Number Publication Date
CN105469018A CN105469018A (en) 2016-04-06
CN105469018B true CN105469018B (en) 2020-03-13

Family

ID=55263096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410388103.XA Active CN105469018B (en) 2014-08-07 2014-08-07 Method and equipment for positioning human eyes

Country Status (2)

Country Link
CN (1) CN105469018B (en)
WO (1) WO2016019715A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106548194B (en) * 2016-09-29 2019-10-15 中国科学院自动化研究所 The construction method and localization method of two dimensional image human joint points location model
CN110168599B (en) * 2017-10-13 2021-01-29 华为技术有限公司 Data processing method and terminal

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101533466A (en) * 2009-04-09 2009-09-16 南京壹进制信息技术有限公司 Image processing method for positioning eyes
CN102129553A (en) * 2011-03-16 2011-07-20 上海交通大学 Method for eye detection based on single infrared light supply
CN102314598A (en) * 2011-09-22 2012-01-11 西安电子科技大学 Retinex theory-based method for detecting human eyes under complex illumination
CN102456129A (en) * 2010-10-26 2012-05-16 同方威视技术股份有限公司 Image deviation-rectifying method and system for safety inspection
CN102867172A (en) * 2012-08-27 2013-01-09 Tcl集团股份有限公司 Human eye positioning method, system and electronic equipment
CN102930258A (en) * 2012-11-13 2013-02-13 重庆大学 Face image recognition method
KR20130085316A (en) * 2012-01-19 2013-07-29 한국전자통신연구원 Apparatus and method for acquisition of high quality face image with fixed and ptz camera
CN103632136A (en) * 2013-11-11 2014-03-12 北京天诚盛业科技有限公司 Method and device for locating human eyes
CN103955719A (en) * 2014-05-20 2014-07-30 中国科学院信息工程研究所 Filter bank training method and system and image key point positioning method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8520956B2 (en) * 2009-06-09 2013-08-27 Colorado State University Research Foundation Optimized correlation filters for signal processing


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A New Implementation Method of ASEF for Eye Detection; So-Hee Park et al.; IEEE; 2012-12-31; full text *
Research on a Face Recognition System Based on HOG Features; Mu Chunlei; China Master's Theses Full-text Database, Information Science and Technology; 2014-01-15 (No. 01); pp. 19-21, 39, 53-54 *

Also Published As

Publication number Publication date
WO2016019715A1 (en) 2016-02-11
CN105469018A (en) 2016-04-06

Similar Documents

Publication Publication Date Title
CN108182384B (en) Face feature point positioning method and device
CN107871100B (en) Training method and device of face model, and face authentication method and device
Zhou et al. Bayesian low-tubal-rank robust tensor factorization with multi-rank determination
WO2016138838A1 (en) Method and device for recognizing lip-reading based on projection extreme learning machine
AU2014350727B2 (en) Face positioning method and device
CN104517104A (en) Face recognition method and face recognition system based on monitoring scene
CN108154133B (en) Face portrait-photo recognition method based on asymmetric joint learning
CN107871103B (en) Face authentication method and device
CN110110189A (en) Method and apparatus for generating information
KR20080035711A (en) Global feature extraction method for 3d face recognition
US11847821B2 (en) Face recognition network model with face alignment based on knowledge distillation
KR20170024303A (en) System and method for detecting feature points of face
KR20190061538A (en) Method and apparatus of recognizing motion pattern base on combination of multi-model
CN105469018B (en) Method and equipment for positioning human eyes
Roomi et al. Coin detection and recognition using neural networks
CN103927529B (en) The preparation method and application process, system of a kind of final classification device
KR20130067612A (en) Feature vector classifier and recognition device using the same
WO2017129115A1 (en) Orientation-based subject-matching in images
CN104899565B (en) Eye movement recognition methods and device based on textural characteristics
Kaur et al. Illumination invariant face recognition
KR102014093B1 (en) System and method for detecting feature points of face
Angulu et al. Human age estimation using multi-frequency biologically inspired features (MF-BIF)
CN108256405A (en) A kind of face identification method and device
CN113627446A (en) Image matching method and system of feature point description operator based on gradient vector
Widiastuti et al. Scattered object recognition using Hu Moment invariant and backpropagation neural network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant