AU739936B2 - Face detection in digital images - Google Patents


Publication number
AU739936B2
Authority
AU
Australia
Prior art keywords
colour
pixels
image
regions
skin
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU63173/99A
Other versions
AU6317399A (en)
Inventor
Andrew Peter Bradley
Edwin Ho
Alison Joan Lennon
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU33982/99A external-priority patent/AU728290B2/en
Application filed by Canon Inc filed Critical Canon Inc
Priority to AU63173/99A priority Critical patent/AU739936B2/en
Publication of AU6317399A publication Critical patent/AU6317399A/en
Application granted Critical
Publication of AU739936B2 publication Critical patent/AU739936B2/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current


Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Description

S&F Ref: 461584D1
AUSTRALIA
PATENTS ACT 1990 COMPLETE SPECIFICATION
ORIGINAL
Name and Address of Applicant: Canon Kabushiki Kaisha, 30-2, Shimomaruko 3-chome, Ohta-ku, Tokyo, 146, Japan

Actual Inventors: Edwin Ho, Alison Joan Lennon, Andrew Peter Bradley

Address for Service: Spruson & Ferguson, Patent Attorneys, St Martins Tower, 31 Market Street, Sydney, New South Wales, 2000, Australia

Invention Title: Face Detection in Digital Images

The following statement is a full description of this invention, including the best method of performing it known to me/us:

5845c [I:\DAYLIB\LIBU]17547.doc:vsg

FACE DETECTION IN DIGITAL IMAGES

Technical Field of the Invention

The present invention relates to digital colour images and, in particular, to the detection of faces in colour digital images.
Background Art

Colour digital images are increasingly being stored in multi-media databases and utilised in various computer applications. In many such applications it is desirable to be able to detect the location of a face in a visual image as one step in a multi-step process. The multi-step process can include content-based image retrieval, personal identification or verification for use with automatic teller machines or security cameras, or automated interaction between humans and computational devices.
Various prior art face detection methods are known, including eigenfaces, neural networks, clustering, feature identification and skin colour techniques. Each of these techniques has its strengths and weaknesses; however, one feature they have in common is that they are either computationally intensive, and therefore very slow, or fast but not sufficiently robust to detect faces.
The eigenface or eigenvector method is particularly suitable for face recognition and has some tolerance for lighting variation; however, it does not cope with different viewpoints of faces and does not handle occlusion of various facial features (such as occurs if a person is wearing sunglasses). Also, it is not scale invariant.
The neural network approach utilises training based on a large number of face images and non-face images and has the advantages of being relatively simple to implement, providing some tolerance to the occlusion of facial features and some tolerance to lighting variation. It is also relatively easy to improve the detection rate by re-training the neural network using false detections. However, it is not scale invariant, does not cope with different viewpoints or orientation, and leads to an exhaustive process to locate faces on an image.

CFP1327AUA IPR20A 461584D1 [I:\ELEC\CISRA\IPR\Ipr20a]461584D1.doc:ldp
The clustering technique is somewhat similar to the eigenface approach. A pixel window (eg. 20 x 20) is typically moved over the image, and the distance between the resulting test pattern and a prototype face image and a prototype non-face image is represented by a vector. The vector captures the similarity and differences between the test pattern and the face model. A neural network can then be trained to classify whether the vector represents a face or a non-face. While this method is robust, it does not cope with different scales, different viewpoints or orientations. It leads to an exhaustive approach to locate faces and relies upon assumed parameters.
The feature identification method is based upon searching for potential facial features or groups of facial features such as eyebrows, eyes, nose and mouth. The detection process involves identifying facial features and grouping these features into feature pairs, partial face groups, or face candidates. This process is advantageous in that it is relatively scale invariant, there is no exhaustive searching, it is able to handle the occlusion of some facial features and it is also able to handle different viewpoints and orientations. Its main disadvantages are that there are potentially many false detections and that its performance is very dependent upon the facial feature detection algorithms used.
The use of skin colour to detect human faces is described in a paper by Yang J and Waibel A (1995) "Tracking Human Faces in Real-Time" CMU-CS-95-210, School of Computer Science, Carnegie Mellon University. This proposal was based on the concept that the human visual system adapts to different levels of brightness and to different illumination sources, which implies that the human perception of colour is consistent within a wide range of environmental lighting conditions. It was therefore thought possible to remove brightness from the skin colour representation while preserving accurate, but low dimensional, colour information. As a consequence, in this prior art technique, the chromatic colour space was used. Chromatic colours (eg. r and g) can be derived from the RGB values as:

r = R / (R + G + B) and g = G / (R + G + B)

These chromatic colours are known as "pure" colours in the absence of brightness.
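As an illustration only (the function name and the zero-denominator convention below are assumptions, not taken from the specification), the conversion can be sketched in Python:

```python
def to_chromatic(r, g, b):
    """Map an RGB triple to the brightness-free chromatic pair (r, g)."""
    total = r + g + b
    if total == 0:
        # Pure black carries no chromatic information; returning zeros
        # here is this sketch's convention, not the patent's.
        return 0.0, 0.0
    return r / total, g / total
```

Note that scaling all three channels by the same factor (a brightness change) leaves the chromatic pair unchanged, which is the point of the representation.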
Utilising this colour space, Yang and Waibel found that the distribution of skin colour of different people, including both different persons and different races, clustered together. This means that the skin colours of different people are very close and that the main differences are in intensity.
This prior art method first generated a skin colour distribution model using a set of example face images from which skin colour regions were manually selected. Then the test image was converted to the chromatic colour space. Next, each pixel in the test image (as converted) was compared to the distribution of the skin colour model. Finally, all skin colour pixels so detected were identified, and regions of adjacent skin colour pixels could then be considered potential face candidates.
This prior art method has the advantages that processing colour is much faster than processing individual facial features, that colour is substantially orientation invariant, and that it is insensitive to the occlusion of some facial features. The system is also substantially viewpoint invariant and scale invariant. However, the method suffers from a number of disadvantages, including that the colour representation of a face can be influenced by different lighting conditions, and that different cameras (eg. digital or film) can produce different colour values even for the same person in the same environment.
However, a significant disadvantage of the prior art methods is that the skin colour model is not very discriminating (ie. selecting pixels on the basis of whether they are included in the skin colour distribution results in a lot of non-skin colour pixels being included erroneously). It is also difficult to locate clusters or regions of skin colour pixels that can be considered as candidate faces.
Disclosure of the Invention

An object of the present invention is to provide an improved method of detecting one or more faces in digital colour images.
In accordance with a first aspect of the present invention there is disclosed a method of detecting a face in a colour digital image formed of a plurality of pixels, said method comprising the steps of: (i) testing the colour of said pixels to determine those said pixels having predominantly skin colour, said testing utilising at least one image capture condition provided with said image; and (ii) subjecting only those pixels determined in step (i) as having predominantly skin colour to further facial feature analysis, whereby those said pixels not having a predominantly skin colour are not subjected to said further facial feature analysis.
Preferably, each image capture condition is acquired at a time the image is captured. Advantageously, the image is encoded according to a predetermined format and at least one image capture condition is represented as meta-data associated with the format. Most preferably, the at least one image capture condition comprises lighting conditions at a time the image was captured.
In a particular implementation, step (i) comprises the sub-step, preceding said testing, of: dividing said image into a plurality of regions, each said region comprising a plurality of said pixels; and wherein said testing is performed on pixels within each said region to determine those ones of said regions that are predominantly skin colour, and step (ii) comprises performing said further facial feature analysis on only those said regions determined to be predominantly of skin colour.
In accordance with another aspect of the present invention, there is disclosed a method of detecting a face in a colour digital image, said method comprising the steps of: (i) segmenting said image into a plurality of regions each having a substantially homogenous colour; (ii) testing the colour of each said region created in step (i) to determine those regions having predominantly skin colour; and (iii) subjecting only the regions determined in step (ii) to further facial feature analysis, whereby said regions created in step (i) not having a predominantly skin colour are not subjected to said further feature analysis.
Apparatus and computer readable media for performing the invention are also disclosed.

Brief Description of the Drawings

A number of embodiments of the present invention will now be described with reference to the drawings, in which:

Fig. 1 is a schematic representation of the pixels of a colour digital image;

Fig. 2 shows the segmenting of the image of Fig. 1 into a plurality of regions each having a substantially homogenous colour according to a first embodiment;

Fig. 3 is a flow chart of a face detection process according to the first embodiment;

Fig. 4 is a schematic block diagram of a general purpose computer upon which embodiments of the present invention can be practised;

Fig. 5 is a flow chart depicting the generation of a face colour distribution model;

Fig. 6 is a flow chart of a face detection process according to a second embodiment; and

Fig. 7 is a flow chart of a face detection process according to a third embodiment.
Detailed Description including Best Mode

Fig. 1 illustrates a typical colour digital image 1 having a size of 832 x 624 pixels 5, each of which has an RGB value.
According to a first embodiment of the present invention, rather than consider the skin colour of the image on a pixel by pixel basis as described above in relation to the prior art of Yang and Waibel, the image 1 is segmented into a number of regions. An example of such segmentation is schematically illustrated in Fig. 2 on the segmentation basis that all the pixels in each region 2 have substantially the same colour. Alternatively, the image may be segmented into arbitrary regions for processing.
The first embodiment implements a process 30 illustrated in the flow chart of Fig. 3, in which the regional segmentation of the image is carried out at step 31. Next, the regions of the image are converted in step 32 into the chromatic colour space (as described above). The next step 33 is to select those of the regions determined in step 31 which have a specified percentage (typically in the range 90-95%) of pixels having a skin colour. These selected regions are then conveniently represented by a boundary box or other boundary indication. Finally, at step 34, the selected regions, including any combinations of overlapping regions, are subjected to a further analysis (preferably not based on skin colour) to determine if the selected region(s) represent a face or faces.
This initial colour grouping can use any region based colour image segmentation technique. Preferably, the image is partitioned into colour regions by seeking connected groups of pixels which have similar colours over a local region. Very small isolated initial spatial groupings can be ignored in order to find major colour regions and reduce noise effects. A representative colour of each initial spatial region is determined by the average colour value of the region.
A colour region starts from an arbitrarily chosen pixel, which is compared with its neighbouring pixels. The region size is increased by adding neighbouring pixels which are similar in colour, using a colour similarity threshold T. A neighbouring pixel is added to the region if |Rp - Rm| < T and |Gp - Gm| < T and |Bp - Bm| < T, where Rp, Gp, Bp are the R, G, B values of the neighbouring pixel and Rm, Gm, Bm represent the average R, G, B values of the region.
When a region has no more neighbouring pixels of similar colour, the region stops growing and represents one of the initial spatial groupings. If the region size is below a predetermined threshold value, it is ignored. A region having a pixel number equal to or greater than the predetermined threshold is represented by its average colour.

A new pixel which does not yet belong to any region is chosen to start a new colour region. The process continues until each pixel in the image either belongs to an initial spatial grouping or has been ignored as being part of a small region.
The initial spatial groupings provide a colour region segmentation of the image, with each region being represented by its average colour.
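The region-growing procedure described above can be sketched as follows; this is a minimal illustration assuming 4-connected neighbours and a 2-D list of RGB tuples (all names are illustrative, not from the specification):

```python
from collections import deque

def grow_region(image, seed, t):
    """Grow a colour region from `seed` by absorbing 4-connected neighbours
    whose R, G, B values are each within threshold `t` of the region's
    running average colour. `image` is a 2-D list of (R, G, B) tuples."""
    h, w = len(image), len(image[0])
    region = {seed}
    sums = list(image[seed[0]][seed[1]])   # running channel sums
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        mean = [s / len(region) for s in sums]
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in region:
                px = image[ny][nx]
                if all(abs(px[c] - mean[c]) < t for c in range(3)):
                    region.add((ny, nx))
                    for c in range(3):
                        sums[c] += px[c]
                    queue.append((ny, nx))
    return region
```

Repeating this from pixels not yet assigned to any region, and discarding regions below the size threshold, yields the initial spatial groupings.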
In this way, for most images where the bulk of the image is not a face, or part of a face, the majority of pixels will be grouped into regions or objects (be they foreground or background, etc) which are clearly not faces. Therefore these non-facial objects can be quickly eliminated on the basis of their colour.
Once the regions have been determined, they are then converted into the "pure" chromatic colour space utilising the equations given above so as to provide r and g values.
A generous rule, such as a rule that at least 85% of the pixels within a given region be of face colour, can be used to select those regions worthy of further examination. Preferably, the test for face colour takes into account the nature of the original image, for example whether the image was taken with or without a flash. This information can be determined from the image source, eg. a camera.
Thereafter, only those selected regions are subjected to a further test to determine the presence of facial features. This further test provides a conclusive determination as to whether or not a region constitutes a face. In this connection, the further test is likely to be computationally slower, and therefore the above described elimination of regions ensures that the computationally slow method is only applied to relatively small portions of the overall image. Thus the total processing time is reduced. Accordingly, the above method performs a computationally simple process on most, if not all, pixels, and then only performs complex examination on skin colour regions.
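A sketch of the selection step, assuming regions are simply lists of pixels and the skin-colour test is supplied as a predicate (both assumptions are mine, not the specification's):

```python
def select_candidate_regions(regions, is_skin, min_fraction=0.85):
    """Pass on for facial-feature analysis only those regions in which at
    least `min_fraction` of the pixels satisfy the skin-colour predicate."""
    candidates = []
    for region in regions:
        skin = sum(1 for pixel in region if is_skin(pixel))
        if skin >= min_fraction * len(region):
            candidates.append(region)
    return candidates
```

The 0.85 default mirrors the generous 85% rule mentioned above; the 90-95% range of step 33 could be substituted.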
The preferred method of verifying if a region represents a face relies upon edge detection techniques as a means of detecting facial features. In particular facial features such as eyes, eyebrows and mouths often appear as dark bars on a face and thus provide dark edges.
The preferred form of edge detection is the use of an edge detection filter. This utilises two functions operating in orthogonal directions. To detect a horizontal bar, a second derivative Gaussian function is used in the vertical direction and a Gaussian function is used in the horizontal direction.
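A minimal sketch of such a separable filter applied to a single luminance channel (the kernel size, sigma and edge clamping are illustrative choices, not the specification's):

```python
import math

def gaussian(n, sigma):
    """Normalised 1-D Gaussian kernel of length n."""
    k = [math.exp(-(i - n // 2) ** 2 / (2 * sigma ** 2)) for i in range(n)]
    s = sum(k)
    return [v / s for v in k]

def gaussian_2nd_deriv(n, sigma):
    """Second derivative of a Gaussian: negative at the centre, positive
    in the tails, so dark bars produce strong positive responses."""
    return [((i - n // 2) ** 2 / sigma ** 2 - 1)
            * math.exp(-(i - n // 2) ** 2 / (2 * sigma ** 2))
            for i in range(n)]

def filter_horizontal_bars(image, sigma=1.0, size=5):
    """Separable filter: Gaussian along rows, then second-derivative
    Gaussian down columns. `image` is a 2-D list of luminance values."""
    vert, horiz = gaussian_2nd_deriv(size, sigma), gaussian(size, sigma)
    h, w, r = len(image), len(image[0]), size // 2
    tmp = [[0.0] * w for _ in range(h)]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):  # smooth along each row (clamped at the borders)
        for x in range(w):
            tmp[y][x] = sum(horiz[i] * image[y][min(max(x + i - r, 0), w - 1)]
                            for i in range(size))
    for y in range(h):  # second-derivative response down each column
        for x in range(w):
            out[y][x] = sum(vert[i] * tmp[min(max(y + i - r, 0), h - 1)][x]
                            for i in range(size))
    return out
```

A dark horizontal bar (such as an eyebrow or closed mouth) yields a positive response at its centre, while flat regions do not.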
Once an edge has been determined in this way, each detected edge is examined.
Any pair of detected edges can be found to be derived from, and thus be indicative of, a pair of eyes, a pair of eyebrows, or an eye and associated eyebrow, depending upon the relative position and size of the detected edges. Similarly, an individual edge can be derived from, and thus be indicative of, a mouth if it is located at an appropriate position relative to the eyes and/or eyebrows already detected.
By proceeding in this fashion, a given region begins to accumulate facial features building from skin tone through eyebrows/eyes and then to a mouth. The more facial features found for a given region which is a face candidate, the greater the possibility that the candidate actually is a face.
Furthermore, the above described method has the advantage that it is able to cater for the circumstance where a face appears against a background region of substantially the same colour. Under these circumstances, in Yang and Waibel's method, no boundary between the face and the background would be likely to be detected.
Therefore the region as a whole would be selected for further testing. However, the above method utilises the full colour space to segment the image before making decisions about which pixels are skin colour. Consequently, the face is more likely to be separated from the background. In addition, the method is naturally independent of orientation or partial occlusion of the face.
Furthermore, the above method also permits false positives to be examined at the further stage and therefore does not exclude from subsequent testing regions which are likely to be ultimately determined as a facial region.

The first embodiment described above notes that the nature of the original image may be taken into account when performing an initial face detection process. Further embodiments, to be now described, build upon this feature.

When an image is captured using a camera, it is necessary either for the person taking the picture to manually establish the camera settings (such as shutter speed, aperture, focal length, etc), or for the camera to perform this operation automatically.
Whichever is the case, the settings of the camera directly affect the appearance and quality of the image taken. In particular, the perceived brightness, colour, and sharpness of the objects within an image all depend on how the settings of the camera are configured. For example, it is possible to take two pictures of the same scene with different camera settings and to obtain two images in which the same objects appear with different colours and brightness. Therefore, the ability to calibrate (in particular) colour information contained in (digital) images enables a broad variety of object detection and classification tasks in which colour is a strong discriminating feature.
Face detection is one such example application, and the present inventors have determined that the creation of face colour distribution models (CDMs), each adapted to specific lighting conditions, can improve both the accuracy and reliability of face detection. Variations in lighting conditions can result from the use of a flash, a feature recognised as contributing to the face detection method of the first embodiment.
Since lightness is representative of colour features such as luminance and chrominance, such features may be used to quantify face detection.
Before an image can be processed using a face colour distribution model, the face colour distribution model must be constructed. This is performed according to a method 50 shown in Fig. 5. The method 50 firstly gathers image samples at step 52 that are representative images containing faces, the images being acquired under a variety of lighting conditions and thus indicative of changes in luminance and chrominance. These images are then manually examined in step 54 to extract regions of skin colour for further processing in model formation. Step 54 may be performed by manually drawing a bounding box around a sample of face coloured pixels. Step 56, which follows, derives colour representation values for the extracted pixels. This may be performed by transforming the extracted pixels into a perceptual colour space such as CIE L*u*v or CIE L*a*b, so that each pixel is represented by at least a 2-dimensional vector.
Alternatively other colour spaces such as HLS and HSV may be used. Preferably each pixel is represented as a length-3 vector incorporating both the luminance and chrominance values.
The colour representation values of pixels are then divided at step 58 into a number of sets (58a, 58b, ..., 58n) according to the lighting conditions present when each of the images was captured. Example sets are flash, non-flash, indoor, outdoor, and combinations of these. Alternatively, lighting parameters obtained directly from the camera, such as the operation of a flash, may be used to identify and distinguish the sets.
Other lighting conditions such as bright or cloudy, dusk or dawn, or a type of artificial light such as fluorescent, incandescent or halogen, may be used or detected for these purposes. These details may be provided by means of human input at the time of image capture.
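The division into sets might be sketched as follows, using the flash/indoor combinations named above as the grouping key (the dictionary-based representation and key names are assumptions of this sketch):

```python
def partition_by_lighting(samples):
    """Split (pixel, capture_info) samples into per-condition sets: every
    flash/indoor combination observed gets its own list of pixel samples."""
    sets = {}
    for pixel, info in samples:
        key = (info["flash"], info["indoor"])
        sets.setdefault(key, []).append(pixel)
    return sets
```

Finer keys (light-meter reading, light-source type) would produce more, narrower sets, as discussed below.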
For each of the sets (58a ... 58n) of face samples, step 60 then constructs a corresponding colour distribution model (CDM) (60a ... 60n) that best fits the samples of face colour pixels. The CDM can be a histogram, a probability density function, or a binary map. In one embodiment, a mixture of Gaussian PDFs is fitted to the sample data using techniques known in the art, such as the expectation maximisation (EM) algorithm, with cross-validation, jackknife, or bootstrap techniques being used to estimate the goodness of fit of the model.
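As one hedged illustration of the histogram form of a CDM (the bin width of 32 intensity levels and all names are this sketch's own; the specification equally contemplates PDFs and binary maps):

```python
def quantise(pixel, bins=8):
    """Map an 8-bit (R, G, B) triple to a coarse histogram bin."""
    r, g, b = pixel
    return (r * bins // 256, g * bins // 256, b * bins // 256)

def build_cdm(face_pixels, bins=8):
    """Histogram-style colour distribution model: bin -> probability."""
    counts = {}
    for pixel in face_pixels:
        key = quantise(pixel, bins)
        counts[key] = counts.get(key, 0) + 1
    n = len(face_pixels)
    return {key: c / n for key, c in counts.items()}

def face_colour_probability(cdm, pixel, bins=8):
    """Probability the model assigns to this pixel's colour (0 if unseen)."""
    return cdm.get(quantise(pixel, bins), 0.0)
```

One such model would be built per lighting-condition set from the previous step.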
When each CDM (60a ... 60n) has been constructed, it is then desirable, as shown in step 62, to establish a corresponding probability threshold (62a ... 62n) below which a colour vector is to be classified as relating to a non-face pixel, and above which the colour vector is to be classified as a potential face pixel. Additionally, the face colour probability can be used directly in further facial image analysis steps detailed below. In the preferred embodiment, the CDM is constructed from colour representation values derived using a perceptual colour space (such as CIE L*u*v or CIE L*a*b) and then transformed back into the colour format of the input image, ie. either RGB or YUV. This removes the necessity for transforming the input image into the perceptual colour space.
Since different image capture devices have differing performance, often determined by the quality and size of optical components (eg. lens, mirrors, aperture, etc.), typically a CDM or a set of CDMs is generated for a particular capture device. In one implementation, where the image capture device (eg. camera) includes a light meter, a reading from the light meter at the moment the image was captured can be used to determine the required CDM. In this fashion, a greater range of colour models may be devised and can be selected without possible human interference. Such interference may occur where the human user manually selects the operation of the flash where otherwise automatic operation of the flash would not be required. Further, the flash/outdoors example sets above give rise to four sets of CDMs. Using a light meter with, say, 4-bit encoding can provide sixteen (16) models. Also, use of a light meter provides for enhanced reproducibility of results and enables the face samples used to generate the models to be taken under laboratory conditions and installed at the time of camera manufacture.
The processing 70 of an image according to a second embodiment is shown in Fig. 6. An input image is provided at step 72 and at step 74 the lighting conditions under which the image was captured are determined. Such a determination may be based on binary data obtained directly from the camera (eg. flash+indoors, no_flash+outdoors, no_flash+indoors, flash+outdoors) or corresponding meta-data provided with, or accompanying, the image, which may be encoded or otherwise communicated according to a predetermined format. Once the lighting conditions are determined, a corresponding or closest CDM is selected at step 76 from a bank of look-up tables 78 retaining the CDMs (60a ... 60n) previously determined. At step 80, a first pixel of the input image 72 is selected and at step 82 is tested to see if the (RGB or YUV) colour of the pixel is contained in the selected CDM.

The steps shown in Fig. 6 following the comparison of step 82 depend upon the manner in which the CDMs are stored. In a preferred implementation, the selected thresholds of step 62 (Fig. 5) are used to construct a binary map or look-up table, where a representative colour vector is represented by a 1 if it is a colour vector that is contained within the thresholded face colour distribution, and by a 0 if the colour vector does not occur in the thresholded colour distribution. Alternatively, the CDM may represent the frequencies of the representative colour vectors of the thresholded colour distribution (ie. the CDM is effectively a histogram of representative colour vectors). A further variation is the case where a sampled distribution is approximated by a parametric model such as a Gaussian or a mixture of Gaussians. In the latter case, the CDM comprises the parameters of the model (eg. mean, co-variance).

As seen in Fig. 6, and according to the preferred implementation, a 1 or 0 value arising from step 82 is added to a map in step 84. Step 86 determines if there are more pixels in the image to be processed and step 88 obtains and passes the next pixel to step 82 for appropriate testing. When all pixels have been tested against the selected CDM, step 90 indicates the result of the preceding steps as being a binary face image map formed using detected skin-coloured pixels.
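The binary look-up form described here might be sketched as follows (the quantisation step, bin width and names are illustrative assumptions of this sketch):

```python
def quantise(pixel, bins=8):
    """Map an 8-bit (R, G, B) triple to a coarse colour bin."""
    r, g, b = pixel
    return (r * bins // 256, g * bins // 256, b * bins // 256)

def binary_face_map(image, face_bins, bins=8):
    """`face_bins` plays the role of the thresholded look-up table: the set
    of quantised colour vectors whose face-colour probability exceeded the
    threshold of step 62. Returns a map of 1s and 0s matching `image`,
    which is given as rows of RGB triples."""
    return [[1 if quantise(pixel, bins) in face_bins else 0 for pixel in row]
            for row in image]
```

The resulting map corresponds to the binary face image map of step 90.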
The map is then subjected at step 92 to further analysis of the skin-coloured pixels to provide at step 94 a face detection map for the image. The further analysis of step 92 is, like the first embodiment, preferably independent of considerations of facial colour.
In practice, the binary face map formed at step 90 may contain areas where either there are small non-face pixels surrounded by face pixels or vice versa. One approach for further analysis according to step 92 is processing the binary face image map so as to set to 0 any pixel locations which are contained in areas that are smaller than the smallest size of a potential face, and to set any 0 pixel locations to 1 if they are surrounded by likely face colour pixels. This may be performed using a pair of morphological opening and closing operations with suitably shaped structuring elements.
A first structuring element, such as:

0 1 1
1 0 1
1 1 0

is used in the opening operation to remove potential face candidate pixel locations below this size. A second structuring element is used in the closing operation to fill any holes in potential face candidate pixel locations.
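A sketch of the opening and closing step, using a symmetric 3 x 3 cross as a stand-in structuring element (the patent's exact elements are not reproduced here; all names are illustrative):

```python
def _probe(mask, se, y, x, require_all):
    """Overlay structuring element `se` centred at (y, x) and report whether
    all (erosion) or any (dilation) of its active cells land on a 1."""
    h, w = len(mask), len(mask[0])
    oy, ox = len(se) // 2, len(se[0]) // 2
    hits = []
    for i, row in enumerate(se):
        for j, active in enumerate(row):
            if not active:
                continue
            yy, xx = y + i - oy, x + j - ox
            hits.append(0 <= yy < h and 0 <= xx < w and mask[yy][xx] == 1)
    return all(hits) if require_all else any(hits)

def erode(mask, se):
    return [[1 if _probe(mask, se, y, x, True) else 0
             for x in range(len(mask[0]))] for y in range(len(mask))]

def dilate(mask, se):
    # A symmetric structuring element is assumed, so no reflection is needed.
    return [[1 if _probe(mask, se, y, x, False) else 0
             for x in range(len(mask[0]))] for y in range(len(mask))]

def open_then_close(mask, se):
    """Opening (erode, dilate) removes specks smaller than `se`; closing
    (dilate, erode) then fills comparably sized holes."""
    opened = dilate(erode(mask, se), se)
    return erode(dilate(opened, se), se)
```

Applied to the binary face map, an isolated skin-coloured speck is removed while a small hole inside a face-sized blob is filled.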
Alternative approaches to the use of the structuring elements include using a Hough transform, or counting the number of pixels in the region having skin colour and thresholding that count against a predetermined percentage value. Other methods may be used to perform these tasks.
The result of the process 70 of Fig. 6 is a face detection map of pixel locations in the input image at which a face has been detected and, in all likelihood, a face is present.
The parameters obtained from the camera are preferably obtained from a metadata stream associated with the capture of each image (or video sequence). Examples, of 20 such transmission protocols include IEEE 1394 ("firewire"). Also the ISO standards have defined methods for attaching meta-data to images and video in MPEG-7, MPEG4, and JPEG.
Whilst the first embodiment described with reference to Figs. 1 to 3 divides the image according to regions of substantially homogeneous colour, the second embodiment, and a third embodiment, are not so constrained.
CFP1327AUA IPR20A 461584DI [I:\ELEC\CISRA\IPR\Ipr2Oa]461584D1 .doc:!dp -16- The third embodiment is depicted in Fig. 7 by a method 150 where an input image 152 is provided and processed according to steps 154, 156 and 158 corresponding to steps 74, 76 and 78, respectively, of the second embodiment. Once the appropriate CDM has been selected in step 156, step 160 follows to process the input image as one or more regions. As a single region, the entirety of the image is processed on a pixel-bypixel basis. Alternatively, the input image may be geometrically divided into simple pixel blocks (eg. 25 x 25 pixels, 10 x 20 pixels) which can be formed and processed in raster order. A further alternative, is where the regions, like the first embodiment, are divided on the basis of substantially homogeneous colour.
Step 162 selects a first region to be processed and step 164 a first pixel of that region. Step 166 compares the selected pixel with the CDM in a manner corresponding to step 82 of the second embodiment. Where the pixel matches the model, step 168 increments a count of pixels in the region meeting that criteria. Step 170 determines whether there are any other pixels is the region to be processed and if so, step 172 obtains the next pixel and returns to step 166 for appropriate testing. When all pixels in the region have been processed, step 174 follows to compare the percentage of pixels classified for the region as skin colour against a predetermined percentage threshold value. Where the percentage is less than the predetermined number, the region is considered a non-face region and step 176 follows to test if there are any more regions to be processed. If so, step 178 selects the next region and returns processing to step 164.
The count is then reset. If not, the method 150 ends at step 184.
Where the percentage exceeds the predetermined percentage, the region is considered a possible face region and step 180 follows to assess the region according to further facial detection analysis. Where such analysis does not detect a face, the method 150 proceeds to step 176 to process any further regions. Where the further analysis of step 180 detects a face, step 182 registers that region as a face region and returns to step 176.
An example of the further analysis able to be performed as a consequence of appropriate marking at step 180 is the edge detection method described above in relation to the first embodiment.
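The flow of steps 162 to 182 can be summarised in code. The sketch below is illustrative only: the CDM is modelled as a set of quantised colour vectors and `further_analysis` stands in for whatever second-stage test (eg. the edge detection method of the first embodiment) is applied; both representations are assumptions, not the specification's implementation, and regions are assumed non-empty.

```python
def classify_regions(regions, cdm, pct_threshold, further_analysis):
    """Return the regions registered as face regions (cf. steps 162-182).

    regions          -- iterable of non-empty pixel lists
    cdm              -- colour distribution model, here a set of colour vectors
    pct_threshold    -- predetermined percentage of skin-colour pixels
    further_analysis -- callable implementing the second-stage facial test
    """
    face_regions = []
    for region in regions:                      # steps 162, 176, 178
        count = 0                               # count is reset for each region
        for pixel in region:                    # steps 164, 170, 172
            if pixel in cdm:                    # step 166: pixel matches model
                count += 1                      # step 168
        # Step 174: compare the percentage of skin-colour pixels in the
        # region against the predetermined percentage threshold.
        if 100.0 * count / len(region) >= pct_threshold:
            if further_analysis(region):        # step 180: second-stage test
                face_regions.append(region)     # step 182: register face region
    return face_regions
```

With a threshold of 50%, a region whose pixels are two-thirds skin colour passes to the second stage, while a region with no skin-colour pixels is rejected without further analysis.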
The above described embodiments each indicate that face detection in images may be performed as a two stage process, the first representing something akin to a filtering of the image to obtain likely candidate pixels or regions, and the second representing more thorough analysis to provide an actual determination on those pixels or regions passed by the first stage. In each case, lighting conditions associated with the capture of the image contribute to the determination performed by the first stage.
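By way of illustration, a first-stage pixel test of the kind described might operate in an intensity-normalised (chromatic) colour space, testing each pixel's colour representation vector against a binary map built from previously sampled facial image data. The quantisation into 32 bins and the toy map below are invented for illustration only; they are not values from the specification.

```python
def chromatic(r, g, b):
    """Map an RGB pixel to intensity-normalised chromatic coordinates (r', g')."""
    s = r + g + b
    if s == 0:
        return (0.0, 0.0)           # black pixel: no chromatic information
    return (r / s, g / s)           # b' is redundant since r' + g' + b' = 1

def is_skin(pixel, binary_map, bins=32):
    """Classify a pixel as skin colour if its quantised chromatic vector
    occurs within the binary map (a set of occupied histogram cells)."""
    cr, cg = chromatic(*pixel)
    cell = (min(int(cr * bins), bins - 1), min(int(cg * bins), bins - 1))
    return cell in binary_map
```

A real binary map would hold the cells occupied by at least a given percentage of the sampled skin-colour pixels; a toy map such as `{(14, 9)}` classifies, for example, the pixel (145, 100, 80) as skin colour.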
The above described methods are preferably practiced using a conventional general-purpose computer system 100, such as that shown in Fig. 4 where the processes of Fig. 3 and/or Figs. 5 and 6 are implemented as software, such as an application program executing within the computer system 100. In particular, the steps of the methods are effected by instructions in the software that are carried out by the computer.
The software may be divided into two separate parts; one part for carrying out the above method steps; and another part to manage the user interface between the latter and the user. The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer from the computer readable medium, and then executed by the computer. A computer readable medium having such software or computer program recorded on it is a computer program product. The use of the computer program product in the computer preferably effects an advantageous apparatus for detecting face candidate regions in accordance with the embodiments of the invention.
The computer system 100 comprises a computer module 101, input devices such as a keyboard 102 and mouse 103, and output devices including a printer 115 and a display device 114. A Modulator-Demodulator (Modem) transceiver device 116 is used by the computer module 101 for communicating to and from a communications network 120, for example connectable via a telephone line 121 or other functional medium. The modem 116 can be used to obtain access to the Internet, and other network systems, such as a Local Area Network (LAN) or a Wide Area Network (WAN), these being possible sources of input images and destinations for detected faces.
The computer module 101 typically includes at least one processor unit 105, a memory unit 106, for example formed from semiconductor random access memory (RAM) and read only memory (ROM), input/output interfaces including a video interface 107, an I/O interface 113 for the keyboard 102 and mouse 103 and optionally a joystick (not illustrated), and an interface 108 for the modem 116. A storage device 109 is provided and typically includes a hard disk drive 110 and a floppy disk drive 111. A magnetic tape drive (not illustrated) may also be used. A CD-ROM drive 112 is typically provided as a non-volatile source of data. The components 105 to 113 of the computer module 101 typically communicate via an interconnected bus 104 and in a manner which results in a conventional mode of operation of the computer system 100 known to those in the relevant art. Examples of computers on which the embodiments can be practised include IBM-PCs and compatibles, Sun Sparcstations, or alike computer systems evolved therefrom.
Typically, the application program of the preferred embodiment is resident on the hard disk drive 110 and read and controlled in its execution by the processor 105.
Intermediate storage of the program and any data fetched from the network 120 may be accomplished using the semiconductor memory 106, possibly in concert with the hard disk drive 110. In some instances, the application program may be supplied to the user encoded on a CD-ROM or floppy disk and read via the corresponding drive 112 or 111, or alternatively may be read by the user from the network 120 via the modem device 116.
Still further, the software can also be loaded into the computer system 100 from other computer readable medium including magnetic tape, a ROM or integrated circuit, a magneto-optical disk, a radio or infra-red transmission channel between the computer module 101 and another device, a computer readable card such as a PCMCIA card, and the Internet and Intranets including e-mail transmissions and information recorded on Websites and the like. The foregoing is merely exemplary of relevant computer readable mediums. Other computer readable mediums may be practiced without departing from the scope and spirit of the invention.
The further processing of candidate face images and regions may also be performed by or using the computer system 100 and known arrangements for such processing.
The method of detecting face candidate regions may alternatively be implemented in dedicated hardware such as one or more integrated circuits performing the functions or sub-functions of Fig. 3 and/or Figs. 5 and 6. Such dedicated hardware may include graphic processors, digital signal processors, or one or more microprocessors and associated memories.
Industrial Applicability
It is apparent from the above that the embodiments of the invention are applicable in fields such as content-based image retrieval, personal identification or verification for use with automatic teller machines or security cameras, or automated interaction between humans and computational devices.
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention.
In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including" and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises", have corresponding meanings.

Claims (38)

1. A method of detecting a face in a colour digital image formed of a plurality of pixels, said method comprising the steps of: (i) testing the colour of said pixels to determine those said pixels having predominantly skin colour, said testing utilising at least one image capture condition provided with said image; and (ii) subjecting only said those pixels determined in step (i) as having predominantly skin colour to further facial feature analysis whereby those said pixels not having a predominantly skin colour are not subjected to said further facial feature analysis.
2. A method according to claim 1, wherein each said image capture condition is acquired at a time said image is captured.
3. A method according to claim 2, wherein said image is encoded according to a predetermined format and said at least one image capture condition is represented as meta-data associated with said format.
4. A method according to claim 1, 2 or 3, wherein said at least one image capture condition comprises lighting conditions at a time said image was captured.
5. A method according to any one of claims 1 to 4, wherein step (i) comprises the sub-step, preceding said testing, of: dividing said image into a plurality of regions, each said region comprising a plurality of said pixels; and wherein said testing is performed on pixels within each said region to determine those ones of said regions that are predominantly skin colour and step (ii) comprises performing said further facial feature analysis on only those said regions determined to be predominantly of skin colour.
6. A method according to any one of claims 1 to 4, wherein step (i) utilises at least one predetermined colour distribution model, said model having been generated using previously sampled facial image data.
7. A method according to claim 6, wherein said colour distribution model is generated for a particular image capture device.
8. A method according to claim 6 or 7, wherein separate colour distribution models are generated for said different image capture conditions.
9. A method according to claim 8 when dependent on claim 4, wherein separate colour models are generated for different lighting conditions at a time said previously sampled facial image data was captured.
A method according to claim 9, wherein separate colour distribution models are generated for groups of images taken with a flash and images taken without a flash.
11. A method according to claim 9 or 10, wherein separate colour distribution models are generated for groups of images taken indoors and images taken outdoors.
12. A method according to any one of claims 6 to 11, wherein each said colour distribution model is represented as a frequency histogram of colour representation vectors.
13. A method according to any one of claims 6 to 11, wherein each said colour distribution model is represented as a probability distribution of colour representation vectors.
14. A method according to any one of claims 6 to 11, wherein each said colour distribution model is represented as a binary map of colour representation vectors.
A method according to any one of claims 12 to 14, wherein said colour representation vectors are derived from perceptual colour space values of the predetermined skin-colour pixels in said previously sampled facial image data.
16. A method according to any one of claims 12 to 14, wherein said colour representation vectors contain chromatic colour values derived from those RGB values of the predetermined skin-colour pixels in said previously sampled facial image data.
17. A method according to claim 14, wherein said binary map comprises a percentage of the skin colour pixels that were identified in said previously sampled facial image data.
18. A method according to claim 17, wherein one of said pixels is classified as being skin colour if the colour representation vector corresponding thereto occurs within said binary map.
19. A method according to claim 12, wherein each of said pixels is classified as being skin colour if the frequency of the colour representation vector corresponding thereto exceeds a predetermined threshold frequency.
20. A method according to claim 13, wherein each of said pixels is classified as being skin colour if the probability of the colour representation vector corresponding thereto exceeds a predetermined probability threshold.
21. A method according to claim 18, 19 or 20, when dependent on claim 5, wherein one of said regions is determined to be predominantly skin colour if more than a predetermined percentage of the total number of said pixels in said one region are classified as being skin colour.
22. A method according to claim 5, wherein said regions are geometrically divided from said image.
23. A method according to claim 5, wherein said regions are formed of pixels having substantially homogeneous colour.
24. A method according to claim 23, wherein said regions are formed using a region growing method based upon colour differences.
25. A method according to claim 5, wherein said further analysis of step (ii) is independent of face colour.
26. Apparatus for detecting a face in a colour digital image formed of a plurality of pixels, said apparatus comprising: means for testing the colour of said pixels to determine those said pixels having predominantly skin colour, said testing utilising at least one image capture condition provided with said image; and means for subjecting only said those pixels so determined as having predominantly skin colour to further facial feature analysis whereby those said pixels not having a predominantly skin colour are not subjected to said further facial feature analysis.
27. Apparatus according to claim 26, wherein each said image capture condition is acquired at a time said image is captured.
28. Apparatus according to claim 27, wherein said image is encoded according to a predetermined format and said at least one image capture condition is represented as meta-data associated with said format.
29. Apparatus according to claim 26, 27 or 28, wherein said at least one image capture condition comprises lighting conditions at a time said image was captured. e S:
30. Apparatus according to any one of claims 26 to 29, wherein said means for testing comprises means for dividing said image into a plurality of regions, each said region comprising a plurality of said pixels; wherein said means for testing operates on pixels within each said region to determine those ones of said regions that are predominantly skin colour and said means for subjecting causes said further facial feature analysis to be performed on only those said regions determined to be predominantly of skin colour.
31. A computer readable medium incorporating a computer program product for detecting a face in a colour digital image formed of a plurality of pixels, said computer program product comprising: means for testing the colour of said pixels to determine those said pixels having predominantly skin colour, said testing utilising at least one image capture condition provided with said image; and means for subjecting only said those pixels so determined as having predominantly skin colour to further facial feature analysis whereby those said pixels not having a predominantly skin colour are not subjected to said further facial feature analysis.
32. A computer readable medium according to claim 31, wherein each said image capture condition is acquired at a time said image is captured.
33. A computer readable medium according to claim 32, wherein said image is encoded according to a predetermined format and said at least one image capture condition is represented as meta-data associated with said format.
34. A computer readable medium according to claim 31, 32 or 33, wherein said at least one image capture condition comprises lighting conditions at a time said image was captured.
35. A computer readable medium according to any one of claims 31 to 34, wherein said means for testing comprises means for dividing said image into a plurality of regions, each said region comprising a plurality of said pixels; wherein said means for testing operates on pixels within each said region to determine those ones of said regions that are predominantly skin colour and said means for subjecting causes said further facial feature analysis to be performed on only those said regions determined to be predominantly of skin colour.
36. A method of detecting a face in a colour digital image, said method being substantially as herein described with reference to Figs. 4 to 6 of the drawings.
37. Apparatus of detecting a face in a colour digital image, said apparatus being substantially as herein described with reference to Figs. 4 to 6 of the drawings.
38. A computer readable medium incorporating a computer program product including a sequence of computer implementable instructions for carrying out steps substantially as herein described with reference to Figs. 4 to 6 of the drawings.

DATED this Twenty-seventh Day of August, 2001
Canon Kabushiki Kaisha
Patent Attorneys for the Applicant
SPRUSON FERGUSON
AU63173/99A 1998-06-10 1999-12-07 Face detection in digital images Ceased AU739936B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU63173/99A AU739936B2 (en) 1998-06-10 1999-12-07 Face detection in digital images

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
AUPP4009 1998-06-10
AU33982/99A AU728290B2 (en) 1998-06-10 1999-06-09 Face detection in digital images
AU63173/99A AU739936B2 (en) 1998-06-10 1999-12-07 Face detection in digital images

Related Parent Applications (2)

Application Number Title Priority Date Filing Date
AU33982/99A Addition AU728290B2 (en) 1998-06-10 1999-06-09 Face detection in digital images
AU33982/99A Division AU728290B2 (en) 1998-06-10 1999-06-09 Face detection in digital images

Publications (2)

Publication Number Publication Date
AU6317399A AU6317399A (en) 2000-02-24
AU739936B2 true AU739936B2 (en) 2001-10-25

Family

ID=25622654

Family Applications (1)

Application Number Title Priority Date Filing Date
AU63173/99A Ceased AU739936B2 (en) 1998-06-10 1999-12-07 Face detection in digital images

Country Status (1)

Country Link
AU (1) AU739936B2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007044674A3 (en) * 2005-10-05 2007-08-30 Qualcomm Inc Video sensor-based automatic region-of-interest detection
US8019170B2 (en) 2005-10-05 2011-09-13 Qualcomm, Incorporated Video frame motion-based automatic region-of-interest detection

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5430809A (en) * 1992-07-10 1995-07-04 Sony Corporation Human face tracking system
US5488429A (en) * 1992-01-13 1996-01-30 Mitsubishi Denki Kabushiki Kaisha Video signal processor for detecting flesh tones in an image
US5748776A (en) * 1993-07-19 1998-05-05 Sharp Kabushiki Kaisha Feature-region extraction method and feature-region extraction circuit

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5488429A (en) * 1992-01-13 1996-01-30 Mitsubishi Denki Kabushiki Kaisha Video signal processor for detecting flesh tones in an image
US5430809A (en) * 1992-07-10 1995-07-04 Sony Corporation Human face tracking system
US5748776A (en) * 1993-07-19 1998-05-05 Sharp Kabushiki Kaisha Feature-region extraction method and feature-region extraction circuit

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007044674A3 (en) * 2005-10-05 2007-08-30 Qualcomm Inc Video sensor-based automatic region-of-interest detection
US8019170B2 (en) 2005-10-05 2011-09-13 Qualcomm, Incorporated Video frame motion-based automatic region-of-interest detection
US8208758B2 (en) 2005-10-05 2012-06-26 Qualcomm Incorporated Video sensor-based automatic region-of-interest detection

Also Published As

Publication number Publication date
AU6317399A (en) 2000-02-24

Similar Documents

Publication Publication Date Title
US7218759B1 (en) Face detection in digital images
Graf et al. Multi-modal system for locating heads and faces
KR101615254B1 (en) Detecting facial expressions in digital images
Ghimire et al. A robust face detection method based on skin color and edges
US7920725B2 (en) Apparatus, method, and program for discriminating subjects
US20060153429A1 (en) Method for controlling photographs of people
CN111191573A (en) Driver fatigue detection method based on blink rule recognition
EP1886255A1 (en) Using photographer identity to classify images
NO329897B1 (en) Procedure for faster face detection
CN109190456B (en) Multi-feature fusion overlook pedestrian detection method based on aggregated channel features and gray level co-occurrence matrix
Ng et al. Classifying photographic and photorealistic computer graphic images using natural image statistics
JP3962517B2 (en) Face detection method and apparatus, and computer-readable medium
Yusuf et al. Human face detection using skin color segmentation and watershed algorithm
Gangopadhyay et al. FACE DETECTION AND RECOGNITION USING HAAR CLASSIFIER AND LBP HISTOGRAM.
CN108446639A (en) Low-power consumption augmented reality equipment
Lin et al. Face detection based on skin color segmentation and SVM classification
CN108875572B (en) Pedestrian re-identification method based on background suppression
Curran et al. The use of neural networks in real-time face detection
AU739936B2 (en) Face detection in digital images
Mohamed et al. Face detection based on skin color in image by neural networks
Pham-Ngoc et al. Multi-face detection system in video sequence
AU728290B2 (en) Face detection in digital images
KR20160017152A (en) Gender Classification Based on Binary Haar Cascade
Ghouzali et al. A skin detection algorithm based on discrete cosine transform and generalized Gaussian density
Ma et al. A lip localization algorithm under variant light conditions

Legal Events

Date Code Title Description
FGA Letters patent sealed or granted (standard patent)