GB2548088A - Augmenting object features in images - Google Patents


Publication number
GB2548088A
GB2548088A (Application GB1603665.9A)
Authority
GB
United Kingdom
Prior art keywords
face
image
module
data
shape
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB1603665.9A
Other versions
GB2548088B (en)
GB201603665D0 (en)
Inventor
Drskova Tereza
Current Assignee
Holition Ltd
Original Assignee
Holition Ltd
Priority date
Filing date
Publication date
Application filed by Holition Ltd filed Critical Holition Ltd
Priority to GB1603665.9A priority Critical patent/GB2548088B/en
Publication of GB201603665D0 publication Critical patent/GB201603665D0/en
Priority to PCT/GB2017/050568 priority patent/WO2017149315A1/en
Priority to US16/082,172 priority patent/US11741639B2/en
Priority to EP17715963.9A priority patent/EP3423990A1/en
Publication of GB2548088A publication Critical patent/GB2548088A/en
Application granted granted Critical
Publication of GB2548088B publication Critical patent/GB2548088B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T19/006 Mixed reality
    • G06T11/60 Editing figures and text; Combining figures or text
    • G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G06V10/7553 Deformable models or variational models based on shape, e.g. active shape models [ASM]
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Architecture (AREA)
  • Image Analysis (AREA)

Abstract

A system and method of augmenting image data of a person's face by receiving a source image and at least one target image captured by a camera, where the images include at least a visible feature of a source face and a corresponding feature of a target face, respectively. A region of pixels in the source image associated with the visible feature is identified based on a face model fitted to the source face in the source image, and at least one characteristic is computed based on the pixel values of at least one of those pixels. A region of pixels in the or each target image associated with the corresponding feature is identified and its pixel values are modified based on the at least one computed characteristic. The target image with modified pixel values is then displayed. This allows a user to use a source image to identify makeup to be applied, acting as a virtual, augmented or mixed reality mirror.

Description

Augmenting Object Features in Images Field of the Invention [0001] This invention relates to an image processing system, and more particularly to techniques for augmenting face object features in images.
Background of the Invention [0002] Choosing a new cosmetic product is often a tedious and time consuming process, and is only usually possible in a retail environment where samples are made available. An important consideration for a customer trying on any new product is seeing how it looks as they move around, taking momentary opportunity to view themselves wearing the cosmetic from particular angles or with particular expressions.
[0003] Utilising the mass availability of handheld, or other, computing devices to make real-time virtual try-on of new cosmetics possible in any environment has the potential to radically change the way the customer finds the perfect product. Any such system faces three main challenges: first, locating and tracking the features of a subject in a live captured image data stream; second, augmenting a virtual cosmetic product accurately and realistically in place over the live images; and third, doing all of this in real-time, particularly on devices having limited hardware capabilities.
[0004] Conventional virtual makeup/makeover systems, for example as discussed in EP1194898, US7079158 and EP1196893, provide various interfaces that allow users/customers to select and apply virtual makeup products to an image of a face.
[0005] What is desired are real-time augmentation systems that provide enhanced functionality and a better user experience.
Statements of the Invention [0006] Aspects of the present invention are set out in the accompanying claims.
[0007] In one aspect, the present invention provides a computer-implemented method of augmenting image data, the method comprising receiving data of a source image and at least one target image captured by a camera, the source image including a visible feature of an object, and each target image including a corresponding visible feature of a corresponding object; identifying a region of pixels in the source image associated with the visible feature; calculating at least one characteristic of the visible feature based on pixel values of at least one of the pixels in the identified region of the source image; identifying a region of pixels in the or each target image associated with the corresponding visible feature; and modifying pixel values of the identified region in the or each target image based on the calculated at least one characteristic of the corresponding visible feature in the source image.
[0008] In another aspect, there is provided a computer-implemented method of augmenting image data, the method comprising modifying pixel values of one or more identified regions of a face in a target image based on the augmentation characteristics derived from corresponding identified regions of a face in a source image.
[0009] In further aspects, the present invention provides a system comprising means for performing the above methods. In yet other aspects, there is provided a computer program arranged to carry out the above methods when executed by a programmable device.
Brief Description of the Drawings [0010] There now follows, by way of example only, a detailed description of embodiments of the present invention, with reference to the figures identified below.
[0011] Figure 1 is a block diagram showing the main components of an augmented reality system according to an embodiment of the invention.
[0012] Figure 2 is a block diagram showing the main components of the shape model training module shown in Figure 1 and the components of a trained shape model according to an embodiment of the invention.
[0013] Figure 3 is a schematic illustration of an exemplary data structure of a trained model including a global shape and a plurality of sub-shapes.
[0014] Figure 4 is a block diagram showing the main components of an exemplary training module and colourisation module in the system of Figure 1.
[0015] Figure 5, which comprises Figures 5A to 5E, schematically illustrates examples of data processed and generated by the texture model training module during the training process. Figure 5F schematically illustrates an example of the trained model fitted to a detected face in a captured image.
[0016] Figure 6 is a flow diagram illustrating exemplary processing steps performed by the texture model training module of Figures 1 and 4.
[0017] Figure 7, which comprises Figures 7A to 7D, schematically illustrates further examples of data processed and generated by the texture model training module during the training process.
[0018] Figure 8 is a flow diagram illustrating exemplary processing steps performed by the shape model training module of Figures 1 and 2.
[0019] Figure 9 shows an example of user-defined feature points defining a plurality of labelled feature points, displayed over a training image.
[0020] Figure 10, which comprises Figures 10A to 10C, schematically illustrates examples of global and sub-shape models generated by the training module according to an embodiment.
[0021] Figure 11 is a flow diagram illustrating the processing steps performed by the shape model training module to compute statistics based on the object detector output and user-defined shape, according to an embodiment.
[0022] Figure 12, which comprises Figures 12A to 12E, show further examples of the processing steps performed by the shape model training module of Figure 2.
[0023] Figure 13, which comprises Figures 13A and 13B, is a flow diagram illustrating the main processing steps performed by the shape model training module of Figure 4 to determine cascading regression coefficient matrices according to an embodiment of the invention.
[0024] Figure 14 is a flow diagram illustrating the sub-processing steps performed by the training module to determine offset values and feature point descriptors based on a selected training image.
[0025] Figure 15 is a flow diagram illustrating the main processing steps performed by the system of Figure 1 to track and augment a face in a target image based on computed characteristics of visible features of a face in a source image, according to an embodiment.
[0026] Figure 16 is a flow diagram illustrating the processing steps of an initialization process performed by the tracking module.
[0027] Figure 17 is a flow diagram illustrating the processing steps performed by the tracking module to refine a face shape according to an embodiment.
[0028] Figure 18 is a flow diagram illustrating processing steps performed by the visible feature detector shown in Figure 1 to compute characteristics of visible features, according to an exemplary embodiment.
[0029] Figure 19 is a flow diagram illustrating processing steps performed by the visible feature detector to compute augmentation parameters of applied makeup products in the source face.
[0030] Figure 20, which comprises Figures 20A to 20E, shows an exemplary sequence of display screens during the augmentation process of Figure 15.
[0031] Figure 21 is a flow diagram illustrating the main processing steps performed by the colourisation module of Figures 1 and 4 to apply colourisation to image data according to an embodiment.
[0032] Figure 22 shows examples of data that is processed by, and processing steps performed by the colourisation module during the colourisation process of Figure 21.
[0033] Figure 23 schematically illustrates an exemplary sequence of data that may be processed by, and processing steps performed by, the transform module to determine transformation of mesh data.
[0034] Figure 24, which comprises Figures 24A to 24D, shows schematic block flow diagrams illustrating the main components and processing flows for exemplary shader modules in the colourisation module.
[0035] Figure 25 schematically illustrates an example process for generating a blurred version of the captured image data.
[0036] Figure 26 is a diagram of an example of a computer system on which one or more of the functions of the embodiment may be implemented.
Detailed Description of Embodiments of the Invention
Overview [0037] Referring to Figure 1, an augmented reality system 1 is schematically illustrated. In this embodiment, the augmented reality system 1 replicates the visible appearance of facial features that have respective applied makeup products, as detected from a source face of one person, to image data of a target face of another person in each of a sequence of subsequent target image frames. The augmented reality system 1 comprises a source image processing module 3 that automatically processes image data captured by a camera 5 to detect characteristics of one or more visible features of a source face in the source scene, and a colourisation module 7 that modifies image data of subsequently captured image frames containing a target face in a target scene, based on colourisation parameters 9 corresponding to the detected characteristics of visible features of the source face. The colourisation parameters 9 may be stored in a database 9a. The colourisation parameters database 9a may be a database of beauty product details, each product or group of products associated with a respective set of colourisation parameters 9. Alternatively, the database 9a may include colourisation parameters 9 derived from product details retrieved from such a product database.
[0038] The source scene may contain a person facing the camera or a physical photograph of the source face, captured by the camera 5 as the source image data. The target face is detected and located in each subsequent image frame by a tracking module 11 that automatically processes the subsequent image data captured by the camera 5. The augmented image data is then output to the display 13. Alternatively or additionally, the face locator 15b may be configured to output the original captured target image frames to the display 13, with the colourisation module 7 configured to output the regions of modified pixels to the display 13, for display over the captured pixels of respective regions in the captured image frame. Preferably the operations are conducted in real time, or near real time.
[0039] The source image processing module 3 includes a face locator 15a that automatically determines the location of a source face in the captured source image data, for example using a trained face shape model 15 and texture model 16 stored in a model database 21, and a refinement module 17a to perform processing to refine an initial approximation of the location of the detected source face in the captured source image. The source image processing module 3 also includes a visible feature detector 19 that automatically identifies regions of pixels in the source image associated with one or more visible features of the source face, such as predefined cheek, eye and lip regions of the source face that have applied makeup products. Typical makeup products comprise foundation, blush, eyeshadow, eyeliner, mascara, lipstick, lip gloss, lip liner, or the like.
[0040] The visible feature detector 19 computes characteristics of the visible features based on pixel values in the respective identified regions of the source image, based for example on the trained face texture model 16 that defines feature points of a trained mesh representation of the face. The visible feature detector 19 also determines corresponding colourisation parameters 9 for each detected visible feature. The colourisation parameters 9 may be retrieved from a database of virtual makeup products defining respective one or more colourisation parameters 9 for each of a plurality of virtual makeup products. For example, the visible feature detector 19 may be configured to find the closest matching virtual makeup product for each category of makeup on the respective characteristics of the detected applied makeup products in the source image. The colourisation parameters 9 may define property data values that are passed to the colourisation module 7 to control augmentation of the pixel values of the captured image data to apply a representation of an associated virtual makeup product. Alternatively or additionally, the colourisation parameters 9 may include one or more texture files defining image data of respective associated image augmentation that can be transformed to fit respective regions of the detected face in the target image, and applied by the colourisation module 7 to augment the captured target image data.
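The closest-match lookup described above can be sketched as a nearest-neighbour search over a product database in colour space. This is a minimal illustration only: the product names, fields and RGB values below are invented for the example, and a real system might match on richer characteristics than a single mean colour.

```python
import math

# Hypothetical product database: each virtual product carries colourisation
# parameters keyed by category; entries here are illustrative stand-ins.
PRODUCTS = [
    {"name": "lipstick_ruby",  "category": "lipstick", "rgb": (180, 30, 50)},
    {"name": "lipstick_coral", "category": "lipstick", "rgb": (230, 110, 90)},
    {"name": "blush_rose",     "category": "blusher",  "rgb": (220, 140, 150)},
]

def closest_product(category, detected_rgb):
    """Return the product in the given category whose stored colour is
    nearest (Euclidean distance in RGB) to the detected characteristic."""
    candidates = [p for p in PRODUCTS if p["category"] == category]
    return min(candidates, key=lambda p: math.dist(p["rgb"], detected_rgb))

print(closest_product("lipstick", (200, 60, 60))["name"])  # lipstick_ruby
```

A production matcher would likely operate in a perceptual colour space (e.g. CIELAB) rather than raw RGB, but the selection logic is the same.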
[0041] The tracking module 11 also includes a face locator 15b that automatically detects and determines the location of the person’s face in the captured target image, for example based on the trained face models 15, 16, and a refinement module 17b to perform processing to refine an initial approximation of the location of the detected target face in the captured target image frame. The face locator 15b passes the captured image frame data together with the determined location of the target face in that frame to the colourisation module 7. The colourisation module 7 includes one or more shader modules 7a to modify the pixel values of each identified region based on the colourisation parameters 9 retrieved from the database 9a by the visible feature detector 19. For example, the colourisation module 7 may include a plurality of shader modules 7a each configured to determine and apply image colourisation to respective identified regions of each target image frame, to replicate the appearance of applying a virtual foundation, blusher, eyeshadow or lipstick makeup product to the target face, based on respective received sets of colourisation parameters 9 for a specific virtual makeup product determined to have characteristics matching a corresponding visible makeup feature in the source image.
[0042] The augmented reality system 1 may further include a shape model training module 23 for processing training images in a training image database 25 to generate and store trained shape models 15 for use during real-time processing of input image data from the camera 5 by the face locator 15a. A texture model training module 27 may also be provided to generate and store trained texture models 16, for example based on a representative image in the training image database 25 as will be described below. The processing of image data by the shape model training module 23 and texture model training module 27 may be referred to as “offline” pre-processing, as the training processes are typically carried out in advance of the “real-time” image processing by the face locator 15a.
[0043] The system 1 may be implemented by any suitable computing device of a type that is known per se, such as a desktop computer, laptop computer, a tablet computer, a smartphone such as an iOS™, Blackberry™ or Android™ based smartphone, a ‘feature’ phone, a personal digital assistant (PDA), or any processor-powered device with suitable user input, camera and display means. Additionally or alternatively, the display 11 can include an external computing device, such as a mobile phone, tablet PC, laptop, etc. in communication with a host device for example via a data network (not shown), for example a terrestrial cellular network such as a 2G, 3G or 4G network, a private or public wireless network such as a WiFi™-based network and/or a mobile satellite network or the Internet.
Shape Model Training Module [0044] An exemplary shape model training module 23 in an embodiment of the augmented reality system 1 will now be described in more detail with reference to Figure 2, which shows the main elements of the shape model training module 23 as well as the data elements processed and generated by the shape model training module 23 for the trained shape models 15. As shown, the shape model training module 23 includes a shape model module 23a that retrieves training images 25a and corresponding user-defined feature points 25b from the training image database 25. The training image database 25 may store a plurality of training images 25a, each comprising the entire face of a respective person, including one or more facial features such as a mouth, eye or eyes, eyebrows, nose, chin, etc. For example, the training images 25a may include subject faces and facial features in different orientations and variations, such as front-on, slightly to one side, closed, pressed, open slightly, open wide, etc. The shape model training module 23 may include a face detector module 23b to detect and determine the location of a face in each retrieved training image 25a. The shape model module 23a generates and stores a global shape model 15a and a plurality of sub-shape models 15b for a trained shape model 15 in the model database 21, as will be described in more detail below. It will be appreciated that a plurality of trained shape models may be generated and stored in the model database 21, for example associated with respective different types of objects.
[0045] Figure 3 is a schematic illustration of an exemplary data structure of a trained shape model 15, including a global shape 15a and a plurality of sub-shapes 15b. As shown, the exemplary data structure of the shape model 15 is an array of (x,y) coordinates, each coordinate associated with a respective feature point of the global shape 15a, corresponding to respective labelled feature point 25b in the training data. Each sub-shape model 15b may be associated with a respective subset of the (x,y) coordinates, each subset thereby defining a plurality of feature points 25b of the respective sub-shape. The subsets of feature points 25b for each sub-shape may overlap.
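The shape-model layout just described, a flat array of (x, y) feature-point coordinates with each sub-shape selecting a possibly overlapping subset of indices, can be sketched as follows. The coordinate values, sub-shape names and index subsets are illustrative assumptions, not the patent's actual model.

```python
import numpy as np

# Global shape: one (x, y) coordinate per labelled feature point.
# The points and groupings below are invented for illustration.
global_shape = np.array([[120, 80], [140, 78], [160, 82],   # e.g. brow points
                         [130, 120], [150, 118],            # e.g. eye points
                         [135, 170], [145, 171]], float)    # e.g. lip points

# Each sub-shape model is a subset of feature-point indices; subsets
# are allowed to overlap, as noted in the text.
sub_shapes = {
    "eyes": [3, 4],
    "lips": [5, 6],
    "brow_and_eyes": [0, 1, 2, 3, 4],
}

def sub_shape_points(name):
    """Extract the coordinate subset for one sub-shape model."""
    return global_shape[sub_shapes[name]]

print(sub_shape_points("lips"))  # the two lip feature points
```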
[0046] The shape model training module 23 may include an appearance sub-shape module 23c that can be used to generate sub-shape appearance models 15c for one or more of the sub-shape models 15b, for example based on pre-defined sub-shape detailed textures. The sub-shape detail textures may be pre-prepared grey scale textures, for example for the lip, cheek and eyes of a subject face. Different textures may be used to implement different appearance finishes, for example glossy, matt, shiny etc. The process of generating a sub-shape appearance model structure can involve warping (through piecewise affine transformations) an image representing the sub-shape detailed texture to the mean shape specified by the corresponding sub-shape model 15b. A combined sub-model module 23d can be provided to generate a sub-shape combined model 15d from a sub-shape model 15b and a corresponding sub-shape appearance model 15c.
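The piecewise affine warp mentioned above can be sketched for a single triangle: solve for the 2×3 affine matrix that carries the triangle's source vertices to its destination vertices, then apply it to interior points (a full warp repeats this for every triangle of the mesh). The coordinates below are illustrative, not taken from any model.

```python
import numpy as np

# One source triangle and its destination under the warp (illustrative).
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
dst = np.array([[2.0, 2.0], [4.0, 2.0], [2.0, 5.0]])

# Homogeneous source coordinates, one row (x, y, 1) per vertex; the three
# vertex correspondences determine the affine map exactly.
S = np.hstack([src, np.ones((3, 1))])
A = np.linalg.solve(S, dst).T          # 2x3 affine matrix

def warp(p):
    """Apply the triangle's affine map to a point (x, y)."""
    return A @ np.array([p[0], p[1], 1.0])

print(warp((0.5, 0.5)))  # maps to [3.  3.5] under this example's transform
```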
[0047] In this embodiment, the shape model training module 23 also includes a statistics computation module 23e that computes and stores mean and standard deviation statistics based on the plurality of global shape models 15a of the trained models 15 generated by the shape model module 23a and the output of the face detector module 23b. The computed statistics can advantageously provide for more robust, accurate and efficient initial positioning of an object that is to be located within the bounding box output by the face detector module 23b.
[0048] A regression computation module 23f of the shape model training module 23 generates a global shape regression coefficient matrix 15e based on the global shape 15a generated by the shape model module 23a, and at least one sub-shape regression coefficient matrix 15f for each sub-shape 15b generated by the shape model module 23a. As is known in the art, the regression coefficient matrices 15e, 15f define an approximation of a trained function that can be applied, for example during a tracking phase, to bring the features of a candidate face shape from respective estimated locations to determined “real” positions in an input image. The generation of regression coefficient matrices 15e, 15f in the training process therefore defines respective trained functions which relate the texture around an estimated shape and the displacement between their estimated positions and the final position where the shape features are truly located. The regression computation module 23f can be configured to compute the respective regression coefficient matrices 15e, 15f based on any known regression analysis technique, such as principal component regression (PCR), linear regression, least squares, etc. The plurality of regression coefficient matrices 15e, 15f form parts of the trained shape model 15 stored in the model database 21.
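One regression stage of the kind described above can be sketched as a least-squares fit (one of the techniques named in the text): learn a coefficient matrix that maps texture descriptors sampled around an estimated shape to the displacements toward the true feature positions. The dimensions and synthetic data below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_coords = 200, 16, 10

X = rng.normal(size=(n_samples, n_features))        # texture descriptors
R_true = rng.normal(size=(n_features, n_coords))    # ground-truth mapping
Y = X @ R_true                                      # displacements to learn

# Closed-form least-squares solve of min_R ||X R - Y||^2: this R plays the
# role of one regression coefficient matrix in a cascade.
R, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Applying the trained function to a descriptor predicts its displacement.
assert np.allclose(X @ R, Y, atol=1e-8)
```

In a cascaded setup this fit is repeated per stage, each stage regressing the residual displacement left by the previous one.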
Texture Model Training Module [0049] An exemplary texture model training module 27 in an embodiment of the augmented reality system 1 will now be described in more detail with reference to Figure 4, which shows the main elements of the texture model module 27 as well as the data elements that are processed and generated by the texture model module 27 for the trained texture models 16. Reference is also made to Figures 5A to 5E schematically illustrating examples of data that are processed and generated by the texture model training module 27 during the training process.
[0050] As shown in Figure 4, the texture model training module 27 may include a mesh generator 27a that retrieves at least one reference image 25c from the training image database 25, for example as shown in Figure 5A, and generates data defining a plurality of polygonal regions based on the retrieved reference image 25c, collectively referred to as a normalised mesh 16’. Each region is defined by at least three labelled feature points and represents a polygonal face of the two-dimensional normalised mesh 16’. It is appreciated that the normalised mesh may instead define three-dimensional polygonal regions. Preferably, the shape model training module 23 and the texture model training module 27 use the same set of labelled feature points as the face locator 15a, so that vertex and texture coordinate data can be shared across a common reference plane. The mesh generator 27a may be configured to receive data defining the location of labelled feature points 25b in the, or each, reference image 25c as determined by the face locator 15a. Alternatively, the texture model training module 27 may include a face detector module (not shown) to detect and determine the location of the face in a reference image 25c. As another alternative, the mesh generator 27a may prompt a user to input the location of each feature point for the, or each, reference image 25c. Figure 5B schematically illustrates a plurality of defined feature points overlaid on a representation of a reference image 25c. Preferably, the reference image is a symmetrical reference face, in order to optimize texture space across all areas of the face where virtual makeup may be applied.
[0051] The texture model training module 27 may be configured to subsequently perform triangulation to generate a mesh of triangular regions based on the labelled feature points. Various triangulation techniques are known, such as Delaunay triangulation, and need not be described further. Figure 5C schematically illustrates an example of a resulting normalised mesh 16’ generated from the reference image shown in Figure 5A and the plurality of labelled feature points shown in Figure 5B. Optionally, the mesh generator 27a may further prompt the user for input to optimize the normalised mesh 16’, for example by reducing or increasing the number of triangles for a particular region of the reference image. Figure 5D schematically illustrates an example of a resulting optimised version 16a of the normalised mesh 16’ shown in Figure 5C. Alternatively, the mesh generator 27a may be configured to facilitate manual triangulation from the labelled feature points to generate an optimal normalised mesh 16a. It will be appreciated that in the context of the present embodiment, an optimal normalised mesh 16a consists of triangles that stretch in their optimum directions causing the least number of artefacts, resulting in a mesh that defines an ideal number of vertices and polygonal faces to be used for the application of virtual makeup as described below. Figure 5E schematically illustrates a detailed example of a normalised mesh 16a of a trained texture model 16, including eighty labelled feature points 25b that are numbered in sequence.
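The Delaunay triangulation step can be sketched with SciPy's implementation; the four points below stand in for the face's labelled feature points, and the resulting simplex list corresponds to the mesh's polygon faces expressed as vertex-index triples.

```python
import numpy as np
from scipy.spatial import Delaunay

# Placeholder feature points: the corners of a unit square.
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])

tri = Delaunay(points)

# tri.simplices lists each triangular face as three indices into `points`,
# matching the vertex-index mesh representation described in the text.
print(len(tri.simplices))  # a unit square triangulates into 2 triangles
```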
[0052] The normalised mesh 16’ may be stored as a data structure including a first data array consisting of an indexed listing of the labelled feature points defined by x and y coordinates relative to a common two dimensional reference plane, and a second data array consisting of a listing of polygon faces defined by indices of three or more labelled feature points in the first data array. For example, the first data array may be an indexed listing of m vertices: [x0, y0, x1, y1, …, xm, ym], each index corresponding to a different labelled feature point. The second data array may be a listing of n exemplary polygon faces: [1/2/20, 1/21/5, …, 92/85/86], each polygon face defined by indices of three vertices in the first data array. The normalised mesh 16’ data can be stored in the model database 21 of the system 1.
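The two-array layout just described can be sketched directly: a flat vertex array of interleaved coordinates and a face list of vertex-index triples. The coordinates and faces below are illustrative placeholders, not the patent's eighty-point mesh.

```python
# First data array: interleaved [x0, y0, x1, y1, ...] for m = 4 vertices.
vertices = [0.0, 0.0,   1.0, 0.0,   1.0, 1.0,   0.0, 1.0]

# Second data array: n = 2 polygon faces, each a triple of vertex indices.
faces = [(0, 1, 2), (0, 2, 3)]

def face_coords(face_index):
    """Resolve one polygon face to its (x, y) vertex coordinates."""
    return [(vertices[2 * i], vertices[2 * i + 1]) for i in faces[face_index]]

print(face_coords(0))  # [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
```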
[0053] The texture model training module 27 also includes an optimisation module 27b that generates a plurality of optimised texture models 16, based on the normalised mesh 16’ generated by the mesh generator 27a and data defining one or more user-defined masks 25d, retrieved from the training image database 25 for example. Each texture model 16 may be associated with one or more virtual make-up products having a set of colourisation parameters 9, the texture model 16 defining one or more regions of captured image data corresponding to predefined areas of a person’s face that are to be augmented with the associated colourisation parameters 9. Each texture model 16 generated by the optimisation module 27b includes data defining the associated mask 16a, such as a copy of or pointer to the image data defining the respective user-defined mask 16a, and a mesh subset 16b comprising a subset of the polygonal regions of the normalised mesh 16’ that is determined based on the associated mask 16a, as will be described in more detail below. In this way, the optimisation module 27b can be used to take a given makeup mask and output only the necessary polygonal faces that are to be used by the colourisation module 7 to render the respective portions of the augmented image data.
[0054] Many masks can be compounded together to produce a particular desired virtual look or appearance, which consists of multiple layers of virtually applied makeup, including for example one or more of lipstick, blusher, eyeshadow and foundation, in multiple application styles. The masks 16a may include black and white pixel data. Preferably, the masks 16a are grey-scale image data, for example including black pixels defining portions of a corresponding texture data file 20 that are not to be included in the colourisation process, white pixels defining portions of the corresponding texture data file 20 that are to be included at 100% intensity, and grey pixels defining portions of the corresponding texture data file 20 that are to be included at an intensity defined by the associated grey value. The white and grey pixels are referred to as the masked data regions. In this way, different masks 16a can be provided for various blurring effects.
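The grey-value weighting described above can be expressed per pixel as in the following sketch, assuming an 8-bit grey-scale mask (function names are illustrative only):

```python
def mask_weight(grey):
    """Map an 8-bit grey-scale mask value to a blend intensity:
    0 (black) excludes the pixel, 255 (white) includes it at 100%,
    and intermediate grey values include it proportionally."""
    return grey / 255.0

def blend_pixel(base, texture, grey):
    """Blend one texture channel value over the base image channel
    at the intensity given by the mask's grey value."""
    w = mask_weight(grey)
    return round(base * (1.0 - w) + texture * w)
```

For example, a mid-grey mask pixel yields roughly a 50/50 blend of the base image and the texture, which is how soft mask edges produce the blurring effects mentioned above.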
[0055] Each texture model 16 is associated with texture data 33 that may also be stored in the model database 21, such as texture image data that is representative of the appearance of an associated applied virtual make-up product. The texture image data 33 may have the same dimensions as the captured image data received from the camera. Alternatively, it is appreciated that the normalised mesh 16’ may be defined at a different scale from the texture image data 20, and an additional processing step can be used to compute the necessary transformation. Where the texture image data 33 has different dimensions from the captured image data, such as defining details of a portion of the overall face, metadata can be provided to identify the location of the texture portion relative to the pixel location of a captured image and/or reference image 25c. The texture data 33 may also include data identifying one or more associated material properties. Alternatively or additionally, the texture data 33 may define a mathematical model that can be used to generate an array of augmentation values to be applied by the colourisation module 7 to the captured image data.
Colourisation Module [0056] A colourisation module 7 in an embodiment of the augmented reality system 1 will now be described in more detail, again with reference to Figure 4, which also shows the main elements of the colourisation module 7 as well as the data elements that are processed by the colourisation module 7 to generate augmented image data that is output to the display 11. As shown, the colourisation module 7 includes a plurality of shader modules 7a that determine and apply image colourisation to selected regions of captured image data and/or texture data files 33. For example, four custom virtual makeup shader modules 7a can be implemented by the colourisation module 7, each having a respective predefined identifier, and used to determine and apply image colourisation to represent virtual application of lipstick (which may include lip gloss or lip liner), blusher, eyeshadow (which may include eyeliner and mascara) and foundation to the captured image data. The output of a custom makeup shader module 7a is sent to a renderer 7b that augments the underlying user's face in the captured image from the camera 5 with the specified virtual makeup. As will be described in more detail below, each shader module 7a can be based on predefined sets of sub-shader modules to be applied in sequence, for example based on selected sets of colourisation parameters 9.
[0057] As shown in Figure 4, predefined sets of colourisation parameters 9 are retrieved from the colourisation parameters database 9a by the visible feature detector 19 of the source image processing module 3, and passed to the colourisation module 7 for processing. Each set of colourisation parameters 9 may include one or more predefined property values 9-1, predefined texture values 9-2 such as respective identifiers of a stored texture model 16 and a stored texture data file 33, and a predefined shader type 9-3 such as an identifier of one or more shader modules 7a implemented by the colourisation module 7 that are to be used to augment associated regions of captured image data based on the property values 9-1 and/or texture values 9-2.
[0058] The colourisation module 7 may include a transform module 7c that receives data defining the location of labelled feature points in the common reference plane, determined by the face locator 15b of the tracking module 11 for a captured image. The determined coordinates from the camera image data define the positions of the polygonal regions of a normalised mesh 16’ that matches the detected object, the user’s face in this embodiment. Figure 5F schematically illustrates a plurality of defined feature points of an instance of the normalised mesh 16’ warped to fit (or match) a detected face in a captured image, overlaid on a representation of the captured image 18. Preferably, the object face model defines a symmetrical reference face, in order to optimise the processing of mirrored areas of the left and right sides of a face, where real applied makeup may be detected and virtual makeup may be applied. The transform module 7c determines a mapping from the vertices of a selected region of a trained mesh 16’ to vertices of the corresponding tracked labelled feature points. The transform module 7c uses the determined mapping to transform the associated regions of mask data 16a and texture data 33 retrieved from the model database 21 for the particular set of colourisation parameters 9, into respective “warped” versions that can be processed by the shader modules 7a.
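One way such a per-polygon mapping could be realised is by solving, for each mesh triangle, the affine transform carrying the trained vertex positions onto the tracked feature point locations. The following NumPy sketch is illustrative only and does not represent the system's actual warping implementation:

```python
import numpy as np

def affine_from_triangles(src, dst):
    """Solve the 2x3 affine matrix M mapping the three src vertices onto
    the three dst vertices, so that M @ [x, y, 1] warps any point in the
    triangle's frame."""
    A = np.array([[x, y, 1.0] for x, y in src])   # 3x3 homogeneous system
    B = np.array(dst, float)                      # 3x2 target coordinates
    return np.linalg.solve(A, B).T                # 2x3 affine matrix

def apply_affine(M, point):
    """Warp a single (x, y) point with the affine matrix M."""
    x, y = point
    return M @ np.array([x, y, 1.0])
```

Applied per mesh triangle, such a mapping would carry mask and texture pixels from the normalised reference frame into the tracked image frame.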
[0059] After all of the regions and colourisation parameters are processed by the transform module 7c and the respective defined shader module(s) 7a, the renderer 7b overlays the respective mesh subsets 16b of each texture model 16 according to the common reference plane, and in conjunction with an alpha blended shader sub-module (not shown), performs an alpha blend of the respective layers of associated regions of warped texture data. The blended result is an optimised view of what will be augmented on the user's face, taking into account characteristics of visible makeup product(s) already present on corresponding regions of the face. The final result is obtained by the renderer 7b applying the blended result back onto the user's face represented by the captured image data from the camera 5, and output to the display 11.
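The layered alpha blend performed by the renderer can be illustrated per pixel as follows; the colours and opacities here are hypothetical, and the real system blends whole warped texture layers under the shader sub-module rather than scalar tuples:

```python
def alpha_over(dst, src, alpha):
    """Blend one texture layer over the current result using the standard
    'over' operator, per RGB channel."""
    return tuple(round(s * alpha + d * (1.0 - alpha)) for s, d in zip(src, dst))

# Hypothetical layers: (RGB colour, opacity), e.g. foundation then lipstick.
layers = [((230, 200, 180), 0.3), ((180, 40, 60), 0.6)]
pixel = (150, 120, 110)          # captured image pixel
for colour, alpha in layers:
    pixel = alpha_over(pixel, colour, alpha)
print(pixel)
```

Blending the layers in order means later layers (e.g. lipstick) correctly dominate earlier ones (e.g. foundation) where they overlap.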
[0060] In this way, the colourisation module 7 uses the image data coordinates from the reference face, referenced by the mesh subsets 16b, as texture coordinates to the texture data files 33, for each texture model 16 associated with a respective set of colourisation parameters 9 for a selected virtual makeup product, transformed according to the tracked feature point locations, and rendered over the captured image data, resulting in the visual effect of morphing all of the selected virtual makeup products to the user's face in a real-time augmented reality display. It will be appreciated that the transform module 7c, shader modules 7a and renderer 7b may include calls to a set of predefined functions provided by a Graphics Processing Unit (GPU) of the system 1. Advantageously, the present embodiment provides for more efficient GPU usage, as only the required portions of the respective texture data files and captured image data are transmitted to the GPU for processing.
Texture Model Training Process [0061] A brief description has been given above of the components forming part of the texture model training module 27 of one embodiment. A more detailed description of the operation of these components in this embodiment will now be given with reference to the flow diagram of Figure 6, for an example computer-implemented training process using the texture model training module 27. Reference is also made to Figures 7A to 7D, schematically illustrating examples of data that is processed and generated by the texture model training module 27 during the training process.
[0062] As shown in Figure 6, the training process begins at step S6-1 where the texture model training module 27 retrieves a normalised object mesh 16’ from the model database 21. At step S6-3, the texture model training module 27 retrieves a first one of the plurality of user-defined masks 25d from the image database 25. Figure 7A shows an example of a mask 25d defining a lip region of the reference image 25c shown in Figure 3A. At step S6-5, the texture model training module 27 overlays the retrieved mask 25d on the retrieved normalised object mesh 16’ to determine a subset of regions of the normalised mesh 16’ that include at least a portion of the masked data regions. Figure 7B schematically illustrates an example of the masked regions shown in Figure 7A, overlaid on the normalised mesh 16’ shown in Figure 3D. Figure 7C schematically illustrates the subset of mesh regions as determined by the texture model training module 27. At step S6-7, the determined subset of mesh regions is stored as a mesh subset 16b in a texture model 16, along with a copy of the associated mask 16a, in the model database 21. At step S6-9, the texture model training module 27 determines if there is another user-defined mask 25d in the image database 25 to be processed, and if so, processing returns to step S6-3 where the next mask 25d is retrieved for processing as described above, until all of the user-defined masks 25d have been processed in this way. Figure 7D schematically illustrates an exemplary final set of masks 16a as determined by the texture model training module 27 to define lipstick, foundation, blusher and eyeshadow regions of a captured image.
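The overlay test in the process above can be sketched as follows. This simplified version keeps a face if any of its vertices lies on a masked pixel, whereas the described system tests whether any portion of the masked data regions overlaps the polygon; the data and names are hypothetical:

```python
def mesh_subset(faces, vertices, mask):
    """Return the indices of polygon faces having at least one vertex on a
    masked (non-zero) pixel. A vertex-only simplification of the overlay
    test, for illustration."""
    kept = []
    for index, face in enumerate(faces):
        if any(mask[vertices[v][1]][vertices[v][0]] > 0 for v in face):
            kept.append(index)
    return kept

# Toy 4x4 mask with a single masked pixel at the top-left corner.
mask = [[255, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
vertices = [(0, 0), (3, 0), (0, 3), (3, 3)]
faces = [(0, 1, 2), (1, 3, 2)]
print(mesh_subset(faces, vertices, mask))  # only the first face touches the mask
```

The kept indices correspond to the mesh subset 16b: only these faces need to be warped and shaded at runtime.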
Shape Model Training Process [0063] A brief description has been given above of the components forming part of the shape model training module 23 of an exemplary embodiment. A more detailed description of the operation of these components will now be given with reference to the flow diagram of Figure 8, for an example computer-implemented training process using the shape model training module 23. Reference is also made to Figure 9 schematically illustrating examples of user-defined shapes defined by labelled feature points, and to Figures 10A to 10C schematically illustrating examples of trained global and sub-shape models.
[0064] As shown in Figure 8, the training process may begin at step S8-1 where the shape model training module 23 processes user input to define a plurality of labelled feature points 25b in the training images 25a of the training image database 25. For example, a user interface may be provided to prompt the user to sequentially define a set of feature points 25b for a training image 23, each labelled feature point 25b associated with a respective location in the corresponding training image 23 and having a corresponding unique identifier. Figure 9 shows an example of a resulting user-defined shape 25a displayed over an associated training image 23, as defined by the plurality of labelled feature points 25b. The data may be defined as a set or array of x and y positions in the image, defining respectively the x-axis and y-axis position in the image of each user-defined feature point 25b in the training image 23. The plurality of feature points 25b may be grouped into subsets of feature locations, each subset corresponding to respective sub-aspects of the overall object. In the present example, the overall object is a subject’s face and the sub-aspects may be i) the lips, mouth and chin, and ii) the eyes, eyebrows, nose and face outline.
[0065] At step S8-3, the shape model module 23a of the shape model training module 23 determines a global shape model 27 for the trained face model 16, based on the training images 25a and associated feature points 25b retrieved from the training image database 25. Any known technique may be used to generate the global shape model 27. For example, in this embodiment, the shape model module 23a uses the Active Shape Modelling (ASM) technique, as mentioned above. Figure 10A shows a schematic representation of an example global shape model 27 generated by the shape model module 23a using the ASM technique. In the illustrated example, the global shape model 27 of a subject’s face includes three modes of variation as determined by the shape model module 23a from the training data. Each mode describes deviations from the same mean shape 27a of the global shape model, illustrated in the middle column, the deviations differing for each respective mode. For example, the illustrated first mode represents deviations resulting from the subject’s face turning left and right, the second mode represents deviations of the lips and mouth in various open and closed positions, while the third mode represents deviations of the subject’s face tilting vertically up and down.
[0066] It will be appreciated that the data structure of the global shape model 27 will depend on the particular shape modelling technique that is implemented by the shape model module 23a. For example, the ASM technique processes the distribution of user-defined feature locations in the plurality of training images 25a in order to decompose the data into a set of eigenvectors and eigenvalues, and a corresponding set of parameters/weights between predefined limits, to define a deformable global shape model for a subject’s face. The precise steps of the ASM technique are known per se, and need not be described further.
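The eigen-decomposition at the core of the ASM step can be sketched with a PCA over aligned training shapes; this is a generic illustration of the technique, not the module's actual code, and the shape data is hypothetical:

```python
import numpy as np

def train_shape_model(shapes, k):
    """Decompose aligned training shapes into a mean shape plus the top-k
    eigenvector modes of variation, via SVD of the centred shape matrix."""
    X = np.asarray(shapes, float)        # each row: [x0, y0, x1, y1, ...]
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]                  # mean shape and k modes of variation

def synthesize(mean, modes, weights):
    """Deform the mean shape along the modes: x = mean + b . modes,
    where the weights b are constrained between predefined limits."""
    return mean + np.asarray(weights) @ modes
```

Setting all weights to zero reproduces the mean shape, while varying a single weight sweeps one mode of variation such as the head turning left and right.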
[0067] At step S8-5, the shape model module 23a determines one or more sub-shape models 15b, again using the same shape modelling technique used to generate the global shape model 27. In this step, the ASM technique for example is applied to the respective subsets of feature locations, to generate respective sub-shape models 15b corresponding to respective sub-aspects of the overall face. Figure 10B shows an example of a first sub-shape model 15b-l corresponding to the lips, mouth and chin of a subject’s face. Figure 10C shows an example of a second sub-shape model 15b-2 corresponding to the eyes, eyebrows, nose and face outline of a subject’s face. It will be appreciated that the number of modes of variation for a global and sub-shape model may vary depending on the complexity of the associated training data.
[0068] Returning to Figure 8, at step S8-7, the appearance sub-shape module 23c determines a sub-shape appearance model 15c for one or more of the sub-shape models 15b generated by the shape model module 23a. In this example embodiment, an appearance model 15c is generated for the first sub-shape model 15b corresponding to the lips, mouth and chin of a subject’s face. Any known technique for generating an appearance model 15c may be used, for example the Active Appearance Model (AAM) technique, as mentioned above. The particular implementation steps of this technique are known per se, and need not be described further. The result of the AAM technique applied by the appearance sub-shape module 23c is a deformable sub-shape appearance model 15c comprising a mean normalised grey level vector, a set of orthogonal modes of variation and a set of grey level parameters.
[0069] At step S8-9, the combined sub-model module 23c determines a sub-shape combined model 15d for each sub-shape appearance model 15c, based on the corresponding sub-shape model generated by the shape model module 23a. For example, the shape model derived from the labelled training images 25a can be processed to generate a set of shape model parameters, and the sub-shape appearance model 15c may be similarly processed to generate corresponding appearance model parameters. The shape model parameters and the appearance model parameters can then be combined, with a weighting that measures the unit differences between shape (distances) and appearance (intensities). As with the ASM and AAM techniques, the combined model can be generated by using principal component analysis and dimensionality reduction, resulting in a deformable combined model represented by a set of eigenvectors, modes of variation and deviation parameters.
[0070] At step S8-11, the statistics computation module 23d can be used to compute a set of statistics to improve the robustness of initial positioning of a detected face within a bounding box output by the face locator 15b. This exemplary processing is described in more detail with reference to Figure 11. As shown in Figure 11, at step S11-1, the statistics computation module 23d selects a first image from the training images 25a in the image database 25. The corresponding feature points 25b of the user-defined shape for the training image 23 are also retrieved from the training image database 25. At step S11-3, the selected training image 23 is processed by the face locator 15b to determine a bounding box of a detected subject’s face in the image 23. Figure 12A shows an example of a detected face in a training image, identified by the bounding box 51.
[0071] At step S11-5, the statistics computation module 23d determines if the identified bounding box 51 contains the majority of feature points 25b of the corresponding user-defined shape 25. For example, a threshold of 70% can be used to define a majority for this step. If it is determined that the bounding box 51 does not contain the majority of feature points 25b, then position and scale statistics are not computed for the particular training image 23 and processing skips to step S11-13 where the statistics computation module 23d checks for another training image to process. On the other hand, if it is determined that the bounding box 51 contains a majority of the feature points 25b, then at step S11-7, the relative position of the user-defined shape, as defined by the feature points 25b, within the identified bounding box 51 is calculated. At step S11-9, the statistics computation module 23d calculates the relative scale of the user-defined shape to the mean shape 27a of the global shape model 27. At step S11-11, the calculated coordinates of the relative position and the relative scale are stored, for example in the training image database 25, for subsequent computations as described below.
[0072] At step S11-13, the statistics computation module 23d determines if there is another training image 23 in the database 5 to be processed, and returns to step S11-1 to select and process the next image 23, as necessary. When it is determined that all of the training images 25a, or a pre-determined number of training images 25a, have been processed by the statistics computation module 23d, at step S11-15, a mean and standard deviation of the stored relative position and scale for all of the processed training images 25a is computed, and stored as computed statistics 44 for the particular face locator 15b, for example in the training image database 25.
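The final mean and standard deviation computation might be sketched as follows; the (dx, dy, scale) sample format is an assumption for illustration:

```python
import statistics

def position_scale_stats(samples):
    """Compute the mean and standard deviation of the stored relative
    positions and scales over all processed training images. Each sample
    is a hypothetical (dx, dy, scale) tuple."""
    xs, ys, scales = zip(*samples)
    return {name: (statistics.mean(vals), statistics.stdev(vals))
            for name, vals in (("x", xs), ("y", ys), ("scale", scales))}
```

The resulting statistics characterise where, and at what scale, the face shape typically sits inside a detector bounding box, which later constrains the random initialisations.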
[0073] Returning to Figure 8, the offline training process proceeds to step S8-13, where the regression computation module 23f of the shape model training module 23 proceeds to determine regression coefficient matrices 15e, 15f for the global shape model 27 and the plurality of sub-shape models 29. This process is described in more detail with reference to Figures 13 and 14. The regression computation module 23f computes the regression coefficient matrices 15e, 15f based on feature point descriptors and corresponding offsets that are determined from the training images 25a in the database 5. In the present embodiment, the feature point descriptors are Binary Robust Independent Elementary Features (BRIEF) descriptors, derived from the calculated conversion of input global or sub-shape feature points to a selected image, but other feature descriptors can be used instead, such as ORB, FREAK, HOG or BRISK.
[0074] As is known in the art, regression analysis is a statistical process for modelling and analysing several variables, by estimating the relationship between a dependent variable and one or more independent variables. As mentioned above, the regression coefficient matrices 15e, 15f define trained functions that represent a series of directions and re-scaling factors, such that a matrix can be applied to a candidate shape model to produce a sequence of updates to the shape model that converge to an accurately located shape with respect to an input image (e.g. a training image during a training process, or a captured image during a tracking process). In this embodiment, the plurality of sub-shape regression matrices 47 are arranged as a cascading data structure. Each regression matrix at level i overcomes situations where the previous regression coefficient matrix did not lead to the final solution. For example, the first, highest level regression coefficient matrix approximates a linear function that tries to fit all cases in the database. The second and further lower level regression matrices fit situations that the first level regression matrix was not able to cope with. This cascading data structure thereby provides a more flexible function with improved generalisation across variations in face shapes. The training process to determine the cascading sub-shape regression coefficient matrices 47 simulates similar captured image scenarios which might be captured and processed during the tracking procedure, utilising stored training data 5 defining the real or actual displacement or offset between the estimated and real position of the face shape feature points that are known for the training images 25a in the database 5. The texture around an estimated shape is described by the BRIEF features, and the offset between corresponding labelled feature points can be measured in pixel coordinates at the reference image resolution.
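Assuming a simple linear update per cascade level, the application of such a cascade might be sketched as follows; the descriptor function and matrices below are toy stand-ins for the BRIEF-based ones:

```python
import numpy as np

def apply_cascade(shape, descriptor_fn, matrices):
    """Refine a shape estimate level by level: each regression matrix maps
    the descriptors extracted around the current estimate to a shape
    update, correcting cases the previous level could not resolve."""
    shape = np.asarray(shape, float)
    for R in matrices:
        shape = shape + R @ descriptor_fn(shape)
    return shape

# Toy example: the descriptors are the residual to a target shape, so an
# identity-like cascade converges onto the target.
target = np.array([1.0, 2.0])
residual = lambda s: target - s
refined = apply_cascade([0.0, 0.0], residual, [0.5 * np.eye(2), np.eye(2)])
```

Each level re-evaluates the descriptors at the updated estimate, which is what lets lower levels specialise in the cases the first linear fit left unresolved.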
[0075] As shown in Figure 13A, at step S13-1, the regression computation module 23f selects a first image 23 and corresponding feature points 25b from the training image database 25. At step S13-3, the regression computation module 23f computes and stores a first set of BRIEF features for the global shape 29 and corresponding offsets, based on the selected training image 23. The process carried out by the regression computation module 23f to process a selected training image 23 is described with reference to Figure 14.
[0076] At step S14-1, the regression computation module 23f generates a pre-defined number of random shape initialisations 53, based on the generated global shape model 27. This generation process involves a bounding box obtained by the face locator 15b and the output of the statistics computation module 23d. Random values are obtained for the x and y displacements within the bounding box and for the scale relation with the mean shape 27a. Random values are drawn from a normal distribution, within one standard deviation of the mean (i.e. from the central 68% of values). For example, twenty random values may be computed for scale and x and y displacements, based on the computed statistics stored by the statistics computation module 23d at step S8-11 above, in order to generate a total of twenty different initialisations for a single bounding box. This sub-process can be seen as a Monte Carlo initialisation procedure which advantageously reduces over-fitting and provides a set of regression coefficient matrices that are capable of more generalised object representations than deterministic methods or single initialisation estimates, for example. Figure 12B shows an example of various random shape initialisations 53 displayed over the initial global shape model 27, for a particular training image 23.
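The Monte Carlo sampling step might be sketched as below. For simplicity this draws uniformly within one standard deviation of each mean rather than from a truncated normal distribution as described, and the `stats` format is an assumption:

```python
import random

def random_initialisations(stats, n=20, seed=42):
    """Draw n random (dx, dy, scale) initialisations, each component
    sampled within one standard deviation of its computed mean. 'stats'
    is a hypothetical list of (mean, std) pairs for the x displacement,
    y displacement and scale statistics."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(m - sd, m + sd) for m, sd in stats)
            for _ in range(n)]
```

Each tuple seeds one candidate shape placement inside the bounding box, so a single detection yields e.g. twenty different starting points for the regression training.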
[0077] At step S14-3, a reference shape is determined by scaling the mean shape 27a of the global shape model 27, based on a pre-defined value specified by the user, for example an inter-ocular distance of 200 pixels. This procedure determines the size of the image in which all the computations will be performed during training and tracking. A conversion is performed from the shape model coordinate frame in unit space to the image plane in pixel coordinates. Figure 12C schematically illustrates an example of scaling of the mean shape and Figure 12D schematically illustrates an example of the resulting reference shape 55. At step S14-5, the regression computation module 23f computes the similarity transformation between the reference shape 55 and the plurality of random shape initialisations 53.
[0078] At step S14-7, the regression computation module 23f performs image processing on the selected training image 23 to transform the selected training image 23 based on the reference shape 55 and the computed similarity transformation. In this embodiment, the similarity transformation between the current estimate and the reference shape is computed through an iterative process aiming to minimise the distance between both shapes, by means of geometric transformations, such as rotation and scaling, to transform (or warp) the selected training image 23. In the first iteration, only scaling plays a role, since the first estimate is a scaled mean shape; the rotation matrix will therefore always be an identity matrix. In further iterations, once the initial scaled mean shape has been modified by the refinement process, scale and rotation will be of great importance. Subsequent regression coefficient matrices will operate on transformed images which will be very closely aligned with the reference shape. Figure 12E shows examples of various geometric transformations that can be performed on respective training images 25a. Advantageously, image transformation in this embodiment is applied globally to the whole image by means of a similarity transformation, in contrast for example to piece-wise affine warping as employed in AAM, whereby no deformation is performed and computation speed is improved considerably.
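A similarity transformation of this kind (scale, rotation and translation minimising the distance between two point sets) can also be computed in closed form by a standard Procrustes-style least-squares alignment. The following sketch illustrates that generic technique, not the module's actual implementation:

```python
import numpy as np

def similarity_transform(src, dst):
    """Least-squares similarity (scale s, rotation R, translation t)
    aligning the src point set onto dst, so dst ~= s * src @ R.T + t."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    A, B = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(A.T @ B)     # cross-covariance decomposition
    if np.linalg.det(Vt.T @ U.T) < 0:     # guard against reflections
        Vt[-1] *= -1
    R = Vt.T @ U.T                        # optimal rotation
    s = S.sum() / (A ** 2).sum()          # optimal isotropic scale
    t = mu_d - s * mu_s @ R.T             # translation
    return s, R, t
```

Because the warp is a single global similarity rather than a piece-wise deformation, applying it to the whole image is cheap, consistent with the speed advantage noted above.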
[0079] At step S14-9, the regression computation module 23f calculates a conversion of the feature points 25b of the user-defined shape for the selected training image 23, to the corresponding locations for the labelled feature points in the transformed image generated at step S14-7. At step S14-11, the regression computation module 23f calculates a conversion of the input shape, that is the random shape initialisation as defined at step S14-1, and the current estimated shape in further iterations, to the corresponding feature locations in the transformed image. At step S14-13, the offset between the calculated conversions is determined by the regression computation module 23f. At step S14-15, the regression computation module 23f determines a set of BRIEF descriptors for the current estimated shape, derived from the calculated conversion of the input shape feature points to the transformed image. The determined BRIEF descriptor features and corresponding offsets are stored by the regression computation module 23f at step S14-17, for example in the training image database 25.
[0080] Returning to Figure 13A, at step S13-5, the regression computation module 23f determines if there is another training image 23 in the database 5 to be processed, and if so, processing returns to steps S13-1 and S13-3, where the regression computation module 23f determines a corresponding set of BRIEF descriptor features and corresponding offsets, based on each of the remaining, or a predetermined number of, training images 25a in the database 5. Once all of the training images 25a have been processed in this way, a regression coefficient matrix 45 for the global shape model 27 is computed and stored for the trained shape model 15 in the model database 21, taking as input all of the stored offsets and BRIEF features determined from the training images 25a.
[0081] Accordingly, at step S13-7, the regression computation module 23f computes the regression coefficient matrix 45 for the input global shape, based on the determined BRIEF features and corresponding offsets. In this embodiment, the regression computation module 23f is configured to compute the regression coefficient matrix 45 using a regression analysis technique known as Principal Component Regression (PCR), which reduces the dimensionality of the gathered BRIEF descriptor dataset before performing linear regression using least squares minimisation in order to obtain a regression coefficient matrix. Since the obtained matrix has a dimension equal to the number of selected principal components, a conversion to the original dimensional space is efficiently computed. As known in the art, regression coefficient matrices are an optimal data structure for efficient facial feature detection, for example as discussed in “Supervised Descent Method and its Applications to Face Alignment”, Xiong and De la Torre. It is appreciated that alternative known regression analysis techniques may instead be used to compute the regression coefficient matrices, such as least squares regression, etc.
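The PCR step described above (dimensionality reduction, least squares in the reduced space, then conversion back to the original dimensions) might be sketched as follows; this is a generic illustration of the technique, with hypothetical names:

```python
import numpy as np

def pcr_fit(X, Y, k):
    """Principal Component Regression sketch: project the descriptor
    matrix X onto its top-k principal components, solve least squares in
    that reduced space, then map the coefficients back to the original
    descriptor dimensionality."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                                     # d x k component basis
    W_reduced, *_ = np.linalg.lstsq(Xc @ P, Y, rcond=None)
    return mean, P @ W_reduced                       # matrix in d dimensions

def pcr_predict(mean, W, X):
    """Apply the regression matrix to (centred) descriptors."""
    return (X - mean) @ W
```

Solving in the k-dimensional component space keeps the least squares problem small and well conditioned; multiplying by the basis afterwards recovers a matrix usable directly on raw descriptors.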
[0082] At step S13-9, the regression computation module 23f updates the global shape model 27 of the current trained shape model 15 stored in the model database 21, by applying the respective trained functions defined by the computed global regression coefficient matrix 45 to the global shape model 27. It will be appreciated that the computational process for applying the cascading regression coefficient matrix to the input shape is known per se and will depend on the specific regression analysis technique implemented by the system 1. At step S13-11, the regression computation module 23f processes the random shape initialisations generated at step S14-1 above, to split each random shape initialisation into a respective set of estimated sub-shapes, according to the plurality of defined sub-shape models 15b in the model database 21. For example, referring to the exemplary shape model in Figure 5, the defined subset of (x,y) coordinates for features of each sub-shape 15b can be selected from each random shape initialisation to obtain the respective estimated sub-shape.
[0083] The regression computation module 23f then processes the plurality of current sub-shapes 29 to generate a respective plurality of cascading sub-shape regression coefficient matrices 47 for each current sub-shape 15b, based on the estimated sub-shapes obtained at step S13-11 and the training images 25a in the database 5. In this exemplary embodiment, three cascading sub-shape regression coefficient matrices 47 are defined for each current sub-shape 15b. It is appreciated that any number of cascading levels can be defined. At step S13-13, the regression computation module 23f selects a first sub-shape model, and computes and stores respective BRIEF descriptor features for each estimated sub-shape of the current selected sub-shape model 15b, and the corresponding offset based on the training images 25a in the database 5, at the current cascade level.
[0084] Accordingly, at step S13-15, the regression computation module 23f selects a first training image 23 and associated feature points 25b from the training image database 25. At step S13-17, the regression computation module 23f selects a first one of the estimated sub-shapes of the current selected sub-shape model 15b. At step S13-19, the regression computation module 23f determines and stores BRIEF descriptor features for the selected estimated sub-shape, as well as the corresponding offsets, based on the current selected training image 23. At step S13-21, the regression computation module 23f determines whether there is another estimated sub-shape to process and if so, returns to step S13-17 to select the next estimated sub-shape to be processed. Once all of the estimated sub-shapes have been processed based on the current selected training image 23 at the current cascade level, the regression computation module 23f determines at step S13-23 whether there is another training image 23 to process and if so, processing returns to step S13-15 where the BRIEF features and offsets data collection process is repeated for the next training image at the current cascade level.
[0085] Once all, or a predetermined number, of the training images 25a have been processed in the above way for the current cascade level, the regression computation module 23f computes at step S13-25 a sub-shape regression coefficient matrix 47 for the currently selected sub-shape, at the current cascade level, based on all of the determined BRIEF features and corresponding offsets. At step S13-27, the regression computation module 23f updates all of the estimated sub-shapes, by applying the offsets obtained from the respective trained functions defined by the current cascade level sub-shape regression coefficient matrix 47, to the sub-shape model 27. At step S13-29, the regression computation module 23f determines if there is another cascade level of the cascading sub-shape regression coefficient matrices 47 to be generated, and if so, returns to step S13-15 where the process is iteratively repeated for the remaining cascade levels.
[0086] After the regression computation module 23f determines at step S13-29 that the currently selected sub-shape model 15b has been processed in the above manner for all of the predetermined cascade levels, then at step S13-16, the regression computation module 23f determines if there is another sub-shape model 15b to process and, if so, returns to step S13-13 to select the next sub-shape 15b, and to subsequently compute the cascading regression coefficient matrices 47 for the next selected sub-shape 15b and update it, until all of the sub-shapes 29 have been processed and updated by the shape model training module 23 as described above.
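The cascaded training loop of steps S13-13 to S13-29 can be sketched as follows. This is an illustrative Python sketch, not the patent's implementation: a generic descriptor function stands in for the BRIEF features, ridge-regularised least squares stands in for the particular regression technique, and all names are assumptions.

```python
import numpy as np

def train_cascade(features_fn, true_shapes, init_shapes, levels=3, lam=1e-3):
    """Train one regression coefficient matrix per cascade level (sketch).

    features_fn(shape) -> 1-D descriptor vector (stand-in for BRIEF features);
    true_shapes, init_shapes: (N, P) arrays of flattened landmark coordinates.
    """
    shapes = init_shapes.copy()
    matrices = []
    for _ in range(levels):
        X = np.stack([features_fn(s) for s in shapes])  # descriptors per estimate
        Y = true_shapes - shapes                        # offsets to the ground truth
        # Ridge-regularised least squares: W maps descriptors to offsets
        W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
        matrices.append(W)
        shapes = shapes + X @ W                         # update all estimates
    return matrices, shapes
```

Each level regresses the remaining offsets of all training estimates and then updates those estimates, so later levels learn increasingly fine corrections.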
Augmentation Process [0087] The augmentation process performed by the augmented reality system 1 will now be described in more detail with reference to Figures 15A and 15B, which show the steps of a computer-implemented image augmentation process according to another exemplary embodiment of the present invention. Reference is also made to Figures 20A to 20E, showing an example sequence of user interface display screens illustrating the face tracking process.
[0088] As shown in Figure 15A, at step S15-1, the face locator 15a may perform an initialisation sub-process based on received data of an initial captured image from the camera. One example of this processing is described in more detail with reference to Figure 16. As shown in Figure 16, the process starts with the supply of a camera feed at a step D1. The camera captures a (video) image of the user, and displays this to the user, for example on a tablet computer which the user is holding. An overlay is also shown on screen, which might for example comprise an outline or silhouette of a person's face. The user is required to align the image of their face with the overlay at a step D2. An example of the displayed image overlay is shown in the representation provided to the left of the step D2.
[0089] At a step D3, a face detection step is carried out, which might for example use Haar-like features (discussed for example in "Zur Theorie der orthogonalen Funktionensysteme", Haar, Alfred (1910), 69(3): 316-371). These Haar-like features can be used to pick out the location and scale of the face in the image. An example of this, in which the location of the detected face is identified by a bounding rectangle, is shown in the representation provided to the left of the step D3. At a step D4 it is determined whether or not the face has been detected. If the face has not been detected, then processing cannot go any further, and the process returns to the step D2, for the user to realign their face with the overlay. If the face has been detected, then at a step D5 a mouth detection step is carried out, which might again for example use Haar-like features, this time to pick out the location of the mouth. In order to improve processing efficiency, the search for the mouth can be constrained to the lower part of the bounding rectangle already found for the face. An example of a detected mouth area is shown in the representation provided to the left of the step D5. At a step D6, it is determined whether or not the mouth has been detected. If the mouth has not been detected, then processing cannot go any further, and the process returns to the step D2, for the user to realign their face with the overlay.
[0090] If the mouth has been detected, then at a step D7 a process of building foreground and background histograms is carried out. Foreground refers to the target area to be detected, for example lip regions, and background refers to the area to be excluded from the foreground, for instance skin regions. The foreground and background histograms are populated with the frequency of colour values occurring in different regions of the image. These regions are defined, for example, by a mask created with the face as background and the mouth as the foreground, as discussed above. In some embodiments one or more histogram updates might be carried out using the same source image and the same mask. The foreground/background histogram building process uses as an input a version of the camera feed, which may be converted from the camera image data colour space (e.g. RGB/RGBA) to a working colour space (e.g. YCrCb), at a step D10. The input colour format depends on the camera installed in the device employed by the user. It is appreciated that the YCrCb colour space is useful, since the histogramming can be carried out in two dimensions by ignoring luminance (Y) and utilising only the colour difference values Cr and Cb.
[0091] The step D7 comprises a sub-step D7a of providing exclusive histogram updates based on a face area (background/skin) provided at a step D11 and a mouth area (foreground/lips) provided at a step D12. By exclusive it is meant that an update of the foreground histogram by a foreground mask increases the frequency of the corresponding colour, and at the same time updates the background histogram by decreasing the frequency of that same colour. In other words, if a colour belongs to the foreground it cannot belong to the background. Therefore, the update of any colour coming from background or foreground produces effects in both histograms. The representation visible between the steps D10 and D11 illustrates the mouth area (white, foreground) and the face area (black, background) employed in the exclusive histogram updates step D7a. At a step D7a1, a background histogram is updated with the frequency of occurrence of each colour value within the face area (but outside of the mouth area). Similarly, at a step D7a2, a foreground histogram is updated with the frequency of occurrence of each colour value within the mouth area. The next steps which take place in the histogram building procedure D7 are meant to improve the quality of the generated histograms.
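The exclusive update rule described above might be sketched as follows. This is a hypothetical Python sketch; the bin quantisation, array layout, and function name are assumptions, not the patent's implementation.

```python
import numpy as np

BINS = 32  # quantise 0-255 Cr/Cb values into a 32x32 two-dimensional histogram

def exclusive_update(fg_hist, bg_hist, crcb_pixels, is_foreground):
    """Exclusive histogram update (sketch of step D7a): each colour's
    frequency is increased in one histogram and decreased (not below zero)
    in the other, so a colour cannot belong to both foreground and
    background at once."""
    for cr, cb in crcb_pixels:
        i, j = cr * BINS // 256, cb * BINS // 256
        if is_foreground:
            fg_hist[i, j] += 1
            bg_hist[i, j] = max(bg_hist[i, j] - 1, 0)
        else:
            bg_hist[i, j] += 1
            fg_hist[i, j] = max(fg_hist[i, j] - 1, 0)
```

A foreground (mouth-mask) pixel would be passed with is_foreground=True, a face-area pixel with is_foreground=False, so every update affects both histograms, as the paragraph above describes.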
[0092] The background histogram, foreground histogram, and the converted image data are provided to a probability map computation step D7b, which for instance uses a Bayesian framework (or similar statistical technique) to determine the probability of a particular pixel belonging to the lips (foreground) by means of the foreground and background histograms. An example of such a probability map is shown to the right of the step D7b. The probability map computation can be calculated using Bayesian inference to obtain the posterior probability according to Bayes' rule, demonstrated below:

P(A|B) = P(B|A) P(A) / P(B)

[0093] The probability of a pixel with colour (Cb,Cr) of belonging to the foreground (or being lip) can be computed as follows:

P(fg | Cb,Cr) = P(Cb,Cr | fg) P(fg) / P(Cb,Cr)

where

P(Cb,Cr) = P(Cb,Cr | fg) P(fg) + P(Cb,Cr | bg) P(bg)

[0094] The conditional probabilities are calculated by means of the statistics stored in the histogram building procedure, as follows:

P(Cb,Cr | fg) = Hfg(Cb,Cr) / Nfg and P(Cb,Cr | bg) = Hbg(Cb,Cr) / Nbg

where Hfg and Hbg denote the foreground and background histogram frequencies for the colour (Cb,Cr), and Nfg and Nbg are the total frequency counts of the respective histograms.
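The posterior computation of step D7b might be sketched as follows, working directly on the two CrCb histograms. This is an illustrative Python sketch under assumed names; the uniform prior and the epsilon guard are assumptions.

```python
import numpy as np

def lip_probability(fg_hist, bg_hist, p_fg=0.5):
    """Posterior P(lip | Cb,Cr) per histogram bin via Bayes' rule (sketch).

    fg_hist / bg_hist are 2-D CrCb frequency histograms; p_fg is an
    assumed prior probability of a pixel being lip."""
    eps = 1e-9
    like_fg = fg_hist / max(fg_hist.sum(), 1)  # P(Cb,Cr | lip)
    like_bg = bg_hist / max(bg_hist.sum(), 1)  # P(Cb,Cr | skin)
    evidence = like_fg * p_fg + like_bg * (1 - p_fg)
    return like_fg * p_fg / (evidence + eps)
```

Looking up each mouth-area pixel's (Cb,Cr) bin in the returned array yields the probability map used in the subsequent clustering stage.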
[0095] Once the probability map of being lip has been computed around the mouth area, the result is used to reinforce the histogram quality through a clustering process, which produces a finer segmentation of the lip area.
[0096] At a step D7c, cluster centres for background and foreground are initialised in CbCr colour space. The background cluster centre is computed with colour values corresponding to pixels within the probability map (and thus constrained to the mouth area) which have an associated probability of less than a predetermined threshold value - for example a value of 0.5 in the case of a probability range of 0 to 1. The foreground cluster centre is calculated with colour values corresponding to pixels within the probability map (and thus constrained to the mouth area) which have an associated probability higher than the predetermined threshold value. The cluster centre for each of these is determined as the centre of gravity of all of the points belonging to foreground or background.
[0097] An example of the initialisation of the clustering procedure, showing the two cluster centres, is visible in the representation to the left of and slightly above the step D7c. Here, colour values detected as background are shown in light grey and foreground pixels in a dark grey tone. This figure represents the probability map, shown in the representation to the right of the process D7c, expressed in the CbCr colour space. It is noticeable that the number of pixels belonging to the foreground is very sparse and indeed difficult to appreciate in the figure; however, it is good enough to give an accurate approximation of where the centre of the cluster might be. This proximity of the clusters is due to the high similarity between skin and lip colour. In the case of selecting skin as foreground and any other colour as background, the clusters would be much further apart and the situation would be easier to handle. The lip case is thus an extreme example which demonstrates the robustness of the algorithm.
[0098] At a step D7d, a fuzzy c-means clustering algorithm is used to associate the colour values in the CbCr space observed in the mouth area with the closest cluster centre. This can be carried out by determining the degree of membership of each colour value to the foreground cluster centre. This effectively shifts certain colour values from belonging to one cluster to belonging to the other cluster. An example of the reordering provided by this process is visible in the representation provided to the left of and slightly above the step D7d. The output of this process generates an equivalent probability map to that generated from the original histogram data, but showing a much stronger lip structure, as visible in the representation provided beneath the cluster representations. It should be noted that only a single pass of the fuzzy c-means clustering algorithm is carried out (no iteration), and there is no re-computation of the cluster centres. This is because the clusters are very close together and further iterations might cause misclassifications.
[0099] The fuzzy c-means clustering may be carried out by minimising the following objective function:

Jm = Σi Σj (uij)^m ||xi − cj||²

where

uij = 1 / Σk ( ||xi − cj|| / ||xi − ck|| )^(2/(m−1))

and uij is the degree of membership of xi (a CbCr colour value) in cluster j, cj is the centre of cluster j, and m (fuzziness) = 2.
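For the two-cluster (foreground/background) case with m = 2, the membership formula reduces to a simple ratio of distances, and the single pass of step D7d can be sketched as follows. This is an illustrative Python sketch; the function name and the epsilon guard against division by zero are assumptions.

```python
import numpy as np

def fg_membership(points, c_fg, c_bg, m=2.0):
    """Single-pass fuzzy c-means membership of each CbCr point to the
    foreground cluster centre (sketch of step D7d); the cluster centres
    are not re-computed."""
    d_fg = np.linalg.norm(points - c_fg, axis=1) + 1e-9
    d_bg = np.linalg.norm(points - c_bg, axis=1) + 1e-9
    p = 2.0 / (m - 1.0)
    # u_fg = 1 / [ (d_fg/d_fg)^p + (d_fg/d_bg)^p ]
    return 1.0 / ((d_fg / d_fg) ** p + (d_fg / d_bg) ** p)
```

Thresholding the returned memberships at 0.5 reassigns each colour value to the nearest cluster, which is the reordering visible in the representation described above.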
[0100] After the computation of step D7d, a further exclusive histogram update step reinforces the content of the histograms based on the output of the clustering stages. In particular, the background histogram is populated with the frequency of occurrence of colour values in the background (face area), i.e. associated with the background cluster, while the foreground histogram is populated with the frequency of occurrence of colour values in the foreground (lip area), i.e. associated with the foreground cluster. The representation to the left of and above the step D7f shows the regions employed for the histogram updates, where the background is the face area and the new strongly defined lip area forms the foreground. Following the histogram building step, at a step D8 it is determined whether a sufficient number of initialisation frames have been processed for the completion of the histogram building process. If fewer than N frames have been processed, then the process returns to the step D2, where the user is required to maintain facial alignment with the overlay, and the process of face/mouth detection, histogramming and clustering starts again.
[0101] The histograms are accumulated in this way over several frames, improving the robustness of the foreground and background histograms. When at the step D8 it is determined that the threshold number of initialisation frames has been reached, the initialisation process finishes, and the initialised histograms are carried through into the next stage of real-time processing. At this stage the displayed overlay can be removed from the display. It should be understood that while the histograms do not need updating every frame during the tracking process, it is desirable to update them periodically, for example to account for lighting changes. The reinforcement of the histograms can take place after the initialisation and during the tracking procedure, in order to overcome situations in which the scene changes, such as lighting changes which directly affect the colour features.
[0102] Returning to Figure 15B, at step S15-3, the source image processing module 3 receives data of a captured source image from the camera 5. In the present exemplary embodiment, processing of image data is described with reference to the HSV colour space (hue, saturation and value). It is appreciated that any other colour space may be used, such as HSL, RGB (as received from the camera 5, for example) or YCbCr. Accordingly, the augmentation system 1 may be configured to perform conversion of the captured image data from the camera 5 where necessary, from the camera colour space (e.g. RGB) to the working colour space (e.g. HSV). At step S15-5, the face locator 15a of the source image processing module 3 determines the location of the source face in the captured source image, and outputs a bounding box 51 of an approximate location for the detected source face. At step S15-7, the face locator 15a initialises the detected face shape using the trained global shape model 27, the statistics computed at step S8-11 above, and the corresponding global shape regression coefficient matrix 45 retrieved from the model database 21, based on the image data within the identified bounding box 51. Figure 18A shows an example of an initialised face shape 71 within the bounding box 51, displayed over the captured image data 73. The trained shape model may be generated by the shape model training module 23 as described by the training process above. As shown, the candidate face shape at this stage is an initial approximation of the whole shape of the object within the bounding box 51, based on the global shape model 27. Accordingly, the location and shape of individual features of the object, such as the lips and chin in the example of Figure 18A, are not accurate.
[0103] At step S15-9, the face locator 15a performs processing to refine the initialised global face shape using the trained sub-shape models 15b and the corresponding cascading regression coefficient matrices 47 for each sub-shape model 15b. This processing is described in more detail with reference to Figure 17. As shown in Figure 17, at step S17-1, the refinement process starts with the face locator 15a computing and adjusting the nearest shape fitting the global shape model. The weighting of the eigenvectors or parameters of the model for the computed plausible shape should be contained within the scope of valid shapes. A valid shape is defined as one having its parameters within certain boundaries. Given the shape computed in the previous frame, it is checked whether the output from the sub-shape regression coefficient matrices, computed independently, fits the global shape model definition before proceeding further. Accordingly, at step S17-3, it is determined if the percentage of parameters out of boundaries is greater than a predefined threshold a. In the positive case, tracking of the object is considered to be lost. If so, the refinement process is terminated and processing may return to step S15-1 where a new captured image is received from the camera for processing. Otherwise, the refinement module 61 proceeds to adjust the face shape to fit the global shape model 27, at step S17-3.
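The plausibility test described above might be sketched as follows. This is a hypothetical Python sketch: the patent does not specify the boundary definition, so parameters are assumed to be valid within plus or minus a fixed number of standard deviations of the shape model, and the names and default limits are illustrative only.

```python
import numpy as np

def tracking_lost(params, sigmas, limit=3.0, threshold=0.25):
    """Declare tracking lost when the fraction of shape-model parameters
    lying outside +/- limit standard deviations exceeds `threshold`
    (sketch of the validity test at step S17-3)."""
    out_of_bounds = np.abs(params) > limit * np.asarray(sigmas)
    return bool(out_of_bounds.mean() > threshold)
```

If the test fails, the remaining in-bounds parameters can still be clamped to their boundaries to produce the adjusted plausible shape used in the subsequent steps.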
[0104] At step S17-5, the refinement module 61 computes a similarity transformation between the adjusted shape and the reference shape defined in step S9-5. At step S17-7, the captured image is transformed based on the computed similarity transformation. At step S17-9, the refinement module 61 calculates a conversion of the adjusted shape to the transformed image. Figure 18B shows an example of the refined, adjusted global face shape 71a displayed over the captured image data 73. At step S17-11, the refinement module 61 determines a plurality of candidate sub-shapes from the current adjusted global shape, based on the sub-shape models 15b as discussed above. The candidate sub-shapes are then updated by iteratively applying the corresponding cascading sub-shape regression coefficient matrices 47 to the sub-shape, starting with the highest, most generalised cascade level.
[0105] Accordingly, at step S17-13, the refinement module 61 selects a first of the candidate sub-shapes. The refinement module 61 then determines at step S17-15 a BRIEF descriptor for the candidate sub-shape, based on the transformed image at the current cascade level. At step S17-17, the refinement module 61 updates the current candidate sub-shape based on the corresponding sub-shape regression coefficient matrix 47 at the current cascade level, retrieved from the model database 21. As discussed above, this updating step will depend on the particular regression analysis technique implemented by the system 1 to apply the trained function defined by the sub-shape regression coefficient matrix 47 to the sub-shape data values. At step S17-19, the refinement module 61 determines if there is another candidate sub-shape to process and returns to step S17-13 to select the next sub-shape to be processed at the current cascade level. Once all of the candidate sub-shapes have been processed at the current cascade level, the refinement module 61 determines at step S17-20 if there is another cascade level to process, and processing returns to step S17-13 where the sub-shape refinement process is repeated for the next cascade level. Figures 18C and 18D show examples of respective sequences of refinement of the two object sub-shapes 75-1, 75-2, displayed over the captured image data 73.
[0106] When it is determined at step S17-20 that all of the sub-shapes have been processed for all of the cascade levels of the sub-shape regression coefficient matrices 47, then at step S17-21, the refinement module 61 checks if a predefined accuracy threshold needs to be met by the refined sub-model, for example a two-pixel accuracy. It will be appreciated that applying an accuracy threshold is optional. If the accuracy is not within the pre-defined threshold, then processing proceeds to step S17-23 where the refinement module 61 determines if the percentage of eigenvector weights is under a second pre-defined limit b in sub-model parameters. If not, the refinement process is terminated and processing proceeds to step S15-11 discussed below. On the other hand, if it is determined at S17-21 that the pre-defined accuracy threshold needs to be met, then at step S17-25, the refinement module 61 performs processing to refine the corresponding sub-shape appearance and combined models 15c, 15d. For example, the sub-shape appearance model 15c can be refined using known AAM techniques. At step S17-27, the refinement module 61 converts the refined sub-shapes 29 from the reference image coordinate frame back to the original image, and brings together the respective separate data structures for the previously split candidate sub-shapes, back into a global shape framework. Figure 18E shows an example of the further refined global face shape 71a displayed over the captured image data 73, as a result of the refinement of the object sub-shapes 75, which is more efficient and accurate than carrying out further refinement of the global face shape 71.
[0107] After the face refinement process is completed, processing proceeds to step S15-11 in Figure 15A, where the face locator 15a determines whether refinement of the detected object sub-shapes within the acceptable parameters was successfully achieved at step S15-9. If not, for example if it was determined at step S17-3 or step S17-23 that tracking of the object was lost, then processing can return to step S15-3, where a new captured image is received from the camera for processing in a new iteration by the face locator 15a. Otherwise, if the face locator 15a determines that acceptable sub-shape refinement was achieved by the processing at step S15-9, then at step S15-13, the face locator 15a computes a warped instance of the trained face mesh 16' to fit the detected face in the captured source image, for example as illustrated in Figure 5F. The determined coordinates from the captured image data define the positions of the vertices of the polygonal regions of the face mesh 16' to match the detected face shape.
[0108] At step S15-15, the face locator 15a extracts the pixel data of the source image corresponding to face pixels of the located source face, for example defined by a mask of the warped instance of the face mesh 16' from step S15-13. Optionally, the face locator 15a can perform pre-processing of the extracted face pixels for image enhancement, such as automatic correction of white balance, levels and/or gamma. At step S15-17, the visible feature detector 19 of the source image processing module 3 computes characteristics of visible makeup products present in the source face, based on the pixel values of extracted face pixels from respective predefined regions of the source face defined relative to vertices of the warped face object model 17. For example, referring to the flow diagram of Figure 18, at step S18-1, the visible feature detector 19 computes one or more parameters for a first predefined visible feature representative of the characteristics, such as colour and brightness/intensity, of a layer of "foundation" makeup that has been applied generally to the skin areas of the detected face. The characteristics may be computed from the average value of extracted face pixels in predefined skin regions of the target face. It is appreciated that in the absence of any actual applied foundation makeup, the "foundation" parameters will instead be indicative of the base skin tone or colour of the detected face.
[0109] At step S18-3, the visible feature detector 19 computes a set of parameters for a second predefined visible facial feature representative of the characteristics of an applied layer of "blush" makeup to predefined areas of the detected face. The characteristics may be computed from the average value of extracted face pixels in cheek regions of the target face. At step S18-5, the visible feature detector 19 computes a set of parameters for a third predefined visible facial feature representative of the characteristics of an applied layer of "eyeshadow" makeup to predefined areas of the detected face. The characteristics may be computed from the average value of extracted face pixels in predefined regions around the eyes of the target face. At step S18-7, the visible feature detector 19 computes a set of parameters for a fourth predefined visible facial feature representative of the characteristics of an applied layer of "lipstick" makeup to predefined areas of the detected face. The characteristics may be computed from the average value of extracted face pixels in predefined lip regions of the target face. The skin, cheek, eye and lip regions of the target face may be defined relative to respective labelled feature points of the warped instance of the face mesh 16'.
[0110] Referring back to Figure 15A, optionally, the visible feature detector 19 may repeat the processing of steps S15-3 to S15-17 to compute respective characteristics of visible features based on image data of one or more subsequent captured source images of the source face, at step S15-19. The visible feature detector 19 may calculate an average of the computed characteristics from each of a plurality of captured source images, to provide more accurate parameters that account for variations in the capture environment, such as lighting effects that vary from image frame to frame.
[0111] At step S15-21, the visible feature detector 19 determines colourisation parameters 9 for each identified visible makeup product detected in the source face, based on the characteristics of each visible feature computed at step S15-17. An example of the processing by the visible feature detector 19 to retrieve colourisation parameters 9 from a makeup product database 9a is described with reference to the flow diagram of Figure 19, for the present exemplary worked example of a predefined set of makeup products. Each virtual product in the database 9a may be defined as a data structure including a unique identifier and one or more properties that may be passed to the shader modules 23 of the colourisation module 7 to replicate the appearance of the associated virtual product on image data of a target face. Examples of virtual products and corresponding properties are provided below. It will be appreciated that depending on the property types and values, conversion of the retrieved properties may be required into types and formats that are accepted by the corresponding shader modules 23.
[0112] Foundation example:
[0113] Blush example:
[0114] Eyeshadow example:
[0115] Lipstick example:
[0116] For example, as shown in Figure 18, at step S18-1, the visible feature detector 19 identifies a virtual foundation product in the product database 9a having colour and intensity values that are a closest numerical match to the colour and intensity values of the "foundation" parameters computed in step S17-1. At step S18-3, the visible feature detector 19 identifies a virtual blush product in the product database 9a having colour and intensity values that are a closest match to the colour and intensity of the "blush" parameters computed in step S17-3. At step S18-5, the visible feature detector 19 identifies a virtual eyeshadow product in the product database 9a having characteristics and values that are a closest match to the corresponding colour and intensity, and glitter intensity of the "eyeshadow" parameters computed in step S17-5. At step S18-7, the visible feature detector 19 identifies a virtual lipstick product in the product database 9a having characteristics and values that are a closest match to the corresponding colour and intensity, glitter intensity and gloss intensity of the "lipstick" parameters computed in step S17-7. Determination of a closest match may be weighted, for example giving preference to closer (or identical) matching colour values over intensity values, glitter intensity, gloss intensity, etc. At step S18-9, the visible feature detector 19 retrieves the associated characteristics and values from each identified virtual product in the product database 9a and returns the characteristics and values as colourisation parameters 9 for each virtual product to be passed to the colourisation module 7.
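The weighted closest-match lookup described above can be sketched as a nearest-neighbour search over the product records. This is an illustrative Python sketch; the dictionary schema, the key names such as 'hue' and 'intensity', and the weighted squared-difference metric are assumptions, not the patent's database format.

```python
def closest_product(measured, products, weights=None):
    """Weighted nearest-neighbour lookup of a virtual product (sketch).

    `measured` and each product's 'params' entry are dicts of numeric
    attributes; `weights` lets colour attributes count more heavily than
    intensity, glitter or gloss, as the passage above suggests."""
    weights = weights or {}
    def dist(product):
        return sum(weights.get(k, 1.0) * (measured[k] - product['params'][k]) ** 2
                   for k in measured)
    return min(products, key=dist)
```

The same function could be called once per detected makeup layer (foundation, blush, eyeshadow, lipstick), each with its own attribute set and weighting.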
[0117] Referring to Figure 15B, at step S15-23, the tracking module 11 receives captured image data of a target image from the camera 5, which can be an image in a sequence of images or video frames. Optionally, the tracking module 11 may be configured to perform another initialisation process as discussed above with reference to step S15-1, prior to capturing the first target image frame. At step S15-25, the tracking module 11 determines if a target face was previously detected and located for tracking in a prior target image or video frame. In subsequent iterations of the tracking process, the face locator 15b of the tracking module 11 may determine that the target face was previously detected and located, for example from tracking data (not shown) stored by the system 1, the tracking data including a determined global face shape of the detected face, which can be used as the initialised global face shape for the current captured target image. As this is the first time the tracking process is executed, processing proceeds to step S15-27 where the face locator 15b of the tracking module 11 determines the location of the target face in the captured target image, in the same way as described above at step S15-5. The face locator 15b initialises the detected target face shape at step S15-29 and performs processing to refine the initialised global face shape at step S15-31, in the same way as described above with reference to steps S15-7 and S15-9.
[0118] At step S15-33, the face locator 15b determines whether refinement of the detected object sub-shapes within the acceptable parameters was successfully achieved at step S15-31. If not, then processing returns to step S15-23, where a new captured target image is received from the camera 5 for processing in a new iteration by the face locator 15b. Otherwise, if the face locator 15b determines that acceptable sub-shape refinement was achieved by the processing at step S15-31, then at step S15-35, the face locator 15b optionally applies an exponential smoothing process to the face shape, based on the face shape detected in the previous frame when available. Exponential smoothing can be carried out on the estimated object shape data in order to produce smoothed data for presentation purposes, based on the following exemplary equation:
st = α·xt + (1 − α)·st−1

where st−1 is the previous object shape determined from the previous frame, st is the smoothed version of the current estimated object shape xt, and α is a weighting value which is adapted automatically during runtime. It will be appreciated that this smoothing technique advantageously provides improved visualisation of the estimated shape(s), so that forecasts need not be obtained to predict where the object might be in the next frame. The complex environments in which the invention aims to operate include unknown lighting conditions, movement of both the camera and the tracked object (giving rise to very complicated motion models), and the absence of any ground truth of the real position or measurement to be used in the update step of more complicated tracking strategies such as Kalman filtering.
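The smoothing update st = α·xt + (1 − α)·st−1 applied to the landmark coordinates can be written directly. This is a minimal Python sketch; a fixed alpha is shown here, whereas the patent adapts the weighting automatically at runtime.

```python
import numpy as np

def smooth_shape(x_t, s_prev, alpha):
    """Exponential smoothing of estimated landmark coordinates:
    s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    return alpha * np.asarray(x_t) + (1.0 - alpha) * np.asarray(s_prev)
```

A larger alpha favours the current estimate (more responsive, more jitter); a smaller alpha favours the previous smoothed shape (steadier, more lag), which is the trade-off an adaptive weighting would tune per frame.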
[0119] At step S15-37, the colourisation module 7 applies image colourisation to the captured target image data by modifying pixel values of the detected target face in the captured target image, based on the received colourisation parameters 9 corresponding to one or more virtual try-on products. The colourisation process performed by the colourisation module 7 in the system 1 will now be described in more detail with reference to Figure 21. Reference is also made to Figure 22, showing examples of data that is processed by, and processing steps performed by the colourisation module during the colourisation process. As shown in Figure 21, at step S21-1, the colourisation module 7 selects a first set of the colourisation parameters 9 received from the visible feature detector 19 of the source image processing module 3. At step S21-3, the colourisation module 7 retrieves the texture model 16 and the texture data file 20 associated with the selected set of colourisation parameters 9.
[0120] In the illustrated example of Figure 22, four texture models 16 are retrieved from the model database 21, each with a respective different mask 16a and mesh subset 16b. Each retrieved texture model 16a-1 to 16a-4 is selected based on a corresponding set of colourisation parameters 9 associated with detected visible applied lipstick, eyeshadow, blush and foundation, respectively. A first mask 16a-1 defines a masked lip region of the reference image 25c and is associated with a first mesh subset 16b-1 defining polygonal areas around the masked lip region. A second mask 16a-2 defines two masked eye regions of the reference image and is associated with a second mesh subset 16b-2 defining polygonal areas around the masked eye regions. A third mask 16a-3 defines two masked cheek regions of the reference image 25c and is associated with a third mesh subset 16b-3 defining polygonal areas around the cheek regions. A fourth mask 16a-4 defines a masked skin region of the reference image and is associated with a fourth mesh subset 16b-4 defining polygonal areas of the masked skin region.
[0121] At step S21-5, the colourisation module 7 selects a first region of the mesh subset 16b from the retrieved texture model 16. At step S21-7, the transform module 7c determines a set of transformation values by mapping the coordinates of the vertices of the selected region to the location of the corresponding tracked feature point determined by the face locator 15a. At step S21-9, the transform module 7c retrieves the corresponding region of texture data 33, again as referenced by the vertices of the selected region, and applies the transformation to the retrieved region of texture data to generate a corresponding warped texture data region. Optionally, the transform module 7c may also retrieve the corresponding region of mask data 16a, as defined by the vertices of the selected region, and apply the transformation to the retrieved masked data to generate corresponding warped masked data for the selected region. At step S21-11, the colourisation module 7 applies the one or more image colourisation adjustments to the warped texture data region using the one or more shader modules 7a as defined by the shader value parameter 9-3. As will be described below, the shader modules 7a may optionally take into account the warped mask data region, depending on the particular shader sub-modules that are used.
[0122] At step S21-13, the colourisation module 7 determines if there is another region of the mesh subset 16b to be processed, and if so, processing returns to step S21-5 where the next region is selected for processing as discussed above, until all of the regions of the mesh subset 16b have been processed in this way. At step S21-17, the colourisation module 7 then determines if there is another set of colourisation parameters 9 to be processed for the current captured image frame. If so, processing returns to step S21-1 where the next set of colourisation parameters 9 is selected and processed as discussed above, until all of the sets of colourisation parameters 9 have been processed in this way.
[0123] At step S21-19, the renderer 7b retrieves and overlays all of the optimised meshes 18 as a sequence of layered data to be applied to the captured image data. This is schematically illustrated at S22-1 in Figure 22. At step S21-21, the renderer 7b performs an alpha blend of the adjusted texture data regions associated with each of the layered optimised meshes 18, as output by the respective shader modules 7a. Figure 22 shows an example of the blended result at S22-2. The renderer 7b then overlays the blended results on the original captured image data for output to the display 11, at step S21-23. Figure 22 shows an example of the resulting augmented image data at S22-3.
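The layered compositing described at steps S21-19 to S21-23 can be sketched per pixel as follows. This is a minimal scalar illustration with assumed helper names, not the renderer's actual implementation, which operates on full texture layers:

```python
def alpha_blend(src, dst, alpha):
    """Composite a source value over a destination value with opacity alpha."""
    return src * alpha + dst * (1.0 - alpha)

def composite_layers(base_pixel, layers):
    """Blend a sequence of (colour, alpha) texture layers, in order,
    over the captured base pixel, as in the renderer's layered overlay."""
    out = base_pixel
    for colour, alpha in layers:
        out = alpha_blend(colour, out, alpha)
    return out
```

In practice each layer corresponds to one set of colourisation parameters (e.g. foundation, then blush, then eyeshadow, then lipstick), composited in sequence before being overlaid on the captured frame.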
[0124] It will be appreciated that this is just one exemplary sequence of processing steps to retrieve the respective regions of texture data 33 defined by image coordinates corresponding to the vertices of the masked regions defined by the mesh subset 16b. As one alternative, the colourisation module 7 may be configured to determine a set of transformation values by mapping all of the vertices of the normalised mesh 16’ as a whole to the respective corresponding labelled feature points of the tracking data, whereby the determined transformation values can be modified by the parameter modifier 7d before being applied to each region of texture data and mask data as discussed above. Figure 23 schematically illustrates an exemplary sequence of data that may be processed by, and processing steps performed by, the transform module 7c to determine transformation of mesh data. In the illustrated example, the captured image 8 and associated detected tracking feature point data 25b’ can be combined with the normalised mesh 16’, to produce a single mesh including the coordinates of the vertices from the tracked data 25b’ and the coordinates of the vertices from the normalised mesh 16’. The vertices from the normalised mesh 16’ are mapped to the vertices of the tracked data 25b’, to determine respective transformation values based on the respective coordinates for each corresponding pair of vertices, for example in terms of translation in the two-dimensional plane. The resulting transformation values can be illustrated as a morphed result, which can be subsequently modified by the parameter modifier 7d before being applied to at least a portion of a mask data 16a and texture data 33, as described above.
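A minimal sketch of this per-vertex mapping, computing two-dimensional translation values from each normalised-mesh vertex to its corresponding tracked feature point, is given below (the function names are illustrative assumptions):

```python
def vertex_translations(normalised_vertices, tracked_vertices):
    """For each corresponding pair of 2-D vertices, compute the translation
    mapping the normalised-mesh vertex onto the tracked feature point."""
    return [(tx - nx, ty - ny)
            for (nx, ny), (tx, ty) in zip(normalised_vertices, tracked_vertices)]

def apply_translations(vertices, translations):
    """Warp 2-D vertices (e.g. of a mask or texture region) by the
    per-vertex translation values, producing the morphed result."""
    return [(x + dx, y + dy)
            for (x, y), (dx, dy) in zip(vertices, translations)]
```

The resulting translation values may then be modified (e.g. scaled) by the parameter modifier 7d before being applied to the mask and texture data regions.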
[0125] Referring back to Figure 15B, the resulting augmented target image with the applied texture and colourisation is output at step S15-39 on display 11. At step S15-41, the tracking module determines if there is a new captured image frame to process, and processing returns to step S15-2 where image data of the next captured target frame is received from the camera 5.
Shader Modules [0126] Figure 24, which comprises Figures 24A to 24D, schematically illustrates exemplary shader modules 7a and respective processes for applying colourising adjustments, as set out in step S15-17 above, to identified portion(s) of associated texture data and/or captured image data. Each shader module 7a is defined by a predetermined set of shader sub-modules 32 for performing respective adjustments to the texture image data and/or captured image data, optionally taking into account properties 9-1 of the present set of colourisation parameters 9.
[0127] Figure 24A illustrates a first example of a lip shader module 7a-1 for applying colourisation to a portion of the captured image data based on a corresponding portion of a lipstick detail texture 9-2-1. In this example, a lip mask 16a-1 defines the masked portion as the lips of a face in the captured image data, for example as shown in Figures 7D and 34. At a step G1, the warped region of the lipstick detail texture data file 9-2-1 is provided. This is a predetermined lip image 9-2-1 warped into the shape of the detected object in the captured image frame, and carrying a texture such as glossy or matte. At step G2, the captured image data from the camera 5 is provided, in which the user’s face will typically be visible. At step G7, a highlight adjustment shader sub-module 32-1 uses the lipstick detail texture 9-2-1 and captured image data to perform a blend operation in a highlight adjustment stage. This blend operation serves to average (per pixel) the luminance of the lipstick detail texture and captured image data. This adds additional detail to the captured image data, which may in some cases show quite featureless lips. For example, the operation can be applied on a per channel basis for the input pixels a, b, across the red, blue and green channels, as follows:
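The soft-light blend operation itself is not reproduced in this text; a common per-channel formulation, assumed here for illustration, is:

```python
def blend_soft_light(a, b):
    """Soft-light blend of base channel value a with blend value b,
    both in [0, 1]. This is the common Photoshop-style formulation,
    assumed here -- the original per-channel formula is not reproduced
    in the text."""
    if b <= 0.5:
        return a - (1.0 - 2.0 * b) * a * (1.0 - a)
    return a + (2.0 * b - 1.0) * (a ** 0.5 - a)

def blend_soft_light_rgb(pixel_a, pixel_b):
    """Apply the blend independently across the red, green and blue channels."""
    return tuple(blend_soft_light(a, b) for a, b in zip(pixel_a, pixel_b))
```

Note that a mid-grey blend value (b = 0.5) leaves the base pixel unchanged, which is why the operation gently adds detail rather than replacing the underlying lip colour.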
[0128] This is followed by a greyscale conversion step G8 to convert the combined output of the captured image data and lipstick detail texture 9-2-1 (output of step G7) into greyscale. For example, this can be calculated as a weighted sum of the colour channels, with weights set to best match the human perception of colour, as follows:
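A weighted greyscale conversion of this kind can be sketched as follows; the Rec. 601 luma weights are assumed here for illustration, as the exact weights are not reproduced in the text:

```python
def greyscale(r, g, b):
    """Luminance-weighted greyscale conversion. The Rec. 601 luma
    weights (0.299, 0.587, 0.114) approximate human perceptual
    sensitivity to the red, green and blue channels; the exact weights
    used by the shader are an assumption here."""
    return 0.299 * r + 0.587 * g + 0.114 * b
```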
[0129] Then, the exposure of the output of the step G8 is adjusted at a step G9, based on an exposure property 9-1-2, to influence the brightness level at which highlight features would be added to the lip texture, and has the effect of nonlinearly increasing or decreasing the input value. For example, exposure can be computed as:
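A common exposure formulation, assumed here for illustration, scales the input in photographic stops:

```python
def adjust_exposure(value, exposure):
    """Exposure adjustment in photographic stops: each unit of the
    exposure property doubles (positive) or halves (negative) the
    input value. A common formulation, assumed for illustration --
    the patent does not reproduce the exact function in this text."""
    return value * (2.0 ** exposure)
```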
[0130] As discussed above, the various properties taken into account by the shader sub-modules in this process can be defined by the present selected set of colourisation parameters 9.
[0131] Similarly, at a step G10 the gamma of the greyscale image is adjusted, using a gamma property 9-1-3, for the same reasons as the step G9. The result of G9 and G10 may be a pixel value which has either been emphasised (brightened) or diminished (reduced in brightness). G10 has the effect of nonlinearly adjusting the greys of an image, either boosting or diminishing their output value without adjusting either complete white or complete black, as follows:
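A standard gamma curve, assumed here for illustration, has exactly this property of adjusting mid-greys while leaving pure black and pure white fixed:

```python
def adjust_gamma(value, gamma):
    """Gamma adjustment: non-linearly boosts or diminishes mid-grey
    values while leaving complete black (0.0) and complete white (1.0)
    unchanged. The power-law form is a common formulation, assumed
    here -- the exact function is not reproduced in the text."""
    return value ** (1.0 / gamma)
```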
[0132] A multiply shininess step G11 then modifies the shininess of the greyscale image/texture based on a shininess property 9-1-4. In other words, the step G11 linearly modulates the pixel value to inhibit harsh lighting effects. The purpose of the steps G9 to G11 is to emphasise existing areas of brightness in the final augmented lip texture. The resulting output of the highlight adjustment sub-module 32-1 is passed to a first processing step of a blend colour adjustment shader sub-module 32-2.
[0133] At a step G12, a lip colour adjustment shader sub-module 32-3 performs a greyscale operation on the captured image data as a first step to convert incoming pixel colour values into greyscale. Then, at a step G13 the greyscale image is blended with a lip colour property 9-1-1 (selected lip colour property - from a step G3) to form an overlay. The resulting output of the lip colour adjustment sub-module 32-3 is also passed to the blend colour adjustment shader sub-module 32-2.
[0134] Meanwhile, at a step G4 a static noise texture, such as a simple Gaussian noise, is provided as a 2D image. A glitter texture is provided at a step G5 (Gaussian noise, and again a 2D image, but in this case warped to the shape of the lips/model). Optionally, an appearance model texture may be provided as input for further colour adjustment, for example to a Gaussian blur at a first step G14 of a glitter adjustment shader sub-module 32-4 to soften the edges of the lip model texture. The blurred model, and the static and warped textures may be passed to a multiply step G15 in combination with a glitter amount property 9-1-5. The textures are multiplied together (weighted by the glitter amount property 9-1-5) so that the pixel values (greyscale) of spatially correlated pixels within the respective 2D images are multiplied together. When the lips (and the model) move, the warped texture will move with respect to the static texture, causing a sparkling effect on the lips. The resulting output of the glitter adjustment sub-module 32-4 is also passed to the blend colour adjustment shader sub-module 32-2.
[0135] At a step G18, the outputs of the steps Gil, G13 and G15 are added together in the first step of the blend colour adjustment shader sub-module 32-2. At a step G16, a lighting model adjustment sub-module computes a lighting model adjustment by linearly interpolating the blurred appearance model texture based on a 50% grey level set at a step G17 and a lighting property 9-1-6 (which controls how much influence is provided by the output of the appearance model, and how much influence is provided by the fixed grey level). The overlay generated at the step G18 is then blended with the lighting model by the blend colour adjustment sub-module 32-2, at a step G19. The purpose of the lighting model adjustment is to emphasise the detail taken from the appearance model texture, while controlling the level of influence this has (using the lighting property 9-1-6 and G17 grey level) so as not to produce harsh, dominating effects. The output of the step G19 is then further linearly interpolated based on alpha value of the lip colour property 9-1-1 (to control the balance between the original input image and the augmented overlay) and the captured image at a step G20.
[0136] At a step G21, an alpha blend adjustment sub-module 32-6 applies a Gaussian blur operation to soften the edges of the lip mask data 16a-1 (defining which parts of an image are lip and which are not), and then at a step G22 the blurred mask is used to perform an alpha blend stage with the adjusted overlay, received from the blend colour adjustment sub-module 32-2, and the captured image data.
[0137] Advantageously, this prevents the colourisation from being applied outside the lip region of the input image, and softens the colourisation at the boundary of the lips. In summary, the adjustments computed by this exemplary lip shader module 7a-1 are as follows:
• Highlight Adjustment: CH = Gamma(Exposure(Greyscale(BlendSoftLight(WC, LD)), EP), GP) * SP, where CH is the computed highlight intensity, WC is the captured image pixel colour, LD is the Lipstick Detail Texture pixel colour, EP is the Exposure Property 9-1-2, GP is the Gamma Property 9-1-3, and SP is the Shininess Property 9-1-4.
• Lip Colour Adjustment: CC = Overlay(LC, Greyscale(WC)), where CC is the computed lip colour, and LC is the Lip Colour Property 9-1-1.
• Glitter Adjustment: CG = GT * NT * Gaussian(AM) * GA, where CG is the computed glitter intensity, NT is the Static Noise Texture pixel colour, GT is the Glitter Texture pixel colour, AM is the Appearance Model pixel colour, and GA is the Glitter Amount Property 9-1-5.
• Lighting Model Adjustment: CL = Lerp(0.5, AM, LP), where CL is the computed lighting model intensity, and LP is the Lighting Property 9-1-6.
• Blend Colour Adjustments: BC = Lerp(WC, Overlay(CC + CH + CG, CL)), where BC is the blended colour adjustment.
• Alpha Blend Adjustment: OT = AlphaBlend(BC, WC, Gaussian(LM)), where OT is the Output Texture pixel colour, and LM is the Lip Mask Texture pixel colour.
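The Lerp, Overlay and AlphaBlend operations appearing in the summary above can be sketched with their common scalar definitions, assumed here for illustration since the patent does not define them in this text:

```python
def lerp(a, b, t):
    """Linear interpolation from a to b by factor t in [0, 1],
    as used for the lighting model adjustment Lerp(0.5, AM, LP)."""
    return a + (b - a) * t

def overlay(a, b):
    """Standard overlay blend of base a with blend b (values in [0, 1])."""
    if a < 0.5:
        return 2.0 * a * b
    return 1.0 - 2.0 * (1.0 - a) * (1.0 - b)

def alpha_blend(src, dst, alpha):
    """Composite src over dst with opacity alpha, as in the final
    AlphaBlend against the Gaussian-blurred lip mask."""
    return src * alpha + dst * (1.0 - alpha)
```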
[0138] Figure 24B illustrates a second example of a lip shader module 7a-2 for applying colourisation to a portion of the captured image data, based on a corresponding portion of a lipstick detail texture 9-2-1. As in the first example, the lip mask 16a-1 defines the masked portion as the lips of a face in the captured image data. However, in this example, the lipstick shader module 7a-2 is configured to use a different set of shader sub-modules 32 than the first example above. Additionally, instead of applying the alpha blend to the captured and adjusted image data, an adjusted colour value for each pixel is output as the resulting colourised texture data along with a corresponding calculated alpha value for each pixel. Accordingly, as shown in Figure 24B, an alpha blend calculation sub-module 32-7 calculates the respective alpha blend values for the output texture portion by first receiving output data from a highlight adjustment sub-module 32-1 and a glitter adjustment sub-module 32-4, and adding the received data together at a step G18 based on an intensity property 9-1-7. The output of step G18 is then additively multiplied with data of the warped portion of the lip mask 16a-1 at step G15, and further processed in a subsequent saturation step G19. The intensity property 9-1-7 is also used by the glitter adjustment sub-module 32-4 as a further parameter to control the glitter adjustment.
[0139] A colour adjustment sub-module 32-3 is used to apply the lip colour property 9-1-1 to a greyscale version of the portion of the captured image data to determine the colour values for the output texture. In this example, the colour adjustment sub-module 32-3 is configured to apply a “hard light” blend at a modified step G13, to combine the lip colour property 9-1-1 with the greyscale captured image data. For example, the operation can apply the property b to each input pixel a as follows:
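The hard-light blend itself is not reproduced in this text; a common formulation, assumed here for illustration, is:

```python
def blend_hard_light(a, b):
    """Hard-light blend applying property value b to input pixel a
    (both in [0, 1]). This is the common formulation -- equivalent to
    the overlay blend with its operands swapped -- assumed here for
    illustration."""
    if b < 0.5:
        return 2.0 * a * b
    return 1.0 - 2.0 * (1.0 - a) * (1.0 - b)
```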
[0140] Figure 24C illustrates an example of a foundation shader module 7a-3 for applying colourisation to another portion of the captured image data, based on a corresponding warped portion of a face mask 16a-4. In this example, the face mask 16a-4 defines the masked portion as the skin portion of a face in the captured image data, for example as shown in Figures 7D and 34. As shown in Figure 24C, a blend colour adjustment sub-module 32-2 linearly interpolates the captured image data from the camera 5 with a blurred version of the captured image data, based on the weighted output of a smooth mask sub-module 32-7. The smooth mask sub-module 32-7 performs processing at a step G18 to add the face mask data 16a-4 with a ramped greyscale version of the captured image data, based on an intensity property 9-1-7 and a smooth property 9-1-8, and adjusts the saturation of the output at a step G19.
[0141] Figure 25 schematically illustrates an example process for generating a blurred version of the captured image data, which is particularly well suited to applying virtual foundation make-up in an augmented reality system 1. As shown in Figure 25, a blurring sub-module 32-8 receives the captured image data from the camera 5. At a step B3, the captured image data is blurred by downsampling the input image data to a lower resolution. At a step B4, a threshold function is applied to the pixel values of the captured image data, for example by a function:
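The threshold function is not reproduced in this text; one plausible soft-threshold sketch, whose cutoff and softness parameters are illustrative assumptions, is:

```python
def threshold(value, cutoff=0.5, softness=0.1):
    """Soft threshold on a pixel value in [0, 1]: values well below the
    cutoff map to 0, values well above map to 1, with a linear ramp in
    between. The cutoff and softness parameters are illustrative
    assumptions -- the exact function is not reproduced in the text."""
    lo, hi = cutoff - softness, cutoff + softness
    if value <= lo:
        return 0.0
    if value >= hi:
        return 1.0
    return (value - lo) / (hi - lo)
```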
[0142] At a step B5, the thresholded image data is multiplied by the face mask 16a-4, retrieved from the model database 21, to discard pixels outside the masked face region. At a step B6, the blurred image data is mixed with the result of step B5, resulting in the discarding of pixels outside the masked face region and the discarding of dark features from the input captured image data. At a step B7, the result of step B6 is alpha blended with the original captured image data. Advantageously, the blurring sub-module 32-8 outputs a resulting image with softened skin texture, while maintaining sharp facial features. Although the blurring process in Figure 25 is described as applied to the entire image as captured by the camera 5, it is appreciated that the blurring process can be applied to just the masked region of the captured image data for improved efficiency.
[0143] Figure 24D illustrates an example of a blusher and eyeshadow shader module 7a-4 for applying colourisation to yet other portions of the captured image data, based on a corresponding portion of an eye mask 16a-2 or a blusher mask 16a-3. In this example, the eye mask 16a-2 defines the masked portion as the eye portions of a face in the captured image data, and the blusher mask 16a-3 defines the masked portion as the cheek portions of a face in the captured image data for example as shown in Figures 7D and 34. As shown in Figure 24D, the colour values of the output texture portion are calculated by applying adjustments to the corresponding portion of the captured image data using the colour adjustment sub-module 32-3 and the blend colour adjustment module 32-2, similarly to the examples discussed above. The alpha blend calculation sub-module 32-7 calculates the corresponding alpha values for the output texture portion, based on the received output from the glitter adjustment sub-module 32-4, an intensity property 9-1-7, and the warped region of the blush or eye mask data 16a-3, in a similar manner as the examples discussed above.
Computer Systems [0144] The modules described herein, such as the training, tracking and colourisation modules, may be implemented by a computer system or systems, such as computer system 1000 as shown in Figure 26. Embodiments of the present invention may be implemented as programmable code for execution by such computer systems 1000. After reading this description, it will become apparent to a person skilled in the art how to implement the invention using other computer systems and/or computer architectures.
[0145] Computer system 1000 includes one or more processors, such as processor 1004. Processor 1004 may be any type of processor, including but not limited to a special purpose or a general-purpose digital signal processor. Processor 1004 is connected to a communication infrastructure 1006 (for example, a bus or network). Various software implementations are described in terms of this exemplary computer system. After reading this description, it will become apparent to a person skilled in the art how to implement the invention using other computer systems and/or computer architectures.
[0146] Computer system 1000 also includes a user input interface 1003 connected to one or more input device(s) 1005 and a display interface 1007 connected to one or more display(s) 1009. Input devices 1005 may include, for example, a pointing device such as a mouse or touchpad, a keyboard, a touchscreen such as a resistive or capacitive touchscreen, etc. After reading this description, it will become apparent to a person skilled in the art how to implement the invention using other computer systems and/or computer architectures, for example using mobile electronic devices with integrated input and display components.
[0147] Computer system 1000 also includes a main memory 1008, preferably random access memory (RAM), and may also include a secondary memory 1010. Secondary memory 1010 may include, for example, a hard disk drive 1012 and/or a removable storage drive 1014, representing a floppy disk drive, a magnetic tape drive, an optical disk drive, etc. Removable storage drive 1014 reads from and/or writes to a removable storage unit 1018 in a well-known manner. Removable storage unit 1018 represents a floppy disk, magnetic tape, optical disk, etc., which is read by and written to by removable storage drive 1014. As will be appreciated, removable storage unit 1018 includes a computer usable storage medium having stored therein computer software and/or data.
[0148] In alternative implementations, secondary memory 1010 may include other similar means for allowing computer programs or other instructions to be loaded into computer system 1000. Such means may include, for example, a removable storage unit 1022 and an interface 1020. Examples of such means may include a program cartridge and cartridge interface (such as that previously found in video game devices), a removable memory chip (such as an EPROM, or PROM, or flash memory) and associated socket, and other removable storage units 1022 and interfaces 1020 which allow software and data to be transferred from removable storage unit 1022 to computer system 1000. Alternatively, the program may be executed and/or the data accessed from the removable storage unit 1022, using the processor 1004 of the computer system 1000.
[0149] Computer system 1000 may also include a communication interface 1024. Communication interface 1024 allows software and data to be transferred between computer system 1000 and external devices. Examples of communication interface 1024 may include a modem, a network interface (such as an Ethernet card), a communication port, a Personal Computer Memory Card International Association (PCMCIA) slot and card, etc. Software and data transferred via communication interface 1024 are in the form of signals 1028, which may be electronic, electromagnetic, optical, or other signals capable of being received by communication interface 1024. These signals 1028 are provided to communication interface 1024 via a communication path 1026. Communication path 1026 carries signals 1028 and may be implemented using wire or cable, fibre optics, a phone line, a wireless link, a cellular phone link, a radio frequency link, or any other suitable communication channel. For instance, communication path 1026 may be implemented using a combination of channels.
[0150] The terms "computer program medium" and "computer usable medium" are used generally to refer to media such as removable storage drive 1014, a hard disk installed in hard disk drive 1012, and signals 1028. These computer program products are means for providing software to computer system 1000. However, these terms may also include signals (such as electrical, optical or electromagnetic signals) that embody the computer program disclosed herein.
[0151] Computer programs (also called computer control logic) are stored in main memory 1008 and/or secondary memory 1010. Computer programs may also be received via communication interface 1024. Such computer programs, when executed, enable computer system 1000 to implement embodiments of the present invention as discussed herein. Accordingly, such computer programs represent controllers of computer system 1000. Where the embodiment is implemented using software, the software may be stored in a computer program product 1030 and loaded into computer system 1000 using removable storage drive 1014, hard disk drive 1012, or communication interface 1024, to provide some examples.
[0152] Alternative embodiments may be implemented as control logic in hardware, firmware, or software, or any combination thereof.
Alternative Embodiments [0153] It will be understood that embodiments of the present invention are described herein by way of example only, and that various changes and modifications may be made without departing from the scope of the invention. Further alternative embodiments may be envisaged, which nevertheless fall within the scope of the following claims.
[0154] For example, it will be appreciated that although the respective processes and associated processing modules are described as separate embodiments, aspects of the described embodiments can be combined to form further embodiments. For example, alternative embodiments may comprise one or more of the object tracking, shape training, texture training and object colourisation and augmentation aspects described in the above embodiments.
[0155] In the worked exemplary embodiments described above, the visible feature detector is configured to compute characteristics of foundation, blush, eyeshadow and lipstick makeup products applied to respective regions of a source face in a captured image. As those skilled in the art will appreciate, the visible feature detector may be further configured to determine the absence of makeup products applied to one or more of the predefined visible features, whereby characteristics are not computed and provided for those makeup products that are not determined to have been applied to the face.
[0156] As another alternative, the augmentation system may be configured to output, to a user interface of the system, a plurality of candidate matching makeup products, and to allow the user to select the matching product to augment the captured image data. Additionally, the user interface may be configured to allow the user to enable or disable the virtual application of selected makeup products to the captured image data.
[0157] As yet another alternative, the source image processing module, the tracking module and/or the colourisation module may be provided as one or more distributed computing modules or processing services on a remote server that is in communication with the augmented reality system via a data network. Additionally, as those skilled in the art will appreciate, the source image processing module, the tracking module and/or the colourisation module functionality may be provided as one or more application programming interfaces (APIs) accessible by an application program executing on the augmented reality system, or as a plug-in module, extension, embedded code, etc., configured to communicate with the application program.

Claims (26)

1. A computer-implemented method of augmenting image data of a person’s face, the method comprising: receiving data of a source image and at least one target image captured by a camera, the source image including at least a portion of a source face including a region having a visible feature, and each target image including a corresponding visible feature of a target face; identifying a region of pixels in the source image associated with the visible feature based on a face model fitted to the source face in the source image; computing at least one characteristic of the visible feature based on pixel values of at least one of the pixels in the identified region of the source image; identifying a region of pixels in the or each target image associated with the corresponding visible feature based on the face model fitted to the target face in the or each target image; modifying pixel values of the identified region in the or each target image based on the computed at least one characteristic of the corresponding visible feature in the source image; and outputting the or each captured target image with the modified pixel values for display.
2. The method of claim 1, further comprising determining a location of the source face in the source image and extracting pixels of the source image corresponding to the located face.
3. The method of claim 2, wherein determining a location comprises modifying an instance of a stored face model to match the source face in the source image.
4. The method of claim 3, wherein the region is calculated based on the locations of a predefined plurality of vertices of the modified instance of the face model.
5. The method of claim 4, wherein computing the characteristics comprises computing an average of pixel values within the region.
6. The method of any preceding claim, further comprising retrieving colourisation parameters from a database based on the computed characteristics, the colourisation parameters defining values to augment said region of the captured image.
7. The method of claim 6, wherein the colourisation parameters include one or more texture data files, each associated with at least one characteristic of a visible feature of a face, and retrieving a matching texture data file based on the calculated at least one characteristic of the visible feature in the source image, wherein pixel values of the identified region in the or each target image are modified at least based on data values of the retrieved texture data file.
8. The method of claim 6 or claim 7, wherein the colourisation parameters further comprise mask data to determine one or more masked regions of said captured image.
9. The method of claim 8, wherein the colourisation parameters further comprise data defining at least one texture image defining values to augment said one or more masked regions of said captured image.
10. The method of claim 9, wherein the mask data defines at least one polygonal region defined by three or more vertices, wherein each vertex is associated with a corresponding labelled feature point of the model fitted to a face in a captured image.
11. The method of claim 10, further comprising determining a transformation of the at least one polygonal region of the mask data based on received coordinates of the corresponding feature points of the model fitted to the face in the captured image.
12. The method of claim 11, further comprising applying the determined transformation to corresponding regions of the texture image data defined by the at least one polygonal region of the mask data.
13. The method of any one of claims 6 to 12, wherein the colourisation parameters comprise data defining a mathematical model to generate an array of augmentation values.
14. The method of any one of claims 6 to 13, wherein the colourisation parameters further comprise data identifying one or more material properties.
15. The method of claim 14, wherein each material property is associated with one or more of a highlight adjustment, a colour adjustment, a glitter adjustment, a lighting model adjustment, a blend colour adjustment, and an alpha blend adjustment to the retrieved augmentation values.
16. The method of any one of claims 6 to 15, wherein the colourisation parameters further comprise data defining one or more shader modules to modify said pixel values based on the modified colourisation parameters.
17. The method of any preceding claim, wherein modifying the captured image data comprises alpha blending the results of augmenting captured image data with each of a plurality of modified retrieved colourisation parameter values in sequence.
18. The method of any preceding claim, wherein the visible features are one or more of a foundation, blusher, eyeshadow and lipstick makeup product applied to the source face.
19. The method of claim 18, wherein the characteristics comprise colour and intensity properties.
20. The method of any preceding claim, wherein the captured source and target image data are pre-processed to automatically correct white balance, levels and gamma of the pixel values.
21. The method of any preceding claim, wherein a plurality of source images in sequence are captured by the camera, each source image including the visible feature of the object, and further comprising calculating the average of the calculated characteristics of the visible feature based on each of the plurality of source images.
22. A computer-implemented method of augmenting image data, the method comprising modifying pixel values of one or more identified regions of a face in a target image based on augmentation parameters derived from corresponding identified regions of a face in a source image.
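A minimal stand-in for the method of claim 22 is per-region colour-statistics transfer: shift each identified target region's per-channel mean and spread towards those of the corresponding source region. This mean/standard-deviation matching is an assumed simplification for illustration, not the claimed derivation of augmentation parameters.

```python
import numpy as np

def transfer_region_colour(source_region, target_region):
    # Flatten each region to an (N, 3) list of RGB pixels in [0, 1].
    src = np.asarray(source_region, dtype=np.float64).reshape(-1, 3)
    tgt = np.asarray(target_region, dtype=np.float64).reshape(-1, 3)
    # Scale the target's per-channel spread to match the source's,
    # then re-centre on the source's per-channel mean.
    scale = src.std(axis=0) / (tgt.std(axis=0) + 1e-8)
    out = (tgt - tgt.mean(axis=0)) * scale + src.mean(axis=0)
    return np.clip(out, 0.0, 1.0).reshape(np.shape(target_region))
```

Run once per matched region pair (e.g. lips to lips, cheeks to cheeks), this transfers the source region's overall colour appearance while preserving the target region's texture variation.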
23. A system comprising means for performing the method of any one of claims 1 to 22.
24. A storage medium comprising machine readable instructions stored thereon for causing a computer system to perform a method in accordance with any one of claims 1 to 22.
25. A method substantially as hereinbefore described with reference to, or as illustrated in Figures 2 to 10 and 18 of the accompanying drawings.
26. An augmented reality system substantially as hereinbefore described with reference to, or as illustrated in Figures 1, 2 and 4 of the accompanying drawings.
GB1603665.9A 2016-03-02 2016-03-02 Augmenting object features in images Active GB2548088B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
GB1603665.9A GB2548088B (en) 2016-03-02 2016-03-02 Augmenting object features in images
PCT/GB2017/050568 WO2017149315A1 (en) 2016-03-02 2017-03-02 Locating and augmenting object features in images
US16/082,172 US11741639B2 (en) 2016-03-02 2017-03-02 Locating and augmenting object features in images
EP17715963.9A EP3423990A1 (en) 2016-03-02 2017-03-02 Locating and augmenting object features in images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1603665.9A GB2548088B (en) 2016-03-02 2016-03-02 Augmenting object features in images

Publications (3)

Publication Number Publication Date
GB201603665D0 GB201603665D0 (en) 2016-04-13
GB2548088A true GB2548088A (en) 2017-09-13
GB2548088B GB2548088B (en) 2022-05-11

Family

ID=55807209

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1603665.9A Active GB2548088B (en) 2016-03-02 2016-03-02 Augmenting object features in images

Country Status (1)

Country Link
GB (1) GB2548088B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10453204B2 (en) * 2016-12-06 2019-10-22 Adobe Inc. Image alignment for burst mode images
CN109584347B (en) * 2018-12-18 2023-02-21 重庆邮电大学 Augmented reality virtual and real occlusion processing method based on active appearance model
CN113592592B (en) * 2021-07-28 2023-11-07 严沛熙 Method for generating glasses frame fitting effect diagram and glasses frame virtual fitting system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070237421A1 (en) * 2006-03-29 2007-10-11 Eastman Kodak Company Recomposing photographs from multiple frames
WO2008102440A1 (en) * 2007-02-21 2008-08-28 Tadashi Goino Makeup face image creating device and method
EP2821959A1 (en) * 2013-02-01 2015-01-07 Panasonic Intellectual Property Management Co., Ltd. Makeup application assistance device, makeup application assistance method, and makeup application assistance program


Also Published As

Publication number Publication date
GB2548088B (en) 2022-05-11
GB201603665D0 (en) 2016-04-13

Similar Documents

Publication Publication Date Title
US10529078B2 (en) Locating and augmenting object features in images
US11741639B2 (en) Locating and augmenting object features in images
GB2548087A (en) Locating and augmenting object features in images
JP7200139B2 (en) Virtual face makeup removal, fast face detection and landmark tracking
JP2024500896A (en) Methods, systems and methods for generating 3D head deformation models
CN116997933A (en) Method and system for constructing facial position map
JP7462120B2 (en) Method, system and computer program for extracting color from two-dimensional (2D) facial images
CN113039576A (en) Image enhancement system and method
JP2024506170A (en) Methods, electronic devices, and programs for forming personalized 3D head and face models
GB2548088A (en) Augmenting object features in images
US20220398704A1 (en) Intelligent Portrait Photography Enhancement System
GB2550344A (en) Locating and augmenting object features in images
CN111066060B (en) Virtual facial makeup removal and simulation, fast face detection and landmark tracking
CN115100381A (en) Method and device based on three-dimensional face generation and image fusion
Heo et al. Consistent color and detail transfer from multiple source images for video and images
CN118115663A (en) Face reconstruction method and device and electronic equipment