CN101149795A - Specific object detection device - Google Patents

Specific object detection device

Info

Publication number
CN101149795A
CN101149795A, CNA2007101674514A, CN200710167451A
Authority
CN
China
Prior art keywords
determination
judgment
value
image
face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2007101674514A
Other languages
Chinese (zh)
Other versions
CN100565556C (en)
Inventor
Ai Haizhou (艾海舟)
Huang Chang (黄畅)
Wu Bo (武勃)
Lao Shihong (劳世红)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Omron Corp
Original Assignee
Tsinghua University
Omron Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University, Omron Corp filed Critical Tsinghua University
Priority to CNB2007101674514A priority Critical patent/CN100565556C/en
Publication of CN101149795A publication Critical patent/CN101149795A/en
Application granted granted Critical
Publication of CN100565556C publication Critical patent/CN100565556C/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a specific object detection device and method that determine, at high speed and with high precision, whether a given region of interest in an image contains a human face. The specific object detection device of this invention associates feature values with judgment values not through a single threshold but through the use of a table or the like. The correspondence between judgment values and feature values is therefore more accurate, and high-precision processing can be achieved. In addition, whereas conventionally the judgment was repeated many times and the final judgment was made from the combined results in order to ensure its accuracy, this invention reduces the number of repeated judgments and realizes high speed, owing to the improved accuracy of each individual judgment.

Description

Specific object detection device
This application is a divisional application of the Chinese patent application filed on May 14, 2004, with application number 200410038193.6, entitled "Specific object detection device".
Technical Field
The present invention relates to a technique effectively applied to an apparatus and a method for detecting a specific object or a part of an object such as a human body, an animal, or an object included in a captured image.
Background
As a conventional technique, there is a technique of detecting a specific object or a part of an object such as a person, an animal, or an object included in a captured image. As an example of such a conventional technique, there is a technique of detecting a face of a person from a captured image (see non-patent document 1).
In non-patent document 1, a specific rectangle (hereinafter referred to as "face determination rectangle") is moved in an image to be processed, and it is determined whether or not a face of a person is included in the face determination rectangle (hereinafter referred to as "attention area") at each moving point. Fig. 14 is a diagram showing an example of a face determination rectangle (face determination rectangle P1). The process of detecting the face of a person using the face determination rectangle P1 will be described with reference to fig. 14.
The face determination rectangle P1 includes a plurality of other rectangles (hereinafter referred to as "first rectangles" and "second rectangles") P2 and P3 within the rectangle. The first rectangle P2 and the second rectangle P3 are disposed at predetermined positions within the face determination rectangle P1. One or more first rectangles P2 and one or more second rectangles P3 are arranged in one face determination rectangle P1.
In the face detection process, the feature amounts of the regions (hereinafter referred to as "first feature region" and "second feature region", respectively) surrounded by the first rectangle P2 and the second rectangle P3 of each target region are calculated. The feature values of the first feature region and the second feature region are, for example, average values of pixel values in the respective regions.
Next, the difference between the feature amount La of the first feature region and the feature amount Lb of the second feature region is calculated. Then, it is determined whether or not the face of the person is included in the target region based on whether or not the difference value is larger than a predetermined threshold value α. The threshold value α is obtained by learning using a sample image.
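To make the conventional scheme concrete, the following is a minimal sketch, not taken from the patent itself, of the threshold-based determination for one face determination rectangle; the function names, the use of the mean pixel value as the feature, and the (x, y, w, h) rectangle encoding are illustrative assumptions.

```python
import numpy as np

def region_mean(image: np.ndarray, rect) -> float:
    """Feature amount of one region: the mean pixel value inside rect = (x, y, w, h)."""
    x, y, w, h = rect
    return float(image[y:y + h, x:x + w].mean())

def threshold_judgment(image, first_rect, second_rect, alpha) -> bool:
    """Conventional determination: compare the feature difference with one threshold."""
    la = region_mean(image, first_rect)   # feature amount La of the first feature region
    lb = region_mean(image, second_rect)  # feature amount Lb of the second feature region
    return (la - lb) > alpha              # True: the region of interest may contain a face
```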
In actual processing, such a determination is performed for each of a plurality of prepared patterns of the face determination rectangle. The patterns differ from one another in the number and positions of the first rectangles P2 and the second rectangles P3. A final determination as to whether or not the face of a person is included in the region of interest is then made based on the individual determination results.
Patent document 1 also adopts a technique of detecting a face of a person by calculating a difference between the feature amounts of the first feature region and the second feature region in this manner.
[ patent document 1 ] Japanese patent laid-open No. 2000-123148
[ Non-patent document 1 ] Paul Viola, Michael Jones, "Robust Real-time Object Detection", Second International Workshop on Statistical and Computational Theories of Vision: Modeling, Learning, Computing and Sampling, Vancouver, Canada, July 13, 2001.
Although the accuracy of detecting a person's face in an image is improved by applying the method using the face determination rectangle P1 as described above, there is a demand for performing this detection processing in real time even on low-specification devices such as mobile phones. There is also a continuing demand for further improving the accuracy of face detection in images.
Disclosure of Invention
The object of the invention is to solve the above-described problems by providing an apparatus and the like that realize high speed and high accuracy in the process of determining whether or not the face of a person is included in a certain region of interest in an image.
[ first mode ]
In order to solve the above problem, the present invention adopts the following configuration. A first aspect of the present invention is a specific object detection device including a storage device, a calculation device, and a determination device.
The storage device is used for storing each judgment value prepared corresponding to a plurality of characteristic quantities. The judgment value hereinafter means a value used in the judgment process of the judgment device. The determination value is, for example, a value indicating whether the probability that a specific object is included in the target area is high or low when the corresponding feature amount is calculated by the calculation device.
The calculation device calculates a feature amount in the region of interest. The region of interest referred to below is a region that is subjected to the determination of whether or not it includes a specific subject. That is, the region of interest is the region to be processed by the specific object detection device. The feature value is a value determined according to the state of the pixels included in the region of interest, for example the average, sum, or variance of the pixel values of all or some of the pixels in the region.
The determination device determines whether or not a specific subject is included in the region of interest, based on the determination value corresponding to the feature amount calculated by the calculation device, among the determination values stored in the storage device. For example, when the determination value corresponding to the calculated feature value is a value indicating a high possibility that a specific object is included in the target area, the determination device determines that the specific object is included in the target area.
In the first aspect of the present invention thus constituted, the determination value used in the determination process by the determination device is stored in the storage device as a value corresponding to each feature amount. Therefore, the determination value and the feature amount can be associated more accurately than in the case where the feature amount and the determination value are associated with each other based on one threshold value as in the past. Therefore, the determination device can more accurately determine whether or not a specific object is included in the region of interest for each given feature amount.
The first aspect may be modified as follows. That is, the first modification includes: means for referring to region pattern information that defines partial regions of an image; calculation means for calculating a feature value of the image by performing a predetermined operation based on the region pattern information; determination value storage means for storing, in combination, the feature amounts calculated for a plurality of sample images and the determination values of the attributes of the images for which those feature amounts were calculated; and determination means for determining whether or not an image has the attribute based on the feature amount calculated for that image.
In the modification 1, the feature amounts calculated for a plurality of sample images and the determination values of the attributes of the images for which the feature amounts are calculated are stored in combination. For example, the feature amount calculated for a sample image having a certain attribute (presence of a specific subject, etc.) and the determination value having the attribute are stored. On the other hand, the feature amount calculated for the sample image not having a certain attribute (presence of a specific subject, etc.) and the determination value not having the attribute are stored.
The combinations of feature amounts and determination values for a plurality of sample images may be stored in the determination value storage device in advance. A frequency distribution of the feature amount may be obtained over the sample images. For example, for a range of feature values in which the frequency of sample images having a certain attribute (presence of a specific subject, etc.) is equal to or greater than a predetermined value, a determination value indicating that the attribute is present may be stored. Conversely, for a range of feature values in which the frequency of sample images not having the attribute is equal to or greater than a predetermined value, a determination value indicating that the attribute is absent may be stored.
[ second mode ]
A second aspect of the present invention is a specific object detection device including a storage device, a calculation device, a first determination device, a control device, and a second determination device.
The storage device is used for storing each judgment value prepared corresponding to a plurality of characteristic quantities. The determination value stored in the storage device may be two values (for example, "0" or "1"), or may be a real number. In the case where the determination value is given as a real number, the accuracy of the determination processing by the first and second determination means is improved as compared with the case where the determination value is given as two values.
The calculation device calculates feature amounts in the same region of interest by a plurality of different calculation processes. The different calculation processes may differ in the type of value calculated (for example, average, sum, or variance), may differ in the input to the calculation (for example, data relating to different partial regions within the region of interest), or may differ in both.
The first determination device calculates a score based on the determination values corresponding to the feature amounts calculated by the calculation device, and determines whether or not a specific subject is included in the region of interest based on the score. For example, the first determination device may calculate the score by accumulating the determination values corresponding to the calculated feature amounts, and determine that the specific object is included in the region of interest when the score indicates a high possibility of its presence (for example, when the score exceeds a predetermined threshold).
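As a rough illustration of this score calculation, the following is a sketch under assumed data structures, not the patent's own implementation: each feature value is bucketed into fixed-width sections, and judgment_table[pattern_id] holds the stored judgment values of one pattern, indexed by section. SECTION_WIDTH and all names are illustrative assumptions.

```python
SECTION_WIDTH = 20  # assumed width of one feature-value section

def section(feature: float) -> int:
    """Index of the section a feature value falls in (negatives clamp to 0)."""
    return max(0, int(feature // SECTION_WIDTH))

def first_judgment(feature_group, judgment_table, score_threshold):
    """Accumulate the judgment values of one group of (pattern_id, feature)
    pairs into a score and compare it with a threshold."""
    score = 0.0
    for pid, f in feature_group:
        lut = judgment_table[pid]                    # judgment values for this pattern
        score += lut[min(section(f), len(lut) - 1)]  # clamp to the last section
    return score > score_threshold, score
```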
The control device gives the plurality of feature amounts obtained by the different calculation processes of the calculation device to the first determination device in the form of groups, and obtains from the first determination device a number of determination results sufficient for reaching a final determination. For each group given by the control device, the first determination device reads the determination values corresponding to the feature amounts in the group from the storage device, calculates a score from those values, and determines whether or not a specific subject is included in the region of interest. The first determination device therefore derives a plurality of determination results, which are not necessarily identical.
Further, the control device may dynamically determine whether the number of obtained determination results is sufficient for the final determination, or the number may be determined in advance. For example, the number of determination results sufficient for obtaining the final determination may be set in advance by executing a learning algorithm or the like, or based on the experience of the administrator. In that case, the calculation processes to be executed by the calculation device may also be set in advance.
The greater the number of determination results from the first determination device, the higher the accuracy of the final determination, that is, of the determination by the second determination device. A number sufficient for obtaining the final determination therefore means the number necessary to ensure a certain accuracy of the final determination.
The second judgment means finally judges whether or not the specific object is included in the region of interest based on the plurality of judgment results of the first judgment means obtained by the control means.
In the second aspect of the present invention configured as described above, the determination value used in the determination process by the first determination device is stored in the storage device for each feature amount. Therefore, the determination value can be associated with the feature amount more accurately than in the case where the feature amount and the determination value are associated with each other by one threshold value as in the conventional case. Therefore, the accuracy of the determination value can be improved, and the accuracy of the processing result of the first determination means that performs the processing using the determination value can also be improved. In other words, the first determination device can more accurately determine whether or not a specific subject is included in the target region based on each given feature value.
In addition, because the accuracy of each determination value is improved, the accuracy of the processing result can be maintained even if the number of feature amounts given to the first determination device in the form of a group is reduced, that is, even if the number of determination values employed by the first determination device is reduced. As a result, the number of feature values given as a group can be reduced while maintaining the accuracy of the processing result of the first determination device, and a higher speed can be achieved.
In addition, by improving the accuracy of the judgment result of the first judgment means, even if the number of judgment results of the first judgment means is reduced, the accuracy of the final judgment (the accuracy of the judgment of the second judgment means) can be ensured. That is, the number of times of determination processing by the first determination device is reduced to a sufficient number for obtaining the final determination. Therefore, the time required for final determination as to whether or not a specific object is included in the region of interest can be shortened. That is, the process of detecting a specific object can be speeded up without lowering the accuracy.
In the second aspect, when the first determination device receives a plurality of feature values from the control device as a new group and calculates a new score, it may be configured to use both the determination values of the feature amounts in the new group and the score calculated for the groups for which its determination processing has already been completed.
In the case of such a configuration, the first determination device affects the determination process not only by the determination value of the feature amount included in the group but also by the determination value in another group (the group in which the determination process by the first determination device has been completed). Therefore, the accuracy of the processing by the first judgment means can be improved. By improving the accuracy of the judgment results of the first judgment means, the accuracy of the final judgment can be ensured even if the number of the judgment results of the first judgment means is reduced. That is, the number of times of determination processing by the first determination means is reduced to a sufficient number for obtaining the final determination. Therefore, the time required for final determination as to whether or not a specific object is included in the region of interest can be shortened. That is, the process of detecting a specific object can be speeded up without lowering the accuracy.
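A sketch of that variant, under the same assumed structures as the earlier first_judgment sketch: the score of the groups already judged is carried into the next group, so earlier evidence keeps influencing the decision.

```python
def first_judgment_with_carry(feature_group, judgment_table,
                              score_threshold, carried_score=0.0):
    """Like first_judgment, but the score starts from the total of the
    groups for which the determination has already been completed."""
    score = carried_score
    for pid, f in feature_group:
        lut = judgment_table[pid]
        score += lut[min(section(f), len(lut) - 1)]
    return score > score_threshold, score  # pass `score` on to the next group
```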
[ third mode ]
A third aspect of the present invention is a specific object detection device including a storage device, a calculation device, a first determination device, a control device, and a second determination device.
The storage device stores, for each of a plurality of different patterns, the determination values prepared for the respective feature amounts. A determination value can therefore be obtained from a pattern and a feature amount.
The calculation device calculates feature values in the same region of interest from the images based on a plurality of different patterns.
The first judgment means obtains a judgment value corresponding to the pattern used by the calculation means and the feature amount calculated by the calculation means. Then, a score is calculated based on the obtained determination value, and whether or not a specific subject is included in the attention region is determined based on the score.
The control device gives the plurality of feature amounts to the first determination device in a group form, and acquires a sufficient number of determination results from the first determination device to obtain a final determination. The plurality of feature quantities are obtained by calculation processing based on a plurality of different patterns. That is, the feature amount of each pattern is calculated by the calculation means. Then, the first determination device is given the set of feature values obtained for each pattern, and the determination result of the first determination device is obtained.
The second judgment means finally judges whether or not the specific object is included in the region of interest based on the plurality of judgment results of the first judgment means obtained by the control means.
In the third aspect of the present invention thus constituted, the determination value used in the determination process by the first determination means is stored in the storage means as a value corresponding to each feature amount for each pattern. Therefore, the determination value and the feature value can be associated more accurately than in the case where the feature value and the determination value are associated with each other by one threshold value as in the conventional case. Therefore, the accuracy of each determination value can be improved, and the accuracy of the processing result of the first determination device that performs the processing using the determination value can also be improved. In other words, the first determination device can more accurately determine whether or not a specific object is included in the target region, based on the feature amount given to each pattern.
In addition, because the accuracy of each judgment value is improved, the accuracy of the processing result can be maintained even if the number of feature amounts given to the first judgment means in the form of a group is reduced, that is, even if the number of judgment values employed by the first judgment means is reduced. As a result, the number of feature quantities given as a group can be reduced while maintaining the accuracy of the processing result of the first determination device, and the speed can be increased.
In addition, by improving the accuracy of the judgment result of the first judgment means, even if the number of judgment results of the first judgment means is reduced, the accuracy of the final judgment (the accuracy of the judgment of the second judgment means) can be ensured. That is, even if the number of patterns employed in the calculation means and the first judgment means is reduced, the accuracy of the final judgment can be ensured. In other words, the number of determination results sufficient for obtaining the final determination is reduced, and the number of determination processes by the first determination device is reduced. Therefore, the time required for final determination as to whether or not a specific object is included in the region of interest can be shortened. That is, the process of detecting a specific object can be speeded up without lowering the accuracy.
In the storage device according to the third aspect of the present invention, the feature values divided into a plurality of sections may be stored in association with the determination values of the respective sections.
In the third aspect of the present invention, the determination value for each section may be a value obtained by the following determination criterion generating means. The judgment reference generating means includes sample image feature amount calculating means, frequency obtaining means, and judgment value determining means.
The sample image feature amount calculation device calculates the feature amount of each of a plurality of sample images from an arbitrary pattern. The frequency acquisition means obtains, for each of the plurality of sections of the feature amount, the frequency of sample images whose calculated feature amount falls within that section. The judgment value determining means then determines, for each section, whether or not a specific subject is likely to be included in a region of interest whose feature amount falls within that section, based on the frequencies of the section, and thereby determines the determination value.
In the third aspect of the present invention, the sample images may include correct images that contain the specific subject to be determined by the first determination device, and non-correct images that do not contain it.
In the third aspect of the present invention, the determination value for each section may be set based on a relative value of an index obtained for the correct images and for the non-correct images.
In the third aspect of the present invention, the determination value for each section may be set based on a relative value of the frequencies of the correct images and of the non-correct images in that section. The relative value is, for example, a ratio or a difference.
In addition, the pattern according to the third aspect of the present invention may have a structure in which: the pattern has a first characteristic region and a second characteristic region, and the position and size of each characteristic region are fixed in a specific region for each pattern.
In addition, the computing apparatus according to the third aspect of the present invention may be configured such that: the feature amount in the region of interest is calculated by calculating a relative value between a first feature amount of a first feature region and a second feature amount of a second feature region in the region of interest. The relative value is, for example, a ratio, a difference value, or the like.
In addition, the first determination device according to the third aspect of the present invention may be configured such that: when receiving the plurality of feature amounts from the control device as a new group and calculating a new score, the plurality of determination values of the plurality of feature amounts as the new group and the score calculated in the group for which the determination process by the first determination device has been completed are employed.
With such a configuration, the same effects as those of the corresponding configuration of the second aspect can be obtained.
[ fourth mode ]
A fourth aspect of the present invention is a criterion generating device including a calculating device, a frequency obtaining device, a judging device, and a criterion generating device.
The calculation means calculates each of the feature amounts of the plurality of sample images from an arbitrary figure.
The frequency acquisition means obtains, for each of a plurality of sections into which the range of the feature amount is divided, the frequency of sample images whose feature amount calculated by the calculation means falls within that section. The frequency is, for example, the number of such sample images, or that number multiplied by a weight set for each sample image.
The judgment means determines a judgment value for each section based on the frequencies of that section. That is, it determines whether or not a specific subject is likely to be included in a region of interest whose feature amount, calculated from the pattern, falls within the section, and sets the judgment value accordingly.
The judgment criterion generating means generates a judgment criterion in which each section and the judgment value are associated with each other, based on the judgment result of the judging means. As a specific example of such a determination criterion, there is a table in which each section and a determination value are associated with each other.
In the fourth aspect of the present invention thus constituted, since the table is generated in which the determination value is associated with each segment for each feature amount, the table in which the determination value is associated with the feature amount can be generated more accurately than in the case where the feature amount is associated with the determination value based on one threshold value as in the conventional art. Therefore, when the process of detecting a specific object is performed using the table, it is possible to more accurately determine whether or not the specific object is included in the region of interest.
(others)
The first to fourth aspects may be realized by the information processing apparatus executing a program. That is, the above-described operation and effects can be obtained by providing a program for causing an information processing apparatus to execute the processing executed by each of the apparatuses according to the first to fourth aspects, or a storage medium storing the program. Further, the above-described operation and effect can be obtained by a method in which the information processing device executes the processing executed by each of the devices of the first to fourth aspects.
According to the present invention, the determination value used in the determination process by the determination means is stored in the storage means for each feature amount. Therefore, the determination value and the feature amount can be associated more accurately than in the case where the feature amount and the determination value are associated with each other by one threshold value as in the conventional case. Therefore, the determination device can more accurately determine whether or not a specific object is included in the region of interest for each given feature value.
In addition, when a final judgment is made from a plurality of judgment results in order to ensure its accuracy, improving the accuracy of each judgment result that uses a judgment value makes it possible to ensure the accuracy of the final judgment even with fewer such judgment results. Therefore, the time required for the final determination as to whether or not a specific object is included in the region of interest can be shortened. That is, the process of detecting a specific object can be speeded up without lowering the accuracy.
Drawings
Fig. 1 is a diagram showing an example of a pattern of a face rectangle.
Fig. 2 is a diagram showing a flow of the face detection process.
Fig. 3 is a flowchart showing the face detection processing.
Fig. 4 is a diagram showing a method of selecting a target area when the size of the target area is fixed.
Fig. 5 is a diagram showing a method of selecting a target area when the size of a human image is fixed.
Fig. 6 is a diagram showing an example of processing of each layer in the first embodiment.
Fig. 7 is a diagram showing an example of an integral image.
Fig. 8 is a diagram showing an example of histograms of difference values and the number of images.
Fig. 9 is a diagram showing an example of determination values given to each section of the histogram in the first embodiment.
Fig. 10 is a diagram showing an example of the LUT according to the first embodiment.
Fig. 11 is a functional block diagram showing a configuration example of the face detection device.
Fig. 12 is a functional block diagram showing a configuration example of the determination unit.
Fig. 13 is a functional block diagram showing a configuration example of the table generating apparatus.
Fig. 14 is a diagram showing an example of a face determination rectangle.
Fig. 15 is a diagram showing an example of determination values given to each section of the histogram in the second embodiment.
Fig. 16 is a diagram showing an example of the LUT according to the second embodiment.
Fig. 17 is a schematic diagram illustrating processing of each layer according to the second embodiment.
Fig. 18 is a diagram showing a specific example of processing of each layer in the second embodiment.
In the figure: 1-face rectangle, 2-first rectangle, 3-second rectangle, 4a, 4 b-face detection means, 5-input section, 6-output section, 7a, 7b-LUT storage section, 8a, 8 b-determination section, 9-setting storage section, 10-feature amount calculation section, 11a, 11 b-first determination section, 12-control section, 13a, 13 b-second determination section, 14a, 14 b-table generation means, 15-feature amount calculation section, 16-number obtaining section, 17a, 17 b-determination section, 18a, 18 b-table generation section, 19a, 19b-LUT, P1-face determination rectangle, P2-first rectangle, P3-second rectangle.
Detailed Description
Embodiments of the specific object detection device and the like according to the present invention will be described below with reference to the drawings. In the following description, the face detection device 4 (including 4a and 4b: see fig. 11) that detects a face of a person from a person image will be described as a specific example of the specific object detection device.
In this description, a person image is an image including at least part or all of the face of a person. Accordingly, the person image may be an image of an entire person, or an image of only the face or the upper body of a person. It may include a plurality of persons, and its background may contain any scenery, patterns, and the like other than persons.
The following description of the face detection device 4 is merely an example, and the configuration thereof is not limited to the following description.
[ principle of face detection ]
First, the principle of the face detection technique applied to the face detection device 4 will be explained. The technique applied to the face detection device 4 is a modification of a conventional face detection technique, so the principle of the conventional technique is described here. In this conventional technique, learning using sample images is performed in advance (hereinafter referred to as "learning processing"), and faces are detected based on the learning result (hereinafter referred to as "face detection processing").
[ learning processing ]
First, the conventional learning process using sample images will be described. As sample images, a plurality of face images (correct images) and non-face images (non-correct images) of the same size are prepared. Each sample image is a rectangular image having the same number of pixels vertically and horizontally. A face image is an image containing a person's face, with its aspect ratio adjusted or its margins trimmed according to the size of the face; a non-face image is an image containing no person's face, such as an image of a landscape or of an animal. In the face detection device 4, face images are prepared as correct images because the specific subject to be detected is a person's face, and non-face images are prepared as non-correct images. In another specific object detection device, images containing the specific object to be detected by that device would be prepared as correct images, and images not containing it as non-correct images.
In the learning process, a rectangle surrounding a region of the same size as the sample images (hereinafter referred to as a "face rectangle") is used. Fig. 1 is a diagram showing examples of the face rectangle. The face rectangle 1 includes first rectangles 2 and second rectangles 3, and has a plurality of patterns (corresponding to (a) to (l) in fig. 1) that differ in the number and positions of the first rectangles 2 and second rectangles 3. That is, each pattern of the face rectangle 1 has its own number and arrangement of first rectangles 2 and second rectangles 3. Next, learning using the face rectangle 1 and the sample images will be described.
First, data collection using all sample images is performed for the face rectangle 1 of a certain pattern. In this data collection, the feature amounts (for example, the average pixel values) of the regions of the sample image corresponding to the first rectangle 2 and the second rectangle 3 (hereinafter referred to as the "first feature region" and "second feature region", respectively) are calculated. When one face rectangle 1 includes a plurality of first feature regions and/or second feature regions, the total of the feature amounts of the respective regions is used as the feature amount of that kind; for example, in the case of fig. 1 (j), the feature amount of the first feature region is the sum of the feature amounts of the two first feature regions. Then, a relative value (for example a ratio or a difference; here a difference is used) between the feature amount of the first feature region and that of the second feature region is calculated. This difference value represents the feature amount of the region of interest.
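A sketch of this feature computation follows, reusing region_mean from the earlier sketch; the function name and rectangle lists are hypothetical, and a pattern may contribute several first and second rectangles whose feature amounts are summed.

```python
def rectangle_feature(image, first_rects, second_rects):
    """Difference value of one pattern: the summed feature of all first
    rectangles minus the summed feature of all second rectangles
    (e.g. fig. 1 (j) has two first rectangles)."""
    la = sum(region_mean(image, r) for r in first_rects)
    lb = sum(region_mean(image, r) for r in second_rects)
    return la - lb  # feature amount of the region of interest
```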
Then, a threshold corresponding to the face rectangle 1 of each pattern is obtained from the calculated difference values (the feature amounts of the regions of interest). The threshold is determined using a probabilistic approach, usually by assuming a simple mathematical model (e.g., a Gaussian distribution). For example, for each candidate value, the cumulative number of face-image samples and of non-face-image samples whose difference values lie between 0 and that value is obtained, and the value at which the difference between the two cumulative totals is largest is set as the threshold.
The above-described processing is performed on all the prepared face rectangles 1 of the figures, and threshold values corresponding to the face rectangles 1 of all the figures are set.
Next, it is determined which of the plurality of face rectangles 1 for which thresholds have been set should be used in the face detection process. The specific object detection device determines the presence or absence of a face layer by layer: for example, if the determination of layer 1 finds no possibility that a face is present, the processing stops; if a face may be present, a more detailed determination is made in the next layer, layer 2.
For this purpose, the face rectangles 1 of the patterns to be used in each layer are assigned to the plurality of layers in which the presence or absence of a face is determined in the face detection process (a specific example of a layer is described in the section on the face detection process). This assignment is implemented by a boosting learning algorithm such as AdaBoost.
The number of layers required for the face detection process and the number of face rectangles 1 assigned to each layer are determined by the designer. Since the accuracy of the face detection process improves as more face rectangles 1 are used, the designer determines, based on experiments, experience, and the like, a number of face rectangles 1 sufficient for obtaining the final judgment, and from this determines the number of layers and the number of face rectangles 1 allocated to each layer. These numbers are chosen according to the speed and accuracy required of the face detection process.
[ face detection processing ]
Next, the conventional face detection process will be described. Fig. 2 is a diagram showing the flow of the face detection process. First, the general flow will be described with reference to fig. 2.
The face detection processing is performed through a plurality of layers. A different combination of face rectangles 1 is assigned to each layer; in fig. 2, the number of face rectangles 1 assigned to each layer also differs. Further, an order of execution is assigned to each layer, and the layers perform their determinations in that order: in fig. 2, the determination of Layer 2 is performed after that of Layer 1, and the determination of Layer 3 after that of Layer 2.
Each layer determines, in its assigned order, whether or not a person's face is included in the region of interest, using the face rectangles 1 of the patterns assigned to it. When a layer determines that the region of interest contains no face, the subsequent layers do not examine that region. Only when the last layer (Layer n in fig. 2) determines that a face is included is it finally determined in the face detection process that the region of interest contains a person's face.
Fig. 3 is a flowchart showing the flow of the face detection process. Next, a specific flow of the face detection process will be described with reference to fig. 3.
In the face detection process, first, a region of interest to be processed is selected from the person image (S01). Basically, regions of interest are selected in order by shifting the region at constant intervals in the vertical or horizontal direction from an edge of the person image, for example by raster-scanning the person image. Regions of interest of a plurality of sizes are selected from a given person image. For this there are two methods: fixing the size of the region of interest and changing the size of the person image, or fixing the size of the person image and changing the size of the region of interest. Fig. 4 shows the method in which the size of the region of interest is fixed, and fig. 5 the method in which the size of the person image is fixed. When the size of the region of interest changes, the sizes of the face rectangle 1, the first rectangle 2, and the second rectangle 3 change accordingly: the size of the face rectangle 1 used in each layer is kept the same as, or substantially the same as, the size of the region of interest, and the sizes of the first rectangle 2 and the second rectangle 3 follow the size of the face rectangle 1.
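The following sketch illustrates the first selection method (fixed window, shrinking person image). The window size, step, scale factor, and nearest-neighbour resizing are illustrative assumptions, not values from the patent.

```python
import numpy as np

def resize_nn(img: np.ndarray, new_h: int, new_w: int) -> np.ndarray:
    """Nearest-neighbour resize, sufficient for a sketch."""
    ys = (np.arange(new_h) * img.shape[0] // new_h).astype(int)
    xs = (np.arange(new_w) * img.shape[1] // new_w).astype(int)
    return img[ys][:, xs]

def iter_regions(image: np.ndarray, window=24, step=4, scale=1.25):
    """Raster-scan a fixed-size window over successively shrunken copies of
    the person image; `factor` maps window coordinates back to the original."""
    factor = 1.0
    while min(image.shape[:2]) >= window:
        h, w = image.shape[:2]
        for y in range(0, h - window + 1, step):
            for x in range(0, w - window + 1, step):
                yield image[y:y + window, x:x + window], (x, y, factor)
        factor *= scale
        image = resize_nn(image, int(h / scale), int(w / scale))
```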
Next, a determination is made as to whether or not the face of the person is included in the selected attention area. The determination is performed in each of the plurality of layers. Therefore, the layer to be determined is selected in order (S02).
Next, determination processing is performed on the selected layer (S03). If it is determined in the layer's determination that the face of a person is not included in the region of interest (S04-NO), the processing from S07 onward, described later, is performed. On the other hand, when it is determined that the face of a person is included in the region of interest (S04-YES), it is determined whether or not the determination just performed (S03) was that of the last layer. If it was not the last layer (S05-NO), the process returns to S02, the next layer is selected, and the determination process is performed on the newly selected layer. If it was the last layer (S05-YES), it is finally judged that the face of a person is included in the current region of interest (S06). At this point, the face detection device 4 has determined that the region of interest contains a person's face, that is, it has detected a face.
Next, it is determined whether or not the attention area to be determined is the last attention area in the human image. If the target area is not the last target area (no in S07), the process returns to S01, and the next target area is selected and the process from S02 onward is performed. On the other hand, if the current region is the last region of interest (S07-YES), the face detection process for the human image is terminated.
Fig. 6 is a diagram showing an example of the determination processing of a layer. Next, the contents of the determination processing of each layer will be described with reference to fig. 6.
One or more patterns of the face rectangle 1 are assigned to each layer. This assignment is implemented in the learning process by a boosting learning algorithm such as AdaBoost. Each layer determines whether or not a face is included in the region of interest based on the face rectangles 1 of the patterns assigned to it.
In each layer, the feature amounts of the first feature region and the second feature region in the region of interest are calculated from the face rectangle 1 of each pattern assigned to that layer. When the feature amount is the sum or average of the pixel values in a region, that is, a value computed from the sum of pixel values, it can be calculated at high speed using an integral image. Fig. 7 is a diagram showing an example of an integral image. The calculation of the feature amount using the integral image will be described with reference to fig. 7.
In an integral image, each pixel stores, as its value, the sum of the pixel values of all pixels of the original image above and to the left of it, inclusive. For example, pixel a in fig. 7 stores the sum of the pixel values of all pixels contained in region A of the original image. Therefore, the sum of the pixel values of all pixels contained in region D of the original image (that is, the feature amount of region D) can be calculated by taking the value of pixel d, subtracting the values of pixels b and c, and adding the value of pixel a.
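A compact sketch of the integral image and the four-corner lookup just described (numpy assumed; the border guards handle rectangles touching the top or left edge):

```python
import numpy as np

def integral_image(image: np.ndarray) -> np.ndarray:
    """Each entry is the sum of all original pixels above and to the left
    of it, inclusive."""
    return image.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def region_sum(ii: np.ndarray, x: int, y: int, w: int, h: int) -> int:
    """Sum of the original pixels in the w-by-h rectangle with top-left
    corner (x, y): d - b - c + a, as in the region-D example of fig. 7."""
    d = ii[y + h - 1, x + w - 1]
    b = ii[y - 1, x + w - 1] if y > 0 else 0
    c = ii[y + h - 1, x - 1] if x > 0 else 0
    a = ii[y - 1, x - 1] if x > 0 and y > 0 else 0
    return int(d - b - c + a)
```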
Next, the difference value, which is the relative value of the calculated feature amounts, is computed, and whether the face of a person is included in the region of interest is judged based on this difference value. Specifically, it is determined whether the calculated difference value is greater than or less than the threshold set for the face rectangle 1 of the pattern used in the determination, and the presence or absence of a face in the region of interest is judged from the result.
However, the determination at this stage is a determination by the face rectangle 1 of an individual pattern, not the determination of the layer. In each layer, such individual determinations are made for the face rectangles 1 of all assigned patterns, and the respective results (corresponding to the "individual determination of face rectangles" in fig. 6) are obtained.
Next, the score of the layer is calculated. A score (e.g., pt1, pt2, ..., ptn) is assigned to the face rectangle 1 of each pattern. When it is determined that the face of a person is included in the region of interest, the score assigned to the face rectangle 1 of the pattern used at that time is added to the score of the layer. The total of the added scores is the score of the layer (hereinafter referred to as the "total score" to distinguish it from the score of each pattern). When the total score of the layer exceeds a specific threshold, the layer determines that the face of a person is included in the region of interest; otherwise, the layer determines that no face is included.
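Schematically, a layer's determination might look like the following sketch; the data classes are assumptions for illustration, with `judge` standing in for the individual determination of one pattern's face rectangle.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class RectPattern:
    judge: Callable[..., bool]  # individual judgment of one face rectangle
    score: float                # pt_i assigned to this pattern

@dataclass
class Layer:
    patterns: List[RectPattern]
    threshold: float            # threshold for the total score

def layer_judgment(image, roi, layer: Layer) -> bool:
    """Add pt_i for every pattern whose individual judgment is 'face',
    then compare the total score with the layer threshold."""
    total = sum(p.score for p in layer.patterns if p.judge(image, roi))
    return total > layer.threshold  # True: pass the region to the next layer
```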
In S02 to S06 (see fig. 3), the determination is performed layer by layer, in ascending order of the processing load of the layers (for example, layers to which fewer face rectangles 1 are assigned come first) (see fig. 2). The structure may also be such that, before the determination of each layer, the variance of the brightness in the region of interest is calculated, and whether or not to perform the layer determinations is decided from that value. When it is decided not to perform them, the process of S07 in fig. 3 is performed. This is because a region of interest with little change in brightness (for example, a completely black or completely white region) can be considered not to contain the kind of face the layers are meant to determine.
[ first embodiment ]
[ principle ]
The conventional face detection technique on which the face detection device 4 is based has been described above. Next, the processing that the face detection device 4a of the first embodiment changes relative to this technique will be described. Only the changed processing is described below; processing not described here is the same as in the face detection technique described above.
In the conventional face detection technique, a simple mathematical model is assumed when calculating the threshold for the face rectangle 1 of each pattern, so no attention is paid to the shape that the histogram of difference values versus sample counts of face images and non-face images actually has. For example, in the case of the face rectangle 1 shown at the top of fig. 6, the feature amount of the first feature region is calculated around the left and right eyes, and the feature amount of the second feature region around the nose and the left and right cheeks.
Conventionally, the feature amounts of such feature regions were assumed to follow a simple mathematical model, and the threshold was calculated on that basis. In practice, however, for the first feature region of this example, at least three cases in which the feature amount changes greatly can be assumed: both eyes closed, one eye closed, and both eyes open. For the second feature region, the cheeks and nose are raised portions of the face, so the feature amount can also change greatly depending on the degree of the raised portions and the state of the skin, for example whether the reflection of light is conspicuous or not. On the basis of such assumptions, the face detection device 4a therefore assumes a distribution having a plurality of peaks, instead of a simple distribution such as a Gaussian distribution.
Fig. 8 is a diagram showing an example of a histogram of the difference values collected in the learning process. The histogram shown relates to the face rectangle 1 of one particular pattern; a similar histogram is formed for the face rectangle 1 of every pattern.
The abscissa of the histogram represents the difference value between the feature amount of the first feature region and that of the second feature region. The ordinate represents the number (frequency) of sample images for which the corresponding difference value was calculated. The correct distribution is the distribution over the face-image samples, and the non-correct distribution the distribution over the non-face-image samples.
In the learning process of the first embodiment, when the histogram is formed, the abscissa is divided into sections. The sections may all have the same width, or their widths may differ depending on the difference value. In each section, a determination value is then obtained from the value of the correct distribution (the frequency of face images) and the value of the non-correct distribution (the frequency of non-face images). Fig. 9 shows how the determination value of each section is obtained from the histogram. The determination value indicates whether an image whose difference value falls in that section is likely to be a face image. For example, the determination value is "1" in sections where the probability of a face image is high (the light sections in fig. 9) and "0" in sections where it is low (the dark sections in fig. 9); concretely, it is "1" when the frequency of the correct distribution in the section exceeds that of the non-correct distribution, and "0" otherwise.
Then, an LUT (Look-Up Table) 19a is created from the histogram. Fig. 10 is a diagram showing an example of the LUT19a. The LUT19a holds a determination value for each section of the difference value. In the modified face detection processing, the face of a person in an image is detected not based on a threshold but based on the LUT19a created by the learning processing.
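The construction of such an LUT19a can be sketched as follows; numpy is assumed, and the fixed section width and count, as well as restricting to non-negative difference values, are simplifications for illustration.

```python
import numpy as np

def build_lut(face_diffs, nonface_diffs, n_sections=32, width=20):
    """One judgment value per difference-value section: 1 where the
    face-image frequency exceeds the non-face-image frequency, else 0."""
    bins = np.arange(n_sections + 1) * width          # section boundaries
    face_hist, _ = np.histogram(face_diffs, bins=bins)
    nonface_hist, _ = np.histogram(nonface_diffs, bins=bins)
    return (face_hist > nonface_hist).astype(int)     # LUT indexed by section
```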
The above-described processing is performed on the face rectangles 1 of all the prepared figures, and the LUTs 19a corresponding to the face rectangles 1 of all the figures are created.
Next, it is determined which of the obtained LUTs 19a are used in the face detection process, that is, with which patterns of the face rectangle 1 the face detection processing should be performed. In this determination, the face rectangles 1 of the patterns used in each layer are assigned to the plurality of layers in which face detection is performed. This assignment is implemented by a boosting learning algorithm such as AdaBoost.
In each layer of the face detection process by the face detection device 4a, the feature amounts of the first feature region and the second feature region in the region of interest are calculated from the face rectangle 1 of each pattern assigned to that layer, and the difference value of the feature amounts is computed. Whether or not the face of a person is included in the region of interest is then determined based on this difference value: the determination value corresponding to the difference value is read from the LUT19a of the face rectangle 1 of the pattern, and the judgment is made from that value. For example, in a determination using the face rectangle 1 of the pattern corresponding to the LUT19a shown in fig. 10, if the difference value is 40 or more and less than 60, 100 or more and less than 120, or 140 or more and less than 160, it is determined that the face of a person is not included in the region of interest; if the difference value is 60 or more and less than 100, or 120 or more and less than 140, it is determined that the face of a person is included.
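Using the LUT at detection time then amounts to one bucketing and one table lookup per pattern, as in this sketch, which reuses rectangle_feature and the assumed section width from the earlier sketches:

```python
def lut_judgment(image, first_rects, second_rects, lut, width=20):
    """Judge one pattern by its LUT instead of a single threshold."""
    diff = rectangle_feature(image, first_rects, second_rects)
    idx = max(0, min(int(diff // width), len(lut) - 1))  # clamp to valid sections
    return lut[idx] == 1  # 1: face likely in this section, 0: unlikely
```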
In this way, the face detection device 4a performs, by using the LUT19a, which is created on the assumption of a distribution having a plurality of peaks, the determination process that was conventionally performed with a threshold value set on the assumption of a simple distribution.
[ System Structure ]
< face detection device >
Next, the configuration of the face detection device 4a, to which the face determination technique including the above-described changes is applied, will be described in comparison with the conventional technique. In hardware terms, the face detection device 4a includes a CPU (central processing unit), a main storage device (RAM), an auxiliary storage device, and the like, connected through a bus. The auxiliary storage device is constituted by a nonvolatile storage device. The nonvolatile storage device referred to here includes so-called ROMs (Read-Only Memory; EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), mask ROM, and the like), FRAM (Ferroelectric RAM), and hard disks.
Fig. 11 is a functional block diagram of the face detection device 4a. The face detection device 4a loads various programs (OS, application programs, and the like) stored in the auxiliary storage device into the main storage device, where they are executed by the CPU; through this execution it functions as a device including the input unit 5, the output unit 6, the LUT storage unit 7a, the determination unit 8a, the setting storage unit 9, and so on. Next, each functional unit of the face detection device 4a will be described with reference to Fig. 11.
< Input unit >
The input unit 5 functions as an interface for inputting data of an original person image (hereinafter referred to as "original image data") into the face detection device 4a. The original image data may be still-image data or moving-image data. The input unit 5 inputs the original image data into the face detection device 4a from outside the device, and may employ any existing technique for doing so.
For example, the original image data may be input to the face detection device 4a via a network (e.g., a local area network or the Internet). In this case, the input unit 5 may be configured with a network interface. The original image data may also be input to the face detection device 4a from a digital camera, a scanner, a personal computer, a storage device (e.g., a hard disk drive), or the like. In this case, the input unit 5 may be configured in accordance with a standard for connecting the digital camera, personal computer, storage device, or the like to the face detection device 4a so as to enable data communication (for example, a wired-connection standard such as USB (Universal Serial Bus) or SCSI (Small Computer System Interface), or a wireless standard such as Bluetooth). Furthermore, original image data stored on a storage medium (for example, various flash memories, a Flexible Disk (registered trademark), a CD (Compact Disc), or a DVD (Digital Versatile Disc, Digital Video Disc)) may be input to the face detection device 4a. In this case, the input unit 5 may be configured as a device for reading data from the storage medium (for example, a flash memory reader, a flexible disk drive, a CD drive, or a DVD drive).
The face detection device 4a may also be included in an imaging device such as a digital camera, or in a device equipped with imaging functions (e.g., a PDA (Personal Digital Assistant)), and the captured person image may be input to the face detection device 4a as the original image data. In this case, the input unit 5 may be configured with a CCD (Charge-Coupled Device) or CMOS (Complementary Metal-Oxide-Semiconductor) sensor, or as an interface for inputting original image data captured by such a sensor into the face detection device 4a. Alternatively, a person image input to an image output device may be input to the face detection device 4a as the original image data. In this case, the input unit 5 may be configured to convert the original image data input to the image output device into data that the face detection device 4a can process.
The input unit 5 may be configured to be suitable for the above-described cases.
< Output unit >
The output unit 6 functions as an interface for outputting data indicating whether or not the determination unit 8a has detected a human face and/or data indicating the position, size, and the like of the detected face to the outside of the face detection device 4 a. The output unit 6 may be any conventional configuration that outputs data on the result of detecting the face of a person from the face detection device 4 a.
For example, data relating to the detection result may be output from the face detection device 4a via a network. In this case, the output unit 6 is configured with a network interface. Alternatively, the data may be output to another information processing device, such as a personal computer, or to a storage device. In this case, the output unit 6 is configured in accordance with a standard for connecting the other information processing device or storage device to the face detection device 4a so as to enable data communication. Alternatively, the data may be output (written) to a storage medium. In this case, the output unit 6 is configured as a device for writing data to the storage device or storage medium (for example, a flash memory writer, a flexible disk drive, a CD-R drive, or a DVD-R drive).
Specific examples of how the data output by the output unit 6 may be used will also be described. For example, the data may be used to render a graphic indicating the face region detected by the face detection device 4a on a display device such as a monitor. In this case, the output unit 6 may be configured as an interface for data communication with the display device, or as an interface to a built-in information processing device connected to the display device. Also, when the face detection device 4a is incorporated in a digital camera or a device including one, the digital camera may perform imaging-related control, such as focus control and exposure compensation, with reference to the data output by the output unit 6. In this case, the output unit 6 may be configured, for example, as an interface capable of data communication with the information processing device inside the digital camera. Also, when the face detection device 4a is included in, or connected to, an information processing device that performs image correction processing, that information processing device may determine the processing region, processing content, and the like of the image correction processing on the basis of the data output by the output unit 6. In this case, the output unit 6 may be configured, for example, as an interface capable of data communication with that information processing device or its internal devices.
The output unit 6 may be configured to be suitable for the above-described cases.
< LUT storage unit >
The LUT storage unit 7a is configured using a nonvolatile storage device. The LUT storage unit 7a stores the LUTs 19a used by the determination unit 8a when performing the face detection process. That is, it stores the LUT19a of each pattern's face rectangle 1 obtained as a result of the learning process. Accordingly, the LUT storage unit 7a may store a plurality of LUTs 19a.
< Determination unit >
The determination unit 8a performs face detection processing based on the setting contents stored in the setting storage unit 9 by using the LUT19a stored in the LUT storage unit 7 a. The determination section 8a supplies the result of the face detection processing to the output section 6. The determination unit 8a performs data input and output to and from the input unit 5, the output unit 6, the LUT storage unit 7a, and the setting storage unit 9 via an input unit and an output unit, which are not shown.
The determination unit 8a is realized by the CPU executing a face detection program. The determination unit 8a may also be implemented as a dedicated chip.
Fig. 12 is a functional block diagram showing the inside of the determination unit 8 a. The function of the determination unit 8a will be described with reference to fig. 12. The determination unit 8a includes a feature amount calculation unit 10, a first determination unit 11a, a control unit 12, and a second determination unit 13a.
The feature amount calculation unit 10 calculates the feature amounts of the first feature region and the second feature region used in each layer. Then, the feature amount calculation unit 10 calculates a relative value (here, a difference value) of the two calculated feature amounts.
The first determination unit 11a obtains the determination value corresponding to the difference value calculated by the feature amount calculation unit 10, and determines, on the basis of one or more such determination values, whether or not the face of a person is included in the region of interest. Specifically, the first determination unit 11a reads out the LUT19a corresponding to the pattern used when the feature amount calculation unit 10 calculated the feature amounts, and acquires from it the determination value corresponding to the calculated difference value. The first determination unit 11a obtains the determination values for all the patterns assigned to a layer, calculates the total score of that layer from these determination values, and determines whether or not the face of a person is included in the region of interest (corresponding to S03 and S04 in Fig. 3).
The control unit 12 reads the various settings stored in the setting storage unit 9 and gives the positions, sizes, and the like of the first rectangle 2 and second rectangle 3 of each pattern to the feature amount calculation unit 10. The control unit 12 gives the first determination unit 11a the feature amounts calculated by the feature amount calculation unit 10 and the LUT19a corresponding to the pattern used in calculating them, and gives the result of the determination by the first determination unit 11a to the second determination unit 13a. In addition, the control unit 12 performs selection of the region of interest (corresponding to S01 in Fig. 3) and control of the operation of the determination unit 8a (corresponding to S02, S05, and S07 in Fig. 3).
The second determination unit 13a performs a final determination as to whether or not a face is included in the current region of interest, based on the result of the first determination unit 11a, that is, based on the determination result of each layer (corresponding to S06 in fig. 3).
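To make the division of labor among the control unit 12, the first determination unit 11a, and the second determination unit 13a concrete, here is a hedged sketch of the overall flow for one region of interest. The early-exit behavior is the usual reading of a layered detector of this kind, not a statement of the patent's exact control flow, and the function names are hypothetical.

```python
def detect_face_in_region(layer_decisions, region):
    """layer_decisions is an ordered list of per-layer functions, each playing
    the role of the first determination unit 11a for one layer (S03/S04).
    The loop corresponds to the control unit stepping through the layers
    (S02/S05); the return corresponds to the second determination unit 13a
    making the final determination (S06)."""
    for decide in layer_decisions:
        if not decide(region):
            return False  # rejected by this layer; most non-face regions exit early
    return True  # every layer passed: final determination is "face"
```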
< Setting storage unit >
The setting storage unit 9 is configured using a nonvolatile storage device and stores the various settings used when the determination unit 8a performs the face detection process. For example, it stores the face rectangle 1 of each pattern, specifically the positions, sizes, and the like of the first rectangle 2 and the second rectangle 3 of each pattern. It also stores which patterns' face rectangles 1 are assigned to each layer, as well as the method of selecting regions of interest from the person image.
< Table creation device >
Next, the configuration of the table generation device 14a that generates the LUT19a used by the face detection device 4a will be described. The table generation device 14a includes, in terms of hardware: a CPU (central processing unit), a main storage device (RAM), an auxiliary storage device, etc. connected by a bus. The auxiliary storage device is constituted by a nonvolatile storage device.
Fig. 13 is a functional block diagram of the table generation device 14a. The table generation device 14a loads various programs (OS, application programs, and the like) stored in the auxiliary storage device into the main storage device, where they are executed by the CPU; through this execution it functions as a device including the feature amount calculation unit 15, the frequency obtaining unit 16, the determination unit 17a, and the table generation unit 18a.
The table generation device 14a as a whole executes the modified learning process. Next, each functional unit included in the table generation device 14a will be described with reference to Fig. 13.
The feature amount calculation unit 15 calculates, for each sample image, the feature amounts of the first feature region and the second feature region in accordance with a pattern, and then calculates their difference value as a relative value of the feature amounts. The definition of each pattern (the size, position, and the like of each feature region) may be stored in the feature amount calculation unit 15 or in another functional unit (not shown).
The frequency obtaining unit 16 obtains a positive solution distribution and a non-positive solution distribution from the calculation result of the feature amount calculating unit 15. The frequency obtaining unit 16 obtains the frequency of each section for the positive solution distribution and the non-positive solution distribution.
The determination unit 17a determines the determination value of each section based on the frequencies of each section of the positive solution distribution and the non-positive solution distribution obtained by the frequency obtaining unit 16.
The table generation unit 18a generates the LUT19a that associates the determination values obtained by the determination unit 17a with the sections. The table generation unit 18a also executes a boosting learning algorithm to determine which LUTs 19a should be used in the face detection device 4a, and assigns them to the layers.
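The text names the learning algorithm only as boosting "such as AdaBoost", so the following discrete-AdaBoost round, one standard way to pick which candidate each round should rely on, is offered purely as an illustration; every name in it is hypothetical.

```python
import numpy as np

def adaboost_pick(weak_preds, y, rounds):
    """weak_preds[k] is an array of {+1, -1} predictions made by candidate
    pattern k on every sample image; y holds the true labels in {+1, -1}."""
    n = len(y)
    w = np.ones(n) / n                 # uniform initial sample weights
    chosen = []
    for _ in range(rounds):
        # Pick the candidate with the lowest weighted error.
        errs = np.array([np.sum(w[p != y]) for p in weak_preds])
        k = int(np.argmin(errs))
        err = float(np.clip(errs[k], 1e-10, 1 - 1e-10))  # guard the log
        alpha = 0.5 * np.log((1 - err) / err)
        chosen.append((k, alpha))
        # Re-weight: emphasize the samples the chosen candidate got wrong.
        w = w * np.exp(-alpha * y * weak_preds[k])
        w = w / w.sum()
    return chosen
```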
[ Operation/Effect ]
According to the face detection device 4a, in the determination process of each layer of the face detection process, the determination based on the face rectangle 1 of each pattern uses the LUT19a rather than a threshold value. The LUT19a stores the ranges of the difference value of the feature amounts of the feature regions together with the determination value corresponding to each range, and the determination for each pattern is made on the basis of these determination values.
Therefore, in the determination using the face rectangle 1 of each pattern, the face detection device 4a can make a more accurate determination than when the presence of a face in the region of interest is judged against a single threshold value. For example, when the histogram shown in Fig. 9 is obtained by learning, the conventional technique, which assumes a simple distribution, would set the threshold at, say, the boundary between the 4th and 5th sections from the left; the small peaks of the distribution (the 2nd and 3rd sections from the right in Fig. 9) are simply not accounted for. In the face detection device 4a, however, the use of the LUT19a allows even such small peaks to be judged independently. The face detection device 4a therefore makes more accurate determinations with the face rectangle 1 of each pattern than the conventional art.
In addition, in the face detection device 4a, the number of face rectangle 1 patterns assigned to each layer is reduced, and/or the number of layers executed in the face detection process is reduced. That is, the total number of face rectangle 1 patterns for which determinations are performed in the face detection process for one region of interest is reduced.
The determination based on the face rectangles 1 of a plurality of patterns is performed in the face detection process because the determination based on any single pattern's face rectangle 1 is, by itself, highly inaccurate. That is, since the determination performed with each pattern's face rectangle 1 is individually inaccurate, accuracy is obtained by combining the determinations of many patterns. According to the face detection device 4a, however, the accuracy of the determination based on each individual pattern's face rectangle 1 is improved. Therefore, the total number of face rectangle 1 patterns used in the face detection process for one region of interest can be reduced, and the processing can be sped up without lowering the accuracy of the face detection process as a whole.
[ Modification ]
In the above, the face detection device 4a, which detects a human face from an image, was described as a specific example of the specific object detection device. Other specific examples of the specific object detection device include a device that detects the body of a car from an image, a device that detects a specific animal such as a cat or a dog, and a device that detects specific characters, symbols, marks, and the like. These devices differ only in the sample images used for the learning process; their basic structure and the like can be implemented in the same way as in the face detection device 4a, with appropriate modifications according to the specific object to be detected.
In the above description, the difference value between the feature amount of the first feature region and that of the second feature region was used, but another relative value, such as the ratio of the feature amounts, may be used instead.
[ Second Embodiment ]
[ Principle ]
Next, the principle of the face detection technique applied to the face detection device 4b, the second embodiment of the face detection device 4, will be described. The following description focuses on the points that differ from the face detection technique applied to the first embodiment.
In the first embodiment, after the abscissa of the histogram (see Fig. 9) was divided into sections at specific intervals, each section was given a determination value of either "0" or "1". In contrast, in the second embodiment, each section is given a real-valued determination value. Fig. 15 is a diagram showing an example of the determination values given to the sections of a histogram in the second embodiment. In the second embodiment, the determination value expresses how probable it is that an image whose difference value falls in the corresponding section is a face image. That is, whereas the determination value of the first embodiment indicates "whether the image of the region of interest is likely to be a face", the determination value of the second embodiment indicates "the degree to which the image of the region of interest is likely to be a face". For example, the determination value is a real number from "0" to "1", larger values indicating a higher probability of a face image. More specifically, the determination value may be calculated, for example, according to the following formula, in which the determination value is calculated as h(x).
[ Equation 1 ]

If $f_{\mathrm{Haar}}(x) \in \mathrm{bin}_j$, then

$$h(x) = \frac{1}{2}\,\ln\!\left(\frac{\bar{W}_{+1}^{\,j} + \varepsilon}{\bar{W}_{-1}^{\,j} + \varepsilon}\right)$$

where

$$\bar{W}_{l}^{\,j} = P\left(f_{\mathrm{Haar}}(x) \in \mathrm{bin}_j,\; y = l\right), \qquad l = \pm 1,\; j = 1, \ldots, n,$$

$\varepsilon$ is a small positive smoothing constant, and $f_{\mathrm{Haar}}$ is a Haar feature.
The determination value may also be obtained from the difference or ratio between the frequency of the positive solution distribution and that of the non-positive solution distribution. In this case, the determination value is set larger the more the frequency of the positive solution distribution exceeds that of the non-positive solution distribution, and, conversely, smaller the more it falls below it.
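A small sketch of turning section frequencies into real-valued determination values per Equation 1 is given below. The epsilon smoothing constant and the normalization of counts into the probabilities W are assumptions carried over from the usual form of this formula; note also that the log-ratio h(x) is not confined to the 0-to-1 range given above as an example, so a 0-to-1 variant such as w_pos / (w_pos + w_neg) would be one way to realize that example instead.

```python
import numpy as np

def build_real_lut(pos_diffs, neg_diffs, bin_edges, eps=1e-4):
    """Build an LUT19b-style table: one real determination value per section."""
    pos_hist, _ = np.histogram(pos_diffs, bins=bin_edges)
    neg_hist, _ = np.histogram(neg_diffs, bins=bin_edges)
    # Normalize counts into the probabilities W_{+1}^j and W_{-1}^j.
    # (Normalizing each class separately implicitly assumes a balanced prior.)
    w_pos = pos_hist / max(int(pos_hist.sum()), 1)
    w_neg = neg_hist / max(int(neg_hist.sum()), 1)
    # Equation 1: h = (1/2) ln((W_pos + eps) / (W_neg + eps)) per section.
    return 0.5 * np.log((w_pos + eps) / (w_neg + eps))
```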
Then, the LUT19b is created from the determination values of the sections of the histogram. Fig. 16 is a diagram showing an example of the LUT19b. The LUT19b holds a real-valued determination value for each section of the difference value. In the second embodiment, as with the LUT19a of the first embodiment, LUTs 19b are created for the face rectangles 1 of all patterns, and boosting learning then assigns the LUTs 19b to each of the plurality of layers.
In the face detection processing of the second embodiment, every layer except the first differs in its processing from the first embodiment. Fig. 17 is a schematic diagram illustrating the processing of each layer in the second embodiment. The first layer (Layer 1) acquires the determination values of the patterns assigned to it, just as in the first embodiment, calculates the layer's total score from those determination values, and determines whether or not a face is present in the region of interest. Layers from Layer 2 onward, in contrast, determine whether or not a face is present in the region of interest on the basis of both the determination values obtained from the face rectangle 1 of each pattern assigned to the layer and the total score calculated by the preceding layer. That is, each layer of the second embodiment differs from each layer of the first embodiment in that its total score is calculated taking the preceding layer's total score into account. Each layer of the second embodiment treats the determination value of each pattern as that pattern's score, although other values derived from the determination values could serve as the scores instead.
Fig. 18 is a diagram showing a specific example of the processing of a layer in the second embodiment. Layer m (not the first layer to be processed) calculates a feature amount for each of the patterns assigned to it, then obtains the determination values (pt2 to ptn) of the patterns from the LUT19b and the calculated feature amounts. Layer m also acquires the total score of the preceding layer (layer (m-1)) as the determination value pt1.
Whereas in the first embodiment only the patterns whose determination value is "1" contribute to each layer's total score, in the second embodiment each layer calculates its total score taking the real-valued determination values of all patterns into account. Layer m therefore calculates its total score from all the determination values (pt1 to ptn) and makes its determination from that score. When layer m determines that a face is included in the region of interest, its total score is passed to the next layer (layer (m+1)). The final determination of whether or not a face is present in the region of interest is made at the last layer.
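The score propagation of Fig. 18 can be sketched as follows; the rejection rule (a per-layer threshold on the running total) and all names are assumptions, since the text states only that the total score is carried forward and judged at each layer.

```python
def layer_total_score(prev_total, pattern_scores):
    """pattern_scores are the real determination values pt2..ptn obtained from
    the LUT19b for the patterns of layer m; prev_total enters as pt1."""
    return prev_total + sum(pattern_scores)

def run_cascade(per_layer_scores, layer_thresholds):
    """per_layer_scores[m] yields the pattern scores of layer m for one region
    of interest; the region is rejected as soon as a layer's running total
    falls below that layer's threshold."""
    total = 0.0
    for scores, threshold in zip(per_layer_scores, layer_thresholds):
        total = layer_total_score(total, scores)
        if total < threshold:
            return False  # not a face: stop at this layer
    return True  # passed the last layer: final determination is "face"
```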
[ System Structure ]
< face detection device >
Next, the configuration of the face detection device 4b according to the second embodiment will be described. The face detection device 4b is different from the face detection device 4a in that it includes an LUT storage unit 7b and a determination unit 8b instead of the LUT storage unit 7a and the determination unit 8 a. Next, differences between the face detection device 4b and the face detection device 4a will be described.
< LUT storage unit >
The LUT storage unit 7b is different from the LUT storage unit 7a in that the LUT19b (see fig. 16) is stored instead of the LUT19a (see fig. 10). The LUT storage unit 7b has the same configuration as the LUT storage unit 7a in other respects.
< Determination unit >
The determination unit 8b performs the face detection process using the LUT19b stored in the LUT storage unit 7b, based on the settings stored in the setting storage unit 9. The functional blocks of the determination unit 8b will be described with reference to Fig. 12. The determination unit 8b differs from the determination unit 8a in that it includes a first determination unit 11b in place of the first determination unit 11a and a second determination unit 13b in place of the second determination unit 13a. Only these differences are described below.
The first determination unit 11b obtains the determination value corresponding to the difference value calculated by the feature amount calculation unit 10, and determines, on the basis of one or more such determination values, whether or not the face of a person is included in the region of interest. Specifically, the first determination unit 11b reads out the LUT19b corresponding to the pattern used when the feature amount calculation unit 10 calculated the feature amounts, and obtains from it the determination value of each pattern corresponding to the calculated difference value. The first determination unit 11b then calculates the total score of each layer from these determination values and determines whether or not the face of a person is included in the region of interest.
The first determination unit 11b uses, as one determination value, a value based on the total score of the preceding layer in the second and subsequent layers. That is, the first determination unit 11b calculates the total score of the layer in the second and subsequent layers by using the value based on the total score of the preceding layer and all the determination values corresponding to the respective patterns assigned to the layer. Then, based on the calculated total score, a determination is made as to whether or not a face is included in the current region of interest.
The second determination unit 13b performs a final determination of whether or not a face is included in the current region of interest based on the result of the first determination unit 11b, that is, the determination result of each layer (corresponding to S06 in fig. 3).
< Table creation device >
Next, the configuration of the table generation device 14b that generates the LUT19b used by the face detection device 4b will be described. The table generating device 14b is different from the table generating device 14a in that the learning process of the second embodiment is performed. That is, the table generating device 14b is different from the table generating device 14a in that it includes a judging unit 17b and a table generating unit 18b in place of the judging unit 17a and the table generating unit 18 a. Next, the table generation device 14b will be described only in terms of differences from the table generation device 14 a.
The determination unit 17b calculates a real-valued determination value for each section according to Equation 1, based on the frequencies of each section of the positive solution distribution and the non-positive solution distribution obtained by the frequency obtaining unit 16.
The table generation unit 18b generates the LUT19b that associates the real-valued determination values calculated by the determination unit 17b with the sections. The table generation unit 18b also executes a boosting learning algorithm to determine which LUTs 19b should be used in the face detection device 4b, and assigns them to the layers.
[ Operation/Effect ]
According to the face detection device 4b of the second embodiment, the determination based on the face rectangle 1 of each pattern in each layer of the face detection process uses the LUT19b (see Fig. 16) instead of the LUT19a. The LUT19b stores, as the determination value for each range of the difference value, a real number from "0" to "1" rather than the binary values "0" and "1".
Therefore, the face detection device 4b can improve the accuracy of each layer's processing compared with the face detection device 4a, which uses the LUT19a. For example, with the LUT19a, two situations are treated identically: the case where the determination value becomes "0" because of a barely perceptible difference between the frequencies of the positive solution distribution and the non-positive solution distribution (hereinafter, case 1), and the case where it becomes "0" because the non-positive solution distribution overwhelmingly dominates (hereinafter, case 2). With the LUT19b, the two are distinguished: in case 1 the determination value may be set to, say, "0.4", and in case 2 to, say, "0.1". Since case 1 and case 2 can thus be treated as different states (different scores), the accuracy of face detection can be improved.
Because making the determination value of each pattern a real number improves accuracy in this way, the number of patterns assigned to each layer can be reduced while the accuracy of the processing is maintained. That is, the determination process can be performed with fewer patterns than before, which increases the processing speed. For the same reason, the number of layers can be reduced, also increasing speed.
In addition, according to the face detection device 4b of the second embodiment, each layer's determination process incorporates the total scores of the layers whose determination processes have already been completed. In other words, because the determination values of the patterns of earlier layers are reflected in the determination of a later layer, the number of patterns that effectively influence that layer's determination is larger than the number of patterns actually assigned to it, so the accuracy of each layer's determination process can be improved compared with the face detection device 4a, which performs no such processing. Therefore, in the later layers, the number of patterns assigned to each layer can be reduced while the accuracy of the determination process is maintained, and the processing can be sped up. Similarly, the number of layers can be reduced, increasing the processing speed further. Moreover, reducing the number of patterns may reduce the resources used by the face detection device 4b.
[ Modification ]
In the histogram shown in Fig. 15 and the example of the LUT19b shown in Fig. 16, the determination values are expressed with one significant digit after the decimal point, but they need not be limited to this precision. The number of significant digits and decimal places of the determination value may be set freely by the designer according to the circumstances.
The second determination unit 13b may also be configured so that, in calculating the total score of each layer, it uses not the determination values of all the patterns assigned to the layer but only those determination values that exceed a threshold (for example, "0.2" or "0.5").
In addition, in calculating the total score of each layer, the second determination unit 13b need not be limited to using the total score of the immediately preceding layer; it may use a value based on the total scores of one or more layers processed before that layer.
In addition, when obtaining a determination value from the total scores of one or more previously processed layers, the second determination unit 13b may use the total score itself as the determination value, or may weight the total score in some way and use the result as the determination value.

Claims (9)

1. A specific object detection apparatus, characterized by comprising:
a storage device for storing determination values prepared in correspondence with each of a plurality of feature amounts;
a calculation device for calculating feature amounts in the same region of interest of an image by performing a plurality of different calculation processes;
a first determination device for calculating a score based on those determination values, among the determination values stored in the storage device, that correspond to the feature amounts calculated by the calculation device, and for determining, based on the score, whether or not a specific object is included in the region of interest;
a control device for obtaining from the first determination device a number of determination results sufficient for a final determination, by giving the plurality of feature amounts obtained by the different calculation processes of the calculation device to the first determination device in groups; and
a plurality of second determination devices for finally determining whether or not a specific object is included in the region of interest, based on the plurality of determination results obtained from the first determination device by the control device,
wherein one of the second determination devices calculates its total score based on the determination value of each pattern and makes the final determination from it, and each of the other second determination devices calculates its total score by incorporating the total score of the preceding second determination device and makes the final determination according to the calculated total score.
2. A specific object detection apparatus, characterized by comprising:
a storage device for storing, for each of a plurality of different patterns, determination values prepared in correspondence with each of a plurality of feature amounts;
a calculation device for calculating feature amounts in the same region of interest of an image based on the plurality of different patterns;
a first determination device for calculating a score based on the determination values corresponding to the patterns used by the calculation device and to the feature amounts calculated by the calculation device, and for determining, based on the score, whether or not a specific object is included in the region of interest;
a control device for obtaining from the first determination device a number of determination results sufficient for a final determination, by giving the plurality of feature amounts obtained by the calculation processes based on the plurality of different patterns to the first determination device in groups; and
a plurality of second determination devices for finally determining whether or not a specific object is included in the region of interest, based on the plurality of determination results obtained from the first determination device by the control device,
wherein one of the second determination devices calculates its total score based on the determination value of each pattern and makes the final determination from it, and each of the other second determination devices calculates its total score by incorporating the total score of the preceding second determination device and makes the final determination according to the calculated total score.
3. The specific object detection apparatus according to claim 2, wherein
the storage device stores the feature amounts, allocated to a plurality of sections, in association with the determination value of each of the plurality of sections.
4. The specific object detection apparatus according to claim 3, wherein
the determination value of each section is a value decided by a determination criterion generation device, and
the determination criterion generation device comprises: a sample image feature amount calculation device for calculating the feature amounts of a plurality of sample images according to an arbitrary pattern; a frequency obtaining device for obtaining, for each of the plurality of sections, the frequency of sample images whose feature amount calculated by the sample image feature amount calculation device falls within that section; and a determination value deciding device for deciding the determination value of each section by judging, based on the frequency within the section, whether or not a region of interest whose feature amount calculated according to the pattern falls within that section should be determined to include a specific object.
5. The specific object detection apparatus according to any one of claims 1 to 4, wherein
the first determination device obtains a plurality of feature amounts as a new group from the control device, and calculates a new score using the plurality of determination values for the feature amounts of the new group together with the score already calculated for a group whose determination processing the first determination device has completed.
6. A specific object detection method, comprising:
a calculation step of calculating feature amounts in a region of interest of an image by performing a plurality of different calculation processes;
a first determination step of calculating a score based on those determination values, among stored determination values, that correspond to the calculated feature amounts, and determining, based on the score, whether or not a specific object is included in the region of interest;
a step of obtaining a number of determination results sufficient for a final determination, by executing the first determination step with the plurality of feature amounts obtained by the different calculation processes of the calculation step given in groups; and
a plurality of second determination steps of finally determining whether or not a specific object is included in the region of interest based on the plurality of determination results,
characterized in that
in one of the second determination steps, the total score of that step is calculated based on the determination value of each pattern and the final determination is made from it, and in each of the other second determination steps, the total score of that step is calculated by incorporating the total score of the preceding second determination step, and the final determination is made according to the calculated total score.
7. A specific object detection method, comprising:
a calculation step of calculating feature amounts in the same region of interest of an image based on a plurality of different patterns;
a first determination step of calculating a score based on the determination values corresponding to the patterns used in the calculation step and to the feature amounts calculated in the calculation step, and determining, based on the score, whether or not a specific object is included in the region of interest;
a step of obtaining a number of determination results sufficient for a final determination, by executing the first determination step with the plurality of feature amounts obtained by performing the calculation step based on the plurality of different patterns given in groups; and
a plurality of second determination steps of finally determining whether or not a specific object is included in the region of interest based on the plurality of determination results obtained,
characterized in that
in one of the second determination steps, the total score of that step is calculated based on the determination value of each pattern and the final determination is made from it, and in each of the other second determination steps, the total score of that step is calculated by incorporating the total score of the preceding second determination step, and the final determination is made according to the calculated total score.
8. The specific object detection apparatus according to claim 1 or 2, characterized by further comprising:
a device for referring to region pattern information that defines a partial region of an image;
an arithmetic device for calculating a feature amount of the image by performing a predetermined operation based on the region pattern information;
a determination value storage device for storing, in combination, the feature amounts calculated for a plurality of sample images and determination values relating to an attribute of the images for which the feature amounts were calculated; and
a determination device for determining whether or not the image has the attribute, based on the feature amount calculated for the image.
9. The method according to claim 6 or 7, characterized by further comprising:
a step of referring to region pattern information that defines a partial region of an image;
an operation step of calculating a feature amount of the image by performing a predetermined operation based on the region pattern information;
a step of referring to a determination value storage device that stores, in combination, the feature amounts calculated for a plurality of sample images and determination values relating to an attribute of the images for which the feature amounts were calculated; and
a determination step of determining whether or not the image has the attribute, based on the feature amount calculated for the image.