WO2010133161A1 - Method and apparatus for classifying images - Google Patents

Method and apparatus for classifying images

Info

Publication number
WO2010133161A1
WO2010133161A1 (PCT/CN2010/072867, CN2010072867W)
Authority
WO
WIPO (PCT)
Prior art keywords
area
regions
axis
region
gradient
Prior art date
Application number
PCT/CN2010/072867
Other languages
English (en)
French (fr)
Inventor
张伦
吴伟国
Original Assignee
Sony Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Corporation
Priority to EP10777364A priority Critical patent/EP2434431A1/en
Priority to US13/319,914 priority patent/US20120093420A1/en
Priority to JP2012511134A priority patent/JP5545361B2/ja
Publication of WO2010133161A1 publication Critical patent/WO2010133161A1/zh


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/446 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering using Haar-like filters, e.g. using integral image techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/50 - Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G06V10/507 - Summing image-intensity values; Histogram projection analysis

Definitions

  • The present invention relates to the classification of videos or images (as containing an object or not containing an object), that is, to the detection or recognition of objects in videos or images, and more particularly to a method and apparatus for generating a classifier for distinguishing whether a video or image contains an object to be detected, and to a method and apparatus for classifying an image using the generated classifier.
  • The present invention is directed to a method and apparatus for generating a classifier, and a method and apparatus for classifying an image, so as to improve the robustness of object detection in images.
  • One embodiment of the present invention is a method of generating a classifier for distinguishing between an object image and a non-object image, comprising: extracting a set of features as a feature vector from each of a plurality of input images, wherein the extracting comprises: for each feature of the feature vector, determining a plurality of first regions arranged along the direction of a first axis, and a plurality of second regions arranged along the direction of a second axis intersecting the first axis; calculating a first difference between the pixel sums or means of the plurality of first regions, and a second difference between the pixel sums or means of the plurality of second regions; and calculating a gradient magnitude and a gradient direction from the first difference and the second difference, to form each feature; and training the classifier from the extracted feature vectors.
  • Another embodiment of the present invention is an apparatus for generating a classifier for distinguishing between an object image and a non-object image, wherein the apparatus extracts a set of features as a feature vector from each of a plurality of input images. The apparatus includes: a determining unit which, for each feature of the feature vector, determines a plurality of first regions arranged along the direction of a first axis and a plurality of second regions arranged along the direction of a second axis intersecting the first axis; a difference calculating unit which calculates a first difference between the pixel sums or means of the plurality of first regions and a second difference between the pixel sums or means of the plurality of second regions; a gradient calculating unit which calculates a gradient magnitude and a gradient direction from the first difference and the second difference, to form each feature; and a training unit which trains the classifier from the extracted feature vectors.
  • Because a feature comprising a gradient direction and a gradient magnitude is calculated from the pixels of regions arranged along two directions, the extracted features can more realistically reflect the distribution of object edges in the corresponding image portions.
  • a classifier generated based on such a feature can more robustly detect an object such as a person or an animal in an image, particularly an object having various postures.
  • each of the regions may be a rectangular region in which the first regions are in contact, and the second regions are also in contact.
  • the difference between the regional arrangements on which the at least two features are based includes one or more of the following: relative positional relationship of the regions, number of regions, shape of the region, size of the region , the aspect ratio of the area. This makes the features available for investigation more abundant, which makes it easier to select features that are suitable for distinguishing between objects and non-objects.
  • In the above method and apparatus, features of at least one dimension of the plurality of feature vectors are converted, the converted features including a gradient direction and a gradient magnitude,
  • the conversion comprising converting the gradient direction into the interval, among a plurality of predetermined intervals, to which the gradient direction belongs.
  • For each of the at least one dimension, a classifier is generated that includes sub-classifiers corresponding respectively to the predetermined intervals; for each predetermined interval, the threshold of the corresponding sub-classifier is obtained from the distribution of the gradient magnitudes of the features, of that dimension, whose interval is the same as the predetermined interval.
  • Another embodiment of the present invention is a method of classifying an image, comprising: extracting a set of features as a feature vector from the image, wherein the extracting comprises: for each feature of the feature vector, determining a plurality of first regions arranged along the direction of a first axis, and a plurality of second regions arranged along the direction of a second axis intersecting the first axis; calculating a first difference between the pixel sums or means of the plurality of first regions, and a second difference between the pixel sums or means of the plurality of second regions; and calculating a gradient magnitude and a gradient direction from the first difference and the second difference, to form each feature; and classifying the image according to the extracted feature vector.
  • Another embodiment of the present invention is an apparatus for classifying an image, comprising: feature extracting means for extracting a set of features as a feature vector from the image, including: a determining unit which, for each feature of the feature vector, determines a plurality of first regions arranged along the direction of a first axis and a plurality of second regions arranged along the direction of a second axis intersecting the first axis; a difference calculating unit which calculates a first difference between the pixel sums or means of the plurality of first regions and a second difference between the pixel sums or means of the plurality of second regions; and a gradient calculating unit which calculates a gradient magnitude and a gradient direction from the first difference and the second difference, to form each feature; and a classifying unit which classifies the image according to the extracted feature vector.
  • Because the gradient of an image portion can be calculated from the pixels of a plurality of regions, the extracted features can more completely reflect the distribution of object edges in the corresponding image portion and are less affected by changes in the posture of the object.
  • a classifier generated based on such features enables more robust detection of objects such as humans or animals in an image, particularly objects having various poses.
  • Each region may be a rectangular region, wherein the first regions are contiguous and the second regions are contiguous.
  • the difference between the regional arrangements on which the at least two features are based includes one or more of the following: a relative positional relationship of the regions, a number of regions, a shape of the region, The size of the area, the aspect ratio of the area. This makes the features available for investigation more abundant, which makes it easier to select features that are suitable for distinguishing between objects and non-objects.
  • classifying an image includes: determining, for each gradient direction and gradient size of each feature, a gradient direction interval to which the gradient direction belongs in the plurality of gradient direction intervals, each gradient direction interval Having a corresponding threshold; comparing the magnitude of the gradient with a corresponding threshold of the determined gradient direction interval; and generating a classification result based on the comparison.
  • FIG. 1 is a block diagram showing the structure of an apparatus for generating a classifier for distinguishing between an object image and a non-object image, according to an embodiment of the present invention.
  • FIGS. 2a to 2h are schematic diagrams showing an example of an area arrangement determined by a determining unit.
  • Fig. 3a shows an example of the distribution of the edge of the object (human body).
  • Figures 3b and 3c are schematic diagrams showing the first regions and the second regions determined, based on the area arrangements shown in Figures 2a and 2b respectively, in the portion shown in Figure 3a.
  • Figure 4a is a schematic view showing the outline of an object included in the portion 302 shown in Figure 3a.
  • Fig. 4b is a diagram showing a gradient calculated by the gradient calculating unit based on the first difference and the second difference calculated by the difference calculating unit based on the first area and the second area shown in Figs. 3b and 3c.
  • Figure 5 illustrates a flow diagram of a method of generating a classifier for distinguishing between an object image and a non-object image, in accordance with one embodiment of the present invention.
  • Figure 6 is a block diagram showing the construction of a training unit for generating a classifier for distinguishing between an object image and a non-object image, in accordance with a preferred embodiment of the present invention.
  • Figure 7 shows a flow chart of a training method for generating a classifier for distinguishing between an object image and a non-object image, in accordance with a preferred embodiment of the present invention.
  • Figure 8 is a block diagram showing the structure of an apparatus for classifying images according to an embodiment of the present invention.
  • FIG. 9 shows a flow chart showing a method of detecting an object in an image, in accordance with one embodiment of the present invention.
  • Figure 10 is a block diagram showing the construction of a classification unit in accordance with a preferred embodiment of the present invention.
  • Figure 11 shows a flow chart of a classification method in accordance with a preferred embodiment of the present invention.
  • FIG. 12 is a block diagram showing an exemplary structure of a computer in which the present invention is implemented.
  • FIG. 1 is a block diagram showing the structure of an apparatus 100 for generating a classifier for distinguishing between an object image and a non-object image, in accordance with one embodiment of the present invention.
  • the apparatus 100 includes a determining unit 101, a difference calculating unit 102, a gradient calculating unit 103, and a training unit 104.
  • Object images and non-object images are collected, features are extracted from the collected object images and non-object images, and the extracted features are selected and fused by the AdaBoost method to obtain a classifier for distinguishing object images from non-object images.
  • A method of collecting and preparing such object images and non-object images is disclosed in the patent application WO 2008/151470 by Ding et al., entitled "A Robust Human Face Detecting Method In Complicated Background Image" (see pages 2 to 3 of the specification).
  • the collected and prepared object image and non-object image may be used as input images of the device 100.
  • the device 100 extracts a set of features as feature vectors from each of the plurality of input images.
  • the determining unit 101 determines, for each feature of the feature vector, a plurality of first regions arranged along a direction of the first axis, and intersects with the first axis (eg, intersecting at a right angle or a non-orthogonal angle) A plurality of second regions arranged in the direction of the second axis.
  • the determining unit 101 is for determining pixels in the input image on which each feature to be extracted is based.
  • the determining unit 101 can determine the pixels in the input image on which the image is based based on the predetermined area arrangement.
  • the arrangement of the first zone and the second zone can take various forms.
  • the weighted average position of the positions of the pixels of the plurality of first regions and the weighted average position of the positions of the pixels of the plurality of second regions are within a predetermined range of intersections of the first axis and the second axis.
  • Taking the first regions as an example, the position of a pixel of the first regions can be denoted (x_ij, y_ij), where x_ij is the coordinate, on the first axis (the X axis), of the j-th pixel in the i-th first region, and y_ij is its coordinate on the second axis (the Y axis).
  • The weighted average position (x_a, y_a) of the positions of the pixels of the first regions can be defined as x_a = Σ_{i=1..N} Σ_{j=1..M_i} x_ij · w_i and y_a = Σ_{i=1..N} Σ_{j=1..M_i} y_ij · w_i, where N is the number of first regions, M_i is the number of pixels in the i-th first region, w_i is the weight of the i-th first region, and Σ_i w_i = 1.
  • The weights of all the first regions may be the same or may be at least partially different. Where they differ, a smaller weight may be assigned to a first region containing more pixels, and a larger weight to a first region containing fewer pixels.
  • the area may be a rectangular area, the first area is contiguous, and the second area is contiguous.
  • Figures 2a to 2h are schematic diagrams showing other examples of area arrangements determined by the determining unit 101.
  • In these figures, the X axis represents the first axis and the Y axis represents the second axis.
  • The white and black coloring of the rectangular blocks serves only to distinguish them.
  • Although the first axis and the second axis in Figure 2 are shown as orthogonal to each other, they may also intersect at a non-right angle.
  • the number of first regions and the number of second regions are both two, the first regions are connected and the second regions are connected.
  • the intersection of the first axis and the second axis is within a predetermined range (eg, substantially coincident) of the connection line of the first region or the connection point (eg, when the vertices of the rectangular region meet), and Within the predetermined range of the connection line or connection point of the two areas.
  • An example of such an area arrangement is shown in Figures 2a and 2b.
  • Figure 2a shows the arrangement of the first region on the first axis, wherein the white rectangular block 201 and the black rectangular block 202 each represent the first region and meet on the connecting line, while the first axis and the second axis The intersection is on the connection line.
  • Figure 2b shows the arrangement of the second region on the second axis, wherein both the white rectangular block 203 and the black rectangular block 204 represent the second region and meet on the connecting line, while the intersection of the first axis and the second axis is Connect the line.
  • Although Figures 2a and 2b show the arrangements of the regions on the first axis and on the second axis separately, the actual arrangement is the combination of Figures 2a and 2b, i.e., the arrangement obtained when the first and second axes of Figure 2a coincide with the first and second axes of Figure 2b, respectively.
  • the rectangular blocks 201 and 202, and the rectangular blocks 203 and 204 may be joined to each other by respective vertices.
  • the number of first regions and the number of second regions are both two, the first regions are spaced apart and the second regions are spaced apart.
  • the intersection of the first axis and the second axis is within a predetermined range of the midpoint between the center of the position of the first region and the center of the position of the second region.
  • Figure 2c shows the arrangement of the first regions on the first axis, wherein the white rectangular block 205 and the black rectangular block 206 both represent first regions and are spaced apart, and the intersection of the first axis and the second axis is within a predetermined range of the midpoint between the position centers of the white rectangular block 205 and the black rectangular block 206.
  • Figure 2d shows the arrangement of the second regions on the second axis, wherein the white rectangular block 207 and the black rectangular block 208 both represent second regions and are spaced apart, and the intersection of the first axis and the second axis is within a predetermined range of the midpoint between the position centers of the white rectangular block 207 and the black rectangular block 208.
  • FIG. 2g and FIG. 2h show another example of such an area arrangement in which the vertices of the rectangular blocks are opposite.
  • Figure 2g shows the arrangement of the first regions on the first axis, wherein the white rectangular block 215 and the black rectangular block 216 both represent first regions and are spaced apart, and the intersection of the first axis and the second axis is within a predetermined range of the midpoint between the position centers of the white rectangular block 215 and the black rectangular block 216.
  • Figure 2h shows the arrangement of the second regions on the second axis, wherein the white rectangular block 217 and the black rectangular block 218 both represent second regions and are spaced apart, and the intersection of the first axis and the second axis is within a predetermined range of the midpoint between the position centers of the white rectangular block 217 and the black rectangular block 218.
  • According to another area arrangement, the number of first regions and the number of second regions are both three.
  • In this arrangement, the intersection of the first axis and the second axis lies within the middle first region of the first regions and within the middle second region of the second regions.
  • An example of such an area arrangement is shown in Figures 2e and 2f.
  • Figure 2e shows the arrangement of the first area on the first axis, wherein the white rectangular block 210 and the black rectangular blocks 209, 211 each represent the first area, and the intersection of the first axis and the second axis is in the centered white rectangular block Within 210.
  • Figure 2f shows the arrangement of the second region on the second axis, wherein the white rectangular block 213 and the black rectangular blocks 212, 214 each represent the second region, and the intersection of the first axis and the second axis is in the centered white rectangular block Within 213.
  • Although Figures 2e and 2f show the arrangements of the regions on the first axis and on the second axis separately, the actual arrangement is the combination of Figures 2e and 2f, i.e., the arrangement obtained when the first and second axes of Figure 2e coincide with the first and second axes of Figure 2f, respectively.
  • rectangular blocks 209, 210, and 211, and rectangular blocks 212, 213, and 214 may be separate rather than joined.
  • the shapes of the first area and the second area are not limited to rectangles, and may be other shapes such as a polygon, a triangle, a circle, a ring, and an irregular shape.
  • the shapes of the first area and the second area may also be different, and the shapes of the different first/second areas may also be different.
  • the sides of the different regions in the first region may be parallel to each other, or may be rotated at an angle relative to each other.
  • the sides of different regions in the second region may be parallel to each other or may be rotated at an angle relative to each other.
  • the intersection of the rectangular regions includes being joined by respective sides (ie, the intersection of the first axis and the second axis is on the sides), and is connected by the vertices of the respective corners (ie, The intersection of the first axis and the second axis is at these vertices).
  • The number of first regions arranged on the first axis and of second regions arranged on the second axis is not limited to the numbers shown in Figure 2, and the number of first regions need not equal the number of second regions, provided that the weighted average position of the pixels of the first regions and the weighted average position of the pixels of the second regions are within a predetermined range of the intersection of the first axis and the second axis.
  • Preferably, the number of first regions and the number of second regions each do not exceed three.
  • The relative positional relationship of the first regions and that of the second regions may be arbitrary; for example, the first regions arranged on the first axis may be contiguous, separate, partially contiguous, or partially separate, and the second regions arranged on the second axis may likewise be contiguous, separate, partially contiguous, or partially separate, as long as the weighted average position of the pixels of the first regions and the weighted average position of the pixels of the second regions are within a predetermined range of the intersection of the first axis and the second axis.
  • the edge contour of the object exhibits features that are different from the non-object.
  • The edge contour of the object may have various distributions in the object image.
  • the determining unit 101 may determine the first region and the second region in different sized portions at different positions of the input image to obtain the contour features within the portion.
  • Figure 3a shows an example of the distribution of the edge contour of an object (a human body). As shown in Fig. 3a, in the input image, the edge contour of the human body exists in portions of different sizes and at different positions, such as the portions 301, 302, and 303.
  • Figures 3b and 3c show schematic diagrams for determining the first area and the second area in the portion 302 shown in Figure 3a based on the area arrangement shown in Figures 2a and 2b.
  • reference numeral 304 indicates the arrangement of the first region.
  • In Figure 3c, reference numeral 305 indicates the arrangement of the second regions.
  • the determining unit 101 may determine the first area and the second area at different positions of the input image based on an area arrangement. A new area arrangement is then obtained by varying the area size and/or area aspect ratio in such an area arrangement, and the first area and the second area are determined at different locations of the input image based on the new area arrangement. This process is repeated until all possible area sizes or area aspect ratios for this area layout have been tried.
  • the determining unit 101 may obtain a new area arrangement by changing the relative positional relationship of the areas in the area arrangement.
  • the determining unit 101 can obtain a new area arrangement by changing the number of areas in the area arrangement.
  • the determining unit 101 can obtain a new area arrangement by changing the shape of the area in the area arrangement.
  • the determining unit 101 determines a feature to be extracted based on a first region and a second region determined by a region arrangement in a position in the input image.
  • the regional arrangement on which at least two features are based is different.
  • Differences between different area arrangements may include one or more of the following: the relative positional relationship of the regions, the number of regions, the shape of the regions, the size of the regions, and the aspect ratio of the regions.
  • For the first regions and second regions determined at each position, the difference calculation unit 102 calculates the first difference dx between the pixel sums or means (gray levels) of the first regions, and the second difference dy between the pixel sums or means (gray levels) of the second regions.
  • For the area arrangement shown in Figures 2a and 2b, the first difference and the second difference can be calculated as:
  • first difference = pixel sum or mean of rectangular block 202 - pixel sum or mean of rectangular block 201,
  • second difference = pixel sum or mean of rectangular block 204 - pixel sum or mean of rectangular block 203.
  • For the area arrangement shown in Figures 2c and 2d, the first difference and the second difference can be calculated as:
  • first difference = pixel sum or mean of rectangular block 206 - pixel sum or mean of rectangular block 205,
  • second difference = pixel sum or mean of rectangular block 208 - pixel sum or mean of rectangular block 207.
  • For the area arrangement shown in Figures 2e and 2f, the first difference and the second difference can be calculated as:
  • first difference = pixel sum or mean of rectangular block 209 + pixel sum or mean of rectangular block 211 - 2 × pixel sum or mean of rectangular block 210,
  • second difference = pixel sum or mean of rectangular block 212 + pixel sum or mean of rectangular block 214 - 2 × pixel sum or mean of rectangular block 213.
  • For other area arrangements, such as that of Figures 2g and 2h, the first difference and the second difference can be calculated analogously.
  • The purpose of calculating the difference between the pixel sums or means (gray levels) of the regions on an axis is to obtain information reflecting how the pixel gray levels change along that axis. For different area arrangements there may be corresponding ways of calculating the first difference and the second difference, as long as they reflect this change.
  • The gradient calculating unit 103 calculates the gradient magnitude and the gradient direction from the first difference and the second difference calculated by the difference calculating unit, to form the extracted feature.
  • For example, the gradient direction can be calculated as the arctangent of the ratio of the second difference to the first difference, in which case the angle of the gradient direction ranges from 0 to 180 degrees.
  • Alternatively, the gradient direction can be calculated over a full circle, in which case its angle ranges from 0 to 360 degrees.
  • Fig. 4a is a schematic view showing the object edge contour contained in the portion 302 shown in Fig. 3a.
  • The edge 401 schematically represents the edge contour contained in the portion 302.
  • FIG. 4b is a schematic diagram showing the gradient direction calculated by the gradient calculating unit 103 based on the first difference and the second difference calculated by the difference calculating unit 102 based on the first area and the second area shown in FIGS. 3b and 3c.
  • In Figure 4b, the normal 403 to the oblique line 402 represents the calculated gradient direction.
  • Because the features comprising the gradient direction and the gradient magnitude are calculated from the pixels of co-located regions arranged along two directions, the extracted features can more realistically reflect the distribution of object edges in the corresponding image portions. Accordingly, a classifier generated from such features can more robustly detect objects such as people or animals in an image, particularly objects having various postures.
  • a feature vector is formed for all features extracted for each input image.
  • the training unit 104 trains the classifier based on the extracted feature vectors.
  • For example, the classifier can be trained from the feature vectors obtained in the above embodiment by a machine learning method, such as an SVM (Support Vector Machine) operating on a histogram of oriented gradients.
  • Figure 5 illustrates a flow diagram of a method 500 of generating a classifier for distinguishing between an object image and a non-object image, in accordance with one embodiment of the present invention.
  • Steps 503, 505, and 507 are used to extract a set of features from the current input image as a feature vector.
  • At step 503, for each feature of the feature vector, a plurality of first regions arranged along the direction of a first axis are determined, together with a plurality of second regions arranged along the direction of a second axis that intersects the first axis (e.g., at a right angle or a non-right angle).
  • the arrangement of the first area and the second area may be the area arrangement previously described in connection with the embodiment of Fig. 1.
  • the first region and the second region may be determined within different sized portions at different locations of the input image to obtain edge contour features within the portion.
  • the first region and the second region may be determined at different locations of the input image based on an area arrangement at step 503.
  • a new area arrangement is then obtained by changing the area size and/or area aspect ratio in such an area arrangement, and the first area and the second area are determined at different locations of the input image based on the new area arrangement. This process is repeated until all possible area sizes or area aspect ratios for this area arrangement have been tried.
  • a new regional arrangement may be obtained at step 503 by changing the relative positional relationship of the regions in the regional arrangement.
  • a new regional arrangement may be obtained at step 503 by changing the number of regions in the regional arrangement.
  • a new regional arrangement may be obtained at step 503 by changing the shape of the area in the area arrangement.
  • The first regions and second regions determined by placing one area arrangement at one position in the input image determine one feature to be extracted.
  • the arrangement of the regions on which at least two features are based is different.
  • differences between different regional arrangements may include one or more of the following: relative positional relationship of the regions, the number of regions, the shape of the regions, the size of the regions, and the aspect ratio of the regions.
  • At step 505, a first difference between the pixel sums or means of the first regions, and a second difference between the pixel sums or means of the second regions, are calculated.
  • the first difference and the second difference can be calculated by the method previously described in connection with the embodiment of Fig. 1.
  • At step 507, the gradient magnitude and gradient direction are calculated from the calculated first difference and second difference to form the extracted feature.
  • The gradient direction and gradient magnitude can be calculated according to formulas (1) (or (1')) and (2).
  • step 509 it is determined whether there are unextracted features for the current input image. If yes, return to step 503 to perform the process of extracting the next feature; otherwise, step 511 is performed.
  • step 511 it is determined whether there is still an input image of the feature vector not extracted. If yes, return to step 503 to perform the process of extracting the feature vector of the next input image; otherwise, the method proceeds to step 513.
  • Because the features comprising the gradient direction and the gradient magnitude are calculated from the pixels of co-located regions arranged along two directions, the extracted features can more realistically reflect the distribution of object edges in the corresponding image portions. Accordingly, a classifier generated from such features can more robustly detect objects such as people or animals in an image, particularly objects having various postures.
  • a feature vector is formed for all features extracted for each input image.
  • At step 513, the classifier is trained from the extracted feature vectors.
  • For example, the classifier can be trained from the feature vectors obtained in the above embodiment by a machine learning method, such as an SVM (Support Vector Machine) operating on a histogram of oriented gradients.
  • Method 500 ends at step 515.
  • Alternatively, a histogram of oriented gradients may also be employed to train the classifier from the gradient features obtained in the above embodiments.
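  • As a concrete illustration of this training step, the following is a minimal sketch assuming scikit-learn is available; `features_from_image` stands in for the per-image feature extraction of steps 503-507 and is a hypothetical helper, not a function defined by the patent.

```python
import numpy as np
from sklearn.svm import SVC

def train_classifier(object_images, non_object_images, features_from_image):
    """Train an SVM on the gradient-feature vectors extracted from each image."""
    images = list(object_images) + list(non_object_images)
    X = np.array([features_from_image(img) for img in images])
    y = np.array([1] * len(object_images) + [0] * len(non_object_images))
    clf = SVC(kernel="linear")  # the kernel choice is an assumption for the sketch
    clf.fit(X, y)
    return clf
```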
  • Figure 6 is a block diagram showing the structure of a training unit 104 that generates a classifier for distinguishing between an object image and a non-object image in accordance with a preferred embodiment of the present invention.
  • the training unit 104 includes a conversion unit 601 and a classifier generation unit 602.
  • The conversion unit 601 converts features of at least one dimension of the plurality of feature vectors, wherein the converted features include a gradient direction and a gradient magnitude.
  • the feature vector may be the feature vector generated in the embodiment described above with reference to FIGS. 1 and 5.
  • The conversion performed by the conversion unit 601 includes converting the gradient direction into the interval, among a plurality of predetermined intervals, to which the gradient direction belongs.
  • the angular extent of the gradient direction is 180 degrees.
  • This range can be divided into a number of predetermined intervals (also called gradient direction intervals), for example, divided into three intervals of 0 to 60 degrees, 60 degrees to 120 degrees, and 120 degrees to 180 degrees.
  • the angle range of the gradient direction can also be 360 degrees.
  • The number of predetermined intervals is preferably from 3 to 15. The larger the number of predetermined intervals, the finer the division of angles, which favors a stronger classification ability (a lower error rate), but it also makes over-fitting more likely, which degrades the classification performance during detection.
  • the conversion unit 601 converts the gradient direction into a corresponding interval according to the interval in which the gradient direction of the feature is located.
  • For example, the feature vector can be expressed as <f_1, ..., f_M>, where each feature f_i includes a gradient magnitude I_i and a gradient direction O_i.
  • The converted feature is denoted f'_i, where f'_i includes the gradient magnitude I_i and the interval to which O_i belongs.
  • A classifier corresponding to a given dimension can be generated from the features f_i of that dimension of the feature vectors.
  • The classifier can be expressed as h_i(I, O), where I represents the gradient magnitude and O represents the gradient direction.
  • The classifier includes N sub-classifiers h_ij corresponding respectively to the N predetermined intervals K_j, 1 ≤ j ≤ N, for classifying features whose gradient directions belong to the respective predetermined intervals.
  • Each sub-classifier h_ij has a corresponding threshold θ_ij and a classification a_ij (object or non-object) determined based on the threshold.
  • The threshold θ_ij and the classification a_ij can be learned from the distribution of the gradient magnitudes of the features f'_i, in the converted feature vectors, whose interval is K_j.
  • For each dimension of the at least one dimension, the classifier generating unit 602 generates a classifier that includes sub-classifiers corresponding respectively to the predetermined intervals, wherein, for each predetermined interval, the threshold of the corresponding sub-classifier, and the classification determined based on the threshold, are obtained from the distribution of the gradient magnitudes of the features, of that dimension, in the feature vectors whose interval is the same as the predetermined interval.
  • A measure of the reliability of the determined classification can also be obtained.
  • transformation and classifier generation can be performed for only one dimension, and the generated classifier acts as a classifier for distinguishing between an object image and a non-object image.
  • the at least one dimension may comprise at least two dimensions or all dimensions of the feature vector.
  • classifiers corresponding to each of the dimensions may be separately generated, and the final classifiers are obtained according to the generated respective classifiers.
  • the classifiers corresponding to the respective dimensions can be combined into a final classifier by known methods.
  • The AdaBoost method is a classification method that can be used to select and combine the classifiers generated for the individual dimensions into a new, stronger classifier.
  • In the AdaBoost method, a weight is set for each sample and the classifiers are combined iteratively. At each iteration, the weights of the samples that the classifier classifies correctly are reduced and the weights of misclassified samples are increased, so that the learning algorithm concentrates on the harder training samples in subsequent learning; finally a classifier with the desired recognition accuracy is obtained.
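  • The iterative re-weighting described above is the core of discrete AdaBoost; the following generic sketch illustrates it (a standard formulation, not a quotation of the exact procedure of Paul Viola et al.).

```python
import numpy as np

def adaboost(weak_learners, X, y, rounds=50):
    """Discrete AdaBoost over a pool of candidate weak classifiers.

    weak_learners: callables h(x) -> +1/-1 applied to one sample;
    X: list of samples; y: numpy array of labels in {+1, -1}.
    Returns a list of (alpha, h) pairs defining sign(sum(alpha * h(x))).
    """
    n = len(X)
    w = np.full(n, 1.0 / n)                      # one weight per sample
    ensemble = []
    for _ in range(rounds):
        # Pick the weak classifier with the lowest weighted error.
        preds = [np.array([h(x) for x in X]) for h in weak_learners]
        errors = [np.sum(w * (p != y)) for p in preds]
        best = int(np.argmin(errors))
        err = max(errors[best], 1e-10)
        if err >= 0.5:
            break
        alpha = 0.5 * np.log((1.0 - err) / err)
        # Misclassified samples gain weight, correctly classified ones lose weight.
        w *= np.exp(-alpha * y * preds[best])
        w /= w.sum()
        ensemble.append((alpha, weak_learners[best]))
    return ensemble
```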
  • one of the predetermined intervals is an interval representing a weak gradient.
  • The conversion unit 601 converts the gradient direction into the interval representing a weak gradient when the gradient magnitude of the feature is smaller than a predetermined threshold. The sub-classifier corresponding to the interval representing a weak gradient classifies the feature as non-object regardless of the gradient magnitude.
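  • For illustration, a minimal sketch of one such sub-classifier bank h_i(I, O) is given below; the interval boundaries, thresholds, polarities, and weak-gradient threshold are hypothetical placeholders, not values from the patent.

```python
def make_interval_classifier(bounds, thresholds, polarities, weak_threshold):
    """Weak classifier over gradient-direction intervals.

    bounds:         interval edges in degrees, e.g. (0, 60, 120, 180).
    thresholds:     one gradient-magnitude threshold per interval.
    polarities:     +1 -> object when magnitude >= threshold,
                    -1 -> object when magnitude <  threshold.
    weak_threshold: magnitudes below this fall into the weak-gradient
                    interval and are always classified as non-object.
    """
    def classify(magnitude, direction):
        if magnitude < weak_threshold:           # weak-gradient interval
            return -1                            # non-object
        for j in range(len(bounds) - 1):
            if bounds[j] <= direction < bounds[j + 1]:
                above = magnitude >= thresholds[j]
                return +1 if above == (polarities[j] > 0) else -1
        return -1
    return classify

# Hypothetical three-interval classifier for one feature dimension.
h = make_interval_classifier(bounds=(0, 60, 120, 180),
                             thresholds=(5.0, 3.0, 4.0),
                             polarities=(+1, +1, -1),
                             weak_threshold=1.0)
print(h(6.0, 45.0), h(0.5, 45.0))  # 1 (object), -1 (weak gradient -> non-object)
```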
  • Figure 7 illustrates a flow diagram of a training method 700 for generating a classifier for distinguishing between an object image and a non-object image, in accordance with a preferred embodiment of the present invention.
  • method 700 begins at step 701.
  • At step 703, features of at least one dimension of the plurality of feature vectors are converted, wherein the converted features include a gradient direction and a gradient magnitude.
  • the feature vector may be the feature vector generated in the embodiment described above with reference to Figs. 1 and 5.
  • the conversion performed includes converting the gradient direction into intervals in which the gradient direction belongs in a plurality of predetermined intervals.
  • At step 705, for the current dimension of the converted feature vectors, a classifier is generated that includes sub-classifiers corresponding respectively to the predetermined intervals, wherein, for each predetermined interval, the threshold of the corresponding sub-classifier, and the classification determined based on the threshold, are obtained from the distribution of the gradient magnitudes of the current-dimension features whose interval is the same as the predetermined interval. Optionally, a measure of the reliability of the determined classification can also be obtained.
  • step 707 it is determined if there is a dimension for which the classifier is not generated. If so, return to step 705 to generate a classifier for the next dimension; otherwise the method ends at step 709.
  • transformation and classifier generation can be performed for only one dimension, and the resulting classifier acts as a classifier for distinguishing between object images and non-object images.
  • the at least one dimension may include at least two dimensions or all dimensions of the feature vector.
  • classifiers corresponding to each of the dimensions can be separately generated, and the final classifier can be obtained based on the generated respective classifiers.
  • The classifiers corresponding to the respective dimensions can be combined into a final classifier by a known method, such as the AdaBoost method of Paul Viola et al., to form a final classifier from the generated classifiers.
  • one of the predetermined intervals is an interval representing a weak gradient.
  • Where the gradient magnitude of a feature is smaller than a predetermined threshold, the gradient direction is converted into the interval representing a weak gradient. The sub-classifier corresponding to the interval representing a weak gradient classifies the feature as non-object regardless of the gradient magnitude.
  • FIG. 8 is a block diagram showing the structure of an apparatus 800 for classifying images in accordance with one embodiment of the present invention.
  • the apparatus 800 includes a determining unit 801, a difference calculating unit 802, a gradient calculating unit 803, and a sorting unit 804.
  • the image of input device 800 can be a pre-sized image obtained from the image to be processed through the scan window.
  • the image can be obtained by the method described in the patent application WO 2008/151470, entitled “A Robust Human Face Detecting Method In Complicated Background Image” by Ding et al. (see page 5 of the specification).
  • the feature vector to be extracted is the feature vector on which the classifier used by the classifying unit 804 is based.
  • the determining unit 801 determines, for each feature of the feature vector, a plurality of first regions arranged along a direction of the first axis, and intersects with the first axis (eg, intersecting at a right angle or a non-orthogonal angle) A plurality of second regions arranged in the direction of the second axis.
  • the area arrangement of the first area and the second area on which the determining unit 801 is based may be the area arrangement described by the front combining determining unit 101.
  • The difference calculating unit 802 calculates the first difference between the pixel sums or means of the first regions and the second difference between the pixel sums or means of the second regions.
  • The gradient calculating unit 803 calculates the gradient magnitude and the gradient direction from the first difference and the second difference calculated by the difference calculating unit 802, to form the extracted features.
  • The gradient magnitude and gradient direction can be calculated using the method described above in connection with the gradient calculating unit 103, for example according to formulas (1) (or (1')) and (2).
  • a feature vector is formed for all features extracted from the input image.
  • the classification unit 804 classifies the input image based on the extracted feature vector.
  • the classifier employed by the classification unit 804 may be a classifier generated in the previous embodiment, such as a classifier generated using a directional gradient histogram, and a classifier generated based on a gradient direction interval.
  • FIG. 9 shows a flow chart showing a method 900 of classifying images in accordance with one embodiment of the present invention.
  • method 900 begins at step 901. Steps 903, 905, and 907 are used to extract a set of features from the current input image as feature vectors.
  • the feature vector to be extracted is the feature vector on which the classifier used is based.
  • the input image may be an image of a predetermined size obtained from the image to be processed through the scan window.
  • An image can be obtained by the method described in the patent application WO 2008/151470 to Ding et al., entitled "A Robust Human Face Detecting Method In Complicated Background Image" (see page 5 of the specification).
  • At step 903, for each feature of the feature vector, a plurality of first regions arranged along the direction of a first axis are determined, together with a plurality of second regions arranged along the direction of a second axis intersecting the first axis (e.g., at a right angle or a non-right angle).
  • The arrangement of the first regions and the second regions may be any of the area arrangements described above in connection with the determining unit 101.
  • The gradient magnitude and gradient direction are then calculated from the calculated first difference and second difference to form the extracted features.
  • The gradient direction and gradient magnitude can be calculated according to formulas (1) (or (1')) and (2).
  • step 909 it is determined whether there are unextracted features for the current input image. If yes, return to step 903 to perform the process of extracting the next feature; otherwise, step 911 is performed.
  • a feature vector is formed for all features extracted from the input image.
  • the input image is classified according to the extracted feature vector.
  • the classifier employed in step 911 may be a classifier generated in the previous embodiment, such as a classifier generated using a directional gradient histogram, a classifier generated based on a gradient direction interval.
  • Method 900 ends at step 913.
  • Figure 10 is a block diagram showing the structure of the classification unit 804 in accordance with a preferred embodiment of the present invention.
  • The classification unit 804 includes classifiers 1001 to 100M, where M is the number of features in the extracted feature vector; each classifier corresponds to one feature.
  • the classifiers 1001 to 100M may be classifiers previously described with reference to Fig. 6.
  • the classifier 1001 includes a plurality of sub-classifiers 1001-1 to 1001-N. As previously described with reference to Figure 6, each of the sub-classifiers 1001-1 through 1001-N corresponds to a different gradient direction interval, and each gradient direction interval has a corresponding threshold.
  • For each feature, the classification is performed by the one of the sub-classifiers (e.g., sub-classifiers 1001-1 through 1001-N) whose gradient direction interval the gradient direction of the feature belongs to.
  • That sub-classifier compares the gradient magnitude of the feature with the corresponding threshold of the gradient direction interval and produces a classification result according to the comparison.
  • the classification result can be the classification of the image (object, non-object). Alternatively, the classification result may also include the reliability of the image classification.
  • the classification results produced by the respective classifiers according to the corresponding features of the feature vectors can be combined into a final classification result by a known method.
  • the Adaboost method can be used.
  • FIG. 11 shows a flow chart of a classification method in accordance with a preferred embodiment of the present invention. This method can be used to implement step 911 of Figure 9.
  • the method begins at step 1101.
  • step 1103 for a feature of the extracted feature vector, a gradient direction interval to which the gradient direction of the feature belongs in a plurality of gradient direction intervals (as described with reference to Fig. 6) associated with the feature is determined.
  • each gradient direction interval has a corresponding threshold.
  • step 1105 the gradient magnitude of the feature and the corresponding threshold of the determined gradient direction interval are compared.
  • At step 1107, a classification result is generated based on the comparison result.
  • the classification result can be the classification of the image (object, non-object).
  • the classification result may also include the reliability of the image classification.
  • step 1109 it is determined if there are any unprocessed features in the feature vector. If so, return to step 1103 to continue processing the next feature. If not, the method ends in step 1111.
  • FIG. 12 is a block diagram showing an exemplary structure of a computer in which the present invention is implemented.
  • the device and method implementation environment of the present invention is illustrated in FIG.
  • The central processing unit (CPU) 1201 executes various processes in accordance with a program stored in the read-only memory (ROM) 1202 or a program loaded from the storage portion 1208 into the random access memory (RAM) 1203.
  • Data required when the CPU 1201 executes the various processes is also stored in the RAM 1203 as needed.
  • the CPU 1201, the ROM 1202, and the RAM 1203 are connected to each other via a bus 1204.
  • Input/output interface 1205 is also coupled to bus 1204.
  • the following components are connected to the input/output interface 1205: an input portion 1206 including a keyboard, a mouse, etc.; an output portion 1207 including a display such as a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker And so on; a storage portion 1208 including a hard disk or the like; and a communication portion 1209 including a network interface card such as a LAN card, a modem, and the like.
  • the communication section 1209 performs communication processing via a network such as the Internet.
  • the driver 1210 is also connected to the input/output interface 1205 as needed.
  • A removable medium 1211, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1210 as needed, so that a computer program read therefrom is installed into the storage portion 1208 as needed.
  • In that case, a program constituting the software is installed from a network such as the Internet, or from a storage medium such as the removable medium 1211.
  • Such a storage medium is not limited to the removable medium 1211 shown in Fig. 12, in which the program is stored and which is distributed separately from the apparatus to provide the program to the user.
  • Examples of the removable medium 1211 include a magnetic disk, an optical disk (including a CD-ROM and a digital versatile disk (DVD)), a magneto-optical disk (including a MiniDisc (MD)), and a semiconductor memory.
  • Alternatively, the storage medium may be the ROM 1202, a hard disk included in the storage portion 1208, or the like, in which the program is stored and which is distributed to the user together with the apparatus containing it.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Description

Method and apparatus for classifying images
Technical Field
[01] The present invention relates to the classification of videos or images (as containing an object or not containing an object), that is, to the detection or recognition of objects in videos or images, and more particularly to a method and apparatus for generating a classifier for distinguishing whether a video or image contains an object to be detected, and to a method and apparatus for classifying an image using the generated classifier.
Background Art
[02] With the growing popularity of applications such as video surveillance, artificial intelligence, and computer vision, there is an increasing demand for techniques for detecting specific objects, such as people, animals, and vehicles, appearing in videos and images. Among the methods for detecting objects in videos or images, one known class of methods uses static image features to build a classifier for distinguishing whether a video or image contains an object or a non-object, and then uses that classifier to classify images, i.e., to detect objects in images; for video, each frame is treated as an image for detection.
[03] Paul Viola and Michael Jones disclosed one such technique in "Robust Real-time Object Detection", Second International Workshop On Statistical And Computational Theories Of Vision - Modeling, Learning, Computing, And Sampling, Vancouver, Canada, July 13, 2001. In the technique of Paul Viola et al., differences between the pixel sums of rectangular blocks are extracted from the image as features, the AdaBoost method selects from the extracted features those better suited to distinguishing objects from non-objects to form weak classifiers, and strong classifiers are formed by fusing the weak classifiers. Such methods are fairly well suited to detecting objects such as human faces in images, but their robustness in detecting objects such as people is not very high.
Summary of the Invention
[04] In view of the above deficiencies of the prior art, the present invention aims to provide a method and apparatus for generating a classifier, and a method and apparatus for classifying images, so as to improve the robustness of object detection in images.
[05] One embodiment of the present invention is a method of generating a classifier for distinguishing between an object image and a non-object image, comprising: extracting a set of features as a feature vector from each of a plurality of input images, wherein the extracting comprises: for each feature of the feature vector, determining a plurality of first regions arranged along the direction of a first axis, and a plurality of second regions arranged along the direction of a second axis intersecting the first axis; calculating a first difference between the pixel sums or means of the plurality of first regions, and a second difference between the pixel sums or means of the plurality of second regions; and calculating a gradient magnitude and a gradient direction from the first difference and the second difference, to form each feature; and training the classifier from the extracted feature vectors.
[06] Another embodiment of the present invention is an apparatus for generating a classifier for distinguishing between an object image and a non-object image, wherein the apparatus extracts a set of features as a feature vector from each of a plurality of input images, the apparatus comprising: a determining unit which, for each feature of the feature vector, determines a plurality of first regions arranged along the direction of a first axis and a plurality of second regions arranged along the direction of a second axis intersecting the first axis; a difference calculating unit which calculates a first difference between the pixel sums or means of the plurality of first regions and a second difference between the pixel sums or means of the plurality of second regions; a gradient calculating unit which calculates a gradient magnitude and a gradient direction from the first difference and the second difference, to form each feature; and a training unit which trains the classifier from the extracted feature vectors.
[07] According to the above embodiments of the invention, because features comprising a gradient direction and a gradient magnitude are calculated from the pixels of regions arranged along two directions, the extracted features can more realistically reflect the distribution of object edges in the corresponding image portions. A classifier generated from such features can more robustly detect objects such as people or animals in an image, particularly objects having various postures.
[08] Further, in the above method and apparatus, each region may be a rectangular region, wherein the first regions are contiguous and the second regions are also contiguous.
[09] In the above method and apparatus, where the number of first regions and the number of second regions are both two, the first regions are contiguous, and the second regions are contiguous, the intersection of the first axis and the second axis is on the connection line of the first regions or within a predetermined range of their connection point, and on the connection line of the second regions or within a predetermined range of their connection point.
[10] In the above method and apparatus, where the number of first regions and the number of second regions are both two, the first regions are spaced apart, and the second regions are spaced apart, the intersection of the first axis and the second axis is within a predetermined range of the midpoint between the position centers of the first regions and of the midpoint between the position centers of the second regions.
[11] In the above method and apparatus, where the number of first regions and the number of second regions are both three, the intersection of the first axis and the second axis lies within the middle first region of the first regions and within the middle second region of the second regions.
[12] In the above method and apparatus, the difference between the area arrangements on which at least two features are based includes one or more of the following: the relative positional relationship of the regions, the number of regions, the shape of the regions, the size of the regions, and the aspect ratio of the regions. This makes the features available for examination richer, which makes it easier to select features suited to distinguishing objects from non-objects.
[13] In the above method and apparatus, features of at least one dimension of the plurality of feature vectors are converted, the converted features including a gradient direction and a gradient magnitude, and the conversion including converting the gradient direction into the interval, among a plurality of predetermined intervals, to which the gradient direction belongs. For each of the at least one dimension, a classifier is generated that includes sub-classifiers corresponding respectively to the predetermined intervals, wherein, for each predetermined interval, the threshold of the corresponding sub-classifier is obtained from the distribution of the gradient magnitudes of the features, of that dimension, in the feature vectors whose interval is the same as the predetermined interval.
[14] Another embodiment of the present invention is a method of classifying an image, comprising: extracting a set of features as a feature vector from the image, wherein the extracting comprises: for each feature of the feature vector, determining a plurality of first regions arranged along the direction of a first axis, and a plurality of second regions arranged along the direction of a second axis intersecting the first axis; calculating a first difference between the pixel sums or means of the plurality of first regions, and a second difference between the pixel sums or means of the plurality of second regions; and calculating a gradient magnitude and a gradient direction from the first difference and the second difference, to form each feature; and classifying the image according to the extracted feature vector.
[15] Another embodiment of the present invention is an apparatus for classifying an image, comprising: feature extracting means for extracting a set of features as a feature vector from the image, including: a determining unit which, for each feature of the feature vector, determines a plurality of first regions arranged along the direction of a first axis and a plurality of second regions arranged along the direction of a second axis intersecting the first axis; a difference calculating unit which calculates a first difference between the pixel sums or means of the plurality of first regions and a second difference between the pixel sums or means of the plurality of second regions; and a gradient calculating unit which calculates a gradient magnitude and a gradient direction from the first difference and the second difference, to form each feature; and a classifying unit which classifies the image according to the extracted feature vector.
[16] In the above method and apparatus, as described above, because the gradient of an image portion can be calculated from the pixels of a plurality of regions, the extracted features can more completely reflect the distribution of object edges in the corresponding image portion and are less affected by changes in the posture of the object. A classifier generated from such features can more robustly detect objects such as people or animals in an image, particularly objects having various postures.
[17] In the above method and apparatus, each region may be a rectangular region, wherein the first regions are contiguous and the second regions are contiguous.
[18] In the above method and apparatus, where the number of first regions and the number of second regions are both two, the first regions are contiguous, and the second regions are contiguous, the intersection of the first axis and the second axis is on the connection line of the first regions or within a predetermined range of their connection point, and on the connection line of the second regions or within a predetermined range of their connection point.
[19] In the above method and apparatus, where the number of first regions and the number of second regions are both two, the first regions are spaced apart, and the second regions are spaced apart, the intersection of the first axis and the second axis is within a predetermined range of the midpoint between the position centers of the first regions and of the midpoint between the position centers of the second regions.
[20] In the above method and apparatus, where the number of first regions and the number of second regions are both three, the intersection of the first axis and the second axis lies within the middle first region of the first regions and within the middle second region of the second regions.
[21] Further, in the above method and apparatus, the difference between the area arrangements on which at least two features are based includes one or more of the following: the relative positional relationship of the regions, the number of regions, the shape of the regions, the size of the regions, and the aspect ratio of the regions. This makes the features available for examination richer, which makes it easier to select features suited to distinguishing objects from non-objects.
[22] Further, in the above method and apparatus, classifying the image comprises: for the gradient direction and gradient magnitude of each feature, determining the gradient direction interval, among a plurality of gradient direction intervals, to which the gradient direction belongs, each gradient direction interval having a corresponding threshold; comparing the gradient magnitude with the corresponding threshold of the determined gradient direction interval; and producing a classification result according to the comparison.
Brief Description of the Drawings
[23] The above and other objects, features, and advantages of the present invention will be more readily understood from the following description of embodiments of the invention taken in conjunction with the accompanying drawings. In the drawings, identical or corresponding technical features or components are denoted by identical or corresponding reference numerals. The sizes and relative positions of units are not necessarily drawn to scale in the drawings.
[24] Fig. 1 is a block diagram showing the structure of an apparatus for generating a classifier for distinguishing between an object image and a non-object image according to an embodiment of the present invention.
[25] Figs. 2a to 2h are schematic diagrams showing examples of area arrangements determined by the determining unit.
[26] Fig. 3a shows an example of the distribution of the edge contour of an object (a human body).
[27] Figs. 3b and 3c are schematic diagrams showing the first regions and the second regions determined, based on the area arrangements shown in Figs. 2a and 2b respectively, in the portion shown in Fig. 3a.
[28] Fig. 4a is a schematic diagram showing the object edge contour contained in the portion 302 shown in Fig. 3a.
[29] Fig. 4b is a schematic diagram showing the gradient calculated by the gradient calculating unit from the first difference and the second difference calculated by the difference calculating unit on the basis of the first regions and the second regions shown in Figs. 3b and 3c.
[30] Fig. 5 shows a flow chart of a method of generating a classifier for distinguishing between an object image and a non-object image according to an embodiment of the present invention.
[31] Fig. 6 is a block diagram showing the structure of a training unit for generating a classifier for distinguishing between an object image and a non-object image according to a preferred embodiment of the present invention.
[32] Fig. 7 shows a flow chart of a training method for generating a classifier for distinguishing between an object image and a non-object image according to a preferred embodiment of the present invention.
[33] Fig. 8 is a block diagram showing the structure of an apparatus for classifying images according to an embodiment of the present invention.
[34] Fig. 9 shows a flow chart of a method of detecting an object in an image according to an embodiment of the present invention.
[35] Fig. 10 is a block diagram showing the structure of a classifying unit according to a preferred embodiment of the present invention.
[36] Fig. 11 shows a flow chart of a classification method according to a preferred embodiment of the present invention.
[37] Fig. 12 is a block diagram showing an exemplary structure of a computer in which the present invention is implemented.
Detailed Description of Embodiments
[38] Embodiments of the present invention are described below with reference to the drawings. It should be noted that, for the sake of clarity, representations and descriptions of components and processing that are unrelated to the present invention and known to those of ordinary skill in the art are omitted from the drawings and the description.
[39] Fig. 1 is a block diagram showing the structure of an apparatus 100 for generating a classifier for distinguishing between an object image and a non-object image according to an embodiment of the present invention.
[40] As shown in Fig. 1, the apparatus 100 includes a determining unit 101, a difference calculating unit 102, a gradient calculating unit 103, and a training unit 104.
[41] In the technique of building a classifier from static image features, object images and non-object images are collected, features are extracted from the collected object images and non-object images, and the extracted features are selected and fused by the AdaBoost method to obtain a classifier for distinguishing object images from non-object images. A method of collecting and preparing such object images and non-object images is disclosed in the patent application WO 2008/151470 by Ding et al., entitled "A Robust Human Face Detecting Method In Complicated Background Image" (see pages 2 to 3 of the specification). The collected and prepared object images and non-object images can serve as input images of the apparatus 100. The apparatus 100 extracts a set of features as a feature vector from each of the plurality of input images.
[42] For each feature of the feature vector, the determining unit 101 determines a plurality of first regions arranged along the direction of a first axis, and a plurality of second regions arranged along the direction of a second axis intersecting the first axis (e.g., at a right angle or a non-right angle).
[43] The features to be extracted are generally based on pixels in the input image. The determining unit 101 is used to determine the pixels in the input image on which each feature to be extracted is based. The determining unit 101 can determine these pixels according to a predetermined area arrangement.
[44] The first regions and the second regions can be arranged in various ways. In one example, the weighted average position of the positions of the pixels of the plurality of first regions and the weighted average position of the positions of the pixels of the plurality of second regions lie within a predetermined range of the intersection of the first axis and the second axis. Taking the first regions as an example, the position of a pixel of the first regions can be denoted (x_ij, y_ij), where x_ij denotes the coordinate, on the first axis (the X axis), of the j-th pixel in the i-th first region, and y_ij denotes its coordinate on the second axis (the Y axis). The weighted average position (x_a, y_a) of the positions of the pixels of the first regions can be defined as follows:
[45] x_a = Σ_{i=1..N} Σ_{j=1..M_i} x_ij · w_i ,  y_a = Σ_{i=1..N} Σ_{j=1..M_i} y_ij · w_i
[46] where N is the number of first regions, M_i is the number of pixels in the i-th first region, w_i is the weight of the i-th first region, and Σ_i w_i = 1.
[47] Further, or alternatively, in the above example, the weights of all the first regions may be the same or may be at least partially different. Where they differ, a smaller weight may be assigned to a first region containing more pixels, and a larger weight to a first region containing fewer pixels.
[48] Although the weighted average position has been explained above by taking the first regions as an example, the above explanation also applies to the second regions.
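As an illustration of the constraint described in paragraphs [44] to [48], the following Python sketch computes a weighted average position for a set of rectangular regions; representing each region by its centroid and the default choice of weights are assumptions made for the example, not prescriptions of the patent.

```python
import numpy as np

def weighted_average_position(regions, weights=None):
    """Weighted average position (xa, ya) of the pixels of a set of regions.

    regions: list of (x0, y0, w, h) rectangles, one per region (an
             illustrative representation; any pixel set with known
             coordinates would do).
    weights: one weight per region, summing to 1; if omitted, larger regions
             receive smaller weights (cf. paragraph [47]).
    """
    counts = np.array([w * h for (_, _, w, h) in regions], dtype=float)
    if weights is None:
        weights = 1.0 / counts
        weights /= weights.sum()
    weights = np.asarray(weights, dtype=float)

    # Centroid of each rectangular region (mean of its pixel coordinates).
    cx = np.array([x0 + (w - 1) / 2.0 for (x0, _, w, _) in regions])
    cy = np.array([y0 + (h - 1) / 2.0 for (_, y0, _, h) in regions])

    # Weighted average over the regions.
    return float(np.dot(weights, cx)), float(np.dot(weights, cy))

# Example: two 8x8 first regions meeting side by side on the X axis (cf. Fig. 2a).
print(weighted_average_position([(0, 0, 8, 8), (8, 0, 8, 8)]))  # (7.5, 3.5)
```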
[49] 在另一个例子中, 区域可以是矩形区域, 第一区域是相接的, 并且第 二区域是相接的。
[50] 图 2是示出确定单元 101所确定的区域布置的其它例子的示意图。在 图 2中, X轴表示第一轴, Y轴表示第二轴, 并且矩形块的白色和黑色只 是用于区分的目的。 虽然图 2中的第一轴和第二轴被示出为相互正交的, 然而第一轴和第二轴也可以以非直角的角度相交。
[51] 根据一种区域布置,第一区域的数目和第二区域的数目均为二,第一 区域是相接的并且第二区域是相接的。在这种布置中, 第一轴和第二轴的 交点在第一区域的连接线上或连接点 (例如当矩形区域的顶点相接时)的 预定范围内 (例如基本重合), 并且在第二区域的连接线上或连接点的预定 范围内。
[52] 图 2a和图 2b示出了这种区域布置的一个例子。 具体地, 图 2a示出 了在第一轴上第一区域的布置, 其中白色矩形块 201 和黑色矩形块 202 均表示第一区域并且在连接线上相接,而第一轴和第二轴的交点在连接线 上。 图 2b示出了在第二轴上第二区域的布置, 其中白色矩形块 203和黑 色矩形块 204均表示第二区域并且在连接线上相接,而第一轴和第二轴的 交点在连接线上。虽然图 2a和图 2b中分别示出了第一轴和第二轴上区域 的布置, 但实际上反映的是将图 2a和图 2b合并, 即图 2a的第一轴和第 二轴分别与图 2b的第一轴和第二轴相同时的区域布置。 可选地, 矩形块 201与 202, 以及矩形块 203与 204可以通过各自的顶点彼此相接。
[53] 根据另一种区域布置, 第一区域的数目和第二区域的数目均为二, 第一区域是间隔开的并且第二区域是间隔开的。在这种布置中, 第一轴和第二轴的交点在第一区域的位置中心之间的中点和第二区域的位置中心之间的中点的预定范围内。
[54] 图 2c和图 2d示出了这种区域布置的一个例子。 图 2c示出了在第一 轴上第一区域的布置,其中白色矩形块 205和黑色矩形块 206均表示第一 区域并且是间隔开的,而第一轴和第二轴的交点在白色矩形块 205和黑色 矩形块 206的位置中心之间的中点的预定范围内。 图 2d示出了在第二轴 上第二区域的布置,其中白色矩形块 207和黑色矩形块 208均表示第二区 域并且是间隔开的,而第一轴和第二轴的交点在白色矩形块 207和黑色矩 形块 208的位置中心之间的中点的预定范围内。虽然图 2c和图 2d中分别 示出了第一轴和第二轴上区域的布置, 但实际上反映的是将图 2c和图 2d 合并,即图 2c的第一轴和第二轴分别与图 2d的第一轴和第二轴相同时的 区域布置。
[55] 图 2g和图 2h示出了这种区域布置的另一个例子, 其中矩形块的顶点相对。 图 2g示出了在第一轴上第一区域的布置, 其中白色矩形块 215和黑色矩形块 216均表示第一区域并且是间隔开的, 而第一轴和第二轴的交点在白色矩形块 215和黑色矩形块 216的位置中心之间的中点的预定范围内。 图 2h示出了在第二轴上第二区域的布置, 其中白色矩形块 217和黑色矩形块 218均表示第二区域并且是间隔开的, 而第一轴和第二轴的交点在白色矩形块 217和黑色矩形块 218的位置中心之间的中点的预定范围内。 虽然图 2g和图 2h中分别示出了第一轴和第二轴上区域的布置, 但实际上反映的是将图 2g和图 2h合并, 即图 2g的第一轴和第二轴分别与图 2h的第一轴和第二轴相同时的区域布置。
[56] 根据另一种区域布置, 第一区域的数目和第二区域的数目均为三。在这种布置中, 第一轴和第二轴的交点分别在第一区域中居于中间的第一区域内和第二区域中居于中间的第二区域内。
[57] 图 2e和图 2f示出了这种区域布置的一个例子。 图 2e示出了在第一 轴上第一区域的布置, 其中白色矩形块 210和黑色矩形块 209、 211均表 示第一区域,并且第一轴和第二轴的交点在居中的白色矩形块 210内。 图 2f示出了在第二轴上第二区域的布置,其中白色矩形块 213和黑色矩形块 212、 214 均表示第二区域, 并且第一轴和第二轴的交点在居中的白色矩 形块 213内。 虽然图 2e和图 2f中分别示出了第一轴和第二轴上区域的布 置, 但实际上反映的是将图 2e和图 2f合并, 即图 2e的第一轴和第二轴 分别与图 2f的第一轴和第二轴相同时的区域布置。 可选地, 矩形块 209、 210与 211, 以及矩形块 212、 213与 214可以是分离的, 而不是相接的。
[58] 需要注意,第一区域和第二区域的形状并不限于矩形,也可以是其它 形状, 例如多边形、 三角形、 圆形、 环形、 不规则形状。 第一区域和第二 区域的形状也可以是不同的, 并且不同第一 /第二区域的形状也可以是不 同的。
[59] 另外,在具有矩形形状的情况下,第一区域中的不同区域的边可以是 彼此平行的, 也可以是彼此相对旋转一个角度。 同样地, 在具有矩形形状 的情况下, 第二区域中的不同区域的边可以是彼此平行的,也可以是彼此 相对旋转一个角度。在具有矩形形状的情况下,矩形区域的相接包括通过 各自的边来相接 (即第一轴和第二轴的交点在这些边上), 和通过各自的角 部的顶点相接 (即第一轴和第二轴的交点在这些顶点处)。
[60] 还应注意,第一轴上布置的第一区域和第二轴上布置的第二区域的数 目不限于图 2所示的数目,并且第一区域的数目不必与第二区域的数目相 同,只要第一区域的像素的位置的加权平均位置以及第二区域的像素的位 置的加权平均位置在第一轴和第二轴的交点的预定范围内。优选地, 第一 区域的数目和第二区域的数目均不超过 3。
[61] 还应注意,第一区域的相对位置关系和第二区域的相对位置关系可以 是任意的, 例如第一轴上布置的第一区域可以是相接的、分离的、部分相 接的、 部分分离的, 第二轴上布置的第二区域可以是相接的、 分离的、 部 分相接的、部分分离的,只要第一区域的像素的位置的加权平均位置以及 第二区域的像素的位置的加权平均位置在第一轴和第二轴的交点的预定 范围内。
[62] 在收集的对象图像中, 对象的边缘轮廓表现出区别于非对象的特征。 对象的边缘轮廓在对象图像中可能具有各种分布。为了能够提取出足够的反映对象的边缘轮廓的特征, 确定单元 101可以在输入图像的不同位置处的不同大小的部分内确定第一区域和第二区域, 以获得该部分内的边缘轮廓特征。
[63] 图 3a 示出了对象(人体)的边缘轮廓的分布的一个例子。 如图 3a所示, 在输入图像中, 人体的边缘轮廓存在于例如部分 301、 302、 303的大小不同、 位置不同的各个部分中。
[64] 图 3b和 3c示出了基于图 2a和 2b示出的区域布置在图 3a示出的部分 302中确定第一区域和第二区域的示意图。 在图 3b中, 附图标记 304指示第一区域的布置。 在图 3c中, 附图标记 305指示第二区域的布置。
[65] 在一个实施例中,确定单元 101可以基于一种区域布置在输入图像的 不同位置确定第一区域和第二区域。接着通过改变这种区域布置中区域大 小和 /或区域纵横比来得到新的区域布置, 并且基于新的区域布置在输入 图像的不同位置确定第一区域和第二区域。重复此过程,直到这种区域布 置的所有可能区域大小或区域纵横比均被尝试过。
[66] 另外, 或可选地, 在上述实施例中, 确定单元 101可以通过改变区域 布置中区域的相对位置关系来得到新的区域布置。
[67] 另外, 或可选地, 在上述实施例中, 确定单元 101可以通过改变区域 布置中区域的数目来得到新的区域布置。
[68] 另外, 或可选地, 在上述实施例中, 确定单元 101可以通过改变区域 布置中区域的形状来得到新的区域布置。
[69] 确定单元 101 基于一种区域布置在输入图像中的一个位置确定的第 一区域和第二区域决定了一个要提取的特征。概括地讲,至少两个特征所 基于的区域布置是不同的。例如, 不同区域布置之间的差别可以包括下述 中的一个或多个: 区域的相对位置关系、 区域的数目、 区域的形状、 区域 的大小、 区域的纵横比。
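为了直观说明在不同位置、不同区域大小下确定第一区域和第二区域的遍历过程, 下面给出一个示意性的 Python 草图; 其中窗口尺寸、步长、候选边长等数值均为示例性假设, 且仅以图 2a/2b 类型 (两个相接矩形) 的布置为例, 纵横比、区域数目、区域相对位置等的变化可按同样思路扩展。

```python
from itertools import product

# 示意性草图: 在检测窗口内枚举不同位置、不同大小的区域布置
# 每个布置由一个 s x s 的方块给出: 第一区域为其左、右两半, 第二区域为其上、下两半,
# 两组区域的连接线都经过方块中心, 即第一轴与第二轴的交点
def enumerate_arrangements(win_w, win_h, sizes=(8, 12, 16), step=2):
    arrangements = []
    for s in sizes:
        half = s // 2
        for x, y in product(range(0, win_w - s + 1, step),
                            range(0, win_h - s + 1, step)):
            first = [(x, y, half, s), (x + half, y, half, s)]   # 沿第一轴相接的两个矩形
            second = [(x, y, s, half), (x, y + half, s, half)]  # 沿第二轴相接的两个矩形
            arrangements.append((first, second))
    return arrangements
```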
[70] 回到图 1,对于确定单元 101基于每个区域布置在输入图像中的每个 位置确定的第一区域和第二区域,差计算单元 102计算第一区域的像素和 或均值(灰度)之间的第一差 dx, 和第二区域的像素和或均值(灰度) 之间的第二差 dy。
[71] 例如,对于图 2a和 2b示出的区域布置,可以通过下式计算第一差和 第二差:
第一差 =矩形块 202的像素和或均值 -矩形块 201的像素和或均值, 第二差 =矩形块 204的像素和或均值 -矩形块 203的像素和或均值。
[72] 再例如,对于图 2c和 2d示出的区域布置,可以通过下式计算第一差 和第二差:
第一差 =矩形块 206的像素和或均值 -矩形块 205的像素和或均值, 第二差 =矩形块 208的像素和或均值 -矩形块 207的像素和或均值。
[73] 再例如, 对于图 2e和 2f示出的区域布置, 可以通过下式计算第一差 和第二差:
第一差 =矩形块 209的像素和或均值 +矩形块 211的像素和或均值 - 矩形块 210的像素和或均值 x2,
第二差 =矩形块 212的像素和或均值 +矩形块 214的像素和或均值 -矩 形块 213的像素和或均值 x2。
[74] 再例如, 对于图 2g和 2h示出的区域布置, 可以通过下式计算第一差和第二差:
第一差 =矩形块 216的像素和或均值 -矩形块 215的像素和或均值, 第二差 =矩形块 218的像素和或均值 -矩形块 217的像素和或均值。
[75] 计算轴向上区域的像素和或均值(灰度)之间的差的目的是获得反映 相应轴向上像素灰度的变化的信息。对于不同的区域布置,可以有相应的 计算第一差和第二差的方法, 只要其能够反映这种变化。
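下面给出按上述示例公式计算第一差 dx 和第二差 dy 的一个示意性 Python 草图; 这里用积分图求矩形像素和只是常见的加速手段之一, 函数名与矩形表示 (x, y, w, h) 均为示例性假设, 并非本发明限定的实现。

```python
import numpy as np

# 示意性草图: 用积分图求矩形区域的像素和, 并按图 2a/2b 类型的布置计算 dx、dy
def make_integral(gray):
    return np.asarray(gray, dtype=np.int64).cumsum(axis=0).cumsum(axis=1)

def region_sum(ii, x, y, w, h):
    # 返回矩形 (x, y, w, h) 内的像素和, ii 为积分图
    a = ii[y - 1, x - 1] if x > 0 and y > 0 else 0
    b = ii[y - 1, x + w - 1] if y > 0 else 0
    c = ii[y + h - 1, x - 1] if x > 0 else 0
    return ii[y + h - 1, x + w - 1] - b - c + a

def diffs_fig_2a_2b(ii, first, second):
    # first = [白色块 201, 黑色块 202], second = [白色块 203, 黑色块 204]
    dx = region_sum(ii, *first[1]) - region_sum(ii, *first[0])    # 第一差
    dy = region_sum(ii, *second[1]) - region_sum(ii, *second[0])  # 第二差
    return dx, dy
```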
[76] 回到图 1,梯度计算单元 103根据差计算单元计算的第一差和第二差 计算梯度大小和梯度方向, 以形成所提取的特征。
[77] 可根据下式计算梯度的方向和大小:

梯度方向 $= \arctan\left(\dfrac{dx}{dy}\right)$  (1),

梯度大小 $= \sqrt{dx^2 + dy^2}$  (2)。

[78] 根据上式 (1), 梯度方向的角度范围为 0到 180度。 在一个可选实施例中, 可以根据下式计算梯度方向:

梯度方向 $= \arctan2(dx, dy)$  (1')。

[79] 根据上式 (1'), 梯度方向的角度范围为 0到 360度。
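以下是按式 (1)、(1') 和 (2) 由 dx、dy 计算梯度大小和梯度方向的示意性 Python 草图; 把角度归一化到 0~180 度或 0~360 度的具体方式是示例性假设之一。

```python
import math

# 示意性草图: 根据第一差 dx 和第二差 dy 计算梯度大小与梯度方向 (单位: 度)
def gradient_feature(dx, dy, full_range=False):
    magnitude = math.hypot(dx, dy)                  # 式 (2): sqrt(dx^2 + dy^2)
    if full_range:
        theta = math.atan2(dx, dy) % (2 * math.pi)  # 式 (1'): 0 到 360 度
    else:
        theta = math.atan2(dx, dy) % math.pi        # 式 (1): 0 到 180 度
    return magnitude, math.degrees(theta)
```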
[80] 图 4a是示出图 3a所示的部分 302中所包含的对象边缘轮廓的示意图。如图 4a所示, 边缘 401示意性地表示在部分 302中包含的边缘轮廓。
[81] 图 4b是示出梯度计算单元 103根据差计算单元 102基于图 3b和 3c 所示的第一区域和第二区域计算的第一差和第二差所计算的梯度方向的 示意图。 在图 4b中, 斜线 402的法线 403表示所计算出的梯度方向。
[82] 由于根据沿两个方向布置的、协同定位的区域的像素来计算包含梯度 方向和梯度大小的特征,所提取的特征能够更加真实地反映相应图像部分 中对象边缘的分布。相应地, 基于这样的特征产生的分类器, 能够更加鲁 棒地检测图像中例如人或动物的对象, 尤其是具有各种姿态的对象。
[83] 针对每个输入图像提取的所有特征形成一个特征向量。
[84] 回到图 1, 训练单元 104根据所提取的特征向量训练出分类器。
[85] 可采用方向性梯度直方图, 通过例如 SVM (支持向量机) 的机器学习方法来根据上述实施例中获得的特征向量来训练出分类器。 在例如 Dalal等人的 "Histograms of Oriented Gradients for Human Detection", Proc. of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005: 886-893 和 Triggs等人的 "Human Detection Using Oriented Histograms of Flow and Appearance", Proc. European Conference on Computer Vision, 2006的文献中描述了这种根据梯度特征训练分类器的方法。
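作为说明, 下面给出一个根据提取的特征向量训练分类器的示意性 Python 草图; 这里假设使用 scikit-learn 的线性 SVM, 特征向量的组织方式 (例如直接拼接各特征的梯度大小与方向, 或先统计成方向性梯度直方图) 亦为示例性选择, 并非本发明限定的训练方法。

```python
import numpy as np
from sklearn.svm import LinearSVC

# 示意性草图: 用特征向量训练区分对象图像与非对象图像的分类器
# X: (样本数, 特征维数) 数组, 每行是一个输入图像的特征向量
# y: 标签数组, 1 表示对象图像, 0 表示非对象图像
def train_svm_classifier(X, y):
    clf = LinearSVC(C=1.0)
    clf.fit(np.asarray(X), np.asarray(y))
    return clf
```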
[86] 图 5 示出了根据本发明一个实施例的生成用于区分对象图像和非对 象图像的分类器的方法 500的流程图。
[87] 如图 5所示, 方法 500从步骤 501开始。 步骤 503、 505和 507用于从当前输入图像中提取一组特征作为特征向量。 在步骤 503, 对于特征向量的每个特征, 确定沿第一轴的方向布置的多个第一区域, 和沿与所述第一轴相交 (例如, 以直角或非直角相交)的第二轴的方向布置的多个第二区域。
[88] 如参照图 1所描述的, 可通过 Ding等人的标题为 "A Robust Human Face Detecting Method In Complicated Background Image"的专利申请 WO 2008/151470中公开的方法 (参见说明书第 2页至第 3页)来收集和准 备包括对象图像和非对象图像的输入图像。
[89] 第一区域和第二区域的布置可以是前面结合图 1 的实施例说明的区 域布置。
[90] 在步骤 503中,可以在输入图像的不同位置处的不同大小的部分内确 定第一区域和第二区域, 以获得该部分内的边缘轮廓特征。
[91] 在方法 500的一个修改实施例中,在步骤 503可以基于一种区域布置 在输入图像的不同位置确定第一区域和第二区域。接着通过改变这种区域 布置中区域大小和 /或区域纵横比来得到新的区域布置, 并且基于新的区 域布置在输入图像的不同位置确定第一区域和第二区域。重复此过程, 直 到这种区域布置的所有可能区域大小或区域纵横比均被尝试过。
[92] 另外, 或可选地, 在上述实施例中, 在步骤 503可以通过改变区域布 置中区域的相对位置关系来得到新的区域布置。
[93] 另外, 或可选地, 在上述实施例中, 在步骤 503可以通过改变区域布 置中区域的数目来得到新的区域布置。
[94] 另外, 或可选地, 在上述实施例中, 在步骤 503可以通过改变区域布 置中区域的形状来得到新的区域布置。
[95] 在步骤 503,基于一种区域布置在输入图像中的一个位置确定的第一 区域和第二区域决定了一个要提取的特征。概括地讲,至少两个特征所基 于的区域布置是不同的。例如, 不同区域布置之间的差别可以包括下述中 的一个或多个: 区域的相对位置关系、 区域的数目、 区域的形状、 区域的 大小、 区域的纵横比。
[96] 在步骤 505, 计算第一区域的像素和或均值之间的第一差, 和第二区 域的像素和或均值之间的第二差。可通过前面结合图 1的实施例描述的方 法来计算第一差和第二差。
[97] 接着在步骤 507, 根据计算的第一差和第二差计算梯度大小和梯度方向, 以形成所提取的特征。 可根据公式 (1) (或 (1')) 和 (2) 来计算梯度方向和梯度大小。
[98] 接着在步骤 509, 确定对于当前输入图像, 是否存在未提取的特征。 如果存在, 则返回步骤 503, 以执行提取下一个特征的过程; 否则, 执行 步骤 511。
[99] 在步骤 511, 确定是否还有未提取特征向量的输入图像。 如果有, 则 返回步骤 503, 以执行提取下一个输入图像的特征向量的过程; 否则, 方 法前进到步骤 513。
[100] 在方法 500中, 由于根据沿两个方向布置的、协同定位的区域的像素 来计算包含梯度方向和梯度大小的特征,所提取的特征能够更加真实地反 映相应图像部分中对象边缘的分布。相应地,基于这样的特征产生的分类 器, 能够更加鲁棒地检测图像中例如人或动物的对象,尤其是具有各种姿 态的对象。
[101] 针对每个输入图像提取的所有特征形成一个特征向量。
[102] 在步骤 513, 根据所提取的特征向量训练出分类器。
[103] 可采用方向性梯度直方图, 通过例如 SVM (支持向量机) 的机器学习方法来根据上述实施例中获得的特征向量来训练出分类器。 在例如 Dalal等人的 "Histograms of Oriented Gradients for Human Detection", Proc. of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005: 886-893 和 Triggs等人的 "Human Detection Using Oriented Histograms of Flow and Appearance", Proc. European Conference on Computer Vision, 2006的文献中描述了这种根据梯度特征训练分类器的方法。
[104] 方法 500在步骤 515结束。
[105] 如下面将要描述的,也可以不采用方向性梯度直方图来根据上述实施 例中获得的梯度特征来训练出分类器。
[106] 图 6的框图示出了根据本发明一个优选实施例的、生成用于区分对象 图像和非对象图像的分类器的训练单元 104的结构。
[107] 如图 6所示,训练单元 104包括转换单元 601和分类器生成单元 602。
[108] 转换单元 601对多个特征向量的至少一个维的特征进行转换,其中被 转换的特征包括梯度方向和梯度大小。例如,特征向量可以是前面参照图 1和图 5描述的实施例中产生的特征向量。 转换单元 601进行的转换包括 将梯度方向转换为多个预定区间中该梯度方向属于的区间。
[109] 例如, 梯度方向的角度范围(即多个预定区间的角度覆盖范围)为 180 度。 可以将这个范围划分为若干个预定区间 (也称为梯度方向区间), 例如 划分为 0到 60度、 60度到 120度和 120度到 180度三个区间。 当然, 也 可以进行其它的划分。梯度方向的角度范围也可以是 360度。预定区间的 数目优选为 3至 15。 预定区间的数目越大, 则角度划分越细致, 更加利 于得到更强分类能力 (更低的错误率), 但在检测时更容易产生过学习现 象, 使分类效果变差。 预定区间的数目越小, 则角度划分越粗, 分类能力 越弱, 但对角度变化越不敏感, 有利于提高姿势变化的鲁棒性。 可以根据 具体实现的需要在分类能力和姿势鲁棒性之间取得折衷,以确定预定区间 的数目。
[110] 转换单元 601根据特征的梯度方向所处于的区间,将梯度方向转换为 相应的区间。
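下面是转换单元把梯度方向映射到预定区间的一个示意性 Python 草图; 其中区间数、角度覆盖范围以及把弱梯度单独作为一个附加区间的处理方式 (参见后文 [119] 段) 都是示例性假设。

```python
# 示意性草图: 将梯度方向 (角度, 单位为度) 转换为预定区间的编号
# num_dir_bins: 方向区间数; angle_range: 180 或 360
# weak_threshold: 若给定, 梯度大小小于该阈值的特征归入编号 0 的弱梯度区间
def direction_to_bin(angle_deg, magnitude, num_dir_bins=3, angle_range=180, weak_threshold=None):
    offset = 0
    if weak_threshold is not None:
        if magnitude < weak_threshold:
            return 0                      # 代表弱梯度的区间
        offset = 1
    width = angle_range / num_dir_bins
    k = int((angle_deg % angle_range) // width)
    return offset + min(k, num_dir_bins - 1)
```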
[111] 假定有 N 个预定区间, 并且特征向量表示为 <f_1, ···, f_M>, 其中 f_i 包括梯度大小 I_i 和梯度方向 O_i。对于要转换的特征 f_i, 经过转换的特征表示为 f'_i, 其中 f'_i 包括梯度大小 I_i 和区间 K_i。
[112] 可以根据各个特征向量的同一个维的特征 f_i 来生成与该维相应的分类器。 该分类器可表示为 h_i(I, O), 其中 I 表示梯度大小, O 表示梯度方向。 分类器包括分别与 N 个预定区间 K_j 对应的 N 个子分类器 h_ij(I), 0<j<N+1, 用于对梯度方向属于相应预定区间的特征进行分类。 每个子分类器 h_ij(I) 具有相应的阈值 θ_ij 和基于该阈值确定的分类 a_ij 和 b_ij (对象、非对象)。 h_ij(I) 的处理可表示为: 如果 I<θ_ij, 则 h_ij(I)=a_ij; 否则 h_ij(I)=b_ij。 对于每个子分类器 h_ij(I), 可以根据各个转换的特征向量的特征 f'_i 中区间与区间 K_j 相同的特征的梯度大小的分布, 学习得到阈值 θ_ij 和分类 a_ij 与 b_ij。
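作为上述阈值学习过程的说明, 以下给出一个示意性的 Python 草图: 对每个区间 K_j, 在落入该区间的训练特征的梯度大小分布上搜索错误率最小的阈值 θ_ij 及判决方向; 具体的搜索方式与数据结构仅为示例性假设。

```python
import numpy as np

# 示意性草图: 针对某一维特征, 为每个梯度方向区间学习阈值子分类器
# bins: 各训练样本该维特征所属的区间编号; mags: 对应的梯度大小; labels: 1 为对象, 0 为非对象
def learn_sub_classifiers(bins, mags, labels, num_bins):
    bins, mags, labels = map(np.asarray, (bins, mags, labels))
    subs = []
    for j in range(num_bins):
        sel = (bins == j)
        m, l = mags[sel], labels[sel]
        best = None
        for theta in np.unique(m):
            for polarity in (1, -1):               # 判决方向: 大于/小于阈值判为对象
                pred = (polarity * (m - theta) >= 0).astype(int)
                err = float(np.mean(pred != l))
                if best is None or err < best[0]:
                    best = (err, theta, polarity)
        subs.append(None if best is None else {"theta": best[1], "polarity": best[2]})
    return subs
```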
[113] 分类器生成单元 602针对上述至少一个维的每个维, 生成包含分别对应于所述预定区间的子分类器的分类器, 其中, 对于每个所述预定区间, 根据所述特征向量中区间与所述预定区间相同的该维特征的梯度大小的分布, 获得相应子分类器的阈值和基于该阈值确定的分类。 可选地, 也可以进一步获得所确定的分类的可靠性的度量。 [114] 在一个简单实现中, 可以只针对一个维进行转换和分类器生成, 所生成的分类器作为用于区分对象图像和非对象图像的分类器。 [115] 优选地, 上述至少一个维可以包括特征向量的至少两个维或所有维。 在这样的情况下, 可以分别生成与每一个维相应的分类器, 并且根据生成的各个分类器获得最终的分类器。
[114] 在一个简单实现中,可以只针对一个维进行转换和分类器生成,所生 成的分类器作为用于区分对象图像和非对象图像的分类器。 [115] 优选地, 上述至少一个维可以包括特征向量的至少两个维或所有维。 在这样的情况下,可以分别生成与每一个维相应的分类器, 并且根据生成 的各个分类器获得最终的分类器。
[116] 可通过已知的方法来将对应于各个维的分类器组合成最终的分类器。 例如, Adaboost方法是一种用来分类的方法, 可用来把针对各个维生成 的分类器融合在一起, 组合出新的很强的分类器。
[117] 在 Adaboost方法中, 为每个样本设置权重, 通过迭代的方法组合分 类器。每次迭代时, 当分类器对某些样本正确分类时, 则减少这些样本的 权值; 当错误分类时, 则增加这些样本的权重, 让学习算法在后续的学习 中集中对比较难的训练样本进行学习,最终得到一个识别准确率理想的分 类器。
[118] 在 Paul Viola和 Michael Jones 的文章 "Robust Real-time Object Detection" , Second International Workshop On Statistical And Computational Theories Of Vision - Modeling, Learning, Computing, And Sampling, Vancouver, Canada, July 13, 2001中描述这种选择和融合 多个分类器以形成最终分类器的技术。
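下面给出按 AdaBoost 思路选择并融合弱分类器、同时更新样本权重的一个示意性 Python 草图; 弱分类器的输出接口、迭代轮数等均为示例性假设, 并非本发明限定的融合方法。

```python
import numpy as np

# 示意性草图: AdaBoost 式的弱分类器选择与样本权重更新
# weak_outputs: (弱分类器数, 样本数) 的 0/1 预测矩阵; y: 0/1 标签
def adaboost_select(weak_outputs, y, rounds=50):
    weak_outputs, y = np.asarray(weak_outputs), np.asarray(y)
    n = y.size
    w = np.full(n, 1.0 / n)                        # 初始样本权重
    strong = []
    for _ in range(rounds):
        errs = np.array([np.sum(w * (out != y)) for out in weak_outputs])
        t = int(np.argmin(errs))                   # 选取加权错误率最小的弱分类器
        eps = min(max(errs[t], 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1.0 - eps) / eps)    # 该弱分类器在强分类器中的权重
        miss = (weak_outputs[t] != y)
        w *= np.exp(np.where(miss, alpha, -alpha)) # 错分样本权重增大, 正确样本权重减小
        w /= w.sum()
        strong.append((t, float(alpha)))
    return strong
```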
[119] 在一个优选实施例中, 预定区间之一为代表弱梯度的区间。在这种情况下, 转换单元 601在特征的梯度大小小于预定阈值的情况下, 将梯度方向转换为代表弱梯度的区间。 对于和代表弱梯度的区间相应的子分类器, 无论梯度大小如何, 均将特征分类为非对象。
[120] 图 7示出了根据本发明一个优选实施例的、生成用于区分对象图像和 非对象图像的分类器的训练方法 700的流程图。
[121] 如图 7所示, 方法 700从步骤 701开始。 在步骤 703, 对多个特征向 量的至少一个维的特征进行转换,其中被转换的特征包括梯度方向和梯度 大小。例如,特征向量可以是前面参照图 1和图 5描述的实施例中产生的 特征向量。所进行的转换包括将梯度方向转换为多个预定区间中该梯度方 向属于的区间。
[122] 在步骤 705, 针对所转换的特征向量的当前维, 生成包含分别对应于所述预定区间的子分类器的分类器, 其中, 对于每个所述预定区间, 根据所述特征向量中区间与所述预定区间相同的当前维特征的梯度大小的分布, 获得相应子分类器的阈值和基于该阈值确定的分类。 可选地, 也可以进一步获得所确定的分类的可靠性的度量。 [123] 在步骤 707, 确定是否有未生成分类器的维。如果有, 则返回步骤 705生成下一个维的分类器; 否则方法在步骤 709结束。
[124] 在一个简单实现中,可以只针对一个维进行转换和分类器生成,所生 成的分类器作为用于区分对象图像和非对象图像的分类器。
[125] 优选地, 上述至少一个维可以包括特征向量的至少两个维或所有维。 在这样的情况下,可以分别生成与每一个维相应的分类器, 并且根据生成 的各个分类器获得最终的分类器。
[126] 可通过已知的方法来将对应于各个维的分类器组合成最终的分类器, 例如通过 Paul Viola等人的 AdaBoost方法根据所生成的分类器来形成最终的分类器。
[127] 在一个优选实施例中,预定区间之一为代表弱梯度的区间。在这种情 况下, 在步骤 703中, 在特征的梯度大小小于预定阈值的情况下, 将梯度 方向转换为代表弱梯度的区间。对于和代表弱梯度的区间相应的子弱分类 器, 无论梯度大小如何, 均将特征分类为非对象。
[128] 图 8的框图示出了根据本发明一个实施例的、对图像进行分类的设备 800的结构。
[129] 如图 8所示, 设备 800包括确定单元 801、 差计算单元 802、 梯度计 算单元 803和分类单元 804。
[130] 输入设备 800 的图像可以是通过扫描窗口从要处理的图像中获得的预定尺寸的图像。 可通过在 Ding 等人的标题为 "A Robust Human Face Detecting Method In Complicated Background Image"的专利申请 WO 2008/151470中描述的方法来获得图像 (参见说明书第 5页)。
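作为说明, 下面给出一个用扫描窗口从待处理图像中截取预定尺寸子图像的示意性 Python 草图; 窗口尺寸与步长均为示例性假设。

```python
# 示意性草图: 扫描窗口, 逐位置截取预定尺寸的子图像作为设备 800 的输入
def sliding_windows(image, win_w=64, win_h=128, step=8):
    h, w = image.shape[:2]
    for y in range(0, h - win_h + 1, step):
        for x in range(0, w - win_w + 1, step):
            yield x, y, image[y:y + win_h, x:x + win_w]
```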
[131] 在这个实施例中,所要提取的特征向量是分类单元 804所使用的分类 器所基于的特征向量。
[132] 确定单元 801对于所述特征向量的每个特征,确定沿第一轴的方向布 置的多个第一区域, 和沿与所述第一轴相交 (例如, 以直角或非直角相交) 的第二轴的方向布置的多个第二区域。
[133] 确定单元 801 所基于的第一区域和第二区域的区域布置可以是前面 结合确定单元 101描述的区域布置。
[134] 对于确定单元 801 基于每个区域布置在输入图像中的每个位置确定 的第一区域和第二区域, 差计算单元 802 计算第一区域的像素和或均值 (灰度)之间的第一差 dx, 和第二区域的像素和或均值(灰度)之间的 第二差 dy。 可根据公式( 1 ) (或( 1' ) )和( 2 )来计算梯度方向和梯度大 小。
[135] 梯度计算单元 803根据差计算单元 802计算的第一差和第二差计算梯 度大小和梯度方向, 以形成所提取的特征。可以采用前面结合梯度计算单 元 103描述的方法来计算梯度大小和梯度方向。
[136] 针对输入图像提取的所有特征形成一个特征向量。分类单元 804根据 所提取的特征向量,对输入图像进行分类。分类单元 804所采用的分类器 可以是在前面的实施例中生成的分类器,例如采用方向性梯度直方图生成 的分类器、 基于梯度方向区间生成的分类器。
[137] 图 9示出了根据本发明一个实施例的、对图像进行分类的方法 900的流程图。
[138] 如图 9所示, 方法 900从步骤 901开始。 步骤 903、 905和 907用于从当前输入图像中提取一组特征作为特征向量。所要提取的特征向量是所使用的分类器所基于的特征向量。输入图像可以是通过扫描窗口从要处理的图像中获得的预定尺寸的图像。 可通过在 Ding等人的标题为 "A Robust Human Face Detecting Method In Complicated Background Image"的专利申请 WO 2008/151470中描述的方法来获得图像 (参见说明书第 5页)。
[139] 在步骤 903, 对于特征向量的每个特征, 确定沿第一轴的方向布置的 多个第一区域, 和沿与所述第一轴相交 (例如, 以直角或非直角相交)的第 二轴的方向布置的多个第二区域。步骤 903所基于的第一区域和第二区域 的区域布置可以是前面结合确定单元 101描述的区域布置。
[140] 在步骤 905, 计算所述多个第一区域的像素和或均值之间的第一差, 和所述多个第二区域的像素和或均值之间的第二差, 其计算方法与前面结合步骤 505描述的相同。接着在步骤 907, 根据计算的第一差和第二差计算梯度大小和梯度方向, 以形成所提取的特征。 可根据公式 (1) (或 (1')) 和 (2) 来计算梯度方向和梯度大小。
[141] 接着在步骤 909, 确定对于当前输入图像, 是否存在未提取的特征。 如果存在, 则返回步骤 903, 以执行提取下一个特征的过程; 否则, 执行 步骤 911。
[142] 针对输入图像提取的所有特征形成一个特征向量。 在步骤 911, 根据 所提取的特征向量,对输入图像进行分类。 步骤 911所采用的分类器可以 是在前面的实施例中生成的分类器,例如采用方向性梯度直方图生成的分 类器、 基于梯度方向区间生成的分类器。 [143] 方法 900在步骤 913结束。
[144] 图 10的框图示出了根据本发明一个优选实施例的分类单元 804的结构。
[145] 如图 10所示, 分类单元 804包括分类器 1001至 100M, M为所提取的特征向量中特征的数目。 每个分类器对应于一个特征。 分类器 1001至 100M可以是前面参照图 6描述的分类器。 以分类器 1001为例, 分类器 1001包括多个子分类器 1001-1至 1001-N。 如前面参照图 6所描述的, 每个子分类器 1001-1至 1001-N对应于一个不同的梯度方向区间, 并且每个梯度方向区间具有相应的阈值。
[146] 对于所提取的特征向量的每个特征,在相应分类器 (例如分类器 1001) 中, 在该特征的梯度方向属于的一个子分类器 (例如子分类器 1001-1 至 1001-N之一)所对应的梯度方向区间的情况下, 由该子分类器比较该特征 的梯度大小和该梯度方向区间的相应阈值,并且根据比较结果产生分类结 果。 分类结果可以是图像的分类 (对象、 非对象)。 可选地, 分类结果还可 以包含图像分类的可靠性。
[147] 在未示出的单元中,可通过已知的方法,把各个分类器根据特征向量 的相应特征产生的分类结果组合成最终的分类结果。 例如可采用 Adaboost方法。
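作为检测阶段分类流程的示意, 下面的 Python 草图给出"确定梯度方向区间、与该区间阈值比较、产生并按权重组合分类结果"的一个可能实现; 其中 direction_to_bin 和子分类器的数据结构沿用前文草图中的示例性假设, 并非本发明限定的实现。

```python
# 示意性草图: 用按区间划分的子分类器对一个特征向量分类, 并按权重组合各维结果
# features: [(magnitude, angle_deg), ...]
# classifiers[i][j]: 第 i 维第 j 个区间的子分类器 (见前文 learn_sub_classifiers 的输出)
# alphas: 各维分类器的组合权重 (例如由 AdaBoost 学习得到)
def classify(features, classifiers, alphas, num_dir_bins=3):
    score = 0.0
    for i, (mag, ang) in enumerate(features):
        j = direction_to_bin(ang, mag, num_dir_bins=num_dir_bins)
        sub = classifiers[i][j]
        if sub is None:
            continue
        is_object = sub["polarity"] * (mag - sub["theta"]) >= 0
        score += alphas[i] if is_object else -alphas[i]
    return 1 if score >= 0 else 0          # 1: 对象, 0: 非对象
```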
[148] 图 11示出了根据本发明一个优选实施例的分类方法的流程图。 该方 法可用来实现图 9的步骤 911。
[149] 如图 11所示, 方法从步骤 1101开始。在步骤 1103,对于所提取的特 征向量的一个特征, 确定与该特征相关的多个梯度方向区间 (如参照图 6 所描述的)中该特征的梯度方向所属的梯度方向区间。 如参照图 6所描述 的, 每个梯度方向区间具有相应的阈值。
[150] 在步骤 1105, 比较该特征的梯度大小和所确定的梯度方向区间的相 应阈值。
[151] 在步骤 1107, 根据比较结果产生分类结果。 分类结果可以是图像的 分类 (对象、 非对象)。 可选地, 分类结果还可以包含图像分类的可靠性。
[152] 在步骤 1109, 确定特征向量中是否还有未处理的特征。 如果有, 则 返回步骤 1103继续处理下一个特征。如果没有,则方法在步骤 1111结束。
[153] 图 12是示出其中实现本发明的计算机的示例性结构的框图。 [154] 本发明的设备和方法实现环境如图 12所示。
[155] 在图 12中, 中央处理单元 (CPU)1201根据只读存储器 (ROM)1202中存储的程序或从存储部分 1208加载到随机存取存储器 (RAM)1203的程序执行各种处理。在 RAM 1203中, 也根据需要存储当 CPU 1201执行各种处理等等时所需的数据。
[156] CPU 1201、 ROM 1202和 RAM 1203经由总线 1204彼此连接。 输入 /输出接口 1205也连接到总线 1204。
[157] 下述部件连接到输入 /输出接口 1205: 输入部分 1206, 包括键盘、 鼠 标等等; 输出部分 1207, 包括显示器, 比如阴极射线管 (CRT)、 液晶显示 器 (LCD)等等, 和扬声器等等; 存储部分 1208, 包括硬盘等等; 和通信部 分 1209, 包括网络接口卡比如 LAN卡、调制解调器等等。通信部分 1209 经由网络比如因特网执行通信处理。
[158] 根据需要, 驱动器 1210也连接到输入 /输出接口 1205。 可拆卸介质 1211 比如磁盘、 光盘、 磁光盘、 半导体存储器等等根据需要被安装在驱动器 1210上, 使得从中读出的计算机程序根据需要被安装到存储部分 1208中。
[159] 在通过软件实现上述步骤和处理的情况下, 从网络比如因特网或存储介质比如可拆卸介质 1211安装构成软件的程序。
[160] 本领域的技术人员应当理解, 这种存储介质不局限于图 12所示的其中存储有程序、 与设备相分离地分发以向用户提供程序的可拆卸介质 1211。 可拆卸介质 1211 的例子包含磁盘、 光盘 (包含光盘只读存储器 (CD-ROM)和数字通用盘 (DVD))、 磁光盘 (包含迷你盘 (MD))和半导体存储器。 或者, 存储介质可以是 ROM 1202、 存储部分 1208中包含的硬盘等等, 其中存有程序, 并且与包含它们的设备一起被分发给用户。
[161] 在前面的说明书中参照特定实施例描述了本发明。 然而本领域的 普通技术人员理解, 在不偏离如权利要求书限定的本发明的范围的前 提下可以进行各种修改和改变。

Claims

权利 要求 书
1. 一种对图像进行分类的方法, 包括:
从所述图像中提取一组特征作为特征向量, 其中所述提取包括:
对于所述特征向量的每个特征,确定沿第一轴的方向布置的多个 第一区域, 和沿与所述第一轴相交的第二轴的方向布置的多个第二区域; 计算所述多个第一区域的像素和或均值之间的第一差,和所述多 个第二区域的像素和或均值之间的第二差; 和
根据所述第一差和第二差计算梯度大小和梯度方向,以形成所述 每个特征; 和
根据所述提取的特征向量, 对所述图像进行分类。
2. 如权利要求 1所述的方法, 其中所述区域为矩形区域, 所述第一 区域是相接的, 并且所述第二区域是相接的。
3. 如权利要求 1所述的方法, 其中,
在所述第一区域的数目和所述第二区域的数目均为二,所述第一区域 是相接的并且所述第二区域是相接的情况下,所述第一轴和第二轴的交点 在所述第一区域的连接线上或连接点的预定范围内,并且在所述第二区域 的连接线上或连接点的预定范围内;
在所述第一区域的数目和所述第二区域的数目均为二,所述第一区域 是间隔开的并且所述第二区域是间隔开的情况下,所述第一轴和第二轴的 交点在所述第一区域的位置中心之间的中点和所述第二区域的位置中心 之间的中点的预定范围内;
在所述第一区域的数目和所述第二区域的数目均为三的情况下,所述 第一轴和第二轴的交点分别在所述第一区域中居于中间的第一区域内和 所述第二区域中居于中间的第二区域内。
4. 如权利要求 1所述的方法, 其中至少两个所述特征所基于的区域 布置之间的差别包括下述中的一个或多个: 区域的相对位置关系、 区域的 数目、 区域的形状、 区域的大小、 区域的纵横比。
5. 如权利要求 1所述的方法, 其中对所述图像进行分类包括: 对于每个所述特征,确定与所述特征相关的多个梯度方向区间中所述 特征的梯度方向所属的梯度方向区间, 每个梯度方向区间具有相应的阈 值;
比较所述特征的梯度大小和所确定的梯度方向区间的相应阈值; 和根据比较结果产生分类结果。
6. 如权利要求 5所述的方法, 其中所述多个梯度方向区间的数目为 3至 15。
7. 如权利要求 5所述的方法, 其中多个梯度方向区间所覆盖的范围 为 180度或 360度。
8. 一种对图像进行分类的设备, 所述设备从所述图像中提取一组特 征作为特征向量, 并且包括:
确定单元,其对于所述特征向量的每个特征,确定沿第一轴的方向布 置的多个第一区域,和沿与所述第一轴相交的第二轴的方向布置的多个第 二区域;
差计算单元, 其计算所述多个第一区域的像素和或均值之间的第一 差, 和所述多个第二区域的像素和或均值之间的第二差; 和
梯度计算单元, 其根据所述第一差和第二差计算梯度大小和梯度方 向, 以形成所述每个特征; 和
分类单元, 其根据所述提取的特征向量, 对所述图像进行分类。
9. 如权利要求 8所述的设备, 其中所述区域为矩形区域, 所述第一 区域是相接的, 并且所述第二区域是相接的。
10. 如权利要求 8所述的设备, 其中,
在所述第一区域的数目和所述第二区域的数目均为二,所述第一区域 是相接的并且所述第二区域是相接的情况下,所述第一轴和第二轴的交点 在所述第一区域的连接线上或连接点的预定范围内,并且在所述第二区域 的连接线上或连接点的预定范围内;
在所述第一区域的数目和所述第二区域的数目均为二,所述第一区域 是间隔开的并且所述第二区域是间隔开的情况下,所述第一轴和第二轴的 交点在所述第一区域的位置中心之间的中点和所述第二区域的位置中心 之间的中点的预定范围内; 在所述第一区域的数目和所述第二区域的数目均为三的情况下,所述 第一轴和第二轴的交点分别在所述第一区域中居于中间的第一区域内和 所述第二区域中居于中间的第二区域内。
11. 如权利要求 8所述的设备, 其中至少两个所述特征所基于的区域 布置之间的差别包括下述中的一个或多个: 区域的相对位置关系、 区域的 数目、 区域的形状、 区域的大小、 区域的纵横比。
12. 如权利要求 8所述的设备, 其中对于每个所述特征, 所述分类单 元包括相应的分类器, 所述分类器包括:
多个子分类器,每个子分类器对应于一个不同的梯度方向区间,每个 梯度方向区间具有相应的阈值,
其中每个子分类器被配置为在所述特征的梯度方向属于所述子分类 器所对应的梯度方向区间的情况下,比较所述特征的梯度大小和所述梯度 方向区间的相应阈值, 并且根据比较结果产生分类结果。
13. 如权利要求 12所述的设备, 其中所有所述梯度方向区间的数目 为 3至 15。
14. 如权利要求 12所述的设备, 其中所有所述梯度方向区间所覆盖 的范围为 180度或 360度。
PCT/CN2010/072867 2009-05-20 2010-05-18 对图像进行分类的方法和设备 WO2010133161A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP10777364A EP2434431A1 (en) 2009-05-20 2010-05-18 Method and device for classifying image
US13/319,914 US20120093420A1 (en) 2009-05-20 2010-05-18 Method and device for classifying image
JP2012511134A JP5545361B2 (ja) 2009-05-20 2010-05-18 画像分類方法、装置、プログラム製品および記憶媒体

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200910135298.6 2009-05-20
CN200910135298.6A CN101894262B (zh) 2009-05-20 2009-05-20 对图像进行分类的方法和设备

Publications (1)

Publication Number Publication Date
WO2010133161A1 true WO2010133161A1 (zh) 2010-11-25

Family

ID=43103450

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2010/072867 WO2010133161A1 (zh) 2009-05-20 2010-05-18 对图像进行分类的方法和设备

Country Status (5)

Country Link
US (1) US20120093420A1 (zh)
EP (1) EP2434431A1 (zh)
JP (1) JP5545361B2 (zh)
CN (1) CN101894262B (zh)
WO (1) WO2010133161A1 (zh)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9767354B2 (en) 2009-02-10 2017-09-19 Kofax, Inc. Global geographic information retrieval, validation, and normalization
CN102713974B (zh) * 2010-01-06 2015-09-30 日本电气株式会社 学习装置、识别装置、学习识别***和学习识别装置
CN102609713A (zh) * 2011-01-20 2012-07-25 索尼公司 对图像进行分类的方法和设备
WO2012139241A1 (en) 2011-04-11 2012-10-18 Intel Corporation Hand gesture recognition system
EP2774080A4 (en) * 2011-11-01 2015-07-29 Intel Corp OBJECT DETECTION BY MEANS OF EXTENDED SURFFUNCTIONS
US9165188B2 (en) 2012-01-12 2015-10-20 Kofax, Inc. Systems and methods for mobile image capture and processing
US10146795B2 (en) 2012-01-12 2018-12-04 Kofax, Inc. Systems and methods for mobile image capture and processing
JP2013161126A (ja) * 2012-02-01 2013-08-19 Honda Elesys Co Ltd 画像認識装置、画像認識方法および画像認識プログラム
US10127636B2 (en) 2013-09-27 2018-11-13 Kofax, Inc. Content-based detection and three dimensional geometric reconstruction of objects in image and video data
US9355312B2 (en) * 2013-03-13 2016-05-31 Kofax, Inc. Systems and methods for classifying objects in digital images captured using mobile devices
US20140316841A1 (en) 2013-04-23 2014-10-23 Kofax, Inc. Location-based workflows and services
CN103345631B (zh) * 2013-06-04 2016-12-28 北京大学深圳研究生院 图像特征提取、训练、检测方法及模块、装置、***
US9386235B2 (en) 2013-11-15 2016-07-05 Kofax, Inc. Systems and methods for generating composite images of long documents using mobile video data
US9760788B2 (en) 2014-10-30 2017-09-12 Kofax, Inc. Mobile document detection and orientation based on reference object characteristics
US10242285B2 (en) 2015-07-20 2019-03-26 Kofax, Inc. Iterative recognition-guided thresholding and data extraction
US9830528B2 (en) * 2015-12-09 2017-11-28 Axis Ab Rotation invariant object feature recognition
WO2019010704A1 (zh) * 2017-07-14 2019-01-17 深圳市柔宇科技有限公司 全景图像、视频的识别方法、分类器建立方法及电子装置
US10803350B2 (en) 2017-11-30 2020-10-13 Kofax, Inc. Object detection and image cropping using a multi-detector approach
JP6901386B2 (ja) * 2017-12-08 2021-07-14 株式会社東芝 勾配推定装置、勾配推定方法、プログラムおよび制御システム
US11315352B2 (en) * 2019-05-08 2022-04-26 Raytheon Company Calculating the precision of image annotations

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5291563A (en) * 1990-12-17 1994-03-01 Nippon Telegraph And Telephone Corporation Method and apparatus for detection of target object with improved robustness
WO2008151470A1 (fr) 2007-06-15 2008-12-18 Tsinghua University Procédé de détection robuste de visage humain dans une image d'arrière-plan compliquée

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3606430B2 (ja) * 1998-04-14 2005-01-05 松下電器産業株式会社 画像整合性判定装置
JP2005044330A (ja) * 2003-07-24 2005-02-17 Univ Of California San Diego 弱仮説生成装置及び方法、学習装置及び方法、検出装置及び方法、表情学習装置及び方法、表情認識装置及び方法、並びにロボット装置
CN100405388C (zh) * 2004-05-14 2008-07-23 欧姆龙株式会社 特定被摄体检测装置
JP2006268825A (ja) * 2005-02-28 2006-10-05 Toshiba Corp オブジェクト検出装置、学習装置、オブジェクト検出システム、方法、およびプログラム
JP4764172B2 (ja) * 2006-01-13 2011-08-31 財団法人電力中央研究所 画像処理による移動体候補の検出方法及び移動体候補から移動体を検出する移動体検出方法、移動体検出装置及び移動体検出プログラム
US7693301B2 (en) * 2006-10-11 2010-04-06 Arcsoft, Inc. Known face guided imaging method
KR101330636B1 (ko) * 2007-01-24 2013-11-18 삼성전자주식회사 얼굴시점 결정장치 및 방법과 이를 채용하는 얼굴검출장치및 방법
US8325983B2 (en) * 2008-09-22 2012-12-04 Samsung Electronics Co., Ltd. Combination detector and object detection method using the same
US20100091127A1 (en) * 2008-09-30 2010-04-15 University Of Victoria Innovation And Development Corporation Image reconstruction method for a gradient camera
JP2010204947A (ja) * 2009-03-03 2010-09-16 Toshiba Corp オブジェクト検出装置、オブジェクト検出方法、及び、プログラム
US8831304B2 (en) * 2009-05-29 2014-09-09 University of Pittsburgh—of the Commonwealth System of Higher Education Blood vessel segmentation with three-dimensional spectral domain optical coherence tomography
US8447139B2 (en) * 2010-04-13 2013-05-21 International Business Machines Corporation Object recognition using Haar features and histograms of oriented gradients

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5291563A (en) * 1990-12-17 1994-03-01 Nippon Telegraph And Telephone Corporation Method and apparatus for detection of target object with improved robustness
WO2008151470A1 (fr) 2007-06-15 2008-12-18 Tsinghua University Procédé de détection robuste de visage humain dans une image d'arrière-plan compliquée

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
"Proceeding of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 31 Dec. 2005 (31.12.2005)", 31 December 2005, article NAVNEET DALAL ET AL.: "Histogram of Oriented Gradients for Human Detection", XP031330347 *
DALAL ET AL.: "Histograms of Oriented Gradients for Human Detection", PROC. OF IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 2005, pages 886 - 893, XP010817365, DOI: doi:10.1109/CVPR.2005.177
PAUL VIOLA ET AL.: "Robust Real-time Object Detection", SECOND INTERNATIONAL WORKSHOP ON STATISTICAL AND COMPUTATIONAL THEORIES OF VISION-MODELING, LEARNING, COMPUTING, AND SAMPLING, 13 July 2001 (2001-07-13), VANCOUVER, CANADA, XP002391053 *
PAUL VIOLA; MICHAEL JONES: "Robust Real - time Object Detection", SECOND INTERNATIONAL WORKSHOP ON STATISTICAL AND COMPUTATIONAL THEORIES OF VISION - MODELING, LEARNING, COMPUTING, AND SAMPLING, VANCOUVER, CANADA, 13 July 2001 (2001-07-13)
PAUL VIOLA; MICHAEL JONES: "Robust Real-time Object Detection", SECOND INTERNATIONAL WORKSHOP ON STATISTICAL AND COMPUTATIONAL THEORIES OF VISION-MODELING, LEARNING, COMPUTING, AND SAMPLING, VANCOUVER, CANADA, 13 July 2001 (2001-07-13)
TRIGGS ET AL.: "Human Detection Using Oriented Histograms of Flow and Appearance", PROC. EUROPEAN CONFERENCE ON COMPUTER VISION, 2006

Also Published As

Publication number Publication date
JP5545361B2 (ja) 2014-07-09
EP2434431A1 (en) 2012-03-28
JP2012527664A (ja) 2012-11-08
CN101894262A (zh) 2010-11-24
CN101894262B (zh) 2014-07-09
US20120093420A1 (en) 2012-04-19

Similar Documents

Publication Publication Date Title
WO2010133161A1 (zh) 对图像进行分类的方法和设备
US20230117712A1 (en) Feature density object classification, systems and methods
CN106874894B (zh) 一种基于区域全卷积神经网络的人体目标检测方法
CN106778586B (zh) 离线手写签名鉴别方法及***
CN111695522B (zh) 一种平面内的旋转不变人脸检测方法、装置及存储介质
CN105956582B (zh) 一种基于三维数据的人脸识别***
CN105894047B (zh) 一种基于三维数据的人脸分类***
JP5709410B2 (ja) パターン処理装置及びその方法、プログラム
US7912253B2 (en) Object recognition method and apparatus therefor
US7840037B2 (en) Adaptive scanning for performance enhancement in image detection systems
KR101184097B1 (ko) 얼굴 정면포즈 판단 방법
CN103430218A (zh) 用3d脸部建模和地标对齐扩增造型的方法
JP2008310796A (ja) コンピュータにより実施される、訓練データから分類器を構築し、前記分類器を用いてテストデータ中の移動物体を検出する方法
CN101576953A (zh) 一种人体姿态的分类方法和装置
JP2012226745A (ja) 奥行き画像内の物体を検出する方法およびシステム
CN106940791B (zh) 一种基于低维方向梯度直方图的行人检测方法
Stiene et al. Contour-based object detection in range images
CN102479329A (zh) 分类器生成装置和方法,检测图像中的对象的装置和方法
Deng et al. Detection and recognition of traffic planar objects using colorized laser scan and perspective distortion rectification
CN112926463B (zh) 一种目标检测方法和装置
CN112001448A (zh) 一种形状规则小物体检测方法
Ren et al. A non-contact sleep posture sensing strategy considering three dimensional human body models
Gottumukkal et al. Real time face detection from color video stream based on PCA method
Schulz et al. Pedestrian recognition from a moving catadioptric camera
JP2005209137A (ja) 対象物認識方法及び対象物認識装置、並びに顔方向識別装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10777364

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2012511134

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2010777364

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 13319914

Country of ref document: US