WO2014084218A1 - Object detection device - Google Patents

Object detection device

Info

Publication number
WO2014084218A1
WO2014084218A1 (PCT/JP2013/081808)
Authority
WO
WIPO (PCT)
Prior art keywords
image
identification
learning
target area
identification target
Prior art date
Application number
PCT/JP2013/081808
Other languages
English (en)
Japanese (ja)
Inventor
八木康史
槇原靖
華春生
宮川恵介
岩崎瞬
Original Assignee
国立大学法人大阪大学
Priority date
Filing date
Publication date
Application filed by 国立大学法人大阪大学 (Osaka University)
Publication of WO2014084218A1

Links

Images

Classifications

    • G - PHYSICS
    • G08 - SIGNALLING
    • G08G - TRAFFIC CONTROL SYSTEMS
    • G08G1/00 - Traffic control systems for road vehicles
    • G08G1/16 - Anti-collision systems
    • G08G1/166 - Anti-collision systems for active traffic, e.g. moving vehicles, pedestrians, bikes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 - Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 - Television systems
    • H04N7/18 - Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Definitions

  • the present invention relates to an object detection device that detects an object based on a captured image acquired from an imaging means.
  • in this type of device, an image is captured using an imaging means such as a camera, and an object is detected based on the obtained captured image.
  • a technique is described for detecting a pedestrian appearing in a captured image using a feature amount related to the intensity and direction of the luminance gradient in a local region of the image (the so-called HOG feature amount).
  • a vehicle periphery monitoring device that monitors the relative positional relationship between the object and the host vehicle is constructed. By introducing this device, the vehicle occupant (especially the driver) is supported.
  • by performing various kinds of machine learning, the above discriminator can identify not only pedestrians (human bodies) but also animals, artificial structures, and the like.
  • when an image area (hereinafter referred to as an identification target area) is extracted from the captured image and an object in it is identified, the identification accuracy may differ depending on the type of the object. This is presumably because the shape of the projected image varies with the type of the object, so that the proportion of background image information captured within the area changes.
  • the image information of the background portion acts as a disturbance factor (noise information) that lowers the learning / identification accuracy of the object during the learning process or the identification process.
  • since this type of device sequentially captures images while the vehicle is running, the scene (background, weather, etc.) of the captured image obtained from the camera can change from moment to moment.
  • road surface patterns such as pedestrian crossings and guardrails are often included in images of identification objects (pedestrians and the like) acquired while the vehicle travels on a road. As a result of machine learning by the classifier, there is therefore a high possibility that such road surface patterns are erroneously learned as a pedestrian. In other words, the image features of the background portion are difficult to predict, and their influence as a disturbance factor cannot be ignored.
  • the present invention has been made to solve the above-described problems, and an object of the present invention is to provide an object detection device whose learning/identification accuracy can be improved regardless of the type of the object.
  • an object detection device according to the present invention includes an imaging means that acquires a captured image, an identification target area extraction means that extracts an identification target area from the captured image acquired by the imaging means, and an object identification means that identifies, for each type of object, whether or not the object exists in the identification target area from the image feature amount extracted in the identification target area by the identification target area extraction means.
  • the object identification means is a discriminator generated by machine learning that receives a feature data group as the image feature amount and outputs presence/absence information of the object. For each learning sample image used in the machine learning, the feature data group is created and input from the images of at least one sub-region selected according to the type of the object from among the plurality of sub-regions constituting that learning sample image.
  • since the discriminator serving as the object identification means creates and inputs the feature data group from the image of at least one sub-region selected according to the type of the object from among the plurality of sub-regions constituting each learning sample image used for machine learning, image information of sub-regions suited to the shape of the projected image of the object can be selectively employed in the learning process, and the learning accuracy can be improved regardless of the type of the object. By excluding the remaining sub-regions from the learning process, over-learning of image information other than the projected image of the object, which acts as a disturbance factor during the identification process, is prevented, and the identification accuracy of the object can be improved.
  • the object identification means identifies, for each moving direction of the object, whether or not the object exists in the identification target area. Because the image shape on the captured image tends to change according to the moving direction, identifying each moving direction separately further improves the learning/identification accuracy of the object.
  • the identification target area extracting unit extracts the identification target area having a size corresponding to a distance from the imaging unit to the target. For example, by making the relative magnitude relationship between the target object and the identification target area constant or substantially constant regardless of the distance, the influence of disturbance factors (image information other than the projected image of the target object) can be suppressed uniformly. As a result, the learning / identification accuracy of the object is further improved.
  • the image feature amount preferably includes a luminance gradient direction histogram in space. Because the fluctuation in the luminance gradient direction due to the imaging exposure amount is small, the characteristics of the object can be grasped accurately, and stable identification accuracy can be obtained even in outdoor environments where the intensity of ambient light changes from moment to moment.
  • the image feature amount includes a luminance gradient direction histogram in time and space.
  • the imaging means may be mounted on a moving body and acquire the captured image by capturing images while the moving body is moving. Since the scene of a captured image obtained from an imaging means mounted on a moving body changes every moment, the present invention is particularly effective in this case.
  • since the classifier serving as the object identification means creates and inputs the feature data group from the image of at least one sub-region selected according to the type of the object from among the plurality of sub-regions constituting each learning sample image provided for machine learning, image information of sub-regions suited to the shape of the projected image of the object can be selectively employed in the learning process, and the learning accuracy can be improved regardless of the type of the object. By excluding the remaining sub-regions from the learning process, over-learning of image information other than the projected image of the object, which acts as a disturbance factor during the identification process, is prevented, and the identification accuracy of the object can be improved. Since the scene of the captured image obtained from an imaging means mounted on a moving body changes every moment, this effect is particularly pronounced.
  • FIGS. 3A and 3B are image diagrams showing an example of a captured image acquired by imaging with a camera. FIG. 4 is a flowchart used for describing the learning process performed by the discriminator. FIG. 5 is an image diagram showing an example of a learning sample image containing a crossing pedestrian. A further schematic explanatory drawing illustrates the method of defining each sub-region.
  • FIG. 8B is a schematic explanatory diagram illustrating a non-mask area and a mask area common to each learning sample image including a crossing pedestrian.
  • FIG. 9A is a schematic diagram showing a typical contour image obtained from a large number of learning sample images including facing pedestrians.
  • FIG. 9B is a schematic explanatory diagram showing a non-mask area and a mask area common to each learning sample image including a face-to-face pedestrian.
  • a further flowchart is provided for explaining the operation of the ECU shown in FIG. 1. FIG. 11 is a schematic explanatory drawing showing the positional relationship between the vehicle, the camera, and the human body. FIG. 12 is a schematic explanatory drawing regarding the method of determining the identification target area.
  • FIG. 15A is a schematic explanatory diagram relating to a method for calculating the HOG feature amount.
  • FIG. 15B is a schematic explanatory diagram regarding the method of calculating the STHOG feature amount. A further schematic diagram illustrates the principle of maintaining accuracy in a detection process that uses the STHOG feature amount.
  • FIG. 1 is a block diagram illustrating a configuration of a vehicle periphery monitoring device 10 as an object detection device according to the present embodiment.
  • FIG. 2 is a schematic perspective view of the vehicle 12 on which the vehicle periphery monitoring device 10 shown in FIG. 1 is mounted.
  • the vehicle periphery monitoring device 10 includes a color camera (hereinafter simply referred to as “camera 14”) that captures a color image (hereinafter referred to as a captured image Im) comprising a plurality of color channels, a vehicle speed sensor 16 that detects the vehicle speed Vs of the vehicle 12, a yaw rate sensor 18 that detects the yaw rate Yr of the vehicle 12, a brake sensor 20 that detects the brake pedal operation amount Br by the driver, an electronic control unit (hereinafter referred to as “ECU 22”) that controls the vehicle periphery monitoring device 10, a speaker 24 that issues an alarm or the like by sound, and a display device 26 that displays the captured image output from the camera 14.
  • the camera 14 is a camera that mainly uses light having a wavelength in the visible light region, and functions as an imaging unit that images the periphery of the vehicle 12.
  • the camera 14 has a characteristic that the output signal level increases as the amount of light reflected on the surface of the subject increases, and the luminance (for example, RGB value) of the image increases.
  • the camera 14 is fixedly disposed (mounted) at a substantially central portion of the front bumper portion of the vehicle 12.
  • the imaging means for imaging the periphery of the vehicle 12 is not limited to the above configuration example (so-called monocular camera), and may be, for example, a compound eye camera (stereo camera). Further, an infrared camera may be used instead of the color camera, or both may be provided. Further, in the case of a monocular camera, another ranging means (radar apparatus) may be provided.
  • the speaker 24 outputs an alarm sound or the like in response to a command from the ECU 22.
  • the speaker 24 is provided on a dashboard (not shown) of the vehicle 12.
  • an audio output function provided in another device (for example, an audio device or a navigation device) may be used.
  • the display device 26 (see FIGS. 1 and 2) is a HUD (head-up display) arranged on the front windshield of the vehicle 12 at a position that does not obstruct the driver's front view.
  • the display device 26 is not limited to a HUD; a display that shows a map or the like of a navigation system mounted on the vehicle 12, or a display (MID; multi-information display) provided in a meter unit or the like that shows fuel consumption and the like, can also be used.
  • the ECU 22 basically includes an input / output unit 28, a calculation unit 30, a display control unit 32, and a storage unit 34.
  • Each signal from the camera 14, the vehicle speed sensor 16, the yaw rate sensor 18, and the brake sensor 20 is input to the ECU 22 side via the input / output unit 28.
  • Each signal from the ECU 22 is output to the speaker 24 and the display device 26 via the input / output unit 28.
  • the input / output unit 28 includes an A / D conversion circuit (not shown) that converts an input analog signal into a digital signal.
  • the calculation unit 30 performs calculations based on the signals from the camera 14, the vehicle speed sensor 16, the yaw rate sensor 18, and the brake sensor 20, and generates signals for the speaker 24 and the display device 26 based on the calculation results.
  • the calculation unit 30 functions as a distance estimation unit 40, an identification target region determination unit 42 (identification target region extraction unit), an object identification unit 44 (subject identification unit), and an object detection unit 46.
  • the target object identification unit 44 is configured by a classifier 50 generated using machine learning that receives a feature data group as an image feature amount and outputs the presence / absence information of the target object.
  • each unit in the calculation unit 30 is realized by reading and executing a program stored in the storage unit 34.
  • the program may be supplied from the outside via a wireless communication device (mobile phone, smartphone, etc.) not shown.
  • the display control unit 32 is a control circuit that drives and controls the display device 26.
  • the display control unit 32 drives the display device 26 by outputting a signal used for display control to the display device 26 via the input / output unit 28. Thereby, the display apparatus 26 can display various images (captured image Im, a mark, etc.).
  • the storage unit 34 is a computer-readable and non-transitory storage medium.
  • the storage unit 34 includes a RAM (Random Access Memory) that stores the imaging signal converted into a digital signal and temporary data used for various arithmetic processes, a ROM (Read Only Memory) that stores an execution program, a table, a map, and the like, and so on.
  • the vehicle periphery monitoring apparatus 10 is basically configured as described above. An outline of the operation of the vehicle periphery monitoring device 10 will be described below.
  • the ECU 22 converts the analog video signal output from the camera 14 into a digital signal at a predetermined frame clock interval/cycle (for example, 30 frames per second) and temporarily stores it in the storage unit 34.
  • in the captured image Im shown in FIG. 3A, there exist a road area on which the vehicle 12 travels (hereinafter simply referred to as “road 60”), a plurality of utility pole areas (hereinafter simply referred to as “utility poles 62”), and a crossing pedestrian area on the road 60 (hereinafter referred to as “crossing pedestrian 64”).
  • the relative positional relationship between the camera 14 and each object changes every moment. Thereby, even if the first and second frames (captured image Im) are in the same field angle range, each object is imaged in a different form (shape, size, or color).
  • the ECU 22 performs various arithmetic processing on the captured image Im (the image in front of the vehicle 12) read from the memory. The ECU 22 comprehensively considers the processing results for the captured image Im and, as necessary, the signals indicating the traveling state of the vehicle 12 (the vehicle speed Vs, the yaw rate Yr, and the operation amount Br).
  • the ECU 22 controls each output unit of the vehicle periphery monitoring device 10 in order to call the driver's attention. For example, the ECU 22 outputs an alarm sound (for example, a beeping sound) via the speaker 24 and displays the captured image Im on the display device 26 with the part corresponding to the monitored object emphasized.
  • in step S11, learning data used for machine learning is collected.
  • the learning data is a data set of a learning sample image including (or not including) the target object and the type of the target object (including the attribute “no target object”).
  • types of objects include human bodies, various animals (specifically, mammals such as deer, horses, sheep, dogs, and cats, as well as birds), and artificial structures (specifically, vehicles, signs, utility poles, guardrails, walls, etc.).
  • FIG. 5 is an image diagram showing an example of learning sample images 74 and 76 including crossing pedestrians 70 and 72.
  • at the approximate center of the learning sample image 74, a projected image of a human body walking across from the left side toward the right side (hereinafter referred to as a crossing pedestrian 70) is shown.
  • at the approximate center of another learning sample image 76, a projected image of a human body walking across from the right front side toward the left back side (hereinafter referred to as a crossing pedestrian 72) is shown.
  • Each of the learning sample images 74 and 76 in this example corresponds to a correct image including the object.
  • the collected learning sample images may also include incorrect images that do not contain the object.
  • the learning sample images 74 and 76 have image regions 80 whose shapes match or are similar to each other.
  • in step S12, a plurality of sub-regions 82 are defined by dividing the image region 80 of each of the learning sample images 74 and 76.
  • specifically, the rectangular image region 80 is equally divided into a lattice of eight rows and six columns regardless of its size; that is, 48 sub-regions 82 of identical shape are defined in each image region 80.
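To make the division concrete, the following minimal sketch splits a rectangular image region into the 8 x 6 grid described above; the 48 x 36 pixel sample size is an assumption chosen so that each cell matches the 6 x 6 pixel block mentioned later, not a value stated in the patent.

```python
import numpy as np

def split_into_subregions(region, rows=8, cols=6):
    """Divide a rectangular image region into a rows x cols grid of equally
    sized sub-regions (48 cells for the 8 x 6 grid described in the text)."""
    h, w = region.shape[:2]
    if h % rows or w % cols:
        raise ValueError("region size must be divisible by the grid dimensions")
    ch, cw = h // rows, w // cols
    return [region[r * ch:(r + 1) * ch, c * cw:(c + 1) * cw]
            for r in range(rows) for c in range(cols)]

# Example: a 48 x 36 pixel learning sample yields 48 sub-regions of 6 x 6 pixels.
sample = np.zeros((48, 36), dtype=np.uint8)
cells = split_into_subregions(sample)
print(len(cells), cells[0].shape)  # 48 (6, 6)
```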
  • in step S13, the learning architecture of the classifier 50 is constructed.
  • examples of the learning architecture include the boosting method, the SVM (Support Vector Machine), neural networks, and the EM (Expectation-Maximization) algorithm.
  • here, AdaBoost, which is a kind of boosting method, is applied.
  • the discriminator 50 includes N (N being a natural number of 2 or more) feature data generators 90, N weak learners 92, a weight updater 93, a weighting calculator 94, and a sample load updater 95.
  • the feature data generators 90 are denoted, in order from the top, as the first data generator, the second data generator, the third data generator, ..., and the Nth data generator. Similarly, the weak learners 92 are denoted, in order from the top, as the first weak learner, the second weak learner, the third weak learner, ..., and the Nth weak learner.
  • one feature data generator 90 (for example, the first data generator) and one weak learner 92 (for example, the first weak learner) are connected as a pair, and N such subsystems are constructed.
  • the output side of the N subsystems (each weak learner 92) is connected to the input side of the weight updater 93, and the weighting calculator 94 and the sample load updater 95 are connected in series to its output side.
  • the detailed operation of the classifier 50 during machine learning will be described later.
  • in step S14, mask conditions for the sub-regions 82 are determined for each type of object.
  • the mask condition means a selection condition as to whether or not the image of each sub-region 82 is adopted when N pieces of feature data (hereinafter collectively referred to as a feature data group) are generated from one learning sample image 74.
  • in the learning sample image 74, the background portion 78 excluding the crossing pedestrian 70 shows three other human bodies, a road surface, a building wall, and the like.
  • the image information of the background part 78 acts as a disturbance factor (noise information) that reduces the learning / identification accuracy of the crossing pedestrian 70 as the object. Therefore, it is effective to select only the sub-region 82 suitable for the identification process from among all 48 sub-regions 82 and to learn using the created feature data group.
  • FIG. 8A is a schematic diagram showing a typical contour image 100 obtained from a large number of learning sample images 74 and 76 including crossing pedestrians 70 and 72.
  • first, a contour extraction image (not shown) in which the contour of each object is extracted is created for each learning sample image 74 and the like by performing known edge extraction processing using an image processing device (not shown).
  • each contour extraction image represents a part from which the contour of the object has been extracted in white, and represents a part from which the outline has not been extracted in black.
  • from these contour extraction images, a contour image typical of the learning sample images 74 and the like (the typical contour image 100) is obtained.
  • the typical contour image 100 represents a part from which the contour of the object has been extracted in white, and represents a part from which the contour has not been extracted in black. That is, the typical contour image 100 corresponds to an image representing the contour of an object (crossing pedestrians 70 and 72) included in common in a large number of learning sample images 74 and the like.
  • a sub-region 82 whose contour feature amount exceeds a predetermined threshold is adopted as a calculation target, and a sub-region 82 whose contour feature amount falls below a predetermined threshold is excluded from the calculation target.
  • as a result, suppose that a set of 20 sub-regions 82 among the 48 sub-regions 82 defined in the image region 80 is determined as the non-mask region 102, and that the remaining 28 sub-regions 82 (the regions filled in white) are determined as the mask region 104.
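A minimal sketch of this selection, assuming that the contour feature amount of a cell is simply the mean contour density of the typical contour image within that cell and that the threshold value is illustrative; neither choice is specified by the source.

```python
import numpy as np

def select_non_mask_cells(contour_images, rows=8, cols=6, threshold=0.1):
    """Average many binary contour-extraction images (1 = contour pixel) into a
    typical contour image, then keep the grid cells whose mean contour response
    exceeds the threshold (non-mask region); the rest form the mask region."""
    typical = np.mean(np.stack(contour_images).astype(float), axis=0)
    h, w = typical.shape
    ch, cw = h // rows, w // cols
    keep = np.zeros((rows, cols), dtype=bool)
    for r in range(rows):
        for c in range(cols):
            cell = typical[r * ch:(r + 1) * ch, c * cw:(c + 1) * cw]
            keep[r, c] = cell.mean() > threshold  # contour feature of this cell
    return keep  # True = use for learning, False = masked out

# Example with random stand-in contour images (real ones would come from edge extraction).
rng = np.random.default_rng(0)
contours = [(rng.random((48, 36)) < 0.2).astype(float) for _ in range(100)]
print(select_non_mask_cells(contours).sum(), "of 48 sub-regions kept")
```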
  • FIG. 9A is a schematic diagram showing a typical contour image 106 obtained from a large number of learning sample images including facing pedestrians.
  • the method for creating the typical contour image 106 is the same as that for the typical contour image 100 in FIG.
  • FIG. 9B is a schematic explanatory diagram showing a non-mask area 108 and a mask area 110 that are common to each learning sample image including a face-to-face pedestrian. Since the determination method of the mask area 110 is the same as that of the mask area 104 in FIG. 8B, the description thereof is omitted.
  • the mask areas 104 and 110 are different even though the object is the same (pedestrian). More specifically, the mask areas 104 and 110 differ depending on whether or not the four sub-areas 82 (see FIG. 9B) with hatching are mask targets. This is because the degree of change in the image shape due to walking motion (movement of shaking hands and feet) varies depending on the moving direction of the pedestrian. In this way, learning may be performed for each moving direction of the object based on the tendency of the image shape on the image to change according to the moving direction.
  • the moving direction may be any of a transverse direction (more specifically, right direction and left direction), a facing direction (more specifically, near side direction and back direction), and an oblique direction with respect to the image plane.
  • in step S15, machine learning is performed by sequentially inputting the large amount of learning data collected in step S11 to the discriminator 50.
  • the discriminator 50 inputs the learning sample image 74 including the crossing pedestrian 70 among the collected learning data to the feature data generator 90 side.
  • each feature data generator 90 creates each feature data (collectively, feature data group) by performing specific arithmetic processing on the learning sample image 74 according to the mask condition determined in step S14.
  • for this calculation, the values of all pixels belonging to the mask region 104 (see FIG. 8B) may simply not be used, or the feature data may be substantially invalidated by replacing the values of all of those pixels with a predetermined value (for example, 0) before the calculation.
  • each weak learner 92 (the i-th weak learner, 1 ≤ i ≤ N) obtains an output result (the i-th output result) by performing a predetermined calculation on the feature data (the i-th feature data) acquired from the corresponding feature data generator 90 (the i-th data generator).
  • the first to Nth output results acquired from the weak learners 92, together with the object information 96, which is the presence/absence information of the object in the collected learning data, are input to the weight updater 93.
  • the object information 96 indicates that the crossing pedestrian 70 is included in the learning sample image 74.
  • the weight updater 93 selects the one weak learner 92 whose output result has the smallest error with respect to the output value corresponding to the object information 96, and determines an update amount Δα so that the weighting coefficient α of that weak learner increases.
  • the weighting calculator 94 updates the weighting coefficient α by adding the update amount Δα supplied from the weight updater 93.
  • the sample load updater 95 updates a load previously applied to the learning sample image 74 and the like (hereinafter referred to as the sample load 97) based on the updated weighting coefficient α and the like.
  • by sequentially repeating the input of learning data, the update of the weighting coefficient α, and the update of the sample load 97, the discriminator 50 performs machine learning until a convergence condition is satisfied (step S15). In this way, an object identification unit 44 that can identify whether or not an object exists is constructed.
  • the object identification unit 44 may also be configured to be able to identify things other than the object (for example, the road 60 in FIG. 3A and the like).
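The learning loop can be pictured with the minimal discrete-AdaBoost sketch below: each weak learner is a decision stump on one feature column (standing in for one feature-data-generator / weak-learner pair), each round selects the stump with the smallest weighted error, assigns it a weighting coefficient alpha, and re-weights the samples in the spirit of the sample load 97. The stump form and the alpha update formula are standard AdaBoost choices assumed here, not details taken from the patent, and the feature matrix is assumed to have been built only from non-mask sub-regions.

```python
import numpy as np

def train_adaboost(features, labels, n_rounds=10):
    """features: (n_samples, n_features) array of feature data from non-mask cells.
    labels: +1 (object present) / -1 (object absent).  Returns the ensemble."""
    n, d = features.shape
    sample_load = np.full(n, 1.0 / n)            # corresponds to the sample load 97
    ensemble = []                                # (feature index, threshold, polarity, alpha)
    for _ in range(n_rounds):
        best = None
        for j in range(d):                       # search all decision stumps
            for thr in np.unique(features[:, j]):
                for pol in (+1, -1):
                    pred = np.where(pol * (features[:, j] - thr) > 0, 1, -1)
                    err = sample_load[pred != labels].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)    # weighting coefficient of the chosen learner
        sample_load *= np.exp(-alpha * labels * pred)
        sample_load /= sample_load.sum()         # re-weight the learning samples
        ensemble.append((j, thr, pol, alpha))
    return ensemble

def strong_classify(ensemble, features):
    """Strong classifier: sign of the alpha-weighted sum of weak outputs."""
    score = np.zeros(features.shape[0])
    for j, thr, pol, alpha in ensemble:
        score += alpha * np.where(pol * (features[:, j] - thr) > 0, 1, -1)
    return np.sign(score)

# Tiny synthetic example with one separable feature column.
X = np.array([[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]])
y = np.array([-1, -1, -1, 1, 1, 1])
model = train_adaboost(X, y, n_rounds=3)
print(strong_classify(model, X))                 # [-1. -1. -1.  1.  1.  1.]
```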
  • in step S22, the distance estimation unit 40 estimates the distance Dis from the vehicle 12 to the object by calculating an elevation angle from the captured image Im acquired in step S21.
  • FIG. 11 is a schematic explanatory diagram showing the positional relationship between the vehicle 12, the camera 14, and the human body M.
  • the vehicle 12 on which the camera 14 is mounted and the human body M as an object are present on a flat road surface S.
  • let the contact point between the human body M and the road surface S be Pc, the optical axis of the camera 14 be L1, and the straight line connecting the optical center C of the camera 14 and the contact point Pc be L2.
  • further, let the angle (elevation angle) formed by the optical axis L1 of the camera 14 with respect to the road surface S be θ, the angle formed by the straight line L2 with respect to the optical axis L1 be φ, and the height of the camera 14 with respect to the road surface S be Hc.
  • the distance estimation unit 40 can estimate the distance Dis corresponding to each position of the road surface S (the road 60 in FIG. 3A) on the captured image Im.
  • the distance estimation unit 40 may also estimate the distance Dis using a known method such as SfM (Structure from Motion), taking into account the posture change between the road surface S and the camera 14 caused by the motion of the vehicle 12.
  • when the vehicle periphery monitoring device 10 is provided with a distance-measuring sensor, the distance Dis may be measured using that sensor.
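As a concrete illustration of the flat-road geometry of FIG. 11, the sketch below assumes a simple pinhole model with a downward-pitched optical axis: the angle φ is recovered from the image row of the contact point Pc, and the distance then follows as Dis = Hc / tan(θ + φ). The formula and all numeric values are assumptions consistent with the definitions above, not equations quoted from the patent.

```python
import math

def estimate_distance(v_contact, v0, f_px, cam_height_m, pitch_rad):
    """Flat-road distance estimate from the image row of the contact point Pc.

    v_contact    : image row (pixels) of the contact point between object and road
    v0           : image row of the principal point (where the optical axis L1 projects)
    f_px         : focal length in pixels
    cam_height_m : Hc, camera height above the road surface S
    pitch_rad    : theta, downward angle of the optical axis relative to the road surface
    """
    phi = math.atan2(v_contact - v0, f_px)        # angle between L2 and the optical axis L1
    return cam_height_m / math.tan(pitch_rad + phi)

# Example: camera 0.7 m above the road, optical axis tilted 2 degrees downward,
# focal length 1000 px, contact point 150 px below the principal point.
print(round(estimate_distance(390, 240, 1000.0, 0.7, math.radians(2.0)), 2), "m")
```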
  • in step S23, the identification target area determination unit 42 determines the size and the like of the identification target area 122, which is the image area subject to identification. In the present embodiment, the identification target area determination unit 42 determines the size and the like of the identification target area 122 according to the distance Dis estimated in step S22 and/or the vehicle speed Vs acquired from the vehicle speed sensor 16. A specific example will be described with reference to FIG. 12.
  • the position on the captured image Im corresponding to the contact point Pc (see FIG. 11) between the road surface S (road 60) and the human body M (crossing pedestrian 64) is defined as a reference position 120.
  • the identification target area 122 is set so that the whole of the crossing pedestrian 64 is included.
  • the identification target area determination unit 42 determines the size of the identification target area 122 according to the distance Dis from the camera 14 (the optical center C in FIG. 11) to the object, using an arbitrary calculation formula that may be a linear or a nonlinear function. If it is assumed that the crossing pedestrian 64f illustrated by the broken line exists at the reference position 124 on the road surface S (road 60), an identification target area 126 similar to the identification target area 122 is set.
  • since the relative magnitude relationship between the crossing pedestrian 64 (64f) and the identification target area 122 (126) is thereby kept constant or substantially constant regardless of the distance Dis, the influence of disturbance factors (image information other than the projected image of the object) can be uniformly suppressed; as a result, the learning/identification accuracy of the object is further improved.
  • the identification target area determination unit 42 also determines a designated area 128 that is the target range of the raster scan described later. For example, the designated area 128 may be determined so that Dis1, a distance at which the object can be reliably detected within the normal operating range, serves as the lower limit, and Dis2, a distance at which a collision with the object is not imminent within the normal operating range, serves as the upper limit. By thus omitting scanning of part of the captured image Im, not only can the calculation amount and calculation time of the identification process be reduced, but erroneous detections that might otherwise occur outside the designated area 128 can also be eliminated.
  • the identification target region determination unit 42 may appropriately change the shape of the identification target region 122 according to the type of the target object.
  • the size may be determined according to the distance Dis (for example, a value proportional to the distance Dis).
  • the identification target area determination unit 42 may change the size of the identification target area 122 according to the height of the target object even at the same distance Dis. Thereby, it is possible to set an appropriate size according to the height of the object, and the learning / identification accuracy of the object is further improved by uniformly suppressing the influence of the disturbance factor.
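One possible distance-to-size mapping is sketched below, assuming a pinhole camera in which the projected height of the object shrinks in inverse proportion to Dis; this keeps the relative size relationship between the object and the identification target area roughly constant, as described above. The focal length, body height, aspect ratio, and margin are illustrative assumptions, not patent values.

```python
def region_size_px(dis_m, f_px=1000.0, body_height_m=1.7, aspect=0.5, margin=1.2):
    """Identification-target-area size (width, height) in pixels at distance dis_m:
    the projected body height is roughly f_px * body_height_m / dis_m, enlarged by
    a small margin so the whole crossing pedestrian fits inside the area."""
    h = int(round(margin * f_px * body_height_m / dis_m))
    return int(round(h * aspect)), h

for dis in (5.0, 10.0, 20.0):
    print(dis, "m ->", region_size_px(dis), "px (width, height)")
```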
  • in step S24, the calculation unit 30 starts a raster scan of the captured image Im within the designated area 128 determined in step S23.
  • the raster scan refers to a method of successively identifying the presence or absence of an object while moving the reference position 120 (pixels in the captured image Im) in a predetermined direction.
  • the identification target area determination unit 42 sequentially determines the reference position 120 currently being scanned and the position / size of the identification target area 122 identified from the reference position 120.
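The scan itself can be pictured with the following sketch; size_for_row and classify are hypothetical callables standing in for the step-S23 sizing rule and the trained discriminator 50, and the pixel step and row range are illustrative.

```python
def raster_scan(image_w, row_range, size_for_row, classify, step=1):
    """Move the reference position 120 (assumed road contact point) through the
    designated area, place a distance-dependent window above each position, and
    ask the classifier whether an object is present.  Returns candidate boxes."""
    detections = []
    for v in range(row_range[0], row_range[1], step):   # rows spanning Dis1 .. Dis2
        w, h = size_for_row(v)                          # window size for this row
        for u in range(0, image_w - w + 1, step):
            left, top = u, max(v - h, 0)                # window sits on the contact row
            if classify(left, top, w, h):
                detections.append((left, top, w, h))
    return detections

# Usage with stand-in callables (a fixed window size and a classifier that rejects all):
hits = raster_scan(640, (300, 420), size_for_row=lambda v: (40, 80),
                   classify=lambda left, top, w, h: False)
print(len(hits), "windows flagged")
```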
  • in step S25, the object identification unit 44 identifies whether or not at least one type of object exists in the determined identification target area 122.
  • the object identifying unit 44 is a classifier 50 (see FIG. 7) generated using machine learning.
  • in the weighting calculator 94, an appropriate weighting coefficient αf obtained by the machine learning (step S15 in FIG. 4) is preset.
  • the object identifying unit 44 inputs an evaluation image 130 having an image area 80 including the identification target area 122 to each feature data generator 90 side.
  • necessary image processing such as normalization processing (gradation processing / enlargement / reduction processing), alignment processing, etc. may be appropriately performed on the image of the identification target region 122.
  • the object identification unit 44 includes each feature data generator 90, each weak learner 92, the weighting calculator 94, and an integrated learner 98 that applies a step function to the weighted output result acquired from the weighting calculator 94. The evaluation image 130 is processed sequentially through these components, and an identification result indicating, for example, that the crossing pedestrian 64 exists in the identification target area 122 is output.
  • the object discriminating unit 44 functions as a strong discriminator having high discrimination performance by combining N weak discriminators (weak learners 92).
  • Each feature data generator 90 calculates an image feature amount (that is, the above-described feature data group) in each sub-region 82 using the same calculation method as in the learning process (see FIG. 7).
  • HOG (Histograms of Oriented Gradients): luminance gradient direction histogram
  • each block is defined below corresponding to each sub-region 82.
  • the sub-region 82 as a block is composed of a total of 36 pixels 84, 6 pixels vertically and 6 pixels horizontally.
  • a two-dimensional gradient (Ix, Iy) of luminance is calculated for each pixel 84 constituting the block.
  • the gradient intensity I and the spatial luminance gradient angle θ are calculated according to the following equations (2) and (3).
  • I = (Ix² + Iy²)^(1/2)  (2)
  • θ = tan⁻¹(Iy / Ix)  (3)
  • each grid in the first row illustrates the direction of the planar luminance gradient.
  • the gradient intensity I and the spatial luminance gradient angle θ are calculated for all the pixels 84, but the arrows in the second and subsequent rows are omitted from the illustration.
  • a histogram of the spatial luminance gradient angle θ is created for each block.
  • the horizontal axis of the histogram is the spatial luminance gradient angle θ (eight divisions in this example), and the vertical axis of the histogram is the gradient intensity I.
  • a histogram for each block is created based on the gradient intensity I shown in Equation (2).
  • the HOG feature amount of the evaluation image 130 is obtained by connecting the histograms of the blocks (of the spatial luminance gradient angle θ in the example of FIG. 14C) in a predetermined order, for example, ascending order.
  • the image feature amount may include a luminance gradient direction histogram in space (HOG feature amount). Because the fluctuation of the luminance gradient direction (θ) due to the imaging exposure amount is small, the characteristics of the object can be captured accurately, and stable identification accuracy can be obtained even in outdoor environments where the intensity of ambient light changes from moment to moment.
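A minimal HOG sketch following equations (2) and (3): per-pixel gradients, gradient intensity, unsigned orientation quantized into eight bins, one intensity-weighted histogram per 6 x 6 block, and concatenation over the 8 x 6 grid. The use of unsigned orientation and simple central-difference gradients are common conventions assumed here, not details specified by the patent.

```python
import numpy as np

def block_hog(block, n_bins=8):
    """HOG histogram of one block (sub-region 82): I = sqrt(Ix^2 + Iy^2) per eq. (2),
    theta = arctan(Iy / Ix) per eq. (3), then an intensity-weighted histogram of
    theta over n_bins orientation bins."""
    Iy, Ix = np.gradient(block.astype(float))          # gradients along rows / columns
    intensity = np.hypot(Ix, Iy)
    theta = np.arctan2(Iy, Ix) % np.pi                 # unsigned orientation in [0, pi)
    bins = np.minimum((theta / np.pi * n_bins).astype(int), n_bins - 1)
    return np.bincount(bins.ravel(), weights=intensity.ravel(), minlength=n_bins)

def hog_feature(image, rows=8, cols=6):
    """Concatenate per-block histograms over the 8 x 6 grid to obtain the HOG
    feature amount of the evaluation image."""
    h, w = image.shape
    ch, cw = h // rows, w // cols
    return np.concatenate([block_hog(image[r * ch:(r + 1) * ch, c * cw:(c + 1) * cw])
                           for r in range(rows) for c in range(cols)])

print(hog_feature(np.random.default_rng(1).random((48, 36))).shape)  # (384,) = 48 blocks x 8 bins
```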
  • an STHOG (Spatio-Temporal Histograms of Oriented Gradients) feature amount may also be used.
  • in this case, a three-dimensional luminance gradient (Ix, Iy, It) is calculated for each pixel 84 constituting the block, as in FIG. 14B, using a plurality of captured images Im acquired in time series.
  • the gradient intensity I and the temporal luminance gradient angle φ are calculated according to the following equations (4) and (5).
  • I = (Ix² + Iy² + It²)^(1/2)  (4)
  • φ = tan⁻¹{It / (Ix² + Iy²)^(1/2)}  (5)
  • an STHOG feature amount, which is a luminance gradient direction histogram in space-time, is obtained by further concatenating a histogram of the temporal luminance gradient angle φ to the HOG feature amount.
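A corresponding space-time sketch for one block, following equations (4) and (5): three-dimensional gradients over a short stack of time-sequential frames, the spatial angle θ as before, and the temporal angle φ, whose histogram is concatenated to the spatial one. The bin count for φ and the use of np.gradient for the time derivative are assumptions made for illustration.

```python
import numpy as np

def sthog_block(clip, n_bins=8):
    """STHOG histogram of one block from a (frames, height, width) stack:
    I = sqrt(Ix^2 + Iy^2 + It^2) per eq. (4) and phi = arctan(It / sqrt(Ix^2 + Iy^2))
    per eq. (5); the phi histogram is appended to the spatial theta histogram."""
    It, Iy, Ix = np.gradient(clip.astype(float))       # gradients along t, y, x
    intensity = np.sqrt(Ix**2 + Iy**2 + It**2)
    theta = np.arctan2(Iy, Ix) % np.pi                 # spatial orientation, [0, pi)
    phi = np.arctan2(It, np.hypot(Ix, Iy))             # temporal angle, [-pi/2, pi/2]
    t_bins = np.minimum((theta / np.pi * n_bins).astype(int), n_bins - 1)
    p_bins = np.minimum(((phi + np.pi / 2) / np.pi * n_bins).astype(int), n_bins - 1)
    h_theta = np.bincount(t_bins.ravel(), weights=intensity.ravel(), minlength=n_bins)
    h_phi = np.bincount(p_bins.ravel(), weights=intensity.ravel(), minlength=n_bins)
    return np.concatenate([h_theta, h_phi])

clip = np.random.default_rng(2).random((5, 6, 6))      # five frames of one 6 x 6 block
print(sthog_block(clip).shape)                         # (16,)
```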
  • the object identification unit 44 identifies whether or not at least one type of object exists in the determined identification target area 122 (step S25). Thereby, not only the type of the object, including the human body M, but also its moving direction, including the transverse direction and the facing direction, is identified.
  • in step S26, the identification target area determination unit 42 determines whether or not all scans within the designated area 128 have been completed. When it determines that they have not been completed (step S26: NO), the process proceeds to the next step (S27).
  • in step S27, the identification target area determination unit 42 changes the position or size of the identification target area 122. Specifically, the identification target area determination unit 42 moves the reference position 120 subject to scanning by a predetermined amount (for example, one pixel) in a predetermined direction (for example, to the right). When the distance Dis changes, the size of the identification target area 122 is also changed. Furthermore, considering that the typical body length or body width varies depending on the type of object, the identification target area determination unit 42 may change the size of the identification target area 122 according to the type of the object.
  • returning to step S26, the calculation unit 30 sequentially repeats steps S25 to S27 until all scans within the designated area 128 are completed.
  • when it is determined in step S26 that all scans have been completed (step S26: YES), the calculation unit 30 ends the raster scan of the captured image Im (step S28).
  • in step S29, the object detection unit 46 detects the objects present in the captured image Im.
  • for this detection, the identification result of a single frame may be used, or a motion vector of the same object may be calculated by considering the identification results of a plurality of frames.
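As a sketch of the multi-frame case, a motion vector can be derived from the identification results of the same object in two consecutive frames; associating detections across frames is assumed to be handled elsewhere, and the box format is a hypothetical convention.

```python
def motion_vector(prev_box, curr_box, dt):
    """prev_box / curr_box: (left, top, width, height) of the same object in two
    frames; returns the apparent velocity of its road contact point in px/s."""
    (pl, pt, pw, ph), (cl, ct, cw, ch) = prev_box, curr_box
    prev_pt = (pl + pw / 2.0, pt + ph)     # bottom centre = assumed contact point
    curr_pt = (cl + cw / 2.0, ct + ch)
    return ((curr_pt[0] - prev_pt[0]) / dt, (curr_pt[1] - prev_pt[1]) / dt)

# Example at 30 frames per second: the object moved 6 px to the right in one frame.
print(motion_vector((100, 200, 40, 80), (106, 200, 40, 80), dt=1 / 30))  # (180.0, 0.0)
```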
  • in step S30, the ECU 22 causes the storage unit 34 to store data necessary for the next calculation cycle, for example, the distance Dis estimated in step S22, the attribute of the object obtained in step S25 (the crossing pedestrian 64 in FIG. 3A, etc.), the reference position 120, and the like.
  • the vehicle periphery monitoring device 10 can monitor an object (for example, the human body M in FIG. 11) existing in front of the vehicle 12 at a predetermined time interval.
  • as described above, the vehicle periphery monitoring device 10 includes the camera 14 that acquires the captured image Im, the identification target area determination unit 42 that extracts the identification target areas 122 and 126 from the acquired captured image Im, and the object identification unit 44 that identifies, for each type of object, whether or not an object (for example, the crossing pedestrian 64) exists in the identification target areas 122 and 126 from the image feature amounts extracted in the identification target areas 122 and 126.
  • the target object identification unit 44 is a classifier 50 generated using machine learning that receives a feature data group as an image feature amount and outputs the presence / absence information of the target object.
  • since the classifier 50 serving as the object identification means creates and inputs the feature data group from the images of at least one sub-region 82 (the non-mask regions 102 and 108) selected according to the type of the object from among the plurality of sub-regions 82 constituting the learning sample images 74 and 76 used for machine learning, image information of the sub-regions 82 suited to the shape of the projected image of the object can be selectively employed in the learning process, and the learning accuracy can be improved regardless of the type of the object. Further, by excluding the remaining sub-regions 82 (the mask regions 104 and 110) from the learning process, over-learning of image information other than the projected image of the object, which acts as a disturbance factor during the identification process, is prevented, and the identification accuracy of the object can be improved.
  • the camera 14 is mounted on the vehicle 12 and acquires the captured image Im by capturing an image while the vehicle 12 is moving. Since the scene (background, weather, road surface pattern, etc.) of the captured image Im obtained from the camera 14 mounted on the vehicle 12 changes every moment, it is particularly effective.
  • the identification target area determination unit 42 extracts and determines the identification target area 122 based on a position on the road 60 (the reference position 120 in FIG. 12), so that the same detection accuracy as when the scene is fixed can be maintained.
  • the principle of maintaining accuracy in the detection process using the STHOG feature amount will now be described with reference to the corresponding figure.
  • time-series data 132 used for calculation of the STHOG feature value is obtained by extracting the identification target region 122 on the xy plane in time series.
  • the reference position 120 included in each identification target region 122 corresponds to a position on the road 60 (FIG. 3A and the like). That is, the identification target area 122 as the vicinity area of the reference position 120 corresponds to a set of points that are completely or substantially stationary.
  • since the identification target area determination unit 42 extracts and determines the identification target area 122 with a size corresponding to the distance Dis, time-series data 132 in which the relative position and size between the object and the background portion remain substantially the same, and whose image shape is therefore stable, can be obtained.
  • in the above description, the identification process is performed on the captured image Im obtained by a monocular camera (the camera 14), but it goes without saying that the same effect can also be obtained with a compound-eye camera (stereo camera).
  • the learning process and the identification process by the object identification unit 44 are performed separately, but both processes may be provided so as to be executed in parallel.
  • the entire vehicle periphery monitoring device 10 is mounted on the vehicle 12, but any configuration may be used as long as at least imaging means is mounted.
  • even if the imaging signal output from the imaging means is transmitted to a separate arithmetic processing device (including the ECU 22) via a wireless communication means, the same effect as in the present embodiment can be obtained.
  • in the present embodiment the object detection device is applied to the vehicle 12, but it is not limited to this and may be applied to other types of moving bodies (for example, ships, aircraft, artificial satellites, etc.). Needless to say, a certain effect of improving the learning/identification accuracy of the object can also be obtained when the object detection device is fixedly installed.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

The present invention relates to an object detection device. An object identification unit is a classifier whose input is a feature data group representing an image feature amount and whose output is information indicating whether an object (a crossing pedestrian) is present, this classifier being generated by machine learning. The classifier creates the feature data group from the image of at least one sub-region (82) (a non-mask region (102)) selected according to the type of object from among a plurality of sub-regions (82) constituting each of the learning sample images used for machine learning, and inputs this feature data group.
PCT/JP2013/081808 2012-11-27 2013-11-26 Dispositif de détection de sujet WO2014084218A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2012258488A JP2014106685A (ja) 2012-11-27 2012-11-27 車両周辺監視装置
JP2012-258488 2012-11-27

Publications (1)

Publication Number Publication Date
WO2014084218A1 true WO2014084218A1 (fr) 2014-06-05

Family

ID=50827854

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2013/081808 WO2014084218A1 (fr) 2012-11-27 2013-11-26 Dispositif de détection de sujet

Country Status (2)

Country Link
JP (1) JP2014106685A (fr)
WO (1) WO2014084218A1 (fr)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110795985A (zh) * 2018-08-02 2020-02-14 松下电器(美国)知识产权公司 信息处理方法及信息处理***
CN112154492A (zh) * 2018-03-19 2020-12-29 德尔克股份有限公司 预警和碰撞避免
EP3664020A4 (fr) * 2017-07-31 2021-04-21 Equos Research Co., Ltd. Dispositif de génération de données d'image, dispositif de reconnaissance d'image, programme de génération de données d'image, et programme de reconnaissance d'image
CN112997214A (zh) * 2018-11-13 2021-06-18 索尼公司 信息处理装置、信息处理方法和程序
US11443631B2 (en) 2019-08-29 2022-09-13 Derq Inc. Enhanced onboard equipment
US11741367B2 (en) 2017-03-13 2023-08-29 Fanuc Corporation Apparatus and method for image processing to calculate likelihood of image of target object detected from input image

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9542626B2 (en) * 2013-09-06 2017-01-10 Toyota Jidosha Kabushiki Kaisha Augmenting layer-based object detection with deep convolutional neural networks
JP6511982B2 (ja) * 2015-06-19 2019-05-15 株式会社デンソー 運転操作判別装置
JP6795379B2 (ja) * 2016-03-10 2020-12-02 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America 運転制御装置、運転制御方法及び運転制御プログラム
US10392038B2 (en) * 2016-05-16 2019-08-27 Wi-Tronix, Llc Video content analysis system and method for transportation system
US10210418B2 (en) * 2016-07-25 2019-02-19 Mitsubishi Electric Research Laboratories, Inc. Object detection system and object detection method
JP2018136211A (ja) * 2017-02-22 2018-08-30 オムロン株式会社 環境認識システム及び学習装置
JP6782433B2 (ja) * 2017-03-22 2020-11-11 パナソニックIpマネジメント株式会社 画像認識装置
US10007269B1 (en) 2017-06-23 2018-06-26 Uber Technologies, Inc. Collision-avoidance system for autonomous-capable vehicle
CN107390682B (zh) * 2017-07-04 2020-08-07 安徽省现代农业装备产业技术研究院有限公司 一种农用车辆自动驾驶路径跟随方法及***
JP6797860B2 (ja) * 2018-05-02 2020-12-09 株式会社日立国際電気 水上侵入検知システムおよびその方法
JP7401199B2 (ja) * 2019-06-13 2023-12-19 キヤノン株式会社 情報処理装置、情報処理方法、およびプログラム
WO2021205616A1 (fr) * 2020-04-09 2021-10-14 三菱電機株式会社 Dispositif de commande de corps mobile, procédé de commande de corps mobile, et dispositif d'apprentissage
JP7484587B2 (ja) 2020-08-31 2024-05-16 沖電気工業株式会社 交通監視装置、交通監視システム、交通監視方法、および交通監視プログラム

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011165170A (ja) * 2010-01-15 2011-08-25 Toyota Central R&D Labs Inc 対象物検出装置及びプログラム
WO2011161924A1 (fr) * 2010-06-23 2011-12-29 国立大学法人大阪大学 Dispositif de détection d'objets en mouvement
WO2012124000A1 (fr) * 2011-03-17 2012-09-20 日本電気株式会社 Système de reconnaissance d'images, procédé de reconnaissance d'images, et support lisible par ordinateur non temporaire dans lequel est stocké un programme pour la reconnaissance d'images
JP2012185684A (ja) * 2011-03-07 2012-09-27 Jvc Kenwood Corp 対象物検出装置、対象物検出方法

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5707570B2 (ja) * 2010-03-16 2015-04-30 パナソニックIpマネジメント株式会社 物体識別装置、物体識別方法、及び、物体識別装置の学習方法
JP5290229B2 (ja) * 2010-03-30 2013-09-18 セコム株式会社 学習装置及び対象物検知装置
JP5214716B2 (ja) * 2010-12-14 2013-06-19 株式会社東芝 識別装置
JP5901054B2 (ja) * 2011-12-02 2016-04-06 国立大学法人九州工業大学 物体の検出方法及びその方法を用いた物体の検出装置

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011165170A (ja) * 2010-01-15 2011-08-25 Toyota Central R&D Labs Inc 対象物検出装置及びプログラム
WO2011161924A1 (fr) * 2010-06-23 2011-12-29 国立大学法人大阪大学 Dispositif de détection d'objets en mouvement
JP2012185684A (ja) * 2011-03-07 2012-09-27 Jvc Kenwood Corp 対象物検出装置、対象物検出方法
WO2012124000A1 (fr) * 2011-03-17 2012-09-20 日本電気株式会社 Système de reconnaissance d'images, procédé de reconnaissance d'images, et support lisible par ordinateur non temporaire dans lequel est stocké un programme pour la reconnaissance d'images

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HIDEFUMI YOSHIDA ET AL.: "A study on a method for stable pedestrian detection against pose changes with generative learning", IEICE TECHNICAL REPORT, vol. 111, no. 49, 12 May 2011 (2011-05-12), pages 127 - 132 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11741367B2 (en) 2017-03-13 2023-08-29 Fanuc Corporation Apparatus and method for image processing to calculate likelihood of image of target object detected from input image
DE102018105334B4 (de) 2017-03-13 2024-01-25 Fanuc Corporation Bildverarbeitungsvorrichtung und Bildverarbeitungsverfahren zur Berechnung der Bildwahrscheinlichkeit eines aus einem Eingangsbild erfassten Zielobjekts
EP3664020A4 (fr) * 2017-07-31 2021-04-21 Equos Research Co., Ltd. Dispositif de génération de données d'image, dispositif de reconnaissance d'image, programme de génération de données d'image, et programme de reconnaissance d'image
US11157724B2 (en) 2017-07-31 2021-10-26 Equos Research Co., Ltd. Image data generation device, image recognition device, image data generation program, and image recognition program
CN112154492A (zh) * 2018-03-19 2020-12-29 德尔克股份有限公司 预警和碰撞避免
US11749111B2 (en) 2018-03-19 2023-09-05 Derq Inc. Early warning and collision avoidance
US11763678B2 (en) 2018-03-19 2023-09-19 Derq Inc. Early warning and collision avoidance
CN110795985A (zh) * 2018-08-02 2020-02-14 松下电器(美国)知识产权公司 信息处理方法及信息处理***
CN112997214A (zh) * 2018-11-13 2021-06-18 索尼公司 信息处理装置、信息处理方法和程序
CN112997214B (zh) * 2018-11-13 2024-04-26 索尼公司 信息处理装置、信息处理方法和程序
US11443631B2 (en) 2019-08-29 2022-09-13 Derq Inc. Enhanced onboard equipment
US11688282B2 (en) 2019-08-29 2023-06-27 Derq Inc. Enhanced onboard equipment

Also Published As

Publication number Publication date
JP2014106685A (ja) 2014-06-09

Similar Documents

Publication Publication Date Title
WO2014084218A1 (fr) Dispositif de détection de sujet
JP7052663B2 (ja) 物体検出装置、物体検出方法及び物体検出用コンピュータプログラム
US11741696B2 (en) Advanced path prediction
WO2019223582A1 (fr) Procédé et système de détection de cible
US9776564B2 (en) Vehicle periphery monitoring device
JP4173902B2 (ja) 車両周辺監視装置
JP4173901B2 (ja) 車両周辺監視装置
DE112018007287T5 (de) Fahrzeugsystem und -verfahren zum erfassen von objekten und einer objektentfernung
US9070023B2 (en) System and method of alerting a driver that visual perception of pedestrian may be difficult
CN107133559B (zh) 基于360度全景的运动物体检测方法
JP4171501B2 (ja) 車両の周辺監視装置
US11170272B2 (en) Object detection device, object detection method, and computer program for object detection
US20120070034A1 (en) Method and apparatus for detecting and tracking vehicles
CN102073846A (zh) 基于航拍图像的交通信息获取方法
CN111967396A (zh) 障碍物检测的处理方法、装置、设备及存储介质
JP4631036B2 (ja) 通行人行動解析装置及び通行人行動解析方法並びにそのプログラム
US10984264B2 (en) Detection and validation of objects from sequential images of a camera
CN114359714A (zh) 基于事件相机的无人体避障方法、装置及智能无人体
WO2018138782A1 (fr) Dispositif de traitement d'informations, programme d'extraction de points caractéristiques, et procédé d'extraction de points caractéristiques
US11120292B2 (en) Distance estimation device, distance estimation method, and distance estimation computer program
KR20160015091A (ko) 교통신호제어 시스템의 보행자 검출 및 행동패턴 추적 방법
US20230245323A1 (en) Object tracking device, object tracking method, and storage medium
EP4089649A1 (fr) Caméras neuromorphiques pour aéronef
JP4055785B2 (ja) 移動物体の高さ検出方法及び装置並びに物体形状判定方法及び装置
CN115131594B (zh) 一种基于集成学习的毫米波雷达数据点分类方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13857904

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13857904

Country of ref document: EP

Kind code of ref document: A1