CN107341789B - System and method for predicting pathway of visually impaired people based on RGB-D camera and stereo - Google Patents


Info

Publication number
CN107341789B
Authority
CN
China
Prior art keywords
color
infrared
image
camera
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611048370.8A
Other languages
Chinese (zh)
Other versions
CN107341789A (en)
Inventor
于红雷
杨恺伦
程瑞琦
陈浩
汪凯巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Vision Krypton Technology Co Ltd
Original Assignee
Hangzhou Vision Krypton Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Vision Krypton Technology Co Ltd filed Critical Hangzhou Vision Krypton Technology Co Ltd
Priority to CN201611048370.8A
Publication of CN107341789A
Application granted
Publication of CN107341789B
Legal status: Active


Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61H PHYSICAL THERAPY APPARATUS, e.g. DEVICES FOR LOCATING OR STIMULATING REFLEX POINTS IN THE BODY; ARTIFICIAL RESPIRATION; MASSAGE; BATHING DEVICES FOR SPECIAL THERAPEUTIC OR HYGIENIC PURPOSES OR SPECIFIC PARTS OF THE BODY
    • A61H3/00 Appliances for aiding patients or disabled persons to walk about
    • A61H3/06 Walking aids for blind persons
    • A61H3/061 Walking aids for blind persons with electronic detecting or guiding means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10048 Infrared image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30244 Camera pose

Landscapes

  • Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Pain & Pain Management (AREA)
  • Physical Education & Sports Medicine (AREA)
  • Rehabilitation Therapy (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Measurement Of Optical Distance (AREA)

Abstract

The invention discloses a pathway prediction system and method for visually impaired people based on an RGB-D camera and stereo sound. An infrared projector projects invisible near-infrared static speckles, two infrared cameras and one RGB color camera collect images, and a small processor processes the collected images to compute a depth image. An attitude angle sensor supplies the attitude angles of the cameras. The small processor computes a height image from the depth and attitude angle information, divides the height image into blocks, converts the depth information obtained from the blocks into stereo signals, and finally transmits the stereo signals to the visually impaired user, meeting the user's need for path prediction well.

Description

System and method for predicting pathway of visually impaired people based on RGB-D camera and stereo
Technical Field
The invention belongs to the technical fields of assistive technology for the visually impaired, binocular vision, three-dimensional environment perception and stereo sound interaction. It provides a pathway prediction method for visually impaired people based on an RGB-D camera and stereo sound, in which an infrared projector projects invisible near-infrared static speckles, two infrared cameras and one RGB camera acquire images, and a small processor processes the acquired images to compute a depth image. The small processor obtains the attitude angles of the cameras from an attitude angle sensor, computes a height image from the depth and attitude angle information, divides the height image into blocks, converts the blocked depth information into a stereo signal, and finally transmits the stereo signal to the visually impaired user through a bone conduction headset to assist path prediction.
Background
According to World Health Organization statistics, there are 285 million visually impaired people in the world. Having lost normal vision, they have difficulty perceiving color, shape, distance and movement, which greatly affects daily life and travel.
The traditional aid for the visually impaired is the white cane: by sweeping the cane back and forth, the user learns about the conditions just ahead of it, which is time-consuming and laborious. The cane's detection range is limited; it can only find obstacles near the feet and cannot reflect conditions farther away or in the air. A guide dog can help the visually impaired, but training and upkeep are expensive, beyond the means of an ordinary family, and in some places, such as buses and railway stations, guide dogs may not accompany the blind, so their assistance is limited. A bionic eye can help the visually impaired recover partial vision, but implantation requires surgery and is costly, and it is only suitable for people blinded by retinitis pigmentosa or age-related macular degeneration; visually impaired people with damaged optic nerves cannot restore vision through a bionic eye implant.
Electronic aids for the visually impaired mainly use ultrasonic ranging, laser ranging, binocular vision, laser speckle coding, lidar, millimeter-wave radar, thermal imaging and the Global Positioning System (GPS). Ultrasonic and laser ranging are limited to single-point measurements: the information obtained is too sparse, power consumption is high, the equipment is heavy, only an alarm function can be provided, and they are easily disturbed by the environment. Aids based on binocular vision depend on abundant feature points and textures in the environment and fail in scenes with uniform texture, such as indoor white walls and smooth floors; binocular vision can also be deceived by special situations such as mirror reflections, causing missed or false detections. Aids based on laser speckle coding fail outdoors because the actively projected structured light is overwhelmed by sunlight, so the coded speckles cannot be identified; owing to power limits, the technique also has a maximum working distance, beyond which objects cannot be ranged. Lidar-based aids are expensive, generally have low sampling rates, are sensitive to dust, haze and rain, and cannot acquire color or texture information. Millimeter-wave radar offers low resolution and difficult signal processing. Thermal imaging offers low resolution and a complicated calibration process, and can only detect heat-emitting objects such as humans and animals. GPS assistance has low accuracy, is prone to signal loss, cannot be used indoors, and cannot capture local dynamic obstacles.
The traditional interaction modes for the visually impaired are voice prompts and tactile vibration. Voice prompts generally announce the distance and direction of obstacles; the announcement takes time, causing delay and risk of accident, and the amount of information that can be conveyed is small. Tactile vibration prompts the position of obstacles through vibrating belts or vests; vibration solves the delay problem, but the hardware burdens the visually impaired, and the wearing experience varies from person to person.
Disclosure of Invention
The present invention is directed to a pathway prediction system and method for visually impaired people based on an RGB-D camera and stereo sound.
The purpose of the invention is achieved by the following technical scheme: a pathway prediction system for visually impaired people based on an RGB-D camera and stereo sound comprises an infrared projector, two identical infrared cameras, a color camera, an attitude angle sensor, a USB hub, a small processor, a bone conduction earphone module, two bone conduction vibration modules and a battery module. The infrared projector, the two infrared cameras, the color camera and the attitude angle sensor are connected to the small processor through the USB hub, and the battery module is connected to the small processor. The color camera and the infrared projector are located between the two infrared cameras. The optical axes of the two infrared cameras and the color camera are parallel to each other. The attitude angles of the three cameras are identical and are acquired in real time by the attitude angle sensor. The small processor controls the infrared projector to project invisible static near-infrared speckles onto the three-dimensional scene ahead, and the two infrared cameras collect two infrared images of the projected scene in real time. The color camera acquires color images of the scene in real time. The USB hub transmits the two infrared images, the color image and the attitude angle information to the small processor. The small processor processes the two collected infrared images and the color image to obtain a depth image of the scene, then processes the depth information and the attitude angle information to obtain a height image. The small processor divides the height image into blocks, converts the depth information after blocking into a stereo signal and transmits it to the bone conduction earphone module. The bone conduction earphone module converts the stereo signal into bone conduction vibration signals and passes them to the two bone conduction vibration modules. The two bone conduction vibration modules deliver the vibration signals to the visually impaired user.
The path prediction method of the system comprises the following steps:
(1) Calibrate the two infrared cameras once to obtain their common focal length f_IR, the principal point position (c_IR-x, c_IR-y) of the left infrared camera, and the baseline distance B_IR-IR between the two infrared cameras.
(2) Calibrate the color camera once to obtain its focal length f_color and principal point position (c_COLOR-x, c_COLOR-y).
(3) Perform binocular calibration of the color camera and the left infrared camera once to obtain the baseline distance B_IR-COLOR between them.
(4) The infrared projector projects invisible static near-infrared speckles into the three-dimensional scene in real time.
(5) The two infrared cameras capture two infrared images IR_left and IR_right of the three-dimensional scene.
(6) The color camera captures a color image Color of the three-dimensional scene.
(7) The attitude angle sensor measures the rotation angles Angle_X, Angle_Y, Angle_Z of the three cameras about the X, Y and Z axes.
(8) The USB hub transmits the two infrared images IR_left and IR_right, the color image Color, and the rotation angles Angle_X, Angle_Y, Angle_Z to the small processor.
(9) The small processor extracts Sobel edges from the two infrared images IR_left and IR_right, obtaining two Sobel edge images Sobel_left and Sobel_right.
(10) With the left Sobel edge image Sobel_left as reference, block-based image matching is performed between Sobel_left and Sobel_right, yielding a set of well-matched effective points E = {e_1, e_2, e_3, ..., e_M}. In Sobel_left, each effective point is e = (u, v, d)^T, where u is the horizontal pixel coordinate, v is the vertical pixel coordinate and d is the disparity value.
(11) Taking the matched effective points E as a reference, every three effective points define a parallax plane; the equation of the i-th parallax plane is d = a_i·u + b_i·v + c_i, where a_i, b_i, c_i are its coefficients.
(12) On the basis of the parallax planes, each unmatched pixel point (u', v', d')^T is converted into a matched effective point (u, v, d)^T. Specifically, the distance dist_i from the pixel point (u', v', d')^T to the i-th parallax plane is computed, and an energy function Energy(d') is defined in terms of dist_i, where ε and σ are constants. All disparity values d' = d'_min, ..., d'_max in the disparity search range are traversed, and the disparity value that minimizes Energy(d') is taken as the disparity value d of the pixel point; further, u = u' and v = v'.
(13) All unmatched pixel points are traversed to obtain their disparity values, yielding the disparity image Disparity_left referenced to the left infrared camera.
(14) Using the focal length f_IR and baseline distance B_IR-IR of the two infrared cameras, each point (u, v, d) of the disparity image is traversed and its depth value depth = f_IR·B_IR-IR/d is computed. Each point of the depth image is (u, v, depth), yielding the depth image Depth_left referenced to the left infrared camera.
(15) Using the depth image Depth_left and the color image Color, the focal length f_IR of the infrared cameras, the principal point position (c_IR-x, c_IR-y) of the left infrared camera, the focal length f_color and principal point position (c_COLOR-x, c_COLOR-y) of the color camera, and the baseline distance B_IR-COLOR between the left infrared camera and the color camera, the depth image is aligned with the color image to obtain the depth image Depth_color of the color camera's field of view.
(16) From the depth image Depth_color, the focal length f_color of the color camera and its principal point position (c_COLOR-x, c_COLOR-y), the three-dimensional coordinates (X, Y, Z) of each point in the color camera coordinate system can be calculated. For a point of Depth_color with pixel coordinates (u, v) and depth value depth, the three-dimensional coordinates are given by equation (1):
X = (u - c_COLOR-x)·depth/f_color, Y = (v - c_COLOR-y)·depth/f_color, Z = depth (1)
(17) From the three-dimensional coordinates (X, Y, Z) of each point in the camera coordinate system and the rotation angles Angle_X = α, Angle_Y = β, Angle_Z = γ measured by the attitude angle sensor, the coordinates (X_w, Y_w, Z_w) of each point in the world coordinate system are calculated by equation (2):
(X_w, Y_w, Z_w)^T = R(α, β, γ)·(X, Y, Z)^T (2)
where R(α, β, γ) is the rotation matrix composed of the rotations about the X, Y and Z axes.
(18) The coordinate Y_w of each point in the world coordinate system is the vertical height of that point relative to the wearing position of the color camera, so a height image Height can be acquired.
(19) The height image Height is divided into K blocks from left to right, and the average height height_K of each block Height_K is calculated (K is generally between 2 and 10).
(20) The K blocks Height_K are represented by an ensemble of K musical instruments with different timbres, each block voiced by one instrument. Given the height H of the visually impaired user, the loudness Volume of each instrument decreases with the difference between the block's average height height_K and H: the closer height_K is to H, the closer the corresponding objects are to the ground, the more passable the terrain, and the larger the loudness Volume; the farther height_K is from H, the farther the objects are from the ground, the less passable the terrain, and the smaller the loudness Volume. The instrument sound for each direction is rendered in stereo. Instruments with distinctive, pleasant timbres, such as piano, violin, gong, trumpet or xylophone, may be chosen.
(21) The small processor transmits the stereo signal to the bone conduction earphone module.
(22) The bone conduction earphone module converts the stereo signal into a bone conduction vibration signal.
(23) The bone conduction vibration module transmits the bone conduction vibration signal to the visually impaired user.
Compared with existing assistive methods for the visually impaired, the method has the following advantages:
1. Environmental adaptability. Because an infrared projector and two infrared cameras are used, the method works both indoors and outdoors. Indoors, the static near-infrared speckles projected by the infrared projector add texture to the three-dimensional scene, which helps obtain a dense depth image. Outdoors, the near-infrared component of sunlight textures the scene, so a dense depth image can also be acquired. A dense depth image guarantees the accuracy of the block heights and the quality of the assistive interaction.
2. Day and night applicability. Because an infrared projector and two infrared cameras are used, the method works both by day and at night. In the daytime, the projected static near-infrared speckles and the near-infrared component of sunlight add texture to the scene, favoring dense depth images. At night, the projected speckles texture the nearby scene, so a depth image of the near scene can still be acquired. Reliable depth images by day and night guarantee the accuracy of the block heights and the quality of the assistive interaction.
3. Road conditions such as stairs and slopes can be distinguished. Because stereo interaction is used and the stereo signal encodes the height values in every frontal direction, the sound representing stairs or slopes differs from the sound representing a flat passable surface; the sound signal therefore predicts not only passable areas but also conditions such as stairs and slopes.
4. Conditions such as potholes can be distinguished. Because the stereo signal encodes the height values in every frontal direction, and a pothole's height differs from that of a normal road, the sound representing a pothole differs from the sound representing a flat passable surface; the sound signal therefore predicts potholes as well as passable areas.
5. The ears are left free. The method uses bone conduction earphones to deliver signals, so the user can still hear outside sounds. Most visually impaired people rely on external sounds for interpretation, for example judging the direction of a road from traffic noise.
6. Both hands are left free. The device is wearable and the small processor is portable, fitting in a pocket or small bag, so it imposes little burden and the visually impaired user does not need to hold any aid.
7. The user is not annoyed. The stereo interaction uses pleasant instrument sounds, so visually impaired users are not irritated and can find their way as if listening to music.
8. Sufficient information is fed back. Compared with semantic voice announcements, stereo feedback uses different loudness levels and instruments of different timbres to represent the traversability of the terrain, conveying the road conditions in several frontal directions simultaneously and predicting the direction of passable areas.
9. Easy to learn and understand. Compared with complex sound-coding schemes, the stereo interaction is based on height blocks; the blocked height information is not complicated, so a visually impaired user can quickly learn the meaning of the stereo signal and choose a walking direction accordingly.
10. Timely feedback. Compared with semantic voice announcements, stereo feedback is immediate and without delay, so the visually impaired user can choose the correct passable path in time, ensuring the safety of the method.
Drawings
FIG. 1 is a schematic block diagram of a pathway prediction system for visually impaired persons;
FIG. 2 is a schematic structural view of the pathway prediction glasses for visually impaired people;
FIG. 3 shows the two infrared images IR_left and IR_right;
FIG. 4 is the depth image Depth_left after grayscale conversion (the original depth image is rendered in pseudo-color, with red indicating near and blue indicating far);
FIG. 5 is the grayed color image (in the original color image, regions whose vertical height below the RGB-D camera's wearing position is close to the height of the visually impaired user are marked green as passable areas);
FIG. 6 is a schematic diagram of how the instrument stereo sound represents the path.
Detailed Description
As shown in FIG. 1, a pathway prediction system for visually impaired people based on an RGB-D camera and stereo sound comprises an infrared projector, two identical infrared cameras, a color camera, an attitude angle sensor, a USB hub, a small processor, a bone conduction earphone module, two bone conduction vibration modules and a battery module. The infrared projector, the two infrared cameras, the color camera and the attitude angle sensor are connected to the small processor through the USB hub, and the battery module is connected to the small processor. The color camera and the infrared projector are located between the two infrared cameras. The optical axes of the two infrared cameras and the color camera are parallel to each other. The attitude angles of the three cameras are identical and are acquired in real time by the attitude angle sensor. The small processor controls the infrared projector to project invisible static near-infrared speckles onto the three-dimensional scene ahead, and the two infrared cameras collect two infrared images of the projected scene in real time. The color camera acquires color images of the scene in real time. The USB hub transmits the two infrared images, the color image and the attitude angle information to the small processor. The small processor processes the two collected infrared images and the color image to obtain a depth image of the scene, then processes the depth information and the attitude angle information to obtain a height image. The small processor divides the height image into blocks, converts the depth information after blocking into a stereo signal and transmits it to the bone conduction earphone module. The bone conduction earphone module converts the stereo signal into bone conduction vibration signals and passes them to the two bone conduction vibration modules. The two bone conduction vibration modules deliver the vibration signals to the visually impaired user. The system can be styled as the pair of glasses shown in FIG. 2 for an aesthetically pleasing form.
The path prediction method of the system comprises the following steps:
(1) Calibrate the two infrared cameras once to obtain their common focal length f_IR, the principal point position (c_IR-x, c_IR-y) of the left infrared camera, and the baseline distance B_IR-IR between the two infrared cameras.
(2) Calibrate the color camera once to obtain its focal length f_color and principal point position (c_COLOR-x, c_COLOR-y).
(3) Perform binocular calibration of the color camera and the left infrared camera once to obtain the baseline distance B_IR-COLOR between them.
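As a concrete illustration of steps (1)-(3), the following is a minimal Python/OpenCV sketch of the calibration using a checkerboard target. The board geometry, the image lists ir_left_views, ir_right_views and color_views, and the assumption that the board is detected in every view are all illustrative and not part of the patent.

```python
import cv2
import numpy as np

PATTERN = (9, 6)   # inner checkerboard corners (assumed board)
SQUARE = 0.025     # square size in metres (assumed)

objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE

def corners(views):
    """Checkerboard corners per grayscale view; assumes detection succeeds."""
    pts = []
    for img in views:
        ok, c = cv2.findChessboardCorners(img, PATTERN)
        assert ok, "board not found"
        pts.append(c)
    return pts

ir_l = corners(ir_left_views)
ir_r = corners(ir_right_views)
col = corners(color_views)
obj = [objp] * len(ir_l)
size = ir_left_views[0].shape[::-1]   # (width, height), grayscale assumed

# Steps (1) and (2): intrinsics give f_IR, (c_IR-x, c_IR-y), f_color, ...
_, K_ir, D_ir, _, _ = cv2.calibrateCamera(obj, ir_l, size, None, None)
_, K_col, D_col, _, _ = cv2.calibrateCamera(obj, col, size, None, None)

# Step (1), baseline of the IR pair; step (3) would repeat this with
# the color views in place of ir_r to obtain B_IR-COLOR.
_, _, _, _, _, R, T, _, _ = cv2.stereoCalibrate(
    obj, ir_l, ir_r, K_ir, D_ir, K_ir, D_ir, size,
    flags=cv2.CALIB_FIX_INTRINSIC)
B_ir_ir = float(np.linalg.norm(T))   # metres, since objp is in metres
```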
(4) The infrared projector projects invisible static near-infrared speckles into the three-dimensional scene in real time.
(5) The two infrared cameras capture two infrared images IR_left and IR_right of the three-dimensional scene.
(6) The color camera captures a color image Color of the three-dimensional scene.
(7) The attitude angle sensor measures the rotation angles Angle_X, Angle_Y, Angle_Z of the three cameras about the X, Y and Z axes.
(8) The USB hub transmits the two infrared images IR_left and IR_right, the color image Color, and the rotation angles Angle_X, Angle_Y, Angle_Z to the small processor.
(9) The small processor extracts Sobel edges from the two infrared images IR_left and IR_right, obtaining two Sobel edge images Sobel_left and Sobel_right.
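A minimal sketch of step (9) with OpenCV; combining the horizontal and vertical gradients into a single edge-magnitude image is our assumption, since the patent only states that Sobel edges are extracted.

```python
import cv2

def sobel_edges(ir_img):
    """Edge-magnitude image from horizontal and vertical Sobel gradients."""
    gx = cv2.Sobel(ir_img, cv2.CV_32F, 1, 0, ksize=3)  # gradient along u
    gy = cv2.Sobel(ir_img, cv2.CV_32F, 0, 1, ksize=3)  # gradient along v
    return cv2.convertScaleAbs(cv2.magnitude(gx, gy))

sobel_left = sobel_edges(ir_left)    # IR_left  -> Sobel_left
sobel_right = sobel_edges(ir_right)  # IR_right -> Sobel_right
```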
(10) With the left Sobel edge image Sobel_left as reference, block-based image matching is performed between Sobel_left and Sobel_right, yielding a set of well-matched effective points E = {e_1, e_2, e_3, ..., e_M}. In Sobel_left, each effective point is e = (u, v, d)^T, where u is the horizontal pixel coordinate, v is the vertical pixel coordinate and d is the disparity value.
(11) Taking the matched effective points E as a reference, every three effective points define a parallax plane; the equation of the i-th parallax plane is d = a_i·u + b_i·v + c_i, where a_i, b_i, c_i are its coefficients.
(12) On the basis of the parallax planes, each unmatched pixel point (u', v', d')^T is converted into a matched effective point (u, v, d)^T. Specifically, the distance dist_i from the pixel point (u', v', d')^T to the i-th parallax plane is computed, and an energy function Energy(d') is defined in terms of dist_i, where ε and σ are constants. All disparity values d' = d'_min, ..., d'_max in the disparity search range are traversed, and the disparity value that minimizes Energy(d') is taken as the disparity value d of the pixel point; further, u = u' and v = v'.
(13) All unmatched pixel points are traversed to obtain their disparity values, yielding the disparity image Disparity_left referenced to the left infrared camera.
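The plane-guided interpolation of steps (11)-(13) can be sketched as below. The concrete energy, a negative log of a Gaussian in the plane distance plus a small uniform floor ε, is an assumption: the patent states only that Energy(d') depends on the distance to the parallax plane and on the constants ε and σ.

```python
import numpy as np

def fit_parallax_plane(p1, p2, p3):
    """Coefficients (a_i, b_i, c_i) of d = a*u + b*v + c through 3 points."""
    A = np.array([[p[0], p[1], 1.0] for p in (p1, p2, p3)])
    d = np.array([p[2] for p in (p1, p2, p3)], dtype=float)
    return np.linalg.solve(A, d)          # assumes non-collinear points

def interpolate_disparity(u, v, plane, d_min, d_max, eps=0.05, sigma=1.0):
    a, b, c = plane
    d_plane = a * u + b * v + c           # disparity predicted by the plane
    cand = np.arange(d_min, d_max + 1, dtype=float)
    dist = np.abs(cand - d_plane)         # distance to the parallax plane
    energy = -np.log(eps + np.exp(-dist**2 / (2 * sigma**2)))  # assumed form
    return cand[np.argmin(energy)]        # d minimizing Energy(d')
```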
(14) Using the focal length f_IR and baseline distance B_IR-IR of the two infrared cameras, each point (u, v, d) of the disparity image is traversed and its depth value depth = f_IR·B_IR-IR/d is computed. Each point of the depth image is (u, v, depth), yielding the depth image Depth_left referenced to the left infrared camera.
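Step (14) is plain stereo triangulation; a sketch, assuming f_ir is in pixels and the baseline in metres:

```python
import numpy as np

def disparity_to_depth(disparity, f_ir, b_ir_ir):
    """Depth_left from Disparity_left via depth = f_IR * B_IR-IR / d."""
    depth = np.zeros_like(disparity, dtype=np.float32)
    valid = disparity > 0                 # d = 0 marks an unmatched pixel
    depth[valid] = f_ir * b_ir_ir / disparity[valid]
    return depth                          # metres
```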
(15) Using the depth image Depth_left and the color image Color, the focal length f_IR of the infrared cameras, the principal point position (c_IR-x, c_IR-y) of the left infrared camera, the focal length f_color and principal point position (c_COLOR-x, c_COLOR-y) of the color camera, and the baseline distance B_IR-COLOR between the left infrared camera and the color camera, the depth image is aligned with the color image to obtain the depth image Depth_color of the color camera's field of view.
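One possible reading of the alignment in step (15) is to back-project every pixel of Depth_left with the infrared intrinsics, translate by the IR-color baseline, and re-project with the color intrinsics. The purely horizontal displacement and the per-pixel loop are simplifying assumptions.

```python
import numpy as np

def align_depth_to_color(depth_left, f_ir, c_ir, f_col, c_col, b_ir_color):
    """Depth_color: Depth_left re-rendered from the color camera viewpoint."""
    h, w = depth_left.shape
    depth_color = np.zeros_like(depth_left)
    for v in range(h):
        for u in range(w):
            z = depth_left[v, u]
            if z <= 0:
                continue
            x = (u - c_ir[0]) * z / f_ir - b_ir_color  # shift into color frame
            y = (v - c_ir[1]) * z / f_ir
            uc = int(round(f_col * x / z + c_col[0]))
            vc = int(round(f_col * y / z + c_col[1]))
            if 0 <= uc < w and 0 <= vc < h:
                depth_color[vc, uc] = z
    return depth_color
```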
(16) From the depth image Depth_color, the focal length f_color of the color camera and its principal point position (c_COLOR-x, c_COLOR-y), the three-dimensional coordinates (X, Y, Z) of each point in the color camera coordinate system can be calculated. For a point of Depth_color with pixel coordinates (u, v) and depth value depth, the three-dimensional coordinates are given by equation (1):
X = (u - c_COLOR-x)·depth/f_color, Y = (v - c_COLOR-y)·depth/f_color, Z = depth (1)
(17) From the three-dimensional coordinates (X, Y, Z) of each point in the camera coordinate system and the rotation angles Angle_X = α, Angle_Y = β, Angle_Z = γ measured by the attitude angle sensor, the coordinates (X_w, Y_w, Z_w) of each point in the world coordinate system are calculated by equation (2):
(X_w, Y_w, Z_w)^T = R(α, β, γ)·(X, Y, Z)^T (2)
where R(α, β, γ) is the rotation matrix composed of the rotations about the X, Y and Z axes.
(18) The coordinate Y_w of each point in the world coordinate system is the vertical height of that point relative to the wearing position of the color camera, so a height image Height can be acquired.
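Steps (16)-(18) chain together as below: back-projection through formula (1), rotation into the world frame per formula (2), and extraction of Y_w as the height image. The X-Y-Z composition order of the rotation is an assumption, as the patent does not state it.

```python
import numpy as np

def height_image(depth, f_col, cx, cy, alpha, beta, gamma):
    """Height image: Y_w of every pixel of Depth_color in the world frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    X = (u - cx) * depth / f_col          # formula (1)
    Y = (v - cy) * depth / f_col
    Z = depth
    pts = np.stack([X, Y, Z]).reshape(3, -1)

    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    world = Rx @ Ry @ Rz @ pts            # formula (2), assumed X-Y-Z order
    return world[1].reshape(h, w)         # Y_w = vertical height image
```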
(19) The height image Height is divided into K blocks from left to right, and the average height height_K of each block Height_K is calculated (K is generally between 2 and 10).
(20) The K blocks Height_K are represented by an ensemble of K musical instruments with different timbres, each block voiced by one instrument. Given the height H of the visually impaired user, the loudness Volume of each instrument decreases with the difference between the block's average height height_K and H: the closer height_K is to H, the closer the corresponding objects are to the ground, the more passable the terrain, and the larger the loudness Volume; the farther height_K is from H, the farther the objects are from the ground, the less passable the terrain, and the smaller the loudness Volume. The instrument sound for each direction is rendered in stereo. Instruments with distinctive, pleasant timbres, such as piano, violin, gong, trumpet or xylophone, may be chosen.
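Steps (19)-(20) might be realized as below. The linear loudness law and the equally spaced stereo panning are assumptions; the patent requires only that Volume shrink as |height_K - H| grows and that each block sound from its own direction.

```python
import numpy as np

def blocks_to_stereo(height_img, K=5, user_height=1.7, h_span=3.0):
    """Per-block (volume, pan) pairs driving K instruments, left to right."""
    out = []
    for k, blk in enumerate(np.array_split(height_img, K, axis=1)):
        height_k = float(np.nanmean(blk))             # average height_K
        volume = max(0.0, 1.0 - abs(height_k - user_height) / h_span)
        pan = 0.0 if K == 1 else -1.0 + 2.0 * k / (K - 1)  # -1 left, +1 right
        out.append((volume, pan))                     # instrument k's drive
    return out
```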
(21) The small processor transmits the stereo signal to the bone conduction earphone module.
(22) The bone conduction earphone module converts the stereo signal into a bone conduction vibration signal.
(23) The bone conduction vibration module transmits the bone conduction vibration signal to the visually impaired user.

Claims (1)

1. A pathway prediction system for visually impaired people based on an RGB-D camera and stereo sound, characterized by comprising an infrared projector, two identical infrared cameras, a color camera, an attitude angle sensor, a USB hub, a small processor, a bone conduction earphone module, two bone conduction vibration modules and a battery module; the infrared projector, the two infrared cameras, the color camera and the attitude angle sensor are connected to the small processor through the USB hub, and the battery module is connected to the small processor; the color camera and the infrared projector are positioned between the two infrared cameras; the optical axes of the two infrared cameras and the color camera are parallel to each other; the attitude angles of the three cameras are identical and are acquired in real time by the attitude angle sensor; the small processor controls the infrared projector to project invisible static near-infrared speckles onto the three-dimensional scene ahead, and the two infrared cameras collect two infrared images of the projected scene in real time; the color camera collects color images of the scene in real time; the USB hub transmits the two infrared images, the color image and the attitude angle information to the small processor; the small processor processes the two collected infrared images and the color image to obtain a depth image of the scene; the small processor processes the depth information and the attitude angle information to obtain a height image of the scene; the small processor divides the height image into blocks, converts the depth information after blocking into a stereo signal and transmits it to the bone conduction earphone module; the bone conduction earphone module converts the stereo signal into a bone conduction vibration signal and transmits it to the two bone conduction vibration modules; the two bone conduction vibration modules transmit the bone conduction vibration signals to the visually impaired user; the path prediction method of the system comprises the following steps:
(1) calibrating the two infrared cameras once to obtain their common focal length f_IR, the principal point position (c_IR-x, c_IR-y) of the left infrared camera, and the baseline distance B_IR-IR between the two infrared cameras;
(2) calibrating the color camera once to obtain its focal length f_color and principal point position (c_COLOR-x, c_COLOR-y);
(3) performing binocular calibration of the color camera and the left infrared camera once to obtain the baseline distance B_IR-COLOR between them;
(4) the infrared projector projects invisible static near-infrared speckles into the three-dimensional scene in real time;
(5) the two infrared cameras capture two infrared images IR_left and IR_right of the three-dimensional scene;
(6) the color camera captures a color image Color of the three-dimensional scene;
(7) the attitude angle sensor measures the rotation angles Angle_X, Angle_Y, Angle_Z of the three cameras about the X, Y and Z axes;
(8) the USB hub transmits the two infrared images IR_left and IR_right, the color image Color, and the rotation angles Angle_X, Angle_Y, Angle_Z to the small processor;
(9) the small processor extracts Sobel edges from the two infrared images IR_left and IR_right, obtaining two Sobel edge images Sobel_left and Sobel_right;
(10) with the left Sobel edge image Sobel_left as reference, block-based image matching is performed between Sobel_left and Sobel_right, yielding a set of well-matched effective points E = {e_1, e_2, e_3, ..., e_M}; in Sobel_left, each effective point is e = (u, v, d)^T, where u is the horizontal pixel coordinate, v is the vertical pixel coordinate and d is the disparity value;
(11) taking the matched effective points E as a reference, every three effective points define a parallax plane, the equation of the i-th parallax plane being d = a_i·u + b_i·v + c_i, where a_i, b_i, c_i are its coefficients;
(12) on the basis of the parallax planes, each unmatched pixel point (u', v', d')^T is converted into a matched effective point (u, v, d)^T, specifically: the distance dist_i from the pixel point (u', v', d')^T to the i-th parallax plane is computed, and an energy function Energy(d') is defined in terms of dist_i, where ε and σ are constants; all disparity values d' = d'_min, ..., d'_max in the disparity search range are traversed, and the disparity value minimizing Energy(d') is taken as the disparity value d of the pixel point; further, u = u' and v = v';
(13) all unmatched pixel points are traversed to obtain their disparity values, yielding the disparity image Disparity_left referenced to the left infrared camera;
(14) using the focal length f_IR and baseline distance B_IR-IR of the two infrared cameras, each point (u, v, d) of the disparity image is traversed and its depth value depth = f_IR·B_IR-IR/d is computed; each point of the depth image is (u, v, depth), yielding the depth image Depth_left referenced to the left infrared camera;
(15) using the depth image Depth_left and the color image Color, the focal length f_IR of the infrared cameras, the principal point position (c_IR-x, c_IR-y) of the left infrared camera, the focal length f_color and principal point position (c_COLOR-x, c_COLOR-y) of the color camera, and the baseline distance B_IR-COLOR between the left infrared camera and the color camera, the depth image is aligned with the color image to obtain the depth image Depth_color of the color camera's field of view;
(16) from the depth image Depth_color, the focal length f_color of the color camera and its principal point position (c_COLOR-x, c_COLOR-y), the three-dimensional coordinates (X, Y, Z) of each point in the color camera coordinate system are calculated; for a point of Depth_color with pixel coordinates (u, v) and depth value depth, the three-dimensional coordinates are given by equation (1):
X = (u - c_COLOR-x)·depth/f_color, Y = (v - c_COLOR-y)·depth/f_color, Z = depth (1);
(17) from the three-dimensional coordinates (X, Y, Z) of each point in the camera coordinate system and the rotation angles Angle_X = α, Angle_Y = β, Angle_Z = γ measured by the attitude angle sensor, the coordinates (X_w, Y_w, Z_w) of each point in the world coordinate system are calculated by equation (2):
(X_w, Y_w, Z_w)^T = R(α, β, γ)·(X, Y, Z)^T (2),
where R(α, β, γ) is the rotation matrix composed of the rotations about the X, Y and Z axes;
(18) the coordinate Y_w of each point in the world coordinate system, namely the vertical height of that point relative to the wearing position of the color camera, yields a height image Height;
(19) the height image Height is divided into K blocks from left to right, and the average height height_K of each block Height_K is calculated, the value of K generally being between 2 and 10;
(20) the K blocks Height_K are represented by an ensemble of K musical instruments with different timbres, each block voiced by one instrument; given the height H of the visually impaired user, the loudness Volume of each instrument decreases with the difference between the block's average height height_K and H: the closer height_K is to H, the closer the corresponding objects are to the ground, the more passable the terrain, and the larger the loudness Volume; the farther height_K is from H, the farther the objects are from the ground, the less passable the terrain, and the smaller the loudness Volume; the instrument sound for each direction is rendered in stereo;
(21) the small processor transmits the stereo signal to the bone conduction earphone module;
(22) the bone conduction earphone module converts the stereo signal into a bone conduction vibration signal;
(23) the bone conduction vibration module transmits the bone conduction vibration signal to the visually impaired user.
CN201611048370.8A 2016-11-23 2016-11-23 System and method for predicting pathway of visually impaired people based on RGB-D camera and stereo Active CN107341789B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611048370.8A CN107341789B (en) 2016-11-23 2016-11-23 System and method for predicting pathway of visually impaired people based on RGB-D camera and stereo

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611048370.8A CN107341789B (en) 2016-11-23 2016-11-23 System and method for predicting pathway of visually impaired people based on RGB-D camera and stereo

Publications (2)

Publication Number Publication Date
CN107341789A CN107341789A (en) 2017-11-10
CN107341789B (en) 2019-12-17

Family

ID=60222763

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611048370.8A Active CN107341789B (en) 2016-11-23 2016-11-23 System and method for predicting pathway of visually impaired people based on RGB-D camera and stereo

Country Status (1)

Country Link
CN (1) CN107341789B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6955783B2 (en) 2018-01-10 2021-10-27 達闥機器人有限公司Cloudminds (Shanghai) Robotics Co., Ltd. Information processing methods, equipment, cloud processing devices and computer program products
CN108245385B (en) * 2018-01-16 2019-10-29 曹醒龙 A kind of device helping visually impaired people's trip
CN108876798B (en) * 2018-06-12 2022-03-18 杭州视氪科技有限公司 Stair detection system and method
CN109084700B (en) * 2018-06-29 2020-06-05 上海摩软通讯技术有限公司 Method and system for acquiring three-dimensional position information of article
CN110399807B (en) * 2019-07-04 2021-07-16 达闼机器人有限公司 Method and device for detecting ground obstacle, readable storage medium and electronic equipment
CN111932866A (en) * 2020-08-11 2020-11-13 中国科学技术大学先进技术研究院 Wearable blind person outdoor traffic information sensing equipment
CN112700484A (en) * 2020-12-31 2021-04-23 南京理工大学智能计算成像研究院有限公司 Depth map colorization method based on monocular depth camera
CN114724053B (en) * 2022-04-11 2024-02-20 合肥工业大学 Outdoor visual impairment assisting method based on deep intelligent interaction

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104021388A (en) * 2014-05-14 2014-09-03 西安理工大学 Reversing obstacle automatic detection and early warning method based on binocular vision
CN204766392U (en) * 2015-05-14 2015-11-18 广州龙天软件科技有限公司 Lead blind information processing apparatus
CN105701811A (en) * 2016-01-12 2016-06-22 浙江大学 Sound coding interaction method based on RGB-IR camera

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104021388A (en) * 2014-05-14 2014-09-03 西安理工大学 Reversing obstacle automatic detection and early warning method based on binocular vision
CN204766392U (en) * 2015-05-14 2015-11-18 广州龙天软件科技有限公司 Lead blind information processing apparatus
CN105701811A (en) * 2016-01-12 2016-06-22 浙江大学 Sound coding interaction method based on RGB-IR camera

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Ground and Obstacle Detection Algorithm for the Visually Impaired; Ruiqi Cheng et al.; ICBISP 2015; 2015-11-19; Abstract, Sections 2-3 and 6 *

Also Published As

Publication number Publication date
CN107341789A (en) 2017-11-10

Similar Documents

Publication Publication Date Title
CN107341789B (en) System and method for predicting pathway of visually impaired people based on RGB-D camera and stereo
CN106203390B (en) A kind of intelligent blind auxiliary system
CN106597690B (en) One kind predicting glasses based on RGB-D camera and stereosonic visually impaired people's access
CN106846350B (en) One kind is based on RGB-D camera and stereosonic visually impaired people's barrier early warning system and method
US9370459B2 (en) System and method for alerting visually impaired users of nearby objects
CN106817577B (en) One kind is based on RGB-D cameras and stereosonic visually impaired people's barrier early warning glasses
US7755744B1 (en) Environment sensor that conveys information about objects in the vicinity of the visually impaired user
US20090122161A1 (en) Image to sound conversion device
US9801778B2 (en) System and method for alerting visually impaired users of nearby objects
Dunai et al. Sensory navigation device for blind people
US20060098089A1 (en) Method and apparatus for a multisensor imaging and scene interpretation system to aid the visually impaired
US10579138B2 (en) Head-mounted sensor system
CN106651873B (en) One kind detecting glasses based on RGB-D camera and stereosonic visually impaired people's zebra stripes
CN108245385A (en) A kind of device for helping visually impaired people's trip
CN105686936A (en) Sound coding interaction system based on RGB-IR camera
CN106821692A (en) One kind is based on RGB D cameras and stereosonic visually impaired people's stair detecting system and method
WO2018119403A1 (en) Head mounted sensor system
CN105701811B (en) A kind of acoustic coding exchange method based on RGB-IR cameras
CN106920260B (en) Three-dimensional inertial blind guiding method, device and system
Vítek et al. New possibilities for blind people navigation
Dunai et al. Virtual sound localization by blind people
CN107049717B (en) One kind is based on RGB-D camera and stereosonic visually impaired people's zebra stripes detection system and method
CN107817614B (en) It is a kind of for hiding blind person's auxiliary eyeglasses of the water surface and barrier
CN117323185A (en) Blind person indoor navigation system and method based on computer vision and training method
Hossain et al. State of the art review on walking support system for visually impaired people

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant