CN112241763A - Multi-source multi-mode dynamic information fusion and cognition method and system - Google Patents
- Publication number: CN112241763A
- Application number: CN202011120817.4A
- Authority: CN (China)
- Prior art keywords: information, fusion, data, target, radar
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06F—ELECTRIC DIGITAL DATA PROCESSING
      - G06F18/00—Pattern recognition
        - G06F18/20—Analysing
          - G06F18/25—Fusion techniques
- G—PHYSICS
  - G01—MEASURING; TESTING
    - G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
      - G01S7/00—Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
        - G01S7/02—Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00
          - G01S7/41—Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T5/00—Image enhancement or restoration
        - G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
      - G06V10/00—Arrangements for image or video recognition or understanding
        - G06V10/20—Image preprocessing
          - G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T2207/00—Indexing scheme for image analysis or image enhancement
        - G06T2207/20—Special algorithmic details
          - G06T2207/20212—Image combination
            - G06T2207/20221—Image fusion; Image merging
Abstract
The invention provides a multi-source multi-mode dynamic information fusion and cognition method and system, comprising the following steps. A camera image fusion step: camera images are acquired with a plurality of cameras, and the acquired images are preprocessed and fused to obtain an image information fusion result. A radar information fusion step: target information is collected with a plurality of radars, and the collected information is fused and detection is performed to obtain a target detection result. An acoustic vector sensor tracking step: target sound signals are acquired with a plurality of acoustic vector sensors, and the target sound signal information is fused and tracked to obtain a target tracking result. A radar information and visual information fusion step: the image information fusion result, the target detection result and the target tracking result are fused to generate a target region of interest. The three sensor types respectively collect image information, radar information and sound information; the three kinds of information combine with and complement one another, making up for the limitations of a single sensor.
Description
Technical Field
The invention relates to the technical field of epidemic prevention robots, and in particular to a multi-source multi-mode dynamic information fusion and cognition method and system for an epidemic prevention robot operating in a complex environment.
Background
At present, with the great wave of information technology, robots are applied very widely, and the epidemic prevention robot is one such type. The epidemic prevention robot plays an important role in hospitals and post-earthquake areas: it can serve as a medicine deliverer or a disinfector, and is also well suited to transporting and carrying medical instruments. Because the working environment of an epidemic prevention robot is generally complex, multiple sensors are usually mounted on it; their advantages complement one another, their information is fused, and together they complete the epidemic prevention task.
A large number of sensors are mounted on the body of the epidemic prevention robot; like the eyes, ears and other human sense organs, they are mainly used to capture external information. Information fusion is a method of processing the information from multiple sensors. Sensor information fusion has been a research focus worldwide in recent years, and countries have invested substantial human and material resources into theoretical and practical research on the technology. Sensor information fusion can be divided into two types: fusion among sensors of the same type, and fusion among sensors of different types. Fusion of homogeneous sensors mainly aims to obtain more accurate and more extensive target information; fusion of heterogeneous sensors mainly combines the advantages of each sensor so that their capabilities complement one another and each sensor is used to full effect.
A schematic diagram of multi-sensor information fusion is shown in fig. 2. Multi-sensor fusion can recover information that no single sensor can acquire and effectively expands the system's information-processing capability. The redundant information provided by multiple sensors also means that when one sensor fails, the information from another can still be used, improving the robustness of the system. In addition, by comprehensively processing the information from several sensors of the same or different types, fusion can enlarge the detection area, yield more comprehensive and more accurate data at different moments, improve the detection capability of the system, and avoid the measurement limitations of a single sensor.
In a concrete implementation, sensor information fusion mainly involves two stages. First, when the sensor data are processed, data of the same type are integrated, while data of different types undergo type conversion into a common type. Then, when the data processed in the previous step are fused, suitable methods are adopted to optimize the results so that effective information can be obtained accurately.
Target detection detects a specific target; by fusing the data measured by multiple sensors with a fusion algorithm, multi-sensor information fusion yields more accurate information than any single sensor. The information fusion structure comprises data-level fusion, feature-level fusion and the like. Data-level fusion requires sensors of the same type: the original observations are first associated and then fused at the data layer. Feature-level fusion extracts the features of the targets observed by each sensor and then fuses the extracted feature information.
Compared with a conventional sound pressure sensor, which cannot exploit acoustic particle-velocity information, an acoustic vector sensor can simultaneously acquire the sound pressure and particle velocity of the target sound; it measures richer information and has stronger directional sensitivity.
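As an illustration (not part of the patent), the following sketch shows how the extra particle-velocity information of an acoustic vector sensor yields a target bearing via the time-averaged acoustic intensity, a standard vector-sensor technique; the function name and the synthetic plane-wave signal are assumptions for demonstration only.

```python
import math

def bearing_from_vector_sensor(p, vx, vy):
    """Estimate the azimuth of a sound source from co-located pressure p(t)
    and particle-velocity components vx(t), vy(t), using the time-averaged
    acoustic intensity: theta = atan2(<p*vy>, <p*vx>)."""
    n = len(p)
    ix = sum(pi * vi for pi, vi in zip(p, vx)) / n  # mean active intensity, x
    iy = sum(pi * vi for pi, vi in zip(p, vy)) / n  # mean active intensity, y
    return math.atan2(iy, ix)

# Synthetic plane wave arriving from 30 degrees: for a plane wave, particle
# velocity is in phase with pressure and points along the arrival direction.
theta_true = math.radians(30.0)
p = [math.sin(0.2 * math.pi * k) for k in range(1000)]
vx = [math.cos(theta_true) * pk for pk in p]
vy = [math.sin(theta_true) * pk for pk in p]
theta_est = bearing_from_vector_sensor(p, vx, vy)
```

A pressure-only sensor has no vx/vy channels, so the atan2 step is impossible; this is exactly the directional information the text credits to the vector sensor.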
Patent document CN108392795A (application number: 201810109937.0) discloses a multi-modal control method for a rehabilitation robot based on multi-information fusion, which uses biological information and motion information of a patient as the basis of the adjustment of training parameters of the rehabilitation robot and parameters of a control system, i.e. motion commands or given signals of the control system. The method obtains information of surface electromyographic signals, electroencephalogram signals, electrocardiosignals, joint angles, joint angular velocities, joint moments, contact forces and the like of a patient, dynamically adjusts training parameters and control system parameters by using an information fusion algorithm, automatically switches training modes, and can realize upper and lower limb cooperative intelligent control and flexible control of the rehabilitation robot.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a multi-source multi-mode dynamic information fusion and cognition method and a corresponding system.
The multi-source multi-mode dynamic information fusion and cognition method provided by the invention comprises the following steps:
a camera image fusion step: using a plurality of cameras to acquire camera images, and preprocessing and fusing the acquired images to obtain an image information fusion result;
a radar information fusion step: collecting target information by using a plurality of radars, and fusing and detecting the collected target information to obtain a target detection result;
an acoustic vector sensor tracking step: acquiring target sound signals by using a plurality of sound vector sensors, and fusing and tracking target sound signal information to obtain a target tracking result;
radar information and visual information fusion step: and fusing the image information fusion result, the target detection result and the target tracking result to generate a target region of interest.
Preferably, the plurality of cameras form a parallel structure and acquire images independently; the acquired images are preprocessed to obtain data-level image information, which is then fused to obtain image information whose precision meets a preset requirement and to expand the time coverage.
Preferably, the plurality of radars form a parallel structure, respectively acquire information of the target, preprocess the acquired target information to obtain data-level target information, fuse the data-level target information, acquire target detection information with accuracy meeting preset requirements, and expand a target detection range.
Preferably, the plurality of acoustic vector sensors form a parallel structure, respectively collect target sound signals, preprocess the collected target sound signals to obtain data-level target sound signals, fuse the data-level target sound signals, and obtain target sound information with precision meeting preset requirements.
Preferably, the fusing comprises:
adopting a pixel-level fusion algorithm based on a weighted average method, wherein the formula is as follows:

F(i,j) = ω_A × A(i,j) + ω_B × B(i,j)

wherein A and B are the two images and F is the fused image; i denotes the i-th row of the image; j denotes the j-th column of the image; ω_A denotes the weight applied to image A and ω_B the weight applied to image B, with ω_A + ω_B = 1.
Preferably, two radars R_h and R_l are provided, wherein the high-precision radar R_h has a sampling period of 2 s and the low-precision radar R_l has a sampling period of 3 s, and the data are time-aligned using an interpolation-extrapolation method;

the data of each radar are sorted in increasing time order, the data at the observation times of the high-precision radar are interpolated/extrapolated to the time points of the low-precision radar, the minimum time interval after interpolation-extrapolation is taken as the basic time unit, and the data at the other time points are interpolated/extrapolated according to this basic time unit, giving data at equal intervals.
Preferably, the time-aligned data are subjected to data-level fusion, fusing the multiple groups of target information data collected by the radars into one group of data; a least squares method is adopted for the radar information fusion, as follows:

let Z_k = (z_1, z_2, …, z_n)^T be the n data collected by the radar array from t_{k-1} to t_k, and let z and ż denote the fused measured value and its derivative; a radar measured value z_i is expressed as:

z_i = z + (i − n)T·ż + v_i,  i = 1, 2, …, n

where v_i represents the measurement noise and T is the sampling interval; the above equations are written in matrix form:

Z_n = W_n·U + V_n

where U = (z, ż)^T, W_n is the matrix whose i-th row is (1, (i − n)T), and V_n = (v_1, v_2, …, v_n)^T has zero mean and covariance matrix R_n = σ_r²·I_n;

according to the least squares criterion J = (Z_n − W_n·U)^T·(Z_n − W_n·U), to minimize J, the derivative of J with respect to U is set to zero:

∂J/∂U = −2·W_n^T·(Z_n − W_n·U) = 0

then there is:

Û = (W_n^T·W_n)^{-1}·W_n^T·Z_n

fusing the n measured values then gives the measured value at time k and the noise variance:

z(k) = Û_1,  σ²(k) = σ_r²·[(W_n^T·W_n)^{-1}]_{11}

where z(k) represents the fused measurement value at time k.
The multi-source multi-mode dynamic information fusion and cognition system provided by the invention comprises the following components:
a camera image fusion module: using a plurality of cameras to acquire camera images, and preprocessing and fusing the acquired images to obtain an image information fusion result;
the radar information fusion module: collecting target information by using a plurality of radars, and fusing and detecting the collected target information to obtain a target detection result;
acoustic vector sensor tracking module: acquiring target sound signals by using a plurality of sound vector sensors, and fusing and tracking target sound signal information to obtain a target tracking result;
radar information and visual information fusion module: and fusing the image information fusion result, the target detection result and the target tracking result to generate a target region of interest.
Preferably, the plurality of cameras form a parallel structure and acquire images independently; the acquired images are preprocessed to obtain data-level image information, which is then fused to obtain image information whose precision meets a preset requirement and to expand the time coverage;
the multiple radars form a parallel structure, respectively acquire information of a target, preprocess the acquired target information to obtain data-level target information, fuse the data-level target information to obtain target detection information with the precision meeting the preset requirement and expand the target detection range;
the plurality of acoustic vector sensors form a parallel structure, respectively collect target sound signals, preprocess the collected target sound signals to obtain data-level target sound signals, fuse the data-level target sound signals, and obtain target sound information with the precision meeting the preset requirement.
Preferably, the fusing comprises:
adopting a pixel-level fusion algorithm based on a weighted average method, wherein the formula is as follows:

F(i,j) = ω_A × A(i,j) + ω_B × B(i,j)

wherein A and B are the two images and F is the fused image; i denotes the i-th row of the image; j denotes the j-th column of the image; ω_A denotes the weight applied to image A and ω_B the weight applied to image B, with ω_A + ω_B = 1;
two radars R_h and R_l are provided, wherein the high-precision radar R_h has a sampling period of 2 s and the low-precision radar R_l has a sampling period of 3 s, and the data are time-aligned using an interpolation-extrapolation method;

the data of each radar are first sorted in increasing time order, the data at the observation times of the high-precision radar are then interpolated/extrapolated to the time points of the low-precision radar, and finally the minimum time interval after interpolation-extrapolation is taken as the basic time unit, with the data at the other time points interpolated/extrapolated according to this basic time unit, giving data at equal intervals;
performing data-level fusion on the time-aligned data, fusing the multiple groups of target information data collected by the radars into one group of data, with a least squares method adopted for the radar information fusion, as follows:

let Z_k = (z_1, z_2, …, z_n)^T be the n data collected by the radar array from t_{k-1} to t_k, and let z and ż denote the fused measured value and its derivative; a radar measured value z_i is expressed as:

z_i = z + (i − n)T·ż + v_i,  i = 1, 2, …, n

where v_i represents the measurement noise and T is the sampling interval; the above equations are written in matrix form:

Z_n = W_n·U + V_n

where U = (z, ż)^T, W_n is the matrix whose i-th row is (1, (i − n)T), and V_n = (v_1, v_2, …, v_n)^T has zero mean and covariance matrix R_n = σ_r²·I_n;

according to the least squares criterion J = (Z_n − W_n·U)^T·(Z_n − W_n·U), to minimize J, the derivative of J with respect to U is set to zero:

∂J/∂U = −2·W_n^T·(Z_n − W_n·U) = 0

then there is:

Û = (W_n^T·W_n)^{-1}·W_n^T·Z_n

fusing the n measured values then gives the measured value at time k and the noise variance:

z(k) = Û_1,  σ²(k) = σ_r²·[(W_n^T·W_n)^{-1}]_{11}

where z(k) represents the fused measurement value at time k.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention uses three kinds of sensors (camera, acoustic vector sensor and radar) to track the target and generate the region of interest. The sensors respectively acquire image information, radar information and sound information; the three kinds of information combine with and complement one another, making up for the limitations of a single sensor;
2. By using a data-level fusion method, the invention fuses the image information, the target information acquired by the radars and the sound information acquired by the acoustic vector sensors each at the data level, expanding the spatial and temporal scope of the information and yielding more accurate target information;
3. The method fuses radar information and visual information: the radar detects the target, the detection is projected onto the collected image to generate the region of interest, and the image processing then handles only the targets inside the region of interest, greatly reducing the amount of computation;
4. By designing the acoustic vector sensor part, the invention can detect the sound information of the target and track it, and, combined with the generated region of interest, can simultaneously track the target and generate the region of interest on the image.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is an overall schematic view of the present invention;
FIG. 2 is a schematic diagram of multi-sensor information fusion in accordance with the present invention;
FIG. 3 is a schematic diagram of data level fusion according to the present invention;
FIG. 4 is a schematic view of a portion of a camera image of the present invention;
FIG. 5 is a schematic diagram of a portion of radar information according to the present invention;
FIG. 6 is a schematic diagram of an interpolation-extrapolation temporal registration of the present invention;
FIG. 7 is a schematic view of the tracking portion of the acoustic vector sensor of the present invention;
FIG. 8 is a schematic diagram of the fusion of radar and visual information according to the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit it in any way. It should be noted that various changes and modifications, obvious to those skilled in the art, can be made without departing from the spirit of the invention; all of these fall within the scope of the present invention.
Example (b):
Referring to fig. 1, the multi-source multi-mode dynamic information fusion and cognition method for an epidemic prevention robot in a complex environment provided by the invention can be divided into a camera image part, a radar information part, an acoustic vector sensor tracking part, and a radar information and visual information fusion part.

The camera image part comprises a plurality of cameras, camera image acquisition, preprocessing of the acquired images, and data-level fusion of the image information; the cameras form a parallel structure, acquire images independently, and the image data are then fused at the data level.

The radar information part comprises a plurality of radars, target information acquisition by the radars, data-level fusion of the acquired target information, and target detection; as in the camera image part, the radars form a parallel structure, collect target information independently, and the data are then fused at the data level.

The acoustic vector sensor tracking part comprises a plurality of acoustic vector sensors, target sound signal acquisition by the sensors, data-level fusion of the sound signal information, and target tracking; as in the camera image part, the acoustic vector sensors form a parallel structure, collect the target sound signals independently, and the data are then fused at the data level.

The radar information and visual information fusion part connects the radar information part and the camera image part: the target detection result obtained by the radar information part and the image information fusion result obtained by the camera image part are combined, and the radar and visual information are fused to generate the target region of interest. The four parts together form the complex-environment multi-source multi-mode dynamic information fusion and cognition method of the epidemic prevention robot, as shown in the abstract figure.
Referring to fig. 1, 2, 3 and 4, a multisource multimode dynamic information fusion and cognition method for complex environment of epidemic prevention robot is characterized in that: by constructing an image part, a parallel structure is formed by a plurality of cameras, image information is acquired by each camera, and then preprocessing and image information data-level fusion are performed to obtain more accurate image information and expand the image information space and time coverage.
In the parallel structure, the original homogeneous data obtained by the sensor is directly fused, so that the data loss is low, more comprehensive information can be provided for follow-up, and the parallel structure has the characteristic of high precision.
For data-level fusion, more accurate and wider information can be obtained by fusing the data of several sensors of the same type. Because the cameras form a parallel structure and acquire images independently, and their sampling periods are not necessarily the same, a time-difference problem exists when the data acquired by the cameras are fused. Image data fusion is therefore performed in two steps: first time registration, then fusion. For time registration in image data-level fusion, the current image information is extracted from the image sequences acquired by the cameras at the same time T. For the fusion algorithm, a pixel-level fusion algorithm based on a weighted average method is adopted; this algorithm can effectively suppress the noise of the input images. The principle of the weighted average method is as follows:
F(i,j)=ωA×A(i,j)+ωB×B(i,j)
where A and B are the two images, and F is the fused image.
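The weighted average method above can be sketched in a few lines. This is a minimal illustrative implementation, assuming the two weights sum to one and the images are co-registered and the same size; it is not code from the patent.

```python
import numpy as np

def weighted_average_fusion(img_a, img_b, w_a=0.5):
    """Pixel-level fusion F(i, j) = w_a * A(i, j) + (1 - w_a) * B(i, j).
    Averaging co-registered images attenuates uncorrelated sensor noise."""
    a = np.asarray(img_a, dtype=np.float64)
    b = np.asarray(img_b, dtype=np.float64)
    return w_a * a + (1.0 - w_a) * b

# Two toy 2x2 grayscale "images" (values are illustrative)
A = np.array([[100, 200], [50, 80]])
B = np.array([[120, 180], [70, 60]])
F = weighted_average_fusion(A, B, w_a=0.5)
```

With equal weights each output pixel is the mean of the two inputs, e.g. the top-left pixel becomes (100 + 120) / 2.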
Referring to fig. 1, 2, 3 and 5, a multisource multimode dynamic information fusion and cognition method for complex environment of epidemic prevention robot is characterized in that: by constructing a radar information part, forming a parallel structure by a plurality of radars, collecting target information in parallel, and then carrying out data-level information fusion on the information, more accurate target detection information can be obtained and the range of target detection can be enlarged.
For radar information time alignment, an interpolation-extrapolation method is adopted to compute the data at the measurement times of the high-precision radar onto the observation times of the low-precision radar, achieving time synchronization of the two radars. Two radars R_h and R_l are provided, wherein the high-precision radar R_h has a sampling period of 2 s and the low-precision radar R_l has a sampling period of 3 s. The principle of interpolation-extrapolation is as follows: the data of each radar are sorted in increasing time order; the data at the observation times of the high-precision radar are interpolated/extrapolated to the time points of the low-precision radar; the minimum time interval after interpolation-extrapolation is taken as the basic time unit, and the data at the other time points are interpolated/extrapolated according to this time unit, giving a series of data at equal intervals. This principle is illustrated in fig. 6.
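A minimal sketch of the interpolation-extrapolation registration described above, under the assumption of piecewise-linear interpolation (extrapolating from the nearest segment when a query time falls outside the sampled span); the patent does not fix the interpolation order, so this choice is illustrative.

```python
def interp_extrap(times_src, values_src, t_query):
    """Linearly interpolate within the source samples, or extrapolate from the
    nearest segment when t_query lies outside their span."""
    for i in range(len(times_src) - 1):
        if times_src[i] <= t_query <= times_src[i + 1]:
            lo = i
            break
    else:  # outside the sampled span: extrapolate from the closest segment
        lo = 0 if t_query < times_src[0] else len(times_src) - 2
    t0, t1 = times_src[lo], times_src[lo + 1]
    v0, v1 = values_src[lo], values_src[lo + 1]
    return v0 + (v1 - v0) * (t_query - t0) / (t1 - t0)

# High-precision radar R_h sampled every 2 s, low-precision R_l every 3 s;
# register the R_h track onto the R_l time points (toy linear range values).
t_h = [0, 2, 4, 6, 8, 10, 12]
z_h = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
t_l = [0, 3, 6, 9, 12]
z_h_on_l = [interp_extrap(t_h, z_h, t) for t in t_l]
```

After this step both radars report values on the same 3 s grid, so the sample-by-sample fusion that follows is well defined.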
After time registration, data-level fusion can be performed, fusing the multiple groups of target information data collected by the radars into one group of data; a least squares method is adopted for the radar information fusion. The specific fusion method is as follows: let Z_k = (z_1, z_2, …, z_n)^T be the n data collected by the radar array from t_{k-1} to t_k, and let z and ż denote the fused measured value and its derivative; a radar measured value z_i can be expressed in the following form:

z_i = z + (i − n)T·ż + v_i,  i = 1, 2, …, n
where v_i represents the measurement noise and T is the sampling interval; the above equations are written in matrix form:

Z_n = W_n·U + V_n

where U = (z, ż)^T, W_n is the matrix whose i-th row is (1, (i − n)T), and V_n = (v_1, v_2, …, v_n)^T has zero mean and covariance matrix:

R_n = σ_r²·I_n
where σ_r² is the measurement noise variance before fusion.
According to the least squares criterion:

J = (Z_n − W_n·U)^T·(Z_n − W_n·U)

to minimize J, the derivative of J with respect to U is set to zero:

∂J/∂U = −2·W_n^T·(Z_n − W_n·U) = 0

then there is:

Û = (W_n^T·W_n)^{-1}·W_n^T·Z_n

The variance matrix estimate is:

P = σ_r²·(W_n^T·W_n)^{-1}

Fusing the n measured values, the measured value at time k and the noise variance are obtained as:

z(k) = Û_1,  σ²(k) = σ_r²·[(W_n^T·W_n)^{-1}]_{11}
After data fusion, more accurate fused target detection information is obtained.
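The least-squares fusion above can be sketched as follows. The measurement model z_i = z(k) + (i − n)·T·ż(k) + v_i, which refers the fused value to the last sample time t_k, is an assumption consistent with the surrounding derivation rather than a formula the text states explicitly.

```python
import numpy as np

def ls_fuse(z, T, sigma_r):
    """Fuse n equally spaced radar samples z_1..z_n by least squares.

    Assumed measurement model: z_i = z(k) + (i - n) * T * zdot(k) + v_i,
    which refers the fused value z(k) to the last sample time t_k.
    Returns the fused value, its rate, and the fused noise variance."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    W = np.column_stack([np.ones(n), (np.arange(1, n + 1) - n) * T])
    P = np.linalg.inv(W.T @ W)        # (W^T W)^{-1}
    U_hat = P @ W.T @ z               # least-squares estimate (z(k), zdot(k))
    return U_hat[0], U_hat[1], sigma_r**2 * P[0, 0]

# A noise-free linear track is recovered exactly by the linear fit
zk, zdot, var = ls_fuse([1.0, 2.0, 3.0, 4.0], T=1.0, sigma_r=0.5)
```

The returned variance σ_r²·[(WᵀW)⁻¹]₁₁ is smaller than σ_r² once n > 2, which is the quantitative sense in which the fused measurement is "more accurate" than any single sample.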
Referring to fig. 1, 2, 3 and 7, a multisource multimode dynamic information fusion and cognition method for complex environment of epidemic prevention robot is characterized in that: by constructing an acoustic vector sensor tracking part, forming a parallel structure by a plurality of acoustic vector sensors, acquiring target sound signals in parallel, and then performing data-level fusion on the acquired information, more accurate target sound information can be obtained.
As in the radar information part and the camera image part, the plurality of acoustic vector sensors form a parallel structure; after each acoustic vector sensor collects the target sound signals, the data collected by the sensors must be time-registered before data-level fusion is performed, and then fused. In the invention, the time registration and data fusion of the acoustic vector sensor tracking part are the same as in the radar information part: time registration uses the interpolation-extrapolation method, and data fusion uses the least squares method.
Referring to fig. 1, 2, 3 and 8, the multi-source multi-mode dynamic information fusion and cognition method for the complex environment of the epidemic prevention robot is characterized in that: a radar information and visual information fusion part is constructed, in which the target detection information output by the radar information part and the fused image information output by the camera image part are combined for information fusion, thereby generating the region of interest.
The image data-level fusion information and the target detection information come from different sensors; there is no direct interaction between the information of the two sensors, and the sampling frequencies of the different sensors differ, so the target position information acquired by each sensor must be aligned in time and integrated when the information is fused. For the time alignment strategy, the interpolation-extrapolation method is again used. Fusing radar information into image information actually means converting the radar information into the image coordinate system, i.e., transforming the radar information and the image information into a uniform spatial dimension. By mapping the radar coordinate system into the image coordinate system, the targets detected by the radar are fused into the image information, thereby generating the target region of interest.
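The radar-to-image coordinate mapping can be sketched as a rigid transform followed by a pinhole projection. The extrinsic parameters `R`, `t` and the intrinsic matrix `K` below are illustrative assumptions; the patent does not specify a camera model or calibration values.

```python
import numpy as np

def radar_to_image(pt_radar, R, t, K):
    """Project a 3-D radar point into pixel coordinates.
    R, t: rotation/translation from the radar frame to the camera frame;
    K: 3x3 camera intrinsic matrix."""
    p_cam = R @ np.asarray(pt_radar, float) + t   # radar frame -> camera frame
    uvw = K @ p_cam                               # pinhole projection
    return uvw[:2] / uvw[2]                       # (u, v) pixel position

def roi_around(uv, half=32):
    """Axis-aligned region of interest centred on the projected target."""
    u, v = uv
    return (u - half, v - half, u + half, v + half)
```

A target detected by radar is first projected with `radar_to_image`, and a box around the resulting pixel then serves as the target region of interest in the fused image.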
The multi-source multi-mode dynamic information fusion and cognition system provided by the invention comprises the following components:
a camera image fusion module: using a plurality of cameras to acquire camera images, and preprocessing and fusing the acquired images to obtain an image information fusion result;
the radar information fusion module: collecting target information by using a plurality of radars, and fusing and detecting the collected target information to obtain a target detection result;
acoustic vector sensor tracking module: acquiring target sound signals by using a plurality of sound vector sensors, and fusing and tracking target sound signal information to obtain a target tracking result;
radar information and visual information fusion module: and fusing the image information fusion result, the target detection result and the target tracking result to generate a target region of interest.
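For the camera image fusion module, the pixel-level weighted-average rule $F(i,j)=\omega_A A(i,j)+\omega_B B(i,j)$ used by the invention can be sketched as follows. The assumption $\omega_B = 1-\omega_A$ and the requirement that the two images be co-registered and the same size are illustrative choices, not stated in the patent.

```python
import numpy as np

def weighted_average_fusion(A, B, w_a=0.5):
    """Pixel-level fusion F(i,j) = w_a*A(i,j) + (1 - w_a)*B(i,j),
    assuming co-registered images of identical shape."""
    A = np.asarray(A, float)
    B = np.asarray(B, float)
    return w_a * A + (1.0 - w_a) * B
```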
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and their various modules provided by the present invention purely as computer-readable program code, the same functionality can be achieved entirely by logically programming the method steps so that the systems, apparatus, and their modules take the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Therefore, the system, apparatus, and modules provided by the present invention may be regarded as hardware components, and the modules they contain for implementing various programs may also be regarded as structures within the hardware components; modules for performing various functions may likewise be regarded both as software programs implementing the methods and as structures within the hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.
Claims (10)
1. A multi-source multi-mode dynamic information fusion and cognition method is characterized by comprising the following steps:
a camera image fusion step: using a plurality of cameras to acquire camera images, and preprocessing and fusing the acquired images to obtain an image information fusion result;
radar information fusion: collecting target information by using a plurality of radars, and fusing and detecting the collected target information to obtain a target detection result;
acoustic vector sensor tracking: acquiring target sound signals by using a plurality of sound vector sensors, and fusing and tracking target sound signal information to obtain a target tracking result;
radar information and visual information fusion step: and fusing the image information fusion result, the target detection result and the target tracking result to generate a target region of interest.
2. The multi-source multi-mode dynamic information fusion and cognition method according to claim 1, wherein the plurality of cameras form a parallel structure, image acquisition is respectively performed, the acquired images are preprocessed to obtain data-level image information, the data-level image information is fused to obtain image information with precision meeting preset requirements, and the time coverage range is expanded.
3. The multi-source multi-mode dynamic information fusion and cognition method according to claim 1, wherein the plurality of radars form a parallel structure, respectively collect information of targets, preprocess the collected target information to obtain data-level target information and perform fusion, obtain target detection information with accuracy meeting preset requirements, and expand target detection range.
4. The multi-source multi-mode dynamic information fusion and cognition method according to claim 1, wherein the plurality of acoustic vector sensors form a parallel structure, respectively collect target sound signals, preprocess the collected target sound signals to obtain data-level target sound signals, perform fusion, and obtain target sound information with precision meeting preset requirements.
5. The multi-source multimodal dynamic information fusion and cognition method according to claim 1, wherein the fusing comprises:
adopting a pixel-level fusion algorithm based on a weighted average method, wherein the formula is as follows:
F(i,j)=ωA×A(i,j)+ωB×B(i,j)
wherein A and B are the two images and F is the fused image; i denotes the i-th row of the image; j denotes the j-th column of the image; ω_A denotes the weight applied to image A and ω_B the weight applied to image B.
6. The multi-source multi-mode dynamic information fusion and cognition method according to claim 5, wherein two radars R_h and R_l are provided, the high-precision radar R_h having a sampling period of 2 s and the low-precision radar R_l having a sampling period of 3 s, and the data are time-sequenced using the interpolation-extrapolation method;
each radar's data are first sorted in increasing time order; the data at the high-precision radar's observation times are then interpolated/extrapolated onto the low-precision radar's time points; finally, taking the minimum time interval after interpolation-extrapolation as the basic time unit, the data at the other time points are interpolated/extrapolated according to this basic time unit, yielding equally spaced data.
7. The multi-source multi-mode dynamic information fusion and cognition method according to claim 6, characterized in that data level fusion is performed on time-sequenced data, a plurality of groups of target information data collected by radar are fused into a group of data, and a least square method is adopted for fusion during radar information fusion, and the fusion method is as follows:
let Z_k = (z_1, z_2, …, z_n)^T be the n data collected by the radar array between t_{k-1} and t_k, and let z and ż denote the fused measurement and its derivative; with sampling interval τ, each radar measurement z_i is expressed as z_i = z + (i − n)τ·ż + v_i;
v_i denotes the measurement noise; the above equation is written in matrix form:
Z_n = W_n U + V_n; wherein U = (z, ż)^T, the i-th row of W_n is (1, (i − n)τ), and V_n = (v_1, v_2, …, v_n)^T has mean 0 and covariance matrix σ_r² I_n;
according to the least-squares criterion J = (Z_n − W_n U)^T (Z_n − W_n U), to minimize J, differentiate with respect to U and set the result to zero: ∂J/∂U = −2 W_n^T (Z_n − W_n U) = 0;
then there is: Û = (W_n^T W_n)^{-1} W_n^T Z_n;
fusing the n measurements gives the measurement at time k and its noise variance;
Z(k) denotes the fused measurement value at time k.
8. A multi-source multi-mode dynamic information fusion and cognition system is characterized by comprising:
a camera image fusion module: using a plurality of cameras to acquire camera images, and preprocessing and fusing the acquired images to obtain an image information fusion result;
the radar information fusion module: collecting target information by using a plurality of radars, and fusing and detecting the collected target information to obtain a target detection result;
acoustic vector sensor tracking module: acquiring target sound signals by using a plurality of sound vector sensors, and fusing and tracking target sound signal information to obtain a target tracking result;
radar information and visual information fusion module: and fusing the image information fusion result, the target detection result and the target tracking result to generate a target region of interest.
9. The multi-source multi-mode dynamic information fusion and cognition system according to claim 8, wherein the plurality of cameras form a parallel structure, image acquisition is performed respectively, the acquired images are preprocessed to obtain data-level image information, the data-level image information is fused to obtain image information with precision meeting preset requirements, and the time coverage is expanded;
the multiple radars form a parallel structure, respectively acquire information of a target, preprocess the acquired target information to obtain data-level target information, fuse the data-level target information to obtain target detection information with the precision meeting the preset requirement and expand the target detection range;
the plurality of acoustic vector sensors form a parallel structure, respectively collect target sound signals, preprocess the collected target sound signals to obtain data-level target sound signals, fuse the data-level target sound signals, and obtain target sound information with the precision meeting the preset requirement.
10. The multi-source multimodal dynamic information fusion and cognition system according to claim 9 wherein the fusing comprises:
adopting a pixel-level fusion algorithm based on a weighted average method, wherein the formula is as follows:
F(i,j)=ωA×A(i,j)+ωB×B(i,j)
wherein A and B are the two images and F is the fused image; i denotes the i-th row of the image; j denotes the j-th column of the image; ω_A denotes the weight applied to image A and ω_B the weight applied to image B;
two radars R_h and R_l are provided, the high-precision radar R_h having a sampling period of 2 s and the low-precision radar R_l having a sampling period of 3 s, and the data are time-sequenced using the interpolation-extrapolation method;
the method comprises the steps of firstly, carrying out increment sequencing on each data of the radar, then carrying out interpolation extrapolation on the data at the observation time of the high-precision radar to the data at the time point of the low-precision radar, finally taking the minimum time interval after interpolation extrapolation as a basic time unit, and carrying out interpolation extrapolation according to the basic time unit to obtain the data at other time points, thereby obtaining the data at equal intervals;
performing data level fusion on the data after time sequencing, fusing a plurality of groups of target information data acquired by the radar into a group of data, and fusing by adopting a least square method during radar information fusion, wherein the fusion method comprises the following steps:
let Z_k = (z_1, z_2, …, z_n)^T be the n data collected by the radar array between t_{k-1} and t_k, and let z and ż denote the fused measurement and its derivative; with sampling interval τ, each radar measurement z_i is expressed as z_i = z + (i − n)τ·ż + v_i;
v_i denotes the measurement noise; the above equation is written in matrix form:
Z_n = W_n U + V_n; wherein U = (z, ż)^T, the i-th row of W_n is (1, (i − n)τ), and V_n = (v_1, v_2, …, v_n)^T has mean 0 and covariance matrix σ_r² I_n;
according to the least-squares criterion J = (Z_n − W_n U)^T (Z_n − W_n U), to minimize J, differentiate with respect to U and set the result to zero: ∂J/∂U = −2 W_n^T (Z_n − W_n U) = 0;
then there is: Û = (W_n^T W_n)^{-1} W_n^T Z_n;
fusing the n measurements gives the measurement at time k and its noise variance;
Z(k) denotes the fused measurement value at time k.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011120817.4A CN112241763A (en) | 2020-10-19 | 2020-10-19 | Multi-source multi-mode dynamic information fusion and cognition method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112241763A true CN112241763A (en) | 2021-01-19 |
Family
ID=74169027
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011120817.4A Pending CN112241763A (en) | 2020-10-19 | 2020-10-19 | Multi-source multi-mode dynamic information fusion and cognition method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112241763A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1389710A (en) * | 2002-07-18 | 2003-01-08 | 上海交通大学 | Multiple-sensor and multiple-object information fusing method |
CN105491122A (en) * | 2015-12-02 | 2016-04-13 | 中国民用航空总局第二研究所 | System for fusing and exchanging data among multiple control centers |
CN110428008A (en) * | 2019-08-02 | 2019-11-08 | 深圳市唯特视科技有限公司 | A kind of target detection and identification device and method based on more merge sensors |
KR20200095367A (en) * | 2019-01-31 | 2020-08-10 | 주식회사 스트라드비젼 | Learning method and learning device for sensor fusion to integrate information acquired by radar capable of distance estimation and information acquired by camera to thereby improve neural network for supporting autonomous driving, and testing method and testing device using the same |
CN111516605A (en) * | 2020-04-28 | 2020-08-11 | 上汽大众汽车有限公司 | Multi-sensor monitoring equipment and monitoring method |
Non-Patent Citations (3)
Title |
---|
Ding Jianjiang et al.: "Radar Network Technology" (《雷达组网技术》), National Defense Industry Press, 31 December 2017 *
Zhang Xinyou: "Medical Graphics and Image Processing, New Century 3rd Edition" (《医学图形图像处理 新世纪第3版》, national TCM industry higher-education "13th Five-Year Plan" textbook), China Press of Traditional Chinese Medicine, 30 June 2018 *
Ma Lizhuang et al.: "Digital Animation Creation and Post-Production Video Processing Technology" (《数字动画创作与后期视频处理技术》), Shanghai Jiao Tong University Press, 31 August 2014 *
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210119 |