WO2019137912A1 - Computer vision pre-fusion and spatio-temporal tracking


Info

Publication number
WO2019137912A1
WO2019137912A1 · PCT/EP2019/050340
Authority
WO
WIPO (PCT)
Prior art keywords
image
img
information
pgm
graphical model
Application number
PCT/EP2019/050340
Other languages
French (fr)
Inventor
Senthil YOGAMANI
Original Assignee
Connaught Electronics Ltd.
Application filed by Connaught Electronics Ltd. filed Critical Connaught Electronics Ltd.
Publication of WO2019137912A1 publication Critical patent/WO2019137912A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20072 Graph-based image processing
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30248 Vehicle exterior or interior
    • G06T2207/30252 Vehicle exterior; Vicinity of vehicle

Definitions

  • This invention relates to an image processing method for tracking an object.
  • Kalman filtering is also known as linear quadratic estimation. This is an algorithm that uses a series of measurements observed over a period of time, containing statistical noise and other inaccuracies and produces estimates of unknown variables that tend to be more accurate than those based on a single measurement alone.
  • Kalman filtering is applied, for example, in the guidance, navigation and control of vehicles.
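As a minimal sketch of the linear quadratic estimation described above, the following one-dimensional Kalman filter combines a series of noisy measurements into estimates; the parameter values (`process_var`, `meas_var`) are illustrative assumptions, not values from the patent.

```python
# One-dimensional Kalman filter sketch: noisy measurements of a quantity
# are blended into estimates that tend to be more accurate than any
# single measurement alone. All parameter values are illustrative.
def kalman_1d(measurements, process_var=1e-4, meas_var=0.5):
    x = measurements[0]   # initial state estimate: the first measurement
    p = meas_var          # initial estimate variance
    estimates = []
    for z in measurements:
        p += process_var             # predict: uncertainty grows over time
        k = p / (p + meas_var)       # Kalman gain: trust in the new measurement
        x += k * (z - x)             # update: blend prediction and measurement
        p *= (1.0 - k)               # updated uncertainty shrinks
        estimates.append(x)
    return estimates
```

Fed with noisy readings of a constant value, the estimates settle close to that value while any single reading may deviate further.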
  • the document US 2015/0158182 A1 describes a robot system that includes a mobile robot having a controller executing a control system for controlling operation of the robot.
  • This robot system further includes a cloud computing service in communication with a controller of the robot and a remote computing device in communication with the cloud computing service.
  • the remote computing device communicates with the robot through the cloud computing service.
  • the document US 2016/0266581 A1 describes how a model of a vehicle’s view of its environment can be generated. Therefore, the vehicle comprises some sensors that should be completely un-occluded to watch the environment of the vehicle. For example, for each of a plurality of sensors of the object detection component, a computer may generate an individual 3D model of that sensor's field of view. Furthermore, weather information is received and used to adjust one or more of the models. After this adjustment, the models may be aggregated into another comprehensive 3D model. The comprehensive model may be combined with detailed map information indicating the probability of detecting objects at different locations. The model of the vehicle’s environment may be computed based on the combined comprehensive 3D model and detailed map information.
  • the document WO 2012/139636 A1 relates to a method for online calibration of a vehicle video system evaluated from image frames of a camera containing features on the road.
  • a portion of a road surface is captured by the camera in an image frame.
  • a selection of at least two different features within the image frame is taken, the selected features being chosen as respective reference points.
  • a sequence of at least one more image frame is captured with the camera while locating within the new frame the features chosen as reference points to be tracked. The trajectory covered by the reference points during the driving interval between the image sequence is analysed in a virtual image space by identifying a geometrical shape obtained by joining the located reference points in each used image and considering the respective trajectories.
  • a deviation of the resulting geometrical object from a parallelogram with corners defined by the reference points from at least two subsequent images is calculated while any measured deviation being used to define an offset correction of the camera.
  • the task of this invention is to provide a method which is more accurate and robust concerning object tracking in an image.
  • This task is being solved by the independent claims.
  • Advantageous embodiments of this invention result in the dependent claims.
  • This invention also provides a computer program product to conduct the method. Furthermore, also a driver assistance system with this computer program product is provided.
  • This invention describes a new method for image processing of several images. This method is characterized by the following steps. At first, in step a), a plurality of predetermined image features for the image processing of the several images is defined.
  • This plurality of predetermined image features comprises preferably several different image features. This means the type of the image features and their quantity are defined before image feature information is determined.
  • the image feature information is determined for each of the several images on the basis of the plurality of predetermined image features. For this purpose, preferably an appropriate image feature detector and/or image feature descriptor can be applied to the several images.
  • a feature detector can be a method which outputs locations (i.e. pixel coordinates) of significant areas in the image by applying a filter to the images.
  • a filter is preferably a matrix that is designed such that a desired image feature is detected within an image.
  • An example of this is a corner detector, which outputs the locations of corners in the image but does not provide any other information about the image features detected.
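A corner detector of this kind can be sketched as follows; this toy Harris-style response is an assumption for illustration, since the patent does not prescribe a specific detector. Large positive response values mark corners, while edges and flat regions score low.

```python
import numpy as np

# Toy Harris-style corner response sketch (illustrative, not the patent's
# detector): gradient products are summed over a 3x3 neighbourhood to form
# a structure tensor, whose determinant/trace combination peaks at corners.
def corner_response(img, k=0.05):
    gy, gx = np.gradient(img.astype(float))
    ixx, iyy, ixy = gx * gx, gy * gy, gx * gy

    def box_sum(a):  # 3x3 neighbourhood sum of the gradient products
        s = np.zeros_like(a)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                s += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
        return s

    sxx, syy, sxy = box_sum(ixx), box_sum(iyy), box_sum(ixy)
    det = sxx * syy - sxy * sxy        # determinant of the structure tensor
    trace = sxx + syy
    return det - k * trace * trace     # Harris corner response
```

On an image of a bright square, the response is strongly positive at the square's corners, non-positive along its edges and zero in flat regions — matching the behaviour described above: locations are found, but no further feature description is produced.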
  • a feature descriptor can be a method which takes an image and outputs feature descriptors/feature vectors.
  • Feature descriptors can encode interesting information into a series of numbers and can act as a sort of numerical "fingerprint" that can be used to differentiate one feature from another. Preferably this information can be invariant under image transformation. The image feature can be found again even if the image is transformed in some way.
  • An example is the SIFT method, which encodes information about the local neighbourhood image gradients into the numbers of the feature vector.
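The idea of such a numerical "fingerprint" can be sketched with a strongly simplified descriptor in the spirit of SIFT (the real SIFT descriptor is considerably more elaborate): a histogram of gradient orientations in a patch, weighted by gradient magnitude and normalised so it is invariant under uniform brightness scaling.

```python
import numpy as np

# Simplified SIFT-like descriptor sketch (an illustrative assumption, not
# the full SIFT algorithm): an 8-bin histogram of gradient orientations,
# magnitude-weighted and normalised into a numerical "fingerprint".
def patch_descriptor(patch, bins=8):
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % (2.0 * np.pi)
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, 2.0 * np.pi),
                           weights=magnitude)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```

Because gradients scale linearly with brightness and the histogram is normalised, the descriptor of a patch and of a uniformly brightened copy of that patch are identical — a small example of the invariance under image transformation mentioned above.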
  • In step b), an image feature detector with different filters can be applied to the images. If the image feature detector is equipped with several and/or different filters, different image features and their information can be extracted from the image. In the best case, all image feature information of step b) is used for the next step c), but at least the major part of the image feature information determined in step b) is used. In step c) the image feature information determined of each of the several images in step b) is fused into a new image. This fusing process can also be considered as condensing the image feature information into the new image. Therefore, this new image can also be called a "super image".
  • This new image, the super image, contains much more information than each of the several images.
  • An object of the new image is spatio- temporally tracked in step d), wherein a probabilistic graphical model is employed. This means that the image feature information of the new image are input into the probabilistic graphical model. Since the new image usually contains more information than each of the several images, the data basis for the probabilistic graphical model may grow larger. This can help to perform more accurate and robust object tracking in the new image.
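The fusing of step c) can be sketched as follows; the cue names (depth, motion, intensity) are illustrative assumptions. Feature maps determined from the several images of the same scene are condensed channel-wise into one multi-channel "super image" that carries all of the collected image feature information.

```python
import numpy as np

# Sketch of the fusing step c): per-cue feature maps of the same scene
# are condensed channel-wise into one multi-channel "super image".
def fuse_to_super_image(feature_maps):
    shapes = {fm.shape for fm in feature_maps}
    if len(shapes) != 1:
        raise ValueError("all feature maps must share the same resolution")
    return np.stack(feature_maps, axis=-1)   # H x W x C "super image"
```

The resulting array contains every input cue as a separate channel, so no image feature information of the individual maps is lost.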
  • spatio-temporal object tracking does not only comprise localising the object. It also includes capturing the development of the location of a detected object over a period of time. This means spatio-temporal tracking can also mean watching or observing a detected object.
  • a larger database that is condensed in the new image can be provided as an input for the probabilistic graphical model. This can enable a better capturing of the global structure of the scene. This can also help to improve the robustness of the object tracking.
  • Another embodiment contemplates a method, wherein one of the several images is a false colour representation of one of the other images concerning a depth, a motion and/or an intensity.
  • False colour representations are often used if further information shall be expressed within a two-dimensional image. For example, if a map of a country is shown, different colours may indicate different population densities. Similarly, a false colour representation can be used to express different temperatures. Such diagrams sometimes are referred to as heat maps. Concerning object tracking it is suitable to use false colour representations that give additional information concerning the depth, the motion or the intensity. Such representations can be generated before step b) and/or whilst step b).
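A minimal sketch of such a false colour ("heat map") representation, under the assumption of a simple blue-to-red colour ramp: a scalar cue such as depth is normalised and mapped to RGB, blue for low values and red for high ones.

```python
import numpy as np

# Minimal false colour sketch: a scalar cue (e.g. a depth map) is
# normalised to [0, 1] and mapped to RGB, blue = low, red = high.
def false_colour(scalar_map):
    lo, hi = scalar_map.min(), scalar_map.max()
    if hi > lo:
        t = (scalar_map - lo) / (hi - lo)
    else:
        t = np.zeros_like(scalar_map)
    rgb = np.zeros(scalar_map.shape + (3,))
    rgb[..., 0] = t          # red channel grows with the value
    rgb[..., 2] = 1.0 - t    # blue channel shrinks with the value
    return rgb
```

Applied to a depth map, the nearest point renders blue and the farthest point red, so the extra depth information becomes visible in a two-dimensional image.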
  • step b) can comprise additional image processing methods to create other images. Image feature information of those other images can be determined and used for the fusing in step c).
  • Fig. 1 shows exemplarily a selection of several additional image processing methods that can be used for modifying and/or complementing step b).
  • Another embodiment of this invention describes a method, wherein by means of the several images a property of the tracked object is determined by a reconstruction analysis.
  • the reconstruction analysis can contain many different image processing methods. A dense flow, an optical flow, an epipolar rectification, a motion stereo, a dynamic clustering, a static clustering and so on can be used for the reconstruction analysis.
  • the objective of the reconstruction analysis is to process the pieces of image feature information in a way that they can be fused later on.
  • Each sub-method may provide a special kind of information.
  • the different sub-methods of image processing may focus on different aspects. These different pieces of information can be used for the fusing step c). For example, it may be helpful to distinguish between dynamic and static objects.
  • Another possibility is to use sub-methods that search for pedestrians or traffic signs in an image. Also providing information about the ground topology in a reconstruction analysis may be useful for the further object tracking. Since it is intended to leverage all image feature information of each of the several images, much more semantic information can be provided.
  • a convolutional neural network usually comprises several different matrices. Each matrix can be considered as a filter. These filters are usually designed to detect image features and/or their information. Usually a filter matrix is applied to an input image by moving the filter matrix over the input image. The activity of a neuron is often determined by a discrete convolution. Of course, other types of artificial networks can be applied in order to determine the type of the tracked object. For example, if an object is captured in an image, the recognition analysis determines what kind of object has been found.
  • This information is very useful to classify a tracked object in the image. For example if a tree is detected in an image, it can be concluded that this object, the tree, is a stationary, non- moveable object.
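The way a filter matrix is moved over an input image can be sketched as follows; the activity of each output "neuron" is a discrete convolution of the filter with the corresponding image window (written in the cross-correlation form used by most deep learning libraries).

```python
import numpy as np

# Sketch of moving a filter matrix over an input image: each output value
# is the element-wise product of the filter with one image window, summed
# (discrete convolution in cross-correlation form, "valid" padding).
def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```

With a vertical-edge filter, the output responds exactly where the image intensity jumps from left to right — the kind of low-level image feature such filters are designed to detect.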
  • Another embodiment of this invention describes a method, wherein for the reconstruction analysis an optical flow and/or structure from motion is employed.
  • An optical flow or optic flow is the pattern of apparent motion of objects, surfaces and edges in a visual scene caused by a relative motion between an observer and a scene. Therefore, the method of optical flow may help to detect moving objects in an image.
  • Structure from motion can be used in order to extract three-dimensional information out of several images. Since a single image does not contain three-dimensional information at least two images are necessary to conduct the method structure from motion.
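The notion of apparent motion between two frames can be sketched with a brute-force stand-in for real optical flow methods (an illustrative assumption, not the patent's flow algorithm): the dominant shift between two frames is found by testing small integer displacements and keeping the one with the lowest intensity difference.

```python
import numpy as np

# Toy optical-flow sketch: estimate the dominant apparent motion between
# two frames by exhaustively testing small integer shifts (a brute-force
# stand-in for dense or sparse optical flow methods).
def dominant_shift(prev, curr, max_shift=3):
    best, best_err = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(prev, dy, axis=0), dx, axis=1)
            err = np.mean((shifted - curr) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best
```

Given a frame and a shifted copy of it, the method recovers the shift exactly, which is the minimal two-image case the structure-from-motion remark above alludes to.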
  • Another embodiment of the present invention describes a method, wherein for the recognition analysis a convolutional neural network is employed.
  • the convolutional neural network can be combined with different image processing methods.
  • a convolutional neural network can comprise several layers. This may enable the convolutional neural network to detect more complex image features like a face. Often the convolutional neural network is applied in order to gain further information concerning the classification of objects.
  • a special embodiment of this invention describes a method, wherein for an object recognition of the new image the results of the reconstruction analysis and the recognition analysis are used together for an image fusion of the new image.
  • the reconstruction analysis and the recognition analysis can be considered as two separate paths of the image processing. These two paths of the image processing can provide different results.
  • the reconstruction analysis may lead to useful information like the motion of objects or the ground topology.
  • the recognition analysis may further analyse detected objects in the image concerning the type of the object.
  • the reconstruction analysis provides generic object detection based on different cues like the motion, the depth, the intensity or further cues that may be useful for a generic object detection.
  • the recognition analysis preferably provides specific object detection based on appearance cues.
  • a further embodiment of this invention describes a method, wherein according to the object recognition the recognized object is classified into the categories ground, dynamic/critical and infrastructure. Since it is intended to use this invention in vehicles, it is suitable to define some predetermined categories. In the case of an autonomous driving vehicle, the ground, the infrastructure and dynamic objects should be recognized as quickly as possible. Therefore, it may be useful to implement these three categories as predefined categories in an image processing method.
  • the category ground may comprise features like a road, drivable area, free space, lanes, markings, road effects, footpaths, curbs or lawns. Dynamic objects often can be considered as critical objects.
  • This kind of category may comprise features like pedestrians, cyclists, motor bikes, cars, trucks, busses or animals. Objects like traffic signs, lights, buildings, walls, poles, trees, bridges or flyovers may be assigned to the category infrastructure. If these categories are predefined beforehand, the image processing itself may be accelerated.
  • a special embodiment of this invention describes a method, wherein a data node is created that comprises a semantic content, the object type, an object property, its position, velocity and/or pose as pieces of information.
  • the recognition and reconstruction analysis provide different pieces of information. These pieces of information are preferably different. This means these pieces of information should not only contain information, for example, about the ground topology; they should also contain other pieces of information, for example about static or dynamic objects. In this variant all this information is collected in the data node. This means the data node can contain information about the location of a detected object in three-dimensional space.
  • the context may comprise information that may help to identify or track the detected object more effectively.
  • a semantic information "on road" may help to detect the vehicle faster.
  • the information "on road" is a sign that this node is connected to the ground topology in some way. This means the image processing can consider this information. If something is located at or near a wall, like a window, this information can also be stored in the data node.
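Such a data node can be sketched as a simple record; the field names below are assumptions for illustration, not the patent's exact schema. Pieces of information from the reconstruction and recognition paths are collected in one record per detected object.

```python
from dataclasses import dataclass, field

# Illustrative data node sketch (field names are assumptions): pieces of
# information from the reconstruction and recognition paths are collected
# in one record per detected object.
@dataclass
class DataNode:
    position_3d: tuple           # location of the object in 3D space
    object_type: str             # e.g. "car", from the recognition analysis
    velocity: float = 0.0        # e.g. from the reconstruction analysis
    semantic: str = ""           # semantic content, e.g. "on road"
    context: dict = field(default_factory=dict)  # further contextual cues
```

A node for a detected vehicle might then carry its 3D position, its type, its velocity and the semantic hint "on road" in one place.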
  • Another embodiment of this invention describes a method, wherein for the fusing according to step c) and for the determining according to step b) image feature information is used exclusively. Many methods of image processing use further sensor information to detect or track objects.
  • This variant of the invention only uses image feature information.
  • the image feature information is leveraged as well as possible in this embodiment of the invention.
  • the fusion and analysis according to sub-methods of image processing is preferably conducted at the image level. This means the image processing method is not combined with for example an ultrasonic sensor, a dead reckoning sensor or other sensor.
  • the only sensor that is used in this embodiment is a camera that provides the several images. But the several images can also be given as an input.
  • a camera that has implemented the image processing methods according to claim 1 is able to perform an improved object tracking.
  • Another embodiment of this invention describes a method, wherein a hierarchical model is employed for the probabilistic graphical model.
  • the probabilistic graphical model can be adjusted differently.
  • One option is to use a hierarchical model for the probabilistic graphical model.
  • a hierarchical model offers the opportunity to average the amount of signal noise.
  • the output signals of sensors like a temperature value of a temperature sensor or an image of a camera, are usually affected by input signal noise. Their output signals show deviations concerning a measured parameter (e.g. image feature information) due to the input signal noise.
  • the hierarchical model takes the advantage of the average effect in order to reduce the amount of deviations concerning the output signals. This advantage of the average effect can be implemented into the probabilistic graphical model by applying a hierarchical model into the probabilistic graphical model.
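The averaging effect the hierarchical model exploits can be illustrated with a small sketch; the readings below are hypothetical sensor values. The mean of repeated noisy readings of the same measured parameter deviates less from the true value than individual readings do.

```python
# Sketch of the averaging effect exploited by the hierarchical model:
# combining repeated noisy readings of the same measured parameter
# reduces the deviation from the true value (readings are hypothetical).
def averaged_estimate(readings):
    mean = sum(readings) / len(readings)
    max_dev = max(abs(r - mean) for r in readings)
    return mean, max_dev
```

For readings scattered around a true value of 10.0, the averaged estimate lands essentially on 10.0 while single readings deviate noticeably more.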
  • a further embodiment of this invention describes a method, wherein a boundary condition concerning an object structure is predetermined for the probabilistic graphical model.
  • the type of the boundary condition depends on the proposed application of the image processing. For example, if small children or babies should be monitored, other boundary conditions are suitable than in the case of autonomous driving vehicles. In the case of monitoring babies, boundary conditions concerning prominent traffic signs are not necessary; in this case they are not helpful. But in the case of autonomous driving vehicles, boundary conditions that may help to classify prominent traffic signs can be very useful.
  • a spatial boundary condition may comprise that lanes lie on a ground plane.
  • a geometric boundary condition may comprise that lanes on the road are thick lines with a width that has at least a minimum value.
  • a boundary condition concerning a colour may comprise that lanes on the road are typically white.
  • a location boundary condition may comprise that a lane location may be based on high definition maps.
  • There are many possibilities to define boundary conditions. It is useful to define meaningful boundary conditions. This may help to accelerate the image processing method. Of course, the amount and the types of different boundary conditions are not fixed. They can be adapted according to the present application.
  • a special embodiment of this invention describes a method, wherein all image feature information of the several images is determined and all image feature information is fused to the new image according to a predetermined rule.
  • the main idea of this invention is to use as many pieces of image feature information as possible. For example, if an image is taken with a camera that provides a resolution of two million pixels, the image may contain a maximum of two million image features.
  • This variant of the invention uses all the two million pieces of image feature information. This means that no image feature information is neglected in this variant. In other words, the degree of utilization is 100 % in this embodiment of the invention. This can help to provide a new image that, thanks to the image fusion, comprises a high degree of information density.
  • the fusion of all the pieces of image feature information is conducted by a predetermined rule in this embodiment. This can help to improve the spatio-temporal tracking according to step d).
  • This invention also offers a computer program product with program code resources, which are stored in a computer-readable medium to conduct any preceding method if the computer program product is processed on a processor of an electronic control unit. This offers the opportunity to implement the image processing method into a camera. A chip in the camera may be able to conduct some or all of the steps if it is activated.
  • This invention further offers a driver assistance system with a computer program product. That means the image processing method can interact with the driver assistance system that further is able to manoeuvre a vehicle. This means in dependence on the result of the image processing method a signal can be generated for the driver assistance system.
  • This generated signal for the driver assistance system can be used for further processes like alarming a driver concerning a dangerous situation.
  • the driver assistance system uses the computer program product in order to conduct the method of image processing so that the driver assistance system can generate specific steering signals for an autonomous vehicle manoeuvring.
  • This invention also provides a motor vehicle with a driver assistance system. If the driver assistance system generates a steering signal a control unit of the driver assistance system may manoeuvre the motor vehicle according to the steering signal. For example, if the image processing method is tracking an object that may collide with the vehicle, the driver assistance system may stop the vehicle. It preferably depends on the result of the image processing method how the vehicle is manoeuvred.
  • Fig. 1 a block diagram of the proposed architecture to perform spatio-temporal tracking;
  • Fig. 2 an example of pedestrian detection by optical flow clustering (left) and deep learning based object detection (right);
  • Fig. 3 an example of curb detection by depth based clustering (right) and deep learning based object detection (left);
  • Fig. 4 an illustration of occlusion handling of pedestrians by a car;
  • Fig. 5 an illustration of different false colour representations from various cues;
  • Fig. 6 an illustration of a data node containing several different pieces of information;
  • Fig. 7 an illustration of the hierarchical modelling of the probabilistic graphical model;
  • Fig. 8 an illustration of different possibilities to combine data nodes
  • Fig. 1 shows exemplarily the desired architecture to perform spatio-temporal tracking.
  • a set of several images IMG is provided. These images IMG can be input into the reconstruction 110 path and the recognition 120 path.
  • the reconstruction 110 path and the recognition 120 path can be considered as two complementary cues. Within these two paths several different kinds of analysis can be performed.
  • the reconstruction 110 path contains different analyses concerning a dense flow DF, an optical flow OF, an epipolar rectification ER, a motion stereo MS, a dynamic clustering DC, a dynamic object localization DOL, a three-dimensional reconstruction SFM, a static clustering SC, a ground topology GT, a static object SO and a dynamic object DO.
  • the recognition 120 path contains a convolutional neural network CNN that is able to perform an object classification OC. It is possible that the convolutional neural network CNN itself comprises several different image processing methods. Fig. 1 shows that it is intended to collect as many pieces of image feature information as possible to provide much more semantic information. All the results of the different analyses are collected and fused into a new image.
  • the image fusion IF can be considered as a collection of all pieces of information which have been gathered or determined in the reconstruction 110 path and recognition 120 path. This image fusion IF creates the new image with a higher information density.
  • the fused new image due to the image fusion IF contains much more information than the several images IMG which have been input into the reconstruction 110 and recognition 120 path.
  • This new fused image in which all the information is condensed that has been collected in the two paths is input into a probabilistic graphical model PGM.
  • the image fusion IF is carried out by the use of the two paths: it is done as a combination of the reconstruction 110 and recognition 120 path.
  • the reconstruction 110 and recognition 120 can help to provide epistemic confidence concerning the object type. It may also be useful to provide spatial confidence concerning where a detected object is located in the new image.
  • a spatio-temporal object tracking is conducted by using the probabilistic graphical model PGM of the image fusion IF. It should be mentioned that the described methods and analyses only utilize image feature information of the several images IMG. No pieces of information of other sensor systems like ultrasonic, laser scanning, etc. are used in the examples shown in the figures.
  • the probabilistic graphical model PGM can capture the global structure of the scene. This can help to improve the object tracking. The object tracking can become more robust or effective.
  • Probabilistic graphical models PGM are a high level abstraction of joint probabilities which fully capture dependencies. They make it easy to think at a higher level of abstraction and to model visually.
  • the probabilistic graphical model PGM is an elegant representation which is easier to work with and avoids mistakes. It also enables higher order compositions putting together low level graphical models to build higher level models. For example, for multi-object tracking, it is easier to model network evolutions using some sort of birth-death processes when new objects enter an image (birth) and existing objects leave an image (death).
  • A further possibility concerning probabilistic graphical models PGM is learning the graph topology from the video sequence. This means the probabilistic graphical model PGM is able to perform a learning phase of graphical models. That means a probabilistic graphical model PGM can be improved by performing a learning phase.
  • Several images or videos can be offered to the probabilistic graphical model PGM in order to perform a learning process for the probabilistic graphical model PGM.
  • the higher level of abstraction enables an easy design of complex models and also automates network topology exploration using Bayesian optimization. Deep learning networks can be viewed as graphical models as well. For example, a Kalman filter has one hidden layer whereas a convolutional neural network CNN has a cascade of hidden nodes or layers to be inferred from data.
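The birth-death processes mentioned above can be sketched with toy track bookkeeping; the identifiers and the nearest-neighbour matching rule are assumptions for illustration. Detections entering the image start new tracks ("birth"), and tracks without a matching detection are removed ("death").

```python
# Toy birth-death bookkeeping sketch for multi-object tracking (the
# identifiers and nearest-neighbour matching rule are illustrative
# assumptions): unmatched detections start new tracks ("birth"), and
# tracks without a matching detection are dropped ("death").
def update_tracks(tracks, detections, max_dist=2.0):
    """tracks: dict id -> (x, y); detections: list of (x, y) positions."""
    next_id = max(tracks, default=-1) + 1
    unmatched = list(detections)
    updated = {}
    for tid, (tx, ty) in tracks.items():
        best = None
        for det in unmatched:
            d = ((tx - det[0]) ** 2 + (ty - det[1]) ** 2) ** 0.5
            if d <= max_dist and (best is None or d < best[0]):
                best = (d, det)
        if best is not None:
            updated[tid] = best[1]       # track continues at the new position
            unmatched.remove(best[1])
        # otherwise the track "dies"
    for det in unmatched:                # every leftover detection is a "birth"
        updated[next_id] = det
        next_id += 1
    return updated
```

A track near a detection keeps its identity, a track far from all detections disappears, and a detection far from all tracks receives a fresh identity — the network evolution a graphical model makes easy to reason about.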
  • Fig. 2 shows two images IMG that have been analysed differently.
  • the left image IMG in Fig. 2 shows the result after the optical flow OF has been determined.
  • a cluster of pixels PCL on the right side indicates a moving object.
  • the moving object corresponds to a pedestrian.
  • the right image IMG of Fig. 2 has been analysed by applying another image processing method than in case of the left image IMG.
  • Concerning the right image IMG of Fig. 2 a deep learning based object detection has been performed.
  • This type of analysis has detected four objects which are illustrated with bounding boxes.
  • the learning process which can be performed for example by using a convolutional neural network CNN, also delivered information about further object properties.
  • two bounding boxes have been classified as dynamic objects DO.
  • the other two bounding boxes have been classified as static objects SO.
  • In the image fusion IF, the pieces of information of the left image IMG and the right image IMG are collected and fused into a new image.
  • Fig. 3 shows an example for a further static object SO detection.
  • the left image IMG of Fig. 3 shows a traffic scene. In this case it was intended to perform a specific object classification OC.
  • the deep learning process could detect in this case a curb on the left and on the right, which is illustrated as a thick line with a shading.
  • a convolutional neural CNN network is also able to search and detect other objects like the pedestrian on the right or the two vehicles in front. It would be very confusing if too many image features were displayed in a single image. Therefore, in this case only the curb detection as object classification OC has been illustrated.
  • the image processing method proposed in this application preferably conducts more than one single image processing method and therefore gains many more pieces of information.
  • the proposed image processing method of this application preferably comprises several image processing methods and takes advantage of these different image processing methods since they can deliver many different pieces of image feature information.
  • the right image IMG of Fig. 3 shows the result of another analysis. In this case a ground topology GT analysis has been performed. In the centre an ego vehicle 310 is shown.
  • the right image IMG in Fig. 3 is a false colour representation concerning a coordinate z 330.
  • the ground topology at the bottom left represents a different z coordinate 330 than the ground topology GT at the upper right.
  • these two ground topologies represent different heights.
  • This image IMG represents a view from above. It can be considered as a bird's-eye view image.
  • the pieces of information of these two images are collected and fused into the new image.
  • Fig. 4 shows an example of how a probabilistic graphical model PGM can be used if some objects are occluded by other objects.
  • a bounding box 410 contains a pedestrian 420. It is possible to identify this object as pedestrian 420 since enough of the pedestrian 420 is shown in the image IMG. Nevertheless the bottom part of the pedestrian 420 is occluded by the vehicle 430 in front of it.
  • a probabilistic graphical model PGM that has performed a sufficient learning process is able to figure out what the object looks like if the vehicle 430 were not present in this case. In this case the probabilistic graphical model PGM probably would assume that at the bottom part of the bounding box 410 shoes are present. Since the probabilistic graphical model PGM could identify the content of the bounding box 410 as a pedestrian, the most probable solution concerning the bottom part of the bounding box 410 is the presence of shoes.
  • the probabilistic graphical model PGM could take advantage of the presence of the second pedestrian illustrated right of the bounding box 410.
  • the probabilistic graphical model PGM could analyse the motion of these two pedestrians to determine a velocity concerning these two pedestrians. An information about the velocity of these two pedestrians may indicate that they are walking and that there is no skateboard at the bottom. This example illustrates how information can be combined due to the collection of image feature information.
  • the probabilistic graphical model can comprise many more analyses and conclusions. In order to improve a probabilistic graphical model PGM it is useful to perform a learning process beforehand. Due to this learning process the probabilistic graphical model PGM can perform a realistic estimation of object appearance. In this case the probabilistic graphical model PGM can help to handle the situation of Fig. 4.
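As an illustrative sketch only: such occlusion reasoning can be imagined as looking up learned conditional probabilities of what the hidden part of an object is, given the recognized class. All names and probability values below are hypothetical assumptions, not data from the patent.

```python
# Hypothetical table P(bottom content | recognized class), as a learned
# model might encode it; the numbers are purely illustrative.
P_BOTTOM_GIVEN_CLASS = {
    "pedestrian": {"shoes": 0.85, "skateboard": 0.10, "bicycle": 0.05},
    "vehicle":    {"wheels": 0.95, "shoes": 0.00, "skateboard": 0.05},
}

def most_probable_occluded_part(object_class: str) -> str:
    """Return the most likely content of the occluded bottom region."""
    dist = P_BOTTOM_GIVEN_CLASS[object_class]
    return max(dist, key=dist.get)

print(most_probable_occluded_part("pedestrian"))  # -> shoes
```

Having recognized the bounding box content as a pedestrian, the model picks shoes as the most probable occluded part, exactly the reasoning described above.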
  • Fig. 5 shows three different false colour representations 500.
  • a traffic sign, namely a “STOP” sign, is shown in different false colour representations.
  • false colour representations 500 are also referred to as heat maps.
  • the heat map 510 shows the image IMG after it has been analysed in terms of its intensity.
  • the heat map 520 represents the same image IMG but in this case it has been analysed in terms of its depth.
  • the heat map 530 shows a representation concerning the motion.
  • the heat map 510 could be a result of the dense flow DF
  • the heat map 520 can be considered as a result of the three-dimensional reconstruction analysis SFM
  • the heat map 530 could be a result of the motion stereo analysis MS.
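The heat maps described above can be sketched, under the assumption of a simple min-max normalisation rule, as a mapping of a per-pixel quantity (intensity, depth or motion magnitude) to an 8-bit false-colour index:

```python
import numpy as np

def to_heat_map(values: np.ndarray) -> np.ndarray:
    """Normalise a per-pixel quantity (intensity, depth or motion
    magnitude) to an 8-bit index into a false-colour palette."""
    v = values.astype(float)
    span = v.max() - v.min()
    if span == 0:
        return np.zeros(v.shape, dtype=np.uint8)
    return np.round(255 * (v - v.min()) / span).astype(np.uint8)

depth = np.array([[1.0, 2.0], [3.0, 5.0]])   # toy depth map
heat = to_heat_map(depth)
print(heat)  # nearest pixel -> 0, farthest pixel -> 255
```

The same function could serve all three heat maps 510, 520 and 530, fed with the intensity, depth and motion results respectively.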
  • the detected objects are connected by the probabilistic graphical model PGM, which is used to find weights across object relationships. This way it can exploit spatial context to improve the estimation and it can handle occlusion via spatial reasoning. It is intended to use approximate inference through conditional independence assumptions, sampling and belief propagation.
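As a minimal illustration of belief propagation on such a graph, the sum-product update can be shown on the smallest possible case, a two-node chain with hypothetical potentials (the numbers are illustrative, not part of the patent):

```python
import numpy as np

# Two binary object states x1 -- x2, connected by a pairwise potential.
phi1 = np.array([0.7, 0.3])          # unary potential of x1
phi2 = np.array([0.4, 0.6])          # unary potential of x2
psi = np.array([[0.9, 0.1],          # pairwise potential psi(x1, x2):
                [0.2, 0.8]])         # neighbouring objects prefer agreement

# message from x1 to x2: m(x2) = sum_x1 phi1(x1) * psi(x1, x2)
m12 = phi1 @ psi
belief_x2 = phi2 * m12
belief_x2 /= belief_x2.sum()         # normalise to a marginal distribution
print(belief_x2)
```

The spatial context (here encoded in `psi`) shifts the belief about x2 away from its unary evidence alone, which is the mechanism the patent exploits at larger scale.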
  • Fig. 6 shows an illustrative representation of a data node 600.
  • In this data node 600 several pieces of information are collected. Starting from the left, three-dimensional information 3D is stored.
  • the data node 600 further contains pieces of information concerning a semantic information SEM, a context information CON, a temporal information TEM or other types of information INF.
  • the content of the data node 600 is not fixed; it depends on the different image processing methods which are performed in the reconstruction 110 or recognition 120 analysis. There exist several possibilities as to how to use this data node 600.
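A data node 600 of this kind could, for example, be sketched as a small record; the field names below are illustrative assumptions, not terms from the patent:

```python
from dataclasses import dataclass, field

@dataclass
class DataNode:
    position_3d: tuple            # three-dimensional information (3D)
    semantic: str                 # semantic information (SEM), e.g. "pedestrian"
    context: str                  # context information (CON), e.g. "on road"
    temporal: float               # temporal information (TEM), e.g. a timestamp
    extra: dict = field(default_factory=dict)  # other information (INF)

node = DataNode((1.2, 0.0, 5.4), "vehicle", "on road", 0.04,
                {"velocity": 3.1})
print(node.semantic, node.extra["velocity"])
```

Because the content of the node is not fixed, the open `extra` mapping stands in for whatever further pieces of information the chosen reconstruction or recognition methods deliver.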
  • One approach of this invention is to apply a hierarchical modelling of the probabilistic graphical model PGM.
  • In Fig. 7 the data nodes 600 are named differently.
  • the illustration at the top of Fig. 7 shows a number of small circles which represent different data nodes 600.
  • Two sets A and B of data nodes 600 are illustrated in the illustration at the top of Fig. 7.
  • These two sets A and B of different data nodes 600 are fused so that the number of data nodes 600 is reduced.
  • the fusion process is indicated by the arrow between the upper and lower illustrations of Fig. 7.
  • the data nodes 600 of the above illustration may contain information of several pedestrians that are walking towards a common destination, like a pub.
  • Each circle as a data node 600 may represent the movement of a single pedestrian.
  • a single pedestrian does not show a clear movement directly to the pub. It is possible that the single pedestrian shows some deviation concerning his movement. If every pedestrian is analysed separately it takes some time until the direction and the destination of the pedestrian can be identified.
  • a hierarchical modelling can improve the analysis of this situation.
  • the small data nodes 600 (u1-u4, x1-x5, z1-z4, f1-f5) are averaged. Due to the averaging, new data nodes 600 named Xa, Xc, Xb, Za and Zb are created. These data nodes 600 are shown in the lower illustration of Fig. 7.
  • the new data nodes 600 contain averaged information about the movement of the pedestrians. This averaging reduces the noise concerning the movement of the pedestrians. The real movement of the pedestrians, the direction to the pub, appears more clearly after the averaging process. This is the main idea behind the hierarchical modelling of a probabilistic graphical model.
  • Fig. 7 illustrates hierarchical modelling where two sub-graphs A and B of several data nodes 600 are grouped together to form a higher level state.
  • the graph can be hierarchically designed at different levels of abstraction.
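The noise-reducing effect of this averaging can be illustrated numerically; the heading values and noise level below are purely synthetic:

```python
import numpy as np

# Each pedestrian walks towards the same destination (true heading 0 rad),
# but each per-pedestrian data node measures a noisy heading. Fusing a
# group of nodes into one higher-level node averages that noise away.
rng = np.random.default_rng(0)
true_heading = 0.0
noise_std = 0.5

# 1000 trials, each with 20 per-pedestrian nodes (u1..., x1..., z1...)
nodes = true_heading + rng.normal(0.0, noise_std, size=(1000, 20))
fused = nodes.mean(axis=1)   # one averaged higher-level node (e.g. Xa) per trial

# The fused estimate's spread shrinks by roughly sqrt(20)
print(nodes.std(), fused.std())
```

The standard deviation of the fused node is close to 0.5/sqrt(20) ≈ 0.11, so the common direction emerges far sooner than from any single pedestrian.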
  • a Kalman filter is designed for each object like a pedestrian, a vehicle, etc. with different dynamical systems model.
  • the data nodes 600 can be described by appropriate Kalman filters.
  • Due to the order decoupling it is easy to tie them together graphically and design them hierarchically.
  • Each Kalman filter is a Markov chain, and typically the connections between the data nodes 600 are non-stationary processes. For these, the Chinese restaurant process or the Indian buffet process is often used.
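A per-object filter as mentioned above can be sketched as a minimal one-dimensional constant-velocity Kalman filter; the noise matrices and measurements are illustrative assumptions:

```python
import numpy as np

F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition for (position, velocity)
H = np.array([[1.0, 0.0]])               # only the position is measured
Q = np.eye(2) * 1e-3                     # process noise (assumed)
R = np.array([[0.25]])                   # measurement noise (assumed)

x = np.array([[0.0], [0.0]])             # initial state estimate
P = np.eye(2)                            # initial state covariance

for z in [1.0, 2.1, 2.9, 4.2, 5.0]:      # noisy position measurements
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    x = x + K @ (np.array([[z]]) - H @ x)
    P = (np.eye(2) - K @ H) @ P

print(float(x[0, 0]), float(x[1, 0]))    # estimated position and velocity
```

A pedestrian and a vehicle would each get such a filter with their own dynamical model (different `F`, `Q`, `R`), which is what makes it easy to tie the filters together graphically afterwards.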
  • Fig. 8 shows three options of how data nodes 600 can be arranged or handled within a probabilistic graphical model PGM.
  • the first illustration in Fig. 8 shows five data nodes 600, and the left data node x1 is directly connected to its neighbouring right data node x2.
  • This first illustration represents a first order model.
  • the illustration in the centre of Fig. 8 shows a more complex model. In this case additional connections between the data nodes 600 are present.
  • the first data node x1 is connected to the second data node x2 as well as to the third data node x3.
  • the third illustration in Fig. 8 shows how the second illustration can be folded to a first order model.
  • In the bigger data node 800 two data nodes 600 are merged into a unit.
  • the bigger data node 800 can be treated as a single data node 600 as in the case of the first illustration of Fig. 8.
  • the complexity of the bigger data node 800 increases due to the merging of several data nodes 600 into a single bigger data node 800.
  • a higher order model is transformed into a model of lower order. This technique can be called order decoupling.
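Order decoupling can be illustrated on a toy second-order linear model: merging two data nodes into one stacked state vector turns the second-order update into an equivalent first-order one. The coefficients are illustrative:

```python
import numpy as np

# Second-order model: x_t depends on BOTH x_{t-1} and x_{t-2}
a, b = 0.5, 0.3

def second_order(x_prev, x_prev2):
    return a * x_prev + b * x_prev2

# First-order (folded) form: s_t = A @ s_{t-1}, with the merged
# "bigger node" state s = [x_t, x_{t-1}] as in Fig. 8
A = np.array([[a, b],
              [1.0, 0.0]])

x1, x0 = 2.0, 1.0                     # x_{t-1}, x_{t-2}
direct = second_order(x1, x0)         # second-order update
folded = (A @ np.array([x1, x0]))[0]  # same update in first-order form

print(direct, folded)                 # both 1.3
```

The bigger merged node carries more state (hence the higher complexity noted above), but the chain between nodes is now first order again.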
  • Fig. 9 shows how automated exploration of data nodes 600 can be done using Bayesian optimization, where only dominant nodes (here T4) are automatically found.
  • Fig. 9 shows below data nodes 600 which are named x1 - x6. Above these data nodes x1 - x6 further data nodes, named T1 - T4 are illustrated. The data nodes T1 - T3 are expressed by dashed lines.
  • the automated exploration of the nodes 600 can be improved by applying a deep learning process to the probabilistic graphical model PGM beforehand.
  • the data nodes x1 - x6 are connected to the data nodes T1 - T4. Only the connection to the data node T4 shows a stable connection, expressed by continuous lines.
  • the data nodes T1 - T3 are shown as blurred by the dashed lines. It looks as if they are disappearing from the probabilistic graphical model PGM. In the automated exploration process the probabilistic graphical model PGM considers these kinds of data nodes T1 - T3, which are blurred in this case, as unrealistic. If a certain probability value of a data node 600 or of a connection to this data node falls below a threshold value for the probability, this data node or its connection may be neglected.
  • a fresh and new probabilistic graphical model PGM without any input data may consider every option. It probably would check every possible option and every connection.
  • a probabilistic graphical model PGM may even try to detect or track a vehicle 430 at any place in an image IMG.
  • the probabilistic graphical model PGM could even try to find the vehicle 430 in a blue region of the image IMG which is the sky. This is a very unrealistic option, and since the vehicle 430 usually does not appear in the sky, the probabilistic graphical model PGM will not find the vehicle 430 in the sky. This means that if enough images IMG have been analysed by the probabilistic graphical model PGM, it recognizes that the probability value for the vehicle 430 appearing in the sky decreases more and more.
  • the probabilistic graphical model PGM probably would determine a probability value for this unrealistic event which is extremely low.
  • the probabilistic graphical model PGM neglects such unrealistic events if a certain threshold value is reached.
  • the blurred or dashed data nodes T1 - T3 have already been neglected. The possibilities or options that are described by these data nodes T1 - T3 are simply too unrealistic to be further considered.
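The pruning of such unrealistic nodes can be sketched as a simple threshold test on learned connection probabilities; the values below are illustrative assumptions:

```python
# Hypothetical learned probabilities for the connections in Fig. 9.
connection_probs = {"T1": 0.02, "T2": 0.04, "T3": 0.01, "T4": 0.93}
THRESHOLD = 0.05

# Connections below the threshold are neglected; only the dominant
# node survives the automated exploration.
dominant = {n: p for n, p in connection_probs.items() if p >= THRESHOLD}
print(sorted(dominant))  # -> ['T4']
```

In the "vehicle in the sky" example, the corresponding connection probability would sink below the threshold after enough images and be pruned in exactly this way.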
  • a probabilistic graphical model PGM can improve itself by an appropriate learning process.
  • The more image feature information is provided as input to the probabilistic graphical model PGM, the more effective this learning process is.
  • This means the determining of the image feature information and fusing them together into a new image can drastically improve probabilistic graphical models PGM.
  • This invention also intends to modify probabilistic graphical models PGM by implementing boundary conditions, hierarchical modelling or the method of order decoupling.
  • Concerning the boundary conditions: preferably, as many pieces of image feature information as possible are collected from the several images in order to take greater advantage of the fusing process according to step c). The more information is present in the new fused image after the image fusion IF process, the better the spatio-temporal tracking of an object can be.
  • the automated exploration of good connections in the probabilistic graphical model PGM can be carried out with less computational time.
  • By the hierarchical modelling or the order decoupling the spatio-temporal tracking of an object can be modified. This means that different adjustments can be applied to the probabilistic graphical model PGM to improve the spatio-temporal tracking of the objects. This enables not only detecting and classifying the object; it is also possible to follow and track the object over a period of time.
  • As shown in Fig. 1, complementary data are available from the recognition 120 and the reconstruction 110.
  • An object tracking is performed prior to generating a map. This means that more informative image feature information at the image level is available, which allows providing a more accurate spatio-temporal object tracking. It is very advantageous to apply a deep learning process to the probabilistic graphical model PGM before it is used in a vehicle. This is due to the fact that an untrained probabilistic graphical model PGM needs more computational time than an already trained one.
  • the examples show illustratively that leveraging as much image feature information at the image level as possible and fusing it into a new image can lead to an improved spatio-temporal tracking of objects.
  • the new fused image contains many more pieces of information than each of the several images thanks to the image fusion IF. This enables the probabilistic graphical model PGM to determine meaningful options within less time.
  • This invention describes a method for image processing of several images IMG.
  • the image feature information of each of the several images IMG is determined and at least a majority of them is fused into the new image. Preferably all pieces of image feature information are fused into a new image.
  • a spatio-temporal tracking of an object of the new image is performed, wherein a probabilistic graphical model PGM is employed.
  • the probabilistic graphical model PGM can further be adapted by different sub-models within the probabilistic graphical model PGM or by several boundary conditions. These modifications to the probabilistic graphical model PGM can be handled flexibly with regard to the desired application.


Abstract

The present invention relates to a new image processing method for several images (IMG). At first a plurality of predetermined image features is defined. On the basis of this plurality of predetermined image features, image feature information of each of the several images (IMG) is determined. The image feature information so determined is fused into a new image. This process is also called an image fusion (IF). A spatio-temporal tracking of an object of the new image is enabled by using a probabilistic graphical model (PGM). The probabilistic graphical model (PGM) can be modified by hierarchical modelling or order decoupling. Furthermore, specific boundary conditions can be defined to adapt the probabilistic graphical model (PGM). The new fused image comprises an increased information density thanks to the image fusion (IF). This usually allows for an improved modelling within the probabilistic graphical model (PGM) and an improved spatio-temporal tracking of objects.

Description

Computer vision pre-fusion and spatio-temporal tracking
This invention relates to an image processing method for tracking an object.
Existing solutions for tracking an object often use a so-called Kalman filter. The method of Kalman filtering is also known as linear quadratic estimation. This is an algorithm that uses a series of measurements observed over a period of time, containing statistical noise and other inaccuracies, and produces estimates of unknown variables that tend to be more accurate than those based on a single measurement alone. Common applications of Kalman filtering are, for example, guidance, navigation or control of vehicles.
The document US 2015/0158182 A1 describes a robot system that includes a mobile robot having a controller executing a control system for controlling operation of the robot. This robot system further includes a cloud computing service in communication with a controller of the robot and a remote computing device in communication with the cloud computing service. The remote computing device communicates with the robot through the cloud computing service.
The document US 2016/0266581 A1 describes how a model of a vehicle’s view of its environment can be generated. Therefore, the vehicle comprises some sensors that should be completely un-occluded to watch the environment of the vehicle. For example, for each of a plurality of sensors of the object detection component, a computer may generate an individual 3D model of that sensor's field of view. Furthermore, weather information is received and used to adjust one or more of the models. After this adjustment, the models may be aggregated into another comprehensive 3D model. The comprehensive model may be combined with detailed map information indicating the probability of detecting objects at different locations. The model of the vehicle’s environment may be computed based on the combined comprehensive 3D model and detailed map information.
The document WO 2012/139636 A1 relates to a method for online calibration of a vehicle video system evaluated from image frames of a camera containing features on the road.
A portion of a road surface is captured by the camera in an image frame. A selection of at least two different features within the image frame is taken, the selected features being chosen as respective reference points. A sequence of at least one further image frame is captured with the camera while locating within the new frame the features chosen as reference points to be tracked. The trajectory covered by the reference points during the driving interval between the image sequences is analysed in a virtual image space by identifying a geometrical shape obtained by joining the located reference points with each other in each of the used images and considering the respective trajectories. A deviation of the resulting geometrical object from a parallelogram with corners defined by the reference points from at least two subsequent images is calculated, with any measured deviation being used to define an offset correction of the camera.
The task of this invention is to provide a method which is more accurate and robust concerning object tracking in an image.
This task is being solved by the independent claims. Advantageous embodiments of this invention result in the dependent claims. This invention also provides a computer program product to conduct the method. Furthermore, also a driver assistance system with this computer program product is provided.
This invention describes a new method for image processing of several images. This method is characterized by the following steps. At first in step a) a plurality of
predetermined image features for the image processing of the several images is defined. This plurality of predetermined image features preferably comprises several different image features. This means the type of the image features and their quantity are defined before image feature information is determined. In the next step b) the image feature information of each of the several images is determined on the basis of the plurality of predetermined image features. For this purpose, preferably an appropriate image feature detector and/or image feature descriptor can be applied to the several images.
A feature detector can be a method which outputs locations (i.e. pixel coordinates) of significant areas in the image by applying a filter to the images. A filter is preferably a matrix that is designed in such a way that a desired image feature is detected within an image. An example of this is a corner detector, which outputs the locations of corners in the image but does not provide any other information about the image features detected.
A feature descriptor can be a method which takes an image and outputs feature descriptors/feature vectors. Feature descriptors can encode interesting information into a series of numbers and can act as a sort of numerical "fingerprint" that can be used to differentiate one feature from another. Preferably this information is invariant under image transformations, so that the image feature can be found again even if the image is transformed in some way. An example would be the SIFT method, which encodes information about the local neighbourhood image gradients into the numbers of the feature vector.
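A corner detector of the kind mentioned above can be sketched with a toy Harris-style response computed from image gradients and a 3x3 window; this is a didactic simplification, not a production detector:

```python
import numpy as np

def box3(a: np.ndarray) -> np.ndarray:
    """3x3 box sum (the windowing step of the structure tensor)."""
    p = np.pad(a, 1)
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))

def harris_response(img: np.ndarray, k: float = 0.05) -> np.ndarray:
    """Toy Harris corner response: high values indicate corner locations."""
    gy, gx = np.gradient(img.astype(float))
    sxx, syy, sxy = box3(gx * gx), box3(gy * gy), box3(gx * gy)
    return sxx * syy - sxy * sxy - k * (sxx + syy) ** 2

# white square on black background: the response is highest at its corners
img = np.zeros((16, 16))
img[4:12, 4:12] = 1.0
r = harris_response(img)
print(r[4, 4], r[4, 8], r[8, 8])  # corner, edge, flat region
```

The detector outputs only locations; a descriptor such as SIFT would then summarise the local gradient neighbourhood around each location as a feature vector.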
The more image feature information is gathered, the more image feature information can be used for step c). Therefore, an image feature detector with different filters can be applied to the images. If the image feature detector is equipped with several and/or different filters, different image features and their information can be extracted from the image. In the best case, all image feature information of step b) is used for the next step c), but at least the major part of the image feature information determined in step b) is used. In step c) the image feature information determined of each of the several images in step b) is fused into a new image. This fusing process can also be considered as condensing the image feature information into the new image. Therefore, this new image can also be called a “super image”. This new image, the super image, contains much more information than each of the several images. An object of the new image is spatio-temporally tracked in step d), wherein a probabilistic graphical model is employed. This means that the image feature information of the new image is input into the probabilistic graphical model. Since the new image usually contains more information than each of the several images, the data basis for the probabilistic graphical model may grow larger. This can help to perform more accurate and robust object tracking in the new image. The term spatio-temporal object tracking does not only comprise localising the object. It also includes capturing the development of the location of a detected object over a period of time. This means spatio-temporal tracking can also mean watching or observing a detected object. Due to the image fusing performed in step c), a larger data basis, condensed into the new image, can be provided as an input for the probabilistic graphical model. This can enable a better capturing of the global structure of the scene. This can also help to improve the robustness of the object tracking.
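The fusion of step c) can be sketched, under the simplifying assumption that the fusion rule is channel-wise stacking of aligned per-pixel cues, as follows:

```python
import numpy as np

# Three cues determined in step b) from the same scene (the actual
# fusion rule of the patent is not specified; stacking is illustrative).
h, w = 4, 4
intensity = np.random.rand(h, w)
depth = np.random.rand(h, w)
motion = np.random.rand(h, w)

# step c): condense all image feature information into one "super image"
super_image = np.stack([intensity, depth, motion], axis=-1)
print(super_image.shape)  # -> (4, 4, 3): one fused image, three cues per pixel
```

Each pixel of the fused image now carries all cues at once, which is the increased information density made available to the probabilistic graphical model in step d).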
Another embodiment contemplates a method, wherein one of the several images is a false colour representation of one of the other images concerning a depth, a motion and/or an intensity. False colour representations are often used if further information shall be expressed within a two-dimensional image. For example, if a map of a country is shown, different colours may indicate different population densities. Another example is a false colour representation concerning different temperatures. Such diagrams are sometimes referred to as heat maps. Concerning object tracking, it is suitable to use false colour representations that give additional information concerning the depth, the motion or the intensity. Such representations can be generated before and/or during step b).
This means that different false colour representations can deliver additional pieces of image feature information which are further condensed into a new image according to step c) by applying an appropriate image processing. Of course other types of false colour representations can be generated that can be used for the image fusion process. This means step b) can comprise additional image processing methods to create other images. Image feature information of those other images can be determined and used for the fusing in step c). Fig. 1 shows, by way of example, a selection of several additional image processing methods that can be used for modifying and/or complementing step b).
Another embodiment of this invention describes a method, wherein by means of the several images a property of the tracked object is determined by a reconstruction analysis. The reconstruction analysis can contain many different image processing methods. A dense flow, an optical flow, an epipolar rectification, motion stereo, dynamic clustering, static clustering and so on can be used for the reconstruction analysis. The objective of the reconstruction analysis is to process the pieces of image feature information in a way that they can be fused later on. Each sub-method may provide a special kind of information. The different sub-methods of image processing may focus on different aspects. These different pieces of information can be used for the fusing step c). For example, it may be helpful to distinguish between dynamic and static objects. Another possibility is to use sub-methods that search for pedestrians or traffic signs in an image. Also providing information about the ground topology in a reconstruction analysis may be useful for the further object tracking. Since it is intended to leverage all image feature information of each of the several images, much more semantic information can be provided.
Another embodiment of this invention describes a method, wherein by means of each of the several images a type of the tracked object is determined in a recognition analysis. The recognition analysis is preferably carried out by using a convolutional neural network. A convolutional neural network (CNN) usually comprises several different matrices. Each matrix can be considered as a filter. These filters are usually designed to detect image features and/or their information. Usually a filter matrix is applied to an input image by moving the filter matrix over the input image. The activity of a neuron is often determined by a discrete convolution. Of course other types of artificial networks can be applied in order to determine a type of the tracked object. For example, if an object is captured in an image, the recognition analysis determines what kind of object has been found. This can be a pedestrian, a car, a dog or another object in the image. This information is very useful to classify a tracked object in the image. For example, if a tree is detected in an image, it can be concluded that this object, the tree, is a stationary, non-moveable object.
Another embodiment of this invention describes a method, wherein for the reconstruction analysis an optical flow and/or structure from motion is employed. An optical flow or optic flow is the pattern of apparent motion of objects, surfaces and edges in a visual scene caused by a relative motion between an observer and a scene. Therefore, the method of optical flow may help to detect moving objects in an image. Structure from motion can be used in order to extract three-dimensional information out of several images. Since a single image does not contain three-dimensional information at least two images are necessary to conduct the method structure from motion.
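Optical flow for a purely translating scene can be sketched with phase correlation, which recovers the integer shift between two frames; this is a stand-in illustration, not the patent's method:

```python
import numpy as np

def phase_correlation(a: np.ndarray, b: np.ndarray):
    """Estimate the integer translation between two frames via phase
    correlation; a minimal stand-in for dense optical flow."""
    fa, fb = np.fft.fft2(a), np.fft.fft2(b)
    cross = fa * np.conj(fb)
    cross /= np.abs(cross) + 1e-12        # keep only the phase difference
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = a.shape
    # wrap shifts larger than half the image size to negative values
    if dy > h // 2: dy -= h
    if dx > w // 2: dx -= w
    return int(dy), int(dx)

rng = np.random.default_rng(1)
frame1 = rng.random((32, 32))
frame2 = np.roll(frame1, shift=(3, -2), axis=(0, 1))  # scene moved by (3, -2)
print(phase_correlation(frame2, frame1))  # -> (3, -2)
```

This also hints at why structure from motion needs at least two frames: the apparent motion between them is the signal from which three-dimensional information is extracted.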
Another embodiment of the present invention describes a method, wherein for the recognition analysis a convolutional neural network is employed. The convolutional neural network can be combined with different image processing methods. A convolutional neural network can comprise several layers. This may enable the convolutional neural network to detect more complex image features like a face. Often the convolutional neural network is applied in order to gain further information concerning the classification of objects.
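The basic operation of such a convolutional layer can be sketched as a single 2-D filter pass; the Sobel-style kernel below is an illustrative edge filter:

```python
import numpy as np

def conv2d(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid 2-D discrete convolution (correlation form), the basic
    operation of a convolutional layer."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# a vertical-edge filter responds strongly at the boundary of a bright region
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
img = np.zeros((5, 5))
img[:, 3:] = 1.0          # bright right half
response = conv2d(img, sobel_x)
print(response)
```

A CNN stacks many such learned filter matrices over several layers, which is what lets it respond to increasingly complex image features such as a face.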
A special embodiment of this invention describes a method, wherein for an object recognition of the new image the results of the reconstruction analysis and the recognition analysis are used together for an image fusion of the new image. The reconstruction analysis and the recognition analysis can be considered as two separate paths of the image processing. These two paths of the image processing can provide different results. For example, the reconstruction analysis may lead to useful information like the motion of objects or the ground topology. The recognition analysis may further analyse detected objects in the image concerning the type of the object. Preferably, the reconstruction analysis provides generic object detection based on different cues like the motion, the depth, the intensity or further cues that may be useful for a generic object detection. The recognition analysis preferably provides specific object detection based on appearance cues. These different kinds of cues are combined in this embodiment of the invention and fused into a new image. A further embodiment of this invention describes a method, wherein according to the object recognition the recognized object is characterized into the categories ground, dynamic/critical and infrastructure. Since it is intended to use this invention in vehicles, it is suitable to define some predetermined categories. In case of an autonomous driving vehicle the ground, the infrastructure and dynamic objects should be recognized as quickly as possible. Therefore, it may be useful to implement these three types of categories as predefined categories into an image processing method. The category ground may comprise features like a road, drivable area, free space, lanes, markings, road effects, footpaths, curves or lawns. Dynamic objects often can be considered as critical objects. This kind of category may comprise features like pedestrians, cyclists, motor bikes, cars, trucks, busses or animals. 
Objects like traffic signs, lights, buildings, walls, poles, trees, bridges or flyovers may be assigned to the category infrastructure. If these categories are predefined beforehand, the image processing itself may be accelerated.
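The three predefined categories can be sketched as a simple lookup table; the class names follow the examples in the text, their exact spellings are assumptions:

```python
# Mapping of recognized object classes to the three predefined categories.
CATEGORY = {
    **dict.fromkeys(["road", "free space", "lane", "marking", "footpath",
                     "lawn"], "ground"),
    **dict.fromkeys(["pedestrian", "cyclist", "motorbike", "car", "truck",
                     "bus", "animal"], "dynamic/critical"),
    **dict.fromkeys(["traffic sign", "light", "building", "wall", "pole",
                     "tree", "bridge", "flyover"], "infrastructure"),
}

print(CATEGORY["pedestrian"], CATEGORY["tree"])
```

Because the categories are fixed beforehand, classification reduces to a constant-time lookup, which is one way the predefined categories can accelerate the image processing.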
A special embodiment of this invention describes a method, wherein a data node is created that comprises a semantic content, the object type, an object property, its position, velocity and/or pose as pieces of information. The recognition and reconstruction analysis provide different pieces of information. These pieces of information are preferably different. This means these pieces of information should not only contain information for example about the ground topology, they should further contain other pieces of information, for example information about static or dynamic objects. In this variant all these pieces of information are collected in the data node. This means the data node can contain information about a location of a detected object in three-dimensional coordinates, further semantic information that defines the type of the detected object, additional context information and some temporal or velocity information. The context may comprise information that may help to identify or track the detected object more effectively. For example, in case of a vehicle the semantic information “on road” may help to detect the vehicle faster. The information “on road” is a sign that this node is connected to the ground topology in some way. This means the image processing can consider this information. If something is located at or near a wall, like a window, this information can also be stored in the data node.
Another embodiment of this invention describes a method, wherein for the fusing according to step c) and for the determining according to step b) image feature information is used exclusively. Many methods of image processing use further sensor information to detect or track objects. This variant of the invention only uses image feature information. The image feature information is leveraged as well as possible in this embodiment of the invention. The fusion and analysis according to sub-methods of image processing is preferably conducted at the image level. This means the image processing method is not combined with, for example, an ultrasonic sensor, a dead reckoning sensor or other sensors. The only sensor that is used in this embodiment is a camera that provides the several images. But the several images can also be given as an input. A camera that has implemented the image processing methods according to claim 1 is able to perform an improved object tracking.
Another embodiment of this invention describes a method, wherein a hierarchical model is employed for the probabilistic graphical model. The probabilistic graphical model can be adjusted differently. One option is to use a hierarchical model for the probabilistic graphical model. A hierarchical model offers the opportunity to average out signal noise. The output signals of sensors, like a temperature value of a temperature sensor or an image of a camera, are usually affected by signal noise. Their output signals show deviations concerning a measured parameter (e.g. image feature information) due to this noise. The hierarchical model takes advantage of the averaging effect in order to reduce the amount of deviations concerning the output signals. This advantage of the averaging effect can be implemented into the probabilistic graphical model by applying a hierarchical model within the probabilistic graphical model.
A further embodiment of this invention describes a method, wherein a boundary condition concerning an object structure is predetermined for the probabilistic graphical model. The type of the boundary condition depends on the proposed application of the image processing. For example, if small children or babies should be monitored, other boundary conditions are suitable than in the case of autonomous driving vehicles. In case of monitoring babies, prominent traffic signs are not necessary as boundary conditions. In this case they are not helpful. But in case of autonomous driving vehicles, boundary conditions that may help to classify prominent traffic signs can be very useful. For example, a spatial boundary condition may comprise that lanes lie on a ground plane. A geometric boundary condition may comprise that lanes on the road are thick lines with a width that has at least a minimum value. A boundary condition concerning a colour may comprise that lanes on the road are typically white. A location boundary condition may comprise that a lane location may be based on high definition maps. There are many possibilities to define boundary conditions. It is useful to define meaningful boundary conditions. This may help to accelerate the image processing method. Of course the amount and the types of different boundary conditions are not fixed. They can be adapted according to the present application.
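Checking lane candidates against such boundary conditions can be sketched as simple predicate tests; all threshold values below are illustrative assumptions:

```python
# Boundary conditions for lane markings: on the ground plane (spatial),
# at least a minimum width (geometric), typically white (colour).
MIN_WIDTH_M = 0.08          # assumed minimum lane-marking width in metres
MAX_GROUND_HEIGHT_M = 0.05  # assumed tolerance for "lies on the ground plane"
WHITE_THRESHOLD = 200       # assumed brightness threshold for "white"

def is_plausible_lane(candidate: dict) -> bool:
    return (candidate["height"] <= MAX_GROUND_HEIGHT_M
            and candidate["width"] >= MIN_WIDTH_M
            and candidate["brightness"] >= WHITE_THRESHOLD)

lane = {"height": 0.0, "width": 0.12, "brightness": 230}
shadow = {"height": 0.0, "width": 0.5, "brightness": 40}
print(is_plausible_lane(lane), is_plausible_lane(shadow))  # -> True False
```

Rejecting implausible candidates this early is how meaningful boundary conditions can accelerate the image processing method.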
A special embodiment of this invention describes a method, wherein all image feature information of the several images is determined and all image feature information is fused into the new image according to a predetermined rule. The main idea of this invention is to use as many pieces of image feature information as possible. For example, if an image is taken with a camera that provides a resolution of two million pixels, the image may contain a maximum of two million image features. This variant of the invention uses all two million pieces of image feature information. This means that no image feature information is neglected in this variant; in other words, the degree of utilization is 100 % in this embodiment of the invention. The image fusion thus provides a new image with a high degree of information density. The fusion of all the pieces of image feature information is conducted by a predetermined rule in this embodiment. This can help to improve the spatio-temporal tracking according to step d).
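One conceivable "predetermined rule" that uses all image feature information is to stack every per-pixel cue into the channels of the new image, so that nothing is discarded and the degree of utilization is 100 %. The following sketch (plain Python, with tiny illustrative 2x2 cue maps) shows such a rule; it is one possible interpretation, not the rule defined by the invention.

```python
def fuse_all_features(feature_maps):
    """Fuse every per-pixel feature map into one 'new image' by channel
    stacking -- a simple predetermined rule that discards nothing.
    feature_maps: dict mapping cue name -> H x W list-of-lists."""
    names = sorted(feature_maps)
    first = feature_maps[names[0]]
    h, w = len(first), len(first[0])
    fused = [[[feature_maps[n][y][x] for n in names] for x in range(w)]
             for y in range(h)]
    return names, fused

intensity = [[0.1, 0.9], [0.4, 0.5]]
depth     = [[3.0, 3.2], [2.5, 2.7]]
motion    = [[0.0, 1.0], [0.0, 0.0]]
channels, new_image = fuse_all_features(
    {"intensity": intensity, "depth": depth, "motion": motion})
# Each fused pixel now carries all three cues at once.
```

After fusion, every pixel of the new image holds all available cues, which is the high information density the embodiment aims at.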
This invention also offers a computer program product with program code resources, which are stored in a computer-readable medium, to conduct any of the preceding methods if the computer program product is processed on a processor of an electronic control unit. This offers the opportunity to implement the image processing method in a camera. A chip in the camera may be able to conduct some or all of the steps if it is activated.
This invention further offers a driver assistance system with a computer program product. This means the image processing method can interact with the driver assistance system, which in turn is able to manoeuvre a vehicle. Depending on the result of the image processing method, a signal can be generated for the driver assistance system.
This generated signal can be used for further processes, such as alerting a driver to a dangerous situation. Preferably the driver assistance system uses the computer program product in order to conduct the method of image processing, so that the driver assistance system can generate specific steering signals for autonomous vehicle manoeuvring.
This invention also provides a motor vehicle with a driver assistance system. If the driver assistance system generates a steering signal, a control unit of the driver assistance system may manoeuvre the motor vehicle according to the steering signal. For example, if the image processing method is tracking an object that may collide with the vehicle, the driver assistance system may stop the vehicle. How the vehicle is manoeuvred preferably depends on the result of the image processing method.
Further features of the invention are apparent from the claims, the figures and the description of figures. The features and feature combinations mentioned above in the description as well as the features and feature combinations mentioned below in the description of figures and/or shown in the figures alone are usable not only in the respectively specified combination, but also in other combinations without departing from the scope of the invention. Thus, implementations are also to be considered as encompassed and disclosed by the invention, which are not explicitly shown in the figures and explained, but arise from and can be generated by separated feature combinations from the explained implementations. Implementations and feature combinations are also to be considered as disclosed, which thus do not have all of the features of an originally formulated independent claim. Moreover, implementations and feature combinations are to be considered as disclosed, in particular by the implementations set out above, which extend beyond or deviate from the feature combinations set out in the relations of the claims.
The attached drawings show:
Fig. 1 a block diagram of the proposed architecture to perform spatio-temporal fusion;
Fig. 2 an example of pedestrian detection by optical flow clustering (left) and deep learning based object detection (right);
Fig. 3 an example of curb detection by depth based clustering (right) and deep learning based object detection (left);
Fig. 4 an illustration of occlusion handling of pedestrians by a car;
Fig. 5 illustration of different false colour representations from various cues;
Fig. 6 illustration of a data node containing several different pieces of information;
Fig. 7 illustration of the hierarchical modelling of the probabilistic graphical model;
Fig. 8 illustration of different possibilities to combine data nodes; and
Fig. 9 illustration of an automated exploration of good connections in the probabilistic graphical model.
Fig. 1 shows by way of example the desired architecture to perform spatio-temporal tracking. At the left a set of several images IMG is provided. These images IMG can be input into the reconstruction 110 path and the recognition 120 path, which can be considered as two complementary cues. Within these two paths several different kinds of analysis can be performed. In this example the reconstruction 110 path contains different analyses concerning a dense flow DF, an optical flow OF, an epipolar rectification ER, a motion stereo MS, a dynamic clustering DC, a dynamic object localization DOL, a three-dimensional reconstruction SFM, a static clustering SC, a ground topology GT, a static object SO and a dynamic object DO. In the reconstruction 110 path all these different kinds of analyses may deliver much more information than the original images IMG contained. The recognition 120 path contains a convolutional neural network CNN that is able to perform an object classification OC. It is possible that the convolutional neural network CNN itself comprises several different image processing methods. Fig. 1 shows that it is intended to collect as many pieces of image feature information as possible to provide much more semantic information. All the results of the different analyses are collected and fused into a new image. The image fusion IF can be considered as a collection of all pieces of information which have been gathered or determined in the reconstruction 110 path and the recognition 120 path. This image fusion IF creates the new image with a higher information density.
This means the fused new image resulting from the image fusion IF contains much more information than the several images IMG which have been input into the reconstruction 110 and recognition 120 paths. This new fused image, in which all the information collected in the two paths is condensed, is input into a probabilistic graphical model PGM. In this case the image fusion IF is carried out by the use of the two complementary cues, the reconstruction 110 and the recognition 120. This can help to provide epistemic confidence concerning the object type. It may also be useful to provide spatial confidence as to where a detected object is located in the new image. Preferably, the image fusion IF is done as a combination of the reconstruction 110 and recognition 120 paths. A spatio-temporal object tracking is conducted by applying the probabilistic graphical model PGM to the image fusion IF. It should be mentioned that the described methods and analyses only utilize image feature information of the several images IMG. No information from other sensor systems such as ultrasonic, laser scanning, etc. is used in the examples shown in the figures.
The probabilistic graphical model PGM can capture the global structure of the scene. This can help to improve the object tracking. The object tracking can become more robust or effective.
Probabilistic graphical models PGM are a high-level abstraction of joint probabilities which fully captures their dependencies. They make it easy to think at a higher level of abstraction and to model visually.
For instance, an urn problem without replacement of numbered balls can be comprehended more easily with a graphical representation. In an urn several balls with numbers printed on them may be located. In a lottery, after a mixing of the several balls, one single ball is taken out. This procedure may be repeated several times until the desired number of balls has been taken out of the urn. With a graphical representation it is much easier to comprehend the probabilities and the possibilities of the events that may occur; the statistical analysis can be understood more easily and quickly. This idea is implemented in the probabilistic graphical model PGM. The probabilistic graphical model PGM is an elegant representation which is easier to work with and avoids mistakes. It also enables higher-order compositions, putting together low-level graphical models to build higher-level models. For example, for multi-object tracking, it is easier to model network evolutions using some sort of birth-death process, where new objects enter an image (birth) and existing objects leave an image (death).
Another advantage of probabilistic graphical models PGM is learning the graph topology from the video sequence. This means the probabilistic graphical model PGM is able to perform a learning phase of graphical models; a probabilistic graphical model PGM can thus be improved by performing a learning phase. Several images or videos can be offered to the probabilistic graphical model PGM in order to perform a learning process for the probabilistic graphical model PGM. The higher level of abstraction enables an easy design of complex models and also automates network topology exploration using Bayesian optimization. Deep learning networks can be viewed as graphical models as well. For example, a Kalman filter has one hidden layer, whereas a convolutional neural network CNN has a cascade of hidden nodes or layers to be inferred from data. The biggest leap in machine learning in terms of technological progress may be probabilistic programming for graphical models. Although deep learning had the biggest impact among black-box models, the model has remained pretty much the same for longer than 20 years. In the field of probabilistic programming many open-box models have been developed. With probabilistic programming, even the computational disadvantage of a manual implementation of a Kalman filter can be eliminated. A complex graphical model can be specialized to generate an efficient implementation of a Kalman filter. This means there is no need to manually write an implementation. Different packages exist that can be used for probabilistic graphical models PGM, such as the "theano" or the "pymc3" package.
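As a concrete illustration of the remark that a Kalman filter is a graphical model with a single hidden layer, a minimal hand-written 1-D filter is sketched below. The noise variances q and r and the measurement sequence are arbitrary illustrative values; a probabilistic-programming package could generate an equivalent implementation from a model specification instead of writing it by hand.

```python
def kalman_1d(measurements, q=1e-3, r=0.5):
    """Minimal 1-D Kalman filter: a Markov chain with one hidden state
    per time step, observed through noisy measurements.
    q: process noise variance, r: measurement noise variance."""
    x, p = measurements[0], 1.0          # initial state and variance
    estimates = [x]
    for z in measurements[1:]:
        p = p + q                        # predict: variance grows
        k = p / (p + r)                  # Kalman gain
        x = x + k * (z - x)              # update with the innovation
        p = (1 - k) * p                  # posterior variance shrinks
        estimates.append(x)
    return estimates

zs = [1.2, 0.8, 1.1, 0.9, 1.05, 0.95]   # noisy observations around 1.0
est = kalman_1d(zs)
```

Each hidden state depends only on its predecessor, which is exactly the one-hidden-layer chain structure the text describes.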
Fig. 2 shows two images IMG that have been analysed differently. For example, the left image IMG in Fig. 2 shows the result after the optical flow OF has been determined. In this case a cluster of pixels PCL on the right side indicates a moving object, which here corresponds to a pedestrian. The right image IMG of Fig. 2 has been analysed by applying a different image processing method than in the case of the left image IMG: a deep learning based object detection has been performed. This type of analysis has detected four objects, which are illustrated with bounding boxes. Furthermore, the learning process, which can be performed for example by using a convolutional neural network CNN, also delivered information about further object properties. In this case two bounding boxes have been classified as dynamic objects DO; the other two bounding boxes have been classified as static objects SO. In the image fusion IF process the pieces of information of the left image IMG and the right image IMG are collected and fused into a new image.
Fig. 3 shows an example of a further static object SO detection. The left image IMG of Fig. 3 shows a traffic scene. In this case it was intended to perform a specific object classification OC. The deep learning process could here detect a curb on the left and right, which is illustrated as a thick line with a shading. Of course a convolutional neural network CNN is also able to search for and detect other objects, like the pedestrian on the right or the two vehicles in front. It would be very confusing if too many image features were displayed in a single image; therefore, in this case only the curb detection as object classification OC has been illustrated. The image processing method proposed in this application preferably comprises several image processing methods and takes advantage of them, since they can deliver many different pieces of image feature information.
The right image IMG of Fig. 3 shows the result of another analysis. In this case a ground topology GT analysis has been performed. In the centre an ego vehicle 310 is shown.
The right image IMG in Fig. 3 is a false colour representation concerning a coordinate z 330. This means the different ground topologies GT are assigned to different z-levels in this case. The ground topology at the bottom left represents a different z coordinate 330 than the ground topology GT at the upper right; in Fig. 3 these two ground topologies represent different heights. This image IMG represents a view from above and can be considered as a bird's-eye view image. As in the example of Fig. 2, the pieces of information of these two images are also collected here and fused into the new image.
Fig. 4 shows an example of how a probabilistic graphical model PGM can be used if some objects are occluded by other objects. A bounding box 410 contains a pedestrian 420. It is possible to identify this object as pedestrian 420 since enough of the pedestrian 420 is shown in the image IMG. Nevertheless the bottom part of the pedestrian 420 is occluded by the vehicle 430 in front of it. A probabilistic graphical model PGM that has performed a sufficient learning process is able to figure out what the object looks like if the vehicle 430 were not present in this case. In this case the probabilistic graphical model PGM probably would assume that at the bottom part of the bounding box 410 shoes are present. Since the probabilistic graphical model PGM could identify the content of the bounding box 410 as a pedestrian, the most probable solution concerning the bottom part of the bounding box 410 is the presence of shoes.
Furthermore, the probabilistic graphical model PGM could take advantage of the presence of the second pedestrian illustrated to the right of the bounding box 410. The probabilistic graphical model PGM could analyse the motion of these two pedestrians to determine their velocity. Information about the velocity of these two pedestrians may indicate that they are walking and that there is no skateboard at the bottom. This example illustrates how information can be combined thanks to the collection of image feature information. The probabilistic graphical model can contain many more analyses and conclusions. In order to improve a probabilistic graphical model PGM it is useful to perform a learning process beforehand. Due to this learning process the probabilistic graphical model PGM can perform a realistic estimation of object appearance. In this case the probabilistic graphical model PGM can help to handle the situation of Fig. 4, in which the feet of the pedestrian 420 are not visible. In order to increase the amount of information that can be fused later in the image fusion IF process, several false colour representations 500 can be used. Fig. 5 shows three different false colour representations 500 of a traffic sign, a "STOP" sign. In many cases false colour representations 500 are also referred to as heat maps. The heat map 510 shows the image IMG after it has been analysed in terms of its intensity. The heat map 520 represents the same image IMG, but in this case it has been analysed in terms of its depth. The heat map 530 shows a representation concerning the motion. These heat maps 510 - 530 can be generated at different steps of the reconstruction 110 or recognition 120 path.
For example, the heat map 510 could be a result of the dense flow DF, the heat map 520 can be considered as a result of the three-dimensional reconstruction analysis SFM, and the heat map 530 could be a result of the motion stereo analysis MS. Preferably, detected objects are characterised into three parts based on their nature and criticality. These three categories are usually the ground, dynamic objects and the infrastructure. The probabilistic graphical model PGM is used to connect all detected objects and find weights across object relationships. This way it can exploit spatial context to improve the estimation and it can handle occlusion via spatial reasoning. It is intended to use approximate inference through conditional independence assumptions, sampling and belief propagation.
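The false colour representations (heat maps) of Fig. 5 can in principle be produced by normalising a scalar cue and mapping it onto a colour ramp. The sketch below uses a hypothetical two-colour (blue-to-red) ramp purely for illustration; the document does not specify a particular colour mapping.

```python
def to_false_colour(cue_map):
    """Map a scalar cue (intensity, depth or motion) to a false colour
    heat map: low values become blue, high values become red (a simple
    two-colour ramp chosen only for this illustration)."""
    flat = [v for row in cue_map for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0

    def colour(v):
        t = (v - lo) / span                          # normalise to [0, 1]
        return (int(255 * t), 0, int(255 * (1 - t)))  # (R, G, B)

    return [[colour(v) for v in row] for row in cue_map]

depth = [[1.0, 2.0], [3.0, 5.0]]   # tiny illustrative depth map
heat = to_false_colour(depth)
```

The same routine could be applied to the intensity, depth and motion cues to obtain three heat maps like 510 - 530.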
Fig. 6 shows an illustrative representation of a data node 600. Within this data node 600 several pieces of information are collected. Starting from the left, three-dimensional information 3D is stored. The data node 600 further contains pieces of information concerning a semantic information SEM, a context information CON, a temporal information TEM or other types of information INF. The content of the data node 600 is not fixed; it depends on the different image processing methods which are performed in the reconstruction 110 or recognition 120 analysis. There exist several possibilities as to how this data node 600 can be used.
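A data node 600 of this kind can be modelled, for instance, as a small record type. The field names below are assumptions chosen to mirror the labels of Fig. 6 (3D, SEM, CON, TEM, INF), not names defined by the invention.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple

@dataclass
class DataNode:
    """One data node of the probabilistic graphical model, bundling the
    pieces of information sketched in Fig. 6 (field names are assumed)."""
    position_3d: Optional[Tuple[float, float, float]] = None  # 3D
    semantic: Optional[str] = None                            # SEM
    context: Optional[str] = None                             # CON
    timestamp: Optional[float] = None                         # TEM
    extra: Dict[str, Any] = field(default_factory=dict)       # INF

node = DataNode(position_3d=(2.0, 0.5, 1.7), semantic="pedestrian",
                context="crosswalk", timestamp=0.04)
node.extra["velocity_mps"] = 1.4   # any other information can be attached
```

Because the content of a data node is not fixed, the open `extra` mapping leaves room for whatever the reconstruction or recognition analyses deliver.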
One approach of this invention is to apply a hierarchical modelling of the probabilistic graphical model PGM. This is illustratively shown in Fig. 7, where the data nodes 600 are named differently. The illustration at the top of Fig. 7 shows a number of small circles which represent different data nodes 600, grouped into two sets A and B. These two sets A and B of different data nodes 600 are fused so that the number of data nodes 600 is reduced. The fusion process is indicated by the arrow between the upper and lower illustrations of Fig. 7. For example, the data nodes 600 of the upper illustration may contain information on several pedestrians that are walking towards a common destination, like a pub. Each circle as a data node 600 may represent the movement of a single pedestrian. A single pedestrian does not show a clear movement directly towards the pub; it is possible that the single pedestrian shows some deviation concerning his movement. If every pedestrian is analysed separately, it takes some time until the direction and the destination of the pedestrian can be identified.
In this case a hierarchical modelling can improve the analysis of this situation. In the case of Fig. 7 the small data nodes 600 (u1-u4, x1-x5, z1-z4, f1-f5) are averaged. Due to the averaging, new data nodes 600 named Xa, Xc, Xb, Za and Zb are created. These data nodes 600 are shown in the lower illustration of Fig. 7. In the example of the walking pedestrians the new data nodes 600 contain averaged information about the movement of the pedestrians. This averaging reduces the noise concerning the movement of the pedestrians. The real movement of the pedestrians, the direction towards the pub, emerges more clearly after the averaging process. This is the main idea behind the hierarchical modelling of a probabilistic graphical model.
Besides hierarchical modelling, order decoupling is also a possibility to modify the probabilistic graphical model PGM. Fig. 7 illustrates hierarchical modelling where two sub-graphs A and B of several data nodes 600 are grouped together to form a higher level state. This way, the graph can be hierarchically designed at different levels of abstraction. For example, often a Kalman filter is designed for each object, like a pedestrian, a vehicle, etc., with a different dynamical system model. This means in some cases the data nodes 600 can be described by appropriate Kalman filters. By using order decoupling it is easy to tie them up graphically and design them hierarchically. Each Kalman filter is a Markov chain, and typically the connections between the data nodes 600 are non-stationary processes. Therein often the Chinese restaurant process or the Indian buffet process is used.
Fig. 8 shows three options of how data nodes 600 can be arranged or handled within a probabilistic graphical model PGM. The first illustration in Fig. 8 shows five data nodes 600, and the left data node x1 is directly connected to its neighbouring right data node x2. This first illustration represents a first order model. The illustration in the centre of Fig. 8 shows a more complex model. In this case additional connections between the data nodes 600 are present. For example, the first data node x1 is connected to the second data node x2 as well as to the third data node x3. This means that the mathematical models which are part of the probabilistic graphical model PGM are different from the first order option. The third illustration in Fig. 8 shows how the second illustration can be folded into a first order model. In the bigger data node 800 two data nodes 600 are merged into a unit. This means the bigger data node 800 can be treated as a single data node 600, as in the case of the first illustration of Fig. 8. Usually the complexity of the bigger data nodes 800 increases due to the merging of several data nodes 600 into a single bigger data node 800. In other words, a higher order model is transformed into a model with a lower order. This technique can be called order decoupling.
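The order decoupling of Fig. 8 can be illustrated as follows: a second-order chain, in which x[t] depends on both x[t-1] and x[t-2], is folded into a first-order chain over merged "bigger" nodes that pair consecutive states. The sketch below is a minimal, assumed formalisation of that folding, not the document's own implementation.

```python
def fold_to_first_order(states):
    """Order decoupling: a second-order chain over x[t] becomes first
    order over merged nodes y[t] = (x[t-1], x[t]); each y[t] then
    depends only on its immediate predecessor y[t-1]."""
    return [(states[i - 1], states[i]) for i in range(1, len(states))]

xs = ["x1", "x2", "x3", "x4", "x5"]   # the five nodes of Fig. 8
ys = fold_to_first_order(xs)
# ys == [("x1", "x2"), ("x2", "x3"), ("x3", "x4"), ("x4", "x5")]
```

Each merged node is larger (it carries two states), which matches the observation that the complexity of the bigger data nodes 800 increases.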
Fig. 9 shows how an automated exploration of data nodes 600 can be done using Bayesian optimization, where only dominant nodes T4 are automatically found. Fig. 9 shows at the bottom data nodes 600 which are named x1 - x6. Above these data nodes x1 - x6, further data nodes named T1 - T4 are illustrated. The data nodes T1 - T3 are drawn with dashed lines. The automated exploration of the nodes 600 can be improved by applying a deep learning process to the probabilistic graphical model PGM beforehand. The data nodes x1 - x6 are connected to the data nodes T1 - T4. Only the connection to the data node T4 is a stable connection, expressed by continuous lines. The data nodes T1 - T3, to the left of the data node T4, appear blurred due to the dashed lines; it looks like they are disappearing from the probabilistic graphical model PGM. In the automated exploration process the probabilistic graphical model PGM considers these kinds of data nodes T1 - T3, which are blurred in this case, as unrealistic. If a certain probability value of a data node 600, or of a connection to this data node, falls below a threshold value for the probability, this data node or its connection may be neglected.
For example, a fresh and new probabilistic graphical model PGM without any input data may consider every option. It probably would check every possible option and every connection. In this case a probabilistic graphical model PGM may even try to detect or track a vehicle 430 at any place in an image IMG. The probabilistic graphical model PGM could even try to find the vehicle 430 in a blue region of the image IMG which is the sky. This is a very unrealistic option, and since the vehicle 430 usually does not appear in the sky, the probabilistic graphical model PGM will not find the vehicle 430 there. This means that if enough images IMG have been analysed by a probabilistic graphical model PGM, it recognizes that the probability value for the vehicle 430 in the sky decreases more and more. After an intense learning process the probabilistic graphical model PGM probably would determine an extremely low probability value for this unrealistic event. Preferably the probabilistic graphical model PGM neglects such unrealistic events if a certain threshold value is reached. In the example of Fig. 9 the blurred or dashed data nodes T1 - T3 have already been neglected. The possibilities or options that are described by these data nodes T1 - T3 are simply too unrealistic to be further considered.
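The pruning behaviour described above can be sketched as a simple threshold rule over learned connection probabilities. The node names and probability values below are illustrative, loosely following Fig. 9; the actual threshold and learning procedure are not specified by the document.

```python
def prune_connections(weights, threshold=0.05):
    """Keep only connections whose learned probability stays above the
    threshold; dominant nodes (like T4 in Fig. 9) survive, while
    unrealistic options (the vehicle-in-the-sky case) fade out."""
    return {node: p for node, p in weights.items() if p >= threshold}

# Illustrative learned probabilities for the nodes T1 - T4 of Fig. 9.
learned = {"T1": 0.01, "T2": 0.02, "T3": 0.04, "T4": 0.93}
kept = prune_connections(learned)
```

After pruning, only the dominant node remains in the graph, which is what reduces the computational time of the exploration.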
This means that a probabilistic graphical model PGM can improve itself by an appropriate learning process. The more image feature information is provided as input to the probabilistic graphical model PGM, the more effective this learning process is. This means that determining the image feature information and fusing it together into a new image can drastically improve probabilistic graphical models PGM. This invention also intends to modify probabilistic graphical models PGM by implementing boundary conditions, hierarchical modelling or the method of order decoupling. Preferably, as many pieces of image feature information as possible are collected from the several images in order to take greater advantage of the fusing process according to step c). The more information is present in the new fused image after the image fusion IF process, the better the spatio-temporal tracking of an object can be.
The more information the new image contains after the image fusion IF, the more effective the tracking step according to step d) can be. The automated exploration of good connections in the probabilistic graphical model PGM can then be carried out with less computational time. By applying appropriate image processing methods according to Fig. 1, or by modifying the probabilistic graphical model PGM concerning the boundary conditions, the hierarchical modelling or the order decoupling, the spatio-temporal tracking of an object can be modified. This means that different adjustments can be applied to the probabilistic graphical model PGM to improve the spatio-temporal tracking of the objects. This enables not only detecting and classifying the object, but also following and tracking the object over a period of time.
The method presented in this application does not transform the image feature information into a map. This avoids the disadvantage of losing information by providing a map. According to Fig. 1, complementary data are available from the recognition 120 and the reconstruction 110. An object tracking is performed prior to generating a map. This means that more informative image feature information at the image level is available, which allows a more accurate spatio-temporal object tracking. It is very advantageous to apply a deep learning process to the probabilistic graphical model PGM before it is used in a vehicle. This is due to the fact that an untrained probabilistic graphical model PGM needs more computational time than an already trained probabilistic graphical model. The examples show illustratively that leveraging as much image feature information at the image level as possible and fusing it into a new image can lead to an improved spatio-temporal tracking of objects. The new fused image contains many more pieces of information than each of the several images thanks to the image fusion IF. This enables the probabilistic graphical model PGM to determine meaningful options within less time.
This invention describes a method for image processing of several images IMG. The image feature information of each of the several images IMG is determined and at least a majority of it is fused into a new image; preferably all pieces of image feature information are fused into the new image. A spatio-temporal tracking of an object of the new image is performed, wherein a probabilistic graphical model PGM is employed. The probabilistic graphical model PGM can further be adapted by different sub-models within the probabilistic graphical model PGM or by several boundary conditions. These modifications to the probabilistic graphical model PGM can be handled flexibly with regard to the desired application.

Claims

1. Method for image processing of several images (IMG), characterized by the following steps:
a) Defining a plurality of predetermined image features for the image processing of the several images (IMG),
b) Determining an image feature information of each of the several images (IMG) on the basis of the plurality of predetermined image features,
c) Fusing the image feature information of each of the several images (IMG) determined in step b) into a new image,
d) Spatio-temporal tracking of an object of the new image, wherein a probabilistic graphical model (PGM) is employed.
2. Method of claim 1, wherein one of the several images (IMG) is a false colour representation (500) of one of the other images concerning a depth, a motion and/or an intensity.
3. Method of claim 1 or 2, wherein by means of the several images (IMG) a property of the tracked object is determined by a reconstruction (110) analysis.
4. Method of claim 1 or 2, wherein by means of the several images (IMG) a type of the tracked object is determined in a recognition (120) analysis.
5. Method of claim 3, wherein for the reconstruction (110) analysis an optical flow (OF) and/or structure from motion (SFM) is employed.
6. Method of claim 4, wherein for the recognition (120) analysis a convolutional neural network (CNN) is employed.
7. Method according to claims 3 and 4, wherein for an object recognition of the new image the results of the reconstruction (110) analysis and the recognition (120) analysis are used together for an image fusion (IF) of the new image.
8. Method of claim 7, wherein according to the object recognition the recognized object is categorized into the categories ground, dynamic/critical and infrastructure.
9. Method of claim 8, wherein a data node (600) is created that comprises a semantic content (SEM), an object type, an object property, its position, velocity and/or pose as pieces of information.
10. Method of any one of the preceding claims, wherein for the fusing according to step c) and for the determining according to step b) image feature information is used exclusively.
11. Method of any one of the preceding claims, wherein a hierarchical model is employed for the probabilistic graphical model (PGM).
12. Method of any one of the preceding claims, wherein a boundary condition concerning an object structure is predetermined for the probabilistic graphical model (PGM).
13. Method according to any one of the preceding claims, wherein all image feature information of the several images (IMG) is determined and all image feature information is fused into the new image according to a predetermined rule.
14. Computer program product with program code resources, which are stored in a computer-readable medium to conduct the method of any one of the preceding claims if the computer program product is processed on a processor of an electronic control unit.
15. Driver assistance system with a computer program product according to claim 14.
PCT/EP2019/050340 2018-01-12 2019-01-08 Computer vision pre-fusion and spatio-temporal tracking WO2019137912A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102018100667.5A DE102018100667A1 (en) 2018-01-12 2018-01-12 Computer vision pre-fusion and spatiotemporal tracking
DE102018100667.5 2018-01-12

Publications (1)

Publication Number Publication Date
WO2019137912A1 true WO2019137912A1 (en) 2019-07-18

Family

ID=65023883

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2019/050340 WO2019137912A1 (en) 2018-01-12 2019-01-08 Computer vision pre-fusion and spatio-temporal tracking

Country Status (2)

Country Link
DE (1) DE102018100667A1 (en)
WO (1) WO2019137912A1 (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012139636A1 (en) 2011-04-13 2012-10-18 Connaught Electronics Limited Online vehicle camera calibration based on road surface texture tracking and geometric properties
US20150158182A1 (en) 2010-05-20 2015-06-11 Irobot Corporation Mobile Robot System
US20160266581A1 (en) 2013-01-25 2016-09-15 Google Inc. Modifying behavior of autonomous vehicles based on sensor blind spots and limitations

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7536030B2 (en) * 2005-11-30 2009-05-19 Microsoft Corporation Real-time Bayesian 3D pose tracking
DE102010005290A1 (en) * 2009-01-26 2010-08-19 GM Global Technology Operations, Inc., Detroit Vehicle controlling method for vehicle operator i.e. driver, involves associating tracked objects based on dissimilarity measure, and utilizing associated objects in collision preparation system to control operation of vehicle


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FARHAD GHAZVINIAN ZANJANI ET AL: "Improving Semantic Video Segmentation by Dynamic Scene Integration", NCCV 2016: THE NETHERLANDS CONFERENCE ON COMPUTER VISION, 12 December 2016 (2016-12-12), Lunteren, Netherlands, XP055559365, Retrieved from the Internet <URL:https://www.ru.nl/publish/pages/769526/farhad_zanjani.pdf> [retrieved on 20190220] *
NAZRUL HAQUE ET AL: "Joint Semantic and Motion Segmentation for dynamic scenes using Deep Convolutional Networks", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 18 April 2017 (2017-04-18), XP080765817, DOI: 10.5220/0006129200750085 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022166361A1 (en) * 2021-02-04 2022-08-11 浙江师范大学 Deep clustering method and system based on cross-modal fusion
CN112597975A (en) * 2021-02-26 2021-04-02 上海闪马智能科技有限公司 Fire smoke and projectile detection method and system based on video
CN114372377A (en) * 2022-03-21 2022-04-19 江西珉轩智能科技有限公司 Engineering information model construction method based on 3D space-time engine
CN116402858A (en) * 2023-04-11 2023-07-07 合肥工业大学 Transformer-based space-time information fusion infrared target tracking method
CN116402858B (en) * 2023-04-11 2023-11-21 合肥工业大学 Transformer-based space-time information fusion infrared target tracking method

Also Published As

Publication number Publication date
DE102018100667A1 (en) 2019-07-18

Similar Documents

Publication Publication Date Title
US11885910B2 (en) Hybrid-view LIDAR-based object detection
Bar Hillel et al. Recent progress in road and lane detection: a survey
US10467771B2 (en) Method and system for vehicle localization from camera image
US20180349746A1 (en) Top-View Lidar-Based Object Detection
US20200026283A1 (en) Autonomous route determination
Danescu et al. Modeling and tracking the driving environment with a particle-based occupancy grid
WO2019137912A1 (en) Computer vision pre-fusion and spatio-temporal tracking
Azevedo et al. Automatic vehicle trajectory extraction by aerial remote sensing
CN110163904A (en) Object marking method, control method for movement, device, equipment and storage medium
Roth et al. Driver and pedestrian awareness-based collision risk analysis
Rodríguez Flórez et al. Multi-modal object detection and localization for high integrity driving assistance
Chandel et al. Occlusion detection and handling: a review
CN115049700A (en) Target detection method and device
CN111814602A (en) Intelligent vehicle environment dynamic target detection method based on vision
Engel et al. Deep object tracking on dynamic occupancy grid maps using rnns
Chavez-Garcia Multiple sensor fusion for detection, classification and tracking of moving objects in driving environments
CN117576652B (en) Road object identification method and device, storage medium and electronic equipment
EP4145398A1 (en) Systems and methods for vehicle camera obstruction detection
CN117589167A (en) Unmanned aerial vehicle routing inspection route planning method based on three-dimensional point cloud model
CN116664851A (en) Automatic driving data extraction method based on artificial intelligence
Petrovskaya et al. Model based vehicle tracking in urban environments
Beltrán et al. Dense Semantic Stereo Labelling Architecture for In-Campus Navigation.
CN111815667B (en) Method for detecting moving target with high precision under camera moving condition
Stojcheski et al. Self-Supervised Occupancy Grid Map Completion for Automated Driving
Li et al. Automated region-based vehicle conflict detection using computer vision techniques

Legal Events

Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19700561; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19700561; Country of ref document: EP; Kind code of ref document: A1)