WO2019076867A1 - Semantic segmentation of an object in an image - Google Patents


Info

Publication number: WO2019076867A1
Authority: WO (WIPO, PCT)
Prior art keywords: image frame, neuronal network, predefined, high priority, convolutional neuronal
Application number: PCT/EP2018/078192
Other languages: French (fr)
Inventors: Stephen FOY, Rosalia BARROS, Ian Clancy
Original Assignee: Connaught Electronics Ltd.
Application filed by Connaught Electronics Ltd.
Publication of WO2019076867A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 Local feature extraction by matching or filtering
    • G06V 10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451 Biologically inspired filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/70 Labelling scene content, e.g. deriving syntactic or semantic representations

Abstract

The present invention relates to a method for semantic segmentation of an object (5, 8) in an image, comprising the following method steps: - consecutively acquiring image frames (6, 7, 11), - inputting a first image frame (6) of the consecutively acquired image frames (6, 7, 11) into a convolutional neural network in real time, - examining by the convolutional neural network whether any object (5, 8) can be detected in the first image frame (6), - semantically classifying the detected objects (5, 8) by the convolutional neural network by assigning each detected object (5, 8) to one of a list of predefined object classes, - providing a lookup-table with a priority list which comprises a priority level for each of the predefined object classes, - determining a respective priority level of the detected objects (5, 8) by comparison with the lookup-table, - determining one or more object(s) (5) which have a predefined priority level, - determining a high priority area (9) of the image frame (6) which relates to the or an object (5) with the predefined priority level, - inputting a next image frame (7) of the consecutively acquired image frames (6, 7, 11) into the convolutional neural network in real time, - analyzing only the high priority area (9) in the next image frame (7) by the convolutional neural network. In this way, an efficient CNN architecture design is provided that can be applied to an automotive camera (3) with a large field of view while taking advantage of that large field of view.

Description

Semantic Segmentation of an Object in an Image
The invention relates to a method for semantic segmentation of an object in an image, comprising the following method steps:
consecutively acquiring image frames,
inputting a first image frame of the consecutively acquired image frames into a convolutional neural network in real time, and
examining by the convolutional neural network whether any object can be detected in the first image frame for semantic segmentation.
One of the most fundamental problems in automotive computer vision is the semantic segmentation of objects in an image. Semantic segmentation refers to the problem of associating every pixel with its corresponding object class. In recent times, there has been a surge of convolutional neural network (CNN) research and design, aided by increases in computational power in computer architectures and by the availability of large annotated datasets.
CNNs are highly successful at classification and categorization tasks, but much of the research concerns standard photometric RGB images and is not focused on embedded automotive devices. Automotive hardware devices must meet low power consumption requirements and therefore offer only limited computational power.
In machine learning, a convolutional neural network is a class of deep, feed-forward artificial neural networks that has successfully been applied to analyzing visual imagery. CNNs use a variation of multilayer perceptrons designed to require minimal preprocessing. Convolutional networks were inspired by biological processes in that the connectivity pattern between neurons resembles the organization of the animal visual cortex. Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. The receptive fields of different neurons partially overlap such that they cover the entire visual field.
CNNs use relatively little pre-processing compared to other image classification algorithms. This means that the network learns the filters that in traditional algorithms were hand-engineered. This independence from prior knowledge and human effort in feature design is a major advantage. CNNs have applications in image and video recognition, recommender systems and natural language processing. In this respect, US 2017/0200063 A1 teaches applying a set of sections spanning a down-sampled version of an image of a road scene to a low-fidelity classifier to determine a set of candidate sections for depicting one or more objects in a set of classes. The set of candidate sections of the down-sampled version may be mapped to a set of potential sectors in a high-fidelity version of the image. A high-fidelity classifier may be used to vet the set of potential sectors, determining the presence of one or more objects from the set of classes. The low-fidelity classifier may include a first convolutional neural network trained on a first training set of down-sampled versions of cropped images of objects in the set of classes. Similarly, the high-fidelity classifier may include a second CNN trained on a second training set of high-fidelity versions of cropped images of objects in the set of classes.
From US 2017/0099200 A1 it is known that data is received characterizing a request for agent computation of sensor data. The request includes a required confidence and required latency for completion of the agent computation. Agents to query are determined based on the required confidence. Data is transmitted to query the determined agents to provide analysis of the sensor data.
US 9,704,054 B1 describes that image classification and related imaging tasks performed using machine learning tools may be accelerated by using tools to associate an image with a cluster of such labels or categories, and then to select one of the labels or categories of the cluster as associated with the image. The clusters of labels or categories may comprise labels that are mutually confused for one another, e.g. two or more labels or categories that have been identified as associated with a single image. By defining clusters of labels or categories, and configuring a machine learning tool to associate an image with one of the clusters, processes for identifying labels or categories associated with images may be accelerated because computations associated with labels or categories not included in the cluster may be omitted.
It is an objective of the present invention to provide an efficient CNN architecture design that can be applied to an automotive camera with a large field of view while taking advantage of that large field of view. This objective is addressed by the subject matter of the independent claims. Preferred embodiments are described in the sub-claims.
Therefore, the invention provides a method for semantic segmentation of an object in an image, comprising the following method steps:
consecutively acquiring image frames,
inputting a first image frame of the consecutively acquired image frames into a convolutional neural network in real time,
examining by the convolutional neural network whether any object can be detected in the first image frame,
semantically classifying the detected objects by the convolutional neural network by assigning each detected object to one of a list of predefined object classes,
providing a lookup-table with a priority list which comprises a priority level for each of the predefined object classes,
determining a respective priority level of the detected objects by comparison with the lookup-table,
determining one or more object(s) which have a predefined priority level,
determining a high priority area of the image frame which relates to the or an object with the predefined priority level,
inputting a next image frame of the consecutively acquired image frames into the convolutional neural network in real time, and
analyzing only the high priority area in the next image frame by the convolutional neural network.
Thus, it is an essential idea of the invention that, instead of regularly processing whole images, only a section of the image is processed, at higher resolution, for semantic segmentation of objects in the image. In particular, instead of always analyzing the complete image, a high priority area of the image is determined in a first image frame based on the priority levels of the objects detected in that image. Then, in a next image frame, only the high priority area of the image is processed, which makes the method considerably more efficient. Preferably, the priority levels of the different object classes are defined based on an order of safety; e.g. objects belonging to the object class "person" might be more important than objects belonging to the object class "curbside". Preferably, at the beginning of the method, the high priority area would be defined by the object(s) with the highest priority level, i.e. the predefined priority level would be the highest priority level. Once these objects have been classified in a trustworthy way, areas of the image with objects having lower priority levels may be processed.
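To make the frame-to-frame flow concrete, the following is a minimal Python sketch of the loop described above. All names (DetectedObject, detect_and_classify, the box format) and the priority values are illustrative assumptions; the patent does not prescribe any particular interface, and the convolutional neural network itself is abstracted behind a callback.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

import numpy as np

# Example lookup-table with a priority list (lower level = higher priority);
# the classes and levels mirror the embodiment described further below.
PRIORITY_TABLE = {"person": 1, "car": 2, "wall": 3, "tree": 4}

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixel coordinates


@dataclass
class DetectedObject:
    object_class: str
    box: Box


def crop(frame: np.ndarray, box: Box) -> np.ndarray:
    """Cut the high priority area out of a full image frame."""
    x0, y0, x1, y1 = box
    return frame[y0:y1, x0:x1]


def process_stream(
    frames: Iterable[np.ndarray],
    detect_and_classify: Callable[[np.ndarray], List[DetectedObject]],
) -> None:
    """First frame: analyze the full image. Later frames: only the ROI."""
    roi: Optional[Box] = None
    for frame in frames:
        view = frame if roi is None else crop(frame, roi)
        objects = detect_and_classify(view)  # detection + semantic classification
        if not objects:
            roi = None  # nothing detected: fall back to full-frame analysis
            continue
        # Object with the predefined (here: highest) priority level,
        # determined by comparison with the lookup-table.
        top = min(objects, key=lambda o: PRIORITY_TABLE.get(o.object_class, 99))
        # High priority area to be analyzed in the next image frame.
        # (In practice, boxes from a cropped view would need offsetting
        # back into full-frame coordinates.)
        roi = top.box
```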
The step of analyzing only the high priority area in the next image frame by the convolutional neural network may be performed in different ways, as set out in the following. According to a preferred embodiment of the invention, analyzing only the high priority area in the next image frame by the convolutional neural network is performed by
examining by the convolutional neural network whether any object can be detected in the high priority area,
semantically classifying the detected objects by the convolutional neural network by assigning each detected object to one of the list of predefined object classes,
determining a respective priority of the detected objects by comparison with the lookup-table,
determining the one or more object(s) with the predefined priority level,
determining a new high priority area of the image frame which relates to the or an object with the predefined priority level,
inputting a next image frame of the consecutively acquired image frames into the convolutional neural network in real time, and
analyzing only the new high priority area in the next image frame by the convolutional neural network.
Preferably, the step of analyzing only the high priority area in the next image frame by the convolutional neural network by
examining by the convolutional neural network whether any object can be detected in the high priority area,
semantically classifying the detected objects by the convolutional neural network by assigning each detected object to one of the list of predefined object classes,
determining a respective priority of the detected objects by comparison with the lookup-table,
determining the one or more object(s) with the predefined priority level,
determining a new high priority area of the image frame which relates to the or an object with the predefined priority level,
inputting a next image frame of the consecutively acquired image frames into the convolutional neural network in real time, and
analyzing only the new high priority area in the next image frame by the convolutional neural network
is repeated at least once.
In this way, a high priority area with objects which should be classified may be defined in a multi-step process. However, according to another preferred embodiment of the invention, such classification may also be performed directly after the first definition of the high priority area. Therefore, according to a preferred embodiment of the invention, analyzing only the high priority area in the next image frame by the convolutional neural network is performed by semantically classifying the object by assigning the object to one of the list of predefined object classes. In this respect, preferably the following step is performed:
accepting the object class to which the object has been assigned when analyzing only the high priority area in the next image frame as a trustworthy object class. Once such a trustworthy classification of objects with a certain priority level has been achieved, preferably areas with objects of the next lower priority level are processed.
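The patent leaves open what makes a classification "trustworthy". One plausible realization, sketched here with an assumed per-object confidence score and threshold (neither appears in the patent), steps down to the next lower priority level only once every object at the current level is confidently classified:

```python
def next_priority_level(detections, current_level, confidence_threshold=0.9):
    """Decide which priority level to focus on next.

    `detections` holds (object_class, priority_level, confidence) tuples.
    If every object at the current level has been classified in a
    trustworthy way, step down to the next lower priority level
    (numerically higher value); otherwise keep refining the current one.
    """
    current = [d for d in detections if d[1] == current_level]
    if current and all(conf >= confidence_threshold for _, _, conf in current):
        lower = sorted({lvl for _, lvl, _ in detections if lvl > current_level})
        if lower:
            return lower[0]
    return current_level
```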
In general, inputting a next image frame of the consecutively acquired image frames into the convolutional neural network in real time may be performed by inputting the complete image frame. However, according to a preferred embodiment of the invention, inputting a next image frame of the consecutively acquired image frames into the convolutional neural network in real time is performed by inputting only the high priority area of the next image frame into the convolutional neural network.
Further, according to a preferred embodiment of the invention, the step of consecutively acquiring image frames is performed by a camera with a field of view of more than 150°, yielding respective image frames covering an image angle of more than 150°. More preferably, the camera has a field of view of more than 180°, yielding respective image frames covering an image angle of more than 180°. In this way, a large field of view may be monitored while the sheer number of pixels of the images acquired by such a camera does not slow down processing speed appreciably, since the complete images do not have to be processed for all image frames.
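To make the saving concrete (the numbers are purely illustrative and not taken from the patent): if such a wide-angle camera delivers frames of 1280 × 800 = 1,024,000 pixels but the high priority area covers only 320 × 200 = 64,000 pixels, the network processes just one sixteenth of the pixel data in the follow-up frames.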
The invention also relates to the use of a method as described above in an automotive vehicle.
The invention further relates to a sensor arrangement for an automotive vehicle configured for performing a method as described above.
The invention also relates to a non-transitory computer-readable medium, comprising instructions stored thereon, that when executed on a processor, induce a sensor arrangement of an automotive vehicle to perform a method as described above.
These and other aspects of the invention will be apparent from and elucidated with reference to the embodiments described hereinafter. Individual features disclosed in the embodiments can constitute, alone or in combination, an aspect of the present invention. Features of the different embodiments can be carried over from one embodiment to another embodiment.
In the drawings:
Fig. 1 schematically depicts a vehicle with a sensor arrangement according to a preferred embodiment of the invention,
Figs. 2a, b schematically depict the processing of image frames according to a preferred embodiment of the invention, and
Figs. 3a - d schematically depict a further aspect of the processing of image frames according to a preferred embodiment of the invention.
Fig. 1 schematically depicts an automotive vehicle 1 with a sensor arrangement 2 which comprises a camera 3 and an evaluation unit 4. The sensor arrangement 2 is adapted for semantic segmentation of images of objects 5 captured by the camera 3. The evaluation unit 4 may be part of an advanced driver-assistance system for helping a driver of the automotive vehicle 1 in the driving process. The camera 3 is a large field-of-view camera and may have a viewing angle which is larger than 180°.
The method performed by the sensor arrangement 2 according to the preferred embodiment of the invention is as described in the following:
The camera 3 consecutively acquires image frames. The frequency of acquiring image frames may be as high as 30 frames/second. However, for effectively processing the image frames, a processing frequency of 5 frames/second has been shown to be sufficient. For processing the image frames, a first image frame 6 of the consecutively acquired image frames is input into a convolutional neural network in real time. The convolutional neural network is provided in the evaluation unit 4, to which the image frames of the camera 3 are transmitted.
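For example, at an acquisition rate of 30 frames/second, a processing frequency of 5 frames/second corresponds to processing roughly every sixth acquired frame.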
By means of the convolutional neural network it is examined whether any object 5 that is not part of the ground area on which the automotive vehicle 1 is driving can be detected in the first image frame 6. If such objects 5 can be detected in the first image frame 6, these objects are semantically classified by the convolutional neural network by assigning each detected object to one of a list of predefined object classes.
According to the preferred embodiment described here, these object classes may be "person", "car", "wall", "tree", etc. Such semantic classification of objects by a convolutional neural network is well known to the person skilled in the art and does not require any further explanation here.
However, differently from conventional methods, according to the preferred embodiment of the invention, a lookup-table with a priority list which comprises a priority level for each of the predefined object classes is provided. In the present case, this priority list looks as follows:
person: priority 1
car: priority 2
wall: priority 3
tree: priority 4
This priority list may have further object classes which are related to respective priorities. A respective priority level is determined for each object which has been detected in the first image frame 6 by comparison with the lookup-table.
A respective image frame 6 can be seen in Fig. 2a. In this image frame 6, two persons are detected as one object 5, and a wall is detected as another object 8. Since the object class "person" has a higher priority than the object class "wall", a high priority area 9 is determined which relates to the object 5 belonging to the object class "person".
Then, a next image frame 7 of the consecutively acquired image frames is input into the convolutional neural network in real time, wherein only the high priority area 9 in the next image frame 7 is analyzed by the convolutional neural network. This is shown in Fig. 2b, in which the image frame 7 processed by the convolutional neural network for semantic segmentation of the objects 5 relates to the high priority area determined in the previous method step in image frame 6. In this way, the objects 5 can be processed at much higher resolution, which makes semantic segmentation of the objects 5, i.e. assigning the objects 5 to one of the list of predefined object classes, easier and thus more trustworthy.
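In terms of the sketch given earlier, the scene of Figs. 2a and 2b plays out as follows; this is a self-contained toy example, and the bounding boxes are invented for illustration:

```python
# Hypothetical detections in the first image frame 6 (boxes invented):
detections = [
    ("person", (400, 150, 700, 620)),   # two persons, detected as one object 5
    ("wall",   (0, 300, 1280, 800)),    # wall, detected as object 8
]
priority = {"person": 1, "car": 2, "wall": 3, "tree": 4}

# "person" (priority 1) outranks "wall" (priority 3), so the high priority
# area 9 analyzed in the next image frame 7 is the persons' bounding box.
cls, high_priority_area = min(detections, key=lambda d: priority[d[0]])
assert (cls, high_priority_area) == ("person", (400, 150, 700, 620))
```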
However, according to a preferred embodiment of the invention, a high priority area with objects which should be classified may also be defined in a multi-step process as described in the following with respect to Figs. 3a to d.
Fig. 3a shows two objects 5, 8 which belong to different object classes, i.e. "person" and "wall". Instead of directly focusing on object 5, which is the object with the higher priority, a high priority area 9 is defined which comprises both objects 5, 8 and which can then be analyzed at higher resolution in the next image frame 7 shown in Fig. 3b.
This analysis at higher resolution makes it possible to clearly distinguish between the two objects 5, 8 and to define a new high priority area 10 which only relates to the object 5 belonging to the object class with the highest priority, i.e. "person", as shown in Fig. 3c. Then, in a further image frame 11 shown in Fig. 3d, only this new high priority area 10 is examined, i.e. semantic segmentation is only performed for object 5 in order to verify that the object 5 detected here does actually belong to the object class "person".
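The multi-step refinement of Figs. 3a to 3d can be expressed the same way, under the assumed (not explicitly stated) reading that the first high priority area is the union of the nearby objects' bounding boxes:

```python
def union_box(boxes):
    """Smallest axis-aligned box enclosing all given (x0, y0, x1, y1) boxes."""
    xs0, ys0, xs1, ys1 = zip(*boxes)
    return (min(xs0), min(ys0), max(xs1), max(ys1))

# Fig. 3a: objects 5 ("person") and 8 ("wall") lie close together, so the
# first high priority area 9 is taken to enclose both of them.
boxes = {"person": (420, 160, 650, 600), "wall": (500, 200, 900, 700)}
area_9 = union_box(boxes.values())          # -> (420, 160, 900, 700)

# Figs. 3b/3c: analyzing area 9 at higher resolution separates the two
# objects, and the new high priority area 10 keeps only the object with
# the highest priority.
area_10 = boxes["person"]

# Fig. 3d: in the further image frame 11 only area 10 is examined, to
# verify that object 5 really belongs to the object class "person".
```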
Reference signs list
1 automotive vehicle
2 sensor arrangement
3 camera
4 evaluation unit
5 persons
6 first image frame
7 next image frame
8 wall
9 high priority area
10 new high priority area
11 further next image frame

Claims

1. Method for semantic segmentation of an object (5, 8) in an image, comprising the following method steps:
consecutively acquiring image frames (6, 7, 11),
inputting a first image frame (6) of the consecutively acquired image frames (6, 7, 11) into a convolutional neural network in real time,
examining by the convolutional neural network whether any object (5, 8) can be detected in the first image frame (6),
semantically classifying the detected objects (5, 8) by the convolutional neural network by assigning each detected object (5, 8) to one of a list of predefined object classes,
providing a lookup-table with a priority list which comprises a priority level for each of the predefined object classes,
determining a respective priority level of the detected objects (5, 8) by comparison with the lookup-table,
determining one or more object(s) (5) which have a predefined priority level,
determining a high priority area (9) of the image frame (6) which relates to the or an object (5) with the predefined priority level,
inputting a next image frame (7) of the consecutively acquired image frames (6, 7, 11) into the convolutional neural network in real time, and
analyzing only the high priority area (9) in the next image frame (7) by the convolutional neural network.
2. Method according to claim 1, wherein analyzing only the high priority area (9) in the next image frame (7) by the convolutional neural network is performed by
examining by the convolutional neural network whether any object (5, 8) can be detected in the high priority area (9),
semantically classifying the detected objects (5, 8) by the convolutional neural network by assigning each detected object (5, 8) to one of the list of predefined object classes,
determining a respective priority of the detected objects (5, 8) by comparison with the lookup-table,
determining the one or more object(s) (5) with the predefined priority level,
determining a new high priority area (10) of the image frame which relates to the or an object (5) with the predefined priority level,
inputting a next image frame (11) of the consecutively acquired image frames (6, 7, 11) into the convolutional neural network in real time, and
analyzing only the new high priority area (10) in the next image frame (11) by the convolutional neural network.
3. Method according to claim 2, comprising repeating at least once the step of analyzing only the high priority area in a further next image frame by the convolutional neural network by
examining by the convolutional neural network whether any object (5, 8) can be detected in the high priority area,
semantically classifying the detected objects (5, 8) by the convolutional neural network by assigning each detected object to one of the list of predefined object classes,
determining a respective priority of the detected objects (5, 8) by comparison with the lookup-table,
determining the one or more object(s) (5) with the predefined priority level,
determining a new high priority area of the image frame which relates to the or an object (5) with the predefined priority level,
inputting a further next image frame of the consecutively acquired image frames into the convolutional neural network in real time, and
analyzing only the new high priority area in the next image frame by the convolutional neural network.
4. Method according to any of claims 1 to 3, wherein analyzing only the high priority area (9, 10) in the next image frame (7, 11) by the convolutional neural network is performed by semantically classifying the object (5, 8) by assigning the object (5, 8) to one of the list of predefined object classes.
5. Method according to claim 4, comprising the following method step:
accepting the object class to which the object (5, 8) has been assigned when analyzing only the high priority area in the next image frame (7, 11) as a trustworthy object class.
6. Method according to any of the previous claims, wherein inputting a next image frame (7, 11) of the consecutively acquired image frames (6, 7, 11) into the convolutional neural network in real time is performed by inputting only the high priority area (9, 10) of the next image frame (7, 11) into the convolutional neural network.
7. Method according to any of the previous claims, wherein the step of consecutively acquiring image frames (6, 7, 11) is performed by a camera (3) with a field of view of more than 150°, yielding respective image frames (6, 7, 11) covering an image angle of more than 150°.
8. Use of the method according to any of the previous claims in an automotive vehicle (1).
9. Sensor arrangement (2) for an automotive vehicle (1) configured for performing the method according to any of claims 1 to 8.
10. Non-transitory computer-readable medium, comprising instructions stored thereon, that when executed on a processor, induce a sensor arrangement (2) of an automotive vehicle (1) to perform the method of any of claims 1 to 8.
PCT/EP2018/078192 (priority date 2017-10-20, filing date 2018-10-16): Semantic segmentation of an object in an image, WO2019076867A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102017124600.2A DE102017124600A1 (en) 2017-10-20 2017-10-20 Semantic segmentation of an object in an image
DE102017124600.2 2017-10-20

Publications (1)

Publication Number Publication Date
WO2019076867A1 (en)

Family

Family ID: 63896158

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2018/078192 WO2019076867A1 (en) 2017-10-20 2018-10-16 Semantic segmentation of an object in an image

Country Status (2)

Country Link
DE (1) DE102017124600A1 (en)
WO (1) WO2019076867A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113392837A (en) * 2021-07-09 2021-09-14 超级视线科技有限公司 License plate recognition method and device based on deep learning
GB2607420A (en) * 2021-04-06 2022-12-07 Canon Kk Image processing apparatus and method for controlling the same

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102021003439A1 (en) 2021-07-02 2021-08-19 Daimler Ag Method for drawing attention to at least one occupant in a vehicle
DE102021004931A1 (en) 2021-10-01 2021-12-09 Daimler Ag Method for processing environmental data in a vehicle

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170099200A1 (en) 2015-10-06 2017-04-06 Evolv Technologies, Inc. Platform for Gathering Real-Time Analysis
US9704054B1 (en) 2015-09-30 2017-07-11 Amazon Technologies, Inc. Cluster-trained machine learning for image processing
US20170200063A1 (en) 2016-01-13 2017-07-13 Ford Global Technologies, Llc Low- and high-fidelity classifiers applied to road-scene images

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6400831B2 (en) * 1998-04-02 2002-06-04 Microsoft Corporation Semantic video object segmentation and tracking
US6697502B2 (en) * 2000-12-14 2004-02-24 Eastman Kodak Company Image processing method for detecting human figures in a digital image
US9607224B2 (en) * 2015-05-14 2017-03-28 Google Inc. Entity based temporal segmentation of video streams

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SERGI CAELLES ET AL: "Semantically-Guided Video Object Segmentation", 6 April 2017 (2017-04-06), XP055543131, Retrieved from the Internet <URL:https://arxiv.org/pdf/1704.01926v1.pdf> [retrieved on 20190116] *

Also Published As

Publication number Publication date
DE102017124600A1 (en) 2019-04-25

Legal Events

121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18789091; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 18789091; Country of ref document: EP; Kind code of ref document: A1)