WO2001016885A1

WO2001016885A1 - Method for real-time segmentation of video objects in known stationary image background

Info

Publication number: WO2001016885A1
Application number: PCT/EP2000/008110
Authority: WO
Inventors: Andreas Graffunder; Silko-Matthias Kruse; Ralf Tanger
Original assignee: Deutsche Telekom AG
Priority date: 1999-08-27
Filing date: 2000-08-19
Publication date: 2001-03-08
Also published as: DE19941644A1

Abstract

Methods for segmenting foreground objects in video images have advanced beyond conventional color keying, which requires an image background colored in a specific way. The methods in current use are nevertheless also limited. The Pfinder program, for example, requires certain information about the shape and texture of the foreground object, and the approach of M. Bichsel requires individual background regions to be marked. The inventive method segments objects of any shape or texture and with any number of holes, provided that the color values of foreground and background are not identical in a given image element. To segment foreground objects, the difference between the image to be analyzed and a background image, computed as the per-pixel mean of several recordings, is compared with a threshold set to the maximum noise. Brightness changes in the image background are compensated by adaptation, and the adaptation is linked to the segmentation in such a way that the method works precisely and stably. If several color channels are evaluated, only matching results are used for further analysis.

Description

Method for real-time segmentation of video objects against a known stationary image background


The invention relates to the segmentation (separation) of foreground objects (persons) in recorded video images for further processing in real time.

In the prior art, various methods are known for separating ("cutting out") conversation participants from recorded video images. The traditional approach is color keying, e.g. blue screen. In color keying, the objects to be segmented are recorded against a known, homogeneously colored image background, and every pixel whose color value lies within a certain neighborhood of the color key in color space is interpreted as belonging to the background. Such a classification can readily be carried out in real time. Consequently, such methods are a basic technology of today's television and video engineering. The disadvantage of this approach is that it requires an image background tinted in a specific way.
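The color-keying rule described above can be illustrated with a minimal sketch. The function name `is_background`, the Euclidean distance metric, the key color and the radius are all assumptions chosen for illustration; the text only states that a pixel counts as background when its color lies within a certain neighborhood of the color key.

```python
def is_background(pixel, color_key, radius):
    """Return True if the pixel color lies within `radius` of the color key.

    The Euclidean metric in color space is an assumption; the text only
    speaks of "a certain neighborhood" around the color key.
    """
    dist_sq = sum((p - k) ** 2 for p, k in zip(pixel, color_key))
    return dist_sq <= radius ** 2


BLUE_KEY = (0, 0, 255)  # hypothetical blue-screen key color (R, G, B)

print(is_background((10, 5, 250), BLUE_KEY, radius=30))     # close to the key
print(is_background((200, 180, 160), BLUE_KEY, radius=30))  # far from the key
```

Because the test is a single distance comparison per pixel, it is easy to see why such a classification runs in real time, and also why it depends on a specially colored background.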

The "Pfinder" program, which was developed in the "Intelligent Room" project at MIT, works on a somewhat different basis (see Christopher Wren, Ali Azarbayejani, Trevor Darrel, Alex Pentland: "Pfinder: Real-Time Tracking of the Human Body", IEEE Transactions on Pattern Analysis and Machine Intelligence, July 1997, vol. 19, no. 7, pp. 780-785, and Christopher Wren, Ali Azarbayejani, Trevor Darrel, Alex Pentland: "Real-Time Tracking of the Human Body", http://www-hite.media.mit.eda/visniod/demos/pimder).

In this method, the image pixels are classified as belonging either to the scene background or to a "blob" by means of a Bayes classifier and a maximum a posteriori (MAP) estimate. "Blobs" are simple geometric primitives for describing the foreground objects. Because this method does not use simple thresholds but a comparatively expensive MAP estimate, it must be regarded as considerably slower. On an SGI Indy with a 175 MHz R4400 CPU and Vino video, Pfinder segments 10 frames per second, with the input images reduced to 1/16 of their original size (160 x 120).

The method presented by M. Bichsel aims at real-time segmentation (see Martin Bichsel, "Segmenting simply connected moving objects in a static scene", Pattern Analysis and Machine Intelligence, 16(11): 1138-1142, Nov. 1994).

This method, however, does not compute the difference from a background image. Instead, it determines the segment of a foreground object from the gradient information present in the difference image between pairs of input images. This method is also based on a Bayes classifier, is comparatively slow, and is therefore not in direct competition with the one described here. On a SUN SPARCstation 1+, 3.0 seconds were required for an image of 256 x 240 pixels.

The present invention comprises a method aimed at the real-time segmentation of conversation participants that is required in new multimedia applications (virtual shop, virtual telephone conference). In this special application, a stationary camera can generally be assumed. Initialization without a foreground object is also possible, so that a background image can be captured. These preconditions permit the use of fast algorithms, although a compromise must be found between the quality of the segmentation and the speed of the method.

In contrast to the "Pfinder" approach, no information about the shape or texture of the foreground object is presupposed. The present invention can segment objects of arbitrary shape and texture with any number of holes, provided that the color values of foreground and background are not identical in a given image element. In contrast to M. Bichsel's approach, no marking of individual background regions is needed.

In principle, an algorithm evaluates, for each pixel, the difference between the image to be analyzed and the known background. If the difference for a pixel is greater than a threshold, the pixel is marked as foreground, otherwise as background. To obtain the background, it is first recorded several times without a foreground object. The mean of these recordings is then stored as the background image. Averaging reduces noise. Next, the maximum noise is estimated for each pixel, and the threshold for the foreground/background decision is set to this maximum noise. The idea is that gray values (color values) that deviate from the stored background by more than the maximum noise must belong to the foreground.
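The principle can be followed on a single pixel with a few made-up gray values. The sketch below is illustrative only: the function names `learn_pixel_model` and `is_foreground` and all sample values are assumptions, and the decision uses a strict "greater than" comparison as stated in the paragraph above.

```python
def learn_pixel_model(mean_samples, noise_samples):
    """Learn background value and threshold for one pixel.

    mean_samples:  gray values of the pixel in the first image series
    noise_samples: gray values of the same pixel in a further series
    """
    background = sum(mean_samples) / len(mean_samples)  # arithmetic mean
    # Threshold = largest absolute deviation from the mean (maximum noise).
    threshold = max(abs(s - background) for s in noise_samples)
    return background, threshold


def is_foreground(value, background, threshold):
    """Foreground if the deviation exceeds the maximum noise."""
    return abs(value - background) > threshold


# Hypothetical samples for one pixel of an empty scene.
background, threshold = learn_pixel_model([100, 102, 98, 100], [101, 97, 100])
print(background, threshold)                      # 100.0 3.0
print(is_foreground(102, background, threshold))  # False: within the noise
print(is_foreground(110, background, threshold))  # True: exceeds the noise
```

The threshold here is only 3 gray levels, which shows both the precision of the approach and its sensitivity: a uniform brightening of the scene by 4 levels would already flip every pixel to foreground unless the background is adapted.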

The advantage of this method is that the very small thresholds allow the foreground object to be segmented very accurately. The disadvantage is that even the slightest changes in illumination, e.g. from clouds, alter the background so much that it would be detected as foreground. For this reason, a powerful adaptation was developed that detects and compensates for both very local and global illumination changes. The interplay of the very accurate segmentation and the adaptation yields a very precise and stable method whose performance clearly exceeds that of previously known approaches.

The method involves extracting an object from a color image signal that is available as digital components for 3 color channels. The method works in any color space, e.g. RGB or YUV, provided that all color channels have the same resolution. The color channels are processed in identically constructed, independently operating subsystems (segmentation sections). These segmentation sections can be used in 2 operating modes: initialization mode and segmentation mode. In a downstream system (multiplexer), the results of the subsystems are combined into a digital signal that contains, for each element of the input signal, a flag indicating whether or not the input element belongs to the object sought (result mask).

The technical problem underlying the invention is to extract the object exactly from each color channel with minimal effort, even when the illumination of the areas surrounding the object changes, and to combine the information of the color channels in such a way that as few defects as possible occur both in the object sought and in the background.

The solution according to the invention provides that the following are carried out in the segmentation section:

1. Initialization mode

• recording a series of images, ensuring that the object sought is not in the images;

• averaging the image series in the initialization section, in the sense that the arithmetic mean of the corresponding pixels of the images is computed and stored in a mean image;

• transferring the mean image from the initialization section to the background buffer;

• recording a further series of images, again ensuring that the object sought is not in the images;

• computing the maximum noise per pixel in the initialization section by comparing the pixels of all images recorded for noise estimation with the corresponding pixels of the mean image from the background buffer and storing, for each pixel, the maximum difference between the recorded images and the background buffer;

• transferring the noise image computed from the maximum distances from the initialization section to the threshold buffer;

• switching the adaptation section to initialization mode.

2. Segmentation mode

• transferring the content of the threshold buffer to the threshold sections;

• transferring the adapted background buffers from the adaptation section to the threshold sections;

• in the threshold sections, comparing each pixel of the current image with the adapted background. If the distance is smaller than the value of the corresponding entry in the threshold buffer, the corresponding element of the output image is marked as background, otherwise as foreground, i.e. as belonging to the object sought;

• combining the results of the two threshold sections pixel by pixel with an AND gate;

• transferring the combined results to the adaptation section for recomputation of the content of the adaptation buffers;

and that the following are carried out in the adaptation section:

1. Initialization mode

• initializing a pixel adaptation buffer with 0, the size of the buffer being chosen such that there is an entry for each pixel;

• initializing a brightness adaptation buffer with 0, the size of the buffer being chosen such that there is an entry for each possible brightness level;

2. Adaptation mode

• initializing a series of accumulators with 0, providing as many accumulators as there are possible brightness levels;

• processing all pixels currently detected as background and performing the following functions:

comparing each pixel with the corresponding pixel from the background buffer by computing the difference of the brightness levels;

storing the difference in the corresponding field of the pixel adaptation buffer;

accumulating the difference in the accumulator assigned to the brightness level of the corresponding pixel of the background buffer;

• post-processing the accumulators by dividing each entry by the number of accumulated differences;

• computing a pixel-adapted background buffer by adding the entries of the pixel adaptation buffer to the corresponding entries of the background buffer;

• computing a brightness-adapted background buffer by adding to each entry of the background buffer the content of the accumulator assigned to the brightness of the respective element (pixel) of the background buffer;

and that the following are carried out in the multiplexer section:

• combining the results of the threshold sections such that:

• results that agree in all 3 segmentation sections are taken directly into the result mask;

• for pixels where the results of the 3 segmentation sections do not agree, the entry in the result mask is determined from the negated local predecessor of the respective entry in the result mask.
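The multiplexer rule at the end of the list above can be sketched as follows. Several details are assumptions made only for illustration: the "local predecessor" is taken to be the previously written entry of the result mask in row-major scan order, the very first pixel uses a hypothetical `initial_predecessor` value, and the name `multiplex` is invented.

```python
def multiplex(mask_a, mask_b, mask_c, initial_predecessor=1):
    """Combine three binary channel masks (0 = background, 1 = object).

    Agreeing entries are copied; disagreeing entries take the negated value
    of the previous result entry in row-major order (assumed reading of
    "local predecessor").
    """
    result = []
    predecessor = initial_predecessor  # assumption for the first pixel
    for row_a, row_b, row_c in zip(mask_a, mask_b, mask_c):
        out_row = []
        for a, b, c in zip(row_a, row_b, row_c):
            if a == b == c:
                value = a                # all three channels agree
            else:
                value = 1 - predecessor  # negated local predecessor
            out_row.append(value)
            predecessor = value
        result.append(out_row)
    return result


ea = [[1, 1, 0, 0]]
eb = [[1, 0, 0, 1]]
ec = [[1, 1, 0, 0]]
print(multiplex(ea, eb, ec))  # [[1, 0, 0, 1]]
```

In this example the second and fourth pixels are disputed between the channels, so each receives the opposite of the entry written just before it.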

The operations that are of decisive influence for the technical teaching of the invention include, in particular, the nature of the threshold buffer, which provides an entry for each pixel; the way in which the individual entries of the threshold buffer are computed; the nature and design of the adaptation section; and the way in which the segmentation section and the adaptation section are linked.

The embodiment of the invention is described in the explanations of Figures 1 to 4, which show schematically:
Fig. 1: the overall system for extracting an object from a color image signal, as a block diagram
Fig. 2: the segmentation section, as a block diagram
Fig. 3: the threshold section, as a block diagram
Fig. 4: the adaptation section, as a block diagram

Figure 1 shows a schematic system overview subdivided into four signal processing sections, some of which consist of parallel subsystems. The incoming color image signals may be defined in any color space, e.g. RGB or YUV, and are therefore denoted generally as A, B and C. The three color signals enter three identically constructed segmentation sections 100 (Seg. Sect. = Segmentation Section) operating in parallel. Each segmentation section delivers a result Ea, Eb or Ec for its color channel. This result is a binary image of the same size as the input image and contains, for each element (pixel) of the image, a 0 if the element does not belong to the object and a 1 if it does. The three results then enter a multiplexer section 200 (MUX Sect. = Multiplexer Section), which combines the individual results Ea, Eb and Ec into a final result E.

The operation of the segmentation sections 100 is explained in more detail with reference to Figure 2. The segmentation section can be used in initialization mode and in segmentation mode.

In initialization mode, the initialization section 110 (Init. Sect. = Initialization Section), the threshold buffer 120 (Thresh. Buf. = Threshold Buffer), the background buffer 130 (Back. Buf. = Background Buffer) and the adaptation section 140 (Adapt. Sec. = Adaption Section) are required. The initialization section 110 is supplied with a series of images that must not contain the foreground object sought. The arithmetic mean of these images is computed element by element. The result is an averaged and relatively noise-free background image, which is transferred to the background buffer 130. The initialization section 110 is then supplied with a further series of images, which likewise must not yet contain the object sought. By subtracting the individual elements of these images from the corresponding elements of the image in the background buffer 130, the distance from the background is computed for each element. The maximum distance found for each element over the series of images is transferred to the threshold buffer 120. After initialization, the threshold buffer thus contains, for each element of the images, a threshold corresponding to the maximum absolute noise during the initialization phase. To conclude the initialization, the adaptation section 140 is instructed to perform an initialization as well.
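The initialization of one segmentation section might be sketched as follows for a single color channel. Images are represented as nested lists of gray values; the function names `mean_image`, `noise_image` and `initialize` are invented for illustration and do not appear in the patent.

```python
def mean_image(frames):
    """Element-wise arithmetic mean of equally sized single-channel frames."""
    count = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / count for x in range(width)]
            for y in range(height)]


def noise_image(frames, background):
    """Per-element maximum absolute distance between frames and background."""
    height, width = len(background), len(background[0])
    return [[max(abs(f[y][x] - background[y][x]) for f in frames)
             for x in range(width)]
            for y in range(height)]


def initialize(mean_frames, noise_frames):
    """Return (background buffer, threshold buffer) for one color channel."""
    background_buffer = mean_image(mean_frames)
    threshold_buffer = noise_image(noise_frames, background_buffer)
    return background_buffer, threshold_buffer


# Two hypothetical 1 x 2 frames per series, recorded without the object.
background, thresholds = initialize(
    mean_frames=[[[10, 20]], [[12, 20]]],
    noise_frames=[[[12, 21]], [[9, 20]]],
)
print(background)  # [[11.0, 20.0]]
print(thresholds)  # [[2.0, 1.0]]
```

Each element ends up with its own threshold, so a noisy region of the sensor gets a looser threshold than a quiet one.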

In segmentation mode, all elements of the segmentation section except the initialization section 110 are required. Essential to segmentation mode are two identical threshold sections 150 (Thresh. Sect. = Threshold Section) arranged in parallel, whose structure is shown in Figure 3. Both threshold sections are fed with the current image and the content of the threshold buffer, but with different adapted background images. The adaptation of the background image is explained in the description of the adaptation section. In the threshold section, the subtraction section 151 (Sub. Sect. = Subtraction Section) computes the distance of each element of the current image from the corresponding element of the adapted background image. The comparison section 152 (Comp. Sect. = Comparison Section) then compares the distances with the corresponding entries of the threshold buffer. For elements whose distance is smaller than the threshold, a 0 for background is entered in the result element; for the remaining elements, a 1 for foreground. The results of the two threshold sections are then combined by an ordinary AND gate 160. Thus only those elements are marked as belonging to the object that both threshold sections have recognized as such. The combined result is then passed to the adaptation section 140, which is switched to adaptation mode.
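A minimal sketch of the segmentation mode for one color channel follows. It assumes that the two "different adapted background images" are the pixel-adapted and the brightness-adapted background buffers from the adaptation section; the names `threshold_section` and `segment` and the sample values are invented for illustration.

```python
def threshold_section(image, background, thresholds):
    """Subtraction plus comparison: 0 = background, 1 = foreground.

    An element is background only if its distance is strictly smaller
    than its entry in the threshold buffer.
    """
    return [[0 if abs(p - b) < t else 1
             for p, b, t in zip(img_row, bg_row, th_row)]
            for img_row, bg_row, th_row in zip(image, background, thresholds)]


def segment(image, pixel_adapted_bg, brightness_adapted_bg, thresholds):
    """Run both threshold sections and AND their results element by element."""
    mask_a = threshold_section(image, pixel_adapted_bg, thresholds)
    mask_b = threshold_section(image, brightness_adapted_bg, thresholds)
    return [[a & b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]


image = [[100, 150, 103]]
pixel_bg = [[100, 100, 100]]       # hypothetical pixel-adapted background
brightness_bg = [[100, 100, 104]]  # hypothetical brightness-adapted background
thresholds = [[3, 3, 3]]
print(segment(image, pixel_bg, brightness_bg, thresholds))  # [[0, 1, 0]]
```

The third element shows the effect of the AND gate: one threshold section flags it as foreground, the other does not, so it is not marked as belonging to the object.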

The operation of the adaptation section 140 is explained with reference to FIG. 4. The adaptation section can be used in initialization mode and in adaptation mode.

The adaptation section 140 supports both a spatially local adaptation OA (pixel adaptation, element adaptation) and a brightness-local adaptation LA (luminance adaptation). The OA delivers good results in image regions at a clear distance from the object. The LA makes it possible to use the information obtained at a clear distance from the object also at the immediate object boundary and in background regions occluded by the object.

In initialization mode, the pixel adaptation buffer (Pix. Adap. Buf. = Pixel Adaption Buffer) 142 and the luminance adaptation buffer (Lum. Adap. Buf. = Luminance Adaption Buffer) 145 are initialized with 0. The pixel adaptation buffer must be large enough to hold one entry for each picture element (pixel); the luminance adaptation buffer must be large enough to hold one entry for each brightness level.

In adaptation mode, the subtraction section 141 is first fed with the background image Back stored in the background buffer 130, the current image X, and the result of the threshold sections. Note that the subtraction sections 141 and 151 are not identical. For all pixels that were detected as background, the distance between the current image and the stored background image is then calculated. The newly calculated distances are transferred to the pixel adaptation buffer 142; the remaining entries of the pixel adaptation buffer are retained. The accumulator array 146 (Akku. Arr. = Accumulator Array) is then initialized with 0. The accumulator array consists of a series of accumulators, one for each brightness level. The distances newly calculated in the subtraction section 141 are then accumulated (added) in the accumulator array 146 such that each difference is added to the cell of the accumulator array that corresponds to the luminance of the corresponding element in the background buffer 130. In other words, the accumulator array 146 sums the differences calculated in the subtraction section 141, grouped by the brightness of the corresponding background pixel. In the accumulator analysis section (Akku. Ana. Sect. = Accumulator Analysis Section) 147, each value of the accumulator array is then divided by the number of elements whose differences contributed to that accumulator value. The results are transferred to the luminance adaptation buffer (Lum. Adap. Buf. = Luminance Adaption Buffer) 145. In the pixel adaptation section (Pix. Adap. Sect. = Pixel Adaption Section) 143, the pixel-adapted background PA.Back is then obtained by adding the content of the pixel adaptation buffer 142 to the background image Back. In the luminance adaptation section (Lum. Adap. Sect. = Luminance Adaption Section) 144, the luminance-adapted background LA.Back is obtained by adding to each element of the background image Back the corresponding entry of the luminance adaptation buffer 145.
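One adaptation step can be sketched as follows for a single 8-bit channel. Several points are assumptions that the description leaves open: the difference is taken as the signed value X minus Back (so that adding it to Back reproduces the current background), the channel value itself serves as the brightness level, and a luminance buffer entry is kept unchanged when no background pixel of that brightness was observed. The function name `adapt` and the array layout are hypothetical.

```python
import numpy as np

def adapt(back, current, mask, pix_buf, lum_buf):
    """Hypothetical sketch of one pass through adaptation section 140.

    back:    stored background image (non-negative integer levels < lum_buf.size)
    current: current image X
    mask:    result of the threshold sections (0 = background, 1 = foreground)
    pix_buf: pixel adaptation buffer 142, one entry per pixel
    lum_buf: luminance adaptation buffer 145, one entry per brightness level
    """
    back_i = back.astype(np.int64)
    diff = current.astype(np.float64) - back_i      # assumed signed difference
    bg = mask == 0

    # Pixel adaptation: overwrite entries of background pixels, keep the rest.
    pix_buf = pix_buf.copy()
    pix_buf[bg] = diff[bg]

    # Accumulator array 146: sum differences per brightness level of Back.
    levels = lum_buf.size
    acc = np.bincount(back_i[bg], weights=diff[bg], minlength=levels)
    count = np.bincount(back_i[bg], minlength=levels)

    # Accumulator analysis 147: mean difference per level.
    # Assumption: levels without samples keep their previous buffer entry.
    lum_buf = lum_buf.copy()
    filled = count > 0
    lum_buf[filled] = acc[filled] / count[filled]

    pa_back = back_i + pix_buf                      # pixel-adapted background PA.Back
    la_back = back_i + lum_buf[back_i]              # luminance-adapted background LA.Back
    return pa_back, la_back, pix_buf, lum_buf

# Small example with made-up values; the lower-right pixel is foreground.
back = np.array([[10, 10], [20, 20]], dtype=np.uint8)
current = np.array([[12, 14], [25, 90]], dtype=np.uint8)
mask = np.array([[0, 0], [0, 1]], dtype=np.uint8)
pa_back, la_back, pix_buf, lum_buf = adapt(
    back, current, mask, np.zeros((2, 2)), np.zeros(256))
```

In the example, the occluded lower-right pixel keeps its stored value in PA.Back but receives the correction learned from the visible pixel of the same brightness in LA.Back, which is the behavior the description attributes to the luminance adaptation.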

Finally, the results Ea, Eb and Ec of the three segmentation sections 100 are passed to the multiplexer section (MUX Sect. = Multiplexer Section) 200. The results are combined element by element. Where all three individual results agree, that result is adopted directly. Where the individual results do not agree, the negated result of the spatial predecessor of the element being processed is adopted. The resulting output E can then, if required, be post-processed with conventional morphological filters, such as a median filter.
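The combination rule of the multiplexer section can be sketched as below. The description does not state which element counts as the "spatial predecessor" or what applies to the very first element; this sketch assumes the previously processed element of the output E in raster-scan order and a configurable start value (`first_predecessor`, a hypothetical parameter).

```python
import numpy as np

def multiplex(ea, eb, ec, first_predecessor=0):
    """Hypothetical sketch of multiplexer section 200 for binary masks."""
    flat_a, flat_b, flat_c = (np.asarray(m).ravel() for m in (ea, eb, ec))
    out = np.empty(flat_a.size, dtype=np.uint8)
    prev = int(first_predecessor)
    for i in range(flat_a.size):
        if flat_a[i] == flat_b[i] == flat_c[i]:
            out[i] = flat_a[i]          # all three channels agree
        else:
            out[i] = 1 - prev           # negated result of the predecessor
        prev = int(out[i])
    return out.reshape(np.shape(ea))

# Small example with made-up channel results.
ea = np.array([[1, 1, 0], [0, 1, 0]])
eb = np.array([[1, 0, 0], [0, 1, 1]])
ec = np.array([[1, 1, 0], [1, 1, 0]])
e = multiplex(ea, eb, ec)
```

The loop is sequential on purpose: each disputed element depends on the output already chosen for its predecessor, so the rule cannot be evaluated independently per element under this reading.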

Claims

Claims (5)

1. A method for real-time segmentation of video objects against a known stationary image background, characterized in that - for each image pixel, the difference between the image to be analyzed and the background image calculated by averaging several recordings is compared with a threshold value set to the maximum noise, and the pixel is marked as foreground if the threshold value is exceeded and as background otherwise, - changes in brightness in the background image are compensated for by adaptation so that image pixels are not detected incorrectly, adaptation and segmentation are linked with one another such that a precise separation of the foreground objects from the scene background is achieved, and, for the evaluation of several color channels, the results are adopted directly if they agree and the negated spatial predecessor is used otherwise.

2. The method according to claim 1, characterized in that the mean value for the image background is calculated as the arithmetic mean of the respectively corresponding pixels of a recorded image series without a foreground object.

3. The method according to claims 1 and 2, characterized in that the threshold value is calculated from the maximum difference between the respectively corresponding pixels of an image series recorded for noise estimation without a foreground object and the mean value of the background image.

4. The method according to claims 1 to 3, characterized in that, for the segmentation, the differences between the corresponding pixels of a recorded image with a foreground object and the adapted background images are compared with the threshold value, and the pixel is marked as foreground when the threshold value is exceeded in each case.

5. The method according to claims 1 to 4, characterized in that the compensation of changes in brightness in the background image is achieved in that

- first, for each pixel of the current background, the difference from the corresponding pixel of the stored background image is formed, and the difference is both stored in the corresponding field for the pixel adaptation and added to the accumulator provided for the brightness level of the corresponding pixel of the stored background image,

- in a post-processing step, the entries in the accumulator are divided by the number of accumulated differences, and both a pixel-adapted background image, obtained by adding the entries for the pixel adaptation to the corresponding entries of the stored background image, and a brightness-adapted background image, obtained by adding to the entries of the stored background image the content of the accumulator provided for the brightness of the respectively corresponding pixel of the stored background image, are calculated.
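The initialization referred to in claims 2 and 3 can be illustrated with a minimal sketch for one color channel. The function name `initialize`, the optional separate noise series, and the use of the absolute deviation as the "difference" are assumptions for illustration only.

```python
import numpy as np

def initialize(frames, noise_frames=None):
    """Hypothetical sketch: background mean (claim 2) and threshold (claim 3).

    frames:       image series without a foreground object, shape (n, h, w)
    noise_frames: optional separate series for noise estimation; if omitted,
                  the same series is reused (an assumption).
    """
    stack = np.asarray(frames, dtype=np.float64)
    background = stack.mean(axis=0)                 # arithmetic mean per pixel
    noise = stack if noise_frames is None else np.asarray(noise_frames, dtype=np.float64)
    thresholds = np.abs(noise - background).max(axis=0)  # maximum deviation per pixel
    return background, thresholds

# Three made-up frames of a 1 x 2 image.
frames = [[[10, 20]], [[12, 20]], [[14, 26]]]
background, thresholds = initialize(frames)
```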