EP2756458A1 - Maschinelles lernverfahren zum maschinellen erlernen von erscheinungsformen von objekten in bildern - Google Patents
Machine learning method for machine learning of the appearances of objects in images
- Publication number
- EP2756458A1 (application EP12769887.6A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- image
- training
- feature
- images
- classification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Definitions
- Machine learning method for machine learning of the appearances of objects in images
- The invention relates to a machine learning method for automatically learning the appearances of objects in images, in the form of object features, from training images, so that the learned object features can be used in an image processing system, as well as to an apparatus for performing the method.
- Such an image processing system can be an object recognition system, an object tracking system or an image registration system.
- The task of object recognition systems is to locate and classify objects (e.g., vehicles or people) in digital images. They are used, for example, in motor vehicles, where the environment, and in particular the area in front of the vehicle, is to be examined for objects such as other vehicles or pedestrians, or in robotics, where a freely movable robot searches its surroundings for specific objects.
- The task of object tracking systems is to find an object (e.g., a vehicle or a person) again in an image of an image sequence, provided that its location, extent, and appearance are known in one or more previous images of the sequence.
- The task of image registration systems is to determine image transformations (e.g., translations) between two images such that applying the transformation brings the images into coincidence.
- Panoramic imaging methods bring the overlapping areas of two images into coincidence to create an overall image (so-called stitching). The necessary transformation information can be determined from the relative positions of the image contents in the two images.
- The supervised machine learning of an object recognition system uses a preferably large number of annotated training images that represent both the image contents of the objects to be learned and their image backgrounds.
- An image area around an image position at which an object to be learned is located in the training image is referred to as a positive training example (positive annotation).
- Image areas in the training image where there are no objects to learn (in the image background) are referred to as negative training examples (negative annotation).
- A basic problem here is the need to process a preferably large number of positive and negative training examples in order to capture the possibly diverse appearances of backgrounds and objects.
- Fast processing of a large number of training examples is therefore of great interest both from a functional point of view (training on a larger variance of appearances) and from an operational point of view (time and processing effort).
- In object tracking systems, the annotated training images are given by the images of an image sequence in which the position, extent and appearance of the object to be tracked are already known or annotated from previous images of the sequence.
- An initial annotation can be effected, for example, by a user (marking of the object to be tracked), by an object recognition system or by the detection of moving objects.
- In image registration, one of the two images is interpreted as the training image, the other as the test image.
- The positive annotations in the training image must be chosen, in number and location, specifically for the registration task and the transformation information to be determined.
- One or more positive annotations could be selected at fixed positions in the expected overlap area of both images (e.g., at the right edge of the image). The rest of the image is negatively annotated.
- Positive annotations may be generated by manual or automatic determination of prominent image areas, i.e. by determining image areas that are particularly suitable for being found again in the test image (e.g., highly structured image areas). If more than two images (e.g., an image sequence) are to be registered with each other, positive and negative annotations may be selected appropriately in more than one image of the sequence (in the sense of multiple training images).
- The prior art is the explicit generation of a large number of positive and negative training examples in the form of feature data vectors, which are then processed explicitly in a machine learning approach (e.g., support vector machine or neural network).
- The conventional methods solve this problem in discretized form.
- Individual training examples are extracted discretely at the areas determined by the annotation images and converted into individual feature data vectors. Since a large number of such feature data vectors can be obtained from a single feature image by overlapping regions in the image plane, typically only a small subset is selected in this step to reduce the computational effort. The validity of the object feature contributions that can be obtained from a training image in a single processing step is consequently limited.
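For comparison, the conventional explicit approach described above can be sketched as follows. This is a minimal illustration and not code from the patent: the function name `extract_patch_vectors`, the region size and the toy feature image are assumptions chosen for the example. It shows how every overlapping region becomes its own feature data vector.

```python
import numpy as np

def extract_patch_vectors(feature_image, patch_h, patch_w):
    """Cut every overlapping region out of a feature image and flatten it
    into one feature data vector per image position (hypothetical helper)."""
    height, width = feature_image.shape
    vectors = []
    for y in range(height - patch_h + 1):
        for x in range(width - patch_w + 1):
            patch = feature_image[y:y + patch_h, x:x + patch_w]
            vectors.append(patch.ravel())
    return np.array(vectors)

# Toy 6x6 feature image (assumed data, for illustration only).
feature_image = np.arange(36, dtype=float).reshape(6, 6)
vectors = extract_patch_vectors(feature_image, 3, 3)
print(vectors.shape)  # (16, 9)
```

For an H x W feature image and h x w regions, the number of fully contained overlapping regions is (H - h + 1)(W - w + 1), which is why conventional methods typically keep only a subset.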
- The object of the invention is to enable the rapid processing of a large number of positive and negative training examples (annotations) in the training of an image processing system.
- At least one training image contains the representation of an object to be learned, and the associated annotation images have positive annotation values (annotations) at the positions of objects in the training image.
- Linear filtering operations are standard image and signal processing operations (see, e.g., R. C. Gonzalez, R. E. Woods, Digital Image Processing, Third Edition, Pearson Prentice Hall).
- The invention enables training on a greater variance of object and background appearances, thereby increasing the robustness of the system when applied to untrained images.
- The invention allows for faster training runs. This makes possible
- The invention can be implemented on hardware architectures with lower processing speeds (e.g., mobile hardware architectures).
- FIG. 1: a schematic overview of the learning unit according to the invention
- FIG. 2: a schematic representation of the operation of the classification unit
- FIG. 3: a schematic representation of the operation of the fusion unit
- FIG. 4: an exemplary representation of the filtering process in the fusion unit
- FIG. 1 schematically illustrates the learning unit 10 according to the invention. This comprises at least one training image unit 12, a feature extraction unit 14, a classification unit 16 and a feature fusion unit 18.
- A further, optional subunit, the initialization unit, serves exclusively for initializing object features and is therefore not shown in FIG. 1.
- The task of the learning unit 10 is to capture the appearance of objects and backgrounds in training images 20 in an efficient manner. This is done by determining the object feature contributions of each training image 20. Running the learning unit 10 on a plurality of training images 20 makes it possible to combine the sought object features from the object feature contributions of the individual training images 20. One embodiment of the combination of the object feature contributions is their averaging.
- The task of the initialization unit (not shown) is to provide an initial estimate of the object features. One embodiment of the initialization unit is a random or uniform initialization of the object features. An alternative embodiment uses the training image unit and the feature extraction unit to obtain an initial estimate of the object features based on the objects imaged in the training images.
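The initialization and the averaging of object feature contributions described above might look like the following sketch. The function names `init_object_features` and `combine_contributions`, the normalization of the uniform case and the use of a standard normal distribution for the random case are assumptions made for illustration; the patent does not specify them.

```python
import numpy as np

def init_object_features(shape, mode="uniform", seed=0):
    """Return an initial estimate of the object features (hypothetical helper)."""
    if mode == "uniform":
        # Assumption: all entries equal, scaled so that they sum to one.
        return np.full(shape, 1.0 / (shape[0] * shape[1]))
    if mode == "random":
        # Assumption: standard normal entries from a seeded generator.
        rng = np.random.default_rng(seed)
        return rng.standard_normal(shape)
    raise ValueError(f"unknown initialization mode: {mode}")

def combine_contributions(contributions):
    """Combine per-image object feature contributions by averaging."""
    return np.mean(np.stack(contributions, axis=0), axis=0)

initial = init_object_features((3, 3), mode="uniform")
combined = combine_contributions([np.ones((3, 3)), 3.0 * np.ones((3, 3))])
print(combined[0, 0])  # 2.0
```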
- The task of the training image unit 12 is to provide training images 20 and annotation images 22.
- The training images 20 may be real sensor images, synthetic images generated by computer graphics, or mixed forms of both.
- The training image unit 12 provides an annotation image 22. The annotation image 22 indicates at which image positions in the training image 20 objects to be learned are located (positive annotations). Image positions in the training image 20 at which no objects to be learned are located (e.g., in the image background) are negatively annotated. Image sections in the training image background of the same size as the objects to be learned are referred to as negative training examples.
- FIG. 1 symbolically shows a training image 20 with the associated annotation image 22. For simpler representation, the image plane is divided into a simple 3x3 grid.
- An advantageous embodiment of the training image unit 12 for an object recognition system is a computer graphics system in which the objects to be trained can be generated synthetically from 3D models, in any number, against arbitrary backgrounds, at a known camera position and under any display conditions (e.g., illumination).
- The task of the feature extraction unit 14 is the conversion of a training image 20 into one or more feature images 24.
- A simple embodiment of the feature extraction unit 14 is the generation of an edge image by an edge-based operation.
- Several feature images 24 can be obtained, for example, by applying a filter bank with directional filters.
- FIG. 1 symbolically shows the result of an edge-based operation as a feature image 24.
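A feature extraction step of this kind could be sketched as below. The patent does not name a specific edge operator, so the Sobel masks, the helper `correlate_same` and the function `extract_feature_images` are assumptions used purely for illustration. The sketch produces one edge-magnitude image and two directional feature images.

```python
import numpy as np

def correlate_same(image, mask):
    """Correlate image with an odd-sized mask; zero padding, same output size."""
    mh, mw = mask.shape
    ph, pw = mh // 2, mw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))
    out = np.zeros(image.shape, dtype=float)
    for dy in range(mh):
        for dx in range(mw):
            out += mask[dy, dx] * padded[dy:dy + image.shape[0],
                                         dx:dx + image.shape[1]]
    return out

def extract_feature_images(training_image):
    """Return an edge image and a small bank of directional feature images."""
    sobel_x = np.array([[-1.0, 0.0, 1.0],
                        [-2.0, 0.0, 2.0],
                        [-1.0, 0.0, 1.0]])
    gx = correlate_same(training_image, sobel_x)    # horizontal gradient
    gy = correlate_same(training_image, sobel_x.T)  # vertical gradient
    edge_image = np.hypot(gx, gy)                   # gradient magnitude
    return edge_image, [np.abs(gx), np.abs(gy)]

# Toy training image with a single vertical step edge (assumed data).
training_image = np.zeros((5, 6))
training_image[:, 3:] = 1.0
edge_image, directional = extract_feature_images(training_image)
print(edge_image[2, 2])  # 4.0
```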
- The task of the classification unit 16 is the conversion of a feature image 24 into a classification image 26.
- The entries of the classification image 26, referred to as classification responses, are a measure of the similarity between the object features and the feature image 24 in the local environment of the corresponding image position. Larger classification responses indicate greater similarity.
- The object features 28 fed to the classification unit 16 originate either from the initialization unit (not shown) or from the combination (e.g., averaging) of previously determined object feature contributions from training images 20.
- A preferred embodiment of the classification unit 16 for calculating the similarity measure is an image correlation between the object features and the feature image, as shown in FIG. 2. If more than one feature image 24 has been generated per training image 20 in the feature extraction unit 14, the classification unit 16 is applied to each feature image 24.
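The image correlation used as a similarity measure can be illustrated with a small sketch. The helper `correlate_same`, the plus-shaped object features and the 7x7 feature image are assumptions for the example, and zero padding at the image border is one possible convention that the patent does not prescribe.

```python
import numpy as np

def correlate_same(image, mask):
    """Correlate image with an odd-sized mask; zero padding, same output size."""
    mh, mw = mask.shape
    ph, pw = mh // 2, mw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))
    out = np.zeros(image.shape, dtype=float)
    for dy in range(mh):
        for dx in range(mw):
            out += mask[dy, dx] * padded[dy:dy + image.shape[0],
                                         dx:dx + image.shape[1]]
    return out

# Hypothetical object features: a plus-shaped 3x3 pattern.
object_features = np.array([[0.0, 1.0, 0.0],
                            [1.0, 1.0, 1.0],
                            [0.0, 1.0, 0.0]])

# Toy feature image that contains the pattern centered at row 3, column 3.
feature_image = np.zeros((7, 7))
feature_image[2:5, 2:5] = object_features

# Each entry of the classification image is the classification response
# of the object features against the local neighborhood of that position.
classification_image = correlate_same(feature_image, object_features)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(classification_image),
                                              classification_image.shape))
print(peak)  # (3, 3)
```

The largest classification response occurs where the local neighborhood matches the object features best, which is the behavior the text describes.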
- The task of the feature fusion unit 18 is to fuse a possibly large number of differently weighted regions of the feature image 24 by addition in the most efficient manner possible, and thus to determine the sought feature contribution 30 of a training image 20 to the object features.
- The feature fusion unit 18 uses the annotation image 22 and the classification image 26.
- The mode of operation of the feature fusion unit 18 is shown symbolically in FIG. 3 and can be subdivided into two steps.
- At positively annotated image positions, a high classification response should occur with optimally chosen object features. If this is not the case, this indicates that the feature image 24 contains new object feature structures that are not yet sufficiently represented in the object features used, e.g. due to a previously unlearned appearance of the object in the training image.
- The corresponding area of the feature image 24 must therefore be included, with a positive weighting, in the determination of the object feature contributions of the training image 20.
- The positive weighting at an image position is advantageously chosen to be larger the smaller the classification response at that image position turned out to be.
- In the feature fusion unit 18, each image position is assigned a weight, and the results are entered into a weight image 32.
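One way to turn an annotation image and a classification image into a weight image is sketched below. The text above only fixes the qualitative rule for positive annotations (smaller response, larger positive weight); the concrete formula, the `target` response of 1.0, the annotation values +1 and -1, and the treatment of negatively annotated positions are all assumptions made for this illustration.

```python
import numpy as np

def weight_image(annotation, classification, target=1.0):
    """Assign a weight to every image position (hypothetical rule)."""
    positive = annotation > 0
    weights = np.zeros(classification.shape, dtype=float)
    # Positive annotation: the lower the response, the larger the positive weight.
    weights[positive] = np.maximum(target - classification[positive], 0.0)
    # Negative annotation (assumption): responses above zero get a negative weight.
    weights[~positive] = -np.maximum(classification[~positive], 0.0)
    return weights

# 3x3 grid as in the symbolic figure: one object in the center cell.
annotation = -np.ones((3, 3))
annotation[1, 1] = 1.0

classification = np.zeros((3, 3))
classification[1, 1] = 0.25   # object only weakly recognized so far
classification[0, 2] = 0.5    # background wrongly produces a response

weights = weight_image(annotation, classification)
print(weights[1, 1], weights[0, 2])  # 0.75 -0.5
```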
- Step 2 makes advantageous use of the property of linear filter operations that the weights of a filter mask determine which portions of a signal are summed with which weighting. It should be noted at this point that the linear filter operations described here are not to be confused with the filter functions used, for example, in object recognition for measuring similarities or for feature extraction.
- The execution of the fusion is illustrated by way of example with reference to FIG. 4, which shows a feature image 24 with a few non-zero entries (zeros are not shown in the figure).
- The task consists in summing the gray-marked image areas with given weights. The image positions of the regions to be summed are entered in the weight image 32 together with the weights to be used.
- This task is performed by filtering the feature image 24 (M) with the weight image 32 (G), written G * M, where * denotes the filtering operation. In the result image 34 (G * M), the entries lying outside the central image area are ignored, which is represented by a dash.
- The second step of the feature fusion, shown at the bottom of FIG. 3, can thus be carried out by interpreting the weight image 32 obtained in the first step, shown at the top of FIG. 3, as a filter mask, so that linear filtering of the feature image 24 with the weight image 32 achieves the desired weighted summation of feature areas.
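The equivalence between weighted region summation and linear filtering can be written out explicitly. The notation is introduced here for illustration and is not taken from the patent: $C$ is the feature contribution, $(u, v)$ is an offset inside a region, and $r_h$, $r_w$ are the assumed half-height and half-width of the object features; $M$ is taken to be zero outside the image.

$$
C(u, v) \;=\; \sum_{y,\,x} G(y, x)\, M(y + u,\; x + v), \qquad |u| \le r_h,\; |v| \le r_w .
$$

For a fixed position $(y, x)$, the values $M(y+u, x+v)$ over all offsets form the region of the feature image centered on that position, and $G(y, x)$ is its weight, so the sum over $(y, x)$ is the weighted sum of regions. Read as a function of $(u, v)$, the same expression is the response of a linear filter that uses $G$ as its mask, restricted to the central offsets; entries at larger offsets are the ones that are ignored.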
- The filtering of the feature image 24 with the weight image 32 can advantageously be carried out in the frequency domain by simple element-wise multiplication, after both images have been transformed by means of fast Fourier transforms.
- The well-known methodology of performing filter operations in the frequency domain by exploiting the so-called convolution theorem is described, for example, in the textbook by R. C. Gonzalez and R. E. Woods (Digital Image Processing, Third Edition, Pearson Prentice Hall). With this methodology, unlike in the prior art, the regions of the feature image 24 need not be generated explicitly in the form of feature data vectors but are implicitly generated, weighted and summed within the filter operation.
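The following sketch compares the explicit weighted summation of regions with the same fusion computed by element-wise multiplication in the frequency domain. It is an illustration under stated assumptions, not the patented implementation: the function names `fuse_explicit` and `fuse_fft`, the odd region size, zero padding at the borders and the toy data are all chosen for the example.

```python
import numpy as np

def fuse_explicit(feature_image, weights, h, w):
    """Cut out each weighted region explicitly and add it up (h, w odd)."""
    rh, rw = h // 2, w // 2
    padded = np.pad(feature_image, ((rh, rh), (rw, rw)))
    contribution = np.zeros((h, w))
    for y, x in zip(*np.nonzero(weights)):
        contribution += weights[y, x] * padded[y:y + h, x:x + w]
    return contribution

def fuse_fft(feature_image, weights, h, w):
    """Same fusion as one filter operation evaluated via FFTs (h, w odd)."""
    rh, rw = h // 2, w // 2
    # Zero-pad so that circular wrap-around cannot reach the central offsets.
    shape = (feature_image.shape[0] + h, feature_image.shape[1] + w)
    spectrum_m = np.fft.rfft2(feature_image, s=shape)
    spectrum_g = np.fft.rfft2(weights, s=shape)
    # Cross-correlation theorem: conj(F{G}) * F{M}, multiplied element-wise.
    corr = np.fft.irfft2(np.conj(spectrum_g) * spectrum_m, s=shape)
    # Move offsets -rh..rh and -rw..rw to the top-left corner and crop them;
    # all other entries of the result image are ignored.
    corr = np.roll(corr, (rh, rw), axis=(0, 1))
    return corr[:h, :w]

rng = np.random.default_rng(0)
feature_image = rng.random((8, 8))      # toy feature image M
weights = np.zeros((8, 8))              # toy weight image G
weights[2, 3] = 0.75                    # positively weighted region
weights[5, 5] = -0.5                    # negatively weighted region
weights[0, 0] = 0.25                    # region touching the image border

explicit = fuse_explicit(feature_image, weights, 3, 3)
implicit = fuse_fft(feature_image, weights, 3, 3)
print(np.allclose(explicit, implicit))  # True
```

The explicit variant loops over every weighted position, whereas the FFT variant has a cost that does not depend on how many positions carry a non-zero weight, which is the efficiency argument made in the text.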
- In FIGS. 1 and 3, the feature contributions from positive and negative weights are shown separately only for clearer presentation.
- The feature fusion unit generates the sum of both contributions.
- If the classification unit 16 generates more than one classification image 26, a corresponding number of feature contributions is generated in the feature fusion unit 18.
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE102011113154.3A DE102011113154B4 (de) | 2011-09-14 | 2011-09-14 | Maschinelles Lernverfahren zum maschinellen Erlernen von Erscheinungsformen von Objekten in Bildern |
PCT/DE2012/100238 WO2013037357A1 (de) | 2011-09-14 | 2012-08-13 | Maschinelles lernverfahren zum maschinellen erlernen von erscheinungsformen von objekten in bildern |
Publications (1)
Publication Number | Publication Date |
---|---|
EP2756458A1 (de) | 2014-07-23 |
Family
ID=47010116
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP12769887.6A (Ceased) EP2756458A1 (de) | 2011-09-14 | 2012-08-13 | Maschinelles lernverfahren zum maschinellen erlernen von erscheinungsformen von objekten in bildern |
Country Status (4)
Country | Link |
---|---|
US (1) | US9361543B2 (de) |
EP (1) | EP2756458A1 (de) |
DE (1) | DE102011113154B4 (de) |
WO (1) | WO2013037357A1 (de) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9098741B1 (en) * | 2013-03-15 | 2015-08-04 | Google Inc. | Discriminitive learning for object detection |
CN103914841B (zh) * | 2014-04-03 | 2018-03-09 | 深圳大学 | 基于超像素和深度学习的***细菌分割与分类*** |
CN107169571A (zh) * | 2016-03-07 | 2017-09-15 | 阿里巴巴集团控股有限公司 | 一种特征筛选方法及装置 |
US10163003B2 (en) * | 2016-12-28 | 2018-12-25 | Adobe Systems Incorporated | Recognizing combinations of body shape, pose, and clothing in three-dimensional input images |
KR102481885B1 (ko) * | 2017-09-08 | 2022-12-28 | 삼성전자주식회사 | 클래스 인식을 위한 뉴럴 네트워크 학습 방법 및 디바이스 |
JP7167668B2 (ja) * | 2018-11-30 | 2022-11-09 | コニカミノルタ株式会社 | 学習方法、学習装置、プログラムおよび記録媒体 |
CN109740658B (zh) * | 2018-12-28 | 2023-04-18 | 陕西师范大学 | 一种基于带权图的半监督图像分类方法 |
CN110929622B (zh) * | 2019-11-15 | 2024-01-05 | 腾讯科技(深圳)有限公司 | 视频分类方法、模型训练方法、装置、设备及存储介质 |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7421415B2 (en) * | 2004-09-07 | 2008-09-02 | Siemens Corporate Research, Inc. | Methods and systems for 3D object detection using learning |
US7890512B2 (en) * | 2008-06-11 | 2011-02-15 | Microsoft Corporation | Automatic image annotation using semantic distance learning |
US8175376B2 (en) * | 2009-03-09 | 2012-05-08 | Xerox Corporation | Framework for image thumbnailing based on visual similarity |
US8588519B2 (en) * | 2010-09-22 | 2013-11-19 | Siemens Aktiengesellschaft | Method and system for training a landmark detector using multiple instance learning |
- 2011
- 2011-09-14 DE DE102011113154.3A patent/DE102011113154B4/de not_active Expired - Fee Related
- 2012
- 2012-08-13 WO PCT/DE2012/100238 patent/WO2013037357A1/de active Application Filing
- 2012-08-13 EP EP12769887.6A patent/EP2756458A1/de not_active Ceased
- 2012-08-13 US US14/344,390 patent/US9361543B2/en active Active
Non-Patent Citations (3)
Title |
---|
DAVIS J W ET AL: "A Two-Stage Template Approach to Person Detection in Thermal Imagery", 2005 SEVENTH IEEE WORKSHOPS ON APPLICATIONS OF COMPUTER VISION (WACV/MOTION'05) - 5-7 JAN. 2005 - BRECKENRIDGE, CO, USA, IEEE, LOS ALAMITOS, CALIF., USA, 5 January 2005 (2005-01-05), pages 364 - 369, XP032120847, ISBN: 978-0-7695-2271-5, DOI: 10.1109/ACVMOT.2005.14 * |
See also references of WO2013037357A1 * |
ZDENEK KALAL ET AL: "P-N learning: Bootstrapping binary classifiers by structural constraints", 2010 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 13-18 JUNE 2010, SAN FRANCISCO, CA, USA, IEEE, PISCATAWAY, NJ, USA, 13 June 2010 (2010-06-13), pages 49 - 56, XP031726056, ISBN: 978-1-4244-6984-0 * |
Also Published As
Publication number | Publication date |
---|---|
DE102011113154A1 (de) | 2013-03-14 |
US9361543B2 (en) | 2016-06-07 |
WO2013037357A1 (de) | 2013-03-21 |
US20140328537A1 (en) | 2014-11-06 |
DE102011113154B4 (de) | 2015-12-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
DE102011113154B4 (de) | Maschinelles Lernverfahren zum maschinellen Erlernen von Erscheinungsformen von Objekten in Bildern | |
DE102017220307B4 (de) | Vorrichtung und Verfahren zum Erkennen von Verkehrszeichen | |
DE69530566T2 (de) | Hough-Transform mit Fuzzy-Gradient und Wahl | |
EP2920741B1 (de) | Verfahren und vorrichtung zur bildgestützten landebahnlokalisierung | |
DE102007041893A1 (de) | Verfahren zur Detektion und/oder Verfolgung von bewegten Objekten in einer Überwachungsszene mit Störern, Vorrichtung sowie Computerprogramm | |
DE112013002740T5 (de) | Menscherfassungsvorrichtung | |
DE102019209644A1 (de) | Verfahren zum Trainieren eines neuronalen Netzes | |
DE102017212418A1 (de) | Fahrerassistenzsystem und -verfahren zur leitplankenerkennung | |
EP3511904B1 (de) | Verfahren zum bestimmen einer pose eines objekts in einer umgebung des objekts mittels multi-task-lernens, sowie steuerungsvorrichtung | |
EP4121885A1 (de) | Anonymisierungseinrichtung, überwachungsvorrichtung, verfahren, computerprogramm und speichermedium | |
DE102018205561A1 (de) | Vorrichtung zur Klassifizierung von Signalen | |
DE102017124600A1 (de) | Semantische Segmentierung eines Objekts in einem Bild | |
EP1180258B1 (de) | Mustererkennung mittels prüfung zusätzlicher merkmale nach teilverarbeitung | |
DE112018007277T5 (de) | Vorrichtung und verfahren zur automatischen fehlerschwellenwerterkennung für bilder | |
DE102018100315A1 (de) | Erzeugen von Eingabedaten für ein konvolutionelles neuronales Netzwerk | |
DE102020209080A1 (de) | Bildverarbeitungssystem | |
DE102014108492A1 (de) | Verfahren zum Detektieren eines blickwinkelabhängigen Merkmals eines Dokumentes | |
DE102019129029A1 (de) | System und verfahren zur objektdetektion | |
EP3576013A1 (de) | Abschätzen eines verlaufs eines schienenpfads | |
DE102013224382A1 (de) | Beschleunigte Objekterkennung in einem Bild | |
EP1359539A2 (de) | Neurodynamisches Modell der Verarbeitung visueller Informationen | |
EP0693200B1 (de) | Verfahren zur klassifizierung von objekten | |
DE102020208080A1 (de) | Erkennung von Objekten in Bildern unter Äquivarianz oder Invarianz gegenüber der Objektgröße | |
DE102009060687A1 (de) | Verfahren und Vorrichtung zum rechnergestützten Annotieren von Multimediadaten | |
WO2019072451A1 (de) | Verfahren zum verarbeiten von bildern |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 20140414 |
AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: AIRBUS DEFENCE AND SPACE GMBH |
DAX | Request for extension of the european patent (deleted) | |
17Q | First examination report despatched | Effective date: 20150827 |
REG | Reference to a national code | Ref country code: DE; Ref legal event code: R003 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED |
18R | Application refused | Effective date: 20171115 |