WO2009086683A1 - Automatic detection, labeling and tracking of team members in a video

Info

Publication number: WO2009086683A1
Application number: PCT/CN2007/003986
Authority: WO
Inventors: Xiaofeng Tong; Jia Liu; Yimin Zhang
Original assignee: Intel Corporation
Priority date: 2007-12-29
Filing date: 2007-12-29
Publication date: 2009-07-16

Abstract

In some embodiments, an automatic detection, labeling and tracking of team members in a video is presented. In this regard, a method is introduced to receive a frame from a sports video, to identify a playing surface in the frame, to identify player regions on the playing surface, to transform pixels from the player regions into player models, to aggregate the player models from a plurality of frames, and to determine a team model to represent a first team and a second team based on clustering of player models. Other embodiments are also provided.

Description

AUTOMATIC DETECTION, LABELING AND TRACKING OF TEAM MEMBERS IN A VIDEO

FIELD OF THE INVENTION

Embodiments of the present invention generally relate to the field of video processing, and, more particularly to an automatic detection, labeling and tracking of team members in a video.

BACKGROUND OF THE INVENTION

Player detection, labeling and tracking are critical for the study of team tactics and player activities in TV broadcast sports video. While some progress has been made on this topic, it remains challenging because of difficulties such as player-to-player occlusion, weakly discriminative appearance between players, a varying number of players on the screen, abrupt camera motion, and video blur.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings in which like references indicate similar elements, and in which:

FIG. 1 is a graphical illustration of an example frame of a video, in accordance with one example embodiment of the invention;

FIG. 2 is a flow chart of an example method for developing a player labeling module, in accordance with one example embodiment of the invention;

FIG. 3 is a flow chart of an example method for testing a player labeling module, in accordance with one example embodiment of the invention; and

FIG. 4 is a block diagram of an example article of manufacture including content which, when accessed by a device, causes the device to implement one or more aspects of one or more embodiment(s) of the invention.

DETAILED DESCRIPTION

Embodiments of the present invention are generally directed to an automatic detection, labeling and tracking of team members in a video. In this regard, in accordance with but one example implementation of the broader teachings of the present invention, a method is introduced to receive a frame from a sports video, to identify a playing surface in the frame, to identify player regions on the playing surface, to transform pixels from the player regions into player models, to aggregate the player models from a plurality of frames, and to determine a team model to represent a first team and a second team based on clustering of player models. Other embodiments are also disclosed and claimed.

Fig. 1 is a graphical illustration of an example frame of a video, in accordance with one example embodiment of the invention. Frame 100 is intended to represent x. In accordance with the illustrated example embodiment, frame 100 may include one or more of global view 102, playing surface 104, boundary 106, outer region 108, first team members 110, second team members 112, referee 114, ball 116, first team labels 118, second team labels 120, and referee label 122 coupled as shown in Fig. 1. While shown as being a frame of a soccer match, frame 100 may well be from a video of another sport, such as basketball or football, or any other type of video that would benefit from the teachings of the present invention. Global view 102 represents the type of view depicted. In one embodiment, global view 102 represents a view in which the frame is predominantly of the playing surface or field and several players, as opposed to a close-up view or a crowd view, for example. While shown as including two first team members 110 and two second team members 112 for simplicity, many more players may be present in frame 100. Playing surface 104 may be grass or another surface type that is predominantly solid and uniform in color and may be surrounded by boundary 106 that separates and distinguishes playing surface 104 from outer region 108, which may include, for example, spectators.

First team members 110 would have matching uniforms that are distinguishable from the uniforms worn by second team members 112 and further distinguishable from the uniform worn by referee 114. In one embodiment of the present invention, as described in more detail hereinafter, the differences in player and referee uniforms allow labels, such as first team labels 118, second team labels 120, and referee label 122, to be added to frame 100. In one embodiment, ball 116 and other anomalies on playing surface 104 are ignored for player detection, labeling and tracking purposes.

Fig. 2 is a flow chart of an example method for developing a player labeling module, in accordance with one example embodiment of the invention. It will be readily apparent to those of ordinary skill in the art that although the following operations may be described as a sequential process, many of the operations may in fact be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged without departing from the spirit of embodiments of the invention.

In one embodiment, method 200 begins with receiving (202) a frame of a sports video such as frame 100. In one example embodiment, frames that don't depict global view 102 are ignored. In one embodiment, a video processing system may grab one or two frames per second to process out of a 25 frame per second video source.
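
The frame sampling described above can be illustrated with a minimal sketch. The function name `sampled_frame_indices` and its parameters are hypothetical and are not taken from the disclosure; the sketch only shows one way to pick roughly one or two frames per second from a 25 frame per second source.

```python
def sampled_frame_indices(total_frames, source_fps=25, target_fps=1):
    """Return the indices of the frames to process.

    Hypothetical helper: keeps about `target_fps` frames out of every
    `source_fps` frames by taking every `step`-th frame.
    """
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    # The step is rounded to a whole number of frames and is never below 1.
    step = max(1, round(source_fps / target_fps))
    return list(range(0, total_frames, step))


if __name__ == "__main__":
    # Four seconds of 25 fps video, one processed frame per second.
    print(sampled_frame_indices(100, source_fps=25, target_fps=1))
```

Frames selected this way would still be subject to the global view check before further processing.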

Next is identifying (204) a playing surface in the frame, such as playing surface 104. In one embodiment, playing surface 104 is identified based on its color (perhaps green) that makes up a majority of frame 100. In one embodiment, the dominant color of playing surface 104 is learned by accumulating HSV color histograms. In one embodiment, playing surface 104 can be extracted from frame 100 through dominant color segmentation, morphological filtering and connected-component analysis.
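
As a minimal sketch of dominant color learning and segmentation, the following accumulates a hue histogram over frames and marks pixels whose hue lies near the dominant bin. It is an illustration under stated assumptions, not the disclosed implementation: frames are assumed to be lists of rows of (r, g, b) tuples in the range 0..255, only the hue channel is used, the bin count and tolerance are arbitrary, and morphological filtering is omitted. All function names are hypothetical.

```python
import colorsys


def _hue_bin(pixel, bins):
    """Map an (r, g, b) pixel in 0..255 to a hue histogram bin."""
    r, g, b = pixel
    hue, _sat, _val = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return min(int(hue * bins), bins - 1)


def accumulate_hue_histogram(frames, bins=36, hist=None):
    """Add the hue counts of every pixel in `frames` to `hist`."""
    if hist is None:
        hist = [0] * bins
    for frame in frames:
        for row in frame:
            for pixel in row:
                hist[_hue_bin(pixel, bins)] += 1
    return hist


def dominant_hue_bin(hist):
    """Return the index of the most populated hue bin."""
    return max(range(len(hist)), key=hist.__getitem__)


def surface_mask(frame, dominant_bin, bins=36, tolerance=1):
    """Return 1 where a pixel's hue is within `tolerance` bins of the dominant bin."""
    mask = []
    for row in frame:
        mask_row = []
        for pixel in row:
            diff = abs(_hue_bin(pixel, bins) - dominant_bin)
            diff = min(diff, bins - diff)  # hue wraps around
            mask_row.append(1 if diff <= tolerance else 0)
        mask.append(mask_row)
    return mask


if __name__ == "__main__":
    green, red = (0, 128, 0), (255, 0, 0)
    frame = [[green, green], [green, red]]
    hist = accumulate_hue_histogram([frame])
    print(surface_mask(frame, dominant_hue_bin(hist)))
```

Because achromatic pixels are assigned a hue of zero by `colorsys`, a fuller treatment would also consult saturation and value.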

Method 200 continues with identifying (206) player regions on the playing surface. In one embodiment, player regions comprise groupings of contrasting colors present on playing surface 104 of sufficient size, for example number of pixels. In one embodiment, ball 116 would be too small to be considered a player region.
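
The size test on contrasting regions can be sketched with a simple connected-component pass. The sketch assumes a binary mask in which 1 marks a pixel on the playing surface that does not match the dominant color, uses 4-connectivity, and uses an arbitrary `min_pixels` threshold; none of these choices is specified by the disclosure, and the function name is hypothetical.

```python
from collections import deque


def player_regions(mask, min_pixels=4):
    """Return bounding boxes (top, left, bottom, right) of 4-connected
    regions of 1-valued pixels that contain at least `min_pixels` pixels."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill of one connected region.
            queue = deque([(r, c)])
            seen[r][c] = True
            size = 0
            top, bottom, left, right = r, r, c, c
            while queue:
                y, x = queue.popleft()
                size += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Small regions, such as a ball, are discarded.
            if size >= min_pixels:
                boxes.append((top, left, bottom, right))
    return boxes


if __name__ == "__main__":
    mask = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0],
        [0, 1, 1, 0, 1],
        [0, 1, 1, 0, 0],
    ]
    print(player_regions(mask))
```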

Next is creating (208) player models. In one example embodiment, pixels from player regions are transformed into histograms which represent player models. In one embodiment, only pixels from the upper half of the player regions (and therefore more likely to include a team jersey worn on the upper body) are included in creating the player models. In one embodiment, a large pool of pixels is collected from player regions and transformed into CIE-Luv space. A Gaussian Mixture Model (GMM) may be estimated with N components by Expectation-Maximization (EM) clustering. Centers of these components are referred to as prototypes. Adjacent components with a small center distance are merged together. The resultant merged components are referred to as meta-prototypes. All player samples are represented as histograms by binning all pixels into the corresponding meta-prototype. In one embodiment, to model the players' appearance, EM clustering is first used again to estimate K clusters over the meta-prototype histograms of all player samples. The centers of these clusters are named sub-models.
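
The binning of player pixels into meta-prototypes can be sketched as follows. This is a simplified illustration: the meta-prototype centers are assumed to be already available (the GMM estimation and merging steps are not shown), each pixel is assigned to the nearest center by squared Euclidean distance rather than by GMM posterior, and the helper names are hypothetical.

```python
def upper_half(region_rows):
    """Return the rows in the upper half of a player region (at least one row)."""
    return region_rows[: max(1, len(region_rows) // 2)]


def meta_prototype_histogram(pixels, meta_prototypes):
    """Return a normalized histogram in which each pixel votes for its
    nearest meta-prototype center.

    `pixels` and `meta_prototypes` are sequences of equal-length color
    vectors, for example (L, u, v) triples.
    """
    counts = [0] * len(meta_prototypes)
    for pixel in pixels:
        nearest = min(
            range(len(meta_prototypes)),
            key=lambda i: sum(
                (a - b) ** 2 for a, b in zip(pixel, meta_prototypes[i])
            ),
        )
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [count / total for count in counts]


if __name__ == "__main__":
    centers = [(0, 0, 0), (10, 10, 10)]  # assumed meta-prototype centers
    region = [
        [(1, 1, 1), (9, 9, 9)],    # upper rows: likely jersey
        [(8, 9, 10), (0, 1, 0)],
        [(5, 5, 5), (5, 5, 5)],    # lower rows: ignored
        [(5, 5, 5), (5, 5, 5)],
    ]
    jersey_pixels = [p for row in upper_half(region) for p in row]
    print(meta_prototype_histogram(jersey_pixels, centers))
```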

Then, team models are created (210). In one embodiment, the player models are aggregated from a plurality of frames and team models for a first team and a second team are determined based on statistical analysis of predominant player models. In one embodiment, a referee model is also created and maintained. In one embodiment, the clusters are merged into four clusters with near absorption. Their centers are labeled real-models. A labeling function assigns each real-model and sub-model exactly one label in a label set LS = {Team A, Team B, Referee, Outlier}. The two real-models with the largest size are identified as Team A and Team B, as well as their corresponding sub-models. A minimum average distance (MAD) from the other real-models to the two team sub-models may then be computed. The real-model with the smaller MAD may be labeled as Referee, and the other, with the larger MAD, may be labeled Outlier.

Fig. 3 is a flow chart of an example method for testing a player labeling module, in accordance with one example embodiment of the invention. It will be readily apparent to those of ordinary skill in the art that although the following operations may be described as a sequential process, many of the operations may in fact be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged without departing from the spirit of embodiments of the invention.
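
The labeling of real-models in operation 210 of method 200 can be sketched as follows. The sketch rests on assumptions that the disclosure leaves open: the MAD of a remaining real-model is taken to be the smaller of its average distances to the sub-models of Team A and to the sub-models of Team B, Euclidean distance between histogram vectors is used, and each real-model is a dictionary with the hypothetical keys `size`, `center` and `sub_models`.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def label_real_models(real_models):
    """Assign one label from {Team A, Team B, Referee, Outlier} to each
    of exactly four real-models and return the labels in input order."""
    if len(real_models) != 4:
        raise ValueError("expected exactly four real-models")
    by_size = sorted(range(4), key=lambda i: real_models[i]["size"], reverse=True)
    team_a, team_b, first, second = by_size
    labels = {team_a: "Team A", team_b: "Team B"}

    def mad(index):
        # Minimum, over the two teams, of the average distance from this
        # real-model's center to that team's sub-models (assumed reading).
        center = real_models[index]["center"]
        averages = []
        for team in (team_a, team_b):
            subs = real_models[team]["sub_models"]
            averages.append(sum(euclidean(center, s) for s in subs) / len(subs))
        return min(averages)

    if mad(first) <= mad(second):
        labels[first], labels[second] = "Referee", "Outlier"
    else:
        labels[first], labels[second] = "Outlier", "Referee"
    return [labels[i] for i in range(4)]


if __name__ == "__main__":
    models = [
        {"size": 50, "center": (1.0, 0.0), "sub_models": [(1.0, 0.0), (0.9, 0.1)]},
        {"size": 5, "center": (0.5, 0.5), "sub_models": [(0.5, 0.5)]},
        {"size": 45, "center": (0.0, 1.0), "sub_models": [(0.0, 1.0)]},
        {"size": 3, "center": (5.0, 5.0), "sub_models": [(5.0, 5.0)]},
    ]
    print(label_real_models(models))
```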

According to but one example implementation, method 300 begins with receiving (302) a frame of a sports video such as frame 100. In one example embodiment, a video processing system may grab fewer frames per second to process than are available from the video source. Next, the frame is analyzed to determine (304) whether a global view is depicted.

In one embodiment, if a global view is not depicted, then the frame is not processed further. If a global view 102 is depicted in the frame 100 then it is further processed.

Next is identifying (306) a playing surface in the frame, such as playing surface 104. In one embodiment, playing surface 104 is identified based on its color (perhaps green) that makes up a majority of frame 100. In one embodiment, the dominant color of playing surface 104 is learned by accumulating HSV color histograms. In one embodiment, playing surface 104 can be extracted from frame 100 through dominant color segmentation, morphological filtering and connected-component analysis.

Method 300 continues with identifying (308) target player regions on the playing surface. In one embodiment, target player regions comprise groupings of contrasting colors present on playing surface 104 of sufficient size, for example number of pixels. In one embodiment, ball 116 would be too small to be considered a target player region. In one embodiment, a detector scans across the filtered image regions of playing surface 104 with multiple scales. Target player regions may represent areas with multiple responses.

Next is comparing (310) player regions to stored team models, such as those generated by method 200. In one embodiment, numerical representations for the target player regions are developed by performing boosted cascade detection on the upper half of the target player regions. In one embodiment, each target player region is represented by its meta-prototype histogram.

Lastly, team members are labeled and tracked (312). In one embodiment, a target player region is labeled as a member of a team if the target player region is sufficiently similar to a stored team model. In one embodiment, a target player region is assigned the label of the sub-model with the nearest Bhattacharyya distance. In one embodiment, first team labels 118 would be added to frame 100 around first team members 110, while second team labels 120 (different in color than first team labels 118) would be added to frame 100 around second team members 112. While shown as rectangles in Fig. 1, the team labels can be any shape and color. In one embodiment, referee label 122 would be added to frame 100 around referee 114 if the target player region is sufficiently similar to a stored referee model. In another embodiment, referee 114 or any other target player region (perhaps a goaltender) not sufficiently similar to one of the team models would be labeled with an outlier label (perhaps a different color rectangle). In one embodiment, tracking of team members includes maintaining a list of labeled target player regions for use with subsequent frames. In one embodiment, coordinates on frame 100 of a labeled team member are saved and utilized in the decision-making process of labeling a target player region. For example, if two team members are labeled in a frame and subsequently become occluded such that analysis of the singular frame cannot reveal two team members, the stored list of labeled target player regions can be relied upon in part to maintain two labels. Also, for example, if a player temporarily becomes inverted or unrecognizable in a particular frame, the list of labeled target player regions can be utilized to assist in identifying an ambiguous target player region. In one embodiment, a list of labeled target player regions is purged if it is determined that an abrupt change in view, such as a camera switch, has occurred.
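
Assigning a target player region the label of its nearest sub-model can be sketched as follows. The disclosure names the Bhattacharyya distance but not its exact form; this sketch assumes the bounded form sqrt(1 - BC), where BC is the Bhattacharyya coefficient of two normalized histograms. The `max_distance` threshold used to fall back to an outlier label, and the function names, are likewise assumptions.

```python
import math


def bhattacharyya_distance(p, q):
    """Distance sqrt(1 - BC) between two normalized histograms, where
    BC = sum(sqrt(p_i * q_i)) is the Bhattacharyya coefficient."""
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, 1.0 - coefficient))


def label_region(histogram, sub_models, max_distance=0.5):
    """Return the label of the nearest sub-model, or "Outlier" when no
    sub-model lies within `max_distance`.

    `sub_models` is a list of (label, histogram) pairs.
    """
    best_label, best_distance = min(
        ((label, bhattacharyya_distance(histogram, model))
         for label, model in sub_models),
        key=lambda pair: pair[1],
    )
    return best_label if best_distance <= max_distance else "Outlier"


if __name__ == "__main__":
    stored = [
        ("Team A", [0.8, 0.1, 0.1]),
        ("Team B", [0.1, 0.8, 0.1]),
    ]
    print(label_region([0.7, 0.2, 0.1], stored))
    print(label_region([0.0, 0.0, 1.0], stored))
```
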
In one embodiment, a set of rectangles is used to represent the detected player regions at frame t. A player is enclosed by a rectangle, in which the binary mask segmented by dominant color and the HSV color histogram are taken as observations. A tracking module may find the player correspondence between adjacent frames using the binary mask and color information. One aim of tracking is to find the correspondence of players in adjacent rectangles. Another aim is to discriminate false alarms and make up for missed detections. Bi-directional tracking, that is, forward tracking from time t to t+1 and backward tracking from time t+1 to t, may be used to handle this problem. Adjacent rectangles may be an exact match if the overlap of their enclosed rectangles is sufficiently large (binary likelihood) and the similarity of their color histograms in HSV color space is large enough (color likelihood).
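
Matching rectangles between adjacent frames in both directions can be sketched as follows. The sketch makes several assumptions not found in the disclosure: rectangle overlap is measured as intersection over union, color similarity is the Bhattacharyya coefficient of normalized histograms, the two thresholds are arbitrary, and a pair is accepted only when the forward and backward searches agree. Detections are dictionaries with the hypothetical keys `rect` (x1, y1, x2, y2) and `hist`.

```python
import math


def overlap_ratio(a, b):
    """Intersection over union of two rectangles (x1, y1, x2, y2)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    intersection = width * height
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return intersection / (area_a + area_b - intersection)


def histogram_similarity(p, q):
    """Bhattacharyya coefficient of two normalized histograms."""
    return sum(math.sqrt(x * y) for x, y in zip(p, q))


def best_match(detection, candidates, min_overlap, min_similarity):
    """Return the index of the best candidate passing both tests, or None."""
    best_index, best_score = None, 0.0
    for index, candidate in enumerate(candidates):
        overlap = overlap_ratio(detection["rect"], candidate["rect"])
        similarity = histogram_similarity(detection["hist"], candidate["hist"])
        score = overlap * similarity
        if (overlap >= min_overlap and similarity >= min_similarity
                and score > best_score):
            best_index, best_score = index, score
    return best_index


def bidirectional_matches(frame_t, frame_t1, min_overlap=0.3, min_similarity=0.8):
    """Return (i, j) pairs whose forward (t to t+1) and backward
    (t+1 to t) best matches agree."""
    matches = []
    for i, detection in enumerate(frame_t):
        j = best_match(detection, frame_t1, min_overlap, min_similarity)
        if j is None:
            continue
        if best_match(frame_t1[j], frame_t, min_overlap, min_similarity) == i:
            matches.append((i, j))
    return matches


if __name__ == "__main__":
    at_t = [
        {"rect": (0, 0, 10, 20), "hist": [0.9, 0.1]},
        {"rect": (50, 0, 60, 20), "hist": [0.1, 0.9]},
    ]
    at_t1 = [
        {"rect": (52, 0, 62, 20), "hist": [0.1, 0.9]},
        {"rect": (1, 0, 11, 20), "hist": [0.9, 0.1]},
    ]
    print(bidirectional_matches(at_t, at_t1))
```

Detections left unmatched in either direction are candidates for false alarms or missed detections, which a fuller tracker would resolve using the stored list of labeled regions.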

Fig. 4 illustrates a block diagram of an example storage medium comprising content which, when accessed, causes an electronic appliance to implement one or more aspects of the disclosed methods 200 and/or 300. In this regard, storage medium 400 includes content 402 (e.g., instructions, data, or any combination thereof) which, when executed, causes the appliance to implement one or more aspects of methods described above.

The machine-readable (storage) medium 400 may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards, flash memory, or other types of media / machine-readable media suitable for storing electronic instructions. Moreover, the present invention may also be downloaded as a computer program product, wherein the program may be transferred from a remote computer to a requesting computer by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., a modem, radio or network connection).

In the description above, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without some of these specific details. In other instances, well-known structures and devices are shown in block diagram form.

Embodiments of the present invention may be used in a variety of applications. Although the present invention is not limited in this respect, the invention disclosed herein may be used in microcontrollers, general-purpose microprocessors, Digital Signal Processors (DSPs), Reduced Instruction-Set Computing (RISC), Complex Instruction-Set Computing (CISC), among other electronic components. However, it should be understood that the scope of the present invention is not limited to these examples.

Embodiments of the present invention may also be included in integrated circuit blocks referred to as core memory, cache memory, or other types of memory that store electronic instructions to be executed by the microprocessor or store data that may be used in arithmetic operations. In general, an embodiment using multistage domino logic in accordance with the claimed subject matter may provide a benefit to microprocessors, and in particular, may be incorporated into an address decoder for a memory device. Note that the embodiments may be integrated into radio systems or hand-held portable devices, especially when devices depend on reduced power consumption. Thus, laptop computers, cellular radiotelephone communication systems, two-way radio communication systems, one-way pagers, two-way pagers, personal communication systems (PCS), personal digital assistants (PDA's), cameras and other products are intended to be included within the scope of the present invention.

The present invention includes various operations. The operations of the present invention may be performed by hardware components, or may be embodied in machine-executable content (e.g., instructions), which may be used to cause a general-purpose or special-purpose processor or logic circuits programmed with the instructions to perform the operations. Alternatively, the operations may be performed by a combination of hardware and software. Moreover, although the invention has been described in the context of a computing appliance, those skilled in the art will appreciate that such functionality may well be embodied in any number of alternate embodiments such as, for example, integrated within a communication appliance (e.g., a cellular telephone).

Many of the methods are described in their most basic form but operations can be added to or deleted from any of the methods and information can be added or subtracted from any of the described messages without departing from the basic scope of the present invention. Any number of variations of the inventive concept is anticipated within the scope and spirit of the present invention. In this regard, the particular illustrated example embodiments are not provided to limit the invention but merely to illustrate it. Thus, the scope of the present invention is not to be determined by the specific examples provided above but only by the plain language of the following claims.

Claims

What is claimed is:

1. A method comprising: receiving a frame from a sports video; identifying a playing surface in the frame; identifying player regions on the playing surface; transforming pixels from the player regions into player models; aggregating the player models from a plurality of frames; and determining a team model to represent a first team and a second team based on statistical analysis of player models.

2. The method of claim 1, wherein the sports video comprises a soccer video.

3. The method of claim 1, further comprising: ignoring a frame if it is determined that a global view is not depicted.

4. The method of claim 1, wherein transforming pixels from the player regions into player models comprises transforming pixels from an upper half of the player regions into player models.

5. The method of claim 1, further comprising: labeling a target player region as belonging to a team if the target player region is sufficiently similar to the team model.

6. The method of claim 1, further comprising: labeling a target player region as an outlier if the target player region is not sufficiently similar to the team models.

7. A method comprising: receiving a global view frame from a sports video; identifying a playing surface in the frame; identifying target player regions on the playing surface; developing numerical representations for the target player regions; and labeling a target player region as a member of a team if the target player region is sufficiently similar to a stored team model.

8. The method of claim 7, further comprising: maintaining a list of labeled target player regions for use with subsequent frames.

9. The method of claim 7, wherein labeling a target player region comprises placing a colored rectangle in the frame around the target player region.

10. The method of claim 7, further comprising: labeling a target player region as a referee if the target player region is sufficiently similar to a stored referee model.

11. The method of claim 7, wherein developing numerical representations for the target player regions comprises: performing a boosted cascade detection on an upper half of the target player regions.

12. The method of claim 7, wherein the sports video comprises a soccer video.

13. A storage medium comprising content which, when executed by an accessing machine, causes the accessing machine to receive a frame from a sports video, to identify a playing surface in the frame, to identify player regions on the playing surface, to transform pixels from the player regions into player models, to aggregate the player models from a plurality of frames, and to determine a team model to represent a first team and a second team based on statistical analysis of player models.

14. The storage medium of claim 13, wherein the sports video comprises a soccer video.

15. The storage medium of claim 13, further comprising content which, when executed by the accessing machine, causes the accessing machine to ignore a frame if it is determined that a global view is not depicted.

16. The storage medium of claim 13, wherein the content to transform pixels from the player regions into player models comprises contents to transform pixels from an upper half of the player regions into player models.

17. The storage medium of claim 13, further comprising content which, when executed by the accessing machine, causes the accessing machine to label a target player region as belonging to a team if the target player region is sufficiently similar to the team model.

18. A storage medium comprising content which, when executed by an accessing machine, causes the accessing machine to receive a global view frame from a sports video, to identify a playing surface in the frame, to identify target player regions on the playing surface, to develop numerical representations for the target player regions, and to label a target player region as a member of a team if the target player region is sufficiently similar to a stored team model.

19. The storage medium of claim 18, further comprising content which, when executed by the accessing machine, causes the accessing machine to maintain a list of labeled target player regions for use with subsequent frames.

20. The storage medium of claim 18, wherein the content to label a target player region as a member of a team comprises content to place a colored rectangle in the frame around the target player region.

21. The storage medium of claim 18, further comprising content which, when executed by the accessing machine, causes the accessing machine to label a target player region as a referee if the target player region is sufficiently similar to a stored referee model.

22. The storage medium of claim 18, wherein the content to develop numerical representations for the target player regions comprises content to perform a boosted cascade detection on an upper half of the target player regions.