CN102682281A

CN102682281A - Aggregated facial tracking in video

Info

Publication number: CN102682281A
Application number: CN2012100538119A
Authority: CN
Inventors: I·莱希特; E·克鲁普卡; I·阿布拉莫夫斯基; I·克维埃特克夫斯基
Original assignee: Microsoft Corp
Current assignee: Microsoft Technology Licensing LLC
Priority date: 2011-03-04
Filing date: 2012-03-02
Publication date: 2012-09-19
Also published as: EP2681717A2; EP2681717A4; WO2012122069A3; WO2012122069A2

Abstract

The invention relates to aggregated facial tracking in a video. A facial detecting system may analyze a video by traversing the video forwards and backwards to create tracks of a person within the video. After separating the video into shots, the frames of each shot may be analyzed using a face detector algorithm to produce some analyzed information for each frame. A facial track may be generated by grouping the faces detected and by traversing the sequence of frames forwards and backwards. Facial tracks may be joined together within a shot to generate a single track for a person's face within the shot, even when the tracks are discontinuous.

Description

The face of assembling in the video is followed the tracks of

Technical field

The present invention relates to image processing techniques, the face that relates in particular in the video is followed the tracks of.

Background technology

It possibly be difficult that face in the video is followed the tracks of.Many facial tracker algorithms can detect during towards camera facial the people, but when checking that in the side this man-hour maybe be more inaccurate.Leave camera along with this people turns over, the face detector algorithm possibly can't detect face at all.

Summary of the invention

Face detection system can be come analysis video through traveling through video forward and backward, so that create the tracking to a people's in the video face.After video was divided into camera lens, the frame of each camera lens can use face detection algorithm to analyze so that be each some information through analyzing of frame generation.Facial tracking can be through dividing into groups to detected face and generating through traveling through frame sequence forward and backward.Even follow the tracks of when discontinuous at face, the facial tracking also can be connected in camera lens together to generate the single tracking to a people's in this camera lens face.

Content of the present invention is provided so that some notions that will in following embodiment, further describe with the reduced form introduction.Content of the present invention is not intended to identify the key feature or the essential feature of the protection theme that requires, and is not intended to be used to limit the scope of the protection theme that requires yet.

Description of drawings

Fig. 1 is the diagram of embodiment that the network environment of the equipment with analysis video is shown.

Fig. 2 is the process flow diagram that the embodiment of the method that is used for analysis video is shown.

Fig. 3 illustrates the process flow diagram of embodiment that is used for confirming at video the method for camera lens.

Fig. 4 is the process flow diagram that the embodiment of the face tracking that is used for video is shown.

Fig. 5 is the process flow diagram that the embodiment of the method that is used for the existing facial analysis of following the tracks of is linked is shown.

Fig. 6 is the example view that the embodiment that has the facial sequence of frames of video of following the tracks of of gained is shown.

Embodiment

Face detection system can detect the face in the video by the analysis through sequence of frames of video forward and backward.Face can detected by face detection algorithm by on the basis of frame at first, and uses facial tracking analyzer to handle subsequently, so that create the frame sequence that comprises same facial.

Facial check and analysis device can be through traveling through frame sequence and operate being used to detect the facial forward and/or backward mode of coupling.In case one group of facial frame is detected, facial just the tracking can generate through the face object that connects from the successive frame in the video.In many cases, because therefore the face that some frame possibly not be detected can generate a plurality of facial tracking the to individual face in video lens.Under these circumstances, can through in every way to tracking compare the face that will separate follow the tracks of can be linked together become single facial the tracking.

In certain embodiments, facial tracking can be through only relatively the position and the size of face object generate.In such embodiment, the trace of face object can or more be confirmed the multiframe from two, three, and new frame can be analyzed to confirm whether new frame comprises the face object that matees this trace.

In certain embodiments, facial tracking can generate through the information (such as color histogram, face structure or other data) that relatively derives from image.In such embodiment, each face object can by relatively and the similarity between these face objects be found to be when being found to be in predetermined threshold identical.

In many examples, can detect by carrying out face on the basis of frame, wherein each frame can use face detection algorithm analysis.In such embodiment, frame can be parsed into static independent image.Such algorithm possibly not be very accurate, and possibly detect the object that is not facial improperly, maybe possibly can not detect the face that existed.Rely on to travel through forward and backward and come analysis frame information, can eliminate some noise or the unreliability of static faces detection algorithm to create facial the tracking through frame sequence.

Run through this instructions, in the whole description to accompanying drawing, identical Reference numeral is represented identical element.

When element is called as when being " connected " or " coupled ", these elements can directly be connected or be coupled, and perhaps also can have one or more neutral elements.On the contrary, be " directly connected " or when " directly coupling ", do not have neutral element when element is called as.

This theme can be presented as equipment, system, method and/or computer program.Therefore, partly or entirely can the use hardware and/or the software (comprising firmware, resident software, microcode, state machine, gate array etc.) of theme of the present invention are specialized.In addition, theme of the present invention can adopt on it embed have the computing machine that supplies instruction execution system to use or combine its use to use the computing machine of computer-readable program code can use or computer-readable storage medium on the form of computer program.In the context of this document, computing machine can use or computer-readable medium can be can comprise, store, communicate by letter, propagate or transmission procedure uses or combine any medium of its use for instruction execution system, device or equipment.

Computing machine can use or computer-readable medium for example can be but is not limited to electricity, magnetic, light, electromagnetism, infrared or semiconductor system, device, equipment or propagation medium.And unrestricted, computer-readable medium can comprise computer-readable storage medium and communication media as an example.

Computer-readable storage medium comprises the volatibility that realizes with any method or the technology that is used to store such as information such as computer-readable instruction, data structure, program module or other data and non-volatile, removable and removable medium not.Computer-readable storage medium comprises; But be not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical disc storage, tape cassete, tape, disk storage or other magnetic storage apparatus, maybe can be used to store information needed and can be by any other medium of instruction execution system visit.Note; Computing machine can use or computer-readable medium can be to print paper or another the suitable medium that program is arranged on it; Because program can be via for example to the optical scanning of paper or other media and catch electronically; Handle subsequently if necessary by compiling, explanation, or with other suitable manner, and be stored in the computer memory subsequently.

Communication media is usually embodying computer-readable instruction, data structure, program module or other data such as modulated message signal such as carrier wave or other transmission mechanisms, and comprises transport.Term " modulated message signal " is meant to have the signal that is set or changes its one or more characteristics with the mode of coded message in signal.As an example and unrestricted, communication media comprises such as cable network or the wire medium directly line connects, and the wireless medium such as acoustics, RF, infrared and other wireless mediums.Arbitrary combination also should be included within the scope of computer-readable medium in above-mentioned.

When specializing in the general context of this theme at computer executable instructions, this embodiment can comprise the program module of being carried out by one or more systems, computing machine or other equipment.Generally speaking, program module comprises the routine carrying out particular task or realize particular abstract, program, object, assembly, data structure etc.Usually, the function of program module can make up in each embodiment or distribute as required.

Fig. 1 is the diagram that the system implementation example 100 that is used for video analysis is shown.But embodiment 100 is receiver, videos, video is resolved into camera lens and analyzes each frame of each camera lens and detect the simplification example to an equipment of the tracking of face object so that stride frame of video.

The functional module of the system that illustrates of Fig. 1.In some cases, these assemblies can be the combinations of nextport hardware component NextPort, component software or hardware and software.Some assemblies can be application layer software, and other assemblies can be the operating system level assemblies.In some cases, assembly can be tight connection to the connection of another assembly, and wherein two or more assemblies are operated on single hardware platform.In other cases, connection can connect through the network of span length's distance and carries out.Each embodiment can use different hardware, software and interconnect architecture to realize described function.

Be used for the facial system creation that detects and cross over the face tracking of a plurality of frames of video lens.System can use the result by pattern portion detection algorithm, and create the face of crossing over a plurality of frames subsequently and follow the tracks of, but this minimize missed or incorrect facial the detection.System can be forward and both ground analysis video camera lenses that have both at the same time backward, to connect the face in the frame sequence.

Connect face through the inspection frame sequence, possibly having that lose or unreliablely, each frame of detected face can be included in facial the tracking.In addition, misinterpretation or incorrect facial inspection those inspections do not find also comprise coupling facial near frame the time can be left in the basket.

System is carrying out level and smooth effect by having in the pattern portion check system to makeing mistakes.Computing was good when many face detection algorithm directly faced camera a people.When its head being turned to the side along with this people, typical face detection system maybe since the entire face characteristic possibly lose lose that this is analyzed to as if this of face put letter.For example, individual's photo from the side can comprise single eyes, nose side, half face, and this maybe not can be detected as the face with high reliability.The complete face of checking can comprise two eyes, a nose and a face, this maybe be reliable how be detected.

Face in the video is analyzed below capable of using true: can comprise additional information with afterwards each frame of video before the given frame, this additional information can help to confirm whether face exists really and when face possibly correctly not detected, fill.

Video analytic system can at first split into each camera lens with video.Each camera lens can be the sequence of similar frame, and can comprise identical face.In some cases, shot boundary can be determined when camera operator begins and finish specific video clip, thereby creates each camera lens.In other cases, scene can fully change, and still when writing down, also can create new camera lens even make at camera.Such incident takes place in the time of can and changing the visual field at the camera operator fast steering.

Can analyze camera lens follows the tracks of with the face of finding out in the camera lens.In many examples, can be consistent with size from a frame to another frame through supposing facial position, confirm facial the tracking.Such algorithm possibly not be intended to stride shot boundary to be operated.Therefore, many video parsers maybe from the too many camera lens of video creation rather than very little camera lens aspect make mistakes.Too many camera lens can be derived from video parser to the hypersusceptible situation of the change in the camera lens, and can in the time in fact possibly not having shot boundary, detect shot boundary.Camera lens can take place when video parser possibly more unwisely help to change and possibly can not detect the real lens border very little.

But each frame of face detector analysis video camera lens is to detect the face in this frame.Many embodiment can be through separating each frame and other frames and analyzing independently and operate face detector.Face detector can use the facial testing mechanism of any kind to detect the interior face of still image of frame.

In many cases, face detector can detect one or more faces, and the position and the size of face object can be provided.Some embodiment can comprise the reliability factor that is used to detect, the degree of confidence that this reliability factor can indicate algorithm in detection, can have.Some embodiment can comprise and facial relevant various characteristics, such as face structure analysis, color histogram or from other information of image derivation itself.

Facial tracking analyzer can be attempted through analysis frame sequence on both direction forward and backward face object being connected to another frame from a frame.In certain embodiments, facial tracking analyzer can be attempted position and the size through face object near the frame only comparing, near the face object mating in the frame.Other embodiment can compare additive factor (such as the factor that derives from graphical analysis) and mate face object.

In certain embodiments, the first pass that is used to mate face object can use the position of face object and size to make.Can use for second time the graphical analysis factor to carry out, with checking or the initial searching that analysis has been done with size of additional use location.

Facial tracking analyzer can be created first group of facial tracking, and the face that can attempt subsequently connecting in the camera lens is followed the tracks of.Connect facial process of following the tracks of and to connect discontinuous tracking, but identical face possibly is shown.Connection process can select non-overlapped face to follow the tracks of and the use location is analyzed with size or the image factor analysis in any or both connect them.

In certain embodiments, can face tracking and other facial tracking the in other camera lenses be made comparisons.In such embodiment, facial tracking can use graphical analysis (such as the analysis of face structure, color histogram or other types) to come relatively, so that confirm that two facial tracking are to same people.

The system of embodiment 100 is shown to include in individual equipment 102.In many examples, various component softwares can be implemented on many different equipment.In some cases, single component software is implemented on computing machine troops.Some embodiment can use the one or more cloud computing technology that is used for these assemblies to operate.

The system of embodiment 100 can be by each client devices 132 visit.Client devices 132 can visit system through web browser or other application.In such embodiment; Equipment 102 can be realized as the web service that can in based on the system of cloud, handle video, such embodiment can through from each client computer receiver, video image, the large-scale data center, handle video image and the result who is analyzed returned to client computer and operate.

In another embodiment, other computing platforms in the operation of equipment 102 can be controlled through personal computer, server computer or user are carried out.This embodiment can use the software package that distributes and be installed on the subscriber computer to realize.

In another embodiment, the operation of equipment 102 can be implemented in video camera or other specialized equipments.In the time of in being implemented in video camera, but this camera capture video fragment, and subsequently after this is true to this video segment execution analysis.

Equipment 102 can have hardware platform 104 and component software 106.The equipment of any kind that client devices 102 can be represented to communicate with video source (such as each client devices 132, social networks website 136 or other sources).In some cases, client devices 102 can have video camera or can in client devices 102, generate other capture devices of video.

Nextport hardware component NextPort 104 can be represented the typical architecture of computing equipment, like desk-top or server computer.In certain embodiments, client device 102 can be personal computer, game console, the network equipment, interactive self-service terminal (kiosk) or other equipment.Client device 102 can also be a portable set, such as laptop computer, net book computing machine, personal digital assistant, mobile phone or other mobile devices.

Nextport hardware component NextPort 104 can comprise processor 108, RAS 110 and non-volatile memories 112.Processor 108 can be single microprocessor, polycaryon processor or one group of processor.RAS 110 can store executable code and the data that can directly visit of processor 108, and non-volatile memories 112 can be with permanent state store executable code and data.

Nextport hardware component NextPort 104 also can comprise one or more user interface facilities 114 and network interface 116.User interface facilities 114 can comprise the user interface facilities of monitor, display, keyboard, pointing device and any other type.In certain embodiments, user's interface unit can comprise camera or other video capturing devices.Network interface 116 can comprise hard-wired interface and wave point, and equipment 102 can communicate through these interfaces and other equipment.

Component software 106 can comprise the operating system 118 that various application can be carried out above that.

Video analytic system 120 can be handled video to detect facial the tracking.But video parser 122 analysis video images are so that be divided into each camera lens with this video.Each camera lens can be the frame sequence relevant with room and time.Camera lens can comprise same scene, and when the people existed, the people in the scene can move smoothly and continuously.

Face detector 124 can be analyzed each frame of camera lens so that attempt finding out the face in the frame.Face detector 124 can be analyzed each frame as still image, and can use or can not use consecutive frame to detect face.Face detector 124 can be each face and returns one group of information.This group information can be different between an embodiment and another embodiment.This group information can comprise position and the size that each is facial, and this can be facial one group of coordinate and the rectangle of face object or the size of other shapes.This group coordinate can be central point or face object one jiao.In certain embodiments, size can be used a certain other of the height and width, radius of a circle of rectangle, oval a pair of radius or size to indicate to express.

In certain embodiments, this group information can comprise the additional information that can derive from image itself.Such information can comprise color histogram, face structure characteristic or some other information of face object.Such information can be used to mate face object through comparing the similar image characteristic.

Facial tracking analyzer 126 can use from the output of face detector 124 creates the frame sequence that comprises the same facial object.Some embodiment can only compare position and size facial in the camera lens, so that the face object in the successive frame is linked at together.The information that other embodiment can use from image to be derived is found out the face object in the successive frame.

In certain embodiments, facial tracking analyzer 124 can be in video sequence forward with to the post analysis successive frame.Facial tracking analyzer 124 can with the face object in the frame on arbitrary direction of given frame with each frame group or troop and make comparisons.In such embodiment, cluster analysis or clustering algorithm can be used to marking matched.

Some embodiment can use the object track algorithm to stride a plurality of frame tracks facial objects.Some object track algorithm can be striden the possible trace that frame of video is confirmed object, follows the tracks of so that confirm.Facial tracking analyzer 126 can use various technology (such as, agglomerate is followed the tracks of, is followed the tracks of or other follow-up mechanism based on tracking, the profile of kernel) stride a plurality of frames and analyze similar face object.

Facial tracking analyzer 126 can adopt the object track algorithm to use the metadata relevant with face object.Because individual's face possibly change the characteristic in the video; Such as this people with its head from directly towards camera turn to the side camera lens, when leaving camera to face, conventional object follow-up mechanism maybe be equally effective unlike the facial tracking analyzer 126 that can use the relevant metadata of the face object created with face detector 124.

Metadata can comprise can be from various facial orientations detected face object, this possibly be unusual pictures different.Facial tracking analyzer 126 can be associated in face object together, and those of detection and checking and various object follow-up mechanism are related.

Preprocessor 128 can be attempted non-overlapped face is followed the tracks of the face tracking that is connected into than long.But preprocessor 128 use locations confirm with size analysis whether two facial tracking possibly be relevant.In certain embodiments, preprocessor 128 can use graphical analysis relatively (such as, face structure comparison or color histogram analysis) confirm coupling.

In certain embodiments; Preprocessor 128 can attempt that the most detected face object matees two facial tracking in first facial the tracking through finding out, and the most detected face object in this face object and the second facial tracking is made comparisons.Two reliable face objects can be that best face is represented in each facial tracking, and first image that the comparison between those images maybe be recently followed the tracks of from the last image and second of a tracking is more definite.

Video analytic system 120 can be connected to other equipment through network 130.Network 130 can be personal area network, LAN, wide area network, the Internet or any other network.

Each client devices 132 can have various forms of videos.Video database 134 can be the storage vault that comprises any kind of analyzable video.Client devices 132 can be personal computer or other equipment that the user possibly be uploaded to the video from various video source it.Client devices 132 can be video camera, cell phone, personal digital assistant or other video capturing devices.

In certain embodiments, but various social networks website 136 can comprise the video database 138 that user's uploaded videos supply to be shared.Social networks website 136 can be configured to video is sent to video analytic system 120, so that make video analyzed and detect each individual in the video.

In many examples, the output of video analytic system 120 can be used to attempt real individual in the sign video.Output can detect individual's face and follow the tracks of, and image matching system can be attempted real individual's name or other information are associated with the face tracking of video.Such system is not shown in embodiment 100, and only is a use scene of video analytic system 120.

Fig. 2 is the flow process diagram that the embodiment 200 of the method that is used for analysis video is shown.Embodiment 200 is simplified example of a method, and video analytic system 120 can be carried out this method so that video is resolved to camera lens, video lens carried out by the static analysis of frame and the face that uses the output of static faces analysis to create a plurality of frames of crossing over video followed the tracks of.

Other embodiment can use different orderings, additional or step still less and different name or terms to accomplish similar functions.In certain embodiments, each operation or operational set can be carried out with synchronous or asynchronous system with other operations concurrently.Selected here step is selected in order with the form of simplifying the certain operations principle to be shown.

Embodiment 200 illustrate can analysis video to create a kind of method that the face in the video is followed the tracks of.After video is split into each camera lens, can detect for static faces by analyzing each camera lens on the basis of frame.Can be used to together subsequently by the frame analysis result, so that the mobile or progress of the single face in the video is shown a plurality of frame chainings.

At frame 202, can receive the video that to analyze.The video image of any kind that video can be made up of the series or the sequence of each frame.At frame 204, can video be divided into discrete camera lens.Each camera lens can be represented single scene or one group of relevant frame.Embodiment 300 places of executable procedural example in this instructions after a while find in frame 204.

At frame 206, can analyze each camera lens.In frame 206 for each camera lens and in frame 208 for each frame of each camera lens, can be in frame 210 to each facial this frame of analyzing.The analysis of frame 210 can be in still image, to detect facial still image analysis.In frame 212,, in frame 214, confirm the size and the position of this face, in frame 215, can carry out, and in frame 216, can be the definite reliability factor of this analysis this facial graphical analysis for each detected face.In frame 218, can all analysis results be stored in the facial definition.

Facial analysis can comprise position and size definition.In certain embodiments, the position of specific face in this frame indicated with big I in the position.The size definable comprises this facial image-region.In many examples, the position can be the central point on facial border, but one jiao of other embodiment definable or other positions.Facial big I is indicated by the geometric configuration such as rectangle, square, circle, ellipse, hexagon, octagon or other shapes.In some cases, size can define by one, two, three or more value.In the typical case of rectangle, big I uses height and width to define.

Facial graphical analysis can comprise the various data that derived from image itself, such as color histogram, face structure variable or other information.Some embodiment can use graphical analysis information to come two face objects of comparison to confirm whether these two objects mate.Such coupling can be performed, so that depend on embodiment two sequence frames, two face tracking or other couplings of separating is associated.

The reliability factor of frame 216 can be statistics or other designator of degree of confidence in analyzing.Reliability factor can be indicated the degree of confidence of face detection algorithm for the portion of one side really of this object.Facial detection can be to have a large amount of variable complicated algorithms.Each algorithm can have the different mechanisms that is used to indicate reliability, the numerical fraction such as from 0 to 1 or 1 to 10, the qualitative indicator such as high, medium and low or a certain other designators.

After each face and storage face object in frame 218 in the analysis frame, in frame 220, can store frame definition through analyzing.The process of frame 206 to 220 can repeat to each frame of each camera lens.

In frame 220, can select the frame in the camera lens, in certain embodiments, the frame of frame 220 can be any frame in the camera lens.But the frame in some embodiment scanning lens is to find out the most detected face object in this camera lens.In frame 224, can select without the most detected face object of analyzing from this frame.

Use selected face object, can in frame 226, carry out facial the tracking forward, and in frame 228, in video sequence, carry out facial the tracking backward through

video sequence.Frame

226 and 228 procedural example embodiment can find in the embodiment 400 in this instructions after a while.

In frame 230, can store the result of facial trace analysis, and in frame 232, can face object be labeled as and handle.In frame 234, if there is more multiaspect portion in the present frame, then this process can be back to frame 224 so that select another face.In frame 236, if there are more frames without analysis in the camera lens, then this process can be back to frame 222 so that select another frame.

In case frame is analyzed in frame 236, then in frame 238, can carry out link analysis.Can in the embodiment 500 that this instructions appears after a while, find out the example of this operation.

Fig. 3 is the flow process diagram that the embodiment 300 of the method that is used for the camera lens in definite video is shown.Embodiment 300 is simplification examples of the method that can be carried out by video parser (such as the video parser 122 of embodiment 100).

The method of embodiment 300 illustrates an example that how video sequence is divided into discrete camera lens.Each camera lens can be the sequence of similar frame, and can have identical face-image in facial the tracking.

In frame 302, can receive the video that to analyze.In frame 304,, in frame 306, present frame can be characterized, and in frame 308, next frame can be characterized for each frame in the video.In frame 310, can the sign of these frames be compared, so that confirm whether these frames are different on statistics.In frame 310, if block is significantly not different, and the metadata that then in frame 312, can relatively be associated with these frames is so that confirm whether camera lens changes.If, then this process is not back to frame 304 to handle next frame.

If statistical study or metadata analysis are indicated camera lens or in frame 310 or in 312, changed, then new camera lens can be identified in frame 314.This process can be back to frame 304 to handle another frame.The process of embodiment 300 can continue, and all is processed up to each frame of video.

The statistical of frame 310 can compare to various statisticss or from the image institute derived information of frame.Such information can comprise other analyses of color histogram, object analysis or image.When image suddenly changed to another frame from a frame, new frame can be indicated.

The metadata analysis of frame 312 can comprise timestamp or other metadata that inspection is associated with each frame.When timestamp changed to another frame from a frame with showing, timestamp can be indicated camera operator to stop or having been restarted camera, and this indicates new camera lens.

Fig. 4 is the flow process diagram that the embodiment 400 of the method that is used for the interior face tracking of video lens is shown.Embodiment 400 is simplification examples of the method that can be carried out by facial tracking analyzer (such as the face portion component parser 126 of embodiment 100).

Embodiment 400 illustrates and can create facial a kind of method of following the tracks of.Be linked at face object sequence together in the frame sequence that facial tracking can be a video.Facial tracking can be illustrated in it and move with video lens and the same face object when changing.

In frame 402, can receive start frame and detected face object and in frame 404, can be identified at the framing on the traversal direction.Traversal direction can be to pass through video flowing forward or backward, and can use the previous and follow-up frame of start frame.

In certain embodiments, start frame can and select the most detected face object to select through the frame in the scanning lens.Facial tracking can use the most detected face object to create, and the same procedure that each follow-up facial tracking subsequently can use selection not to be placed in the facial the most detected face object of following the tracks of is as yet created.

In frame 406, can use the trace analysis that the face object in the face object in the present frame and this framing is made comparisons.The trace analysis can be attempted mating face object based on the position of face object and size.In many examples, only use location and size relatively and can be used or can the inapplicable information that derives from graphical analysis in such analysis.

If in frame 408, there is successful coupling, then in frame 414, can this framing be added in facial the tracking.

If in frame 408, there is not successful coupling, then in frame 410, can use the graphical analysis result to attempt coupling.The graphical analysis result can use color histogram, face structure analysis or use from image that face object is associated the comparison of the information that the derives other types of carrying out.If in frame 412, there is successful coupling, then this process can proceed to frame 414, and in frame 414, can add this framing.If in frame 412, there is not successful coupling, then in frame 418, can finish to follow the tracks of.

In

frame

408 or 412, find out the coupling of success and in frame 414, frame is added under the facial situation of following the tracks of, if in frame 416, there is additional frame in the camera lens, then present frame can be increased progressively in frame 420, and this process can be back in the frame 402 so that repeat.If in frame 416, there is not additional frame, then in frame 418, can finish to follow the tracks of.

Fig. 5 is the flow process diagram that the embodiment 500 that is used to link facial method of following the tracks of is shown.Embodiment 500 is simplification examples of the method that can be carried out by preprocessor (such as the preprocessor 128 of embodiment 100).

Embodiment 500 illustrates can face be followed the tracks of and is linked at together to form the long facial a kind of method of following the tracks of through camera lens.The link analysis of embodiment 500 can be attempted following the tracks of from the face of same face object and be connected into single, long face tracking.

Non-overlapped facial tracking of a part of Operations Analyst of embodiment 500, wherein non-overlapped facial the tracking is that those faces of not sharing common frame are followed the tracks of.Overlapping facial tracking in the camera lens can indicate two faces that separate in same frame, to be illustrated.Because overlapping face is followed the tracks of two faces that separate of indication, consider that therefore it can be unsuitable connecting such face tracking.

In frame 502, can detect facial the tracking.In frame 504, non-overlapped facial tracking can be to be detected on the forward and backward both direction of given frame in camera lens.Detected facial tracking can be to follow the tracks of with the given facial face of following the tracks of potential coupling.

In frame 506, the object trace that face that can more potential coupling is followed the tracks of.The object trace can use the position of face object and size to come relatively more facial the tracking.In certain embodiments, can only compare the position of face object, and other embodiment can be in trace be analyzed use location and size both.

In frame 508, can be in each facial tracking the most detected face object of selection, and in frame 510, can compare the most detected face object.Comparison in the frame 510 can use the graphical analysis result to confirm whether facial the tracking representes same face.If in frame 510, there is coupling, then facial tracking can be added to together in frame 512.If in frame 510, there is not coupling, and in frame 514, have more multiaspect portion tracking in the camera lens, then this process can be back to frame 502 to handle another facial tracking.If in frame 514, there be not more available facial the tracking, then this process can finish in frame 516.

In certain embodiments, the process of embodiment 500 can be used to link the face tracking from different camera lenses.Under these circumstances, can use embodiment 500, and need not the comparison other trace of frame 506.Such embodiment can select face object from the face of two potential couplings is followed the tracks of, and uses the graphical analysis result to confirm whether facial the tracking matees.If then can stride shot boundary and connect facial the tracking.

Fig. 6 is the diagram that the example embodiment 600 of following the tracks of from the face of single camera lens is shown.Embodiment 600 shows five frames that show two faces, and the diagram of one of following the tracks of from the face that frame sequence is derived.Embodiment 600 is examples of simplifying very much for purpose of explanation.

Frame 602,604,606,608 and 610 shows the successive frame of single video lens.In each frame is face 612 and 614, and each face all travels through these frames in order.Facial 612 move to the backstage in order and move to the right side, and face 614 moves to the front in order and moves to the right side.

After these frames of traversal, stride face that these frames link facial 612 all places and size and follow the tracks of 616 and can be generated.Facial 616 the face objects can be illustrated in it and move through successive frame time of following the tracks of.

To foregoing description of the present invention from diagram and purpose of description and appear.It is not intended to exhaustive theme or this theme is limited to disclosed precise forms, and seeing that other modifications of above-mentioned instruction and modification all are possible.Select also to describe embodiment and come to explain best principle of the present invention and practical application thereof, make others skilled in the art in the modification of the special-purpose that various embodiment and various are suitable for being conceived, utilize the present invention best thus.Appended claims is intended to be interpreted as other alternative embodiments that comprise except that the scope that limit by prior art.

Claims

1. method of at least one computer processor, carrying out, said method comprises:

Reception comprises the video (202) of frame sequence;

For at least one camera lens in the said video, analyze each frame in the said frame detecting face (212), said facial with position and size identify at least;

Create first facial the tracking through following operation:

Select first face (212) in first frame;

Analyze at least one follow-up frame of said first frame to identify said first face (226); And

Analyze at least one previous frame of said first frame to identify said first face (228), so that create said first facial follow the tracks of (230).

2. the method for claim 1 is characterized in that, at least one facial tracking of said establishment also comprises:

Sign second facial the tracking;

Confirm that said first facial the tracking comprises and the similar face of the said second facial tracking; And

Said first facial the tracking with said second facial the tracking is combined into single facial the tracking.

3. method as claimed in claim 2 is characterized in that, said second facial the tracking shared common frame with said first facial the tracking in said frame sequence.

4. method as claimed in claim 3; It is characterized in that; Confirm that said first facial the tracking comprises and said second facially follow the tracks of similar face, said second facial the tracking is to use in said first facial tracking at least one that at least one the facial graphical analysis in facial and said second facial the tracking is carried out.

5. method as claimed in claim 4 is characterized in that said graphical analysis comprises the color histogram analysis.

6. method as claimed in claim 4 is characterized in that said graphical analysis comprises the face structure analysis.

7. the method for claim 1; It is characterized in that said first face is through identifying with making comparisons from said position and the said size of said first face in said second frame from said position and the said size of said first face in first frame.

8. method as claimed in claim 7; It is characterized in that said first face is to identify through making comparisons from the said position in said position and the said size of said first face in first frame and the framing that is comprising said second frame from said first face and said size.

9. method as claimed in claim 8 is characterized in that, the said clustering algorithm that relatively uses.

10. system comprises:

Face detector (124) is used for:

Each frame of first camera lens is to identify face in the analysis video, and said face is with position and size identify at least;

Facial tracking analyzer (126) is used for:

Select first face in first frame;

Analyze said first frame at least one frame afterwards to identify said first face; And

Analyze said first frame at least one frame before to identify said first face, so that create said first facial the tracking;

Said system is performed at least one processor (108).

11. system as claimed in claim 10 is characterized in that, said facial tracking analyzer is used for:

The second facial tracking is combined to said first facial the tracking, and said first face is followed the tracks of and second face is followed the tracks of not overlapping.

12. system as claimed in claim 11 is characterized in that, said facial tracking analyzer is used for:

Said second facial the tracking is combined to said first face, first facial follow the tracks of in first camera lens and said second facial the tracking in second camera lens.

13. system as claimed in claim 10 is characterized in that, said face detector identifies the reliability factor of said first face.

14. system as claimed in claim 13 is characterized in that, said facial tracking analyzer also is used for:

Analyze said facial the tracking to confirm to comprise said first face and second frame with high reliability factor; And at least a portion of selecting said second frame is to represent said first face in said facial the tracking.

15. system as claimed in claim 10 is characterized in that, said face detector also is used for: generate the analysis of image content to said face.