CN109241345A

CN109241345A - Video locating method and device based on face recognition

Info

Publication number: CN109241345A
Application number: CN201811178561.5A
Authority: CN
Inventors: 李元朋
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2018-10-10
Filing date: 2018-10-10
Publication date: 2019-01-18
Anticipated expiration: 2038-10-10
Also published as: CN109241345B

Abstract

An embodiment of the present invention provides a video locating method and device based on face recognition. The method comprises: performing face recognition on each frame of a video to obtain the frames that include a face image; performing target tracking on the frames that include a face image, and taking the frames that include the face image of the same person as one set; selecting multiple frames from each set; and comparing the face image of a target person with the face images in the frames selected from each set, to determine the position at which the target person appears in the video. Embodiments of the present invention can quickly and accurately identify the frames of a video in which face images appear and, in a targeted manner, identify the frames in which the face image of the target person appears. This helps assist video editing and optimize video processing.

Description

Video locating method and device based on face recognition

Technical field

The present invention relates to the technical field of face recognition, and in particular to a video locating method and device based on face recognition.

Background

A video may include a variety of persons. These persons may appear at some times during playback of the video and not at others. If certain person images need to be cut out of the video, an editor has to check the video manually and, on finding a person image that needs to be cut, record information such as the time at which that person image appears, and then perform the cutting operation.

Manually checking the person images in a video is time-consuming, and if some scene transitions in the video are too fast, person images that need to be cut are easily missed.

Summary of the invention

Embodiments of the present invention provide a video locating method and device based on face recognition, to solve one or more technical problems in the prior art.

In a first aspect, an embodiment of the present invention provides a video locating method based on face recognition, comprising:

performing face recognition on each frame of a video to obtain the frames that include a face image;

performing target tracking on the frames that include a face image, and taking the frames that include the face image of the same person as one set;

selecting multiple frames from each set; and

comparing the face image of a target person with the face images in the frames selected from each set, to determine the position at which the target person appears in the video.

In one embodiment, performing target tracking on the frames that include a face image comprises:

performing target tracking on the frames that include a face image using a kernelized correlation filter algorithm.

In one embodiment, performing target tracking on the frames that include a face image using the kernelized correlation filter algorithm comprises:

detecting the position of the face image in each frame;

calculating the position offset of the face image between adjacent frames; and

if the position offset is less than a set threshold, determining that the adjacent frames are frames that include the face image of the same person.

In one embodiment, calculating the position offset of the face image between adjacent frames comprises:

if stretching or scaling occurs between adjacent frames, aligning the coordinates of the adjacent frames and then calculating the position offset of the face image between the adjacent frames.

In one embodiment, selecting multiple frames from each set comprises:

selecting multiple frames from a set according to the sharpness and/or resolution of the frames included in that set.

In one embodiment, comparing the face image of the target person with the face images in the frames selected from each set, to determine the position at which the target person appears in the video, comprises:

calculating the similarity between the face image of the target person and the face images in the frames selected from each set;

if the similarity between the face image of the target person and the face images in the frames selected from a set is greater than a set threshold, determining that the target person appears in that set of the video; and

obtaining the frame numbers and/or corresponding playback times of the frames included in the set in which the target person appears.

In a second aspect, an embodiment of the present invention provides a video locating device based on face recognition, comprising:

a face recognition module, configured to perform face recognition on each frame of a video to obtain the frames that include a face image;

a target tracking module, configured to perform target tracking on the frames that include a face image, and to take the frames that include the face image of the same person as one set;

a selection module, configured to select multiple frames from each set; and

a locating module, configured to compare the face image of a target person with the face images in the frames selected from each set, to determine the position at which the target person appears in the video.

In one embodiment, the target tracking module is further configured to perform target tracking on the frames that include a face image using a kernelized correlation filter algorithm.

In one embodiment, the target tracking module is further configured to: detect the position of the face image in each frame; calculate the position offset of the face image between adjacent frames; and, if the position offset is less than a set threshold, determine that the adjacent frames are frames that include the face image of the same person.

In one embodiment, the target tracking module is further configured to, if stretching or scaling occurs between adjacent frames, align the coordinates of the adjacent frames and then calculate the position offset of the face image between the adjacent frames.

In one embodiment, the selection module is further configured to select multiple frames from a set according to the sharpness and/or resolution of the frames included in that set.

In one embodiment, the locating module is further configured to: calculate the similarity between the face image of the target person and the face images in the frames selected from each set; if the similarity between the face image of the target person and the face images in the frames selected from a set is greater than a set threshold, determine that the target person appears in that set of the video; and obtain the frame numbers and/or corresponding playback times of the frames included in the set in which the target person appears.

In a third aspect, an embodiment of the present invention provides a video locating device based on face recognition. The functions of the device may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.

In one possible design, the structure of the device includes a processor and a memory. The memory is configured to store a program that supports the device in executing the above video locating method based on face recognition, and the processor is configured to execute the program stored in the memory. The device may further include a communication interface for communicating with other devices or a communication network.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium for storing computer software instructions used by the video locating device based on face recognition, including a program for executing the above video locating method based on face recognition.

One of the above technical solutions has the following advantages or beneficial effects: the frames of a video in which face images appear can be identified quickly and accurately, and the frames in which the face image of the target person appears can be identified in a targeted manner. This helps assist video editing and optimize video processing.

The above summary is for the purpose of description only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present invention will be readily apparent by reference to the drawings and the following detailed description.

Brief description of the drawings

In the drawings, unless otherwise specified, the same reference numerals denote the same or similar components or elements throughout the several drawings. The drawings are not necessarily drawn to scale. It should be understood that these drawings depict only some embodiments disclosed according to the present invention and should not be regarded as limiting the scope of the present invention.

Fig. 1 shows a flowchart of a video locating method based on face recognition according to an embodiment of the present invention.

Fig. 2 shows a flowchart of a video locating method based on face recognition according to an embodiment of the present invention.

Fig. 3 shows a flowchart of a video locating method based on face recognition according to an embodiment of the present invention.

Fig. 4 shows a structural block diagram of a video locating device based on face recognition according to an embodiment of the present invention.

Fig. 5 shows a structural block diagram of a video locating device based on face recognition according to an embodiment of the present invention.

Detailed description

Hereinafter, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments may be modified in various different ways without departing from the spirit or scope of the present invention. Accordingly, the drawings and description are to be regarded as illustrative in nature and not restrictive.

Fig. 1 shows a flowchart of a video locating method based on face recognition according to an embodiment of the present invention. As shown in Fig. 1, the video locating method based on face recognition may include:

Step S11: performing face recognition on each frame of the video to obtain the frames that include a face image.

Step S12: performing target tracking on the frames that include a face image, and taking the frames that include the face image of the same person as one set.

Step S13: selecting multiple frames from each set.

Step S14: comparing the face image of a target person with the face images in the frames selected from each set, to determine the position at which the target person appears in the video.
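
As a reading aid, steps S11 to S14 can be sketched as one function whose four stages are passed in as callables. This is a minimal sketch and not the implementation of the embodiment: the names `locate_target`, `has_face`, `group_by_person`, `choose`, `is_target`, and `split_on_gaps` are hypothetical, and the toy video at the end stands in for real frames.

```python
from typing import Callable, List, Sequence


def locate_target(
    frames: Sequence[object],
    has_face: Callable[[object], bool],                       # S11, hypothetical detector
    group_by_person: Callable[[List[int]], List[List[int]]],  # S12, hypothetical tracker
    choose: Callable[[List[int]], List[int]],                 # S13, hypothetical selector
    is_target: Callable[[object], bool],                      # S14, hypothetical matcher
) -> List[List[int]]:
    """Return the sets of frame indices in which the target person appears."""
    # S11: keep only the indices of frames that contain a face.
    face_indices = [i for i, frame in enumerate(frames) if has_face(frame)]
    # S12: group those indices into one set per tracked person.
    sets = group_by_person(face_indices)
    hits = []
    for frame_set in sets:
        # S13: pick a few representative frames from the set.
        sample = choose(frame_set)
        # S14: the set matches when every sampled frame shows the target.
        if sample and all(is_target(frames[i]) for i in sample):
            hits.append(frame_set)
    return hits


def split_on_gaps(indices: List[int]) -> List[List[int]]:
    """Toy tracker for the demo: consecutive indices form one set."""
    sets: List[List[int]] = []
    for i in indices:
        if sets and i == sets[-1][-1] + 1:
            sets[-1].append(i)
        else:
            sets.append([i])
    return sets


# Toy video: each "frame" is the name of the person shown ("" means no face).
video = ["", "A", "A", "", "B", "B"]
result = locate_target(
    video,
    has_face=bool,
    group_by_person=split_on_gaps,
    choose=lambda s: s[:1],
    is_target=lambda frame: frame == "A",
)
print(result)  # [[1, 2]]
```

Passing the stages in as callables keeps the sketch independent of any particular face detector, tracker, or face matcher.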

In general, a video includes a number of frames, and each frame has a corresponding frame number. Each frame usually also has a corresponding playback time in the video. Some frames show scenery and some show persons. Among the frames that show persons, some may contain one face image and others may contain several.

In embodiments of the present invention, face recognition may be performed on all frames of the video. Alternatively, face recognition may first be performed on a portion of the frames, for example the first 10%, and then on the subsequent frames segment by segment. In this way, the frames of the video that include a face image can be picked out.
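
The segment-by-segment option can be illustrated by splitting the frame indices into ranges. The sketch below assumes, purely for illustration, that every later segment has the same size as the first one; the text does not fix the segment size, and the name `segments` is hypothetical.

```python
from typing import Iterator


def segments(total_frames: int, first_fraction: float = 0.10) -> Iterator[range]:
    """Yield index ranges: the first covers `first_fraction` of the video,
    and the rest of the video is covered in segments of the same size."""
    size = max(1, int(total_frames * first_fraction))
    for start in range(0, total_frames, size):
        yield range(start, min(start + size, total_frames))


segs = list(segments(95))
print(segs[0], segs[-1])  # range(0, 9) range(90, 95)
```

Each yielded range can then be handed to the face recognition step in turn.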

In one embodiment, in step S12, performing target tracking on the frames that include a face image comprises: performing target tracking on the frames that include a face image using the KCF (Kernelized Correlation Filter) algorithm. The frames that include the face image of the same person can then be taken as one set, and multiple sets may be obtained.

For example, if frames 010 to 030 include the face image of person A and frames 040 to 050 include the face image of person B, frames 010 to 030 are taken as one set S1 and frames 040 to 050 as another set S2.

Face images of several persons may appear in one frame, so the same frame may belong to different sets. For example, if frames 010 to 030 include the face image of person A and frames 020 to 050 include the face image of person C, frames 010 to 030 are taken as one set S1 and frames 020 to 050 as another set S3.
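
The overlapping sets S1 and S3 from the example above can be represented as frame-number ranges, with a reverse index showing which sets a given frame belongs to. This is only an illustrative data layout; the dictionary names are hypothetical.

```python
from collections import defaultdict

# Frame-number ranges from the example above (end points inclusive).
sets = {"S1": range(10, 31), "S3": range(20, 51)}

# Reverse index: frame number -> names of the sets that contain it.
frame_to_sets = defaultdict(list)
for name, frame_numbers in sets.items():
    for n in frame_numbers:
        frame_to_sets[n].append(name)

print(frame_to_sets[15], frame_to_sets[25], frame_to_sets[40])
# ['S1'] ['S1', 'S3'] ['S3']
```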

In one embodiment, as shown in Fig. 2, performing target tracking on the frames that include a face image using the kernelized correlation filter algorithm comprises:

Step S21: detecting the position of the face image in each frame.

Step S22: calculating the position offset of the face image between adjacent frames.

Step S23: if the position offset is less than a set threshold, determining that the adjacent frames are frames that include the face image of the same person.

In embodiments of the present invention, whether the face images included in the frames belong to the same person can be determined from the position offset of the face image between frames. The position at which the same person appears usually does not shift much between adjacent frames. The position offset of the face image between adjacent frames can therefore be calculated. If the offset is less than a certain threshold, the adjacent frames can be determined to include the face image of the same person.
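
The grouping rule of steps S21 to S23 can be sketched as follows. This is not an implementation of KCF tracking; it only illustrates the offset-threshold rule, under the simplifying assumptions that each frame contains a single face and that the face centre has already been detected. The names `Detection` and `group_adjacent` are hypothetical.

```python
import math
from typing import List, Tuple

# (frame number, (x, y) centre of the detected face): the output of step S21.
Detection = Tuple[int, Tuple[float, float]]


def group_adjacent(detections: List[Detection], threshold: float) -> List[List[int]]:
    """Group frame numbers into sets: adjacent frames whose face centres
    moved less than `threshold` pixels are treated as the same person."""
    sets: List[List[int]] = []
    prev = None
    for frame_no, centre in sorted(detections):
        same_person = (
            prev is not None
            and frame_no == prev[0] + 1                  # frames are adjacent
            and math.dist(centre, prev[1]) < threshold   # S22 and S23
        )
        if same_person:
            sets[-1].append(frame_no)
        else:
            sets.append([frame_no])
        prev = (frame_no, centre)
    return sets


detections = [
    (10, (100.0, 100.0)), (11, (102.0, 101.0)), (12, (104.0, 101.0)),
    (13, (300.0, 200.0)), (14, (301.0, 202.0)),
]
print(group_adjacent(detections, threshold=20.0))  # [[10, 11, 12], [13, 14]]
```

Handling several faces per frame would require matching each face to the nearest face in the previous frame, which is left out here.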

In one example, calculating the position offset of the face image between adjacent frames may include: calculating the difference or distance between the centre point coordinates of the face image in the adjacent frames. For example, if the centre coordinate of the face image in frame F1 is (x1, y1) and the centre coordinate of the face image in frame F2 is (x2, y2), the difference between the two may be (x2-x1, y2-y1). The distance between the two may be the Euclidean distance, the cosine distance, or the like.

In another example, calculating the position offset of the face image between adjacent frames may include: calculating the difference between the centre point coordinates of the face image in the adjacent frames, and then calculating the ratio of that difference to the frame size. For example, if the centre coordinate of the face image in frame F1 is (x1, y1) and the centre coordinate of the face image in frame F2 is (x2, y2), the difference between the two may be (x2-x1, y2-y1). If the length of the frame is x and its width is y, the ratios (x2-x1)/x and (y2-y1)/y can be calculated and used as the offset.
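
A short sketch of the ratio-based offset described above. The function name `normalized_offset` and the sample coordinates are illustrative assumptions.

```python
from typing import Tuple


def normalized_offset(
    c1: Tuple[float, float],
    c2: Tuple[float, float],
    frame_length: float,
    frame_width: float,
) -> Tuple[float, float]:
    """Return ((x2-x1)/x, (y2-y1)/y) for face centres c1 in F1 and c2 in F2."""
    dx = c2[0] - c1[0]
    dy = c2[1] - c1[1]
    return dx / frame_length, dy / frame_width


rx, ry = normalized_offset((100, 50), (196, 77), frame_length=1920, frame_width=1080)
print(rx, ry)  # 0.05 0.025
```

Using ratios rather than raw pixel differences lets the same threshold be applied to videos of different resolutions.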

Accordingly, the offset threshold can be set to match the way the offset is calculated, for example a threshold on the length difference, a threshold on the width difference, a threshold on the Euclidean distance, or a ratio threshold.

During video shooting, the lens may be stretched or zoomed, so that the same face image appears enlarged or reduced. For example, if frames F1 and F2 are adjacent frames and F2 is stretched relative to F1, a coordinate conversion can first be applied to F2 according to the stretch ratio, and the position offset of the face image between F2 and F1 is then compared.
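
The coordinate conversion can be illustrated under one specific assumption that the text does not state: the zoom is uniform, its ratio is known, and it is centred on the middle of the frame. The name `align_to_previous` is hypothetical.

```python
from typing import Tuple


def align_to_previous(
    point: Tuple[float, float],
    scale: float,
    frame_size: Tuple[float, float],
) -> Tuple[float, float]:
    """Map a point in the zoomed frame F2 back into the coordinates of F1,
    assuming F2 is F1 scaled by `scale` about the frame centre."""
    cx, cy = frame_size[0] / 2, frame_size[1] / 2
    return (cx + (point[0] - cx) / scale, cy + (point[1] - cy) / scale)


# A face centred at (1000, 560) in F1 appears at (1040, 580) after a 2x zoom.
aligned = align_to_previous((1040.0, 580.0), scale=2.0, frame_size=(1920.0, 1080.0))
print(aligned)  # (1000.0, 560.0)
```

After alignment, the offset between the aligned point and the face position in F1 can be compared against the threshold as before.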

In one embodiment, in step S13, selecting multiple frames from each set comprises: selecting multiple frames from a set according to the sharpness and/or resolution of the frames included in that set.

For example, if a set includes 20 frames, several frames of high image quality, such as high resolution and high sharpness, can be selected from these 20 frames. This helps obtain an accurate result when the target person is subsequently identified.
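
One possible way to rank the frames of a set is sketched below. It assumes that a sharpness score has already been computed for each frame by some means (the text does not specify a measure), and it ranks by resolution first and sharpness second, which is an illustrative choice. `FrameInfo` and `choose_frames` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FrameInfo:
    frame_no: int
    width: int
    height: int
    sharpness: float  # hypothetical precomputed score, higher means sharper


def choose_frames(frames: List[FrameInfo], k: int = 3) -> List[FrameInfo]:
    """Return the k frames with the highest resolution, ties broken by sharpness."""
    ranked = sorted(
        frames,
        key=lambda f: (f.width * f.height, f.sharpness),
        reverse=True,
    )
    return ranked[:k]


candidates = [
    FrameInfo(1, 1920, 1080, 0.20),
    FrameInfo(2, 1920, 1080, 0.90),
    FrameInfo(3, 1280, 720, 0.95),
    FrameInfo(4, 1920, 1080, 0.60),
]
chosen = choose_frames(candidates, k=2)
print([f.frame_no for f in chosen])  # [2, 4]
```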

In one embodiment, as shown in Fig. 3, step S14, comparing the face image of the target person with the face images in the frames selected from each set to determine the position at which the target person appears in the video, comprises:

Step S31: calculating the similarity between the face image of the target person and the face images in the frames selected from each set.

Step S32: if the similarity between the face image of the target person and the face images in the frames selected from a set is greater than a set threshold, determining that the target person appears in that set of the video.

Step S33: obtaining the frame numbers and/or corresponding playback times of the frames included in the set in which the target person appears.

If all of the frames selected from a set include the face image of the target person, it can be determined that the frames in that set most probably belong to the target person. The face image of the target person may be provided by the user, or face images of certain persons may be pre-stored in a database, for example photos of blacklisted persons whose playback must be prohibited. When a video needs processing such as editing, the face images of one or more target persons can be retrieved from the database to enable real-time comparison.

For example, suppose 100 frames of the video that include a face region show the face image of the same person; these 100 frames are taken as one set. Several frames of high image quality (high resolution, high sharpness) can be selected from this set, and the face images in the selected frames are compared with the face image of the target person for similarity. If the similarity is high, it can be determined that these 100 frames include the target person.
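
The comparison in steps S31 and S32 can be sketched as follows, assuming that each face image has already been converted to a numeric feature vector by some face recognition model. The text does not name a similarity measure; cosine similarity and the threshold value 0.8 are used here only as examples, and `cosine_similarity` and `set_contains_target` are hypothetical names.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def set_contains_target(
    target: Sequence[float],
    chosen_faces: Sequence[Sequence[float]],
    threshold: float = 0.8,  # illustrative value, not taken from the text
) -> bool:
    """S32: the set matches when every chosen face exceeds the threshold."""
    return bool(chosen_faces) and all(
        cosine_similarity(target, face) > threshold for face in chosen_faces
    )


target_vec = (1.0, 0.0)
print(set_contains_target(target_vec, [(0.9, 0.1), (1.0, 0.05)]))  # True
print(set_contains_target(target_vec, [(0.0, 1.0)]))               # False
```

Requiring every selected frame to match follows the statement above that all selected frames of a set should include the target person's face image.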

The frame numbers of the frames that contain the target person can then be output. The playback times in the video that correspond to these frame numbers can also be output.
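
Converting frame numbers to playback times can be sketched under the assumptions that the video has a constant frame rate and that frame 0 is shown at time 0. The frame rate of 25 frames per second and the names `frame_to_seconds` and `playing_span` are illustrative only.

```python
from typing import Iterable, Tuple


def frame_to_seconds(frame_no: int, fps: float = 25.0) -> float:
    """Playback time of a frame, assuming constant fps and frame 0 at t = 0."""
    return frame_no / fps


def playing_span(frame_numbers: Iterable[int], fps: float = 25.0) -> Tuple[float, float]:
    """Start and end playback times, in seconds, covered by a set of frames."""
    numbers = list(frame_numbers)
    return frame_to_seconds(min(numbers), fps), frame_to_seconds(max(numbers), fps)


print(playing_span(range(10, 31)))  # (0.4, 1.2)
```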

In one application example, in the post-production of a video, face recognition is performed on the video to obtain the frame numbers or times at which a target person appears. These frames can then be located and the target person in them processed, for example by deleting the frames or applying a mosaic to the target person in them.
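
The mosaic processing mentioned above can be illustrated on a small grayscale image stored as a list of rows. This is a minimal sketch that averages each block inside a given box; the function name `mosaic`, the box format, and the block size are assumptions, and a production tool would operate on real image buffers.

```python
from typing import List, Tuple


def mosaic(
    image: List[List[int]],
    box: Tuple[int, int, int, int],
    block: int = 2,
) -> List[List[int]]:
    """Return a copy of `image` in which the region `box` = (x0, y0, x1, y1),
    with x1 and y1 exclusive, is replaced by block-wise average values."""
    x0, y0, x1, y1 = box
    out = [row[:] for row in image]  # leave the input untouched
    for by in range(y0, y1, block):
        for bx in range(x0, x1, block):
            ys = range(by, min(by + block, y1))
            xs = range(bx, min(bx + block, x1))
            avg = sum(image[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = avg
    return out


picture = [
    [0, 10, 20, 30],
    [40, 50, 60, 70],
    [80, 90, 100, 110],
    [120, 130, 140, 150],
]
blurred = mosaic(picture, box=(0, 0, 2, 2), block=2)
print(blurred[0])  # [25, 25, 20, 30]
```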

With embodiments of the present invention, the frames of a video in which face images appear can be identified quickly and accurately, and the frames in which the face image of the target person appears can be identified in a targeted manner. This helps assist video editing and optimize video processing.

Embodiments of the present invention support both automatic tracking of persons in a video using a database and tracking of person images uploaded by the user. In addition, by identifying the persons in a video and locating the number of times a person appears in the video and the position of each appearance, specific persons can conveniently be processed in the post-production of the video.

Fig. 4 shows a structural block diagram of a video locating device based on face recognition according to an embodiment of the present invention. As shown in Fig. 4, the device may include:

a face recognition module 41, configured to perform face recognition on each frame of a video to obtain the frames that include a face image;

a target tracking module 42, configured to perform target tracking on the frames that include a face image, and to take the frames that include the face image of the same person as one set;

a selection module 43, configured to select multiple frames from each set; and

a locating module 44, configured to compare the face image of a target person with the face images in the frames selected from each set, to determine the position at which the target person appears in the video.

In one embodiment, the target tracking module 42 is further configured to perform target tracking on the frames that include a face image using a kernelized correlation filter algorithm.

In one embodiment, the target tracking module 42 is further configured to: detect the position of the face image in each frame; calculate the position offset of the face image between adjacent frames; and, if the position offset is less than a set threshold, determine that the adjacent frames are frames that include the face image of the same person.

In one embodiment, the target tracking module 42 is further configured to, if stretching or scaling occurs between adjacent frames, align the coordinates of the adjacent frames and then calculate the position offset of the face image between the adjacent frames.

In one embodiment, the selection module 43 is further configured to select multiple frames from a set according to the sharpness and/or resolution of the frames included in that set.

In one embodiment, the locating module 44 is further configured to: calculate the similarity between the face image of the target person and the face images in the frames selected from each set; if the similarity between the face image of the target person and the face images in the frames selected from a set is greater than a set threshold, determine that the target person appears in that set of the video; and obtain the frame numbers and/or corresponding playback times of the frames included in the set in which the target person appears.

For the functions of the modules in the devices of the embodiments of the present invention, reference may be made to the corresponding description of the above method, which is not repeated here.

Fig. 5 shows a structural block diagram of a video locating device based on face recognition according to an embodiment of the present invention. As shown in Fig. 5, the device includes a memory 910 and a processor 920, and the memory 910 stores a computer program executable on the processor 920. When executing the computer program, the processor 920 implements the video locating method based on face recognition of the above embodiments. There may be one or more memories 910 and one or more processors 920.

The device further includes:

a communication interface 930, configured to communicate with external devices for data exchange.

The memory 910 may include high-speed RAM and may also include non-volatile memory, for example at least one magnetic disk memory.

If the memory 910, the processor 920, and the communication interface 930 are implemented independently, they can be connected to one another by a bus and communicate with one another over it. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is shown in Fig. 5, but this does not mean that there is only one bus or one type of bus.

Optionally, in a specific implementation, if the memory 910, the processor 920, and the communication interface 930 are integrated on one chip, they can communicate with one another through an internal interface.

An embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the methods of the above embodiments.

In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those different embodiments or examples.

In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality of" means two or more, unless otherwise specifically and clearly defined.

Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.

The logic and/or steps represented in a flowchart or otherwise described herein, which may for example be regarded as an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, device, or apparatus (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, device, or apparatus and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, device, or apparatus. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that the parts of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques known in the art: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.

Those of ordinary skill in the art will understand that all or some of the steps of the above embodiment methods may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.

The above are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any person skilled in the art can readily conceive of various changes or replacements within the technical scope disclosed by the present invention, and these shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims

1. A video locating method based on face recognition, characterized by comprising:

performing face recognition on each frame of a video to obtain the frames that include a face image;

performing target tracking on the frames that include a face image, and taking the frames that include the face image of the same person as one set;

selecting multiple frames from each set; and

comparing the face image of a target person with the face images in the frames selected from each set, to determine the position at which the target person appears in the video.

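2. The method according to claim 1, characterized in that performing target tracking on the frames that include a face image comprises:

performing target tracking on the frames that include a face image using a kernelized correlation filter algorithm.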

3. The method according to claim 2, characterized in that performing target tracking on the frames that include a face image using the kernelized correlation filter algorithm comprises:

detecting the position of the face image in each frame;

calculating the position offset of the face image between adjacent frames; and

if the position offset is less than a set threshold, determining that the adjacent frames are frames that include the face image of the same person.

4. The method according to claim 3, characterized in that calculating the position offset of the face image between adjacent frames comprises:

if stretching or scaling occurs between adjacent frames, aligning the coordinates of the adjacent frames and then calculating the position offset of the face image between the adjacent frames.

5. The method according to any one of claims 1 to 4, characterized in that selecting multiple frames from each set comprises:

selecting multiple frames from a set according to the sharpness and/or resolution of the frames included in that set.

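6. The method according to any one of claims 1 to 4, characterized in that comparing the face image of the target person with the face images in the frames selected from each set, to determine the position at which the target person appears in the video, comprises:

calculating the similarity between the face image of the target person and the face images in the frames selected from each set;

if the similarity between the face image of the target person and the face images in the frames selected from a set is greater than a set threshold, determining that the target person appears in that set of the video; and

obtaining the frame numbers and/or corresponding playback times of the frames included in the set in which the target person appears.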

7. A video locating device based on face recognition, characterized by comprising:

a face recognition module, configured to perform face recognition on each frame of a video to obtain the frames that include a face image;

a target tracking module, configured to perform target tracking on the frames that include a face image, and to take the frames that include the face image of the same person as one set;

a selection module, configured to select multiple frames from each set; and

a locating module, configured to compare the face image of a target person with the face images in the frames selected from each set, to determine the position at which the target person appears in the video.

8. The device according to claim 7, characterized in that the target tracking module is further configured to perform target tracking on the frames that include a face image using a kernelized correlation filter algorithm.

9. The device according to claim 8, characterized in that the target tracking module is further configured to: detect the position of the face image in each frame; calculate the position offset of the face image between adjacent frames; and, if the position offset is less than a set threshold, determine that the adjacent frames are frames that include the face image of the same person.

10. The device according to claim 9, characterized in that the target tracking module is further configured to, if stretching or scaling occurs between adjacent frames, align the coordinates of the adjacent frames and then calculate the position offset of the face image between the adjacent frames.

11. The device according to any one of claims 7 to 10, characterized in that the selection module is further configured to select multiple frames from a set according to the sharpness and/or resolution of the frames included in that set.

12. The device according to any one of claims 7 to 10, characterized in that the locating module is further configured to: calculate the similarity between the face image of the target person and the face images in the frames selected from each set; if the similarity between the face image of the target person and the face images in the frames selected from a set is greater than a set threshold, determine that the target person appears in that set of the video; and obtain the frame numbers and/or corresponding playback times of the frames included in the set in which the target person appears.

13. A video locating device based on face recognition, characterized by comprising:

one or more processors; and

a storage device, configured to store one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 6.

14. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 6.