CN116092173A - Target tracking method, target tracking device, electronic equipment and storage medium - Google Patents

Publication number: CN116092173A
Application number: CN202111272135.XA
Authority: CN (China)
Inventor: 徐海
Current Assignee: Beijing Xiaomi Mobile Software Co Ltd
Original Assignee: Beijing Xiaomi Mobile Software Co Ltd
Legal status: Pending
Other languages: Chinese (zh)
Prior art keywords: target, tracking, image, target object, determining

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments


Abstract

The disclosure relates to the technical field of electronic devices, and in particular to a target tracking method, a target tracking apparatus, an electronic device, and a storage medium. The target tracking method includes: in response to a tracking failure of a tracked target object during shooting, acquiring at least one frame of an image to be detected, where the image to be detected includes at least one object; performing image detection on the image to be detected to obtain object features of each object; and determining an object whose object features satisfy a preset condition as the target object, and tracking the target object. In the embodiments of the disclosure, tracking of the target object is restored automatically, without requiring the user to manually reselect the tracked object, which improves the user experience.

Description

Target tracking method, target tracking device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of electronic devices, and in particular to a target tracking method, a target tracking apparatus, an electronic device, and a storage medium.
Background
Camera target tracking means that a camera continuously tracks a target object in the viewfinder while shooting, so that the focus is always kept on the target object. The target tracking function is now widely used in shooting scenarios such as video conferencing and portrait self-shooting. However, tracking may fail during target tracking, causing the focus to be lost.
Disclosure of Invention
To solve the technical problem of tracking failure during target tracking, embodiments of the disclosure provide a target tracking method, a target tracking apparatus, an electronic device, and a storage medium.
In a first aspect, embodiments of the present disclosure provide a target tracking method, including:
in response to a tracking failure of a tracked target object during shooting, acquiring at least one frame of an image to be detected, where the image to be detected includes at least one object;
performing image detection on the image to be detected to obtain object features of each object; and
determining an object whose object features satisfy a preset condition as the target object, and tracking the target object.
In some implementations, the object features include a first action feature of the object, and determining the object whose object features satisfy the preset condition as the target object includes:
determining a first similarity between the first action feature of each object and a preset action feature; and
determining an object corresponding to a first action feature whose first similarity satisfies a first threshold condition as the target object.
In some embodiments, determining the object whose object features satisfy the preset condition as the target object includes:
acquiring at least one frame of a reference image collected before the tracking failure of the target object;
performing image detection on the reference image to obtain target features of the target object;
determining a second similarity between the object features of each object and the target features; and
determining an object corresponding to object features whose second similarity satisfies a second threshold condition as the target object.
In some embodiments, acquiring the at least one frame of the reference image collected before the tracking failure of the target object includes:
in response to the start of tracking the target object, collecting images at a plurality of moments during tracking as the reference images.
In some embodiments, the method of embodiments of the present disclosure further comprises:
acquiring first voice information of the target object collected before the tracking failure of the target object, and determining first position information of the target object according to the first voice information;
acquiring second voice information of each object collected after the tracking failure of the target object, and determining second position information of each object according to the second voice information; and
determining, as the target object, an object whose second position information satisfies a third threshold condition with respect to the first position information.
In some embodiments, the method of embodiments of the present disclosure further comprises:
acquiring user voice information collected through a microphone, and acquiring at least one frame of a target image in response to tracking start information recognized from the user voice information, where the target image includes at least one object;
performing image detection on the target image, and determining a second action feature of each object in the target image; and
determining an object corresponding to a second action feature that satisfies a preset action condition as the target object, and tracking the target object.
In some embodiments, the method of embodiments of the present disclosure further comprises:
acquiring user voice information collected through a microphone, and acquiring at least one frame of a target image in response to tracking start information and tracking target information recognized from the user voice information, where the target image includes at least one object; and
determining a target object from the at least one object in the target image according to the tracking target information, and tracking the target object.
In a second aspect, embodiments of the present disclosure provide a target tracking apparatus, comprising:
an image acquisition module configured to acquire, in response to a tracking failure of a tracked target object during shooting, at least one frame of an image to be detected, where the image to be detected includes at least one object;
an image detection module configured to perform image detection on the image to be detected to obtain object features of each object; and
a target tracking module configured to determine an object whose object features satisfy a preset condition as the target object and to track the target object.
In some implementations, the object features include a first action feature of the object, and the target tracking module is specifically configured to:
determining a first similarity between the first action feature of each object and a preset action feature;
and determining an object corresponding to the first action characteristic of which the first similarity meets a first threshold condition as the target object.
In some embodiments, the target tracking module is specifically configured to:
acquiring at least one frame of reference image acquired before the tracking failure of the target object;
performing image detection on the reference image to obtain target characteristics of the target object;
determining a second similarity of object features of each of the objects to the target features;
and determining an object corresponding to the object characteristic of which the second similarity meets a second threshold condition as the target object.
In some embodiments, the target tracking module is specifically configured to:
in response to the start of tracking the target object, collecting images at a plurality of moments during tracking as the reference images.
In some embodiments, the object tracking device of embodiments of the present disclosure further comprises:
a first acquisition module configured to acquire first voice information of the target object collected before the tracking failure of the target object, and to determine first position information of the target object according to the first voice information;
a second acquisition module configured to acquire second voice information of each object collected after the tracking failure of the target object, and to determine second position information of each object according to the second voice information; and
a first determining module configured to determine, as the target object, an object whose second position information satisfies a third threshold condition with respect to the first position information.
In some embodiments, the object tracking device of embodiments of the present disclosure further comprises:
a third acquisition module configured to acquire user voice information collected through a microphone, and to acquire at least one frame of a target image in response to tracking start information recognized from the user voice information, where the target image includes at least one object;
a second determining module configured to perform image detection on the target image and to determine a second action feature of each object in the target image; and
a third determining module configured to determine an object corresponding to a second action feature that satisfies a preset action condition as the target object and to track the target object.
In some embodiments, the object tracking device of embodiments of the present disclosure further comprises:
a fourth acquisition module configured to acquire user voice information collected through a microphone, and to acquire at least one frame of a target image in response to tracking start information and tracking target information recognized from the user voice information, where the target image includes at least one object; and
a fourth determining module configured to determine a target object from the at least one object in the target image according to the tracking target information, and to track the target object.
In a third aspect, embodiments of the present disclosure provide an electronic device, including:
an image acquisition device;
a processor; and
a memory storing computer instructions readable by the processor, the instructions, when read, causing the processor to perform the method according to any embodiment of the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a storage medium storing computer instructions for causing a computer to perform the method according to any one of the embodiments of the first aspect.
The target tracking method of the embodiments of the disclosure includes: in response to a tracking failure of a tracked target object during shooting, acquiring at least one frame of an image to be detected, where the image to be detected includes at least one object; performing image detection on the image to be detected to obtain object features of each object; and determining an object whose object features satisfy a preset condition as the target object and tracking the target object. In the embodiments of the disclosure, when tracking of the target object fails during shooting, tracking can be restored automatically from the object features, without requiring the user to manually reselect the tracked object, which improves the user experience.
Drawings
To illustrate the embodiments of the present disclosure or the prior art more clearly, the drawings required in the detailed description are briefly introduced below. The drawings described below illustrate only some embodiments of the present disclosure; a person of ordinary skill in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flow chart of a target tracking method in accordance with some embodiments of the present disclosure.
Fig. 2 is a flow chart of a target tracking method in accordance with some embodiments of the present disclosure.
Fig. 3 is a flow chart of a target tracking method in accordance with some embodiments of the present disclosure.
Fig. 4 is a flow chart of a target tracking method in accordance with some embodiments of the present disclosure.
Fig. 5 is a flow chart of a target tracking method in accordance with some embodiments of the present disclosure.
Fig. 6 is a flow chart of a target tracking method in accordance with some embodiments of the present disclosure.
Fig. 7 is a block diagram of a target tracking device in accordance with some embodiments of the present disclosure.
Fig. 8 is a block diagram of a target tracking device in accordance with some embodiments of the present disclosure.
Fig. 9 is a block diagram of a target tracking device in accordance with some embodiments of the present disclosure.
Fig. 10 is a block diagram of a target tracking device in accordance with some embodiments of the present disclosure.
Fig. 11 is a block diagram of an electronic device in accordance with some embodiments of the present disclosure.
Detailed Description
The embodiments of the present disclosure are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art without inventive effort, based on the embodiments of this disclosure, fall within its scope. In addition, the technical features of the different embodiments described below may be combined with one another as long as they do not conflict.
With the development of electronic devices, imaging systems support more and more functions, and the target tracking function is used in more and more shooting scenarios. In one example scenario, a user records a short dance video of themselves with a mobile phone and expects the camera focus to stay on the dancer throughout, that is, to perform focus tracking on the dancer. In another example scenario, a user recording a meeting or lecture expects the camera focus to stay on the presenter during recording, that is, to perform focus tracking on the presenter. The target tracking function can be used in both scenarios.
In the related art, to track and shoot a target object with the target tracking function, the user must first manually enable the camera's tracking function and then manually tap an object in the viewfinder to set it as the tracked target before shooting can begin. However, due to factors such as rapid movement or occlusion of the target object, tracking easily fails, the tracking target is lost, and the shooting result suffers. To resume tracking, the user must again manually tap an object in the viewfinder as the target object. These steps are cumbersome, and in complex shooting scenes the user must frequently resume tracking by hand to keep capturing the target. Moreover, when a single user shoots with the phone in a fixed position, it is difficult to both stay within the viewfinder and operate the phone to resume tracking. The target tracking of the related art is therefore of limited effectiveness and practicality, provides a poor user experience, and is difficult to deploy widely.
To address these defects in the related art, embodiments of the present disclosure provide a target tracking method, a target tracking apparatus, an electronic device, and a storage medium, which aim to automatically resume tracking of a target object when a tracking failure occurs during shooting, improving both the shooting result and the user experience.
In a first aspect, embodiments of the present disclosure provide a target tracking method, which may be applied to any electronic device with an image capturing function, such as a smartphone, tablet computer, wearable device, or handheld terminal; the present disclosure is not limited in this respect.
As shown in fig. 1, in some embodiments, the target tracking method of the examples of the present disclosure includes:
S110: in response to a tracking failure of a tracked target object during shooting, acquire at least one frame of an image to be detected.
S120: perform image detection on the image to be detected to obtain object features of each object.
S130: determine an object whose object features satisfy a preset condition as the target object, and track the target object.
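As one possible reading of steps S110 to S130, the recovery loop can be sketched as follows. The `detect_objects`, `extract_features`, and `satisfies_condition` callables are placeholders invented here for illustration; they stand in for whatever detector, feature extractor, and preset condition a concrete implementation uses, and are not part of the disclosure.

```python
def recover_tracking(frames, detect_objects, extract_features, satisfies_condition):
    """Sketch of S110-S130: re-acquire the target after a tracking failure.

    frames: one or more images collected after the failure (S110).
    detect_objects / extract_features / satisfies_condition: hypothetical
    callables standing in for the image-detection pipeline.
    """
    for frame in frames:                              # S110: images to be detected
        for obj in detect_objects(frame):             # S120: detect each object...
            features = extract_features(frame, obj)   # ...and extract its features
            if satisfies_condition(features):         # S130: preset condition met
                return obj                            # resume tracking this object
    return None                                      # no match in these frames
```

A caller would hand the returned object back to the camera's focus-tracking loop; if `None` is returned, further frames can be examined.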
Specifically, while the electronic device performs target tracking, its camera continuously adjusts the focus position as the target object moves, so that the camera focus stays on the target object. When the target object is temporarily occluded or moves quickly, tracking may fail and the tracking target is lost, so the focus can no longer be kept on the object the user wants to shoot.
In one example scenario, an electronic device A in a fixed position performs target tracking on a user B while recording a dance video of user B. During the dance, user B's pose and position change continuously; when user B moves quickly or is occluded, device A cannot match user B's features between consecutive frames, tracking fails, and focus can no longer be kept on user B.
In the embodiments of the disclosure, when a tracking failure of the target object is detected, one or more frames of the image to be detected can be acquired. The camera collects images at a fixed frame rate during shooting; when a tracking failure is detected at the current moment, one or more of the frames collected from that moment onward can be used as images to be detected.
When the image to be detected is a single frame, it may be the frame collected at the moment the tracking failure is detected, or a frame collected afterwards. When the image to be detected spans multiple frames, these may be consecutive frames starting from the moment the failure is detected, or non-consecutive frames collected afterwards. The present disclosure is not limited in this regard.
In addition, because the image to be detected is captured when the tracking failure is detected, it generally still contains the target object. In the dance-video scenario above, when device A loses track of user B during the dance, the images it collects still include user B.
In a multi-person scene, the image to be detected typically contains several objects. Taking a multi-person conference as an example, the scene generally contains one presenter, who is the target object, and several participants. When tracking of the presenter fails, the image collected by the electronic device contains the presenter and the participants, that is, multiple objects.
Thus, in the embodiments of the disclosure, the image to be detected contains at least one object. Of course, in a single-user scenario, if the target object moves out of the camera's viewfinder and tracking fails, the collected image may contain no object at all; in that case the camera stops the tracking function, which is not described further here.
After the image to be detected is acquired, it can be analyzed with image detection techniques to obtain the object features of each object in it.
The object features are the features of each object obtained by feature extraction on the image to be detected, and may include, for example, one or a combination of action features, facial features, and human-body features. When the image contains a single object, feature extraction yields that object's features; when it contains several objects, feature extraction yields the features of each object separately.
After the object features of each object are determined, the target object to be tracked is selected from the at least one object in the image according to those features, and tracking of the target object resumes.
In one example scenario, consider again the multi-person conference, with the presenter as the target object and action features as the object features. When the camera loses track of the presenter, the collected image to be detected contains the presenter and several participants. Feature extraction on the image yields the action features of each object, and action recognition determines that the presenter's action features satisfy the preset condition, so the presenter is identified as the target object and focus tracking of the presenter resumes.
Specifically, the preset condition may be, for example, that the recognized action is a hand wave. When the presenter notices on the conference screen that tracking has failed, the presenter can wave toward the camera; the action features extracted from the collected image are recognized as matching the preset "wave" condition, so the presenter is determined to be the target object and is tracked again. This is described in detail in the following embodiments.
In some embodiments, the object features of this disclosure are not limited to action features and may be any other suitable features, such as facial features or human-body features.
For example, when recording a dance video of user B, if the camera loses track of user B, user B cannot make a specific action such as waving without interrupting the dance. The target object in the image to be detected can instead be identified from user B's facial features and/or human-body features. This is described in detail in the following embodiments.
After the target object in the image to be detected is confirmed, focus tracking of the target object can resume and tracking shooting continues.
As described above, in the embodiments of the disclosure, when tracking of the target object fails during shooting, tracking can be restored automatically from the object features, without requiring the user to manually reselect the tracked object, which improves the user experience. Automatic recovery is particularly helpful for single-user shooting scenarios and broadens the applicability of the tracking function.
As shown in fig. 2, in some embodiments, the target tracking method of the examples of the present disclosure includes:
S210: determine a first similarity between the first action feature of each object and a preset action feature.
S220: determine an object corresponding to a first action feature whose first similarity satisfies a first threshold condition as the target object.
Specifically, take a multi-person conference scene containing 1 presenter and n participants, so that n+1 objects lie within the camera's viewfinder. When shooting the conference, the camera focus should stay on the presenter, that is, the presenter is the target object.
In some embodiments, when tracking of the presenter fails, at least one frame of the image to be detected can be acquired as in the embodiment of fig. 1; the image contains all n+1 objects. Image detection on it extracts the object features of each of the n+1 objects. In the embodiments of the disclosure, the object features may include a first action feature of the object, which represents the object's limb movement.
In one example, the first action feature may represent the object's arm movement. In the embodiments of the disclosure, the preset action can be chosen in advance for the specific scene, for example raising or waving a hand, and the preset action feature can be obtained by collecting image features of the preset action beforehand.
Each extracted first action feature is compared with the preset action feature to obtain its first similarity. The first similarity indicates how closely the object's action matches the preset action: the higher the first similarity, the more likely the object performed the preset action, and vice versa.
In the example above, feature extraction on the n+1 objects in the image to be detected yields n+1 first action features, and comparing each with the preset action feature yields n+1 first similarities.
After the first similarity of each object in the image to be detected is determined, the object corresponding to the first action feature whose first similarity satisfies the first threshold condition is determined as the target object.
In one example, the first threshold condition may be ranking highest among the first similarities. The n+1 first similarities can be sorted in descending order and the object with the highest first similarity determined as the target object. For example, if the preset action is a wave and only the presenter waves in the image to be detected, the presenter's first similarity exceeds those of the other participants, so the presenter is determined as the target object, the presenter is tracked again, and target tracking is restored.
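Concretely, if each first action feature is represented as a vector, the first similarity can be computed as, say, a cosine similarity against the preset action feature, and the first threshold condition read as "highest ranked, and above a floor". The sketch below works under those assumptions; the disclosure does not fix a particular similarity measure, and the `floor` parameter is an invented illustration of the threshold condition.

```python
import numpy as np

def first_similarity(action_feature, preset_feature):
    """Cosine similarity between one object's action feature and the preset one."""
    a = np.asarray(action_feature, dtype=float)
    p = np.asarray(preset_feature, dtype=float)
    return float(a @ p / (np.linalg.norm(a) * np.linalg.norm(p)))

def pick_target(action_features, preset_feature, floor=0.8):
    """Return the index of the object ranked highest by first similarity,
    or None if even the best match falls below the floor."""
    sims = [first_similarity(f, preset_feature) for f in action_features]
    best = int(np.argmax(sims))
    return best if sims[best] >= floor else None
```

With a stored "wave" template, the object whose extracted action feature points closest to the template is picked; the floor keeps an unrelated gesture from being mistaken for the preset action.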
It should be noted that the above examples merely illustrate the present disclosure and do not limit it. In other embodiments, the action feature may take any other suitable form and is not limited to waving or raising a hand; for example, it may be a hand gesture such as a finger heart or a V sign, which is not described further here.
As described above, in the embodiments of the disclosure, tracking shooting of the target object is resumed automatically from the object's action features, without requiring the user to manually reselect the tracked object, which improves the user experience.
Note that the embodiment of fig. 2 resumes tracking from the user's action features and therefore requires the user to actively perform a specific preset action, which is hard to apply in a dance-shooting scene. For example, when user B records a video of their own dance and tracking fails during shooting, user B is dancing and is unlikely to notice the failure in time, and cannot perform a preset action such as waving without interrupting the dance. The embodiment of fig. 3 below therefore provides automatic recovery of target tracking for this scenario.
As shown in fig. 3, in some embodiments, the target tracking method of the examples of the present disclosure includes:
S310: acquire at least one frame of a reference image collected before the tracking failure of the target object.
S320: perform image detection on the reference image to obtain target features of the target object.
S330: determine a second similarity between the object features of each object and the target features.
S340: determine an object corresponding to object features whose second similarity satisfies a second threshold condition as the target object.
Specifically, when the camera starts the target tracking function to capture an image, one or more frames of images can be collected in advance as a reference image in the tracking image capturing process. Because the position information of the target object can be continuously acquired by the camera in the target tracking process, the camera can determine the position of the target object in the reference image.
In some embodiments, the reference image may be acquired at an initial stage of enabling the tracking function to track and photograph the target object. For example, after the user enables the target tracking function and selects the target object, the camera may acquire one or more frames of images as reference images. It will be appreciated that the multi-frame reference image may be a continuous frame image or a discontinuous, spaced frame image, as this disclosure is not limited in this regard.
After the reference image is obtained, the features of the target object in the reference image are extracted by image detection to obtain the target features of the target object. The target features may include one or more of facial features, human body features, or other identifiable features of the target object. After obtaining the target features, the electronic device may store them.
When a failure in tracking the target object is detected at a certain moment, the object features of each object can be obtained from the acquired image to be detected, as in the embodiment of fig. 1. Each object feature is then compared with the target feature to obtain a second similarity between the two. The second similarity indicates how closely an object in the image to be detected resembles the target object: the higher the second similarity, the more likely the corresponding object and the target object are the same object, and vice versa.
In some embodiments, the object features may include human body features, i.e., the human body features of the target object. In one example scenario, the electronic device shoots a two-person dance video of user B and user C, where user B is the target object of target tracking, so the target features of user B can be obtained in advance from a reference image and stored. When target tracking fails, the object features of user B and of user C are obtained from the image to be detected, and the second similarity of each is then obtained by similarity comparison with the target features.
After obtaining the second similarity corresponding to each object in the image to be detected, determining the object corresponding to the object feature of which the second similarity meets the second threshold condition as the target object.
In one example, the second threshold condition may be that the similarity ranks highest among the plurality of second similarities. In the foregoing example, after the second similarities of user B and user C are obtained, they may be ranked from high to low, and the object corresponding to the highest-ranked second similarity determined as the target object. For example, if the second similarity of user B is higher than that of user C, user B can be determined as the target object, tracking of user B can begin, and target tracking is thereby recovered.
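The ranking step described above can be sketched in a few lines. This is a minimal illustration, not part of the patent: the function names, the use of cosine similarity as the second-similarity measure, and the threshold value are all assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def recover_target(target_feature, object_features, min_similarity=0.5):
    # Rank candidates by second similarity to the stored target feature
    # and return the highest-ranked candidate's id, or None if no
    # candidate clears the (assumed) threshold.
    scored = [(cosine_similarity(feat, target_feature), obj_id)
              for obj_id, feat in object_features.items()]
    scored.sort(reverse=True)
    best_score, best_id = scored[0]
    return best_id if best_score >= min_similarity else None
```

In the two-person dance example, `object_features` would hold one embedding per detected person, and a `None` result would mean tracking cannot yet be recovered from this frame.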
It is worth noting that the above examples are merely illustrative of the present disclosure, and do not limit the present disclosure. In other embodiments, the object features are not limited to the above-mentioned human body features, but may be any other form suitable for implementation, such as facial features, etc., which will not be described in detail in this disclosure.
As can be seen from the foregoing, in this embodiment of the present disclosure, the target object is determined from the object features of each object and the target features extracted from the reference image, and tracking of the target object is automatically restored. The user neither needs to manually select the tracked object nor to make a specific action, so the entire recovery process is imperceptible to the user, improving the user experience.
In some embodiments, based on the embodiment of fig. 3, the object tracking method illustrated in the present disclosure further includes:
in response to starting tracking of the target object, images at a plurality of moments in the tracking process are acquired as reference images.
Specifically, in some embodiments, the reference image includes a plurality of frames of images, and the plurality of frames of reference images are acquired images at the start of tracking the target object. In one example, when the electronic device initiates a target tracking function on a target object, multiple frames of images are acquired as reference images at fixed time intervals, e.g., 5 frames of images are acquired as reference images during the first 2 seconds of initiating the target tracking function.
In the embodiment of the disclosure, in a real shooting scene the range of motion of the target object is usually small just after tracking starts, so collecting images when the tracking function is enabled as reference images improves the accuracy of the target features. Using multiple frames at different moments as reference images expands the number of reference samples and captures the target object's features from multiple angles, avoiding inaccurate target features caused by occlusion in a single frame.
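One simple way to combine multi-frame reference features into a single target feature is to average the per-frame vectors. This is a sketch under assumptions: the patent does not specify an aggregation method, and the function name is invented for illustration.

```python
def aggregate_reference_features(frame_features):
    # Average per-frame feature vectors of the target object into one
    # target feature, reducing the effect of occlusion in any single
    # reference frame. frame_features: non-empty list of equal-length
    # numeric vectors, one per reference frame.
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(f[i] for f in frame_features) / n for i in range(dim)]
```

In practice one might instead keep all per-frame features and match against each, which preserves angle-specific appearance at the cost of more comparisons.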
In some embodiments, taking a smart phone as an example of the electronic device: to ensure good communication capability, a smart phone often has multiple microphone arrays, and the direction of a user can be located by collecting the user's voice with the microphone array. Thus, in embodiments of the present disclosure, for an electronic device with a microphone array, microphone-array positioning may be employed to assist in determining the target object, as described below in connection with the embodiment of fig. 4.
As shown in fig. 4, in some embodiments, the object tracking method of the examples of the present disclosure includes:
S410, acquiring first voice information of the target object acquired before the tracking failure of the target object, and determining first position information of the target object according to the first voice information.
S420, acquiring second voice information of each object acquired after the target object fails to track, and determining second position information of each object according to the second voice information.
S430, determining the object corresponding to the second position information that, with respect to the first position information, satisfies the third threshold condition as the target object.
Taking as an example a lecture video of user A, user B, and user C shot with the electronic device: in one example, user A is on the left side of the camera's viewfinder range, user B in the middle, and user C on the right. User A is the tracked target object and, as the presenter, lectures to user B and user C.
When the electronic device starts the tracking function to track and shoot the user A, the microphone array of the electronic device can pick up the first voice information of the user A, so that the first position information of the user A is determined to be on the left side based on the first voice information according to a positioning algorithm.
When the camera fails to track user A, the electronic device may acquire the image to be detected as in the foregoing embodiment. Meanwhile, the microphone array of the electronic device can pick up the second voice information of each object in the current period and determine the second position information of each object from it. For example, the microphone array collects two pieces of second voice information from different directions and determines that the second position information of the corresponding objects is "left side" and "right side".
After each piece of second position information is determined, it can be compared with the first position information, and the object corresponding to the second position information satisfying the third threshold condition is determined as the target object.
In one example, the third threshold condition may be that the second position information is closest to the first position information. In the foregoing example, the directions indicated by the two pieces of second position information are "left side" and "right side", and the direction indicated by the first position information is "left side". The object on the left side of the image to be detected, i.e., user A, can therefore be determined as the target object, tracking of user A can begin, and target tracking is restored.
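The "closest direction" matching can be sketched as follows, representing each position as a bearing angle in degrees. This is an illustrative assumption: the patent describes positions only as "left side"/"right side", and the angle representation, function name, and tolerance are invented for the example.

```python
def match_direction(first_angle, second_angles, max_diff=30.0):
    # first_angle: bearing of the target before tracking failed.
    # second_angles: list of (object_id, bearing) pairs located after
    # the failure. Picks the candidate closest to first_angle and
    # rejects it if the gap exceeds max_diff degrees (the third
    # threshold condition in this sketch).
    obj_id, angle = min(second_angles,
                        key=lambda item: abs(item[1] - first_angle))
    return obj_id if abs(angle - first_angle) <= max_diff else None
```

A real implementation would also need to handle angle wrap-around and moving speakers, which is why the text below treats microphone positioning as an auxiliary function.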
It should be noted that, because microphone positioning is only applicable when the object's position does not change, in the embodiment of the present disclosure microphone positioning may serve as an auxiliary function that the user can turn on or off according to the specific application scenario, improving the flexibility and accuracy of the target tracking method.
It should be noted that, as described in the related art above, when target tracking is used in the related art the target tracking function must first be turned on manually and the tracked target object then selected manually. For example, when shooting with the target tracking function of a mobile phone camera, the user first needs to tap an icon to start the camera application, then tap the target tracking function, and finally tap an object in the viewfinder to select the target object. This requires multiple taps, the operation is cumbersome, and the user experience suffers. More importantly, for a single person shooting with the phone fixed in position, it is difficult for the user to both appear in the viewfinder and operate the phone to select a target object, so the tracking start operation can only be completed with the help of a second person.
Based on the drawbacks in the related art described above, in some embodiments, as shown in fig. 5, an object tracking method of an example of the present disclosure includes:
S510, acquiring user voice information collected through a microphone, and acquiring at least one frame of target image in response to tracking start information obtained by recognizing the user voice information.
S520, performing image detection on the target image, and determining a second action characteristic of each object in the target image.
S530, determining an object corresponding to the second action characteristic meeting the preset action condition as a target object, and tracking the target object.
Taking a two-person video shoot as an example, user A and user B fix the electronic device in front of them; the electronic device shoots a video including both persons while continuously tracking user A throughout the shoot.
The electronic device can collect user voice information through the microphone, identify the user voice information, and collect at least one frame of target image through the camera when tracking start information is identified. The tracking initiation information represents information that initiates a target tracking function.
In one example, user A or user B may speak a phrase such as "start tracking camera" or "start tracking function". The electronic device collects the user voice information through the microphone, recognizes it, and obtains "start target tracking function" from it, i.e., recognizes the tracking start information, so the target tracking function can be started.
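Recognizing the tracking start information from a speech transcript can be as simple as keyword spotting. A minimal sketch, assuming the speech-to-text step has already produced a transcript; the phrase list and function name are illustrative, not from the patent.

```python
# Assumed trigger phrases mapped to the tracking start information.
START_PHRASES = ("start tracking camera", "start tracking function",
                 "start target tracking")

def contains_tracking_start(transcript):
    # Return True if the recognized speech contains any phrase that
    # maps to the tracking start information.
    text = transcript.lower()
    return any(phrase in text for phrase in START_PHRASES)
```

A production system would typically use an intent classifier rather than substring matching, but the control flow (recognize speech, detect intent, then capture the target image) is the same.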
In some embodiments, some electronic devices in the related art have an intelligent voice assistant, which the user can use to start the target tracking function. Taking the "Xiao Ai" intelligent voice assistant as an example, user A or user B may say something like "Xiao Ai, help me start the tracking shooting function" to wake up the intelligent voice assistant and start the target tracking function. For the related art of intelligent voice assistants, those skilled in the art may refer to existing knowledge, which this disclosure will not repeat.
It will be appreciated that after the target tracking function is initiated, it is also necessary to determine the target object to be tracked. In the disclosed embodiments, the target object may be determined based on the motion characteristics of the object in the target image.
Specifically, while enabling the target tracking function based on the user voice information, the electronic device captures at least one frame of target image with the camera, where the target image may include at least one object. Image detection is then performed on the target image to determine the second action feature of each object, and the object whose second action feature satisfies the preset action condition is determined as the target object.
Continuing the foregoing example, after the target tracking function is turned on by the user voice information, the target object must be determined from user A and user B in the viewfinder. The preset action condition may be a preset action, such as waving a hand. The acquired target image includes user A and user B, and the second action features of each are determined by image recognition. If the second action feature of user A represents waving a hand and that of user B represents both hands hanging naturally, it can be determined that user A's second action feature satisfies the preset action condition; user A is determined as the target object, and tracking of user A begins.
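Selecting the target from the per-object action features can be sketched as a label match. This assumes an upstream action recognizer has already reduced each second action feature to a label such as "wave"; the labels and function name are illustrative assumptions, not from the patent.

```python
def select_target_by_action(detected_actions, preset_action="wave"):
    # detected_actions: list of (object_id, action_label) pairs from
    # image detection on the target image. Returns the id of the first
    # object whose action satisfies the preset action condition, or
    # None if nobody performs the preset action.
    for obj_id, action in detected_actions:
        if action == preset_action:
            return obj_id
    return None
```

If several objects wave simultaneously, a real system would need a tie-break rule (e.g., largest bounding box), which the patent leaves open.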
As can be seen from the foregoing, in this embodiment of the present disclosure, the start of target tracking and the determination of the target object are realized by combining voice information with action features; the user does not need to manually start tracking or select the tracked object, improving the user experience.
As shown in fig. 6, in some embodiments, the object tracking method of the examples of the present disclosure includes:
S610, acquiring user voice information collected through a microphone, and acquiring at least one frame of target image in response to tracking start information and tracking target information obtained by recognizing the user voice information.
S620, determining a target object from at least one object of the target image according to the tracking target information, and tracking the target object.
Taking a two-person video shoot as an example, user A and user B fix the electronic device in front of them; the electronic device shoots a video including both persons while continuously tracking user A throughout the shoot. User A is on the left side of the viewfinder range of the electronic device's camera, and user B on the right.
The electronic device can collect user voice information through the microphone, identify the user voice information, and collect at least one frame of target image by the camera when the tracking start information and the tracking target information are identified. The tracking start information indicates information for starting a target tracking function, and the tracking target information indicates information of a target object.
Unlike the embodiment of fig. 5, in this embodiment the collected user voice information includes not only tracking start information but also tracking target information. In one example, user A or user B may say, for example, "start tracking camera and track the person on the left side of the screen" or "track the person standing on the left", so that the electronic device collects the user voice information through the microphone, recognizes it, and obtains "start target tracking function" and "target object is on the left side".
The "start target tracking function" is tracking start information, and the electronic device may start the target tracking function. The "target object is located on the left side" is tracking target information, and the electronic device can determine the user A located on the left side in the collected target image as the target object and start tracking shooting on the user A.
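Mapping the tracking target information onto a detected object can be sketched by matching a position keyword to bounding-box positions. A minimal illustration under assumptions: the keyword set, the centre-x representation, and the function name are invented for the example.

```python
def select_target_from_speech(transcript, objects):
    # transcript: recognized user voice information.
    # objects: dict mapping object_id -> bounding-box centre x (pixels).
    # Parses a position keyword and picks the matching object:
    # "left" -> smallest centre x, "right" -> largest centre x.
    text = transcript.lower()
    if "left" in text:
        return min(objects, key=objects.get)
    if "right" in text:
        return max(objects, key=objects.get)
    return None  # no tracking target information recognized
```

For the example above, "track the person on the left" with user A at x=200 and user B at x=900 selects user A; richer descriptions ("the person in red") would need attribute recognition instead of position matching.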
As can be seen from the foregoing, in this embodiment of the present disclosure, the start of target tracking and the determination of the target object are realized based on voice information; the user neither needs to manually start tracking or select the tracked object nor to make a specific action, improving the user experience.
In a second aspect, embodiments of the present disclosure provide a target tracking apparatus that may be applied to any electronic device having an image capturing function, such as a smart phone, a tablet computer, a wearable device, a handheld terminal, and the like, which is not limited by the present disclosure.
As shown in fig. 7, in some embodiments, an object tracking device of an example of the present disclosure includes:
an image acquisition module 71 configured to acquire at least one frame of image to be detected in response to a tracking failure of a tracked target object during shooting, wherein the image to be detected comprises at least one object;
an image detection module 72 configured to perform image detection on the image to be detected to obtain object features of each object;
the target tracking module 73 is configured to determine an object whose object features satisfy a preset condition as the target object, and track the target object.
As can be seen from the above, in the embodiment of the present disclosure, in the process of capturing and tracking a target object, when the target object fails to be tracked, tracking of the target object can be automatically restored through the object feature, without requiring a user to manually select the tracked object, and user experience is improved. In addition, the embodiment of the disclosure automatically resumes tracking the target object, thereby being more beneficial to shooting scenes by a single person and improving the applicability of the tracking function.
In some implementations, the object features include a first action feature of the object, and the target tracking module 73 is specifically configured to:
determine a first similarity between the first action feature of each object and a preset action feature;
and determine the object corresponding to the first action feature whose first similarity satisfies the first threshold condition as the target object.
In some embodiments, the target tracking module 73 is specifically configured to:
acquire at least one frame of reference image acquired before the tracking of the target object failed;
perform image detection on the reference image to obtain target features of the target object;
determine a second similarity between the object features of each object and the target features;
and determine the object corresponding to the object features whose second similarity satisfies the second threshold condition as the target object.
In some embodiments, the target tracking module 73 is specifically configured to:
in response to starting tracking of the target object, acquire images at a plurality of moments in the tracking process as reference images.
As shown in fig. 8, in some embodiments, the object tracking device of the embodiments of the present disclosure further includes:
a first acquisition module 81 configured to acquire first voice information of the target object collected before the tracking of the target object failed, and determine first position information of the target object according to the first voice information;
a second acquisition module 82 configured to acquire second voice information of each object collected after the tracking of the target object failed, and determine second position information of each object according to the second voice information;
the first determining module 83 is configured to determine the object corresponding to the second position information that, with respect to the first position information, satisfies the third threshold condition as the target object.
As shown in fig. 9, in some embodiments, the object tracking device of the embodiments of the present disclosure further includes:
a third acquisition module 91 configured to acquire user voice information collected through a microphone, and acquire at least one frame of target image in response to tracking start information obtained by recognizing the user voice information, the target image including at least one object;
a second determining module 92 configured to perform image detection on the target image and determine a second action feature of each object in the target image;
the third determining module 93 is configured to determine the object corresponding to the second action feature that satisfies the preset action condition as the target object, and track the target object.
As shown in fig. 10, in some embodiments, the object tracking device of the embodiments of the present disclosure further includes:
a fourth acquisition module 11 configured to acquire user voice information collected through a microphone, and acquire at least one frame of target image in response to tracking start information and tracking target information obtained by recognizing the user voice information, the target image including at least one object;
the fourth determining module 12 is configured to determine a target object from at least one object of the target image according to the tracking target information, and track the target object.
As can be seen from the above, in the embodiment of the present disclosure, in the process of capturing and tracking a target object, when the target object fails to be tracked, tracking of the target object can be automatically restored through the object feature, without requiring a user to manually select the tracked object, and user experience is improved. In addition, the embodiment of the disclosure automatically resumes tracking the target object, thereby being more beneficial to shooting scenes by a single person and improving the applicability of the tracking function.
In a third aspect, embodiments of the present disclosure provide an electronic device, including:
An image acquisition device;
a processor; and
a memory storing computer instructions readable by the processor; when the computer instructions are read, the processor executes the method according to any one of the embodiments of the first aspect.
In a fourth aspect, the disclosed embodiments provide a storage medium storing computer instructions for causing a computer to perform the method according to any one of the embodiments of the first aspect.
A block diagram of an electronic device according to some embodiments of the present disclosure is shown in fig. 11, and the principles related to the electronic device and the storage medium according to some embodiments of the present disclosure are described below with reference to fig. 11.
Referring to fig. 11, the electronic device 1800 may include one or more of the following components: a processing component 1802, a memory 1804, a power component 1806, a multimedia component 1808, an audio component 1810, an input/output (I/O) interface 1812, a sensor component 1816, and a communication component 1818.
The processing component 1802 generally controls overall operation of the electronic device 1800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 1802 may include one or more processors 1820 to execute instructions. Further, the processing component 1802 may include one or more modules that facilitate interactions between the processing component 1802 and other components. For example, the processing component 1802 may include a multimedia module to facilitate interaction between the multimedia component 1808 and the processing component 1802. As another example, the processing component 1802 may read executable instructions from a memory to implement electronic device-related functions.
The memory 1804 is configured to store various types of data to support operations at the electronic device 1800. Examples of such data include instructions for any application or method operating on the electronic device 1800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 1804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply assembly 1806 provides power to the various components of the electronic device 1800. The power components 1806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 1800.
The multimedia component 1808 includes a display screen that provides an output interface between the electronic device 1800 and the user. In some embodiments, the multimedia component 1808 includes a front-facing camera and/or a rear-facing camera. When the electronic device 1800 is in an operational mode, such as a shooting mode or a video mode, the front-facing camera and/or the rear-facing camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 1810 is configured to output and/or input audio signals. For example, the audio component 1810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 1800 is in operating modes, such as a call mode, a recording mode, and a speech recognition mode. The received audio signals may be further stored in the memory 1804 or transmitted via the communication component 1818. In some embodiments, audio component 1810 also includes a speaker for outputting audio signals.
The I/O interface 1812 provides an interface between the processing component 1802 and a peripheral interface module, which may be a keyboard, click wheel, button, or the like. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 1816 includes one or more sensors for providing status assessment of various aspects of the electronic device 1800. For example, the sensor assembly 1816 may detect the on/off state of the electronic device 1800 and the relative positioning of components such as its display and keypad; it may also detect a change in position of the electronic device 1800 or one of its components, the presence or absence of user contact with the electronic device 1800, the orientation or acceleration/deceleration of the electronic device 1800, and changes in its temperature. The sensor assembly 1816 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 1816 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1816 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1818 is configured to facilitate wired or wireless communication between the electronic device 1800 and other devices. The electronic device 1800 may access a wireless network based on a communication standard, such as Wi-Fi, 2G, 3G, 4G, 5G, or 6G, or a combination thereof. In one exemplary embodiment, the communication component 1818 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 1818 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 1800 can be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements.
It should be apparent that the above embodiments are merely examples given for clarity of illustration and are not limiting. Other variations or modifications based on the above teachings will be apparent to those of ordinary skill in the art; it is neither necessary nor possible to exhaust all embodiments here. Such obvious variations or modifications remain within the scope of the present disclosure.

Claims (10)

1. A method of tracking a target, comprising:
responding to tracking failure of a tracked target object in the shooting process, and acquiring at least one frame of image to be detected, wherein the image to be detected comprises at least one object;
performing image detection on the image to be detected to obtain object characteristics of each object;
and determining the object with the object characteristics meeting the preset condition as the target object, and tracking the target object.
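The three steps of claim 1 can be sketched as the following re-acquisition loop. This is an illustrative sketch, not the patent's implementation; the function names `detect_objects` and `matches_target` and the frame/object representations are assumptions standing in for the claimed image detection and preset condition.

```python
def reacquire_target(detect_objects, matches_target, frames):
    """Re-acquire a lost target (claim 1): scan the images to be detected,
    run image detection to get each object's features, and return the first
    object whose features satisfy the preset condition."""
    for frame in frames:                    # at least one frame of image to be detected
        for obj in detect_objects(frame):   # image detection -> object features
            if matches_target(obj):         # object features satisfy preset condition
                return obj                  # this object becomes the tracked target
    return None                             # target could not be re-acquired
```

In practice `matches_target` would be one of the concrete conditions of claims 2, 3, or 5 (action-feature similarity, reference-feature similarity, or voice-derived position).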
2. The method of claim 1, wherein the object features comprise first action features of the object; the determining the object whose object feature satisfies the preset condition as the target object includes:
determining a first similarity between the first action feature of each object and a preset action feature;
and determining an object corresponding to the first action characteristic of which the first similarity meets a first threshold condition as the target object.
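Claim 2's first-similarity test could be realized, for example, with cosine similarity between action-feature vectors. The patent does not specify a similarity measure; the choice of cosine similarity, the vector representation, and the threshold value here are illustrative assumptions.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def match_by_action(objects, preset_action, threshold=0.9):
    """Claim 2: compute the first similarity between each object's first
    action feature and the preset action feature, and return the best-matching
    object if its similarity meets the first threshold condition."""
    best = max(objects, key=lambda o: cosine_similarity(o["action"], preset_action))
    if cosine_similarity(best["action"], preset_action) >= threshold:
        return best
    return None
```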
3. The method according to claim 1, wherein determining the object whose object feature satisfies a preset condition as the target object includes:
acquiring at least one frame of reference image acquired before the tracking failure of the target object;
performing image detection on the reference image to obtain target characteristics of the target object;
determining a second similarity of object features of each of the objects to the target features;
and determining an object corresponding to the object characteristic of which the second similarity meets a second threshold condition as the target object.
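Claims 3 and 4 can be sketched as follows: the target's feature is taken from reference images captured before the tracking failure (here averaged over frames, an assumption), and the second similarity is realized as a Euclidean distance against a second threshold (also an assumption; the patent names no concrete measure).

```python
def target_feature_from_references(ref_features):
    """Claims 3-4: derive the target feature from features extracted from
    reference images captured at several moments before the tracking failure
    (here, a simple per-dimension average)."""
    n = len(ref_features)
    dim = len(ref_features[0])
    return [sum(f[i] for f in ref_features) / n for i in range(dim)]

def match_by_reference(objects, target_feature, max_dist=1.0):
    """Return the object whose feature is closest to the target feature,
    if the second similarity (distance) meets the second threshold condition."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    best = min(objects, key=lambda o: dist(o["feature"], target_feature))
    return best if dist(best["feature"], target_feature) <= max_dist else None
```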
4. A method according to claim 3, wherein said acquiring at least one frame of reference image acquired before said target object fails to track comprises:
and responding to the start of tracking the target object, and acquiring images at a plurality of moments in the tracking process as the reference images.
5. The method according to any one of claims 1 to 4, further comprising:
acquiring first voice information of the target object acquired before the target object fails to track, and determining first position information of the target object according to the first voice information;
acquiring second voice information of each object acquired after the target object fails to track, and determining second position information of each object according to the second voice information;
and determining, as the target object, an object whose second position information satisfies a third threshold condition with respect to the first position information.
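Claim 5's position comparison can be sketched as a nearest-position match under a distance threshold. How position is derived from voice (e.g., microphone-array direction of arrival) is left open by the claim; the coordinate representation and threshold below are illustrative assumptions.

```python
def match_by_position(first_pos, candidates, max_dist=0.5):
    """Claim 5: pick the object whose post-failure position (derived from its
    second voice information) is closest to the target's pre-failure position
    (from the first voice information), within the third threshold condition."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    best = min(candidates, key=lambda c: dist(c["pos"], first_pos))
    return best if dist(best["pos"], first_pos) <= max_dist else None
```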
6. The method according to any one of claims 1 to 5, further comprising:
acquiring user voice information collected through a microphone, and, in response to tracking start information obtained by recognizing the user voice information, acquiring at least one frame of a target image, wherein the target image includes at least one object;
performing image detection on the target image, and determining a second action characteristic of each object in the target image;
and determining an object corresponding to the second action characteristic meeting the preset action condition as a target object, and tracking the target object.
7. The method according to any one of claims 1 to 5, further comprising:
acquiring user voice information collected through a microphone, and, in response to tracking start information and tracking target information obtained from the user voice information, acquiring at least one frame of a target image, wherein the target image includes at least one object;
and determining a target object from at least one object of the target image according to the tracking target information, and tracking the target object.
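Claim 7's selection step can be sketched as matching the tracking target information parsed from the user's speech against the labels of detected objects. Representing the tracking target information as a label string is an illustrative assumption.

```python
def select_target(tracking_target_info, detected_objects):
    """Claim 7: choose the target object from the objects detected in the
    target image whose label matches the tracking target information
    recognized from the user voice information."""
    for obj in detected_objects:
        if obj["label"] == tracking_target_info:
            return obj
    return None
```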
8. A target tracking device, comprising:
the image acquisition module is configured to respond to tracking failure of a tracked target object in the shooting process, acquire at least one frame of image to be detected, wherein the image to be detected comprises at least one object;
an image detection module configured to perform image detection on the image to be detected to obtain the object characteristics of each object;
and the target tracking module is configured to determine an object with the object characteristics meeting preset conditions as the target object and track the target object.
9. An electronic device, comprising:
an image acquisition device;
a processor; and
a memory storing computer instructions readable by the processor, wherein the computer instructions, when read by the processor, cause the processor to perform the method of any one of claims 1 to 7.
10. A storage medium having stored thereon computer instructions for causing a computer to perform the method according to any one of claims 1 to 7.
CN202111272135.XA 2021-10-29 2021-10-29 Target tracking method, target tracking device, electronic equipment and storage medium Pending CN116092173A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111272135.XA CN116092173A (en) 2021-10-29 2021-10-29 Target tracking method, target tracking device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN116092173A true CN116092173A (en) 2023-05-09

Family

ID=86185434


Country Status (1)

Country Link
CN (1) CN116092173A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination