CN114449155A - Camera device-based pan-tilt control method and pan-tilt control device

Camera device-based pan-tilt control method and pan-tilt control device

Info

Publication number
CN114449155A
CN114449155A (application CN202011201537.6A)
Authority
CN
China
Prior art keywords
image
coordinate difference
tracking target
face
information
Prior art date
Legal status
Pending
Application number
CN202011201537.6A
Other languages
Chinese (zh)
Inventor
翟新刚
张楠赓
Current Assignee
Canaan Bright Sight Co Ltd
Original Assignee
Canaan Bright Sight Co Ltd
Priority date: 2020-11-02
Filing date: 2020-11-02
Publication date: 2022-05-06
Application filed by Canaan Bright Sight Co Ltd
Priority to CN202011201537.6A
Publication of CN114449155A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/61 Control of cameras or camera modules based on recognised objects
    • H04N23/611 Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
    • H04N23/695 Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a camera device-based pan-tilt control method and a pan-tilt control device. The scheme is as follows: an image is acquired with a first camera device arranged on a pan-tilt head, the pan-tilt head being connected with a second camera device; a tracking target is detected in the image and its feature information is extracted; when the similarity between the feature information of the tracking target and pre-stored feature information is greater than a threshold, the coordinate difference between the tracking target and the image center position is calculated; and the pan-tilt head is controlled to move according to the coordinate difference so that the second camera device shoots the tracking target. Throughout the control process, no application needs to be downloaded and no Bluetooth connection between the pan-tilt head and the second camera device is required; the method is compatible with various live-streaming platforms and improves the accuracy of pan-tilt control.

Description

Camera device-based pan-tilt control method and pan-tilt control device
Technical Field
The application relates to the field of computer vision, in particular to pan-tilt control based on a camera device.
Background
With the rapid growth of network bandwidth, online live streaming and real-time video have become increasingly popular, and live-streaming and follow-shooting pan-tilt heads have developed along with them. A pan-tilt head is a supporting device for mounting and fixing a camera device, and its control directly determines the quality of the follow-shooting effect. At present, most solutions connect a mobile phone to the pan-tilt head by Bluetooth: an application (APP) on the phone calls a built-in open-source tracking algorithm, calculates the difference between the shot target and the center position of the screen, sends the difference to the pan-tilt head over Bluetooth, and then controls the head so that the shot target appears at the center of the screen.
However, with the existing pan-tilt control method, before starting any live-streaming platform and following a target, the corresponding APP must first be downloaded to control the pan-tilt head before smooth follow-shooting is possible; moreover, only a few live-streaming platforms are supported, compatibility is poor, and the complicated steps degrade the user experience.
Disclosure of Invention
The embodiments of the present application provide a camera device-based control method and a pan-tilt control device to solve the above problems in the related art. The technical scheme is as follows:
In a first aspect, an embodiment of the present application provides a control method based on a camera device, including:
acquiring an image by using a first camera device arranged on a pan-tilt head, wherein the pan-tilt head is connected with a second camera device;
detecting a tracking target in the image and extracting feature information of the tracking target;
calculating a coordinate difference between the tracking target and the image center position in a case where the similarity between the feature information of the tracking target and pre-stored feature information is greater than a threshold;
and controlling the pan-tilt head to move according to the coordinate difference so that the second camera device shoots the tracking target.
In one embodiment, the center position of the image corresponding to the first imaging device coincides with the center position of the image corresponding to the second imaging device.
In one embodiment, the detecting a tracking target in the image and extracting feature information of the tracking target includes:
in a case where it is detected that the tracking target in the image is a face, facial feature information is extracted.
In one embodiment, the calculating a coordinate difference between the tracking target and the image center position in a case where a similarity between the feature information of the tracking target and pre-stored feature information is greater than a threshold value includes:
in a case where the similarity between the extracted facial feature information and the pre-stored facial feature information is greater than a first threshold, a coordinate difference from the face to the image center position is calculated as a first coordinate difference.
In one embodiment, the method further comprises:
in a case where it is not detected that the tracking target in the image is a face and it is detected that the tracking target in the image is a body, body feature information is extracted.
In one embodiment, the method further comprises:
and calculating a coordinate difference between the body and the image center position as a second coordinate difference under the condition that the similarity between the extracted body characteristic information and the pre-stored body characteristic information is greater than a second threshold value.
In one embodiment, the calculating of the coordinate difference from the body to the image center position in a case where the similarity between the extracted body feature information and the pre-stored body feature information is greater than a second threshold includes:
comparing the extracted body feature information one by one with the pre-stored body feature information, namely the front-facing standing posture information, front-facing sitting posture information, front-facing squatting posture information, back-facing standing posture information, back-facing sitting posture information and back-facing squatting posture information, to obtain a plurality of corresponding similarities, and selecting the maximum similarity;
and calculating the coordinate difference from the body posture corresponding to the maximum similarity to the image center position in a case where the maximum similarity is greater than the second threshold.
In one embodiment, further comprising:
detecting a head in a case where neither a face nor a body is detected as the tracking target in the image;
and calculating a coordinate difference from the head to the image center position as a third coordinate difference in a case where the head is detected.
In one embodiment, the method further comprises:
in the case where the head appears at the center position of the image corresponding to the second image pickup device, the step of extracting facial feature information is returned to, or
And executing the step of extracting the body characteristic information.
In a second aspect, an embodiment of the present application provides a pan/tilt control apparatus, including:
the image acquisition module is used for acquiring an image by using a first camera device arranged on a pan-tilt head, wherein the pan-tilt head is connected with a second camera device;
the target detection module is used for detecting a tracking target in the image and extracting the characteristic information of the tracking target;
the coordinate difference calculation module is used for calculating the coordinate difference between the tracking target and the central position of the image under the condition that the similarity between the characteristic information of the tracking target and the pre-stored characteristic information is greater than a threshold value;
and the pan-tilt control module is used for controlling the pan-tilt head to move according to the coordinate difference so that the second camera device shoots the tracking target.
In one embodiment, the center position of the image corresponding to the first imaging device coincides with the center position of the image corresponding to the second imaging device.
In one embodiment, the object detection module comprises:
and the face detection sub-module is used for extracting the facial feature information under the condition that the tracking target in the image is detected to be a face.
In one embodiment, the coordinate difference calculation module includes:
a first coordinate difference calculation sub-module for calculating a coordinate difference between the face and the image center position as a first coordinate difference in a case where a similarity between the extracted facial feature information and pre-stored facial feature information is greater than a first threshold value.
In one embodiment, the object detection module further comprises:
and the body detection sub-module is used for extracting body characteristic information under the condition that the tracking target in the image is not detected to be a face and the tracking target in the image is detected to be a body.
In one embodiment, the coordinate difference calculation module includes:
and the second coordinate difference calculating submodule is used for calculating the coordinate difference between the body and the central position of the image as a second coordinate difference under the condition that the similarity between the extracted body characteristic information and the pre-stored body characteristic information is greater than a second threshold value.
In one embodiment, the second coordinate difference calculation submodule includes:
a plurality of similarity calculation units for comparing the extracted body feature information one by one with the pre-stored body feature information, namely the front-facing standing posture information, front-facing sitting posture information, front-facing squatting posture information, back-facing standing posture information, back-facing sitting posture information and back-facing squatting posture information, to obtain a plurality of corresponding similarities and select the maximum similarity;
and a second coordinate difference calculation unit, configured to calculate a coordinate difference from the body posture corresponding to the maximum similarity to the image center position when the maximum similarity is greater than the second threshold.
In one embodiment, the object detection module further comprises: a head detection sub-module configured to detect a head when it is not detected that the tracking target in the image is a face and it is not detected that the tracking target in the image is a body;
the coordinate difference calculation module further includes: a third coordinate difference calculation sub-module for calculating a coordinate difference between the head portion and the image center position as a third coordinate difference in a case where the head portion is detected.
In one embodiment, the apparatus further comprises:
a feedback module for returning to the step of extracting facial feature information when the head appears at the center of the image corresponding to the second camera device, or
executing the step of extracting body feature information.
In a third aspect, an electronic device is provided, including:
at least one processor; and a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any of the methods described above.
In a fourth aspect, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any of the above.
One embodiment in the above application has the following advantages or benefits: the first camera device is arranged on the pan-tilt head to acquire images in real time, the tracking target in the images is detected, and the pan-tilt head is controlled to move according to the coordinate difference between the tracking target and the image center position, which drives the second camera device so that the tracking target always appears at the image center while the second camera device is shooting. Throughout the control process, no application needs to be downloaded and no Bluetooth connection between the pan-tilt head and the second camera device is required; the method is compatible with various live-streaming platforms and improves the accuracy of pan-tilt control.
Other effects of the above-described alternative will be described below with reference to specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1 is a schematic diagram of a pan-tilt control method based on an image pickup apparatus according to an embodiment of the present application;
fig. 2 is a schematic diagram of a pan/tilt control method based on an image capturing apparatus according to another embodiment of the present application;
fig. 3 is a schematic diagram of a pan/tilt control method based on an image capturing apparatus according to another embodiment of the present application;
fig. 4 is a schematic view of a pan-tilt control apparatus according to an embodiment of the present application;
fig. 5 is a schematic view of a pan/tilt head control apparatus according to another embodiment of the present application;
fig. 6 is a block diagram of an electronic device for implementing a pan/tilt control method based on an image capturing apparatus according to an embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
As shown in fig. 1, in an embodiment, a pan-tilt control method based on an image capturing apparatus is provided, which includes the following steps:
step S110: acquiring an image by using a first camera device arranged on a pan-tilt head, wherein the pan-tilt head is connected with a second camera device;
step S120: detecting a tracking target in the image and extracting characteristic information of the tracking target;
step S130: under the condition that the similarity between the characteristic information of the tracking target and the pre-stored characteristic information is greater than a threshold value, calculating a coordinate difference between the tracking target and the central position of the image;
step S140: and controlling the pan-tilt head to move according to the coordinate difference so that the second camera device shoots the tracking target.
In one example, during follow-shooting, the pan-tilt head can be controlled to realize wide-range scanning and monitoring, so its control determines the follow-shooting effect. The pan-tilt control method provided by this embodiment can quickly identify the tracking target and control the pan-tilt head to move in time, so that the camera device captures the tracking target promptly and keeps it at the center of the image. At the same time, the user does not need to download any APP or enable Bluetooth or other wireless connections to the camera device, and more live-streaming platforms are compatible.
Specifically, a first camera device is arranged on the pan-tilt head to capture the picture of the target area in time and acquire the image. The second camera device connected to the pan-tilt head can be a follow-shooting device such as a mobile phone, an iPad, a camera or a video camera. Before follow-shooting, feature information of the tracking target, such as face information, various posture information of pedestrians, face information of animals and various posture information of animals, can be stored in the memory of the pan-tilt head in advance by importing pictures containing faces, animal faces and various postures. The pan-tilt head detects the tracking target in the image acquired in real time and extracts its feature information. The extracted feature information of the tracking target is compared with the pre-stored feature information to determine whether the currently detected tracking target is the target to be followed. If the similarity obtained by the comparison is greater than the threshold, the detected tracking target is determined to be the target to be followed, and the coordinate difference between the tracking target and the image center position is calculated. The pan-tilt head is then controlled to move according to the coordinate difference. Taking a two-axis pan-tilt head as an example, the image coordinate system is first converted into the two-axis pan-tilt coordinate system. For example, 15 to 20 black-and-white chessboard images are shot at different angles, and OpenCV, a cross-platform computer vision and machine learning software library released under the BSD (open source) license, is used to solve the camera distortion parameters q = (q0, q1, q2, q3). A transformation matrix r from the image coordinate system to the two-axis pan-tilt coordinate system is then solved from q:

[Formula omitted in the source: the definition of the transformation matrix r in terms of q.]

The rotation amount m of the two-axis pan-tilt head is solved from the position difference of the image picture and the transformation matrix r. Assuming the detected tracking target is at position (x, y) in the image coordinate system and the image center position is the constant (x0, y0), the coordinate difference in the image coordinate system can be expressed as delta = ((x - x0)/x0, (y - y0)/y0, 1), and the rotation amount of the two-axis head is m = r * delta. The rotation of the two-axis pan-tilt head can then be controlled according to m. As the pan-tilt head rotates, the second camera device moves with it, and in the picture shot by the second camera device the tracking target is located at the center, so that the position of the tracking target is captured in time.
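To make the calibration step concrete, below is a minimal Python sketch of solving the camera parameters from chessboard images with OpenCV, under stated assumptions: the 9x6 inner-corner grid and the .jpg image directory are hypothetical choices, and since the formula mapping q to the matrix r is omitted in the source, only q is computed here.

```python
# A hedged sketch of the calibration described above: solve the camera
# parameters from 15-20 chessboard images with OpenCV. The 9x6 inner-corner
# grid is an assumption; the patent's formula for r is not reproduced in
# the source, so this sketch stops at q.
import glob
import cv2
import numpy as np

PATTERN = (9, 6)  # assumed inner-corner grid of the printed chessboard

def solve_distortion(image_dir):
    objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2)
    obj_pts, img_pts, size = [], [], None
    for path in glob.glob(image_dir + "/*.jpg"):
        gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
        found, corners = cv2.findChessboardCorners(gray, PATTERN)
        if found:
            obj_pts.append(objp)
            img_pts.append(corners)
            size = gray.shape[::-1]
    # dist holds the distortion coefficients; its first four entries play
    # the role of q = (q0, q1, q2, q3) in the description above
    _, mtx, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)
    return mtx, dist.ravel()[:4]
```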
In this embodiment, the first camera device is arranged on the pan-tilt head to acquire images in real time, the tracking target in the images is detected, and the pan-tilt head is controlled to move according to the coordinate difference between the tracking target and the image center position, driving the second camera device so that the tracking target always appears at the image center while the second camera device is shooting. Throughout the control process, no application needs to be downloaded and no Bluetooth connection between the pan-tilt head and the second camera device is required; the method is compatible with various live-streaming platforms and improves the accuracy of pan-tilt control.
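For illustration, one iteration of steps S110 to S140 can be sketched as follows. This is a minimal sketch, not the patented implementation: the detector, feature extractor and pan-tilt driver are hypothetical callables, cosine similarity stands in for the unspecified similarity measure, and the threshold value is assumed.

```python
# One iteration of steps S110-S140, assuming the transformation matrix r
# has already been obtained by calibration. `detect`, `extract`, and
# `rotate` are hypothetical placeholders; SIM_THRESHOLD is an assumed
# value (the patent only requires "greater than a threshold").
import numpy as np

SIM_THRESHOLD = 0.8

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def control_step(frame, detect, extract, stored_feat, r, rotate):
    box = detect(frame)                 # (x, y, w, h) of the tracking target, or None
    if box is None:
        return                          # nothing detected: reacquire the image
    feat = extract(frame, box)
    if cosine_similarity(feat, stored_feat) <= SIM_THRESHOLD:
        return                          # not the target to be followed
    h, w = frame.shape[:2]
    x0, y0 = w / 2.0, h / 2.0           # image center position (constant)
    cx = box[0] + box[2] / 2.0          # target position in the image
    cy = box[1] + box[3] / 2.0
    delta = np.array([(cx - x0) / x0, (cy - y0) / y0, 1.0])
    m = r @ delta                       # rotation amount of the two-axis head
    rotate(m)                           # drive the pan-tilt so the target recenters
```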
In one embodiment, the center position of the image corresponding to the first imaging device coincides with the center position of the image corresponding to the second imaging device.
In one example, the image center position corresponding to the first camera device coincides with the image center position corresponding to the second camera device, which improves the synchronization between the two devices.
In one embodiment, as shown in fig. 2, step S120 includes:
step S121: extracting facial feature information in a case where the tracking target detected in the image is a face.
In one example, the detected face differs with the tracking target; for example, it may be a human face or the face of various animals. As shown in fig. 3, this embodiment is described using a human face as an example. First, face detection is performed on the image acquired by the first camera device on the pan-tilt head; if N faces are found, feature extraction is performed on them to obtain the face feature information corresponding to the N faces.
In one embodiment, as shown in fig. 2, step S130 includes:
step S131: in a case where the degree of similarity between the extracted facial feature information and the pre-stored facial feature information is greater than a first threshold value, a coordinate difference between the face to the image center position is calculated as a first coordinate difference.
In one example, as shown in fig. 3, the face feature information corresponding to the N extracted faces is compared one by one with the pre-stored face feature information (i.e., the face feature information of the target to be followed) to obtain N similarities, and the largest similarity is found and compared with the first threshold. If the maximum similarity is greater than the first threshold, the coordinate difference between the face position corresponding to the maximum similarity and the image center position is calculated as the first coordinate difference. The pan-tilt head is controlled to move according to the first coordinate difference so that the second camera device shoots the tracking target, with the face located at the center of the image it captures. If the maximum similarity is smaller than the first threshold, the process returns to acquire the image again. Detecting the face and controlling the pan-tilt motion according to the first coordinate difference allows the second camera device to track and shoot the face accurately and in time, improving the real-time performance and accuracy of follow-shooting.
The first threshold can be preset and stored in the pan-tilt head in advance, with its value chosen as required.
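A minimal sketch of this face-matching step follows, under assumptions: the face boxes and embeddings come from an unspecified detector and recognizer, cosine similarity stands in for the unspecified similarity measure, and FIRST_THRESHOLD is an assumed value.

```python
# Sketch of step S131: compare the N extracted face embeddings with the
# stored embedding, take the largest similarity, and return the first
# coordinate difference if it exceeds the first threshold.
import numpy as np

FIRST_THRESHOLD = 0.8  # assumed; the patent leaves the value to the user

def first_coordinate_difference(face_boxes, face_feats, stored_feat, img_w, img_h):
    """face_boxes: list of (x, y, w, h); face_feats: list of 1-D embeddings."""
    sims = [float(np.dot(f, stored_feat) /
                  (np.linalg.norm(f) * np.linalg.norm(stored_feat)))
            for f in face_feats]
    best = int(np.argmax(sims))               # largest of the N similarities
    if sims[best] <= FIRST_THRESHOLD:
        return None                           # no match: reacquire the image
    x, y, w, h = face_boxes[best]
    cx, cy = x + w / 2.0, y + h / 2.0         # matched face position
    x0, y0 = img_w / 2.0, img_h / 2.0         # image center position
    return np.array([(cx - x0) / x0, (cy - y0) / y0, 1.0])
```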
In one embodiment, as shown in fig. 2, step S120 further includes:
step S122: in the case where the tracking target in the image is not detected as a face, and the tracking target in the image is detected as a body, body feature information is extracted.
In one example, as shown in fig. 3, if a human face is not detected in the image, human body detection is performed. If N human bodies are detected, feature extraction is performed on them. Specifically, feature information of multiple postures is extracted for each body, for example standing posture, sitting posture and squatting posture feature information; or, combined with orientation, front-facing standing, front-facing sitting, front-facing squatting, back-facing standing, back-facing sitting and back-facing squatting posture feature information; alternatively, feature information of different parts of the human body can be extracted, for example posture feature information of the upper body, the lower body, the arms, the legs and the feet. To improve the accuracy of tracking-target identification, the richer the extracted body feature information, the better.
In one embodiment, step S130 further includes:
step S132: and calculating a coordinate difference between the body and the center position of the image as a second coordinate difference under the condition that the similarity between the extracted body characteristic information and the pre-stored body characteristic information is greater than a second threshold value.
In one example, as shown in fig. 3, the extracted feature information of the N human bodies is compared one by one with the pre-stored human body feature information (i.e., the body feature information of the target to be followed). For example, for each detected body, the extracted standing, sitting and squatting posture feature information is compared with the pre-stored body feature information, giving three similarities. As another example, for each detected body, the extracted front-facing standing, front-facing sitting, front-facing squatting, back-facing standing, back-facing sitting and back-facing squatting posture feature information is compared with the pre-stored body feature information, giving six similarities, i.e., N x 6 similarities for N bodies. The maximum similarity among them is found, and if it is greater than the second threshold, the coordinate difference between the corresponding human body position and the image center position is calculated as the second coordinate difference. The pan-tilt head is controlled to move according to the second coordinate difference so that the second camera device shoots the tracking target, with the human body located at the center of the image it captures. If the maximum similarity is smaller than the second threshold, the process returns to acquire the image again. Detecting the human body and controlling the pan-tilt motion according to the second coordinate difference allows the second camera device to track and shoot the body accurately and in time, improving the real-time performance and accuracy of follow-shooting.
The second threshold can be preset and stored in the pan-tilt head in advance, with its value chosen as required.
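The N x 6 posture comparison can be sketched as follows, again under assumptions: the posture feature vectors are inputs from an unspecified body-feature extractor, and SECOND_THRESHOLD is an assumed value.

```python
# Sketch of step S132: each detected body carries six posture feature
# vectors (front/back x standing/sitting/squatting, N x 6 comparisons in
# all); the maximum similarity is compared with the second threshold.
import numpy as np

SECOND_THRESHOLD = 0.7  # assumed; the patent leaves the value to the user
POSTURES = ("front_stand", "front_sit", "front_squat",
            "back_stand", "back_sit", "back_squat")

def second_coordinate_difference(bodies, stored, img_w, img_h):
    """bodies: list of dicts mapping each posture name to a feature vector,
    plus a 'box' entry (x, y, w, h); stored: dict of pre-stored features."""
    best_sim, best_box = -1.0, None
    for body in bodies:
        for p in POSTURES:                    # six similarities per body
            f, g = body[p], stored[p]
            sim = float(np.dot(f, g) / (np.linalg.norm(f) * np.linalg.norm(g)))
            if sim > best_sim:
                best_sim, best_box = sim, body["box"]
    if best_sim <= SECOND_THRESHOLD:
        return None                           # no match: reacquire the image
    x, y, w, h = best_box
    cx, cy = x + w / 2.0, y + h / 2.0         # matched body position
    x0, y0 = img_w / 2.0, img_h / 2.0
    return np.array([(cx - x0) / x0, (cy - y0) / y0, 1.0])
```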
In one embodiment, step S132 includes:
and combining the extracted body characteristic information with the pre-stored body characteristic information: comparing the face correcting standing posture information, the face correcting sitting posture information, the face correcting squatting posture information, the back face standing posture information, the back face sitting posture information and the back face squatting posture information one by one to obtain a plurality of corresponding similarities, and selecting the maximum similarity;
and under the condition that the maximum similarity is greater than a second threshold value, calculating the coordinate difference from the body posture corresponding to the maximum similarity to the image center position.
In one example, the extracted front-facing and back-facing standing, sitting and squatting posture information may correspond to the front or back of a human or of an animal; both fall within the protection scope of this embodiment.
In one embodiment, as shown in fig. 2, step S120 further includes:
step S123: detecting a head when neither a face nor a body is detected as the tracking target in the image;
step S124: calculating a coordinate difference from the head to the image center position as a third coordinate difference in a case where the head is detected.
In one example, as shown in fig. 3, when neither a human face nor a human body is detected in the image captured by the first camera device, head detection is performed. If a head is detected, the coordinate difference between the head position and the image center position is calculated as the third coordinate difference. The pan-tilt head is controlled to move according to the third coordinate difference so that the second camera device shoots the tracking target, with the head located at the center of the image it captures. If no head is detected, the process returns to acquire the image again. Detecting the head and controlling the pan-tilt motion according to the third coordinate difference allows the second camera device to track and shoot the head accurately and in time, improving the real-time performance and accuracy of follow-shooting.
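Putting the stages together, the following sketch mirrors the face-then-body-then-head cascade of fig. 3. The three detector callables are hypothetical interfaces, and the function reuses the first_coordinate_difference and second_coordinate_difference sketches above.

```python
import numpy as np

def cascade_coordinate_difference(frame, detectors, stored, img_w, img_h):
    """Try face, then body, then head, mirroring fig. 3.
    detectors: dict with 'face', 'body', 'head' callables (assumed
    interfaces); returns a coordinate difference or None to reacquire."""
    faces = detectors["face"](frame)          # -> (boxes, feats) or None
    if faces is not None:
        boxes, feats = faces
        return first_coordinate_difference(boxes, feats, stored["face"],
                                           img_w, img_h)
    bodies = detectors["body"](frame)         # -> list of posture dicts or None
    if bodies is not None:
        return second_coordinate_difference(bodies, stored["body"],
                                            img_w, img_h)
    head = detectors["head"](frame)           # -> (x, y, w, h) or None
    if head is None:
        return None                           # nothing found: reacquire image
    x, y, w, h = head                         # no identity check for the head
    cx, cy = x + w / 2.0, y + h / 2.0
    x0, y0 = img_w / 2.0, img_h / 2.0
    return np.array([(cx - x0) / x0, (cy - y0) / y0, 1.0])
```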
In the above embodiment, by combining face recognition, human body recognition and head detection, reliability is much higher than when the face alone is used as the single index, and real-time follow-shooting can be realized by controlling the pan-tilt head directly through one camera device, which greatly improves the follow-shooting effect. This avoids heavy dependence on the face, where once the face is lost (for example, it easily leaves the frame when the person stands up from a sitting position or turns around), the pan-tilt head can no longer be controlled and the real-time follow-shooting effect becomes poor. The method is also compatible with all live-streaming platforms, does not affect the beautification effects of platform APPs, and avoids the trouble of Bluetooth pairing and vendor APPs.
In one embodiment, as shown in fig. 2, the method further includes:
in the case where the head appears at the center position of the image corresponding to the second image pickup device, the step of extracting facial feature information is returned to, or
The step of extracting the body feature information is performed.
In one example, as shown in fig. 3, when a head is detected, the pan-tilt head is controlled so that the head appears at the center position of the image in the second camera device. To further confirm the tracking target, the face of that head may then be detected, or the body under the head, with the same detection process as in the embodiments above.
In another embodiment, as shown in fig. 4, there is provided a pan/tilt control apparatus based on an image capturing apparatus, including:
an image obtaining module 110, configured to obtain an image by using a first camera device arranged on a pan/tilt head, where the pan/tilt head is connected to a second camera device;
the target detection module 120 is configured to detect a tracking target in the image and extract feature information of the tracking target;
a coordinate difference calculating module 130, configured to calculate a coordinate difference between the tracking target and the image center position when a similarity between the feature information of the tracking target and the pre-stored feature information is greater than a threshold;
and the pan-tilt control module 140 is configured to control the pan-tilt to move according to the coordinate difference, so that the second camera shoots the tracking target.
In one embodiment, the center position of the image corresponding to the first imaging device coincides with the center position of the image corresponding to the second imaging device.
In one embodiment, as shown in FIG. 5, the object detection module 120 includes:
a face detection sub-module 121 configured to extract facial feature information in a case where the tracking target detected in the image is a face.
In one embodiment, as shown in fig. 5, the coordinate difference calculation module 130 includes:
a first coordinate difference calculating sub-module 131 for calculating a coordinate difference between the face and the center position of the image as a first coordinate difference in case that the degree of similarity between the extracted facial feature information and the pre-stored facial feature information is greater than a first threshold value.
In one embodiment, as shown in fig. 5, the object detection module 120 further comprises:
the body detection sub-module 122 is configured to extract body feature information when the tracking target in the image is not detected as a face and the tracking target in the image is detected as a body.
In one embodiment, as shown in fig. 5, the coordinate difference calculation module 130 includes:
and a second coordinate difference calculating sub-module 132 for calculating a coordinate difference between the body and the center position of the image as a second coordinate difference in case that the similarity between the extracted body feature information and the pre-stored body feature information is greater than a second threshold.
In one embodiment, the second coordinate difference calculation submodule 132 includes:
a plurality of similarity calculation units for comparing the extracted body feature information one by one with the pre-stored body feature information, namely the front-facing standing posture information, front-facing sitting posture information, front-facing squatting posture information, back-facing standing posture information, back-facing sitting posture information and back-facing squatting posture information, to obtain a plurality of corresponding similarities and select the maximum similarity;
and a second coordinate difference calculation unit for calculating the coordinate difference from the body posture corresponding to the maximum similarity to the image center position when the maximum similarity is greater than the second threshold.
In one embodiment, as shown in fig. 5, the object detection module 120 further comprises: a head detection sub-module 123 configured to detect a head when it is not detected that the tracking target in the image is a face and it is not detected that the tracking target in the image is a body;
the coordinate difference calculation module 130 further includes: a third coordinate difference calculation submodule 133 for calculating a coordinate difference from the head to the image center position as a third coordinate difference in the case where the head is detected.
In one embodiment, as shown in fig. 5, the apparatus further includes:
a feedback module 150 for returning to the step of extracting facial feature information when the head appears at the center position of the image corresponding to the second camera device, or
executing the step of extracting body feature information.
The functions of each module in each apparatus in the embodiment of the present application may refer to corresponding descriptions in the above method, and are not described herein again.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 6 is a block diagram of an electronic device for the camera device-based pan-tilt control method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the present application described and/or claimed herein.
As shown in fig. 6, the electronic apparatus includes: one or more processors 601, a memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information for a graphical user interface (GUI) on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.
The memory 602 is a non-transitory computer-readable storage medium as provided herein. The memory stores instructions executable by the at least one processor, so that the at least one processor executes the camera device-based pan-tilt control method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the camera device-based pan-tilt control method provided by the present application.
The memory 602, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to an image capturing apparatus-based pan/tilt control method in the embodiments of the present application (for example, as shown in fig. 4, the image acquisition module 110, the target detection module 120, the coordinate difference calculation module 130, and the pan/tilt control module 140). The processor 601 executes various functional applications and data processing of the server by running non-transitory software programs, instructions and modules stored in the memory 602, that is, implements the camera-based pan-tilt control method in the above method embodiments.
The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data or the like created according to use of an electronic apparatus based on a pan/tilt control method of an image pickup apparatus. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 602 may optionally include memory located remotely from the processor 601, which may be connected to the electronic devices via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or other input devices. The output device 604 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibrating motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, Integrated circuitry, Application Specific Integrated Circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (Cathode Ray Tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Network (LAN), Wide Area Network (WAN), and the internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present application can be achieved; the present application is not limited herein.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (20)

1. A pan-tilt control method based on a camera device, characterized by comprising the following steps:
acquiring an image by using a first camera device arranged on a pan-tilt head, wherein the pan-tilt head is connected with a second camera device;
detecting a tracking target in the image and extracting characteristic information of the tracking target;
under the condition that the similarity between the characteristic information of the tracking target and the pre-stored characteristic information is greater than a threshold value, calculating a coordinate difference between the tracking target and the image center position;
and controlling the pan-tilt head to move according to the coordinate difference so that the second camera device shoots the tracking target.
2. The method of claim 1, wherein the image center location corresponding to the first camera coincides with the image center location corresponding to the second camera.
3. The method according to claim 1, wherein the detecting a tracking target in the image and extracting feature information of the tracking target comprises:
in a case where it is detected that the tracking target in the image is a face, facial feature information is extracted.
4. The method according to claim 3, wherein the calculating a coordinate difference between the tracking target and the image center position in a case where a similarity between the feature information of the tracking target and pre-stored feature information is greater than a threshold value includes:
in a case where the similarity between the extracted facial feature information and the pre-stored facial feature information is greater than a first threshold, a coordinate difference from the face to the image center position is calculated as a first coordinate difference.
5. The method of claim 3, further comprising:
in a case where it is not detected that the tracking target in the image is a face and it is detected that the tracking target in the image is a body, body feature information is extracted.
6. The method of claim 5, further comprising:
and calculating a coordinate difference between the body and the image center position as a second coordinate difference under the condition that the similarity between the extracted body characteristic information and the pre-stored body characteristic information is greater than a second threshold value.
7. The method according to claim 6, wherein the calculating of the coordinate difference from the body to the image center position in the case that the similarity between the extracted body feature information and the pre-stored body feature information is greater than a second threshold comprises:
and combining the extracted body characteristic information with the pre-stored body characteristic information: comparing the face correcting standing posture information, the face correcting sitting posture information, the face correcting squatting posture information, the back face standing posture information, the back face sitting posture information and the back face squatting posture information one by one to obtain a plurality of corresponding similarities, and selecting the maximum similarity;
and calculating the coordinate difference from the body posture corresponding to the maximum similarity to the image center position under the condition that the maximum similarity is larger than the second threshold value.
8. The method of claim 5, further comprising:
detecting a head in a case where it is not detected that a tracking target in the image is a face and it is not detected that the tracking target in the image is a body;
in a case where the head is detected, a coordinate difference from the head to the image center position is calculated as a third coordinate difference.
9. The method of claim 8, further comprising:
in the case where the head appears at the center position of the image corresponding to the second image pickup device, the step of extracting facial feature information is returned to, or
And executing the step of extracting the body characteristic information.
10. A pan/tilt control device, comprising:
the image acquisition module is used for acquiring an image by using a first camera device arranged on a pan-tilt head, wherein the pan-tilt head is connected with a second camera device;
the target detection module is used for detecting a tracking target in the image and extracting the characteristic information of the tracking target;
the coordinate difference calculation module is used for calculating the coordinate difference between the tracking target and the central position of the image under the condition that the similarity between the characteristic information of the tracking target and the pre-stored characteristic information is greater than a threshold value;
and the pan-tilt control module is used for controlling the pan-tilt head to move according to the coordinate difference so that the second camera device shoots the tracking target.
11. The apparatus of claim 10, wherein the center position of the image corresponding to the first camera coincides with the center position of the image corresponding to the second camera.
12. The apparatus of claim 10, wherein the target detection module comprises:
and the face detection sub-module is used for extracting the facial feature information under the condition that the tracking target in the image is detected to be a face.
13. The apparatus of claim 12, wherein the coordinate difference calculation module comprises:
a first coordinate difference calculation sub-module for calculating a coordinate difference between the face and the image center position as a first coordinate difference in a case where a similarity between the extracted facial feature information and pre-stored facial feature information is greater than a first threshold value.
14. The apparatus of claim 12, wherein the target detection module further comprises:
and the body detection submodule is used for extracting body characteristic information under the condition that the tracking target in the image is not detected to be a face and the tracking target in the image is detected to be a body.
15. The apparatus of claim 14, wherein the coordinate difference calculation module comprises:
and the second coordinate difference calculating submodule is used for calculating the coordinate difference between the body and the central position of the image as a second coordinate difference under the condition that the similarity between the extracted body characteristic information and the pre-stored body characteristic information is greater than a second threshold value.
16. The apparatus of claim 15, wherein the second coordinate difference calculation submodule comprises:
a plurality of similarity calculation units for comparing the extracted body feature information one by one with the pre-stored body feature information, namely the front-facing standing posture information, front-facing sitting posture information, front-facing squatting posture information, back-facing standing posture information, back-facing sitting posture information and back-facing squatting posture information, to obtain a plurality of corresponding similarities and select the maximum similarity;
and a second coordinate difference calculation unit, configured to calculate a coordinate difference from the body posture corresponding to the maximum similarity to the image center position when the maximum similarity is greater than the second threshold.
17. The apparatus of claim 14, wherein the target detection module further comprises: a head detection sub-module configured to detect a head when it is not detected that the tracking target in the image is a face and it is not detected that the tracking target in the image is a body;
the coordinate difference calculation module further includes: a third coordinate difference calculation sub-module for calculating a coordinate difference between the head portion and the image center position as a third coordinate difference in a case where the head portion is detected.
18. The apparatus of claim 17, further comprising:
a feedback module for returning to the step of extracting facial feature information when the head appears at the center of the image corresponding to the second camera device, or
executing the step of extracting body feature information.
19. An electronic device, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-9.
CN202011201537.6A (priority 2020-11-02, filed 2020-11-02) · Camera device-based pan-tilt control method and pan-tilt control device · Pending · published as CN114449155A

Priority Applications (1)

Application Number: CN202011201537.6A · Priority Date: 2020-11-02 · Filing Date: 2020-11-02 · Title: Camera device-based pan-tilt control method and pan-tilt control device

Publications (1)

Publication Number Publication Date
CN114449155A 2022-05-06

Family

ID=81357307

Family Applications (1)

CN202011201537.6A (pending, published as CN114449155A) · Camera device-based pan-tilt control method and pan-tilt control device

Country Status (1)

Country Link
CN (1) CN114449155A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104081757A (en) * 2012-02-06 2014-10-01 索尼公司 Image processing apparatus, image processing method, program, and recording medium
CN109977770A (en) * 2019-02-21 2019-07-05 安克创新科技股份有限公司 A kind of auto-tracking shooting method, apparatus, system and storage medium
CN110244775A (en) * 2019-04-29 2019-09-17 广州市景沃电子有限公司 Automatic tracking method and device based on mobile device clamping holder
CN111339855A (en) * 2020-02-14 2020-06-26 睿魔智能科技(深圳)有限公司 Vision-based target tracking method, system, equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination