CN113099103A - Method, electronic device and computer storage medium for capturing images

Method, electronic device and computer storage medium for capturing images

Info

Publication number
CN113099103A
CN113099103A (application CN202010021729.2A)
Authority
CN
China
Prior art keywords
image
target
camera
determining
mobile device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010021729.2A
Other languages
Chinese (zh)
Inventor
时红仁
程帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Pateo Electronic Equipment Manufacturing Co Ltd
Original Assignee
Shanghai Pateo Electronic Equipment Manufacturing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Pateo Electronic Equipment Manufacturing Co Ltd
Priority to CN202010021729.2A
Publication of CN113099103A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/67Focus control based on electronic image sensor signals
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Studio Devices (AREA)

Abstract

Methods, electronic devices, and computer storage media for capturing images. The present disclosure relates to a method, apparatus, and computer storage medium for acquiring an image. The method comprises the following steps: in response to determining that a voice input is associated with capturing an image, acquiring an initial image displayed by the mobile device or the vehicle-mounted camera, the voice input being picked up via a sound pickup of the vehicle or the mobile device; identifying a target photographic object in the initial image based on the voice input; adjusting at least one of a focal length and a shooting orientation of at least one of the vehicle-mounted camera and a camera of the mobile device based on image features related to the target photographic object; and selecting a focus point for the target photographic object for shooting a target image. The present disclosure can meet shooting requirements in scenarios where manual operation is inconvenient.

Description

Method, electronic device and computer storage medium for capturing images
Technical Field
The present disclosure relates generally to image processing, and in particular, to methods, electronic devices, and computer storage media for capturing images.
Background
Conventional schemes for acquiring images include, for example: aiming the mobile device at the shooting object, manually adjusting the zoom range, manually selecting a focus object, and then tapping the mobile device to shoot the image. Such a scheme requires many manual operations, is relatively complex and time-consuming, and is especially inconvenient in scenarios where the mobile device is hard to operate by hand, such as in a moving vehicle or when the user is wearing gloves, so that the user cannot quickly shoot the desired object.
Therefore, the conventional scheme for acquiring an image is time-consuming because it requires tedious manual zooming, selection of a focus object, focusing, and tapping to shoot, and it is difficult to meet shooting requirements in scenarios where manual operation is inconvenient.
Disclosure of Invention
The present disclosure provides a method of acquiring an image, an electronic device, and a computer storage medium, which can meet shooting requirements in scenarios where manual operation is inconvenient.
According to a first aspect of the present disclosure, a method for capturing an image is provided. The method comprises the following steps: in response to determining that a voice input is associated with capturing an image, acquiring an initial image displayed by the mobile device or the vehicle-mounted camera, the voice input being picked up via a sound pickup of the vehicle or the mobile device; identifying a target photographic object in the initial image based on the voice input; adjusting at least one of a focal length and a shooting orientation of at least one of the vehicle-mounted camera and a camera of the mobile device based on image features related to the target photographic object; and selecting a focus point for the target photographic object for shooting a target image.
According to a second aspect of the present disclosure, there is also provided an electronic device for acquiring images. The device comprises: a memory configured to store one or more computer programs; and a processor coupled to the memory and configured to execute the one or more programs to cause the device to perform the method of the first aspect of the present disclosure.
According to a third aspect of the present disclosure, there is also provided a non-transitory computer-readable storage medium. The non-transitory computer readable storage medium has stored thereon machine executable instructions which, when executed, cause a machine to perform the method of the first aspect of the disclosure.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the disclosure, nor is it intended to be used to limit the scope of the disclosure.
Drawings
Fig. 1 shows a schematic diagram of a system 100 for a method of acquiring an image according to an embodiment of the present disclosure;
FIG. 2 shows a flow diagram of a method 200 for acquiring an image according to an embodiment of the present disclosure;
FIG. 3 shows a flow diagram of a method 300 for adjusting the focal length of an imaging device according to an embodiment of the disclosure;
FIG. 4 schematically illustrates a method 400 for adjusting an imaging device to a target shooting orientation, in accordance with an embodiment of the present disclosure;
fig. 5 schematically illustrates a schematic diagram of an in-vehicle image pickup apparatus 500 according to an embodiment of the present disclosure;
FIG. 6 shows a flow diagram of a method 600 for adjusting a camera to a target capture orientation according to an embodiment of the disclosure;
fig. 7 schematically shows a schematic diagram of a bottleneck structure 700 of a recognition model for a target photographic object according to an embodiment of the present disclosure;
fig. 8 schematically shows a schematic diagram of a yolov3 network structure of a recognition model for a target photographic object according to an embodiment of the present disclosure;
FIG. 9 schematically shows a schematic of an initial image 900 according to the present disclosure;
FIG. 10 schematically illustrates a schematic of an image 1000 identified via a recognition model according to the present disclosure;
FIG. 11 schematically illustrates a schematic view of a focus adjusted acquired image 1100 according to the present disclosure;
FIG. 12 schematically illustrates a schematic diagram of an acquired image 1200 based on a target shooting orientation, according to an embodiment of the disclosure;
FIG. 13 schematically illustrates a schematic diagram of a method 1300 for focusing according to an embodiment of the present disclosure; and
FIG. 14 schematically illustrates a block diagram of an electronic device 1400 suitable for implementing embodiments of the present disclosure.
Like or corresponding reference characters designate like or corresponding parts throughout the several views.
Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The term "include" and variations thereof as used herein is meant to be inclusive in an open-ended manner, i.e., "including but not limited to". Unless specifically stated otherwise, the term "or" means "and/or". The term "based on" means "based at least in part on". The terms "one example embodiment" and "one embodiment" mean "at least one example embodiment". The term "another embodiment" means "at least one additional embodiment". The terms "first," "second," and the like may refer to different or the same object. Other explicit and implicit definitions are also possible below.
As described above, the conventional scheme for capturing images requires tedious manual zooming, selection of a focus object, focusing, tapping to shoot, and the like, which is time-consuming and makes it difficult to meet shooting requirements in scenes where manual operation is inconvenient.
To address, at least in part, one or more of the above problems, as well as other potential problems, example embodiments of the present disclosure propose a scheme for acquiring images. The scheme comprises the following steps: in response to determining that a voice input is associated with capturing an image, acquiring an initial image displayed by the mobile device or the vehicle-mounted camera, the voice input being picked up via a sound pickup of the vehicle or the mobile device; identifying a target photographic object in the initial image based on the voice input; adjusting at least one of a focal length and a shooting orientation of at least one of the vehicle-mounted camera and a camera of the mobile device based on image features related to the target photographic object; and selecting a focus point for the target photographic object for shooting a target image.
In the above-described scheme, when it is determined that the voice input is associated with capturing an image, the target photographic object is determined based on the voice input and the initial image, and at least one of the focal length and the shooting orientation of the lens of the in-vehicle camera or the mobile device 120 is adjusted for the target photographic object so as to shoot a target image of that object. The present disclosure can thus confirm the shooting intention from the user's voice and zoom, adjust the shooting orientation, and focus based on the image features of the target photographic object associated with that intention in the initial image, without cumbersome manual operations, and can therefore meet shooting requirements in scenarios where manual operation is inconvenient.
Fig. 1 shows a schematic diagram of a system 100 for a method of acquiring an image according to an embodiment of the present disclosure. As shown in fig. 1, system 100 includes a vehicle 110, a mobile device 120, and a server 160. In some embodiments, vehicle 110, mobile device 120 of user 122 (e.g., passenger), server 160 may interact data via base station 150, network 140, for example. The vehicle 110 and the mobile device 120 may also interact and share data via wireless communication means such as Wi-Fi, bluetooth, cellular, NFC, etc.
As for the vehicle 110, it includes at least: an in-vehicle computing device 114 (e.g., a car machine), in-vehicle data sensing devices, an in-vehicle T-BOX, and the like. The in-vehicle data sensing devices are used to sense, in real time, data of the vehicle and of the external environment in which the vehicle is located. The in-vehicle data sensing devices include at least a plurality of in-vehicle cameras, for example: a front camera, a rear camera, an overhead camera 112, and the like. The overhead camera 112 can adjust the focal length of its lens so as to achieve different focal lengths for the target photographic object. The overhead camera 112 can also adjust its shooting orientation based on a received drive signal so that the position of the target photographic object in the image meets a predetermined requirement. In some embodiments, the shooting orientation of the overhead camera 112 can cover a panorama of the environment outside the vehicle. The vehicle 110 and the mobile device 120 may interact and share data through wireless communication means such as Wi-Fi, Bluetooth, cellular, NFC, and the like. For example, the mobile device 120 may establish an association with the vehicle 110 upon detecting a predetermined action (e.g., a shake-shake gesture) on the mobile device 120. By establishing the association with the vehicle 110 through such a predetermined action, an association between the vehicle and the mobile device of a particular user (e.g., the driver) can be established in a convenient and secure manner for sharing data and computing resources.
The in-vehicle T-BOX is used for data interaction with the in-vehicle computing device 114 (e.g., the car machine), the mobile device 120, and the server 160. In some embodiments, the in-vehicle T-BOX includes, for example, a SIM card, a GPS antenna, and a 4G or 5G antenna. When a user sends a control command (for example, remotely starting the vehicle, turning on the air conditioner, or adjusting a seat to a suitable position) through an application program (APP) of the mobile device 120 (e.g., a mobile phone), the TSP backend sends a monitoring request instruction to the in-vehicle T-BOX; after the vehicle obtains the control command, it sends a control message over the CAN bus, controls the vehicle accordingly, and finally feeds the operation result back to the APP on the user's mobile device 120. The in-vehicle T-BOX and the car machine exchange data via CAN bus communication, such as vehicle state information, key state information, and control instructions. The in-vehicle T-BOX can also collect bus data from the D-CAN, K-CAN, and PT-CAN buses of the vehicle 110.
With respect to the overhead camera 112, in some embodiments it is used to capture images of the vehicle's environment. In some embodiments, the overhead camera 112 includes, for example, a camera, a first rotating device (e.g., 520 in fig. 5), a second rotating device (e.g., 530 in fig. 5), and a lifting device (e.g., 540 in fig. 5). The first rotating device can drive the camera to rotate 360 degrees around a first axis. The second rotating device can drive the camera to rotate around a second axis perpendicular to the first axis (the rotation angle ranging, for example, between 0 and 360 degrees). The lifting device is used to drive the camera to move in the vertical direction. In some embodiments, the lifting device is used to extend the overhead camera 112 out of the vehicle body, or to retract it from outside the vehicle into the body. The overhead camera 112 may interact and share data with the mobile device 120 via the vehicle 110 (e.g., the in-vehicle computing device 114 and/or the in-vehicle T-BOX). The overhead camera 112 may also interact and share data with the mobile device 120 directly through wireless communication means such as Wi-Fi, Bluetooth, cellular, NFC, and the like.
With respect to the mobile device 120, it is, for example and without limitation, a cell phone. The mobile device 120 may interact with the vehicular T-BOX directly or with the server 160 via the base station 150 and the network 140. In some embodiments, the mobile device 120 may be a tablet, a cell phone, a wearable device, or the like.
With respect to server 160, it is used, for example, to provide services for the Internet of vehicles. Server 160 interacts data with vehicle 110 and mobile device 120, for example, via network 140, base station 150. In some embodiments, the server 160 may have one or more processing units, including special purpose processing units such as GPUs, FPGAs, ASICs, and the like, as well as general purpose processing units such as CPUs. In addition, one or more virtual machines may also be running on each computing device.
A method for acquiring an image according to an embodiment of the present disclosure will be described below in conjunction with fig. 2 and figs. 9 to 12. Fig. 2 shows a flow diagram of a method 200 for acquiring an image according to an embodiment of the present disclosure. Fig. 9 schematically shows a schematic view of an initial image 900 according to the present disclosure. Fig. 10 schematically shows a schematic view of an acquired image 1000 identified via a recognition model according to the present disclosure. Fig. 11 schematically illustrates a schematic view of a focus-adjusted acquired image 1100 according to the present disclosure. It should be understood that the method 200 may be performed, for example, at the electronic device 1400 depicted in fig. 14, or at the mobile device 120 or the vehicle 110 (e.g., without limitation, the in-vehicle computing device 114 such as the car machine) described in fig. 1. It should also be understood that the method 200 may include additional acts not shown and/or may omit acts shown, as the scope of the disclosure is not limited in this respect.
At block 202, the in-vehicle computing device 114 or the mobile device 120 determines whether a voice input is associated with capturing an image. The voice input is picked up via a sound pickup of the vehicle 110 or the mobile device 120. For example, the user 122 points the camera of the mobile device 120 at a distant scene (e.g., the scene ahead shown in fig. 9) and says "I want to shoot the car" or "I want to shoot the tree". At this point, the sound pickup of the vehicle 110 or the mobile device 120 picks up this voice input of the user 122. The in-vehicle computing device 114 or the mobile device 120 may first determine whether the received voice input is associated with capturing an image.
There are various ways to determine whether a voice input is associated with capturing an image. In some embodiments, determining whether the voice input is associated with capturing an image comprises: the in-vehicle computing device 114 or the mobile device 120 extracts acoustic features of the acquired or received voice input; then, based on the acoustic features, it determines via a predetermined keyword recognition model whether the voice input includes a speech segment associated with a predetermined keyword, the predetermined keyword recognition model being trained on a plurality of voice samples of shooting instructions; and, in response to determining that the voice input includes a speech segment containing the predetermined keyword, it determines that the voice input is associated with capturing an image. With this approach, when it is determined that the voice input is not associated with capturing an image, the in-vehicle computing device 114 or the mobile device 120 does not need to perform subsequent recognition processing on the voice input, which saves computing resources.
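As an illustration of this keyword-spotting step, the following Python sketch decides whether a voice input is associated with capturing an image. It is only a sketch under assumptions: the log-energy feature stands in for a real acoustic front end, the keyword list is illustrative, and keyword_model.detect() is a hypothetical interface to the trained keyword recognition model.

import numpy as np

# Illustrative keyword list; the scheme only requires a keyword recognition model
# trained on speech samples of shooting instructions.
SHOOTING_KEYWORDS = {"shoot", "photograph", "take a picture"}

def extract_acoustic_features(waveform: np.ndarray, sample_rate: int) -> np.ndarray:
    """Toy acoustic feature: log frame energy over 25 ms frames (a stand-in for MFCCs)."""
    frame = int(0.025 * sample_rate)
    hop = int(0.010 * sample_rate)
    frames = [waveform[i:i + frame] for i in range(0, len(waveform) - frame, hop)]
    return np.array([np.log(np.sum(f ** 2) + 1e-9) for f in frames])

def is_associated_with_capturing(waveform, sample_rate, keyword_model) -> bool:
    """Return True if the voice input contains a segment matching a shooting keyword."""
    features = extract_acoustic_features(waveform, sample_rate)
    spotted = keyword_model.detect(features)   # hypothetical keyword-spotting interface
    return any(keyword in SHOOTING_KEYWORDS for keyword in spotted)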
In some embodiments, the in-vehicle computing device 114 or the mobile device 120 may determine whether the voice input is associated with capturing an image through interactive question answering, for example, by determining that the voice input is an answer to a first voice. The first voice is output via a speaker of the vehicle 110 or the mobile device 120. For example, the speaker of the mobile device 120 outputs a first voice asking for the target photographic object, for example, "please say the photographic object". If the in-vehicle computing device 114 or the mobile device 120 determines that the user's voice input captured by the mobile device 120 is an answer to the first voice, it may be determined that the voice input is associated with capturing an image.
At block 204, if the in-vehicle computing device 114 or the mobile device 120 determines that the voice input is associated with capturing an image, the initial image displayed by the mobile device 120 is acquired. For example, the in-vehicle computing device 114 or the mobile device 120 acquires the image captured by the mobile device at the time the voice was input, such as the initial image 900 shown in fig. 9.
At block 206, the in-vehicle computing device 114 or the mobile device 120 identifies a target photographic subject in the initial image based on the voice input.
There are various ways to determine the target photographic object. In some embodiments, determining the target photographic object includes, for example: first, the in-vehicle computing device 114 or the mobile device 120 recognizes the user's voice input to determine a keyword indicating the target photographic object. For example, the voice input of the user 122 is "I want to shoot the car". Based on the voice input, the in-vehicle computing device 114 or the mobile device 120 determines, for example, that the keyword indicating the target photographic object is "car". In some embodiments, the determined keyword may be an attribute (e.g., category, name, etc.) of the target photographic object.
The in-vehicle computing device 114 or the mobile device 120 then identifies the objects and object categories included in the initial image based on a recognition model, which is trained on a plurality of image samples. In some embodiments, the recognition model is used, for example, to detect objects of interest in an input picture (e.g., an initial image captured by the mobile device 120) and to output object category and position information. In some embodiments, the recognition model may produce a list of objects including the corresponding object categories and position information. For example, the in-vehicle computing device 114 or the mobile device 120 identifies, based on the recognition model, that the initial image includes the objects and object categories indicated in fig. 10, including an object 1010 and an object 1020, the object category of the object 1010 being, for example, "car", and the object category of the object 1020 being, for example, "tree".
With respect to the detection algorithm of the recognition model, in some embodiments it employs, for example, the real-time object detection framework YOLO (third version), that is, the yolov3 framework, while the backbone network may employ a MobileNetV2 network, a lightweight architecture suited to machine learning (ML) on mobile devices. The advantage of this combination is fast image processing (for example, a detection speed of roughly 20 milliseconds per picture on a GPU), together with high detection accuracy and good real-time performance, which makes it suitable for fast image detection in a moving vehicle and matches the computing power of the mobile device and the in-vehicle computing device. The MobileNetV2 network structure is described below with reference to Table 1 and fig. 7, and the yolov3 network structure is described with reference to fig. 8, and they are not repeated here.
As for the image sample data of the recognition model, for example, a plurality of captured images may be annotated manually or with a tool such as the labelme software: the objects in each captured image are labeled with peripheral frames (bounding boxes), the relative position of each object in the training image (for example, the position of the center point of its peripheral frame) and its object category are recorded, and the annotated images are used as the image sample data for training the recognition model.
Thereafter, if the in-vehicle computing device 114 or the mobile device 120 determines that an object category matches the keyword, that object is determined to be the target photographic object. For example, if the in-vehicle computing device 114 or the mobile device 120 determines, based on the voice input, that the keyword indicating the target photographic object is "car", and further determines that the object category of an object in the initial image matches the keyword (e.g., the object category "car" of the object 1010 shown in fig. 10 matches the determined keyword "car"), the object 1010 may be determined to be the target photographic object.
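A minimal sketch of this matching step is shown below; the DetectedObject structure and the normalized box coordinates are assumptions for illustration, not the recognition model's actual output format.

from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class DetectedObject:
    category: str                                  # e.g. "car", "tree"
    box: Tuple[float, float, float, float]         # normalized (x_min, y_min, x_max, y_max)

def pick_target_object(keyword: str, detections: List[DetectedObject]) -> Optional[DetectedObject]:
    """Return the detected object whose category matches the keyword from the voice input."""
    for obj in detections:
        if obj.category == keyword:
            return obj
    return None

# Example: the keyword "car" matches the object labelled "car" (object 1010 in fig. 10).
target = pick_target_object("car", [
    DetectedObject("car", (0.55, 0.40, 0.80, 0.60)),
    DetectedObject("tree", (0.10, 0.20, 0.35, 0.70)),
])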
At block 208, the in-vehicle computing device 114 or the mobile device 120 adjusts at least one of a focal length and a shooting orientation of at least one of the in-vehicle camera (e.g., the rooftop camera 112) and the camera of the mobile device based on image features associated with the target photographic subject.
In some embodiments, adjusting at least one of the focal length and the shooting orientation of the at least one camera comprises, for example: first, the in-vehicle computing device 114 or the mobile device 120 determines the size ratio of the target photographic object (e.g., 1010 in fig. 10) in the initial image 1000. For example, a peripheral frame of the target photographic object (e.g., the peripheral frame 1012 in fig. 10) is determined, and the size ratio is then determined as the ratio of the area of that peripheral frame to the area of the whole image. Then, if the in-vehicle computing device 114 or the mobile device 120 determines that the size ratio does not match a predetermined ratio, the focal length of the camera of the vehicle or the mobile device 120 is adjusted to a target focal length. Thereafter, the in-vehicle computing device 114 or the mobile device 120 determines the position information of the target photographic object (e.g., 1110 in fig. 11) in the image (e.g., 1100 in fig. 11) generated at the target focal length; for example, the position information is determined from the center position of the peripheral frame (e.g., the peripheral frame 1112 in fig. 11) of the target photographic object in that image. If the in-vehicle computing device 114 or the mobile device 120 determines that the position information is not equal to a preset position, the camera of the vehicle or the mobile device 120 is adjusted to a target shooting orientation, so that the position information (for example, the center position of the peripheral frame 1212 of the car 1210) of the target photographic object (for example, the car 1210 in fig. 12) in the image captured by the in-vehicle camera 112 in the target shooting orientation (for example, the captured image 1200 shown in fig. 12) equals the preset position (for example, (0.5, 0.618)). With respect to the predetermined ratio mentioned above, in some embodiments it is, for example, preset, or determined via machine learning over image features of photographic images stored on the mobile device 120 of the user 122.
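The checks that trigger zooming and re-orientation can be sketched as follows; the acceptable size-ratio range and the centering tolerance are assumed values, while the preset position (0.5, 0.618) is taken from the example above.

def size_ratio(box) -> float:
    """Area of the object's peripheral frame divided by the whole image area
    (box is normalized as (x_min, y_min, x_max, y_max), so the image area is 1)."""
    x0, y0, x1, y1 = box
    return (x1 - x0) * (y1 - y0)

def frame_center(box):
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

PREDETERMINED_RATIO = (0.2, 0.5)   # assumed acceptable range for the size ratio
PRESET_POSITION = (0.5, 0.618)     # preset position used in the example above

def needs_zoom(box) -> bool:
    low, high = PREDETERMINED_RATIO
    return not (low <= size_ratio(box) <= high)

def needs_orientation_adjustment(box, tolerance=0.02) -> bool:
    cx, cy = frame_center(box)
    return abs(cx - PRESET_POSITION[0]) > tolerance or abs(cy - PRESET_POSITION[1]) > tolerance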
Specific ways for adjusting at least one image capturing device to the target zoom ratio and the target shooting orientation will be described below with reference to fig. 3-5, and will not be described herein again.
At block 210, the in-vehicle computing device 114 or the mobile device 120 picks an in-focus point for the target photographic subject for capturing the target image.
In the above-described scheme, when it is determined that the voice input is associated with capturing an image, the target photographic object is determined based on the voice input and the initial image, and at least one of the focal length and the shooting orientation of the lens of the in-vehicle camera or the mobile device 120 is adjusted for the target photographic object so as to shoot a target image of that object. The present disclosure can thus compose the shot (e.g., zoom and adjust the shooting orientation) and focus by confirming the shooting intention from the user's voice and combining it with the image features of the target photographic object in the initial image, without cumbersome manual zooming, focus-point selection, and focusing operations, and can therefore meet shooting requirements in scenarios where manual operation is inconvenient.
Fig. 3 illustrates a flow chart of a method 300 for adjusting the focal length of an imaging device according to an embodiment of the present disclosure. It should be understood that the method 300 may be performed, for example, at the electronic device 1400 depicted in fig. 14, or at the mobile device 120 or the vehicle 110 (e.g., without limitation, the in-vehicle computing device 114 such as the car machine) described in fig. 1. It should also be understood that the method 300 may include additional acts not shown and/or may omit acts shown, as the scope of the disclosure is not limited in this respect.
At block 302, the in-vehicle computing device 114 or the mobile device 120 determines whether the target focal length exceeds a predetermined focal length threshold of the camera lens. The predetermined focal length threshold corresponds, for example, to the total zoom range threshold (e.g., the optical zoom range) of one or more cameras of the vehicle or the mobile device 120. It should be appreciated that optical zoom is achieved with a series of lens elements that can be moved to zoom in or out without any loss of sharpness or resolution, so that the captured image brings the target photographic object closer without degradation. Therefore, the in-vehicle computing device 114 or the mobile device 120 preferentially shoots the target photographic object using optical zoom performed by lens movement, which reduces the blurring of the target photographic object that digital enlargement of the image would introduce.
At block 304, if the in-vehicle computing device 114 or the mobile device 120 determines that the target focal length exceeds the predetermined focal length threshold of the camera lens, an image of the target photographic object is captured at the predetermined focal length threshold. It should be appreciated that digital zoom does not require an additional mechanical module or lens movement: the original picture is cropped to the scene of interest and the cropped image is then enlarged by an algorithm, so that the image appears to be shot closer to the target. For example, if the in-vehicle computing device 114 or the mobile device 120 determines that the target focal length exceeds the total optical zoom range threshold of the one or more lenses of the camera, an image is captured at that optical zoom threshold, and digital zoom is then applied to make up for the optical zoom range that falls short of the shooting expectation.
At block 306, the in-vehicle computing device 114 or the mobile device 120 crops the image taken of the target photographic object based on a predetermined ratio (e.g., preset, or determined via machine learning over image features of photographic images stored on the mobile device 120 of the user 122) so as to generate the target image. For example, the in-vehicle computing device 114 or the mobile device 120 crops the image captured at the optical zoom range threshold to the scene of interest and then enlarges the cropped image with an algorithm, thereby generating the target image. In some embodiments, to avoid discarding image information when enlarging the cropped image, additional pixels may be interpolated into the enlarged image to preserve image detail. With this approach, the advantages of digital zoom and optical zoom can both be exploited, so that shooting the target image is not constrained by the optical zoom range threshold.
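A minimal sketch of this hybrid zoom is given below, assuming a hypothetical camera interface with set_focal_length() and capture() methods and an assumed optical focal length limit; it clamps the focal length to the optical limit and makes up the remainder with a center crop plus upscaling (digital zoom).

import cv2
import numpy as np

OPTICAL_FOCAL_LIMIT_MM = 70.0   # assumed predetermined focal length threshold of the lens

def capture_with_hybrid_zoom(camera, target_focal_mm: float) -> np.ndarray:
    """Use optical zoom up to the lens limit, then digital zoom (center crop + upscale)."""
    camera.set_focal_length(min(target_focal_mm, OPTICAL_FOCAL_LIMIT_MM))  # hypothetical interface
    frame = camera.capture()                                               # HxWx3 image
    if target_focal_mm <= OPTICAL_FOCAL_LIMIT_MM:
        return frame
    digital_factor = target_focal_mm / OPTICAL_FOCAL_LIMIT_MM
    h, w = frame.shape[:2]
    ch, cw = int(h / digital_factor), int(w / digital_factor)
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    crop = frame[y0:y0 + ch, x0:x0 + cw]
    # Upscale the cropped region back to the original resolution; the interpolation
    # adds pixels so that detail is not simply discarded.
    return cv2.resize(crop, (w, h), interpolation=cv2.INTER_CUBIC)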
In some embodiments, if it is determined that the target focal length does not exceed the predetermined focal length threshold of the camera lens and, for example, lies within a first predetermined range, a first lens of the camera is adjusted to the target focal length. If the target focal length is determined to be beyond the first predetermined range but still within the predetermined focal length threshold, a second lens of the camera is adjusted to the target focal length. For example, the camera of the vehicle or the mobile device 120 may be provided with a plurality of lens elements corresponding to different optical zoom ranges; by the above means, automatic zooming can be performed over a wider overall optical zoom range.
Fig. 4 illustrates a flow chart of a method 400 for adjusting a camera to a target shooting orientation according to an embodiment of the disclosure. It should be understood that the method 400 may be performed, for example, at the electronic device 1400 depicted in fig. 14, or at the mobile device 120 or the vehicle 110 (e.g., without limitation, the in-vehicle computing device 114 such as the car machine) described in fig. 1.
At block 402, the in-vehicle computing device 114 or the mobile device 120 determines whether the focal length of at least one camera in the in-vehicle computing device 114 and the mobile device 120 has been adjusted to a target focal length.
At block 404, if the in-vehicle computing device 114 or the mobile device 120 determines that the focal length of the at least one camera has been adjusted to the target focal length, location information of the target photographic subject in an image generated by the at least one camera based on the target focal length is obtained. At block 406, the in-vehicle computing device 114 or the mobile device 120 generates a drive signal for driving a rotation device of the in-vehicle computing device 114 or the mobile device 120 to rotate based on the location information, the predetermined location information, and the initial shooting orientation for adjusting the camera of the in-vehicle computing device 114 or the mobile device 120 to the target shooting orientation.
As for the initial shooting orientation, as described above, for example, the shooting orientation at the time the mobile device 120 or the overhead camera 112 captured the initial image may be acquired. In some embodiments, if the initial image was captured by the mobile device 120 and the user wishes the overhead camera 112 to capture the target image, the in-vehicle computing device 114 may first obtain the orientation in which the mobile device 120 captured the initial image and then adjust the overhead camera 112 to an orientation matching it, so that this orientation serves as the starting point (i.e., the initial shooting orientation) from which the overhead camera 112 adjusts its shooting orientation. In some embodiments, to determine whether the overhead camera 112 has actually been adjusted to this matching initial shooting orientation, the image features of the initial image are acquired, and the in-vehicle computing device 114 then acquires an environment image captured by the overhead camera 112. If the in-vehicle computing device 114 determines that the environment image captured by the overhead camera 112 matches the image features of the initial image captured by the mobile device 120 in its initial shooting orientation, it is determined that the overhead camera 112 has actually been adjusted to the matching initial shooting orientation.
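One way to implement this image-feature comparison is classical local-feature matching; the sketch below uses ORB features and a brute-force matcher as an assumed choice (the disclosure does not prescribe a specific matching method), and the match thresholds are illustrative.

import cv2

def orientations_match(initial_image, environment_image, min_good_matches=40) -> bool:
    """Heuristic check that the overhead camera sees roughly the same scene as the
    initial image captured by the mobile device."""
    orb = cv2.ORB_create(nfeatures=1000)
    gray1 = cv2.cvtColor(initial_image, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.cvtColor(environment_image, cv2.COLOR_BGR2GRAY)
    _, des1 = orb.detectAndCompute(gray1, None)
    _, des2 = orb.detectAndCompute(gray2, None)
    if des1 is None or des2 is None:
        return False
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    good = [m for m in matches if m.distance < 40]   # assumed descriptor-distance threshold
    return len(good) >= min_good_matches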
Regarding the position information: for example, if the in-vehicle computing device 114 determines that the position information (e.g., the center position of the peripheral frame 1112 of the object 1110 in fig. 11, for example (0.7, 0.6)) of the target photographic object (e.g., the car 1110 in fig. 11) in the image generated at the target focal length (e.g., 1100 in fig. 11) is not equal to the predetermined position (e.g., (0.5, 0.618)), it generates, based on the position information (0.7, 0.6), the predetermined position (0.5, 0.618), and the initial shooting orientation, a first drive signal for driving the first rotating device to rotate so that the shooting orientation of the camera is adjusted by a yaw angle α, and/or a second drive signal for driving the second rotating device to rotate so that the shooting orientation of the camera is adjusted by a pitch angle β. As a result, the position information (e.g., the center position of the peripheral frame 1212 of the car 1210) of the target photographic object (e.g., the car 1210 in fig. 12) in the image captured by the overhead camera 112 in the target shooting orientation (e.g., the image 1200 shown in fig. 12) equals the preset position (e.g., (0.5, 0.618)).
By adopting the above-described means, the in-vehicle computing apparatus 114 or the mobile apparatus 120 can automatically adjust the shooting orientation of the imaging device according to the position difference between the position information of the target shooting object in the image generated based on the target focal length and the expected position.
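The mapping from this position difference to the yaw and pitch corrections can be sketched as below; the field-of-view values and the linear offset-to-angle mapping are simplifying assumptions, not the disclosed drive-signal computation.

def orientation_correction(position, preset=(0.5, 0.618),
                           horizontal_fov_deg=60.0, vertical_fov_deg=40.0):
    """Approximate yaw/pitch correction (degrees) that moves the target's peripheral-frame
    center from `position` toward the preset position."""
    dx = position[0] - preset[0]     # positive: target lies to the right of the desired spot
    dy = position[1] - preset[1]     # positive: target lies below the desired spot
    yaw = dx * horizontal_fov_deg    # alpha: rotation about the first (vertical) axis
    pitch = -dy * vertical_fov_deg   # beta: rotation about the second (horizontal) axis
    return yaw, pitch

# Example from the text: frame center at (0.7, 0.6), preset position (0.5, 0.618).
alpha, beta = orientation_correction((0.7, 0.6))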
Fig. 5 schematically illustrates a schematic diagram of an in-vehicle image pickup apparatus 500 according to an embodiment of the present disclosure. It should be understood that the in-vehicle image pickup apparatus 500 may further include additional structures not shown and/or may omit the structures shown, and the scope of the present disclosure is not limited in this respect.
As shown in fig. 5, the overhead camera 500 includes, for example, a camera 510, a first rotating device 520, a second rotating device 530, and a lifting device 540.
The first rotating device 520 may rotate about a first axis (a vertical axis perpendicular to the horizontal plane, e.g., a Z-axis) on a first plane (e.g., the horizontal plane) by 0 to 360 degrees. In some embodiments, the range of rotation of the first rotation device 520 may also be less than 360 degrees. The first rotating device 520 is, for example, connected to a rotating shaft of a first driving source (not shown), and the first rotating device 520 may also be driven to rotate by a rotating shaft 534 of the first driving source (e.g., a first motor 532) via a first transmission mechanism (e.g., a gear or a transmission belt 526) as shown in fig. 5. In some embodiments, the rotation angle of the first rotating device 520 is controlled by the first driving signal. For example, the photographing orientation of the camera 510 is adjusted by a yaw angle (yaw) due to the rotation of the first rotating means 520 driven by the first driving signal.
The second rotation device 530 can rotate 0 to 360 degrees about a second axis (e.g., a horizontal axis parallel to the first plane, the second axis perpendicular to the first axis), and in some embodiments, the rotation range of the second rotation device 530 can also be less than 360 degrees. For example, in a clockwise direction, as indicated by arrow 550 in fig. 5, or in a counter-clockwise direction. The second rotating device 530 is, for example, a second driving source (e.g., a second motor including a rotor and a stator coupled to a rotating shaft), and the rotating shaft of the second rotating device 530 may be coupled to the camera head 510 directly or via a second transmission mechanism (e.g., a gear). The photographing orientation of the camera 510 is rotated with the rotation of the rotation shaft of the second rotating means 530. For example, the photographing orientation of the camera 510 is adjusted by a pitch angle (pitch) due to the rotation of the second rotating means 530 driven by the second driving signal. In some embodiments, the rotation angle of the second rotating device 530 is controlled by the second driving signal. The fixed part of the second rotating means 530 (e.g. the housing of the second motor) is fixedly connected to the supporting means 540.
The housing of the second rotating device 530 is fixedly connected to the first rotating device 520 through the supporting device 540. Since the housing of the second rotating device 530 is fixedly connected to the first rotating device 520, and the camera 510 is carried by the second rotating device 530, when the first driving source (e.g., the first motor 532) drives the first rotating device 520 to rotate by a predetermined angle around the Z-axis, the first rotating device 520 also drives the camera 510 to rotate by that predetermined angle around the Z-axis.
By adopting the above means, the first rotating device 520 can drive the camera 510 to rotate around the Z axis (vertical axis, i.e. first axis) perpendicular to the horizontal plane. The second rotating device 530 can rotate the camera 510 around a second axis perpendicular to the first axis.
In some embodiments, the lifting device 540 of the overhead camera 500 can lift or lower the overhead camera 500 in a vertical direction, so as to extend the overhead camera 500 out of the vehicle or retract the overhead camera 500 into the vehicle.
In the above-described aspect, the roof imaging apparatus 500 may adjust the shooting orientation for the panoramic environment according to the driving signal.
Fig. 6 illustrates a flow chart of a method 600 for adjusting a camera to a target shooting orientation according to an embodiment of the disclosure. It should be understood that the method 600 may be performed, for example, at the electronic device 1400 depicted in fig. 14, or at the mobile device 120 or the vehicle 110 (e.g., without limitation, the in-vehicle computing device 114 such as the car machine) described in fig. 1.
At block 602, the mobile device 120 acquires an initial shooting orientation of the mobile device 120 associated with the initial image (e.g., image 900 of fig. 9), the initial shooting orientation being determined based on detection data of a sensor (e.g., a pose sensor).
At block 604, the mobile device 120 generates voice information indicating how to move the at least one camera, based on the target shooting orientation and the initial shooting orientation. For example, the speaker of the mobile device 120 utters a second voice instructing the user to rotate the device, for example, 10 degrees clockwise.
At block 606, the mobile device 120 determines whether its camera has been moved to the target shooting orientation. In some embodiments, the mobile device 120 may confirm whether the camera has been moved to the target photographing orientation by determining whether the position information of the target photographing object is equal to a predetermined position.
At block 608, if the mobile device 120 determines that the camera of the mobile device 120 has been moved to the target shooting orientation, a prompt message is presented, such as the mobile device 120 alerting the user 122 that the mobile device 120 has been adjusted into position by displaying an indication or a voice prompt.
With this approach, even if the camera has no automatic orientation drive, or the required orientation adjustment exceeds the range of the drive device, the user can automatically be prompted to adjust the orientation of the camera so as to obtain a target image with the desired effect.
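A small sketch of how the movement prompt might be generated from the orientation difference follows; the angle convention, tolerance, and wording are assumptions for illustration.

def movement_prompt(current_yaw, target_yaw, current_pitch, target_pitch,
                    tolerance_deg=2.0) -> str:
    """Build the spoken instruction (e.g. "rotate 10 degrees clockwise") guiding the user
    to move the camera of the mobile device toward the target shooting orientation."""
    d_yaw = target_yaw - current_yaw
    d_pitch = target_pitch - current_pitch
    if abs(d_yaw) <= tolerance_deg and abs(d_pitch) <= tolerance_deg:
        return "The camera is in position."
    parts = []
    if abs(d_yaw) > tolerance_deg:
        direction = "clockwise" if d_yaw > 0 else "counterclockwise"
        parts.append("rotate %.0f degrees %s" % (abs(d_yaw), direction))
    if abs(d_pitch) > tolerance_deg:
        parts.append("tilt %s %.0f degrees" % ("up" if d_pitch > 0 else "down", abs(d_pitch)))
    return "Please " + " and ".join(parts) + "."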
The network structure implementing the recognition model for the target photographic object is described below with reference to Table 1 and figs. 7 to 8. The MobileNetV2 network structure is described first with reference to Table 1.
Table 1 (abridged; the complete table appears as an image in the original publication)

Input         Operator       t    c     n    s
224×224×3     conv2d 3×3     -    32    1    2
112×112×32    bottleneck     1    16    1    1
112×112×16    bottleneck     6    24    2    2
…             …              …    …     …    …
7×7×1280      avgpool 7×7    -    -     1    -
1×1×1280      conv2d 1×1     -    k     -    -
In Table 1 above, "input" denotes the input size, "operator" denotes the network layer, and conv2d denotes a CNN (convolution) layer, whose convolution kernel in the first row is 3×3. "bottleneck" denotes the basic module unit. In addition, t denotes the expansion factor of the convolution layer, c the number of output channels, n the number of times the bottleneck module is repeated, s the stride of the convolution, and k the number of object classes.
In the first row of Table 1, the input is, for example, a 3-channel image of size 224×224. The corresponding network layer is a CNN layer with a 3×3 convolution kernel, the output has 32 channels, the layer is executed once, and the convolution stride is 2. Because the stride of the first row is 2, the input of the second row becomes a 32-channel image of size 112×112. The corresponding network layer is a bottleneck structure (described with reference to fig. 7) with expansion factor 1; its output has 16 channels, it is executed once, and the convolution stride is 1. Because the stride of the second row is 1, the input of the third row is a 16-channel image of size 112×112; it passes through a bottleneck with expansion factor 6, outputting 24 channels, executed twice with a convolution stride of 2; and so on. avgpool is global average pooling with a 7×7 pooling kernel, and its output is a 1280-channel 1×1 feature map. The final network layer is a CNN layer with a 1×1 convolution kernel, whose output size k is the number of object classes.
Fig. 7 schematically illustrates the bottleneck structure 700 of the recognition model for the target photographic object according to an embodiment of the present disclosure. The left side of fig. 7 shows the structure when the stride is 1: 710 indicates the input, 712 indicates a convolution operation with a 1×1 kernel followed by a ReLU6 activation layer, 714 indicates a depthwise convolution (Dwise_conv) operation with a 3×3 kernel followed by a ReLU6 activation layer, 718 indicates a convolution operation with a 1×1 kernel followed by a linear transformation, and 720 indicates an element-wise addition (Add) operation.
The right side of fig. 7 shows the structure when the stride is 2: 730 indicates the input, 732 indicates a convolution operation with a 1×1 kernel followed by a ReLU6 activation layer, 734 indicates a depthwise convolution (Dwise_conv) operation with a 3×3 kernel and a stride of 2 followed by a ReLU6 activation layer, and 736 indicates a convolution operation with a 1×1 kernel followed by a linear transformation.
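A minimal PyTorch sketch of this bottleneck (inverted residual) block is shown below; the batch normalization layers are added per common MobileNetV2 practice and are an assumption, since fig. 7 only shows the convolution, ReLU6, linear, and Add operations.

import torch.nn as nn

class Bottleneck(nn.Module):
    """MobileNetV2-style bottleneck: 1x1 expansion + ReLU6, 3x3 depthwise convolution + ReLU6,
    1x1 linear projection; the residual Add is applied only when the stride is 1 and the
    channel count is unchanged."""
    def __init__(self, in_ch: int, out_ch: int, stride: int, expand: int):
        super().__init__()
        hidden = in_ch * expand
        self.use_residual = (stride == 1 and in_ch == out_ch)
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False), nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch),  # linear projection
        )

    def forward(self, x):
        y = self.block(x)
        return x + y if self.use_residual else y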
Fig. 8 schematically shows the yolov3 network structure of the recognition model for the target photographic object according to an embodiment of the present disclosure. As shown in fig. 8, 802 indicates the input image. 804 to 812 indicate five MobileNet stages, each of which halves the height and width of its input. For example, the mobilenet-stage1 stage indicated at 804 scales the input image from 224×224 to half that size, i.e., 112×112 (the CNN layer of the first row of Table 1, which scales the input from 224×224 to 112×112, can be regarded as mobilenet-stage1). The mobilenet-stage2 stage indicated at 806 scales the image output by mobilenet-stage1 from 112×112 down to 56×56, and so on. Through mobilenet-stage1 to mobilenet-stage5, the height and width of the input image are each scaled down to 1/32. Scaling the image size within the MobileNet stages helps to increase the processing speed of the recognition module for the target photographic object.
In fig. 8, 814, 828, and 842 each indicate a convolution operation with a 3×3 kernel; 816, 830, and 844 each indicate a convolution operation with a 3×3 kernel; 818, 832, and 846 each indicate a convolution operation with a 1×1 kernel; 822 and 836 each indicate a convolution operation with a 1×1 kernel; 824 and 838 each indicate an upsampling operation; 826 and 840 each indicate a fusion operation (e.g., adding corresponding elements of the corresponding feature maps); and 820, 834, and 848 indicate the outputs. Each output contains, for example, the object categories and the position information of the objects (e.g., the coordinates of the peripheral frames). The output 820 is produced by convolving the result of mobilenet-stage5, whose height and width are each 1/32 of the input image size. The output 834 is produced by passing the mobilenet-stage5 result through convolution 814, convolution 822, and upsampling 824, fusing it with the mobilenet-stage4 result at operation 826, and then applying convolution 830 and convolution 832; the image size of output 834 therefore differs from that of output 820. Similarly, the result of fusion operation 826 is processed further and fused at operation 840 with the mobilenet-stage3 result, so the image size of output 848 differs from those of outputs 820 and 834. A recognition model of the target photographic object constructed in this way can recognize objects of different image sizes, so the present disclosure can quickly and accurately recognize target photographic objects of different sizes in both distant and near views while the vehicle 110 is being driven.
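The multi-scale fusion of fig. 8 can be sketched as follows in PyTorch. This is a simplified stand-in: the convolution stacks 814/816/818 and so on are collapsed into single layers, the channel counts are parameters, and element-wise addition is used for the fusion operations 826 and 840 as described above.

import torch.nn as nn
import torch.nn.functional as F

class DetectionNeck(nn.Module):
    """Three-scale detection head: the stage-5 feature map yields one output directly and
    is also upsampled and fused (element-wise add) with the stage-4 and stage-3 feature
    maps to yield two further outputs at higher resolutions."""
    def __init__(self, c3: int, c4: int, c5: int, out_ch: int):
        super().__init__()
        self.reduce5 = nn.Conv2d(c5, c4, 1)   # align channels before fusing with stage 4
        self.reduce4 = nn.Conv2d(c4, c3, 1)   # align channels before fusing with stage 3
        self.head5 = nn.Conv2d(c5, out_ch, 1)
        self.head4 = nn.Conv2d(c4, out_ch, 1)
        self.head3 = nn.Conv2d(c3, out_ch, 1)

    def forward(self, f3, f4, f5):
        out5 = self.head5(f5)                                                   # 1/32 scale
        p4 = f4 + F.interpolate(self.reduce5(f5), scale_factor=2, mode="nearest")
        out4 = self.head4(p4)                                                   # 1/16 scale
        p3 = f3 + F.interpolate(self.reduce4(p4), scale_factor=2, mode="nearest")
        out3 = self.head3(p3)                                                   # 1/8 scale
        return out3, out4, out5   # category and peripheral-frame predictions at three scales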
The method 1300 for focusing is described below in conjunction with fig. 12-13. Fig. 12 schematically illustrates a schematic diagram of an acquired image 1200 based on a target shooting orientation according to an embodiment of the disclosure. Fig. 13 schematically illustrates a schematic diagram of a method 1300 for focusing according to an embodiment of the present disclosure.
At block 1302, the in-vehicle computing device 114 or the mobile device 120 determines whether the camera has been adjusted to the target shooting orientation.
At block 1304, if the in-vehicle computing device 114 or the mobile device 120 determines that the camera has been adjusted to the target shooting orientation, the target photographic object is determined to be the focus object. In some embodiments, while the shooting orientation of the camera is being adjusted, the position information of the target photographic object may be tracked, and if the in-vehicle computing device 114 or the mobile device 120 determines that the position information (e.g., the position of the center point of the peripheral frame 1212 in fig. 12) of the target photographic object (e.g., the car 1210 in fig. 12) equals the predetermined position information, it is determined that the camera has been adjusted to the target shooting orientation. For example, when the position information of the target photographic object (e.g., the car 1210) in the captured image 1200 of fig. 12 equals the predetermined position information (e.g., (0.5, 0.618)), the in-vehicle computing device 114 or the mobile device 120 determines that the target photographic object (e.g., the car 1210) is the focus object.
At block 1306, the in-vehicle computing device 114 or the mobile device 120 generates multiple frames of image data for the focus object based on a plurality of focus points, the plurality of focus points being generated by driving the at least one camera to move its lens through a plurality of focus positions.
Regarding the way the camera moves its lens for automatic focusing, in some embodiments the lens is mounted in a voice coil motor. The voice coil motor consists mainly of a coil, a magnet assembly, and spring plates; the coil is held in the magnet assembly by upper and lower spring plates. When the coil is energized, it generates a magnetic field that interacts with the magnet assembly, so the coil, together with the lens mounted in it, moves upward; when the power is cut off, the coil returns under the spring force of the spring plates. Moving the lens in this way realizes the automatic focusing function.
At block 1308, the in-vehicle computing device 114 or the mobile device 120 determines the image data with the greatest contrast among the plurality of frames of image data.
At block 1310, the at least one camera device is caused to move the lens to an in-focus position associated with the image data having the greatest contrast.
In the above-described aspect, auto-focusing can be automatically performed on the target photographic object as a focusing object directly after the image pickup apparatus is adjusted to the target photographic orientation, without manually selecting a focusing object among a plurality of objects.
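The contrast-based search over focus positions can be sketched as below; the Laplacian-variance contrast measure and the camera.move_lens()/capture() interface are assumptions chosen for illustration.

import cv2

def contrast_score(frame) -> float:
    """Variance of the Laplacian as a simple contrast/sharpness measure."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

def contrast_autofocus(camera, focus_positions, focus_box):
    """Sweep candidate lens positions, score the focus object's region in each frame,
    and settle on the position whose frame has the greatest contrast."""
    x0, y0, x1, y1 = focus_box            # pixel peripheral frame of the focus object
    best_pos, best_score = None, -1.0
    for pos in focus_positions:
        camera.move_lens(pos)             # hypothetical voice-coil-motor drive interface
        frame = camera.capture()
        score = contrast_score(frame[y0:y1, x0:x1])
        if score > best_score:
            best_pos, best_score = pos, score
    camera.move_lens(best_pos)
    return best_pos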
In some embodiments, the in-vehicle computing device 114 or the mobile device 120 may also use other automatic focusing methods, for example, use "phase detection automatic focusing", for example, determine a focusing offset value by comparing the distances between the left and right pixels and the variation thereof, so as to achieve accurate focusing.
FIG. 14 schematically illustrates a block diagram of an electronic device 1400 suitable for implementing embodiments of the present disclosure. The device 1400 may be used to implement the methods 200, 300, 400, 600, and 1300 shown in figs. 2 to 6 and fig. 13. As shown in fig. 14, the device 1400 includes a central processing unit (CPU) 1401 that can perform various appropriate actions and processes in accordance with computer program instructions stored in a read-only memory (ROM) 1402 or loaded from a storage unit 1408 into a random access memory (RAM) 1403. The RAM 1403 can also store various programs and data required for the operation of the device 1400. The CPU 1401, the ROM 1402, and the RAM 1403 are connected to one another via a bus 1404. An input/output (I/O) interface 1405 is also connected to the bus 1404.
Various components in the device 1400 are connected to the I/O interface 1405, including an input unit 1406, an output unit 1407, a storage unit 1408, and a communication unit 1409. The processing unit 1401 performs the respective methods and processes described above, for example the methods 200, 300, 400, 600, and 1300. For example, in some embodiments, the methods 200, 300, 400, 600, and 1300 may be implemented as a computer software program stored on a machine-readable medium, such as the storage unit 1408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1400 via the ROM 1402 and/or the communication unit 1409. When the computer program is loaded into the RAM 1403 and executed by the CPU 1401, it may perform one or more of the operations of the methods 200, 300, 400, 600, and 1300 described above. Alternatively, in other embodiments, the CPU 1401 may be configured by any other suitable means (e.g., by means of firmware) to perform one or more acts of the methods 200, 300, 400, 600, and 1300.
It should be further appreciated that the present disclosure may be embodied as methods, apparatus, systems, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for carrying out various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, and a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, as well as any suitable combination of the foregoing. Computer-readable storage media as used herein are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, thereby implementing aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor in a voice interaction device, a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or the technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
The above are only optional embodiments of the present disclosure and are not intended to limit the present disclosure; for those skilled in the art, the present disclosure may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present disclosure shall be included in the protection scope of the present disclosure.

Claims (13)

1. A method of acquiring an image, comprising:
in response to determining that a voice input is associated with capturing an image, acquiring an initial image displayed by a mobile device or an in-vehicle camera, the voice input being picked up via a microphone of a vehicle or of the mobile device;
identifying a target photographic object in the initial image based on the voice input;
adjusting at least one of a focal length and a shooting orientation of at least one of the in-vehicle camera and a camera of the mobile device based on image features related to the target photographic object; and
selecting a focus point for the target photographic object for capturing a target image.
2. The method of claim 1, wherein determining that a voice input is associated with capturing an image comprises:
extracting acoustic features of the voice input;
determining, based on the acoustic features and via a predetermined keyword recognition model trained with a plurality of voice sample data about photographing instructions, whether the voice input includes a voice segment associated with a predetermined keyword; and
in response to determining that the voice input includes a voice segment associated with the predetermined keyword, determining that the voice input is associated with capturing an image.
3. The method of claim 1, wherein determining that a voice input is associated with capturing an image comprises:
in response to determining that the voice input is an answer to a first voice, the first voice being output via a speaker of the vehicle or the mobile device.
4. The method of claim 2 or 3, wherein identifying the target photographic subject in the initial image comprises one of:
recognizing the voice input to determine a keyword indicating a target photographic object;
identifying an object and an object category included in the initial image based on an identification model, the identification model being trained via a plurality of image sample data; and
in response to determining that the object category matches the keyword, determining that the object is the target photographic object.
5. The method of claim 4, wherein adjusting at least one of a focal length and a shooting orientation of at least one camera comprises:
determining the size proportion of the target shooting object in the initial image;
in response to determining that the size ratio is not equal to a predetermined ratio, adjusting the focal length of the at least one camera to a target focal length;
determining position information of the target photographic object in an image generated based on the target focal length; and
adjusting the at least one camera to a target shooting orientation in response to determining that the position information is not equal to predetermined position information.
6. The method of claim 5, wherein determining a size proportion of the target photographic subject in the initial image comprises:
determining a peripheral frame of the target photographic object; and
determining the size proportion based on a ratio between an area of the peripheral frame and an overall area of the initial image.
7. The method of claim 5, wherein determining position information of the target photographic subject in an image generated based on the target focal distance comprises:
determining the position information based on a center position of a peripheral frame of the target photographic subject in the image generated based on the target focal length.
8. The method of claim 7, wherein adjusting the at least one camera to a target focal length and a target capture orientation comprises:
in response to determining that the target focal length exceeds a predetermined focal length threshold for a lens of the at least one camera device, capturing an image of the target photographic subject based on the predetermined focal length threshold; and
cropping, based on a predetermined scale, the image of the target photographic subject captured based on the predetermined focal length threshold, to generate the target image.
9. The method of claim 5, wherein choosing an in-focus point for the target photographic subject for capturing a target image comprises:
in response to determining that the at least one camera has been adjusted to the target shooting orientation, determining that the target photographic object is a focus object;
acquiring a plurality of in-focus points for the focus object to generate multi-frame image data, the plurality of in-focus points being generated by driving the at least one camera to move its lens through a plurality of focus positions;
determining image data with the maximum contrast in the multi-frame image data; and
causing the lens of the at least one camera to move to the in-focus position associated with the image data having the greatest contrast.
10. The method of claim 5, wherein adjusting the at least one camera to a target shooting orientation comprises:
in response to determining that the at least one camera has been adjusted to a target focal length, acquiring an initial shooting orientation of the mobile device associated with the initial image, the initial shooting orientation being determined based on detection data of a sensor; and
generating, based on the position information, the predetermined position information, and the initial shooting orientation, a driving signal for driving at least one rotating device of the at least one camera to rotate, so as to adjust the at least one camera to the target shooting orientation.
11. The method of claim 1, wherein adjusting the at least one camera to a target shooting orientation comprises:
acquiring an initial shooting orientation of the mobile device associated with the initial image, the initial shooting orientation being determined based on detection data of a sensor;
generating, based on the target shooting orientation and the initial shooting orientation, voice information indicating a movement of the at least one camera; and
presenting prompt information in response to determining that the at least one camera has been moved to the target shooting orientation.
12. An electronic device, comprising:
a memory configured to store one or more computer programs; and
a processor coupled to the memory and configured to execute the one or more computer programs to cause the electronic device to perform the method of any of claims 1-11.
13. A non-transitory computer readable storage medium having stored thereon machine executable instructions which, when executed, cause a machine to perform the steps of the method of any of claims 1-11.
CN202010021729.2A 2020-01-09 2020-01-09 Method, electronic device and computer storage medium for capturing images Pending CN113099103A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010021729.2A CN113099103A (en) 2020-01-09 2020-01-09 Method, electronic device and computer storage medium for capturing images

Publications (1)

Publication Number Publication Date
CN113099103A true CN113099103A (en) 2021-07-09

Family

ID=76663506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010021729.2A Pending CN113099103A (en) 2020-01-09 2020-01-09 Method, electronic device and computer storage medium for capturing images

Country Status (1)

Country Link
CN (1) CN113099103A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102132557A (en) * 2008-09-03 2011-07-20 三菱电机株式会社 Imaging system for vehicle
US20140022351A1 (en) * 2012-07-18 2014-01-23 Samsung Electronics Co., Ltd. Photographing apparatus, photographing control method, and eyeball recognition apparatus
CN105704389A (en) * 2016-04-12 2016-06-22 上海斐讯数据通信技术有限公司 Intelligent photo taking method and device
US20170374273A1 (en) * 2016-06-22 2017-12-28 International Business Machines Corporation Controlling a camera using a voice command and image recognition
CN107613209A (en) * 2017-09-29 2018-01-19 努比亚技术有限公司 A kind of image-pickup method, terminal and computer-readable recording medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115145442A (en) * 2022-06-07 2022-10-04 杭州海康汽车软件有限公司 Environment image display method and device, vehicle-mounted terminal and storage medium
CN115145442B (en) * 2022-06-07 2024-06-11 杭州海康汽车软件有限公司 Method and device for displaying environment image, vehicle-mounted terminal and storage medium
CN115134533A (en) * 2022-08-30 2022-09-30 宁波均联智行科技股份有限公司 Shooting method and equipment for automatically calling vehicle-mounted image acquisition device
CN115134533B (en) * 2022-08-30 2022-11-18 宁波均联智行科技股份有限公司 Shooting method and equipment for automatically calling vehicle-mounted image acquisition device

Similar Documents

Publication Publication Date Title
US20210319538A1 (en) Image processing method and device, electronic equipment and storage medium
CN109309796B (en) Electronic device for acquiring image using multiple cameras and method for processing image using the same
US10262394B2 (en) Tracking objects in bowl-shaped imaging systems
CN107950018B (en) Image generation method and system, and computer readable medium
CN108335323B (en) Blurring method of image background and mobile terminal
CN109474780B (en) Method and device for image processing
WO2015031856A1 (en) Method and apparatus for generating an all-in-focus image
US10013761B2 (en) Automatic orientation estimation of camera system relative to vehicle
WO2019037038A1 (en) Image processing method and device, and server
CN113099103A (en) Method, electronic device and computer storage medium for capturing images
US11233946B2 (en) Systems and methods for 3-dimensional (3D) positioning of imaging device
CN112990197A (en) License plate recognition method and device, electronic equipment and storage medium
CN115115611B (en) Vehicle damage identification method and device, electronic equipment and storage medium
EP2760197A1 (en) Apparatus and method for processing image in mobile terminal having camera
US9030501B2 (en) Methods and systems for modifying a display of a field of view of a robotic device to include zoomed-in and zoomed-out views
US20160142616A1 (en) Direction aware autofocus
CN110830726B (en) Automatic focusing method, device, equipment and storage medium
CN111860074B (en) Target object detection method and device, and driving control method and device
JP2012222664A (en) On-vehicle camera system
CN111832338A (en) Object detection method and device, electronic equipment and storage medium
CN112529781B (en) Image processing method, device and readable storage medium
CN115457024A (en) Method and device for processing cryoelectron microscope image, electronic equipment and storage medium
CN117837153A (en) Shooting control method, shooting control device and movable platform
CN115734086A (en) Image processing method, device and system based on off-screen shooting and storage medium
CN113066100A (en) Target tracking method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: 201821 room 208, building 4, No. 1411, Yecheng Road, Jiading Industrial Zone, Jiading District, Shanghai
Applicant after: Botai vehicle networking technology (Shanghai) Co.,Ltd.
Address before: 201821 room 208, building 4, No. 1411, Yecheng Road, Jiading Industrial Zone, Jiading District, Shanghai
Applicant before: SHANGHAI PATEO ELECTRONIC EQUIPMENT MANUFACTURING Co.,Ltd.
RJ01 Rejection of invention patent application after publication
Application publication date: 20210709