WO2023092380A1 - Method of suggesting shooting position and posture for electronic device having camera, electronic device and computer-readable storage medium - Google Patents

Method of suggesting shooting position and posture for electronic device having camera, electronic device and computer-readable storage medium Download PDF

Info

Publication number
WO2023092380A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
posture
camera
shooting position
electronic device
Prior art date
Application number
PCT/CN2021/133093
Other languages
French (fr)
Inventor
Teruchika MIURA
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp., Ltd. filed Critical Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority to PCT/CN2021/133093 priority Critical patent/WO2023092380A1/en
Publication of WO2023092380A1 publication Critical patent/WO2023092380A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/64Computer-aided capture of images, e.g. transfer from script file into camera, check of taken image quality, advice or proposal for image composition or decision on when to take image
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/204Image signal generators using stereoscopic image cameras
    • H04N13/239Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/204Image signal generators using stereoscopic image cameras
    • H04N13/254Image signal generators using stereoscopic image cameras in combination with electromagnetic radiation sources for illuminating objects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/271Image signal generators wherein the generated image signals comprise depth maps or disparity maps
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/296Synchronisation thereof; Control thereof
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/61Control of cameras or camera modules based on recognised objects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/63Control of cameras or camera modules by using electronic viewfinders
    • H04N23/633Control of cameras or camera modules by using electronic viewfinders for displaying additional information relating to control or operation of the camera
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/2224Studio circuitry; Studio devices; Studio equipment related to virtual studio applications
    • H04N5/2226Determination of depth image, e.g. for foreground/background separation

Definitions

  • the present disclosure relates to a method of suggesting a shooting position and posture for an electronic device having a camera, an electronic device performing the method, and a computer-readable storage medium storing a program to implement the method.
  • the electronic device could process an image captured by the user so as to change it into an image with good composition.
  • changing the composition of the captured image into a good composition is not easy. Even if the electronic device converts the image, the image quality may deteriorate due to deformation, enlargement or reduction.
  • the present disclosure aims to solve at least one of the technical problems mentioned above. Accordingly, the present disclosure needs to provide a method of suggesting a shooting position and posture for an electronic device having a camera, an electronic device implementing such a method and a computer-readable storage medium storing a program to implement the method.
  • a method of suggesting a shooting position and posture for an electronic device having a camera includes:
  • an electronic device having a camera includes:
  • an acquiring unit configured to acquire an image captured by the camera, a depth map corresponding to the image, and a current posture of the camera when the image is captured;
  • a determining unit configured to determine a favorable shooting position and posture of the camera based on the image, the depth map and the current posture
  • an outputting unit configured to output a suggestion to guide the camera to the favorable shooting position and posture.
  • a computer-readable storage medium on which a computer program is stored, is provided.
  • the computer program is executed by a computer to implement the method according to the present disclosure.
  • FIG. 1 is a functional block diagram illustrating a configuration of an electronic device according to an embodiment of the present disclosure
  • FIG. 2 is an example of a camera posture candidate list stored in a memory of the electronic device according to an embodiment of the present disclosure
  • FIG. 3 shows a situation where the electronic device is capturing an image which includes objects and a background
  • FIG. 4 is a top view corresponding to FIG. 3 which shows the camera postures specified in the camera posture candidate list;
  • FIG. 5 is a side view corresponding to FIG. 3 which shows the camera postures specified in the camera posture candidate list
  • FIG. 6 is a functional block diagram of a processor of the electronic device according to an embodiment of the present disclosure.
  • FIG. 7 is a flowchart illustrating a method of suggesting a favorable shooting position and posture for the electronic device according to an embodiment of the present disclosure
  • FIG. 8 is an example of an image (top) and an example of a depth map corresponding to the image (bottom) ;
  • FIG. 9 is a flowchart illustrating a method of determining the favorable shooting position and posture according to an embodiment of the present disclosure.
  • FIG. 10 is a flowchart illustrating a method of creating a 3D model according to an embodiment of the present disclosure
  • FIG. 12 shows the original depth map captured by the electronic device (top) and shape information of the extracted objects (bottom) ;
  • FIG. 13 shows a 3D model created by combining the label and the shape information
  • FIG. 14 shows the original image captured by the electronic device (top) and a background image obtained by deleting the objects and interpolating an occlusion area (bottom) ;
  • FIG. 15 is a flowchart illustrating a method of creating a plurality of 2D images according to an embodiment of the present disclosure
  • FIG. 16 shows a situation where the objects are shot from three different shooting positions and postures
  • FIG. 17A is an example of a created 2D image when the objects are shot from a first shooting position and posture
  • FIG. 17B is an example of a created 2D image when the objects are shot from a second shooting position and posture
  • FIG. 17C is an example of a created 2D image when the objects are shot from a third shooting position and posture
  • FIG. 18A is an example of a preview image when a human (main object) is viewed from a first shooting position and posture;
  • FIG. 18B is an example of a preview image when the human is viewed from a second shooting position and posture
  • FIG. 18C is an example of a preview image when the human is viewed from a third shooting position and posture.
  • FIG. 19 is an example of an image on which a target image is superimposed on the preview image.
  • FIG. 1 is a functional block diagram illustrating an example of a configuration of the electronic device 100 according to an embodiment of the present disclosure.
  • the electronic device 100 is a mobile device such as a smartphone in this embodiment, but it may be other types of electronic device equipped with one or more camera modules.
  • the electronic device 100 includes a camera module 10, a range sensor module 20, and an image signal processor 30 that controls the camera module 10 and the range sensor module 20.
  • the camera module 10 includes two image sensor modules 11 and 12, as shown in FIG. 1.
  • the image sensor module 11 includes a first lens 11a that is capable of focusing on an object, a first image sensor 11b that detects an image inputted via the first lens 11a, and a first image sensor driver 11c that drives the first image sensor 11b.
  • the image sensor module 12 includes a second lens 12a that is capable of focusing on an object, a second image sensor 12b that detects an image inputted via the second lens 12a, and a second image sensor driver 12c that drives the second image sensor 12b.
  • the term “object” indicates not only inanimate objects but also living things such as humans, animals, and plants.
  • the image sensor modules 11 and 12 are used for binocular stereo viewing.
  • the image sensor module 11 captures a master camera image
  • the image sensor module 12 captures a slave camera image.
  • the master camera image and the slave camera image may be a color image such as an RGB image, a YUV image, or a monochrome image.
  • the camera module 10 may include one or more other image sensor modules such as a telephoto camera or a wide-angle camera.
  • the range sensor module 20 includes a lens 20a, a range sensor 20b, a range sensor driver 20c and a projector 20d, as shown in FIG. 1.
  • the projector 20d emits pulsed light toward a subject and the range sensor 20b detects reflection light from the subject through the lens 20a.
  • the range sensor module 20 acquires a time of flight depth value (i.e., ToF depth value) based on the time difference between emitting the pulsed light and receiving the reflection light.
  • the ToF depth value indicates an actual distance between the electronic device 100 and the subject.
  • a depth map can be obtained based on the ToF depth values.
  • the image signal processor 30 controls the image sensor modules 11 and 12 to capture an image, and also controls the range sensor module 20 to capture a depth map.
  • the image signal processor 30 may perform image processing based on the image acquired from the camera module 10 and the depth map acquired from the range sensor module 20.
  • the electronic device 100 further includes a global navigation satellite system (GNSS) module 41, a wireless communication module 42, a CODEC 43, a speaker 44, a microphone 45, a display module 46, an input module 47, an inertial measurement unit (IMU) 48, a processor 50, a Graphics Processing Unit (GPU) 51, a Neural network Processing Unit (NPU) 52 and a memory 60.
  • the GNSS module 41 measures a current position of the electronic device 100.
  • the wireless communication module 42 performs wireless communications with the Internet.
  • the CODEC 43 bi-directionally performs encoding and decoding, using a predetermined encoding/decoding method.
  • the speaker 44 outputs a sound in accordance with sound data decoded by the CODEC 43.
  • the microphone 45 inputs sound and outputs sound data to the CODEC 43 based on inputted sound.
  • the display module 46 displays various information such as a preview image captured by the camera module 10 in real time so that a user can check it before shooting.
  • the display module 46 displays a suggestion to guide the camera module 10 to a favorable shooting position and posture where an image with good composition could be captured.
  • the input module 47 inputs information via a user’s operation. For example, the input module 47 inputs the user’s instruction to shoot an image and store it in the memory 60.
  • the IMU 48 detects an angular velocity and an acceleration of the electronic device 100.
  • a posture of the electronic device 100 (i.e., the camera module 10) can be grasped by a measurement result of the IMU 48.
  • the processor 50 controls the GNSS module 41, the wireless communication module 42, the CODEC 43, the speaker 44, the microphone 45, the display module 46, the input module 47, and the IMU 48.
  • the GPU 51 and the NPU 52 may be used to create a 3D model and a 2D image described below.
  • the GPU 51 and/or the NPU 52 infers a name/label of an object detected in a captured image by using a trained model created by deep learning.
  • the processing units 51, 52 may also be used to obtain a score of the 2D image by using a trained model created by deep learning.
  • the memory 60 stores data of images captured by the camera module 10, and data of depth map images captured by the range sensor module 20.
  • the memory 60 stores trained models created by the deep learning in advance.
  • the trained model is configured to infer a name/label of an object detected in the image so that a label of the object can be obtained.
  • the memory 60 stores a program which runs on the processing units 30, 50, 51 and 52.
  • the program may include a depth measurement program for acquiring a depth map, a 2D rendering program for creating a 2D image, a scoring program for calculating a score of the 2D image, and a suggestion drawing program for drawing, on a preview image, a suggestion to guide the camera module 10 to a favorable shooting position and posture.
  • the memory 60 stores data of background images obtained based on the captured images.
  • a background image is created by deleting object (s) from the image captured by the camera module 10 and interpolating an occlusion area where the object (s) are removed.
  • the memory 60 stores a 3D model database and a camera posture candidate list.
  • the 3D model database is a database which stores a 3D model of the objects detected in the image. As described later, the 3D model is created by combining a label of the object and a shape information of the object. The shape information indicates a rough shape of the object such as ellipsoid, sphere, cuboid or cube.
  • the 3D model may be updated each time an image and a depth map are captured.
  • the camera posture candidate list is used to create/render a plurality of 2D images of object (s) when the object (s) are viewed from a plurality of shooting positions and postures of the camera module 10.
  • the camera posture candidate list stores information on a plurality of postures of the camera module 10. As can be understood from the example below, a plurality of positions and postures of the camera module 10 are determined based on the camera posture candidate list and a position of an object.
  • FIG. 2 is an example of the camera posture candidate list.
  • the angles θ indicate postures of the camera module 10 of the electronic device 100 with respect to the main object in a vertical plane.
  • the angles φ are -70, -60, -50, -40, -30, -20, -10, 0, +10, +20, +30, +40, +50, +60 and +70
  • the angles θ are -30, -25, -20, -15, -10, -5, 0, +5, +10, +15, +20, +25 and +30.
  • the camera posture candidate list defines a plurality of postures of the camera module 10 when the optical axis a of the camera module 10 passes through the main object (i.e., the human H) and the camera module 10 is located on a surface of a virtual sphere whose center G is located on the object and whose radius r is substantially equal to a distance between the camera module 10 and the object.
  • the optical axis a is orthogonal to the surface of the sphere.
  • the center G of the sphere may be located at a center of the object.
  • the center of the object is a center of gravity of the object, for example.
  • the camera posture candidate list is not limited to the above example as long as it defines postures of the camera module 10.
  • the configuration of the electronic device 100 has been described. It should be understood that the configuration of the electronic device 100 is not limited to the above.
  • one of the image sensor modules 11 and 12 can be omitted.
  • a depth map can be obtained by a SfM (Structure from Motion) method or a “moving stereo processing” technique in which a single image sensor module moves around an object and captures images of the object from different angles.
  • the range sensor module 20 can be omitted. Even without the range sensor module 20, a depth map can be obtained by a method of “stereo processing” or “stereo match technique” .
  • the stereo processing method uses a plurality of images captured by a plurality of image sensor modules. Specifically, an amount of parallax is calculated for each corresponding pixel of the master camera image and the slave camera image.
  • the Visual SLAM (Simultaneous Localization and Mapping) technique can also be used to obtain a depth map. These methods may be used in combination.
  • processors 50, 51 and 52 are also just one example. At least any one of them can be omitted, or one or more processors can be added according to the performance required for the electronic device 100.
  • the ISP 30 includes an acquiring unit 31, a determining unit 32 and an outputting unit 33.
  • the acquiring unit 31 is configured to acquire an image captured by the camera module 10, a depth map captured by the range sensor module 20, and a current posture of the camera module 10 grasped by the IMU 48.
  • the image may be a color image.
  • the image is a master camera image captured by the image sensor module 11 or a slave camera image captured by the image sensor module 12.
  • the depth map corresponds to the image.
  • the current posture is a posture of the camera module 10 when the image was captured.
  • the current posture may be acquired by the processor 50.
  • the determining unit 32 is configured to determine a favorable shooting position and posture of the camera module 10 based on the image, the depth map and the current posture acquired by the acquiring unit 31. A method of determining the favorable shooting position and posture will be described later.
  • the “posture” indicates an angle or an orientation of the camera module 10 relative to a main object.
  • the outputting unit 33 is configured to output a suggestion to guide the camera module 10 to the favorable shooting position and posture determined by the determining unit 32.
  • the suggestion may be superimposed on a preview image.
  • At least one of the acquiring unit 31, the determining unit 32 and the outputting unit 33 may be performed by a processor other than the ISP 30 (i.e., the processor 50, the GPU 51 or the NPU 52) .
  • Each of the acquiring unit 31, the determining unit 32 and the outputting unit 33 may be performed by a different processor.
  • FIG. 7 is a flowchart which shows the overall outline of the method.
  • an image captured by the camera module 10, a depth map corresponding to the image, and a current posture of the camera module 10 when the image is captured, are acquired.
  • the image may be captured with either the image sensor module 11 or the image sensor module 12.
  • FIG. 8 illustrates an example of the image (color image) and the depth map corresponding to the image.
  • the image and the depth map contain the human H and the tree T as objects, and contain the mountain B as a background.
  • the depth map may be a greyscale image in which the brightness of an area decreases as the distance from the electronic device 100 increases.
  • in the step S2, a favorable shooting position and posture of the camera module 10 is determined based on the image, the depth map and the current posture acquired in the step S1.
  • the details of the step S2 will be described with reference to the flowchart of FIGs. 9 and 10.
  • step S21 a 3D model of an object in the image is created based on the image and the depth map.
  • the step S21 includes the steps S211, S212 and S213 as shown in FIG. 10.
  • a label of the object in the image is obtained.
  • the GPU 51 or the NPU 52 may infer to obtain the label by using a trained model stored in the memory 60.
  • the trained model may be created by a deep learning and be stored in the memory 60 in advance.
  • two objects, i.e., the human H and the tree T are detected based on the image and the depth map.
  • the labels of the human H and the tree T are obtained by performing object recognition using the trained model.
  • the objects (i.e., the human H and the tree T) are extracted and each of the labels (i.e., “HUMAN” and “TREE”) is attached in association with the corresponding object.
  • shape information of the object is obtained.
  • the shape information indicates a rough 3D shape (such as ellipsoid, sphere, cuboid or cube) of the object in the image.
  • the shape information is obtained based on the depth map which includes surface coordinates of the object.
  • the surface coordinates can be acquired from the positions of a surface of the object.
  • a shape of the object is estimated based on the surface coordinates.
  • the shape of the object may be estimated based on a bounding box around the object.
  • shape information of the two objects (i.e., the human H and the tree T) is obtained.
  • the shapes of the human H and the tree T are obtained by performing shape recognition based on the depth map.
  • the shape information SH indicates a rough 3D shape of the human H
  • the shape information ST indicates a rough 3D shape of the tree T.
  • the shapes of the human H and the tree T are approximated as ellipsoids, respectively.
  • the 3D model is created by combining the label and the shape information.
  • FIG. 13 shows the 3D model created by combining the label obtained in the step S211 and the shape information obtained in the step S212.
  • the 3D model contains a model MH and a model MT.
  • the model MH is the shape information SH associated with the label “HUMAN” .
  • the model MT is the shape information ST associated with the label “TREE” .
  • the 3D model database stored in the memory 60 is updated by the 3D model created in the step S213.
  • the 3D model in the 3D model database is updated by adding a newly created 3D model to the previous 3D model.
  • two objects, that is, one object in the newly created 3D model and another object in the previous 3D model, may be merged if the two objects have the same label and are close in location to each other.
  • a background image may be created.
  • the background image is created by deleting an object from the image captured by the camera module 10.
  • an area where the object is removed (i.e., an occlusion area) is interpolated.
  • the interpolation processing is performed by using the previous background images stored in the memory 60.
  • the interpolation processing may be performed by using an image captured by an image sensor module such as a wide-angle camera provided with the camera module 10.
  • the interpolation processing may be performed based on the pixels surrounding the area by using a filter such as a smoothing filter, or the interpolation processing may be performed by machine learning such as deep learning.
  • the background image may be converted into a spherical image so that a background image corresponding to a new position and posture of the camera module 10 can be obtained.
  • the background image may be created in another step such as the step S223.
  • the steps after the step S21 will be described below.
  • in the step S22, a plurality of 2D images are created/rendered based on the 3D model created in the step S21.
  • the plurality of 2D images are images when the object is viewed from a plurality of shooting positions and postures.
  • the plurality of shooting positions and postures are defined by the current posture of the camera module 10 and the camera posture candidate list. The details of the step S22 will be described with reference to the flowchart of FIG. 15.
  • step S221 it is determined whether all postures in the camera posture candidate list are used or not. If NO, proceed to the step S222. Otherwise, end the process flow in FIG. 15.
  • a posture of the camera module 10 is selected from the camera posture candidate list.
  • a 2D image when the object is viewed from a shooting position and posture determined based on the selected posture is created.
  • the 2D image is created based on the 3D model created in the step S21.
  • an object image when the object is viewed from a position and posture of the camera module 10 is created.
  • the position and posture is defined by the current posture acquired in the step S1 and the posture selected in the step S222.
  • the camera posture candidate list is a list shown in FIG. 2 and the situation is shown in FIGs. 3 and 4
  • a virtual sphere, whose center is a center of the human H and whose radius is equal to the distance between the camera module 10 (the electronic device 100) and the human H, is considered.
  • the angles φ and θ are set to be 0 at a current position and posture of the camera module 10, and then a plurality of shooting positions and postures to create the 2D images are defined.
  • an object image when the object is viewed from the selected posture is created.
  • the object image is created based on the 3D model database.
  • the object is transformed according to the viewpoint corresponding to the selected posture based on the 3D model database.
  • the object image is superimposed on the background image.
  • the background image may be transformed according to the viewpoint before the superposition.
  • the step of creating the 2D images may be preferably performed by the GPU 51.
  • FIG. 16 shows a situation where the object (the human H) is shot from three different shooting positions and postures.
  • the shooting positions and postures Pa, Pb and Pc are defined by the current posture of the camera module 10 and the camera posture candidate list.
  • the shooting positions and postures Pa, Pb and Pc correspond to the shooting positions and postures shown in FIG. 4.
  • FIG. 17A shows the created 2D image when the main object (the human H) is shot from the shooting position and posture Pa.
  • FIG. 17B shows the created 2D image when the main object is shot from the shooting position and posture Pb.
  • FIG. 17C shows the created 2D image when the main object is shot from the shooting position and posture Pc.
  • the main object and the sub-object are shown as the model MH and the model MT, respectively.
  • the 3D model contains the simplified object models, i.e., the models MH and MT, in each of which the shape information indicating a rough 3D shape of the object is associated with the label.
  • a score for each of the plurality of 2D images is calculated.
  • the score shows how good a composition of the 2D image is.
  • the score shows a level of the composition of the 2D image.
  • the GPU 51 or the NPU 52 may calculate or infer the score by using a trained model created by the deep learning. That is to say, aesthetic evaluation is performed by machine learning.
  • the learning data for creating the trained model are 2D images in which the rough shape of an object is projected onto a background image and a label of the object is attached to the projected shape.
  • the objects are simplified to be shown in the 2D images (i.e., the model MH and the model MT) .
  • the score for each of the 2D images can be calculated appropriately because the labels are given to the corresponding objects.
  • a shooting position and posture where the 2D image with the highest score can be captured is determined to be the favorable shooting position and posture (a minimal sketch of this render-and-score loop is given after this list).
  • the position and posture for capturing the 2D image of FIG. 17B is determined as the favorable position and posture if the score of the 2D image of FIG. 17B is the highest among the 2D images of FIGS. 17A, 17B and 17C.
  • the step after the step S2 will be described below.
  • a suggestion to guide the camera module 10 to the favorable shooting position and posture is outputted.
  • the suggestion is created based on the favorable shooting position and posture determined in the step S2.
  • the display module 46 displays guide information based on the suggestion.
  • the speaker 44 may emit a guide voice based on the suggestion.
  • the suggestion may be superimposed on the image captured by the camera module 10, as described later with some examples.
  • the user of the electronic device 100 can grasp the suggestion while looking at a preview image displayed on the display module 46.
  • the suggestion may be a pointer to show a shooting position defined by the favorable shooting position and posture.
  • FIG. 18A shows a preview image when the human H is viewed from the shooting position and posture Pa.
  • the pointer P is superimposed on the preview image.
  • the user can easily know a position where he/she should go to capture an image with good composition.
  • the pointer P may be animated for the user’s ease of viewing.
  • the suggestion may be a guide arrow or pictogram to guide the camera module 10 in a shooting posture defined by the favorable shooting position and posture.
  • FIG. 18B shows a preview image when the human H is viewed from the shooting position and posture Pb.
  • the guide arrow GA is superimposed on the preview image.
  • the guide arrow GA indicates that the user should rotate the electronic device 100 (the camera module 10) around a horizontal axis. That is to say, the arrow GA suggests that the user change the pitch of the electronic device 100. Instead, the guide arrow may indicate a change of yaw or roll of the electronic device 100.
  • the guide arrow GA may be animated so that the user can easily grasp how the camera should be rotated.
  • the suggestion may be a message indicating what the user should do.
  • FIG. 18C shows a preview image when the human H is viewed from the shooting position and posture Pc.
  • the message MSG is superimposed on the preview image.
  • the message MSG shows “READY” to indicate that the user should shoot an image now.
  • the suggestion may be a target image.
  • FIG. 19 is a preview image on which the target image TI is superimposed.
  • the target image is a 2D image obtained when the favorable shooting position and posture is determined. That is to say, the target image is the 2D image with the highest score which is created in the step S22.
  • the steps S1 to S3 described above are executed each time an image is captured. In the case of shooting a video, the steps S1 to S3 are executed at a speed corresponding to the frame rate. Alternatively, the steps S1 to S3 may be executed at a predetermined interval. For example, the steps S1 to S3 are executed each time a predetermined number of images are captured.
  • the user can know a favorable shooting position and posture even if the camera module 10 has not been pointed at a main object from the favorable shooting position and posture because the plurality of 2D images are created based on the 3D model created in advance.
  • first and second are used herein for purposes of description and are not intended to indicate or imply relative importance or significance or to imply the number of indicated technical features.
  • a feature defined as “first” and “second” may comprise one or more of this feature.
  • “a plurality of” means “two or more than two”, unless otherwise specified.
  • the terms “mounted” , “connected” , “coupled” and the like are used broadly, and may be, for example, fixed connections, detachable connections, or integral connections; may also be mechanical or electrical connections; may also be direct connections or indirect connections via intervening structures; may also be inner communications of two elements which can be understood by those skilled in the art according to specific situations.
  • a structure in which a first feature is "on" or “below” a second feature may include an embodiment in which the first feature is in direct contact with the second feature, and may also include an embodiment in which the first feature and the second feature are not in direct contact with each other, but are in contact via an additional feature formed therebetween.
  • a first feature "on” , “above” or “on top of” a second feature may include an embodiment in which the first feature is orthogonally or obliquely “on” , “above” or “on top of” the second feature, or just means that the first feature is at a height higher than that of the second feature; while a first feature “below” , “under” or “on bottom of” a second feature may include an embodiment in which the first feature is orthogonally or obliquely “below” , "under” or “on bottom of” the second feature, or just means that the first feature is at a height lower than that of the second feature.
  • Any process or method described in a flow chart or described herein in other ways may be understood to include one or more modules, segments or portions of codes of executable instructions for achieving specific logical functions or steps in the process, and the scope of a preferred embodiment of the present disclosure includes other implementations, in which it should be understood by those skilled in the art that functions may be implemented in a sequence other than the sequences shown or discussed, including in a substantially identical sequence or in an opposite sequence.
  • the logic and/or step described in other manners herein or shown in the flow chart may be specifically achieved in any computer readable medium to be used by the instructions execution system, device or equipment (such as a system based on computers, a system comprising processors or other systems capable of obtaining instructions from the instructions execution system, device and equipment executing the instructions) , or to be used in combination with the instructions execution system, device and equipment.
  • the computer readable medium may be any device adaptive for including, storing, communicating, propagating or transferring programs to be used by or in combination with the instruction execution system, device or equipment.
  • examples of the computer readable medium include, but are not limited to: an electronic connection (an electronic device) with one or more wires, a portable computer enclosure (a magnetic device), a random access memory (RAM), a read only memory (ROM), an erasable programmable read-only memory (EPROM or a flash memory), an optical fiber device and a portable compact disk read-only memory (CDROM).
  • the computer readable medium may even be a paper or other appropriate medium capable of printing programs thereon, this is because, for example, the paper or other appropriate medium may be optically scanned and then edited, decrypted or processed with other appropriate methods when necessary to obtain the programs in an electric manner, and then the programs may be stored in the computer memories.
  • each part of the present disclosure may be realized by the hardware, software, firmware or their combination.
  • a plurality of steps or methods may be realized by the software or firmware stored in the memory and executed by the appropriate instructions execution system.
  • the steps or methods may be realized by one or a combination of the following techniques known in the art: a discrete logic circuit having a logic gate circuit for realizing a logic function of a data signal, an application-specific integrated circuit having an appropriate combination logic gate circuit, a programmable gate array (PGA) , a field programmable gate array (FPGA) , etc.
  • each function cell of the embodiments of the present disclosure may be integrated in a processing module, or these cells may be separate physical existence, or two or more cells are integrated in a processing module.
  • the integrated module may be realized in a form of hardware or in a form of software function modules. When the integrated module is realized in a form of software function module and is sold or used as a standalone product, the integrated module may be stored in a computer readable storage medium.
  • the storage medium mentioned above may be read-only memories, magnetic disks, CD, etc.
  • the storage medium may be transitory or non-transitory.
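To tie together the steps S22 and S23 summarized above (render a simplified 2D image for each candidate posture, score its composition, and keep the posture with the highest score), here is a minimal, hedged sketch. The render_2d and score_composition functions are placeholders for the device's GPU rendering and the NPU's trained scoring model; their interfaces and the toy scoring rule are assumptions, not the disclosed implementation.

```python
# Hedged sketch of the render-and-score loop of the steps S22 and S23. render_2d and
# score_composition stand in for the device's GPU rendering and the NPU's trained
# scoring model; their interfaces and the toy scoring rule are assumptions.
from typing import List, Tuple

def render_2d(model_db, background, phi_deg: float, theta_deg: float):
    """Placeholder: project the simplified object models onto the background as
    seen from the candidate posture (phi, theta); returns a dummy descriptor."""
    return {"phi": phi_deg, "theta": theta_deg, "objects": [m.label for m in model_db]}

def score_composition(image_2d) -> float:
    """Placeholder for the trained aesthetic-scoring model; this toy version
    simply prefers a slightly elevated three-quarter view."""
    return -abs(image_2d["phi"] - 30.0) - abs(image_2d["theta"] - 10.0)

def best_shooting_posture(model_db, background, candidates: List[Tuple[float, float]]):
    """Render a 2D image for every candidate posture, score it, keep the best."""
    scored = [(score_composition(render_2d(model_db, background, p, t)), (p, t))
              for p, t in candidates]
    return max(scored)  # (highest score, (phi, theta)) -> the favorable posture

# Candidate postures similar in spirit to the list of FIG. 2.
candidates = [(p, t) for p in range(-70, 71, 10) for t in range(-30, 31, 5)]
print(best_shooting_posture(model_db=[], background=None, candidates=candidates))
```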

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Electromagnetism (AREA)
  • Studio Devices (AREA)

Abstract

Disclosed is a method of suggesting a shooting position and posture for an electronic device having a camera. The method includes acquiring an image captured by the camera, a depth map corresponding to the image, and a current posture of the camera when the image is captured, determining a favorable shooting position and posture of the camera based on the image, the depth map and the current posture, and outputting a suggestion to guide the camera to the favorable shooting position and posture.

Description

METHOD OF SUGGESTING SHOOTING POSITION AND POSTURE FOR ELECTRONIC DEVICE HAVING CAMERA, ELECTRONIC DEVICE AND COMPUTER-READABLE STORAGE MEDIUM TECHNICAL FIELD
The present disclosure relates to a method of suggesting a shooting position and posture for an electronic device having a camera, an electronic device performing the method, and a computer-readable storage medium storing a program to implement the method.
BACKGROUND
In recent years, images captured by an electronic device equipped with a camera, such as a smartphone, are often uploaded to Social Networking Services (SNS) and the like. When shooting an object/subject with a smartphone, it is preferable for a user of the electronic device to shoot an image with good composition.
It is difficult for a user who is unfamiliar with capturing images to determine the appropriate shooting position and posture for shooting an image with good composition. The electronic device could process an image captured by the user so as to change it into an image with good composition. However, changing the composition of the captured image into a good composition is not easy. Even if the electronic device converts the image, the image quality may deteriorate due to deformation, enlargement or reduction.
SUMMARY
The present disclosure aims to solve at least one of the technical problems mentioned above. Accordingly, the present disclosure needs to provide a method of suggesting a shooting position and posture for an electronic device having a camera, an electronic device implementing such a method and a computer-readable storage medium storing a program to implement the method.
In accordance with the present disclosure, a method of suggesting a shooting position and posture for an electronic device having a camera is provided. The method includes:
acquiring an image captured by the camera, a depth map corresponding to the image, and a current posture of the camera when the image is captured;
determining a favorable shooting position and posture of the camera based on the image, the depth map and the current posture; and
outputting a suggestion to guide the camera to the favorable shooting position and posture.
In accordance with the present disclosure, an electronic device having a camera is provided. The electronic device includes:
an acquiring unit configured to acquire an image captured by the camera, a depth map corresponding to the image, and a current posture of the camera when the image is captured;
a determining unit configured to determine a favorable shooting position and posture of the camera based on the image, the depth map and the current posture; and
an outputting unit configured to output a suggestion to guide the camera to the favorable shooting position and posture.
In accordance with the present disclosure, a computer-readable storage medium, on which a computer program is stored, is provided. The computer program is executed by a computer to implement the method according to the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
These and/or other aspects and advantages of embodiments of the present disclosure will become apparent and more readily appreciated from the following descriptions made with reference to the drawings, in which:
FIG. 1 is a functional block diagram illustrating a configuration of an electronic device according to an embodiment of the present disclosure;
FIG. 2 is an example of a camera posture candidate list stored in a memory of the electronic device according to an embodiment of the present disclosure;
FIG. 3 shows a situation where the electronic device is capturing an image which includes objects and a background;
FIG. 4 is a top view corresponding to FIG. 3 which shows the camera postures specified in the camera posture candidate list;
FIG. 5 is a side view corresponding to FIG. 3 which shows the camera postures specified in the camera posture candidate list;
FIG. 6 is a functional block diagram of a processor of the electronic device according to an embodiment of the present disclosure;
FIG. 7 is a flowchart illustrating a method of suggesting a favorable shooting position and posture for the electronic device according to an embodiment of the present disclosure;
FIG. 8 is an example of an image (top) and an example of a depth map corresponding to the image (bottom) ;
FIG. 9 is a flowchart illustrating a method of determining the favorable shooting position and posture according to an embodiment of the present disclosure;
FIG. 10 is a flowchart illustrating a method of creating a 3D model according to an embodiment of the present disclosure;
FIG. 11 shows the original image captured by the electronic device (top) , and the objects extracted from the image and the labels corresponding to the objects (bottom) ;
FIG. 12 shows the original depth map captured by the electronic device (top) and shape information of the extracted objects (bottom) ;
FIG. 13 shows a 3D model created by combining the label and the shape information;
FIG. 14 shows the original image captured by the electronic device (top) and a background image obtained by deleting the objects and interpolating an occlusion area (bottom) ;
FIG. 15 is a flowchart illustrating a method of creating a plurality of 2D images according to an embodiment of the present disclosure;
FIG. 16 shows a situation where the objects are shot from three different shooting positions and postures;
FIG. 17A is an example of a created 2D image when the objects are shot from a first shooting position and posture;
FIG. 17B is an example of a created 2D image when the objects are shot from a second shooting position and posture;
FIG. 17C is an example of a created 2D image when the objects are shot from a third shooting position and posture;
FIG. 18A is an example of a preview image when a human (main object) is viewed from a first shooting position and posture;
FIG. 18B is an example of a preview image when the human is viewed from a second shooting position and posture;
FIG. 18C is an example of a preview image when the human is viewed from a third shooting position and posture; and
FIG. 19 is an example of an image on which a target image is superimposed on the preview image.
DETAILED DESCRIPTION
Embodiments of the present disclosure will be described in detail and examples of the embodiments will be illustrated in the accompanying drawings. The same or similar elements and elements having same or similar functions are denoted by like reference numerals throughout the descriptions. The embodiments described herein with reference to the drawings are explanatory, which aim to illustrate the present disclosure, but shall not be construed to limit the present disclosure.
<Electronic device 100>
An electronic device 100 will be described with reference to FIG. 1. FIG. 1 is a functional block diagram illustrating an example of a configuration of the electronic device 100 according to an embodiment of the present disclosure.
The electronic device 100 is a mobile device such as a smartphone in this embodiment, but it may be other types of electronic device equipped with one or more camera modules.
As shown in FIG. 1, the electronic device 100 includes a camera module 10, a range sensor module 20, and an image signal processor 30 that controls the camera module 10 and the range sensor module 20.
The camera module 10 includes two image sensor modules 11 and 12, as shown in FIG. 1. The image sensor module 11 includes a first lens 11a that is capable of focusing on an object, a first image sensor 11b that detects an image inputted via the first lens 11a, and a first image sensor driver 11c that drives the first image sensor 11b. The image sensor module 12 includes a second lens 12a that is capable of focusing on an object, a second image sensor 12b that detects an image inputted via the second lens 12a, and a second image sensor driver 12c that drives the second image sensor 12b. In the present disclosure, the term “object” indicates not only inanimate objects but also living things such as humans, animals, and plants.
The  image sensor modules  11 and 12 are used for binocular stereo viewing. The image sensor module 11 captures a master camera image, the image sensor module 12 captures a slave camera image. The master camera image and the slave camera image may be a color image such as an RGB image, a YUV image, or a monochrome image.
The camera module 10 may include one or more other image sensor modules such as a telephoto camera or a wide-angle camera.
The range sensor module 20 includes a lens 20a, a range sensor 20b, a range sensor driver 20c and a projector 20d, as shown in FIG. 1. The projector 20d emits pulsed light toward a subject and the range sensor 20b detects reflection light from the subject through the lens 20a.
The range sensor module 20 acquires a time of flight depth value (i.e., ToF depth value) based on the time difference between emitting the pulsed light and receiving the reflection light. The ToF depth value indicates an actual distance between the electronic device 100 and the subject. A depth map can be obtained based on the ToF depth values.
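As a rough numerical illustration of this relationship (the distance is half the round-trip path travelled by the pulsed light), the following minimal sketch converts per-pixel time differences into a depth map; the function and variable names are illustrative and not part of the disclosure.

```python
# Minimal sketch: converting per-pixel ToF round-trip times into a depth map.
# Assumes a 2D array of time differences (seconds) between emitting the pulsed
# light and receiving the reflection; function and variable names are illustrative.
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def tof_depth_map(time_differences):
    """Per-pixel distance in metres: depth = c * dt / 2 (the light travels the
    camera-to-subject distance twice)."""
    return [[SPEED_OF_LIGHT * dt / 2.0 for dt in row] for row in time_differences]

# Example: a round-trip time of about 13.3 ns corresponds to roughly 2 m.
print(tof_depth_map([[13.3e-9, 13.3e-9], [13.3e-9, 13.3e-9]]))
```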
The image signal processor 30 controls the  image sensor modules  11 and 12 to capture an image, and also controls the range sensor module 20 to capture a depth map. The image signal processor 30 may perform image processing based on the image acquired from the camera module 10 and the depth map acquired from the range sensor module 20.
As shown in FIG. 1, the electronic device 100 further includes a global navigation satellite system (GNSS) module 41, a wireless communication module 42, a CODEC 43, a speaker 44, a microphone 45, a display module 46, an input module 47, an inertial measurement unit (IMU) 48, a processor 50, a Graphics Processing Unit (GPU) 51, a Neural network Processing Unit (NPU) 52 and a memory 60.
The GNSS module 41 measures a current position of the electronic device 100. The wireless communication module 42 performs wireless communications with the Internet. The CODEC 43 bi-directionally performs encoding and decoding, using a predetermined encoding/decoding method. The speaker 44 outputs a sound in accordance with sound data decoded by the CODEC 43. The microphone 45 inputs sound and outputs sound data to the CODEC 43 based on inputted sound.
The display module 46 displays various information such as a preview image captured by the camera module 10 in real time so that a user can check it before shooting. In the present disclosure, the display module 46 displays a suggestion to guide the camera module 10 to a favorable shooting position and posture where an image with good composition could be captured.
The input module 47 inputs information via a user’s operation. For example, the input module 47 inputs the user’s instruction to shoot an image and store it in the memory 60.
The IMU 48 detects an angular velocity and an acceleration of the electronic device 100. A posture of the electronic device 100 (i.e., the camera module 10) can be grasped by a measurement result of the IMU 48.
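One common way to derive such a posture from angular velocity and acceleration is a complementary filter that fuses the gyro-integrated angle with the gravity direction sensed by the accelerometer. The sketch below is a generic illustration of that idea, not the specific estimation method of the disclosure.

```python
import math

def complementary_filter(pitch_prev, gyro_rate_pitch, accel, dt, alpha=0.98):
    """Illustrative pitch estimate (radians): blend the gyro-integrated angle with
    the pitch implied by the accelerometer's gravity vector accel = (ax, ay, az)."""
    ax, ay, az = accel
    pitch_accel = math.atan2(-ax, math.sqrt(ay * ay + az * az))
    pitch_gyro = pitch_prev + gyro_rate_pitch * dt
    return alpha * pitch_gyro + (1.0 - alpha) * pitch_accel

# Example: a nearly level device slowly pitching up at 0.1 rad/s for one second.
pitch = 0.0
for _ in range(100):
    pitch = complementary_filter(pitch, 0.1, (0.0, 0.0, 9.81), dt=0.01)
print(pitch)
```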
The processor 50 controls the GNSS module 41, the wireless communication module 42, the CODEC 43, the speaker 44, the microphone 45, the display module 46, the input module 47, and the IMU 48.
The GPU 51 and the NPU 52 may be used to create a 3D model and a 2D image described below. The GPU 51 and/or the NPU 52 infers a name/label of an object detected in a captured image by using a trained model created by deep learning. The  processing units  51, 52 may also be used to obtain a score of the 2D image by using a trained model created by deep learning.
The memory 60 stores data of images captured by the camera module 10, and data of depth map images captured by the range sensor module 20. The memory 60 stores trained models created by the deep learning in advance. For example, the trained model is configured to infer a name/label of an object detected in the image so that a label of the object can be obtained.
The memory 60 stores a program which runs on the  processing units  30, 50, 51 and 52. The program may include a depth measurement program for acquiring a depth map, a 2D rendering program for creating a 2D image, a scoring program for calculating a score of the 2D image, and a suggestion drawing program for drawing, on a preview image, a suggestion to guide the camera module 10 to a favorable shooting position and posture.
The memory 60 stores data of background images obtained based on the captured images. A background image is created by deleting object (s) from the image captured by the camera module 10 and interpolating an occlusion area where the object (s) are removed.
The memory 60 stores a 3D model database and a camera posture candidate list. The 3D model database is a database which stores a 3D model of the objects detected in the image. As described later, the 3D model is created by combining a label of the object and a shape information of the object. The shape information indicates a rough shape of the object such as ellipsoid, sphere, cuboid or cube. The 3D model may be updated each time an image and a depth map are captured.
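A minimal sketch of how one entry of such a 3D model database (a label combined with a rough shape) might be represented is given below; the field names and types are assumptions for illustration only, not the disclosure's data format.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class ObjectModel:
    """One entry of the 3D model database: a label combined with a rough shape."""
    label: str                                 # e.g. "HUMAN", "TREE"
    shape: str                                 # e.g. "ellipsoid", "sphere", "cuboid", "cube"
    center: Tuple[float, float, float]         # object centre in world coordinates (m)
    half_extents: Tuple[float, float, float]   # rough size along each axis (m)

# Example database for the scene of FIG. 3: a human and a tree, both ellipsoids.
model_db = [
    ObjectModel("HUMAN", "ellipsoid", (0.0, 0.0, 3.0), (0.3, 0.3, 0.9)),
    ObjectModel("TREE", "ellipsoid", (1.5, 0.0, 5.0), (1.0, 1.0, 2.0)),
]
print(model_db)
```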
The camera posture candidate list is used to create/render a plurality of 2D images of object (s) when the object (s) are viewed from a plurality of shooting positions and postures of the camera module 10. The camera posture candidate list stores information on a plurality of postures of the camera module 10. As can be understood from the example below, a plurality of positions and postures of the camera module 10 are determined based on the camera posture candidate list and a position of an object.
FIG. 2 is an example of the camera posture candidate list. The camera posture candidate list stores angles φ at θ = 0° and angles θ at φ = 0°. The angles φ indicate postures of the camera module 10 of the electronic device 100 with respect to a main object in a horizontal plane. The angles θ indicate postures of the camera module 10 of the electronic device 100 with respect to the main object in a vertical plane. In the example, the angles φ are -70, -60, -50, -40, -30, -20, -10, 0, +10, +20, +30, +40, +50, +60 and +70, and the angles θ are -30, -25, -20, -15, -10, -5, 0, +5, +10, +15, +20, +25 and +30.
FIG. 3 shows a situation where the electronic device 100 is capturing an image which includes objects (i.e., a human H and a tree T) and a background (i.e., a mountain B). In FIG. 3, the sign “a” indicates an optical axis of the camera module 10. FIG. 4 is a top view corresponding to FIG. 3. FIG. 4 shows the camera postures defined by the angles φ stored in the camera posture candidate list. FIG. 5 is a side view corresponding to FIG. 3. FIG. 5 shows the camera postures defined by the angles θ stored in the camera posture candidate list.
As shown in FIGs. 4 and 5, the camera posture candidate list defines a plurality of postures of the camera module 10 when the optical axis a of the camera module 10 passes through the  main object (i.e., the human H) and the camera module 10 is located on a surface of a virtual sphere whose center G is located on the object and whose radius r is substantially equal to a distance between the camera module 10 and the object. The optical axis a is orthogonal to the surface of the sphere. The center G of the sphere may be located at a center of the object. The center of the object is a center of gravity of the object, for example.
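For illustration of the geometry in FIGs. 4 and 5, the sketch below converts one (φ, θ) entry of such a candidate list into a camera position on the virtual sphere of radius r around the centre G, with the optical axis pointing at G. The particular spherical-coordinate convention and function names are assumptions for illustration.

```python
import math

def camera_position(center, radius, phi_deg, theta_deg):
    """Place the camera on a sphere of the given radius around `center`.
    phi is the horizontal angle and theta the vertical angle, both measured from
    the current viewing direction (phi = theta = 0); the convention is illustrative."""
    phi, theta = math.radians(phi_deg), math.radians(theta_deg)
    x = center[0] + radius * math.cos(theta) * math.sin(phi)
    y = center[1] + radius * math.sin(theta)
    z = center[2] - radius * math.cos(theta) * math.cos(phi)
    return (x, y, z)

def optical_axis(camera_pos, center):
    """Unit vector from the camera towards the object centre G (orthogonal to the sphere)."""
    d = [c - p for p, c in zip(camera_pos, center)]
    n = math.sqrt(sum(v * v for v in d))
    return tuple(v / n for v in d)

# Example: object centre 3 m straight ahead of the current camera position.
center, r = (0.0, 0.0, 3.0), 3.0
pos = camera_position(center, r, phi_deg=30, theta_deg=10)
print(pos, optical_axis(pos, center))
```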
It should be noted that the camera posture candidate list is not limited to the above example as long as it defines postures of the camera module 10.
The configuration of the electronic device 100 has been described. It should be understood that the configuration of the electronic device 100 is not limited to the above.
For example, one of the  image sensor modules  11 and 12 can be omitted. Even with only one image sensor, a depth map can be obtained by a SfM (Structure from Motion) method or a “moving stereo processing” technique in which a single image sensor module moves around an object and captures images of the object from different angles.
The range sensor module 20 can be omitted. Even without the range sensor module 20, a depth map can be obtained by a method of “stereo processing” or “stereo match technique” . The stereo processing method uses a plurality of images captured by a plurality of image sensor modules. Specifically, an amount of parallax is calculated for each corresponding pixel of the master camera image and the slave camera image. The Visual SLAM (Simultaneous Localization and Mapping) technique can also be used to obtain a depth map. These methods may be used in combination.
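As a hedged illustration of the stereo relationship (depth from the per-pixel parallax between the master and slave images), here is a minimal sketch assuming a rectified camera pair with known focal length and baseline; the parameter names are illustrative.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Classic rectified-stereo relation: depth = f * B / disparity, where the
    disparity is the per-pixel parallax between the master and slave images."""
    if disparity_px <= 0:
        return float("inf")  # no measurable parallax -> point effectively at infinity
    return focal_length_px * baseline_m / disparity_px

# Example: 20 px of parallax with f = 1000 px and a 1 cm baseline gives 0.5 m.
print(depth_from_disparity(20.0, 1000.0, 0.01))
```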
Regarding the processors, the use of a combination of the  processors  50, 51 and 52 is also just one example. At least any one of them can be omitted, or one or more processors can be added according to the performance required for the electronic device 100.
Next, the functions of the ISP 30 according to one embodiment are described in detail with reference to FIG. 6. The ISP 30 includes an acquiring unit 31, a determining unit 32 and an outputting unit 33.
The acquiring unit 31 is configured to acquire an image captured by the camera module 10, a depth map captured by the range sensor module 20, and a current posture of the camera module 10 grasped by the IMU 48. The image may be a color image. The image is a master camera image captured by the image sensor module 11 or a slave camera image captured by the image sensor module 12. The depth map corresponds to the image. The current posture is a posture of the camera module 10 when the image was captured. The current posture may be acquired by the processor 50.
The depth map is acquired by the range sensor module 20. As mentioned, the depth map can be acquired by other technique such as the SfM, the stereo processing or the Visual SLAM. The current posture of the camera module 10 is obtained by the IMU 48. Alternatively, the current posture can be obtained by the Visual SLAM technique.
The determining unit 32 is configured to determine a favorable shooting position and posture of the camera module 10 based on the image, the depth map and the current posture acquired by the acquiring unit 31. A method of determining the favorable shooting position and posture will be described later. The “posture” indicates an angle or an orientation of the camera module 10 relative to a main object.
The outputting unit 33 is configured to output a suggestion to guide the camera module 10 to the favorable shooting position and posture determined by the determining unit 32. The suggestion may be superimposed on a preview image.
It should be noted that at least one of the acquiring unit 31, the determining unit 32 and the outputting unit 33 may be performed by a processor other than the ISP 30 (i.e., the processor 50, the GPU 51 or the NPU 52) . Each of the acquiring unit 31, the determining unit 32 and the outputting unit 33 may be performed by a different processor.
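For illustration only, the cooperation of the three units could be organized as in the following sketch; the class, field and method names are placeholders introduced here and are not terms used in the present disclosure.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Frame:
    image: np.ndarray    # color image from the camera module
    depth: np.ndarray    # depth map corresponding to the image
    posture: np.ndarray  # current camera posture (e.g., orientation from the IMU)

class ShootingSuggestionPipeline:
    """Illustrative composition of the acquiring, determining and outputting units."""
    def __init__(self, acquiring_unit, determining_unit, outputting_unit):
        self.acquire = acquiring_unit      # returns a Frame
        self.determine = determining_unit  # Frame -> favorable (position, posture)
        self.output = outputting_unit      # overlays the suggestion on the preview

    def run_once(self) -> None:
        frame = self.acquire()
        target = self.determine(frame)
        self.output(frame.image, target)
```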
< Method of suggesting a shooting position and posture>
A method of suggesting a favorable shooting position and posture to a user of the electronic device 100 according to an embodiment of the present disclosure will be described. FIG. 7 is a flowchart which shows the overall outline of the method.
In the step S1, an image captured by the camera module 10, a depth map corresponding to the image, and a current posture of the camera module 10 when the image is captured, are acquired. The image may be captured with either the image sensor module 11 or the image sensor module 12.
FIG. 8 illustrates an example of the image (color image) and the depth map corresponding to the image. The image and the depth map contain the human H and the tree T as objects, and contain the mountain B as a background. As shown in FIG. 8, the depth map may be a greyscale image in which the brightness of an area decreases as the distance from the electronic device 100 increases.
In the step S2, a favorable shooting position and posture of the camera module 10 is determined based on the image, the depth map and the current posture acquired in the step S1. The details of the step S2 will be described with reference to the flowcharts of FIGs. 9 and 10.
In the step S21, a 3D model of an object in the image is created based on the image and the depth map. The step S21 includes the steps S211, S212 and S213 as shown in FIG. 10.
In the step S211, a label of the object in the image is obtained. Preferably, the GPU 51 or the NPU 52 performs inference to obtain the label by using a trained model stored in the memory 60. The trained model may be created by deep learning and stored in the memory 60 in advance. In the case of the image shown in FIG. 8, two objects, i.e., the human H and the tree T, are detected based on the image and the depth map. The labels of the human H and the tree T are obtained by performing object recognition using the trained model. As shown in FIG. 11, the objects (i.e., the human H and the tree T) are extracted and each of the labels (i.e., "HUMAN" and "TREE") is attached in association with the corresponding object.
In the step S212, shape information of the object is obtained. The shape information indicates a rough 3D shape (such as an ellipsoid, sphere, cuboid or cube) of the object in the image. The shape information is obtained based on the depth map, which includes surface coordinates of the object. The surface coordinates are acquired from the positions on the surface of the object, and a shape of the object is estimated based on the surface coordinates. Alternatively, the shape of the object may be estimated based on a bounding box around the object.
In the case of the image shown in FIG. 8, shape information of the two objects, i.e., the human H and the tree T, is obtained. Specifically, the shapes of the human H and the tree T are obtained by performing shape recognition based on the depth map. In FIG. 12, the shape information SH indicates a rough 3D shape of the human H and the shape information ST indicates a rough 3D shape of the tree T. The shapes of the human H and the tree T are each approximated as an ellipsoid.
In the step S213, the 3D model is created by combining the label and the shape information. FIG. 13 shows the 3D model created by combining the label obtained in the step S211 and the shape information obtained in the step S212. The 3D model contains a model MH and a model MT. The model MH is the shape information SH associated with the label "HUMAN". The model MT is the shape information ST associated with the label "TREE".
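For illustration only, the following sketch approximates an object by an ellipsoid from its masked depth pixels and attaches the recognized label to it; the back-projection intrinsics, the two-standard-deviation half-extent and all names are assumptions made for this example, not details specified in the present disclosure.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class SimpleObjectModel:
    label: str           # e.g. "HUMAN" or "TREE"
    center: np.ndarray   # ellipsoid center in camera coordinates (meters)
    semi_axes: np.ndarray  # rough ellipsoid semi-axes along x, y, z (meters)

def fit_rough_ellipsoid(depth: np.ndarray, mask: np.ndarray,
                        fx: float, fy: float, cx: float, cy: float,
                        label: str) -> SimpleObjectModel:
    """Back-project the masked depth pixels and approximate them by an ellipsoid."""
    v, u = np.nonzero(mask)
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=1)   # N x 3 surface coordinates
    center = points.mean(axis=0)
    # Two standard deviations per axis as a rough half-extent of the object.
    semi_axes = 2.0 * points.std(axis=0)
    return SimpleObjectModel(label=label, center=center, semi_axes=semi_axes)
```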
The 3D model database stored in the memory 60 is updated with the 3D model created in the step S213. In other words, the 3D model in the 3D model database is updated by adding the newly created 3D model to the previous 3D model. When updating the 3D model, two objects, that is, one object in the newly created 3D model and another object in the previous 3D model, may be merged if the two objects have the same label and are close in location to each other. By updating the 3D model in this way, the accuracy of the 3D model can be improved.
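For illustration only, the merge rule could look like the following sketch, which reuses the SimpleObjectModel class from the sketch above; the distance threshold and the simple averaging are assumptions, not values or operations specified in the present disclosure.

```python
import numpy as np

def merge_into_database(db: list, new: "SimpleObjectModel",
                        distance_threshold: float = 0.5) -> None:
    """Merge a newly created model into the database if a nearby model with the
    same label already exists; otherwise append it as a new object."""
    for old in db:
        if old.label == new.label and np.linalg.norm(old.center - new.center) < distance_threshold:
            # Average the two estimates to refine the stored model.
            old.center = 0.5 * (old.center + new.center)
            old.semi_axes = 0.5 * (old.semi_axes + new.semi_axes)
            return
    db.append(new)
```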
For example, in the step S211, a background image may be created. Specifically, the background image is created by deleting an object from the image captured by the camera module 10. An area where the object is removed (i.e., an occlusion area) may be interpolated. In the present disclosure, as shown in FIG. 14, the objects H and T are removed and the areas where the objects H and T were removed are filled by interpolation processing. The interpolation processing is performed by using the previous background images stored in the memory 60. The interpolation processing may also be performed by using an image captured by another image sensor module, such as a wide-angle camera provided with the camera module 10. Alternatively, the interpolation processing may be performed based on the pixels surrounding the area by using a filter such as a smoothing filter, or by machine learning such as deep learning.
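For illustration only, a minimal sketch of filling the occlusion area from the surrounding pixels with OpenCV inpainting is given below; the mask handling and the inpainting radius are assumed example choices rather than parameters defined in the present disclosure.

```python
import cv2
import numpy as np

def create_background_image(image_bgr: np.ndarray, object_mask: np.ndarray) -> np.ndarray:
    """Remove the detected objects and fill the occluded area from surrounding pixels."""
    # object_mask: uint8 mask, non-zero where an object (e.g., H or T) was detected.
    dilated = cv2.dilate(object_mask, np.ones((5, 5), np.uint8))  # cover object borders
    return cv2.inpaint(image_bgr, dilated, 5, cv2.INPAINT_TELEA)
```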
The background image may be converted into a spherical image so that a background image corresponding to a new position and posture of the camera module 10 can be obtained. The background image may be created in another step such as the step S223.
Returning to FIG. 9, the steps after step S22 will be described.
In the step S22, a plurality of 2D images are created (rendered) based on the 3D model created in the step S21. The plurality of 2D images are images when the object is viewed from a plurality of shooting positions and postures. The plurality of shooting positions and postures are defined by the current posture of the camera module 10 and the camera posture candidate list. The details of the step S22 will be described with reference to the flowchart of FIG. 15.
In the step S221, it is determined whether all postures in the camera posture candidate list have been used. If not, the process proceeds to the step S222. Otherwise, the process flow in FIG. 15 ends.
In the step S222, a posture of the camera module 10 is selected from the camera posture candidate list.
In the step S223, a 2D image when the object is viewed from a shooting position and posture determined based on the selected posture is created. The 2D image is created based on the 3D model created in the step S21. Specifically, in the present disclosure, an object image when the object is viewed from a position and posture of the camera module 10 is created. The position and posture is defined by the current posture acquired in the step S1 and the posture selected in the step S222. When the camera posture candidate list is the list shown in FIG. 2 and the situation is as shown in FIGs. 3 and 4, a virtual sphere, whose center is the center of the human H and whose radius is equal to the distance between the camera module 10 (the electronic device 100) and the human H, is considered. The angles φ and θ are set to 0 at the current position and posture of the camera module 10, and then a plurality of shooting positions and postures for creating the 2D images are defined.
The method of creating the 2D image according to the present disclosure will be described in detail. First, an object image when the object is viewed from the selected posture is created. The object image is created based on the 3D model database. Specifically, the object is transformed according to the viewpoint corresponding to the selected posture, based on the 3D model database. Next, the object image is superimposed on the background image. Preferably, the background image may be transformed according to the viewpoint before the superimposition. The step of creating the 2D images may preferably be performed by the GPU 51.
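For illustration only, the following sketch projects a simplified ellipsoid model (the SimpleObjectModel from the earlier sketch) into a candidate viewpoint and pastes it onto the transformed background; the pinhole intrinsics, the flat-shaded drawing and all names are assumptions made for this example, not a definitive implementation of the present disclosure.

```python
import cv2
import numpy as np

def render_candidate_view(background: np.ndarray, model: "SimpleObjectModel",
                          cam_position: np.ndarray, cam_rotation: np.ndarray,
                          fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Project a simplified ellipsoid model into the candidate viewpoint and
    superimpose it on the (already transformed) background image."""
    rendered = background.copy()
    # Object center expressed in the candidate camera coordinate frame.
    pc = cam_rotation @ (model.center - cam_position)
    if pc[2] <= 0:
        return rendered  # object is behind the candidate camera
    u = int(fx * pc[0] / pc[2] + cx)
    v = int(fy * pc[1] / pc[2] + cy)
    # Approximate on-screen half-sizes of the ellipsoid from its semi-axes.
    axes = (int(fx * model.semi_axes[0] / pc[2]), int(fy * model.semi_axes[1] / pc[2]))
    cv2.ellipse(rendered, (u, v), axes, 0, 0, 360, (200, 200, 200), -1)
    cv2.putText(rendered, model.label, (u, v), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 0), 1)
    return rendered
```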
FIG. 16 shows a situation where the object (the human H) is shot from three different shooting positions and postures. The shooting positions and postures Pa, Pb and Pc are defined by the current posture of the camera module 10 and the camera posture candidate list. The shooting positions and postures Pa, Pb and Pc correspond to the shooting positions and postures shown in FIG. 4. FIG. 17A shows the created 2D image when the main object (the human H) is shot from the shooting position and posture Pa. FIG. 17B shows the created 2D image when the main object is shot from the shooting position and posture Pb. FIG. 17C shows the created 2D image when the main object is shot from the shooting position and posture Pc. As shown in FIGs. 17A, 17B and 17C, the main object and the sub-object are shown as the model MH and the model MT, respectively.
In the present disclosure, the 3D model contains the simplified object models, i.e., the models MH and MT, in each of which the shape information indicating a rough 3D shape of the object is associated with the label. By using such a 3D model, the time required to create a 2D image can be reduced because a considerable amount of complicated calculation for transforming the objects becomes unnecessary.
In the step S23, a score for each of the plurality of 2D images is calculated. The score shows how good a composition of the 2D image is. In other words, the score shows the level of the composition of the 2D image. Preferably, the GPU 51 or the NPU 52 may calculate or infer the score by using a trained model created by deep learning. That is to say, aesthetic evaluation is performed by machine learning.
It should be noted that the learning data for creating the trained model are 2D images in which the rough shape of an object is projected onto a background image and a label of the object is attached to the projected shape.
In the present disclosure, the objects are shown in simplified form in the 2D images (i.e., as the model MH and the model MT). Despite the simplification, the score for each of the 2D images can be calculated appropriately because the labels are given to the corresponding objects.
In the step S24, a shooting position and posture where the 2D image with the highest score can be captured is determined to be the favorable shooting position and posture. For example, the position and posture for capturing the 2D image of FIG. 17B is determined as the favorable position and posture if the score of the 2D image of FIG. 17B is the highest among the 2D images of FIGS. 17A, 17B and 17C.
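For illustration only, the selection performed in the steps S23 and S24 can be summarized as in the following sketch; `score_model` stands for any callable wrapping a trained aesthetic-evaluation network (e.g., an inference session running on the GPU or NPU) and is an assumed placeholder, not an interface defined in the present disclosure.

```python
import numpy as np

def select_best_viewpoint(rendered_images: list, candidate_poses: list, score_model):
    """Score every rendered 2D image and return the pose whose image scores highest."""
    scores = [float(score_model(img)) for img in rendered_images]
    best_index = int(np.argmax(scores))
    return candidate_poses[best_index]
```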
Returning to FIG. 7, the step after step S2 will be described.
In the step S3, a suggestion to guide the camera module 10 to the favorable shooting position and posture is outputted. The suggestion is created based on the favorable shooting position and posture determined in the step S2. The display module 46 displays guide information based on the suggestion. The speaker 44 may output a guide voice based on the suggestion.
The suggestion may be superimposed on the image captured by the camera module 10, as described later with some examples. As a result, the user of the electronic device 100 can grasp the suggestion while looking at a preview image displayed on the display module 46.
The suggestion may be a pointer to show a shooting position defined by the favorable shooting position and posture. FIG. 18A shows a preview image when the human H is viewed from the shooting position and posture Pa. In FIG. 18A, the pointer P is superimposed on the preview image. The user can easily know the position to which he/she should move to capture an image with a good composition. The pointer P may be animated for the user's ease of viewing.
The suggestion may be a guide arrow or pictogram to guide the camera module 10 into a shooting posture defined by the favorable shooting position and posture. FIG. 18B shows a preview image when the human H is viewed from the shooting position and posture Pb. In FIG. 18B, the guide arrow GA is superimposed on the preview image. The guide arrow GA indicates that the user should rotate the electronic device 100 (the camera module 10) around a horizontal axis. That is to say, the arrow GA suggests that the user change the pitch of the electronic device 100. Instead, the guide arrow may indicate a change of the yaw or roll of the electronic device 100. The guide arrow GA may be animated so that the user can easily grasp how the camera should be rotated.
The suggestion may be a message indicating what the user should do. FIG. 18C shows a preview image when the human H is viewed from a shooting position and posture. In FIG. 18C, the message MSG is superimposed on the preview image. The message MSG shows "READY" to indicate that the user should shoot an image now.
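For illustration only, the choice among the guide arrow directions and the "READY" message could be driven by the angular difference between the current posture and the favorable posture, as in the following sketch; the tolerance, the sign conventions and the hint labels are assumptions made for this example.

```python
def choose_guide_hint(current_pitch_deg: float, current_yaw_deg: float,
                      target_pitch_deg: float, target_yaw_deg: float,
                      tolerance_deg: float = 3.0) -> str:
    """Pick a guide hint from the angular difference between the current posture
    and the favorable posture (threshold and labels are illustrative only)."""
    d_pitch = target_pitch_deg - current_pitch_deg
    d_yaw = target_yaw_deg - current_yaw_deg
    if abs(d_pitch) > tolerance_deg:
        return "TILT_UP" if d_pitch > 0 else "TILT_DOWN"
    if abs(d_yaw) > tolerance_deg:
        return "PAN_LEFT" if d_yaw > 0 else "PAN_RIGHT"
    return "READY"  # posture is already close enough; show the READY message
```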
The suggestion may be a target image. FIG. 19 is a preview image on which the target image TI is superimposed. The target image is a 2D image obtained when the favorable shooting position and posture is determined. That is to say, the target image is the 2D image with the highest score which is created in the step S22.
It should be understood that any combination of the suggestions such as the pointer, the guide arrow, the message and the target image may be outputted.
The steps S1 to S3 described above are executed each time an image is captured. In the case of shooting a video, the steps S1 to S3 are executed at a speed corresponding to the frame rate. Alternatively, the steps S1 to S3 may be executed at a predetermined interval. For example, the steps S1 to S3 are executed each time a predetermined number of images are captured.
According to the present disclosure described above, even a user with little experience and knowledge of photography can grasp a favorable shooting position and posture of the camera module and successfully capture an image with a good composition.
Further, the user can know a favorable shooting position and posture even if the camera module 10 has not been pointed at a main object from the favorable shooting position and posture because the plurality of 2D images are created based on the 3D model created in advance.
Still further, according to the present disclosure, it is possible to obtain an image with good composition without deterioration because it is not necessary to perform image processing for deformation, enlargement or reduction of the captured image.
In the description of embodiments of the present disclosure, it is to be understood that terms such as "central", "longitudinal", "transverse", "length", "width", "thickness", "upper", "lower", "front", "rear", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise" and "counterclockwise" should be construed to refer to the orientation or the position as described or as shown in the drawings in discussion. These relative terms are only used to simplify the description of the present disclosure, and do not indicate or imply that the device or element referred to must have a particular orientation, or must be constructed or operated in a particular orientation. Thus, these terms cannot be construed to limit the present disclosure.
In addition, terms such as "first" and "second" are used herein for purposes of description and are not intended to indicate or imply relative importance or significance or to imply the number of indicated technical features. Thus, a feature defined as "first" and "second" may comprise one or more of this feature. In the description of the present disclosure, "a plurality of" means “two or more than two” , unless otherwise specified.
In the description of embodiments of the present disclosure, unless specified or limited otherwise, the terms "mounted" , "connected" , "coupled" and the like are used broadly, and may be, for example, fixed connections, detachable connections, or integral connections; may also be mechanical or electrical connections; may also be direct connections or indirect connections via intervening structures; may also be inner communications of two elements which can be understood by those skilled in the art according to specific situations.
In the embodiments of the present disclosure, unless specified or limited otherwise, a structure in which a first feature is "on" or "below" a second feature may include an embodiment in which the first feature is in direct contact with the second feature, and may also include an embodiment in which the first feature and the second feature are not in direct contact with each other, but are in contact via an additional feature formed therebetween. Furthermore, a first feature "on" , "above" or "on top of" a second feature may include an embodiment in which the first feature is orthogonally or obliquely "on" , "above" or "on top of" the second feature, or just means that the first feature is at a height higher than that of the second feature; while a first feature "below" , "under" or "on bottom of" a second feature may include an embodiment in which the first feature is orthogonally or obliquely "below" , "under" or "on bottom of" the second feature, or just means that the first feature is at a height lower than that of the second feature.
Various embodiments and examples are provided in the above description to implement different structures of the present disclosure. In order to simplify the present disclosure, certain elements and settings are described in the above. However, these elements and settings are only by way of example and are not intended to limit the present disclosure. In addition, reference  numbers and/or reference letters may be repeated in different examples in the present disclosure. This repetition is for the purpose of simplification and clarity and does not refer to relations between different embodiments and/or settings. Furthermore, examples of different processes and materials are provided in the present disclosure. However, it would be appreciated by those skilled in the art that other processes and/or materials may also be applied.
Reference throughout this specification to "an embodiment" , "some embodiments" , "an exemplary embodiment" , "an example" , "a specific example" or "some examples" means that a particular feature, structure, material, or characteristics described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. Thus, the appearances of the above phrases throughout this specification are not necessarily referring to the same embodiment or example of the present disclosure. Furthermore, the particular features, structures, materials, or characteristics may be combined in any suitable manner in one or more embodiments or examples.
Any process or method described in a flow chart or described herein in other ways may be understood to include one or more modules, segments or portions of codes of executable instructions for achieving specific logical functions or steps in the process, and the scope of a preferred embodiment of the present disclosure includes other implementations, in which it should be understood by those skilled in the art that functions may be implemented in a sequence other than the sequences shown or discussed, including in a substantially identical sequence or in an opposite sequence.
The logic and/or step described in other manners herein or shown in the flow chart, for example, a particular sequence table of executable instructions for realizing the logical function, may be specifically achieved in any computer readable medium to be used by the instructions execution system, device or equipment (such as a system based on computers, a system comprising processors or other systems capable of obtaining instructions from the instructions execution system, device and equipment executing the instructions) , or to be used in combination with the instructions execution system, device and equipment. As to the specification, "the computer readable medium" may be any device adaptive for including, storing, communicating, propagating or transferring programs to be used by or in combination with the instruction execution system, device or equipment. More specific examples of the computer readable medium comprise but are not limited to: an electronic connection (an electronic device) with one or more wires, a portable computer enclosure (a magnetic device) , a random access memory (RAM) , a read only memory (ROM) , an erasable programmable read-only memory (EPROM or a flash memory) , an optical fiber device and a portable compact disk read-only memory (CDROM) . In addition, the computer readable medium may even be a paper or other appropriate medium capable of printing programs thereon, this is because, for example, the paper or other appropriate medium may be optically scanned and then edited, decrypted or processed with other appropriate methods when necessary to obtain the programs in an electric manner, and then the programs may be stored in the computer memories.
It should be understood that each part of the present disclosure may be realized by the hardware, software, firmware or their combination. In the above embodiments, a plurality of steps or methods may be realized by the software or firmware stored in the memory and executed by the appropriate instructions execution system. For example, if it is realized by the hardware, likewise in another embodiment, the steps or methods may be realized by one or a combination of the following techniques known in the art: a discrete logic circuit having a logic gate circuit for realizing a logic function of a data signal, an application-specific integrated circuit having an appropriate combination logic gate circuit, a programmable gate array (PGA) , a field programmable gate array (FPGA) , etc.
Those skilled in the art shall understand that all or parts of the steps in the above exemplifying method of the present disclosure may be achieved by commanding the related hardware with programs. The programs may be stored in a computer readable storage medium,  and the programs comprise one or a combination of the steps in the method embodiments of the present disclosure when run on a computer.
In addition, each function cell of the embodiments of the present disclosure may be integrated in a processing module, or these cells may be separate physical existence, or two or more cells are integrated in a processing module. The integrated module may be realized in a form of hardware or in a form of software function modules. When the integrated module is realized in a form of software function module and is sold or used as a standalone product, the integrated module may be stored in a computer readable storage medium.
The storage medium mentioned above may be read-only memories, magnetic disks, CD, etc. The storage medium may be transitory or non-transitory.
Although embodiments of the present disclosure have been shown and described, it would be appreciated by those skilled in the art that the embodiments are explanatory and cannot be construed to limit the present disclosure, and changes, modifications, alternatives and variations can be made in the embodiments without departing from the scope of the present disclosure.

Claims (15)

  1. A method of suggesting a shooting position and posture for an electronic device having a camera, the method comprising:
    acquiring an image captured by the camera, a depth map corresponding to the image, and a current posture of the camera when the image is captured;
    determining a favorable shooting position and posture of the camera based on the image, the depth map and the current posture; and
    outputting a suggestion to guide the camera to the favorable shooting position and posture.
  2. The method of claim 1, wherein the determining a favorable shooting position and posture of the camera based on the image, the depth map and the current posture comprises:
    creating a 3D model of an object in the image based on the image and the depth map;
    creating, based on the 3D model, a plurality of 2D images when the object is viewed from a plurality of shooting positions and postures, wherein the plurality of shooting positions and postures are defined by the current posture and a camera posture candidate list which stores information on a plurality of postures of the camera;
    calculating, for each of the plurality of 2D images, a score showing how good a composition of the 2D image is; and
    determining, as the favorable shooting position and posture, a shooting position and posture where the 2D image with the highest score can be captured.
  3. The method of claim 2, wherein the creating a 3D model of an object in the image based on the image and the depth map comprises:
    obtaining a label of the object in the image;
    obtaining shape information of the object, wherein the shape information indicates a rough 3D shape of the object; and
    creating the 3D model by combining the label and the shape information.
  4. The method of claim 3, wherein the label of the object is obtained by performing object recognition using a trained model created by deep learning.
  5. The method of any one of claims 2 to 4, wherein the creating, based on the 3D model, a plurality of 2D images when the object is viewed from a plurality of shooting positions and postures comprises:
    selecting a posture of the camera from the camera posture candidate list; and
    creating a 2D image when the object is viewed from a shooting position and posture determined based on the selected posture.
  6. The method of claim 5, wherein the creating a 2D image when the object is viewed from a shooting position and posture determined based on the selected posture comprises:
    creating an object image when the object is viewed from the selected posture;
    creating a background image by deleting the object from the image captured by the camera; and
    superimposing the object image on the background image.
  7. The method of any one of claims 2 to 6, wherein
    the camera posture candidate list defines a plurality of postures of the camera when an optical axis of the camera passes through the object and the camera is located on a surface of a sphere whose center is located at the object and whose radius is substantially equal to a distance between the camera and the object.
  8. The method of any one of claims 2 to 7, wherein the score is calculated using a trained model created by a deep learning.
  9. The method of any one of claims 1 to 8, wherein the suggestion is superimposed on the image.
  10. The method of claim 9, wherein the suggestion is a pointer to show a shooting position defined by the favorable shooting position and posture.
  11. The method of claim 9, wherein the suggestion is a guide arrow to guide the camera in a shooting posture defined by the favorable shooting position and posture.
  12. The method of claim 9, wherein the suggestion is a message indicating what a user should do.
  13. The method of claim 9, wherein the suggestion is a target image which is a 2D image obtained when the favorable shooting position and posture is determined.
  14. An electronic device having a camera comprising:
    an acquiring unit configured to acquire an image captured by the camera, a depth map corresponding to the image, and a current posture of the camera when the image is captured;
    a determining unit configured to determine a favorable shooting position and posture of the camera based on the image, the depth map and the current posture; and
    an outputting unit configured to output a suggestion to guide the camera to the favorable shooting position and posture.
  15. A computer-readable storage medium, on which a computer program is stored, wherein the computer program is executed by a computer to implement the method according to any one of claims 1 to 13.
PCT/CN2021/133093 2021-11-25 2021-11-25 Method of suggesting shooting position and posture for electronic device having camera, electronic device and computer-readable storage medium WO2023092380A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/133093 WO2023092380A1 (en) 2021-11-25 2021-11-25 Method of suggesting shooting position and posture for electronic device having camera, electronic device and computer-readable storage medium


Publications (1)

Publication Number Publication Date
WO2023092380A1 true WO2023092380A1 (en) 2023-06-01

Family

ID=86538500

Country Status (1)

Country Link
WO (1) WO2023092380A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104145474A (en) * 2011-12-07 2014-11-12 英特尔公司 Guided image capture
CN108701352A (en) * 2016-03-23 2018-10-23 英特尔公司 Amending image using the identification based on three dimensional object model and enhancing
CN109767485A (en) * 2019-01-15 2019-05-17 三星电子(中国)研发中心 Image processing method and device
CN110913140A (en) * 2019-11-28 2020-03-24 维沃移动通信有限公司 Shooting information prompting method and electronic equipment
CN111328396A (en) * 2017-11-15 2020-06-23 高通科技公司 Pose estimation and model retrieval for objects in images
JP2020107251A (en) * 2018-12-28 2020-07-09 株式会社バンダイナムコエンターテインメント Image generation system and program
CN111935393A (en) * 2020-06-28 2020-11-13 百度在线网络技术(北京)有限公司 Shooting method, shooting device, electronic equipment and storage medium

