EP4138390A1 - Method for camera control, image signal processor and device with temporal control of image acquisition parameters - Google Patents

Method for camera control, image signal processor and device with temporal control of image acquisition parameters

Info

Publication number
EP4138390A1
EP4138390A1 (Application EP21192389.1A)
Authority
EP
European Patent Office
Prior art keywords
frame
image
stream
image frames
target frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP21192389.1A
Other languages
German (de)
French (fr)
Inventor
Jarno Nikkanen
Jiaqi Guo
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd filed Critical Beijing Xiaomi Mobile Software Co Ltd
Priority to EP21192389.1A priority Critical patent/EP4138390A1/en
Priority to US17/855,394 priority patent/US20230058934A1/en
Priority to CN202210880734.8A priority patent/CN115714919A/en
Publication of EP4138390A1 publication Critical patent/EP4138390A1/en
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/80Camera processing pipelines; Components thereof
    • H04N23/84Camera processing pipelines; Components thereof for processing colour signals
    • H04N23/88Camera processing pipelines; Components thereof for processing colour signals for colour balance, e.g. white-balance circuits or colour temperature control
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/70Circuitry for compensating brightness variation in the scene
    • H04N23/71Circuitry for evaluating the brightness variation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/76Television signal recording
    • H04N5/765Interface circuits between an apparatus for recording and another apparatus
    • H04N5/77Interface circuits between an apparatus for recording and another apparatus between a recording apparatus and a television camera
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30244Camera pose

Definitions

  • the present invention relates to electronic devices and a method to control such electronic device. More particularly, the present invention relates to a method for camera control to acquire an image and an image signal processor (ISP) implementing such method. Further, the present invention relates to a device implementing such method.
  • ISP image signal processor
  • a method according to claim 1 is provided for camera control to acquire an image and a camera device is provided according to claim 14.
  • a method for camera control is provided to acquire an image.
  • the method comprises the steps:
  • a stream of image frames is acquired by an image sensor of the camera comprising at least one frame and preferably a plurality of subsequent frames.
  • the stream of image frames might be used as preview of the camera or might be part of a video stream.
  • a target frame is acquired by the image sensor wherein selection of the target frame might be performed by user interaction such as pushing a trigger button to start recording a video or acquiring an image or is the next image of the video stream or is a frame of a preview operation.
  • the target frame is the raw data of the image intended by the user to be captured or displayed to the user in a preview.
  • scene information of the target frame is determined.
  • the scene information can be related to the whole target frame or any real-world object in the target frame.
  • the object encompasses shapes, surfaces and structures that can be used to be identified in the stream of image frames and might contain multiple whole objects and some partially visible objects, or it could contain only part of one object.
  • scene information can be determined for parts of the target frame or for the complete target frame.
  • scene information of a part of the image frame or of the complete image frame can be determined to identify match of the scene information.
  • At least one reference frame is selected from the stream of image frames by identifying the scene information of the target frame in the reference frame.
  • Each frame of the stream of image frames is checked whether there is at least a partial match of the corresponding scene information of the target frame in the respective image frame.
  • the image frames of the stream of image frames are checked for coinciding scene information.
  • the target frame content can be compared by the scene information against the earlier frames as a whole, to see how much of the current frame contents is visible in the earlier frames, without segmenting the target frame contents into objects and then comparing object by object.
  • the scene information can be identified in one of the frames of the stream of image frames, this frame of the stream of image frames is selected and taken as reference frame.
  • the method is consecutively going through the image frames of the stream of image frames to identify the respective scene information and select the reference frame. Alternatively, only those image frames are checked which potentially provide improvement to the acquisition accuracy and consistency.
  • the acquisition parameter might relate to an auto white balancing (AWB), automatic exposure control (AEC) and/or tone-mapping (TM) parameter.
  • AWB auto white balancing
  • AEC automatic exposure control
  • TM tone-mapping
  • the acquisition parameters of an image frame acquired before capturing the target frame are used in order to increase the consistency and accuracy of color and brightness reproduction of images and videos.
  • more information about the scene in which the camera is operated is used from the previously acquired image frames.
  • scene information may include localization information, for the image frame of the stream of image frames and the target frame, e.g. simultaneous localization and mapping (SLAM) data.
  • SLAM simultaneous localization and mapping
  • the camera can easily determine whether there is a match of the scene information by overlap of the SLAM data.
  • the SLAM data for example the presence of an object of the target frame which is also present in one of the image frames of the stream of image frames can be determined.
  • selecting of the reference frame can be performed.
  • SLAM data can be acquired for a part of the target frame or the complete target frame.
  • SLAM data can be acquired for each of the complete image frame or only parts of the respective image frame.
  • the present invention is not limited to identification of specific and previously trained objects.
  • the method is independent of the respective object which can be any object of the real-world, specific structures, surfaces or shapes which are localized and mapped by the SLAM process.
  • most modern terminals such as smartphones, tablets or the like, already have SLAM modules implemented, such that the information provided by the SLAM module can be used for identification of the target frame in the present invention.
  • the scene information includes depth information or odometry information of the image frame and/or the target frame.
  • scene information includes a pose of the image sensor, i.e. the camera.
  • the camera includes one or more of an inertial motion unit (IMU) such as an acceleration sensor, a gyroscope or the like in order to be able to acquire the pose of the camera.
  • IMU inertial motion unit
  • the depth information of the object might be provided by stereo camera measurement, LIDAR or the like.
  • pose and depth information/odometry information might also be included in the SLAM data.
  • selecting a reference frame from the stream of image frames by identifying the scene information of the target frame in the reference frame includes determining an at least partial overlap of the image frame from the stream of image frames with the target frame by the scene information.
  • partial overlap of the scene contents of the target frame and the image frame is determined in order to make sure that use of the at least one acquisition parameter of the selected reference frame to determine the final image is applicable.
  • objects present and visible in the target frame are also at least partially present and visible in the respective image frame when the scene information of the target frame coincides with the scene information of the image frame of the stream of image frames.
  • scene information include coordinates of the scene and preferably an object of the scene.
  • Selecting the reference frame from the stream of images by identifying the scene information of the target frame includes calculating coordinates of the scene and determining overlap with coordinates of the respective image frame of the stream of image frames.
  • the image frame can be selected as reference frame.
  • coordinates of an object can be any real-world object, such as shapes, structures, surfaces or the like.
  • the object might be further several real-world objects or parts thereof, only one real-world object or a part thereof.
  • SLAM data and/or depth information and/or the pose of the image sensor are used in order to calculate the coordinates of the scene or object in the scene.
  • the coordinates are calculated in a world coordinate system to be able to be compared between the individual frames and also if the camera is moving or the pose of the camera is changing.
  • calculating the coordinates of the scene or object of the scene includes:
  • the coordinates of the target frame provided in the world coordinate system are compared with the coordinates of each image frame in the stream of image frames also in the world coordinate system subsequently to determine the partial overlap with the target frame.
  • selecting the reference frame includes determining a confidence level of the respective frame for the acquisition parameter and selecting the reference frame if the confidence level is above a preset threshold.
  • a measure is provided whether the determined at least one or more acquisition parameters of the respective image frame are suitable to be used in order to determine the final image. Only if the confidence level is high enough, i.e. above a preset threshold, the image frame of the stream of image frames is selected as reference image.
  • the confidence level of the respective image frame to be selected as reference frame needs to be above the confidence level of the target frame in order to provide an improvement of consistency and accuracy of color and brightness reproduction of the image.
  • the acquisition parameters are determined from the target frame itself.
  • the reference frame is selected by the maximum of overlap between the respective image frame of the stream of image frames and the target frame and the confidence level of the respective image frame of the stream of image frames.
  • an optimum of color and brightness consistency and accuracy can be achieved.
  • the confidence value is determined by one or more of a color gamut in particular for AWB, brightness gamut for AEC and/or TM, a hull of the 2D chromaticity for AWB, 1D brightness range for AEC and/or TM, or 3D color histogram for AWB and/or AEC and/or TM. If SLAM data is used to make a rough model about the scene in which the camera is operated, then AWB/AEC/TM parameters from image frames having a higher confidence level can be used to correct the acquisition parameters that result for target frames having a lower confidence level, hence increasing the consistency and accuracy of color and brightness reproduction.
  • the image frame from the stream of image frames comprises low resolution images having a resolution lower than the final image and in particular a resolution smaller than 640x480 pixel, more preferably a resolution smaller than 320x240 pixel and more preferably a resolution smaller than 64x48 pixel.
  • the image frames from the stream of image frames can be easily stored and processed without increase of computational demands on the device.
  • the image frames of the stream of image frames are stored in a memory of the camera for subsequent use to determine the acquisition parameters.
  • the image frames from the stream of image frames provides low resolution images, the image frames can be easily stored without excessive memory consumption.
  • only the image frames of the stream of image frames might be stored having a confidence level above a preset threshold. Thus, only those image frames are stored which can be used as reference images while the other image frames of the stream of image frames are disregarded in order to further reduce the demands on memory.
  • the camera pose is stored together with the stored image frames of the stream of image frames.
  • the coordinates of the object in the respective image frames can be calculated.
  • Further information might be stored together with the image frames of the stream of image frames such as focal length, principal point and depth information.
  • the method further comprises: Detecting change of illumination between the reference frame and the target frame and adapting the reference frame to the changed illumination before determining the acquisition parameter.
  • more than one reference frames are selected wherein the at least one acquisition parameter is determined from the more than one reference frame for example by averaging.
  • weighted averaging can be used, wherein the acquisition parameter of the more than one reference frame are weighted by their respective confidence value.
  • the steps of the method are iteratively repeated for every new target frame of a video stream or a stream of preview-images.
  • an image signal processor is provided.
  • the ISP is configured to perform the steps of the method described before.
  • the ISP is connectable to an image sensor to receive image data or image frames.
  • the ISP may be connectable to a SLAM module of a device implementing the ISP which may be a terminal or the like.
  • a camera device preferably implemented in a mobile terminal.
  • the camera device comprises an image sensor, a processor and a memory storage storing instructions which, when executed by the processor, perform the steps of the method described above.
  • the camera device comprises a SLAM module to acquire SLAM data to identify the reference frame.
  • the present invention is related to a camera control to improve the consistency and accuracy of color and brightness reproduction of images and videos in particular during automatic white balancing (AWB), automatic exposure control (AEC) and tone-mapping (TM) algorithms.
  • AWB automatic white balancing
  • AEC automatic exposure control
  • TM tone-mapping
  • the method according to the present invention is implemented in a camera module preferably of a terminal such as a smartphone, tablet or the like.
  • the camera module is connected to a processing module for performing the steps of the invention.
  • the processing module might comprise an Image Signal Processor (ISP) or the like.
  • ISP Image Signal Processor
  • the present invention is not restricted to a certain kind of terminals or any specific implementation.
  • FIG. 1 showing the method for camera control to acquire an image.
  • step S01 a stream of image frames is acquired by an image sensor, wherein the stream of image frames comprises at least one frame.
  • a stream of image frames is acquired by an image sensor of the camera comprising at least one frame and preferably a plurality of subsequent frames.
  • the stream of image frames might be used as preview of the camera or is part of a video stream captured.
  • the image frames of the stream of image frames have a low resolution, preferably lower than 640x480 pixel, more preferably a resolution smaller than 320x240 pixel and more preferably a resolution smaller than 64x48 pixel.
  • the image frames are 3A statistics instead of original raw frames in order to reduce memory consumption, for example a 2D RGB grid that represents linearized raw camera RGB image frame.
  • step S02 a target frame is acquired by the image sensor.
  • selection of the target frame might be performed by user interaction such as pushing a trigger button to start recording a video or acquiring an image.
  • the target frame is determined by the next frame of a video stream to be captured or the next frame of a preview.
  • the target frame is the raw data of the image intended by the user to be captured.
  • step S03 scene information of the target frame is determined preferably by the processing module or ISP.
  • scene information includes any information about the scene of the target frame.
  • Scene information can be determined for parts of the target frame or for the complete target frame.
  • scene information of a part of the image frame or of the complete image frame can be determined to identify match of the scene information.
  • a reference frame is selected from the stream of image frames by identifying the scene information of the target frame in the reference frame preferably by the processing module or ISP.
  • Each frame of the stream of image frames is checked whether there is at least a partial overlap between the scene information of the target frame and the respective image frame, i.e. whether scene content of the target frame is partially or completely present in the respective image frame. Alternatively, only those image frames are checked which potentially provide improvement to the acquisition accuracy and consistency. If the scene information can be identified in one of the frames of the stream of image frames, this frame of the stream of image frames is selected and taken as reference frame. Therein, preferably the method is consecutively going through the image frames of the stream of image frames to identify the respective scene information and select the reference frame. Thus, overlap between the target frame and the respective image frame of the stream of image frames is determined by the scene information to identify a possible reference frame to be selected if sufficient overlap is determined.
  • Step S05 at least one acquisition parameter of the reference frame is determined preferably by the processing module or ISP.
  • the at least one acquisition parameter might be an auto white balancing (AWB), automatic exposure control (AEC) and/or tone-mapping (TM) parameter determined from the reference frame.
  • AWB auto white balancing
  • AEC automatic exposure control
  • TM tone-mapping
  • more than one reference frames are selected wherein the at least one acquisition parameter is determined from the more than one reference frame for example by averaging.
  • all reference frames that have a match score above a certain level can be selected.
  • weighted averaging can be used, wherein the acquisition parameter of the more than one reference frame are weighted by their respective confidence value.
  • step S06 a final image is determined from the target frame by the at least one acquisition parameter, preferably by the processing module or ISP.
  • the target frame contains raw data and, as soon as the respective acquisition parameter is determined, the final image is determined from the raw data of the target frame by use of the one or more acquisition parameters from the reference frame.
  • the acquisition parameters of an image frame acquired before capturing the target frame are used in order to increase the consistency and accuracy of color and brightness reproduction of images and videos.
  • more information about the scene in which the camera is operated is used from the previously acquired image frames.
  • step S04 localization information and more preferably SLAM data might be used as scene information to make a rough model about the scene in which the camera is operated in order to determine the reference frame including at least partially the same scene contents as the target frame. Then AWB/AEC/TM parameters from frames having a higher confidence level can be used to correct the parameters that result for the target frame having a lower confidence level, hence increasing the consistency and accuracy of color and brightness reproduction.
  • the camera can easily determine whether scene information of the target frame is also present in one of the image frames of the stream of image frames if there is at least a partial overlap in the scene content between the respective image frame and the target frame.
  • selecting of the reference frame can be performed.
  • the method is independent of a respective object to be recognized and any object of the real-world, such as structures, surfaces or shapes which are localized and mapped by the SLAM process can be used to determine overlap between the target frame and the respective image frame.
  • any object of the real-world such as structures, surfaces or shapes which are localized and mapped by the SLAM process can be used to determine overlap between the target frame and the respective image frame.
  • most modern terminals such as smartphones, tablets or the like, already have SLAM modules implemented, such that the information provided by the SLAM module can be used for identification of the target frame in the present invention.
  • the method can be implemented in an iterative process and repeated for each new target frame being a frame of a video stream or a preview, thereby continuously improving the image reproduction.
  • figure 2 showing the steps in order to acquire a final image.
  • figure 2 refers to the implementation for an AWB algorithm.
  • the present method can alternatively or at the same time also be implemented in an AEC or TM algorithm as mentioned above.
  • picture A an initial image is acquired, wherein by an auto white balancing algorithm acquisition parameters related to the AWB are determined for the initial image and applied in picture B to achieve a correctly adjusted picture. Therein, by a SLAM algorithm simultaneous localization and mapping of the content of the picture B is performed and for the scene of the respective image frame a point cloud is determined as scene information. These steps are repeated for each image frame of the stream of image frames including the pictures A to E of figure 2.
  • Picture C shows a closer view of the respective objects in the scene, obtained by moving the camera closer to the object 14 or by zooming in.
  • object 14 is present in both image frames B and C, wherein points 14 of the point cloud mark the object 14.
  • other points 10 of the point cloud are detected.
  • Picture D shows the same object 14 even closer thereby reducing the color gamut of the image.
  • Picture E only contains the object 14 and almost all color information is drawn directly from the object 14 itself leading to a low color gamut to be used as information for determining the respective AWB parameter of picture E.
  • the AWB algorithm might fail resulting in wrong colors of the object 14 as shown in picture F.
  • picture B of Figure 2 the image has a high color gamut and thus a high confidence level can be achieved for the acquisition parameters related to the AWB parameters. Further, the target frame shown in picture E has full overlap with the content of picture B since both show the object 14.
  • the scene information of picture E including object 14 is identified subsequently in each of the images D, C and B in reversed order of acquisition until a picture is reached having a high confidence level regarding the AWB parameter and still having an overlap in the scene content, i.e. showing the object 14.
  • the image frame includes object 14 completely, but also a partial overlap of the scene content between the target frame of picture E and the possible reference frame might be sufficient to improve color reproduction.
  • the present method is not limited to specific objects and any object as scene content can be used as scene information, such as surfaces, shapes, structures or the like.
  • while Fig. 2 shows object 14 as an example, other objects or parts of objects are also possible.
  • This comparison and identification of overlap by the scene information between the image frames in the plurality of image frames B-D and the target frame E is preferably performed by acquiring SLAM data as scene information for each of the pictures B to E.
  • the SLAM data of object 14 can be identified by the world coordinates of the object 14 determined by the SLAM algorithm in the other frames in order to determine overlap.
  • picture C is used as reference frame and the AWB parameters determined for the picture C are also used for the AWB of picture E leading to picture E having a corrected AWB and producing correct colors thereby improving color consistency and accuracy for the object 14.
  • the corrected AWB produces a result shown in picture G of figure 2 having the correct color and not suffering from the reduced color information provided by the picture E itself.
  • Figure 3 shows a world coordinate system 22.
  • in a first step, upon acquiring a frame 20 of a stream of image frames which might be used as reference frame, the coordinates of the object 14 in the image frame 20 can be determined from the acquired depth information or odometry information in the camera coordinate system 26 of the camera in a first state/position denoted by "cam1".
  • from the pose (R1, t1) of the camera "cam1" and the coordinates of the object 14 in the camera coordinate system 26 of "cam1", the coordinates of the object 14 in the world coordinate system 22 can be determined.
  • any object, surface, shape or structure can be used and coordinates can be determined to determine overlap between target frame and respective image frame. Further, coordinates of a plurality of objects present in the scene, parts of a plurality of object in the scene or part of only one object in the scene can be used in order to determine overlap between the target frame and the respective image frame.
  • the coordinates of the object 14 in the target frame 32 can be determined in the camera coordinate system 30 of "cam2".
  • from the pose (R2, t2) of the camera "cam2" and the coordinates of the object 14 in the camera coordinate system 30 of "cam2", the coordinates of the object 14 in the world coordinate system 22 can be determined.
  • overlap between the target frame 32 and the frame 20 can be determined.
  • overlap is determined by a set of 3D points of the 3D point cloud 34 in the world coordinate system that are visible in both the target and the reference frame, and no distinction is made regarding which object(s) these points belong to.
  • the 3D point cloud may be determined from the depth information, the camera position and/or camera orientation information (camera pose) as exemplified in more detail below.
  • the coordinates of the object 14 can be determined in the world coordinate system for the target frame 32 of "cam2".
  • the 3D point cloud 34 of the target frame 32 is available in the world coordinate system. Depth information/map, camera position and/or camera pose from target frame 32 was used to construct this 3D point cloud 34.
  • the distance of the camera at camera state "cam1" from those 3D points is determined based on the camera pose and/or camera position in image frame 20, in order to determine which area of the image frame 20 covers those 3D points of the 3D point cloud 34.
  • depth information of the image frame 20 may not be available and only overlap of the scene or object of the target frame 32 with the image frame 20 is determined, without the need to calculate coordinates of the whole image frame 20 in the world coordinate system.
  • the coordinates of each pixel in the target frame might be translated into the world coordinate system 22. Alternatively, coordinates are determined only for certain points of the target frame. Similarly, for the respective image frame from the stream of image frames, either the coordinates in the world coordinate system are determined for each of the pixels, or, alternatively, the coordinates are determined for a selection of pixels of the respective image frame and translated into the world coordinate system 22 in order to identify overlap between the target frame or the object in the target frame and the respective image frame.
  • the coordinates of the scene or object 14 of the target frame 32 can be translated into the world coordinate system 22 and can then be compared with the world coordinates of the scene or object 14 of the reference frame 20 in order to determine whether the object 14 is present in the target frame 32 and the reference frame 20. Only if there is an overlap, i.e. the object 14 is at least partially visible in the respective image frame, this frame is considered to be used as reference frame.
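  • As an illustration of this overlap check, the following Python sketch projects the target frame's 3D point cloud (already expressed in the world coordinate system) into a candidate reference frame using that frame's pose and intrinsics, and reports the fraction of points that fall inside the frame. The pinhole projection model and all function and variable names are illustrative assumptions, not part of the description or claims.

```python
import numpy as np

def overlap_fraction(points_world, world_to_cam, cx, cy, px, py, width, height):
    """Fraction of the target frame's 3D point cloud visible in a candidate frame.

    points_world: Nx3 array of 3D points in the world coordinate system, e.g.
                  reconstructed from the target frame's depth map and camera pose.
    world_to_cam: 4x4 world-to-camera matrix of the candidate reference frame
                  (the inverse of its stored camera-to-world pose (R|t)).
    cx, cy:       focal length; px, py: principal point; width, height: frame size.
    """
    n = points_world.shape[0]
    if n == 0:
        return 0.0
    homog = np.hstack([points_world, np.ones((n, 1))])   # Nx4 homogeneous points
    cam = (world_to_cam @ homog.T).T                      # points in the candidate camera system
    cam = cam[cam[:, 2] > 1e-6]                           # keep points in front of the camera
    u = cx * cam[:, 0] / cam[:, 2] + px                   # pinhole projection to pixel coordinates
    v = cy * cam[:, 1] / cam[:, 2] + py
    visible = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    return float(visible.sum()) / n                       # c_common_area-style overlap fraction
```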
  • the acquisition parameters of the determined reference frame are used in order to produce the final image. Therein, for each frame, it is checked if there is at least a partial overlap of the scene with earlier frames. If yes, then it is checked whether the earlier frames have higher confidence level for the acquisition parameters available (separately for AWB, AEC, and TM).
  • the system contains 3 parts in total.
  • the first part is running SLAM 48 on the device with SLAM input data 46 from the image, IMU and depth data for camera pose estimation and scene modelling 50, acquiring a depth map or depth information.
  • a sequence of image frames is captured and stored 40.
  • the stored frames could also be low resolution 3A statistics instead of original raw frames in order to reduce memory consumption, for example a 2D RGB grid that represents a linearized raw camera RGB image.
  • the corresponding camera pose is stored per each frame, which is a 4×4 matrix, alongside other image metadata such as the camera's focal length (cx,cy), principal point (px,py), and uncorrected algorithm parameters 42 such as AWB gains.
  • the depth data or odometry data will be collected at the same time.
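  • One possible way to organize such a per-frame record is sketched below in Python; the class and field names are illustrative assumptions rather than terminology used in the description.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple
import numpy as np

@dataclass
class PotentialReferenceFrame:
    """Low-resolution frame (or 3A statistics grid) kept as a potential reference frame."""
    rgb_grid: np.ndarray                  # e.g. 64x48x3 linearized raw camera RGB grid
    pose: np.ndarray                      # 4x4 camera pose matrix (R|t) in the world coordinate system
    focal_length: Tuple[float, float]     # (cx, cy)
    principal_point: Tuple[float, float]  # (px, py)
    depth: Optional[np.ndarray] = None    # depth map or odometry-derived depth, if available
    awb_gains: np.ndarray = field(default_factory=lambda: np.ones(3))  # uncorrected AWB RGB gains
    confidence: float = 0.0               # algorithm confidence value of this frame
```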
  • An algorithm confidence value 44 is calculated for each frame; for example, color gamut, convex hull of the 2D chromaticity or 3D color histogram could be used as the confidence metric for AWB/AEC/TM, because more colors visible inside the FOV usually makes the scene easier for AWB and also makes it easier for AEC and TM to estimate the correct brightness of objects in relation to other objects in the scene.
  • the convex hull should be calculated from image data in device independent color space to enable using the same thresholds for high and low confidence for all devices. Higher confidence frames are the potential reference frames that can be utilized for correction of low confidence frames.
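  • A minimal sketch of such a confidence metric is given below, using the area of the convex hull of the 2D chromaticities as a proxy for the visible color gamut; the optional conversion matrix and the handling of degenerate single-color scenes are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import ConvexHull

def chromaticity_gamut_confidence(rgb_grid, raw_to_device_independent=None):
    """Confidence proxy: area of the convex hull of the 2D chromaticities.

    rgb_grid:                  HxWx3 linearized raw camera RGB values.
    raw_to_device_independent: optional 3x3 conversion matrix so that the same
                               confidence thresholds can be reused across devices.
    """
    pixels = rgb_grid.reshape(-1, 3).astype(np.float64)
    if raw_to_device_independent is not None:
        pixels = pixels @ raw_to_device_independent.T
    pixels = pixels[pixels.sum(axis=1) > 1e-6]                  # drop (near-)black pixels
    chroma = pixels[:, :2] / pixels.sum(axis=1, keepdims=True)  # (r, g) chromaticities
    if chroma.shape[0] < 3:
        return 0.0
    try:
        return ConvexHull(chroma).volume   # for 2D points, .volume is the hull area
    except Exception:                      # degenerate input, e.g. a single-colour scene
        return 0.0
```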
  • a decision 52 is made whether the respective image frame has a high confidence level. If the image frame has a high confidence level, the image frame is stored to be used later as reference frame for a video stream, preview or picture. For the final image of the high confidence frames, the uncorrected AWB/AEC/TM parameter are used to generate the final image.
  • the next step is to verify whether the contents of the target frame i have been shown in the most recent potential reference frames 60 from the data 62 of all acquired potential reference frames (or any of the high confidence frames that are identified to belong to the same physical space in which the camera is currently operated).
  • the 3D points of the target frame determined before are projected back to the potential reference frame j by following the steps described above in reverse, and replace the (R
  • Frame j is selected as reference frame based on maximizing the proportion of the low confidence frame i that is visible in the reference frame j (c_common_area(i,j)) and maximizing the confidence level that the reference frame j has (c_confidence(j)).
  • the maximized value is the product c_common_area(i,j) × c_confidence(j), but also other implementations are possible.
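  • In code, this selection rule could look roughly as follows; the min_score cut-off below which no reference frame is used is a hypothetical addition.

```python
def select_reference_frame(overlaps, confidences, min_score=0.0):
    """Pick the candidate j maximizing c_common_area(i, j) * c_confidence(j).

    overlaps:    overlap fractions c_common_area(i, j) for each candidate frame j.
    confidences: confidence values c_confidence(j) for the same candidates.
    Returns the index of the selected reference frame, or None if no candidate
    exceeds min_score (in which case the target frame's own parameters are used).
    """
    best_j, best_score = None, min_score
    for j, (overlap, confidence) in enumerate(zip(overlaps, confidences)):
        score = overlap * confidence
        if score > best_score:
            best_j, best_score = j, score
    return best_j
```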
  • AWB Automatic White Balancing
  • WB White Balancing
  • RGB gains that enable correct and consistent color reproduction of object colors regardless of the prevailing illumination, hence achieving color constancy. For example, white objects are reproduced as white regardless of the color of the illumination (if chromatic adaptation processing is excluded).
  • x ← C × G × x
  • x is the 3×1 vector that corresponds to linearized raw camera RGB value
  • G is the diagonal 3×3 WB RGB gains matrix (the diagonal values are the WB RGB gains)
  • C is the 3×3 color space conversion matrix to convert from linearized raw camera RGB to device independent linear RGB.
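  • A minimal numpy sketch of this per-pixel operation is given below, assuming the image is an H×W×3 array of linearized raw camera RGB values; the function name is an illustrative assumption.

```python
import numpy as np

def apply_white_balance(raw_rgb, wb_gains, raw_to_linear_rgb):
    """Apply x <- C * G * x to every pixel of a linearized raw camera RGB image.

    raw_rgb:           HxWx3 linearized raw camera RGB image.
    wb_gains:          length-3 WB RGB gains (the diagonal of G).
    raw_to_linear_rgb: 3x3 matrix C from raw camera RGB to device-independent linear RGB.
    """
    G = np.diag(wb_gains)              # diagonal 3x3 WB gains matrix
    M = raw_to_linear_rgb @ G          # combined matrix C * G
    # Multiply every 3x1 pixel vector by M while keeping the HxWx3 layout.
    return np.einsum('ij,hwj->hwi', M, raw_rgb)
```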
  • Illumination change between frames i and j is detected 64 by comparing the linearized raw pixel RGB average values common_area_avg_rgb(i) and common_area_avg_rgb(j) that belong to the same object surfaces visible in both frames, and that have been normalized to eliminate the impact of any difference in exposure (both are 3x1 RGB vectors).
  • each point of the 3D point cloud 34 as shown in figure 3 has a corresponding RGB value in both the target and the reference frame. These are the points from which the "common_area_avg_rgb" values are calculated for each frame.
  • a decision 66 is made whether an illumination change could be detected.
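  • A sketch of this check is given below; the exposure normalization (dividing each average by its own sum) and the decision threshold are illustrative assumptions.

```python
import numpy as np

def illumination_changed(common_rgb_i, common_rgb_j, threshold=0.05):
    """Detect an illumination change between frames i and j.

    common_rgb_i, common_rgb_j: Nx3 arrays of linearized raw RGB values sampled at
    the 3D point cloud locations that are visible in both frames.
    """
    avg_i = common_rgb_i.mean(axis=0)          # common_area_avg_rgb(i)
    avg_j = common_rgb_j.mean(axis=0)          # common_area_avg_rgb(j)
    # Normalize so that exposure differences cancel and only the R:G:B balance is compared.
    avg_i = avg_i / avg_i.sum()
    avg_j = avg_j / avg_j.sum()
    return float(np.max(np.abs(avg_i - avg_j))) > threshold
```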
  • the AWB/AEC/TM parameters determined for the respective reference frame j are used and applied 72 to the target frame to achieve high color accuracy and consistency.
  • the camera device 100 comprises a processor 102 and a memory 104.
  • the memory 104 stores instructions which when executed by the processor 102 carry out the steps of the method described above.
  • the camera device 100 might further comprise or is connected to an image sensor to acquire image data to be used in the method of the present invention.
  • the camera device might comprise or might be connected to a SLAM module.
  • the camera device 100 might have an individual SLAM module or a SLAM module is implemented in the terminal device used by the camera device 100.
  • the camera device 100 is shown together with the image sensor 106 and the SLAM module 108 as integrated component of the terminal.
  • by the SLAM data/depth information provided by the SLAM module of the terminal or camera, more information about the respective scene can be used, and thus scene information can be identified in different frames; in order to improve consistency and accuracy of color reproduction, the acquisition parameters of frames having a higher confidence level are used.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

Method and device for camera control to acquire an image. The method includes: Acquiring a stream of image frames by an image sensor comprising at least one frame; Acquiring a target frame by the image sensor; Determining scene information in the target frame; Selecting a reference frame from the stream of image frames by identifying the scene information of the target frame in the reference frame; Determining at least one acquisition parameter of the reference frame; and Acquiring a final image from the target frame with the acquisition parameters.

Description

    Technical field
  • The present invention relates to electronic devices and a method to control such electronic device. More particularly, the present invention relates to a method for camera control to acquire an image and an image signal processor (ISP) implementing such method. Further, the present invention relates to a device implementing such method.
  • Background
  • In the current existing camera systems, some framings of a scene are difficult and the implemented algorithms for auto white balancing (AWB), automatic exposure control (AEC) and tone mapping (TM) might generate unsatisfactory results. Especially if there is only one color or only a limited number of different colors visible in the frame, AWB can fail to reach a correct illumination estimate, and AEC/TM can fail to estimate the real brightness of the object correctly. Consequently, there can be inconsistency in color and brightness reproduction between different frames of the same scene, which leads to worse image and video quality and worse user experience.
  • This issue of different framings of the same scene having different color and/or brightness reproduction is still present in all digital camera devices. The most common handling of temporal stability still relies on straightforward temporal filtering of the acquisition parameters of the AWB/AEC/TM algorithm by using e.g. trimmed mean or other similar filters across algorithm results for multiple frames, which ensures smooth transitions between acquisition parameters of subsequent frames, but does not ensure that the same object under the same illumination will be always reproduced consistently.
  • To solve this problem, more information about the scene should be utilized than only the current camera frame. One possibility is temporal filtering of consecutive AWB and/or AEC/TM results. This results in smooth transitions between each of the subsequent frames, but does not prevent convergence into wrong parameters. So, it does not solve the presented problem.
  • Thus, it is an object of the present invention to improve the consistency and accuracy of color and brightness reproduction of images and videos in automatic white balancing (AWB), automatic exposure control (AEC), and tone-mapping (TM) algorithms.
  • Summary
  • By the present invention a method according to claim 1 is provided for camera control to acquire an image and a camera device is provided according to claim 14.
  • In a first aspect of the present invention a method for camera control is provided to acquire an image. The method comprises the steps:
    • Acquiring a stream of image frames by an image sensor comprising at least one frame;
    • Acquiring a target frame by the image sensor;
    • Determining scene information of the target frame;
    • Selecting at least one reference frame from the stream of image frames by identifying the scene information of the target frame in the reference frame;
    • Determining at least one acquisition parameter of the reference frame; and
    • Determining a final image from the target frame by the at least one acquisition parameter.
  • Thus, in accordance with the present invention, a stream of image frames is acquired by an image sensor of the camera comprising at least one frame and preferably a plurality of subsequent frames. In particular, the stream of image frames might be used as preview of the camera or might be part of a video stream.
  • Subsequently, a target frame is acquired by the image sensor wherein selection of the target frame might be performed by user interaction such as pushing a trigger button to start recording a video or acquiring an image or is the next image of the video stream or is a frame of a preview operation. Thus, the target frame is the raw data of the image intended by the user to be captured or displayed to the user in a preview.
  • Subsequently, scene information of the target frame is determined. Therein, the scene information can be related to the whole target frame or any real-world object in the target frame. Therein, the object encompasses shapes, surfaces and structures that can be used to be identified in the stream of image frames and might contain multiple whole objects and some partially visible objects, or it could contain only part of one object. Further, scene information can be determined for parts of the target frame or for the complete target frame. Similarly, in order to identify the scene information in the respective image frame of the stream of image frames, scene information of a part of the image frame or of the complete image frame can be determined to identify match of the scene information.
  • Afterwards at least one reference frame is selected from the stream of image frames by identifying the scene information of the target frame in the reference frame. Each frame of the stream of image frames is checked whether there is at least a partial match of the corresponding scene information of the target frame in the respective image frame. Thus, the image frames of the stream of image frames are checked for coinciding scene information. In particular, the target frame content can be compared by the scene information against the earlier frames as a whole, to see how much of the current frame contents is visible in the earlier frames, without segmenting the target frame contents into objects and then comparing object by object. If the scene information can be identified in one of the frames of the stream of image frames, this frame of the stream of image frames is selected and taken as reference frame. Therein, preferably the method is consecutively going through the image frames of the stream of image frames to identify the respective scene information and select the reference frame. Alternatively, only those image frames are checked which potentially provide improvement to the acquisition accuracy and consistency.
  • From the reference frame at least one or more acquisition parameter are determined and the final image is determined from the target frame by use of the determined acquisition parameter. Therein, the acquisition parameter might relate to an auto white balancing (AWB), automatic exposure control (AEC) and/or tone-mapping (TM) parameter.
  • Thus, by the present invention the acquisition parameters of an image frame acquired before capturing the target frame are used in order to increase the consistency and accuracy of color and brightness reproduction of images and videos. Thus, by the present invention more information about the scene in which the camera is operated is used from the previously acquired image frames.
  • Preferably, scene information may include localization information, for the image frame of the stream of image frames and the target frame, e.g. simultaneous localization and mapping (SLAM) data. Thus, by utilizing the SLAM data the camera can easily determine whether there is a match of the scene information by overlap of the SLAM data. Therein, by the SLAM data for example the presence of an object of the target frame which is also present in one of the image frames of the stream of image frames can be determined. Thus, on the basis of the acquired SLAM data, selecting of the reference frame can be performed. Therein, SLAM data can be acquired for a part of the target frame or the complete target frame. Similarly, SLAM data can be acquired for each of the complete image frame or only parts of the respective image frame. By using the SLAM data it is not necessary to use a very large amount of annotated ground-truth data for training any object recognition, thereby reducing the cost for accumulating a high quality training data. Further, by use of the SLAM data, the present invention is not limited to identification of specific and previously trained objects. In particular, by using the SLAM data the method is independent of the respective object which can be any object of the real-world, specific structures, surfaces or shapes which are localized and mapped by the SLAM process. Further, most modern terminals, such as smartphones, tablets or the like, already have SLAM modules implemented, such that the information provided by the SLAM module can be used for identification of the target frame in the present invention.
  • Preferably, the scene information includes depth information or odometry information of the image frame and/or the target frame. Alternatively or additionally, scene information includes a pose of the image sensor, i.e. the camera. Thus, preferably the camera includes one or more of an inertial motion unit (IMU) such as an acceleration sensor, a gyroscope or the like in order to be able to acquire the pose of the camera. Therein, the depth information of the object might be provided by stereo camera measurement, LIDAR or the like. Therein, pose and depth information/odometry information might also be included in the SLAM data.
  • Preferably, selecting a reference frame from the stream of image frames by identifying the scene information of the target frame in the reference frame includes determining an at least partial overlap of the image frame from the stream of image frames with the target frame by the scene information. Thus, by matching the scene information of the target frame and the respective image frame, partial overlap of the scene contents of the target frame and the image frame is determined in order to make sure that use of the at least one acquisition parameter of the selected reference frame to determine the final image is applicable. Thus, by the at least partial overlap, objects present and visible in the target frame are also at least partially present and visible in the respective image frame when the scene information of the target frame coincides with the scene information of the image frame of the stream of image frames.
  • Preferably, scene information include coordinates of the scene and preferably an object of the scene. Selecting the reference frame from the stream of images by identifying the scene information of the target frame includes calculating coordinates of the scene and determining overlap with coordinates of the respective image frame of the stream of image frames. Thus, if there is a sufficient overlap between the scene of the target frame and the respective image frame according to the calculated coordinates, the image frame can be selected as reference frame. Therein, if coordinates of an object are used, the object can be any real-world object, such as shapes, structures, surfaces or the like. The object might be further several real-world objects or parts thereof, only one real-world object or a part thereof. Therein, preferably SLAM data and/or depth information and/or the pose of the image sensor are used in order to calculate the coordinates of the scene or object in the scene. Therein, preferably, the coordinates are calculated in a world coordinate system to be able to be compared between the individual frames and also if the camera is moving or the pose of the camera is changing.
  • Preferably, calculating the coordinates of the scene or object of the scene includes:
    • Acquiring depth information d for pixels (u,v) in the respective image frame and/or the target frame;
    • Determining coordinates in the camera system (Xcam, Ycam, d, 1) preferably by
      Xcam = (u × 4 − px) × d ÷ cx
      and
      Ycam = (v × 4 + 60 − py) × d ÷ cy
      with (px, py) being the principal point of the image sensor, (cx, cy) being the focal length, wherein preferably cx = cy; and
    • Transferring the coordinates to the world coordinate system preferably by
      (X, Y, Z, 1) = (R|t) · (Xcam, Ycam, d, 1)
      with (X, Y, Z, 1) being the coordinates in the world coordinate system and (R|t) the pose of the image sensor.
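  • A minimal numpy sketch of this transformation is given below; the grid scaling factor 4 and the offset 60 are taken from the formula above, while the function and variable names are illustrative assumptions.

```python
import numpy as np

def pixel_to_world(u, v, d, cx, cy, px, py, pose):
    """Map a grid cell (u, v) with depth d into the world coordinate system.

    cx, cy: focal length; px, py: principal point;
    pose:   4x4 camera-to-world matrix (R|t) of the image sensor.
    The factor 4 and the offset 60 map statistics-grid coordinates to sensor
    pixel coordinates, as in the formula above.
    """
    x_cam = (u * 4 - px) * d / cx
    y_cam = (v * 4 + 60 - py) * d / cy
    world = pose @ np.array([x_cam, y_cam, d, 1.0])
    return world[:3]   # (X, Y, Z) in the world coordinate system
```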
  • Preferably, the coordinates of the target frame provided in the world coordinate system are compared with the coordinates of each image frame in the stream of image frames also in the world coordinate system subsequently to determine the partial overlap with the target frame.
  • Preferably, selecting the reference frame includes determining a confidence level of the respective frame for the acquisition parameter and selecting the reference frame if the confidence level is above a preset threshold. Thus, by the confidence level a measure is provided whether the determined at least one or more acquisition parameters of the respective image frame are suitable to be used in order to determine the final image. Only if the confidence level is high enough, i.e. above a preset threshold, the image frame of the stream of image frames is selected as reference image. In particular, the confidence level of the respective image frame to be selected as reference frame needs to be above the confidence level of the target frame in order to provide an improvement of consistency and accuracy of color and brightness reproduction of the image. In particular, if in the stream of image frames no image frame can be found having a confidence level being above the preset threshold the acquisition parameters are determined from the target frame itself.
  • Preferably, the reference frame is selected by the maximum of overlap between the respective image frame of the stream of image frames and the target frame and the confidence level of the respective image frame of the stream of image frames. Thus, an optimum of color and brightness consistency and accuracy can be achieved.
  • Preferably, the confidence value is determined by one or more of a color gamut in particular for AWB, brightness gamut for AEC and/or TM, a hull of the 2D chromaticity for AWB, 1D brightness range for AEC and/or TM, or 3D color histogram for AWB and/or AEC and/or TM. If SLAM data is used to make a rough model about the scene in which the camera is operated, then AWB/AEC/TM parameters from image frames having a higher confidence level can be used to correct the acquisition parameters that result for target frames having a lower confidence level, hence increasing the consistency and accuracy of color and brightness reproduction.
  • Preferably, the image frame from the stream of image frames comprises low resolution images having a resolution lower than the final image and in particular a resolution smaller than 640x480 pixel, more preferably a resolution smaller than 320x240 pixel and more preferably a resolution smaller than 64x48 pixel. Thus, the image frames from the stream of image frames can be easily stored and processed without increase of computational demands on the device.
  • Preferably, the image frames of the stream of image frames are stored in a memory of the camera for subsequent use to determine the acquisition parameters. In particular, if the image frames from the stream of image frames provides low resolution images, the image frames can be easily stored without excessive memory consumption. In particular, only the image frames of the stream of image frames might be stored having a confidence level above a preset threshold. Thus, only those image frames are stored which can be used as reference images while the other image frames of the stream of image frames are disregarded in order to further reduce the demands on memory.
  • Preferably, the camera pose is stored together with the stored image frames of the stream of image frames. Thus, by the pose the coordinates of the object in the respective image frames can be calculated. Further information might be stored together with the image frames of the stream of image frames such as focal length, principal point and depth information.
  • Preferably, the method further comprises:
    Detecting change of illumination between the reference frame and the target frame and adapting the reference frame to the changed illumination before determining the acquisition parameter.
  • Preferably, more than one reference frames are selected wherein the at least one acquisition parameter is determined from the more than one reference frame for example by averaging. In particular, weighted averaging can be used, wherein the acquisition parameter of the more than one reference frame are weighted by their respective confidence value.
  • Preferably, the steps of the method are iteratively repeated for every new target frame of a video stream or a stream of preview-images.
  • In an aspect of the present invention an image signal processor (ISP) is provided. The ISP is configured to perform the steps of the method described before. Preferably, the ISP is connectable to an image sensor to receive image data or image frames. Further, the ISP may be connectable to a SLAM module of a device implementing the ISP which may be a terminal or the like.
  • In an aspect of the present invention a camera device is provided preferably implemented in a mobile terminal. The camera device comprises an image sensor, a processor and a memory storage storing instructions which, when executed by the processor, perform the steps of the method described above.
  • Preferably the camera device comprises a SLAM module to acquire SLAM data to identify the reference frame.
  • Figures
  • The present invention is described in more detail with reference to accompanying figures.
  • The figures show:
  • Figure 1
    a flow diagram of a method according to the present invention,
    Figure 2
    example images of the steps of the method according to the present invention,
    Figure 3
    detailed illustration of a step of the method according to the present invention,
    Figure 4
    a diagram showing another embodiment of the present invention and
    Figure 5
    a camera device according to the present invention.
    Detailed Description
  • The present invention is related to a camera control to improve the consistency and accuracy of color and brightness reproduction of images and videos in particular during automatic white balancing (AWB), automatic exposure control (AEC) and tone-mapping (TM) algorithms.
  • Preferably, the method according to the present invention is implemented in a camera module preferably of a terminal such as a smartphone, tablet or the like. Preferably, the camera module is connected to a processing module for performing the steps of the invention. The processing module might comprise an Image Signal Processor (ISP) or the like. However, the present invention is not restricted to a certain kind of terminals or any specific implementation.
  • Referring to figure 1, the method for camera control to acquire an image is shown.
  • In step S01, a stream of image frames is acquired by an image sensor, wherein the stream of image frames comprises at least one frame.
  • Thus, a stream of image frames comprising at least one frame, and preferably a plurality of subsequent frames, is acquired by an image sensor of the camera.
  • In particular, the stream of image frames might be used as preview of the camera or is part of a video stream captured. In particular, the image frames of the stream of image frames have a low resolution, preferably lower than 640x480 pixels, more preferably a resolution smaller than 320x240 pixels and more preferably a resolution smaller than 64x48 pixels. Alternatively, the image frames are 3A statistics instead of original raw frames in order to reduce memory consumption, for example a 2D RGB grid that represents the linearized raw camera RGB image frame.
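  • For illustration only, the following non-limiting Python sketch shows how such a low resolution 2D RGB grid could be derived from a linearized raw RGB frame by block averaging; the function name rgb_grid_statistics and the default grid size are illustrative assumptions and not part of the method as such.

    import numpy as np

    def rgb_grid_statistics(linear_rgb, grid_h=48, grid_w=64):
        # Reduce a linearized raw RGB frame (H x W x 3) to a small 2D RGB grid
        # by block averaging, so that only the grid needs to be stored per frame.
        h, w, _ = linear_rgb.shape
        ys = np.linspace(0, h, grid_h + 1, dtype=int)
        xs = np.linspace(0, w, grid_w + 1, dtype=int)
        grid = np.empty((grid_h, grid_w, 3), dtype=np.float32)
        for i in range(grid_h):
            for j in range(grid_w):
                block = linear_rgb[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                grid[i, j] = block.reshape(-1, 3).mean(axis=0)
        return grid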
  • In step S02, a target frame is acquired by the image sensor.
  • Therein, selection of the target frame might be performed by user interaction such as pushing a trigger button to start recording a video or acquiring an image. Alternatively, the target frame is determined by the next frame of a video stream to be captured or the next frame of a preview. Thus, the target frame is the raw data of the image intended by the user to be captured.
  • In step S03, scene information of the target frame is determined preferably by the processing module or ISP.
  • Therein, scene information includes any information about the scene of the target frame. Scene information can be determined for parts of the target frame or for the complete target frame. Similarly, in order to identify the scene information in the respective image frame of the stream of image frames, scene information of a part of the image frame or of the complete image frame can be determined to identify a match of the scene information.
  • In step S04 a reference frame is selected from the stream of image frames by identifying the scene information of the target frame in the reference frame preferably by the processing module or ISP.
  • Each frame of the stream of image frames is checked for at least a partial overlap between the scene information of the target frame and the respective image frame, i.e. whether scene content of the target frame is partially or completely present in the respective image frame. Alternatively, only those image frames are checked which potentially provide an improvement in acquisition accuracy and consistency. If the scene information can be identified in one of the frames of the stream of image frames, this frame of the stream of image frames is selected and taken as reference frame. Therein, the method preferably goes through the image frames of the stream of image frames consecutively to identify the respective scene information and select the reference frame. Thus, overlap between the target frame and the respective image frame of the stream of image frames is determined by the scene information in order to identify a possible reference frame, which is selected if sufficient overlap is determined.
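  • For illustration only, a non-limiting Python sketch of this selection loop is given below; the attribute name scene_info, the overlap comparison function and the threshold min_overlap are illustrative assumptions.

    def select_reference_frame(stored_frames, target_scene_info, overlap, min_overlap=0.3):
        # Walk the stored image frames in reverse order of acquisition and return
        # the first frame whose scene content sufficiently overlaps the target frame.
        # 'overlap' is any scene-information comparison, e.g. the fraction of the
        # target frame's SLAM points that are visible in the candidate frame.
        for frame in reversed(stored_frames):
            if overlap(frame.scene_info, target_scene_info) >= min_overlap:
                return frame
        return None  # no usable reference frame; fall back to the target frame's own parameters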
  • In Step S05 at least one acquisition parameter of the reference frame is determined preferably by the processing module or ISP. Therein the at least one acquisition parameter might be an auto white balancing (AWB), automatic exposure control (AEC) and/or tone-mapping (TM) parameter determined from the reference frame.
  • Preferably, more than one reference frame is selected, wherein the at least one acquisition parameter is determined from the more than one reference frame, for example by averaging. In particular, all reference frames that have a match score above a certain level can be selected. In particular, weighted averaging can be used, wherein the acquisition parameters of the more than one reference frame are weighted by their respective confidence values. Thus, more information from previous frames can be used to determine the acquisition parameter of the target frame, providing a more reliable result.
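  • For illustration only, a non-limiting Python sketch of such confidence-weighted averaging is given below; the attribute names parameter and confidence are illustrative assumptions.

    import numpy as np

    def weighted_acquisition_parameter(reference_frames):
        # Combine an acquisition parameter (e.g. the WB RGB gains) from several
        # reference frames, weighting each frame by its confidence value.
        params = np.array([f.parameter for f in reference_frames], dtype=np.float64)
        weights = np.array([f.confidence for f in reference_frames], dtype=np.float64)
        return np.average(params, axis=0, weights=weights)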
  • In step S06 a final image is determined from the target frame by the at least one acquisition parameters preferably by the processing module or ISP.
  • Therein, the target frame contains raw data and, as soon as the respective acquisition parameter is determined, the raw data of the target frame is processed by use of the one or more acquisition parameters from the reference frame.
  • Thus, by the present invention the acquisition parameters of an image frame acquired before capturing the target frame are used in order to increase the consistency and accuracy of color and brightness reproduction of images and videos. Thus, by the present invention more information about the scene in which the camera is operated is used from the previously acquired image frames.
  • In step S04, localization information, and more preferably SLAM data, might be used as scene information to make a rough model of the scene in which the camera is operated in order to determine the reference frame including at least partially the same scene content as the target frame. Then AWB/AEC/TM parameters from frames having a higher confidence level can be used to correct the parameters that result for the target frame having a lower confidence level, hence increasing the consistency and accuracy of color and brightness reproduction. Thus, by utilizing the SLAM data, the camera can easily determine whether scene information of the target frame is also present in one of the image frames of the stream of image frames, i.e. whether there is at least a partial overlap in the scene content between the respective image frame and the target frame. Thus, on the basis of the acquired SLAM data, selection of the reference frame can be performed. In particular, by using the SLAM data as scene information, the method is independent of a particular object to be recognized, and any object of the real world, such as structures, surfaces or shapes which are localized and mapped by the SLAM process, can be used to determine overlap between the target frame and the respective image frame. Further, most modern terminals, such as smartphones, tablets or the like, already have SLAM modules implemented, such that the information provided by the SLAM module can be used for identification of the reference frame in the present invention.
  • Therein, the method can be implemented in an iterative process and repeated for each new target frame being a frame of a video stream or a preview, thereby continuously improving the image reproduction.
  • Referring to figure 2, the steps for acquiring a final image are shown. Therein, figure 2 refers to the implementation for an AWB algorithm. However, the present method can alternatively, or at the same time, also be implemented in an AEC or TM algorithm as mentioned above.
  • In picture A an initial image is acquired, wherein acquisition parameters related to the AWB are determined for the initial image by an auto white balancing algorithm and applied in picture B to achieve a correctly adjusted picture. Therein, simultaneous localization and mapping of the content of picture B is performed by a SLAM algorithm, and for the scene of the respective image frame a point cloud is determined as scene information. These steps are repeated for each image frame of the stream of image frames, including the pictures A to E of figure 2.
  • Picture C shows a closer look at the respective objects in the scene, obtained by moving the camera closer to the object 14 or by zooming in. Therein, object 14 is present in both image frames B and C, wherein points 14 of the point cloud mark the object 14. Similarly, other objects are detected by other points 10 of the point cloud.
  • Picture D shows the same object 14 even closer, thereby reducing the color gamut of the image. Picture E only contains the object 14, and almost all color information is drawn directly from the object 14 itself, leading to a low color gamut being available as information for determining the respective AWB parameter of picture E. As clearly visible in the comparison between pictures B-D and E, and shown in detail in pictures F and G, the AWB algorithm might fail, resulting in wrong colors of the object 14 as shown in picture F.
  • In picture B of Figure 2 the image has a high color gamut and thus a high confidence level can be achieved for the acquisition parameters related to the AWB parameters. Further, the target frame shown in picture E has full overlap with the content of picture B since both show the object 14.
  • Thus, by the method of the present invention, the scene information of picture E including object 14 is identified subsequently in each of the images D, C and B, in reversed order of acquisition, until a picture is reached which has a high confidence level regarding the AWB parameter and still has an overlap in the scene content, i.e. shows the object 14. Therein, it is not necessary that the image frame includes object 14 completely; a partial overlap of the scene content between the target frame of picture E and the possible reference frame might also be sufficient to improve color reproduction. Further, the present method is not limited to specific objects, and any object as scene content, such as surfaces, shapes, structures or the like, can be used as scene information. Although Fig. 2 shows object 14 as an example, other objects or parts of objects are also possible. This comparison and identification of overlap by the scene information between the image frames of the plurality of image frames B-D and the target frame E is preferably performed by acquiring SLAM data as scene information for each of the pictures B to E. Thereby, the SLAM data of object 14 can be identified in the other frames by the world coordinates of the object 14 determined by the SLAM algorithm in order to determine overlap. Thus, in the example of figure 2, picture C is used as reference frame and the AWB parameters determined for picture C are also used for the AWB of picture E, leading to picture E having a corrected AWB and producing correct colors, thereby improving color consistency and accuracy for the object 14. The corrected AWB produces the result shown in picture G of figure 2, having the correct color and not suffering from the reduced color information provided by the picture E itself.
  • The steps for determining the coordinates of the scene, or an object within the scene, of the target frame and the respective image frames are illustrated in figure 3. Figure 3 shows a world coordinate system 22. In a first step, upon acquiring a frame 20 of a stream of image frames which might be used as reference frame, by the acquired depth information or odometry information, coordinates of the object 14 in the image frame 20 can be determined in the camera coordinate system 26 of the camera in a first state/position denoted by "cam1". By the pose (R1, t1) of the camera "cam1" and the coordinates of the object 14 in the camera coordinate system 26 of "cam1", coordinates of the object 14 in the world coordinate system 22 can be determined. Therein, it is not necessary to have a real-world object as exemplified in figure 3. Instead, any object, surface, shape or structure can be used and its coordinates can be determined to determine overlap between the target frame and the respective image frame. Further, coordinates of a plurality of objects present in the scene, parts of a plurality of objects in the scene, or a part of only one object in the scene can be used in order to determine overlap between the target frame and the respective image frame.
  • Similarly, for the target frame 32, according to the depth information provided by a 3D point cloud 34 of the camera in the camera state denoted by "cam2", the coordinates of the object 14 in the target frame 32 can be determined in the camera coordinate system 30 of "cam2". By the pose (R2, t2) of the camera "cam2" and the coordinates of the object 14 in the camera coordinate system 30 of "cam2", coordinates of the object 14 in the world coordinate system 22 can be determined. Thus, overlap between the target frame 32 and the frame 20 can be determined. Therein, in the example of figure 3, overlap is determined by a set of the 3D points of the 3D point cloud 34 in the world coordinate system that are visible in both the target and the reference frame, and no distinction is made regarding which object(s) these points belong to. The 3D point cloud may be determined from the depth information, the camera position and/or the camera orientation information (camera pose), as exemplified in more detail below.
  • Alternatively, the coordinates of the object 14 can be determined in the world coordinate system only for the target frame 32 of "cam2". The 3D point cloud 34 of the target frame 32 is available in the world coordinate system; the depth information/map, camera position and/or camera pose of the target frame 32 was used to construct this 3D point cloud 34. For image frame 20, based on the camera pose and/or camera position in image frame 20, it is determined which area of the image frame 20 covers those 3D points of the 3D point cloud 34. Thus, depth information of the image frame 20 may not be available, and only the overlap of the scene or object of the target frame 32 with the image frame 20 is determined, without the need to calculate coordinates of the whole image frame 20 in the world coordinate system.
  • Therein, the coordinates of each pixel in the target frame might be translated into the world coordinate system 22. Alternatively, coordinates are determined only for certain points of the target frame. Similarly, for the respective image frame from the stream of image frames, either coordinates in the world coordinate system are determined for each of the pixels, or, alternatively, the coordinates are determined for a selection of pixels of the respective image frame and translated into the world coordinate system 22 in order to identify overlap between the target frame, or the object in the target frame, and the respective image frame.
  • Due to the SLAM data acquired for the image frames in the stream of image frames, including at least the depth information, i.e. odometry information, the coordinates of the scene or object 14 of the target frame 32 can be translated into the world coordinate system 22 and can then be compared with the world coordinates of the scene or object 14 of the reference frame 20 in order to determine whether the object 14 is present in the target frame 32 and the reference frame 20. Only if there is an overlap, i.e. the object 14 is at least partially visible in the respective image frame, is this frame considered for use as reference frame. The acquisition parameters of the determined reference frame are used in order to produce the final image. Therein, for each frame, it is checked whether there is at least a partial overlap of the scene with earlier frames. If yes, then it is checked whether the earlier frames have a higher confidence level for the acquisition parameters available (separately for AWB, AEC and TM).
  • Referring to figure 4, the system contains three parts in total. The first part is running SLAM 48 on the device with SLAM input data 46 from image, IMU and depth data for camera pose estimation and scene modelling 50, acquiring a depth map or depth information. During this process, a sequence of image frames is captured and stored 40. The stored frames could also be low resolution 3A statistics instead of original raw frames in order to reduce memory consumption, for example a 2D RGB grid that represents the linearized raw camera RGB image. Also the corresponding camera pose is stored for each frame, which is a 4 × 4 matrix, along with other image metadata such as the camera's focal length (cx,cy), principal point (px,py), and uncorrected algorithm parameters 42 such as AWB gains. The depth data or odometry data are collected at the same time.
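  • For illustration only, a non-limiting Python sketch of the per-frame record that could be stored in this first part is given below; the class and field names are illustrative assumptions.

    from dataclasses import dataclass
    from typing import Optional
    import numpy as np

    @dataclass
    class StoredFrame:
        # Per-frame data kept for later reuse as a potential reference frame.
        rgb_grid: np.ndarray          # low resolution linearized raw RGB statistics (2D RGB grid)
        pose: np.ndarray              # 4 x 4 camera pose matrix (R|t)
        focal_length: tuple           # (cx, cy), naming as used in the description
        principal_point: tuple        # (px, py)
        uncorrected_params: dict      # e.g. {"awb_gains": ..., "aec": ..., "tm": ...}
        confidence: float             # algorithm confidence value 44 of this frame
        depth: Optional[np.ndarray] = None   # depth map / odometry data, if available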
  • An algorithm confidence value 44 is calculated for each frame; for example, the color gamut, the convex hull of the 2D chromaticity or a 3D color histogram could be used as the confidence metric for AWB/AEC/TM, because more colors visible inside the FOV usually make the scene easier for AWB and also make it easier for AEC and TM to estimate the correct brightness of objects in relation to other objects in the scene. The convex hull should be calculated from image data in a device independent color space to enable using the same thresholds for high and low confidence for all devices. Higher confidence frames are the potential reference frames that can be utilized for correction of low confidence frames.
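  • For illustration only, a non-limiting Python sketch of a convex-hull-based confidence metric is given below; it assumes the input statistics are already in a device independent color space, and the function name awb_confidence is an illustrative assumption.

    import numpy as np
    from scipy.spatial import ConvexHull

    def awb_confidence(rgb_grid, eps=1e-6):
        # Confidence metric: area of the convex hull of the 2D chromaticities
        # (R/G, B/G) of the frame; a wider gamut of visible colors gives a
        # higher confidence value.
        rgb = rgb_grid.reshape(-1, 3).astype(np.float64)
        g = np.maximum(rgb[:, 1], eps)
        chroma = np.stack([rgb[:, 0] / g, rgb[:, 2] / g], axis=1)
        try:
            return ConvexHull(chroma).volume   # for 2D points, .volume is the hull area
        except Exception:
            return 0.0                         # degenerate case: essentially a single color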
  • A decision 52 is made whether the respective image frame has a high confidence level. If the image frame has a high confidence level, the image frame is stored to be used later as reference frame for a video stream, preview or picture. For the final image of the high confidence frames, the uncorrected AWB/AEC/TM parameters are used to generate the final image.
  • If in the decision the image frame has a low confidence level for the AWB/AEC/TM parameters, the system will retrieve the depth data and construct a depth map or 3D point cloud 58 as scene information. In order to build the 3D point cloud, each pixel (u,v) in the depth map first needs to be transferred into the camera coordinate system by using the projective camera intrinsic matrix information as below,
    X_cam = (u × 4 − px) × d ÷ cx
    Y_cam = (v × 4 + 60 − py) × d ÷ cy
    where d is the real depth value from the depth map. After that the 3D points could be obtained by the following equation:
    [X, Y, Z, 1]^T = [R|t] · [X_cam, Y_cam, d, 1]^T
    where (R|t) is the estimated camera pose.
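  • For illustration only, a non-limiting Python sketch of this back-projection into the world coordinate system is given below, following the equations above; the function and parameter names, and the handling of the grid scale factor 4 and the offset 60 as arguments, are illustrative assumptions.

    import numpy as np

    def depth_to_world_points(depth, pose, px, py, cx, cy, scale=4, v_offset=60):
        # Back-project every depth map pixel (u, v) with depth d into the world
        # coordinate system; (px, py) is the principal point, (cx, cy) the focal
        # length (naming as in the description) and 'pose' is the 4 x 4 camera pose (R|t).
        h, w = depth.shape
        v, u = np.mgrid[0:h, 0:w]
        d = depth.astype(np.float64)
        x_cam = (u * scale - px) * d / cx
        y_cam = (v * scale + v_offset - py) * d / cy
        pts_cam = np.stack([x_cam, y_cam, d, np.ones_like(d)], axis=-1).reshape(-1, 4)
        pts_world = (pose @ pts_cam.T).T        # homogeneous world coordinates
        return pts_world[:, :3]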
  • The next step is to verify whether the contents of the target frame i are visible in the most recent potential reference frames 60, using the data 62 of all acquired potential reference frames (or any of the high confidence frames that are identified to belong to the same physical space in which the camera is currently operated). The 3D points of the target frame determined before are projected back to the potential reference frame j by following the steps described above in reverse, replacing (R|t) with the potential reference frame's camera pose. Frame j is selected as reference frame based on maximizing the proportion of the low confidence frame i that is visible in the reference frame j (c_common_area(i,j)) and maximizing the confidence level of the reference frame j (c_confidence(j)). According to one embodiment of the invention the maximized value is the product c_common_area(i,j) × c_confidence(j), but other implementations are also possible.
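  • For illustration only, a non-limiting Python sketch of this selection criterion is given below; the helper project_to_frame (the reverse projection with the candidate frame's pose described above) and the attribute names width, height and confidence are illustrative assumptions.

    def select_reference(target_points_world, candidates, project_to_frame):
        # Select the reference frame j that maximizes
        # c_common_area(i, j) * c_confidence(j): the proportion of the target
        # frame's 3D points visible in candidate j, times j's confidence value.
        best, best_score = None, 0.0
        for cand in candidates:
            uv = project_to_frame(target_points_world, cand)     # N x 2 pixel coordinates
            inside = ((uv[:, 0] >= 0) & (uv[:, 0] < cand.width) &
                      (uv[:, 1] >= 0) & (uv[:, 1] < cand.height))
            score = inside.mean() * cand.confidence
            if score > best_score:
                best, best_score = cand, score
        return best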
  • Once the reference frame j is selected, the system moves to the third part. AWB is here used as an example algorithm. Automatic White Balancing (AWB) is the camera control algorithm that estimates the chromaticity of the illumination and calculates the White Balancing (WB) RGB gains that enable correct and consistent color reproduction of object colors regardless of the prevailing illumination, hence achieving color constancy. For example, white objects are reproduced as white regardless of the color of the illumination (if chromatic adaptation processing is excluded). The effect of WB on an image RGB pixel can be illustrated by
    x' = C · G · x,
    where x is the 3×1 vector that corresponds to the linearized raw camera RGB value, x' is the resulting white balanced and color converted value, G is the diagonal 3×3 WB RGB gain matrix (the diagonal values are the WB RGB gains), and C is the 3×3 color space conversion matrix to convert from linearized raw camera RGB to device independent linear RGB.
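  • For illustration only, a non-limiting Python sketch of applying the WB gains and the color space conversion to linearized raw RGB pixels is given below; the function name is an illustrative assumption.

    import numpy as np

    def apply_wb_and_ccm(raw_rgb, wb_gains, ccm):
        # out = C * G * x for every pixel, with G = diag(wb_gains) the 3 x 3 WB gain
        # matrix and C (ccm) the 3 x 3 conversion to device independent linear RGB.
        G = np.diag(wb_gains)
        x = raw_rgb.reshape(-1, 3).T            # 3 x N matrix of linearized raw RGB pixels
        out = ccm @ G @ x
        return out.T.reshape(raw_rgb.shape)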
  • An illumination change between frames i and j is detected 64 by comparing the linearized raw pixel RGB average values common_area_avg_rgb(i) and common_area_avg_rgb(j) that belong to the same object surfaces visible in both frames and that have been normalized to eliminate the impact of any difference in exposure (both are 3x1 RGB vectors). Therein, each point of the 3D point cloud 34 as shown in figure 3 has a corresponding RGB value in both the target and the reference frame. These are the points from which the common_area_avg_rgb values are calculated for each frame. If the Euclidean distance or another difference metric diff(common_area_avg_rgb(i), common_area_avg_rgb(j)) is larger than a certain threshold common_area_similarity_thr, then an illumination change is considered to be detected; otherwise the illumination is considered unchanged.
  • A decision 66 is made whether an illumination change could be detected.
    1. If no illumination change is detected between target frame i and the higher confidence reference frame j, then the WB gains of frame j can be used for frame i 68, and regular temporal filtering might simply be applied on top to ensure smooth parameter changes between frames.
    2. If an illumination change is detected, then the WB RGB gains of the higher confidence reference frame j need to be corrected 70 according to the illumination change before being applied to target frame i. The correction factor (a 3x1 vector) correction_factor = common_area_avg_rgb(j) / common_area_avg_rgb(i) is used as a multiplier for the WB RGB gains of frame j before applying them to frame i; a non-limiting sketch of this decision is given below.
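  • For illustration only, the following non-limiting Python sketch combines the two cases above; the default threshold value for common_area_similarity_thr is an illustrative assumption.

    import numpy as np

    def correct_wb_gains(common_area_avg_rgb_i, common_area_avg_rgb_j, wb_gains_j,
                         common_area_similarity_thr=0.05):
        # Detect an illumination change from the exposure-normalized average RGB of
        # the common area and, if needed, correct the reference frame's WB gains
        # before applying them to the target frame i.
        diff = np.linalg.norm(common_area_avg_rgb_i - common_area_avg_rgb_j)
        if diff <= common_area_similarity_thr:
            return wb_gains_j                                    # case 1: no illumination change
        correction_factor = common_area_avg_rgb_j / common_area_avg_rgb_i   # 3x1 vector
        return wb_gains_j * correction_factor                    # case 2: corrected gains for frame i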
  • What is described here for AWB might also be applied to AEC or TM. The corrected AWB/AEC/TM parameters determined for the respective reference frame j are used and applied 72 to the target frame to achieve high color accuracy and consistency.
  • Referring to figure 5, a camera device 100 is shown, implemented in a terminal such as a smartphone, tablet or the like. The camera device 100 comprises a processor 102 and a memory 104. Therein, the memory 104 stores instructions which, when executed by the processor 102, carry out the steps of the method described above. Therein, the camera device 100 might further comprise or be connected to an image sensor to acquire image data to be used in the method of the present invention. Further, the camera device might comprise or might be connected to a SLAM module. Therein, the camera device 100 might have an individual SLAM module, or a SLAM module implemented in the terminal device is used by the camera device 100. In figure 5, for illustration purposes, the camera device 100 is shown together with the image sensor 106 and the SLAM module 108 as integrated components of the terminal.
  • Thus, by using the SLAM data/depth information provided by the SLAM module of the terminal or camera, more information about the respective scene can be used, and scene information can be identified in different frames. In order to improve the consistency and accuracy of color reproduction, the acquisition parameters of frames having a higher confidence level are used.

Claims (17)

  1. Method for camera control to acquire an image including:
    Acquiring (S01) a stream of image frames by an image sensor comprising at least one frame;
    Acquiring (S02) a target frame by the image sensor;
    Determining (S03) scene information of the target frame;
    Selecting (S04) at least one reference frame from the stream of image frames by identifying the scene information of the target frame in the reference frame;
    Determining (S05) at least one acquisition parameter of the reference frame; and
    Determining (S06) a final image from the target frame by the at least one acquisition parameter.
  2. Method according to claim 1, wherein the scene information includes localization information and preferably simultaneous localization and mapping, SLAM, data for the image frame and the target frame.
  3. Method according to claim 1 or 2, wherein the scene information includes depth information of the image frame and/or the target frame and/or pose of the image sensor.
  4. Method according to any of claims 1 to 3, wherein selecting a reference frame from the stream of image frames by identifying the scene information of the target frame in the reference frame includes determining an at least partial overlap of the image frame from the stream of image frames with the target frame by the scene information.
  5. Method according to claim 4, wherein the scene information includes coordinates of the scene, and wherein selecting a reference frame from the stream of image frames by identifying the scene information of the target frame includes calculating coordinates of the target frame and determining an at least partial overlap with coordinates in the respective image frames of the stream of image frames.
  6. Method according to claim 5, wherein calculating coordinates of the scene includes:
    Acquiring depth information d for a pixel (u,v) in the frame;
    Determining coordinates in the camera coordinate system (Xcam, Ycam, d, 1), preferably by
    X_cam = (u × 4 − px) × d ÷ cx
    and
    Y_cam = (v × 4 + 60 − py) × d ÷ cy
    with (px, py) being the principal point of the image sensor and (cx, cy) being the focal length; and
    Transferring the coordinates to the world coordinate system, preferably by
    [X, Y, Z, 1]^T = [R|t] · [X_cam, Y_cam, d, 1]^T
    with (X, Y, Z, 1) being the coordinates in the world coordinate system and (R|t) the pose of the image sensor.
  7. Method according to claim 6, including comparing the coordinates in the world coordinate system of the object in the target frame with each image frame from the stream to determine the at least partial overlap.
  8. Method according to any of claims 1 to 7, wherein selecting a reference frame includes determining a confidence level of the respective frame for the acquisition parameter and selecting the reference frame if the confidence level is above a preset threshold.
  9. Method according to any of claims 1 to 8, wherein the reference frame is selected by the maximum of overlap and confidence value of the respective image frame of the stream of image frames.
  10. Method according to any of claims 1 to 9, wherein the confidence value is provided by one or more of color gamut, brightness gamut, a hull of the 2D chromaticity, 1D brightness range or 3D color histogram.
  11. Method according to any of claims 1 to 10, wherein the image frames from the stream of image frames comprise low resolution images having a resolution lower than the final image, or 3A statistics of the raw image frame.
  12. Method according to any of claims 1 to 11, wherein image frames of the stream of image frames are stored, and preferably those image frames of the stream of image frames having a confidence level above a preset threshold are stored.
  13. Method according to claim 12, wherein the camera pose is stored together with the stored image frames of the stream of image frames.
  14. Method according to any of claims 1 to 13, wherein the method further comprises:
    Detecting change of illumination between the reference frame and the target frame and adapting the reference frame to the changed illumination before determining the acquisition parameter.
  15. Method according to any of claims 1 to 14, wherein the steps of the method are repeated for every new target frame of a video stream or a stream of preview-images.
  16. Image Signal Processor, ISP, configured to perform the steps of the method according to claims 1 to 15.
  17. Camera device comprising a processor and a memory storage storing instructions which, when executed by the processor, perform the steps of the method according to any of claims 1 to 15.



