WO2019047983A1 - Image processing method and device, electronic device and computer-readable storage medium - Google Patents

Image processing method and device, electronic device and computer-readable storage medium

Info

Publication number
WO2019047983A1
WO2019047983A1 (PCT/CN2018/105102)
Authority
WO
WIPO (PCT)
Prior art keywords
image
predetermined
merged
depth
background image
Prior art date
Application number
PCT/CN2018/105102
Other languages
English (en)
Chinese (zh)
Inventor
张学勇
Original Assignee
Oppo广东移动通信有限公司
Priority date
Filing date
Publication date
Priority claimed from CN201710813594.1A external-priority patent/CN107590795A/zh
Priority claimed from CN201710814395.2A external-priority patent/CN107704808A/zh
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Publication of WO2019047983A1 publication Critical patent/WO2019047983A1/fr

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00: 3D [Three Dimensional] image rendering
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00: Image enhancement or restoration
    • G06T5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/50: Depth or shape recovery
    • G06T7/521: Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00: Details of television systems
    • H04N5/222: Studio circuitry; Studio devices; Studio equipment
    • H04N5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265: Mixing

Definitions

  • the present invention relates to the field of image processing technologies, and in particular, to an image processing method and apparatus, an electronic device, and a computer readable storage medium.
  • Existing image fusion usually combines a real person with a background, but such a fusion method is of limited interest.
  • Embodiments of the present invention provide an image processing method, an image processing apparatus, an electronic device, and a computer readable storage medium.
  • the image processing method of the embodiment of the present invention is for processing a merged image, which is formed by fusing a predetermined background image with a person region image in a scene image of a current user in a real scene.
  • the image processing method includes: identifying a specific object in the merged image; fusing a predetermined sound model that matches the specific object with the merged image to output a sound image.
  • An image processing apparatus is configured to process a merged image, the merged image being formed by fusing the predetermined background image with a person region image in a scene image of a current user in a real scene.
  • the image processing apparatus includes a processor for: identifying a specific object in the merged image; fusing a predetermined sound model that matches the specific object with the merged image to output a sound image.
  • An electronic device of an embodiment of the invention includes one or more processors, a memory, and one or more programs. Wherein the one or more programs are stored in the memory and are configured to be executed by the one or more processors, the program comprising instructions for performing the image processing method described above.
  • a computer readable storage medium in accordance with an embodiment of the present invention includes a computer program for use in conjunction with an electronic device capable of imaging, the computer program being executable by a processor to perform the image processing method described above.
  • The image processing apparatus, the electronic device, and the computer readable storage medium of the embodiments of the present invention fuse the person region image with the predetermined background image to form a merged image, identify a specific object in the predetermined background image of the merged image, and determine a predetermined sound model matching the identified specific object so as to fuse the predetermined sound model with the merged image and output a sound image. In this way, the user can also hear sound while viewing the merged image, which enhances the interest of the image fusion.
  • FIG. 1 is a flow chart of an image processing method according to some embodiments of the present invention.
  • FIG. 2 is a schematic diagram of an image processing apparatus in accordance with some embodiments of the present invention.
  • FIG. 3 is a schematic structural view of an electronic device according to some embodiments of the present invention.
  • FIG. 4 is a flow chart of an image processing method according to some embodiments of the present invention.
  • FIG. 5 is a schematic flow chart of an image processing method according to some embodiments of the present invention.
  • FIG. 6 is a flow chart of an image processing method according to some embodiments of the present invention.
  • FIGS. 7(a) through 7(e) are schematic diagrams of scenes of structured light measurement in accordance with one embodiment of the present invention.
  • FIGS. 8(a) and 8(b) are schematic diagrams of scenes of structured light measurement in accordance with one embodiment of the present invention.
  • FIG. 9 is a flow chart of an image processing method according to some embodiments of the present invention.
  • FIG. 10 is a flow chart of an image processing method according to some embodiments of the present invention.
  • FIG. 11 is a flow chart of an image processing method according to some embodiments of the present invention.
  • FIG. 12 is a flow chart of an image processing method according to some embodiments of the present invention.
  • FIG. 13 is a flow chart of an image processing method according to some embodiments of the present invention.
  • FIG. 14 is a schematic diagram of an image processing apparatus according to some embodiments of the present invention.
  • FIG. 15 is a schematic diagram of an electronic device in accordance with some embodiments of the present invention.
  • an image processing method is used to process a merged image.
  • The merged image is formed by fusing a predetermined background image and a person region image.
  • The person region image is an image of the region where the current user is located in a scene image captured of the current user in the real scene.
  • The image processing method includes:
  • 03: identifying a specific object in the merged image; and
  • 04: fusing a predetermined sound model matched with the specific object with the merged image to output a sound image.
  • an image processing method may be implemented by the image processing apparatus 100 of the embodiment of the present invention.
  • the image processing apparatus 100 is for processing a merged image.
  • The merged image is formed by fusing a predetermined background image and a person region image.
  • The person region image is an image of the area where the current user is located in a scene image captured of the current user in the real scene.
  • the image processing apparatus 100 includes a processor 20. Both step 03 and step 04 can be implemented by processor 20. That is, the processor 20 can be used to identify a particular object in the merged image, and fuse the predetermined sound model that matches the particular object with the merged image to output the sound image.
  • an image processing apparatus 100 according to an embodiment of the present invention may be applied to an electronic apparatus 1000 according to an embodiment of the present invention. That is, the electronic device 1000 of the embodiment of the present invention includes the image processing device 100 of the embodiment of the present invention.
  • the electronic device 1000 includes a mobile phone, a tablet computer, a notebook computer, a smart bracelet, a smart watch, a smart helmet, smart glasses, and the like.
  • the predetermined background image may be a predetermined two-dimensional background image or a predetermined three-dimensional background image.
  • the predetermined background image may be randomly assigned by the processor 20 or selected by the current user.
  • The image processing apparatus 100 and the electronic device 1000 of the embodiments of the present invention fuse the person region image with the predetermined background image (a predetermined two-dimensional background image or a predetermined three-dimensional background image) to form a merged image, identify a specific object in the predetermined background image of the merged image, and determine a predetermined sound model matching the identified specific object so as to fuse the predetermined sound model with the merged image and output a sound image. Therefore, the user can also hear sound while viewing the merged image, which enhances the interest of the image fusion, gives the user an immersive feeling, and improves the user experience.
  • specific objects include animals, plants, running water, raindrops, musical instruments, fire, sky, roads, automobiles, and the like.
  • When the predetermined background image (predetermined two-dimensional background image or predetermined three-dimensional background image) in the merged image includes trees, the processor 20 may identify the trees in the merged image as follows: the processor 20 first performs color feature extraction on the merged image or the predetermined background image based on a color histogram in the RGB space, then performs texture feature extraction based on a Gabor filter, and finally determines that trees are present in the merged image according to the combination of the color feature and the texture feature, as in the sketch below.
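  • The following is a minimal sketch (not the patent's implementation) of this kind of color-plus-texture feature extraction, assuming OpenCV and NumPy; the histogram bin count, the Gabor kernel parameters, and the downstream classifier are illustrative assumptions.

```python
import cv2
import numpy as np

def color_texture_features(image_bgr, hist_bins=8):
    """Concatenate a per-channel color histogram with Gabor filter statistics."""
    # Color feature: histogram over each channel of the color space, normalized.
    hists = [cv2.calcHist([image_bgr], [c], None, [hist_bins], [0, 256]) for c in range(3)]
    color_feat = np.concatenate(hists).ravel()
    color_feat /= color_feat.sum() + 1e-8

    # Texture feature: mean/variance of responses to a small Gabor filter bank.
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    texture_feat = []
    for theta in np.arange(0, np.pi, np.pi / 4):   # four orientations (assumed)
        kernel = cv2.getGaborKernel((21, 21), 4.0, theta, 10.0, 0.5, 0, ktype=cv2.CV_32F)
        response = cv2.filter2D(gray, cv2.CV_32F, kernel)
        texture_feat += [float(response.mean()), float(response.var())]

    # The combined vector is what a downstream classifier would use to decide
    # whether trees (or another specific object) appear in the merged image.
    return np.concatenate([color_feat, np.asarray(texture_feat)])
```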
  • After the trees are identified, the processor 20 may select a predetermined sound model of wind blowing through the trees and rustling their leaves, and fuse it with the merged image.
  • To identify an animal, the processor 20 may convert the merged image from the RGB space into the HSV space, compute a color histogram of the HSV merged image, and use the low-order statistical moments of the color histogram as the feature descriptor.
  • The K-nearest-neighbor method is then used to judge the category of the merged image, that is, whether an animal is present in the merged image and, if so, which animal it is; finally, a predetermined sound model matching the recognized animal is selected. A minimal sketch of this classification step follows.
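  • A minimal sketch of the HSV color-moment descriptor and the K-nearest-neighbor decision described above, assuming OpenCV, NumPy, and scikit-learn are available; the labeled training set and the neighbor count are hypothetical inputs, not part of the patent.

```python
import cv2
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def hsv_color_moments(image_bgr):
    """Low-order statistical moments (mean, std, skewness) of each HSV channel,
    used as the feature descriptor of the merged image."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV).astype(np.float64)
    feats = []
    for c in range(3):
        channel = hsv[:, :, c].ravel()
        mean = channel.mean()
        std = channel.std()
        skew = np.cbrt(((channel - mean) ** 3).mean())  # third-order moment, cube-rooted
        feats += [mean, std, skew]
    return np.asarray(feats)

def classify_animal(merged_image_bgr, train_features, train_labels, k=5):
    """K-nearest-neighbor decision on the color-moment features.
    `train_features`/`train_labels` are a hypothetical labeled set of example
    images containing known animals (or no animal at all)."""
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(train_features, train_labels)
    return knn.predict([hsv_color_moments(merged_image_bgr)])[0]
```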
  • The image processing method of the embodiment of the present invention further includes:
  • 021: acquiring a scene image of the current user;
  • 022: acquiring a depth image of the current user;
  • 023: processing the scene image and the depth image to extract the person region of the current user in the scene image to obtain a person region image; and
  • 024: fusing the person region image with the predetermined background image to obtain the merged image.
  • the image processing apparatus 100 further includes a visible light camera 11 and a depth image acquisition component 12.
  • Step 021 can be implemented by visible light camera 11, and step 022 can be implemented by depth image acquisition component 12.
  • Step 023 and step 024 can be implemented by processor 20.
  • the visible light camera 11 can be used to acquire a scene image of the current user.
  • the depth image acquisition component 12 can be used to acquire a depth image of the current user.
  • The processor 20 is operable to process the scene image and the depth image to extract the person region of the current user from the scene image to obtain the person region image, and to fuse the person region image with the predetermined background image (predetermined two-dimensional background image or predetermined three-dimensional background image) to obtain a merged image.
  • the scene image may be a grayscale image or a color image
  • the depth image represents depth information of each person or object in the real scene in which the current user is located.
  • the scene range of the scene image is consistent with the scene range of the depth image, and each pixel in the scene image can find the depth information corresponding to the pixel in the depth image.
  • Existing methods of segmenting a person from the background mainly divide the person and the background according to the similarity and discontinuity of adjacent pixels in pixel value, but this kind of segmentation is susceptible to environmental factors such as external illumination.
  • The image processing method of the embodiment of the present invention extracts the person region in the scene image by acquiring a depth image of the current user. Since the acquisition of the depth image is not easily affected by factors such as illumination and color distribution in the scene, the person region extracted from the depth image is more accurate, and in particular the boundary of the person region can be accurately delineated. Further, the merged image obtained by fusing the more accurate person region image with the predetermined background image (predetermined two-dimensional background image or predetermined three-dimensional background image) has a better effect.
  • In some embodiments, obtaining the depth image of the current user in step 022 includes:
  • 0221: projecting structured light to the current user;
  • 0222: capturing a structured light image modulated by the current user; and
  • 0223: demodulating phase information corresponding to each pixel of the structured light image to obtain the depth image.
  • depth image acquisition assembly 12 includes a structured light projector 121 and a structured light camera 122.
  • Step 0221 can be implemented by structured light projector 121
  • step 0222 and step 0223 can be implemented by structured light camera 122.
  • the structured light projector 121 can project structured light to the current user
  • The structured light camera 122 can be used to capture the structured light image modulated by the current user, and to demodulate the phase information corresponding to each pixel of the structured light image to obtain the depth image.
  • a structured light image modulated by the current user is formed on the surface of the current user's face and the body.
  • the structured light camera 122 captures the modulated structured light image and demodulates the structured light image to obtain a depth image.
  • the pattern of the structured light may be a laser stripe, a Gray code, a sine stripe, a non-uniform speckle, or the like.
  • step 0223 demodulates phase information corresponding to each pixel of the structured light image to obtain a depth image, including:
  • 02231: demodulating phase information corresponding to each pixel in the structured light image;
  • 02232: converting the phase information into depth information; and
  • 02233: generating the depth image based on the depth information.
  • step 02231, step 02232, and step 02233 can all be implemented by the structured light camera 122.
  • the structured light camera 122 can also be used to demodulate phase information corresponding to each pixel in the structured light image, convert the phase information into depth information, and generate a depth image based on the depth information.
  • Compared with unmodulated structured light, the phase information of the modulated structured light is changed, and the structured light presented in the structured light image is the distorted structured light; the changed phase information can represent the depth information of the object. Therefore, the structured light camera 122 first demodulates the phase information corresponding to each pixel in the structured light image, and then calculates the depth information from the phase information, thereby obtaining the final depth image.
  • The following takes the widely used grating projection technique (fringe projection technique) as an example.
  • the grating projection technique belongs to the surface structure light in a broad sense.
  • The depth image acquisition component 12 needs to be calibrated before structured light is used for depth information acquisition.
  • The calibration includes calibration of geometric parameters (for example, the relative position between the structured light camera 122 and the structured light projector 121), calibration of the internal parameters of the structured light camera 122, calibration of the internal parameters of the structured light projector 121, and the like.
  • First, the structured light projector 121 projects four fringe patterns onto the measured object in a time-division manner.
  • The structured light camera 122 collects the images shown on the left side of Fig. 7(b), while the fringes on the reference plane are read as shown on the right side of Fig. 7(b).
  • the second step is to perform phase recovery.
  • The structured light camera 122 calculates the modulated phase from the four acquired modulated fringe patterns (i.e., the structured light images); the phase map obtained at this point is a truncated (wrapped) phase map. Because the result of the four-step phase-shifting algorithm is computed with the arctangent function, the phase after structured light modulation is limited to [-π, π]: whenever the modulated phase exceeds this range, it wraps around and starts again. The resulting principal phase values are shown in Fig. 7(c).
  • Therefore, phase unwrapping is required, that is, the truncated phase is restored to a continuous phase.
  • the left side is the modulated continuous phase map and the right side is the reference continuous phase map.
  • The reference continuous phase is subtracted from the modulated continuous phase to obtain a phase difference (i.e., the phase information), which represents the depth of the measured object relative to the reference plane. The phase difference is then substituted into the phase-to-depth conversion formula (whose parameters were obtained during calibration), and the three-dimensional model of the measured object shown in Fig. 7(e) can be obtained. A minimal sketch of this demodulation pipeline follows.
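  • The sketch below illustrates four-step phase-shifting demodulation and phase unwrapping under the simplifying assumption of a linear, pre-calibrated phase-to-depth factor; the real conversion uses the calibrated geometric parameters described above.

```python
import numpy as np
from skimage.restoration import unwrap_phase

def depth_from_four_step_fringes(obj_frames, ref_frames, phase_to_depth=1.0):
    """Four-step phase-shifting demodulation.

    obj_frames / ref_frames: four fringe images of the measured object and of
    the reference plane, captured at phase shifts 0, pi/2, pi, 3*pi/2.
    phase_to_depth: stand-in for the calibrated phase-to-depth conversion.
    """
    I1, I2, I3, I4 = [f.astype(np.float64) for f in obj_frames]
    R1, R2, R3, R4 = [f.astype(np.float64) for f in ref_frames]

    # Wrapped (truncated) phase maps; the arctangent limits the result to [-pi, pi].
    phi_obj = np.arctan2(I4 - I2, I1 - I3)
    phi_ref = np.arctan2(R4 - R2, R1 - R3)

    # Phase unwrapping restores the continuous phase.
    phi_obj = unwrap_phase(phi_obj)
    phi_ref = unwrap_phase(phi_ref)

    # The phase difference encodes depth relative to the reference plane.
    return (phi_obj - phi_ref) * phase_to_depth
```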
  • the structured light used in the embodiments of the present invention may be any other pattern in addition to the above-mentioned grating, depending on the specific application scenario.
  • the present invention can also use the speckle structure light to perform the acquisition of the depth information of the current user.
  • The method of obtaining depth information with speckle structured light uses a substantially flat diffractive element having an embossed diffraction structure with a specific phase distribution; its cross section is a stepped relief structure with two or more levels of concavity and convexity.
  • the thickness of the substrate in the diffractive element is approximately 1 micrometer, and the height of each step is not uniform, and the height may range from 0.7 micrometer to 0.9 micrometer.
  • the structure shown in Fig. 8(a) is a partial diffraction structure of the collimating beam splitting element of the present embodiment.
  • Fig. 8(b) is a cross-sectional side view taken along section A-A, and the units of the abscissa and the ordinate are both micrometers.
  • The speckle pattern generated by speckle structured light is highly random and changes with distance. Therefore, before depth information is acquired using speckle structured light, the speckle patterns in space must first be calibrated. For example, within a range of 0 to 4 meters from the structured light camera 122, a reference plane is taken every 1 cm; after calibration, 400 speckle images are saved. The smaller the calibration interval, the higher the accuracy of the acquired depth information. Subsequently, the structured light projector 121 projects the speckle structured light onto the measured object (i.e., the current user), and the height differences on the surface of the measured object cause the projected speckle pattern to change.
  • The structured light camera 122 captures the speckle pattern (i.e., the structured light image) projected onto the measured object and performs a cross-correlation operation between this pattern and the 400 speckle images saved during calibration, thereby obtaining 400 correlation images. The position of the measured object in space produces peaks in the correlation images, and these peaks are superimposed and interpolated to obtain the depth information of the measured object. A simplified block-wise sketch of this matching step is given below.
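  • A simplified, block-wise sketch of matching a captured speckle image against the calibrated reference planes; the patch size and the per-patch normalized cross-correlation are assumptions standing in for the per-pixel correlation, peak superposition, and interpolation described above.

```python
import numpy as np

def coarse_depth_from_speckle(captured, references, patch=16):
    """For each patch of the captured speckle image, find the calibrated
    reference plane whose speckle pattern correlates best; the plane index
    (times the 1 cm calibration spacing) gives a coarse depth estimate."""
    cap = captured.astype(np.float64)
    refs = [r.astype(np.float64) for r in references]
    h, w = cap.shape
    depth_index = np.zeros((h // patch, w // patch), dtype=np.int32)

    for by in range(h // patch):
        for bx in range(w // patch):
            ys, xs = by * patch, bx * patch
            block = cap[ys:ys + patch, xs:xs + patch]
            block = (block - block.mean()) / (block.std() + 1e-8)
            best_idx, best_score = 0, -np.inf
            for i, ref in enumerate(refs):
                rblock = ref[ys:ys + patch, xs:xs + patch]
                rblock = (rblock - rblock.mean()) / (rblock.std() + 1e-8)
                score = float((block * rblock).mean())  # normalized cross-correlation
                if score > best_score:
                    best_idx, best_score = i, score
            depth_index[by, bx] = best_idx
    return depth_index
```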
  • A collimating beam-splitting element is used, which not only collimates the non-collimated beam but also splits the light: the non-collimated light reflected by the mirror passes through the collimating beam-splitting element and exits as a plurality of collimated beams at different angles. The cross-sectional areas of these collimated beams are approximately equal and their energy fluxes are approximately equal, so that the scattering effect of the diffracted light is better.
  • Because the laser light is spread over the individual beams, the risk of harming the human eye is further reduced; moreover, compared with other uniformly arranged structured light, speckle structured light consumes less battery power for the same acquisition effect.
  • the step 023 processes the scene image and the depth image to extract the current user's character area in the scene image to obtain the character area image, including:
  • 0231: identifying the face region in the scene image;
  • 0232: obtaining depth information corresponding to the face region from the depth image;
  • 0233: determining the depth range of the person region according to the depth information of the face region; and
  • 0234: determining, according to the depth range of the person region, the person region that is connected to the face region and falls within the depth range, to obtain the person region image.
  • Step 0231, step 0232, step 0233, and step 0234 can all be implemented by the processor 20. That is to say, the processor 20 can be used to identify the face region in the scene image, obtain depth information corresponding to the face region from the depth image, determine the depth range of the person region according to the depth information of the face region, and determine, according to the depth range of the person region, the person region that is connected to the face region and falls within the depth range, to obtain the person region image.
  • A trained deep learning model may first be used to identify the face region in the scene image, and the depth information of the face region can then be determined from the correspondence between the scene image and the depth image. Since the face region includes features such as the nose, eyes, ears, and lips, the depth data corresponding to each feature differs in the depth image; for example, when the face is directed toward the depth image acquisition component 12, the depth data corresponding to the nose may be small and the depth data corresponding to the ears may be large in the captured depth image. Therefore, the depth information of the face region may be a single value or a numerical range. When it is a single value, the value may be obtained by averaging the depth data of the face region, or by taking the median of the depth data of the face region.
  • The processor 20 may set the depth range of the person region according to the depth information of the face region, and then extract, according to this depth range, the person region that falls within the depth range and is connected to the face region, to obtain the person region image.
  • In this way, the person region image can be extracted from the scene image according to the depth information. Since the acquisition of the depth information is not affected by illumination, color temperature, or similar factors in the environment, the extracted person region image is more accurate. A minimal sketch of this depth-range segmentation follows.
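  • A minimal sketch of the depth-range segmentation, assuming a boolean face mask from a face detector and an illustrative depth margin around the face depth; SciPy's connected-component labeling stands in for the connectivity test with the face region.

```python
import numpy as np
from scipy import ndimage

def extract_person_region(scene_image, depth_image, face_mask, depth_margin=0.5):
    """Keep pixels whose depth lies within a range around the face depth and
    that are connected to the face region."""
    face_depth = float(np.median(depth_image[face_mask]))   # depth info of the face region
    lo, hi = face_depth - depth_margin, face_depth + depth_margin

    in_range = (depth_image >= lo) & (depth_image <= hi)    # depth range of the person region

    # Keep only the connected component(s) that touch the face region.
    labels, _ = ndimage.label(in_range)
    face_labels = np.unique(labels[face_mask & in_range])
    person_mask = np.isin(labels, face_labels[face_labels != 0])

    person_image = np.zeros_like(scene_image)
    person_image[person_mask] = scene_image[person_mask]
    return person_image, person_mask
```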
  • the step 023 processing the scene image and the depth image to extract the current user's character area in the scene image to obtain the character area image further includes:
  • both step 0235 and step 0236 can be implemented by processor 20. That is, the processor 20 can also be used to process the scene image to obtain a full field edge image of the scene image, and to correct the person region image from the full field edge image.
  • The processor 20 first performs edge extraction on the scene image to obtain the full-field edge image, wherein the edge lines in the full-field edge image include the edge lines of the current user and of the background objects in the scene where the current user is located.
  • edge extraction of the scene image can be performed by the Canny operator.
  • The core of the Canny edge extraction algorithm mainly includes the following steps. First, the scene image is convolved with a 2D Gaussian filter template to eliminate noise. Then, a differential operator is used to obtain the gradient magnitude of the gray level at each pixel, the gradient direction is calculated from the gradient values, and the neighboring pixels along the gradient direction are found for each pixel. Next, each pixel is traversed; if the gray value of a pixel is not the largest compared with the gray values of its two neighbors along the gradient direction, the pixel is considered not to be an edge point. In this way, the pixels at edge positions in the scene image can be determined, thereby obtaining the full-field edge image after edge extraction. A minimal sketch using OpenCV's bundled Canny implementation follows.
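  • Since the Gaussian smoothing, gradient computation, and non-maximum suppression steps above are what OpenCV's Canny implementation bundles, a minimal sketch can simply call it; the blur kernel and thresholds below are illustrative assumptions.

```python
import cv2

def full_field_edge_image(scene_image_bgr, low=50, high=150):
    """Edge extraction on the scene image: Gaussian smoothing followed by
    cv2.Canny (gradient computation and non-maximum suppression)."""
    gray = cv2.cvtColor(scene_image_bgr, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 1.4)
    return cv2.Canny(blurred, low, high)
```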
  • After acquiring the full-field edge image, the processor 20 corrects the person region image according to the full-field edge image. It can be understood that the person region image is obtained by merging all pixels in the scene image that are connected to the face region and fall within the set depth range; in some scenes, however, there may be other objects that are also connected to the face region and fall within the depth range. Therefore, to make the extracted person region image more accurate, the full-field edge image can be used to correct the person region image.
  • the processor 20 may perform secondary correction on the corrected person region image.
  • For example, the corrected person region image may be expanded (dilated) to enlarge the person region and preserve the edge details of the person region image.
  • step 024 is to merge the character region image with the predetermined background image to obtain a merged image, including:
  • 02411: acquiring a predetermined fusion region in the predetermined three-dimensional background image;
  • 02412: determining the pixel region to be replaced of the predetermined fusion region according to the person region image; and
  • 02413: replacing the pixel region to be replaced of the predetermined fusion region with the person region image to obtain the merged image.
  • Step 02411, step 02412, and step 02413 can all be implemented by processor 20. That is to say, the processor 20 can be configured to acquire a predetermined fusion region in the predetermined three-dimensional background image, determine the pixel region to be replaced of the predetermined fusion region according to the person region image, and replace the pixel region to be replaced of the predetermined fusion region with the person region image to obtain a merged image.
  • When the predetermined three-dimensional background image is obtained through modeling, the depth data corresponding to each pixel in the predetermined three-dimensional background image can be acquired directly during the modeling process; when the predetermined three-dimensional background image is obtained through animation, the depth data corresponding to each pixel may be set by the producer. In addition, each object existing in the predetermined three-dimensional background image is also known. Therefore, before the predetermined three-dimensional background image is used for image fusion processing, the fusion position of the person region image, that is, the predetermined fusion region, can be calibrated according to the depth data and the objects existing in the predetermined three-dimensional background image.
  • Because the size of the person region image is affected by the acquisition distance, the processor 20 needs to determine the pixel region to be replaced in the predetermined fusion region according to the size of the person region image actually acquired by the visible light camera 11.
  • The merged image is then obtained by replacing the pixel region to be replaced in the predetermined fusion region with the person region image. In this way, the fusion of the person region image with the predetermined three-dimensional background image is achieved; a minimal sketch follows.
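  • A minimal sketch of replacing the pixel region to be replaced with the person region image; the fusion region's origin coordinates and the binary person mask are assumed inputs from the earlier calibration and segmentation steps.

```python
import numpy as np

def fuse_into_background(background, person_image, person_mask, fusion_origin):
    """Replace the pixel region to be replaced of the fusion region with the
    person region image.

    fusion_origin: (y, x) top-left corner of the calibrated fusion region in
    the predetermined background image (assumed known from calibration).
    The size of the replaced area follows the actual person region image.
    """
    merged = background.copy()
    h, w = person_mask.shape
    y, x = fusion_origin

    region = merged[y:y + h, x:x + w]                  # view into the merged image
    region[person_mask] = person_image[person_mask]    # replace only person pixels
    return merged
```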
  • step 024 is to merge the character region image with the predetermined background image to obtain a merged image, including:
  • 02421: processing the predetermined three-dimensional background image to obtain a full-field edge image of the predetermined three-dimensional background image;
  • 02422: acquiring depth data of the predetermined three-dimensional background image;
  • 02423: determining the calculated fusion region of the predetermined three-dimensional background image according to the full-field edge image and the depth data of the predetermined three-dimensional background image;
  • 02424: determining the pixel region to be replaced of the calculated fusion region according to the person region image; and
  • 02425: replacing the pixel region to be replaced of the calculated fusion region with the person region image to obtain the merged image.
  • Step 02421, step 02422, step 02423, step 02424, and step 02425 can all be implemented by processor 20. That is, the processor 20 can be configured to process the predetermined three-dimensional background image to obtain a full-field edge image of the predetermined three-dimensional background image, acquire the depth data of the predetermined three-dimensional background image, determine the calculated fusion region of the predetermined three-dimensional background image according to the full-field edge image and the depth data, determine the pixel region to be replaced of the calculated fusion region according to the person region image, and replace the pixel region to be replaced of the calculated fusion region with the person region image to obtain a merged image.
  • the processor 20 first needs to determine the fusion position of the person region image in the predetermined three-dimensional background image. Specifically, the processor 20 performs edge extraction on the predetermined three-dimensional background image to obtain a full-field edge image, and acquires depth data of the predetermined three-dimensional background image, wherein the depth data is acquired in a predetermined three-dimensional background image modeling or animation process. Subsequently, the processor 20 determines a calculated fusion region in the predetermined three-dimensional background image based on the full-field edge image and the depth data of the predetermined three-dimensional background image.
  • Since the size of the person region image is affected by the acquisition distance of the visible light camera 11, the size of the person region image is calculated, and the pixel region to be replaced in the calculated fusion region is determined according to that size. Finally, the pixel region to be replaced in the calculated fusion region is replaced with the person region image, thereby obtaining the merged image. In this way, the fusion of the person region image with the predetermined three-dimensional background image is achieved.
  • the person region image may be a two-dimensional person region image or a three-dimensional person region image.
  • The processor 20 can extract a two-dimensional person region image from the scene image by combining the depth information in the depth image, and the processor 20 can also build a three-dimensional model of the person region according to the depth information in the depth image and then color-fill the three-dimensional person region with the color information in the scene image to obtain a three-dimensional colored person region image.
  • The predetermined fusion regions or calculated fusion regions in the predetermined three-dimensional background image may be one or more.
  • When there is only one predetermined fusion region, the fusion position of the two-dimensional person region image or the three-dimensional person region image in the predetermined three-dimensional background image is that single predetermined fusion region; when there is only one calculated fusion region, the fusion position of the two-dimensional person region image or the three-dimensional person region image in the predetermined three-dimensional background image is that single calculated fusion region.
  • When there are a plurality of predetermined fusion regions, the fusion position of the two-dimensional person region image or the three-dimensional person region image in the predetermined three-dimensional background image may be any one of the plurality of predetermined fusion regions; further, since the three-dimensional person region image carries depth information, the predetermined fusion region matching the depth information of the three-dimensional person region image can be sought among the plurality of predetermined fusion regions and used as the fusion position, to obtain a better fusion effect.
  • Likewise, when there are a plurality of calculated fusion regions, the fusion position of the two-dimensional person region image or the three-dimensional person region image in the predetermined three-dimensional background image may be any one of the plurality of calculated fusion regions; further, since the three-dimensional person region image carries depth information, the calculated fusion region matching the depth information of the three-dimensional person region image can be found among the plurality of calculated fusion regions and used as the fusion position, to obtain a better fusion effect. A minimal sketch of this depth-based region selection follows.
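  • A minimal sketch of choosing, among several calibrated fusion regions, the one whose depth best matches the three-dimensional person region image; the dictionary representation of a fusion region is a hypothetical convenience, not the patent's data structure.

```python
import numpy as np

def pick_fusion_region(person_depth, fusion_regions):
    """Choose the fusion region whose calibrated depth best matches the depth
    of the three-dimensional person region image.

    fusion_regions: list of dicts like {"origin": (y, x), "depth": d}, an
    assumed representation of the calibrated fusion regions.
    """
    person_d = float(np.median(person_depth))
    return min(fusion_regions, key=lambda r: abs(r["depth"] - person_d))
```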
  • In this way, the person region image can be fused with the predetermined background image (predetermined two-dimensional background image or predetermined three-dimensional background image) to obtain the merged image.
  • The merged image is then processed to identify a specific object and to match a predetermined sound model to it.
  • Finally, the predetermined sound model is fused with the merged image to output the sound image.
  • The sound image may be composed of a single-frame merged image and a predetermined sound model, or of multi-frame merged images and a predetermined sound model; in the latter case the sound image is a sound video.
  • The image processing method of the embodiment of the present invention further includes:
  • 011: determining whether the predetermined background image (predetermined two-dimensional background image or predetermined three-dimensional background image) in the merged image has an associated stored predetermined sound model; and
  • 012: when the predetermined background image has an associated stored predetermined sound model, fusing the merged image with that predetermined sound model to output the sound image; when the predetermined background image has no associated stored predetermined sound model, the process proceeds to step 03 to identify a specific object in the merged image.
  • Both step 011 and step 012 can be implemented by processor 20. That is to say, the processor 20 can also be used to determine whether the predetermined background image in the merged image has an associated stored predetermined sound model, to fuse the merged image with that predetermined sound model to output the sound image when such a model exists, and to proceed to identify a specific object in the merged image when no such model exists.
  • The specific objects present in each predetermined background image (predetermined two-dimensional background image or predetermined three-dimensional background image) are known when the background image is constructed or selected in advance, so a predetermined sound model matching the specific object can be stored directly in association with the predetermined background image.
  • When the predetermined background image in the merged image has an associated stored predetermined sound model, the processor 20 may directly fuse that predetermined sound model with the merged image to output the sound image.
  • When the predetermined background image has no associated stored predetermined sound model, the processor 20 needs to identify a specific object in the predetermined background image, select a predetermined sound model matching the identified specific object from a plurality of pre-stored predetermined sound models, and finally fuse the selected predetermined sound model with the merged image to output the sound image. A minimal sketch of this dispatch logic follows.
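  • A minimal sketch of this dispatch logic; the callable object recognizer and the sound lookup table passed in below are hypothetical stand-ins for the recognition step and the pre-stored predetermined sound models.

```python
def sound_for_merged_image(background_sound_model, merged_image,
                           identify_specific_object, sound_library):
    """Prefer a sound model stored in association with the predetermined
    background image; otherwise recognize a specific object in the merged
    image and look up a matching predetermined sound model."""
    if background_sound_model is not None:                     # steps 011/012
        return background_sound_model
    specific_object = identify_specific_object(merged_image)   # step 03
    return sound_library.get(specific_object)                  # step 04
```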
  • The image processing method of the embodiment of the present invention further includes:
  • 05: playing the sound image, wherein the merged image is displayed and the predetermined sound model is played.
  • the image processing apparatus 100 further includes an electroacoustic component 70 and a display 50.
  • Step 05 can be implemented by electroacoustic element 70 and display 50. Among them, the image is displayed by the display 50, and the sound is played by the electroacoustic element 70.
  • the electroacoustic component 70 can be a speaker, an earphone, a microphone, a cartridge, and the like.
  • When the sound image is played, the default may be to play only the image without playing the sound.
  • At this time, the current user can choose to trigger a play request to play the image and the sound simultaneously.
  • If the current user does not trigger a play request, only the image is played and no sound is played.
  • When the merged images in the sound image span multiple frames, for example when the current user is in a video chat with a friend, the image of the current user seen by the friend is the merged image; at this time, the current user or the friend may trigger a play request to play the images and the sound simultaneously. In this way, fun is added to the video chat. Further, if the current user or the friend triggers the play request again while the images and sound of the sound image are playing simultaneously, the display 50 continues to display the merged images and the electroacoustic element 70 stops playing the sound.
  • Alternatively, when the sound image is played, the default may be to play the image and the sound simultaneously; in that case, the current user can choose to trigger a play request to stop playback of the sound.
  • the predetermined sound model that matches a particular object includes one or more songs.
  • When the predetermined sound model includes one song, the predetermined sound model is played one or more times during playback of the sound image. That is to say, when the played sound image includes a single-frame merged image and a predetermined sound model, the display 50 continuously displays the one-frame merged image while the electroacoustic component 70 plays the predetermined sound model once or plays it repeatedly.
  • When the played sound image includes multi-frame merged images and a predetermined sound model, the display 50 displays the multi-frame merged images at a certain frame rate, during which the electroacoustic component 70 plays the predetermined sound model once or loops it multiple times.
  • When the predetermined sound model includes a plurality of songs, the predetermined sound models are stored sequentially in a list, and during playback of the sound image the plurality of predetermined sound models are played in any one of the following modes: sequential play, random play, single-track loop, or list loop.
  • When the played sound image includes a single-frame merged image and a multi-track predetermined sound model, the display 50 continuously displays the one-frame merged image while the electroacoustic component 70 plays the predetermined sound models according to the selected play mode.
  • When the played sound image includes multi-frame merged images, the display 50 displays the multi-frame merged images at a certain frame rate, during which the electroacoustic component 70 may play the plurality of predetermined sound models sequentially in list order, repeat them in list order, play them from the list at random, or select one of them for loop playback. A minimal sketch of these play modes follows.
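  • A minimal sketch of the four play modes expressed as generator logic; the mode names are illustrative labels, not terms from the patent.

```python
import random

def playback_order(tracks, mode="sequential"):
    """Yield predetermined sound models in the selected play mode:
    sequential play, random play, single-track loop, or list loop."""
    if mode == "sequential":
        yield from tracks
    elif mode == "random":
        while True:
            yield random.choice(tracks)
    elif mode == "single_loop":
        while True:
            yield tracks[0]
    elif mode == "list_loop":
        while True:
            yield from tracks
    else:
        raise ValueError(f"unknown play mode: {mode}")
```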
  • an embodiment of the present invention further provides an electronic device 1000.
  • the electronic device 1000 includes an image processing device 100.
  • the image processing apparatus 100 can be implemented using hardware and/or software.
  • the image processing apparatus 100 includes an imaging device 10 and a processor 20.
  • the imaging device 10 includes a visible light camera 11 and a depth image acquisition assembly 12.
  • the visible light camera 11 includes an image sensor 111 and a lens 112, and the visible light camera 11 can be used to capture color information of the current user to obtain a scene image, wherein the image sensor 111 includes a color filter array (such as a Bayer filter array), and the lens 112 The number can be one or more.
  • the image sensor 111 senses light intensity and wavelength information from the captured scene to generate a set of raw image data; the image sensor 111 sends the set of raw image data to the processing In the processor 20, the processor 20 obtains a color scene image by performing operations such as denoising, interpolation, and the like on the original image data.
  • The processor 20 can process each image pixel in the raw image data one by one in a plurality of formats; for example, each image pixel can have a bit depth of 8, 10, 12, or 14 bits, and the processor 20 can process each image pixel at the same or a different bit depth.
  • the depth image acquisition component 12 includes a structured light projector 121 and a structured light camera 122 that can be used to capture depth information of the current user to obtain a depth image.
  • the structured light projector 121 is for projecting structured light to a current user, wherein the structured light pattern may be a laser stripe, a Gray code, a sinusoidal stripe, or a randomly arranged speckle pattern or the like.
  • the structured light camera 122 includes an image sensor 1221 and a lens 1222, and the number of the lenses 1222 may be one or more.
  • Image sensor 1221 is used to capture a structured light image that structured light projector 121 projects onto the current user.
  • the structured light image may be transmitted by the depth acquisition component 12 to the processor 20 for processing such as demodulation, phase recovery, phase information calculation, etc. to obtain depth information of the current user.
  • In some embodiments, the functions of the visible light camera 11 and the structured light camera 122 can be implemented by a single camera; that is, the imaging device 10 includes only one camera and one structured light projector 121, and this single camera can capture not only the scene image but also the structured light image.
  • In other embodiments, the depth image of the current user can also be acquired by other means, such as a binocular vision method or a depth image acquisition method based on Time of Flight (TOF).
  • the processor 20 is further configured to fuse the person region image extracted from the scene image and the depth image with a predetermined background image (predetermined two-dimensional background image or a predetermined three-dimensional background image) to obtain a merged image, and process the merged image to determine a predetermined sound model, Finally, the merged image is fused with the predetermined sound model to output the sound image.
  • When extracting the person region, the processor 20 may extract a two-dimensional person region image from the scene image in combination with the depth information in the depth image, or may build a three-dimensional model of the person region according to the depth information in the depth image and then color-fill the three-dimensional person region with the color information in the scene image to obtain a three-dimensional colored person region image.
  • Accordingly, the fusion processing of the person region image and the predetermined background image (predetermined two-dimensional background image or predetermined three-dimensional background image) may be performed either by fusing the two-dimensional person region image with the predetermined background image to obtain the merged image, or by fusing the three-dimensional colored person region image with the predetermined background image to obtain the merged image.
  • the image processing apparatus 100 further includes a memory 30.
  • The memory 30 can be embedded in the electronic device 1000 or can be a memory independent of the electronic device 1000, and can include direct memory access (DMA) features.
  • the raw image data acquired by the visible light camera 11 or the structured light image related data collected by the depth image acquisition component 12 may be transferred to the memory 30 for storage or buffering.
  • the predetermined sound model can also be stored in the memory 30.
  • The processor 20 can read the raw image data from the memory 30 for processing to obtain the scene image, can read the structured-light-image-related data from the memory 30 for processing to obtain the depth image, and can also read a predetermined sound model from the memory 30 for further processing of the merged image.
  • the scene image and the depth image may also be stored in the memory 30 for the processor 20 to call the processing at any time.
  • For example, the processor 20 calls the scene image and the depth image to perform person region extraction, fuses the extracted person region image with the predetermined background image (predetermined two-dimensional background image or predetermined three-dimensional background image) to obtain the merged image, processes the merged image to identify a specific object, searches for a predetermined sound model matching the specific object, and finally fuses the merged image with the predetermined sound model to output the sound image.
  • The merged image may also be stored in the memory 30.
  • the image processing apparatus 100 may also include a display 50.
  • the display 50 can acquire a merged image of the sound image directly from the processor 20, and can also acquire a merged image of the sound image from the memory 30.
  • Display 50 displays the merged images in the sound image for viewing by the user or for further processing by a graphics engine or a Graphics Processing Unit (GPU).
  • The image processing apparatus 100 further includes an encoder/decoder 60 that can encode the image data of the scene image, the depth image, the merged image, and so on; the encoded image data can be saved in the memory 30 and decompressed by the decoder before being displayed on the display 50.
  • Encoder/decoder 60 may be implemented by a Central Processing Unit (CPU), GPU, or coprocessor. In other words, the encoder/decoder 60 may be any one or more of a central processing unit (CPU), a GPU, and a coprocessor.
  • the image processing apparatus 100 also includes a control logic 40.
  • The processor 20 analyzes the data acquired by the imaging device 10 to determine image statistics for one or more control parameters (e.g., exposure time) of the imaging device 10.
  • The processor 20 sends the image statistics to the control logic 40, and the control logic 40 controls the imaging device 10 so that it images with suitable control parameters.
  • Control logic 40 may include a processor and/or a microcontroller that executes one or more routines, such as firmware.
  • One or more routines may determine control parameters of imaging device 10 based on the received image statistics.
  • the image processing apparatus 100 may further include an electroacoustic element 70 for playing a predetermined sound model in the sound image.
  • the electroacoustic element 70 is usually composed of a diaphragm, a voice coil, a permanent magnet, a bracket, and the like.
  • When an audio current passes through the voice coil, the current generates an alternating magnetic field, while the permanent magnet generates a constant magnetic field of fixed magnitude and direction. Because the magnitude and direction of the magnetic field generated by the voice coil change continuously with the audio current, the interaction of the two magnetic fields causes the voice coil to move perpendicular to the direction of the current in the voice coil.
  • the electroacoustic component 70 can acquire a predetermined sound model in the sound image from the processor 20 for playback, or can acquire a predetermined sound model in the sound image from the memory 30 for playback.
  • an electronic device 1000 of an embodiment of the present invention includes one or more processors 20, a memory 30, and one or more programs 31.
  • One or more of the programs 31 are stored in the memory 30 and are configured to be executed by one or more processors 20.
  • the program 31 includes instructions for executing the image processing method of any of the above embodiments.
  • the program 31 includes instructions for performing the image processing method described in the following steps:
  • 03: identifying a specific object in the merged image; and
  • 04: fusing the predetermined sound model matched with the specific object with the merged image to output the sound image.
  • program 31 further includes instructions for performing the image processing method described in the following steps:
  • 0231: identifying the face region in the scene image; 0232: obtaining depth information corresponding to the face region from the depth image; 0233: determining the depth range of the person region according to the depth information of the face region; and 0234: determining, according to the depth range of the person region, the person region that is connected to the face region and falls within the depth range, to obtain the person region image.
  • a computer readable storage medium in accordance with an embodiment of the present invention includes a computer program for use in conjunction with an electronic device 1000 capable of imaging.
  • the computer program can be executed by the processor 20 to perform the image processing method of any of the above embodiments.
  • a computer program can be executed by processor 20 to perform the image processing methods described in the following steps:
  • 03: identifying a specific object in the merged image; and
  • 04: fusing the predetermined sound model matched with the specific object with the merged image to output the sound image.
  • the computer program can also be executed by the processor 20 to complete the image processing method described in the following steps:
  • 0231: identifying the face region in the scene image; 0232: obtaining depth information corresponding to the face region from the depth image; 0233: determining the depth range of the person region according to the depth information of the face region; and 0234: determining, according to the depth range of the person region, the person region that is connected to the face region and falls within the depth range, to obtain the person region image.
  • The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
  • A feature defined with "first" or "second" may explicitly or implicitly include at least one such feature.
  • the meaning of "a plurality” is at least two, such as two, three, etc., unless specifically defined otherwise.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Graphics (AREA)
  • Optics & Photonics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present invention relates to an image processing method, an image processing device (100), an electronic device (1000), and a computer-readable storage medium. The image processing method is used to process a merged image, the merged image being formed by fusing a predetermined background image with a person region image in a scene image of a current user in a real scene. The image processing method comprises: (03) identifying a specific object in the merged image; and (04) fusing a predetermined sound model matching the specific object with the merged image so as to output a sound image.
PCT/CN2018/105102 2017-09-11 2018-09-11 Procédé et dispositif de traitement d'images, dispositif électronique et support d'informations lisible par ordinateur WO2019047983A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201710813594.1 2017-09-11
CN201710813594.1A CN107590795A (zh) 2017-09-11 2017-09-11 图像处理方法及装置、电子装置和计算机可读存储介质
CN201710814395.2 2017-09-11
CN201710814395.2A CN107704808A (zh) 2017-09-11 2017-09-11 图像处理方法及装置、电子装置和计算机可读存储介质

Publications (1)

Publication Number Publication Date
WO2019047983A1 true WO2019047983A1 (fr) 2019-03-14

Family

ID=65633548

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/105102 WO2019047983A1 (fr) 2017-09-11 2018-09-11 Procédé et dispositif de traitement d'images, dispositif électronique et support d'informations lisible par ordinateur

Country Status (1)

Country Link
WO (1) WO2019047983A1 (fr)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050201565A1 (en) * 2004-03-15 2005-09-15 Samsung Electronics Co., Ltd. Apparatus for providing sound effects according to an image and method thereof
CN101309389A (zh) * 2008-06-19 2008-11-19 深圳华为通信技术有限公司 一种合成可视图像的方法、装置和终端
CN104349175A (zh) * 2014-08-18 2015-02-11 周敏燕 一种基于手机终端的视频制作***及方法
CN105869198A (zh) * 2015-12-14 2016-08-17 乐视移动智能信息技术(北京)有限公司 多媒体照片生成方法、装置、设备及手机
CN106488017A (zh) * 2016-10-09 2017-03-08 上海斐讯数据通信技术有限公司 一种移动终端及其对拍摄的图像进行配乐的方法
CN106937059A (zh) * 2017-02-09 2017-07-07 北京理工大学 基于Kinect的影像合成方法和***
CN107590795A (zh) * 2017-09-11 2018-01-16 广东欧珀移动通信有限公司 图像处理方法及装置、电子装置和计算机可读存储介质
CN107704808A (zh) * 2017-09-11 2018-02-16 广东欧珀移动通信有限公司 图像处理方法及装置、电子装置和计算机可读存储介质

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232353A (zh) * 2019-06-12 2019-09-13 成都世纪光合作用科技有限公司 一种获取场景人员深度位置的方法和装置
CN112198494A (zh) * 2019-06-20 2021-01-08 北京小米移动软件有限公司 飞行时间模组标定方法、装置、***及终端设备
CN112198494B (zh) * 2019-06-20 2023-11-21 北京小米移动软件有限公司 飞行时间模组标定方法、装置、***及终端设备

Similar Documents

Publication Publication Date Title
US11503228B2 (en) Image processing method, image processing apparatus and computer readable storage medium
CN107480613B (zh) 人脸识别方法、装置、移动终端和计算机可读存储介质
CN107734267B (zh) 图像处理方法和装置
CN107623817B (zh) 视频背景处理方法、装置和移动终端
CN107610080B (zh) 图像处理方法和装置、电子装置和计算机可读存储介质
CN107734264B (zh) 图像处理方法和装置
US11138740B2 (en) Image processing methods, image processing apparatuses, and computer-readable storage medium
CN107705278B (zh) 动态效果的添加方法和终端设备
CN107509043B (zh) 图像处理方法、装置、电子装置及计算机可读存储介质
CN107610171B (zh) 图像处理方法及其装置
CN107509045A (zh) 图像处理方法和装置、电子装置和计算机可读存储介质
CN107610127B (zh) 图像处理方法、装置、电子装置和计算机可读存储介质
CN107590828B (zh) 拍摄图像的虚化处理方法和装置
CN107707835A (zh) 图像处理方法及装置、电子装置和计算机可读存储介质
CN107707831A (zh) 图像处理方法和装置、电子装置和计算机可读存储介质
CN107613239B (zh) 视频通信背景显示方法和装置
CN107592491B (zh) 视频通信背景显示方法和装置
CN107454336B (zh) 图像处理方法及装置、电子装置和计算机可读存储介质
CN107682656B (zh) 背景图像处理方法、电子设备和计算机可读存储介质
CN107590793A (zh) 图像处理方法及装置、电子装置和计算机可读存储介质
CN107707838A (zh) 图像处理方法和装置
WO2019047983A1 (fr) Procédé et dispositif de traitement d'images, dispositif électronique et support d'informations lisible par ordinateur
CN107734266B (zh) 图像处理方法和装置、电子装置和计算机可读存储介质
CN107705276B (zh) 图像处理方法和装置、电子装置和计算机可读存储介质
CN107529020B (zh) 图像处理方法及装置、电子装置和计算机可读存储介质

Legal Events

Date Code Title Description
  • 121: Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18853620; Country of ref document: EP; Kind code of ref document: A1)
  • NENP: Non-entry into the national phase (Ref country code: DE)
  • 122: Ep: pct application non-entry in european phase (Ref document number: 18853620; Country of ref document: EP; Kind code of ref document: A1)