WO2021103474A1 - Image processing method and apparatus, storage medium and electronic apparatus - Google Patents

Image processing method and apparatus, storage medium and electronic apparatus

Info

Publication number
WO2021103474A1
WO2021103474A1 · PCT/CN2020/094576 · CN2020094576W
Authority
WO
WIPO (PCT)
Prior art keywords
portrait
target
area
components
image
Prior art date
Application number
PCT/CN2020/094576
Other languages
English (en)
French (fr)
Inventor
余晓铭
易阳
李峰
蔡锴捷
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技(深圳)有限公司)
Publication of WO2021103474A1
Priority to US17/524,387 (published as US20220067888A1)

Classifications

    • G06T 5/70 Image enhancement or restoration: Denoising; Smoothing
    • G06N 3/08 Neural networks: Learning methods
    • G06N 3/045 Neural network architectures: Combinations of networks
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06V 10/25 Image preprocessing: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/26 Image preprocessing: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/82 Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06V 40/10 Recognition of human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Human faces: Detection; Localisation; Normalisation
    • G06T 2207/20012 Indexing scheme for image analysis or enhancement: Adaptive image processing, Locally adaptive

Definitions

  • This application relates to the field of Artificial Intelligence (AI), and specifically relates to an image processing technology.
  • In the related art, the portrait area in the image is simply divided into a foreground portrait and a non-portrait background.
  • When the image includes multiple people, the related art usually has difficulty accurately identifying the foreground people in the image; partial portraits consisting only of limbs are also recognized as foreground, which leads to misdetection in foreground portrait recognition.
  • the embodiments of the present application provide an image processing method and device, a storage medium, and an electronic device, which can accurately identify the foreground person in the image and avoid the misdetection of the foreground person.
  • an image processing method, including: identifying, from a first target image to be processed, multiple groups of portrait components and the areas where the multiple groups of portrait components are located, where each group of portrait components corresponds to one human body; determining, in the areas where the multiple groups of portrait components are located, the target area where a target group of portrait components is located, where the target group of portrait components includes a target face, and the target area is separated from the areas where the other groups of portrait components are located; and blurring the area other than the target area in the first target image to obtain a second target image.
  • an image processing device including:
  • the first recognition unit is configured to recognize multiple groups of portrait components and the area where the multiple groups of portrait components are located from the first target image to be processed, wherein each group of portrait components corresponds to a human body;
  • the first determining unit is configured to determine, in the areas where the multiple groups of portrait components are located, the target area where the target group of portrait components is located, where the target group of portrait components includes a target face, and the target area is separated from the areas where the other groups of portrait components are located;
  • the first processing unit is configured to perform blur processing on an area other than the target area in the first target image to obtain a second target image.
  • a computer-readable storage medium storing a computer program, where the computer program is configured to execute the above-mentioned image processing method when run.
  • an electronic device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor executes the above-mentioned image processing method through the computer program.
  • a computer program product which, when run on a computer, causes the computer to execute the above-mentioned image processing method.
  • In the embodiments of the present application, multiple groups of portrait components and the areas where they are located are identified from the first target image to be processed, where each group of portrait components corresponds to one human body. The target area containing a face is determined among those areas, and the area other than the target area in the first target image is blurred. In other words, the target group of portrait components that includes a face is recognized from the first target image, the target area where that group is located is determined as the foreground area, and the limbs of other people whose faces are not visible are determined as the background area, thereby improving the accuracy of foreground-person recognition and reducing false detection in portrait recognition.
  • Fig. 1 is a schematic diagram of an application environment of an optional image processing method according to an embodiment of the present application;
  • Fig. 2 is a schematic flowchart of an optional image processing method according to an embodiment of the present application;
  • Fig. 3 is a schematic diagram of an image processed by an optional image processing method according to an embodiment of the present application;
  • Fig. 4 is a schematic flowchart of yet another optional image processing method according to an embodiment of the present application;
  • Fig. 5 is a schematic diagram of an image processed by yet another optional image processing method according to an embodiment of the present application;
  • Fig. 6 is a schematic diagram of an image processed by yet another optional image processing method according to an embodiment of the present application;
  • Fig. 7 is a schematic flowchart of another optional image processing method according to an embodiment of the present application;
  • Fig. 8 is a schematic structural diagram of an initial recognition model according to an embodiment of the present application;
  • Fig. 9 is a schematic structural diagram of a coding network according to an embodiment of the present application;
  • Fig. 10 is a schematic structural diagram of an optional image processing apparatus according to an embodiment of the present application;
  • Fig. 11 is a schematic structural diagram of yet another optional image processing apparatus according to an embodiment of the present application;
  • Fig. 12 is a schematic structural diagram of an optional electronic device according to an embodiment of the present application.
  • Machine Learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers simulate or realize human learning behaviors in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their own performance.
  • Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; its applications cover all fields of artificial intelligence.
  • Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
  • In the embodiments of the present application, a recognition model for recognizing portrait components in an image can be obtained through machine learning training, so that the portrait components in an input image are recognized by the recognition model and the connected area where the group of portrait components including a face is located is determined as the foreground area. The limbs of other people whose faces are not visible can then be determined as background, thereby achieving the technical effect of improving the accuracy of foreground-person recognition and reducing misdetection in portrait recognition.
  • an image processing method is provided.
  • the above image processing method can be, but is not limited to, applied to the environment shown in FIG. 1.
  • The image processing method of the embodiments of the present application can be used to process static images such as photos, blurring the background area outside the portrait, and can also be used to process the video frame images in a video: the background area of each frame is blurred so that everything outside the foreground portrait in the video appears blurred.
  • the video here can be video data generated in a video conference.
  • the application of the image processing method in the embodiment of the present application is not limited to the above examples.
  • The user equipment 102 can execute, through the processor 106: S120, identifying, from the first target image to be processed, multiple groups of portrait components and the areas where they are located, where each group of portrait components corresponds to one human body (the portrait components here can include hair, face, torso, and so on); S122, determining, in the areas where the multiple groups of portrait components are located, the target area where the target group of portrait components is located, where the target group of portrait components includes the target face and the target area is separated from the areas where the other groups of portrait components are located.
  • It is understandable that the hair, face, torso, and other portrait components belonging to the same portrait in an image are connected to each other, so at least one connected area can be determined. Since the main subjects in a photo or video normally show their faces, the connected area that includes the area where a face is located can be determined as the target connected area, that is, the foreground area in the first target image. S124, blurring the area other than the target area in the first target image to obtain the second target image; here, the area other than the target area is treated as the background area of the first target image and blurred, yielding the processed second target image.
  • the user equipment 102 may store the first target image and the second target image through a memory, and display the first target image and the processed second target image through the display 108.
  • the above-mentioned image processing method can be, but is not limited to, applied to the user equipment 102, and the background area in the image can also be blurred by an application (APP).
  • the aforementioned APP can, but is not limited to, run in the user equipment 102.
  • the user equipment 102 can be, but not limited to, a mobile phone, a tablet computer, a notebook computer, a PC, and other terminal devices that support the running of the APP.
  • the above-mentioned image processing method can also be applied to a server, and the server assists in blurring the background area in the image, and sends the processed second target image to the user equipment 102.
  • the foregoing server and user equipment 102 may, but are not limited to, implement data interaction through a network, and the foregoing network may include, but is not limited to, a wireless network or a wired network.
  • the wireless network includes: Bluetooth, Wi-Fi, and other networks that realize wireless communication.
  • the aforementioned wired network may include, but is not limited to: wide area network, metropolitan area network, and local area network. The foregoing is only an example, and there is no limitation on this in this embodiment.
  • the image processing method provided in the embodiment of the present application may be executed by an electronic device (such as a user equipment or a server). As shown in FIG. 2, the foregoing image processing method includes:
  • Step 202 Identify multiple groups of portrait components and areas where the multiple groups of portrait components are located from the first target image to be processed, where each group of portrait components corresponds to a human body;
  • Step 204 Determine, in the areas where the multiple groups of portrait components are located, the target area where the target group of portrait components is located, where the target group of portrait components includes the target face and the target area is separated from the areas where the other groups of portrait components are located;
  • Step 206 Perform blurring processing on the area other than the target area in the first target image to obtain a second target image.
  • the portrait component in the first target image can be recognized through the recognition model.
  • the portrait component here may include, but is not limited to: human face, hair, and torso.
  • the recognition model can recognize the portrait part 32 that is a human face, the portrait parts 34 and 38 that are the torso, and the portrait part 36 that is the hair in the first target image. It can be understood that the portrait parts that can be identified here are only examples, and the application is not limited to this.
  • In this embodiment, since the background area outside the portrait needs to be blurred, and, as shown in FIG. 3, the image may include multiple portrait components, the area where these portrait components are located is the portrait area. It is understandable that the area where multiple connected portrait components are located is, in effect, the portrait area.
  • Take the portrait corresponding to the portrait component 38 shown in FIG. 3 as an example: such a portrait consisting only of a torso should obviously not be determined as the foreground.
  • Therefore, a group of portrait components that includes a human face may be determined as the target group of portrait components, and the area where that group is located is determined as the target area, that is, the foreground area of the first target image. The area other than the target area is determined as the background area, and the background area is blurred to obtain the processed second target image.
  • As shown in Figure 3, the area corresponding to the torso portrait component 38 does not include a face component, so it needs to be blurred; the hatched part in Figure 3 indicates the areas that have been blurred. In this way, the limbs of other people whose faces are not visible can be determined as the background area, which improves the accuracy of foreground-person recognition, reduces false detection in portrait recognition, and solves the technical problem of misdetection caused by inaccurate foreground-person recognition.
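  • As a concrete illustration, the following is a minimal sketch of the three steps above (identify component groups, keep only groups containing a face, blur the rest). The segment_components() helper standing in for the recognition model is hypothetical, and the blur kernel size is an illustrative choice, not a value from the patent.

```python
# Minimal sketch of the pipeline: keep people whose faces are detected
# as foreground, blur everything else. segment_components() is a
# hypothetical stand-in for the recognition model described in the text.
import cv2
import numpy as np

def blur_background(image, person_masks, has_face):
    """image: HxWx3 uint8; person_masks: list of HxW bool masks, one per
    person; has_face: list of bool flags aligned with person_masks."""
    # Steps 202/204: the foreground is the union of component groups
    # that include a detected face.
    foreground = np.zeros(image.shape[:2], dtype=bool)
    for mask, face in zip(person_masks, has_face):
        if face:
            foreground |= mask
    # Step 206: blur the whole frame, then restore the foreground pixels.
    blurred = cv2.GaussianBlur(image, (31, 31), 0)  # kernel size is illustrative
    out = blurred.copy()
    out[foreground] = image[foreground]
    return out
```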
  • a group of portrait components may include: a part or all of the components corresponding to a human body.
  • the face, arms, and hands of the object S can form a group of portrait parts, and the group of portrait parts corresponds to the object S.
  • In this embodiment, the target area is separated from the areas where the other groups of portrait components in the multiple groups are located.
  • For example, if the area where the group of portrait components corresponding to object S is located is the target area, the target area is not connected to the areas where the other objects in the first target image are located.
  • Optionally, determining the target area where the target group of portrait components is located includes:
  • For example, the torso of object A, the face of object A, and the two arms of object A are identified, forming a group of portrait components corresponding to object A; the torso of object B is identified, forming a group of portrait components corresponding to object B; and the torso of object C, the face of object C, and one arm of object C are identified, forming a group of portrait components corresponding to object C. That is, there are three groups of portrait components in the image.
  • Among them, two groups of portrait components (those of objects A and C) include human faces. The target area where the target group of portrait components is located is determined among these two groups, in one of the following ways:
  • Method 1: Determine whether the area of the face included in each group of portrait components is greater than or equal to the first threshold. If so, that group of portrait components can be determined as the target group of portrait components, and the area where it is located is the target area.
  • For example, suppose the face area of object N in the image is smaller than the face area of object M: the face area of object M is 3 square centimeters and the face area of object N is 2 square centimeters. If the first threshold is 3 square centimeters, only object M satisfies the condition, and the group of portrait components corresponding to object M serves as the target group of portrait components.
  • Method 2: Determine whether the area of the region where each group of portrait components is located is greater than or equal to the second threshold. If so, that group can be determined as the target group of portrait components, and the area where it is located is the target area. For example, the face area of object M is 3 square centimeters and the area of its torso and arms is 5 square centimeters, while the face area of object N is 2 square centimeters and the area of its torso and arms is 10 square centimeters. If the second threshold is 8 square centimeters, both object M and object N satisfy the condition, and the groups of portrait components corresponding to both objects can be used as target groups of portrait components.
  • In addition, the target group of portrait components can be determined using Method 1 and Method 2 simultaneously: a group is determined as a target group of portrait components only if the area of its face is greater than or equal to the first threshold and the area of the region where the group is located is greater than or equal to the second threshold.
  • In this embodiment, the first threshold and/or the second threshold are positively correlated with the size of the first target image. That is, when the area of the face included in each of the M groups of portrait components is compared with the first threshold, that threshold is positively correlated with the size of the first target image; when the area of the region where each group is located is compared with the second threshold, that threshold is likewise positively correlated with the size of the first target image; and when both comparisons are performed, both thresholds are positively correlated with the size of the first target image.
  • the positive correlation may include, but is not limited to, a proportional relationship, an exponential relationship, and so on.
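  • The following sketch illustrates Method 1 and Method 2 with thresholds proportional to the image area; the ratio constants and the group data structure are illustrative assumptions, not values from the patent.

```python
# Sketch of Method 1 / Method 2: keep a group of portrait components as a
# "target group" when its face area (Method 1) and its total area
# (Method 2) pass thresholds that scale with the image size. The
# proportionality constants here are illustrative, not from the patent.
def select_target_groups(groups, image_area,
                         face_ratio=0.005, body_ratio=0.02):
    # Thresholds positively correlated with the first target image's size
    # (a simple proportional relationship is used here).
    face_threshold = face_ratio * image_area
    body_threshold = body_ratio * image_area
    targets = []
    for g in groups:  # g: dict with precomputed pixel areas per group
        if (g["face_area"] >= face_threshold
                and g["total_area"] >= body_threshold):
            targets.append(g)
    return targets
```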
  • Optionally, determining the target area where the target group of portrait components is located includes:
  • S1: Set the pixel values of the pixels corresponding to the M groups of portrait components to a first pixel value, and set the pixel values of the other pixels in the first target image to a second pixel value different from the first pixel value, to obtain a binary image;
  • S2: Perform area recognition on the binary image to obtain the target area, where the target area includes the pixels of the target face.
  • identifying multiple groups of portrait components and areas where multiple groups of portrait components are located from the first target image to be processed includes:
  • the first target image is processed through the recognition model to determine the multiple groups of portrait components and the regions where the multiple groups of portrait components are located.
  • Optionally, before the first target image is processed through the recognition model to determine the multiple groups of portrait components and the areas where they are located, the method further includes:
  • S1: Obtain a first set of training images with a set of region division results corresponding to them one-to-one, and a second set of training images with a set of training recognition results corresponding to them one-to-one;
  • S2: Train the initial recognition model based on the first set of training images and the second set of training images to obtain a trained recognition model, where the error between the estimated portrait areas recognized by the trained recognition model from the first set of training images and the known portrait areas in the set of region division results satisfies a first convergence condition, and the error between the portrait components recognized from the second set of training images and the known portrait components in the set of training recognition results satisfies a second convergence condition. The trained recognition model includes: a coding network used to encode images to obtain coded data, a portrait area recognition network that recognizes the portrait area from the coded data, and a portrait component recognition network that recognizes the portrait components from the coded data.
  • training the initial recognition model based on the first set of training images and the second set of training images includes:
  • The first training image and the second training image are input into the initial recognition model, where the initial recognition model includes: an initial coding network, an initial portrait area recognition network, and an initial portrait component recognition network; the initial coding network includes cascaded first convolutional layers, the initial portrait area recognition network includes cascaded second convolutional layers, and the initial portrait component recognition network includes cascaded third convolutional layers;
  • Each first convolutional layer in the initial coding network receives the coded data obtained by the previous first convolutional layer in the cascade encoding the first training image and the second training image, and sends its coded data to the corresponding second convolutional layer, the corresponding third convolutional layer, and the next first convolutional layer in the cascade. Through the initial portrait area recognition network, the coded data sent by the corresponding first convolutional layer and by the previous second convolutional layer in the cascade is received, and portrait area recognition is performed on the received coded data; through the initial portrait component recognition network, the coded data sent by the corresponding first convolutional layer and by the previous third convolutional layer in the cascade is received, and portrait component recognition is performed on the received coded data. A sketch of this topology is given below.
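  • One plausible PyTorch rendering of this topology: a shared encoder whose per-stage features are sent both to the next encoder stage and to the corresponding stages of the two decoder branches. Channel widths, depths, and class counts are illustrative assumptions, not values from the patent.

```python
# Sketch of a shared encoder feeding two decoder branches: a portrait-area
# (segmentation) branch and a portrait-component (parsing) branch.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class TwoBranchNet(nn.Module):
    """Each encoder stage's features go to the corresponding stage of
    both decoder branches (and to the next encoder stage)."""
    def __init__(self, widths=(32, 64, 128)):
        super().__init__()
        cins = (3,) + widths[:-1]
        self.enc = nn.ModuleList([conv_block(ci, w)
                                  for ci, w in zip(cins, widths)])
        # Decoder stages consume (encoder skip, upsampled deeper features).
        pairs = list(zip(widths[-2::-1], widths[:0:-1]))  # [(64,128),(32,64)]
        self.region_dec = nn.ModuleList([conv_block(s + d, s) for s, d in pairs])
        self.part_dec = nn.ModuleList([conv_block(s + d, s) for s, d in pairs])
        self.region_head = nn.Conv2d(widths[0], 2, 1)  # portrait / background
        self.part_head = nn.Conv2d(widths[0], 6, 1)    # e.g. hair, face, torso...

    def _decode(self, decs, skips):
        x = skips[-1]
        for dec, skip in zip(decs, skips[-2::-1]):
            x = F.interpolate(x, scale_factor=2, mode='bilinear',
                              align_corners=False)        # Upsample x2
            x = dec(torch.cat([skip, x], dim=1))           # 'C' then 3x3 conv
        return x

    def forward(self, x):
        skips = []
        for i, stage in enumerate(self.enc):
            if i:
                x = F.max_pool2d(x, 2)
            x = stage(x)
            skips.append(x)  # features sent to both decoder branches
        return (self.region_head(self._decode(self.region_dec, skips)),
                self.part_head(self._decode(self.part_dec, skips)))
```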
  • Through the embodiments of the present application, multiple groups of portrait components and the areas where they are located are identified from the first target image to be processed, where each group of portrait components corresponds to one human body. The target area where the target group of portrait components including a face is located is determined among those areas, and the area other than the target area in the first target image is blurred. In other words, the target group of portrait components including a face is recognized from the first target image, the target area where it is located is determined as the foreground area, and the areas where the other groups of portrait components, which do not include a human face, are located are determined as the background area, thereby improving the recognition accuracy for foreground people and reducing the occurrence of false detection in portrait recognition.
  • In an optional implementation, the first target image is a video frame image in a target video. After the area other than the target area in the first target image is blurred to obtain the second target image, the above method further includes: replacing the first target image in the target video with the second target image, and playing the second target image during playback of the target video. In this way, the images in the video can be blurred frame by frame.
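  • A minimal sketch of this per-frame replacement, reusing the hypothetical blur_background() and segment_components() helpers from the earlier sketch; the codec and container choices are illustrative.

```python
# Sketch: treat each frame as the "first target image", blur its
# background, and write the blurred frame in place of the original.
import cv2

def blur_video(path_in, path_out, segment_components):
    cap = cv2.VideoCapture(path_in)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(path_out, cv2.VideoWriter_fourcc(*'mp4v'),
                          fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        masks, has_face = segment_components(frame)  # hypothetical model call
        # blur_background() is the helper from the earlier sketch.
        out.write(blur_background(frame, masks, has_face))
    cap.release()
    out.release()
```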
  • In an optional implementation, the above-mentioned image processing method includes:
  • S402: Input the first target image to be processed into the recognition model to obtain the portrait recognition result output by the recognition model, where the portrait recognition result represents the portrait components recognized in the first target image;
  • S404: Determine the target connected area in the area where the portrait components are located in the first target image, where the target connected area includes the area of a face among the portrait components in the first target image;
  • S406: Blur the areas in the first target image other than the target connected area to obtain the second target image.
  • The recognition model can identify the portrait components in the input image, where the portrait components may include, but are not limited to: human face, hair, and torso.
  • As shown in FIG. 3, the portrait component 32 is a human face, the portrait components 34 and 38 are torsos, and the portrait component 36 is hair. It can be understood that the recognizable portrait components listed here are only examples, and the application is not limited thereto.
  • In this embodiment, since the background area outside the portrait needs to be blurred, and the image includes multiple portrait components, the area obtained by connecting the areas where mutually connected portrait components are located is the portrait area.
  • Take the portrait corresponding to the portrait component 38 shown in FIG. 3 as an example: such a portrait consisting only of a torso should obviously not be determined as the foreground.
  • Therefore, a connected area composed of portrait components that include a human face is determined as the target connected area, the target connected area is determined as the foreground area, the area other than the target connected area in the first target image is determined as the background area, and the background area outside the target connected area is blurred to obtain the processed second target image.
  • As for the connected area where the torso portrait component 38 is located, since it does not include a face component, it is also blurred; the hatched part in Figure 3 indicates the areas that have been blurred. In this way, the limbs of other people whose faces are not visible are determined as the background area, which achieves the technical effect of improving the accuracy of foreground-person recognition and reducing false detection in portrait recognition.
  • determining the target connected area in the area where the portrait component is located in the first target image includes:
  • S1 Determine each interconnected area in the area where the portrait component is located in the first target image as a candidate connected area, and obtain a set of candidate connected areas;
  • For example, a portrait in the image includes hair, face, and torso, and the areas where these portrait components are located are connected to each other. The interconnection here can be direct or indirect; for example, the area where the portrait component corresponding to the hair is located may be connected, through the area where the face is located, to the area where the portrait component corresponding to the torso is located. As shown in FIG. 3, the area where the portrait component 36 is located can be connected to the area where the portrait component 34 is located through the area where the portrait component 32 is located.
  • After the recognition model recognizes the area where the face is located, the area where the torso is located, and so on, the pixels in the areas where the identified portrait components are located are marked, and the connected areas in the image where pixels carrying the first type of mark are located may be determined as candidate connected areas; the pixels in one candidate connected area all carry the first type of mark. It is understandable that the same mark may be used for different portrait components, or a mark configured in advance for each kind of portrait component may be used.
  • Alternatively, the pixels of the connected areas where the portrait components are located may be set to a target pixel value, so that the connected areas where the target pixel value is located are determined as candidate connected areas; the pixels in one candidate connected area all have the target pixel value. It is understandable that when there are multiple portraits in the image, multiple candidate connected areas can be determined.
  • S2: Determine a candidate connected area in the group of candidate connected areas whose area is greater than a first threshold and which includes a face as a target connected area, or determine a candidate connected area in the group of candidate connected areas that includes a face as a target connected area.
  • In one manner, a candidate connected area including a human face in the group of candidate connected areas is determined as a target connected area. It can be understood that when multiple candidate connected areas all include regions corresponding to human faces, they can all be determined as target connected areas and thus serve as foreground areas. As shown in FIG. 5, there may be multiple portraits in the first target image, and the regions corresponding to each of the portraits that include a human face can all be determined as foreground areas.
  • In another manner, a candidate connected area in the group of candidate connected areas whose area is larger than the first threshold and which includes a human face is determined as the target connected area.
  • The first threshold can be set according to actual conditions; for example, it can be set to one-fourth to one-sixth of the area of the first target image, so that the first threshold is positively correlated with the size of the first target image and a fixed value that cannot adapt to images of different sizes is avoided. It is understandable that the first threshold here can also be set to a fixed value.
  • Optionally, determining each interconnected area in the area where the portrait components are located in the first target image as a candidate connected area, to obtain a group of candidate connected areas, includes:
  • In this embodiment, when determining the candidate connected areas, the first target image may first be binarized, so that the candidate connected areas where portraits are located can easily be determined by performing region recognition on the binary image. The pixel values of the pixels corresponding to the portrait components can be set to a first pixel value, and the pixel values of the other pixels can be set to a second pixel value, thereby converting the first target image into a binary image.
  • After the recognition result for the portrait components in the first target image is determined by the recognition model, the first target image can be binarized according to that recognition result to obtain the processed binary image.
  • Then, the target connected area can be determined from the group of candidate connected areas, and the area other than the target connected area in the first target image can be blurred to obtain the second target image with a blurred background.
  • Optionally, a connected-domain detection method may be used for the region recognition, as sketched below.
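  • A minimal sketch of this binarization plus connected-domain detection, using OpenCV's connected-component labeling; the layout of part_labels and the face_label convention are illustrative assumptions.

```python
# Sketch: pixels recognized as portrait components become 255, everything
# else 0; connected regions are then labeled, and a region is kept as a
# target connected area only if it contains face pixels (and, optionally,
# is large enough relative to the image).
import cv2
import numpy as np

def target_connected_areas(part_labels, face_label, min_ratio=0.0):
    """part_labels: HxW int array from the recognition model, 0 = background,
    other values = component classes; face_label marks face pixels."""
    binary = np.where(part_labels > 0, 255, 0).astype(np.uint8)
    n, labels = cv2.connectedComponents(binary)
    h, w = part_labels.shape
    targets = np.zeros((h, w), dtype=bool)
    for i in range(1, n):                          # label 0 is background
        region = labels == i
        has_face = np.any(part_labels[region] == face_label)
        big_enough = region.sum() >= min_ratio * h * w
        if has_face and big_enough:
            targets |= region
    return targets
```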
  • Optionally, the recognition model can be a deep neural network.
  • Before the first target image to be processed is input into the recognition model to obtain the portrait recognition result output by the recognition model, the method further includes:
  • Here, the trained recognition model includes: a coding network, a portrait area recognition network, and a portrait component recognition network. The coding network is used to encode the image to obtain coded data, the portrait area recognition network is used to recognize the portrait area from the coded data, and the portrait component recognition network is used to recognize the portrait components from the coded data.
  • In this embodiment, the first set of training images and a set of region division results corresponding to the first set of training images can be obtained, where each region division result in the set represents the division of the portrait area in the corresponding image in the first set of training images. Similarly, the second set of training images and a set of training recognition results corresponding to the second set of training images can be obtained, where each training recognition result in the set represents the recognition result for the portrait components in the corresponding image in the second set of training images. The initial recognition model is then trained through the first set of training images and the second set of training images.
  • Here, the initial recognition model includes an initial coding network, an initial portrait area recognition network, and an initial portrait component recognition network, and the trained recognition model has a trained coding network, portrait area recognition network, and portrait component recognition network.
  • Since the deployed recognition model is only used to recognize the portrait components in the input image, the portrait area recognition module is not needed at inference time; the portrait area recognition network in the trained recognition model can therefore be deleted to obtain the final recognition model. Deleting the portrait area recognition network reduces the amount of processing required and thus improves recognition efficiency.
  • During training, however, the portrait area recognition network is kept: the first set of training images and the set of region division results can be used to expand the amount of training data for the initial recognition model. Since both tasks first encode the image through the shared coding network, the first set of training images also effectively improves the accuracy of the coding network, thereby improving the recognition accuracy of the trained model.
  • training the initial recognition model based on the first set of training images and the second set of training images includes:
  • Optionally, the initial recognition model includes: an initial coding network, an initial portrait area recognition network, and an initial portrait component recognition network, where the initial coding network includes cascaded first convolutional layers, the initial portrait area recognition network includes cascaded second convolutional layers, and the initial portrait component recognition network includes cascaded third convolutional layers.
  • Through the first convolutional layers in the initial coding network, the initial recognition model receives the coded data obtained by the previous first convolutional layer in the cascade encoding the first training image and the second training image, and sends the coded data to the corresponding second convolutional layer, the corresponding third convolutional layer, and the next first convolutional layer in the cascade.
  • Through the initial portrait area recognition network, the initial recognition model receives the coded data sent by the corresponding first convolutional layer and the previous second convolutional layer in the cascade, and performs portrait area recognition on the received coded data.
  • Through the initial portrait component recognition network, the initial recognition model receives the coded data sent by the corresponding first convolutional layer and the previous third convolutional layer in the cascade, and performs portrait component recognition on the received coded data.
  • It should be noted that the initial coding network in the initial recognition model of the embodiments of the present application becomes the coding network in the trained recognition model after training is completed; likewise, the initial portrait area recognition network becomes the portrait area recognition network in the trained recognition model, and the initial portrait component recognition network becomes the portrait component recognition network in the trained recognition model.
  • As shown in FIG. 8, each network includes multiple cascaded convolutional layers. The first of the cascaded first convolutional layers in the initial coding network encodes the image input into the initial recognition model and sends the coded data to the next first convolutional layer in the cascade, to the corresponding second convolutional layer in the initial portrait area recognition network, and to the corresponding third convolutional layer in the initial portrait component recognition network.
  • Each second convolutional layer in the initial portrait area recognition network receives the output data of the previous second convolutional layer in the cascade and the data transmitted by the corresponding first convolutional layer; each third convolutional layer in the initial portrait component recognition network receives the data output by the previous third convolutional layer in the cascade and the data transmitted by the corresponding first convolutional layer.
  • The initial portrait area recognition network and the initial portrait component recognition network in the embodiments of the present application respectively perform the two tasks of portrait segmentation and portrait component parsing. While expanding the scale of the training data, this allows the initial recognition model to simultaneously receive the whole-body perceptual supervision provided by the portrait segmentation task and the part-level detail supervision provided by the portrait component parsing task, thereby improving the performance of the model.
  • Optionally, the first convolutional layer can be a dense convolutional layer; that is, it can include multiple densely connected residual modules, so as to efficiently encode the image at different scales, so that the features contain rich information at different scales.
  • It should be noted that the architecture of the trained recognition model in the embodiments of this application is the same as that of the initial recognition model. The deployed recognition model differs from the initial recognition model only in that it has no portrait area recognition network; the data transmission architecture of its coding network and portrait component recognition network is the same as the architecture shown in Figs. 8-9.
  • In Fig. 9, k×k denotes a convolution operation with a kernel (convolution layer) of size k×k, C denotes concatenation of feature channels, Add denotes element-wise addition of features, and Upsample denotes a bilinear interpolation operation with an upsampling factor of 2.
  • the first convolutional layer may include a plurality of densely connected residual modules.
  • the architecture of the second convolutional layer may be the same as the architecture of the third convolutional layer shown in FIG. 9.
  • the input of each third convolutional layer includes the output of the previous third convolutional layer in the cascade and the output of the corresponding first convolutional layer.
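  • One reading of the "densely connected residual modules" mentioned above is sketched below: each module consumes the channel concatenation ('C') of all earlier module outputs, and the layer adds a residual shortcut ('Add'). This is an interpretation for illustration, not the patent's exact layout.

```python
# Sketch of a densely connected residual layer for the encoder's
# first-type convolutional layers.
import torch
import torch.nn as nn

class DenseResidualLayer(nn.Module):
    def __init__(self, channels, n_modules=3):
        super().__init__()
        # Module i sees the concatenation of the input and all i earlier
        # module outputs, so its input width is channels * (i + 1).
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels * (i + 1), channels, 3, padding=1),
                nn.ReLU(inplace=True))
            for i in range(n_modules)])

    def forward(self, x):
        feats = [x]
        for block in self.blocks:
            y = block(torch.cat(feats, dim=1))  # dense connection ('C')
            feats.append(y)
        return x + feats[-1]                    # residual shortcut ('Add')
```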
  • Optionally, the training loss Loss may take the following form (reconstructed here from the surrounding symbol definitions):

    Loss = Σ_{(I, S_gt) ∈ HS} CrossEntropy(S(I), S_gt) + Σ_{(I, P_gt) ∈ HP} CrossEntropy(P(I), P_gt)

  • where CrossEntropy(·) represents the cross-entropy loss; S(I) and P(I) denote the model's portrait segmentation and portrait component parsing predictions for image I; HS represents the portrait segmentation data set, which contains N training examples, such as the first set of training images; S_gt represents the ground-truth portrait segmentation label corresponding to image I, which can be determined according to the set of region division results; HP represents the portrait component parsing data set, which contains M training examples, such as the second set of training images; and P_gt represents the ground-truth portrait component parsing label corresponding to image I, which can be determined according to the set of training recognition results. It is understandable that when the above training loss Loss is less than a set value, the convergence condition can be considered to be met.
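  • A sketch of this two-term objective for a model with the two-branch structure sketched earlier; the batching and sampling details are assumptions.

```python
# Sketch of the combined training objective: a cross-entropy term over a
# batch from the portrait-segmentation set HS plus a cross-entropy term
# over a batch from the portrait-component-parsing set HP. Both batches
# pass through the shared encoder; each supervises its own branch.
import torch.nn.functional as F

def training_loss(model, seg_batch, part_batch):
    img_s, s_gt = seg_batch    # from HS: image, portrait-segmentation label
    img_p, p_gt = part_batch   # from HP: image, component-parsing label
    region_logits, _ = model(img_s)   # segmentation branch supervised by HS
    _, part_logits = model(img_p)     # parsing branch supervised by HP
    return (F.cross_entropy(region_logits, s_gt) +
            F.cross_entropy(part_logits, p_gt))
```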
  • the recognition model in the embodiment of the present application may be a Deep Neural Network (DNN) model, a Convolutional Neural Network (CNN) model, etc. in a classification model based on deep learning.
  • In an optional implementation, when the first target image is a video frame image in a target video, after the second target image is obtained, the method further includes:
  • S902: Replace the first target image in the target video with the second target image;
  • S904: Play the second target image during playback of the target video.
  • Optionally, the first target image may be a video frame image in a target video, for example, an image frame in a video transmitted in a video conference. In this embodiment, the area of the first target image other than the target connected area can be blurred to obtain the second target image, and the first target image in the target video can be replaced with the second target image, so that the second target image is played when the target video is played and the blurred background highlights the participants in the video conference.
  • the above-mentioned blurring processing is only an optional embodiment provided by the present application, and the present application is not limited to this.
  • an image processing device for implementing the above-mentioned image processing method.
  • the device includes:
  • the first recognition unit 1001 is configured to recognize multiple groups of portrait components and areas where the multiple groups of portrait components are located from the first target image to be processed, wherein each group of portrait components corresponds to a human body;
  • the first determining unit 1003 is configured to determine, in the areas where the multiple groups of portrait components are located, the target area where the target group of portrait components is located, where the target group of portrait components includes a target face, and the target area is separated from the areas where the other groups of portrait components are located;
  • the first processing unit 1005 is configured to perform blurring processing on areas in the first target image excluding the target area to obtain a second target image.
  • the foregoing first determining unit 1003 may include:
  • the first determining module is used to determine the M groups of portrait components including faces among the N groups of portrait components, where the N groups of portrait components are multiple groups of portrait components, and N ⁇ M ⁇ 1;
  • the second determining module is used to determine, in the areas where the M groups of portrait components are located, the target area where the target group of portrait components is located, where the area of the face included in the target group of portrait components is greater than or equal to the first threshold, and/or the area of the region where the target group of portrait components is located is greater than or equal to the second threshold.
  • the first threshold and/or the second threshold are positively correlated with the size of the first target image.
  • the foregoing second determining module may include:
  • the setting sub-module is used to set the pixel values of the pixels corresponding to the M groups of portrait components to the first pixel value, and set the pixel values of the other pixels in the first target image to the second pixel value, to obtain a binary image, where the first pixel value is different from the second pixel value;
  • the processing sub-module is used to perform area recognition on the binary image to obtain the target area, where the target area includes the pixel points of the target face.
  • Through the above modules, the first recognition unit 1001 recognizes multiple groups of portrait components and the areas where they are located from the first target image to be processed, where each group of portrait components corresponds to one human body; the first determining unit 1003 determines, in those areas, the target area where the target group of portrait components is located, where the target group of portrait components includes a target face and the target area is separated from the areas where the other groups of portrait components are located; and the first processing unit 1005 blurs the areas in the first target image other than the target area to obtain a second target image.
  • In other words, the area outside the target area is blurred; that is, the target group of portrait components including the face is recognized from the first target image, the target area where that group is located is determined as the foreground area, and the limbs of other people whose faces are not visible are determined as the background area, thereby improving the accuracy of foreground-person recognition and reducing false detection in portrait recognition.
  • the foregoing apparatus may further include:
  • the replacement unit is used, when the first target image is a video frame image in the target video and the first processing unit has blurred the area other than the target area in the first target image to obtain the second target image, to replace the first target image in the target video with the second target image;
  • the playing unit is used to play the second target image during the process of playing the target video.
  • an image processing device for implementing the above-mentioned image processing method.
  • the device includes:
  • the second recognition unit 1102 is used to input the first target image to be processed into the recognition model to obtain the portrait recognition result output by the recognition model, where the recognition model is used to recognize the portrait components in the image, and the portrait recognition result is used to represent the portrait components recognized in the first target image;
  • the second determining unit 1104 is configured to determine the target connected area in the area where the portrait component is located in the first target image, where the target connected area includes the area of the face in the portrait component in the first target image;
  • the second processing unit 1106 is configured to perform blurring processing on areas in the first target image excluding the connected areas of the target to obtain a second target image.
  • the recognition model can recognize the portrait components in the input image, where the portrait components may include but are not limited to: human face, hair, and torso.
  • In this embodiment, since the background area outside the portrait needs to be blurred and the image includes multiple portrait components, the area obtained by connecting the areas where mutually connected portrait components are located is the portrait area.
  • The connected area including the human face is determined as the target connected area and thus as the foreground area; the area other than the target connected area is determined as the background area, and the background area outside the target connected area in the first target image is blurred to obtain the processed second target image.
  • In this way, the embodiments of the present application can determine the limbs of other people whose faces are not visible as the background area, thereby improving the accuracy of foreground-person recognition and reducing false detections in portrait recognition.
  • the second determining unit 1104 includes:
  • the third determining module is configured to determine each interconnected area in the area where the portrait component is located in the first target image as a candidate connected area to obtain a group of candidate connected areas;
  • the fourth determining module is used to determine a candidate connected area in the group of candidate connected areas whose area is greater than the first threshold and which includes a face as a target connected area, or to determine a candidate connected area in the group of candidate connected areas that includes a face as a target connected area.
  • the first threshold is positively correlated with the size of the first target image.
  • the first determining module is specifically used for:
  • the foregoing device may further include:
  • the first acquisition unit is used to acquire a first set of training images, a second set of training images, a set of region division results, and a set of training recognition results, where the first set of training images corresponds to a set of region division results one-to-one, Each region division result is used to represent the known portrait region in an image in the first group of training images, the second group of training images corresponds to a group of training recognition results one-to-one, and each training recognition result is used to represent the second group Known portrait parts in an image in the training image;
  • the training unit is used to train the initial recognition model based on the first set of training images and the second set of training images to obtain a trained recognition model, where the error between the estimated portrait areas recognized by the trained recognition model from the first set of training images and the known portrait areas in the set of region division results satisfies the first convergence condition, and the error between the estimated portrait components recognized from the second set of training images and the known portrait components in the set of training recognition results satisfies the second convergence condition;
  • the trained recognition model includes: an encoding module used to encode images to obtain coded data, a portrait area recognition module used to recognize the portrait area from the coded data, and a portrait component recognition module used to recognize the estimated portrait components from the coded data;
  • the second processing unit is used to delete the portrait region recognition module in the trained recognition model to obtain the target recognition model.
  • the training unit includes:
  • the input module is used to select the first training image from the first set of training images and the second training image from the second set of training images, and to input the first training image and the second training image into the initial recognition model, where the initial recognition model includes: an initial coding network, an initial portrait area recognition network, and an initial portrait component recognition network; the initial coding network includes cascaded first convolutional layers, the initial portrait area recognition network includes cascaded second convolutional layers, and the initial portrait component recognition network includes cascaded third convolutional layers. Each first convolutional layer in the initial coding network is used to receive the coded data obtained by the previous first convolutional layer in the cascade encoding the first training image and the second training image, and to send the coded data to the corresponding second convolutional layer, the corresponding third convolutional layer, and the next first convolutional layer in the cascade; the initial portrait area recognition network is used to receive the coded data sent by the corresponding first convolutional layer and the previous second convolutional layer in the cascade, and to perform portrait area recognition on the received coded data; the initial portrait component recognition network is used to receive the coded data sent by the corresponding first convolutional layer and the previous third convolutional layer in the cascade, and to perform portrait component recognition on the received coded data.
  • An embodiment of the present application further provides an electronic device for implementing the foregoing image processing method. As shown in FIG. 12, the electronic device includes a memory 1202 and a processor 1204; the memory 1202 stores a computer program, and the processor 1204 is configured to execute the steps in any one of the foregoing method embodiments through the computer program.
  • the above-mentioned electronic device may be located in at least one network device among a plurality of network devices in a computer network.
  • the foregoing processor may be configured to execute the following steps through a computer program:
  • S1. Recognize, from a first target image to be processed, multiple groups of portrait components and regions where the multiple groups of portrait components are located, where each group of portrait components corresponds to one human body;
  • S2. Determine, among the regions where the multiple groups of portrait components are located, a target region where a target group of portrait components is located, where the target group of portrait components includes a target face, and the target region is separated from regions where groups of portrait components other than the target group are located;
  • S3. Blur a region other than the target region in the first target image to obtain a second target image.
  • the structure shown in FIG. 12 is only illustrative, and the electronic device may also be a terminal device such as a smartphone (such as an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), or a PAD.
  • FIG. 12 does not limit the structure of the above electronic device.
  • the electronic device may further include more or fewer components (such as a network interface) than shown in FIG. 12, or have a configuration different from that shown in FIG. 12.
  • the memory 1202 can be used to store software programs and modules, such as the program instructions/modules corresponding to the image processing method and apparatus in the embodiments of the present application.
  • the processor 1204 runs the software programs and modules stored in the memory 1202 so as to perform various functional applications and data processing, that is, to implement the above-mentioned image processing method.
  • the memory 1202 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory 1202 may further include memories disposed remotely from the processor 1204, and these remote memories may be connected to the terminal through a network.
  • the memory 1202 can specifically be, but is not limited to being, used to store information such as the first target image and the second target image.
  • the memory 1202 may include, but is not limited to, the first recognition unit 1001, the first determination unit 1003, and the first processing unit 1005 of the image processing apparatus described above.
  • it may further include, but is not limited to, other module units of the above-mentioned image processing apparatus, which will not be repeated in this example.
  • the aforementioned transmission device 1206 is used to receive or send data via a network.
  • the above-mentioned specific examples of networks may include wired networks and wireless networks.
  • the transmission device 1206 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices and routers via a network cable so as to communicate with the Internet or a local area network.
  • the transmission device 1206 is a radio frequency (RF) module, which is used to communicate with the Internet in a wireless manner.
  • the above-mentioned electronic device further includes: a display 1208 for displaying the above-mentioned first target image and second target image; and a connection bus 1210 for connecting each module component in the above-mentioned electronic device.
  • a computer-readable storage medium is further provided, and a computer program is stored in the computer-readable storage medium, where the computer program is configured to perform, when run, the steps in any one of the foregoing method embodiments.
  • the foregoing computer-readable storage medium may be configured to store a computer program for executing the following steps:
  • S1. Recognize, from a first target image to be processed, multiple groups of portrait components and regions where the multiple groups of portrait components are located, where each group of portrait components corresponds to one human body;
  • S2. Determine, among the regions where the multiple groups of portrait components are located, a target region where a target group of portrait components is located, where the target group of portrait components includes a target face, and the target region is separated from regions where groups of portrait components other than the target group are located;
  • S3. Blur a region other than the target region in the first target image to obtain a second target image.
  • the storage medium may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
  • if the integrated unit in the foregoing embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in the foregoing computer-readable storage medium.
  • based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to perform all or some of the steps of the methods described in the various embodiments of the present application.
  • the disclosed client can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other division methods in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, units or modules, and may be in electrical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

An image processing method and apparatus, a storage medium, and an electronic device. The method includes: recognizing, from a first target image to be processed, multiple groups of portrait components and regions where the multiple groups of portrait components are located, where each group of portrait components corresponds to one human body (S202); determining, among the regions where the multiple groups of portrait components are located, a target region where a target group of portrait components is located, where the target group of portrait components includes a target face, and the target region is separated from regions where groups of portrait components other than the target group among the multiple groups are located (S204); and blurring a region other than the target region in the first target image to obtain a second target image (S206). The above method and apparatus, storage medium, and electronic device solve the technical problem of false detection in portrait recognition caused by inaccurate recognition of foreground persons.

Description

Image processing method and apparatus, storage medium, and electronic device
This application claims priority to Chinese Patent Application No. 201911175754.X, entitled "Image processing method and apparatus, storage medium, and electronic device" and filed with the China National Intellectual Property Administration on November 26, 2019, which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of artificial intelligence (AI), and in particular to an image processing technology.
Background
In the related art, the regions in an image are simply defined as a portrait foreground and a non-portrait background. When an image contains multiple persons, the related art usually has difficulty accurately recognizing the foreground person in the image and often also recognizes a portrait that has only part of a body as foreground, causing false detections in foreground portrait recognition.
No effective solution to the above problem has been proposed so far.
Summary
Embodiments of this application provide an image processing method and apparatus, a storage medium, and an electronic device, which can accurately recognize the foreground person in an image and avoid false detection of the foreground person.
According to one aspect of the embodiments of this application, an image processing method is provided, including:
recognizing, from a first target image to be processed, multiple groups of portrait components and regions where the multiple groups of portrait components are located, where each group of portrait components corresponds to one human body;
determining, among the regions where the multiple groups of portrait components are located, a target region where a target group of portrait components is located, where the target group of portrait components includes a target face, and the target region is separated from regions where groups of portrait components other than the target group among the multiple groups are located; and
blurring a region other than the target region in the first target image to obtain a second target image.
According to another aspect of the embodiments of this application, an image processing apparatus is further provided, including:
a first recognition unit, configured to recognize, from a first target image to be processed, multiple groups of portrait components and regions where the multiple groups of portrait components are located, where each group of portrait components corresponds to one human body;
a first determination unit, configured to determine, among the regions where the multiple groups of portrait components are located, a target region where a target group of portrait components is located, where the target group of portrait components includes a target face, and the target region is separated from regions where groups of portrait components other than the target group among the multiple groups are located; and
a first processing unit, configured to blur a region other than the target region in the first target image to obtain a second target image.
According to yet another aspect of the embodiments of this application, a computer-readable storage medium is further provided. The computer-readable storage medium stores a computer program, and the computer program is configured to perform the above image processing method when run.
According to yet another aspect of the embodiments of this application, an electronic device is further provided, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor performs the above image processing method through the computer program.
According to yet another aspect of the embodiments of this application, a computer program product is further provided, which, when run on a computer, causes the computer to perform the above image processing method.
In the embodiments of this application, multiple groups of portrait components and the regions where the multiple groups of portrait components are located are recognized from a first target image to be processed, where each group of portrait components corresponds to one human body; a target region including a face is determined among the regions where the multiple groups of portrait components are located; and the region other than the target region in the first target image is blurred. In other words, a target group of portrait components including a face is recognized from the first target image, the target region where the target group is located is determined as the foreground region, and the limbs of other persons that do not include a face are determined as the background region, which improves the accuracy of foreground person recognition and reduces false detections in portrait recognition.
Brief Description of the Drawings
The accompanying drawings described here are used to provide a further understanding of this application and constitute a part of this application. The exemplary embodiments of this application and their descriptions are used to explain this application and do not constitute an improper limitation on this application. In the drawings:
FIG. 1 is a schematic diagram of an application environment of an optional image processing method according to an embodiment of this application;
FIG. 2 is a schematic flowchart of an optional image processing method according to an embodiment of this application;
FIG. 3 is a schematic diagram of an image processed by an optional image processing method according to an embodiment of this application;
FIG. 4 is a schematic flowchart of another optional image processing method according to an embodiment of this application;
FIG. 5 is a schematic diagram of an image processed by another optional image processing method according to an embodiment of this application;
FIG. 6 is a schematic diagram of an image processed by yet another optional image processing method according to an embodiment of this application;
FIG. 7 is a schematic flowchart of another optional image processing method according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of an initial recognition model according to an embodiment of this application;
FIG. 9 is a schematic structural diagram of an encoding network according to an embodiment of this application;
FIG. 10 is a schematic structural diagram of an optional image processing apparatus according to an embodiment of this application;
FIG. 11 is a schematic structural diagram of another optional image processing apparatus according to an embodiment of this application;
FIG. 12 is a schematic structural diagram of an optional electronic device according to an embodiment of this application.
Detailed Description
To enable those skilled in the art to better understand the solutions of this application, the technical solutions in the embodiments of this application will be described clearly and completely below with reference to the accompanying drawings in the embodiments of this application. Obviously, the described embodiments are only a part rather than all of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative work shall fall within the protection scope of this application.
It should be noted that the terms "first", "second", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of this application described here can be implemented in an order other than those illustrated or described here. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to those steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
Machine Learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how computers simulate or implement human learning behaviors to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and its applications cover all fields of artificial intelligence. Machine learning and deep learning usually include technologies such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration. In the embodiments of this application, a recognition model for recognizing the portrait components in an image can be obtained through machine learning training, so that the portrait components in an input image are recognized by the recognition model and the connected region where the portrait components including a face are located is determined as the foreground region. In this way, the limbs of other persons that do not include a face can be determined as the background image, thereby improving the accuracy of foreground person recognition and reducing false detections in portrait recognition.
According to one aspect of the embodiments of this application, an image processing method is provided. As an optional implementation, the above image processing method can be, but is not limited to being, applied to the environment shown in FIG. 1.
The image processing method in the embodiments of this application can be used to process static images such as photos, blurring the background region outside the portrait in the image. It can also be used to process video frame images in a video, blurring the background region outside the portrait in each video frame image; by blurring the background of every frame in the video, the region outside the foreground portrait in the video is kept blurred. The video here may be video data generated in a video conference. It can of course be understood that the application of the image processing method in the embodiments of this application is not limited to the above examples.
The user equipment 102 may perform, through the processor 106, S120: recognize, from a first target image to be processed, multiple groups of portrait components and the regions where the multiple groups of portrait components are located, where each group of portrait components corresponds to one human body; the portrait components here may include hair, a face, a torso, and so on. S122: determine, among the regions where the multiple groups of portrait components are located, a target region where a target group of portrait components is located, where the target group of portrait components includes a target face, and the target region is separated from the regions where groups of portrait components other than the target group are located. It can be understood that portrait components such as the hair, face, and torso that belong to the same portrait in an image are connected to one another, so when a portrait exists in the first target image, at least one connected region can be determined; since the main persons in photos or videos show their faces, the connected region including the region where a face is located can be determined as the target connected region, that is, the foreground region of the first target image. S124: blur the region other than the target region in the first target image to obtain a second target image; here, the region outside the target region is determined as the background region of the first target image and blurred to obtain the processed second target image. In the embodiments of this application, by recognizing the portrait components in the first target image and determining the connected region including a face as the foreground region, the regions where other groups of portrait components that do not include a face are located can be determined as background, thereby improving the accuracy of foreground person recognition and reducing false detections in portrait recognition. Here, the user equipment 102 may store the first target image and the second target image in its memory, and display the first target image and the processed second target image on the display 108.
Optionally, in this embodiment, the above image processing method can be, but is not limited to being, applied in the user equipment 102, and an application (APP) may blur the background region in the image. The above APP may run, but is not limited to running, in the user equipment 102, and the user equipment 102 may be, but is not limited to, a terminal device that supports running APPs, such as a mobile phone, a tablet computer, a notebook computer, or a PC.
It can be understood that the above image processing method can also be applied to a server, which assists in blurring the background region in the image and sends the processed second target image to the user equipment 102. The server and the user equipment 102 may implement, but are not limited to implementing, data exchange through a network, which may include, but is not limited to, a wireless network or a wired network. The wireless network includes Bluetooth, WIFI, and other networks that implement wireless communication. The wired network may include, but is not limited to, wide area networks, metropolitan area networks, and local area networks. The above is only an example, and this embodiment does not impose any limitation on this.
As an optional implementation, the image processing method provided in the embodiments of this application may be performed by an electronic device (such as user equipment or a server). As shown in FIG. 2, the above image processing method includes:
Step 202: Recognize, from a first target image to be processed, multiple groups of portrait components and regions where the multiple groups of portrait components are located, where each group of portrait components corresponds to one human body.
Step 204: Determine, among the regions where the multiple groups of portrait components are located, a target region where a target group of portrait components is located, where the target group of portrait components includes a target face, and the target region is separated from regions where groups of portrait components other than the target group are located.
Step 206: Blur a region other than the target region in the first target image to obtain a second target image.
In the embodiments of this application, the portrait components in the first target image may be recognized by a recognition model, and the portrait components here may include, but are not limited to, a face, hair, and a torso. As shown in FIG. 3, after the first target image is input to the recognition model, the recognition model can recognize, in the first target image, a portrait component 32 that is a face, portrait components 34 and 38 that are torsos, and a portrait component 36 that is hair. It can be understood that the portrait components that can be recognized here are only examples, and this application is not limited thereto.
In the embodiments of this application, the background region outside the portrait needs to be blurred. As shown in FIG. 3, the image includes multiple portrait components, and the regions where these portrait components are located are the portrait regions. It can be understood that the region where multiple connected portrait components are located is in fact a portrait region. Here, other portraits may accidentally enter the image during image capture such as photographing, such as the portrait corresponding to the portrait component 38 shown in FIG. 3, but such a portrait having only a torso should obviously not be determined as foreground.
In the embodiments of this application, a group of portrait components including a face may be determined as the target group of portrait components, the region where the target group of portrait components is located may be determined as the target region, and further, the target region may be determined as the foreground region of the first target image, while the region other than the target region in the first target image is determined as the background region and blurred to obtain the processed second target image. As shown in FIG. 3, since the region corresponding to the torso portrait component 38 does not include a face portrait component, it needs to be blurred; the hatched parts in FIG. 3 indicate that these regions have been blurred. In this way, the limbs of other persons that do not include a face can be determined as the background region, which improves the accuracy of foreground person recognition, reduces false detections in portrait recognition, and solves the technical problem of false detections in portrait recognition caused by inaccurate foreground person recognition.
In this embodiment, a group of portrait components may include some or all of the components corresponding to one human body. For example, the face, arms, and hands of a subject S, these three components, may form a group of portrait components, and the group of portrait components corresponds to subject S.
It should be noted that the target region is separated from the regions where groups of portrait components other than the target group among the multiple groups are located. For example, the group of portrait components corresponding to subject S is the target region, and the target region is not connected to the regions where the other subjects in the first target image are located.
Optionally, in this embodiment, determining, among the regions where the multiple groups of portrait components are located, the target region where the target group of portrait components is located in step S204 includes:
S1: Determine, among N groups of portrait components, M groups of portrait components that include faces, where the N groups of portrait components are the multiple groups of portrait components recognized from the first target image, and N ≥ M ≥ 1.
S2: Determine, among the regions where the M groups of portrait components are located, the target region where the target group of portrait components is located, where the area of the region where the face included in the target group of portrait components is located is greater than or equal to a first threshold, and/or the area of the region where the target group of portrait components is located is greater than or equal to a second threshold.
For example, in an image, the torso of subject A, the face of subject A, and the two arms of subject A are recognized, forming a group of portrait components corresponding to subject A; the torso of subject B is recognized, forming a group of portrait components corresponding to subject B; and the torso of subject C, the face of subject C, and one arm of subject C are recognized, forming a group of portrait components corresponding to subject C. That is, there are three groups of portrait components in the image, of which two groups include faces. The target region where the target group of portrait components is located is determined among these two groups in one of the following manners:
Manner 1: Determine whether the area of the region where the face included in each group of portrait components is located is greater than or equal to the first threshold; if so, the group can be determined as the target group of portrait components, and the region where it is located is the target region. In an image, when subject M is in front of subject N, the face area of subject N in the image is smaller than the face area of subject M. For example, in the image, the face area of subject M is 3 square centimeters and the face area of subject N is 2 square centimeters; assuming the first threshold is 3 square centimeters, only subject M satisfies the condition, and the group of portrait components corresponding to subject M can then be taken as the target group of portrait components.
Manner 2: Determine whether the area of the region where each group of portrait components is located is greater than or equal to the second threshold; if so, the group can be determined as the target group of portrait components, and the region where it is located is the target region. For example, the face area of subject M is 3 square centimeters and the region where the torso and arms are located is 5 square centimeters, while the face area of subject N is 2 square centimeters but the region where the torso and arms are located is 10 square centimeters. Assuming the second threshold is 8 square centimeters, both subject M and subject N satisfy the condition (the region of a group covers both the face and the body, so the region of subject M amounts to 3 + 5 = 8 square centimeters and that of subject N to 2 + 10 = 12 square centimeters), and the group of portrait components corresponding to subject M and the group corresponding to subject N can both be taken as target portrait components.
It should also be noted that, in the embodiments of this application, the target group of portrait components may be determined by requiring both Manner 1 and Manner 2 above, that is, a group whose face region area is greater than or equal to the first threshold and whose overall region area is greater than or equal to the second threshold is determined as the target group of portrait components.
In this embodiment, the first threshold and/or the second threshold are positively correlated with the size of the first target image. That is, when the area of the region where the faces included in the M groups of portrait components are located is compared with the first threshold, the first threshold used is positively correlated with the size of the first target image; when the area of the region where the M groups of portrait components are located is compared with the second threshold, the second threshold used is positively correlated with the size of the first target image; and when both comparisons are performed, both the first threshold and the second threshold are positively correlated with the size of the first target image. The positive correlation may include, but is not limited to, a proportional relationship, an exponential relationship, and so on.
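By way of illustration and not limitation, the threshold selection described above can be sketched in Python as follows. The boolean-mask representation of a group of portrait components and the proportionality constants face_ratio and region_ratio are assumptions of this sketch, since the embodiments only require that the thresholds be positively correlated with the size of the first target image.

    import numpy as np

    def select_target_groups(groups, image_shape,
                             face_ratio=0.02, region_ratio=0.10):
        """Pick the groups of portrait components whose face area and
        total region area exceed thresholds that scale with image size.

        `groups` is assumed to be a list of dicts of boolean masks:
        {"face": HxW bool array, "region": HxW bool array}.
        """
        h, w = image_shape[:2]
        first_threshold = face_ratio * h * w      # threshold on face area
        second_threshold = region_ratio * h * w   # threshold on group area

        targets = []
        for g in groups:
            face_area = int(np.count_nonzero(g["face"]))
            region_area = int(np.count_nonzero(g["region"]))
            if face_area >= first_threshold and region_area >= second_threshold:
                targets.append(g)
        return targets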
Optionally, determining, among the regions where the M groups of portrait components are located, the target region where the target group of portrait components is located includes:
S1: Set the pixel values of the pixels corresponding to the M groups of portrait components to a first pixel value, and set the pixel values of the pixels in the first target image other than those corresponding to the M groups of portrait components to a second pixel value, to obtain a binary image, where the first pixel value is different from the second pixel value.
S2: Perform region identification on the binary image to obtain the target region, where the target region includes the pixels of the target face.
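A minimal sketch of steps S1 and S2, assuming the M groups of portrait components and the target face are available as boolean masks, may look as follows; cv2.connectedComponents stands in for the region identification, which the embodiments do not tie to any particular algorithm.

    import cv2
    import numpy as np

    def binarize_and_find_target(component_mask, face_mask):
        """S1: component pixels -> first pixel value (255), others -> 0.
        S2: identify connected regions and keep those containing pixels
        of the target face."""
        binary = np.where(component_mask, 255, 0).astype(np.uint8)   # S1
        num_labels, labels = cv2.connectedComponents(binary)          # S2
        target = np.zeros(binary.shape, dtype=bool)
        for label in range(1, num_labels):        # label 0 is the background
            region = labels == label
            if np.any(region & face_mask):        # region contains face pixels
                target |= region
        return target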
Optionally, in this embodiment, recognizing, from the first target image to be processed, the multiple groups of portrait components and the regions where the multiple groups of portrait components are located includes:
processing the first target image through a recognition model to determine the multiple groups of portrait components and the regions where the multiple groups of portrait components are located.
Optionally, in this embodiment, before the first target image is processed through the recognition model to determine the multiple groups of portrait components and the regions where they are located, the method includes:
S1: Acquire a first group of training images, a second group of training images, a group of region division results, and a group of training recognition results, where the first group of training images corresponds one-to-one to the group of region division results, each region division result represents a known portrait region in one image of the first group of training images, the second group of training images corresponds one-to-one to the group of training recognition results, and each training recognition result represents known portrait components in one image of the second group of training images.
S2: Train an initial recognition model based on the first group of training images and the second group of training images to obtain a trained recognition model, where the error between the estimated portrait regions recognized by the trained recognition model from the first group of training images and the known portrait regions in the group of region division results satisfies a first convergence condition, and the error between the estimated portrait components recognized by the trained recognition model from the second group of training images and the known portrait components in the group of training recognition results satisfies a second convergence condition. The trained recognition model includes: an encoding network for encoding an image to obtain encoded data, a portrait region recognition network for recognizing a portrait region from the encoded data, and a portrait component recognition network for recognizing portrait components from the encoded data.
Training the initial recognition model based on the first group of training images and the second group of training images includes:
selecting a first training image from the first group of training images, and selecting a second training image from the second group of training images;
inputting the first training image and the second training image to the initial recognition model, where the initial recognition model includes an initial encoding network, an initial portrait region recognition network, and an initial portrait component recognition network; the initial encoding network includes cascaded first convolutional layers, the initial portrait region recognition network includes cascaded second convolutional layers, and the initial portrait component recognition network includes cascaded third convolutional layers; and
receiving, by a first convolutional layer in the initial encoding network, the encoded data obtained by the previous first convolutional layer in the cascade encoding the first training image and the second training image, and sending the encoded data to the corresponding second convolutional layer, the corresponding third convolutional layer, and the next first convolutional layer in the cascade; receiving, by the initial portrait region recognition network, the encoded data sent by the corresponding first convolutional layer and the previous second convolutional layer in the cascade, and performing portrait region recognition on the received encoded data; and receiving, by the initial portrait component recognition network, the encoded data sent by the corresponding first convolutional layer and the previous third convolutional layer in the cascade, and performing portrait component recognition on the received encoded data.
In the embodiments of this application, multiple groups of portrait components and the regions where they are located are recognized from the first target image to be processed, where each group of portrait components corresponds to one human body; the target region where the target group of portrait components including a face is located is determined among the regions where the multiple groups of portrait components are located; and the region other than the target region in the first target image is blurred. In other words, the target group of portrait components including a face is recognized from the first target image, the target region where the target group is located is determined as the foreground region, and the regions where other groups of portrait components that do not include a face are located are determined as the background region. In this way, the accuracy of foreground person recognition is improved, and the occurrence of false detections in portrait recognition is reduced.
As an optional embodiment, when the first target image is a video frame image in a target video, after the region other than the target region in the first target image is blurred to obtain the second target image, the above method further includes: replacing the first target image in the target video with the second target image; and playing the second target image in the course of playing the target video, thereby blurring the pictures in the video.
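A minimal sketch of this frame replacement with OpenCV follows; process_frame is a placeholder for the per-frame recognition and blurring described above, and the output codec is an assumption of this sketch.

    import cv2

    def blur_video_background(src_path, dst_path, process_frame):
        """Replace each first target image (video frame) with the second
        target image produced by `process_frame`."""
        cap = cv2.VideoCapture(src_path)
        fps = cap.get(cv2.CAP_PROP_FPS)
        w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        out = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"),
                              fps, (w, h))
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            out.write(process_frame(frame))   # second target image replaces the frame
        cap.release()
        out.release()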
As yet another optional implementation, as shown in FIG. 4, the above image processing method includes:
S402: Input a first target image to be processed into a recognition model to obtain a portrait recognition result output by the recognition model, where the recognition model is used to recognize portrait components in an image, and the portrait recognition result represents the portrait components recognized in the first target image.
S404: Determine a target connected region in the region where the portrait components are located in the first target image, where the target connected region includes the region of the face among the portrait components in the first target image.
S406: Blur the region other than the target connected region in the first target image to obtain a second target image.
In the embodiments of this application, the recognition model can recognize the portrait components in an input image, and the portrait components here may include, but are not limited to, a face, hair, and a torso. As shown in FIG. 3, after the first target image is input to the recognition model, a portrait component 32 that is a face, portrait components 34 and 38 that are torsos, and a portrait component 36 that is hair can be recognized in the image. It can be understood that the portrait components that can be recognized here are only examples, and this application is not limited thereto.
In the embodiments of this application, the background region outside the portrait in the image needs to be blurred. As shown in FIG. 3, the image includes multiple portrait components, and the region obtained by connecting the regions where multiple connected portrait components are located is the portrait region. Here, other portraits may accidentally enter the image during image capture such as photographing, such as the portrait corresponding to the portrait component 38 shown in FIG. 3, but such a portrait having only a torso should obviously not be determined as foreground.
In the embodiments of this application, the connected region composed of portrait components including a face is determined as the target connected region, the target connected region is determined as the foreground region, the region other than the target connected region in the first target image is determined as the background region, and the background region outside the target connected region is blurred to obtain the processed second target image. As shown in FIG. 3, since the connected region where the torso portrait component 38 is located does not include a face portrait component, it is also blurred; the hatched parts in FIG. 3 indicate that these regions have been blurred. In this way, the limbs of other persons that do not include a face are determined as the background region, which improves the accuracy of foreground person recognition and reduces false detections in portrait recognition.
Optionally, determining the target connected region in the region where the portrait components are located in the first target image includes:
S1: Determine each mutually connected region among the regions where the portrait components are located in the first target image as one candidate connected region, to obtain a group of candidate connected regions.
Suppose a portrait in the image includes hair, a face, and a torso. The regions where these portrait components are located are connected to one another; the connection here may be direct or indirect. For example, the region where the hair portrait component is located may be connected to the region where the torso portrait component is located through the region where the face portrait component is located. As shown in FIG. 3, the region where the portrait component 36 is located can be connected to the region where the portrait component 34 is located through the region where the portrait component 32 is located.
When the portrait components in the first target image are recognized by the recognition model, the region where the face is located, the region where the torso is located, and so on can be recognized in the image. The pixels of the regions where the recognized portrait components are located are marked, and when connected regions are determined, the connected region where pixels with a first-type mark are located in the image can be determined as one candidate connected region. Here, the pixels in one connected region all carry the first-type mark. It can be understood that the same mark may be used for different portrait components, or preconfigured corresponding marks may be used for different portrait components. As another example, the pixels of the connected region where the portrait components are located may be set to a target pixel value, so that the connected region where the target pixel value is located is determined as a candidate connected region. Here, the pixels in one candidate connected region all have the target pixel value. It can be understood that when multiple portraits exist in the image, multiple candidate connected regions can be determined.
S2: Determine, among the group of candidate connected regions, a candidate connected region whose area is greater than a first threshold and which includes a face as the target connected region; or determine, among the group of candidate connected regions, a candidate connected region that includes a face as the target connected region.
In the embodiments of this application, a candidate connected region including a face in the group of candidate connected regions is determined as the target connected region. It can be understood that when multiple candidate connected regions all include regions corresponding to faces, they can all be determined as target connected regions and thus serve as the foreground region. As shown in FIG. 5, multiple portraits may exist in the first target image, and the regions that include faces among the regions corresponding to these portraits can all be determined as foreground regions.
In the embodiments of this application, considering that some distant pedestrians may enter the image but are not intended to be determined as foreground, a candidate connected region whose area is greater than the first threshold and which includes a face is determined as the target connected region. As shown in FIG. 6, if the area of a candidate connected region is less than or equal to the first threshold, it is not determined as a target connected region, so that distant pedestrians and the like can be determined as the background region and blurred. It can be understood that the first threshold here can be set according to actual conditions; for example, it can be set to one quarter to one sixth of the first target image. The first threshold is positively correlated with the size of the first target image, which avoids the problem that a fixed value cannot adapt to images of different sizes. It can be understood that the first threshold here can also be set to a fixed value.
Optionally, determining each mutually connected region among the regions where the portrait components are located in the first target image as one candidate connected region, to obtain a group of candidate connected regions, includes:
setting the pixel values of the pixels corresponding to the portrait components in the first target image to a first pixel value, and setting the pixel values of the pixels in the first target image other than those corresponding to the portrait components to a second pixel value, to obtain a binary image, where the first pixel value is different from the second pixel value; and
performing region identification on the binary image to obtain a group of candidate connected regions, where the region identification is used to identify the connected regions where pixels with the same pixel value in the binary image are located, and the pixel values of the pixels in the group of candidate connected regions are all the first pixel value.
In the embodiments of this application, when candidate connected regions are determined, the first target image may first be binarized, which facilitates determining the candidate connected regions where portraits are located through region identification on the binary image. Here, the pixel values of the pixels corresponding to the portrait components may all be set to the first pixel value, and the pixel values of the pixels other than those corresponding to the portrait components may be set to the second pixel value, thereby converting the first target image into a binary image. As shown in FIG. 7, after the portrait component recognition result of the first target image is determined by the recognition model, the first target image can be binarized according to the portrait component recognition result to obtain the processed binary image. It can be understood that a target candidate region can subsequently be determined from the group of candidate connected regions, and the region other than the target candidate region in the first target image can be blurred to obtain the second target image, completing the blurring of the background region. In an optional embodiment of this application, the region identification may be performed by means of connected domain detection.
The method of the embodiments of this application is illustrated below with reference to FIG. 7. An image I to be processed can be input to a recognition model, which may be a deep neural network. The recognition model parses the portrait components in the image I to obtain the parsing set P = Par(Enc(I)); the pixels corresponding to the portrait components indicated by the portrait component recognition result are then obtained from the parsing, and connected domain detection is performed on the binarized image to obtain the connected region set D = {D_1, D_2, ..., D_n}, where D_i denotes the pixel set of the i-th connected region. Next, the connected regions whose area is greater than a given threshold (for example, 1/8 of the image area) are added to the candidate connected region set C, which can be written as C = {D_i | D_i ∈ D, sum(D_i) > θ_1}, where sum(·) is the area-computing function and θ_1 is the given area threshold. After the candidate connected region set is obtained, the regions that do not contain the specified human body part (such as a face), or in which that part is smaller than a given threshold (for example, 1/5 of the area of the connected set), can be removed from the candidate connected region set to obtain the foreground region set F = {C_i | C_i ∈ C, sum(C_i ∩ P_s) > θ_2 · sum(C_i)}, where P_s = P_j ∩ ... ∩ P_k is the pixel set of the specified human body parts and θ_2 is the given ratio threshold. The background region set is then B = U − F, where U is the set of all pixels, so that the background region can be blurred.
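The pipeline from D to C to F to B described above can be sketched as follows, assuming the parsing result is available as boolean masks; the 1/8 and 1/5 ratios mirror the examples in the text, and cv2.connectedComponentsWithStats stands in for the connected domain detection.

    import cv2
    import numpy as np

    def foreground_regions(component_mask, face_mask,
                           area_ratio=1/8, face_ratio=1/5):
        """Connected domain detection gives D; area filtering gives C;
        regions whose face pixels are missing or too small are dropped,
        giving F; the background is B = U - F."""
        binary = np.where(component_mask, 255, 0).astype(np.uint8)
        num_labels, labels, stats, _ = cv2.connectedComponentsWithStats(binary)

        image_area = binary.size                     # sum(U)
        foreground = np.zeros(binary.shape, dtype=bool)   # F
        for i in range(1, num_labels):               # label 0 is the background
            region = labels == i                     # D_i
            if stats[i, cv2.CC_STAT_AREA] <= area_ratio * image_area:
                continue                             # not added to C
            face_pixels = np.count_nonzero(region & face_mask)
            if face_pixels > face_ratio * np.count_nonzero(region):
                foreground |= region                 # kept in F
        background = ~foreground                     # B = U - F
        return foreground, background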
Optionally, before the first target image to be processed is input to the recognition model to obtain the portrait recognition result output by the recognition model, the method further includes:
S1: Acquire a first group of training images, a second group of training images, a group of region division results, and a group of training recognition results, where the first group of training images corresponds one-to-one to the group of region division results, each region division result represents a known portrait region in one image of the first group of training images, the second group of training images corresponds one-to-one to the group of training recognition results, and each training recognition result represents known portrait components in one image of the second group of training images.
S2: Train an initial recognition model based on the first group of training images and the second group of training images to obtain a trained recognition model, where the error between the estimated portrait regions recognized by the trained recognition model from the first group of training images and the known portrait regions in the group of region division results satisfies a first convergence condition, and the error between the estimated portrait components recognized by the trained recognition model from the second group of training images and the known portrait components in the group of training recognition results satisfies a second convergence condition. The trained recognition model includes an encoding network, a portrait region recognition network, and a portrait component recognition network, where the encoding network is used to encode an image to obtain encoded data, the portrait region recognition network is used to recognize a portrait region from the encoded data, and the portrait component recognition network is used to recognize portrait components from the encoded data.
S3: Delete the portrait region recognition network from the trained recognition model to obtain the recognition model.
In the embodiments of this application, a first group of training images and a group of region division results corresponding one-to-one to the first group of training images may be acquired, where each region division result in the group of region division results represents the portrait region division result in the corresponding image of the first group of training images; a second group of training images and a group of training recognition results corresponding one-to-one to the second group of training images may also be acquired, where each training recognition result in the group of training recognition results represents the portrait component recognition result in the corresponding image of the second group of training images; and the initial recognition model is then trained using the first group of training images and the second group of training images.
In the embodiments of this application, the initial recognition model includes an initial encoding network, an initial portrait region recognition network, and an initial portrait component recognition network, and the trained recognition model has the correspondingly trained encoding network, portrait region recognition network, and portrait component recognition network.
In the embodiments of this application, the recognition model used is for recognizing the portrait components in an input image and does not require a portrait region recognition module; therefore, the portrait region recognition network in the trained recognition model can be deleted to obtain the recognition model.
It can be understood that the recognition model with the portrait region recognition network deleted requires less processing and can therefore improve recognition efficiency. Meanwhile, in the embodiments of this application, setting up a portrait region recognition network when training the initial recognition model makes it possible to use the first group of training images and the group of region division results to enlarge the amount of training data for the initial recognition model. Since every input image must first be encoded by the encoding network, the first group of training images can also effectively improve the accuracy of the encoding network, thereby improving the recognition accuracy of the trained recognition model.
Optionally, training the initial recognition model based on the first group of training images and the second group of training images includes:
selecting a first training image from the first group of training images, and selecting a second training image from the second group of training images; and
inputting the first training image and the second training image into the initial recognition model, where the initial recognition model includes an initial encoding network, an initial portrait region recognition network, and an initial portrait component recognition network; the initial encoding network includes cascaded first convolutional layers, the initial portrait region recognition network includes cascaded second convolutional layers, and the initial portrait component recognition network includes cascaded third convolutional layers.
The initial recognition model receives, through a first convolutional layer in the initial encoding network, the encoded data obtained by the previous first convolutional layer in the cascade encoding the first training image and the second training image, and sends the encoded data to the corresponding second convolutional layer, the corresponding third convolutional layer, and the next first convolutional layer in the cascade. The initial recognition model receives, through the initial portrait region recognition network, the encoded data sent by the corresponding first convolutional layer and the previous second convolutional layer in the cascade, and performs portrait region recognition on the received encoded data. The initial recognition model receives, through the initial portrait component recognition network, the encoded data sent by the corresponding first convolutional layer and the previous third convolutional layer in the cascade, and performs portrait component recognition on the received encoded data.
It can be understood that the initial encoding network in the initial recognition model of the embodiments of this application becomes the encoding network in the trained recognition model after training; similarly, the initial portrait region recognition network becomes the portrait region recognition network in the trained model after training, and the initial portrait component recognition network becomes the portrait component recognition network in the trained model after training.
As shown in FIG. 8, in the embodiments of this application, each network includes multiple cascaded convolutional layers. The first of the cascaded first convolutional layers in the initial encoding network encodes the image input to the initial recognition model and sends the encoded data to the next first convolutional layer in the cascade, the corresponding second convolutional layer in the initial portrait region recognition network, and the corresponding third convolutional layer in the initial portrait component recognition network. A second convolutional layer in the initial portrait region recognition network receives the data output by the previous second convolutional layer in the cascade and the data transmitted by the corresponding first convolutional layer, and a third convolutional layer in the initial portrait component recognition network receives the data output by the previous third convolutional layer in the cascade and the data transmitted by the corresponding first convolutional layer.
It can be understood that in the embodiments of this application, the initial portrait region recognition network and the initial portrait component recognition network respectively perform the two tasks of portrait segmentation and portrait component parsing. While enlarging the scale of the training data, this enables the initial recognition model to simultaneously obtain the whole-body perception stimulus provided by the portrait segmentation task and the local-detail perception stimulus provided by the portrait component parsing task, thereby improving the model's performance. As shown in FIG. 9, the first convolutional layer may be a dense convolutional layer, that is, the first convolutional layer may include multiple densely connected residual modules, so as to efficiently encode the image at different scales and make the features contain rich information at different scales.
In the embodiments shown in FIG. 8 and FIG. 9, the architecture of the trained recognition model in the embodiments of this application is the same as that of the initial recognition model; the architectural difference between the recognition model and the initial recognition model is that there is no portrait region recognition network, while the data transmission architecture of the encoding network and the portrait component recognition network is the same as the architecture shown in FIG. 8 and FIG. 9. In the model architectures shown in FIG. 8 and FIG. 9, k×k denotes a convolution operation with a convolutional layer (also called a convolution kernel) of size k×k, C denotes the concatenation of feature channels, Add denotes the addition of features, and the bilinear interpolation operation Upsample denotes bilinear interpolation with an upsampling factor of 2. As shown in FIG. 9, the first convolutional layer may include multiple densely connected residual modules.
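By way of a non-limiting sketch, the decoder-side wiring described above (the channel concatenation C, a convolution, and the ×2 bilinear Upsample) and the shared-encoder, two-branch layout can be expressed in PyTorch as follows; the channel sizes, the activation, and the internal composition of the densely connected residual modules are assumptions of this sketch rather than details fixed by FIG. 8 and FIG. 9.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DecoderBlock(nn.Module):
        """One cascaded decoding layer: concatenate ("C") the previous
        decoder output with the skip feature from the corresponding
        encoder layer, convolve, then Upsample x2 bilinearly."""
        def __init__(self, in_ch, skip_ch, out_ch):
            super().__init__()
            self.conv = nn.Conv2d(in_ch + skip_ch, out_ch,
                                  kernel_size=3, padding=1)

        def forward(self, x, skip):
            x = torch.cat([x, skip], dim=1)       # channel concatenation "C"
            x = F.relu(self.conv(x))
            return F.interpolate(x, scale_factor=2,
                                 mode="bilinear", align_corners=False)

    class TwoBranchModel(nn.Module):
        """Shared encoder feeding a portrait-region branch Seg and a
        portrait-component branch Par: S = Seg(Enc(I)), P = Par(Enc(I)).
        After training, the region branch can be dropped, leaving only
        Enc and Par for inference."""
        def __init__(self, encoder, seg_branch, par_branch):
            super().__init__()
            self.encoder = encoder
            self.seg_branch = seg_branch
            self.par_branch = par_branch

        def forward(self, x):
            feats = self.encoder(x)               # multi-scale encoded data
            return self.seg_branch(feats), self.par_branch(feats)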
It can be understood that the architecture of the second convolutional layer may be the same as that of the third convolutional layer shown in FIG. 9. The input of a third convolutional layer includes the output of the previous third convolutional layer in the cascade and the output of the corresponding first convolutional layer. The initial portrait region recognition network and the initial portrait component recognition network in the embodiments of this application adopt similar decoding structures and use the features extracted by the encoding network to progressively recover, from low scale to high scale, the results of portrait segmentation and component parsing: S = Seg(Enc(I)), P = {P_1 ∪ P_2 ∪ ... ∪ P_k} = Par(Enc(I)), where I denotes the input image, S is the pixel set of the portrait segmentation, P is the portrait component parsing set, and P_i denotes the pixel set of the i-th kind of portrait component (such as a face). During model training, the embodiments of this application combine the two tasks of portrait segmentation and portrait component parsing; while enlarging the data scale, this enables the model to simultaneously obtain the whole-body perception stimulus provided by the portrait segmentation task and the local-detail perception stimulus provided by the portrait component parsing task, thereby improving the model's performance. In the embodiments of this application, the training loss Loss may be:
Loss = Σ_{(I, S_gt) ∈ HS} CrossEntropy(Seg(Enc(I)), S_gt) + Σ_{(I, P_gt) ∈ HP} CrossEntropy(Par(Enc(I)), P_gt)
where CrossEntropy(·) denotes the cross-entropy loss, HS denotes the portrait segmentation dataset, which contains N training instances, for example the first group of training images, S_gt denotes the ground-truth portrait segmentation label corresponding to image I, which can be determined from the group of region division results, HP denotes the portrait component parsing dataset, which contains M training instances, for example the second group of training images, and P_gt denotes the ground-truth portrait component parsing label corresponding to image I, which can be determined from a group of portrait segmentation results. It can be understood that when the above training loss Loss is less than a set value, the convergence condition can be considered to be satisfied.
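A sketch of this joint objective in PyTorch, under the assumption that the model returns the segmentation and parsing predictions as a pair (as in the sketch above) and that each dataset yields (image, label) batches, may be:

    import torch.nn.functional as F

    def joint_loss(model, seg_batch, par_batch):
        """Joint training loss: cross-entropy on a portrait-segmentation
        batch drawn from HS plus cross-entropy on a component-parsing
        batch drawn from HP, mirroring the Loss above."""
        seg_img, seg_gt = seg_batch        # instance from HS with label S_gt
        par_img, par_gt = par_batch        # instance from HP with label P_gt
        seg_pred, _ = model(seg_img)       # portrait-region branch output
        _, par_pred = model(par_img)       # portrait-component branch output
        return (F.cross_entropy(seg_pred, seg_gt)
                + F.cross_entropy(par_pred, par_gt))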
It should be noted that the above are merely optional embodiments of this application, and this application is not limited to the above examples. The recognition model in the embodiments of this application may be a deep neural network (DNN) model, a convolutional neural network (CNN) model, or another classification model based on deep learning.
Optionally, when the first target image is a video frame image in a target video, after the region other than the target connected region in the first target image is blurred to obtain the second target image, the method further includes:
S902: Replace the first target image in the target video with the second target image.
S904: Play the second target image in the course of playing the target video.
In the embodiments of this application, the first target image may be a video frame image in a target video, for example, an image frame in a video transmitted in a video conference. After the target video is received, the region other than the target connected region in the first target image of the target video can be blurred to obtain the second target image, and the first target image in the target video is replaced with the second target image, so that the second target image is played when the target video is played, highlighting the persons in the video conference by blurring the background region.
It can be understood that in the embodiments of this application, the background region can be blurred through Gaussian blurring to obtain the blurred result I′: I′ = GaussianBlur(I, r) * B + I * F, where GaussianBlur(·) is the Gaussian blur operation, r is the radius of the blur kernel (which can be set to 50, for example), B denotes the background region set, and the algebraic operation I * F denotes indexing (taking) out the elements of I at the subscripts given by F. It can of course be understood that the above blurring is only an optional embodiment provided by this application, and this application is not limited thereto.
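As a minimal sketch, the composition I′ = GaussianBlur(I, r) * B + I * F can be implemented with OpenCV as follows; mapping the blur-kernel radius r to the odd kernel size 2r + 1 is an implementation assumption of this sketch.

    import cv2
    import numpy as np

    def blur_background(image, foreground_mask, radius=50):
        """Compose I' = GaussianBlur(I, r) * B + I * F: blur the whole
        image, then restore the sharp pixels at the foreground set F."""
        k = 2 * radius + 1                       # odd kernel size for OpenCV
        blurred = cv2.GaussianBlur(image, (k, k), 0)
        f = foreground_mask.astype(bool)
        out = blurred.copy()
        out[f] = image[f]                        # keep the foreground sharp
        return out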
It should be noted that, for brevity of description, each of the foregoing method embodiments is expressed as a series of action combinations, but those skilled in the art should know that this application is not limited by the described order of actions, because according to this application, some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by this application.
According to another aspect of the embodiments of this application, an image processing apparatus for implementing the above image processing method is further provided. As shown in FIG. 10, the apparatus includes:
a first recognition unit 1001, configured to recognize, from a first target image to be processed, multiple groups of portrait components and regions where the multiple groups of portrait components are located, where each group of portrait components corresponds to one human body;
a first determination unit 1003, configured to determine, among the regions where the multiple groups of portrait components are located, a target region where a target group of portrait components is located, where the target group of portrait components includes a target face, and the target region is separated from regions where groups of portrait components other than the target group among the multiple groups are located; and
a first processing unit 1005, configured to blur a region other than the target region in the first target image to obtain a second target image.
Optionally, the first determination unit 1003 may include:
a first determination module, configured to determine, among N groups of portrait components, M groups of portrait components that include faces, where the N groups of portrait components are the multiple groups of portrait components, and N ≥ M ≥ 1; and
a second determination module, configured to determine, among the regions where the M groups of portrait components are located, the target region where the target group of portrait components is located, where the area of the region where the face included in the target group is located is greater than or equal to a first threshold, and/or the area of the region where the target group is located is greater than or equal to a second threshold.
The first threshold and/or the second threshold are positively correlated with the size of the first target image.
Optionally, the second determination module may include:
a setting submodule, configured to set the pixel values of the pixels corresponding to the M groups of portrait components to a first pixel value, and set the pixel values of the pixels in the first target image other than those corresponding to the M groups of portrait components to a second pixel value, to obtain a binary image, where the first pixel value is different from the second pixel value; and
a processing submodule, configured to perform region identification on the binary image to obtain the target region, where the target region includes the pixels of the target face.
Through this apparatus embodiment, the first recognition unit 1001 recognizes, from the first target image to be processed, multiple groups of portrait components and the regions where the multiple groups of portrait components are located, where each group of portrait components corresponds to one human body; the first determination unit 1003 determines, among the regions where the multiple groups of portrait components are located, the target region where the target group of portrait components is located, where the target group includes a target face and the target region is separated from the regions where the other groups of portrait components are located; and the first processing unit 1005 blurs the region other than the target region in the first target image to obtain the second target image. The region outside the target region is thereby blurred; in other words, the target group of portrait components including a face is recognized from the first target image, the target region where the target group is located is determined as the foreground region, and the limbs of other persons that do not include a face are determined as the background region, which improves the accuracy of foreground person recognition and reduces false detections in portrait recognition.
As an optional embodiment, the above apparatus may further include:
a replacement unit, configured to, when the first target image is a video frame image in a target video, replace the first target image in the target video with the second target image after the first processing unit blurs the region other than the target region in the first target image to obtain the second target image; and
a playing unit, configured to play the second target image in the course of playing the target video.
According to another aspect of the embodiments of this application, an image processing apparatus for implementing the above image processing method is further provided. As shown in FIG. 11, the apparatus includes:
a second recognition unit 1102, configured to input a first target image to be processed into a recognition model to obtain a portrait recognition result output by the recognition model, where the target recognition model is used to recognize the components of the portraits in an image, and the portrait recognition result represents the portrait components recognized in the first target image;
a second determination unit 1104, configured to determine a target connected region in the region where the portrait components are located in the first target image, where the target connected region includes the region of the face among the portrait components in the first target image; and
a second processing unit 1106, configured to blur a region other than the target connected region in the first target image to obtain a second target image.
In the embodiments of this application, the recognition model can recognize the portrait components in an input image, and the portrait components here may include, but are not limited to, a face, hair, and a torso. In the embodiments of this application, since the background region outside the portrait needs to be blurred and the image includes multiple portrait components, the region obtained by connecting the regions where multiple connected portrait components are located is the portrait region. In the embodiments of this application, the connected region including a face is determined as the target connected region, the target connected region is determined as the foreground region, the region outside the target connected region is determined as the background region, and the background region other than the target connected region in the first target image is blurred to obtain the processed second target image. The embodiments of this application can determine the limbs of other persons that do not include a face as the background region, which improves the accuracy of foreground person recognition and reduces false detections in portrait recognition.
Optionally, the second determination unit 1104 includes:
a third determination module, configured to determine each mutually connected region among the regions where the portrait components are located in the first target image as one candidate connected region, to obtain a group of candidate connected regions; and
a fourth determination module, configured to determine, among the group of candidate connected regions, a candidate connected region whose area is greater than a first threshold and which includes a face as the target connected region, or determine, among the group of candidate connected regions, a candidate connected region that includes a face as the target connected region.
Optionally, the first threshold is positively correlated with the size of the first target image.
Optionally, the third determination module is specifically configured to:
set the pixel values of the pixels corresponding to the portrait components in the first target image to a first pixel value, and set the pixel values of the pixels in the first target image other than those corresponding to the portrait components to a second pixel value, to obtain a binary image, where the first pixel value is different from the second pixel value; and
perform region identification on the binary image to obtain a group of candidate connected regions, where the region identification is used to identify the connected regions where pixels with the same pixel value in the binary image are located, and the pixel values of the pixels in the group of candidate connected regions are all the first pixel value.
Optionally, the above apparatus may further include:
a first acquisition unit, configured to acquire a first group of training images, a second group of training images, a group of region division results, and a group of training recognition results, where the first group of training images corresponds one-to-one to the group of region division results, each region division result represents a known portrait region in one image of the first group of training images, the second group of training images corresponds one-to-one to the group of training recognition results, and each training recognition result represents known portrait components in one image of the second group of training images;
a training unit, configured to train an initial recognition model based on the first group of training images and the second group of training images to obtain a trained recognition model, where the error between the estimated portrait regions recognized by the trained recognition model from the first group of training images and the known portrait regions in the group of region division results satisfies a first convergence condition, and the error between the estimated portrait components recognized by the trained recognition model from the second group of training images and the known portrait components in the group of training recognition results satisfies a second convergence condition; the trained recognition model includes an encoding module, a portrait region recognition module, and a portrait component recognition module, where the encoding module is used to encode an image to obtain encoded data, the portrait region recognition module is used to recognize a portrait region from the encoded data, and the portrait component recognition module is used to recognize estimated portrait components from the encoded data; and
a second processing unit, configured to delete the portrait region recognition module from the trained recognition model to obtain a target recognition model.
Optionally, the training unit includes:
an input module, configured to select a first training image from the first group of training images and a second training image from the second group of training images, and input the first training image and the second training image into the initial recognition model, where the initial recognition model includes an initial encoding network, an initial portrait region recognition network, and an initial portrait component recognition network; the initial encoding network includes cascaded first convolutional layers, the initial portrait region recognition network includes cascaded second convolutional layers, and the initial portrait component recognition network includes cascaded third convolutional layers; a first convolutional layer in the initial encoding network is configured to receive the encoded data obtained by the previous first convolutional layer in the cascade encoding the first training image and the second training image, and send the encoded data to the corresponding second convolutional layer, the corresponding third convolutional layer, and the next first convolutional layer in the cascade; the initial portrait region recognition network is configured to receive the encoded data sent by the corresponding first convolutional layer and the previous second convolutional layer in the cascade, and perform portrait region recognition on the received encoded data; and the initial portrait component recognition network is configured to receive the encoded data sent by the corresponding first convolutional layer and the previous third convolutional layer in the cascade, and perform portrait component recognition on the received encoded data.
According to yet another aspect of the embodiments of this application, an electronic device for implementing the above image processing method is further provided. As shown in FIG. 12, the electronic device includes a memory 1202 and a processor 1204; the memory 1202 stores a computer program, and the processor 1204 is configured to perform the steps in any one of the above method embodiments through the computer program.
Optionally, in this embodiment, the above electronic device may be located in at least one network device among multiple network devices of a computer network.
Optionally, in this embodiment, the above processor may be configured to perform the following steps through the computer program:
S1: Recognize, from a first target image to be processed, multiple groups of portrait components and regions where the multiple groups of portrait components are located, where each group of portrait components corresponds to one human body.
S2: Determine, among the regions where the multiple groups of portrait components are located, a target region where a target group of portrait components is located, where the target group of portrait components includes a target face, and the target region is separated from regions where groups of portrait components other than the target group are located.
S3: Blur a region other than the target region in the first target image to obtain a second target image.
Optionally, those of ordinary skill in the art can understand that the structure shown in FIG. 12 is only illustrative, and the electronic device may also be a terminal device such as a smartphone (such as an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), or a PAD. FIG. 12 does not limit the structure of the above electronic device. For example, the electronic device may further include more or fewer components than shown in FIG. 12 (such as a network interface), or have a configuration different from that shown in FIG. 12.
The memory 1202 can be used to store software programs and modules, such as the program instructions/modules corresponding to the image processing method and apparatus in the embodiments of this application. The processor 1204 runs the software programs and modules stored in the memory 1202 so as to perform various functional applications and data processing, that is, to implement the above image processing method. The memory 1202 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1202 may further include memories disposed remotely from the processor 1204, and these remote memories may be connected to the terminal through a network. Examples of the above network include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1202 can specifically be, but is not limited to being, used to store information such as the first target image and the second target image. As an example, as shown in FIG. 12, the memory 1202 may include, but is not limited to, the first recognition unit 1001, the first determination unit 1003, and the first processing unit 1005 of the above image processing apparatus. In addition, it may also include, but is not limited to, other module units of the above image processing apparatus, which will not be repeated in this example.
Optionally, the above transmission device 1206 is used to receive or send data via a network. Specific examples of the above network may include wired networks and wireless networks. In one example, the transmission device 1206 includes a network interface controller (NIC), which can be connected to other network devices and a router via a network cable so as to communicate with the Internet or a local area network. In one example, the transmission device 1206 is a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.
In addition, the above electronic device further includes: a display 1208, configured to display the above first target image and second target image; and a connection bus 1210, configured to connect the module components in the above electronic device.
According to yet another aspect of the embodiments of this application, a computer-readable storage medium is further provided. The computer-readable storage medium stores a computer program, and the computer program is configured to perform, when run, the steps in any one of the above method embodiments.
Optionally, in this embodiment, the above computer-readable storage medium may be configured to store a computer program for performing the following steps:
S1: Recognize, from a first target image to be processed, multiple groups of portrait components and regions where the multiple groups of portrait components are located, where each group of portrait components corresponds to one human body.
S2: Determine, among the regions where the multiple groups of portrait components are located, a target region where a target group of portrait components is located, where the target group of portrait components includes a target face, and the target region is separated from regions where groups of portrait components other than the target group are located.
S3: Blur a region other than the target region in the first target image to obtain a second target image.
Optionally, in this embodiment, those of ordinary skill in the art can understand that all or some of the steps in the various methods of the above embodiments can be completed by instructing the hardware related to a terminal device through a program, and the program may be stored in a computer-readable storage medium, which may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
The sequence numbers of the above embodiments of this application are only for description and do not represent the superiority or inferiority of the embodiments.
If the integrated unit in the above embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in the above computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to perform all or some of the steps of the methods described in the various embodiments of this application.
In the above embodiments of this application, the description of each embodiment has its own emphasis. For a part that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed client may be implemented in other ways. The apparatus embodiments described above are only illustrative; for example, the division of the units is only a logical function division, and there may be other division methods in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the various embodiments of this application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The above are only preferred implementations of this application. It should be noted that those of ordinary skill in the art can make several improvements and modifications without departing from the principles of this application, and these improvements and modifications should also be regarded as falling within the protection scope of this application.

Claims (16)

  1. An image processing method, performed by an electronic device, the method comprising:
    recognizing, from a first target image to be processed, multiple groups of portrait components and regions where the multiple groups of portrait components are located, wherein each group of portrait components corresponds to one human body;
    determining, among the regions where the multiple groups of portrait components are located, a target region where a target group of portrait components is located, wherein the target group of portrait components comprises a target face, and the target region is separated from regions where groups of portrait components other than the target group among the multiple groups are located; and
    blurring a region other than the target region in the first target image to obtain a second target image.
  2. The method according to claim 1, wherein the determining, among the regions where the multiple groups of portrait components are located, a target region where a target group of portrait components is located comprises:
    determining, among N groups of portrait components, M groups of portrait components that comprise faces, wherein the N groups of portrait components are the multiple groups of portrait components, and N ≥ M ≥ 1; and
    determining, among the regions where the M groups of portrait components are located, the target region where the target group of portrait components is located, wherein an area of a region where the face comprised in the target group of portrait components is located is greater than or equal to a first threshold, and/or an area of the region where the target group of portrait components is located is greater than or equal to a second threshold.
  3. The method according to claim 2, wherein the first threshold and/or the second threshold are positively correlated with a size of the first target image.
  4. The method according to claim 2, wherein the determining, among the regions where the M groups of portrait components are located, the target region where the target group of portrait components is located comprises:
    setting pixel values of pixels corresponding to the M groups of portrait components to a first pixel value, and setting pixel values of pixels in the first target image other than the pixels corresponding to the M groups of portrait components to a second pixel value, to obtain a binary image, wherein the first pixel value is different from the second pixel value; and
    performing region identification on the binary image to obtain the target region, wherein the target region comprises pixels of the target face.
  5. The method according to claim 1, wherein the recognizing, from a first target image to be processed, multiple groups of portrait components and regions where the multiple groups of portrait components are located comprises:
    processing the first target image through a recognition model to determine the multiple groups of portrait components and the regions where the multiple groups of portrait components are located.
  6. The method according to claim 1, wherein before the first target image is processed through the recognition model to determine the multiple groups of portrait components and the regions where the multiple groups of portrait components are located, the method further comprises:
    acquiring a first group of training images, a second group of training images, a group of region division results, and a group of training recognition results, wherein the first group of training images corresponds one-to-one to the group of region division results, each region division result represents a known portrait region in one image of the first group of training images, the second group of training images corresponds one-to-one to the group of training recognition results, and each training recognition result represents known portrait components in one image of the second group of training images; and
    training an initial recognition model based on the first group of training images and the second group of training images to obtain a trained recognition model, wherein an error between estimated portrait regions recognized by the trained recognition model from the first group of training images and the known portrait regions in the group of region division results satisfies a first convergence condition, and an error between estimated portrait components recognized by the trained recognition model from the second group of training images and the known portrait components in the group of training recognition results satisfies a second convergence condition; and the trained recognition model comprises: an encoding network for encoding an image to obtain encoded data, a portrait region recognition network for recognizing a portrait region from the encoded data, and a portrait component recognition network for recognizing portrait components from the encoded data.
  7. The method according to claim 6, wherein the training an initial recognition model based on the first group of training images and the second group of training images comprises:
    selecting a first training image from the first group of training images, and selecting a second training image from the second group of training images;
    inputting the first training image and the second training image into the initial recognition model, wherein the initial recognition model comprises: an initial encoding network, an initial portrait region recognition network, and an initial portrait component recognition network; the initial encoding network comprises cascaded first convolutional layers, the initial portrait region recognition network comprises cascaded second convolutional layers, and the initial portrait component recognition network comprises cascaded third convolutional layers; and
    receiving, by a first convolutional layer in the initial encoding network, encoded data obtained by a previous first convolutional layer in the cascade encoding the first training image and the second training image, and sending the encoded data to a corresponding second convolutional layer, a corresponding third convolutional layer, and a next first convolutional layer in the cascade; receiving, by the initial portrait region recognition network, the encoded data sent by the corresponding first convolutional layer and a previous second convolutional layer in the cascade, and performing portrait region recognition on the received encoded data; and receiving, by the initial portrait component recognition network, the encoded data sent by the corresponding first convolutional layer and a previous third convolutional layer in the cascade, and performing portrait component recognition on the received encoded data.
  8. The method according to any one of claims 1 to 7, wherein when the first target image is a video frame image in a target video, after the blurring a region other than the target region in the first target image to obtain a second target image, the method further comprises:
    replacing the first target image in the target video with the second target image; and playing the second target image in a course of playing the target video.
  9. An image processing apparatus, the apparatus comprising:
    a first recognition unit, configured to recognize, from a first target image to be processed, multiple groups of portrait components and regions where the multiple groups of portrait components are located, wherein each group of portrait components corresponds to one human body;
    a first determination unit, configured to determine, among the regions where the multiple groups of portrait components are located, a target region where a target group of portrait components is located, wherein the target group of portrait components comprises a target face, and the target region is separated from regions where groups of portrait components other than the target group among the multiple groups are located; and
    a first processing unit, configured to blur a region other than the target region in the first target image to obtain a second target image.
  10. The apparatus according to claim 9, wherein the first determination unit comprises:
    a first determination module, configured to determine, among N groups of portrait components, M groups of portrait components that comprise faces, wherein the N groups of portrait components are the multiple groups of portrait components, and N ≥ M ≥ 1; and
    a second determination module, configured to determine, among the regions where the M groups of portrait components are located, the target region where the target group of portrait components is located, wherein an area of a region where the face comprised in the target group of portrait components is located is greater than or equal to a first threshold, and/or an area of the region where the target group of portrait components is located is greater than or equal to a second threshold.
  11. The apparatus according to claim 10, wherein the first threshold and/or the second threshold are positively correlated with a size of the first target image.
  12. The apparatus according to claim 10, wherein the second determination module comprises:
    a setting submodule, configured to set pixel values of pixels corresponding to the M groups of portrait components to a first pixel value, and set pixel values of pixels in the first target image other than the pixels corresponding to the M groups of portrait components to a second pixel value, to obtain a binary image, wherein the first pixel value is different from the second pixel value; and
    a processing submodule, configured to perform region identification on the binary image to obtain the target region, wherein the target region comprises pixels of the target face.
  13. The apparatus according to any one of claims 9 to 12, further comprising:
    a replacement unit, configured to: when the first target image is a video frame image in a target video, after the first processing unit blurs the region other than the target region in the first target image to obtain the second target image, replace the first target image in the target video with the second target image; and
    a playing unit, configured to play the second target image in a course of playing the target video.
  14. A storage medium, comprising a stored program, wherein the program, when run, performs the method according to any one of claims 1 to 8.
  15. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to perform the method according to any one of claims 1 to 8 through the computer program.
  16. A computer program product, comprising instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 8.
PCT/CN2020/094576 2019-11-26 2020-06-05 Image processing method and apparatus, storage medium, and electronic device WO2021103474A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/524,387 US20220067888A1 (en) 2019-11-26 2021-11-11 Image processing method and apparatus, storage medium, and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911175754.X 2019-11-26
CN201911175754.XA CN110991298B (zh) 2019-11-26 Image processing method and apparatus, storage medium, and electronic device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/524,387 Continuation US20220067888A1 (en) 2019-11-26 2021-11-11 Image processing method and apparatus, storage medium, and electronic device

Publications (1)

Publication Number Publication Date
WO2021103474A1 true WO2021103474A1 (zh) 2021-06-03

Family

ID=70087119

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/094576 WO2021103474A1 (zh) Image processing method and apparatus, storage medium, and electronic device

Country Status (3)

Country Link
US (1) US20220067888A1 (zh)
CN (1) CN110991298B (zh)
WO (1) WO2021103474A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991298B (zh) * 2019-11-26 2023-07-14 Tencent Technology (Shenzhen) Company Limited Image processing method and apparatus, storage medium, and electronic device
CN112686907A (zh) * 2020-12-25 2021-04-20 Lenovo (Beijing) Co., Ltd. Image processing method, device, and apparatus
CN112581481B (zh) * 2020-12-30 2024-04-12 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image processing method and apparatus, electronic device, and computer-readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107509031A (zh) * 2017-08-31 2017-12-22 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image processing method and apparatus, mobile terminal, and computer-readable storage medium
CN108509994A (zh) * 2018-03-30 2018-09-07 Baidu Online Network Technology (Beijing) Co., Ltd. Person image clustering method and apparatus
CN109410220A (zh) * 2018-10-16 2019-03-01 Tencent Technology (Shenzhen) Company Limited Image segmentation method and apparatus, computer device, and storage medium
US20190311186A1 (en) * 2018-04-09 2019-10-10 Pegatron Corporation Face recognition method
CN110991298A (zh) * 2019-11-26 2020-04-10 Tencent Technology (Shenzhen) Company Limited Image processing method and apparatus, storage medium, and electronic device

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108230252B (zh) * 2017-01-24 2022-02-01 Shenzhen SenseTime Technology Co., Ltd. Image processing method and apparatus, and electronic device
CN107146203A (zh) * 2017-03-20 2017-09-08 Shenzhen Gionee Communication Equipment Co., Ltd. Image blurring method and terminal
CN106971165B (zh) * 2017-03-29 2018-08-10 Wuhan Douyu Network Technology Co., Ltd. Filter implementation method and apparatus
CN106973164B (zh) * 2017-03-30 2019-03-01 Vivo Mobile Communication Co., Ltd. Photographing blurring method for a mobile terminal, and mobile terminal
CN107563979B (zh) * 2017-08-31 2020-03-27 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image processing method and apparatus, computer-readable storage medium, and computer device
CN109829456B (zh) * 2017-11-23 2022-05-17 Tencent Technology (Shenzhen) Company Limited Image recognition method, apparatus, and terminal
CN108093158B (zh) * 2017-11-30 2020-01-10 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image blurring processing method and apparatus, mobile device, and computer-readable medium
CN107948517B (zh) * 2017-11-30 2020-05-15 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Preview picture blurring processing method, apparatus, and device
CN108234882B (zh) * 2018-02-11 2020-09-29 Vivo Mobile Communication Co., Ltd. Image blurring method and mobile terminal
CN110147805B (zh) * 2018-07-23 2023-04-07 Tencent Technology (Shenzhen) Company Limited Image processing method and apparatus, terminal, and storage medium
CN110363172A (zh) * 2019-07-22 2019-10-22 Qujing Zhengze Software Development Co., Ltd. Video processing method and apparatus, electronic device, and readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107509031A (zh) * 2017-08-31 2017-12-22 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image processing method and apparatus, mobile terminal, and computer-readable storage medium
CN108509994A (zh) * 2018-03-30 2018-09-07 Baidu Online Network Technology (Beijing) Co., Ltd. Person image clustering method and apparatus
US20190311186A1 (en) * 2018-04-09 2019-10-10 Pegatron Corporation Face recognition method
CN109410220A (zh) * 2018-10-16 2019-03-01 Tencent Technology (Shenzhen) Company Limited Image segmentation method and apparatus, computer device, and storage medium
CN110991298A (zh) * 2019-11-26 2020-04-10 Tencent Technology (Shenzhen) Company Limited Image processing method and apparatus, storage medium, and electronic device

Also Published As

Publication number Publication date
CN110991298B (zh) 2023-07-14
US20220067888A1 (en) 2022-03-03
CN110991298A (zh) 2020-04-10

Similar Documents

Publication Publication Date Title
JP7236545B2 (ja) Video target tracking method and apparatus, computer device, and program
US11151725B2 (en) Image salient object segmentation method and apparatus based on reciprocal attention between foreground and background
WO2021103474A1 (zh) Image processing method and apparatus, storage medium, and electronic device
CN113379627B (zh) Training method for an image enhancement model and method for enhancing an image
US20210081695A1 (en) Image processing method, apparatus, electronic device and computer readable storage medium
CN111814620A (zh) Method for establishing a face image quality evaluation model, preferred selection method, medium, and apparatus
CN113642431A (zh) Training method and apparatus for a target detection model, electronic device, and storage medium
CN113469289B (zh) Video self-supervised representation learning method and apparatus, computer device, and medium
CN112380955B (zh) Action recognition method and apparatus
CN111126347B (zh) Human eye state recognition method, apparatus, terminal, and readable storage medium
KR20220044828A (ko) Facial attribute recognition method, apparatus, electronic device, and storage medium
CN112561879B (zh) Blur degree evaluation model training method, and image blur evaluation method and apparatus
CN113128368A (zh) Method, apparatus, and system for detecting person interaction relationships
CN116977674A (zh) Image matching method, related device, storage medium, and program product
US20230154139A1 (en) Systems and methods for contrastive pretraining with video tracking supervision
CN111626212B (zh) Method and apparatus for recognizing objects in pictures, storage medium, and electronic device
US20230115765A1 (en) Method and apparatus of transferring image, and method and apparatus of training image transfer model
CN113610856B (zh) Method and apparatus for training an image segmentation model and for image segmentation
Uchigasaki et al. Deep image compression using scene text quality assessment
CN113177483B (zh) Video target segmentation method, apparatus, device, and storage medium
CN111539420B (zh) Panoramic image saliency prediction method and system based on attention-aware features
CN113537359A (zh) Training data generation method and apparatus, computer-readable medium, and electronic device
CN114550236B (zh) Image recognition and model training method, apparatus, device, and storage medium
CN113610064B (zh) Handwriting recognition method and apparatus
CN117636398A (zh) Video pedestrian re-identification method and apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20892336

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20892336

Country of ref document: EP

Kind code of ref document: A1