WO2024041235A1 - Image processing method, apparatus, device, storage medium and program product


Info

Publication number
WO2024041235A1
WO2024041235A1 (PCT/CN2023/105718)
Authority
WO
WIPO (PCT)
Prior art keywords: image, repaired, mask template, initial, target
Prior art date
Application number
PCT/CN2023/105718
Other languages
English (en)
French (fr)
Inventor
钟立耿
朱允全
刘文然
文伟
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Publication of WO2024041235A1


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 Computing arrangements based on biological models
            • G06N 3/02 Neural networks
            • G06N 3/08 Learning methods
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 5/00 Image enhancement or restoration
            • G06T 5/10 Image enhancement or restoration using non-spatial domain filtering
            • G06T 5/20 Image enhancement or restoration using local operators
              • G06T 5/30 Erosion or dilatation, e.g. thinning
          • G06T 2207/00 Indexing scheme for image analysis or image enhancement
            • G06T 2207/10 Image acquisition modality
              • G06T 2207/10016 Video; image sequence
            • G06T 2207/20 Special algorithmic details
              • G06T 2207/20036 Morphological image processing
              • G06T 2207/20048 Transform domain processing
                • G06T 2207/20056 Discrete and fast Fourier transform [DFT, FFT]
              • G06T 2207/20081 Training; Learning
            • G06T 2207/30 Subject of image; Context of image processing
              • G06T 2207/30168 Image quality inspection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
          • Y02T 10/00 Road transport of goods or passengers
            • Y02T 10/10 Internal combustion engine [ICE] based vehicles
              • Y02T 10/40 Engine management systems

Definitions

  • the present application relates to the field of image processing technology, and in particular, to an image processing method, device, equipment, storage medium and program product.
  • Video filling refers to processing the video frame images in a video.
  • video filling technologies include: methods based on optical flow and methods based on neural network models.
  • in optical-flow-based methods, optical flow is applied to propagate the pixel gradient of the unmasked area into the masked area, so that the masked area is filled by optical flow to complete the video frame image filling; however, the optical-flow-based method is only suitable for simple background movement, and is not suitable for situations with object occlusion or complex background movement.
  • when processing video frame images based on a neural network model, a single model is used, which can fill in the video frame image with a good reference-pixel propagation effect when complex movement occurs in the background.
  • however, the generation capability of a single model is limited: for situations with complex textures and object occlusion, the filled content is blurred and the image quality of the video frame images cannot be guaranteed.
  • This application provides an image processing method, device, equipment, storage medium and program product to ensure the accuracy of image processing and improve the image quality of processed video frame images.
  • embodiments of the present application provide an image processing method, which includes:
  • repair the first type of object in the image to be processed to obtain a first repaired image, and generate a corresponding initial image mask template based on the initial blurred area in the first repaired image;
  • a target repaired image corresponding to the image to be processed is determined.
  • an image processing device which includes:
  • the first processing unit is configured to perform mask processing on the first type of objects contained in the acquired target video frame image, and obtain the image to be processed after mask processing; the first type of object is the image element to be repaired;
  • the second processing unit is configured to perform repair processing on the first type of object in the image to be processed, obtain the first repaired image, and generate a corresponding initial image mask template based on the initial blurred area in the first repaired image;
  • a third processing unit, configured to, when the first number of initial blurred pixels contained in the initial image mask template reaches a first threshold, perform morphological processing on the initial blurred area corresponding to the initial blurred pixels to obtain an image target mask template;
  • a fourth processing unit, configured to, when the second number of intermediate blurred pixels contained in the image target mask template reaches a second threshold, perform repair processing on the pixel area corresponding to the intermediate blurred pixels in the first repaired image to obtain a second repaired image;
  • the determining unit is configured to determine the target repaired image corresponding to the image to be processed based on the second repaired image.
  • embodiments of the present application provide an electronic device, including a memory and a processor, wherein the memory is used to store computer instructions, and the processor is used to execute the computer instructions to implement the steps of the image processing method provided by the embodiments of the present application.
  • embodiments of the present application provide a computer-readable storage medium storing computer instructions; when the computer instructions are executed by a processor, the steps of the image processing method provided by the embodiments of the present application are implemented.
  • embodiments of the present application provide a computer program product, which includes computer instructions stored in a computer-readable storage medium; when the processor of an electronic device reads the computer instructions from the computer-readable storage medium and executes them, the electronic device is caused to execute the steps of the image processing method provided by the embodiments of the present application.
  • image repair is decomposed into three stages.
  • in the first stage, the first type of object in the image to be processed is repaired, and the obtained first repaired image is further detected to generate a corresponding initial image mask template; in the second stage, when it is determined that the first number of initial blurred pixels contained in the initial image mask template reaches the first threshold, morphological processing is performed on the blurred areas corresponding to the initial blurred pixels to connect different blurred areas, obtaining an image target mask template; this avoids unnecessary processing of smaller blurred areas and improves processing efficiency; in the third stage, when it is determined that the second number of intermediate blurred pixels contained in the image target mask template reaches the second threshold, it is determined that there is an object outline that needs to be completed in the first repaired image, so the pixel area corresponding to the intermediate blurred pixels is repaired to obtain a second repaired image; finally, based on the second repaired image, a target repaired image corresponding to the image to be processed is determined.
  • the image quality of the second repaired image is improved and the image quality of the target repaired image is ensured.
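  • the staged, threshold-gated control flow described above can be sketched as follows (a minimal sketch: the thresholds and data are illustrative, and the `morph` and `enhance` callables merely stand in for the morphological processing step and the image repair model, which in practice are an image-morphology routine and a trained network):

```python
import numpy as np

def staged_repair(first_repaired, initial_mask, morph, enhance,
                  first_threshold, second_threshold):
    # Stage 2 gate: skip morphology when the initial blur is too small.
    if int(initial_mask.sum()) < first_threshold:
        return first_repaired
    target_mask = morph(initial_mask)
    # Stage 3 gate: re-repair only if blur survives the morphological cleanup.
    if int(target_mask.sum()) < second_threshold:
        return first_repaired
    return enhance(first_repaired, target_mask)

# Tiny demonstration with stand-in callables.
img = np.full((4, 4), 100.0)
blur = np.zeros((4, 4), dtype=np.uint8)
blur[1:3, 1:3] = 1                                  # 4 blurred pixels
identity = lambda m: m                              # no-op "morphology"
enhance = lambda im, m: np.where(m > 0, 255.0, im)  # mark repaired area
out = staged_repair(img, blur, identity, enhance,
                    first_threshold=3, second_threshold=3)
```

  • with a higher first threshold (e.g. 10), the same call would return the first repaired image unchanged, which is exactly the "avoid unnecessary processing of smaller blurred areas" behaviour described above.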
  • Figure 1 is a schematic diagram of the first image processing in the related technology
  • Figure 2 is a schematic diagram of the second image processing in the related technology
  • Figure 3 is a schematic diagram of an application scenario provided by the embodiment of the present application.
  • Figure 4 is a flow chart of an image processing method provided by an embodiment of the present application.
  • Figure 5 is a schematic diagram of filling processing of first-type objects provided by an embodiment of the present application.
  • Figure 6 is a schematic diagram of the first image processing provided by the embodiment of the present application.
  • Figure 7 is a schematic diagram of the second image processing provided by the embodiment of the present application.
  • Figure 8 is a schematic diagram of the third image processing provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of morphological processing of an initial blurred area provided by an embodiment of the present application.
  • Figure 10 is a schematic diagram of repairing the pixel area corresponding to the intermediate blurred pixel in an embodiment of the present application
  • Figure 11 is a flow chart of another image processing method provided by an embodiment of the present application.
  • Figure 12 is a flow chart of a specific implementation method of image processing provided by the embodiment of the present application.
  • Figure 13 is a schematic diagram of a specific implementation method of image processing provided by the embodiment of the present application.
  • Figure 14 is a flow chart of a training method for an information propagation model provided by an embodiment of the present application.
  • Figure 15 is a structural diagram of an image processing device provided by an embodiment of the present application.
  • Figure 16 is a structural diagram of an electronic device provided by an embodiment of the present application.
  • Figure 17 is a structural diagram of another electronic device provided by an embodiment of the present application.
  • Video Inpainting is a technology that uses the unoccluded area information in the video to repair the obscured area, that is, the unoccluded area information is used to reasonably fill the occluded area.
  • video repair requires two abilities: one is the ability to use temporal information, which propagates the available pixels of a certain frame to the corresponding areas of other frames; the other is the generation ability: if there are no available pixels in other frames, spatial and temporal information must be used to generate pixels in the corresponding area.
  • Video Instance Segmentation (VIS) is used to pre-identify the mask template corresponding to the object in the image.
  • Mask template: a selected image, graphic or object used to block all or part of the image to be processed, so as to control the image processing area or process.
  • the specific image or object used for overlaying is called a mask template.
  • mask templates can be used for films, filters, etc.
  • the mask template is a two-dimensional matrix array, and sometimes multi-valued images are also used.
  • image mask templates are mainly used to: 1. Extract the area of interest: multiply a pre-made mask template of the area of interest with the image to be processed to obtain the area-of-interest image; image values inside the area of interest remain unchanged, while image values outside the area are all 0. 2. Masking effect: use a mask template to shield certain areas of the image so that they do not participate in processing or in the calculation of processing parameters, or so that processing or statistics apply only to the shielded area. 3. Structural feature extraction: use similarity variables or image matching methods to detect and extract structural features in the image that are similar to the mask. 4. Production of special-shaped images.
  • the mask template is mainly used to extract the area of interest.
  • the mask template can be a two-dimensional matrix array. The number of rows of the two-dimensional matrix array is consistent with the height of the image to be processed (that is, the number of rows of the image to be processed).
  • the number of columns is consistent with the width of the image to be processed (that is, the number of columns of pixels), that is, each element in the two-dimensional matrix array is used to process the pixels at the corresponding position in the image to be processed.
  • the value of the element in the mask template corresponding to the area to be processed (such as the blurred area) of the image to be processed is 1, and the value of the remaining positions is 0.
  • after the mask template of the area of interest is multiplied with the image to be processed, if the value at a certain position in the two-dimensional matrix array is 1, the value of the pixel at that position in the image to be processed remains unchanged; if the value at a certain position is 0, the pixel at that position becomes 0; in this way, the area of interest can be extracted from the image to be processed.
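  • this element-wise multiplication can be illustrated with a minimal NumPy example (the 3x3 values are invented for illustration):

```python
import numpy as np

# Grayscale image to be processed and a 0/1 mask template of the same
# shape; the numbers are arbitrary example data.
image = np.array([[10, 20, 30],
                  [40, 50, 60],
                  [70, 80, 90]])
mask = np.array([[0, 1, 0],
                 [1, 1, 1],
                 [0, 1, 0]])

# Element-wise multiplication: pixels where mask == 1 keep their value,
# all other pixels become 0, extracting the area of interest.
region_of_interest = image * mask
```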
  • Morphological processing used to extract image components from the image that are meaningful for expressing and describing the shape of the region, so that subsequent recognition work can capture the most essential shape features of the target object.
  • Morphological processing includes but is not limited to: expansion and erosion, opening and closing operations, and morphology of grayscale images.
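  • binary dilation (expansion) and erosion can be sketched in pure NumPy as follows (a 3x3 square structuring element; production code would typically use OpenCV or scipy.ndimage instead):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def dilate(mask, k=3):
    """Binary dilation: a pixel becomes 1 if any neighbour in the
    k x k window is 1 (this merges nearby blurred regions)."""
    pad = k // 2
    padded = np.pad(mask, pad)
    return sliding_window_view(padded, (k, k)).max(axis=(2, 3))

def erode(mask, k=3):
    """Binary erosion: a pixel stays 1 only if the whole k x k window
    is 1 (this removes thin protrusions and isolated specks)."""
    pad = k // 2
    padded = np.pad(mask, pad)
    return sliding_window_view(padded, (k, k)).min(axis=(2, 3))

# Two separate 1-pixel regions: a closing operation (dilation followed
# by erosion) connects them into one regular region.
m = np.zeros((5, 5), dtype=np.uint8)
m[2, 1] = m[2, 3] = 1
closed = erode(dilate(m))
```

  • the closing operation shown here is one way to "connect different blurred areas" and make the blurred region more regular, as described for the second stage.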
  • the terms "first" and "second" in this article are only used for descriptive purposes and cannot be understood as expressing or implying relative importance or implicitly indicating the number of indicated technical features; therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more such features. In the description of the embodiments of this application, unless otherwise stated, "multiple" means two or more.
  • video repair refers to processing the video frame images in a video.
  • video restoration technologies include: optical flow-based methods and neural network model-based methods.
  • the method based on optical flow includes the following steps: Step 1, use adjacent frames for optical flow estimation; Step 2, fill the masked area with optical flow; Step 3, apply optical flow to propagate the pixel gradient of the unmasked area to the masked area; Step 4, perform Poisson reconstruction on the pixel gradient to generate RGB pixels; Step 5, if an image repair module is included, perform image repair on areas that cannot be filled by optical flow.
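  • the propagation idea in Step 3 can be illustrated with a heavily simplified toy (integer flow and direct pixel copying from the previous frame instead of gradient propagation followed by Poisson reconstruction; the function name and data are purely illustrative):

```python
import numpy as np

def propagate_from_prev(cur, prev, mask, flow):
    """Fill masked pixels of `cur` by following integer optical flow
    back into the previous frame: flow[y, x] = (dy, dx) points from a
    pixel in `cur` to its source location in `prev`. Pixels whose
    source falls outside the frame stay unfilled."""
    out = cur.copy()
    h, w = cur.shape
    for y, x in zip(*np.nonzero(mask)):
        dy, dx = flow[y, x]
        sy, sx = y + dy, x + dx
        if 0 <= sy < h and 0 <= sx < w:          # source inside frame
            out[y, x] = prev[sy, sx]
    return out

# The background shifts one pixel right between frames, so a masked
# pixel takes its value from one pixel to the left in `prev`.
prev = np.arange(16).reshape(4, 4)
cur = np.zeros((4, 4), dtype=prev.dtype)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1, 2] = 1
flow = np.zeros((4, 4, 2), dtype=int)
flow[1, 2] = (0, -1)                             # source is (1, 1)
filled = propagate_from_prev(cur, prev, mask, flow)
```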
  • in neural-network-based methods, the network structure is mostly an encoder-decoder structure, which needs to take into account both the consistency between frames and the naturalness of the generated pixels; it receives frame sequence information as input and directly outputs the repaired frames after network processing.
  • embodiments of the present application provide an image processing method, device, equipment, storage medium and program product to ensure the accuracy of image processing and improve the image quality of processed video frames.
  • the neural network model is used to complete three types of video repair, respectively:
  • the video frame image is repaired based on the inter-frame pixel propagation model.
  • the first type of object can be the foreground area in the video frame;
  • the blurred area in the video frame image is repaired based on the image repair model.
  • the first type of object can be the blurred area in the video frame; for the detection method of the blurred area, please refer to the description below;
  • the object area in the video frame image (that is, the background area blocked by the foreground object) is repaired.
  • the first type of objects contained in the acquired target video frame image are masked to obtain the image to be processed after mask processing.
  • the second type of objects contained in the video image are also masked.
  • the object is recognized and the corresponding initial mask template of the object is determined; then, the image to be processed and the initial mask template of the object are input into the trained information propagation model, and the first type of object in the image to be processed is repaired through the information propagation model.
  • the image elements to be repaired are repaired, the initial blurred area in the first repaired image is detected (the initial blurred area is the blurry area that still exists in the first repaired image after the image to be processed has been repaired), a corresponding initial image mask template is generated based on the initial blurred area, and the object target mask template in the image to be processed is determined.
  • the initial blurred area in the first repaired image is further detected, and a corresponding initial image mask template is generated; and
  • morphological processing is performed on the initial blurred area corresponding to the initial blurred pixels to obtain the image target mask template, making the blurred area more regular;
  • the image repair model is used to perform repair processing on the pixel area corresponding to the intermediate blurred pixels in the first repaired image to obtain the second repaired image, that is, to repair and enhance the blurred area in the first repaired image; finally, the object initial mask template and the object target mask template are determined.
  • the object repair model is used to repair the pixel areas corresponding to the second type of objects in the second repaired image, and a third repaired image is obtained to realize the repair of the occluded object area.
  • the repair process is to enhance the blurred area in the second repaired image.
  • the above-mentioned initial blurred pixels refer to the pixels in the initial image mask template, and the intermediate blurred pixels refer to the pixels in the image target mask template.
  • the blurred area caused by complex texture and object occlusion is repaired and enhanced, and the image quality of the target repaired image is improved.
  • parts of the information propagation model, image repair model and object repair model involve artificial intelligence (AI) and machine learning (Machine Learning, ML) technology, and are based on speech technology, natural language processing technology and machine learning in artificial intelligence.
  • Artificial intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can respond in a similar way to human intelligence.
  • Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology mainly includes several major directions such as computer vision technology, natural language processing technology, and machine learning/deep learning. With the research and progress of artificial intelligence technology, artificial intelligence has been researched and applied in many fields, such as smart homes, smart customer service, virtual assistants, smart speakers, smart marketing, autonomous driving, robots, smart medical care, etc. It is believed that with the development of technology, artificial intelligence will be applied in more fields and play an increasingly important role.
  • Machine learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and many other disciplines. It specializes in studying how computers can simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve their performance. Compared with data mining, which finds mutual characteristics in big data, machine learning pays more attention to the design of algorithms, allowing computers to automatically "learn" rules from data and use those rules to predict unknown data.
  • Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; its applications cover all fields of artificial intelligence. Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and other technologies. Reinforcement Learning (RL), also known as evaluative learning, is one of the paradigms and methodologies of machine learning, used to describe and solve the problem of how an agent learns strategies during its interaction with the environment to maximize returns or achieve specific goals.
  • FIG. 3 is a schematic diagram of an application scenario according to an embodiment of the present application.
  • This application scenario includes a terminal device 310 and a server 320, and the terminal device 310 and the server 320 can communicate through a communication network.
  • the communication network may be a wired network or a wireless network. Therefore, the terminal device 310 and the server 320 may be connected directly or indirectly through wired or wireless communication. For example, the terminal device 310 may be indirectly connected to the server 320 through a wireless access point, or the terminal device 310 may be directly connected to the server 320 through the Internet, which is not limited in this application.
  • the terminal device 310 includes but is not limited to mobile phones, tablet computers, notebook computers, desktop computers, e-book readers, intelligent voice interaction devices, smart home appliances, vehicle-mounted terminals and other devices; the terminal device may be installed with various clients, and a client can be an application (such as a browser, game software, etc.) that supports functions such as video editing and video playback, or a web page, a mini program, etc.;
  • the server 320 is a backend server corresponding to the client installed in the terminal device 310.
  • the server 320 can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, Content Delivery Network (CDN), and big data and artificial intelligence platforms.
  • the image processing method in the embodiments of the present application can be executed by an electronic device, which can be the server 320 or the terminal device 310; that is, the method can be executed by the server 320 or the terminal device 310 alone, or executed by the server 320 and the terminal device 310 together.
  • for example, the terminal device 310 can obtain the image to be processed after mask processing, perform repair processing on the image to be processed to obtain a first repaired image, and determine the initial image mask template corresponding to the first repaired image; when the first number of initial blurred pixels contained in the initial image mask template reaches the first threshold, the initial image mask template is processed to obtain an image target mask template; when the second number of intermediate blurred pixels contained in the image target mask template reaches the second threshold, the blurred position in the first repaired image is further repaired to obtain a second repaired image; finally, based on the second repaired image, a target repaired image corresponding to the image to be processed is determined.
  • the terminal device 310 can obtain the video frame image, and then send the video frame image to the server 320.
  • the server 320 performs mask processing on the first type of objects contained in the obtained video frame image to obtain the mask-processed image to be processed, performs repair processing on the image to be processed to obtain a first repaired image, and determines the initial image mask template corresponding to the first repaired image; when the first number of initial blurred pixels contained in the initial image mask template reaches the first threshold, the initial image mask template is processed to obtain the image target mask template; when the second number of intermediate blurred pixels contained in the image target mask template reaches the second threshold, the blurred position in the first repaired image is further repaired to obtain a second repaired image; finally, the target repaired image corresponding to the image to be processed is determined based on the second repaired image.
  • the terminal device 310 can obtain the image to be processed, perform repair processing on it to obtain a first repaired image, and then send the first repaired image to the server 320; the server 320 determines the initial image mask template corresponding to the first repaired image and, when the first number of initial blurred pixels contained in the initial image mask template reaches the first threshold, processes the initial image mask template to obtain the image target mask template; when the second number of intermediate blurred pixels contained in the image target mask template reaches the second threshold, the blurred position in the first repaired image is further repaired to obtain a second repaired image; finally, based on the second repaired image, the target repaired image corresponding to the image to be processed is determined.
  • a video frame image can be input into the terminal device 310, and the terminal device 310 sends the video frame image to be processed to the server 320.
  • the server 320 can use the image processing method of the embodiment of the present application to determine the target repaired image corresponding to the image to be processed.
  • multiple servers 320 can form a blockchain, with each server 320 being a node on the blockchain; data involved in the image processing method disclosed in the embodiments of the present application, such as the repair processing and morphological processing methods, can be saved on the blockchain.
  • Figure 4 is a flow chart of an image processing method provided by an embodiment of the present application, including the following steps:
  • Step S400 Mask processing is performed on the first type of objects contained in the acquired target video frame image to obtain an image to be processed after mask processing; the first type of objects are image elements to be repaired.
  • the image to be processed contains a repair area that requires video repair determined based on the mask area; it should be noted that the mask area is the repair area.
  • the repair area in the image to be processed is repaired to obtain a first repaired image; then, the first repaired image is detected to determine whether the image content of the areas other than the repair area in the first repaired image is the same as that of the video frame image before repair processing (or of the image to be processed before repair processing), and to determine whether the first repaired image needs to be further filled to obtain a target repaired image whose image content in the areas other than the repair area is consistent with the video frame image before repair (or the image to be processed before repair).
  • Step S401 Perform repair processing on the first type of object in the image to be processed, obtain a first repaired image, and generate a corresponding initial image mask template based on the initial blurred area in the first repaired image.
  • generating the corresponding initial image mask template includes: generating an initial image mask template containing the initial blurred area; that is, the initial image mask template is a mask template of the initial blurred area.
  • the initial image mask template may be a two-dimensional matrix array.
  • the number of rows of the two-dimensional matrix array is consistent with the height of the first repaired image (i.e., the number of rows of pixels in the first repaired image), and the number of columns is consistent with the width of the first repaired image (i.e., the number of columns of pixels in the first repaired image); each element in the two-dimensional matrix array is used to process the pixel at the corresponding position in the first repaired image.
  • the value of the element in the initial image mask template corresponding to the initial blurred area of the first repaired image is 1, and the value at the remaining positions is 0.
  • after the initial image mask template is multiplied by the first repaired image, if the value at a certain position in the two-dimensional matrix array is 1, the value of the pixel at that position in the first repaired image remains unchanged; if the value at a certain position is 0, the pixel at that position becomes 0; in this way, the initial image mask template can be used to extract the initial blurred area from the first repaired image.
  • when using the trained information propagation model F T to fill the first type of objects in the image to be processed: first, the video sequence containing the image to be processed is input into the trained information propagation model F T ; then, in the trained information propagation model F T , with reference to time-domain information and spatial-domain information, the first type of objects in the image to be processed are repaired based on the pixels in the other video frame images contained in the video sequence. Specifically, among two or more adjacent video frame images containing the image to be processed, a first pixel in another video frame image is used to fill a second pixel in the image to be processed, where the first pixel in the other video frame image is at the same position as the second pixel in the image to be processed.
  • Figure 5 is a schematic diagram of filling processing of first type objects in an embodiment of the present application.
  • the corresponding initial image mask template m blur is generated, which can be achieved in the following ways:
  • for example, if the size of the first repaired image is 7cm*7cm, the size of each pixel block can be 0.7cm*0.7cm. It should be noted that this way of dividing the first repaired image into multiple pixel blocks is only an example and is not the only method;
  • a resolution threshold representing the required image quality can be set; when the resolution of a pixel block is lower than the resolution threshold, the pixel block is used as part of the initial blurred area;
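A minimal sketch of the block-wise blur detection described above. The patent speaks of a per-block resolution/quality threshold; here the local variance of each block is used as a stand-in sharpness measure, and the function name, block size, and threshold are illustrative assumptions:

```python
import numpy as np

def initial_blur_mask(image, block=4, threshold=10.0):
    """Divide the image into block x block pixel blocks and mark every
    block whose local variance (a stand-in for the patent's
    image-quality/resolution measure) falls below `threshold` as part
    of the initial blurred area (value 1 in the returned mask)."""
    h, w = image.shape
    mask = np.zeros_like(image, dtype=np.uint8)
    for y in range(0, h, block):
        for x in range(0, w, block):
            patch = image[y:y + block, x:x + block]
            if patch.var() < threshold:   # low variance -> treated as blurred
                mask[y:y + block, x:x + block] = 1
    return mask
```

Flat, low-contrast blocks are flagged as blurred; high-contrast blocks are left out of the mask.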
  • scenarios for repairing the first type of objects include but are not limited to: logo removal, subtitle removal, and object removal, where the object can be a moving or stationary person or thing.
  • for example, a video clip is produced based on a video from a certain platform website, but the video obtained from the platform contains a station logo that affects the viewing experience. In this case, the first type of object is the station logo, and the image processing technology provided by the embodiments of the present application can be used to remove the station logo from the video frame images of the video. See Figure 6.
  • Figure 6 is a schematic diagram of image processing provided by an embodiment of the present application.
  • subtitles can be removed from the video frame image, see FIG. 7, which is an image processing schematic diagram provided by an embodiment of the present application; or certain moving objects, such as passers-by and vehicles, can be removed from the video frame image, see FIG. 8, which is a schematic diagram of image processing provided by an embodiment of the present application.
  • Step S402 When the first number of initial blur pixels contained in the initial image mask template reaches the first threshold, perform morphological processing on the initial blur area corresponding to the initial blur pixels to obtain the image target mask template.
  • the initial image mask template is determined based on pixel blocks, and each pixel block has its own corresponding resolution, where the resolution represents the number of pixels in the horizontal and vertical directions of the pixel block. Therefore, based on the resolution of each pixel block, the number of pixels contained in that pixel block is determined, and the pixel counts of all pixel blocks contained in the initial image mask template are added together to obtain the number of initial blurred pixels contained in the initial image mask template.
  • the number of pixels in a pixel block = the number of pixels in the horizontal direction × the number of pixels in the vertical direction.
  • when the first number of initial blurred pixels contained in the initial image mask template reaches the first threshold, it means that there are many blurred pixel blocks in the first repaired image. In this case, the initial blurred area corresponding to the initial blurred pixels is morphologically processed to obtain the image target mask template, so that the scattered initial blurred areas in the first repaired image are connected and the blurred areas become more regular.
  • the initial blurred area corresponding to the initial blurred pixels is subjected to morphological processing to obtain the image target mask template, which can be achieved in the following way: using the dilation operation f dilate and the erosion operation f erode, the multiple initial blurred areas m blur are first dilated and then eroded, so that the multiple scattered initial blurred areas are connected, and the image target mask template is obtained.
  • for example, the image target mask template is f erode (f dilate (m blur )).
  • Figure 9 is a schematic diagram of morphological processing of an initial blurred area provided by an embodiment of the present application.
  • the first repaired image includes a plurality of initial blurred areas, respectively A1 to A8; at this time, the initial blurred areas A1 to A8 are first expanded according to the set expansion ratio, and the expanded initial blurred areas B1 to B8 are obtained.
  • the shrinkage ratio is determined based on the expansion ratio. When the expansion ratio is 10, the shrinkage ratio is 1/10.
  • the principle of image erosion is as follows: assume that the foreground object in the image is 1 and the background is 0, and that there is a foreground object in the original image. The process of using a structuring element to erode the original image is: traverse every pixel of the original image, align the center point of the structuring element with the pixel currently being traversed, take the minimum value of all pixels in the area of the original image covered by the current structuring element, and replace the current pixel value with this minimum value. Since the minimum value of a binary image is 0, the pixel is replaced with 0 and becomes black background. It can also be seen that if the area covered by the current structuring element is entirely background, no change is made to the original image, because the values are all 0.
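The dilate-then-erode procedure (morphological closing) can be sketched directly from the minimum/maximum rule described above. This is an illustrative pure-NumPy implementation, not the patent's; the k x k square structuring element and the boundary convention (outside the image treated as foreground for erosion) are assumptions:

```python
import numpy as np

def dilate(mask, k=3):
    """Binary dilation: a pixel becomes 1 if any pixel in its
    k x k neighbourhood is 1 (maximum under the structuring element)."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="constant", constant_values=0)
    out = np.zeros_like(mask)
    h, w = mask.shape
    for dy in range(k):
        for dx in range(k):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out

def erode(mask, k=3):
    """Binary erosion: a pixel stays 1 only if its whole k x k
    neighbourhood is 1 (minimum under the structuring element).
    Outside the image is treated as foreground so the border is not
    eroded artificially."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="constant", constant_values=1)
    out = np.ones_like(mask)
    h, w = mask.shape
    for dy in range(k):
        for dx in range(k):
            out = np.minimum(out, padded[dy:dy + h, dx:dx + w])
    return out

def close_mask(mask, k=3):
    """Dilate then erode, connecting scattered initial blurred areas
    into a single image target mask template."""
    return erode(dilate(mask, k), k)
```

Two blurred pixels separated by a one-pixel gap become one connected area after closing, while distant background stays untouched.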
  • each intermediate blurred area is equal to or larger than the corresponding initial blurred area; when an intermediate blurred area is relatively large (for example, its width and height are larger than the corresponding width and height thresholds), the blurred area can be clearly displayed in the first repaired image.
  • this indicates that the repair effect of the first repaired image is not good and the first repaired image needs further repair. Therefore, whether to perform repair processing on the first repaired image is determined based on the image target mask template, which reduces the amount of calculation while ensuring the repair effect.
  • when the first number of initial blurred pixels contained in the initial image mask template is less than the first threshold, it means that there are few blurred pixel blocks in the first repaired image and no obvious blurred area is displayed in it. It is therefore determined that the repair effect of the first repaired image is good, and the first repaired image is used as the target repaired image corresponding to the image to be processed. There is no need to perform morphological processing on the blurred area corresponding to the initial blurred pixels, nor to continue processing the first repaired image, which reduces the calculation process and improves image processing efficiency.
  • Step S403 When the second number of intermediate blurred pixels contained in the image target mask template reaches the second threshold, in the first repaired image, repair processing is performed on the pixel area corresponding to the intermediate blurred pixels to obtain a second repaired image.
  • since the scattered initial blurred areas have been connected in the image target mask template, when the second number of intermediate blurred pixels contained in the image target mask template reaches the second threshold, it means that the blurred area can be clearly seen in the first repaired image, and it is determined that the repair effect of the first repaired image is not good. At this time, in order to ensure the accuracy of image processing, the pixel area corresponding to the intermediate blurred pixels in the first repaired image needs to be repaired.
  • the pixel area corresponding to the intermediate blurred pixel is repaired, which can be implemented in the following manner:
  • using the trained image repair model F I , in the first repaired image, the pixel area corresponding to the intermediate blurred pixels is repaired based on the image target mask template to obtain the second repaired image; in the recorded repair process of the trained image repair model, x blurcomp represents the second repaired image.
  • the pixel area is determined in the following way: according to the position of the intermediate blurred pixels in the image target mask template, the area at the same position in the first repaired image is determined as the pixel area.
  • the pixel area corresponding to the intermediate blurred pixels is generally a no-reference area or a moving-object area.
  • the trained image repair model F I can be an image generation tool for blurred areas such as latent diffusion models (Latent Diffusion Models, LDM) or large mask inpainting models (Large Mask Inpainting, LaMa).
  • the LDM model is a high-resolution image synthesis training tool that achieves highly competitive performance on image restoration and various tasks such as unconditional image generation, semantic scene synthesis, and super-resolution;
  • the LaMa model is an image generation tool that generalizes well to higher resolution images.
  • the pixel area corresponding to the intermediate blurred pixels is repaired, which can be achieved in the following way: first, the 3-channel first repaired image and the 1-channel image target mask template are input into the LaMa model; secondly, in the LaMa model, the image target mask template is inverted and multiplied with the first repaired image to obtain a first color image with a masked area; then, the first color image and the image target mask template are superimposed to obtain a 4-channel image; then, after the 4-channel image is downsampled, it is processed by Fast Fourier Convolutions (FFC), and the image after fast Fourier convolution processing is upsampled to obtain the second repaired image. During fast Fourier convolution processing, the input image is divided into two parts based on the channels, and these two parts undergo two different processes respectively.
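The input preparation described above (invert the mask, multiply, stack into 4 channels) can be sketched as follows; the real LaMa model then runs the FFC network, which is not reproduced here, and the image size is an arbitrary toy value:

```python
import numpy as np

h, w = 4, 4
repaired = np.random.rand(h, w, 3).astype(np.float32)   # 3-channel first repaired image
target_mask = np.zeros((h, w, 1), dtype=np.float32)     # 1-channel image target mask template
target_mask[1:3, 1:3, 0] = 1.0                          # blurred area to regenerate

inverted = 1.0 - target_mask      # invert the image target mask template ...
masked_rgb = repaired * inverted  # ... and multiply: colour image with the masked area zeroed

# superimpose colour image and mask template into the 4-channel network input
four_channel = np.concatenate([masked_rgb, target_mask], axis=-1)
```

The network sees the surviving pixels in the first three channels and, in the fourth, exactly which region it must regenerate.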
  • FIG. 10 is a schematic diagram of repairing a pixel area corresponding to an intermediate blurred pixel in an embodiment of the present application.
  • fast Fourier convolution allows the LaMa model to obtain the receptive field of the entire image even at shallow levels.
  • Fast Fourier convolution not only improves the repair quality of the LaMa model, but also reduces the number of parameters of the LaMa model.
  • the inductive bias of fast Fourier convolution gives the LaMa model better generalization: it can be trained on low-resolution images and still produce repair results on high-resolution images.
  • since fast Fourier convolution can work in the spatial domain and the frequency domain simultaneously, there is no need to go back to previous layers to understand the context of the image.
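A toy illustration of the channel split described above: half the channels go through an ordinary spatial operation, the other half through an FFT, a per-frequency scaling, and an inverse FFT. A pointwise multiply in the frequency domain touches every spatial position at once, which is why even a shallow layer sees the whole image. This is not LaMa's implementation; the function and weight names are assumptions:

```python
import numpy as np

def ffc_like(x, w_local, w_global):
    """Toy fast-Fourier-convolution step on a (channels, H, W) array:
    split channels into a local branch (spatial domain) and a global
    branch (frequency domain, image-wide receptive field)."""
    c = x.shape[0] // 2
    local, global_ = x[:c], x[c:]
    # local branch: ordinary spatial filtering (here a simple scaling)
    local_out = local * w_local
    # global branch: real FFT -> per-frequency scaling -> inverse FFT
    spec = np.fft.rfft2(global_, axes=(-2, -1))
    global_out = np.fft.irfft2(spec * w_global, s=global_.shape[-2:], axes=(-2, -1))
    return np.concatenate([local_out, global_out], axis=0)
```

With unit weights the global branch is an exact round trip through the frequency domain, which makes the mechanics easy to verify.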
  • first threshold and the second threshold may be the same or different.
  • the method of determining the second number of intermediate blur pixels is similar to the method of determining the first number of initial blur pixels, which will not be repeated here.
  • when the second number of intermediate blurred pixels contained in the image target mask template is less than the second threshold, it means that there are few blurred pixel blocks in the first repaired image and no obvious blurred area can be seen in it, so the repair effect of the first repaired image is good. At this time, the first repaired image is used as the target repaired image corresponding to the image to be processed, and there is no need to continue repairing the blurred area in the first repaired image, which reduces the calculation process and improves image processing efficiency.
  • Step S404 Based on the second repaired image, determine the target repaired image corresponding to the image to be processed.
  • the first type of object in the image to be processed is repaired to obtain the first repaired image, and further detection is then performed on the first repaired image. The blurred area in the first repaired image is repaired, that is, the blurred area in the first repaired image is enhanced; since the blurred area is enhanced, the second repaired image obtained has improved image quality, which further ensures the image quality of the target repaired image.
  • when determining the target repaired image corresponding to the image to be processed based on the second repaired image, the second repaired image can be used as the target repaired image, or the third repaired image obtained by repairing the second repaired image can be used as the target repaired image.
  • whether to use the second repaired image or the third repaired image as the target repaired image is determined based on whether the outlines of the second type of object in the object initial mask template and the object target mask template are consistent.
  • the object target mask template is determined in the following way:
  • the object initial mask template m obj is input into the trained information propagation model F T ; then, in the trained information propagation model F T , based on the object completion ability of the trained information propagation model F T , the second type of object in the object initial mask template is subjected to object outline completion processing to obtain the object target mask template.
  • the object initial mask template is determined after identifying the second type of objects contained in the video frame image, and the second type of objects are image elements that need to be retained.
  • for example, the object initial mask template m obj corresponding to the second type of object in the video frame image is determined through the visual recognition model F VIS (Visual Identity System, VIS); this can be recorded as m obj = F VIS (x m ), where x m is the video frame image.
  • the initial object mask template m obj corresponding to the second type of object in the image to be processed is determined through the visual recognition model F VIS (Visual Identity System, VIS).
  • the visual recognition model is trained based on images with mask templates.
  • first the object initial mask template is compared with the object target mask template to obtain the first comparison result, where the first comparison result is used to characterize whether the outline of the second type of object is consistent; then, based on The first comparison result is processed on the second repaired image to obtain the target repaired image.
  • for example, the object initial mask template and the object target mask template can be completely overlapped to determine whether the second-type-object mask area in the object initial mask template and the second-type-object mask area in the object target mask template completely overlap. If they completely overlap, it is determined that the first comparison result characterizes the outlines of the second type of object as consistent; otherwise, it is determined that the first comparison result characterizes the outlines as inconsistent.
  • alternatively, the third number of pixels of the second-type-object mask area in the object initial mask template and the fourth number of pixels of the second-type-object mask area in the object target mask template can be determined, and the first comparison result is determined based on the difference between the third number of pixels and the fourth number of pixels, where this difference characterizes the difference between the second-type-object mask areas in the object initial mask template and the object target mask template.
  • when determining the comparison result based on the difference between the third number of pixels and the fourth number of pixels, if the difference between the third number and the fourth number is less than the threshold, it is determined that the first comparison result characterizes the outlines of the second type of object as consistent; otherwise, it is determined that the first comparison result characterizes the outlines as inconsistent.
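The pixel-count comparison described above amounts to a few lines; the function name and default threshold are illustrative, not from the patent:

```python
import numpy as np

def contours_consistent(initial_mask, target_mask, threshold=5):
    """Compare the second-type-object mask areas of the object initial
    mask template and the object target mask template by pixel count:
    a small count difference means the contours are treated as
    consistent (no completion was needed)."""
    third_count = int(initial_mask.sum())   # third number of pixels
    fourth_count = int(target_mask.sum())   # fourth number of pixels
    return abs(third_count - fourth_count) < threshold
```

When the result is `True` the second repaired image is kept as-is; when `False`, the object repair model is invoked to produce the third repaired image.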
  • the second repaired image is used as the target repaired image.
  • the second repaired image is processed to obtain the target repaired image in the following manner, where x objcomp represents the third repaired image after repair, x objremain represents the visible pixel part of the image to be processed, and x objremain = x mt ⊙ m obj , that is, the color image that includes the first-type-object mask area and the second-type-object mask area.
  • the trained object repair model can use any model for image repair, such as the spatiotemporal joint model for video repair (Spatial-Temporal Transformations for Video Inpainting, STTN); when using the object repair model to repair the pixel area corresponding to the second type of object in the second repaired image, the visible pixel part is used to repair that pixel area based on the self-attention characteristics of the Transformer.
  • Figure 11 is a flow chart of another image processing method in an embodiment of the present application, including the following steps:
  • Step S1100 Mask processing is performed on the first type of objects contained in the acquired target video frame image to obtain an image to be processed after mask processing; the first type of objects are image elements to be repaired.
  • Step S1101 Recognize the second type of object contained in the acquired video frame image, and determine the initial mask template of the object based on the recognition result.
  • Step S1102 Perform repair processing on the first type of object in the image to be processed, obtain a first repaired image, and generate a corresponding initial image mask template based on the initial blurred area in the first repaired image.
  • Step S1103 Perform object outline completion processing on the second type of object in the initial object mask template to obtain the object target mask template.
  • Step S1104 When the first number of initial blur pixels contained in the initial image mask template reaches the first threshold, perform morphological processing on the blur area corresponding to the initial blur pixels to obtain the image target mask template.
  • Step S1105 When the second number of intermediate blurred pixels contained in the image target mask template reaches the second threshold, in the first repaired image, repair processing is performed on the pixel area corresponding to the intermediate blurred pixels to obtain a second repaired image.
  • Step S1106 Compare the object initial mask template with the object target mask template to determine whether the outline of the second type of object is consistent. If so, execute step S1107; otherwise, execute step S1108.
  • Step S1107 use the second repaired image as the target repaired image.
  • Step S1108 In the second repaired image, repair processing is performed on the pixel area corresponding to the second type of object to obtain a third repaired image, and the third repaired image is used as the target repaired image.
  • Figure 12 exemplarily provides a flow chart of a specific implementation method of image processing in the embodiment of the present application, including the following steps:
  • Step S1200 Mask processing is performed on first-type objects contained in the acquired target video frame image to obtain an image to be processed after mask processing.
  • the first-type objects are image elements to be repaired.
  • Step S1201 Recognize the second type of objects contained in the acquired target video frame image through the visual recognition model, and determine the initial object mask template of the second type of object based on the recognition results.
  • Step S1202 Input the video sequence containing the image to be processed and the mask template sequence containing the object initial mask template of the image to be processed into the trained information propagation model, and obtain the first repaired image, the image initial mask template, and the object target mask template through the trained information propagation model.
  • the trained information propagation model corresponds to two input parameters, which are:
  • the first input parameter is a video sequence containing an image to be processed, and each frame of the image in the video sequence can be an image x mt to be processed;
  • the second input parameter is a mask template sequence containing the object initial mask template of the image to be processed, and each mask template in the mask template sequence can be the object initial mask template corresponding to the corresponding image to be processed; for example, the mask template corresponding to the image to be processed x m1 is the object initial mask template of x m1 ;
  • Step S1203 Determine whether the first number of initial blur pixels contained in the initial image mask template reaches the first threshold. If so, execute step S1204; otherwise, execute step S1210.
  • Step S1204 Perform morphological processing on the blurred area corresponding to the initial blurred pixel to obtain an image target mask template.
  • Step S1205 Determine whether the second number of intermediate blur pixels included in the image target mask template reaches the second threshold. If so, execute step S1206; otherwise, execute step S1210.
  • Step S1206 Input the image target mask template and the first repaired image into the trained image repair model, and obtain the second repaired image through the trained image repair model.
  • Step S1207 Determine whether the outlines of the second type of object contained in the object initial mask template and the object target mask template are consistent. If so, execute step S1211; otherwise, execute step S1208.
  • Step S1208 Input the second repaired image and the object target mask template into the trained object repair model, and obtain the third repaired image through the trained object repair model.
  • Step S1209 Use the third repaired image as the target repaired image corresponding to the image to be processed.
  • Step S1210 use the first repaired image as the target repaired image corresponding to the image to be processed.
  • Step S1211 use the second repaired image as the target repaired image corresponding to the image to be processed.
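The branching in steps S1203 to S1211 above can be sketched as one decision function. The three models are passed in as plain callables and stubbed in the test; every name and threshold here is illustrative, not from the patent:

```python
import numpy as np

def select_target_image(first_repaired, image_initial_mask, morph_fn,
                        image_repair_fn, object_repair_fn,
                        obj_initial_mask, obj_target_mask,
                        first_threshold, second_threshold, third_threshold):
    # S1203: too few initial blurred pixels -> first repaired image is good enough
    if image_initial_mask.sum() < first_threshold:
        return first_repaired                                          # S1210
    image_target_mask = morph_fn(image_initial_mask)                   # S1204: dilate + erode
    # S1205: too few intermediate blurred pixels -> keep first repaired image
    if image_target_mask.sum() < second_threshold:
        return first_repaired                                          # S1210
    second_repaired = image_repair_fn(first_repaired, image_target_mask)  # S1206
    # S1207: compare second-type-object mask areas by pixel count
    if abs(obj_initial_mask.sum() - obj_target_mask.sum()) < third_threshold:
        return second_repaired                                         # S1211
    return object_repair_fn(second_repaired, obj_target_mask)          # S1208-S1209
```

Only when both blur checks fire and the object contours disagree does the most expensive path (object repair model) run, which matches the stated goal of reducing calculation while keeping repair quality.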
  • Figure 13 corresponds to Figure 12, and Figure 13 provides a schematic diagram of a specific implementation method of image processing in an embodiment of the present application.
  • the image processing process is divided into three stages according to the model used.
  • the three stages are explained in detail below.
  • Phase 1: Input the image to be processed and the object initial mask template into the trained information propagation model. In the trained information propagation model, based on inter-frame reference information, the available pixels in the corresponding areas of other video frame images that are continuous with the image to be processed are used to repair the image to be processed. The trained information propagation model also has certain image generation capabilities: for pixel parts that have no available pixels in other video frame images, pixels are generated with the help of spatial-domain and time-domain information to complete the image repair and obtain the first repaired image. At the same time, the trained information propagation model has object completion capabilities, through which contour completion processing is performed on the second type of object in the image to be processed to obtain the object target mask template; the trained information propagation model can also determine the initial image mask template corresponding to the initial blurred area based on the repaired image. Finally, the trained information propagation model of phase one simultaneously outputs the first repaired image, the initial image mask template corresponding to the initial blurred area (where the repair result is blurred) in the first repaired image, and the object target mask template.
  • Stage 2: First determine the first number of initial blurred pixels in the initial blurred area in the image initial mask template, and then determine whether the first number reaches the first threshold. If the first number is less than the first threshold, the initial blurred area is ignored, and the first repaired image is output as the target repaired image without subsequent processing. If the first number reaches the first threshold, dilation and erosion operations are applied to connect the scattered initial blurred areas and obtain the processed image target mask template. After obtaining the image target mask template, determine the second number of intermediate blurred pixels in the blurred area in the image target mask template, and then determine whether the second number reaches the second threshold. If it does, the image repair model is called, and based on the processed image target mask template, the pixel positions of the blurred area in the image target mask template are repaired on the first repaired image.
  • Stage 3: On the basis of stage two, if the number of pixels changed in the second-type-object mask area of the object target mask template relative to the object initial mask template is less than the third threshold, it is considered that there is no object outline that needs to be completed in the second-type-object mask area, and the second repaired image is used as the target repaired image. If the number of changed pixels reaches the third threshold, the object repair model is called to repair the pixels in the second-type-object mask area, covering the repair content of the image repair module, obtaining the third repaired image, and the third repaired image is used as the target repaired image.
  • in the embodiment of the present application, the first repaired image, the initial image mask template, and the object target mask template are determined based on the image to be processed and the object initial mask template through the trained information propagation model, which realizes reference pixel propagation, so that image content with complex motion in the background can be better repaired. After the image elements to be repaired are repaired and the first repaired image is obtained, in order to ensure the accuracy of image processing, when it is determined that the first number of initial blurred pixels contained in the initial image mask template reaches the first threshold, the image repair model is used to repair the pixel area corresponding to the intermediate blurred pixels in the first repaired image to obtain the second repaired image.
  • the process of image processing of the image to be processed involves a trained information propagation model, a trained image repair model, and a trained object repair model.
  • the process of training these models is explained in detail below.
  • the trained information propagation model is obtained by performing loop iterative training on the information propagation model to be trained based on the training samples in the training sample data set.
  • the following takes one loop iteration as an example to introduce the training process of the information propagation model to be trained.
  • Figure 14 is a training method for an information propagation model in an embodiment of the present application, which includes the following steps:
  • Step S1400 obtain a training sample data set.
  • the training sample data set includes at least one set of training samples. Each set of training samples includes: historical images in which the image elements to be repaired have been masked, the corresponding actual repaired images, and the corresponding object historical mask templates and object actual mask templates.
  • Step S1401 Select training samples from the training sample data set, and input the training samples into the information propagation model to be trained.
  • Step S1402 Use the information propagation model to be trained to predict the predicted repaired image corresponding to the historical image, and, based on the predicted blurred area in the predicted repaired image, generate an image prediction mask template and an object prediction mask template corresponding to the object historical mask template.
  • Step S1403 Construct a first-type loss function based on the predicted repaired image and the actual repaired image, construct a second-type loss function based on the image prediction mask template and the image intermediate mask template, and construct a third-type loss function based on the object prediction mask template and the object actual mask template, where the image intermediate mask template is determined based on the predicted repaired image and the actual repaired image.
  • the first type of loss function is determined as follows:
  • based on the difference between the predicted repaired image and the actual repaired image, the first sub-loss function is determined; that is, the first sub-loss function is constructed using the L 1 loss and is recorded as the L 1 distance between the predicted repaired image and the actual repaired image y t ;
  • based on the second comparison result of the predicted repaired image and the actual repaired image, the second sub-loss function is determined, where the second comparison result is used to characterize whether the predicted repaired image and the actual repaired image are consistent; that is, the second sub-loss function is constructed using the adversarial loss L gen ;
  • based on the first sub-loss function and the second sub-loss function, the first type of loss function is determined.
  • the second type of loss function is determined as follows:
  • the third sub-loss function is determined, and the third sub-loss function is used as the second type of loss function; where, the image prediction mask template is Predict the number of pixels in the predicted blurry area in the repaired image Obtained when greater than the set threshold.
  • In the corresponding formula: c denotes the RGB channels (3 channels); H*W represents a matrix of size H*W; d̂_t is the predicted value of d_t; d_t is the actual difference between the predicted repaired image and the actual repaired image, i.e., the number of pixels in the actual blurred area of the predicted repaired image compared with the actual repaired image; ŷ_t represents the predicted repaired image; and y_t is the actual repaired image.
  • the third type of loss function is determined as follows:
  • Based on the object difference pixel value between the object prediction mask template and the actual mask template of the historical object, the fourth sub-loss function is determined; that is, the fourth sub-loss function is constructed using the L1 loss between the object prediction mask template and the actual mask template of the historical object.
  • Based on the similarity between the object prediction mask template and the actual mask template of the historical object, the fifth sub-loss function is determined; that is, the fifth sub-loss function is constructed using the dice loss L_dice between the object prediction mask template and the actual mask template of the historical object.
  • Based on the fourth sub-loss function and the fifth sub-loss function, the third type of loss function is determined.
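A minimal sketch of the dice loss named above, assuming the object prediction mask and the actual mask are binary lists of equal length; the smoothing constant is an assumption added to avoid division by zero and is not from the patent text.

```python
def dice_loss(pred_mask, true_mask, smooth=1e-6):
    """Dice loss = 1 - 2*|A∩B| / (|A| + |B|); 0 for identical masks."""
    intersection = sum(p * t for p, t in zip(pred_mask, true_mask))
    total = sum(pred_mask) + sum(true_mask)
    return 1.0 - (2.0 * intersection + smooth) / (total + smooth)
```

Identical masks give a loss near 0; disjoint masks give a loss near 1, so minimizing it pushes the predicted object mask toward the ground-truth contour.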
  • Step S1404: Construct a target loss function based on the first type of loss function, the second type of loss function, and the third type of loss function.
  • the target loss function is obtained by combining the first type of loss function, the second type of loss function, and the third type of loss function.
  • Step S1405: Adjust the parameters of the information propagation model to be trained based on the target loss function.
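The combination in Step S1404 can be sketched as a weighted sum. The patent text does not reproduce the exact formula here, so the weighted-sum form and the default weight values are assumptions for illustration only.

```python
def target_loss(loss_type1, loss_type2, loss_type3,
                w1=1.0, w2=1.0, w3=1.0):
    """Target loss sketched as a weighted sum of the three loss types;
    the weights w1/w2/w3 are hypothetical balancing coefficients."""
    return w1 * loss_type1 + w2 * loss_type2 + w3 * loss_type3
```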
  • the image restoration model uses an image generation tool for blurred areas, such as a latent diffusion model (Latent Diffusion Model, LDM) or a large mask inpainting model (Large Mask Inpainting, LaMa).
  • the original image, the image mask template corresponding to the original image, the guide text, and the target image are input into the LDM model to be trained, and the foreground part and the background part are repeatedly blended in the LDM model based on the guide text.
  • the target image is an image that meets the repair standard after image repair is performed on the original image.
  • the original image, the image mask template corresponding to the original image, and the target image are input into the LaMa model to be trained; in the LaMa model, the original image and the image mask of the original image are superimposed to obtain a 4-channel image. After the 4-channel image is downsampled, it undergoes fast Fourier convolution processing, and after the fast Fourier processing, an upsampling operation is performed.
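The 4-channel input described above can be sketched as stacking the 3 RGB channels with the 1-channel mask. This is an illustrative sketch only: channel-first nested lists stand in for tensors, and the exact layout used by LaMa is an assumption here.

```python
def make_4channel(rgb_image, mask):
    """rgb_image: [3][H][W] nested lists; mask: [H][W] with 1 = masked area.
    Returns a [4][H][W] input where the mask is appended as channel 4."""
    assert len(rgb_image) == 3
    return rgb_image + [mask]
```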
  • the receptive field is the size of the region on the original image to which a point on the feature map output by each layer of the convolutional neural network is mapped.
  • the object repair model uses a model with a transformer as its network structure, such as STTN.
  • the original image and the original image containing the mask area are input into the object repair model to be trained; in the object repair model, the mask areas in all input images are filled simultaneously through self-attention to obtain the predicted image; a loss function is then constructed based on the predicted image and the original image, and the parameters of the object repair model to be trained are adjusted based on the loss function, where the loss function in the training process uses the L1 loss and the adversarial loss L_gen.
  • the above provides a training method for the information propagation model, the image restoration model, and the object restoration model, ensuring the accuracy of the output results of these three models. Further, when these models are used in the image processing procedure of the embodiments of this application, the accuracy of image processing is guaranteed, which improves the image quality of the processed video frame images.
  • based on the same technical concept, the embodiments of the present application also provide an image processing device. Since the principle by which the device solves the problem is similar to that of the method of the above embodiments, the implementation of the device can refer to the implementation of the above method, and repeated details are not repeated here.
  • FIG. 15 illustrates an image processing device 1500 provided by an embodiment of the present application.
  • the image processing device 1500 includes:
  • the first processing unit 1501 is configured to perform mask processing on the first type of object contained in the acquired target video frame image, and obtain the image to be processed after mask processing, the first type of object being the image element to be repaired; the second processing unit 1502 is configured to perform repair processing on the first type of object in the image to be processed, obtain a first repaired image, and generate a corresponding initial image mask template based on the initial blurred area in the first repaired image; the third processing unit 1503 is configured to, when the first number of initial blurred pixels contained in the initial image mask template reaches the first threshold, perform morphological processing on the initial blurred area corresponding to the initial blurred pixels to obtain the image target mask template; the fourth processing unit 1504 is configured to, when the second number of intermediate blurred pixels contained in the image target mask template reaches the second threshold, repair, in the first repaired image, the pixel area corresponding to the intermediate blurred pixels to obtain a second repaired image; the determination unit 1505 is configured to determine, based on the second repaired image, the target repaired image corresponding to the image to be processed.
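The decision flow of units 1502-1504 above can be sketched as follows: count the blurred pixels in the initial mask, and if the count reaches the first threshold, apply a morphological dilation to connect nearby blurred regions and obtain the target mask. This is a hypothetical sketch: the threshold values, the 4-neighbourhood, and the single dilation step are assumptions, not the patent's exact morphological processing.

```python
def count_blur_pixels(mask):
    """Number of pixels marked 1 in a [H][W] binary mask."""
    return sum(sum(row) for row in mask)

def dilate(mask):
    """One step of binary dilation over the 4-neighbourhood, standing in for
    the 'morphological processing' that connects nearby blurred regions."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if mask[i][j] or any(
                0 <= i + di < h and 0 <= j + dj < w and mask[i + di][j + dj]
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
            ):
                out[i][j] = 1
    return out

def target_mask(initial_mask, first_threshold):
    """Return the dilated target mask, or None when the blurred area is too
    small to warrant further repair (mirroring the threshold check above)."""
    if count_blur_pixels(initial_mask) < first_threshold:
        return None
    return dilate(initial_mask)
```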
  • the second processing unit 1502 is specifically configured to: input the video sequence containing the image to be processed into a trained information propagation model; in the trained information propagation model, repair the first type of object in the image to be processed based on the image elements in other video frame images in the video sequence to obtain the first repaired image, and generate a corresponding initial image mask template based on the initial blurred area in the first repaired image.
  • the second processing unit 1502 is specifically configured to: input an initial object mask template into the trained information propagation model, where the initial object mask template is determined after identifying the second type of object contained in the video frame image, and the second type of object is an image element that needs to be retained; in the trained information propagation model, the object outline of the second type of object in the initial object mask template is completed to obtain the object target mask template.
  • the determination unit 1505 is specifically configured to: compare the initial object mask template with the object target mask template to obtain a first comparison result, where the first comparison result is used to characterize whether the contours of the second type of object are consistent; and process the second repaired image based on the first comparison result to obtain the target repaired image.
  • the determination unit 1505 is specifically configured to: if the first comparison result indicates that the contours of the second type of object are inconsistent, repair, in the second repaired image, the pixel area corresponding to the second type of object to obtain a third repaired image, and use the third repaired image as the target repaired image; if the first comparison result indicates that the contours of the second type of object are consistent, use the second repaired image as the target repaired image.
  • the trained information propagation model is obtained by training in the following manner: based on the training samples in the training sample data set, loop iterative training is performed on the information propagation model to be trained to obtain the trained information propagation model, where the following operations are performed in each loop iteration: training samples are selected from the training sample data set, where the training samples are: a historical image in which the image elements to be repaired have been masked, and the object history mask template corresponding to the image elements in the historical image that need to be retained; the training samples are input into the information propagation model to predict the predicted repaired image corresponding to the historical image, and, based on the predicted blurred area in the predicted repaired image, the image prediction mask template and the object prediction mask template corresponding to the object history mask template are generated; and the target loss function constructed based on the predicted repaired image, the image prediction mask template, and the object prediction mask template is used to adjust the parameters of the information propagation model.
  • the training samples also include: the actual repaired image corresponding to the historical image, and the actual object mask template corresponding to the object history mask template; the target loss function is then constructed in the following way: the first type of loss function is constructed based on the predicted repaired image and the actual repaired image, the second type of loss function is constructed based on the image prediction mask template and the image intermediate mask template, and the third type of loss function is constructed based on the object prediction mask template and the actual object mask template, where the image intermediate mask template is determined based on the predicted repaired image and the actual repaired image; the target loss function is constructed based on the first type of loss function, the second type of loss function, and the third type of loss function.
  • the first type of loss function is determined in the following way: the first sub-loss function is determined based on the image difference pixel value between the predicted repaired image and the actual repaired image; the second sub-loss function is determined based on the second comparison result between the predicted repaired image and the actual repaired image, where the second comparison result is used to characterize whether the predicted repaired image is consistent with the actual repaired image; and the first type of loss function is determined based on the first sub-loss function and the second sub-loss function.
  • the second type of loss function is determined as follows: the third sub-loss function is determined based on the mask difference pixel value between the image prediction mask template and the image intermediate mask template, and the third sub-loss function is used as the second type of loss function.
  • the third type of loss function is determined in the following way: the fourth sub-loss function is determined based on the object difference pixel value between the object prediction mask template and the actual mask template of the historical object; the fifth sub-loss function is determined based on the similarity between the object prediction mask template and the actual mask template of the historical object; and the third type of loss function is determined based on the fourth sub-loss function and the fifth sub-loss function.
  • after generating the corresponding initial image mask template, the second processing unit 1502 is also configured to: when the first number of initial blurred pixels contained in the initial image mask template is less than the first threshold, use the first repaired image as the target repaired image corresponding to the image to be processed.
  • the third processing unit 1503 is further configured to: when the second number of intermediate blurred pixels contained in the image target mask template is less than the second threshold, use the first repaired image as the target repaired image corresponding to the image to be processed.
  • the above division into units or modules is based on function; in implementation, the functions of the units can be realized in the same piece or multiple pieces of software or hardware.
  • the embodiments of the present application also provide an electronic device, and the electronic device may be a server.
  • the structure of the electronic device may be as shown in Figure 16 , including a memory 1601, a communication module 1603, and one or more processors 1602.
  • Memory 1601 is used to store computer programs executed by the processor 1602.
  • the memory 1601 may mainly include a storage program area and a storage data area.
  • the storage program area may store the operating system and programs required to run instant messaging functions.
  • the storage data area may store various instant messaging information and operating instruction sets.
  • the memory 1601 can be a volatile memory, such as a random-access memory (RAM); the memory 1601 can also be a non-volatile memory, such as a read-only memory, a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); or the memory 1601 can be any other medium that can carry or store a desired computer program in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.
  • the memory 1601 may be a combination of the above memories.
  • the processor 1602 may include one or more central processing units (CPUs) or a digital processing unit or the like.
  • the processor 1602 is used to implement the above image processing method when calling the computer program stored in the memory 1601.
  • the communication module 1603 is configured to communicate with terminal devices and other servers.
  • the embodiment of the present application does not limit the specific connection medium between the above-mentioned memory 1601, communication module 1603 and processor 1602.
  • the memory 1601 and the processor 1602 are connected through a bus 1604.
  • for ease of illustration, the bus 1604 is depicted as a thick line in FIG. 16, and the connection methods between other components are only schematically illustrated and are not limited thereto. The bus 1604 can be divided into an address bus, a data bus, a control bus, etc. For ease of description, only one thick line is used in FIG. 16, but this does not mean that there is only one bus or one type of bus.
  • the memory 1601 stores a computer storage medium, and computer-executable instructions are stored in the computer storage medium. The computer-executable instructions are used to implement the image processing method of the embodiments of the present application, and the processor 1602 is used to execute the above image processing method.
  • the electronic device may also be other electronic devices, such as the terminal device 310 shown in FIG. 3 .
  • the structure of the electronic device can be as shown in Figure 17, including: communication component 1710, memory 1720, display unit 1730, camera 1740, sensor 1750, audio circuit 1760, Bluetooth module 1770, processor 1780 and other components.
  • Communication component 1710 is configured to communicate with the server.
  • a wireless fidelity (Wireless Fidelity, WiFi) module circuit may be included. WiFi is a short-distance wireless transmission technology, and through the WiFi module, electronic devices can help users send and receive information.
  • Memory 1720 may be used to store software programs and data.
  • the processor 1780 executes software programs or data stored in the memory 1720 to perform various functions and data processing of the terminal device 310 .
  • Memory 1720 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
  • the memory 1720 stores an operating system that enables the terminal device 310 to run. In this application, the memory 1720 can store the operating system and various application programs, and can also store codes for executing the image processing method in the embodiment of this application.
  • the display unit 1730 may also be used to display information input by the user or information provided to the user and a graphical user interface (GUI) of various menus of the terminal device 310 .
  • the display unit 1730 may include a display screen 1732 provided on the front of the terminal device 310.
  • the display screen 1732 can be configured in the form of a liquid crystal display, a light-emitting diode, etc.
  • the display unit 1730 may be used to display the target repair image, etc. in the embodiment of the present application.
  • the display unit 1730 can also be used to receive input numeric or character information and generate signal input related to user settings and function control of the terminal device 310.
  • the display unit 1730 can include a touch screen 1731 disposed on the front of the terminal device 310, which can collect the user's touch operations on or near it, such as clicking a button or dragging a scroll box.
  • the touch screen 1731 can cover the display screen 1732, or the touch screen 1731 and the display screen 1732 can be integrated to realize the input and output functions of the terminal device 310. After integration, it can be referred to as a touch display screen.
  • the display unit 1730 can display application programs and corresponding operation steps.
  • Camera 1740 can be used to capture still images. There may be one camera 1740 or multiple cameras.
  • the object passes through the lens to produce an optical image that is projected onto the photosensitive element.
  • the photosensitive element can be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then passes the electrical signal to the processor 1780 to convert it into a digital image signal.
  • the terminal device may also include at least one sensor 1750, such as an acceleration sensor 1751, a distance sensor 1752, a fingerprint sensor 1753, and a temperature sensor 1754.
  • the terminal device can also be equipped with other sensors such as gyroscope, barometer, hygrometer, thermometer, infrared sensor, light sensor, motion sensor, etc.
  • the audio circuit 1760, the speaker 1761, and the microphone 1762 can provide an audio interface between the user and the terminal device 310.
  • the audio circuit 1760 can transmit the electrical signal converted from the received audio data to the speaker 1761, and the speaker 1761 converts it into a sound signal and outputs it.
  • the terminal device 310 may also be configured with a volume button for adjusting the volume of the sound signal.
  • the microphone 1762 converts the collected sound signal into an electrical signal, which is received by the audio circuit 1760 and converted into audio data; the audio data is then output to the communication component 1710 to be sent to, for example, another terminal device 310, or output to the memory 1720 for further processing.
  • the Bluetooth module 1770 is used to interact with other Bluetooth devices having Bluetooth modules through the Bluetooth protocol.
  • the terminal device can establish a Bluetooth connection with a wearable electronic device (such as a smart watch) that also has a Bluetooth module through the Bluetooth module 1770 to perform data exchange.
  • the processor 1780 is the control center of the terminal device; it uses various interfaces and lines to connect the various parts of the entire terminal, and performs the various functions of the terminal device and processes data by running or executing software programs stored in the memory 1720 and calling data stored in the memory 1720.
  • the processor 1780 may include one or more processing units; the processor 1780 may also integrate an application processor and a baseband processor, where the application processor mainly handles the operating system, user interface, application programs, etc., and the baseband processor mainly handles wireless communications. It can be understood that the above-mentioned baseband processor may not be integrated into the processor 1780.
  • the processor 1780 in this application can run an operating system, application programs, user interface display and touch response, as well as the image processing method in the embodiment of this application.
  • the processor 1780 is coupled with the display unit 1730.
  • various aspects of the image processing method provided by this application can also be implemented in the form of a program product, which includes a computer program. When the program product is run on an electronic device, the computer program is used to make the electronic device perform the steps in the image processing method according to the various exemplary embodiments of the present application described above in this specification.
  • the Program Product may take the form of one or more readable media in any combination.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device or device, or any combination thereof. More specific examples (non-exhaustive list) of readable storage media include: electrical connection with one or more conductors, portable disk, hard disk, random access memory (RAM), read only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
  • the program product of the embodiments of the present application may take the form of a portable compact disk read-only memory (CD-ROM), include a computer program, and be run on a computing device. However, the program product of the present application is not limited thereto.
  • a readable storage medium may be any tangible medium containing or storing a program that may be used by or in combination with a command execution system, apparatus or device.
  • the readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying a readable computer program therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • a readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transport a program for use by or in connection with a command execution system, apparatus, or device.
  • a computer program embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical cable, RF, etc., or any suitable combination of the foregoing.
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing a computer-usable computer program.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

This application provides an image processing method, apparatus, device and storage medium. Mask processing is performed on a first type of object contained in an acquired target video frame image to obtain an image to be processed, the first type of object being an image element to be repaired; repair processing is performed on the first type of object in the image to be processed to obtain a first repaired image, and a corresponding initial image mask template is generated based on an initial blurred area in the first repaired image; when a first number of initial blurred pixels contained in the initial image mask template reaches a first threshold, morphological processing is performed on the blurred area corresponding to the initial blurred pixels to obtain an image target mask template; when a second number of intermediate blurred pixels contained in the image target mask template reaches a second threshold, a pixel area corresponding to the intermediate blurred pixels is repaired in the first repaired image to obtain a second repaired image; and based on the second repaired image, a target repaired image corresponding to the image to be processed is determined.

Description

Image processing method, apparatus, device, storage medium and program product
Cross-reference to related applications

This application is filed on the basis of the Chinese patent application with application number 202211029204.9 and a filing date of August 26, 2022, claims priority to said Chinese patent application, and incorporates the entire content of said Chinese patent application herein by reference.
Technical field

This application relates to the field of image processing technology, and in particular to an image processing method, apparatus, device, storage medium and program product.
Background

With the development of technology, more and more applications support video playback, and the played video has been processed. To ensure the accuracy of video processing, video completion technology has been proposed, in which video completion processes the video frame images in a video.

At present, video completion technology includes optical-flow-based methods and neural-network-model-based methods.

When processing video frame images based on optical flow, adjacent video frame images are used for optical flow estimation, and the optical flow is applied to propagate pixel gradients from unmasked areas to masked areas, so as to fill the masked areas with optical flow and complete the filling of the video frame image. However, the optical-flow-based method is only applicable when the background moves simply, and is not applicable when objects are occluded or the background undergoes complex motion.

When processing video frame images based on a neural network model, the neural network model is a single model, which can achieve a good reference pixel propagation effect and fill video frame images even when the background undergoes complex motion. However, the generation capability of a single model is limited; in cases of complex textures and object occlusion, the filled content is blurry, and the image quality of the video frame image cannot be guaranteed.

Therefore, how to ensure the accuracy of image processing and further improve the image quality of processed video frame images in cases of object occlusion and complex textures is a technical problem that currently needs to be solved.
Summary

This application provides an image processing method, apparatus, device, storage medium and program product, to ensure the accuracy of image processing and improve the image quality of processed video frame images.

In a first aspect, an embodiment of this application provides an image processing method, the method including:

performing mask processing on a first type of object contained in an acquired target video frame image to obtain an image to be processed after mask processing, the first type of object being an image element to be repaired;

performing repair processing on the first type of object in the image to be processed to obtain a first repaired image, and generating a corresponding initial image mask template based on an initial blurred area in the first repaired image;

when a first number of initial blurred pixels contained in the initial image mask template reaches a first threshold, performing morphological processing on the initial blurred area corresponding to the initial blurred pixels to obtain an image target mask template;

when a second number of intermediate blurred pixels contained in the image target mask template reaches a second threshold, repairing, in the first repaired image, a pixel area corresponding to the intermediate blurred pixels to obtain a second repaired image;

determining, based on the second repaired image, a target repaired image corresponding to the image to be processed.
In a second aspect, an embodiment of this application provides an image processing apparatus, the apparatus including:

a first processing unit, configured to perform mask processing on a first type of object contained in an acquired target video frame image to obtain an image to be processed after mask processing, the first type of object being an image element to be repaired;

a second processing unit, configured to perform repair processing on the first type of object in the image to be processed to obtain a first repaired image, and generate a corresponding initial image mask template based on an initial blurred area in the first repaired image;

a third processing unit, configured to, when a first number of initial blurred pixels contained in the initial image mask template reaches a first threshold, perform morphological processing on the initial blurred area corresponding to the initial blurred pixels to obtain an image target mask template;

a fourth processing unit, configured to, when a second number of intermediate blurred pixels contained in the image target mask template reaches a second threshold, repair, in the first repaired image, a pixel area corresponding to the intermediate blurred pixels to obtain a second repaired image;

a determination unit, configured to determine, based on the second repaired image, a target repaired image corresponding to the image to be processed.
In a third aspect, an embodiment of this application provides an electronic device, including a memory and a processor, where the memory is used to store computer instructions, and the processor is used to execute the computer instructions to implement the steps of the image processing method provided by the embodiments of this application.

In a fourth aspect, an embodiment of this application provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the image processing method provided by the embodiments of this application.

In a fifth aspect, an embodiment of this application provides a computer program product, which includes computer instructions stored in a computer-readable storage medium; when a processor of an electronic device reads the computer instructions from the computer-readable storage medium and executes them, the electronic device is caused to perform the steps of the image processing method provided by the embodiments of this application.

The beneficial effects of the embodiments of this application are as follows:

In the embodiments of this application, image repair is decomposed into three stages. In the first stage, the obtained first repaired image is further inspected, and a corresponding initial image mask template is generated. In the second stage, when it is determined that the first number of initial blurred pixels contained in the initial image mask template reaches the first threshold, morphological processing is performed on the blurred areas corresponding to the initial blurred pixels to connect different blurred areas and obtain the image target mask template, which avoids unnecessary processing of small blurred areas and improves processing efficiency. In the third stage, when it is determined that the second number of intermediate blurred pixels contained in the image target mask template reaches the second threshold, it is determined that object contours needing completion exist in the first repaired image, so the pixel areas corresponding to the intermediate blurred pixels are repaired to obtain the second repaired image. Finally, based on the second repaired image, the target repaired image corresponding to the image to be processed is determined. The cooperation of the above three stages improves the image quality of the second repaired image and guarantees the image quality of the target repaired image.
Brief description of the drawings

In order to more clearly illustrate the technical solutions in the embodiments of this application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Figure 1 is a schematic diagram of a first type of image processing in the related art;

Figure 2 is a schematic diagram of a second type of image processing in the related art;

Figure 3 is a schematic diagram of an application scenario provided by an embodiment of this application;

Figure 4 is a flowchart of an image processing method provided by an embodiment of this application;

Figure 5 is a schematic diagram of filling processing of the first type of object provided by an embodiment of this application;

Figure 6 is a schematic diagram of a first type of image processing provided by an embodiment of this application;

Figure 7 is a schematic diagram of a second type of image processing provided by an embodiment of this application;

Figure 8 is a schematic diagram of a third type of image processing provided by an embodiment of this application;

Figure 9 is a schematic diagram of morphological processing of an initial blurred area provided by an embodiment of this application;

Figure 10 is a schematic diagram of repair processing of a pixel area corresponding to intermediate blurred pixels in an embodiment of this application;

Figure 11 is a flowchart of another image processing method provided by an embodiment of this application;

Figure 12 is a flowchart of a specific implementation of image processing provided by an embodiment of this application;

Figure 13 is a schematic diagram of a specific implementation of image processing provided by an embodiment of this application;

Figure 14 is a flowchart of a training method for an information propagation model provided by an embodiment of this application;

Figure 15 is a structural diagram of an image processing apparatus provided by an embodiment of this application;

Figure 16 is a structural diagram of an electronic device provided by an embodiment of this application;

Figure 17 is a structural diagram of another electronic device provided by an embodiment of this application.
Detailed description

In order to make the purpose, technical solutions and beneficial effects of this application clearer, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments of this application. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of this application.
To help those skilled in the art better understand the technical solutions of this application, some concepts involved in this application are introduced below.

Video inpainting is a technology that repairs masked areas in a video using information from unmasked areas, i.e., occluded areas are reasonably filled using information from unoccluded areas. Video inpainting requires two capabilities: one is the ability to use temporal information, which propagates the available pixels of one frame to the corresponding areas of other frames; the other is a generation capability: if no pixels are available in other frames, pixels for the corresponding areas need to be generated using spatial and temporal information.

A visual identity system (Visual Identity System, VIS) is used to pre-identify the mask templates corresponding to the objects in an image.

Mask template: a selected image, graphic or object is used to occlude all or part of the image to be processed, so as to control the area or process of image processing. The specific image or object used for covering is called a mask template. In optical image processing, the mask template can be film, an optical filter, etc. In digital image processing, the mask template is a two-dimensional matrix array, and multi-valued images are sometimes also used. In digital image processing, image mask templates are mainly used for: 1. extracting a region of interest: a pre-made mask template of the region of interest is multiplied with the image to be processed to obtain the region-of-interest image, where image values within the region of interest remain unchanged and image values outside the region are all 0; 2. masking: a mask template is used to mask certain areas of the image so that they do not participate in processing or in the calculation of processing parameters, or so that processing or statistics are performed only on the masked areas; 3. structural feature extraction: similarity variables or image matching methods are used to detect and extract structural features in the image that are similar to the mask; 4. production of images of special shapes. In the embodiments of this application, the mask template is mainly used to extract the region of interest. The mask template may be a two-dimensional matrix array whose number of rows is consistent with the height of the image to be processed (i.e., the number of pixel rows) and whose number of columns is consistent with the width of the image to be processed (i.e., the number of pixel columns); that is, each element in the two-dimensional matrix array is used to process the pixel at the corresponding position in the image to be processed. In the mask template, the elements at positions corresponding to the area to be processed (such as a blurred area) take the value 1, and the elements at the other positions take the value 0. After the mask template of the region of interest is multiplied with the image to be processed, if the value at a certain position in the two-dimensional matrix array is 1, the value of the pixel at that position in the image to be processed remains unchanged; if the value at a certain position is 0, the pixel at that position becomes 0, so that the region of interest can be extracted from the image to be processed.
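The region-of-interest extraction described above can be sketched as an element-wise multiplication of the binary mask template with the image, keeping pixels where the mask is 1 and zeroing the rest. Nested lists stand in for the two-dimensional matrix array; this is an illustration, not the patent's implementation.

```python
def extract_roi(image, mask):
    """image, mask: [H][W] nested lists; returns image * mask element-wise,
    so pixels outside the masked region of interest become 0."""
    return [[pixel * m for pixel, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]
```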
Morphological processing: used to extract, from an image, the image components that are meaningful for expressing and describing the shape of a region, so that subsequent recognition can capture the most essential shape features of the target object. Morphological processing includes but is not limited to: dilation and erosion, opening and closing operations, and grayscale image morphology.

The word "exemplary" used below means "serving as an example, embodiment, or illustration." Any embodiment described as "exemplary" is not necessarily to be construed as superior to or better than other embodiments.

The terms "first" and "second" herein are used for descriptive purposes only and shall not be understood as expressing or implying relative importance or implicitly indicating the number of the technical features indicated. Thus, features defined with "first" and "second" may explicitly or implicitly include one or more such features. In the description of the embodiments of this application, unless otherwise stated, "multiple" means two or more.
The design ideas of the embodiments of this application are briefly introduced below:

With the development of technology, more and more applications support video playback, and the played video has been processed. To ensure the accuracy of video processing, video inpainting technology has been proposed, in which video inpainting processes the video frame images in a video.

At present, video inpainting technology includes optical-flow-based methods and neural-network-model-based methods.

The optical-flow-based method includes the following steps: Step 1, use adjacent frames for optical flow estimation; Step 2, fill the masked area with optical flow; Step 3, apply the optical flow to propagate pixel gradients from the unmasked area to the masked area; Step 4, perform Poisson reconstruction on the pixel gradients to generate RGB pixels; Step 5, if an image inpainting module is included, perform image inpainting on areas that cannot be filled by optical flow.

The optical-flow-based video inpainting method works well when the background moves simply: the repaired image has no blurring problem, and with a good optical flow estimation module the repair traces are hard to notice. However, when the background moves in a complex way, or when object occlusion occurs, the repair effect of the optical-flow-based video inpainting method is greatly affected, and the erroneous pixels caused by optical flow estimation errors gradually spread as they propagate, resulting in incorrect repaired content. See Figure 1, which is a schematic diagram of the first type of image processing in the related art.

In the neural-network-model-based method, the network structure is mostly an encoder-decoder structure, which needs to take into account both inter-frame consistency and the naturalness of the generated pixels; it receives frame sequence information as input and directly outputs repaired frames after network processing.

In the related art, algorithms based on neural network models can achieve a good reference pixel propagation effect and good repair results when the background undergoes complex motion. However, the current neural network model is a single model with limited generation capability; in cases of complex textures and object occlusion, there are many cases of blurry repair results, and due to limitations such as GPU memory, it is difficult to handle inputs of excessively high resolution. Therefore, for complex textures and object occlusion, the repaired content is blurry. See Figure 2, which is a schematic diagram of the second type of image processing in the related art.

It can be seen that the image processing methods in the related art are limited by optical flow quality and model generation quality; no matter which of these methods is adopted, a very robust effect cannot currently be achieved. Therefore, how to ensure the accuracy of image processing and improve the image quality of processed video frame images in cases of object occlusion and complex textures is a technical problem that currently needs to be solved.
In view of this, the embodiments of this application provide an image processing method, apparatus, device, storage medium and program product, to ensure the accuracy of image processing and improve the image quality of processed video frames.

In the image processing method provided by the embodiments of this application, neural network models are applied to complete three types of video repair, namely:

1. When complex background motion exists in the video, the video frame images are repaired based on an inter-frame pixel propagation model; in this case, the first type of object may be the foreground area in the video frame;

2. When the texture of the video frame images in the video is complex, the blurred areas in the video frame images are repaired based on an image repair model; in this case, the first type of object may be the blurred area in the video frame, and the detection method of the blurred area is described below;

3. When object occlusion exists in the video frame images, the object area in the video frame images (i.e., the background area occluded by the foreground object) is repaired based on an object repair model.

In the embodiments of this application, when it is determined that the first type of object in a video frame image needs to be repaired with other elements, i.e., when the image elements to be repaired in the video frame image are repaired with other elements: first, mask processing is performed on the first type of object contained in the acquired target video frame image to obtain the image to be processed after mask processing; meanwhile, to ensure that the second type of object that needs to be retained is not affected during processing, the second type of object contained in the video image is also identified, and the corresponding initial object mask template is determined. Then, the image to be processed and the initial object mask template are input into the trained information propagation model, and the first type of object in the image to be processed is repaired through the information propagation model to obtain the first repaired image; at this point, the image elements to be repaired have been repaired. The initial blurred area in the first repaired image is detected (the initial blurred area is the blurred area that still exists in the obtained first repaired image after the image to be processed has been repaired), the corresponding initial image mask template is generated based on the initial blurred area, and the object target mask template of the image to be processed is determined.

To ensure the accuracy of image processing during image repair, after the first repaired image is obtained, in the embodiments of this application, the initial blurred area in the first repaired image is further detected and a corresponding initial image mask template is generated; when it is determined that the first number of initial blurred pixels contained in the initial image mask template reaches the first threshold, morphological processing is performed on the initial blurred area corresponding to the initial blurred pixels to obtain the image target mask template, so that the blurred area becomes more regular. Next, the second number of intermediate blurred pixels contained in the image target mask template is determined; when the second number reaches the second threshold, the pixel area corresponding to the intermediate blurred pixels in the first repaired image is repaired through the image repair model to obtain the second repaired image, i.e., the blurred area in the first repaired image is repaired and thereby enhanced. Finally, when it is determined that the contours of the second type of object in the initial object mask template and the object target mask template are inconsistent, the pixel area corresponding to the second type of object in the second repaired image is repaired through the object repair model to obtain the third repaired image, thereby repairing the occluded object area, i.e., enhancing the blurred area in the second repaired image.

Here, the above-mentioned initial blurred pixels refer to the pixels in the initial image mask template, and the intermediate blurred pixels refer to the pixels in the image target mask template.

In the embodiments of this application, the blurred areas whose repair is blurred due to complex textures and object occlusion are repaired and enhanced, improving the image quality of the target repaired image.
In the embodiments of this application, the information propagation model, the image repair model, and the object repair model involve artificial intelligence (Artificial Intelligence, AI) and machine learning technology, and are designed based on speech technology, natural language processing technology and machine learning (Machine Learning, ML) in artificial intelligence.

Artificial intelligence is a theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence.

Artificial intelligence studies the design principles and implementation methods of various intelligent machines, enabling machines to have the functions of perception, reasoning and decision-making. Artificial intelligence technology mainly includes several major directions such as computer vision technology, natural language processing technology, and machine learning/deep learning. With the research and progress of artificial intelligence technology, artificial intelligence has been researched and applied in many fields, such as smart homes, intelligent customer service, virtual assistants, smart speakers, intelligent marketing, unmanned driving, autonomous driving, robots, and intelligent medical care. It is believed that with the development of technology, artificial intelligence will be applied in more fields and play an increasingly important role.

Machine learning is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and many other disciplines. It specializes in studying how computers simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance. Compared with data mining, which finds mutual characteristics among big data, machine learning pays more attention to algorithm design, enabling computers to automatically "learn" rules from data and use the rules to predict unknown data.

Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; its applications cover all fields of artificial intelligence. Machine learning and deep learning usually include technologies such as artificial neural networks, belief networks, reinforcement learning, transfer learning and inductive learning. Reinforcement learning (Reinforcement Learning, RL), also known as evaluative learning or enhanced learning, is one of the paradigms and methodologies of machine learning, used to describe and solve the problem of an agent maximizing rewards or achieving specific goals by learning strategies during its interaction with the environment.
以下结合说明书附图对本申请的优选实施例进行说明,应当理解,此处所描述的优选实施例仅用于说明和解释本申请,并不用于限定本申请,并且在不冲突的情况下,本申请中的实施例及实施例中的特征可以相互组合。
参见图3,图3为本申请实施例的应用场景示意图。该应用场景中包括终端设备310和服务器320,终端设备310与服务器320之间可以通过通信网络进行通信。
在一种可选的实施方式中,通信网络可以是有线网络或无线网络。因此,终端设备310和服务器320可以通过有线或无线通信方式进行直接或间接地连接。比如,终端设备310可以通过无线接入点与服务器320间接地连接,或者终端设备310通过因特网与服务器320直接地连接,本申请在此不做限制。
在本申请实施例中,终端设备310包括但不限于手机、平板电脑、笔记本电脑、台式电脑、电子书阅读器、智能语音交互设备、智能家电、车载终端等设备;终端设备上可以安装有各种客户端,该客户端可以是支持视频编辑、视频播放等功能的应用程序(例如浏览器、游戏软件等),也可以是网页、小程序等;
服务器320是与终端设备310中安装的客户端相对应的后台服务器。服务器320可以是独立的物理服务器,也可以是多个物理服务器构成的服务器集群或者分布式系统,还可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、内容分发网络(Content Delivery Network,CDN)、以及大数据和人工智能平台等基础云计算服务的云服务器。
需要说明的是,本申请实施例中的图像处理方法可以由电子设备执行,该电子设备可以为服务器320或者终端设备310,即,该方法可以由服务器320或者终端设备310单独执行,也可以由服务器320和终端设备310共同执行。
在终端设备310单独执行时,比如,可由终端设备310获取掩膜处理后的待处理图像,对待处理图像进行修复处理,获得第一修复图像,确定第一修复图像对应的图像初始掩膜模板,在图像初始掩膜模板包含的初始模糊像素的第一数量达到第一阈值时,对图像初始掩膜模板进行处理,获得图像目标掩膜模板,在图像目标掩膜模板包含的中间模糊像素的第二数量达到第二阈值时,对第一修复图像中的模糊位置继续进行修复处理,获得第二修复图像,最后基于第二修复图像,确定待处理图像对应的目标修复图像。
在服务器320单独执行时,比如,可由终端设备310获取视频帧图像,然后将视频帧图像发送给服务器320,服务器320对获取的视频帧图像包含的第一类对象进行掩膜处理,获得掩膜处理后的待处理图像,对待处理图像进行修复处理,获得第一修复图像,确定第一修复图像对应的图像初始掩膜模板,在图像初始掩膜模板包含的初始模糊像素的第一数量达到第一阈值时,对图像初始掩膜模板进行处理,获得图像目标掩膜模板,在图像目标掩膜模板包含的中间模糊像素的第二数量达到第二阈值时,对第一修复图像中的模糊位置继续进行修复处理,获得第二修复图像,最后基于第二修复图像,确定待处理图像对应的目标修复图像。
在服务器320和终端设备310共同执行时,比如,可由终端设备310获得待处理图像,并对待处理图像进行修复处理,获得第一修复图像,然后将第一修复图像发送给服务器320,由服务器320确定第一修复图像对应的图像初始掩膜模板,在图像初始掩膜模板包含的初始模糊像素的第一数量达到第一阈值时,对图像初始掩膜模板进行处理,获得图像目标掩膜模板,在图像目标掩膜模板包含的中间模糊像素的第二数量达到第二阈值时,对第一修复图像中的模糊位置继续进行修复处理,获得第二修复图像,最后基于第二修复图像,确定待处理图像对应的目标修复图像。
需要说明的是,在下文中,主要是以服务器单独执行为例进行举例说明的,在此不做具体限定。
在具体实施中,可以在终端设备310中输入视频帧图像,终端设备310将视频帧图像发送至服务器320,服务器320可以采用本申请实施例的图像处理方法,确定待处理图像对应的目标修复图像。
需要说明的是,图3所示只是举例说明,实际上终端设备310和服务器320的数量不受限制,在本申请实施例中不做具体限定。
本申请实施例中,当服务器320的数量为多个时,多个服务器320可组成为一区块链,而服务器320为区块链上的节点;如本申请实施例所公开的图像处理方法,其中所涉及的修复处理的处理方式、形态学处理的处理方式等可保存于区块链上。
下面结合上述描述的应用场景,根据附图来描述本申请示例性实施方式提供的图像处理方法,需要注意的是,上述应用场景仅是为了便于理解本申请的精神和原理而示出,本申请的实施方式在此方面不受任何限制。
参见图4,图4为本申请实施例提供的一种图像处理方法流程图,包括如下步骤:
步骤S400,对获取的目标视频帧图像包含的第一类对象进行掩膜处理,获得掩膜处理后的待处理图像;第一类对象为待修复的图像元素。
在对视频修复处理时,首先获取需要进行视频修复的视频序列x={xt}(t=0,1,2,…,T),以及相应的掩膜模板序列m={mt}(t=0,1,2,…,T),其中,xt表示需要进行视频修复的视频帧图像,即处理前的视频帧图像,mt表示与视频帧图像对应的掩膜模板;掩膜模板用于指示待修复的图像元素,即通过掩膜模板可确定第一类对象对应的掩膜区域。
然后,基于掩膜模板中的掩膜区域,对相应视频帧图像进行掩膜处理,获得掩膜处理后的待处理图像xmt;掩膜处理为xmt=xt·(1-mt),其中,掩膜模板mt一般为二值矩阵,“·”是逐元素相乘。因此,待处理图像中包含基于掩膜区域确定的需要进行视频修复的修复区域;需要说明的是,掩膜区域即修复区域。
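上述逐元素相乘的掩膜处理 xmt=xt·(1-mt),可用如下基于 numpy 的示意片段说明;其中函数名 apply_mask 与示例数据均为说明所设,并非本申请的实际实现:

```python
import numpy as np

def apply_mask(frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """按文中公式 x_mt = x_t · (1 - m_t) 逐元素相乘,
    将掩膜区域(mask 取值为 1 的位置)的像素置零,得到待处理图像。"""
    if mask.ndim == 2 and frame.ndim == 3:
        mask = mask[..., None]          # 二维掩膜扩展出通道维,广播到各颜色通道
    return frame * (1 - mask)

# 一个 2x2 的"视频帧",掩膜仅覆盖左上角像素
frame = np.array([[[10, 10, 10], [20, 20, 20]],
                  [[30, 30, 30], [40, 40, 40]]])
mask = np.array([[1, 0],
                 [0, 0]])
masked = apply_mask(frame, mask)
```

掩膜区域(左上角)的像素被置零,其余区域保持不变,即得到包含修复区域的待处理图像。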
图像处理主要包括对待处理图像的修复区域进行修复处理,即对视频帧图像xt中的掩膜区域进行修复处理,获得处理后的视频序列y={yt}(t=0,1,2,…,T),其中,yt表示修复处理后的视频帧图像。
为了保证修复处理后的视频帧图像yt相比于修复处理前的视频帧图像xt,仅在掩膜区域的图像内容不同,其他区域的图像内容在时间和空间上是自然且一致的,本申请实施例中,首先,对待处理图像中的修复区域进行修复处理,获得第一修复图像;然后,对第一修复图像进行检测,以判断第一修复图像中除修复区域外,其他区域的图像内容是否均与修复处理前的视频帧图像或修复处理前的待处理图像的图像内容相同,并判断是否需要对第一修复图像进一步修复,以获得除修复区域外,其他区域的图像内容与修复处理前的视频帧图像或修复处理前的待处理图像一致的目标修复图像。
步骤S401,对待处理图像中第一类对象进行修复处理,获得第一修复图像,并基于第一修复图像中的初始模糊区域,生成相应的图像初始掩膜模板。
其中,基于第一修复图像中的初始模糊区域,生成相应的图像初始掩膜模板包括:生成包含初始模糊区域的图像初始掩膜模板,即图像初始掩膜模板为初始模糊区域的掩膜模板,图像初始掩膜模板可以是一个二维矩阵数组,二维矩阵数组的行数与第一修复图像的高度(即第一修复图像的像素的行数)一致,列数与第一修复图像的宽度(即第一修复图像的像素的列数)一致,二维矩阵数组中每个元素用于处理第一修复图像中相应位置的像素。图像初始掩膜模板中与第一修复图像的初始模糊区域对应位置的元素的取值为1,其余位置的取值为0,图像初始掩膜模板与第一修复图像相乘后,如果二维矩阵数组中某个位置的取值为1,则第一修复图像中该位置的像素的数值不变;如果二维矩阵数组中某个位置的取值为0,则第一修复图像中该位置的像素的数值变为0,从而图像初始掩膜模板可以用于从第一修复图像中提取初始模糊区域。
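上述以二维矩阵数组表示掩膜、并通过逐元素相乘提取初始模糊区域的过程,可示意如下(numpy 示意片段,函数名 extract_blur_region 为说明所设):

```python
import numpy as np

def extract_blur_region(repaired: np.ndarray, blur_mask: np.ndarray) -> np.ndarray:
    """图像初始掩膜模板与第一修复图像逐元素相乘:
    取值为 1 的位置保留原像素值,取值为 0 的位置像素变为 0,
    从而从第一修复图像中提取初始模糊区域。"""
    return repaired * blur_mask

repaired = np.array([[5, 6],
                     [7, 8]])
blur_mask = np.array([[0, 1],
                      [0, 0]])   # 仅右上角为初始模糊区域
region = extract_blur_region(repaired, blur_mask)
```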
在一种可能的实现方式中,首先,将包含待处理图像的视频序列xm={xmt}(t=0,1,2,…,T),输入已训练的信息传播模型FT;接着,通过已训练的信息传播模型FT,对待处理图像中第一类对象进行修复处理,获得第一修复图像,并基于第一修复图像中的初始模糊区域,生成相应的图像初始掩膜模板mblur;最后,通过已训练的信息传播模型FT,输出第一修复图像和图像初始掩膜模板mblur,其中图像初始掩膜模板mblur指示了第一修复图像中修复效果不好的区域,即第一修复图像中模糊的区域。
通过已训练的信息传播模型FT,对待处理图像中第一类对象进行填充处理时:首先,将包含待处理图像的视频序列,输入已训练的信息传播模型FT;然后,在已训练的信息传播模型FT中,参照时域信息和空域信息,基于视频序列中包含的其他视频帧图像中的像素,对待处理图像中第一类对象进行修复处理;具体的,在包含待处理图像的相邻两帧或多帧视频帧图像中,采用其他视频帧图像中的第一像素,对待处理图像中的第二像素进行填充,其中,其他视频帧中的第一像素与待处理图像中的第二像素在视频帧图像中的位置相同。参见图5,图5为本申请实施例中一种对第一类对象进行填充处理的示意图。
基于第一修复图像中的初始模糊区域,生成相应的图像初始掩膜模板mblur,可以通过以下方式实现:
首先,按照第一修复图像的尺寸,将第一修复图像划分为多个像素块;例如第一修复图像的尺寸为7cm*7cm,那么每个像素块的尺寸可以为0.7cm*0.7cm,需要说明的是,将第一修复图像划分为多个像素块的方式仅是举例说明,并不是唯一方式;
接着,确定每个像素块的分辨率,基于第一修复图像中每个像素块的分辨率,确定分辨率低于分辨率阈值的像素块,并将该像素块作为初始模糊区域;具体的,由于分辨率越高,图像越清晰,图像质量越好,因此,本申请实施例中可以基于图像质量设置分辨率阈值,当一个像素块的分辨率低于该分辨率阈值时,将该像素块作为初始模糊区域;
最后,基于初始模糊区域,对初始模糊区域进行掩膜处理,获得相应的图像初始掩膜模板mblur
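上述"划分像素块、逐块判断清晰度、整块标记为初始模糊区域"的流程可用如下示意片段说明;文中以"分辨率"作为判据,此处示意性地以灰度方差作为清晰度指标的代用假设,函数名与阈值均为说明所设:

```python
import numpy as np

def initial_blur_mask(img: np.ndarray, block: int, clarity_thresh: float) -> np.ndarray:
    """将第一修复图像按固定大小划分为像素块,对每个像素块计算一个清晰度指标
    (这里用灰度方差代替文中的分辨率判据),低于阈值的像素块整块标记为
    初始模糊区域(掩膜取 1),其余位置取 0,即得到图像初始掩膜模板。"""
    h, w = img.shape
    mask = np.zeros((h, w), dtype=np.uint8)
    for y in range(0, h, block):
        for x in range(0, w, block):
            patch = img[y:y + block, x:x + block]
            if patch.var() < clarity_thresh:
                mask[y:y + block, x:x + block] = 1
    return mask

img = np.zeros((4, 4))
img[:2, :2] = [[0, 9], [9, 0]]     # 左上块纹理丰富(方差大),其余块平坦(方差为 0)
m_blur = initial_blur_mask(img, block=2, clarity_thresh=1.0)
```

文中"第一数量"即掩膜中取值为 1 的像素总数,可由 `m_blur.sum()` 直接得到。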
在本申请实施例中,针对第一类对象的处理包括但不限于:logo移除、字幕移除、物体移除等;其中,物体可以是运动的人或物,也可以为静止的人或物。
比如,基于某平台网站的视频制作一个视频片段,但是由于从某平台上获取的视频中带有台标,影响观感,此时第一类对象为台标,并且可通过本申请实施例提供的图像处理技术将台标从视频的视频帧图像中移除,参见图6,图6为本申请实施例提供的一种图像处理示意图。
类似地,可将字幕从视频帧图像中移除,参见图7,图7为本申请实施例提供的一种图像处理示意图;或将某些运动对象,如路人、交通工具等从视频帧图像中移除,参见图8,图8为本申请实施例提供的一种图像处理示意图。
步骤S402,当图像初始掩膜模板包含的初始模糊像素的第一数量达到第一阈值时,对初始模糊像素对应的初始模糊区域进行形态学处理,获得图像目标掩膜模板。
由于图像初始掩膜模板是基于像素块确定的,且每个像素块都有自身对应的分辨率,其中,分辨率表示该像素块水平方向和垂直方向的像素个数;因此,基于每个像素块的分辨率,确定该像素块中包含的像素数量,并将图像初始掩膜模板中包含的所有像素块中包含的像素数量相加,获得图像初始掩膜模板包含的初始模糊像素的第一数量。
具体的,一个像素块的像素数量=水平方向的像素个数*垂直方向的像素个数。
在一种可能的实现方式中,当图像初始掩膜模板包含的初始模糊像素的第一数量达到第一阈值时,说明第一修复图像中模糊像素块较多。
但是,当第一修复图像中模糊像素块比较分散,即初始模糊区域不集中时,即使在第一修复图像中模糊像素块较多的情况下,第一修复图像中也不能够明显地显示出图像模糊的模糊区域,此时确定第一修复图像的修复效果达标,无需对第一修复图像进行修复处理,减少计算量。
因此,为了保证修复图像的准确性,以及减少计算量;对第一修复图像进行验证,以确定第一修复图像的修复效果是否达标是非常必要的。在此基础上,本申请实施例中,在图像初始掩膜模板中,将对初始模糊像素对应的初始模糊区域进行形态学处理,获得图像目标掩膜模板,以使第一修复图像中的初始模糊区域连接,且模糊区域更加的规则。
在一种可能的实现方式中,在图像初始掩膜模板中,对初始模糊像素对应的初始模糊区域进行形态学处理,获得图像目标掩膜模板,可以通过以下方式实现:采用扩张操作fdilate和腐蚀操作ferode,对多个初始模糊区域mblur进行先扩张后腐蚀的操作,使多个分散的初始模糊区域连接起来,并获得图像目标掩膜模板,图像目标掩膜模板为m′blur=ferode(fdilate(mblur))。
参见图9,图9为本申请实施例提供的一种对初始模糊区域进行形态学处理的示意图。设第一修复图像中包括多个初始模糊区域,分别为A1~A8;此时,首先对初始模糊区域A1~A8分别按照设定扩张比例进行扩张,获得扩张后的初始模糊区域B1~B8,比如,将初始模糊区域A1~A8扩大10倍;然后,判断扩张后的初始模糊区域B1~B8中是否存在重叠,并将存在重叠的区域进行合并,获得至少一个合并区域;最后,将合并区域按照收缩比例进行腐蚀,获得中间模糊区域,收缩比例是基于扩张比例确定的,在扩张比例为10时,收缩比例为1/10。
图像腐蚀的原理如下:假设图像中的前景物体为1,背景为0,假设原图像中有一个前景物体,那么用一个结构元素去腐蚀原图的过程是这样的:遍历原图像的每一个像素,然后用结构元素的中心点对准当前正在遍历的这个像素,然后取当前结构元素所覆盖下的原图对应区域内的所有像素的最小值,用这个最小值替换当前像素值。由于二值图像最小值就是0,所以就是用0替换,即变成了黑色背景。从而也可以看出,如果当前结构元素覆盖下,全部都是背景,那么就不会对原图做出改动,因为都是0,如果全部都是前景像素,也不会对原图做出改动,因为都是1。只有结构元素位于前景物体边缘的时候,它覆盖的区域内才会出现0和1两种不同的像素值,这个时候把当前像素替换成0就有变化了。因此腐蚀看起来的效果就是让前景物体缩小了一圈。对于前景物体中一些细小的连接处,如果结构元素大小相等,这些连接处就会被断开。
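按上述原理,"先扩张后腐蚀"即形态学闭运算,可用如下纯 numpy 示意片段说明;在结构元素覆盖的邻域上取最大值即扩张、取最小值即腐蚀,函数名与结构元素大小均为说明所设:

```python
import numpy as np

def _morph(mask: np.ndarray, k: int, reducer) -> np.ndarray:
    """在 k×k 结构元素覆盖的邻域上取极值:np.min 即腐蚀,np.max 即扩张,
    与文中"取结构元素覆盖区域内所有像素的最小值替换当前像素"的描述一致。"""
    h, w = mask.shape
    pad = k // 2
    fill = 1 if reducer is np.min else 0    # 腐蚀补 1、扩张补 0,避免边界被错误改写
    padded = np.pad(mask, pad, constant_values=fill)
    out = np.empty_like(mask)
    for y in range(h):
        for x in range(w):
            out[y, x] = reducer(padded[y:y + k, x:x + k])
    return out

def close_blur_regions(mask: np.ndarray, k: int = 3) -> np.ndarray:
    """先扩张后腐蚀(形态学闭运算),把分散的初始模糊区域连接起来。"""
    return _morph(_morph(mask, k, np.max), k, np.min)

# 两个相距 1 个像素的模糊点,闭运算后被连成一片
m = np.zeros((5, 5), dtype=np.uint8)
m[2, 1] = m[2, 3] = 1
closed = close_blur_regions(m, k=3)
```

原本断开的 (2,2) 位置在闭运算后被填上,分散的初始模糊区域被连接为一个更规则的中间模糊区域。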
此时,将分散的初始模糊区域连接起来,生成一个中间模糊区域,每个中间模糊区域等于或大于相应的初始模糊区域;在中间模糊区域比较大(例如宽高大于相应的宽高阈值)时,在第一修复图像中能够明显地显示出图像模糊的模糊区域,此时说明第一修复图像的修复效果不好,需要对第一修复图像进行修复处理。因此,基于图像目标掩膜模板确定是否对第一修复图像进行修复处理,在保证修复效果的同时,减少计算量。
在另一种可能的实现方式中,当图像初始掩膜模板包含的初始模糊像素的第一数量小于第一阈值时,说明第一修复图像中模糊像素块较少,第一修复图像中不能够明显地显示出图像模糊的模糊区域,此时确定第一修复图像的修复效果较好,并将第一修复图像作为待处理图像对应的目标修复图像,无需执行对初始模糊像素对应的模糊区域进行形态学处理,以及无需执行对第一修复图像继续处理等步骤,以减少计算流程,提升图像处理效率。
步骤S403,当图像目标掩膜模板包含的中间模糊像素的第二数量达到第二阈值时,在第一修复图像中,对中间模糊像素对应的像素区域进行修复处理,获得第二修复图像。
由于,图像目标掩膜模板中已将分散的初始模糊区域连接起来,因此,当图像目标掩膜模板包含的中间模糊像素的第二数量达到第二阈值时,说明第一修复图像中能够明显的显示出图像模糊的模糊区域,确定第一修复图像的修复效果不好,此时为了保证图像处理的准确性,需要对第一修复图像中,中间模糊像素对应的像素区域进行修复处理。
在一种可能的实现方式中,在第一修复图像中,对中间模糊像素对应的像素区域进行修复处理,可以通过以下方式实现:
首先,将第一修复图像和图像目标掩膜模板,输入已训练的图像修复模型FI
接着,在已训练的图像修复模型FI中,在第一修复图像中,基于图像目标掩膜模板,对中间模糊像素对应的像素区域进行修复处理,获得第二修复图像;记已训练的图像修复模型的修复处理过程为:xblurcomp=FI(xtcomp,m′blur)。
其中,xblurcomp表示第二修复图像。
像素区域是通过以下方式确定的:根据中间模糊像素在图像目标掩膜模板中的位置,在第一修复图像中确定相同位置的区域,以作为像素区域。中间模糊像素对应的像素区域一般为无参考区域,或运动物体区域。
在本申请实施例中,已训练的图像修复模型FI可以为潜在扩散模型(Latent Diffusion Models,LDM)或大掩膜修复模型(Large Mask Inpainting,LaMa)等用于模糊区域的图像生成工具。
其中,LDM模型是一项高分辨率图像合成训练工具,在图像修复和各种任务(例如:无条件图像生成、语义场景合成和超分辨率)上实现了高度竞争的性能;
LaMa模型是一种图像生成工具,可以很好的泛化到更高的分辨率图像。
下面,以使用LaMa模型为例,对在第一修复图像中,对中间模糊像素对应的像素区域进行修复处理进行说明。
在使用LaMa模型,在第一修复图像中,对中间模糊像素对应的像素区域进行修复处理时,可以通过以下方式实现:首先,将3通道的第一修复图像和1通道的图像目标掩膜模板输入LaMa模型;其次,在LaMa模型中,将图像目标掩膜模板取反,并和第一修复图像相乘,得到带有掩膜区域的第一彩色图像;接着,将第一彩色图像和图像目标掩膜模板进行叠加,得到一个4通道的图像;然后,对该4通道的图像进行下采样操作后,经过快速傅里叶卷积(Fast Fourier Convolutions,FFC)处理,并将快速傅里叶卷积处理后的图像进行上采样处理,获得第二修复图像;其中,在快速傅里叶卷积的处理过程中,会将输入图像基于通道分为2部分,且这2部分分别经过2个不同的分支。一个分支负责提取局部信息,称为局部分支。另一个分支负责提取全局信息,称为全局分支。在全局分支中会使用快速傅里叶卷积提取全局特征。最后将局部信息和全局信息进行交叉融合,再基于通道进行拼接,得到最终的第二修复图像。参见图10,图10为本申请实施例中一种对中间模糊像素对应的像素区域进行修复处理的示意图。
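上述"掩膜取反相乘、再沿通道叠加成4通道输入"的组装步骤,可用如下 numpy 示意片段说明;此处仅为输入组装的示意,并非 LaMa 模型的完整实现,函数名为说明所设:

```python
import numpy as np

def build_lama_input(img3: np.ndarray, mask1: np.ndarray) -> np.ndarray:
    """将 1 通道的图像目标掩膜模板取反后与 3 通道的第一修复图像相乘,
    得到掩膜区域被置零的彩色图像,再与掩膜沿通道叠加成 H×W×4 的输入。"""
    color = img3 * (1 - mask1)[..., None]                        # 掩膜区域像素置零
    return np.concatenate([color, mask1[..., None]], axis=-1)    # 沿通道拼接为 4 通道

img = np.ones((2, 2, 3))
mask = np.array([[1, 0],
                 [0, 0]], dtype=float)    # 左上角为待修复的模糊区域
x4 = build_lama_input(img, mask)
```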
在本申请实施例中,快速傅里叶卷积使得LaMa模型即使在浅层也可以获得整个图像的感受野。快速傅里叶卷积不仅提升了LaMa模型的修复质量,还降低了LaMa模型的参数量。同时快速傅里叶卷积中的偏置使得LaMa模型具有更好的泛化性,可以使用低分辩率图像产生高分辨率图像的修复结果,使用快速傅里叶卷积时,可在空间域和频域中同时工作,并不需要回到前面的层来理解图像的上下文。
需要说明的是,第一阈值和第二阈值可以相同也可以不同,确定中间模糊像素的第二数量的方式与确定初始模糊像素的第一数量的方式类似,在此不再重复赘述。
在另一种可能的实现方式中,当图像目标掩膜模板包含的中间模糊像素的第二数量小于第二阈值时,说明第一修复图像中模糊像素块较少,第一修复图像中不能够明显地显示出图像模糊的模糊区域,第一修复图像的修复效果较好,此时将第一修复图像作为待处理图像对应的目标修复图像,无需继续对第一修复图像中的模糊区域进行修复处理,以减少计算流程,提升图像处理效率。
步骤S404,基于第二修复图像,确定待处理图像对应的目标修复图像。
在本申请实施例中,对待处理图像中第一类对象进行修复处理,获得第一修复图像,针对待修复的图像元素修复完毕后,为了保证图像修复过程中,图像处理的准确性,进一步检测第一修复图像中初始模糊区域,并生成相应的图像初始掩膜模板;并在确定图像初始掩膜模板包含的初始模糊像素的第一数量达到第一阈值时,对初始模糊像素对应的模糊区域进行形态学处理,获得图像目标掩膜模板,以使将分散的初始模糊区域连接起来,使模糊区域更加规则;接着,确定图像目标掩膜模板包含的中间模糊像素的第二数量达到第二阈值时,在第一修复图像中,对中间模糊像素对应的像素区域进行修复处理,获得第二修复图像;最后,基于第二修复图像,确定待处理图像对应的目标修复图像。对第一修复图像中模糊区域进行修复处理,即对第一修复图像中的模糊区域进行增强处理;且由于对第一修复图像中的模糊区域进行了增强处理,以获得第二修复图像,提升了第二修复图像的图像质量,因此进一步保证了目标修复图像的图像质量。
在上述步骤S404中,基于第二修复图像,确定待处理图像对应的目标修复图像时,可将第二修复图像作为目标修复图像,或将对第二修复图像进行修复处理后获得的第三修复图像作为目标修复图像。
具体的,是将第二修复图像作为目标修复图像,还是将第三修复图像作为目标修复图像,是基于对象初始掩膜模板与对象目标掩膜模板中第二类对象的轮廓是否一致确定的。
其中,对象目标掩膜模板是通过如下方式确定的:
首先,将对象初始掩膜模板mobj,输入已训练的信息传播模型FT;接着,在已训练的信息传播模型FT中,基于已训练的信息传播模型FT的物体补全能力,对对象初始掩膜模板中的第二类对象进行对象轮廓补全处理,获得对象目标掩膜模板;其中,对象初始掩膜模板是对视频帧图像中包含的第二类对象进行识别后确定的,第二类对象为需保留的图像元素。
在一种可能的实现方式中,通过视觉识别模型FVIS(Visual Identity System,VIS),确定视频帧图像中第二类对象对应的对象初始掩膜模板mobj;记通过视觉识别模型FVIS,确定对象初始掩膜模板mobj的过程为:
mobj=FVIS(xm)
其中,xm为视频帧图像。
在另一种可能的实现方式中,通过视觉识别模型FVIS(Visual Identity System,VIS),确定待处理图像中第二类对象对应的对象初始掩膜模板mobj
其中,视觉识别模型是基于存在掩膜模板的图像进行训练获得的。
在本申请实施例中,首先将对象初始掩膜模板与对象目标掩膜模板进行对比,获得第一对比结果,其中,第一对比结果用于表征第二类对象的轮廓是否一致;接着,基于第一对比结果,对第二修复图像进行处理,获得目标修复图像。
在将对象初始掩膜模板与对象目标掩膜模板进行对比时,可将对象初始掩膜模板与对象目标掩膜模板对齐,确定对象初始掩膜模板中的第二类对象掩膜区域与对象目标掩膜模板中的第二类对象掩膜区域是否完全重合,若完全重合,则确定第一对比结果用于表征第二类对象的轮廓一致,否则确定第一对比结果用于表征第二类对象的轮廓不一致。
在将对象初始掩膜模板与对象目标掩膜模板进行对比时,确定对象初始掩膜模板中第二类对象掩膜区域的第三像素数量,以及对象目标掩膜模板中第二类对象掩膜区域的第四像素数量,并基于第三像素数量和第四像素数量的差值,确定第一对比结果,其中,第三像素数量和第四像素数量的差值表征了对象初始掩膜模板和对象目标掩膜模板中第二类对象掩膜区域的差异。
基于第三像素数量和第四像素数量的差值,确定第一对比结果时,若第三像素数量和第四像素数量的差值小于阈值,则确定第一对比结果用于表征第二类对象的轮廓一致,否则确定第一对比结果用于表征第二类对象的轮廓不一致。
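上述基于像素数量差值判断轮廓是否一致的方式,可用如下示意片段说明(函数名与阈值取值均为说明所设,并非本申请的实际实现):

```python
import numpy as np

def contours_consistent(m_init: np.ndarray, m_target: np.ndarray, thresh: int) -> bool:
    """分别统计对象初始掩膜模板与对象目标掩膜模板中第二类对象掩膜区域的
    像素数量(即文中的第三像素数量与第四像素数量),差值小于阈值则认为
    第二类对象的轮廓一致,否则认为存在需要补全/修复的遮挡区域。"""
    n3 = int(m_init.sum())      # 第三像素数量
    n4 = int(m_target.sum())    # 第四像素数量
    return abs(n4 - n3) < thresh

m_init = np.zeros((4, 4), dtype=np.uint8); m_init[1:3, 1:3] = 1   # 补全前:4 个像素
m_full = np.zeros((4, 4), dtype=np.uint8); m_full[0:3, 1:3] = 1   # 补全后:6 个像素
```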
在一种可能的实现方式中,当第一对比结果表征第二类对象轮廓一致时,将第二修复图像作为目标修复图像。
在另一种可能的实现方式中,当第一对比结果表征第二类对象轮廓不一致,对第二修复图像进行处理,获得目标修复图像可以通过以下方式实现:
首先,将第二修复图像和对象目标掩膜模板,输入已训练的对象修复模型Fobj
接着,在已训练的对象修复模型Fobj中,在第二修复图像xblurcomp中,基于对象目标掩膜模板,对第二类对象对应的像素区域进行修复处理,获得第三修复图像,并将第三修复图像作为目标修复图像;记已训练的对象修复模型Fobj的修复处理过程为:
xobjcomp=Fobj(xobjremain,mobj)
其中,xobjcomp表示修复后的第三修复图像,xobjremain表示待处理图像中可见像素部分,且xobjremain=xmt·mobj,即包括第一类对象掩膜区域和第二类对象掩膜区域的彩色图像。
在本申请实施例中,已训练的对象修复模型可以使用任意的用于图像修复的模型,例如用于视频修复的时空联合模型(Spatial-Temporal Transformations for Video Inpainting,STTN);在使用对象修复模型,在第二修复图像中,对第二类对象对应的像素区域进行修复处理时,基于Transformer的自注意力特性,使用可见像素部分对第二类对象对应的像素区域进行修复处理。
参见图11,图11为本申请实施例中另一种图像处理方法流程图,包括如下步骤:
步骤S1100,对获取的目标视频帧图像包含的第一类对象进行掩膜处理,获得掩膜处理后的待处理图像;第一类对象为待修复的图像元素。
步骤S1101,对获取的视频帧图像包含的第二类对象进行识别,基于识别结果确定对象初始掩膜模板。
步骤S1102,对待处理图像中第一类对象进行修复处理,获得第一修复图像,并基于第一修复图像中初始模糊区域,生成相应的图像初始掩膜模板。
步骤S1103,对对象初始掩膜模板中的第二类对象进行对象轮廓补全处理,获得对象目标掩膜模板。
步骤S1104,当图像初始掩膜模板包含的初始模糊像素的第一数量达到第一阈值时,对初始模糊像素对应的模糊区域进行形态学处理,获得图像目标掩膜模板。
步骤S1105,当图像目标掩膜模板包含的中间模糊像素的第二数量达到第二阈值时,在第一修复图像中,对中间模糊像素对应的像素区域进行修复处理,获得第二修复图像。
步骤S1106,将对象初始掩膜模板与对象目标掩膜模板进行对比,判断第二类对象的轮廓是否一致,若是则执行步骤S1107,否则执行步骤S1108。
步骤S1107,将第二修复图像作为目标修复图像。
步骤S1108,在第二修复图像中,对第二类对象对应的像素区域进行修复处理,获得第三修复图像,并将第三修复图像作为目标修复图像。
参见图12,图12示例性提供本申请实施例中一种图像处理的具体实施方法流程图,包括如下步骤:
步骤S1200,对获取的目标视频帧图像包含的第一类对象进行掩膜处理,获得掩膜处理后的待处理图像,第一类对象为待修复的图像元素。
步骤S1201,通过视觉识别模型,对获取的目标视频帧图像包含的第二类对象进行识别,并基于识别结果确定第二类对象的对象初始掩膜模板。
步骤S1202,将包含待处理图像的视频序列和包含待处理图像的对象初始掩膜模板的掩膜模板序列,输入已训练的信息传播模型,通过已训练的信息传播模型获得第一修复图像、图像初始掩膜模板以及对象目标掩膜模板。
即,已训练的信息传播模型对应两个输入参数,分别为:
第一输入参数:
xm={xmt}(t=0,1,2,…,T),其中,xmt=xt·(1-mt);
第一输入参数为包含待处理图像的视频序列,该视频序列中的每一帧图像均可以为待处理图像xmt
第二输入参数:
mobj=FVIS(xm),其中,mobj为与视频序列对应的对象初始掩膜模板的掩膜模板序列;
第二输入参数为包含待处理图像的对象初始掩膜模板的掩膜模板序列,该掩膜模板序列中的每一掩膜模板均可以为与相应的待处理图像对应的对象初始掩膜模板;例如:mobj1为xm1的对象初始掩膜模板;
将已训练的信息传播模型记为FT,记修复完成的第一修复图像为xtcomp,记对象目标掩膜模板为m̂obj,记图像初始掩膜模板为mblur,则有:(xtcomp,mblur,m̂obj)=FT(xm,mobj)。
步骤S1203,判断图像初始掩膜模板包含的初始模糊像素的第一数量是否到达第一阈值,若是则执行步骤S1204,否则执行步骤S1210。
步骤S1204,对初始模糊像素对应的模糊区域进行形态学处理,获得图像目标掩膜模板。
步骤S1205,判断图像目标掩膜模板包含的中间模糊像素的第二数量是否到达第二阈值,若是则执行步骤S1206,否则执行步骤S1210。
步骤S1206,将图像目标掩膜模板和第一修复图像,输入已训练的图像修复模型,通过已训练的图像修复模型,获得第二修复图像。
步骤S1207,判断对象初始掩膜模板与对象目标掩膜模板包含的第二类对象的轮廓是否一致,若是则执行步骤S1211,否则执行步骤S1208。
步骤S1208,将第二修复图像和对象目标掩膜模板,输入已训练的对象修复模型,通过已训练的对象修复模型,获得第三修复图像。
步骤S1209,将第三修复图像作为待处理图像对应的目标修复图像。
步骤S1210,将第一修复图像作为待处理图像对应的目标修复图像。
步骤S1211,将第二修复图像作为待处理图像对应的目标修复图像。
参见图13,图13与图12相对应,图13提供了本申请实施例中一种图像处理具体实施方法示意图。
从图13中可知,根据使用的模型,将图像处理过程分为三个阶段,下面对三个阶段进行详细说明。
阶段一:将待处理图像和对象初始掩膜模板,输入已训练的信息传播模型;在已训练的信息传播模型中,基于帧间参考信息,使用与待处理图像连续的其他视频帧图像中相应区域的可用像素,对待处理图像进行帧间参考信息修复,该已训练的信息传播模型同时具备一定的图像生成能力,通过图像生成能力对其他视频帧图像中没有可用像素的像素部分,借助空间和时域上的信息,进行像素生成,以完成图像修复,获得第一修复图像;同时,该已训练的信息传播模型还具备物体补全能力,通过物体补全能力,对待处理图像中的第二类对象进行轮廓补全处理,获得对象目标掩膜模板;且已训练的信息传播模型还可以基于修复完成后的图像,确定初始模糊区域对应的图像初始掩膜模板;最后,阶段一中的已训练的信息传播模型同时输出第一修复图像、第一修复图像中修复结果模糊的初始模糊区域对应的图像初始掩膜模板,以及对象目标掩膜模板。
阶段二:首先确定图像初始掩膜模板中初始模糊区域的初始模糊像素的第一数量,接着判断该第一数量是否大于第一阈值,若初始模糊区域中初始模糊像素的第一数量少于第一阈值,则忽略该初始模糊区域,将第一修复图像作为目标修复图像输出,不进行后续处理;若初始模糊区域中初始模糊像素的第一数量达到第一阈值,则应用扩张和腐蚀操作将分散的初始模糊区域连接起来,获得处理后的图像目标掩膜模板,在获得图像目标掩膜模板后,确定图像目标掩膜模板中模糊区域的中间模糊像素的第二数量,接着判断该第二数量是否大于第二阈值,若中间模糊像素的第二数量少于第二阈值,则忽略该模糊区域,将第一修复图像作为目标修复图像输出,不进行后续处理;若中间模糊像素的第二数量达到第二阈值,则调用图像修复模型,基于处理后的图像目标掩膜模板,在第一修复图像上,对图像目标掩膜模板中的模糊区域的像素位置进行修复。
阶段三:在阶段二的基础上进行,若对象目标掩膜模板相对对象初始掩膜模板在第二类对象的掩膜区域内改变的像素个数小于第三阈值,认为第二类对象的掩膜区域内没有需要补全的物体轮廓,将第二修复图像作为目标修复图像;若对象目标掩膜模板相对对象初始掩膜模板在第二类对象的掩膜区域内改变的像素个数达到第三阈值,则调用对象修复模型,对第二类对象的掩膜区域内的像素进行修复,覆盖图像修复模块的修复内容,获得第三修复图像,并将第三修复图像作为目标修复图像。
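上述三个阶段的逐级判断流程,可用如下示意性代码概括;其中 propagate/inpaint/complete_obj 为占位的三个已训练模型接口,t1/t2/t3 对应文中的第一、第二、第三阈值,接口名与参数名均为说明所设,并非本申请的实际实现:

```python
import numpy as np

def count_pixels(mask):
    """统计掩膜中取值为 1 的像素数量,对应文中的第一数量/第二数量。"""
    return int(np.sum(mask))

def changed_pixels(m_before, m_after):
    """统计对象目标掩膜模板相对对象初始掩膜模板发生变化的像素个数。"""
    return int(np.sum(m_before != m_after))

def restore_frame(x_m, m_obj, models, t1, t2, t3, closing=lambda m: m):
    """三阶段决策流程的示意:closing 为先扩张后腐蚀的形态学处理。"""
    # 阶段一:信息传播模型输出第一修复图像、图像初始掩膜模板、对象目标掩膜模板
    x1, m_blur, m_obj_hat = models["propagate"](x_m, m_obj)
    if count_pixels(m_blur) < t1:               # 初始模糊像素的第一数量未达第一阈值
        return x1
    m_blur_t = closing(m_blur)                  # 形态学处理得到图像目标掩膜模板
    if count_pixels(m_blur_t) < t2:             # 中间模糊像素的第二数量未达第二阈值
        return x1
    # 阶段二:图像修复模型对模糊区域进行增强
    x2 = models["inpaint"](x1, m_blur_t)
    if changed_pixels(m_obj, m_obj_hat) < t3:   # 第二类对象轮廓一致
        return x2
    # 阶段三:对象修复模型修复被遮挡的对象区域
    return models["complete_obj"](x2, m_obj_hat)
```

三个阈值分别在每个阶段提前终止流程,与图12中"否则执行步骤S1210/S1211"的分支一一对应。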
在本申请中,通过已训练的信息传播模型,基于待处理图像和对象初始掩膜模板,确定第一修复图像、图像初始掩膜模板,以及对象目标掩膜模板,基于已训练的信息传播模型,实现参考像素传播,对背景发生复杂运动的图像内容进行较好的修复。针对待修复的图像元素修复完毕,获得第一修复图像后,为了保证图像修复过程中,图像处理的准确性,在确定图像初始掩膜模板包含的初始模糊像素的第一数量达到第一阈值时,对初始模糊像素对应的模糊区域进行形态学处理,获得图像目标掩膜模板,以使分散的初始模糊区域连接,以及模糊区域更加规则,提升判断的准确性;接着,确定图像目标掩膜模板包含的中间模糊像素的第二数量,当第二数量达到第二阈值时,通过图像修复模型,在第一修复图像中,对中间模糊像素对应的像素区域进行修复处理,获得第二修复图像,实现对第一修复图像中模糊区域进行修复处理,即对第一修复图像中的模糊区域进行增强;最后,确定对象初始掩膜模板与对象目标掩膜模板中第二类对象的轮廓不一致时,通过对象修复模型,在第二修复图像中,对第二类对象对应的像素区域进行修复处理,获得第三修复图像,实现对遮挡对象区域进行修复处理,即对第二修复图像中的模糊区域进行增强。实现了对由纹理复杂、对象遮挡情况导致的修复模糊的模糊区域进行修复处理,对模糊区域进行增强处理,提升了目标修复图像的图像质量。
在本申请实施例中,对待处理图像进行图像处理的过程中,涉及了已训练的信息传播模型、已训练的图像修复模型以及已训练的对象修复模型,而模型在使用之前,为了保证模型输出的准确性,需要进行模型训练。下面,对模型训练的过程进行详细说明。
一、信息传播模型。
在本申请实施例中,已训练的信息传播模型是根据训练样本数据集中的训练样本,对待训练的信息传播模型执行循环迭代训练后获得的。
下面以一次循环迭代过程为例,对待训练的信息传播模型的训练过程进行介绍。
参见图14,图14为本申请实施例中一种信息传播模型的训练方法,包括如下步骤:
步骤S1400,获取训练样本数据集,训练样本数据集中包括至少一组训练样本,每组训练样本中包括:针对待修复的图像元素进行掩膜处理后的历史图像及相应的实际修复图像,以及历史图像中需保留的图像元素对应的对象历史掩膜模板及相应的对象实际掩膜模板。
步骤S1401,从训练样本数据集中选取训练样本,并将训练样本输入待训练的信息传播模型。
步骤S1402,通过待训练的信息传播模型,预测历史图像对应的预测修复图像,并基于预测修复图像中的预测模糊区域,生成图像预测掩膜模板,以及对象历史掩膜模板对应的对象预测掩膜模板。
步骤S1403,基于预测修复图像和实际修复图像构建第一类损失函数,基于图像预测掩膜模板和图像中间掩膜模板构建第二类损失函数,以及基于对象预测掩膜模板和对象实际掩膜模板构建第三类损失函数,其中,图像中间掩膜模板,是基于预测修复图像和实际修复图像确定的。
在一种可能的实现方式中,第一类损失函数是通过如下方式确定的:
基于预测修复图像和实际修复图像之间的图像差异像素值,确定第一子损失函数;即,使用L1损失构建第一子损失函数;
基于预测修复图像和实际修复图像的第二对比结果,确定第二子损失函数,其中,第二对比结果用于表征预测修复图像和实际修复图像是否一致;即,使用对抗损失Lgen构建第二子损失函数;
基于第一子损失函数和第二子损失函数,确定第一类损失函数。
在一种可能的实现方式中,第二类损失函数是通过如下方式确定的:
基于图像预测掩膜模板和图像中间掩膜模板之间的掩膜差异像素值,确定第三子损失函数,并将第三子损失函数作为第二类损失函数;其中,图像预测掩膜模板是预测修复图像中的预测模糊区域的像素数量大于设定阈值时获得的。
即,使用L1损失构建第三子损失函数;
其中,c为RGB的3个通道,H*W表示一个H*W大小的矩阵,d̂t为dt的预测值,dt为预测修复图像与实际修复图像之间的实际差别,即预测修复图像相比于实际修复图像中实际模糊区域的像素数量,ŷt表示预测修复图像,yt为实际修复图像。
在一种可能的实现方式中,第三类损失函数是通过如下方式确定的:
基于对象预测掩膜模板和历史对象实际掩膜模板之间的对象差异像素值,确定第四子损失函数;即,使用L1损失构建第四子损失函数;
基于对象预测掩膜模板和历史对象实际掩膜模板之间的相似度,确定第五子损失函数;即,使用dice损失Ldice构建第五子损失函数;
基于第四子损失函数和第五子损失函数,确定第三类损失函数。
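文中反复使用的两类损失(基于差异像素值的L1损失、基于掩膜相似度的dice损失)可示意如下;dice的具体公式文中未给出,此处采用常用形式 1 − 2|A∩B|/(|A|+|B|) 作为假设:

```python
import numpy as np

def l1_loss(pred, target):
    """基于差异像素值的 L1 损失(对应第一/第三/第四子损失函数的构造方式)。"""
    return float(np.mean(np.abs(pred - target)))

def dice_loss(pred_mask, true_mask, eps=1e-6):
    """基于两掩膜相似度的 dice 损失(对应第五子损失函数):
    1 - 2|A∩B| / (|A|+|B|),两掩膜完全重合时损失趋于 0。"""
    inter = np.sum(pred_mask * true_mask)
    return float(1 - 2 * inter / (np.sum(pred_mask) + np.sum(true_mask) + eps))

m = np.array([[1, 0], [0, 1]], dtype=float)
```

掩膜完全重合时 dice 损失趋于 0,完全不相交时趋于 1,因此可与 L1 损失一起作为第三类损失函数的组成部分。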
步骤S1404,基于第一类损失函数、第二类损失函数,以及第三类损失函数构建目标损失函数。
目标损失函数为第一类损失函数、第二类损失函数与第三类损失函数之和。
步骤S1405,基于目标损失函数,对待训练的信息传播模型进行参数调整。
二、图像修复模型。
在本申请实施例中,图像修复模型选用潜在扩散模型(Latent Diffusion Models,LDM)或大掩膜修复模型(Large Mask Inpainting,LaMa)等用于模糊区域的图像生成工具。
在对LDM模型进行训练时,将原始图像、原始图像对应的图像掩膜模板、引导文本以及目标图像输入到待训练的LDM模型,并在该LDM模型中基于引导文本反复混合前景部分和背景部分,获得预测图像;并基于预测图像和原始图像构建损失函数,基于损失函数对待训练的LDM模型进行参数调整;其中,前景部分为需要进行修复的部分,背景部分为原始图像中除需要修复的部分以外的其他部分;目标图像为对原始图像进行图像修复后,达到修复标准的图像。
在对LaMa模型进行训练时,将原始图像、原始图像对应的图像掩膜模板,以及目标图像输入到待训练的LaMa模型,并在该LaMa模型中,将包含图像掩膜的原始图像,以及原始图像的图像掩膜进行叠加,得到一个4通道的图像,对该4通道的图像进行下采样操作后,经过快速傅里叶卷积处理,并对快速傅里叶卷积处理后的图像进行上采样操作,得到预测图像;并基于原始图像和预测图像构建对抗损失和感受野感知损失,进而构建损失函数,并基于损失函数对待训练的LaMa模型进行参数调整;其中,感受野是卷积神经网络每一层输出的特征图上的点在原始图像上映射的区域大小。
三、对象修复模型。
在本申请实施例中,对象修复模型选用以Transformer作为网络结构的模型,例如STTN。
在对对象修复模型进行训练时,将原始图像、包含掩膜区域的原始图像输入到待训练的对象修复模型,并在该对象修复模型中通过自注意力同时填充所有输入图像中的掩膜区域,获得预测图像;并基于预测图像和原始图像构建损失函数,基于损失函数对待训练的对象修复模型进行参数调整;其中,训练过程中的损失函数使用L1损失以及对抗损失Lgen
需要说明的是,本申请实施例中涉及到的模型可以单独训练,也可进行模型联合训练。
在本申请实施例中,提出了对信息传播模型、图像修复模型、对象修复模型的训练方式,以保证信息传播模型、图像修复模型、对象修复模型输出结果的准确性,进而保证本申请实施例在图像处理过程中使用模型处理时图像处理的准确性,提高处理后的视频帧图像的图像质量。
与本申请实施例基于同一发明构思,本申请实施例还提供了一种图像处理装置,装置解决问题的原理与上述实施例的方法相似,因此装置的实施可以参见上述方法的实施,重复之处不再赘述。
参见图15,图15示例性提供本申请实施例提供一种图像处理装置1500,该图像处理装置1500包括:
第一处理单元1501,配置为对获取的目标视频帧图像包含的第一类对象进行掩膜处理,获得掩膜处理后的待处理图像;第一类对象为待修复的图像元素;第二处理单元1502,配置为对待处理图像中第一类对象进行修复处理,获得第一修复图像,并基于第一修复图像中初始模糊区域,生成相应的图像初始掩膜模板;第三处理单元1503,配置为当图像初始掩膜模板包含的初始模糊像素的第一数量达到第一阈值时,对初始模糊像素对应的初始模糊区域进行形态学处理,获得图像目标掩膜模板;第四处理单元1504,配置为当图像目标掩膜模板包含的中间模糊像素的第二数量达到第二阈值时,在第一修复图像中,对中间模糊像素对应的像素区域进行修复处理,获得第二修复图像;确定单元1505,配置为基于第二修复图像,确定待处理图像对应的目标修复图像。
在一种可能的实现方式中,第二处理单元1502具体配置为:将包含待处理图像的视频序列,输入已训练的信息传播模型;在已训练的信息传播模型中,基于视频序列中其他视频帧图像中的图像元素,对待处理图像中第一类对象进行修复处理,获得第一修复图像,并基于第一修复图像中初始模糊区域,生成相应的图像初始掩膜模板。
在一种可能的实现方式中,第二处理单元1502具体配置为:将对象初始掩膜模板,输入已训练的信息传播模型,其中,对象初始掩膜模板是对视频帧图像中包含的第二类对象进行识别后确定的,第二类对象为需保留的图像元素;在已训练的信息传播模型中,对对象初始掩膜模板中的第二类对象进行对象轮廓补全处理,获得对象目标掩膜模板。
在一种可能的实现方式中,确定单元1505具体配置为:将对象初始掩膜模板与对象目标掩膜模板进行对比,获得第一对比结果,其中,第一对比结果用于表征第二类对象的轮廓是否一致;基于第一对比结果,对第二修复图像进行处理,获得目标修复图像。
在一种可能的实现方式中,确定单元1505具体配置为:若第一对比结果表征第二类对象轮廓不一致,则在第二修复图像中,对第二类对象对应的像素区域进行修复处理,获得第三修复图像,并将第三修复图像作为目标修复图像;若第一对比结果表征第二类对象轮廓一致,则将第二修复图像作为目标修复图像。
在一种可能的实现方式中,已训练的信息传播模型是通过如下方式训练获得的:根据训练样本数据集中的训练样本,对待训练的信息传播模型执行循环迭代训练,获得已训练的信息传播模型,其中,在一次循环迭代过程中执行以下操作:从训练样本数据集中选取训练样本;其中,训练样本为:针对待修复的图像元素进行掩膜处理后的历史图像,以及历史图像中需保留的图像元素对应的对象历史掩膜模板;将训练样本输入信息传播模型,预测历史图像对应的预测修复图像,并基于预测修复图像中预测模糊区域,生成图像预测掩膜模板,以及对象历史掩膜模板对应的对象预测掩膜模板;采用基于预测修复图像、图像预测掩膜模板以及对象预测掩膜模板构建的目标损失函数,对信息传播模型进行参数调整。
在一种可能的实现方式中,训练样本中还包括:历史图像对应的实际修复图像,与对象历史掩膜模板对应的对象实际掩膜模板;则目标损失函数是采用以下方式构建的:基于预测修复图像和实际修复图像构建第一类损失函数,基于图像预测掩膜模板和图像中间掩膜模板构建第二类损失函数,以及基于对象预测掩膜模板和对象实际掩膜模板构建第三类损失函数,其中,图像中间掩膜模板,是基于预测修复图像和实际修复图像确定的;基于第一类损失函数、第二类损失函数,以及第三类损失函数构建目标损失函数。
在一种可能的实现方式中,第一类损失函数是通过如下方式确定的:基于预测修复图像和实际修复图像之间的图像差异像素值,确定第一子损失函数;基于预测修复图像和实际修复图像的第二对比结果,确定第二子损失函数,其中,第二对比结果用于表征预测修复图像和实际修复图像是否一致;基于第一子损失函数和第二子损失函数,确定第一类损失函数。
在一种可能的实现方式中,第二类损失函数是通过如下方式确定的:基于图像预测掩膜模板和图像中间掩膜模板之间的掩膜差异像素值,确定第三子损失函数,并将第三子损失函数作为第二类损失函数。
在一种可能的实现方式中,第三类损失函数是通过如下方式确定的:基于对象预测掩膜模板和历史对象实际掩膜模板之间的对象差异像素值,确定第四子损失函数;基于对象预测掩膜模板和历史对象实际掩膜模板之间的相似度,确定第五子损失函数;基于第四子损失函数和第五子损失函数,确定第三类损失函数。
在一种可能的实现方式中,第二处理单元1502生成相应的图像初始掩膜模板后,还配置为:当图像初始掩膜模板包含的初始模糊像素的第一数量小于第一阈值时,将第一修复图像作为待处理图像对应的目标修复图像。
在一种可能的实现方式中,第三处理单元1503获得图像目标掩膜模板后,还配置为:当图像目标掩膜模板包含的中间模糊像素的第二数量小于第二阈值时,将第一修复图像作为待处理图像对应的目标修复图像。
为了描述的方便,以上各部分按照功能划分为各单元(或模块)分别描述。当然,在实施本申请时可以把各单元(或模块)的功能在同一个或多个软件或硬件中实现。
所属技术领域的技术人员能够理解,本申请的各个方面可以实现为系统、方法或程序产品。因此,本申请的各个方面可以具体实现为以下形式,即:完全的硬件实施方式、完全的软件实施方式(包括固件、微代码等),或硬件和软件方面结合的实施方式,这里可以统称为“电路”、“模块”或“系统”。
在介绍了本申请示例性实施方式的图像处理方法和装置之后,接下来,介绍根据本申请的另一示例性实施方式的用于图像处理的电子设备。
与本申请上述方法实施例基于同一发明构思,本申请实施例中还提供了一种电子设备,该电子设备可以是服务器。在该实施例中,电子设备的结构可以如图16所示,包括存储器1601,通讯模块1603以及一个或多个处理器1602。
存储器1601,用于存储处理器1602执行的计算机程序。存储器1601可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统,以及运行即时通讯功能所需的程序等;存储数据区可存储各种即时通讯信息和操作指令集等。
存储器1601可以是易失性存储器(volatile memory),例如随机存取存储器(random-access memory,RAM);存储器1601也可以是非易失性存储器(non-volatile memory),例如只读存储器,快闪存储器(flash memory),硬盘(hard disk drive,HDD)或固态硬盘(solid-state drive,SSD);或者存储器1601是能够用于携带或存储具有指令或数据结构形式的期望的计算机程序并能够由计算机存取的任何其他介质,但不限于此。存储器1601可以是上述存储器的组合。
处理器1602,可以包括一个或多个中央处理单元(central processing unit,CPU)或者为数字处理单元等等。处理器1602,用于调用存储器1601中存储的计算机程序时实现上述图像处理方法。
通讯模块1603配置为与终端设备和其他服务器进行通信。
本申请实施例中不限定上述存储器1601、通讯模块1603和处理器1602之间的具体连接介质。本申请实施例在图16中以存储器1601和处理器1602之间通过总线1604连接为例,总线1604在图16中以粗线表示,其它部件之间的连接方式,仅是进行示意性说明,并不引以为限。总线1604可以分为地址总线、数据总线、控制总线等。为便于描述,图16中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
存储器1601中存储有计算机存储介质,计算机存储介质中存储有计算机可执行指令,计算机可执行指令用于实现本申请实施例的图像处理方法。处理器1602用于执行上述的图像处理方法。
在另一种实施例中,电子设备也可以是其他电子设备,如图3所示的终端设备310。在该实施例中,电子设备的结构可以如图17所示,包括:通信组件1710、存储器1720、显示单元1730、摄像头1740、传感器1750、音频电路1760、蓝牙模块1770、处理器1780等部件。
通信组件1710配置为与服务器进行通信。在一些实施例中,可以包括无线保真(Wireless Fidelity,WiFi)模块,WiFi模块属于短距离无线传输技术,电子设备通过WiFi模块可以帮助用户收发信息。
存储器1720可用于存储软件程序及数据。处理器1780通过运行存储在存储器1720的软件程序或数据,从而执行终端设备310的各种功能以及数据处理。存储器1720可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他非易失性固态存储器件。存储器1720存储有使得终端设备310能运行的操作系统。本申请中存储器1720可以存储操作系统及各种应用程序,还可以存储执行本申请实施例图像处理方法的代码。
显示单元1730还可用于显示由用户输入的信息或提供给用户的信息以及终端设备310的各种菜单的图形用户界面(graphical user interface,GUI)。具体地,显示单元1730可以包括设置在终端设备310正面的显示屏1732。其中,显示屏1732可以采用液晶显示器、发光二极管等形式来配置。显示单元1730可以用于显示本申请实施例中的目标修复图像等。
显示单元1730还可用于接收输入的数字或字符信息,产生与终端设备310的用户设置以及功能控制有关的信号输入,具体地,显示单元1730可以包括设置在终端设备310正面的触摸屏1731,可收集用户在其上或附近的触摸操作,例如点击按钮,拖动滚动框等。
其中,触摸屏1731可以覆盖在显示屏1732之上,也可以将触摸屏1731与显示屏1732集成而实现终端设备310的输入和输出功能,集成后可以简称触摸显示屏。本申请中显示单元1730可以显示应用程序以及对应的操作步骤。
摄像头1740可用于捕获静态图像。摄像头1740可以是一个,也可以是多个。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给处理器1780转换成数字图像信号。
终端设备还可以包括至少一种传感器1750,比如加速度传感器1751、距离传感器1752、指纹传感器1753、温度传感器1754。终端设备还可配置有陀螺仪、气压计、湿度计、温度计、红外线传感器、光传感器、运动传感器等其他传感器。
音频电路1760、扬声器1761、传声器1762可提供用户与终端设备310之间的音频接口。音频电路1760可将接收到的音频数据转换后的电信号,传输到扬声器1761,由扬声器1761转换为声音信号输出。终端设备310还可配置音量按钮,用于调节声音信号的音量。另一方面,传声器1762将收集的声音信号转换为电信号,由音频电路1760接收后转换为音频数据,再将音频数据输出至通信组件1710以发送给比如另一终端设备310,或者将音频数据输出至存储器1720以便进一步处理。
蓝牙模块1770用于通过蓝牙协议来与其他具有蓝牙模块的蓝牙设备进行信息交互。例如,终端设备可以通过蓝牙模块1770与同样具备蓝牙模块的可穿戴电子设备(例如智能手表)建立蓝牙连接,从而进行数据交互。
处理器1780是终端设备的控制中心,利用各种接口和线路连接整个终端的各个部分,通过运行或执行存储在存储器1720内的软件程序,以及调用存储在存储器1720内的数据,执行终端设备的各种功能和处理数据。在一些实施例中,处理器1780可包括一个或多个处理单元;处理器1780还可以集成应用处理器和基带处理器,其中,应用处理器主要处理操作***、用户界面和应用程序等,基带处理器主要处理无线通信。可以理解的是,上述基带处理器也可以不集成到处理器1780中。本申请中处理器1780可以运行操作***、应用程序、用户界面显示及触控响应,以及本申请实施例的图像处理方法。另外,处理器1780与显示单元1730耦接。
在一些可能的实施方式中,本申请提供的图像处理方法的各个方面还可以实现为一种程序产品的形式,其包括计算机程序,当程序产品在电子设备上运行时,计算机程序用于使电子设备执行本说明书上述描述的根据本申请各种示例性实施方式的图像处理方法中的步骤。
程序产品可以采用一个或多个可读介质的任意组合。可读介质可以是可读信号介质或者可读存储介质。可读存储介质例如可以是但不限于电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。可读存储介质的更具体的例子(非穷举的列表)包括:具有一个或多个导线的电连接、便携式盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。
本申请的实施方式的程序产品可以采用便携式紧凑盘只读存储器(CD-ROM)并包括计算机程序,并可以在计算装置上运行。然而,本申请的程序产品不限于此,在本文件中,可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被命令执行系统、装置或者器件使用或者与其结合使用。
可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了可读计算机程序。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。可读信号介质还可以是可读存储介质以外的任何可读介质,该可读介质可以发送、传播或者传输用于由命令执行系统、装置或者器件使用或者与其结合使用的程序。
可读介质上包含的计算机程序可以用任何适当的介质传输,包括但不限于无线、有线、光缆、RF等等,或者上述的任意合适的组合。
应当注意,尽管在上文详细描述中提及了装置的若干单元或子单元,但是这种划分仅仅是示例性的并非强制性的。实际上,根据本申请的实施方式,上文描述的两个或更多单元的特征和功能可以在一个单元中具体化。反之,上文描述的一个单元的特征和功能可以进一步划分为由多个单元来具体化。
此外,尽管在附图中以特定顺序描述了本申请方法的操作,但是,这并非要求或者暗示必须按照该特定顺序来执行这些操作,或是必须执行全部所示的操作才能实现期望的结果。附加地或备选地,可以省略某些步骤,将多个步骤合并为一个步骤执行,和/或将一个步骤分解为多个步骤执行。
本领域内的技术人员应明白,本申请实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机程序的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
尽管已描述了本申请的优选实施例,但本领域内的技术人员一旦得知了基本创造性概念,则可对这些实施例做出另外的变更和修改。所以,所附权利要求意欲解释为包括优选实施例以及落入本申请范围的所有变更和修改。
显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的精神和范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。

Claims (16)

  1. 一种图像处理方法,由计算机设备执行,所述方法包括:
    对获取的目标视频帧图像包含的第一类对象进行掩膜处理,获得掩膜处理后的待处理图像;所述第一类对象为待修复的图像元素;
    对所述待处理图像中所述第一类对象进行修复处理,获得第一修复图像,并基于所述第一修复图像中的初始模糊区域,生成图像初始掩膜模板;
    当所述图像初始掩膜模板包含的初始模糊像素的第一数量达到第一阈值时,对所述初始模糊像素对应的初始模糊区域进行形态学处理,获得图像目标掩膜模板;
    当所述图像目标掩膜模板包含的中间模糊像素的第二数量达到第二阈值时,在所述第一修复图像中,对所述中间模糊像素对应的像素区域进行修复处理,获得第二修复图像;
    基于所述第二修复图像,确定所述待处理图像对应的目标修复图像。
  2. 如权利要求1所述的方法,其中,所述对所述待处理图像中所述第一类对象进行修复处理,获得第一修复图像,并基于所述第一修复图像中的初始模糊区域,生成相应的图像初始掩膜模板,包括:
    将包含所述待处理图像的视频序列,输入已训练的信息传播模型;
    在所述已训练的信息传播模型中,基于所述视频序列中其他视频帧图像中的图像元素,对所述待处理图像中所述第一类对象进行修复处理,获得第一修复图像,并基于所述第一修复图像中的初始模糊区域,生成相应的图像初始掩膜模板。
  3. 如权利要求2所述的方法,其中,所述方法还包括:
    将对象初始掩膜模板,输入所述已训练的信息传播模型,其中,所述对象初始掩膜模板是对所述视频帧图像中包含的所述第二类对象进行识别后确定的,所述第二类对象为需保留的图像元素;
    在所述已训练的信息传播模型中,对所述对象初始掩膜模板中的第二类对象进行对象轮廓补全处理,获得对象目标掩膜模板。
  4. 如权利要求3所述的方法,其中,基于所述第二修复图像,确定所述待处理图像对应的目标修复图像,包括:
    将所述对象初始掩膜模板与所述对象目标掩膜模板进行对比,获得第一对比结果,其中,所述第一对比结果用于表征所述第二类对象的轮廓是否一致;
    基于所述第一对比结果,对所述第二修复图像进行处理,获得所述目标修复图像。
  5. 如权利要求4所述的方法,其中,所述基于所述第一对比结果,对所述第二修复图像进行处理,获得所述目标修复图像,包括:
    若所述第一对比结果表征所述第二类对象轮廓不一致,则在所述第二修复图像中,对所述第二类对象对应的像素区域进行修复处理,获得第三修复图像,并将所述第三修复图像作为所述目标修复图像;
    若所述第一对比结果表征所述第二类对象的轮廓一致,则将所述第二修复图像作为所述目标修复图像。
  6. 如权利要求2或3所述的方法,其中,所述已训练的信息传播模型是通过如下方式训练获得的:
    根据训练样本数据集中的训练样本,对待训练的信息传播模型执行循环迭代训练,获得已训练的信息传播模型,其中,在一次循环迭代过程中执行以下操作:
    从所述训练样本数据集中选取训练样本;其中,所述训练样本包括:针对待修复的图像元素进行掩膜处理后的历史图像,以及所述历史图像中需保留的图像元素对应的对象历史掩膜模板;
    将所述训练样本输入所述信息传播模型,预测所述历史图像对应的预测修复图像,并基于所述预测修复图像中的预测模糊区域,生成图像预测掩膜模板,以及所述对象历史掩膜模板对应的对象预测掩膜模板;
    采用基于所述预测修复图像、所述图像预测掩膜模板以及所述对象预测掩膜模板构建的目标损失函数,对所述信息传播模型进行参数调整。
  7. 如权利要求6所述的方法,其中,所述训练样本还包括:所述历史图像对应的实际修复图像,与所述对象历史掩膜模板对应的对象实际掩膜模板;
    所述信息传播模型的目标损失函数是采用以下方式构建的:
    基于所述预测修复图像和所述实际修复图像构建第一类损失函数,基于所述图像预测掩膜模板和图像中间掩膜模板构建第二类损失函数,以及基于所述对象预测掩膜模板和所述对象实际掩膜模板构建第三类损失函数,其中,所述图像中间掩膜模板,是基于所述预测修复图像和所述实际修复图像确定的;
    基于所述第一类损失函数、所述第二类损失函数,以及所述第三类损失函数构建所述目标损失函数。
  8. 如权利要求7所述的方法,其中,所述第一类损失函数是通过如下方式确定的:
    基于所述预测修复图像和所述实际修复图像之间的图像差异像素值,确定第一子损失函数;
    基于所述预测修复图像和所述实际修复图像的第二对比结果,确定第二子损失函数,其中,所述第二对比结果用于表征所述预测修复图像和所述实际修复图像是否一致;
    基于所述第一子损失函数和所述第二子损失函数,确定所述第一类损失函数。
  9. 如权利要求8所述的方法,其中,所述第二类损失函数是通过如下方式确定的:
    基于所述图像预测掩膜模板和所述图像中间掩膜模板之间的掩膜差异像素值,确定第三子损失函数,并将所述第三子损失函数作为所述第二类损失函数。
  10. 如权利要求8或9所述的方法,其中,所述第三类损失函数是通过如下方式确定的:
    基于所述对象预测掩膜模板和所述历史对象实际掩膜模板之间的对象差异像素值,确定第四子损失函数;
    基于所述对象预测掩膜模板和所述历史对象实际掩膜模板之间的相似度,确定第五子损失函数;
    基于所述第四子损失函数和所述第五子损失函数,确定所述第三类损失函数。
  11. 如权利要求1~10任一所述的方法,其中,所述生成相应的图像初始掩膜模板后,还包括:
    当所述图像初始掩膜模板包含的初始模糊像素的第一数量小于第一阈值时,将所述第一修复图像作为所述待处理图像对应的目标修复图像。
  12. 如权利要求1~10任一所述的方法,其中,所述获得图像目标掩膜模板后,还包括:
    当所述图像目标掩膜模板包含的中间模糊像素的第二数量小于第二阈值时,将所述第一修复图像作为所述待处理图像对应的目标修复图像。
  13. 一种图像处理装置,所述装置包括:
    第一处理单元,配置为对获取的目标视频帧图像包含的第一类对象进行掩膜处理,获得掩膜处理后的待处理图像;所述第一类对象为待修复的图像元素;
    第二处理单元,配置为对所述待处理图像中所述第一类对象进行修复处理,获得第一修复图像,并基于所述第一修复图像中的初始模糊区域,生成相应的图像初始掩膜模板;
    第三处理单元,配置为当所述图像初始掩膜模板包含的初始模糊像素的第一数量达到第一阈值时,对所述初始模糊像素对应的初始模糊区域进行形态学处理,获得图像目标掩膜模板;
    第四处理单元,配置为当所述图像目标掩膜模板包含的中间模糊像素的第二数量达到第二阈值时,在所述第一修复图像中,对所述中间模糊像素对应的像素区域进行修复处理,获得第二修复图像;
    确定单元,配置为基于所述第二修复图像,确定所述待处理图像对应的目标修复图像。
  14. 一种电子设备,所述电子设备包括:存储器和处理器,其中:
    所述存储器,用于存储计算机程序;
    所述处理器,用于执行所述计算机程序,实现权利要求1~12任一所述方法的步骤。
  15. 一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,当所述计算机程序被处理器执行时,实现如权利要求1~12任一所述方法的步骤。
  16. 一种计算机程序产品,所述计算机程序产品包括计算机程序,计算机程序存储在计算机可读存储介质中;当所述计算机程序被处理器执行时,实现如权利要求1~12任一所述方法的步骤。
PCT/CN2023/105718 2022-08-26 2023-07-04 图像处理方法、装置、设备、存储介质及程序产品 WO2024041235A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211029204.9A CN117011156A (zh) 2022-08-26 2022-08-26 图像处理方法、装置、设备及存储介质
CN202211029204.9 2022-08-26

Publications (1)

Publication Number Publication Date
WO2024041235A1 true WO2024041235A1 (zh) 2024-02-29

Family

ID=88562459

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/105718 WO2024041235A1 (zh) 2022-08-26 2023-07-04 图像处理方法、装置、设备、存储介质及程序产品

Country Status (2)

Country Link
CN (1) CN117011156A (zh)
WO (1) WO2024041235A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117333403B (zh) * 2023-12-01 2024-03-29 合肥金星智控科技股份有限公司 图像增强方法、存储介质和图像处理系统

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102230361B1 (ko) * 2019-09-18 2021-03-23 고려대학교 산학협력단 단일 이미지를 이용하는 배경이미지 복원장치 및 그 동작 방법
CN113888431A (zh) * 2021-09-30 2022-01-04 Oppo广东移动通信有限公司 图像修复模型的训练方法、装置、计算机设备及存储介质
CN114022497A (zh) * 2021-09-30 2022-02-08 泰康保险集团股份有限公司 一种图像处理方法及装置
CN114170112A (zh) * 2021-12-17 2022-03-11 中国科学院自动化研究所 一种修复图像的方法、装置以及存储介质

Also Published As

Publication number Publication date
CN117011156A (zh) 2023-11-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23856325

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023856325

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2023856325

Country of ref document: EP

Effective date: 20240531