WO2023060918A1 - Image anonymization method guided by semantic and pose graphs - Google Patents

Image anonymization method guided by semantic and pose graphs

Info

Publication number
WO2023060918A1
WO2023060918A1 (application PCT/CN2022/097530)
Authority
WO
WIPO (PCT)
Prior art keywords
semantic
image
pose
graph
anonymization
Prior art date
Application number
PCT/CN2022/097530
Other languages
English (en)
French (fr)
Inventor
张继东
吕超
曹靖城
吴宇松
Original Assignee
Tianyi Digital Life Technology Co., Ltd. (天翼数字生活科技有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Digital Life Technology Co., Ltd.
Publication of WO2023060918A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 - General purpose image data processing
    • G06T 1/0021 - Image watermarking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/10 - Segmentation; Edge detection
    • G06T 7/11 - Region-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20212 - Image combination
    • G06T 2207/20221 - Image fusion; Image merging
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30196 - Human being; Person
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30196 - Human being; Person
    • G06T 2207/30201 - Face

Definitions

  • The invention relates to the field of video applications, and mainly to anonymizing pictures in that field.
  • Video surveillance cameras have evolved from the initial closed-circuit television surveillance systems (the first generation of analog TV surveillance), through the PC plug-in-card video surveillance systems of the semi-digital era, to the current digital age dominated by network video surveillance systems that are built on embedded technology, rely on network and communication technologies as a platform, and feature intelligent image analysis.
  • Current intelligent video analysis technology mainly analyzes real-time video images to provide early warning.
  • The growth of network communication has made users pay ever more attention to personal privacy, and pictures, as a rich information carrier, are especially sensitive for users.
  • Early image anonymization work simply masked, obfuscated, or pixelated sensitive information. Although these methods are easy to use, they are essentially ineffective against today's popular deep-learning recognition methods.
  • Researchers have therefore gradually proposed more complex and effective methods, such as using the k-same algorithm for face anonymization and using the generative adversarial network (GAN) framework to achieve image anonymization.
  • the patent "Method for Protecting Face Anonymity and Privacy Based on Generative Adversarial Network” discloses a method for protecting face anonymity and privacy based on Generative Adversarial Network.
  • the invention first preprocesses the face image data; then builds a generative confrontation network structure; then establishes the objective function of face area anonymity; then establishes the objective function of scene content area preservation; Combination; Finally, the public data set is used for training and testing, and the final result is output.
  • This method replaces the synthetic face in the face area of the image to achieve the effect of face anonymity. Compared with the previous mosaic occlusion method, it is more efficient and more visually friendly.
  • this method only replaces the face, and does not process body parts other than the face and other scenes on the picture.
  • this method still has risks in terms of user privacy.
  • this method relies on the accuracy of face detection, and there is a possibility of anonymization failure.
  • the patent "Privacy Protection Method for Visual Pictures of Service Robots Based on Generative Adversarial Networks” discloses a privacy protection method for visual pictures of service robots based on generative adversarial networks. Data preprocessing, and then the privacy identification module determines whether the input preprocessing data has privacy. If it is determined that it is a picture involving privacy, the picture is converted into picture data that does not involve privacy and stored; training data growth and feature learning are It is used to update the training data set, and based on the training data set, the feature model is obtained through the improved Cycle-GAN algorithm for the image conversion.
  • the present invention can make the image data itself not involve privacy content from the source, but this invention directly uses Cycle-GAN to migrate the original image, which lacks a fixed guidance mechanism, which may lead to large differences in styles between different processing results, which is not suitable for Used as training and testing data.
  • Targeting video surveillance scenarios, the present invention uses a guided adversarial generative network to achieve global anonymization of pictures and to protect user privacy to the greatest extent.
  • The present invention preserves the usability of the picture data as far as possible, satisfying both user privacy protection and practical development needs.
  • According to one embodiment, a picture anonymization method guided by a semantic graph and a pose graph is disclosed, including: performing semantic segmentation on the original picture to obtain a semantic graph; using an image semantic anonymization adversarial generative network to generate, under the guidance of the semantic graph, a scene graph with the same semantics as the original picture but different content; using the portrait part of the semantic graph as a mask to cut a portrait image out of the original picture; extracting and estimating the pose of the person in the portrait image to generate a pose graph; using a person pose anonymization adversarial generative network to generate, under the guidance of the pose graph, a new portrait image with the same pose as the portrait image but a different person; and superimposing the scene graph and the new portrait image according to the semantic graph to obtain a final anonymized picture.
  • According to one embodiment, a picture anonymization system guided by a semantic graph and a pose graph is disclosed, including a picture semantic anonymization module, a person pose anonymization module, and a superposition module.
  • The picture semantic anonymization module is configured to: perform semantic segmentation on the original picture to obtain a semantic graph; and use the image semantic anonymization adversarial generative network to generate, under the guidance of the semantic graph, a scene graph with the same semantics as the original picture but different content.
  • The person pose anonymization module is configured to: use the portrait part of the semantic graph as a mask to cut a portrait image out of the original picture; extract and estimate the pose of the person in the portrait image to generate a pose graph; and use the person pose anonymization adversarial generative network to generate, under the guidance of the pose graph, a new portrait image with the same pose as the portrait image but a different person.
  • The superposition module is configured to superimpose the scene graph and the new portrait image according to the semantic graph to obtain the final anonymized picture.
  • According to another embodiment, a computing device for picture anonymization guided by a semantic graph and a pose graph is disclosed, including: a processor; and a memory storing instructions which, when executed by the processor, can perform the method described above.
  • FIG. 1 shows a block diagram of a picture anonymization system 100 guided by a semantic graph and a pose graph according to an embodiment of the present invention;
  • FIG. 2 shows a diagram 200 further describing the function of the picture semantic anonymization module 101 according to an embodiment of the present invention;
  • FIG. 3 shows a diagram of a multi-channel attention selection model 300 according to an embodiment of the present invention;
  • FIG. 4 shows a diagram 400 further describing the function of the person pose anonymization module 102 according to an embodiment of the present invention;
  • FIG. 5 shows a data flow diagram 500 of a picture anonymization process guided by a semantic graph and a pose graph according to an embodiment of the present invention;
  • FIG. 6 shows a flow chart of a picture anonymization method 600 guided by a semantic graph and a pose graph according to an embodiment of the present invention; and
  • FIG. 7 shows a block diagram 700 of an exemplary computing device according to an embodiment of the present invention.
  • The present invention uses semantic-graph guidance and pose-graph guidance to globally anonymize the user's original picture, which both guarantees that user privacy is not leaked and preserves the original semantic information of the picture and the pose information of the persons in it.
  • The present invention can provide usable training data for developing and optimizing humanoid detection, motion detection, and other AI algorithm models that do not depend heavily on faces, and can also provide users with an active anonymizing-encryption privacy-protection mechanism.
  • FIG. 1 shows a block diagram of a system 100 for picture anonymization guided by a semantic graph and a pose graph according to an embodiment of the present invention.
  • The system 100 is divided into modules, which communicate and exchange data with one another in manners known in the art.
  • Each module may be implemented in software, hardware, or a combination of the two.
  • The system 100 includes a picture semantic anonymization module 101, a person pose anonymization module 102, and a superposition module 103.
  • The picture semantic anonymization module 101 is configured to first perform semantic segmentation on the picture to obtain a semantic graph, and then use an adversarial generative network to generate, under the guidance of the semantic graph, a scene graph with the same semantics but different content.
  • The person pose anonymization module 102 is configured to further guide the generation of the persons in the picture on the basis of the picture semantic anonymization module 101: it first performs pose estimation on the persons to obtain a pose graph of human-body keypoints, and then uses an adversarial generative network, guided by the pose graph, to generate a new portrait image with the same pose but a different person.
  • The superposition module 103 is configured to superimpose the scene graph generated by the picture semantic anonymization module 101 and the new portrait image generated by the person pose anonymization module 102 according to the semantic graph, to obtain the final anonymized picture.
  • The position of the persons in the picture can be obtained through semantic segmentation, and this information is used to perform the final superposition.
  • The cameras used in the intelligent video surveillance technology involved in the present invention generally refer to home cameras in the smart-home field, surveillance probes in the smart-city field, and camera devices generally installed in public places for surveillance purposes.
  • Such surveillance devices can photograph and record the scene, and either store the acquired image data locally for subsequent processing or send the data to a remote device (for example, a smart-home control platform, a central control platform, or another computing device) for processing.
  • This document does not restrict the connection and communication between the surveillance device and the remote device; any of the various manners known in the art may be used.
  • The system 100 can be implemented in the surveillance device, or on the remote device.
  • Alternatively, one or more modules of the system 100 may be implemented separately in the surveillance device and the remote device.
  • FIG. 2 shows a diagram 200 further describing the function of the picture semantic anonymization module 101 according to an embodiment of the present invention.
  • The picture semantic anonymization module 101 is configured to implement three stages: semantic segmentation, semantic-guided reconstruction, and picture optimization.
  • In the semantic segmentation stage, an encoder-decoder (autoencoder) built with ShuffleNet as the backbone network serves as the semantic generator, inferring the scene semantic map Sg from the input original picture Ig.
  • In the context of the invention, the semantic-guided reconstruction stage and the picture optimization stage together logically/functionally constitute a cascaded, semantic-guided image semantic anonymization adversarial generative network based on a multi-channel attention selection mechanism.
  • The semantic-guided reconstruction stage uses cascaded semantic guidance to produce a coarse-grained image semantic anonymization result, while the picture optimization stage produces a more detailed result through the multi-channel attention selection mechanism.
  • In the semantic-guided reconstruction stage, a target texture picture Ir is randomly selected from the scene-texture picture library as the condition image, and the randomly selected target texture picture Ir is concatenated with the scene semantic map Sg obtained in the semantic segmentation stage.
  • The concatenated result is fed into the generator Gi to infer the generated image I′g, where the generator Gi is a U-Net model built on RefineNet; during training, the generator Gi is optimized through a loss function between the semantic map S′g of the generated image I′g and the original scene semantic map Sg.
  • L1-L4 are the four components used when computing the loss function.
  • The picture optimization stage uses a multi-channel attention selection model to optimize the picture I′g generated in the previous stage, obtaining the final scene graph I″g.
  • The purpose of the multi-channel attention selection model is to produce finer-grained results from a larger generation space, and to generate an uncertainty map to guide the optimization of the pixel loss.
  • FIG. 3 shows a diagram of a multi-channel attention selection model 300 according to an embodiment of the present invention.
  • The multi-channel attention selection model 300 includes a multi-scale spatial pooling part and a multi-channel attention selection part.
  • The multi-scale spatial pooling part performs global average pooling on the same input features with a set of different sizes and strides, obtaining multi-scale features with different receptive fields that perceive different spatial contexts.
  • The multi-channel attention selection part generates a series of different intermediate pictures and combines them into the final output.
  • The multi-channel attention selection model 300 concatenates the condition image Ir, the generated image I′g, and the feature maps output by the last convolutional layers of the generator Gi and of the semantic segmentation stage as the feature input to the multi-scale spatial pooling part.
  • The multi-scale spatial pooling part applies average pooling at different scales to obtain multi-scale spatial context features.
  • To retain useful information, the features pooled at different scales are multiplied by the input features, and the result is convolved to produce new multi-scale features used as the input of the multi-channel attention selection part.
  • The multi-channel attention selection part enlarges the channel representation of the image through a convolutional network and combines it with attention maps to produce more reasonable results.
  • Specifically, the multi-channel attention selection model 300 concatenates the condition image Ir, the generated image I′g, and the feature maps Fi and Fs output by the last convolutional layers of the generator Gi and of the semantic segmentation stage as the feature input to the multi-scale spatial pooling part, and the resulting multi-scale features serve as the input of the multi-channel attention selection part.
  • The multi-channel attention selection part enlarges the channel representation of the image through a convolutional network, where the intermediate pictures and the corresponding attention maps are computed as shown in formula (1), which is published as an image in the original document.
  • Meanwhile, by learning uncertainty maps, the pixel-level loss optimization can be made more robust.
  • The picture semantic anonymization module 101 trains the generator Gi and the multi-channel attention selection model 300 on the indoor09 indoor-scene dataset.
  • FIG. 4 shows a diagram 400 further describing the function of the person pose anonymization module 102 according to an embodiment of the present invention.
  • The person pose anonymization module 102 is implemented similarly to the picture semantic anonymization module 101, except that a pose graph extracted by the openPose model replaces the semantic graph, and a picture randomly selected from a public portrait-picture dataset serves as the condition image.
  • The person pose anonymization module 102 is trained on the CUHK03 humanoid dataset.
  • The person pose anonymization module 102 is configured to implement three stages: pose estimation, pose-guided reconstruction, and picture optimization.
  • In the pose estimation stage, the portrait part of the semantic graph obtained by the picture semantic anonymization module 101 is used as a mask to cut the original portrait image Ig out of the original input picture, and the openPose model extracts and estimates the pose of the person in the original portrait image Ig to generate the pose graph Sg.
  • In the context of the invention, the pose-guided reconstruction stage and the picture optimization stage together logically/functionally constitute a cascaded, pose-guided person pose anonymization adversarial generative network based on a multi-channel attention selection mechanism.
  • The pose-guided reconstruction stage produces a coarse-grained person pose anonymization result, while the picture optimization stage produces a more detailed result through the multi-channel attention selection mechanism.
  • In the pose-guided reconstruction stage, a portrait picture Ir is randomly selected from the portrait-picture dataset as the condition image, and the randomly selected portrait picture Ir is concatenated with the pose graph Sg obtained in the pose estimation stage.
  • The concatenated result is fed into the generator Gi to infer the generated image I′g, where the generator Gi is a U-Net model built on RefineNet.
  • During training, the generator Gi is optimized through a loss function between the pose graph S′g of the generated image I′g and the original pose graph Sg.
  • L1-L4 are the four components used when computing the loss function.
  • The picture optimization stage uses the multi-channel attention selection model to optimize the picture I′g generated in the previous stage, obtaining the final portrait image I″g.
  • For the multi-channel attention selection model, see the description of FIG. 3 above.
  • FIG. 5 shows a data flow diagram 500 of a picture anonymization process guided by a semantic graph and a pose graph according to an embodiment of the present invention.
  • The data flow diagram 500 can be divided into a picture semantic anonymization stage 501, a person pose anonymization stage 502, and a superposition stage 503.
  • In the picture semantic anonymization stage 501, the input picture is semantically segmented to form a semantic graph, and the semantic graph is passed through the image semantic anonymization adversarial generative network described above to generate a scene graph.
  • Once the semantic graph has been formed, the person pose anonymization stage 502 can start.
  • In stage 502, the portrait part of the semantic graph is first used as a mask to cut the original portrait image out of the input picture, and pose extraction and estimation are performed on this original portrait image to generate a pose graph.
  • The pose graph is passed through the person pose anonymization adversarial generative network described above to generate a new portrait image.
  • After stages 501 and 502 are complete, the superposition stage 503 can start.
  • In stage 503, the scene graph and the new portrait image are superimposed according to the semantic graph to form the anonymized picture for output.
  • FIG. 6 shows a flow chart of a picture anonymization method 600 guided by a semantic graph and a pose graph according to an embodiment of the present invention.
  • In step 601, semantic segmentation is performed on the original picture to obtain a semantic graph.
  • The original picture may be a picture taken by a surveillance camera, a frame from a video taken by a surveillance camera, or a picture selected by the user.
  • An encoder-decoder built with ShuffleNet as the backbone network serves as the semantic generator and infers the semantic graph from the original picture.
  • The semantic graph may indicate the position of the persons in the original picture.
  • In step 602, the image semantic anonymization adversarial generative network generates, under the guidance of the semantic graph, a scene graph with the same semantics as the original picture but different content.
  • The image semantic anonymization adversarial generative network includes a semantic-guided reconstruction stage and a picture optimization stage, where the semantic-guided reconstruction stage uses cascaded semantic guidance based on the semantic graph to produce a coarse-grained image semantic anonymization result, and the picture optimization stage optimizes that result through the multi-channel attention selection mechanism to obtain a finer-grained final scene graph.
  • In step 603, the portrait part of the semantic graph obtained in step 601 is used as a mask to cut the original portrait image out of the original picture.
  • In step 604, the pose of the person in the original portrait image is extracted and estimated to generate a pose graph.
  • The openPose model is used to extract and estimate the pose of the person in the original portrait image obtained in step 603 to generate the pose graph.
  • In step 605, the person pose anonymization adversarial generative network generates, under the guidance of the pose graph, a new portrait image with the same pose as the original portrait image but a different person.
  • The person pose anonymization adversarial generative network includes a pose-guided reconstruction stage and a picture optimization stage, where the pose-guided reconstruction stage uses cascaded pose guidance based on the pose graph to produce a coarse-grained person pose anonymization result, and the picture optimization stage optimizes that result through the multi-channel attention selection mechanism to obtain a finer-grained final portrait image.
  • In step 606, the scene graph generated in step 602 and the new portrait image generated in step 605 are superimposed according to the semantic graph obtained in step 601 to obtain the final anonymized picture.
  • The position of the persons in the original picture can be obtained through semantic segmentation, and this information is used to superimpose the scene graph and the new portrait image.
  • Compared with the prior art, the main advantages of the present invention are: (1) the picture is anonymized globally, so that only the abstract semantic graph and person pose graph of the original picture are retained in the generated picture, while faces, bodies, and background are all completely replaced, minimizing the risk of privacy leakage; (2) on the basis of complete anonymization, the original semantic information, person pose information, and object motion information of the picture are preserved, providing a large amount of usable training data for developing and optimizing non-identity-verification AI algorithm models such as humanoid detection and motion detection; and (3) a multi-channel attention model further optimizes the initial output of the adversarial generative network, so that the output pictures are of higher quality.
  • In practical applications, the invention has further advantages; for example, compared with online try-on applications using similar technology, it can not only adjust clothing but also replace face information and background information, protecting user privacy to the greatest extent.
  • FIG. 7 illustrates a block diagram 700 of an exemplary computing device according to one embodiment of the invention; the computing device is an example of a hardware device to which aspects of the invention can be applied.
  • The surveillance device, remote device, and computing device associated with the user mentioned above can all be implemented as the computing device in FIG. 7.
  • Computing device 700 can be any machine configurable to perform processing and/or computing, including but not limited to a workstation, server, desktop computer, laptop computer, tablet computer, personal digital assistant, smartphone, on-board computer, or any combination thereof.
  • Computing device 700 may include components that may be connected or communicate via one or more interfaces and bus 702 .
  • computing device 700 may include a bus 702 , one or more processors 704 , one or more input devices 706 , and one or more output devices 708 .
  • the one or more processors 704 may be any type of processor and may include, but is not limited to, one or more general purpose processors and/or one or more special purpose processors (eg, dedicated processing chips).
  • Input device 706 may be any type of device capable of entering information into a computing device and may include, but is not limited to, a mouse, keyboard, touch screen, microphone, and/or remote control.
  • Output devices 708 may be any type of device capable of presenting information and may include, but are not limited to, displays, speakers, video/audio output terminals, vibrators, and/or printers.
  • The computing device 700 may also include, or be connected to, a non-transitory storage device 710.
  • The non-transitory storage device may be any storage device that is non-transitory and capable of storing data, and may include, but is not limited to, a disk drive, optical storage device, solid-state memory, floppy disk, flexible disk, hard disk, magnetic tape or any other magnetic medium, optical disc or any other optical medium, ROM (read-only memory), RAM (random-access memory), cache memory, and/or any memory chip or cartridge, and/or any other medium from which a computer can read data, instructions, and/or code.
  • The non-transitory storage device 710 may be detachable from an interface.
  • The non-transitory storage device 710 may hold data/instructions/code for implementing the methods and steps described above.
  • Computing device 700 may also include a communication device 712 .
  • Communication device 712 may be any type of device or system enabling communication with internal apparatus and/or with a network, and may include, but is not limited to, a modem, network card, infrared communication device, wireless communication device, and/or chipset, such as a Bluetooth device, an IEEE 802.11 device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.
  • Bus 702 may include, but is not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
  • Computing device 700 may also include working memory 714, which may be any type of working memory capable of storing instructions and/or data useful to the work of processor 704, and may include, but is not limited to, random-access memory and/or a read-only storage device.
  • Software components may be located in the working memory 714, including but not limited to an operating system 716, one or more application programs 718, drivers, and/or other data and code.
  • Instructions for implementing the above methods and steps of the present invention may be included in the one or more application programs 718, and the method 600 described above may be realized by the processor 704 reading and executing the instructions of the one or more application programs 718.
  • Custom hardware could also be used, and/or particular components could be implemented in hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof.
  • In addition, connections to other computing devices, such as network input/output devices, may be employed.
  • For example, some or all of the disclosed methods and devices may be implemented by programming hardware (for example, programmable logic circuits including field-programmable gate arrays (FPGAs) and/or programmable logic arrays (PLAs)) in an assembly language or a hardware programming language (for example, VERILOG, VHDL, C++).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention relates to a picture anonymization method guided by semantic and pose graphs. The invention further relates to a picture anonymization system (100) guided by a semantic graph and a pose graph. In the system, a picture semantic anonymization module (101) is configured to first perform semantic segmentation on the picture to obtain a semantic graph, and then use an adversarial generative network to generate, under the guidance of the semantic graph, a scene graph with the same semantics but different content. A person pose anonymization module (102) is configured to further guide the generation of the persons in the picture on the basis of the picture semantic anonymization module: it first performs pose estimation on the persons to obtain a pose graph, and then uses an adversarial generative network to generate, under the guidance of the pose graph, a new portrait image with the same pose but a different person. A superposition module (103) is configured to superimpose the scene graph generated by the picture semantic anonymization module (101) and the new portrait image generated by the person pose anonymization module (102) according to the semantic graph, to obtain the final anonymized picture.

Description

Image anonymization method guided by semantic and pose graphs

Technical Field

The present invention relates to the field of video applications, and mainly to anonymizing pictures in that field.

Background Art

Video surveillance cameras have evolved from the initial closed-circuit television surveillance systems (the first generation of analog TV surveillance), through the PC plug-in-card video surveillance systems of the semi-digital era, to the current digital age dominated by network video surveillance systems that are built on embedded technology, rely on network and communication technologies as a platform, and feature intelligent image analysis.

With the development and continuous progress of machine learning and artificial intelligence, applications of intelligent video surveillance technology have become increasingly common. Current intelligent video analysis technology mainly analyzes real-time video images to provide early warning. The growth of network communication has made users pay ever more attention to personal privacy, and pictures, as a rich information carrier, are especially sensitive for users. Early image anonymization work simply masked, obfuscated, or pixelated sensitive information. Although these methods are easy to use, they are essentially ineffective against today's popular deep-learning recognition methods. In recent years, researchers have gradually proposed more complex and effective methods, for example using the k-same algorithm for face anonymization, or using the generative adversarial network (GAN) framework to achieve image anonymization.

The patent "Face Anonymity and Privacy Protection Method Based on Generative Adversarial Networks" (CN111242837A) discloses a method for protecting face anonymity and privacy based on a generative adversarial network. The invention first preprocesses the face image data; then builds a generative adversarial network structure; then establishes the objective function for face-region anonymity; then establishes the objective function for preserving the scene content region; then combines the face-anonymity and scene-preservation objective functions; and finally uses public datasets for training and testing and outputs the final result. This method replaces the face region of the image with a synthesized face to achieve face anonymity, and compared with earlier mosaic-occlusion methods it is more efficient and visually friendlier. However, the method only replaces the face and does not process body parts other than the face or other scene content in the picture, so for anonymizing pictures of home indoor scenes it still carries user-privacy risks. The method also relies on the accuracy of face detection, so anonymization may fail.

The patent "Privacy Protection Method for Service Robot Visual Pictures Based on Generative Adversarial Networks" (CN110363183A) discloses a privacy protection method for the visual pictures of service robots based on generative adversarial networks. Data collected by the visual acquisition end is first preprocessed; a privacy-recognition module then determines whether the preprocessed input involves privacy; if a picture is judged to involve privacy, it is converted into picture data that does not involve privacy and stored. Training-data growth and feature learning are used to update the training dataset, and based on that dataset a feature model is obtained through an improved Cycle-GAN algorithm and used for the picture conversion. This invention can ensure at the source that the picture data itself involves no private content, but it applies Cycle-GAN directly to migrate the original picture and lacks a fixed guidance mechanism, which may cause large style differences between different processing results and makes them unsuitable as training and testing data.

Therefore, an improved technique is needed that anonymizes pictures while preserving the original semantic information of the picture and the pose information of the persons in it.
Summary of the Invention

This summary is provided to introduce, in simplified form, a selection of concepts that are further described in the detailed description below. It is not intended to identify key or essential features of the claimed subject matter, nor to be used to help determine the scope of the claimed subject matter.

Targeting video surveillance scenarios, the present invention uses a guided adversarial generative network to achieve global anonymization of pictures and to protect user privacy to the greatest extent. In addition, the present invention preserves the usability of the picture data as far as possible, satisfying both user privacy protection and practical development needs.
According to one embodiment of the present invention, a picture anonymization method guided by a semantic graph and a pose graph is disclosed, including: performing semantic segmentation on an original picture to obtain a semantic graph; using an image semantic anonymization adversarial generative network to generate, under the guidance of the semantic graph, a scene graph with the same semantics as the original picture but different content; using the portrait part of the semantic graph as a mask to cut a portrait image out of the original picture; extracting and estimating the pose of the person in the portrait image to generate a pose graph; using a person pose anonymization adversarial generative network to generate, under the guidance of the pose graph, a new portrait image with the same pose as the portrait image but a different person; and superimposing the scene graph and the new portrait image according to the semantic graph to obtain the final anonymized picture.
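For orientation, a minimal end-to-end sketch of this pipeline follows. It is an illustration only: the five callables are hypothetical placeholders for the trained networks described below, and `person_id` (the label of the portrait class in the semantic graph) is an assumption.

```python
import numpy as np

def anonymize(img, segment, scene_gan, estimate_pose, pose_gan, person_id=1):
    """img: (H, W, 3) array; the callables stand in for the trained models."""
    sem = segment(img)                          # semantic graph of the original picture
    scene = scene_gan(img, sem)                 # same semantics, different content
    mask = (sem == person_id)[..., None]        # portrait part of the semantic graph
    portrait = np.where(mask, img, 0)           # portrait image cut out by the mask
    pose = estimate_pose(portrait)              # pose graph of the person
    new_portrait = pose_gan(portrait, pose)     # same pose, different person
    return np.where(mask, new_portrait, scene)  # superimpose per the semantic graph
```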
According to one embodiment of the present invention, a picture anonymization system guided by a semantic graph and a pose graph is disclosed, including a picture semantic anonymization module, a person pose anonymization module, and a superposition module. The picture semantic anonymization module is configured to: perform semantic segmentation on an original picture to obtain a semantic graph; and use an image semantic anonymization adversarial generative network to generate, under the guidance of the semantic graph, a scene graph with the same semantics as the original picture but different content. The person pose anonymization module is configured to: use the portrait part of the semantic graph as a mask to cut a portrait image out of the original picture; extract and estimate the pose of the person in the portrait image to generate a pose graph; and use a person pose anonymization adversarial generative network to generate, under the guidance of the pose graph, a new portrait image with the same pose as the portrait image but a different person. The superposition module is configured to superimpose the scene graph and the new portrait image according to the semantic graph to obtain the final anonymized picture.

According to another embodiment of the present invention, a computing device for picture anonymization guided by a semantic graph and a pose graph is disclosed, including: a processor; and a memory storing instructions which, when executed by the processor, can perform the method described above.

These and other features and advantages will become apparent from reading the following detailed description and examining the associated drawings. It should be understood that the foregoing general description and the following detailed description are explanatory only and do not restrict the aspects as claimed.
Brief Description of the Drawings

So that the manner in which the above features of the present invention can be understood in detail, the content briefly summarized above may be described more specifically with reference to embodiments, some aspects of which are illustrated in the accompanying drawings. It should be noted, however, that the drawings show only certain typical aspects of the invention and should therefore not be regarded as limiting its scope, as the description may admit other equally effective aspects.

FIG. 1 shows a block diagram of a system 100 for picture anonymization guided by a semantic graph and a pose graph according to an embodiment of the present invention;

FIG. 2 shows a diagram 200 further describing the function of the picture semantic anonymization module 101 according to an embodiment of the present invention;

FIG. 3 shows a diagram of a multi-channel attention selection model 300 according to an embodiment of the present invention;

FIG. 4 shows a diagram 400 further describing the function of the person pose anonymization module 102 according to an embodiment of the present invention;

FIG. 5 shows a data flow diagram 500 of a picture anonymization process guided by a semantic graph and a pose graph according to an embodiment of the present invention;

FIG. 6 shows a flow chart of a picture anonymization method 600 guided by a semantic graph and a pose graph according to an embodiment of the present invention; and

FIG. 7 shows a block diagram 700 of an exemplary computing device according to an embodiment of the present invention.

Detailed Description
The present invention is described in detail below with reference to the accompanying drawings; its features will become further apparent from the following detailed description.

User demands in the home-camera field are becoming ever richer, and the accuracy of many AI functions depends on the richness of the related picture and video training data. Although a large amount of highly valuable real data accumulates while users use these products, for privacy-protection and other reasons none of it can be used in actual development. The contradiction between privacy protection and the shortage of model training data has long troubled developers.

The present invention uses semantic-graph guidance and pose-graph guidance to globally anonymize the user's original picture, which both guarantees that user privacy is not leaked and preserves the original semantic information of the picture and the pose information of the persons in it. The invention can provide usable training data for developing and optimizing humanoid detection, motion detection, and other AI algorithm models that do not depend heavily on faces, and can also provide users with an active anonymizing-encryption privacy-protection mechanism.
FIG. 1 shows a block diagram of a system 100 for picture anonymization guided by a semantic graph and a pose graph according to an embodiment of the present invention. As shown in FIG. 1, the system 100 is divided into modules, which communicate and exchange data with one another in manners known in the art. In the present invention, each module may be implemented in software, hardware, or a combination of the two. The system 100 includes a picture semantic anonymization module 101, a person pose anonymization module 102, and a superposition module 103.

According to an embodiment of the present invention, the picture semantic anonymization module 101 is configured to first perform semantic segmentation on the picture to obtain a semantic graph, and then use an adversarial generative network to generate, under the guidance of the semantic graph, a scene graph with the same semantics but different content.

According to an embodiment of the present invention, the person pose anonymization module 102 is configured to further guide the generation of the persons in the picture on the basis of the picture semantic anonymization module 101: it first performs pose estimation on the persons to obtain a pose graph of human-body keypoints, and then uses an adversarial generative network to generate, under the guidance of the pose graph, a new portrait image with the same pose but a different person.

According to an embodiment of the present invention, the superposition module 103 is configured to superimpose the scene graph generated by the picture semantic anonymization module 101 and the new portrait image generated by the person pose anonymization module 102 according to the semantic graph, to obtain the final anonymized picture. Semantic segmentation yields the position of the persons in the picture, and this information is used to perform the final superposition.

Those skilled in the art will appreciate that the cameras used in the intelligent video surveillance technology involved in the present invention generally refer to home cameras in the smart-home field, surveillance probes in the smart-city field, and camera devices generally installed in public places for surveillance purposes. Such surveillance devices can photograph and record the scene, and either store the acquired image data locally for subsequent processing or send the data to a remote device (for example, a smart-home control platform, a central control platform, or another computing device) for processing. This document does not restrict the connection and communication between the surveillance device and the remote device; any of the various manners known in the art may be used. According to an embodiment of the present invention, the system 100 may be implemented in the surveillance device or on the remote device. According to another embodiment of the present invention, one or more modules of the system 100 may be implemented separately in the surveillance device and the remote device.
FIG. 2 shows a diagram 200 further describing the function of the picture semantic anonymization module 101 according to an embodiment of the present invention. The picture semantic anonymization module 101 is configured to implement three stages: semantic segmentation, semantic-guided reconstruction, and picture optimization.

As shown in FIG. 2, in the semantic segmentation stage an encoder-decoder (autoencoder) built with ShuffleNet as the backbone network serves as the semantic generator, inferring the scene semantic map Sg from the input original picture Ig.
In the context of the present invention, the semantic-guided reconstruction stage and the picture optimization stage together logically/functionally constitute a cascaded, semantic-guided image semantic anonymization adversarial generative network based on a multi-channel attention selection mechanism. Within this network, the semantic-guided reconstruction stage uses cascaded semantic guidance to produce a coarse-grained image semantic anonymization result, while the picture optimization stage produces a more detailed result through the multi-channel attention selection mechanism.

In the semantic-guided reconstruction stage, a target texture picture Ir is randomly selected from a scene-texture picture library as the condition image, the randomly selected target texture picture Ir is concatenated with the scene semantic map Sg obtained in the semantic segmentation stage, and the concatenated result is fed into the generator Gi to infer the generated image I′g, where the generator Gi is a U-Net model built on RefineNet. During training, the generator Gi is optimized through a loss function between the semantic map S′g of the generated image I′g and the original scene semantic map Sg. Here, L1-L4 are the four components used when computing the loss function.
The picture optimization stage uses a multi-channel attention selection model to optimize the picture I′g generated in the previous stage, to obtain the final scene graph I″g. The purpose of the multi-channel attention selection model is to produce finer-grained results from a larger generation space, and to generate an uncertainty map to guide the optimization of the pixel loss. FIG. 3 shows a diagram of a multi-channel attention selection model 300 according to an embodiment of the present invention.

The multi-channel attention selection model 300 includes a multi-scale spatial pooling part and a multi-channel attention selection part. The multi-scale spatial pooling part performs global average pooling on the same input features with a set of different sizes and strides, obtaining multi-scale features with different receptive fields that perceive different spatial contexts. The multi-channel attention selection part generates a series of different intermediate pictures and combines them into the final output.

Referring to FIG. 2 and FIG. 3, the multi-channel attention selection model 300 concatenates the condition image Ir, the generated image I′g, and the feature maps output by the last convolutional layers of the generator Gi and of the semantic segmentation stage as the feature input to the multi-scale spatial pooling part, which applies average pooling at different scales to obtain multi-scale spatial context features. To retain useful information, the features pooled at different scales are multiplied by the input features, and the result is convolved to produce new multi-scale features that serve as the input to the multi-channel attention selection part. The multi-channel attention selection part enlarges the channel representation of the image through a convolutional network and combines it with attention maps to produce more reasonable results.
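A compact sketch of the two parts follows, assuming a PyTorch implementation; the pool scales, channel counts, and number of intermediate pictures N are not given in the patent and are assumptions here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleSpatialPooling(nn.Module):
    """Average-pool the input features at several scales, multiply the
    upsampled pooled features back onto the input (retaining useful
    information), and fuse the result with a convolution."""
    def __init__(self, in_ch: int, scales=(2, 4, 8)):  # scales are assumptions
        super().__init__()
        self.scales = scales
        self.fuse = nn.Conv2d(in_ch * len(scales), in_ch, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outs = []
        for s in self.scales:
            p = F.avg_pool2d(x, kernel_size=s, stride=s)
            p = F.interpolate(p, size=x.shape[2:], mode="bilinear",
                              align_corners=False)
            outs.append(x * p)
        return self.fuse(torch.cat(outs, dim=1))

class MultiChannelAttentionSelection(nn.Module):
    """Produce N intermediate pictures and N attention maps from the
    multi-scale features, then combine them by attention-weighted summation."""
    def __init__(self, in_ch: int, n: int = 10):       # N = 10 is an assumption
        super().__init__()
        self.n = n
        self.to_images = nn.Conv2d(in_ch, 3 * n, 3, padding=1)
        self.to_attn = nn.Conv2d(in_ch, n, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        b, _, h, w = feats.shape
        imgs = torch.tanh(self.to_images(feats)).view(b, self.n, 3, h, w)
        attn = torch.softmax(self.to_attn(feats), dim=1).view(b, self.n, 1, h, w)
        return (imgs * attn).sum(dim=1)                # final picture I''g
```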
Specifically, with further reference to FIG. 2 and FIG. 3, the multi-channel attention selection model 300 concatenates the condition image Ir, the generated image I′g, and the feature maps Fi and Fs output by the last convolutional layers of the generator Gi and of the semantic segmentation stage as the feature input to the multi-scale spatial pooling part, and the resulting multi-scale features serve as the input to the multi-channel attention selection part. The multi-channel attention selection part enlarges the channel representation of the image through a convolutional network, where the intermediate pictures and the corresponding attention maps are computed as shown in formula (1). [Formula (1) is published as an image in the original document.]
Finally, the learned attention maps are used to select among the intermediate pictures, as computed in formula (2). [Formula (2) is published as an image in the original document.]
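Because formulas (1) and (2) appear only as images in the publication, the following SelectionGAN-style formulation, consistent with the surrounding description, is offered as a plausible reconstruction rather than the patent's exact notation (here * denotes convolution, the softmax is taken across the N attention channels, and ⊙ is element-wise multiplication):

```latex
% Plausible reconstruction (an assumption); F denotes the multi-scale features.
\begin{aligned}
\text{(1)}\quad & I_G^{\,i} = \tanh\!\bigl(W_G^{\,i} * F + b_G^{\,i}\bigr),
\qquad A^{i} = \operatorname{softmax}_{i}\!\bigl(W_A^{\,i} * F + b_A^{\,i}\bigr),
\qquad i = 1,\dots,N,\\[2pt]
\text{(2)}\quad & I''_g = \sum_{i=1}^{N} A^{i} \odot I_G^{\,i}.
\end{aligned}
```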
Meanwhile, by learning uncertainty maps, the pixel-level loss optimization can be made more robust.
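The patent does not give the uncertainty formulation. One common choice, shown here purely as an assumption, lets a predicted per-pixel log-uncertainty map down-weight the pixel loss where the model is unsure, with a log term discouraging trivially large uncertainty:

```python
import torch

def uncertainty_weighted_l1(pred, target, log_sigma):
    # log_sigma: per-pixel log-uncertainty map predicted alongside the image.
    err = (pred - target).abs()
    return (err * torch.exp(-log_sigma) + log_sigma).mean()
```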
According to an embodiment of the present invention, the picture semantic anonymization module 101 trains the generator Gi and the multi-channel attention selection model 300 on the indoor09 indoor-scene dataset.

FIG. 4 shows a diagram 400 further describing the function of the person pose anonymization module 102 according to an embodiment of the present invention. As shown in FIG. 4, the person pose anonymization module 102 is implemented similarly to the picture semantic anonymization module 101, except that a pose graph extracted by the openPose model replaces the semantic graph, and a picture randomly selected from a public portrait-picture dataset serves as the condition image. The person pose anonymization module 102 is trained on the CUHK03 humanoid dataset.

Specifically, the person pose anonymization module 102 is configured to implement three stages: pose estimation, pose-guided reconstruction, and picture optimization.
As shown in FIG. 4, in the pose estimation stage the portrait part of the semantic graph obtained by the picture semantic anonymization module 101 is used as a mask to cut the original portrait image Ig out of the original input picture, and the openPose model extracts and estimates the pose of the person in the original portrait image Ig to generate the pose graph Sg.
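A sketch of this stage follows. The `person_id` label and the keypoint format are assumptions, and `run_openpose` in the comment stands in for whatever OpenPose binding is available; it is not a real API call.

```python
import numpy as np

def crop_portrait(img: np.ndarray, sem: np.ndarray, person_id: int = 1) -> np.ndarray:
    # The portrait part of the semantic graph acts as the mask.
    mask = (sem == person_id)[..., None]
    return np.where(mask, img, 0)          # original portrait image Ig

def pose_graph(keypoints: np.ndarray, h: int, w: int) -> np.ndarray:
    # keypoints: (K, 2) array of (x, y) pixel coordinates, e.g. produced by a
    # hypothetical run_openpose(Ig) wrapper. One dot per detected body keypoint.
    Sg = np.zeros((h, w), dtype=np.uint8)
    for x, y in keypoints.astype(int):
        if 0 <= x < w and 0 <= y < h:
            Sg[y, x] = 255
    return Sg
```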
In the context of the present invention, the pose-guided reconstruction stage and the picture optimization stage together logically/functionally constitute a cascaded, pose-guided person pose anonymization adversarial generative network based on a multi-channel attention selection mechanism. Within this network, the pose-guided reconstruction stage produces a coarse-grained person pose anonymization result, while the picture optimization stage produces a more detailed result through the multi-channel attention selection mechanism.

In the pose-guided reconstruction stage, a portrait picture Ir is randomly selected from the portrait-picture dataset as the condition image, the randomly selected portrait picture Ir is concatenated with the pose graph Sg obtained in the pose estimation stage, and the concatenated result is fed into the generator Gi to infer the generated image I′g, where the generator Gi is a U-Net model built on RefineNet. During training, the generator Gi is optimized through a loss function between the pose graph S′g of the generated image I′g and the original pose graph Sg. Here, L1-L4 are the four components used when computing the loss function.

The picture optimization stage uses the multi-channel attention selection model to optimize the picture I′g generated in the previous stage, to obtain the final portrait image I″g. For a detailed description of the multi-channel attention selection model, see the description of FIG. 3 above.
FIG. 5 shows a data flow diagram 500 of a picture anonymization process guided by a semantic graph and a pose graph according to an embodiment of the present invention. The data flow diagram 500 can be divided into a picture semantic anonymization stage 501, a person pose anonymization stage 502, and a superposition stage 503.

Referring to FIG. 5, in the picture semantic anonymization stage 501 the input picture is semantically segmented to form a semantic graph, and the semantic graph is passed through the image semantic anonymization adversarial generative network described above to generate a scene graph. Meanwhile, once the semantic graph has been formed, the person pose anonymization stage 502 can start: the portrait part of the semantic graph is first used as a mask to cut the original portrait image out of the input picture, pose extraction and estimation are performed on the original portrait image to generate a pose graph, and the pose graph is passed through the person pose anonymization adversarial generative network described above to generate a new portrait image. After the picture semantic anonymization stage 501 and the person pose anonymization stage 502 are complete, the superposition stage 503 can start: the scene graph generated in stage 501 and the new portrait image generated in stage 502 are superimposed according to the semantic graph to form the anonymized picture for output.
FIG. 6 shows a flow chart of a picture anonymization method 600 guided by a semantic graph and a pose graph according to an embodiment of the present invention.

In step 601, semantic segmentation is performed on the original picture to obtain a semantic graph. According to an embodiment of the present invention, the original picture may be a picture taken by a surveillance camera, a frame from a video taken by a surveillance camera, or a picture selected by the user. According to an embodiment, an encoder-decoder built with ShuffleNet as the backbone network serves as the semantic generator and infers the semantic graph from the original picture. According to an embodiment, the semantic graph may indicate the position of the persons in the original picture.

In step 602, the image semantic anonymization adversarial generative network generates, under the guidance of the semantic graph, a scene graph with the same semantics as the original picture but different content. According to an embodiment of the present invention, the image semantic anonymization adversarial generative network includes a semantic-guided reconstruction stage and a picture optimization stage, where the semantic-guided reconstruction stage uses cascaded semantic guidance based on the semantic graph to produce a coarse-grained image semantic anonymization result, and the picture optimization stage optimizes that result through the multi-channel attention selection mechanism to obtain a finer-grained final scene graph.

In step 603, the portrait part of the semantic graph obtained in step 601 is used as a mask to cut the original portrait image out of the original picture.

In step 604, the pose of the person in the original portrait image is extracted and estimated to generate a pose graph. According to an embodiment of the present invention, the openPose model is used to extract and estimate the pose of the person in the original portrait image obtained in step 603 to generate the pose graph.

In step 605, the person pose anonymization adversarial generative network generates, under the guidance of the pose graph, a new portrait image with the same pose as the original portrait image but a different person. According to an embodiment of the present invention, the person pose anonymization adversarial generative network includes a pose-guided reconstruction stage and a picture optimization stage, where the pose-guided reconstruction stage uses cascaded pose guidance based on the pose graph to produce a coarse-grained person pose anonymization result, and the picture optimization stage optimizes that result through the multi-channel attention selection mechanism to obtain a finer-grained final portrait image.
In step 606, the scene graph generated in step 602 and the new portrait image generated in step 605 are superimposed according to the semantic graph obtained in step 601 to obtain the final anonymized picture. According to an embodiment of the present invention, semantic segmentation yields the position of the persons in the original picture, and this information is used to superimpose the scene graph and the new portrait image.
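The superposition itself reduces to mask-based compositing, sketched below with the same assumed `person_id` label for the portrait class:

```python
import numpy as np

def superimpose(scene: np.ndarray, new_portrait: np.ndarray,
                sem: np.ndarray, person_id: int = 1) -> np.ndarray:
    # Person pixels come from the new portrait image; all others come from
    # the scene graph. The semantic graph supplies the person's position.
    mask = (sem == person_id)[..., None]
    return np.where(mask, new_portrait, scene)   # final anonymized picture
```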
In summary, compared with the prior art, the main advantages of the present invention are: (1) the picture is anonymized globally, so that only the abstract semantic graph and person pose graph of the original picture are retained in the generated picture, while faces, bodies, and background are all completely replaced, minimizing the risk of privacy leakage; (2) on the basis of complete anonymization, the original semantic information, person pose information, and object motion information of the picture are preserved, providing a large amount of usable training data for developing and optimizing non-identity-verification AI algorithm models such as humanoid detection and motion detection; and (3) a multi-channel attention model further optimizes the initial output of the adversarial generative network, yielding higher-quality output pictures.

In addition, in practical applications the present invention has further advantages; for example, compared with online try-on applications that use similar technology, the present invention can not only adjust clothing but also replace face information and background information, protecting user privacy to the greatest extent.
FIG. 7 shows a block diagram 700 of an exemplary computing device according to an embodiment of the present invention; the computing device is an example of hardware to which aspects of the invention can be applied. For example, the surveillance device, the remote device, and the computing device associated with the user mentioned above can all be implemented as the computing device of FIG. 7. The computing device 700 may be any machine configurable to perform processing and/or computation, including but not limited to a workstation, server, desktop computer, laptop computer, tablet computer, personal digital assistant, smartphone, on-board computer, or any combination thereof. The computing device 700 may include components connected or communicating via one or more interfaces and a bus 702. For example, the computing device 700 may include the bus 702, one or more processors 704, one or more input devices 706, and one or more output devices 708. The one or more processors 704 may be any type of processor and may include, but are not limited to, one or more general-purpose processors and/or one or more special-purpose processors (for example, dedicated processing chips). The input device 706 may be any type of device capable of inputting information to the computing device and may include, but is not limited to, a mouse, keyboard, touch screen, microphone, and/or remote controller. The output device 708 may be any type of device capable of presenting information and may include, but is not limited to, a display, speaker, video/audio output terminal, vibrator, and/or printer. The computing device 700 may also include, or be connected to, a non-transitory storage device 710, which may be any storage device that is non-transitory and capable of storing data, and which may include, but is not limited to, a disk drive, optical storage device, solid-state memory, floppy disk, flexible disk, hard disk, magnetic tape or any other magnetic medium, optical disc or any other optical medium, ROM (read-only memory), RAM (random-access memory), cache memory, and/or any memory chip or cartridge, and/or any other medium from which a computer can read data, instructions, and/or code. The non-transitory storage device 710 may be detachable from an interface, and may hold data/instructions/code for implementing the methods and steps described above. The computing device 700 may also include a communication device 712, which may be any type of device or system enabling communication with internal apparatus and/or with a network, and may include, but is not limited to, a modem, network card, infrared communication device, wireless communication device, and/or chipset, such as a Bluetooth device, an IEEE 802.11 device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.

The bus 702 may include, but is not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.

The computing device 700 may also include a working memory 714, which may be any type of working memory capable of storing instructions and/or data useful to the work of the processor 704, and may include, but is not limited to, random-access memory and/or a read-only storage device.

Software components may be located in the working memory 714, including but not limited to an operating system 716, one or more application programs 718, drivers, and/or other data and code. Instructions for implementing the above methods and steps of the present invention may be contained in the one or more application programs 718, and the method 600 described above may be implemented by the processor 704 reading and executing the instructions of the one or more application programs 718.

It should also be recognized that variations may be made according to specific needs. For example, custom hardware may be used, and/or particular components may be implemented in hardware, software, firmware, middleware, microcode, a hardware description language, or any combination thereof. In addition, connections to other computing devices, such as network input/output devices, may be employed. For example, some or all of the disclosed methods and devices may be implemented, with logic and algorithms according to the present invention, by programming hardware (for example, programmable logic circuits including field-programmable gate arrays (FPGAs) and/or programmable logic arrays (PLAs)) in an assembly language or a hardware programming language (for example, VERILOG, VHDL, C++).

Although aspects of the present invention have been described with reference to the drawings, the above methods and devices are merely examples, and the scope of the invention is not limited to these aspects but is defined only by the appended claims and their equivalents. Various components may be omitted or replaced by equivalent components. The steps may also be performed in an order different from that described here, and various components may be combined in various ways. It is also important that, as technology evolves, many of the described components may be replaced by equivalent components that appear later.

Claims (10)

  1. A picture anonymization method guided by a semantic graph and a pose graph, comprising:
    performing semantic segmentation on an original picture to obtain a semantic graph;
    using an image semantic anonymization adversarial generative network to generate, under the guidance of the semantic graph, a scene graph having the same semantics as the original picture but different content;
    using the portrait part of the semantic graph as a mask to cut a portrait image out of the original picture;
    extracting and estimating the pose of the person in the portrait image to generate a pose graph;
    using a person pose anonymization adversarial generative network to generate, under the guidance of the pose graph, a new portrait image having the same pose as the portrait image but a different person; and
    superimposing the scene graph and the new portrait image according to the semantic graph to obtain a final anonymized picture.
  2. The method of claim 1, wherein the image semantic anonymization adversarial generative network includes a semantic-guided reconstruction stage and a picture optimization stage, wherein the semantic-guided reconstruction stage is used to produce a coarse-grained image semantic anonymization result using cascaded semantic guidance based on the semantic graph, and the picture optimization stage is used to optimize, through a multi-channel attention selection mechanism, the image semantic anonymization result produced by the semantic-guided reconstruction stage to obtain a scene graph of finer granularity.
  3. The method of claim 1, wherein the person pose anonymization adversarial generative network includes a pose-guided reconstruction stage and a picture optimization stage, wherein the pose-guided reconstruction stage is used to produce a coarse-grained person pose anonymization result using cascaded pose guidance based on the pose graph, and the picture optimization stage is used to optimize, through a multi-channel attention selection mechanism, the person pose anonymization result produced by the pose-guided reconstruction stage to obtain a portrait image of finer granularity.
  4. The method of claim 1, wherein performing semantic segmentation on the original picture to obtain the semantic graph further includes: using an encoder-decoder built with ShuffleNet as the backbone network as the semantic generator to infer the semantic graph from the original picture.
  5. The method of claim 1, wherein extracting and estimating the pose of the person in the portrait image to generate the pose graph further includes: using the openPose model to extract and estimate the pose of the person in the original portrait image to generate the pose graph.
  6. A picture anonymization system guided by a semantic graph and a pose graph, comprising:
    a picture semantic anonymization module configured to:
    perform semantic segmentation on an original picture to obtain a semantic graph; and
    use an image semantic anonymization adversarial generative network to generate, under the guidance of the semantic graph, a scene graph having the same semantics as the original picture but different content;
    a person pose anonymization module configured to:
    use the portrait part of the semantic graph as a mask to cut a portrait image out of the original picture;
    extract and estimate the pose of the person in the portrait image to generate a pose graph; and
    use a person pose anonymization adversarial generative network to generate, under the guidance of the pose graph, a new portrait image having the same pose as the portrait image but a different person; and
    a superposition module configured to:
    superimpose the scene graph and the new portrait image according to the semantic graph to obtain a final anonymized picture.
  7. The system of claim 6, wherein the image semantic anonymization adversarial generative network includes a semantic-guided reconstruction stage and a picture optimization stage, wherein the semantic-guided reconstruction stage is used to produce a coarse-grained image semantic anonymization result using cascaded semantic guidance based on the semantic graph, and the picture optimization stage is used to optimize, through a multi-channel attention selection mechanism, the image semantic anonymization result produced by the semantic-guided reconstruction stage to obtain a scene graph of finer granularity.
  8. The system of claim 6, wherein the person pose anonymization adversarial generative network includes a pose-guided reconstruction stage and a picture optimization stage, wherein the pose-guided reconstruction stage is used to produce a coarse-grained person pose anonymization result using cascaded pose guidance based on the pose graph, and the picture optimization stage is used to optimize, through a multi-channel attention selection mechanism, the person pose anonymization result produced by the pose-guided reconstruction stage to obtain a portrait image of finer granularity.
  9. The system of claim 6, wherein performing semantic segmentation on the original picture to obtain the semantic graph further includes: using an encoder-decoder built with ShuffleNet as the backbone network as the semantic generator to infer the semantic graph from the original picture; and
    wherein extracting and estimating the pose of the person in the portrait image to generate the pose graph further includes: using the openPose model to extract and estimate the pose of the person in the original portrait image to generate the pose graph.
  10. A computing device for picture anonymization guided by a semantic graph and a pose graph, comprising:
    a processor; and
    a memory storing instructions which, when executed by the processor, can perform the method of any one of claims 1-5.
PCT/CN2022/097530 2021-10-14 2022-06-08 Image anonymization method guided by semantic and pose graphs WO2023060918A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111196429.9A CN113919998B (zh) 2021-10-14 2021-10-14 Image anonymization method guided by semantic and pose graphs
CN202111196429.9 2021-10-14

Publications (1)

Publication Number Publication Date
WO2023060918A1 (zh)

Family

ID=79240288

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/097530 2021-10-14 2022-06-08 Image anonymization method guided by semantic and pose graphs

Country Status (2)

Country Link
CN (1) CN113919998B (zh)
WO (1) WO2023060918A1 (zh)

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11397462B2 (en) * 2012-09-28 2022-07-26 Sri International Real-time human-machine collaboration using big data driven augmented reality technologies
US10755479B2 (en) * 2017-06-27 2020-08-25 Mad Street Den, Inc. Systems and methods for synthesizing images of apparel ensembles on models
US10546387B2 (en) * 2017-09-08 2020-01-28 Qualcomm Incorporated Pose determination with semantic segmentation
KR102109372B1 (ko) * 2018-04-12 2020-05-12 Gachon University Industry-Academic Cooperation Foundation Apparatus and method for semantic image segmentation using a fully convolutional neural network based on multi-scale images and multi-scale dilated convolutions
CN110021051B (zh) * 2019-04-01 2020-12-15 Zhejiang University Text-guided person image generation method based on generative adversarial networks
US11244504B2 (en) * 2019-05-03 2022-02-08 Facebook Technologies, Llc Semantic fusion
CN110473266A (zh) * 2019-07-08 2019-11-19 Nanjing University of Posts and Telecommunications Yancheng Big Data Research Institute Co., Ltd. Pose-guided method for generating person-action video while preserving the source scene
US11475608B2 (en) * 2019-09-26 2022-10-18 Apple Inc. Face image generation with pose and expression control
CN111325806A (zh) * 2020-02-18 2020-06-23 Suzhou Keda Technology Co., Ltd. Clothing color recognition method, device and system based on semantic segmentation
CN112651423A (zh) * 2020-11-30 2021-04-13 Shenzhen Institutes of Advanced Technology An intelligent vision system
CN113160035A (zh) * 2021-04-16 2021-07-23 Zhejiang University of Technology Human-body image generation method based on pose guidance and style and shape feature constraints
CN113255813B (zh) * 2021-06-02 2022-12-02 Beijing Institute of Technology Multi-style image generation method based on feature fusion

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190116290A1 (en) * 2017-10-16 2019-04-18 Nokia Technologies Oy Apparatus and methods for determining and providing anonymized content within images
US20200211154A1 (en) * 2018-12-30 2020-07-02 Altumview Systems Inc. Method and system for privacy-preserving fall detection
CN110363183A (zh) * 2019-07-30 2019-10-22 Guizhou University Privacy protection method for service-robot visual pictures based on generative adversarial networks
CN111242837A (zh) * 2020-01-03 2020-06-05 Hangzhou Dianzi University Face anonymity privacy protection method based on generative adversarial networks
US20210295581A1 (en) * 2020-03-18 2021-09-23 Robert Bosch Gmbh Anonymization apparatus, surveillance device, method, computer program and storage medium
CN111539262A (zh) * 2020-04-02 2020-08-14 Sun Yat-sen University Motion transfer method and system based on a single picture
CN112241708A (zh) * 2020-10-19 2021-01-19 Daimler AG Method and device for generating a new person image from an original person image
CN113343878A (zh) * 2021-06-18 2021-09-03 Beijing University of Posts and Telecommunications High-fidelity face privacy protection method and system based on generative adversarial networks

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116778564A (zh) * 2023-08-24 2023-09-19 Wuhan University Identity-preserving face anonymization method, system and device
CN116778564B (zh) * 2023-08-24 2023-11-17 Wuhan University Identity-preserving face anonymization method, system and device

Also Published As

Publication number Publication date
CN113919998B (zh) 2024-05-14
CN113919998A (zh) 2022-01-11

Similar Documents

Publication Publication Date Title
KR102469295B1 (ko) Video background removal using depth
US11645506B2 (en) Neural network for skeletons from input images
US11410457B2 (en) Face reenactment
JP7490004B2 (ja) Image colorization using machine learning
Betancourt et al. The evolution of first person vision methods: A survey
WO2021236296A9 (en) Maintaining fixed sizes for target objects in frames
CN106682632B (zh) Method and device for processing face images
EP3975046B1 (en) Method and apparatus for detecting occluded image and medium
WO2023060918A1 (zh) Image anonymization method guided by semantic and pose graphs
WO2022267653A1 (zh) Image processing method, electronic device, and computer-readable storage medium
US20220100989A1 (en) Identifying partially covered objects utilizing machine learning
Harichandana et al. PrivPAS: A real time Privacy-Preserving AI System and applied ethics
CN113903063A (zh) Facial expression recognition method and system based on deep spatiotemporal network decision fusion
CN110110742B (zh) Multi-feature fusion method and apparatus, electronic device, and storage medium
KR101189043B1 (ko) Video call service and method of providing it, and video call service server and terminal therefor
CN111274447A (zh) Video-based target expression generation method, apparatus, medium, and electronic device
KR102678533B1 (ko) Method and apparatus for blurring objects in video using artificial intelligence
Hassan et al. Selective content removal for egocentric wearable camera in Nutritional Studies
KR102633279B1 (ko) Selective de-identification apparatus and method
US20240202232A1 (en) Methods and Systems for Processing Imagery
KR102475956B1 (ко) System for providing face-information registration service for face recognition
US20230086009A1 (en) System and techniques to normalize objects in spatial imaging of spaces
Hassan Cross-Domain Visual Learning and Applications in Privacy, Retrieval and Model Adaptation
CN114827706A (zh) Image processing method, computer program product, electronic device, and storage medium
CN111797839A (zh) Feature extraction method and apparatus, storage medium, and electronic device

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22879867

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE