WO2021256702A1 - Electronic device and control method therefor - Google Patents

Electronic device and control method therefor

Info

Publication number
WO2021256702A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
data
attribute value
attribute
image data
Prior art date
Application number
PCT/KR2021/005652
Other languages
English (en)
Korean (ko)
Inventor
이한빛
이상구
유강민
Original Assignee
삼성전자 주식회사
서울대학교산학협력단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 삼성전자 주식회사 and 서울대학교산학협력단
Publication of WO2021256702A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00: 2D [Two Dimensional] image generation
    • G06T 11/60: Editing figures and text; Combining figures or text
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20084: Artificial neural networks [ANN]

Definitions

  • the present disclosure relates to an electronic device and a method for controlling the same, and more particularly, to an electronic device for generating a converted image for an image, and a method for controlling the same.
  • The image-to-image conversion technology converts various image properties, such as the color and shape of the background or an object of the image, or the age and gender of a person, to generate an image in which some properties of the original image are converted.
  • an artificial intelligence model including a GAN network is trained to learn various properties of an input image and to output an image obtained by converting at least one property among various properties. That is, the conventional artificial intelligence model for generating a transformed image has been trained to output a transformed image by receiving an image in the image space where the image exists.
  • However, a large amount of computation may be needed for the artificial intelligence model to transform the properties of the input image and create a transformed image close to an actual image. That is, when image conversion is performed using the conventional artificial intelligence model, there is a problem in that the load of the electronic device including the artificial intelligence model increases or the clarity of the image conversion decreases.
  • The present disclosure is intended to solve the above problems and, specifically, to provide an electronic device that increases the inference accuracy for an image and generates a clear image by converting the image in the latent space created through the learning of an artificial intelligence model, and a method for controlling the same.
  • According to an embodiment, the electronic device includes: a memory in which pre-generated information about the latent space is stored; and a processor configured to convert an input image into image data in the latent space using a function for normalizing an image to data in the latent space, obtain attribute conversion data for changing an attribute value of the image included in the image data to a target attribute value, generate transformed image data for the image data based on the image data and the attribute transformation data, and generate a converted image for the input image based on the transformed image data.
  • the attribute conversion data and the converted image data are data on the latent space.
  • According to an embodiment, the method of controlling an electronic device includes: converting an input image into image data in the latent space using a function for normalizing an image to data in the latent space; obtaining attribute conversion data for changing an attribute value of the image included in the image data to a target attribute value; generating transformed image data for the image data based on the image data and the attribute transformation data; and generating a converted image for the input image based on the transformed image data.
  • the attribute conversion data and the converted image data are data on the latent space.
  • FIG. 1 is a diagram schematically illustrating an electronic device according to various embodiments of the present disclosure.
  • FIG. 2 is a block diagram illustrating a configuration of an electronic device according to various embodiments of the present disclosure.
  • FIG. 3 is a view for explaining an electronic device that performs image conversion based on a target attribute value according to an embodiment of the present disclosure.
  • FIG. 4 is a diagram for explaining a learning process of a transformation network included in an electronic device according to an embodiment of the present disclosure.
  • FIG. 5 is a view for explaining a converted image generated by an electronic device according to an embodiment of the present disclosure.
  • FIG. 6 is a block diagram illustrating a detailed configuration of an electronic device according to an embodiment of the present disclosure.
  • FIG. 7 is a flowchart illustrating a method of controlling an electronic device according to various embodiments of the present disclosure.
  • In the present disclosure, expressions such as “have,” “may have,” “include,” or “may include” indicate the presence of a corresponding characteristic (e.g., a numerical value, function, operation, or component such as a part) and do not exclude the presence of additional features.
  • expressions such as “A or B,” “at least one of A and/and B,” or “one or more of A or/and B” may include all possible combinations of the items listed together.
  • “A or B,” “at least one of A and B,” or “at least one of A or B” may refer to all of the cases of (1) including at least one A, (2) including at least one B, or (3) including both at least one A and at least one B.
  • Expressions such as “first” and “second” may modify various elements regardless of order and/or importance, are used only to distinguish one element from another, and do not limit the elements. The order of use or arrangement of elements combined with such an ordinal number should not be limited by the number, and, if necessary, the ordinal numbers may be used interchangeably.
  • When a component (e.g., a first component) is said to be “coupled with/to (operatively or communicatively)” or “connected to” another component (e.g., a second component), the component may be directly connected to the other component or may be connected through yet another component (e.g., a third component).
  • In some circumstances, the expression “a device configured to” may mean that the device is “capable of” performing an operation together with other devices or parts.
  • For example, the phrase “a processor configured (or set) to perform A, B, and C” may refer to a dedicated processor (e.g., an embedded processor) for performing the corresponding operations, or a generic-purpose processor (e.g., a central processing unit (CPU) or an application processor) capable of performing the corresponding operations by executing one or more software programs stored in a memory device.
  • In the present disclosure, the term user may refer to a person who uses an electronic device or a device that uses the electronic device (e.g., an artificial intelligence electronic device).
  • FIG. 1 is a diagram schematically illustrating an electronic device according to various embodiments of the present disclosure.
  • The electronic device 100 may include at least one of a smart phone, a tablet PC, a mobile phone, a video phone, an e-book reader, a desktop PC, a laptop PC, a netbook computer, a workstation, a server, a PDA, a portable multimedia player (PMP), an MP3 player, a kiosk, a medical device, a camera, or a wearable device.
  • A wearable device may include at least one of an accessory-type device (e.g., a watch, a ring, a bracelet, an anklet, a necklace, eyewear, a contact lens, or a head-mounted device (HMD)), a textile- or clothing-integrated device (e.g., an electronic garment), a body-attached device (e.g., a skin pad), or a bio-implantable circuit.
  • In some embodiments, the electronic device may include, for example, at least one of a television, a digital video disk (DVD) player, an audio system, a refrigerator, an air conditioner, a vacuum cleaner, an oven, a microwave oven, a washing machine, an air purifier, a set-top box, a home automation control panel, a security control panel, a media box, a game console, an electronic dictionary, an electronic key, a camcorder, or an electronic picture frame.
  • the electronic device 100 may generate a converted image 12 for the image 11 .
  • The image 11 may be any of various images, such as an image stored in the electronic device 100, an image acquired using a camera (not shown) included in the electronic device 100, an image received by the electronic device 100 from an external device (not shown), or a virtual image generated by the electronic device 100.
  • the image 11 is an image obtained by capturing a specific object, and may include at least one object.
  • The image 11 may be a raster image such as JPEG (Joint Photographic Experts Group), GIF (Graphics Interchange Format), PNG (Portable Network Graphics), TIFF (Tag Image File Format), BMP (Microsoft Windows Device Independent Bitmap), HEIF (High Efficiency Image File Format), BPG (Better Portable Graphics), or Raw; a vector image such as AI (Adobe Illustrator), SVG (Scalable Vector Graphics), VML (Vector Markup Language), CGM (Computer Graphics Metafile), or Gerber format; or a composite image such as PDF (Portable Document Format) or EPS (Encapsulated PostScript).
  • the image 11 may include a plurality of attribute information.
  • The properties of the image may be various elements representing the background or an object included in the image, such as the color, size, or shape of the background or object. For example, if the image contains a person (an object), various factors such as the person's gender, age, height, hair style, hair length, hair color, the size or shape of the eyes/nose/mouth, skin color, or race may be attribute information.
  • the electronic device 100 may generate the converted image 12 by converting at least one piece of property information among a plurality of pieces of property information included in the image 11 .
  • the electronic device 100 may convert the image 11 into image data 21 in a latent space 20 .
  • Here, the latent space is a space corresponding to an actual observation space or an image space in which an actual image exists, and represents a space in which an image existing in the actual observation space or the image space can be described.
  • the electronic device 100 may generate the latent space in advance by reducing the dimension of the multidimensional image data set in the actual observation space or the image space to generate a new dimension image data set.
  • the dimension may be a concept corresponding to the attribute information of the image. That is, the electronic device 100 may generate a latent space in advance by reducing the dimension of the image data set to include only attribute information necessary to express an image among a plurality of attribute information included in the image data set.
  • the electronic device 100 may convert the image 11 into the image data 21 in the latent space by using a function F that normalizes the image 11 to data in the latent space.
  • normalization means limiting the range of data included in the image 11 to a range desired by a user or a preset range.
  • the function F refers to a function that normalizes the image 11 and converts it into image data 21 in the latent space. For example, when the image 11 is input as an input of the function F, the function F limits the range of data included in the image 11 to a range desired by the user or a preset range to convert the image data 21 in the latent space can do.
  • the image data in the latent space may be expressed in the form of a vector.
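  • To make this mapping concrete, the following Python sketch stands in for the normalizing function F with a single invertible affine map; the ToyFlow class, its dimensions, and all names are illustrative assumptions for this document (a real flow-based model composes many invertible layers), not the network of the present disclosure.

```python
import numpy as np

# A minimal, deliberately toy stand-in for the normalizing function F:
# one invertible affine map that turns a flattened image into a latent
# vector. A real flow-based model (e.g., Glow) composes many invertible
# layers; everything here is an assumption for illustration.
class ToyFlow:
    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # orthonormal => invertible
        self.W, self.b = q, rng.normal(size=dim) * 0.01

    def forward(self, x):                 # F: image -> latent vector (image data)
        return self.W @ x + self.b

    def inverse(self, z):                 # F^-1: latent vector -> image
        return self.W.T @ (z - self.b)    # orthonormal W: inverse equals transpose

flow = ToyFlow(dim=16)
image = np.random.default_rng(1).random(16)    # flattened 4x4 "image" in [0, 1]
z = flow.forward(image)                        # a vector in the latent space
assert np.allclose(flow.inverse(z), image)     # exact round trip
```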
  • the image data 21 may include all or part of attribute information included in the image.
  • the electronic device 100 may include an artificial intelligence model.
  • The artificial intelligence model may be a model including an artificial neural network trained through machine learning or deep learning.
  • the artificial intelligence model may be composed of a plurality of neural network layers. Each of the plurality of neural network layers has a plurality of weight values, and a neural network operation is performed through an operation between the operation result of a previous layer and the plurality of weights.
  • the plurality of weights of the plurality of neural network layers may be optimized by the learning result of the artificial intelligence model. For example, a plurality of weights may be updated so that a loss value or a cost value obtained from the artificial intelligence model during the learning process is reduced or minimized.
  • The artificial neural network may include a deep neural network (DNN), for example, a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), a Restricted Boltzmann Machine (RBM), a Deep Belief Network (DBN), a Bidirectional Recurrent Deep Neural Network (BRDNN), or Deep Q-Networks, but is not limited thereto.
  • the electronic device 100 may generate the latent space 20 in a learning process of the artificial intelligence model included in the electronic device 100 .
  • the electronic device 100 may include an artificial intelligence model trained to normalize an image to data on a latent space.
  • the AI model may include a function F that normalizes the image to data in the latent space.
  • the electronic device 100 may generate the latent space 20 in the process of learning to normalize the plurality of learning images to data on the latent space.
  • the electronic device 100 may generate the image data 21 in the previously generated latent space 20 and convert the generated image data 21 to generate the converted image data 22 .
  • the electronic device 100 may generate the converted image 12 by using the converted image data 22 .
  • The electronic device 100 may convert the transformed image data 22 into the transformed image 12 by using the inverse function F⁻¹ of the function F.
  • the electronic device 100 may perform image transformation in a latent space in which the dimension of the image data set is reduced.
  • the electronic device 100 may directly infer the image data in the latent space.
  • Since the electronic device 100 infers the transformed image data in the latent space in which the dimension of the image data set is reduced, resource use and load of the electronic device 100 may be reduced, and a converted image that better conforms to the user's intention may be provided.
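  • Under the same toy assumptions, the conversion path of FIG. 1 can be sketched end to end; the stub transformation network below is a hypothetical stand-in (a fixed linear push of the attribute gap), and the ToyFlow class is reused from the sketch above.

```python
import numpy as np

# A hedged end-to-end sketch of the path of FIG. 1. The real networks
# are learned models; these names and stubs are illustrative only.
def convert_image(flow, transform_net, image, attr, target_attr):
    z = flow.forward(image)                        # image data 21 in the latent space
    if np.allclose(attr, target_attr):
        return image                               # no attribute difference: no conversion
    delta = transform_net(z, target_attr - attr)   # attribute conversion data 23
    return flow.inverse(z + delta)                 # F^-1 of the converted image data 22

# stub transformation network: scales the attribute gap into the latent space
transform_net = lambda z, diff: 0.1 * np.resize(diff, z.shape)

flow = ToyFlow(dim=16)                             # class from the earlier sketch
converted = convert_image(flow, transform_net,
                          np.random.rand(16),      # flattened toy "image"
                          np.array([0.0]), np.array([1.0]))
```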
  • FIG. 2 is a block diagram illustrating a configuration of an electronic device according to various embodiments of the present disclosure
  • the electronic device 100 may include a memory 110 and a processor 120 .
  • Functions related to artificial intelligence according to the present disclosure are operated through a processor and a memory.
  • the memory 110 is a configuration for storing an operating system (OS) for controlling overall operations of the components of the electronic device 100 and at least one instruction or data related to the components of the electronic device 100 .
  • An instruction means one action statement that can be directly executed by the processor 120 in a programming language, and is a minimum unit for program execution or operation.
  • the processor 120 may perform operations according to various embodiments to be described later by executing at least one instruction stored in the memory 110 .
  • the memory 110 is a component for storing various programs and data necessary for the operation of the electronic device 100 .
  • the memory 110 may be implemented as a non-volatile memory, a volatile memory, a flash-memory, a hard disk drive (HDD), or a solid state drive (SSD).
  • the memory 110 is accessed by the processor 120 , and reading/writing/modification/deletion/update of data by the processor 120 may be performed.
  • the term "memory” refers to a memory 110, a ROM (not shown) in the processor 120, a RAM (not shown), or a memory card (not shown) mounted in the electronic device 100 (eg, micro SD). card, memory stick).
  • the latent space 20 is a space of a new-dimensional image data set generated by reducing the dimension of a multi-dimensional image data set, and represents a space in which an image in the image space can be described.
  • the memory 110 may store an artificial intelligence model used to convert the image 11 into the transformed image 12 .
  • Specifically, the memory 110 may store a flow-based generative model trained to normalize the image 11 into data on the latent space, and a transformation network model that converts the image data 21 on the latent space to generate the transformed image data 22.
  • the flow-based generation model may be an artificial intelligence model trained to generate image data similar to the training image by learning the characteristics of the training image.
  • image data similar to the training image may be expressed in a vector form.
  • the flow-based generation model can create a latent space while learning to transform a plurality of images into a plurality of image data. That is, the latent space 20 may represent a set of training image data sets generated when the flow-based generation model is trained.
  • the flow-based generative model is a network capable of inverse transformation.
  • the fact that the inverse transformation is possible indicates that the input value can be output by using the output value of the network as an input.
  • the flow-based generation model may be an artificial intelligence model that is not only trained to output image data by taking an image as an input, but is also trained to generate an image by receiving image data as an input conversely.
  • the flow-based generation model may include a network capable of at least one inverse transformation.
  • the flow-based generative model may include a plurality of networks such as squeeze networks, actnorm networks, 1x1 convolution networks, coupling networks, and split networks, and squeeze networks, actnorm networks, 1x1 convolution networks, coupling networks, and split networks are Each may be a network capable of inverse transformation.
  • Since squeeze networks, actnorm networks, 1x1 convolution networks, coupling networks, and split networks are known technologies, a detailed description thereof will be omitted; a toy illustration of a coupling step follows below.
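  • As a hedged aside, the sketch below shows a generic affine coupling step in the style of such flows (an assumed textbook construction, not the patent's exact layers): because one half of the vector parameterizes an affine transform of the other half, the step can be inverted exactly.

```python
import numpy as np

# Minimal numpy sketch of one invertible "coupling" step of the kind
# used in flow-based models. All weights and sizes here are assumptions.
def coupling_forward(x, w, b):
    x1, x2 = np.split(x, 2)
    log_s, t = np.tanh(w @ x1), b @ x1        # conditioner on the first half
    y2 = x2 * np.exp(log_s) + t               # affine-transform the second half
    return np.concatenate([x1, y2])

def coupling_inverse(y, w, b):
    y1, y2 = np.split(y, 2)
    log_s, t = np.tanh(w @ y1), b @ y1        # recompute from the untouched half
    x2 = (y2 - t) * np.exp(-log_s)            # undo the affine transform exactly
    return np.concatenate([y1, x2])

rng = np.random.default_rng(0)
w, b = rng.normal(size=(8, 8)) * 0.1, rng.normal(size=(8, 8)) * 0.1
x = rng.normal(size=16)
assert np.allclose(coupling_inverse(coupling_forward(x, w, b), w, b), x)
```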
  • The transformation network model may be an artificial intelligence model trained to output attribute transformation data for transforming attributes of an image based on image data in the latent space. Specifically, the transformation network model may be a model that has been trained to provide attribute transformation data for transforming an attribute of an image from an original attribute value to a target attribute value, based on the image data in the latent space, the original attribute value of the image, and the target attribute value of the image. In this case, the attribute conversion data, as data on the latent space, may be expressed as a vector. A detailed operation of the transformation network model will be described later with reference to FIG. 3.
  • In addition, the memory 110 may store a prior network model and an attribute classification network model required to train the transformation network model. The prior network model and the attribute classification network model may be used in the training process of the transformation network model to increase the accuracy of the transformation network.
  • the prior network may be a model trained to output a distribution of attribute transformation data for converting an attribute of an image from an original attribute value to a target attribute value, based on the original attribute value of the image and the target attribute value of the image.
  • By using the reference distribution output by the prior network, the transformation network may accurately output the distribution of the attribute transformation data.
  • the attribute classification network model may be a model trained to output image attribute values included in image data.
  • the attribute classification network model may output an attribute value of transformed image data generated based on the attribute transformation data output by the transformation network.
  • the attribute classification network model may determine whether an attribute value of transformed image data generated through the transformation network includes a target attribute value. By using the attribute value of the transformed image data output through the attribute classification network, the transformation network may accurately output the transformed image data including the target attribute value.
  • the processor 120 may be electrically connected to the memory 110 to control overall operations and functions of the electronic device 100 .
  • the processor 120 may control hardware or software components connected to the processor 120 by driving an operating system or an application program, and may perform various data processing and operations.
  • the processor 120 may load and process commands or data received from at least one of other components into the volatile memory, and store various data in the non-volatile memory.
  • The processor 120 may be implemented as a general-purpose processor (e.g., a Central Processing Unit (CPU)) or an application processor (AP).
  • the processor may include one or a plurality of processors.
  • The one or more processors may be general-purpose processors such as CPUs, APs, or Digital Signal Processors (DSPs); graphics-dedicated processors such as Graphics Processing Units (GPUs) or Vision Processing Units (VPUs); or artificial-intelligence-dedicated processors such as Neural Processing Units (NPUs).
  • The one or more processors control input data to be processed according to a predefined operation rule or an artificial intelligence model stored in the memory.
  • the AI-only processor may be designed with a hardware structure specialized for processing a specific AI model.
  • a predefined action rule or artificial intelligence model is characterized in that it is created through learning.
  • Here, being made through learning means that a basic artificial intelligence model is trained using a plurality of pieces of learning data by a learning algorithm, so that a predefined operation rule or artificial intelligence model set to perform a desired characteristic (or purpose) is created.
  • Such learning may be performed in the device itself on which artificial intelligence according to the present disclosure is performed, or may be performed through a separate server and/or system.
  • Examples of the learning algorithm include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning.
  • the processor 120 may generate the transformed image 12 for the image 11 based on the information on the latent space 20 .
  • the processor 120 may convert the input image into image data on the latent space 20 by using a function that normalizes the image 11 to data on the latent space 20 .
  • the function for normalizing the image 11 to data on the latent space 20 may be a function included in the flow-based generative model. That is, the processor 120 may convert the image 11 into image data in the latent space 20 using a flow-based generation model.
  • the processor 120 may receive a target attribute value for at least one attribute among a plurality of attributes included in the image.
  • the target attribute value indicates the attribute value of the image to be changed.
  • For example, when the hair color of a person included in the image is to be changed to brown, 'brown' may be a target attribute value.
  • the processor 120 may identify an attribute value corresponding to a target attribute value among attribute values of a plurality of attributes included in the image data, and determine conversion of the image data using a difference between the target attribute value and the attribute value. Specifically, when there is a difference between the target attribute value and the attribute value, the processor 120 may determine that image data conversion is necessary. On the other hand, when there is no difference between the input target attribute value and the attribute value of the current image, the processor 120 may determine that image data conversion is not required.
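  • A minimal sketch of this decision, with attribute values represented as numeric vectors (an assumption for illustration only):

```python
import numpy as np

# Tiny sketch of the decision above: convert only when the identified
# attribute value differs from the requested target attribute value.
def conversion_needed(attr_value, target_value):
    diff = target_value - attr_value           # this difference later drives the transform
    return bool(np.any(np.abs(diff) > 1e-9))   # no difference -> no conversion required

print(conversion_needed(np.array([0.0]), np.array([1.0])))   # True: conversion needed
print(conversion_needed(np.array([1.0]), np.array([1.0])))   # False: values already match
```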
  • FIG. 3 is a diagram for describing an electronic device that performs image conversion based on a target attribute value according to an embodiment of the present disclosure.
  • The processor 120 may convert the image 11 into image data 21 in the latent space 20 using a function that normalizes the image to data in the latent space.
  • The processor 120 converts the image 11 into data in the latent space 20 using the flow-based generation model 30.
  • the processor 120 may input the image 11 into the flow-based generation model 30 , and obtain image data 21 on the latent space 20 for the input image 11 .
  • the processor 120 may acquire the attribute conversion data 23 for changing the attribute value 1 of the image included in the image data 21 to the target attribute value 2 .
  • the processor 120 may determine the conversion of the image data 21 by identifying the target attribute value 2 input to the electronic device 100 and the attribute value 1 corresponding to the target attribute value.
  • The processor 120 may extract the attribute transformation data 23 based on the difference 3 between the target attribute value 2 and the attribute value 1.
  • Specifically, the processor 120 may input the difference 3 between the target attribute value 2 and the attribute value 1 to the transformation network 40, which outputs the attribute transformation data 23.
  • the transformation network 40 is an artificial intelligence model trained to output attribute transformation data by inputting the difference between the target attribute value and the attribute value.
  • the transformation network 40 may learn the distribution of attribute transformation data when the transformation network 40 is trained.
  • Since the attribute transformation data is data on the latent space 20, the distribution of the attribute transformation data may be a normal distribution.
  • the transformation network 40 may determine the distribution 4 of the attribute transformation data 23 .
  • the transformation network 40 may extract the attribute transformation data 23 satisfying a preset condition from the distribution 4 of the attribute transformation data.
  • the transformation network 40 may extract the attribute transformation data 23 corresponding to the average value from the distribution 4 of the attribute transformation data.
  • the processor 120 may generate the converted image data 22 for the image data 21 based on the image data 21 and the attribute conversion data 23 . At this time, the generated transformed image data 22 is also data on the latent space 20 .
  • Specifically, the processor 120 may generate the converted image data 22 by combining the image data 21 obtained through the flow-based generation model 30 with the attribute conversion data 23 obtained through the conversion network 40. Since the image data 21 and the attribute conversion data 23 can each be expressed in the form of a vector on the latent space 20, the processor 120 may add the image data 21 and the attribute conversion data 23 to generate the converted image data 22.
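  • The extraction and combination steps can be sketched as follows, assuming the transformation network 40 parameterizes a normal distribution by a mean and log-variance (an illustrative parameterization; taking the mean is one example of a preset condition):

```python
import numpy as np

# Hedged sketch of FIG. 3's extraction and combination. The (mu, log_var)
# parameterization and all values below are stand-ins for illustration.
def extract_conversion_data(mu, log_var, use_mean=True, rng=None):
    if use_mean:
        return mu                                   # deterministic pick from the distribution
    rng = rng or np.random.default_rng()
    return mu + np.exp(0.5 * log_var) * rng.normal(size=mu.shape)   # or sample

mu, log_var = np.full(16, 0.2), np.full(16, -2.0)   # stand-in network outputs
z = np.random.default_rng(1).normal(size=16)        # image data 21 (latent vector)
delta = extract_conversion_data(mu, log_var)        # attribute conversion data 23
z_converted = z + delta                             # converted image data 22: a vector sum
```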
  • the processor 120 may generate the converted image 12 for the input image 11 based on the generated converted image data 22 .
  • Specifically, the processor 120 may generate an inverse function of the function that normalizes an image to data on the latent space, and may generate the transformed image 12 for the image 11 by using the transformed image data 22 as an input of the generated inverse function.
  • In this case, the processor 120 may convert the transformed image data 22 into the transformed image 12 using the flow-based generation model 30. Since the flow-based generation model 30 is an artificial intelligence model trained to enable inverse transformation, the processor 120 may input the transformed image data 22 to the flow-based generation model 30 to generate the transformed image 12.
  • FIG. 4 is a diagram for explaining a learning process of a transformation network included in an electronic device according to an embodiment of the present disclosure.
  • the transformation network 40 is an artificial intelligence model trained to output attribute transformation data using image data, attribute values, and target attribute values on the latent space 20 .
  • the processor 120 may train the transformation network 40 using the prior network 50 and the attribute classification network 60 .
  • The prior network 50 is an artificial intelligence model trained to output a reference distribution of attribute transformation data for an image using the attribute values and target attribute values of the image. Specifically, the prior network 50 may provide, through the distribution of the attribute transformation data 23, information on which attribute transformation data 23 is required in the latent space 20 to transform an image attribute from the attribute value to the target attribute value.
  • The processor 120 may obtain a reference distribution of the attribute transformation data by inputting a difference value between the attribute value and the target attribute value into the prior network 50, and may train the transformation network 40 using the obtained reference distribution.
  • The reference distribution of attribute transformation data output by the prior network 50 may be used to normalize the distribution 4 output by the transformation network 40; in this respect, the prior network 50 can be viewed as outputting a reference distribution of attribute transformation data.
  • While the transformation network 40 receives image data as an input in addition to the difference between the attribute value and the target attribute value and learns to output attribute transformation data, the prior network 50 receives only the difference between the attribute value and the target attribute value as an input and may be trained to output a distribution of attribute transformation data. That is, the prior network 50 provides distribution results of attribute conversion data for generating various images, in that only the difference between the attribute value and the target attribute value is input regardless of the input image. For example, as shown in FIG. 4, when the target attribute value is 'glasses' and the original attribute value is 'not wearing glasses', the prior network 50 receives the difference between the target attribute value and the attribute value as an input and can provide the distribution result of the attribute transformation data for generating various images 'with glasses added'.
  • The processor 120 may train the prior network 50 by inputting various target attribute values and attribute values to the prior network 50, or a previously trained prior network 50 may be stored in the memory 110. In the latter case, the processor 120 may train the transformation network 40 using the pre-stored prior network 50. A toy stand-in for the prior network is sketched below.
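  • A hypothetical stand-in for the prior network 50, reduced to a linear map for illustration (a real prior network would be a learned neural network conditioned the same way):

```python
import numpy as np

# Hypothetical stand-in for the prior network 50: a tiny linear map
# from the attribute difference alone (no image data) to the mean and
# log-variance of a reference distribution over attribute conversion
# data. All shapes and names are assumptions for illustration.
class PriorNet:
    def __init__(self, attr_dim, latent_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.Wm = rng.normal(size=(latent_dim, attr_dim)) * 0.1   # mean head
        self.Wv = rng.normal(size=(latent_dim, attr_dim)) * 0.1   # log-variance head

    def __call__(self, attr_diff):
        return self.Wm @ attr_diff, self.Wv @ attr_diff           # (mu_p, log_var_p)

prior = PriorNet(attr_dim=4, latent_dim=16)
mu_p, log_var_p = prior(np.array([1.0, 0.0, 0.0, 0.0]))  # e.g., 'glasses' minus 'no glasses'
```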
  • Specifically, the processor 120 may train the transformation network 40 to output attribute transformation data by inputting the training image data, the attribute values of the training image data, and the target attribute values into the transformation network 40; the transformation network 40 may also learn the distribution of the attribute transformation data. At this time, the processor 120 may train the transformation network 40 so that the distribution produced by the transformation network 40 has the same or a similar shape as the reference distribution output through the prior network 50.
  • For this, the processor 120 may measure the difference between the distribution produced by the transformation network 40 and the reference distribution output through the prior network 50 using the Kullback-Leibler divergence, and train the transformation network 40 so that the difference between the two distributions is less than a preset value. As a result of such learning, the transformation network 40 may output normalized attribute transformation data.
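  • For diagonal Gaussian distributions the Kullback-Leibler divergence has a closed form, so the criterion can be sketched directly; the parameterization below is an assumption for illustration:

```python
import numpy as np

# Sketch of the training criterion: closed-form KL divergence between
# two diagonal Gaussians, which could pull the transformation network's
# distribution 4 toward the prior network's reference distribution.
def kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p):
    # KL( N(mu_q, exp(log_var_q)) || N(mu_p, exp(log_var_p)) ), summed over dims
    return 0.5 * np.sum(
        log_var_p - log_var_q
        + (np.exp(log_var_q) + (mu_q - mu_p) ** 2) / np.exp(log_var_p)
        - 1.0
    )

mu_q, log_var_q = np.zeros(16), np.zeros(16)       # transformation network output (stub)
mu_p, log_var_p = np.full(16, 0.1), np.zeros(16)   # prior network reference (stub)
kl = kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p)
# training would update the transformation network until kl < preset value
```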
  • the attribute classification network 60 is an artificial intelligence model trained to output image attribute values by receiving image data on the latent space 20 as an input.
  • The processor 120 may optimize the attribute classification network 60 using the cross entropy between the attribute probability predicted from the image data 21 on the latent space 20 and the attribute value 1 of the image.
  • the processor 120 may train the transformation network 40 to output accurate attribute transformation data by using the attribute classification network 60 .
  • Specifically, in the process of training the transformation network 40, the processor 120 may acquire transformed image data for the image data of the training image, based on the image data of the training image and the attribute transformation data output by the transformation network 40.
  • the processor 120 may input the obtained transformed image data to the attribute classification network 60 and train the transformation network 40 so that the output attribute value is equal to the target attribute value.
  • the processor 120 trains the transformation network 40 using the prior network 50 and the attribute classification network 60 so that the transformation network 40 outputs a transformed image matching the target attribute value.
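  • Combining the two signals, a training loss of the kind described here might be sketched as follows, with the attribute classification network stubbed as a logistic model on latent vectors (all names and the weighting lam are illustrative assumptions):

```python
import numpy as np

# Hedged sketch of a combined training signal: the KL term above plus a
# cross-entropy that pushes the classifier's prediction on the
# transformed latent code toward the target attribute value.
def binary_cross_entropy(prob, target, eps=1e-7):
    prob = np.clip(prob, eps, 1.0 - eps)
    return -np.mean(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob))

def training_loss(kl_term, z_converted, target_attrs, clf_weights, lam=1.0):
    probs = 1.0 / (1.0 + np.exp(-(z_converted @ clf_weights)))   # classifier on latent code
    return kl_term + lam * binary_cross_entropy(probs, target_attrs)

rng = np.random.default_rng(2)
loss = training_loss(kl_term=0.3,
                     z_converted=rng.normal(size=16),
                     target_attrs=np.array([1.0, 0.0, 0.0, 0.0]),  # e.g., 'glasses' on
                     clf_weights=rng.normal(size=(16, 4)) * 0.1)
```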
  • the attribute classification network 60 may be used not only in the process of learning the transform network 40 , but also in the inference process of the transform network 40 .
  • Specifically, the processor 120 may determine, using the attribute classification network 60, whether the transformed image data generated based on the attribute transformation data produced by the transformation network 40 includes the target attribute value.
  • The processor 120 may then generate the transformed image 12 based on the transformed image data 22.
  • FIG. 5 is a diagram for explaining a converted image generated by an electronic device according to an embodiment of the present disclosure.
  • FIG. 5A is a diagram for explaining a case in which target attribute values such as glasses, bangs, and blonde hair are received as input for an image 510 including a person (an object), and FIG. 5B is a diagram for explaining a case in which target attribute values such as long sleeves, a wide neck, or a slim fit are received as input for an image including clothing (an object).
  • Meanwhile, FIGS. 5A and 5B also show, alongside the converted images produced using the image conversion method of the present disclosure, converted images produced by existing known image conversion methods.
  • FIG. 5A illustrates a case in which an input image 510 is converted by the conventional image conversion method ((a) and (b)) and a case in which the input image 510 is converted by the image conversion method according to the present disclosure (c).
  • images in the third row of the tables 510 , 520 and 530 represent converted images generated according to the image conversion method of the present disclosure.
  • In FIG. 5B, the first row of each table 550, 560, and 570 represents an input image including clothing (an object), the second row represents an image converted according to a conventional image conversion method, and the third row represents an image converted according to the image conversion method of the present disclosure. Specifically, FIG. 5B shows a table 550 for the case where the target attribute value is 'long sleeves', a table 560 for the case where the target attribute value is 'wide neck', and a table 570 for the case where the target attribute value is 'slim fit'.
  • As shown in FIGS. 5A and 5B, the processor 120 may obtain a converted image in which the target attribute value is reflected more accurately.
  • FIG. 6 is a block diagram illustrating a detailed configuration of an electronic device according to an embodiment of the present disclosure.
  • Referring to FIG. 6, the electronic device 100 according to the present disclosure may include a camera 130, a display 140, a communication interface 150, and an input interface 160 in addition to the memory 110 and the processor 120.
  • the camera 130 is a component for capturing an image.
  • Specifically, the camera 130 is a device capable of capturing still images and moving pictures, and may include one or more image sensors (e.g., a front sensor or a rear sensor), a lens, an image signal processor (ISP), and a flash (e.g., an LED or xenon lamp).
  • the display 140 is a component for displaying an image.
  • the display 140 may display the image 11 and the converted image 12 .
  • the display 140 may display a process in which the image 11 is converted into the converted image 12 .
  • Also, the display 140 may display an application screen for receiving a target attribute value, a graphic user interface (GUI) screen, and the like.
  • the display 140 may be implemented with various types of displays, such as a liquid crystal display (LCD), an organic light emitting diode (OLED) display, a plasma display panel (PDP), a wall, a micro LED, and the like.
  • the display 140 may be implemented as a touch screen combined with a touch sensor, a flexible display, a three-dimensional display (3D display), or the like.
  • the communication interface 150 is a component for the electronic device 100 to communicate with an external electronic device (not shown).
  • the electronic device 100 may receive an image from an external electronic device (not shown) through the communication interface 150 , convert the received image, and transmit the converted image to an external electronic device (not shown).
  • the communication interface 150 may include various communication modules such as a wired communication module (not shown), a short-range wireless communication module (not shown), and a wireless communication module (not shown).
  • the wired communication module is a module for performing communication with an external device (not shown) according to a wired communication method such as wired Ethernet.
  • the short-range wireless communication module is a module for performing communication with an external device (not shown) located in a short distance according to a short-range wireless communication method such as Bluetooth (Bluetooth, BT), BLE (Bluetooth Low Energy), or ZigBee method.
  • the wireless communication module is a module that is connected to an external network according to a wireless communication protocol such as WiFi and IEEE to communicate with an external device (not shown) and a voice recognition server (not shown).
  • the wireless communication module is configured according to various mobile communication standards such as 3rd Generation (3G), 3rd Generation Partnership Project (3GPP), Long Term Evolution (LTE), LTE Advanced (LTE-A), and 5G Networks. It may further include a mobile communication module for performing communication by accessing the mobile communication network.
  • the input interface 160 is a component for receiving a user input for controlling the electronic device 100 .
  • the input interface 160 may receive a user's input for the target attribute value.
  • Specifically, the input interface 160 may include a touch panel 161 for receiving a user's touch input using a hand or a stylus pen, a keyboard 162 for receiving a user's manipulation, and a microphone 163 for receiving a user's voice command.
  • the present invention is not necessarily limited thereto, and various input means such as a button (not shown) may be added according to an embodiment.
  • FIG. 7 is a flowchart illustrating a method of controlling an electronic device according to various embodiments of the present disclosure.
  • an input image may be converted into image data in the latent space by using a function that normalizes the image to the pre-generated data in the latent space ( S710 ).
  • the latent space represents a space created by reducing the dimension of a multidimensional image data set of an actual observation space or an image space.
  • the latent space may be a space generated while a flow-based generation model including a function trained to transform an input image into image data learns to transform an image into image data.
  • the attribute conversion data for changing the attribute value of the image included in the image data to the target attribute value may be obtained (S720).
  • the attribute conversion data may be data in the latent space.
  • the electronic device may receive a target attribute value and identify an attribute value corresponding to the target attribute value among a plurality of attribute values included in the image data.
  • Then, whether to convert the image data may be determined using the difference between the identified attribute value and the target attribute value. For example, when there is a difference between the target attribute value and the identified attribute value, it is determined that image transformation is necessary, and attribute transformation data may be extracted based on the difference between the target attribute value and the attribute value.
  • the attribute transformation data may be extracted using a transformation network trained to extract the attribute transformation data.
  • Specifically, the attribute transformation data may be extracted by inputting the difference value between the attribute value and the target attribute value into the transformation network.
  • the transformation network may learn a distribution of attribute transformation data for converting an attribute value of image data into a target attribute value in the learning step of the transformation network. Then, when the difference value between the attribute value and the target attribute value is input to the transformation network, the distribution of the attribute transformation data may be determined, and attribute transformation data satisfying a preset condition may be extracted from the determined distribution of the attribute transformation data.
  • converted image data for the image data may be generated (S730).
  • the converted image data may be generated by using the sum of the image data and the attribute conversion data.
  • the transformed image data may also be data in the latent space.
  • a converted image for the input image may be generated based on the converted image data (S740).
  • an inverse function of a function normalizing an image to data in the pre-generated latent space may be generated, and a transformed image for the input image may be generated by using transformed image data in the latent space as an input of the inverse function.
  • the transformed image may be obtained by inputting transformed image data into the flow-based generation model.
  • Since the flow-based generation model is an artificial intelligence model capable of inverse transformation, not only can image data be generated from an image, but an image can also be generated inversely from image data.
  • the transformation network can be learned by the prior network and the attribute classification network.
  • the prior network is an artificial intelligence model trained to output a reference distribution of attribute transformation data using the attribute value and target attribute value of an image.
  • a reference distribution of attribute transformation data may be obtained by inputting a difference value between an attribute value and a target attribute value into the prior network, and the transformation network may be trained using the obtained reference distribution.
  • the attribute classification network is an artificial intelligence model trained to receive image data and output the attribute values of images included in the image data. By inputting transformed image data generated by the transformation network to the attribute classification network, the transformation network may be trained so that the output attribute value matches the target attribute value.
  • The attribute classification network may determine whether transformed image data generated by the transformation network includes the target attribute value. For example, in the inference step of the transformation network, transformed image data may be acquired based on the attribute transformation data generated by the transformation network, and whether the acquired transformed image data includes the target attribute value may be determined, as in the sketch below.
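  • A sketch of that check, with the attribute classification network again stubbed as a logistic model on latent vectors (hypothetical names and threshold):

```python
import numpy as np

# Sketch of the inference-time check: accept the transformed latent
# code only if the (stub) classifier assigns the target attribute a
# sufficiently high probability. Parameterization is an assumption.
def has_target_attribute(z_converted, attr_index, clf_weights, threshold=0.5):
    probs = 1.0 / (1.0 + np.exp(-(z_converted @ clf_weights)))
    return bool(probs[attr_index] >= threshold)   # accept only confident conversions

rng = np.random.default_rng(0)
ok = has_target_attribute(rng.normal(size=16), attr_index=0,
                          clf_weights=rng.normal(size=(16, 4)) * 0.1)
```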
  • various operations described as being performed through the electronic device 100 may be performed through one or more electronic devices in the form of a control method or an operation method of the electronic device.
  • Meanwhile, the embodiments described in the present disclosure may be implemented using at least one of Application Specific Integrated Circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, and electrical units for performing other functions.
  • embodiments described herein may be implemented by the processor itself. According to the software implementation, embodiments such as the procedures and functions described in this specification may be implemented as separate software modules. Each of the above-described software modules may perform one or more functions and operations described herein.
  • computer instructions for performing a processing operation in a user device or an administrator device may be stored in a non-transitory computer-readable medium.
  • the computer instructions stored in the non-transitory computer-readable medium are executed by the processor of the specific device, the specific device performs the processing operation of the user device and/or the manager device according to the various embodiments described above.
  • the non-transitory readable medium refers to a medium that stores data semi-permanently, rather than a medium that stores data for a short moment, such as a register, cache, memory, and the like, and can be read by a device.
  • Specific examples of the non-transitory readable medium include a CD, a DVD, a hard disk, a Blu-ray disc, a USB, a memory card, a ROM, and the like.

Abstract

An electronic device is disclosed. The electronic device according to the present invention comprises: a memory in which information about a pre-generated latent space is stored; and a processor which converts an input image into image data in the latent space using a function for normalizing an image into data in the latent space, obtains attribute conversion data for changing an attribute value of an image included in the image data into a target attribute value, generates converted image data of the image data on the basis of the image data and the attribute conversion data, and generates a converted image of the input image on the basis of the converted image data. The attribute conversion data and the converted image data are data in the latent space.
PCT/KR2021/005652 2020-06-18 2021-05-06 Electronic device and control method therefor WO2021256702A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2020-0074103 2020-06-18
KR1020200074103A KR20210156470A (ko) 2020-06-18 2020-06-18 Electronic device and control method thereof

Publications (1)

Publication Number Publication Date
WO2021256702A1 (fr) 2021-12-23

Family

ID=79177305

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2021/005652 WO2021256702A1 (fr) 2020-06-18 2021-05-06 Dispositif électronique et son procédé de commande

Country Status (2)

Country Link
KR (1) KR20210156470A (fr)
WO (1) WO2021256702A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102543461B1 (ko) * 2022-04-29 2023-06-14 주식회사 이너버즈 Image adjustment method for selectively changing specific attributes using deep learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009146405A (ja) * 2007-12-12 2009-07-02 Fuji Xerox Co Ltd 人体姿勢推定システム、方法およびプログラム
KR102067340B1 (ko) * 2018-07-16 2020-01-16 한국과학기술원 유방암 병변 특성에 따른 유방 종괴 생성 방법 및 그 시스템
JP2020046984A (ja) * 2018-09-19 2020-03-26 株式会社Ye Digital 画像生成方法、画像判定方法、画像生成装置および画像生成プログラム
US20200151559A1 (en) * 2018-11-14 2020-05-14 Nvidia Corporation Style-based architecture for generative neural networks
KR20200058297A (ko) * 2018-11-19 2020-05-27 고려대학교 산학협력단 설명 가능한 소수샷 영상 분류 방법 및 장치

Also Published As

Publication number Publication date
KR20210156470A (ko) 2021-12-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21824856; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21824856; Country of ref document: EP; Kind code of ref document: A1)