CN115601555A - Image processing method and apparatus, device and medium

Info

Publication number
CN115601555A
CN115601555A
Authority
CN
China
Prior art keywords
image
original image
noise
pixel point
determining
Prior art date
Legal status
Pending
Application number
CN202211255869.1A
Other languages
Chinese (zh)
Inventor
薛松
辛颖
冯原
李超
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2022-10-13
Filing date: 2022-10-13
Publication date: 2023-01-13
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211255869.1A
Publication of CN115601555A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides an image processing method, an image processing apparatus, a device and a medium, which relate to the technical field of artificial intelligence, and in particular to the technical fields of computer vision, image processing, deep learning, etc. The implementation scheme is as follows: acquiring an original image to be processed and noise data corresponding to the original image, wherein the original image comprises a target object, and the noise data is used for interfering with image characteristics of the original image; determining a key feature region comprising the target object in the original image; determining noise weight corresponding to each pixel point in the original image based on the key feature region, wherein the noise weight corresponding to each pixel point in the key feature region is smaller than the noise weight corresponding to each pixel point in other regions; and adding the noise data to the original image based on the noise weight to obtain a target image.

Description

Image processing method and apparatus, device and medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to the field of computer vision, image processing, deep learning, and the like, and in particular, to an image processing method, an image processing apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
Background
Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it spans technologies at both the hardware level and the software level. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technologies, and the like.
For a deep-learning-based image recognition model, the more data that participates in training, the better the resulting model performs. By applying image processing to original images, more target images can be generated from a limited set of originals, thereby extending the training data set of the image recognition model.
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, unless otherwise indicated, the problems mentioned in this section should not be considered as having been acknowledged in any prior art.
Disclosure of Invention
The present disclosure provides an image processing method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
According to an aspect of the present disclosure, there is provided an image processing method including: acquiring an original image to be processed and noise data corresponding to the original image, wherein the original image comprises a target object, and the noise data is used for interfering with image characteristics of the original image; determining a key feature region comprising the target object in the original image; determining noise weight corresponding to each pixel point in the original image based on the key feature region, wherein the noise weight corresponding to each pixel point in the key feature region is smaller than the noise weight corresponding to each pixel point in other regions; and adding the noise data to the original image based on the noise weight to obtain a target image.
According to another aspect of the present disclosure, there is provided a training method of an image recognition model, including: acquiring a target image corresponding to the original image by using the image processing method; and training the image recognition model by using the original image and the target image.
According to another aspect of the present disclosure, there is provided an image recognition method including: inputting an image to be recognized into an image recognition model to obtain an image recognition result output by the image recognition model, wherein the image recognition model is obtained by utilizing the training method of the image recognition model.
According to another aspect of the present disclosure, there is provided an image processing apparatus including: an acquisition unit configured to acquire an original image to be processed and noise data corresponding to the original image, wherein the original image comprises a target object and the noise data is used for interfering with image characteristics of the original image; a first determining unit configured to determine a key feature region including the target object in the original image; a second determining unit configured to determine, based on the key feature region, a noise weight corresponding to each pixel point in the original image, where the noise weight corresponding to each pixel point in the key feature region is smaller than the noise weight corresponding to each pixel point in other regions; and an adding unit configured to add the noise data to the original image based on the noise weight to obtain a target image.
According to another aspect of the present disclosure, there is provided a training apparatus for an image recognition model, including: the image processing apparatus as described above, configured to acquire a target image corresponding to an original image; and a training unit configured to train the image recognition model using the original image and the target image.
According to another aspect of the present disclosure, there is provided an image recognition apparatus including: an image recognition model trained using the training apparatus described above; and an input unit configured to input an image to be recognized into the image recognition model to obtain an image recognition result output by the image recognition model.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any of the methods described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform any of the above methods.
According to another aspect of the disclosure, a computer program product is provided, comprising a computer program, wherein the computer program is capable of implementing any of the above-mentioned methods when executed by a processor.
According to one or more embodiments of the present disclosure, noise can be added to an original image while preventing the noise from affecting the image features of the target object in the original image.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the embodiments and, together with the description, serve to explain the exemplary implementations of the embodiments. The illustrated embodiments are for purposes of example only and do not limit the scope of the claims. Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.
FIG. 1 shows a schematic diagram of an exemplary system in which various methods described herein may be implemented, according to an exemplary embodiment of the present disclosure;
FIG. 2 shows a flow chart of an image processing method according to an exemplary embodiment of the present disclosure;
FIGS. 3A-3B illustrate an example of an original image and a thermodynamic diagram corresponding to the original image according to an exemplary embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of a training process of an image recognition model according to an example embodiment of the present disclosure;
FIG. 5 shows a structural block diagram of an image processing apparatus according to an exemplary embodiment of the present disclosure; and
FIG. 6 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the present disclosure, unless otherwise specified, the use of the terms "first", "second", etc. to describe various elements is not intended to limit the positional relationship, the timing relationship, or the importance relationship of the elements, and such terms are used only to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, based on the context, they may also refer to different instances.
The terminology used in the description of the various described examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of elements is not specifically limited, the elements may be one or more. Furthermore, the term "and/or" as used in this disclosure is intended to encompass any and all possible combinations of the listed items.
In the related art, color transformation, rotation, noise injection, image blending, random erasing, shifting, flipping, cropping, and the like are commonly applied to an original image to obtain a larger number of target images. In noise injection, noise data is superimposed on the original image. This approach, however, has a problem: it does not take into account the content of different regions of the original image. For an image containing a specific target object, superimposing noise data on the region corresponding to that object strongly interferes with the image features of the target object and impairs its recognition.
Based on this, the present disclosure provides an image processing method that determines, for an original image containing a target object, a key feature region that includes the target object, and adds noise with different weights to the key feature region and to the other regions. This reduces the influence of the noise on the target object, so that the resulting target image retains more accurate image features of the target object.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented in accordance with embodiments of the present disclosure. Referring to fig. 1, the system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120. Client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more applications.
In an embodiment of the present disclosure, the server 120 may run one or more services or software applications that enable the image processing method to be performed.
In some embodiments, the server 120 may also provide other services or software applications, which may include non-virtual environments and virtual environments. In some embodiments, these services may be provided as web-based services or cloud services, for example, provided to users of client devices 101, 102, 103, 104, 105, and/or 106 under a software as a service (SaaS) model.
In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof, which may be executed by one or more processors. A user operating a client device 101, 102, 103, 104, 105, and/or 106 may, in turn, utilize one or more client applications to interact with the server 120 to take advantage of the services provided by these components. It should be understood that a variety of different system configurations are possible, which may differ from system 100. Accordingly, fig. 1 is one example of a system for implementing the various methods described herein, and is not intended to be limiting.
The user may use client devices 101, 102, 103, 104, 105, and/or 106 to obtain raw images to be processed. The client device may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via the interface. Although fig. 1 depicts only six client devices, those skilled in the art will appreciate that any number of client devices may be supported by the present disclosure.
Client devices 101, 102, 103, 104, 105, and/or 106 may include various types of computer devices, such as portable handheld devices, general purpose computers (such as personal computers and laptops), workstation computers, wearable devices, smart screen devices, self-service terminal devices, service robots, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and so forth. These computer devices may run various types and versions of software applications and operating systems, such as MICROSOFT Windows, APPLE iOS, UNIX-like operating systems, Linux, or Linux-like operating systems (e.g., GOOGLE Chrome OS); or include various mobile operating systems such as MICROSOFT Windows Mobile OS, iOS, Windows Phone, and Android. Portable handheld devices may include cellular telephones, smart phones, tablets, Personal Digital Assistants (PDAs), and the like. Wearable devices may include head-mounted displays (such as smart glasses) and other devices. The gaming system may include a variety of handheld gaming devices, Internet-enabled gaming devices, and the like. The client device is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), and Short Message Service (SMS) applications, and may use a variety of communication protocols.
Network 110 may be any type of network known to those skilled in the art that can support data communications using any of a variety of available protocols, including but not limited to TCP/IP, SNA, IPX, and the like. By way of example only, the one or more networks 110 may be a Local Area Network (LAN), an Ethernet-based network, a token ring, a Wide Area Network (WAN), the internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a blockchain network, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., Bluetooth, WiFi), and/or any combination of these and/or other networks.
The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, midrange servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architectures involving virtualization (e.g., one or more flexible pools of logical storage that may be virtualized to maintain virtual storage for the server). In various embodiments, the server 120 may run one or more services or software applications that provide the functionality described below.
The computing units in server 120 may run one or more operating systems including any of the operating systems described above, as well as any commercially available server operating systems. The server 120 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, and the like.
In some implementations, the server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of the client devices 101, 102, 103, 104, 105, and 106. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of client devices 101, 102, 103, 104, 105, and 106.
In some embodiments, the server 120 may be a server of a distributed system, or a server incorporating a blockchain. The server 120 may also be a cloud server, or a smart cloud computing server or smart cloud host with artificial intelligence technology. A cloud server is a host product in a cloud computing service system that addresses the drawbacks of difficult management and weak service scalability found in conventional physical hosts and Virtual Private Server (VPS) services.
The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of the databases 130 may be used to store information such as audio files and video files. The database 130 may reside in various locations. For example, the database used by the server 120 may be local to the server 120, or may be remote from the server 120 and may communicate with the server 120 via a network-based or dedicated connection. The database 130 may be of different types. In certain embodiments, the database used by the server 120 may be, for example, a relational database. One or more of these databases may store, update, and retrieve data to and from the database in response to the command.
In some embodiments, one or more of the databases 130 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key-value stores, object stores, or regular stores supported by a file system.
The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.
Fig. 2 shows a flowchart of an image processing method 200 according to an exemplary embodiment of the present disclosure. As shown in fig. 2, method 200 includes:
step S201, acquiring an original image to be processed and noise data corresponding to the original image, wherein the original image comprises a target object, and the noise data is used for interfering with image characteristics of the original image;
step S202, determining a key feature area comprising the target object in the original image;
step S203, determining a noise weight corresponding to each pixel point in the original image based on the key feature region, wherein the noise weight corresponding to each pixel point in the key feature region is smaller than the noise weight corresponding to each pixel point in other regions; and
step S204, adding the noise data to the original image based on the noise weight to obtain a target image.
By determining the key feature region containing the target object in the original image and then adding noise with different weights to the key feature region and to the other regions, the influence of the noise data on the target object can be reduced, so that more accurate image features of the target object are retained in the resulting target image. When the number of images in the original image data set is limited, this method makes it possible to obtain noise-added target images from the original images, expanding the image data set and increasing the diversity of the image data while retaining the key image features of the target object.
In some examples, the noise data corresponding to the original image described in step S201 may be random noise or periodic noise, and the noise is added to interfere with the image characteristics of the original image to obtain a target image different from the original image. In an actual application scenario, the noise data corresponding to the original image may also be configured manually, so as to control parameters such as frequency and amplitude of the noise data according to requirements.
According to some embodiments, determining the key feature region including the target object in the original image in step S202 includes: inputting the original image into an image feature extraction network to acquire image feature information, output by the network, that is associated with the position of the target object, wherein the image feature extraction network is trained using training images containing the target object; determining a thermodynamic diagram (heat map) corresponding to the original image based on the image feature information, wherein the pixel value at each position of the thermodynamic diagram represents the probability that the corresponding pixel point in the original image is associated with the target object; and, for each pixel point in the original image, determining that the pixel point belongs to the key feature region when the pixel value at the corresponding position in the thermodynamic diagram is not smaller than a preset threshold. In this way, the image feature information associated with the position of the target object is obtained using the image feature extraction network, a thermodynamic diagram indicating the position of the target object in the original image is derived from that information, and the key feature region in the original image is determined from the thermodynamic diagram, which is both accurate and efficient.
In one example, the image feature extraction network is a convolutional neural network comprising a plurality of channels. Inputting the original image into the convolutional neural network yields a plurality of feature maps corresponding to the original image, whose pixel values indicate the position of the target object; a thermodynamic diagram corresponding to the original image can then be obtained using the Grad-CAM algorithm to represent the probability that each pixel point in the original image is associated with the target object.
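As a concrete illustration, below is a minimal Grad-CAM sketch in Python/PyTorch. It is not the disclosure's prescribed implementation: the choice of convolutional layer, the use of the predicted class when none is given, and the min-max normalization to [0, 1] are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def grad_cam_heatmap(model, conv_layer, image, class_idx=None):
    """Compute a Grad-CAM heat map (thermodynamic diagram) for a 1 x C x H x W image."""
    activations, gradients = {}, {}

    def fwd_hook(module, inputs, output):
        activations["a"] = output

    def bwd_hook(module, grad_input, grad_output):
        gradients["g"] = grad_output[0]

    h1 = conv_layer.register_forward_hook(fwd_hook)
    h2 = conv_layer.register_full_backward_hook(bwd_hook)
    try:
        scores = model(image)                        # 1 x num_classes
        if class_idx is None:
            class_idx = scores.argmax(dim=1).item()  # assume the target object class
        model.zero_grad()
        scores[0, class_idx].backward()
    finally:
        h1.remove()
        h2.remove()

    # Per-channel weights: gradients global-average-pooled over space.
    w = gradients["g"].mean(dim=(2, 3), keepdim=True)            # 1 x K x 1 x 1
    cam = F.relu((w * activations["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                        align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)     # normalize to [0, 1]
    return cam[0, 0]                                             # H x W heat map
```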
Figs. 3A-3B illustrate an original image 301 and an example of a thermodynamic diagram 302 corresponding to the original image according to an exemplary embodiment of the present disclosure. The pixel value at each location of the thermodynamic diagram 302 is visualized as a gray level: the lower the gray level, the higher the probability that the corresponding pixel point in the original image 301 is associated with the target object. Whether each pixel point in the original image belongs to the key feature region is determined by comparing the pixel value at the corresponding position in the thermodynamic diagram with a preset threshold, as in the sketch below.
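A minimal sketch of this thresholding, assuming the thermodynamic diagram has been normalized to [0, 1]; the threshold of 0.5 and the binary 0/1 weights are hypothetical defaults that anticipate the embodiment discussed further below.

```python
import numpy as np

def noise_weights(heatmap, threshold=0.5, key_weight=0.0, other_weight=1.0):
    """Per-pixel noise weights derived from a thermodynamic diagram (H x W).

    A pixel whose heat-map value is not smaller than the preset threshold is
    taken to belong to the key feature region and receives the smaller weight;
    every other pixel receives the larger weight.
    """
    key_region = heatmap >= threshold  # True inside the key feature region
    return np.where(key_region, key_weight, other_weight)
```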
According to some embodiments, acquiring the noise data corresponding to the original image in step S201 includes: determining feature gradient information of the original image based on the image feature information; and determining the noise data corresponding to the original image based on the feature gradient information. Deriving the noise data from the feature gradient information of the original image allows the noise to interfere effectively with the image features of the original image and improves the data diversity of the resulting target images.
In some examples, the feature gradient information of the original image may be determined using the Fast Gradient Sign Method (FGSM) to produce the noise data. For example, structural parameter information of the image feature extraction network may be obtained, the derivative of the model's loss with respect to the input computed on that basis, and the gradient direction at each position in the original image obtained with a sign function; corresponding noise data can then be generated along the direction opposite to the gradient, so that the noise data effectively influences the feature extraction and recognition results for the original image.
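A minimal FGSM-style sketch in PyTorch, under the assumption that the image feature extraction network is a classifier trained with cross-entropy; the loss choice and the sign convention of the final perturbation are illustrative rather than mandated by the disclosure.

```python
import torch
import torch.nn.functional as F

def fgsm_noise(model, image, label, epsilon=0.01):
    """Fast Gradient Sign Method noise for a 1 x C x H x W image.

    The derivative of the loss with respect to the input gives the feature
    gradient information; its sign gives the gradient direction at each
    position, and the preset step size epsilon scales the noise amplitude.
    """
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    model.zero_grad()
    loss.backward()
    # Noise along the sign of the gradient (negate here to follow the
    # "opposite direction" convention described above, if desired).
    noise = epsilon * image.grad.sign()
    return noise.detach()
```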
According to some embodiments, the noise data corresponding to the original image may be determined based on the feature gradient information and a preset step size. The amplitude of the noise data can then be adjusted by adjusting the preset step size, so that the proportion of noise data in the target image can be controlled flexibly and the diversity of the target images improved. For example, the feature gradient at each position in the original image may be multiplied by the preset step size, and the product used as the noise data for that position, thereby obtaining the noise data for the whole original image.
According to some embodiments, the noise data is a noise image of the same size as the original image, and adding the noise data to the original image based on the noise weight in step S204 includes: multiplying the pixel values of the pixel points in the noise image by the respective noise weights to obtain a weighted noise image; and superimposing the weighted noise image on the original image to obtain the target image. The weighted noise image is thus obtained simply and quickly: its pixel values are small in the region corresponding to the key feature region of the original image and larger elsewhere, so superimposing it on the original image adds large-amplitude noise to the other regions and only small-amplitude noise to the key feature region, reducing the interference of the noise with the target object in the image.
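A sketch of this weighting-and-superposition step; the H x W x C layout and the float pixel range of [0, 1] are assumptions.

```python
import numpy as np

def add_weighted_noise(original, noise, weights):
    """Superimpose a weighted noise image on the original image.

    original, noise: H x W x C arrays of the same size
    weights:         H x W per-pixel noise weights (zero or small inside the
                     key feature region, larger in the other regions)
    """
    weighted_noise = noise * weights[..., np.newaxis]    # weight each pixel's noise
    return np.clip(original + weighted_noise, 0.0, 1.0)  # superimpose and clamp
```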
In some examples, the noise data may also be in other forms, such as noise of a constant value that is superimposed with pixel values in the original image to achieve interference with image features of the original image.
In some embodiments, the noise weight corresponding to each pixel point in the key feature region is zero, and the noise weight corresponding to each pixel point in the other regions is greater than zero. Therefore, noise is not added to the key feature region in the original image, so that accurate image features of the target object can be reserved in the target image, and the influence on the identification result of the target object is avoided.
In some examples, the noise weight corresponding to each pixel point in the key feature region is zero, and the noise weight corresponding to each pixel point in the other regions is 1. When the key feature region in the original image is determined using the thermodynamic diagram corresponding to the original image, the noise weight of the pixel point corresponding to each position in the original image can be determined directly from the pixel value at that position in the thermodynamic diagram. For example, for each pixel point in the original image, the noise weight of the pixel point is determined to be 0 in response to the pixel value of the corresponding position in the thermodynamic diagram being not less than the preset threshold, and to be 1 in response to that pixel value being less than the preset threshold.
According to an aspect of the present disclosure, there is also provided a training method of an image recognition model, including: acquiring a target image corresponding to the original image by using the image processing method 200; and training the image recognition model by using the original image and the target image. Therefore, more target images can be generated based on limited original images, so that the extension of a training data set of the image recognition model is realized, and the training of the image recognition model is carried out by using more training data. In addition, as described above, the target image can retain more accurate image features of the target object, and the non-key feature region contains certain noise data, so that the diversity of data can be increased without changing the target object features in the training data, and the generalization capability and robustness of the model can be improved without affecting the recognition accuracy of the model on the target object.
Fig. 4 shows a schematic diagram of a training process of an image recognition model according to an exemplary embodiment of the present disclosure. By acquiring the feature gradient information of the original image, corresponding noise data can be generated along the direction opposite to the gradient at each position in the original image, yielding a noise image of the same size as the original image. The pixel values of the pixel points in the noise image are then multiplied by the corresponding noise weights to obtain a weighted noise image, which is superimposed on the original image to obtain the target image. The image recognition model is trained on an extended training data set containing both the original image and the target image.
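Putting the pieces together, below is a hypothetical end-to-end sketch of the training-set extension and training loop. The helpers `grad_cam_heatmap`, `noise_weights`, `fgsm_noise`, and `add_weighted_noise` are the illustrative sketches from earlier sections, and the data layouts, loss, and per-image processing are assumptions rather than anything fixed by the disclosure.

```python
import torch
import torch.nn.functional as F

def make_target_image(model, conv_layer, image, label, epsilon=0.01, threshold=0.5):
    """Build one noise-augmented target image from an original 1 x C x H x W image."""
    heatmap = grad_cam_heatmap(model, conv_layer, image, label.item())
    weights = noise_weights(heatmap.detach().cpu().numpy(), threshold)
    noise = fgsm_noise(model, image, label, epsilon)

    original = image[0].permute(1, 2, 0).detach().cpu().numpy()  # H x W x C
    noisy = noise[0].permute(1, 2, 0).cpu().numpy()
    target = add_weighted_noise(original, noisy, weights)
    return torch.from_numpy(target).float().permute(2, 0, 1)    # back to C x H x W

def train_on_extended_set(model, conv_layer, loader, optimizer, epochs=1):
    """Train on the extended set: every original image plus its target image."""
    for _ in range(epochs):
        for images, labels in loader:
            targets = torch.stack([
                make_target_image(model, conv_layer, img.unsqueeze(0), lbl.view(1))
                for img, lbl in zip(images, labels)
            ])
            batch = torch.cat([images, targets])        # originals + target images
            batch_labels = torch.cat([labels, labels])  # labels are unchanged
            optimizer.zero_grad()
            loss = F.cross_entropy(model(batch), batch_labels)
            loss.backward()
            optimizer.step()
```

In practice the target images could equally be precomputed once and stored rather than regenerated on every epoch; regenerating them inline is shown here only to keep the sketch self-contained.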
In some examples, when the image feature extraction network is applied in the image processing method 200 to obtain the thermodynamic diagram and the feature gradient information corresponding to the original image, so as to obtain the target image, the image feature extraction network may have a network structure the same as or similar to that of the image recognition model to be trained. Therefore, the noise data can be further prevented from influencing the recognition accuracy of the image recognition model on the target object, and the training effect is improved.
According to an aspect of the present disclosure, there is also provided an image recognition method, including: inputting an image to be recognized into an image recognition model to obtain an image recognition result output by the image recognition model, wherein the image recognition model is obtained by training through the image recognition model training method. By using the performance optimized image recognition model, the accuracy of image recognition can be improved.
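Recognition with the trained model is then ordinary inference; a minimal sketch, where reading the result as an argmax over class scores is an assumption about the model's output format:

```python
import torch

def recognize(model, image):
    """Input an image to be recognized; return the image recognition result."""
    model.eval()
    with torch.no_grad():
        scores = model(image)             # 1 x num_classes
    return scores.argmax(dim=1).item()    # index of the recognized class
```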
According to an aspect of the present disclosure, an image processing apparatus is also provided. Fig. 5 illustrates a block diagram of the structure of an image processing apparatus 500 according to an exemplary embodiment of the present disclosure. As shown in fig. 5, the apparatus 500 includes: an acquiring unit 501 configured to acquire an original image to be processed and noise data corresponding to the original image, wherein the original image includes a target object and the noise data is used for interfering with image features of the original image; a first determining unit 502 configured to determine a key feature region including the target object in the original image; a second determining unit 503 configured to determine a noise weight corresponding to each pixel point in the original image based on the key feature region, wherein the noise weight corresponding to each pixel point in the key feature region is smaller than the noise weight corresponding to each pixel point in other regions; and an adding unit 504 configured to add the noise data to the original image based on the noise weight to obtain a target image.
According to some embodiments, the first determination unit 502 comprises: a first input subunit, configured to input the original image into an image feature extraction network to obtain image feature information associated with a position of the target object, where the image feature extraction network is obtained by training using a training image containing the target object; a first determining subunit, configured to determine, based on the image feature information, a thermodynamic diagram corresponding to the original image, where a pixel value of each location in the thermodynamic diagram can represent a probability that the pixel point corresponding to the location in the original image is associated with the target object; and the second determining subunit is configured to, for each pixel point in the original image, determine that the pixel point belongs to a key feature region in response to that a pixel value of a position corresponding to the pixel point in the thermodynamic diagram is not less than a preset threshold.
According to some embodiments, the obtaining unit 501 comprises: a third determining subunit configured to determine feature gradient information of the original image based on the image feature information; and a fourth determining subunit, configured to determine, based on the feature gradient information, noise data corresponding to the original image.
According to some embodiments, the fourth determining subunit is configured to determine noise data corresponding to the original image based on the feature gradient information and a preset step size.
According to some embodiments, the noise data is a noise image of the same size as the original image, and the adding unit 504 includes: a multiplication subunit configured to multiply the pixel values of a plurality of pixel points in the noise image by the noise weights respectively to obtain a weighted noise image; and an overlaying subunit configured to superimpose the weighted noise image on the original image to obtain the target image.
According to some embodiments, the noise weight corresponding to each pixel point in the key feature region is zero, and the noise weight corresponding to each pixel point in the other region is greater than zero.
The operations of the units 501 to 504 of the image processing apparatus 500 are similar to those of the steps S201 to S204 described above, and are not described herein again.
According to an aspect of the present disclosure, there is also provided a training apparatus for an image recognition model, including: the image processing apparatus 500 as described above, configured to acquire a target image corresponding to an original image; and a training unit configured to train the image recognition model using the original image and the target image.
According to an aspect of the present disclosure, there is also provided an image recognition apparatus including: an image recognition model trained using the training apparatus for an image recognition model described above; and an input unit configured to input an image to be recognized into the image recognition model to obtain an image recognition result output by the image recognition model.
According to another aspect of the present disclosure, there is also provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the image processing method described above.
According to another aspect of the present disclosure, there is also provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the image processing method described above.
According to another aspect of the present disclosure, there is also provided a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the image processing method described above.
Referring to fig. 6, a structural block diagram of an electronic device 600, which can serve as a server or a client of the present disclosure and is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described. The electronic device is intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606, an output unit 607, a storage unit 608, and a communication unit 609. The input unit 606 may be any type of device capable of inputting information to the device 600; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a track pad, a track ball, a joystick, a microphone, and/or a remote control. The output unit 607 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 608 may include, but is not limited to, a magnetic disk or an optical disk. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication transceiver, and/or a chipset, such as a Bluetooth™ device, an 802.11 device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 executes the respective methods and processes described above, such as the image processing method. For example, in some embodiments, the image processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the image processing method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the image processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in a different order, and no limitation is imposed herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the above-described methods, systems, and apparatus are merely exemplary embodiments or examples, and that the scope of the present invention is not limited by these embodiments or examples but only by the claims as issued and their equivalents. Various elements in the embodiments or examples may be omitted or may be replaced with equivalents thereof. Further, the steps may be performed in an order different from that described in the present disclosure. Further, various elements in the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described herein may be replaced by equivalent elements that appear after the present disclosure.

Claims (19)

1. An image processing method comprising:
acquiring an original image to be processed and noise data corresponding to the original image, wherein the original image comprises a target object, and the noise data is used for interfering with image characteristics of the original image;
determining a key feature region comprising the target object in the original image;
determining noise weight corresponding to each pixel point in the original image based on the key feature region, wherein the noise weight corresponding to each pixel point in the key feature region is smaller than the noise weight corresponding to each pixel point in other regions; and
adding the noise data to the original image based on the noise weight to obtain a target image.
2. The method of claim 1, wherein the determining a key feature region in the original image that includes the target object comprises:
inputting the original image into an image feature extraction network to obtain image feature information which is output by the image feature extraction network and is associated with the position of the target object, wherein the image feature extraction network is obtained by training with a training image containing the target object;
determining a thermodynamic diagram corresponding to the original image based on the image characteristic information, wherein the pixel value of each position in the thermodynamic diagram can represent the probability that the pixel point corresponding to the position in the original image is associated with the target object; and
for each pixel point in the original image, determining that the pixel point belongs to the key feature region in response to the pixel value of the position corresponding to the pixel point in the thermodynamic diagram being not smaller than a preset threshold.
3. The method of claim 2, wherein said obtaining noise data corresponding to the original image comprises:
determining feature gradient information of the original image based on the image feature information; and
determining noise data corresponding to the original image based on the feature gradient information.
4. The method of claim 3, wherein the determining noise data corresponding to the original image based on the feature gradient information comprises:
determining noise data corresponding to the original image based on the feature gradient information and a preset step size.
5. The method of any of claims 1-4, wherein the noise data is a noise image of the same size as the original image, the adding the noise data to the original image based on the noise weight comprising:
multiplying the pixel values of a plurality of pixel points in the noise image with the noise weight respectively to obtain a weighted noise image; and
superposing the weighted noise image on the original image to obtain the target image.
6. The method of any one of claims 1-5, wherein the noise weight corresponding to each pixel point in the key feature region is zero and the noise weight corresponding to each pixel point in the other regions is greater than zero.
7. A training method of an image recognition model comprises the following steps:
acquiring a target image corresponding to the original image by using the method of any one of claims 1-6; and
training the image recognition model by using the original image and the target image.
8. An image recognition method, comprising:
inputting an image to be recognized into an image recognition model to obtain an image recognition result output by the image recognition model, wherein the image recognition model is obtained by training by using the method of claim 7.
9. An image processing apparatus comprising:
an acquisition unit configured to acquire an original image to be processed and noise data corresponding to the original image, wherein the original image comprises a target object and the noise data is used for interfering with the image characteristics of the original image;
a first determining unit configured to determine a key feature region including the target object in the original image;
a second determining unit configured to determine a noise weight corresponding to each pixel point in the original image based on the key feature region, wherein the noise weight corresponding to each pixel point in the key feature region is smaller than the noise weight corresponding to each pixel point in other regions; and
an adding unit configured to add the noise data to the original image based on the noise weight to obtain a target image.
10. The apparatus of claim 9, wherein the first determining unit comprises:
a first input subunit, configured to input the original image into an image feature extraction network to obtain image feature information associated with the position of the target object, where the image feature extraction network is obtained by training using a training image that includes the target object;
a first determining subunit configured to determine, based on the image feature information, a heat map corresponding to the original image, wherein the pixel value at each location in the heat map represents the probability that the pixel point at the corresponding location in the original image is associated with the target object; and
a second determining subunit configured to determine, for each pixel point in the original image, that the pixel point belongs to the key feature region in response to the pixel value at the location corresponding to the pixel point in the heat map being not less than a preset threshold.
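Claim 10 locates the key feature region by thresholding a heat map derived from the image features. A sketch assuming the heat map is obtained by channel-averaging the feature activations and upsampling to image size (a Grad-CAM-like simplification; the threshold value is also an assumption):

```python
import torch
import torch.nn.functional as F

def key_feature_region(feature_model, image, threshold=0.5):
    """Boolean H x W mask of pixels whose heat-map value >= threshold."""
    with torch.no_grad():
        features = feature_model(image.unsqueeze(0))   # 1 x C x h x w
    heat = features.mean(dim=1, keepdim=True)          # average over channels
    heat = F.interpolate(heat, size=image.shape[-2:],  # upsample to image size
                         mode="bilinear", align_corners=False)
    heat = (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)  # to [0, 1]
    return heat[0, 0] >= threshold                     # claim 10's threshold test
```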
11. The apparatus of claim 10, wherein the obtaining unit comprises:
a third determining subunit configured to determine feature gradient information of the original image based on the image feature information; and
a fourth determining subunit, configured to determine, based on the feature gradient information, noise data corresponding to the original image.
12. The apparatus of claim 11, wherein the fourth determining subunit is configured to:
determine noise data corresponding to the original image based on the feature gradient information and a preset step length.
13. The apparatus of any one of claims 9-12, wherein the noise data is a noise image of the same size as the original image, and the adding unit comprises:
a multiplication subunit configured to multiply the pixel values of a plurality of pixel points in the noise image by the respective noise weights to obtain a weighted noise image; and
a superposition subunit configured to superpose the weighted noise image on the original image to obtain the target image.
14. The apparatus of any one of claims 9-13, wherein the noise weight corresponding to each pixel point in the key feature region is zero and the noise weight corresponding to each pixel point in the other regions is greater than zero.
15. An apparatus for training an image recognition model, comprising:
the apparatus of any one of claims 9-14, configured to acquire a target image corresponding to an original image; and
a training unit configured to train the image recognition model using the original image and the target image.
16. An image recognition apparatus comprising:
an image recognition model trained using the apparatus of claim 15; and
an input unit configured to input an image to be recognized into the image recognition model to obtain an image recognition result output by the image recognition model.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-8.
19. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-8.
CN202211255869.1A 2022-10-13 2022-10-13 Image processing method and apparatus, device and medium Pending CN115601555A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211255869.1A CN115601555A (en) 2022-10-13 2022-10-13 Image processing method and apparatus, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211255869.1A CN115601555A (en) 2022-10-13 2022-10-13 Image processing method and apparatus, device and medium

Publications (1)

Publication Number Publication Date
CN115601555A true CN115601555A (en) 2023-01-13

Family

ID=84847102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211255869.1A Pending CN115601555A (en) 2022-10-13 2022-10-13 Image processing method and apparatus, device and medium

Country Status (1)

Country Link
CN (1) CN115601555A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116205819A (en) * 2023-03-23 2023-06-02 北京百度网讯科技有限公司 Character image generation method, training method and device of deep learning model
CN116205819B (en) * 2023-03-23 2024-02-09 北京百度网讯科技有限公司 Character image generation method, training method and device of deep learning model

Similar Documents

Publication Publication Date Title
CN114743196B (en) Text recognition method and device and neural network training method
CN114972958B (en) Key point detection method, neural network training method, device and equipment
CN115511779B (en) Image detection method, device, electronic equipment and storage medium
CN114445667A (en) Image detection method and method for training image detection model
CN112784985A (en) Training method and device of neural network model, and image recognition method and device
CN114924862A (en) Task processing method, device and medium implemented by integer programming solver
CN114723949A (en) Three-dimensional scene segmentation method and method for training segmentation model
CN114550313A (en) Image processing method, neural network, and training method, device, and medium thereof
CN115601555A (en) Image processing method and apparatus, device and medium
CN116245998B (en) Rendering map generation method and device, and model training method and device
CN115600646B (en) Language model training method, device, medium and equipment
CN114913549B (en) Image processing method, device, equipment and medium
CN116152607A (en) Target detection method, method and device for training target detection model
CN115578501A (en) Image processing method, image processing device, electronic equipment and storage medium
CN115393514A (en) Training method of three-dimensional reconstruction model, three-dimensional reconstruction method, device and equipment
CN114429678A (en) Model training method and device, electronic device and medium
CN114494797A (en) Method and apparatus for training image detection model
CN114092556A (en) Method, apparatus, electronic device, medium for determining human body posture
CN112784912A (en) Image recognition method and device, and training method and device of neural network model
CN115512131B (en) Image detection method and training method of image detection model
CN115578451B (en) Image processing method, training method and device of image processing model
CN114882331A (en) Image processing method, apparatus, device and medium
CN114118379B (en) Neural network training method, image processing method, device, equipment and medium
CN113920304A (en) Sample image processing method, sample image processing device, electronic device, and medium
CN114511757A (en) Method and apparatus for training image detection model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination