WO2020228405A1 - Image processing method and apparatus, and electronic device - Google Patents

Image processing method and apparatus, and electronic device

Info

Publication number
WO2020228405A1
Authority
WO
WIPO (PCT)
Prior art keywords
image, layer, sampling, convolutional, layers
Application number
PCT/CN2020/079192
Other languages
French (fr)
Chinese (zh)
Inventor
李华夏
Original Assignee
北京字节跳动网络技术有限公司
Application filed by 北京字节跳动网络技术有限公司
Publication of WO2020228405A1 publication Critical patent/WO2020228405A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Definitions

  • the present disclosure relates to the field of data processing technology, and in particular to an image processing method, device and electronic equipment.
  • image processing tasks can be completed by artificial intelligence.
  • neural networks have been widely applied in the field of computer image recognition, for example, to recognize different people in an image or to automatically recognize different objects on the road in autonomous driving. These tasks all fall under image semantic recognition.
  • image semantic recognition involves image semantic segmentation. Image semantic segmentation is generally modeled as a pixel-level multi-classification problem, whose goal is to assign each pixel of an image to one of multiple predefined categories.
  • the embodiments of the present disclosure provide an image processing method, device, and electronic device, which at least partially solve the problems in the prior art.
  • embodiments of the present disclosure provide an image processing method, including:
  • a segmentation network for performing image processing on the first image is set.
  • the segmentation network includes a plurality of convolutional layers and down-sampling layers.
  • the convolutional layers and the down-sampling layers are arranged alternately.
  • the convolutional layer performs feature extraction on the target object in the first image, and the down-sampling layer performs a down-sampling operation on the image output by the convolutional layer;
  • after the second down-sampling layer in the segmentation network, a plurality of parallel convolutional layers with different sampling rates are set, and the parallel convolutional layers are used to process the image output by the second down-sampling layer.
  • the image features extracted on each parallel convolutional layer are fused to form a second image;
  • by performing target recognition on the second image, a third image containing the target object is acquired.
  • the performing target recognition on the second image includes:
  • a third down-sampling layer is provided, and the third down-sampling layer performs a down-sampling operation on the second image.
  • the performing target recognition on the second image further includes:
  • after the third down-sampling layer, a plurality of up-sampling layers are set, and the up-sampling layers perform an up-sampling operation on the image output by the third down-sampling layer.
  • the performing target recognition on the second image further includes:
  • target recognition is performed on the image output by the upsampling layer.
  • the method further includes:
  • connecting convolutional layers that output the same image size includes:
  • the convolutional layers are connected based on the residual function.
  • connecting the convolutional layers based on the residual function includes:
  • the image features extracted on each parallel convolutional layer are fused to form a second image, including:
  • different weight values are assigned to the multiple feature vector matrices, and the sum of the feature vector matrices weighted by these different weight values is used as the representation matrix of the second image.
  • an image processing device including:
  • An obtaining module used to obtain the first image containing the target object
  • the setting module is configured to set a segmentation network for performing image processing on the first image.
  • the segmentation network includes a plurality of convolutional layers and downsampling layers.
  • the convolutional layers and the down-sampling layers are arranged alternately, the convolutional layer performs feature extraction on the target object in the first image, and the down-sampling layer performs a down-sampling operation on the image output by the convolutional layer;
  • the processing module is used to set multiple parallel convolutional layers with different sampling rates after the second down-sampling layer in the segmentation network.
  • the parallel convolutional layers are used to process the image output by the second down-sampling layer.
  • the image features extracted on each parallel convolutional layer are fused to form a second image;
  • the execution module is configured to obtain a third image containing the target object by performing target recognition on the second image.
  • an embodiment of the present disclosure also provides an electronic device, which includes:
  • At least one processor and,
  • a memory communicatively connected with the at least one processor; wherein,
  • the memory stores instructions that can be executed by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the image processing method in the foregoing first aspect or any implementation of the first aspect.
  • embodiments of the present disclosure also provide a non-transitory computer-readable storage medium that stores computer instructions, the computer instructions being used to cause a computer to execute the image processing method in the foregoing first aspect or any implementation of the first aspect.
  • the embodiments of the present disclosure also provide a computer program product.
  • the computer program product includes a computer program stored on a non-transitory computer-readable storage medium.
  • the computer program includes program instructions that, when executed by a computer, cause the computer to execute the image processing method in the foregoing first aspect or any implementation of the first aspect.
  • the image processing solution in the embodiments of the present disclosure includes acquiring a first image containing a target object; setting a segmentation network for performing image processing on the first image, the segmentation network including multiple convolutional layers and down-sampling layers, the convolutional layers and the down-sampling layers being arranged alternately, the convolutional layer performing feature extraction on the target object in the first image, and the down-sampling layer performing a down-sampling operation on the image output by the convolutional layer; and setting, after the second down-sampling layer in the segmentation network, multiple parallel convolutional layers with different sampling rates.
  • the parallel convolutional layers are used to process the image output by the second down-sampling layer, and the image features extracted on each parallel convolutional layer are fused to form a second image; by performing target recognition on the second image, a third image containing the target object is obtained.
  • FIG. 1 is a schematic diagram of an image processing flow provided by an embodiment of the disclosure
  • FIG. 2 is a schematic diagram of a neural network model provided by an embodiment of the disclosure.
  • FIG. 3 is a schematic diagram of another image processing flow provided by an embodiment of the disclosure.
  • FIG. 4 is a schematic diagram of another image processing flow provided by an embodiment of the disclosure.
  • FIG. 5 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the disclosure.
  • FIG. 6 is a schematic diagram of an electronic device provided by an embodiment of the disclosure.
  • the embodiment of the present disclosure provides an image processing method.
  • the image processing method provided in this embodiment can be executed by a computing device, and the computing device can be implemented as software, or as a combination of software and hardware, and the computing device can be integrated in a server, terminal device, etc.
  • an image processing method provided by an embodiment of the present disclosure includes the following steps:
  • S101 Acquire a first image containing a target object.
  • the target object is the content to be acquired by the solution of the present disclosure.
  • the target object may be a person with various actions, an animal with behavior characteristics, or a stationary object.
  • the target object is usually contained in a certain scene.
  • a photo containing a portrait of a person usually also contains a background.
  • the background may include trees, mountains, rivers, and other people.
  • if the target object is to be extracted separately from the image, the target object needs to be identified and processed separately.
  • various behaviors of the target object can be analyzed.
  • the first image is an image that contains the target object.
  • the first image can be one of a series of pre-stored photos, a video frame extracted from a pre-saved video, or one or more frames extracted from a live video stream.
  • the first image may include multiple objects.
  • the photo used to describe the action of the person may include the target person, other people with the target person, trees, buildings, etc.
  • the target person constitutes the target object of the first image, and other people, trees, buildings, etc. together with the target person constitute the background image. Based on actual needs, one or more objects can be selected as target objects in the first image.
  • the target object can be obtained from a video file, and the video collected from the target object contains multiple frame images, and multiple images containing one or more continuous actions of the target object can be selected from the frame images of the video to form an image set.
  • the first image containing the target object can be obtained.
  • the segmentation network includes a plurality of convolutional layers and downsampling layers.
  • the convolutional layers and the down-sampling layers are arranged alternately; the convolutional layer performs feature extraction on the target object in the first image, and the down-sampling layer performs a down-sampling operation on the image output by the convolutional layer.
  • the segmentation network includes a convolutional layer, a sampling layer and a fully connected layer.
  • the main parameters of the convolutional layer include the size of the convolution kernel and the number of input feature maps.
  • Each convolutional layer can contain several feature maps of the same size.
  • the feature values of the same layer adopt the method of sharing weights.
  • the convolution kernel size in each layer is the same.
  • the convolution layer performs convolution calculation on the input image and extracts the layout features of the input image.
  • the feature extraction layer of the convolutional layer can be connected to the sampling layer.
  • the sampling layer is used to find the local average of the input image and perform secondary feature extraction.
  • the neural network model can ensure good robustness to the input image.
  • the sampling layer may include an up-sampling layer and a down-sampling layer.
  • the up-sampling layer adds pixel information in the image by interpolating the input image.
  • the down-sampling layer extracts the features of the input image by sub-sampling the input image.
  • a pooling layer (not shown in the figure) can also be provided after the convolutional layer.
  • the pooling layer uses the max-pooling method to process the output of the convolutional layer, which can better extract the invariant features of the input image.
  • the fully connected layer integrates the features in the image feature maps that have passed through multiple convolutional layers and pooling layers, and obtains the classification features of the input image features for image classification.
  • the fully connected layer maps the feature map generated by the convolutional layer into a fixed-length feature vector.
  • the feature vector contains the combined information of all the features of the input image, and the feature vector retains the most characteristic image features in the image to complete the image classification task. In this way, the prediction map corresponding to the input image can be calculated, thereby determining the target object contained in the first image.
  • down-sampling layers are set in the segmentation network, and the down-sampling layers and convolutional layers are arranged alternately.
  • the convolutional layer performs feature extraction on the target object in the first image, and the down-sampling layer performs down-sampling on the image output by the convolutional layer.
  • the calculation speed of the segmentation network for the first image is improved.
  • the disadvantage of traditional neural networks is that they need to input fixed-size images.
  • the images input into the neural network may have been cropped or distorted.
  • cropped or distorted images suffer from content loss, which reduces the accuracy with which the neural network recognizes the object to be recognized in the input image.
  • the recognition accuracy of the target object by the neural network will also be reduced.
  • Parallel convolutional layers are set in the segmentation network. Specifically, after the second down-sampling layer in the segmentation network, multiple parallel convolutional layers with different sampling rates are set; the parallel convolutional layers are used to process the image output by the second down-sampling layer, and the image features extracted on each parallel convolutional layer are fused to form a second image.
  • the input image or the target object in the input image can be any aspect ratio or any size.
  • the segmentation network can extract features at different scales.
  • the parallel convolutional layers can use 4×4, 2×2, and 1×1 convolution kernels to perform feature calculations on the input image respectively, so as to obtain 3 independently processed images; merging the 3 independently processed images forms a second image. Since the formation of the second image is not affected by the actual size or aspect ratio of the input image, the robustness of the segmentation network is further improved.
  • each embodiment is not limited to the detection of objects of a specific size, shape, or type, nor is it limited to the detection of images of a specific size, type, or content.
  • the system for image processing using parallel convolutional layer pooling according to various embodiments can be applied to images of any size, type, or content.
  • the parallel convolutional layers improve the robustness of the data but also increase the computational burden of the system. For this reason, the parallel convolutional layers are set after the second down-sampling layer in the segmentation network. At this point, the image output by the second down-sampling layer has sufficient features to meet the requirements of the parallel convolutional layers; at the same time, after the first image has been processed by the two down-sampling layers, the amount of computation is greatly reduced. This satisfies the robustness provided by the parallel convolutional layers while also reducing their computational cost.
  • S104 Acquire a third image containing the target object by performing target recognition on the second image.
  • the size of the second image can be adjusted.
  • taking 3 parallel convolutional layers (1×1, 3×3, and 6×6, for a total of 46 feature vectors) as an example,
  • these 3 parallel convolutional layers can be used to pool features for each candidate window.
  • a 11776-dimensional (256×46) representation is generated for each window. These representations can be provided to the fully connected layer of the segmentation network, and the fully connected layer performs target recognition based on these representations.
  • after the parallel convolutional layers, a third down-sampling layer is set, where the third down-sampling layer performs a down-sampling operation on the second image.
  • the feature information contained in the image can be increased by increasing the pixel information of the image.
  • multiple (for example, 3) up-sampling layers can be set after the third down-sampling layer, where the up-sampling layers perform an up-sampling operation on the image output by the third down-sampling layer.
  • the performing target recognition on the second image may include:
  • the outputs a1, a2, and a3 of the fully connected layer are computed from the inputs of the layer using a weight matrix and a bias vector.
  • the weight matrix contains different weight values, which are obtained by training the segmentation network.
  • the bias vector contains different bias values, which can be obtained by training the segmentation network.
  • S303 Perform target recognition on the image output by the upsampling layer based on the weight value and the bias value.
  • the target object contained in the second image can be quickly recognized.
  • the process of constructing a segmentation network may further include the following steps:
  • multiple convolutional layers can be set in the segmentation network.
  • the images that need to be processed can be processed accordingly.
  • the size of the feature image output by different convolution layers will also be different.
  • the input parameters and convolution kernels of all convolutional layers can be used to calculate the size of the image output by each convolutional layer.
  • the shallow features have more image features, and the deep features have more semantic features.
  • for convolutional layers that produce outputs of the same size, connections between the convolutional layers are added, thereby reducing the jagged-edge problem in the image.
  • In the process of implementing step S403, according to a specific implementation manner of the embodiment of the present disclosure, the following steps may also be included:
  • a mapping function W(xi) can be set for the i-th convolutional layer, the input xi of the i-th convolutional layer and the output F(xi) of the i-th convolutional layer can be obtained, and then F(xi)+W(xi) is used as the input of the (i+2)-th convolutional layer. In this way, the convolutional layers are connected.
  • a convolution kernel of the same size can be set in multiple parallel convolution layers.
  • feature extraction is performed on the images input to the multiple parallel convolutional layers to form multiple feature vector matrices.
  • different weight values are assigned to the multiple feature vector matrices, and the sum of the feature vector matrices weighted by these different weight values is used as the representation matrix of the second image, finally forming the second image.
  • an embodiment of the present disclosure also discloses an image processing device 50, including:
  • the acquiring module 501 is configured to acquire the first image containing the target object.
  • the target object is the content to be acquired by the solution of the present disclosure.
  • the target object may be a person with various actions, an animal with behavior characteristics, or a stationary object.
  • the target object is usually contained in a certain scene.
  • a photo containing a portrait of a person usually also contains a background.
  • the background may include trees, mountains, rivers, and other people.
  • if the target object is to be extracted separately from the image, the target object needs to be identified and processed separately.
  • various behaviors of the target object can be analyzed.
  • the first image is an image that contains the target object.
  • the first image can be one of a series of pre-stored photos, a video frame extracted from a pre-saved video, or one or more frames extracted from a live video stream.
  • the first image may include multiple objects.
  • the photo used to describe the action of the person may include the target person, other people with the target person, trees, buildings, etc.
  • the target person constitutes the target object of the first image, and other people, trees, buildings, etc. together with the target person constitute the background image. Based on actual needs, one or more objects can be selected as target objects in the first image.
  • the target object can be obtained from a video file, and the video collected from the target object contains multiple frame images, and multiple images containing one or more continuous actions of the target object can be selected from the frame images of the video to form an image set.
  • the first image containing the target object can be obtained.
  • the setting module 502 is configured to set a segmentation network for performing image processing on the first image.
  • the segmentation network includes multiple convolutional layers and down-sampling layers.
  • the convolutional layers and the down-sampling layers are arranged alternately, and the convolutional layer performs feature extraction on the target object in the first image;
  • the down-sampling layer performs a down-sampling operation on the image output by the convolutional layer.
  • the segmentation network includes a convolutional layer, a sampling layer and a fully connected layer.
  • the main parameters of the convolutional layer include the size of the convolution kernel and the number of input feature maps.
  • Each convolutional layer can contain several feature maps of the same size.
  • the feature values of the same layer adopt the method of sharing weights.
  • the convolution kernel size in each layer is the same.
  • the convolution layer performs convolution calculation on the input image and extracts the layout features of the input image.
  • the feature extraction layer of the convolutional layer can be connected to the sampling layer.
  • the sampling layer is used to find the local average value of the input image and perform secondary feature extraction.
  • the neural network model can ensure good robustness to the input image.
  • the sampling layer may include an up-sampling layer and a down-sampling layer.
  • the up-sampling layer adds pixel information in the image by interpolating the input image.
  • the down-sampling layer extracts the features of the input image by sub-sampling the input image.
  • a pooling layer (not shown in the figure) can also be provided after the convolutional layer.
  • the pooling layer uses the max-pooling method to process the output of the convolutional layer, which can better extract the invariant features of the input image.
  • the fully connected layer integrates the features in the image feature maps that have passed through multiple convolutional layers and pooling layers, and obtains the classification features of the input image features for image classification.
  • the fully connected layer maps the feature map generated by the convolutional layer into a fixed-length feature vector.
  • the feature vector contains the combined information of all the features of the input image, and the feature vector retains the most characteristic image features in the image to complete the image classification task. In this way, the prediction map corresponding to the input image can be calculated, thereby determining the target object contained in the first image.
  • down-sampling layers are set in the segmentation network, and the down-sampling layers and convolutional layers are arranged alternately.
  • the convolutional layer performs feature extraction on the target object in the first image, and the down-sampling layer performs down-sampling on the image output by the convolutional layer.
  • the calculation speed of the segmentation network for the first image is improved.
  • the processing module 503 is configured to set multiple parallel convolutional layers with different sampling rates after the second down-sampling layer in the segmentation network; the parallel convolutional layers are used to process the image output by the second down-sampling layer, and the image features extracted on each parallel convolutional layer are merged to form a second image.
  • the disadvantage of traditional neural networks is that they need to input fixed-size images.
  • the images input into the neural network may have been cropped or distorted.
  • cropped or distorted images suffer from content loss, which reduces the accuracy with which the neural network recognizes the object to be recognized in the input image.
  • the recognition accuracy of the target object by the neural network will also be reduced.
  • Parallel convolutional layers are set in the segmentation network. Specifically, after the second down-sampling layer in the segmentation network, multiple parallel convolutional layers with different sampling rates are set; the parallel convolutional layers are used to process the image output by the second down-sampling layer, and the image features extracted on each parallel convolutional layer are fused to form a second image.
  • the input image or the target object in the input image can have any aspect ratio or any size.
  • the segmentation network can extract features at different scales.
  • the parallel convolutional layers can use 4×4, 2×2, and 1×1 convolution kernels to perform feature calculations on the input image respectively, so as to obtain 3 independently processed images; merging the 3 independently processed images forms a second image. Since the formation of the second image is not affected by the actual size or aspect ratio of the input image, the robustness of the segmentation network is further improved.
  • each embodiment is not limited to the detection of objects of a specific size, shape, or type, nor is it limited to the detection of images of a specific size, type, or content.
  • the system for image processing using parallel convolutional layer pooling according to various embodiments can be applied to images of any size, type, or content.
  • the parallel convolutional layers improve the robustness of the data but also increase the computational burden of the system. For this reason, the parallel convolutional layers are set after the second down-sampling layer in the segmentation network. At this point, the image output by the second down-sampling layer has sufficient features to meet the requirements of the parallel convolutional layers; at the same time, after the first image has been processed by the two down-sampling layers, the amount of computation is greatly reduced. This satisfies the robustness provided by the parallel convolutional layers while also reducing their computational cost.
  • the execution module 504 is configured to obtain a third image containing the target object by performing target recognition on the second image.
  • the size of the second image can be adjusted.
  • taking 3 parallel convolutional layers (1×1, 3×3, and 6×6, for a total of 46 feature vectors) as an example,
  • these 3 parallel convolutional layers can be used to pool features for each candidate window.
  • a 11776-dimensional (256×46) representation is generated for each window. These representations can be provided to the fully connected layer of the segmentation network, and the fully connected layer performs target recognition based on these representations. A small illustrative sketch of this pooling step is given after this list.
  • the device shown in FIG. 5 can correspondingly execute the content in the foregoing method embodiment.
  • an electronic device 60 which includes:
  • At least one processor and,
  • a memory communicatively connected with the at least one processor; wherein,
  • the memory stores an instruction executable by the at least one processor, and the instruction is executed by the at least one processor, so that the at least one processor can execute the image processing method in the foregoing method embodiment.
  • the embodiments of the present disclosure also provide a non-transitory computer-readable storage medium that stores computer instructions, and the computer instructions are used to make the computer execute the foregoing method embodiments.
  • the embodiments of the present disclosure also provide a computer program product; the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer executes the image processing method in the foregoing method embodiment.
  • Fig. 6 shows a schematic structural diagram of an electronic device 60 suitable for implementing embodiments of the present disclosure.
  • Electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablets), PMPs (portable multimedia players), vehicle-mounted terminals (for example, Mobile terminals such as car navigation terminals) and fixed terminals such as digital TVs, desktop computers, etc.
  • the electronic device shown in FIG. 6 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present disclosure.
  • the electronic device 60 may include a processing device (such as a central processing unit, a graphics processor, etc.) 601, which may execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603.
  • the RAM 603 also stores various programs and data required for the operation of the electronic device 60.
  • the processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604.
  • the following devices can be connected to the I/O interface 605: input devices 606 such as a touch screen, touch panel, keyboard, mouse, image sensor, microphone, accelerometer, and gyroscope; output devices 607 such as a liquid crystal display (LCD), speakers, and a vibrator; storage devices 608 such as a magnetic tape and a hard disk; and a communication device 609.
  • the communication device 609 may allow the electronic device 60 to perform wireless or wired communication with other devices to exchange data.
  • although the figure shows the electronic device 60 with various devices, it should be understood that it is not required to implement or include all the devices shown; the electronic device may alternatively be implemented with, or provided with, more or fewer devices.
  • the process described above with reference to the flowchart can be implemented as a computer software program.
  • the embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from the network through the communication device 609, or installed from the storage device 608, or installed from the ROM 602.
  • when the computer program is executed by the processing device 601, the above-mentioned functions defined in the method of the embodiment of the present disclosure are executed.
  • the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two.
  • the computer-readable storage medium may be, for example, but not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or a combination of any of the above. More specific examples of computer-readable storage media may include, but are not limited to: electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable Programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
  • the computer-readable signal medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device .
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wire, optical cable, RF (Radio Frequency), etc., or any suitable combination of the above.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or it may exist alone without being assembled into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device: obtains at least two Internet protocol addresses; and sends to the node evaluation device including the at least two A node evaluation request for an Internet Protocol address, wherein the node evaluation device selects an Internet Protocol address from the at least two Internet Protocol addresses and returns it; receives the Internet Protocol address returned by the node evaluation device; wherein, the obtained The Internet Protocol address indicates the edge node in the content distribution network.
  • the aforementioned computer-readable medium carries one or more programs, and when the aforementioned one or more programs are executed by the electronic device, the electronic device: receives a node evaluation request including at least two Internet Protocol addresses; Among the at least two Internet Protocol addresses, select an Internet Protocol address; return the selected Internet Protocol address; wherein, the received Internet Protocol address indicates an edge node in the content distribution network.
  • the computer program code used to perform the operations of the present disclosure may be written in one or more programming languages or a combination thereof.
  • the above-mentioned programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computer, partly on the user's computer, executed as an independent software package, partly on the user's computer and partly executed on a remote computer, or entirely executed on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, using an Internet service provider to pass Internet connection).
  • each block in the flowchart or block diagram can represent a module, program segment, or part of code, and the module, program segment, or part of code contains one or more for realizing the specified logical function Executable instructions.
  • the functions marked in the block may also occur in a different order from the order marked in the drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram and/or flowchart, and combinations of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or can be implemented by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure can be implemented in software or hardware. Wherein, the name of the unit does not constitute a limitation on the unit itself under certain circumstances.
  • the first obtaining unit can also be described as "a unit for obtaining at least two Internet Protocol addresses.”
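The pooling step referenced above (three parallel layers with 1×1, 3×3, and 6×6 bins, 46 bins in total, giving a 256×46 = 11776-dimensional representation per candidate window) can be illustrated with the following minimal sketch. This is only an illustration under assumptions: the use of PyTorch, adaptive max pooling, the 256-channel input, and the tensor shapes are not specified by the disclosure.

```python
import torch
import torch.nn as nn

def pyramid_pool_sketch(window_features: torch.Tensor) -> torch.Tensor:
    """Illustrative only: pool a (N, 256, H, W) window feature map into
    1x1, 3x3, and 6x6 bins (46 bins total) and concatenate them, giving a
    256 * 46 = 11776-dimensional representation per window."""
    pooled = []
    for bins in (1, 3, 6):
        p = nn.AdaptiveMaxPool2d(bins)(window_features)  # N, 256, bins, bins
        pooled.append(p.flatten(start_dim=1))            # N, 256 * bins^2
    return torch.cat(pooled, dim=1)                      # N, 11776

# Hypothetical 13x13 window feature map with 256 channels.
rep = pyramid_pool_sketch(torch.randn(2, 256, 13, 13))
print(rep.shape)  # torch.Size([2, 11776])
```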

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

An image processing method and apparatus, and an electronic device, relating to the technical field of data processing. The method comprises: obtaining a first image comprising a target object; providing a segmentation network for performing image processing on the first image; providing a plurality of parallel convolution layers having different sampling rates behind a second downsampling layer in the segmentation network, wherein the parallel convolution layer is used for processing an image outputted by the second downsampling layer, and image features extracted from each parallel convolution layer form a second image in a fusion manner; and obtaining a third image comprising the target object by performing target recognition on the second image. The present solution can improve the accuracy of target recognition.

Description

Image processing method, device and electronic equipment
Cross-reference to related applications
This application claims priority to the Chinese patent application filed on May 15, 2019, with application number 201910403859.X and the title "Image processing method, device and electronic equipment", the entire content of which is incorporated herein by reference.
Technical field
The present disclosure relates to the field of data processing technology, and in particular to an image processing method, device and electronic equipment.
Background
With the development of artificial intelligence technology, more and more image processing tasks can be completed by artificial intelligence. Neural networks, as one implementation of artificial intelligence, have been widely applied in the field of computer image recognition, for example, to recognize different people in an image or to automatically recognize different objects on the road in autonomous driving. These tasks all fall under image semantic recognition. Image semantic recognition involves image semantic segmentation, which is generally modeled as a pixel-level multi-classification problem whose goal is to assign each pixel of an image to one of multiple predefined categories.
Most existing image semantic segmentation methods are based on encoder-decoder convolutional neural networks. Although such a network structure can obtain good semantic segmentation results, adopting an encoder-decoder structure inevitably reduces the spatial resolution of the feature map noticeably during encoding; even though the original resolution of the image is restored during up-sampling, the loss of spatial detail information is unavoidable, which lowers the accuracy of target recognition.
Summary of the invention
In view of this, the embodiments of the present disclosure provide an image processing method, device, and electronic device, which at least partially solve the problems in the prior art.
In a first aspect, embodiments of the present disclosure provide an image processing method, including:
acquiring a first image containing a target object;
setting a segmentation network for performing image processing on the first image, the segmentation network including a plurality of convolutional layers and down-sampling layers, the convolutional layers and the down-sampling layers being arranged alternately, the convolutional layer performing feature extraction on the target object in the first image, and the down-sampling layer performing a down-sampling operation on the image output by the convolutional layer;
setting, after the second down-sampling layer in the segmentation network, a plurality of parallel convolutional layers with different sampling rates, the parallel convolutional layers being used to process the image output by the second down-sampling layer, and the image features extracted on each parallel convolutional layer being fused to form a second image;
acquiring a third image containing the target object by performing target recognition on the second image.
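The following sketch is purely illustrative of how the layers recited above could be arranged, assuming a PyTorch implementation; the layer counts, channel widths, the use of max pooling for down-sampling, the use of dilation to realize the different sampling rates of the parallel convolutional layers, and the fixed fusion weights are all assumptions, not details taken from the disclosure.

```python
import torch
import torch.nn as nn

class SegmentationSketch(nn.Module):
    """Illustrative only: alternating convolution / down-sampling stem followed
    by parallel convolutional branches whose outputs are fused."""
    def __init__(self, in_ch=3, width=32):
        super().__init__()
        # Convolutional layers alternate with down-sampling layers.
        self.conv1 = nn.Conv2d(in_ch, width, 3, padding=1)
        self.down1 = nn.MaxPool2d(2)                 # first down-sampling layer
        self.conv2 = nn.Conv2d(width, width * 2, 3, padding=1)
        self.down2 = nn.MaxPool2d(2)                 # second down-sampling layer
        # Parallel convolutional layers with different sampling (dilation) rates,
        # applied to the output of the second down-sampling layer.
        self.branches = nn.ModuleList([
            nn.Conv2d(width * 2, width * 2, 3, padding=r, dilation=r)
            for r in (1, 2, 4)
        ])

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = self.down1(x)
        x = torch.relu(self.conv2(x))
        x = self.down2(x)
        # Each branch extracts features independently; a weighted sum fuses them
        # into the "second image" (weights shown here as fixed placeholders;
        # in the described scheme they would be chosen or learned).
        feats = [b(x) for b in self.branches]
        weights = (0.5, 0.3, 0.2)
        return sum(w * f for w, f in zip(weights, feats))

# Example: a 1x3x64x64 input yields a fused feature map at 1/4 resolution.
out = SegmentationSketch()(torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 64, 16, 16])
```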
According to a specific implementation of an embodiment of the present disclosure, performing target recognition on the second image includes:
setting, after the parallel convolutional layers, a third down-sampling layer, the third down-sampling layer performing a down-sampling operation on the second image.
According to a specific implementation of an embodiment of the present disclosure, performing target recognition on the second image further includes:
setting, after the third down-sampling layer, a plurality of up-sampling layers, the up-sampling layers performing an up-sampling operation on the image output by the third down-sampling layer.
According to a specific implementation of an embodiment of the present disclosure, performing target recognition on the second image further includes:
setting a fully connected layer in the segmentation network;
setting, in the fully connected layer, different weight values for the images output by different nodes of the parallel convolutional layers, as well as bias values for all nodes of the sampling layer;
performing target recognition on the image output by the up-sampling layer based on the weight values and the bias values.
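A minimal, assumption-laden sketch of the recognition head recited above is shown below: a third down-sampling layer applied to the second image, several up-sampling layers, and a fully connected layer whose weight matrix and bias vector (learned during training) produce the recognition outputs. The number of up-sampling steps, the channel width, the interpolation mode, and the class count are placeholders rather than details from the disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecognitionHeadSketch(nn.Module):
    """Illustrative only: third down-sampling layer, up-sampling layers, and a
    fully connected layer holding the trained weight matrix and bias vector."""
    def __init__(self, channels=64, num_classes=3):
        super().__init__()
        self.down3 = nn.MaxPool2d(2)           # third down-sampling layer
        self.fc = nn.Linear(channels, num_classes)  # weight matrix + bias vector

    def forward(self, second_image):
        x = self.down3(second_image)
        # Several up-sampling layers restore spatial resolution by interpolation.
        for _ in range(3):
            x = F.interpolate(x, scale_factor=2, mode="bilinear",
                              align_corners=False)
        # Per-pixel recognition: apply the weights and biases at each location.
        x = x.permute(0, 2, 3, 1)               # N, H, W, C
        logits = self.fc(x)                     # N, H, W, num_classes
        return logits.permute(0, 3, 1, 2)       # N, num_classes, H, W

head = RecognitionHeadSketch()
scores = head(torch.randn(1, 64, 16, 16))
print(scores.shape)  # torch.Size([1, 3, 64, 64])
```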
According to a specific implementation of an embodiment of the present disclosure, the method further includes:
acquiring all convolutional layers in the segmentation network;
acquiring the image size of the feature image output by each of the convolutional layers;
connecting convolutional layers that output the same image size.
According to a specific implementation of an embodiment of the present disclosure, connecting convolutional layers that output the same image size includes:
acquiring, among N convolutional layers x that output the same image size, the input xi and the output H(xi) of the i-th convolutional layer;
constructing, based on xi and H(xi), the residual function F(xi) = H(xi) - xi of the i-th convolutional layer;
connecting the convolutional layers based on the residual function.
According to a specific implementation of an embodiment of the present disclosure, connecting the convolutional layers based on the residual function includes:
setting a mapping function W(xi) for the i-th convolutional layer;
obtaining the input xi of the i-th convolutional layer and the output F(xi) of the i-th convolutional layer;
using F(xi) + W(xi) as the input of the (i+2)-th convolutional layer.
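A minimal sketch of this residual-style connection follows. It assumes the mapping W(xi) is realized as a 1×1 convolution and that the intermediate (i+1)-th layer is bypassed by the skip path; neither detail is stated in the disclosure.

```python
import torch
import torch.nn as nn

class ResidualConnectionSketch(nn.Module):
    """Illustrative only: F(x_i) + W(x_i) is used as the input of the
    (i+2)-th convolutional layer, connecting layers that output the same
    image size; layer i+1 is bypassed by the skip path in this sketch."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv_i = nn.Conv2d(channels, channels, 3, padding=1)   # layer i -> F(x_i)
        self.proj_w = nn.Conv2d(channels, channels, 1)              # mapping W(x_i)
        self.conv_i2 = nn.Conv2d(channels, channels, 3, padding=1)  # layer i+2

    def forward(self, x_i):
        f_xi = self.conv_i(x_i)        # output F(x_i) of the i-th layer
        w_xi = self.proj_w(x_i)        # projected input W(x_i)
        return self.conv_i2(torch.relu(f_xi + w_xi))  # layer i+2 sees F + W

y = ResidualConnectionSketch()(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 64, 32, 32])
```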
According to a specific implementation of an embodiment of the present disclosure, fusing the image features extracted on each parallel convolutional layer to form the second image includes:
setting convolution kernels of the same size in the plurality of parallel convolutional layers;
performing, based on the convolution kernels, feature extraction on the images input to the plurality of parallel convolutional layers to form a plurality of feature vector matrices;
assigning different weight values to the plurality of feature vector matrices, and using the sum of the feature vector matrices weighted by these different weight values as the representation matrix of the second image.
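As a small numerical sketch of the weighted fusion described above (the weight values and matrix sizes are arbitrary placeholders, not trained or disclosed values):

```python
import numpy as np

# Feature vector matrices extracted by three parallel convolutional layers
# (random placeholders standing in for real features of the same shape).
feature_matrices = [np.random.rand(16, 16) for _ in range(3)]

# Different weight values assigned to each matrix (placeholders; in the
# described scheme these would be chosen or learned, not hard-coded).
weights = [0.5, 0.3, 0.2]

# The weighted sum of the feature vector matrices serves as the
# representation matrix of the second image.
second_image_matrix = sum(w * m for w, m in zip(weights, feature_matrices))
print(second_image_matrix.shape)  # (16, 16)
```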
In a second aspect, an embodiment of the present disclosure discloses an image processing device, including:
an acquiring module, configured to acquire a first image containing a target object;
a setting module, configured to set a segmentation network for performing image processing on the first image, the segmentation network including a plurality of convolutional layers and down-sampling layers, the convolutional layers and the down-sampling layers being arranged alternately, the convolutional layer performing feature extraction on the target object in the first image, and the down-sampling layer performing a down-sampling operation on the image output by the convolutional layer;
a processing module, configured to set, after the second down-sampling layer in the segmentation network, a plurality of parallel convolutional layers with different sampling rates, the parallel convolutional layers being used to process the image output by the second down-sampling layer, and the image features extracted on each parallel convolutional layer being fused to form a second image;
an execution module, configured to acquire a third image containing the target object by performing target recognition on the second image.
In a third aspect, an embodiment of the present disclosure also provides an electronic device, which includes:
at least one processor; and
a memory communicatively connected with the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the image processing method in the foregoing first aspect or any implementation of the first aspect.
In a fourth aspect, embodiments of the present disclosure also provide a non-transitory computer-readable storage medium that stores computer instructions, the computer instructions being used to cause a computer to execute the image processing method in the foregoing first aspect or any implementation of the first aspect.
In a fifth aspect, the embodiments of the present disclosure also provide a computer program product, the computer program product including a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions that, when executed by a computer, cause the computer to execute the image processing method in the foregoing first aspect or any implementation of the first aspect.
The image processing solution in the embodiments of the present disclosure includes: acquiring a first image containing a target object; setting a segmentation network for performing image processing on the first image, the segmentation network including a plurality of convolutional layers and down-sampling layers, the convolutional layers and the down-sampling layers being arranged alternately, the convolutional layer performing feature extraction on the target object in the first image, and the down-sampling layer performing a down-sampling operation on the image output by the convolutional layer; setting, after the second down-sampling layer in the segmentation network, a plurality of parallel convolutional layers with different sampling rates, the parallel convolutional layers being used to process the image output by the second down-sampling layer, and the image features extracted on each parallel convolutional layer being fused to form a second image; and acquiring a third image containing the target object by performing target recognition on the second image. Through the solution of the present disclosure, the accuracy of target recognition is improved.
Description of the drawings
In order to explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present disclosure; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.
FIG. 1 is a schematic diagram of an image processing flow provided by an embodiment of the disclosure;
FIG. 2 is a schematic diagram of a neural network model provided by an embodiment of the disclosure;
FIG. 3 is a schematic diagram of another image processing flow provided by an embodiment of the disclosure;
FIG. 4 is a schematic diagram of another image processing flow provided by an embodiment of the disclosure;
FIG. 5 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the disclosure;
FIG. 6 is a schematic diagram of an electronic device provided by an embodiment of the disclosure.
具体实施方式Detailed ways
下面结合附图对本公开实施例进行详细描述。The embodiments of the present disclosure will be described in detail below in conjunction with the drawings.
以下通过特定的具体实例说明本公开的实施方式,本领域技术人员可由本说明书所揭露的内容轻易地了解本公开的其他优点与功效。显然,所描述的实施例仅仅是本公开一部分实施例,而不是全部的实施例。本公开还可以通过另外不同的具体实施方式加以实施或应用,本说明书中的各项细节也可以基于不同观点与应用,在没有背离本公开的精神下进行各种修饰或改变。需说明的是,在不冲突的情况下,以下实施例及实施例中的特征可以相互组合。基于本公开中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本公开保护的范围。The following describes the implementation of the present disclosure through specific specific examples, and those skilled in the art can easily understand other advantages and effects of the present disclosure from the content disclosed in this specification. Obviously, the described embodiments are only a part of the embodiments of the present disclosure, rather than all the embodiments. The present disclosure can also be implemented or applied through other different specific embodiments, and various details in this specification can also be modified or changed based on different viewpoints and applications without departing from the spirit of the present disclosure. It should be noted that the following embodiments and the features in the embodiments can be combined with each other if there is no conflict. Based on the embodiments in the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present disclosure.
需要说明的是,下文描述在所附权利要求书的范围内的实施例的各种方面。应显而易见,本文中所描述的方面可体现于广泛多种形式中,且本文中所描述的任何特定结构及/或功能仅为说明性的。基于本公开,所属领域的技术人员应了解,本文中所描述的一个方面可与任何其它方面独立地实施,且可以各种方式组合这些方面中的两者或两者以上。举例来说,可使用本文中所阐述的任何数目个方面来实施设备及/或实践方法。另外,可使用除了本文中所阐述的方面中的一或多者之外的其它结构及/或功能性实施此设备及/或实践此方法。It should be noted that various aspects of the embodiments within the scope of the appended claims are described below. It should be obvious that the aspects described herein can be embodied in a wide variety of forms, and any specific structure and/or function described herein are only illustrative. Based on the present disclosure, those skilled in the art should understand that one aspect described herein can be implemented independently of any other aspects, and two or more of these aspects can be combined in various ways. For example, any number of aspects set forth herein can be used to implement devices and/or methods of practice. In addition, other structures and/or functionalities other than one or more of the aspects set forth herein may be used to implement this device and/or practice this method.
It should also be noted that the illustrations provided in the following embodiments merely describe the basic concept of the present disclosure in a schematic manner. The drawings show only the components related to the present disclosure, rather than the number, shapes, and sizes of the components in an actual implementation; in practice, the type, quantity, and proportion of each component may be changed at will, and the component layout may also be more complicated.
In addition, in the following description, specific details are provided to facilitate a thorough understanding of the examples. However, those skilled in the art will understand that the described aspects may be practiced without these specific details.
An embodiment of the present disclosure provides an image processing method. The image processing method provided in this embodiment may be executed by a computing apparatus, which may be implemented as software or as a combination of software and hardware, and which may be integrated in a server, a terminal device, or the like.
Referring to FIG. 1, an image processing method provided by an embodiment of the present disclosure includes the following steps.
S101: acquire a first image containing a target object.

The target object is the content to be acquired by the solution of the present disclosure. As an example, the target object may be a person performing various actions, an animal with characteristic behavior, a stationary object, or the like.

The target object is usually contained in a certain scene. For example, a photo containing a portrait of a person usually also contains a background, which may include trees, mountains, rivers, other people, and so on. If the target object is to be extracted separately from the image, it needs to be identified and processed separately. Based on the extracted target object, various behaviors of the target object can then be analyzed.

The first image is an image containing the target object. The first image may be one of a series of pre-stored photos, a video frame extracted from a pre-saved video, or one or more frames extracted from a live video stream. The first image may contain multiple objects. For example, a photo describing a person's actions may contain a target person as well as other people, trees, buildings, and so on. The target person constitutes the target object of the first image, while the other people, trees, buildings, etc. together constitute the background image. Based on actual needs, one or more objects in the first image may be selected as target objects.

As an example, the target object may be obtained from a video file. The video captured of the target object contains multiple frame images, and several images containing continuous actions of one or more target objects may be selected from the frames of the video to form an image set. By selecting images from the image set, the first image containing the target object can be obtained.
S102: set a segmentation network that performs image processing on the first image, where the segmentation network includes a plurality of convolutional layers and down-sampling layers arranged alternately, the convolutional layers perform feature extraction on the target object in the first image, and the down-sampling layers perform a down-sampling operation on the images output by the convolutional layers.
In order to perform image processing on the first image, a segmentation network based on a neural network model is constructed. Referring to FIG. 2, the segmentation network includes convolutional layers, sampling layers, and a fully connected layer.

The main parameters of a convolutional layer include the size of its convolution kernel and the number of input feature maps. Each convolutional layer may contain several feature maps of the same size, the feature values within one layer share weights, and the convolution kernels within one layer have the same size. A convolutional layer performs convolution on the input image and extracts the layout features of the input image.

The feature extraction part of a convolutional layer may be followed by a sampling layer. A sampling layer computes local averages of the input image and performs secondary feature extraction; connecting sampling layers to convolutional layers helps ensure that the neural network model is robust to the input image.

The sampling layers may include up-sampling layers and down-sampling layers. An up-sampling layer adds pixel information to an image, for example by interpolating the input image, while a down-sampling layer extracts the features of the input image.

To speed up the training of the segmentation network, a pooling layer (not shown in the figure) may also be provided after a convolutional layer. The pooling layer processes the output of the convolutional layer by max pooling, which better extracts the invariant features of the input image.

The fully connected layer integrates the features in the feature maps produced by the multiple convolutional layers and pooling layers and obtains the classification features of the input image for image classification. In the neural network model of the segmentation network, the fully connected layer maps the feature maps generated by the convolutional layers into a fixed-length feature vector. This feature vector contains combined information of all the features of the input image and retains the most characteristic image features to complete the image classification task. In this way, the prediction map corresponding to the input image can be computed, and the target object contained in the first image can be determined.

To increase the computation speed of the segmentation network, down-sampling layers are provided in the segmentation network, the down-sampling layers and the convolutional layers are arranged alternately, the convolutional layers perform feature extraction on the target object in the first image, and the down-sampling layers perform a down-sampling operation on the images output by the convolutional layers. This arrangement increases the speed at which the segmentation network processes the first image.
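As an illustrative sketch only, and not the claimed network itself, the alternating arrangement of convolutional layers and down-sampling layers described above could be expressed in PyTorch roughly as follows; the channel counts, kernel sizes, choice of max pooling as the down-sampling operator, and number of stages are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class InterleavedBackbone(nn.Module):
    """Convolutional layers alternating with down-sampling layers (illustrative sketch)."""
    def __init__(self, in_channels=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),  # 1st convolutional layer
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),                 # 1st down-sampling layer
            nn.Conv2d(32, 64, kernel_size=3, padding=1),           # 2nd convolutional layer
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),                 # 2nd down-sampling layer
        )

    def forward(self, x):
        # x: a batch of first images, shape (N, C, H, W)
        return self.features(x)
```

In this sketch the down-sampling layers are realized as stride-2 max pooling; the description above leaves the concrete down-sampling operator open.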
S103: after the second down-sampling layer in the segmentation network, set a plurality of parallel convolutional layers with different sampling rates, where the parallel convolutional layers process the image output by the second down-sampling layer, and the image features extracted by the parallel convolutional layers are fused to form a second image.
A drawback of traditional neural networks is that they require input images of a fixed size. In practice, because of differences in image processing, the images fed into the neural network may already have been cropped or distorted; a cropped or distorted image can lose content, which reduces the accuracy with which the neural network recognizes the object to be identified in the input image. In addition, when the size of the same target object changes across different images, the recognition accuracy of the neural network for that target object is also reduced.

To further improve the adaptability of the segmentation network to the first image, referring to FIG. 2, parallel convolutional layers are provided in the segmentation network. Specifically, after the second down-sampling layer of the segmentation network, a plurality of parallel convolutional layers with different sampling rates are set; the parallel convolutional layers process the image output by the second down-sampling layer, and the image features extracted by each parallel convolutional layer are fused to form a second image.

When parallel convolutional layers are used for image processing, the input image, or the target object in the input image, may have any aspect ratio or any size. When input images have different scales, the segmentation network can extract features at different scales. For example, the parallel convolutional layers may use 4×4, 2×2, and 1×1 convolution kernels to compute features of the input image separately, yielding three independently processed images; by fusing the three independently processed images, the second image can be formed. Because the formation of the second image does not depend on the size or aspect ratio of the input image, the robustness of the segmentation network is further improved.
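A minimal sketch of such parallel branches, assuming PyTorch, is given below. The 4×4, 2×2, and 1×1 kernel sizes follow the example in the text; the channel counts, the resizing of each branch output to a common spatial size, and the use of channel-wise concatenation as the fusion step are assumptions made so that the sketch runs end to end.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelConvBlock(nn.Module):
    """Three parallel convolutional branches (4x4, 2x2, 1x1 kernels) whose
    feature maps are fused into a single output -- illustrative sketch only."""
    def __init__(self, in_channels=64, branch_channels=32):
        super().__init__()
        self.branch4 = nn.Conv2d(in_channels, branch_channels, kernel_size=4)
        self.branch2 = nn.Conv2d(in_channels, branch_channels, kernel_size=2)
        self.branch1 = nn.Conv2d(in_channels, branch_channels, kernel_size=1)

    def forward(self, x):
        size = x.shape[-2:]  # fuse at the spatial size of the block input (an assumption)
        outs = [self.branch4(x), self.branch2(x), self.branch1(x)]
        # Bring the three independently processed feature maps to a common size,
        # then fuse them by channel-wise concatenation.
        outs = [F.interpolate(o, size=size, mode='bilinear', align_corners=False)
                for o in outs]
        return torch.cat(outs, dim=1)
```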
Through this implementation, the embodiments are not limited to the detection of objects of a specific size, shape, or type, nor to the detection of images of a specific size, type, or content. A system that uses parallel convolutional layers for image processing according to the embodiments can operate on images of any size, type, or content.

While parallel convolutional layers improve robustness, they also increase the computational burden of the system. For this reason, the parallel convolutional layers are placed after the second down-sampling layer of the segmentation network. At this point, the image output by the second down-sampling layer has enough features to meet the needs of the parallel layers, and after the first image has been processed by two sampling layers the amount of computation is greatly reduced. The robustness of the parallel convolutional layers is thus preserved while their computational cost is reduced. The reason is that if the parallel layers were instead placed after a third sampling layer, the first image would lose too many features after passing through three sampling layers, so the parallel convolutional layers would obtain insufficient features, impairing their ability to recognize the target object.
S104: acquire a third image containing the target object by performing target recognition on the second image.
The size of the second image may be adjusted. For example, it may be resized so that min(a, b) = c, where a is the width of the second image, b is its height, and c is a predefined scale (for example, 256), and feature maps can then be extracted from the entire image. For example, taking three parallel convolutional layers (1×1, 3×3, and 6×6, a total of 46 feature vectors) as an example, these three parallel convolutional layers can be applied to each candidate window to pool features, and an 11776-dimensional (256×46) representation is generated for each window. These representations can be provided to the fully connected layer of the segmentation network, which performs target recognition based on them. The recognized target object is saved as a separate image to form the third image.
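The 46 and 11776 in this example follow from simple arithmetic: 1×1 + 3×3 + 6×6 = 1 + 9 + 36 = 46 spatial bins, and with 256 channels per bin this gives 256 × 46 = 11776 values per window. A rough sketch of such pyramid-style pooling, assuming PyTorch adaptive pooling (an assumption; the text does not prescribe the pooling operator), is:

```python
import torch
import torch.nn.functional as F

def pyramid_pool(feature_map, bin_sizes=(1, 3, 6)):
    """Pool a (N, 256, H, W) feature map into fixed-length vectors:
    1 + 9 + 36 = 46 bins x 256 channels = 11776 values per sample."""
    pooled = []
    for b in bin_sizes:
        p = F.adaptive_max_pool2d(feature_map, output_size=b)   # (N, 256, b, b)
        pooled.append(torch.flatten(p, start_dim=1))            # (N, 256*b*b)
    return torch.cat(pooled, dim=1)                             # (N, 11776) when C=256

x = torch.randn(2, 256, 40, 60)      # any input size / aspect ratio
print(pyramid_pool(x).shape)         # torch.Size([2, 11776])
```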
To further improve the processing efficiency of the segmentation network, according to a specific implementation of the embodiments of the present disclosure, in the process of performing target recognition on the second image, a third down-sampling layer may be provided after the parallel convolutional layers, and the third down-sampling layer performs a down-sampling operation on the second image. Providing the third down-sampling layer further reduces the number of pixels of the second image and reduces the amount of computation of the segmentation network.

In scenarios where high-speed computing devices such as GPUs are used, the feature information contained in an image can be increased by increasing the number of pixels in the image. In this case, a plurality of (for example, three) up-sampling layers may be provided after the third down-sampling layer, and the up-sampling layers perform up-sampling operations on the image output by the third down-sampling layer. By providing multiple up-sampling layers, more image detail can be added to the second image through interpolation or other methods.
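As a sketch of this optional arrangement, a third down-sampling layer followed by three up-sampling layers might look as follows in PyTorch; the use of max pooling, bilinear interpolation, and the scale factors are assumptions for illustration only.

```python
import torch.nn as nn

# Illustrative sketch: a third down-sampling layer followed by three up-sampling
# layers that add pixel detail back by interpolation (scale factors are assumptions).
refine = nn.Sequential(
    nn.MaxPool2d(kernel_size=2, stride=2),  # third down-sampling layer
    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
)
```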
Referring to FIG. 3, according to a specific implementation of the embodiments of the present disclosure, performing target recognition on the second image may include the following steps.

S301: set a fully connected layer in the segmentation network.

S302: in the fully connected layer, set different weight values for the images output by different nodes of the parallel convolutional layers, as well as bias values for all nodes of the sampling layer.

Taking x1, x2, and x3 as the outputs of the parallel convolutional layers as an example, the outputs a1, a2, and a3 of the fully connected layer can be expressed by the following formula:
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} = W \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + b,

where

W = \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \end{bmatrix}, \qquad b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}

are the weight matrix and the bias vector, respectively. The weight matrix contains different weight values, which are obtained by training the segmentation network; the bias vector contains different bias values, which can likewise be obtained by training the segmentation network.
S303: perform target recognition on the image output by the up-sampling layers based on the weight values and the bias values.

Through steps S301 to S303, the target object contained in the second image can be recognized quickly.
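A minimal sketch of steps S301 to S303, assuming the fused and up-sampled features have been flattened into a vector and assuming a PyTorch linear layer whose learned weight matrix and bias vector play the roles of W and b above; the feature dimension and class count are assumptions taken only for illustration.

```python
import torch
import torch.nn as nn

# Sketch of the fully connected recognition head (S301-S303): the learned weight
# matrix W and bias vector b map the fused features to per-category scores.
num_features, num_classes = 11776, 21          # assumed sizes, for illustration
fc = nn.Linear(num_features, num_classes)      # holds W (weights) and b (bias)

features = torch.randn(4, num_features)        # flattened features of 4 windows
scores = fc(features)                          # a = W x + b
predicted = scores.argmax(dim=1)               # per-window target category
```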
Referring to FIG. 4, according to a specific implementation of the embodiments of the present disclosure, the process of constructing the segmentation network may further include the following steps.

S401: obtain all convolutional layers in the segmentation network.

Depending on different needs, multiple convolutional layers may be provided in the segmentation network; by setting different convolution kernels for different convolutional layers, the images to be processed can be processed accordingly.

S402: obtain the image size of the feature image output by each of the convolutional layers.

Because the convolution kernels and input images differ, the sizes of the feature images output by different convolutional layers also differ. The output image size of each convolutional layer can be obtained by computing over the input parameters and convolution kernels of all the convolutional layers.
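The output size of each convolutional layer follows from its input size and kernel parameters by the usual bookkeeping formula; the small helper below is only an illustration of that calculation, not part of the claimed method.

```python
def conv_output_size(in_size, kernel_size, stride=1, padding=0):
    """Spatial size of a convolutional layer's output feature map."""
    return (in_size + 2 * padding - kernel_size) // stride + 1

# Example: a 224x224 input through a 3x3 convolution with padding 1 keeps its size.
print(conv_output_size(224, kernel_size=3, padding=1))  # 224
```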
S403: connect convolutional layers that output the same image size.

In a deep learning network, shallow features carry more image detail and deep features carry more semantic information. To combine shallow and deep features, connections are added between convolutional layers that produce outputs of the same size, which reduces jagged edges in the image.

In the process of implementing step S403, according to a specific implementation of the embodiments of the present disclosure, the following steps may also be included.

S4031: among the N convolutional layers x that output the same image size, obtain the input xi and the output H(xi) of the i-th convolutional layer.

S4032: based on xi and H(xi), construct the residual function F(xi) = H(xi) - xi of the i-th convolutional layer.

S4033: connect the convolutional layers based on the residual function.

Specifically, a mapping function W(xi) may be set for the i-th convolutional layer, the input xi of the i-th convolutional layer and the output F(xi) of the i-th convolutional layer may be obtained, and F(xi) + W(xi) may then be used as the input of the (i+2)-th convolutional layer; in this way, the convolutional layers are connected.
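A sketch of the residual-style connection described in S4031-S4033, assuming PyTorch; realizing the mapping W(xi) as a 1×1 convolution and the layers as 3×3 convolutions are assumptions, chosen only so that the mapped input can be added to F(xi) and fed to layer i+2.

```python
import torch.nn as nn

class ResidualLink(nn.Module):
    """Connects convolutional layers of equal output size: the input of layer i
    is mapped by W and added to F(x_i), and the sum feeds layer i+2 (sketch only)."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv_i = nn.Conv2d(channels, channels, 3, padding=1)   # layer i, produces F(x_i)
        self.mapping = nn.Conv2d(channels, channels, 1)             # W(x_i), assumed 1x1 conv
        self.conv_i2 = nn.Conv2d(channels, channels, 3, padding=1)  # layer i+2

    def forward(self, x_i):
        f = self.conv_i(x_i)                        # residual F(x_i) = H(x_i) - x_i
        return self.conv_i2(f + self.mapping(x_i))  # F(x_i) + W(x_i) as input of layer i+2
```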
In the process of forming the second image, in order to extract the features of the second image quickly, convolution kernels of the same size may be set in the multiple parallel convolutional layers. Using these convolution kernels, feature extraction is performed on the images input to the multiple parallel convolutional layers to form multiple feature vector matrices. Based on the training of the segmentation network, different weight values are assigned to the multiple feature vector matrices, and the sum of the feature vector matrices weighted by these values is used as the representation matrix of the second image, finally forming the second image.
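A sketch of this weighted fusion, assuming PyTorch and assuming learnable scalar weights per branch (an assumption; the text only states that the weight values are obtained by training the segmentation network):

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Fuse the feature maps of several parallel convolutional layers (same kernel
    size) as a weighted sum -- the representation of the second image (sketch)."""
    def __init__(self, in_channels=64, out_channels=64, num_branches=3):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
            for _ in range(num_branches)])
        self.weights = nn.Parameter(torch.ones(num_branches))  # learned weight values

    def forward(self, x):
        # Weighted sum of the per-branch feature vector matrices.
        return sum(w * branch(x) for w, branch in zip(self.weights, self.branches))
```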
Corresponding to the above method embodiment, and referring to FIG. 5, an embodiment of the present disclosure further discloses an image processing apparatus 50, including:

an acquisition module 501, configured to acquire a first image containing a target object.

The target object is the content to be acquired by the solution of the present disclosure. As an example, the target object may be a person performing various actions, an animal with characteristic behavior, a stationary object, or the like.

The target object is usually contained in a certain scene. For example, a photo containing a portrait of a person usually also contains a background, which may include trees, mountains, rivers, other people, and so on. If the target object is to be extracted separately from the image, it needs to be identified and processed separately. Based on the extracted target object, various behaviors of the target object can then be analyzed.

The first image is an image containing the target object. The first image may be one of a series of pre-stored photos, a video frame extracted from a pre-saved video, or one or more frames extracted from a live video stream. The first image may contain multiple objects. For example, a photo describing a person's actions may contain a target person as well as other people, trees, buildings, and so on. The target person constitutes the target object of the first image, while the other people, trees, buildings, etc. together constitute the background image. Based on actual needs, one or more objects in the first image may be selected as target objects.

As an example, the target object may be obtained from a video file. The video captured of the target object contains multiple frame images, and several images containing continuous actions of one or more target objects may be selected from the frames of the video to form an image set. By selecting images from the image set, the first image containing the target object can be obtained.
a setting module 502, configured to set a segmentation network that performs image processing on the first image, where the segmentation network includes a plurality of convolutional layers and down-sampling layers arranged alternately, the convolutional layers perform feature extraction on the target object in the first image, and the down-sampling layers perform a down-sampling operation on the images output by the convolutional layers.

In order to perform image processing on the first image, a segmentation network based on a neural network model is constructed. Referring to FIG. 2, the segmentation network includes convolutional layers, sampling layers, and a fully connected layer.

The main parameters of a convolutional layer include the size of its convolution kernel and the number of input feature maps. Each convolutional layer may contain several feature maps of the same size, the feature values within one layer share weights, and the convolution kernels within one layer have the same size. A convolutional layer performs convolution on the input image and extracts the layout features of the input image.

The feature extraction part of a convolutional layer may be followed by a sampling layer. A sampling layer computes local averages of the input image and performs secondary feature extraction; connecting sampling layers to convolutional layers helps ensure that the neural network model is robust to the input image.

The sampling layers may include up-sampling layers and down-sampling layers. An up-sampling layer adds pixel information to an image, for example by interpolating the input image, while a down-sampling layer extracts the features of the input image.

To speed up the training of the segmentation network, a pooling layer (not shown in the figure) may also be provided after a convolutional layer. The pooling layer processes the output of the convolutional layer by max pooling, which better extracts the invariant features of the input image.

The fully connected layer integrates the features in the feature maps produced by the multiple convolutional layers and pooling layers and obtains the classification features of the input image for image classification. In the neural network model of the segmentation network, the fully connected layer maps the feature maps generated by the convolutional layers into a fixed-length feature vector. This feature vector contains combined information of all the features of the input image and retains the most characteristic image features to complete the image classification task. In this way, the prediction map corresponding to the input image can be computed, and the target object contained in the first image can be determined.

To increase the computation speed of the segmentation network, down-sampling layers are provided in the segmentation network, the down-sampling layers and the convolutional layers are arranged alternately, the convolutional layers perform feature extraction on the target object in the first image, and the down-sampling layers perform a down-sampling operation on the images output by the convolutional layers. This arrangement increases the speed at which the segmentation network processes the first image.
a processing module 503, configured to set, after the second down-sampling layer in the segmentation network, a plurality of parallel convolutional layers with different sampling rates, where the parallel convolutional layers process the image output by the second down-sampling layer, and the image features extracted by the parallel convolutional layers are fused to form a second image.

A drawback of traditional neural networks is that they require input images of a fixed size. In practice, because of differences in image processing, the images fed into the neural network may already have been cropped or distorted; a cropped or distorted image can lose content, which reduces the accuracy with which the neural network recognizes the object to be identified in the input image. In addition, when the size of the same target object changes across different images, the recognition accuracy of the neural network for that target object is also reduced.

To further improve the adaptability of the segmentation network to the first image, referring to FIG. 2, parallel convolutional layers are provided in the segmentation network. Specifically, after the second down-sampling layer of the segmentation network, a plurality of parallel convolutional layers with different sampling rates are set; the parallel convolutional layers process the image output by the second down-sampling layer, and the image features extracted by each parallel convolutional layer are fused to form a second image.

When parallel convolutional layers are used for image processing, the input image, or the target object in the input image, may have any aspect ratio or any size. When input images have different scales, the segmentation network can extract features at different scales. For example, the parallel convolutional layers may use 4×4, 2×2, and 1×1 convolution kernels to compute features of the input image separately, yielding three independently processed images; by fusing the three independently processed images, the second image can be formed. Because the formation of the second image does not depend on the size or aspect ratio of the input image, the robustness of the segmentation network is further improved.

Through this implementation, the embodiments are not limited to the detection of objects of a specific size, shape, or type, nor to the detection of images of a specific size, type, or content. A system that uses parallel convolutional layers for image processing according to the embodiments can operate on images of any size, type, or content.

While parallel convolutional layers improve robustness, they also increase the computational burden of the system. For this reason, the parallel convolutional layers are placed after the second down-sampling layer of the segmentation network. At this point, the image output by the second down-sampling layer has enough features to meet the needs of the parallel layers, and after the first image has been processed by two sampling layers the amount of computation is greatly reduced. The robustness of the parallel convolutional layers is thus preserved while their computational cost is reduced. The reason is that if the parallel layers were instead placed after a third sampling layer, the first image would lose too many features after passing through three sampling layers, so the parallel convolutional layers would obtain insufficient features, impairing their ability to recognize the target object.
an execution module 504, configured to acquire a third image containing the target object by performing target recognition on the second image.

The size of the second image may be adjusted. For example, it may be resized so that min(a, b) = c, where a is the width of the second image, b is its height, and c is a predefined scale (for example, 256), and feature maps can then be extracted from the entire image. For example, taking three parallel convolutional layers (1×1, 3×3, and 6×6, a total of 46 feature vectors) as an example, these three parallel convolutional layers can be applied to each candidate window to pool features, and an 11776-dimensional (256×46) representation is generated for each window. These representations can be provided to the fully connected layer of the segmentation network, which performs target recognition based on them. The recognized target object is saved as a separate image to form the third image.

The apparatus shown in FIG. 5 can correspondingly execute the content of the above method embodiment. For parts not described in detail in this embodiment, reference is made to the content recorded in the above method embodiment, which is not repeated here.
Referring to FIG. 6, an embodiment of the present disclosure further provides an electronic device 60, which includes:

at least one processor; and

a memory communicatively connected to the at least one processor; where

the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the image processing method in the foregoing method embodiments.

An embodiment of the present disclosure further provides a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to execute the image processing method in the foregoing method embodiments.

An embodiment of the present disclosure further provides a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium. The computer program includes program instructions that, when executed by a computer, cause the computer to execute the image processing method in the foregoing method embodiments.
Referring now to FIG. 6, it shows a schematic structural diagram of an electronic device 60 suitable for implementing the embodiments of the present disclosure. The electronic device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (for example, vehicle-mounted navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.

As shown in FIG. 6, the electronic device 60 may include a processing apparatus (such as a central processing unit or a graphics processor) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage apparatus 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 60. The processing apparatus 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

Generally, the following apparatuses may be connected to the I/O interface 605: an input apparatus 606 including, for example, a touch screen, a touch pad, a keyboard, a mouse, an image sensor, a microphone, an accelerometer, and a gyroscope; an output apparatus 607 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage apparatus 608 including, for example, a magnetic tape and a hard disk; and a communication apparatus 609. The communication apparatus 609 may allow the electronic device 60 to perform wireless or wired communication with other devices to exchange data. Although the figure shows the electronic device 60 with various apparatuses, it should be understood that it is not required to implement or provide all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided.

In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, where the computer program contains program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication apparatus 609, installed from the storage apparatus 608, or installed from the ROM 602. When the computer program is executed by the processing apparatus 601, the above functions defined in the methods of the embodiments of the present disclosure are executed.

It should be noted that the above computer-readable medium in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that may be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium, and it may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to a wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or it may exist alone without being assembled into the electronic device.

The above computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: obtain at least two Internet Protocol addresses; send a node evaluation request including the at least two Internet Protocol addresses to a node evaluation device, where the node evaluation device selects an Internet Protocol address from the at least two Internet Protocol addresses and returns it; and receive the Internet Protocol address returned by the node evaluation device, where the obtained Internet Protocol address indicates an edge node in a content distribution network.

Alternatively, the above computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: receive a node evaluation request including at least two Internet Protocol addresses; select an Internet Protocol address from the at least two Internet Protocol addresses; and return the selected Internet Protocol address, where the received Internet Protocol address indicates an edge node in a content distribution network.

Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof. The above programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or part of code that contains one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units involved in the embodiments described in the present disclosure may be implemented in software or in hardware. The name of a unit does not constitute a limitation on the unit itself in some cases; for example, the first acquisition unit may also be described as "a unit for acquiring at least two Internet Protocol addresses".

It should be understood that each part of the present disclosure may be implemented by hardware, software, firmware, or a combination thereof.

The above are only specific implementations of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any change or substitution that can be readily conceived by a person skilled in the art within the technical scope disclosed in the present disclosure shall be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (11)

  1. An image processing method, comprising:
    acquiring a first image containing a target object;
    setting a segmentation network for performing image processing on the first image, wherein the segmentation network comprises a plurality of convolutional layers and down-sampling layers arranged alternately, the convolutional layers perform feature extraction on the target object in the first image, and the down-sampling layers perform a down-sampling operation on images output by the convolutional layers;
    after a second down-sampling layer in the segmentation network, setting a plurality of parallel convolutional layers with different sampling rates, wherein the parallel convolutional layers are used to process an image output by the second down-sampling layer, and image features extracted by each parallel convolutional layer are fused to form a second image; and
    acquiring a third image containing the target object by performing target recognition on the second image.
  2. The method according to claim 1, wherein the performing target recognition on the second image comprises:
    setting a third down-sampling layer after the parallel convolutional layers, wherein the third down-sampling layer performs a down-sampling operation on the second image.
  3. The method according to claim 2, wherein the performing target recognition on the second image further comprises:
    setting a plurality of up-sampling layers after the third down-sampling layer, wherein the up-sampling layers perform up-sampling operations on an image output by the third down-sampling layer.
  4. The method according to claim 3, wherein the performing target recognition on the second image further comprises:
    setting a fully connected layer in the segmentation network;
    in the fully connected layer, setting different weight values for images output by different nodes of the parallel convolutional layers, as well as bias values for all nodes of the sampling layer; and
    performing target recognition on the image output by the up-sampling layers based on the weight values and the bias values.
  5. The method according to claim 1, further comprising:
    obtaining all convolutional layers in the segmentation network;
    obtaining an image size of a feature image output by each of the convolutional layers; and
    connecting convolutional layers that output the same image size.
  6. The method according to claim 5, wherein the connecting convolutional layers that output the same image size comprises:
    obtaining, among N convolutional layers x that output the same image size, an input xi and an output H(xi) of an i-th convolutional layer;
    constructing, based on xi and H(xi), a residual function F(xi)=H(xi)-xi of the i-th convolutional layer; and
    connecting the convolutional layers based on the residual function.
  7. The method according to claim 6, wherein the connecting the convolutional layers based on the residual function comprises:
    setting a mapping function W(xi) for the i-th convolutional layer;
    obtaining the input xi of the i-th convolutional layer and the output F(xi) of the i-th convolutional layer; and
    using F(xi)+W(xi) as an input of an (i+2)-th convolutional layer.
  8. The method according to claim 1, wherein the fusing of the image features extracted by each parallel convolutional layer to form the second image comprises:
    setting convolution kernels of the same size in the plurality of parallel convolutional layers;
    performing, based on the convolution kernels, feature extraction on images input to the plurality of parallel convolutional layers to form a plurality of feature vector matrices; and
    assigning different weight values to the plurality of feature vector matrices, and using the sum of the feature vector matrices weighted by the different weight values as a representation matrix of the second image.
  9. An image processing apparatus, comprising:
    an acquisition module, configured to acquire a first image containing a target object;
    a setting module, configured to set a segmentation network for performing image processing on the first image, wherein the segmentation network comprises a plurality of convolutional layers and down-sampling layers arranged alternately, the convolutional layers perform feature extraction on the target object in the first image, and the down-sampling layers perform a down-sampling operation on images output by the convolutional layers;
    a processing module, configured to set, after a second down-sampling layer in the segmentation network, a plurality of parallel convolutional layers with different sampling rates, wherein the parallel convolutional layers are used to process an image output by the second down-sampling layer, and image features extracted by each parallel convolutional layer are fused to form a second image; and
    an execution module, configured to acquire a third image containing the target object by performing target recognition on the second image.
  10. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the image processing method according to any one of claims 1-8.
  11. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to execute the image processing method according to any one of claims 1-8.
ID=67821169

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/079192 WO2020228405A1 (en) 2019-05-15 2020-03-13 Image processing method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN110222726A (en)
WO (1) WO2020228405A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112651983A (en) * 2020-12-15 2021-04-13 北京百度网讯科技有限公司 Mosaic image identification method and device, electronic equipment and storage medium
CN113469083A (en) * 2021-07-08 2021-10-01 西安电子科技大学 SAR image target classification method and system based on anti-sawtooth convolution neural network

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222726A (en) * 2019-05-15 2019-09-10 北京字节跳动网络技术有限公司 Image processing method, device and electronic equipment
CN111369468B (en) * 2020-03-09 2022-02-01 北京字节跳动网络技术有限公司 Image processing method, image processing device, electronic equipment and computer readable medium
CN111931600B (en) * 2020-07-21 2021-04-06 深圳市鹰硕教育服务有限公司 Intelligent pen image processing method and device and electronic equipment
CN113691863B (en) * 2021-07-05 2023-06-20 浙江工业大学 Lightweight method for extracting video key frames
CN113936220B (en) * 2021-12-14 2022-03-04 深圳致星科技有限公司 Image processing method, storage medium, electronic device, and image processing apparatus
CN117437429A (en) * 2022-07-15 2024-01-23 华为技术有限公司 Image data processing method, device and storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10282663B2 (en) * 2015-08-15 2019-05-07 Salesforce.Com, Inc. Three-dimensional (3D) convolution with 3D batch normalization
CN106920227B (en) * 2016-12-27 2019-06-07 北京工业大学 The Segmentation Method of Retinal Blood Vessels combined based on deep learning with conventional method
CN107292352B (en) * 2017-08-07 2020-06-02 北京中星微人工智能芯片技术有限公司 Image classification method and device based on convolutional neural network
CN107657257A (en) * 2017-08-14 2018-02-02 中国矿业大学 A kind of semantic image dividing method based on multichannel convolutive neutral net
CN107909113B (en) * 2017-11-29 2021-11-16 北京小米移动软件有限公司 Traffic accident image processing method, device and storage medium
CN108022647B (en) * 2017-11-30 2022-01-25 东北大学 Lung nodule benign and malignant prediction method based on ResNet-inclusion model
CN108615010B (en) * 2018-04-24 2022-02-11 重庆邮电大学 Facial expression recognition method based on parallel convolution neural network feature map fusion
CN108986124A (en) * 2018-06-20 2018-12-11 天津大学 In conjunction with Analysis On Multi-scale Features convolutional neural networks retinal vascular images dividing method
CN109389030B (en) * 2018-08-23 2022-11-29 平安科技(深圳)有限公司 Face characteristic point detection method and device, computer equipment and storage medium
CN109344878B (en) * 2018-09-06 2021-03-30 北京航空航天大学 Eagle brain-like feature integration small target recognition method based on ResNet

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107862287A (en) * 2017-11-08 2018-03-30 吉林大学 A kind of front zonule object identification and vehicle early warning method
CN110046607A (en) * 2019-04-26 2019-07-23 西安因诺航空科技有限公司 A kind of unmanned aerial vehicle remote sensing image board house or building materials test method based on deep learning
CN110222726A (en) * 2019-05-15 2019-09-10 北京字节跳动网络技术有限公司 Image processing method, device and electronic equipment
CN110456805A (en) * 2019-06-24 2019-11-15 深圳慈航无人智能***技术有限公司 A kind of UAV Intelligent tracking flight system and method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112651983A (en) * 2020-12-15 2021-04-13 北京百度网讯科技有限公司 Mosaic image identification method and device, electronic equipment and storage medium
CN112651983B (en) * 2020-12-15 2023-08-01 北京百度网讯科技有限公司 Splice graph identification method and device, electronic equipment and storage medium
CN113469083A (en) * 2021-07-08 2021-10-01 西安电子科技大学 SAR image target classification method and system based on anti-sawtooth convolution neural network
CN113469083B (en) * 2021-07-08 2024-05-31 西安电子科技大学 SAR image target classification method and system based on antialiasing convolutional neural network

Also Published As

Publication number Publication date
CN110222726A (en) 2019-09-10

Similar Documents

Publication Publication Date Title
WO2020228405A1 (en) Image processing method and apparatus, and electronic device
CN110189246B (en) Image stylization generation method and device and electronic equipment
JP2023547917A (en) Image segmentation method, device, equipment and storage medium
CN110399848A (en) Video cover generation method, device and electronic equipment
CN110070551B (en) Video image rendering method and device and electronic equipment
WO2020228383A1 (en) Mouth shape generation method and apparatus, and electronic device
CN110796664B (en) Image processing method, device, electronic equipment and computer readable storage medium
WO2022237811A1 (en) Image processing method and apparatus, and device
CN110399847B (en) Key frame extraction method and device and electronic equipment
CN112232311B (en) Face tracking method and device and electronic equipment
CN111222509A (en) Target detection method and device and electronic equipment
CN110211017B (en) Image processing method and device and electronic equipment
CN110555861B (en) Optical flow calculation method and device and electronic equipment
CN110197459B (en) Image stylization generation method and device and electronic equipment
WO2024012255A1 (en) Semantic segmentation model training method and apparatus, electronic device, and storage medium
WO2024041235A1 (en) Image processing method and apparatus, device, storage medium and program product
CN110060324B (en) Image rendering method and device and electronic equipment
CN114419322B (en) Image instance segmentation method and device, electronic equipment and storage medium
CN115100536B (en) Building identification method and device, electronic equipment and computer readable medium
CN112052863B (en) Image detection method and device, computer storage medium and electronic equipment
WO2021073204A1 (en) Object display method and apparatus, electronic device, and computer readable storage medium
CN115311414A (en) Live-action rendering method and device based on digital twinning and related equipment
CN115082828A (en) Video key frame extraction method and device based on dominating set
CN111200705B (en) Image processing method and device
CN113808151A (en) Method, device and equipment for detecting weak semantic contour of live image and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20806268

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20806268

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22.03.2022)
