CN113256656A - Image segmentation method and device

Image segmentation method and device

Info

Publication number
CN113256656A
Authority
CN
China
Prior art keywords
image
segmented
obstruction
image features
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110593641.2A
Other languages
Chinese (zh)
Inventor
柯磊
戴宇荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202110593641.2A priority Critical patent/CN113256656A/en
Publication of CN113256656A publication Critical patent/CN113256656A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/12Edge-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides an image segmentation method and apparatus. The method includes: acquiring image features within a predetermined region that includes an object to be segmented; acquiring, through a first neural network and based on the image features within the predetermined region, image features of an obstruction that occludes the object to be segmented, wherein the obstruction is an object that occludes the object to be segmented within the predetermined region; and acquiring, through a second neural network, image features of the object to be segmented based on the image features within the predetermined region and the image features of the obstruction.

Description

Image segmentation method and device
Technical Field
The present application relates to the field of image processing, and more particularly, to an image segmentation method and apparatus.
Background
Instance segmentation is a fundamental algorithm in image and/or video scene understanding. It organically combines object detection and semantic segmentation: it must not only predict whether each pixel of an input image belongs to an object, but also distinguish the pixels belonging to different objects. Fig. 1 is a schematic diagram illustrating instance segmentation in the prior art; the technique has been used on a large scale in related fields such as automatic matting, medical imaging, and automatic driving.
However, mutual occlusion of objects is common in daily life, and severe occlusion tends to produce confusing occlusion boundaries and unnatural, discontinuous object shapes. Fig. 2 shows instance segmentation results under severe occlusion in the prior art, where (a) is the input image, (b) is the correct instance segmentation result, and (c) is the instance segmentation result of the prior art. As shown in fig. 2, prior-art image segmentation algorithms lack modeling of occluding objects and occlusion relationships, can only handle a relatively limited set of object categories, and cannot handle object segmentation under occlusion well. Under heavy occlusion between similar objects they easily produce large-area occlusion prediction errors, so model performance drops significantly under occlusion and running speed is slow. In addition, prior-art image segmentation algorithms cannot reliably distinguish highly overlapping similar objects in an image and tend to produce aggregation errors.
Disclosure of Invention
According to an exemplary embodiment of the present invention, there is provided an image segmentation method including: acquiring image features within a predetermined region including an object to be segmented; acquiring, through a first neural network and based on the image features within the predetermined region, image features of an obstruction that occludes the object to be segmented, wherein the obstruction is an object that occludes the object to be segmented within the predetermined region; and acquiring, through a second neural network, image features of the object to be segmented based on the image features within the predetermined region and the image features of the obstruction.
The step of acquiring the image features of the obstruction may include: acquiring boundary information and mask information of the obstruction from the image features within the predetermined region, wherein the mask information indicates the pixels in the image that belong to the obstruction.
The step of acquiring the image features of the object to be segmented may include: acquiring boundary information and mask information of the object to be segmented based on the image features within the predetermined region and the image features of the obstruction, wherein the mask information indicates the pixels in the image that belong to the object to be segmented.
The step of acquiring the image features within the predetermined region including the object to be segmented may include: acquiring image features of an image, and performing object detection on the image features of the image to obtain a region of interest including the object to be segmented as the predetermined region, so as to acquire the image features within the region of interest.
The step of acquiring the image features of the image may include: performing predetermined processing on low-dimensional image features of the image to acquire high-dimensional image features of the image.
The first neural network may include a first convolution layer, wherein the first convolution layer uses the high-dimensional image features to associate the image features of the obstruction according to the distances between the high-dimensional image features, so as to acquire the image features of the obstruction; and the second neural network may include a second convolution layer, wherein the second convolution layer uses the high-dimensional image features and the associated image features of the obstruction to associate the image features of the object to be segmented according to the distances between the high-dimensional image features, so as to acquire the image features of the object to be segmented.
The first convolution layer and the second convolution layer may be graph convolution layers based on non-local operators.
The first neural network may include a single convolution layer, the first convolution layer, and a full convolution layer connected in sequence, and the second neural network may include a single convolution layer, the second convolution layer, and a full convolution layer connected in sequence.
According to an exemplary embodiment of the present invention, there is provided an image segmentation apparatus including: a feature acquisition unit configured to acquire image features within a predetermined region including an object to be segmented; an obstruction acquisition unit configured to acquire, through a first neural network and based on the image features within the predetermined region, image features of an obstruction that occludes the object to be segmented, wherein the obstruction is an object that occludes the object to be segmented within the predetermined region; and an object acquisition unit configured to acquire, through a second neural network, image features of the object to be segmented based on the image features within the predetermined region and the image features of the obstruction.
The obstruction acquisition unit may acquire boundary information and mask information of the obstruction from the image features within the predetermined region, wherein the mask information indicates the pixels in the image that belong to the obstruction.
The object acquisition unit may acquire boundary information and mask information of the object to be segmented based on the image features within the predetermined region and the image features of the obstruction, wherein the mask information indicates the pixels in the image that belong to the object to be segmented.
The feature acquisition unit may acquire image features of an image, and perform object detection on the image features of the image to obtain a region of interest including the object to be segmented as the predetermined region, so as to acquire the image features within the region of interest.
The feature acquisition unit may perform predetermined processing on low-dimensional image features of the image to acquire high-dimensional image features of the image.
The first neural network may include a first convolution layer, wherein the first convolution layer uses the high-dimensional image features to associate the image features of the obstruction according to the distances between the high-dimensional image features, so as to acquire the image features of the obstruction; and the second neural network may include a second convolution layer, wherein the second convolution layer uses the high-dimensional image features and the associated image features of the obstruction to associate the image features of the object to be segmented according to the distances between the high-dimensional image features, so as to acquire the image features of the object to be segmented.
The first convolution layer and the second convolution layer may be graph convolution layers based on non-local operators.
The first neural network may include a single convolution layer, the first convolution layer, and a full convolution layer connected in sequence, and the second neural network may include a single convolution layer, the second convolution layer, and a full convolution layer connected in sequence.
According to an exemplary embodiment of the present invention, there is provided an electronic apparatus, including: at least one processor; at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the image segmentation method described above.
According to an exemplary embodiment of the present invention, a computer-readable storage medium is provided, characterized in that instructions in the computer-readable storage medium, when executed by at least one processor, enable the at least one processor to perform the above-mentioned image segmentation method.
According to an exemplary embodiment of the invention, a computer program product is provided, characterized in that instructions in the computer program product are executed by at least one processor to perform the above-described image segmentation method.
Additional aspects and/or advantages of the present general inventive concept will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the general inventive concept.
Drawings
The above and other objects and features of the present invention will become more apparent from the following detailed description when taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic diagram illustrating instance segmentation in the prior art;
FIG. 2 is a diagram illustrating instance segmentation results under severe occlusion in the prior art;
FIG. 3 is a flowchart illustrating an image segmentation method according to an exemplary embodiment of the present invention;
FIG. 4 is a diagram illustrating an image segmentation method according to an exemplary embodiment of the present invention;
FIG. 5 is a diagram illustrating a graph convolution layer based on a non-local operator according to an exemplary embodiment of the present invention;
fig. 6 is a block diagram illustrating an image segmentation apparatus according to an exemplary embodiment of the present invention;
FIG. 7 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present invention;
fig. 8 is a diagram illustrating a server according to an exemplary embodiment of the present invention;
fig. 9 is a diagram illustrating instance segmentation results according to an exemplary embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below in order to explain the present invention by referring to the figures.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The embodiments described in the following examples do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In the present disclosure, the expression "at least one of the items" covers the following three parallel cases: "any one of the items", "a combination of any plurality of the items", and "all of the items". For example, "including at least one of A and B" covers the following three parallel cases: (1) including A; (2) including B; (3) including A and B. Similarly, "performing at least one of step one and step two" covers the following three parallel cases: (1) performing step one; (2) performing step two; (3) performing step one and step two.
Fig. 3 is a flowchart illustrating an image segmentation method according to an exemplary embodiment of the present invention, and fig. 4 is a schematic diagram illustrating an image segmentation method according to an exemplary embodiment of the present invention. An image segmentation method according to an exemplary embodiment of the present invention will be described below with reference to fig. 3 and 4.
Referring to fig. 3, in step S310, image features within a predetermined region including an object to be segmented may be acquired. More specifically, image features of an image may be acquired by feature extraction, and object detection may be performed on those image features to obtain a region of interest including the object to be segmented as the predetermined region, so as to acquire the image features within the region of interest. Here, predetermined operations such as convolution and fusion may be performed on low-dimensional image features of the image, such as low-dimensional features (H, W, 3) having only the three R, G, B channels, where H denotes height and W denotes width, to acquire high-dimensional image features of the image, such as high-dimensional features (H, W, 256) having 256 channels. Acquiring high-dimensional image features makes the pixels of the image easier to distinguish and enhances the expressive power of the image features, thereby making the whole image segmentation process more accurate. Further, by way of example only and not limitation, the high-dimensional image features may be obtained by convolution with a 101-layer deep residual network (ResNet-101) followed by fusion with a feature pyramid network; the candidate-box coordinates of the region of interest (RoI) may be predicted by running an object detection algorithm such as a Faster Region-based Convolutional Neural Network (Faster R-CNN) on the high-dimensional image features, and the image features of the region of interest may then be accurately extracted from the high-dimensional image features using the RoI candidate-box coordinates through an RoI Align algorithm. Furthermore, a single-stage, anchor-free, fully convolutional one-stage object detection (FCOS) model may optionally be employed to speed up detection.
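By way of illustration only, the following Python sketch outlines step S310 under the assumption that a torchvision ResNet-101 trunk supplies the high-dimensional features and that the RoI candidate boxes have already been predicted by a detector such as Faster R-CNN or FCOS; the feature-pyramid fusion and the detector itself are omitted, and all function names, variable names, and sizes are illustrative assumptions rather than details of the claimed implementation.

```python
import torch
import torchvision
from torchvision.ops import roi_align

# Backbone: a plain ResNet-101 trunk; the feature pyramid fusion described
# above is omitted here for brevity.
backbone = torchvision.models.resnet101(weights=None)  # pretrained=False on older torchvision
trunk = torch.nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool and fc

def extract_roi_features(image, boxes, output_size=14):
    """image: (1, 3, H, W) low-dimensional RGB input, H = height, W = width.
    boxes: (N, 4) candidate-box coordinates (x1, y1, x2, y2) in image pixels,
           e.g. predicted by Faster R-CNN or FCOS.
    Returns (N, 2048, output_size, output_size) high-dimensional RoI features."""
    features = trunk(image)                        # (1, 2048, H/32, W/32) high-dimensional features
    scale = features.shape[-1] / image.shape[-1]   # map pixel coordinates onto the feature map
    return roi_align(features, [boxes], output_size=output_size, spatial_scale=scale)
```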
In step S320, image features of an obstruction that occludes the object to be segmented within the predetermined region may be acquired through a first neural network based on the image features within the predetermined region. Here, boundary information of the obstruction and mask information indicating the pixels in the image that belong to the obstruction may be acquired from the image features within the predetermined region (such as the region of interest). More specifically, the image features within the region of interest may be input to a predetermined neural network to obtain the image features of the obstruction occluding the object to be segmented. By way of example only and not limitation, the image features of the obstruction may be fed into a single convolution layer (with a kernel size of, for example, 1x1) to obtain the boundary information of the obstruction, and into another single convolution layer (with a kernel size of, for example, 1x1) to obtain the mask information of the obstruction.
Further, the first neural network may include a first convolution layer, wherein the first convolution layer uses the high-dimensional image features to associate the image features of the obstruction according to the distances between the high-dimensional image features, so as to acquire the image features of the obstruction. Here, by way of example only and not limitation, the first neural network may further include at least one of a single convolution layer, the first convolution layer, and a full convolution layer, for example a single convolution layer (with a kernel size of, for example, 3x3), the first convolution layer, and a full convolution layer (which may contain, for example, two single convolution layers, each with a kernel size of, for example, 3x3) connected in sequence, as shown in fig. 4. It should be understood that the connection order of the single convolution layer, the first convolution layer, and the full convolution layer, and which networks the whole convolutional network includes, may be as shown in fig. 4 and may be changed as needed by those skilled in the art. More specifically, by way of example only and not limitation, to reduce the number of model parameters, the first convolution layer may be a graph convolution layer based on a non-local operator. The non-local-operator-based graph convolution layer is described in more detail below with reference to FIG. 5.
FIG. 5 is a diagram illustrating a graph convolution layer based on a non-local operator according to an exemplary embodiment of the present invention. Referring to fig. 5, by way of example only and not limitation, the non-local-operator-based graph convolution layer may include a first single convolution layer 510, a second single convolution layer 520, a third single convolution layer 530, a dot-product unit 540, a softmax operation unit 550, a dot-product unit 560, and an addition unit 570. The first single convolution layer 510, the second single convolution layer 520, and the third single convolution layer 530 are three parallel single convolution layers whose kernel sizes may all be, for example, 1x1. More specifically, the first single convolution layer 510 may perform a convolution operation (for example, 1x1) on the input image features of the region of interest (which may be the image features output by the single convolution layer shown in fig. 4) to obtain first convolved image features; the second single convolution layer 520 and the third single convolution layer 530 may likewise produce second and third convolved image features from the same input. The dot-product unit 540 may perform a dot-product operation on the second and third convolved image features to obtain first dot-product image features; the softmax unit 550 may apply a softmax operation to the first dot-product image features; the dot-product unit 560 may perform a dot-product operation on the first convolved image features and the softmax-processed image features to obtain second dot-product image features; and the addition unit 570 may then add the second dot-product image features to the input image features of the region of interest (which may be the image features output by the single convolution layer shown in fig. 4) to obtain the final output image features. The graph convolution layer based on the non-local operator according to the exemplary embodiment of the present invention can effectively associate pixels in the image space according to the similarity of their corresponding image features (that is, the distances between the image features), re-aggregating the input target-region features, and can thus better handle the discontinuity caused when pixels of the same object are occluded and cut apart in space.
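By way of illustration only, the following Python sketch implements the graph convolution layer of fig. 5 as described above: three parallel 1x1 convolutions, a dot product between the second and third convolved features, a softmax, a second dot product with the first convolved features, and a residual addition with the input. The class name, the channel width, and the exact orientation of the two dot products are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NonLocalGraphConv(nn.Module):
    """Graph convolution layer based on a non-local operator (cf. fig. 5)."""
    def __init__(self, channels=256):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=1)  # first single convolution layer (510)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=1)  # second single convolution layer (520)
        self.conv3 = nn.Conv2d(channels, channels, kernel_size=1)  # third single convolution layer (530)

    def forward(self, x):                      # x: (N, C, H, W) region-of-interest features
        n, c, h, w = x.shape
        f1 = self.conv1(x).flatten(2)          # (N, C, HW) first convolved image features
        f2 = self.conv2(x).flatten(2)          # (N, C, HW) second convolved image features
        f3 = self.conv3(x).flatten(2)          # (N, C, HW) third convolved image features
        # Dot-product unit 540: pairwise similarity between spatial positions.
        affinity = torch.bmm(f2.transpose(1, 2), f3)   # (N, HW, HW)
        # Softmax unit 550: normalise the similarities.
        affinity = torch.softmax(affinity, dim=-1)
        # Dot-product unit 560: re-aggregate the features by similarity.
        out = torch.bmm(f1, affinity).view(n, c, h, w)
        # Addition unit 570: residual connection with the input features.
        return out + x
```

Because the affinity matrix in such a sketch is computed from the features themselves rather than learned as an explicit adjacency, the layer adds only the three 1x1 convolutions to the parameter count, which is consistent with the parameter-reduction motivation stated above.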
Returning to fig. 3, in step S330, the image features of the object to be segmented may be acquired through a second neural network based on the image features within the predetermined region and the image features of the obstruction. Here, boundary information and mask information of the object to be segmented may be acquired based on the image features within the predetermined region (such as the region of interest) and the image features of the obstruction, wherein the mask information indicates the pixels in the image that belong to the object to be segmented. By way of example only and not limitation, the image features of the object to be segmented may be fed into a single convolution layer (with a kernel size of, for example, 1x1) to obtain the boundary information of the object to be segmented, and into another single convolution layer (with a kernel size of, for example, 1x1) to obtain the mask information of the object to be segmented.
Further, the second neural network may include a second convolution layer, wherein the second convolution layer uses the high-dimensional image features and the associated image features of the obstruction to associate the image features of the object to be segmented according to the distances between the high-dimensional image features, so as to acquire the image features of the object to be segmented. Here, by way of example only and not limitation, the second neural network may further include at least one of a single convolution layer, the second convolution layer, and a full convolution layer, for example a single convolution layer (with a kernel size of, for example, 3x3), the second convolution layer, and a full convolution layer (which may contain, for example, two single convolution layers, each with a kernel size of, for example, 3x3) connected in sequence, as shown in fig. 4. Here, like the first convolution layer, the second convolution layer may be a graph convolution layer based on a non-local operator, and the second neural network and the first neural network form a cascaded network. This step can consider the relationship between the occluding object and the occluded object simultaneously, effectively distinguish the adjacent boundaries of the occluder and the occludee (i.e. the object to be segmented), and finally output the segmentation result of the object.
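By way of illustration only, the following Python sketch shows how the cascaded occluder/occludee branches of fig. 4 might be assembled, reusing the NonLocalGraphConv class from the previous sketch: each branch is a 3x3 single convolution layer, a graph convolution layer, and a full convolution block of two 3x3 convolutions, followed by 1x1 heads for boundary and mask prediction. The way the RoI features and the associated occluder features are fused (a simple addition here), the activation functions, and all module names and channel widths are illustrative assumptions, not details taken from the patent.

```python
import torch.nn as nn

class OcclusionAwareHead(nn.Module):
    """Cascaded first/second neural networks of fig. 4 (illustrative sketch).
    NonLocalGraphConv is the graph convolution layer from the previous sketch."""
    def __init__(self, channels=256):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),   # single 3x3 convolution layer
                NonLocalGraphConv(channels),                   # graph convolution layer
                nn.Conv2d(channels, channels, 3, padding=1),   # full convolution block:
                nn.ReLU(inplace=True),                         #   two 3x3 convolutions
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
            )
        self.occluder_branch = branch()                    # first neural network (obstruction)
        self.occludee_branch = branch()                    # second neural network (object to be segmented)
        self.occluder_boundary = nn.Conv2d(channels, 1, 1) # 1x1 boundary head
        self.occluder_mask = nn.Conv2d(channels, 1, 1)     # 1x1 mask head
        self.occludee_boundary = nn.Conv2d(channels, 1, 1)
        self.occludee_mask = nn.Conv2d(channels, 1, 1)

    def forward(self, roi_features):
        occluder_features = self.occluder_branch(roi_features)
        # Cascade: the second branch sees both the RoI features and the
        # already associated occluder features (fused here by addition).
        occludee_features = self.occludee_branch(roi_features + occluder_features)
        return {
            "occluder_boundary": self.occluder_boundary(occluder_features),
            "occluder_mask": self.occluder_mask(occluder_features),
            "occludee_boundary": self.occludee_boundary(occludee_features),
            "occludee_mask": self.occludee_mask(occludee_features),
        }
```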
Fig. 6 is a block diagram illustrating an image segmentation apparatus according to an exemplary embodiment of the present invention.
Referring to fig. 6, an image segmentation apparatus 600 according to an exemplary embodiment of the present invention may include a feature acquisition unit 610, an obstruction acquisition unit 620, and an object acquisition unit 630.
The feature acquisition unit 610 may be configured to acquire image features within a predetermined region including an object to be segmented. More specifically, the feature acquisition unit 610 may acquire image features of an image, perform object detection on the image features of the image to obtain a region of interest including the object to be segmented as the predetermined region, and acquire the image features within the region of interest. Further, the feature acquisition unit 610 may perform predetermined processing on low-dimensional image features of the image to acquire high-dimensional image features of the image.
The obstruction acquisition unit 620 may be configured to acquire, through a first neural network and based on the image features within the predetermined region, image features of an obstruction that occludes the object to be segmented within the predetermined region. More specifically, the obstruction acquisition unit 620 may acquire boundary information and mask information of the obstruction from the image features within the predetermined region, wherein the mask information indicates the pixels in the image that belong to the obstruction. Further, the first neural network may include a first convolution layer, wherein the first convolution layer uses the high-dimensional image features to associate the image features of the obstruction according to the distances between the high-dimensional image features, so as to acquire the image features of the obstruction.
The object acquisition unit 630 may be configured to acquire, through a second neural network, image features of the object to be segmented based on the image features within the predetermined region and the image features of the obstruction. More specifically, the object acquisition unit 630 may acquire boundary information and mask information of the object to be segmented based on the image features within the predetermined region and the image features of the obstruction, wherein the mask information indicates the pixels in the image that belong to the object to be segmented. In addition, the second neural network may include a second convolution layer, wherein the second convolution layer uses the high-dimensional image features and the associated image features of the obstruction to associate the image features of the object to be segmented according to the distances between the high-dimensional image features, so as to acquire the image features of the object to be segmented.
Here, the first neural network may further include at least one of a single convolution layer, the first convolution layer, and a full convolution layer, for example a single convolution layer, the first convolution layer, and a full convolution layer connected in sequence, wherein the first convolution layer may be a graph convolution layer based on a non-local operator. The second neural network may further include at least one of a single convolution layer, the second convolution layer, and a full convolution layer, for example a single convolution layer, the second convolution layer, and a full convolution layer connected in sequence, wherein the second convolution layer may be a graph convolution layer based on a non-local operator.
Fig. 7 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present invention. The electronic device 700 may be, for example: a smart phone, a tablet computer, an MP4(Moving Picture Experts Group Audio Layer IV) player, a notebook computer or a desktop computer. The electronic device 700 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, and so forth.
In general, the electronic device 700 includes: a processor 701 and a memory 702.
The processor 701 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 701 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 701 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in the awake state, also called a Central Processing Unit (CPU), and the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 701 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 701 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
Memory 702 may include one or more computer-readable storage media, which may be non-transitory. Memory 702 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 702 is used to store at least one instruction for execution by processor 701 to implement the image segmentation method provided by the method embodiment shown in fig. 3.
In some embodiments, the electronic device 700 may further optionally include: a peripheral interface 703 and at least one peripheral. The processor 701, the memory 702, and the peripheral interface 703 may be connected by buses or signal lines. Various peripheral devices may be connected to peripheral interface 703 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 704, touch screen display 705, camera 706, audio circuitry 707, positioning components 708, and power source 709.
The peripheral interface 703 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 701 and the memory 702. In some embodiments, processor 701, memory 702, and peripheral interface 703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 701, the memory 702, and the peripheral interface 703 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The Radio Frequency circuit 704 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 704 communicates with communication networks and other communication devices via electromagnetic signals. The rf circuit 704 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 704 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuitry 704 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generation mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 704 may also include NFC (Near Field Communication) related circuits, which are not limited by this disclosure.
The display screen 705 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 705 is a touch display screen, the display screen 705 also has the ability to capture touch signals on or over the surface of the display screen 705. The touch signal may be input to the processor 701 as a control signal for processing. At this point, the display 705 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display 705 may be one, disposed on the front panel of the electronic device 700; in other embodiments, the number of the display screens 705 may be at least two, and the at least two display screens are respectively disposed on different surfaces of the electronic device 700 or are in a folding design; in still other embodiments, the display 705 may be a flexible display disposed on a curved surface or on a folded surface of the electronic device 700. Even more, the display 705 may be arranged in a non-rectangular irregular pattern, i.e. a shaped screen. The Display 705 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or the like.
The camera assembly 706 is used to capture images or video. Optionally, camera assembly 706 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each of the rear cameras is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (virtual reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 706 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuitry 707 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 701 for processing or inputting the electric signals to the radio frequency circuit 704 to realize voice communication. For stereo capture or noise reduction purposes, the microphones may be multiple and disposed at different locations of the electronic device 700. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 701 or the radio frequency circuit 704 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 707 may also include a headphone jack.
The positioning component 708 is used to determine the current geographic location of the electronic device 700 to implement navigation or LBS (Location Based Service). The positioning component 708 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 709 is used to supply power to various components in the electronic device 700. The power source 709 may be alternating current, direct current, disposable batteries, or rechargeable batteries. When power source 709 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the electronic device 700 also includes one or more sensors 710. The one or more sensors 710 include, but are not limited to: acceleration sensor 711, gyro sensor 712, pressure sensor 713, fingerprint sensor 714, optical sensor 715, and proximity sensor 716.
The acceleration sensor 711 may detect the magnitude of acceleration in three coordinate axes of a coordinate system established with the electronic device 700. For example, the acceleration sensor 711 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 701 may control the touch screen 705 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 711. The acceleration sensor 711 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 712 may detect a body direction and a rotation angle of the electronic device 700, and the gyro sensor 712 may cooperate with the acceleration sensor 711 to acquire a 3D motion of the user with respect to the electronic device 700. From the data collected by the gyro sensor 712, the processor 701 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensors 713 may be disposed on a side bezel of electronic device 700 and/or an underlying layer of touch display 705. When the pressure sensor 713 is disposed on a side frame of the electronic device 700, a user holding signal of the electronic device 700 may be detected, and the processor 701 may perform left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 713. When the pressure sensor 713 is disposed at a lower layer of the touch display 705, the processor 701 controls the operability control on the UI according to the pressure operation of the user on the touch display 705. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 714 is used for collecting a fingerprint of a user, and the processor 701 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 714, or the fingerprint sensor 714 identifies the identity of the user according to the collected fingerprint. When the user identity is identified as a trusted identity, the processor 701 authorizes the user to perform relevant sensitive operations, including unlocking a screen, viewing encrypted information, downloading software, paying, changing settings, and the like. The fingerprint sensor 714 may be disposed on the front, back, or side of the electronic device 700. When a physical button or vendor Logo is provided on the electronic device 700, the fingerprint sensor 714 may be integrated with the physical button or vendor Logo.
The optical sensor 715 is used to collect the ambient light intensity. In one embodiment, the processor 701 may control the display brightness of the touch display 705 based on the ambient light intensity collected by the optical sensor 715. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 705 is increased; when the ambient light intensity is low, the display brightness of the touch display 705 is turned down. In another embodiment, processor 701 may also dynamically adjust the shooting parameters of camera assembly 706 based on the ambient light intensity collected by optical sensor 715.
A proximity sensor 716, also referred to as a distance sensor, is typically disposed on the front panel of the electronic device 700. The proximity sensor 716 is used to capture the distance between the user and the front of the electronic device 700. In one embodiment, when the proximity sensor 716 detects that the distance between the user and the front of the electronic device 700 gradually decreases, the processor 701 controls the touch display screen 705 to switch from the screen-on state to the screen-off state; when the proximity sensor 716 detects that this distance gradually increases, the processor 701 controls the touch display screen 705 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the configuration shown in fig. 7 does not constitute a limitation of the electronic device 700 and may include more or fewer components than those shown, or combine certain components, or employ a different arrangement of components.
Fig. 8 is a diagram illustrating a server according to an exemplary embodiment of the present invention. Referring to fig. 8, a server 800 includes one or more processors 810 and memory 820. The memory 820 may include one or more programs for performing the methods described above with reference to fig. 3. The server 800 may also include a power component 830 configured to perform power management for the server 800, a wired or wireless network interface 840 configured to connect the server 800 to a network, and an input/output (I/O) interface 850. The server 800 may operate based on an operating system stored in the memory 820, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
According to an exemplary embodiment of the present invention, there may also be provided a computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by at least one processor, cause the at least one processor to perform the image segmentation method according to the exemplary embodiments of the present invention. Examples of the computer-readable storage medium here include: read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disc storage, a hard disk drive (HDD), a solid-state drive (SSD), card-type memory (such as a multimedia card, a Secure Digital (SD) card, or an eXtreme Digital (XD) card), magnetic tape, a floppy disk, a magneto-optical data storage device, an optical data storage device, a hard disk, a solid-state disk, and any other device configured to store a computer program and any associated data, data files, and data structures in a non-transitory manner and to provide them to a processor or computer so that the processor or computer can execute the computer program. The computer program in the computer-readable storage medium may run in an environment deployed on computer equipment such as a client, a host, a proxy device, or a server. Further, in one example, the computer program and any associated data, data files, and data structures are distributed across networked computer systems such that they are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
According to an exemplary embodiment of the present disclosure, there may also be provided a computer program product comprising computer instructions executable by at least one processor to perform an image segmentation method according to an exemplary embodiment of the present invention.
Fig. 9 is a diagram illustrating instance segmentation results according to an exemplary embodiment of the present invention. Referring to fig. 9, it can be seen that the instance segmentation result (b) according to an exemplary embodiment of the present invention segments the occluded object better than the instance segmentation result (a) of the related art.
According to the embodiments of the present invention, acquiring high-dimensional image features makes the pixels of the image easier to distinguish and enhances the expressive power of the image features, so that the whole image segmentation process is more accurate. The neural networks can be combined with two-stage instance segmentation by introducing two cascaded graph convolution layers: within a region of interest, the front graph convolution layer models and outputs the boundary and mask of the occluding object, effectively associating pixels in the image space according to the similarity of their corresponding image features (that is, the distances between the image features) and re-aggregating the input target-region features, which better handles the discontinuity caused when pixels of the same object are occluded and cut apart in space; on this basis, the rear graph convolution layer finally outputs the mask of the occluded object, considering the occluding and occluded relationships simultaneously, so that the adjacent boundaries of the occluder and the occludee can be effectively distinguished and highly overlapping occluded objects can be separated from each other, while the model remains fast and lightweight at the high-performance server side even under occlusion. In addition, compared with conventional segmentation algorithms, explicitly modeling the occluding object and the occluded object separately makes the prediction results more interpretable.
While the invention has been shown and described with reference to certain exemplary embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims and their equivalents.

Claims (10)

1. An image segmentation method, comprising:
acquiring image features within a predetermined region including an object to be segmented;
acquiring, through a first neural network and based on the image features within the predetermined region, image features of an obstruction that occludes the object to be segmented, wherein the obstruction is an object that occludes the object to be segmented within the predetermined region;
acquiring, through a second neural network, image features of the object to be segmented based on the image features within the predetermined region and the image features of the obstruction.
2. The image segmentation method of claim 1, wherein the step of acquiring the image features of the obstruction comprises:
acquiring boundary information and mask information of the obstruction from the image features within the predetermined region,
wherein the mask information indicates the pixels in the image that belong to the obstruction.
3. The image segmentation method according to claim 1 or 2, wherein the step of acquiring the image features of the object to be segmented comprises:
acquiring boundary information and mask information of the object to be segmented based on the image features within the predetermined region and the image features of the obstruction,
wherein the mask information indicates the pixels in the image that belong to the object to be segmented.
4. The image segmentation method according to claim 1, wherein the step of acquiring the image features within the predetermined region including the object to be segmented comprises:
acquiring image features of an image, and performing object detection on the image features of the image to obtain a region of interest including the object to be segmented as the predetermined region, so as to acquire the image features within the region of interest.
5. The image segmentation method according to claim 4, wherein the step of acquiring the image features of the image comprises:
performing predetermined processing on low-dimensional image features of the image to acquire high-dimensional image features of the image.
6. The image segmentation method according to claim 5, wherein
the first neural network comprises a first convolution layer, wherein the first convolution layer uses the high-dimensional image features to associate the image features of the obstruction according to the distances between the high-dimensional image features, so as to acquire the image features of the obstruction, and
the second neural network comprises a second convolution layer, wherein the second convolution layer uses the high-dimensional image features and the associated image features of the obstruction to associate the image features of the object to be segmented according to the distances between the high-dimensional image features, so as to acquire the image features of the object to be segmented.
7. An image segmentation apparatus, comprising:
a feature acquisition unit configured to acquire image features within a predetermined region including an object to be segmented;
an obstruction acquisition unit configured to acquire, through a first neural network and based on the image features within the predetermined region, image features of an obstruction that occludes the object to be segmented, wherein the obstruction is an object that occludes the object to be segmented within the predetermined region;
an object acquisition unit configured to acquire, through a second neural network, image features of the object to be segmented based on the image features within the predetermined region and the image features of the obstruction.
8. An electronic device, comprising:
at least one processor;
at least one memory storing computer-executable instructions,
wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the image segmentation method of any one of claims 1 to 6.
9. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by at least one processor, enable the at least one processor to perform the image segmentation method of any of claims 1 to 6.
10. A computer program product in which instructions are executed by at least one processor to perform the image segmentation method according to any one of claims 1 to 6.
CN202110593641.2A 2021-05-28 2021-05-28 Image segmentation method and device Pending CN113256656A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110593641.2A CN113256656A (en) 2021-05-28 2021-05-28 Image segmentation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110593641.2A CN113256656A (en) 2021-05-28 2021-05-28 Image segmentation method and device

Publications (1)

Publication Number Publication Date
CN113256656A true CN113256656A (en) 2021-08-13

Family

ID=77185108

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110593641.2A Pending CN113256656A (en) 2021-05-28 2021-05-28 Image segmentation method and device

Country Status (1)

Country Link
CN (1) CN113256656A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107256552A (en) * 2017-06-14 2017-10-17 成都康托医疗设备有限公司 Polyp image identification system and method
CN109785337A (en) * 2018-12-25 2019-05-21 哈尔滨工程大学 Mammal counting method in a kind of column of Case-based Reasoning partitioning algorithm
CN110070056A (en) * 2019-04-25 2019-07-30 腾讯科技(深圳)有限公司 Image processing method, device, storage medium and equipment
CN111340030A (en) * 2020-02-14 2020-06-26 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and computer readable storage medium
CN111598030A (en) * 2020-05-21 2020-08-28 山东大学 Method and system for detecting and segmenting vehicle in aerial image
CN112132750A (en) * 2020-09-25 2020-12-25 北京猿力未来科技有限公司 Video processing method and device
CN112419170A (en) * 2020-10-16 2021-02-26 上海哔哩哔哩科技有限公司 Method for training occlusion detection model and method for beautifying face image

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
LEI KE, ET AL.: "Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers", arXiv:2103.12340v1, pages 1-10 *
刘宇; 金伟正; 范赐恩; 邹炼: "Mesh-like occlusion detection algorithm using superpixel segmentation and graph cuts", 计算机应用 (Journal of Computer Applications), no. 01
孙巍; 郭敏: "Fast image segmentation algorithm based on an adaptive shape prior", 云南大学学报(自然科学版) (Journal of Yunnan University, Natural Sciences Edition), no. 01
李昕蔚; 丁正彦; 尚岩峰; 祝永新; 汪辉; 钟雪霞; 田犁; 黄尊恺; 封松林: "A region segmentation method for intersection surveillance images", 计算机应用与软件 (Computer Applications and Software), no. 03
李睿: "Research on pedestrian tracking algorithms fusing deep learning and correlation filtering", 《中国优秀硕士电子期刊网》 (China Master's Theses Full-text Database), no. 01
覃润楠; 王睿: "Instance-level object detection algorithm incorporating an adversarial learning strategy", 模式识别与人工智能 (Pattern Recognition and Artificial Intelligence), no. 11

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116563835A (en) * 2023-05-11 2023-08-08 梅卡曼德(北京)机器人科技有限公司 Transfer method, transfer device and electronic device
CN116563835B (en) * 2023-05-11 2024-01-26 梅卡曼德(北京)机器人科技有限公司 Transfer method, transfer device and electronic device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination