CN114627470A - Image processing method, image processing device, computer equipment and storage medium - Google Patents


Info

Publication number
CN114627470A
Authority
CN
China
Prior art keywords
image
sample
features
feature
classification
Prior art date
Legal status
Granted
Application number
CN202210526004.8A
Other languages
Chinese (zh)
Other versions
CN114627470B (en)
Inventor
周彦宁
肖凯文
叶虎
蔡德
马兆轩
韩骁
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210526004.8A priority Critical patent/CN114627470B/en
Publication of CN114627470A publication Critical patent/CN114627470A/en
Application granted granted Critical
Publication of CN114627470B publication Critical patent/CN114627470B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image processing method, an image processing device, computer equipment and a storage medium, and belongs to the technical field of artificial intelligence. The method comprises the following steps: performing feature extraction on the input cell image through a feature extraction layer in the image classification model to obtain a global feature and a plurality of local features of the cell image; fusing the global features and the local features through a feature fusion layer in the image classification model to obtain classification features; determining target attribute information of the cell image according to the distance between the global features and the image attribute features of the target image category through an attribute determination layer in the image classification model; and outputting the target image category and the target attribute information of the cell image through an output layer in the image classification model. In the scheme, the reason for classifying the cell image into the target image category is explained through the target attribute information, so that the image processing process has interpretability.

Description

Image processing method, image processing device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to an image processing method and apparatus, a computer device, and a storage medium.
Background
With the development of artificial intelligence technology, images can be classified based on an image classification model obtained by large-scale data training. The classification accuracy of the image classification model often depends on the quality of the sample data set, and how to improve the accuracy of the image classification model is a problem to be researched.
At present, before training an image classification model, a professional skilled in the related art is usually required to label a sample image in a sample data set, and model training is performed through the labeled sample data set to improve the classification accuracy of the image classification model.
However, when the image classification model trained by the above scheme is used for image processing, the output is only a binary or multi-class classification result and provides no basis for the predicted classification result; that is, the association between input and output is not given, so the image processing process lacks interpretability.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing device, computer equipment and a storage medium, so that the image processing process has interpretability. The technical scheme is as follows.
In one aspect, an image processing method is provided, and the method includes:
performing feature extraction on an input cell image through a feature extraction layer in an image classification model to obtain global features and a plurality of local features of the cell image, wherein the image classification model is used for classifying the input image and outputting image categories and attribute information of the input image, the attribute information is used for representing the association relationship between the input image and the image categories, and the local features are used for indicating the features of anchor point positions in the cell image;
fusing the global features and the local features through a feature fusion layer in the image classification model to obtain classification features, wherein the classification features are used for determining the target image category to which the cell image belongs;
determining, by an attribute determination layer in the image classification model, target attribute information of the cell image according to distances between the global feature and a plurality of image attribute features of the target image category, the image attribute features being used to represent features of image attributes included in the target image category;
outputting the target image class and the target attribute information of the cell image through an output layer in the image classification model.
In another aspect, there is provided an image processing apparatus, the apparatus including:
a feature extraction module, configured to perform feature extraction on an input cell image through a feature extraction layer in an image classification model to obtain a global feature and a plurality of local features of the cell image, where the image classification model is used to classify the input image and output an image category and attribute information of the input image, the attribute information is used to represent an association relationship between the input image and the image category, and the local features are used to indicate features of anchor point positions in the cell image;
the feature fusion module is used for fusing the global features and the local features through a feature fusion layer in the image classification model to obtain classification features, and the classification features are used for determining the target image category to which the cell image belongs;
an attribute determining module, configured to determine, by an attribute determining layer in the image classification model, target attribute information of the cell image according to distances between the global feature and a plurality of image attribute features of the target image category, where the image attribute features are used to represent features of image attributes included in the target image category;
an output module, configured to output the target image category and the target attribute information of the cell image through an output layer in the image classification model.
In some embodiments, the feature fusion module comprises:
a feature processing unit, configured to process the local features to obtain attention information of a plurality of image categories, where the attention information is used to indicate weights of corresponding image categories;
and the feature fusion unit is used for weighting the global features through the attention information of the image categories to obtain the classification features, and the classification features comprise the weighted features of the image categories.
In some embodiments, the feature processing unit is configured to, for any image category, obtain a plurality of pyramid features of the image category, where the pyramid features are used to indicate a plurality of local features extracted from a corresponding pyramid image, and the pyramid image is obtained based on the cell image; determining attention information for the image category based on the plurality of pyramidal features.
In some embodiments, the feature processing unit is configured to, for any pyramid feature, generate an attention feature corresponding to the pyramid feature by using a maximum value of each element in a plurality of local features included in the pyramid feature; generating the attention feature of the image category according to the maximum value of each element in the attention feature corresponding to the pyramid features; and normalizing the attention features of the image categories to obtain the attention information of the image categories.
In some embodiments, the apparatus further comprises:
the image processing module is used for extracting a foreground area of the image to be processed to obtain a foreground image; and segmenting the foreground image to obtain a plurality of cell images.
In some embodiments, the apparatus further comprises:
the training module is used for performing feature extraction on the sample cell image through a feature extraction layer in the image classification model to obtain a sample global feature and a plurality of sample local features of the sample cell image, wherein the sample local features are used for indicating features of anchor point positions in the sample cell image; fusing the sample global features and the plurality of sample local features through a feature fusion layer in the image classification model to obtain a plurality of sample classification features, wherein the sample classification features are used for determining the probability that the sample cell images belong to each sample image category; respectively determining distances between the plurality of sample classification features and a plurality of prototype features through an attribute determination layer in the image classification model, wherein the prototype features are used for representing features of a plurality of sample subcategories included in the corresponding sample image category; training the image classification model based on the label image class to which the sample cell image belongs, the plurality of sample local features, the plurality of sample classification features, and distances between the plurality of sample classification features and a plurality of prototype features.
In some embodiments, the training module is configured to determine a probability that the sample cellular image belongs to each sample image class based on a plurality of sample classification features; determining an asymmetry loss based on a label image class to which the sample cell image belongs and a probability that the sample cell image belongs to the label image class; determining pyramid classification loss and pyramid regression loss based on the plurality of sample local features; for any sample image class, determining a contrast loss of the sample image class by a distance between a sample classification feature of the sample image class and a prototype feature of the sample image class; and training the image classification model through the asymmetric loss, the pyramid classification loss, the pyramid regression loss and the contrast loss of a plurality of sample image categories.
In some embodiments, the training module is configured to obtain, from the prototype features of the sample image class, a target sub-feature with a minimum distance from the sample classification feature of the sample image class; determining a contrast loss for the sample image class by a distance between the target sub-feature and the sample classification feature.
In some embodiments, the sample cell image further comprises attribute information indicating a sample subcategory in a label image category to which the sample cell image belongs;
the training module is used for determining attribute loss through the distance between a sample subcategory feature of a first sample cell image and a sample subcategory feature of a second sample cell image, wherein the first sample cell image and the second sample cell image belong to a sample subcategory in the same sample image category; the training the image classification model through the asymmetric loss, the pyramid classification loss, the pyramid regression loss, and the contrast loss of the plurality of sample image classes includes: training the image classification model through the asymmetric loss, the pyramid classification loss, the pyramid regression loss, the contrast loss of the plurality of sample image categories, and the attribute loss.
In another aspect, a computer device is provided, which includes a processor and a memory, where the memory is used to store at least one piece of computer program, and the at least one piece of computer program is loaded and executed by the processor to implement the image processing method in the embodiment of the present application.
In another aspect, a computer-readable storage medium is provided, in which at least one piece of computer program is stored, the at least one piece of computer program being loaded and executed by a processor to implement an image processing method as in the embodiments of the present application.
In another aspect, a computer program product or a computer program is provided, the computer program product or the computer program comprising computer program code stored in a computer-readable storage medium, the computer program code being read by a processor of a computer device from the computer-readable storage medium, the computer program code being executed by the processor to cause the computer device to perform the image processing method provided in the various alternative implementations of the aspects.
The embodiment of the application provides an image processing scheme, and since images of different image types have different attributes, if an image has an attribute in a certain image type, it indicates that the image may belong to the image type. The global features and the local features in the cell images are extracted and fused, so that the information acquired from the cell images is increased, and the cell images are classified through the classification features obtained through fusion to obtain the target image categories. After the classification, the target attribute information is determined through the distance between the global feature and a plurality of image attribute features of the target image category to represent the association relationship between the cell image and the target image category, so that the target attribute information can explain the reason for classifying the cell image into the target image category, and the image processing process has interpretability.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram of an implementation environment of an image processing method according to an embodiment of the present application;
FIG. 2 is a flow chart of an image processing method provided according to an embodiment of the present application;
FIG. 3 is a flow chart of another image processing method provided according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an image classification model provided according to an embodiment of the present application;
fig. 5 is a block diagram of an image processing apparatus provided according to an embodiment of the present application;
fig. 6 is a block diagram of another image processing apparatus provided according to an embodiment of the present application;
fig. 7 is a block diagram of a terminal according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a server provided according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, the following detailed description of the embodiments of the present application will be made with reference to the accompanying drawings.
The terms "first," "second," and the like in this application are used for distinguishing between similar items and items that have substantially the same function or similar functionality, and it should be understood that "first," "second," and "nth" do not have any logical or temporal dependency or limitation on the number or order of execution.
The term "at least one" in this application means one or more, and the meaning of "a plurality" means two or more.
It should be noted that the information (including but not limited to user device information, user personal information, etc.), data (including but not limited to data for analysis, stored data, displayed data, etc.) and signals referred to in this application are authorized by the user or fully authorized by various parties, and the collection, use and processing of the relevant data are subject to relevant laws and regulations and standards in relevant countries and regions. For example, the images referred to in this application are all acquired with sufficient authorization.
Hereinafter, terms related to the present application are explained.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. The basic artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, machine learning/deep learning, automatic driving, intelligent transportation and the like.
Computer Vision (CV) technology is the science of how to make machines "see"; more specifically, it uses cameras and computers, in place of human eyes, to identify and measure targets and to perform further image processing, so that the processed image is more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content recognition, three-dimensional object reconstruction, 3D (3 Dimension) technologies, virtual reality, augmented reality, map construction, automatic driving, smart transportation, and the like.
Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It specializes in studying how a computer simulates or implements human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and teaching learning.
RetinaNet (an object detection algorithm) originated in the 2018 paper Focal Loss for Dense Object Detection. The backbone network of RetinaNet is ResNet50 or ResNet101, and the backbone is followed by a Feature Pyramid Network (FPN).
A Residual Network (ResNet) is a convolutional neural network characterized by being easy to optimize and by improving accuracy through considerably increased depth.
Faster R-CNN (an object detection algorithm) was published in 2015 in the paper Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Its biggest innovation is the RPN (Region Proposal Network), which connects region generation and the convolutional network together through an anchor mechanism.
An anchor is a rectangular box generated at each pixel of the feature map and computed according to the down-sampling rate; it corresponds to a proportionally enlarged box on the original image. Since the position label of an object in an image is represented by a rectangular box, introducing anchors provides strong prior knowledge.
The L1 norm loss function, also known as the least absolute deviation or least absolute error, minimizes the sum of the absolute differences between the target values and the estimated values.
Smooth L1 is the L1 loss after smoothing.
A full-field digital section (Whole Slide Image, WSI) is obtained by digitization: one pathological section is digitized to form one WSI.
The Otsu algorithm (the Otsu method, also called the maximum inter-class variance method) is an efficient algorithm for binarizing images.
A Convolutional Neural Network (CNN) is a feed-forward neural network that includes convolution computations and has a deep structure; it is one of the representative algorithms of deep learning.
Prototype learning is a deep learning paradigm that uses "prototypes" to express the common features of a class and determines the class of a sample by measuring the distance between the sample and different prototypes in feature space.
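As an illustration of the prototype-learning idea just described, the following minimal Python/NumPy sketch assigns a sample to the class whose prototype is most similar in feature space. The cosine-style similarity and all names are assumptions for illustration, not taken from the patent.

```python
import numpy as np

def nearest_prototype(sample_feature, prototypes):
    """prototypes: array of shape (num_classes, dim); returns the index of the nearest prototype."""
    f = sample_feature / np.linalg.norm(sample_feature)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    similarities = p @ f                  # cosine similarity to each class prototype
    return int(np.argmax(similarities))   # most similar prototype = predicted class

# Example with 3 classes and 4-dimensional features
prototypes = np.random.rand(3, 4)
sample = np.random.rand(4)
predicted_class = nearest_prototype(sample, prototypes)
```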
The image processing method provided by the embodiment of the application can be executed by computer equipment. In some embodiments, the computer device is a terminal or a server. An implementation environment of the image processing method provided in the embodiment of the present application is described below by taking a computer device as an example, and fig. 1 is a schematic diagram of an implementation environment of an image processing method provided in the embodiment of the present application. Referring to fig. 1, the implementation environment includes a terminal 101 and a server 102. The terminal 101 and the server 102 can be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
In some embodiments, the terminal 101 is a smartphone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, a smart voice interaction device, a smart appliance, a vehicle-mounted terminal, and the like, but is not limited thereto. The terminal 101 is installed and runs an application for image processing, by which it is possible to upload an image to be processed and display the processed image. Those skilled in the art will appreciate that the number of terminals described above may be greater or fewer. For example, the number of the terminals may be only one, or several tens or hundreds of the terminals, or more. The number of terminals and the type of the device are not limited in the embodiments of the present application.
In some embodiments, the server 102 is an independent physical server, or a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, web services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data and artificial intelligence platforms. The server 102 is configured to provide a background service for the application program for image processing. In some embodiments, the server 102 undertakes the primary computing work and the terminal 101 undertakes the secondary computing work; or the server 102 undertakes the secondary computing work and the terminal 101 undertakes the primary computing work; or the server 102 and the terminal 101 perform cooperative computing using a distributed computing architecture.
In some embodiments, the image processing method provided in the embodiments of the present application can also be executed by the terminal, and the terminal acquires the image classification model from the server and then implements the image processing scheme provided in the embodiments of the present application based on the image classification model.
Fig. 2 is a flowchart of an image processing method according to an embodiment of the present application, and as shown in fig. 2, the image processing method is described in the embodiment of the present application as an example executed by a server. The image processing method includes the following steps.
201. The server performs feature extraction on an input cell image through a feature extraction layer in an image classification model to obtain global features and a plurality of local features of the cell image, the image classification model is used for classifying the input image and outputting image categories and attribute information of the input image, the attribute information is used for representing the association relationship between the input image and the image categories, and the local features are used for indicating features of anchor point positions in the cell image.
In the embodiment of the application, the server can process the input cell image through the image classification model to determine the image category and attribute information to which the cell image belongs. The image classification model comprises a feature extraction layer, a feature fusion layer, an attribute determination layer and an output layer. The feature extraction layer is used for extracting features of the input cell image; the features extracted by the feature extraction layer comprise global features and a plurality of local features, and the local features are extracted from anchor point positions in the cell image through the feature extraction layer. The cell image includes a plurality of anchor points. For the roles of the feature fusion layer, the attribute determination layer and the output layer, see the following steps.
202. And the server fuses the global feature and the local features through a feature fusion layer in the image classification model to obtain a classification feature, wherein the classification feature is used for determining the target image category to which the cell image belongs.
In the embodiment of the present application, the local features are extracted per image category; that is, different image categories have different local features. Through the feature fusion layer in the image classification model, the global features and the local features can be fused according to the image categories to obtain the classification features.
203. And the server determines target attribute information of the cell image according to the distance between the global feature and a plurality of image attribute features of the target image category through an attribute determination layer in the image classification model, wherein the image attribute features are used for representing the features of the image attributes included in the target image category.
In an embodiment of the present application, the image classification model is capable of classifying the cell image into a plurality of preset image classes, each image class including a plurality of image attributes, and the image attributes have image attribute features. The distance between the global features and the image attribute features can reflect the similarity between the image attribute features and the global features, and the image attribute corresponding to the image attribute feature with the highest similarity is the image attribute of the cell image.
204. The server outputs the target image category and the target attribute information of the cell image through an output layer in the image classification model.
In the embodiment of the application, because the target attribute information represents the association relationship between the cell image input into the image classification model and the target image class output by the image classification model, the target attribute information can explain the reason for classifying the cell image into the target image class, so the image processing process is interpretable.
The embodiment of the application provides an image processing scheme, and since images of different image types have different attributes, if an image has an attribute in a certain image type, it indicates that the image may belong to the image type. The global features and the local features in the cell images are extracted and fused, so that the information acquired from the cell images is increased, and the cell images are classified through the classification features obtained through fusion to obtain the target image categories. After the classification, the target attribute information is determined through the distance between the global feature and a plurality of image attribute features of the target image category to represent the association relationship between the cell image and the target image category, so that the target attribute information can explain the reason for classifying the cell image into the target image category, and the image processing process has interpretability.
Fig. 2 illustrates a main flow of an image processing scheme provided in an embodiment of the present application, and the image processing scheme is further described below based on an application scenario. Fig. 3 is a flowchart of another image processing method provided according to an embodiment of the present application, and as shown in fig. 3, the image processing method is described as being executed by a server in the embodiment of the present application. The image processing method includes the following steps.
301. The server extracts the features of the sample cell image through a feature extraction layer in the image classification model to obtain a sample global feature and a plurality of sample local features of the sample cell image, wherein the sample local features are used for indicating the features of the anchor point positions in the sample cell image. The image classification model is used for classifying an input image and outputting an image category and attribute information of the input image, wherein the attribute information is used for representing the association relationship between the input image and the image category.
In this embodiment, the server is the server 102 shown in fig. 1, and the server is capable of training an image classification model based on sample cell images and then performing image processing based on the trained image classification model. The structure of the image classification model comprises a feature extraction layer, a feature fusion layer, an attribute determination layer and an output layer. The feature extraction layer is used for extracting features of the input image, the feature fusion layer is used for fusing the features extracted by the feature extraction layer, the attribute determination layer is used for determining attribute information according to the features output by the feature fusion layer, and the output layer is used for outputting the classification result and the attribute information. The feature extraction layer is constructed based on a detection model, such as RetinaNet or Faster R-CNN, which is not limited in the embodiments of the present application. The anchor point position is used to indicate a region in the image where feature extraction is performed.
For example, in the case of RetinaNet, the feature extraction layer includes a backbone network for extracting features, a sub-network for category classification and a sub-network for target location regression, the two sub-networks together forming the detectors. The backbone network takes ResNet50 as its basic structure; on this basis, a feature pyramid is used for multi-scale feature fusion, and the feature extraction capability of the convolutional network is enhanced through top-down paths and lateral connections. The feature extraction layer further comprises a global feature extractor for extracting the sample global features. Referring to fig. 4, fig. 4 is a schematic structural diagram of an image classification model according to an embodiment of the present application. After the sample cell image is input, it is processed by the residual network and the feature pyramid; the last layer of the feature pyramid is used as the input of the global feature extractor, and the global feature extractor outputs the sample global features. For any layer of the feature pyramid, the output of that layer is taken as the input of a detector, and the detector outputs a plurality of sample local features of a plurality of sample image categories corresponding to the pyramid scale of that layer. Because the feature pyramid has five layers, the five detectors respectively output a plurality of sample local features corresponding to the five pyramid scales.
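The following PyTorch sketch mirrors the structure just described: pyramid levels feeding per-level detectors, with the last pyramid level also feeding a global feature extractor. It is a minimal illustration under assumed sizes (256 pyramid channels, 9 anchors, 5 classes, 5 levels); the actual networks, channel counts and layer names are not specified by this text.

```python
import torch
import torch.nn as nn

class DetectorHead(nn.Module):
    """Per-pyramid-level detector: a classification branch and a box-regression branch."""
    def __init__(self, channels=256, num_anchors=9, num_classes=5):
        super().__init__()
        self.cls = nn.Conv2d(channels, num_anchors * num_classes, 3, padding=1)
        self.reg = nn.Conv2d(channels, num_anchors * 4, 3, padding=1)

    def forward(self, x):
        return self.cls(x), self.reg(x)   # per-class "local feature" maps and box offsets

class GlobalFeatureExtractor(nn.Module):
    """Takes the last pyramid level and produces the sample global feature map."""
    def __init__(self, channels=256):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return self.conv(x)

# Stand-in pyramid outputs (a ResNet-50 + FPN backbone would produce these in practice)
pyramid = [torch.randn(1, 256, s, s) for s in (64, 32, 16, 8, 4)]
heads = nn.ModuleList(DetectorHead() for _ in pyramid)
local_features = [head(level)[0] for head, level in zip(heads, pyramid)]  # one set per level
W = GlobalFeatureExtractor()(pyramid[-1])                                 # sample global features
```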
The sample cell image is sample data in a sample data set, and the sample data set comprises a plurality of cell images marked with image categories.
For example, the sample data set is represented as D = {(X, Y)}, where X represents the set of labeled cell slice images, the cell slice images having a size of 1280 × 720; Y represents the sample image class set, which contains the location information of the target cells in the cell slice image, denoted (x_1, y_1, x_2, y_2), and the category information c, where (x_1, y_1) and (x_2, y_2) represent the coordinates of the upper-left vertex and the lower-right vertex of the circumscribed rectangle (detection frame, i.e. labeling frame) of the target cell.
302. And the server fuses the sample global features and the plurality of sample local features through a feature fusion layer in the image classification model to obtain a plurality of sample classification features, wherein the sample classification features are used for determining the probability that the sample cell images belong to each sample image category.
In the embodiment of the application, because the attention area concerned by the detection network in the feature extraction layer is different from the attention area concerned by the global feature extractor, the global features of the sample and the local features of the plurality of samples can be fused through the feature fusion layer in the image classification model, so that the integrity of the extracted features is enhanced.
In some embodiments, the server can convert the plurality of sample local features into weights of different sample image classes through the feature fusion layer, and then weight the sample global features through the weights to realize the fusion of the sample global features and the plurality of sample local features. Correspondingly, the server first processes the sample local features to obtain sample attention information of a plurality of sample image categories, and then weights the sample global features respectively according to the sample attention information of the sample image categories to obtain a plurality of sample classification features. The sample attention information is used to represent the weight of the corresponding sample image class. By converting the sample local features into weights and fusing the sample global features with the plurality of sample local features through these weights, the integrity of the extracted features can be enhanced.
In some embodiments, the server is capable of fusing the sample global features with the plurality of sample local features of different sample image classes, respectively, by sample image class. For any sample image category, the server can acquire a plurality of sample pyramid features belonging to the sample image category through the plurality of sample local features, and then determine sample attention information of the sample image category based on the plurality of sample pyramid features. Wherein the sample pyramid features are used to indicate a plurality of sample local features extracted from corresponding sample pyramid images, the sample pyramid images being derived based on the sample cell images. By processing the plurality of sample local features according to the sample image categories, the weight of each sample image category can be determined.
In some embodiments, for any sample image category, since the pyramid scales of different pyramid layers in the feature pyramid are different, for each pyramid layer, the server can determine a plurality of sample local features output by the pyramid layer to determine the sample attention information of the sample image category. Correspondingly, for any sample pyramid feature, the server can generate the attention feature corresponding to the sample pyramid feature through the maximum value of each element in the plurality of sample local features included in the sample pyramid feature. Then, the server generates the attention feature of the sample image category according to the maximum value of each element in the attention features corresponding to the sample pyramid features. And finally, the server normalizes the attention characteristics of the sample image category to obtain the sample attention information of the sample image category. By processing the sample pyramid features in the manner, the sample attention information for each sample image category can be accurately obtained, and therefore fusion of the sample global features and the local features of the plurality of samples can be achieved based on the sample attention information of each sample image category.
For example, referring to fig. 4, the features output by the feature pyramid are input into the corresponding detectors, and the detectors perform classification and regression to obtain a plurality of sample pyramid features belonging to the sample image category c, expressed as \{A_c^l\}_{l=1}^{n}, where l denotes the pyramid level of the feature pyramid and n the number of levels. Each sample pyramid feature A_c^l has shape k × h × w, where k represents the number of predefined anchor points and also the number of sample local features in the sample pyramid feature, h represents the number of rows of a sample local feature, and w represents the number of columns. It should be noted that the numbers of rows and columns of the sample pyramid features output by different pyramid layers differ; in the embodiment of the present application, interpolation is adopted so that the numbers of rows and columns of the sample pyramid features output by each pyramid layer are kept consistent, namely h rows and w columns. A_c^1 denotes the sample pyramid feature belonging to sample image class c output by level 1 of the feature pyramid, and it includes k sample local features. Note that, since a sample local feature may also be referred to as a sample local feature map, h represents the height of the sample local feature map and w represents the width of the sample local feature map.
Referring to fig. 4, taking A_c^1 as an example, the feature fusion layer applies the element-wise maximum function max(·) over the k sample local features to generate the attention feature corresponding to this sample pyramid feature: M_c^1 = max_k(A_c^1). Here A_c^1 includes k sample local features, each of which is an h × w matrix. The element in the first row and first column of M_c^1 is the maximum value among the k elements in the first row and first column of the k sample local features. Likewise, the element in the first row and second column of M_c^1 is the maximum of the k elements in the first row and second column of the k sample local features, and so on for the remaining positions.
Further, the feature fusion layer applies the maximum function max(·) over the pyramid levels to generate the attention feature corresponding to the sample image category c: M_c = max(M_c^1, …, M_c^n). Here M_c is also an h × w matrix; its element in the first row and first column is the maximum value among the elements in the first row and first column of the n sample pyramid attention features M_c^1, …, M_c^n, and so on for the other positions.
Then, the feature fusion layer normalizes the attention feature of the sample image class c based on the following formula (1) to obtain the sample attention information of the sample image class c.
\hat{M}_c = M_c / \sum_{i=1}^{h} \sum_{j=1}^{w} M_c(i, j)    (1)
Here, \hat{M}_c denotes the sample attention information of the sample image class c; since \hat{M}_c can be expressed in matrix form, it may also be called a sample attention matrix or a weight matrix. M_c denotes the attention feature corresponding to the sample image class c. It should be noted that the elements of \hat{M}_c sum to 1, because each element of M_c is divided by the sum of all its elements.
Finally, the feature fusion layer weights the sample global features by using the sample attention information of the sample image category c as weights based on the following formula (2), and obtains the sample classification feature of the sample image class c.
F_c = \sum_{i=1}^{h} \sum_{j=1}^{w} (\hat{M}_c \odot W)(i, j)    (2)
Here, F_c denotes the sample classification feature of the sample image class c; the double sum denotes summation over elements, with h the number of rows of elements and w the number of columns of elements; \hat{M}_c denotes the sample attention information of the sample image class c; \odot denotes multiplying the elements at corresponding positions of the two matrices; and W denotes the sample global features.
303. The server determines the distance between the sample classification features and prototype features respectively through an attribute determination layer in the image classification model, wherein the prototype features are used for representing the features of sample sub-categories included in the corresponding sample image category.
In an embodiment of the application, the server is capable of determining a distance between the sample classification feature and the prototype feature based on a distance metric function through the attribute determination layer. The distance metric function is shown in equation (3) below.
d(F_c, P_c) = norm(F_c)^T · norm(P_c) / \tau    (3)
Here, d(·, ·) denotes the distance metric function; F_c denotes the sample classification feature of the sample image class c; P_c denotes a prototype feature of the sample image class c; norm(·) denotes a normalization function; T denotes the matrix transpose; and \tau denotes the temperature factor, whose value is 0.1.
304. The server trains the image classification model based on the label image category to which the sample cell image belongs, the sample classification features and the distances between the sample classification features and prototype features.
In the embodiment of the application, when the server performs feature extraction through the feature extraction layer, there are a pyramid classification loss and a pyramid regression loss; when the probability that the sample cell image belongs to each sample image category is determined based on the plurality of sample classification features output by the feature fusion layer, there is an asymmetric loss, namely the model classification loss; and when the distances between the plurality of sample classification features and the plurality of prototype features are determined, there is a contrast loss for each sample image class.
In some embodiments, the server can determine these losses and train the image classification model based on them. Accordingly, the server can determine the probability that the sample cell image belongs to each sample image class based on the plurality of sample classification features, and then determine the asymmetric loss based on the label image class to which the sample cell image belongs and the probability that the sample cell image belongs to the label image class. The server can determine the pyramid classification loss and the pyramid regression loss based on the plurality of sample local features; and for any sample image class, the server can determine the contrast loss of the sample image class through the distance between the sample classification feature of the sample image class and the prototype feature of the sample image class. Finally, the server trains the image classification model through the asymmetric loss, the pyramid classification loss, the pyramid regression loss and the contrast losses of the plurality of sample image categories. The label image class of the sample cell image is the image class with which the sample cell image is annotated.
For example, the server determines the above-described asymmetry loss by the following formula (4).
L_asl = -(1 - p)^{\gamma_+} \log(p) for the label class, and L_asl = -(p_m)^{\gamma_-} \log(1 - p_m) with p_m = max(p - m, 0) otherwise    (4)
Here, L_asl denotes the asymmetric loss; p denotes the probability that the sample cell image belongs to the label image class; \gamma_+ denotes a hyper-parameter for positive samples; \gamma_- denotes a hyper-parameter for negative samples; p_m denotes the shifted probability; and m denotes a hyper-parameter for adjusting the ratio of positive and negative samples.
For example, referring to fig. 4, the server processes the classification features through the global classifier in fig. 4 to obtain the probability that the sample cell image belongs to the label image class.
The server determines the pyramid regression loss through the smooth L1 loss function and the pyramid classification loss through the focal loss, which is not limited in the embodiments of the present application.
The server determines the contrast loss for the sample image class c by equation (5) below.
L_con^c = -\log( \exp(d(F_c, P_c)) / \sum_{i=1}^{I} \exp(d(F_c, P_i)) )    (5)
Here, L_con^c denotes the contrast loss; d(·, ·) denotes the distance metric function; F_c denotes the sample classification feature of the sample image class c; P_c denotes the prototype feature of the sample image class c; P_i denotes the prototype feature of sample image class i; and I denotes the total number of sample image classes.
In some embodiments, there may be N different sample sub-categories within each sample image category that represent the image attributes of that sample image category, N being a positive integer. The distance between the classification feature of the sample cell image and the prototype feature represents the similarity to the sample image class. By relaxing the constraint, the sample classification feature only needs to be similar to one sample sub-category within the sample image category, rather than to all sample sub-categories within that category. Correspondingly, the server obtains, from the prototype features of the sample image class, the target sub-feature with the minimum distance to the sample classification feature of the sample image class, and then determines the contrast loss of the sample image class according to the distance between the target sub-feature and the sample classification feature.
For example, the server modifies the above equation (5) to the following equation (6), and determines the contrast loss of the sample image class c by the following equation (6).
L_con^c = -\log( \exp(d(F_c, P_c^*)) / \sum_{i=1}^{I} \sum_{j=1}^{N} \exp(d(F_c, P_i^j)) )    (6)
Here, L_con^c denotes the contrast loss; d(·, ·) denotes the distance metric function; F_c denotes the sample classification feature of the sample image class c; P_c^* denotes the target sub-feature that has the smallest distance to the sample classification feature of the sample image class c among the prototype features of the sample image class c; P_i^j denotes the prototype features of sample image class i, with P_c^j being the target sub-feature of the j-th sample sub-category in the prototype features of the sample image category c; I denotes the total number of sample image classes; and N denotes the number of sample sub-categories present within a sample image class.
In some embodiments, during the training process, the server can update the features of the sample subcategory in the prototype feature by equation (7) below.
P_c^* ← \alpha · P_c^* + (1 - \alpha) · F_c    (7)
Here, P_c^* denotes the target sub-feature that has the smallest distance to the sample classification feature of the sample image class c among the prototype features of the sample image class c; F_c denotes the sample classification feature of the sample image class c; and \alpha denotes a weight parameter. The server can optimize the contrast loss of the sample image class c through the updated P_c^*.
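A sketch of the update in formula (7), read here as a convex combination (a moving average) of the matched sub-prototype and the sample classification feature; the weight value is an assumption.

```python
def update_prototype(P_star, F_c, alpha=0.9):
    """Move the matched sub-prototype P_star toward the classification feature F_c."""
    return alpha * P_star + (1.0 - alpha) * F_c
```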
For example, referring to fig. 4, the attribute determination layer performs prototype learning according to the above equations (5) to (7) over the plurality of sample image categories, each of which includes a plurality of sample sub-categories.
It should be noted that the contrast loss may also be replaced by a metric loss function such as the triplet loss (Triplet loss) or the N-pair loss, which is not limited in the embodiments of the present application.
In some embodiments, the server is further capable of adding a small number of sample cell images with attribute information to the sample data set; such a sample cell image further includes attribute information indicating a sample sub-category in the label image category to which the sample cell image belongs. The server can use this attribute information to assist training. Accordingly, the server determines the attribute loss through the distance between a sample sub-category feature of a first sample cell image and a sample sub-category feature of a second sample cell image, where the first sample cell image and the second sample cell image belong to a sample sub-category in the same sample image category. The image classification model is then trained through the asymmetric loss, the pyramid classification loss, the pyramid regression loss, the contrast losses of the plurality of sample image categories, and the attribute loss. The server determines the above attribute loss through the following formula (8).
L_attr = -\sum_{a=1}^{A} \log( \exp(d(u_a, v_a)) / \sum_{a'=1}^{A} \exp(d(u_a, v_{a'})) )    (8)
Here, L_attr denotes the attribute loss; d(·, ·) denotes the distance metric function; u_a and v_a denote the sample sub-category features of two different sample cell images that have the same sample sub-category a; and A denotes the total number of sample sub-categories (attributes) of one sample image category. The attribute loss is used to cause different attributes to be assigned to different sample sub-categories.
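A hedged sketch of the attribute loss in formula (8): sub-category features of two sample cell images that share the same sub-category are treated as a positive pair, and pairs with different sub-categories as negatives. The contrastive form below is an assumption consistent with the description above.

```python
import numpy as np

def attribute_loss(feats_1, feats_2, tau=0.1):
    """feats_1, feats_2: arrays of shape (A, d), one sub-category feature per attribute from two images."""
    u = feats_1 / np.linalg.norm(feats_1, axis=1, keepdims=True)
    v = feats_2 / np.linalg.norm(feats_2, axis=1, keepdims=True)
    sims = (u @ v.T) / tau                                    # (A, A) pairwise similarities
    log_softmax = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))                     # same sub-category is the positive pair
```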
In summary, the server determines the total loss of training of the image classification model by the following equation (9).
L_total = L_asl + L_reg + L_cls + \lambda_1 · L_con + \lambda_2 · L_attr    (9)
Here, L_total denotes the total training loss; L_asl denotes the asymmetric loss; L_reg denotes the pyramid regression loss; L_cls denotes the pyramid classification loss; L_con denotes the contrast loss; L_attr denotes the attribute loss; and \lambda_1 and \lambda_2 are hyper-parameters that balance the loss terms, both taking the value 1.
305. The server obtains a plurality of cell images from the image to be processed.
In the embodiment of the application, the server can extract the foreground area of the image to be processed to obtain the foreground image, and then segment the foreground image to obtain a plurality of cell images.
For example, the image to be processed is a WSI; the server extracts the foreground region of the WSI by using a segmentation method such as Otsu, and then segments the foreground image in a grid manner to obtain a plurality of field-of-view-level cell images.
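A sketch of the preprocessing just described, using OpenCV's Otsu thresholding to find the foreground and a fixed grid to crop field-of-view images; the tile size, the minimum-foreground ratio, and the use of OpenCV are assumptions for illustration.

```python
import cv2

def split_into_cell_images(slide_bgr, tile=1280, min_foreground=0.05):
    gray = cv2.cvtColor(slide_bgr, cv2.COLOR_BGR2GRAY)
    # Otsu binarization; tissue is darker than the bright background, hence THRESH_BINARY_INV
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    tiles = []
    h, w = gray.shape
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            window = mask[y:y + tile, x:x + tile]
            if window.mean() / 255.0 >= min_foreground:       # keep tiles that contain foreground
                tiles.append(slide_bgr[y:y + tile, x:x + tile])
    return tiles
```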
306. The server performs feature extraction on the input cell image through a feature extraction layer in the image classification model to obtain a global feature and a plurality of local features of the cell image, wherein the local features are used for indicating the features of the anchor point positions in the cell image.
For this step, refer to step 301 above; it will not be described herein again.
307. And the server fuses the global feature and the local features through a feature fusion layer in the image classification model to obtain a classification feature, wherein the classification feature is used for determining the target image category to which the cell image belongs.
In the embodiment of the application, the server can process a plurality of local features to obtain attention information of a plurality of image types. Then, the global features are weighted according to the attention information of the image categories to obtain classification features, wherein the attention information is used for representing the weights of the corresponding image categories, and the classification features comprise the weighted features of the image categories.
In some embodiments, the server is able to determine attention information by image category. Correspondingly, for any image category, the server acquires a plurality of pyramid features of the image category and then determines attention information of the image category based on the plurality of pyramid features. The pyramid features are used to indicate a plurality of local features extracted from the corresponding pyramid images, which are obtained based on the cell images, see step 302 above, and are not described herein again.
In some embodiments, the server can obtain attention information for each image category from the pyramid features. For any pyramid feature, the server generates an attention feature corresponding to the pyramid feature through the maximum value of each element in a plurality of local features included in the pyramid feature. Then, the server generates the attention feature of the image category according to the maximum value of each element in the attention features corresponding to the pyramid features. And finally, the server normalizes the attention features of the image categories to obtain the attention information of the image categories. Referring to step 302, the detailed description is omitted here.
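A minimal sketch of this attention computation for the image categories, assuming each pyramid level's local features are stacked into a tensor of shape (num_anchors, num_classes), that the global feature is a (dim,) vector, and that softmax is used as the normalization; the shapes and the choice of softmax are assumptions for illustration only:

```python
import torch

def class_weighted_features(pyramid_locals, global_feat):
    """pyramid_locals: list of (num_anchors_i, num_classes) local-feature tensors,
    one tensor per pyramid level; global_feat: (dim,) global feature of the cell image.
    Returns (num_classes, dim) classification features: the global feature weighted
    by per-class attention."""
    # element-wise maximum over the local features within each pyramid level
    per_level = [loc.max(dim=0).values for loc in pyramid_locals]  # each (num_classes,)
    # element-wise maximum across pyramid levels -> attention feature per class
    attn_feat = torch.stack(per_level, dim=0).max(dim=0).values    # (num_classes,)
    attn = torch.softmax(attn_feat, dim=0)                         # normalize to weights
    return attn.unsqueeze(1) * global_feat.unsqueeze(0)            # (num_classes, dim)
```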
308. The server determines, based on the classification features, the target image category to which the cell image belongs.
In the embodiment of the application, the server can predict the probability that the cell image belongs to each image category through the classification features, and determine the image category with the highest probability as the target image category to which the cell image belongs.
309. And the server determines target attribute information of the cell image according to the distance between the global feature and a plurality of image attribute features of the target image category through an attribute determination layer in the image classification model, wherein the image attribute features are used for representing the features of the image attributes included in the target image category.
In the embodiment of the application, after determining the target image category, the server can obtain a plurality of image attribute features of the target image category, respectively determine the distances between the global feature and the plurality of image attribute features, and take the image attribute represented by the image attribute feature with the smallest distance as the target attribute information. The image attribute may also be referred to as a sub-category. Since the target attribute information represents the association between the cell image and the target image category, it explains the reason why the cell image is classified into the target image category.
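A sketch of this step, assuming the image attribute features of the target image category are stored as a (num_attributes, dim) tensor with a parallel list of attribute names, and that the distance is Euclidean; these storage details and the distance choice are assumptions:

```python
import torch

def explain_classification(global_feat, attr_feats, attr_names):
    """global_feat: (dim,) global feature of the cell image; attr_feats:
    (num_attributes, dim) image attribute features of the predicted target image
    category; attr_names: the corresponding attribute labels (sub-categories).
    Returns the attribute whose feature is closest to the global feature, i.e. the
    target attribute information that explains the classification."""
    dists = torch.norm(attr_feats - global_feat.unsqueeze(0), dim=1)
    return attr_names[int(torch.argmin(dists))]
```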
310. The server outputs the target image category and the target attribute information of the cell image through an output layer in the image classification model.
It should be noted that the server can also determine, according to the target image category, whether the cell image is an image including normal cells; if so, the cell image is output as normal or negative, and otherwise it is output as abnormal or positive. This is not limited in the embodiment of the present application.
For example, the image to be processed is a WSI obtained based on a Pap smear or a liquid-based thin-layer cytology technique and is used for cervical cancer screening. The cell image input to the image classification model is an image of a cell region in the WSI. The image classification model can output the target image category of the cell image. If the cell image is an image of squamous cells, the target image category may be any of Atypical Squamous Cells of Undetermined Significance (ASC-US), Low-grade Squamous Intraepithelial Lesion (LSIL), Atypical Squamous Cells - cannot exclude HSIL (ASC-H), High-grade Squamous Intraepithelial Lesion (HSIL), or Squamous Cell Carcinoma (SCC). If the cell image is an image of glandular cells, the target image category may be Atypical Glandular Cells (AGC) or Adenocarcinoma (ADC). Taking the target image category of the cell image being LSIL as an example, since the judgment criteria of LSIL include, but are not limited to, the nuclear-to-cytoplasmic ratio, cell hollowing, deep nuclear staining, and the like, the target attribute information may be any one of an increased nuclear-to-cytoplasmic ratio, cell hollowing, deep nuclear staining, and the like.
The embodiment of the present application provides an image processing scheme. Because images of different image categories have different attributes, if an image has an attribute of an image category, this indicates that the image may belong to that image category. The global features and the local features in the cell image are extracted and fused, which increases the information acquired from the cell image, and the cell image is classified through the classification features obtained by the fusion to obtain the target image category. After the classification, the target attribute information is determined through the distances between the global feature and the plurality of image attribute features of the target image category, so as to represent the association relationship between the cell image and the target image category. The target attribute information can therefore explain the reason for classifying the cell image into the target image category, making the image processing process interpretable.
Fig. 5 is a block diagram of an image processing apparatus provided according to an embodiment of the present application. The apparatus is for performing the steps in the above-described image processing method, and referring to fig. 5, the apparatus comprises: a feature extraction module 501, a feature fusion module 502, an attribute determination module 503 and an output module 504;
a feature extraction module 501, configured to perform feature extraction on an input cell image through a feature extraction layer in an image classification model to obtain a global feature and a plurality of local features of the cell image, where the image classification model is configured to classify the input image and output an image category and attribute information of the input image, the attribute information is used to represent an association relationship between the input image and the image category, and the local features are used to indicate features of anchor point positions in the cell image;
a feature fusion module 502, configured to fuse the global feature and the multiple local features through a feature fusion layer in the image classification model to obtain a classification feature, where the classification feature is used to determine a target image category to which the cell image belongs;
an attribute determining module 503, configured to determine, by an attribute determining layer in the image classification model, target attribute information of the cell image according to distances between the global feature and a plurality of image attribute features of the target image category, where the image attribute features are used to represent features of image attributes included in the target image category;
an output module 504, configured to output the target image category and the target attribute information of the cell image through an output layer in the image classification model.
In some embodiments, fig. 6 is a block diagram of another image processing apparatus provided in an embodiment of the present application, and referring to fig. 6, the feature fusion module 502 includes:
a feature processing unit 601, configured to process the plurality of local features to obtain attention information of a plurality of image categories, where the attention information is used to indicate weights of corresponding image categories;
a feature fusion unit 602, configured to weight the global feature according to the attention information of the multiple image categories to obtain the classification feature, where the classification feature includes weighted features of the multiple image categories.
In some embodiments, the feature processing unit 601 is configured to, for any image category, obtain a plurality of pyramid features of the image category, where the pyramid features are used to indicate a plurality of local features extracted from a corresponding pyramid image, and the pyramid image is obtained based on the cell image; based on the plurality of pyramid features, attention information for the image category is determined.
In some embodiments, the feature processing unit 601 is configured to, for any pyramid feature, generate an attention feature corresponding to the pyramid feature by using a maximum value of each element in a plurality of local features included in the pyramid feature; generating the attention feature of the image category according to the maximum value of each element in the attention feature corresponding to the pyramid features; and normalizing the attention feature of the image category to obtain the attention information of the image category.
In some embodiments, referring to fig. 6, the apparatus further comprises:
the image processing module 505 is configured to extract a foreground region of the image to be processed to obtain a foreground image; and segmenting the foreground image to obtain a plurality of cell images.
In some embodiments, referring to fig. 6, the apparatus further comprises:
a training module 506, configured to perform feature extraction on the sample cell image through a feature extraction layer in the image classification model to obtain a sample global feature and a plurality of sample local features of the sample cell image, where the sample local features are used to indicate features of anchor point positions in the sample cell image; fusing the global sample characteristics and the local sample characteristics through a characteristic fusion layer in the image classification model to obtain a plurality of sample classification characteristics, wherein the sample classification characteristics are used for determining the probability that the sample cell images belong to each sample image category; respectively determining distances between the sample classification features and prototype features through an attribute determination layer in the image classification model, wherein the prototype features are used for representing features of sample sub-categories included in the corresponding sample image category; training the image classification model based on the label image class to which the sample cell image belongs, the sample local features, the sample classification features and the distances between the sample classification features and prototype features.
In some embodiments, the training module 506 is configured to determine a probability that the sample cell image belongs to each sample image class based on a plurality of sample classification features; determining an asymmetric loss based on the label image class to which the sample cell image belongs and the probability that the sample cell image belongs to the label image class; determining pyramid classification loss and pyramid regression loss based on the plurality of sample local features; for any sample image category, determining the contrast loss of the sample image category according to the distance between the sample classification feature of the sample image category and the prototype feature of the sample image category; and training the image classification model through the asymmetric loss, the pyramid classification loss, the pyramid regression loss and the contrast loss of a plurality of sample image categories.
In some embodiments, the training module 506 is configured to obtain a target sub-feature with a minimum distance from the sample classification feature of the sample image class from the prototype feature of the sample image class; and determining the contrast loss of the sample image class according to the distance between the target sub-feature and the sample classification feature.
In some embodiments, the sample cell image further includes attribute information indicating a sample subcategory in a category of label images to which the sample cell image belongs;
the training module 506, configured to determine an attribute loss according to a distance between a sample subcategory feature of a first sample cell image and a sample subcategory feature of a second sample cell image, the first sample cell image and the second sample cell image belonging to the same sample subcategory in the same sample image category; the training of the image classification model through the asymmetric loss, the pyramid classification loss, the pyramid regression loss, and the contrast losses of the plurality of sample image categories includes: training the image classification model through the asymmetric loss, the pyramid classification loss, the pyramid regression loss, the contrast losses of the plurality of sample image categories, and the attribute loss.
The embodiment of the application provides an image processing scheme, and since images of different image types have different attributes, if an image has an attribute in a certain image type, it indicates that the image may belong to the image type. The global features and the local features in the cell images are extracted and fused, so that the information acquired from the cell images is increased, and the cell images are classified through the classification features obtained through fusion to obtain the target image categories. After the classification, target attribute information is determined through the distance between a plurality of image attribute features of the target image category and the global feature so as to represent the association relationship between the cell image and the target image category, so that the target attribute information can explain the reason for classifying the cell image into the target image category, and the image processing process has interpretability.
It should be noted that: in the image processing apparatus provided in the above embodiment, only the division of the functional modules is illustrated when performing image processing, and in practical applications, the functions may be distributed by different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the image processing apparatus and the image processing method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments in detail and are not described herein again.
In this embodiment of the present application, the computer device can be configured as a terminal or a server. When the computer device is configured as a terminal, the terminal can be used as the execution subject to implement the technical solution provided in the embodiment of the present application; when the computer device is configured as a server, the server can be used as the execution subject to implement the technical solution provided in the embodiment of the present application; alternatively, the technical solution provided in the present application can be implemented through interaction between the terminal and the server. This is not limited in the embodiment of the present application.
When the computer device is configured as a terminal, fig. 7 is a block diagram of a terminal 700 according to an embodiment of the present application. The terminal 700 may be a portable mobile terminal such as: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. Terminal 700 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, and so on.
In general, terminal 700 includes: a processor 701 and a memory 702.
The processor 701 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 701 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 701 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 701 may be integrated with a GPU (Graphics Processing Unit) which is responsible for rendering and drawing the content required to be displayed by the display screen. In some embodiments, the processor 701 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 702 may include one or more computer-readable storage media, which may be non-transitory. Memory 702 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 702 is used to store at least one computer program for execution by the processor 701 to implement the image processing method provided by the method embodiments herein.
In some embodiments, the terminal 700 may further optionally include: a peripheral interface 703 and at least one peripheral. The processor 701, the memory 702, and the peripheral interface 703 may be connected by buses or signal lines. Various peripheral devices may be connected to peripheral interface 703 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 704, a display screen 705, a camera assembly 706, an audio circuit 707, and a power supply 708.
The peripheral interface 703 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 701 and the memory 702. In some embodiments, processor 701, memory 702, and peripheral interface 703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 701, the memory 702, and the peripheral interface 703 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The Radio Frequency circuit 704 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 704 communicates with a communication network and other communication devices via electromagnetic signals. The rf circuit 704 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. In some embodiments, the radio frequency circuitry 704 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuitry 704 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, various generations of mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 704 may also include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 705 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 705 is a touch display screen, the display screen 705 also has the ability to capture touch signals on or over the surface of the display screen 705. The touch signal may be input to the processor 701 as a control signal for processing. At this point, the display 705 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display 705 may be one, disposed on a front panel of the terminal 700; in other embodiments, the display 705 can be at least two, respectively disposed on different surfaces of the terminal 700 or in a foldable design; in other embodiments, the display 705 may be a flexible display disposed on a curved surface or a folded surface of the terminal 700. Even more, the display 705 may be arranged in a non-rectangular irregular pattern, i.e. a shaped screen. The Display 705 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or the like.
The camera assembly 706 is used to capture images or video. In some embodiments, camera assembly 706 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, the main camera and the wide-angle camera are fused to realize panoramic shooting and a VR (Virtual Reality) shooting function or other fusion shooting functions. In some embodiments, camera assembly 706 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuitry 707 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 701 for processing or inputting the electric signals to the radio frequency circuit 704 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 700. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 701 or the radio frequency circuit 704 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 707 may also include a headphone jack.
The power supply 708 is used to power the various components in the terminal 700. The power source 708 may be alternating current, direct current, disposable batteries, or rechargeable batteries. When the power source 708 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
Those skilled in the art will appreciate that the configuration shown in fig. 7 is not limiting of terminal 700 and may include more or fewer components than shown, or some components may be combined, or a different arrangement of components may be used.
When the computer device is configured as a server, fig. 8 is a schematic structural diagram of a server according to an embodiment of the present application, where the server 800 may generate a relatively large difference due to different configurations or performances, and may include one or more processors (CPUs) 801 and one or more memories 802, where the memories 802 store at least one computer program, and the at least one computer program is loaded and executed by the processors 801 to implement the image Processing method provided by each method embodiment. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface, so as to perform input/output, and the server may also include other components for implementing the functions of the device, which are not described herein again.
The embodiment of the present application further provides a computer-readable storage medium, where at least one piece of computer program is stored in the computer-readable storage medium, and the at least one piece of computer program is loaded and executed by a processor of a computer device to implement the operations performed by the computer device in the image processing method according to the foregoing embodiment. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In some embodiments, a computer program according to embodiments of the present application may be deployed to be executed on one computer apparatus or on multiple computer apparatuses at one site, or on multiple computer apparatuses distributed at multiple sites and interconnected by a communication network, and the multiple computer apparatuses distributed at the multiple sites and interconnected by the communication network may constitute a block chain system.
Embodiments of the present application also provide a computer program product or a computer program comprising computer program code stored in a computer readable storage medium. The processor of the computer device reads the computer program code from the computer-readable storage medium, and the processor executes the computer program code, so that the computer device executes the image processing method provided in the above-described various alternative implementations.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (13)

1. An image processing method, characterized in that the method comprises:
performing feature extraction on an input cell image through a feature extraction layer in an image classification model to obtain global features and a plurality of local features of the cell image, wherein the image classification model is used for classifying the input image and outputting image categories and attribute information of the input image, the attribute information is used for representing the association relationship between the input image and the image categories, and the local features are used for indicating the features of anchor point positions in the cell image;
fusing the global features and the local features through a feature fusion layer in the image classification model to obtain classification features, wherein the classification features are used for determining the target image category to which the cell image belongs;
determining, by an attribute determination layer in the image classification model, target attribute information of the cell image according to distances between the global feature and a plurality of image attribute features of the target image category, the image attribute features being used to represent features of image attributes included in the target image category;
outputting the target image class and the target attribute information of the cell image through an output layer in the image classification model.
2. The method of claim 1, wherein said fusing the global feature and the plurality of local features to obtain a classification feature comprises:
processing the local features to obtain attention information of a plurality of image categories, wherein the attention information is used for representing the weight of the corresponding image category;
and weighting the global features through the attention information of the image categories to obtain the classification features, wherein the classification features comprise the weighted features of the image categories.
3. The method of claim 2, wherein said processing the plurality of local features to obtain attention information for a plurality of image classes comprises:
for any image category, obtaining a plurality of pyramid features of the image category, the pyramid features being used to indicate a plurality of local features extracted from corresponding pyramid images, the pyramid images being obtained based on the cell images;
determining attention information for the image category based on the plurality of pyramidal features.
4. The method of claim 3, wherein the determining attention information for the image category based on the plurality of pyramid features comprises:
for any pyramid feature, generating an attention feature corresponding to the pyramid feature through the maximum value of each element in a plurality of local features included in the pyramid feature;
generating the attention feature of the image category according to the maximum value of each element in the attention feature corresponding to the pyramid features;
and normalizing the attention features of the image categories to obtain the attention information of the image categories.
5. The method according to any one of claims 1-4, further comprising:
extracting a foreground area of an image to be processed to obtain a foreground image;
and segmenting the foreground image to obtain a plurality of cell images.
6. The method according to any one of claims 1-4, further comprising:
performing feature extraction on a sample cell image through a feature extraction layer in the image classification model to obtain a sample global feature and a plurality of sample local features of the sample cell image, wherein the sample local features are used for indicating features of anchor point positions in the sample cell image;
fusing the sample global features and the plurality of sample local features through a feature fusion layer in the image classification model to obtain a plurality of sample classification features, wherein the sample classification features are used for determining the probability that the sample cell images belong to each sample image category;
respectively determining distances between the plurality of sample classification features and a plurality of prototype features through an attribute determination layer in the image classification model, wherein the prototype features are used for representing features of a plurality of sample subcategories included in the corresponding sample image category;
training the image classification model based on the label image class to which the sample cell image belongs, the plurality of sample local features, the plurality of sample classification features, and distances between the plurality of sample classification features and a plurality of prototype features.
7. The method of claim 6, wherein training the image classification model based on the label image class to which the sample cell image belongs, the plurality of sample local features, the plurality of sample classification features, and distances between the plurality of sample classification features and a plurality of prototype features comprises:
determining a probability that the sample cell image belongs to each sample image category based on a plurality of sample classification features;
determining an asymmetric loss based on a label image class to which the sample cell image belongs and a probability that the sample cell image belongs to the label image class;
determining pyramid classification loss and pyramid regression loss based on the plurality of sample local features;
for any sample image class, determining a contrast loss of the sample image class by a distance between a sample classification feature of the sample image class and a prototype feature of the sample image class;
and training the image classification model through the asymmetric loss, the pyramid classification loss, the pyramid regression loss and the contrast loss of a plurality of sample image categories.
8. The method of claim 7, wherein determining the loss of contrast for the sample image class by a distance between a sample classification feature of the sample image class and a prototype feature of the sample image class comprises:
acquiring a target sub-feature with the minimum distance to the sample classification feature of the sample image category from the prototype feature of the sample image category;
determining a contrast loss for the sample image class by a distance between the target sub-feature and the sample classification feature.
9. The method according to claim 7, wherein the sample cell image further includes attribute information indicating a sample sub-category in a label image category to which the sample cell image belongs;
the method further comprises the following steps:
determining a property loss by a distance between a sample subcategory feature of a first sample cell image and a sample subcategory feature of a second sample cell image, the first sample cell image and the second sample cell image belonging to a sample subcategory in a same sample image category;
the training the image classification model through the asymmetric loss, the pyramid classification loss, the pyramid regression loss, and the contrast loss of the plurality of sample image classes includes:
training the image classification model through the asymmetric loss, the pyramid classification loss, the pyramid regression loss, the contrast loss of the plurality of sample image categories, and the attribute loss.
10. An image processing apparatus, characterized in that the apparatus comprises:
the cell image processing device comprises a feature extraction module, a cell image processing module and a cell image processing module, wherein the feature extraction module is used for performing feature extraction on an input cell image through a feature extraction layer in an image classification model to obtain a global feature and a plurality of local features of the cell image, the image classification model is used for classifying the input image to output an image category and attribute information of the input image, the attribute information is used for representing an incidence relation between the input image and the image category, and the local features are used for indicating features of anchor point positions in the cell image;
the feature fusion module is used for fusing the global features and the local features through a feature fusion layer in the image classification model to obtain classification features, and the classification features are used for determining the target image category to which the cell image belongs;
an attribute determining module, configured to determine, by an attribute determining layer in the image classification model, target attribute information of the cell image according to distances between the global feature and a plurality of image attribute features of the target image category, where the image attribute features are used to represent features of image attributes included in the target image category;
an output module, configured to output the target image category and the target attribute information of the cell image through an output layer in the image classification model.
11. A computer device, characterized in that the computer device comprises a processor and a memory for storing at least one piece of computer program, which is loaded by the processor and executes the image processing method of any of claims 1 to 9.
12. A computer-readable storage medium for storing at least one piece of a computer program for executing the image processing method according to any one of claims 1 to 9.
13. A computer program product comprising a computer program, characterized in that the computer program realizes the image processing method according to any of claims 1 to 9 when executed by a processor.
CN202210526004.8A 2022-05-16 2022-05-16 Image processing method, image processing device, computer equipment and storage medium Active CN114627470B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210526004.8A CN114627470B (en) 2022-05-16 2022-05-16 Image processing method, image processing device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210526004.8A CN114627470B (en) 2022-05-16 2022-05-16 Image processing method, image processing device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114627470A true CN114627470A (en) 2022-06-14
CN114627470B CN114627470B (en) 2022-08-05

Family

ID=81907167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210526004.8A Active CN114627470B (en) 2022-05-16 2022-05-16 Image processing method, image processing device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114627470B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105046205A (en) * 2015-06-24 2015-11-11 西安理工大学 Method for identifying palm print on the basis of fusion of local feature and global feature
CN108229341A (en) * 2017-12-15 2018-06-29 北京市商汤科技开发有限公司 Sorting technique and device, electronic equipment, computer storage media, program
CN109190687A (en) * 2018-08-16 2019-01-11 新智数字科技有限公司 A kind of nerve network system and its method for identifying vehicle attribute
CN110991380A (en) * 2019-12-11 2020-04-10 腾讯科技(深圳)有限公司 Human body attribute identification method and device, electronic equipment and storage medium
CN111598164A (en) * 2020-05-15 2020-08-28 北京百度网讯科技有限公司 Method and device for identifying attribute of target object, electronic equipment and storage medium
WO2021127841A1 (en) * 2019-12-23 2021-07-01 深圳市欢太科技有限公司 Property identification method and apparatus, storage medium, and electronic device
US20210295114A1 (en) * 2018-12-07 2021-09-23 Huawei Technologies Co., Ltd. Method and apparatus for extracting structured data from image, and device

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105046205A (en) * 2015-06-24 2015-11-11 西安理工大学 Method for identifying palm print on the basis of fusion of local feature and global feature
CN108229341A (en) * 2017-12-15 2018-06-29 北京市商汤科技开发有限公司 Sorting technique and device, electronic equipment, computer storage media, program
CN109190687A (en) * 2018-08-16 2019-01-11 新智数字科技有限公司 A kind of nerve network system and its method for identifying vehicle attribute
US20210295114A1 (en) * 2018-12-07 2021-09-23 Huawei Technologies Co., Ltd. Method and apparatus for extracting structured data from image, and device
CN110991380A (en) * 2019-12-11 2020-04-10 腾讯科技(深圳)有限公司 Human body attribute identification method and device, electronic equipment and storage medium
WO2021127841A1 (en) * 2019-12-23 2021-07-01 深圳市欢太科技有限公司 Property identification method and apparatus, storage medium, and electronic device
CN114424258A (en) * 2019-12-23 2022-04-29 深圳市欢太科技有限公司 Attribute identification method and device, storage medium and electronic equipment
CN111598164A (en) * 2020-05-15 2020-08-28 北京百度网讯科技有限公司 Method and device for identifying attribute of target object, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI, DAXIANG et al.: "Chinese Painting Image Classification Algorithm Fusing Global and Local Features", Modern Computer *

Also Published As

Publication number Publication date
CN114627470B (en) 2022-08-05

Similar Documents

Publication Publication Date Title
CN111091132B (en) Image recognition method and device based on artificial intelligence, computer equipment and medium
CN111914812B (en) Image processing model training method, device, equipment and storage medium
CN109189950B (en) Multimedia resource classification method and device, computer equipment and storage medium
CN111739035B (en) Image processing method, device and equipment based on artificial intelligence and storage medium
CN110555839A (en) Defect detection and identification method and device, computer equipment and storage medium
CN112052186B (en) Target detection method, device, equipment and storage medium
CN111476783B (en) Image processing method, device and equipment based on artificial intelligence and storage medium
CN113515942A (en) Text processing method and device, computer equipment and storage medium
CN111597922B (en) Cell image recognition method, system, device, equipment and medium
CN114332530A (en) Image classification method and device, computer equipment and storage medium
CN108764051B (en) Image processing method and device and mobile terminal
CN112749728A (en) Student model training method and device, computer equipment and storage medium
CN111209377B (en) Text processing method, device, equipment and medium based on deep learning
CN111930964B (en) Content processing method, device, equipment and storage medium
CN114722937B (en) Abnormal data detection method and device, electronic equipment and storage medium
CN114511864B (en) Text information extraction method, target model acquisition method, device and equipment
CN113724189A (en) Image processing method, device, equipment and storage medium
CN114283299A (en) Image clustering method and device, computer equipment and storage medium
CN114677350A (en) Connection point extraction method and device, computer equipment and storage medium
CN113570510A (en) Image processing method, device, equipment and storage medium
CN113516665A (en) Training method of image segmentation model, image segmentation method, device and equipment
CN112818979A (en) Text recognition method, device, equipment and storage medium
CN114627470B (en) Image processing method, image processing device, computer equipment and storage medium
CN113743186B (en) Medical image processing method, device, equipment and storage medium
CN115861874A (en) Training method, device and equipment of video annotation model and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40072258

Country of ref document: HK