CN111209948A - Image processing method and device and electronic equipment - Google Patents

Image processing method and device and electronic equipment

Info

Publication number
CN111209948A
CN111209948A (application number CN201911426049.2A)
Authority
CN
China
Prior art keywords
image
region
super
feature
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911426049.2A
Other languages
Chinese (zh)
Inventor
王扬斌
张鹿鸣
王泽鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Fubo Technology Co Ltd
Original Assignee
Hangzhou Fubo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Fubo Technology Co Ltd filed Critical Hangzhou Fubo Technology Co Ltd
Priority to CN201911426049.2A priority Critical patent/CN111209948A/en
Publication of CN111209948A publication Critical patent/CN111209948A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image processing method and device and electronic equipment. An image processing method comprising: dividing an image to be detected into a plurality of super pixel areas; acquiring semantic labels corresponding to the semantic features according to the semantic features of the super-pixel region; embedding the semantic tags into the super-pixel region to generate a saliency region; extracting depth features of the saliency areas and generating an image kernel of the image to be detected; and processing the image kernel by using a vector classifier, and carrying out scene classification on the image to be detected.

Description

Image processing method and device and electronic equipment
Technical Field
The present application relates to the field of image processing, and in particular, to an image processing method and apparatus, and an electronic device.
Background
Image scene classification has important applications in computer vision and intelligent systems, such as image understanding and automatic driving. The technique aims to automatically classify images into different categories based on key information such as objects, regions and context. Existing deep-learning-based methods have a black-box training stage, which does not accord with human visual perception of image scenes. In the training stage, existing methods also require a large number of region- or pixel-level semantic labels, which poses a great challenge for manual labeling.
Disclosure of Invention
The embodiment of the application aims to provide an image processing method and device and an electronic device.
In a first aspect, an embodiment provides an image processing method, including: dividing an image to be detected into a plurality of super pixel areas; acquiring semantic labels corresponding to the semantic features according to the semantic features of the super-pixel region; embedding the semantic tags into the super-pixel region to generate a saliency region; extracting depth features of the saliency areas and generating an image kernel of the image to be detected; and processing the image kernel by using a vector classifier, and carrying out scene classification on the image to be detected.
In an optional embodiment, after the image to be measured is segmented into a plurality of super pixel regions, the method further includes: and removing the super pixel regions with the size smaller than a preset value or the super pixel fraction lower than a threshold value.
In an alternative embodiment, embedding semantic tags into the superpixel region, generating a saliency region, comprises: embedding the semantic tags into the super-pixel region by utilizing a manifold learning algorithm; acquiring a base matrix and a sparse matrix from an original matrix of the super-pixel region according to the semantic label; correspondingly acquiring a saliency region from the super-pixel region according to the base matrix; wherein the base matrix represents a feature matrix with semantic labels, and the sparse matrix represents a feature matrix without labels.
In an optional embodiment, after embedding the semantic tag into the super-pixel region and generating the saliency region, the method further includes: calculating a significance score of the significance region according to the sparse coding norm of the significance region; and sequencing the salient regions according to the salient scores to generate a generalized sequence pattern set.
In an optional embodiment, extracting depth features of a salient region and generating an image kernel of an image to be detected includes: acquiring depth features of a salient region corresponding to the generalized sequence pattern set according to a neural network architecture; acquiring a feature vector of the depth feature; and acquiring an image kernel from the salient region according to the Euclidean distance between the feature vectors.
In an optional embodiment, processing an image kernel by using a vector classifier, and performing scene classification on an image to be detected includes: training a multi-class support vector machine classifier based on the image kernel; and utilizing a support vector machine classifier to correspond the image to be detected to different scene categories according to the feature vector of the image to be detected.
In a second aspect, an embodiment provides an image processing apparatus, including: the image segmentation module is used for segmenting an image to be detected into a plurality of super pixel areas; the label acquisition module is used for acquiring semantic labels corresponding to the semantic features according to the semantic features of the super pixel region; the tag embedding module is used for embedding the semantic tags into the super-pixel region to generate a saliency region; the feature extraction module is used for extracting the depth features of the salient region and generating an image kernel of the image to be detected; and the scene classification module is used for processing the image kernel by using the vector classifier and carrying out scene classification on the image to be detected.
In an alternative embodiment, the tag embedding module is configured to: embedding the semantic tags into the super-pixel region by utilizing a manifold learning algorithm; acquiring a base matrix and a sparse matrix from an original matrix of the super-pixel region according to the semantic label; correspondingly acquiring a saliency region from the super-pixel region according to the base matrix; wherein the base matrix represents a feature matrix with semantic labels, and the sparse matrix represents a feature matrix without labels.
In an alternative embodiment, the feature extraction module is configured to: acquiring depth features of a salient region corresponding to the generalized sequence pattern set according to a neural network architecture; acquiring a feature vector of the depth feature; and acquiring an image kernel from the salient region according to the Euclidean distance between the feature vectors.
In an alternative embodiment, the scene classification module is configured to: training a multi-class support vector machine classifier based on the image kernel; and utilizing a support vector machine classifier to correspond the image to be detected to different scene categories according to the feature vector of the image to be detected.
In a third aspect, an embodiment provides an electronic device, including: a memory for storing a computer program; a processor for performing the method of any one of the preceding embodiments.
The beneficial effects brought by the technical solutions provided in the present application are:
1. The weakly supervised image scene classification method improves the accuracy of image scene classification by incorporating human visual perception.
2. By using the manifold learning algorithm, the embodiments of the application only need image-level semantic labels in the training stage, which greatly reduces the amount of manual labeling.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required in the embodiments are briefly described below. It should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure;
fig. 2 is a schematic view of an interactive scene provided in an embodiment of the present application;
fig. 3 is a flowchart of an image processing method according to an embodiment of the present application;
FIG. 4 is a flowchart of another image processing method provided in the embodiments of the present application;
fig. 5 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
Icon: the system comprises an electronic device 1, a bus 10, a processor 11, a memory 12, a user terminal 100, a server 200, an image segmentation module 501, a label acquisition module 502, a label embedding module 503, a feature extraction module 504 and a scene classification module 505.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
As shown in fig. 1, the present embodiment provides an electronic apparatus 1 including: at least one processor 11 and a memory 12, one processor being exemplified in fig. 1. The processor 11 and the memory 12 are connected by a bus 10, and the memory 12 stores instructions executable by the processor 11 and the instructions are executed by the processor 11.
In an embodiment, the electronic device 1 may obtain raw image data stored in the memory, process the raw image data into an image kernel according to semantic features, and perform scene classification on the raw image data according to the image kernel according to a vector classifier.
Fig. 2 is a schematic view of an application scenario of the image processing method according to this embodiment. As shown in fig. 2, the application scenario may include the user terminal 100, which may be a smartphone or a tablet computer with a photographing function. The user terminal 100 may execute the image processing method provided by the present application to perform scene classification according to the captured image.
According to the requirement, the application scenario may further include a server 200, and the server 200 may be a server, a server cluster, or a cloud computing center. The server 200 may receive the image uploaded by the user terminal 100, execute the image processing method provided by the present application, and perform scene classification according to the captured image.
Please refer to fig. 3, which shows an image processing method provided in this embodiment; the method can be executed by the electronic device 1 shown in fig. 1 and used in the interaction scenario shown in fig. 2. The method comprises the following steps:
step 301: and dividing the image to be measured into a plurality of super pixel areas.
In this step, the image to be measured may be image data stored in the memory or image data collected by the user terminal. In the field of computer vision, image segmentation refers to the process of subdividing a digital image into a plurality of image sub-regions (sets of pixels), also called superpixels. A superpixel region is a small region formed by a series of pixels that are adjacent in position and similar in color, brightness, texture and other characteristics. Most of these small regions retain effective information for further image segmentation and generally do not destroy the boundary information of objects in the image.
In one embodiment, the image may be segmented into superpixels by the SLIC algorithm, using three segmentation parameters (0.5A, 0.2A and 0.1A, where A denotes the smaller of the image width and height) to obtain a series of superpixel regions. SLIC is a superpixel segmentation algorithm: it clusters similar pixels together with K-means clustering and restricts the K-means search range to 2S, where S is determined by the expected superpixel size. The search range is therefore greatly reduced and the computational efficiency improved.
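A minimal sketch of this multi-scale SLIC segmentation using scikit-image is given below; the way the three scale parameters are converted into a superpixel count is an assumption for illustration, not the patent's exact procedure.

```python
# Sketch only: derive a superpixel count from the scale parameters 0.5A, 0.2A, 0.1A,
# where A is the smaller of the image width and height (assumed interpretation).
import numpy as np
from skimage import io
from skimage.segmentation import slic

def multiscale_superpixels(image_path):
    image = io.imread(image_path)
    h, w = image.shape[:2]
    a = min(h, w)                          # A: the smaller of width and height
    labels_per_scale = []
    for scale in (0.5, 0.2, 0.1):
        region_size = scale * a            # target superpixel diameter (assumed)
        n_segments = max(1, int((h * w) / (region_size ** 2)))
        labels = slic(image, n_segments=n_segments, compactness=10, start_label=0)
        labels_per_scale.append(labels)
    return labels_per_scale                # one label map per scale

# labels = multiscale_superpixels("scene.jpg")
```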
In one embodiment, bottom-level features of each image region are extracted, comprising 9-dimensional color-moment features (color moments) and a 128-dimensional gradient histogram feature. Meanwhile, a linear discriminant analysis algorithm is used to learn a linear mapping matrix that maps the 137-dimensional features of a large number of candidate regions into two categories, well-segmented regions and poorly segmented regions, and the poorly segmented regions are removed to reduce their influence on the final classification result.
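The following Python sketch illustrates one way to build the 137-dimensional region descriptor (9 color-moment values plus a 128-bin gradient-orientation histogram) and feed it to a linear discriminant analysis model; the exact moment definitions, histogram binning and training data are assumptions rather than the patent's specification.

```python
# Hedged illustration: 9-dim color moments + 128-bin gradient histogram = 137-dim feature,
# then LDA to separate well-segmented from poorly segmented regions.
import numpy as np
from scipy.stats import skew
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def color_moments(region_pixels):
    """region_pixels: (n, 3) RGB values of one superpixel -> 9-dim vector."""
    moments = []
    for c in range(3):
        ch = region_pixels[:, c].astype(float)
        moments += [ch.mean(), ch.std(), skew(ch)]   # mean, std, skewness per channel
    return np.array(moments)

def gradient_histogram(gray_region, bins=128):
    """128-bin histogram of gradient orientations inside the region (assumed binning)."""
    gy, gx = np.gradient(gray_region.astype(float))
    angles = np.arctan2(gy, gx).ravel()
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
    return hist / (hist.sum() + 1e-8)

# X: (n_regions, 137) stacked descriptors, y: 1 = good segmentation, 0 = poor (hypothetical data)
# lda = LinearDiscriminantAnalysis().fit(X, y)
# keep = lda.predict(X_new) == 1          # discard poorly segmented regions
```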
Step 302: and acquiring semantic labels corresponding to the semantic features according to the semantic features of the super-pixel region.
In this step, the semantic features of an image can be divided into a visual layer, an object layer and a concept layer. The visual layer is the commonly understood bottom layer, namely color, texture, shape and so on; these features are called bottom-layer feature semantics. The object layer, i.e. the middle layer, usually contains attribute features, that is, the state of a certain object at a certain time. The concept layer is the high layer: what the image represents, closest to human understanding. For example, for an image containing sand, blue sky and seawater, the visual layer is the division into blocks, the object layer is the sand, blue sky and seawater, and the concept layer is the beach, which is the semantics the image expresses.
In one embodiment, the label for an image may have: buildings, pedestrians, cars, sky, etc.
Step 303: and embedding the semantic label into the super-pixel area to generate a saliency area.
In this step, a person is usually interested in only part of an image, and that part conveys the main content of the image. The salient region is the region of an image that most arouses the user's interest and best expresses the image content.
In one embodiment, a non-negative matrix decomposition is used to extract a salient region from an image to be detected, an original feature matrix is obtained, and the original feature matrix is decomposed into a base matrix and a sparse matrix.
Step 304: and extracting the depth features of the salient region to generate an image kernel of the image to be detected.
In this step, the salient regions are ordered by their saliency scores to form a generalized sequence pattern, simulating human visual perception, and a spatially preserving non-negative matrix factorization is constructed.
In one embodiment, the saliency score is a semantic/visual saliency parameter measured by the sparse coding norm of each region.
Step 305: and processing the image kernel by using a vector classifier, and carrying out scene classification on the image to be detected.
In this step, a multi-class vector classifier is trained based on the obtained image kernel features; in one embodiment, the vector classifier may be a Support Vector Machine (SVM).
In one embodiment, scene classification is performed based on image kernel features of a test image according to a trained binary SVM classifier.
Please refer to fig. 4, which shows another image processing method provided in this embodiment; the method can be executed by the electronic device 1 shown in fig. 1 and used in the interaction scenario shown in fig. 2. The method comprises the following steps:
step 401: and dividing the image to be measured into a plurality of super pixel areas. Please refer to the above embodiment for the description of step 301.
Step 402: and removing the super pixel regions with the size smaller than a preset value or the super pixel fraction lower than a threshold value.
In this step, superpixel regions with segmentation damage need to be removed after image segmentation. In an embodiment, superpixel regions smaller than 0.01·w·l pixels (w and l being the image width and height) and superpixel regions whose segmentation score is below a threshold are removed, while superpixel regions whose score is greater than or equal to the threshold are retained. In an embodiment, 177 labeled image classes in the ImageNet dataset are used to extract color moments and histogram features forming 137-dimensional features, and a mapping matrix is then obtained by training an LDA (Linear Discriminant Analysis) model. The SLIC segmentation score is measured using this mapping matrix. For example, the segmentation score interval is set to [0, 1], where 1 denotes good segmentation and 0 denotes poor segmentation, and the segmentation threshold is set to 0.4.
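A minimal sketch of this filtering rule follows, assuming the label map and per-superpixel segmentation scores are already available (the score itself would come from the LDA mapping described above):

```python
# Sketch: drop superpixels smaller than 0.01*w*l pixels or scoring below 0.4 (values from the text).
import numpy as np

def filter_superpixels(labels, scores, threshold=0.4):
    """labels: (h, w) superpixel label map; scores: dict {label: score in [0, 1]}."""
    h, w = labels.shape
    min_size = 0.01 * h * w
    kept = []
    for sp in np.unique(labels):
        size = int((labels == sp).sum())
        if size >= min_size and scores.get(sp, 0.0) >= threshold:
            kept.append(sp)
    return kept
```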
Step 403: and acquiring semantic labels corresponding to the semantic features according to the semantic features of the super-pixel region. Please refer to the above embodiments for the description of step 302.
Step 404: and embedding the semantic labels into the super-pixel region by using a manifold learning algorithm.
In this step, the image-level semantic labels are embedded into the image regions by manifold learning, using a weakly supervised learning algorithm. The embedding is obtained by minimizing an objective over Y = [y_1, y_2, …, y_N] ∈ R^(d×N), where y_i is a d-dimensional vector representing the i-th embedded region and y_j likewise represents the j-th embedded region, l_s(i, j) denotes the similarity of image regions i and j, and l_d(i, j) denotes their dissimilarity. Semantic labels are transferred to specific regions of the image by the manifold learning method, and the objective expresses that the proximity of regions i and j in the feature space should be consistent with the semantic labels of the image.
Image-level labels are passed into the various regions of the image. For example, if the image-level labels are sky, grass and house, the objective expresses the transfer of these three labels into three regions of the image. In practical applications, 32 object concepts are selected in advance, for example: sky, street lights, pedestrians, vehicles, trees, animals, grass, rivers, mountains, and so forth. If an object in a new test image does not belong to these 32 classes, it is not important for classification and can be ignored.
In one embodiment, the objective is minimized over Y = [y_1, y_2, …, y_N], where each y_i is a d-dimensional vector representing an image region. If 32 object concepts are specified, then d = 32. In the 32-dimensional vector, each element is 0 or 1: 0 indicates that the object is absent from the image and 1 indicates that it is present. Semantic information of the image can thus be represented by the 32-dimensional vector.
l_s(i, j) represents the similarity of image regions i and j, and l_d(i, j) represents their dissimilarity. If two regions are similar, they can be merged into one larger region block, i.e. they share the same image-level label.
In one embodiment, d in the d-dimensional vector is a variable set by the user; different values of d affect the dimension of the feature vector and thereby the final result. The purpose of this process is to automatically pass image-level labels into specific pixel regions of the image.
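The Python sketch below illustrates a manifold-style embedding of this kind under simplifying assumptions: the exact objective, the plain gradient-descent optimizer and the trade-off weight lam are not the patent's formulation, only an illustration of pulling similar regions together and pushing dissimilar regions apart.

```python
# Simplified sketch (assumed objective): regions with high similarity l_s are attracted,
# regions with high dissimilarity l_d are repelled, in a d-dimensional embedding Y.
import numpy as np

def embed_labels(Ls, Ld, d=32, lam=0.1, lr=0.01, iters=200, seed=0):
    """Ls, Ld: (N, N) similarity / dissimilarity matrices between regions."""
    rng = np.random.default_rng(seed)
    n = Ls.shape[0]
    Y = rng.normal(size=(n, d))               # one d-dim embedding per region
    for _ in range(iters):
        grad = np.zeros_like(Y)
        for i in range(n):
            diff = Y[i] - Y                    # (N, d) pairwise differences
            w = (Ls[i] - lam * Ld[i])[:, None]  # attract vs. repel weights
            grad[i] = 2.0 * (w * diff).sum(axis=0)
        Y -= lr * grad
    return Y                                   # rows: embedded region representations
```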
Step 405: and acquiring a base matrix and a sparse matrix from the original matrix of the super-pixel region according to the semantic label.
In this step, let X = [X_l, X_u] denote the feature matrix of the N scene images, where X_l represents the feature matrix with labels and X_u represents the feature matrix without labels. Non-negative matrix factorization is used to decompose X into the product of a basis matrix and a sparse matrix, X ≈ PQ, with a sparsity regularization term γ‖I ⊙ Q‖, where P ∈ R^((137+d)×t) denotes the basis matrix, Q ∈ R^(t×N) denotes the sparse matrix, ‖I ⊙ Q‖ is the regularization term, I is the indication matrix, whose zero blocks (of size MK × U, with M the dimension of each subspace) form a block-diagonal structure, ⊙ denotes element-wise multiplication, and γ is the regularization parameter.
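As a rough illustration of this factorization step, the sketch below uses scikit-learn's NMF with an L1 penalty as a stand-in for the indication-matrix regularizer γ‖I ⊙ Q‖; the penalty form, the number of components t and the solver settings are assumptions, not the patent's exact formulation.

```python
# Hedged sketch: X (non-negative region features) is factorized as X ~ P @ Q,
# with an L1 penalty on Q standing in for the patent's indication-matrix regularizer.
import numpy as np
from sklearn.decomposition import NMF

def factorize_regions(X, t=32, gamma=0.1):
    """X: (137 + d, N) non-negative feature matrix -> (P, Q) with X ~ P @ Q."""
    model = NMF(n_components=t, init="nndsvda", solver="mu",
                alpha_W=0.0, alpha_H=gamma, l1_ratio=1.0, max_iter=500)
    P = model.fit_transform(X)          # (137 + d, t) basis matrix
    Q = model.components_               # (t, N) sparse coefficient matrix
    return P, Q
```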
Step 406: and correspondingly acquiring a saliency area from the super-pixel area according to the base matrix.
In this step, the base matrix corresponds to the feature matrix with semantic labels, and the salient regions corresponding to the base matrix are obtained from the superpixel regions for subsequent calculation.
Step 407: and calculating the significance score of the significance region according to the sparse coding norm of the significance region.
In this step, the optimal sparse matrix Q* is used to calculate the saliency score of each scene region r_i. For non-negative matrix factorization, a global solution cannot be obtained because the objective is non-convex in the two matrices P and Q, so an iterative method is adopted to obtain the optimal matrices. The saliency score of region r_i is then calculated from the sparse coding norm of its corresponding column of Q*: since the optimal sparse matrix Q* represents the salient regions in the image, the saliency score corresponds to the degree of saliency of the corresponding salient region.
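A minimal sketch of this scoring is given below, assuming the L1 norm as the sparse coding norm (the text does not fix which norm is used):

```python
# Sketch: score each region r_i by the norm of its column in the optimal sparse matrix Q*.
import numpy as np

def saliency_scores(Q_star):
    """Q_star: (t, N) optimal sparse matrix -> (N,) saliency score per region."""
    return np.linalg.norm(Q_star, ord=1, axis=0)

def rank_regions(Q_star):
    """Return region indices sorted from most to least salient."""
    return np.argsort(-saliency_scores(Q_star))
```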
Step 408: and sequencing the salient regions according to the salient scores to generate a generalized sequence pattern set.
In this step, the saliency score of each scene region r_i is calculated from the optimal sparse matrix Q*, and the salient regions are sorted by these scores.
In one embodiment, the GSP algorithm can be divided into three phases, candidate set generation, candidate set counting and extension/classification, similar to the association rule (Apriori) algorithm. Compared with the AprioriAll algorithm used for association rules, the GSP algorithm counts fewer candidate sets and does not need to compute frequent sets in advance during data conversion. The Apriori algorithm is the first and most classical association rule mining algorithm. It finds the relations between itemsets in a database by an iterative, level-wise search to form rules; the process consists of join and prune steps, where the join is a matrix-like combination operation and pruning removes unnecessary intermediate results. An itemset in the algorithm is simply a set of items: a set containing K items is a K-itemset, and the frequency of an itemset is the number of transactions containing it. An itemset that meets the minimum support is called a frequent itemset.
In one embodiment, the join phase works as follows: if the sequence obtained by removing the first item of sequence pattern S1 is the same as the sequence obtained by removing the last item of sequence pattern S2, S1 and S2 can be joined, i.e. the last item of S2 is appended to S1.
Pruning: if any subsequence of a candidate sequence pattern is not itself a sequence pattern, the candidate cannot be a sequence pattern and is deleted from the candidate set.
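A simplified sketch of the GSP join and prune steps on flat item sequences is shown below; itemsets within sequence elements and time constraints, which full GSP supports, are omitted, and the contiguous-subsequence prune is a simplification.

```python
# Simplified GSP candidate generation: join length-k patterns, then prune candidates
# containing an infrequent length-k contiguous subsequence.
def gsp_join(patterns):
    """patterns: set of length-k tuples -> set of length-(k+1) candidate tuples."""
    candidates = set()
    for s1 in patterns:
        for s2 in patterns:
            # join if dropping s1's first item equals dropping s2's last item
            if s1[1:] == s2[:-1]:
                candidates.add(s1 + (s2[-1],))
    return candidates

def gsp_prune(candidates, frequent_k):
    """Keep only candidates whose length-k contiguous subsequences are all frequent."""
    kept = set()
    k = len(next(iter(frequent_k))) if frequent_k else 0
    for c in candidates:
        subs = [c[i:i + k] for i in range(len(c) - k + 1)]
        if all(s in frequent_k for s in subs):
            kept.add(c)
    return kept
```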
Step 409: and acquiring depth features of the salient regions corresponding to the generalized sequence pattern set according to the neural network architecture.
In this step, depth features are integrated using a statistical algorithm: a neural network is used to extract an A-dimensional depth feature from each region constituting the GSP, and a statistical algorithm is then used to fuse the region features. Specifically, let Θ = {θ_1, θ_2, …, θ_B}, where θ_b denotes the depth feature of the b-th region in a GSP and θ_b(m) denotes the m-th component of θ_b. Let F = {min, max, mean, median} denote the statistical methods, which integrate the region features in the GSP into an S-dimensional vector ω = W · [k_1(Θ); k_2(Θ); k_3(Θ); k_4(Θ)], where W ∈ R^(S×4A) is the aggregation parameter of the fully connected layer, the bracketed term indicates that the four A-dimensional statistic vectors are concatenated into one 4A-dimensional feature vector, and k_u denotes the u-th statistical method. For example, a GSP composed of three salient regions with depth features θ_1 = {3, 5, 2}, θ_2 = {2, 6, 1} and θ_3 = {1, 4, 3} is extracted from a scene image, where the feature dimension A is 3; using the statistical methods F, ω = {1, 2, 3, 3, 6, 5, 2, 4, 3, 2, 4, 3}.
Step 410: and acquiring a feature vector of the depth feature.
In this step, suppose the three superpixels are characterized by A-dimensional feature vectors stacked column-wise into a matrix. The statistical algorithm F = {min, max, mean, median} is applied row by row: first the minimum of each row is selected, giving 1, 3 and 2; then the maximum of each row is selected, giving 3, 7 and 4; the mean and the median of each row are obtained in the same way, finally yielding the internal aggregation {1, 3, 2, 3, 7, 4, 2, 5, 3, 2, 5, 3}. External aggregation is simply performed after internal aggregation by splicing the internally aggregated vectors together.
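The sketch below reproduces this internal aggregation; the 3×3 feature matrix is an assumed arrangement consistent with the row-wise minima 1, 3, 2 and maxima 3, 7, 4 quoted above, since the original feature values are given only as a figure.

```python
# Sketch: row-wise min/max/mean/median aggregation of per-region depth features.
import numpy as np

def aggregate_regions(features):
    """features: (A, B) matrix, one column per region -> (4A,) aggregated vector."""
    stats = [features.min(axis=1), features.max(axis=1),
             features.mean(axis=1), np.median(features, axis=1)]
    return np.concatenate(stats)

# Hypothetical matrix consistent with the figures in the text.
features = np.array([[1, 2, 3],
                     [3, 5, 7],
                     [2, 3, 4]], dtype=float)
print(aggregate_regions(features))
# -> [1. 3. 2. 3. 7. 4. 2. 5. 3. 2. 5. 3.], matching the internal aggregation above
```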
Step 411: and acquiring an image kernel from the salient region according to the Euclidean distance between the feature vectors.
In this step, the image kernel mechanism depends on the distance between scene images, and its calculation depends on the extracted GSP features. In particular, a given scene image is mapped through its GSP to an N-dimensional vector whose elements are computed from δ(·, ·), the Euclidean distance between two GSP feature vectors, where ω denotes the depth feature extracted from each GSP, N denotes the number of training scene images, and B(P*) denotes the number of salient regions in the GSP P*.
In one embodiment, P* denotes the GSP of a given scene image. The dimensionality of the features can be further reduced by the image kernel mechanism.
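A hedged sketch of this kernel construction is shown below: each image is mapped to an N-dimensional vector of similarities to the training images, computed from Euclidean distances between GSP feature vectors. The Gaussian transform of the distance is an assumption; the text only states that the elements derive from δ.

```python
# Sketch: build an N-dimensional kernel vector per image from Euclidean distances delta(., .).
import numpy as np

def image_kernel_vector(omega, train_omegas, sigma=1.0):
    """omega: (S,) GSP feature of one image; train_omegas: (N, S) training features."""
    dists = np.linalg.norm(train_omegas - omega[None, :], axis=1)   # delta(., .)
    return np.exp(-(dists ** 2) / (2.0 * sigma ** 2))               # (N,) kernel vector

def image_kernel_matrix(omegas, train_omegas, sigma=1.0):
    """Stack kernel vectors for a batch of images -> (n_images, N) kernel matrix."""
    return np.stack([image_kernel_vector(w, train_omegas, sigma) for w in omegas])
```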
Step 412: and training a multi-class support vector machine classifier based on the image kernel.
In this step, a multi-class SVM classifier is trained based on the obtained image kernel features. To distinguish the p-th and q-th scene classes, a binary SVM classifier is trained, where β_i ∈ R^N denotes the feature vector corresponding to the i-th training scene image and l_i is its class label, i.e. l_i = 1 for class p and l_i = -1 for class q; α denotes the hyperplane distinguishing the p-th class from the q-th class, and C > 0 balances the complexity of the image kernel mechanism against the number of misclassified scene images. N_pq denotes the number of training images in the p-th or q-th class. Assuming R scene classes in total, R(R-1)/2 SVM classifiers need to be trained.
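As an illustration, scikit-learn's SVC with a precomputed kernel trains one-vs-one binary classifiers internally, i.e. R(R-1)/2 classifiers for R classes, which mirrors the scheme described above; the value of C and the kernel matrices here are placeholders, not values from the patent.

```python
# Sketch: one-vs-one multi-class SVM on a precomputed image-kernel matrix.
from sklearn.svm import SVC

def train_scene_classifier(K_train, labels, C=1.0):
    """K_train: (N, N) image-kernel matrix between training images; labels: (N,) class ids."""
    clf = SVC(C=C, kernel="precomputed", decision_function_shape="ovo")
    clf.fit(K_train, labels)
    return clf

# K_test: (n_test, N) kernel values between test and training images (hypothetical)
# predicted_scenes = train_scene_classifier(K_train, y_train).predict(K_test)
```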
Step 413: and utilizing a support vector machine classifier to correspond the image to be detected to different scene categories according to the feature vector of the image to be detected.
In this step, the image to be detected is mapped to different scene categories by using a support vector machine classifier according to the feature vector of the image to be detected, and finally the processing process of image scene classification is completed.
Please refer to fig. 5, which is an image processing apparatus 500 according to an embodiment of the present disclosure, where the image processing apparatus 500 may be applied to the electronic device 1 shown in fig. 1 and may be applied to the interactive scene shown in fig. 2 to obtain original image data stored in a memory, process the original image data into an image kernel according to semantic features, and perform scene classification on the original image data according to the image kernel according to a vector classifier. The device includes: the system comprises an image segmentation module 501, a label acquisition module 502, a label embedding module 503, a feature extraction module 504 and a scene classification module 505.
An image segmentation module 501 is configured to segment the image to be measured into a plurality of super pixel regions. Please refer to the description of step 301 in the above embodiments.
A tag obtaining module 502, configured to obtain a semantic tag corresponding to the semantic feature according to the semantic feature of the super pixel region. Please refer to the description of step 302 in the above embodiment.
And a tag embedding module 503, configured to embed the semantic tag into the super-pixel region to generate a saliency region. Please refer to the description of step 303 in the above embodiments.
The feature extraction module 504 is configured to extract depth features of the significant region, and generate an image kernel of the image to be detected. Please refer to the description of step 304 in the above embodiment.
And a scene classification module 505, configured to process the image kernel by using a vector classifier, and perform scene classification on the image to be detected. Please refer to the description of step 305 in the above embodiment.
In one embodiment, the tag embedding module 503 is configured to: embedding the semantic tags into the super-pixel region by utilizing a manifold learning algorithm; acquiring a base matrix and a sparse matrix from an original matrix of the super-pixel region according to the semantic label; correspondingly acquiring a saliency region from the super-pixel region according to the base matrix; wherein the base matrix represents a feature matrix with semantic labels, and the sparse matrix represents a feature matrix without labels.
In one embodiment, the feature extraction module 504 is configured to: acquiring depth features of a salient region corresponding to the generalized sequence pattern set according to a neural network architecture; acquiring a feature vector of the depth feature; and acquiring an image kernel from the salient region according to the Euclidean distance between the feature vectors.
In one embodiment, the scene classification module 505 is configured to: training a multi-class support vector machine classifier based on the image kernel; and utilizing a support vector machine classifier to correspond the image to be detected to different scene categories according to the feature vector of the image to be detected.
In an embodiment, the image processing apparatus 500 may further include a data filtering module for removing a super-pixel region having a size smaller than a predetermined value or a super-pixel fraction lower than a threshold value.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
It should be noted that the functions, if implemented in the form of software functional modules and sold or used as independent products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. An image processing method, comprising:
dividing an image to be detected into a plurality of super pixel areas;
obtaining a semantic label corresponding to the semantic feature according to the semantic feature of the super pixel region;
embedding the semantic label into the super-pixel region to generate a saliency region;
extracting the depth features of the salient region and generating an image kernel of the image to be detected;
and processing the image kernel by using a vector classifier, and carrying out scene classification on the image to be detected.
2. The method of claim 1, further comprising, after said segmenting the image under test into a plurality of super-pixel regions:
and removing the super pixel regions with the size smaller than a preset value or the super pixel fraction lower than a threshold value.
3. The method of claim 1, wherein said embedding said semantic tags into said superpixel region, generating a saliency region, comprises:
embedding the semantic label into the super-pixel region by utilizing a manifold learning algorithm;
acquiring a base matrix and a sparse matrix from an original matrix of the super-pixel region according to the semantic label;
correspondingly acquiring the saliency areas from the super-pixel areas according to the base matrix;
wherein the base matrix represents a feature matrix with semantic tags and the sparse matrix represents a feature matrix without tags.
4. The method of claim 3, further comprising, after said embedding said semantic tag into said superpixel region, generating a saliency region:
calculating a significance score of the significance region according to the sparse coding norm of the significance region;
and sequencing the significance regions according to the significance scores to generate a generalized sequence pattern set.
5. The method according to claim 4, wherein the extracting the depth feature of the salient region and generating an image kernel of the image to be detected comprises:
acquiring depth features of the salient region corresponding to the generalized sequence pattern set according to a neural network architecture;
acquiring a feature vector of the depth feature;
and acquiring the image kernel from the salient region according to the Euclidean distance between the feature vectors.
6. The method of claim 5, wherein the processing the image kernel with the vector classifier to perform scene classification on the image under test comprises:
training a multi-class support vector machine classifier based on the image kernel;
and utilizing the support vector machine classifier to correspond the image to be detected to different scene categories according to the feature vector of the image to be detected.
7. An image processing apparatus characterized by comprising:
the image segmentation module is used for segmenting an image to be detected into a plurality of super pixel areas;
the label acquisition module is used for acquiring a semantic label corresponding to the semantic feature according to the semantic feature of the super pixel region;
the label embedding module is used for embedding the semantic label into the super-pixel region to generate a saliency region;
the feature extraction module is used for extracting the depth features of the salient region and generating an image kernel of the image to be detected;
and the scene classification module is used for processing the image kernel by using a vector classifier and carrying out scene classification on the image to be detected.
8. The apparatus of claim 7, wherein the tag embedding module is to:
embedding the semantic label into the super-pixel region by utilizing a manifold learning algorithm;
acquiring a base matrix and a sparse matrix from an original matrix of the super-pixel region according to the semantic label;
correspondingly acquiring the saliency areas from the super-pixel areas according to the base matrix;
wherein the base matrix represents a feature matrix with semantic tags and the sparse matrix represents a feature matrix without tags.
9. The apparatus of claim 7, wherein the feature extraction module is configured to:
acquiring depth features of the salient region corresponding to the generalized sequence pattern set according to a neural network architecture;
acquiring a feature vector of the depth feature;
and acquiring the image kernel from the salient region according to the Euclidean distance between the feature vectors.
10. An electronic device, comprising:
a memory for storing a computer program;
a processor for performing the method of any one of claims 1 to 6.
CN201911426049.2A 2019-12-31 2019-12-31 Image processing method and device and electronic equipment Withdrawn CN111209948A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911426049.2A CN111209948A (en) 2019-12-31 2019-12-31 Image processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911426049.2A CN111209948A (en) 2019-12-31 2019-12-31 Image processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN111209948A true CN111209948A (en) 2020-05-29

Family

ID=70789480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911426049.2A Withdrawn CN111209948A (en) 2019-12-31 2019-12-31 Image processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111209948A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113052247A (en) * 2021-03-31 2021-06-29 清华苏州环境创新研究院 Garbage classification method and garbage classifier based on multi-label image recognition


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20200529)