CN112348061A - Classification vector generation method and device, computer equipment and storage medium

Info

Publication number
CN112348061A
CN112348061A
Authority
CN
China
Prior art keywords
picture data
processed
classification
rotation angles
vector
Prior art date
Legal status
Pending
Application number
CN202011155908.1A
Other languages
Chinese (zh)
Inventor
姚广
Current Assignee
Shanghai Eye Control Technology Co Ltd
Original Assignee
Shanghai Eye Control Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Eye Control Technology Co Ltd filed Critical Shanghai Eye Control Technology Co Ltd
Priority to CN202011155908.1A
Publication of CN112348061A
Status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a classification vector generation method and device, a computer device and a storage medium. The method comprises the following steps: acquiring picture data to be processed, where the picture data to be processed comprises picture data at a plurality of rotation angles; inputting the picture data to be processed into a pre-trained classification network to extract feature vectors for the plurality of rotation angles corresponding to the picture data; extracting preset dimensions from each of the feature vectors of the plurality of rotation angles according to a preset rule; and combining the extracted preset dimensions of the feature vectors of the plurality of rotation angles to obtain a comprehensive classification vector, which is used to classify the picture data to be processed and obtain a corresponding classification label. By adopting the method, classification accuracy can be improved.

Description

Classification vector generation method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for generating a classification vector, a computer device, and a storage medium.
Background
Metric learning is a machine learning method commonly used in face recognition, first formalized by Eric Xing et al. at NIPS 2002. It learns a feature (embedding) space in which every data item is converted into a feature vector, such that the distance between feature vectors of similar samples is small and the distance between feature vectors of dissimilar samples is large, so that the data can be distinguished. Metric learning is also known as distance metric learning (DML) or similarity learning. It is now widely used in many fields such as image object detection, image classification, object tracking, face recognition and data classification.
In the conventional technology, metric learning falls into two categories: one based on supervised learning and the other based on unsupervised learning.
However, current metric learning models are typically trained only on samples at a uniform angle; when the angle of the data changes, the accuracy of the model obtained from the original training is low.
Disclosure of Invention
In view of the above, it is necessary to provide a classification vector generation method, apparatus, computer device, and storage medium capable of improving classification accuracy.
A method of classification vector generation, the method comprising:
acquiring picture data to be processed, wherein the picture data to be processed comprises the picture data to be processed under a plurality of rotation angles;
inputting the picture data to be processed into a classification network obtained by pre-training so as to extract the feature vectors of a plurality of rotation angles corresponding to the picture data to be processed;
respectively extracting preset dimensions in the feature vectors of the plurality of rotation angles according to a preset rule;
combining the preset dimensions of the extracted feature vectors of the plurality of rotation angles to obtain a comprehensive classification vector, wherein the comprehensive classification vector is used for classifying the picture data to be processed to obtain a corresponding classification label.
In one embodiment, the inputting the to-be-processed image data into a classification network obtained by pre-training to extract feature vectors of a plurality of rotation angles corresponding to the to-be-processed image data includes:
extracting an initial classification vector corresponding to the to-be-processed picture data under each rotation angle;
splitting the initial classification vector into a plurality of grouping vectors corresponding to the number of the rotation angles according to a preset grouping rule;
performing dimensionality reduction processing on the grouping vector;
and reversely combining the grouped vectors subjected to the dimensionality reduction processing according to the preset grouping rule to obtain the feature vectors of a plurality of rotation angles corresponding to the picture data to be processed.
In one embodiment, the performing the dimensionality reduction on the grouping vector includes:
and sequentially inputting the grouping vectors into a full-connection layer and a normalization layer so as to perform dimension reduction processing on the grouping vectors.
In one embodiment, before the inputting the to-be-processed image data into a classification network obtained through pre-training to extract the feature vectors of the plurality of rotation angles corresponding to the to-be-processed image data, the method further includes:
splitting the picture data to be processed to obtain the picture data to be processed under a plurality of rotation angles;
and merging the to-be-processed picture data under the plurality of rotation angles in the dimension of batch size.
In one embodiment, the splitting the to-be-processed picture data to obtain the to-be-processed picture data at a plurality of rotation angles includes:
and splitting the picture data to be processed according to the channel dimension to obtain the picture data to be processed under a plurality of rotation angles.
In one embodiment, the training mode of the classification network obtained by pre-training includes:
acquiring sample picture data, wherein the sample picture data comprises sample picture data at a plurality of rotation angles and corresponding sample labels;
inputting the sample picture data into an initial classification network to extract feature vectors of a plurality of rotation angles corresponding to the sample picture data;
respectively extracting preset dimensions in the feature vectors of the plurality of rotation angles corresponding to the sample picture data according to a preset rule;
respectively calculating an angle loss function corresponding to each rotation angle according to the sample label and preset dimensions in the feature vectors of the plurality of rotation angles corresponding to the sample picture data;
calculating according to the angle loss function to obtain a comprehensive loss function;
and carrying out gradient back transmission through the comprehensive loss function to train the initial classification network until the variation of the comprehensive loss function is smaller than a preset value, and finishing the training of the initial classification network.
In one embodiment, the calculating a comprehensive loss function according to the angle loss functions includes:
calculating the average value of the angle loss functions as the comprehensive loss function.
A method of picture classification, the method comprising:
generating a comprehensive classification vector by using a classification vector generation method in any one of the above methods;
matching the comprehensive classification vector with a standard vector in a database;
and when the comprehensive classification vector is successfully matched with the standard vector in the database, acquiring a classification label corresponding to the standard vector which is successfully matched as the classification label of the picture data to be processed.
An apparatus for classification vector generation, the apparatus comprising:
the data acquisition module is used for acquiring picture data to be processed, and the picture data to be processed comprises the picture data to be processed under a plurality of rotation angles;
the multi-angle feature generation module is used for inputting the picture data to be processed into a classification network obtained by pre-training so as to extract feature vectors of a plurality of rotation angles corresponding to the picture data to be processed;
the preset dimension extraction module is used for respectively extracting preset dimensions in the feature vectors of the plurality of rotation angles according to a preset rule;
and the classification vector generation module is used for merging preset dimensions of the extracted feature vectors of the plurality of rotation angles to obtain a comprehensive classification vector, and the comprehensive classification vector is used for classifying the image data to be processed to obtain a corresponding classification label.
A computer device comprising a memory storing a computer program and a processor which, when executing the computer program, implements the steps of the method of any one of the above.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any of the above.
According to the above classification vector generation method and device, computer device and storage medium, the feature vectors of the plurality of rotation angles corresponding to the picture data to be processed are obtained through the classification network, the feature vectors of the plurality of rotation angles are combined to obtain a comprehensive classification vector, and the picture data to be processed is classified by means of this comprehensive classification vector to obtain a classification label, so that classification accuracy can be improved.
Drawings
FIG. 1 is a flow diagram illustrating a method for generating classification vectors according to an embodiment;
FIG. 2 is a diagram illustrating the structure of a classification network in one embodiment;
FIG. 3 is a flow diagram of a training process for a classification network in one embodiment;
FIG. 4 is a diagram illustrating an iterative process in the training process of the classification network in the embodiment shown in FIG. 3;
FIG. 5 is a block diagram of feature vectors output by multiple network heads, in one embodiment;
FIG. 6 is a flow diagram of a method of picture classification in one embodiment;
FIG. 7 is a block diagram showing the structure of a classification vector generation apparatus according to an embodiment;
FIG. 8 is a block diagram showing the structure of an image classification apparatus according to an embodiment;
FIG. 9 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In an embodiment, as shown in fig. 1, a classification vector generation method is provided, and this embodiment is illustrated by applying the method to a terminal, and it is to be understood that the method may also be applied to a server, and may also be applied to a system including a terminal and a server, and is implemented by interaction between the terminal and the server. In this embodiment, the method includes the steps of:
s102: and acquiring to-be-processed picture data, wherein the to-be-processed picture data comprises to-be-processed picture data under a plurality of rotation angles.
Specifically, the picture data to be processed is the picture from which the comprehensive classification vector needs to be extracted, and it includes picture data at a plurality of rotation angles. The rotation angle refers to the angle by which the picture is rotated, such as 0, 90, 180 or 270 degrees; in other embodiments, other rotation angles may also be used. The picture data to be processed is formed jointly from the picture data at the plurality of rotation angles.
S104: and inputting the picture data to be processed into a classification network obtained by pre-training so as to extract the feature vectors of a plurality of rotation angles corresponding to the picture data to be processed.
Specifically, the classification network is trained in advance. As shown in fig. 2, it may include a feature extraction network, such as a BN-Inception network, for extracting initial classification features of the picture data to be processed, and a multi-network head for reducing the dimensionality of the extracted features, where each head of the multi-network head may include a fully connected layer and a normalization layer.
The feature extraction network processes the picture data to be processed to obtain normalized initial classification features, for example by optimizing the initial classification features so that their distribution over the classes is more stable; that is, the classification network can learn a feature vector for the picture data at each rotation angle, so that no matter how the picture is rotated, the classification label corresponding to the picture data can still be obtained accurately. The network heads reduce the dimensionality of the initial classification features, which reduces the amount of computation in subsequent classification. Referring to fig. 2, multiple network heads are used in this embodiment: the initial classification features output by the feature extraction network are split into as many split vectors as there are network heads, each split vector is input into its network head to obtain an output vector of the corresponding dimensionality, and the output vectors of all the heads are finally concatenated to obtain the feature vectors of the plurality of rotation angles corresponding to the picture data to be processed. Within each feature vector, a particular range of dimensions represents the feature of one rotation angle. For example, if a feature vector corresponds to a picture to be processed rotated by X degrees, the dimensions assigned to X degrees are taken as the final representation of that rotation angle. Concretely, for a picture at 0 degrees a preset range of dimensions of the feature vector is taken, for example the first 1/n of the dimensions, where n is the number of rotation angles; with 4 rotation angles and a 512-dimensional feature vector, the first 128 dimensions are the feature corresponding to the 0-degree picture and are used for the subsequent classification of the picture to be processed.
The terminal inputs the image data to be processed of a plurality of rotation angles into a classification network obtained through pre-training, so that a feature vector corresponding to the image data to be processed of each rotation angle can be obtained, wherein a specific dimension in the feature vector corresponding to the image data to be processed of each rotation angle is used for representing the feature of the rotation angle. Optionally, the classification network may process the to-be-processed picture data of a plurality of rotation angles in series or in parallel.
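For illustration only, the following is a minimal PyTorch sketch of such a classification network, assuming a backbone with a 2048-dimensional output (a ResNet-50 stands in for the BN-Inception network mentioned above) and one network head per rotation angle, each consisting of a fully connected layer followed by L2 normalization; all class and parameter names are hypothetical and do not come from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class MultiHeadClassificationNet(nn.Module):
    """Sketch: feature-extraction backbone + one network head per rotation angle."""

    def __init__(self, num_angles=4, backbone_dim=2048, head_dim=128):
        super().__init__()
        # Any backbone with a 2048-d global feature works here; ResNet-50 is a
        # stand-in for the BN-Inception network named in the description.
        resnet = torchvision.models.resnet50(weights=None)
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        self.num_angles = num_angles
        split_dim = backbone_dim // num_angles           # 2048 / 4 = 512
        # One head per rotation angle: fully connected layer 512 -> 128.
        self.heads = nn.ModuleList(
            nn.Linear(split_dim, head_dim) for _ in range(num_angles)
        )

    def forward(self, x):
        # x: (num_angles * n, 3, H, W), one 3-channel image per rotation angle
        feat = self.backbone(x).flatten(1)               # (num_angles * n, 2048)
        groups = feat.chunk(self.num_angles, dim=1)      # num_angles x (N, 512)
        # Each split vector passes its head, then a normalization layer (L2 assumed)
        reduced = [F.normalize(head(g), dim=1) for head, g in zip(self.heads, groups)]
        return torch.cat(reduced, dim=1)                 # (num_angles * n, 512)
```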
S106: and respectively extracting preset dimensions in the feature vectors of the plurality of rotation angles according to a preset rule.
Specifically, the preset rule is the rule for extracting the preset dimensions from the feature vector of each rotation angle. For example, if a feature vector corresponds to a picture to be processed rotated by X degrees, there are n rotation angles in total and X degrees is the m-th of them, then the preset dimensions may be dimensions dim/n × (m-1) to dim/n × m of the feature vector, where dim is the total dimensionality of the feature vector. For instance, assume the rotation angle is 0 degrees, there are 4 rotation angles, 0 degrees is the first of them, and the feature vector has 512 dimensions; then the preset dimensions for this rotation angle are 512/4 × 0 to 512/4 × 1, i.e. dimensions 0 to 128.
S108: and combining the preset dimensions of the extracted feature vectors of the plurality of rotation angles to obtain a comprehensive classification vector, wherein the comprehensive classification vector is used for classifying the picture data to be processed to obtain a corresponding classification label.
Specifically, after the terminal obtains the feature vectors of the plurality of rotation angles, the preset dimensions of these feature vectors are merged, for example by concatenating them in ascending order of rotation angle, to obtain the comprehensive classification vector. For example, dimensions 0-128 are extracted as the dimensions corresponding to 0 degrees, dimensions 128-256 as those corresponding to 90 degrees, dimensions 256-384 as those corresponding to 180 degrees, and dimensions 384-512 as those corresponding to 270 degrees; the dimensions corresponding to the rotation angles 0, 90, 180 and 270 degrees of the same picture to be processed are then combined to obtain a new 512-dimensional comprehensive classification vector.
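The following sketch illustrates steps S106 and S108 with the dimensions used in the example above (4 rotation angles, 512-dimensional feature vectors, 128 preset dimensions per angle); the function name and the use of PyTorch are assumptions made for illustration.

```python
import torch

def build_comprehensive_vector(angle_features, num_angles=4):
    """angle_features: tensor of shape (num_angles, dim), one feature vector
    per rotation angle of the same picture, ordered 0, 90, 180, 270 degrees."""
    dim = angle_features.shape[1]                # e.g. 512
    step = dim // num_angles                     # e.g. 128
    slices = []
    for m in range(num_angles):
        # Preset dimensions for the m-th angle: dim/n * m .. dim/n * (m + 1)
        slices.append(angle_features[m, m * step:(m + 1) * step])
    # Concatenate in ascending order of rotation angle -> comprehensive vector
    return torch.cat(slices)                     # shape (dim,), e.g. 512

# Example: four 512-d feature vectors -> one 512-d comprehensive classification vector
feats = torch.randn(4, 512)
comprehensive = build_comprehensive_vector(feats)
print(comprehensive.shape)                       # torch.Size([512])
```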
Therefore, the subsequent terminal can classify the picture data to be processed according to the comprehensive classification vector to obtain the corresponding classification label.
According to the classification vector generation method, the feature vectors of the plurality of rotation angles corresponding to the image data to be processed are obtained through the classification network, the feature vectors of the plurality of rotation angles are combined to obtain the comprehensive classification vector, the image data to be processed is classified through the comprehensive feature vector to obtain the classification label, and the classification accuracy can be improved.
In one embodiment, inputting the image data to be processed into a classification network obtained by pre-training to extract feature vectors of a plurality of rotation angles corresponding to the image data to be processed includes: extracting an initial classification vector corresponding to the to-be-processed picture data at each rotation angle; splitting the initial classification vector into a plurality of grouping vectors corresponding to the number of the rotation angles according to a preset grouping rule; carrying out dimensionality reduction on the grouping vectors; and reversely combining the grouped vectors subjected to the dimensionality reduction processing according to a preset grouping rule to obtain the feature vectors of a plurality of rotation angles corresponding to the picture data to be processed.
Specifically, as shown in fig. 2, the terminal inputs the picture data to be processed into the classification network. The classification network first uses the feature extraction network to extract the initial classification vector corresponding to each rotation angle, for example a 2048-dimensional vector, and then obtains the feature vector corresponding to each rotation angle through a dimension-reduction process, for example reducing it to 512 dimensions. For this, the initial classification vector of each rotation angle is input into the multi-network head: it is first split, according to a preset grouping rule, into a number of split vectors equal to the number of rotation angles, for example by dividing the dimensions of the initial classification vector equally among the rotation angles to obtain 4 vectors of 512 dimensions; each split vector then passes through its fully connected layer to obtain a dimension-reduced vector, for example 4 vectors of 128 dimensions, and through the normalization layer to obtain the corresponding normalized vectors; finally the normalized vectors are merged through a connection operation, for example cat, into the feature vector, i.e. a 512-dimensional feature vector.
In practical applications, the picture data to be processed has the form (a, b, c, d), where a represents the number of rotation angles, b represents the number of channels (for example, the picture at each rotation angle is represented by the three RGB channels), and c and d represent the resolution of the picture. Assuming there are 4 rotation angles, for example 0, 90, 180 and 270 degrees, each picture to be processed can be represented as (4, 3, c, d). Inputting it into the classification network yields a (4, 2048) initial classification vector, where 4 is the number of rotation angles and 2048 is the dimensionality before reduction. The initial classification vector is then dimension-reduced to obtain the feature vector for each rotation angle, i.e. (4, 512), where 512 is the dimensionality after reduction. In this dimension-reduction process, each initial classification vector is first split into 4 vectors of 512 dimensions, each is passed through its corresponding fully connected layer and normalization layer to obtain the dimension-reduced normalized vectors, for example 4 vectors of 128 dimensions, and these are finally merged through the connection layer, for example cat, into the 512-dimensional feature vector.
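As a shape walk-through of the dimension-reduction process described in this paragraph (split a 2048-dimensional initial classification vector into 4 groups of 512, pass each group through a fully connected layer and a normalization layer, and concatenate the results into 512 dimensions), assuming L2 normalization and randomly initialized layers for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# (rotation angles, initial classification vector): 4 angles, 2048 dimensions each
initial = torch.randn(4, 2048)
groups = initial.chunk(4, dim=1)                        # 4 split vectors of shape (4, 512)
heads = [nn.Linear(512, 128) for _ in range(4)]         # one fully connected layer per group
reduced = [F.normalize(h(g), dim=1) for h, g in zip(heads, groups)]   # 4 x (4, 128), normalized
features = torch.cat(reduced, dim=1)                    # (4, 512) feature vectors
print(features.shape)                                   # torch.Size([4, 512])
```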
Optionally, performing dimensionality reduction on the initial classification vector to obtain a feature vector corresponding to each rotation angle, including: and sequentially inputting the initial classification vectors into the full-connection layer and the normalization layer so as to perform dimensionality reduction on the initial classification vectors to obtain the feature vectors corresponding to each rotation angle.
Specifically, referring to fig. 2, the terminal inputs the initial classification vector into the fully connected layer to obtain a 512-dimensional dimension-reduced vector, and then inputs the dimension-reduced vector into the normalization layer to obtain the feature vector. It should be noted that a corresponding feature vector is obtained through the classification network for each rotation angle; after the feature vectors are obtained, they are combined to obtain the comprehensive classification vector, and classification is performed based on the comprehensive classification vector to obtain the classification label corresponding to the picture data to be processed.
In the above embodiment, the features are extracted through the feature extraction network, and then the dimensionality reduction processing is performed, so that the dimensionality of the subsequent comprehensive feature vector is not too high, and the efficiency of the subsequent classification process is improved.
In one embodiment, before inputting the image data to be processed into a classification network obtained by pre-training to extract feature vectors of a plurality of rotation angles corresponding to the image data to be processed, the method further includes: splitting the picture data to be processed to obtain the picture data to be processed under a plurality of rotation angles; and merging the to-be-processed picture data under the plurality of rotation angles in the dimension of batch size.
In particular, the picture data to be processed may be obtained from a database, where the plurality of rotation angles are merged together, for example as (n, m, c, d), where n is the number of pictures to be processed, i.e. the batch-size dimension, and m is the number of channels. If the number of rotation angles is a, then m = a × 3; for example, when a is 4, m is 12, i.e. channels 0 to 12, with every 3 channels representing one rotation angle.
Therefore, during processing, the terminal splits the picture data to be processed to obtain the picture data at the plurality of rotation angles, for example by splitting it along the channel dimension into a pieces of data of size (n, 3, c, d), and then merges the picture data of the plurality of rotation angles along the batch-size dimension, i.e. into data of size (a × n, 3, c, d).
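A minimal sketch of this channel-wise split and batch-size merge, assuming 4 rotation angles stored as 12 channels per picture; the function name and resolution are illustrative only.

```python
import torch

def split_and_merge(batch, num_angles=4):
    """batch: (n, num_angles * 3, c, d) -> (num_angles * n, 3, c, d)."""
    n, m, c, d = batch.shape
    # Split along the channel dimension: every 3 channels is one rotation angle
    parts = batch.split(m // num_angles, dim=1)   # num_angles x (n, 3, c, d)
    # Merge the per-angle pieces along the batch-size dimension
    return torch.cat(parts, dim=0)                # (num_angles * n, 3, c, d)

x = torch.randn(8, 12, 224, 224)                  # 8 pictures, 4 angles x RGB channels
print(split_and_merge(x).shape)                   # torch.Size([32, 3, 224, 224])
```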
In the above embodiment, before the image data to be processed is input into the classification network, the image data to be processed is processed to obtain the image data to be processed at a plurality of rotation angles, so that a foundation is laid for subsequent processing.
In one embodiment, referring to fig. 3, fig. 3 is a flowchart of a training process of a classification network in an embodiment, and fig. 4 is a schematic diagram of an iteration process in the training process of the classification network in the embodiment shown in fig. 3, in this embodiment, a training mode of the classification network obtained by pre-training includes:
s302: and acquiring sample picture data, wherein the sample picture data comprises sample picture data at a plurality of rotation angles and corresponding sample labels.
Specifically, the sample picture data is collected in advance and includes sample picture data at a plurality of rotation angles, such as the 0, 90, 180 and 270 degrees above. The sample label refers to the class to which the sample belongs. In practical applications, the sample picture data can be represented as (BS, m, c, d), where BS represents the size of each batch input into the network (which can simply be understood as the number of sample pictures involved) and is set as a hyperparameter; m represents the number of channels of the sample picture data, with every 3 channels representing one rotation angle, so m = 3a for a rotation angles; and c and d represent the resolution of the sample pictures. Based on this data structure, before proceeding, the terminal splits the sample picture data into 4 pieces of data of size (BS, 3, c, d), each representing one rotation angle of the same batch of pictures. The terminal then merges the 4 pieces along the first dimension, resulting in data of size (4 × BS, 3, c, d).
S304: and inputting the sample picture data into an initial classification network to extract the feature vectors of a plurality of rotation angles corresponding to the sample picture data.
Specifically, the structure of the initial classification network may be as shown in fig. 2, where the terminal inputs the sample picture data into the initial classification network, so as to extract the feature vectors of a plurality of rotation angles corresponding to the sample picture data, for example, the data of (4 × BS, 3, c, d) is input into the initial classification network to obtain the feature vector output of (4 × BS, 512).
S306: and respectively extracting preset dimensions in the feature vectors of the plurality of rotation angles corresponding to the sample picture data according to a preset rule.
Specifically, the preset rule is the rule for extracting the preset dimensions from the feature vector of each rotation angle. For example, if a feature vector corresponds to a picture rotated by X degrees, there are n rotation angles in total and X degrees is the m-th of them, then the preset dimensions may be dimensions dim/n × (m-1) to dim/n × m of the feature vector, where dim is the total dimensionality. For instance, assume the rotation angle is 0 degrees, there are 4 rotation angles, 0 degrees is the first of them, and the feature vector has 512 dimensions; then the preset dimensions for this rotation angle are 512/4 × 0 to 512/4 × 1, i.e. dimensions 0 to 128.
The terminal can first determine the rotation angle corresponding to each feature vector output by the initial classification network, and then extract the preset dimensions corresponding to that rotation angle. For example, the initial classification network outputs (4 × BS, 512) feature vectors, where 4 represents the 4 rotation angles; the rotation angle of each output feature vector can be determined, and the corresponding preset dimensions are then obtained according to the preset rule. For example, dimensions 0-128 represent the 0-degree feature, 128-256 the 90-degree feature, 256-384 the 180-degree feature, and 384-512 the 270-degree feature.
S308: and respectively calculating an angle loss function corresponding to each rotation angle according to the sample label and preset dimensionality in the feature vectors of the plurality of rotation angles corresponding to the sample picture data.
S310: and calculating according to the angle loss function to obtain a comprehensive loss function.
S312: and carrying out gradient back transmission through the comprehensive loss function to train the initial classification network until the change of the comprehensive loss function is less than a preset value, and finishing the training of the initial classification network.
Specifically, when calculating the loss function, the terminal first splits the obtained feature vectors back into as many feature vectors as there are rotation angles, reversing the merging described above, for example into 4 sub-feature vectors of size (BS, 512), where the preset dimensions of each sub-feature vector represent the features extracted by the classification network from the same batch of pictures at one angle. The sub-feature vectors are then fed into the MS-Loss separately to obtain four angle loss functions (l1, l2, l3 and l4), a comprehensive loss function is calculated from the four angle loss functions, and the comprehensive loss function is back-propagated, which completes one update step of the classification network training.
Specifically, referring to fig. 5, assuming BS is 2 and the output dimension is 8, the corresponding 4 × BS vectors are as shown in fig. 5, which illustrates the structure of the feature vectors output by the multi-network head. Each part only takes the output of its corresponding preset dimensions to calculate its angle loss function; in the 512-dimensional case, dimensions 0-128 are taken to calculate the angle loss function for the 0-degree feature, 128-256 for the 90-degree feature, 256-384 for the 180-degree feature, and 384-512 for the 270-degree feature.
Optionally, calculating the comprehensive loss function according to the angle loss functions includes: calculating the average value of the angle loss functions as the comprehensive loss function.
And the terminal repeats the steps until the classification network converges, the classification network training is finished, and the condition of the classification network convergence can be that the change of the comprehensive loss function is less than a preset value.
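For illustration, a hedged sketch of one such training update, assuming the batch is ordered angle by angle as above and that `ms_loss` is a callable standing in for the MS-Loss mentioned in the text (not reproduced here); the comprehensive loss is taken as the average of the angle losses, as in the optional variant above.

```python
import torch

def training_step(model, optimizer, images, labels, ms_loss, num_angles=4):
    """images: (num_angles * BS, 3, c, d); labels: (BS,) sample labels."""
    feats = model(images)                          # (num_angles * BS, 512)
    bs = feats.shape[0] // num_angles
    step = feats.shape[1] // num_angles            # e.g. 512 / 4 = 128
    angle_feats = feats.split(bs, dim=0)           # num_angles x (BS, 512)

    # One angle loss per rotation angle, computed on its preset dimensions only
    angle_losses = [
        ms_loss(angle_feats[m][:, m * step:(m + 1) * step], labels)
        for m in range(num_angles)
    ]
    loss = torch.stack(angle_losses).mean()        # comprehensive loss = average

    optimizer.zero_grad()
    loss.backward()                                # gradient back-propagation
    optimizer.step()
    return loss.item()
```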
After the classification network is trained, the terminal may further test it, for example by setting up a comparison model that uses the same hyperparameters as the classification network (that is, the same batch size as during training) but is trained on a data set that is not rotated, or that can be regarded as having a rotation angle of 0 degrees, and then comparing the recall-rate performance at the different rotation angles against this comparison model.
In addition, when testing the classification network, the terminal also tests the recall rate at the different rotation angles: the feature vectors extracted by the network at the different rotation angles are combined together to obtain a (BS, 2048) vector and the recall rate is calculated from this combined vector; the feature vectors at the different rotation angles are likewise combined for the comparison-group model and its recall rate is obtained by testing.
The recall rates of the classification network at each rotation angle and on the merged vector are then compared with those of the comparison-group model; if the recall rate of the classification network is greater than that of the comparison-group model, the training is considered successful.
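As an illustration of the kind of recall test described here, a minimal recall@1 sketch computed from normalized feature vectors with cosine similarity; this evaluation code is an assumption and is not taken from the patent.

```python
import torch

def recall_at_1(features, labels):
    """features: (N, dim) L2-normalized vectors; labels: (N,) class labels."""
    sims = features @ features.T                     # cosine similarity matrix
    sims.fill_diagonal_(float("-inf"))               # do not match a sample to itself
    nearest = sims.argmax(dim=1)                     # index of the closest sample
    return (labels[nearest] == labels).float().mean().item()
```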
In one embodiment, referring to fig. 6, fig. 6 is a flowchart of a picture classification method in an embodiment, in which the picture classification method may include:
s602: and generating a comprehensive classification vector by using the classification vector generation method in any embodiment.
Specifically, the terminal may obtain the image data to be processed first, and then generate the comprehensive classification vector by the classification vector generation method, which may be referred to above specifically and is not described herein again.
S604: the comprehensive classification vector is matched with a standard vector in a database.
Specifically, the database stores standard vectors: for each classification label it provides a corresponding standard vector, so the terminal matches the comprehensive classification vector against the standard vectors, for example by calculating the distance between the comprehensive classification vector and each standard vector.
S606: and when the comprehensive classification vector is successfully matched with the standard vector in the database, acquiring a classification label corresponding to the successfully matched standard vector as a classification label of the picture data to be processed.
Specifically, the terminal matches the comprehensive classification vector with the standard vector, for example, the standard vector with the minimum distance from the comprehensive classification vector may be acquired as a successfully matched standard vector, and then the classification label corresponding to the successfully matched standard vector is used as the classification label of the to-be-processed image data.
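A minimal sketch of this matching step, assuming the database is available as an in-memory tensor of standard vectors with a parallel list of classification labels; the Euclidean distance used here is one possible choice, since the text only says that a distance is calculated.

```python
import torch

def classify(comprehensive_vector, standard_vectors, standard_labels):
    """comprehensive_vector: (512,); standard_vectors: (K, 512);
    standard_labels: list of K classification labels."""
    # Distance between the comprehensive classification vector and every standard vector
    dists = torch.cdist(comprehensive_vector.unsqueeze(0), standard_vectors)
    best = dists.argmin().item()        # standard vector with the minimum distance
    return standard_labels[best]        # its label becomes the picture's classification label
```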
In the above embodiment, the feature vectors of the plurality of rotation angles corresponding to the image data to be processed are obtained through the classification network, the feature vectors of the plurality of rotation angles are combined to obtain the comprehensive classification vector, and the classification label is obtained by classifying the image data to be processed through the comprehensive feature vector, so that the classification accuracy can be improved.
It should be understood that, although the steps in the flowcharts of figs. 1, 3 and 6 are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated otherwise herein, the steps are not strictly limited to that order and may be performed in other orders. Moreover, at least some of the steps in figs. 1, 3 and 6 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 7, there is provided a classification vector generation apparatus including: the data acquisition module 100, the multi-angle feature generation module 200, the preset dimension extraction module 300 and the classification vector generation module 400, wherein:
the data acquisition module 100 is configured to acquire to-be-processed picture data, where the to-be-processed picture data includes to-be-processed picture data at multiple rotation angles;
the multi-angle feature generation module 200 is configured to input the to-be-processed picture data into a classification network obtained through pre-training, so as to extract feature vectors of a plurality of rotation angles corresponding to the to-be-processed picture data;
the preset dimension extraction module 300 is configured to extract preset dimensions in the feature vectors of the plurality of rotation angles according to a preset rule;
the classification vector generation module 400 is configured to combine preset dimensions of the extracted feature vectors of the multiple rotation angles to obtain a comprehensive classification vector, where the comprehensive classification vector is used to classify the image data to be processed to obtain a corresponding classification label.
In one embodiment, the multi-angle feature generation module 200 may include:
the extraction unit is used for extracting an initial classification vector corresponding to the to-be-processed picture data under each rotation angle;
the grouping unit is used for splitting the initial classification vector into a plurality of grouping vectors corresponding to the number of the rotation angles according to a preset grouping rule;
the dimensionality reduction unit is used for carrying out dimensionality reduction processing on the grouping vectors;
and the output unit is used for reversely combining the grouped vectors subjected to the dimensionality reduction processing according to a preset grouping rule to obtain the feature vectors of a plurality of rotation angles corresponding to the image data to be processed.
In one embodiment, the dimension reduction unit is further configured to sequentially input the packet vector to the full-connection layer and the normalization layer to perform dimension reduction processing on the packet vector.
In one embodiment, the above classification vector generating apparatus may further include:
the splitting module is used for splitting the picture data to be processed to obtain the picture data to be processed under a plurality of rotation angles;
and the merging module is used for merging the to-be-processed picture data under the plurality of rotation angles in the dimension of the batch size.
In one embodiment, the splitting module is further configured to split the to-be-processed picture data according to the channel dimension to obtain the to-be-processed picture data at multiple rotation angles.
In one embodiment, the multi-angle feature generation module 200 may include:
a sample picture data obtaining module 100, configured to obtain sample picture data, where the sample picture data includes sample picture data at multiple rotation angles and corresponding sample labels;
the input module is used for inputting the sample picture data into the initial classification network so as to extract the feature vectors of a plurality of rotation angles corresponding to the sample picture data;
the training extraction module is used for respectively extracting preset dimensions in the feature vectors of the plurality of rotation angles corresponding to the sample picture data according to preset rules;
the angle loss function calculation module is used for calculating an angle loss function corresponding to each rotation angle according to preset dimensions in the feature vectors of the plurality of rotation angles corresponding to the sample label and the sample picture data;
the comprehensive loss function calculation module is used for calculating according to the angle loss function to obtain a comprehensive loss function;
and the training module is used for carrying out gradient return through the comprehensive loss function so as to train the initial classification network, and the training of the initial classification network is finished until the variation of the comprehensive loss function is smaller than a preset value.
In one embodiment, the above-mentioned integrated loss function calculating module is further configured to calculate an average value corresponding to the angle loss function as the integrated loss function.
In one embodiment, as shown in fig. 8, there is provided a picture classification apparatus including: comprehensive classification vector generation module 500, matching module 600 and classification label acquisition module 700, wherein:
a comprehensive classification vector generation module 500, configured to generate a comprehensive classification vector by using the classification vector generation method in any of the embodiments described above;
a matching module 600, configured to match the comprehensive classification vector with a standard vector in a database;
and the classification label obtaining module 700 is configured to, when the comprehensive classification vector is successfully matched with the standard vector in the database, obtain a classification label corresponding to the successfully matched standard vector as a classification label of the to-be-processed picture data.
For the specific definition of the classification vector generation device, reference may be made to the above definition of the classification vector generation method, which is not described herein again. The modules in the classification vector generation device can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 9. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a classification vector generation method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 9 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program: acquiring picture data to be processed, wherein the picture data to be processed comprises the picture data to be processed under a plurality of rotation angles; inputting the picture data to be processed into a classification network obtained by pre-training so as to extract the feature vectors of a plurality of rotation angles corresponding to the picture data to be processed; respectively extracting preset dimensions in the feature vectors of the plurality of rotation angles according to a preset rule; and combining the preset dimensions of the extracted feature vectors of the plurality of rotation angles to obtain a comprehensive classification vector, wherein the comprehensive classification vector is used for classifying the picture data to be processed to obtain a corresponding classification label.
In one embodiment, the inputting of the to-be-processed picture data into a classification network obtained by pre-training to extract feature vectors of a plurality of rotation angles corresponding to the to-be-processed picture data, which is involved in the execution of the computer program by the processor, includes: extracting an initial classification vector corresponding to the to-be-processed picture data at each rotation angle; splitting the initial classification vector into a plurality of grouping vectors corresponding to the number of the rotation angles according to a preset grouping rule; carrying out dimensionality reduction on the grouping vectors; and reversely combining the grouped vectors subjected to the dimensionality reduction processing according to a preset grouping rule to obtain the feature vectors of a plurality of rotation angles corresponding to the picture data to be processed.
In one embodiment, the dimensionality reduction of the packet vector involved in execution of the computer program by the processor comprises: and sequentially inputting the grouping vectors into the full-connection layer and the normalization layer to perform dimensionality reduction on the grouping vectors.
In one embodiment, before the inputting the to-be-processed picture data into the classification network obtained by training in advance to extract the feature vectors of the plurality of rotation angles corresponding to the to-be-processed picture data, the method further includes: splitting the picture data to be processed to obtain the picture data to be processed under a plurality of rotation angles; and merging the to-be-processed picture data under the plurality of rotation angles in the dimension of batch size.
In one embodiment, the splitting of the to-be-processed picture data involved in the execution of the computer program by the processor to obtain the to-be-processed picture data at a plurality of rotation angles includes: and splitting the picture data to be processed according to the channel dimension to obtain the picture data to be processed under a plurality of rotation angles.
In one embodiment, the training of the pre-trained classification network involved in the execution of the computer program by the processor comprises: acquiring sample picture data, wherein the sample picture data comprises sample picture data at a plurality of rotation angles and corresponding sample labels; inputting the sample picture data into an initial classification network to extract the feature vectors of a plurality of rotation angles corresponding to the sample picture data; respectively extracting preset dimensions in the feature vectors of the plurality of rotation angles corresponding to the sample picture data according to a preset rule; respectively calculating an angle loss function corresponding to each rotation angle according to the sample label and preset dimensions in the feature vectors of the plurality of rotation angles corresponding to the sample picture data; calculating according to the angle loss function to obtain a comprehensive loss function; and carrying out gradient back transmission through the comprehensive loss function to train the initial classification network until the change of the comprehensive loss function is less than a preset value, and finishing the training of the initial classification network.
In one embodiment, the calculation of the comprehensive loss function from the angle loss functions, involved when the computer program is executed by the processor, comprises: calculating the average value of the angle loss functions as the comprehensive loss function.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program: generating a comprehensive classification vector by using a classification vector generation method in any embodiment; matching the comprehensive classification vector with a standard vector in a database; and when the comprehensive classification vector is successfully matched with the standard vector in the database, acquiring a classification label corresponding to the successfully matched standard vector as a classification label of the picture data to be processed.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of: acquiring picture data to be processed, wherein the picture data to be processed comprises the picture data to be processed under a plurality of rotation angles; inputting the picture data to be processed into a classification network obtained by pre-training so as to extract the feature vectors of a plurality of rotation angles corresponding to the picture data to be processed; respectively extracting preset dimensions in the feature vectors of the plurality of rotation angles according to a preset rule; and combining the preset dimensions of the extracted feature vectors of the plurality of rotation angles to obtain a comprehensive classification vector, wherein the comprehensive classification vector is used for classifying the picture data to be processed to obtain a corresponding classification label.
In one embodiment, the inputting of the to-be-processed picture data into a classification network trained in advance to extract feature vectors of a plurality of rotation angles corresponding to the to-be-processed picture data, when the computer program is executed by the processor, includes: extracting an initial classification vector corresponding to the to-be-processed picture data at each rotation angle; splitting the initial classification vector into a plurality of grouping vectors corresponding to the number of the rotation angles according to a preset grouping rule; carrying out dimensionality reduction on the grouping vectors; and reversely combining the grouped vectors subjected to the dimensionality reduction processing according to a preset grouping rule to obtain the feature vectors of a plurality of rotation angles corresponding to the picture data to be processed.
In one embodiment, the computer program, when executed by a processor, involves performing dimensionality reduction on a packet vector, comprising: and sequentially inputting the grouping vectors into the full-connection layer and the normalization layer to perform dimensionality reduction on the grouping vectors.
In one embodiment, before the step of inputting the to-be-processed picture data into a classification network trained in advance to extract the feature vectors of the plurality of rotation angles corresponding to the to-be-processed picture data, when the computer program is executed by the processor, the method further includes: splitting the picture data to be processed to obtain the picture data to be processed under a plurality of rotation angles; and merging the to-be-processed picture data under the plurality of rotation angles in the dimension of batch size.
In one embodiment, when the computer program is executed by the processor, splitting the picture data to be processed to obtain the picture data to be processed under the plurality of rotation angles comprises: splitting the picture data to be processed according to the channel dimension to obtain the picture data to be processed under the plurality of rotation angles.
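Assuming the rotated copies of each picture are stacked along the channel dimension (a layout the embodiment implies but does not spell out), the channel split and batch-size merge could be sketched as:

```python
import torch

def split_and_merge(stacked_pictures, num_angles=4):
    """Split picture data by the channel dimension and merge it in the batch-size dimension.

    stacked_pictures: (B, num_angles * C, H, W) tensor with the rotated copies
    of each picture stacked along the channel dimension (assumed layout).
    Returns a (num_angles * B, C, H, W) tensor.
    """
    per_angle = torch.chunk(stacked_pictures, num_angles, dim=1)  # split on the channel dimension
    return torch.cat(per_angle, dim=0)                            # merge on the batch-size dimension
```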
In one embodiment, when the computer program is executed by the processor, the training manner of the classification network obtained by pre-training comprises: acquiring sample picture data, wherein the sample picture data comprises sample picture data at a plurality of rotation angles and corresponding sample labels; inputting the sample picture data into an initial classification network to extract the feature vectors of a plurality of rotation angles corresponding to the sample picture data; respectively extracting preset dimensions in the feature vectors of the plurality of rotation angles corresponding to the sample picture data according to a preset rule; respectively calculating an angle loss function corresponding to each rotation angle according to the sample labels and the preset dimensions in the feature vectors of the plurality of rotation angles corresponding to the sample picture data; calculating a comprehensive loss function according to the angle loss functions; and performing gradient back-propagation through the comprehensive loss function to train the initial classification network until the variation of the comprehensive loss function is smaller than a preset value, at which point the training of the initial classification network is finished.
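A condensed training-step sketch, under several assumptions not fixed by the embodiment: PyTorch, cross-entropy as the per-angle loss form, and a model whose per-angle outputs are treated as class logits. The names train_step, merged_samples and sample_labels are illustrative only:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, merged_samples, sample_labels, num_angles=4):
    """One training step: per-angle losses, comprehensive loss, gradient back-propagation.

    merged_samples: (num_angles * B, C, H, W) sample pictures merged on the batch axis.
    sample_labels: (B,) sample labels shared by all rotated copies of a picture.
    """
    outputs = model(merged_samples)                        # per-angle outputs (treated as logits here)
    per_angle = torch.chunk(outputs, num_angles, dim=0)
    angle_losses = [F.cross_entropy(o, sample_labels) for o in per_angle]
    comprehensive = torch.stack(angle_losses).mean()       # comprehensive loss function
    optimizer.zero_grad()
    comprehensive.backward()                               # gradient back-propagation
    optimizer.step()
    return comprehensive.item()
```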
In one embodiment, when the computer program is executed by the processor, calculating the comprehensive loss function according to the angle loss functions comprises: calculating the average value of the angle loss functions as the comprehensive loss function.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of: generating a comprehensive classification vector by using the classification vector generation method of any of the above embodiments; matching the comprehensive classification vector with standard vectors in a database; and when the comprehensive classification vector is successfully matched with a standard vector in the database, acquiring the classification label corresponding to the successfully matched standard vector as the classification label of the picture data to be processed.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, the combination should be considered to be within the scope of this specification.
The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (11)

1. A method for generating a classification vector, the method comprising:
acquiring picture data to be processed, wherein the picture data to be processed comprises the picture data to be processed under a plurality of rotation angles;
inputting the picture data to be processed into a classification network obtained by pre-training so as to extract the feature vectors of a plurality of rotation angles corresponding to the picture data to be processed;
respectively extracting preset dimensions in the feature vectors of the plurality of rotation angles according to a preset rule;
combining the preset dimensions of the extracted feature vectors of the plurality of rotation angles to obtain a comprehensive classification vector, wherein the comprehensive classification vector is used for classifying the picture data to be processed to obtain a corresponding classification label.
2. The method according to claim 1, wherein the inputting the to-be-processed picture data into a classification network obtained by pre-training to extract feature vectors of a plurality of rotation angles corresponding to the to-be-processed picture data comprises:
extracting an initial classification vector corresponding to the to-be-processed picture data under each rotation angle;
splitting the initial classification vector into a plurality of grouping vectors corresponding to the number of the rotation angles according to a preset grouping rule;
performing dimensionality reduction processing on the grouping vectors;
and reversely combining the grouping vectors subjected to the dimensionality reduction processing according to the preset grouping rule to obtain the feature vectors of a plurality of rotation angles corresponding to the picture data to be processed.
3. The method according to claim 2, wherein the performing dimensionality reduction processing on the grouping vectors comprises:
and sequentially inputting the grouping vectors into a full-connection layer and a normalization layer so as to perform dimension reduction processing on the grouping vectors.
4. The method according to any one of claims 1 to 3, wherein before inputting the to-be-processed picture data into a classification network obtained by pre-training to extract the feature vectors of a plurality of rotation angles corresponding to the to-be-processed picture data, the method further comprises:
splitting the picture data to be processed to obtain the picture data to be processed under a plurality of rotation angles;
and merging the to-be-processed picture data under the plurality of rotation angles in the dimension of batch size.
5. The method according to claim 4, wherein the splitting the to-be-processed picture data to obtain the to-be-processed picture data at a plurality of rotation angles includes:
and splitting the picture data to be processed according to the channel dimension to obtain the picture data to be processed under a plurality of rotation angles.
6. The method according to any one of claims 1 to 3, wherein the training mode of the classification network obtained by pre-training comprises:
acquiring sample picture data, wherein the sample picture data comprises sample picture data at a plurality of rotation angles and corresponding sample labels;
inputting the sample picture data into an initial classification network to extract feature vectors of a plurality of rotation angles corresponding to the sample picture data;
respectively extracting preset dimensions in the feature vectors of the plurality of rotation angles corresponding to the sample picture data according to a preset rule;
respectively calculating an angle loss function corresponding to each rotation angle according to the sample label and preset dimensions in the feature vectors of the plurality of rotation angles corresponding to the sample picture data;
calculating a comprehensive loss function according to the angle loss function;
and performing gradient back-propagation through the comprehensive loss function to train the initial classification network until the variation of the comprehensive loss function is smaller than a preset value, at which point the training of the initial classification network is finished.
7. The method according to claim 6, wherein the calculating a comprehensive loss function according to the angle loss function comprises:
calculating the average value of the angle loss functions as the comprehensive loss function.
8. A method for classifying pictures, the method comprising:
generating a composite classification vector by the classification vector generation method of any one of claims 1 to 7;
matching the comprehensive classification vector with a standard vector in a database;
and when the comprehensive classification vector is successfully matched with the standard vector in the database, acquiring a classification label corresponding to the standard vector which is successfully matched as the classification label of the picture data to be processed.
9. An apparatus for generating a classification vector, the apparatus comprising:
the data acquisition module is used for acquiring picture data to be processed, and the picture data to be processed comprises the picture data to be processed under a plurality of rotation angles;
the multi-angle feature generation module is used for inputting the picture data to be processed into a classification network obtained by pre-training so as to extract feature vectors of a plurality of rotation angles corresponding to the picture data to be processed;
the preset dimension extraction module is used for respectively extracting preset dimensions in the feature vectors of the plurality of rotation angles according to a preset rule;
and the classification vector generation module is used for merging preset dimensions of the extracted feature vectors of the plurality of rotation angles to obtain a comprehensive classification vector, and the comprehensive classification vector is used for classifying the picture data to be processed to obtain a corresponding classification label.
10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7 or 8.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7 or 8.