CN113515983A - Model training method, mobile object identification method, device and equipment

Info

Publication number: CN113515983A
Application number: CN202010568176.2A
Authority: CN (China)
Prior art keywords: image, training, feature, processed, weight coefficient
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 王弘烈, 郭莉琳, 周橹楠, 邓兵
Current Assignee: Alibaba Group Holding Ltd
Original Assignee: Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN202010568176.2A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

The embodiment of the invention provides a model training method, a mobile object identification method, a device and equipment. The method comprises the following steps: acquiring a training image comprising a preset object and identity information corresponding to the preset object; processing the training image by using a first machine learning model to obtain a first training feature corresponding to the training image; processing the training image by using a second machine learning model to obtain a second training feature corresponding to the training image, wherein the first machine learning model is different from the second machine learning model, and the first training feature is different from the second training feature; and performing learning training based on the first training feature, the second training feature and the identity information corresponding to the preset object to obtain a target model for identifying the identity information of the preset object in an image. Because the target model is obtained through learning training on multi-dimensional feature information, images can be identified effectively by the target model, which ensures the accuracy and reliability of the target model in use.

Description

Model training method, mobile object identification method, device and equipment
Technical Field
The invention relates to the technical field of data processing, and in particular to a model training method, a mobile object identification method, a device and equipment.
Background
With the rapid development of the economy, vehicles are increasingly widely used, and the identification and management of vehicles are becoming increasingly important. At present, vehicle identification generally works as follows: visual features (e.g., color, appearance, brand, etc.) of a target vehicle are extracted, and the extracted visual features are then used to locate, in a database, the vehicle that is most visually similar to the target vehicle.
However, vehicle images of the same vehicle captured by different cameras at different times may vary greatly with the camera parameters (e.g., resolution, viewing angle, height, etc.) and the environmental conditions (e.g., illumination, vehicle speed, weather, etc.), while different vehicles may have very similar colors and shapes; vehicles produced by the same automobile manufacturer are especially easy to confuse. Relying on visual features alone is therefore not sufficient to identify vehicles accurately and efficiently.
Disclosure of Invention
The embodiments of the invention provide a model training method, a moving object identification method, a device and equipment. A target model is obtained through learning training on multi-dimensional feature information, and the target model can then identify a moving object quickly and accurately under different imaging parameters and environmental conditions, so that the moving object can be conveniently managed based on the identification result.
In a first aspect, an embodiment of the present invention provides a model training method, including:
acquiring a training image comprising a preset object and identity information corresponding to the preset object;
processing the training image by using a first machine learning model to obtain a first training feature corresponding to the training image;
processing the training image by using a second machine learning model to obtain a second training feature corresponding to the training image, wherein the first machine learning model is different from the second machine learning model, and the first training feature is different from the second training feature;
and performing learning training based on the first training characteristic, the second training characteristic and the identity information corresponding to the preset object to obtain a target model for identifying the identity information of the preset object in the image.
In a second aspect, an embodiment of the present invention provides a model training apparatus, including:
a first acquisition module, configured to acquire a training image comprising a preset object and identity information corresponding to the preset object;
the first processing module is used for processing the training image by utilizing a first machine learning model to obtain a first training feature corresponding to the training image;
the first processing module is configured to process the training image by using a second machine learning model to obtain a second training feature corresponding to the training image, where the first machine learning model is different from the second machine learning model, and the first training feature is different from the second training feature;
and the first training module is used for performing learning training based on the first training characteristic, the second training characteristic and the identity information corresponding to the preset object to obtain a target model for identifying the identity information of the preset object in the image.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a memory, a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions, when executed by the processor, implement the model training method of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer storage medium for storing a computer program, where the computer program is used to make a computer implement the model training method in the first aspect when executed.
In a fifth aspect, an embodiment of the present invention provides a method for identifying a moving object, including:
acquiring an image to be processed and a plurality of reference images, wherein the image to be processed comprises a moving object to be identified, and the reference images comprise a plurality of reference moving objects;
determining a feature to be analyzed corresponding to the image to be processed and a reference feature corresponding to the reference image;
and analyzing the feature to be analyzed and the reference features by using a machine learning model to determine, among the plurality of reference moving objects, a target moving object corresponding to the moving object to be recognized, wherein the machine learning model is trained to recognize identity information of a moving object in an image based on image features.
In a sixth aspect, an embodiment of the present invention provides a mobile object identification apparatus, including:
the second acquisition module is used for acquiring an image to be processed and a plurality of reference images, wherein the image to be processed comprises a moving object to be identified, and the reference images comprise a plurality of reference moving objects;
a second determining module, configured to determine a feature to be analyzed corresponding to the image to be processed and a reference feature corresponding to the reference image;
and the second processing module is used for analyzing the feature to be analyzed and the reference features by using a machine learning model to determine, among the plurality of reference moving objects, a target moving object corresponding to the moving object to be recognized, wherein the machine learning model is trained to recognize identity information of a moving object in an image based on image features.
In a seventh aspect, an embodiment of the present invention provides an electronic device, including: a memory, a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions, when executed by the processor, implement the moving object identification method of the fifth aspect.
In an eighth aspect, an embodiment of the present invention provides a computer storage medium for storing a computer program, where the computer program is used to enable a computer to execute the method for identifying a moving object in the fifth aspect.
According to the technical scheme provided by this embodiment, a training image comprising a preset object and identity information corresponding to the preset object are obtained, and the training image is processed by using a first machine learning model and a second machine learning model to obtain a first training feature and a second training feature corresponding to the training image, the first training feature and the second training feature being feature information of different dimensions of the training image. Learning training is then performed based on the first training feature, the second training feature and the identity information corresponding to the preset object to obtain a target model for identifying the identity information of a vehicle in an image. The target model is thus effectively trained on the basis of multi-dimensional feature information in the image, and can identify a moving object quickly and accurately under different imaging parameters and environmental conditions, so that the moving object can be conveniently managed based on the identification result, which further improves the practicability of the method.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a model training method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a model training method according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of another model training method according to an embodiment of the present invention;
fig. 4 is a schematic flowchart of a process of processing the training image by using a first machine learning model to obtain a first training feature corresponding to the training image according to an embodiment of the present invention;
fig. 5 is a schematic flowchart of a process of obtaining a first training feature corresponding to the training image based on the feature weight coefficient and the training image according to an embodiment of the present invention;
fig. 6 is a schematic flowchart of a process of processing the training image by using a second machine learning model to obtain a second training feature corresponding to the training image according to an embodiment of the present invention;
fig. 7 is a schematic flowchart of a process of performing learning training based on the first training feature, the second training feature, and the identity information corresponding to the preset object to obtain a target model for identifying the identity information of the preset object in the image according to the embodiment of the present invention;
FIG. 8 is a schematic flow chart illustrating a further method for training a model according to an embodiment of the present invention;
fig. 9 is a schematic flowchart of a moving object identification method according to an embodiment of the present invention;
fig. 10 is a schematic flowchart of a process of determining a target moving object corresponding to the moving object to be recognized by analyzing the feature to be analyzed and the reference feature with a machine learning model in the plurality of reference moving objects according to the embodiment of the present invention;
fig. 11 is a schematic flowchart of determining a feature to be analyzed corresponding to the image to be processed according to an embodiment of the present invention;
FIG. 12 is a flowchart illustrating an image recognition method according to an embodiment of the present invention;
fig. 13 is a schematic flowchart of an image recognition method according to an embodiment of the present invention;
FIG. 14 is a schematic structural diagram of a model training apparatus according to an embodiment of the present invention;
FIG. 15 is a schematic structural diagram of an electronic device corresponding to the model training apparatus provided in the embodiment shown in FIG. 14;
fig. 16 is a schematic structural diagram of a moving object recognition apparatus according to an embodiment of the present invention;
fig. 17 is a schematic structural diagram of an electronic device corresponding to the moving object recognition apparatus provided in the embodiment shown in fig. 16.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, and "a" and "an" generally include at least two, but do not exclude at least one, unless the context clearly dictates otherwise.
It should be understood that the term "and/or" as used herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter associated objects are in an "or" relationship.
The words "if", as used herein, may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
It is also noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a good or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such good or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a commodity or system that includes the element.
In addition, the sequence of steps in each method embodiment described below is only an example and is not strictly limited.
In order to facilitate understanding of the technical solution in this embodiment, the following briefly describes related technologies:
With the rapid development of the economy, the number of urban automobiles grows day by day, which brings great challenges to traffic management and public safety. In terms of maintaining vehicle safety, retrieval of a specific vehicle is often required. The most widely used vehicle identification method in the prior art is based on the license plate; however, when the license plate information cannot be recognized, the vehicle has no license plate, the vehicle carries a fake license plate, or the like, the vehicle identification operation cannot be performed accurately.
In order to solve the above-mentioned situation that the vehicle identification operation cannot be accurately performed, the following vehicle identification methods are proposed in the prior art:
(1) A vehicle identification method based on space-time association, in which space-time constraint information and visual information of a vehicle are acquired by cameras, and the vehicle identification operation is then performed based on the space-time constraint information and the visual information. However, if the longitude and latitude of the point where a camera is located are not accurately calibrated, or the camera clock is not calibrated, the acquired space-time constraint information and visual information are easily inaccurate, so that accurate vehicle identification cannot be achieved.
(2) A vehicle identification method based on local vehicle markings. However, this method requires a large amount of manual labeling, has low data processing efficiency, and lacks universality.
(3) A vehicle identification method based on global features of a vehicle picture. However, this method cannot distinguish vehicles with only slight differences; for example, the recognition accuracy for vehicles of the same model and color is low.
To solve the above technical problem, the present application provides a model training method, a moving object recognition method, a device and an apparatus. The method obtains a training image including a preset object and identity information corresponding to the preset object, and processes the training image by using a first machine learning model and a second machine learning model to obtain a first training feature and a second training feature corresponding to the training image, the first training feature and the second training feature being feature information of different dimensions of the training image. Learning training is then performed based on the first training feature, the second training feature and the identity information corresponding to the preset object to obtain a target model for recognizing identity information of a vehicle in an image. The target model is thus effectively trained based on multi-dimensional feature information in the image, and can identify a moving object quickly and accurately under different camera parameters and environmental conditions, so that the moving object (that is, an object that can be managed, for example, a vehicle) can be managed based on the identification result, which further improves the practicability of the method.
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The features of the embodiments and examples described below may be combined with each other without conflict between the embodiments.
Fig. 1 is a schematic flow chart of a model training method according to an embodiment of the present invention; FIG. 2 is a schematic diagram of a model training method according to an embodiment of the present invention; referring to fig. 1-2, the present embodiment provides a model training method, the execution subject of which may be a model training apparatus, it being understood that the model training apparatus may be implemented as software, or a combination of software and hardware. Specifically, the model training method may include:
step S101: the method comprises the steps of obtaining a training image comprising a preset object and identity information corresponding to the preset object.
Step S102: the training images are processed by the first machine learning model, and first training features corresponding to the training images are obtained.
Step S103: and processing the training image by using a second machine learning model to obtain a second training characteristic corresponding to the training image, wherein the first machine learning model is different from the second machine learning model, and the first training characteristic is different from the second training characteristic.
Step S104: and performing learning training based on the first training characteristic, the second training characteristic and the identity information corresponding to the preset object to obtain a target model for identifying the identity information of the preset object in the image.
The following is a detailed description of the above steps:
step S101: the method comprises the steps of obtaining a training image comprising a preset object and identity information corresponding to the preset object.
The preset object may be any object having corresponding standard identity information, and in different application scenarios the preset object may take different forms. For example, in a road traffic management application, the preset object may be a vehicle, and the standard identity information corresponding to the vehicle may be license plate information for identifying the identity of the vehicle. In addition, the preset objects may include moving objects and non-moving objects: the moving objects may include moving vehicles, moving robots, moving unmanned aerial vehicles, moving persons, moving animals, and the like; the non-moving objects may include apparel, accessories, buildings, and the like.
In addition, the training image may be an image directly acquired by an image sensor, or may also be a CNN feature map obtained by analyzing and processing an acquired image through a Convolutional Neural Network (CNN). Specifically, those skilled in the art may set the configuration according to specific application requirements and design requirements, which are not described herein again.
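As a minimal illustration of these two input forms, the following Python sketch (using PyTorch, which is an assumption of this illustration rather than something specified by the patent) shows a raw sensor image and a CNN feature map computed from it, either of which could serve as the training image; the one-layer backbone is only a placeholder.

```python
# Hedged illustration of the two training-image forms mentioned above: a raw image tensor
# from the image sensor, or a CNN feature map computed from it. The tiny backbone here
# (a single strided convolution) only stands in for whatever CNN the implementation uses.
import torch
import torch.nn as nn

raw_image = torch.rand(1, 3, 224, 224)           # image as acquired by the image sensor

cnn_backbone = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
    nn.ReLU(),
)
feature_map = cnn_backbone(raw_image)            # CNN feature map, shape (1, 64, 112, 112)

training_input = feature_map                     # either form may serve as the training image
```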
In addition, the number of training images is one or more; preferably, a plurality of training images are acquired, and when the preset object includes a moving vehicle, the plurality of training images may be driving images of the same moving vehicle captured on different roads at different times. This embodiment does not limit the specific manner of acquiring the training images, and those skilled in the art may set it according to specific application requirements. For example, the model training device may be in communication connection with an image acquisition device (for example, a camera on a road or a camera in a preset space); the image acquisition device may acquire images and send the acquired images to the model training device, so that the model training device can obtain training images including the preset object. Alternatively, the image acquisition device may acquire images and store them in a preset area, and the model training device may obtain the images by accessing the preset area.
In some examples, a plurality of objects and a plurality of first images corresponding to the plurality of objects are preset, the preset object may be any one of the plurality of objects, and the training image including the preset object may be at least a part of the plurality of first images. At this time, acquiring the training image including the preset object in this embodiment may include: acquiring a plurality of first images corresponding to a plurality of objects, wherein the plurality of objects comprise preset objects; and clustering the plurality of first images to obtain at least one image corresponding to the preset object.
The method comprises the steps that a plurality of first images corresponding to a plurality of objects can be stored in a preset area, and the plurality of first images corresponding to the plurality of objects can be acquired by accessing the preset area; alternatively, the plurality of first images corresponding to the plurality of objects may be acquired by an image acquisition device, for example: when the preset object is a vehicle, the image acquisition device may be applied to an application scene of road monitoring, and when the image acquisition device is a camera located on a road, the plurality of first images may be all vehicle-passing images on all roads acquired by the camera within a preset time period.
After a plurality of first images corresponding to a plurality of objects are acquired, clustering processing may be performed on the plurality of first images. For convenience of understanding, a vehicle is taken as a preset object for example, the plurality of first images may be clustered based on license plate information, or the plurality of first images may also be clustered based on preset image feature information, or the plurality of first images may also be clustered according to a preset clustering rule, so that at least one vehicle training image corresponding to the vehicle may be obtained.
For example, the plurality of first images corresponding to the plurality of vehicles exist as follows: an image a corresponding to the vehicle a, an image B corresponding to the vehicle B, an image C corresponding to the vehicle C, an image D corresponding to the vehicle a, an image E corresponding to the vehicle a, an image F corresponding to the vehicle B, and an image G corresponding to the vehicle C, then after acquiring the plurality of first images corresponding to the plurality of vehicles, the plurality of first images may be subjected to clustering processing, so that at least one vehicle training image (including: the image a, the image D, and the image E) corresponding to the vehicle a, at least one vehicle training image (including: the image B and the image F) corresponding to the vehicle B, and at least one vehicle training image (including: the image C and the image G) corresponding to the vehicle C may be obtained. For the vehicle a, the image D, and the image E may refer to image information acquired for the vehicle a at different times and on different roads; similarly, for vehicle B, image B and image F may refer to image information collected for vehicle B at different times and on different roads; for vehicle C, image C and image G may refer to image information captured for vehicle C at different times and on different roads.
It is understood that the preset vehicle may be at least one of a plurality of vehicles, for example, when the vehicle a is the preset vehicle, then the at least one vehicle training image corresponding to the preset vehicle may include the image a, the image D and the image E. When the vehicle B is a preset vehicle, then the at least one vehicle training image corresponding to the preset vehicle may include image B and image F. When the vehicle C is a preset vehicle, then the at least one vehicle training image corresponding to the preset vehicle may include image C and image G.
By acquiring a plurality of first images corresponding to a plurality of moving objects and then clustering the plurality of first images, training images including the preset object are obtained, which improves the quality and efficiency of analyzing and processing the images. Moreover, because the preset object corresponds to at least one training image, the training images can cover different shooting angles and different shooting environments, which effectively improves the quality and efficiency of learning and training the target model and further improves the accuracy and reliability of data processing using the target model.
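A minimal sketch of such a clustering step, assuming the license plate text is used as the clustering key, is given below; the record layout and helper name are hypothetical and only mirror the vehicle A/B/C example above.

```python
# Illustrative sketch only: grouping the collected first images by object identity
# (e.g. license plate text) so that each preset object ends up with its own set of
# training images. The "image_id"/"plate" record layout is a hypothetical stand-in
# for whatever clustering key the implementation actually uses.
from collections import defaultdict

def cluster_first_images(first_images):
    """Group images so that every object maps to all of its captured images."""
    clusters = defaultdict(list)
    for record in first_images:
        clusters[record["plate"]].append(record["image_id"])
    return dict(clusters)

first_images = [
    {"image_id": "image_a", "plate": "vehicle_A"},
    {"image_id": "image_b", "plate": "vehicle_B"},
    {"image_id": "image_c", "plate": "vehicle_C"},
    {"image_id": "image_d", "plate": "vehicle_A"},
    {"image_id": "image_e", "plate": "vehicle_A"},
    {"image_id": "image_f", "plate": "vehicle_B"},
    {"image_id": "image_g", "plate": "vehicle_C"},
]

# {'vehicle_A': ['image_a', 'image_d', 'image_e'], 'vehicle_B': [...], 'vehicle_C': [...]}
print(cluster_first_images(first_images))
```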
Step S102: the training images are processed by the first machine learning model, and first training features corresponding to the training images are obtained.
Specifically, the first machine learning model may include an attention mechanism, and after the training image is acquired, the training image may be analyzed by using the attention mechanism to obtain a first training feature corresponding to the training image, where the obtained first training feature may include a local feature and a global feature of the training image.
Step S103: and processing the training image by using a second machine learning model to obtain a second training characteristic corresponding to the training image, wherein the first machine learning model is different from the second machine learning model, and the first training characteristic is different from the second training characteristic.
The second machine learning model is different from the first machine learning model and is a model trained in advance for extracting image features. Specifically, the second machine learning model may include a perspective transformation sub-network (PTN module) composed of a convolutional neural network, a residual sub-network, and a transposed convolution sub-network, where the perspective transformation sub-network is used for performing perspective transformation processing on an image, the residual sub-network is used for extracting residual features of the image, and the transposed convolution sub-network is used for obtaining weight coefficients of the local responses of the residual features; the obtained weight coefficients of the local responses of the residual features are then used to extract the second training feature.
After the training image is acquired, the training image may be analyzed by using the second machine learning model, so that a second training feature corresponding to the training image may be obtained. It will be appreciated that, since the first machine learning model is different from the second machine learning model, the obtained second training features are different from the first training features, and the obtained second training features may include local features and global features of the training image.
Step S104: and performing learning training based on the first training characteristic, the second training characteristic and the identity information corresponding to the preset object to obtain a target model for identifying the identity information of the preset object in the image.
After the first training feature, the second training feature and the identity information of the preset object are obtained, learning training can be performed based on them. Specifically, the first training feature, the second training feature and the identity information corresponding to the preset object can be input into a classifier for learning training, a loss value of the learning training is then calculated through a cross-entropy loss function, and the training parameters of the model are updated based on the obtained loss value until the loss value meets a preset requirement, at which point a target model for identifying the identity information of the preset object in an image is obtained. After the target model is obtained, it can be used to perform identity recognition on images to be analyzed.
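The following PyTorch sketch illustrates this learning-training step under the assumption that the first and second training features have already been extracted and pooled into fixed-length vectors; the feature sizes, the linear classifier, the optimizer and the stopping criterion are illustrative choices, not values prescribed by the patent.

```python
# Minimal PyTorch sketch of the learning-training step described above. It assumes the
# first and second training features are fixed-length vectors; the feature sizes,
# classifier shape and learning rate are illustrative assumptions.
import torch
import torch.nn as nn

num_identities = 100          # number of distinct preset objects (assumed)
feat_dim = 256 + 256          # concatenated first + second training features (assumed)

classifier = nn.Linear(feat_dim, num_identities)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.01)

def training_step(first_feat, second_feat, identity_labels):
    """One update: fuse the two features, classify, and back-propagate the loss."""
    target_feat = torch.cat([first_feat, second_feat], dim=1)   # fuse the two dimensions
    logits = classifier(target_feat)
    loss = criterion(logits, identity_labels)                   # cross-entropy loss value
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()  # train until this value meets the preset requirement
```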
In the model training method provided by this embodiment, a training image including a preset object and identity information corresponding to the preset object are obtained, and the training image is processed by using a first machine learning model and a second machine learning model to obtain a first training feature and a second training feature corresponding to the training image, the first training feature and the second training feature being feature information of different dimensions of the training image. Learning training is then performed based on the first training feature, the second training feature and the identity information corresponding to the preset object to obtain a target model for identifying the identity information of a vehicle in an image. The target model is thus effectively trained based on multi-dimensional feature information in the image, and can identify a moving object quickly and accurately under different imaging parameters and environmental conditions, so that the moving object can be conveniently managed based on the identification result, which further improves the practicability of the method.
FIG. 3 is a schematic flow chart of another model training method according to an embodiment of the present invention; on the basis of the foregoing embodiment, with continuing reference to fig. 3, the number of the training images may be multiple, and after the training image including the preset object is acquired, the method in this embodiment may further include:
step S301: and carrying out pixel normalization processing on the plurality of training images to obtain a plurality of intermediate images corresponding to the plurality of training images.
Step S302: the plurality of intermediate images are subjected to color gamut conversion processing to obtain a plurality of target training images corresponding to the plurality of training images.
When the number of the training images is multiple, the sizes of the training images may be the same or different, and at this time, in order to ensure the quality and efficiency of extracting the image features, pixel normalization processing may be performed on the training images, so that a plurality of intermediate images corresponding to the training images may be obtained, and it is understood that the obtained intermediate images have the same pixel size.
After the plurality of intermediate images are obtained, color gamut conversion processing can be performed on the plurality of intermediate images, so that a plurality of target training images corresponding to the plurality of training images can be obtained. This ensures that the image feature dimensions of the plurality of training images are the same when the feature extraction operation is performed, which further improves the quality and efficiency of extracting image features.
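A rough sketch of these two preprocessing steps, assuming OpenCV and a particular target size and color space purely for illustration, could look as follows.

```python
# Rough sketch of the two preprocessing steps using OpenCV. The target pixel size and the
# BGR-to-RGB conversion are assumptions standing in for whatever uniform size and color
# gamut the actual implementation settles on.
import cv2

def preprocess_training_images(training_images, target_size=(256, 256)):
    processed = []
    for image in training_images:
        intermediate = cv2.resize(image, target_size)            # pixel normalization: same size
        target = cv2.cvtColor(intermediate, cv2.COLOR_BGR2RGB)   # color gamut conversion
        processed.append(target)
    return processed
```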
FIG. 4 is a schematic flowchart of a process of processing a training image by using a first machine learning model to obtain a first training feature corresponding to the training image according to an embodiment of the present invention; based on the foregoing embodiment, with reference to fig. 4, in this embodiment, a specific implementation manner for obtaining the first training feature corresponding to the training image is not limited, and a person skilled in the art may set the first training feature according to specific application requirements and design requirements, and preferably, the processing the training image by using the first machine learning model in this embodiment to obtain the first training feature corresponding to the training image may include:
step S401: and processing the training image by using an attention mechanism to obtain a characteristic weight coefficient corresponding to the training image.
Step S402: based on the feature weight coefficient and the training image, a first training feature corresponding to the training image is obtained.
When the first machine learning model is an attention mechanism, the attention mechanism can be used for processing the training image, so that the characteristic weight coefficient corresponding to the training image can be obtained. After the feature weight coefficients are obtained, the feature weight coefficients and the training images may be analyzed, so that first training features corresponding to the training images may be obtained. Specifically, referring to fig. 5, obtaining the first training feature corresponding to the training image based on the feature weight coefficient and the training image may include:
step S501: and determining a target weight coefficient corresponding to the characteristic weight coefficient, wherein the nonlinearity degree of the target weight coefficient is greater than the nonlinearity degree of the characteristic weight coefficient.
Step S502: based on the target weight coefficient and the training image, a first training feature is obtained.
Because the degree of nonlinearity of the feature weight coefficient is limited, in order to ensure the quality and efficiency of extracting the first training feature, after the feature weight coefficient is obtained, the feature weight coefficient may be analyzed, so that a target weight coefficient corresponding to the feature weight coefficient may be determined, specifically, determining the target weight coefficient corresponding to the feature weight coefficient may include: acquiring an activation function for increasing the non-linearity degree of the characteristic weight coefficient; and processing the characteristic weight coefficient by using an activation function to obtain a target weight coefficient.
Wherein the activation function is a pre-configured data processing function for increasing the degree of non-linearity of the feature weight coefficients. In addition, the embodiment does not limit the specific obtaining manner of the activation function, and a person skilled in the art may set the activation function according to specific application requirements and design requirements, for example: the activation function can be stored in a preset area, and the activation function can be obtained by accessing the preset area; alternatively, the activation function may be stored in a preset device, the preset device is in communication connection with the model training device, and the preset device may actively send the activation function to the model training device, or the model training device may send a function acquisition request to the preset device, so that the preset device may send the activation function to the model training device through the function acquisition request, and the model training device may stably acquire the activation function.
After the activation function is acquired, the feature weight coefficients may be analyzed by using the activation function, so that the target weight coefficients may be acquired. Specifically, processing the feature weight coefficient by using the activation function, and obtaining the target weight coefficient may include: processing the characteristic weight coefficient by using an activation function to obtain a processed weight coefficient; and carrying out normalization processing on the processed weight coefficient to obtain a target weight coefficient.
After the activation function and the feature weight coefficient are obtained, the feature weight coefficient can be processed by using the activation function, so that a processed weight coefficient can be obtained, and the non-linearity degree of the processed weight coefficient is higher than that of the feature weight coefficient. To improve the quality and efficiency of data processing, after the processed weight coefficient is obtained, normalization processing may be performed on the processed weight coefficient, so that the target weight coefficient can be obtained.
After the target weight coefficient is obtained, the target weight coefficient and the training image may be analyzed, so that the first training feature can be obtained. Specifically, obtaining the first training feature based on the target weight coefficient and the training image may include: determining the product of the target weight coefficient and the training image as the first training feature.
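The following PyTorch sketch illustrates this flow (feature weight coefficient, activation, normalization, product with the training image); the 1x1 convolution producing the weight coefficients, the sigmoid activation and the spatial normalization are assumptions made only for this example.

```python
# Sketch of the attention-based weighting in steps S401-S502. The patent only fixes the
# overall flow (weight coefficients -> activation -> normalization -> multiply with the
# training image); the specific layers and activation here are assumed for illustration.
import torch
import torch.nn as nn

class AttentionBranch(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.weight_head = nn.Conv2d(channels, 1, kernel_size=1)   # feature weight coefficients

    def forward(self, training_image):
        feature_weight = self.weight_head(training_image)
        processed_weight = torch.sigmoid(feature_weight)            # activation adds non-linearity
        target_weight = processed_weight / processed_weight.sum(dim=(2, 3), keepdim=True)
        first_training_feature = target_weight * training_image     # product with the training image
        return first_training_feature
```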
In this embodiment, the attention mechanism is utilized to process the training image to obtain the feature weight coefficient corresponding to the training image, and then the first training feature corresponding to the training image is obtained based on the feature weight coefficient and the training image.
Fig. 6 is a schematic flowchart of a process of processing a training image by using a second machine learning model to obtain a second training feature corresponding to the training image according to an embodiment of the present invention; on the basis of the foregoing embodiment, with continued reference to fig. 6, in this embodiment, the processing the training image by using the second machine learning model to obtain the second training feature corresponding to the training image may include:
step S601: and analyzing the training image by using a perspective conversion sub-network to obtain an image transformation matrix corresponding to the training image.
When the second machine learning model includes a perspective transformation sub-network for performing perspective transformation processing on the image, the perspective transformation sub-network may be used to perform perspective transformation processing on the training image, so as to obtain an image transformation matrix corresponding to the training image.
In one implementation, the analyzing the training image with the perspective transformation sub-network to obtain the image transformation matrix corresponding to the training image may include: acquiring pixel point coordinate information in a training image; and analyzing and processing the coordinate information of the pixel points by using a perspective conversion sub-network to obtain an image transformation matrix corresponding to the training image.
Specifically, after the training image is obtained, the pixel coordinate information included in the training image can be identified by using a preset identification algorithm, and then the pixel coordinate information can be analyzed and processed by using a perspective transformation sub-network, so that an image transformation matrix corresponding to the training image can be obtained, that is, perspective change processing of the training image is realized.
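A hedged sketch of such a perspective conversion sub-network is shown below: a small convolutional regressor maps the training image, and implicitly its pixel-coordinate layout, to the eight free parameters of a 3x3 image transformation matrix. All layer sizes are illustrative assumptions.

```python
# Hedged sketch of the perspective-transformation sub-network (PTN) in step S601: a small
# convolutional regressor producing the 8 free parameters of a 3x3 perspective transform
# matrix. The patent only specifies that the sub-network outputs an image transformation
# matrix; the architecture below is an assumption.
import torch
import torch.nn as nn

class PerspectiveTransformSubNetwork(nn.Module):
    def __init__(self, in_channels=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.regressor = nn.Linear(16, 8)    # 8 free parameters of the perspective matrix

    def forward(self, training_image):
        params = self.regressor(self.encoder(training_image).flatten(1))
        batch = params.shape[0]
        last = torch.ones(batch, 1, device=params.device)         # fix the last entry to 1
        matrix = torch.cat([params, last], dim=1).view(batch, 3, 3)
        return matrix                                              # image transformation matrix
```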
Step S602: based on the image transformation matrix, second training features corresponding to the training images are obtained.
After the image transformation matrix is acquired, the image transformation matrix may be analyzed, so that a second training feature corresponding to the training image may be obtained. Specifically, obtaining the second training feature corresponding to the training image based on the image transformation matrix may include: analyzing the image transformation matrix by using a residual sub-network to obtain residual characteristics corresponding to the image transformation matrix; analyzing the residual error characteristics by using a transposed convolution sub-network to obtain a weight coefficient for identifying local response of the residual error characteristics; and determining a second training feature corresponding to the training image based on the weight coefficient and the residual feature.
Specifically, after the image transformation matrix is obtained, the image transformation matrix may be analyzed by using a residual sub-network for extracting residual features, so that residual features corresponding to the image transformation matrix may be obtained. And then, the residual error features can be analyzed by using the transposed convolution sub-network to obtain the weight coefficients for identifying the local responses of the residual error features, and after the weight coefficients for identifying the local responses of the residual error features are obtained, the weight coefficients and the residual error features can be analyzed, so that the second training features corresponding to the training images can be determined.
In one implementation, determining the second training features corresponding to the training images based on the weight coefficients and the residual features may include: and determining the product of the weight coefficient and the residual error feature as a second training feature corresponding to the training image.
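The following sketch illustrates this residual-feature branch under assumed channel counts; how the image transformation matrix is actually applied to warp the image is left outside the sketch, which starts from the already transformed image.

```python
# Rough PyTorch sketch of step S602: a residual block extracts residual features from the
# perspective-corrected image, a transposed convolution head produces per-location weight
# coefficients for the local responses, and the second training feature is their product.
# Channel counts and the sigmoid on the weights are assumptions for illustration only.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))        # residual features

class SecondFeatureBranch(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.residual = ResidualBlock(channels)
        self.weight_head = nn.ConvTranspose2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, warped_image):
        residual_feat = self.residual(self.stem(warped_image))
        weights = torch.sigmoid(self.weight_head(residual_feat))   # local-response weight coefficients
        return weights * residual_feat                              # second training feature
```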
In the embodiment, the training images are analyzed and processed through the perspective transformation sub-network to obtain the image transformation matrix corresponding to the training images, and then the second training features corresponding to the training images are obtained based on the image transformation matrix, so that the accuracy and reliability of obtaining the second training features are effectively ensured, and the quality and efficiency of learning and training the model are further improved.
Fig. 7 is a schematic flowchart of a process of performing learning training based on a first training feature, a second training feature, and identity information corresponding to a preset object to obtain a target model for identifying the identity information of the preset object in an image according to an embodiment of the present invention; on the basis of the foregoing embodiment, referring to fig. 7, in this embodiment, performing learning training based on the first training feature, the second training feature, and the identity information corresponding to the preset object, and obtaining the target model for identifying the identity information of the preset object in the image may include:
step S701: and determining target feature information corresponding to the training image according to the first training feature and the second training feature.
Step S702: and performing learning training based on the target characteristic information and the identity information corresponding to the preset object to obtain a target model.
After the first training feature and the second training feature are obtained, the first training feature and the second training feature may be analyzed, so that target feature information corresponding to the training image may be determined. In one implementation, determining target feature information corresponding to the training image from the first training feature and the second training feature may include: and splicing the first training characteristic and the second training characteristic to obtain target characteristic information corresponding to the training image.
Specifically, because the first training feature and the second training feature are feature vector information that identifies different dimensions of the training image, after the first training feature and the second training feature are obtained, the first training feature and the second training feature can be spliced, so that target feature information corresponding to the training image can be obtained, and the target feature information at this time is fused with the first training feature and the second training feature.
After the target feature information is acquired, learning training may be performed based on the target feature information and identity information corresponding to a preset object, so that a target model may be obtained. In an implementation manner, performing learning training based on the target feature information and the identity information corresponding to the preset object, obtaining the target model may include: and performing learning training on the target characteristic information and the identity information corresponding to the preset object by using a cross entropy loss function to obtain a target model.
In this embodiment, the target feature information corresponding to the training image can be determined through the obtained first training feature and the second training feature, and then learning training can be performed based on the target feature information and the identity information corresponding to the preset object to obtain the target model, so that model training operation can be effectively performed based on different dimensionality features of the training image, the quality and efficiency of learning training of the target model are ensured, and the practicability of the method is further improved.
FIG. 8 is a schematic flow chart illustrating a further method for training a model according to an embodiment of the present invention; on the basis of the foregoing embodiment, with reference to fig. 8, the method in this embodiment may further include:
step S801: and acquiring the characteristic extraction operation input by the user aiming at the training image.
Step S802: and acquiring a third training feature corresponding to the training image according to the feature extraction operation.
Step S803: and performing learning training based on the first training characteristic, the second training characteristic, the third training characteristic and the identity information corresponding to the preset object to obtain a target model for identifying the identity information of the preset object in the image.
After the training image is obtained, in order to make the target model applicable to more application scenarios and meet the application requirements of different users, the user can input a feature extraction operation for the training image; different feature extraction operations can be configured for different application scenarios. When the feature extraction operation input by the user for the training image is obtained, a third training feature corresponding to the training image may be obtained according to the feature extraction operation.
After the third training feature is obtained, learning training may be performed by combining the first training feature, the second training feature, the third training feature, and the identity information corresponding to the preset object, so as to obtain a target model for identifying the identity information of the preset object in the image.
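As a sketch of how such a user-supplied feature extraction operation could feed a third training feature into the fused training input, consider the following; the per-channel color mean used here is a hypothetical example of a user-defined operation, not one prescribed by the patent.

```python
# Illustrative sketch of a user-supplied feature-extraction operation contributing a third
# training feature that is fused with the first two before training. The per-channel color
# mean is a hypothetical example of such an operation.
import torch

def user_feature_extraction(training_image):
    """Hypothetical user-defined operation: per-channel color mean as a crude feature."""
    return training_image.mean(dim=(2, 3))          # shape (batch, channels)

def build_training_input(first_feat, second_feat, training_image):
    # assumes first_feat and second_feat are already pooled into (batch, dim) vectors
    third_feat = user_feature_extraction(training_image)
    return torch.cat([first_feat, second_feat, third_feat], dim=1)   # fused features for training
```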
In this embodiment, a third training feature corresponding to the training image is obtained according to the feature extraction operation by obtaining the feature extraction operation input by the user for the training image, and a target model for identifying the identity information of the preset object in the image is obtained based on the first training feature, the second training feature, the third training feature and the identity information corresponding to the preset object, so that the training features meeting different application scenarios and application requirements are effectively selected based on the interactive operation with the user, and the quality and the practicability of training the target model are further improved.
On the basis of any one of the above embodiments, the method in this embodiment may further include: and providing a data application interface corresponding to the target model so as to call the target model to perform data processing operation through the data application interface.
After the target model is trained and generated, in order to facilitate calling of the target model, a data application interface corresponding to the target model may be provided, and the data application interface is used for providing service applications corresponding to the target model through a network, so that corresponding data processing operations using the target model are facilitated.
For example, when a model update request for performing an update operation on the target model is obtained, the model update operation may be performed on the target model through the data application interface according to the model update request, thereby effectively achieving an optimized update operation on the data processing capability of the target model according to the model update request, and further improving the training capability of the target model. When a data processing request for the target model is acquired, the target model can be called through the data application interface to perform data processing operation according to the data processing request, so that the data processing operation can be effectively performed according to the data processing request and the target model, and the quality and the efficiency of applying the target model are further improved.
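A minimal sketch of such a data application interface, assuming an HTTP service built with Flask, is given below; the route names, payload fields and the helper functions are hypothetical placeholders.

```python
# Minimal sketch of a data application interface for the trained target model, assuming an
# HTTP service built with Flask. The routes, payload fields and the target_model_predict /
# apply_model_update helpers are hypothetical; the patent only requires that the interface
# lets callers invoke the target model (and request model updates) over a network.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/target-model/identify", methods=["POST"])
def identify():
    image_features = request.json["features"]          # caller-supplied image features
    identity = target_model_predict(image_features)    # hypothetical call into the target model
    return jsonify({"identity": identity})

@app.route("/target-model/update", methods=["POST"])
def update_model():
    apply_model_update(request.json)                    # hypothetical model update operation
    return jsonify({"status": "update scheduled"})

def target_model_predict(features):
    raise NotImplementedError("wire in the trained target model here")

def apply_model_update(update_request):
    raise NotImplementedError("wire in the model update logic here")

if __name__ == "__main__":
    app.run()
```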
Fig. 9 is a schematic flowchart of a moving object identification method according to an embodiment of the present invention; referring to fig. 9, the present embodiment provides a moving object recognition method, and the execution subject of the method may be a moving object recognition apparatus, and it is understood that the moving object recognition apparatus may be implemented as software, or a combination of software and hardware. Specifically, the moving object identification method may include:
step S901: the method comprises the steps of obtaining an image to be processed and a plurality of reference images, wherein the image to be processed comprises a moving object to be identified, and the reference images comprise a plurality of reference moving objects.
Step S902: the method comprises the steps of determining a feature to be analyzed corresponding to an image to be processed and a reference feature corresponding to a reference image.
Step S903: analyzing the feature to be analyzed and the reference features by using a machine learning model to determine, among the plurality of reference moving objects, a target moving object corresponding to the moving object to be recognized, wherein the machine learning model is trained to recognize the identity information of a moving object in an image based on image features.
The following is a detailed description of the above steps:
step S901: the method comprises the steps of obtaining an image to be processed and a plurality of reference images, wherein the image to be processed comprises a moving object to be identified, and the reference images comprise a plurality of reference moving objects.
The moving object to be recognized can be any object requiring an identity recognition operation; specifically, it may include a moving vehicle, a moving robot, a moving unmanned aerial vehicle, a moving person, a moving animal, and the like. The reference moving object may refer to a moving object having standard identity information, and the reference moving objects are used for implementing identity recognition on the image to be processed.
In addition, the number of the images to be processed may be one or more. When there are multiple images to be processed, clustering processing may be performed on them to obtain one or more images to be processed corresponding to one or more moving objects to be recognized. When the same moving object to be recognized corresponds to multiple images to be processed, those images may refer to image information of the same moving object to be recognized at different times and in different spaces; for example, when the moving object to be recognized includes a moving vehicle, the multiple images to be processed may refer to running image information of the same moving vehicle on different roads at different times.
In addition, the embodiment does not limit the specific implementation manner for acquiring the image to be processed and the multiple reference images, and those skilled in the art may set the method according to specific application requirements, for example: the moving object recognition device can be in communication connection with an image acquisition device (for example, a camera on a road and a camera in a preset space), an image to be processed can be acquired through the image acquisition device, and then the image acquisition device can send the acquired image to be processed to the moving object recognition device, so that the moving object recognition device can acquire the image to be processed including the moving object to be recognized. In addition, when the identity information of the moving object to be recognized in the image to be processed is recognized, the image to be processed in which the identity information is recognized may be determined as a reference image, and the reference image may be stored in a preset area to obtain a plurality of reference images by accessing the preset area.
In an implementable manner, a plurality of objects and a plurality of first images corresponding to the plurality of objects are preset, and the preset object may be any one of the plurality of objects, and the image to be processed including the moving object to be recognized may be at least a part of the plurality of first images. At this time, acquiring the to-be-processed image including the moving object to be recognized in the present embodiment may include: acquiring a plurality of first images corresponding to a plurality of objects, wherein the plurality of objects comprise moving objects to be identified; and clustering the plurality of first images to obtain at least one image corresponding to the moving object to be identified.
The plurality of first images corresponding to the plurality of objects may be stored in a preset area and acquired by accessing the preset area; alternatively, the plurality of first images corresponding to the plurality of objects may be acquired by an image acquisition device, for example: when the preset object is a vehicle, the image acquisition device may be applied to an application scene of road monitoring, and when the image acquisition device is a camera located on a road, the plurality of first images may be all vehicle-passing images on all roads acquired by the camera within a preset time period.
After the plurality of first images corresponding to the plurality of objects are acquired, clustering processing may be performed on the plurality of first images. For convenience of understanding, a vehicle is taken as an example of the moving object to be recognized: the plurality of first images may be clustered based on license plate information, or based on preset image feature information, or according to a preset clustering rule, so that at least one vehicle image corresponding to the vehicle may be obtained.
For example, the plurality of first images corresponding to the plurality of vehicles exist as follows: an image a corresponding to the vehicle a, an image B corresponding to the vehicle B, an image C corresponding to the vehicle C, an image D corresponding to the vehicle a, an image E corresponding to the vehicle a, an image F corresponding to the vehicle B, and an image G corresponding to the vehicle C, then after acquiring the plurality of first images corresponding to the plurality of vehicles, the plurality of first images may be subjected to clustering processing, so that at least one vehicle image (including: the image a, the image D, and the image E) corresponding to the vehicle a, at least one vehicle image (including: the image B and the image F) corresponding to the vehicle B, and at least one vehicle image (including: the image C and the image G) corresponding to the vehicle C may be obtained. For the vehicle a, the image D, and the image E may refer to image information acquired for the vehicle a at different times and on different roads; similarly, for vehicle B, image B and image F may refer to image information collected for vehicle B at different times and on different roads; for vehicle C, image C and image G may refer to image information captured for vehicle C at different times and on different roads.
It is understood that the moving object to be recognized may be at least one of the plurality of vehicles. For example, when the vehicle A is the moving object to be recognized, the at least one vehicle image corresponding to the moving object to be recognized may include the image A, the image D, and the image E. When the vehicle B is the moving object to be recognized, the at least one vehicle image corresponding to the moving object to be recognized may include the image B and the image F. When the vehicle C is the moving object to be recognized, the at least one vehicle image corresponding to the moving object to be recognized may include the image C and the image G.
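A minimal sketch of this clustering step, assuming for illustration that a recognized license plate string is already available for each first image (license plate information is only one of the clustering bases mentioned above):

```python
from collections import defaultdict

def cluster_by_plate(first_images):
    """Group first images by license plate so that each moving object (vehicle)
    maps to all of its captured images.

    first_images: list of (image_id, plate) tuples; the plate string is assumed
    to be already recognized, which is an illustrative assumption."""
    clusters = defaultdict(list)
    for image_id, plate in first_images:
        clusters[plate].append(image_id)
    return dict(clusters)

# Reproduces the worked example: vehicle A -> images A, D, E, and so on.
images = [("A", "vehA"), ("B", "vehB"), ("C", "vehC"),
          ("D", "vehA"), ("E", "vehA"), ("F", "vehB"), ("G", "vehC")]
print(cluster_by_plate(images))
# {'vehA': ['A', 'D', 'E'], 'vehB': ['B', 'F'], 'vehC': ['C', 'G']}
```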
By acquiring a plurality of first images corresponding to a plurality of moving objects and clustering the plurality of first images, the images corresponding to the moving object to be identified are obtained, which can improve the quality and efficiency of analyzing and processing the images; and because the moving object to be identified corresponds to at least one image, which may include images taken at different shooting angles and in different shooting environments, the accuracy and reliability of identifying the moving object to be identified can be effectively improved.
Step S902: the method comprises the steps of determining a feature to be analyzed corresponding to an image to be processed and a reference feature corresponding to a reference image.
After the image to be processed and the reference image are acquired, the image to be processed and the reference image may be analyzed, so that a feature to be analyzed corresponding to the image to be processed and a reference feature corresponding to the reference image may be determined.
Step S903: analyzing the feature to be analyzed and the reference features by using a machine learning model, and determining, from the plurality of reference moving objects, a target moving object corresponding to the moving object to be recognized, wherein the machine learning model is trained to recognize the identity information of a moving object in an image based on image features.
After the feature to be analyzed and the reference features are obtained, they can be input into the machine learning model for analysis processing, so that the identity information of the target moving object corresponding to the moving object to be identified is determined among the plurality of reference moving objects; the machine learning model is trained to recognize the identity information of a moving object in an image based on image features. Specifically, as shown in fig. 10, in the present embodiment, analyzing the feature to be analyzed and the reference features by using the machine learning model and determining, from the plurality of reference moving objects, the target moving object corresponding to the moving object to be recognized may include:
step S9031: and acquiring similarity information between the features to be analyzed and the reference features by using a machine learning model.
The obtaining of the similarity information between the feature to be analyzed and the reference feature by using the machine learning model may include: obtaining a cosine distance or an Euclidean distance between the feature to be analyzed and the reference feature by using a machine learning model; and determining similarity information between the feature to be analyzed and the reference feature based on the cosine distance or the Euclidean distance.
Specifically, after the cosine distance or the euclidean distance is obtained, the difference between 1 and the cosine distance is determined as the similarity information between the feature to be analyzed and the reference feature, or the difference between 1 and the euclidean distance is determined as the similarity information between the feature to be analyzed and the reference feature.
Of course, those skilled in the art may also use other methods to obtain the similarity information between the feature to be analyzed and the reference feature, as long as the accuracy and reliability of obtaining the similarity information can be ensured, which is not described herein again.
Step S9032: and determining the reference moving object corresponding to the reference feature with the maximum similarity information as the target moving object corresponding to the moving object to be identified.
Since the number of the reference images may be multiple, and the number of the reference features corresponding to the reference images is also multiple, the number of the similarity information between the features to be analyzed and the reference features is also multiple. After the plurality of pieces of similarity information are acquired, the reference feature with the largest similarity information can be identified, and then the reference moving object corresponding to the reference feature with the largest similarity information can be determined as the target moving object corresponding to the moving object to be identified.
For example, the reference image may include a reference image a, a reference image B, and a reference image C, and the reference feature may include a reference feature a, a reference feature B, and a reference feature C. The similarity between the feature to be analyzed and the reference feature a is similarity a, the similarity between the feature to be analyzed and the reference feature b is similarity b, and the similarity between the feature to be analyzed and the reference feature c is similarity c.
Then, the similarity a, the similarity b, and the similarity c can be analyzed and compared; if the reference feature with the largest similarity information is reference feature b, the reference image B corresponding to reference feature b can be determined as the target reference image, and the reference moving object in the reference image B can be determined as the target moving object corresponding to the moving object to be identified.
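A minimal sketch of steps S9031 and S9032, assuming the feature to be analyzed and the reference features have already been extracted as vectors (NumPy is used purely for illustration):

```python
import numpy as np

def cosine_distance(x, y, eps=1e-12):
    """Cosine distance between two feature vectors; per the description above,
    similarity information is 1 minus this distance."""
    cos_sim = float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + eps))
    return 1.0 - cos_sim

def match_moving_object(feature_to_analyze, reference_features, reference_objects):
    """Step S9031/S9032 sketch: compute similarity to every reference feature and
    return the reference moving object with the largest similarity."""
    similarities = [1.0 - cosine_distance(feature_to_analyze, ref)
                    for ref in reference_features]
    best = int(np.argmax(similarities))
    return reference_objects[best], similarities[best]

# Illustrative use with random vectors standing in for extracted features.
refs = [np.random.rand(128) for _ in range(3)]
query = refs[1] + 0.01 * np.random.rand(128)
print(match_moving_object(query, refs, ["vehicle A", "vehicle B", "vehicle C"]))
```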
According to the moving object identification method provided by this embodiment, the image to be processed and the plurality of reference images are acquired, the feature to be analyzed corresponding to the image to be processed and the reference features corresponding to the reference images are determined, and the feature to be analyzed and the reference features are then analyzed and processed among the plurality of reference moving objects by using the machine learning model, so that the moving object in the image can be quickly and accurately identified under different shooting parameters and environmental conditions, effectively improving the quality and efficiency of identifying the moving object in the image and further improving the practicability of the method.
FIG. 11 is a schematic flowchart of determining a feature to be analyzed corresponding to an image to be processed according to an embodiment of the present invention; on the basis of the foregoing embodiment, with reference to fig. 11, in this embodiment, a specific implementation manner of determining a feature to be analyzed corresponding to an image to be processed is not limited, and a person skilled in the art may set the feature according to specific application requirements and design requirement information, and preferably, the determining the feature to be analyzed corresponding to the image to be processed in this embodiment may include:
step S1001: and processing the image to be processed by utilizing the first machine learning model to obtain a first characteristic corresponding to the image to be processed.
The first machine learning model is a model trained in advance for extracting image features, and optionally, the first machine learning model may include an attention mechanism, and at this time, after the image to be processed is acquired, the image to be processed may be analyzed and processed by using the attention mechanism, so that first features corresponding to the image to be processed may be acquired, and the acquired first features may include local features and global features of the image to be processed.
In an implementation manner, processing the image to be processed by using the first machine learning model, and obtaining the first feature corresponding to the image to be processed may include: processing the image to be processed by using an attention mechanism to obtain a first characteristic weight coefficient corresponding to the image to be processed; and obtaining a first feature corresponding to the image to be processed based on the first feature weight coefficient and the image to be processed.
After the first feature weight coefficient is acquired, the first feature weight coefficient and the image to be processed may be analyzed to obtain a first feature corresponding to the image to be processed. Specifically, obtaining the first feature corresponding to the image to be processed based on the first feature weight coefficient and the image to be processed may include: determining a target weight coefficient corresponding to the first characteristic weight coefficient, wherein the degree of nonlinearity of the target weight coefficient is greater than that of the first characteristic weight coefficient; and obtaining a first characteristic based on the target weight coefficient and the image to be processed.
In one implementable manner, determining the target weight coefficient corresponding to the first characteristic weight coefficient may comprise: acquiring an activation function for increasing the degree of nonlinearity of the first feature weight coefficient; and obtaining a target weight coefficient by using the activation function and the first characteristic weight coefficient.
In one implementation, obtaining the target weight coefficients using the activation function and the first feature weight coefficients may include: processing the first characteristic weight coefficient by using an activation function to obtain a processed weight coefficient; and carrying out normalization processing on the processed weight coefficient to obtain a target weight coefficient.
In an implementable manner, obtaining the first feature based on the target weight coefficient and the image to be processed may comprise: and determining the product of the target weight coefficient and the image to be processed as the first characteristic.
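A minimal PyTorch sketch of this first-feature pipeline; the 1x1 convolution used to produce the first feature weight coefficient and the sigmoid activation are illustrative assumptions, since the disclosure does not fix these choices:

```python
import torch
import torch.nn as nn

class FirstFeatureAttention(nn.Module):
    """Sketch of the first-feature pipeline: a weight coefficient is computed for
    the image to be processed, its non-linearity is increased with an activation,
    the result is normalized, and the product with the input gives the first
    feature. The 1x1 convolution and sigmoid are illustrative assumptions."""
    def __init__(self, channels=3):
        super().__init__()
        self.coeff_net = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        w = self.coeff_net(x)                                # first feature weight coefficient
        w = torch.sigmoid(w)                                 # activation increases non-linearity
        w = w / (w.amax(dim=(2, 3), keepdim=True) + 1e-12)   # normalization step
        return w * x                                         # first feature = target weights * image
```

With a preprocessed batch x of shape (B, 3, H, W), FirstFeatureAttention(3)(x) returns a tensor of the same shape that serves as the first feature.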
The implementation process and the technical effect of the method in this embodiment are similar to the implementation process and the technical effect of extracting the first training feature in the embodiment described above, and details of the embodiment not described in detail herein may refer to the relevant description of the embodiment shown in fig. 4 to 7, and are not described herein again.
Step S1002: and processing the image to be processed by utilizing a second machine learning model to obtain a second feature corresponding to the image to be processed, wherein the first machine learning model is different from the second machine learning model, and the first feature is different from the second feature.
The second machine learning model is different from the first machine learning model, and is a model trained in advance for extracting image features, and in particular, the second machine learning model may include a perspective transformation sub-network, a residual sub-network and a transposed convolution sub-network, where the perspective transformation sub-network is configured to perform perspective transformation on an image, the residual sub-network is configured to extract residual features of the image, and the transposed convolution sub-network is configured to obtain weight coefficients of local responses of the residual features.
In an implementable manner, processing the image to be processed using the second machine learning model, obtaining the second feature corresponding to the image to be processed may comprise: analyzing and processing the image to be processed by utilizing a perspective conversion sub-network to obtain an image transformation matrix corresponding to the image to be processed; and obtaining a second characteristic corresponding to the image to be processed based on the image transformation matrix.
In an implementation manner, the analyzing the image to be processed by using the perspective transformation subnetwork to obtain the image transformation matrix corresponding to the image to be processed may include: acquiring pixel point coordinate information in an image to be processed; and analyzing and processing the coordinate information of the pixel points by using a perspective conversion sub-network to obtain an image transformation matrix corresponding to the image to be processed.
In one implementation, obtaining the second feature corresponding to the image to be processed based on the image transformation matrix may include: analyzing the image transformation matrix by using a residual sub-network to obtain residual characteristics corresponding to the image transformation matrix; analyzing the residual error characteristics by using a transposed convolution sub-network to obtain a weight coefficient for identifying local response of the residual error characteristics; and determining a second feature corresponding to the image to be processed based on the weight coefficient and the residual feature.
In one implementable approach, determining the second feature corresponding to the image to be processed based on the weight coefficients and the residual features may comprise: and determining the product of the weight coefficient and the residual error characteristic as a second characteristic corresponding to the image to be processed.
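A structural PyTorch sketch of the second machine learning model described above; the layer sizes, the identity initialization of the transformation matrix, and the use of kornia's warp_perspective for the perspective transformation are all illustrative assumptions:

```python
import torch
import torch.nn as nn
from kornia.geometry.transform import warp_perspective  # assumed dependency for the warp

class SecondFeatureModel(nn.Module):
    """Structural sketch: a perspective transformation sub-network predicts an
    image transformation matrix, a residual sub-network extracts residual
    features, and a transposed-convolution sub-network yields weight coefficients
    for the local responses; the second feature is their product."""
    def __init__(self, in_ch=3, feat=16):
        super().__init__()
        # Perspective conversion sub-network: predicts a 3x3 transformation matrix theta.
        self.perspective_net = nn.Sequential(
            nn.Conv2d(in_ch, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 9))
        # Initialize to the identity transform, a common practice for spatial transformers.
        nn.init.zeros_(self.perspective_net[-1].weight)
        with torch.no_grad():
            self.perspective_net[-1].bias.copy_(torch.eye(3).flatten())
        # Residual sub-network: extracts residual features from the transformed image.
        self.residual_net = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, padding=1))
        # Transposed-convolution sub-network: weight coefficients of local responses.
        self.weight_net = nn.Sequential(
            nn.Conv2d(feat, feat, 3, padding=1), nn.BatchNorm2d(feat), nn.ReLU(),
            nn.ConvTranspose2d(feat, feat, 3, padding=1), nn.Sigmoid())

    def forward(self, x):
        b, _, h, w = x.shape
        theta = self.perspective_net(x).view(b, 3, 3)      # image transformation matrix
        warped = warp_perspective(x, theta, dsize=(h, w))  # perspective-transformed image
        residual = self.residual_net(warped)               # residual features
        weights = self.weight_net(residual)                # local-response weight coefficients
        return weights * residual                          # second feature
```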
The implementation process and technical effect of the method in this embodiment are similar to the implementation process and technical effect of extracting the second training feature in the embodiment described above, and details of the embodiment not described in detail herein may refer to the relevant description of the embodiment shown in fig. 4 to 7, and are not described herein again.
Step S1003: and determining the feature to be analyzed corresponding to the image to be processed based on the first feature and the second feature.
After the first feature and the second feature are acquired, the first feature and the second feature may be subjected to analysis processing to determine a feature to be analyzed corresponding to the image to be processed. Since the first feature and the second feature may each be feature vector information of different dimensions of the image to be processed, in one implementable manner, determining the feature to be analyzed corresponding to the image to be processed based on the first feature and the second feature may comprise: and performing splicing processing on the first characteristic and the second characteristic, so that the characteristic to be analyzed corresponding to the image to be processed can be determined.
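For instance, when the first feature and the second feature are (or are flattened into) feature vectors, the splicing can be a simple concatenation along the feature dimension, as in this minimal sketch:

```python
import torch

first_feature = torch.randn(1, 256)    # illustrative dimensions
second_feature = torch.randn(1, 128)

# Splice (concatenate) the two feature vectors along the feature dimension
# to obtain the feature to be analyzed.
feature_to_analyze = torch.cat([first_feature, second_feature], dim=1)
print(feature_to_analyze.shape)  # torch.Size([1, 384])
```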
The implementation process and technical effect of the method in this embodiment are similar to those of the method in the embodiments shown in fig. 4 to 7 in the embodiments described above, and parts not described in detail in this embodiment may refer to the related description of the embodiments shown in fig. 4 to 7, and are not described again here.
In this embodiment, the first machine learning model and the second machine learning model are respectively utilized to process the image to be processed, so as to obtain the first feature and the second feature corresponding to the image to be processed, and then the feature to be analyzed corresponding to the image to be processed can be determined based on the first feature and the second feature, so that the accuracy and reliability of obtaining the feature to be analyzed corresponding to the image to be processed are effectively ensured, and the quality and the effect of identifying the moving object in the image are further improved.
In a specific application, referring to fig. 12, a vehicle image is taken as an image to be processed, and a moving vehicle to be recognized included in the vehicle image is taken as a moving object to be recognized for description. Specifically, the method may include:
step 1: the client sends the vehicle image to be identified to the vehicle identification device, wherein the vehicle image to be identified comprises the moving vehicle to be identified.
Step 2: the vehicle identification device receives the image of the vehicle to be identified sent by the client.
Step 3: performing a feature extraction operation on the vehicle image to be recognized to obtain a first feature and a second feature of the vehicle image to be recognized.
A first machine learning model (attention mechanism) is used to perform a feature extraction operation on the vehicle image to be recognized to obtain the first feature. Specifically, the obtained vehicle image to be recognized may be matrix image data subjected to a preprocessing operation; therefore, when the first machine learning model (attention mechanism) is used to perform the feature extraction operation on the vehicle image to be recognized, the features of the vehicle to be identified at different positions (individual matrix elements in the matrix image data) may be weighted by using the attention mechanism, so that a CNN feature map (i.e., the first feature) may be obtained.
Specifically, the obtaining the CNN feature map by performing weighting processing on features of the vehicle to be identified at different positions by using an attention mechanism may include: analyzing and processing the vehicle image to be recognized by using an attention mechanism network to obtain a channel feature descriptor (i.e. a weight coefficient for weighting each element in the matrix), wherein the channel feature descriptor can be expressed by the following formula:
(The channel feature descriptor formula is reproduced only as an image in the original publication.) In this formula, p and ε are preset parameters used to avoid the norm ||m|| being 0, with p = 2 and ε = 1e-12; m is the channel feature descriptor, m ∈ R^(H×W), where H is the height of the vehicle image to be identified and W is the width of the vehicle image to be identified.
After the channel feature descriptors are obtained, the channel feature descriptors can be operated through an activation function, normalization processing is carried out on the operated result, the value of the normalized result is limited to be 0-1, and therefore a channel attention feature map can be obtained, and then the channel attention feature map and the channel feature descriptors can be multiplied to obtain a CNN feature map subjected to channel processing.
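The quoted values p = 2 and ε = 1e-12 coincide with the defaults of a standard p-norm normalization, so one hedged reading of this channel-processing step is sketched below; the p-norm descriptor, the sigmoid activation, the min-max normalization and multiplying the attention map with the feature map are interpretations of the description, not the exact formula (which appears only as an image):

```python
import torch

def channel_processed_feature_map(x, p=2, eps=1e-12):
    """x: CNN feature map of shape (B, C, H, W). Sketch only; the exact
    descriptor formula appears as an image in the original publication."""
    # Channel feature descriptor m in R^{H x W}: p-norm over the channel dimension,
    # with eps preventing the norm from being 0.
    m = x.norm(p=p, dim=1, keepdim=True).clamp_min(eps)   # (B, 1, H, W)
    attn = torch.sigmoid(m)                               # activation function
    lo = attn.amin(dim=(2, 3), keepdim=True)
    hi = attn.amax(dim=(2, 3), keepdim=True)
    attn = (attn - lo) / (hi - lo + eps)                  # limit values to 0..1
    return attn * x                                       # channel-processed CNN feature map
```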
And performing feature extraction operation on the vehicle image to be recognized by using a second machine learning model to obtain a second feature, wherein the second machine learning model can comprise a perspective conversion sub-network, a residual sub-network and a transposed convolution sub-network which are formed by a convolution neural network. The perspective transformation sub-network is used for carrying out perspective transformation processing on the image, the residual sub-network is used for extracting residual characteristics of the image, and the transposed convolution sub-network is used for acquiring weight coefficients of local responses of the residual characteristics.
Specifically, as shown in fig. 13, performing a feature extraction operation on the to-be-recognized vehicle image by using the second machine learning model, and obtaining the second feature may include:
step 31: and analyzing and processing the vehicle image to be identified by utilizing the perspective conversion sub-network to obtain an image transformation matrix corresponding to the vehicle image to be identified.
The vehicle image to be identified is input to a perspective conversion sub-network formed by two layers of convolutional neural networks for processing, so that an image transformation matrix theta can be obtained, and the image transformation matrix theta can be expressed as follows:
(x_i^s, y_i^s, 1)^T = θ · (x_i^t, y_i^t, 1)^T

wherein θ is the image transformation matrix, (x_i^t, y_i^t) are the pixel point coordinates of the output feature map, and (x_i^s, y_i^s) are the pixel point coordinates of the input original feature map.
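To make this relation concrete, a hedged pure-PyTorch sketch of applying θ (mapping each output pixel coordinate through the transformation matrix and sampling the input there) is given below; working in normalized coordinates in [-1, 1] is an implementation assumption:

```python
import torch
import torch.nn.functional as F

def warp_with_theta(image, theta):
    """Sketch: image is (B, C, H, W); theta is a (B, 3, 3) image transformation
    matrix. Each output pixel coordinate (x_t, y_t, 1) is mapped through theta
    to a source coordinate (x_s, y_s), where the input feature map is sampled."""
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    ones = torch.ones_like(xs)
    tgt = torch.stack([xs, ys, ones]).reshape(3, -1)       # homogeneous output coords
    src = theta @ tgt.unsqueeze(0).expand(b, -1, -1)       # (B, 3, H*W)
    src = src[:, :2] / src[:, 2:3].clamp_min(1e-8)         # perspective divide
    grid = src.permute(0, 2, 1).reshape(b, h, w, 2)        # (B, H, W, 2) as (x, y)
    return F.grid_sample(image, grid, align_corners=True)
```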
Step 32: and analyzing the image transformation matrix by using a residual sub-network to obtain residual characteristics corresponding to the image transformation matrix.
Step 33: the residual features are analyzed by using a Transposed Convolutional subnetwork (CTL), and weight coefficients for identifying local responses of the residual features are obtained.
The transposed convolution sub-network may include a vector convolution operation Unit conv, a Batch Normalization Unit (BN for short), a Linear rectification function (ReLU) Unit, and a transposed convolution Unit (trans-conv for short).
Step 34: and determining a second feature corresponding to the vehicle image to be identified based on the weight coefficient and the residual feature.
Step 4: splicing the first feature and the second feature to obtain the target feature of the vehicle image to be identified.
Step 5: acquiring a preset reference image and reference image features corresponding to the preset reference image.
Step 6: inputting the target features and the reference image features into a machine learning model for analysis processing to obtain the identity information of the vehicle to be recognized in the vehicle image to be recognized.
Specifically, a cosine distance between the depth feature of the reference image and the depth feature of the vehicle image to be recognized is calculated by using a machine learning model, wherein the cosine distance can be obtained by the following formula:
s = 1 - (x · y) / (||x|| · ||y||)

wherein x is the depth feature of the reference image, y is the depth feature of the vehicle image to be identified, and s is the cosine distance between the depth feature of the reference image and the depth feature of the vehicle image to be identified.
After the cosine distance is acquired, the similarity (1 - s) between the reference image and the vehicle image to be recognized can be determined according to the cosine distance, and the vehicle identity in the reference image with the maximum similarity is determined as the identity information of the vehicle to be recognized in the vehicle image to be recognized, so that the vehicle identity of the vehicle image to be recognized is effectively recognized.
The vehicle identification method provided by this application embodiment combines big data with a visual algorithm to realize an effective feature extraction operation on the vehicle picture, thereby effectively alleviating the inaccuracy caused in the prior art by imprecise calibration of the longitude and latitude of camera points and by uncalibrated camera clocks. In addition, by fully drawing on historical experience with machine learning and big data technologies, the retrieval accuracy of vehicle pictures can be improved and the working efficiency of vehicle identification can be increased. Moreover, the local features and/or global features of the vehicle can be accurately identified, and the local features can reveal details that are not easily found from the global features, so that vehicles with higher similarity can be accurately identified, further improving the practicability of the method.
FIG. 14 is a schematic structural diagram of a model training apparatus according to an embodiment of the present invention; referring to fig. 14, the present embodiment provides a model training apparatus that can perform the model training method shown in fig. 1, and the model training apparatus may include: a first acquisition module 11, a first processing module 12 and a first training module 13, in particular,
a first obtaining module 11, configured to obtain a training image including a preset object and identity information corresponding to the preset object;
the first processing module 12 is configured to process the training image by using the first machine learning model to obtain a first training feature corresponding to the training image;
the first processing module 12 is configured to process the training image by using a second machine learning model to obtain a second training feature corresponding to the training image, where the first machine learning model is different from the second machine learning model, and the first training feature is different from the second training feature;
the first training module 13 is configured to perform learning training based on the first training feature, the second training feature, and the identity information corresponding to the preset object, and obtain a target model for identifying the identity information of the preset object in the image.
In some examples, the number of the training images is multiple, and after the training image including the preset object is acquired, the first processing module 12 in this embodiment may be further configured to perform: performing pixel normalization processing on the plurality of training images to obtain a plurality of intermediate images corresponding to the plurality of training images; the plurality of intermediate images are subjected to color gamut conversion processing to obtain a plurality of target training images corresponding to the plurality of training images.
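A minimal sketch of this preprocessing, in which the target value range and the BGR-to-RGB conversion are assumptions standing in for the pixel normalization and color gamut conversion named above:

```python
import numpy as np

def preprocess_training_image(image_bgr):
    """image_bgr: uint8 array of shape (H, W, 3). Sketch only; the target value
    range and color space are illustrative assumptions."""
    intermediate = image_bgr.astype(np.float32) / 255.0   # pixel normalization -> intermediate image
    target = intermediate[..., ::-1].copy()               # color gamut conversion, here BGR -> RGB
    return target
```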
In some examples, the first machine learning model includes an attention mechanism and the second machine learning model includes a perspective transformation sub-network, a residual sub-network, and a transposed convolution sub-network comprised of a convolutional neural network.
In some examples, in processing the training image using the first machine learning model to obtain the first training features corresponding to the training image, the first processing module 12 may be configured to perform: processing the training image by using an attention mechanism to obtain a characteristic weight coefficient corresponding to the training image; based on the feature weight coefficient and the training image, a first training feature corresponding to the training image is obtained.
In some examples, in obtaining the first training feature corresponding to the training image based on the feature weight coefficient and the training image, the first processing module 12 may be configured to: determining a target weight coefficient corresponding to the characteristic weight coefficient, wherein the nonlinearity degree of the target weight coefficient is greater than the nonlinearity degree of the characteristic weight coefficient; based on the target weight coefficient and the training image, a first training feature is obtained.
In some examples, in determining the target weight coefficients corresponding to the feature weight coefficients, the first processing module 12 may be configured to perform: acquiring an activation function for increasing the non-linearity degree of the characteristic weight coefficient; and processing the characteristic weight coefficient by using an activation function to obtain a target weight coefficient.
In some examples, when the activation function is used to process the feature weight coefficients to obtain the target weight coefficients, the first processing module 12 may be configured to perform: processing the characteristic weight coefficient by using an activation function to obtain a processed weight coefficient; and carrying out normalization processing on the processed weight coefficient to obtain a target weight coefficient.
In some examples, in obtaining the first training feature based on the target weight coefficient and the training image, the first processing module 12 may be configured to perform: and determining the product of the target weight coefficient and the training image as the first training characteristic.
In some examples, when processing the training image with the second machine learning model to obtain the second training features corresponding to the training image, the first processing module 12 may be configured to perform: analyzing and processing the training image by utilizing a perspective conversion sub-network to obtain an image transformation matrix corresponding to the training image; based on the image transformation matrix, second training features corresponding to the training images are obtained.
In some examples, when the training image is analyzed and processed by the perspective transformation sub-network to obtain an image transformation matrix corresponding to the training image, the first processing module 12 may be configured to perform: acquiring pixel point coordinate information in a training image; and analyzing and processing the coordinate information of the pixel points by using a perspective conversion sub-network to obtain an image transformation matrix corresponding to the training image.
In some examples, in obtaining the second training feature corresponding to the training image based on the image transformation matrix, the first processing module 12 may be configured to perform: analyzing the image transformation matrix by using a residual sub-network to obtain residual characteristics corresponding to the image transformation matrix; analyzing the residual error characteristics by using a transposed convolution sub-network to obtain a weight coefficient for identifying local response of the residual error characteristics; and determining a second training feature corresponding to the training image based on the weight coefficient and the residual feature.
In some examples, in determining the second training feature corresponding to the training image based on the weight coefficient and the residual feature, first processing module 12 may be configured to perform: and determining the product of the weight coefficient and the residual error feature as a second training feature corresponding to the training image.
In some examples, when performing learning training based on the first training feature, the second training feature, and the identity information corresponding to the preset object to obtain a target model for identifying the identity information of the preset object in the image, the first training module 13 may be configured to perform: determining target feature information corresponding to the training image according to the first training feature and the second training feature; and performing learning training based on the target characteristic information and the identity information corresponding to the preset object to obtain a target model.
In some examples, in determining target feature information corresponding to a training image based on the first training feature and the second training feature, the first training module 13 may be configured to perform: and splicing the first training characteristic and the second training characteristic to obtain target characteristic information corresponding to the training image.
In some examples, when performing learning training based on the target feature information and the identity information corresponding to the preset object to obtain the target model, the first training module 13 may be configured to perform: and performing learning training on the target characteristic information and the identity information corresponding to the preset object by using a cross entropy loss function to obtain a target model.
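A minimal PyTorch sketch of this learning-training step with a cross entropy loss; the linear classifier head over identity labels and the Adam optimizer are illustrative assumptions:

```python
import torch
import torch.nn as nn

def train_target_model(target_features, identity_labels, num_identities, epochs=10):
    """target_features: (N, D) tensor of spliced training features;
    identity_labels: (N,) long tensor of identity indices for the preset objects.
    A linear classifier head is an illustrative stand-in for the target model head."""
    classifier = nn.Linear(target_features.shape[1], num_identities)
    optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        logits = classifier(target_features)
        loss = criterion(logits, identity_labels)   # cross entropy on identity information
        loss.backward()
        optimizer.step()
    return classifier
```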
In some examples, the first obtaining module 11, the first processing module 12 and the first training module 13 in this embodiment may be further configured to perform the following steps:
a first obtaining module 11, configured to obtain a feature extraction operation input by a user for the training image;
the first processing module 12 is configured to obtain a third training feature corresponding to the training image according to the feature extraction operation;
and the first training module 13 is configured to perform learning training based on the first training feature, the second training feature, the third training feature, and the identity information corresponding to the preset object, and obtain a target model for identifying the identity information of the preset object in the image.
In some examples, the first processing module 12 in this embodiment may be further configured to perform the following steps: and providing a data application interface corresponding to the target model so as to call the target model to perform data processing operation through the data application interface.
The apparatus shown in fig. 14 can perform the method of the embodiment shown in fig. 1-8, and the detailed description of this embodiment can refer to the related description of the embodiment shown in fig. 1-8. The implementation process and technical effect of the technical solution refer to the descriptions in the embodiments shown in fig. 1 to 8, and are not described herein again.
In one possible design, the structure of the model training apparatus shown in fig. 14 may be implemented as an electronic device, which may be a mobile phone, a tablet computer, a server, or other devices. As shown in fig. 15, the electronic device may include: a first processor 21 and a first memory 22. Wherein the first memory 22 is used for storing a program for the corresponding electronic device to execute the model training method provided in the embodiments shown in fig. 1-8, and the first processor 21 is configured to execute the program stored in the first memory 22.
The program comprises one or more computer instructions, wherein the one or more computer instructions, when executed by the first processor 21, are capable of performing the steps of:
acquiring a training image comprising a preset object and identity information corresponding to the preset object;
processing the training image by using a first machine learning model to obtain a first training feature corresponding to the training image;
processing the training image by using a second machine learning model to obtain a second training feature corresponding to the training image, wherein the first machine learning model is different from the second machine learning model, and the first training feature is different from the second training feature;
and performing learning training based on the first training characteristic, the second training characteristic and the identity information corresponding to the preset object to obtain a target model for identifying the identity information of the preset object in the image.
Further, the first processor 21 is also used to execute all or part of the steps in the embodiments shown in fig. 1 to 8.
The electronic device may further include a first communication interface 23 for communicating with other devices or a communication network.
In addition, an embodiment of the present invention provides a computer storage medium for storing computer software instructions for an electronic device, which includes a program for executing the model training method in the method embodiments shown in fig. 1 to 7.
Fig. 16 is a schematic structural diagram of a moving object recognition apparatus according to an embodiment of the present invention; referring to fig. 16, the present embodiment provides a moving object recognition apparatus that may perform the moving object recognition method shown in fig. 9 described above, and the moving object recognition apparatus may include: a second acquisition module 31, a second processing module 32 and a second training module 33, in particular,
a second obtaining module 31, configured to obtain an image to be processed and a plurality of reference images, where the image to be processed includes a moving object to be identified, and the reference images include a plurality of reference moving objects;
a second determination module 32, configured to determine a feature to be analyzed corresponding to the image to be processed and a reference feature corresponding to the reference image;
and a second processing module 33, configured to perform analysis processing on the feature to be analyzed and the reference feature by using a machine learning model in the plurality of reference moving objects, and determine a target moving object corresponding to the moving object to be recognized, where the machine learning model is trained to recognize the identity information of the moving object in the image based on the image features.
In some examples, in determining the feature to be analyzed corresponding to the image to be processed, the second determination module 32 may be configured to perform: processing the image to be processed by utilizing the first machine learning model to obtain a first characteristic corresponding to the image to be processed; processing the image to be processed by utilizing a second machine learning model to obtain a second feature corresponding to the image to be processed, wherein the first machine learning model is different from the second machine learning model, and the first feature is different from the second feature; and determining the feature to be analyzed corresponding to the image to be processed based on the first feature and the second feature.
In some examples, the first machine learning model includes an attention mechanism, and when the first machine learning model is used to process the image to be processed to obtain the first feature corresponding to the image to be processed, the second determining module 32 may be configured to perform: processing the image to be processed by using an attention mechanism to obtain a first characteristic weight coefficient corresponding to the image to be processed; and obtaining a first feature corresponding to the image to be processed based on the first feature weight coefficient and the image to be processed.
In some examples, when obtaining the first feature corresponding to the image to be processed based on the first feature weight coefficient and the image to be processed, the second determining module 32 may be configured to perform: determining a target weight coefficient corresponding to the first characteristic weight coefficient, wherein the degree of nonlinearity of the target weight coefficient is greater than that of the first characteristic weight coefficient; and obtaining a first characteristic based on the target weight coefficient and the image to be processed.
In some examples, in determining the target weight coefficient corresponding to the first feature weight coefficient, the second determination module 32 may be operable to perform: acquiring an activation function for increasing the degree of nonlinearity of the first feature weight coefficient; and obtaining a target weight coefficient by using the activation function and the first characteristic weight coefficient.
In some examples, in obtaining the target weight coefficient using the activation function and the first feature weight coefficient, the second determination module 32 may be configured to perform: processing the first characteristic weight coefficient by using an activation function to obtain a processed weight coefficient; and carrying out normalization processing on the processed weight coefficient to obtain a target weight coefficient.
In some examples, in obtaining the first feature based on the target weight coefficient and the image to be processed, the second determination module 32 may be configured to perform: and determining the product of the target weight coefficient and the image to be processed as the first characteristic.
In some examples, the second machine learning model includes a perspective transformation sub-network, a residual sub-network, and a transposed convolution sub-network comprised of a convolutional neural network; when the image to be processed is processed by using the second machine learning model to obtain the second feature corresponding to the image to be processed, the second determining module 32 may be configured to perform: analyzing and processing the image to be processed by utilizing a perspective conversion sub-network to obtain an image transformation matrix corresponding to the image to be processed; and obtaining a second characteristic corresponding to the image to be processed based on the image transformation matrix.
In some examples, when the image to be processed is analyzed and processed by using the perspective transformation subnetwork to obtain the image transformation matrix corresponding to the image to be processed, the second determining module 32 may be configured to perform: acquiring pixel point coordinate information in an image to be processed; and analyzing and processing the coordinate information of the pixel points by using a perspective conversion sub-network to obtain an image transformation matrix corresponding to the image to be processed.
In some examples, in obtaining the second feature corresponding to the image to be processed based on the image transformation matrix, the second determination module 32 may be configured to perform: analyzing the image transformation matrix by using a residual sub-network to obtain residual characteristics corresponding to the image transformation matrix; analyzing the residual error characteristics by using a transposed convolution sub-network to obtain a weight coefficient for identifying local response of the residual error characteristics; and determining a second feature corresponding to the image to be processed based on the weight coefficient and the residual feature.
In some examples, in determining the second feature corresponding to the image to be processed based on the weight coefficients and the residual features, the second determination module 32 may be configured to perform: and determining the product of the weight coefficient and the residual error characteristic as a second characteristic corresponding to the image to be processed.
In some examples, when the feature to be analyzed and the reference feature are analyzed and processed by using a machine learning model in a plurality of reference moving objects, and a target moving object corresponding to the moving object to be recognized is determined, the second processing module 33 may be configured to perform: acquiring similarity information between the features to be analyzed and the reference features by using a machine learning model; and determining the reference moving object corresponding to the reference feature with the maximum similarity information as the target moving object corresponding to the moving object to be identified.
In some examples, when the machine learning model is used to obtain the similarity information between the feature to be analyzed and the reference feature, the second processing module 33 may be configured to perform: obtaining a cosine distance or an Euclidean distance between the feature to be analyzed and the reference feature by using a machine learning model; and determining similarity information between the feature to be analyzed and the reference feature based on the cosine distance or the Euclidean distance.
In some examples, the moving object to be identified includes a vehicle.
The apparatus shown in fig. 16 can perform the method of the embodiment shown in fig. 9-13, and reference may be made to the related description of the embodiment shown in fig. 9-13 for parts of this embodiment that are not described in detail. The implementation process and technical effect of the technical solution are described in the embodiments shown in fig. 9 to 13, and are not described herein again.
In one possible design, the structure of the moving object recognition apparatus shown in fig. 16 may be implemented as an electronic device, which may be a mobile phone, a tablet computer, a server, or other devices. As shown in fig. 17, the electronic device may include: a second processor 41 and a second memory 42. Wherein the second memory 42 is used for storing programs for the corresponding electronic device to execute the moving object identification method provided in the embodiments shown in fig. 9-13, and the second processor 41 is configured for executing the programs stored in the second memory 42.
The program comprises one or more computer instructions, wherein the one or more computer instructions, when executed by the second processor 41, are capable of performing the steps of:
acquiring an image to be processed and a plurality of reference images, wherein the image to be processed comprises a moving object to be identified, and the reference images comprise a plurality of reference moving objects;
determining a feature to be analyzed corresponding to the image to be processed and a reference feature corresponding to a reference image;
and analyzing the features to be analyzed and the reference features by utilizing a machine learning model in the plurality of reference moving objects, and determining a target moving object corresponding to the moving object to be recognized, wherein the machine learning model is trained to be used for recognizing the identity information of the moving object in the image based on the image features.
Further, the second processor 41 is also used to execute all or part of the steps in the embodiments shown in fig. 9-13.
The electronic device may further include a second communication interface 43 for communicating with other devices or a communication network.
In addition, an embodiment of the present invention provides a computer storage medium for storing computer software instructions for an electronic device, which includes a program for executing the moving object identification method in the method embodiments shown in fig. 9 to 13.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by means of software plus a necessary general hardware platform, and of course, can also be implemented by a combination of hardware and software. With this understanding, the above technical solutions, or the parts thereof that in essence contribute over the prior art, may be embodied in the form of a computer program product, which may be embodied on one or more computer-usable storage media having computer-usable program code embodied therein, including without limitation disk storage, CD-ROM, optical storage, and the like.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present invention.

Claims (35)

1. A method of model training, comprising:
acquiring a training image comprising a preset object and identity information corresponding to the preset object;
processing the training image by using a first machine learning model to obtain a first training feature corresponding to the training image;
processing the training image by using a second machine learning model to obtain a second training feature corresponding to the training image, wherein the first machine learning model is different from the second machine learning model, and the first training feature is different from the second training feature;
and performing learning training based on the first training feature, the second training feature and the identity information corresponding to the preset object to obtain a target model for identifying the identity information of the preset object in the image.
2. The method according to claim 1, wherein the number of the training images is plural, and after acquiring the training images including the preset object, the method further comprises:
performing pixel normalization processing on a plurality of training images to obtain a plurality of intermediate images corresponding to the plurality of training images;
and performing color gamut conversion processing on the plurality of intermediate images to obtain a plurality of target training images corresponding to the plurality of training images.
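By way of a non-limiting illustration of claim 2 (this sketch is not part of the claims), the preprocessing could be written roughly as follows in Python; the target size, the division by 255 and the BGR-to-RGB color conversion are assumptions made for the example, since the claim does not fix them:

    # Illustrative sketch only; function names and parameters are assumptions.
    import cv2
    import numpy as np

    def preprocess(training_images, size=(224, 224)):
        """Pixel-normalize each training image, then convert its color gamut."""
        target_images = []
        for img in training_images:
            resized = cv2.resize(img, size)                          # unify pixel dimensions
            intermediate = resized.astype(np.float32) / 255.0        # pixel normalization
            target = cv2.cvtColor(intermediate, cv2.COLOR_BGR2RGB)   # color gamut conversion
            target_images.append(target)
        return target_images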
3. The method of claim 1, wherein the first machine learning model comprises an attention mechanism, and wherein the second machine learning model comprises a perspective transformation sub-network, a residual sub-network, and a transposed convolution sub-network formed from a convolutional neural network.
4. The method of claim 3, wherein processing the training image using a first machine learning model to obtain a first training feature corresponding to the training image comprises:
processing the training image by using an attention mechanism to obtain a feature weight coefficient corresponding to the training image;
and obtaining a first training feature corresponding to the training image based on the feature weight coefficient and the training image.
5. The method of claim 4, wherein obtaining the first training feature corresponding to the training image based on the feature weight coefficient and the training image comprises:
determining a target weight coefficient corresponding to the feature weight coefficient, wherein the degree of nonlinearity of the target weight coefficient is greater than that of the feature weight coefficient;
and obtaining the first training feature based on the target weight coefficient and the training image.
6. The method of claim 5, wherein determining a target weight coefficient corresponding to the feature weight coefficient comprises:
acquiring an activation function for increasing the degree of nonlinearity of the feature weight coefficient;
and processing the feature weight coefficient by using the activation function to obtain the target weight coefficient.
7. The method of claim 6, wherein processing the feature weight coefficient by using the activation function to obtain the target weight coefficient comprises:
processing the feature weight coefficient by using the activation function to obtain a processed weight coefficient;
and carrying out normalization processing on the processed weight coefficient to obtain the target weight coefficient.
8. The method of claim 5, wherein obtaining the first training feature based on the target weight coefficient and the training image comprises:
and determining the product of the target weight coefficient and the training image as the first training feature.
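The attention branch of claims 4 to 8 can be pictured with the following minimal PyTorch sketch. It assumes the feature weight coefficient comes from a 1x1 convolution, that the activation increasing nonlinearity is a sigmoid, and that the normalization is a softmax over spatial positions; the claims do not name these choices, so they are assumptions made purely for illustration:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AttentionWeighting(nn.Module):
        """Sketch of claims 4-8: weight the training image with a normalized attention map."""
        def __init__(self, channels):
            super().__init__()
            self.score = nn.Conv2d(channels, 1, kernel_size=1)   # produces the feature weight coefficient

        def forward(self, image):
            feature_weight = self.score(image)
            processed = torch.sigmoid(feature_weight)            # activation increases nonlinearity
            b, _, h, w = processed.shape
            target_weight = F.softmax(processed.view(b, -1), dim=1).view(b, 1, h, w)  # normalization
            return target_weight * image                         # product = first training feature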
9. The method of claim 3, wherein processing the training image with a second machine learning model to obtain second training features corresponding to the training image comprises:
analyzing and processing the training image by using the perspective transformation sub-network to obtain an image transformation matrix corresponding to the training image;
and obtaining a second training feature corresponding to the training image based on the image transformation matrix.
10. The method of claim 9, wherein analyzing and processing the training image by using the perspective transformation sub-network to obtain an image transformation matrix corresponding to the training image comprises:
acquiring pixel point coordinate information in the training image;
and analyzing and processing the pixel point coordinate information by using the perspective transformation sub-network to obtain an image transformation matrix corresponding to the training image.
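One hypothetical way to read claims 9 and 10 is a small localization network that maps sampled pixel coordinates to a 3x3 perspective (homography) matrix. The layer sizes, the aggregation over points and the 3x3 parameterization below are assumptions made for the sketch, not details recited in the claims:

    import torch
    import torch.nn as nn

    class PerspectiveSubNetwork(nn.Module):
        """Sketch: predict an image transformation matrix from pixel point coordinates."""
        def __init__(self):
            super().__init__()
            self.fc = nn.Sequential(
                nn.Linear(2, 64), nn.ReLU(),
                nn.Linear(64, 9),                        # nine entries of the 3x3 matrix
            )

        def forward(self, pixel_coords):
            # pixel_coords: (N, 2) tensor of sampled pixel point coordinates
            params = self.fc(pixel_coords).mean(dim=0)   # aggregate over the sampled points
            return params.view(3, 3)                     # image transformation matrix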
11. The method of claim 9, wherein obtaining a second training feature corresponding to the training image based on the image transformation matrix comprises:
analyzing the image transformation matrix by using the residual sub-network to obtain a residual feature corresponding to the image transformation matrix;
analyzing the residual feature by using the transposed convolution sub-network to obtain a weight coefficient for identifying a local response of the residual feature;
and determining a second training feature corresponding to the training image based on the weight coefficient and the residual feature.
12. The method of claim 11, wherein determining a second training feature corresponding to the training image based on the weight coefficient and the residual feature comprises:
and determining the product of the weight coefficient and the residual feature as the second training feature corresponding to the training image.
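Claims 11 and 12 can likewise be sketched with a single residual block followed by a transposed convolution that yields a local response weight; the real sub-networks may be much deeper, and the sigmoid on the weight map is an assumption made for the example:

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
            self.relu = nn.ReLU()

        def forward(self, x):
            return self.relu(x + self.conv2(self.relu(self.conv1(x))))   # residual feature

    class LocalResponseWeighting(nn.Module):
        """Sketch of claims 11-12: weight the residual feature by a local response map."""
        def __init__(self, channels):
            super().__init__()
            self.residual = ResidualBlock(channels)
            self.deconv = nn.ConvTranspose2d(channels, 1, kernel_size=3, padding=1)

        def forward(self, transformed):
            residual_feature = self.residual(transformed)
            weight = torch.sigmoid(self.deconv(residual_feature))    # local response weight coefficient
            return weight * residual_feature                          # product = second training feature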
13. The method of claim 1, wherein performing learning training based on the first training feature, the second training feature and the identity information corresponding to the preset object to obtain a target model for identifying the identity information of the preset object in the image comprises:
determining target feature information corresponding to the training image according to the first training feature and the second training feature;
and performing learning training based on the target feature information and the identity information corresponding to the preset object to obtain the target model.
14. The method of claim 13, wherein determining target feature information corresponding to the training image from the first and second training features comprises:
and splicing the first training feature and the second training feature to obtain the target feature information corresponding to the training image.
15. The method of claim 13, wherein performing learning training based on the target feature information and identity information corresponding to a preset object to obtain the target model comprises:
and performing learning training on the target feature information and the identity information corresponding to the preset object by using a cross-entropy loss function to obtain the target model.
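Claims 13 to 15 amount to concatenating (splicing) the two feature vectors and optimizing a cross-entropy loss against the identity labels. A minimal training-step sketch follows; the linear classifier head and the optimizer are assumptions made for illustration:

    import torch
    import torch.nn as nn

    def training_step(first_feature, second_feature, identity_labels, classifier, optimizer):
        """first_feature: (N, D1); second_feature: (N, D2); identity_labels: (N,) long tensor."""
        target_feature = torch.cat([first_feature, second_feature], dim=1)   # splicing
        logits = classifier(target_feature)                                  # e.g. an nn.Linear head
        loss = nn.CrossEntropyLoss()(logits, identity_labels)                # cross-entropy loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()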
16. The method of claim 1, further comprising:
acquiring a feature extraction operation input by a user with respect to the training image;
acquiring a third training feature corresponding to the training image according to the feature extraction operation;
and performing learning training based on the first training feature, the second training feature, the third training feature and the identity information corresponding to the preset object to obtain a target model for identifying the identity information of the preset object in the image.
17. The method according to any one of claims 1-16, further comprising:
and providing a data application interface corresponding to the target model so as to call the target model to perform data processing operation through the data application interface.
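The data application interface of claim 17 could take many forms; one possible shape, shown purely as an assumption-laden sketch, is a small HTTP endpoint that forwards an image to the trained target model. The endpoint path, request format, serialized model file and output decoding are all hypothetical:

    import torch
    from flask import Flask, request, jsonify

    app = Flask(__name__)
    target_model = torch.jit.load("target_model.pt")   # hypothetical serialized target model
    target_model.eval()

    @app.route("/identify", methods=["POST"])
    def identify():
        # the request is assumed to carry the preprocessed image as a nested list
        tensor = torch.tensor(request.json["image"], dtype=torch.float32)
        with torch.no_grad():
            logits = target_model(tensor.unsqueeze(0))
        return jsonify({"identity": logits.argmax(dim=1).item()})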
18. A method for identifying a moving object, comprising:
acquiring an image to be processed and a plurality of reference images, wherein the image to be processed comprises a moving object to be identified, and the reference images comprise a plurality of reference moving objects;
determining a feature to be analyzed corresponding to the image to be processed and a reference feature corresponding to the reference image;
and analyzing and processing the feature to be analyzed and the reference features by using a machine learning model, and determining, among the plurality of reference moving objects, a target moving object corresponding to the moving object to be recognized, wherein the machine learning model is trained to recognize identity information of a moving object in an image based on image features.
19. The method of claim 18, wherein determining features to be analyzed corresponding to the image to be processed comprises:
processing the image to be processed by utilizing a first machine learning model to obtain a first characteristic corresponding to the image to be processed;
processing the image to be processed by utilizing a second machine learning model to obtain a second feature corresponding to the image to be processed, wherein the first machine learning model is different from the second machine learning model, and the first feature is different from the second feature;
determining a feature to be analyzed corresponding to the image to be processed based on the first feature and the second feature.
20. The method of claim 19, wherein the first machine learning model comprises an attention mechanism, and wherein processing the image to be processed using the first machine learning model to obtain a first feature corresponding to the image to be processed comprises:
processing the image to be processed by using an attention mechanism to obtain a first feature weight coefficient corresponding to the image to be processed;
and obtaining a first feature corresponding to the image to be processed based on the first feature weight coefficient and the image to be processed.
21. The method according to claim 20, wherein obtaining the first feature corresponding to the image to be processed based on the first feature weight coefficient and the image to be processed comprises:
determining a target weight coefficient corresponding to the first feature weight coefficient, wherein the degree of nonlinearity of the target weight coefficient is greater than that of the first feature weight coefficient;
and obtaining the first feature based on the target weight coefficient and the image to be processed.
22. The method of claim 21, wherein determining a target weight coefficient corresponding to the first feature weight coefficient comprises:
acquiring an activation function for increasing the degree of nonlinearity of the first feature weight coefficient;
and obtaining the target weight coefficient by using the activation function and the first feature weight coefficient.
23. The method of claim 22, wherein obtaining the target weight coefficient using the activation function and the first feature weight coefficient comprises:
processing the first feature weight coefficient by using the activation function to obtain a processed weight coefficient;
and carrying out normalization processing on the processed weight coefficient to obtain the target weight coefficient.
24. The method of claim 21, wherein obtaining the first feature based on the target weight coefficient and an image to be processed comprises:
and determining the product of the target weight coefficient and the image to be processed as the first feature.
25. The method of claim 19, wherein the second machine learning model comprises a perspective transformation sub-network, a residual sub-network, and a transposed convolution sub-network comprised of a convolutional neural network; processing the image to be processed by using a second machine learning model to obtain a second feature corresponding to the image to be processed, including:
analyzing and processing the image to be processed by using the perspective transformation sub-network to obtain an image transformation matrix corresponding to the image to be processed;
and obtaining a second feature corresponding to the image to be processed based on the image transformation matrix.
26. The method of claim 25, wherein analyzing and processing the image to be processed by using the perspective transformation sub-network to obtain an image transformation matrix corresponding to the image to be processed comprises:
acquiring pixel point coordinate information in the image to be processed;
and analyzing and processing the pixel point coordinate information by using the perspective transformation sub-network to obtain an image transformation matrix corresponding to the image to be processed.
27. The method of claim 25, wherein obtaining a second feature corresponding to the image to be processed based on the image transformation matrix comprises:
analyzing the image transformation matrix by using the residual sub-network to obtain a residual feature corresponding to the image transformation matrix;
analyzing the residual feature by using the transposed convolution sub-network to obtain a weight coefficient for identifying a local response of the residual feature;
and determining a second feature corresponding to the image to be processed based on the weight coefficient and the residual feature.
28. The method of claim 27, wherein determining a second feature corresponding to the image to be processed based on the weight coefficients and the residual features comprises:
and determining the product of the weight coefficient and the residual feature as the second feature corresponding to the image to be processed.
29. The method according to claim 18, wherein analyzing and processing the feature to be analyzed and the reference features by using a machine learning model, and determining, among the plurality of reference moving objects, a target moving object corresponding to the moving object to be identified, comprises:
acquiring similarity information between the feature to be analyzed and the reference feature by using a machine learning model;
and determining the reference moving object corresponding to the reference feature with the greatest similarity as the target moving object corresponding to the moving object to be identified.
30. The method of claim 29, wherein obtaining similarity information between the feature to be analyzed and the reference feature using a machine learning model comprises:
obtaining a cosine distance or a Euclidean distance between the feature to be analyzed and the reference feature by using a machine learning model;
and determining the similarity information between the feature to be analyzed and the reference feature based on the cosine distance or the Euclidean distance.
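Claims 29 and 30 describe a gallery search: score the feature to be analyzed against every reference feature and keep the best match. A minimal sketch, assuming cosine similarity (with negative Euclidean distance as the alternative) and PyTorch tensors, follows:

    import torch
    import torch.nn.functional as F

    def match(feature_to_analyze, reference_features, metric="cosine"):
        """feature_to_analyze: (D,); reference_features: (K, D). Returns index of the best reference."""
        query = feature_to_analyze.unsqueeze(0)
        if metric == "cosine":
            scores = F.cosine_similarity(query, reference_features, dim=1)   # higher = more similar
        else:
            scores = -torch.cdist(query, reference_features).squeeze(0)      # negative Euclidean distance
        return torch.argmax(scores).item()    # index of the target moving object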
31. The method according to any of claims 18-30, wherein the moving object to be identified comprises a vehicle.
32. A model training apparatus, comprising:
the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a training image comprising a preset object and identity information corresponding to the preset object;
the first processing module is used for processing the training image by utilizing a first machine learning model to obtain a first training feature corresponding to the training image;
the first processing module is configured to process the training image by using a second machine learning model to obtain a second training feature corresponding to the training image, where the first machine learning model is different from the second machine learning model, and the first training feature is different from the second training feature;
and the first training module is used for performing learning training based on the first training characteristic, the second training characteristic and the identity information corresponding to the preset object to obtain a target model for identifying the identity information of the preset object in the image.
33. An electronic device, comprising: a memory, a processor; wherein the memory is to store one or more computer instructions, wherein the one or more computer instructions, when executed by the processor, implement the model training method of any one of claims 1-17.
34. A moving object recognition apparatus, comprising:
the second acquisition module is used for acquiring an image to be processed and a plurality of reference images, wherein the image to be processed comprises a moving object to be identified, and the reference images comprise a plurality of reference moving objects;
a second determining module, configured to determine a feature to be analyzed corresponding to the image to be processed and a reference feature corresponding to the reference image;
and the second processing module is used for analyzing and processing the feature to be analyzed and the reference features by using a machine learning model, and determining, among the plurality of reference moving objects, a target moving object corresponding to the moving object to be recognized, wherein the machine learning model is trained to recognize identity information of a moving object in an image based on image features.
35. An electronic device, comprising: a memory, a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions, when executed by the processor, implement the mobile object recognition method of any one of claims 18-31.
CN202010568176.2A 2020-06-19 2020-06-19 Model training method, mobile object identification method, device and equipment Pending CN113515983A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010568176.2A CN113515983A (en) 2020-06-19 2020-06-19 Model training method, mobile object identification method, device and equipment


Publications (1)

Publication Number Publication Date
CN113515983A (en) 2021-10-19

Family

ID=78060833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010568176.2A Pending CN113515983A (en) 2020-06-19 2020-06-19 Model training method, mobile object identification method, device and equipment

Country Status (1)

Country Link
CN (1) CN113515983A (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105574550A (en) * 2016-02-02 2016-05-11 北京格灵深瞳信息技术有限公司 Vehicle identification method and device
CN108229379A (en) * 2017-12-29 2018-06-29 广东欧珀移动通信有限公司 Image-recognizing method, device, computer equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HONGLIE WANG 等: "Local Feature-Aware Siamese Matching Model for Vehicle Re-Identification", 《APPLIED SCIENCES》 *
杨露菁 等: "《智能图像处理及应用》", 31 March 2019, 中国铁道出版社有限公司 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023169273A1 (en) * 2022-03-11 2023-09-14 阿里巴巴(中国)有限公司 Image recognition method and apparatus, device, and storage medium
CN116597237A (en) * 2023-07-14 2023-08-15 江西小马机器人有限公司 Belt start detection method, system, storage medium and computer
CN116597237B (en) * 2023-07-14 2023-11-21 江西小马机器人有限公司 Belt start detection method, system, storage medium and computer

Similar Documents

Publication Publication Date Title
US10592780B2 (en) Neural network training system
US10740964B2 (en) Three-dimensional environment modeling based on a multi-camera convolver system
US10997788B2 (en) Context-aware tagging for augmented reality environments
US20170060867A1 (en) Video and image match searching
US20160026854A1 (en) Method and apparatus of identifying user using face recognition
CN110222572B (en) Tracking method, tracking device, electronic equipment and storage medium
CN111323024B (en) Positioning method and device, equipment and storage medium
CN113447923A (en) Target detection method, device, system, electronic equipment and storage medium
CN113537254B (en) Image feature extraction method and device, electronic equipment and readable storage medium
CN113515983A (en) Model training method, mobile object identification method, device and equipment
CN112329616B (en) Target detection method, device, equipment and storage medium
JP2019211913A (en) Feature quantity extraction device, method, and program
KR20200069911A (en) Method and apparatus for identifying object and object location equality between images
CN114898314A (en) Target detection method, device and equipment for driving scene and storage medium
CN112926461A (en) Neural network training and driving control method and device
CN114067142A (en) Method for realizing scene structure prediction, target detection and lane level positioning
CN115131634A (en) Image recognition method, device, equipment, storage medium and computer program product
CN113012215A (en) Method, system and equipment for space positioning
KR102218255B1 (en) System and method for analyzing image based on artificial intelligence through learning of updated areas and computer program for the same
CN111522988B (en) Image positioning model obtaining method and related device
CN110738225B (en) Image recognition method and device
CN112333182A (en) File processing method, device, server and storage medium
Pedersen et al. Geolocating traffic signs using crowd-sourced imagery
CN113011445A (en) Calibration method, identification method, device and equipment
CN116740334B (en) Unmanned aerial vehicle intrusion detection positioning method based on binocular vision and improved YOLO

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20211019)