WO2020228446A1 - Model training method, apparatus, terminal, and storage medium - Google Patents
Model training method, apparatus, terminal, and storage medium
- Publication number
- WO2020228446A1 (PCT/CN2020/083523)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- tracking
- response
- test
- recognition model
- image
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/62—Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/48—Matching video sequences
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30241—Trajectory
Definitions
- This application relates to the field of Internet technology, and in particular to the field of visual target tracking; it provides a model training method, a model training apparatus, a terminal, and a storage medium.
- Visual target tracking is an important research direction in the field of computer vision.
- So-called visual target tracking refers to predicting the size and position of a tracking object in other images when its size and position in a given image are known.
- Visual target tracking is commonly used in video surveillance, human-computer interaction, unmanned driving, and other application scenarios with high real-time requirements. For example, given the size and position of the tracking object in a certain frame of a video sequence, the size and position of the tracking object in subsequent frames of the video sequence are predicted.
- The embodiments of this application provide a model training method, apparatus, terminal, and storage medium that can better train the first object recognition model, so that the updated first object recognition model has better visual target tracking performance, is more suitable for visual target tracking scenarios, and improves the accuracy of visual target tracking.
- In one aspect, an embodiment of the present application provides a model training method, executed by a computing device. The model training method includes:
- acquiring a template image and a test image for training, where both the template image and the test image include a tracking object, the test image includes a tracking label of the tracking object, and the tracking label is used to indicate the labeled position of the tracking object in the test image;
- calling a first object recognition model to perform recognition processing on the features of the tracking object in the template image to obtain a first reference response, and calling a second object recognition model to perform recognition processing on the features of the tracking object in the template image to obtain a second reference response;
- calling the first object recognition model to perform recognition processing on the features of the tracking object in the test image to obtain a first test response, and calling the second object recognition model to perform recognition processing on the features of the tracking object in the test image to obtain a second test response;
- performing tracking processing on the first test response to obtain a tracking response for the tracking object, where the tracking response is used to indicate the tracking position of the tracking object in the test image;
- updating the first object recognition model based on the difference information between the first reference response and the second reference response, the difference information between the first test response and the second test response, and the difference information between the tracking label and the tracking response.
- In one aspect, an embodiment of the present application provides a model training apparatus. The model training apparatus includes:
- an acquiring unit, configured to acquire a template image and a test image for training, where both the template image and the test image include a tracking object, the test image includes a tracking label of the tracking object, and the tracking label is used to indicate the labeled position of the tracking object in the test image;
- a processing unit, configured to call a first object recognition model to perform recognition processing on the features of the tracking object in the template image to obtain a first reference response, and call a second object recognition model to perform recognition processing on the features of the tracking object in the template image to obtain a second reference response;
- the processing unit is further configured to call the first object recognition model to perform recognition processing on the features of the tracking object in the test image to obtain a first test response, and call the second object recognition model to perform recognition processing on the features of the tracking object in the test image to obtain a second test response;
- the processing unit is further configured to perform tracking processing on the first test response to obtain a tracking response for the tracking object, where the tracking response is used to indicate the tracking position of the tracking object in the test image;
- an updating unit, configured to update the first object recognition model based on the difference information between the first reference response and the second reference response, the difference information between the first test response and the second test response, and the difference information between the tracking label and the tracking response.
- In one aspect, an embodiment of the present application provides a terminal. The terminal includes an input device and an output device, and further includes a processor and a computer storage medium. The computer storage medium stores one or more instructions, and the one or more instructions are loaded by the processor to execute the following steps:
- acquiring a template image and a test image for training, where both the template image and the test image include a tracking object, the test image includes a tracking label of the tracking object, and the tracking label is used to indicate the labeled position of the tracking object in the test image;
- calling a first object recognition model to perform recognition processing on the features of the tracking object in the template image to obtain a first reference response, and calling a second object recognition model to perform recognition processing on the features of the tracking object in the template image to obtain a second reference response;
- calling the first object recognition model to perform recognition processing on the features of the tracking object in the test image to obtain a first test response, and calling the second object recognition model to perform recognition processing on the features of the tracking object in the test image to obtain a second test response;
- performing tracking processing on the first test response to obtain a tracking response for the tracking object;
- updating the first object recognition model based on the difference information between the first reference response and the second reference response, the difference information between the first test response and the second test response, and the difference information between the tracking label and the tracking response.
- In one aspect, an embodiment of the present application provides a computer storage medium. The computer storage medium stores one or more instructions, and the one or more instructions are loaded by a processor to execute the following steps:
- acquiring a template image and a test image for training, where both the template image and the test image include a tracking object, the test image includes a tracking label of the tracking object, and the tracking label is used to indicate the labeled position of the tracking object in the test image;
- calling a first object recognition model to perform recognition processing on the features of the tracking object in the template image to obtain a first reference response, and calling a second object recognition model to perform recognition processing on the features of the tracking object in the template image to obtain a second reference response;
- calling the first object recognition model to perform recognition processing on the features of the tracking object in the test image to obtain a first test response, and calling the second object recognition model to perform recognition processing on the features of the tracking object in the test image to obtain a second test response;
- performing tracking processing on the first test response to obtain a tracking response for the tracking object;
- updating the first object recognition model based on the difference information between the first reference response and the second reference response, the difference information between the first test response and the second test response, and the difference information between the tracking label and the tracking response.
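The update step recited above combines three kinds of difference information into one objective. The following is a minimal sketch under assumptions the application does not fix: responses are NumPy arrays, each difference is measured with mean squared error, and the weights `w_ref`, `w_test`, and `w_track` are hypothetical hyperparameters.

```python
import numpy as np

def update_loss(first_ref, second_ref, first_test, second_test,
                tracking_response, tracking_label,
                w_ref=1.0, w_test=1.0, w_track=1.0):
    """Combine the three kinds of difference information into one scalar loss:
    (a) first vs. second reference response, (b) first vs. second test response,
    (c) tracking response vs. tracking label. MSE and the weights are assumptions."""
    ref_diff = np.mean((first_ref - second_ref) ** 2)
    test_diff = np.mean((first_test - second_test) ** 2)
    track_diff = np.mean((tracking_response - tracking_label) ** 2)
    return w_ref * ref_diff + w_test * test_diff + w_track * track_diff
```

Minimizing such a loss over the first object recognition model's parameters would realize the "update the first object recognition model" step.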
- Fig. 1a is a scene diagram of visual target tracking based on a first object recognition model provided by an embodiment of the present application.
- FIG. 1b is a schematic diagram of the implementation environment of the model training method provided by the embodiment of the present application.
- FIG. 2 is a schematic flowchart of a model training method provided by an embodiment of the present application.
- Fig. 3a is a structural diagram of a convolutional neural network provided by an embodiment of the present application.
- Figure 3b is a schematic diagram of determining a tracking response and a tracking label provided by an embodiment of the present application.
- FIG. 4 is a schematic flowchart of another model training method provided by an embodiment of the present application.
- FIG. 5 is a schematic diagram of acquiring a first object recognition model provided by an embodiment of the present application.
- FIG. 6 is a schematic diagram of joint optimization of a first object recognition model provided by an embodiment of the present application.
- FIG. 7 is a schematic diagram of obtaining positive samples and negative samples according to another embodiment of the present application.
- FIG. 8 is a schematic structural diagram of a model training device provided by an embodiment of the present application.
- FIG. 9 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
- In related solutions, visual target tracking is mainly implemented with traditional image processing models.
- However, traditional image processing models are designed for image classification tasks and are trained with image classification data.
- Since visual target tracking is not an image classification task, traditional image processing models are not well suited to visual target tracking scenarios, resulting in low tracking accuracy.
- In view of this, the embodiments of the present application provide a first object recognition model.
- The first object recognition model refers to an image recognition model with image recognition capability, such as the Visual Geometry Group (VGG) model, the GoogLeNet model, and the deep residual network (ResNet) model.
- The first object recognition model can accurately extract features from an image, and the extracted features are better suited to visual target tracking scenarios. Therefore, applying the first object recognition model, combined with a related tracking algorithm, to visual target tracking scenarios can improve both the accuracy and the real-time performance of visual target tracking.
- The steps of using the first object recognition model and a tracking algorithm to achieve visual target tracking may include the following. (1) Obtain a to-be-processed image and a reference image that includes a tracking object. The tracking object is the image element in the reference image that needs to be tracked, such as a person or an animal in the reference image. The reference image may include annotation information of the tracking object, and the annotation information is used to indicate the size and position of the tracking object; the annotation information may be represented in the form of an annotation box, for example as shown at 1011 in Figure 1a described below. The predicted tracking objects included in the to-be-processed image are then determined according to the annotation information in the reference image; a predicted tracking object here refers to an image element in the to-be-processed image that may be the tracking object.
- Specifically, multiple candidate boxes can be generated in the to-be-processed image according to the size of the annotation box in the reference image, and each candidate box represents a predicted tracking object; for example, A, B, and C in Figure 1a described below indicate three determined predicted tracking objects. (2) Invoke the first object recognition model to perform recognition processing on the tracking object in the reference image to obtain the first recognition feature; the first recognition feature refers to a feature of the tracking object, such as its facial contour feature, eye feature, or posture feature. (3) Invoke the first object recognition model to perform recognition processing on the predicted tracking objects included in the to-be-processed image to obtain the second recognition features; a second recognition feature refers to a feature of a predicted tracking object, such as its facial contour feature, eye feature, nose feature, or posture feature. (4) Determine the target feature for tracking processing based on the first recognition feature and the second recognition features, and use a tracking algorithm to perform tracking processing on the target feature to obtain the position of the tracking object in the to-be-processed image.
- The tracking algorithm may include a correlation filter tracking algorithm, a Siamese-network-based tracking algorithm, a sparse representation algorithm, and so on.
- The correlation filter algorithm is taken as an example in this embodiment: after the correlation filter algorithm performs tracking processing on the target feature, a Gaussian-shaped response map is obtained, and the position of the peak on the response map represents the position of the tracking object in the to-be-processed image.
- Determining the target feature for tracking processing based on the first recognition feature and the second recognition features can be understood as follows: by analyzing the feature of the tracking object and the feature of each predicted tracking object, it is determined which predicted tracking object serves as the tracking object included in the to-be-processed image, so that the feature of that predicted tracking object is processed by the tracking algorithm to obtain the position of the tracking object in the to-be-processed image, thereby completing the tracking of the tracking object.
- One implementation of step (4) may be: compute a matching score between the first recognition feature and each second recognition feature, and determine the second recognition feature with the highest matching score as the target feature.
- Another implementation of step (4) may be: perform fusion processing on the second recognition features, and determine the result of the fusion processing as the target feature.
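The highest-matching-score implementation of step (4) can be sketched as follows. This application does not fix a scoring function, so cosine similarity is an assumption here, and the recognition features are treated as flat NumPy vectors.

```python
import numpy as np

def cosine_score(ref_feature, cand_feature):
    # Matching score between the first recognition feature and one second recognition feature.
    denom = np.linalg.norm(ref_feature) * np.linalg.norm(cand_feature) + 1e-12
    return float(np.dot(ref_feature, cand_feature) / denom)

def select_target_feature(first_feature, second_features):
    """Score every candidate feature against the reference feature and return
    the candidate with the highest matching score as the target feature."""
    scores = [cosine_score(first_feature, f) for f in second_features]
    best = int(np.argmax(scores))
    return second_features[best], best
```

The fusion-based implementation would instead combine all second recognition features (for example by averaging) before tracking.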
- Figure 1a shows a scene of visual target tracking provided by an embodiment of this application.
- 101 represents a reference image
- 102 represents an image to be processed
- 1011 represents the annotation information of the tracking object in the form of an annotation box
- the size of the annotation box 1011 represents the size of the tracking object in the reference image
- the position of the annotation box 1011 represents the location of the tracking object in the reference image
- 103 represents the first object recognition model.
- The first object recognition model 103 is called to perform recognition processing on the tracking object marked by 1011 to obtain the first recognition feature, and the first object recognition model is called to perform recognition processing on the predicted tracking objects A, B, and C respectively to obtain three second recognition features.
- The target feature is determined based on the first recognition feature and the three second recognition features; assume that the second recognition feature corresponding to the predicted tracking object C is determined as the target feature. A tracking algorithm, such as a correlation filter tracking algorithm, is then used to perform tracking processing on the target feature to obtain a Gaussian-shaped response map, and the peak point on the response map indicates the position of the tracking object in the image to be processed, as shown at 104.
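A Gaussian-shaped response map and its peak can be illustrated with a short sketch; the map size and the standard deviation `sigma` are arbitrary illustrative values, not values taken from this application.

```python
import numpy as np

def gaussian_response(height, width, peak_y, peak_x, sigma=2.0):
    """Build a Gaussian-shaped response map whose peak marks the tracked position."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((ys - peak_y) ** 2 + (xs - peak_x) ** 2) / (2.0 * sigma ** 2))

def peak_position(response):
    """The position of the peak on the response map is the predicted position
    of the tracking object in the to-be-processed image."""
    return np.unravel_index(int(np.argmax(response)), response.shape)
```

For example, `peak_position(gaussian_response(32, 32, 10, 20))` recovers the row and column where the map was centered.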
- Based on the above, an embodiment of this application further proposes a model training method, which is used to train the first object recognition model to ensure that the first object recognition model can accurately extract features from images and that the extracted features are better suited to tracking scenarios.
- The model training method may be executed by a computing device such as a terminal, and specifically by a processor of the terminal.
- The terminal may include, but is not limited to, a smartphone, a tablet computer, a laptop computer, a desktop computer, and so on.
- Fig. 1b is a schematic diagram of an implementation environment of the model training method provided by an embodiment of the application.
- the terminal device 10 and the server device 20 are communicatively connected through a network 30, and the network 30 may be a wired network or a wireless network.
- Either the terminal device 10 or the server device 20 may be integrated with the model training apparatus provided in any embodiment of the present application, which is used to implement the model training method provided in any embodiment of the present application.
- The model training method proposed in the embodiments of the present application may include the following steps S201-S205.
- Step S201: Obtain a template image and a test image for training.
- The template image and the test image are images used to train and update the model. Both the template image and the test image include a tracking object, and the template image may also include annotation information of the tracking object.
- The annotation information of the tracking object is used to indicate the size and position of the tracking object in the template image, and the annotation information may be annotated on the template image by the terminal.
- The test image also includes a response label corresponding to the test image; the response label is used to indicate the labeled position of the tracking object in the test image.
- The labeled position may refer to the true position of the tracking object in the test image as marked by the terminal. The test image may also include annotation information of the tracking object.
- The annotation information of the tracking object in the test image is used to indicate the size and position of the tracking object in the test image.
- The template image and the test image may be two frames of the same video sequence.
- For example, a video sequence including the tracking object is recorded by a camera; any frame in the video sequence that includes the tracking object is selected as the template image, and another frame that includes the tracking object is selected as the test image.
- Alternatively, the template image and the test image may not be images in the same video sequence.
- For example, the template image may be an image obtained by using a shooting device to shoot a first shooting scene that includes the tracking object, and the test image may be an image obtained, before or after the template image, by using the shooting device to shoot a second shooting scene that includes the tracking object; that is, the template image and the test image are two independent images.
- In the following, the case where the template image and the test image come from the same video sequence is taken as an example for description.
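Selecting a template/test frame pair from one video sequence can be sketched as follows. The maximum frame gap `max_gap` is a hypothetical constraint for illustration; this application only requires that both frames contain the tracking object.

```python
import random

def sample_training_pair(frame_indices, max_gap=10):
    """Pick a template frame and a different nearby test frame from one sequence.
    Assumes every frame in frame_indices contains the tracking object."""
    t = random.randrange(len(frame_indices))
    lo = max(0, t - max_gap)
    hi = min(len(frame_indices) - 1, t + max_gap)
    s = random.randrange(lo, hi + 1)
    while s == t and len(frame_indices) > 1:
        s = random.randrange(lo, hi + 1)
    return frame_indices[t], frame_indices[s]
```

Keeping the two frames close in time makes the appearance of the tracking object similar in the template and test images, which is a common (assumed) choice for tracking-style training data.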
- Step S202: Call the first object recognition model to perform recognition processing on the features of the tracking object in the template image to obtain a first reference response, and call the second object recognition model to perform recognition processing on the features of the tracking object in the template image to obtain a second reference response.
- Step S203: Call the first object recognition model to perform recognition processing on the features of the tracking object in the test image to obtain a first test response, and call the second object recognition model to perform recognition processing on the features of the tracking object in the test image to obtain a second test response.
- What the first object recognition model and the second object recognition model have in common is that both are image recognition models with image recognition capability.
- The convolutional neural network model has become a commonly used image recognition model due to its strong feature extraction capability.
- The first object recognition model and the second object recognition model in the embodiments of the present application may both be convolutional neural network models, such as the VGG model, the GoogLeNet model, and the ResNet model.
- The difference between them is that the second object recognition model is an image recognition model that has already been trained and tested, whereas the first object recognition model is an image recognition model to be updated.
- The convolutional neural network model is mainly used in image recognition, face recognition, and text recognition.
- The network structure of a convolutional neural network is shown in Figure 3a: it mainly includes convolutional layers 301, pooling layers 302, and a fully connected layer 303.
- Each convolutional layer is connected to a pooling layer.
- The convolutional layer 301 is mainly used for feature extraction.
- The pooling layer 302, also called a sub-sampling layer, is mainly used to reduce the scale of the input data.
- The fully connected layer 303 calculates classification scores according to the features extracted by the convolutional layers, and finally outputs the classes and their corresponding scores. It follows that the network structures of the first object recognition model and the second object recognition model also include convolutional layers, pooling layers, and a fully connected layer.
- Each convolutional neural network model includes multiple convolutional layers, and each convolutional layer is responsible for extracting different features of the image; the features extracted by one convolutional layer are used as the input of the next convolutional layer. The features each convolutional layer is responsible for extracting can be set according to a specific function, or set manually.
- For example, the first convolutional layer can be set to extract the overall shape features of graphics, the second convolutional layer the line features of graphics, and the third convolutional layer the discontinuity features of graphics.
- As another example, when recognizing images containing human faces, the first convolutional layer can be set to extract the contour features of human faces, and the second convolutional layer the features of the facial organs.
- Each convolutional layer includes multiple filters of the same size for convolution calculation; each filter corresponds to a filter channel, and each filter obtains a set of features after the convolution calculation. Therefore, each convolutional layer recognizes the input image and extracts multi-dimensional features. The more convolutional layers there are, the deeper the network structure of the convolutional neural network model and the greater the number of features extracted; the more filters each convolutional layer includes, the higher the dimension of the features extracted by that layer.
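The relationship between the number of filters and the number of extracted feature maps can be made concrete with a naive "valid" convolution; the image size, filter size, and filter count below are arbitrary illustrative values.

```python
import numpy as np

def conv2d_bank(image, filters):
    """Convolve a single-channel image with a bank of filters ('valid' mode).
    Each filter produces one feature map, so the output has one channel per filter."""
    n, k, _ = filters.shape
    h = image.shape[0] - k + 1
    w = image.shape[1] - k + 1
    out = np.zeros((n, h, w))
    for c in range(n):
        for i in range(h):
            for j in range(w):
                out[c, i, j] = np.sum(image[i:i + k, j:j + k] * filters[c])
    return out
```

Convolving an 8x8 image with four 3x3 filters yields a feature volume of shape (4, 6, 6): one 6x6 feature map per filter, which is why adding filters raises the dimension of the extracted features.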
- If a model includes more convolutional layers, and/or more filters in each convolutional layer, a larger storage space is required when storing the model; a model that requires more storage space is called a heavyweight model. Conversely, if a model includes fewer convolutional layers, and/or a small number of filters in each convolutional layer, the model does not require a large storage space when stored; a model that requires less storage space is called a lightweight model.
- The first object recognition model and the second object recognition model may both be heavyweight models; alternatively, the second object recognition model is a heavyweight model and the first object recognition model is a lightweight model obtained by performing model compression processing on the second object recognition model.
- If the first object recognition model is a heavyweight model, the updated first object recognition model can extract high-dimensional features and has better recognition performance; when applied to visual target tracking scenarios, it can improve the tracking accuracy.
- If the first object recognition model is a lightweight model obtained by performing model compression processing on the second object recognition model, the updated first object recognition model has feature extraction performance similar to that of the second object recognition model, and because it occupies less storage space it can be effectively used in mobile devices and other low-power products.
- In that case, feature extraction can be performed quickly to achieve real-time visual target tracking. In practical applications, whether the first object recognition model is a heavyweight model or a lightweight model can be chosen according to the requirements of the specific scenario.
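The storage contrast between heavyweight and lightweight models comes down to parameter counts. A sketch with made-up layer configurations (the channel and kernel numbers below are illustrative, not from this application):

```python
def conv_param_count(layers):
    """layers: list of (in_channels, out_channels, kernel_size) per convolutional layer.
    Counts kernel weights plus one bias per output channel."""
    return sum(cin * cout * k * k + cout for cin, cout, k in layers)

# Hypothetical configurations: same depth, but the lightweight model has fewer filters.
heavyweight = conv_param_count([(3, 64, 3), (64, 128, 3), (128, 256, 3)])
lightweight = conv_param_count([(3, 16, 3), (16, 32, 3), (32, 64, 3)])
```

Halving the filter count in every layer cuts the parameter count (and hence the storage requirement) roughly fourfold per layer, which is the sense in which a compressed model fits low-power devices.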
- In step S202, calling the first object recognition model to perform recognition processing on the features of the tracking object in the template image to obtain the first reference response is essentially calling the convolutional layers of the first object recognition model to perform feature extraction on the tracking object in the template image; the extraction result is the first reference response.
- The first reference response is used to represent the features of the tracking object in the template image as recognized by the first object recognition model, such as size, shape, and contour, and the first reference response can be represented by a feature map. Similarly, the second reference response represents the features of the tracking object in the template image as recognized by the second object recognition model; the first test response represents the features of the tracking object in the test image as recognized by the first object recognition model; and the second test response represents the features of the tracking object in the test image as recognized by the second object recognition model.
- the template image may include labeling information of the tracking object
- the function of the labeling information may be: determining the size and location of the tracking object to be recognized by the first object recognition model in the template image So that the first object recognition model can accurately determine who needs to be recognized; the label information of the tracking object in the template image can be represented in the form of a label box.
- calling the first object recognition model to perform recognition processing on the features of the tracking object in the template image to obtain the first reference response may refer to calling the first object recognition model to recognize the template image in combination with the labeling information in the template image, that is, to recognize the features within the label box of the template image.
- the template image may include only the tracking object, or may include the tracking object together with background that does not interfere with recognition of the tracking object, such as a wall, the ground, or the sky.
- in that case, the first object recognition model can accurately determine which object needs to be recognized regardless of whether the terminal sets labeling information for the tracking object in the template image.
- one implementation of calling the first object recognition model to recognize the features of the tracking object in the template image to obtain the first reference response may be: using the template image as the input of the first object recognition model;
- the first convolutional layer of the first object recognition model uses multiple filters of a certain size to perform convolution on the template image and extract the first feature of the tracking object in the template image; the first feature is used as the input of the second convolutional layer, which uses multiple filters to perform convolution on the first feature and extract the second feature of the tracking object; the second feature is then input to the third convolutional layer,
- which uses multiple filters to perform convolution on the second feature to obtain the third feature of the tracking object in the template image, and so on, until the last convolutional layer completes its convolution; the output of the last convolutional layer is the first reference response.
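- the layer-by-layer feature extraction described above can be sketched as follows (an illustrative numpy sketch; the input size, filter counts, and filter sizes are made up for the example and are not the actual configuration of the embodiment):

```python
import numpy as np

def conv_layer(x, filters):
    """One convolutional layer: valid cross-correlation of a (C, H, W) input
    with (K, C, fh, fw) filters, followed by a ReLU non-linearity."""
    K, C, fh, fw = filters.shape
    _, H, W = x.shape
    out = np.zeros((K, H - fh + 1, W - fw + 1))
    for k in range(K):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[k, i, j] = np.sum(x[:, i:i + fh, j:j + fw] * filters[k])
    return np.maximum(out, 0.0)

rng = np.random.default_rng(0)
template = rng.standard_normal((3, 16, 16))         # toy template image, 3 channels
layers = [rng.standard_normal((8, 3, 3, 3)) * 0.1,  # first convolutional layer
          rng.standard_normal((8, 8, 3, 3)) * 0.1,  # second convolutional layer
          rng.standard_normal((8, 8, 3, 3)) * 0.1]  # last convolutional layer

feat = template
for filters in layers:            # the output of each layer is the input of the next
    feat = conv_layer(feat, filters)
first_reference_response = feat   # output of the last layer = first reference response
```

each layer's output feeds the next, and the output of the final convolutional layer plays the role of the first reference response.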
- the implementation of calling the second object recognition model to perform recognition processing on the template image to obtain the second reference response may be the same as the implementation described above, and is not repeated here.
- Step S204 Perform tracking processing on the first test response to obtain the tracking response of the tracking object.
- the embodiment of the present application implements the tracking training of the first object recognition model through step S204.
- step S204 may include: using a tracking training algorithm to perform tracking processing on the first test response to obtain the tracking response of the tracking object.
- the tracking training algorithm is an algorithm for tracking training of the first object recognition model, and may include a correlation filter tracking algorithm, a Siamese-network-based tracking algorithm, a sparse representation algorithm, and the like.
- the tracking response indicates the tracking position of the tracking object in the test image as determined by the tracking training algorithm from the first test response; that is, the tracking position can be understood as the position of the tracking object in the test image predicted by the tracking training algorithm from the first test response.
- when the tracking training algorithm is a correlation filter algorithm, using it to track the first test response to obtain the tracking response of the tracking object may be: tracking the first test response to obtain a Gaussian-shaped response map, and determining the tracking response from the response map.
- one way to determine the tracking response from the response map is to use the response map itself as the tracking response.
- the response map reflects the tracking position of the tracking object in the test image.
- the maximum (peak) point of the response map can be taken as the tracking position of the tracking object in the test image.
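- locating the tracking position as the peak of a Gaussian-shaped response map can be sketched in a few lines (an illustrative numpy sketch; the map size and peak location are made up):

```python
import numpy as np

def gaussian_response(h, w, cy, cx, sigma=2.0):
    """Gaussian-shaped response map of size (h, w) peaking at (cy, cx)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

response = gaussian_response(50, 60, cy=20, cx=35)
# the maximum (peak) point of the response map is taken as the tracking position
peak = np.unravel_index(np.argmax(response), response.shape)
```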
- the tracking tag is used to indicate the marked position of the tracked object in the test image, and the marked position may refer to the real position of the tracked object in the test image pre-marked by the terminal.
- the tracking label may also be a Gaussian-shaped response graph, and the peak point on the response graph represents the true position of the tracking object in the test image.
- FIG. 3b is a schematic diagram of determining the tracking label and the tracking response provided by this embodiment of the application.
- 304 represents a test image and 3041 represents a tracking object.
- the tracking label that the terminal pre-marks for the test image can be as shown in Figure 3b.
- the peak point 3061 on 306 indicates the marked position of the tracking object in the test image.
- the tracking response may be determined according to the characteristics of the specific tracking training algorithm.
- Step S205: update the first object recognition model based on the difference information between the first reference response and the second reference response, the difference information between the first test response and the second test response, and the difference information between the tracking label and the tracking response.
- the first reference response is used to represent the characteristics of the tracking object in the template image recognized by the first object recognition model, such as size, shape, contour, etc.
- the second reference response represents the features of the tracking object in the template image recognized by the second object recognition model; it can be seen that the difference information between the first reference response and the second reference response may include the size of the difference between the features extracted by the first object recognition model and the second object recognition model when each performs feature extraction on the template image.
- the size of the difference between the features can be represented by the distance between the features.
- for example, the first reference response includes the facial contour of the tracking object in the template image recognized by the first object recognition model, denoted facial contour 1, and the second reference response includes the facial contour of the tracking object in the template image recognized by the second object recognition model, denoted facial contour 2; the difference information between the first reference response and the second reference response may then include the distance between facial contour 1 and facial contour 2.
- the size of the difference between features can also be represented by a similarity value between the features: the larger the similarity value, the smaller the difference between the features; the smaller the similarity value, the greater the difference.
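- both ways of measuring the difference between features can be sketched as follows (an illustrative numpy sketch with made-up feature vectors):

```python
import numpy as np

f1 = np.array([1.0, 2.0, 3.0])   # feature extracted by the first model
f2 = np.array([1.0, 2.5, 2.5])   # feature extracted by the second model

# distance: the larger the value, the greater the difference between the features
l2_distance = np.linalg.norm(f1 - f2)

# similarity: the larger the value, the smaller the difference between the features
cos_similarity = float(f1 @ f2) / (np.linalg.norm(f1) * np.linalg.norm(f2))
```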
- similarly, the difference information between the first test response and the second test response may include the size of the difference between the features extracted by the first object recognition model and the second object recognition model when each performs feature extraction on the test image.
- as described in step S204, the difference information between the tracking label and the tracking response reflects the distance between the tracking position of the tracking object in the test image and its marked position.
- in this process, the value of the loss optimization function corresponding to the first object recognition model is determined based on the difference information between the first reference response and the second reference response, the difference information between the first test response and the second test response, and the difference information between the tracking label and the tracking response; the first object recognition model is then updated according to the principle of reducing the value of the loss optimization function.
- the update here refers to: update each model parameter in the first object recognition model.
- the model parameters of the first object recognition model may include but are not limited to: gradient parameters, weight parameters, and so on.
- in summary, the first object recognition model and the second object recognition model are first called to recognize the features of the tracking object in the template image,
- obtaining the first reference response and the second reference response; the two models are then called to recognize the features of the tracking object in the test image, obtaining the first test response and the second test response; further, the first test response is tracked to obtain the tracking response of the tracking object; the difference information between the first reference response and the second reference response together with the difference information between the first test response and the second test response determines the loss of feature extraction performance of the first object recognition model relative to the second object recognition model, while the difference information between the tracking label and the tracking response determines the loss of tracking performance of the first object recognition model.
- FIG. 4 is a schematic flowchart of another model training method provided by an embodiment of the present application.
- the model training method can be executed by computing devices such as terminals; the terminals here can include, but are not limited to: smart terminals, tablet computers, laptop computers, desktop computers, and so on.
- the model training method may include the following steps S401-S408:
- Step S401 Obtain a second object recognition model, and crop the second object recognition model to obtain a first object recognition model.
- the second object recognition model is a trained heavyweight model for image recognition
- the first object recognition model is a lightweight model for image recognition to be trained.
- the model compression refers to compressing the trained heavyweight model in time and space to remove some unimportant filters or parameters included in the heavyweight model and improve the feature extraction speed.
- the model compression may include model cropping and model training.
- the model cropping refers to reducing the network structure of the second object recognition model by cropping the number of filters and feature channels included in the model.
- the model training refers to the use of the second object recognition model and the template image and test image for training to update and train the cropped first object recognition model based on the transfer learning technology.
- the first object recognition model is made to have the same or similar feature recognition performance as the second object recognition model.
- the transfer learning technology refers to transferring the performance of one model to another model.
- in the embodiment of the present application, transfer learning refers to calling the second object recognition model to recognize the features of the tracking object in the template image to obtain the second reference response, and using the second reference response as a supervisory label to train the first object recognition model to recognize the features of the tracking object in the template image; the second object recognition model is then called to recognize the features of the tracking object in the test image
- to obtain the second test response, and the second test response is used as a supervisory label to train the first object recognition model to recognize the features of the tracking object in the test image.
- the teacher-student model is a typical model compression method based on the transfer learning technology.
- the second object recognition model is equivalent to the teacher model
- the first object recognition model is equivalent to the student model.
- cropping may refer to removing a certain number of the filters included in each convolutional layer of the second object recognition model, and/or reducing the number of feature channels of each convolutional layer by a corresponding amount. For example, the number of filters and the number of feature channels in each convolutional layer of the second object recognition model may be reduced by three-fifths, seven-eighths, or any other fraction; practice shows that reducing the number of filters in each convolutional layer and the corresponding number of feature channels by seven-eighths allows a good first object recognition model to be obtained through training and updating.
- FIG. 5 is a schematic diagram of cropping the second object recognition model to obtain the first object recognition model provided by this embodiment of the application.
- the cropping of the second object recognition model by the above method involves only the convolutional layers, so for convenience of description only the convolutional layers of the first object recognition model and the second object recognition model are shown in FIG. 5.
- the second object recognition model is a VGG-8 model
- the first object recognition model is also a VGG-8 model.
- the VGG-8 model includes 5 convolutional layers; 501 represents the convolutional layers of the second object recognition model, 502 represents the convolutional layers of the first object recognition model, and 503 represents the number of filters, the number of feature channels, and the filter size of each convolutional layer of the second object recognition model;
- subtracting seven-eighths of the filters and feature channels yields the number of filters, the number of feature channels, and the filter size of each convolutional layer of the first object recognition model, as shown at 504.
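- the effect of removing seven-eighths of the filters (and feature channels) per convolutional layer can be sketched as follows; the per-layer filter counts below are hypothetical, since the embodiment does not list the exact numbers:

```python
# hypothetical filter counts for the five convolutional layers of the teacher
# (second object recognition model); not taken from the embodiment
teacher_filters = [96, 256, 512, 512, 512]
keep_ratio = 1 - 7 / 8          # seven-eighths of the filters are cropped away

# the student (first object recognition model) keeps one-eighth per layer
student_filters = [max(1, round(n * keep_ratio)) for n in teacher_filters]
```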
- Step S402: Obtain a template image and a test image for training, where both the template image and the test image include a tracking object, the test image includes a tracking label of the tracking object, and the tracking label represents the label position of the tracking object in the test image.
- Step S403: Call the first object recognition model to recognize the features of the tracking object in the template image to obtain a first reference response, and call the second object recognition model to recognize the features of the tracking object in the template image to obtain a second reference response.
- Step S404: Call the first object recognition model to recognize the features of the tracking object in the test image to obtain a first test response, and call the second object recognition model to recognize the features of the tracking object in the test image to obtain a second test response.
- Step S405 Perform tracking processing on the first test response to obtain the tracking response of the tracking object.
- step S405 may include using a tracking training algorithm to track the first test response to obtain the tracking response of the tracking object.
- the tracking training algorithm may include tracking algorithm parameters, and the implementation of using the tracking training algorithm to track the first test response to obtain the tracking response to the tracked object in the test image may be:
- the first test response is substituted into the tracking training algorithm with known tracking algorithm parameters for calculation, and the tracking response is determined according to the calculated result.
- the tracking algorithm parameters in the tracking training algorithm described in the embodiment of the present application are obtained by training the tracking training algorithm according to the second object recognition model and the template image.
- the following takes the tracking training algorithm as the correlation filter algorithm as an example to introduce the process of using the second object recognition model and template image to train the tracking training algorithm to obtain the tracking algorithm parameters of the correlation filter tracking algorithm.
- the tracking algorithm parameter of the correlation filter tracking algorithm refers to the filter parameter of the correlation filter, and the training process of the correlation filter algorithm may include steps S11-S13:
- Step S11 generating training samples according to the template image, and obtaining tracking labels corresponding to the training samples
- the template image includes the tracking object and the tracking label corresponding to the tracking object
- the training sample generated from the template image also includes the tracking object.
- the tracking label corresponding to the tracking object included in the template image may refer to the real position of the tracking object in the template image
- the tracking label including the tracking object in the template image may be pre-marked by the terminal.
- the method of generating training samples based on the template image may be: cutting out image blocks including the tracking object in the template image, performing cyclic shift processing on the image blocks to obtain training samples, and tracking labels corresponding to the training samples Determined according to the tracking label included in the template image and the degree of cyclic shift operation.
- the method of cyclically shifting the template image can be: converting the image block of the template image into pixels to determine the pixels that represent the tracking object, these pixels forming the pixel matrix of the tracking object; each row of the pixel matrix is then cyclically shifted to obtain multiple new pixel matrices. In the cyclic shift, the value of each pixel does not change, only its position; each shifted matrix therefore still represents the tracking object, but the position of the tracking object rendered by the new pixel matrix has changed.
- each row of the pixel matrix can be expressed as an n×1 vector, each element of which corresponds to a pixel; the elements of the n×1 vector are shifted right or left one position at a time, producing a new vector after each shift.
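- the cyclic shift of one row of pixels can be sketched with numpy's roll (illustrative; a 1×5 row stands in for one row of the pixel matrix):

```python
import numpy as np

base = np.arange(1, 6)          # one row of the pixel matrix: [1, 2, 3, 4, 5]
# every cyclic shift keeps the pixel values and changes only their positions,
# so each shifted copy is a new training sample showing the object displaced
samples = [np.roll(base, k) for k in range(len(base))]
```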
- Step S12 call the second object recognition model to perform feature extraction processing on the training sample, and obtain the feature of the tracking object in the training sample;
- Calling the second object recognition model to perform feature extraction processing on multiple training samples is essentially a process of calling the convolutional layer of the second object recognition model to perform feature extraction on the training samples.
- the second object recognition model includes multiple convolutional layers, and each convolutional layer includes multiple filters for convolution calculation, so the features extracted by each convolutional layer are multi-dimensional, and each convolutional layer The extracted multi-dimensional features are successively used as the input of the next convolutional layer until the output of the last convolutional layer is obtained.
- the second object recognition model includes 5 convolutional layers.
- the feature dimension of the obtained training-sample features is D; let x^i denote the feature of the i-th dimension extracted by the second object recognition model, so that the training-sample feature extracted by the second object recognition model is expressed as x = (x^1, x^2, ..., x^D).
- Step S13: Obtain a ridge regression equation for determining the correlation filter parameters, and solve the ridge regression equation to obtain the correlation filter parameters.
- the working principle of the correlation filter algorithm is: extract the features of the image including the tracking object; convolve the extracted features with the correlation filter to obtain a response map, and determine the location of the tracking object in the image from the response map .
- the convolution operation requires two quantities of the same size, so the dimensionality of the correlation filter must match that of the training-sample features.
- the ridge regression equation corresponding to the correlation filter algorithm can be written as formula (1):
- ε(w) = || Σ_{i=1}^{D} x^i ⊛ w^i − y ||² + λ Σ_{i=1}^{D} || w^i ||²
- ⊛ represents the convolution operation
- D represents the feature dimension of the training sample extracted by the second object recognition model
- w^i represents the i-th dimension filter parameter of the correlation filter
- x represents the training sample, with x^i its i-th dimension
- y represents the tracking label of the training sample x
- λ represents the regularization coefficient
- by minimizing equation (1) and solving it in the frequency domain, the filter parameters of the correlation filter in each dimension can be obtained.
- the formula for solving the filter parameters in the frequency domain is introduced below; the formula for solving the filter parameter of the d-th dimension can be expressed as (2):
- ŵ^d = ( ŷ ⊙ (x̂^d)* ) / ( Σ_{i=1}^{D} x̂^i ⊙ (x̂^i)* + λ )
- where the hat denotes the discrete Fourier transform and ŵ^d represents the correlation filter parameter of the d-th dimension
- ⊙ represents the element-wise (dot) multiplication operation
- * represents the complex conjugate operation.
- the filter parameters of the correlation filter of each dimension can be calculated, and the filter parameters of each dimension constitute the filter parameter of the correlation filter algorithm.
- after the filter parameters are obtained, the first test response can be tracked based on the correlation filter algorithm to obtain the tracking response of the tracking object in the test image.
- when the correlation filter algorithm is used to track the first test response, the tracking response of the tracking object in the test image can be expressed by formula (3):
- r = F⁻¹( Σ_{d=1}^{D} (ŵ^d)* ⊙ ẑ^d ), where ẑ^d is the Fourier transform of the d-th dimension of the first test response and F⁻¹ is the inverse Fourier transform
- w represents the filter parameter of the correlation filter
- r represents the tracking response.
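- the training of steps S11-S13 and the subsequent detection can be sketched for a single feature dimension as follows (an illustrative numpy sketch of the standard frequency-domain ridge-regression solution for a correlation filter; the feature map, Gaussian label, and λ value are made up):

```python
import numpy as np

lam = 1e-4                                   # regularization coefficient λ

def train_filter(x, y):
    """Closed-form frequency-domain solution of the ridge regression
    for one feature dimension (single-channel correlation filter)."""
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    return np.conj(X) * Y / (np.conj(X) * X + lam)

def apply_filter(W, z):
    """Correlate the learned filter with a feature map to get a response map."""
    return np.real(np.fft.ifft2(W * np.fft.fft2(z)))

rng = np.random.default_rng(1)
x = rng.standard_normal((32, 32))            # training-sample feature (one dimension)
ys, xs = np.mgrid[0:32, 0:32]
y = np.exp(-((ys - 16) ** 2 + (xs - 16) ** 2) / 8.0)  # Gaussian label, peak at (16, 16)

W = train_filter(x, y)                       # training (one dimension)
r = apply_filter(W, x)                       # tracking response on the same sample
peak = np.unravel_index(np.argmax(r), r.shape)
```

with a small λ the response on the training sample closely reproduces the Gaussian label, so the peak of r falls at the labeled position.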
- Step S406 Obtain a loss optimization function corresponding to the first object recognition model.
- this embodiment of the application proposes to perform joint optimization of the feature recognition loss and the tracking loss on the first object recognition model.
- the loss optimization function corresponding to the first object recognition model can be expressed as formula (4), combining the feature recognition loss function of formula (5) and the tracking loss function of formula (6): L = L_fea + L_track
- Step S407: determine the value of the loss optimization function based on the difference information between the first reference response and the second reference response, the difference information between the first test response and the second test response, and the difference information between the tracking label and the tracking response.
- the loss optimization function of the first object recognition model includes the feature recognition loss function and the tracking loss function.
- when the value of the loss optimization function is determined in step S407, the value of the feature recognition loss function and the value of the tracking loss function can be determined first, and the value of the loss optimization function is then determined from those two values.
- determining the value of the loss optimization function includes: acquiring the feature recognition loss function and determining its value based on the difference information between the first reference response and the second reference response and the difference information between the first test response and the second test response; acquiring the tracking loss function and determining its value based on the difference information between the tracking label and the tracking response; and determining the value of the loss optimization function based on the value of the feature recognition loss function and the value of the tracking loss function.
- the first reference response represents the features of the tracking object in the template image recognized by the first object recognition model, and the second reference response represents the features of the tracking object in the template image recognized by the second object recognition model; the difference information between the first reference response and the second reference response therefore reflects the difference between the features extracted from the template image by the two models. This difference can be expressed as a distance, that is, the difference information between the first reference response and the second reference response includes the distance between the first reference response and the second reference response;
- the difference information between the first test response and the second test response includes the distance between the first test response and the second test response.
- the feature recognition loss function constrains the distances between the above features so that the first object recognition model and the second object recognition model have the same or similar feature extraction performance. The feature recognition loss function thus includes two parts of loss: the feature recognition loss on the test image and the feature recognition loss on the template image.
- the loss value of the feature recognition loss on the template image is determined by the distance between the first reference response and the second reference response, and the loss value of the feature recognition loss on the test image is determined by the distance between the first test response and the second test response;
- substituting these two loss values into the feature recognition loss function gives the value of the feature recognition loss function.
- the feature recognition loss function can be expressed as formula (5): L_fea = || R₁ − R₂ ||² + || T₁ − T₂ ||², where R₁ and R₂ denote the first and second reference responses and T₁ and T₂ denote the first and second test responses
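- a feature recognition loss of this shape can be sketched as follows (an illustrative sketch that assumes squared L2 distances between the responses of the two models; the exact form of formula (5) appears in the drawings of the original filing):

```python
import numpy as np

def feature_recognition_loss(s_ref, t_ref, s_test, t_test):
    """Feature recognition loss: squared L2 distance between the first and
    second reference responses plus the squared L2 distance between the
    first and second test responses."""
    return float(np.sum((s_ref - t_ref) ** 2) + np.sum((s_test - t_test) ** 2))

t_ref, t_test = np.ones((4, 4)), np.zeros((4, 4))  # toy teacher responses
loss_perfect = feature_recognition_loss(t_ref, t_ref, t_test, t_test)
loss_off = feature_recognition_loss(t_ref + 0.5, t_ref, t_test, t_test)
```

the loss is zero when the student reproduces the teacher's responses exactly and grows as the responses drift apart.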
- the difference between the tracking label and the tracking response reflects the Euclidean distance between the tracking response and the tracking label.
- by constraining this Euclidean distance, the tracking performance of the first object recognition model is optimized; substituting the distance into the tracking loss function yields the value of the tracking loss function.
- the tracking loss function can be expressed as formula (6): L_track = || r − g ||²
- r represents the tracking response
- g represents the tracking label
- r can be obtained by formula (7), that is, by applying the correlation filter of the tracking training algorithm to the first test response as in formula (3);
- w in formula (7) represents the filter parameter of the tracking training algorithm, which can be obtained through steps S11-S13.
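- a tracking loss of this shape can be sketched as follows (illustrative; it assumes formula (6) is the squared Euclidean distance between the tracking response and the tracking label, with made-up 5×5 maps):

```python
import numpy as np

def tracking_loss(r, g):
    """Squared Euclidean distance between tracking response r and tracking label g."""
    return float(np.sum((r - g) ** 2))

g = np.zeros((5, 5)); g[2, 2] = 1.0            # label: object at the centre
r_good = g.copy()                              # response matching the label
r_bad = np.zeros((5, 5)); r_bad[0, 0] = 1.0    # response peaked at the wrong place
```

tracking_loss(r_good, g) is zero while tracking_loss(r_bad, g) is positive, so reducing the loss pulls the predicted position toward the marked position.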
- the first object recognition model includes multiple convolutional layers, and the first test response is obtained by fusing the sub-test responses produced when each convolutional layer of the first object recognition model performs recognition processing on the test image.
- for example, if the first object recognition model includes a first convolutional layer, a second convolutional layer, and a third convolutional layer, the first test response is obtained by fusing the first test sub-response corresponding to the first convolutional layer, the second test sub-response corresponding to the second convolutional layer, and the third test sub-response corresponding to the third convolutional layer.
- the first object recognition model can be optimized for multi-scale tracking loss.
- multi-scale tracking loss optimization refers to: calculating the tracking loss values of multiple convolutional layers of the first object recognition model, and then determining the first object recognition based on the tracking loss values of the multiple convolutional layers The value of the model's tracking loss function.
- determining the value of the tracking loss function based on the difference information between the tracking label and the tracking response includes: determining the tracking loss value of the first convolutional layer based on the difference information between the first tracking label corresponding to the first convolutional layer and the first tracking response obtained by tracking the first test sub-response; determining the tracking loss value of the second convolutional layer based on the difference information between the second tracking label corresponding to the second convolutional layer and the second tracking response obtained by tracking the second test sub-response; determining the tracking loss value of the third convolutional layer based on the difference information between the third tracking label corresponding to the third convolutional layer and the third tracking response obtained by tracking the third test sub-response; and determining the value of the tracking loss function from the tracking loss values corresponding to the first, second, and third convolutional layers.
- the first tracking sub-response, the second tracking sub-response, and the third tracking sub-response may be obtained by using a tracking training algorithm to track the first test sub-response, the second test sub-response, and the third test sub-response corresponding to the first, second, and third convolutional layers;
- since the features extracted by different convolutional layers differ, the first, second, and third tracking sub-responses have different resolutions.
- the tracking algorithm parameters used by the tracking training algorithm to track the test subresponses of different convolutional layers are different.
- the tracking algorithm parameters under a certain convolutional layer are obtained by training with the second object recognition model and the template image corresponding to that convolutional layer;
- for the specific training process, refer to steps S11-S13, which are not repeated here.
- the multiple convolutional layers included in the first object recognition model are connected in order; the first, second, and third convolutional layers mentioned above may be any three convolutional layers of the first object recognition model, or the first convolutional layer may be the first layer indicated by the connection order, the third convolutional layer the last layer indicated by the connection order, and the second convolutional layer any layer other than the first and the last; in that case the first convolutional layer can be called the high-level convolutional layer of the first object recognition model, the second convolutional layer the middle convolutional layer, and the third convolutional layer the low-level convolutional layer.
- l represents the l-th convolutional layer of the first object recognition model
- r_l represents the l-th tracking sub-response obtained by the tracking algorithm tracking the l-th test sub-response of the l-th convolutional layer
- g_l represents the tracking label of the tracking object in the test image corresponding to the l-th convolutional layer.
- for example, when the tracking algorithm performs tracking processing on the first test sub-response of the first convolutional layer to obtain the first tracking sub-response, the tracking algorithm parameters corresponding to the first convolutional layer are obtained by training with the second object recognition model and the template image corresponding to the first convolutional layer.
- FIG. 6 is a schematic diagram of the joint optimization of the first object recognition model provided by an embodiment of this application.
- The feature recognition loss optimization shown in the figure is as shown in formula (5), and the multi-scale tracking loss optimization is as shown in formula (8).
- 601 represents the first object recognition model
- 602 represents the second object recognition model.
- Step S408: Update the first object recognition model according to the principle of reducing the value of the loss optimization function.
- In a specific implementation, the value of the loss optimization function is continuously reduced. The value of the feature recognition loss function and the value of the tracking loss function can be derived from the value of the loss optimization function. The model parameters of the first object recognition model are then adjusted so that the distance between the first reference response and the second reference response, and the distance between the first test response and the second test response, satisfy the value of the feature recognition loss function; at the same time, the model parameters are adjusted so that the Euclidean distance between the tracking response and the tracking label satisfies the value of the tracking loss function.
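- The joint update described in step S408 can be sketched as follows. The helper names, the flat-vector representation of responses, and the weighting coefficients alpha and beta are illustrative assumptions, not the patent's formulas (5) and (8): the feature-recognition term measures the distances between the first (student) and second (teacher) responses, and the tracking term measures the distance between the tracking response and the tracking label.

```python
# Illustrative sketch (assumed form): a joint loss combining a feature
# recognition (distillation) term with a tracking term, both as squared
# Euclidean distances over flat response vectors.
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def joint_loss(first_ref, second_ref, first_test, second_test,
               tracking_response, tracking_label, alpha=1.0, beta=1.0):
    # Feature recognition loss: student vs. teacher, on template and test.
    feature_loss = (squared_distance(first_ref, second_ref)
                    + squared_distance(first_test, second_test))
    # Tracking loss: tracking response vs. tracking label.
    tracking_loss = squared_distance(tracking_response, tracking_label)
    return alpha * feature_loss + beta * tracking_loss  # value to minimize

loss = joint_loss([1.0, 2.0], [1.0, 2.5], [0.5], [0.5], [3.0], [2.0])
print(loss)  # 0.25 (feature) + 1.0 (tracking) = 1.25
```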
- The template image and the test image used in the foregoing steps S401-S408 to update the first object recognition model both include the tracking object, which ensures that the updated first object recognition model has a better ability to extract features of the tracking object. However, an image may include other background in addition to the tracking object. Therefore, in order to further improve the capability of the first object recognition model, after the first object recognition model is updated through steps S401-S408, the embodiment of the present application also uses positive and negative samples to update the first object recognition model, so that the first object recognition model has better feature discrimination ability, that is, it can better distinguish the tracking object and the background included in an image.
- Using the positive sample and the negative sample to update the first object recognition model may include: acquiring a reference image including the tracking object, and determining the positive sample and the negative sample for training based on the reference image. The reference image may be the first frame of the video sequence on which visual target tracking is to be performed using the first object recognition model; the positive sample refers to an image that includes the tracking object, and the negative sample refers to an image that does not include the tracking object; the positive sample includes the positive sample tracking label of the tracking object, and the negative sample includes the negative sample tracking label of the tracking object. The updated first object recognition model is called to perform recognition processing on the positive sample to obtain a positive sample recognition response, and on the negative sample to obtain a negative sample recognition response; the positive sample recognition response is tracked to obtain the positive sample tracking response of the tracking object in the positive sample, and the negative sample recognition response is tracked to obtain the negative sample tracking response of the tracking object in the negative sample. Based on the difference information between the positive sample tracking response and the positive sample tracking label, and the difference information between the negative sample tracking response and the negative sample tracking label, the updated first object recognition model is trained.
- The method of obtaining positive samples and negative samples based on the reference image may be: randomly cropping the reference image to obtain multiple image blocks, taking the image blocks containing the tracking object as positive samples, and taking the image blocks not containing the tracking object as negative samples.
- The positive sample tracking label corresponding to a positive sample is the true position of the tracking object in that positive sample; since a negative sample does not contain the tracking object, its negative sample tracking label is 0.
- Figure 7 shows a schematic diagram of obtaining positive samples and negative samples.
- In Figure 7, 701 is a reference image.
- The reference image is randomly cropped to obtain multiple image blocks, such as the multiple labeled boxes included in 701, where each labeled box represents an image block. Assuming that the tracking object is 702, the image blocks including 702 are selected from the multiple image blocks of 701 as positive samples, such as 703 and 704 in the figure, and the image blocks not including 702 are selected as negative samples, such as 705 and 706 in the figure.
- The positive sample tracking labels corresponding to 703 and 704 are the true positions of the tracking object in 703 and 704, as shown by the dots in the lower figures of 703 and 704. Since the negative samples 705 and 706 do not include the tracking object, their corresponding tracking labels are 0, so no dots appear.
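- The random-crop sampling of Figure 7 can be sketched as follows. The box representation and the axis-aligned overlap test are illustrative assumptions; the patent does not specify how "containing the tracking object" is decided.

```python
# Illustrative sketch (assumed criteria): split random crops of the
# reference image into positive samples (overlapping the tracking
# object's box) and negative samples (background-only crops).
def crop_overlaps(crop, obj):
    """Axis-aligned overlap test; boxes are (x0, y0, x1, y1)."""
    return not (crop[2] <= obj[0] or obj[2] <= crop[0]
                or crop[3] <= obj[1] or obj[3] <= crop[1])

def split_samples(crops, object_box):
    positives = [c for c in crops if crop_overlaps(c, object_box)]
    negatives = [c for c in crops if not crop_overlaps(c, object_box)]
    return positives, negatives

crops = [(0, 0, 4, 4), (10, 10, 14, 14), (3, 3, 7, 7)]
pos, neg = split_samples(crops, object_box=(2, 2, 5, 5))
print(len(pos), len(neg))  # 2 1
```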
- In some embodiments, training the updated first object recognition model based on the difference information between the positive sample tracking response and the positive sample tracking label, and the difference information between the negative sample tracking response and the negative sample tracking label, includes: obtaining a tracking loss optimization function; determining the value of the tracking loss optimization function based on the difference information between the positive sample tracking response and the positive sample tracking label and the difference information between the negative sample tracking response and the negative sample tracking label; and training the updated first object recognition model according to the principle of reducing the value of the tracking loss optimization function.
- The difference information between the positive sample tracking response and the positive sample tracking label includes the Euclidean distance between the position of the tracking object obtained when the first object recognition model performs tracking processing on the positive sample and the true position of the tracking object in the positive sample.
- Similarly, the difference information between the negative sample tracking response and the negative sample tracking label includes the Euclidean distance between the position of the tracking object obtained when the first object recognition model performs tracking processing on the negative sample and the negative sample tracking label. Substituting the above two into the tracking loss optimization function yields the value of the tracking loss optimization function, and the updated first object recognition model is then updated again according to the principle of reducing that value. By repeating the tracking loss optimization steps, the update of the updated first object recognition model is completed.
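- The positive/negative tracking loss described above can be sketched as follows; the function names and vector representation are illustrative assumptions. The key point from the text is that the negative sample tracking label is 0, so the loss penalizes any nonzero tracking response on background-only crops.

```python
# Illustrative sketch (assumed form): tracking loss over positive and
# negative samples. Negative samples have an all-zero tracking label.
def l2_sq(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sample_tracking_loss(pos_responses, pos_labels, neg_responses):
    # Positive term: response vs. true-position label.
    loss = sum(l2_sq(r, g) for r, g in zip(pos_responses, pos_labels))
    # Negative term: response vs. an all-zero label.
    loss += sum(l2_sq(r, [0.0] * len(r)) for r in neg_responses)
    return loss

loss = sample_tracking_loss([[1.0, 0.0]], [[1.0, 0.0]], [[0.2, 0.0]])
print(round(loss, 2))  # 0.04
```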
- multi-scale tracking optimization may also be adopted.
- the first object recognition model includes a first convolutional layer, a second convolutional layer, and a third convolutional layer
- The positive sample tracking label includes the first positive sample tracking label corresponding to the first convolutional layer, the second positive sample tracking label corresponding to the second convolutional layer, and the third positive sample tracking label corresponding to the third convolutional layer.
- The positive sample recognition response is obtained by fusing the first sub-recognition response of the positive sample corresponding to the first convolutional layer, the second sub-recognition response of the positive sample corresponding to the second convolutional layer, and the third sub-recognition response of the positive sample corresponding to the third convolutional layer.
- Similarly, the negative sample recognition response is obtained by fusing the first, second, and third sub-recognition responses of the negative sample corresponding to the first, second, and third convolutional layers.
- The positive sample tracking response may include a first positive sample tracking response obtained by using a tracking training algorithm to track the first sub-recognition response of the positive sample, a second positive sample tracking response obtained by tracking the second sub-recognition response of the positive sample, and a third positive sample tracking response obtained by tracking the third sub-recognition response of the positive sample.
- The negative sample tracking response may include the first negative sample sub-tracking response obtained when the tracking training algorithm tracks the first negative sample recognition response, the second negative sample sub-tracking response obtained when it tracks the second negative sample recognition response, and the third negative sample sub-tracking response obtained when it tracks the third negative sample recognition response.
- The implementation of the multi-scale tracking loss optimization may be: based on the difference information between the first positive sample tracking response and the first positive sample tracking label, and the difference information between the first negative sample sub-tracking response and the negative sample tracking label, determine the value of the tracking loss optimization function of the first convolutional layer; based on the difference information between the second positive sample tracking response and the second positive sample tracking label, and the difference information between the second negative sample sub-tracking response and the negative sample tracking label, determine the value of the tracking loss optimization function of the second convolutional layer; based on the difference information between the third positive sample tracking response and the third positive sample tracking label, and the difference information between the third negative sample sub-tracking response and the negative sample tracking label, determine the value of the tracking loss optimization function of the third convolutional layer; finally, determine the value of the tracking loss optimization function according to the values of the tracking loss optimization functions of the first, second, and third convolutional layers. It is assumed that the multi-scale tracking loss optimization function has the following parameters:
- g_l indicates the positive sample tracking label corresponding to the positive sample under the l-th convolutional layer;
- w_l represents the tracking algorithm parameter corresponding to the l-th convolutional layer.
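- Combining the per-layer terms with the definitions of g_l and w_l above, the multi-scale tracking loss over positive and negative samples plausibly takes the following form, where r_l^{+}(w_l) and r_l^{-}(w_l) denote the tracking responses produced for the positive and negative samples under the l-th convolutional layer; this is a hedged reconstruction (negative samples have an all-zero tracking label), not the patent's verbatim formula:

```latex
L_{\mathrm{track}} = \sum_{l=1}^{3} \Big( \big\lVert r_l^{+}(w_l) - g_l \big\rVert_2^2 + \big\lVert r_l^{-}(w_l) - \mathbf{0} \big\rVert_2^2 \Big)
```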
- the tracking algorithm parameters corresponding to different convolutional layers are trained by the second object recognition model and the corresponding positive samples under the corresponding convolutional layer.
- the corresponding positive samples under different convolutional layers have the same size and different resolution.
- The first object recognition model, combined with a tracking algorithm, can be applied in scene analysis, monitoring equipment, human-computer interaction, and other scenarios that require visual target tracking.
- The implementation of combining the first object recognition model with a tracking algorithm in a visual target tracking scenario may include: acquiring an image to be processed, and determining a predicted tracking object in the image to be processed according to the annotation information of the tracking object in the reference image. The image to be processed may be an image other than the first frame in the video sequence on which visual target tracking is to be performed using the first object recognition model. The updated first object recognition model is called to perform recognition processing on the tracking object in the reference image to obtain a first recognition feature, and to perform recognition processing on the predicted tracking object in the image to be processed to obtain a second recognition feature. Based on the first recognition feature and the second recognition feature, a target feature for tracking processing is determined, and a tracking algorithm is used to track the target feature to obtain the position information of the tracking object in the image to be processed.
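- The tracking step above can be sketched as a cross-correlation between the template feature from the reference image and the search feature from the image to be processed, with the peak of the response map giving the predicted position. The patent does not fix a specific tracking algorithm, so this 1-D correlation is purely illustrative.

```python
# Illustrative sketch (assumed algorithm): locate the tracking object by
# cross-correlating the template feature with the search feature and
# taking the position of the maximum response.
def correlate_1d(template, search):
    """Valid cross-correlation of two 1-D feature vectors."""
    n, m = len(template), len(search)
    return [sum(template[j] * search[i + j] for j in range(n))
            for i in range(m - n + 1)]

def locate(template_feature, search_feature):
    response = correlate_1d(template_feature, search_feature)
    return max(range(len(response)), key=response.__getitem__)

pos = locate([1.0, 2.0], [0.0, 1.0, 2.0, 0.5, 0.0])
print(pos)  # 1
```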
- In summary, in the embodiment of this application, the heavyweight second object recognition model is used to train the lightweight first object recognition model. The first object recognition model and the second object recognition model are respectively called to perform recognition processing on the features of the tracking object in the template image used for training to obtain the first reference response and the second reference response, and are then called to perform recognition processing on the features of the tracking object in the test image used for training. The first object recognition model is optimized according to the loss in feature extraction performance and the loss in tracking performance, so that the updated lightweight first object recognition model has feature extraction performance identical or similar to that of the second object recognition model together with a faster feature extraction speed, and the features extracted by the first object recognition model are better suited to visual target tracking scenarios, thereby improving the accuracy and real-time performance of visual target tracking.
- An embodiment of the present application also discloses a model training device, which can execute the methods shown in FIG. 2 and FIG. 4; please refer to FIG. 8.
- The model training device can run the following units:
- the acquiring unit 801 is configured to acquire a training template image and a test image, the template image and the test image both include a tracking object, the test image includes a tracking label of the tracking object, and the tracking label is used to indicate the Track the marked position of the object in the test image;
- the processing unit 802 is configured to call a first object recognition model to perform recognition processing on the characteristics of the tracking object in the template image, obtain a first reference response, and call a second object recognition model to analyze all the features in the template image. Performing recognition processing on the characteristics of the tracking object to obtain a second reference response;
- the processing unit 802 is further configured to call the first object recognition model to perform recognition processing on the characteristics of the tracked object in the test image, obtain a first test response, and call the second object recognition model to Performing identification processing on the feature of the tracking object in the test image to obtain a second test response;
- the processing unit 802 is further configured to perform tracking processing on the first test response to obtain a tracking response of the tracking object, where the tracking response is used to indicate the tracking position of the tracking object in the test image;
- the update unit 803 is configured to update the first object recognition model based on the difference information between the first reference response and the second reference response, the difference information between the first test response and the second test response, and the difference information between the tracking label and the tracking response.
- the acquisition unit 801 is further configured to acquire a second object recognition model; the processing unit 802 is further configured to crop the second object recognition model to obtain the first object recognition model.
- In some embodiments, when updating the first object recognition model based on the difference information between the first reference response and the second reference response, the difference information between the first test response and the second test response, and the difference information between the tracking label and the tracking response, the update unit 803 performs the following operations: obtaining the loss optimization function corresponding to the first object recognition model; determining the value of the loss optimization function based on the difference information between the first reference response and the second reference response, the difference information between the first test response and the second test response, and the difference information between the tracking label and the tracking response; and updating the first object recognition model according to the principle of reducing the value of the loss optimization function.
- the loss optimization function includes a feature recognition loss function and a tracking loss function
- When determining the value of the loss optimization function based on the difference information between the first reference response and the second reference response, the difference information between the first test response and the second test response, and the difference information between the tracking label and the tracking response, the update unit 803 performs the following operations: obtaining the feature recognition loss function, and determining the value of the feature recognition loss function based on the difference information between the first reference response and the second reference response and the difference information between the first test response and the second test response; obtaining the tracking loss function, and determining the value of the tracking loss function based on the difference information between the tracking label and the tracking response; and determining the value of the loss optimization function based on the value of the feature recognition loss function and the value of the tracking loss function.
- the first object recognition model includes a first convolutional layer, a second convolutional layer, and a third convolutional layer
- the first test response is obtained by fusing the first test sub-response corresponding to the first convolutional layer, the second test sub-response corresponding to the second convolutional layer, and the third test sub-response corresponding to the third convolutional layer; when determining the value of the tracking loss function based on the difference information between the tracking label and the tracking response, the update unit 803 performs the following operations:
- In some embodiments, the first object recognition model includes a plurality of convolutional layers connected in a connection order; the first convolutional layer is the first convolutional layer indicated by the connection order, the third convolutional layer is the last convolutional layer indicated by the connection order, and the second convolutional layer is any convolutional layer other than the first convolutional layer and the last convolutional layer.
- the acquiring unit 801 is further configured to acquire a reference image including the tracking object, and to determine a positive sample and a negative sample for training based on the reference image, where the positive sample refers to an image that includes the tracking object, the negative sample refers to an image that does not include the tracking object, the positive sample includes the positive sample tracking label of the tracking object, the negative sample includes the negative sample tracking label of the tracking object, and the reference image includes the annotation information of the tracking object;
- the processing unit 802 is further configured to call the updated first object recognition model to perform recognition processing on the positive sample to obtain a positive sample recognition response, and call the updated first object recognition model to Recognize the negative sample, and get the negative sample recognition response;
- the processing unit 802 is further configured to track the positive sample recognition response to obtain the positive sample tracking response of the tracking object in the positive sample, and to track the negative sample recognition response to obtain the negative sample tracking response of the tracking object in the negative sample;
- the update unit 803 is further configured to train based on the difference information between the positive sample tracking response and the positive sample tracking label, and the difference information between the negative sample tracking response and the negative sample tracking label The updated first object recognition model.
- the update unit 803 is based on the difference information between the positive sample tracking response and the positive sample tracking label, and the difference between the negative sample tracking response and the negative sample tracking label Information, when training the updated first object recognition model, perform the following steps:
- obtaining a tracking loss optimization function; determining the value of the tracking loss optimization function based on the difference information between the positive sample tracking response and the positive sample tracking label and the difference information between the negative sample tracking response and the negative sample tracking label; and updating the updated first object recognition model according to the principle of reducing the value of the tracking loss optimization function.
- the acquiring unit 801 is further configured to acquire an image to be processed; the processing unit 802 is further configured to determine the predicted tracking object in the image to be processed according to the annotation information of the tracking object in the reference image;
- the processing unit 802 is also used to call the updated first object recognition model to perform recognition processing on the tracking object in the reference image to obtain the first recognition feature, and to call the updated first object recognition model to perform recognition processing on the predicted tracking object in the image to be processed to obtain the second recognition feature;
- the processing unit 802 is also used to determine, based on the first recognition feature and the second recognition feature, a target feature used for tracking processing, and to use a tracking algorithm to track the target feature to obtain the position information of the tracking object in the image to be processed.
- each step involved in the method shown in FIG. 2 or FIG. 4 may be executed by each unit in the model training device shown in FIG. 8.
- step S201 shown in FIG. 2 may be executed by the acquiring unit 801 shown in FIG. 8
- steps S202-S204 may be executed by the processing unit 802 shown in FIG. 8
- step S205 may be executed by the updating unit 803 shown in FIG. 8.
- steps S401, S402, and S406 shown in FIG. 4 can be executed by the acquiring unit 801 shown in FIG. 8; steps S403-S405 and S407 can be executed by the processing unit 802 shown in FIG. 8; and step S408 can be executed by the update unit 803 shown in FIG. 8.
- Each unit in the model training device shown in FIG. 8 may be separately or wholly combined into one or several other units, or one or more of the units may be further split into multiple functionally smaller units; this can achieve the same operation without affecting the realization of the technical effects of the embodiments of the present application.
- The above-mentioned units are divided based on logical functions; in practical applications, the function of one unit may be realized by multiple units, or the functions of multiple units may be realized by one unit. In other embodiments of the present application, the model training device may also include other units; in actual applications, these functions may be implemented with the assistance of other units and may be implemented by multiple units in cooperation.
- In other embodiments, the model training device shown in FIG. 8 may be constructed by running a computer program (including program code) capable of executing the steps of the corresponding method shown in FIG. 2 or FIG. 4 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access memory (RAM), and a read-only memory (ROM), thereby implementing the model training method of the embodiments of the present application. The computer program may be recorded on, for example, a computer-readable recording medium, loaded into the aforementioned computing device via the computer-readable recording medium, and run in it.
- It can be seen from the above that in the embodiment of this application, the first object recognition model and the second object recognition model are first called separately to perform recognition processing on the features of the tracking object in the template image to obtain the first reference response and the second reference response; the first object recognition model and the second object recognition model are then called to perform recognition processing on the features of the tracking object in the test image to obtain the first test response and the second test response. Further, the first test response is tracked to obtain the tracking response of the tracking object. The loss of feature extraction performance of the first object recognition model compared to the second object recognition model is determined from the difference information between the first reference response and the second reference response and the difference information between the first test response and the second test response, and the loss of tracking performance of the first object recognition model is determined based on the difference information between the tracking label and the tracking response.
- an embodiment of the present application also provides a computing device, such as the terminal shown in FIG. 9.
- the terminal includes at least a processor 901, an input device 902, an output device 903, and a computer storage medium 904.
- the input device 902 may also include a camera component, which can be used to obtain template images and/or test images, as well as reference images and/or images to be processed; the camera component may be a component configured on the terminal at the factory, or an external component connected to the terminal.
- the terminal may also be connected to other devices to receive template images and/or test images sent by other devices, or to receive reference images and/or images to be processed sent by other devices.
- the computer storage medium 904 may be stored in the memory of the terminal.
- the computer storage medium 904 is used to store a computer program.
- the computer program includes program instructions.
- the processor 901 is used to execute the program instructions stored in the computer storage medium 904.
- the processor 901 (Central Processing Unit, CPU) is the computing core and control core of the terminal.
- The processor 901 described in this embodiment of the application may be used to execute: obtaining a template image and a test image for training, where both the template image and the test image include the tracking object, the test image includes a tracking label of the tracking object, and the tracking label is used to indicate the marked position of the tracking object in the test image; calling the first object recognition model to perform recognition processing on the features of the tracking object in the template image to obtain a first reference response, and calling the second object recognition model to perform recognition processing on the features of the tracking object in the template image to obtain a second reference response; calling the first object recognition model to perform recognition processing on the features of the tracking object in the test image to obtain a first test response, and calling the second object recognition model to perform recognition processing on the features of the tracking object in the test image to obtain a second test response; performing tracking processing on the first test response to obtain a tracking response of the tracking object, the tracking response being used to indicate the tracking position of the tracking object in the test image; and updating the first object recognition model based on the difference information between the first reference response and the second reference response, the difference information between the first test response and the second test response, and the difference information between the tracking label and the tracking response.
- the embodiment of the present application also provides a computer storage medium (Memory).
- the computer storage medium is a memory device in a terminal for storing programs and data. It can be understood that the computer storage medium herein may include a built-in storage medium in the terminal, and of course, may also include an extended storage medium supported by the terminal.
- the computer storage medium provides storage space, and the storage space stores the operating system of the terminal.
- one or more instructions suitable for being loaded and executed by the processor 901 are stored in the storage space, and these instructions may be one or more computer programs (including program codes).
- the computer storage medium here may be a high-speed RAM memory, or a non-volatile memory, such as at least one disk memory; in an embodiment of the present application, it may also be at least one computer storage medium located remotely from the aforementioned processor.
- The processor 901 can load and execute one or more instructions stored in the computer storage medium to implement the corresponding steps of the method in the above model training embodiment; in a specific implementation, one or more instructions in the computer storage medium are loaded by the processor 901 to execute the following steps:
- obtaining a template image and a test image for training, where both the template image and the test image include a tracking object, the test image includes a tracking label of the tracking object, and the tracking label is used to indicate the marked position of the tracking object in the test image; calling the first object recognition model to perform recognition processing on the features of the tracking object in the template image to obtain a first reference response, and calling the second object recognition model to perform recognition processing on the features of the tracking object in the template image to obtain a second reference response; calling the first object recognition model to perform recognition processing on the features of the tracking object in the test image to obtain a first test response, and calling the second object recognition model to perform recognition processing on the features of the tracking object in the test image to obtain a second test response; performing tracking processing on the first test response to obtain a tracking response of the tracking object, where the tracking response is used to indicate the tracking position of the tracking object in the test image; and updating the first object recognition model based on the difference information between the first reference response and the second reference response, the difference information between the first test response and the second test response, and the difference information between the tracking label and the tracking response.
- one or more instructions in the computer storage medium are loaded by the processor 901 and the following steps are also performed: obtaining a second object recognition model; cropping the second object recognition model to obtain the first object recognition model .
- When updating the first object recognition model based on the difference information between the first reference response and the second reference response, the difference information between the first test response and the second test response, and the difference information between the tracking label and the tracking response, the processor 901 performs the following operations:
- obtaining the loss optimization function corresponding to the first object recognition model; determining the value of the loss optimization function based on the difference information between the first reference response and the second reference response, the difference information between the first test response and the second test response, and the difference information between the tracking label and the tracking response; and updating the first object recognition model according to the principle of reducing the value of the loss optimization function.
- the loss optimization function includes a feature recognition loss function and a tracking loss function;
- when determining the value of the loss optimization function based on the difference information between the first reference response and the second reference response, the difference information between the first test response and the second test response, and the difference information between the tracking label and the tracking response, the processor 901 performs the following operations:
- obtaining the feature recognition loss function, and determining its value based on the difference information between the first reference response and the second reference response and the difference information between the first test response and the second test response; obtaining the tracking loss function, and determining its value based on the difference information between the tracking label and the tracking response; and determining the value of the loss optimization function based on the value of the feature recognition loss function and the value of the tracking loss function.
- the first object recognition model includes a first convolutional layer, a second convolutional layer, and a third convolutional layer;
- the first test response is obtained by fusing a first test sub-response corresponding to the first convolutional layer, a second test sub-response corresponding to the second convolutional layer, and a third test sub-response corresponding to the third convolutional layer; when determining the value of the tracking loss function based on the difference information between the tracking label and the tracking response, the processor 901 performs the following operations:
- the first object recognition model includes a plurality of convolutional layers connected in a connection order; the first convolutional layer is the first convolutional layer indicated by the connection order, the third convolutional layer is the last convolutional layer indicated by the connection order, and the second convolutional layer is any convolutional layer other than the first convolutional layer and the last convolutional layer.
- one or more instructions in the computer storage medium are loaded by the processor 901 to further perform the following steps:
- obtaining a reference image that includes the tracking object, and determining a positive sample and a negative sample for training based on the reference image; the positive sample is an image that includes the tracking object, and the negative sample is an image that does not include the tracking object; the positive sample includes a positive-sample tracking label of the tracking object, the negative sample includes a negative-sample tracking label of the tracking object, and the reference image includes annotation information of the tracking object;
- invoking the updated first object recognition model to perform recognition processing on the positive sample to obtain a positive-sample recognition response, and invoking the updated first object recognition model to perform recognition processing on the negative sample to obtain a negative-sample recognition response; performing tracking processing on the positive-sample recognition response to obtain a positive-sample tracking response to the tracking object in the positive sample, and performing tracking processing on the negative-sample recognition response to obtain a negative-sample tracking response to the tracking object in the negative sample; and training the updated first object recognition model based on the difference information between the positive-sample tracking response and the positive-sample tracking label and the difference information between the negative-sample tracking response and the negative-sample tracking label.
- when training the updated first object recognition model based on the difference information between the positive-sample tracking response and the positive-sample tracking label and the difference information between the negative-sample tracking response and the negative-sample tracking label, the processor 901 performs the following operations:
- obtaining a tracking loss optimization function; determining the value of the tracking loss optimization function based on the difference information between the positive-sample tracking response and the positive-sample tracking label and the difference information between the negative-sample tracking response and the negative-sample tracking label; and updating the updated first object recognition model according to the principle of reducing the value of the tracking loss optimization function.
- one or more instructions in the computer storage medium are loaded by the processor 901 to further perform the following steps:
- obtaining an image to be processed, and determining a predicted tracking object included in the image to be processed according to the annotation information of the tracking object in the reference image; invoking the updated first object recognition model to perform recognition processing on the tracking object in the reference image to obtain a first recognition feature; invoking the updated first object recognition model to perform recognition processing on the predicted tracking object in the image to be processed to obtain a second recognition feature;
- determining, based on the first recognition feature and the second recognition feature, a target feature for tracking processing, and performing tracking processing on the target feature with a tracking algorithm to obtain position information of the tracking object in the image to be processed.
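The fragments above state that the first (lightweight) object recognition model is obtained by cropping (pruning) the second model, but do not give the pruning rule. A minimal sketch under the assumption of uniform channel pruning (the `keep_ratio` parameter and the rounding rule are illustrative, not taken from the patent):

```python
def prune_channels(layer_widths, keep_ratio=0.5):
    """Shrink each convolutional layer's channel count to derive a
    lightweight student model configuration from a larger teacher model.
    keep_ratio is a hypothetical knob; the patent does not specify one."""
    return [max(1, int(round(w * keep_ratio))) for w in layer_widths]

# e.g. a teacher with channel widths [64, 128, 256] yields a student
# configuration [32, 64, 128] at keep_ratio=0.5
```

The pruned configuration would then be used to build the first model, which the distillation losses below align with the original second model.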
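The loss optimization function described above combines a feature recognition loss, driven by the differences between the first and second reference responses and between the first and second test responses, with a tracking loss driven by the difference between the tracking response and the tracking label. A NumPy sketch of one plausible instantiation (the mean-squared-error form of each term and the weighting factor `lam` are assumptions; the patent only specifies which difference information feeds each term):

```python
import numpy as np

def feature_recognition_loss(ref1, ref2, test1, test2):
    # student-vs-teacher response differences on the template (reference)
    # image and on the test image
    return np.mean((ref1 - ref2) ** 2) + np.mean((test1 - test2) ** 2)

def tracking_loss(tracking_response, tracking_label):
    # difference between the predicted tracking response and the
    # annotated tracking label
    return np.mean((tracking_response - tracking_label) ** 2)

def loss_optimization(ref1, ref2, test1, test2, response, label, lam=1.0):
    # total objective whose value is reduced when updating the first model
    return feature_recognition_loss(ref1, ref2, test1, test2) \
        + lam * tracking_loss(response, label)
```

Updating the first model "according to the principle of reducing the value of the loss optimization function" then amounts to gradient descent on this scalar.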
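For the multi-layer variant (see claim 5), each convolutional layer's test sub-response is compared against a tracking label rendered at that layer's resolution, and the per-layer tracking loss values are fused into a single tracking loss value. A sketch assuming per-layer mean squared error and a plain sum as the fusion (both are assumptions; the patent only says the per-layer values are fused):

```python
import numpy as np

def layer_tracking_loss(sub_response, layer_label):
    # tracking loss of one convolutional layer's sub-response against
    # the tracking label at that layer's resolution
    return np.mean((sub_response - layer_label) ** 2)

def fused_tracking_loss(sub_responses, layer_labels):
    # fuse the per-layer tracking loss values into one value; the
    # sub-responses may have different resolutions per layer
    return sum(layer_tracking_loss(r, l)
               for r, l in zip(sub_responses, layer_labels))
```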
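At inference time, the recognition features of the tracking object in the reference image and of the predicted tracking object in the image to be processed yield a target feature that a tracking algorithm localizes. The patent does not name the algorithm; sliding-window cross-correlation, as commonly used in Siamese-style trackers, is one possible choice, sketched here on 2-D feature maps:

```python
import numpy as np

def track_position(template_feat, search_feat):
    """Slide the template feature map over the search feature map and
    return the (row, col) offset with the highest correlation score."""
    th, tw = template_feat.shape
    sh, sw = search_feat.shape
    best_score, best_pos = -np.inf, (0, 0)
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            # correlation score of the template at this offset
            score = float(np.sum(template_feat *
                                 search_feat[y:y + th, x:x + tw]))
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_pos
```

The returned offset plays the role of the "position information of the tracking object in the image to be processed".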
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Biodiversity & Conservation Biology (AREA)
- Image Analysis (AREA)
Abstract
Description
Claims (12)
- A model training method, performed by a computing device, comprising: obtaining a template image and a test image for training, the template image and the test image both including a tracking object, the test image including a tracking label of the tracking object, and the tracking label being used to indicate an annotated position of the tracking object in the test image; invoking a first object recognition model to perform recognition processing on features of the tracking object in the template image to obtain a first reference response, and invoking a second object recognition model to perform recognition processing on the features of the tracking object in the template image to obtain a second reference response; invoking the first object recognition model to perform recognition processing on features of the tracking object in the test image to obtain a first test response, and invoking the second object recognition model to perform recognition processing on the features of the tracking object in the test image to obtain a second test response; performing tracking processing on the first test response to obtain a tracking response of the tracking object, the tracking response being used to indicate a tracking position of the tracking object in the test image; and updating the first object recognition model based on difference information between the first reference response and the second reference response, difference information between the first test response and the second test response, and difference information between the tracking label and the tracking response.
- The method according to claim 1, further comprising: obtaining the second object recognition model; and pruning the second object recognition model to obtain the first object recognition model.
- The method according to claim 1, wherein updating the first object recognition model based on the difference information between the first reference response and the second reference response, the difference information between the first test response and the second test response, and the difference information between the tracking label and the tracking response comprises: obtaining a loss optimization function corresponding to the first object recognition model; determining a value of the loss optimization function based on the difference information between the first reference response and the second reference response, the difference information between the first test response and the second test response, and the difference information between the tracking label and the tracking response; and updating the first object recognition model according to a principle of reducing the value of the loss optimization function.
- The method according to claim 3, wherein the loss optimization function includes a feature recognition loss function and a tracking loss function, and determining the value of the loss optimization function based on the difference information between the first reference response and the second reference response, the difference information between the first test response and the second test response, and the difference information between the tracking label and the tracking response comprises: obtaining the feature recognition loss function, and determining a value of the feature recognition loss function based on the difference information between the first reference response and the second reference response and the difference information between the first test response and the second test response; obtaining the tracking loss function, and determining a value of the tracking loss function based on the difference information between the tracking label and the tracking response; and determining the value of the loss optimization function based on the value of the feature recognition loss function and the value of the tracking loss function.
- The method according to claim 4, wherein the first object recognition model includes a first convolutional layer, a second convolutional layer, and a third convolutional layer, and the first test response is obtained by fusing a first test sub-response corresponding to the first convolutional layer, a second test sub-response corresponding to the second convolutional layer, and a third test sub-response corresponding to the third convolutional layer; and determining the value of the tracking loss function based on the difference information between the tracking label and the tracking response comprises: determining a tracking loss value of the first convolutional layer based on difference information between a first tracking label corresponding to the first convolutional layer and a first tracking response obtained by performing tracking processing on the first test sub-response; determining a tracking loss value of the second convolutional layer based on difference information between a second tracking label corresponding to the second convolutional layer and a second tracking response obtained by performing tracking processing on the second test sub-response; determining a tracking loss value of the third convolutional layer based on difference information between a third tracking label corresponding to the third convolutional layer and a third tracking response obtained by performing tracking processing on the third test sub-response; and fusing the tracking loss value corresponding to the first convolutional layer, the tracking loss value corresponding to the second convolutional layer, and the tracking loss value corresponding to the third convolutional layer to obtain the value of the tracking loss function, wherein the first tracking response, the second tracking response, and the third tracking response have different resolutions.
- The method according to claim 5, wherein the first object recognition model includes a plurality of convolutional layers connected in a connection order, the first convolutional layer is the first convolutional layer indicated by the connection order, the third convolutional layer is the last convolutional layer indicated by the connection order, and the second convolutional layer is any convolutional layer other than the first convolutional layer and the last convolutional layer.
- The method according to claim 1, further comprising: obtaining a reference image that includes the tracking object, and determining, based on the reference image, a positive sample and a negative sample for training, the positive sample being an image that includes the tracking object, the negative sample being an image that does not include the tracking object, the positive sample including a positive-sample tracking label of the tracking object, the negative sample including a negative-sample tracking label of the tracking object, and the reference image including annotation information of the tracking object; invoking the updated first object recognition model to perform recognition processing on the positive sample to obtain a positive-sample recognition response, and invoking the updated first object recognition model to perform recognition processing on the negative sample to obtain a negative-sample recognition response; performing tracking processing on the positive-sample recognition response to obtain a positive-sample tracking response to the tracking object in the positive sample, and performing tracking processing on the negative-sample recognition response to obtain a negative-sample tracking response to the tracking object in the negative sample; and training the updated first object recognition model based on difference information between the positive-sample tracking response and the positive-sample tracking label and difference information between the negative-sample tracking response and the negative-sample tracking label.
- The method according to claim 7, wherein training the updated first object recognition model based on the difference information between the positive-sample tracking response and the positive-sample tracking label and the difference information between the negative-sample tracking response and the negative-sample tracking label comprises: obtaining a tracking loss optimization function; determining a value of the tracking loss optimization function based on the difference information between the positive-sample tracking response and the positive-sample tracking label and the difference information between the negative-sample tracking response and the negative-sample tracking label; and updating the updated first object recognition model according to a principle of reducing the value of the tracking loss optimization function.
- The method according to claim 7 or 8, further comprising: obtaining an image to be processed, and determining, according to the annotation information of the tracking object in the reference image, a predicted tracking object included in the image to be processed; invoking the updated first object recognition model to perform recognition processing on the tracking object in the reference image to obtain a first recognition feature; invoking the updated first object recognition model to perform recognition processing on the predicted tracking object in the image to be processed to obtain a second recognition feature; and determining, based on the first recognition feature and the second recognition feature, a target feature for tracking processing, and performing tracking processing on the target feature by using a tracking algorithm to obtain position information of the tracking object in the image to be processed.
- A model training apparatus, comprising: an obtaining unit configured to obtain a template image and a test image for training, the template image and the test image both including a tracking object, the test image including a tracking label of the tracking object, and the tracking label being used to indicate an annotated position of the tracking object in the test image; a processing unit configured to invoke a first object recognition model to perform recognition processing on features of the tracking object in the template image to obtain a first reference response, and invoke a second object recognition model to perform recognition processing on the features of the tracking object in the template image to obtain a second reference response; invoke the first object recognition model to perform recognition processing on features of the tracking object in the test image to obtain a first test response, and invoke the second object recognition model to perform recognition processing on the features of the tracking object in the test image to obtain a second test response; and perform tracking processing on the first test response to obtain a tracking response of the tracking object, the tracking response being used to indicate a tracking position of the tracking object in the test image; and an updating unit configured to update the first object recognition model based on difference information between the first reference response and the second reference response, difference information between the first test response and the second test response, and difference information between the tracking label and the tracking response.
- A terminal, comprising an input device and an output device, and further comprising: a processor configured to implement one or more instructions; and a computer storage medium storing one or more instructions, the one or more instructions being configured to be loaded by the processor to perform the model training method according to any one of claims 1 to 9.
- A computer storage medium storing computer program instructions which, when executed by a processor, are used to perform the model training method according to any one of claims 1 to 9.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP20805250.6A EP3971772B1 (en) | 2019-05-13 | 2020-04-07 | Model training method and apparatus, and terminal and storage medium |
KR1020217025275A KR102591961B1 (ko) | 2019-05-13 | 2020-04-07 | 모델 트레이닝 방법 및 장치, 및 이를 위한 단말 및 저장 매체 |
JP2021536356A JP7273157B2 (ja) | 2019-05-13 | 2020-04-07 | モデル訓練方法、装置、端末及びプログラム |
US17/369,833 US11704817B2 (en) | 2019-05-13 | 2021-07-07 | Method, apparatus, terminal, and storage medium for training model |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
- CN201910397253.XA CN110147836B (zh) | 2019-05-13 | 2019-05-13 | Model training method, apparatus, terminal and storage medium |
CN201910397253.X | 2019-05-13 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/369,833 Continuation US11704817B2 (en) | 2019-05-13 | 2021-07-07 | Method, apparatus, terminal, and storage medium for training model |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020228446A1 true WO2020228446A1 (zh) | 2020-11-19 |
Family
ID=67595324
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
- PCT/CN2020/083523 WO2020228446A1 (zh) | 2019-05-13 | 2020-04-07 | Model training method, apparatus, terminal and storage medium |
Country Status (6)
Country | Link |
---|---|
US (1) | US11704817B2 (zh) |
EP (1) | EP3971772B1 (zh) |
JP (1) | JP7273157B2 (zh) |
KR (1) | KR102591961B1 (zh) |
CN (1) | CN110147836B (zh) |
WO (1) | WO2020228446A1 (zh) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN112967315A (zh) * | 2021-03-02 | 2021-06-15 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Target tracking method and apparatus, and electronic device |
- CN113469977A (zh) * | 2021-07-06 | 2021-10-01 | Zhejiang Linyan Precision Technology Co., Ltd. | Defect detection apparatus and method based on a distillation learning mechanism, and storage medium |
- CN113838093A (zh) * | 2021-09-24 | 2021-12-24 | Chongqing University of Posts and Telecommunications | Adaptive multi-feature fusion tracking method based on spatially regularized correlation filters |
- CN114463829A (zh) * | 2022-04-14 | 2022-05-10 | Hefei Dilusense Technology Co., Ltd. | Model training method, kinship recognition method, electronic device and storage medium |
- CN114693995A (zh) * | 2022-04-14 | 2022-07-01 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Model training method applied to image processing, image processing method and device |
- CN115455306A (zh) * | 2022-11-11 | 2022-12-09 | Tencent Technology (Shenzhen) Co., Ltd. | Push model training and information push methods, apparatus and storage medium |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- KR20180027887A (ko) * | 2016-09-07 | 2018-03-15 | Samsung Electronics Co., Ltd. | Recognition apparatus based on a neural network and training method of a neural network |
- CN110147836B (zh) * | 2019-05-13 | 2021-07-02 | Tencent Technology (Shenzhen) Co., Ltd. | Model training method, apparatus, terminal and storage medium |
- KR20210061839A (ko) * | 2019-11-20 | 2021-05-28 | Samsung Electronics Co., Ltd. | Electronic apparatus and control method thereof |
- CN111401192B (zh) * | 2020-03-10 | 2023-07-18 | Shenzhen Tencent Computer Systems Co., Ltd. | Artificial-intelligence-based model training method and related apparatus |
- JP7297705B2 (ja) * | 2020-03-18 | 2023-06-26 | Toshiba Corporation | Processing device, processing method, learning device and program |
US11599742B2 (en) * | 2020-04-22 | 2023-03-07 | Dell Products L.P. | Dynamic image recognition and training using data center resources and data |
- CN111738436B (zh) * | 2020-06-28 | 2023-07-18 | Zhongshan Institute of University of Electronic Science and Technology of China | Model distillation method and apparatus, electronic device and storage medium |
- CN111767711B (zh) * | 2020-09-02 | 2020-12-08 | Zhijiang Lab | Compression method and platform for pre-trained language models based on knowledge distillation |
- CN113515999A (zh) * | 2021-01-13 | 2021-10-19 | Tencent Technology (Shenzhen) Co., Ltd. | Training method, apparatus and device for an image processing model, and readable storage medium |
- CN114245206B (zh) * | 2022-02-23 | 2022-07-15 | Alibaba Damo Academy (Hangzhou) Technology Co., Ltd. | Video processing method and apparatus |
- CN114359563B (zh) * | 2022-03-21 | 2022-06-28 | Shenzhen SmartMore Information Technology Co., Ltd. | Model training method and apparatus, computer device and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN103149939A (zh) * | 2013-02-26 | 2013-06-12 | Beihang University | Vision-based dynamic target tracking and positioning method for unmanned aerial vehicles |
US20190021426A1 (en) * | 2017-07-20 | 2019-01-24 | Siege Sports, LLC | Highly Custom and Scalable Design System and Method for Articles of Manufacture |
- CN109344742A (zh) * | 2018-09-14 | 2019-02-15 | Tencent Technology (Shenzhen) Co., Ltd. | Feature point positioning method and apparatus, storage medium and computer device |
- CN110147836A (zh) * | 2019-05-13 | 2019-08-20 | Tencent Technology (Shenzhen) Co., Ltd. | Model training method, apparatus, terminal and storage medium |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- KR101934325B1 (ko) * | 2014-12-10 | 2019-01-03 | Samsung SDS Co., Ltd. | Object classification method and apparatus therefor |
EP3336774B1 (en) | 2016-12-13 | 2020-11-25 | Axis AB | Method, computer program product and device for training a neural network |
- KR102481885B1 (ko) * | 2017-09-08 | 2022-12-28 | Samsung Electronics Co., Ltd. | Neural network training method and device for class recognition |
- CN109215057B (zh) * | 2018-07-31 | 2021-08-20 | Institute of Information Engineering, Chinese Academy of Sciences | High-performance visual tracking method and apparatus |
US11010888B2 (en) * | 2018-10-29 | 2021-05-18 | International Business Machines Corporation | Precision defect detection based on image difference with respect to templates |
- CN109671102B (zh) * | 2018-12-03 | 2021-02-05 | Huazhong University of Science and Technology | Integrated target tracking method based on a deep-feature-fusion convolutional neural network |
- CN109766954B (zh) | 2019-01-31 | 2020-12-04 | Beijing SenseTime Technology Development Co., Ltd. | Target object processing method and apparatus, electronic device and storage medium |
-
2019
- 2019-05-13 CN CN201910397253.XA patent/CN110147836B/zh active Active
-
2020
- 2020-04-07 WO PCT/CN2020/083523 patent/WO2020228446A1/zh unknown
- 2020-04-07 JP JP2021536356A patent/JP7273157B2/ja active Active
- 2020-04-07 EP EP20805250.6A patent/EP3971772B1/en active Active
- 2020-04-07 KR KR1020217025275A patent/KR102591961B1/ko active IP Right Grant
-
2021
- 2021-07-07 US US17/369,833 patent/US11704817B2/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN103149939A (zh) * | 2013-02-26 | 2013-06-12 | Beihang University | Vision-based dynamic target tracking and positioning method for unmanned aerial vehicles |
US20190021426A1 (en) * | 2017-07-20 | 2019-01-24 | Siege Sports, LLC | Highly Custom and Scalable Design System and Method for Articles of Manufacture |
- CN109344742A (zh) * | 2018-09-14 | 2019-02-15 | Tencent Technology (Shenzhen) Co., Ltd. | Feature point positioning method and apparatus, storage medium and computer device |
- CN110147836A (zh) * | 2019-05-13 | 2019-08-20 | Tencent Technology (Shenzhen) Co., Ltd. | Model training method, apparatus, terminal and storage medium |
Non-Patent Citations (1)
Title |
---|
NATIONAL INTELLECTUAL PROPERTY ADMINISTRATION, 13 May 2019 (2019-05-13) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN112967315A (zh) * | 2021-03-02 | 2021-06-15 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Target tracking method and apparatus, and electronic device |
- CN112967315B (zh) * | 2021-03-02 | 2022-08-02 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Target tracking method and apparatus, and electronic device |
- CN113469977A (zh) * | 2021-07-06 | 2021-10-01 | Zhejiang Linyan Precision Technology Co., Ltd. | Defect detection apparatus and method based on a distillation learning mechanism, and storage medium |
- CN113469977B (zh) * | 2021-07-06 | 2024-01-12 | Zhejiang Linyan Precision Technology Co., Ltd. | Defect detection apparatus and method based on a distillation learning mechanism, and storage medium |
- CN113838093A (zh) * | 2021-09-24 | 2021-12-24 | Chongqing University of Posts and Telecommunications | Adaptive multi-feature fusion tracking method based on spatially regularized correlation filters |
- CN113838093B (zh) * | 2021-09-24 | 2024-03-19 | Chongqing University of Posts and Telecommunications | Adaptive multi-feature fusion tracking method based on spatially regularized correlation filters |
- CN114463829A (zh) * | 2022-04-14 | 2022-05-10 | Hefei Dilusense Technology Co., Ltd. | Model training method, kinship recognition method, electronic device and storage medium |
- CN114693995A (zh) * | 2022-04-14 | 2022-07-01 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Model training method applied to image processing, image processing method and device |
- CN114463829B (zh) * | 2022-04-14 | 2022-08-12 | Hefei Dilusense Technology Co., Ltd. | Model training method, kinship recognition method, electronic device and storage medium |
- CN115455306A (zh) * | 2022-11-11 | 2022-12-09 | Tencent Technology (Shenzhen) Co., Ltd. | Push model training and information push methods, apparatus and storage medium |
- CN115455306B (zh) * | 2022-11-11 | 2023-02-07 | Tencent Technology (Shenzhen) Co., Ltd. | Push model training and information push methods, apparatus and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110147836A (zh) | 2019-08-20 |
JP7273157B2 (ja) | 2023-05-12 |
KR102591961B1 (ko) | 2023-10-19 |
CN110147836B (zh) | 2021-07-02 |
JP2022532460A (ja) | 2022-07-15 |
EP3971772A4 (en) | 2022-08-10 |
US20210335002A1 (en) | 2021-10-28 |
KR20210110713A (ko) | 2021-09-08 |
EP3971772A1 (en) | 2022-03-23 |
EP3971772B1 (en) | 2023-08-09 |
US11704817B2 (en) | 2023-07-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
- WO2020228446A1 (zh) | Model training method, apparatus, terminal and storage medium | |
US20220092351A1 (en) | Image classification method, neural network training method, and apparatus | |
CN112446270B (zh) | 行人再识别网络的训练方法、行人再识别方法和装置 | |
WO2019100724A1 (zh) | 训练多标签分类模型的方法和装置 | |
CN109960742B (zh) | 局部信息的搜索方法及装置 | |
EP3968179A1 (en) | Place recognition method and apparatus, model training method and apparatus for place recognition, and electronic device | |
WO2021022521A1 (zh) | 数据处理的方法、训练神经网络模型的方法及设备 | |
CN109902548B (zh) | 一种对象属性识别方法、装置、计算设备及*** | |
US20220148291A1 (en) | Image classification method and apparatus, and image classification model training method and apparatus | |
CN111310731A (zh) | 基于人工智能的视频推荐方法、装置、设备及存储介质 | |
CN110222718B (zh) | 图像处理的方法及装置 | |
Xia et al. | Loop closure detection for visual SLAM using PCANet features | |
CN111914997B (zh) | 训练神经网络的方法、图像处理方法及装置 | |
WO2020098257A1 (zh) | 一种图像分类方法、装置及计算机可读存储介质 | |
US20220157041A1 (en) | Image classification method and apparatus | |
CN111695673B (zh) | 训练神经网络预测器的方法、图像处理方法及装置 | |
US20220157046A1 (en) | Image Classification Method And Apparatus | |
CN111179270A (zh) | 基于注意力机制的图像共分割方法和装置 | |
CN112529149A (zh) | 一种数据处理方法及相关装置 | |
CN113065575A (zh) | 一种图像处理方法及相关装置 | |
CN111242114A (zh) | 文字识别方法及装置 | |
Ocegueda-Hernandez et al. | A lightweight convolutional neural network for pose estimation of a planar model | |
CN114155388A (zh) | 一种图像识别方法、装置、计算机设备和存储介质 | |
CN116363656A (zh) | 包含多行文本的图像识别方法、装置及计算机设备 | |
CN113822871A (zh) | 基于动态检测头的目标检测方法、装置、存储介质及设备 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20805250 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2021536356 Country of ref document: JP Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 20217025275 Country of ref document: KR Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2020805250 Country of ref document: EP Effective date: 20211213 |