CN109492537B - Object identification method and device - Google Patents

Object identification method and device

Info

Publication number
CN109492537B
CN109492537B
Authority
CN
China
Prior art keywords
sample
target
tracker
samples
tracking target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811206301.4A
Other languages
Chinese (zh)
Other versions
CN109492537A (en)
Inventor
魏承赟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin Feiyu Technology Corp ltd
Original Assignee
Guilin Feiyu Technology Corp ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin Feiyu Technology Corp ltd filed Critical Guilin Feiyu Technology Corp ltd
Priority to CN201811206301.4A priority Critical patent/CN109492537B/en
Publication of CN109492537A publication Critical patent/CN109492537A/en
Application granted granted Critical
Publication of CN109492537B publication Critical patent/CN109492537B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an object identification method and device, wherein the method comprises the following steps: S1, extracting a sample of a tracking target from an initial frame picture, training a tracker, and storing the target features into a sample space; S2, reading a current frame picture, and judging whether the tracking target in the previous frame is lost; S3, if the target is lost, processing samples from the position of the target in the previous frame and a plurality of positions around it with the last-trained tracker to obtain a score map; S4, if the target is not lost, extracting a current frame sample at the position of the tracking target of the previous frame and evaluating it with the last-trained tracker to obtain a score map; S5, performing score evaluation on the sample score maps of all positions and judging whether the score maps are ideal; S6, if the score map is ideal, updating the sample weight and the target position, predicting the target scaling and updating the scale; and S7, updating the new sample to the sample space with sample-weight weighting, and training the tracker at the set frame-number interval.

Description

Object identification method and device
Technical Field
The invention relates to the technical field of computer vision, in particular to an optimized object identification method and device.
Background
Since the beginning of the twenty-first century, image data has grown explosively with the rapid development of Internet technology and the popularization of mobile phones, cameras and personal computers. On the other hand, driven by the need to build safe cities, the number of monitoring cameras keeps increasing: according to incomplete statistics, Guangzhou alone has more than 300,000 monitoring cameras, while China as a whole has some 20 million, a number still growing by 20% per year. Such large-scale data far exceeds human analysis and processing capacity, so processing these image and video data intelligently is urgently required. In this context, how to analyze and understand image data automatically and intelligently with computer vision technology is receiving a great deal of attention.
Object recognition is a classic problem in computer vision and a core problem in solving many high-level vision tasks; research on object recognition lays the foundation for solving high-level vision tasks (such as behavior recognition and scene understanding). It has wide application in people's daily life and industrial production, for example: intelligent video monitoring, driver assistance, intelligent transportation, Internet image retrieval, virtual reality, human-computer interaction and the like.
In recent decades, with the successful application of a large number of statistical machine learning algorithms in artificial intelligence and computer vision, computer vision technology has advanced dramatically. In recent years in particular, the arrival of the big-data era has provided vision tasks with richer, massive image data, the development of high-performance computing equipment has provided hardware support for big-data computing, and new computer vision algorithms keep emerging. However, although object identification methods have improved greatly over the prior art in robustness, correctness, efficiency and scope, some difficulties and identification obstacles remain, and the existing object identification algorithms mainly have the following defects:
1. the scale tracking speed is too slow;
2. there is no loss-recovery function, so once the tracking target is lost, tracking cannot continue;
3. tracking can only be sustained for a short time, which cannot satisfy all application scenarios.
Disclosure of Invention
In order to overcome the above-mentioned deficiencies of the prior art, an object of the present invention is to provide an object identification method and apparatus, so as to achieve the purpose of continuing tracking in case of loss of tracking.
Another objective of the present invention is to provide an object recognition method and apparatus that increase the scale tracking speed.
It is still another object of the present invention to provide an object recognition method and apparatus, which can realize long-time tracking.
To achieve the above and other objects, the present invention provides an object recognition method, comprising the steps of:
S1, extracting a sample of the tracking target from an initial frame picture, training a tracker, and storing the target features into a sample space;
S2, reading a current frame picture, and judging whether the tracking target in the previous frame is lost;
S3, if the judgment result is that the tracking target is lost, extracting picture samples at the position where the target was lost in the previous frame and at a plurality of positions around it using the last-trained tracker, obtaining a score map of each position, and entering step S5;
S4, if the judgment result is that the tracking target is not lost, extracting a picture sample of the current frame at the position of the target in the previous frame, and evaluating the sample with the last-trained tracker to obtain a score map;
S5, performing score evaluation on the samples at all positions, judging whether the score maps are ideal, and, according to the judgment result, entering step S6 or entering the next frame and returning to step S2;
S6, updating the sample weight, updating the target position, predicting the target scaling and updating the scale;
S7, updating the new sample to the sample space with sample-weight weighting, training the tracker at the set frame-number interval, and returning to step S2; updating the new sample to the sample space with sample-weight weighting and training the tracker at the set frame-number interval specifically comprises the following steps: step S700, weighting the sample of the current frame; step S701, judging whether the sample space is full of samples; step S702, if the sample space is full, deciding whether the new sample is stored in the sample space by fusing it with an old sample, or by fusing two old samples with each other and inserting the new sample into the vacated position; step S703, if the sample space is not full, placing the new sample directly after the old samples; step S704, training the tracker with the sample space according to the preset training interval.
Preferably, the step S1 further comprises:
step S100, acquiring the position and size information of a tracking target in an initial frame picture;
step S101, extracting HOG characteristics and CN characteristics of a tracking target area, and preprocessing the extracted target characteristics;
step S102, training a tracker and a dimensionality reduction matrix according to the preprocessed target features, and performing dimensionality reduction processing on the target features;
and step S103, storing the target characteristics subjected to dimension reduction into a sample space.
Preferably, in steps S3 and S4, the operation of extracting the sample includes extracting the HOG feature and the CN feature of the tracking target area, and preprocessing the extraction result.
Preferably, step S5 further comprises:
step S500, evaluating the score map using the average peak-to-correlation energy and obtaining an energy value;
step S501, if the target of the previous frame was not lost, judging whether the changes of the energy value and of the peak value of the score map relative to the previous frame meet the preset conditions, and whether the energy value and the peak value of the score map themselves meet the preset conditions;
step S502, if the target of the previous frame was lost, judging whether the energy value and the peak value of the score map meet the preset conditions;
step S503, dividing the ideal degree of the evaluation result into excellent, good, poor and extremely poor; when the final result is poor or above, entering step S6; when the final result is extremely poor, considering the tracking target of this frame lost, entering the next frame, and returning to step S2.
Preferably, in step S3, samples are sequentially extracted from the position of the tracking target of the previous frame of picture and the positions of the upper, lower, left, right, upper left, lower left, upper right and lower right around the tracking target.
Preferably, in step S3, each sample is compared with the tracker of the previous frame to obtain a score map of each position sample.
Preferably, step S6 further comprises:
step S600, sample weight is distributed according to the evaluation result in the step S5;
step S601, performing iterative optimization on the score map using Newton's method to obtain the optimum of the score map, where the position of the maximum value in the score map is the target position;
and step S603, performing target scaling prediction by using PCA dimension reduction.
Preferably, in step S702, if the sample space is full, the degree of similarity between the new sample and all old samples in the sample space is calculated; if the similarity between the new sample and an old sample is higher than a certain threshold, the new sample is fused with that old sample; otherwise, the similarities among all old samples in the sample space are calculated, the two most similar old samples are fused, and the new sample is inserted into the vacated position.
In order to achieve the above object, the present invention also provides an object recognition apparatus, comprising:
the initial frame processing unit is used for extracting a sample of a tracking target from an initial frame picture, training a tracker and storing target characteristics into a sample space;
the loss judging unit is used for reading the current frame picture and judging whether the tracking target in the previous frame is lost or not;
the loss retrieving unit is used for, when the judgment result of the loss judging unit is that the tracking target is lost, evaluating samples at the position of the tracking target of the previous frame picture and at a plurality of positions around it using the last-trained tracker, and acquiring a score map of each position;
a current frame tracking target position obtaining unit, configured to extract a current frame picture sample according to the position of the tracking target of the previous frame picture if the tracking target is not lost as a result of the judgment of the loss judgment unit, and evaluate the sample by using the tracker trained last time to obtain a score map;
the tracking result evaluation unit is used for performing score evaluation on the score maps and judging whether the target is lost;
the tracking result updating unit is used for updating the sample weight, updating the position of a tracking target, predicting the target scaling and updating the target scale;
the tracker training unit is used for updating the new sample to the sample space with sample-weight weighting, training the tracker with the sample space according to a preset interval, and returning to the loss judgment unit; updating the new sample to the sample space with sample-weight weighting and training the tracker with the sample space according to a preset interval specifically comprises: weighting the sample of the current frame; judging whether the sample space is full of samples; if the sample space is full, deciding whether the new sample is stored in the sample space by fusing it with an old sample, or by fusing two old samples with each other and inserting the new sample into the vacated position; if the sample space is not full, placing the new sample directly after the old samples; and training the tracker with the sample space according to the preset training interval.
Preferably, the initial frame processing unit further includes:
a tracking target obtaining unit, configured to obtain information of an initial frame picture, that is, obtain position and size information of a tracking target in the initial frame picture;
the feature extraction unit is used for extracting the HOG feature and the CN feature of the tracked target and preprocessing the extracted target feature;
the training dimensionality reduction unit is used for training the tracker and the dimensionality reduction matrix according to the preprocessed target characteristics and performing dimensionality reduction processing on the target characteristics;
and the storage unit is used for storing the target features subjected to dimension reduction into a sample space.
Compared with the prior art, the object identification method and device of the invention judge from the current frame picture whether the tracking target is lost, and perform loss-recovery processing when it is lost, so that tracking of the target can continue when it appears again after being lost.
Drawings
FIG. 1 is a flow chart illustrating the steps of an object recognition method according to the present invention;
FIG. 2 is a detailed flowchart of step S1 according to an embodiment of the present invention;
FIG. 3 is a detailed flowchart of step S3 according to an embodiment of the present invention;
FIG. 4 is a detailed flowchart of step S301 according to an embodiment of the present invention;
FIG. 5 is a system architecture diagram of an object recognition device of the present invention;
FIG. 6 is a flowchart illustrating processing for a new frame of picture according to an embodiment of the present invention.
Detailed Description
Other advantages and capabilities of the present invention will be readily apparent to those skilled in the art from the disclosure herein, which describes embodiments of the invention with reference to the accompanying drawings. The invention is capable of other and different embodiments, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.
FIG. 1 is a flow chart illustrating steps of an object recognition method according to the present invention. As shown in fig. 1, the object recognition method of the present invention includes the following steps:
step S1, extracting a sample of a tracking target from an initial frame picture, training a tracker, and storing target characteristics into a sample space. The tracking target is a moving object to be identified in the video image.
Specifically, as shown in fig. 2, step S1 further includes:
step S100, obtaining information of an initial frame picture, namely obtaining position and size information of a tracking target in the initial frame picture, and initializing parameters of a tracker;
step S101, extracting HOG (Histogram of Oriented Gradient) features and CN (Color Name) features of the tracking target, and preprocessing the extracted target features; specifically, the preprocessing in this step includes feature dimension reduction, cosine window addition, DFT, interpolation, etc. It should be noted that the dimension-reduction matrix is initialized by PCA and updated in S102; that is, dimension reduction is performed again before storing into the sample space. A code sketch of this preprocessing follows step S103 below.
Step S102, training a tracker and a dimension reduction matrix according to the preprocessed target characteristics, and carrying out dimension reduction processing on the target characteristics;
and step S103, storing the target characteristics subjected to dimension reduction into a sample space.
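To make the flow of steps S100-S103 concrete, the following Python sketch shows one way the preprocessing chain (PCA dimension reduction, cosine window, DFT) could be implemented. The function names, the (H, W, C) feature-map layout and the plain Hann window are illustrative assumptions, not the patented implementation.

import numpy as np

def cosine_window(h, w):
    # 2-D Hann (cosine) window that suppresses boundary effects
    win = np.outer(np.hanning(h), np.hanning(w))
    return win[:, :, None]  # broadcast over the channel axis

def init_projection(features, out_dim):
    # Initialize a PCA dimension-reduction matrix from the first sample:
    # eigenvectors of the channel covariance give the projection.
    h, w, c = features.shape
    x = features.reshape(-1, c)
    x = x - x.mean(axis=0)
    cov = x.T @ x / x.shape[0]
    _, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    return vecs[:, -out_dim:]              # keep the strongest components

def preprocess(features, proj):
    # Dimension reduction, then cosine window, then DFT, as in step S101.
    h, w, _ = features.shape
    reduced = features @ proj              # feature dimension reduction
    windowed = reduced * cosine_window(h, w)
    return np.fft.fft2(windowed, axes=(0, 1))   # frequency-domain sample

On the initial frame, init_projection would be called once and the projected, windowed, transformed features stored as the first entry of the sample space.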
S2, reading a current frame picture, and judging whether the tracking target in the previous frame is lost. In the embodiment of the present invention, a target loss flag update flag is set for each frame to record whether the tracking target is lost; specifically, the flag is initially true, and it is updated in step S5 according to the score evaluation result.
And S3, if the tracking target is lost according to the judgment result, extracting the picture samples of the position where the target is lost in the last frame and a plurality of positions around the position by using the tracker trained last time, obtaining a score map of each position, and skipping to the step S5.
Specifically, as shown in fig. 3, step S3 further includes:
step S300, extracting samples in turn at the position of the tracking target of the previous frame picture and at the 8 positions around it (upper, lower, left, right, upper-left, lower-left, upper-right and lower-right); the operation of extracting a sample comprises extracting the HOG features and CN features of the tracking target region and preprocessing the extraction result, the preprocessing again including feature dimension reduction, cosine window addition, DFT, interpolation and the like;
step S301, comparing each sample with the previous-frame tracker to obtain a score map of each position's sample. That is, each score map comes from the comparison (i.e. frequency-domain correlation) between a sample and the tracker of the previous frame; each sample yields its own score map, and the scores in a score map are in fact correlation degrees.
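The comparison in step S301 can be sketched in Python as below, under the assumption, common to correlation-filter trackers and consistent with the frequency-domain correlation just described, that the tracker is a learned filter stored in the frequency domain; sample_f is a sample preprocessed as in the sketch after step S103, and the helper names are illustrative, not the patented implementation.

import numpy as np

def score_map(sample_f, filter_f):
    # Conjugate product in the frequency domain equals correlation in the
    # spatial domain; summing over channels gives one response map, whose
    # values are the correlation degrees described above.
    response_f = np.sum(np.conj(filter_f) * sample_f, axis=2)
    return np.real(np.fft.ifft2(response_f))

def peak_position(response):
    # Row/column of the maximum score: the most likely target offset.
    return np.unravel_index(np.argmax(response), response.shape)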
And S4, if the judgment result is that the tracking target is not lost, extracting a picture sample of the current frame at the position of the target in the previous frame, and evaluating the sample with the last-trained tracker to obtain a score map. In the specific embodiment of the present invention, the operation of extracting the sample in this step also comprises extracting the HOG features and CN features of the tracking target region and preprocessing the extraction result, the preprocessing again including feature dimension reduction, cosine window addition, DFT, interpolation and the like. The score map likewise comes from comparing the sample with the previous-frame tracker, which is not repeated here.
And S5, performing score evaluation on the score maps of the samples at the positions, and judging whether the score maps are ideal.
Specifically, step S5 further includes:
step S500, evaluating the score map using the Average Peak-to-Correlation Energy (APCE) to obtain an energy value (a code sketch of this evaluation follows step S503 below);
in step S501, if the target of the previous frame was not lost, the evaluation judges whether the changes of the energy value and of the peak value of the score map relative to the previous frame meet the preset conditions, and whether the energy value and the peak value of the score map themselves meet the preset conditions.
Step S502, if the target of the previous frame was lost, the evaluation judges whether the energy value and the peak value of the score map meet the preset conditions.
And step S503, entering step S6 or returning to step S2 according to the ideal degree of the judgment result. In the embodiment of the present invention, the ideal degree of the evaluation result is divided into excellent, good, poor and extremely poor; when the final result of the judgment is poor or above, the flow enters step S6; when the final result is extremely poor, the tracking target of the current frame is considered lost, the next frame is entered, the flow returns to step S2, and the target loss flag update flag is updated to false, as shown in fig. 4, in which the confidence levels higher, general and lower correspond to the ideal degrees excellent, good and poor, respectively.
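The APCE measure of step S500 has a standard form: APCE = |Fmax - Fmin|^2 / mean((F - Fmin)^2), taken over all positions of the score map F. The sketch below computes it and maps it to the four ideal degrees; the reference values and the 0.7/0.45/0.2 thresholds are invented for illustration only, since the patent leaves the preset conditions open.

import numpy as np

def apce(response):
    # Average Peak-to-Correlation Energy: a sharp single peak over a flat
    # background gives a high value; occlusion or loss gives a low one.
    f_max, f_min = response.max(), response.min()
    return (f_max - f_min) ** 2 / (np.mean((response - f_min) ** 2) + 1e-12)

def ideal_degree(response, ref_apce, ref_peak):
    # Toy grading against reference values taken from previous reliable
    # frames; the thresholds are assumptions, not the patented conditions.
    ratio = min(apce(response) / ref_apce, response.max() / ref_peak)
    if ratio > 0.7:
        return "excellent"
    if ratio > 0.45:
        return "good"
    if ratio > 0.2:
        return "poor"
    return "extremely poor"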
And S6, updating the sample weight, updating the target position, predicting the target scaling and updating the scale. That is, if the result of the determination in step S5 is that the score is ideal (i.e., the ideal degree is excellent, good or poor), the sample weight is updated, the target position is updated, the target scaling is predicted, and the scale is updated.
Specifically, step S6 further includes:
step S600, sample weight is distributed according to the evaluation result in the step S5;
step S601, performing iterative optimization on the score map using Newton's method to obtain the optimum of the score map, where the position of the maximum value in the score map is the target position (a code sketch of this refinement follows step S603 below);
and step S603, performing target scaling prediction using PCA dimension reduction. The invention uses the PCA dimension-reduction technique, which greatly reduces the amount of data to be computed; meanwhile, the scaling uses a frequency-domain interpolation method, which reduces the number of scales that must be evaluated and so greatly improves the computation speed.
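The Newton iteration of step S601 can be sketched on the discrete score map with finite-difference derivatives; the iteration count and the finite-difference scheme are illustrative assumptions rather than the patented routine.

import numpy as np

def refine_peak(response, iters=5):
    # Start from the grid maximum and take Newton steps toward the
    # sub-pixel optimum of the score map.
    y, x = np.unravel_index(np.argmax(response), response.shape)
    p = np.array([y, x], dtype=float)
    for _ in range(iters):
        iy = int(np.clip(round(p[0]), 1, response.shape[0] - 2))
        ix = int(np.clip(round(p[1]), 1, response.shape[1] - 2))
        # central finite differences approximate gradient and Hessian
        gy = (response[iy + 1, ix] - response[iy - 1, ix]) / 2
        gx = (response[iy, ix + 1] - response[iy, ix - 1]) / 2
        hyy = response[iy + 1, ix] - 2 * response[iy, ix] + response[iy - 1, ix]
        hxx = response[iy, ix + 1] - 2 * response[iy, ix] + response[iy, ix - 1]
        hxy = (response[iy + 1, ix + 1] - response[iy + 1, ix - 1]
               - response[iy - 1, ix + 1] + response[iy - 1, ix - 1]) / 4
        hess = np.array([[hyy, hxy], [hxy, hxx]])
        if abs(np.linalg.det(hess)) < 1e-12:
            break
        p -= np.linalg.solve(hess, np.array([gy, gx]))  # Newton step
    return p  # (row, col) of the refined target position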
And S7, weighting and updating the new sample to a sample space by the sample weight, training the tracker according to the set frame number interval, and returning to the step S2.
Specifically, step S7 further includes:
step S700, weighting the samples of the current frame.
Step S701 is to determine whether the sample space is full of samples, and in the embodiment of the present invention, whether the sample space is full of samples may be determined according to a preset sample space size.
Step S702, if the sample space is full, the new sample is stored in the sample space either by fusing it with an old sample, or by fusing two old samples with each other and inserting the new sample. Specifically, the degree of similarity between the new sample and all old samples in the sample space is calculated first; if the similarity between the new sample and an old sample is higher than a certain threshold, the new sample is fused with that old sample; otherwise, the similarities among all old samples in the sample space are calculated, the two most similar old samples are fused, and the new sample is inserted into the vacated position (see the code sketch after step S704 below).
In step S703, if the sample space is not full, a new sample is directly placed after the old sample.
Step S704, training the tracker using a sample space according to a preset training interval.
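The sample-space bookkeeping of steps S700-S704 can be sketched as follows; the normalized-correlation similarity and the 0.9 merge threshold are assumptions, since the patent only requires "a certain threshold".

import numpy as np

def update_sample_space(space, weights, new_sample, new_weight,
                        capacity, merge_threshold=0.9):
    def similarity(a, b):
        # normalized correlation between two flattened feature samples
        a, b = a.ravel(), b.ravel()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def merge(i, sample, weight):
        # weighted fusion of slot i with an incoming sample (step S702)
        total = weights[i] + weight
        space[i] = (weights[i] * space[i] + weight * sample) / total
        weights[i] = total

    if len(space) < capacity:                    # step S703: room left
        space.append(new_sample)
        weights.append(new_weight)
        return
    sims = [similarity(new_sample, s) for s in space]
    best = int(np.argmax(sims))
    if sims[best] > merge_threshold:             # fuse new into old
        merge(best, new_sample, new_weight)
        return
    # otherwise fuse the two most similar old samples, freeing a slot
    pair, best_sim = (0, 1), -1.0
    for i in range(len(space)):
        for j in range(i + 1, len(space)):
            s = similarity(space[i], space[j])
            if s > best_sim:
                pair, best_sim = (i, j), s
    i, j = pair
    merge(i, space[j], weights[j])
    space[j], weights[j] = new_sample, new_weight    # vacated slot

Here space and weights are plain Python lists of equal length; training (step S704) would then solve for a new filter from these weighted samples every preset number of frames.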
Fig. 5 is a system architecture diagram of an object recognition device according to the present invention. As shown in FIG. 5, an object recognition apparatus of the present invention comprises
And the initial frame processing unit 50 is used for extracting a sample of the tracking target from the initial frame picture, training the tracker and storing the target feature into a sample space. The tracking target is a moving object to be identified in the video image.
Specifically, the initial frame processing unit 50 further includes:
a tracking target obtaining unit, configured to obtain information of an initial frame picture, that is, obtain position and size information of a tracking target in the initial frame picture;
a feature extraction unit, configured to extract a HOG (Histogram of Oriented Gradient) feature and a CN feature of a tracked target, and perform preprocessing on the extracted target feature, where the preprocessing includes feature dimension reduction, cosine window addition, DFT, interpolation, and the like;
the training dimensionality reduction unit is used for training the tracker and the dimensionality reduction matrix according to the preprocessed target characteristics and performing dimensionality reduction processing on the target characteristics;
and the storage unit is used for storing the target features subjected to dimension reduction into a sample space.
A loss judgment unit 51, configured to read a current frame picture, and judge whether a tracking target in a previous frame is lost;
and a loss retrieving unit 52, configured to extract, by using the tracker trained last time, the position of the tracking target in the previous frame of picture and picture samples of several positions around the tracking target when the determination result of the loss determining unit 51 is that the tracking target is lost, and obtain a score map of each position.
Specifically, the loss retrieving unit 52 further includes:
the adjacent position sample extraction unit is used for extracting samples in turn at the position of the tracking target of the previous frame picture and at the 8 positions around it (upper, lower, left, right, upper-left, lower-left, upper-right and lower-right) and preprocessing the samples; the operation of extracting a sample comprises extracting the HOG (Histogram of Oriented Gradient) features and CN features of the tracking target region and preprocessing the extraction result, the preprocessing again including feature dimension reduction, cosine window addition, DFT, interpolation and the like;
and the score map acquisition unit is used for comparing each sample with the tracker of the previous frame to obtain the score map of each position's sample. That is, each score map comes from the comparison (i.e. frequency-domain correlation) between a sample and the tracker of the previous frame; each sample yields its own score map, and the scores in a score map are in fact correlation degrees.
A current frame tracking target position obtaining unit 53, configured to, when the determination result of the loss determining unit is that the tracking target is not lost, extract a current frame picture sample according to the position of the tracking target of the previous frame picture, and evaluate the sample by using the tracker trained last time to obtain a score map.
And a tracking result evaluation unit 54 for performing score evaluation on the score maps of the samples at the respective positions and judging whether the score maps are ideal.
The tracking result evaluation unit 54 is specifically configured to:
evaluating the score map using the Average Peak-to-Correlation Energy (APCE) and obtaining an energy value;
if the target of the previous frame was not lost, judging whether the changes of the energy value and of the peak value of the score map relative to the previous frame meet the preset conditions, and whether the energy value and the peak value of the score map themselves meet the preset conditions;
if the target of the previous frame was lost, judging whether the energy value and the peak value of the score map meet the preset conditions;
and starting the tracking result updating unit 55 or entering the next frame and returning to the loss judgment unit 51 according to the ideal degree of the judgment result. In the embodiment of the present invention, the ideal degree of the evaluation result is divided into excellent, good, poor and extremely poor; when the final result of the judgment is poor or above, the tracking result updating unit 55 is started; when the final result is extremely poor, the tracking target of the current frame is considered lost, the next frame is entered and the flow returns to the loss judgment unit 51, with the target loss flag update flag updated to false.
And a tracking result updating unit 55, configured to update the sample weight, update the tracking target position, perform target scaling prediction, and update the scale. That is, if the determination result of the tracking result evaluation unit 54 is that the score is ideal (i.e., the ideal degree is excellent, good or poor), the sample weight is updated, the target position is updated, the target scaling is predicted, and the scale is updated.
Specifically, the tracking result updating unit 55 further includes:
a sample weight updating unit for assigning a sample weight according to the evaluation result of the tracking result evaluating unit 54;
the tracking target position updating unit is used for performing iterative optimization on the score map using Newton's method to obtain the optimum of the score map, the position of the maximum value in the score map being the target position;
and the scale prediction updating unit is used for performing target scale prediction using PCA dimension reduction. The invention uses the PCA dimension-reduction technique, which greatly reduces the amount of data to be computed; meanwhile, the scaling uses a frequency-domain interpolation method, which reduces the number of scales that must be evaluated and so greatly improves the computation speed.
And a tracker training unit 56 for updating the new samples to the sample space with sample weight weighting and training the tracker according to the set frame number interval.
The tracker training unit 56 is specifically configured to:
weighting the sample of the current frame;
judging whether the sample space is full of samples, where in the embodiment of the invention this can be judged according to the preset size of the sample space;
if the sample space is full, deciding whether the new sample is stored in the sample space by fusing it with an old sample, or by fusing two old samples with each other and inserting the new sample; specifically, the degree of similarity between the new sample and all old samples in the sample space is calculated first; if the similarity between the new sample and an old sample is higher than a certain threshold, the new sample is fused with that old sample; otherwise, the similarities among all old samples in the sample space are calculated, the two most similar old samples are fused, and the new sample is inserted into the vacated position;
If the sample space is not full, a new sample is directly placed after the old sample;
and training the tracker by using a sample space according to a preset training interval.
FIG. 6 is a flowchart illustrating processing of a new frame of picture according to an embodiment of the present invention. The treatment process is as follows:
1. reading in a new frame of picture;
2. judging whether the tracking target in the previous frame is lost; in the specific embodiment of the invention, a target state Flag Update Flag records whether the tracking target is lost: Update Flag = 0 means the tracking target is lost, and Update Flag = 1 means it is not lost;
3. if the Update Flag is 0, indicating that the tracking target is lost, samples are extracted in turn at the position of the tracking target of the previous frame picture and at the 8 positions around it (upper, lower, left, right, upper-left, lower-left, upper-right and lower-right; the nine search positions are sketched in the code after this list), and the samples are preprocessed (including dimension reduction, cosine window addition, FFT and interpolation), yielding a candidate sample at each position;
4. the 9 samples are evaluated with the last-trained tracker to obtain the scores at all positions, and the scores are then evaluated;
5. the APCE is used to judge whether a sample meets the retrieval condition;
6. if a sample meets the retrieval condition, the scoring result is optimized and the position of the maximum score is taken as the position of the retrieved tracking target; the flow then proceeds to step 10;
7. if the Update Flag is 1, indicating that the tracking target is not lost, a sample is extracted at the position of the previous frame, and features are extracted and preprocessed (including dimension reduction, cosine window addition, FFT and interpolation);
8. the new sample is evaluated with the last-trained tracker to obtain the scores at all positions, and the scores are then evaluated;
9. the scoring result is optimized and the position of the maximum score is found;
10. the position of the tracking target is updated, the target scaling is predicted and the scale is updated;
11. the new sample is updated to the sample space with sample-weight weighting;
12. at regular intervals, the tracker is trained with the samples in the sample space.
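As referenced in step 3, the nine search positions used for loss recovery can be generated as follows; the spacing of one target size between neighbouring positions is an assumption, as the patent specifies only the eight surrounding directions.

def candidate_positions(pos, size):
    # Last known position plus its 8 neighbours (upper, lower, left, right
    # and the four diagonals); the centre position itself is included.
    cy, cx = pos
    h, w = size
    return [(cy + dy * h, cx + dx * w)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

Each returned position would then be sampled, preprocessed and scored as in steps 3-4, with the best-scoring position kept if it passes the APCE retrieval condition.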
In summary, the object identification method and apparatus of the present invention judge from the current frame picture whether the tracking target is lost, and perform loss-recovery processing when it is lost, so that tracking can continue when the lost target appears again. At the same time, the invention adds scale tracking, can track the target under zooming, and accelerates scale tracking: experiments show that the scale tracking speed of the existing object identification method is about 140 ms/frame, while that of the object identification optimized by the present invention is 40 ms/frame, a significant acceleration. Moreover, the invention trains the tracker with the samples in the sample space at intervals, so it can achieve the purpose of long-time tracking.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Modifications and variations can be made to the above-described embodiments by those skilled in the art without departing from the spirit and scope of the present invention. Therefore, the scope of the invention should be determined by the appended claims.

Claims (9)

1. An object identification method comprising the steps of:
S1, extracting a sample of a tracking target from an initial frame picture, training a tracker, and storing target features into a sample space;
S2, reading a current frame picture, and judging whether the tracking target in the previous frame is lost;
S3, if the judgment result is that the tracking target is lost, extracting picture samples at the position where the target was lost in the previous frame and at a plurality of positions around it using the last-trained tracker, obtaining a score map of each position, and entering step S5;
S4, if the judgment result is that the tracking target is not lost, extracting a picture sample of the current frame at the position of the target in the previous frame, and evaluating the sample with the last-trained tracker to obtain a score map;
S5, performing score evaluation on the samples at all positions, judging whether the score maps are ideal, and, according to the judgment result, entering step S6 or entering the next frame and returning to step S2;
S6, updating the sample weight, updating the target position, predicting the target scaling and updating the scale;
S7, updating the new sample to the sample space with sample-weight weighting, training the tracker at the set frame-number interval, and returning to step S2; updating the new sample to the sample space with sample-weight weighting and training the tracker at the set frame-number interval specifically comprises the following steps: step S700, weighting the sample of the current frame; step S701, judging whether the sample space is full of samples; step S702, if the sample space is full, deciding whether the new sample is stored in the sample space by fusing it with an old sample, or by fusing two old samples with each other and inserting the new sample into the vacated position; step S703, if the sample space is not full, placing the new sample directly after the old samples; step S704, training the tracker with the sample space according to the preset training interval.
2. An object recognition method according to claim 1, wherein step S1 further comprises:
step S100, acquiring position and size information of a tracking target in an initial frame picture;
step S101, extracting HOG characteristics and CN characteristics of a tracking target area, and preprocessing the extracted target characteristics;
step S102, training a tracker and a dimensionality reduction matrix according to the preprocessed target features, and performing dimensionality reduction processing on the target features;
and step S103, storing the target characteristics subjected to dimension reduction into a sample space.
3. An object recognition method according to claim 1, characterized in that: in steps S3 and S4, the operation of extracting the sample includes extracting the HOG feature and the CN feature of the tracking target region, and preprocessing the extraction result.
4. An object recognition method according to claim 1, wherein step S5 further comprises:
step S500, evaluating the score map using the average peak-to-correlation energy, and obtaining an energy value;
step S501, if the target of the previous frame was not lost, judging whether the changes of the energy value and of the peak value of the score map relative to the previous frame meet the preset conditions, and whether the energy value and the peak value of the score map themselves meet the preset conditions;
step S502, if the target of the previous frame was lost, judging whether the energy value and the peak value of the score map meet the preset conditions;
step S503, dividing the ideal degree of the evaluation result into excellent, good, poor and extremely poor; when the final result is poor or above, entering step S6; when the final result is extremely poor, considering the tracking target of this frame lost, entering the next frame, and returning to step S2.
5. An object recognition method according to claim 1, characterized in that: in step S3, samples are sequentially extracted from the position of the tracking target of the previous frame of picture and the surrounding upper, lower, left, right, upper left, lower left, upper right, and lower right positions thereof.
6. An object recognition method according to claim 5, characterized in that: in step S3, each sample is compared with the tracker of the previous frame to obtain a score map of each position sample.
7. An object recognition method according to claim 1, wherein step S6 further comprises:
step S600, sample weight is distributed according to the evaluation result in the step S5;
step S601, performing iterative optimization on the score map by using a Newton method to obtain the best score map, wherein the position of the maximum value in the score map is a target position;
and step S603, performing target scaling prediction by using PCA dimension reduction.
8. An object recognition method according to claim 1, characterized in that: in step S702, if the sample space is full, the degree of similarity between the new sample and all old samples in the sample space is calculated, and if the similarity between the new sample and an old sample is higher than a certain threshold, the new sample is fused with that old sample; otherwise, the similarities among all old samples in the sample space are calculated, the two most similar old samples are fused, and the new sample is inserted into the vacated position.
9. An object recognition device, comprising:
The initial frame processing unit is used for extracting a sample of a tracking target from an initial frame picture, training a tracker and storing target characteristics into a sample space;
the loss judging unit is used for reading the current frame picture and judging whether the tracking target in the previous frame is lost;
the loss retrieving unit is used for, when the judgment result of the loss judging unit is that the tracking target is lost, evaluating samples at the position of the tracking target of the previous frame picture and at a plurality of positions around it using the last-trained tracker, and acquiring a score map of each position;
a current frame tracking target position obtaining unit, configured to extract a current frame picture sample according to the position of the tracking target of the previous frame picture if the tracking target is not lost as a result of the judgment of the loss judgment unit, and evaluate the sample by using the tracker trained last time to obtain a score map;
the tracking result evaluation unit is used for performing score evaluation on the score maps and judging whether the target is lost;
the tracking result updating unit is used for updating the sample weight, updating the position of a tracking target, predicting the target scaling and updating the target scale;
the tracker training unit is used for updating the new sample to the sample space with sample-weight weighting, training the tracker with the sample space according to a preset interval, and returning to the loss judgment unit; updating the new sample to the sample space with sample-weight weighting and training the tracker with the sample space according to a preset interval specifically comprises: weighting the sample of the current frame; judging whether the sample space is full of samples; if the sample space is full, deciding whether the new sample is stored in the sample space by fusing it with an old sample, or by fusing two old samples with each other and inserting the new sample into the vacated position; if the sample space is not full, placing the new sample directly after the old samples;
and training the tracker by using a sample space according to a preset training interval.
CN201811206301.4A 2018-10-17 2018-10-17 Object identification method and device Active CN109492537B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811206301.4A CN109492537B (en) 2018-10-17 2018-10-17 Object identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811206301.4A CN109492537B (en) 2018-10-17 2018-10-17 Object identification method and device

Publications (2)

Publication Number Publication Date
CN109492537A CN109492537A (en) 2019-03-19
CN109492537B true CN109492537B (en) 2023-03-14

Family

ID=65691341

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811206301.4A Active CN109492537B (en) 2018-10-17 2018-10-17 Object identification method and device

Country Status (1)

Country Link
CN (1) CN109492537B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110853076B (en) * 2019-11-08 2023-03-31 重庆市亿飞智联科技有限公司 Target tracking method, device, equipment and storage medium
CN110910422A (en) * 2019-11-13 2020-03-24 北京环境特性研究所 Target tracking method and device, electronic equipment and readable storage medium
CN112200790B (en) * 2020-10-16 2023-04-07 鲸斛(上海)智能科技有限公司 Cloth defect detection method, device and medium
CN112150460B (en) * 2020-10-16 2024-03-15 上海智臻智能网络科技股份有限公司 Detection method, detection system, device and medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080187173A1 (en) * 2007-02-02 2008-08-07 Samsung Electronics Co., Ltd. Method and apparatus for tracking video image
CN102930296A (en) * 2012-11-01 2013-02-13 长沙纳特微视网络科技有限公司 Image identifying method and device
CN104574445A (en) * 2015-01-23 2015-04-29 北京航空航天大学 Target tracking method and device
CN104899561A (en) * 2015-05-27 2015-09-09 华南理工大学 Parallelized human body behavior identification method
CN106372666A (en) * 2016-08-31 2017-02-01 同观科技(深圳)有限公司 Target identification method and device
CN106920248A (en) * 2017-01-19 2017-07-04 博康智能信息技术有限公司上海分公司 A kind of method for tracking target and device
CN107992791A (en) * 2017-10-13 2018-05-04 西安天和防务技术股份有限公司 Target following failure weight detecting method and device, storage medium, electronic equipment
CN108510521A (en) * 2018-02-27 2018-09-07 南京邮电大学 A kind of dimension self-adaption method for tracking target of multiple features fusion
CN108564008A (en) * 2018-03-28 2018-09-21 厦门瑞为信息技术有限公司 A kind of real-time pedestrian and method for detecting human face based on ZYNQ
CN108664930A (en) * 2018-05-11 2018-10-16 西安天和防务技术股份有限公司 A kind of intelligent multi-target detection tracking

Also Published As

Publication number Publication date
CN109492537A (en) 2019-03-19

Similar Documents

Publication Publication Date Title
CN109492537B (en) Object identification method and device
CN111126360B (en) Cross-domain pedestrian re-identification method based on unsupervised combined multi-loss model
CN110472531B (en) Video processing method, device, electronic equipment and storage medium
Lopez-Antequera et al. Appearance-invariant place recognition by discriminatively training a convolutional neural network
CN109815364B (en) Method and system for extracting, storing and retrieving mass video features
CN110120064B (en) Depth-related target tracking algorithm based on mutual reinforcement and multi-attention mechanism learning
CN109993102B (en) Similar face retrieval method, device and storage medium
CN110717411A (en) Pedestrian re-identification method based on deep layer feature fusion
CN112489081B (en) Visual target tracking method and device
JP2006338313A (en) Similar image retrieving method, similar image retrieving system, similar image retrieving program, and recording medium
CN111598067B (en) Re-recognition training method, re-recognition method and storage device in video
CN113920472A (en) Unsupervised target re-identification method and system based on attention mechanism
CN115527269A (en) Intelligent human body posture image identification method and system
CN111709236A (en) Case similarity matching-based trial risk early warning method
CN109241315B (en) Rapid face retrieval method based on deep learning
CN116665272A (en) Airport scene face recognition fusion decision method and device, electronic equipment and medium
CN115527083B (en) Image annotation method and device and electronic equipment
CN115393755A (en) Visual target tracking method, device, equipment and storage medium
CN112487927B (en) Method and system for realizing indoor scene recognition based on object associated attention
CN114298154A (en) Active learning method and device, electronic equipment and readable storage medium
CN111984812B (en) Feature extraction model generation method, image retrieval method, device and equipment
CN115082854A (en) Pedestrian searching method oriented to security monitoring video
Lanzarini et al. Face recognition using SIFT and binary PSO descriptors
CN112685580A (en) Social network head portrait comparison distributed detection system, method and device based on deep learning, processor and storage medium thereof
CN111598926B (en) Target tracking method and device for optimizing ECO feature extraction performance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant