CN109492537B - Object identification method and device - Google Patents

Object identification method and device

Info

Publication number
CN109492537B
CN109492537B
Authority
CN
China
Prior art keywords
sample
target
tracker
samples
tracking target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811206301.4A
Other languages
Chinese (zh)
Other versions
CN109492537A (en)
Inventor
魏承赟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin Feiyu Technology Corp ltd
Original Assignee
Guilin Feiyu Technology Corp ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin Feiyu Technology Corp ltd filed Critical Guilin Feiyu Technology Corp ltd
Priority to CN201811206301.4A priority Critical patent/CN109492537B/en
Publication of CN109492537A publication Critical patent/CN109492537A/en
Application granted granted Critical
Publication of CN109492537B publication Critical patent/CN109492537B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an object identification method and device, wherein the method comprises the following steps: S1, extracting a sample of a tracking target from an initial frame picture, training a tracker, and storing the target features into a sample space; S2, reading a current frame picture, and judging whether the tracking target in the previous frame is lost; S3, if the target is lost, processing samples from the position of the target in the previous frame and a plurality of positions around it with the last-trained tracker to obtain a score map; S4, if the target is not lost, extracting a current frame sample at the position of the tracking target of the previous frame and evaluating it with the last-trained tracker to obtain a score map; S5, performing score evaluation on the sample score maps of all positions and judging whether the score maps are ideal; S6, if the score map is ideal, updating the sample weight and the target position, predicting the target scaling and updating the scale; and S7, updating the new sample to the sample space with sample-weight weighting, and training the tracker at the set frame-number interval.

Description

Object identification method and device
Technical Field
The invention relates to the technical field of computer vision, in particular to an optimized object identification method and device.
Background
Since the beginning of the twenty-first century, image data has grown explosively with the rapid development of Internet technology and the popularization of mobile phones, cameras and personal computers. On the other hand, driven by the need to build safe cities, the number of monitoring cameras keeps increasing: according to incomplete statistics, Guangzhou alone has more than 300,000 monitoring cameras, while China as a whole has some 20 million, a number still growing by 20% per year. Such large-scale data far exceeds human analysis and processing capacity, so processing these image and video data intelligently is urgently required. In this context, how to analyze and understand image data automatically and intelligently with computer vision technology is receiving a great deal of attention.
Object recognition is a classic problem in computer vision and a core problem in solving many high-level vision tasks; research on object recognition lays the foundation for solving high-level vision tasks (such as behavior recognition and scene understanding). It has wide application in people's daily life and industrial production, for example: intelligent video monitoring, driver assistance, intelligent transportation, Internet image retrieval, virtual reality, human-computer interaction and the like.
In recent decades, with the successful application of a large number of statistical machine learning algorithms in artificial intelligence and computer vision, computer vision technology has advanced dramatically. In recent years in particular, the arrival of the big-data era has provided vision tasks with richer, massive image data, the development of high-performance computing equipment has provided hardware support for big-data computing, and new computer vision algorithms keep emerging. However, although object identification methods have improved greatly over the prior art in robustness, correctness, efficiency and scope, some difficulties and identification obstacles remain, and the existing object identification algorithms mainly have the following defects:
1. the scale tracking speed is too slow;
2. there is no loss-recovery function, so once the tracking target is lost, tracking cannot continue;
3. tracking can only be sustained for a short time, which cannot satisfy all application scenarios.
Disclosure of Invention
In order to overcome the above-mentioned deficiencies of the prior art, an object of the present invention is to provide an object identification method and apparatus, so as to achieve the purpose of continuing tracking in case of loss of tracking.
Another objective of the present invention is to provide an object recognition method and apparatus that increase the scale tracking speed.
It is still another object of the present invention to provide an object recognition method and apparatus, which can realize long-time tracking.
To achieve the above and other objects, the present invention provides an object recognition method, comprising the steps of:
S1, extracting a sample of the tracking target from an initial frame picture, training a tracker, and storing the target features into a sample space;
S2, reading a current frame picture, and judging whether the tracking target in the previous frame is lost;
S3, if the judgment result is that the tracking target is lost, extracting picture samples at the position where the target was lost in the previous frame and at a plurality of positions around it using the last-trained tracker, obtaining a score map of each position, and entering step S5;
S4, if the judgment result is that the tracking target is not lost, extracting a picture sample of the current frame at the position of the target in the previous frame, and evaluating the sample with the last-trained tracker to obtain a score map;
S5, performing score evaluation on the samples at all positions, judging whether the score maps are ideal, and, according to the judgment result, entering step S6 or entering the next frame and returning to step S2;
S6, updating the sample weight, updating the target position, predicting the target scaling and updating the scale;
S7, updating the new sample to the sample space with sample-weight weighting, training the tracker at the set frame-number interval, and returning to step S2; updating the new sample to the sample space with sample-weight weighting and training the tracker at the set frame-number interval specifically comprises the following steps: step S700, weighting the sample of the current frame; step S701, judging whether the sample space is full of samples; step S702, if the sample space is full, deciding whether the new sample is stored in the sample space by fusing it with an old sample, or by fusing two old samples with each other and inserting the new sample into the vacated position; step S703, if the sample space is not full, placing the new sample directly after the old samples; step S704, training the tracker with the sample space according to the preset training interval.
Preferably, the step S1 further comprises:
step S100, acquiring the position and size information of a tracking target in an initial frame picture;
step S101, extracting HOG characteristics and CN characteristics of a tracking target area, and preprocessing the extracted target characteristics;
step S102, training a tracker and a dimensionality reduction matrix according to the preprocessed target features, and performing dimensionality reduction processing on the target features;
and step S103, storing the target characteristics subjected to dimension reduction into a sample space.
Preferably, in steps S3 and S4, the operation of extracting the sample includes extracting the HOG feature and the CN feature of the tracking target area, and preprocessing the extraction result.
Preferably, step S5 further comprises:
step S500, evaluating the score map using the average peak-to-correlation energy and obtaining an energy value;
step S501, if the target of the previous frame was not lost, judging whether the changes of the energy value and of the peak value of the score map relative to the previous frame meet the preset conditions, and whether the energy value and the peak value of the score map themselves meet the preset conditions;
step S502, if the target of the previous frame was lost, judging whether the energy value and the peak value of the score map meet the preset conditions;
step S503, dividing the ideal degree of the evaluation result into excellent, good, poor and extremely poor; when the final result is poor or above, entering step S6; when the final result is extremely poor, considering the tracking target of this frame lost, entering the next frame, and returning to step S2.
Preferably, in step S3, samples are sequentially extracted from the position of the tracking target of the previous frame of picture and the positions of the upper, lower, left, right, upper left, lower left, upper right and lower right around the tracking target.
Preferably, in step S3, each sample is compared with the tracker of the previous frame to obtain a score map of each position sample.
Preferably, step S6 further comprises:
step S600, sample weight is distributed according to the evaluation result in the step S5;
step S601, performing iterative optimization on the score map using Newton's method to obtain the optimum of the score map, where the position of the maximum value in the score map is the target position;
and step S603, performing target scaling prediction by using PCA dimension reduction.
Preferably, in step S702, if the sample space is full, the degree of similarity between the new sample and all old samples in the sample space is calculated; if the similarity between the new sample and an old sample is higher than a certain threshold, the new sample is fused with that old sample; otherwise, the similarities among all old samples in the sample space are calculated, the two most similar old samples are fused, and the new sample is inserted into the vacated position.
In order to achieve the above object, the present invention also provides an object recognition apparatus, comprising:
the initial frame processing unit is used for extracting a sample of a tracking target from an initial frame picture, training a tracker and storing target characteristics into a sample space;
the loss judging unit is used for reading the current frame picture and judging whether the tracking target in the previous frame is lost or not;
the loss retrieving unit is used for, when the judgment result of the loss judging unit is that the tracking target is lost, evaluating samples at the position of the tracking target of the previous frame picture and at a plurality of positions around it using the last-trained tracker, and acquiring a score map of each position;
a current frame tracking target position obtaining unit, configured to extract a current frame picture sample according to the position of the tracking target of the previous frame picture if the tracking target is not lost as a result of the judgment of the loss judgment unit, and evaluate the sample by using the tracker trained last time to obtain a score map;
the tracking result evaluation unit is used for performing score evaluation on the score maps and judging whether the target is lost;
the tracking result updating unit is used for updating the sample weight, updating the position of a tracking target, predicting the target scaling and updating the target scale;
the tracker training unit is used for updating the new sample to the sample space with sample-weight weighting, training the tracker with the sample space according to a preset interval, and returning to the loss judgment unit; updating the new sample to the sample space with sample-weight weighting and training the tracker with the sample space according to a preset interval specifically comprises: weighting the sample of the current frame; judging whether the sample space is full of samples; if the sample space is full, deciding whether the new sample is stored in the sample space by fusing it with an old sample, or by fusing two old samples with each other and inserting the new sample into the vacated position; if the sample space is not full, placing the new sample directly after the old samples; and training the tracker with the sample space according to the preset training interval.
Preferably, the initial frame processing unit further includes:
a tracking target obtaining unit, configured to obtain information of an initial frame picture, that is, obtain position and size information of a tracking target in the initial frame picture;
the feature extraction unit is used for extracting the HOG feature and the CN feature of the tracked target and preprocessing the extracted target feature;
the training dimensionality reduction unit is used for training the tracker and the dimensionality reduction matrix according to the preprocessed target characteristics and performing dimensionality reduction processing on the target characteristics;
and the storage unit is used for storing the target features subjected to dimension reduction into a sample space.
Compared with the prior art, the object identification method and device of the invention judge from the current frame picture whether the tracking target is lost, and perform loss-recovery processing when it is lost, so that tracking of the target can continue when it appears again after being lost.
Drawings
FIG. 1 is a flow chart illustrating the steps of an object recognition method according to the present invention;
FIG. 2 is a detailed flowchart of step S1 according to an embodiment of the present invention;
FIG. 3 is a detailed flowchart of step S3 according to an embodiment of the present invention;
FIG. 4 is a detailed flowchart of step S301 according to an embodiment of the present invention;
FIG. 5 is a system architecture diagram of an object recognition device of the present invention;
FIG. 6 is a flowchart illustrating processing for a new frame of picture according to an embodiment of the present invention.
Detailed Description
Other advantages and capabilities of the present invention will be readily apparent to those skilled in the art from the disclosure herein, which describes embodiments of the invention with reference to the accompanying drawings. The invention is capable of other and different embodiments, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.
FIG. 1 is a flow chart illustrating steps of an object recognition method according to the present invention. As shown in fig. 1, the object recognition method of the present invention includes the following steps:
step S1, extracting a sample of a tracking target from an initial frame picture, training a tracker, and storing target characteristics into a sample space. The tracking target is a moving object to be identified in the video image.
Specifically, as shown in fig. 2, step S1 further includes:
step S100, obtaining information of an initial frame picture, namely obtaining position and size information of a tracking target in the initial frame picture, and initializing parameters of a tracker;
step S101, extracting HOG (Histogram of Oriented Gradient) features and CN (Color Name) features of the tracking target, and preprocessing the extracted target features; specifically, the preprocessing in this step includes feature dimension reduction, cosine window addition, DFT, interpolation, etc. It should be noted that the dimension-reduction matrix is initialized by PCA and updated in S102; that is, dimension reduction is performed again before storing into the sample space. A code sketch of this preprocessing follows step S103 below.
Step S102, training a tracker and a dimension reduction matrix according to the preprocessed target characteristics, and carrying out dimension reduction processing on the target characteristics;
and step S103, storing the target characteristics subjected to dimension reduction into a sample space.
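To make the flow of steps S100-S103 concrete, the following Python sketch shows one way the preprocessing chain (PCA dimension reduction, cosine window, DFT) could be implemented. The function names, the (H, W, C) feature-map layout and the plain Hann window are illustrative assumptions, not the patented implementation.

import numpy as np

def cosine_window(h, w):
    # 2-D Hann (cosine) window that suppresses boundary effects
    win = np.outer(np.hanning(h), np.hanning(w))
    return win[:, :, None]  # broadcast over the channel axis

def init_projection(features, out_dim):
    # Initialize a PCA dimension-reduction matrix from the first sample:
    # eigenvectors of the channel covariance give the projection.
    h, w, c = features.shape
    x = features.reshape(-1, c)
    x = x - x.mean(axis=0)
    cov = x.T @ x / x.shape[0]
    _, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    return vecs[:, -out_dim:]              # keep the strongest components

def preprocess(features, proj):
    # Dimension reduction, then cosine window, then DFT, as in step S101.
    h, w, _ = features.shape
    reduced = features @ proj              # feature dimension reduction
    windowed = reduced * cosine_window(h, w)
    return np.fft.fft2(windowed, axes=(0, 1))   # frequency-domain sample

On the initial frame, init_projection would be called once and the projected, windowed, transformed features stored as the first entry of the sample space.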
S2, reading a current frame picture, and judging whether the tracking target in the previous frame is lost. In the embodiment of the present invention, a target loss flag update flag is set for each frame to record whether the tracking target is lost; specifically, the flag is initially true, and it is updated in step S5 according to the score evaluation result.
And S3, if the tracking target is lost according to the judgment result, extracting the picture samples of the position where the target is lost in the last frame and a plurality of positions around the position by using the tracker trained last time, obtaining a score map of each position, and skipping to the step S5.
Specifically, as shown in fig. 3, step S3 further includes:
step S300, extracting samples in turn at the position of the tracking target of the previous frame picture and at the 8 positions around it (upper, lower, left, right, upper-left, lower-left, upper-right and lower-right); the operation of extracting a sample comprises extracting the HOG features and CN features of the tracking target region and preprocessing the extraction result, the preprocessing again including feature dimension reduction, cosine window addition, DFT, interpolation and the like;
step S301, comparing each sample with the previous-frame tracker to obtain a score map of each position's sample. That is, each score map comes from the comparison (i.e. frequency-domain correlation) between a sample and the tracker of the previous frame; each sample yields its own score map, and the scores in a score map are in fact correlation degrees.
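The comparison in step S301 can be sketched in Python as below, under the assumption, common to correlation-filter trackers and consistent with the frequency-domain correlation just described, that the tracker is a learned filter stored in the frequency domain; sample_f is a sample preprocessed as in the sketch after step S103, and the helper names are illustrative, not the patented implementation.

import numpy as np

def score_map(sample_f, filter_f):
    # Conjugate product in the frequency domain equals correlation in the
    # spatial domain; summing over channels gives one response map, whose
    # values are the correlation degrees described above.
    response_f = np.sum(np.conj(filter_f) * sample_f, axis=2)
    return np.real(np.fft.ifft2(response_f))

def peak_position(response):
    # Row/column of the maximum score: the most likely target offset.
    return np.unravel_index(np.argmax(response), response.shape)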
And S4, if the judgment result is that the tracking target is not lost, extracting a picture sample of the current frame at the position of the target in the previous frame, and evaluating the sample with the last-trained tracker to obtain a score map. In the specific embodiment of the present invention, the operation of extracting the sample in this step also comprises extracting the HOG features and CN features of the tracking target region and preprocessing the extraction result, the preprocessing again including feature dimension reduction, cosine window addition, DFT, interpolation and the like. The score map likewise comes from comparing the sample with the previous-frame tracker, which is not repeated here.
And S5, performing score evaluation on the score maps of the samples at the positions, and judging whether the score maps are ideal.
Specifically, step S5 further includes:
step S500, evaluating the score map using the Average Peak-to-Correlation Energy (APCE) to obtain an energy value (a code sketch of this evaluation follows step S503 below);
in step S501, if the target of the previous frame was not lost, the evaluation judges whether the changes of the energy value and of the peak value of the score map relative to the previous frame meet the preset conditions, and whether the energy value and the peak value of the score map themselves meet the preset conditions.
Step S502, if the target of the previous frame was lost, the evaluation judges whether the energy value and the peak value of the score map meet the preset conditions.
And step S503, entering step S6 or returning to step S2 according to the ideal degree of the judgment result. In the embodiment of the present invention, the ideal degree of the evaluation result is divided into excellent, good, poor and extremely poor; when the final result of the judgment is poor or above, the flow enters step S6; when the final result is extremely poor, the tracking target of the current frame is considered lost, the next frame is entered, the flow returns to step S2, and the target loss flag update flag is updated to false, as shown in fig. 4, in which the confidence levels higher, general and lower correspond to the ideal degrees excellent, good and poor, respectively.
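The APCE measure of step S500 has a standard form: APCE = |Fmax - Fmin|^2 / mean((F - Fmin)^2), taken over all positions of the score map F. The sketch below computes it and maps it to the four ideal degrees; the reference values and the 0.7/0.45/0.2 thresholds are invented for illustration only, since the patent leaves the preset conditions open.

import numpy as np

def apce(response):
    # Average Peak-to-Correlation Energy: a sharp single peak over a flat
    # background gives a high value; occlusion or loss gives a low one.
    f_max, f_min = response.max(), response.min()
    return (f_max - f_min) ** 2 / (np.mean((response - f_min) ** 2) + 1e-12)

def ideal_degree(response, ref_apce, ref_peak):
    # Toy grading against reference values taken from previous reliable
    # frames; the thresholds are assumptions, not the patented conditions.
    ratio = min(apce(response) / ref_apce, response.max() / ref_peak)
    if ratio > 0.7:
        return "excellent"
    if ratio > 0.45:
        return "good"
    if ratio > 0.2:
        return "poor"
    return "extremely poor"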
And S6, updating the sample weight, updating the target position, predicting the target scaling and updating the scale. That is, if the result of the determination in step S5 is that the score is ideal (i.e., the ideal degree is excellent, good or poor), the sample weight is updated, the target position is updated, the target scaling is predicted, and the scale is updated.
Specifically, step S6 further includes:
step S600, sample weight is distributed according to the evaluation result in the step S5;
step S601, performing iterative optimization on the score map using Newton's method to obtain the optimum of the score map, where the position of the maximum value in the score map is the target position (a code sketch of this refinement follows step S603 below);
and step S603, performing target scaling prediction using PCA dimension reduction. The invention uses the PCA dimension-reduction technique, which greatly reduces the amount of data to be computed; meanwhile, the scaling uses a frequency-domain interpolation method, which reduces the number of scales that must be evaluated and so greatly improves the computation speed.
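The Newton iteration of step S601 can be sketched on the discrete score map with finite-difference derivatives; the iteration count and the finite-difference scheme are illustrative assumptions rather than the patented routine.

import numpy as np

def refine_peak(response, iters=5):
    # Start from the grid maximum and take Newton steps toward the
    # sub-pixel optimum of the score map.
    y, x = np.unravel_index(np.argmax(response), response.shape)
    p = np.array([y, x], dtype=float)
    for _ in range(iters):
        iy = int(np.clip(round(p[0]), 1, response.shape[0] - 2))
        ix = int(np.clip(round(p[1]), 1, response.shape[1] - 2))
        # central finite differences approximate gradient and Hessian
        gy = (response[iy + 1, ix] - response[iy - 1, ix]) / 2
        gx = (response[iy, ix + 1] - response[iy, ix - 1]) / 2
        hyy = response[iy + 1, ix] - 2 * response[iy, ix] + response[iy - 1, ix]
        hxx = response[iy, ix + 1] - 2 * response[iy, ix] + response[iy, ix - 1]
        hxy = (response[iy + 1, ix + 1] - response[iy + 1, ix - 1]
               - response[iy - 1, ix + 1] + response[iy - 1, ix - 1]) / 4
        hess = np.array([[hyy, hxy], [hxy, hxx]])
        if abs(np.linalg.det(hess)) < 1e-12:
            break
        p -= np.linalg.solve(hess, np.array([gy, gx]))  # Newton step
    return p  # (row, col) of the refined target position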
And S7, weighting and updating the new sample to a sample space by the sample weight, training the tracker according to the set frame number interval, and returning to the step S2.
Specifically, step S7 further includes:
step S700, weighting the samples of the current frame.
Step S701 is to determine whether the sample space is full of samples, and in the embodiment of the present invention, whether the sample space is full of samples may be determined according to a preset sample space size.
Step S702, if the sample space is full, the new sample is stored in the sample space either by fusing it with an old sample, or by fusing two old samples with each other and inserting the new sample. Specifically, the degree of similarity between the new sample and all old samples in the sample space is calculated first; if the similarity between the new sample and an old sample is higher than a certain threshold, the new sample is fused with that old sample; otherwise, the similarities among all old samples in the sample space are calculated, the two most similar old samples are fused, and the new sample is inserted into the vacated position (see the code sketch after step S704 below).
In step S703, if the sample space is not full, a new sample is directly placed after the old sample.
Step S704, training the tracker using a sample space according to a preset training interval.
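The sample-space bookkeeping of steps S700-S704 can be sketched as follows; the normalized-correlation similarity and the 0.9 merge threshold are assumptions, since the patent only requires "a certain threshold".

import numpy as np

def update_sample_space(space, weights, new_sample, new_weight,
                        capacity, merge_threshold=0.9):
    def similarity(a, b):
        # normalized correlation between two flattened feature samples
        a, b = a.ravel(), b.ravel()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def merge(i, sample, weight):
        # weighted fusion of slot i with an incoming sample (step S702)
        total = weights[i] + weight
        space[i] = (weights[i] * space[i] + weight * sample) / total
        weights[i] = total

    if len(space) < capacity:                    # step S703: room left
        space.append(new_sample)
        weights.append(new_weight)
        return
    sims = [similarity(new_sample, s) for s in space]
    best = int(np.argmax(sims))
    if sims[best] > merge_threshold:             # fuse new into old
        merge(best, new_sample, new_weight)
        return
    # otherwise fuse the two most similar old samples, freeing a slot
    pair, best_sim = (0, 1), -1.0
    for i in range(len(space)):
        for j in range(i + 1, len(space)):
            s = similarity(space[i], space[j])
            if s > best_sim:
                pair, best_sim = (i, j), s
    i, j = pair
    merge(i, space[j], weights[j])
    space[j], weights[j] = new_sample, new_weight    # vacated slot

Here space and weights are plain Python lists of equal length; training (step S704) would then solve for a new filter from these weighted samples every preset number of frames.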
Fig. 5 is a system architecture diagram of an object recognition device according to the present invention. As shown in FIG. 5, an object recognition apparatus of the present invention comprises
And the initial frame processing unit 50 is used for extracting a sample of the tracking target from the initial frame picture, training the tracker and storing the target feature into a sample space. The tracking target is a moving object to be identified in the video image.
Specifically, the initial frame processing unit 50 further includes:
a tracking target obtaining unit, configured to obtain information of an initial frame picture, that is, obtain position and size information of a tracking target in the initial frame picture;
a feature extraction unit, configured to extract a HOG (Histogram of Oriented Gradient) feature and a CN feature of a tracked target, and perform preprocessing on the extracted target feature, where the preprocessing includes feature dimension reduction, cosine window addition, DFT, interpolation, and the like;
the training dimensionality reduction unit is used for training the tracker and the dimensionality reduction matrix according to the preprocessed target characteristics and performing dimensionality reduction processing on the target characteristics;
and the storage unit is used for storing the target features subjected to dimension reduction into a sample space.
A loss judgment unit 51, configured to read a current frame picture, and judge whether a tracking target in a previous frame is lost;
and a loss retrieving unit 52, configured to extract, by using the tracker trained last time, the position of the tracking target in the previous frame of picture and picture samples of several positions around the tracking target when the determination result of the loss determining unit 51 is that the tracking target is lost, and obtain a score map of each position.
Specifically, the loss retrieving unit 52 further includes:
the adjacent position sample extraction unit is used for extracting samples in turn at the position of the tracking target of the previous frame picture and at the 8 positions around it (upper, lower, left, right, upper-left, lower-left, upper-right and lower-right) and preprocessing the samples; the operation of extracting a sample comprises extracting the HOG (Histogram of Oriented Gradient) features and CN features of the tracking target region and preprocessing the extraction result, the preprocessing again including feature dimension reduction, cosine window addition, DFT, interpolation and the like;
and the score map acquisition unit is used for comparing each sample with the tracker of the previous frame to obtain the score map of each position's sample. That is, each score map comes from the comparison (i.e. frequency-domain correlation) between a sample and the tracker of the previous frame; each sample yields its own score map, and the scores in a score map are in fact correlation degrees.
A current frame tracking target position obtaining unit 53, configured to, when the determination result of the loss determining unit is that the tracking target is not lost, extract a current frame picture sample according to the position of the tracking target of the previous frame picture, and evaluate the sample by using the tracker trained last time to obtain a score map.
And a tracking result evaluation unit 54 for performing score evaluation on the score maps of the samples at the respective positions and judging whether the score maps are ideal.
The tracking result evaluation unit 54 is specifically configured to:
evaluating the score map using the Average Peak-to-Correlation Energy (APCE) and obtaining an energy value;
if the target of the previous frame was not lost, judging whether the changes of the energy value and of the peak value of the score map relative to the previous frame meet the preset conditions, and whether the energy value and the peak value of the score map themselves meet the preset conditions;
if the target of the previous frame was lost, judging whether the energy value and the peak value of the score map meet the preset conditions;
and starting the tracking result updating unit 55 or entering the next frame and returning to the loss judgment unit 51 according to the ideal degree of the judgment result. In the embodiment of the present invention, the ideal degree of the evaluation result is divided into excellent, good, poor and extremely poor; when the final result of the judgment is poor or above, the tracking result updating unit 55 is started; when the final result is extremely poor, the tracking target of the current frame is considered lost, the next frame is entered and the flow returns to the loss judgment unit 51, with the target loss flag update flag updated to false.
And a tracking result updating unit 55, configured to update the sample weight, update the tracking target position, perform target scaling prediction, and update the scale. That is, if the determination result of the tracking result evaluation unit 54 is that the score is ideal (i.e., the ideal degree is excellent, good or poor), the sample weight is updated, the target position is updated, the target scaling is predicted, and the scale is updated.
Specifically, the tracking result updating unit 55 further includes:
a sample weight updating unit for assigning a sample weight according to the evaluation result of the tracking result evaluating unit 54;
the tracking target position updating unit is used for performing iterative optimization on the score map using Newton's method to obtain the optimum of the score map, the position of the maximum value in the score map being the target position;
and the scale prediction updating unit is used for performing target scale prediction using PCA dimension reduction. The invention uses the PCA dimension-reduction technique, which greatly reduces the amount of data to be computed; meanwhile, the scaling uses a frequency-domain interpolation method, which reduces the number of scales that must be evaluated and so greatly improves the computation speed.
And a tracker training unit 56 for updating the new samples to the sample space with sample weight weighting and training the tracker according to the set frame number interval.
The tracker training unit 56 is specifically configured to:
weighting the sample of the current frame;
judging whether the sample space is full of samples, where in the embodiment of the invention this can be judged according to the preset size of the sample space;
if the sample space is full, deciding whether the new sample is stored in the sample space by fusing it with an old sample, or by fusing two old samples with each other and inserting the new sample; specifically, the degree of similarity between the new sample and all old samples in the sample space is calculated first; if the similarity between the new sample and an old sample is higher than a certain threshold, the new sample is fused with that old sample; otherwise, the similarities among all old samples in the sample space are calculated, the two most similar old samples are fused, and the new sample is inserted into the vacated position;
If the sample space is not full, a new sample is directly placed after the old sample;
and training the tracker by using a sample space according to a preset training interval.
FIG. 6 is a flowchart illustrating processing of a new frame of picture according to an embodiment of the present invention. The treatment process is as follows:
1. reading in a new frame of picture;
2. judging whether the tracking target in the previous frame is lost; in the specific embodiment of the invention, a target state Flag Update Flag records whether the tracking target is lost: Update Flag = 0 means the tracking target is lost, and Update Flag = 1 means it is not lost;
3. if the Update Flag is 0, indicating that the tracking target is lost, samples are extracted in turn at the position of the tracking target of the previous frame picture and at the 8 positions around it (upper, lower, left, right, upper-left, lower-left, upper-right and lower-right; the nine search positions are sketched in the code after this list), and the samples are preprocessed (including dimension reduction, cosine window addition, FFT and interpolation), yielding a candidate sample at each position;
4. the 9 samples are evaluated with the last-trained tracker to obtain the scores at all positions, and the scores are then evaluated;
5. the APCE is used to judge whether a sample meets the retrieval condition;
6. if a sample meets the retrieval condition, the scoring result is optimized and the position of the maximum score is taken as the position of the retrieved tracking target; the flow then proceeds to step 10;
7. if the Update Flag is 1, indicating that the tracking target is not lost, a sample is extracted at the position of the previous frame, and features are extracted and preprocessed (including dimension reduction, cosine window addition, FFT and interpolation);
8. the new sample is evaluated with the last-trained tracker to obtain the scores at all positions, and the scores are then evaluated;
9. the scoring result is optimized and the position of the maximum score is found;
10. the position of the tracking target is updated, the target scaling is predicted and the scale is updated;
11. the new sample is updated to the sample space with sample-weight weighting;
12. at regular intervals, the tracker is trained with the samples in the sample space.
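As referenced in step 3, the nine search positions used for loss recovery can be generated as follows; the spacing of one target size between neighbouring positions is an assumption, as the patent specifies only the eight surrounding directions.

def candidate_positions(pos, size):
    # Last known position plus its 8 neighbours (upper, lower, left, right
    # and the four diagonals); the centre position itself is included.
    cy, cx = pos
    h, w = size
    return [(cy + dy * h, cx + dx * w)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

Each returned position would then be sampled, preprocessed and scored as in steps 3-4, with the best-scoring position kept if it passes the APCE retrieval condition.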
In summary, the object identification method and apparatus of the present invention judge from the current frame picture whether the tracking target is lost, and perform loss-recovery processing when it is lost, so that tracking can continue when the lost target appears again. At the same time, the invention adds scale tracking, can track the target under zooming, and accelerates scale tracking: experiments show that the scale tracking speed of the existing object identification method is about 140 ms/frame, while that of the object identification optimized by the present invention is 40 ms/frame, a significant acceleration. Moreover, the invention trains the tracker with the samples in the sample space at intervals, so it can achieve the purpose of long-time tracking.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Modifications and variations can be made to the above-described embodiments by those skilled in the art without departing from the spirit and scope of the present invention. Therefore, the scope of the invention should be determined by the appended claims.

Claims (9)

1. An object identification method comprising the steps of:
S1, extracting a sample of a tracking target from an initial frame picture, training a tracker, and storing target features into a sample space;
S2, reading a current frame picture, and judging whether the tracking target in the previous frame is lost;
S3, if the judgment result is that the tracking target is lost, extracting picture samples at the position where the target was lost in the previous frame and at a plurality of positions around it using the last-trained tracker, obtaining a score map of each position, and entering step S5;
S4, if the judgment result is that the tracking target is not lost, extracting a picture sample of the current frame at the position of the target in the previous frame, and evaluating the sample with the last-trained tracker to obtain a score map;
S5, performing score evaluation on the samples at all positions, judging whether the score maps are ideal, and, according to the judgment result, entering step S6 or entering the next frame and returning to step S2;
S6, updating the sample weight, updating the target position, predicting the target scaling and updating the scale;
S7, updating the new sample to the sample space with sample-weight weighting, training the tracker at the set frame-number interval, and returning to step S2; updating the new sample to the sample space with sample-weight weighting and training the tracker at the set frame-number interval specifically comprises the following steps: step S700, weighting the sample of the current frame; step S701, judging whether the sample space is full of samples; step S702, if the sample space is full, deciding whether the new sample is stored in the sample space by fusing it with an old sample, or by fusing two old samples with each other and inserting the new sample into the vacated position; step S703, if the sample space is not full, placing the new sample directly after the old samples; step S704, training the tracker with the sample space according to the preset training interval.
2. An object recognition method according to claim 1, wherein step S1 further comprises:
step S100, acquiring position and size information of a tracking target in an initial frame picture;
step S101, extracting HOG characteristics and CN characteristics of a tracking target area, and preprocessing the extracted target characteristics;
step S102, training a tracker and a dimensionality reduction matrix according to the preprocessed target features, and performing dimensionality reduction processing on the target features;
and step S103, storing the target characteristics subjected to dimension reduction into a sample space.
3. An object recognition method according to claim 1, characterized in that: in steps S3 and S4, the operation of extracting the sample includes extracting the HOG feature and the CN feature of the tracking target region, and preprocessing the extraction result.
4. An object recognition method according to claim 1, wherein step S5 further comprises:
step S500, evaluating the score map using the average peak-to-correlation energy, and obtaining an energy value;
step S501, if the target of the previous frame was not lost, judging whether the changes of the energy value and of the peak value of the score map relative to the previous frame meet the preset conditions, and whether the energy value and the peak value of the score map themselves meet the preset conditions;
step S502, if the target of the previous frame was lost, judging whether the energy value and the peak value of the score map meet the preset conditions;
step S503, dividing the ideal degree of the evaluation result into excellent, good, poor and extremely poor; when the final result is poor or above, entering step S6; when the final result is extremely poor, considering the tracking target of this frame lost, entering the next frame, and returning to step S2.
5. An object recognition method according to claim 1, characterized in that: in step S3, samples are sequentially extracted from the position of the tracking target of the previous frame of picture and the surrounding upper, lower, left, right, upper left, lower left, upper right, and lower right positions thereof.
6. An object recognition method according to claim 5, characterized in that: in step S3, each sample is compared with the tracker of the previous frame to obtain a score map of each position sample.
7. An object recognition method according to claim 1, wherein step S6 further comprises:
step S600, sample weight is distributed according to the evaluation result in the step S5;
step S601, performing iterative optimization on the score map by using a Newton method to obtain the best score map, wherein the position of the maximum value in the score map is a target position;
and step S603, performing target scaling prediction by using PCA dimension reduction.
8. An object recognition method according to claim 1, characterized in that: in step S702, if the sample space is full, the degree of similarity between the new sample and all old samples in the sample space is calculated, and if the similarity between the new sample and an old sample is higher than a certain threshold, the new sample is fused with that old sample; otherwise, the similarities among all old samples in the sample space are calculated, the two most similar old samples are fused, and the new sample is inserted into the vacated position.
9. An object recognition device, comprising:
The initial frame processing unit is used for extracting a sample of a tracking target from an initial frame picture, training a tracker and storing target characteristics into a sample space;
the loss judging unit is used for reading the current frame picture and judging whether the tracking target in the previous frame is lost;
the loss retrieving unit is used for, when the judgment result of the loss judging unit is that the tracking target is lost, evaluating samples at the position of the tracking target of the previous frame picture and at a plurality of positions around it using the last-trained tracker, and acquiring a score map of each position;
a current frame tracking target position obtaining unit, configured to extract a current frame picture sample according to the position of the tracking target of the previous frame picture if the tracking target is not lost as a result of the judgment of the loss judgment unit, and evaluate the sample by using the tracker trained last time to obtain a score map;
the tracking result evaluation unit is used for performing score evaluation on the score maps and judging whether the target is lost;
the tracking result updating unit is used for updating the sample weight, updating the position of a tracking target, predicting the target scaling and updating the target scale;
the tracker training unit is used for updating the new sample to the sample space with sample-weight weighting, training the tracker with the sample space according to a preset interval, and returning to the loss judgment unit; updating the new sample to the sample space with sample-weight weighting and training the tracker with the sample space according to a preset interval specifically comprises: weighting the sample of the current frame; judging whether the sample space is full of samples; if the sample space is full, deciding whether the new sample is stored in the sample space by fusing it with an old sample, or by fusing two old samples with each other and inserting the new sample into the vacated position; if the sample space is not full, placing the new sample directly after the old samples;
and training the tracker by using a sample space according to a preset training interval.
CN201811206301.4A 2018-10-17 2018-10-17 Object identification method and device Active CN109492537B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811206301.4A CN109492537B (en) 2018-10-17 2018-10-17 Object identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811206301.4A CN109492537B (en) 2018-10-17 2018-10-17 Object identification method and device

Publications (2)

Publication Number Publication Date
CN109492537A CN109492537A (en) 2019-03-19
CN109492537B true CN109492537B (en) 2023-03-14

Family

ID=65691341

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811206301.4A Active CN109492537B (en) 2018-10-17 2018-10-17 Object identification method and device

Country Status (1)

Country Link
CN (1) CN109492537B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110853076B (en) * 2019-11-08 2023-03-31 重庆市亿飞智联科技有限公司 Target tracking method, device, equipment and storage medium
CN110910422A (en) * 2019-11-13 2020-03-24 北京环境特性研究所 Target tracking method and device, electronic equipment and readable storage medium
CN112200790B (en) * 2020-10-16 2023-04-07 鲸斛(上海)智能科技有限公司 Cloth defect detection method, device and medium
CN112150460B (en) * 2020-10-16 2024-03-15 上海智臻智能网络科技股份有限公司 Detection method, detection system, device and medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080187173A1 (en) * 2007-02-02 2008-08-07 Samsung Electronics Co., Ltd. Method and apparatus for tracking video image
CN102930296A (en) * 2012-11-01 2013-02-13 长沙纳特微视网络科技有限公司 Image identifying method and device
CN104574445A (en) * 2015-01-23 2015-04-29 北京航空航天大学 Target tracking method and device
CN104899561A (en) * 2015-05-27 2015-09-09 华南理工大学 Parallelized human body behavior identification method
CN106372666A (en) * 2016-08-31 2017-02-01 同观科技(深圳)有限公司 Target identification method and device
CN106920248A (en) * 2017-01-19 2017-07-04 博康智能信息技术有限公司上海分公司 A kind of method for tracking target and device
CN107992791A (en) * 2017-10-13 2018-05-04 西安天和防务技术股份有限公司 Target following failure weight detecting method and device, storage medium, electronic equipment
CN108510521A (en) * 2018-02-27 2018-09-07 南京邮电大学 A kind of dimension self-adaption method for tracking target of multiple features fusion
CN108564008A (en) * 2018-03-28 2018-09-21 厦门瑞为信息技术有限公司 A kind of real-time pedestrian and method for detecting human face based on ZYNQ
CN108664930A (en) * 2018-05-11 2018-10-16 西安天和防务技术股份有限公司 A kind of intelligent multi-target detection tracking

Also Published As

Publication number Publication date
CN109492537A (en) 2019-03-19

Similar Documents

Publication Publication Date Title
CN109492537B (en) Object identification method and device
CN111126360B (en) Cross-domain pedestrian re-identification method based on unsupervised combined multi-loss model
CN110472531B (en) Video processing method, device, electronic equipment and storage medium
Lopez-Antequera et al. Appearance-invariant place recognition by discriminatively training a convolutional neural network
CN109815364B (en) Method and system for extracting, storing and retrieving mass video features
CN110120064B (en) Depth-related target tracking algorithm based on mutual reinforcement and multi-attention mechanism learning
CN109993102B (en) Similar face retrieval method, device and storage medium
CN110717411A (en) Pedestrian re-identification method based on deep layer feature fusion
CN112489081B (en) Visual target tracking method and device
JP2006338313A (en) Similar image retrieving method, similar image retrieving system, similar image retrieving program, and recording medium
CN111598067B (en) Re-recognition training method, re-recognition method and storage device in video
CN113920472A (en) Unsupervised target re-identification method and system based on attention mechanism
CN115527269A (en) Intelligent human body posture image identification method and system
CN111709236A (en) Case similarity matching-based trial risk early warning method
CN109241315B (en) Rapid face retrieval method based on deep learning
CN116665272A (en) Airport scene face recognition fusion decision method and device, electronic equipment and medium
CN115527083B (en) Image annotation method and device and electronic equipment
CN115393755A (en) Visual target tracking method, device, equipment and storage medium
CN112487927B (en) Method and system for realizing indoor scene recognition based on object associated attention
CN114298154A (en) Active learning method and device, electronic equipment and readable storage medium
CN111984812B (en) Feature extraction model generation method, image retrieval method, device and equipment
CN115082854A (en) Pedestrian searching method oriented to security monitoring video
Lanzarini et al. Face recognition using SIFT and binary PSO descriptors
CN112685580A (en) Social network head portrait comparison distributed detection system, method and device based on deep learning, processor and storage medium thereof
CN111598926B (en) Target tracking method and device for optimizing ECO feature extraction performance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant