CN116958870A - Video feature extraction method and device, readable storage medium and terminal equipment - Google Patents


Info

Publication number
CN116958870A
Authority
CN
China
Prior art keywords
video
feature
image
video frame
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310917755.7A
Other languages
Chinese (zh)
Inventor
王侃
胡淑萍
庞建新
谭欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Ubtech Technology Co ltd
Original Assignee
Shenzhen Ubtech Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Ubtech Technology Co ltd filed Critical Shenzhen Ubtech Technology Co ltd
Priority to CN202310917755.7A priority Critical patent/CN116958870A/en
Publication of CN116958870A publication Critical patent/CN116958870A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application belongs to the technical field of target identification, and particularly relates to a video feature extraction method, a video feature extraction device, a computer readable storage medium and terminal equipment. The method comprises the following steps: acquiring a video sequence to be processed; respectively extracting image features of each video frame in the video sequence to obtain first image features of each video frame; calculating a first video feature of the video sequence from the first image features of each video frame; respectively carrying out feature optimization on the first image features of each video frame according to the first video feature to obtain second image features of each video frame; and calculating a second video feature of the video sequence according to the second image features of each video frame. The method and the device can effectively weaken the influence of poor-quality video frames on the finally extracted video features, and improve the robustness of video feature extraction.

Description

Video feature extraction method and device, readable storage medium and terminal equipment
Technical Field
The application belongs to the technical field of target identification, and particularly relates to a video feature extraction method, a video feature extraction device, a computer readable storage medium and terminal equipment.
Background
Video feature extraction refers to extracting feature information from a video sequence so as to identify the video sequence using that feature information. With the development of artificial intelligence technology, video data has become increasingly abundant, and accurately extracting video features has become correspondingly important.
In the prior art, video features are typically calculated by averaging, i.e., each video frame contributes equally to the finally extracted video feature. If the video sequence contains poor-quality video frames, e.g., blurred or overexposed frames, the quality of the finally extracted video features suffers and the robustness is poor.
Disclosure of Invention
In view of the above, embodiments of the present application provide a video feature extraction method, apparatus, computer readable storage medium, and terminal device, so as to solve the problem that the existing video feature extraction method is poor in robustness.
A first aspect of an embodiment of the present application provides a video feature extraction method, which may include:
acquiring a video sequence to be processed;
respectively extracting image features of each video frame in the video sequence to obtain first image features of each video frame;
calculating a first video feature of the video sequence from the first image feature of each video frame;
respectively carrying out feature optimization on the first image features of each video frame according to the first video features to obtain second image features of each video frame;
and calculating a second video characteristic of the video sequence according to the second image characteristic of each video frame.
In a specific implementation manner of the first aspect, the performing feature optimization on the first image features of each video frame according to the first video feature to obtain the second image features of each video frame may include:
calculating the feature similarity between the first video feature and the first image feature of the target video frame; wherein the target video frame is any video frame in the video sequence;
and calculating a second image characteristic of the target video frame according to the characteristic similarity, the first image characteristic of the target video frame and the first video characteristic.
In a specific implementation manner of the first aspect, the calculating the second image feature of the target video frame according to the feature similarity, the first image feature of the target video frame, and the first video feature may include:
respectively determining a first weight and a second weight according to the feature similarity; the first weight is a weight corresponding to a first image feature of the target video frame, the second weight is a weight corresponding to the first video feature, the first weight is positively correlated with the feature similarity, and the second weight is negatively correlated with the feature similarity;
and carrying out feature fusion on the first image feature of the target video frame and the first video feature according to the first weight and the second weight to obtain a second image feature of the target video frame.
In a specific implementation manner of the first aspect, the determining the first weight and the second weight according to the feature similarity may include:
determining the feature similarity as the first weight;
and determining the difference value between the preset weight sum and the feature similarity as the second weight.
In a specific implementation manner of the first aspect, the performing feature fusion on the first image feature of the target video frame and the first video feature according to the first weight and the second weight to obtain the second image feature of the target video frame may include:
calculating a first weighted feature according to the first weight and a first image feature of the target video frame;
calculating a second weighted feature according to the second weight and the first video feature;
and calculating a second image characteristic of the target video frame according to the first weighted characteristic and the second weighted characteristic.
In a specific implementation manner of the first aspect, the calculating a first video feature of the video sequence according to a first image feature of the respective video frame may include:
and carrying out average processing on the first image features of each video frame to obtain the first video features.
In a specific implementation manner of the first aspect, the calculating the second video feature of the video sequence according to the second image feature of the respective video frame may include:
and carrying out average processing on the second image features of each video frame to obtain the second video features.
A second aspect of an embodiment of the present application provides a video feature extraction apparatus, which may include:
the video sequence acquisition module is used for acquiring a video sequence to be processed;
the feature extraction module is used for extracting image features of each video frame in the video sequence respectively to obtain first image features of each video frame;
a first video feature calculation module for calculating a first video feature of the video sequence from a first image feature of each video frame;
the feature optimization module is used for respectively performing feature optimization on the first image features of each video frame according to the first video features to obtain second image features of each video frame;
and the second video feature calculation module is used for calculating second video features of the video sequence according to the second image features of the video frames.
In a specific implementation manner of the second aspect, the feature optimization module may include:
the similarity calculation sub-module is used for calculating the feature similarity between the first video feature and the first image feature of the target video frame; wherein the target video frame is any video frame in the video sequence;
and the feature calculation sub-module is used for calculating the second image feature of the target video frame according to the feature similarity, the first image feature of the target video frame and the first video feature.
In a specific implementation manner of the second aspect, the feature calculation sub-module may include:
the weight determining unit is used for determining a first weight and a second weight according to the feature similarity; the first weight is a weight corresponding to a first image feature of the target video frame, the second weight is a weight corresponding to the first video feature, the first weight is positively correlated with the feature similarity, and the second weight is negatively correlated with the feature similarity;
and the feature fusion unit is used for carrying out feature fusion on the first image feature of the target video frame and the first video feature according to the first weight and the second weight to obtain a second image feature of the target video frame.
In a specific implementation manner of the second aspect, the weight determining unit may specifically be configured to: determining the feature similarity as the first weight; and determining the difference value between the preset weight sum and the feature similarity as the second weight.
In a specific implementation manner of the second aspect, the feature fusion unit may specifically be configured to: calculating a first weighted feature according to the first weight and a first image feature of the target video frame; calculating a second weighted feature according to the second weight and the first video feature; and calculating a second image feature of the target video frame according to the first weighted feature and the second weighted feature.
In a specific implementation manner of the second aspect, the first video feature calculation module may specifically be configured to: and carrying out average processing on the first image features of each video frame to obtain the first video features.
In a specific implementation manner of the second aspect, the second video feature calculation module may specifically be configured to: and carrying out average processing on the second image features of each video frame to obtain the second video features.
A third aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the video feature extraction methods described above.
A fourth aspect of the embodiments of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of any one of the video feature extraction methods described above when executing the computer program.
A fifth aspect of the embodiments of the present application provides a computer program product for, when run on a terminal device, causing the terminal device to perform the steps of any of the video feature extraction methods described above.
Compared with the prior art, the embodiment of the application has the beneficial effects that: the embodiment of the application acquires a video sequence to be processed; respectively extracts image features of each video frame in the video sequence to obtain first image features of each video frame; calculates a first video feature of the video sequence from the first image features of each video frame; respectively performs feature optimization on the first image features of each video frame according to the first video feature to obtain second image features of each video frame; and calculates a second video feature of the video sequence according to the second image features of each video frame. In the embodiment of the application, the initial image features (namely the first image features) of each video frame can be optimized using the initial video feature (namely the first video feature) of the video sequence, so that the influence of poor-quality video frames on the finally extracted video features is effectively weakened, and the robustness of video feature extraction is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of an embodiment of a video feature extraction method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of feature optimization of first image features of respective video frames based on first video features;
FIG. 3 is a schematic diagram illustrating a complete implementation of an embodiment of a video feature extraction method according to an embodiment of the present application;
FIG. 4 is a block diagram of an embodiment of a video feature extraction apparatus according to an embodiment of the application;
fig. 5 is a schematic block diagram of a terminal device in an embodiment of the present application.
Detailed Description
In order to make the objects, features and advantages of the present application more comprehensible, the technical solutions in the embodiments of the present application are described in detail below with reference to the accompanying drawings, and it is apparent that the embodiments described below are only some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [ a described condition or event ] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [ the described condition or event ]" or "in response to detecting [ the described condition or event ]".
In addition, in the description of the present application, the terms "first," "second," "third," etc. are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.
In the prior art, video features are typically calculated by averaging, i.e., each video frame contributes equally to the finally extracted video feature. If the video sequence contains poor-quality video frames, e.g., blurred or overexposed frames, the quality of the finally extracted video features suffers and the robustness is poor.
In view of this, embodiments of the present application provide a video feature extraction method, apparatus, computer readable storage medium and terminal device. In the embodiments of the present application, the initial image features of each video frame can be optimized using the initial video feature of the video sequence, so that the influence of poor-quality video frames on the finally extracted video features is effectively weakened, and the robustness of video feature extraction is improved.
It should be noted that the execution body of the method of the present application is a terminal device, which may specifically include, but is not limited to, a mobile phone, a tablet computer, a desktop computer, a notebook computer, a handheld computer, a robot, and other computing devices.
Referring to fig. 1, an embodiment of a video feature extraction method according to an embodiment of the present application may include:
step S101, a video sequence to be processed is acquired.
Wherein the video sequence is a sequence of at least two video frames. In the embodiment of the application, the video sequence can be acquired through a preset camera acquisition device. For example, a video may be photographed by a monitoring camera installed at a preset position and taken as a video sequence.
After the video sequence is acquired, it can be stored in a preset location so that it can be retrieved promptly when video feature extraction is required. For example, the video sequence may be stored in a memory module of the terminal device and obtained directly from that module when video feature extraction is required. As another example, to provide safe data backup, the terminal device may upload its stored data to a cloud server at regular intervals; the video sequence can then be stored in the cloud server, and when video feature extraction is required, a video sequence acquisition request can be sent to the cloud server and the video sequence obtained by parsing the response message returned by the cloud server.
In order to improve the efficiency of video feature extraction, frame extraction processing may be performed on the acquired video sequence, for example, one frame may be extracted from every N frames, so as to obtain a video sequence to be processed. Wherein N is an integer greater than 1.
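The frame-extraction step described above can be sketched in a few lines; the function name and sampling interval below are illustrative choices of this example, not taken from the patent:

```python
def sample_frames(frames, n):
    """Keep one frame out of every n (n > 1) to reduce the number of
    frames that must pass through feature extraction."""
    if n <= 1:
        return list(frames)
    return list(frames)[::n]

# A 10-frame sequence sampled every 3 frames keeps frames 0, 3, 6 and 9.
video = [f"frame_{i}" for i in range(10)]
print(sample_frames(video, 3))  # ['frame_0', 'frame_3', 'frame_6', 'frame_9']
```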
Step S102, extracting image features of each video frame in the video sequence respectively to obtain first image features of each video frame.
After the video sequence is acquired, the video sequence may be split to obtain each video frame in the video sequence. For each video frame, image feature extraction may be performed, and image feature information in the video frame may be extracted and recorded as a first image feature.
In specific applications, any image feature extraction method in the prior art may be selected according to actual situations, which is not specifically limited in the embodiments of the present application.
In a specific implementation manner of the embodiment of the present application, image feature extraction may be performed on each video frame {I_1, I_2, I_3, ..., I_T} in the video sequence through a preset image feature extraction network, to obtain the corresponding first image features {F_1, F_2, F_3, ..., F_T}. Wherein the image feature extraction network is a neural network for extracting image features, I_t is the t-th video frame in the video sequence, F_t is the first image feature of the t-th video frame, t is the video frame sequence number in the video sequence, 1 ≤ t ≤ T, and T is the total number of video frames in the video sequence.
It should be noted that the selection of the image feature extraction network in the embodiment of the present application is not particularly limited and may be set according to actual needs. For example, the image feature extraction network may be any prior-art neural network for feature extraction, such as a convolutional neural network (Convolutional Neural Network, CNN) or a recurrent neural network (Recurrent Neural Network, RNN). Before use, the selected neural network can be trained in advance to obtain the image feature extraction network in the embodiment of the application, which can then be used directly in subsequent image feature extraction.
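As a rough illustration of this step, the sketch below uses a trivial grid-pooling function as a stand-in for the trained image feature extraction network; a real system would substitute a pretrained CNN here, and the 4×4 grid and 16-dimensional output are arbitrary choices of this example:

```python
import numpy as np

def extract_image_feature(frame):
    """Toy stand-in for the image feature extraction network: pool a
    grayscale frame into a 4x4 grid of mean intensities, giving a
    16-dimensional feature vector. A real system would run the frame
    through a pretrained CNN instead."""
    frame = np.asarray(frame, dtype=np.float32)
    h, w = frame.shape[:2]
    # Crop to a multiple of 4 in each dimension, then block-average.
    grid = frame[: h // 4 * 4, : w // 4 * 4].reshape(4, h // 4, 4, w // 4, -1)
    return grid.mean(axis=(1, 3)).reshape(-1)

# T = 5 grayscale video frames I_1 ... I_5
frames = [np.random.rand(32, 32) for _ in range(5)]
features = [extract_image_feature(f) for f in frames]  # first image features F_t
print(features[0].shape)  # (16,)
```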
Step S103, calculating a first video feature of the video sequence according to the first image feature of each video frame.
In a specific implementation manner of the embodiment of the present application, the first image features of each video frame may be averaged to obtain an initial video feature (denoted as the first video feature), as shown in the following formula:

F = (1/T) · Σ_{t=1}^{T} F_t

wherein F is the first video feature.
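Assuming the first image features are stacked as a (T, D) matrix, this averaging is a single NumPy mean over the frame axis (a minimal sketch; the array shapes are illustrative):

```python
import numpy as np

# First image features of T = 4 video frames, stacked as a (T, D) matrix.
T, D = 4, 8
frame_features = np.random.rand(T, D)  # rows are F_1 ... F_T

# F = (1/T) * sum_t F_t : averaging over the frame axis.
first_video_feature = frame_features.mean(axis=0)
print(first_video_feature.shape)  # (8,)
```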
Step S104, respectively performing feature optimization on the first image features of each video frame according to the first video features to obtain second image features of each video frame.
For ease of understanding, the process of computing the second image feature will be described in detail herein with reference to any one of the video frames in the video sequence (which will be referred to as the target video frame). As shown in fig. 2, the calculation process of the second image feature of the target video frame may specifically include:
step S1041, calculating a feature similarity between the first video feature and the first image feature of the target video frame.
In a specific implementation manner of the embodiment of the present application, the vector inner product between the first video feature and the first image feature of the target video frame may be calculated, together with the product of a first vector modulus and a second vector modulus, where the first vector modulus is the modulus of the vector of the first video feature and the second vector modulus is the modulus of the vector of the first image feature of the target video frame. The ratio of the inner product to this product may then be taken as the feature similarity between the first video feature and the first image feature of the target video frame, as shown in the following formula:

s_t = (F · F_t) / (|F| · |F_t|)

wherein |·| is the modulus of a vector, and s_t is the feature similarity between the first video feature and the first image feature of the t-th video frame.
It should be noted that the above similarity calculation process is only an example, and in practical application, any similarity calculation mode in the prior art may be selected according to practical situations, which is not particularly limited in the embodiment of the present application.
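The cosine-similarity computation of step S1041 can be sketched as follows; the small epsilon term is an addition of this example to avoid division by zero, not part of the formula in the text:

```python
import numpy as np

def cosine_similarity(f, f_t, eps=1e-12):
    """s_t = (F . F_t) / (|F| * |F_t|); eps guards against zero-length vectors."""
    return float(np.dot(f, f_t) / (np.linalg.norm(f) * np.linalg.norm(f_t) + eps))

f = np.array([1.0, 0.0])
print(cosine_similarity(f, np.array([2.0, 0.0])))  # ~1.0: parallel vectors
print(cosine_similarity(f, np.array([0.0, 3.0])))  # ~0.0: orthogonal vectors
```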
Step S1042, calculating a second image feature of the target video frame according to the feature similarity, the first image feature of the target video frame and the first video feature.
In a specific implementation manner of the embodiment of the present application, the first weight and the second weight may be first determined respectively according to the feature similarity.
The first weight is a weight corresponding to a first image feature of the target video frame, and the second weight is a weight corresponding to the first video feature. The first weight is positively correlated with the feature similarity, i.e. the larger the feature similarity is, the larger the first weight is, whereas the smaller the feature similarity is, the smaller the first weight is; the second weight is inversely related to the feature similarity, i.e. the larger the feature similarity is, the smaller the second weight is, whereas the smaller the feature similarity is, the larger the second weight is. For example, the feature similarity may be determined as a first weight, and a difference between a preset weight sum and the feature similarity may be determined as a second weight, as shown in the following formula:
w_t = s_t

w_t′ = w_sum − s_t

wherein w_sum is the preset weight sum, whose specific value can be set according to the actual situation and is not particularly limited in the embodiment of the application (it is preferably set to 1), w_t is the first weight, and w_t′ is the second weight.
After the first weight and the second weight are determined, feature fusion can be performed on the first image feature and the first video feature of the target video frame according to the first weight and the second weight, so that the second image feature of the target video frame is obtained.
Specifically, a first weighted feature may be calculated based on the first weight and a first image feature of the target video frame, a second weighted feature may be calculated based on the second weight and the first video feature, and a second image feature of the target video frame may be calculated based on the first weighted feature and the second weighted feature, as shown in the following formula:
h_t = w_t · F_t + w_t′ · F

or, when w_sum = 1: h_t = s_t · F_t + (1 − s_t) · F

wherein w_t F_t is the first weighted feature, w_t′ F is the second weighted feature, and h_t is the second image feature of the t-th video frame.
And traversing each video frame in the video sequence according to the process, so as to obtain the second image characteristic of each video frame.
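The weight determination and feature fusion of step S1042 reduce to one line per frame; the function name below is illustrative, and w_sum defaults to the preferred value of 1:

```python
import numpy as np

def optimize_frame_feature(f_t, f_video, s_t, w_sum=1.0):
    """h_t = w_t * F_t + w_t' * F, with w_t = s_t and w_t' = w_sum - s_t."""
    return s_t * f_t + (w_sum - s_t) * f_video

f_video = np.array([0.0, 0.0])   # first video feature F
f_t = np.array([4.0, 4.0])       # first image feature of a poor-quality frame
# Low similarity (s_t = 0.25) pulls the frame feature strongly toward F.
print(optimize_frame_feature(f_t, f_video, s_t=0.25))  # [1. 1.]
```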
Step S105, calculating a second video feature of the video sequence according to the second image feature of each video frame.
In a specific implementation manner of the embodiment of the present application, the second image features of each video frame may be averaged to obtain the final video feature (denoted as the second video feature), as shown in the following formula:

h = (1/T) · Σ_{t=1}^{T} h_t

wherein h is the second video feature obtained after feature optimization.
Fig. 3 is a schematic diagram illustrating a complete implementation of an embodiment of a video feature extraction method according to an embodiment of the present application. As shown in the figure, the embodiment of the application firstly acquires a video sequence to be processed, and respectively performs image feature extraction on each video frame in the video sequence to obtain a first image feature of each video frame; then carrying out average processing on the first image features of each video frame to obtain first video features, and respectively calculating feature similarity between the first video features and the first image features of each video frame; then determining weights according to the feature similarity, and carrying out weighted average on the first video features and the first image features according to the weights to obtain second image features of each video frame; and finally, carrying out average processing on the second image features of each video frame to obtain the final video features of the video sequence.
After the final video feature of the video sequence is obtained, visual tasks such as video-based face detection and recognition, video-based vehicle recognition, video-based pedestrian detection and recognition and the like can be performed based on the video feature.
In summary, the embodiment of the application acquires the video sequence to be processed; respectively extracts image features of each video frame in the video sequence to obtain first image features of each video frame; calculates a first video feature of the video sequence from the first image features of each video frame; respectively performs feature optimization on the first image features of each video frame according to the first video feature to obtain second image features of each video frame; and calculates a second video feature of the video sequence from the second image features of each video frame. In the embodiment of the application, the first image features of each video frame can be optimized using the first video feature of the video sequence, so that the influence of poor-quality video frames on the finally extracted video features is effectively weakened, and the robustness of video feature extraction is improved.
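Putting the steps together, the following is a minimal NumPy sketch of the whole pipeline operating on precomputed first image features; the function name and the toy data are assumptions of this example:

```python
import numpy as np

def extract_video_feature(frame_features, w_sum=1.0, eps=1e-12):
    """Sketch of the described method on precomputed first image features.

    frame_features: (T, D) array holding F_1 ... F_T.
    Returns the second (optimized) video feature h.
    """
    f = frame_features.mean(axis=0)                       # first video feature F
    norms = np.linalg.norm(frame_features, axis=1) * np.linalg.norm(f) + eps
    s = frame_features @ f / norms                        # feature similarities s_t
    h_t = s[:, None] * frame_features + (w_sum - s)[:, None] * f  # fused features
    return h_t.mean(axis=0)                               # second video feature h

# Three consistent frame features plus one outlier standing in for a
# blurred or overexposed frame.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [1.1, -0.1], [0.0, 1.0]])
plain_mean = feats.mean(axis=0)
optimized = extract_video_feature(feats)
print(plain_mean, optimized)  # the optimized feature leans toward the good frames
```

Compared with the plain average, the outlier's low similarity down-weights its contribution, which is the robustness effect the text describes.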
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Fig. 4 shows a block diagram of an embodiment of a video feature extraction apparatus according to the embodiment of the present application, corresponding to a video feature extraction method described in the foregoing embodiments.
In this embodiment, a video feature extraction apparatus may include:
a video sequence acquisition module 401, configured to acquire a video sequence to be processed;
the feature extraction module 402 is configured to extract image features of each video frame in the video sequence, so as to obtain first image features of each video frame;
a first video feature calculation module 403, configured to calculate a first video feature of the video sequence according to a first image feature of each video frame;
the feature optimization module 404 is configured to perform feature optimization on the first image features of each video frame according to the first video features, so as to obtain second image features of each video frame;
a second video feature calculation module 405, configured to calculate a second video feature of the video sequence according to the second image features of the video frames.
In a specific implementation manner of the embodiment of the present application, the feature optimization module may include:
the similarity calculation sub-module is used for calculating the feature similarity between the first video feature and the first image feature of the target video frame; wherein the target video frame is any video frame in the video sequence;
and the feature calculation sub-module is used for calculating the second image feature of the target video frame according to the feature similarity, the first image feature of the target video frame and the first video feature.
In a specific implementation manner of the embodiment of the present application, the feature calculation sub-module may include:
the weight determining unit is used for determining a first weight and a second weight according to the feature similarity; the first weight is a weight corresponding to a first image feature of the target video frame, the second weight is a weight corresponding to the first video feature, the first weight is positively correlated with the feature similarity, and the second weight is negatively correlated with the feature similarity;
and the feature fusion unit is used for carrying out feature fusion on the first image feature of the target video frame and the first video feature according to the first weight and the second weight to obtain a second image feature of the target video frame.
In a specific implementation manner of the embodiment of the present application, the weight determining unit may be specifically configured to: determining the feature similarity as the first weight; and determining the difference value between the preset weight sum and the feature similarity as the second weight.
In a specific implementation manner of the embodiment of the present application, the feature fusion unit may be specifically configured to: calculate a first weighted feature according to the first weight and the first image feature of the target video frame; calculate a second weighted feature according to the second weight and the first video feature; and calculate the second image feature of the target video frame according to the first weighted feature and the second weighted feature.
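For a single target video frame, the weight determining unit and the feature fusion unit described above reduce to a few lines. As a sketch only: cosine similarity is assumed as the feature similarity measure, and `weight_sum` stands for the preset weight sum, whose value the description does not fix.

```python
import numpy as np

def optimize_frame_feature(frame_feat: np.ndarray,
                           video_feat: np.ndarray,
                           weight_sum: float = 1.0) -> np.ndarray:
    """Compute the second image feature of one target video frame.

    Assumes cosine similarity as the feature similarity; `weight_sum`
    plays the role of the preset weight sum.
    """
    # Feature similarity between the first video feature and the frame's
    # first image feature (cosine similarity, assumed)
    sim = float(frame_feat @ video_feat /
                (np.linalg.norm(frame_feat) * np.linalg.norm(video_feat) + 1e-12))
    # First weight: the similarity itself; second weight: the difference
    # between the preset weight sum and the similarity
    w1, w2 = sim, weight_sum - sim
    # First and second weighted features, fused into the second image feature
    return w1 * frame_feat + w2 * video_feat
```

When the frame feature already matches the video feature, the first weight dominates and the frame feature passes through almost unchanged; when the two are dissimilar, the fused result leans on the video feature instead.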
In a specific implementation manner of the embodiment of the present application, the first video feature calculation module may be specifically configured to: and carrying out average processing on the first image features of each video frame to obtain the first video features.
In a specific implementation manner of the embodiment of the present application, the second video feature calculation module may be specifically configured to: and carrying out average processing on the second image features of each video frame to obtain the second video features.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described apparatus, modules and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for a part that is not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Fig. 5 shows a schematic block diagram of a terminal device according to an embodiment of the present application, and for convenience of explanation, only a portion related to the embodiment of the present application is shown.
As shown in Fig. 5, the terminal device 5 of this embodiment includes: a processor 50, a memory 51, and a computer program 52 stored in the memory 51 and executable on the processor 50. When the processor 50 executes the computer program 52, the steps of the video feature extraction method embodiments described above are implemented, such as steps S101 to S105 shown in Fig. 1. Alternatively, when executing the computer program 52, the processor 50 may implement the functions of the modules/units of the apparatus embodiments described above, such as the functions of the modules 401 to 405 shown in Fig. 4.
By way of example, the computer program 52 may be partitioned into one or more modules/units that are stored in the memory 51 and executed by the processor 50 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions for describing the execution of the computer program 52 in the terminal device 5.
The terminal device 5 may be a mobile phone, a tablet computer, a desktop computer, a notebook computer, a palm computer, a robot, or another computing device. It will be appreciated by those skilled in the art that Fig. 5 is merely an example of the terminal device 5 and does not constitute a limitation of the terminal device 5; it may include more or fewer components than illustrated, combine certain components, or use different components. For example, the terminal device 5 may further include an input-output device, a network access device, a bus, and the like.
The processor 50 may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSPs), application specific integrated circuits (Application Specific Integrated Circuit, ASICs), field programmable gate arrays (Field-Programmable Gate Array, FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 51 may be an internal storage unit of the terminal device 5, such as a hard disk or a memory of the terminal device 5. The memory 51 may also be an external storage device of the terminal device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card (Flash Card) equipped on the terminal device 5. Further, the memory 51 may include both an internal storage unit and an external storage device of the terminal device 5. The memory 51 is used for storing the computer program as well as other programs and data required by the terminal device 5, and may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for a part that is not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program, which may be stored in a computer readable storage medium; when executed by a processor, the computer program implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable storage medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer readable storage medium does not include electrical carrier signals and telecommunications signals.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A method for extracting video features, comprising:
acquiring a video sequence to be processed;
respectively extracting image features of each video frame in the video sequence to obtain first image features of each video frame;
calculating a first video feature of the video sequence from the first image feature of each video frame;
respectively carrying out feature optimization on the first image features of each video frame according to the first video features to obtain second image features of each video frame;
and calculating a second video characteristic of the video sequence according to the second image characteristic of each video frame.
2. The method according to claim 1, wherein the performing feature optimization on the first image features of the respective video frames according to the first video features to obtain the second image features of the respective video frames includes:
calculating the feature similarity between the first video feature and the first image feature of the target video frame; wherein the target video frame is any video frame in the video sequence;
and calculating a second image characteristic of the target video frame according to the characteristic similarity, the first image characteristic of the target video frame and the first video characteristic.
3. The method according to claim 2, wherein the calculating the second image feature of the target video frame based on the feature similarity, the first image feature of the target video frame, and the first video feature comprises:
respectively determining a first weight and a second weight according to the feature similarity; the first weight is a weight corresponding to a first image feature of the target video frame, the second weight is a weight corresponding to the first video feature, the first weight is positively correlated with the feature similarity, and the second weight is negatively correlated with the feature similarity;
and carrying out feature fusion on the first image feature of the target video frame and the first video feature according to the first weight and the second weight to obtain a second image feature of the target video frame.
4. The video feature extraction method according to claim 3, wherein said determining a first weight and a second weight, respectively, from said feature similarity comprises:
determining the feature similarity as the first weight;
and determining the difference value between the preset weight sum and the feature similarity as the second weight.
5. The method for extracting video features according to claim 3, wherein the performing feature fusion on the first image feature of the target video frame and the first video feature according to the first weight and the second weight to obtain the second image feature of the target video frame includes:
calculating a first weighting feature according to the first weight and a first image feature of the target video frame;
calculating a second weighted feature according to the second weight and the first video feature;
and calculating a second image characteristic of the target video frame according to the first weighted characteristic and the second weighted characteristic.
6. The video feature extraction method according to any one of claims 1 to 5, characterized in that said calculating a first video feature of the video sequence from a first image feature of the respective video frame comprises:
and carrying out average processing on the first image features of each video frame to obtain the first video features.
7. The video feature extraction method according to any one of claims 1 to 5, characterized in that said calculating a second video feature of the video sequence from a second image feature of the respective video frame comprises:
and carrying out average processing on the second image features of each video frame to obtain the second video features.
8. A video feature extraction apparatus, comprising:
the video sequence acquisition module is used for acquiring a video sequence to be processed;
the feature extraction module is used for extracting image features of each video frame in the video sequence respectively to obtain first image features of each video frame;
a first video feature calculation module for calculating a first video feature of the video sequence from a first image feature of each video frame;
the feature optimization module is used for respectively performing feature optimization on the first image features of each video frame according to the first video features to obtain second image features of each video frame;
and the second video feature calculation module is used for calculating second video features of the video sequence according to the second image features of the video frames.
9. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the video feature extraction method according to any one of claims 1 to 7.
10. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the video feature extraction method according to any one of claims 1 to 7 when the computer program is executed.
CN202310917755.7A 2023-07-24 2023-07-24 Video feature extraction method and device, readable storage medium and terminal equipment Pending CN116958870A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310917755.7A CN116958870A (en) 2023-07-24 2023-07-24 Video feature extraction method and device, readable storage medium and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310917755.7A CN116958870A (en) 2023-07-24 2023-07-24 Video feature extraction method and device, readable storage medium and terminal equipment

Publications (1)

Publication Number Publication Date
CN116958870A true CN116958870A (en) 2023-10-27

Family

ID=88447158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310917755.7A Pending CN116958870A (en) 2023-07-24 2023-07-24 Video feature extraction method and device, readable storage medium and terminal equipment

Country Status (1)

Country Link
CN (1) CN116958870A (en)

Similar Documents

Publication Publication Date Title
CN109840477B (en) Method and device for recognizing shielded face based on feature transformation
CN111860398B (en) Remote sensing image target detection method and system and terminal equipment
CN108337505B (en) Information acquisition method and device
CN110853033A (en) Video detection method and device based on inter-frame similarity
CN110941978B (en) Face clustering method and device for unidentified personnel and storage medium
CN112348778B (en) Object identification method, device, terminal equipment and storage medium
CN107908998B (en) Two-dimensional code decoding method and device, terminal equipment and computer readable storage medium
CN111368587B (en) Scene detection method, device, terminal equipment and computer readable storage medium
CN109308704B (en) Background eliminating method, device, computer equipment and storage medium
CN113869137A (en) Event detection method and device, terminal equipment and storage medium
CN113158773B (en) Training method and training device for living body detection model
CN114973057A (en) Video image detection method based on artificial intelligence and related equipment
CN113658065B (en) Image noise reduction method and device, computer readable medium and electronic equipment
CN112330618B (en) Image offset detection method, device and storage medium
CN116524206B (en) Target image identification method and device
CN112418089A (en) Gesture recognition method and device and terminal
CN111222446B (en) Face recognition method, face recognition device and mobile terminal
CN110633630B (en) Behavior identification method and device and terminal equipment
CN113239738B (en) Image blurring detection method and blurring detection device
CN116958870A (en) Video feature extraction method and device, readable storage medium and terminal equipment
CN112183359B (en) Method, device and equipment for detecting violent content in video
CN114998283A (en) Lens blocking object detection method and device
CN114998172A (en) Image processing method and related system
CN110287943B (en) Image object recognition method and device, electronic equipment and storage medium
CN113628192A (en) Image blur detection method, device, apparatus, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination