CN113516030A - Action sequence verification method and device, storage medium and terminal - Google Patents

Action sequence verification method and device, storage medium and terminal

Info

Publication number
CN113516030A
Authority
CN
China
Prior art keywords
sequence
action
action sequence
verified
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110469750.3A
Other languages
Chinese (zh)
Other versions
CN113516030B (en)
Inventor
高盛华
钱一成
罗伟鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ShanghaiTech University
Original Assignee
ShanghaiTech University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ShanghaiTech University filed Critical ShanghaiTech University
Priority to CN202110469750.3A
Publication of CN113516030A
Application granted
Publication of CN113516030B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an action sequence verification method and device, a storage medium, and a terminal. The method comprises the following steps: acquiring an action sequence to be verified; performing feature extraction on the action sequence to be verified to obtain a corresponding feature sequence; performing information fusion on the feature sequence to obtain an overall sequence feature, from which the action category of the action sequence to be verified is judged; and comparing the feature sequence of the action sequence to be verified with the feature sequence of a standard action sequence to judge whether the two belong to the same action sequence. The action sequence is verified by constructing a new neural network model that constrains both the overall features and the temporally ordered feature sequence, yielding high verification accuracy. The method has a wide range of applications, such as recognizing whether the people in two video segments perform the same action, checking compliance with standardized procedures in factories and workshops, and scoring actions in sports and entertainment.

Description

Action sequence verification method and device, storage medium and terminal
Technical Field
The invention relates to the field of computer vision, in particular to an action sequence verification method, an action sequence verification device, a storage medium and a terminal.
Background
With the rapid development of information network technology, video has become a primary means of acquiring information, permeating fields such as production, security, transportation, and entertainment. How to effectively exploit video information for action recognition has therefore become a research hotspot. Within this area, the accurate verification of action sequences remains largely unexplored, e.g. judging whether the action sequences in two videos are the same, or whether work in a factory workshop follows the prescribed standard.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, it is an object of the present invention to provide an action sequence verification method, device, storage medium and terminal to solve the problems in the prior art.
To achieve the above and other related objects, a first aspect of the present invention provides an action sequence verification method, including: acquiring an action sequence to be verified; performing feature extraction on the action sequence to be verified to obtain a corresponding feature sequence; performing information fusion on the feature sequence to obtain an overall sequence feature so as to judge the action category of the action sequence to be verified; and comparing the feature sequence of the action sequence to be verified with the feature sequence of a standard action sequence to judge whether they belong to the same action sequence.
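Purely as an illustration, the four steps above can be sketched as the following PyTorch pipeline. The module names (extractor, fusion), the cosine-similarity comparison, and the 0.5 threshold are assumptions of this sketch, not details fixed by the invention; concrete sketches of the individual modules appear in the embodiments below.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def verify(frames_query, frames_standard, extractor, fusion, sim_threshold=0.5):
        # Step 1 happened upstream: frames_* are (T, 3, H, W) sampled video frames.
        # Step 2: per-frame feature extraction -> feature sequences of shape (T, D).
        seq_q = extractor(frames_query)
        seq_s = extractor(frames_standard)
        # Step 3: temporal information fusion -> class logits from the overall sequence feature.
        logits_q, _ = fusion(seq_q.unsqueeze(0))
        logits_s, _ = fusion(seq_s.unsqueeze(0))
        same_category = logits_q.argmax(-1).item() == logits_s.argmax(-1).item()
        # Step 4: timestep-by-timestep comparison against the standard sequence.
        sim = F.normalize(seq_q, dim=-1) @ F.normalize(seq_s, dim=-1).t()  # (T, T)
        aligned = sim.diagonal().mean().item() > sim_threshold
        return same_category and aligned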
In some embodiments of the first aspect of the present invention, the feature extraction is performed as follows: features of the action sequence to be verified are extracted based on a BN-Inception network or a ResNet-50 network, and the final fully connected layer is modified to output a feature sequence of a preset dimension.
In some embodiments of the first aspect of the present invention, the overall sequence feature is obtained by performing temporal information fusion on the feature sequence of the action sequence to be verified based on a Vision Transformer model.
In some embodiments of the first aspect of the present invention, the feature comparison comprises: comparing the feature sequence of the action sequence to be verified with the feature sequence of the standard action sequence one timestep at a time, in temporal order, to obtain a feature comparison result; and constructing a loss function to supervise and constrain the feature comparison result.
In some embodiments of the first aspect of the present invention, the loss function is a first loss function, and the method further comprises: acquiring the overall sequence feature of the action sequence to be verified based on a Vision Transformer model; judging the action category of the action sequence to be verified based on the overall sequence feature; and constructing a second loss function to supervise and constrain the action-category judgment.
In some embodiments of the first aspect of the present invention, the action sequence verification method comprises: performing a weighted calculation based on the first loss function and the second loss function to obtain a verification result for the action sequence to be verified.
In some embodiments of the first aspect of the present invention, the result of the feature extraction is a plurality of feature maps, and the method comprises: dividing all feature maps into a number of fixed-size patches to obtain a feature patch sequence; inputting the feature patch sequence into a Vision Transformer; for each patch, the Vision Transformer weights and fuses the features of the remaining patches through a self-attention module; a special token is also kept, and the feature at the token position represents the category information of the whole action sequence to be verified.
To achieve the above and other related objects, a second aspect of the present invention provides an action sequence verification apparatus, including: an action sequence acquisition module for acquiring an action sequence to be verified; a feature extraction module for performing feature extraction on the action sequence to be verified to obtain a corresponding feature sequence; a feature fusion module for performing information fusion on the feature sequence to obtain an overall sequence feature so as to judge the action category of the action sequence to be verified; and a feature comparison module for comparing the feature sequence of the action sequence to be verified with the feature sequence of a standard action sequence to judge whether they belong to the same action sequence.
To achieve the above and other related objects, a third aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the action sequence verification method.
To achieve the above and other related objects, a fourth aspect of the present invention provides an electronic terminal, comprising: a processor and a memory; the memory is used for storing computer programs, and the processor is used for executing the computer programs stored by the memory so as to enable the terminal to execute the action sequence verification method.
As described above, the present invention provides an action sequence verification method and device, a storage medium, and a terminal. A feature sequence is obtained by extracting features from the action sequence, and finer-grained features of the action sequence to be verified are obtained by temporally matching and fusing that sequence, enabling accurate verification. The method can verify whether the action sequences in two videos are consistent, which better matches real-life scenarios where an action is completed through several consecutive atomic actions; it can also verify action sequences that were never predefined, widening its applicability. In addition, because verification is performed by a purpose-built neural network model, both the action information of each atomic action and the overall features of the sequence are extracted, and the overall features, the per-atomic-action features, and the temporal order of the feature sequence are constrained simultaneously, greatly improving the verification success rate. The invention has broad application: it can accurately verify whether the people in two video segments complete the same action sequence; it can check standardized procedures in production settings such as factories and workshops, helping improve product quality; and it can score the actions of athletes or actions in human-computer interaction games in the sports and entertainment fields.
Drawings
Fig. 1 is a flowchart illustrating an action sequence verification method according to an embodiment of the invention.
FIG. 2 is a diagram illustrating three groups of action sequences obtained by sampling frames from three videos according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of an action sequence verification neural network model and a working process thereof according to an embodiment of the invention.
Fig. 4 is a schematic structural diagram of an action sequence verification apparatus according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of an electronic terminal according to an embodiment of the invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It is noted that in the following description, reference is made to the accompanying drawings which illustrate several embodiments of the present invention. It is to be understood that other embodiments may be utilized and that changes may be made without departing from the spirit and scope of the present invention. The following detailed description is not to be taken in a limiting sense, and the scope of embodiments of the present invention is defined only by the claims of the issued patent. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
In the present invention, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises", "comprising" and/or "includes", when used in this specification, specify the presence of stated features, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, operations, elements, components, items, species, and/or groups thereof. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A, B and C". An exception to this definition occurs only when a combination of elements, functions, or operations is inherently mutually exclusive in some way.
The invention provides an action sequence verification method, an action sequence verification device, a storage medium and a terminal, and aims to solve the problems in the prior art.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention are further described in detail by the following embodiments in conjunction with the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example one
As shown in fig. 1, the present embodiment provides a flow chart of an action sequence verification method, which includes steps S11 to S14, and can be specifically described as follows:
S11, acquiring the action sequence to be verified. An action sequence consists of a number of atomic actions with an ordering relation; because a single atomic action can accomplish very little on its own, most everyday actions appear in the form of action sequences. Different action sequences may differ in the number of atomic actions they contain, in one or more of the corresponding atomic actions, or in both.
For example, fig. 2 shows three groups of action sequences obtained by sampling frames from three videos, where each video picture represents an atomic action. The three groups are named, in order, seq1, seq2, and seq3. An action sequence is composed of several atomic actions with an ordering relation (i.e. the corresponding video pictures in fig. 2). seq1 and seq2 contain the same atomic actions in the same order, so they belong to the same action sequence; seq2 and seq3 differ in both the order and the number of their atomic actions, so they do not belong to the same action sequence.
Generally, the action sequence to be verified comes from a video and can be obtained by sampling consecutive video frames according to a preset rule. Preferably, the acquired action sequence is preprocessed, e.g. the pictures are smoothed, restored, enhanced, median-filtered, edge-detected, and so on; this removes irrelevant information, restores useful information, enhances the detectability of relevant information, and simplifies the data as much as possible.
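As a concrete illustration of S11, the sketch below assumes the "preset rule" is uniform temporal sampling and uses OpenCV both for decoding and for one of the preprocessing steps named above (median filtering); both choices are ours, not prescribed by the embodiment.

    import cv2
    import numpy as np

    def sample_action_sequence(video_path, num_frames=16):
        cap = cv2.VideoCapture(video_path)
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        # pick num_frames indices spread evenly over the whole clip
        indices = np.linspace(0, total - 1, num_frames).astype(int)
        frames = []
        for idx in indices:
            cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
            ok, frame = cap.read()
            if ok:
                frame = cv2.medianBlur(frame, 3)  # optional preprocessing step
                frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        cap.release()
        return np.stack(frames)  # (T, H, W, 3) action sequence to be verified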
S12, performing feature extraction on the action sequence to be verified to obtain a corresponding feature sequence. Specifically, a convolutional neural network performs feature extraction on the action sequence to be verified, yielding a feature sequence ordered in time. This feature sequence is the representation, in feature space, of the motion information contained in the whole action sequence.
In a preferred version of this embodiment, the feature extraction proceeds as follows: a feature extraction module (Backbone) is built on a BN-Inception network or a ResNet-50 network to extract features of the action sequence to be verified, and the final fully connected layer (fc layer) is modified to output a feature sequence of a preset dimension.
Specifically, the BN-Inception and ResNet-50 networks comprise multiple layers that extract features from the input pictures hierarchically; as the layers deepen, the extracted features become higher-level and more global. Both networks are also robust feature extractors.
Further, the feature extraction module ultimately has to solve a 45-class classification problem, so the K-dimensional features must be mapped to 45 dimensions through the fc layer; the preset output dimension of the fc layer must therefore be larger than 45. If K is too large it would exceed the output dimension of the preceding backbone, so the preset output dimension of the fc layer must be smaller than 512. Preferably, the output dimension of the feature sequence is 128 or 256, which balances recognition efficiency against recognition accuracy.
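A minimal sketch of such a modified backbone, assuming torchvision's ResNet-50 and a 128-dimensional output (one of the preferred dimensions above); a BN-Inception backbone would be wired up analogously.

    import torch
    import torch.nn as nn
    from torchvision.models import resnet50

    class FrameFeatureExtractor(nn.Module):
        def __init__(self, feat_dim=128):
            super().__init__()
            self.backbone = resnet50(weights=None)
            # replace the original 1000-way fc with a feat_dim-dimensional head
            self.backbone.fc = nn.Linear(self.backbone.fc.in_features, feat_dim)

        def forward(self, frames):          # frames: (T, 3, H, W)
            return self.backbone(frames)    # (T, feat_dim) time-ordered feature sequence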
S13, performing information fusion on the feature sequence to obtain the overall sequence feature, so as to judge the action category of the action sequence to be verified. Specifically, temporal information fusion is performed on the feature sequence of the action sequence to be verified based on a Vision Transformer model to obtain the overall sequence feature.
Specifically, the Vision Transformer model works as follows: the video frame sequence (i.e. the action sequence to be verified) is fed into a backbone (e.g. ResNet-50) for feature extraction, producing a series of feature maps; all feature maps are divided into fixed-size patches, and the resulting feature patch sequence is input to the Vision Transformer; through its self-attention module, the Vision Transformer weights and fuses, for each patch, the features of all remaining patches (i.e. the input feature patch sequence is enhanced); meanwhile a special token is kept, and the feature at the token position represents the category information of the whole video frame sequence. The Vision Transformer model carries little inductive bias, offers strong temporal modeling capability at a modest computational cost, and is therefore particularly suitable for extracting and fusing features of temporally ordered action sequences.
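The temporal-fusion step can be sketched as follows, assuming a learnable class token, additive position embeddings, and a standard PyTorch Transformer encoder; all hyper-parameters are illustrative, and the patch-splitting of feature maps is abstracted into a per-frame feature sequence for brevity.

    import torch
    import torch.nn as nn

    class SequenceFusion(nn.Module):
        def __init__(self, feat_dim=128, num_classes=45, num_layers=4,
                     num_heads=4, max_len=64):
            super().__init__()
            self.cls_token = nn.Parameter(torch.zeros(1, 1, feat_dim))
            self.pos_embed = nn.Parameter(torch.zeros(1, max_len + 1, feat_dim))
            layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=num_heads,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
            self.classifier = nn.Linear(feat_dim, num_classes)  # 45-way head

        def forward(self, feats):                # feats: (B, T, feat_dim)
            b, t, _ = feats.shape
            cls = self.cls_token.expand(b, -1, -1)
            # prepend the special token, add position embeddings, then self-attend
            x = torch.cat([cls, feats], dim=1) + self.pos_embed[:, :t + 1]
            x = self.encoder(x)
            overall = x[:, 0]                    # token position = whole-sequence feature
            return self.classifier(overall), x[:, 1:]  # class logits, enhanced sequence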
Further, the output of the Vision Transformer model is kept consistent in feature dimension with the feature extraction module (Backbone), i.e. the dimension of the last fc layer of the Vision Transformer model is set equal to the dimension of the Backbone output features, so that the effectiveness of the Vision Transformer model can be evaluated.
S14, comparing the feature sequence of the action sequence to be verified with the feature sequence of the standard action sequence to judge whether they belong to the same action sequence. Specifically: the two feature sequences are compared one timestep at a time, in temporal order, to obtain a feature comparison result; a first loss function is constructed to supervise and constrain this result; the overall sequence feature of the action sequence to be verified is acquired with the Vision Transformer model; the action category is judged from the overall sequence feature; and a second loss function is constructed to supervise and constrain the category judgment.
In a preferred version of this embodiment, a weighted combination of the first and second loss functions balances the influence of the two losses on training the overall model, yielding the verification result for the action sequence to be verified. In some examples, the weight ratio of the first loss function to the second loss function may be set to 10, 2, 1, etc.; a ratio of 10 gives the most accurate verification results.
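A minimal sketch of this weighted supervision; assigning the 10x weight to the first (sequence alignment) loss is our reading of the 10:1 setting above, not an explicit statement of the embodiment.

    import torch.nn.functional as F

    def total_loss(alignment_loss, logits, labels, w_first=10.0, w_second=1.0):
        classification_loss = F.cross_entropy(logits, labels)  # second loss function
        # first (alignment) loss weighted 10x relative to the classification loss
        return w_first * alignment_loss + w_second * classification_loss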
In some embodiments, the method may run on a controller, such as an ARM (Advanced RISC Machines) controller, an FPGA (Field Programmable Gate Array) controller, an SoC (System on Chip) controller, a DSP (Digital Signal Processing) controller, or an MCU (Microcontroller Unit) controller. In some embodiments, the method is also applicable to computers comprising components such as memory, memory controllers, one or more processing units (CPUs), peripheral interfaces, RF circuits, audio circuits, speakers, microphones, input/output (I/O) subsystems, display screens, other output or control devices, and external ports; such computers include, but are not limited to, desktop computers, notebook computers, tablet computers, smartphones, smart televisions, personal digital assistants (PDAs), and the like. In other embodiments, the method may also run on a server, which may be deployed on one or more physical servers or may consist of a distributed or centralized server cluster, depending on factors such as function and load.
Example two
As shown in fig. 3, the present embodiment provides a schematic diagram of the action sequence verification neural network model and its workflow. Specifically, the model comprises: a feature extraction module (Intra-action module) built on a 2D Backbone, a feature fusion module (Inter-action module) built on a Vision Transformer, and a feature comparison module (Alignment module). Notably, the Vision Transformer-based feature fusion module not only allows parallel training but also captures global feature information of the action sequence, which helps guarantee verification accuracy. After the feature extraction and feature fusion modules have been trained, the action sequence to be verified and the standard action sequence are each fed through the network; once the two feature sequences are obtained, they are compared by the feature comparison module to judge whether they belong to the same action sequence.
Fig. 3 shows the workflow of the action sequence verification neural network model, taking as an example the verification of whether two groups of action sequences, input frames 1 and input frames 2, are the same action sequence. The process is as follows:
First, the action sequences input frames 1 and input frames 2 are fed into the 2D Backbone; the features corresponding to the atomic actions in each sequence are concatenated along a specified dimension with a concat operation, and the corresponding feature map sequence (Feature map sequence) is obtained with a reshape operation.
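For concreteness, the following toy snippet shows one way the concat and reshape operations could assemble per-atomic-action feature maps into a feature map sequence; all tensor shapes are assumptions.

    import torch

    # Suppose the 2D Backbone yields one (C, H, W) feature map per atomic action.
    t, c, h, w = 8, 256, 7, 7
    per_frame = [torch.randn(c, h, w) for _ in range(t)]
    stacked = torch.cat([f.unsqueeze(0) for f in per_frame], dim=0)  # concat -> (T, C, H, W)
    feature_map_sequence = stacked.reshape(t, c, h * w)              # reshape -> (T, C, H*W)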
Then, the feature sequences corresponding to input frames 1 and input frames 2 are fed into the feature fusion module. A linear projection layer converts the variable-length feature maps into fixed-length vectors, which are combined with the position encodings by element-wise addition; an extra learnable embedding is prepended, and its corresponding output after the Transformer encoder represents the whole action sequence, i.e. the overall sequence feature vector, from which the action category of the corresponding sequence is judged. The classification scores (class scores) are supervised by the second loss function, yielding a first classification loss value (Classification loss1) and a second classification loss value (Classification loss2).
Meanwhile, the feature sequences corresponding to input frames 1 and input frames 2 are fed into the feature comparison module (Alignment module), where the two feature sequences are compared and matched via a sequence similarity matrix (Sequence similarity matrix) against an identity matrix (identity matrix); the comparison result is supervised and constrained by the first loss function, yielding a sequence alignment loss value (Sequence Alignment loss).
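A minimal sketch of this alignment supervision: the similarity matrix between the two feature sequences is pushed toward the identity matrix, so that frame i of one sequence matches only frame i of the other. The symmetric cross-entropy formulation and the temperature value are our assumptions.

    import torch
    import torch.nn.functional as F

    def sequence_alignment_loss(feats_a, feats_b, temperature=0.1):
        # feats_a, feats_b: (T, D) feature sequences of the two action sequences
        a = F.normalize(feats_a, dim=-1)
        b = F.normalize(feats_b, dim=-1)
        sim = a @ b.t() / temperature                          # (T, T) similarity matrix
        target = torch.arange(sim.size(0), device=sim.device)  # identity-matrix supervision
        # symmetric cross-entropy over rows and columns of the similarity matrix
        return 0.5 * (F.cross_entropy(sim, target) + F.cross_entropy(sim.t(), target))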
Finally, the first classification loss value (Classification loss1), the second classification loss value (Classification loss2), and the sequence alignment loss value (Sequence Alignment loss) are combined with preset weights to produce the final action verification result, i.e. whether input frames 1 and input frames 2 are the same action sequence.
It is worth mentioning that the action sequence verification neural network model is trained on a newly proposed dataset. It differs from existing datasets in two respects: it focuses on action sequences rather than single atomic actions, and it contains multiple action sequences with atomic-action-level differences, which existing datasets lack.
In some examples, the training dataset was obtained as follows: 2000 videos covering 70 different action sequences were shot; videos with device or performance problems were removed, leaving 1938 videos. Every frame of the remaining videos was extracted, giving about 960,000 pictures in total; each video lasts 20.58 seconds on average and contains 495.85 frames. The 70 action sequences are divided into 14 groups of five: the first sequence in each group is the standard sequence, and the remaining four differ slightly from it and are defined as error sequences; two of them disturb the order of some atomic actions, and the other two delete some atomic actions from the standard sequence. The resulting dataset is used to train the action sequence verification neural network model provided by the invention; the trained model can verify an action sequence to be verified, and can also verify whether several groups of action sequences are the same action.
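To make the construction of the error sequences concrete, the sketch below derives the four variants from a standard sequence of atomic-action labels; the labels and the exact perturbation choices are hypothetical.

    import random

    def make_error_sequences(standard, seed=0):
        rng = random.Random(seed)
        errors = []
        for _ in range(2):                      # two order-disturbed variants
            seq = list(standard)
            i, j = rng.sample(range(len(seq)), 2)
            seq[i], seq[j] = seq[j], seq[i]     # swap two atomic actions
            errors.append(seq)
        for _ in range(2):                      # two deletion variants
            seq = list(standard)
            del seq[rng.randrange(len(seq))]    # drop one atomic action
            errors.append(seq)
        return errors

    # e.g. make_error_sequences(["reach", "grasp", "lift", "place"])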
EXAMPLE III
As shown in fig. 4, the present embodiment provides an action sequence verification apparatus, including: an action sequence obtaining module 41, configured to obtain an action sequence to be verified; the feature extraction module 42 is configured to perform feature extraction on the action sequence to be verified to obtain a corresponding feature sequence; a feature fusion module 43, configured to perform information fusion on the feature sequence to obtain an overall sequence feature to determine an action category of the action sequence to be verified; and the feature comparison module 44 is configured to perform feature comparison on the feature sequence of the action sequence to be verified and the feature sequence of the standard action sequence to determine whether the action sequences belong to the same action sequence.
The modules provided in this embodiment correspond to the methods and embodiments described above, so their details are not repeated here. Note that the division into modules is merely logical; in an actual implementation they may be wholly or partially integrated into one physical entity or kept physically separate. These modules may all be realized as software invoked by a processing element, entirely as hardware, or partly as software and partly as hardware. For example, the feature comparison module 44 may be a separate processing element, may be integrated into a chip of the apparatus, or may be stored in the memory of the apparatus as program code that a processing element calls to execute its functions. The other modules are implemented similarly. All or some of the modules can be integrated together or implemented independently. The processing element mentioned here may be an integrated circuit with signal processing capability. In implementation, each step of the above method, or each of the above modules, can be completed by an integrated logic circuit in hardware within the processor element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above methods, e.g. one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs). As another example, when one of the modules is realized as program code scheduled by a processing element, that element may be a general-purpose processor, such as a central processing unit (CPU) or another processor capable of calling program code. These modules may also be integrated together and implemented as a system-on-a-chip (SoC).
Example four
The present embodiment proposes a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the aforementioned action sequence verification method.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be performed by hardware controlled by a computer program. The computer program may be stored in a computer-readable storage medium; when executed, it performs the steps of the method embodiments described above. The storage medium includes ROM, RAM, magnetic disks, optical disks, and other media that can store program code.
EXAMPLE five
As shown in fig. 5, an embodiment of the present invention provides a schematic structural diagram of an electronic terminal. The electronic terminal provided by this embodiment comprises: a processor 51, a memory 52, and a communicator 53. The memory 52 is connected to the processor 51 and the communicator 53 through a system bus, over which they communicate; the memory 52 stores the computer program, the communicator 53 communicates with other devices, and the processor 51 runs the computer program so that the electronic terminal executes the steps of the action sequence verification method.
The above-mentioned system bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like; it may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration only one thick line is shown, but this does not mean that there is only one bus or one type of bus. The communication interface enables communication between the database access device and other devices (e.g. a client, a read-write library, a read-only library). The memory may comprise random access memory (RAM) and may further comprise non-volatile memory, such as at least one disk memory.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In summary, the present invention provides an action sequence verification method and device, a storage medium, and a terminal. A feature sequence is obtained by extracting features from the action sequence, and fine-grained features of the action sequence to be verified are obtained through temporal matching and information fusion of that sequence, enabling accurate verification. In real life an action is often completed as a sequence of several consecutive atomic actions, so verifying the sequence as a whole, with its continuity, effectively improves verification accuracy and widens the range of practical applications. Moreover, the action sequence verification neural network model is built on deep learning: a base network (backbone) first extracts preliminary features from the action sequence, and the Vision Transformer and Sequence Alignment modules then extract finer-grained features, so the feature information is extracted comprehensively and the verification accuracy is high. The present invention therefore effectively overcomes various disadvantages of the prior art and has high industrial value.
The foregoing embodiments merely illustrate the principles and utilities of the present invention and do not limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the present invention.

Claims (10)

1. A method for action sequence verification, comprising:
acquiring an action sequence to be verified;
performing feature extraction on the action sequence to be verified to obtain a corresponding feature sequence;
performing information fusion on the feature sequence to obtain an overall sequence feature so as to judge the action category of the action sequence to be verified;
and comparing the feature sequence of the action sequence to be verified with the feature sequence of a standard action sequence to judge whether they belong to the same action sequence.
2. The action sequence verification method according to claim 1, wherein the feature extraction manner includes:
performing feature extraction on the action sequence to be verified based on a BN-Inception network or a ResNet-50 network, and modifying the final fully connected layer to output a feature sequence of a preset dimension.
3. The action sequence verification method according to claim 1, wherein the overall sequence feature is obtained by:
performing temporal information fusion on the feature sequence of the action sequence to be verified based on a Vision Transformer model to obtain the overall sequence feature.
4. The action sequence verification method according to claim 1, wherein the manner of feature comparison comprises:
comparing the feature sequence of the action sequence to be verified with the feature sequence of the standard action sequence one timestep at a time, in temporal order, to obtain a feature comparison result;
and constructing a loss function to supervise and constrain the feature comparison result.
5. The action sequence verification method according to claim 4, wherein the loss function is a first loss function; the method further comprises the following steps:
acquiring the overall sequence feature of the action sequence to be verified based on a Vision Transformer model;
judging the action category of the action sequence to be verified based on the overall sequence feature;
and constructing a second loss function to supervise and constrain the action-category judgment.
6. The action sequence verification method according to claim 5, comprising:
performing a weighted calculation based on the first loss function and the second loss function to obtain a verification result of the action sequence to be verified.
7. The action sequence verification method according to claim 3, wherein the result of the feature extraction is a plurality of feature maps; the method comprises the following steps:
dividing all feature maps into a number of fixed-size patches to obtain a feature patch sequence;
inputting the obtained feature patch sequence into a Vision Transformer;
and, for each patch, the Vision Transformer weights and fuses the features of the remaining patches through a self-attention module; a special token is kept, and the feature at the token position represents the category information of the whole action sequence to be verified.
8. An action sequence verification apparatus, comprising:
an action sequence acquisition module, configured to acquire an action sequence to be verified;
a feature extraction module, configured to perform feature extraction on the action sequence to be verified to obtain a corresponding feature sequence;
a feature fusion module, configured to perform information fusion on the feature sequence to obtain an overall sequence feature so as to judge the action category of the action sequence to be verified;
and a feature comparison module, configured to compare the feature sequence of the action sequence to be verified with the feature sequence of a standard action sequence to judge whether they belong to the same action sequence.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the action sequence verification method of any one of claims 1 to 7.
10. An electronic terminal, comprising: a processor and a memory;
the memory is configured to store a computer program, and the processor is configured to execute the computer program stored by the memory to cause the terminal to perform the action sequence verification method according to any one of claims 1 to 7.
CN202110469750.3A 2021-04-28 2021-04-28 Action sequence verification method and device, storage medium and terminal Active CN113516030B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110469750.3A CN113516030B (en) 2021-04-28 2021-04-28 Action sequence verification method and device, storage medium and terminal

Publications (2)

Publication Number Publication Date
CN113516030A (en) 2021-10-19
CN113516030B CN113516030B (en) 2024-03-26

Family

ID=78064106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110469750.3A Active CN113516030B (en) 2021-04-28 2021-04-28 Action sequence verification method and device, storage medium and terminal

Country Status (1)

Country Link
CN (1) CN113516030B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8970348B1 (en) * 2012-08-28 2015-03-03 Intuit Inc. Using sequences of facial gestures to authenticate users
CN106845375A (en) * 2017-01-06 2017-06-13 天津大学 A kind of action identification method based on hierarchical feature learning
CN107122798A (en) * 2017-04-17 2017-09-01 深圳市淘米科技有限公司 Chin-up count detection method and device based on depth convolutional network
CN108573246A (en) * 2018-05-08 2018-09-25 北京工业大学 A kind of sequential action identification method based on deep learning
CN110602526A (en) * 2019-09-11 2019-12-20 腾讯科技(深圳)有限公司 Video processing method, video processing device, computer equipment and storage medium
CN111539289A (en) * 2020-04-16 2020-08-14 咪咕文化科技有限公司 Method and device for identifying action in video, electronic equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
KIRILL GAVRILYUK ET AL.: "Actor-Transformers for Group Activity Recognition", CVPR, pages 1-3 *
ZHAOYUAN YIN: "Learning to recommend frame for interactive video object segmentation in the wild", arXiv *
ZHANG Zhou; WU Kewei; GAO Yang: "Action recognition based on key frames extracted via sequential verification", Intelligent Computer and Applications, no. 03
NIE Yong; ZHANG Peng; FENG Hui; YANG Tao; HU Bo: "Human action recognition in 3D video based on standard action sequences", Journal of Terahertz Science and Electronic Information Technology, no. 05

Also Published As

Publication number Publication date
CN113516030B (en) 2024-03-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant