CN114007064A - Special effect synchronous evaluation method, device, equipment, storage medium and program product - Google Patents

Special effect synchronous evaluation method, device, equipment, storage medium and program product

Info

Publication number
CN114007064A
CN114007064A
Authority
CN
China
Prior art keywords
audio
special effect
time
frame
animation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111282835.7A
Other languages
Chinese (zh)
Other versions
CN114007064B (en)
Inventor
张鹏
严明
肖央
程文昕
王泽尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202111282835.7A priority Critical patent/CN114007064B/en
Publication of CN114007064A publication Critical patent/CN114007064A/en
Application granted granted Critical
Publication of CN114007064B publication Critical patent/CN114007064B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00Diagnosis, testing or measuring for television systems or their details
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43072Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application provides a special effect synchronous evaluation method, apparatus, device, computer-readable storage medium, and program product. The embodiments of the application can be applied to scenes such as cloud technology, artificial intelligence, intelligent transportation, and in-vehicle systems, and relate to automated testing and cloud computing technologies. The method comprises: in response to a trigger operation on an evaluation trigger in an evaluation interface, acquiring a special effect video of a special effect to be evaluated; based on an image sequence and an audio signal parsed from the special effect video, determining animation time information corresponding to the special effect animation of the special effect to be evaluated and audio time information corresponding to the special effect audio of the special effect to be evaluated; determining an evaluation result according to the difference between the animation time information and the audio time information, the evaluation result representing the synchronization of the special effect animation and the special effect audio; and displaying the evaluation result in a result display area of the evaluation interface. The method and apparatus can improve the degree of intelligence of special effect synchronous evaluation.

Description

Special effect synchronous evaluation method, device, equipment, storage medium and program product
Technical Field
The present application relates to automated testing technologies, and in particular, to a special effect synchronous evaluation method, apparatus, device, storage medium, and program product.
Background
Adding appropriate special effects to dynamic content such as videos and games improves the user's visual and auditory experience and makes the content more enjoyable. So as not to degrade the user experience, the special effect picture and the special effect audio of the same dynamic content generally need to appear synchronously; the added special effect therefore needs to be evaluated for synchronization before being released to users.
In the related art, special effect synchronous evaluation is mostly performed through the subjective perception of a tester. For example, the tester operates the game to release a skill and judges by subjective sight and hearing whether the skill has a problem of the special effect picture and the special effect audio being out of sync. However, this approach yields low precision and low efficiency, so the degree of intelligence of special effect synchronous evaluation is low.
Disclosure of Invention
The embodiments of the present application provide a special effect synchronous evaluation method, apparatus, device, computer-readable storage medium, and program product, which can improve the degree of intelligence of special effect synchronous evaluation.
The technical solutions of the embodiments of the present application are implemented as follows:
the embodiment of the application provides a special effect synchronous evaluation method, which comprises the following steps:
responding to a trigger operation aiming at an evaluation trigger mark in an evaluation interface, and acquiring a special effect video of a special effect to be evaluated;
determining animation time information corresponding to a special effect animation of the special effect to be evaluated and audio time information corresponding to a special effect audio of the special effect to be evaluated based on a video frame sequence and an audio signal analyzed from the special effect video;
the animation time information is time information with the precision of video frame duration, and the audio time information is time information with the precision of audio frame duration;
determining an evaluation result according to the difference between the animation time information and the audio time information; the evaluation result represents the synchronization condition of the special effect animation and the special effect audio;
and displaying the evaluation result in a result display area of the evaluation interface.
The embodiment of the present application provides a special effect synchronous evaluation device, including:
the video acquisition module is used for responding to the trigger operation aiming at the evaluation trigger mark in the evaluation interface and acquiring the special effect video of the special effect to be evaluated;
the time determination module is used for determining animation time information corresponding to the special effect animation of the special effect to be evaluated and audio time information corresponding to the special effect audio of the special effect to be evaluated based on the video frame sequence and the audio signal analyzed from the special effect video; the animation time information is time information with the precision of video frame duration, and the audio time information is time information with the precision of audio frame duration;
the result generation module is used for determining an evaluation result according to the difference between the animation time information and the audio time information; the evaluation result represents the synchronization condition of the special effect animation and the special effect audio;
and the result display module is used for displaying the evaluation result in a result display area of the evaluation interface.
In some embodiments of the present application, the special effects to be evaluated include a plurality of skill special effects. The video acquisition module is further configured to acquire a plurality of videos uploaded to a skill video area of the evaluation interface to obtain the special effect videos of the plurality of skill special effects; or to release, in a virtual scene displayed in a virtual interaction area of the evaluation interface, the skills corresponding to the plurality of skill special effects, display the skill special effects, and record video of the virtual scene to obtain the special effect videos of the plurality of skill special effects.
In some embodiments of the present application, the evaluation result comprises: a plurality of sub-evaluation results corresponding to the respective special effect videos of the skill special effects; the special effect synchronous evaluation device further comprises: a result comparison module;
the result comparison module is used for comparing the plurality of sub-evaluation results and finding out a special effect to be repaired from the plurality of skill special effects;
the result display module is further configured to display the identification information corresponding to the special effect to be repaired in a repair prompt area of the evaluation interface.
In some embodiments of the present application, the animation time information includes: the animation start time and the animation end time of the special effect animation, and the audio time information comprises: the audio starting time and the audio ending time of the special effect audio;
the result generation module is further used for calculating a first time difference between the animation starting time and the audio starting time and a second time difference between the animation ending time and the audio ending time; and determining the evaluation result according to at least one of the first time difference and the second time difference.
In some embodiments of the present application, the result generation module is further configured to determine that the evaluation result is that the special effect animation is synchronized with the special effect audio when the first time difference is less than or equal to a first time threshold and the second time difference is less than or equal to a second time threshold; when the first time difference is larger than the first time threshold or the second time difference is larger than the second time threshold, determining that the evaluation result is that the special effect animation is not synchronous with the special effect audio.
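As an illustration of the comparison described in the two embodiments above, the following minimal Python sketch checks the first and second time differences against the two time thresholds. The threshold values and function names are assumptions, since the embodiments leave them configurable.

```python
def evaluate_sync(anim_start, anim_end, audio_start, audio_end,
                  first_threshold=0.1, second_threshold=0.1):
    """Compare animation and audio time information (all values in seconds).

    first_threshold / second_threshold stand in for the first and second
    time thresholds of the embodiment; 0.1 s is an assumed value.
    """
    first_diff = abs(anim_start - audio_start)    # first time difference
    second_diff = abs(anim_end - audio_end)       # second time difference
    synchronized = (first_diff <= first_threshold
                    and second_diff <= second_threshold)
    return synchronized, first_diff, second_diff

# With the values shown in fig. 5 (animation 0.2-1.3 s, audio 0.1-0.89 s):
ok, d1, d2 = evaluate_sync(0.2, 1.3, 0.1, 0.89)
print(ok)  # False: the second time difference (0.41 s) exceeds the threshold
```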
In some embodiments of the present application, the time determination module is further configured to detect a start audio frame and an end audio frame of the special effect audio based on the determination of the amplitude feature and the frequency domain feature for each audio frame in the audio signal; calculating the audio starting time according to the number of the starting audio frame and the audio frame duration, and calculating the audio ending time according to the number of the ending audio frame and the audio frame duration; detecting a start point video frame and an end point video frame of the special effect animation based on determination of a dynamic region of each video frame in the video frame sequence; and calculating the animation starting time according to the number of the starting video frame and the video frame duration, and calculating the animation ending time according to the number of the ending video frame and the video frame duration.
In some embodiments of the present application, the amplitude signature comprises: short-time energy, the frequency domain features comprising: short-time zero-crossing rate; the time determination module is further configured to determine the short-time energy and the short-time zero-crossing rate for each of the audio frames in the audio signal; screening out a plurality of effective audio frames of which the short-time energy is greater than an energy threshold value and the short-time zero-crossing rate is greater than a zero-crossing rate threshold value from the audio signals; determining a first valid audio frame of the plurality of valid audio frames as the start audio frame, and determining a last valid audio frame of the plurality of valid audio frames as the end audio frame.
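A minimal sketch of this screening step is given below, assuming the audio signal is available as a NumPy array; the concrete threshold values are assumptions, as the embodiment does not fix them.

```python
import numpy as np

def detect_audio_endpoints(signal, frame_len, energy_threshold, zcr_threshold):
    """Return the numbers of the start and end audio frames of the effect audio.

    signal: 1-D array of samples; frame_len: N audio signal points per frame.
    """
    valid = []
    for i in range(len(signal) // frame_len):
        frame = signal[i * frame_len:(i + 1) * frame_len].astype(np.float64)
        energy = np.sum(frame ** 2)                         # short-time energy
        zcr = np.mean(np.abs(np.diff(np.sign(frame))) > 0)  # short-time zero-crossing rate
        if energy > energy_threshold and zcr > zcr_threshold:
            valid.append(i)                                 # a valid audio frame
    if not valid:
        return None, None
    return valid[0], valid[-1]  # first / last valid frame = start / end audio frame

def frame_number_to_time(frame_number, frame_len, sample_rate):
    # audio start or end time = frame number x audio frame duration (N / sample rate)
    return frame_number * frame_len / sample_rate
```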
In some embodiments of the present application, the time determination module is further configured to determine a first dynamic region for each video frame in the sequence of video frames; determine the first video frame in the sequence that meets an animation start condition as the start point video frame of the special effect animation, the animation start condition being that the area of the first dynamic region is greater than a first area threshold and the overlapping area of the first dynamic region and a preset region is greater than a second area threshold; determine a second dynamic region for each of the other video frames located after the start point video frame in the sequence; and, when N consecutive still video frames are found among the other video frames based on the second dynamic regions, determine the N-th still video frame as the end point video frame, as sketched below. Here N is a positive integer greater than 1, and a still video frame is a video frame in which the area of the second dynamic region is less than or equal to the first area threshold and the overlapping area of the second dynamic region and the preset region is less than or equal to the second area threshold.
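Assuming per-frame dynamic regions have already been computed (see the next two embodiments), the start/end frame search can be sketched as follows. The value of N and the handling of frames that are neither start-like nor still are assumptions.

```python
def detect_animation_endpoints(regions, area_threshold, overlap_threshold, n_still=5):
    """regions: per-frame (dynamic_region_area, overlap_with_preset_region).

    Returns the numbers of the start point and end point video frames.
    """
    def meets_start_condition(area, overlap):
        return area > area_threshold and overlap > overlap_threshold

    def is_still(area, overlap):
        return area <= area_threshold and overlap <= overlap_threshold

    start = next((i for i, (a, o) in enumerate(regions)
                  if meets_start_condition(a, o)), None)
    if start is None:
        return None, None

    run = 0
    for i in range(start + 1, len(regions)):
        run = run + 1 if is_still(*regions[i]) else 0
        if run == n_still:          # N consecutive still video frames found
            return start, i         # the N-th still frame is the end point frame
    return start, len(regions) - 1  # assumption: fall back to the last frame
```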
In some embodiments of the present application, the time determining module is further configured to perform frame difference calculation on each video frame in the sequence and its first adjacent video frame to obtain a first frame difference image; perform dimensionality reduction on the first frame difference image to obtain a first dimension-reduced image; dilate the image region whose brightness is greater than a brightness threshold in the first dimension-reduced image to obtain an expansion region; and perform connected domain calculation on the expansion region to obtain the first dynamic region.
In some embodiments of the present application, the time determining module is further configured to perform frame difference calculation on each of the other video frames and its second adjacent video frame to obtain a second frame difference image; perform dimensionality reduction on the second frame difference image to obtain a second dimension-reduced image; erode the image region whose brightness is greater than the brightness threshold in the second dimension-reduced image to obtain an erosion region; and perform connected domain calculation on the erosion region to obtain the second dynamic region.
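The two embodiments above share one pipeline and differ only in the morphological operation (dilation when searching for the start point, erosion when searching for the end point). A sketch using OpenCV is shown below; OpenCV itself, the downscale factor, the brightness threshold, and the kernel size are all assumptions.

```python
import cv2
import numpy as np

def dynamic_region(frame, adjacent_frame, scale=0.25,
                   brightness_threshold=40, dilate=True):
    """Frame difference -> dimensionality reduction -> brightness threshold ->
    dilation or erosion -> connected domain; returns the largest region."""
    diff = cv2.absdiff(frame, adjacent_frame)            # frame difference image
    small = cv2.resize(diff, None, fx=scale, fy=scale)   # dimensionality reduction
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, brightness_threshold, 255, cv2.THRESH_BINARY)
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.dilate(mask, kernel) if dilate else cv2.erode(mask, kernel)
    count, _, stats, _ = cv2.connectedComponentsWithStats(mask)
    if count <= 1:
        return None, 0                                   # no dynamic region found
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))  # skip background
    x, y, w, h = stats[largest, :4]
    return (x, y, w, h), int(stats[largest, cv2.CC_STAT_AREA])
```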
In some embodiments of the present application, the audio frame comprises N audio signal points, and the duration of the audio frame is the ratio of N to the audio sampling rate.
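For example, with hypothetical values of N = 1024 audio signal points per frame at a 44.1 kHz sampling rate:

```python
N = 1024             # audio signal points per audio frame (assumed)
sample_rate = 44100  # audio sampling rate in Hz (assumed)
frame_duration = N / sample_rate  # about 0.0232 s, i.e. roughly 23 ms precision
```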
In some embodiments of the present application, the animation time information includes the animation duration of the special effect animation, and the audio time information includes the audio duration of the special effect audio. The result generation module is further configured to calculate a third time difference between the animation duration and the audio duration, and determine the evaluation result according to the magnitude relation between the third time difference and a third time threshold.
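This duration-based variant can be sketched analogously to the start/end comparison shown earlier; the third time threshold value here is an assumption.

```python
def evaluate_by_duration(anim_duration, audio_duration, third_threshold=0.2):
    third_diff = abs(anim_duration - audio_duration)  # third time difference
    return third_diff <= third_threshold, third_diff  # (synchronized?, difference)
```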
The embodiment of the application provides an electronic device for special effect synchronous evaluation, which comprises:
a memory for storing executable instructions;
and the processor is used for realizing the special effect synchronous evaluation method provided by the embodiment of the application when the executable instructions stored in the memory are executed.
The embodiments of the present application provide a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the special effect synchronous evaluation method provided by the embodiments of the present application.
The embodiments of the present application have the following beneficial effects. The electronic device parses the image sequence and the audio signal of a special effect video to obtain time information of the special effect animation with a precision of the video frame duration and time information of the special effect audio with a precision of the audio frame duration. In other words, the time information of the special effect animation and of the special effect audio is determined directly at the frame-duration level, achieving high precision. The high-precision time information is then compared accurately, so that out-of-sync problems at the frame-duration level, which are hard to perceive, can be found; that is, out-of-sync problems of all degrees of severity can be detected, greatly improving the precision of special effect synchronous evaluation. The whole process is automated and requires no repeated manual operation, which speeds up the evaluation. In summary, the special effect synchronous evaluation method provided by the embodiments of the present application improves the precision and efficiency of special effect synchronous evaluation, and ultimately its degree of intelligence.
Drawings
Fig. 1 is a schematic structural diagram of a special effect synchronization evaluation system provided in an embodiment of the present application;
fig. 2 is a schematic structural diagram of an electronic device provided in an embodiment of the present application;
fig. 3 is a first flowchart illustrating a special effect synchronous evaluation method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an evaluation interface provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of the evaluation results provided by the embodiments of the present application;
fig. 6 is a first schematic diagram of obtaining a special effect video according to an embodiment of the present application;
fig. 7 is a second schematic diagram of obtaining a special effect video according to an embodiment of the present application;
fig. 8 is a schematic diagram illustrating identification information of a special effect to be repaired, provided in an embodiment of the present application;
fig. 9 is a flowchart illustrating a second special effect synchronous evaluation method according to an embodiment of the present application;
fig. 10 is a third schematic flowchart of a special effect synchronous evaluation method provided in the embodiment of the present application;
FIG. 11 is a schematic diagram of the short-term energy per audio frame provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of generating a first frame difference image according to an embodiment of the present disclosure;
FIG. 13 is a schematic diagram of a dimension reduction process provided by an embodiment of the application;
fig. 14 is a schematic diagram of a process of median filtering provided by an embodiment of the present application;
FIG. 15 is a comparison of the results before and after median filtering provided by embodiments of the present application;
FIG. 16 is a schematic diagram illustrating the effect of the expansion process provided by the embodiments of the present application;
FIG. 17 is a schematic view of a connected domain of an expansion region provided by an embodiment of the present application;
FIG. 18 is a block diagram of a system for effect synchronization evaluation according to an embodiment of the present disclosure;
FIG. 19 is a process diagram of a consistency calculation provided by an embodiment of the present application;
FIG. 20 is a schematic diagram of a process for initially detecting the start point and the end point of a skilled audio signal according to the short-term energy provided by an embodiment of the present application;
fig. 21 is a schematic diagram of a process for determining a skill release frame and an end frame according to an embodiment of the present application;
fig. 22 is a schematic diagram of a skill button area provided by an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first/second/third" are used only to distinguish similar objects and do not denote a particular order. Where permissible, the specific order or sequence may be interchanged, so that the embodiments of the application described herein can be practiced in an order other than that shown or described.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Before further detailed description of the embodiments of the present application, terms and expressions referred to in the embodiments of the present application will be described, and the terms and expressions referred to in the embodiments of the present application will be used for the following explanation.
1) Cloud Computing refers to a mode of delivery and use of IT infrastructure in which required resources are obtained over a network in an on-demand, easily scalable manner; in the broader sense, it refers to a mode of delivery and use of services in which required services are obtained over a network in an on-demand, easily scalable manner. Such services may be IT and software services, internet-related services, or other services. Cloud computing is a product of the development and fusion of traditional computing and network technologies, such as grid computing, distributed computing, parallel computing, utility computing, network storage, virtualization, and load balancing.
With the diversification of the internet, real-time data streams, and connected devices, and driven by demands such as search services, social networks, mobile commerce, and open collaboration, cloud computing has developed rapidly. Unlike earlier parallel and distributed computing, the emergence of cloud computing is, in concept, driving revolutionary change in the internet model and the enterprise management model.
2) Artificial Intelligence (AI) is a theory, method, technique, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various machines, so that the machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence is a comprehensive discipline involving a wide range of fields, covering both hardware-level and software-level technologies. Artificial intelligence infrastructure generally includes technologies such as sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, autonomous driving, intelligent transportation, and the like.
3) Computer Vision (CV) is a science that studies how to make machines "see": it uses cameras and computers instead of human eyes to identify, track, and measure targets, and further processes the images so that they become more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, simultaneous localization and mapping, autonomous driving, and intelligent transportation, as well as common biometric technologies such as face recognition and fingerprint recognition.
4) A special effect picture is the visual presentation of a special effect, i.e., the part that can be seen, for example, a skill release effect in a game picture or a character entrance effect in a video picture.
5) Special effect audio is the sound played for a special effect, i.e., the part that can be heard, for example, a distinctive sound that plays when a game skill is released, or background music that plays when a video character appears.
6) Special effect synchronous evaluation analyzes whether the special effect picture and the special effect audio appear synchronously. For a better experience in games, videos, and the like, the special effect picture and the special effect audio generally need to appear synchronously. For example, in a Massively Multiplayer Online Role-Playing Game (MMORPG) or a Multiplayer Online Battle Arena (MOBA) game, when a game character clicks to release a skill, the skill's special effect picture and special effect audio should start playing at the same time and continue until the skill ends.
7) A virtual scene is a scene that an application displays (or provides) at runtime. The virtual scene may be a simulation of the real world, a semi-simulated, semi-fictional virtual environment, or a purely fictional virtual environment. The virtual scene may be two-dimensional, three-dimensional, and so on. It can include, for example, sky, land, and sea, and can also include virtual characters that the user can control to move within the scene.
Adding appropriate special effects to dynamic content such as videos and games improves the user's visual and auditory experience. So as not to degrade the user experience, the special effect picture and the special effect audio of the same dynamic content generally need to appear synchronously; for example, the special effect picture and the special effect audio of a skill release generally need to start and end at the same time, otherwise the user experience suffers. Therefore, after a special effect is added to dynamic content, its audio and picture need to be evaluated for synchronization before release to users.
In the related art, special effect synchronous evaluation is mostly performed through the subjective perception of an evaluator. For example, the evaluator operates the game to release a skill and judges by subjective sight and hearing whether the skill has a problem of the special effect picture and the special effect audio being out of sync. However, evaluators can find only severe out-of-sync problems this way, so the precision of the evaluation is low; meanwhile, evaluators may need many manual operations to make a judgment, so the efficiency is also low.
In summary, the related art suffers from low precision and low efficiency of special effect synchronous evaluation, and hence a low degree of intelligence.
The embodiments of the present application provide a special effect synchronous evaluation method, apparatus, device, computer-readable storage medium, and program product, which can improve the degree of intelligence of special effect synchronous evaluation. An exemplary application of the electronic device for special effect synchronous evaluation provided by the embodiments of the present application is described below. The electronic device may be implemented as various types of terminals, such as a notebook computer, a tablet computer, a desktop computer, a set-top box, or a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, or a portable game device), or as a device cluster consisting of terminals and a server. In the following, an exemplary application is described for the case where the electronic device is implemented as a device cluster consisting of a terminal and a server.
Referring to fig. 1, fig. 1 is a schematic structural diagram of the special effect synchronization evaluation system provided in an embodiment of the present application. To support a special effect synchronous evaluation application, in the special effect synchronization evaluation system 100 shown in fig. 1, the terminal 400 is connected to the server 200 through the network 300 and serves as the front end of the server 200; the network 300 may be a wide area network, a local area network, or a combination of the two.
The terminal 400 is configured to respond to a trigger operation of a tester for an evaluation trigger in an evaluation interface displayed on the graphical interface 400-1, acquire a special effect video of a special effect to be evaluated, and send the special effect video to the server 200 through the network 300.
The server 200 is configured to receive the special effect video sent by the terminal 400; determining animation time information corresponding to a special effect animation of the special effect to be evaluated and audio time information corresponding to a special effect audio of the special effect to be evaluated based on a video frame sequence and an audio signal analyzed from the special effect video; the animation time information is time information with the precision of video frame duration, and the audio time information is time information with the precision of audio frame duration; determining an evaluation result according to the difference between the animation time information and the audio time information; wherein the evaluation result represents the synchronization condition of the special effect animation and the special effect audio.
The server 200 is further configured to send the evaluation result to the terminal 400 through the network 300, and the terminal 400 is configured to display the evaluation result in a result display area of the evaluation interface.
In some embodiments, the server 200 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal 400 may be, but is not limited to, a smartphone, tablet computer, laptop computer, desktop computer, smart speaker, smart watch, smart home appliance, vehicle-mounted terminal, or the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the embodiments of the present application.
Referring to fig. 2, fig. 2 is a schematic structural diagram of an electronic device provided in an embodiment of the present application, and the electronic device 500 shown in fig. 2 includes: at least one processor 510, memory 550, at least one network interface 520, and a user interface 530. The various components in the electronic device 500 are coupled together by a bus system 540. It is understood that the bus system 540 is used to enable communications among the components. The bus system 540 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 540 in fig. 2.
The processor 510 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a Digital Signal Processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components; the general-purpose processor may be a microprocessor or any conventional processor.
The user interface 530 includes one or more output devices 531 enabling presentation of media content, including one or more speakers and/or one or more visual display screens. The user interface 530 also includes one or more input devices 532, including user interface components to facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 550 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 550 optionally includes one or more storage devices physically located remote from processor 510.
The memory 550 may comprise volatile memory or nonvolatile memory, and may also comprise both volatile and nonvolatile memory. The nonvolatile memory may be a Read Only Memory (ROM), and the volatile memory may be a Random Access Memory (RAM). The memory 550 described in embodiments herein is intended to comprise any suitable type of memory.
In some embodiments, memory 550 can store data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 551 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a network communication module 552 for reaching other computing devices via one or more (wired or wireless) network interfaces 520; exemplary network interfaces 520 include Bluetooth, Wireless Fidelity (Wi-Fi), Universal Serial Bus (USB), and the like;
a presentation module 553 for enabling presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 531 (e.g., a display screen, speakers, etc.) associated with the user interface 530;
an input processing module 554 for detecting one or more user inputs or interactions from the one or more input devices 532 and translating the detected inputs or interactions.
In some embodiments, the special effect synchronization evaluating apparatus provided in the embodiments of the present application may be implemented in software. Fig. 2 illustrates a special effect synchronization evaluating apparatus 555 stored in the memory 550, which may be software in the form of programs and plug-ins, and includes the following software modules: a video acquisition module 5551, a time determination module 5552, a result generation module 5553, a result presentation module 5554, and a result comparison module 5555. These modules are logical, and thus may be combined or further split arbitrarily according to the functions implemented. The functions of the respective modules are explained below.
In other embodiments, the special effect synchronization evaluating apparatus provided in the embodiments of the present Application may be implemented in hardware, and for example, the special effect synchronization evaluating apparatus provided in the embodiments of the present Application may be a processor in the form of a hardware decoding processor, which is programmed to execute the special effect synchronization evaluating method provided in the embodiments of the present Application, for example, the processor in the form of the hardware decoding processor may be one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
In some embodiments, the terminal or the server may implement the special effect synchronization evaluation method provided by the embodiments of the present application by running a computer program. For example, the computer program may be a native program or software module in an operating system; a native application (APP), i.e., a program that needs to be installed in the operating system to run, such as an APP for special effect synchronous evaluation; an applet, i.e., a program that only needs to be downloaded into a browser environment to run; or an applet that can be embedded into any APP. In general, the computer program may be any form of application, module, or plug-in.
The embodiment of the application can be applied to various scenes such as cloud technology, artificial intelligence, intelligent traffic, vehicle-mounted and the like. In the following, the special effect synchronization evaluation method provided by the embodiment of the present application will be described in conjunction with exemplary applications and implementations of the electronic device provided by the embodiment of the present application.
Referring to fig. 3, fig. 3 is a first flowchart of a special effect synchronization evaluation method provided in the embodiment of the present application, which will be described with reference to the steps shown in fig. 3.
S101, responding to the trigger operation aiming at the evaluation trigger mark in the evaluation interface, and obtaining the special effect video of the special effect to be evaluated.
The embodiments of the present application are implemented in a scene of evaluating the audio-picture synchronization of special effects, for example, evaluating whether a skill special effect in a game, or a science fiction special effect in a movie, has its audio and picture in sync. In the embodiments of the present application, the electronic device monitors whether a trigger operation occurs on the evaluation trigger in the presented evaluation interface. When the electronic device detects a trigger operation on the evaluation trigger, it responds to the trigger operation by acquiring the special effect video of the special effect to be evaluated, so as to analyze, based on the special effect video, whether the picture and the audio of the special effect to be evaluated are synchronous.
The evaluation interface may be displayed by the electronic device in response to an operation of the evaluator, for example, in response to a command entered by the evaluator on a command line or to the evaluator's voice command such as "start special effect test"; or it may be displayed automatically by the electronic device, for example, triggered on a schedule or displayed immediately after the generation of a special effect to be evaluated is completed.
It can be understood that the evaluation trigger may be set in any area of the evaluation interface, and the size of the evaluation trigger may be adjusted according to the actual situation, which is not limited herein. The trigger operation may be a single click, a double click, a long press, a slide, and the like, and the application is not limited herein.
It should be noted that the evaluation interface further includes a result display area, and the result display area is used for displaying an evaluation result obtained by the special effect to be evaluated in the synchronous evaluation of the special effect.
For example, fig. 4 is a schematic diagram of an evaluation interface provided in an embodiment of the present application. In the evaluation interface 4-1, an evaluation trigger 4-11 and a result display area 4-12 are provided. When the evaluator clicks the evaluation trigger 4-11, the electronic device starts the special effect synchronous evaluation to obtain the special effect video.
The special effect to be evaluated may be a special effect released by game skill, or may also be a special effect of actions of characters in a video, a special effect of a scene, and the like, and the application is not limited herein. The special effect video can be uploaded manually recorded video, and can also be obtained by recording the special effect to be evaluated by calling an automatic script through electronic equipment.
S102, determining animation time information corresponding to the special effect animation of the special effect to be evaluated and audio time information corresponding to the special effect audio of the special effect to be evaluated based on the video frame sequence and the audio signals analyzed from the special effect video.
The electronic equipment separates the special effect video in image dimension and audio dimension to obtain a video frame sequence and an audio signal, wherein the video frame sequence comprises special effect animation of the special effect to be evaluated, and the audio signal comprises special effect audio of the special effect to be evaluated. The electronic equipment analyzes the video frame sequence and the audio signal to clarify the appearance of the special effect animation in the time dimension and the appearance of the special effect audio in the time dimension, and obtains animation time information and audio time information.
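One way to perform this separation is sketched below, reading the video frames with OpenCV and extracting the audio track with the ffmpeg command-line tool; the embodiments do not prescribe specific tools, so this pairing is an assumption.

```python
import subprocess
import wave

import cv2
import numpy as np

def split_effect_video(path, wav_path="effect_audio.wav"):
    """Separate an effect video into a video frame sequence and an audio signal."""
    # Image dimension: decode every video frame and record the frame rate.
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS)  # video frame duration = 1 / fps
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()

    # Audio dimension: extract a mono 16-bit PCM track.
    subprocess.run(["ffmpeg", "-y", "-i", path, "-vn", "-ac", "1",
                    "-acodec", "pcm_s16le", wav_path], check=True)
    with wave.open(wav_path, "rb") as w:
        sample_rate = w.getframerate()
        signal = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    return frames, fps, signal, sample_rate
```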
It should be noted that the animation time information is time information with precision of video frame duration, and the audio time information is time information with precision of audio frame duration. That is, in the embodiment of the present application, the electronic device directly determines the time information of the special effect animation at the frame duration level and the time information of the special effect audio at the frame duration level. The frame duration is typically on the order of milliseconds, so that the animation time information and the audio time information determined by the electronic device are more accurate.
It is to be understood that the effect animation refers to a visual representation of the effect to be evaluated, for example, a motion of a character in a movie, an appearance change of a virtual character in a game, and the like, and the effect audio refers to an auditory representation of the effect to be evaluated, for example, a score of a character in a movie, a pop sound in a game, and the like.
In the embodiment of the present application, the animation time information may include one or more of an animation start time, an animation end time, and an animation duration corresponding to the special effect animation, and the audio time information may include one or more of an audio start time, an audio end time, and an audio duration corresponding to the special effect audio.
In some embodiments, the electronic device may determine a dynamic region for each video frame in the sequence of video frames, then determine a video frame involved in the special effect animation based on an analysis of the dynamic region, and determine animation time information based on the video frame. In other embodiments, the electronic device may further match a video frame with the special effect animation from the video frame sequence according to an image template corresponding to the special effect animation, and determine animation time information based on the video frame.
In some embodiments, the electronic device may extract valid audio from the audio signal, and determine audio time information corresponding to the special effect audio based on audio frames related to the valid audio. In other embodiments, the electronic device may further match, by using a preset audio feature corresponding to the special-effect audio, a feature of each audio frame of the audio signal, and determine, based on the matched audio frame, audio time information corresponding to the special-effect audio, which is not limited herein.
S103, determining an evaluation result according to the difference between the animation time information and the audio time information.
The electronic device compares the animation time information with the audio time information, so as to determine whether the special effect animation and the special effect audio appear synchronously, and thereby obtains the evaluation result. That is, the evaluation result characterizes the synchronization of the special effect animation and the special effect audio.
In some embodiments, the electronic device may directly use the animation time information and the audio time information as the evaluation result, or may use a time difference value between the animation time information and the audio time information as the evaluation result, for example, 0.5s, 1s, and the like.
In further embodiments, the evaluation result may also include prompt information, e.g., "pass" or "fail", obtained by automatically comparing the time difference between the animation time information and the audio time information with the corresponding time thresholds to decide whether the special effect to be evaluated passes the synchronicity evaluation. The time threshold may be preset, or adjusted according to the category of the special effect to be evaluated; for example, when the special effect to be evaluated is a game skill special effect, the time threshold is set to a minimum value, and when it is a movie special effect, to an intermediate value.
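This category-dependent adjustment could, for example, be expressed as a simple lookup; the categories and threshold values below are purely illustrative assumptions.

```python
TIME_THRESHOLDS = {
    "game_skill": 0.05,  # skill special effects: minimum threshold (assumed)
    "movie": 0.15,       # movie special effects: intermediate threshold (assumed)
    "default": 0.10,
}

def threshold_for(category):
    return TIME_THRESHOLDS.get(category, TIME_THRESHOLDS["default"])
```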
And S104, displaying the evaluation result in a result display area of the evaluation interface.
After the electronic device obtains the evaluation result, the evaluation result can be displayed in the result display area, so that an evaluator or a developer can know the synchronization result of the audio and the picture of the special effect to be evaluated so as to perform the next processing. It will be appreciated that the size and location of the result presentation area may be set according to the actual situation.
Exemplarily, based on fig. 4, referring to fig. 5, fig. 5 is a schematic diagram of an evaluation result provided by an embodiment of the present application. The electronic equipment displays the evaluation result 5-1 in a result display area 4-12 of the evaluation interface 4-1, wherein the evaluation result 5-1 comprises animation time information 5-11 (animation starting time: 0.2; animation ending time 1.3), audio time information 5-12 (audio starting time: 0.1; audio ending time: 0.89), and prompt information 5-13: failing to pass. Therefore, the evaluators can know the special effect synchronization condition of the special effect to be evaluated.
It can be understood that, compared with special effect synchronous evaluation based on the subjective perception of evaluators in the related art, in the embodiments of the present application the electronic device analyzes the image sequence and the audio signal of the special effect video to obtain time information of the special effect animation with a precision of the video frame duration and time information of the special effect audio with a precision of the audio frame duration. That is, the time information of the special effect animation and of the special effect audio is determined directly at the frame-duration level, achieving high precision. The high-precision time information is then compared accurately, so that out-of-sync problems at the frame-duration level, which are hard to perceive, can be found; in other words, out-of-sync problems of all degrees of severity can be detected, greatly improving the precision of special effect synchronous evaluation. The whole process is automated and requires no repeated operation, which speeds up the evaluation. In summary, the special effect synchronous evaluation method provided by the embodiments of the present application improves the precision and efficiency of special effect synchronous evaluation, and ultimately its degree of intelligence.
It should be noted that, in some embodiments, the electronic device may be implemented as a terminal, and in this case, S101 to S104 may be independently performed by the terminal to implement special effect synchronous evaluation. In other embodiments, the electronic device may also be implemented as a device cluster including a terminal and a server, where the terminal may complete the processes of S101 and S104, and the server may complete the processes of S102 to S103, and the server and the terminal may implement data interaction based on a cloud technology, for example, the terminal uploads a special-effect video to the server, and the server sends an evaluation result to the terminal.
In some embodiments of the present application, the special effects to be evaluated include a plurality of skill special effects, such as a status value recovery effect or an attack effect in a game. In this case, obtaining the special effect video of the special effect to be evaluated, i.e., the specific implementation of S101, may include S1011 or S1012, as follows:
S1011, acquiring the plurality of videos uploaded to the skill video area of the evaluation interface to obtain special effect videos of the plurality of skill special effects.
In the embodiment of the application, a skill video area is further arranged in the evaluation interface and used for enabling evaluators to upload videos required by special effect synchronous evaluation. Furthermore, a video uploading identifier can be arranged in the evaluation interface, and an evaluator can upload a plurality of videos to the skill video area at one time by triggering the video uploading identifier. Of course, the electronic device may also support uploading videos to the skill video area through a dragging operation, for example, the evaluator drags a plurality of videos on the main menu to the skill video area of the evaluation interface in sequence to upload the videos to the skill video area.
Exemplarily, based on fig. 4, referring to fig. 6, fig. 6 is a first schematic diagram of obtaining a special effect video according to an embodiment of the present application. The evaluation interface 4-1 is further provided with a video upload identifier 6-1 and a skill video area 6-2. After the electronic device detects that the evaluator has operated the video upload identifier 6-1, the evaluator uploads the selected videos 6-A and 6-B to the skill video area 6-2 through the interface. Then, when the evaluator clicks the evaluation trigger 4-11, the two videos in the skill video area 6-2 are used as the special effect videos for special effect synchronicity evaluation.
S1012, releasing skills corresponding to the skill special effects in a virtual scene displayed in a virtual interaction area of the evaluation interface, displaying the skill special effects, and performing video recording on the virtual scene to obtain special effect videos of the skill special effects.
In the embodiment of the application, a virtual interaction area may be further arranged in the evaluation interface, and the virtual interaction area is used for providing an interaction entrance to a virtual scene. At this time, in response to a skill release operation performed by an evaluator on the virtual scene displayed in the virtual interaction area, or to a skill release instruction in a skill control script, the electronic device may sequentially release the skills corresponding to the plurality of skill special effects, so that the plurality of skill special effects are displayed in the virtual scene; the video recording component is then invoked to record the virtual scene, so as to obtain the special effect video corresponding to each of the plurality of skill special effects.
Exemplarily, based on fig. 4, referring to fig. 7, fig. 7 is a schematic diagram of obtaining a special effect video according to an embodiment of the present application. A virtual interaction area 7-1 is further arranged in the evaluation interface 4-1. The electronic device can call and display a virtual scene 7-2, such as a game battle scene, in the virtual interaction area 7-1, then sequentially release a state value recovery skill and a reinforced defense skill in the virtual scene 7-2 to display the skill special effects of the two skills, and call a video recording script to record the virtual scene 7-2, obtaining the special effect videos corresponding to the two skills after the recording is finished. Finally, when the evaluator clicks the evaluation trigger identifier 4-11, these special effect videos are used for special effect synchronization evaluation.
In the embodiment of the application, the electronic device can acquire special effect videos in batches, either directly acquiring uploaded recorded videos or recording the skill release and special effect display process in real time, which improves the diversity of special effect video acquisition methods.
In some embodiments of the present application, the evaluation result comprises: a plurality of sub-evaluation results corresponding to the special effect videos of the skill special effects, where after the evaluation results are displayed in a result display area of the evaluation interface, that is, after S104, the method may further include: S105-S106, as follows:
S105, finding out the special effect to be repaired from the plurality of skill special effects based on comparison of the plurality of sub-evaluation results.
S106, displaying identification information corresponding to the special effect to be repaired in a repair prompt area of the evaluation interface.
In the embodiment of the application, the electronic device may further transversely compare the time differences between animation time information and audio time information in the multiple sub-evaluation results, so as to select, from the multiple skill special effects, the skill special effect with the worst sub-evaluation result, or a skill special effect whose sub-evaluation result is below the average level, as the special effect to be repaired. It then obtains the identification information corresponding to the special effect to be repaired and displays it in the repair prompt area, so as to prompt the evaluator which skill special effects need to be repaired.
It can be understood that the identification information of the to-be-repaired special effect may be a name of the to-be-repaired special effect or a number of the to-be-repaired special effect, and the application is not limited herein.
For example, fig. 8 is a schematic diagram illustrating identification information of a special effect to be repaired, provided in an embodiment of the present application. The electronic device displays the sub-evaluation results of 2 skill special effects in the result display area 8-11 of the evaluation interface 8-1: the time difference between the animation time information and the audio time information of the state recovery skill 8-111 is 0.1s, and that of the defense addition skill 8-112 is 0.5s. It therefore displays the defense addition skill 8-112 in the repair prompt area 8-12 of the evaluation interface 8-1.
In the embodiment of the application, the electronic equipment can also directly compare the sub-evaluation results to determine the skill special effect needing to be repaired, display the related identification information and further improve the intelligent degree of the synchronous evaluation of the special effect.
Based on fig. 3, referring to fig. 9, fig. 9 is a second schematic flowchart of the special effect synchronization evaluation method provided in the embodiment of the present application. In some embodiments of the present application, the animation time information includes: the animation start time and the animation end time of the special effect animation, and the audio time information includes: the audio start time and the audio end time of the special effect audio. In this case, determining the evaluation result according to the difference between the animation time information and the audio time information, that is, the specific implementation process of S103, may include: S1031-S1032, as follows:
S1031, calculating a first time difference between the animation start time and the audio start time, and a second time difference between the animation end time and the audio end time.
In some embodiments, the electronic device may directly subtract the animation start time from the audio start time to obtain a first time difference, and directly subtract the animation end time from the audio end time to obtain a second time difference. In other embodiments, the electronic device may further determine the absolute value of the difference between the animation start time and the audio start time as the first time difference and the absolute value of the difference between the animation end time and the audio end time as the second time difference.
For example, the embodiment of the present application provides a calculation manner of the first time difference and the second time difference, which are respectively shown in equation (1) and equation (2):
Diff_start = abs(T_A_start - T_V_start) (1)
Diff_end = abs(T_A_end - T_V_end) (2)
wherein T_A_start is the audio start time, T_V_start is the animation start time, T_A_end is the audio end time, T_V_end is the animation end time, Diff_start is the first time difference, Diff_end is the second time difference, and abs is the absolute value operation.
It is understood that, in the embodiment of the present application, the animation start time may be calculated from the number of the start point video frame and the video frame duration, and the animation end time from the number of the end point video frame and the video frame duration. The video frame duration (calculated from the video frame rate) refers to the display duration of each video frame; multiplying it by the number of the start point video frame determines after how many video frame durations the special effect animation starts to be displayed. The precision of the animation start time is therefore limited only by the video frame duration, a degree of accuracy humans cannot perceive, which facilitates more accurate subsequent time comparison. The electronic device determines the animation end time in a similar manner, also with the video frame duration as the precision.
Similarly, the audio start time may be calculated from the number of the start audio frame and the audio frame duration, and the audio end time from the number of the end audio frame and the audio frame duration. The audio frame duration (obtained from the number of sampling points included in an audio frame and the audio sampling rate) refers to the length of time each audio frame occupies when the audio signal is played. By combining the audio frame duration with the numbers of the start audio frame and the end audio frame, the electronic device determines after how many audio frame durations the special effect audio starts playing and finishes playing, so the precision of the calculated audio start time and audio end time is limited only by the audio frame duration, which is high enough for more accurate time comparison.
S1032, determining an evaluation result according to at least one of the first time difference and the second time difference.
Having obtained the first time difference and the second time difference as described in S1031, the electronic device may generate the evaluation result from either one of them, or from both of them at the same time.
It is to be understood that the electronic device may consider the special effect animation and the special effect audio to be out of synchronization as long as either their start times or their end times fail to satisfy the synchronization condition. Based on this, in the embodiment of the application the electronic device may use either the first time difference or the second time difference as the judgment basis: as long as the selected time difference does not satisfy the corresponding judgment condition, it may directly generate an evaluation result that the special effect animation and the special effect audio are not synchronized. In this way a rough evaluation result is obtained with only one calculation, so the amount of calculation required to generate the evaluation result is small.
It can also be understood that the electronic device may further determine the evaluation result by using the first time difference and the second time difference at the same time, so as to use the start time of the special effect animation and the special effect audio and the end time of the special effect animation and the special effect audio as the judgment basis at the same time, thereby generating a more accurate evaluation result.
In some embodiments, the electronic device may directly take one or more of the first time difference and the second time difference as the evaluation result. In other embodiments, the electronic device may further compare one of the first time difference and the second time difference to a plurality of corresponding time difference thresholds to determine the evaluation result.
In the embodiment of the application, the electronic device can accurately judge whether the special effect animation and the special effect audio are synchronous or not based on the start time and the end time of the special effect animation and the start time and the end time of the special effect audio, and the precision of synchronous evaluation of the special effect is improved.
In some embodiments of the present application, determining the evaluation result according to at least one of the first time difference and the second time difference, that is, the specific implementation process of S1032 may include: s1032a or S1032b, as follows:
S1032a, when the first time difference is less than or equal to the first time threshold and the second time difference is less than or equal to the second time threshold, determining that the evaluation result is that the special effect animation is synchronized with the special effect audio.
S1032b, when the first time difference is greater than the first time threshold, or the second time difference is greater than the second time threshold, determining that the evaluation result is that the special effect animation is not synchronized with the special effect audio.
The electronic device compares the first time difference with the first time threshold and the second time difference with the second time threshold. If the first time difference is larger than the first time threshold, or the second time difference is larger than the second time threshold, the electronic device determines that the evaluation result is that the special effect animation is not synchronized with the special effect audio. If the first time difference is smaller than or equal to the first time threshold and the second time difference is smaller than or equal to the second time threshold, the electronic device determines that the evaluation result is that the special effect animation is synchronized with the special effect audio.
It should be noted that, in the embodiment of the present application, the first time threshold and the second time threshold may be the same or different. Their specific values may be set according to actual conditions, or determined by the electronic device according to the type of the special effect to be evaluated or the total duration of the special effect video. For example, when the total duration of the special effect video is long, the first time threshold and the second time threshold are both set to 0.1s; when the special effect to be evaluated is a skill special effect of a game, the first time threshold is set to 0.05s and the second time threshold to 0.08s; and so on. The application is not limited herein.
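For illustration, the judgment logic of S1032a and S1032b may be expressed in Python as a minimal sketch; the function name evaluate_sync and the 0.1s default thresholds are assumptions of this example, not values fixed by the embodiment:

    def evaluate_sync(diff_start, diff_end, first_threshold=0.1, second_threshold=0.1):
        """Return the evaluation result from the first and second time differences."""
        # S1032a: both time differences within their thresholds -> synchronized
        if diff_start <= first_threshold and diff_end <= second_threshold:
            return "synchronized"
        # S1032b: either time difference exceeds its threshold -> not synchronized
        return "not synchronized"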
In the embodiment of the application, the electronic equipment can determine the evaluation result by comparing the first time difference and the second time difference with the corresponding time thresholds, so that the manual analysis of evaluators is not needed, and the efficiency of special effect synchronous evaluation is improved.
Based on fig. 9 and referring to fig. 10, fig. 10 is a third schematic flowchart of the special effect synchronization evaluation method provided in the embodiment of the present application. In some embodiments of the present application, determining animation time information corresponding to a special effect animation of the special effect to be evaluated and audio time information corresponding to a special effect audio of the special effect to be evaluated based on a video frame sequence and an audio signal analyzed from the special effect video, that is, a specific implementation process of S102 may include: S1021-S1024, as follows:
S1021, determining an amplitude feature and a frequency domain feature for each audio frame in the audio signal, and detecting therefrom a start audio frame and an end audio frame of the special effect audio.
The electronic device parses a plurality of audio frames from the audio signal, extracts an amplitude feature and a frequency domain feature for each audio frame, determines from these features the audio frame interval in which the valid audio lies, and determines the start audio frame and the end audio frame of the special effect audio from the start frame and the end frame of that interval.
It is understood that the valid audio refers to audio other than noise; therefore, the valid audio is the special effect audio contained in the audio signal. In some embodiments, the electronic device may determine audio frames whose amplitude feature and frequency domain feature are both greater than the corresponding thresholds as audio frames of valid audio; in other embodiments, it may instead determine, as valid audio, the audio frame with the maximum amplitude feature and maximum frequency domain feature together with the audio frames within a preset duration after it.
It will also be appreciated that the amplitude features characterize the audio signal points in the audio frame in terms of their energy, e.g. in terms of the sound intensity perceived by the human ear. The amplitude characteristic may be an amplitude value, or a short-time energy, etc. The frequency domain features characterize the audio signal points in the audio frame in frequency, e.g. by which single frequency signals the audio signal points are synthesized. The frequency domain characteristic may be a frequency value, a short-time zero crossing rate, or the like, and the application is not limited herein.
S1022, calculate the audio start time according to the number of the start audio frame and the audio frame duration, and calculate the audio end time according to the number of the end audio frame and the audio frame duration.
The electronic device multiplies the number of the start audio frame by the audio frame duration, thereby determining after how many audio frame durations the special effect audio starts playing, and obtains the audio start time. Similarly, it multiplies the number of the end audio frame by the audio frame duration, thereby determining after how many audio frame durations the special effect audio finishes playing, and obtains the audio end time.
In the embodiment of the present application, the audio frame duration is obtained by dividing the number of audio signal points included in an audio frame by the audio sampling rate. The number of audio signal points indicates how many audio signal points make up one audio frame, and the audio sampling rate represents the number of audio signal points collected in 1 second. Dividing 1 by the audio sampling rate gives the time length corresponding to each audio signal point, and multiplying this time length by the number of audio signal points gives the audio frame duration. That is, when an audio frame includes N audio signal points, the audio frame duration is the ratio of N to the audio sampling rate.
Based on this, the embodiments of the present application provide formulas for calculating the audio start time and the audio end time, as shown in formulas (3) and (4):
T_A_start = (A_start × N) / A_sample_ratio (3)
T_A_end = (A_end × N) / A_sample_ratio (4)
wherein A_start is the number of the start audio frame, A_end is the number of the end audio frame, A_sample_ratio is the audio sampling rate, and N is the length of an audio frame, i.e. the number of audio signal points it includes.
S1023, based on the determination of the dynamic area of each video frame in the video frame sequence, detecting a starting point video frame and an end point video frame of the special effect animation.
The electronic device performs extraction and analysis of image features for each video frame to determine a dynamic region for each video frame where image content changes in the temporal dimension. When the special effect animation does not exist in the video frame sequence, the pictures of all the video frames in the video frame sequence do not change, and therefore no dynamic region exists.
It is understood that the foreground region refers to the region where dynamic content appears; therefore, the process of determining the dynamic region by the electronic device is a process of determining the foreground region. The electronic device may determine the foreground region of each video frame with a common foreground determination method. For example, it may use the frame difference method, subtracting each video frame from a corresponding reference video frame (which may be the previous frame or the first frame) to obtain the foreground region; or it may use Gaussian background modeling, representing the state of each pixel point with a Gaussian function and dividing each video frame into a foreground region and a background region according to the differences in pixel states.
S1024, calculating the animation starting time according to the number of the starting point video frame and the video frame duration, and calculating the animation ending time according to the number of the ending point video frame and the video frame duration.
After the start point video frame and the end point video frame are determined, the electronic device multiplies the number of the start point video frame by the video frame duration to determine after how many video frame durations the special effect animation starts playing, obtaining the animation start time; it likewise multiplies the number of the end point video frame by the video frame duration to determine after how many video frame durations the special effect animation finishes playing, obtaining the animation end time.
It is understood that the video frame duration is the ratio of 1 to the video frame rate, where the video frame rate refers to the number of video frames displayed per second.
Based on this, the embodiments of the present application provide formulas for the starting time of the animation and the ending time of the animation, see formulas (5) and (6):
T_V_start = V_start / V_frame_ratio (5)
T_V_end = V_end / V_frame_ratio (6)
wherein V_start is the number of the start point video frame, V_end is the number of the end point video frame, and V_frame_ratio is the video frame rate.
Further, based on formula (3), formula (4), formula (5), and formula (6), formula (1) and formula (2) may become:
Diff_start = abs(A_start × N / A_sample_ratio - V_start / V_frame_ratio) (7)
Diff_end = abs(A_end × N / A_sample_ratio - V_end / V_frame_ratio) (8)
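As a minimal sketch, formulas (3) to (8) may be implemented in Python as follows; the function and parameter names are assumptions introduced for this example:

    def time_differences(a_start, a_end, v_start, v_end,
                         samples_per_frame, audio_sample_rate, video_frame_rate):
        """Convert frame numbers to seconds, then take the absolute differences."""
        t_a_start = a_start * samples_per_frame / audio_sample_rate  # formula (3)
        t_a_end = a_end * samples_per_frame / audio_sample_rate      # formula (4)
        t_v_start = v_start / video_frame_rate                       # formula (5)
        t_v_end = v_end / video_frame_rate                           # formula (6)
        diff_start = abs(t_a_start - t_v_start)                      # formula (7)
        diff_end = abs(t_a_end - t_v_end)                            # formula (8)
        return diff_start, diff_end

For example, with a 44100Hz audio sampling rate, 1024-sample audio frames and a 60fps video, a start audio frame number of 43 and a start point video frame number of 60 give a first time difference of abs(43 × 1024 / 44100 - 60 / 60) ≈ 0.0015s, well below the thresholds discussed above.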
in the embodiment of the application, the electronic device respectively processes the audio signal and the image sequence to determine the start audio frame and the end audio frame from the audio signal, and determine the start video frame and the end video frame from the video frame sequence, and then according to the corresponding relation of the time information of the audio frames and the time information of the video frames, the audio start time, the audio end time, the animation start time and the animation end time with the accuracy of the frame duration level can be automatically obtained, so that the more accurate time information can be judged subsequently, and the evaluation result can be obtained.
In some embodiments of the present application, the amplitude feature includes: short-time energy; the frequency domain feature includes: the short-time zero-crossing rate. The short-time energy is a feature determined based on the sum of the squared amplitude values of the audio signal points, and can be used to distinguish whether sound is present in the audio signal; the short-time zero-crossing rate is the number of times the amplitude values of the audio signal points in each audio frame cross 0, and reflects changes in the waveform of the audio signal. At this time, determining the amplitude feature and the frequency domain feature for each audio frame in the audio signal and detecting the start audio frame and the end audio frame of the special effect audio may include S1021a-S1021c, as follows:
S1021a, determining the short-time energy and the short-time zero-crossing rate for each audio frame in the audio signal.
Each audio frame contains a plurality of audio signal points. The electronic device collects the amplitude values of these audio signal points, calculates the short-time energy of each audio frame by accumulating their squared amplitude values, and determines the short-time zero-crossing rate of each audio frame by applying a sign function to the amplitude value of each audio signal point and that of the preceding audio signal point.
Illustratively, equation (9) is a process for calculating the short-time energy provided by the embodiment of the present application:
E_n = Σ_{x=1}^{N} f(x)² (9)
wherein n represents the n-th frame, E_n represents the short-time energy of the n-th frame, f(x) represents the amplitude value of each audio signal point, and N represents the number of audio signal points in each audio frame.
Illustratively, equation (10) is a process for calculating the short-time zero-crossing rate provided by the embodiment of the present application:
Z_n = (1/2) Σ_{x=1}^{N} |sgn[f(x)] − sgn[f(x−1)]| (10)
wherein n represents the n-th frame, Z_n represents the short-time zero-crossing rate of the n-th frame, N represents the number of audio signal points in each audio frame, f(x) represents the amplitude value of each audio signal point, and sgn[·] is the sign function, defined as shown in formula (11):
sgn[x] = 1, x ≥ 0; sgn[x] = −1, x < 0 (11)
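The per-frame calculation of formulas (9) to (11) may be sketched with numpy as follows; the frame length parameter and function name are assumptions of this example:

    import numpy as np

    def frame_features(signal, frame_len):
        """Short-time energy (formula (9)) and short-time zero-crossing rate
        (formulas (10) and (11)) for each frame of a mono audio signal."""
        n_frames = len(signal) // frame_len
        energies = np.empty(n_frames)
        zcrs = np.empty(n_frames)
        for i in range(n_frames):
            frame = signal[i * frame_len:(i + 1) * frame_len].astype(np.float64)
            energies[i] = np.sum(frame ** 2)                # formula (9)
            signs = np.where(frame >= 0, 1, -1)             # sign function, formula (11)
            zcrs[i] = 0.5 * np.sum(np.abs(np.diff(signs)))  # formula (10)
        return energies, zcrs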
S1021b, screening out, from the audio signal, a plurality of valid audio frames whose short-time energy is greater than the energy threshold and whose short-time zero-crossing rate is greater than the zero-crossing rate threshold.
After obtaining the short-time energy and short-time zero-crossing rate of each audio frame, the electronic device compares the short-time energy with the energy threshold and the short-time zero-crossing rate with the zero-crossing rate threshold, and screens out from the audio frames of the audio signal those whose short-time energy is greater than the energy threshold and whose short-time zero-crossing rate is greater than the zero-crossing rate threshold; these are the valid audio frames, while audio frames in which either the short-time energy or the short-time zero-crossing rate is below the corresponding threshold are noise. In this manner, the electronic device obtains a plurality of valid audio frames.
For example, fig. 11 is a schematic diagram of the short-term energy of each audio frame provided in the embodiment of the present application, where the horizontal axis is the frame number 11-1 (value range 0-350+) of the audio frame, and the vertical axis is the short-term energy 11-2 (value range 0-200+) of the audio frame, in which case, the electronic device determines the first audio frame 11-31 in the plurality of valid audio frames 11-3 with short-term energy greater than the energy threshold as the start audio frame, and determines the last audio frame 11-32 in the plurality of valid audio frames 11-3 with short-term energy greater than the energy threshold as the end audio frame.
S1021c, determining a first valid audio frame of the plurality of valid audio frames as a start audio frame, and determining a last valid audio frame of the plurality of valid audio frames as an end audio frame.
The electronic device takes the audio frame with the smallest frame number among the plurality of valid audio frames, i.e. the first valid audio frame, as the start audio frame, and the audio frame with the largest frame number, i.e. the last valid audio frame, as the end audio frame, thereby completing the determination of the start audio frame and the end audio frame.
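A minimal sketch of S1021b and S1021c, assuming the per-frame short-time energy and zero-crossing rate arrays from the sketch above and illustrative threshold values:

    import numpy as np

    def detect_audio_endpoints(energies, zcrs, energy_threshold, zcr_threshold):
        """Dual-threshold screening; returns (start frame, end frame) or None."""
        valid = np.flatnonzero((energies > energy_threshold) & (zcrs > zcr_threshold))
        if valid.size == 0:
            return None  # no valid special effect audio detected
        return int(valid[0]), int(valid[-1])  # first and last valid audio frames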
In the embodiment of the application, by calculating the short-time energy and short-time zero-crossing rate of each audio frame, the electronic device can first screen out a plurality of valid audio frames, i.e. determine the valid audio portion of the audio signal, and then determine the start audio frame and the end audio frame from them. In this way the start and end of the special effect audio are accurate to the time unit corresponding to an audio frame, i.e. the audio frame duration, which improves the precision of the special effect synchronization test.
In some embodiments of the present application, the detecting a start video frame and an end video frame of the special effect animation based on the determining of the dynamic region for each video frame in the sequence of video frames, that is, the S1023 implementation process, may include: s1023a-S1023d, as follows:
S1023a, determining a first dynamic region for each video frame in the video frame sequence.
For each video frame, the electronic device subtracts the corresponding first adjacent video frame pixel point by pixel point, so that pixel points whose pixel values change are retained while pixel points whose pixel values do not change, i.e. background pixel points, are eliminated, thereby determining a candidate dynamic region for each video frame. Then, one or more operations such as dimension reduction, dilation and connected domain calculation are performed on the candidate dynamic region to obtain the first dynamic region of each video frame.
It can be understood that the dimension reduction operation may refer to binarization or downsampling, the dilation operation refers to range expansion of the dynamic region after dimension reduction, and the connected domain operation refers to merging of the dispersed dynamic regions to obtain a complete dynamic region.
S1023b, determining the first video frame in the video frame sequence that meets the animation start condition as the start point video frame of the special effect animation.
The animation start condition is that the area of the first dynamic region is larger than a first area threshold and the overlapping area between the first dynamic region and a preset region is larger than a second area threshold.
The electronic device calculates the area of the first dynamic region of each video frame and compares it with the first area threshold, so as to judge whether the first dynamic region reaches a certain area (noise elimination); meanwhile, it calculates the area of the overlapping part between the first dynamic region and the preset region and compares this overlapping area with the second area threshold, so as to judge whether the first dynamic region falls into the preset region. Then, the electronic device determines the video frames whose first dynamic region both reaches the required area and falls into the preset region as candidate start video frames, and determines the earliest of these, i.e. the first such video frame, as the start point video frame.
The preset region refers to a region where the picture content changes significantly once the special effect animation starts playing; for example, after a skill special effect is released, a picture such as a countdown starts to appear at the skill icon. Therefore, the electronic device uses the overlapping area between the first dynamic region and the preset region to assist in judging whether the special effect animation has started playing, so as to avoid misjudging the start point video frame (for example, to avoid misjudging a video frame in which some other action of another virtual object in the game starts to change as the start point video frame of the skill special effect animation).
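The animation start condition may be sketched as follows, representing the first dynamic region and the preset region as boolean pixel masks (an assumed representation chosen for this example):

    import numpy as np

    def meets_animation_start_condition(region_mask, preset_mask,
                                        first_area_threshold, second_area_threshold):
        """S1023b: the dynamic region must be large enough and must
        overlap the preset region (e.g. the skill icon area)."""
        area = np.count_nonzero(region_mask)
        overlap = np.count_nonzero(region_mask & preset_mask)
        return area > first_area_threshold and overlap > second_area_threshold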
S1023c, a second dynamic region is determined for a number of other video frames of the sequence of video frames following the starting video frame.
After determining the starting video frame, the electronic device determines all video frames arranged after the starting video frame as other video frames, so as to obtain a plurality of other video frames. Then, the electronic device subtracts each other video frame from a corresponding second adjacent video frame to determine a candidate dynamic region of the other video frame, and performs one or more of dimension reduction, erosion, connected domain calculation and the like on the candidate dynamic region, thereby obtaining a second dynamic region.
The dimension reduction operation can be binarization or downsampling, the erosion operation is to reduce the range of the dynamic area after dimension reduction, and the connected domain operation is to merge the dispersed dynamic areas to obtain a complete dynamic area.
It will be appreciated that the first adjacent video frame and the second adjacent video frame may be the same type of video frame, for example, the first adjacent video frame being the previous video frame of each video frame and the second adjacent video frame being the previous video frame of each other video frame. They may also be different types, for example, the first adjacent video frame being the previous video frame of each video frame while the second adjacent video frame is the video frame 3 frames before each other video frame, and so on; the application is not limited herein.
S1023d, when N consecutive still video frames are detected among the plurality of other video frames based on the second dynamic region, determining the N-th still video frame as the end point video frame.
The electronic device determines whether there is a continuous still video frame in the plurality of other video frames, that is, there is a video frame sequence with unchanged picture content, based on the second dynamic regions of the plurality of other video frames. When there are N consecutive still video frames in the plurality of other video frames, the electronic device determines the nth still video frame, i.e., the last still video frame, as the end point video frame. The electronic device obtains the starting video frame and the ending video frame so as to determine the evaluation result subsequently. It is understood that N is a positive integer greater than 1.
It should be noted that, in the embodiment of the present application, the still video frame is a video frame in which the area of the second dynamic region is smaller than or equal to the first area threshold, and the overlapping area of the second dynamic region and the preset region is smaller than or equal to the second area threshold. That is, when the area of the second dynamic region is larger than the first area threshold, or the overlapping area of the second dynamic region and the preset region is larger than the second area threshold, the video frame cannot be counted as a still video frame.
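S1023d may be sketched as a scan for N consecutive still video frames, where still_flags[i] is assumed to hold the result of the still-frame test above for the i-th other video frame:

    def find_end_frame(still_flags, n):
        """Return the index of the N-th consecutive still video frame, or None."""
        run = 0
        for i, still in enumerate(still_flags):
            run = run + 1 if still else 0
            if run == n:
                return i  # this N-th still frame is the end point video frame
        return None  # the picture never stays still long enough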
In the embodiment of the application, the electronic device determines a video frame whose first dynamic region is large enough and mostly falls within the preset region as the video frame at which the special effect animation starts, and then continues to search for consecutive still video frames after it. The start and end of the special effect animation can thus be located at the granularity of a video frame, from the frame at which the animation starts and the last of the consecutive still video frames. That is, the positioning precision of the special effect animation is raised to the time unit corresponding to a video frame, i.e. the video frame duration, which also improves the precision of the special effect synchronization evaluation.
In some embodiments of the present application, determining a first dynamic region for each video frame in the image sequence, i.e., a specific implementation of S1023a, may include: S201-S204, as follows:
s201, performing frame difference calculation on each video frame in the image sequence and a first adjacent video frame of each video frame to obtain a first frame difference image.
The electronic equipment acquires a corresponding first adjacent video frame aiming at each video frame, and then performs difference on each video frame and the corresponding first adjacent video frame to obtain a first frame difference image.
It is understood that the first adjacent video frame may be the previous video frame of each video frame, or a video frame m frames before it; the application is not limited herein. Further, since the special effect animation differs significantly between the frames just before and after it starts, the electronic device may set a small frame interval between each video frame and its first adjacent video frame, for example only 1 or 2 frames.
Illustratively, formula (12) provides the calculation for the first frame difference image:
D_n(x, y) = |f_n(x, y) − f_(n−1)(x, y)| (12)
wherein f_n(x, y) is the pixel value of each pixel point of each video frame, f_(n−1)(x, y) is the pixel value of each pixel point of the previous video frame, and D_n(x, y) is the first frame difference image.
Illustratively, fig. 12 is a schematic diagram of generating a first frame difference image according to an embodiment of the present application. The electronic device performs a difference between the video frame 12-1 in the image sequence and the previous video frame of the video frame, i.e. the video frame 12-2, to obtain a first frame difference image 12-3. The difference between video frame 12-1 and video frame 12-2 is shown in first frame difference image 12-3.
S202, performing dimensionality reduction on the first frame difference image to obtain a first dimensionality reduction image.
The electronic device performs binarization on the first frame difference image to realize the dimension reduction: pixel points whose value is greater than the pixel threshold are set to the maximum value, e.g. 255, and pixel points whose value is less than or equal to the pixel threshold are set to the minimum value, e.g. 0, thereby distinguishing the foreground and background of the first frame difference image; the resulting image is the first dimension-reduced image.
For example, fig. 13 is a schematic diagram of a dimension reduction process provided in an embodiment of the present application. The electronic device generates a first frame difference image 13-2 for the video frame 13-1, and then performs binarization processing on the first frame difference image 13-2 to obtain a first dimension-reduced image 13-3.
It is understood that the pixel threshold may be a fixed threshold, or may be set by the electronic device according to the condition of the first dimension-reduced image, such as a median, and the like, and the application is not limited herein.
In some embodiments, in order to eliminate noise and make the quality of the first dimension-reduced image better, the electronic device may further perform a filtering process on the binarized image obtained by the binarization process, so as to obtain the first dimension-reduced image. It is understood that the electronic device may perform median filtering, gaussian filtering, or the like on the binarized image to obtain the first reduced-dimension image.
Median filtering sorts the pixels in the whole neighborhood of a pixel point and takes the pixel value at the middle position of the sequence as the pixel value of the current point. Illustratively, fig. 14 is a schematic process diagram of median filtering provided in an embodiment of the present application. Referring to fig. 14, the pixel value of the pixel point 14-1 is 200, and the pixel values of its 8 adjacent pixel points are 200, 100, 50, 195, 190, 200, 198, and 200. The electronic device sorts these 9 pixel values to obtain 50, 100, 190, 195, 198, 200, 200, 200, and 200, and takes the median 198 as the pixel value of the pixel point 14-1, completing the median filtering.
Fig. 15 is a comparison graph of the effects before and after the median filtering provided by the embodiment of the present application. Taking the first frame difference image 13-2 in fig. 13 as an example, it can be seen that the foreground region 15-1 in the first frame difference image 13-2 before median filtering is not smooth, and the foreground region 15-21 in the filtered image 15-2 obtained by median filtering the first frame difference image 13-2 is smoother and less noisy.
S203, expanding the image area with the brightness larger than the brightness threshold value in the first dimension reduction image to obtain an expanded area.
The electronic device compares the brightness of each pixel point in the first dimension-reduced image with a brightness threshold to find the image area whose brightness is greater than the brightness threshold, i.e. the highlight area of the first dimension-reduced image. Then, the electronic device expands the highlight area to increase its area, realizing the dilation processing of the image area whose brightness is greater than the brightness threshold and obtaining the expansion region. Through the expansion region it can be found in time that the special effect animation has started (for example, the initial picture change of some special effect animations is small).
It is understood that the calculation process of the dilation process for the image with brightness greater than the brightness threshold may be as shown in equation (13):
A ⊕ B = { x | (B)_x ∩ A ≠ ∅ } (13)
where A is the first dimension-reduced image, B is the convolution template (structuring element), and (B)_x is the template B translated to pixel position x; a position x belongs to the dilated result when the translated template overlaps A.
For example, fig. 16 is a schematic diagram illustrating the effect of the expansion process provided in the embodiment of the present application. The electronic device performs dilation on the first dimension-reduced image 16-1, so that the area of the highlight region 16-11 is enlarged to become the highlight region 16-21 in the image 16-2, and the highlight region 16-21 expands.
S204, calculating the connected domain of the expansion region to obtain the first dynamic region.
The electronic device may directly use the connected domain of the expansion region as the first dynamic region, or may enlarge the region where the connected domain of the expansion region is located by several times to obtain the first dynamic region.
It should be noted that a connected domain is generally an image region composed of adjacent foreground pixel points having the same pixel value. In this embodiment, the electronic device may calculate the connected domain of the expansion region based on a seed filling algorithm: it selects a foreground pixel point in the expansion region as a seed, then merges the foreground pixel points that have the same pixel value as the seed and are adjacent to it into the same pixel point set, and the resulting pixel point set is a connected domain of the expansion region.
Illustratively, FIG. 17 is a schematic view of a connected domain of an expanded region provided by an embodiment of the present application. The electronic device performs connected component calculation on the dilated area 17-11 in the first dimension-reduced image 17-1 to obtain the connected component 17-12 in the first dimension-reduced image 17-1.
In the embodiment of the application, the electronic device may first calculate the frame difference to preliminarily determine the changed region in each video frame and obtain the first frame difference image, and then perform image processing such as dimension reduction, dilation and connected domain calculation on the first frame difference image to eliminate noise and merge the discrete regions produced by the frame difference calculation, thereby obtaining a more accurate first dynamic region.
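As an illustrative sketch only, S201-S204 may be combined with OpenCV as follows; the 5×5 template and the pixel threshold of 30 are assumptions of this example rather than values given by the embodiment:

    import cv2
    import numpy as np

    def first_dynamic_region(frame, prev_frame, pixel_threshold=30):
        """Frame difference, binarization with median filtering, dilation,
        then connected domain labelling; non-zero labels mark dynamic regions."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray_prev = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(gray, gray_prev)                        # formula (12)
        _, binary = cv2.threshold(diff, pixel_threshold, 255, cv2.THRESH_BINARY)
        binary = cv2.medianBlur(binary, 5)                         # noise removal
        dilated = cv2.dilate(binary, np.ones((5, 5), np.uint8))    # formula (13)
        _, labels = cv2.connectedComponents(dilated)               # S204
        return labels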
In some embodiments of the present application, the determining the second dynamic region for a plurality of other video frames in the image sequence after the starting video frame, respectively, that is, the implementation process of S1023c, may include: S205-S208, as follows:
and S205, performing frame difference calculation on each other video frame and a second adjacent video frame of each other video frame to obtain a second frame difference image.
The second adjacent video frame may be a video frame some frames before the other video frame. Further, the interval between an other video frame and its second adjacent video frame is larger than the interval between each video frame and its first adjacent video frame; for example, when the first adjacent video frame is the previous video frame, the second adjacent video frame is the video frame 3 frames earlier. This is because the end of a special effect animation generally changes very slowly: if the frame difference is calculated against a second adjacent video frame at a small interval, a video frame may be judged as a still video frame before the change has actually completed, so a video frame at a larger interval needs to be selected as the second adjacent video frame for the frame difference calculation, reducing false reports.
S206, performing dimension reduction on the second frame difference image to obtain a second dimension-reduced image.
It is understood that the process is similar to the processing process of S202, and is not described in detail herein.
S207, eroding the image area whose brightness is larger than the brightness threshold in the second dimension-reduced image to obtain an eroded region.
Since the embodiment of the present application searches for still video frames, and a still video frame requires a sufficiently small change from the previous video frame, the electronic device needs to erode the highlight region in the second dimension-reduced image when determining the still video frame sequence, so as to eliminate noise interference as much as possible.
Erosion refers to moving the template element over the whole second dimension-reduced image: only where the pixel values covered by the template element all match is the pixel value of the anchor pixel point retained; the pixel values of other pixel points are cleared.
S208, calculating the connected domain of the eroded region to obtain the second dynamic region.
It is understood that the process is similar to the process of S204, and is not described herein again.
In the embodiment of the application, the electronic device may first calculate the frame difference to preliminarily determine the changed region in each other video frame and obtain the second frame difference image, and then perform image processing such as dimension reduction, erosion and connected domain calculation on the second frame difference image to eliminate noise interference, thereby obtaining a more accurate second dynamic region.
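The counterpart for S205-S208 differs only in using a reference frame several frames back and in eroding rather than dilating; again a sketch under the same assumptions as above:

    import cv2
    import numpy as np

    def second_dynamic_region(frame, earlier_frame, pixel_threshold=30):
        """Like first_dynamic_region, but differenced against an earlier
        frame and eroded so that only clear changes survive."""
        diff = cv2.absdiff(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY),
                           cv2.cvtColor(earlier_frame, cv2.COLOR_BGR2GRAY))
        _, binary = cv2.threshold(diff, pixel_threshold, 255, cv2.THRESH_BINARY)
        binary = cv2.medianBlur(binary, 5)
        eroded = cv2.erode(binary, np.ones((5, 5), np.uint8))  # S207
        _, labels = cv2.connectedComponents(eroded)            # S208
        return labels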
In some embodiments of the present application, the animation time information includes: the animation duration of the special effect animation; the audio time information includes: the audio duration of the special effect audio. At this time, determining the evaluation result according to the difference between the animation time information and the audio time information, that is, the specific implementation process of S103, may include: S1033-S1034, as follows:
S1033, calculating a third time difference between the animation duration and the audio duration.
S1034, determining an evaluation result according to the magnitude relation between the third time difference and the third time threshold.
The electronic device may take the difference between the animation duration and the audio duration to obtain the third time difference. When the third time difference is less than or equal to the third time threshold, it determines that the evaluation result is that the special effect animation and the special effect audio are synchronized; when the third time difference is greater than the third time threshold, it determines that the evaluation result is that they are not synchronized.
The third time threshold may be set according to actual conditions, and the application is not limited herein.
In the embodiment of the application, the electronic device can also directly judge whether the animation duration of the special effect animation is the same as the audio duration of the special effect audio without considering the starting time and the ending time, so that the judgment is simpler and more convenient.
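A sketch of the duration-only judgment of S1033-S1034; the 0.1s default threshold is an illustrative assumption:

    def evaluate_by_duration(animation_duration, audio_duration, third_threshold=0.1):
        """Compare only the total durations of the animation and the audio."""
        third_diff = abs(animation_duration - audio_duration)  # S1033
        if third_diff <= third_threshold:                      # S1034
            return "synchronized"
        return "not synchronized"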
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
The embodiment of the application is realized in the scene of testing the special effect synchronization of the skill special effect (special effect to be evaluated) of the game, namely, whether a picture (special effect animation) of the skill special effect and sound (special effect audio) of the skill special effect are synchronous or not is judged.
Referring to fig. 18, fig. 18 is a schematic diagram of a system architecture for performing special effect synchronization test according to an embodiment of the present application. In the system, a tester can upload 18-1 recorded skill video 18-2 (characteristic video of special effect to be evaluated) to a Web platform 18-A (terminal), when the tester operates on the Web platform 18-A to trigger the start of test (in response to the trigger operation of the evaluation trigger mark in the evaluation interface), the Web platform 18-A sends the skill video 18-2 to a background 18-B (server), and the background 18-B automatically analyzes the time difference between the start time and the end time of audio and picture through consistency calculation 18-3, wherein the larger the time difference is, the more asynchronous the audio and picture of the skill is, and the worse the consistency is. The back office 18-B then performs data storage 18-4 on the consistency results and feeds back to the Web platform 18-A to display the consistency results 18-5 (in a result display area of the evaluation interface, test results are displayed).
Fig. 19 is a schematic process diagram of the consistency calculation provided in the embodiment of the present application. Referring to fig. 19, the process includes:
S301, reading in a skill video (acquiring the special effect video of the special effect to be evaluated).
S302, separating the picture and the audio (parsing out an image sequence and an audio signal).
S303, calculating the short-time zero-crossing rate.
S304, preliminarily calculating an audio start point t1 and an audio end point t2.
S305, calculating the short-time energy.
S306, accurately calculating the audio start point t1 and the audio end point t2 (the start audio frame and the end audio frame).
S307, calculating the image frame difference.
S308, binarizing the frame difference image.
S309, dilating/eroding the binary image.
S310, detecting connected domains.
S311, judging the positions of the connected domains.
S312, detecting the skill release frame t3 and the end frame t4 (the start point video frame and the end point video frame).
S313, consistency judgment. The consistency judgment process can be realized by formulas (1) and (2): when either the time difference between the audio start point and the picture start point of the skill (the first time difference) or the time difference between the audio end point and the picture end point of the skill (the second time difference) is larger than a given time threshold (time difference threshold), the background determines that the audio and the picture of the skill are not synchronized, and the skill special effect needs to be adjusted.
Among these steps, S303-S306 and S307-S312 are performed in parallel.
Next, each processing procedure of the consistency detection will be described.
For the detection of the audio start point and end point, the background uses a double-threshold method based on short-time energy and the short-time zero-crossing rate: when both the short-time energy and the short-time zero-crossing rate of an audio frame are greater than the specified thresholds, the audio frame is judged to be a valid skill audio signal (screening out a plurality of valid audio frames whose short-time energy is greater than the energy threshold and whose short-time zero-crossing rate is greater than the zero-crossing rate threshold); signals outside the double thresholds are basically noise.
Illustratively, referring to fig. 20, fig. 20 is a schematic diagram of a process for preliminarily detecting a start point and an end point of a skill audio signal according to short-time energy according to an embodiment of the present application, where the process includes:
S401, for the audio sequence Xn, calculating the short-time energy En.
It should be noted that the short-time energy represents the energy of each audio frame and characterizes the strength of the audio signal at different time points; therefore, the short-time energy of a valid skill audio signal needs to be greater than the threshold T. The short-time energy can be calculated by formula (9).
S402, judging whether En is larger than a threshold value T. If yes, S403, S404 and S405 are respectively executed.
S403, judging whether S is equal to 0. If yes, S406 is performed.
Here, S is used to record the start point of the valid audio.
S404, judging whether E is equal to N. If yes, S408 is executed; if no, S407 is executed.
Here, E is used to record the end point of the valid audio, and N is the total number of audio frames of the audio signal.
S405, judging whether n is smaller than N. If yes, S408 is performed.
S406, updating the start point: S = n.
S407, judging whether n is larger than E. If yes, S408 is performed.
S408, updating the end point: E = n.
As can be seen from fig. 20, the background calculates the short-time energy of each audio frame so as to find the first audio frame whose short-time energy is greater than the threshold T, taken as the audio start point S1 obtained based on short-time energy, and the last audio frame whose short-time energy is greater than the threshold T, taken as the audio end point E1 obtained based on short-time energy.
Next, the background uses the same method to obtain an audio start point S2 and an audio end point E2 based on detection of the short-time zero-crossing rate. Normally, S1 and S2, and E1 and E2, are relatively close to each other; however, to avoid noise interference, the background takes the start point S and end point E of the intersection of the two intervals (the plurality of valid audio frames; the first valid audio frame is determined as the start audio frame and the last as the end audio frame), i.e. the interval satisfying both the short-time zero-crossing rate threshold and the short-time energy threshold, as the real skill release interval. In this way the time at which the skill special effect audio starts playing (the audio start time) and the time at which it finishes playing (the audio end time) are obtained.
The short-time zero-crossing rate represents the number of times the amplitude of the signal crosses the 0 point within a short time and characterizes the frequency domain features of the audio signal. The short-time zero-crossing rate can be calculated by formula (10).
Fig. 21 is a schematic process diagram for determining a skill release frame and an end frame according to an embodiment of the present application. Referring to fig. 21, the process includes:
S501, judging whether a skill release start flag exists before the video frame. If not, S502 is executed; if yes, S510 is executed.
S502, a frame difference image diff1 (first frame difference image) is calculated.
By calculating the frame difference image diff1 between a video frame and its previous video frame, the difference between the two images can be obtained. The background uses different frame intervals when detecting the skill special effect start frame (start point video frame) and the special effect end frame (end point video frame): when detecting the special effect start frame, it detects the change at the moment the skill release button is pressed, and the picture differs obviously before and after the press (for example, numbers such as a countdown appear at the skill icon), so a smaller frame interval is used. The process of calculating the frame difference image diff1 can be realized by formula (12).
S503, median filtering the frame difference image diff1.
The purpose of the median filtering is to remove noise from the frame difference image diff1.
S504, binarizing the filtered image (performing dimension reduction on the first frame difference image to obtain a first dimension-reduced image).
S505, dilating the binary image.
The purpose of the dilation operation on the binary image is to find the maximum extent of the highlight regions in the binary image (a dilated region is obtained by dilating the image regions whose brightness is greater than a brightness threshold), that is, to "expand the territory" of the highlight regions of the binary image, so as to merge the multiple discrete regions into which the frame difference image was divided.
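Steps S503 to S505 can be sketched with standard OpenCV operations as follows; the median-filter kernel size, the dilation kernel and the brightness threshold are illustrative assumptions rather than values from the embodiment.

```python
import cv2
import numpy as np

def preprocess_diff(diff, brightness_threshold=30):
    filtered = cv2.medianBlur(diff, 5)                 # S503: remove noise
    _, binary = cv2.threshold(filtered, brightness_threshold,
                              255, cv2.THRESH_BINARY)  # S504: binarize ("dimension reduction")
    kernel = np.ones((5, 5), np.uint8)
    return cv2.dilate(binary, kernel)                  # S505: dilate the highlight regions
```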
S506, calculating the connected domains of the binary image (calculating the connected domains of the dilated region to obtain a first dynamic region).
The step may include the following steps:
1) The background traverses the pixel points of the binary image to find a pixel point with A(x, y) = 1. Specifically, the process may include: a. taking A(x, y) as a seed, the background records the position of the seed and pushes the points adjacent to the seed that have the same pixel value onto a stack; b. popping the pixel at the top of the stack, and then pushing the pixel points adjacent to it that have the same pixel value onto the stack; c. repeating b until the stack is empty.
2) Repeating 1) until the scan ends.
After the scanning is finished, all connected domains in the video frame are obtained.
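The following is a hedged sketch of the stack-based seed filling described above, assuming 4-connectivity and a 0/1 binary image; in practice the same labelling can also be obtained in one call with cv2.connectedComponents.

```python
import numpy as np

def connected_domains(binary: np.ndarray):
    """Collect the pixel coordinates of each connected domain by seed filling."""
    h, w = binary.shape
    visited = np.zeros((h, w), dtype=bool)
    domains = []
    for y in range(h):
        for x in range(w):
            if binary[y, x] == 1 and not visited[y, x]:   # found a seed A(x, y) = 1
                stack, region = [(y, x)], []
                visited[y, x] = True
                while stack:                              # pop the top, push equal neighbours
                    cy, cx = stack.pop()
                    region.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and \
                                binary[ny, nx] == 1 and not visited[ny, nx]:
                            visited[ny, nx] = True
                            stack.append((ny, nx))
                domains.append(region)
    return domains
```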
S507, judging whether the position of the connected domain is located in the skill button area (preset area). If yes, S508 is executed; otherwise, the process ends.
The skill button is cropped from the game interface, and its center and radius are recorded to obtain the skill button area. Illustratively, fig. 22 is a schematic diagram of a skill button area provided by an embodiment of the present application, in which the skill button area 22-11 is located at the lower right corner of the game interface 22-1.
The background then calculates the proportion of the pixel points of the connected domain that fall into the skill button area; when the proportion is greater than a threshold, for example 80%, the connected domain is judged to be the skill icon special effect.
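A minimal sketch of the S507 check, assuming the skill button area is the circle given by the recorded center and radius, with 80% as the example threshold mentioned above:

```python
def in_button_area(region, center, radius, ratio_threshold=0.8):
    """True if enough pixels of the connected domain fall inside the button circle."""
    cx, cy = center
    inside = sum(1 for (y, x) in region
                 if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2)
    return inside / len(region) > ratio_threshold
```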
S508, judging whether the area of the connected domain is larger than a threshold. If yes, S509 is executed; otherwise, the process ends.
The purpose of this step is to eliminate slight interference and make the result more accurate.
S509, marking the skill release start flag (the video frame in which the area of the first dynamic region is greater than the first area threshold and the overlapping area of the first dynamic region and the preset region is greater than the second area threshold is determined as the start point video frame).
S510, a frame difference image diff2 (second frame difference image) is calculated.
S511, median filtering the frame difference image diff2.
S512, binarizing the filtered image (obtaining a second dimension-reduced image).
S513, eroding the binary image (obtaining an eroded region).
S514, calculating the connected domains of the binary image (obtaining a second dynamic region).
S515, judging whether the position of the connected domain is located in the skill button area. If yes, S516 is executed; otherwise, S517 is executed.
S516, clearing the count of consecutive still frames.
S517, judging whether the area of the connected domain is larger than a threshold. If yes, S516 is executed; otherwise, S518 is executed.
S518, increasing the count of consecutive still frames by 1.
S519, judging whether the count of consecutive still frames reaches a threshold (N consecutive still video frames). If yes, S520 is executed.
S520, marking the skill release end flag.
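The loop S510 to S520 can be sketched as follows, assuming a per-frame boolean that is True while the frame still shows special-effect motion (a connected domain inside the button area, or one whose area exceeds the threshold):

```python
def find_end_frame(frames_active, still_threshold):
    """Return the index of the N-th consecutive still frame, or None."""
    still = 0
    for t, active in enumerate(frames_active):
        if active:
            still = 0                    # S516: clear the consecutive-still-frame count
        else:
            still += 1                   # S518: one more still frame
            if still >= still_threshold:
                return t                 # S519/S520: mark the skill release end
    return None
```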
In this way, whether the audio and the picture of a skill special effect are synchronized can be determined automatically, which improves the efficiency of special effect synchronization testing; the time difference is accurate to the frame level, which improves the precision of special effect synchronization testing; and the degree of intelligence of special effect synchronization testing is increased.
Continuing with the exemplary structure of the special effect synchronization evaluation apparatus 555 provided by the embodiment of the present application implemented as software modules, in some embodiments, as shown in fig. 2, the software modules stored in the memory 550 of the special effect synchronization evaluation apparatus 555 may include:
the video acquiring module 5551 is configured to, in response to a trigger operation on an evaluation trigger mark in an evaluation interface, acquire a special effect video of a special effect to be evaluated;
a time determining module 5552, configured to determine animation time information corresponding to a special effect animation of the special effect to be evaluated and audio time information corresponding to a special effect audio of the special effect to be evaluated based on a video frame sequence and an audio signal analyzed from the special effect video; the animation time information is time information with the precision of video frame duration, and the audio time information is time information with the precision of audio frame duration;
a result generating module 5553, configured to determine an evaluation result according to a difference between the animation time information and the audio time information; the evaluation result represents the synchronization condition of the special effect animation and the special effect audio;
a result displaying module 5554, configured to display the evaluation result in a result displaying area of the evaluation interface.
In some embodiments of the present application, the special effects to be evaluated include a plurality of skill special effects; the video acquiring module 5551 is further configured to obtain multiple videos uploaded to a skill video area of the evaluation interface, so as to obtain the special effect videos of the plurality of skill special effects; or to release the skills corresponding to the skill special effects in a virtual scene displayed in a virtual interaction area of the evaluation interface, display the skill special effects, and record the virtual scene as video to obtain the special effect videos of the skill special effects.
In some embodiments of the present application, the evaluation result includes: a plurality of sub-evaluation results corresponding to the respective special effect videos of the plurality of skill special effects; the special effect synchronization evaluation apparatus 555 further includes: a result comparison module 5555;
the result comparison module 5555 is configured to find the special effect to be repaired from the plurality of skill special effects based on comparison among the plurality of sub-evaluation results;
the result display module 5554 is further configured to display the identification information corresponding to the special effect to be repaired in a repair prompt area of the evaluation interface.
In some embodiments of the present application, the animation time information includes: the animation start time and the animation end time of the special effect animation, and the audio time information comprises: the audio starting time and the audio ending time of the special effect audio;
the result generation module 5553 is further configured to calculate a first time difference between the animation start time and the audio start time, and a second time difference between the animation end time and the audio end time; and determining the evaluation result according to at least one of the first time difference and the second time difference.
In some embodiments of the present application, the result generation module 5553 is further configured to determine that the evaluation result is that the special effect animation is synchronized with the special effect audio when the first time difference is less than or equal to a first time threshold and the second time difference is less than or equal to a second time threshold; when the first time difference is larger than the first time threshold or the second time difference is larger than the second time threshold, determining that the evaluation result is that the special effect animation is not synchronous with the special effect audio.
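A hedged sketch of this decision rule follows; the two threshold values are placeholders chosen for illustration, not values from the embodiment.

```python
def evaluate_sync(anim_start, audio_start, anim_end, audio_end,
                  first_threshold=0.1, second_threshold=0.1):
    """True when both the start and end time differences are within tolerance."""
    first_diff = abs(anim_start - audio_start)    # first time difference
    second_diff = abs(anim_end - audio_end)       # second time difference
    return first_diff <= first_threshold and second_diff <= second_threshold
```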
In some embodiments of the present application, the time determination module 5552 is further configured to detect a start audio frame and an end audio frame of the special effect audio based on the determination of the amplitude feature and the frequency domain feature for each audio frame in the audio signal; calculating the audio starting time according to the number of the starting audio frame and the audio frame duration, and calculating the audio ending time according to the number of the ending audio frame and the audio frame duration; detecting a start point video frame and an end point video frame of the special effect animation based on determination of a dynamic region of each video frame in the video frame sequence; and calculating the animation starting time according to the number of the starting video frame and the video frame duration, and calculating the animation ending time according to the number of the ending video frame and the video frame duration.
In some embodiments of the present application, the amplitude signature comprises: short-time energy, the frequency domain features comprising: short-time zero-crossing rate; the time determination module 5552 is further configured to determine the short-time energy and the short-time zero-crossing rate for each of the audio frames in the audio signal; screening out a plurality of effective audio frames of which the short-time energy is greater than an energy threshold value and the short-time zero-crossing rate is greater than a zero-crossing rate threshold value from the audio signals; determining a first valid audio frame of the plurality of valid audio frames as the start audio frame, and determining a last valid audio frame of the plurality of valid audio frames as the end audio frame.
In some embodiments of the present application, the time determination module 5552 is further configured to determine, for each of the video frames in the sequence of video frames, a first dynamic region; determining a first video frame meeting an animation starting condition in the video frame sequence as the starting video frame of the special effect animation, wherein the animation starting condition is that the area of the first dynamic region is larger than a first area threshold value, and the overlapping area of the first dynamic region and a preset region is larger than a second area threshold value; respectively determining a second dynamic region for a plurality of other video frames positioned after the starting point video frame in the video frame sequence; when N continuous static video frames are extracted from the other video frames based on the second dynamic area, determining the Nth static video frame as the end point video frame; and N is a positive integer greater than 1, the still video frame is a video frame in which the area of the second dynamic region is less than or equal to the first area threshold, and the overlapping area of the second dynamic region and the preset region is less than or equal to the second area threshold.
In some embodiments of the present application, the time determining module 5552 is further configured to: perform frame difference calculation on each of the video frames in the sequence of video frames and a first adjacent video frame of each of the video frames to obtain a first frame difference image; perform dimension reduction processing on the first frame difference image to obtain a first dimension-reduced image; dilate the image area whose brightness is greater than a brightness threshold in the first dimension-reduced image to obtain a dilated region; and calculate a connected domain for the dilated region to obtain the first dynamic region.
In some embodiments of the present application, the time determining module 5552 is further configured to: perform frame difference calculation on each of the other video frames and a second adjacent video frame of each of the other video frames to obtain a second frame difference image; perform dimension reduction processing on the second frame difference image to obtain a second dimension-reduced image; erode the image area whose brightness is greater than the brightness threshold in the second dimension-reduced image to obtain an eroded region; and calculate a connected domain for the eroded region to obtain the second dynamic region.
In some embodiments of the present application, the audio frame includes N audio signal points, and the duration of the audio frame is the ratio of N to the audio sampling rate.
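As a small worked example of this relation (the sample count and sampling rate are illustrative): an audio frame of N sample points at sampling rate fs lasts N / fs seconds, so a frame number converts to a time stamp by multiplying by the frame duration.

```python
def audio_frame_duration(n_points: int, sample_rate: int) -> float:
    """Duration of one audio frame: N / fs, e.g. 1024 / 44100 ≈ 0.0232 s."""
    return n_points / sample_rate

def frame_time(frame_number: int, frame_duration: float) -> float:
    """Audio start/end time from the frame number and per-frame duration."""
    return frame_number * frame_duration
```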
In some embodiments of the present application, the animation time information includes: the animation duration of the special effect animation, and the audio time information includes: the audio duration of the special effect audio; the result generation module 5553 is further configured to calculate a third time difference between the animation duration and the audio duration; and determine the evaluation result according to the magnitude relation between the third time difference and a third time threshold.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes the special effect synchronization evaluation method described in the embodiment of the present application.
Embodiments of the present application provide a computer-readable storage medium storing executable instructions, which when executed by a processor, cause the processor to perform a special effect synchronization evaluation method provided by embodiments of the present application, for example, a special effect synchronization evaluation method as shown in fig. 3.
In some embodiments, the computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disc, or CD-ROM; or may be any device including one of or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may, but do not necessarily, correspond to files in a file system, and may be stored in a portion of a file that holds other programs or data, for example, in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device (electronic device), or on multiple computing devices located at one site, or distributed across multiple sites and interconnected by a communication network.
In summary, in the embodiments of the present application, the electronic device analyzes the image sequence and the audio signal of the special effect video to obtain time information of the special effect animation with a precision of the video frame duration and time information of the special effect audio with a precision of the audio frame duration; that is, the time information corresponding to the special effect animation and to the special effect audio is determined directly at the frame-duration level, which achieves higher precision. This high-precision time information is then compared precisely, so that synchronization problems at the frame-duration level, which are not easily perceived, can be found; in other words, special effect asynchronization problems of various degrees of severity can be detected. The precision of special effect synchronization evaluation is thereby greatly improved, and the whole process is realized automatically without repeated manual operations, which accelerates special effect synchronization evaluation. In summary, the special effect synchronization evaluation method provided by the embodiments of the present application improves the precision and the efficiency of special effect synchronization evaluation, and finally improves its degree of intelligence. Furthermore, with the special effect synchronization evaluation method of the embodiments of the present application, the possibility that a special effect synchronization problem is perceived by a player can be reduced, and the special effects that need to be modified can be compared against one another, further improving the degree of intelligence of special effect synchronization evaluation.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (14)

1. A special effect synchronous evaluation method is characterized by comprising the following steps:
responding to a trigger operation aiming at an evaluation trigger mark in an evaluation interface, and acquiring a special effect video of a special effect to be evaluated;
determining animation time information corresponding to a special effect animation of the special effect to be evaluated and audio time information corresponding to a special effect audio of the special effect to be evaluated based on a video frame sequence and an audio signal analyzed from the special effect video;
the animation time information is time information with the precision of video frame duration, and the audio time information is time information with the precision of audio frame duration;
determining an evaluation result according to the difference between the animation time information and the audio time information; the evaluation result represents the synchronization condition of the special effect animation and the special effect audio;
and displaying the evaluation result in a result display area of the evaluation interface.
2. The method of claim 1, wherein the animation time information comprises: the animation start time and the animation end time of the special effect animation, and the audio time information comprises: the audio starting time and the audio ending time of the special effect audio;
determining an evaluation result according to the difference between the animation time information and the audio time information, including:
calculating a first time difference between the animation start time and the audio start time, and a second time difference between the animation end time and the audio end time;
and determining the evaluation result according to at least one of the first time difference and the second time difference.
3. The method of claim 2, wherein determining the evaluation result based on at least one of the first time difference and the second time difference comprises:
when the first time difference is smaller than or equal to a first time threshold value and the second time difference is smaller than or equal to a second time threshold value, determining that the evaluation result is that the special effect animation is synchronous with the special effect audio;
when the first time difference is larger than the first time threshold or the second time difference is larger than the second time threshold, determining that the evaluation result is that the special effect animation is not synchronous with the special effect audio.
4. The method according to claim 2 or 3, wherein the determining animation time information corresponding to the special effect animation of the special effect to be evaluated and audio time information corresponding to the special effect audio of the special effect to be evaluated based on the video frame sequence and the audio signal analyzed from the special effect video comprises:
detecting a starting audio frame and an end audio frame of the special effect audio based on the determination of amplitude characteristics and frequency domain characteristics of each audio frame in the audio signal;
calculating the audio starting time according to the number of the starting audio frame and the audio frame duration, and calculating the audio ending time according to the number of the ending audio frame and the audio frame duration;
detecting a start point video frame and an end point video frame of the special effect animation based on determination of a dynamic region of each video frame in the video frame sequence;
and calculating the animation starting time according to the number of the starting video frame and the video frame duration, and calculating the animation ending time according to the number of the ending video frame and the video frame duration.
5. The method of claim 4, wherein the amplitude signature comprises: short-time energy, the frequency domain features comprising: short-time zero-crossing rate;
the detecting a start audio frame and an end audio frame of the special effect audio based on determining an amplitude feature and a frequency domain feature for each audio frame in the audio signal comprises:
determining the short-time energy and the short-time zero-crossing rate for each of the audio frames in the audio signal;
screening out a plurality of effective audio frames of which the short-time energy is greater than an energy threshold value and the short-time zero-crossing rate is greater than a zero-crossing rate threshold value from the audio signals;
determining a first valid audio frame of the plurality of valid audio frames as the start audio frame, and determining a last valid audio frame of the plurality of valid audio frames as the end audio frame.
6. The method of claim 4, wherein detecting the start video frame and the end video frame of the special effect animation based on determining a dynamic region for each video frame in the sequence of video frames comprises:
determining a first dynamic region for each of the video frames in the sequence of video frames;
determining a first video frame meeting an animation starting condition in the video frame sequence as the starting point video frame of the special effect animation; wherein the animation starting condition is that the area of the first dynamic region is greater than a first area threshold and the overlapping area of the first dynamic region and a preset region is greater than a second area threshold;
respectively determining a second dynamic region for a plurality of other video frames positioned after the starting point video frame in the video frame sequence;
when N continuous static video frames are extracted from the other video frames based on the second dynamic area, determining the Nth static video frame as the end point video frame;
and N is a positive integer greater than 1, the still video frame is a video frame in which the area of the second dynamic region is less than or equal to the first area threshold, and the overlapping area of the second dynamic region and the preset region is less than or equal to the second area threshold.
7. The method of claim 6, wherein determining a first dynamic region for each of the video frames in the sequence of video frames comprises:
performing frame difference calculation on each video frame in the video frame sequence and a first adjacent video frame of each video frame to obtain a first frame difference image;
performing dimension reduction processing on the first frame difference image to obtain a first dimension-reduced image;
dilating an image area whose brightness is greater than a brightness threshold in the first dimension-reduced image to obtain a dilated region;
and calculating a connected domain for the dilated region to obtain the first dynamic region.
8. The method according to claim 6, wherein the determining a second dynamic region for each of a plurality of other video frames of the sequence of video frames that follow the starting video frame comprises:
performing frame difference calculation on each other video frame and a second adjacent video frame of each other video frame to obtain a second frame difference image;
performing dimension reduction processing on the second frame difference image to obtain a second dimension-reduced image;
eroding the image area whose brightness is greater than the brightness threshold in the second dimension-reduced image to obtain an eroded region;
and calculating a connected domain for the eroded region to obtain the second dynamic region.
9. The method of claim 4, wherein the audio frame comprises N audio signal points, and the duration of the audio frame is the ratio of N to the audio sampling rate.
10. The method of claim 1, wherein the animation time information comprises: the animation duration of the special effect animation, and the audio time information comprises: the audio duration of the special effect audio;
determining an evaluation result according to the difference between the animation time information and the audio time information, including:
calculating a third time difference between the animation duration and the audio duration;
and determining the evaluation result according to the magnitude relation between the third time difference and a third time threshold.
11. A special effect synchronization evaluation apparatus, characterized by comprising:
the video acquisition module is used for responding to the trigger operation aiming at the evaluation trigger mark in the evaluation interface and acquiring the special effect video of the special effect to be evaluated;
the time determination module is used for determining animation time information corresponding to the special effect animation of the special effect to be evaluated and audio time information corresponding to the special effect audio of the special effect to be evaluated based on the video frame sequence and the audio signal analyzed from the special effect video; the animation time information is time information with the precision of video frame duration, and the audio time information is time information with the precision of audio frame duration;
the result generation module is used for determining an evaluation result according to the difference between the animation time information and the audio time information; the evaluation result represents the synchronization condition of the special effect animation and the special effect audio;
and the result display module is used for displaying the evaluation result in a result display area of the evaluation interface.
12. An electronic device for special effects synchronization assessment, the electronic device comprising:
a memory for storing executable instructions;
a processor configured to implement the special effect synchronization assessment method of any one of claims 1 to 10 when executing executable instructions stored in the memory.
13. A computer-readable storage medium storing executable instructions which, when executed by a processor, implement the special effect synchronization assessment method of any one of claims 1 to 10.
14. A computer program product comprising a computer program or instructions which, when executed by a processor, implement the special effect synchronization assessment method of any one of claims 1 to 10.
CN202111282835.7A 2021-11-01 2021-11-01 Special effect synchronous evaluation method, device, equipment and storage medium Active CN114007064B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111282835.7A CN114007064B (en) 2021-11-01 2021-11-01 Special effect synchronous evaluation method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114007064A (en) 2022-02-01
CN114007064B (en) 2023-03-21

Family

ID=79926128



Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105429984A (en) * 2015-11-27 2016-03-23 刘军 Media play method, equipment and music teaching system
WO2018045682A1 (en) * 2016-09-07 2018-03-15 深圳Tcl数字技术有限公司 Method and device for testing audio and picture synchronization
CN110585702A (en) * 2019-09-17 2019-12-20 腾讯科技(深圳)有限公司 Sound and picture synchronous data processing method, device, equipment and medium
CN110971783A (en) * 2019-11-29 2020-04-07 深圳创维-Rgb电子有限公司 Television sound and picture synchronous self-tuning method, device and storage medium
CN111050023A (en) * 2019-12-17 2020-04-21 深圳追一科技有限公司 Video detection method and device, terminal equipment and storage medium
CN111757158A (en) * 2020-06-29 2020-10-09 北京百度网讯科技有限公司 Audio and video synchronous playing method, device, equipment and storage medium
CN112584216A (en) * 2019-09-29 2021-03-30 杭州海康威视数字技术股份有限公司 Lip sound synchronization method and device


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115174978A (en) * 2022-06-08 2022-10-11 聚好看科技股份有限公司 3D digital person sound and picture synchronization method and electronic equipment
CN115174978B (en) * 2022-06-08 2023-11-24 聚好看科技股份有限公司 Sound and picture synchronization method for 3D digital person and electronic equipment
WO2023240927A1 (en) * 2022-06-14 2023-12-21 天翼数字生活科技有限公司 Terminal capability test system, method, device, and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant