WO2024078238A1 - Video-recording control method, electronic device and medium - Google Patents


Info

Publication number
WO2024078238A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
recording
content
camera
instruction
Prior art date
Application number
PCT/CN2023/118317
Other languages
French (fr)
Chinese (zh)
Inventor
卞超 (Bian Chao)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2024078238A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M1/00 - Substation equipment, e.g. for use by subscribers
    • H04M1/72 - Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 - User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72439 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for image or video messaging
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M1/00 - Substation equipment, e.g. for use by subscribers
    • H04M1/72 - Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 - User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72448 - User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 - Details of television systems
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 - Details of television systems
    • H04N5/76 - Television signal recording
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 - Television systems

Definitions

  • the present application relates to the field of recording technology, and in particular to a video recording control method, electronic equipment and medium.
  • the present application provides a video recording control method, electronic device and medium.
  • the present application provides a video recording control method, which is applied to an electronic device, and the method includes: in response to a first operation of a user to start recording, performing recording at least through a camera, and the recorded content includes at least video stream content; identifying image instructions input by the user through the camera in the video stream content, and the image instructions are used to achieve control over the recording; wherein, identifying image instructions input by the user through the camera in the video stream content includes: identifying at least a first image block of at least one image frame in the video stream content, identifying an image instruction that matches a characteristic behavior in the first image block, and determining a first time interval where the image instruction is located based on at least one image frame where the image instruction is located.
  • the first time interval where the image instruction is located can be determined, so that it is convenient to determine the second time interval including the first time interval based on the first time interval where the image instruction is located, and obtain the corresponding splicing content after deleting the second time interval in the recorded content, so as to realize recording control through image instructions during the recording process, meet the control requirements in video recording, and obtain the first recording file without image instructions, so as to facilitate the recording of scenes such as single-person recording and live broadcast, and improve the user experience.
  • the image instructions used for recording control can include image instructions in multiple forms, so that users can use different instruction forms to control the recording process in different scenes.
  • the recognition of image instructions can be realized by recognizing partial image blocks of image frames in the video stream content, effectively improving the recognition efficiency.
  • the characteristic behavior may be the image instruction content corresponding to each operation stored in the electronic device, which is used to match the image input by the user during the recording process to identify the image instruction input by the user during the recording process.
  • partial image blocks of an image frame in the video stream content may be identified, or all image blocks of an image frame in the video stream content may be identified.
  • An image block may be an image region of a set size.
  • determining the first time interval where the image instruction is located based on at least one image frame where the image instruction is located includes: at least one image frame includes a first image frame and a second image frame, the first image frame is the first image frame where the image instruction matches the characteristic behavior in the first image block, the second image frame is the last image frame where the image instruction matches the characteristic behavior in the first image block, the start time of the first time interval is the moment where the first image frame is located, and the end time of the first time interval is the moment where the second image frame is located.
  • the electronic device can compare each image frame in the recording process with each image frame in the image instruction content (or characteristic behavior) corresponding to the operation in the image resource library.
  • if the image frames from the beginning to the end of an image instruction match the image frames from the beginning to the end of any image instruction content stored in the resource library, that is, they are completely consistent, it is confirmed that the corresponding image instruction is recognized, and the operation corresponding to the image instruction can be executed.
  • the first image frame, in the image instruction input by the user, that can match any image instruction content in the resource library, that is, the first image frame, can have its corresponding time point marked as the starting time point of the first time interval in which the image instruction is located.
  • the last image frame that can match the image instruction content in the resource library, that is, the second image frame, can have its corresponding time point marked as the ending time point of the first time interval of the image instruction.
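The interval bookkeeping described above can be sketched in Python; this is an illustrative sketch only (the function, class, and parameter names are hypothetical, not from the patent), assuming a per-frame boolean match result and a fixed frame rate:

```python
from dataclasses import dataclass

@dataclass
class Interval:
    start: float  # seconds
    end: float    # seconds

def first_interval(frame_matches, fps):
    """Given per-frame booleans (True means the frame matched the stored
    characteristic behavior), return the first time interval of the image
    instruction: from the first matching frame to the last matching frame."""
    matched = [i for i, m in enumerate(frame_matches) if m]
    if not matched:
        return None  # no image instruction recognized
    return Interval(matched[0] / fps, matched[-1] / fps)
```

For example, at 30 fps, an instruction matched on frames 1 through 3 yields the interval from 1/30 s to 3/30 s.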
  • before determining the image instruction that matches the characteristic behavior in the first image block, the method also includes: identifying a second image block of at least one image frame in the video stream content, and switching from the second image block to the first image block when no image instruction is identified.
  • the first image block is larger than the second image block, and the first image block includes the second image block.
  • the second image block is located at the center of the image frame, and the first image block extends in all directions based on the second image block.
  • the second image block may be an image block located at the center of the image frame, that is, for example, it may be the first center image block mentioned in the embodiment of the present application, and the first image block may be an image block of a set size extending from the image block at the center of the image frame to the surrounding areas, for example, it may be the second center image block mentioned in the embodiment of the present application.
  • the core information of an image generally occupies only a local area of the entire image, and has a higher probability of occupying the central area; when the core information matches, the images will generally match. Therefore, in the embodiment of the present application, a local comparison method in which the center-position image block is gradually expanded is used to match the image, which can improve the matching efficiency while ensuring the matching accuracy.
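The center-outward local comparison can be illustrated with the following Python sketch (hypothetical names; equal-sized grayscale NumPy arrays and a mean-absolute-difference criterion are assumptions, not details from the patent). It compares center crops of increasing size and exits early when the center already disagrees:

```python
import numpy as np

def expanding_center_match(frame, template, start=0.25, step=0.25, tol=2.0):
    """Compare center blocks of increasing size (a fraction `start` of the
    frame, grown by `step` up to the full frame). Returns False as soon as
    a center block disagrees beyond `tol` (mean absolute difference)."""
    h, w = frame.shape
    frac = start
    while frac <= 1.0:
        bh, bw = int(h * frac), int(w * frac)
        y0, x0 = (h - bh) // 2, (w - bw) // 2
        a = frame[y0:y0 + bh, x0:x0 + bw].astype(float)
        b = template[y0:y0 + bh, x0:x0 + bw].astype(float)
        if np.abs(a - b).mean() > tol:
            return False  # early exit: the central core already mismatches
        frac += step
    return True
```

The early exit is what gives the efficiency gain: most non-matching frames are rejected after comparing only the smallest center block.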
  • the first image block is adjacent to the second image block.
  • the first image block may be an image block adjacent to the right side of the second image block, or an image block adjacent to the left side, top side, bottom side, or diagonal direction of the second image block.
  • the next image block arranged after the second image block may be determined according to a preset switching sequence, and each image block may be identified and matched according to the switching sequence.
  • the first image block is the next image block arranged after the second image block according to the switching sequence.
  • the switching sequence includes preset positions of image blocks of different sequences.
  • the core information of the image generally only occupies a local area of the entire image, and when the core information matches, the images will generally match. Therefore, the local comparison method of comparing each image block in sequence according to the switching order is adopted in this application to match the image, which can improve the matching efficiency while ensuring the matching accuracy.
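As an illustration of comparing image blocks in a preset switching sequence, the sketch below (hypothetical names; the center-first ordering is one plausible choice, not mandated by the patent) yields block coordinates with the center block first, then the remaining blocks row by row:

```python
def grid_blocks(h, w, rows, cols):
    """Yield (y0, y1, x0, x1) block bounds in a preset switching sequence:
    the center block first, then the remaining grid blocks in row order."""
    bh, bw = h // rows, w // cols
    center = (rows // 2, cols // 2)
    order = [center] + [
        (r, c) for r in range(rows) for c in range(cols) if (r, c) != center
    ]
    for r, c in order:
        yield r * bh, (r + 1) * bh, c * bw, (c + 1) * bw
```

A matcher would walk this sequence and stop at the first block whose comparison fails, so most mismatches are detected after one block comparison.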
  • when the image instruction is matched with the characteristic behavior in the first image block, the method further includes: identifying a first pixel in the first image block whose display parameter difference from adjacent pixels is higher than a first threshold, and ignoring the first pixel during matching.
  • the reason for the mismatch in image matching results may be the existence of bad pixels (or noise points), rather than the image mismatch in the actual sense. Therefore, in the present application, pixel points whose display parameter difference with adjacent pixels is higher than the first threshold are identified, that is, bad pixels with obvious jumps, and the bad pixels are not matched, which can effectively ensure the accuracy of image matching.
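A minimal sketch of the bad-pixel screening, assuming a grayscale NumPy image and treating "display parameter" as pixel intensity (an assumption; the function name and 4-neighbor rule are hypothetical):

```python
import numpy as np

def bad_pixel_mask(img, threshold):
    """Flag pixels whose value differs from ALL four neighbors by more than
    `threshold` (an obvious jump: likely a dead/hot pixel or noise point).
    Flagged pixels are excluded from matching."""
    img = img.astype(float)
    pad = np.pad(img, 1, mode='edge')  # replicate edges so borders are handled
    diffs = [np.abs(img - pad[1:-1, :-2]),   # left neighbor
             np.abs(img - pad[1:-1, 2:]),    # right neighbor
             np.abs(img - pad[:-2, 1:-1]),   # upper neighbor
             np.abs(img - pad[2:, 1:-1])]    # lower neighbor
    # a pixel is "bad" only if even its most similar neighbor is far away
    return np.minimum.reduce(diffs) > threshold
```

Requiring the minimum neighbor difference to exceed the threshold means a pixel on a legitimate edge (which agrees with at least one neighbor) is not flagged.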
  • when the image instruction is matched with the characteristic behavior in the first image block, the method also includes: identifying an unmatched second pixel and third pixel in the first image block, and ignoring the second pixel and the third pixel during matching when the position difference between them exceeds a second threshold.
  • when the position difference between any two unmatched pixels is too large, the two unmatched pixels can be considered insignificant, negligible points; excluding such negligible points from matching can effectively ensure the accuracy of image matching.
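The "negligible points" test above can be sketched as follows; this is an illustrative reading of the claim (hypothetical names; Chebyshev distance is an assumed metric): mismatched pixels count as a real difference only if two of them fall close together, while isolated, far-apart specks are ignored.

```python
def negligible_mismatches(mismatch_coords, second_threshold):
    """mismatch_coords: (row, col) positions of unmatched pixels. Returns
    True when every pair is farther apart than `second_threshold`
    (Chebyshev distance), i.e. all mismatches are isolated specks."""
    pts = list(mismatch_coords)
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            dy = abs(pts[i][0] - pts[j][0])
            dx = abs(pts[i][1] - pts[j][1])
            if max(dy, dx) <= second_threshold:
                return False  # two mismatches clustered: a genuine difference
    return True
```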
  • the characteristic behavior matching the image instruction includes schematic content or body movements appearing in the first image block.
  • the schematic content includes specific text information or image information appearing in the first image block.
  • the recording of the video can be controlled by the schematic content, so that when the user is unable to input limb or facial images, the recording of the video can be controlled by specific text or images, such as text or images displayed on cardboard, to meet the multi-scenario requirements of video recording control.
  • the body movement includes specific gesture information or facial information appearing in the first image block.
  • performing recording through at least one camera includes: performing recording through a first camera; and performing recording through a second camera in response to an image instruction for switching cameras input by a user through the first camera.
  • the first camera is a front camera of the electronic device, and the second camera is a rear camera of the electronic device; or the first camera is a rear camera of the electronic device, and the second camera is a front camera of the electronic device.
  • the method further includes: identifying, in the video stream content, an image instruction input by the user through the second camera.
  • the characteristic behavior matching the image instruction for switching the camera includes that a recorded image is rotated in the first image block due to the user turning the electronic device.
  • when the electronic device detects that the recording screen is rotating, it can judge that the user intends to switch the camera, determine that the image instruction for switching the camera is detected, and control the electronic device to switch the camera.
  • the user does not need to input body movements or gestures, but only needs to turn the mobile phone to rotate the recording screen to control the electronic device to switch the camera, which meets the recording control needs of multiple scenes and improves the user experience.
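One way to approximate this rotation-based switch decision is sketched below. It assumes per-frame in-plane rotation estimates are already available (the estimator itself, e.g. feature-based registration or gyroscope readout, is outside this sketch; the names and the 120-degree threshold are hypothetical):

```python
def should_switch_camera(frame_angles, angle_threshold=120.0):
    """frame_angles: per-frame rotation increments in degrees (positive or
    negative). Returns True once the accumulated rotation magnitude exceeds
    `angle_threshold`, which is treated here as the 'user turned the
    device over' camera-switch instruction."""
    total = 0.0
    for a in frame_angles:
        total += a
        if abs(total) >= angle_threshold:
            return True  # sustained rotation detected: switch cameras
    return False
```

Accumulating over frames, rather than triggering on a single large estimate, avoids switching on momentary hand shake.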
  • the method further includes: generating a first recording file according to the recorded content, the first recording file including the spliced content after deleting the recorded content corresponding to the second time interval, and the second time interval is determined based on the first time interval where the image instruction is located.
  • determining the second time interval based on the first time interval where the image instruction is located includes: determining a third time interval based on the first time interval; and determining the second time interval according to the waveform similarity of the audio stream content before and after the third time interval.
  • the time interval where the image instruction is located can be optimized to obtain a second time interval.
  • the spliced content is generated after deleting the recorded content corresponding to the second time interval. This can further ensure that the recorded content will not mutate and effectively ensure that the generated recording file does not include the image instruction content.
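The deletion-and-splicing step can be sketched as follows, assuming the recorded audio as a Python list of samples and the second time interval in seconds (names are hypothetical; the waveform-similarity optimization of the cut points is omitted here):

```python
def splice_out(samples, rate, interval):
    """Remove the samples inside [interval[0], interval[1]] seconds and
    concatenate the remainder: a minimal sketch of generating the spliced
    content after deleting the second time interval."""
    i0 = int(interval[0] * rate)  # first sample to drop
    i1 = int(interval[1] * rate)  # first sample to keep again
    return samples[:i0] + samples[i1:]
```

In practice the same cut indices would be applied to the video stream so that audio and video stay synchronized across the splice.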
  • the method further includes: generating a second recording file according to the recorded content, wherein the second recording file is marked with a start time and an end time of the second time interval.
  • the method also includes: generating a third recording file set based on the recorded content, the third recording file set including at least one first-category recording segment file corresponding to the recorded content in the second time interval and at least one second-category recording segment file of the recorded content outside the second time interval.
  • a third set of recorded files can be generated based on the recorded files marked with the start time and the end time of the second time interval, and the electronic device can store the first type of recorded clip files and the corresponding operations in the image instruction material library for subsequent recognition and matching of image instructions.
  • the electronic device can store the second type of recorded clip files, which can facilitate users to view them separately or perform editing processing such as splicing and synthesis.
  • the present application provides an electronic device, comprising: a memory for storing instructions executed by one or more processors of the electronic device, and a processor, which is one of the one or more processors of the electronic device, for executing the video recording control method mentioned in the present application.
  • the present application provides a readable storage medium having instructions stored thereon.
  • when the instructions are executed on an electronic device, the electronic device executes the video recording control method claimed in the present application.
  • the present application provides a computer program product, including: execution instructions, the execution instructions are stored in a readable storage medium, at least one processor of an electronic device can read the execution instructions from the readable storage medium, and at least one processor executes the execution instructions so that the electronic device executes the video recording control method mentioned in the present application.
  • FIG1 is a schematic diagram showing a hardware structure of an electronic device according to some embodiments of the present application.
  • FIG2a shows a schematic diagram of microphone distribution according to some embodiments of the present application.
  • FIG2b is a schematic diagram showing a sound receiving range of a microphone according to some embodiments of the present application.
  • FIGS. 2c-2e respectively show schematic diagrams of the distribution of a rear camera according to some embodiments of the present application.
  • FIG2f shows a schematic diagram of the distribution of front cameras according to some embodiments of the present application.
  • FIG2g is a schematic diagram showing a shooting range of a front camera and a rear camera according to some embodiments of the present application.
  • FIG3a shows a schematic diagram of a software structure of an electronic device according to some embodiments of the present application.
  • FIG3b shows a functional schematic diagram of an image/video acquisition module according to some embodiments of the present application.
  • FIG3c shows a functional schematic diagram of an image/video acquisition module according to some embodiments of the present application.
  • FIG3d is a schematic diagram showing a method of performing independent control of a split flow in an image/video acquisition module according to some embodiments of the present application.
  • FIG3e shows a schematic diagram of a flow splitting control according to some embodiments of the present application.
  • FIG3f shows a functional schematic diagram of an operation control module according to some embodiments of the present application.
  • FIG3g shows a functional schematic diagram of an image/video recognition module according to some embodiments of the present application.
  • FIG3h shows a functional schematic diagram of an audio-video synchronization module according to some embodiments of the present application.
  • FIG3i is a schematic diagram showing a synchronous control method according to some embodiments of the present application.
  • FIG3j is a functional schematic diagram of a shooting result generating module according to some embodiments of the present application.
  • FIGS. 4a-4c are schematic diagrams showing schematic contents according to some embodiments of the present application.
  • FIG5 is a schematic diagram showing a flow chart of a video recording control method according to some embodiments of the present application.
  • FIG6 shows a schematic diagram of starting recording according to some embodiments of the present application.
  • FIG7 shows a schematic diagram of starting recording according to some embodiments of the present application.
  • FIG8 is a schematic diagram showing an audio optimization method according to some embodiments of the present application.
  • FIGS. 9a-9e are schematic diagrams showing recorded scenes according to some embodiments of the present application.
  • FIG10a is a schematic diagram showing the composition of body movements according to some embodiments of the present application.
  • FIG10b is a schematic diagram showing a partial gesture image according to some embodiments of the present application.
  • FIG11 is a schematic diagram showing the composition of the schematic content according to some embodiments of the present application.
  • FIG12 is a schematic diagram showing the composition of an image instruction matching method according to some embodiments of the present application.
  • FIGS. 13a-13b are schematic diagrams showing a process of image instruction matching according to some embodiments of the present application.
  • FIGS. 13c-13e are schematic diagrams showing matching situations of image instructions according to some embodiments of the present application.
  • FIG13f shows a schematic diagram of image frame matching according to some embodiments of the present application.
  • FIG13g is a schematic diagram showing a bad pixel in an image frame according to some embodiments of the present application.
  • FIG13h is a schematic diagram showing the composition of a difference point secondary analysis method according to some embodiments of the present application.
  • FIG13i is a schematic diagram showing a neglected point in an image frame according to some embodiments of the present application.
  • FIG14a is a schematic diagram showing a composition of a local image comparison method according to some embodiments of the present application.
  • FIG14b is a schematic diagram showing an arbitrary position movement comparison method according to some embodiments of the present application.
  • FIG14c is a schematic diagram showing an arbitrary position enlargement comparison method according to some embodiments of the present application.
  • FIG14d is a schematic diagram showing a fixed position alignment method according to some embodiments of the present application.
  • FIG15a shows a schematic flow chart of a method for determining a starting time point after a range is expanded according to some embodiments of the present application.
  • FIG15b is a schematic flow chart of a method for determining a starting time point after a range is expanded according to some embodiments of the present application.
  • FIG16 is a schematic diagram showing a recording control system according to some embodiments of the present application.
  • FIG17 is a schematic diagram showing an implementation of a recording control method according to some embodiments of the present application.
  • the illustrative embodiments of the present application include, but are not limited to, a video recording control method, an electronic device, and a medium.
  • the electronic device 100 in the embodiment of the present application can be called a user equipment (UE), a terminal, etc.
  • the electronic device 100 can be a tablet computer (portable android device, PAD), a personal digital assistant (personal digital assistant, PDA), a handheld device with wireless communication function, a computing device, a vehicle-mounted device or a wearable device, etc.
  • the form of the terminal device is not specifically limited in the embodiment of the present application.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, and a subscriber identification module (SIM) card interface 195, etc.
  • SIM subscriber identification module
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100.
  • the electronic device 100 may include more or fewer components than shown in the figure, or combine some components, or split some components, or arrange the components differently.
  • the components shown in the figure may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units, for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processor (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • AP application processor
  • GPU graphics processor
  • ISP image signal processor
  • DSP digital signal processor
  • NPU neural-network processing unit
  • Different processing units may be independent devices or integrated in one or more processors.
  • the charging management module 140 is used to receive charging input from a charger.
  • the charger may be a wireless charger or a wired charger.
  • the charging management module 140 may receive charging input from a wired charger through the USB interface 130.
  • the charging management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100. While the charging management module 140 is charging the battery 142, it may also power the electronic device through the power management module 141.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the display screen 194, the camera 193, and the wireless communication module 160.
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle number, battery health status (leakage, impedance), etc.
  • the power management module 141 can also be set in the processor 110.
  • the power management module 141 and the charging management module 140 can also be set in the same device.
  • the wireless communication function of the electronic device 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor and the baseband processor.
  • Antenna 1 and antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 100 can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve the utilization of antennas.
  • antenna 1 can be reused as a diversity antenna for a wireless local area network.
  • the antenna can be used in combination with a tuning switch.
  • the mobile communication module 150 can provide solutions for wireless communications including 2G/3G/4G/5G, etc., applied to the electronic device 100.
  • the mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), etc.
  • the mobile communication module 150 may receive electromagnetic waves from the antenna 1, and perform filtering, amplification, and other processing on the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
  • the mobile communication module 150 may also amplify the signal modulated by the modulation and demodulation processor, and convert it into electromagnetic waves for radiation through the antenna 1.
  • at least some of the functional modules of the mobile communication module 150 may be arranged in the processor 110.
  • at least some of the functional modules of the mobile communication module 150 may be arranged in the same device as at least some of the modules of the processor 110.
  • the wireless communication module 160 can provide wireless communication solutions including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) network), bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) and the like applied to the electronic device 100.
  • WLAN wireless local area networks
  • BT bluetooth
  • GNSS global navigation satellite system
  • FM frequency modulation
  • NFC near field communication
  • IR infrared
  • the wireless communication module 160 can be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110.
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110, modulate the frequency, amplify it, and convert it into electromagnetic waves for radiation through the antenna 2.
  • the electronic device 100 implements the display function through a GPU, a display screen 194, and an application processor.
  • the GPU is a microprocessor for image processing, which connects the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • the processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos, etc.
  • the display screen 194 includes a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a mini-LED, a micro-LED, a micro-OLED, quantum dot light-emitting diodes (QLED), etc.
  • the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music and videos can be stored in the external memory card.
  • the internal memory 121 may be used to store computer executable program codes, which may include instructions.
  • the internal memory 121 may include a program storage area and a data storage area.
  • the program storage area may store an operating system, an application required for at least one function (such as a sound playback function, an image playback function, etc.), etc.
  • the data storage area may store data created during the use of the electronic device 100 (such as audio data, a phone book, etc.), etc.
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one disk storage device, a flash memory device, a universal flash storage (UFS), etc.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by running instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
  • the electronic device 100 can implement audio functions such as music playing and recording through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone jack 170D, and the application processor.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signals.
  • the audio module 170 can also be used to encode and decode audio signals.
  • the audio module 170 can be arranged in the processor 110, or some functional modules of the audio module 170 can be arranged in the processor 110.
  • the speaker 170A, also called a "loudspeaker", is used to convert an audio electrical signal into a sound signal.
  • the electronic device 100 can listen to music or listen to a hands-free call through the speaker 170A.
  • the receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
  • the voice can be received by placing the receiver 170B close to the human ear.
  • the microphone 170C, also called a "mic" or "mouthpiece", is used to convert sound signals into electrical signals.
  • the user can speak with the mouth close to the microphone 170C, so that the sound signal is input into the microphone 170C.
  • the electronic device 100 can collect sound signals, reduce noise, identify the sound source, and realize directional recording functions through the microphone.
  • the number of microphones 170C can be one or more.
  • the microphone 170C includes microphone A, microphone B, and microphone C; the arrangement can be as shown in Figure 2a, where microphone A can be located on the top of the mobile phone, microphone B on the bottom of the mobile phone, and microphone C on the back of the mobile phone.
  • the sound receiving ranges of the three microphones can be as shown in Figure 2b: microphone A mainly receives sound from the middle and upper part and can be used for front and rear recording scenes; microphone B mainly receives sound from the middle and lower part and can be used for front and rear recording scenes; microphone C mainly receives sound from the rear and can be used for rear recording scenes.
  • when the electronic device has only one microphone 170C, the acquired single-channel audio stream content can be backed up to achieve the acquisition of multiple copies of the audio stream content. It is understood that in some embodiments, when the electronic device has multiple microphones, the electronic device can directly acquire multiple copies of the audio stream content. In some embodiments, when the electronic device has multiple microphones, multiple copies of an audio stream acquired by one of the microphones can also be used.
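The single-microphone backup described above can be modeled as follows. This is an illustrative Python sketch, not part of the patent disclosure: an audio stream is represented as a list of sample blocks, and each backup copy is fully independent of the original, so that each copy can later be marked or trimmed separately.

```python
# Sketch (assumption, not from the patent): back up a single-channel
# audio stream into several independent copies.
import copy

def backup_audio_stream(stream: list, copies: int) -> list:
    """Return `copies` independent deep copies of a single audio stream.

    The stream is modeled as a list of PCM sample blocks; editing one
    copy must not affect the original or the other copies.
    """
    return [copy.deepcopy(stream) for _ in range(copies)]

original = [b"block0", b"block1"]
cloned = backup_audio_stream(original, 3)
cloned[0].append(b"mark")          # editing one copy...
assert len(original) == 2          # ...leaves the original untouched
assert cloned[1] == [b"block0", b"block1"]
```

The same pattern applies to the single-camera video case described later: one captured stream is duplicated so that one copy keeps the original file while the others undergo image control processing.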
  • the earphone interface 170D is used to connect a wired earphone.
  • the earphone interface 170D may be the USB interface 130, or may be a 3.5 mm open mobile terminal platform (OMTP) standard interface or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the electronic device 100 can implement a recording function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, and an application processor.
  • ISP is used to process the data fed back by camera 193. For example, when taking a photo, the shutter is opened, and the light is transmitted to the camera photosensitive element through the lens. The light signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to ISP for processing and converts it into an image visible to the naked eye. ISP can also perform algorithm optimization on the noise, brightness, and skin color of the image. ISP can also optimize the exposure, color temperature and other parameters of the recorded scene. In some embodiments, ISP can be set in camera 193.
  • the camera 193 is used to capture still images or videos.
  • the object generates an optical image through the lens and projects it onto the photosensitive element.
  • the photosensitive element can be a charge coupled device (CCD) or a complementary metal oxide semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then passes the electrical signal to the ISP to be converted into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • the DSP converts the digital image signal into an image signal in a standard RGB, YUV or other format.
  • the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
  • the number of cameras can be one or more, and the arrangement of the cameras can be various.
  • the electronic device 100 can also include any number of front cameras.
  • as shown in FIG. 2f, there can be three front cameras, including front camera a, front camera b, and front camera c.
  • the recording ranges of the three rear cameras in FIG. 2d can be as shown in FIG. 2g, where the main recording range of the top rear camera 1 is the upper middle part, the main range of the rear camera 2 is the middle part, and the main range of the rear camera 3 is the lower part.
  • the recording ranges of the three front cameras in FIG. 2f can be as shown in FIG. 2g, where the main range of the front camera a is the left part, the main range of the front camera b is the middle part, and the main range of the front camera c is the right part.
  • when the electronic device has only one camera, the acquired single-channel video stream content can be backed up to obtain multiple copies of the video stream content.
  • when the electronic device has multiple cameras, the electronic device can directly obtain multiple copies of the video stream content.
  • when the electronic device has multiple cameras, multiple copies of one video stream obtained by one of the cameras can also be used.
  • the settings of the camera or microphone in the embodiments of the present application are all examples, and the camera or microphone can be set in any way according to actual needs.
  • the software architecture of the electronic device 100 includes, from top to bottom, an application layer, an application framework layer, a hardware abstraction layer, and a kernel layer.
  • the application layer may include a series of application packages.
  • the application package may include a camera application, wherein the camera application may include a shooting mode control module, a voice acquisition module, an image/video acquisition module, an operation control module, a voice control module, an audio-visual synchronization module, and a shooting result generation module.
  • the shooting mode control module is used to control the recording mode based on user instructions, such as video recording mode, photo taking mode, dual view mode, etc.
  • the image/video acquisition module can be used to establish an image/video resource library, and during the recording process, obtain the video or image captured by the camera, and perform splitting control of a single-channel video stream and multi-channel control of multiple video streams.
  • the image/video acquisition module can be the acquisition module in the camera mentioned in this application, wherein the image/video acquisition module performs splitting control of a single-channel video stream and multi-channel control of multiple video streams in a manner similar to the control of the audio stream content in the voice acquisition module, which will not be repeated here.
  • the image/video resource library may include an image/video system library, a custom image/video control library, and an image control instruction material library.
  • the image/video system library is used to store the system preset image instruction content and corresponding operations.
  • the custom image/video control library is used to store user-defined image instruction content and corresponding operations.
  • the image control instruction material library is used to store material clips and corresponding operations corresponding to the user's image instructions, and the system preset image instruction content, user-defined image instruction content, and image instruction corresponding material clips can be used as comparison data for control instructions.
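The three comparison libraries above can be sketched as a simple lookup, as in the following illustrative Python model. The entries and the lookup order (user-defined content before system presets) are assumptions for illustration only; the patent does not specify a priority order.

```python
# Hypothetical model of the three libraries used as comparison data for
# control instructions; keys are recognized instruction content, values
# are operation identifiers (all names are illustrative).
system_library = {"switch lens": "switch_camera"}
custom_library = {"fist": "start_recording"}
material_library = {"two-finger close": "switch_camera"}

def find_operation(instruction_content: str):
    """Look up the operation for recognized image instruction content,
    checking user-defined entries and material clips before presets."""
    for library in (custom_library, material_library, system_library):
        if instruction_content in library:
            return library[instruction_content]
    return None

assert find_operation("fist") == "start_recording"
assert find_operation("switch lens") == "switch_camera"
assert find_operation("unknown") is None
```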
  • the image/video acquisition module can be used to obtain a single-channel video captured by a single camera or multiple video streams captured by multiple cameras. It can be understood that in the embodiment of the present application, if the electronic device has only one camera, the image/video acquisition module needs to perform logical processing on the single-channel video acquired by the single camera and perform diversion, that is, the acquired single-channel video stream content can be backed up to achieve the acquisition of multiple video stream contents. Each video stream content can be subsequently operated on as needed.
  • the main methods of diversion are shown in FIG. 3d and can include comprehensive diversion and intelligent diversion: comprehensive diversion refers to copying the entire video file into multiple copies and then processing them independently; intelligent diversion copies only the effective part into multiple copies, where the effective part is mainly the recorded content containing the image instruction part.
  • comprehensive diversion can be to copy the complete video stream A recorded in real time, in which case the original video stream content A can be retained, and the copied video stream content is used for image control processing, such as marking the time period corresponding to the image instruction, or removing the content of the time period corresponding to the image instruction.
  • Intelligent diversion can be to copy the effective parts B1, B2 and B3 of the real-time recorded video stream B. In this case, the original video stream content B can be retained, and the copied video stream content B1, B2 and B3 are used for image control processing.
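The two diversion modes can be sketched as follows. This Python model is illustrative only (the data model and function names are assumptions): comprehensive diversion copies the whole stream, while intelligent diversion copies only the effective ranges, such as segments B1, B2 and B3.

```python
# Sketch of the two diversion modes described above (illustrative).
def comprehensive_diversion(stream: list) -> list:
    """Return a full copy of the stream for image control processing;
    the caller keeps the original stream content unchanged."""
    return list(stream)

def intelligent_diversion(stream: list, effective_ranges: list) -> list:
    """Return one copy per effective (start, end) range — e.g. the
    segments that contain image instructions."""
    return [stream[start:end] for start, end in effective_ranges]

frames = ["f0", "f1", "f2", "f3", "f4", "f5"]
full_copy = comprehensive_diversion(frames)
parts = intelligent_diversion(frames, [(0, 2), (4, 6)])
assert full_copy == frames and full_copy is not frames
assert parts == [["f0", "f1"], ["f4", "f5"]]
```

Intelligent diversion trades extra bookkeeping (locating the effective ranges) for lower memory and processing cost, since only the instruction-bearing content is duplicated.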
  • when the electronic device has multiple cameras, the image/video acquisition module does not need to copy the video stream content; some of the multiple video stream contents acquired by the multiple cameras can be set for image processing, while the others retain the original files.
  • for example, when the electronic device has three cameras and the acquired video stream content includes video stream content A, video stream content B, and video stream content C, video stream contents A and B can be left unprocessed with their original files retained, and video stream content C can be set for image control processing, etc.
  • alternatively, video stream content A can be left unprocessed with its original file retained, video stream content B can be marked for generating multiple material clips, and video stream content C can be marked to obtain a cropped recording file.
  • the image/video acquisition module can be used to send the acquired video to the operation control module.
  • the voice acquisition module can be used to establish a voice resource library, and during the recording process, obtain the audio collected by the microphone, and perform splitting control of a single audio stream and multi-channel control of multiple audio streams.
  • the voice resource library can include a voice system library, a custom voice control library, and a voice control instruction material library.
  • the voice system library is used to store the system preset voice instruction content and corresponding operations.
  • the custom voice control library is used to store the user-defined voice instruction content and corresponding operations.
  • the voice control instruction material library is used to store the material clips and corresponding operations corresponding to the user's voice instructions.
  • the voice acquisition module can be used to obtain single-channel audio collected by a single microphone or multi-channel audio stream content collected by multiple microphones. It can be understood that in the embodiment of the present application, if the electronic device has only one microphone, the voice acquisition module needs to perform logical processing on the single-channel audio obtained by the single microphone and perform diversion, that is, the obtained single-channel audio stream content can be backed up to achieve the acquisition of multiple audio stream contents. Each audio stream content can be subsequently operated on as needed.
  • the main methods of diversion can include comprehensive diversion and intelligent diversion: comprehensive diversion refers to copying the entire audio file into multiple copies and then processing them independently; intelligent diversion copies only the effective part into multiple copies, where the effective part may include the voice containing the voice instruction part.
  • the operation control module can be used to identify image instructions during the recording process and control the electronic device to perform corresponding operations.
  • the operation control module mainly includes an image/video recognition module and a system operation module.
  • the image/video recognition module can be used for the fuzzy recognition and precise recognition mentioned later in the embodiments of the present application, and the system operation module is used to execute the operations corresponding to the image instructions recognized by the image/video recognition module.
  • the execution mode may include intelligent execution and interactive execution.
  • Intelligent execution is to directly execute the operation corresponding to the image instruction after the image instruction is recognized; interactive execution means to display inquiry information after the image instruction is recognized, or to send inquiry information through voice to confirm whether the image instruction recognition is correct, such as popping up a pop-up window "Do you need to switch the camera?"
  • when a user confirmation operation is detected, such as clicking a control representing confirmation, or giving an image or voice instruction representing confirmation, it is determined that the image instruction recognition is correct and the operation corresponding to the image instruction is executed.
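The two execution modes can be sketched as follows. This Python model is illustrative only; the function names and the representation of the confirmation dialog are assumptions, not part of the patent.

```python
# Sketch of intelligent vs. interactive execution (illustrative).
def execute_instruction(operation, mode, confirm=None):
    """`operation` is a zero-argument callable for the target operation;
    in interactive mode, `confirm` answers the inquiry (e.g. the pop-up
    "Do you need to switch the camera?") and returns True or False."""
    if mode == "intelligent":
        # Intelligent execution: run the operation immediately.
        return operation()
    if mode == "interactive":
        # Interactive execution: run only after user confirmation.
        if confirm is not None and confirm():
            return operation()
        return None
    raise ValueError(f"unknown mode: {mode}")

assert execute_instruction(lambda: "switched", "intelligent") == "switched"
assert execute_instruction(lambda: "switched", "interactive",
                           confirm=lambda: True) == "switched"
assert execute_instruction(lambda: "switched", "interactive",
                           confirm=lambda: False) is None
```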
  • the image/video recognition module can be used to obtain the correspondence between the system preset image instruction content and the operation, and to obtain the correspondence between the user-defined image instruction content and the operation.
  • the audio and video synchronization module is used to perform video processing and audio processing, that is, to perform synchronization marking or synchronization control on the time period corresponding to the control instruction in the video stream content and the audio stream content, and obtain the marked audio stream content and video stream content.
  • as shown in FIG. 3i, there are two ways of synchronization control: real-time synchronization and overall synchronization.
  • Real-time synchronization means that during the recording process, after receiving each image instruction, in addition to marking the corresponding time point of the image instruction in the video stream, the image instruction time point in the audio stream content is also synchronized.
  • Overall synchronization means that after detecting the recording end instruction, the image instruction time point in the audio stream content is synchronized.
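The two synchronization modes can be sketched as follows. This Python model is illustrative only (marks are modeled as plain time points); it is not the patent's implementation.

```python
# Sketch of real-time vs. overall synchronization of instruction marks
# between the video stream and the audio stream (illustrative).
def sync_mark_realtime(video_marks, audio_marks, time_point):
    """Real-time sync: each image instruction time point is marked in
    the video stream and immediately mirrored into the audio stream."""
    video_marks.append(time_point)
    audio_marks.append(time_point)

def sync_marks_overall(video_marks, audio_marks):
    """Overall sync: after the recording end instruction is detected,
    all marks are copied into the audio stream at once."""
    audio_marks.clear()
    audio_marks.extend(video_marks)

video, audio = [], []
sync_mark_realtime(video, audio, 3.2)
sync_mark_realtime(video, audio, 7.8)
assert audio == [3.2, 7.8]

video2, audio2 = [1.0, 5.5], []
sync_marks_overall(video2, audio2)
assert audio2 == [1.0, 5.5]
```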
  • the shooting result generation module is used to generate a recording file based on the marked audio stream content and video stream content when the recording end instruction is detected.
  • as shown in Figure 3j, the shooting result generation module can be used to match the marked points, remove the recorded content of the second time interval, and generate a first recording file.
  • the first recording file can be the spliced content after cutting out the content corresponding to the second time interval in the recorded content.
  • there are two ways to match the marked points. One is to match the marked video stream content with the marked audio stream content: for example, the time points marked in the video stream content can be compared with the time points marked in the audio stream content, and if they are consistent, the match is successful. The other is to match the marked video stream content with other unprocessed (unmarked) video stream content: for example, the image frames at the start and end time points in the marked video stream content, or the overall image frames between the start and end time points, can be compared with the image frames at the corresponding start and end time points in the unprocessed video stream content, or the overall image frames between them; if the image frames are consistent, the match can be considered successful.
  • the content corresponding to the second time interval marked in the audio stream content and the video stream content may be removed.
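The mark-matching and removal steps above can be sketched as follows. This Python model is illustrative only: marks are modeled as (start, end) index pairs and the streams as frame lists, which are assumptions rather than the patent's data format.

```python
# Sketch of matching marked points across streams and then removing the
# second time interval to produce the first recording file (illustrative).
def marks_match(video_marks, audio_marks):
    """The match succeeds when the marked time points are consistent
    between the video stream content and the audio stream content."""
    return video_marks == audio_marks

def remove_interval(frames, interval):
    """Cut out the content inside [start, end) and splice the rest."""
    start, end = interval
    return frames[:start] + frames[end:]

video_marks, audio_marks = [(2, 4)], [(2, 4)]
assert marks_match(video_marks, audio_marks)

frames = ["f0", "f1", "f2", "f3", "f4", "f5"]
first_recording = remove_interval(frames, (2, 4))
assert first_recording == ["f0", "f1", "f4", "f5"]
```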
  • the shooting result generation module may also be used to generate an uncropped original recording file based on the recording content, and to generate a plurality of cropped material segment files based on the recording content, which will be described in detail below.
  • the application layer can also include applications such as gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message, etc.
  • the application framework layer provides application programming interface (API) and programming framework for the applications in the application layer.
  • API application programming interface
  • the application framework layer includes some predefined functions.
  • the application framework layer can include camera API, camera service and camera framework.
  • the camera framework may include a video processing module and an audio processing module.
  • the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like.
  • the window manager is used to manage window programs.
  • the window manager can obtain the display screen size, determine whether there is a status bar, lock the screen, capture the screen, etc.
  • Content providers are used to store and retrieve data and make it accessible to applications.
  • the data may include videos, images, audio, calls made and received, browsing history and bookmarks, phone books, etc.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying images, etc.
  • the view system can be used to build applications.
  • a display interface can be composed of one or more views.
  • a display interface including a text notification icon can include a view for displaying text and a view for displaying images.
  • the telephony manager is used to provide communication functions for electronic devices, such as the management of call status (including answering, hanging up, etc.).
  • the resource manager provides various resources for applications, such as localized strings, icons, images, layout files, video files, and so on.
  • the notification manager enables applications to display notification information in the status bar. It can be used to convey notification-type messages and can disappear automatically after a short stay without user interaction. For example, the notification manager is used to notify download completion, message reminders, etc.
  • the notification manager can also be a notification that appears in the system top status bar in the form of a chart or scroll bar text, such as notifications of applications running in the background, or a notification that appears on the screen in the form of a dialog window. For example, a text message is displayed in the status bar, a prompt sound is emitted, an electronic device vibrates, an indicator light flashes, etc.
  • the hardware abstraction layer (HAL) is an abstraction and encapsulation of hardware devices, providing a unified access interface for the Android or HarmonyOS system on different hardware devices. Different hardware manufacturers follow the HAL standard to implement their own hardware control logic, and developers do not need to care about the differences between hardware devices; they only need to access the hardware according to the standard interface provided by the HAL.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least a display driver, a camera driver, an audio driver, a sensor driver, and a microphone (MIC) driver.
  • the MIC driver is used to drive the microphone in the hardware to obtain audio.
  • the camera driver is used to drive the camera to process image signals and obtain video.
  • a video recording control method is provided in an embodiment of the present application, which is used in an electronic device.
  • the method includes: obtaining the recorded content during the recording process, which may include video stream content, or video stream content and audio stream content, identifying the image instructions in the video stream content, and when it is determined that the image instructions in the video stream content match the image instruction content (or characteristic behavior) corresponding to the target operation stored in the electronic device, executing the image instructions, and marking the second time interval in the recorded content based on the first time interval where the image instructions are located, obtaining the marked recorded content, and when the end instruction is obtained, obtaining the first recorded file based on the marked recorded content.
  • the first recording file can be the spliced content after deleting the content corresponding to the second time interval in the recorded content.
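The interval marking and splicing described above can be sketched as follows. This is a minimal Python model under stated assumptions: here the second time interval is taken to be the first time interval (where the image instruction appears) expanded by a margin on each side, which is an illustrative choice; the patent describes the actual derivation of the second time interval later.

```python
# Sketch (margin value and data model are assumptions, not from the
# patent): derive the second time interval from the first, then splice
# the recorded content with that interval deleted.
def mark_second_interval(first_interval, margin, duration):
    """Expand the first interval by `margin` seconds on each side,
    clamped to the recording duration."""
    start, end = first_interval
    return (max(0.0, start - margin), min(duration, end + margin))

def splice_out(samples, timestamps, interval):
    """Keep only the samples whose timestamps fall outside [start, end)."""
    start, end = interval
    return [s for s, t in zip(samples, timestamps) if not (start <= t < end)]

second = mark_second_interval((2.0, 3.0), margin=0.5, duration=10.0)
frames = ["f0", "f1", "f2", "f3", "f4"]
times = [0.0, 1.0, 2.0, 3.0, 4.0]
first_file = splice_out(frames, times, second)
assert second == (1.5, 3.5)
assert first_file == ["f0", "f1", "f4"]
```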
  • the image instruction content corresponding to the target operation may include schematic content.
  • the schematic content may include specific text information, and the specific text information may include text content that is consistent with the name of the operation or the key information.
  • the schematic content corresponding to the switch lens operation may be the text "switch lens".
  • the specific text information may also include fixed mode content set by the user.
  • the schematic content corresponding to the switch lens operation may be various forms of images corresponding to the number "1".
  • the schematic content may also include specific image information, and the specific image information may be an identifier representing a specific meaning.
  • the schematic content corresponding to the lens switching operation may be an identifier representing lens switching.
  • the image instructions may not exist in the final recorded content, which enables remote control in any manner during the recording process, thereby facilitating scenes such as single-person recording and live broadcast, and improving the user experience.
  • the commands for recording control include commands in multiple forms, which is convenient for users to control the recording process in different scenes using different command forms.
  • FIG5 shows a flow chart of a video recording control method in the embodiment of the present application, wherein the video recording control method can be executed by the electronic device.
  • the video recording control method may include:
  • a recording start command is detected and video recording is started.
  • the user's recording start instruction may refer to the recording start instruction triggered by the user clicking the recording start control in the electronic device, or it may be a remote control instruction such as a voice instruction or image instruction corresponding to the start recording issued by the user.
  • for example, when the user clicks the recording control 001 in the camera application in the electronic device 100, the electronic device 100 can detect the user's recording start instruction and perform video recording. It can be understood that in some embodiments, the electronic device can also start recording when the user clicks the recording control 001 in other applications, such as a chat application, or triggers the start recording function in other ways. For another example, as shown in FIG. 7, when the user displays the "fist" image corresponding to the recording start operation, the electronic device can also detect the user's recording start instruction and perform video recording.
  • the recording start instruction in the embodiment of the present application can be any instruction that can trigger the start of recording.
  • the image acquisition module can control the electronic device to perform video recording when a recording start instruction is detected.
  • the electronic device can perform recording through a camera to obtain video stream content, or can perform recording through a camera and a microphone together to obtain video stream content and audio stream content respectively.
  • the electronic device can detect and recognize image instructions input by the user through the camera in the video stream content.
  • the electronic device can recognize the image command by detecting whether the image command in the recording process matches the image command content (or characteristic behavior) corresponding to the target operation stored in the electronic device, and can execute the target operation when the image command is recognized to control the video recording.
  • the method of identifying and matching the image command is described in detail later.
  • the image instruction content corresponding to the target operation may include schematic content, and the schematic content may include specific text information.
  • the specific text information may be understood as text content that is consistent with the name of the operation or the key information.
  • the schematic content corresponding to the switch lens operation may be the text "switch lens".
  • the specific text information may also include fixed mode content customized by the user.
  • the schematic content corresponding to the switch lens operation may be various forms of images corresponding to the number "1".
  • the schematic content may also include specific image information, and the specific image information may be an identifier representing a specific meaning.
  • the schematic content corresponding to the lens switching operation may be an identifier representing lens switching.
  • the image instruction content corresponding to the target operation may also include specific body information or specific facial information, wherein the specific body information may include specific gesture information.
  • the above target operation may include any operation such as switching cameras, adjusting focal length, etc.
  • the image instruction content corresponding to the camera switching operation may also include the rotation of the recorded image caused by turning the direction of the electronic device in the video stream content acquired by the camera.
  • for example, when the shooting direction of the mobile phone camera is the first shooting direction and the user wants to take a selfie, that is, wants to switch to the front camera, the user can rotate the mobile phone by a first set angle to achieve the rotation of the recorded image; for example, the mobile phone is turned from the first shooting direction to a second shooting direction that forms the first set angle with the first shooting direction.
  • when the mobile phone detects that the recorded image of the rear camera rotates because the mobile phone is turned from the first shooting direction to the second shooting direction that forms the first set angle with the first shooting direction, it is determined that an image instruction to switch the camera is detected, and at this time, the mobile phone is controlled to switch from the shooting mode of the rear camera to the shooting mode of the front camera.
  • similarly, the user can also rotate the mobile phone by the first set angle to achieve the rotation of the recorded picture; for example, the mobile phone is turned from the third shooting direction to a fourth shooting direction that forms the first set angle with the third shooting direction.
  • when the mobile phone detects that the recorded picture of the front camera rotates because the mobile phone is turned from the third shooting direction to the fourth shooting direction that forms the first set angle with the third shooting direction, it is determined that the image instruction for switching the camera is detected, and at this time, the mobile phone is controlled to switch from the shooting mode of the front camera to the shooting mode of the rear camera.
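The rotation-triggered camera switch can be sketched as follows. This Python model is illustrative only: the threshold value, the representation of shooting directions as angles, and the function names are all assumptions, not part of the patent.

```python
# Sketch of detecting the camera-switch image instruction from rotation
# of the shooting direction (all values illustrative).
FIRST_SET_ANGLE = 90.0  # degrees; an assumed value for the first set angle

def detect_switch(initial_direction, current_direction, active_camera):
    """Return the camera to use after comparing shooting directions;
    a rotation of at least the first set angle toggles front/rear."""
    rotation = abs(current_direction - initial_direction) % 360
    rotation = min(rotation, 360 - rotation)  # smallest rotation angle
    if rotation >= FIRST_SET_ANGLE:
        return "front" if active_camera == "rear" else "rear"
    return active_camera

assert detect_switch(0.0, 95.0, "rear") == "front"   # rotated enough: switch
assert detect_switch(0.0, 30.0, "rear") == "rear"    # below threshold: keep
assert detect_switch(180.0, 90.0, "front") == "rear"
```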
  • the electronic device may perform fuzzy recognition on the image instruction, that is, recognize the approximate content or key information of the image; for example, when the image instruction contains key text content consistent with the stored image instruction content, or when the image similarity reaches a set similarity, it is determined that the image instruction is recognized, and the operation corresponding to the image instruction is obtained.
  • the electronic device may also recognize the image precisely, that is, only when the image instruction is completely consistent with the stored image instruction content is it determined that the image instruction is recognized and the operation corresponding to the image instruction obtained.
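The two recognition modes above can be sketched as follows. This is an illustrative Python sketch: the function names, the library structure, and the toy character-level similarity metric are assumptions, not the patent's actual implementation.

```python
def image_similarity(a, b):
    """Toy similarity metric: fraction of position-wise matches (assumption)."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))

def precise_recognize(candidate, library):
    """Precise recognition: instruction must match stored content exactly."""
    for stored, operation in library.items():
        if candidate == stored:
            return operation
    return None

def fuzzy_recognize(candidate, library, threshold=0.8):
    """Fuzzy recognition: reaching a set similarity counts as a match."""
    for stored, operation in library.items():
        if image_similarity(candidate, stored) >= threshold:
            return operation
    return None

# Illustrative instruction library (content -> operation).
library = {"switch lens": "SWITCH_CAMERA", "stop recording": "STOP"}
print(precise_recognize("switch lens", library))  # exact match succeeds
print(fuzzy_recognize("switch lenz", library))    # near match succeeds under fuzzy mode
```

A noisy recognition result ("switch lenz") fails precise recognition but clears the fuzzy threshold, which mirrors the distinction drawn above.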
  • for example, during recording the electronic device records only the text "switch lens" (or a two-finger close-together gesture); from that text or gesture, the electronic device confirms that a "switch lens" image instruction is detected, and executes the operation corresponding to the "switch lens" image instruction to switch the lens.
  • the electronic device may also display an inquiry message after obtaining the recognition result, or send an inquiry message through voice, so as to confirm whether the image command recognition is correct, such as popping up a pop-up window "Do you need to switch the camera?"
  • when a user confirmation operation is detected, such as clicking a control representing confirmation or giving an image instruction representing confirmation, it is determined that the image instruction recognition is correct, and the operation corresponding to the image instruction is executed.
  • when the recognition method is fuzzy recognition, a recognized image instruction may correspond to multiple operations. In this case, a query message may be displayed to confirm the user's intention; if the user does not make a selection within a set time, the first operation set by the system may be selected by default for execution.
  • the image/video acquisition module can be used to detect image instructions during the recording process.
  • the electronic device after detecting the image instruction, the electronic device can execute the operation corresponding to the image instruction.
  • the method of marking the second time interval in the recorded content based on the first time interval in which the image instruction is located may include:
  • the electronic device can directly use the start time point and end time point of the first time interval in which the detected image instruction is located as the start time point and end time point of the second time interval in the video stream content, and mark the start time point and end time point corresponding to the second time interval in the video stream content.
  • the method of marking the second time interval in the recorded content based on the first time interval in which the image instruction is located may include:
  • the electronic device can directly use the start time point and end time point of the first time interval in which the detected image instruction is located as the start time point and end time point of the second time interval in the audio stream content and the video stream content, and mark the start time point and end time point corresponding to the second time interval in the audio stream content and the video stream content.
  • the start and end times of the first time interval in which the image instructions in the video stream content and the audio stream content are located can be optimized to obtain the optimized start and end times, and the optimized start and end times can be used as the start and end times of the second time interval.
  • the optimized start and end time points can be obtained by an audio optimization method: the start and end time points of the first time interval where the image instruction is located are compared outward in the audio stream content to expand the marking range, and the expanded start and end time points (that is, the start and end time points corresponding to a third time interval) are determined; the expanded start and end time points are then transparently transmitted to the video stream content for reverse comparison, that is, the similarity of every two adjacent frames in the expanded part of the video stream content (the portion of the third time interval outside the first time interval) is checked to see whether each is greater than the fifth threshold, so as to determine whether there is an obvious conflict in the video.
  • if every adjacent-frame similarity is greater than the fifth threshold, the expanded start and end time points are used as the start and end time points of the second time interval; if any similarity is less than or equal to the fifth threshold, the start and end time points of the first time interval where the image instruction is located are used as the start and end time points of the second time interval in the audio stream and video stream content.
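The interval-optimization logic above can be sketched as follows. Frames are reduced to scalar features for illustration, and the hypothetical `frame_similarity` stands in for whatever frame-comparison metric is actually used; all names and the threshold value are assumptions.

```python
def frame_similarity(a, b):
    """Toy similarity on scalar 'frames' (assumption for illustration)."""
    return 1.0 - abs(a - b)

def adjacent_frames_similar(frames, start, end, fifth_threshold):
    """Check every adjacent frame pair in frames[start:end+1] against the threshold."""
    for i in range(start, end):
        if frame_similarity(frames[i], frames[i + 1]) <= fifth_threshold:
            return False
    return True

def second_interval(first, third, frames, fifth_threshold=0.9):
    """Pick the second time interval: the expanded third interval if the
    expanded-only video portions show no conflict, else the first interval."""
    f_start, f_end = first
    t_start, t_end = third
    # Reverse comparison on the expanded-only head and tail portions.
    ok_head = adjacent_frames_similar(frames, t_start, f_start, fifth_threshold)
    ok_tail = adjacent_frames_similar(frames, f_end, t_end, fifth_threshold)
    return (t_start, t_end) if (ok_head and ok_tail) else (f_start, f_end)
```

With smoothly varying frames the expansion is accepted; a jump in the expanded portion (similarity at or below the fifth threshold) falls back to the original first interval.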
  • a first recording file is generated based on the marked recording content.
  • the content corresponding to the second time interval in the marked video stream content and the marked audio stream content can be cropped, and the remaining content after cropping can be spliced to generate a first recording file.
  • a complete original recording file can also be generated based on the unmarked recording content.
  • a second recording file marked with the start time point and the end time point of the second time interval can also be generated to facilitate subsequent processing.
  • multiple material clips, i.e., a third recording file set, can also be generated based on the marked recording content.
  • the multiple material clips may include image instruction clips, i.e., the first type of recording clip files; and may also include recording content clips, i.e., the second type of recording clip files.
  • the image instruction clips include the recorded content corresponding to the time period corresponding to the image instruction, and the recording content clips are the remaining clip contents in the recorded video after cutting out the time period corresponding to the image instruction.
  • the electronic device can store image instruction-like segments and corresponding operations in an image instruction material library for subsequent image instruction recognition and matching.
  • the electronic device can store recorded content-like segments, which can facilitate users to view them separately or perform editing processing such as splicing and synthesis.
  • the start time point and the end time point in the marked audio stream content and the video stream content can also be matched before cutting.
  • the marked video stream content can be matched with the marked audio stream content or the unmarked video stream content.
  • the matching method may include: comparing the time points in the marked video stream content and the time points in the audio stream content to see if they are consistent, and if they are consistent, the match is successful.
  • the image frames at the start and end time points in the marked video stream content are compared with the image frames at the corresponding start and end time points in the unprocessed video stream content, and all the image frames between the start and end time points in the marked video stream content are compared with those between the corresponding time points in the unprocessed video stream content; if the image frames are consistent, the match is considered successful.
  • the image instructions for recording control can include multiple forms of image instructions, which is convenient for users to use different instruction forms to control the recording process in different scenes.
  • in FIG. 9a, the user turns on the rear mode for recording; FIG. 9a shows the recording screen of the rear camera.
  • in FIG. 9b, the user shows the text content "switch camera" corresponding to the camera switching operation, for example, a paper bearing the text "switch camera"; when the electronic device records the image of the "switch camera" text, it performs the corresponding camera switching operation, for example, switching from the rear recording mode to the front recording mode.
  • for example, when the time period corresponding to the "switch camera" image instruction in the video stream content and the audio stream content is 00:03-00:05, the 00:03-00:05 time period is marked in the video stream content and in the audio stream content.
  • FIG9c shows the screen recorded in the front recording mode after executing the operation corresponding to the image instruction.
  • the time period corresponding to the image instruction in the video stream content and the audio stream content mentioned in this scenario can refer to the time period corresponding to the optimized second time interval.
  • when the user shows the "OK" gesture action corresponding to the stop recording operation, the electronic device detects the image instruction corresponding to "stop recording" and executes the corresponding operation, i.e., stops recording.
  • an image/video resource library may be stored in the electronic device, and the image/video resource library may include an image/video system library and a custom image/video control library.
  • the image/video system library is used to store system-preset image instruction content and corresponding operations
  • the custom image/video control library is used to store user-defined image instruction content and corresponding operations
  • each image/video resource library may limit the recognition mode to fuzzy recognition or precise recognition.
  • the image instruction content corresponding to the operation in the embodiment of the present application can be any executable content customized by the user or preset by the system. It can be understood that the image instruction content referred to in the embodiment of the present application is not limited to a static image or a frame of image, but can also be a continuous multi-frame image instruction content within a period of time.
  • the image instruction content corresponding to the operation may include body movements, as shown in FIG10a, and the body movements may include specific body information and specific facial information.
  • the specific body information may include specific gesture information
  • the specific gesture information may be a specific gesture image, as shown in FIG10b
  • the specific gesture image may include gesture images such as [OK], [Like], [Yeah], [Love], [Fist], and [Love You].
  • Different gesture images may correspond to different operations, for example, [OK] represents "stop recording", and [Like] represents "zoom in focus", etc.
  • the facial information may be a facial image, wherein the facial image may include facial images such as [Nod], [Shake Head], and [Smiley Face].
  • Different facial images may correspond to different operations, for example, the operation corresponding to [Smiley Face] may be "snap shot”, the operation corresponding to [Nod] may be “stop recording”, and the operation corresponding to [Shake Head] may be "switch lens”, etc.
  • the body information may also include specific human posture information.
  • the specific human posture information may include posture images such as [turning in circles], [squatting], [large character posture], and [jumping]. Different posture images may correspond to different operations. For example, the operation corresponding to [turning in circles] may be "switch camera”, and the operation corresponding to [jumping] may be "enlarge focal length”, etc.
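Taken together, the instruction-to-operation mappings above amount to a simple lookup table. In the sketch below, the labels and operation strings mirror the examples in the text; everything else (names, structure) is illustrative.

```python
# Illustrative instruction library mapping recognized body/gesture/posture
# labels to recording-control operations, per the examples in the text.
INSTRUCTION_LIBRARY = {
    "OK": "stop recording",
    "Like": "zoom in focus",
    "Smiley Face": "snap shot",
    "Nod": "stop recording",
    "Shake Head": "switch lens",
    "turning in circles": "switch camera",
    "jumping": "enlarge focal length",
}

def operation_for(recognized_label):
    """Look up the recording operation for a recognized label, or None."""
    return INSTRUCTION_LIBRARY.get(recognized_label)

print(operation_for("OK"))       # -> stop recording
print(operation_for("jumping"))  # -> enlarge focal length
```

A custom image/video control library, as described earlier, could be modeled the same way, with user-defined entries merged over the system presets.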
  • the image instruction content corresponding to the operation may also include schematic content, wherein:
  • the schematic content may include specific text information and specific image information.
  • the specific text information may include text content that is consistent with the name of the operation or with the key information.
  • the schematic content corresponding to the switch lens operation may be the text "switch lens".
  • the specific text information may also include fixed mode content that is customized by the user.
  • the schematic content corresponding to the switch lens operation may be various forms of images corresponding to the number "1".
  • the specific image information may be a logo indicating a specific meaning.
  • the schematic content corresponding to the lens switching operation may be a logo indicating lens switching.
  • Table 1 shows a correspondence table between some image instruction contents and operations in an embodiment of the present application.
  • the image instruction content corresponding to the lens switching operation stored in one of the image/video resource libraries in the electronic device can be the text "switch lens" or a two-finger gesture
  • the recognition method can be fuzzy recognition (the option column in Table 1 records whether there is a one-to-one correspondence: if "yes" is selected, the recognition method is limited to precise recognition; if "no" is selected, the recognition method is limited to fuzzy recognition).
  • the image instruction content corresponding to the zoom focus operation can be the text "zoom focus” or a five-finger spread gesture, and the recognition method can be fuzzy recognition.
  • the image instruction content corresponding to the switch to the front camera lens operation can be the text "switch to front" or a check mark gesture, and the recognition method can be fuzzy recognition.
  • the matching method may include: an image frame comparison method and an image local comparison method.
  • file A represents the video stream content recorded in real time
  • file B represents the video stream content used for image control processing; it can be understood that file A can be kept without marking or other processing for subsequent comparison or other needs, and file B can be used for marking or cropping the time points corresponding to the control instructions.
  • image 1 in file B can represent the image instruction corresponding to the identified "switch camera" operation
  • image 2 can represent the image instruction corresponding to the "stop recording” operation.
  • Image 1 has a starting time point and an ending time point
  • image 2 also has a starting time point and an ending time point.
  • the process of image instruction recognition and matching is shown in FIG. 13b: the electronic device compares each image frame in file B with each image frame of the image instruction content (also called feature behavior) corresponding to an operation in the image resource library, and when the frames are consistent, the recognition is confirmed to be successful.
  • the first image frame in the real-time captured images that matches any image instruction content in the resource library can be marked as the start time point of the first time interval where the image instruction is located, and the last image frame that matches the image instruction content in the resource library can be marked as the end time point of the first time interval.
  • when the image frames are completely consistent, that is, all the image frames between the start time point and the end time point are consistent, the recognition can be proven successful.
  • when the recognition in file B is successful, further recognition can be performed in file A.
  • the marked start time point and end time point can be transparently transmitted to file A, that is, the same start time point and end time point are determined in file A, and the image frames between the start and end time points in file A are compared with the image frames of the corresponding image instruction content in the resource library; if they match, the image recognition is proven successful, and if they are inconsistent, the recognition has failed.
  • the recognition is successful only when both recognitions succeed; if either recognition fails, this image instruction recognition is considered failed, and all marks generated during this recognition process are cleared. This circular two-way recognition method can effectively improve the accuracy of image recognition.
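The two-way recognition flow above can be sketched as follows. Frames are plain values and equality stands in for frame matching; the function names and the flat-list representation of the two streams are illustrative assumptions.

```python
def find_instruction(frames, instruction):
    """First pass: locate the instruction sequence in file B.
    Returns (start, end) frame indices, or None if not found."""
    n = len(instruction)
    for i in range(len(frames) - n + 1):
        if frames[i:i + n] == instruction:
            return (i, i + n - 1)
    return None

def two_way_recognize(file_a, file_b, instruction):
    """Recognize in file B, then transmit the marks to file A and re-verify.
    Both passes must succeed; otherwise the marks are cleared (None)."""
    marks = find_instruction(file_b, instruction)
    if marks is None:
        return None
    start, end = marks
    # Second pass: compare the same span in file A against the instruction.
    if file_a[start:end + 1] == instruction:
        return marks
    return None  # second pass failed: discard all marks from this attempt
```

If file A disagrees at the marked span (e.g. the control stream was corrupted), the attempt is rejected, which is the failure case the text describes.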
  • in the image frame comparison method, each pixel of an image frame in file B is compared with the pixels of the corresponding image frame of the corresponding image instruction content in the image resource library.
  • the reason for the image matching result being mismatched may be the presence of bad pixels (or noise points) as shown in Figure 13g, rather than the image instruction mismatch in the actual sense, resulting in an erroneous matching result.
  • an embodiment of the present application provides a difference point secondary analysis method for comparing the above image frames, as shown in FIG13h , the difference point secondary analysis method includes a bad point removal method and a difference point ignoring method.
  • the bad pixel removal method refers to first analyzing whether there are bad pixels in the image before comparing two frames of images. When bad pixels exist, the bad pixels are removed before comparing the image frames.
  • a method for determining whether there are bad pixels may include: determining whether there is an obvious jump between any pixel and surrounding pixels. For example, it may be determined whether the display parameter difference (such as hue value) between each pixel and adjacent pixels is higher than a first threshold. If it is higher than the first threshold, it is determined that there is an obvious jump between the pixel and surrounding pixels, and the pixel is determined to be a bad pixel.
  • the method of determining whether there is a bad pixel may include: comparing the similarity between each pixel and a bad pixel in a theoretical sense, and when the similarity exceeds a third threshold (eg, 95%), the pixel may be considered to be a bad pixel.
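The first bad-pixel detection method above (an obvious jump from surrounding pixels) can be sketched with hue values on a plain grid. The 4-neighbor rule, the threshold value, and the mean-fill repair step are illustrative assumptions.

```python
def is_bad_pixel(hue, r, c, first_threshold=100):
    """Flag (r, c) as a bad pixel if its hue differs from every in-bounds
    adjacent pixel by more than the first threshold (an obvious jump)."""
    rows, cols = len(hue), len(hue[0])
    neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    diffs = [abs(hue[r][c] - hue[nr][nc])
             for nr, nc in neighbors if 0 <= nr < rows and 0 <= nc < cols]
    return all(d > first_threshold for d in diffs)

def remove_bad_pixels(hue):
    """Remove bad pixels before frame comparison by replacing each with the
    mean of its in-bounds neighbors (one possible repair, assumed here)."""
    rows, cols = len(hue), len(hue[0])
    cleaned = [row[:] for row in hue]
    for r in range(rows):
        for c in range(cols):
            if is_bad_pixel(hue, r, c):
                vals = [hue[nr][nc]
                        for nr, nc in [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
                        if 0 <= nr < rows and 0 <= nc < cols]
                cleaned[r][c] = sum(vals) // len(vals)
    return cleaned
```

An isolated spike surrounded by uniform hue is flagged and smoothed out, so it no longer causes a spurious mismatch during frame comparison.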
  • the difference point ignoring method means that when the similarity obtained after comparing two frames of images is greater than the fourth threshold (for example, 99%), and the position difference between any two unmatched pixel points in the two frames is greater than the second threshold, these differing pixel points can be considered insignificant ignored points that require no matching, and the two frames are determined to be matched images. For example, as shown in FIG. 13i, the two differing pixel points in the two frames are far apart and their position difference is greater than the second threshold, so both pixel points can be considered insignificant ignored points.
  • the local image comparison method in the embodiment of the present application is basically the same as the above image frame comparison method; the difference is that in the image frame comparison method the electronic device compares each complete image frame in file B with each complete image frame in the image resource library, whereas in the local comparison method a specific part of an image frame in file B is compared with the corresponding part of an image frame in the image resource library.
  • the local image comparison method may include an arbitrary position moving comparison method, an arbitrary position enlarging comparison method, and a fixed position comparison method.
  • Figure 14b shows a schematic diagram of an arbitrary position moving comparison method in an embodiment of the present application.
  • the arbitrary position moving comparison method is to move and traverse multiple image blocks in each frame of the image according to a certain trajectory or a certain switching order (for example, traversing from left to right in sequence).
  • the second image block in the upper left corner of the image frame can be matched with the image block of the corresponding image frame in the image resource library. If the match is unsuccessful, the first image block adjacent to the right side of the second image block is matched with the image block of the corresponding image frame in the image resource library.
  • if that match is also unsuccessful, the third image block adjacent to the right side of the first image block is matched with the image block of the corresponding image frame in the image resource library, and so on.
  • when any image block in an image frame in the video stream content used for image control processing is successfully matched with the image block of the corresponding image frame in the image resource library, the frame image is proven to match successfully.
  • if the traversal is completed and still no image block can be matched, the frame image is proven not to match.
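The block traversal above can be sketched as follows, with frames as plain grids. The block size, exact-equality block matching, and left-to-right, top-to-bottom order are assumptions.

```python
def blocks(frame, block_size):
    """Yield block_size x block_size sub-grids of the frame, traversed
    left to right, top to bottom (one possible traversal order)."""
    rows, cols = len(frame), len(frame[0])
    for r in range(0, rows - block_size + 1, block_size):
        for c in range(0, cols - block_size + 1, block_size):
            yield [row[c:c + block_size] for row in frame[r:r + block_size]]

def frame_matches(frame, reference_block, block_size=2):
    """The frame matches as soon as any traversed block equals the
    reference block from the instruction library; otherwise it fails."""
    return any(b == reference_block for b in blocks(frame, block_size))
```

The match short-circuits on the first successful block, which is what lets the local method improve matching efficiency over comparing whole frames.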
  • the center position enlargement comparison method uses a first central image block of a set size at the center position of the image frame as a reference, and gradually expands the range of the image block for comparison and matching. Specifically, the first central image block of the current image frame to be matched in the video stream content for image control processing (the image block in the leftmost picture in FIG. 14c) is first compared with the image block of the corresponding image frame in the image resource library.
  • if the match is successful, it proves that the current image frame to be matched in the video stream content and the corresponding image frame in the image resource library are matched successfully; if it is unsuccessful, the first central image block is expanded by a set range, for example, to obtain the second central image block (the image block in the middle picture in FIG. 14c), and the second central image block is matched.
  • when the second central image block is successfully matched with the corresponding image block in the image resource library, the matching is terminated, and it is determined that the current image frame to be matched in the video stream content and the corresponding image frame in the image resource library are matched successfully; when the match is unsuccessful, the second central image block is expanded by a set range, for example, to obtain the third central image block (the image block in the rightmost picture in FIG. 14c), and the matching continues. If the range is expanded to the entire image and the match is still unsuccessful, it is determined that the current image frame to be matched in the video stream content and the corresponding image frame in the image resource library are matched unsuccessfully. It can be understood that the second central image block is larger than and includes the first central image block, and extends in all directions from the first central image block.
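A sketch of the center-enlargement loop above; the growth step, the edge-clamped block extraction, and the exact-equality block comparison are illustrative assumptions.

```python
def center_block(frame, half):
    """Square block of side about 2*half centered on the frame, clamped to
    the frame edges so expansion eventually covers the whole image."""
    rows, cols = len(frame), len(frame[0])
    r0, r1 = max(0, rows // 2 - half), min(rows, rows // 2 + half)
    c0, c1 = max(0, cols // 2 - half), min(cols, cols // 2 + half)
    return [row[c0:c1] for row in frame[r0:r1]]

def center_enlarge_match(frame, reference, step=1):
    """Start from a small central block; on failure, expand by a set step
    and retry, until a block matches or the whole frame has been compared."""
    half = 1
    max_half = max(len(frame), len(frame[0]))
    while half <= max_half:
        if center_block(frame, half) == center_block(reference, half):
            return True
        half += step
    return False
```

A frame whose center agrees with the reference matches on the first, smallest block even if the borders differ; a frame that disagrees everywhere keeps expanding until the full-image comparison fails.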
  • Figure 14d shows a schematic diagram of a fixed position comparison method in an embodiment of the present application, wherein the fixed position comparison method is to set an image block at a fixed position, and the fixed position is set by an application or a user.
  • the image block at the fixed position may be an image block at a middle position or an image block at a corner position.
  • the number of image blocks at the fixed position may be one or more, and the present application does not limit this.
  • the local comparison method is used for comparison, which can improve the matching efficiency while ensuring the matching accuracy.
  • the marking of the start time point and the end time point of the image instruction can be marked in any feasible form in addition to the above-mentioned method of [Image 1 Start].
  • the marking of the start time point and the end time point of the image instruction can be as shown in Table 2:
  • the real image subscript method can be used, for example, when the image instruction is a switching lens text image, the start time point of the switching lens text image instruction is subscripted with 0, and the end time point is subscripted with 1;
  • the real image pairing method can also be used, for example, when the image instruction content is to enlarge the focal length value, a mark can be subscripted at the start time point of the enlarged focal length value image instruction, and the same mark can be subscripted at the end time point, so that the mark that appears first defaults to the start time point, and the mark that appears later defaults to the end time point;
  • the operation subscript method can also be used, for example, for the switching lens image instruction, the start time point corresponding to the actual operation corresponding to the switching lens image instruction can be subscripted with 0, and the end time point can be subscripted with 1;
  • the operation pairing marking method can also be used
  • Table 2 Image instruction marking table
  • FIG. 15a shows a method for determining the start time point after the range is expanded in an embodiment of the present application.
  • the method may include: comparing, going backwards frame by frame, each frame waveform before the first start time point in the audio stream content with the waveform corresponding to the first start time point, as long as the similarity is greater than a sixth threshold.
  • when it is determined that the similarity between the n-th frame waveform before the first start time point and the waveform corresponding to the first start time point is less than or equal to the sixth threshold, the time point corresponding to the (n+1)th frame waveform in the audio stream content is used as the expanded start time point of the image instruction in the audio stream content.
  • when it is determined that the similarity between a frame waveform and the waveform corresponding to the first start time point is still greater than the sixth threshold, but the number of frames currently compared has reached the set first number, the time point corresponding to that n-th frame waveform in the audio stream content is used as the start time point after the range is expanded.
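The backward waveform walk above can be sketched as follows, with waveform frames reduced to scalars; the similarity metric, the threshold, and the frame-count cap are illustrative assumptions.

```python
def waveform_similarity(a, b):
    """Toy similarity on scalar waveform frames (assumption)."""
    return 1.0 - min(1.0, abs(a - b))

def expand_start(waveforms, first_start, sixth_threshold=0.9, first_number=5):
    """Walk backwards from the first start index, keeping each earlier frame
    whose similarity to the start-point waveform exceeds the sixth threshold;
    stop when the similarity drops or the set first number is reached.
    Returns the expanded start index."""
    anchor = waveforms[first_start]
    idx = first_start
    compared = 0
    while idx > 0 and compared < first_number:
        if waveform_similarity(waveforms[idx - 1], anchor) > sixth_threshold:
            idx -= 1
            compared += 1
        else:
            break  # similarity dropped: keep the last similar frame's index
    return idx
```

The FIG. 15b variant would differ only in the comparison target (each frame against its next frame rather than against the start-point waveform), and the end-point expansion mirrors this walking forward instead of backward.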
  • FIG. 15b shows another method for determining the start time point after the range is expanded in an embodiment of the present application.
  • the method may include: comparing, going backwards frame by frame, each frame waveform before the first start time point with the frame waveform immediately after it: the frame waveform immediately before the first start time point is compared with the waveform corresponding to the first start time point, the second frame waveform before the first start time point is compared with the frame waveform immediately before the first start time point, the third frame waveform before the first start time point is compared with the second frame waveform before the first start time point, and so on, as long as each similarity is greater than the sixth threshold.
  • when it is determined that the similarity between the n-th frame waveform before the first start time point and its next frame waveform is less than or equal to the sixth threshold, the time point corresponding to the (n+1)th frame waveform in the audio stream content is used as the expanded start time point of the image instruction in the audio stream content.
  • when it is determined that the similarity between a frame waveform and its next frame waveform is still greater than the sixth threshold, but the number of frames currently compared has reached the set first number, the time point corresponding to that n-th frame waveform in the audio stream content is used as the start time point after the range is expanded.
  • the method for determining the end time point after the expansion of the image instruction can be similar to the method for determining the start time point, except that the end time point is determined by comparing the waveform after the first end time point with the waveform corresponding to the first end time point.
  • different recording files can be generated based on the recorded content.
  • a second time interval can be marked in the recorded content based on the first time interval where the image instruction is located, so as to crop the content corresponding to the second time interval and generate a first recording file that does not include the image instruction.
  • the first recording file is the spliced content after deleting the recording content corresponding to the second time interval from the recorded content.
  • a complete recording file, i.e., a second recording file, can also be generated based on the recorded content.
  • the second time interval can also be marked in the recorded content based on the first time interval where the image instruction is located, so as to generate multiple material clips, i.e., a third recording file set.
  • the multiple material clips may include image instruction clips, i.e., first type of recording clip files; and may also include recording content clips, i.e., second type of recording clip files.
  • the image instruction clips include the recording content corresponding to the second time interval, and the recording content clips are the remaining clip contents after cutting out the corresponding recording content of the second time interval from the recording content.
  • the electronic device can store the image instruction-like segments and corresponding operations in the corresponding control instruction material library for subsequent image instruction recognition and matching.
  • the electronic device can store the recorded content-like segments, which can facilitate users to view them separately or perform editing processing such as splicing and synthesis.
  • obtaining multiple material clips includes:
  • when the recorded content is video stream content, splitting the video stream content based on the time points marked in the video stream content to obtain multiple video stream material clips;
  • when the recorded content includes audio stream content and video stream content, splitting the audio stream content and/or the video stream content based on the time points marked in the audio stream content to obtain multiple audio stream material clips and/or video stream material clips, or splitting the video stream content and/or the audio stream content based on the time points marked in the video stream content to obtain multiple video stream material clips and/or audio stream material clips, and generating corresponding recording clips based on the correspondence between the audio stream material clips and the video stream material clips.
  • when the electronic device detects a "switch camera" image instruction during the recording process and determines the start time point and end time point corresponding to that instruction in the video stream content and the audio stream content, it can generate a first recording segment from the audio stream content and video stream content before the start time point, a second recording segment from the audio stream content and video stream content after the end time point, and a third recording segment from the audio stream content and video stream content within the time period corresponding to the "switch camera" image instruction.
  • the embodiment of the present application also provides a recording control system, as shown in FIG16, which may include:
  • an image acquisition module, which, in response to a first operation by the user to start recording, performs recording at least through a camera, the recorded content including at least video stream content;
  • an image control module, which can be used to identify, in the video stream content, image instructions input by the user through the camera, the image instructions being used to control the recording;
  • an audio-visual synchronization module, which is used to mark a second time interval in the recorded content based on the first time interval where the image instruction is located, and to obtain the marked recorded content;
  • a shooting result generating module, which is used to generate at least a first recording file based on the recorded content when a recording end instruction is detected.
  • the recording control method provided in the embodiment of the present application can be directly developed and implemented on the application side, or it can be constructed separately in the form of capability integration.
  • the capability integrated on the application side in the electronic device system can be provided in the form of AAR or JAR packages, or provided to all components in the electronic device system in the form of binary capability packages; in either case the capability can be updated independently of system version updates.
  • the capabilities can also be provided to all components in the electronic device system through an interface of the framework layer of the system version, in which case they are updated along with system upgrades.
  • the various embodiments disclosed in the present application may be implemented in hardware, software, firmware, or a combination of these implementation methods.
  • the embodiments of the present application may be implemented as a computer program or program code executed on a programmable system, the programmable system comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
  • Program code can be applied to input instructions to perform the functions described in this application and generate output information.
  • the output information can be applied to one or more output devices in a known manner.
  • a processing system includes any system having a processor, such as a digital signal processor (DSP), a microcontroller, an application-specific integrated circuit (ASIC), or a microprocessor.
  • program code can be implemented in a high-level procedural or object-oriented programming language to communicate with the processing system.
  • program code can also be implemented in assembly language or machine language.
  • the mechanisms described in this application are not limited to the scope of any particular programming language. In either case, the language may be a compiled or an interpreted language.
  • the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof.
  • the disclosed embodiments may also be implemented as instructions carried on or stored in one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors.
  • instructions may be distributed over a network or through other computer-readable media.
  • machine-readable media may include any mechanism for storing or transmitting information in a machine (e.g., computer) readable form, including, but not limited to, floppy disks, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or tangible machine-readable memory used to transmit information over the Internet via electrical, optical, acoustic, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals). Therefore, machine-readable media include any type of machine-readable media suitable for storing or transmitting electronic instructions or information in a machine (e.g., computer) readable form.
  • a logical unit/module can be a physical unit/module, or a part of a physical unit/module, or can be implemented as a combination of multiple physical units/modules.
  • the physical implementation of these logical units/modules is not in itself the most important aspect.
  • the combination of functions implemented by these logical units/modules is the key to solving the technical problems proposed by the present application.
  • although the above-mentioned device embodiments of the present application do not introduce units/modules that are not closely related to solving the technical problems proposed by the present application, this does not mean that no other units/modules exist in the above-mentioned device embodiments.
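As an illustrative sketch only (the function and field names, and the representation of the marked second time intervals as `(start, end)` pairs in seconds, are assumptions for this example, not part of the application), splitting a recording into image-instruction clips and recording-content clips based on marked time points could look like:

```python
from dataclasses import dataclass

@dataclass
class Clip:
    start: float   # seconds
    end: float
    kind: str      # "instruction" or "content"

def split_by_marks(duration: float, marks: list[tuple[float, float]]) -> list[Clip]:
    """Split a recording of `duration` seconds into instruction clips
    (the marked second time intervals) and content clips (everything else)."""
    clips, cursor = [], 0.0
    for start, end in sorted(marks):
        if start > cursor:                      # content before the marked interval
            clips.append(Clip(cursor, start, "content"))
        clips.append(Clip(start, end, "instruction"))
        cursor = end
    if cursor < duration:                       # trailing content clip
        clips.append(Clip(cursor, duration, "content"))
    return clips
```

Splicing the `content` clips back together would correspond to the first recording file described above, while the `instruction` clips could be stored in the control instruction material library.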

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Television Signal Processing For Recording (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)

Abstract

The present application relates to the technical field of recording. Disclosed are a video-recording control method, an electronic device and a medium. The method comprises: in response to a first operation performed by a user for starting recording, executing recording at least by means of a camera, wherein recorded content comprises at least video stream content; and recognizing, from the video stream content, an image instruction input by the user by means of the camera, wherein the image instruction is used for implementing recording control. The step of recognizing, from the video stream content, an image instruction input by the user by means of the camera comprises: recognizing at least a first image block of at least one image frame in the video stream content; recognizing, from the first image block, an image instruction that matches a characteristic behavior; and determining, according to the at least one image frame where the image instruction is located, a first time interval where the image instruction is located. On the basis of the solution, an image instruction can be recognized during recording, and recording control can be performed during recording by means of the image instruction, thereby meeting control requirements of a video-recording scenario.

Description

Video recording control method, electronic device and medium
This application claims priority to a Chinese patent application filed with the China Patent Office on October 11, 2022, with application number 202211243144.0 and application name "A video recording control method, electronic device and medium", the entire contents of which are incorporated by reference in this application.
Technical Field
The present application relates to the field of recording technology, and in particular to a video recording control method, an electronic device and a medium.
Background Art
At present, there are more and more application scenarios for video recording, such as recording short videos. Video recording is generally controlled by the user tapping buttons and other controls on the recording interface, which is cumbersome to operate and gives a poor user experience. Although the prior art includes ways of controlling an electronic device through body movements, in video recording scenarios the body movements are recorded into the recorded content, so the control effect cannot be achieved. In addition, the current forms of video recording control are relatively limited, making it difficult to meet the control needs of video recording scenarios.
Summary of the Invention
To solve the above problems, the present application provides a video recording control method, an electronic device and a medium.
In a first aspect, the present application provides a video recording control method, applied to an electronic device, the method comprising: in response to a first operation by a user to start recording, performing recording at least through a camera, the recorded content including at least video stream content; and identifying, in the video stream content, an image instruction input by the user through the camera, the image instruction being used to control the recording; wherein identifying the image instruction input by the user through the camera in the video stream content comprises: identifying at least a first image block of at least one image frame in the video stream content, identifying an image instruction in the first image block that matches a characteristic behavior, and determining, based on the at least one image frame where the image instruction is located, a first time interval where the image instruction is located.
Based on the above solution, the first time interval in which the image instruction is located can be determined. This makes it convenient to determine, based on that first time interval, a second time interval that includes it, and to obtain the spliced content that results from deleting the second time interval from the recorded content. Recording control through image instructions is thereby realized during the recording process, meeting the control requirements of video recording, and a first recording file free of image instructions can be obtained, facilitating recording in scenarios such as single-person recording and live streaming and improving the user experience. In addition, the image instructions used for recording control may take multiple forms, so that users can control the recording process with different instruction forms in different scenarios. Further, image instructions can be recognized by identifying partial image blocks of the image frames in the video stream content, effectively improving recognition efficiency.
In the present application, the characteristic behavior may be the image instruction content, stored by the electronic device and corresponding to each operation, that is matched against the images input by the user during recording so as to recognize the image instructions the user inputs.
In the present application, some of the image blocks of an image frame in the video stream content may be identified, or all of them may be identified. An image block may be an image region of a set size.
In the present application, when at least one image block of an image frame in the video stream content matches the image block of the corresponding image frame in a characteristic behavior, it can be determined that that image frame in the video stream content matches the corresponding image frame of the characteristic behavior.
When there are consecutive image frames in the video stream content that completely match the image frames of a characteristic behavior, it is determined that an image instruction in the video stream content has been recognized.
In a possible implementation, determining the first time interval where the image instruction is located based on the at least one image frame where the image instruction is located includes: the at least one image frame includes a first image frame and a second image frame, the first image frame being the first image frame in which the image instruction matches the characteristic behavior in the first image block, and the second image frame being the last image frame in which the image instruction matches the characteristic behavior in the first image block; the start time of the first time interval is the moment of the first image frame, and the end time of the first time interval is the moment of the second image frame.
In the present application, the electronic device can compare each image frame in the recording process with each image frame of the image instruction content (or characteristic behavior) corresponding to an operation in the image resource library. When the image frames from the beginning to the end of an image instruction match, i.e., are completely consistent with, the image frames from the beginning to the end of any image instruction content stored in the resource library, it is confirmed that the corresponding image instruction has been recognized, and the operation corresponding to the image instruction can be executed.
In the present application, among the image instructions input by the user, the time point of the first image frame that matches any image instruction content in the resource library, i.e., the first image frame, may be marked as the start time point of the first time interval in which the image instruction is located, and the time point of the last image frame that matches the image instruction content in the resource library, i.e., the second image frame, may be marked as the end time point of the first time interval. Based on the above solution, image instructions can be recognized dynamically during recording, improving the efficiency of image instruction recognition.
In a possible implementation, before determining the image instruction that matches the characteristic behavior in the first image block, the method further includes: identifying a second image block of at least one image frame in the video stream content, and switching from the second image block to the first image block when no image instruction is identified.
In a possible implementation, the first image block is larger than the second image block, and the first image block contains the second image block.
In a possible implementation, the second image block is located at the center of the image frame, and the first image block extends outward from the second image block.
In the present application, the second image block may be the image block located at the center of the image frame, e.g., the first center image block mentioned in the embodiments of the present application; the first image block may be an image block of a set size extending outward from the center image block, e.g., the second center image block mentioned in the embodiments of the present application.
It can be understood that, during recording, the core information of an image generally occupies only a local area of the whole image, most likely the central area, and when the core information matches, the images as a whole will generally match. Therefore, the embodiments of the present application match images using a local comparison method in which the center image block is gradually enlarged, which improves matching efficiency while ensuring matching accuracy.
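A minimal sketch of the progressively enlarged center-block comparison might look like the following. The block sizes, the mean-absolute-difference criterion, and all names are illustrative assumptions rather than the application's actual matching algorithm:

```python
import numpy as np

def blocks_match(a: np.ndarray, b: np.ndarray, tol: float) -> bool:
    """Crude block comparison: mean absolute pixel difference within tolerance."""
    return float(np.mean(np.abs(a.astype(float) - b.astype(float)))) <= tol

def match_expanding_center(frame: np.ndarray, template: np.ndarray,
                           start: int = 16, step: int = 16, tol: float = 8.0) -> bool:
    """Compare a frame with a stored template by first matching a small center
    block (the second image block) and, while it keeps matching, switching to a
    larger block that contains it (the first image block), until the frame is
    covered or a block fails to match."""
    h, w = frame.shape[:2]
    cy, cx = h // 2, w // 2
    half = start // 2
    while half <= min(cy, cx):
        ys, ye, xs, xe = cy - half, cy + half, cx - half, cx + half
        if not blocks_match(frame[ys:ye, xs:xe], template[ys:ye, xs:xe], tol):
            return False                      # mismatch found early, in a small block
        if ys == 0 and xs == 0 and ye >= h and xe >= w:
            break                             # the block now covers the whole frame
        half += step // 2                     # enlarge the block and re-check
    return True
```

The design choice mirrors the text above: a mismatch in the small center block rejects the frame cheaply, and only near-matches pay for comparing the larger blocks.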
In a possible implementation, the first image block is adjacent to the second image block.
In the present application, the first image block may be the image block adjacent to the right of the second image block, or an image block adjacent to the left of, above, below, or diagonal to the second image block. In the present application, the next image block arranged after the second image block may be determined according to a preset switching order, and the image blocks may be identified and matched in that switching order.
In a possible implementation, the first image block is the next image block arranged after the second image block according to the switching order, the switching order including the preset positions of the image blocks in different orders.
It can be understood that, during recording, the core information of an image generally occupies only a local area of the whole image, and when the core information matches, the images will generally match. Therefore, the present application matches images using a local comparison method that compares the image blocks one by one in the switching order, which improves matching efficiency while ensuring matching accuracy.
In a possible implementation, when the image instruction is matched against the characteristic behavior in the first image block, the method further includes: identifying a first pixel in the first image block whose display parameter difference from its adjacent pixels is higher than a first threshold, and ignoring the first pixel during matching.
It can be understood that an image may be judged as not matching because of dead pixels (or noise) rather than a genuine mismatch. Therefore, the present application identifies pixels whose display parameter difference from adjacent pixels is higher than the first threshold, i.e., dead pixels with obvious jumps, and does not match those pixels, which effectively ensures the accuracy of image matching.
In a possible implementation, when the image instruction is matched against the characteristic behavior in the first image block, the method further includes: identifying unmatched second and third pixels in the first image block whose positions differ by more than a second threshold, and ignoring the second and third pixels during matching.
It can be understood that, in the present application, when the positions of any two unmatched pixels differ too greatly, the two unmatched pixels can be regarded as insignificant, negligible points that need not be matched, which effectively ensures the accuracy of image matching.
In a possible implementation, the characteristic behavior matched by the image instruction includes schematic content or body movements appearing in the first image block.
In a possible implementation, the schematic content includes specific text information or image information appearing in the first image block.
In the present application, video recording can be controlled through schematic content, so that when the user cannot input limb or facial images, the recording can be controlled through specific text or images, such as text or images displayed on a cardboard sign, meeting the multi-scenario requirements of video recording control.
In a possible implementation, the body movements include specific gesture information or facial information appearing in the first image block.
In a possible implementation, performing recording at least through a camera includes: performing recording through a first camera; and, in response to an image instruction to switch cameras input by the user through the first camera, performing recording through a second camera.
In a possible implementation, the first camera is a front camera of the electronic device and the second camera is a rear camera of the electronic device; or the first camera is a rear camera of the electronic device and the second camera is a front camera of the electronic device.
In a possible implementation, after recording is performed through the second camera, the method further includes: identifying, in the video stream content, an image instruction input by the user through the second camera.
In a possible implementation, the characteristic behavior matching the image instruction to switch cameras includes a rotation of the recorded picture in the first image block caused by the user turning the electronic device around.
In the present application, based on the above solution, when the electronic device detects that the recorded picture rotates, it can determine that the user intends to switch cameras, conclude that the image instruction to switch cameras has been detected, and control the electronic device to switch cameras. In this way, the user can control the electronic device to switch cameras simply by turning the phone to rotate the recorded picture, without inputting body movements or schematic content, meeting the recording control needs of multiple scenarios and improving the user experience.
In a possible implementation, the method further includes: generating a first recording file according to the recorded content, the first recording file including the spliced content after the recorded content corresponding to the second time interval is deleted, the second time interval being determined based on the first time interval where the image instruction is located.
In a possible implementation, determining the second time interval based on the first time interval where the image instruction is located includes: determining a third time interval according to the similarity of the adjacent audio stream waveforms before and after the second time interval; and determining the first time interval according to the third time interval.
In the present application, the time interval where the image instruction is located can be optimized to obtain the second time interval; after the end instruction is detected, the spliced content with the recorded content corresponding to the second time interval deleted is generated. This further ensures that the recorded content has no abrupt transitions and effectively guarantees that the generated recording file contains no image instruction content.
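One way to realize the waveform-similarity boundary adjustment is to widen the cut until the audio just before and just after it correlate well, so that splicing leaves no audible jump. The window size, the normalized-correlation measure, and all names below are illustrative assumptions rather than the application's method:

```python
import numpy as np

def similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Normalised cross-correlation of two equal-length audio windows."""
    x = x - x.mean()
    y = y - y.mean()
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(np.dot(x, y) / denom) if denom else 1.0

def refine_cut(audio: np.ndarray, start: int, end: int,
               win: int = 256, min_sim: float = 0.5,
               max_grow: int = 2048) -> tuple[int, int]:
    """Widen the cut interval [start, end) sample by sample until the windows
    adjacent to the cut are similar enough that the spliced audio has no
    abrupt transition, or until the growth budget is exhausted."""
    grown = 0
    while grown < max_grow:
        before = audio[max(0, start - win):start]
        after = audio[end:end + win]
        if len(before) < win or len(after) < win:
            break                              # hit the edge of the recording
        if similarity(before, after) >= min_sim:
            break                              # boundaries blend well; stop widening
        start -= 1
        end += 1
        grown += 2
    return start, end
```

The widened interval would then serve as the second time interval whose content is deleted before splicing.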
In a possible implementation, the method further includes: generating a second recording file according to the recorded content, the second recording file being marked with the start time and the end time of the second time interval.
In a possible implementation, the method further includes: generating a third recording file set according to the recorded content, the third recording file set including at least one first-type recording clip file corresponding to the recorded content within the second time interval and at least one second-type recording clip file of the recorded content outside the second time interval.
In the present application, the third recording file set can be generated based on the recording file marked with the start time and the end time of the second time interval, and the electronic device can store the first-type recording clip files and the corresponding operations in the image instruction material library for subsequent recognition and matching of image instructions. The electronic device can store the second-type recording clip files, which facilitates viewing them separately or performing editing processing such as splicing and synthesis.
In a second aspect, the present application provides an electronic device, comprising: a memory for storing instructions to be executed by one or more processors of the electronic device; and a processor, being one of the one or more processors of the electronic device, for executing the video recording control method mentioned in the present application.
In a third aspect, the present application provides a readable storage medium having instructions stored thereon which, when executed on an electronic device, cause the electronic device to execute the video recording control method mentioned in the present application.
In a fourth aspect, the present application provides a computer program product, comprising execution instructions stored in a readable storage medium; at least one processor of an electronic device can read the execution instructions from the readable storage medium, and execution of the instructions by the at least one processor causes the electronic device to execute the video recording control method mentioned in the present application.
BRIEF DESCRIPTION OF THE DRAWINGS
图1根据本申请的一些实施例,示出了一种电子设备的硬件结构示意图;FIG1 is a schematic diagram showing a hardware structure of an electronic device according to some embodiments of the present application;
图2a根据本申请的一些实施例,示出了一种麦克风的分布示意图;FIG2a shows a schematic diagram of microphone distribution according to some embodiments of the present application;
图2b根据本申请的一些实施例,示出了一种麦克风的收音范围示意图;FIG2b is a schematic diagram showing a sound receiving range of a microphone according to some embodiments of the present application;
图2c-2e根据本申请的一些实施例,分别示出了一种后置摄像头的分布示意图;2c-2e respectively show a schematic diagram of the distribution of a rear camera according to some embodiments of the present application;
图2f根据本申请的一些实施例,示出了一种前置摄像头的分布示意图;FIG2f shows a schematic diagram of the distribution of front cameras according to some embodiments of the present application;
图2g根据本申请的一些实施例,示出了一种前置和后置摄像头的拍摄范围示意图;FIG2g is a schematic diagram showing a shooting range of a front camera and a rear camera according to some embodiments of the present application;
图3a根据本申请的一些实施例,示出了一种电子设备的软件结构示意图;FIG3a shows a schematic diagram of a software structure of an electronic device according to some embodiments of the present application;
图3b根据本申请的一些实施例,示出了一种图像/视频采集模块的功能示意图;FIG3 b shows a functional schematic diagram of an image/video acquisition module according to some embodiments of the present application;
图3c根据本申请的一些实施例,示出了一种图像/视频采集模块的功能示意图;FIG3c shows a functional schematic diagram of an image/video acquisition module according to some embodiments of the present application;
图3d根据本申请的一些实施例,示出了一种图像/视频采集模块进行分流独立控制的示意图;FIG3d is a schematic diagram showing a method of performing independent control of a split flow in an image/video acquisition module according to some embodiments of the present application;
图3e根据本申请的一些实施例,示出了一种分流控制的示意图;FIG3e shows a schematic diagram of a flow splitting control according to some embodiments of the present application;
图3f根据本申请的一些实施例,示出了一种操作控制模块的功能示意图;FIG3f shows a functional schematic diagram of an operation control module according to some embodiments of the present application;
图3g根据本申请的一些实施例,示出了一种图像/视频识别模块的功能示意图; FIG3g shows a functional schematic diagram of an image/video recognition module according to some embodiments of the present application;
图3h根据本申请的一些实施例,示出了一种音画同步模块的功能示意图;FIG3h shows a functional schematic diagram of an audio-video synchronization module according to some embodiments of the present application;
图3i根据本申请的一些实施例,示出了一种同步控制的方式示意图;FIG3i is a schematic diagram showing a synchronous control method according to some embodiments of the present application;
图3j根据本申请的一些实施例,示出了一种拍摄结果生成模块的功能示意图;FIG3j is a functional schematic diagram of a shooting result generating module according to some embodiments of the present application;
图4a-4c根据本申请的一些实施例,分别示出了一种示意内容的示意图;FIGS. 4a-4c respectively show schematic diagrams of schematic content according to some embodiments of the present application;
图5根据本申请的一些实施例,示出了一种视频录制控制方法的流程示意图;FIG5 is a schematic diagram showing a flow chart of a video recording control method according to some embodiments of the present application;
图6根据本申请的一些实施例,示出了一种开启录制的示意图;FIG6 shows a schematic diagram of starting recording according to some embodiments of the present application;
图7根据本申请的一些实施例,示出了一种开启录制的示意图;FIG7 shows a schematic diagram of starting recording according to some embodiments of the present application;
图8根据本申请的一些实施例,示出了一种音频优化法的示意图;FIG8 is a schematic diagram showing an audio optimization method according to some embodiments of the present application;
图9a-9e根据本申请的一些实施例,示出了录制的场景示意图;FIGS. 9a-9e show schematic diagrams of recording scenes according to some embodiments of the present application;
图10a根据本申请的一些实施例,示出了肢体动作的组成示意图;FIG10a is a schematic diagram showing the composition of body movements according to some embodiments of the present application;
图10b根据本申请的一些实施例,示出了部分手势图像的示意图;FIG10b is a schematic diagram showing a partial gesture image according to some embodiments of the present application;
图11根据本申请的一些实施例,示出了示意内容的组成示意图;FIG11 is a schematic diagram showing the composition of the schematic content according to some embodiments of the present application;
图12根据本申请的一些实施例,示出了图像指令匹配方法的组成示意图;FIG12 is a schematic diagram showing the composition of an image instruction matching method according to some embodiments of the present application;
图13a-13b根据本申请的一些实施例,示出了一种图像指令匹配的过程示意图;FIGS. 13a-13b show schematic diagrams of an image instruction matching process according to some embodiments of the present application;
图13c-13e根据本申请的一些实施例,分别示出了一种图像指令的匹配情况示意图;FIGS. 13c-13e respectively show schematic diagrams of matching situations of image instructions according to some embodiments of the present application;
图13f根据本申请的一些实施例,示出了一种图像帧匹配的示意图;FIG13f shows a schematic diagram of image frame matching according to some embodiments of the present application;
图13g根据本申请的一些实施例,示出了一种图像帧中存在坏点的示意图;FIG13g is a schematic diagram showing a bad pixel in an image frame according to some embodiments of the present application;
图13h根据本申请的一些实施例,示出了一种差异点二次分析法的组成示意图;FIG13h is a schematic diagram showing the composition of a difference point secondary analysis method according to some embodiments of the present application;
图13i根据本申请的一些实施例,示出了一种图像帧中存在忽略点的示意图;FIG13i is a schematic diagram showing a neglected point in an image frame according to some embodiments of the present application;
图14a根据本申请的一些实施例,示出了一种图像局部比对法的组成示意图;FIG14a is a schematic diagram showing a composition of a local image comparison method according to some embodiments of the present application;
图14b根据本申请的一些实施例,示出了一种任意位置移动比对法的示意图;FIG14b is a schematic diagram showing an arbitrary position movement comparison method according to some embodiments of the present application;
图14c根据本申请的一些实施例,示出了一种任意位置扩大比对法的示意图;FIG14c is a schematic diagram showing an arbitrary position enlargement comparison method according to some embodiments of the present application;
图14d根据本申请的一些实施例,示出了一种固定位置比对法的示意图;FIG14d is a schematic diagram showing a fixed position alignment method according to some embodiments of the present application;
图15a根据本申请的一些实施例,示出了一种确定扩大范围后的开始时刻点的方法流程示意图;FIG15a shows a schematic flow chart of a method for determining a starting time point after a range is expanded according to some embodiments of the present application;
图15b根据本申请的一些实施例,示出了一种确定扩大范围后的开始时刻点的方法流程示意图;FIG15b is a schematic flow chart of a method for determining a starting time point after a range is expanded according to some embodiments of the present application;
图16根据本申请的一些实施例,示出了一种录制控制***的示意图;FIG16 is a schematic diagram showing a recording control system according to some embodiments of the present application;
图17根据本申请的一些实施例,示出了录制控制方法的实现示意图。FIG17 is a schematic diagram showing an implementation of a recording control method according to some embodiments of the present application.
具体实施方式DETAILED DESCRIPTION OF EMBODIMENTS
本申请的说明性实施例包括但不限于一种视频录制控制方法、电子设备及介质。The illustrative embodiments of the present application include, but are not limited to, a video recording control method, an electronic device, and a medium.
下面首先对本申请实施例中提及的电子设备100的硬件结构进行介绍。The following first introduces the hardware structure of the electronic device 100 mentioned in the embodiment of the present application.
可以理解,本申请实施例中的电子设备100可以称为用户设备(user equipment,UE)、终端(terminal)等,例如,电子设备100可以为平板电脑(portable android device,PAD)、个人数字处理(personal digital assistant,PDA)、具有无线通信功能的手持设备、计算设备、车载设备或可穿戴设备等。本申请实施例中对终端设备的形态不做具体限定。It can be understood that the electronic device 100 in the embodiment of the present application can be called a user equipment (UE), a terminal, etc. For example, the electronic device 100 can be a tablet computer (portable android device, PAD), a personal digital assistant (personal digital assistant, PDA), a handheld device with wireless communication function, a computing device, a vehicle-mounted device or a wearable device, etc. The form of the terminal device is not specifically limited in the embodiment of the present application.
如图1所示,电子设备100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,气压传感器180C,磁传感器180D,加速度传感器180E,距离传感器180F,接近光传感器180G,指纹传感器180H,温度传感器180J,触摸传感器180K,环境光传感器180L,骨传导传感器180M等。As shown in Figure 1, the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, and a subscriber identification module (SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
可以理解的是,本发明实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。 It is to be understood that the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown in the figure, or combine some components, or split some components, or arrange the components differently. The components shown in the figure may be implemented in hardware, software, or a combination of software and hardware.
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。The processor 110 may include one or more processing units, for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processor (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices or integrated in one or more processors.
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。在一些有线充电的实施例中,充电管理模块140可以通过USB接口130接收有线充电器的充电输入。在一些无线充电的实施例中,充电管理模块140可以通过电子设备100的无线充电线圈接收无线充电输入。充电管理模块140为电池142充电的同时,还可以通过电源管理模块141为电子设备供电。The charging management module 140 is used to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive charging input from a wired charger through the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100. While the charging management module 140 is charging the battery 142, it may also power the electronic device through the power management module 141.
电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,显示屏194,摄像头193,和无线通信模块160等供电。电源管理模块141还可以用于监测电池容量,电池循环次数,电池健康状态(漏电,阻抗)等参数。在其他一些实施例中,电源管理模块141也可以设置于处理器110中。在另一些实施例中,电源管理模块141和充电管理模块140也可以设置于同一个器件中。The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the display screen 194, the camera 193, and the wireless communication module 160. The power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle number, battery health status (leakage, impedance), etc. In some other embodiments, the power management module 141 can also be set in the processor 110. In other embodiments, the power management module 141 and the charging management module 140 can also be set in the same device.
电子设备100的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。The wireless communication function of the electronic device 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor and the baseband processor.
天线1和天线2用于发射和接收电磁波信号。电子设备100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。Antenna 1 and antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in electronic device 100 can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve the utilization of antennas. For example, antenna 1 can be reused as a diversity antenna for a wireless local area network. In some other embodiments, the antenna can be used in combination with a tuning switch.
移动通信模块150可以提供应用在电子设备100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块150可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块150还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。The mobile communication module 150 can provide solutions for wireless communications including 2G/3G/4G/5G, etc., applied to the electronic device 100. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, and perform filtering, amplification, and other processing on the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation. The mobile communication module 150 may also amplify the signal modulated by the modulation and demodulation processor, and convert it into electromagnetic waves for radiation through the antenna 1. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be arranged in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be arranged in the same device as at least some of the modules of the processor 110.
无线通信模块160可以提供应用在电子设备100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星***(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。The wireless communication module 160 can provide wireless communication solutions including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) network), bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) and the like applied to the electronic device 100. The wireless communication module 160 can be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 can also receive the signal to be sent from the processor 110, modulate the frequency, amplify it, and convert it into electromagnetic waves for radiation through the antenna 2.
电子设备100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。The electronic device 100 implements the display function through a GPU, a display screen 194, and an application processor. The GPU is a microprocessor for image processing, which connects the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
显示屏194用于显示图像,视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode的,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,电子设备100可以包括1个或N个显示屏194,N为大于1的正整数。The display screen 194 is used to display images, videos, etc. The display screen 194 includes a display panel. The display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode or an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Miniled, MicroLed, Micro-oLed, quantum dot light-emitting diodes (QLED), etc. In some embodiments, the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备100的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music and videos can be stored in the external memory card.
内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。内部存储器121可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储电子设备100使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。处理器110通过运行存储在内部存储器121的指令,和/或存储在设置于处理器中的存储器的指令,执行电子设备100的各种功能应用以及数据处理。The internal memory 121 may be used to store computer executable program code, which includes instructions. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store an operating system, an application required for at least one function (such as a sound playback function, an image playback function, etc.), etc. The data storage area may store data created during the use of the electronic device 100 (such as audio data, a phone book, etc.), etc. In addition, the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one disk storage device, a flash memory device, a universal flash storage (UFS), etc. The processor 110 executes various functional applications and data processing of the electronic device 100 by running instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。The electronic device 100 can implement audio functions such as music playing and recording through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone jack 170D, and the application processor.
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。The audio module 170 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signals. The audio module 170 can also be used to encode and decode audio signals. In some embodiments, the audio module 170 can be arranged in the processor 110, or some functional modules of the audio module 170 can be arranged in the processor 110.
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备100可以通过扬声器170A收听音乐,或收听免提通话。The speaker 170A, also called a "speaker", is used to convert an audio electrical signal into a sound signal. The electronic device 100 can listen to music or listen to a hands-free call through the speaker 170A.
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当电子设备100接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。The receiver 170B, also called a "earpiece", is used to convert audio electrical signals into sound signals. When the electronic device 100 receives a call or voice message, the voice can be received by placing the receiver 170B close to the human ear.
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。电子设备100可以通过麦克风实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。Microphone 170C, also called "microphone" or "microphone", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can make a sound by approaching the microphone 170C with his mouth, and the sound signal is input into the microphone 170C. The electronic device 100 can collect sound signals, reduce noise, identify the sound source, and realize directional recording functions through the microphone.
可以理解,本申请实施例中,麦克风170C的数量可以为一个,也可以为多个。当麦克风170C包括麦克风A、麦克风B、麦克风C时,排布方式可以如图2a所示,麦克风A可以设于手机顶部、麦克风B可以设于手机底部、麦克风C可以设于手机背部。其中,三个麦克风的收音范围可以如图2b所示,麦克风A主要的收音范围是中上部分,可以用于前置后置录制的场景;麦克风B主要收音的是中下部分,可以用于前置后置录制的场景;麦克风C主要收音的是后部分,可以用于后置录制的场景。It can be understood that in the embodiment of the present application, the number of microphones 170C can be one or more. When microphone 170C includes microphone A, microphone B, and microphone C, the arrangement can be as shown in Figure 2a, microphone A can be located on the top of the mobile phone, microphone B can be located on the bottom of the mobile phone, and microphone C can be located on the back of the mobile phone. Among them, the sound receiving range of the three microphones can be shown in Figure 2b, the main sound receiving range of microphone A is the middle and upper part, which can be used for the scene of front and rear recording; microphone B mainly receives the sound in the middle and lower part, which can be used for the scene of front and rear recording; microphone C mainly receives the sound in the rear part, which can be used for the scene of rear recording.
可以理解,本申请实施例中,若电子设备的麦克风170C只有一个时,则可以对获取到的单路音频流内容进行备份,以实现获取多份音频流内容。可以理解,在一些实施例中,在电子设备的麦克风有多个时,即电子设备可以直接获取多份音频流内容。在一些实施例中,电子设备麦克风有多个,也可以利用其中一个麦克风获取的一路音频流进行多份复制。It is understood that in the embodiment of the present application, if the electronic device has only one microphone 170C, the acquired single-channel audio stream content can be backed up to obtain multiple copies of the audio stream content. It is understood that in some embodiments, when the electronic device has multiple microphones, the electronic device can directly acquire multiple audio stream contents. In some embodiments, when the electronic device has multiple microphones, one audio stream acquired by one of the microphones can also be copied multiple times.
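The single-stream backup described above can be illustrated with a short sketch. The function and variable names here are illustrative only, not part of the application: a single captured stream is deep-copied so that one copy preserves the original content while another copy can be modified independently (for example, when processing command segments later).

```python
import copy

def split_single_stream(frames, n_copies):
    """Return n_copies independent copies of a single captured stream."""
    return [copy.deepcopy(frames) for _ in range(n_copies)]

# Stand-in for samples captured from a single microphone.
captured = [b"frame0", b"frame1", b"frame2"]
original, for_control = split_single_stream(captured, 2)

# Editing the control copy (e.g. dropping a command segment) does not
# affect the preserved original copy.
for_control.pop(0)
assert original == captured
```

Deep copies are used so that each stream can be marked or trimmed without side effects on the others; with multiple physical microphones the copies would instead come directly from the separate capture paths.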
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。The earphone interface 170D is used to connect a wired earphone. The earphone interface 170D may be the USB interface 130, or may be a 3.5 mm open mobile terminal platform (OMTP) standard interface or a cellular telecommunications industry association of the USA (CTIA) standard interface.
电子设备100可以通过ISP,摄像头193,视频编解码器,GPU,显示屏194以及应用处理器等实现录制功能。The electronic device 100 can implement a recording function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, and an application processor.
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对录制场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头193中。ISP is used to process the data fed back by camera 193. For example, when taking a photo, the shutter is opened, and the light is transmitted to the camera photosensitive element through the lens. The light signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to ISP for processing and converts it into an image visible to the naked eye. ISP can also perform algorithm optimization on the noise, brightness, and skin color of the image. ISP can also optimize the exposure, color temperature and other parameters of the recorded scene. In some embodiments, ISP can be set in camera 193.
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,电子设备100可以包括1个或N个摄像头193,N为大于1的正整数。The camera 193 is used to capture still images or videos. The object generates an optical image through the lens and projects it onto the photosensitive element. The photosensitive element can be a charge coupled device (CCD) or a complementary metal oxide semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then passes the electrical signal to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV or other format. In some embodiments, the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
可以理解,本申请实施例中,摄像头的数量可以为一个,也可以为多个,摄像头的排布方式也可以为多种,例如,如图2c中所示,后置摄像头可以为两个;如图2d中所示,后置摄像头可以为三个,如图2e中所示,后置摄像头可以为四个;在一些实施例中,电子设备100还可以包括任意数量的前置摄像头。例如,如图2f所示,前置摄像头可以为三个,分别包括前置摄像头a、前置摄像头b和前置摄像头c。It can be understood that in the embodiments of the present application, the number of cameras can be one or more, and the arrangement of the cameras can be various. For example, as shown in FIG. 2c, there can be two rear cameras; as shown in FIG. 2d, there can be three rear cameras; as shown in FIG. 2e, there can be four rear cameras. In some embodiments, the electronic device 100 can also include any number of front cameras. For example, as shown in FIG. 2f, there can be three front cameras, including front camera a, front camera b, and front camera c.
其中,图2d中的三个后置摄像头的录制范围可以如图2g所示,最上方的后置摄像头1主要的录制范围是中上部分,后置摄像头2主要范围是偏中间部分;后置摄像头3主要范围是偏下部分。图2f中的三个前置摄像头的录制范围可以如图2g所示,前置摄像头a主要范围是偏左部分,前置摄像头b主要范围是偏中间部分,前置摄像头c主要范围是偏右部分。The recording ranges of the three rear cameras in FIG2d can be as shown in FIG2g, where the main recording range of the top rear camera 1 is the upper middle part, the main range of the rear camera 2 is the middle part, and the main range of the rear camera 3 is the lower part. The recording ranges of the three front cameras in FIG2f can be as shown in FIG2g, where the main range of the front camera a is the left part, the main range of the front camera b is the middle part, and the main range of the front camera c is the right part.
本申请实施例中,当电子设备的摄像头193只有一个使得电子设备只能获取一路视频流时,可以对视频流内容进行备份,以实现获取多份视频流内容。可以理解,在一些实施例中,电子设备摄像头有多个,则电子设备可以直接获取多份视频流内容。在一些实施例中,电子设备摄像头有多个,也可以利用其中一个摄像头获取的一路视频流进行多份复制。In the embodiment of the present application, when the electronic device has only one camera 193 so that the electronic device can only obtain one video stream, the video stream content can be backed up to obtain multiple copies of the video stream content. It can be understood that in some embodiments, the electronic device has multiple cameras, and the electronic device can directly obtain multiple copies of the video stream content. In some embodiments, the electronic device has multiple cameras, and multiple copies of one video stream obtained by one of the cameras can also be used.
可以理解,本申请实施例中的摄像头或麦克风的设置方式均是举例说明,摄像头或麦克风可以根据实际需求进行任意方式的设置。It can be understood that the settings of the camera or microphone in the embodiments of the present application are all examples, and the camera or microphone can be set in any way according to actual needs.
下面对本申请实施例中提及的电子设备100的软件结构进行介绍。The software structure of the electronic device 100 mentioned in the embodiment of the present application is introduced below.
如图3a所示,电子设备100的软件架构从上至下分别为应用程序层,应用程序框架层,硬件抽象层,以及内核层。As shown in FIG. 3 a , the software architecture of the electronic device 100 includes, from top to bottom, an application layer, an application framework layer, a hardware abstraction layer, and a kernel layer.
应用程序层可以包括一系列应用程序包。如图3a所示,应用程序包可以包括相机应用,其中相机应用中可以包括拍摄模式控制模块、语音采集模块、图像/视频采集模块、操作控制模块、语音控制模块、音画同步模块与拍摄结果生成模块。The application layer may include a series of application packages. As shown in FIG3a , the application package may include a camera application, wherein the camera application may include a shooting mode control module, a voice acquisition module, an image/video acquisition module, an operation control module, a voice control module, an audio-visual synchronization module, and a shooting result generation module.
可以理解,拍摄模式控制模块用于基于用户指令控制录制模式,例如录像模式、拍照模式、双景模式等。It can be understood that the shooting mode control module is used to control the recording mode based on user instructions, such as video recording mode, photo taking mode, dual view mode, etc.
如图3b所示,图像/视频采集模块可以用于建立图像/视频资源库,以及在录制过程中,获取摄像头采集的视频或图像,以及进行单路视频流的分流控制和多路视频流的多路控制。可以理解,图像/视频采集模块可以为本申请中提及的摄像头中的采集模块,其中,图像/视频采集模块进行单路视频流的分流控制和多路视频流的多路控制方式与语音采集模块中对音频流内容的控制方式类似,此处不再赘述。As shown in FIG3b, the image/video acquisition module can be used to establish an image/video resource library, and during the recording process, obtain the video or image captured by the camera, and perform the shunting control of a single-channel video stream and the multi-channel control of multiple-channel video streams. It can be understood that the image/video acquisition module can be the acquisition module in the camera mentioned in this application, wherein the image/video acquisition module performs the shunting control of a single-channel video stream and the multi-channel control of multiple-channel video streams in a manner similar to the control of the audio stream content in the voice acquisition module, which will not be repeated here.
其中,图像/视频资源库可以包括图像/视频系统库、自定义图像/视频控制库以及图像控制指令素材库。其中,图像/视频系统库用于存储系统预设的图像指令内容以及对应的操作,自定义图像/视频控制库用于存储用户自定义设置的图像指令内容以及对应的操作。图像控制指令素材库用于存储用户的图像指令对应的素材片段以及对应的操作,且系统预设的图像指令内容、用户自定义设置的图像指令内容和图像指令对应的素材片段可以作为控制指令的比对数据。The image/video resource library may include an image/video system library, a custom image/video control library, and an image control instruction material library. The image/video system library is used to store system-preset image instruction content and the corresponding operations, and the custom image/video control library is used to store user-defined image instruction content and the corresponding operations. The image control instruction material library is used to store material clips corresponding to the user's image instructions and the corresponding operations; the system-preset image instruction content, the user-defined image instruction content, and the material clips corresponding to image instructions can serve as comparison data for control instructions.
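The three libraries can be modeled with a simple data structure. This is a hypothetical sketch under assumed names, not an implementation from the application: preset and user-defined instruction content both serve as comparison data when looking up the operation associated with an image instruction, with the material library consulted last.

```python
from dataclasses import dataclass, field

@dataclass
class ImageVideoResourceLibrary:
    # System-preset image instruction content -> operation.
    system_library: dict = field(default_factory=dict)
    # User-defined image instruction content -> operation.
    custom_library: dict = field(default_factory=dict)
    # Image instruction -> (material clip, operation).
    material_library: dict = field(default_factory=dict)

    def lookup(self, instruction):
        """Return the operation matched against all three comparison sources."""
        for lib in (self.system_library, self.custom_library):
            if instruction in lib:
                return lib[instruction]
        entry = self.material_library.get(instruction)
        return entry[1] if entry else None

# Instruction names and operations below are invented for illustration.
lib = ImageVideoResourceLibrary(system_library={"palm_open": "start_recording"})
lib.custom_library["thumbs_up"] = "stop_recording"
```

The voice resource library described later has the same three-part shape, so a single structure of this kind could back both.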
具体的,如图3c所示,在录制过程中,图像/视频采集模块可以用于获取单摄像头采集的单路视频或多摄像头采集的多路视频流内容。可以理解,本申请实施例中,若电子设备的摄像头只有一个,则图像/视频采集模块需要对单摄像头获取到的单路视频做逻辑处理,进行分流,即可以对获取到的单路视频流内容进行备份,以实现获取多个视频流内容。每一份视频流内容可以按需进行后续的操作。其中,分流的主要方式如图3d所示,可以包括全面分流和智能分流:其中,全面分流指将整个视频文件复制成为多份,然后独立进行处理;智能分流是将有效部分复制为多份,有效部分主要是带有图像指令部分的录制内容。Specifically, as shown in FIG. 3c, during recording, the image/video acquisition module can be used to obtain a single video stream captured by a single camera or multiple video streams captured by multiple cameras. It can be understood that in the embodiment of the present application, if the electronic device has only one camera, the image/video acquisition module needs to perform logical processing on the single video stream acquired by the single camera and split it, that is, the acquired single video stream content can be backed up to obtain multiple video stream contents. Each video stream content can then be processed as needed. As shown in FIG. 3d, the main splitting methods can include full splitting and smart splitting: full splitting means copying the entire video file into multiple copies and then processing them independently; smart splitting means copying only the effective part into multiple copies, where the effective part is mainly the recorded content containing image instructions.
例如,如图3e所示,整体分流可以为将对实时录制的完整视频流A进行复制,此时可以保留原本的视频流内容A,复制的视频流内容用于图像控制处理,例如进行图像指令对应时间段的标记,并去除图像指令对应的时间段内容等处理。智能分流可以为对实时录制的视频流B中有效部分B1、B2和B3进行复制。此时可以保留原本的视频流内容B,复制的视频流内容B1、B2和B3用于图像控制处理。For example, as shown in FIG3e, overall diversion can be to copy the complete video stream A recorded in real time, in which case the original video stream content A can be retained, and the copied video stream content is used for image control processing, such as marking the time period corresponding to the image instruction, and removing the time period content corresponding to the image instruction. Intelligent diversion can be to copy the effective parts B1, B2 and B3 of the real-time recorded video stream B. In this case, the original video stream content B can be retained, and the copied video stream content B1, B2 and B3 are used for image control processing.
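The two splitting strategies can be sketched as follows. This is a minimal illustration with assumed function names; the application does not specify an implementation. Full splitting copies the whole stream, while smart splitting copies only the effective segments (such as B1, B2, and B3 above), represented here by (start, end) index ranges.

```python
import copy

def full_split(stream, n):
    """Full splitting: copy the entire stream n times for independent processing."""
    return [copy.deepcopy(stream) for _ in range(n)]

def smart_split(stream, effective_ranges):
    """Smart splitting: copy only the effective segments of the stream."""
    return [copy.deepcopy(stream[start:end]) for start, end in effective_ranges]

# Stand-in for the frames of real-time recorded stream B.
stream_b = list(range(10))

# Hypothetical effective segments B1, B2, B3 containing image instructions.
segments = smart_split(stream_b, [(0, 2), (4, 6), (8, 10)])
# The original stream content B is preserved; the copies are processed
# separately (e.g. marking and removing instruction time periods).
```

Smart splitting trades a small amount of segment-detection work for copying far less data than full splitting when instructions occupy only a fraction of the recording.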
可以理解,在一些实施例中,在电子设备的摄像头有多个时,图像/视频采集模块无需再对视频流内容进行复制,可以设置多个摄像头获取的多路视频流内容中的若干份视频流内容为用于图像处理,若干份视频流内容保留原文件。例如,电子设备的摄像头有三个,获取的视频流内容包括视频流内容A、视频流内容B和视频流内容C。则可以将视频流内容A和B不进行处理,保留原文件,设置视频流内容C用于图像控制处理等。或者可以将视频流内容A不进行处理,保留原文件,对视频流内容B进行标记处理,以用于生成多个素材片段,对视频流内容C进行标记处理,以获取裁剪后的录制文件。It can be understood that in some embodiments, when the electronic device has multiple cameras, the image/video acquisition module does not need to copy the video stream content, and several video stream contents among the multiple video stream contents acquired by multiple cameras can be set for image processing, and several video stream contents retain the original files. For example, the electronic device has three cameras, and the acquired video stream content includes video stream content A, video stream content B, and video stream content C. Then, the video stream contents A and B can be not processed, the original files can be retained, and the video stream content C can be set for image control processing, etc. Alternatively, the video stream content A can be not processed, the original file can be retained, the video stream content B can be marked for generating multiple material clips, and the video stream content C can be marked to obtain a cropped recording file.
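The multi-camera case above, where no copying is needed and each captured stream is simply given a role, can be sketched as follows. The role names are hypothetical: stream A keeps the original file, stream B is marked to generate material clips, and stream C is marked to produce the trimmed recording file.

```python
def assign_stream_roles(stream_ids):
    """Map each captured stream to a processing role; names are illustrative."""
    roles = ["keep_original", "mark_for_clips", "mark_for_trimmed_file"]
    # Extra streams beyond the defined roles fall back to the last role.
    return {sid: roles[min(i, len(roles) - 1)]
            for i, sid in enumerate(stream_ids)}

# Three camera streams as in the example above.
assignment = assign_stream_roles(["A", "B", "C"])
```

Because each stream already comes from its own camera, the roles can be applied directly without the duplication step required in the single-camera case.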
It can be understood that the image/video acquisition module can be used to send the acquired video to the operation control module.
The voice acquisition module can be used to build a voice resource library and, during recording, to obtain the audio captured by the microphone, to perform diversion control of a single audio stream, and to perform multi-channel control of multiple audio streams. The voice resource library may include a system voice library, a custom voice-control library and a voice-control-instruction material library. The system voice library stores the voice instruction contents preset by the system and their corresponding operations. The custom voice-control library stores user-defined voice instruction contents and their corresponding operations. The voice-control-instruction material library stores the material clips corresponding to the user's voice instructions and their corresponding operations.
Specifically, during recording, the voice acquisition module can obtain single-channel audio captured by a single microphone or multi-channel audio stream content captured by multiple microphones. It can be understood that in the embodiments of the present application, if the electronic device has only one microphone, the voice acquisition module needs to process the single-channel audio logically and divert it, that is, back up the acquired single-channel audio stream content so as to obtain multiple audio stream contents; each copy can then be operated on as needed. The main diversion methods include comprehensive diversion and intelligent diversion: comprehensive diversion copies the entire audio file into multiple copies, which are then processed independently; intelligent diversion copies only the effective parts, where an effective part may be the speech containing a voice instruction.
The operation control module can be used to recognize image instructions during recording and to control the electronic device to perform the corresponding operations.
As shown in FIG. 3f, the operation control module mainly includes an image/video recognition module and a system operation module. The image/video recognition module can be used for the fuzzy recognition and precise recognition described later in the embodiments of the present application, and the system operation module executes the operations corresponding to the image instructions recognized by the image/video recognition module.
The execution mode may be intelligent execution or interactive execution. Intelligent execution means that once an image instruction is recognized, the corresponding operation is executed directly. Interactive execution means that after an image instruction is recognized, inquiry information is displayed, or sent by voice, to confirm that the recognition is correct: for example, a pop-up window asking "Do you need to switch the camera?". When a user confirmation is detected, such as tapping a control representing confirmation or giving an image or voice instruction representing confirmation, the recognition is confirmed to be correct and the operation corresponding to the image instruction is executed.
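The choice between the two execution modes could be dispatched roughly as follows. This is a hypothetical sketch, not the disclosed implementation: `confirm` stands in for whatever UI mechanism (pop-up or voice prompt) collects the user's confirmation.

```python
def execute_image_instruction(operation, mode, confirm=None):
    """Run a recognized instruction's operation, either directly
    ("intelligent") or only after user confirmation ("interactive")."""
    if mode == "intelligent":
        return operation()
    if mode == "interactive":
        # confirm() might back a pop-up such as "Do you need to switch the camera?"
        if confirm is not None and confirm():
            return operation()
        return None  # recognition not confirmed; do nothing
    raise ValueError(f"unknown execution mode: {mode!r}")
```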
It can be understood that in an embodiment of the present application, as shown in FIG. 3g, the image/video recognition module can be used to obtain the correspondence between the image instruction contents preset by the system and operations, as well as the correspondence between user-defined image instruction contents and operations.
The audio-video synchronization module, as shown in FIG. 3h, is used for video processing and audio processing, that is, for synchronously marking (also called synchronization control of) the time periods corresponding to control instructions in the video stream content and the audio stream content, and for obtaining the marked audio stream content and video stream content, as shown in FIG. 3i. There are two modes of synchronization control: timely synchronization and overall synchronization.
In timely synchronization, each time an image instruction is received during recording, the corresponding time point is marked not only in the video stream but also, synchronously, in the audio stream content. In overall synchronization, the time points corresponding to image instructions are marked in the audio stream content only after the recording-end instruction is detected.
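The difference between the two synchronization modes can be shown with a small illustrative sketch (hypothetical class and names; a mark is just a time value here):

```python
class SyncMarker:
    """Timely sync marks the audio immediately with each instruction;
    overall sync copies the video marks to the audio only at the end."""
    def __init__(self, mode):
        self.mode = mode            # "timely" or "overall"
        self.video_marks = []
        self.audio_marks = []

    def on_image_instruction(self, t):
        self.video_marks.append(t)  # the video stream is always marked
        if self.mode == "timely":
            self.audio_marks.append(t)

    def on_recording_end(self):
        if self.mode == "overall":
            self.audio_marks = list(self.video_marks)
```

In both modes the two streams end up with identical mark lists; they differ only in when the audio marks are written.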
The shooting result generation module is used to generate a recording file based on the marked audio stream content and video stream content when the recording-end instruction is detected. For example, as shown in FIG. 3j, the shooting result generation module can be used to match mark points, remove the recorded content of the second time interval, and generate a first recording file. The first recording file may be the content spliced together after the content corresponding to the second time interval has been cut out of the recorded content.
Mark points can be matched in two ways. The first is to match the marked video stream content against the marked audio stream content: for example, the time points in the marked video stream content can be compared with those in the audio stream content, and if they are consistent, the match is successful. The second is to match the marked video stream content against other unprocessed (unmarked) video stream content: for example, the image frames at the start and end time points of the marked video stream content, or all the frames between them, can be compared with the frames at the corresponding positions in the unprocessed video stream content; if the frames are consistent, the match can be considered successful.
After a successful match, the content corresponding to the marked second time interval can be removed from the audio stream content and the video stream content.
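In outline, the two matching approaches might look like the following (hypothetical helper functions; a real implementation would compare frames by image similarity rather than by equality):

```python
def match_marks_by_time(video_marks, audio_marks):
    # Way one: marked video vs. marked audio - the time points must agree.
    return video_marks == audio_marks

def match_marks_by_frames(marked_stream, unmarked_stream, start, end):
    # Way two: compare the frames at (and between) the marked start and
    # end points against the same positions in an unmarked copy.
    return marked_stream[start:end + 1] == unmarked_stream[start:end + 1]
```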
In some embodiments, the shooting result generation module can also be used to generate an uncropped original recording file based on the recorded content, and to generate multiple cropped material clip files based on the recorded content, as described in detail below.
It can be understood that in the embodiments of the present application, the application layer may also include applications such as Gallery, Calendar, Call, Map, Navigation, WLAN, Bluetooth, Music, Video and Messaging.
The application framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer, and includes some predefined functions.
The application framework layer (Framework) may include a camera interface (camera API), a camera service and a camera framework, where the camera framework may include a video processing module and an audio processing module.
In some embodiments, the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like.
The window manager is used to manage window programs. It can obtain the display size, determine whether there is a status bar, lock the screen, capture the screen, and so on.
The content provider is used to store and retrieve data and make it accessible to applications. The data may include videos, images, audio, calls made and received, browsing history and bookmarks, phone books, and so on.
The view system includes visual controls, such as controls for displaying text and controls for displaying images, and can be used to build applications. A display interface may be composed of one or more views; for example, a display interface that includes a message notification icon may include a view for displaying text and a view for displaying images.
The telephony manager provides the communication functions of the electronic device, such as management of call states (connected, hung up, and so on).
The resource manager provides applications with various resources, such as localized strings, icons, images, layout files and video files.
The notification manager enables applications to display notification information in the status bar. It can be used to convey notification-type messages that disappear automatically after a short stay without user interaction; for example, it is used to notify the user that a download is complete, or to deliver message reminders. The notification manager may also present notifications in the form of a chart or scroll-bar text in the status bar at the top of the system, such as notifications from applications running in the background, or in the form of a dialog window on the screen; for example, prompting text information in the status bar, emitting a prompt sound, vibrating the electronic device, or flashing an indicator light.
The hardware abstraction layer (HAL) is an abstraction and encapsulation of the hardware devices, providing the Android or HarmonyOS system with a unified access interface across different hardware. Hardware manufacturers implement their own hardware control logic following the HAL standard, while developers need not concern themselves with the differences between hardware devices and simply access the hardware through the standard interfaces provided by the HAL.
The kernel layer lies between the hardware and the software. It contains at least a display driver, a camera driver, an audio driver, a sensor driver and a microphone (MIC) driver. The MIC driver drives the microphone in the hardware to capture audio.
The camera driver drives the camera to process image signals and obtain video.
To solve the above problems, an embodiment of the present application provides a video recording control method for an electronic device. The method includes: obtaining the recorded content during the recording process, where the recorded content may include video stream content, or both video stream content and audio stream content; recognizing the image instructions in the video stream content; when it is determined that an image instruction in the video stream content matches the image instruction content (also called characteristic behavior) corresponding to a target operation stored in the electronic device, executing the image instruction and marking a second time interval in the recorded content based on the first time interval in which the image instruction is located, thereby obtaining the marked recorded content; and, when the end instruction is obtained, obtaining a first recording file based on the marked recorded content. The first recording file may be the content spliced together after the content corresponding to the second time interval has been deleted from the recorded content.
In the embodiments of the present application, the image instruction content corresponding to a target operation may include schematic content.
In some embodiments, the schematic content may include specific text information, which may be text content consistent with the name of the operation or with its key information. For example, as shown in FIG. 4a, the schematic content corresponding to the lens-switching operation may be the text "switch lens". In some embodiments, the specific text information may also include fixed-pattern content defined by the user; for example, as shown in FIG. 4b, the schematic content corresponding to the lens-switching operation may be images of the digit "1" in various forms.
In some embodiments, the schematic content may also include specific image information, which may be an identifier representing a specific meaning. For example, as shown in FIG. 4c, the schematic content corresponding to the lens-switching operation may be an identifier representing lens switching.
Based on the above scheme, even when recording is controlled by image instructions during the recording process, the image instructions need not appear in the final recorded content. This makes it convenient to perform remote control in any manner during recording, which in turn facilitates recording in scenarios such as single-person recording and live streaming, and improves the user experience. In addition, the instructions used for recording control take multiple forms, so that users can control the recording process with different instruction forms in different scenarios.
The video recording control method mentioned in the embodiments of the present application is described in detail below in combination with the above electronic device. FIG. 5 shows a flow chart of a video recording control method in an embodiment of the present application, where the method can be executed by the electronic device. As shown in FIG. 5, the video recording control method may include:
501: A recording-start instruction is detected, and video recording is started.
It can be understood that in the embodiments of the present application, the user's recording-start instruction may be one triggered by tapping a recording-start control on the electronic device, or a remote-control instruction, such as a voice instruction or image instruction corresponding to starting recording, issued by the user.
For example, as shown in FIG. 6, when the user taps the recording control 001 in the camera application of the electronic device 100, the electronic device 100 can detect the user's recording-start instruction and start video recording. It can be understood that in some embodiments, the electronic device can also start recording when the user taps the recording control 001 in another application, such as a chat application, or triggers the start-recording function in some other way. As another example, as shown in FIG. 7, when the user displays the "fist" image corresponding to the recording-start operation, the electronic device can likewise detect the user's recording-start instruction and start video recording.
It can be understood that the recording-start instruction in the embodiments of the present application may be any instruction that can trigger the start of recording.
It can be understood that in some embodiments, the image acquisition module can control the electronic device to record video when the recording-start instruction is detected.
In the embodiments of the present application, the electronic device can record through the camera to obtain video stream content, or record through the camera and the microphone together to obtain video stream content and audio stream content respectively.
502: Image instructions during recording are recognized.
In the embodiments of the present application, the electronic device can detect and recognize the image instructions that the user inputs through the camera into the video stream content.
It can be understood that the electronic device can recognize an image instruction by detecting whether the image instruction during recording matches the image instruction content (also called characteristic behavior) corresponding to a target operation stored in the electronic device, and can execute the target operation when an image instruction is recognized, so as to control the video recording. The way in which image instructions are recognized and matched is described in detail later.
In the embodiments of the present application, as described above, the image instruction content corresponding to a target operation may include schematic content, and the schematic content may include specific text information, which can be understood as text content consistent with the name of the operation or with its key information. For example, as shown in FIG. 4a, the schematic content corresponding to the lens-switching operation may be the text "switch lens". In some embodiments, the specific text information may also include fixed-pattern content defined by the user; for example, as shown in FIG. 4b, the schematic content corresponding to the lens-switching operation may be images of the digit "1" in various forms.
In some embodiments, the schematic content may also include specific image information, which may be an identifier representing a specific meaning. For example, as shown in FIG. 4c, the schematic content corresponding to the lens-switching operation may be an identifier representing lens switching.
In some embodiments, the image instruction content corresponding to a target operation may also include specific body information or specific facial information, where the specific body information may include specific gesture information.
It can be understood that the above target operation may include any operation, such as switching the camera or adjusting the focal length.
In some embodiments, the image instruction content corresponding to the camera-switching operation may also include a rotation of the recorded picture, present in the video stream content acquired by the camera, caused by turning the direction of the electronic device.
For example, suppose the user is currently shooting another person with the rear camera of a mobile phone, with the camera pointing in a first shooting direction, and wants to take a selfie, that is, to switch to the front camera. The user can turn the phone by a first set angle to rotate the recorded picture, for example, turning the phone from the first shooting direction to a second shooting direction that differs from the first by the first set angle. When the phone detects that the rear camera's recorded picture has rotated because the phone was turned from the first shooting direction to the second shooting direction, it determines that an image instruction to switch the camera has been detected, and switches from the rear-camera shooting mode to the front-camera shooting mode.
In some embodiments, if the user is currently taking a selfie with the front camera, with the camera pointing in a third shooting direction, and wants to shoot another person with the rear camera, the user can likewise turn the phone by the first set angle to rotate the recorded picture, for example, turning the phone from the third shooting direction to a fourth shooting direction that differs from the third by the first set angle. When the phone detects that the front camera's recorded picture has rotated because the phone was turned from the third shooting direction to the fourth shooting direction, it determines that an image instruction to switch the camera has been detected, and switches from the front-camera shooting mode to the rear-camera shooting mode.
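The rotation-triggered switch can be sketched as follows. The angle value and function names are hypothetical (the disclosure does not fix the "first set angle"); headings are in degrees.

```python
FIRST_SET_ANGLE = 180  # hypothetical value of the "first set angle", in degrees

def picture_rotated_by_set_angle(old_heading, new_heading):
    # Smallest rotation between the two shooting directions, mapped to [0, 180].
    delta = abs(new_heading - old_heading) % 360
    return min(delta, 360 - delta) >= FIRST_SET_ANGLE

def switch_camera(mode):
    # Toggle between the two shooting modes.
    return "front" if mode == "rear" else "rear"

# Turning the phone from shooting direction 0 to 180 degrees triggers the switch.
mode = "rear"
if picture_rotated_by_set_angle(0, 180):
    mode = switch_camera(mode)
```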
In some embodiments, the electronic device may recognize image instructions by fuzzy recognition, that is, by recognizing the approximate content or key information of the image. For example, when the image instruction contains a key part consistent with the text content of a stored image instruction, or when the image similarity reaches a set similarity, the device determines that the image instruction has been recognized and obtains the corresponding operation.
In some embodiments, the electronic device may recognize images by precise recognition, that is, only when the image instruction is completely consistent with the stored image instruction content does the device determine that the image instruction has been recognized and obtain the corresponding operation.
For example, suppose the image instruction content stored in the electronic device for the lens-switching operation is the text "switch lens" or a two-fingers-together gesture, and recognition of this image instruction is set to precise recognition. Then only when the electronic device records the text "switch lens" or the two-fingers-together gesture does it confirm that a "switch lens" image instruction has been detected and execute the corresponding lens-switching operation.
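As a minimal sketch of the two recognition modes over text content, assuming a similarity threshold of 0.8 (hypothetical; a real device would compare images with a vision model, not strings with `difflib`):

```python
import difflib

def precise_recognition(candidate, stored):
    # The instruction must be completely consistent with the stored content.
    return candidate == stored

def fuzzy_recognition(candidate, stored, set_similarity=0.8):
    # Match if the stored key part is contained in the candidate, or if
    # the overall similarity reaches the set similarity.
    if stored in candidate:
        return True
    ratio = difflib.SequenceMatcher(None, candidate, stored).ratio()
    return ratio >= set_similarity
```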
It can be understood that in some embodiments, to ensure that image instruction recognition is correct, the electronic device can also display inquiry information after obtaining the recognition result, or send inquiry information by voice, to confirm that the recognition is correct: for example, a pop-up window asking "Do you need to switch the camera?". When a user confirmation is detected, such as tapping a control representing confirmation or giving an image instruction representing confirmation, the recognition is confirmed to be correct and the operation corresponding to the image instruction is executed.
In some embodiments, when the recognition mode is fuzzy recognition, a recognized image instruction may correspond to multiple operations. In that case, inquiry information can also be displayed to confirm the user's intention. If the user makes no selection within a set time, the first operation set by the system can be executed by default.
It can be understood that in some embodiments, the image/video acquisition module can be used to detect image instructions during recording.
503: The operation corresponding to the image instruction is executed, and based on the first time interval in which the image instruction is located in the recorded content, a second time interval in the recorded content is marked, and the marked recorded content is obtained.
It can be understood that in the embodiments of the present application, after detecting an image instruction, the electronic device can execute the corresponding operation.
In the embodiments of the present application, when the recorded content includes only video stream content, marking the second time interval in the recorded content based on the first time interval in which the image instruction is located may include:
The electronic device can directly take the start and end time points of the first time interval in which the detected image instruction is located as the start and end time points of the second time interval in the video stream content, and mark those start and end time points in the video stream content.
In the embodiments of the present application, when the recorded content includes video stream content and audio stream content, marking the second time interval in the recorded content based on the first time interval in which the image instruction is located may include:
The electronic device can directly take the start and end time points of the first time interval in which the detected image instruction is located as the start and end time points of the second time interval in both the audio stream content and the video stream content, and mark those start and end time points in both.
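The direct marking step can be sketched as follows (hypothetical structures; a mark is stored as a (start, end) pair on the mark list of each stream, whether that is video only or video plus audio):

```python
def mark_second_interval(first_interval, *mark_lists):
    # Take the start/end of the first time interval (where the image
    # instruction sits) directly as the second time interval, and record
    # it on every stream's mark list that was passed in.
    start, end = first_interval
    for marks in mark_lists:
        marks.append((start, end))
    return (start, end)
```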
In some embodiments, to further ensure the accuracy of the marking and prevent abrupt changes in the recorded content, the start and end time points of the first time interval in which the image instruction is located can additionally be optimized in the video stream content and the audio stream content, and the optimized start and end time points can be taken as the start and end time points of the second time interval.
In some embodiments, as shown in FIG. 8, the optimized start and end time points can be obtained by an audio optimization method. That is, the content before and after the start and end time points of the first time interval in the audio stream content is compared in order to widen the marking range, and the start and end time points of the widened range are determined (that is, the start and end time points of a third time interval are determined). The widened start and end time points are then passed through to the video stream content for a reverse comparison, to judge whether the video shows an obvious conflict: the device checks whether the similarity between every two adjacent image frames in the widened part of the video stream content (the part of the third time interval outside the first time interval) is greater than a fifth threshold. If every such similarity is greater than the fifth threshold, the widened start and end time points are taken as the start and end time points of the second time interval. If any similarity is less than or equal to the fifth threshold, the start and end time points of the first time interval in which the image instruction is located are taken as the start and end time points of the second time interval in the audio stream content and the video stream content.
The way in which the widened start and end time points are determined is described in detail later.
504:检测到录制结束指令后,基于标记后的录制内容,生成第一录制文件。504: After detecting the recording end instruction, a first recording file is generated based on the marked recording content.
可以理解,本申请实施例中,检测到录制结束指令后,可以对标记后的视频流内容和标记后的音频流内容中第二时间区间对应的内容进行裁剪,并对裁剪后的剩余内容进行拼接,生成第一录制文件。It can be understood that in an embodiment of the present application, after detecting the recording end instruction, the content corresponding to the second time interval in the marked video stream content and the marked audio stream content can be cropped, and the remaining content after cropping can be spliced to generate a first recording file.
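The cropping-and-splicing step can be illustrated with a short Python sketch; the function name and the representation of intervals as (start, end) pairs in seconds are assumptions for illustration, not the disclosed implementation.

```python
def splice_recording(total_duration, marked_intervals):
    """Return the segments kept after cutting out the marked second time intervals."""
    kept, cursor = [], 0
    for start, end in sorted(marked_intervals):
        if start > cursor:
            kept.append((cursor, start))   # keep everything up to the cut
        cursor = max(cursor, end)          # skip over the cut interval
    if cursor < total_duration:
        kept.append((cursor, total_duration))
    return kept
```

For example, a 17-second recording with marked intervals (3, 5) and (15, 17) keeps (0, 3) and (5, 15), i.e. 13 seconds in total.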
在一些实施例中,还可以基于未进行标记的录制内容,生成完整的原录制文件。在一些实施例中,还可以生成标记有第二时间区间的开始时刻点和结束时刻点的第二录制文件,以便于后续处理。In some embodiments, a complete original recording file can also be generated based on the unmarked recording content. In some embodiments, a second recording file marked with the start time point and the end time point of the second time interval can also be generated to facilitate subsequent processing.
在一些实施例中,还可以基于录制内容生成多个素材片段,即第三录制文件集。多个素材片段中可以包括图像指令类的片段,即第一类录制片段文件;也可以包括录制内容类的片段,即第二类录制片段文件。其中,图像指令类的片段包括图像指令对应时间段所对应的录制内容,录制内容类的片段为录制的视频中裁剪掉图像指令对应时间段的剩余的各个片段内容。 In some embodiments, multiple material clips, i.e., a third recording file set, can also be generated based on the recorded content. The multiple material clips may include image-instruction clips, i.e., the first type of recording clip files, and may also include recorded-content clips, i.e., the second type of recording clip files. The image-instruction clips include the recorded content corresponding to the time periods of the image instructions, and the recorded-content clips are the clips remaining in the recorded video after the time periods of the image instructions are cut out.
在一些实施例中,电子设备可以将图像指令类的片段与对应的操作存储在图像指令素材库中,以用于后续图像指令的识别和匹配。电子设备可以存储录制内容类的片段,可以便于用户进行单独查看,或进行拼接合成等剪辑处理。In some embodiments, the electronic device can store the image-instruction clips together with their corresponding operations in an image instruction material library, for use in subsequent image instruction recognition and matching. The electronic device can also store the recorded-content clips, so that the user can view them separately or perform editing such as splicing and synthesis.
可以理解,本申请实施例中,为了进一步确保标记的准确性,在进行裁剪之前,还可以对标记过的音频流内容和视频流内容中的开始时刻点和结束时刻点进行匹配。例如,可以将标记过的视频流内容与标记过的音频流内容或未标记过的视频流内容进行匹配,当确认匹配成功,则确定对音频流内容和视频流内容中开始时刻点和结束时刻点之间的内容进行裁剪,生成第一录制文件。It can be understood that in the embodiment of the present application, in order to further ensure the accuracy of the marking, the start time point and the end time point in the marked audio stream content and the video stream content can also be matched before cutting. For example, the marked video stream content can be matched with the marked audio stream content or the unmarked video stream content. When it is confirmed that the match is successful, it is determined to cut the content between the start time point and the end time point in the audio stream content and the video stream content to generate the first recording file.
其中,匹配方式可以包括:比对标记过的视频流内容中的时刻点和音频流内容中的时刻点是否一致,若一致,则证明匹配成功。或者,将标记过的视频流内容中开始时刻点和结束时刻点的图像帧与未处理的视频流内容中对应开始时刻点和结束时刻点的图像帧进行比对,或将标记过的视频流内容中开始时刻点与结束时刻点之间的整体图像帧与未处理的视频流内容中开始时刻点与结束时刻点之间的整体图像帧进行比对,图像帧一致则可以认为匹配成功。The matching may include: comparing whether the time points in the marked video stream content are consistent with the time points in the marked audio stream content; if they are consistent, the match is successful. Alternatively, the image frames at the start and end time points in the marked video stream content may be compared with the image frames at the corresponding start and end time points in the unprocessed video stream content, or all the image frames between the start and end time points in the marked video stream content may be compared with all the image frames between the corresponding time points in the unprocessed video stream content; if the image frames are consistent, the match is considered successful.
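As an illustration only, the time-point consistency check between the marked video stream and the marked audio stream could look like the following Python sketch (the function name, the (start, end) mark representation, and the optional `tolerance` parameter are assumptions):

```python
def marks_match(video_marks, audio_marks, tolerance=0.0):
    """True when both streams carry the same set of (start, end) marks."""
    if len(video_marks) != len(audio_marks):
        return False
    # Compare marks pairwise after sorting, allowing a small tolerance.
    return all(abs(vs - as_) <= tolerance and abs(ve - ae) <= tolerance
               for (vs, ve), (as_, ae) in zip(sorted(video_marks),
                                              sorted(audio_marks)))
```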
基于上述方案,即使在录制过程进行图像指令等录制控制,最终录制内容也不会存在额外的图像指令内容,从而便于进行单人录制、直播等场景的录制。此外,用于录制控制的图像指令可以包括多种形式的图像指令,便于用户在不同的场景采用不同的指令形式进行录制过程的控制。Based on the above scheme, even if recording control such as image instructions is performed during the recording process, there will be no additional image instruction content in the final recording content, which is convenient for recording scenes such as single-person recording and live broadcast. In addition, the image instructions for recording control can include multiple forms of image instructions, which is convenient for users to use different instruction forms to control the recording process in different scenes.
下面以图9a-9e中在录制过程中,用户通过图像指令进行录制过程中的控制场景为例,简要说明本申请中的方案。The following briefly describes the solution in the present application by taking the scenario in which the user controls the recording process through image instructions during the recording process in Figures 9a-9e as an example.
如图9a所示,用户开启了后置模式进行录制,图9a中展示了后置摄像头的录制画面,此时,如图9b所示,用户展示了切换摄像头操作对应的“切换摄像头”文字内容,例如,带有切换摄像头文字的纸张,电子设备录制到“切换摄像头”文字的图像,则执行对应的切换摄像头操作,例如将后置录制模式转换为前置录制模式。并在确定该“切换摄像头”的图像指令在视频流内容和音频流内容中对应的时间段为00:03-00:05时,对其中一份视频流内容和其中一份音频流内容中的00:03-00:05时间段进行标记。例如,图9c中展示了执行图像指令对应的操作后,前置录制模式录制到的画面。可以理解,本场景中提及的视频流内容和音频流内容中图像指令对应的时间段可以指优化后的第二时间区间对应的时间段。As shown in FIG9a, the user turns on the rear mode for recording, and FIG9a shows the recording screen of the rear camera. At this time, as shown in FIG9b, the user shows the text content of "switch camera" corresponding to the camera switching operation, for example, a paper with the text of "switch camera", and the electronic device records the image of the text of "switch camera", then performs the corresponding camera switching operation, for example, converting the rear recording mode to the front recording mode. And when it is determined that the time period corresponding to the image instruction of "switch camera" in the video stream content and the audio stream content is 00:03-00:05, the 00:03-00:05 time period in one of the video stream content and one of the audio stream content is marked. For example, FIG9c shows the screen recorded in the front recording mode after executing the operation corresponding to the image instruction. It can be understood that the time period corresponding to the image instruction in the video stream content and the audio stream content mentioned in this scenario can refer to the time period corresponding to the optimized second time interval.
且在后续录制过程中,如图9d所示,用户展示了停止录制操作对应的“ok”手势动作,电子设备检测到该“停止录制”对应的图像指令,则执行对应的操作,例如停止录制,并在确定该图像指令在视频流内容和音频流内容中对应的时间段为00:15-00:17时间段时,对视频流内容和音频流内容中的00:15-00:17时间段进行标记,并对视频流内容中和音频流内容中图像指令对应时间段00:03-00:05、00:15-00:17对应的内容进行裁剪,获取录制文件。例如,图9e中展示了最终的录制文件。从图9e中可以看出,整个录制过程为17秒,最终的录制文件裁剪了图像指令对应时间段后,录制文件的总时长为13秒。And in the subsequent recording process, as shown in FIG9d, the user shows the "ok" gesture action corresponding to the stop recording operation, and the electronic device detects the image instruction corresponding to the "stop recording", then performs the corresponding operation, such as stopping recording, and when it is determined that the time period corresponding to the image instruction in the video stream content and the audio stream content is the 00:15-00:17 time period, the 00:15-00:17 time period in the video stream content and the audio stream content is marked, and the content corresponding to the time periods 00:03-00:05 and 00:15-00:17 in the video stream content and the audio stream content corresponding to the image instruction is cropped to obtain the recording file. For example, the final recording file is shown in FIG9e. As can be seen from FIG9e, the entire recording process is 17 seconds, and after the final recording file is cropped from the time period corresponding to the image instruction, the total length of the recording file is 13 seconds.
下面对上述步骤502中对图像指令进行识别匹配的方式进行详述:The following is a detailed description of the method of identifying and matching the image instructions in the above step 502:
可以理解,在一些实施例中,电子设备中可以存储有图像/视频资源库,图像/视频资源库可以包括图像/视频系统库和自定义图像/视频控制库。其中,图像/视频系统库用于存储系统预设的图像指令内容以及对应的操作,自定义图像/视频控制库用于存储用户自定义设置的图像指令内容以及对应的操作,且各图像/视频资源库中可以限定识别方式为模糊识别或精确识别。It can be understood that in some embodiments, an image/video resource library may be stored in the electronic device, and the image/video resource library may include an image/video system library and a custom image/video control library. The image/video system library is used to store system-preset image instruction content and the corresponding operations, and the custom image/video control library is used to store user-defined image instruction content and the corresponding operations; each image/video resource library may specify the recognition mode as fuzzy recognition or precise recognition.
本申请实施例中操作对应的图像指令内容可以为用户自定义的或***预设的任意可实施的内容。可以理解,本申请实施例中所指的图像指令内容不局限于静态的一张图或者一帧图像,也可以是一段时间内连续的多帧图像指令内容。The image instruction content corresponding to the operation in the embodiment of the present application can be any executable content customized by the user or preset by the system. It can be understood that the image instruction content referred to in the embodiment of the present application is not limited to a static image or a frame of image, but can also be a continuous multi-frame image instruction content within a period of time.
例如,本申请实施例中,操作对应的图像指令内容可以包括肢体动作,如图10a所示,肢体动作可以包括特定肢体信息和特定脸部信息。其中,特定肢体信息可以包括特定手势信息,特定手势信息可以为特定手势图像,如图10b中所示,特定手势图像可以包括【OK】、【点赞】、【比yeah】、【爱心】、【握拳】、【爱你】等手势图像,不同的手势图像可以对应不同的操作,比如【OK】代表“停止录制”,【点赞】代表“放大焦距”等。脸部信息可以为脸部图像,其中脸部图像可以包括【点头】、【摇头】、【笑脸】等脸部图像,不同的脸部图像可以对应不同的操作,例如,【笑脸】对应的操作可以为“抓拍”,【点头】对应的操作可以为“停止录制”,【摇头】对应的操作可以为“切换镜头”等。在一些实施例中,特定肢体信息还可以包括特定人体姿势信息,特定人体姿势信息可以包括【转圈】、【下蹲】、【大字姿势】、【跳跃】等姿势图像,不同的姿势图像可以对应不同的操作,例如,【转圈】对应的操作可以为“切换摄像头”,【跳跃】对应的操作可以为“放大焦距”等。For example, in an embodiment of the present application, the image instruction content corresponding to an operation may include body movements. As shown in FIG10a, body movements may include specific body information and specific facial information. The specific body information may include specific gesture information, which may be a specific gesture image. As shown in FIG10b, the specific gesture images may include [OK], [Like], [Yeah], [Heart], [Fist], [Love You], etc., and different gesture images may correspond to different operations; for example, [OK] represents "stop recording" and [Like] represents "zoom in". The facial information may be a facial image, which may include facial images such as [Nod], [Shake Head], and [Smiley Face], and different facial images may correspond to different operations; for example, [Smiley Face] may correspond to "snapshot", [Nod] to "stop recording", and [Shake Head] to "switch lens". In some embodiments, the specific body information may also include specific human posture information, which may include posture images such as [Spin], [Squat], [Spread-eagle], and [Jump]; different posture images may correspond to different operations, for example, [Spin] may correspond to "switch camera" and [Jump] to "zoom in".
可以理解,本申请实施例中,操作对应的图像指令内容还可以包括示意内容,其中,如图11中所示,示意内容可以包括特定文字信息和特定图像信息。It can be understood that in the embodiment of the present application, the image instruction content corresponding to the operation may also include schematic content; as shown in FIG. 11, the schematic content may include specific text information and specific image information.
在一些实施例中,特定文字信息可以包括与操作的名称或者与关键信息一致的文字内容。例如,如前述图4a所示,切换镜头操作对应的示意内容可以为“切换镜头”文字。在一些实施例中,特定文字信息还可以包括用户自定义设置的固定模式内容,例如,如前述图4b所示,切换镜头操作对应的示意内容可以为数字“1”对应的各种形式的图像。In some embodiments, the specific text information may include text content that is consistent with the name of the operation or with the key information. For example, as shown in FIG. 4a above, the schematic content corresponding to the switch lens operation may be the text "switch lens". In some embodiments, the specific text information may also include fixed mode content that is customized by the user. For example, as shown in FIG. 4b above, the schematic content corresponding to the switch lens operation may be various forms of images corresponding to the number "1".
在一些实施例中,特定图像信息可以是表示特定含义的标识。例如,如前述图4c所示,切换镜头操作对应的示意内容可以为表示切换镜头的标识。In some embodiments, the specific image information may be a logo indicating a specific meaning. For example, as shown in FIG. 4c above, the schematic content corresponding to the lens switching operation may be a logo indicating lens switching.
表1中示出了本申请实施例中部分图像指令内容与操作的对应表,如表1所示,电子设备中其中一图像/视频资源库存储的切换镜头操作对应的图像指令内容可以为“切换镜头”文字,或双指并拢手势,识别方式可以为模糊识别(其中,表1中记录是否一一对应选项栏中,若选择为是,则限定识别方式为精确识别,若选择为否,则限定识别方式为模糊识别)。放大焦距操作对应的图像指令内容可以为“放大焦距”文字,或五指摊开手势,识别方式可以为模糊识别。切换到前置摄像头镜头操作对应的图像指令内容可以为“切换到前置”文字,或对号手势,识别方式可以为模糊识别。Table 1 shows a correspondence table between some image instruction contents and operations in an embodiment of the present application. As shown in Table 1, the image instruction content corresponding to the lens switching operation stored in one of the image/video resource libraries in the electronic device can be the text "switch lens" or a two-finger gesture, and the recognition method can be fuzzy recognition (wherein, in the option column in Table 1, whether there is a one-to-one correspondence is recorded. If yes is selected, the recognition method is limited to precise recognition, and if no is selected, the recognition method is limited to fuzzy recognition). The image instruction content corresponding to the zoom focus operation can be the text "zoom focus" or a five-finger spread gesture, and the recognition method can be fuzzy recognition. The image instruction content corresponding to the switch to the front camera lens operation can be the text "switch to front" or a check mark gesture, and the recognition method can be fuzzy recognition.
表1:图像指令内容与操作的对应表
Table 1: Correspondence between image command content and operation

操作 (Operation) | 图像指令内容 (Image instruction content) | 是否一一对应 (One-to-one) | 识别方式 (Recognition mode)
切换镜头 (Switch lens) | “切换镜头”文字 / 双指并拢手势 | 否 (No) | 模糊识别 (Fuzzy recognition)
放大焦距 (Zoom in) | “放大焦距”文字 / 五指摊开手势 | 否 (No) | 模糊识别 (Fuzzy recognition)
切换到前置摄像头 (Switch to front camera) | “切换到前置”文字 / 对号手势 | 否 (No) | 模糊识别 (Fuzzy recognition)
下面对本申请实施例中提及的对图像指令进行识别匹配的方法进行说明,如图12所示,匹配方法可以包括:图像帧比对法和图像局部比对法。The following is an explanation of the method for identifying and matching image instructions mentioned in the embodiment of the present application. As shown in FIG. 12 , the matching method may include: an image frame comparison method and an image local comparison method.
首先对本申请实施例中的图像帧比对法进行说明。如图13a所示,A文件代表的是实时录制的视频流内容,B文件代表的是用于图像控制处理的视频流内容;可以理解,A文件可以不进行标记等处理,保留原文件,以便后续比对或其他需求,B文件可以用于进行控制指令对应的时刻点的标记或裁剪等处理。其中,B文件中的图像1可以代表识别出的“切换摄像头”操作对应的图像指令,图像2可以代表“停止录制”操作对应的图像指令。图像1具有起始时刻点和结束时刻点,图像2也具有起始时刻点和结束时刻点。First, the image frame comparison method in the embodiment of the present application is explained. As shown in Figure 13a, file A represents the video stream content recorded in real time, and file B represents the video stream content used for image control processing; it can be understood that file A can be kept without marking or other processing for subsequent comparison or other needs, and file B can be used for marking or cropping the time points corresponding to the control instructions. Image 1 in file B can represent the image instruction corresponding to the identified "switch camera" operation, and image 2 can represent the image instruction corresponding to the "stop recording" operation. Image 1 has a starting time point and an ending time point, and image 2 also has a starting time point and an ending time point.
其中,图像指令进行识别匹配的过程如图13b所示:电子设备会将B文件中的每个图像帧与图像资源库中的操作对应的图像指令内容(或称为特征行为)中的每个图像帧进行比对,存在一图像指令的开始至结束的图像帧与资源库中存储的任意图像指令内容的开始到结束的图像帧匹配即完全一致时,则确认识别成功。Among them, the process of image instruction recognition and matching is shown in Figure 13b: the electronic device will compare each image frame in the B file with each image frame in the image instruction content (or called feature behavior) corresponding to the operation in the image resource library. When there is an image frame from the beginning to the end of an image instruction that matches the image frame from the beginning to the end of any image instruction content stored in the resource library, that is, it is completely consistent, then the recognition is confirmed to be successful.
例如,可以将实时采集的图像中第一个能与资源库中的任意图像指令内容匹配的图像帧,例如第一图像帧,对应的时刻点标记为图像指令所在的第一时间区间的开始时刻点,将最后一个能与资源库中的图像指令内容匹配上的图像帧,例如第二图像帧,对应的时刻点标记为第一时间区间的结束时刻点,如图13c所示,当图像帧完全一致,即开始时刻点和结束时刻点之间的图像帧均一致,则可以证明识别成功。当图像帧存在部分不一致,例如有始无终的情况,即如图13d所示的开始时刻点的图像帧一致,但结束时刻点的图像帧不一致;或者有终无始的情况,即图13e所示的开始时刻点的图像帧不一致,结束时刻点的图像帧一致的情况,均确定识别失败。For example, the first image frame in the real-time captured image that can match any image instruction content in the resource library, such as the first image frame, can be marked as the starting time point of the first time interval where the image instruction is located, and the last image frame that can match the image instruction content in the resource library, such as the second image frame, can be marked as the ending time point of the first time interval. As shown in Figure 13c, when the image frames are completely consistent, that is, the image frames between the starting time point and the ending time point are consistent, it can be proved that the recognition is successful. When there are partial inconsistencies in the image frames, such as the case of having a beginning but no end, that is, the image frames at the starting time point are consistent as shown in Figure 13d, but the image frames at the ending time point are inconsistent; or the case of having an end but no beginning, that is, the image frames at the starting time point are inconsistent as shown in Figure 13e, and the image frames at the ending time point are consistent, it is determined that the recognition has failed.
当识别成功后,可以在A文件中进一步识别。例如,可以将标记好的开始时刻点和结束时刻点透传至A文件中,即在A文件中确定出一致的开始时刻点和结束时刻点,并将A文件中对应的开始时刻点和结束时刻点之间的图像帧与资源库中的对应图像指令内容的图像帧进行比对,匹配一致则证明图像识别成功,如果不一致则证明识别失败。在上述双向循环标记法识别的过程中,两次识别均成功才算识别成功,只要存在一次识别失败,均可以认为本次的图像指令识别是失败的。识别失败后,本次识别过程中产生的标记需要全部清除。如此,通过上述循环双向识别方法,能够有效提高图像识别的准确度。After the recognition succeeds, further recognition can be performed in file A. For example, the marked start and end time points can be passed through to file A, that is, the same start and end time points are determined in file A, and the image frames between the corresponding start and end time points in file A are compared with the image frames of the corresponding image instruction content in the resource library. If they match, the image recognition succeeds; if they do not, the recognition fails. In the above bidirectional circular marking process, the recognition counts as successful only when both recognitions succeed; if either one fails, the image instruction recognition of this round is considered failed, and all marks generated in this round need to be cleared. In this way, the above circular bidirectional recognition method can effectively improve the accuracy of image recognition.
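The bidirectional (two-pass) recognition can be sketched in Python as follows; exact frame-by-frame equality via a `frames_equal` callback and the sliding search are simplifying assumptions of the sketch, not the disclosed implementation:

```python
def bidirectional_match(b_frames, a_frames, template, frames_equal):
    """Find the template span in the B file, then re-verify it in the A file.

    Returns the (start, end) frame span only when both passes succeed;
    returns None otherwise, i.e. all marks of this attempt are discarded.
    """
    n, m = len(b_frames), len(template)
    for start in range(n - m + 1):
        if all(frames_equal(b_frames[start + k], template[k]) for k in range(m)):
            end = start + m - 1
            # Reverse check: the same time points in the untouched A file
            # must also match the stored instruction content.
            if all(frames_equal(a_frames[start + k], template[k]) for k in range(m)):
                return (start, end)
            return None  # conflict between the two passes: recognition fails
    return None
```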
可以理解,上述提及的图像帧比对法可以如图13f所示,将B文件中的图像帧的每个像素点与图像资源库中的对应图像指令内容中对应图像帧的全部像素点进行比对。在一些实施例中,造成图像匹配结果为不匹配的原因可能是如图13g所示存在坏点(或噪点),而并不是实际意义上的图像指令不匹配,从而造成匹配结果错误。It can be understood that the above-mentioned image frame comparison method can be shown in Figure 13f, where each pixel of the image frame in the B file is compared with all the pixels of the corresponding image frame in the corresponding image instruction content in the image resource library. In some embodiments, the reason for the image matching result being a mismatch may be the presence of bad pixels (or noise points) as shown in Figure 13g, rather than an actual mismatch of the image instruction, which results in an erroneous matching result.
为解决上述问题,本申请实施例提供一种差异点二次分析法进行上述图像帧的比对,如图13h所示,差异点二次分析法包括坏点去除法和差异点忽略法。To solve the above problem, an embodiment of the present application provides a difference point secondary analysis method for comparing the above image frames, as shown in FIG13h , the difference point secondary analysis method includes a bad point removal method and a difference point ignoring method.
其中,坏点去除法是指在进行两帧图像对比之前,首先分析图像中是否存在坏点,当存在坏点时,将坏点去除后,再进行图像帧的比对。The bad pixel removal method means that before two image frames are compared, the image is first analyzed for bad pixels; if bad pixels exist, they are removed before the frames are compared.
一种实施例中,判断是否存在坏点的方式可以包括:判断任意像素点与周围像素点之间是否存在明显跳变,例如,可以判断各像素点与相邻像素点之间的显示参数差值(例如色调值)是否高于第一阈值,若高于第一阈值,则确定该像素点与周围像素点之间存在明显跳变,则确定该像素点为坏点。In one embodiment, a method for determining whether there are bad pixels may include: determining whether there is an obvious jump between any pixel and surrounding pixels. For example, it may be determined whether the display parameter difference (such as hue value) between each pixel and adjacent pixels is higher than a first threshold. If it is higher than the first threshold, it is determined that there is an obvious jump between the pixel and surrounding pixels, and the pixel is determined to be a bad pixel.
另一种实施例中,判断是否存在坏点的方式可以包括:比较各像素点与理论意义上的坏点的相似度,当相似度超过第三阈值(例如95%),则可以认为该像素点就是坏点。In another embodiment, the method of determining whether there is a bad pixel may include: comparing the similarity between each pixel and a bad pixel in a theoretical sense, and when the similarity exceeds a third threshold (eg, 95%), the pixel may be considered to be a bad pixel.
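The neighbour-jump test for bad pixels can be sketched as below; representing a frame as a list of rows of scalar values and repairing flagged pixels with the neighbour average are illustrative assumptions, not the disclosed implementation:

```python
def remove_bad_pixels(image, first_threshold):
    """Replace pixels that jump sharply relative to all 4-neighbours."""
    h, w = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            neighbours = [image[ny][nx]
                          for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                          if 0 <= ny < h and 0 <= nx < w]
            # An obvious jump against every neighbour marks a bad pixel.
            if all(abs(image[y][x] - v) > first_threshold for v in neighbours):
                cleaned[y][x] = sum(neighbours) / len(neighbours)
    return cleaned
```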
差异点忽略法是指当两帧图像对比后的得到相似度大于第四阈值(例如99%),而两帧图像中任意两个不匹配的像素点的位置差距大于第二阈值,则可以认为这些具有差异的像素点是无关紧要的忽略点,可以不进行匹配,此时确定两帧图像为匹配图像。例如,如图13i中所示,两帧图像中具有差异的两个像素点距离较远,位置差距大于第二阈值,则可以认为两个像素点均是无关紧要的忽略点。The difference point ignoring method means that when the similarity obtained after comparing two frames of images is greater than the fourth threshold value (for example, 99%), and the position difference between any two unmatched pixel points in the two frames of images is greater than the second threshold value, then these different pixel points can be considered as insignificant ignored points, and no matching is required. At this time, the two frames of images are determined to be matched images. For example, as shown in FIG. 13i, the two different pixel points in the two frames of images are far apart, and the position difference is greater than the second threshold value, then both pixel points can be considered as insignificant ignored points.
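A sketch of the difference-point ignoring rule, under the assumptions that similarity is measured as the fraction of identical pixels and distance as Euclidean distance between pixel coordinates:

```python
def frames_match_ignoring_outliers(frame_a, frame_b,
                                   fourth_threshold, second_threshold):
    """Match two frames while ignoring isolated, far-apart mismatched pixels."""
    mismatches = [(y, x) for y, row in enumerate(frame_a)
                  for x, v in enumerate(row) if v != frame_b[y][x]]
    total = len(frame_a) * len(frame_a[0])
    if 1 - len(mismatches) / total <= fourth_threshold:
        return False  # overall similarity not high enough
    # Every pair of mismatched pixels must be farther apart than the
    # second threshold for the mismatches to count as ignorable points.
    for i in range(len(mismatches)):
        for j in range(i + 1, len(mismatches)):
            dy = mismatches[i][0] - mismatches[j][0]
            dx = mismatches[i][1] - mismatches[j][1]
            if (dy * dy + dx * dx) ** 0.5 <= second_threshold:
                return False
    return True
```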
下面对本申请实施例中提及的图像局部比对方法进行说明。可以理解,本申请实施例中的局部比对法与上述图像帧比对法基本一致,区别在于,上述图像帧比对法中,电子设备会将B文件中的每个完整图像帧与图像资源库中的每个完整图像帧进行比对,而局部比对法中是将B文件中的特定部分图像帧与图像资源库中的特定部分图像帧进行比对。The local image comparison method mentioned in the embodiment of the present application is described below. It can be understood that the local comparison method in the embodiment of the present application is basically the same as the above-mentioned image frame comparison method, the difference is that in the above-mentioned image frame comparison method, the electronic device will compare each complete image frame in the B file with each complete image frame in the image resource library, while in the local comparison method, a specific part of the image frame in the B file is compared with a specific part of the image frame in the image resource library.
如图14a所示,图像局部比对法可以包括任意位置移动比对法、任意位置扩大比对法和固定位置比对法。As shown in FIG. 14 a , the local image comparison method may include an arbitrary position moving comparison method, an arbitrary position enlarging comparison method, and a fixed position comparison method.
图14b示出了本申请实施例中一种任意位置移动比对法的示意图。如图14b所示,任意位置移动比对法是在每帧图像中按照一定轨迹或一定的切换顺序(例如,从左到右依次遍历)对各图像块进行移动遍历匹配,例如,如图14b中所示,可以先将图像帧中左上角的第二图像块与图像资源库中的对应图像帧的图像块进行匹配,若匹配不成功,则将与第二图像块右侧相邻的第一图像块与图像资源库中的对应图像帧的图像块进行匹配,若匹配不成功,则将与第一图像块右侧相邻的第三图像块与图像资源库中的对应图像帧的图像块进行匹配。当在移动的过程中,用于图像控制处理的视频流内容中图像帧中的任意一个图像块与图像资源库中的对应图像帧的图像块匹配成功,则证明该帧图像匹配成功。当遍历完成后,仍然没有可以匹配的图像块,则证明该帧图像匹配不成功。Figure 14b shows a schematic diagram of an arbitrary position moving comparison method in an embodiment of the present application. As shown in Figure 14b, the arbitrary position moving comparison method is to move and traverse the image blocks in each frame according to a certain trajectory or a certain switching order (for example, traversing from left to right in sequence). For example, as shown in Figure 14b, the second image block in the upper left corner of the image frame can be matched with the image block of the corresponding image frame in the image resource library. If the match is unsuccessful, the first image block adjacent to the right side of the second image block is matched with the image block of the corresponding image frame in the image resource library. If the match is unsuccessful, the third image block adjacent to the right side of the first image block is matched with the image block of the corresponding image frame in the image resource library. When, in the process of moving, any image block in the image frame in the video stream content used for image control processing is successfully matched with the image block of the corresponding image frame in the image resource library, it proves that the frame image matches successfully. When the traversal is completed and there is still no image block that can be matched, it proves that the frame image does not match.
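The block-by-block traversal can be sketched as follows; traversing in fixed, non-overlapping steps and delegating block comparison to a `blocks_equal` callback are simplifying assumptions of the sketch:

```python
def moving_block_match(frame, template_frame, block, blocks_equal):
    """Traverse blocks left to right, top to bottom; succeed on the first
    block that matches the corresponding block of the template frame."""
    bh, bw = block
    h, w = len(frame), len(frame[0])
    for y in range(0, h - bh + 1, bh):
        for x in range(0, w - bw + 1, bw):
            patch = [row[x:x + bw] for row in frame[y:y + bh]]
            ref = [row[x:x + bw] for row in template_frame[y:y + bh]]
            if blocks_equal(patch, ref):
                return True
    return False  # traversal finished without a matching block
```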
下面以图14c所述的中心位置扩大比对法为例,说明本申请实施例中的任意位置扩大比对方法。如图14c所示,中心位置扩大比对法是以图像帧中心位置的设定规格的第一中心图像块为参考,逐渐扩大图像块的范围,进行对比匹配。具体的,首先将用于图像控制处理的视频流内容中当前待匹配图像帧的第一中心图像块(图14c中最左边图片中的图像块)与图像资源库中的对应图像帧的图像块进行比对,若匹配成功,则证明视频流内容中当前待匹配图像帧与图像资源库中的对应图像帧匹配成功;若不成功,则将第一中心图像块扩大设定范围,例如,获取到第二中心图像块(图14c中间图片中的图像块),对第二中心图像块进行匹配,当第二中心图像块与图像资源库中的对应图像块匹配成功,则结束匹配,确定视频流内容中当前待匹配图像帧与图像资源库中的对应图像帧匹配成功;当第二中心图像块与图像资源库中的对应图像块匹配不成功,则将第二中心图像块扩大设定范围,例如,获取到第三中心图像块(图14c中最右边图片中的图像块),进行继续匹配。若扩大范围到整张图像,仍然匹配不成功,则确定视频流内容中当前待匹配图像帧与图像资源库中的对应图像帧匹配不成功。可以理解,第二中心图像块大于第一中心图像块,且包含第一中心图像块,第二中心图像块基于第一中心图像块向四周延伸。The following takes the center position enlargement comparison method described in FIG14c as an example to illustrate the arbitrary position enlargement comparison method in the embodiment of the present application. As shown in FIG14c, the center position enlargement comparison method uses a first central image block of a set specification at the center of the image frame as a reference and gradually expands the range of the image block for comparison and matching. Specifically, first, the first central image block of the current image frame to be matched in the video stream content for image control processing (the image block in the leftmost picture in FIG. 14c) is compared with the image block of the corresponding image frame in the image resource library. If the match is successful, it proves that the current image frame to be matched in the video stream content and the corresponding image frame in the image resource library are matched successfully; if it is unsuccessful, the first central image block is expanded by a set range, for example, a second central image block (the image block in the middle picture in FIG. 14c) is obtained, and the second central image block is matched. When the second central image block matches the corresponding image block in the image resource library, the matching ends and it is determined that the current image frame to be matched in the video stream content matches the corresponding image frame in the image resource library; when the second central image block does not match, the second central image block is expanded by the set range again, for example, a third central image block (the image block in the rightmost picture in FIG. 14c) is obtained, and matching continues. If the range is expanded to the entire image and the match is still unsuccessful, it is determined that the current image frame to be matched in the video stream content does not match the corresponding image frame in the image resource library. It can be understood that the second central image block is larger than the first central image block and contains it; the second central image block extends outwards in all directions from the first central image block.
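The centre-expansion strategy can be sketched as follows; the square centre block, the fixed growth step, and the `blocks_equal` callback are assumptions of the sketch, not the disclosed implementation:

```python
def center_expand_match(frame, template_frame, initial_size, step, blocks_equal):
    """Match a centre block first, then grow it outwards until it matches
    or covers the whole frame."""
    h, w = len(frame), len(frame[0])
    cy, cx = h // 2, w // 2
    half = initial_size // 2
    while True:
        y0, y1 = max(0, cy - half), min(h, cy + half)
        x0, x1 = max(0, cx - half), min(w, cx + half)
        patch = [row[x0:x1] for row in frame[y0:y1]]
        ref = [row[x0:x1] for row in template_frame[y0:y1]]
        if blocks_equal(patch, ref):
            return True
        if y0 == 0 and x0 == 0 and y1 == h and x1 == w:
            return False  # expanded to the full frame without a match
        half += step      # expand the centre block by the set range
```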
图14d示出了本申请实施例中一种固定位置比对法的示意图,其中,固定位置比对法则是设定固定位置的图像块,固定位置由应用或者用户设定,例如,如图14d所示,固定位置的图像块可以是中间位置的图像块,也可以是边角位置的图像块,固定位置的图像块数量可以为一个,也可以为多个,本申请不做限定。当用于图像控制处理的视频流内容中当前待匹配图像帧中的固定位置的图像块与图像资源库中的对应图像帧的对应图像块进行匹配成功,则确定视频流内容中当前待匹配图像帧与图像资源库中的对应图像帧匹配成功。当用于图像控制处理的视频流内容中当前待匹配图像帧的固定位置的图像块与图像资源库中的对应图像帧的对应图像块进行匹配不成功,则确定视频流内容中当前待匹配图像帧与图像资源库中的对应图像帧匹配不成功。Figure 14d shows a schematic diagram of a fixed position comparison method in an embodiment of the present application, wherein the fixed position comparison method is to set an image block at a fixed position, and the fixed position is set by an application or a user. For example, as shown in Figure 14d, the image block at the fixed position may be an image block at a middle position or an image block at a corner position. The number of image blocks at the fixed position may be one or more, and the present application does not limit this. When the image block at a fixed position in the current image frame to be matched in the video stream content used for image control processing is successfully matched with the corresponding image block of the corresponding image frame in the image resource library, it is determined that the current image frame to be matched in the video stream content is successfully matched with the corresponding image frame in the image resource library. When the image block at a fixed position in the current image frame to be matched in the video stream content used for image control processing is unsuccessful in matching with the corresponding image block of the corresponding image frame in the image resource library, it is determined that the current image frame to be matched in the video stream content is unsuccessful in matching with the corresponding image frame in the image resource library.
可以理解,在录制过程中,图像的核心信息一般只占据整个图像的局部区域,例如,60%到70%的区域,而核心信息匹配的情况下,图像一般会匹配,因此,本申请实施例中采用局部比对法进行比对,可以在保证匹配正确率的情况下,提高匹配效率。It can be understood that during the recording process, the core information of the image generally only occupies a local area of the entire image, for example, 60% to 70% of the area, and when the core information matches, the images will generally match. Therefore, in the embodiment of the present application, the local comparison method is used for comparison, which can improve the matching efficiency while ensuring the matching accuracy.
可以理解,本申请实施例中对图像指令开始时刻点和结束时刻点的标记,除了上述的【图像1起始】这样的方式,还可以按照任意的可实施的形式进行标记。It can be understood that in the embodiment of the present application, the marking of the start time point and the end time point of the image instruction can be marked in any feasible form in addition to the above-mentioned method of [Image 1 Start].
在一些实施例中,对图像指令开始时刻点和结束时刻点的标记可以如表2所示:可以采用真实图像下标法,例如当图像指令为切换镜头文字图像时,则在切换镜头文字图像指令的开始时刻点下标0,在结束时刻点下标1;还可以采用真实图像成对法,例如,当图像指令内容为放大焦距值时,可以在放大焦距值图像指令的开始时刻点下标一个标记,结束时刻点下标一个同样的标记,如此,先出现的标记默认为开始时刻点,后出现的标记默认为结束时刻点;还可以采用操作下标法,例如,对于切换镜头图像指令,可以对切换镜头图像指令对应的实际操作对应的开始时刻点进行下标0,结束时刻点进行下标1;还可以采用操作成对标记法,例如,可以在放大焦距值图像指令对应的操作的开始时刻点下标一个标记,结束时刻点下标一个同样的标记,如此,先出现的标记默认为开始时刻点,后出现的标记默认为结束时刻点。In some embodiments, the marking of the start time point and the end time point of the image instruction can be as shown in Table 2: the real image subscript method can be used, for example, when the image instruction is a switching lens text image, the start time point of the switching lens text image instruction is subscripted with 0, and the end time point is subscripted with 1; the real image pairing method can also be used, for example, when the image instruction content is to enlarge the focal length value, a mark can be subscripted at the start time point of the enlarged focal length value image instruction, and the same mark can be subscripted at the end time point, so that the mark that appears first defaults to the start time point, and the mark that appears later defaults to the end time point; the operation subscript method can also be used, for example, for the switching lens image instruction, the start time point corresponding to the actual operation corresponding to the switching lens image instruction can be subscripted with 0, and the end time point can be subscripted with 1; the operation pairing marking method can also be used, for example, a mark can be subscripted at the start time point of the operation corresponding to the enlarged focal length value image instruction, and the same mark can be subscripted at the end time point, so that the mark that appears first defaults to the start time point, and the mark that appears later defaults to the end time point.
Table 2: Image-instruction marking methods
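As an illustration of the two "pairing" methods described above, a minimal sketch of recovering start and end time points from paired markers might look like the following; the track representation, marker names, and function name are illustrative assumptions, not part of the application:

```python
# Hypothetical sketch of the pairing marking methods from Table 2: the same
# marker appears once at the start and once at the end of an instruction; the
# first occurrence is taken as the start time point, the second as the end.

def pair_markers(track):
    """track: list of (time, marker) entries in time order.
    Returns {marker: (start_time, end_time)}; end_time is None if unpaired."""
    intervals = {}
    for time, marker in track:
        if marker not in intervals:
            intervals[marker] = (time, None)                    # first occurrence = start
        elif intervals[marker][1] is None:
            intervals[marker] = (intervals[marker][0], time)    # second occurrence = end
    return intervals
```

A subscript method would instead write explicit `0`/`1` tags at the two time points, trading the implicit ordering for an explicit label.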
The method of step 503 for determining the expanded start and end time points of the marked range in the recorded content is described in detail below. FIG. 15a shows a method for determining the expanded start time point; as shown in FIG. 15a, the method may include:
1501: Determine the i-th waveform frame corresponding to the first start time point of the first time interval occupied by the image instruction in the audio stream content, and the n-th waveform frame of the audio stream content, where n = i - 1.
1502: Determine whether the similarity between the n-th waveform frame and the i-th waveform frame of the audio stream content is greater than a sixth threshold.
If so, go to 1503 to determine whether the number of compared frames has reached a first number.
If not, go to 1504: take the time point corresponding to the (n+1)-th waveform frame of the audio stream content as the expanded start time point of the image instruction in the audio stream content.
In this embodiment of the present application, the waveform one frame, two frames, three frames, and so on before the first start time point may be compared in turn against the waveform corresponding to the first start time point, checking whether each similarity exceeds the sixth threshold, until either a frame is found whose similarity to the waveform of the first start time point is less than or equal to the sixth threshold, in which case the time point of the frame immediately after that frame is taken as the expanded start time point of the image instruction, or a frame's similarity is still greater than the sixth threshold but the number of compared frames has reached the set first number, in which case the time point of that frame is taken as the expanded start time point of the image instruction.
1503: Determine whether the number of compared frames has reached the first number.
If it has, go to 1505 and take the time point corresponding to the n-th waveform frame of the audio stream content as the expanded start time point.
If it has not, go to 1506 and assign n = n - 1.
1504: Take the time point corresponding to the (n+1)-th waveform frame of the audio stream content as the expanded start time point of the image instruction in the audio stream content.
1505: Take the time point corresponding to the n-th waveform frame of the audio stream content as the expanded start time point.
1506: Assign n = n - 1.
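A minimal sketch of the backward search in steps 1501-1506 (FIG. 15a) might look like the following; the per-frame representation, the similarity function, and the parameter names are assumptions for illustration, not the application's implementation:

```python
# Hypothetical sketch of FIG. 15a: walk backwards from the anchor frame i,
# comparing each earlier waveform frame against frame i itself.

def expand_start(frames, i, sixth_threshold, first_number, similarity):
    """frames: per-frame audio feature vectors; i: index of the first start
    time point. Returns the index of the expanded start time point."""
    n = i - 1
    compared = 0
    while n >= 0:
        if similarity(frames[n], frames[i]) > sixth_threshold:  # step 1502
            compared += 1
            if compared >= first_number:                        # step 1503
                return n                                        # step 1505
            n -= 1                                              # step 1506
        else:
            return n + 1                                        # step 1504
    return 0  # reached the beginning of the audio stream
```

The `first_number` budget caps how far the range can grow when the waveform before the instruction stays uniformly similar to the anchor frame.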
FIG. 15b shows another method for determining the expanded start time point in an embodiment of the present application; as shown in FIG. 15b, the method may include:
1601: Determine the i-th waveform frame corresponding to the first start time point of the first time interval occupied by the image instruction in the audio stream content, and the n-th waveform frame of the audio stream content, where n = i - 1.
1602: Determine whether the similarity between the n-th waveform frame and the (n+1)-th waveform frame of the audio stream content is greater than the sixth threshold.
If so, go to 1603 to determine whether the number of compared frames has reached the first number.
If not, go to 1604: take the time point corresponding to the (n+1)-th waveform frame of the audio stream content as the expanded start time point of the image instruction in the audio stream content.
In this embodiment of the present application, the similarity between the frame immediately before the first start time point and the waveform of the first start time point, then between the second frame before the first start time point and the frame immediately before it, then between the third frame before the first start time point and the second frame before it, and so on—that is, between each frame and the frame immediately after it—may be checked in turn against the sixth threshold, until either a frame is found whose similarity to the frame immediately after it is less than or equal to the sixth threshold, in which case the time point of that following frame is taken as the expanded start time point of the image instruction, or a frame's similarity to the following frame is still greater than the sixth threshold but the number of compared frames has reached the set first number, in which case the time point of that frame is taken as the expanded start time point of the image instruction.
1603: Determine whether the number of compared frames has reached the first number.
If it has, go to 1605 and take the time point corresponding to the n-th waveform frame of the audio stream content as the expanded start time point.
If it has not, go to 1606 and assign n = n - 1.
1604: Take the time point corresponding to the (n+1)-th waveform frame of the audio stream content as the expanded start time point of the image instruction in the audio stream content.
1605: Take the time point corresponding to the n-th waveform frame of the audio stream content as the expanded start time point.
1606: Assign n = n - 1.
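The FIG. 15b variant differs from FIG. 15a only in what each earlier frame is compared against: its immediate successor rather than the anchor frame. A sketch under the same illustrative assumptions:

```python
# Hypothetical sketch of FIG. 15b: walk backwards, comparing each waveform
# frame with the frame immediately after it (steps 1601-1606).

def expand_start_adjacent(frames, i, sixth_threshold, first_number, similarity):
    """frames: per-frame audio feature vectors; i: index of the first start
    time point. Returns the index of the expanded start time point."""
    n = i - 1
    compared = 0
    while n >= 0:
        if similarity(frames[n], frames[n + 1]) > sixth_threshold:  # step 1602
            compared += 1
            if compared >= first_number:                            # step 1603
                return n                                            # step 1605
            n -= 1                                                  # step 1606
        else:
            return n + 1                                            # step 1604
    return 0  # reached the beginning of the audio stream
```

Chaining adjacent comparisons tolerates slow drift in the waveform, whereas the FIG. 15a variant stops as soon as a frame no longer resembles the anchor itself.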
The expanded end time point corresponding to an image instruction may be determined in a manner similar to the determination of the start time point described above, the difference being that, for the end time point, the waveforms after the first end time point are compared against the waveform corresponding to the first end time point. The details are not repeated here.
In the embodiments of the present application, different recording files may be generated from the recorded content. For example, a second time interval may be marked in the recorded content based on the first time interval occupied by the image instruction, so that the content corresponding to the second time interval can be cut out to generate a first recording file that does not include the image instruction; the first recording file is the content spliced together after the recorded content corresponding to the second time interval is deleted. In some embodiments, a complete recording file, i.e. a second recording file, may also be generated from the recorded content, so that the user can view it, restore the original file, and so on. In some embodiments, the second time interval may also be marked in the recorded content based on the first time interval occupied by the image instruction in order to generate multiple material clips, i.e. a third set of recording files.
In some embodiments, the multiple material clips may include image-instruction clips, i.e. recording-clip files of the first type, and may also include recorded-content clips, i.e. recording-clip files of the second type. The image-instruction clips contain the recorded content corresponding to the second time interval, while the recorded-content clips are the segments that remain after the recorded content corresponding to the second time interval is cut out of the recording.
In some embodiments, the electronic device may store the image-instruction clips together with their corresponding operations in a corresponding control-instruction material library for subsequent recognition and matching of image instructions. The electronic device may also store the recorded-content clips so that the user can view them individually or perform editing such as splicing and compositing.
In some embodiments, obtaining the multiple material clips includes: when the recorded content is video stream content, splitting the video stream content at the time points marked in it to obtain multiple video-stream material clips; and when the recorded content includes both audio stream content and video stream content, splitting the audio stream content and/or the video stream content at the time points marked in the audio stream content to obtain multiple audio-stream material clips and/or video-stream material clips, or splitting the video stream content and/or the audio stream content at the time points marked in the video stream content to obtain multiple video-stream material clips and/or audio-stream material clips, and then generating the corresponding recording clips based on the correspondence between the matching audio-stream and video-stream material clips.
For example, if the electronic device detects a "switch camera" image instruction during recording and determines the corresponding start and end time points of that instruction in the video stream content and the audio stream content, it may generate a first recording clip from the audio and video stream content before the start time point of the "switch camera" instruction, a second recording clip from the audio and video stream content after the end time point, and a third recording clip from the audio and video stream content within the time period corresponding to the "switch camera" instruction.
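A minimal sketch of this splitting around a marked instruction interval might look like the following; the frame model (timestamp, payload pairs) and the function names are assumptions for illustration:

```python
# Hypothetical sketch of splitting recorded content around a marked
# instruction interval, as in the "switch camera" example above.

def split_by_interval(frames, start_t, end_t):
    """frames: list of (timestamp, payload) tuples in time order.
    Returns (before, instruction, after) clips for the marked [start_t, end_t]."""
    before = [f for f in frames if f[0] < start_t]
    instruction = [f for f in frames if start_t <= f[0] <= end_t]
    after = [f for f in frames if f[0] > end_t]
    return before, instruction, after

def first_recording_file(frames, start_t, end_t):
    """Spliced content with the instruction interval removed
    (the 'first recording file' of the description)."""
    before, _, after = split_by_interval(frames, start_t, end_t)
    return before + after
```

The `before` and `after` clips correspond to the first and second recording clips of the example, and `instruction` to the third; running the same split on both the audio and video streams and pairing matching clips yields the corresponding recording clips.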
An embodiment of the present application further provides a recording control system, which, as shown in FIG. 16, may include:
an image acquisition module, which, in response to a first operation by the user to start recording, performs recording at least through a camera, the recorded content including at least video stream content;
an image control module, which may be used to recognize, in the video stream content, image instructions input by the user through the camera, the image instructions being used to control the recording;
an audio-visual synchronization module, which marks a second time interval in the recorded content based on the first time interval occupied by the image instruction, and obtains the marked recorded content; and
a shooting-result generation module, which generates at least a first recording file from the recorded content when a recording-end instruction is detected.
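One possible way the four modules of FIG. 16 could be wired together is sketched below; all class and method names are assumptions for illustration, not the application's interfaces:

```python
# Hypothetical sketch of the module pipeline in FIG. 16.

class RecordingController:
    def __init__(self, capture, recognizer, synchronizer, generator):
        self.capture = capture            # image acquisition module
        self.recognizer = recognizer      # image control module
        self.synchronizer = synchronizer  # audio-visual synchronization module
        self.generator = generator        # shooting-result generation module

    def run(self):
        content = self.capture.record()                       # record the streams
        interval = self.recognizer.find_instruction(content)  # first time interval
        marked = self.synchronizer.mark(content, interval)    # mark second interval
        return self.generator.first_recording_file(marked)    # on recording end
```

Each module only consumes the previous module's output, so any of the four can be swapped independently, matching the capability-integration options discussed next.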
The recording control method provided in the embodiments of the present application may be developed and implemented directly on the application side, or built separately as an integrated capability. For example, as shown in FIG. 17, it may be provided as AAR and JAR packages as a capability integrated on the application side of the electronic device system, which need not be updated with system version updates; it may also be packaged as a binary capability provided to all components of the electronic device system, likewise independent of version updates; or it may be provided to all components of the electronic device system through an interface of the framework layer of the system version, in which case it is upgraded together with the system.
The embodiments disclosed in the present application may be implemented in hardware, software, firmware, or a combination of these implementation methods. The embodiments of the present application may be implemented as a computer program or program code executed on a programmable system that includes at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described in the present application and to generate output information. The output information may be applied to one or more output devices in a known manner. For the purposes of the present application, a processing system includes any system having a processor such as, for example, a digital signal processor (DSP), a microcontroller, an application-specific integrated circuit (ASIC), or a microprocessor.
The program code may be implemented in a high-level procedural or object-oriented programming language to communicate with the processing system. Where desired, the program code may also be implemented in assembly or machine language. Indeed, the mechanisms described in the present application are not limited in scope to any particular programming language. In any case, the language may be a compiled or an interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed over a network or through other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy disks, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or tangible machine-readable storage used to transmit information over the Internet via electrical, optical, acoustic, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
In the accompanying drawings, some structural or method features may be shown in a specific arrangement and/or order. It should be understood, however, that such a specific arrangement and/or order may not be required. Rather, in some embodiments, these features may be arranged in a manner and/or order different from that shown in the illustrative drawings. In addition, the inclusion of a structural or method feature in a particular figure does not imply that such a feature is required in all embodiments; in some embodiments, the feature may be omitted or combined with other features.
It should be noted that the units/modules mentioned in the device embodiments of the present application are all logical units/modules. Physically, a logical unit/module may be a physical unit/module, a part of a physical unit/module, or a combination of multiple physical units/modules. The physical implementation of these logical units/modules is not itself the most important point; the combination of the functions they implement is the key to solving the technical problems raised by the present application. Furthermore, in order to highlight the innovative part of the present application, the above device embodiments do not introduce units/modules that are not closely related to solving the technical problems raised by the present application; this does not mean that no other units/modules exist in the above device embodiments.
It should be noted that, in the examples and description of this patent, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. In the absence of further restrictions, an element qualified by the phrase "including a" does not exclude the existence of other identical elements in the process, method, article, or device that includes that element.
Although the present application has been illustrated and described with reference to certain preferred embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and detail may be made therein without departing from the scope of the present application.

Claims (21)

  1. A video recording control method, applied to an electronic device, the method comprising:
    in response to a first operation by a user to start recording, performing recording at least through a camera, the recorded content comprising at least video stream content; and
    recognizing, in the video stream content, an image instruction input by the user through the camera, the image instruction being used to control the recording;
    wherein recognizing the image instruction input by the user through the camera in the video stream content comprises: recognizing at least a first image block of at least one image frame in the video stream content, recognizing in the first image block the image instruction matching a characteristic behavior, and determining, from the at least one image frame in which the image instruction appears, a first time interval in which the image instruction is located.
  2. The method according to claim 1, wherein determining the first time interval from the at least one image frame in which the image instruction appears comprises:
    the at least one image frame comprising a first image frame and a second image frame, the first image frame being the first image frame in which the image instruction matches the characteristic behavior in the first image block, and the second image frame being the last image frame in which the image instruction matches the characteristic behavior in the first image block, wherein the start time of the first time interval is the moment of the first image frame and the end time of the first time interval is the moment of the second image frame.
  3. The method according to either of claims 1-2, wherein, before determining the image instruction matching the characteristic behavior in the first image block, the method further comprises:
    recognizing a second image block of at least one image frame in the video stream content, and switching from the second image block to the first image block when the image instruction is not recognized.
  4. The method according to claim 3, wherein the first image block is larger than the second image block, and the first image block contains the second image block.
  5. The method according to claim 4, wherein the second image block is located at the center of the image frame, and the first image block extends outward from the second image block.
  6. The method according to claim 3, wherein the first image block is adjacent to the second image block.
  7. The method according to claim 3, wherein the first image block is the next image block after the second image block in a switching order, the switching order comprising preset positions of image blocks in different orders.
  8. The method according to any one of claims 1-7, wherein, when the image instruction is matched against the characteristic behavior in the first image block, the method further comprises:
    identifying a first pixel in the first image block whose display-parameter difference from adjacent pixels is higher than a first threshold, and ignoring the first pixel during matching.
  9. The method according to any one of claims 1-7, wherein, when the image instruction is matched against the characteristic behavior in the first image block, the method further comprises:
    identifying unmatched second and third pixels in the first image block whose positional difference exceeds a second threshold, and ignoring the second and third pixels during matching.
  10. The method according to any one of claims 1-9, wherein the characteristic behavior matching the image instruction comprises indicative content or body movements appearing in the first image block.
  11. The method according to claim 10, wherein the indicative content comprises specific text information or specific image information appearing in the first image block.
  12. The method according to claim 10, wherein the body movements comprise specific gesture information or specific facial information appearing in the first image block.
  13. The method according to claim 1, wherein performing recording at least through a camera comprises:
    performing recording through a first camera; and
    in response to a camera-switching image instruction input by the user through the first camera, performing recording through a second camera.
  14. The method according to claim 13, wherein the first camera is a front camera of the electronic device and the second camera is a rear camera of the electronic device; or
    the first camera is a rear camera of the electronic device and the second camera is a front camera of the electronic device.
  15. The method according to claim 14, wherein, after recording is performed through the second camera, the method further comprises:
    recognizing, in the video stream content, image instructions input by the user through the second camera.
  16. The method according to any one of claims 13-15, wherein the characteristic behavior matching the camera-switching image instruction comprises a rotation of the recorded picture in the first image block caused by the user turning the electronic device.
  17. The method according to any one of claims 1-16, further comprising:
    generating a first recording file from the recorded content, the first recording file comprising the content spliced together after the recorded content corresponding to a second time interval is deleted, the second time interval being determined based on the first time interval in which the image instruction is located.
  18. The method according to claim 17, further comprising:
    generating a second recording file from the recorded content, the second recording file being marked with the start time and end time of the second time interval.
  19. The method according to claim 17, further comprising:
    generating a third set of recording files from the recorded content, the third set comprising at least one recording-clip file of a first type corresponding to the recorded content within the second time interval and at least one recording-clip file of a second type corresponding to the recorded content outside the second time interval.
  20. An electronic device, comprising: a memory for storing instructions to be executed by one or more processors of the electronic device; and the processor, being one of the one or more processors of the electronic device, for executing the video recording control method according to any one of claims 1-19.
  21. A readable storage medium having instructions stored thereon that, when executed on an electronic device, cause the electronic device to perform the video recording control method according to any one of claims 1-19.
PCT/CN2023/118317 2022-10-11 2023-09-12 Video-recording control method, electronic device and medium WO2024078238A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211243144.0 2022-10-11
CN202211243144.0A CN117880413A (en) 2022-10-11 2022-10-11 Video recording control method, electronic equipment and medium

Publications (1)

Publication Number Publication Date
WO2024078238A1 true WO2024078238A1 (en) 2024-04-18

Family

ID=90595415

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/118317 WO2024078238A1 (en) 2022-10-11 2023-09-12 Video-recording control method, electronic device and medium

Country Status (2)

Country Link
CN (1) CN117880413A (en)
WO (1) WO2024078238A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103167230A (en) * 2011-12-17 2013-06-19 Fu Tai Hua Industry (Shenzhen) Co., Ltd. Electronic equipment and method for controlling shooting according to gestures thereof
CN106506968A (en) * 2016-11-29 2017-03-15 Guangdong OPPO Mobile Telecommunications Corp., Ltd. Control method, control device and electronic apparatus
KR20210038446A (en) * 2020-02-14 2021-04-07 Beijing Baidu Netcom Science Technology Co., Ltd. Method and apparatus for controlling electronic device based on gesture
CN114637439A (en) * 2022-03-24 2022-06-17 Hisense Visual Technology Co., Ltd. Display device and gesture track recognition method

Also Published As

Publication number Publication date
CN117880413A (en) 2024-04-12

Similar Documents

Publication Publication Date Title
US20240168624A1 (en) Screen capture method and related device
JP7326476B2 (en) Screenshot method and electronic device
WO2021078284A1 (en) Content continuation method and electronic device
WO2021129198A1 (en) Method for photography in long-focal-length scenario, and terminal
WO2021057673A1 (en) Image display method and electronic device
WO2022042326A1 (en) Display control method and related apparatus
CN114185503B (en) Multi-screen interaction system, method, device and medium
WO2022179405A1 (en) Screen projection display method and electronic device
CN114115674B (en) Method for positioning sound recording and document content, electronic equipment and storage medium
CN114697732A (en) Shooting method, system and electronic equipment
WO2023241209A9 (en) Desktop wallpaper configuration method and apparatus, electronic device and readable storage medium
WO2023045712A1 (en) Screen mirroring abnormality processing method and electronic device
WO2024012011A1 (en) Video recording method and electronic device
WO2024078238A1 (en) Video-recording control method, electronic device and medium
WO2023020012A1 (en) Data communication method between devices, device, storage medium, and program product
EP4167072A1 (en) Method for opening file, and device
WO2024078236A1 (en) Recording control method, electronic device, and medium
WO2024140123A1 (en) Stop motion animation generation method, electronic device, cloud server, and system
WO2024140002A1 (en) Storage space management method and apparatus, electronic device, and storage medium
WO2023236830A1 (en) Device control method and electronic device
CN117714849A (en) Image shooting method and related equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23876443

Country of ref document: EP

Kind code of ref document: A1