CN116233348A - Video recording method and electronic equipment - Google Patents

Video recording method and electronic equipment

Info

Publication number: CN116233348A
Authority: CN (China)
Prior art keywords: video, sound signal, sound, target, signal
Legal status: Pending (the legal status is an assumption, not a legal conclusion; Google has not performed a legal analysis)
Application number: CN202310274620.3A
Other languages: Chinese (zh)
Inventors: 李�瑞, 黄雪妍
Current and original assignee: Huawei Technologies Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Application filed by Huawei Technologies Co Ltd
Priority application: CN202310274620.3A
Publication: CN116233348A (legal status: pending)

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00: Details of television systems
    • H04N5/76: Television signal recording
    • H04N5/91: Television signal processing therefor
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M1/00: Substation equipment, e.g. for use by subscribers
    • H04M1/72: Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724: User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403: User interfaces with means for local support of applications that increase the functionality
    • H04M1/7243: User interfaces with interactive means for internal management of messages
    • H04M1/72439: User interfaces with interactive means for internal management of messages, for image or video messaging

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Human Computer Interaction (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Studio Devices (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

A video recording method and an electronic device are provided for improving video shooting quality. The method includes: starting a camera and a microphone of the electronic device, where the camera is used to collect a video signal and the microphone is used to collect sound signals in the environment; determining a first target sound signal among the sound signals in the environment, where the first target sound signal is one or more of N sound signals included in the environment, the N sound signals differ in frequency and/or sound source, and N is an integer greater than or equal to 1; and synthesizing the video signal collected by the camera and the first target sound signal into a video file.

Description

Video recording method and electronic equipment
Technical Field
The present disclosure relates to the field of electronic technologies, and in particular, to a video recording method and an electronic device.
Background
Video shooting (or video recording) is a common way for people to record their lives, and users most often shoot video with a mobile phone. When shooting video, the mobile phone collects not only the video signal but also all sound signals in the shooting scene, and then synthesizes the video signal and the sound signals into a video file (also called a recording file).
However, in shooting scenes with noisy sound (such as a concert or a downtown street), the video files shot by the user often contain considerable noise, giving a poor shooting effect.
Disclosure of Invention
The purpose of the application is to provide a video recording method and electronic equipment, which are used for improving video shooting quality.
In a first aspect, a video recording method is provided and applied to an electronic device, where the electronic device may be a mobile phone, a tablet computer, or another device having a camera and a microphone. The electronic device starts the camera and the microphone, where the camera is used to collect a video signal and the microphone is used to collect sound signals in the environment; determines a first target sound signal among the sound signals in the environment, where the first target sound signal is one or more of N sound signals included in the environment, the N sound signals differ in frequency and/or sound source, and N is an integer greater than or equal to 1; and synthesizes the video signal collected by the camera and the first target sound signal into a video file.
In this embodiment of the present application, when the electronic device records video, the video signal collected by the camera and the first target sound signal collected by the microphone from the environment may be synthesized into a video file. A video file obtained by this recording method includes fewer noisy sounds, so the video effect is better.
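As a rough, non-authoritative illustration of this flow (not part of the patent disclosure), the Python sketch below assumes the microphone mix has already been separated into N per-source arrays ("stems") and that the ffmpeg command-line tool is available; neither is specified by the patent. It keeps only the selected target stem and muxes it under the captured video.

```python
# Hedged sketch of the first-aspect flow. Assumes the microphone mix has
# already been separated into N per-source arrays ("stems") and that the
# ffmpeg CLI is installed; the file names are illustrative.
import subprocess
import wave

import numpy as np

def write_wav(path: str, samples: np.ndarray, rate: int = 48000) -> None:
    """Store a mono float signal in [-1, 1] as 16-bit PCM."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 16-bit samples
        f.setframerate(rate)
        f.writeframes((np.clip(samples, -1, 1) * 32767).astype(np.int16).tobytes())

def synthesize_video_file(video_path: str, stems: list, target_index: int,
                          out_path: str) -> None:
    """Mux the camera's video stream with only the first target sound signal."""
    write_wav("target.wav", stems[target_index])
    subprocess.run([
        "ffmpeg", "-y",
        "-i", video_path,    # video signal collected by the camera
        "-i", "target.wav",  # first target sound signal
        "-map", "0:v:0", "-map", "1:a:0",  # video from input 0, audio from input 1
        "-c:v", "copy",      # leave the video stream untouched
        out_path,
    ], check=True)
```

Under these assumptions, `synthesize_video_file("capture.mp4", stems, 0, "out.mp4")` would yield a file whose only audio track is the selected stem.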
Illustratively, the electronic device activating the camera and microphone includes: starting a first application that has a video recording function, where the camera and the microphone are opened when the electronic device starts the video recording function of the first application. The first application may be a camera application in the electronic device, either the device's built-in camera application or a third-party camera application downloaded from a network. Alternatively, the first application may be an instant messaging application, for example a Hua-Cheng (Huawei) application that has a video call function or a video shooting function. Alternatively, the first application may be any of various short video applications, such as Douyin or Xiaohongshu. Alternatively, the first application may be a social network application, such as a microblog (e.g., Weibo) or a forum. In summary, the first application may be any application in the electronic device that has a shooting function.
The electronic device determines the first target sound signal of the N sound signals in the environment in a plurality of ways, including but not limited to at least one of the following three ways.
First mode
Determining a first target sound signal among the sound signals in the environment includes: displaying N labels on a display screen of the electronic device, where the N labels identify the N sound signals; and determining a target label in response to an operation for selecting that label, where the sound signal corresponding to the target label is the first target sound signal.
In brief, the electronic device represents the N sound signals on the display screen as labels, and the user selects the first target sound signal by selecting a label. In this way, the first target sound signal is chosen by the user, so it matches the user's preference and provides a better experience.
One implementation manner of displaying N labels on the display screen of the electronic device is that a preview interface is displayed on the display screen of the electronic device, a video signal collected by the camera is displayed on the preview interface, and N labels are displayed in the preview interface.
For example, the electronic device opens a camera application, the camera application displays a preview interface, the preview interface displays video signals collected by the camera in real time, and N labels corresponding to N sound signals in the environment are displayed in the preview interface.
The N labels may be displayed at any position in the preview interface, at a preset position (for example, a position set by default by the system), or at a position specified by the user (i.e., the position may be set by the user). It is understood that the display position of one or more of the N labels may be changed according to a user operation.
Alternatively, displaying the N labels in the preview interface may further include: determining the positions of M photographic subjects in the preview interface; displaying M of the N labels at the positions of those M photographic subjects, where the M photographic subjects are the sound sources of the M sound signals corresponding to those M labels; and displaying, at other positions in the preview interface, the N-M labels other than the M labels, where M is an integer from 1 to N.
Assume that the electronic device recognizes P photographic subjects in the preview interface (P being greater than or equal to M) and that the number of labels is N; three relationships between P and N are possible.
1、P>N
The number N of labels is smaller than the number P of photographic subjects in the preview interface. One possible scenario is that the video signal collected by the camera contains various objects, some of which make no sound; the microphone therefore cannot collect sound signals for those objects, so the number of photographic subjects in the preview interface is greater than the number of labels.
2、P=N
The number N of labels is equal to the number P of photographic subjects in the preview interface. One possible scenario is that all of the photographic subjects included in the video signal collected by the camera happen to be producing sound, and there is no off-screen (bystander) sound.
3、P<N
The number N of labels is greater than the number P of photographic subjects in the preview interface. One possible scenario is that, because the field of view of the camera is limited, a sound-producing object lies outside the field of view and therefore does not appear in the video signal collected by the camera, yet its sound, like a voice-over, is still collected by the microphone. In such a scene, the sound signal collected by the microphone includes the sound signal of an object that is absent from the video signal, that is, the number of photographic subjects in the preview interface is smaller than the number of labels.
In the above three cases, when the electronic device displays the N labels in the preview interface, it may match M of the N labels to corresponding photographic subjects among the P photographic subjects, where those M photographic subjects are the sound sources of the M sound signals corresponding to the M labels. The M labels are then displayed at the positions of the M photographic subjects, and the remaining labels (those other than the M labels among the N) may be displayed at other positions in the preview interface.
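The placement rule above can be pictured with a small sketch. The data shapes here (a label-to-source mapping and per-subject screen coordinates) are illustrative assumptions, not anything the patent prescribes: labels whose sound source matches a detected photographic subject are pinned to that subject's position, and the remaining N-M labels fall back to another position.

```python
# Sketch of the label-placement rule. `labels` maps a label to its sound
# source; `subject_positions` maps each detected photographic subject to
# screen coordinates. Both shapes are illustrative assumptions.
def place_labels(labels: dict, subject_positions: dict):
    placed, unmatched = {}, []
    for label, source in labels.items():
        if source in subject_positions:
            # one of the M labels, anchored at its subject's position
            placed[label] = subject_positions[source]
        else:
            # one of the N - M labels, shown at another position
            unmatched.append(label)
    return placed, unmatched

# e.g. place_labels({"waves": "sea", "voice": "person", "bird": "offscreen"},
#                   {"sea": (120, 300), "person": (400, 200)})
# -> ({'waves': (120, 300), 'voice': (400, 200)}, ['bird'])
```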
In one possible design, displaying the N labels on the display screen of the electronic device includes: detecting a call-out operation by the user for the N labels; and displaying the N labels on the preview interface in response to the call-out operation. That is, the N labels need not appear automatically but may be displayed through the user's call-out operation. Likewise, when the electronic device detects a hiding operation for hiding the N labels, the N labels may be hidden. In this way, the user can control whether the N labels are displayed, which improves the experience.
Second mode
Determining a first target sound signal among the sound signals in the environment includes: determining a subject shooting object in the video signal, the subject shooting object being one or more objects in the video signal; and determining the first target sound signal according to the subject shooting object, where the sound source of the first target sound signal is the subject shooting object.
There are various ways to determine the subject shooting object, including but not limited to the following.
In the first mode, the subject shooting object is an object in the video signal specified by the user on the preview interface.
The preview interface of the electronic device displays the video signal collected by the camera, and the user designates one or more objects in that video signal as the subject shooting object. A subject shooting object determined in this way is selected by the user and therefore matches the user's preference.
In the second mode, the subject shooting object is an object of interest to the user in the video signal.
The object of interest to the user may be an object that, according to records kept by the electronic device, the user frequently shoots or frequently retouches. In one implementation, taking a cat as an example, the electronic device determines that images of cats account for a large share of the images stored in the gallery application, and therefore determines that the object of interest to the user is a cat. In another implementation, the electronic device records which objects the user retouches most often when using image-editing software, and determines that a frequently retouched object is an object of interest to the user. When the electronic device determines that an object of interest to the user is present in the video signal collected by the camera, that object is determined to be the subject shooting object.
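One plausible reading of this heuristic, sketched below purely for illustration, is a frequency count over the user's gallery and retouch history. The input shapes and the `min_hits` threshold are assumptions, not the patent's method.

```python
# Hedged sketch of the object-of-interest heuristic.
from collections import Counter

def object_of_interest(gallery_labels: list, retouch_counts: dict,
                       min_hits: int = 5):
    """Pick the object the user shoots or retouches most often.

    gallery_labels: one recognized object name per stored image.
    retouch_counts: per-object retouch tallies. Both inputs are
    hypothetical stand-ins for whatever the device actually records.
    """
    tally = Counter(gallery_labels)
    for obj, n in retouch_counts.items():
        tally[obj] += n
    if not tally:
        return None
    obj, hits = tally.most_common(1)[0]
    return obj if hits >= min_hits else None

# object_of_interest(["cat"] * 6 + ["dog"], {"cat": 3})  ->  "cat"
```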
Third mode
Determining a first target sound signal of sound signals in an environment, comprising: detecting a second operation for indicating a first mode, the first mode being a mode for indicating recording of a specific sound signal; in response to the second operation, the specific sound signal is determined to be the first target sound signal.
Optionally, the electronic device provides a plurality of specific sound signal recording modes, the user may select a specific sound signal recording mode, and the electronic device determines the specific sound signal selected by the user as the first target sound signal.
The three ways above of determining the first target sound signal in the environment are merely examples; other ways are also applicable, and the embodiments of the present application are not limited in this respect.
In one possible design, after the electronic device determines the first target sound signal, it waits a preset duration and then automatically starts recording; alternatively, after the first target sound signal is determined, recording starts when an operation for instructing to start video recording is detected.
For example, the electronic device opens the camera application to record: after determining the first target sound signal it waits a certain period (for example, 3 s) and then automatically starts recording, or it starts recording upon detecting that the user taps the start-recording button after the first target sound signal is determined.
In the embodiment of the present application, the target sound signal may be changed (or switched) before or during video recording, and the following description is given in two scenarios.
Scene one
Before recording is started, the target sound signal is changed. For example, a first target sound signal is determined before recording starts; if the user is not satisfied, it can be switched to a second target sound signal, and once recording starts, the second target sound signal and the video signal collected by the camera are synthesized into a video file.
Scene two
The target sound signal is changed in the middle of recording, where the middle of recording means after recording has started and before it has stopped.
For example, before recording, the electronic device determines the first target sound signal, and after recording starts, it synthesizes the video signal collected by the camera with the first target sound signal. Before recording stops, the electronic device detects that the user has switched the first target sound signal to a second target sound signal, and it then continues by synthesizing the video signal collected by the camera with the second target sound signal. When a stop-recording instruction is detected, a video file is obtained in which a first segment is synthesized from the video signal collected by the camera and the first target sound signal, and a second segment is synthesized from the video signal collected by the camera and the second target sound signal, the first segment preceding the second segment.
For example, after determining the first target sound signal, the electronic device synthesizes the video signal collected by the camera during a first time period with the first target sound signal collected by the microphone during that period to obtain a first video clip. Before recording stops, the electronic device detects that the first target sound signal has been switched to a second target sound signal, and then synthesizes the video signal collected by the camera during a second time period with the second target sound signal collected by the microphone during that period into a second video clip. When a stop-recording instruction is detected, the first video clip and the second video clip are synthesized into one video file. In this way, different segments of the resulting video file emphasize different target sounds, giving a good shooting experience.
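In audio terms, the mid-recording switch amounts to splicing the two target stems at the switch point. The sketch below is a simplification under the same assumed pre-separated stems; expressing the clip boundary as a sample index is an illustrative choice, not the patent's mechanism.

```python
# Sketch of the scene-two switch: the audio track is the first target
# stem up to the switch point and the second target stem afterwards.
import numpy as np

def switched_audio_track(stems: list, first_idx: int, second_idx: int,
                         switch_sample: int) -> np.ndarray:
    first_clip = stems[first_idx][:switch_sample]    # first video-clip audio
    second_clip = stems[second_idx][switch_sample:]  # second video-clip audio
    return np.concatenate([first_clip, second_clip])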
In other embodiments, the electronic device stores a first video file and a second video file in response to a stop-recording instruction; the first video file is synthesized from the video signal collected by the camera and the N sound signals in the environment, and the second video file is synthesized from the video signal collected by the camera and the first target sound signal. That is, one recording yields two video files: one combining the video signal collected by the camera with the first target sound signal, and the other combining the video signal with the N sound signals in the environment, similar to a file obtained by the conventional recording method. The two video files make it convenient for the user to compare and review them, which improves the experience.
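The dual-output design can be sketched as follows (illustrative only; the plain summation and clipping are assumptions, not the patent's mixing method): the same capture yields a conventional all-signals mix alongside the target-only track.

```python
# Sketch of the dual-output design: one pass yields audio for both
# stored files from the same set of separated stems.
import numpy as np

def render_both(stems: list, target_index: int):
    """Return audio for the two stored files: the first with all N sound
    signals (as in conventional recording), the second with only the
    first target sound signal."""
    conventional = np.clip(np.sum(stems, axis=0), -1.0, 1.0)  # first file
    target_only = stems[target_index]                          # second file
    return conventional, target_only
```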
In some embodiments, synthesizing the video signal collected by the camera and the first target sound signal into a video file includes: enhancing the first target sound signal and/or weakening the other sound signals, i.e., the sound signals other than the first target sound signal among the N sound signals; and synthesizing the video signal collected by the camera with the enhanced first target sound signal and the weakened other sound signals into a video file.
In this way, the video file obtained by the electronic device includes the various sound signals in the environment, but the first target sound signal is more prominent and the other sound signals are weaker; the various sounds of the real environment are preserved while the first target sound is highlighted, so the video experience is better and the quality of the resulting video file is higher.
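The enhance/weaken step can be pictured as per-stem gains: a boost on the target stem and attenuation on the others, summed back into one track. The sketch below makes the same pre-separated-stems assumption as the earlier sketches, and the gain values are arbitrary examples.

```python
# Sketch of the enhance/weaken mix: every environmental sound survives,
# but the first target sound signal is made prominent.
import numpy as np

def emphasize(stems: list, target_index: int,
              target_gain: float = 2.0, other_gain: float = 0.3) -> np.ndarray:
    mix = np.zeros_like(stems[0], dtype=np.float64)
    for i, stem in enumerate(stems):
        mix += stem * (target_gain if i == target_index else other_gain)
    return np.clip(mix, -1.0, 1.0)  # guard against clipping after the boost
```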
In a second aspect, there is provided an electronic device comprising:
a processor, a memory, and one or more programs;
wherein the one or more programs are stored in the memory, the one or more programs comprising instructions, which when executed by the processor, cause the electronic device to perform the steps of:
Starting a camera and a microphone of the electronic equipment, wherein the camera is used for collecting video signals, and the microphone is used for collecting sound signals in the environment;
determining a first target sound signal in sound signals in the environment, wherein the first target sound signal is one or more sound signals in N sound signals included in the environment, the frequencies and/or sound sources of the N sound signals are different, and N is an integer greater than or equal to 1;
and synthesizing the video signal acquired by the camera and the first target sound signal into a video file.
In one possible design, the instructions, when executed by the processor, cause the electronic device to specifically perform the steps of: displaying N labels on a display screen of the electronic device, where the N labels identify the N sound signals; and determining a target label in response to an operation for selecting that label, where the sound signal corresponding to the target label is the first target sound signal.
In one possible design, the instructions, when executed by the processor, cause the electronic device to specifically perform the steps of: determining a subject capture object in the video signal, the subject capture object being one or more objects in the video signal; and determining a first target sound signal according to the subject shooting object, wherein the sound source of the first target sound signal is the subject shooting object.
In one possible design, the subject capture object is an object in the video signal specified by a user on a preview interface; alternatively, the subject photographing object is an object of interest to the user in the video signal.
In one possible design, the instructions, when executed by the processor, cause the electronic device to specifically perform the steps of: detecting a second operation for indicating a first mode, the first mode being a mode for indicating recording of a specific sound signal; in response to the second operation, the specific sound signal is determined to be the first target sound signal.
In one possible design, the instructions, when executed by the processor, cause the electronic device to specifically perform the steps of: detecting a call-out operation by the user for the N labels; and displaying the N labels on the display screen in response to the call-out operation.
In one possible design, the instructions, when executed by the processor, cause the electronic device to specifically perform the steps of: displaying a preview interface, where the preview interface includes the video signal collected by the camera; determining the positions of M photographic subjects in the preview interface; displaying M of the N labels at the positions of the M photographic subjects, where the M photographic subjects are the sound sources of the M sound signals corresponding to those M labels; and displaying, at other positions in the preview interface, the N-M labels other than the M labels, where M is an integer from 1 to N.
In one possible design, the instructions, when executed by the processor, cause the electronic device to further perform the steps of: after the first target sound signal is determined, waiting for a preset duration to automatically start video recording; or, after the first target sound signal is determined, when an operation for instructing to start video recording is detected, video recording is started.
In one possible design, the instructions, when executed by the processor, cause the electronic device to specifically perform the steps of: synthesizing the video signal collected by the camera during a first time period and the first target sound signal collected by the microphone during the first time period into a first video clip, where the first time period is the period after the first target sound signal is determined. The instructions, when executed by the processor, cause the electronic device to further perform the steps of: before recording stops, switching the first target sound signal to a second target sound signal according to a target-sound-signal switching operation; synthesizing the video signal collected by the camera during a second time period and the second target sound signal collected by the microphone during the second time period into a second video clip, where the second time period is the period after the switch to the second target sound signal; and, in response to a stop-recording instruction, synthesizing the first video clip and the second video clip into a video file.
In one possible design, the instructions, when executed by the processor, cause the electronic device to further perform the steps of: responding to a recording stopping instruction, and storing a first video file and a second video file; the first video file is synthesized by video signals collected by the camera and N sound signals in the environment, and the second video file is synthesized by video signals collected by the camera and the first target sound signals.
In one possible design, the instructions, when executed by the processor, cause the electronic device to specifically perform the steps of: enhancing the first target sound signal and/or weakening other sound signals which are other sound signals than the first target sound signal of the N sound signals; and synthesizing video signals acquired by the camera with the enhanced first target sound signals and the weakened other sound signals into video files.
In a third aspect, a video file processing method is provided and applied to an electronic device. The method includes: determining a first video file to be processed among locally stored video files, where the first video file includes a video signal and N sound signals, and N is an integer greater than or equal to 1; determining a target sound signal among the N sound signals; and enhancing the target sound signal in the first video file and/or weakening the other sound signals in the first video file to obtain a second video file, where the other sound signals are the sound signals other than the target sound signal among the N sound signals.
That is, this embodiment of the application can process an already-recorded video file, highlighting the target sound signal in the file and weakening the other sound signals; for example, noisy or unwanted sounds in the video file can be weakened, improving the quality of the video file.
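Post-processing a stored file follows the same pattern, starting from the file's own audio track. The hedged sketch below reuses write_wav() and emphasize() from the earlier sketches; separate() stands in for a source-separation step the patent does not specify, and ffmpeg is again an assumed external tool.

```python
# Sketch of the third-aspect post-processing: extract the recording's
# audio, re-balance the separated stems, and mux the result back under
# the original video stream.
import subprocess

def postprocess(first_video: str, out_video: str, separate, target_index: int):
    # pull the N-signal mix out of the stored recording (audio only)
    subprocess.run(["ffmpeg", "-y", "-i", first_video, "-vn", "mix.wav"],
                   check=True)
    stems = separate("mix.wav")               # hypothetical source separation
    new_mix = emphasize(stems, target_index)  # emphasize() from earlier sketch
    write_wav("new_mix.wav", new_mix)         # write_wav() from first sketch
    subprocess.run(["ffmpeg", "-y", "-i", first_video, "-i", "new_mix.wav",
                    "-map", "0:v:0", "-map", "1:a:0", "-c:v", "copy",
                    out_video], check=True)
```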
It should be noted that, during recording, the user may not notice that the environment is noisy or contains sounds the user does not want recorded; only after recording is completed and the file is opened does the user find that such sounds were captured. At that point, this post-processing mode can be used to process the sound signals in the recorded file, which gives a better user experience.
In one implementation of determining the first video file to be processed among locally stored video files, the electronic device starts a first application that includes at least one video file and determines the first video file to be processed according to a user operation. The first application may be the local gallery or a cloud gallery of the electronic device; or the first application is a short video application, and the first video file is a short video downloaded by the electronic device; or the first application is an instant messaging application, and the first video file is a video sent by another contact; or the first application is a social network application, and the first video file is a video downloaded from the social network (for example, a video published by another person and downloaded by the user).
In one possible design, determining a target sound signal of the N sound signals includes: displaying N labels, wherein the N labels are used for identifying the N sound signals; and determining a target label in response to an operation for selecting the target label, wherein a sound signal corresponding to the target label is the target sound signal.
In one possible design, determining a target sound signal of the N sound signals includes: determining a subject shooting object in the video signal; the subject shooting object is one or more objects in the video signal; and determining the target sound signal according to the subject shooting object, wherein the sound source of the target sound signal is the subject shooting object.
Illustratively, the subject shooting object is an object in the video signal specified by the user; alternatively, the subject shooting object is an object of interest to the user in the video signal.
In a fourth aspect, there is provided an electronic device comprising:
a processor, a memory, and one or more programs;
wherein the one or more programs are stored in the memory, the one or more programs comprising instructions, which when executed by the processor, cause the electronic device to perform the steps of:
Determining a first video file to be processed among locally stored video files, where the first video file includes a video signal and N sound signals, and N is an integer greater than or equal to 1; determining a target sound signal among the N sound signals; and enhancing the target sound signal in the first video file and/or weakening the other sound signals in the first video file to obtain a second video file, where the other sound signals are the sound signals other than the target sound signal among the N sound signals.
In one possible design, the instructions, when executed by the processor, cause the electronic device to specifically perform the steps of: displaying N labels, wherein the N labels are used for identifying the N sound signals; and determining a target label in response to an operation for selecting the target label, wherein a sound signal corresponding to the target label is the target sound signal.
In one possible design, the instructions, when executed by the processor, cause the electronic device to specifically perform the steps of: determining a subject shooting object in the video signal; the subject shooting object is one or more objects in the video signal; and determining the target sound signal according to the subject shooting object, wherein the sound source of the target sound signal is the subject shooting object.
Illustratively, the subject shooting object is an object in the video signal specified by the user; alternatively, the subject shooting object is an object of interest to the user in the video signal.
In a fifth aspect, a computer-readable storage medium is also provided for storing a computer program which, when run on a computer, causes the computer to perform the method provided in the first or third aspect above.
In a sixth aspect, there is also provided a computer program product comprising a computer program which, when run on a computer, causes the computer to perform the method as provided in the first or third aspect above.
In a seventh aspect, a graphical user interface on an electronic device is also provided, the electronic device having a display screen, a memory, and a processor configured to execute one or more computer programs stored in the memory, where the graphical user interface includes the graphical user interface displayed by the electronic device when performing the method provided in the first or third aspect above.
In an eighth aspect, embodiments of the present application further provide a chip system, where the chip system is coupled to a memory in an electronic device and is configured to invoke a computer program stored in the memory and execute the technical solution of the first aspect or of the third aspect of the embodiments of the present application; "coupled" in the embodiments of the present application means that two components are combined with each other directly or indirectly.
For the advantageous effects of the second to eighth aspects, refer to the advantages of the first aspect described above; the description is not repeated here.
Drawings
Fig. 1 is a schematic diagram of a first application scenario provided in an embodiment of the present application;
fig. 2A is a schematic diagram of a second application scenario provided in an embodiment of the present application;
fig. 2B is a schematic diagram of a third application scenario provided in an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of a video recording method according to an embodiment of the present disclosure;
fig. 5 to 6 are schematic diagrams illustrating separation of different sound signals from a mixed sound signal according to an embodiment of the present application;
fig. 7 is a flowchart of a video recording method applied by a camera in an electronic device according to an embodiment of the present application;
fig. 8 to 9 are schematic diagrams of a preview interface displayed on an electronic device according to an embodiment of the present application;
fig. 10 to 14 are schematic diagrams of an electronic device display tag according to an embodiment of the present application;
fig. 15 to 18 are schematic diagrams of a GUI of an electronic device after recording starts according to an embodiment of the present application;
FIG. 19 is a flowchart illustrating a video file processing method according to an embodiment of the present disclosure;
FIG. 20 is a schematic diagram of a gallery application in an electronic device according to an embodiment of the present disclosure;
fig. 21 to 23 are schematic diagrams illustrating a video file processing procedure of an electronic device according to an embodiment of the present application;
fig. 24 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following, some terms in the embodiments of the present application are explained for easy understanding by those skilled in the art.
In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A alone, both A and B, and B alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of" the following items means any combination of those items, including a single item or any combination of plural items.
Unless stated to the contrary, the ordinal terms such as "first," "second," and the like in the embodiments of the present application are used for distinguishing a plurality of objects, and are not used for limiting the size, content, order, timing, priority, importance, or the like of the plurality of objects.
As described above, for shooting scenes with noisy sound (such as a concert or a downtown street), the video files shot by the user often contain considerable noise and have a poor shooting effect. One solution is post-dubbing, as used in television production: when shooting the video, no sound signal is collected (or the shot video file is first muted), and the video file is then dubbed separately. Post-dubbing generally requires special software tools, and synchronizing the dubbing with the video signal requires a professional; in short, such post-production is difficult for non-professional users.
The embodiments of the present application provide a video recording method, a video file processing method, and an electronic device. Specifically, the video recording method provided by the embodiments of the present application may determine a first target sound signal among the sound signals in the environment and, when recording video, synthesize the video signal collected by the camera with the first target sound signal into a video file. The video file obtained by the electronic device thus includes the first target sound signal and does not include the other sound signals in the environment; that is, the noisy sounds in the environment are filtered out and the first target sound signal is retained, improving video shooting quality.
Several application scenarios provided in the embodiments of the present application are described below.
Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present application. In this scenario, a user is recording a seaside landscape with an electronic device (such as a mobile phone). The camera of the electronic device is collecting a video signal in real time (including photographic subjects such as waves, birds, and mountains), and the microphone is collecting the sound signals in the environment in real time, which include various sound signals such as the sound of the waves, bird calls, and human voices. The electronic device may determine a first target sound signal among the plurality of sound signals in the environment and then synthesize the first target sound signal with the video signal collected by the camera into a video file. Assuming the first target sound signal is the sound of the waves, the video file is synthesized from the video signal collected by the camera and the wave sound, and does not include noise such as human voices and bird calls. Alternatively, after determining the first target sound signal, the electronic device may enhance the first target sound signal and/or weaken the other sound signals in the environment, and synthesize the result with the video signal collected by the camera into a video file. In this case the video file can include the plurality of sound signals in the environment, but the first target sound signal is more prominent and the other sound signals are relatively weak, so the video effect is better.
Fig. 2A is a schematic diagram of another application scenario provided in an embodiment of the present application. In this scenario, a user is recording a meeting with an electronic device (such as a mobile phone). The camera of the electronic device is collecting a video signal in real time (including photographic subjects such as the participants, a table, and a screen), and the microphone is collecting the sound signals in the environment in real time, which include various sound signals such as the voice of the speaker and the voices of the listeners. The electronic device may determine a first target sound signal among the plurality of sound signals in the environment and then synthesize the first target sound signal with the video signal collected by the camera into a video file. Assuming the first target sound signal is the voice of the speaker, the video file is synthesized from the video signal collected by the camera and the speaker's voice, and does not include the voices of the listeners, so the captured video file has less noise and a better effect. Alternatively, after determining the first target sound signal, the electronic device may enhance the first target sound signal and/or weaken the other sound signals in the environment, and synthesize the result with the video signal collected by the camera into a video file. In this case the video file may include the plurality of sound signals in the environment, but the first target sound signal (the voice of the speaker) is more prominent and the other sound signals (such as the voices of the listeners) are relatively weak, so the video effect is better.
Fig. 2B is a schematic diagram of another application scenario provided in an embodiment of the present application. In this scenario, a user is recording a concert with an electronic device (such as a mobile phone). The camera of the electronic device is collecting a video signal in real time (including photographic subjects such as the stage, the singer, and the audience), and the microphone is collecting the sound signals in the environment in real time, which include various sound signals such as the singer's singing and the audience's shouting. The electronic device may determine a first target sound signal among the plurality of sound signals in the environment and then synthesize the first target sound signal with the video signal collected by the camera into a video file. Assuming the first target sound signal is the singer's singing, the video file is synthesized from the video signal collected by the camera and the singing, and does not include the audience's shouting, so the captured video file has less noise and a better effect. Alternatively, after determining the first target sound signal, the electronic device may enhance the first target sound signal and/or weaken the other sound signals in the environment, and synthesize the result with the video signal collected by the camera into a video file. In this case the video file may include the plurality of sound signals in the environment, but the first target sound signal (the singer's singing) is more prominent and the other sound signals (such as the audience's shouting) are relatively weak, so the video effect is better.
The video recording method and the video file processing method provided by the embodiments of the present application may be applied to an electronic device, which may be any device having a camera and a display screen, such as a mobile phone, a tablet computer, a wearable device (e.g., a watch, a bracelet, a helmet, earphones, or a necklace), a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA); the embodiments of the present application do not limit the specific type of the electronic device.
By way of example, fig. 3 shows a schematic structural diagram of the electronic device 100. As shown in fig. 3, the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, a user identification module (subscriber identification module, SIM) card interface 195, and the like.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors. The controller may be a neural hub and a command center of the electronic device 100, among others. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution. A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby improving the efficiency of the system.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the electronic device 100, and may also be used to transfer data between the electronic device 100 and a peripheral device. The charge management module 140 is configured to receive a charge input from a charger. The power management module 141 is used for connecting the battery 142, and the charge management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 and provides power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like. The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G, etc., applied to the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.
The wireless communication module 160 may provide solutions for wireless communication including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, wi-Fi) network), bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field wireless communication technology (near field communication, NFC), infrared technology (IR), etc., as applied to the electronic device 100. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signals, filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
In some embodiments, antenna 1 and mobile communication module 150 of electronic device 100 are coupled, and antenna 2 and wireless communication module 160 are coupled, such that electronic device 100 may communicate with a network and other devices through wireless communication techniques. The wireless communication techniques may include the Global System for Mobile communications (global system for mobile communications, GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA), wideband code division multiple access (wideband code division multiple access, WCDMA), time division code division multiple access (time-division code division multiple access, TD-SCDMA), long term evolution (long term evolution, LTE), BT, GNSS, WLAN, NFC, FM, and/or IR techniques, among others. The GNSS may include a global satellite positioning system (global positioning system, GPS), a global navigation satellite system (global navigation satellite system, GLONASS), a beidou satellite navigation system (beidou navigation satellite system, BDS), a quasi zenith satellite system (quasi-zenith satellite system, QZSS) and/or a satellite based augmentation system (satellite based augmentation systems, SBAS).
The display 194 is used to display the display interface of an application, such as the viewfinder interface of a camera application. The display 194 includes a display panel. The display panel may employ a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, N being a positive integer greater than 1.
The electronic device 100 may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electric signal, and the camera photosensitive element transmits the electric signal to the ISP for processing and is converted into an image visible to naked eyes. ISP can also optimize the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, electronic device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to fourier transform the frequency bin energy, or the like.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, and MPEG-4.
The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent awareness of the electronic device 100 may be implemented through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, etc.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 runs the instructions stored in the internal memory 121 to execute the various functional applications and data processing of the electronic device 100. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store an operating system and the software code of at least one application program (e.g., an iQIYI application, a WeChat application, etc.). The data storage area may store data generated during use of the electronic device 100 (e.g., captured images, recorded video, etc.). In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as pictures and videos are stored in an external memory card.
The electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playing, recording, etc.
The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
The pressure sensor 180A is used to sense a pressure signal, and may convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. The gyro sensor 180B may be used to determine a motion gesture of the electronic device 100. In some embodiments, the angular velocity of electronic device 100 about three axes (i.e., x, y, and z axes) may be determined by gyro sensor 180B.
The gyro sensor 180B may also be used for shooting anti-shake. The air pressure sensor 180C is used to measure air pressure; in some embodiments, the electronic device 100 calculates altitude from barometric pressure values measured by the air pressure sensor 180C to aid positioning and navigation. The magnetic sensor 180D includes a Hall sensor, and the electronic device 100 may use the magnetic sensor 180D to detect the opening and closing of a flip cover. In some embodiments, when the electronic device 100 is a flip phone, it may detect the opening and closing of the flip according to the magnetic sensor 180D; features such as automatic unlocking upon flip opening can then be set according to the detected open or closed state of the cover or flip. The acceleration sensor 180E may detect the magnitude of acceleration of the electronic device 100 in various directions (typically along three axes), and the magnitude and direction of gravity may be detected when the electronic device 100 is stationary. It may also be used to recognize the posture of the electronic device 100 and applied to landscape/portrait switching, pedometers, and similar applications.
The distance sensor 180F is used to measure distance. The electronic device 100 may measure distance by infrared or laser. In some embodiments, the electronic device 100 may use the distance sensor 180F for ranging to achieve quick focusing. The proximity light sensor 180G may include, for example, a light-emitting diode (LED) and a light detector such as a photodiode. The light-emitting diode may be an infrared light-emitting diode. The electronic device 100 emits infrared light outward through the light-emitting diode and uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, it may be determined that there is an object near the electronic device 100; when insufficient reflected light is detected, the electronic device 100 may determine that there is none. The electronic device 100 can use the proximity light sensor 180G to detect that the user is holding the electronic device 100 close to the ear, so as to automatically turn off the screen to save power. The proximity light sensor 180G may also be used in holster mode and pocket mode to automatically unlock and lock the screen.
The ambient light sensor 180L is used to sense ambient light brightness. The electronic device 100 may adaptively adjust the brightness of the display 194 according to the perceived ambient light brightness. The ambient light sensor 180L may also be used to automatically adjust the white balance when taking a photograph, and may cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in a pocket, to prevent accidental touches. The fingerprint sensor 180H is used to collect a fingerprint. The electronic device 100 may use the collected fingerprint features to implement fingerprint unlocking, application lock access, fingerprint photographing, fingerprint-based call answering, and the like.
The temperature sensor 180J is used to detect temperature. In some embodiments, the electronic device 100 executes a temperature processing strategy using the temperature detected by the temperature sensor 180J. For example, when the reported temperature exceeds a threshold, the electronic device 100 reduces the performance of a processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is below another threshold, the electronic device 100 heats the battery 142 to prevent an abnormal shutdown caused by low temperature. In still other embodiments, when the temperature is below a further threshold, the electronic device 100 boosts the output voltage of the battery 142, likewise to prevent an abnormal shutdown caused by low temperature.
The touch sensor 180K is also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194; together they form a touch screen, also called a "touchscreen". The touch sensor 180K is used to detect a touch operation acting on or near it, and may pass the detected touch operation to the application processor to determine the touch event type. Visual output related to the touch operation may be provided through the display 194. In other embodiments, the touch sensor 180K may instead be disposed on a surface of the electronic device 100 at a location different from that of the display 194.
The bone conduction sensor 180M may acquire a vibration signal. In some embodiments, the bone conduction sensor 180M may acquire the vibration signal of the vibrating bone mass of the human vocal part. The bone conduction sensor 180M may also contact the human pulse to receive a blood pressure beating signal.
The keys 190 include a power key, volume keys, and the like. The keys 190 may be mechanical keys or touch keys. The electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100. The motor 191 may generate vibration prompts; it may be used for incoming-call vibration alerts as well as touch vibration feedback. For example, touch operations acting on different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects, and the touch vibration feedback effect may also be customized. The indicator 192 may be an indicator light and may be used to indicate charging status, a change in battery level, a message, a missed call, a notification, and the like. The SIM card interface 195 is used to connect a SIM card. A SIM card can be inserted into or removed from the SIM card interface 195 to bring it into contact with, or separate it from, the electronic device 100.
It is to be understood that the components shown in fig. 3 do not constitute a specific limitation on the electronic device 100; the electronic device 100 may include more or fewer components than illustrated, may combine or split certain components, or may arrange the components differently. Furthermore, the combination and connection relationships between the components in fig. 3 may also be adjusted and modified.
The technical solution provided in the embodiments of the present application is described below with reference to the accompanying drawings, taking the electronic device shown in fig. 3 (a mobile phone as an example) as an example.
Example 1
Fig. 4 is a schematic flow chart of a video recording method according to an embodiment of the present application. The method may be applied to the electronic device shown in fig. 1, and the flow of the method includes:
S1, starting a camera and a microphone of the electronic device, wherein the camera is used for collecting video signals, and the microphone is used for collecting sound signals in the environment.
Illustratively, the electronic device activating the camera and the microphone includes: starting a first application having a video recording function, the camera and the microphone being turned on when the electronic device starts the video recording function of the first application. The first application may be a camera application in the electronic device, either the device's own camera application or a third-party camera application downloaded from a network. Alternatively, the first application may be an instant messaging application that has a video call function, a video shooting function, or the like, such as Huawei's MeeTime application. Alternatively, the first application may be a short video application of various kinds, such as Douyin or Xiaohongshu. Alternatively, the first application may be a social network application, such as Weibo or a forum. In summary, the first application may be any application in the electronic device having a shooting function.
S2, determining a first target sound signal in sound signals in the environment, wherein the first target sound signal is one or more sound signals in N sound signals included in the environment, the frequencies and/or sound sources of the N sound signals are different, and N is an integer greater than or equal to 1.
Generally, the sound signals in the environment include multiple kinds of sounds and can be understood as a mixed sound signal. Thus, S2 may comprise: separating the mixed sound signal to obtain at least one audio group, and then determining a target audio group (i.e., the first target sound signal) among the at least one audio group.
There are various ways to separate the mixed sound signal, including but not limited to at least one of the following:
In the first way, the mixed sound signal is separated according to the frequency ranges of the different sound signals.

For example, the electronic device may pre-store a sound differentiation table in which different frequency intervals are recorded (the frequency of a sound is typically measured in hertz (Hz), i.e., the number of periodic vibrations per second). The frequency intervals in the differentiation table may be preset. An example of the sound differentiation table is given in table 1 below:
Table 1: sound differentiating meter
Frequency interval (Hz) Sound production
Interval
1 Audio group file 1
Interval 2 Audio group file 2
Interval 3 Audio group file 3
The electronic device may separate the mixed sound signal based on table 1 above. For example, the sound signal whose frequency falls within interval 1 is separated out as audio group file 1, the sound signal whose frequency falls within interval 2 as audio group file 2, and the sound signal whose frequency falls within interval 3 as audio group file 3. That is, the mixed sound signal is separated into three audio group files. The electronic device may store the three audio group files; optionally, each audio group file may be named when stored, for example audio group 1 to audio group 3.
In this first way, the mixed sound signal is separated purely by frequency interval, so it remains unclear what sound type each separated audio group file is, such as wind sound or sea wave sound. However, this way is simple to implement and highly efficient.
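By way of illustration only, a minimal sketch of this first way is given below, assuming the mixed signal is an array of audio samples and using one band-pass filter per frequency interval; the interval boundaries are assumptions chosen for the example, not values specified in the present application:

```python
# Sketch of way one: split a mixed signal into audio group files by fixed
# frequency intervals. Interval boundaries are illustrative assumptions.
from scipy.signal import butter, sosfilt

INTERVALS_HZ = [(20, 100), (100, 1000), (1000, 8000)]  # intervals 1-3 (assumed)

def separate_by_frequency(mixed, sample_rate):
    """Band-pass the mixed signal once per interval; each output is one audio group."""
    groups = []
    for low, high in INTERVALS_HZ:
        # 4th-order Butterworth band-pass covering this interval
        sos = butter(4, [low, high], btype="bandpass", fs=sample_rate, output="sos")
        groups.append(sosfilt(sos, mixed))
    return groups  # e.g., stored as audio group file 1 to audio group file 3
```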
In the second way, the first way can be further optimized. Specifically, considering that the sound emitted by each type of sound source has a characteristic frequency range, the frequency intervals in table 1 above may be made to correspond to the sounding frequency ranges of different types of sound sources. For example, the human voice ranges roughly from 100 Hz (e.g., a male bass) to 10000 Hz (e.g., a female treble), and wind sound ranges from 70 Hz to 100 Hz. Accordingly, the electronic device can store the sounding frequency ranges of various sound sources, such as wind sound, rain sound, thunder, sea wave sound, footstep sound, bird song, and human speech. Table 1 above can thus be refined into table 2 below:
Table 2: sound differentiating meter
Frequency interval (Hz) Sound source type Sound production
Interval
1 Wind sound Audio group file 1
Interval 2 Human voice Audio group file 2
Interval 3 Sea wave sound Audio group file 3
The electronic device may separate the mixed sound signal based on table 2 above. For example, the sound signal whose frequency falls within interval 1 is separated out and stored as audio group file 1; the sound signal whose frequency falls within interval 2 is separated out and stored as audio group file 2; and the sound signal whose frequency falls within interval 3 is separated out and stored as audio group file 3. Since the sound source type of each separated audio group file can be determined in the second way, the audio group files can be named accordingly when stored: for example, audio group file 1 is named wind sound, audio group file 2 human voice, and audio group file 3 sea wave sound. For example, referring to fig. 5, the sound separation module in the electronic device may separate the mixed sound signal according to a sound differentiation table (e.g., table 2 above) to obtain sound signals of different sound source types, such as wind sound, rain sound, and sea wave sound.
In short, the audio group files separated in the second way correspond to distinct, accurately identified sound source types.
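As a sketch of the second way, the same band-pass split can be keyed to per-source frequency ranges so that each separated signal is stored under its sound source type; the ranges below are illustrative assumptions in the spirit of table 2:

```python
from scipy.signal import butter, sosfilt

SOURCE_RANGES_HZ = {             # assumed sounding frequency ranges per source type
    "wind sound": (70, 100),
    "human voice": (100, 10000),
    "sea wave sound": (30, 70),  # illustrative assumption
}

def separate_by_source_type(mixed, sample_rate):
    """Return {source type: separated signal}, so stored files can be named directly."""
    named_groups = {}
    for source, (low, high) in SOURCE_RANGES_HZ.items():
        sos = butter(4, [low, high], btype="bandpass", fs=sample_rate, output="sos")
        named_groups[source] = sosfilt(sos, mixed)
    return named_groups
```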
In the third way, the mixed sound signal is separated by means of a microphone array.
A microphone array may be understood as a plurality of microphones distributed according to a specific rule (e.g., three rows by three columns, five rows by five columns, etc.). Each microphone in the array collects sound signals, so the signals collected by the microphone array form a sound matrix. A sound matrix may be obtained for each sound source through the microphone array, so that each sound source corresponds to one sound matrix; the sound matrices corresponding to different sound sources differ, which is what allows the sources to be distinguished. In particular, obtaining the sound matrix of each sound source through the microphone array may be implemented using an independent component analysis (ICA) algorithm, which is not described in detail herein.
The third way is more accurate than the second. The second way can distinguish sound signals of different sound source types, but it is difficult for it to distinguish sound signals emitted by the same kind of sound source: for example, it can separate a person's voice from the sound of sea waves, but cannot distinguish the voices of different persons, e.g., person 1 and person 2. The third way can distinguish sound signals from the same kind of sound source, and its accuracy is therefore higher.
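The application names only the ICA algorithm; as a hedged sketch, FastICA from scikit-learn is one commonly available implementation that recovers per-source signals from multi-microphone recordings:

```python
import numpy as np
from sklearn.decomposition import FastICA

def separate_with_ica(mic_matrix: np.ndarray, n_sources: int) -> np.ndarray:
    """mic_matrix: shape (n_samples, n_microphones) from the microphone array.
    Returns shape (n_samples, n_sources); each column is one estimated source."""
    ica = FastICA(n_components=n_sources, random_state=0)
    return ica.fit_transform(mic_matrix)
```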
In the fourth way, the mixed sound signal is separated by voiceprint recognition.
The voiceprints of different sound sources differ: for example, different people have different voiceprints, as do different animals. The electronic device may therefore pre-store a voiceprint database in which the voiceprints of different sound sources (such as different people or different animals) are stored. A plurality of audio groups are extracted from the mixed sound signal (e.g., using the first way), and the voiceprints of these audio groups are then matched against the voiceprints in the database to determine the source of each audio group (e.g., which person made the sound). This way requires storage space for the voiceprint database. If the electronic device has sufficient storage, the database can be stored on the device; if not, the database can be stored in the cloud, in which case the electronic device sends the mixed sound signal to the cloud when separation is needed, the cloud separates the mixed sound signal according to the voiceprint database, and the separation result is fed back to the electronic device. The fourth way can distinguish sound signals from the same kind of sound source, and its accuracy is higher.
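A minimal sketch of the matching step is given below, assuming each audio group and each database entry has already been reduced to a fixed-length voiceprint embedding by some speaker-embedding model; the cosine measure and the threshold are assumptions for the example:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_voiceprint(group_embedding, voiceprint_db, threshold=0.75):
    """Return the name of the best-matching source in the database, or None
    if no entry clears the (assumed) similarity threshold."""
    best_name, best_score = None, threshold
    for name, stored_embedding in voiceprint_db.items():
        score = cosine_similarity(group_embedding, stored_embedding)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```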
It should be noted that the above four ways may be used alone or in combination. For example, the second way may be combined with the third, i.e., the signal is processed twice: a rough distinction is first made according to the sound differentiation table, and further separation is then performed on the basis of that first result (for example, sound signals from the same kind of sound source are separated further). For example, referring to fig. 6, the first separation splits off the human voice, the wind sound, and so on, and the second separation splits the human voice into the voices of different persons. Through the two separation passes, the sound signal is divided more accurately.
After the mixed sound signal is separated into at least one audio group, a target audio group (i.e., the first target sound signal) may be determined. It will be appreciated that if only one audio group is separated out, that audio group may be determined to be the first target sound signal; if at least two audio groups are separated out, the first target sound signal may be determined among them.
The first target sound signal may be determined in a plurality of ways, including but not limited to at least one of the following:
Mode A

The first target sound signal is determined according to a user selection operation.
For example, after the electronic device separates the mixed sound signal into at least two audio groups, the audio groups may be displayed in some manner for the user to select. For example, each audio group is shown with a corresponding tag; when the user selects a tag, the audio group corresponding to that tag becomes the target audio group (i.e., the first target sound signal).
Mode B
A subject shooting object in the video signal to be processed is determined, and the target audio group (i.e., the first target sound signal) is determined according to the subject shooting object, where the subject shooting object is the sound source of the target audio group.
The subject shooting object may be one or more objects in the video signal to be processed. The one or more objects may be of the same type or of different types; this is not limited. When there are multiple objects, the subject shooting object may be a target object among them. For example, the target object may be a preset object, which may be set by default or preset by the user; this is not limited in the embodiments of the present application. As another example, the target object may be an object of interest to the user, i.e., an object that the electronic device has recorded the user as frequently shooting or frequently retouching. In one implementation, taking a cat as an example, if the electronic device determines that many of the images stored in the gallery application are images of a cat, it determines that the object of interest to the user is a cat. In another implementation, the electronic device records which objects the user retouches most often when editing images with retouching software, and determines that an object retouched many times is an object of interest to the user. When the electronic device determines that an object of interest to the user is present in the video signal to be processed, that object is determined to be the target object.
Alternatively, the subject shooting object may be one or more object types in the video signal to be processed. One object type may correspond to one or more objects belonging to that type; in other words, when the subject shooting object is an object type, it includes all objects in the video signal to be processed that belong to that type. For example, if the video signal includes person 1 and person 2 and the subject shooting object is the object type "person", it is determined that the subject shooting object includes both person 1 and person 2. When there are multiple object types, the subject shooting object may be a target object type among them. The target object type may be any one or more of the object types; if it is several types, those types are identified simultaneously. For example, the target object type may be the object type with the highest priority. An example priority relationship is: person > animal > text > food > flower > green plant > building. The electronic device may first determine whether the video signal to be processed includes the "person" type; if so, all objects of the "person" type (i.e., all persons in the video signal to be processed) are determined to be subject shooting objects. If the "person" type is not included, the device continues to check for the "animal" type; if the "animal" type is included, all objects of the "animal" type are determined to be subject shooting objects; otherwise, the device continues to the next level of priority, and so on. The priority relationship may be set by default at the factory or set by the user; this is not limited in this application. As another example, the target object type may be a preset object type, which likewise may be set by default at the factory or set by the user. As yet another example, the target object type may be an object type of interest to the user. In one implementation, taking a cat as an example, if the electronic device determines that many of the images stored in the gallery application are images of a cat, it determines that the object type of interest to the user is the "animal" type. In another implementation, the electronic device records which objects the user retouches most often when editing images with retouching software, and determines that the object type of a frequently retouched object is the object type of interest to the user.
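The priority rule described above can be sketched as follows; the type names mirror the example priority order, and the recognition step itself is assumed to be supplied elsewhere:

```python
# Scan recognized object types in priority order; all objects of the first
# type present become the subject shooting objects.
PRIORITY = ["person", "animal", "text", "food", "flower", "green plant", "building"]

def pick_subject_objects(recognized):
    """recognized: {object type: [objects of that type in the video signal]}"""
    for object_type in PRIORITY:
        if recognized.get(object_type):
            return recognized[object_type]
    return []  # no prioritized type present
```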
After the subject shooting object is determined, the audio group whose sound source type matches the subject shooting object may be determined to be the first target sound signal, according to the sound source types of the at least two separated audio groups. For example, if the subject shooting object is a person, the audio group whose sound source is a person among the at least two audio groups is the target audio group (i.e., the first target sound signal).
Mode C
The electronic device provides a plurality of specific sound signal recording modes; the user can select one, and the electronic device determines the specific sound signal selected by the user to be the first target sound signal. For example, the electronic device provides a wind sound recording mode, a rain sound recording mode, and the like. If the user selects the wind sound recording mode, the electronic device determines the wind sound in the collected mixed sound signal and then synthesizes the wind sound and the video signal collected by the camera into a video file.
S4, synthesizing the video signal acquired by the camera and the first target sound signal into a video file.
There are various ways of synthesizing the first target sound signal and the video signal, including but not limited to the following modes A and B.
In mode A, after the first target sound signal is determined, only the first target sound signal is synthesized with the video signal. That is, the synthesized video file includes only the first target sound signal and no other sound signals from the environment. In this way, the other sound signals are completely filtered out and the sound is clean. Optionally, the first target sound signal may be enhanced before synthesis, and the enhanced first target sound signal then synthesized with the video signal into the video file.
In mode B, after the first target sound signal is determined, the first target sound signal is enhanced and/or the other sound signals are weakened, and the enhanced first target sound signal and the weakened other sound signals are then synthesized with the video signal to be processed, where the other sound signals are the sound signals in the environment other than the first target sound signal. Enhancing the first target sound signal may be done by increasing its intensity, and weakening the other sound signals by decreasing theirs. In this case, the synthesized video file includes multiple sound signals from the environment, but the first target sound signal is prominent while the other sound signals are weak.
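A minimal sketch of mode B follows, assuming the separated signals are float sample arrays of equal length in the range [-1, 1]; the gain values are illustrative assumptions:

```python
import numpy as np

def remix(target, others, boost=2.0, cut=0.3):
    """Raise the target signal's intensity, lower the other signals', and mix."""
    mixed = boost * np.asarray(target, dtype=float)
    for signal in others:
        mixed = mixed + cut * np.asarray(signal, dtype=float)
    peak = np.max(np.abs(mixed))
    return mixed / peak if peak > 1.0 else mixed  # normalize to avoid clipping
```

Mode A is then the degenerate case of the same sketch with cut = 0, i.e., the other sound signals are dropped entirely.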
By way of example, a video recording method provided in an embodiment of the present application is described below, taking video recording performed by an electronic device using a camera application as an example.
Fig. 7 is a schematic flow chart of a video recording method according to the second embodiment. Fig. 7 can be understood as a refinement of fig. 4: specifically, S702 in fig. 7 is a refinement of S1 in fig. 4, S703 is a refinement of S2, S704 to S706 are a refinement of S3, and S707 is a refinement of S4. As shown in fig. 7, the process includes:
S701, starting a camera application.
By way of example, fig. 8 (a) shows a graphical user interface (graphical user interface, GUI) of a mobile phone, the GUI being the desktop of the mobile phone. When the mobile phone detects that the user clicks the icon of the camera application on the desktop, the camera application can be started.
S702, starting a camera for collecting video signals.
S703, starting a microphone for collecting sound signals in the environment.
S704, displaying a preview interface, wherein video signals collected by the camera are displayed in the preview interface.
Illustratively, after the electronic device launches the camera application, another GUI, which may be referred to as a preview interface 801, is displayed as shown in FIG. 8 (b). The preview interface 801 is used for displaying video signals collected by the camera. A control 802 for indicating a photographing mode, a control 803 for indicating a recording mode, and a photographing control 804 may also be included on the preview interface 801. In the photographing mode, after the mobile phone detects that the user clicks the photographing control 804, the mobile phone performs photographing operation; in the video recording mode, when the mobile phone detects that the user clicks the shooting control 804, the mobile phone performs video recording operation.
It should be noted that, at present, a mobile phone does not process the environmental sound collected by the microphone during recording, whereas in the second embodiment the target sound within the collected environmental sound signal is processed (e.g., enhanced) during recording. If the existing recording mode is called the general recording mode and the recording mode of this application the specific sound recording mode, the mobile phone can thus provide at least two recording modes. Optionally, before recording in the specific sound recording mode, the user may set the recording mode of the camera application to the specific sound recording mode. For example, referring to fig. 8 (b), a mode option 805 is included on the preview interface 801. When the mobile phone detects that the user clicks the mode option 805, the mode selection interface shown in fig. 8 (c) is displayed. After the mobile phone detects that the user clicks the control 806 for indicating the specific sound recording mode on that interface, the mobile phone enters the specific sound recording mode.
Of course, in other embodiments, the user may not need to set the specific sound recording mode at all. For example, the mobile phone system may use the specific sound recording mode by default; or, if the specific sound recording mode was set the last time, it is used by default the next time the camera application records.
Optionally, after the mobile phone enters the specific sound recording mode, the preview interface 801 shown in fig. 9 may be displayed, with a prompt 807 shown in the preview interface 801 to inform the user that the phone is currently in the specific sound recording mode. In fig. 9, the prompt 807 is a sound wave icon by way of example; other prompts are, of course, also possible. Optionally, the prompt 807 may disappear automatically after being displayed for a period of time, so as not to obscure the preview interface 801.
S705, determining a first target sound signal of the sound signals in the environment.
The implementation of S705 includes at least one of the following.
In the first way, N tags are displayed on the preview interface 801, the N tags being used to identify the N sound signals in the environment; in response to an operation for selecting a target tag, the target tag is determined, and the sound signal corresponding to the target tag is the first target sound signal. Specifically, the first way may include steps 1 to 3.
Step 1: the sound signals in the environment are separated into N audio groups, where N is an integer greater than or equal to 2. The separation principle is the same as the implementation principle of S3 in fig. 4, and the description is not repeated here.
Step 2: n tags are set for identifying N audio groups and N tags are displayed. For example, after separating the sound signals in the environment into N audio groups, N audio group files are stored, where each audio group file may be named when stored, and an exemplary tag may be the naming of the audio group file.
For example, if step 1 uses the first way described previously to separate the sound signals in the environment into N audio groups, it is not known which type of sound each group is (wind sound, rain sound, or human voice). Thus, when each audio group file is stored, the groups are named directly audio group 1, audio group 2, audio group 3, and so on. In this case, the tags may be the names of the stored audio group files, i.e., audio group 1, audio group 2, audio group 3, etc. For example, referring to fig. 10, a plurality of tags are displayed in the preview interface 801, including audio group 1, audio group 2, audio group 3, and so on. The preview interface 801 may also display a prompt: click an audio group to enhance the corresponding audio. For example, if the user clicks audio group 1, the audio corresponding to audio group 1 is enhanced. In this case, the user cannot tell directly from the tags what sound each audio group is, and can click the tags one by one, i.e., audition the sound of each tag's audio group in turn.
For example, if step 1 uses any one or more of the foregoing second to fourth ways to separate the N audio groups, the type of each separated audio group (such as wind sound, rain sound, or human voice) can be determined. Thus, when each audio group file is stored, it may be named directly wind sound, bird song, rain sound, and so on. In this case, the tags may be the names of the audio group files. For example, referring to fig. 11, a plurality of tags are displayed in the preview interface 801, including wind sound, bird song, sea wave sound, footstep sound, etc. Such tags are more intuitive: the user can tell directly from a tag what sound its audio group is, without auditioning the sound of each tag one by one.
In fig. 10 and fig. 11, the N tags are arranged vertically in the right-hand area of the preview interface 801. It is understood that the display position and/or display form of the tags may be adjusted; for example, the tags may be arranged vertically or horizontally, and may be displayed in the right-hand or left-hand area of the preview interface 801, and so on.
As another example, a tag may also be displayed in the preview interface 801 at the position of the shooting object that produces it. For example, referring to fig. 12, the bird song tag is displayed at the position of the bird, and the sea wave sound tag at the position of the sea waves. As for wind sound, footstep sound, and the like, since there is no corresponding shooting object in the preview interface 801, such tags may be displayed at an arbitrary position, or at a position where wind sound or footstep sound is likely to occur; in fig. 12, the wind sound tag is displayed in the sky and the footstep sound tag at the beach. This display mode makes the tags easier for users to distinguish, and the user experience is better.
It will be appreciated that the shooting objects in the preview interface may be identified before the N tags are displayed in the manner shown in fig. 12. Assume that the electronic device recognizes P shooting objects in the preview interface and that the number of tags is N; three relationships between P and N are possible.
1. P > N
The number of tags N is smaller than the number of shooting objects P in the preview interface. One possible scenario is that some shooting objects in the video signal captured by the camera do not emit sound (such as blue sky and white clouds), so the microphone cannot collect sound signals from them; the number of shooting objects in the shooting interface is then greater than the number of tags.
2. P = N
The number of tags N is equal to the number of shooting objects P in the preview interface. One possible scenario is that the shooting objects included in the video signal collected by the camera all happen to emit sound, and there is no off-screen sound.
3. P < N
The number of tags N is greater than the number of shooting objects P in the preview interface. One possible scenario is that, because the camera's field of view is limited, a shooting object outside the field of view does not appear in the captured video signal, yet the sound it emits is collected by the microphone, like an off-screen voice. In such a scene, an object may be absent from the video signal collected by the camera while the sound signal collected by the microphone includes that object's sound, i.e., the number of shooting objects in the shooting interface is smaller than the number of tags. For example, in fig. 12, the shooting object of a child does not appear in the preview interface 801; if the sound of a child playing exists in the environment, the collected sound signal includes that sound.
In the above three cases, when the electronic device displays the N tags in the preview interface, M tags of the N tags may be matched to corresponding M shooting objects among the P shooting objects, where the M shooting objects are the sound sources of the M sound signals corresponding to the M tags. The M tags are then displayed at the positions of the M shooting objects, and the other tags (the tags other than the M tags among the N tags) may be displayed at other positions in the preview interface. For example, referring to fig. 12, if the electronic device identifies a bird as a shooting object, the bird song tag is displayed at the bird's position, and if it identifies the sea as a shooting object, the sea wave sound tag is displayed at the sea's position. As for the wind sound or footstep sound, since no corresponding shooting object is recognized, these tags may be displayed at an arbitrary position, or at a position where wind sound or footstep sound is likely to occur; in fig. 12, the wind sound tag is displayed in the sky and the footstep sound tag at the beach.
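The placement rule can be sketched as a simple lookup, assuming the recognition step yields screen positions for the shooting objects it matched; the coordinates here are hypothetical pixel positions:

```python
def place_labels(labels, subject_positions, default_position):
    """labels: the N tag names; subject_positions: {tag name: (x, y)} for the M
    tags whose sound source was recognized in the preview interface. The
    remaining N - M tags fall back to a default position."""
    return {label: subject_positions.get(label, default_position)
            for label in labels}
```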
In the above embodiment, after the electronic device separates the sound signals in the environment into N audio groups, the N tags are displayed automatically in the preview interface 801. In other embodiments, after the separation, no tags appear automatically in the preview interface 801; the tags are displayed when an operation for calling them out is received. The operation for calling out the tags may be a preset gesture operation (such as a double-click or a long press at any position of the preview interface 801); or a specific control is displayed in the preview interface 801 and the tags are called out when an operation on that control is detected; or the tags are called out when a voice instruction for calling out the tags is received.
Of course, a tag may be hidden after it is displayed. For example, when the electronic device receives an operation for hiding the tags, the tags are hidden. The operation for hiding the tags may be a preset gesture operation (such as a double-click or a long press at any position of the preview interface 801); or a specific control is displayed in the preview interface 801 and the tags are hidden when an operation on that control is detected; or the tags are hidden when a voice instruction for hiding the tags is received. For example, referring to fig. 13 (a), tags are displayed in the preview interface 801; when a double-click or long-press operation at any position on the preview interface 801 is detected, the tags are hidden, as in fig. 13 (b). When a double-click or long-press operation at any position on the preview interface is detected again, the tags are called out again, as shown in fig. 13 (c).
Step 3: a user input operation is received, and the target tag is determined according to the input operation.
For example, referring to fig. 14, after the user clicks the wind sound tag, a prompt is displayed: "wind sound" selected; focused recording of "wind sound" will begin in 3 s; click again to cancel the focus on "wind sound". That is, after the wind sound tag is clicked, recording starts automatically after 3 s. Optionally, after the user selects the wind sound tag, the tag may be displayed prominently, e.g., highlighted, enlarged, or bolded.
In the second way, a subject shooting object in the video signal is determined, the subject shooting object being one or more objects or one or more object categories in the video signal; the first target sound signal is then determined according to the subject shooting object, where the sound source of the first target sound signal is the subject shooting object.
The way the subject shooting object is determined is the same in principle as mode B among the ways of determining the target audio group in S3 of fig. 4, and the description is not repeated.
For example, referring to fig. 15, the preview interface 801 includes a plurality of shooting objects: birds, sea waves, a boat, and so on. The electronic device may automatically recognize the subject shooting object (the implementation principle has been described above), or the user may designate it. For example, the user selects the subject shooting object through a circling operation, or specifies it through a voice instruction.
S706, starting video recording.
In mode A, after the first target sound signal is determined, video recording starts automatically after waiting a preset duration. For example, recording may start automatically a certain time (e.g., 3 s) after the user selects the target tag; taking fig. 14 as an example, recording starts automatically 3 s after the wind sound tag is selected.
In mode B, after the first target sound signal is determined, video recording starts when an operation for instructing to start recording is detected. That is, after the user selects the target tag, recording does not start automatically; it starts when a start-recording instruction is received. Continuing with fig. 14 as an example: after the user selects the wind sound tag, recording does not start automatically after 3 s; when the user clicks the record key 804, focused recording of the wind sound begins, i.e., the recording interface 1601 shown in fig. 16 is displayed. The recording interface 1601 includes only the wind sound tag and no other tags, indicating that the wind sound is currently being recorded with emphasis; it also displays the recording time, indicating that video is currently being recorded.
S707, synthesizing the video signal collected by the camera and the first target sound signal into a video file.
For the implementation principle of S707, refer to S4 in fig. 4; the description is not repeated here. Taking fig. 16 as an example, while the electronic device displays the recording interface 1601 in the foreground, in the background it enhances the wind sound within the environmental sound collected by the microphone and weakens the other sounds, so as to record the wind sound with emphasis. When the user wants to stop recording, the stop recording control 1602 in the recording interface 1601 can be clicked.
Alternatively, the electronic device may start synthesis upon detecting that the user clicks the stop recording control 1602, or may perform synthesis in real time during recording; this is not limited in the embodiments of the present application.
In the above embodiments, the first target sound signal is determined before the electronic device starts recording. It is understood that the target sound signal may also be changed before recording or during recording. This is presented below in two scenarios.
Scene one
The target sound signal is changed before recording starts. For example, a first target sound signal is determined before recording starts; if the user is not satisfied, it can be switched to a second target sound signal. After the switch, once recording starts, the second target sound signal and the video signal collected by the camera are synthesized into a video file.
Scene two
The target sound signal is changed in the middle of recording, where "the middle of recording" is understood as after recording has started and before it stops. For example, after the electronic device has recorded the first target sound signal with emphasis for a period of time, the user may want to switch at once to recording the second target sound signal with emphasis (before stopping the recording). That is, within the recording of one video, two target sound signals are used in succession.
Illustratively, the electronic device determines the first target sound signal before recording. After recording starts, the electronic device synthesizes the video signal collected by the camera during a first duration with the first target sound signal collected by the microphone during that duration to obtain a first recording segment, where the first duration is the period after the first target sound signal is determined. Before recording stops, the electronic device detects that the first target sound signal is switched to a second target sound signal, and then synthesizes the video signal collected by the camera during a second duration with the second target sound signal collected by the microphone during that duration into a second recording segment, where the second duration is the period after the second target sound signal is determined. When an instruction to stop recording is detected, the electronic device synthesizes the first recording segment and the second recording segment into one video file. In this way, different segments of the resulting video file emphasize different target sounds, giving a good shooting experience.
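A structural sketch of this segment-wise recording is given below; the container writing itself is not modeled, and the class and method names are assumptions for the example:

```python
from dataclasses import dataclass, field

@dataclass
class RecordingSegment:
    video_frames: list   # frames captured while this target sound was active
    target_audio: list   # the matching enhanced target sound samples

@dataclass
class Recording:
    segments: list = field(default_factory=list)

    def close_segment(self, frames, audio):
        """Called when the target sound is switched: the elapsed span becomes one segment."""
        self.segments.append(RecordingSegment(frames, audio))

    def stop(self):
        """On the stop-recording instruction, concatenate all segments into one file."""
        video, audio = [], []
        for segment in self.segments:
            video.extend(segment.video_frames)
            audio.extend(segment.target_audio)
        return video, audio  # handed to the device's muxer (outside this sketch)
```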
If the first target sound signal was determined in the first way, i.e., multiple tags were displayed for the user to select a target tag, then during recording a user who wants to switch to the second target sound signal may select a new target tag among the tags. For example, referring to fig. 17 (a), the electronic device is currently recording wind sound with emphasis, and a control 1603 is displayed in the recording interface 1601. When it is detected that the user clicks the control 1603, all tags are called out, as in fig. 17 (b). When it is detected that the user selects another tag, such as the sea wave sound tag, focused recording of the sea wave sound starts, as shown in fig. 17 (c). That is, after recording starts, the user may change the target tag, so that within one completed recording, different segments emphasize different sounds: for example, the video signal collected by the camera in the first segment of the file is synthesized with the first target sound signal, and the video signal collected in the second segment is synthesized with the second target sound signal.
If the first target sound signal was determined in the second way (i.e., no tags are displayed on the display screen, and the first target sound signal is determined from the subject shooting object), then a user who wants to switch to the second target sound signal during recording may reselect a shooting object in the preview interface 801. Since the background maintains a correspondence between shooting objects and tags, when it is detected that the user reselects a shooting object, the tag corresponding to that object is determined to be the target tag. For example, referring to fig. 18 (a), the phone is recording bird sound with emphasis; when the phone detects the user selecting the sea waves, it starts recording the sea wave sound with emphasis, for example after 3 s, as shown in fig. 18 (b).
Example two
Embodiment two specifically describes the second scenario mentioned above, namely the scenario in which a locally stored video file is post-processed.
Fig. 19 is a flowchart of a video file processing method according to the second embodiment. This flowchart can be understood as a refinement of the flow shown in fig. 4. The process comprises the following steps:
S1901, determining a first video file to be processed among locally stored video files, wherein the first video file comprises a video signal and N sound signals, and N is an integer greater than or equal to 1.
Illustratively, determining the first video file to be processed among locally stored video files includes: the electronic device starts a first application that includes at least one video file, and determines the first video file to be processed according to a user operation. The first application may be the local gallery or cloud gallery of the electronic device; or the first application is a short video application and the first video file is a short video downloaded by the electronic device; or the first application is an instant messaging application and the first video file is a video sent by another contact; or the first application is a social network application and the first video file is a video downloaded from the social network (e.g., a video posted by someone else and downloaded by the user).
The description will be given below taking a gallery application in an electronic device as an example.
Illustratively, fig. 20 (a) shows the desktop of the mobile phone. When the mobile phone detects that the user clicks the icon of the gallery application on the desktop, the gallery application may be opened and another GUI displayed, as shown in fig. 20 (b); this GUI may be referred to as the home page of the gallery application. The home page includes cover images of the video files locally stored on the mobile phone. When the mobile phone detects an operation on a certain video file 2001, the interface shown in fig. 21 is displayed, in which the cover image of the video file 2001 is displayed together with an edit control 2002.
S1902, entering an editing mode of the first video file.
Illustratively, continuing with fig. 21, when the mobile phone detects the user clicking the edit control 2002, it enters an editing mode for the video file 2001, for example displaying the interface shown in fig. 22 (a) or fig. 23 (a), which includes a sound enhancement control 2201. When it is detected that the user clicks the sound enhancement control 2201, an editing mode for the sound signals in the first video file is entered.
Optionally, S1902 may or may not be executed; this is not limited in the embodiments of the present application, which is why it is indicated by a dashed line in the figure.
S1903, determining a target sound signal of the N sound signals.
In the first way, N tags are displayed, the N tags being used to identify the N sound signals; in response to an operation for selecting a target tag, the target tag is determined, and the sound signal corresponding to the target tag is the target sound signal. After the electronic device detects the user clicking the sound enhancement control 2201 as in fig. 22 (a), the user can select the target tag as shown in fig. 22 (b).
In the second way, a subject shooting object in the video signal is determined, the subject shooting object being one or more objects or one or more object categories in the video signal; the target sound signal is then determined according to the subject shooting object, where the sound source of the target sound signal is the subject shooting object. After the electronic device detects the click on the sound enhancement control 2201 as in fig. 23 (a), the user can select a subject shooting object in the interface (e.g., through a circling operation), and the target sound signal is then determined according to the subject shooting object selected by the user, as shown in fig. 23 (b).
S1904, enhancing the target sound signal in the first video file and/or weakening other sound signals in the first video file to obtain a second video file, where the other sound signals are the sound signals other than the target sound signal among the N sound signals.
For example, continuing with fig. 22 (b): after detecting that the user has selected the sea wave sound tag, and upon detecting that the user clicks the completion control, the mobile phone enhances the sea wave sound of the video file and/or weakens the other sounds to obtain a new video file.
It should be noted that when recording a video with the electronic device, the user may not be aware that the environment is noisy, and may only realize it when opening the video file to watch after recording is completed. In that case, the user can use embodiment two to enhance the target sound signal in the locally stored video file and weaken the other sound signals, improving the effect of the recorded file.
In the embodiments provided in the present application, the methods are described from the perspective of the electronic device (for example, a mobile phone) as the execution subject. To implement the functions in the methods provided in the embodiments of the present application, the electronic device may include a hardware structure and/or software modules, implementing these functions in the form of a hardware structure, software modules, or a hardware structure plus software modules. Whether a given function is performed by a hardware structure, a software module, or a combination of the two depends on the specific application and design constraints of the technical solution.
Based on the same conception, fig. 24 shows an electronic device 2400 provided in the present application. The electronic device 2400 may be the mobile phone described above. As shown in fig. 24, the electronic device 2400 may include: one or more processors 2401; one or more memories 2402; a communication interface 2403; and one or more computer programs 2404, which may be connected through one or more communication buses 2405. The one or more computer programs 2404 are stored in the memory 2402 and configured to be executed by the one or more processors 2401; they include instructions that can be used to perform the steps related to the mobile phone in the corresponding embodiments above. The communication interface 2403 is used to communicate with other devices; for example, it may be a transceiver.
As used in the above embodiments, the term "when …" or "after …" may be interpreted to mean "if …" or "after …" or "in response to determination …" or "in response to detection …" depending on the context. Similarly, the phrase "at the time of determination …" or "if detected (a stated condition or event)" may be interpreted to mean "if determined …" or "in response to determination …" or "at the time of detection (a stated condition or event)" or "in response to detection (a stated condition or event)" depending on the context. In addition, in the above-described embodiments, relational terms such as first and second are used to distinguish one entity from another entity without limiting any actual relationship or order between the entities.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present invention, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, digital Subscriber Line (DSL)), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid State Disk (SSD)), etc. The schemes of the above embodiments may be used in combination without conflict.
It is noted that a portion of this patent document contains material which is subject to copyright protection. The copyright owner reserves all copyright rights, except for the making of copies of the patent document or of the patent document content of record at the patent office.

Claims (18)

1. A video file processing method, comprising:
determining a first video file to be processed, wherein the first video file comprises video signals and N sound signals, and N is an integer greater than or equal to 1;
and enhancing a target sound signal among the N sound signals and/or weakening the other sound signals among the N sound signals to obtain a second video file.
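A minimal sketch, offered only as an illustration and not as part of the claims: assuming the N sound signals have already been separated into per-source tracks (for example by an upstream source-separation step), the enhancement and weakening of claim 1 reduce to a per-track gain followed by a remix. The track names and gain values below are hypothetical.

import numpy as np

def remix_tracks(tracks: dict[str, np.ndarray], target: str,
                 target_gain: float = 2.0, other_gain: float = 0.3) -> np.ndarray:
    """Enhance the target sound signal and weaken the other N-1 signals."""
    mix = np.zeros_like(next(iter(tracks.values())), dtype=np.float64)
    for name, samples in tracks.items():
        gain = target_gain if name == target else other_gain
        mix += gain * samples.astype(np.float64)
    # Clip back into the valid range of 16-bit PCM before remuxing with video.
    return np.clip(mix, -32768, 32767).astype(np.int16)

# Hypothetical usage: three separated tracks, "speech" is the target.
tracks = {name: np.zeros(48000, dtype=np.int16) for name in ("speech", "music", "wind")}
new_audio = remix_tracks(tracks, target="speech")

The remixed audio would then be muxed with the unchanged video signal to produce the second video file; the same per-track gain would also serve the enhancement step recited in claim 15.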
2. The method of claim 1, wherein the target sound signal is determined in an edit mode for the first video file.
3. The method of claim 2, wherein determining the target sound signal comprises:
displaying N labels, wherein the N labels are used for identifying the N sound signals;
and determining a target label in response to an operation for selecting the target label, wherein a sound signal corresponding to the target label is the target sound signal.
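A minimal sketch of the label-selection step in claim 3, assuming the N sound signals are keyed by the label text shown on screen; the label names here are hypothetical.

def on_label_selected(labels_to_signals: dict[str, int], selected: str) -> int:
    """Resolve the selected label to the index of the target sound signal."""
    return labels_to_signals[selected]

# Hypothetical usage: three labels identify three separated sound signals.
labels_to_signals = {"Speech": 0, "Music": 1, "Wind": 2}
target_index = on_label_selected(labels_to_signals, "Speech")  # -> 0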
4. The method of claim 2, wherein determining the target sound signal comprises:
determining a subject shooting object in the video signal, the subject shooting object being one or more objects or one or more object categories in the video signal; and determining the target sound signal according to the subject shooting object, wherein the sound source of the target sound signal is the subject shooting object.
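A minimal sketch of claim 4, assuming an upstream detector supplies object categories for the video signal and a sound-event classifier supplies a category per separated sound signal; all category and track names are hypothetical.

def pick_target_sounds(subject_categories: set[str],
                       sound_events: dict[str, str]) -> list[str]:
    """Return the sound signals whose source category matches the subject."""
    return [signal for signal, category in sound_events.items()
            if category in subject_categories]

# Hypothetical usage: the subject shooting object is a person, so the
# sound signal classified as "person" (speech) is chosen as the target.
sound_events = {"track0": "person", "track1": "dog", "track2": "traffic"}
print(pick_target_sounds({"person"}, sound_events))  # ['track0']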
5. A video recording method applied to an electronic device, comprising:
synthesizing a video signal acquired by a camera with a first target sound signal in the environment to obtain a video file;
wherein the first target sound signal is one or more of N sound signals included in the environment, N being an integer greater than or equal to 1.
6. The method as recited in claim 5, further comprising:
displaying N labels on a display screen of the electronic equipment, wherein the N labels are used for identifying the N sound signals;
and determining a target label in response to an operation for selecting the target label, wherein the sound signal corresponding to the target label is the first target sound signal.
7. The method as recited in claim 5, further comprising:
determining a subject capture object in the video signal, the subject capture object being one or more objects in the video signal;
and determining a first target sound signal according to the subject shooting object, wherein the sound source of the first target sound signal is the subject shooting object.
8. The method of claim 7, wherein:
the subject shooting object is an object in the video signal specified by a user on a preview interface; or, the subject shooting object is an object of interest to the user in the video signal.
9. The method as recited in claim 5, further comprising:
detecting a second operation for indicating a first mode, the first mode being a mode for recording a specific sound signal;
and in response to the second operation, determining the specific sound signal as the first target sound signal.
10. The method of claim 6, wherein displaying N labels on the display screen comprises:
detecting a call-out operation by a user on the N labels;
and displaying the N labels on the display screen in response to the call-out operation.
11. The method of claim 6, wherein displaying N labels on the display screen comprises:
displaying a preview interface, wherein the preview interface comprises video signals collected by the camera;
determining positions of M shooting objects in the preview interface;
displaying M labels of the N labels at the positions of the M shooting objects in the preview interface, wherein the M shooting objects are the sound sources of the M sound signals corresponding to the M labels; or displaying the N-M labels of the N labels other than the M labels at other positions in the preview interface;
wherein M is an integer greater than or equal to 1 and less than or equal to N.
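A minimal sketch of the label layout in claim 11, assuming detection bounding boxes are available for the M shooting objects that act as sound sources; the box coordinates and screen size are hypothetical.

def layout_labels(boxes: dict[str, tuple[int, int, int, int]],
                  all_labels: list[str],
                  screen_h: int) -> dict[str, tuple[int, int]]:
    """Anchor M labels at their source's box centre; park the N-M rest below."""
    positions = {}
    for label, (x, y, w, h) in boxes.items():
        positions[label] = (x + w // 2, y + h // 2)       # on the shooting object
    unplaced = [lbl for lbl in all_labels if lbl not in boxes]
    for i, label in enumerate(unplaced):                  # the N-M leftover labels
        positions[label] = (40 + i * 120, screen_h - 60)  # bottom strip
    return positions

# Hypothetical usage: two on-screen sources plus one off-screen sound ("Wind").
boxes = {"Speech": (200, 300, 180, 240), "Dog": (700, 500, 150, 150)}
print(layout_labels(boxes, ["Speech", "Dog", "Wind"], screen_h=1920))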
12. The method according to any one of claims 6-11, wherein the method further comprises:
after the first target sound signal is determined, automatically starting video recording after waiting a preset duration; or,
after the first target sound signal is determined, starting video recording when an operation for instructing to start recording is detected.
13. The method according to any one of claims 6-12, wherein synthesizing the video signal acquired by the camera with the first target sound signal in the environment comprises:
synthesizing a video signal acquired by the camera in a first time period and the first target sound signal acquired by a microphone in the first time period into a first video clip, wherein the first time period is the time period after the first target sound signal is determined; the method further comprising:
before stopping video recording, switching the first target sound signal to a second target sound signal according to a target sound signal switching operation;
synthesizing a video signal acquired by the camera in a second time period and the second target sound signal acquired by the microphone in the second time period into a second video clip, wherein the second time period is the time period after switching to the second target sound signal;
and, upon detecting a video recording stopping instruction, synthesizing the first video clip and the second video clip into a video file.
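A minimal sketch of the segment concatenation in claim 13, modelling each clip as a pair of frame and sample lists; real muxing would go through a platform codec API, which is elided here as an assumption.

from dataclasses import dataclass, field

@dataclass
class Clip:
    video_frames: list = field(default_factory=list)
    audio_samples: list = field(default_factory=list)

def synthesize_recording(clips: list[Clip]) -> Clip:
    """Concatenate the per-period clips into a single recording."""
    out = Clip()
    for clip in clips:
        out.video_frames.extend(clip.video_frames)
        out.audio_samples.extend(clip.audio_samples)
    return out

# Hypothetical usage: the first period was synced to a "speech" target,
# the second to a "music" target after the switching operation.
first = Clip(video_frames=[0, 1], audio_samples=[0.0, 0.1])
second = Clip(video_frames=[2, 3], audio_samples=[0.2, 0.3])
recording = synthesize_recording([first, second])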
14. The method according to any one of claims 5-13, further comprising:
in response to a recording stopping instruction, storing a first video file and a second video file;
wherein the first video file is synthesized from the video signal collected by the camera and the N sound signals in the environment, and the second video file is synthesized from the video signal collected by the camera and the first target sound signal.
15. The method according to any one of claims 5-14, wherein synthesizing the video signal captured by the camera with the first target sound signal in the environment comprises:
enhancing the first target sound signal and/or weakening other sound signals, the other sound signals being the sound signals of the N sound signals other than the first target sound signal;
and synthesizing the video signal acquired by the camera with the enhanced first target sound signal and the weakened other sound signals.
16. The method according to any one of claims 5-15, wherein the video signal includes at least one shooting object, and wherein the at least one shooting object does not include an object corresponding to the first target sound signal.
17. An electronic device, comprising:
a processor, a memory, and one or more programs;
wherein the one or more programs are stored in the memory, the one or more programs comprising instructions, which when executed by the processor, cause the electronic device to perform the method steps of any of claims 1-16.
18. A computer readable storage medium for storing a computer program which, when run on a computer, causes the computer to perform the method of any one of claims 1 to 16.
CN202310274620.3A 2021-05-20 2021-05-20 Video recording method and electronic equipment Pending CN116233348A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310274620.3A CN116233348A (en) 2021-05-20 2021-05-20 Video recording method and electronic equipment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110553288.5A CN113473057B (en) 2021-05-20 2021-05-20 Video recording method and electronic equipment
CN202310274620.3A CN116233348A (en) 2021-05-20 2021-05-20 Video recording method and electronic equipment

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202110553288.5A Division CN113473057B (en) 2021-05-20 2021-05-20 Video recording method and electronic equipment

Publications (1)

Publication Number Publication Date
CN116233348A true CN116233348A (en) 2023-06-06

Family

ID=77871089

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202310274620.3A Pending CN116233348A (en) 2021-05-20 2021-05-20 Video recording method and electronic equipment
CN202110553288.5A Active CN113473057B (en) 2021-05-20 2021-05-20 Video recording method and electronic equipment

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202110553288.5A Active CN113473057B (en) 2021-05-20 2021-05-20 Video recording method and electronic equipment

Country Status (1)

Country Link
CN (2) CN116233348A (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104599674A (en) * 2014-12-30 2015-05-06 西安乾易企业管理咨询有限公司 System and method for directional recording in camera shooting
CN107197187A (en) * 2017-05-27 2017-09-22 维沃移动通信有限公司 The image pickup method and mobile terminal of a kind of video
CN108566519B (en) * 2018-04-28 2022-04-12 腾讯科技(深圳)有限公司 Video production method, device, terminal and storage medium
CN109040641B (en) * 2018-08-30 2020-10-16 维沃移动通信有限公司 Video data synthesis method and device
CN110505403A (en) * 2019-08-20 2019-11-26 维沃移动通信有限公司 A kind of video record processing method and device
CN110740259B (en) * 2019-10-21 2021-06-25 维沃移动通信有限公司 Video processing method and electronic equipment
CN111669636B (en) * 2020-06-19 2022-02-25 海信视像科技股份有限公司 Audio-video synchronous video recording method and display equipment
CN112637529B (en) * 2020-12-18 2023-06-02 Oppo广东移动通信有限公司 Video processing method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN113473057A (en) 2021-10-01
CN113473057B (en) 2023-03-03

Similar Documents

Publication Publication Date Title
CN114467297B (en) Video call display method and related device applied to electronic equipment
CN110225244B (en) Image shooting method and electronic equipment
US20240205535A1 (en) Photographing method and electronic device
CN111061912A (en) Method for processing video file and electronic equipment
CN114390139B (en) Method for presenting video by electronic equipment in incoming call, electronic equipment and storage medium
CN113170037B (en) Method for shooting long exposure image and electronic equipment
CN110059211B (en) Method and related device for recording emotion of user
CN114115770B (en) Display control method and related device
CN114449110B (en) Control method and device of electronic equipment
CN114697543B (en) Image reconstruction method, related device and system
CN113467735A (en) Image adjusting method, electronic device and storage medium
CN112532508B (en) Video communication method and video communication device
CN114449333A (en) Video note generation method and electronic equipment
CN116055859B (en) Image processing method and electronic device
CN115437601B (en) Image ordering method, electronic device, program product and medium
CN113473057B (en) Video recording method and electronic equipment
CN113572798B (en) Device control method, system, device, and storage medium
CN111885768A (en) Method, electronic device and system for adjusting light source
CN116709018B (en) Zoom bar segmentation method and electronic equipment
CN115640414B (en) Image display method and electronic device
CN113938556B (en) Incoming call prompting method and device and electronic equipment
CN113472996B (en) Picture transmission method and device
WO2022228010A1 (en) Method for generating cover, and electronic device
CN114115772B (en) Method and device for off-screen display
WO2023221895A1 (en) Target information processing method and apparatus, and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination