WO2022000174A1

WO2022000174A1 - Audio processing method, audio processing apparatus, and electronic device

Info

Publication number: WO2022000174A1
Application number: PCT/CN2020/098886
Authority: WO
Inventors: 莫品西; 边云锋; 刘洋
Original assignee: 深圳市大疆创新科技有限公司
Priority date: 2020-06-29
Filing date: 2020-06-29
Publication date: 2022-01-06
Also published as: CN113767432A

Abstract

An audio processing method. The audio processing method comprises: acquiring an audio signal to be processed, wherein said audio signal comprises audio components having different frequencies (S101); determining a sound source direction corresponding to each audio component (S102); adjusting the gains of the audio components on the basis of the degree of matching between the sound source direction and a target direction (S103); and synthesizing a target audio signal on the basis of the gain-adjusted audio components (S104). By means of the audio processing method, directional sound pickup in any direction is realized.

Description

Audio processing method, audio processing apparatus, and electronic device

Technical Field

The present application relates to the technical field of audio processing, and in particular to an audio processing method, an audio processing apparatus, an electronic device, and a computer-readable storage medium.

Background

Directional sound pickup is a technique that picks up only sound arriving from a specified direction. It is widely used in industries such as professional recording and film and television. With the rise of multimedia applications such as self-media and vlogs, demand for directional sound pickup among ordinary consumers has also grown.

Summary of the Invention

In view of this, the present application provides an audio processing method, an audio processing apparatus, an electronic device, and a computer-readable storage medium, so as to realize directional sound pickup in any direction.

A first aspect of the present application provides an audio processing method, including:

acquiring an audio signal to be processed, wherein the audio signal to be processed includes audio components of different frequencies;

determining the sound source direction corresponding to each of the audio components;

adjusting the gain of the audio components based on the degree of matching between the sound source direction and a target direction;

synthesizing a target audio signal based on the gain-adjusted audio components.

A second aspect of the present application provides an audio processing apparatus, including a processor and a memory storing a computer program;

the processor implements the following steps when executing the computer program:

A third aspect of the present application provides an electronic device, including a processor and a memory storing a computer program;

adjusting the gain of the audio components based on the degree of matching between the sound source direction and the target direction;

A fourth aspect of the present application provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the audio processing method of the first aspect is implemented.

The audio processing method provided by the embodiments of the present application considers the audio components of different frequencies contained in the audio signal to be processed. For each audio component, the sound source direction is determined separately, and the gain of the component can be adjusted according to the degree of matching between its sound source direction and the target direction. As a result, sound from the target direction is more prominent in the synthesized target audio signal, which realizes directional sound pickup. Because gain can be adjusted separately for audio components of different frequencies, the directivity at different frequencies is flexibly controllable. In addition, the target direction can be set freely as required, so directional sound pickup in any direction can be achieved.

Description of Drawings

To describe the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are merely some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of an audio processing method provided by an embodiment of the present application.

FIG. 2 is an algorithm block diagram of an audio processing method provided by an embodiment of the present application.

FIG. 3 is a schematic structural diagram of an audio processing apparatus provided by an embodiment of the present application.

FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present application.

Directional sound pickup means picking up sound from a specified direction. With the rise of multimedia applications such as self-media and vlogs, demand for directional sound pickup is gradually increasing among ordinary consumers.

There are two main ways to achieve directional sound pickup. One is to obtain directivity through physical design, as in a shotgun microphone. This approach usually involves a relatively complex acoustic structure, tends to require a certain physical volume, and is not portable. The present application does not discuss it further.

The other is to achieve directional sound pickup algorithmically, for example with a beamforming algorithm based on a microphone array. Such an algorithm can perform directional pickup in any direction of interest. In principle, once the direction of interest has been determined, the algorithm adjusts the phase and/or amplitude of the audio signal captured by each microphone in the array so that every captured signal is enhanced toward the direction of interest. The adjusted signals are then weighted and combined into the final audio signal, which realizes directional pickup.

The applicant has found that although the beamforming algorithm described above can perform directional pickup in any direction, its directional performance depends on the size of the microphone array and the placement of the microphones. Strong directivity requires a relatively large microphone array, and an unsuitable microphone placement can leave the directivity in some frequency bands short of requirements. Moreover, the directivity of such beamforming algorithms varies in strength across frequencies: directivity is usually good only for high-frequency signals, while low-frequency signals have almost none.

In view of these shortcomings of beamforming algorithms, the embodiments of the present application propose an audio processing method that realizes directional sound pickup while allowing the directivity at each frequency to be set flexibly as needed. The method needs only a small microphone array, or as few as two microphones, to meet the demand for strong directivity. Refer to FIG. 1, which is a flowchart of an audio processing method provided by an embodiment of the present application.

The audio processing method can be applied to various electronic devices with a sound pickup function, including but not limited to mobile phones, still cameras, video cameras, action cameras, gimbal cameras, voice recorders, microphones, wearable electronic devices, smart speakers, smart home appliances, surveillance devices, and intelligent robots. The method can also be applied to an audio processing apparatus with processing capability, which can be used to post-process audio signals captured by other devices.

The method provided by the embodiments of the present application includes the following steps:

Step 101: acquire an audio signal to be processed.

The audio signal to be processed includes audio components of different frequencies.

Step 102: determine the sound source direction corresponding to each audio component.

Step 103: adjust the gain of the audio components based on the degree of matching between the sound source direction and the target direction.

Step 104: synthesize a target audio signal based on the gain-adjusted audio components.

In a sound field, sound can be captured by at least two microphones. The audio signal to be processed may be the audio signal captured by any one microphone in the sound field, or an audio signal synthesized from the audio signals captured by several microphones in the sound field.

The audio signal to be processed includes audio components of different frequencies. There are several ways to determine which audio components the signal contains, that is, to determine its frequency composition. In one implementation, a Fourier transform is applied to the audio signal to be processed, converting it from the time domain to the frequency domain and thereby identifying the audio components of different frequencies it contains. In other implementations, filtering or sub-band analysis can be used in place of the Fourier transform; these alternatives can likewise determine the audio components contained in the signal.
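As an illustration of the Fourier-transform option, the following is a minimal sketch using only the Python standard library. The names `dft` and `bin_frequencies`, the 16-sample frame, and the 16 kHz sample rate are assumptions made for the example, not taken from the application; a practical implementation would normally use an FFT routine rather than this naive transform.

```python
import cmath
import math

def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def bin_frequencies(n, sample_rate_hz):
    """Centre frequency in Hz of bins 0..n//2 (the non-redundant half)."""
    return [k * sample_rate_hz / n for k in range(n // 2 + 1)]

# Illustrative frame: 16 samples holding exactly two cycles of a cosine.
frame = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
spectrum = dft(frame)
magnitudes = [abs(x) for x in spectrum[: 16 // 2 + 1]]
# Index of the frequency bin that carries the most energy.
strongest_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
```

Each entry of `spectrum` plays the role of one "audio component": its magnitude and phase describe the content of the frame at the corresponding bin frequency.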

For the audio component at each frequency, the corresponding sound source direction can be determined using a sound source localization algorithm. Many such algorithms are available, for example beamforming, time-difference-of-arrival estimation, and differential microphone array algorithms. With any of these algorithms, the audio signals captured by at least two microphones in the sound field can be used to calculate the sound source direction corresponding to the audio component at each frequency. In one implementation, the sound source direction is expressed as an azimuth angle and/or a pitch angle.
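One simple way to picture a time-difference-of-arrival estimate for a single frequency component is sketched below. It assumes a far-field plane wave, exactly two microphones, and a spacing below half a wavelength so that the phase difference is unambiguous. The function name, the 343 m/s speed of sound, the 2 cm spacing, and the sign convention are all illustrative assumptions, not details from the application.

```python
import cmath
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature

def estimate_angle_deg(bin_mic1, bin_mic2, freq_hz, spacing_m):
    """Estimate the arrival angle of one frequency component.

    bin_mic1 / bin_mic2 are the complex spectrum values of the same
    frequency bin at two microphones; freq_hz must be greater than 0.
    The angle is measured from the broadside of the pair; positive
    angles mean the wave reaches microphone 1 first.
    """
    # Phase by which microphone 2 lags microphone 1 at this frequency.
    phase_lag = cmath.phase(bin_mic1 * bin_mic2.conjugate())
    delay_s = phase_lag / (2 * math.pi * freq_hz)
    # Far-field model: delay = spacing * sin(angle) / speed of sound.
    sin_angle = delay_s * SPEED_OF_SOUND_M_S / spacing_m
    sin_angle = max(-1.0, min(1.0, sin_angle))  # guard against noise
    return math.degrees(math.asin(sin_angle))

# Synthetic check: a 1 kHz component arriving from 30 degrees.
freq_hz, spacing_m = 1000.0, 0.02
true_delay_s = spacing_m * math.sin(math.radians(30.0)) / SPEED_OF_SOUND_M_S
mic1 = 1.0 + 0.0j
mic2 = mic1 * cmath.exp(-2j * math.pi * freq_hz * true_delay_s)
estimated = estimate_angle_deg(mic1, mic2, freq_hz, spacing_m)
```

A single pair of microphones cannot distinguish front from back, which is one reason an arrangement with more microphones may be preferred when a full azimuth is needed.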

Because the audio signals captured by the at least two microphones mentioned above are mainly used to calculate the sound source direction, these microphones may be called direction-finding microphones. The audio signal to be processed may be obtained from one or more of the audio signals captured by the direction-finding microphones. For example, in one implementation, the audio signal to be processed is the one with the highest signal-to-noise ratio among the signals captured by the direction-finding microphones. As another example, the audio signal to be processed may be synthesized from the audio signals captured by the direction-finding microphones. In yet another implementation, the audio signal to be processed may be obtained from audio signals captured by microphones other than the direction-finding microphones.

An example may help. Suppose a microphone array includes six microphones, three of which are selected as direction-finding microphones. The audio signal to be processed may then be obtained from the signals captured by those three direction-finding microphones, or from the signals captured by the other three microphones. In a further example, the audio signal to be processed may be determined from audio signals captured by microphones outside the array, and those microphones may even belong to other devices.

Note that direction is a relative concept, so expressing a direction usually starts with choosing a reference. The reference may be a reference direction, a reference coordinate system, and so on. A direction can be expressed in many concrete forms. In practical engineering applications, a direction may correspond to an angle; to a range into which an angle falls (such as east, south, west, north; front, back, left, right; or an interval); to a vector; or to a coordinate (the direction being determined from that coordinate and the coordinate of a reference point). Other representations are also possible and are not listed here.

The target direction may be the direction the user is interested in. In one implementation, it is a direction set by the user. For example, the user can interact with an electronic device that applies the method provided by the present application and set the target direction by entering direction information. In another implementation, the electronic device has a camera whose pose can change. The electronic device obtains the pose information of the camera, determines the orientation of the camera from it, and sets the target direction to match that orientation.

A camera whose pose can change may be implemented in several ways. For example, the electronic device may be equipped with a gimbal on which the camera is mounted, so that the camera can rotate in all directions under the control of the gimbal. As another example, the camera may be mounted on a slide rail and driven along it by a motor. Other implementations are possible as well; any camera that can move relative to the body of the device counts as a camera whose pose can change in the sense of the present application.

After the sound source direction corresponding to each audio component has been determined, the gain of each audio component can be adjusted according to the degree of matching between its sound source direction and the target direction. In one implementation, the gain of audio components whose sound source direction closely matches the target direction is adjusted, for example increased. In another implementation, the gain of poorly matching audio components is adjusted, for example reduced. It is also possible to increase the gain of closely matching components and reduce the gain of poorly matching components at the same time.

For certain special requirements, the user may instead wish to attenuate sound from the target direction. In that case, in one implementation, the gain of audio components whose sound source direction closely matches the target direction may be reduced, or the gain of poorly matching components may be increased, or both may be done at the same time.

The degree of matching between the sound source direction and the target direction characterizes the difference between the two. In one implementation, the degree of matching is determined from the difference between the sound source direction and the target direction. For example, a difference threshold can be set; when the difference between the sound source direction and the target direction is smaller than the threshold, the two are considered to match closely. In another implementation, the difference is expressed in other ways, for example as a level: if the sound source direction falls into a third interval, the target direction lies in a first interval, and a second interval separates the first and third intervals, the difference between the sound source direction and the target direction can be determined to be level two. Many other forms of expression are possible and are not listed here.
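The two ways of expressing the degree of matching described above might look as follows when directions are given as angles in degrees. This is only a sketch: the function names, the 30-degree threshold, and the 30-degree sector width are assumed values chosen for illustration.

```python
def angular_difference_deg(source_deg, target_deg):
    """Smallest absolute difference between two angles, in [0, 180]."""
    return abs((source_deg - target_deg + 180.0) % 360.0 - 180.0)

def is_high_match(source_deg, target_deg, threshold_deg=30.0):
    """Threshold rule: high match when the difference is below the threshold."""
    return angular_difference_deg(source_deg, target_deg) < threshold_deg

def level_difference(source_deg, target_deg, sector_width_deg=30.0):
    """Level rule: number of sectors separating source and target."""
    sectors = int(round(360.0 / sector_width_deg))
    src = int((source_deg % 360.0) // sector_width_deg)
    tgt = int((target_deg % 360.0) // sector_width_deg)
    d = abs(src - tgt)
    # Sectors wrap around the circle, so take the shorter way round.
    return min(d, sectors - d)
```

With these assumed settings, a source at 70 degrees and a target at 10 degrees fall in the third and first sectors respectively, giving a level-two difference, which mirrors the interval example in the text.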

After the gains of the audio components at different frequencies have been adjusted, the target audio signal can be synthesized from the gain-adjusted components. Synthesizing the target audio signal can be regarded as a transformation from the frequency domain back to the time domain. It can be implemented in several ways, for example with an inverse Fourier transform, although other approaches are possible.
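The gain adjustment and inverse transform can be sketched together as below. This is a minimal standard-library illustration under assumed names (`dft`, `idft`, `apply_gains`) and an assumed 16-sample frame; the gain values are hard-coded here, whereas in the method they would come from the degree of matching of each component.

```python
import cmath
import math

def dft(x):
    """Naive forward discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]

def apply_gains(frame, gains):
    """Scale each frequency component of a real frame and resynthesize.

    gains[k] is the gain for bin k, k = 0..len(frame)//2. The mirrored
    negative-frequency bin gets the same gain so the output stays real.
    """
    n = len(frame)
    spectrum = dft(frame)
    for k in range(n):
        spectrum[k] *= gains[k if k <= n // 2 else n - k]
    return [value.real for value in idft(spectrum)]

n = 16
wanted = [math.cos(2 * math.pi * 2 * t / n) for t in range(n)]
unwanted = [math.cos(2 * math.pi * 5 * t / n) for t in range(n)]
frame = [a + b for a, b in zip(wanted, unwanted)]
gains = [1.0] * (n // 2 + 1)
gains[5] = 0.0  # pretend bin 5 came from a non-target direction
output = apply_gains(frame, gains)
```

In this toy case the component in bin 5 is removed entirely and `output` reduces to the `wanted` cosine; a real system would more likely attenuate than zero a component.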

The audio processing method provided by the embodiments of the present application considers the audio components of different frequencies contained in the audio signal to be processed. For each audio component, the sound source direction is determined separately, and the gain of the component can be adjusted according to the degree of matching between its sound source direction and the target direction. As a result, sound from the target direction is more prominent in the synthesized target audio signal, which realizes directional sound pickup. Because gain can be adjusted separately for audio components of different frequencies, the directivity at different frequencies is flexibly controllable. In addition, the target direction can be set freely as required, so directional sound pickup in any direction can be achieved. Compared with directional pickup based on a beamforming algorithm, the method provided by the embodiments of the present application needs only a small microphone array or a small number of microphones (two or more) to meet the demand for strong directivity.

When adjusting the gain of an audio component, a gain coefficient for the component can be determined from the degree of matching between its sound source direction and the target direction, and the gain is then adjusted using that coefficient. In one implementation, the gain coefficient is determined from a preset correspondence. The preset correspondence may map the degree of matching to a gain coefficient, so once the degree of matching of an audio component is known, the corresponding gain coefficient can be looked up through the preset correspondence.

The preset correspondence can be adjusted flexibly as required when it is defined. For example, it can be defined so that the higher the degree of matching, the larger the gain coefficient; that is, the gain coefficient is positively correlated with the degree of matching.

The gain coefficient should also not change too sharply with the degree of matching. Suppose the gain coefficient is high while the degree of matching stays within a certain range but drops rapidly as soon as the degree of matching falls slightly outside it. The synthesized target audio signal would then be highly directional, but sound from non-target directions would be attenuated excessively, making the audio as a whole harsh and unnatural. The step by which the gain coefficient changes with the degree of matching can therefore be kept less than or equal to a specified amount. In other words, when the degree of matching changes by one unit, the corresponding change in the gain coefficient is less than or equal to the specified amount. The gain coefficient then varies relatively smoothly, and the synthesized target audio signal sounds more natural.

In one implementation, the preset correspondence may instead map the sound source direction to a gain coefficient. That is, when determining the gain coefficient of an audio component, the coefficient corresponding to the component's sound source direction is looked up in the preset correspondence. In this implementation, the gain coefficients in the preset correspondence need to be set according to how well each sound source direction matches the target direction. For example, if the target direction is 12 o'clock, the preset correspondence may assign a gain coefficient of 1 to the 12 o'clock sound source direction, 0.8 to the 11 o'clock direction, 0.5 to the 10 o'clock direction, and so on.

In this implementation, the two variables of the preset correspondence are the sound source direction and the gain coefficient, and the degree of matching between the sound source direction and the target direction does not appear explicitly. Nevertheless, the value of the gain coefficient assigned to each sound source direction is consistent with how well that direction matches the target direction.

The preset correspondence can also take several forms. One of them is a function, which can be defined freely as required and which describes how the gain coefficient varies with the sound source direction. In one configuration, the gain coefficient varies continuously and smoothly as the sound source direction changes.
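A raised-cosine curve is one possible function with the continuous, smooth behaviour described above. The sketch below is an assumed example, not the function used in the application; the name `directional_gain`, the `floor` value of 0.2, and the one-degree step used to measure the largest change are all illustrative choices.

```python
import math

def directional_gain(source_deg, target_deg, floor=0.2):
    """Raised-cosine gain: 1.0 on target, `floor` in the opposite direction."""
    diff = abs((source_deg - target_deg + 180.0) % 360.0 - 180.0)
    # shape is 1 when the directions coincide and 0 when they are opposite.
    shape = 0.5 * (1.0 + math.cos(math.radians(diff)))
    return floor + (1.0 - floor) * shape

# Sample the curve in one-degree steps for a target at 0 degrees.
gains = [directional_gain(angle, 0.0) for angle in range(0, 181)]
# Largest change of the gain coefficient per one-degree change of direction.
largest_step = max(abs(a - b) for a, b in zip(gains, gains[1:]))
```

Measuring `largest_step` in this way is one means of checking that the change of the gain coefficient per unit change of direction stays within a specified amount; raising `floor` flattens the curve and weakens the directivity.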

In real-world scenarios, people perceive the direction of low-frequency and high-frequency sounds differently. Low-frequency sound diffracts strongly, so the human ear is not very sensitive to the direction it comes from; its sense of direction is not pronounced. For high-frequency sound, by contrast, the ear perceives direction quite acutely. To make the synthesized target audio signal sound closer to what would be heard in a real scene, the gain adjustment of an audio component can therefore take into account not only the degree of matching between the component's sound source direction and the target direction, but also the component's frequency. That is, the gain of an audio component can be adjusted according to both its directional degree of matching and its frequency.

In one implementation, the preset correspondence may map two parameters, sound source direction and frequency, to the gain coefficient. In such a correspondence, the gain coefficient of an audio component is uniquely determined only when both its frequency and its sound source direction are known. When defining this correspondence, the directivity for the low-frequency part can be set weaker and the directivity for the high-frequency part stronger. For example, for the same sound source direction, the gain coefficient for low frequencies can be set smaller than the gain coefficient for high frequencies. A target audio signal synthesized with this correspondence better matches what the human ear actually perceives.
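The following sketch shows one possible reading of a correspondence that depends on both sound source direction and frequency. It assumes a boost-style scheme in which the gain rises above 1 for well-matched directions and the amount of boost grows with frequency. The function names, the 200 Hz and 4000 Hz corner frequencies, the logarithmic ramp, and `max_boost` are all hypothetical.

```python
import math

def match_degree(source_deg, target_deg):
    """1.0 when the directions coincide, 0.0 when they are opposite."""
    diff = abs((source_deg - target_deg + 180.0) % 360.0 - 180.0)
    return 0.5 * (1.0 + math.cos(math.radians(diff)))

def directivity_strength(freq_hz, low_hz=200.0, high_hz=4000.0):
    """Ramp from 0 (at or below low_hz) to 1 (at or above high_hz)."""
    if freq_hz <= low_hz:
        return 0.0
    if freq_hz >= high_hz:
        return 1.0
    # Interpolate on a logarithmic frequency axis between the corners.
    return math.log(freq_hz / low_hz) / math.log(high_hz / low_hz)

def gain(source_deg, freq_hz, target_deg, max_boost=1.0):
    """Gain coefficient as a function of both direction and frequency."""
    strength = directivity_strength(freq_hz)
    return 1.0 + max_boost * strength * match_degree(source_deg, target_deg)
```

Under these assumptions, a component from the target direction is left unchanged at 100 Hz and doubled at 8 kHz, so for the same sound source direction the low-frequency gain coefficient is smaller than the high-frequency one.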

In one implementation, the audio signal to be processed is one audio frame of an original audio signal; that is, it is obtained by dividing the original audio signal into frames. This audio frame contains a preset number of sampling points and may be called the first audio frame. Correspondingly, the synthesized target audio signal is also an audio frame, which corresponds to the first audio frame and may be called the second audio frame.

The original audio signal is divided into frames because the algorithm that transforms a signal from the time domain to the frequency domain requires its input to be stationary, and a signal can be regarded as stationary within the duration of one frame. The original audio signal can therefore be divided into frames of a set frame length to obtain multiple audio frames, and the audio signal to be processed may be any one of them.

In one implementation, the number of sampling points in the first audio frame is a power of 2, so that the fast Fourier transform (FFT) can be used to speed up the analysis of the audio components contained in the first audio frame (the audio signal to be processed).

Audio frames obtained by framing are usually non-periodic signals, and performing spectral analysis on them directly tends to cause spectral leakage. Therefore, in one embodiment, the first audio frame may be modulated into a periodic signal before its spectrum is analyzed. Specifically, this may be done by applying an analysis window to the first audio frame, that is, multiplying the first audio frame by the window function of the analysis window. The window function may be a sine window, a Hanning window, or the like.
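The windowing step described above can be sketched as follows. This is a minimal illustration in Python rather than the implementation of this application: the helper names `sine_window`, `hann_window` and `apply_window` are hypothetical, and the periodic form of the Hanning window is an assumed choice.

```python
import math


def sine_window(n):
    """Sine window of length n (assumed form: sin(pi * (i + 0.5) / n))."""
    return [math.sin(math.pi * (i + 0.5) / n) for i in range(n)]


def hann_window(n):
    """Periodic Hanning window of length n (assumed form)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def apply_window(frame, window):
    """Multiply a frame by a window function, sample by sample."""
    if len(frame) != len(window):
        raise ValueError("frame and window must have the same length")
    return [x * w for x, w in zip(frame, window)]


# A constant frame is tapered toward zero at its start by the Hanning window.
windowed = apply_window([1.0] * 8, hann_window(8))
```

Both windows taper the frame edges, which is what reduces the leakage caused by the abrupt frame boundaries.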

In the embodiments above, the audio signal to be processed is one audio frame of the original audio signal (the first audio frame), and the synthesized target audio signal is correspondingly also just one audio frame (the second audio frame). Because the frame shift (the number of sampling points between two adjacent frames) is always smaller than the frame length (the number of sampling points in one frame) when the original audio signal is framed, adjacent audio frames share overlapping sampling points. Accordingly, after each first audio frame has been processed and the corresponding second audio frame synthesized, the second audio frames can be processed by the overlap-add method, accumulating the sampling points of each second audio frame that overlap with the previous audio frame.

Further, if consecutive audio frames are accumulated directly, the overlapping parts may show abrupt amplitude changes. To keep the finally restored, complete audio signal smooth, the amplitude distortion at both ends of the second audio frame can be removed before accumulation. Specifically, this may be done by applying a synthesis window to the second audio frame. There are likewise several choices for the window function of the synthesis window, such as a sine window or a Hanning window.
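The accumulation of overlapping sampling points can be sketched as below. The function name `overlap_add` and its parameters are hypothetical, and the frames passed in are assumed to have already been multiplied by a synthesis window.

```python
def overlap_add(frames, hop):
    """Accumulate equally long frames whose starts are hop samples apart."""
    if not frames:
        return []
    frame_len = len(frames[0])
    output = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for index, frame in enumerate(frames):
        start = index * hop
        for offset, value in enumerate(frame):
            # Samples shared with the previous frame are summed here.
            output[start + offset] += value
    return output


# Three frames of length 4 with a frame shift of 2: the middle samples
# each receive a contribution from two frames.
combined = overlap_add([[1.0] * 4, [1.0] * 4, [1.0] * 4], hop=2)
```

With unwindowed frames the overlapped region doubles in amplitude, which illustrates why a tapering synthesis window is applied before accumulation.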

It should be noted that, in one scenario, there may be multiple channels of audio signals to be processed. Each channel can be processed by the audio processing method provided in the embodiments of the present application, and the target directions of the directional processing applied to the different channels may be the same or different. For example, there may be two channels of audio signals to be processed, one used for directional pickup toward the front and the other for directional pickup toward the rear.

A more detailed embodiment is given below with reference to FIG. 2, which is an algorithm block diagram of an audio processing method provided by an embodiment of the present application.

In one scenario, the sound of a single sound field can be captured by a microphone array. For example, the array may contain $M$ microphones, $M \ge 2$, and the time-domain audio signal captured by the $m$-th microphone can be written as $x_m(t)$, where $m = 1, 2, \ldots, M$ is the microphone index and $t = 1, 2, \ldots$ is the discrete sampling time. The original audio signal (the audio signal to be processed being one audio frame of it) can be written as $s_i(t)$, where $i$ denotes the $i$-th channel of original audio signal.

With $L$ as the frame shift and $N$ as the frame length, the original audio signal $s_i(t)$ and the audio signal $x_m(t)$ captured by each microphone can be divided into frames, giving the first audio frame $s_i(n)_l$ of the original audio signal and the audio frame $x_m(n)_l$ of each microphone signal. Here $n = 1, 2, \ldots, N$ is the time index within a frame and $l = 1, 2, \ldots$ is the frame index.
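The framing step can be sketched as below, with `frame_len` playing the role of N and `hop` the role of L. The function name `split_into_frames` is hypothetical, and dropping a trailing partial frame is an assumed choice that the text does not specify.

```python
def split_into_frames(signal, frame_len, hop):
    """Split a sequence into overlapping frames of frame_len samples."""
    if frame_len <= 0 or frame_len & (frame_len - 1) != 0:
        raise ValueError("frame_len is assumed to be a power of 2")
    if not 0 < hop < frame_len:
        raise ValueError("hop must be positive and smaller than frame_len")
    frames = []
    start = 0
    # A trailing partial frame is dropped (assumption for this sketch).
    while start + frame_len <= len(signal):
        frames.append(list(signal[start:start + frame_len]))
        start += hop
    return frames


# 16 samples, frame length 8, frame shift 4: frames start at 0, 4 and 8.
frames = split_into_frames(list(range(16)), frame_len=8, hop=4)
```

Because the frame shift is smaller than the frame length, the tail of each frame reappears at the head of the next one.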

An analysis window is applied to each microphone audio frame $x_m(n)_l$ and to the first audio frame $s_i(n)_l$, giving $x'_m(n)_l$ and $s'_i(n)_l$. The windowed frames are each fed to the FFT module, which yields the spectra $X_m(k)_l$ and $S_i(k)_l$ of the time-domain frames $x'_m(n)_l$ and $s'_i(n)_l$ respectively, where $k = 1, 2, \ldots, N$ is the discrete frequency index.
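For illustration, the transform from a windowed frame to its spectrum can be sketched with a small recursive radix-2 FFT. This is a textbook formulation written with the standard library only, not the FFT module of FIG. 2; note that the list indices here run from 0 to N-1, whereas the text counts k from 1.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; the input length must be a power of 2."""
    n = len(x)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length must be a power of 2")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


# One cosine cycle over 8 samples concentrates its energy in bins 1 and 7.
frame = [math.cos(2.0 * math.pi * n / 8) for n in range(8)]
spectrum = fft(frame)
```

Each entry of `spectrum` corresponds to one audio component of a different frequency in the sense used above.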

The spectra $X_m(k)_l$ of the individual microphones are input to the sound source localization module (shown in FIG. 2 as $X_1(k)_l, \ldots, X_{M-1}(k)_l, X_M(k)_l$). In this module, a microphone-array sound source localization algorithm determines the sound source direction of the audio components of different frequencies in the sound field. The sound source direction of the audio component with frequency index $k$ can be expressed by a pitch angle $\Psi(k)$ and an azimuth angle $\theta(k)$.
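As one possible illustration of per-frequency localization, the sketch below estimates an arrival angle for a single frequency bin from the phase difference between two microphone spectra, in the spirit of a time-difference-of-arrival estimate. It is not the localization algorithm of this application. It assumes a far-field source, exactly two microphones, a microphone spacing below half a wavelength, and one dominant source per bin; the function name, the parameters and the speed-of-sound constant are all hypothetical choices.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # metres per second, assumed room-temperature value


def bin_direction(x1_k, x2_k, freq_hz, mic_spacing_m):
    """Arrival angle in radians for one bin (0 = broadside to the mic pair).

    A positive angle means the sound reaches microphone 1 first.
    Returns None for the DC bin, where the phase carries no delay information.
    """
    if freq_hz <= 0.0:
        return None
    phase_diff = cmath.phase(x1_k * x2_k.conjugate())
    delay = phase_diff / (2.0 * math.pi * freq_hz)
    sine = SPEED_OF_SOUND * delay / mic_spacing_m
    sine = max(-1.0, min(1.0, sine))  # guard against noise pushing |sine| > 1
    return math.asin(sine)


# Simulated bin: a 1 kHz component arriving from 30 degrees, mics 2 cm apart.
true_angle = math.radians(30.0)
tau = 0.02 * math.sin(true_angle) / SPEED_OF_SOUND
x1 = 1.0 + 0.0j
x2 = cmath.exp(-2j * math.pi * 1000.0 * tau)  # microphone 2 hears it later
estimate = bin_direction(x1, x2, 1000.0, 0.02)
```

A two-microphone pair of this kind resolves only one angle; obtaining both a pitch angle and an azimuth angle would need more microphones or a different algorithm.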

The gain coefficient determination module holds the preset correspondence, which may relate the gain coefficient to the two parameters of sound source direction and frequency; in this embodiment it can be expressed as a function $G_i(\theta, \Psi, k)$. This function can be set flexibly; for how to set it, see the earlier discussion of the preset correspondence.

For each audio component $S_i(k)_l$, its sound source direction (azimuth angle $\theta(k)$, pitch angle $\Psi(k)$) and frequency index $k$ are input to the gain coefficient determination module, which uses the function $G_i(\theta, \Psi, k)$ to determine the gain coefficient of that component, $G_i(k) = G_i(\theta(k), \Psi(k), k)$.
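The text leaves the shape of the gain function open. One hypothetical choice is sketched below: the matching degree is derived from the angle between the source direction and the target direction, and a frequency-dependent exponent makes the pattern narrower at higher frequencies. The raised-cosine shape, the exponent range and all names are assumptions made for illustration only.

```python
import math


def gain(theta, psi, k, target_theta, target_psi, n_bins):
    """Hypothetical gain in [0, 1] for one frequency bin.

    theta and psi are the azimuth and pitch angles in radians; k is a
    0-based bin index in an n_bins-point spectrum.
    """
    # Cosine of the angle between the source and target directions.
    cos_angle = (math.cos(psi) * math.cos(target_psi)
                 * math.cos(theta - target_theta)
                 + math.sin(psi) * math.sin(target_psi))
    cos_angle = max(-1.0, min(1.0, cos_angle))
    match = 0.5 * (1.0 + cos_angle)  # 1 when aligned, 0 when opposite
    # Fold the index so mirrored bins of a real signal get equal gains.
    folded = min(k, n_bins - k)
    sharpness = 1.0 + 3.0 * folded / (n_bins / 2)
    return match ** sharpness


on_target = gain(0.0, 0.0, 3, 0.0, 0.0, 16)
side_low = gain(math.pi / 2, 0.0, 1, 0.0, 0.0, 16)
side_high = gain(math.pi / 2, 0.0, 8, 0.0, 0.0, 16)
```

In this sketch the gain grows with the matching degree and varies smoothly with direction, while a sound arriving from the side is attenuated more at high frequencies than at low ones.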

$G_i(k)$ and $S_i(k)_l$ are input to the audio component gain adjustment module, which processes the audio component $S_i(k)_l$ according to the gain coefficient $G_i(k)$, that is, multiplies the two: $S_i(k)_l = S_i(k)_l \, G_i(k)$, where the left-hand side denotes the audio component after gain adjustment.
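The gain adjustment is an element-wise product over the frequency bins, which can be sketched as follows (the name `apply_gains` is hypothetical):

```python
def apply_gains(spectrum, gains):
    """Scale each complex frequency component by its real gain coefficient."""
    if len(spectrum) != len(gains):
        raise ValueError("one gain coefficient is needed per component")
    return [component * g for component, g in zip(spectrum, gains)]


# The first component is halved and the second is suppressed entirely.
adjusted = apply_gains([1 + 1j, 2j], [0.5, 0.0])
```

Because the gains are real numbers, only the magnitude of each component changes and its phase is preserved.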

The gain-adjusted audio components $S_i(k)_l$ are input to the inverse fast Fourier transform (IFFT) module and transformed from the frequency domain back to the time domain, giving the time-domain audio frame $s'_i(n)_l$. A synthesis window can then be applied to each $s'_i(n)_l$ to obtain $s''_i(n)_l$. The windowed frames $s''_i(n)_l$ are combined by the overlap-add method to restore the audio frames $s_i(n)_l$, from which the final, complete target audio signal can be synthesized.
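The chain of analysis window, FFT, gain adjustment, IFFT, synthesis window and overlap-add can be sketched end to end as below. The sketch assumes a sine window for both analysis and synthesis and a frame shift of half the frame length; with that combination the two windows multiply to a squared sine whose overlapped copies sum to one, so unit gains return the input unchanged away from the signal edges. It also uses one fixed gain vector for every frame, whereas the text derives gains per component from the localized direction. All names are hypothetical.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 (inverse) FFT without normalisation."""
    n = len(x)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length must be a power of 2")
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(spectrum):
    """Inverse FFT including the 1/N normalisation."""
    n = len(spectrum)
    return [value / n for value in fft(spectrum, inverse=True)]


def process(signal, frame_len, gains):
    """Window, transform, scale, inverse-transform and overlap-add."""
    hop = frame_len // 2
    window = [math.sin(math.pi * (i + 0.5) / frame_len)
              for i in range(frame_len)]
    output = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        spectrum = [s * g for s, g in zip(fft(frame), gains)]
        restored = ifft(spectrum)
        for i in range(frame_len):
            # Synthesis window, then accumulation of overlapping samples.
            # Taking the real part assumes gains symmetric in k and N - k.
            output[start + i] += restored[i].real * window[i]
    return output


signal = [math.sin(0.3 * n) for n in range(64)]
unchanged = process(signal, 16, [1.0] * 16)
silenced = process(signal, 16, [0.0] * 16)
```

The first and last half frames are covered by only one window and are therefore attenuated; a practical implementation would typically pad the signal to avoid this.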

The above is a detailed description of the audio processing method provided by the embodiments of the present application. Reference is now made to FIG. 3, which is a schematic structural diagram of an audio processing apparatus provided by an embodiment of the present application. The audio processing apparatus may include: a processor 310 and a memory 320 storing a computer program;

Optionally, when adjusting the gain of the audio component based on the matching degree between the sound source direction and the target direction, the processor is specifically configured to determine a gain coefficient of the audio component based on that matching degree, and to adjust the gain of the audio component according to the gain coefficient.

Optionally, the gain coefficient of the audio component is determined according to a preset correspondence, the preset correspondence being a correspondence between the matching degree and the gain coefficient.

Optionally, in the preset correspondence, the gain coefficient corresponding to a sound source direction is positively correlated with the matching degree.

Optionally, in the preset correspondence, the change in the gain coefficient when the matching degree changes by one unit is less than or equal to a specified amount.

Optionally, the matching degree is determined according to the difference between the sound source direction and the target direction.

Optionally, when adjusting the gain of the audio component based on the matching degree between the sound source direction and the target direction, the processor is specifically configured to adjust the gain of the audio component according to both that matching degree and the frequency of the audio component.

Optionally, the sound source direction is determined by a sound source localization algorithm from audio signals captured from the same sound field by at least two microphones.

Optionally, the sound source localization algorithm includes any one of the following: a beamforming algorithm, a time-difference-of-arrival estimation algorithm, or a differential microphone array algorithm.

Optionally, the audio signal to be processed is obtained from one or more of the audio signals captured by the at least two microphones.

Optionally, the audio signal to be processed is the audio signal with the highest signal-to-noise ratio among the audio signals captured by the at least two microphones.

Optionally, the audio signal to be processed is synthesized from the audio signals captured by the at least two microphones.

Optionally, the audio signal to be processed is obtained from audio signals captured by microphones other than the at least two microphones.

Optionally, the audio signal to be processed is a first audio frame containing a preset number of sampling points, and the target audio signal is a second audio frame corresponding to the first audio frame.

Optionally, the preset number is a power of 2.

Optionally, the audio components of different frequencies are determined by performing a fast Fourier transform on the first audio frame.

Optionally, the processor is further configured to modulate the first audio frame into a periodic signal before determining the audio components of different frequencies contained in the first audio frame.

Optionally, when modulating the first audio frame into a periodic signal, the processor is specifically configured to apply an analysis window to the first audio frame.

Optionally, the sampling points of the second audio frame that overlap with the previous audio frame have been accumulated.

Optionally, the processor is further configured to remove the amplitude distortion at both ends of the second audio frame before accumulating the sampling points of the second audio frame that overlap with the previous audio frame.

Optionally, when removing the amplitude distortion at both ends of the second audio frame, the processor is specifically configured to apply a synthesis window to the second audio frame.

Optionally, the target direction is set according to direction information input by a user.

Optionally, the apparatus is mounted on an electronic device that has a camera whose pose can change, and the target direction is determined according to the orientation of the camera.

Optionally, the sound source direction includes an azimuth angle and/or a pitch angle.

The audio processing apparatus provided by the embodiments of the present application considers the audio components of different frequencies contained in the audio signal to be processed. For each audio component it determines the sound source direction and can adjust the gain of the component according to the matching degree between the sound source direction and the target direction, so that sound originating from the target direction is more prominent in the synthesized target audio signal, thereby achieving directional sound pickup. Moreover, because the gain can be adjusted separately for audio components of different frequencies, the directivity at different frequencies is flexibly controllable. In addition, the target direction can be set flexibly as required, so directional pickup in any direction can be achieved. Compared with directional pickup based on a beamforming algorithm, the apparatus provided by the embodiments of the present application needs only a small microphone array or a small number (two or more) of microphones to meet the requirement of strong directivity.

For the specific implementation of the various embodiments of the audio processing apparatus provided above, reference may be made to the related description of the audio processing method provided in the embodiments of the present application; details are not repeated here.

Reference is now made to FIG. 4, which is a schematic structural diagram of an electronic device provided by an embodiment of the present application. The electronic device includes: a processor 410 and a memory 420 storing a computer program;

Optionally, when adjusting the gain of the audio component based on the matching degree between the sound source direction and the target direction, the processor is specifically configured to determine a gain coefficient of the audio component based on that matching degree, and to adjust the gain of the audio component according to the gain coefficient.

Optionally, the gain coefficient of the audio component is determined according to a preset correspondence, the preset correspondence being a correspondence between the matching degree and the gain coefficient.

Optionally, in the preset correspondence, the change in the gain coefficient when the matching degree changes by one unit is less than or equal to a specified amount.

Optionally, the electronic device further includes at least two microphones;

the sound source direction is determined by a sound source localization algorithm from audio signals captured from the same sound field by the at least two microphones.

Optionally, when modulating the first audio frame into a periodic signal, the processor is specifically configured to apply an analysis window to the first audio frame.

Optionally, the electronic device further includes a camera that can move relative to the electronic device, and the target direction is determined according to the orientation of the camera.

The electronic device provided by the embodiments of the present application considers the audio components of different frequencies contained in the audio signal to be processed. For each audio component it determines the sound source direction and can adjust the gain of the component according to the matching degree between the sound source direction and the target direction, so that sound originating from the target direction is more prominent in the synthesized target audio signal, thereby achieving directional sound pickup. Moreover, because the gain can be adjusted separately for audio components of different frequencies, the directivity at different frequencies is flexibly controllable. In addition, the target direction can be set flexibly as required, so directional pickup in any direction can be achieved. Compared with directional pickup based on a beamforming algorithm, the electronic device provided by the embodiments of the present application needs only a small microphone array or a small number (two or more) of microphones to meet the requirement of strong directivity.

For the specific implementation of the various embodiments of the electronic device provided above, reference may be made to the related description of the electronic device provided in the embodiments of the present application; details are not repeated here.

The embodiments above provide several implementations for each step. Which implementation is used for each step can be freely chosen or combined by those skilled in the art according to the actual situation, provided that no conflict or contradiction arises, thereby forming a variety of different embodiments. For reasons of space, this application does not describe each of these embodiments, but it should be understood that they also fall within the scope disclosed by the embodiments of the present application.

The embodiments of the present application may take the form of a computer program product implemented on one or more storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) containing program code. Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to: phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. The terms "comprise", "include", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The method, apparatus, and device provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. At the same time, those of ordinary skill in the art may, following the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims

An audio processing method, comprising:

acquiring an audio signal to be processed, wherein the audio signal to be processed includes audio components of different frequencies;

determining a sound source direction corresponding to each of the audio components;

adjusting the gain of the audio component based on the matching degree between the sound source direction and a target direction;

synthesizing a target audio signal based on the gain-adjusted audio components.
The audio processing method according to claim 1, wherein the adjusting the gain of the audio component based on the matching degree between the sound source direction and the target direction comprises:

determining a gain coefficient of the audio component based on the matching degree between the sound source direction and the target direction, and adjusting the gain of the audio component according to the gain coefficient.
The audio processing method according to claim 2, wherein the gain coefficient of the audio component is determined according to a preset correspondence, the preset correspondence being a correspondence between the matching degree and the gain coefficient.
The audio processing method according to claim 3, wherein, in the preset correspondence, the gain coefficient corresponding to a sound source direction is positively correlated with the matching degree.
The audio processing method according to claim 3, wherein, in the preset correspondence, the change in the gain coefficient when the matching degree changes by one unit is less than or equal to a specified amount.
The audio processing method according to claim 1, wherein the matching degree is determined according to the difference between the sound source direction and the target direction.
The audio processing method according to claim 1, wherein the adjusting the gain of the audio component based on the matching degree between the sound source direction and the target direction comprises:

adjusting the gain of the audio component according to the matching degree between the sound source direction and the target direction and the frequency of the audio component.
The audio processing method according to claim 1, wherein the sound source direction is determined by a sound source localization algorithm from audio signals captured from the same sound field by at least two microphones.
The audio processing method according to claim 8, wherein the sound source localization algorithm includes any one of the following: a beamforming algorithm, a time-difference-of-arrival estimation algorithm, or a differential microphone array algorithm.
The audio processing method according to claim 8, wherein the audio signal to be processed is obtained from one or more of the audio signals captured by the at least two microphones.
The audio processing method according to claim 10, wherein the audio signal to be processed is the audio signal with the highest signal-to-noise ratio among the audio signals captured by the at least two microphones.
The audio processing method according to claim 10, wherein the audio signal to be processed is synthesized from the audio signals captured by the at least two microphones.
The audio processing method according to claim 8, wherein the audio signal to be processed is obtained from audio signals captured by microphones other than the at least two microphones.
The audio processing method according to claim 1, wherein the audio signal to be processed is a first audio frame containing a preset number of sampling points, and the target audio signal is a second audio frame corresponding to the first audio frame.
The audio processing method according to claim 14, wherein the preset number is a power of 2.
The audio processing method according to claim 15, wherein the audio components of different frequencies are determined by performing a fast Fourier transform on the first audio frame.
The audio processing method according to claim 14, further comprising, before determining the audio components of different frequencies contained in the first audio frame:

modulating the first audio frame into a periodic signal.
The audio processing method according to claim 17, wherein the modulating the first audio frame into a periodic signal comprises:

applying an analysis window to the first audio frame.
The audio processing method according to claim 14, wherein the sampling points of the second audio frame that overlap with the previous audio frame have been accumulated.
The audio processing method according to claim 19, further comprising, before accumulating the sampling points of the second audio frame that overlap with the previous audio frame:

removing the amplitude distortion at both ends of the second audio frame.
The audio processing method according to claim 20, wherein the removing the amplitude distortion at both ends of the second audio frame comprises:

applying a synthesis window to the second audio frame.
The audio processing method according to claim 1, wherein the target direction is set according to direction information input by a user.
The audio processing method according to claim 1, wherein the method is applied to an electronic device that has a camera whose pose can change, and the target direction is determined according to the orientation of the camera.
The audio processing method according to claim 1, wherein the sound source direction includes an azimuth angle and/or a pitch angle.
一种音频处理装置，其特征在于，包括：处理器与存储有计算机程序的存储器；An audio processing device, comprising: a processor and a memory storing a computer program;

所述处理器在执行所述计算机程序时实现以下步骤：The processor implements the following steps when executing the computer program:

获取待处理音频信号，其中，所述待处理音频信号包括不同频率的音频分量；acquiring an audio signal to be processed, wherein the audio signal to be processed includes audio components of different frequencies;

确定每个所述音频分量对应的声源方向；determining a sound source direction corresponding to each of the audio components;

基于所述声源方向与目标方向的匹配度，调整所述音频分量的增益；adjusting the gain of the audio components based on the degree of matching between the sound source direction and a target direction; and

基于调整增益后的所述音频分量合成目标音频信号。synthesizing a target audio signal based on the gain-adjusted audio components.
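As an illustration of the four steps recited above, the following sketch processes one frame. Everything in it is an assumption made for illustration: a direct discrete Fourier transform stands in for the frequency decomposition, the per-component directions are passed in as `bin_directions_deg` rather than estimated, directions are treated as azimuths in degrees, and the gain falls linearly with the angular difference. The names `dft`, `idft`, and `directional_filter` are hypothetical.

```python
import cmath

def dft(frame):
    """Direct discrete Fourier transform of a real frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse transform, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def directional_filter(frame, bin_directions_deg, target_deg):
    """Scale each frequency component by how well its direction matches the target.

    bin_directions_deg holds one assumed azimuth per bin 0..n/2.
    """
    n = len(frame)
    spectrum = dft(frame)                      # audio components of different frequencies
    for k in range(n // 2 + 1):
        diff = abs(bin_directions_deg[k] - target_deg) % 360.0
        diff = min(diff, 360.0 - diff)         # angular difference in 0..180 degrees
        gain = 1.0 - diff / 180.0              # assumed linear mapping, 1 when aligned
        spectrum[k] *= gain
        if 0 < k < n - k:
            spectrum[n - k] *= gain            # mirror bin keeps the output real
    return idft(spectrum)                      # synthesized target frame
```

For a frame holding two tones, one arriving from the target direction and one from 120 degrees away, the first passes unchanged and the second is scaled to one third of its amplitude.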
根据权利要求25所述的音频处理装置，其特征在于，所述处理器在执行所述基于声源方向与目标方向的匹配度，调整所述音频分量的增益时，具体用于基于声源方向与目标方向的匹配度，确定所述音频分量的增益系数，并根据所述增益系数调整所述音频分量的增益。The audio processing apparatus according to claim 25, wherein, when adjusting the gain of the audio component based on the degree of matching between the sound source direction and the target direction, the processor is specifically configured to determine a gain coefficient of the audio component based on the degree of matching between the sound source direction and the target direction, and to adjust the gain of the audio component according to the gain coefficient.
根据权利要求26所述的音频处理装置，其特征在于，所述音频分量的增益系数是根据预设对应关系确定的，所述预设对应关系是所述匹配度与所述增益系数的对应关系。The audio processing apparatus according to claim 26, wherein the gain coefficient of the audio component is determined according to a preset correspondence, and the preset correspondence is a correspondence between the matching degree and the gain coefficient.
根据权利要求27所述的音频处理装置，其特征在于，所述预设对应关系中，声源方向对应的增益系数与所述匹配度正相关。The audio processing apparatus according to claim 27, wherein, in the preset correspondence, a gain coefficient corresponding to a sound source direction is positively correlated with the matching degree.
根据权利要求27所述的音频处理装置，其特征在于，所述预设对应关系中，所述增益系数在所述匹配度变化一个单位时对应的变化量小于或等于指定变化量。The audio processing apparatus according to claim 27, wherein, in the preset correspondence, the corresponding change amount of the gain coefficient when the matching degree changes by one unit is less than or equal to a specified change amount.
根据权利要求25所述的音频处理装置，其特征在于，所述匹配度是根据所述声源方向与所述目标方向的差值确定的。The audio processing apparatus according to claim 25, wherein the matching degree is determined according to a difference between the sound source direction and the target direction.
根据权利要求25所述的音频处理装置，其特征在于，所述处理器在执行所述根据所述基于所述声源方向与目标方向的匹配度，调整所述音频分量的增益时，具体用于根据所述声源方向与目标方向的匹配度以及所述音频分量的频率，调整所述音频分量的增益。The audio processing apparatus according to claim 25, wherein, when adjusting the gain of the audio component based on the degree of matching between the sound source direction and the target direction, the processor is specifically configured to adjust the gain of the audio component according to both the degree of matching between the sound source direction and the target direction and the frequency of the audio component.
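The preset correspondence between matching degree and gain coefficient is not given a concrete form in the claims. As a hedged sketch only, one mapping that is positively correlated with the matching degree and changes by a bounded amount per unit of matching degree is a raised cosine. The normalization of the matching degree to the range 0 to 1, the `floor` parameter, and the function names are assumptions. The frequency dependence recited above is not modeled here.

```python
import math

def angle_difference(source_deg, target_deg):
    """Smallest absolute difference between two azimuths, in degrees (0..180)."""
    diff = abs(source_deg - target_deg) % 360.0
    return min(diff, 360.0 - diff)

def matching_degree(source_deg, target_deg):
    """Assumed normalization: 1.0 when aligned, 0.0 when opposite."""
    return 1.0 - angle_difference(source_deg, target_deg) / 180.0

def gain_coefficient(match, floor=0.1):
    """Raised-cosine mapping from matching degree (0..1) to gain (floor..1).

    The gain rises monotonically with the matching degree, and its slope never
    exceeds (1 - floor) * pi / 2, so neighbouring directions receive similar
    gains. `floor` is a hypothetical residual gain for unmatched directions.
    """
    return floor + (1.0 - floor) * 0.5 * (1.0 - math.cos(math.pi * match))
```

A smooth mapping of this kind avoids an abrupt on/off gate, so a source drifting slightly across the edge of the pickup region is attenuated gradually instead of being cut off.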
根据权利要求25所述的音频处理装置，其特征在于，所述声源方向是基于声源定位算法，利用至少两个麦克风对同一声场采集的音频信号确定的。The audio processing apparatus according to claim 25, wherein the sound source direction is determined, based on a sound source localization algorithm, from audio signals collected by at least two microphones from the same sound field.
根据权利要求32所述的音频处理装置，其特征在于，所述声源定位算法包括以下任一种：波束形成算法、到达时间差估计算法、差分麦克风阵列算法。The audio processing apparatus according to claim 32, wherein the sound source localization algorithm comprises any one of the following: a beamforming algorithm, a time difference of arrival estimation algorithm, or a differential microphone array algorithm.
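Of the localization algorithms listed, time difference of arrival estimation is the simplest to sketch for two microphones. The example below is an illustrative simplification, not the claimed per-component procedure: it finds a single broadband delay by brute-force cross-correlation and converts it to an angle under a far-field assumption. The function names, the 343 m/s speed of sound, and the angle convention (measured from the normal to the line joining the microphones) are all assumptions.

```python
import math

def estimate_delay(sig_a, sig_b, max_lag):
    """Lag in samples of sig_b relative to sig_a that maximizes cross-correlation."""
    n = len(sig_a)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(sig_a[i] * sig_b[i + lag]
                    for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def delay_to_angle(lag, sample_rate, mic_spacing_m, speed_of_sound=343.0):
    """Convert a delay in samples to an arrival angle in degrees (far-field model)."""
    ratio = lag / sample_rate * speed_of_sound / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))   # clamp against noise-induced overshoot
    return math.degrees(math.asin(ratio))
```

For instance, with microphones 0.1 m apart sampled at 48 kHz, a delay of 7 samples corresponds to an arrival angle of roughly 30 degrees under these assumptions. Obtaining a direction for each frequency component, as the claims recite, would require applying a comparable estimate per frequency band or per transform bin.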
根据权利要求32所述的音频处理装置，其特征在于，所述待处理音频信号是利用所述至少两个麦克风采集的音频信号中的一个或多个得到的。The audio processing apparatus according to claim 32, wherein the audio signal to be processed is obtained by using one or more of the audio signals collected by the at least two microphones.
根据权利要求34所述的音频处理装置，其特征在于，所述待处理音频信号是所述至少两个麦克风采集的音频信号中信噪比最高的音频信号。The audio processing apparatus according to claim 34, wherein the audio signal to be processed is the audio signal with the highest signal-to-noise ratio among the audio signals collected by the at least two microphones.
根据权利要求34所述的音频处理装置，其特征在于，所述待处理音频信号是利用所述至少两个麦克风采集的音频信号合成得到的。The audio processing apparatus according to claim 34, wherein the to-be-processed audio signal is synthesized by using audio signals collected by the at least two microphones.
根据权利要求32所述的音频处理装置，其特征在于，所述待处理音频信号是根据所述至少两个麦克风以外的其他麦克风采集的音频信号得到的。The audio processing apparatus according to claim 32, wherein the audio signal to be processed is obtained from audio signals collected by microphones other than the at least two microphones.
根据权利要求25所述的音频处理装置，其特征在于，所述待处理音频信号是包括预设个数的采样点的第一音频帧，所述目标音频信号是与所述第一音频帧对应的第二音频帧。The audio processing apparatus according to claim 25, wherein the audio signal to be processed is a first audio frame including a preset number of sampling points, and the target audio signal is a second audio frame corresponding to the first audio frame.
根据权利要求38所述的音频处理装置，其特征在于，所述预设个数为2的幂次方。The audio processing apparatus according to claim 38, wherein the preset number is a power of 2.
根据权利要求39所述的音频处理装置，其特征在于，所述不同频率的音频分量是对所述第一音频帧进行快速傅里叶变换确定的。The audio processing apparatus according to claim 39, wherein the audio components of different frequencies are determined by performing fast Fourier transform on the first audio frame.
根据权利要求38所述的音频处理装置，其特征在于，所述处理器还用于，在确定所述第一音频帧包括的不同频率的音频分量之前，将所述第一音频帧调制为周期性信号。The audio processing apparatus according to claim 38, wherein the processor is further configured to modulate the first audio frame into a periodic signal before determining the audio components of different frequencies included in the first audio frame.
根据权利要求41所述的音频处理装置，其特征在于，所述处理器在执行所述将所述第一音频帧调制为周期性信号时，具体用于对所述第一音频帧加分析窗。The audio processing apparatus according to claim 41, wherein, when modulating the first audio frame into a periodic signal, the processor is specifically configured to apply an analysis window to the first audio frame.
根据权利要求38所述的音频处理装置，其特征在于，所述第二音频帧与前一音频帧重叠的采样点进行了累加。The audio processing apparatus according to claim 38, wherein the sampling points at which the second audio frame overlaps the previous audio frame are accumulated.
根据权利要求43所述的音频处理装置，其特征在于，所述处理器还用于，在将所述第二音频帧与前一音频帧重叠的采样点进行累加之前，消除所述第二音频帧两端幅值的畸变。The audio processing apparatus according to claim 43, wherein the processor is further configured to eliminate distortion of the amplitudes at both ends of the second audio frame before accumulating the sampling points at which the second audio frame overlaps the previous audio frame.
根据权利要求44所述的音频处理装置，其特征在于，所述处理器在执行所述消除所述第二音频帧两端幅值的畸变时，具体用于对所述第二音频帧加合成窗。The audio processing apparatus according to claim 44, wherein, when eliminating the distortion of the amplitudes at both ends of the second audio frame, the processor is specifically configured to apply a synthesis window to the second audio frame.
根据权利要求25所述的音频处理装置，其特征在于，所述目标方向是根据用户输入的方向信息设定的。The audio processing apparatus according to claim 25, wherein the target direction is set according to direction information input by a user.
根据权利要求25所述的音频处理装置，其特征在于，装载于电子设备，所述电子设备具有位姿信息可变化的摄像头，所述目标方向是根据所述摄像头的朝向确定的。The audio processing apparatus according to claim 25, wherein the apparatus is mounted on an electronic device having a camera whose pose is variable, and the target direction is determined according to the orientation of the camera.
根据权利要求25所述的音频处理装置，其特征在于，所述声源方向包括：周角和/或俯仰角。The audio processing apparatus according to claim 25, wherein the sound source direction comprises: an azimuth angle and/or a pitch angle.
一种电子设备，其特征在于，包括：处理器与存储有计算机程序的存储器；An electronic device, comprising: a processor and a memory storing a computer program;

所述处理器在执行所述计算机程序时实现以下步骤：The processor implements the following steps when executing the computer program:

获取待处理音频信号，其中，所述待处理音频信号包括不同频率的音频分量；acquiring an audio signal to be processed, wherein the audio signal to be processed includes audio components of different frequencies;

确定每个所述音频分量对应的声源方向；determining a sound source direction corresponding to each of the audio components;

基于声源方向与目标方向的匹配度，调整所述音频分量的增益；adjusting the gain of the audio components based on the degree of matching between the sound source direction and a target direction; and

基于调整增益后的所述音频分量合成目标音频信号。synthesizing a target audio signal based on the gain-adjusted audio components.
根据权利要求49所述的电子设备，其特征在于，所述处理器在执行所述基于声源方向与目标方向的匹配度，调整所述音频分量的增益时，具体用于基于声源方向与目标方向的匹配度，确定所述音频分量的增益系数，并根据所述增益系数调整所述音频分量的增益。The electronic device according to claim 49, wherein, when adjusting the gain of the audio component based on the degree of matching between the sound source direction and the target direction, the processor is specifically configured to determine a gain coefficient of the audio component based on the degree of matching between the sound source direction and the target direction, and to adjust the gain of the audio component according to the gain coefficient.
根据权利要求50所述的电子设备，其特征在于，所述音频分量的增益系数是根据预设对应关系确定的，所述预设对应关系是所述匹配度与所述增益系数的对应关系。The electronic device according to claim 50, wherein the gain coefficient of the audio component is determined according to a preset corresponding relationship, and the preset corresponding relationship is a corresponding relationship between the matching degree and the gain coefficient.
根据权利要求51所述的电子设备，其特征在于，所述预设对应关系中，声源方向对应的增益系数与所述匹配度正相关。The electronic device according to claim 51, wherein, in the preset correspondence, a gain coefficient corresponding to a sound source direction is positively correlated with the matching degree.
根据权利要求51所述的电子设备，其特征在于，所述预设对应关系中，所述增益系数在所述匹配度变化一个单位时对应的变化量小于或等于指定变化量。The electronic device according to claim 51, wherein, in the preset corresponding relationship, a corresponding change amount of the gain coefficient when the matching degree changes by one unit is less than or equal to a specified change amount.
根据权利要求49所述的电子设备，其特征在于，所述匹配度是根据所述声源方向与所述目标方向的差值确定的。The electronic device according to claim 49, wherein the matching degree is determined according to a difference between the sound source direction and the target direction.
根据权利要求49所述的电子设备，其特征在于，所述处理器在执行所述根据所述基于所述声源方向与目标方向的匹配度，调整所述音频分量的增益时，具体用于根据所述声源方向与目标方向的匹配度以及所述音频分量的频率，调整所述音频分量的增益。The electronic device according to claim 49, wherein, when adjusting the gain of the audio component based on the degree of matching between the sound source direction and the target direction, the processor is specifically configured to adjust the gain of the audio component according to both the degree of matching between the sound source direction and the target direction and the frequency of the audio component.
根据权利要求49所述的电子设备，其特征在于，还包括：至少两个麦克风；The electronic device of claim 49, further comprising: at least two microphones;

所述声源方向是基于声源定位算法，利用所述至少两个麦克风对同一声场采集的音频信号确定的。The sound source direction is determined based on a sound source localization algorithm using the audio signals collected from the same sound field by the at least two microphones.
根据权利要求56所述的电子设备，其特征在于，所述声源定位算法包括以下任一种：波束形成算法、到达时间差估计算法、差分麦克风阵列算法。The electronic device according to claim 56, wherein the sound source localization algorithm includes any one of the following: a beamforming algorithm, a time difference of arrival estimation algorithm, and a differential microphone array algorithm.
根据权利要求56所述的电子设备，其特征在于，所述待处理音频信号是利用所述至少两个麦克风采集的音频信号中的一个或多个得到的。The electronic device according to claim 56, wherein the audio signal to be processed is obtained by using one or more of the audio signals collected by the at least two microphones.
根据权利要求58所述的电子设备，其特征在于，所述待处理音频信号是所述至少两个麦克风采集的音频信号中信噪比最高的音频信号。The electronic device according to claim 58, wherein the audio signal to be processed is an audio signal with the highest signal-to-noise ratio among the audio signals collected by the at least two microphones.
根据权利要求58所述的电子设备，其特征在于，所述待处理音频信号是利用所述至少两个麦克风采集的音频信号合成得到的。The electronic device according to claim 58, wherein the audio signal to be processed is synthesized by using audio signals collected by the at least two microphones.
根据权利要求56所述的电子设备，其特征在于，所述待处理音频信号是根据所述至少两个麦克风以外的其他麦克风采集的音频信号得到的。The electronic device according to claim 56, wherein the audio signal to be processed is obtained from audio signals collected by microphones other than the at least two microphones.
根据权利要求49所述的电子设备，其特征在于，所述待处理音频信号是包括预设个数的采样点的第一音频帧，所述目标音频信号是与所述第一音频帧对应的第二音频帧。The electronic device according to claim 49, wherein the audio signal to be processed is a first audio frame including a preset number of sampling points, and the target audio signal is a second audio frame corresponding to the first audio frame.
根据权利要求62所述的电子设备，其特征在于，所述预设个数为2的幂次方。The electronic device according to claim 62, wherein the preset number is a power of 2.
根据权利要求63所述的电子设备，其特征在于，所述不同频率的音频分量是对所述第一音频帧进行快速傅里叶变换确定的。The electronic device according to claim 63, wherein the audio components of different frequencies are determined by performing fast Fourier transform on the first audio frame.
根据权利要求62所述的电子设备，其特征在于，所述处理器还用于，在确定所述第一音频帧包括的不同频率的音频分量之前，将所述第一音频帧调制为周期性信号。The electronic device according to claim 62, wherein the processor is further configured to modulate the first audio frame into a periodic signal before determining the audio components of different frequencies included in the first audio frame.
根据权利要求65所述的电子设备，其特征在于，所述处理器在执行所述将所述第一音频帧调制为周期性信号时，具体用于对所述第一音频帧加分析窗。The electronic device according to claim 65, wherein, when modulating the first audio frame into a periodic signal, the processor is specifically configured to apply an analysis window to the first audio frame.
根据权利要求62所述的电子设备，其特征在于，所述第二音频帧与前一音频帧重叠的采样点进行了累加。The electronic device according to claim 62, wherein the sampling points of the second audio frame overlapping with the previous audio frame are accumulated.
根据权利要求67所述的电子设备，其特征在于，所述处理器还用于，在将所述第二音频帧与前一音频帧重叠的采样点进行累加之前，消除所述第二音频帧两端幅值的畸变。The electronic device according to claim 67, wherein the processor is further configured to eliminate distortion of the amplitudes at both ends of the second audio frame before accumulating the sampling points at which the second audio frame overlaps the previous audio frame.
根据权利要求68所述的电子设备，其特征在于，所述处理器在执行所述消除所述第二音频帧两端幅值的畸变时，具体用于对所述第二音频帧加合成窗。The electronic device according to claim 68, wherein, when eliminating the distortion of the amplitudes at both ends of the second audio frame, the processor is specifically configured to apply a synthesis window to the second audio frame.
根据权利要求49所述的电子设备，其特征在于，所述目标方向是根据用户输入的方向信息设定的。The electronic device according to claim 49, wherein the target direction is set according to direction information input by a user.
根据权利要求49所述的电子设备，其特征在于，还包括：摄像头，所述摄像头可相对于所述电子设备运动，所述目标方向是根据所述摄像头的朝向确定的。The electronic device according to claim 49, further comprising: a camera movable relative to the electronic device, wherein the target direction is determined according to the orientation of the camera.
根据权利要求49所述的电子设备，其特征在于，所述声源方向包括：周角和/或俯仰角。The electronic device according to claim 49, wherein the sound source direction comprises: an azimuth angle and/or a pitch angle.
一种计算机可读存储介质，其特征在于，所述计算机可读存储介质存储有计算机程序；所述计算机程序被处理器执行时实现如权利要求1-24任一项所述的音频处理方法。A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program; when the computer program is executed by a processor, the audio processing method according to any one of claims 1-24 is implemented.