CN113472943A - Audio processing method, device, equipment and storage medium - Google Patents

Audio processing method, device, equipment and storage medium

Info

Publication number
CN113472943A
CN113472943A (application CN202110735665.7A)
Authority
CN
China
Prior art keywords
audio
information
electronic equipment
target
sound source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110735665.7A
Other languages
Chinese (zh)
Other versions
CN113472943B (en)
Inventor
Wu Xiaoguang (吴晓光)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd
Priority to CN202110735665.7A
Publication of CN113472943A
Application granted
Publication of CN113472943B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72433 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for voice messaging, e.g. dictaphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72448 User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
    • H04M1/72454 User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions according to context-related or environment-related conditions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303 Tracking of listener position or orientation

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Environmental & Geological Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Stereophonic System (AREA)

Abstract

The application discloses an audio processing method, apparatus, device, and storage medium, belonging to the field of communications technology. The method mainly includes: receiving a first input for recording audio; in response to the first input, determining first sound source direction information of the recorded audio according to first posture information of the electronic device; performing audio recording according to the first sound source direction information to obtain a first audio; and correcting the first audio information according to second posture information describing how the user holds the electronic device, to obtain a target audio.

Description

Audio processing method, device, equipment and storage medium
Technical Field
The present application belongs to the field of communications technologies, and in particular, to an audio processing method, apparatus, device, and storage medium.
Background
With the development of electronic device technology, the functions of electronic devices have become increasingly rich and diverse. For example, 3D surround sound or panoramic sound played back by an electronic device can give a user a strong sense of presence in video and games.
Currently, 3D surround sound or panoramic sound can be recorded with a professional 3D audio production kit or with dummy-head recording equipment. However, a professional 3D audio production kit is complex to operate, and dummy-head recording equipment requires dedicated external hardware and is inconvenient to carry. As a result, recording such audio is inefficient and inflexible.
Disclosure of Invention
An embodiment of the present application provides an audio processing method, an audio processing apparatus, an audio processing device, and a storage medium, which can solve the problems of low efficiency and poor flexibility in current audio recording.
In a first aspect, an embodiment of the present application provides an audio processing method, which is applied to an electronic device, and the method may include:
receiving a first input for recording audio;
in response to the first input, determining first sound source direction information of the recorded audio according to first posture information of the electronic device;
performing audio recording according to the first sound source direction information to obtain a first audio;
and correcting the first audio information according to second posture information of the user holding the electronic device, to obtain a target audio.
In a second aspect, an embodiment of the present application provides an audio processing apparatus, which is applied to an electronic device, and the apparatus may include:
the receiving module is used to receive a first input for recording audio;
the determining module is used to determine, in response to the first input, first sound source direction information of the recorded audio according to first posture information of the electronic device;
the recording module is used to perform audio recording according to the first sound source direction information to obtain a first audio;
and the correcting module is used to correct the first audio information according to second posture information of the user holding the electronic device, to obtain a target audio.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, where the program or instructions, when executed by the processor, implement the steps of the audio processing method as shown in the first aspect.
In a fourth aspect, the present application provides a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the audio processing method as shown in the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the steps of the audio processing method according to the first aspect.
In the embodiments of the application, first sound source direction information of the recorded audio is preliminarily determined from the first posture information of the electronic device, and audio recording is performed according to the first sound source direction to preliminarily obtain a first audio. However, during actual shooting, the way the user holds the electronic device, such as two-handed landscape, one-handed landscape, or one-handed portrait, may interfere to some extent with the judgment of sound directivity; to ensure the accuracy of the recording direction and improve the quality of the recorded audio, this interference needs to be eliminated. Therefore, the preliminarily obtained first audio is corrected using the posture information of the user holding the electronic device, which improves the accuracy of sound source direction determination and the efficiency of audio recording. In addition, a user can record target audio with 3D surround sound or panoramic sound anytime and anywhere with a portable electronic device, which improves the flexibility of audio recording.
Drawings
Fig. 1 is a schematic diagram of a processing architecture according to an embodiment of the present application;
fig. 2 is a schematic diagram of a user holding an electronic device according to an embodiment of the present disclosure;
fig. 3 is a flowchart of an audio processing method according to an embodiment of the present application;
fig. 4 is a schematic diagram of a binaural effect provided by an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating an auricle effect according to an embodiment of the present application;
fig. 6 is a schematic diagram of a work flow of an HRTF system according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of an electronic device provided with a radio device according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a sound receiving device and a sound source according to an embodiment of the present application;
FIG. 9 is a schematic diagram illustrating the relationship between position information and sound sources of a plurality of sound receiving devices according to an embodiment of the present application;
fig. 10 is a schematic diagram of a user holding an electronic device according to an embodiment of the present application;
FIG. 11 is a schematic diagram illustrating a sound direction based on the user holding the electronic device in FIG. 10 according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 14 is a schematic hardware structure diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present disclosure.
The terms "first", "second", and the like in the description and claims of the present application are used to distinguish between similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that the data so used are interchangeable under appropriate circumstances, so that the embodiments of the application can be implemented in orders other than those illustrated or described herein. Moreover, the terms "first", "second", and the like do not limit the number of objects; for example, a first object may be one or more than one. In addition, "and/or" in the description and claims denotes at least one of the connected objects, and the character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
Based on this, the following describes in detail the audio processing method provided by the embodiment of the present application through a specific embodiment and an application scenario thereof with reference to fig. 1 to fig. 3.
An embodiment of the present application provides a processing architecture, which may include an electronic device, as shown in fig. 1. The electronic device 10 may include an acceleration sensor 101, a gyroscope, a light sensor 102, and other components used to determine the posture (or orientation) of the electronic device and the posture in which the user holds it; it may also include multiple sound receiving devices, such as microphones 103 arranged at different positions on the electronic device, used to locate sound source information and record audio. In one or more embodiments, the electronic device may further include a shooting device, such as a front camera or a rear camera 104, used to capture video images when shooting audio and video.
Based on the processing architecture, an application scenario of the audio processing method provided by the embodiment of the present application is described.
The audio processing method in the embodiments of the present application is described by taking as an example an electronic device that includes an acceleration sensor, a plurality of microphones, and a rear camera, used in a scene of shooting audio and video. In this scene, the user shoots the image information of the audio and video and records the corresponding audio information through the electronic device, so the electronic device receives a first input, triggered by the user, for shooting the audio and video (including recording the image information and recording the corresponding audio information). In response to the first input, the electronic device acquires first posture information through the acceleration sensor and, based on the first posture information, determines target shooting direction information of the electronic device, i.e., its current orientation, such as landscape with the rear camera facing left, landscape with the rear camera facing right, or portrait. Taking the case where the target shooting direction information indicates that the electronic device is placed in portrait orientation as an example, the position information of the plurality of microphones in that orientation is determined. First sound source direction information of the recorded audio is then determined according to the relative position information of at least two of the microphones, such as the microphone array parameters of four microphones.
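To make the orientation-to-array-parameter mapping above concrete, the following is a minimal sketch in Python. It is not taken from the patent: the axis conventions, the orientation labels, and the MIC_ARRAY_PARAMS placeholders are assumptions introduced only for illustration; a real implementation would read the platform's sensor API and use calibrated microphone coordinates.

```python
import numpy as np

# Hypothetical orientation labels and array-parameter placeholders; the patent
# does not name concrete constants or parameter values.
MIC_ARRAY_PARAMS = {
    "portrait":        "params_9_12",   # placeholder for microphone array parameters 9-12
    "landscape_left":  "params_1_4",    # placeholder for microphone array parameters 1-4
    "landscape_right": "params_5_8",    # placeholder for microphone array parameters 5-8
}

def shooting_direction(accel_xyz):
    """Classify device orientation from one accelerometer reading (m/s^2).

    When the device is held still the gravity vector dominates the reading:
    a large |ay| suggests the device is upright (portrait), a large |ax|
    suggests it lies on its side (landscape).  Which axis maps to which
    orientation is an assumption of this sketch.
    """
    ax, ay, _ = accel_xyz
    if abs(ay) >= abs(ax):
        return "portrait"
    return "landscape_left" if ax > 0 else "landscape_right"

def select_mic_array_params(accel_xyz):
    """Map the detected shooting direction to the matching microphone layout."""
    return MIC_ARRAY_PARAMS[shooting_direction(accel_xyz)]

# Example: device held upright, so gravity lies mostly along +y.
print(select_mic_array_params(np.array([0.3, 9.7, 0.4])))  # portrait parameters
```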
Then, audio recording is performed according to the first sound source direction through the microphones 103 in different directions, so as to obtain a first audio.
Furthermore, during actual shooting, the way the user holds the electronic device, such as two-handed landscape, one-handed landscape, or one-handed portrait, interferes to some extent with the judgment of sound directivity. To ensure the accuracy of the previously recorded first audio, this interference needs to be eliminated. Therefore, as shown in fig. 2, in the embodiment of the present application, second posture information of the user holding the electronic device, for example holding it with both hands, is obtained through sensors and other components, and a target audio adjustment model corresponding to that second posture information is determined according to the association relationship information between holding-posture information and audio adjustment models. The first audio information is then adjusted according to the audio adjustment parameters in the target audio adjustment model to obtain a target audio with 3D surround sound or panoramic sound properties.
In the embodiment of the application, first sound source direction information of the recorded audio is preliminarily determined through the first posture information of the electronic device, and audio recording is performed according to the first sound source direction, so that the first audio is preliminarily obtained. However, during actual shooting, the way the user holds the electronic device, such as two-handed landscape, one-handed landscape, or one-handed portrait, may interfere to some extent with the judgment of sound directivity; to ensure the accuracy of the recording direction and improve the quality of the recorded audio, this interference needs to be eliminated. Therefore, the preliminarily obtained first audio is corrected through the posture information of the user holding the electronic device, improving the accuracy of sound source direction determination and the efficiency of audio recording.
In addition, by using a portable electronic device, the user avoids the problems that existing professional 3D audio production kits are complex to operate and dummy-head recording equipment is hard to carry, and can record 3D surround sound or panoramic sound anytime and anywhere, which improves the flexibility of such recording and the user's experience of recording audio with 3D surround sound or panoramic sound properties.
It should be noted that the audio processing method provided in the embodiments of the present application may be applied not only to the above scenario in which the user records audio and video, but also to a scenario in which the user records only a target audio with 3D surround sound or panoramic sound properties; in general, it may be applied to any scenario in which such target audio is recorded by an electronic device.
According to the application scenario, the following describes in detail the audio processing method provided by the embodiment of the present application with reference to fig. 3.
Fig. 3 is a flowchart of an audio processing method according to an embodiment of the present application.
As shown in fig. 3, the audio processing method may be applied to the electronic devices shown in fig. 1 and fig. 2, and based on this, may specifically include the following steps:
Step 310: receive a first input for recording audio.
Step 320: in response to the first input, determine first sound source direction information of the recorded audio according to first posture information of the electronic device.
Step 330: perform audio recording according to the first sound source direction information to obtain a first audio.
Step 340: correct the first audio information through second posture information of the user holding the electronic device to obtain a target audio.
Therefore, through the first attitude information of the electronic equipment, the first sound source direction information of the recorded audio is preliminarily determined, the audio recording is carried out according to the first sound source direction, and the first audio is preliminarily obtained. However, in the actual shooting process, the state of the user holding the electronic device, such as holding with both hands in a landscape mode, holding with one hand in a landscape mode, or holding with one hand in a portrait mode, may interfere with the judgment of the sound directivity to some extent, and in order to ensure the accuracy of the recording direction and to improve the quality of the recorded audio, the interference needs to be eliminated. Therefore, the preliminarily obtained first audio is corrected through the posture information of the electronic equipment held by the user, and the accuracy of determining the direction of the sound source and the efficiency of recording the audio are improved. In addition, a user can record target audio with 3D surround sound or panoramic sound at any time and any place by using the portable electronic equipment, so that the flexibility of recording the audio at present is improved.
The above steps are described in detail below, specifically as follows:
First, as shown in fig. 4, the human ear perceives 3D surround sound through the binaural effect (sound source 401a and sound source 401b reach the two ears with different time/phase differences and loudness differences, which allows left and right to be distinguished) and the pinna effect (as shown in fig. 5, sound from sources 501a and 501b at different positions takes different paths into the ear canal after being reflected and diffracted by the shape of the pinna, producing a filtering effect that allows front/back and up/down to be distinguished); the whole human head 502 thus constitutes a filtering system for sound. Based on this, the Head-Related Transfer Function (HRTF) system was proposed: sound sources in different directions are filtered by the HRTF system and finally mixed at the ear canal to form stereo sound, including 3D surround sound or panoramic sound. As shown in fig. 6, the HRTF system simulates the filtering effect of the human head; as long as the direction of a sound source is determined, feeding it into the HRTF system outputs directional sound that forms 3D surround sound or panoramic sound.
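As a concrete illustration of the HRTF idea, the sketch below renders a mono source binaurally by convolving it with a left/right head-related impulse response (HRIR) pair, the time-domain counterpart of the HRTF, and mixes several rendered sources. It assumes HRIRs for the measured direction are available (for example from a public measurement set); the patent itself describes the HRTF system only at the level of fig. 6, so this is an illustrative sketch rather than the patent's implementation.

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono sound source with the HRIR pair for its direction.

    Convolving with the left/right head-related impulse responses reproduces
    the interaural time and level differences (binaural effect) and the pinna
    filtering that let a listener place the source in 3D space.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=0)  # shape: (2, n_samples)

def mix_sources(rendered_sources):
    """Sum several binaurally rendered sources into one stereo stream."""
    n = max(s.shape[1] for s in rendered_sources)
    out = np.zeros((2, n))
    for s in rendered_sources:
        out[:, : s.shape[1]] += s
    return out
```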
Based on this, the sound source direction can be determined by the following steps 320 to 330.
Referring to step 320, in one or more optional embodiments, step 320 may specifically include:
in the case that the first posture information includes three-axis rotational motion information of the electronic device, determining target shooting direction information of the electronic device according to the three-axis rotational motion information;
determining, according to the target shooting direction information, position information of a plurality of sound receiving devices in the electronic device corresponding to the target shooting direction information;
and determining first sound source direction information of the recorded audio according to the relative position information of at least two of the plurality of sound receiving devices.
It should be noted that, in the embodiments of the present application, at least three of the plurality of sound receiving devices are not on the same straight line, and at least four of them are not in the same plane.
Exemplarily, as shown in fig. 7, the electronic device is provided with at least four microphones, i.e. a microphone 701, a microphone 702, a microphone 703 and a microphone 704, and the layout of the four microphones requires: the sound inlet holes of any three of the four microphones are not on the same straight line, such as the microphone 701, the microphone 702 and the microphone 703, and the openings of the four microphones are not on the same plane.
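The fig. 7 layout rule can be expressed as a small geometric check: no three sound inlets collinear (cross products of edge vectors do not vanish) and the inlets not all coplanar (the centred coordinate matrix has rank 3). The helper below is an illustrative addition, not part of the patent.

```python
import numpy as np

def layout_is_valid(mic_positions):
    """Check the fig. 7 layout rule: no three sound inlets on one straight line,
    and the inlets not all in one plane.  `mic_positions` is an (N, 3) array."""
    p = np.asarray(mic_positions, dtype=float)
    n = len(p)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                # three points are collinear when the cross product of two edges vanishes
                if np.allclose(np.cross(p[j] - p[i], p[k] - p[i]), 0.0):
                    return False
    # the whole set is coplanar when the centred coordinates have rank < 3
    return n >= 4 and np.linalg.matrix_rank(p - p.mean(axis=0)) == 3

# Example: a tetrahedron-like layout passes, a flat rectangle does not.
print(layout_is_valid([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0.2, 0.3, 0.5]]))  # True
print(layout_is_valid([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]]))        # False
```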
Based on this, as shown in fig. 8, when any two microphones receive the sound wave emitted by a sound source 801 at any point in space, a time difference arises. Using l = v × t (distance difference = speed of sound × time difference), the difference between the distances from the sound source to the two microphones can be calculated, and four such microphones can determine a unique sound source position: three microphones define a plane, sources at mirror-image positions on the two sides of that plane cannot be distinguished, and a fourth microphone is needed for auxiliary confirmation, as shown in fig. 9. The angles α, β, γ of the sound source 801 in a coordinate system centred on the electronic device can then be calculated. Since the orientation of the electronic device differs from shot to shot, the first posture information may first be acquired with the acceleration sensor. Based on the first posture information, the target shooting direction information of the electronic device is judged, i.e., its current orientation, such as landscape with the rear camera facing left, landscape with the rear camera facing right, or portrait. Different target shooting direction information corresponds to different microphone position information, for example microphone array parameters 1-4 of the four microphones for landscape with the rear camera facing left, microphone array parameters 5-8 for landscape with the rear camera facing right, and microphone array parameters 9-12 for portrait.
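The distance-difference relation l = v × t can be turned into a small direction-of-arrival estimator. The sketch below is a simplified far-field stand-in for the procedure described above: it estimates pairwise delays by cross-correlation and solves a least-squares system for the source direction, returning azimuth and elevation rather than the α, β, γ angles of fig. 9. The sampling rate, microphone coordinates, and the far-field assumption are inputs and assumptions of this sketch, not values taken from the patent.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at roughly room temperature

def pairwise_delay(sig_a, sig_b, fs):
    """Arrival-time difference between two microphone signals, found at the peak
    of their cross-correlation; positive when sig_a arrives later than sig_b."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_b) - 1)
    return lag / fs  # seconds

def estimate_direction(mic_positions, signals, fs, c=SPEED_OF_SOUND):
    """Far-field direction estimate from four (or more) non-coplanar microphones.

    For a distant source with unit direction u, the extra path length between
    microphones i and j is (p_j - p_i) . u, so the measured delay satisfies
    c * tau_ij = (p_j - p_i) . u.  Stacking every pair gives a least-squares
    problem for u, from which azimuth and elevation follow.
    """
    mic_positions = np.asarray(mic_positions, dtype=float)
    rows, rhs = [], []
    n = len(mic_positions)
    for i in range(n):
        for j in range(i + 1, n):
            tau = pairwise_delay(signals[i], signals[j], fs)
            rows.append(mic_positions[j] - mic_positions[i])
            rhs.append(tau * c)
    u, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(rhs), rcond=None)
    u = u / np.linalg.norm(u)  # unit vector pointing from the array toward the source
    azimuth = np.degrees(np.arctan2(u[1], u[0]))
    elevation = np.degrees(np.arcsin(np.clip(u[2], -1.0, 1.0)))
    return u, azimuth, elevation
```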
Based on this, step 330 is involved, and in one or more alternative embodiments, the audio recording may be performed according to the first sound source direction determined in the above manner, resulting in a first audio.
However, during actual shooting, the way the user holds the electronic device, such as two-handed landscape, one-handed landscape, or one-handed portrait, may interfere to some extent with the determination of sound directivity; to ensure the accuracy of the recording direction and improve the quality of the recorded audio, this interference needs to be eliminated.
Thus, referring to step 340, in one or more alternative embodiments, step 340 may specifically include:
determining a target audio adjustment model corresponding to the second posture information according to the association relationship information between posture information of the user holding the electronic device and audio adjustment models;
and adjusting the first sound source direction information according to the audio adjustment parameters in the target audio adjustment model to obtain the target audio.
For example, during actual shooting the user holds the electronic device in some posture, such as two-handed portrait, one-handed landscape, or one-handed portrait. As shown in fig. 10, taking the user holding the electronic device with both hands as an example, part of the external sound reaching the microphones enters only after being reflected by the hands, as indicated by the arrows in fig. 11. This interferes with the judgment of sound directivity, and to ensure the accuracy of sound source direction determination the interference needs to be eliminated. Because different people have different hand sizes, holding postures differ slightly from person to person. Therefore, a neural network may be trained on a large amount of sample data (i.e., third posture information of the user holding the electronic device and the audio information corresponding to each kind of third posture information) to obtain an audio adjustment model. The first sound source direction information can then be adjusted through the audio adjustment parameters in the audio adjustment model to obtain second sound source direction information. A corresponding audio adjustment model can be obtained for each holding method, so that different holding methods correspond to different audio adjustment models; the specific procedure for determining the second sound source direction information is as follows.
Based on this, before step 340, a step of determining association relationship information between the posture information of the user holding the electronic device and the audio adjustment model may also be included, and specifically, the step may include:
acquiring sample data, where the sample data includes at least two kinds of third posture information of users holding the electronic device in a historical time period and the audio information corresponding to each kind of third posture information;
inputting each kind of third posture information and its corresponding audio information into an audio adjustment model, and training the audio adjustment model until a preset training condition is met, to obtain a trained first audio adjustment model;
and associating each first audio adjustment model with the third posture information corresponding to it, to obtain the association relationship information.
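The training procedure above can be pictured with a short sketch. Nothing below comes from the patent: the network shape, the feature dimensionality, the MSE objective, and the idea of supervising with known correction parameters are assumptions made only so that the per-posture training loop and the final association step are concrete.

```python
import torch
from torch import nn

class AudioAdjustModel(nn.Module):
    """Tiny stand-in for the audio adjustment model: maps audio features recorded
    under one holding posture to a vector of audio adjustment parameters."""
    def __init__(self, n_audio_features=64, n_adjust_params=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_audio_features, 128),
            nn.ReLU(),
            nn.Linear(128, n_adjust_params),
        )

    def forward(self, audio_features):
        return self.net(audio_features)

def train_for_posture(audio_features, target_params, epochs=100):
    """Train one model on samples recorded in a single third-posture state.

    `audio_features` and `target_params` are tensors of shape (N, F) and (N, P);
    the targets stand in for whatever ground-truth correction the training data
    provides, which the patent does not specify.
    """
    model = AudioAdjustModel(audio_features.shape[1], target_params.shape[1])
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(audio_features), target_params)
        loss.backward()
        optimizer.step()
    return model

# Association step: one trained first audio adjustment model per holding posture.
# association = {
#     "two_hand_landscape": train_for_posture(x_two_hand, y_two_hand),
#     "one_hand_portrait":  train_for_posture(x_one_hand, y_one_hand),
# }
```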
Then, in one or more optional embodiments, the step 340 may specifically include:
inputting the first audio and the audio adjustment parameters in the target audio adjustment model into a head-related transfer function model to obtain a target audio corresponding to the second sound source direction information, where the target audio is audio with 3D surround sound or panoramic sound properties.
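Tying the pieces together, a hedged sketch of this final step might look as follows. `apply_adjustment` is a placeholder interpretation of the audio adjustment parameters (the patent does not define their format), and `render_binaural` is the HRIR-convolution helper sketched earlier; the HRIR pair is assumed to be looked up by the caller for the corrected direction.

```python
import numpy as np

def apply_adjustment(first_direction, adjust_params):
    """Hypothetical correction: treat the first three adjustment parameters as a
    small offset to the preliminary direction vector, then renormalize."""
    corrected = np.asarray(first_direction, dtype=float) + np.asarray(adjust_params[:3], dtype=float)
    return corrected / np.linalg.norm(corrected)

def produce_target_audio(first_audio, first_direction, adjust_params, hrir_left, hrir_right):
    """Correct the direction, then render the recording with the HRIR pair chosen
    for the corrected (second) direction, giving the 3D surround / panoramic
    target audio.  `hrir_left`/`hrir_right` are assumed to match that direction."""
    second_direction = apply_adjustment(first_direction, adjust_params)
    target = render_binaural(first_audio, hrir_left, hrir_right)  # from the earlier sketch
    return second_direction, target
```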
In the embodiment of the application, first sound source direction information of the recorded audio is determined through a plurality of microphones arranged in the electronic equipment and first posture information of the electronic equipment, and the audio is recorded according to the first sound source direction information to obtain the first audio preliminarily. However, in the actual shooting process, the state of the user holding the electronic device, such as holding with both hands in a landscape mode, holding with one hand in a landscape mode, or holding with one hand in a portrait mode, may interfere with the determination of the sound directivity to some extent, and in order to ensure the accuracy of the direction, this interference needs to be eliminated. Therefore, the preliminarily obtained first audio is corrected through the posture information of the electronic equipment held by the user, and the accuracy of determining the direction of the sound source and the efficiency of recording the audio are improved. In addition, the user can record 3D surround sound or panoramic sound anytime and anywhere by using the portable electronic equipment, so that the flexibility of recording audio at present is improved.
It should be noted that the embodiments of the present application are described taking a single sound source as an example; there may of course be a plurality of sound sources. In that case, a plurality of pieces of second sound source direction information can similarly be determined according to the audio processing method provided herein, audio recording is performed according to the plurality of second sound source directions to obtain a plurality of initial first audios, and each first audio together with the audio adjustment parameters in its corresponding target audio adjustment model is input into the corresponding head-related transfer function model to obtain the target audio with 3D surround sound or panoramic sound properties.
It should be noted that, in the audio processing method provided in the embodiment of the present application, the execution subject may be an audio processing apparatus, or a control module for executing the audio processing method in the audio processing apparatus. In the embodiment of the present application, an audio processing apparatus is taken as an example to execute an audio processing method, and an audio processing apparatus provided in the embodiment of the present application is described.
Based on the same inventive concept, the application also provides an audio processing device. The details are described with reference to fig. 12.
Fig. 12 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present application.
As shown in fig. 12, the audio processing apparatus 120 is applied to the electronic devices shown in fig. 1-2, and may specifically include:
a receiving module 1201, configured to receive a first input for recording audio;
a determining module 1202, configured to determine, in response to a first input, first sound source direction information of a recorded audio according to first posture information of an electronic device;
a recording module 1203, configured to record an audio according to the first sound source direction information to obtain a first audio;
the correcting module 1204 is configured to correct the first audio information through the second posture information of the electronic device held by the user, so as to obtain a target audio.
The following describes the audio processing apparatus 120 in detail, specifically as follows:
in one or more possible embodiments, the determining module 1202 may be specifically configured to, in a case that the first posture information includes three-axis rotational motion information of the electronic device, determine target shooting direction information of the electronic device according to the three-axis rotational motion information;
determine, according to the target shooting direction information, position information of a plurality of sound receiving devices in the electronic device corresponding to the target shooting direction information;
and determine first sound source direction information of the recorded audio according to the relative position information of at least two of the plurality of sound receiving devices.
It should be noted that at least three of the plurality of sound receiving devices are not on the same straight line, and at least four of them are not in the same plane.
Based on this, the correcting module 1204 is specifically configured to determine, according to the association relationship information between posture information of the user holding the electronic device and audio adjustment models, a target audio adjustment model corresponding to the second posture information;
and adjusting the first sound source direction information according to the audio adjustment parameters in the target audio adjustment model to obtain the target audio.
In one or more possible embodiments, the audio processing device 120 further includes: the system comprises an acquisition module, a training module and an association module; wherein,
the acquisition module is used to acquire sample data, where the sample data includes at least two kinds of third posture information of users holding the electronic device in a historical time period and the audio information corresponding to each kind of third posture information;
the training module is used to input each kind of third posture information and its corresponding audio information into an audio adjustment model and train the audio adjustment model until a preset training condition is met, to obtain a trained first audio adjustment model;
and the association module is used to associate each first audio adjustment model with the third posture information corresponding to it, to obtain the association relationship information.
In one or more possible embodiments, the correcting module 1204 is specifically configured to input the first audio and the audio adjustment parameters in the target audio adjustment model into the head-related transfer function model to obtain a target audio corresponding to the second sound source direction information, where the target audio is audio with 3D surround sound or panoramic sound properties.
The audio processing apparatus in the embodiment of the present application may be an apparatus, or may be a component, an integrated circuit, or a chip in an electronic device. The apparatus may be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a Personal Digital Assistant (PDA), and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine, or a self-service machine; the embodiments of the present application are not particularly limited in this respect.
The audio processing apparatus in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system; the embodiments of the present application are not specifically limited in this respect.
The audio processing apparatus provided in the embodiment of the present application can implement each process implemented by the method embodiments in fig. 1 to fig. 11, and is not described herein again to avoid repetition.
In the embodiment of the application, first sound source direction information of the recorded audio is preliminarily determined through first attitude information of the electronic equipment, and the audio is recorded according to the first sound source direction, so that the first audio is preliminarily obtained. However, in the actual shooting process, the state of the user holding the electronic device, such as holding with both hands in a landscape mode, holding with one hand in a landscape mode, or holding with one hand in a portrait mode, may interfere with the judgment of the sound directivity to some extent, and in order to ensure the accuracy of the recording direction and to improve the quality of the recorded audio, the interference needs to be eliminated. Therefore, the preliminarily obtained first audio is corrected through the posture information of the electronic equipment held by the user, and the accuracy of determining the direction of the sound source and the efficiency of recording the audio are improved. In addition, a user can record target audio with 3D surround sound or panoramic sound at any time and any place by using the portable electronic equipment, so that the flexibility of recording the audio at present is improved.
Optionally, as shown in fig. 13, an electronic device 130 is further provided in this embodiment of the present application, and includes a processor 1301, a memory 1302, and a program or an instruction stored in the memory 1302 and capable of running on the processor 1301, where the program or the instruction is executed by the processor 1301 to implement each process of the foregoing audio processing method embodiment, and can achieve the same technical effect, and details are not repeated here to avoid repetition.
It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic device and the non-mobile electronic device described above.
Fig. 14 is a schematic hardware structure diagram of an electronic device according to an embodiment of the present application.
The electronic device 1400 includes, but is not limited to: a radio frequency unit 1401, a network module 1402, an audio output unit 1403, an input unit 1404, a sensor 1405, a display unit 1406, a user input unit 1407, an interface unit 1408, a memory 1409, a processor 1410, and a sound receiving device 1411.
Those skilled in the art will appreciate that the electronic device 1400 may further include a power source (e.g., a battery) for supplying power to the various components, and the power source may be logically connected to the processor 1410 via a power management system, so that charging, discharging, and power-consumption management are implemented through the power management system. The electronic device structure shown in fig. 14 does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than those shown, combine some components, or use a different arrangement of components, which is not repeated here.
The user input unit 1407 in this embodiment of the present application is configured to receive a first input for recording audio. The processor 1410 is configured to: in response to the first input, determine first sound source direction information of the recorded audio according to first posture information of the electronic device; perform audio recording according to the first sound source direction information to obtain a first audio; and correct the first audio information through second posture information of the user holding the electronic device to obtain a target audio.
In one or more possible embodiments, the processor 1410 may be specifically configured to, in a case that the first posture information includes three-axis rotational motion information of the electronic device, determine target shooting direction information of the electronic device according to the three-axis rotational motion information of the electronic device;
determine, according to the target shooting direction information, position information of a plurality of sound receiving devices in the electronic device corresponding to the target shooting direction information;
and determine first sound source direction information of the recorded audio according to the relative position information of at least two of the plurality of sound receiving devices.
It should be noted that, in the embodiments of the present application, at least three of the plurality of sound receiving devices are not on the same straight line, and at least four of them are not in the same plane.
Based on this, the processor 1410 is specifically configured to determine, according to the association relationship information between posture information of the user holding the electronic device and audio adjustment models, a target audio adjustment model corresponding to the second posture information, and to adjust the first sound source direction information according to the audio adjustment parameters in the target audio adjustment model to obtain the target audio.
In one or more further possible embodiments, the processor 1410 may be further configured to: acquire sample data, where the sample data includes at least two kinds of third posture information of users holding the electronic device in a historical time period and the audio information corresponding to each kind of third posture information; input each kind of third posture information and its corresponding audio information into an audio adjustment model and train the audio adjustment model until a preset training condition is met, to obtain a trained first audio adjustment model; and associate each first audio adjustment model with the third posture information corresponding to it, to obtain the association relationship information.
In one or more possible embodiments, the processor 1410 is specifically configured to input the first audio and the audio adjustment parameters in the target audio adjustment model into the head-related transfer function model to obtain a target audio corresponding to the second sound source direction information, where the target audio is audio with 3D surround sound or panoramic sound properties.
It is to be appreciated that the input unit 1404 may include a graphics processing unit (GPU) 14041 and a microphone 14042; the graphics processor 14041 processes image data of still images or video obtained by an image capture device (e.g., a camera) in a video capture mode or an image capture mode. The display unit 1406 may include a display panel 14061, and the display panel 14061 may be configured in the form of a liquid crystal display, an organic light-emitting diode display, or the like. The user input unit 1407 includes a touch panel 14071 and other input devices 14072. The touch panel 14071 is also referred to as a touch screen and may include a touch detection device and a touch controller. Other input devices 14072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not described in detail here. The memory 1409 may be used to store software programs and various data, including but not limited to application programs and an operating system. The processor 1410 may integrate an application processor, which mainly handles the operating system, user interfaces, and application programs, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor may also not be integrated into the processor 1410.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the embodiment of the audio processing method, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device in the above embodiment. The readable storage medium includes a computer-readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
In addition, an embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to execute a program or an instruction to implement each process of the above-mentioned audio processing method embodiment, and the same technical effect can be achieved.
It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as system-on-chip, system-on-chip or system-on-chip, etc.
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved, e.g., the methods described may be performed in an order different than that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a computer software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, or a network device) to execute the method of the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (14)

1. An audio processing method, comprising:
receiving a first input for recording audio;
responding to the first input, and determining first sound source direction information of the recorded audio according to first posture information of the electronic equipment;
recording audio according to the first sound source direction information to obtain a first audio;
and correcting the first audio information through second posture information of the electronic equipment held by the user to obtain a target audio.
2. The method of claim 1, wherein the first pose information comprises three-axis rotational motion information of the electronic device; the determining of the first sound source direction information of the recorded audio according to the first posture information of the electronic device includes:
determining target shooting direction information of the electronic equipment according to the three-axis rotation motion information;
determining, according to the target shooting direction information, position information of a plurality of sound receiving devices in the electronic equipment corresponding to the target shooting direction information;
and determining first sound source direction information of the recorded audio according to the relative position information of at least two of the plurality of sound receiving devices.
3. The method of claim 2, wherein at least three of the plurality of sound receiving devices are not collinear and at least four of the plurality of sound receiving devices are not coplanar.
4. The method of claim 1, wherein the correcting the first audio information by the second gesture information of the user holding the electronic device to obtain the target audio comprises:
determining a target audio adjustment model corresponding to the second posture information according to association relationship information between posture information of the user holding the electronic equipment and the audio adjustment model;
and adjusting the first sound source direction information according to the audio adjustment parameters in the target audio adjustment model to obtain the target audio.
5. The method of claim 4, wherein prior to determining the target audio adjustment model corresponding to the second pose information, the method further comprises:
acquiring sample data, wherein the sample data comprises at least two types of third posture information of users holding the electronic equipment in a historical time period and audio information corresponding to each type of third posture information;
inputting each type of third posture information and the audio information corresponding to it into an audio adjustment model respectively, and training the audio adjustment model until a preset training condition is met to obtain a trained first audio adjustment model;
and associating each first audio adjustment model with the third posture information corresponding to it to obtain the association relationship information.
6. The method of claim 4, wherein the adjusting the first sound source direction information according to the audio adjustment parameter in the target audio adjustment model to obtain the target audio comprises:
and inputting the first audio and the audio adjusting parameters in the target audio adjusting model into a head-related transfer function model to obtain a target audio corresponding to the second sound source direction information, wherein the target audio is an audio with 3D surround sound or panoramic sound properties.
7. An audio processing apparatus, comprising:
the receiving module is used for receiving a first input for recording audio;
the determining module is used for responding to the first input and determining first sound source direction information of the recorded audio according to first posture information of the electronic equipment;
the recording module is used for recording audio according to the first sound source direction information to obtain a first audio;
and the correction module is used for correcting the first audio information through second posture information of the electronic equipment held by a user to obtain a target audio.
8. The apparatus of claim 7, wherein the first pose information comprises three-axis rotational motion information of the electronic device; the determining module is specifically configured to determine target shooting direction information of the electronic device according to the three-axis rotational motion information;
determining, according to the target shooting direction information, position information of a plurality of sound receiving devices in the electronic equipment corresponding to the target shooting direction information;
and determining first sound source direction information of the recorded audio according to the relative position information of at least two of the plurality of sound receiving devices.
9. The apparatus of claim 8, wherein at least three of the plurality of sound receiving devices are not on a same straight line and at least four of the plurality of sound receiving devices are not in a same plane.
10. The apparatus according to claim 7, wherein the correction module is specifically configured to determine, according to association relationship information between pose information of a user holding the electronic device and an audio adjustment model, a target audio adjustment model corresponding to the second pose information;
and adjusting the first sound source direction information according to the audio adjustment parameters in the target audio adjustment model to obtain the target audio.
11. The apparatus of claim 10, wherein the audio processing apparatus further comprises: the system comprises an acquisition module, a training module and an association module; wherein,
the acquisition module is configured to acquire sample data, wherein the sample data comprises third posture information of at least two users holding the electronic equipment in a historical time period and audio information corresponding to the third posture information of each user holding the electronic equipment;
the training module is configured to input, for each of the users, the third posture information of the user holding the electronic equipment and the audio information corresponding to that third posture information into an audio adjustment model, and to train the audio adjustment model until a preset training condition is met, to obtain a trained first audio adjustment model;
and the association module is configured to associate each first audio adjustment model with the third posture information of the corresponding user holding the electronic equipment, to obtain the association relationship information.
12. The apparatus according to claim 10, wherein the correction module is specifically configured to input the first audio and the audio adjustment parameters in the target audio adjustment model into a head-related transfer function model to obtain a target audio corresponding to the second sound source direction information, wherein the target audio is audio having 3D surround-sound or panoramic-sound properties.
13. An electronic device, comprising: a processor, a memory and a program or instructions stored on the memory and executable on the processor, which when executed by the processor, implement the steps of the audio processing method of any of claims 1-6.
14. A readable storage medium, characterized in that it stores thereon a program or instructions which, when executed by a processor, implement the steps of the audio processing method according to any one of claims 1 to 6.
CN202110735665.7A 2021-06-30 2021-06-30 Audio processing method, device, equipment and storage medium Active CN113472943B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110735665.7A CN113472943B (en) 2021-06-30 2021-06-30 Audio processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113472943A true CN113472943A (en) 2021-10-01
CN113472943B CN113472943B (en) 2022-12-09

Family

ID=77876443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110735665.7A Active CN113472943B (en) 2021-06-30 2021-06-30 Audio processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113472943B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024026994A1 (en) * 2022-08-05 2024-02-08 瑞声光电科技(常州)有限公司 Mobile terminal sound processing method and system based on detection of handheld state and orientation

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100100209A1 (en) * 2008-10-17 2010-04-22 Sanyo Electric Co., Ltd. Sound-recording apparatus
CN107809596A (en) * 2017-11-15 2018-03-16 重庆科技学院 Video conference tracking system and method based on microphone array
CN108924331A (en) * 2018-07-24 2018-11-30 Oppo(重庆)智能科技有限公司 Voice pick-up method and Related product
CN110505403A (en) * 2019-08-20 2019-11-26 维沃移动通信有限公司 A kind of video record processing method and device
CN111131646A (en) * 2019-12-30 2020-05-08 Oppo广东移动通信有限公司 Call noise reduction method and device, storage medium and electronic device
CN111251307A (en) * 2020-03-24 2020-06-09 北京海益同展信息科技有限公司 Voice acquisition method and device applied to robot and robot
CN111654572A (en) * 2020-05-27 2020-09-11 维沃移动通信有限公司 Audio processing method and device, electronic equipment and storage medium
CN111768444A (en) * 2020-04-30 2020-10-13 苏州思必驰信息科技有限公司 Sound source based information processing method and device and computer readable medium
CN111916102A (en) * 2020-07-31 2020-11-10 维沃移动通信有限公司 Recording method and recording device of electronic equipment
CN112165590A (en) * 2020-09-30 2021-01-01 联想(北京)有限公司 Video recording implementation method and device and electronic equipment
CN112185406A (en) * 2020-09-18 2021-01-05 北京大米科技有限公司 Sound processing method, sound processing device, electronic equipment and readable storage medium
CN112637529A (en) * 2020-12-18 2021-04-09 Oppo广东移动通信有限公司 Video processing method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN113472943B (en) 2022-12-09

Similar Documents

Publication Publication Date Title
US11393154B2 (en) Hair rendering method, device, electronic apparatus, and storage medium
US11436779B2 (en) Image processing method, electronic device, and storage medium
JP7230055B2 (en) Application program display adaptation method and device, terminal, storage medium, and computer program
CN110427110B (en) Live broadcast method and device and live broadcast server
WO2021077923A1 (en) Method of controlling display device, and portable device
CN110740259A (en) Video processing method and electronic equipment
US20220164159A1 (en) Method for playing audio, terminal and computer-readable storage medium
US11776209B2 (en) Image processing method and apparatus, electronic device, and storage medium
CN109558837B (en) Face key point detection method, device and storage medium
CN110970003A (en) Screen brightness adjusting method and device, electronic equipment and storage medium
CN109522863B (en) Ear key point detection method and device and storage medium
WO2022052620A1 (en) Image generation method and electronic device
CN110392375B (en) WiFi network channel modification method, terminal, server and storage medium
WO2022142295A1 (en) Bullet comment display method and electronic device
CN111833243B (en) Data display method, mobile terminal and storage medium
CN109829982B (en) Model matching method, device, terminal equipment and storage medium
US20240094970A1 (en) Electronic system for producing a coordinated output using wireless localization of multiple portable electronic devices
CN112614500B (en) Echo cancellation method, device, equipment and computer storage medium
CN111327953A (en) Live broadcast voting method and device and storage medium
CN111539795A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
WO2022199102A1 (en) Image processing method and device
CN113472943B (en) Audio processing method, device, equipment and storage medium
CN111464746A (en) Photographing method and electronic equipment
CN109767482B (en) Image processing method, device, electronic equipment and storage medium
CN111932604A (en) Method and device for measuring human ear characteristic distance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant