CN108364648B

CN108364648B - Method and device for acquiring audio information

Info

Publication number: CN108364648B
Application number: CN201810141926.0A
Authority: CN
Inventors: 耿雷
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2018-02-11
Filing date: 2018-02-11
Publication date: 2021-08-03
Anticipated expiration: 2038-02-11
Also published as: CN108364648A

Abstract

The embodiment of the application discloses a method and a device for acquiring audio information. One embodiment of the method comprises: acquiring audio to be processed in real time, and performing audio identification on the audio to be processed; responding to the fact that a wake-up signal exists in the audio to be processed, acquiring first direction information of the wake-up signal, and acquiring audio information of a sound source corresponding to the first direction information, wherein the first direction information is used for representing the direction of the sound source sending the wake-up signal; and performing data processing on the audio information in response to that second direction information of the audio information is the same as the first direction information of the wake-up signal, wherein the second direction information is used for representing the direction in which a sound source emitting the audio information is located. This embodiment improves the efficiency of acquiring audio information of a sound source.

Description

Method and device for acquiring audio information

Technical Field

The embodiment of the application relates to the technical field of computers, in particular to the technical field of audio processing, and particularly relates to a method and a device for acquiring audio information.

Background

With the development of science and technology, intelligent equipment provides various conveniences for the work and life of users. Through the intelligent equipment, the user can acquire the latest news information, perform instant chat with other people, search professional data and the like.

Typically, the operation of the smart device needs to be done manually. To further improve the work efficiency of the user, some smart devices may support voice interaction between the user and the smart device. The intelligent device supporting voice interaction can acquire the audio information of a user and acquire related instructions from the audio information to realize corresponding operation.

Disclosure of Invention

The embodiment of the application aims to provide a method and a device for acquiring audio information.

In a first aspect, an embodiment of the present application provides a method for acquiring audio information, where the method includes: acquiring audio to be processed in real time, and performing audio identification on the audio to be processed; responding to the fact that a wake-up signal exists in the audio to be processed, acquiring first direction information of the wake-up signal, and acquiring audio information of a sound source corresponding to the first direction information, wherein the first direction information is used for representing the direction of the sound source sending the wake-up signal; and performing data processing on the audio information in response to that second direction information of the audio information is the same as the first direction information of the wake-up signal, wherein the second direction information is used for representing the direction in which a sound source emitting the audio information is located.

In some embodiments, the above method further comprises: and selecting one microphone from the microphone array as a wake-up signal monitoring microphone, wherein the wake-up signal monitoring microphone is used for collecting audio to be processed.

In some embodiments, the obtaining the first direction information of the wake-up signal includes: acquiring to-be-processed audio collected by each microphone in a microphone array, and determining the marking time of the wake-up signal in each to-be-processed audio to obtain a marking time set; sequencing the marking time in the marking time set according to the time sequence to obtain a marking time sequence; and setting first direction information of the wake-up signal according to the spatial direction of the microphone corresponding to the previously set marking time in the marking time sequence, wherein the spatial direction is used for representing the direction of the microphone for collecting audio.

In some embodiments, the setting of the first direction information of the wake-up signal according to the spatial direction of the microphone corresponding to the previously set marking time in the marking time sequence includes: querying a microphone spatial direction table to obtain a spatial angle corresponding to the spatial direction of each microphone, wherein the microphone spatial direction table is used for representing the correspondence between the spatial direction of a microphone and the spatial angle at which audio is collected at the spatial position where the microphone is located; and determining the angle range formed by the spatial angles corresponding to the spatial directions of the microphones corresponding to the previously set marking time in the marking time sequence, and setting the first direction information of the wake-up signal according to the angle range.

In some embodiments, the above method further comprises: and responding to the fact that the second direction information of the audio information is different from the first direction information of the wake-up signal, and when the wake-up signal is detected to exist in the audio information, reacquiring the first direction information of the wake-up signal.

In a second aspect, an embodiment of the present application provides an apparatus for acquiring audio information, where the apparatus includes: the audio recognition unit is used for acquiring audio to be processed in real time and performing audio recognition on the audio to be processed; the first direction information acquisition unit is used for acquiring first direction information of the wake-up signal in response to the fact that the wake-up signal exists in the audio to be processed and acquiring audio information of a sound source corresponding to the first direction information, wherein the first direction information is used for representing the direction of the sound source sending the wake-up signal; and the audio information acquisition unit is used for responding that second direction information of the audio information is the same as the first direction information of the wake-up signal and carrying out data processing on the audio information, wherein the second direction information is used for representing the direction of a sound source which emits the audio information.

In some embodiments, the above apparatus further comprises: and the microphone setting unit is used for selecting one microphone from the microphone array as a wake-up signal monitoring microphone, and the wake-up signal monitoring microphone is used for acquiring audio to be processed.

In some embodiments, the first direction information acquiring unit includes: a to-be-processed audio acquisition subunit, configured to acquire to-be-processed audio acquired by each microphone in the microphone array, and determine a marking time of the wake-up signal in each to-be-processed audio to obtain a marking time set; the marking time sequence acquiring subunit is used for sequencing the marking times in the marking time set according to the time sequence to obtain a marking time sequence; and the first direction information setting subunit is used for setting the first direction information of the wake-up signal according to the spatial direction of the microphone corresponding to the previously set marking time in the marking time sequence, wherein the spatial direction is used for representing the direction of the microphone for collecting the audio.

In some embodiments, the first direction information setting subunit includes: a spatial angle query module, configured to query a microphone spatial direction table to obtain a spatial angle corresponding to the spatial direction of each microphone, wherein the microphone spatial direction table is used for representing the correspondence between the spatial direction of a microphone and the spatial angle at which audio is collected at the spatial position of the microphone; and a first direction information setting module, configured to set the first direction information of the wake-up signal according to an angle range formed by the spatial angles corresponding to the spatial directions of the microphones corresponding to the previously set marking time in the marking time sequence.

In some embodiments, the above apparatus further comprises: a first direction information updating unit, configured to, in response to the second direction information of the audio information being different from the first direction information of the wake-up signal, reacquire the first direction information of the wake-up signal when a wake-up signal is detected in the audio information.

In a third aspect, an embodiment of the present application provides a server, including: one or more processors; a memory for storing one or more programs; the microphone array is used for collecting audio information of a sound source; the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method for obtaining audio information of the first aspect.

In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the method for acquiring audio information of the first aspect.

According to the method and the device for acquiring the audio information, the first direction information of the wake-up signal is acquired after the wake-up signal is detected from the audio to be processed; and then, performing data processing on the audio information when the second direction information of the audio information is the same as the first direction information of the wake-up signal. According to the method, the continuous collection of the audio information of the sound source can be realized only by detecting the wake-up signal once, so that frequent detection of the wake-up signal and frequent detection of the first direction information after the wake-up signal is detected are avoided, and the efficiency of obtaining the audio information of the sound source is improved.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;

FIG. 2 is a flow diagram of one embodiment of a method for obtaining audio information according to the present application;

FIG. 3 is a schematic illustration of an application scenario of a method for obtaining audio information according to the present application;

FIG. 4 is a schematic block diagram illustrating one embodiment of an apparatus for obtaining audio information according to the present application;

FIG. 5 is a schematic diagram of a system architecture of a server suitable for use in implementing embodiments of the present application.

Detailed Description

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method for acquiring audio information or the apparatus for acquiring audio information of the present application may be applied.

As shown in fig. 1, the system architecture 100 may include an audio capture device 101, a network 102, and a server 103. The network 102 serves as a medium to provide a communication link between the audio capture device 101 and the server 103. The server 103 is configured to perform data processing on data collected by the audio capture device 101. The network 102 may include various connection types, such as wired communication links, wireless communication links, or fiber optic cables.

The audio capture device 101 may collect audio information; the server 103 is configured to acquire audio information when a wake-up signal exists in the audio to be processed collected by the audio capture device 101, and to perform data processing on the audio information when the direction information of the audio information is the same as the direction information of the wake-up signal.

The audio capture device 101 may be a stand-alone microphone array, a microphone array integrated on an electronic device, or the like. The server 103 may be a server that provides various services, such as a server that performs data processing on data collected by the audio capture device 101. Through this data processing, the audio capture device 101 can accurately acquire the audio information of a sound source.

It should be noted that the method for acquiring audio information provided in the embodiment of the present application is generally performed by the server 103, and accordingly, the apparatus for acquiring audio information is generally disposed in the server 103.

It should be understood that the number of audio capture devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of audio capture devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, a flow 200 of one embodiment of a method for obtaining audio information in accordance with the present application is shown. The method for acquiring audio information comprises the following steps:

step 201, acquiring the audio to be processed in real time, and performing audio identification on the audio to be processed.

In this embodiment, the electronic device on which the method for acquiring audio information runs (for example, the server 103 shown in fig. 1) may receive the audio to be processed from the audio capture device 101 (for example, a microphone array) through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra wideband) connection, and other wireless connections now known or developed in the future.

When a user (i.e., a sound source) is within a certain distance of the microphone array, the microphone array can collect the user's audio to be processed. Typically, this audio may contain a wake-up signal and/or instructions for data processing. To perform a data operation through the user's audio information, a wake-up signal must first be received to inform the data processing device that a data processing instruction will follow; the instruction is then obtained from the audio and the related data operation is performed. As a result, wake-up signals occur too frequently during data processing. Moreover, the user may be moving while speaking, so in order to obtain clear and effective audio information, the microphone array has to locate the user again each time to determine the direction from which to collect the audio information, and only then is the user's audio information checked for a wake-up signal. The efficiency of acquiring the audio information of the sound source is therefore low.

In the application, the server 103 may obtain the audio to be processed collected by the microphone array in real time, and perform audio recognition on the audio to be processed.

In some optional implementations of this embodiment, the method may further include: and selecting one microphone from the microphone array as a wake-up signal monitoring microphone.

In order to obtain audio information that is as clear and accurate as possible, the microphone array usually includes a plurality of microphones, which may be arranged into a spherical, hemispherical, or other structure. For a given sound source, the time at which each microphone collects the audio signal of the sound source may differ, mainly because each microphone has a different distance and angle relative to the sound source. In order to reduce the amount of data processing and avoid mutual interference between signals, this embodiment may select one microphone from the microphone array as the wake-up signal monitoring microphone, which is used for collecting the audio to be processed in which the wake-up signal is detected. The other microphones in the microphone array also collect audio to be processed, but their audio is not used to detect the wake-up signal. The microphone closest to the sound source, or the microphone with the largest spatial angle (i.e., the angle over which it collects audio information), may be selected as the monitoring microphone.

Step 202, in response to detecting that a wake-up signal exists in the audio to be processed, acquiring first direction information of the wake-up signal, and acquiring audio information of a sound source corresponding to the first direction information.

When the server 103 detects the presence of a wake-up signal in the audio to be processed, all microphones in the microphone array may be controlled to collect audio information. The first direction information of the sound source relative to the microphone array may be determined from the differences between the audio information collected by the individual microphones. The first direction information is used for representing the direction of the sound source that sends the wake-up signal. Then, the audio information of the sound source corresponding to the first direction information (i.e., the sound source located in the direction indicated by the first direction information) is obtained. Since this audio information is obtained after the wake-up signal, it usually includes the related operation instructions.

In some optional implementation manners of this embodiment, the obtaining the first direction information of the wake-up signal may include the following steps:

First, acquiring the to-be-processed audio collected by each microphone in the microphone array, and determining the marking time of the wake-up signal in each to-be-processed audio to obtain a marking time set.

When the microphone array collects the audio of the sound source, each microphone in the microphone array collects the audio to be processed of the sound source. The audio collected by each microphone may be timed against the clock of the server 103. Because the microphones are located at different positions, the angles at which they collect the audio to be processed also differ, so the audio emitted by the same sound source reaches each microphone at a different time. In other words, the marking time of the wake-up signal differs between the audio collected by different microphones. The marking time of the wake-up signal may therefore be extracted from the audio to be processed collected by each microphone, which yields the marking time set of the microphone array.

And secondly, sequencing the marking time in the marking time set according to the time sequence to obtain a marking time sequence.

As can be seen from the above description, the microphones in the microphone array differ from each other in position and angle relative to the sound source. Typically, the audio from the sound source travels to each microphone at a constant velocity, so the marking times indicate which microphones are closer to the sound source. In this embodiment, the marking times in the marking time set may be sorted in chronological order to obtain a marking time sequence.

And thirdly, setting first direction information of the wake-up signal according to the space direction of the microphone corresponding to the previously set marking time in the marking time sequence.

The earlier the marking time is, the closer the corresponding microphone is to the sound source; the later the marking time is, the farther the corresponding microphone is from the sound source. The previously set marking time refers to the earliest entries of the marking time sequence, i.e., a preset number of marking times counted from its start. In combination with the positional relationship between the microphones, the first direction information of the wake-up signal (which characterizes the direction of the sound source relative to the microphone array) may be set from the spatial directions of the corresponding microphones. The spatial direction is used to characterize the direction from which a microphone collects audio, which can be regarded as the direction from which the microphone picks up a clear audio signal. In practice, the spatial direction may be set to a certain direction or a certain range of directions in space.
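The three steps above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the microphone IDs, the sample marking times, and the parameter `preset_count` (how many of the earliest marking times are kept) are assumptions introduced for the example.

```python
from typing import Dict, List


def earliest_microphones(marking_times: Dict[str, float], preset_count: int) -> List[str]:
    """Return the IDs of the microphones with the earliest marking times.

    marking_times maps a (hypothetical) microphone ID to the time, in seconds
    on a shared clock, at which the wake-up signal was marked in that
    microphone's audio. This dictionary plays the role of the marking time set.
    """
    # Sorting the microphone IDs by marking time yields the marking time sequence.
    sequence = sorted(marking_times, key=lambda mic_id: marking_times[mic_id])
    # Keep only the first preset_count entries, i.e. the microphones assumed
    # to be closest to the sound source.
    return sequence[:preset_count]


# Assumed sample data: four microphones, mic_2 hears the wake-up signal first.
sample_times = {"mic_0": 0.01030, "mic_1": 0.01012, "mic_2": 0.01005, "mic_3": 0.01041}
nearest = earliest_microphones(sample_times, preset_count=2)
print(nearest)  # ['mic_2', 'mic_1']
```

The spatial directions of the returned microphones would then be used to set the first direction information.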

In some optional implementation manners of this embodiment, the setting of the first direction information of the wake-up signal according to the spatial orientation of the microphone corresponding to the previously set marking time in the marking time sequence may include the following steps:

firstly, inquiring a microphone space direction table to obtain a space angle corresponding to the space direction of each microphone.

When a microphone array is placed at a certain position, in order to facilitate the determination of the position of a sound source, a microphone spatial direction table may first be constructed. The microphone spatial direction table is used for representing the corresponding relation between the spatial direction of the microphone and the spatial angle of the collected audio at the spatial position of the microphone. By querying the microphone spatial direction table, the spatial angle corresponding to the spatial direction of each microphone can be obtained.

And secondly, determining the angle range formed by the spatial angles corresponding to the spatial directions of the microphones corresponding to the previously set marking time in the marking time sequence, and setting the first direction information of the wake-up signal according to the angle range.

Through the microphone spatial direction table, the spatial angle corresponding to the spatial direction of the microphone for each of the previously set marking times in the marking time sequence is obtained; these spatial angles are then combined to obtain an angle range. After the angle range is obtained, the angle of the bisector of the angle range may be determined as the first direction information of the wake-up signal. A spatial coordinate system may be set up so that the first direction information can be expressed as a specific angle value.
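The table lookup and bisector computation described above can be sketched as follows. The sketch assumes a planar layout in which each spatial angle is a single azimuth in degrees, and it takes the angle range to be the smallest arc that covers all selected angles; the table contents and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical microphone spatial direction table: microphone ID -> azimuth in degrees.
MIC_DIRECTION_TABLE: Dict[str, float] = {
    "mic_0": 0.0, "mic_1": 90.0, "mic_2": 180.0, "mic_3": 270.0,
}


def angle_range(angles: List[float]) -> Tuple[float, float]:
    """Smallest arc (start, end), measured with increasing angle, covering all angles."""
    points = sorted(a % 360.0 for a in angles)
    # The covering arc is the complement of the largest gap between neighbours.
    best_gap, best_index = -1.0, 0
    for i, point in enumerate(points):
        following = points[(i + 1) % len(points)]
        gap = (following - point) % 360.0
        if gap > best_gap:
            best_gap, best_index = gap, i
    start = points[(best_index + 1) % len(points)]
    end = points[best_index]
    return start, end


def bisector(start: float, end: float) -> float:
    """Angle of the bisector of the arc running from start to end."""
    width = (end - start) % 360.0
    return (start + width / 2.0) % 360.0


def first_direction(mic_ids: List[str]) -> float:
    """Look up each microphone's angle and return the bisector of the resulting range."""
    angles = [MIC_DIRECTION_TABLE[mic_id] for mic_id in mic_ids]
    return bisector(*angle_range(angles))


print(first_direction(["mic_1", "mic_2"]))  # 135.0
print(first_direction(["mic_3", "mic_0"]))  # 315.0 (the range wraps past 0 degrees)
```

Handling the wrap-around at 0/360 degrees explicitly avoids reporting a direction on the opposite side of the array when the selected microphones straddle the zero angle.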

Step 203, in response to that the second direction information of the audio information is the same as the first direction information of the wake-up signal, performing data processing on the audio information.

As can be seen from the above description, after the server 103 detects the wake-up signal, the microphone array may collect the audio information of the sound source corresponding to the wake-up signal. The sound source may be stationary or moving. Taking a stationary sound source as an example, after the wake-up signal is detected for the first time, as long as the second direction information of the subsequently acquired audio information is the same as the first direction information of the wake-up signal, the position of the sound source can be considered unchanged, and wake-up signal detection does not need to be performed on every piece of acquired audio information. The microphone array is then controlled according to the first direction information of the wake-up signal to acquire the audio information of the corresponding sound source, and corresponding data processing is performed on that audio information. Therefore, frequent detection of the wake-up signal and frequent acquisition of the first direction information of the wake-up signal can be avoided, and the efficiency of acquiring the audio information is improved. The second direction information is used for representing the direction of the sound source that emits the audio information.

In some optional implementations of this embodiment, the method may further include: and responding to the fact that the second direction information of the audio information is different from the first direction information of the wake-up signal, and when the wake-up signal is detected to exist in the audio information, reacquiring the first direction information of the wake-up signal.

As can be seen from the above description, the sound source may be stationary or mobile. When the sound source moves, the second direction information of the audio information collected by the microphone array is different from the first direction information of the wake-up signal. In order to acquire accurate and effective audio information, the wake-up signal needs to be detected again, the first direction information of the wake-up signal needs to be acquired again, and then the microphone array is controlled to acquire the audio information of the sound source corresponding to the first direction information of the wake-up signal, so that the sound source can be tracked.
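The interplay between step 203 and the re-acquisition described above can be sketched as a small state holder. This is an illustrative sketch only: the `Frame` fields, the class name `DirectionGate`, and in particular the `tolerance` used to decide that two directions are "the same" are assumptions, since the text does not specify how equality of directions is evaluated.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Frame:
    direction: float      # second direction information, in degrees (assumed unit)
    has_wake_signal: bool  # result of wake-up signal detection on this frame
    payload: str           # stands in for the audio information itself


@dataclass
class DirectionGate:
    tolerance: float = 10.0                  # assumed threshold for "same direction"
    first_direction: Optional[float] = None  # set once a wake-up signal is detected
    processed: List[str] = field(default_factory=list)

    def _same_direction(self, direction: float) -> bool:
        diff = abs(direction - self.first_direction) % 360.0
        return min(diff, 360.0 - diff) <= self.tolerance

    def handle(self, frame: Frame) -> None:
        if self.first_direction is None:
            # Not yet woken: only a wake-up signal can set the first direction.
            if frame.has_wake_signal:
                self.first_direction = frame.direction
            return
        if self._same_direction(frame.direction):
            # Same direction as the wake-up signal: process without re-detecting.
            self.processed.append(frame.payload)
        elif frame.has_wake_signal:
            # Different direction plus a new wake-up signal: reacquire the direction.
            self.first_direction = frame.direction


gate = DirectionGate()
for f in [
    Frame(30.0, True, ""),              # initial wake-up from 30 degrees
    Frame(32.0, False, "lights on"),    # same direction, processed
    Frame(200.0, False, "noise"),       # other direction, no wake-up: ignored
    Frame(200.0, True, ""),             # wake-up from the new direction
    Frame(198.0, False, "next slide"),  # processed against the new direction
]:
    gate.handle(f)
print(gate.processed)  # ['lights on', 'next slide']
```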

In the above process, after the first direction information of the wake-up signal is acquired, the step of controlling the microphone array to collect audio may include:

firstly, determining the corresponding sound source angle of the sound source in the microphone space direction table.

The microphone spatial direction table is used for representing the corresponding relation between the spatial direction of the microphone and the spatial angle of the collected audio at the spatial position of the microphone. By querying the microphone spatial direction table, the spatial angle corresponding to the spatial direction of each microphone can be obtained.

And secondly, setting the microphone corresponding to the sound source angle as a sound source microphone, and acquiring audio information through the sound source microphone.

The server 103 may set a microphone corresponding to a sound source angle as a sound source microphone, which may be regarded as a microphone closest to the sound source in the distance and direction from which audio is collected. The server 103 may then collect audio information via the sound source microphone. One or more sound source microphones may be provided.

In some optional implementations of this embodiment, the controlling the microphone array to collect the audio information of the sound source according to the position information may further include: and shielding the audio information collected by the microphones except the sound source microphone in the microphone array.

In order to be able to track moving sound sources, every microphone in the microphone array remains in an audio-collecting state. After the sound source microphone is determined, the audio information collected by the microphones other than the sound source microphone can be shielded to avoid interference from those microphones, which improves the accuracy of audio information analysis.
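Selecting the sound source microphone and shielding the remaining microphones can be sketched as follows. The table contents, the function names, and the choice of "closest angle in the table" as the selection rule are assumptions made for illustration; shielding is modeled simply as discarding the other microphones' audio.

```python
from typing import Dict, List

# Hypothetical microphone spatial direction table: microphone ID -> azimuth in degrees.
DIRECTION_TABLE: Dict[str, float] = {
    "mic_0": 0.0, "mic_1": 90.0, "mic_2": 180.0, "mic_3": 270.0,
}


def select_source_microphones(
    table: Dict[str, float], source_angle: float, count: int = 1
) -> List[str]:
    """Pick the microphone(s) whose spatial angle is closest to the sound source angle."""

    def angular_distance(mic_id: str) -> float:
        diff = abs(table[mic_id] - source_angle) % 360.0
        return min(diff, 360.0 - diff)

    return sorted(table, key=angular_distance)[:count]


def shield_other_microphones(
    audio_by_mic: Dict[str, List[float]], source_mics: List[str]
) -> Dict[str, List[float]]:
    """Keep only the audio collected by the sound source microphone(s)."""
    return {mic: samples for mic, samples in audio_by_mic.items() if mic in source_mics}


source_mics = select_source_microphones(DIRECTION_TABLE, source_angle=100.0)
audio = {"mic_0": [0.1], "mic_1": [0.9], "mic_2": [0.3], "mic_3": [0.05]}
print(source_mics)                                   # ['mic_1']
print(shield_other_microphones(audio, source_mics))  # {'mic_1': [0.9]}
```

Because every microphone keeps collecting, the shielding here is applied to the collected data rather than to the hardware, so a moving sound source can still be picked up by another microphone later.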

When no audio information of the sound source is collected within a set time, the sound source can be considered to have stopped sending audio information. At this time, the server 103 may control the microphone array to stop collecting the audio information of the sound source.
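The stop condition can be sketched with a simple timer. The class name `SilenceTimeout` and the five-second value are assumptions; timestamps are passed in explicitly so the logic stays independent of any particular clock, and in practice they could come from `time.monotonic()`.

```python
class SilenceTimeout:
    """Tracks how long it has been since audio from the sound source was last collected."""

    def __init__(self, timeout_s: float, now: float) -> None:
        self.timeout_s = timeout_s   # the "set time", in seconds (assumed unit)
        self.last_audio_at = now

    def on_audio(self, now: float) -> None:
        # Called whenever audio information of the sound source is collected.
        self.last_audio_at = now

    def should_stop(self, now: float) -> bool:
        # True once no audio has been collected for at least the set time.
        return now - self.last_audio_at >= self.timeout_s


timer = SilenceTimeout(timeout_s=5.0, now=0.0)
timer.on_audio(now=4.0)             # audio collected 4 s after the wake-up signal
print(timer.should_stop(now=8.0))   # False: only 4 s of silence
print(timer.should_stop(now=9.0))   # True: 5 s of silence, stop collecting
```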

With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for acquiring audio information according to the present embodiment. In the application scenario of fig. 3, a user (sound source) may send out a wake-up signal in a classroom (or other space). After the microphone array 101 collects the wake-up signal, the server 103 identifies the wake-up signal and obtains the first direction information of the wake-up signal; then, the server 103 controls the microphone array 101 to collect the audio information of the user and performs data processing on the audio information, thereby controlling the content displayed on the screen.

According to the method provided by the embodiment of the application, after the wake-up signal is detected from the audio to be processed, the first direction information of the wake-up signal is obtained; and then, performing data processing on the audio information when the second direction information of the audio information is the same as the first direction information of the wake-up signal. According to the method, the continuous collection of the audio information of the sound source can be realized only by detecting the wake-up signal once, so that frequent detection of the wake-up signal and frequent detection of the first direction information after the wake-up signal is detected are avoided, and the efficiency of obtaining the audio information of the sound source is improved.

With further reference to fig. 4, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an apparatus for acquiring audio information, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.

As shown in fig. 4, the apparatus 400 for acquiring audio information of the present embodiment may include: an audio recognition unit 401, a first direction information acquisition unit 402, and an audio information acquisition unit 403. The audio recognition unit 401 is configured to obtain audio to be processed in real time and perform audio recognition on the audio to be processed; the first direction information acquisition unit 402 is configured to, in response to detecting that a wake-up signal exists in the audio to be processed, obtain first direction information of the wake-up signal and obtain audio information of a sound source corresponding to the first direction information, where the first direction information is used to represent the direction in which the sound source that sends the wake-up signal is located; the audio information acquisition unit 403 is configured to perform data processing on the audio information in response to second direction information of the audio information being the same as the first direction information of the wake-up signal, where the second direction information is used to represent the direction in which the sound source emitting the audio information is located.

In some optional implementations of this embodiment, the apparatus 400 for acquiring audio information may further include a microphone setting unit (not shown in the figure), configured to select one microphone from the microphone array as a wake-up signal monitoring microphone, where the wake-up signal monitoring microphone is used to collect the audio to be processed.

In some optional implementations of this embodiment, the first direction information acquisition unit 402 may include: a to-be-processed audio acquisition subunit (not shown in the figure), a marking time sequence acquisition subunit (not shown in the figure), and a first direction information setting subunit (not shown in the figure). The to-be-processed audio acquisition subunit is configured to acquire the to-be-processed audio collected by each microphone in the microphone array, and to determine the marking time of the wake-up signal in each to-be-processed audio to obtain a marking time set. The marking time sequence acquisition subunit is configured to sort the marking times in the marking time set in chronological order to obtain a marking time sequence. The first direction information setting subunit is configured to set the first direction information of the wake-up signal according to the spatial direction of the microphone corresponding to a previously set marking time in the marking time sequence, where the spatial direction is used to represent the direction in which the microphone collects audio.
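The sort-and-select step performed by these subunits can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment. The `Marking` record, its field names, the direction labels, and the sample times are all hypothetical, and the sketch assumes that the "previously set marking time" selects the earliest `count` entries of the marking time sequence, which is one reading of the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marking:
    """One entry of the marking time set (all fields are hypothetical names)."""
    mic_id: int       # index of the microphone in the array
    time_s: float     # time at which the wake-up signal was marked, in seconds
    direction: str    # spatial direction label of that microphone


def earliest_microphones(markings, count):
    """Sort the marking times chronologically and keep the first `count` entries."""
    if count < 1:
        raise ValueError("count must be at least 1")
    sequence = sorted(markings, key=lambda m: m.time_s)  # the marking time sequence
    return sequence[:count]


# Made-up marking times for a four-microphone array.
markings = [
    Marking(0, 0.01230, "north"),
    Marking(1, 0.01205, "east"),
    Marking(2, 0.01198, "south"),
    Marking(3, 0.01221, "west"),
]
first_two = earliest_microphones(markings, 2)
first_direction = [m.direction for m in first_two]
```

The microphones that mark the wake-up signal earliest are taken here to be the ones closest to the sound source, so their spatial directions serve as candidates for the first direction information.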

In some optional implementations of this embodiment, the first direction information setting subunit may include: a spatial angle query module (not shown in the figure) and a first direction information setting module (not shown in the figure). The spatial angle query module is configured to query a microphone spatial direction table to obtain the spatial angle corresponding to the spatial direction of each microphone, where the microphone spatial direction table is used to represent the correspondence between the spatial direction of a microphone and the spatial angle of the audio collected at the spatial position where the microphone is located. The first direction information setting module is configured to set the first direction information of the wake-up signal according to an angle range formed by the spatial angles corresponding to the spatial directions of the microphones corresponding to the previously set marking time in the marking time sequence.
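The table lookup and the angle range can be sketched as follows. This is an illustration only. The table contents, the use of degrees, the function names, and the choice of the smallest arc that contains all of the looked-up angles are assumptions, since the embodiment does not state how the angle range is formed.

```python
# Hypothetical microphone spatial direction table: direction label -> angle in degrees.
MIC_DIRECTION_TABLE = {"east": 0.0, "north": 90.0, "west": 180.0, "south": 270.0}


def angle_range(directions, table=MIC_DIRECTION_TABLE):
    """Return (start, end) of the smallest arc, counter-clockwise from start,
    that contains the angles of all the given microphone directions."""
    angles = sorted({table[d] % 360.0 for d in directions})
    if not angles:
        raise ValueError("at least one direction is required")
    if len(angles) == 1:
        return angles[0], angles[0]
    # gaps[i] is the counter-clockwise gap from angles[i] to the next angle,
    # wrapping around from the last angle back to the first.
    gaps = [
        (angles[(i + 1) % len(angles)] - angles[i]) % 360.0
        for i in range(len(angles))
    ]
    widest = max(range(len(gaps)), key=gaps.__getitem__)
    # The arc excludes the widest gap: it starts just after it and ends just before it.
    return angles[(widest + 1) % len(angles)], angles[widest]


def in_range(angle, start, end):
    """Check whether `angle` lies on the counter-clockwise arc from start to end."""
    span = (end - start) % 360.0
    return (angle - start) % 360.0 <= span


first_direction_range = angle_range(["east", "south"])
```

With the made-up table above, microphones facing east and south yield an arc from 270 degrees to 0 degrees, so a later sound source at 315 degrees would fall inside the range while one at 100 degrees would not.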

In some optional implementations of this embodiment, the apparatus 400 for acquiring audio information may further include a first direction information updating unit, configured to, in response to the second direction information of the audio information being different from the first direction information of the wake-up signal, reacquire the first direction information of the wake-up signal when a wake-up signal is detected in the audio information.
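The interaction between the direction comparison and the updating unit can be sketched as a small state holder. This is a hedged illustration, not the apparatus itself. The class name `DirectionGate`, the representation of a direction as a plain label compared by equality, and the decision not to pass the wake-up audio itself on for data processing are assumptions.

```python
class DirectionGate:
    """Tracks the first direction information and decides whether audio is processed."""

    def __init__(self):
        self.first_direction = None  # unknown until a wake-up signal is detected

    def handle(self, has_wake_signal, direction):
        """Return True if this piece of audio should be passed on for data processing."""
        if self.first_direction is None:
            # Not yet woken up: only a wake-up signal can set the first direction.
            if has_wake_signal:
                self.first_direction = direction
            return False
        if direction == self.first_direction:
            # Second direction equals first direction: process without a new wake-up.
            return True
        if has_wake_signal:
            # Different direction plus a wake-up signal: reacquire the first direction.
            self.first_direction = direction
        return False


gate = DirectionGate()
frames = [
    (False, "east"),   # ignored, no wake-up yet
    (True, "south"),   # wake-up, first direction becomes south
    (False, "south"),  # processed
    (False, "east"),   # other direction without wake-up, ignored
    (True, "east"),    # wake-up from a new direction, first direction becomes east
    (False, "east"),   # processed
    (False, "south"),  # old direction, now ignored
]
results = [gate.handle(wake, direction) for wake, direction in frames]
```

In this sketch a sound source only needs to issue the wake-up signal once; its later audio is accepted on the basis of direction alone until a wake-up signal arrives from a different direction.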

The present embodiment further provides a server, including: one or more processors; a memory for storing one or more programs; and a microphone array for collecting audio information of a sound source. The one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method for acquiring audio information described above.

The present embodiment also provides a computer-readable storage medium on which a computer program is stored, which program, when being executed by a processor, carries out the above-mentioned method for acquiring audio information.

Referring now to FIG. 5, a block diagram of a computer system 500 suitable for use in implementing a server according to embodiments of the present application is shown. The server shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.

As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU)501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.

The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as necessary, so that a computer program read from it can be installed into the storage section 508 as necessary.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program performs the above-described functions defined in the method of the present application when executed by the Central Processing Unit (CPU) 501.

It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an audio recognition unit, a first direction information acquisition unit, and an audio information acquisition unit. The names of these units do not in some cases constitute a limitation on the unit itself, and for example, an audio information acquisition unit may also be described as a "unit for acquiring audio information of a sound source".

As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire audio to be processed in real time, and perform audio recognition on the audio to be processed; in response to detecting that a wake-up signal exists in the audio to be processed, acquire first direction information of the wake-up signal, and acquire audio information of a sound source corresponding to the first direction information, wherein the first direction information is used for representing the direction of the sound source sending the wake-up signal; and perform data processing on the audio information in response to second direction information of the audio information being the same as the first direction information of the wake-up signal, wherein the second direction information is used for representing the direction in which the sound source emitting the audio information is located.

The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims

1. A method for obtaining audio information, the method comprising:

acquiring audio to be processed in real time, and performing audio identification on the audio to be processed;

responding to the fact that a wake-up signal exists in the audio to be processed, acquiring first direction information of the wake-up signal, and acquiring audio information of a sound source corresponding to the first direction information, wherein the first direction information is used for representing the direction of the sound source sending the wake-up signal;

performing data processing on audio information in response to that second direction information of the audio information is the same as the first direction information of the wake-up signal, wherein the second direction information is used for representing the direction of a sound source which emits the audio information;

and in response to that the second direction information of the audio information is different from the first direction information of the wake-up signal, when the wake-up signal is detected to exist in the audio information, the first direction information of the wake-up signal is obtained again.

2. The method of claim 1, further comprising:

one microphone is selected from the microphone array as a wake-up signal monitoring microphone, and the wake-up signal monitoring microphone is used for collecting audio to be processed.

3. The method of claim 1, wherein the obtaining the first direction information of the wake-up signal comprises:

acquiring to-be-processed audio collected by each microphone in a microphone array, and determining the marking time of the wake-up signal in each to-be-processed audio to obtain a marking time set;

sequencing the marking time in the marking time set according to the time sequence to obtain a marking time sequence;

and setting first direction information of the wake-up signal according to the spatial direction of the microphone corresponding to the previously set marking time in the marking time sequence, wherein the spatial direction is used for representing the direction of the microphone for collecting audio.

4. The method according to claim 3, wherein the setting of the first direction information of the wake-up signal according to the spatial orientation of the microphone corresponding to the previous set mark time in the mark time sequence comprises:

inquiring a microphone space direction table to obtain a space angle corresponding to the space direction of each microphone, wherein the microphone space direction table is used for representing the corresponding relation between the space direction of the microphone and the space angle of the collected audio at the space position where the microphone is located;

and setting the first direction information of the wake-up signal according to an angle range formed by the space angles corresponding to the space directions of the microphones corresponding to the previously set marking time in the marking time sequence.

5. An apparatus for obtaining audio information, the apparatus comprising:

the audio recognition unit is used for acquiring audio to be processed in real time and performing audio recognition on the audio to be processed;

the first direction information acquisition unit is used for responding to the detection that the wake-up signal exists in the audio to be processed, acquiring first direction information of the wake-up signal and acquiring audio information of a sound source corresponding to the first direction information, wherein the first direction information is used for representing the direction of the sound source sending the wake-up signal;

the audio information acquisition unit is used for responding that second direction information of the audio information is the same as the first direction information of the wake-up signal and processing the audio information, wherein the second direction information is used for representing the direction of a sound source which emits the audio information;

and the first direction information updating unit is used for, in response to the second direction information of the audio information being different from the first direction information of the wake-up signal, reacquiring the first direction information of the wake-up signal when the wake-up signal is detected to exist in the audio information.

6. The apparatus of claim 5, further comprising:

the microphone setting unit is used for selecting one microphone from the microphone array as a wake-up signal monitoring microphone, and the wake-up signal monitoring microphone is used for collecting audio to be processed.

7. The apparatus according to claim 5, wherein the first direction information acquiring unit includes:

the to-be-processed audio acquisition subunit is used for acquiring to-be-processed audio acquired by each microphone in the microphone array, and determining the marking time of the wake-up signal in each to-be-processed audio to obtain a marking time set;

the marking time sequence acquiring subunit is used for sequencing the marking times in the marking time set according to the time sequence to obtain a marking time sequence;

and the first direction information setting subunit is used for setting the first direction information of the wake-up signal according to the spatial direction of the microphone corresponding to the previously set marking time in the marking time sequence, wherein the spatial direction is used for representing the direction of the microphone for collecting the audio.

8. The apparatus of claim 7, wherein the first direction information setting subunit comprises:

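the spatial angle query module is used for querying a microphone spatial direction table to obtain a spatial angle corresponding to the spatial direction of each microphone, wherein the microphone spatial direction table is used for representing the corresponding relation between the spatial direction of the microphone and the spatial angle of the collected audio at the spatial position of the microphone;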

and the first direction information setting module is used for setting the first direction information of the wake-up signal according to an angle range formed by space angles corresponding to the space directions of the microphones corresponding to the previously set marking time in the marking time sequence.

9. A server, comprising:

one or more processors;

a memory for storing one or more programs;

a microphone array for collecting audio information of a sound source;

the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-4.

10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 4.