WO2015090137A1

WO2015090137A1 - A voice message search method, device, and system

Info

Publication number: WO2015090137A1
Application number: PCT/CN2014/092426
Authority: WO
Inventors: Yelu LIU
Original assignee: Tencent Technology (Shenzhen) Company Limited
Priority date: 2013-12-17
Filing date: 2014-11-28
Publication date: 2015-06-25
Also published as: CN104714981A; CN104714981B

Abstract

The present disclosure discloses a voice message search method, device, and system, and relates to the field of mobile Internet. The method comprises: obtaining a text search keyword; searching for a text message that includes the text search keyword from the text messages respectively corresponding to each voice message, with each text message generated based on the speech recognition result of the corresponding voice message; and feeding back, as the search result, the voice message corresponding to the text message that includes the text search keyword. The present disclosure solves the problem of low search efficiency with the voice message search method provided by the prior art, allowing a user to quickly and conveniently find the target voice message simply by inputting the text search keyword.

Description

A VOICE MESSAGE SEARCH METHOD, DEVICE, AND SYSTEM

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to Chinese Patent Application No. 201310695093.X, filed on December 17, 2013, which is hereby incorporated by reference in its entirety.

FIELD

The present disclosure relates to the field of mobile Internet, and more particularly to a voice message search method, device, and system.

BACKGROUND

A voice instant messaging application is an application that allows two or more parties to communicate with each other instantly by exchanging voice messages. Such applications include Yixin, Line, and Laiwang. Voice instant messaging applications are now among the most widely used applications on mobile terminals, including smartphones, tablet PCs, and eBook readers.

When using a voice instant messaging application, a user may need to search for target content in the historical voice messages. For example, after user A and user B send dozens of voice messages to each other arranging a meeting, user A may need to find, among the voice messages sent by user B, the one that contains information about the location arranged for the meeting. In this case, the existing voice message search method comprises the following: the user uses a mobile terminal to play all the voice messages one by one or to play a voice message selected based on a guess; after a voice message is played, the user determines whether the voice message contains the target content; if yes, the user stops the search; if no, the user continues to play the next voice message by using the mobile terminal.

During the implementation of the present disclosure, the inventor has found that the existing art has at least the following problems. If the number of voice messages is large, searching for the target content by playing voice messages one by one is very inefficient. In addition, the user's judgment also deteriorates due to repeated clicking and the visual fatigue caused by sliding operations. Consequently, the overall efficiency of the searches performed using the above-mentioned voice message search method is low.

SUMMARY

To solve the problem of inefficient search using the voice message search method provided by the prior art, the embodiments of the present disclosure provide a voice message search method, device, and system. The technical solution is as follows:

In a first aspect, a voice message search method is provided, for use on a client and comprising: obtaining a text search keyword; searching for the text message that includes the search keyword from the text messages respectively corresponding to each voice message, with each text message generated based on the speech recognition result of the corresponding voice message; and feeding back, as the search result, the voice message corresponding to the text message that includes the search keyword.

In a second aspect, a voice message search device is provided. The device includes: a search acquisition module, a text search module, and a result feedback module. The search acquisition module is configured to obtain a text search keyword. The text search module is configured to search for the text message that includes the search keyword from the text messages respectively corresponding to each voice message, with each text message generated based on the speech recognition result of the corresponding voice message. The result feedback module is configured to feed back, as the search result, the voice message corresponding to the text message that includes the search keyword.

In a third aspect, a voice message search system is provided, comprising a client and a server, with the client and the server interconnected using a wireless network or wired network. The client can be the voice message search device described in the above-mentioned second aspect.

The technical solution provided by the embodiments of the present disclosure has the following benefits:

By obtaining a text search keyword, the solution obtains the search result by searching for the text message that includes the search keyword from the text messages respectively corresponding to each voice message. This solves the problem of inefficient search using the voice message search method provided by the prior art, allowing a user to quickly and conveniently find the target voice message simply by inputting a search keyword.

BRIEF DESCRIPTION OF THE DRAWINGS

To more clearly describe the technical solution provided by the embodiments of the present disclosure, the following gives an overview of the drawings needed to describe the embodiments. Obviously, the following drawings show only some of the embodiments of the present disclosure, and those of ordinary skill in the art may obtain other drawings based on these drawings without creative work.

Figure 1 shows the flowchart of the voice message search method provided by an embodiment of the present disclosure.

Figure 2A shows the flowchart of the voice message search method provided by another embodiment of the present disclosure.

Figures 2B to 2E show schematic diagrams of the interfaces involved in the embodiment shown in Figure 2A.

Figure 3 shows the structural diagram of the voice message search device provided by an embodiment of the present disclosure.

Figure 4 shows the structural diagram of the voice message search device provided by another embodiment of the present disclosure.

Figure 5 shows the structural diagram of the voice message search system provided by an embodiment of the present disclosure.

DETAILED DESCRIPTION OF THE DRAWINGS

Reference throughout this specification to "embodiments," "an embodiment," "example embodiment," or the like in the singular or plural means that one or more particular features, structures, or characteristics described in connection with an embodiment are included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases "in embodiments," "in an embodiment," "in an example embodiment," or the like in the singular or plural in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

The terminology used in the description of the disclosure herein is for the purpose of describing particular examples only and is not intended to be limiting of the disclosure. As used in the description of the disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of "in" includes "in" and "on" unless the context clearly dictates otherwise. It will also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms "may include," "including," "comprises," and/or "comprising," when used in this specification, specify the presence of stated features, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, operations, elements, components, and/or groups thereof.

As used herein, the term "module" or "unit" may refer to, be part of, or include an Application Specific Integrated Circuit (ASIC); an electronic circuit; a combinational logic circuit; a field programmable gate array (FPGA); a processor (shared, dedicated, or group) that executes code; other suitable hardware components that provide the described functionality; or a combination of some or all of the above, such as in a system-on-chip. The term module or unit may include memory (shared, dedicated, or group) that stores code executed by the processor.

The exemplary environment may include a server, a client, and a communication network. The server and the client may be coupled through the communication network for information exchange, such as sending/receiving identification information, sending/receiving data files such as splash screen images, etc. Although only one client and one server are shown in the environment, any number of terminals or servers may be included, and other devices may also be included.

The communication network may include any appropriate type of communication network for providing network connections to the server and client or among multiple servers or clients. For example, the communication network may include the Internet or other types of computer networks or telecommunication networks, either wired or wireless. In a certain embodiment, the disclosed methods and apparatus may be implemented, for example, in a wireless network that includes at least one client.

In some cases, the client may refer to any appropriate user terminal with certain computing capabilities, such as a personal computer (PC), a work station computer, a server computer, a hand-held computing device (tablet), a smart phone or mobile phone, or any other user-side computing device. In various embodiments, the client may include a network access device. The client may be stationary or mobile.

A server, as used herein, may refer to one or more server computers configured to provide certain server functionalities, such as database management and search engines. A server may also include one or more processors to execute computer programs in parallel. A user, as used herein, may refer to one or more persons or things that control a client. The user may control more than one client or other device.

The solutions in the embodiments of the present disclosure are clearly and completely described in combination with the attached drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only a part, but not all, of the embodiments of the present disclosure. Based on the embodiments of the present disclosure, all other embodiments acquired by those of ordinary skill in the art under the precondition that no creative efforts have been made shall be covered by the protective scope of the present disclosure.

To make clearer the purpose, technical solution, and benefits of the present disclosure, the following describes the embodiments of the present disclosure in further detail based on the drawings.

In the embodiments of the present disclosure, a client can be an application client that allows two or more parties to communicate with each other by exchanging voice messages on terminals such as smartphones, tablet PCs, eBook readers, Moving Picture Experts Group Audio Layer III (MP3) players, and Moving Picture Experts Group Audio Layer IV (MP4) players.

Figure 1 shows the flowchart of the voice message search method provided by embodiments of the present disclosure. In the embodiments, the scenario where the voice message search method is used for a client that allows two or more parties to communicate with each other by exchanging voice messages is described as an example. The above-mentioned method comprises:

Step 102: Obtain a text search keyword.

The client may obtain a search keyword that the user inputs directly as text. The client may also obtain search voice signals that the user inputs as voice, and then use a speech recognition technology locally or on the server to identify the search keyword in text format from the search voice signals.

Step 104: Search for the text message that includes the search keyword from the text messages respectively corresponding to each voice message, with each text message generated based on the speech recognition result of the corresponding voice message.

Each voice message corresponds to one text message. Each text message is generated based on the speech recognition result of the corresponding voice message.

Step 106: Feed back as a search result the voice message corresponding to the text message that includes the search keyword.

In short, a voice message search method is provided by the present embodiments. The client obtains a text search keyword and then obtains the search result by searching for the text message that includes the search keyword from the text messages respectively corresponding to each voice message. The method solves the problem of inefficient search using the voice message search method in the prior art, allowing a user to quickly and conveniently find the target voice message simply by inputting a search keyword.
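Steps 102 to 106 can be illustrated with a minimal Python sketch. The `VoiceMessage` record, its field names, and the case-insensitive substring match are assumptions made for illustration; the disclosure does not prescribe a data layout or a matching rule.

```python
from dataclasses import dataclass


@dataclass
class VoiceMessage:
    message_id: str   # hypothetical unique ID of the voice message
    audio_path: str   # hypothetical location of the stored audio
    transcript: str   # text message generated from the speech recognition result


def search_voice_messages(keyword, messages):
    """Return the voice messages whose text message contains the keyword."""
    needle = keyword.casefold()  # case-insensitive matching is an assumption
    return [m for m in messages if needle in m.transcript.casefold()]


messages = [
    VoiceMessage("m1", "m1.amr", "Hello, this is John."),
    VoiceMessage("m2", "m2.amr", "Tomorrow let's go to the Curious Dinosaur Park."),
]
hits = search_voice_messages("Tomorrow let's go to", messages)
print([m.message_id for m in hits])
```

The search runs only over text, while the result that is fed back is the voice message associated with each matching text message.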

Figure 2A shows the flowchart of the voice message search method provided by embodiments of the present disclosure. In the embodiments, the scenario where the voice message search method is used for a client that allows two or more parties to communicate with each other by exchanging voice messages is described as an example. The above-mentioned method comprises:

Step 201: Obtain and store the text messages corresponding to each voice message.

As a voice message is stored and transferred in a voice format, the client needs to first obtain and store the text message corresponding to each voice message. For example, the client needs to convert the voice message "Hello, this is John." into the text message "Hello, this is John.", which is stored in association with the voice message.

This step may be implemented by using any of the following three methods:

In the first implementation mode, the client performs speech recognition on each voice message to obtain respective speech recognition results, and based on the speech recognition results, generates the text messages respectively corresponding to each voice message.

In this implementation mode, the terminal running the client needs to have powerful processing capabilities. Preferably, the client performs the above-mentioned speech recognition procedure during idle time.

In the second implementation mode, the client sends each voice message to the server and receives the text messages returned by the server corresponding to each voice message. The text messages are generated based on the speech recognition results obtained after the server performs speech recognition on the voice messages.

At a preset time interval, during idle time, or when connected to a wireless local area network (LAN), the client may send all or some of the local voice messages to the server, each voice message having a unique message ID. After receiving voice messages from the client, the server performs speech recognition on each voice message to obtain respective speech recognition results and generates the corresponding text messages based on the speech recognition results. Then, the server returns each text message to the client, with each text message having the message ID of the corresponding voice message. The client receives and stores the text messages corresponding to each voice message.

In the third implementation mode, the client receives the voice messages sent by other clients and forwarded by the server and the text messages corresponding to the voice messages, the text message being generated based on the speech recognition result obtained after the server performs speech recognition on the voice message; and/or, after sending the local voice message, the client receives the text message returned by the server corresponding to the voice message, the text message being generated based on the speech recognition result obtained after the server performs speech recognition on the voice message.

A voice message is generated by the communication between clients and needs to be forwarded by a server during transfer. Before forwarding a voice message, the server performs speech recognition on the voice message to obtain the speech recognition result and then generates the corresponding text message. Then, the server sends the voice message and the text message corresponding to the voice message to the target client. The target client receives and stores the voice messages sent by other clients and forwarded by the server, together with the text messages corresponding to the voice messages. In addition, the server returns the text message to the source client that sent the voice message. After sending a local voice message, a source client receives and stores the text message returned by the server that corresponds to the voice message.
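The message ID bookkeeping of the second implementation mode might look like the following sketch. `fake_server_transcribe` is a stand-in for the server with canned results, and `TranscriptStore` is a hypothetical client-side store; neither name comes from the disclosure, and no real speech recognition or network transfer is performed.

```python
def fake_server_transcribe(batch):
    """Stand-in for the server: returns message ID -> recognized text."""
    canned = {b"audio-1": "Hello, this is John.", b"audio-2": "See you at noon."}
    return {message_id: canned.get(audio, "") for message_id, audio in batch.items()}


class TranscriptStore:
    """Hypothetical client-side store of text messages keyed by message ID."""

    def __init__(self):
        self._by_id = {}

    def pending(self, voice_messages):
        # Only voice messages without a stored text message need to be sent.
        return {mid: audio for mid, audio in voice_messages.items()
                if mid not in self._by_id}

    def save(self, transcripts):
        self._by_id.update(transcripts)

    def get(self, message_id):
        return self._by_id.get(message_id)


voice_messages = {"m1": b"audio-1", "m2": b"audio-2"}
store = TranscriptStore()
store.save(fake_server_transcribe(store.pending(voice_messages)))
print(store.get("m1"))
```

Because each returned text message carries the message ID of its voice message, the client can associate the two without relying on the order of the replies.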
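The forwarding flow of the third implementation mode can be sketched as below. The function `relay_voice_message` and its callback parameters are hypothetical names; the recognizer is passed in as a plain callable so the sketch stays independent of any particular speech recognition engine.

```python
def relay_voice_message(message_id, audio, recognize, deliver_to_target, reply_to_source):
    """Server-side sketch: recognize first, then forward voice plus text."""
    text = recognize(audio)
    # The target client receives the voice message together with its text message.
    deliver_to_target(message_id, audio, text)
    # The source client receives only the text message for the voice message it sent.
    reply_to_source(message_id, text)
    return text


target_inbox = []
source_inbox = []
relay_voice_message(
    "m7",
    b"raw-audio",
    recognize=lambda audio: "Meet at the north gate.",  # canned recognition result
    deliver_to_target=lambda mid, audio, text: target_inbox.append((mid, audio, text)),
    reply_to_source=lambda mid, text: source_inbox.append((mid, text)),
)
print(target_inbox, source_inbox)
```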

Obviously, if the processing capabilities of the server are powerful, the third mode is the preferred mode of implementing this step.

Step 202: The client obtains a text search keyword.

Generally, the client can obtain a text search keyword in one of the following three modes:

In the first mode, the client obtains a search keyword directly input as text.

For example, the client may receive the search keyword "Tomorrow let's go to" that user A directly inputs as text in the text search box 22, as shown in Figure 2B of a voice instant messaging application in a terminal device.

In the second mode, the client obtains the search voice signals that the user inputs as voice, and the client uses a speech recognition technology to identify the search keyword in text format from the search voice signals.

For example, if the processing capabilities of the terminal running the client are powerful, upon receiving a signal indicating that user A has pressed the voice search button 24, the client uses the microphone 26 of the terminal to receive the search voice signals that the user inputs as voice. Then, the client uses a speech recognition technology to identify the search keyword "Tomorrow let's go to" from the search voice signals, as shown in Figure 2C.

In the third mode, the client obtains the search voice signals that the user inputs as voice, and then the client sends the search voice signals to the server. The client receives the search keyword returned by the server, with the search keyword identified by the server from the search voice signals by using a speech recognition technology.
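The three modes of step 202 can be sketched as a single dispatch function. The name `obtain_keyword`, its parameters, and the choice to prefer local recognition over the server when both are available are assumptions for illustration only.

```python
def obtain_keyword(typed_text=None, voice_signal=None,
                   local_recognizer=None, server_recognizer=None):
    """Return a text search keyword from typed text or from a voice signal."""
    if typed_text is not None:
        return typed_text.strip()                # first mode: direct text input
    if voice_signal is None:
        raise ValueError("no search input provided")
    if local_recognizer is not None:
        return local_recognizer(voice_signal)    # second mode: recognition on the client
    if server_recognizer is not None:
        return server_recognizer(voice_signal)   # third mode: recognition on the server
    raise ValueError("no speech recognizer available")


print(obtain_keyword(typed_text="Tomorrow let's go to "))
print(obtain_keyword(voice_signal=b"pcm", server_recognizer=lambda s: "Tomorrow let's go to"))
```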

Step 203: Search for the text message that includes the search keyword from the text messages respectively corresponding to each voice message, with each text message generated based on the speech recognition result of the corresponding voice message.

To improve the search efficiency, this step may include the following sub-steps:

First, according to the preset conditions, sort the text messages corresponding to each voice message to be searched, the preset conditions including at least one of the following: the generation time corresponding to each voice message, priorities of the contacts corresponding to each voice message, and data sizes of each text message.

Generally, the voice messages to be searched refer to the voice messages of the contacts related to the current interface shown on the client. The current interface may be the active user interface shown on the terminal device. For example, if the current active interface is the chat interface between contact A and contact B, the voice messages to be searched are the voice messages generated during the voice chat between contact A and contact B. For another example, if the current interface is the chat interface of a group, the voice messages to be searched are the voice messages generated during the voice chat among the contacts belonging to the group. For yet another example, if the current interface is not a chat interface, the voice messages to be searched can be all the voice messages globally. In summary, the client may select the messages to be searched based on the content of the active user interface.
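A minimal sketch of choosing the voice messages to be searched from the active interface follows. The `chat_id` field and the convention that `None` means "not a chat interface" are hypothetical.

```python
def messages_in_scope(all_messages, active_chat_id=None):
    """Select the voice messages to be searched based on the active interface."""
    if active_chat_id is None:
        # Not a chat interface: search all voice messages globally.
        return list(all_messages)
    # One-to-one or group chat interface: search only that conversation.
    return [m for m in all_messages if m["chat_id"] == active_chat_id]


all_messages = [
    {"id": "m1", "chat_id": "A-B"},
    {"id": "m2", "chat_id": "group-1"},
    {"id": "m3", "chat_id": "A-B"},
]
print([m["id"] for m in messages_in_scope(all_messages, "A-B")])
```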

If the preset conditions include the generation time corresponding to each voice message, the client can sort the text messages corresponding to each voice message based on the generation time corresponding to each voice message. For example, the client sorts the text messages corresponding to each voice message in ascending order or descending order of the generation time corresponding to each voice message. For another example, if the number of voice messages is very large, the client sorts the text messages corresponding to each voice message to be searched in descending order of the probability of forgetting associated with each generation time, based on the human forgetting curve. For yet another example, the user may have scrolled the current interface to the voice messages generated within a time segment other than the latest time segment, for example, the chat messages generated the day before yesterday. In this case, the client may sort the text messages corresponding to the voice messages generated within that time segment to the beginning and place the text messages corresponding to the voice messages generated within other time segments after them.

If the preset conditions include the priorities of the contacts corresponding to each voice message, the client may, based on the priorities of the contacts, sort the text messages corresponding to each voice message to be searched. The priorities can be preset by the client. For example, if it is more likely that the search result is found among the voice messages of other contacts, the client may set the priorities of other contacts higher than that of the contact corresponding to the current client. That is, if the voice messages are the chat messages exchanged between the current contact A and another contact B, the text messages corresponding to the voice messages of contact B are sorted in front of the text messages corresponding to the voice messages of the current contact A. Thus, the search is performed preferentially in the text messages corresponding to contact B. For another example, the client can also set different priorities for contacts based on the number of historical messages of each contact and the level of friendliness of each contact with the current contact A.

If the preset conditions include the data sizes of each text message, the client can, in descending order or ascending order of the data sizes of each text message, sort the text messages corresponding to each voice message to be searched.

Note that two or three of the above-mentioned preset conditions may be combined and used together for sorting the messages. The client can first perform sorting based on one of the conditions and then, based on another condition, continue to sort the sorting result obtained using the preceding condition. For example, the client can sort each text message based on the priorities of the contacts and then continue to sort the text messages of the same contact in ascending order of the generation time of the corresponding voice messages.
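The combined sorting in the last example can be sketched with a composite sort key, which yields the same order as sorting by contact priority and then ordering each contact's messages by generation time. The dictionary fields `sender` and `created_at` and the numeric priority table are assumptions.

```python
from datetime import datetime


def sort_for_search(transcripts, contact_priority):
    """Order text messages by contact priority (highest first), then oldest first."""
    return sorted(
        transcripts,
        key=lambda t: (-contact_priority.get(t["sender"], 0), t["created_at"]),
    )


transcripts = [
    {"id": "m1", "sender": "A", "created_at": datetime(2013, 10, 30, 9, 0)},
    {"id": "m2", "sender": "B", "created_at": datetime(2013, 10, 30, 9, 5)},
    {"id": "m3", "sender": "B", "created_at": datetime(2013, 10, 30, 9, 1)},
]
priority = {"A": 1, "B": 2}  # the other contact B outranks the current contact A
ordered = sort_for_search(transcripts, priority)
print([t["id"] for t in ordered])
```

With contact B ranked above the current contact A, both of B's messages are searched before A's message, and B's messages are visited in ascending order of generation time.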

Note that the above-mentioned sorting may be performed before or during step 202. For example, when the client receives a signal indicating that the user has pressed the voice search button 24, the sorting is triggered. Concurrently, the client receives the search voice signals input by the user after or during the sorting.

Second, from the sorted text messages, the client searches for the text message that includes the search keyword.

For example, the client searches for and finds the text message "Tomorrow let's go to the Curious Dinosaur Park. It's Halloween tomorrow. There is a haunted house over there", which includes the search keyword "Tomorrow let's go to."

Step 204: Feed back as a search result the voice message corresponding to the text message that includes the search keyword.

After finding the text message that includes the search keyword, the terminal displays or plays, as the search result on the current interface, the voice message corresponding to the text message that includes the search keyword.

The client can use not only the found voice messages as the search results but also the found text messages. In addition, the client can use the found voice messages together with the corresponding text messages as the search results. The mode of displaying search results can be set by the user. For example, the user can set the mode in which voice messages are always used as the search results for feedback, as shown in Figure 2B. The mode of displaying search results can also be determined based on the current scenario mode of the terminal. For example, if the scenario mode of the terminal is currently set to "Outdoor", the client uses the found voice messages as the search results for feedback; if the scenario mode of the terminal is currently set to "Mute", the client uses the found text messages as the search results for feedback, or the client uses the found voice messages and the corresponding text messages as the search results for feedback, as shown in Figure 2C.
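The choice of feedback mode can be sketched as a small decision function. The return values `"voice"` and `"text"`, the precedence of the user setting over the scenario mode, and the fallback to voice are assumptions; for "Mute" the description also allows feeding back voice and text together, and this sketch picks text only.

```python
def choose_result_mode(user_setting=None, scenario_mode=None):
    """Return how search results are fed back: 'voice' or 'text'."""
    if user_setting is not None:
        return user_setting      # assumed: an explicit user setting takes precedence
    if scenario_mode == "Outdoor":
        return "voice"           # feed back the found voice messages
    if scenario_mode == "Mute":
        return "text"            # feed back the found text messages
    return "voice"               # assumed default when nothing is configured


print(choose_result_mode(scenario_mode="Mute"))
```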

To sum up, the voice message search method provided by the present embodiment, by obtaining a text search keyword, obtains the search result by searching for the text message that includes the search keyword from the text messages respectively corresponding to each voice message. This solves the problem of inefficient search using the voice message search method provided by the prior art, allowing a user to quickly and conveniently find the target voice message simply by inputting a search keyword.

In addition, in the present embodiment, text messages are sorted according to preset conditions to accelerate the search. Particularly, in two-party or multi-party chat using voice messages, voice messages can be sorted giving another contact a priority higher than that of the current contact, thus significantly accelerating the search.

Note that, to make searches faster, the client can, before the sorting, receive a selection signal indicating that the user has selected the target contact from at least two contacts related to the current interface; then, the client determines the voice messages of the selected target contact as each voice message to be searched.

As shown in Figure 2D, after a search is triggered, the client may provide the interface 27 for selecting at least two contacts related to the current interface. Then, the user can select all or some of the contacts. Based on the received selection signal, the client determines the voice messages of the target contacts "Jack" and "Ashley, " selected from the three contacts in the group, as the voice messages to be searched. Thus, the range of the voice messages to be searched is narrowed down, improving the search efficiency. In group chat where the voice messages to be searched involve multiple persons or in a scenario where all the contacts are related to the current interface, this implementation mode can significantly accelerate searches.
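Narrowing the search range by selected contacts can be sketched as a filter applied before sorting. The `sender` field and the third group member "Tom" are hypothetical; only "Jack" and "Ashley" appear in the description.

```python
def restrict_to_contacts(messages, selected_contacts):
    """Keep only the voice messages sent by the selected target contacts."""
    selected = set(selected_contacts)
    return [m for m in messages if m["sender"] in selected]


group_messages = [
    {"id": "m1", "sender": "Jack"},
    {"id": "m2", "sender": "Tom"},      # hypothetical third contact in the group
    {"id": "m3", "sender": "Ashley"},
]
to_search = restrict_to_contacts(group_messages, ["Jack", "Ashley"])
print([m["id"] for m in to_search])
```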

Similarly, before the sorting, the client may receive a selection signal indicating that the user has selected the target time segment from at least two preset candidate time segments; then the client determines the voice messages that belong to the selected target time segment as the voice messages to be searched.

As shown in Figure 2E, after a search is triggered, the client may provide the interface 28 for selecting at least two time segments. Then, the user can select all or some of the time segments. Based on the received selection signal, the client determines the voice messages generated during the selected time segment "recent week" as the voice messages to be searched. Thus, the range of the voice messages to be searched is narrowed down, improving the search efficiency. In a scenario where the voice messages to be searched include multiple voice messages generated over a very long period of time, this implementation mode can significantly accelerate searches.
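Restricting the search to a selected time segment can be sketched in the same way. The `SEGMENTS` table, the "recent month" label, and the interpretation of "recent week" as the last seven days are assumptions for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical candidate time segments offered to the user.
SEGMENTS = {"recent week": timedelta(days=7), "recent month": timedelta(days=30)}


def restrict_to_segment(messages, now, segment):
    """Keep only the voice messages generated within the selected time segment."""
    cutoff = now - segment
    return [m for m in messages if cutoff <= m["created_at"] <= now]


now = datetime(2013, 11, 1, 12, 0)
messages = [
    {"id": "m1", "created_at": datetime(2013, 10, 30, 9, 0)},
    {"id": "m2", "created_at": datetime(2013, 10, 1, 9, 0)},
]
in_week = restrict_to_segment(messages, now, SEGMENTS["recent week"])
print([m["id"] for m in in_week])
```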

The following describes an embodiment of the device provided by the present disclosure. For details not given, see the corresponding method embodiments above.

See Figure 3, which shows the structural diagram for the voice message search device provided by an embodiment of the present disclosure. By using software, hardware, or a combination of software and hardware, the voice message search device can be implemented as all or part of a client or a terminal. The voice message search device 300 includes a hardware processor 302 and a non-transitory storage medium 304 configured to store the following modules: a search acquisition module 320, a text search module 340, and a result feedback module 360. The search acquisition module 320 is configured to obtain a text search keyword. The text search module 340 is configured to search for the text message that includes the search keyword from the text messages respectively corresponding to each voice message, with each text message generated based on the speech recognition result of the corresponding voice message. The result feedback module 360 is configured to feed back, as the search result, the voice message corresponding to the text message that includes the search keyword.
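One possible software arrangement of the three modules is sketched below. The class names mirror the module names in Figure 3, but the method names, constructor arguments, and in-memory dictionaries are assumptions rather than part of the disclosure.

```python
class SearchAcquisitionModule:
    def obtain(self, raw_input):
        """Obtain a text search keyword (here: from typed text only)."""
        return raw_input.strip()


class TextSearchModule:
    def __init__(self, transcripts):
        self.transcripts = transcripts  # message ID -> text message

    def search(self, keyword):
        """Return the IDs of the text messages that include the keyword."""
        return [mid for mid, text in self.transcripts.items() if keyword in text]


class ResultFeedbackModule:
    def __init__(self, voice_messages):
        self.voice_messages = voice_messages  # message ID -> stored audio

    def feed_back(self, message_ids):
        """Return the voice messages corresponding to the matching text messages."""
        return [self.voice_messages[mid] for mid in message_ids]


class VoiceMessageSearchDevice:
    def __init__(self, transcripts, voice_messages):
        self.acquisition = SearchAcquisitionModule()
        self.text_search = TextSearchModule(transcripts)
        self.feedback = ResultFeedbackModule(voice_messages)

    def search(self, raw_input):
        keyword = self.acquisition.obtain(raw_input)
        return self.feedback.feed_back(self.text_search.search(keyword))


device = VoiceMessageSearchDevice(
    {"m1": "Hello, this is John.", "m2": "Tomorrow let's go to the park."},
    {"m1": b"audio-1", "m2": b"audio-2"},
)
print(device.search(" Tomorrow let's go to "))
```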

To sum up, a voice message search device is provided by the embodiments. By obtaining a text search keyword, the voice message search device obtains the search result by searching for the text message that includes the search keyword from the text messages respectively corresponding to each voice message. This solves the problem of inefficient search using the voice message search method provided by the prior art, allowing a user to quickly and conveniently find the target voice message simply by inputting a search keyword.

Figure 4 shows the structural diagram for the voice message search device 300 provided by another embodiment of the present disclosure. By using software, hardware, or a combination of software and hardware, the voice message search device may be implemented as all or part of a client or a terminal. The voice message search device 300 includes a hardware processor 302 and a non-transitory storage medium 304 configured to store the following modules:

the search acquisition module 320, configured to obtain a text search keyword;

the text search module 340, configured to search for the text message that includes the search keyword from the text messages respectively corresponding to each voice message, with each text message generated based on the speech recognition result of the corresponding voice message;

the result feedback module 360, configured to feed back, as the search result, the voice message corresponding to the text message that includes the search keyword.

Optionally, the device may further include the text generation module 310.

The text generation module 310 is configured to perform speech recognition on each voice message to obtain respective speech recognition results, and to generate, based on the speech recognition results, the text messages respectively corresponding to each voice message;

or,

the text generation module 310 is configured to send each voice message to the server, and to receive the text messages fed back by the server that respectively correspond to each voice message, each text message being generated based on the speech recognition result obtained after the server performs speech recognition on the corresponding voice message;

or,

the text generation module 310 is configured to receive a voice message sent by another client and forwarded by the server, together with the text message corresponding to the voice message, the text message being generated based on the speech recognition result obtained after the server performs speech recognition on the voice message; and/or, after sending a local voice message, to receive the text message fed back by the server corresponding to the voice message, the text message being generated based on the speech recognition result obtained after the server performs speech recognition on the voice message.
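The three alternatives for the text generation module 310 can be sketched as follows. This is a hedged illustration only: the `Recognizer` type, the method names, and the in-memory dictionary are hypothetical, and the server round trip is stood in for by an ordinary callable rather than any real network API.

```python
from typing import Callable, Dict, Optional

# Hypothetical type: any function that maps audio bytes to recognized text.
Recognizer = Callable[[bytes], str]


class TextGenerationModule:
    """Sketch of module 310: keeps one text message per voice message ID."""

    def __init__(self,
                 local_recognizer: Optional[Recognizer] = None,
                 server_recognizer: Optional[Recognizer] = None) -> None:
        self.local_recognizer = local_recognizer
        # Stands in for "send to the server, receive the text fed back".
        self.server_recognizer = server_recognizer
        self.texts: Dict[int, str] = {}

    def generate_locally(self, message_id: int, audio: bytes) -> str:
        # First alternative: recognition is performed on the client.
        if self.local_recognizer is None:
            raise RuntimeError("no local recognizer configured")
        self.texts[message_id] = self.local_recognizer(audio)
        return self.texts[message_id]

    def generate_via_server(self, message_id: int, audio: bytes) -> str:
        # Second alternative: the server performs recognition on request.
        if self.server_recognizer is None:
            raise RuntimeError("no server recognizer configured")
        self.texts[message_id] = self.server_recognizer(audio)
        return self.texts[message_id]

    def store_received(self, message_id: int, text: str) -> None:
        # Third alternative: the text message arrives from the server
        # alongside a forwarded or just-sent voice message.
        self.texts[message_id] = text
```

Whichever alternative is used, the outcome in this sketch is the same: a text message stored against each voice message, ready to be searched.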

Optionally, the text search module 340 may include a message sorting module 342 and a sorting search module 344. The message sorting module 342 is configured to sort, according to preset conditions, the text messages corresponding to the voice messages to be searched, the preset conditions including at least one of the following: the time corresponding to each voice message, the priority of the contact corresponding to each voice message, and the data size of each text message. The sorting search module 344 is configured to search the sorted text messages for the text message that includes the search keyword.
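A minimal sketch of sorting before searching is given below. The `TextMessage` fields, the priority table, and the particular ordering (higher contact priority first, then newer messages, then smaller texts) are assumptions chosen for illustration; the disclosure only lists the conditions that may be used and does not fix their direction or combination.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TextMessage:
    """Hypothetical text message derived from one voice message."""
    message_id: int
    contact: str
    timestamp: float  # time corresponding to the voice message
    text: str


def sort_messages(messages: List[TextMessage],
                  contact_priority: Dict[str, int]) -> List[TextMessage]:
    # Message sorting module 342 (sketch): order by contact priority
    # (descending), then time (newest first), then data size (smallest first).
    return sorted(
        messages,
        key=lambda m: (
            -contact_priority.get(m.contact, 0),
            -m.timestamp,
            len(m.text.encode("utf-8")),
        ),
    )


def search_sorted(sorted_messages: List[TextMessage], keyword: str) -> List[int]:
    # Sorting search module 344 (sketch): scan in sorted order so that
    # matches are reported in that same order.
    return [m.message_id for m in sorted_messages if keyword in m.text]
```

Because the scan follows the sorted order, the results that the preset conditions rank highest are found and reported first.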

Optionally, the text search module 340 may further include a contact selection module and a contact determination module (not shown in the figure). The contact selection module is configured to receive a selection signal for selecting a target contact from at least two contacts related to the current interface. The contact determination module is configured to determine the voice messages that belong to the selected target contact as the voice messages to be searched.
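Narrowing the search scope by contact might look like the following sketch. The selection signal is modeled as a simple list index and the messages are grouped in a dictionary keyed by contact name; both are assumptions made here for illustration.

```python
from typing import Dict, List, Sequence


def select_target_contact(candidates: Sequence[str], selected_index: int) -> str:
    # Contact selection module (sketch): the selection signal is modeled
    # as an index into the contacts related to the current interface.
    if len(candidates) < 2:
        raise ValueError("at least two candidate contacts are expected")
    if not 0 <= selected_index < len(candidates):
        raise IndexError("selection is outside the candidate list")
    return candidates[selected_index]


def voice_messages_to_search(messages_by_contact: Dict[str, List[int]],
                             target_contact: str) -> List[int]:
    # Contact determination module (sketch): only the messages of the
    # selected contact are passed on to the search.
    return list(messages_by_contact.get(target_contact, []))
```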

Optionally, the text search module 340 may further include a time segment selection module and a time segment determination module (not shown in the figure). The time segment selection module is configured to receive a selection signal for selecting a target time segment from at least two preset candidate time segments. The time segment determination module is configured to determine the voice messages that belong to the selected target time segment as the voice messages to be searched.
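Narrowing the scope by time segment can be sketched in the same spirit. The two candidate segments ("last_day" and "last_week") and the inclusive boundary test are hypothetical choices; the disclosure only requires at least two preset candidates.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Segment = Tuple[datetime, datetime]


def candidate_segments(now: datetime) -> Dict[str, Segment]:
    # Hypothetical preset candidate time segments, measured back from `now`.
    return {
        "last_day": (now - timedelta(days=1), now),
        "last_week": (now - timedelta(weeks=1), now),
    }


def messages_in_segment(messages: List[Tuple[int, datetime]],
                        segment: Segment) -> List[int]:
    # Time segment determination module (sketch): keep the voice messages
    # whose time falls inside the selected segment (bounds inclusive).
    start, end = segment
    return [message_id for message_id, ts in messages if start <= ts <= end]
```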

Optionally, the search acquisition module 320 is configured to obtain the search keyword directly input as text;

or,

the search acquisition module 320 is configured to obtain search voice signals input as voice, and to use a speech recognition technology to identify the search keyword in text format from the search voice signals;

or,

the search acquisition module 320 is configured to obtain search voice signals input as voice, send the search voice signals to the server, and receive the search keyword fed back by the server, the search keyword being identified by the server from the search voice signals by using a speech recognition technology.
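The three ways of obtaining the search keyword can be sketched as one function. As assumptions of this sketch, text input is represented as `str`, voice input as `bytes`, and both recognizers are hypothetical callables; preferring the local recognizer over the server when both are available is also an arbitrary choice not stated in the disclosure.

```python
from typing import Callable, Optional, Union

# Hypothetical type: any function that maps audio bytes to recognized text.
Recognizer = Callable[[bytes], str]


def acquire_keyword(user_input: Union[str, bytes],
                    local_recognizer: Optional[Recognizer] = None,
                    server_recognizer: Optional[Recognizer] = None) -> str:
    # First alternative: the keyword is typed directly as text.
    if isinstance(user_input, str):
        return user_input.strip()
    # Second alternative: voice input recognized on the client.
    if local_recognizer is not None:
        return local_recognizer(user_input).strip()
    # Third alternative: voice input sent to the server, which feeds
    # back the recognized keyword (modeled here as a plain call).
    if server_recognizer is not None:
        return server_recognizer(user_input).strip()
    raise ValueError("voice input requires a recognizer")
```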

It should be noted that the above-mentioned embodiment describes the voice message search performed by the voice message search device using only the above division of function modules as an example. In actual application, the above-mentioned functions can be assigned to different function modules as needed. That is, the internal structure of the device can be divided into different function modules to complete all or some of the above-mentioned functions. In addition, the voice message search device and the voice message search method provided by the above-mentioned embodiments are based on the same concept. For implementation details, see the description of the method embodiments above.

Figure 5 shows the structural diagram for the voice message search system provided by an embodiment of the present disclosure. The voice message search system comprises at least a client 520 and a server 540. The client 520 and server 540 are interconnected using a wireless network or wired network.

The client 520 includes the voice message search device provided by the embodiment as shown in Figure 3 or Figure 4.

The sequence numbers of the above-mentioned embodiments are intended only for description and do not indicate any order of preference among the embodiments.

Those of ordinary skill in the art will understand that all or some of the steps described in the above-mentioned embodiments can be completed by hardware, or by related hardware as instructed by a program. The program can be stored on a computer-readable storage medium. The storage medium can be a Read-Only Memory (ROM), a magnetic disk, or an optical disk.

While the present disclosure has been particularly disclosed and described above with reference to preferred embodiments, it should be understood that the description is not intended to limit the present disclosure. Any modifications, equivalent substitutions, and improvements made without departing from the spirit or principle of the present disclosure shall fall within the scope of the present disclosure.

A person of skill in the art will appreciate that all or part of the methods in the above embodiments may be implemented by relevant hardware under the instruction of a computer program, and that the program may be stored in a computer-readable storage medium. When the program is executed, it may carry out the processes of the method embodiments above. The storage medium may be a diskette, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

The above are only some of the preferred embodiments of the disclosure, which are described specifically and in detail but are not intended to limit the scope of the disclosure. It should be noted that a person of skill in the art can make various changes and modifications within the scope of the disclosure; therefore, the protection scope of the present disclosure is defined by the claims.

Claims

1. A method for searching for a voice message in a terminal, comprising:

obtaining, by the terminal, a text search keyword;

searching, by the terminal, for a text message that comprises the text search keyword from text messages respectively corresponding to each voice message, with each text message generated based on a speech recognition result of the corresponding voice message; and

feeding back, by the terminal, as a search result the voice message corresponding to the text message that comprises the text search keyword.
2. The method according to claim 1, wherein before searching for the text message that comprises the text search keyword from the text messages respectively corresponding to each voice message, the method further comprises at least one of the following:

performing speech recognition on each voice message to obtain respective speech recognition results, and generating, based on the speech recognition results, the text messages respectively corresponding to each voice message;

sending each voice message to a server, and receiving the text messages returned by the server that respectively correspond to each voice message, each text message being generated based on the speech recognition result obtained after the server performs speech recognition on the corresponding voice message; and

receiving the voice message sent by another client and forwarded by the server and the text message corresponding to the voice message, the text message being generated based on the speech recognition result obtained after the server performs speech recognition on the voice message; and/or, after sending the voice message from the client, receiving the text message returned by the server corresponding to the voice message, the text message being generated based on the speech recognition result obtained after the server performs speech recognition on the voice message.
3. The method according to claim 1, wherein searching for the text message that comprises the text search keyword from the text messages respectively corresponding to each voice message comprises:

according to preset conditions, sorting the text messages corresponding to each voice message to be searched, the preset conditions comprising at least one of the following: generation time corresponding to each voice message, priorities of contacts corresponding to each voice message, and data sizes of each text message; and

from the sorted text messages, searching for the text message that comprises the text search keyword.
4. The method according to claim 3, wherein before sorting, according to the preset conditions, the text messages corresponding to each voice message to be searched, the method further comprises:

receiving a selection signal for selecting a target contact from at least two contacts related to a current interface; and

determining the voice messages that belong to the selected target contact as the voice messages to be searched.
5. The method according to claim 3, wherein before sorting, according to the preset conditions, the text messages corresponding to each voice message to be searched, the method further comprises:

receiving a selection signal for selecting a target time segment from at least two preset candidate time segments; and

determining the voice messages that belong to the selected target time segment as the voice messages to be searched.
6. The method according to any one of claims 1 to 5, wherein obtaining the text search keyword comprises at least one of the following:

obtaining the text search keyword directly input as text;

obtaining search voice signals input as voice, and using a speech recognition technology to identify the text search keyword in text format from the search voice signals; and

obtaining search voice signals input as voice, sending the search voice signals to a server, and receiving the text search keyword returned by the server, with the text search keyword identified by the server from the search voice signals by using a speech recognition technology.
7. A device comprising a processor and a non-transitory storage medium configured to store modules comprising:

a search acquisition module, configured to obtain a text search keyword;

a text search module, configured to search for a text message that comprises the text search keyword from text messages respectively corresponding to each voice message, with each text message generated based on a speech recognition result of the corresponding voice message; and

a result feedback module, configured to feed back as a search result the voice message corresponding to the text message that comprises the text search keyword.
8. The device according to claim 7, further comprising a text generation module configured to perform at least one of the following:

perform speech recognition on each voice message to obtain respective speech recognition results, and generate, based on the speech recognition results, the text messages respectively corresponding to each voice message;

send each voice message to a server, and receive the text messages returned by the server that respectively correspond to each voice message, each text message being generated based on the speech recognition result obtained after the server performs speech recognition on the corresponding voice message; and

receive the voice message sent by another client and forwarded by the server and the text message corresponding to the voice message, the text message being generated based on the speech recognition result obtained after the server performs speech recognition on the voice message; and/or, after sending the voice message, receive the text message returned by the server corresponding to the voice message, the text message being generated based on the speech recognition result obtained after the server performs speech recognition on the voice message.
9. The device according to claim 7, wherein the text search module further comprises:

a message sorting module, configured to, according to preset conditions, sort the text messages corresponding to each voice message to be searched, the preset conditions comprising at least one of the following: generation time corresponding to each voice message, priorities of contacts corresponding to each voice message, and data sizes of each text message; and

a sorting search module, configured to, from the sorted text messages, search for the text message that comprises the text search keyword.
10. The device according to claim 9, wherein the text search module further comprises a contact selection module and a contact determination module;

the contact selection module is configured to receive a selection signal for selecting a target contact from at least two contacts related to a current interface; and

the contact determination module is configured to determine the voice messages that belong to the selected target contact as the voice messages to be searched.
11. The device according to claim 9, wherein the text search module further comprises a time segment selection module and a time segment determination module;

the time segment selection module is configured to receive a selection signal for selecting a target time segment from at least two preset candidate time segments; and

the time segment determination module is configured to determine the voice messages that belong to the selected target time segment as the voice messages to be searched.
12. The device according to any one of claims 7 to 11, wherein the search acquisition module is configured to perform at least one of the following:

obtain the text search keyword directly input as text;

obtain search voice signals input as voice, and use a speech recognition technology to identify the text search keyword in text format from the search voice signals; and

obtain search voice signals input as voice, send the search voice signals to a server, and receive the text search keyword returned by the server, with the text search keyword identified by the server from the search voice signals by using a speech recognition technology.
13. A non-transitory storage medium storing a set of instructions for searching for a voice message in a device having a processor, the set of instructions to direct the processor to perform acts of:

obtaining a text search keyword;

searching for a text message that comprises the text search keyword from text messages respectively corresponding to each voice message, with each text message generated based on a speech recognition result of the corresponding voice message; and

feeding back as a search result the voice message corresponding to the text message that comprises the text search keyword.
14. The non-transitory storage medium according to claim 13, the set of instructions to direct the processor to perform at least one of the following:

performing speech recognition on each voice message to obtain respective speech recognition results, and generating, based on the speech recognition results, the text messages respectively corresponding to each voice message;

sending each voice message to a server, and receiving the text messages returned by the server that respectively correspond to each voice message, each text message being generated based on the speech recognition result obtained after the server performs speech recognition on the corresponding voice message; and

receiving the voice message sent by another client and forwarded by the server and the text message corresponding to the voice message, the text message being generated based on the speech recognition result obtained after the server performs speech recognition on the voice message; and/or, after sending the voice message from the client, receiving the text message returned by the server corresponding to the voice message, the text message being generated based on the speech recognition result obtained after the server performs speech recognition on the voice message.
15. The non-transitory storage medium according to claim 13, wherein searching for the text message that comprises the text search keyword from the text messages respectively corresponding to each voice message comprises:

according to preset conditions, sorting the text messages corresponding to each voice message to be searched, the preset conditions comprising at least one of the following: generation time corresponding to each voice message, priorities of contacts corresponding to each voice message, and data sizes of each text message; and

from the sorted text messages, searching for the text message that comprises the text search keyword.
16. The non-transitory storage medium according to claim 15, the set of instructions to direct the processor to perform acts of:

receiving a selection signal for selecting a target contact from at least two contacts related to a current interface; and

determining the voice messages that belong to the selected target contact as the voice messages to be searched.
17. The non-transitory storage medium according to claim 15, the set of instructions to direct the processor to perform acts of:

receiving a selection signal for selecting a target time segment from at least two preset candidate time segments; and

determining the voice messages that belong to the selected target time segment as the voice messages to be searched.
18. The non-transitory storage medium according to any one of claims 13 to 17, wherein obtaining the text search keyword comprises at least one of the following:

obtaining the text search keyword directly input as text;

obtaining search voice signals input as voice, and using a speech recognition technology to identify the text search keyword in text format from the search voice signals; and

obtaining search voice signals input as voice, sending the search voice signals to a server, and receiving the text search keyword returned by the server, with the text search keyword identified by the server from the search voice signals by using a speech recognition technology.