CN111242721A

CN111242721A - Voice meal ordering method and device, electronic equipment and storage medium

Info

Publication number: CN111242721A
Application number: CN201911401837.6A
Authority: CN
Inventors: 胡江鹭; 李和瀚; 孙辉丰; 丁鑫哲; 孙叔琦; 孙珂; 李婷婷
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2019-12-30
Filing date: 2019-12-30
Publication date: 2020-06-05
Anticipated expiration: 2039-12-30
Also published as: CN111242721B

Abstract

The application discloses a voice meal ordering method and device, and relates to the technical field of artificial intelligence. The specific implementation scheme is as follows: the user's voice is collected, and speech recognition and semantic recognition are performed on it to obtain semantic information, which includes intent information and slot information; when the intent information is ordering, an object to be ordered is determined according to the semantic information and pre-order information for that object is generated; the pre-order information is compared with a necessary information list corresponding to the object to be ordered to judge whether any necessary information is missing; and when no necessary information is missing, an order placing operation is executed by combining the pre-order information with the user's personal information.

Description

Voice meal ordering method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of computer application technologies, in particular to artificial intelligence, and more specifically to a voice meal ordering method and apparatus, an electronic device, and a computer-readable storage medium.

Background

Online ordering is a common need in daily life, and there are currently two main ways to order online. In the first, the user contacts a human customer service agent or the merchant directly to place the order. In the second, the user orders by themselves on a visual page of a meal ordering application or website. The second way requires the user to pick items from many products on a menu and then place the order and pay; this series of steps is cumbersome and ordering efficiency is low. The first way saves the user time and effort and is convenient and fast, but the merchant needs many customer service staff to respond to users' requests promptly, which is labor-intensive and costly.

Disclosure of Invention

The application provides a voice ordering method, a voice ordering apparatus, an electronic device and a computer-readable storage medium. By performing speech recognition, semantic understanding and order management on the user's voice, the user's order information is determined and the order is completed. This avoids staffing the merchant side with a large number of customer service personnel, simplifies the user's ordering process, reduces ordering cost and improves ordering efficiency.

An embodiment of a first aspect of the present application provides a voice ordering method, including: collecting the user's voice and performing speech recognition and semantic recognition on it to obtain semantic information, the semantic information including intent information and slot information; when the intent information is ordering, determining an object to be ordered according to the semantic information and generating pre-order information for the object to be ordered; comparing the pre-order information with a necessary information list corresponding to the object to be ordered to judge whether the pre-order information lacks any necessary information; and when the pre-order information lacks no necessary information, executing an order placing operation by combining the pre-order information with the user's personal information.

In an embodiment of the present application, the voice ordering method further includes: when the pre-order information lacks necessary information, obtaining the missing first necessary information, generating an inquiry voice based on it, and playing the inquiry voice to the user so as to obtain the first necessary information from the user's spoken reply; and updating the pre-order information with the first necessary information, repeating until the pre-order information lacks no necessary information.

In an embodiment of the present application, the voice ordering method further includes: when the intent information is ordering but the object to be ordered cannot be determined from the semantic information, generating a voice asking about the object to be ordered and playing it to the user, so as to determine the object to be ordered from the user's spoken reply.

In an embodiment of the present application, the voice ordering method further includes: when the intent information is recommendation, determining an object to be recommended based on the semantic information; and generating a recommendation voice for the object to be recommended and playing it to the user so that the user can make a selection.

In an embodiment of the present application, the voice ordering method further includes: if the user's voice is collected while a voice is being played, performing speech recognition and semantic recognition on the collected voice to obtain a speech recognition result and a semantic recognition result; when the speech recognition result or the semantic recognition result meets a preset interruption condition, stopping playback and processing the collected voice, where the preset interruption condition includes any one or more of the following: the number of words in the speech recognition result exceeds a preset word-count threshold, the speech recognition result contains a preset interruption keyword, or the semantic recognition result contains intent information; and resuming playback after the collected voice has been processed.

With the voice meal ordering method according to the embodiments of the present application, the user's voice is collected and speech recognition and semantic recognition are performed on it to obtain semantic information, which includes intent information and slot information; when the intent information is ordering, an object to be ordered is determined according to the semantic information and pre-order information for it is generated; the pre-order information is compared with the necessary information list corresponding to the object to be ordered to judge whether it lacks any necessary information; and when it lacks none, an order placing operation is executed by combining the pre-order information with the user's personal information. By performing speech recognition, semantic understanding and order management on the user's voice, the method determines the user's order information and completes the order, which avoids staffing the merchant side with a large number of customer service personnel, simplifies the user's ordering process, reduces ordering cost and improves ordering efficiency.

An embodiment of a second aspect of the present application provides a voice meal ordering apparatus, including: an acquisition module, configured to collect the user's voice and perform speech recognition and semantic recognition on it to obtain semantic information, the semantic information including intent information and slot information; a generating module, configured to, when the intent information is ordering, determine an object to be ordered according to the semantic information and generate pre-order information for it; a comparison module, configured to compare the pre-order information with a necessary information list corresponding to the object to be ordered and judge whether the pre-order information lacks any necessary information; and an ordering module, configured to execute an order placing operation by combining the pre-order information with the user's personal information when the pre-order information lacks no necessary information.

In an embodiment of the present application, the voice ordering apparatus further includes a first acquisition module and an updating module. The first acquisition module is configured to, when the pre-order information lacks necessary information, obtain the missing first necessary information, generate an inquiry voice based on it, and play the inquiry voice to the user so as to obtain the first necessary information from the user's spoken reply; the updating module is configured to update the pre-order information with the first necessary information until the pre-order information lacks no necessary information.

In an embodiment of the application, the generating module is further configured to, when the intent information is ordering but the object to be ordered cannot be determined from the semantic information, generate a voice asking about the object to be ordered and play it to the user, so as to determine the object to be ordered from the user's spoken reply.

In an embodiment of the present application, the voice ordering apparatus further includes a determining module, configured to determine an object to be recommended based on the semantic information when the intent information is recommendation; the generating module is further configured to generate a recommendation voice for the object to be recommended and play it to the user so that the user can make a selection.

In an embodiment of the present application, the voice ordering apparatus further includes a second acquisition module and a processing module. The second acquisition module is configured to, if the user's voice is collected while a voice is being played, perform speech recognition and semantic recognition on the collected voice to obtain a speech recognition result and a semantic recognition result. The processing module is configured to stop playback and process the collected voice when the speech recognition result or the semantic recognition result meets a preset interruption condition, where the preset interruption condition includes any one or more of the following: the number of words in the speech recognition result exceeds a preset word-count threshold, the speech recognition result contains a preset interruption keyword, or the semantic recognition result contains intent information; the processing module is further configured to resume playback after the collected voice has been processed.

With the voice meal ordering apparatus according to the embodiments of the present application, the user's voice is collected and speech recognition and semantic recognition are performed on it to obtain semantic information, which includes intent information and slot information; when the intent information is ordering, an object to be ordered is determined according to the semantic information and pre-order information for it is generated; the pre-order information is compared with the necessary information list corresponding to the object to be ordered to judge whether it lacks any necessary information; and when it lacks none, an order placing operation is executed by combining the pre-order information with the user's personal information. By performing speech recognition, semantic understanding and order management on the user's voice, the apparatus determines the user's order information and completes the order, which avoids staffing the merchant side with a large number of customer service personnel, simplifies the user's ordering process, reduces ordering cost and improves ordering efficiency.

An embodiment of a third aspect of the present application provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to perform the voice ordering method of the embodiments of the present application.

An embodiment of a fourth aspect of the present application provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the voice ordering method of the embodiments of the present application.

Other effects of the above optional implementations will be described below with reference to specific embodiments.

Drawings

The drawings are provided for a better understanding of the present solution and do not limit the present application. In the drawings:

FIG. 1 is a schematic diagram according to a first embodiment of the present application;

FIG. 2 is a schematic diagram of recommendation according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a recommendation flow according to an embodiment of the present application;

FIG. 4 is a schematic diagram according to a second embodiment of the present application;

FIG. 5 is a schematic diagram according to a third embodiment of the present application;

FIG. 6 is a schematic diagram according to a fourth embodiment of the present application;

FIG. 7 is a schematic diagram according to a fifth embodiment of the present application;

FIG. 8 is a block diagram of an electronic device for implementing the voice ordering method according to an embodiment of the present application.

Detailed Description

The following describes exemplary embodiments of the present application with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.

A voice ordering method, an apparatus, an electronic device, and a computer-readable storage medium according to embodiments of the present application are described below with reference to the accompanying drawings. The voice meal ordering method in the embodiments of the present application is executed by a voice meal ordering apparatus.

Fig. 1 is a schematic diagram according to a first embodiment of the present application.

As shown in fig. 1, the voice ordering method may include:

Step 101: collect the user's voice, perform speech recognition and semantic recognition on it, and obtain semantic information.

In the embodiments of the present application, the voice meal ordering apparatus may provide a voice input interface through which the voice data input by the user is obtained. For example, a mobile terminal collects the user's voice through a microphone and uploads the collected voice data to the voice ordering apparatus through the voice input interface, so that the apparatus obtains the user's voice data.

Speech recognition and semantic recognition are then performed on the collected voice data to obtain semantic information, which may include, but is not limited to, intent information and slot information. For example, for "give me a cup of full-sugar hot latte", the parsed intent information is ordering, and the slot information is: one cup (quantity), full sugar (sugar level), hot (temperature), latte (drink type).
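The semantic information produced by step 101 can be pictured as one intent label plus a map of named slots. The sketch below is only an illustration of that shape; the class name `SemanticInfo` and the slot keys (`quantity`, `sugar`, `temperature`, `drink`) are hypothetical labels chosen here, not names taken from the application.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SemanticInfo:
    """Hypothetical container for the output of speech + semantic recognition."""
    intent: Optional[str]  # e.g. "order" or "recommend"; None if no intent was recognized
    slots: Dict[str, str] = field(default_factory=dict)  # slot name -> slot value


# Illustrative result for "give me a cup of full-sugar hot latte"
latte = SemanticInfo(
    intent="order",
    slots={
        "quantity": "1 cup",
        "sugar": "full sugar",
        "temperature": "hot",
        "drink": "latte",
    },
)

print(latte.intent, latte.slots["drink"])  # order latte
```

Keeping slots as a plain name-to-value map lets later steps (comparison with the necessary information list, follow-up questions) work on any object to be ordered without a fixed schema.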

As an example, speech recognition and semantic recognition may be performed on the collected voice using speech recognition technology and semantic recognition technology, respectively. The speech recognition technology may be ASR (Automatic Speech Recognition), for example Baidu speech recognition. Semantic recognition techniques may include, but are not limited to: semantic recognition based on combined semantic derivation, semantic recognition based on an intent classification model and a slot entity recognition model, semantic recognition based on a joint intent-slot recognition model, semantic recognition based on sample instance generalization, and semantic recognition with reference resolution.

The collected voice of the user can therefore be semantically recognized with different techniques, illustrated by the following examples:

First example: semantic recognition is performed on the collected voice based on combined semantic derivation, that is, the user's utterance is parsed against requirement structures derived from rules. Requirements of the same type expressed by users usually follow a certain pattern; requirements sharing a pattern can be summarized into a template, and the user's requirement can then be described by that template. For example, "I want one cup of coffee" and "give me two cups of milk tea" share the same pattern and can be summarized as the template [kw_wait: want] [kw_num: quantity] [kw_drink: drink].

It should be noted that when there is no corpus (or only a small one), a large number of templates can be built quickly by summarizing which component segments user requirements decompose into; these templates are then used in the combined semantic derivation to parse out intent information and slot information. The corpus may be collected in advance by relevant personnel.
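One way to picture template-based parsing is to compile each template into a regular expression whose named groups are the template's keyword slots. The sketch below assumes English text and small hand-written keyword dictionaries (`LEXICON` is hypothetical); only the slot names `kw_wait`, `kw_num` and `kw_drink` come from the template above.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical keyword dictionaries; a real deployment would curate these per merchant.
LEXICON: Dict[str, List[str]] = {
    "kw_wait": ["i want", "give me"],
    "kw_num": ["one cup of", "two cups of", "a cup of"],
    "kw_drink": ["coffee", "milk tea", "orange juice"],
}

# Each template: (intent it signals, ordered keyword slots it is built from).
TEMPLATES: List[Tuple[str, List[str]]] = [
    ("order", ["kw_wait", "kw_num", "kw_drink"]),
]


def compile_template(parts: List[str]) -> "re.Pattern[str]":
    """Turn a template into a regex with one named group per keyword slot."""
    groups = []
    for name in parts:
        # Longest alternatives first so longer phrases win over their prefixes.
        words = sorted(LEXICON[name], key=len, reverse=True)
        groups.append(f"(?P<{name}>" + "|".join(re.escape(w) for w in words) + ")")
    return re.compile(r"\s+".join(groups))


def parse(text: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (intent, slots) for the first template matching the whole utterance."""
    normalized = text.lower().strip()
    for intent, parts in TEMPLATES:
        match = compile_template(parts).fullmatch(normalized)
        if match:
            return intent, match.groupdict()
    return None


print(parse("I want one cup of coffee"))
# ('order', {'kw_wait': 'i want', 'kw_num': 'one cup of', 'kw_drink': 'coffee'})
```

Because a template only names keyword categories, adding one new drink to `kw_drink` lets every template that uses that slot recognize it, which is why this approach works without a training corpus.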

Second example: semantic recognition is performed on the collected voice based on an intent classification model and a slot entity recognition model. That is, the recognized text of the collected voice is input into the intent classification model to obtain the corresponding intent information, and into the slot entity recognition model to obtain the corresponding slot information. To ensure the accuracy of both models, they need to be trained on a large corpus.

Third example: semantic recognition is performed on the collected voice based on a joint intent-slot recognition model. That is, the collected voice is input into the joint model, which outputs the corresponding intent information and slot information simultaneously. To ensure its accuracy, the joint model needs to be trained on a large corpus, with its loss function optimized during training.
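The application does not specify the joint model's output format. A common choice for joint intent-slot models is one intent label per utterance plus one BIO tag per token; assuming that format, the sketch below shows how such a prediction could be decoded into intent and slot information. The function name, the tag set and the example tags are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def decode_joint_output(
    intent: str, tokens: List[str], tags: List[str]
) -> Tuple[str, Dict[str, str]]:
    """Turn one joint-model prediction (intent label + per-token BIO tags) into intent and slots."""
    slots: Dict[str, str] = {}
    span_type: Optional[str] = None
    span_words: List[str] = []
    for token, tag in zip(tokens, tags):
        # An I- tag of the same type extends the open span.
        if tag.startswith("I-") and span_type == tag[2:]:
            span_words.append(token)
            continue
        # Any other tag closes the open span, if there is one.
        if span_type is not None:
            slots[span_type] = " ".join(span_words)
        # A B- tag opens a new span; O (or a stray I-) leaves no span open.
        if tag.startswith("B-"):
            span_type, span_words = tag[2:], [token]
        else:
            span_type, span_words = None, []
    if span_type is not None:
        slots[span_type] = " ".join(span_words)
    return intent, slots


tokens = ["give", "me", "a", "cup", "of", "full-sugar", "hot", "latte"]
tags = ["O", "O", "B-quantity", "I-quantity", "O", "B-sugar", "B-temperature", "B-drink"]
print(decode_joint_output("order", tokens, tags))
# ('order', {'quantity': 'a cup', 'sugar': 'full-sugar', 'temperature': 'hot', 'drink': 'latte'})
```

In this sketch a slot type that appears twice keeps only its last value; an utterance with several items (see the segmentation example later in this description) would need a list per slot type instead.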

Fourth example: semantic recognition is performed on the collected voice based on sample instance generalization. That is, for the currently collected voice, a template is automatically abstracted from labeled samples, and sentences with the same or similar patterns are generalized from that template to obtain the corresponding intent information and slot information. For example, "I want a cup of coffee" is labeled with intent information ordering and slot information one cup (quantity) and coffee (drink type); through sample instance generalization, "I want two cups of milk tea", which has the same sentence pattern as the labeled sample, can likewise be recognized with its intent information and slot information. The templates used for sample instance generalization can be produced in two ways: generated automatically, or constructed by relevant technical personnel.
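A minimal sketch of sample instance generalization, assuming English text and a hypothetical dictionary of known values per slot type (`SLOT_VALUES`): each labeled span in the sample is replaced by a group that accepts any known value of the same slot type, and the resulting pattern is matched against new utterances. This is one possible realization, not the application's own algorithm.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical slot dictionaries: known values for each slot type.
SLOT_VALUES: Dict[str, List[str]] = {
    "quantity": ["one cup of", "two cups of", "a cup of"],
    "drink": ["coffee", "milk tea", "latte", "orange juice"],
}


def abstract_template(sample: str, labels: Dict[str, str]) -> "re.Pattern[str]":
    """Build a pattern from a labeled sample; labels map a span of the sample to its slot type."""
    text = sample.lower()
    # Locate every labeled span (raises ValueError if a span is absent), left to right.
    spans = sorted((text.index(span.lower()), span.lower(), slot) for span, slot in labels.items())
    pieces, cursor = [], 0
    for start, span, slot in spans:
        pieces.append(re.escape(text[cursor:start]))  # literal words between slots
        values = sorted(SLOT_VALUES[slot], key=len, reverse=True)
        pieces.append(f"(?P<{slot}>" + "|".join(map(re.escape, values)) + ")")
        cursor = start + len(span)
    pieces.append(re.escape(text[cursor:]))
    return re.compile("".join(pieces))


def generalize(template: "re.Pattern[str]", intent: str, utterance: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Apply a generalized template; the labeled sample's intent carries over on a match."""
    match = template.fullmatch(utterance.lower().strip())
    return (intent, match.groupdict()) if match else None


template = abstract_template("I want one cup of coffee", {"one cup of": "quantity", "coffee": "drink"})
print(generalize(template, "order", "I want two cups of milk tea"))
# ('order', {'quantity': 'two cups of', 'drink': 'milk tea'})
```

Restricting each slot position to known values of its type keeps the generalization from swallowing neighboring words, which a wildcard such as `.+?` would do.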

Fifth example: reference resolution determines what a pronoun in the user's speech refers to. That is, on the basis of any of the first four examples, if the user's voice data contains a pronoun, reference resolution determines which noun or phrase the pronoun refers to, and the resolved content is used to obtain the corresponding intent information and slot information. Pronouns may include, but are not limited to, ordinal pronouns and fuzzy pronouns. An ordinal pronoun arises when, during a conversation, the voice ordering apparatus offers a multiple-choice question and the user answers by naming one of the options by its ordinal. For example, the apparatus says "Would you like coffee, milk tea or fruit juice?" and the user answers "The first", where the ordinal "first" refers to coffee. A fuzzy pronoun arises when the apparatus offers a multiple-choice question and the user indicates one option by a keyword. For example, the apparatus says "Would you like pearl milk tea, red bean fresh milk tea or mango green?" and the user answers "Red bean, then", where "red bean" refers to red bean fresh milk tea.
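Both kinds of pronoun are resolved against the options the apparatus has just offered. The sketch below is a simplified, keyword-overlap version of that idea, assuming English replies; the ordinal vocabulary and the scoring rule are hypothetical choices, and an ambiguous reply returns `None` so the apparatus can ask again.

```python
import re
from typing import List, Optional

# Hypothetical ordinal vocabulary; "last" is resolved from the end of the option list.
ORDINALS = {"first": 0, "second": 1, "third": 2, "last": -1}


def resolve_reference(reply: str, options: List[str]) -> Optional[str]:
    """Map a reply onto one of the options just offered, or None if unresolved."""
    words = re.findall(r"[a-z']+", reply.lower())
    # Ordinal pronoun: the user names an option by its position.
    for word in words:
        if word in ORDINALS:
            index = ORDINALS[word]
            if -len(options) <= index < len(options):
                return options[index]
    # Fuzzy pronoun: the user names an option by keywords it contains.
    scores = [sum(word in option.lower().split() for word in words) for option in options]
    best = max(scores, default=0)
    if best > 0 and scores.count(best) == 1:
        return options[scores.index(best)]
    return None  # unresolved or ambiguous: the apparatus should ask again


drinks = ["pearl milk tea", "red bean fresh milk tea", "mango green"]
print(resolve_reference("The first", ["coffee", "milk tea", "fruit juice"]))  # coffee
print(resolve_reference("Red bean, then", drinks))  # red bean fresh milk tea
print(resolve_reference("Milk tea", drinks))  # None
```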

Step 102: when the intent information is ordering, determine an object to be ordered according to the semantic information and generate pre-order information for the object to be ordered.

In the embodiments of the present application, when the intent information is ordering, whether the object to be ordered can be determined is judged according to the semantic information.

As an example, when the intent information is ordering and the object to be ordered can be determined from the semantic information, the object is determined and its pre-order information is generated; in the embodiments of the present application, the pre-order information of the object to be ordered can be formed from the slot information in the semantic information. For example, in "I want one cup of coffee", the object to be ordered is "coffee", and the corresponding pre-order information is "one cup" and "coffee".

As another example, when the intent information is ordering but the object to be ordered cannot be determined from the semantic information, a voice asking about the object to be ordered is generated and played to the user, and the object is determined from the user's spoken reply. For example, the user says "I want a drink"; since "drink" is too broad, the voice ordering apparatus asks which drink the user wants, and the user replies with a specific type (e.g., orange juice), so that the object to be ordered can be determined.

Step 103: compare the pre-order information with the necessary information list corresponding to the object to be ordered to judge whether the pre-order information lacks any necessary information.

Step 104: when the pre-order information lacks no necessary information, execute an order placing operation by combining the pre-order information with the user's personal information.

In the embodiments of the present application, the pre-order information can be compared with the necessary information list corresponding to the object to be ordered to judge whether it lacks any necessary information. The necessary information list for each object to be ordered may be stored in advance. As shown in Table 1, when the object to be ordered is coffee, its necessary information list may include the primary category (coffee), the secondary category (latte), temperature (hot), cup size (small), sugar level (full sugar), add-ons (none), and so on.

Table 1
Primary category	Secondary category	Temperature	Cup size	Sugar level	Add-ons
Coffee	Latte	Hot	Small	Full sugar	None
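Step 103 reduces to a set difference between the fields a pre-order must contain and the fields it already has. A minimal sketch, assuming pre-orders are stored as field-to-value dictionaries; the field names below are hypothetical keys standing in for the columns of Table 1.

```python
from typing import Dict, List

# Hypothetical necessary-information lists, keyed by object to be ordered (cf. Table 1).
NECESSARY_INFO: Dict[str, List[str]] = {
    "coffee": ["primary", "secondary", "temperature", "cup_size", "sugar", "add_ons"],
}


def missing_fields(object_to_order: str, pre_order: Dict[str, str]) -> List[str]:
    """Step 103: return the necessary fields still absent from the pre-order, in list order."""
    return [name for name in NECESSARY_INFO[object_to_order] if not pre_order.get(name)]


complete = {"primary": "coffee", "secondary": "latte", "temperature": "hot",
            "cup_size": "small", "sugar": "full sugar", "add_ons": "none"}
print(missing_fields("coffee", complete))  # []
print(missing_fields("coffee", {"primary": "coffee", "temperature": "hot"}))
# ['secondary', 'cup_size', 'sugar', 'add_ons']
```

An explicit answer such as "none" for add-ons counts as filled, so "nothing else added" completes the pre-order rather than triggering another question.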

As an example, after the pre-order information is compared with the necessary information list corresponding to the object to be ordered, if no necessary information is missing, the order placing operation is executed by combining the pre-order information with the user's personal information.

For example, the user says "Give me a small full-sugar hot latte, nothing else added." The voice ordering apparatus performs speech recognition on the user's voice data, obtains semantic information, generates the corresponding pre-order information, and compares it with the necessary information list for the object to be ordered (coffee), such as primary category, secondary category, temperature, cup size, sugar level and add-ons. After determining that no necessary information is missing, the apparatus places the order by combining the pre-order information with the user's personal information (such as seat number, phone number, name and address).

As another example, when the pre-order information lacks necessary information, the missing first necessary information is obtained, an inquiry voice is generated based on it and played to the user, and the first necessary information is obtained from the user's spoken reply; the pre-order information is then updated with the first necessary information, repeating until no necessary information is missing.

For example, the user says "A cup of hot coffee". The voice ordering apparatus performs speech recognition on the voice data, obtains semantic information, generates the corresponding pre-order information, and compares it with the necessary information list for the object to be ordered (coffee), such as primary category, secondary category, temperature, cup size, sugar level and add-ons. It determines that necessary information is missing (for example, the secondary category, cup size, sugar level and add-ons), generates an inquiry voice for the missing information and plays it to the user; for example, the apparatus asks "Latte, mocha or Americano?" and the user answers "Mocha". The apparatus updates the pre-order according to the answer, determines which necessary information is still missing, and keeps asking and updating the pre-order according to the user's answers until no necessary information is missing.
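The ask-and-update cycle above, followed by step 104, can be sketched as a loop over the missing fields. This is a simplified sketch under stated assumptions: the field names and question wording are hypothetical, and the `ask` callable stands in for playing the inquiry voice and running speech and semantic recognition on the reply, returning the slot value directly.

```python
from typing import Callable, Dict, List

# Hypothetical necessary-information list and question wording (cf. Table 1).
NECESSARY_INFO: Dict[str, List[str]] = {
    "coffee": ["primary", "secondary", "temperature", "cup_size", "sugar", "add_ons"],
}
QUESTIONS: Dict[str, str] = {
    "primary": "Coffee, milk tea or fruit juice?",
    "secondary": "Latte, mocha or Americano?",
    "temperature": "Hot or room temperature?",
    "cup_size": "Small, medium or large?",
    "sugar": "How much sugar?",
    "add_ons": "Any add-ons?",
}


def complete_pre_order(pre_order: Dict[str, str], ask: Callable[[str], str]) -> Dict[str, str]:
    """Ask for the first missing necessary field, merge the reply, repeat until none is missing."""
    order = dict(pre_order)
    while True:
        missing = [name for name in NECESSARY_INFO[order["primary"]] if not order.get(name)]
        if not missing:
            return order
        # `ask` stands in for TTS playback plus ASR/NLU on the user's reply.
        order[missing[0]] = ask(QUESTIONS[missing[0]])


def place_order(order: Dict[str, str], personal_info: Dict[str, str]) -> Dict[str, str]:
    """Step 104: combine the completed pre-order with the user's personal information."""
    return {**order, **personal_info}


scripted_replies = {
    "Latte, mocha or Americano?": "mocha",
    "Small, medium or large?": "medium",
    "How much sugar?": "half sugar",
    "Any add-ons?": "none",
}
order = complete_pre_order({"primary": "coffee", "temperature": "hot"}, scripted_replies.__getitem__)
print(place_order(order, {"phone": "<phone number>", "address": "<address>"}))
```

A production loop would also cap the number of retries, since a reply that never yields a value would otherwise repeat the same question forever.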

It should be noted that, in the embodiments of the present application, each pre-order update is determined jointly by the current semantic information and the current pre-order state.

As an example, the user's voice data may be recognized to obtain the corresponding semantic information, the slot information in the semantic information may be segmented, and the pre-order may be updated according to the segmentation result and the current pre-order state. For example, the user says "Two Americanos, one medium and one small, the medium hot and the small at room temperature". The slot information is segmented as follows: (1) two cups [2], Americano [Americano], one [1], medium cup [medium cup], one [1], small cup [small cup]; (2) medium cup [medium cup], hot [hot]; (3) small cup [small cup], room temperature [room temperature]. Since segments (2) and (3) supplement segment (1), the updated pre-order information is as shown in Table 2:

Table 2
Primary category	Secondary category	Temperature	Cup size	Sugar level	Add-ons
Coffee	Americano	Hot	Medium cup		
Coffee	Americano	Room temperature	Small cup		
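The segmentation example above can be sketched as two operations: segment (1) creates one pre-order row per cup, and each supplementary segment is merged into the row it refers to. The sketch assumes, as a hypothetical rule, that a supplement refers to the row with the same cup size; the function names are illustrative only.

```python
from typing import Dict, List


def build_items(base: Dict[str, str], quantity: int, sizes: List[str]) -> List[Dict[str, str]]:
    """Segment (1): one pre-order row per cup, all sharing the drink slots in `base`."""
    if quantity != len(sizes):
        raise ValueError("quantity does not match the number of cup sizes")
    return [{**base, "cup_size": size} for size in sizes]


def apply_supplement(items: List[Dict[str, str]], supplement: Dict[str, str]) -> bool:
    """Segments (2), (3): merge extra slots into every row whose cup size matches.

    Returns False when no row matches, leaving the pre-order unchanged."""
    matched = False
    for item in items:
        if item["cup_size"] == supplement["cup_size"]:
            item.update(supplement)
            matched = True
    return matched


items = build_items({"primary": "coffee", "secondary": "Americano"}, 2, ["medium", "small"])
apply_supplement(items, {"cup_size": "medium", "temperature": "hot"})
apply_supplement(items, {"cup_size": "small", "temperature": "room temperature"})
for row in items:
    print(row)
```

Checking the stated quantity against the number of cup sizes is one way to use the current pre-order state when updating; an unmatched supplement is a natural point to ask the user a clarifying question.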

In summary, the user's order information is determined through speech recognition, semantic understanding and order management of the user's voice, and the order is completed. This avoids staffing the merchant side with a large number of customer service personnel, simplifies the user's ordering process, reduces ordering cost and improves ordering efficiency.

To make voice ordering more intelligent, a user who does not know which products are on the menu may actively ask the voice ordering apparatus for recommendations. Optionally, when the intent information is recommendation, an object to be recommended is determined based on the semantic information, and a recommendation voice is generated for it and played to the user so that the user can make a selection. For example, as shown in FIG. 2, in a coffee shop ordering scenario, recommendations are made according to whether slot information for the primary category and the secondary category appears in the user's semantic information.

For example, as shown in fig. 3, fig. 3 is a schematic diagram of a recommendation process according to an embodiment of the present application. In a coffee shop ordering scenario, beverage types such as coffee, milk tea and fruit juice are primary categories. Specific drinks such as latte, pearl milk tea and freshly squeezed orange juice are secondary categories. For example, suppose a user places a phone call to order. Once the conversation is opened, the voice ordering apparatus automatically responds with a welcome message. It then conducts a dialogue with the user to collect the information required for the order. When the user does not know which products are on the menu, the user asks the apparatus for recommendations. The apparatus then checks the semantic information corresponding to the user's voice:
- if a secondary category is present, the product information is considered collected;
- if only a primary category is present, the apparatus asks about the secondary categories under it;
- if neither is present, the apparatus asks about the primary categories.
After the user selects a product from the recommendations and the product is determined, the apparatus judges whether the pre-order information lacks necessary information. If it does, the apparatus asks for the missing information, so that the necessary information in every order is complete. The user then confirms the order.
Finally, the apparatus collects the user's personal information, such as telephone number, name and address. It then plays a closing message and ends the conversation, which completes the ordering.
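The category-based recommendation decision described above could be sketched as follows. This is a minimal illustration, not the applicant's implementation. The two-level MENU, the slot names "primary" and "secondary", and the prompt wording are all hypothetical assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical two-level coffee-shop menu: primary category -> secondary products.
MENU: Dict[str, List[str]] = {
    "coffee": ["latte", "americano"],
    "milk tea": ["pearl milk tea"],
    "juice": ["freshly squeezed orange juice"],
}


def recommendation_step(slots: Dict[str, str],
                        menu: Dict[str, List[str]] = MENU) -> Tuple[str, List[str]]:
    """Pick the next dialogue action from the (hypothetical) category slots."""
    secondary = slots.get("secondary")
    if secondary:
        # A secondary category (specific product) is known: product info collected.
        return "collected", [secondary]
    primary = slots.get("primary")
    if primary in menu:
        # Only the primary category is known: recommend its secondary categories.
        return "ask_secondary", list(menu[primary])
    # Neither is known (or the primary is not on the menu): recommend primary categories.
    return "ask_primary", list(menu)


def recommendation_text(action: str, items: List[str]) -> str:
    """Text to hand to a speech synthesizer (wording is hypothetical)."""
    if action == "collected":
        return f"Got it: {items[0]}."
    return "We have " + ", ".join(items) + ". Which would you like?"
```

A real system would send the returned text to speech synthesis. The user's reply would then go back through voice and semantic recognition to fill the category slots.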

It should be noted that voice may be played to the user at several points in the interaction. For example, when the intention information is ordering but the object to be ordered cannot be determined from the semantic information, a voice inquiring about the object to be ordered is generated and played. Likewise, when the intention information is recommendation, the object to be recommended is determined in combination with the semantic information, and a recommendation voice is generated and played. If the user's voice is collected while such a voice is playing, voice recognition and semantic recognition are performed on it to obtain a voice recognition result and a semantic recognition result. When the voice recognition result or the semantic recognition result meets a preset interruption condition, playback stops and the collected voice is processed. The preset interruption condition includes any one or more of the following:
- the number of words in the voice recognition result is greater than a preset word number threshold;
- a preset interruption keyword exists in the voice recognition result;
- intention information exists in the semantic recognition result.
After the collected voice is processed, playback continues.
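The preset interruption condition is a logical OR of three checks. The sketch below is illustrative only. The RecognitionResult container, the default threshold of 3 words and the English keywords are hypothetical choices. Counting words by whitespace is also an assumption, since Chinese recognition output would need character- or tokenizer-based counting.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RecognitionResult:
    """Hypothetical container for one utterance collected during playback."""
    text: str                     # voice recognition result (transcript)
    intent: Optional[str] = None  # intention information from semantic recognition


def meets_interrupt_condition(result: RecognitionResult,
                              word_threshold: int = 3,
                              keywords: Sequence[str] = ("wait", "didn't hear")) -> bool:
    """True if any one of the three preset interruption conditions holds."""
    # 1) More words than the preset threshold (whitespace split is an assumption).
    if len(result.text.split()) > word_threshold:
        return True
    # 2) A preset interruption keyword appears in the transcript.
    lowered = result.text.lower()
    if any(keyword in lowered for keyword in keywords):
        return True
    # 3) Semantic recognition found intention information.
    return result.intent is not None
```

When this returns True, the player would stop, process the utterance and then resume. Otherwise, the utterance is ignored and playback continues.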

For example, suppose the user suddenly starts speaking during playback. The voice ordering apparatus performs voice recognition and semantic recognition on the user's voice. If the number of words spoken exceeds the preset word number threshold, the user is considered to be interrupting. The apparatus then stops playing and starts listening to what the user is now saying. If the threshold is not reached, the speech can be ignored and playback does not stop.

As another example, the apparatus may be able to parse an actual intention from what the user says, such as "I'm thirsty today, first give me a cup of plain water". In that case the apparatus stops playing and generates new order information according to step 101 and step 104. It then resumes the previous voice and first completes the refinement and placement of the previous order. Afterwards it refines and places the new order.

As a further example, the user's speech may contain a specific custom keyword, such as "wait a moment, I didn't hear that clearly". The apparatus then stops playing and replays the previous voice.
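The three examples above lead to different reactions, which could be modelled as a small classification step. This is a sketch under stated assumptions. The BargeIn enum, the threshold and the keyword list are hypothetical. The priority order (keyword first, then intention, then length) is also an assumption, because the description does not specify one.

```python
from enum import Enum
from typing import Optional, Sequence


class BargeIn(Enum):
    IGNORE = "ignore"    # below the word threshold: keep playing
    LISTEN = "listen"    # long utterance: stop and listen to the new speech
    NEW_REQUEST = "new"  # actual intention parsed: stop, record the new order, resume
    REPLAY = "replay"    # custom keyword: stop and replay the previous voice


def classify_barge_in(text: str, intent: Optional[str], word_threshold: int = 3,
                      replay_keywords: Sequence[str] = ("wait a moment", "didn't hear")) -> BargeIn:
    """Map one utterance heard during playback to a reaction (hypothetical priority)."""
    lowered = text.lower()
    if any(keyword in lowered for keyword in replay_keywords):
        return BargeIn.REPLAY
    if intent is not None:
        return BargeIn.NEW_REQUEST
    if len(text.split()) > word_threshold:
        return BargeIn.LISTEN
    return BargeIn.IGNORE
```

For BargeIn.NEW_REQUEST, the example above suggests that the new order would be queued. It would be handled only after the current order is completed.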

In the embodiments of the present application, the user's voice is collected, and voice recognition and semantic recognition are performed on it to obtain semantic information. The semantic information includes intention information and slot position information. When the intention information is ordering, the object to be ordered is determined according to the semantic information, and pre-order information of the object to be ordered is generated. The pre-order information is compared with the necessary information list corresponding to the object to be ordered to judge whether it lacks necessary information. When it lacks none, the ordering operation is executed by combining the pre-order information with the user's personal information. By applying voice recognition, semantic understanding and order management to the user's voice, the method determines the user's order information and completes the order. This avoids staffing the merchant side with a large number of customer service personnel, simplifies the user's ordering process, reduces ordering cost and improves ordering efficiency.
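The comparison of the pre-order information with a necessary information list, followed by the ordering operation, might look like the sketch below. The NECESSARY_INFO table, the field names and the dictionary-based pre-order are illustrative assumptions. Actual order submission is stubbed out.

```python
from typing import Dict, List

# Hypothetical necessary-information lists, one per object to be ordered.
NECESSARY_INFO: Dict[str, List[str]] = {
    "latte": ["size", "temperature", "quantity"],
    "freshly squeezed orange juice": ["size", "quantity"],
}


def missing_necessary_info(pre_order: Dict[str, str]) -> List[str]:
    """Compare the pre-order with its object's list; return absent fields in list order."""
    required = NECESSARY_INFO.get(pre_order.get("item", ""), [])
    return [name for name in required if not pre_order.get(name)]


def place_order(pre_order: Dict[str, str], personal_info: Dict[str, str]) -> Dict[str, str]:
    """Combine a complete pre-order with the user's personal information."""
    missing = missing_necessary_info(pre_order)
    if missing:
        raise ValueError(f"pre-order information lacks {missing}")
    # A real system would submit this to an order backend; here it is just returned.
    return {**pre_order, **personal_info}
```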

Corresponding to the voice ordering methods provided by the above embodiments, an embodiment of the present application further provides a voice ordering apparatus. Because the apparatus corresponds to those methods, the implementations of the voice ordering method also apply to the apparatus and are not repeated in this embodiment. Fig. 4 is a schematic diagram according to a second embodiment of the present application. As shown in fig. 4, the voice ordering apparatus 400 includes an acquisition module 410, a generation module 420, a comparison module 430 and an ordering module 440.

The acquisition module 410 is configured to collect the user's voice and perform voice recognition and semantic recognition on it to obtain semantic information. The semantic information includes intention information and slot position information. The generation module 420 is configured to, when the intention information is ordering, determine the object to be ordered according to the semantic information and generate pre-order information of the object to be ordered. The comparison module 430 is configured to compare the pre-order information with the necessary information list corresponding to the object to be ordered and judge whether the pre-order information lacks necessary information. The ordering module 440 is configured to execute the ordering operation by combining the pre-order information with the user's personal information when the pre-order information does not lack necessary information.

As a possible implementation of this embodiment, fig. 5 is a schematic diagram according to a third embodiment of the present application. As shown in fig. 5, on the basis of fig. 4, the voice ordering apparatus 400 further includes a first acquisition module 450 and an update module 460.

The first acquisition module 450 is configured to act when the pre-order information lacks necessary information. It acquires the missing first necessary information, generates an inquiry voice from it and plays the inquiry voice to the user, so as to obtain the first necessary information from the user's spoken reply. The update module 460 is configured to update the pre-order information according to the first necessary information until the pre-order information no longer lacks necessary information.
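The loop performed by the first acquisition module and the update module could be sketched as below: ask for the first missing item, update the pre-order, and repeat until it is complete. The ask callback stands in for speech synthesis plus recognition of the reply and is a hypothetical interface. The max_turns guard is an added assumption that prevents an endless dialogue.

```python
from typing import Callable, Dict, List


def complete_pre_order(pre_order: Dict[str, str], required: List[str],
                       ask: Callable[[str], str], max_turns: int = 10) -> Dict[str, str]:
    """Ask for the first missing necessary item until the pre-order is complete."""
    order = dict(pre_order)  # do not mutate the caller's pre-order
    for _ in range(max_turns):
        missing = [name for name in required if not order.get(name)]
        if not missing:
            return order
        first = missing[0]  # the "first necessary information"
        # The inquiry is generated from the missing item; ask() returns the reply text.
        order[first] = ask(f"What {first} would you like?").strip()
    if all(order.get(name) for name in required):
        return order
    raise RuntimeError("pre-order information still lacks necessary information")
```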

As a possible implementation of this embodiment, the generation module 420 is further configured to handle the case where the intention information is ordering but the object to be ordered cannot be determined from the semantic information. It generates a voice inquiring about the object to be ordered and plays it to the user, so as to determine the object to be ordered from the user's spoken reply.
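Determining the object to be ordered, with a fallback inquiry when the semantic information does not identify it, could be sketched as follows. The "item" slot name, the inquiry wording and the ask callback are hypothetical assumptions. The naive substring matching of the reply against lowercase menu items is another simplification made for illustration.

```python
from typing import Callable, Dict, Optional, Sequence


def determine_object(slots: Dict[str, str], menu_items: Sequence[str],
                     ask: Callable[[str], str], max_turns: int = 3) -> Optional[str]:
    """Return the object to be ordered, inquiring when it is not yet known."""
    item = slots.get("item")  # hypothetical slot name
    if item in menu_items:
        return item
    for _ in range(max_turns):
        # Inquiry voice (as text); the reply is assumed to be already transcribed.
        reply = ask("Which item would you like to order?").lower()
        # Naive matching: the first menu item mentioned in the reply.
        for candidate in menu_items:
            if candidate in reply:
                return candidate
    return None  # still undetermined after max_turns inquiries
```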

As a possible implementation of this embodiment, fig. 6 is a schematic diagram according to a fourth embodiment of the present application. As shown in fig. 6, on the basis of fig. 4, the voice ordering apparatus 400 further includes a determination module 470.

The determination module 470 is configured to determine, when the intention information is recommendation, an object to be recommended in combination with the semantic information. The generation module 420 is further configured to generate a recommendation voice according to the object to be recommended and play it to the user so that the user can make a selection.

As a possible implementation of this embodiment, fig. 7 is a schematic diagram according to a fifth embodiment of the present application. As shown in fig. 7, on the basis of fig. 6, the voice ordering apparatus 400 further includes a second acquisition module 480 and a processing module 490.

The second acquisition module 480 is configured to act if the user's voice is collected while a voice is being played. It performs voice recognition and semantic recognition on the collected voice to obtain a voice recognition result and a semantic recognition result. The processing module 490 is configured to stop playing and process the collected voice when the voice recognition result or the semantic recognition result meets a preset interruption condition. The preset interruption condition includes any one or more of the following: the number of words in the voice recognition result is greater than a preset word number threshold; a preset interruption keyword exists in the voice recognition result; and intention information exists in the semantic recognition result. The processing module 490 is further configured to continue playing the voice after the collected voice is processed.

With the voice ordering apparatus of the embodiments of the present application, the user's voice is collected, and voice recognition and semantic recognition are performed on it to obtain semantic information. The semantic information includes intention information and slot position information. When the intention information is ordering, the object to be ordered is determined according to the semantic information, and pre-order information of the object to be ordered is generated. The pre-order information is compared with the necessary information list corresponding to the object to be ordered to judge whether it lacks necessary information. When it lacks none, the ordering operation is executed by combining the pre-order information with the user's personal information. By applying voice recognition, semantic understanding and order management to the user's voice, the apparatus determines the user's order information and completes the order. This avoids staffing the merchant side with a large number of customer service personnel, simplifies the user's ordering process, reduces ordering cost and improves ordering efficiency.

According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.

Fig. 8 is a block diagram of an electronic device for the voice ordering method according to an embodiment of the present application. The electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only. They are not meant to limit the implementations of the present application described and/or claimed herein.

As shown in fig. 8, the electronic device includes one or more processors 801, a memory 802, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Likewise, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 8 takes one processor 801 as an example.

The memory 802 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the voice ordering method provided by the application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the voice meal ordering method provided by the present application.

The memory 802, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs and modules. Examples are the program instructions/modules corresponding to the voice ordering method in the embodiments of the present application, such as the acquisition module 410, the generation module 420, the comparison module 430 and the ordering module 440 shown in fig. 4. They also include the first acquisition module 450 and the update module 460 shown in fig. 5, the determination module 470 shown in fig. 6, and the second acquisition module 480 and the processing module 490 shown in fig. 7. The processor 801 executes the various functional applications and data processing of the server by running the non-transitory software programs, instructions and modules stored in the memory 802. In this way it implements the voice ordering method in the above method embodiments.

The memory 802 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function. The data storage area may store data created through use of the electronic device for voice ordering, and the like. In addition, the memory 802 may include high-speed random access memory. It may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 802 optionally includes memories located remotely from the processor 801. These remote memories may be connected to the electronic device for voice ordering over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device for the voice ordering method may further include an input device 803 and an output device 804. The processor 801, the memory 802, the input device 803 and the output device 804 may be connected by a bus or in other ways; fig. 8 takes connection by a bus as an example.

The input device 803 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for voice ordering. Examples include a touch screen, keypad, mouse, track pad, touch pad, pointing stick, one or more mouse buttons, track ball, joystick, or other input device. The output device 804 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. That processor may be special-purpose or general-purpose. It receives data and instructions from a storage system, at least one input device, and at least one output device, and transmits data and instructions to them.

These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), a middleware component (e.g., an application server), or a front-end component. An example of a front-end component is a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here. The computing system may also include any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.

The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A method for voice ordering, comprising:

collecting voice of a user, and carrying out voice recognition and semantic recognition on the voice to obtain semantic information; the semantic information includes: intent information and slot position information;

when the intention information is ordering, determining an object to be ordered according to the semantic information and generating order information of the object to be ordered;

comparing the pre-order information with a necessary information list corresponding to the object to be ordered to judge whether the pre-order information lacks necessary information;

and when the pre-order information does not lack necessary information, combining the pre-order information and the personal information of the user to execute order placing operation.

2. The method of claim 1, further comprising:

when the pre-order information lacks necessary information, acquiring the missing first necessary information, generating an inquiry voice in combination with the first necessary information, and playing the inquiry voice to the user so as to acquire the first necessary information in combination with the voice replied by the user;

and updating the order information according to the first necessary information until the order information does not lack the necessary information.

3. The method of claim 1, further comprising:

when the intention information is ordering, if the object to be ordered is not determined according to the semantic information, generating a voice inquiring about the object to be ordered and playing the voice to the user so as to determine the object to be ordered in combination with the voice replied by the user.

4. The method of claim 1, further comprising:

when the intention information is recommendation, determining an object to be recommended by combining the semantic information;

and generating a recommendation voice according to the object to be recommended and playing the recommendation voice to the user so that the user can make a selection.

5. The method according to any one of claims 2-4, further comprising:

in the process of playing the voice, if the voice of the user is collected, carrying out voice recognition and semantic recognition on the collected voice to obtain a voice recognition result and a semantic recognition result;

when the voice recognition result or the semantic recognition result meets a preset interruption condition, stopping playing voice and processing the collected voice; the preset interrupt condition comprises any one or more of the following conditions: the number of words in the voice recognition result is greater than a preset word number threshold, a preset interruption keyword exists in the voice recognition result, and intention information exists in the semantic recognition result;

and after the collected voice is processed, continuing to play the voice.

6. A voice meal ordering apparatus, comprising:

the acquisition module is used for acquiring the voice of a user, performing voice recognition and semantic recognition on the voice and acquiring semantic information; the semantic information includes: intent information and slot position information;

the generating module is used for determining an object to be ordered and generating the order information of the object to be ordered according to the semantic information when the intention information is ordering;

the comparison module is used for comparing the pre-order information with a necessary information list corresponding to the object to be ordered and judging whether the pre-order information lacks necessary information;

and the ordering module is used for executing ordering operation by combining the order information and the personal information of the user when the order information does not lack the necessary information.

7. The apparatus of claim 6, further comprising: a first acquisition module and an updating module;

the first acquisition module is used for acquiring the lacking first necessary information when the order information lacks necessary information, generating an inquiry voice in combination with the first necessary information and playing the inquiry voice to the user, so as to acquire the first necessary information in combination with the voice replied by the user;

and the updating module is used for updating the order information according to the first necessary information until the order information does not lack the necessary information.

8. The apparatus according to claim 6, wherein the generating module is further configured to, when the intention information is ordering, if the object to be ordered is not determined according to the semantic information, generate a voice inquiring about the object to be ordered and play the voice to the user, so as to determine the object to be ordered in combination with the voice replied by the user.

9. The apparatus of claim 6, further comprising: a determination module;

the determining module is used for determining an object to be recommended in combination with the semantic information when the intention information is recommendation;

the generating module is further used for generating a recommendation voice according to the object to be recommended and playing the recommendation voice to the user so that the user can make a selection.

10. The apparatus of any one of claims 7-9, further comprising: a second acquisition module and a processing module;

the second acquisition module is used for carrying out voice recognition and semantic recognition on the acquired voice if the voice of the user is acquired in the process of playing the voice, and acquiring a voice recognition result and a semantic recognition result;

the processing module is used for stopping playing the voice and processing the collected voice when the voice recognition result or the semantic recognition result meets a preset interrupt condition; the preset interrupt condition comprises any one or more of the following conditions: the number of words in the voice recognition result is greater than a preset word number threshold, a preset interruption keyword exists in the voice recognition result, and intention information exists in the semantic recognition result;

and the processing module is further used for continuing to play the voice after the collected voice is processed.

11. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.

12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-5.