CN112786031B - Man-machine conversation method and system - Google Patents

Man-machine conversation method and system

Info

Publication number
CN112786031B
CN112786031B (application number CN201911059980.1A)
Authority
CN
China
Prior art keywords
user
current user
sentence
new
reply content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911059980.1A
Other languages
Chinese (zh)
Other versions
CN112786031A (en)
Inventor
陈炎荣
宋洪博
石韡斯
卢玉环
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
Sipic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sipic Technology Co Ltd filed Critical Sipic Technology Co Ltd
Priority to CN201911059980.1A
Publication of CN112786031A
Application granted
Publication of CN112786031B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 - Execution procedure of a spoken command
    • G10L 2015/225 - Feedback of the input speech
    • G10L 2015/226 - Procedures used during a speech recognition process using non-speech characteristics
    • G10L 2015/228 - Procedures used during a speech recognition process using non-speech characteristics of application context

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a man-machine conversation method and system, applied to smart terminal devices with a screen. The method comprises: starting a full-duplex dialogue mode upon detecting the user's operation of opening a dialogue; performing voice recognition on the detected current user sentence, so as to determine the reply content corresponding to the current user sentence according to the obtained voice recognition result and present it to the user; and, if a new user sentence is detected before the reply content corresponding to the current user sentence has been determined and presented, determining new reply content responsive to the current user according to both the current user sentence and the new user sentence. By adopting a full-duplex dialogue mode, the invention can detect new user sentences in real time while responding to earlier ones and determine the new reply content from the current and new user sentences together, so that the reply content is determined more accurately and efficiently by synthesizing the context.

Description

Man-machine conversation method and system
Technical Field
The invention relates to the technical field of man-machine conversation, in particular to a man-machine conversation method and system.
Background
At present, voice assistants running on smart televisions or OTT television boxes on the market generally adopt a half-duplex interaction mode for human-computer interaction. For example:
1. A pickup board plus a wake-up word: the wake-up word must be spoken before every conversation; interaction is possible only after wake-up, and sound pickup stops once VAD (voice activity detection) detects the end of speech.
2. A voice remote control: pressing the voice key starts a conversation and enters the pickup state; releasing the key stops pickup and the dialogue result is output.
Half-duplex interaction can satisfy most simple single-sentence operations, such as turning up the volume, changing the TV channel or skipping a song, but it falls short in complex scenarios. For example: the user says "I want to watch a movie", then "any starring Liu Dehua?", then "the fourth one in the second row", and finally "I want to listen to his songs".
The above scenario involves:
1. a later sentence narrows the scope of an earlier one;
2. the user's intent jumps across domains (skills are independent of each other; jumping from skill A to skill B is called a cross-domain jump);
3. a keyword of the current sentence was mentioned in a previous sentence and is omitted here, so it must be resolved from the context;
4. the user prefers to input the next sentence as soon as the dialogue output of the previous sentence appears, rather than waiting for the TTS (text-to-speech) broadcast to finish.
The half-duplex interaction has the following defects in the above scene:
1. it depends on the wake-up word or the remote control: every interaction requires pressing a key or speaking the wake-up word;
2. instant interruption (terminating the current dialogue process) cannot be achieved; the next input can only be given after breaking off the dialogue or waiting for it to finish;
3. in scenarios that require anaphora resolution, the user's intent cannot be inferred from the context once a cross-domain jump occurs;
4. the operation mode is rigid, and the interactive experience is poor.
In order to solve the above problems and optimize the experience, the following solutions are available in the industry:
1. Multi-round conversation (multiple rounds of presentation are needed, and the user can continue to clarify and refine his or her intent during the conversation): after the user invokes a dialogue, the session does not end after output but keeps listening. This scheme removes the need to wake up every time and supports context association within the same skill, but it still cannot be interrupted instantly and does not support cross-domain context.
2. Scene-specific wake-up words: after a specific scene is entered, the wake-up words are set to words likely to be used in that scene; when the user speaks such a keyword, a dialogue action is triggered, and the client only needs to parse the corresponding action and execute the matching business logic. The problems with this scheme are that the number of wake-up words is limited, wake-up performance is poor (failed or false wake-ups occur easily), and cross-domain scenarios still cannot be handled.
Disclosure of Invention
An embodiment of the present invention provides a man-machine conversation method and system, which are used to solve at least one of the above technical problems.
In a first aspect, an embodiment of the present invention provides a man-machine conversation method, which is applied to a smart terminal device with a screen, and the method includes:
starting a full-duplex dialogue mode after detecting the operation of starting a dialogue by a user;
performing voice recognition on the detected current user sentence, so as to determine reply content corresponding to the current user sentence according to an obtained voice recognition result and present the reply content to a user;
and if a new user sentence is detected before the reply content corresponding to the current user sentence has been determined and presented to the user, determining new reply content responsive to the current user according to the current user sentence and the new user sentence.
In a second aspect, an embodiment of the present invention provides a man-machine conversation system, which is applied to a smart terminal device with a screen, where the system includes:
the conversation mode starting module is used for starting a full-duplex conversation mode after detecting the operation of starting a conversation by a user;
the voice recognition module is used for carrying out voice recognition on the detected current user sentence so as to determine the reply content corresponding to the current user sentence according to the obtained voice recognition result and present the reply content to the user;
and a reply content determining module, used for determining new reply content responsive to the current user according to the current user sentence and the new user sentence if a new user sentence is detected before the reply content corresponding to the current user sentence has been determined and presented to the user.
In a third aspect, an embodiment of the present invention provides a storage medium, where one or more programs including execution instructions are stored, where the execution instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device, etc.) to perform any one of the above-described human-computer conversation methods of the present invention.
In a fourth aspect, an electronic device is provided, comprising: the system comprises at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute any one of the man-machine conversation methods of the invention.
In a fifth aspect, the present invention further provides a computer program product, which includes a computer program stored on a storage medium, the computer program including program instructions, which when executed by a computer, cause the computer to execute any one of the above man-machine conversation methods.
The embodiments of the invention have the following beneficial effect: by adopting a full-duplex dialogue mode, a new user sentence can be detected in real time while an earlier user sentence is being answered; if a new user sentence is detected before the reply content corresponding to the current user sentence has been determined and presented to the user, new reply content responsive to the current user is determined from the current user sentence and the new user sentence together, so that the reply content is determined more accurately and efficiently by synthesizing the context.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flow chart of one embodiment of a human-machine conversation method of the present invention;
FIG. 2 is a flow chart of another embodiment of a human-machine dialog method of the present invention;
FIG. 3 is a functional block diagram of an embodiment of a human-machine dialog system of the present invention;
fig. 4 is a schematic structural diagram of an embodiment of an electronic device according to the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
As used in this disclosure, "module," "device," "system," and the like are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, or software in execution. In particular, for example, an element may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. Also, an application or script running on a server, or a server, may be an element. One or more elements may be in a process and/or thread of execution and an element may be localized on one computer and/or distributed between two or more computers and may be operated by various computer-readable media. The elements may also communicate by way of local and/or remote processes based on a signal having one or more data packets, e.g., from a data packet interacting with another element in a local system, distributed system, and/or across a network in the internet with other systems by way of the signal.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
As shown in fig. 1, which is a flowchart of an embodiment of the man-machine conversation method of the present invention, the method is applied to a smart terminal device with a screen, which may be a smart television, a smart phone, a tablet computer, a story machine with a display screen, a smart speaker with a display screen, and the like; the present invention is not limited in this respect. The man-machine conversation method comprises the following steps:
S10, starting the full-duplex dialogue mode when the user's operation of opening a dialogue is detected. Illustratively, the operation may be speaking a wake-up word or pressing a special function key on the remote control.
S20, performing voice recognition on the detected current user sentence, determining the reply content corresponding to the current user sentence according to the obtained voice recognition result, and presenting the reply content to the user;
S30, if a new user sentence is detected before the reply content corresponding to the current user sentence has been determined and presented to the user, determining new reply content responsive to the current user according to the current user sentence and the new user sentence.
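A minimal sketch of how steps S10-S30 can be organized is given below. It is an illustration only, not the patented implementation: listen(), recognize(), plan_reply() and present() are hypothetical stand-ins for the pickup, cloud recognition and rendering components, and the 0.5-second polling window is an arbitrary choice.

```python
# Illustrative sketch only -- listen, recognize, plan_reply and present
# are hypothetical callables, not part of the patent.
import queue
import threading

def run_full_duplex(listen, recognize, plan_reply, present):
    sentences: "queue.Queue[str]" = queue.Queue()

    def pickup_loop():
        # S10/S20: in full-duplex mode pickup never stops, so new
        # sentences can arrive while a reply is still being prepared.
        while True:
            sentences.put(recognize(listen()))

    threading.Thread(target=pickup_loop, daemon=True).start()

    current = sentences.get()                 # current user sentence
    while True:
        try:
            # S30: a new sentence detected before the reply is presented.
            new = sentences.get(timeout=0.5)  # arbitrary polling window
            current = f"{current} {new}"      # fold both into one context
        except queue.Empty:
            present(plan_reply(current))      # S20: determine and present
            current = sentences.get()         # block until the next sentence
```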
The embodiment of the invention adopts a full-duplex dialogue mode, can detect a new user sentence in real time while responding to the user sentence, and determines new responding content responding to the current user according to the current user sentence and the new user sentence if the new user sentence is detected before determining the responding content corresponding to the current user sentence and presenting the responding content to the user, thereby simultaneously synthesizing the context to more accurately and efficiently determine the responding content.
In some embodiments, if the time interval between the detection of the new user sentence and the detection of the current user sentence does not exceed a preset time threshold,
determining new reply content responsive to the current user from the current user sentence and the new user sentence comprises: if the new user sentence is determined to be an associated sentence of the current user sentence, determining new reply content responsive to the current user according to the current user sentence and the new user sentence.
Illustratively, the preset time threshold is the maximum time a typical user can tolerate waiting during a man-machine conversation; it can be determined by collecting massive man-machine conversation process data and analyzing it statistically. For example, the preset time threshold may be 5 seconds. The present invention does not limit its specific value: those skilled in the art may adjust it to actual needs, and a threshold derived from collected conversation data may also change over time.
If the new user sentence is detected within the preset time threshold, it indicates that the reply content for the current user sentence, whether or not it has already been presented, is still within the range the user is willing to wait for; otherwise, the user can be assumed to no longer care about the reply to the current user sentence.
Illustratively, an associated sentence is a new user sentence that further constrains the current user sentence. For example, the current user sentence is "I want to listen to songs" and the new user sentence is "Liu Dehua"; or the current user sentence is "I want to watch a movie" and the new user sentence is "domestic ones"; or the current sentence is "I want to watch the XX show" and the new user sentence is "last week's (or the most recent)"; and so on.
In this embodiment, when the new user sentence is determined to be an associated sentence of the current user sentence, the new reply content responsive to the current user is determined jointly from the current user sentence and the new user sentence. In this way both sentences are taken into account, and the finally synthesized new reply content is presented to the user.
If, instead, the reply content for the current user sentence and the reply content for the new user sentence were determined separately and presented to the user in sequence (for example, first presenting a "movie page" and then refreshing to a "domestic movie page"), the whole interaction would be redundant and repetitive; in particular, a screen-equipped smart terminal device (for example, a smart television) would have to refresh the page at least twice, seriously harming the user experience.
The method of this embodiment can instead present the final reply content to the user directly.
In some embodiments, if it is determined that the new user sentence is not an associated sentence of the current user sentence, first reply content responsive to the current user is determined from the current user sentence, and second reply content responsive to the current user is determined from the new user sentence.
Although in this embodiment the new user sentence is not an associated sentence of the current user sentence, it was still detected within the preset time threshold; that is, the reply content for the current user sentence is still within the range the user is willing to wait for. The determined first reply content and second reply content are therefore both presented to the user, meeting the user's needs as fully as possible.
In some embodiments, if the time interval between the detection of the new user sentence and the detection of the current user sentence exceeds a preset time threshold,
determining new reply content responsive to the current user from the current user statement and the new user statement comprises:
determining new reply content responsive to the current user from the new user statement.
In this embodiment, since the interval between detecting the new user sentence and detecting the current user sentence exceeds the preset time threshold, the user's maximum tolerable waiting time has been exceeded and the user no longer cares about the reply content for the current user sentence. Presenting the first and second reply contents for the current and new user sentences separately at this point would seriously harm the user experience.
More intuitively: the user first asks question a, but because of delay caused by poor network conditions or other reasons, the machine's answer does not arrive within the preset time threshold, so the user goes on to ask question b. If the reply content for question a were presented at this point, the user would feel the system is answering the wrong question.
The method of this embodiment can thus achieve a cross-domain jump (from the domain of question a to the domain of question b) and present the user with the reply content for the question the user actually cares about at that moment.
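The selection rules of the embodiments above can be summarized in a short sketch. The 5-second THRESHOLD, the Utterance record and is_associated() are assumptions for illustration; the patent leaves the association judgment and the statistically derived threshold unspecified.

```python
# Illustrative sketch; THRESHOLD, Utterance and is_associated are
# assumptions, not the patented implementation.
from dataclasses import dataclass

THRESHOLD = 5.0  # seconds; illustrative value taken from the example above

@dataclass
class Utterance:
    text: str
    detected_at: float  # seconds since some common epoch

def decide_replies(current: Utterance, new: Utterance, is_associated, plan_reply):
    interval = new.detected_at - current.detected_at
    if interval > THRESHOLD:
        # The user has waited too long and no longer cares about the old
        # reply: cross-domain jump, answer only the new sentence.
        return [plan_reply(new.text)]
    if is_associated(current.text, new.text):
        # Associated sentence: one merged reply, a single page refresh.
        return [plan_reply(f"{current.text} {new.text}")]
    # Within the threshold but unrelated: present both replies in turn.
    return [plan_reply(current.text), plan_reply(new.text)]
```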
Fig. 2 is a flow chart of another embodiment of the man-machine conversation method of the present invention, which includes the following steps (a sketch of this loop follows the list):
1. start the process;
2. invoke a conversation via the remote control or a wake-up word;
3. start sound pickup and send the audio to the cloud for recognition;
4. perform semantic analysis on the recognition result; if the user provides other valid semantic input at this moment, interrupt the current link and re-enter the recognition link;
5. after the utterance ends, perform dialogue management, record the semantic slot values of the current dialogue and output the dialogue result; if the user provides other valid semantic input, interrupt the current link and re-enter the recognition link;
6. if the dialogue service involves speech synthesis, synthesize the speech and deliver it to the client for broadcast; if the user provides other valid semantic input at this moment, interrupt the current link and re-enter the recognition link;
7. when no valid semantics have been parsed for a long time, or the user triggers an exit action, the process ends.
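Under the same caveat (hypothetical stage functions, not the patent's actual code), the loop of Fig. 2 might be skeletonized as follows; the key point is that every stage checks for new valid semantic input and, if any arrives, abandons its work and returns to recognition.

```python
# Illustrative skeleton of the Fig. 2 loop; all callables are hypothetical.
def session(recognize, parse, manage_dialogue, synthesize,
            has_new_input, should_exit):
    # Step 2 has already happened: the conversation was invoked.
    while not should_exit():                  # step 7: long silence or exit
        semantics = parse(recognize())        # steps 3-4: pickup + semantics
        if has_new_input():
            continue                          # interrupt: back to recognition
        reply = manage_dialogue(semantics)    # step 5: record slot values
        if has_new_input():
            continue
        synthesize(reply)                     # step 6: TTS broadcast to client
```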
Illustratively, the method can realize one-time wake-up with continuous conversation on a smart television or OTT television box, making interaction easier and smoother and bringing the voice assistant ever closer to human-to-human interaction.
The implementation principle of the scheme is as follows:
1. after the dialogue is invoked, sound is picked up continuously for recognition, and the dialogue is exited when the user explicitly quits or does not speak for a long time (no valid semantics);
2. key information is recorded during the dialogue service process, and when new voice input arrives it can be associated with the dialogue context recorded by the system, so that the user's intent can be inferred accurately (see the sketch below).
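As an illustration of principle 2, the fragment below keeps a plain dictionary of slot values as the recorded context; the slot names (actor, artist) and the resolution rule are invented for the movie-to-music example earlier in this document, not taken from the patent.

```python
# Illustrative cross-domain context association; slot names and the
# resolution rule are invented for the example, not from the patent.
context: dict = {}

def associate(semantics: dict) -> dict:
    # Record the key slot values of the current turn,
    # e.g. {"actor": "Liu Dehua"} from the movie skill.
    context.update(semantics.get("slots", {}))
    slots = semantics.setdefault("slots", {})
    if semantics.get("intent") == "play_song" and "artist" not in slots:
        # "I want to listen to his songs": resolve "his" from the context
        # recorded by another skill -- a cross-domain association.
        if "actor" in context:
            slots["artist"] = context["actor"]
    return semantics
```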
The scheme has the following advantages:
1. only one wake-up is needed for continuous interaction, avoiding the need to speak the wake-up word or press the voice key for every sentence;
2. sound pickup is continuous, so the conversation can be interrupted at any link in the process and the user is spared meaningless waiting;
3. the dialogue context is recorded and no longer confined to a single skill, enabling context association across domain skills;
4. audio is transmitted for recognition only after a conversation has been invoked, reducing system resource usage;
5. the interactive experience is optimized, bringing human-computer interaction close to human-to-human interaction.
In some embodiments, the user's operation of opening a dialogue is speaking a wake-up word, and starting the full-duplex dialogue mode after that operation is detected comprises:
determining user characteristic information of the current user according to the detected wake-up word speech;
querying a user characteristic information base to determine the dialogue mode applicable to the current user, where the user characteristic information base stores characteristic information of a plurality of users of the current screen-equipped smart terminal device and records the dialogue mode applicable to each of them;
and starting the full-duplex dialogue mode when the query result shows that the dialogue mode corresponding to the user characteristic information of the current user is the full-duplex mode.
In some embodiments, the half-duplex dialog mode is initiated when the query result indicates that the dialog mode corresponding to the user characteristic information of the current user is a half-duplex mode.
This embodiment realizes adaptive selection of the initial dialogue mode after the system wakes up, so that it can adapt to users with different habits. For example, elderly users unfamiliar with voice control of the smart television, or users operating it for the first time, usually need the system to prompt them at every step and make their selection after the prompt finishes; for them the half-duplex dialogue mode is clearly the one to start. Young users, or those familiar with the voice control process of the smart television, issue voice commands directly based on what they see without listening to the system prompts; for this user group the system needs to be configured in full-duplex dialogue mode so that the dialogue can be interrupted at any time.
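A sketch of the adaptive mode selection follows; the voiceprint-based extract_user_id() and the dictionary standing in for the user characteristic information base are assumptions, since the embodiment does not fix how user characteristic information is derived from the wake-up speech.

```python
# Illustrative adaptive mode selection; extract_user_id and the profile
# dictionary are assumptions standing in for the feature base.
FULL_DUPLEX = "full-duplex"
HALF_DUPLEX = "half-duplex"

# User characteristic information base: per-user dialogue mode records.
user_profiles = {
    "user-001": FULL_DUPLEX,   # e.g. a user fluent with voice control
    "user-002": HALF_DUPLEX,   # e.g. a first-time or elderly user
}

def select_dialogue_mode(wake_word_audio, extract_user_id) -> str:
    user_id = extract_user_id(wake_word_audio)   # e.g. via voiceprint
    # Unknown users default to half-duplex so the system can guide them
    # step by step (an assumption; the patent does not specify a default).
    return user_profiles.get(user_id, HALF_DUPLEX)
```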
It should be noted that for simplicity of explanation, the foregoing method embodiments are described as a series of acts or combination of acts, but those skilled in the art will appreciate that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention. In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
As shown in fig. 3, a schematic block diagram of an embodiment of a man-machine interaction system 300 of the present invention is applied to a smart terminal device with a screen, where the system 300 includes:
a conversation mode starting module 310, configured to start a full-duplex conversation mode after detecting an operation of starting a conversation by a user;
a voice recognition module 320, configured to perform voice recognition on the detected current user sentence, so as to determine, according to an obtained voice recognition result, a reply content corresponding to the current user sentence, and present the reply content to the user;
a reply content determining module 330, configured to determine a new reply content responsive to the current user according to the current user sentence and the new user sentence if a new user sentence is detected before determining the reply content corresponding to the current user sentence and presenting to the user.
In some embodiments, if the time interval between the detection of the new user sentence and the detection of the current user sentence does not exceed a preset time threshold,
determining new reply content responsive to the current user from the current user sentence and the new user sentence comprises: if the new user sentence is determined to be an associated sentence of the current user sentence, determining new reply content responsive to the current user according to the current user sentence and the new user sentence.
In some embodiments, if it is determined that the new user sentence is not an associated sentence of the current user sentence, first reply content responsive to the current user is determined from the current user sentence, and second reply content responsive to the current user is determined from the new user sentence.
In some embodiments, if the time interval between the detection of the new user sentence and the detection of the current user sentence exceeds a preset time threshold,
determining new reply content responsive to the current user from the current user statement and the new user statement comprises: determining new reply content responsive to the current user from the new user statement.
The man-machine conversation system of the embodiment of the invention can be used for executing the man-machine conversation method of the embodiment of the invention, and accordingly achieves the technical effect achieved by the man-machine conversation method of the embodiment of the invention, and the details are not repeated here. In the embodiment of the present invention, the relevant functional module may be implemented by a hardware processor (hardware processor).
Fig. 4 is a schematic hardware structure diagram of an electronic device for performing a man-machine interaction method according to another embodiment of the present invention, as shown in fig. 4, the electronic device includes:
one or more processors 410 and a memory 420, with one processor 410 being an example in fig. 4.
The apparatus for performing the man-machine conversation method may further include: an input device 430 and an output device 440.
The processor 410, memory 420, input device 430, and output device 440 may be connected by a bus or other means, such as by a bus connection in fig. 4.
The memory 420, which is a non-volatile computer-readable storage medium, may be used for storing non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the man-machine interaction method in the embodiments of the present invention. The processor 410 executes various functional applications of the server and data processing by operating non-volatile software programs, instructions and modules stored in the memory 420, so as to implement the man-machine conversation method of the above method embodiment.
The memory 420 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the human-machine conversation apparatus, and the like. Further, the memory 420 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 420 may optionally include memory located remotely from processor 410, which may be connected to the human dialog device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 430 may receive input numeric or character information and generate signals related to user settings and function control of the human-machine interaction device. The output device 440 may include a display device such as a display screen.
The one or more modules are stored in the memory 420 and, when executed by the one or more processors 410, perform the human-machine dialog method of any of the method embodiments described above.
The product can execute the method provided by the embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the method provided by the embodiment of the present invention.
The electronic device of embodiments of the present invention exists in a variety of forms, including but not limited to:
(1) Mobile communication devices: characterized by mobile communication capabilities, primarily aimed at providing voice and data communication. Such terminals include smart phones (e.g., iPhones), multimedia phones, feature phones and low-end phones.
(2) Ultra-mobile personal computer devices: these belong to the category of personal computers, have computing and processing functions, and generally also support mobile internet access. Such terminals include PDA, MID and UMPC devices, such as iPads.
(3) Portable entertainment devices: such devices can display and play multimedia content. They include audio and video players (e.g., iPods), smart speakers, story machines, robots, handheld game consoles, electronic books, smart toys and portable car navigation devices.
(4) Servers: similar in architecture to general-purpose computers, but with higher requirements on processing capability, stability, reliability, security, scalability and manageability, since they must provide highly reliable services.
(5) Other electronic devices with data interaction functions.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a general hardware platform, and certainly also by hardware. Based on this understanding, the essence of the above technical solutions, or the part that contributes over the related art, may be embodied in the form of a software product stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc, including instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods of the embodiments or parts thereof.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A man-machine conversation method, applied to a smart terminal device with a screen, the method comprising:
starting a full-duplex dialogue mode after detecting the operation of starting a dialogue by a user;
performing voice recognition on the detected current user sentence, so as to determine reply content corresponding to the current user sentence according to an obtained voice recognition result and present the reply content to a user;
and if a new user sentence is detected before determining the reply content corresponding to the current user sentence and presenting the reply content to the user, determining new reply content responding to the current user according to the current user sentence and the new user sentence.
2. The method of claim 1, wherein if the time interval between the detection of the new user sentence and the detection of the current user sentence does not exceed a preset time threshold,
determining new reply content responsive to the current user from the current user statement and the new user statement comprises:
and if the new user statement is determined to be the associated statement of the current user statement, determining new reply content responding to the current user according to the current user statement and the new user statement.
3. The method of claim 2, wherein if it is determined that the new user sentence is not an associated sentence of the current user sentence, determining first reply content responsive to a current user from the current user sentence, and determining second reply content responsive to the current user from the new user sentence.
4. The method of claim 1, wherein if the time interval between the detection of the new user sentence and the detection of the current user sentence exceeds a preset time threshold,
determining new reply content responsive to the current user from the current user statement and the new user statement comprises:
determining new reply content responsive to the current user from the new user statement.
5. A man-machine conversation system, applied to a smart terminal device with a screen, the system comprising:
the conversation mode starting module is used for starting a full-duplex conversation mode after detecting the operation of starting a conversation by a user;
the voice recognition module is used for carrying out voice recognition on the detected current user sentence so as to determine the reply content corresponding to the current user sentence according to the obtained voice recognition result and present the reply content to the user;
and the reply content determining module is used for determining the new reply content responding to the current user according to the current user sentence and the new user sentence if the new user sentence is detected before determining the reply content corresponding to the current user sentence and presenting the reply content to the user.
6. The system of claim 5, wherein if the time interval between the detection of the new user sentence and the detection of the current user sentence does not exceed a preset time threshold,
determining new reply content responsive to the current user from the current user statement and the new user statement comprises: and if the new user statement is determined to be the associated statement of the current user statement, determining new reply content responding to the current user according to the current user statement and the new user statement.
7. The system of claim 6, wherein if it is determined that the new user sentence is not an associated sentence of the current user sentence, determining first reply content responsive to a current user from the current user sentence, and determining second reply content responsive to the current user from the new user sentence.
8. The system of claim 5, wherein if the time interval between the detection of the new user sentence and the detection of the current user sentence exceeds a preset time threshold,
determining new reply content responsive to the current user from the current user statement and the new user statement comprises: determining new reply content responsive to the current user from the new user statement.
9. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any one of claims 1-4.
10. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4.
CN201911059980.1A 2019-11-01 2019-11-01 Man-machine conversation method and system Active CN112786031B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201911059980.1A | 2019-11-01 | 2019-11-01 | Man-machine conversation method and system

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201911059980.1A | 2019-11-01 | 2019-11-01 | Man-machine conversation method and system

Publications (2)

Publication Number | Publication Date
CN112786031A (en) | 2021-05-11
CN112786031B (en) | 2022-05-13

Family

ID=75747197

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201911059980.1A (Active) | Man-machine conversation method and system | 2019-11-01 | 2019-11-01

Country Status (1)

Country Link
CN (1) CN112786031B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN113763046A * | 2021-09-07 | 2021-12-07 | 四川易海天科技有限公司 | Mobile internet vehicle-mounted intelligent delivery system based on big data analysis

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
JP3795350B2 * | 2001-06-29 | 2006-07-12 | 株式会社東芝 | Voice dialogue apparatus, voice dialogue method, and voice dialogue processing program
CN103077165A * | 2012-12-31 | 2013-05-01 | 威盛电子股份有限公司 | Natural language dialogue method and system thereof
WO2017179262A1 * | 2016-04-12 | 2017-10-19 | ソニー株式会社 | Information processing device, information processing method, and program
CN110223697B * | 2019-06-13 | 2022-04-22 | 思必驰科技股份有限公司 | Man-machine conversation method and system

Also Published As

Publication number Publication date
CN112786031A (en) 2021-05-11

Similar Documents

Publication Publication Date Title
CN111049996B (en) Multi-scene voice recognition method and device and intelligent customer service system applying same
EP3779972A1 (en) Voice wake-up method and apparatus
CN111540349B (en) Voice breaking method and device
CN112735398B (en) Man-machine conversation mode switching method and system
WO2021196617A1 (en) Voice interaction method and apparatus, electronic device and storage medium
US11721328B2 (en) Method and apparatus for awakening skills by speech
CN110503954A (en) Voice technical ability starts method, apparatus, equipment and storage medium
US11830483B2 (en) Method for processing man-machine dialogues
CN113779208A (en) Method and device for man-machine conversation
JP2023517363A (en) Method and apparatus for determining functional area of dialog text
CN117253478A (en) Voice interaction method and related device
CN109686372B (en) Resource playing control method and device
CN108492826B (en) Audio processing method and device, intelligent equipment and medium
CN112786031B (en) Man-machine conversation method and system
CN112700767B (en) Man-machine conversation interruption method and device
CN112447177B (en) Full duplex voice conversation method and system
CN111161734A (en) Voice interaction method and device based on designated scene
CN109658924B (en) Session message processing method and device and intelligent equipment
CN115188377A (en) Voice interaction method, electronic device and storage medium
CN113488047A (en) Man-machine conversation interruption method, electronic device and computer readable storage medium
JP2021009350A (en) Method for exiting audio skill, apparatus, device, and storage medium
CN113096651A (en) Voice signal processing method and device, readable storage medium and electronic equipment
CN109614252B (en) Audio playing scheduling method and system for intelligent story machine
CN110516043A (en) Answer generation method and device for question answering system
KR20200129315A (en) Remote control And Set-top Box Operating Method For Recognition Of Voice Recognition Call-word

Legal Events

Date Code Title Description
PB01 Publication
CB02 Change of applicant information

Address after: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Applicant after: Sipic Technology Co.,Ltd.

Address before: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Applicant before: AI SPEECH Ltd.

SE01 Entry into force of request for substantive examination
GR01 Patent grant