WO2021036624A1 - Interaction method, apparatus, device, and storage medium - Google Patents

Interaction method, apparatus, device, and storage medium

Info

Publication number
WO2021036624A1
WO2021036624A1 (PCT/CN2020/104466; CN2020104466W)
Authority
WO
WIPO (PCT)
Prior art keywords
objects
user
information
interactive
image
Prior art date
Application number
PCT/CN2020/104466
Other languages
English (en)
French (fr)
Inventor
张子隆
孙林
栾青
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Priority to JP2021556968A (JP7224488B2)
Priority to KR1020217031185A (KR20210131415A)
Publication of WO2021036624A1
Priority to US17/681,026 (US20220179609A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/14 Digital output to display device; Cooperation and interconnection of the display device with other functional units
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/002 Specific input/output arrangements not covered by G06F3/01 - G06F3/16
    • G06F3/005 Input arrangements through a video camera
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00 Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01 Indexing scheme relating to G06F3/01
    • G06F2203/012 Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Definitions

  • the present disclosure relates to the field of computer vision technology, and in particular to an interaction method, device, equipment, and storage medium.
  • Human-computer interaction is mostly performed as follows: the user provides input via keys, touch, or voice, and the device responds by presenting images and text on a display screen.
  • At present, most virtual characters are improvements on voice assistants: they merely output the device's voice, and the interaction between the user and the virtual character remains superficial.
  • the embodiments of the present disclosure provide an interaction solution.
  • In a first aspect, an interaction method is provided. The method includes: acquiring an image of the periphery of a display device captured by a camera, the display device displaying an interactive object through a transparent display screen; detecting one or more objects involved in the image; in response to detecting that at least two objects are involved in the image, selecting a target object from the at least two objects based on the detected feature information of the at least two objects; and, based on the detection result of the target object, driving the interactive object displayed on the transparent display screen of the display device to respond to the target object.
  • the characteristic information includes object posture information and/or object attribute information.
  • Selecting the target object from the at least two objects according to the detected feature information of the at least two objects includes: selecting the target object from the at least two objects according to the degree to which the object posture information of each of the at least two objects matches a set posture feature, or according to the degree to which the object attribute information of each of the at least two objects matches a set attribute feature.
  • By selecting a target object from multiple objects according to feature information such as the object posture information and object attribute information of each object, a suitable object can be selected as the target object for interaction, thereby improving interaction efficiency and the service experience.
  • Selecting the target object from the at least two objects according to the detected feature information of the at least two objects includes: selecting, according to the object posture information of each of the at least two objects, one or more first objects that match a set posture feature; and, when there are at least two first objects, driving the interactive object to guide the at least two first objects to each output setting information, and determining the target object according to the order in which the first objects are detected to output the setting information.
  • By guiding the first objects to output setting information, a target object with a high willingness to cooperate can be selected from the objects that match the set posture feature, which can improve interaction efficiency and the service experience.
  • Selecting the target object from the at least two objects according to the detected feature information of the at least two objects includes: selecting, according to the object posture information of each of the at least two objects, one or more first objects that match a set posture feature; and, if there are at least two first objects, determining the respective interaction response priorities of the at least two first objects according to the respective object attribute information of the at least two first objects, and determining the target object according to the interaction response priorities.
  • the method further includes: after selecting a target object from the at least two objects, driving the interactive object to output confirmation information to the target object.
  • By outputting confirmation information to the target object, the object is made aware that it is currently in an interactive state, which improves interaction efficiency.
  • The method further includes: in response to no object being detected in the image at the current moment, and no object having been detected or tracked in the image within a set time period before the current moment, determining that the object to be interacted with by the interactive object is empty, and causing the display device to enter the waiting-for-object state.
  • The method further includes: in response to no object being detected in the image at the current moment, and an object having been detected or tracked in the image within a set time period before the current moment, determining that the object to be interacted with by the interactive object is the object that interacted most recently.
  • When no object is interacting with the interactive object, by determining whether the device is currently in the waiting-for-object state or the object-left state and driving the interactive object to respond differently, the display state of the interactive object better matches the actual interaction requirements and is more targeted.
  • the display device displays the reflection of the interaction object through the transparent display screen, or the display device displays the reflection of the interaction object on the bottom plate.
  • By displaying a stereoscopic picture on the transparent display screen and forming a reflection on the transparent display screen or the bottom plate to achieve a stereoscopic effect, the displayed interactive object can be made more three-dimensional and vivid.
  • the interactive object includes a virtual character with a three-dimensional effect.
  • In a second aspect, an interaction apparatus is provided, including: an image acquisition unit configured to acquire images around a display device captured by a camera, the display device displaying an interactive object through a transparent display screen; a detection unit configured to detect one or more objects involved in the image; an object selection unit configured to, in response to the detection unit detecting that at least two objects are involved in the image, select a target object from the at least two objects according to the detected feature information of the at least two objects; and a driving unit configured to drive, based on the detection result of the target object, the interactive object displayed on the transparent display screen of the display device to respond to the target object.
  • the characteristic information includes object posture information and/or object attribute information.
  • The object selection unit is specifically configured to select the target object from the at least two objects according to the degree to which the object posture information of each of the at least two objects matches a set posture feature, or according to the degree to which the object attribute information of each of the at least two objects matches a set attribute feature.
  • The object selection unit is specifically configured to: select, according to the object posture information of each of the at least two objects, one or more first objects that match the set posture feature; and, in the case that there are at least two first objects, cause the driving unit to drive the interactive object to guide the at least two first objects to each output setting information, and determine the target object according to the order in which the first objects are detected to output the setting information.
  • The object selection unit is specifically configured to: select, according to the object posture information of each of the at least two objects, one or more first objects that match the set posture feature; and, in the case that there are at least two first objects, determine the respective interaction response priorities of the at least two first objects according to the respective object attribute information of the at least two first objects, and determine the target object according to the interaction response priorities.
  • The device further includes a confirmation unit configured to: in response to the object selection unit selecting a target object from the at least two objects, cause the driving unit to drive the interactive object to output confirmation information to the target object.
  • The device further includes a waiting state unit configured to: in response to the detection unit not detecting an object in the image at the current moment, and no object having been detected or tracked in the image within a set time period before the current moment, determine that the object to be interacted with by the interactive object is empty, and cause the display device to enter the waiting-for-object state.
  • The device further includes an end state unit configured to: in response to the detection unit not detecting an object in the image at the current moment, and an object having been detected or tracked in the image within a set time period before the current moment, determine that the object to be interacted with by the interactive object is the object that interacted most recently.
  • the display device also displays the reflection of the interaction object through the transparent display screen, or the display device also displays the reflection of the interaction object on the bottom plate.
  • the interactive object includes a virtual character with a three-dimensional effect.
  • In a third aspect, an interaction device is provided, including a processor and a memory for storing instructions executable by the processor; when the instructions are executed, the processor is caused to implement the interaction method according to any embodiment provided in the present disclosure.
  • In a fourth aspect, a computer-readable storage medium is provided, having a computer program stored thereon; when the computer program is executed by a processor, the processor is caused to implement the interaction method according to any embodiment provided in the present disclosure.
  • Fig. 1 shows a flowchart of an interaction method according to at least one embodiment of the present disclosure
  • Fig. 2 shows a schematic diagram of displaying interactive objects according to at least one embodiment of the present disclosure
  • Fig. 3 shows a schematic structural diagram of an interactive device according to at least one embodiment of the present disclosure
  • Fig. 4 shows a schematic structural diagram of an interactive device according to at least one embodiment of the present disclosure.
  • FIG. 1 shows a flowchart of an interaction method according to at least one embodiment of the present disclosure. As shown in FIG. 1, the method includes steps 101 to 104.
  • In step 101, an image of the periphery of a display device captured by a camera is acquired, and the display device displays an interactive object through a transparent display screen.
  • the periphery of the display device includes any direction within the setting range of the display device, for example, it may include one or more of the front direction, the side direction, the rear direction, and the upper direction of the display device.
  • the camera used to collect images can be set on the display device or used as an external device, independent of the display device. And the image collected by the camera can also be displayed on the transparent display screen in the display device.
  • the number of the cameras can be multiple.
  • the image collected by the camera may be a frame in the video stream, or may be an image obtained in real time.
  • In step 102, one or more users involved in the image are detected.
  • The one or more users in the image described herein refer to one or more objects involved in the detection process of the image.
  • In the following, "object" and "user" may be used interchangeably and, for convenience of presentation, are collectively referred to as "user".
  • By detecting the users in the image around the display device, a detection result is obtained, for example, whether there are users around the display device and how many there are. Information about the detected users can also be obtained, for example, feature information extracted from the image by image recognition technology, or feature information obtained by querying the display device or the cloud based on the user's face and/or body image.
  • The detection result may also include other information.
  • In step 103, in response to detecting that at least two users are involved in the image, a target user is selected from the at least two users according to the detected feature information of the at least two users.
  • For different application scenarios, the user may be selected according to the corresponding feature information.
  • In step 104, based on the detection result of the target user, the interactive object displayed on the transparent display screen of the display device is driven to respond to the target user.
  • In response to the detection results of different target users, the interactive object is driven to respond correspondingly to the different target users.
  • By performing user detection on the image around the display device, selecting a target user according to the users' feature information, and driving the interactive object displayed on the transparent display screen of the display device to respond to the target user, a target user suitable for the current scenario can be selected for interaction in a multi-user scenario, which improves interaction efficiency and the service experience.
  • the interactive objects displayed on the transparent display screen of the display device include virtual characters with a three-dimensional effect.
  • the interaction process can be made more natural and the user's interaction experience can be improved.
  • the interactive objects are not limited to virtual characters with three-dimensional effects, but may also be virtual animals, virtual items, cartoon characters, and other virtual images capable of realizing interactive functions.
  • the three-dimensional effect of the interactive object displayed on the transparent display screen can be realized by the following method.
  • Whether the human eye sees an object in three dimensions is usually determined by the shape of the object itself and the light and shadow effects of the object.
  • The light and shadow effects include, for example, highlights and shadows in different areas of the object, and the projection of light onto the ground after the object is illuminated (that is, the reflection).
  • Using the above principle, in one example, while the transparent display screen displays the stereoscopic video or image of the interactive object, the reflection of the interactive object is also displayed on the transparent display screen, so that the human eye can observe a stereoscopic picture.
  • a bottom plate is provided under the transparent display screen, and the transparent display is perpendicular or inclined to the bottom plate. While the transparent display screen displays the stereoscopic video or image of the interactive object, the reflection of the interactive object is displayed on the bottom plate, so that the human eye can observe the stereoscopic image.
  • the display device further includes a box body, and the front side of the box body is set to be transparent, for example, the transparent setting is realized by materials such as glass or plastic.
  • one or more light sources are also provided in the box to provide light to the transparent display screen to form a reflection.
  • By displaying the stereoscopic video or image of the interactive object on the transparent display screen, and forming the reflection of the interactive object on the transparent display screen or the bottom plate to achieve the stereoscopic effect, the displayed interactive object is made more three-dimensional and vivid, enhancing the user's interactive experience.
  • the characteristic information includes user posture information and/or user attribute information
  • the target user can be selected from at least two detected users according to the user posture information and/or user attribute information.
  • the user gesture information refers to characteristic information obtained by performing image recognition in an image, such as user actions, gestures, and so on.
  • User attribute information refers to the characteristic information about the user, including the user's identity (for example, whether it is a VIP user), service record, time of arrival at the current location, and so on.
  • The attribute information may be obtained from user history records stored on the display device or in the cloud, and the user history records may be obtained by retrieving, on the display device or in the cloud, records that match the feature information of the user's face and/or body.
  • the target user may be selected from the at least two users according to the degree of match between the user posture information of each of the at least two users and the posture of the set posture feature.
  • For example, assuming the set posture feature is a hand-raising action, the user posture information of the at least two users may be matched against the hand-raising action, and the user with the highest degree of posture matching among the matching results of the at least two users is determined as the target user.
  • the target user may be selected from the at least two users according to the degree of matching between the user attribute information of each of the at least two users and the attributes of the set attribute characteristics.
  • For example, the user attribute information of the at least two users may be matched against the set attribute features, and the user with the highest degree of attribute matching among the matching results of the at least two users is determined as the target for interaction.
  • By selecting the target user according to feature information such as user posture information and user attribute information, a user suitable for the current application scenario can be selected as the target user for interaction, so as to improve interaction efficiency and the service experience.
  • the target user can be selected from the at least two users in the following manner:
  • First, according to the user posture information of each of the at least two users, one or more first users who meet the set posture feature are selected.
  • conforming to the set posture feature means that the posture matching degree of the user posture information and the set posture feature is greater than a set value, for example, greater than 80%.
  • For example, the set posture feature is a hand-raising action.
  • In this case, the first users selected are those whose posture information matches the hand-raising action with a degree higher than 80% (such a user is considered to have performed the hand-raising action); that is, all users who have performed the hand-raising gesture are selected.
  • The target user can then be further determined by the following method: driving the interactive object to guide the at least two first users to each output setting information, and determining the target user according to the order in which the first users are detected to output the setting information.
  • the setting information output by the first user may be one or more of actions, expressions, and voices.
  • at least two first users are guided to perform a jumping action, and the first user who performs the jumping action first is determined as the target user.
  • a target user with high willingness to cooperate can be selected from users who meet the characteristics of the set posture, which can improve the interaction efficiency and service experience.
  • the target user can be further determined by the following method:
  • The interaction response priority of each first user is determined according to the user attribute information of that first user, and the first user with the highest priority is determined as the target user.
  • the user attribute information used as the basis for selection can be comprehensively judged in combination with the user's current needs and actual scenes. For example, in the scenario of queuing to buy tickets, the time of arrival at the current location can be used as the basis of user attribute information to determine the interaction priority.
  • In that case, the user who arrives first has the highest interaction response priority and can be determined as the target user; at other service locations, the target user can also be determined based on other user attribute information, for example, the interaction priority may be determined based on the user's points at the location, so that the user with the highest points has the highest interaction response priority.
  • each user may be further guided to output setting information. If the number of first users who output the setting information is still more than one, the user with the highest interactive response priority can be determined as the target user.
  • the target user is selected from multiple detected users in combination with user attribute information, user posture information, and application scenarios, and different interactive response priorities can be set to provide corresponding services to the target user. Selecting a suitable user as the target user for interaction improves the interaction efficiency and service experience.
  • the user can be notified that the user is selected by outputting confirmation information to the user.
  • the interactive object may be driven to point to the user with a finger, or the interactive object may be driven to highlight the user in the camera preview screen, or the confirmation information may be output by other means.
  • By outputting confirmation information to the target user, the user is made aware of currently being in an interactive state, which improves interaction efficiency.
  • After a certain user is selected as the target user for interaction, the interactive object only responds, or preferentially responds, to the instructions of the target user until the target user leaves the shooting range of the camera.
  • When no user is detected in the image around the device, it means that there is no user around the display device, that is, the device is not currently in a state of interacting with a user.
  • This state includes the case where no user has interacted with the device within the set time period before the current moment, that is, the waiting-for-user state; it also includes the case where a user interacted with the device within the set time period before the current moment, in which case the device is in the user-left state.
  • the interactive object should be driven to react differently. For example, for the waiting user state, the interactive object can be driven to respond to welcome the user in combination with the current environment; while for the user leaving state, the interactive object can be driven to respond to the user who interacted most recently to end the service.
  • In response to no user being detected in the image at the current moment, and no user having been detected or tracked in the image within a set time period before the current moment, for example within 5 seconds, the user to be interacted with by the interactive object is determined to be empty, and the interactive object on the display device is driven to enter the waiting-for-user state.
  • In response to no user being detected in the image at the current moment, and a user having been detected or tracked in the image within a set time period before the current moment, it is determined that the user to be interacted with by the interactive object is the user who interacted most recently.
  • When there is no user interacting with the interactive object, by determining whether the device is currently in the waiting-for-user state or the user-left state and driving the interactive object to make different responses, the display state of the interactive object better matches the interaction needs and is more targeted.
  • the detection result may also include the current service status of the device.
  • The current service state may also include a user-discovered state, and so on.
  • the current service state of the device may also include other states, and is not limited to the above.
  • When a human face and/or a human body is detected in the image around the device, it means that there is a user around the display device, and the state at the moment the user is detected may be determined as the user-discovered state.
  • The user history information stored on the display device and/or in the cloud can also be obtained to determine whether the user is a regular customer or a VIP customer.
  • the user history information may also include the user's name, gender, age, service record, remarks, and so on.
  • the user history information may include information input by the user, or may include information recorded by the display device and/or cloud.
  • the user history information matching the user may be found based on the detected feature information of the user's face and/or human body.
  • When the display device is in the user-discovered state, the interactive object can be driven to respond according to the current service state of the display device, the user attribute information obtained from the image, and the user history information obtained through retrieval.
  • When a user is detected for the first time, the user history information may be empty; that is, the interactive object is driven according to the current service state, the user attribute information, and the environment information.
  • the user’s face and/or human body can be recognized through the image first to obtain basic user attribute information about the user.
  • For example, it may be recognized from the image that the user is female and between 20 and 30 years old; then, according to the feature information of the user's face and/or body, the display device and/or the cloud is searched to find user history information that matches the feature information, for example, the user's name, service record, and so on.
  • the interactive object is driven to make a targeted welcoming action to the female user, and to show the female user the services that can be provided for the female user.
  • the order of providing services can be adjusted, so that the user can find the service items of interest more quickly.
  • When at least two users are detected, feature information of the at least two users may be obtained first; the feature information may include at least one of user posture information and user attribute information, together with the corresponding user history information, where the user posture information can be obtained by recognizing the user's actions in the image.
  • the target user among the at least two users is determined according to the obtained characteristic information of the at least two users.
  • the characteristic information of each user can be comprehensively evaluated in combination with the actual scene to determine the target user to be interacted with.
  • the interactive object displayed on the display device can be driven to respond to the target user.
  • When a user is found, after the interactive object is driven to respond, the user detected in the image around the display device is tracked, for example by tracking the user's facial expressions and/or actions, and whether to cause the display device to enter the service activation state is judged by determining whether the user has made expressions and/or actions indicating active interaction.
  • specific trigger information may be set, such as common facial expressions and/or actions for greetings between people, such as blinking, nodding, waving, raising hands, slaps, and so on.
  • the specified trigger information set here may be referred to as the first trigger information.
  • In the case that the first trigger information output by the user is detected, the display device enters the service activation state, and the interactive object is driven to display the services that can be provided, for example through speech or through text information displayed on the screen.
  • the current common somatosensory interaction requires the user to raise his hand for a period of time to activate the service. After selecting the service, the user needs to keep his hand still for several seconds to complete the activation.
  • The interaction method provided by the embodiments of the present disclosure does not require the user to keep a hand raised for a period of time to activate the service, nor to keep the hand position still to complete the selection.
  • The service can be activated automatically, placing the device in the service activation state, which avoids the user having to raise a hand and wait for a period of time, and improves the user experience.
  • specific trigger information can be set, such as a specific gesture action, and/or a specific voice command.
  • the specified trigger information set here may be referred to as second trigger information.
  • In the case that the second trigger information output by the user is detected, it is determined that the display device enters the in-service state, and the interactive object is driven to provide a service matching the second trigger information.
  • the corresponding service is executed through the second trigger information output by the user.
  • the services that can be provided to users include: the first service option, the second service option, the third service option, etc., and the corresponding second trigger information can be configured for the first service option.
  • the voice "one” can be set. "Is the second trigger information corresponding to the first service option, and the voice "two” is set as the second trigger information corresponding to the second service option, and so on.
  • the display device is caused to enter the service option corresponding to the second trigger information, and the interactive object is driven to provide the service according to the content set by the service option.
  • The first-granularity (coarse-grained) recognition method is to cause the device to enter the service activation state and drive the interactive object to display the available services when the first trigger information output by the user is detected;
  • the second-granularity (fine-grained) recognition method is to cause the device to enter the in-service state and drive the interactive object to provide the corresponding service when the second trigger information output by the user is detected.
  • In this way, the user does not need to perform key, touch, or voice input, and only needs to stand near the display device;
  • the interactive object displayed on the display device can then make a targeted welcoming action and display the service items that can be provided according to the user's needs or interests, enhancing the user experience.
  • the environment information of the display device may be acquired, and the interactive object displayed on the display device can be driven to respond according to the detection result and the environment information.
  • the environmental information of the display device may be acquired through the geographic location of the display device and/or the application scenario of the display device.
  • the environmental information may be, for example, the geographic location of the display device, an Internet Protocol (IP) address, or the weather, date, etc. of the area where the display device is located.
  • the interactive object may be driven to respond according to the current service state and environment information of the display device.
  • For example, the environment information includes the time, location, and weather conditions, and the interactive object displayed on the display device can be driven to make welcoming actions and gestures, or some interesting movements, and to output the voice "It is now XX o'clock, X month X day, year X, the weather is XX, welcome to XX shopping mall in XX city, I am very happy to serve you".
  • the current time, location, and weather conditions are also added, which not only provides more information, but also makes the response of interactive objects more in line with interaction needs and more targeted.
  • By driving the interactive object displayed on the display device to respond according to the detection result and the environment information, the response of the interactive object better matches the interaction requirements, and the user's interaction with the interactive object becomes more real and vivid, thereby enhancing the user experience.
  • In some embodiments, a matching predetermined response label may be obtained according to the detection result and the environment information; then, the interactive object is driven to make a corresponding response according to the response label. The present disclosure is not limited to this.
  • The response label may correspond to driving text for one or more of the action, expression, gesture, and speech of the interactive object. For different detection results and environment information, the corresponding driving text can be obtained according to the determined response label, so that the interactive object can be driven to output one or more of the corresponding actions, expressions, and speech (see the illustrative sketch at the end of this section).
  • the corresponding response label may be: the action is a welcome action, and the voice is "Welcome to Shanghai”.
  • the corresponding response label can be: the action is welcome action, and the voice is " Good morning, Ms. Zhang, welcome, and I am very happy to serve you.”
  • By configuring corresponding response labels for combinations of different detection results and different environment information, and using the response labels to drive the interactive object to output one or more of the corresponding actions, expressions, and speech, the interactive object can be driven to make different responses according to different states and different scenarios of the device, so that its responses are more diversified.
  • In some embodiments, the response label may be input to a pre-trained neural network, which outputs the driving text corresponding to the response label, so as to drive the interactive object to output one or more of the corresponding actions, expressions, and speech.
  • The neural network may be trained on a sample response label set, where each sample response label is annotated with corresponding driving text. After training, the neural network can output corresponding driving text for an input response label, so as to drive the interactive object to output one or more of the corresponding actions, expressions, and speech. Compared with searching for the corresponding driving text directly on the display device or in the cloud, using a pre-trained neural network allows driving text to be generated even for response labels for which no driving text has been preset, so that the interactive object can respond appropriately.
  • the driving text can be manually configured for the corresponding response label.
  • the corresponding driving text is automatically called to drive the interactive object to respond, so that the actions and expressions of the interactive object are more natural.
  • In some embodiments, in response to the display device being in the user-discovered state, position information of the user relative to the interactive object in the display device is obtained according to the position of the user in the image, and the orientation of the interactive object is adjusted according to the position information so that the interactive object faces the user.
  • the image of the interactive object is captured by a virtual camera.
  • The virtual camera is a virtual software camera applied in 3D software and used to collect images, and the interactive object is displayed on the screen through the 3D image collected by the virtual camera. Therefore, the user's perspective can be understood as the perspective of the virtual camera in the 3D software, which causes a problem: the interactive object cannot make eye contact with the user.
  • To address this, while keeping the interactive object facing the user, the line of sight of the interactive object is also kept aligned with the virtual camera. Since the interactive object faces the user during the interaction and its line of sight remains aligned with the virtual camera, the user has the illusion that the interactive object is looking at them, which can improve the comfort of the interaction between the user and the interactive object.
  • FIG. 3 shows a schematic structural diagram of an interaction device according to at least one embodiment of the present disclosure.
  • the device may include: an image acquisition unit 301, a detection unit 302, a user selection unit 303, and a driving unit 304.
  • The image acquisition unit 301 is configured to acquire images around the display device captured by the camera, the display device displaying an interactive object through a transparent display screen;
  • the detection unit 302 is configured to detect one or more users involved in the image;
  • the user selection unit 303 is configured to, in response to the detection unit 302 detecting that at least two users are involved in the image, select a target user from the at least two users according to the detected feature information of the at least two users;
  • the driving unit 304 is configured to drive the interactive object displayed on the transparent display screen of the display device to respond to the target user based on the detection result of the target user.
  • the one or more users in the image described herein refer to one or more objects involved in the detection process of the image. In the following, "object” and "user” can be used interchangeably, and for convenience of presentation, they are collectively referred to as "user”.
  • the characteristic information includes user posture information and/or user attribute information.
  • The user selection unit 303 is specifically configured to select the target user from the at least two users according to the degree to which the user posture information of each of the at least two users matches a set posture feature, or according to the degree to which the user attribute information of each of the at least two users matches a set attribute feature.
  • The user selection unit 303 is specifically configured to: select one or more first users that meet the set posture feature according to the user posture information of each of the at least two users; and, in the case that there are at least two first users, cause the driving unit 304 to drive the interactive object to guide the at least two first users to each output setting information, and determine the target user according to the order in which the first users are detected to output the setting information.
  • The user selection unit 303 is specifically configured to: select one or more first users that meet the set posture feature according to the user posture information of each of the at least two users; and, in the case that there are at least two first users, determine the respective interaction response priorities of the at least two first users according to the respective user attribute information of the at least two first users, and determine the target user according to the interaction response priorities.
  • The device further includes a confirmation unit configured to: in response to the user selection unit 303 selecting a target user from the at least two users, cause the driving unit 304 to drive the interactive object to output confirmation information to the target user.
  • The device further includes a waiting state unit configured to: in response to the detection unit 302 not detecting a user in the image at the current moment, and no user having been detected or tracked in the image within a set time period before the current moment, determine that the user to be interacted with by the interactive object is empty, and cause the display device to enter the waiting-for-user state.
  • The device further includes an end state unit configured to: in response to the detection unit 302 not detecting a user in the image at the current moment, and a user having been detected or tracked in the image within a set time period before the current moment, determine that the user to be interacted with by the interactive object is the user who interacted most recently.
  • the display device displays the reflection of the interaction object through the transparent display screen, or the display device displays the reflection of the interaction object on the bottom plate.
  • the interactive object includes a virtual character with a three-dimensional effect.
  • At least one embodiment of the present disclosure also provides an interactive device.
  • the device includes a memory 401 and a processor 402.
  • The memory 401 is used to store instructions executable by the processor, and when the instructions are executed, the processor 402 is caused to implement the interaction method described in any embodiment of the present disclosure.
  • At least one embodiment of the present disclosure also provides a computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, the processor is caused to implement the interaction method described in any embodiment of the present disclosure.
  • One or more embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, one or more embodiments of the present disclosure may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, one or more embodiments of the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and so on) containing computer-usable program code.
  • Embodiments of the subject matter of the present disclosure can be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier to be executed by a data processing device or to control the operation of the data processing device.
  • The program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information and transmit it to a suitable receiver device for execution by a data processing device.
  • the computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
  • the processing and logic flow in the present disclosure can be executed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating according to input data and generating output.
  • the processing and logic flow can also be executed by a dedicated logic circuit, such as FPGA (Field Programmable Gate Array) or ASIC (Application Specific Integrated Circuit), and the device can also be implemented as a dedicated logic circuit.
  • Computers suitable for executing computer programs include, for example, general-purpose and/or special-purpose microprocessors, or any other type of central processing unit.
  • the central processing unit will receive instructions and data from a read-only memory and/or a random access memory.
  • the basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data.
  • The computer will also include one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, or the computer will be operatively coupled to such a mass storage device to receive data from it, transmit data to it, or both.
  • The computer does not have to have such devices.
  • The computer can be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name just a few.
  • Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM, and flash memory devices), magnetic disks (such as internal hard disks or Removable disks), magneto-optical disks, CD ROM and DVD-ROM disks.
  • the processor and the memory can be supplemented by or incorporated into a dedicated logic circuit.
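As a rough illustration of the response-label mechanism described above (a combination of detection result and environment information selects a predetermined response label, and the label in turn selects driving text for the interactive object), the following minimal Python sketch may help. The table contents, keys, and the play_action/say method names are assumptions made for illustration only and are not part of the disclosure.

```python
# combination of (service state, environment key) -> predetermined response label
RESPONSE_LABELS = {
    ("waiting_user", "Shanghai"): "welcome_shanghai",
    ("user_found", "morning"): "greet_known_user",
}

# response label -> driving content (action / speech) for the interactive object
DRIVING_TEXT = {
    "welcome_shanghai": {"action": "welcome", "speech": "Welcome to Shanghai"},
    "greet_known_user": {"action": "welcome",
                         "speech": "Good morning, welcome, happy to serve you"},
}

def drive_by_response_label(service_state, env_key, avatar):
    label = RESPONSE_LABELS.get((service_state, env_key))
    spec = DRIVING_TEXT.get(label)
    if spec is None:
        return  # a pre-trained model could generate driving text here instead
    avatar.play_action(spec["action"])
    avatar.say(spec["speech"])
```

In a deployed system, the lookup table could be supplemented by the pre-trained neural network mentioned above, which can generate driving text for response labels that have no preset entry.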


Abstract

The present disclosure relates to an interaction method, apparatus, device, and storage medium. The method includes: acquiring an image of the periphery of a display device captured by a camera, the display device displaying an interactive object through a transparent display screen; detecting one or more objects involved in the image; in response to detecting that at least two objects are involved in the image, selecting a target object from the at least two objects according to detected feature information of the at least two objects; and, based on a detection result of the target object, driving the interactive object displayed on the transparent display screen of the display device to respond to the target object.

Description

Interaction method, apparatus, device, and storage medium
Technical Field
The present disclosure relates to the field of computer vision technology, and in particular to an interaction method, apparatus, device, and storage medium.
Background
Human-computer interaction is mostly performed as follows: the user provides input via keys, touch, or voice, and the device responds by presenting images and text on a display screen. At present, virtual characters are mostly improvements on voice assistants; they only output the device's voice, and the interaction between the user and the virtual character remains superficial.
Summary of the Invention
The embodiments of the present disclosure provide an interaction solution.
In a first aspect, an interaction method is provided. The method includes: acquiring an image of the periphery of a display device captured by a camera, the display device displaying an interactive object through a transparent display screen; detecting one or more objects involved in the image; in response to detecting that at least two objects are involved in the image, selecting a target object from the at least two objects according to detected feature information of the at least two objects; and, based on a detection result of the target object, driving the interactive object displayed on the transparent display screen of the display device to respond to the target object.
By performing object detection on the image around the display device, selecting a target object according to the objects' feature information, and driving the interactive object displayed on the transparent display screen of the display device to respond to the target object, a suitable target object can be selected for interaction in a multi-object scenario, which improves interaction efficiency and also enhances the interaction experience.
In an example, the feature information includes object posture information and/or object attribute information.
In an example, selecting the target object from the at least two objects according to the detected feature information of the at least two objects includes: selecting the target object from the at least two objects according to the degree to which the object posture information of each of the at least two objects matches a set posture feature, or according to the degree to which the object attribute information of each of the at least two objects matches a set attribute feature.
By selecting the target object from multiple objects according to feature information such as the object posture information and object attribute information of each object, a suitable object can be chosen as the target of interaction, thereby improving interaction efficiency and the service experience.
In an example, selecting the target object from the at least two objects according to the detected feature information of the at least two objects includes: selecting, according to the object posture information of each of the at least two objects, one or more first objects that match a set posture feature; and, in the case that there are at least two first objects, driving the interactive object to guide the at least two first objects to each output setting information, and determining the target object according to the order in which the first objects are detected to output the setting information.
By guiding the first objects to output setting information, a target object with a high willingness to cooperate can be selected from the objects matching the set posture feature, which improves interaction efficiency and the service experience.
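A rough sketch of this guidance-based selection is given below: the interactive object prompts the candidate first objects to output setting information (an action, expression, or voice), and the first one detected to do so becomes the target. The function and parameter names (guide, detect_setting_info, the 10-second timeout) are assumptions for illustration only, not part of the disclosure.

```python
import time

def select_by_guidance(first_users, interactive_object, detect_setting_info,
                       timeout_s=10.0, poll_s=0.2):
    """first_users: candidate ids; detect_setting_info(uid) -> True once that
    user has been detected to output the setting information."""
    interactive_object.guide(first_users)  # e.g. ask the candidates to jump or wave
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        for uid in first_users:
            if detect_setting_info(uid):
                return uid                 # first responder becomes the target
        time.sleep(poll_s)
    return None                            # nobody responded within the window
```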
In an example, selecting the target object from the at least two objects according to the detected feature information of the at least two objects includes: selecting, according to the object posture information of each of the at least two objects, one or more first objects that match a set posture feature; and, in the case that there are at least two first objects, determining the respective interaction response priorities of the at least two first objects according to the respective object attribute information of the at least two first objects, and determining the target object according to the interaction response priorities.
By selecting the target object from multiple detected objects in combination with object attribute information, object posture information, and the application scenario, and providing corresponding services to the target object through different interaction response priorities, a suitable object can be chosen as the target of interaction, thereby improving interaction efficiency and the service experience.
In an example, the method further includes: after selecting the target object from the at least two objects, driving the interactive object to output confirmation information to the target object.
By outputting confirmation information to the target object, the object is made aware that it is currently in an interactive state, which improves interaction efficiency.
In an example, the method further includes: in response to no object being detected in the image at the current moment, and no object having been detected or tracked in the image within a set time period before the current moment, determining that the object to be interacted with by the interactive object is empty, and causing the display device to enter a waiting-for-object state.
In an example, the method further includes: in response to no object being detected in the image at the current moment, and an object having been detected or tracked in the image within a set time period before the current moment, determining that the object to be interacted with by the interactive object is the object that interacted most recently.
When no object is interacting with the interactive object, by determining whether the device is currently in the waiting-for-object state or the object-left state and driving the interactive object to make different responses, the display state of the interactive object better matches the actual interaction requirements and is more targeted.
In an example, the display device displays the reflection of the interactive object through the transparent display screen, or the display device displays the reflection of the interactive object on a bottom plate.
By displaying a stereoscopic picture on the transparent display screen and forming a reflection on the transparent display screen or the bottom plate to achieve a stereoscopic effect, the displayed interactive object can be made more three-dimensional and vivid.
In an example, the interactive object includes a virtual character with a stereoscopic effect.
By interacting with objects through a virtual character with a stereoscopic effect, the interaction process can be made more natural and the object's interaction experience can be improved.
In a second aspect, an interaction apparatus is provided. The apparatus includes: an image acquisition unit configured to acquire an image of the periphery of a display device captured by a camera, the display device displaying an interactive object through a transparent display screen; a detection unit configured to detect one or more objects involved in the image; an object selection unit configured to, in response to the detection unit detecting that at least two objects are involved in the image, select a target object from the at least two objects according to detected feature information of the at least two objects; and a driving unit configured to drive, based on a detection result of the target object, the interactive object displayed on the transparent display screen of the display device to respond to the target object.
In an example, the feature information includes object posture information and/or object attribute information.
In an example, the object selection unit is specifically configured to select the target object from the at least two objects according to the degree to which the object posture information of each of the at least two objects matches a set posture feature, or according to the degree to which the object attribute information of each of the at least two objects matches a set attribute feature.
In an example, the object selection unit is specifically configured to: select, according to the object posture information of each of the at least two objects, one or more first objects that match a set posture feature; and, in the case that there are at least two first objects, cause the driving unit to drive the interactive object to guide the at least two first objects to each output setting information, and determine the target object according to the order in which the first objects are detected to output the setting information.
In an example, the object selection unit is specifically configured to: select, according to the object posture information of each of the at least two objects, one or more first objects that match a set posture feature; and, in the case that there are at least two first objects, determine the respective interaction response priorities of the at least two first objects according to the respective object attribute information of the at least two first objects, and determine the target object according to the interaction response priorities.
In an example, the apparatus further includes a confirmation unit configured to: in response to the object selection unit selecting the target object from the at least two objects, cause the driving unit to drive the interactive object to output confirmation information to the target object.
In an example, the apparatus further includes a waiting state unit configured to: in response to the detection unit not detecting an object in the image at the current moment, and no object having been detected or tracked in the image within a set time period before the current moment, determine that the object to be interacted with by the interactive object is empty, and cause the display device to enter the waiting-for-object state.
In an example, the apparatus further includes an end state unit configured to: in response to the detection unit not detecting an object in the image at the current moment, and an object having been detected or tracked in the image within a set time period before the current moment, determine that the object to be interacted with by the interactive object is the object that interacted most recently.
In an example, the display device further displays the reflection of the interactive object through the transparent display screen, or the display device further displays the reflection of the interactive object on the bottom plate.
In an example, the interactive object includes a virtual character with a stereoscopic effect.
In a third aspect, an interaction device is provided. The device includes a processor and a memory for storing instructions executable by the processor; when the instructions are executed, the processor is caused to implement the interaction method according to any embodiment provided in the present disclosure.
In a fourth aspect, a computer-readable storage medium is provided, having a computer program stored thereon; when the computer program is executed by a processor, the processor is caused to implement the interaction method according to any embodiment provided in the present disclosure.
Brief Description of the Drawings
FIG. 1 shows a flowchart of an interaction method according to at least one embodiment of the present disclosure;
FIG. 2 shows a schematic diagram of displaying an interactive object according to at least one embodiment of the present disclosure;
FIG. 3 shows a schematic structural diagram of an interaction apparatus according to at least one embodiment of the present disclosure;
FIG. 4 shows a schematic structural diagram of an interaction device according to at least one embodiment of the present disclosure.
Detailed Description
Exemplary embodiments will be described in detail here, examples of which are shown in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as set forth in the appended claims.
The term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" may indicate three cases: A alone, both A and B, or B alone. In addition, the term "at least one" herein means any one of multiple items or any combination of at least two of multiple items; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set consisting of A, B, and C.
FIG. 1 shows a flowchart of an interaction method according to at least one embodiment of the present disclosure. As shown in FIG. 1, the method includes steps 101 to 104.
In step 101, an image of the surroundings of a display device captured by a camera is acquired, the display device displaying an interactive object through a transparent display screen.
The surroundings of the display device include any direction within a set range of the display device, for example one or more of the directions in front of, to the side of, behind, and above the display device.
The camera used to capture images may be mounted on the display device, or may be an external device independent of the display device. The image captured by the camera may also be shown on the transparent display screen of the display device. There may be multiple cameras.
Optionally, the image captured by the camera may be a frame of a video stream or an image acquired in real time.
In step 102, one or more users involved in the image are detected. The one or more users in the image described herein refer to the one or more objects involved in the process of detecting the image. In the following, "object" and "user" are used interchangeably and, for convenience of description, are collectively referred to as "user".
By detecting users in the image of the surroundings of the display device, a detection result is obtained, for example whether there are users around the display device and how many there are. Information about the detected users may also be obtained, for example feature information extracted from the image by image recognition techniques, or feature information obtained by querying, on the display device side or in the cloud, based on the user's face and/or body image, and so on. Those skilled in the art should understand that the detection result may also include other information.
In step 103, in response to detecting that the image involves at least two users, a target user is selected from the at least two users according to feature information of the detected at least two users.
For different application scenarios, users may be selected according to corresponding feature information.
In step 104, based on a detection result for the target user, the interactive object displayed on the transparent display screen of the display device is driven to respond to the target user.
In response to detection results for different target users, the interactive object is driven to respond to the different target users accordingly.
In the embodiments of the present disclosure, by performing user detection on the image of the surroundings of the display device, selecting a target user according to the users' feature information, and driving the interactive object displayed on the transparent display screen of the display device to respond to the target user, a target user suited to the current scenario can be selected for interaction in a multi-user setting, which improves interaction efficiency and the service experience.
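The overall flow of steps 101 to 104 can be pictured with the following minimal sketch in Python (hypothetical; the camera, detector, selector, and driver interfaces are assumptions made for illustration only and are not defined by the present disclosure):

    # Hypothetical sketch of the step 101-104 loop; all interfaces are assumed.
    def interaction_loop(camera, detector, select_target, drive_response):
        while True:
            image = camera.capture()            # step 101: image around the display device
            users = detector.detect(image)      # step 102: detect one or more users
            if len(users) >= 2:
                target = select_target(users)   # step 103: pick a target by feature info
            elif len(users) == 1:
                target = users[0]
            else:
                target = None                   # handled by waiting / user-left states
            if target is not None:
                drive_response(target)          # step 104: drive the on-screen character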
In some embodiments, the interactive object displayed on the transparent display screen of the display device includes a virtual character with a stereoscopic effect.
By using a virtual character with a stereoscopic effect to interact with users, the interaction process can be made more natural and the users' interaction experience can be improved.
Those skilled in the art should understand that the interactive object is not limited to a virtual character with a stereoscopic effect; it may also be a virtual animal, a virtual item, a cartoon figure, or any other virtual image capable of implementing interaction functions.
In some embodiments, the stereoscopic effect of the interactive object displayed on the transparent display screen can be achieved as follows.
Whether the human eye perceives an object as three-dimensional is usually determined by the object's shape and its light-and-shadow effects, for example highlights and shadows on different regions of the object, and the projection (i.e., reflection) cast on the ground when light shines on the object.
Using the above principle, in one example, while a stereoscopic video or image of the interactive object is displayed on the transparent display screen, a reflection of the interactive object is also displayed on the transparent display screen, so that the human eye can perceive a stereoscopic picture.
In another example, a bottom plate is arranged below the transparent display screen, and the transparent display screen is perpendicular or inclined relative to the bottom plate. While the transparent display screen displays a stereoscopic video or image of the interactive object, the reflection of the interactive object is displayed on the bottom plate, so that the human eye can perceive a stereoscopic picture.
In some embodiments, the display device further includes a housing, and the front of the housing is made transparent, for example with glass or plastic. Through the front of the housing, the picture on the transparent display screen and the reflection of that picture on the transparent display screen or the bottom plate can be seen, so that the human eye can perceive a stereoscopic picture, as shown in FIG. 2.
In some embodiments, one or more light sources are further provided inside the housing to provide light for the transparent display screen to form the reflection.
In the embodiments of the present disclosure, by displaying a stereoscopic video or image of the interactive object on the transparent display screen and forming a reflection of the interactive object on the transparent display screen or the bottom plate to achieve a stereoscopic effect, the displayed interactive object can be made more three-dimensional and vivid, improving the user's interaction experience.
In some embodiments, the feature information includes user posture information and/or user attribute information, and the target user may be selected from the detected at least two users according to the user posture information and/or the user attribute information.
The user posture information refers to feature information obtained by performing image recognition on the image, such as the user's actions and gestures. The user attribute information refers to feature information about the user itself, including the user's identity (for example, whether the user is a VIP user), service records, the time of arrival at the current place, and so on. The attribute feature information may be obtained from user history records stored on the display device side or in the cloud, and the user history records may be obtained by retrieving, on the display device side or in the cloud, records matching the feature information of the user's face and/or body.
In some embodiments, the target user may be selected from the at least two users according to the degree to which the user posture information of each of the at least two users matches a set posture feature.
For example, assuming the set posture feature is a hand-raising action, the user posture information of the at least two users may be matched against the hand-raising action, and the user with the highest posture match degree among the matching results of the at least two users is determined as the target user.
In some embodiments, the target user may be selected from the at least two users according to the degree to which the user attribute information of each of the at least two users matches a set attribute feature.
For example, assuming the set attribute features are "VIP user" and "female", the user attribute information of the at least two users may be matched against the set attribute features, and the user with the highest attribute match degree among the matching results of the at least two users is determined as the target user.
In the embodiments of the present disclosure, by selecting the target user from the detected at least two users according to feature information such as each user's posture information and attribute information, a user suited to the current application scenario can be chosen as the target user for interaction, thereby improving interaction efficiency and the service experience.
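As an illustration only, selection by match degree might be sketched as follows (a hypothetical Python sketch; the scoring functions posture_match and attribute_match and the 0-to-1 score range are assumptions, not part of the disclosure):

    # Hypothetical sketch: pick the user whose posture (or attributes) best matches the set feature.
    def select_by_posture(users, set_posture, posture_match):
        # posture_match(user, set_posture) is assumed to return a score in [0, 1]
        return max(users, key=lambda u: posture_match(u, set_posture))

    def select_by_attributes(users, set_attributes, attribute_match):
        # e.g. set_attributes = {"vip": True, "gender": "female"}
        return max(users, key=lambda u: attribute_match(u, set_attributes))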
In some embodiments, the target user may be selected from the at least two users in the following manner:
First, according to the user posture information of the at least two users, first users conforming to a set posture feature are selected. Conforming to the set posture feature means that the degree to which the user posture information matches the set posture feature is greater than a set value, for example greater than 80%.
For example, assuming the set posture feature is a hand-raising action, first users whose user posture information matches the hand-raising action with a degree higher than 80% (i.e., the user is considered to have raised a hand) are first selected from the image; in other words, all users who have raised a hand are selected.
In the case where there are at least two first users, the target user may further be determined as follows: the interactive object is driven to guide each of the at least two first users to output set information, and the target user is determined according to the detected order in which the first users output the set information.
In one example, the set information output by a first user may be one or more of an action, an expression, and speech. For example, the at least two first users are guided to perform a jumping action, and the first user who jumps first is determined as the target user.
In the embodiments of the present disclosure, by guiding the first users to output the set information, a target user with a high willingness to cooperate can be selected from the users conforming to the set posture feature, which can improve interaction efficiency and the service experience.
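The two-stage selection described above might be sketched as follows (a hypothetical sketch; the 0.8 threshold follows the 80% example above, while the guidance and response-timestamp interfaces are assumed):

    # Hypothetical sketch: filter by posture threshold, then pick the quickest responder.
    def select_by_guided_response(users, set_posture, posture_match, guide, wait_for_responses):
        first_users = [u for u in users if posture_match(u, set_posture) > 0.8]
        if len(first_users) <= 1:
            return first_users[0] if first_users else None
        guide(first_users)                          # e.g. ask them to jump or to speak
        responses = wait_for_responses(first_users) # assumed to return {user: timestamp}
        return min(responses, key=responses.get)    # earliest output of the set info wins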
In the case where there are at least two first users, the target user may alternatively be determined as follows:
In the case where there are at least two first users, the interaction response priority of each of the at least two first users is determined according to their respective user attribute information, and the target user is determined according to the interaction response priorities.
For example, if more than one first user performs the hand-raising action, the interaction response priority is determined among these hand-raising first users according to each first user's attribute information, and the first user with the highest priority is determined as the target user. The user attribute information used as the selection basis may be judged comprehensively in combination with the users' current needs and the actual scenario. For example, in a queuing-for-tickets scenario, the time of arrival at the current place may be used as the user attribute information on which the interaction priority is based: the user who arrived first has the highest interaction response priority and may be determined as the target user. In other service places, the target user may also be determined according to other user attribute information, for example determining the interaction priority according to the user's points at that place, so that the user with the most points has the highest interaction response priority.
In one example, after the interaction response priorities of the at least two first users are determined, the users may further be guided to output set information. If more than one first user still outputs the set information, the one with the highest interaction response priority among them may be determined as the target user.
In the embodiments of the present disclosure, by combining user attribute information, user posture information, and the application scenario to select a target user from multiple detected users, and by setting different interaction response priorities to provide corresponding services for the target user, a suitable user can be chosen as the target user for interaction, improving interaction efficiency and the service experience.
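A priority-based variant could be sketched like this (hypothetical; the attribute fields arrival_time and points merely mirror the ticket-queue and loyalty-points examples above and are not a prescribed schema):

    # Hypothetical sketch: rank hand-raising users by a scenario-dependent priority.
    def select_by_priority(first_users, scenario):
        if scenario == "ticket_queue":
            # earliest arrival gets the highest interaction response priority
            return min(first_users, key=lambda u: u.attributes["arrival_time"])
        # e.g. in a loyalty scenario, the most points wins
        return max(first_users, key=lambda u: u.attributes.get("points", 0))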
After a user is determined as the target user for interaction, confirmation information may be output to that user to inform the user of being selected. For example, the interactive object may be driven to point a finger at the user, or to highlight the user in the camera preview picture, or confirmation information may be output in other ways.
In the embodiments of the present disclosure, by outputting confirmation information to the target user, the user is made aware of currently being in the interaction state, which improves interaction efficiency.
After a user is selected as the target user for interaction, the interactive object responds only, or preferentially, to that target user's instructions until the target user leaves the camera's shooting range.
When no user is detected in the image around the device, it means there is no user around the display device, that is, the device is not currently in a state of interacting with a user. This covers the case where no user has interacted with the device within a set time period before the current moment, i.e., the waiting-for-user state; it also covers the case where a user interacted with the device within a set time period before the current moment, i.e., the device is in the user-left state. For these two different states, the interactive object should be driven to react differently. For example, in the waiting-for-user state, the interactive object may be driven to make a welcoming response in combination with the current environment; in the user-left state, the interactive object may be driven to respond to the user who interacted most recently to end the service.
In some embodiments, in response to no user being detected in the image at the current moment, and no user having been detected or tracked in the image within a set time period before the current moment, for example within 5 seconds, it is determined that the user to interact with the interactive object is empty, and the interactive object on the display device is driven to enter the waiting-for-user state.
In some embodiments, in response to no user being detected in the image at the current moment, and a user having been detected or tracked in the image within a set time period before the current moment, it is determined that the user to interact with the interactive object is the user who interacted most recently.
In the embodiments of the present disclosure, when no user is interacting with the interactive object, by determining whether the device is currently in the waiting-for-user state or the user-left state and driving the interactive object to respond differently, the display state of the interactive object better matches interaction needs and is more targeted.
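The distinction between the waiting-for-user state and the user-left state can be expressed as a small state check (a sketch only; the 5-second window follows the example above, and the detection-history interface is an assumption):

    import time

    # Hypothetical sketch: decide between waiting-for-user and user-left states.
    def service_state(detected_now, history, window_seconds=5.0):
        # history is assumed to hold timestamps of past detections / trackings
        if detected_now:
            return "user_found"
        recently_seen = any(time.time() - t < window_seconds for t in history)
        return "user_left" if recently_seen else "waiting_for_user"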
In some embodiments, the detection result may further include the current service state of the device. In addition to the waiting-for-user state and the user-left state, the current service state may also include a user-found state, and so on. Those skilled in the art should understand that the current service state of the device may also include other states and is not limited to the above.
When a face and/or a body is detected in the image around the device, it means there is a user around the display device, and the state at the moment the user is detected may be determined as the user-found state.
In the user-found state, for the detected user, user history information stored in the display device and/or user history information stored in the cloud may further be acquired to determine whether the user is a returning customer or a VIP customer. The user history information may also include the user's name, gender, age, service records, remarks, and so on. The user history information may include information entered by the user and may also include information recorded by the display device and/or the cloud. By acquiring the user history information, the interactive object can be driven to respond to the user in a more targeted manner.
In one example, the user history information matching the user may be looked up according to the feature information of the detected user's face and/or body.
When the display device is in the user-found state, the interactive object may be driven to respond according to the current service state of the display device, the user attribute information obtained from the image, and the user history information obtained by searching. When a user is detected for the first time, the user history information may be empty, that is, the interactive object is driven according to the current service state, the user attribute information, and the environment information.
When one user is detected in the image around the display device, face and/or body recognition may first be performed on the user through the image to obtain basic user attribute information about the user, for example that the user is female and between 20 and 30 years old; then, based on the user's face and/or body feature information, a search is performed on the display device side and/or in the cloud to find the user history information matching the feature information, such as the user's name and service records. After that, in the user-found state, the interactive object is driven to make a targeted welcoming action toward the female user and present to her the services that can be provided. According to the service items the user has previously used, as recorded in the user history information, the order in which services are presented can be adjusted so that the user can more quickly find the service items of interest.
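Looking up a history record by face and/or body features might be sketched as follows (hypothetical; the embedding-based similarity matching, the 0.7 threshold, and the record layout are assumptions used only to illustrate the lookup):

    import math

    # Hypothetical sketch: match a detected user's features against stored history records.
    def cosine_similarity(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def find_user_history(face_embedding, history_store, threshold=0.7):
        # history_store is assumed to map user_id -> (stored_embedding, history_record)
        best_record, best_score = None, 0.0
        for stored_embedding, record in history_store.values():
            score = cosine_similarity(face_embedding, stored_embedding)
            if score > best_score:
                best_record, best_score = record, score
        return best_record if best_score >= threshold else None  # None: first-time user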
When at least two users are detected in the image around the device, the feature information of the at least two users may first be obtained. The feature information may include at least one of user posture information and user attribute information and corresponds to the user history information, where the user posture information may be obtained by recognizing the users' actions in the image.
Next, the target user among the at least two users is determined according to the obtained feature information of the at least two users. The feature information of each user may be comprehensively evaluated in combination with the actual scenario to determine the target user to interact with.
After the target user is determined, the interactive object displayed on the display device can be driven to respond to the target user.
In some embodiments, in the user-found state, after the interactive object is driven to respond, the user detected in the image around the display device is tracked, for example by tracking the user's facial expressions and/or the user's actions, and whether to put the display device into the service activation state is judged by determining whether the user shows an expression and/or action indicating a desire to interact.
In one example, when tracking the user, designated trigger information may be set, such as blinking, nodding, waving, raising a hand, patting, and other expressions and/or actions commonly used when people greet each other. To distinguish it from what follows, the designated trigger information set here may be called first trigger information. When the first trigger information output by the user is detected, the display device is determined to enter the service activation state, and the interactive object is driven to present the services provided, for example by speech or by text information displayed on the screen.
Currently, common somatosensory interaction requires the user to raise a hand for a period of time to activate a service, and, after a service is selected, to hold the hand still for several seconds to complete the activation. With the interaction method provided by the embodiments of the present disclosure, the user neither needs to raise a hand for a period of time to activate a service nor needs to hold the hand still to complete a selection; by automatically judging the user's designated trigger information, the service can be activated automatically, putting the device into the service activation state, which spares the user from raising a hand and waiting and improves the user experience.
In some embodiments, in the service activation state, designated trigger information may be set, such as specific gesture actions and/or specific voice commands. To distinguish it from the above, the designated trigger information set here may be called second trigger information. When the second trigger information output by the user is detected, the display device is determined to enter the in-service state, and the interactive object is driven to provide the service matching the second trigger information.
In one example, the corresponding service is executed according to the second trigger information output by the user. For example, the services that can be provided to the user include a first service option, a second service option, a third service option, and so on, and corresponding second trigger information can be configured for each service option: for example, the voice "one" may be set as the second trigger information corresponding to the first service option, the voice "two" as the second trigger information corresponding to the second service option, and so on. When the user is detected to output one of these voices, the display device enters the service option corresponding to that second trigger information, and the interactive object is driven to provide the service according to the content configured for the service option.
In the embodiments of the present disclosure, after the display device enters the user-found state, two granularities of recognition are provided. The first-granularity (coarse-grained) recognition is that, when the first trigger information output by the user is detected, the device enters the service activation state and the interactive object is driven to present the services provided; the second-granularity (fine-grained) recognition is that, when the second trigger information output by the user is detected, the device enters the in-service state and the interactive object is driven to provide the corresponding service. Through the above two granularities of recognition, the interaction between the user and the interactive object can be made smoother and more natural.
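The two-granularity trigger flow could be sketched as a small state machine (hypothetical; the trigger sets and the service table below are illustrative placeholders rather than values defined by the disclosure):

    # Hypothetical sketch of the coarse/fine trigger handling after a user is found.
    FIRST_TRIGGERS = {"wave", "nod", "raise_hand", "blink"}              # greeting-like actions
    SECOND_TRIGGERS = {"one": "first_service", "two": "second_service"}  # voice -> service option

    def on_trigger(state, trigger):
        if state == "user_found" and trigger in FIRST_TRIGGERS:
            return "service_activated", "show_service_menu"      # coarse-grained step
        if state == "service_activated" and trigger in SECOND_TRIGGERS:
            return "in_service", SECOND_TRIGGERS[trigger]         # fine-grained step
        return state, None                                        # ignore other input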
With the interaction method provided by the embodiments of the present disclosure, the user does not need to press keys, touch the screen, or provide voice input; simply by standing near the display device, the interactive object displayed on the display device can make a targeted welcoming action and present the service items that can be provided according to the user's needs or interests, improving the user's experience.
In some embodiments, environment information of the display device may be acquired, and the interactive object displayed on the display device may be driven to respond according to the detection result and the environment information.
The environment information of the display device may be acquired through the geographic location of the display device and/or the application scenario of the display device. The environment information may be, for example, the geographic location or Internet Protocol (IP) address of the display device, or the weather, date, and so on of the area where the display device is located. Those skilled in the art should understand that the above environment information is merely an example, and other environment information may also be included.
For example, when the display device is in the waiting-for-user state or the user-left state, the interactive object may be driven to respond according to the current service state and the environment information of the display device. For example, when the display device is in the waiting-for-user state and the environment information includes the time, location, and weather, the interactive object displayed on the display device may be driven to make welcoming actions and gestures, or some interesting movements, and to output the speech "It is now XX o'clock on X month X day, X year, the weather is XX, welcome to XX shopping mall in XX city, I am glad to serve you." In addition to the generic welcoming actions, gestures, and speech, the current time, location, and weather are added, which not only provides more information but also makes the interactive object's reaction better match the interaction needs and more targeted.
By performing user detection on the image around the display device and driving the interactive object displayed on the display device to respond according to the detection result and the environment information of the display device, the interactive object's reaction better matches the interaction needs, and the interaction between the user and the interactive object becomes more realistic and vivid, thereby improving the user experience.
In some embodiments, a matching, predetermined response label may be obtained according to the detection result and the environment information, and the interactive object is then driven to make the corresponding response according to the response label. The present application is not limited in this respect.
The response label may correspond to driving text for one or more of the interactive object's actions, expressions, gestures, and speech. For different detection results and environment information, the corresponding driving text can be obtained according to the determined response label, so that the interactive object can be driven to output one or more of the corresponding actions, expressions, and speech.
For example, if the current service state is the waiting-for-user state and the environment information indicates that the location is Shanghai, the corresponding response label may be: a welcoming action, with the speech "Welcome to Shanghai".
For another example, if the current service state is the user-found state, the environment information indicates that the time is morning, the user attribute information indicates a female user, and the user history record indicates the surname Zhang, the corresponding response label may be: a welcoming action, with the speech "Good morning, Ms. Zhang, welcome, I am glad to serve you".
By configuring corresponding response labels for different combinations of detection results and environment information, and driving the interactive object through the response labels to output one or more of the corresponding actions, expressions, and speech, the interactive object can be driven to respond differently according to different device states and different scenarios, making the interactive object's responses more diverse.
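A configured label table might be organized as follows (hypothetical; the keys and driving-text fields are illustrative, and an unseen combination can fall back to the trained model described next):

    # Hypothetical sketch: map (service state, environment/user facts) to a response label,
    # and from there to driving text for the character's action and speech.
    RESPONSE_LABELS = {
        ("waiting_for_user", "Shanghai"): {"action": "welcome", "speech": "Welcome to Shanghai"},
        ("user_found", "morning_female_zhang"): {
            "action": "welcome",
            "speech": "Good morning, Ms. Zhang, welcome, I am glad to serve you",
        },
    }

    def driving_text(service_state, context_key, fallback_model=None):
        label = RESPONSE_LABELS.get((service_state, context_key))
        if label is not None:
            return label
        # unseen combination: optionally generate driving text with a pre-trained model
        return fallback_model(service_state, context_key) if fallback_model else None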
In some embodiments, the response label may be input into a pre-trained neural network, which outputs the driving text corresponding to the response label, so as to drive the interactive object to output one or more of the corresponding actions, expressions, and speech.
The neural network may be trained with a set of sample response labels, where the sample response labels are annotated with corresponding driving text. After training, for a given response label, the neural network can output the corresponding driving text to drive the interactive object to output one or more of the corresponding actions, expressions, and speech. Compared with directly searching for the corresponding driving text on the display device side or in the cloud, a pre-trained neural network can also generate driving text for response labels for which no driving text has been configured in advance, so as to drive the interactive object to make an appropriate response.
In some embodiments, for high-frequency and important scenarios, optimization may also be performed through manual configuration. That is, for combinations of detection results and environment information that occur frequently, driving text may be manually configured for the corresponding response labels. When such a scenario occurs, the corresponding driving text is automatically invoked to drive the interactive object to respond, making the interactive object's actions and expressions more natural.
In one embodiment, in response to the display device being in the user-found state, position information of the user relative to the interactive object in the display device is obtained according to the user's position in the image, and the orientation of the interactive object is adjusted according to the position information so that the interactive object faces the user.
In some embodiments, the image of the interactive object is captured by a virtual camera. The virtual camera is a virtual software camera applied in 3D software and used to capture images, and the interactive object is displayed on the screen through the 3D image captured by the virtual camera. The user's viewing angle can therefore be understood as the viewing angle of the virtual camera in the 3D software, which brings a problem: the interactive object cannot make eye contact with the user.
To solve the above problem, in at least one embodiment of the present disclosure, while the body orientation of the interactive object is adjusted, the line of sight of the interactive object is also kept aimed at the virtual camera. Since the interactive object faces the user during the interaction and its line of sight stays aimed at the virtual camera, the user has the illusion that the interactive object is looking at them, which can improve the comfort of the user's interaction with the interactive object.
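Turning the character's body toward the user while keeping its gaze on the virtual camera could look like this (hypothetical; the normalized image coordinates, the field-of-view mapping, and the avatar interface are assumptions made for illustration):

    # Hypothetical sketch: rotate the character's body toward the user, keep eyes on the virtual camera.
    def face_user(avatar, user_bbox, image_width, horizontal_fov_deg=60.0):
        # horizontal offset of the user's bounding-box center, in [-0.5, 0.5]
        cx = (user_bbox.left + user_bbox.right) / 2.0
        offset = cx / image_width - 0.5
        yaw = offset * horizontal_fov_deg        # approximate yaw angle toward the user
        avatar.set_body_yaw(yaw)                 # body turns toward the user
        avatar.look_at(avatar.virtual_camera)    # gaze stays locked on the virtual camera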
FIG. 3 shows a schematic structural diagram of an interaction apparatus according to at least one embodiment of the present disclosure. As shown in FIG. 3, the apparatus may include an image acquisition unit 301, a detection unit 302, a user selection unit 303, and a driving unit 304.
The image acquisition unit 301 is configured to acquire an image of the surroundings of a display device captured by a camera, the display device displaying an interactive object through a transparent display screen; the detection unit 302 is configured to detect one or more users involved in the image; the user selection unit 303 is configured to, in response to the detection unit 302 detecting that the image involves at least two users, select a target user from the at least two users according to feature information of the detected at least two users; and the driving unit 304 is configured to, based on a detection result for the target user, drive the interactive object displayed on the transparent display screen of the display device to respond to the target user. The one or more users in the image described herein refer to the one or more objects involved in the process of detecting the image. In the following, "object" and "user" are used interchangeably and, for convenience of description, are collectively referred to as "user".
In some embodiments, the feature information includes user posture information and/or user attribute information.
In some embodiments, the user selection unit 303 is specifically configured to: select the target user from the at least two users according to the degree to which the user posture information of each of the at least two users matches a set posture feature, or according to the degree to which the user attribute information of each of the at least two users matches a set attribute feature.
In some embodiments, the user selection unit 303 is specifically configured to: select, according to the user posture information of each of the at least two users, one or more first users conforming to a set posture feature; in the case where there are at least two first users, cause the driving unit 304 to drive the interactive object to guide each of the at least two first users to output set information; and determine the target user according to the detected order in which the first users output the set information.
In some embodiments, the user selection unit 303 is specifically configured to: select, according to the user posture information of each of the at least two users, one or more first users conforming to the set posture feature; in the case where there are at least two first users, determine an interaction response priority of each of the at least two first users according to their respective user attribute information; and determine the target user according to the interaction response priorities.
In some embodiments, the apparatus further includes a confirmation unit configured to: in response to the user selection unit 303 selecting a target user from the at least two users, cause the driving unit to drive the interactive object to output confirmation information to the target user.
In some embodiments, the apparatus further includes a waiting state unit configured to: in response to the detection unit 302 detecting no user in the image at the current moment, and no user having been detected or tracked in the image within a set time period before the current moment, determine that the user to interact with the interactive object is empty, and cause the display device to enter the waiting-for-user state.
In some embodiments, the apparatus further includes an end state unit configured to: in response to the detection unit 302 detecting no user in the image at the current moment, and a user having been detected or tracked in the image within a set time period before the current moment, determine that the user to interact with the interactive object is the user who interacted most recently.
In some embodiments, the display device displays a reflection of the interactive object through the transparent display screen, or the display device displays a reflection of the interactive object on a bottom plate.
In some embodiments, the interactive object includes a virtual character with a stereoscopic effect.
At least one embodiment of the present disclosure further provides an interaction device. As shown in FIG. 4, the device includes a memory 401 and a processor 402. The memory 401 is used to store instructions executable by the processor; when the instructions are executed, the processor 402 is caused to implement the interaction method of any embodiment of the present disclosure.
At least one embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the processor is caused to implement the interaction method of any embodiment of the present disclosure.
Those skilled in the art should understand that one or more embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, one or more embodiments of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, one or more embodiments of the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The embodiments in the present disclosure are described in a progressive manner; for identical or similar parts between the embodiments, reference may be made to one another, and each embodiment focuses on the differences from the other embodiments. In particular, since the data processing device embodiment is substantially similar to the method embodiment, its description is relatively brief, and reference may be made to the relevant parts of the description of the method embodiment.
Specific embodiments of the present disclosure have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
Embodiments of the subject matter and functional operations in the present disclosure may be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in the present disclosure and their structural equivalents, or in a combination of one or more of them. Embodiments of the subject matter in the present disclosure may be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier to be executed by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode and transmit information to a suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows in the present disclosure may be performed by one or more programmable computers executing one or more computer programs, so as to perform corresponding functions by operating on input data and generating output. The processes and logic flows may also be performed by special-purpose logic circuitry, such as an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), and the apparatus may also be implemented as special-purpose logic circuitry.
Computers suitable for executing a computer program include, for example, general-purpose and/or special-purpose microprocessors, or any other type of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory and/or a random access memory. The essential components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to, one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, to receive data from or transfer data to them, or both. However, a computer does not necessarily have such devices. In addition, a computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name just a few.
Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated into, special-purpose logic circuitry.
Although the present disclosure contains many specific implementation details, these should not be construed as limiting the scope of the present disclosure or of what is claimed, but are mainly used to describe features of some embodiments of the present disclosure. Certain features described in multiple embodiments of the present disclosure may also be implemented in combination in a single embodiment. On the other hand, various features described in a single embodiment may also be implemented separately in multiple embodiments or in any suitable sub-combination. Moreover, although features may act in certain combinations as described above and may even be initially claimed as such, one or more features from a claimed combination may in some cases be removed from that combination, and the claimed combination may be directed to a sub-combination or a variation of a sub-combination.
Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that these operations be performed in the particular order shown or sequentially, or that all illustrated operations be performed, to achieve the desired results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the above embodiments should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, specific embodiments of the subject matter have been described. Other embodiments are within the scope of the appended claims. In some cases, the actions recited in the claims may be performed in a different order and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing may be advantageous.
The above are only some embodiments of the present disclosure and are not intended to limit the present disclosure. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present disclosure shall be included within the scope of the present disclosure.

Claims (22)

  1. An interaction method, the method comprising:
    acquiring an image of the surroundings of a display device captured by a camera, the display device displaying an interactive object through a transparent display screen;
    detecting one or more objects involved in the image;
    in response to detecting that the image involves at least two objects, selecting a target object from the at least two objects according to feature information of the detected at least two objects;
    based on a detection result for the target object, driving the interactive object displayed on the transparent display screen of the display device to respond to the target object.
  2. The method according to claim 1, wherein the feature information comprises object posture information and/or object attribute information.
  3. The method according to claim 2, wherein selecting the target object from the at least two objects according to the feature information of the detected at least two objects comprises:
    selecting the target object from the at least two objects according to a degree to which the object posture information of each of the at least two objects matches a set posture feature, or according to a degree to which the object attribute information of each of the at least two objects matches a set attribute feature.
  4. The method according to claim 2, wherein selecting the target object from the at least two objects according to the feature information of the detected at least two objects comprises:
    selecting, according to the object posture information of each of the at least two objects, one or more first objects conforming to a set posture feature;
    in the case where there are at least two first objects, driving the interactive object to guide each of the at least two first objects to output set information, and determining the target object according to a detected order in which the first objects output the set information.
  5. The method according to claim 2, wherein selecting the target object from the at least two objects according to the feature information of the detected at least two objects comprises:
    selecting, according to the object posture information of each of the at least two objects, one or more first objects conforming to a set posture feature;
    in the case where there are at least two first objects, determining an interaction response priority of each of the at least two first objects according to the respective object attribute information of the at least two first objects, and determining the target object according to the interaction response priorities.
  6. The method according to any one of claims 1 to 5, further comprising:
    after selecting the target object from the at least two objects, driving the interactive object to output confirmation information to the target object.
  7. The method according to any one of claims 1 to 6, further comprising:
    in response to no object being detected in the image at a current moment, and no object having been detected or tracked in the image within a set time period before the current moment, determining that an object to interact with the interactive object is empty, and causing the display device to enter a waiting-for-object state.
  8. The method according to any one of claims 1 to 6, further comprising:
    in response to no object being detected in the image at a current moment, and an object having been detected or tracked in the image within a set time period before the current moment, determining that an object to interact with the interactive object is an object that interacted most recently.
  9. The method according to any one of claims 1 to 8, wherein the display device displays a reflection of the interactive object through the transparent display screen, or the display device displays a reflection of the interactive object on a bottom plate.
  10. The method according to any one of claims 1 to 9, wherein the interactive object comprises a virtual character with a stereoscopic effect.
  11. An interaction apparatus, the apparatus comprising:
    an image acquisition unit configured to acquire an image of the surroundings of a display device captured by a camera, the display device displaying an interactive object through a transparent display screen;
    a detection unit configured to detect one or more objects involved in the image;
    an object selection unit configured to, in response to the detection unit detecting that the image involves at least two objects, select a target object from the at least two objects according to feature information of the detected at least two objects;
    a driving unit configured to, based on a detection result for the target object, drive the interactive object displayed on the transparent display screen of the display device to respond to the target object.
  12. The apparatus according to claim 11, wherein the feature information comprises object posture information and/or object attribute information.
  13. The apparatus according to claim 12, wherein the object selection unit is configured to:
    select the target object from the at least two objects according to a degree to which the object posture information of each of the at least two objects matches a set posture feature, or according to a degree to which the object attribute information of each of the at least two objects matches a set attribute feature.
  14. The apparatus according to claim 12, wherein the object selection unit is configured to:
    select, according to the object posture information of each of the at least two objects, one or more first objects conforming to a set posture feature;
    in the case where there are at least two first objects, cause the driving unit to drive the interactive object to guide each of the at least two first objects to output set information, and determine the target object according to a detected order in which the first objects output the set information.
  15. The apparatus according to claim 12, wherein the object selection unit is configured to:
    select, according to the object posture information of each of the at least two objects, one or more first objects conforming to a set posture feature;
    in the case where there are at least two first objects, determine an interaction response priority of each of the at least two first objects according to the respective object attribute information of the at least two first objects, and determine the target object according to the interaction response priorities.
  16. The apparatus according to any one of claims 11 to 15, wherein the apparatus further comprises a confirmation unit configured to:
    in response to the object selection unit selecting a target object from the at least two objects, cause the driving unit to drive the interactive object to output confirmation information to the target object.
  17. The apparatus according to any one of claims 11 to 16, wherein the apparatus further comprises a waiting state unit configured to:
    in response to the detection unit detecting no object in the image at a current moment, and no object having been detected or tracked in the image within a set time period before the current moment, determine that an object to interact with the interactive object is empty, and cause the display device to enter a waiting-for-object state.
  18. The apparatus according to any one of claims 11 to 16, wherein the apparatus further comprises an end state unit configured to:
    in response to the detection unit detecting no object in the image at a current moment, and an object having been detected or tracked in the image within a set time period before the current moment, determine that an object to interact with the interactive object is an object that interacted most recently.
  19. The apparatus according to any one of claims 11 to 18, wherein the display device displays a reflection of the interactive object through the transparent display screen, or the display device displays a reflection of the interactive object on a bottom plate.
  20. The apparatus according to any one of claims 11 to 19, wherein the interactive object comprises a virtual character with a stereoscopic effect.
  21. An interaction device, the device comprising:
    a processor; and
    a memory for storing instructions executable by the processor,
    wherein the instructions, when executed, cause the processor to implement the interaction method according to any one of claims 1 to 10.
  22. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, causes the processor to implement the interaction method according to any one of claims 1 to 10.
PCT/CN2020/104466 2019-08-28 2020-07-24 交互方法、装置、设备以及存储介质 WO2021036624A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2021556968A JP7224488B2 (ja) 2019-08-28 2020-07-24 インタラクティブ方法、装置、デバイス、及び記憶媒体
KR1020217031185A KR20210131415A (ko) 2019-08-28 2020-07-24 인터렉티브 방법, 장치, 디바이스 및 기록 매체
US17/681,026 US20220179609A1 (en) 2019-08-28 2022-02-25 Interaction method, apparatus and device and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910803899.3 2019-08-28
CN201910803899.3A CN110716634A (zh) 2019-08-28 2019-08-28 交互方法、装置、设备以及显示设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/681,026 Continuation US20220179609A1 (en) 2019-08-28 2022-02-25 Interaction method, apparatus and device and storage medium

Publications (1)

Publication Number Publication Date
WO2021036624A1 true WO2021036624A1 (zh) 2021-03-04

Family

ID=69209574

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/104466 WO2021036624A1 (zh) 2019-08-28 2020-07-24 交互方法、装置、设备以及存储介质

Country Status (6)

Country Link
US (1) US20220179609A1 (zh)
JP (1) JP7224488B2 (zh)
KR (1) KR20210131415A (zh)
CN (1) CN110716634A (zh)
TW (1) TWI775134B (zh)
WO (1) WO2021036624A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110716641B (zh) * 2019-08-28 2021-07-23 北京市商汤科技开发有限公司 交互方法、装置、设备以及存储介质
CN110716634A (zh) * 2019-08-28 2020-01-21 北京市商汤科技开发有限公司 交互方法、装置、设备以及显示设备
CN111443801B (zh) * 2020-03-25 2023-10-13 北京百度网讯科技有限公司 人机交互方法、装置、设备及存储介质
CN111459452B (zh) * 2020-03-31 2023-07-18 北京市商汤科技开发有限公司 交互对象的驱动方法、装置、设备以及存储介质
CN111627097B (zh) * 2020-06-01 2023-12-01 上海商汤智能科技有限公司 一种虚拟景物的展示方法及装置
CN111640197A (zh) * 2020-06-09 2020-09-08 上海商汤智能科技有限公司 一种增强现实ar特效控制方法、装置及设备
CN114466128B (zh) * 2020-11-09 2023-05-12 华为技术有限公司 目标用户追焦拍摄方法、电子设备及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102221886A (zh) * 2010-06-11 2011-10-19 微软公司 通过化身与用户界面交互
EP2919094A1 (en) * 2014-03-10 2015-09-16 BAE Systems PLC Interactive information display
CN106325517A (zh) * 2016-08-29 2017-01-11 袁超 一种基于虚拟现实的目标对象触发方法、***和穿戴设备
CN107728782A (zh) * 2017-09-21 2018-02-23 广州数娱信息科技有限公司 交互方法及交互***、服务器
CN106203364B (zh) * 2016-07-14 2019-05-24 广州帕克西软件开发有限公司 一种3d眼镜互动试戴***及方法
CN110716634A (zh) * 2019-08-28 2020-01-21 北京市商汤科技开发有限公司 交互方法、装置、设备以及显示设备

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6720949B1 (en) * 1997-08-22 2004-04-13 Timothy R. Pryor Man machine interfaces and applications
JP2005189426A (ja) * 2003-12-25 2005-07-14 Nippon Telegr & Teleph Corp <Ntt> 情報表示装置および情報入出力装置
US8555207B2 (en) * 2008-02-27 2013-10-08 Qualcomm Incorporated Enhanced input using recognized gestures
JP6322927B2 (ja) * 2013-08-14 2018-05-16 富士通株式会社 インタラクション装置、インタラクションプログラムおよびインタラクション方法
TW201614423A (en) * 2014-10-03 2016-04-16 Univ Southern Taiwan Sci & Tec Operation system for somatosensory device
CN104978029B (zh) * 2015-06-30 2018-11-23 北京嘿哈科技有限公司 一种屏幕操控方法及装置
KR20170029320A (ko) * 2015-09-07 2017-03-15 엘지전자 주식회사 이동 단말기 및 그 제어방법
WO2017086108A1 (ja) * 2015-11-16 2017-05-26 大日本印刷株式会社 情報提示装置、情報提示方法、プログラム、情報処理装置及び案内ロボット制御システム
JP6768597B2 (ja) * 2017-06-08 2020-10-14 株式会社日立製作所 対話システム、対話システムの制御方法、及び装置
CN107728780B (zh) * 2017-09-18 2021-04-27 北京光年无限科技有限公司 一种基于虚拟机器人的人机交互方法及装置
CN108153425A (zh) * 2018-01-25 2018-06-12 余方 一种基于全息投影的互动娱乐***和方法
CN108780361A (zh) * 2018-02-05 2018-11-09 深圳前海达闼云端智能科技有限公司 人机交互方法、装置、机器人及计算机可读存储介质
CN108470205A (zh) * 2018-02-11 2018-08-31 北京光年无限科技有限公司 基于虚拟人的头部交互方法及***
CN108415561A (zh) * 2018-02-11 2018-08-17 北京光年无限科技有限公司 基于虚拟人的手势交互方法及***
CN108363492B (zh) * 2018-03-09 2021-06-25 南京阿凡达机器人科技有限公司 一种人机交互方法及交互机器人
CN108682202A (zh) * 2018-04-27 2018-10-19 伍伟权 一种文科用全息投影教学设备
CN109522790A (zh) * 2018-10-08 2019-03-26 百度在线网络技术(北京)有限公司 人体属性识别方法、装置、存储介质及电子设备
CN109739350A (zh) * 2018-12-24 2019-05-10 武汉西山艺创文化有限公司 基于透明液晶显示屏的ai智能助理设备及其交互方法
CN110119197A (zh) * 2019-01-08 2019-08-13 佛山市磁眼科技有限公司 一种全息互动***

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102221886A (zh) * 2010-06-11 2011-10-19 微软公司 通过化身与用户界面交互
EP2919094A1 (en) * 2014-03-10 2015-09-16 BAE Systems PLC Interactive information display
CN106203364B (zh) * 2016-07-14 2019-05-24 广州帕克西软件开发有限公司 一种3d眼镜互动试戴***及方法
CN106325517A (zh) * 2016-08-29 2017-01-11 袁超 一种基于虚拟现实的目标对象触发方法、***和穿戴设备
CN107728782A (zh) * 2017-09-21 2018-02-23 广州数娱信息科技有限公司 交互方法及交互***、服务器
CN110716634A (zh) * 2019-08-28 2020-01-21 北京市商汤科技开发有限公司 交互方法、装置、设备以及显示设备

Also Published As

Publication number Publication date
KR20210131415A (ko) 2021-11-02
JP7224488B2 (ja) 2023-02-17
US20220179609A1 (en) 2022-06-09
TWI775134B (zh) 2022-08-21
TW202109246A (zh) 2021-03-01
JP2022526772A (ja) 2022-05-26
CN110716634A (zh) 2020-01-21

Similar Documents

Publication Publication Date Title
WO2021036624A1 (zh) 交互方法、装置、设备以及存储介质
JP7100092B2 (ja) ワードフロー注釈
US10817760B2 (en) Associating semantic identifiers with objects
WO2021036622A1 (zh) 交互方法、装置、设备以及存储介质
CN109635621B (zh) 用于第一人称视角中基于深度学习识别手势的***和方法
KR102257181B1 (ko) 감각 안경류
EP2877254B1 (en) Method and apparatus for controlling augmented reality
US9280972B2 (en) Speech to text conversion
EP2912659B1 (en) Augmenting speech recognition with depth imaging
KR101832693B1 (ko) 직관적 컴퓨팅 방법들 및 시스템들
JP2019197499A (ja) プログラム、記録媒体、拡張現実感提示装置及び拡張現実感提示方法
CN109815409A (zh) 一种信息的推送方法、装置、穿戴设备及存储介质
US20220012283A1 (en) Capturing Objects in an Unstructured Video Stream
JP2022531055A (ja) インタラクティブ対象の駆動方法、装置、デバイス、及び記録媒体
US20150123901A1 (en) Gesture disambiguation using orientation information
KR20220111716A (ko) 디바이스 로컬리제이션을 위한 디바이스 및 방법

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20859149

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021556968

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20217031185

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20859149

Country of ref document: EP

Kind code of ref document: A1