CN102375541B - Translating user movement into multiple object responses - Google Patents

Translating user movement into multiple object responses

Info

Publication number
CN102375541B
Authority
CN
China
Prior art keywords
user
response
onscreen
movement data
application
Prior art date
Legal status
Active
Application number
CN201110280552.9A
Other languages
Chinese (zh)
Other versions
CN102375541A
Inventor
O. O. G. Santos
M. Haigh
C. Vuchetich
B. Hindle
D. A. Bennett
Current Assignee
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date
Filing date
Publication date
Priority claimed from US 12/859,995 (US9075434B2)
Application filed by Microsoft Technology Licensing LLC
Publication of CN102375541A
Application granted
Publication of CN102375541B


Abstract

The present invention relates to a method and system for translating user movement into multiple object responses. A system is provided that, based on user interaction with an application executing on a computing device, translates user movement into multiple object responses of an onscreen object. User movement data are received from one or more users at a capture device. The user movement data correspond to user interaction with an onscreen object presented by the application. The onscreen object corresponds to an object other than an onscreen representation of the user displayed by the computing device. The user movement data are automatically translated into multiple object responses of the onscreen object, and the multiple object responses are simultaneously displayed to the user.

Description

Translating user movement into multiple object responses
Technical field
The present invention relates to a method and system for translating user movement into multiple object responses.
Background
In the past, computing applications such as computer games and multimedia applications used controllers, remote controls, keyboards, mice, and the like to allow users to control game characters directly. More recently, computer games and multimedia applications have begun to use cameras and software gesture recognition to provide a human-computer interface ("HCI"). With HCI, user interaction in the form of user gestures is detected, interpreted, and used to control game characters.
Summary of the invention
A technology is disclosed that enhances user interaction with an application by allowing users to interact with various onscreen objects that are traditionally non-interactive and static. For example, user movement may be used to alter, control, or move objects other than the onscreen representation of the user. User movement related to the user's intent to interact with the application is captured, and the user movement is translated into multiple object responses of an onscreen object. The multiple object responses of the onscreen object provide an enhanced and improved experience for users interacting with the application, without changing the outcome of the user's typical or other interaction with the application.
In one embodiment, a method is provided for translating user movement into multiple object responses of an onscreen object based on user interaction with an application executing on a computing device. User movement data are received from one or more users via a capture device. The user movement data correspond to user interaction with an onscreen object presented by the application. The onscreen object corresponds to an object other than an onscreen representation of the user displayed by the computing device. The user movement data are automatically translated into multiple object responses of the onscreen object, and the multiple object responses of the onscreen object are simultaneously displayed to the user.
This summary is provided to introduce, in simplified form, a selection of concepts that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to help determine the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
Brief description of the drawings
Fig. 1 shows one embodiment of a tracking system with a user playing a game.
Fig. 2 shows one embodiment of a capture device that may be used as part of the tracking system.
Fig. 3 shows an embodiment of a computing device that may be used to track motion and update an application based on the tracked motion.
Fig. 4 shows another embodiment of a computing device that may be used to track motion and update an application based on the tracked motion.
Fig. 5 depicts a flowchart of one embodiment of a process for performing the operations of the disclosed technology.
Fig. 6 depicts a flowchart of one embodiment of a process for capturing user movement data in accordance with the disclosed technology.
Fig. 7 shows an example of a skeletal model or mapping representing a scanned human target.
Fig. 8 provides further details of an exemplary embodiment of the gesture recognition engine shown in Fig. 2.
Fig. 9 depicts a flowchart of one embodiment of a process for determining whether user movement data match a gesture, in accordance with the disclosed technology.
Figs. 10-14 show various user interface screens depicting user interaction with an application executing on a computing device, in accordance with the disclosed technology.
Detailed description of the invention
A technology is disclosed by which user interaction with an application executing on a computing device is enhanced, enabling a user to interact with an onscreen object that is not an onscreen representation of the user. A capture device captures user movement data relating to the user's intent to interact with the application. A computing device is connected to the capture device and receives information relating to the user movement data from the capture device to control various aspects of the application. The application may include, for example, a video game application or a movie application executing on the computing device. One or more users interact with the onscreen objects depicted by the application via a user interface in the computing device. In one embodiment, the onscreen objects correspond to non-traditional gameplay elements, such as animate or inanimate objects displayed during a non-interactive or static experience presented by the application, such as a static movie sequence, a story sequence, a film sequence, or an animation.
In one embodiment, the computing device translates the user movement data into multiple object responses of an onscreen object. The user movement data may be received from one or more users interacting with the application. The multiple object responses of the onscreen object may include a motion response, an audio response, or a visual response of the onscreen object. The object responses may be simultaneously displayed to the user via the user interface in the computing device.
Fig. 1 shows one embodiment of a target recognition, analysis, and tracking system 10 (hereinafter generally referred to as a tracking system) for performing the operations of the disclosed technology. The target recognition, analysis, and tracking system 10 may be used to recognize, analyze, and/or track a human target such as user 18. As shown in Fig. 1, the tracking system 10 may include a computing device 12. The computing device 12 may be a computer, a gaming system, a console, or the like. According to one embodiment, the computing device 12 may include hardware components and/or software components such that it can be used to execute applications such as gaming applications and non-gaming applications. In one embodiment, the computing device 12 may include a processor, such as a standardized processor, a specialized processor, or a microprocessor, that executes instructions stored in a processor-readable storage device for performing the processes described herein.
As shown in Fig. 1, the tracking system 10 may also include a capture device 20. The capture device 20 may be, for example, a camera that can be used to visually monitor one or more users, such as user 18, so that gestures performed by the one or more users can be captured, analyzed, and tracked in order to control various aspects of an application executing on the computing device 12.
According to one embodiment, the tracking system 10 may be connected to an audiovisual device 16, such as a television, a monitor, or a high-definition television (HDTV), that provides game or application visuals and/or audio to a user such as user 18. For example, the computing device 12 may include a video adapter such as a graphics card and/or an audio adapter such as a sound card that provide audiovisual signals associated with a gaming application, a non-gaming application, or the like. The audiovisual device 16 may receive the audiovisual signals from the computing device 12 and may then output the game or application visuals and/or audio associated with those signals to user 18. According to one embodiment, the audiovisual device 16 may be connected to the computing device 12 via, for example, an S-Video cable, a coaxial cable, an HDMI cable, a DVI cable, or a VGA cable.
The target recognition, analysis, and tracking system 10 may be used to recognize, analyze, and/or track one or more human targets such as user 18. For example, the capture device 20 may be used to track user 18 so that the movements of user 18 can be interpreted as controls that can be used to affect the application or operating system being executed by the computing device 12.
Fig. 2 shows one embodiment of the capture device 20 and computing device 12, which may be used in the target recognition, analysis, and tracking system 10 to recognize human and non-human targets in a capture region, uniquely identify them in three-dimensional space, and track them. According to one embodiment, the capture device 20 may be configured to capture video with depth information, including a depth image that includes depth values, via any suitable technique, including, for example, time-of-flight, structured light, or stereo imaging. According to one embodiment, the capture device 20 may organize the calculated depth information into "Z layers," or layers perpendicular to a Z axis extending from the depth camera along its line of sight.
As shown in Fig. 2, the capture device 20 may include an image camera component 32. According to one embodiment, the image camera component 32 may be a depth camera that can capture a depth image of a scene. The depth image may include a two-dimensional (2-D) pixel area of the captured scene, where each pixel in the 2-D pixel area may represent a depth value, such as the distance, in centimeters, millimeters, or the like, of an object in the captured scene from the camera.
As shown in Fig. 2, the image camera component 32 may include an IR light component 34, a three-dimensional (3-D) camera 36, and an RGB camera 38 that can be used to capture the depth image of a capture region. For example, in time-of-flight analysis, the IR light component 34 of the capture device 20 may emit infrared light onto the capture region and may then use sensors, such as the 3-D camera 36 and/or the RGB camera 38, to detect the backscattered light from the surfaces of one or more targets and objects in the capture region. In some embodiments, pulsed infrared light may be used such that the time between an outgoing light pulse and a corresponding incoming light pulse can be measured and used to determine the physical distance from the capture device 20 to a particular location on a target or object in the capture region. Additionally, the phase of the outgoing light wave may be compared to the phase of the incoming light wave to determine a phase shift. The phase shift may then be used to determine the physical distance from the capture device to a particular location on a target or object.
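The two time-of-flight measurements described above reduce to simple formulas. The following is a minimal sketch in Python (the patent does not specify an implementation; the function names and the modulation-frequency parameter are assumptions for illustration):

```python
import math

C = 299_792_458.0  # speed of light, m/s

def distance_from_pulse(round_trip_s: float) -> float:
    """Pulsed time of flight: the light travels out and back, so the
    one-way distance is half the round trip."""
    return C * round_trip_s / 2.0

def distance_from_phase(phase_shift_rad: float, modulation_hz: float) -> float:
    """Phase-based time of flight: the shift between the outgoing and
    incoming wave phases encodes the round-trip distance, unambiguous
    only within half the modulation wavelength."""
    wavelength = C / modulation_hz
    return (phase_shift_rad / (2.0 * math.pi)) * wavelength / 2.0

print(distance_from_pulse(20e-9))  # a 20 ns round trip is roughly 3 m
```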
According to another embodiment, time-of-flight analysis may be used to indirectly determine the physical distance from the capture device 20 to a particular location on a target or object by analyzing the intensity of the reflected beam of light over time via various techniques, including, for example, shuttered light pulse imaging.
In another example, the capture device 20 may use structured light to capture depth information. In such an analysis, patterned light (that is, light displayed as a known pattern such as a grid pattern or a stripe pattern) may be projected onto the capture region via, for example, the IR light component 34. Upon striking the surface of one or more targets or objects in the capture region, the pattern may deform in response. Such a deformation of the pattern may be captured by, for example, the 3-D camera 36 and/or the RGB camera 38 and may then be analyzed to determine the physical distance from the capture device to a particular location on a target or object.
According to one embodiment, the capture device 20 may include two or more physically separated cameras that can view the capture region from different angles to obtain visual stereo data that can be resolved to generate depth information. Other types of depth image sensors can also be used to create a depth image.
The capture device 20 may further include a microphone 40. The microphone 40 may include a transducer or sensor that receives sound and converts it into an electrical signal. According to one embodiment, the microphone 40 may be used to reduce feedback between the capture device 20 and the computing device 12 in the target recognition, analysis, and tracking system 10. Additionally, the microphone 40 may be used to receive audio signals that may also be provided by the user to control applications, such as gaming applications and non-gaming applications, that may be executed by the computing device 12.
In one embodiment, the capture device 20 may further include a processor 42 that may be in operative communication with the image camera component 32. The processor 42 may include a standardized processor, a specialized processor, a microprocessor, or the like that may execute instructions, which may include instructions for storing profiles, receiving a depth image, determining whether a suitable target is included in the depth image, converting a suitable target into a skeletal representation or model of the target, or any other suitable instructions.
The capture device 20 may further include a memory component 44 that may store instructions that may be executed by the processor 42, images or frames of images captured by the 3-D camera or RGB camera, user profiles, or any other suitable information or images. According to one example, the memory component 44 may include random access memory (RAM), read-only memory (ROM), cache, flash memory, a hard disk, or any other suitable storage component. As shown in Fig. 2, the memory component 44 may be a separate component in communication with the image capture component 32 and the processor 42. In another embodiment, the memory component 44 may be integrated into the processor 42 and/or the image capture component 32. In one embodiment, some or all of the components 32, 34, 36, 38, 40, 42, and 44 of the capture device 20 shown in Fig. 2 are housed in a single housing.
The capture device 20 may communicate with the computing device 12 via a communication link 46. The communication link 46 may be a wired connection, including, for example, a USB connection, a FireWire connection, or an Ethernet cable connection, and/or a wireless connection such as a wireless 802.11b, 802.11g, 802.11a, or 802.11n connection. The computing device 12 may provide a clock to the capture device 20 via the communication link 46 that may be used to determine, for example, when to capture a scene.
The capture device 20 may provide the depth information and images captured by, for example, the 3-D camera 36 and/or the RGB camera 38, including a skeletal model that may be generated by the capture device 20, to the computing device 12 via the communication link 46. The computing device 12 may then use the skeletal model, depth information, and captured images, for example, to create a virtual screen and to control an application such as a game or word processor.
The computing device 12 includes a gestures library 192, structure data 194, and a gesture recognition engine 190. The gestures library 192 may include a collection of gesture filters, each comprising information describing or defining a motion or gesture that may be performed by the skeletal model (as the user moves). The structure data 194 includes structural information about objects that may be tracked. For example, a skeletal model of a human may be stored to help understand the movements of the user and recognize body parts. Structural information about inanimate objects may also be stored to help recognize those objects and help understand movement.
The gesture recognition engine 190 may compare the data captured by the cameras 36, 38 and the capture device 20, in the form of the skeletal model and the movements associated with it, to the gesture filters in the gestures library 192 to identify when a user (as represented by the skeletal model) has performed one or more gestures. The computing device 12 may use the gestures library 192 to interpret movements of the skeletal model and to control an application based on those movements. More information about the gesture recognition engine 190 can be found in U.S. Patent Application 12/422,661, "Gesture Recognition System Architecture," filed April 13, 2009, incorporated herein by reference in its entirety. More information about recognizing gestures can be found in U.S. Patent Application 12/391,150, "Standard Gestures," filed February 23, 2009, and U.S. Patent Application 12/474,655, "Gesture Tool," filed May 29, 2009, both of which are incorporated herein by reference in their entirety. More information about motion detection and tracking can be found in U.S. Patent Application 12/641,788, "Motion Detection Using Depth Images," filed December 18, 2009, and U.S. Patent Application 12/475,308, "Device for Identifying and Tracking Multiple Humans over Time," both of which are incorporated herein by reference in their entirety.
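The comparison the engine performs can be pictured roughly as follows. This is a minimal sketch under assumed types — SkeletalFrame and GestureFilter are hypothetical names, not the engine's actual interfaces:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class SkeletalFrame:
    """One tracked frame of the skeletal model: joint name -> (x, y, z)."""
    joints: Dict[str, Tuple[float, float, float]]
    timestamp: float

@dataclass
class GestureFilter:
    """Stand-in for a gesture filter: a named gesture plus a scoring
    function that grades a window of skeletal frames against it (0..1)."""
    name: str
    score: Callable[[List[SkeletalFrame]], float]

def recognize(frames: List[SkeletalFrame],
              library: List[GestureFilter],
              threshold: float = 0.8) -> List[str]:
    """Compare tracked skeletal movement against every filter in the
    gestures library and return the names of the gestures that match."""
    return [f.name for f in library if f.score(frames) >= threshold]
```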
In one embodiment, the computing device 12 may include an application 202. The application 202 may include a video game application, a movie application, a shopping application, a browsing application, or another application executing on the computing device 12. The application 202 allows user interaction with one or more onscreen objects presented by the application 202. The onscreen objects may correspond to non-traditional gameplay elements presented by the application 202, such as animate or inanimate objects displayed during a non-interactive or static experience, such as a static movie sequence, a story sequence, a film sequence, or an animation. The application 202 may include one or more scenes. A scene in the application 202 may include, for example, background information that develops a storyline between periods of gameplay. For example, a scene in the application 202 may include a static movie sequence, story sequence, film sequence, or animation presented by the application 202 that depicts one or more animate or inanimate onscreen objects. In one example, an onscreen object may correspond to an object other than an onscreen character representation of the user. In one embodiment, the user may interact with the application 202 via the user interface in the user's computing device 12.
The application 202 may include a translation module 196, a display module 198, and application control logic 200. The translation module 196, the display module 198, and the application control logic 200 may be implemented as software modules to perform one or more operations of the disclosed technology. The application control logic 200 may include a set of pre-programmed logic and/or rules related to the execution of the application 202. In one embodiment, the rules specified by the application control logic 200 may define the manner in which the onscreen objects presented by the application 202 may be controlled, based on user interaction with the application 202. In one example, the rules specified by the application control logic 200 may define the type of action to be performed on an onscreen object based on user interaction with the application 202. As will be discussed in detail below, a user may perform a range of movements to interact with the one or more onscreen objects depicted by the application, and different user movements may result in similar or different object actions by the onscreen objects.
In one embodiment, the gesture recognition engine 190 may provide the skeletal model of the human target and information about the movement of the human target to the application control logic 200. In one embodiment, the information about the movement of the human target may include, for example, the position, direction, acceleration, and curvature of each body part associated with the human target. The application control logic 200 may utilize this information to define the type of action to be performed on an onscreen object displayed in the application 202. In one embodiment, the rules specified by the application control logic 200 may be implemented as a data structure that correlates a set of gestures identified by the gesture recognition engine 190 with a set of object responses of an onscreen object.
In another embodiment, the application control logic 200 may also define one or more environmental contexts in which the onscreen objects of the application 202 may be depicted. Thus, the type of action to be performed on an onscreen object may be based on both the information related to the user movement data (the movement of the human target) and the environmental context in which the onscreen object is depicted. For example, if an animate onscreen object such as a shark is depicted in the application, the environmental context may correspond to a first context in which, for example, a left-to-right movement of the user's hips results in an action that makes the shark's fin wag back and forth, whereas in a second context depicting an inanimate object such as a truck, the same movement may result in an action of performing a turn on the truck.
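One way to picture such a gesture-to-response data structure, with the environmental context folded into the key, is sketched below. Every gesture, context, and response name here is an illustrative assumption, not anything the patent specifies:

```python
# Keys pair a recognized gesture with an environmental context; values list
# the object responses to trigger for that pair.
RESPONSE_RULES = {
    ("hip_sway", "shark_scene"): ["wag_fin"],
    ("hip_sway", "truck_scene"): ["perform_turn"],
    ("arm_wave", "shark_scene"): ["swim_forward", "play_splash_audio"],
}

def responses_for(gesture: str, context: str) -> list:
    """Look up the object responses for a gesture in the current context;
    an unmapped pair yields no response."""
    return RESPONSE_RULES.get((gesture, context), [])

print(responses_for("hip_sway", "shark_scene"))   # ['wag_fin']
print(responses_for("hip_sway", "truck_scene"))   # ['perform_turn']
```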
The translation module 196 may include executable instructions to perform a specific action on a specific onscreen object based on the rules defined in the application control logic 200. In one embodiment, the translation module 196 may perform the specific action on the onscreen object by automatically translating the user movement data into multiple object responses of the onscreen object. In one embodiment, the translation module 196 may perform this translation by accessing the corresponding responses of the onscreen object from the data structure defined by the application control logic 200. In one embodiment, a corresponding response of the onscreen object may include a motion response, a visual response, or an audio response of the onscreen object. The translation module 196 may then access the code that implements the object response. In one embodiment, the object response may be implemented by mapping the skeletal model representation of the user to an object model representation of the onscreen object. For example, the translation module 196 may translate the user movement data into a corresponding motion response of the onscreen object by mapping the movement of a body part obtained from the skeletal model representation of the human target to a corresponding movement of a portion of the onscreen object in the object model representation of the onscreen object. Similarly, the translation module 196 may utilize the object model representation of the onscreen object to translate the user movement data into a corresponding audio response or visual response of the onscreen object.
In one embodiment, when translating a specific movement into a corresponding translated motion performed by the onscreen object, the translation module 196 may utilize metadata related to the movement of the human target, such as the velocity of a body part, the acceleration of a body part, or the distance traveled by a body part. For example, a higher-velocity movement of a specific body part of the human target may result in the user movement data being translated into a motion response of the onscreen object in which the onscreen object moves faster in response to the higher-velocity movement. Similarly, the translation module 196 may utilize the metadata related to the movement of the human target (such as the velocity of a body part, the acceleration of a body part, or the distance traveled by a body part) to trigger an audio response or visual response of the onscreen object.
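A rough sketch of this mapping-and-scaling step follows. The part_map dictionary, the move_part method, and the gain scheme are assumptions made for illustration, not the translation module's actual code:

```python
def translate_motion(joint_deltas, joint_speeds, part_map, object_model,
                     base_gain=1.0):
    """Map each tracked body-part displacement onto the corresponding part
    of the onscreen object's model, scaling the response by how fast the
    user moved (faster user movement -> faster object response).

    joint_deltas: dict, skeletal joint -> (dx, dy, dz) displacement
    joint_speeds: dict, skeletal joint -> measured speed factor
    part_map:     dict, skeletal joint -> onscreen object part (assumed)
    object_model: object exposing move_part(part, dx, dy, dz) (assumed)
    """
    for joint, part in part_map.items():
        dx, dy, dz = joint_deltas.get(joint, (0.0, 0.0, 0.0))
        gain = base_gain * joint_speeds.get(joint, 1.0)
        object_model.move_part(part, dx * gain, dy * gain, dz * gain)
```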
The display module 198 controls what is displayed to the user and may include executable instructions for displaying the multiple object responses of the onscreen object in the user interface of the computing device 12, based on information received from the translation module 196.
Fig. 3 shows an example of a computing device 100 that may be used to implement the computing device 12 of Figs. 1-2. The computing device 100 of Fig. 3 may be a multimedia console 100, such as a gaming console. As shown in Fig. 3, the multimedia console 100 has a central processing unit (CPU) 200 and a memory controller 202 that facilitates processor access to various types of memory, including a flash read-only memory (ROM) 204, a random access memory (RAM) 206, a hard disk drive 208, and a portable media drive 106. In one implementation, the CPU 200 includes a level 1 cache 210 and a level 2 cache 212 to temporarily store data and hence reduce the number of memory access cycles made to the hard disk drive 208, thereby improving processing speed and throughput.
The CPU 200, the memory controller 202, and various memory devices are interconnected via one or more buses (not shown). The details of the bus used in this implementation are not particularly relevant to understanding the subject matter described herein. However, it should be understood that such a bus may include one or more of serial and parallel buses, a memory bus, a peripheral bus, and a processor or local bus using any of a variety of bus architectures. As an example, such architectures may include an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus, also known as a mezzanine bus.
In one implementation, the CPU 200, the memory controller 202, the ROM 204, and the RAM 206 are integrated into a common module 214. In this implementation, the ROM 204 is configured as a flash ROM that is connected to the memory controller 202 via a PCI bus and a ROM bus (neither of which is shown). The RAM 206 is configured as multiple Double Data Rate Synchronous Dynamic RAM (DDR SDRAM) modules that are independently controlled by the memory controller 202 via separate buses (not shown). The hard disk drive 208 and the portable media drive 106 are shown connected to the memory controller 202 via the PCI bus and an AT Attachment (ATA) bus 216. However, in other implementations, dedicated data bus structures of different types may alternatively be applied.
A graphics processing unit 220 and a video encoder 222 form a video processing pipeline for high-speed, high-resolution (e.g., high-definition) graphics processing. Data are carried from the graphics processing unit 220 to the video encoder 222 via a digital video bus (not shown). An audio processing unit 224 and an audio codec (coder/decoder) 226 form a corresponding audio processing pipeline for multi-channel audio processing of various digital audio formats. Audio data are carried between the audio processing unit 224 and the audio codec 226 via a communication link (not shown). The video and audio processing pipelines output data to an A/V (audio/video) port 228 for transmission to a television or other display. In the illustrated implementation, the video and audio processing components 220-228 are mounted on the module 214.
Fig. 3 shows the module 214 including a USB host controller 230 and a network interface 232. The USB host controller 230 is shown in communication with the CPU 200 and the memory controller 202 via a bus (e.g., a PCI bus) and serves as a host for peripheral controllers 104(1)-104(4). The network interface 232 provides access to a network (e.g., the Internet, a home network, etc.) and may be any of a wide variety of wired or wireless interface components, including an Ethernet card, a modem, a wireless access card, a Bluetooth module, a cable modem, and the like.
In the implementation depicted in Fig. 3, the console 102 includes a controller support subassembly 240 for supporting four controllers 104(1)-104(4). The controller support subassembly 240 includes any hardware and software components needed to support wired and wireless operation with external control devices, such as, for example, media and game controllers. A front panel I/O subassembly 242 supports the multiple functionalities of a power button 112, an eject button 114, and any LEDs (light emitting diodes) or other indicators exposed on the outer surface of the console 102. The subassemblies 240 and 242 communicate with the module 214 via one or more cable assemblies 244. In other implementations, the console 102 can include additional controller subassemblies. The illustrated implementation also shows an optical I/O interface 235 configured to send and receive signals that can be communicated to the module 214.
MUs 140(1) and 140(2) are shown as being connectable to MU ports "A" 130(1) and "B" 130(2), respectively. Additional MUs (e.g., MUs 140(3)-140(6)) are shown as being connectable to the controllers 104(1) and 104(3), i.e., two MUs per controller. The controllers 104(2) and 104(4) can also be configured to receive MUs (not shown). Each MU 140 offers additional storage on which games, game parameters, and other data may be stored. In some implementations, the other data can include any of a digital game component, an executable gaming application, an instruction set for expanding a gaming application, and a media file. When inserted into the console 102 or a controller, the MU 140 can be accessed by the memory controller 202. A system power supply module 250 provides power to the components of the gaming system 100. A fan 252 cools the circuitry within the console 102.
An application 260 comprising machine instructions is stored on the hard disk drive 208. When the console 102 is powered on, various portions of the application 260 are loaded into the RAM 206 and/or the caches 210 and 212 for execution on the CPU 200, the application 260 being one such example. Various applications can be stored on the hard disk drive 208 for execution on the CPU 200.
The gaming and media system 100 may be operated as a standalone system by simply connecting the system 100 to a monitor 150 (Fig. 1), a television, a video projector, or another display device. In this standalone mode, the gaming and media system 100 enables one or more players to play games or enjoy digital media, e.g., by watching movies or listening to music. However, with the integration of broadband connectivity made available through the network interface 232, the gaming and media system 100 may further be operated as a participant in a larger network gaming community.
Fig. 4 illustrates a general-purpose computing device for implementing the operations of the disclosed technology. With reference to Fig. 4, an example system for implementing the disclosed technology includes a general-purpose computing device in the form of a computer 310. Components of the computer 310 may include, but are not limited to, a processing unit 320, a system memory 330, and a system bus 321 that couples various system components, including the system memory, to the processing unit 320. The system bus 321 may be any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus, also known as a mezzanine bus.
The computer 310 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer 310 and include both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by the computer 310. Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
The system memory 330 includes computer storage media in the form of volatile and/or nonvolatile memory such as read-only memory (ROM) 331 and random access memory (RAM) 332. A basic input/output system 333 (BIOS), containing the basic routines that help to transfer information between elements within the computer 310, such as during start-up, is typically stored in ROM 331. RAM 332 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by the processing unit 320. By way of example, and not limitation, Fig. 4 illustrates an operating system 334, application programs 335, other program modules 336, and program data 337.
The computer 310 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, Fig. 4 illustrates a hard disk drive 340 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 351 that reads from or writes to a removable, nonvolatile magnetic disk 352, and an optical disc drive 355 that reads from or writes to a removable, nonvolatile optical disc 356, such as a CD-ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile discs, digital video tape, solid-state RAM, solid-state ROM, and the like. The hard disk drive 341 is typically connected to the system bus 321 through a non-removable memory interface such as interface 340, and the magnetic disk drive 351 and optical disc drive 355 are typically connected to the system bus 321 by a removable memory interface, such as interface 350.
The drives and their associated computer storage media discussed above and illustrated in Fig. 4 provide storage of computer-readable instructions, data structures, program modules, and other data for the computer 310. In Fig. 4, for example, the hard disk drive 341 is illustrated as storing an operating system 344, application programs 345, other program modules 346, and program data 347. Note that these components can either be the same as or different from the operating system 334, application programs 335, other program modules 336, and program data 337. The operating system 344, application programs 345, other program modules 346, and program data 347 are given different numbers here to illustrate that, at a minimum, they are different copies. A user may enter commands and information into the computer 310 through input devices such as a keyboard 362 and a pointing device 361, commonly referred to as a mouse, trackball, or touch pad. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 320 through a user input interface 360 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port, or universal serial bus (USB). A monitor 391 or other type of display device is also connected to the system bus 321 via an interface, such as a video interface 390. In addition to the monitor 391, computers may also include other peripheral output devices, such as speakers 397 and a printer 396, which may be connected through an output peripheral interface.
The computer 310 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 380. The remote computer 380 may be a personal computer, a server, a router, a network PC, a peer device, or another common network node, and typically includes many or all of the elements described above relative to the computer 310, although only a memory storage device 381 has been illustrated in Fig. 4. The logical connections depicted in Fig. 4 include a local area network (LAN) 371 and a wide area network (WAN) 373, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet.
When used in a LAN networking environment, the computer 310 is connected to the LAN 371 through a network interface or adapter 370. When used in a WAN networking environment, the computer 310 typically includes a modem 372 or other means for establishing communications over the WAN 373, such as the Internet. The modem 372, which may be internal or external, may be connected to the system bus 321 via the user input interface 360 or another appropriate mechanism. In a networked environment, program modules depicted relative to the computer 310, or portions thereof, may be stored in a remote memory storage device. By way of example, and not limitation, Fig. 4 illustrates remote application programs 385 as residing on the memory device 381. It will be appreciated that the network connections shown are exemplary, and other means of establishing a communications link between the computers may be used.
The hardware devices of Figs. 1-4 can be used to implement a system that allows a user to interact with an object other than an onscreen representation of the user. Fig. 5 depicts a flowchart of one embodiment of a process for performing the operations of the disclosed technology. In one embodiment, the steps of Fig. 5 may be performed by software modules in the translation module 196, the display module 198, and/or the application control logic 200 in the computing device 12 of the system 10. In step 500, the identity of a user of the computing device 12 is first received. In one example, step 500 may use facial recognition to correlate the user's face from a received visual image with a reference visual image to determine the user's identity. In another example, determining the user's identity may include receiving input from the user identifying their identity. For example, user profiles may be stored by the computing device 12, and the user may make an on-screen selection to identify themselves as corresponding to a user profile. After the user identification has been successfully made, the user's selection of an application, such as the application 202, is received in step 501. In step 501, the user may be prompted to select the application 202 via the user interface in the user's computing device 12. As discussed above, the application 202 may include, for example, a video game application or a movie application executing on the computing device 12. In step 502, a scene of the application 202 is displayed to the user.
In step 506, a check is made to determine whether user movement is detected. If user movement is detected, user movement data are received in step 512. In one embodiment, the processor 42 in the capture device 20 may detect and receive information related to the user's movement. The process of detecting and capturing user movement data by the capture device 20 is discussed in Fig. 6. If no user movement data are detected, a check is made in step 508 to determine whether the user wishes to interact with additional scenes of the application 202. If the user wishes to interact with one or more additional scenes, the next scene of the application 202 is obtained in step 504 and displayed to the user in step 502. If the user does not wish to interact with any additional scenes of the application 202, the application 202 exits in step 510.
After the user movement has been captured in step 512, a check is made in step 514 to determine whether the user movement data match a gesture. A user may perform a range of movements to interact with the one or more onscreen objects depicted by the application. In one embodiment, step 514 may include comparing the user movement data with one or more pre-defined gestures specified by the gesture recognition engine 190. The process of determining whether the user movement data match a gesture may be performed by the gesture recognition engine 190 and is discussed in Fig. 9. If the user movement data match a gesture, then in step 518 the user movement data are automatically translated into multiple object responses of an onscreen object. In one embodiment, the step 518 of translating the user movement data may be performed as follows. In step 519, the matched gesture obtained in step 514 is used to access the data structure, described in Fig. 2, that correlates gestures with object responses. In step 520, the corresponding responses of the onscreen object are accessed from the data structure, which includes accessing pointers to the code implementing the responses. For example, a corresponding response of the onscreen object may include a motion response, a visual response, or an audio response of the onscreen object. In step 521, the code implementing the object response is accessed. In one embodiment, the object response may be implemented by mapping the skeletal model representation of the user to an object model representation of the onscreen object, as described in Fig. 2.
If the user movement data do not match a gesture, then in step 516 feedback is provided to the user, via the user interface of the computing device 12, regarding the different types of movements that may be used to interact with the onscreen objects. For example, a textual guide may be provided to the user relating to the different types of movements that may be used to interact with the various onscreen objects. If user movement data are then detected in step 506, user movement data based on the feedback provided to the user are captured in step 512.
In step 522, the multiple object responses of the onscreen object are displayed to the user via the user interface in the user's computing device 12.
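The overall flow of Fig. 5 can be condensed into a short loop. The following sketch assumes hypothetical interfaces (app, capture, recognizer, translator, ui) and only illustrates the step ordering, not actual product code:

```python
def run(app, capture, recognizer, translator, ui):
    """Show scenes, watch for user movement, and translate matched
    gestures into simultaneously displayed object responses."""
    scene = app.first_scene()                     # steps 500-502
    while scene is not None:
        ui.show(scene)
        motion = capture.poll_user_movement()     # steps 506, 512
        if motion is None:
            scene = app.next_scene()              # steps 504, 508, 510
            continue
        gesture = recognizer.match(motion)        # step 514
        if gesture is None:                       # step 516
            ui.show_feedback("Try one of the supported movements")
            continue
        responses = translator.translate(gesture, motion, scene)  # 518-521
        ui.display_all(responses)                 # step 522
```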
Figs. 6-9 provide more detailed flowcharts of the steps of Fig. 5. Fig. 6 depicts a flowchart of one embodiment of a process for capturing user movement data (step 512 of Fig. 5). In step 526, the processor 42 of the capture device 20 receives a visual image and a depth image at the image capture component 32. In other examples, only a depth image is received in step 526. The depth image and the visual image can be captured by any of the sensors in the image capture component 32 or by other suitable sensors known in the art. In one embodiment, the depth image is captured separately from the visual image. In some implementations, the depth image and the visual image are captured at the same time, while in others they are captured sequentially or at different times. In other embodiments, the depth image is captured with the visual image, or combined with the visual image into one image file so that each pixel has an R value, a G value, a B value, and a Z value (representing distance).
In step 528, depth information corresponding to the visual image and the depth image is determined. The visual image and depth image received in step 526 can be analyzed to determine depth values for one or more targets within the image. The capture device 20 may capture or observe a capture region that may include one or more targets.
In step 530, the capture device determines whether the depth image includes a human target. In one embodiment, each target in the depth image may be flood-filled and compared to a pattern to determine whether the depth image includes a human target. In one embodiment, the edges of each target in the captured scene of the depth image may be determined. The depth image may include a two-dimensional pixel area of the captured scene. Each pixel in the 2-D pixel area may represent a depth value, such as a length or distance that may be measured from the camera. The edges may be determined by comparing the various depth values associated with, for example, adjacent or nearby pixels of the depth image. If the various depth values being compared are greater than a predetermined edge tolerance, those pixels may define an edge. The capture device may organize the calculated depth information, including the depth image, into "Z layers," or layers that are perpendicular to a Z axis extending from the camera along its line of sight to the viewer. The likely Z values of the Z layers may be flood-filled based on the determined edges. For instance, the pixels associated with the determined edges and the pixels of the area within the determined edges may be associated with each other to define a target or an object in the capture region.
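The edge test described above — comparing neighboring depth values against a predetermined edge tolerance — might look roughly like the following sketch, which assumes the depth image is a plain 2-D array of per-pixel depth values:

```python
def depth_edges(depth, tolerance):
    """Flag a pixel as an edge when its depth value differs from that of
    a right or lower neighbor by more than the edge tolerance.
    depth: 2-D list of depth values (e.g., millimeters)."""
    h, w = len(depth), len(depth[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y, x + 1), (y + 1, x)):  # right and lower neighbors
                if ny < h and nx < w and abs(depth[y][x] - depth[ny][nx]) > tolerance:
                    edges[y][x] = True
    return edges
```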
In step 532, the capture device scans the human target for one or more body parts. The human target can be scanned to provide measurements, such as length and width, associated with one or more body parts of the user, so that an accurate model of the user may be generated based on these measurements. In one example, the human target is isolated, and a bitmask is created and scanned for one or more body parts. The bitmask may be created, for example, by flood-filling the human target so that the human target is separated from other targets or objects in the capture region.
In step 534, a model of the human target is generated based on the scan performed in step 532. The bitmask may be analyzed for one or more body parts to generate a model of the human target, such as a skeletal model or a mesh human model. For example, measurement values determined by the scanned bitmask may be used to define one or more joints in the skeletal model. The bitmask may include values of the human target along the X, Y, and Z axes. The one or more joints may be used to define one or more bones that may correspond to body parts of a human.
According to one embodiment, to determine the location of the neck, shoulders, or the like of the human target, the width of the bitmask at, for example, the position being scanned may be compared to a threshold value of a typical width associated with, for example, a neck or shoulders. In an alternative embodiment, the distance from a previously scanned position associated with a body part in the bitmask may be used to determine the location of the neck, shoulders, or the like.
In one embodiment, to determine the location of the shoulders, the width of the bitmask at the shoulder position may be compared to a threshold shoulder value. For example, the distance between the two outermost Y values at the X value of the bitmask at the shoulder position may be compared to a threshold shoulder value of a typical distance between, for example, the shoulders of a human. Thus, according to an example embodiment, the threshold shoulder value may be a typical width or range of widths associated with the shoulders of a body model of a human.
In one embodiment, some body parts, such as the legs, feet, or the like, may be calculated based on, for example, the locations of other body parts. For example, as described above, information such as the positions and pixels associated with the human target may be scanned to determine the locations of various body parts of the human target. Based on those locations, subsequent body parts such as the legs, feet, or the like may then be calculated for the human target.
According to one embodiment, upon determining the values of, for example, a body part, a data structure may be created that includes measurement values, such as length and width, of the body part associated with the scan of the bitmask of the human target. In one embodiment, the data structure may include scan results averaged over a plurality of depth images. For example, the capture device may capture the capture region in frames, each of which includes a depth image. The depth image of each frame may be analyzed as described above to determine whether a human target is included. If the depth image of a frame includes a human target, the bitmask of the human target in the depth image associated with that frame may be scanned for one or more body parts. The value of a body part determined for each frame may then be averaged, so that the data structure may include average measurement values, such as length and width, of the body parts associated with the scans of each frame. According to one embodiment, the determined measurement values of the body parts may be adjusted, such as scaled up or down, so that the measurement values in the data structure more closely correspond to a typical model of a human body. In step 534, the measurement values determined by the scanned bitmask may be used to define one or more joints in the skeletal model.
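The per-frame averaging of body-part measurements described above could be sketched as follows, assuming each frame's scan is a simple body-part-to-metrics mapping; the structure is an illustration, not the patent's actual data layout:

```python
from collections import defaultdict

def average_measurements(per_frame_scans):
    """Average per-frame body-part measurements into one data structure.
    Each scan maps a body part to its metrics, e.g.
    {"forearm": {"length": 0.27, "width": 0.07}}."""
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for scan in per_frame_scans:
        for part, metrics in scan.items():
            counts[part] += 1
            for name, value in metrics.items():
                totals[part][name] += value
    return {part: {name: value / counts[part] for name, value in metrics.items()}
            for part, metrics in totals.items()}
```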
In step 536, the model created in step 534 is tracked using skeletal mapping. For example, the skeletal model of the user 18 may be adjusted and updated as the user moves in physical space in front of the camera within the field of view. Information from the capture device may be used to adjust the model so that the skeletal model accurately represents the user. In one example, this is accomplished by applying one or more forces to one or more force-receiving aspects of the skeletal model, to adjust the skeletal model into a pose that more closely corresponds to the pose of the human target in physical space.
In step 538, motion is captured based on the skeletal mapping to generate a motion capture file. In one embodiment, the step 538 of capturing motion may include calculating the position, direction, acceleration, and curvature of one or more body parts identified by the scan. The position of the body part is calculated in X, Y, Z space to create a three-dimensional positional representation of the body part within the field of view of the camera. The direction in which the body part moves is calculated from the position. The directional movement may have components in any one of, or a combination of, the X, Y, and Z directions. The curvature of the body part's movement in X, Y, Z space is determined, for example, to represent non-linear movement of the body part within the capture region. The velocity, acceleration, and curvature calculations are not dependent on direction. It is to be appreciated that the use of X, Y, Z Cartesian mapping is provided only as an example. In other embodiments, different coordinate mapping systems can be used to calculate movement, velocity, and acceleration. A spherical coordinate mapping, for example, may be useful when examining the movement of body parts that naturally rotate around joints.
Once all body parts in the scan have been analyzed, the motion capture file generated in step 538 may be updated for the target. In one example, the motion capture file is generated in real time based on information associated with the tracked model. For example, in one embodiment, the motion capture file may include vectors containing X, Y, and Z values that define the joints and bones of the model as it is tracked at various points in time. As described above, the model being tracked may be adjusted based on the user's motions at various points in time, and a motion capture file of the model of the motion may be generated and stored. The motion capture file may capture the tracked model during natural movement by the user interacting with the target recognition, analysis, and tracking system. For example, the motion capture file may be generated so that it naturally captures any movement or motion performed by the user during interaction with the target recognition, analysis, and tracking system. The motion capture file may include frames corresponding to snapshots of the user's motion at different points in time. Upon capturing the tracked model, information associated with the model, including any movements or adjustments applied to it at a particular point in time, may be rendered in a frame of the motion capture file. The information in the frame may include, for example, the vectors containing the X, Y, and Z values that define the joints and bones of the tracked model, along with a time stamp that may, for example, indicate the point in time at which the user performed the movement corresponding to the pose of the tracked model.
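A minimal sketch of such a motion capture file — frames of joint vectors with time stamps, from which direction and velocity can be derived — follows. The class and method names are assumptions for illustration:

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class CaptureFrame:
    """One snapshot: joint positions defining the tracked model, stamped
    with the time at which the user performed the movement."""
    joints: Dict[str, Vec3]
    timestamp: float

@dataclass
class MotionCaptureFile:
    frames: List[CaptureFrame] = field(default_factory=list)

    def record(self, joints: Dict[str, Vec3]) -> None:
        """Append a real-time snapshot of the tracked model."""
        self.frames.append(CaptureFrame(dict(joints), time.time()))

    def velocity(self, joint: str) -> Vec3:
        """Per-axis velocity of a joint from the last two frames; the sign
        gives direction, the magnitude gives speed."""
        a, b = self.frames[-2], self.frames[-1]
        dt = (b.timestamp - a.timestamp) or 1e-9
        return tuple((b.joints[joint][i] - a.joints[joint][i]) / dt
                     for i in range(3))
```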
In one embodiment, steps 526-538 are performed by the computing device 12. Furthermore, although steps 526-538 are depicted as being performed by the capture device 20, each of these steps may be performed by other components, such as the computing environment 12. For example, the capture device 20 may provide the visual and/or depth images to the computing device 12, which in turn will determine the depth information, detect the human target, scan the target, generate and track the model, and capture the motion of the human target.
Fig. 7 shows an example of a skeletal model or mapping 540 representing a scanned human target that may be generated at step 534 of Fig. 6. According to one embodiment, skeletal model 540 may include one or more data structures that represent the human target as a three-dimensional model. Each body part may be characterized as a mathematical vector defining joints and bones of skeletal model 540.
Skeletal model 540 includes joints n1-n18. Each of joints n1-n18 may enable one or more body parts defined between those joints to move relative to one or more other body parts. A model representing a human target may include multiple rigid and/or deformable body parts defined by one or more structural members such as "bones," with joints n1-n18 located at the intersections of adjacent bones. Joints n1-n18 may enable each body part associated with the bones and joints n1-n18 to move independently of, or relative to, one another. For example, the bone defined between joints n7 and n11 corresponds to a forearm, which can move independently of, for example, the bone defined between joints n15 and n17, which corresponds to a lower leg. It should be appreciated that some bones may correspond to anatomical bones in the human target, and/or some bones may not have corresponding anatomical bones in the human target.
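For illustration, one possible data-structure representation of such a skeletal model is sketched below; the constant names and the particular bone subset are assumptions:

    # Joints n1-n18; a bone is defined between a pair of adjacent joints.
    JOINTS = ["n%d" % i for i in range(1, 19)]
    BONES = [("n7", "n11"),    # forearm
             ("n15", "n17")]   # lower leg (illustrative subset only)

    class SkeletonModel:
        def __init__(self):
            # Each joint carries a 3-D position; bones follow from the joint pairs.
            self.joint_positions = {j: (0.0, 0.0, 0.0) for j in JOINTS}

        def bone_vector(self, joint_a, joint_b):
            # The mathematical vector characterizing the bone between two joints.
            pa = self.joint_positions[joint_a]
            pb = self.joint_positions[joint_b]
            return tuple(b - a for a, b in zip(pa, pb))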
The bones and joints may collectively make up the skeletal model, and they may be constituent elements of the model. An axial roll angle may be used to define the rotational orientation of a limb relative to its parent limb and/or the torso. For example, if the skeletal model is illustrating the axial rotation of an arm, a roll joint may be used to indicate the direction the associated wrist is pointing (e.g., palm facing up). The axial roll angle may be determined by examining the orientation of a limb relative to its parent limb and/or the torso. For example, when examining a lower leg, the orientation of the lower leg relative to the associated thigh and hip can be examined to determine the axial roll angle.
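One way such an axial roll angle might be computed is sketched below, assuming the limb is not aligned with its parent axis and that a reference direction fixed in the parent limb defines zero roll; all names here are invented for illustration:

    import numpy as np

    def axial_roll_angle(limb_vector, parent_vector, zero_roll_reference):
        # Normalize the parent limb's axis.
        axis = parent_vector / np.linalg.norm(parent_vector)

        def project(v):
            # Component of v perpendicular to the parent axis, normalized.
            p = v - np.dot(v, axis) * axis
            return p / np.linalg.norm(p)

        limb = project(limb_vector)
        ref = project(zero_roll_reference)
        cos_angle = np.clip(np.dot(limb, ref), -1.0, 1.0)
        return np.degrees(np.arccos(cos_angle))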
Fig. 8 provides further details of an exemplary embodiment of gesture recognition engine 190 as shown in Fig. 2. As shown, gesture recognition engine 190 may include at least one filter 450 for determining one or more gestures. Filter 450 includes parameters defining a gesture 452 (hereinafter "gesture") and metadata 454 for that gesture. A filter may comprise code that recognizes gestures, or otherwise processes depth, RGB or skeletal data, along with associated data. For example, a throw, which comprises motion of one of the hands from behind the rear of the body to past the front of the body, may be implemented as a gesture 452 comprising information representing the movement of one of the user's hands from behind the rear of the body to past the front of the body, as that movement would be captured by the depth camera. Parameters 454 may then be set for gesture 452. Where gesture 452 is a throw, a parameter 454 may be a threshold velocity that the hand has to reach, a distance the hand must travel (either absolute, or relative to the size of the user as a whole), and a confidence rating by the recognizer engine that the gesture occurred. These parameters 454 for gesture 452 may vary between applications, between contexts of a single application, or within one context of one application over time. Gesture parameters may include threshold angles (e.g., hip-thigh angle, forearm-bicep angle), a number of periods where motion occurs or does not occur, threshold positions (starting, ending), direction of movement, velocity, acceleration, coordinates of the movement, and the like.
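A hedged sketch of how such a throw filter and its parameters 454 might look in code follows; the parameter values and feature names are assumptions, and the confidence returned here is deliberately crude:

    from dataclasses import dataclass

    @dataclass
    class ThrowFilter:
        # Parameters 454 for a throw gesture; the values are assumptions and
        # would be tuned per application or per application context.
        threshold_speed: float = 2.5   # m/s the hand must reach
        min_distance: float = 0.4      # meters the hand must travel

        def evaluate(self, features):
            # `features` holds quantities precomputed from the skeletal data,
            # e.g. the hand's peak speed and total travel across the motion.
            ok = (features["peak_hand_speed"] >= self.threshold_speed
                  and features["hand_travel"] >= self.min_distance)
            return 1.0 if ok else 0.0   # confidence that the throw occurred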
A filter may comprise code that recognizes gestures, or otherwise processes depth, RGB or skeletal data, along with associated data. Filters may be modular or interchangeable. In one embodiment, a filter has a number of inputs, each having a type, and a number of outputs, each having a type. In this situation, a first filter may be replaced with a second filter that has the same number and types of inputs and outputs as the first filter, without altering any other aspect of the recognizer engine architecture. For instance, there may be a first filter for driving that takes skeletal data as input and outputs a confidence that the gesture associated with the filter is occurring and an angle of steering. Where one wishes to substitute this first driving filter with a second driving filter (perhaps because the second driving filter is more efficient and requires fewer processing resources), one may do so by simply replacing the first filter with the second filter, so long as the second filter has those same inputs and outputs: one input of skeletal data type, and two outputs of confidence type and angle type.
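For illustration, the typed, interchangeable filter contract might be expressed as below; the class names and the "hand_line_angle" key are assumptions:

    from typing import Protocol, Tuple

    class DrivingFilter(Protocol):
        # The contract: skeletal data in; (confidence, steering angle) out.
        def evaluate(self, skeletal_data: dict) -> Tuple[float, float]: ...

    class FirstDrivingFilter:
        def evaluate(self, skeletal_data):
            angle = skeletal_data.get("hand_line_angle", 0.0)
            return 0.9, angle

    class SecondDrivingFilter:
        # A cheaper implementation with identical inputs and outputs, so it
        # can replace FirstDrivingFilter without changing the engine
        # architecture.
        def evaluate(self, skeletal_data):
            return 0.8, skeletal_data.get("hand_line_angle", 0.0)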
A filter need not have parameters. For instance, a "user height" filter that returns the user's height may not allow for any parameters that may be tuned. An alternate "user height" filter may have tunable parameters, such as whether to account for a user's footwear, hairstyle, headwear and posture in determining the user's height.
Inputs to a filter may comprise things such as joint data about a user's joint positions, angles formed by the bones that meet at a joint, RGB color data from the capture region, and the rate of change of an aspect of the user. Outputs from a filter may comprise things such as the confidence that a given gesture is being made, the speed at which a gesture motion is made, and the time at which a gesture motion occurred.
Gesture recognition engine 190 may have a base recognizer engine 456 that provides functionality to gesture filters 450. In one embodiment, the functionality that base recognizer engine 456 implements includes an input-over-time archive that tracks recognized gestures and other input, a Hidden Markov Model implementation (where the modeled system is assumed to be a Markov process — one where the present state encapsulates any past state information needed to determine a future state, so no other past state information need be maintained for this purpose — with unknown parameters, and hidden parameters are determined from the observable data), as well as other functionality required to solve particular instances of gesture recognition.
Filters 450 are loaded and implemented on top of base recognizer engine 456 and can utilize services that engine 456 provides to all filters 450. In one embodiment, base recognizer engine 456 processes received data to determine whether it meets the requirements of any filter 450. Since these provided services, such as parsing the input, are provided once by base recognizer engine 456 rather than by each filter 450, such a service need only be processed once in a period of time as opposed to once per filter 450 for that period, so the processing required to determine gestures is reduced.
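A sketch of this once-per-frame parsing arrangement, under assumed names and with stand-in parsing logic:

    class BaseRecognizerEngine:
        def __init__(self, filters):
            self.filters = filters      # all loaded gesture filters

        def parse_input(self, raw_frame):
            # Shared service: the input is parsed once per frame here,
            # rather than once per filter (stand-in parsing logic).
            return {"joints": raw_frame.get("joints", {})}

        def process(self, raw_frame):
            parsed = self.parse_input(raw_frame)
            # Each filter sees the already-parsed data and reports its result.
            return {type(f).__name__: f.evaluate(parsed) for f in self.filters}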
An application may use the filters 450 provided by recognizer engine 190, or it may provide its own filter 450, which plugs in to base recognizer engine 456. In one embodiment, all filters 450 have a common interface to enable this plug-in characteristic. Further, all filters 450 may utilize parameters 454, so a single gesture tool, as described below, may be used to debug and tune the entire filter system. These parameters 454 may be tuned for an application, or a context of an application, by the gesture tool.
There are a variety of outputs that may be associated with a gesture. In one example, there may be a baseline "yes or no" as to whether a gesture is occurring. In another example, there may also be a confidence level, which corresponds to the likelihood that the user's tracked movement corresponds to the gesture. This could be a linear scale that ranges over floating point numbers between 0 and 1, inclusive. Where an application receiving this gesture information cannot accept false positives as input, it may use only those recognized gestures that have a high confidence level, such as at least 0.95. Where an application must recognize every instance of the gesture, even at the cost of false positives, it may use gestures that have at least a much lower confidence level, such as those merely greater than 0.2. A gesture may have an output for the time between the two most recent steps, and where only a first step has been registered, this may be set to a reserved value, such as -1 (since the time between any two steps must be positive). A gesture may also have an output for the highest thigh angle reached during the most recent step.
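The two confidence thresholds described above might be applied as in the following sketch; the function itself is illustrative, not part of the disclosure:

    def accept_gesture(confidence, tolerate_false_positives):
        # An application that cannot accept false positives keys off a high
        # confidence such as 0.95; one that must catch every instance of the
        # gesture, even at the cost of false positives, may accept anything
        # merely greater than 0.2.
        threshold = 0.2 if tolerate_false_positives else 0.95
        return confidence >= threshold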
A gesture, or a portion thereof, may have as a parameter a volume of space in which it must occur. This volume of space may typically be expressed in relation to the body where a gesture comprises body movement. For instance, a football throwing gesture for a right-handed user may be recognized only in the volume of space no lower than the right shoulder 410a, and on the same side of the head 422 as the throwing arm 402a-410a. It may not be necessary to define all bounds of a volume, as with this throwing gesture, where an outward bound away from the body is left undefined, and the volume extends out indefinitely, or to the edge of the capture region that is being monitored.
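A minimal sketch of such a body-relative volume test follows; the axis convention (+x toward the throwing-arm side, +y up) is an assumption:

    def in_throw_volume(hand, right_shoulder):
        # hand and right_shoulder are (x, y, z) positions.
        x, y, _ = hand
        sx, sy, _ = right_shoulder
        # No lower than the right shoulder and on the throwing-arm side of
        # the head; the outward bound is left undefined, so the volume
        # extends out indefinitely.
        return y >= sy and x >= sx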
Additionally, gestures may stack on each other. That is, a user may be expressing more than one gesture at a given time. For instance, rather than disallowing any input other than a throw when a throwing gesture is made, or requiring that a user remain motionless save for the components of the gesture (e.g., stand still when making a throwing gesture that involves only one arm), where gestures stack, a user may make a jumping gesture and a throwing gesture simultaneously, and both gestures will be recognized by the gesture engine.
Fig. 9 depicts a flowchart of one embodiment of a process, in accordance with the disclosed technology, for determining whether user motion data matches a gesture, performing step 514 of Fig. 5. Fig. 9 describes a rule-based approach in which gesture recognition engine 190 applies one or more gesture filters to determine whether a user's motion data matches a particular gesture. It will be appreciated that while the detection of a single gesture is described in this particular example, the process of Fig. 9 may be performed multiple times to detect multiple gestures in an active gesture set. The process may be performed in parallel or sequentially for the multiple active gestures.
At step 602, the gesture recognition engine accesses the skeletal tracking data for a particular target to begin determining whether that target has performed a selected gesture. The skeletal tracking data can be accessed from a motion capture file, as described with respect to Fig. 6. At step 604, the gesture recognition engine filters the skeletal tracking data for one or more predetermined body parts that are pertinent to the selected gesture, as identified in the selected gesture filter. Step 604 can include accessing only the data pertinent to the selected gesture, or accessing all of the target's skeletal tracking data and ignoring or discarding the information not pertinent to the selected gesture. For instance, a hand gesture filter may indicate that only the hands of the human target are pertinent to the selected gesture, so that data pertaining to other body parts can be ignored. Such a technique can increase the performance of the gesture recognition engine by limiting processing to the information predetermined to be salient to the selected gesture.
At step 606, the gesture recognition engine filters the skeletal tracking data for predetermined axial movements. For example, the filter for the selected gesture may specify that only movements along a certain subset of axes are pertinent.
At step 608, the gesture recognition engine accesses a rule j specified in the gesture filter. In the first iteration through the process of Fig. 9, j is equal to 1. A gesture may comprise a plurality of parameters that need to be satisfied in order for the gesture to be recognized. Each of these parameters can be specified in a separate rule, although multiple components can be included in a single rule. A rule may specify threshold distances, positions, directions, curvatures, velocities and/or accelerations, among other parameters, that a target's body part must meet in order for the gesture to be satisfied. A rule may apply to one body part or to multiple body parts. Moreover, a rule may specify a single parameter such as position, or multiple parameters such as position, direction, distance, curvature, velocity and acceleration.
At step 610, the gesture recognition engine compares the skeletal tracking data filtered at steps 604 and 606 with the specified parameters of the rule to determine whether the rule is satisfied. For example, the gesture recognition engine may determine whether a hand's starting position was within a threshold distance of a starting position parameter. The rule may further specify, and the engine determine, whether the hand moved in a specified direction; moved a threshold distance from the starting position in the specified direction; moved within a threshold curvature along a specified axis; moved at or above a specified velocity; or met or exceeded a specified acceleration. If the engine determines that the skeletal tracking information does not meet the parameters specified in the filter rule, the engine returns a fail, or gesture-filter-not-satisfied, response at step 612. The response is returned to application 202 executing on computing device 12.
At step 614, the gesture recognition engine determines whether the gesture filter specifies additional rules that must be satisfied for the gesture to be completed. If the filter includes additional rules, j is incremented by one and the process returns to step 608, where the next rule is accessed. If there are no additional rules, the gesture recognition engine returns an indication that the gesture filter has been satisfied at step 618.
Steps 612 and 618 of Fig. 9 return a simple pass/fail response for the gesture being analyzed. In other examples, rather than returning a simple pass/fail response, Fig. 9 may return a confidence level that the gesture filter was satisfied. For each rule in the filter, an amount by which the target's movement meets or does not meet a specified parameter is determined. Based on an aggregation of these amounts, the recognition engine returns a confidence level that the gesture was indeed performed by the target.
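Taken together, steps 602-618 amount to the loop sketched below, under the assumption that a filter exposes a list of relevant joints and an ordered rule list with a satisfied_by test (both names invented here for illustration):

    def match_gesture(tracking_data, gesture_filter):
        # Steps 604-606: keep only the joints pertinent to this gesture.
        data = {j: tracking_data[j]
                for j in gesture_filter.relevant_joints if j in tracking_data}
        # Steps 608-614: walk the rules j = 1, 2, ... in order.
        for rule in gesture_filter.rules:
            if not rule.satisfied_by(data):   # step 610: compare parameters
                return False                  # step 612: filter not satisfied
        return True                           # step 618: all rules satisfied

A confidence-level variant would instead accumulate, per rule, the amount by which the movement meets or misses each parameter and return the aggregate.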
Figs. 10-14 show various user interface screens depicting a user's interaction with an application executing on a computing device. Figs. 10-11 show an exemplary user interaction with an application via a user interface in computing device 12, and the result of the user's interaction with the application. Fig. 10 shows user 18 interacting with application 202 via a user interface in computing device 12. Fig. 11 shows the result of the user interaction shown in Fig. 10. In the exemplary illustration shown in Fig. 11, user 18 interacts with a shark object 52 depicted in a scene 50 of application 202 by performing an exemplary motion such as a hip movement. The result of the user interaction is shown, in which the hip movement 54 has been translated into multiple motion responses 56, 57 of the shark's fins. As further shown, the multiple object responses 56, 57 are simultaneously displayed to user 18 via the user interface.
In another embodiment, more than one user may simultaneously interact with the application 202 executing on computing device 12. Accordingly, first user motion data may be received from a first user, and second user motion data may be received from a second user interacting with an on-screen object depicted by application 202. The first user motion data may be translated into a first motion response, and the second user motion data may be translated into a second motion response. The first and second responses of the on-screen object may be simultaneously displayed to the users via the user interface in computing device 12. In one embodiment, the second object response may differ from the first object response when the second user motion data differs from the first user motion data. Alternatively, when the second user motion data is identical to the first user motion data, the second object response may be an amplification of the first response. The amplified response may be determined, for example, based on the speed or acceleration of the sensed user movement, where the on-screen object moves faster in response to faster movement.
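A sketch of this two-user translation, including one possible amplification rule based on sensed speed, follows; the dictionary layout and the scaling rule are assumptions:

    def translate_two_users(first, second):
        # `first` and `second` are assumed dicts like
        # {"gesture": "hip_motion", "speed": 1.2}.
        if first["gesture"] == second["gesture"]:
            # Identical motions: one amplified response, scaled by speed.
            gain = 1.0 + max(first["speed"], second["speed"])
            return [(first["gesture"], gain)]
        return [(first["gesture"], 1.0),     # two distinct responses,
                (second["gesture"], 1.0)]    # displayed simultaneously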
Fig. 12 is an exemplary illustration of a first user 18 and a second user 19 interacting with the on-screen object 52 depicted in the application 202 executing on computing device 12. In this exemplary illustration, the first user 18 performs a hip movement to interact with the shark object 52 depicted in the scene 50 of application 202, and the second user 19 performs a hand movement 53 to interact with the same shark object 52. The result of the two users' simultaneous interaction with the shark object 52 is also shown, in which the hip movement 54 has been translated into a first motion response causing multiple motions 56, 57 of the shark object's fins, and the hand movement 53 has been translated into a second motion response causing a motion 59 of the shark object's body.
Fig. 13 shows another scene in which user 18 interacts with application 202 via a user interface in computing device 12. In the exemplary illustration shown in Fig. 13, user 18 interacts with an inanimate object, such as a light bulb object 62, depicted in a particular scene 60 of application 202. Fig. 14 shows the result of the user's interaction with the light bulb object 62, in which the user's clapping motion 66 has been translated into a visual response of the light bulb object 62. As shown, this visual response, indicated by reference numeral 64, turns the light bulb 62 on.
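Across Figs. 10-14, the common thread is a mapping from a recognized gesture on a given on-screen object to one or more object responses, in the spirit of the data structure recited in claim 5 below. A hedged sketch, in which every object, gesture and response name is illustrative:

    RESPONSE_TABLE = {
        ("shark", "hip_motion"):  ["left_fin_motion", "right_fin_motion"],
        ("shark", "hand_motion"): ["body_motion"],
        ("light_bulb", "clap"):   ["turn_on"],   # a visual response
    }

    def object_responses(onscreen_object, gesture):
        # All matching responses are returned together so the application
        # can display them simultaneously.
        return RESPONSE_TABLE.get((onscreen_object, gesture), [])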
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. The scope of the invention is defined by the claims appended hereto.

Claims (15)

1. A method for translating user motion into multiple object responses based on a user's interaction with an application executing on a computing device, comprising:
receiving user motion data from one or more users at a sensor;
determining whether the user motion data matches one or more predefined gestures;
when the user motion data matches the one or more predefined gestures, automatically translating the user motion data into multiple object responses of an on-screen object, the on-screen object corresponding to an object other than an on-screen representation of the user displayed by the computing device executing the application; and
simultaneously displaying the multiple object responses of the on-screen object based on the translation, without changing a result of the user's typical interaction with the application,
wherein the on-screen object is non-interactive.
2. The method of claim 1, wherein:
at least one of the multiple object responses comprises a motion response of the on-screen object.
3. The method of claim 1, wherein:
at least one of the multiple object responses is an audio response of the on-screen object.
4. The method of claim 1, wherein:
at least one of the multiple object responses is a visual response of the on-screen object.
5. The method of claim 1, wherein translating the user motion data into multiple object responses further comprises:
based on determining whether the user motion data matches the one or more predefined gestures, accessing a data structure relating gestures to object responses;
accessing the corresponding responses of the on-screen object; and
implementing the object responses, the implementing including mapping a skeletal model representation of the user to an object model representation of the on-screen object.
6. The method of claim 1, wherein receiving the user motion data from one or more users further comprises:
receiving first user motion data from a first user, and receiving second user motion data from a second user.
7. The method of claim 6, further comprising:
translating the first user motion data into a first object response of the on-screen object;
translating the second user motion data into a second object response of the on-screen object; and
simultaneously displaying the first object response and the second object response of the on-screen object to the first user and the second user.
8. The method of claim 7, wherein:
the second object response differs from the first object response when the second user motion data differs from the first user motion data.
9. The method of claim 7, wherein:
the second object response is an amplification of the first object response when the second user motion data is identical to the first user motion data.
10. The method of claim 1, wherein:
receiving the user motion data comprises receiving first user motion data from a first user and receiving second user motion data from a second user;
translating the user motion data into multiple object responses comprises translating the first user motion data into a first motion response of the on-screen object, and translating the second user motion data into a second motion response of the on-screen object;
translating the user motion data into multiple object responses comprises mapping a skeletal model representation of the user motion data to an object model representation of the on-screen object; and
displaying the multiple object responses of the on-screen object based on the translation comprises displaying at least one of a motion response, an audio response, or a visual response of the on-screen object.
11. The method of claim 1, wherein:
the one or more on-screen objects comprise an inanimate object presented in the application; and wherein:
at least one of the multiple object responses comprises a motion of the inanimate object.
12. A method for enhancing a user's interaction with an application, comprising:
receiving first user motion data of a first user at a sensor, the first user motion data corresponding to a first user interaction with an on-screen object presented in an application executing on a computing device;
receiving second user motion data of a second user at the sensor, the second user motion data corresponding to a second user interaction with the on-screen object presented in the application executing on the computing device;
translating the first user motion data into a first motion response of the on-screen object;
translating the second user motion data into a second motion response of the on-screen object; and
simultaneously displaying the first motion response and the second motion response to the first user and the second user, without changing a result of the typical interaction of the first user and the second user with the application,
wherein the on-screen object is non-interactive.
13. The method of claim 12, wherein:
the second motion response differs from the first motion response when the second user motion data differs from the first user motion data.
14. The method of claim 12, wherein:
the second motion response is an amplification of the first motion response when the second user motion data is identical to the first user motion data.
15. An apparatus for triggering object responses based on a user's interaction with an application, comprising:
a sensor that captures user motion data; and
a computing device connected to the sensor, the computing device translating the user motion data into at least one of a motion response, an audio response, or a visual response of an inanimate object, and displaying, based on the translation, the at least one of the motion response, the audio response, or the visual response of the inanimate object, without changing a result of the user's typical interaction with the application,
wherein the inanimate object is non-interactive.
CN201110280552.9A 2010-08-20 2011-08-19 Translating user motion into multiple object responses Active CN102375541B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US12/859,995 2010-08-20
US12/859,995 US9075434B2 (en) 2010-08-20 2010-08-20 Translating user motion into multiple object responses

Publications (2)

Publication Number Publication Date
CN102375541A 2012-03-14
CN102375541B 2016-12-14


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1394325A (en) * 2000-09-01 2003-01-29 Sony Computer Entertainment America Inc. User input device and method for interaction with graphic images
CN101484933A (en) * 2006-05-04 2009-07-15 Sony Computer Entertainment America Inc. Methods and apparatus for applying gearing effects to input based on one or more of visual, acoustic, inertial, and mixed data
CN101730874A (en) * 2006-06-28 2010-06-09 Nokia Corporation Touchless gesture based input

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20150728

Address after: Washington State

Applicant after: Microsoft Technology Licensing, LLC

Address before: Washington State

Applicant before: Microsoft Corp.

GR01 Patent grant