WO2012100114A2 - Multiple viewpoint electronic media system - Google Patents

Multiple viewpoint electronic media system

Info

Publication number
WO2012100114A2
WO2012100114A2 (PCT/US2012/021951)
Authority
WO
WIPO (PCT)
Prior art keywords
video
streams
media
electronic
content
Prior art date
Application number
PCT/US2012/021951
Other languages
French (fr)
Other versions
WO2012100114A3 (en)
Inventor
Jeffrey Glasse
Joshua HARTUNG
David SOSNOW
Original Assignee
Kogeto Inc.
James, John W.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kogeto Inc., James, John W. filed Critical Kogeto Inc.
Publication of WO2012100114A2 publication Critical patent/WO2012100114A2/en
Publication of WO2012100114A3 publication Critical patent/WO2012100114A3/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/21 Server components or server architectures
    • H04N 21/218 Source of audio or video content, e.g. local disk arrays
    • H04N 21/21805 Source of audio or video content, e.g. local disk arrays enabling multiple viewpoints, e.g. using a plurality of cameras
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/85 Assembly of content; Generation of multimedia applications
    • H04N 21/854 Content authoring
    • H04N 21/8547 Content authoring involving timestamps for synchronizing content
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 Control of cameras or camera modules
    • H04N 23/698 Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/14 Systems for two-way working
    • H04N 7/15 Conference systems

Definitions

  • This specification relates to media recording and playback.
  • Electronic multi track audio and video recording systems are used in various applications.
  • In audio recording studios, multiple audio tracks are used in order to separately capture the sounds produced by individual instruments and vocalists. These multiple audio tracks are generally combined or "mixed down" into a track or set of tracks that are provided for a listener.
  • In video recording studios, multiple cameras and microphones are used to capture images and sounds from various angles and locations. Editors pick from these various audio and video tracks to produce a program for consumption by viewers.
  • this document describes a system for collecting, grouping, and presenting heterogeneous time-based media elements together with temporal, spatial, and/or orientation metadata to provide a substantially seamless immersive experience from these disparate media elements, regardless of whether the operator(s) of these devices are aware of each other.
  • an audio and video capture system includes a plurality of video capture devices configured to capture a plurality of video streams of an event, a plurality of audio capture devices configured to capture a plurality of audio streams of the event, a processor configured to receive, align, and associate the video and audio streams such that the audio and video streams are accessible as a single repository of media, a data repository configured to store the associated video and audio streams, and a server configured to provide the associated video and audio streams, such that a user may select from among the video and audio streams of the event.
  • At least one of the video streams can be a video stream of a panoramic view.
  • the processor can be further configured to transform the panoramic view to a substantially first-person perspective view.
  • the processor can be further configured to compress the video and audio streams.
  • the apparatus can include one or more time-based media capture devices configured to capture a plurality of time-based media streams of the event.
  • One or more of the time-based media capture devices can include a video stream created from data provided by an electronic whiteboard.
  • One or more of the video capture devices can be in wireless communication with the processor.
  • One or more of the video capture devices can be cellular telephones.
  • the one or more of the video capture devices can be further configured to determine positional information and provide the positional information to the processor.
  • an audio and video capture apparatus includes a plurality of video capture devices configured to capture a plurality of video streams of an event, a plurality of audio capture devices configured to capture a plurality of audio streams of the event, a clock, and a communications network interface.
  • the apparatus can also include a positional sensor. At least one of the video capture devices can be a panoramic camera configured to capture a panoramic view of the event.
  • the apparatus can also include one or more time-based media capture devices configured to capture a plurality of time-based media streams of the event. At least one of the time-based media capture devices can be configured to create a video stream from data provided by an electronic whiteboard. At least one of the video capture devices can communicate the video streams wirelessly. At least one of the video capture devices can be a cellular telephone.
  • the video capture devices can be further configured to determine positional information and associate the positional information with the video streams.
  • the video capture devices can include clocks configurable to be synchronized with the clock, and can associate synchronized timing information with the video streams. At least one of the audio capture devices can communicate audio streams wirelessly.
  • an audio and video capture system includes a plurality of electronic media capture devices configured to capture and provide electronic media streams at a plurality of locations at an event, and an electronic media recording device configured to receive, align, and associate the electronic media streams, and provide the associated electronic media streams as a single electronic media stream comprising a plurality of selectable audio-visual perspectives of the event.
  • the electronic media recording device can provide timing synchronization signals to the electronic media capture devices.
  • the electronic media streams can include
  • the electronic media capture devices and the electronic media recording devices can communicate wirelessly.
  • the electronic media capture devices can be audio capture devices.
  • the electronic media capture devices can be video capture devices.
  • the video capture devices can include optics configured to capture panoramic views of the event.
  • the electronic media capture devices can be cellular telephones.
  • the electronic media capture devices can be electronic whiteboards.
  • the event can be a classroom lecture.
  • a method of creating media content includes receiving, at a processing device, a plurality of electronic content streams comprising content captured from an event, each stream comprising information describing the time at which the electronic content was captured.
  • the electronic content streams are aligned by the processing device.
  • the processing device also creates a collection of metadata which identifies each of the electronic content streams and the locations at which the electronic content streams were captured.
  • the method can also include compressing the electronic content streams.
  • the electronic content streams can include a plurality of audio streams.
  • the electronic content streams can include a plurality of video streams.
  • the plurality of video streams can include panoramic video content.
  • At least one of the video streams can include panoramic video content, and at least one of the video streams can include non-panoramic video content.
  • the electronic content streams can include video created from data provided by an electronic whiteboard.
  • Each of the electronic content streams can also include metadata which describes timing information identifying when the electronic content stream was captured, and metadata which identifies the location at which the electronic content stream was captured.
  • the event can be a classroom lecture.
  • Each stream can also include an identifier of a device which captured the electronic content.
  • a method for presenting media content includes receiving, at a user device, data that describes a plurality of electronic media content streams comprising aligned content captured from an event.
  • the user device presents a selection of electronic media content streams and, in response to a user selection, presents a selected electronic media content stream.
  • the electronic content streams can include a plurality of audio streams.
  • the electronic content streams can include a plurality of video streams.
  • the plurality of video streams can include panoramic video content.
  • At least one of the video streams can include panoramic video content, and at least one of the video streams can include non-panoramic video content.
  • the electronic content streams can include video created from data provided by an electronic whiteboard.
  • the method can also include presenting user controls receptive to user inputs which direct the presentation of a selected subsection of the panoramic video content.
  • the method can also include transforming the selected subsection from a panoramic perspective to a substantially first-person perspective.
  • the method can also include requesting the selected subsection of the panoramic video content from a server, and receiving a video content stream from the server comprising a first-person perspective view of the selected subsection.
  • the event can be a classroom lecture.
  • the method can also include identifying, by the user device in response to another user selection, a first time code associated with the electronic content stream, and presenting another selected electronic media content stream starting at a second time code associated with the other electronic media stream and aligned with the first time code.
  • viewpoints can be panoramic, wherein the viewer may interact with the playback of the panoramic view to emulate being able to look around the environment in which the media was recorded.
  • devices used to capture the media can be consumer electronic devices, such as cellular telephones.
  • multiple media content streams can be grouped, aligned and presented for playback in a process that is at least partly automated, thereby reducing or eliminating the need for human directors, editors, and/or producers.
  • FIG. 1 shows an example system for capturing, distributing, and presenting media content.
  • FIG. 2 is a block diagram of an example media capture system.
  • FIGs. 3A and 3B show example media capture devices.
  • FIG. 4 shows another example system for capturing media content.
  • FIG. 5 shows an example user interface for presenting media content.
  • FIG. 6 is a flow diagram for an example process for capturing media content.
  • FIG. 7 is a flow diagram for an example process for presenting media content.
  • FIG. 8 is a block diagram of an example computing system.
  • This document describes systems and techniques for capturing, distributing, and presenting multiple streams of audio and video content, which can be synchronized to varying degrees of accuracy in certain implementations.
  • Such multiple synchronized streams can, for example, provide a media user with an immersive and interactive experience while watching the playback of media recorded at an event, such as a lecture, a conference, a demonstration, a meeting, or other appropriate event, substantially without the need for a human producer, and whether or not the individual capture devices were coordinated by a single entity or represent independent captures of the event.
  • a classroom can be outfitted with multiple camera devices placed at various locations within the classroom, wherein one camera may be positioned to capture a detailed view of a blackboard at the front of the room, while a panoramic camera may be positioned to capture a substantially 360-degree view of the classroom from approximately the instructor's or a student's audio-visual perspective.
  • consumer-grade electronic devices can be used to capture one or more streams of video.
  • a cellular telephone configured for video recording may be used.
  • some of the aforementioned consumer-grade electronic devices can be equipped with lenses that convert the devices' cameras into panoramic cameras.
  • time-based media capture devices may also be used, such as electronic whiteboards that convert the instructor's writing and drawings into time-based media streams which may be synchronized with video streams.
  • An arbitrary number of audio devices (e.g., microphones) can also be used to capture audio streams of the event.
  • an instructor can wear a wireless lapel microphone to capture his or her lecture, while another microphone may be directed to capture the sounds of students in the classroom as they participate in classroom discussions.
  • All of these simultaneously-captured audio and video (AV) sources can then be synchronized and associated together as a group in time and space using timing and positional metadata.
  • the grouped AV streams can then be made available for download or streaming to viewers.
  • Viewers may receive and view some or all of the grouped AV streams, or parts thereof, based at least partly on the viewer's selections from among the various AV streams in the group, in many cases using a standardized viewer.
  • the set of such viewers is extensible, allowing the system to become adaptive and flexible as it is utilized more frequently by a large collection of users.
  • the grouped AV stream may include information that describes each of the audio, video, and other time-based media streams, including the locations where the respective streams were captured relative to each other.
  • the student may be provided with user interface (UI) controls that allow the user to select a subsection of the panoramic view that is to be "de-warped" into a first-person view, thereby providing the student with a simulated ability to look (e.g., pan) around the classroom.
  • viewers can access the grouped AV sources, view the recorded event from any of the available viewpoints, switch among viewpoints, and pan about the recorded event in a manner that gives the viewer an interactive playback experience, which can provide a greater feeling of presence and involvement in the event than the viewer may feel by passively watching a traditional video presentation.
  • FIG. 1 shows an example system 100 for capturing, distributing, and presenting media content.
  • the system 100 includes a media capture device 102 and a media capture device 104.
  • the media capture device 102 provides an AV media stream 106
  • the media capture device 104 provides an AV media stream 108.
  • the media capture devices 102, 104 are configured to capture AV media, such as first-person video images, panoramic video images, and/or one or more channels of audio.
  • the media capture devices 102, 104 are discussed in additional detail below.
  • the AV media streams 106, 108 include both media, such as audio and/or video, and metadata.
  • the metadata may include time codes or timing information. For example, SMPTE time codes may be associated with the media to make it easier to later synchronize or otherwise align the AV media streams 106, 108.
  • the metadata may identify the media capture device used to capture the media.
  • the metadata may include positional information that describes the media capture device used to capture the media.
  • the metadata for the AV media stream 106 may indicate that the media capture device 102 captured its AV media at a location near the front of a classroom, while the metadata of the AV media stream 108 may indicate that the media capture device 104 is capturing AV media from a location near the back of the classroom.
  • positional information may be provided manually (e.g., by users of the media capture devices 102, 104).
  • positional information may be determined by the media capture devices 102, 104.
  • the media capture devices 102, 104 may include global positioning system (GPS) sensors, electronic compasses, accelerometers, beacons, or other appropriate forms of position locators that may be used by the media capture devices 102, 104 to determine their absolute positions, or to determine the positions of the media capture devices 102, 104 relative to each other.
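  • As an illustration only (not part of the original disclosure), the sketch below shows how two devices' GPS fixes could be reduced to a relative distance and bearing suitable for storing as positional metadata; the function name and coordinates are hypothetical.

```python
# Hypothetical helper: derive the relative position of capture device 104
# with respect to capture device 102 from their GPS fixes (haversine + bearing).
import math

def relative_position(lat1, lon1, lat2, lon2):
    """Return (distance_m, bearing_deg) from device 1 to device 2."""
    r = 6371000.0  # mean Earth radius, meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    distance = 2 * r * math.asin(math.sqrt(a))
    y = math.sin(dl) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    bearing = (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
    return distance, bearing

# example: a device at the front of a room vs. one a few meters away
print(relative_position(40.71280, -74.00600, 40.71283, -74.00595))
```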
  • the AV media streams 106, 108 are received by a computer device 110 and stored in a storage device 112.
  • the computer device 110 associates and aligns the stored AV media streams 106, 108, and adds a set of metadata 114.
  • the metadata 114 can include information about the locations of the media capture devices 102, 104, information about the media streams 106, 108 (e.g., bitrates, compression formats, panoramic lens parameters), information about the event (e.g., name, date, location, transcripts), embedded electronic content (e.g., electronic documents, links to related content), or other appropriate information.
  • the alignment may be based on time codes included in the AV media streams 106, 108.
  • the media capture devices 102, 104 may include high-precision clocks that are substantially synchronized, and these clocks may be used to provide time code metadata.
  • the media capture devices 102, 104 may be triggered to begin capturing the AV media streams 106, 108 at substantially the same time.
  • beacons may be used to keep the media capture devices 102, 104 synchronized.
  • a periodic radio frequency "heartbeat" signal may be broadcast to re-align the media capture devices' 102, 104 clocks.
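  • One possible realization of such a heartbeat, sketched below under assumed names: each capture device keeps a correction offset for its local clock and updates it whenever a timestamped beacon from the master device arrives.

```python
# Hypothetical sketch: a capture device re-aligns its clock to a periodic
# "heartbeat" beacon that carries the master device's timestamp.
import time

class BeaconCorrectedClock:
    def __init__(self):
        self.offset_s = 0.0  # seconds to add to the local monotonic clock

    def on_heartbeat(self, master_time_s, transit_delay_s=0.0):
        # Estimate what the master clock reads right now and learn the offset;
        # a real device would smooth successive estimates rather than jump.
        local_now = time.monotonic()
        self.offset_s = (master_time_s + transit_delay_s) - local_now

    def now(self):
        # Time codes stamped with this value are comparable across devices.
        return time.monotonic() + self.offset_s
```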
  • infrared (IR) light may be used.
  • CMOS video capture devices are generally sensitive to infrared wavelengths of light that humans cannot see.
  • the media capture device 102 may be equipped with such an IR beacon that is invisible to nearby persons, yet the media capture device 104 may be able to detect the beacon through its camera and use the beacon to synchronize itself with the media capture device 102.
  • acoustic beacons such as ultrasonic pings that are inaudible to humans may be used.
  • a lecture hall or auditorium may be large enough to permit echoing to occur, therefore an audio stream captured by a microphone at a lectern in the front of the room, and an audio stream captured at the back row of the room, may exhibit an unwanted echo due to the sound propagation delay between the two microphones when played back simultaneously.
  • An audible or ultrasonic acoustic ping may therefore be broadcast from the lectern or elsewhere in the room. Since the propagation delay of the acoustic ping may be substantially equal or proportional to the delay of sounds in the room, the ping may be used to synchronize the media recording devices 102, 104 relative to one another.
  • the AV media streams may be aligned based on events recorded in the media of the AV media streams 106, 108. For example, two or more video streams may be analyzed to locate one or more key events within the stream that may have been captured by two or more of the media capture devices. The two or more video or audio streams may then be time shifted to align the key event.
  • key events may include visible events such as a sudden change in the level or hue of the recorded ambient lighting, change in the content of the scene (e.g., students jumping up from their chairs at the end of a class period), or other identifiable changes in the scene.
  • Key events may include audible events such as taps of chalk against a chalkboard, coughs or sneezes in a classroom, or other identifiable changes in volume (e.g., applause), frequency (e.g., the notes in a melody), or spectrum (e.g., white noise disappears when a ventilation system shuts off) .
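  • A minimal sketch (an assumed approach, not language from the application) of aligning two audio captures on such a shared key event by cross-correlation; the lag of the correlation peak gives the shift needed to line the streams up.

```python
# Hypothetical alignment helper: find the time offset between two recordings
# of the same event (e.g., a chalk tap or burst of applause heard by both).
import numpy as np

def alignment_offset_seconds(ref, other, sample_rate):
    """Shift (in seconds) to delay `other` by so it lines up with `ref`."""
    ref = ref - np.mean(ref)
    other = other - np.mean(other)
    corr = np.correlate(ref, other, mode="full")
    lag = int(np.argmax(corr)) - (len(other) - 1)  # positive: `other` began later
    return lag / float(sample_rate)

# toy check: an impulse at sample 200 in `ref` and sample 100 in `other`
ref, other = np.zeros(1000), np.zeros(1000)
ref[200], other[100] = 1.0, 1.0
print(alignment_offset_seconds(ref, other, sample_rate=1000))  # 0.1 seconds
```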
  • video streams may be synchronized substantially absolutely while audio streams may be synchronized relatively.
  • sound and video of a distant subject may appear out of sync (e.g., light may reach the camera perceptibly sooner than the accompanying sound).
  • by synchronizing the audio and video differently, a perceived synchronization may be attained.
  • the computer device 110 combines the streams 106, 108, and the metadata 114 into an associated media stream 120.
  • the AV media streams 106, 108 may be associated by grouping or combining the streams 106, 108, and the metadata 114.
  • the AV media streams 106, 108, and the metadata 114 may be encoded into a single media file or stream. At playback, the single media file or stream, or portions thereof, are decoded and presented to the viewer.
  • the AV media streams 106 and 108, as well as the metadata 114, are stored as separate streams and/or files, wherein the metadata 114 maintains links to the locations of the AV media streams 106, 108.
  • a viewer may access the metadata 114 to gain access to the AV media streams 106, 108 as though they were part of a single file or stream.
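  • For illustration only, a metadata collection of this kind might look like the following; the field names, URLs, and values are hypothetical and are not defined by the application.

```python
# Hypothetical metadata manifest linking separately stored AV streams so a
# player can treat them as one associated media stream.
import json

manifest = {
    "event": {"name": "Example classroom lecture", "date": "2012-01-20"},
    "streams": [
        {"id": "cam-front", "type": "video/panoramic",
         "url": "https://example.com/media/cam-front.mp4",
         "position": "front of room", "start_timecode": "00:00:00:00"},
        {"id": "cam-back", "type": "video/first-person",
         "url": "https://example.com/media/cam-back.mp4",
         "position": "back of room", "start_timecode": "00:00:01:12"},
        {"id": "mic-lapel", "type": "audio",
         "url": "https://example.com/media/mic-lapel.aac",
         "start_timecode": "00:00:00:05"},
    ],
}
print(json.dumps(manifest, indent=2))
```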
  • the associated media stream 120 is then provided to a web service 130.
  • the web service 130 is a network or Internet server that can store and serve the associated media stream 120 to client devices 140, 142.
  • the client device 140 and the client device 142 communicate with the web service to request a download or stream of the associated media stream 120, and present the associated media stream 120 as AV media.
  • the computer device 110 may include the functions of the web service 130.
  • the associated media stream 120 can be provided substantially in its entirety.
  • the client device 140 may be a laptop computer with a fast broadband network connection to the web service 130, and may request a complete download of the associated media stream 120 (e.g., for offline storage and playback).
  • portions of the associated media stream 120 can be provided by the web service 130.
  • the client device 142 can be a smartphone with limited network bandwidth and storage.
  • the web service 130 may stream a sub-portion of the associated media stream 120 that includes the view that a user has currently selected.
  • the web server 130 may also transcode the stream to a format that the client device 142 can decode and/or to a bitrate that is supported by the device's network connection speed.
  • the web service 130 may modify and/or stream a sub-portion of a single AV media stream.
  • the AV media stream 106 may be panoramic video.
  • the web server may select a portion of the panoramic view that corresponds to the user's input, transform (e.g., de-warp) the portion of the panoramic view into a first-person view, and stream the first-person view to the client device 142.
  • the user input may include gestures or other manipulation of the client devices 140, 142 themselves.
  • the client device 142 may be a smartphone that includes accelerometers, magnetometers, gyroscopes, or other positional sensors that allow the client device 142 to sense its orientation. The user may then hold up the client device 142 to view one section of a panoramic media stream, and then tilt, pan, and/or rotate the client device 142 to pan and tilt her view of the panoramic media stream.
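  • A sketch of that interaction under assumed names: the handset's compass heading selects the pan angle within the panorama and its pitch selects the tilt, clamped to the panorama's vertical field of view.

```python
# Hypothetical mapping from handset orientation to a panoramic viewport.
def orientation_to_viewport(heading_deg, pitch_deg, vertical_fov_deg=60.0):
    """Return (pan_deg, tilt_deg) for the center of the displayed viewport."""
    pan = heading_deg % 360.0
    half = vertical_fov_deg / 2.0
    tilt = max(-half, min(half, pitch_deg))  # keep the view inside the capture
    return pan, tilt

# pointing the phone west and slightly downward
print(orientation_to_viewport(heading_deg=270.0, pitch_deg=-40.0))  # (270.0, -30.0)
```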
  • FIG. 2 is a block diagram of an example media capture system 200.
  • the system 200 can be the media capture device 102 or the media capture device 104 of FIG. 1.
  • the system 200 includes a processor module 202, a memory module 204, and a storage module 206.
  • the processor 202 is configured to execute program code stored in the memory.
  • the memory module 204 and the storage module 206 are configured to also store and retrieve electronic data.
  • the system 200 includes a video processor module 208 and an audio processor module 210.
  • the video processor module 208 receives and digitizes video signals provided by a high definition camera 212 and a panoramic camera 214.
  • the high definition camera 212 and/or the panoramic camera 214 may connect to the system 200 wirelessly.
  • the audio processor module 210 receives and digitizes audio signals provided by a wireless microphone 216 and a wireless microphone 218.
  • the microphones 216 and/or 218 may connect to the audio processor module 210 through a wired connection.
  • a location sensor module 220 is configured to sense and determine the geographic location of the system 200.
  • the location sensor module 220 can be a GPS receiver.
  • a position and orientation sensor module 222 is configured to sense and determine the position, orientation, and motion of the system 200.
  • the position and orientation sensor 222 may include an electronic compass to sense the heading of the system 200.
  • the position and orientation sensor 222 may include tilt sensors or gyroscopes to sense the pitch and yaw of the system 200.
  • the position and orientation sensor 222 may include accelerometers to sense movement of the system 200.
  • the system 200 includes a clock module 224.
  • the clock module 224 may be a real time clock.
  • the clock module 224 may be a high-precision clock.
  • the clock module 224 may provide timing signals that may be associated with or included into the digitized AV streams provided by the video processor module 208 and the audio processor module 210.
  • the clock module 224 may be part of the location sensor module 220.
  • the location sensor module 220 may be a GPS receiver, which operates based on the reception of high precision timing signals transmitted by GPS satellites.
  • the processor module 202 is operable to store the digitized AV streams provided by the video processor module 208 and the audio processor module 210 in the storage module 206.
  • the processor module 202 is further operable to align and associate the digitized high definition video, the digitized panoramic video, the digitized audio streams, the location information, the position information, the timing information, and any other appropriate information, together as an associated media file or stream.
  • the processor module 202 may compress the aforementioned media streams and other data prior to association, or may compress the associated media file or stream after the association.
  • the system 200 includes a network interface module 230 configured to connect to a wired and/or wireless network.
  • the network interface module 230 may connect the system 200 to a wireless Ethernet local area network (Wi-Fi), a cellular data network (e.g., EVDO, 3G, 4G, LTE, WiMAX), or other appropriate wireless network.
  • the network interface module 230 may connect the system 200 to an Ethernet local area network, a power line communication network (e.g., HomePlug), a fiber optic network, or other appropriate network.
  • the network interface module 230 can connect the system 200, directly or indirectly, to the Internet.
  • the network interface module 230 can connect the system 200 to media capture devices, such as the media capture device 104 of FIG. 1.
  • the processor module 202 is operable to communicate, through the network interface module 230, to a web service or other server.
  • the processor 202 may retrieve an associated media file from the storage module 206 and upload the file to the web service 130 of FIG. 1 through the network interface module 230.
  • the system 200 may be operated remotely through a remote operations device in communication with the network interface module 230. Examples of remote operations devices and interfaces are discussed in the descriptions of FIGs. 4 and 5.
  • FIG. 3A shows an example media capture device 300.
  • the media capture device 300 may be the media capture device 102 of FIG. 1, or the system 200 of FIG. 2.
  • the media capture device 300 includes a substantially planar base member 302, an electronics enclosure 304, a support arm member 310, and a video head member 312.
  • the base member 302 may be of sufficient width, length, and mass to provide stability for the media capture device 300 when the device 300 is placed on a flat surface such as a table or desk.
  • the electronics enclosure 304 includes electronic power and computing components.
  • the electronics enclosure 304 may include the processor module 202, the memory module 204, the storage module 206, the video processor module 208, the audio processor module 210, the location sensor module 220, the position and orientation sensor module 222, the clock module 224, the network interface module 230, power supplies, or other appropriate electronic components.
  • the electronics enclosure 304 may also include switches and/or other user interfaces, as well as power, communications, AV, and other appropriate connection ports.
  • the electronics enclosure 304 also includes a microphone receptacle 306a and 306b.
  • the microphone receptacles 306a, 306b are formed to at least partially receive a wireless microphone 308a and a wireless microphone 308b.
  • the wireless microphones 308a and 308b may be retained within the receptacles 306a and 306b (e.g., by catches, friction, magnets) for storage and/or transport.
  • the receptacles 306a and 306b may include electrical or inductive components for recharging batteries in the wireless microphones 308a and 308b.
  • the support arm member 310 provides support to elevate the video head member 312.
  • the support arm 310 is pivotably connected to the electronics enclosure 304 so as to selectively elevate the video head member 312 above the base member 302.
  • the video head member 312 is pivotably connected to the support arm member 310 so as to permit the video head member 312 to be selectably angled relative to, and/or to be made substantially vertical relative to the base member 302.
  • the support arm member 310 is adjustable to lower the video head member 312 into contact with a bumper 314.
  • the bumper 314 may protect the video head member 312 during storage and/or transport of the media capture device 300.
  • the video head member 312 includes a high resolution camera 316 adjustably connected to the video head member 312 by a positioning member 318.
  • the high resolution camera 316 can be the high resolution camera 212 of FIG. 2.
  • the positioning member 318 and the high resolution camera 316 are adjustable to aim the high resolution camera 316 at a selected subject.
  • the high resolution camera 316 may be oriented to capture a view of a lecturer or of a whiteboard.
  • the video head member 312 also includes a panoramic camera 320.
  • the panoramic camera 320 can be the panoramic camera 214 (FIG. 2).
  • the panoramic camera 320 includes a lens section 322.
  • the lens section 322 focuses light that is reflected off a reflective dome 324 and into the panoramic camera 320.
  • the reflective dome 324 is positioned relative to the lens section 322 by a substantially transparent cylinder 326.
  • light bouncing off objects located in a zone surrounding the cylinder enters the transparent cylinder 326, is reflected off the reflective dome 324, is focused by the lens section 322, and is captured by the panoramic camera 320.
  • the panoramic camera 320 captures a radial image of the camera's surroundings. For example, pixels ringing the center of the captured image represent light reflected off objects surrounding the device 300 at a relatively low height, while pixels ringing the outer portions of the captured image represent light reflected off objects surrounding the device 300 at relatively higher elevations.
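  • The sketch below illustrates one way (parameters assumed, not taken from the application) to unwrap such a radial image into a flat rectangular panorama: each output column corresponds to an angle around the dome, and each output row to a radius between the inner and outer rings of the captured image.

```python
# Hypothetical polar-to-rectangular unwrap of a catadioptric ("donut") frame.
import numpy as np

def unwrap_panorama(donut, cx, cy, r_inner, r_outer, out_w=1024, out_h=256):
    """Nearest-neighbor unwrap; the outer ring maps to the top of the output."""
    thetas = np.linspace(0.0, 2.0 * np.pi, out_w, endpoint=False)
    radii = np.linspace(r_outer, r_inner, out_h)
    theta_grid, r_grid = np.meshgrid(thetas, radii)
    xs = np.clip(np.round(cx + r_grid * np.cos(theta_grid)).astype(int), 0, donut.shape[1] - 1)
    ys = np.clip(np.round(cy + r_grid * np.sin(theta_grid)).astype(int), 0, donut.shape[0] - 1)
    return donut[ys, xs]

# usage with a placeholder 480x640 RGB frame from the panoramic camera
frame = np.zeros((480, 640, 3), dtype=np.uint8)
flat = unwrap_panorama(frame, cx=320, cy=240, r_inner=60, r_outer=230)
print(flat.shape)  # (256, 1024, 3)
```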
  • FIG. 3B shows an example media capture device 350.
  • the media capture device 350 can be the media capture device 104.
  • the media capture device 350 is a portable electronic device such as a cellular telephone, smart phone, personal digital assistant, or other electronic device capable of capturing video.
  • the media capture device 350 is a smartphone 352 equipped with a camera (not shown).
  • a panoramic optical adapter 354 is coupled to the smartphone 352 such that the adapter 354 is positioned over the camera lens to enable the camera to capture a substantially panoramic image.
  • Light reflected off objects surrounding the adapter 354 enters through a substantially transparent cylinder 356, bounces off a reflective dome section 358 and onto a mirror 360. The light is then reflected off the mirror 360 and through an aperture 360 in the top of the reflective dome section 358.
  • the aperture 360 permits the light to pass through the lens and into the camera of the smartphone 352.
  • additional information produced by the smartphone may be added to the panoramic video captured by the camera.
  • time codes For example, time codes, positional information (e.g., GPS coordinates, electronic compass heading), and/or other appropriate information may be added or otherwise associated with the panoramic video.
  • the panoramic view captured by the camera is recorded, streamed, or otherwise provided to a device such as the media capture device 300, or to a computer device such as the computer device 110 of FIG. 1, where the panoramic video may be aligned and/or associated with other AV streams.
  • the smartphone 352 may use its wireless functions (e.g., cellular, WiFi, Bluetooth) to transfer the panoramic video to the media capture device 300 or other appropriate device.
  • FIG. 4 shows another example system 400 for capturing media content.
  • the system 400 includes a parent device 410, a collection of child media capture devices 420, and a remote device 430.
  • the parent device 410 is a media capture device with wireless communications capabilities.
  • the parent device 410 can be the media capture device 102 of FIG. 1, the system 200 of FIG. 2, or the media capture device 300 of FIG. 3A.
  • the illustrated example may be referred to as a "swarm" configuration.
  • the parent device 410 communicates wirelessly with the child media capture devices 420.
  • the child media devices 420 are configured to capture first-person video, panoramic video, and/or audio content, from a number of different audio-visual perspectives and wirelessly transmit the captured content to the parent device 410.
  • the child media capture devices 420 can be the media capture device 104, the system 200, or the media capture device 350.
  • the system 400 is configured to capture audio and video of an event, such as a classroom lecture, a presentation, a demonstration, a speech, a meeting, or other appropriate event from a variety of different locations, or points of view, within and around the event location.
  • an event such as a classroom lecture, a presentation, a demonstration, a speech, a meeting, or other appropriate event from a variety of different locations, or points of view, within and around the event location.
  • the parent device 410 may be located near the front of the classroom and the child media capture devices 420 may be located on several of the students' desks.
  • the parent device 410 may capture a clear view of the instructor and/or whiteboard, while the child media devices 420 capture audio and/or video of various locations within the classroom.
  • once the AV media from the devices 410 and 420 is grouped and aligned as an associated media stream, the associated media stream may be played back such that the viewer may watch and listen to the recorded classroom lecture from any of the captured audio-visual perspectives.
  • the viewer can interact with panoramic video streams by panning and tilting a view of a subsection of the panoramic view, to provide the viewer with an experience similar to being able to look around the classroom (e.g., to see student reactions, to look at a student who is asking or answering a question).
  • the parent device 410 can be a device substantially similar to the device 102, the system 200, or the device 300, but may omit selected components such as cameras, microphones, video processing, audio processing, location sensors, or position sensors.
  • the parent device 410 may include substantially only the components of the system 200 that are needed to receive, align, and associate AV media streams from the child media capture devices 420, and provide the associated media stream to a web service such as the web service 130.
  • the remote device 430 is in wireless communication with the parent device 410, and provides a user interface with which a user can interact to view selected AV media streams provided by the child media capture devices 420, and/or perform various directing, editing, and production functions for the creation of the associated media stream.
  • the remote device 430 may be used to pre-select default views that will be presented during playback of the associated media stream. For example, a user of the remote device 430 may select a high resolution video feed of a whiteboard while a teacher is writing, or may select a panoramic stream and a particular view while a student answers a question.
  • the remote user may select an audio feed obtained from a lapel or unidirectional microphone focused on the teacher while the teacher is speaking, or select an audio feed from an omnidirectional microphone during classroom discussion.
  • these selections may be integrated into the associated media stream (e.g., as part of the metadata 114) such that when the associated media stream is played back, the playback may automatically switch among the various grouped AV media streams as the media plays. The viewer may then passively watch the playback, or may override the preselected views to view and listen to the lesson from substantially any of the available grouped audio and video streams.
  • FIG. 5 shows an example user interface (UI) 500 for presenting media content.
  • the UI 500 may be used during the playback of an associated media stream to provide the viewer with controls that may be used to select from among a variety of audio-visual perspectives captured by the AV media streams and grouped by the associated media stream, select particular views from within panoramic video streams, and control the playback (e.g., play, pause, fast forward, rewind) of the associated media stream.
  • the UI 500 may be the user interface presented by the remote device 430 of FIG. 4.
  • the UI 500 includes a viewing region 502, which generally provides one or more views of video media streams included in an associated media stream, and a control region 504, which generally provides user interface elements to present information about and to control the playback of the associated media stream.
  • the control region 504 includes a collection of playback controls 506.
  • the playback controls 506 may include buttons for functions such as play, pause, fast forward, rewind, slow motion, frame advance, or other appropriate playback functions.
  • the control region includes a time control 508.
  • the time control 508 displays the time code associated with the media at the presented point of playback. In some implementations, the viewer may enter a time value into the time control 508 to jump to a selected time point of the presented media stream.
  • a timeline control 510 displays a timeline 512 of the duration of the presented media stream and an indicator 514 of the relative point in time of the media stream at which the playback is being presented.
  • a label 516 displays the name of an AV media stream that is available within the presented associated media stream.
  • the viewing region 502 includes a primary view region 550.
  • the primary view region 550 presents a primary, or relatively large, view of a selected one or group of AV media streams of the presented associated media stream.
  • a view selector control 552 enables the viewer to select from a collection of choices representative of the media streams within the presented associated media stream.
  • the area occupied by the view selector control 552 can represent the space in which the captured event took place (e.g., the area of a classroom, a lecture hall, a conference room).
  • a collection of icons 554 and 556 is presented within the view selector control 552.
  • the icons 554, 556 represent audio-visual perspectives that are available for viewing within the presented associated media stream.
  • the icons 554 can represent available panoramic views, and the icon 556 can represent an available first-person view.
  • the viewer may click or otherwise select one of the icons 554, 556 to cause the playback of the associated media stream to switch to the selected audiovisual perspective and continue playing substantially without temporal interruption of the presentation.
  • the viewer may select one of the icons 554, and the corresponding video stream may be presented in the primary view region 550.
  • the icons 554, 556 are placed in locations within the view selector control 552 that are representative of the locations of their respective media capture devices within the area of the event.
  • the icon 556 may represent a first-person camera view captured by a camera located near the front of a classroom and pointed forward (e.g., toward a whiteboard).
  • the view selector control 552 may be a list, a dropdown control, a menu, or other appropriate user interface control that can give the viewer a selectable choice of media streams.
  • a secondary view region 560 is provided to display a reduced, substantially synchronous, presentation of another selected one of the grouped media streams (e.g., a picture-in-picture display).
  • the secondary view region 560 may be configured to view a video stream of the teacher or whiteboard while the viewer freely looks around the classroom using a section of a panoramic view that is displayed in the primary view region 550.
  • the associated media stream may include two video streams, and the secondary view region may be configured to always show the video stream that is not being presented in the primary view region 550.
  • a pan view region 570 presents a view of a selected panoramic video stream.
  • a substantially polar or radial panoramic image may be at least partly unwrapped, de-warped, or otherwise transformed into a flattened, rectangular format for display in the pan view region.
  • a viewport control 572 may be moved within the pan view region 570 by the viewer to pan and tilt a view of a subsection of the panoramic view. The selected subsection of the panoramic view is transformed from panoramic format to a first-person perspective and presented in the primary view region 550 or the secondary view region 560.
  • the subsection of the panoramic view selected by the viewer through the viewport control 572 may be passed to a web service, such as the web service 130 of FIG. 1.
  • the web service may then transform only the selected subsection of the panoramic video stream, and stream that view to the viewer as a video stream having a first-person perspective.
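  • A possible server-side shortcut, sketched with assumed names: once the panorama has been unwrapped into a 360-degree strip (see the earlier sketch), the viewer-selected yaw window is simply a band of columns, which is all that needs to be de-warped and streamed as a first-person view.

```python
# Hypothetical extraction of the viewer-selected slice of an unwrapped panorama.
import numpy as np

def yaw_window(flat_panorama, center_deg, horizontal_fov_deg=90.0):
    """Return the columns covering center_deg +/- fov/2 from a 360-degree strip."""
    h, w = flat_panorama.shape[:2]
    left = int(((center_deg - horizontal_fov_deg / 2.0) % 360.0) / 360.0 * w)
    width = max(1, int(horizontal_fov_deg / 360.0 * w))
    cols = np.arange(left, left + width) % w  # wrap across the 0/360 seam
    return flat_panorama[:, cols]

strip = np.zeros((256, 1024, 3), dtype=np.uint8)
print(yaw_window(strip, center_deg=350.0).shape)  # (256, 256, 3)
```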
  • FIG. 6 is a flow diagram for an example process 600 for capturing media content.
  • the process 600 may be implemented by the media capture devices 102, 104 of FIG. 1, by the media capture devices 300, 350 of FIGs. 3A and 3B, by the system 200 of FIG. 2, or by the parent device 410 of FIG. 4.
  • the process begins at step 610, when multiple content streams are received.
  • the system 200 may receive video content streams captured by the cameras 212, 214, and audio streams captured by the microphones 216, 218.
  • the parent device 410 may receive multiple AV content streams from the child media capture devices 420.
  • the content streams are aligned.
  • the AV content streams provided by the child media capture devices 420 may arrive at the parent device 410 with various delays (e.g., due to network latency), and the parent device 410 may use the start of each AV content stream as an alignment point by which to substantially synchronize the parallel AV media streams.
  • time codes within the content streams may be used to substantially align or synchronize the content streams.
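  • A minimal sketch (time-code format and frame rate assumed) of that alignment: the latest-starting stream defines a common origin, and every other stream is trimmed by the difference in start time codes so that all streams begin in sync.

```python
# Hypothetical time-code alignment: compute per-stream head trims in seconds.
def timecode_to_seconds(tc, fps=30):
    """Convert an 'HH:MM:SS:FF' time code to seconds."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return hh * 3600 + mm * 60 + ss + ff / float(fps)

def start_trims(start_timecodes, fps=30):
    """Seconds to trim from the head of each stream so all start together."""
    starts = {sid: timecode_to_seconds(tc, fps) for sid, tc in start_timecodes.items()}
    latest = max(starts.values())
    return {sid: latest - start for sid, start in starts.items()}

print(start_trims({"cam-front": "00:00:00:00",
                   "cam-back": "00:00:01:12",
                   "mic-lapel": "00:00:00:05"}))
```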
  • At step 630, metadata that identifies the content streams and the locations at which the streams were captured is created.
  • the system 200 may identify the audio-visual perspective captured by the high resolution camera 212 as "whiteboard view" and identify the audio-visual perspective captured by the panoramic camera 214 as "classroom view".
  • the parent device 410 may receive the location and/or position information provided by the child media capture devices 420, and associate that location information with the respective AV media streams as metadata.
  • the content streams and the metadata are grouped into an associated content stream.
  • the system 200 may encode the video and audio streams captured by the cameras 212, 214 and the microphones 216, 218 into a single media file or stream.
  • the single media stream, or portions thereof, can be decoded and presented to the viewer.
  • the system 200 may store the AV streams and the metadata as separate streams and/or files, wherein the metadata maintains links to the locations of the AV media streams.
  • a viewer may access the metadata to gain access to the AV media streams as though they were all part of a single file or stream.
  • a single (e.g., mono, stereo, surround sound) audio stream for inclusion in the associated content stream may be formed by switching among a selection of captured audio streams. For example, a stream captured from a lapel microphone may be selected while a classroom instructor is speaking, and a stream captured from an omnidirectional microphone may be selected while students are asking questions.
  • a human editor may select from among multiple audio streams to choose which portions are to be included in the associated media stream.
  • the remote device 430 may be used by an editor to select which audio stream is to be included in the associated content stream.
  • an automated process may be used to select from among multiple audio streams to choose which portions are to be included in the associated content stream.
  • the system 200 may automatically switch among audio streams (e.g., based on the loudest source, the most continuous source) to form an audio stream for inclusion in the associated content stream.
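  • One simple heuristic for such automatic switching, sketched below with assumed inputs: pick, for each short window, whichever captured audio stream has the highest RMS level.

```python
# Hypothetical loudest-source selection over fixed-length windows.
import numpy as np

def pick_loudest(streams, sample_rate, window_s=1.0):
    """streams: dict name -> 1-D sample array; returns one chosen name per window."""
    win = int(sample_rate * window_s)
    n_windows = min(len(s) for s in streams.values()) // win
    choices = []
    for w in range(n_windows):
        rms = {name: float(np.sqrt(np.mean(s[w * win:(w + 1) * win].astype(np.float64) ** 2)))
               for name, s in streams.items()}
        choices.append(max(rms, key=rms.get))
    return choices

# toy example: the lapel mic is loud in the first second, the room mic in the second
lapel = np.concatenate([np.ones(8000), np.zeros(8000) + 0.01])
room = np.concatenate([np.zeros(8000) + 0.01, np.ones(8000)])
print(pick_loudest({"lapel": lapel, "room": room}, sample_rate=8000))  # ['lapel', 'room']
```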
  • the associated content stream is compressed.
  • these streams may be multiplexed for higher performance and better synchronization during playback.
  • the associated content stream is provided for viewing.
  • the computer device 110 may upload the associated content stream 120 to the web service 130, from which the associated content stream 120 may be downloaded or streamed to client devices such as the client devices 140, 142.
  • FIG. 7 is a flow diagram for an example process 700 for presenting media content.
  • the process 700 may be performed by the client devices 140, 142 of FIG. 1, or by the remote device 430 of FIG. 4.
  • the UI 500 of FIG. 5 may be used by a viewer to interact with the functions of the process 700.
  • the process 700 begins at step 710 when data that describes multiple content streams is received.
  • the client devices 140, 142 may request and receive an associated content stream from the web server 130.
  • the received data may be a stream or file that includes substantially all the AV content streams that have been grouped into the associated content stream. In some implementations, substantially only the metadata describing the grouped AV content streams is received.
  • a selection of the content streams is presented to the viewer. For example, the view selector control 552 may be used to present a choice of audiovisual perspectives for the viewer to choose from.
  • a user selection of a content stream is received. For example, the viewer may select or click on one of the icons 554, 556 within the view selector control 552.
  • the selected content is presented at step 740. For example, the user may select the icon 556, and the video associated with the icon 556 may be presented in the primary view region 550.
  • in response to a selection of another content stream, a time code associated with the currently presented content stream is identified, and presentation of the newly selected content stream is started at a time code associated with the selected content stream and aligned with the identified time code.
  • the client device 140 may stop presenting the content stream associated with the icon 556 at time "00:10:15", and begin presentation of the content stream associated with the selected icon 554 at time "00:10:15" within the newly selected content stream.
  • the process 700 continues at step 740, where the selected content stream continues to be presented. If, however, at step 750 a user selection of another content stream is not received, then another determination is made. If at step 780 the presentation of the associated content streams is not complete (e.g., playback has not reached the end of the duration of the stream), then the process continues at step 740. If, however, at step 780 the presentation of the associated content streams is determined to be complete (e.g., playback has reached the end of the duration of the stream), then the process 700 ends.
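  • The position arithmetic behind such a switch can be sketched as follows (names and values are hypothetical): the aligned start times recorded for each stream map the current playback offset to an absolute event time, which in turn maps to the equivalent offset within the newly selected stream.

```python
# Hypothetical helper: find the playback offset in a newly selected stream
# that corresponds to the same instant of the event.
def target_position(current_pos_s, current_start_s, new_start_s):
    event_time_s = current_start_s + current_pos_s  # absolute time within the event
    return event_time_s - new_start_s

# switching from a stream that started 0.2 s after the event began to one that
# started 1.5 s after it, while 615.0 s into the current stream
print(target_position(615.0, current_start_s=0.2, new_start_s=1.5))  # 613.7
```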
  • FIG. 8 is a block diagram of computing devices 800, 850 that may be used to implement the systems and methods described in this document, either as a client or as a server or plurality of servers.
  • Computing device 800 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers.
  • Computing device 850 is intended to represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, and other similar computing devices.
  • the components shown here, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed in this document.
  • Computing device 800 includes a processor 802, memory 804, a storage device 806, a high-speed interface 808 connecting to memory 804 and high-speed expansion ports 810, and a low speed interface 812 connecting to low speed bus 814 and storage device 806.
  • Each of the components 802, 804, 806, 808, 810, and 812 are interconnected using various busses, and may be mounted on a common motherboard or in other manners as appropriate.
  • the processor 802 can process instructions for execution within the computing device 800, including instructions stored in the memory 804 or on the storage device 806 to display graphical information for a GUI on an external input/output device, such as display 816 coupled to high speed interface 808.
  • multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory.
  • multiple computing devices 800 may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multiprocessor system).
  • the memory 804 stores information within the computing device 800.
  • the memory 804 is a computer-readable medium. In one implementation, the memory 804 is a volatile memory unit or units. In another implementation, the memory 804 is a non-volatile memory unit or units.
  • the storage device 806 is capable of providing mass storage for the computing device 800.
  • the storage device 806 is a computer- readable medium.
  • the storage device 806 may be a floppy disk device, a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid state memory device, or an array of devices, including devices in a storage area network or other configurations.
  • a computer program product is tangibly embodied in an information carrier.
  • the computer program product contains instructions that, when executed, perform one or more methods, such as those described above.
  • the information carrier is a computer- or machine-readable medium, such as the memory 804, the storage device 806, or memory on processor 802.
  • the high speed controller 808 manages bandwidth-intensive operations for the computing device 800, while the low speed controller 812 manages lower bandwidth-intensive operations. Such allocation of duties is exemplary only.
  • the high-speed controller 808 is coupled to memory 804, display 816 (e.g., through a graphics processor or accelerator), and to high-speed expansion ports 810, which may accept various expansion cards (not shown).
  • low-speed controller 812 is coupled to storage device 806 and low-speed expansion port 814.
  • the low-speed expansion port which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet) may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, e.g., through a network adapter.
  • input/output devices such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, e.g., through a network adapter.
  • the computing device 800 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 820, or multiple times in a group of such servers. It may also be implemented as part of a rack server system 824. In addition, it may be implemented in a personal computer such as a laptop computer 822. Alternatively, components from computing device 800 may be combined with other components in a mobile device (not shown), such as device 850. Each of such devices may contain one or more of computing device 800, 850, and an entire system may be made up of multiple computing devices 800, 850 communicating with each other.
  • Computing device 850 includes a processor 852, memory 864, an input/output device such as a display 854, a communication interface 866, and a transceiver 868, among other components.
  • the device 850 may also be provided with a storage device, such as a microdrive or other device, to provide additional storage.
  • a storage device such as a microdrive or other device, to provide additional storage.
  • Each of the components 850, 852, 864, 854, 866, and 868, are interconnected using various buses, and several of the components may be mounted on a common motherboard or in other manners as appropriate.
  • the processor 852 can process instructions for execution within the computing device 850, including instructions stored in the memory 864.
  • the processor may also include separate analog and digital processors.
  • the processor may provide, for example, for coordination of the other components of the device 850, such as control of user interfaces, applications run by device 850, and wireless communication by device 850.
  • Processor 852 may communicate with a user through control interface 858 and display interface 856 coupled to a display 854.
  • the display 854 may be, for example, a TFT LCD display or an OLED display, or other appropriate display technology.
  • the display interface 856 may comprise appropriate circuitry for driving the display 854 to present graphical and other information to a user.
  • the control interface 858 may receive commands from a user and convert them for submission to the processor 852.
  • an external interface 862 may be provided in communication with processor 852, so as to enable near area communication of device 850 with other devices.
  • External interface 862 may provide, for example, for wired communication (e.g., via a docking procedure) or for wireless communication (e.g., via Bluetooth or other such technologies).
  • the memory 864 stores information within the computing device 850.
  • the memory 864 is a computer-readable medium.
  • the memory 864 is a volatile memory unit or units.
  • the memory 864 is a non-volatile memory unit or units.
  • Expansion memory 874 may also be provided and connected to device 850 through expansion interface 872, which may include, for example, a SIMM card interface. Such expansion memory 874 may provide extra storage space for device 850, or may also store applications or other information for device 850. Specifically, expansion memory 874 may include instructions to carry out or supplement the processes described above, and may include secure information also.
  • expansion memory 874 may be provided as a security module for device 850, and may be programmed with instructions that permit secure use of device 850.
  • secure applications may be provided via the SIMM cards, along with additional information, such as placing identifying information on the SIMM card in a non-hackable manner.
  • the memory may include, for example, flash memory and/or MRAM memory, as discussed below.
  • a computer program product is tangibly embodied in an information carrier.
  • the computer program product contains instructions that, when executed, perform one or more methods, such as those described above.
  • the information carrier is a computer- or machine-readable medium, such as the memory 864, expansion memory 874, or memory on processor 852.
  • Device 850 may communicate wirelessly through communication interface 866, which may include digital signal processing circuitry where necessary.
  • Communication interface 866 may provide for communications under various modes or protocols, such as GSM voice calls, SMS, EMS, or MMS messaging, CDMA, TDMA, PDC, WCDMA, CDMA2000, or GPRS, among others. Such communication may occur, for example, through radio-frequency transceiver 868. In addition, short-range communication may occur, such as using a Bluetooth, WiFi, or other such transceiver (not shown). In addition, GPS receiver module 870 may provide additional wireless data to device 850, which may be used as appropriate by applications running on device 850.
  • Device 850 may also communicate audibly using audio codec 860, which may receive spoken information from a user and convert it to usable digital information. Audio codec 860 may likewise generate audible sound for a user, such as through a speaker, e.g., in a handset of device 850. Such sound may include sound from voice telephone calls, may include recorded sound (e.g., voice messages, music files, etc.) and may also include sound generated by applications operating on device 850.
  • the computing device 850 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a cellular telephone 880. It may also be implemented as part of a smartphone 882, personal digital assistant, or other similar mobile device.
  • Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof.
  • These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
  • the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal.
  • machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable processor.
  • the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer.
  • Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
  • the systems and techniques described here can be implemented in a computing system that includes a back end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back end, middleware, or front end components.
  • the components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN”), a wide area network (“WAN”), and the Internet.
  • the computing system can include clients and servers.
  • a client and server are generally remote from each other and typically interact through a communication network.
  • the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.


Abstract

The subject matter of this specification can be embodied in, among other things, an audio and video capture system that includes a plurality of video capture devices configured to capture a plurality of video streams of an event, and a plurality of audio capture devices configured to capture a plurality of audio streams of the event. A processor is configured to receive, align, and associate the video and audio streams such that the audio and video streams are accessible as a single repository of media. A data repository is configured to store the associated video and audio streams, and a server is configured to provide the associated video and audio streams, such that a user may select from among the video and audio streams of the event.

Description

Multiple Viewpoint Electronic Media System
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims priority to and the full benefit of United States
Provisional Application Serial Number 61/434,584, filed January 20, 2011, and titled "Multiple Viewpoint Electronic Media System", the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
[0002] This specification relates to media recording and playback.
BACKGROUND
[0003] Electronic multi-track audio and video recording systems are used in various applications. In audio recording studios, multiple audio tracks are used in order to separately capture the sounds produced by individual instruments and vocalists. These multiple audio tracks are generally combined or "mixed down" into a track or set of tracks that are provided for a listener. In video recording studios, multiple cameras and microphones are used to capture images and sounds from various angles and locations. Editors pick from these various audio and video tracks to produce a program for consumption by viewers.
[0004] Electronic audio and video recording has also been used in classrooms. Various schools have placed cameras and microphones in classrooms in order to capture classroom lectures. These lectures may, in turn, be broadcast live or provided as recordings for later viewing by students outside of the classroom. In such classroom applications, one camera is generally used to provide a first-person view of the instructor and/or whiteboard, and in some cases another camera is used to capture materials presented by an electronic overhead projector. Sounds are generally captured by a microphone worn by or directed toward the instructor, and in some applications additional microphones are used to pick up classroom sounds such as when a student asks a question. These various audio and video sources are generally selected for broadcast or recording "live", wherein a human operator switches among the various audio and video sources as the classroom lecture is being given.
SUMMARY
[0005] In general, this document describes a system for collecting, grouping, and presenting heterogeneous time-based media elements together with temporal, spatial and/or orientation metadata to provide a substantially seamless immersive experience from these disparate media elements, regardless of whether the operator(s) of these devices are aware of each other.
[0006] In a first aspect, an audio and video capture system includes a plurality of video capture devices configured to capture a plurality of video streams of an event, a plurality of audio capture devices configured to capture a plurality of audio streams of the event, a processor configured to receive, align, and associate the video and audio streams such that the audio and video streams are accessible as a single repository of media, a data repository configured to store the associated video and audio streams, and a server configured to provide the associated video and audio streams, such that a user may select from among the video and audio streams of the event.
[0007] Various implementations can include any, all, or none of the following features. At least one of the video streams can be a video stream of a panoramic view. The processor can be further configured to transform the panoramic view to a
substantially first-person perspective. The processor can be further configured to compress the video and audio streams. The apparatus can include one or more time-based media capture devices configured to capture a plurality of time-based media streams of the event. One or more of the time-based media capture devices can include a video stream created from data provided by an electronic whiteboard. One or more of the video capture devices can be in wireless communication with the processor. One or more of the video capture devices can be cellular telephones. One or more of the video capture devices can be further configured to determine positional information and provide the positional information to the processor.
[0008] In another aspect, an audio and video capture apparatus includes a plurality of video capture devices configured to capture a plurality of video streams of an event, a plurality of audio capture devices configured to capture a plurality of audio streams of the event, a clock, and a communications network interface.
[0009] Implementations can include any, all, or none of the following features. The apparatus can also include a positional sensor. At least one of the video capture devices can be a panoramic camera configured to capture a panoramic view of the event. The apparatus can also include one or more time-based media capture devices configured to capture a plurality of time-based media streams of the event. At least one of the time-based media capture devices can be configured to create a video stream from data provided by an electronic whiteboard. At least one of the video capture devices can communicate the video streams wirelessly. At least one of the video capture devices can be a cellular telephone. The video capture devices can be further configured to determine positional information and associate the positional information with the video streams. The video capture devices can include clocks configurable to be synchronized with the clock, and can associate synchronized timing information with the video streams. At least one of the audio capture devices can communicate audio streams wirelessly.
[0010] In another aspect, an audio and video capture system includes a plurality of electronic media capture devices configured to capture and provide electronic media streams at a plurality of locations at an event, and an electronic media recording device configured to receive, align, and associate the electronic media streams, and provide the associated electronic media streams as a single electronic media stream comprising a plurality of selectable audio-visual perspectives of the event.
[0011] Implementations can include any, all, or none of the following features. The electronic media recording device can provide timing synchronization signals to the electronic media capture devices. The electronic media streams can include
information indicating the positions of the electronic media capture devices, and information indicating the times at which the electronic media streams were captured. The electronic media capture devices and the electronic media recording devices can communicate wirelessly. The electronic media capture devices can be audio capture devices. The electronic media capture devices can be video capture devices. The video capture devices can include optics configured to capture panoramic views of the event. The electronic media capture devices can be cellular telephones. The electronic media capture devices can be electronic whiteboards. The event can be a classroom lecture.
[0012] In another aspect, a method of creating media content includes receiving, at a processing device, a plurality of electronic content streams comprising content captured from an event, each stream comprising information describing the time at which the electronic content was captured. The electronic content streams are aligned by the processing device. The processing device also creates a collection of metadata which identifies each of the electronic content streams and the locations at which the electronic content streams were captured.
[0013] Implementations can include any, all, or none of the following features. The method can also include compressing the electronic content streams. The electronic content streams can include a plurality of audio streams. The electronic content streams can include a plurality of video streams. The plurality of video streams can include panoramic video content. At least one of the video streams can include panoramic video content, and at least one of the video streams can include non-panoramic video content. The electronic content streams can include video created from data provided by an electronic whiteboard. Each of the electronic content streams can also include metadata which describes timing information identifying when the electronic content stream was captured, and metadata which identifies the location at which the electronic content stream was captured. The event can be a classroom lecture. Each stream can also include an identifier of a device which captured the electronic content. Each stream can also include information describing the orientation of the device at the event.
[0014] In another aspect, a method for presenting media content includes receiving, at a user device, data that describes a plurality of electronic media content streams comprising aligned content captured from an event. The user device presents a selection of electronic media content streams and, in response to a user selection, presents a selected electronic media content stream.
[0015] Implementations can include any, all, or none of the following features. The electronic content streams can include a plurality of audio streams. The electronic content streams can include a plurality of video streams. The plurality of video streams can include panoramic video content. At least one of the video streams can include panoramic video content, and at least one of the video streams can include non-panoramic video content. The electronic content streams can include video created from data provided by an electronic whiteboard. The method can also include presenting user controls receptive to user inputs which direct the presentation of a selected subsection of the panoramic video content. The method can also include transforming the selected subsection from a panoramic perspective to a substantially first-person perspective. The method can also include requesting the selected subsection of the panoramic video content from a server, and receiving a video content stream from the server comprising a first-person perspective view of the selected subsection. The event can be a classroom lecture. The method can also include identifying, by the user device in response to another user selection, a first time code associated with the electronic content stream, and presenting another selected electronic media content stream starting at a second time code associated with the other electronic media stream and aligned with the first time code.
[0016] The systems and techniques described here may provide one or more of the following advantages. First, a system can provide an immersive and interactive media viewing experience in which the viewer can select from among a variety of viewpoints. Another advantage is that some of the viewpoints can be panoramic, wherein the viewer may interact with the playback of the panoramic view to emulate being able to look around the environment in which the media was recorded. Another advantage is that at least some of the devices used to capture the media can be consumer electronic devices, such as cellular telephones. Another advantage is that multiple media content streams can be grouped, aligned and presented for playback in a process that is at least partly automated, thereby reducing or eliminating the need for human directors, editors, and/or producers.
[0017] The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features and advantages will be apparent from the description and drawings, and from the claims.
DESCRIPTION OF DRAWINGS
[0018] FIG. 1 shows an example system for capturing, distributing, and presenting media content.
[0019] FIG. 2 is a block diagram of an example media capture system.
[0020] FIGs. 3A and 3B show example media capture devices.
[0021] FIG. 4 shows another example system for capturing media content.
[0022] FIG. 5 shows an example user interface for presenting media content.
[0023] FIG. 6 is a flow diagram for an example process for capturing media content.
[0024] FIG. 7 is a flow diagram for an example process for presenting media content.
[0025] FIG. 8 is a block diagram of an example computing system.
DETAILED DESCRIPTION
[0026] This document describes systems and techniques for capturing, distributing, and presenting multiple streams of audio and video content, which can be synchronized to varying degrees of accuracy in certain implementations. Such multiple synchronized streams can, for example, provide a media user with an immersive and interactive experience while watching the playback of media recorded at an event, such as a lecture, a conference, a demonstration, a meeting, or other appropriate event, substantially without the need for a human producer, and whether or not the individual capture devices were coordinated by a single entity or represent captures of
substantially unconnected individuals.
[0027] For example, a classroom can be outfitted with multiple camera devices placed at various locations within the classroom, wherein one camera may be positioned to capture a detailed view of a blackboard at the front of the room, while a panoramic camera may be positioned to capture a substantially 360-degree view of the classroom from approximately the instructor's or a student's audio-visual perspective. In some implementations, consumer-grade electronic devices can be used to capture one or more streams of video. For example, a cellular telephone configured for video recording may be used. Furthermore, some of the aforementioned consumer-grade electronic devices can be equipped with lenses that convert the devices' cameras into panoramic cameras. Other types of time-based media capture devices may also be used, such as electronic whiteboards that convert the instructor's writing and drawings into time-based media streams which may be synchronized with video streams.
[0028] An arbitrary number of audio devices (e.g., microphones) can also be located in various places within the classroom. For example, an instructor can wear a wireless lapel microphone to capture his or her lecture, while another microphone may be directed to capture the sounds of students in the classroom as they participate in classroom discussions.
[0029] All of these simultaneously-captured audio and video (AV) sources can then be synchronized and associated together as a group in time and space using
information such as the physical locations and orientations of the AV capture devices, as well as the degree to which their time-bases are synchronized. The grouped AV streams can then be made available for download or streaming to viewers.
[0030] Viewers may receive and view some or all of the grouped AV streams, or parts thereof, based at least partly on the viewer's selections from among the various AV streams in the group, in many cases using a standardized viewer which is
appropriate for the specific collection of media types. The set of such viewers is extensible, allowing the system to become adaptive and flexible as it is utilized more frequently by a large collection of users. For example, the grouped AV stream may include information that describes each of the audio, video, and other time-based media streams, including the locations where the respective streams were captured relative to each other. By selecting various AV streams captured from several different audio-visual perspectives, a student may view a recorded classroom lecture from various viewpoints within the classroom. Furthermore, video streams of panoramic video may be transformed to a first-person perspective in response to viewer input. For example, the student may be provided with user interface (UI) controls that allow the user to select a subsection of the panoramic view that is to be "de-warped" into a first-person view, thereby providing the student with a simulated ability to look (e.g., pan) around the classroom.
[0031] By providing such a system, sounds and views of an event may be
simultaneously captured at a number of locations at the event, and these various audio and video sources can be aligned and grouped with each other for subsequent viewing. Later, viewers can access the grouped AV sources, view the recorded event from any of the available viewpoints, switch among viewpoints, and pan about the recorded event in a manner that gives the viewer an interactive playback experience that can provide a greater feeling of presence and involvement in the event than the viewer may feel by passively watching a traditional video presentation.
[0032] FIG. 1 shows an example system 100 for capturing, distributing, and presenting media content. The system 100 includes a media capture device 102 and a media capture device 104. The media capture device 102 provides an AV media stream 106, and the media capture device 104 provides an AV media stream 108. The media capture devices 102, 104 are configured to capture AV media, such as first-person video images, panoramic video images, and/or one or more channels of audio. The media capture devices 102, 104 are discussed in additional detail in the
descriptions of FIGs. 3A and 3B.
[0033] The AV media streams 106, 108 include both media, such as audio and/or video, and metadata. In some implementations, the metadata may include time codes or timing information. For example, SMPTE time codes may be associated with the media to make it easier to later synchronize or otherwise align the AV media streams 106, 108. In some implementations, the metadata may identify the media capture device used to capture the media.
[0034] In some implementations, the metadata may include positional information that describes the media capture device used to capture the media. For example, the metadata for the AV media stream 106 may indicate that the media capture device 102 captured its AV media at a location near the front of a classroom, while the metadata of the AV media stream 108 may indicate that the media capture device 104 is capturing AV media from a location near the back of the classroom. In some implementations, positional information may be provided manually (e.g., by users of the media capture devices 102, 104). In some implementations, positional information may be determined by the media capture devices 102, 104. For example, the media capture devices 102, 104 may include global positioning system (GPS) sensors, electronic compasses, accelerometers, beacons, or other appropriate forms of position locators that may be used by the media capture devices 102, 104 to determine their absolute positions, or to determine the positions of the media capture devices 102, 104 relative to each other.
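For illustration only, per-stream metadata of this kind might be represented as a small record; the class and field names below are assumptions introduced for this sketch rather than elements recited in this description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class StreamMetadata:
    """Illustrative per-stream metadata record (field names are assumptions)."""
    device_id: str                       # identifier of the capture device
    start_timecode: str                  # e.g., SMPTE-style "HH:MM:SS:FF"
    gps_position: Optional[Tuple[float, float]] = None   # (latitude, longitude), if available
    relative_position: Optional[str] = None               # e.g., "front of classroom"
    heading_deg: Optional[float] = None                   # electronic compass heading, if reported

# Example records for the two streams of FIG. 1 (values are hypothetical).
stream_106_meta = StreamMetadata("device-102", "00:00:00:00", relative_position="front of classroom")
stream_108_meta = StreamMetadata("device-104", "00:00:01:12", relative_position="back of classroom")
```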
[0035] The AV media streams 106, 108 are received by a computer device 110 and stored in a storage device 112. The computer device 110 associates and aligns the stored AV media streams 106, 108, and adds a set of metadata 114. In some implementations, the metadata 114 can include information about the locations of the media capture devices 102, 104, information about the media streams 106, 108 (e.g., bitrates, compression formats, panoramic lens parameters), information about the event (e.g., name, date, location, transcripts), embedded electronic content (e.g., electronic documents, links to related content), or other appropriate information.
[0036] In some implementations, the alignment may be based on time codes included in the AV media streams 106, 108. For example, the media capture devices 102, 104 may include high-precision clocks that are substantially synchronized, and these clocks may be used to provide time code metadata. In some implementations, the media capture devices 102, 104 may be triggered to begin capturing the AV media streams 106, 108 at substantially the same time.
[0037] In some implementations, beacons may be used to keep the media capture devices 102, 104 synchronized. For example, a periodic radio frequency "heartbeat" signal may be broadcast to re-align the media capture devices' 102, 104 clocks. In another example, infrared (IR) light may be used. For example, CMOS video capture devices are generally sensitive to infrared wavelengths of light that humans cannot see. The media capture device 102 may be equipped with such an IR beacon that is invisible to nearby persons, yet the media capture device 104 may be able to detect the beacon through its camera and use the beacon to synchronize itself with the media capture device 102.
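As a hedged illustration of the clock- and time-code-based alignment described above, each stream could be trimmed so that all streams share a common start time derived from their time codes; the helper names and the 30 fps assumption below are illustrative only.

```python
def timecode_to_seconds(tc: str, fps: float = 30.0) -> float:
    """Convert an "HH:MM:SS:FF" time code to seconds (assumes a fixed frame rate)."""
    hours, minutes, seconds, frames = (int(part) for part in tc.split(":"))
    return hours * 3600 + minutes * 60 + seconds + frames / fps

def alignment_offsets(start_timecodes: dict) -> dict:
    """Return, for each stream, how many seconds to trim from its head so that
    every stream begins at the latest common start time."""
    starts = {name: timecode_to_seconds(tc) for name, tc in start_timecodes.items()}
    common_start = max(starts.values())
    return {name: common_start - start for name, start in starts.items()}

# Example: if stream 108 started 1.4 s after stream 106, 1.4 s is trimmed from stream 106.
offsets = alignment_offsets({"stream_106": "00:00:00:00", "stream_108": "00:00:01:12"})
```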
[0038] In another example, acoustic beacons, such as ultrasonic pings that are inaudible to humans may be used. For example, a lecture hall or auditorium may be large enough to permit echoing to occur, therefore an audio stream captured by a microphone at a lectern in the front of the room, and an audio stream captured at the back row of the room, may exhibit an unwanted echo due to the sound propagation delay between the two microphones when played back simultaneously. An audible or ultrasonic acoustic ping may therefore be broadcast from the lectern or elsewhere in the room. Since the propagation delay of the acoustic ping may be substantially equal or proportional to the delay of sounds in the room, the ping may be used to synchronize the media recording devices 102, 104 relative to one another.
[0039] In some implementations, the AV media streams may be aligned based on events recorded in the media of the AV media streams 106, 108. For example, two or more video streams may be analyzed to locate one or more key events within the stream that may have been captured by two or more of the media capture devices. The two or more video or audio streams may then be time shifted to align the key event. Such key events may include visible events such as a sudden change in the level or hue of the recorded ambient lighting, change in the content of the scene (e.g., students jumping up from their chairs at the end of a class period), or other identifiable changes in the scene. Key events may include audible events such as taps of chalk against a chalkboard, coughs or sneezes in a classroom, or other identifiable changes in volume (e.g., applause), frequency (e.g., the notes in a melody), or spectrum (e.g., white noise disappears when a ventilation system shuts off) .
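One common way to recover such an alignment from the recorded media themselves is to cross-correlate the audio tracks and take the lag at which the correlation peaks. The sketch below assumes NumPy and mono audio at a common sample rate, and is offered only as an example of the idea, not as the method required by this description.

```python
import numpy as np

def estimate_lag_samples(audio_a: np.ndarray, audio_b: np.ndarray) -> int:
    """Return d such that audio_b is approximately audio_a delayed by d samples
    (d > 0 means the same key event appears d samples later in audio_b)."""
    a = (audio_a - audio_a.mean()) / (audio_a.std() + 1e-9)
    b = (audio_b - audio_b.mean()) / (audio_b.std() + 1e-9)
    correlation = np.correlate(a, b, mode="full")
    return (len(b) - 1) - int(np.argmax(correlation))

# Example: a chalk tap 1000 samples into stream A and 1300 samples into stream B
# yields a lag of roughly 300 samples; stream B would be shifted earlier by that
# amount before the streams are grouped.
```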
[0040] In some implementations, video streams may be synchronized substantially absolutely while audio streams may be synchronized relatively. For example, in a large venue, sound and video of a distant subject may appear out of sync (e.g., light may reach the camera perceptibly sooner than the accompanying sound). By synchronizing the audio and video differently, a perceived synchronization may be attained.
[0041] The computer device 110 combines the streams 106, 108, and the metadata 114 into an associated media stream 120. In some implementations, the AV media streams 106, 108 may be associated by grouping or combining the streams 106, 108, and the metadata 114. For example, the AV media streams 106, 108, and the metadata 114 may be encoded into a single media file or stream. At playback, the single media file or stream, or portions thereof, are decoded and presented to the viewer. In another example, the AV media streams 106 and 108, as well as the metadata 114, are stored as separate streams and/or files, wherein the metadata 114 maintains links to the locations of the AV media streams 106, 108. In such an example, a viewer may access the metadata 114 to gain access to the AV media streams 106, 108 as though they were part of a single file or stream.
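Where the grouped streams are kept as separate files, the associating metadata could take the form of a small manifest that records where each stream lives and how it is offset in time; the JSON layout, file names, and values below are purely illustrative assumptions.

```python
import json

# Hypothetical manifest for an associated media stream; the field names,
# file names, and offsets are assumptions introduced for illustration.
associated_stream = {
    "event": {"name": "Example classroom lecture"},
    "streams": [
        {"id": "stream_106", "uri": "media/front_panoramic.mp4",
         "type": "panoramic-video", "offset_seconds": 0.0,
         "position": "front of classroom"},
        {"id": "stream_108", "uri": "media/back_camera.mp4",
         "type": "video", "offset_seconds": 1.4,
         "position": "back of classroom"},
    ],
}

manifest_json = json.dumps(associated_stream, indent=2)  # stored or served alongside the media files
```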
[0042] The associated media stream 120 is then provided to a web service 130. In some implementations, the web service 130 is a network or Internet server that can store and serve the associated media stream 120 to client devices 140, 142. The client device 140 and the client device 142 communicate with the web service to request a download or stream of the associated media stream 120, and present the associated media stream 120 as AV media. In some implementations, the computer device 110 may include the functions of the web service 130.
[0043] In some implementations, the associated media stream 120 can be provided substantially in its entirety. For example, the client device 140 may be a laptop computer with a fast broadband network connection to the web service 130, and request a complete download of the associated media content 120 (e.g., for offline storage and playback). In some implementations, portions of the associated media stream 120 can be provided by the web service 130. For example, the client device 142 can be a smartphone with limited network bandwidth and storage. As such the web service 130 may stream a sub-portion of the associated media stream 120 that includes the view that a user has currently selected. The web server 130 may also transcode the stream to a format that the client device 142 can decode and/or to a bitrate that is supported by the device's network connection speed.
[0044] In some implementations, the web service 130 may modify and/or stream a sub-portion of a single AV media stream. For example, the AV media stream 106 may be panoramic video. As the user pans around the image using controls on the user device 142, the web server may select a portion of the panoramic view that corresponds to the user's input, transform (e.g., de-warp) the portion of the panoramic view into a first-person view, and stream the first-person view to the client device 142.
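A simplified sketch of the kind of transformation involved: an angular window of a radial (ring-shaped) panoramic frame is remapped to a flat, first-person-style image by sampling along radii. The function below assumes the panorama is centred in the frame and uses nearest-neighbour sampling; a production implementation would rely on calibrated optics and interpolation.

```python
import numpy as np

def dewarp_subsection(panorama: np.ndarray, center_xy, r_inner: float, r_outer: float,
                      start_deg: float, span_deg: float, out_w: int = 640, out_h: int = 360):
    """Remap an angular window of a radial panoramic frame to a rectangular view.

    panorama: HxWx3 image whose panoramic content forms a ring around center_xy.
    start_deg/span_deg: the horizontal window selected by the viewer (panning).
    """
    cx, cy = center_xy
    out = np.zeros((out_h, out_w, 3), dtype=panorama.dtype)
    for y in range(out_h):
        # Top rows of the output come from the outer ring (higher elevations).
        r = r_outer - (r_outer - r_inner) * (y / max(out_h - 1, 1))
        for x in range(out_w):
            theta = np.deg2rad(start_deg + span_deg * (x / max(out_w - 1, 1)))
            src_x = int(round(cx + r * np.cos(theta)))
            src_y = int(round(cy + r * np.sin(theta)))
            if 0 <= src_y < panorama.shape[0] and 0 <= src_x < panorama.shape[1]:
                out[y, x] = panorama[src_y, src_x]
    return out
```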
[0045] In some implementations, the user input may include gestures or other manipulation of the client devices 140, 142 themselves. For example, the client device 142 may be a smartphone that includes accelerometers, magnetometers, gyroscopes, or other positional sensors that allow the client device 142 to sense its orientation. The user may then hold up the client device 142 to view one section of a panoramic media stream, and then tilt, pan, and/or rotate the client device 142 to pan and tilt her view of the panoramic media stream.
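A hedged sketch of how a client might translate device-orientation readings into pan and tilt parameters for such a view; sensor APIs differ by platform, and the function name, limits, and defaults used here are assumptions.

```python
def orientation_to_view(yaw_deg: float, pitch_deg: float,
                        span_deg: float = 90.0,
                        min_tilt: float = -30.0, max_tilt: float = 30.0):
    """Map device yaw/pitch (degrees) to a panoramic viewing window.

    Returns (start_deg, span_deg, tilt_deg): the horizontal window to de-warp
    and a clamped vertical tilt. Rotating the device pans the window; tilting
    it raises or lowers the view.
    """
    start_deg = (yaw_deg - span_deg / 2.0) % 360.0
    tilt_deg = max(min_tilt, min(max_tilt, pitch_deg))
    return start_deg, span_deg, tilt_deg

# Example: a device held facing 200 degrees and tilted up 10 degrees selects
# the window starting at 155 degrees with a 10-degree upward tilt.
```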
[0046] FIG. 2 is a block diagram of an example media capture system 200. In some implementations, the system 200 can be the media capture device 102 or the media capture device 104 of FIG. 1. The system 200 includes a processor module 202, a memory module 204, and a storage module 206. The processor 202 is configured to execute program code stored in the memory module 204 and the storage module 206. The memory module 204 and the storage module 206 are configured to also store and retrieve electronic data.
[0047] The system 200 includes a video processor module 208 and an audio processor module 210. The video processor module 208 receives and digitizes video signals provided by a high definition camera 212 and a panoramic camera 214. In some implementations, the high definition camera 212 and/or the panoramic camera 214 may connect to the system 200 wirelessly.
[0048] The audio processor module 210 receives and digitizes audio signals provided by a wireless microphone 216 and a wireless microphone 218. In some implementations, the microphones 216 and/or 218 may connect to the audio processor module 210 through a wired connection.
[0049] A location sensor module 220 is configured to sense and determine the geographic location of the system 200. In some implementations, the location sensor module 220 can be a GPS receiver. A position and orientation sensor module 222 is configured to sense and determine the position, orientation, and motion of the system 200. For example, the position and orientation sensor 222 may include a
magnetometer or electronic compass to determine the orientation of the system 200 relative to magnetic north. In another example, the position and orientation sensor 222 may include tilt sensors or gyroscopes to sense the pitch and yaw of the system 200. As yet another example, the position and orientation sensor 222 may include
accelerometers to sense movement of the system 200.
[0050] The system 200 includes a clock module 224. In some implementations, the clock module 224 may be a real time clock. In some implementations, the clock module 224 may be a high-precision clock. For example, the clock module 224 may provide timing signals that may be associated with or included into the digitized AV streams provided by the video processor module 208 and the audio processor module 210. In some implementations, the clock module 224 may be part of the location sensor module 220. For example, the location sensor module 220 may be a GPS receiver, which operates based on the reception of high precision timing signals transmitted by GPS satellites.
[0051] The processor module 202 is operable to store the digitized AV streams provided by the video processor module 208 and the audio processor module 210 in the storage module 206. The processor module 202 is further operable to align and associate the digitized high definition video, the digitized panoramic video, the digitized audio streams, the location information, the position information, the timing information, and any other appropriate information, together as an associated media file or stream. In some implementations, the processor module 202 may compress the aforementioned media streams and other data prior to association, or may compress the associated media file or stream after the association.
[0052] The system 200 includes a network interface module 230 configured to connect to a wired and/or wireless network. For example, the network interface module 230 may connect the system 200 to a wireless Ethernet local area network (Wi-Fi), a cellular data network (e.g., EVDO, 3G, 4G, LTE, WiMAX), or other appropriate wireless network. In another example, the network interface module 230 may connect the system 200 to an Ethernet local area network, a power line communication network (e.g., HomePlug), a fiber optic network, or other appropriate network. In some implementations, the network interface module 230 can connect the system 200, directly or indirectly, to the Internet. In some implementations, the network interface module 230 can connect the system 200 to media capture devices, such as the media capture device 104 of FIG. 1.
[0053] The processor module 202 is operable to communicate, through the network interface module 230, with a web service or other server. For example, the processor 202 may retrieve an associated media file from the storage module 206 and upload the file to the web service 130 of FIG. 1 through the network interface module 230. In some implementations, the system 200 may be operated remotely through a remote operations device in communication with the network interface module 230. Examples of remote operations devices and interfaces are discussed in the descriptions of FIGs. 4 and 5.
[0054] FIG. 3A shows an example media capture device 300. In some
implementations, the media capture device 300 may be the media capture device 102 of FIG. 1, or the system 200 of FIG. 2. The media capture device 300 includes a substantially planar base member 302, an electronics enclosure 304, a support arm member 310, and a video head member 312. In some implementations, the base member 302 may be of sufficient width, length, and mass to provide stability for the media capture device 300 when the device 300 is placed on a flat surface such as a table or desk.
[0055] The electronics enclosure 304 includes electronic power and computing components. For example, the electronics enclosure 304 may include the processor module 202, the memory module 204, the storage module 206, the video processor module 208, the audio processor module 210, the location sensor module 220, the position and orientation sensor module 222, the clock module 224, the network interface module 230, power supplies, or other appropriate electronic components. The electronics enclosure 304 may also include switches and/or other user interfaces, as well as power, communications, AV, and other appropriate connection ports.
[0056] The electronics enclosure 304 also includes microphone receptacles 306a and 306b. The microphone receptacles 306a, 306b are formed to at least partially receive a wireless microphone 308a and a wireless microphone 308b. In some implementations, the wireless microphones 308a and 308b may be retained within the receptacles 306a and 306b (e.g., by catches, friction, magnets) for storage and/or transport. In some implementations, the receptacles 306a and 306b may include electrical or inductive components for recharging batteries in the wireless microphones 308a and 308b.
[0057] The support arm member 310 provides support to elevate the video head member 312. The support arm 310 is pivotably connected to the electronics enclosure 304 so as to selectively elevate the video head member 312 above the base member 302. The video head member 312 is pivotably connected to the support arm member 310 so as to permit the video head member 312 to be selectably angled relative to, and/or to be made substantially vertical relative to, the base member 302. The support arm member 310 is adjustable to lower the video head member 312 into contact with a bumper 314. In some implementations, the bumper 314 may protect the video head member 312 during storage and/or transport of the media capture device 300.
[0058] The video head member 312 includes a high resolution camera 316 adjustably connected to the video head member 312 by a positioning member 318. In some implementations, the high resolution camera 316 can be the high resolution camera 212 of FIG. 2. The positioning member 318 and the high resolution camera 316 are adjustable to aim the high resolution camera 316 at a selected subject. For example, the high resolution camera 316 may be oriented to capture a view of a lecturer or of a whiteboard.
[0059] The video head member 312 also includes a panoramic camera 320. In some implementations, the panoramic camera 320 can be the panoramic camera 214 (FIG. 2). The panoramic camera 320 includes a lens section 322. The lens section 322 focuses light that is reflected off a reflective dome 324 and into the panoramic camera 320. The reflective dome 324 is positioned relative to the lens section 322 by a substantially transparent cylinder 326. In the described configuration, light bouncing off objects located in a zone surrounding the cylinder enters the transparent cylinder 326, is reflected off the reflective dome 324, is focused by the lens section 322, and is captured by the panoramic camera 320. As such, the panoramic camera 320 captures a radial image of the camera's surroundings. For example, pixels ringing the center of the captured image represent light reflected off objects surrounding the device 300 at a relatively low height, while pixels ringing the outer portions of the captured image represent light reflected off objects surrounding the device 300 at relatively higher elevations.
[0060] FIG. 3B shows an example media capture device 350. In some
implementations, the media capture device 350 can be the media capture device 104. The media capture device 350 is a portable electronic device such as a cellular telephone, smart phone, personal digital assistant, or other electronic device capable of capturing video. In the illustrated example, the media capture device is a smartphone 352 equipped with a camera (not shown).
[0061] A panoramic optical adapter 354 is coupled to the smartphone 352 such that the adapter 354 is positioned over the camera lens to enable the camera to capture a substantially panoramic image. Light, reflected off objects surrounding the adapter 354, enters through a substantially transparent cylinder 356, bounces off a reflective dome section 358 and onto a mirror 360. The light is then reflected off the mirror 360 and through an aperture 360 in the top of the reflective dome section 358. The aperture 360 permits the light to pass through the lens and into the camera of the smartphone 352.
[0062] In some implementations, additional information produced by the smartphone may be added to the panoramic video captured by the camera. For example, time codes, positional information (e.g., GPS coordinates, electronic compass heading), and/or other appropriate information may be added or otherwise associated with the panoramic video.
[0063] In some implementations, the panoramic view captured by the camera is recorded, streamed, or otherwise provided to a device such as the media capture device 300, or to a computer device such as the computer device 110 of FIG. 1, where the panoramic video may be aligned and/or associated with other AV streams. For example, the smartphone 352 may use its wireless functions (e.g., cellular, WiFi, Bluetooth) to transfer the panoramic video to the media capture device 300 or other appropriate device.
[0064] FIG. 4 shows another example system 400 for capturing media content. The system 400 includes a parent device 410, a collection of child media capture devices 420, and a remote device 430. The parent device 410 is a media capture device with wireless communications capabilities. In some implementations, the parent device 410 can be the media capture device 102 of FIG. 1, the system 200 of FIG. 2, or the media capture device 300 of FIG. 3. In some implementations, the illustrated example may be referred to as a "swarm" configuration.
[0065] The parent device 410 communicates wirelessly with the child media capture devices 420. The child media devices 420 are configured to capture first-person video, panoramic video, and/or audio content, from a number of different audio-visual perspectives and wirelessly transmit the captured content to the parent device 410. In some implementations, the child media capture devices 420 can be the media capture device 104, the system 200, or the media capture device 350.
[0066] In general, the system 400 is configured to capture audio and video of an event, such as a classroom lecture, a presentation, a demonstration, a speech, a meeting, or other appropriate event from a variety of different locations, or points of view, within and around the event location. For example, in a classroom setting the parent device 410 may be located near the front of the classroom and the child media capture devices 420 may be located on several of the students' desks. As such, the parent device 410 may capture a clear view of the instructor and/or whiteboard, while the child media devices 420 capture audio and/or video of various locations within the classroom. When the AV media from the devices 410 and 420 is grouped and aligned as an associated media stream, the associated media stream may be played back such that the viewer may watch and listen to the recorded classroom lecture from
substantially any of the captured viewpoints, and may freely switch among the viewpoints substantially without disrupting the continuity of the playback timing.
Furthermore, the viewer can interact with panoramic video streams by panning and tilting a view of a subsection of the panoramic view, to provide the viewer with an experience similar to being able to look around the classroom (e.g., to see student reactions, to look at a student who is asking or answering a question).
[0067] In some implementations, the parent device 410 can be a device substantially similar to the device 102, the system 200, or the device 300, but may omit selected components such as cameras, microphones, video processing, audio processing, location sensors, or position sensors. For example, the parent device 410 may include substantially only the components of the system 200 that are needed to receive, align, and associate AV media streams from the child media capture devices 420, and provide the associated media stream to a web service such as the web service 130.
[0068] The remote device 430 is in wireless communication with the parent device 410, and provides a user interface with which a user can interact to view selected AV media streams provided by the child media capture devices 420, and/or perform various directing, editing, and production functions for the creation of the associated media stream. In some implementations, the remote device 430 may be used to pre-select default views that will be presented during playback of the associated media stream. For example, a user of the remote device 430 may select a high resolution video feed of a whiteboard while a teacher is writing, or may select a panoramic stream and a particular view while a student answers a question. Similarly, the remote user may select an audio feed obtained from a lapel or unidirectional microphone focused on the teacher while the teacher is speaking, or select an audio feed from an omnidirectional microphone during classroom discussion. In some implementations, these selections may be integrated into the associated media stream (e.g., as part of the metadata 114) such that when the associated media stream is played back, the playback may automatically switch among the various grouped AV media streams as the media plays. The viewer may then passively watch the playback, or may override the preselected views to view and listen to the lesson from substantially any of the available grouped audio and video streams.
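One way such pre-selected views could be carried in the metadata is as a simple, time-keyed edit-decision list; the structure and field names below are assumptions introduced for illustration.

```python
# Hypothetical edit-decision list: which grouped stream to present by default
# at each point in the playback timeline (times in seconds).
default_views = [
    {"start": 0.0,   "video": "whiteboard_view",     "audio": "lapel_mic"},
    {"start": 315.0, "video": "classroom_panoramic", "audio": "room_mic"},
    {"start": 420.0, "video": "whiteboard_view",     "audio": "lapel_mic"},
]

def default_view_at(t: float) -> dict:
    """Return the pre-selected view for playback time t; a player would use this
    unless the viewer has overridden the selection."""
    current = default_views[0]
    for entry in default_views:
        if entry["start"] <= t:
            current = entry
        else:
            break
    return current
```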
[0069] FIG. 5 shows an example user interface (UI) 500 for presenting media content. In general, the UI 500 may be used during the playback of an associated media stream to provide the viewer with controls that may be used to select from among a variety of audio-visual perspectives captured by the AV media streams and grouped by the associated media stream, select particular views from within panoramic video streams, and control the playback (e.g., play, pause, fast forward, rewind) of the associated media stream. In some implementations, the UI 500 may be the user interface presented by the remote device 430 of FIG. 4.
[0070] The UI 500 includes a viewing region 502, which generally provides one or more views of video media streams included in an associated media stream, and a control region 504, which generally provides user interface elements to present information about and to control the playback of the associated media stream. The control region 504 includes a collection of playback controls 506. In some implementations, the playback controls 506 may include buttons for functions such as play, pause, fast forward, rewind, slow motion, frame advance, or other appropriate playback functions.
[0071] The control region 504 includes a time control 508. The time control 508 displays the time code associated with the media at the presented point of playback. In some implementations, the viewer may enter a time value into the time control 508 to jump to a selected time point of the presented media stream. A timeline control 510 displays a timeline 512 of the duration of the presented media stream and an indicator 514 of the relative point in time of the media stream at which the playback is being presented. A label 516 displays the name of an AV media stream that is available within the presented associated media stream.
[0072] The viewing region 502 includes a primary view region 550. The primary view region 550 presents a primary, or relatively large, view of a selected one or group of AV media streams of the presented associated media stream. A view selector control 552 enables the viewer to select from a collection of choices representative of the media streams within the presented associated media stream. In the illustrated example, the area occupied by the view selector control 552 can represent the space in which the captured event took place (e.g., the area of a classroom, a lecture hall, a conference room). A collection of icons 554 and 556 is presented within the view selector control 552. The icons 554, 556 represent audio-visual perspectives that are available for viewing within the presented associated media stream. For example, the icons 554 can represent available panoramic views, and the icon 556 can represent an available first-person view. The viewer may click or otherwise select one of the icons 554, 556 to cause the playback of the associated media stream to switch to the selected audio-visual perspective and continue playing substantially without temporal interruption of the presentation. For example, the viewer may select one of the icons 554, and the corresponding video stream may be presented in the primary view region 550.
[0073] In the illustrated example, the icons 554, 556 are placed in locations within the view selector control 552 that are representative of the locations of their respective media capture devices within the area of the event. For example, the icon 556 may represent a first-person camera view captured by a camera located near the front of a classroom and pointed forward (e.g., toward a whiteboard). In some implementations, the view selector control 552 may be a list, a dropdown control, a menu, or other appropriate user interface control that can give the viewer a selectable choice of media streams.
[0074] A secondary view region 560 is provided to display a reduced, substantially synchronous, presentation of another selected one of the grouped media streams (e.g., a picture-in-picture display). For example, in a classroom setting, the secondary view region 560 may be configured to view a video stream of the teacher or whiteboard while the viewer freely looks around the classroom using a section of a panoramic view that is displayed in the primary view region 550. In another example, the associated media stream may include two video streams, and the secondary view region may be configured to always show the video stream that is not being presented in the primary view region 550.
[0075] A pan view region 570 presents a view of a selected panoramic video stream. In some implementations, a substantially polar or radial panoramic image may be at least partly unwrapped, de-warped, or otherwise transformed into a flattened, rectangular format for display in the pan view region. A viewport control 572 may be moved within the pan view region 570 by the viewer to pan and tilt a view of a
subsection of the panoramic view. The selected subsection is transformed from panoramic format to a first-person perspective and presented in the primary view region 550 or the secondary view region 560. In some implementations, the subsection of the panoramic view selected by the viewer through the viewport control may be passed to a web service, such as the web service 130 of FIG. 1. The web service may then transform only the selected subsection of the panoramic video stream, and stream that view to the viewer as a video stream having a first-person perspective.
[0076] FIG. 6 is a flow diagram for an example process 600 for capturing media content. The process 600 may be implemented by the media capture devices 102, 104, 300, or 350 of FIGs. 1, 3A, and 3B, respectively, by the system 200 of FIG. 2, or by the parent device 410 of FIG. 4. The process begins at step 610, when multiple content streams are received. For example, the system 200 may receive video content streams captured by the cameras 212, 214, and audio streams captured by the microphones 216, 218. In another example, the parent device 410 may receive multiple AV content streams from the child media capture devices 420.
[0077] At step 620, the content streams are aligned. For example, the AV content streams provided by the child media capture devices 420 may arrive at the parent device 410 with various delays (e.g., due to network latency), and the parent device 410 may use the start of each AV content stream as an alignment point by which to substantially synchronize the parallel AV media streams. In another example, time codes within the content streams may be used to substantially align or synchronize the content streams.
[0078] At step 630, metadata that identifies the content streams and the locations at which the streams were captured is created. For example, the system 200 may identify the audio-visual perspective captured by the high resolution camera 212 as "whiteboard view" and identify the audio-visual perspective captured by the panoramic camera 214 as "classroom view". In another example, the parent device 410 may receive the location and/or position information provided by the child media capture devices 420, and associate that location information with the respective AV media streams as metadata.
[0079] At step 640, the content streams and the metadata are grouped into an associated content stream. For example, the system 200 may encode the video and audio streams captured by the cameras 212, 214 and the microphones 216, 218 into a single media file or stream. At playback, the single media stream, or portions thereof, can be decoded and presented to the viewer. In another example, the system 200 may store the AV streams and the metadata as separate streams and/or files, wherein the metadata may maintain links to the locations of the AV media streams. In such an example, a viewer may access the metadata to gain access to the AV media streams as though they were all part of a single file or stream.
[0080] In some implementations, a single (e.g., mono, stereo, surround sound) audio stream for inclusion in the associated content stream may be formed by switching among a selection of captured audio streams. For example, a stream captured from a lapel microphone may be selected while a classroom instructor is speaking, and a stream captured from an omnidirectional microphone may be selected while students are asking questions. In some implementations, a human editor may select from among multiple audio streams to choose which portions are to be included in the associated media stream. For example, the remote device 430 may be used by an editor to select which audio stream is to be included in the associated content stream. In some implementations, an automated process may be used to select from among multiple audio streams to choose which portions are to be included in the associated content stream. For example, the system 200 may automatically switch among audio streams (e.g., based on the loudest source, the most continuous source) to form an audio stream for inclusion in the associated content stream.
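One non-limiting way to implement the "loudest source" switching mentioned above is to compare the RMS level of each aligned audio track over short windows and copy the loudest track into the output. The window length and track representation below are assumptions.

```python
# Minimal sketch: form a single audio stream by switching, window by window,
# to whichever captured track is loudest. Tracks are assumed to be aligned,
# equal-rate 1-D float arrays.
import numpy as np

def switch_by_loudness(tracks, sample_rate=48000, window_s=0.5):
    """tracks: dict of name -> 1-D float array. Returns (output, choices)."""
    length = min(len(t) for t in tracks.values())
    window = int(sample_rate * window_s)
    output = np.zeros(length, dtype=np.float32)
    choices = []
    for start in range(0, length, window):
        end = min(start + window, length)
        # RMS level of each track over this window.
        rms = {name: float(np.sqrt(np.mean(t[start:end] ** 2)))
               for name, t in tracks.items()}
        loudest = max(rms, key=rms.get)
        choices.append(loudest)
        output[start:end] = tracks[loudest][start:end]
    return output, choices
```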
[0081] At step 650, the associated content stream is compressed. In some
implementations, the component streams may be multiplexed for higher performance and better synchronization during playback.
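As a purely illustrative example of steps 640 through 650, the aligned streams could be compressed and multiplexed into a single container with a tool such as ffmpeg; the disclosure does not name a particular tool, and the file names here are hypothetical.

```python
# Minimal sketch: mux two video streams and one mixed audio track into a single
# compressed container using the ffmpeg command-line tool.
import subprocess

subprocess.run([
    "ffmpeg",
    "-i", "panorama.mp4",          # video input 0
    "-i", "whiteboard.mp4",        # video input 1
    "-i", "mixed_audio.wav",       # switched/mixed audio track
    "-map", "0:v", "-map", "1:v", "-map", "2:a",
    "-c:v", "libx264",             # compress video
    "-c:a", "aac",                 # compress audio
    "associated_stream.mp4",
], check=True)
```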
[0082] At step 660, the associated content stream is provided for viewing. For example, the computer device 110 may upload the associated content stream 120 to the web service 130, from which the associated content stream 120 may be
downloaded or streamed for presentation by the client devices 140, 142.
[0084] FIG. 7 is a flow diagram for an example process 700 for presenting media content. In some implementations, the process 700 may be performed by the client devices 140, 142 of FIG. 1, or the remote device 430 of FIG. 4. In some
implementations, the UI 500 of FIG. 5 may be used by a viewer to interact with the functions of the process 700.
[0084] The process 700 begins at step 710 when data that describes multiple content streams is received. For example, the client devices 140, 142 may request and receive an associated content stream from the web service 130. In some implementations, the received data may be a stream or file that includes substantially all the AV content streams that have been grouped into the associated content stream. In some implementations, only the metadata describing the grouped AV content streams is received.

[0085] At step 720, a selection of the content streams is presented to the viewer. For example, the view selector control 552 may be used to present a choice of audiovisual perspectives for the viewer to choose from. At step 730, a user selection of a content stream is received. For example, the viewer may select or click on one of the icons 554, 556 within the view selector control 552. In response to the user selection, the selected content is presented at step 740. For example, the user may select the icon 556, and the video associated with the icon 556 may be presented in the primary view region 550.
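As a purely illustrative sketch of steps 710 through 730, the choices offered by the view selector could be built from the received metadata. The manifest format and file name are assumptions carried over from the earlier sketch.

```python
# Minimal sketch: build the view-selector choices (label -> stream URL) from the
# received manifest and let the viewer pick one perspective to present.
import json

with open("associated_stream.json") as f:
    manifest = json.load(f)

choices = {s["label"]: s["url"] for s in manifest["streams"]}
for i, label in enumerate(choices, start=1):
    print(f"[{i}] {label}")

selected_url = choices["classroom view"]   # e.g., the viewer clicks that icon
```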
[0086] At step 750, a determination is made of whether the viewer has selected to view another of the available content streams. For example, the viewer may be watching the video stream represented by the icon 556, and then click on one of the icons 554. If a user selection of another content stream is received, then a time code corresponding to the playback position of the currently-presented content stream is identified. For example, the viewer may have clicked one of the icons 554 when the playback of the content stream was ten minutes and fifteen seconds ("00:10:15") into its duration. The client device 140 may identify the "00:10:15" time code associated with the time of the user selection. At step 770, presentation of the selected content stream is started at a time code associated with the selected content stream and aligned with the identified time code. For example, the client device 140 may stop presenting the content stream associated with the icon 556 at time "00:10:15", and begin presentation of the content stream associated with the selected icon 554 at time "00:10:15" within the newly selected content stream. The process 700 continues at step 740, where the selected content stream continues to be presented.

[0087] If, however, at step 750 a user selection of another content stream is not received, then another determination is made. If at step 780 the presentation of the associated content streams is not complete (e.g., playback has not reached the end of the duration of the stream), then the process continues at step 740. If, however, at step 780 the presentation of the associated content streams is determined to be complete (e.g., playback has reached the end of the duration of the stream), then the process 700 ends.
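By way of illustration only, the time-aligned switch of steps 750 through 770 can be expressed as: note the current playback position, then start the newly selected stream at the same time code. The player object and its methods below are hypothetical stand-ins, not part of this disclosure.

```python
# Minimal sketch: switch perspectives while preserving the playback position,
# so the viewer keeps watching the same moment of the event.
def switch_stream(player, current_stream, new_stream):
    # Identify the time code of the current playback position, e.g. "00:10:15".
    position = player.current_position(current_stream)
    player.stop(current_stream)
    # Begin the newly selected stream at the aligned time code.
    player.play(new_stream, start_at=position)
    return position
```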
[0088] FIG. 8 is a block diagram of computing devices 800, 850 that may be used to implement the systems and methods described in this document, either as a client or as a server or plurality of servers. Computing device 800 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers.
Computing device 850 is intended to represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, and other similar computing devices. The components shown here, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed in this document.
[0089] Computing device 800 includes a processor 802, memory 804, a storage device 806, a high-speed interface 808 connecting to memory 804 and high-speed expansion ports 810, and a low speed interface 812 connecting to low speed bus 814 and storage device 806. Each of the components 802, 804, 806, 808, 810, and 812 is interconnected using various busses, and may be mounted on a common
motherboard or in other manners as appropriate. The processor 802 can process instructions for execution within the computing device 800, including instructions stored in the memory 804 or on the storage device 806 to display graphical information for a GUI on an external input/output device, such as display 816 coupled to high speed interface 808. In other implementations, multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory. Also, multiple computing devices 800 may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multiprocessor system).
[0090] The memory 804 stores information within the computing device 800. In one implementation, the memory 804 is a computer-readable medium. In one
implementation, the memory 804 is a volatile memory unit or units. In another implementation, the memory 804 is a non-volatile memory unit or units.
[0091] The storage device 806 is capable of providing mass storage for the computing device 800. In one implementation, the storage device 806 is a computer-readable medium. In various different implementations, the storage device 806 may be a floppy disk device, a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid state memory device, or an array of devices, including devices in a storage area network or other configurations. In one implementation, a computer program product is tangibly embodied in an information carrier. The computer program product contains instructions that, when executed, perform one or more methods, such as those described above. The information carrier is a computer- or machine-readable medium, such as the memory 804, the storage device 806, or memory on processor 802.

[0092] The high speed controller 808 manages bandwidth-intensive operations for the computing device 800, while the low speed controller 812 manages lower bandwidth-intensive operations. Such allocation of duties is exemplary only. In one implementation, the high-speed controller 808 is coupled to memory 804, display 816 (e.g., through a graphics processor or accelerator), and to high-speed expansion ports 810, which may accept various expansion cards (not shown). In the implementation, low-speed controller 812 is coupled to storage device 806 and low-speed expansion port 814. The low-speed expansion port, which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet), may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, e.g., through a network adapter.
[0093] The computing device 800 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 820, or multiple times in a group of such servers. It may also be implemented as part of a rack server system 824. In addition, it may be implemented in a personal computer such as a laptop computer 822. Alternatively, components from computing device 800 may be combined with other components in a mobile device (not shown), such as device 850. Each of such devices may contain one or more of computing device 800, 850, and an entire system may be made up of multiple computing devices 800, 850 communicating with each other.
[0094] Computing device 850 includes a processor 852, memory 864, an
input/output device such as a display 854, a communication interface 866, and a transceiver 868, among other components. The device 850 may also be provided with a storage device, such as a microdrive or other device, to provide additional storage. Each of the components 850, 852, 864, 854, 866, and 868 is interconnected using various buses, and several of the components may be mounted on a common motherboard or in other manners as appropriate.
[0095] The processor 852 can process instructions for execution within the
computing device 850, including instructions stored in the memory 864. The processor may also include separate analog and digital processors. The processor may provide, for example, for coordination of the other components of the device 850, such as control of user interfaces, applications run by device 850, and wireless communication by device 850.
[0096] Processor 852 may communicate with a user through control interface 858 and display interface 856 coupled to a display 854. The display 854 may be, for example, a TFT LCD display or an OLED display, or other appropriate display
technology. The display interface 856 may comprise appropriate circuitry for driving the display 854 to present graphical and other information to a user. The control interface 858 may receive commands from a user and convert them for submission to the processor 852. In addition, an external interface 862 may be provided in communication with processor 852, so as to enable near area communication of device 850 with other devices. External interface 862 may provide, for example, for wired communication (e.g., via a docking procedure) or for wireless communication (e.g., via Bluetooth or other such technologies).
[0097] The memory 864 stores information within the computing device 850. In one implementation, the memory 864 is a computer-readable medium. In one implementation, the memory 864 is a volatile memory unit or units. In another implementation, the memory 864 is a non-volatile memory unit or units. Expansion memory 874 may also be provided and connected to device 850 through expansion interface 872, which may include, for example, a SIMM card interface. Such expansion memory 874 may provide extra storage space for device 850, or may also store applications or other information for device 850. Specifically, expansion memory 874 may include instructions to carry out or supplement the processes described above, and may include secure information also. Thus, for example, expansion memory 874 may be provided as a security module for device 850, and may be programmed with instructions that permit secure use of device 850. In addition, secure applications may be provided via the SIMM cards, along with additional information, such as placing identifying information on the SIMM card in a non-hackable manner.
[0098] The memory may include, for example, flash memory and/or MRAM memory, as discussed below. In one implementation, a computer program product is tangibly embodied in an information carrier. The computer program product contains
instructions that, when executed, perform one or more methods, such as those described above. The information carrier is a computer- or machine-readable medium, such as the memory 864, expansion memory 874, or memory on processor 852.
[0099] Device 850 may communicate wirelessly through communication interface 866, which may include digital signal processing circuitry where necessary.
Communication interface 866 may provide for communications under various modes or protocols, such as GSM voice calls, SMS, EMS, or MMS messaging, CDMA, TDMA, PDC, WCDMA, CDMA2000, or GPRS, among others. Such communication may occur, for example, through radio-frequency transceiver 868. In addition, short-range communication may occur, such as using a Bluetooth, WiFi, or other such transceiver (not shown). In addition, GPS receiver module 870 may provide additional wireless data to device 850, which may be used as appropriate by applications running on device 850.
[00100] Device 850 may also communicate audibly using audio codec 860, which may receive spoken information from a user and convert it to usable digital information. Audio codec 860 may likewise generate audible sound for a user, such as through a speaker, e.g., in a handset of device 850. Such sound may include sound from voice telephone calls, may include recorded sound (e.g., voice messages, music files, etc.), and may also include sound generated by applications operating on device 850.
[00101] The computing device 850 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a cellular telephone 880. It may also be implemented as part of a smartphone 882, personal digital assistant, or other similar mobile device.
[00102] Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
[00103] These computer programs (also known as programs, software, software applications or code) include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory,
Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
[00104] To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.

[00105] The systems and techniques described here can be implemented in a computing system that includes a back end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), and the Internet.
[00106] The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
[00107] A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the disclosure. For example, various forms of the flows shown above may be used, with steps re-ordered, added, or removed. Also, although several applications of the media capture systems and methods have been described, it should be recognized that numerous other applications are contemplated. Accordingly, other implementations are within the scope of the following claims.

Claims

WHAT IS CLAIMED IS:
1. An audio and video capture system, comprising:
a plurality of video capture devices configured to capture a plurality of video streams of an event;
a plurality of audio capture devices configured to capture a plurality of audio streams of the event;
a processor configured to receive, align, and associate the video and audio streams such that the audio and video streams are accessible as a single repository of media;
a data repository configured to store the associated video and audio streams; and,
a server configured to provide the associated video and audio streams, such that a user may select from among the video and audio streams of the event.
2. The apparatus of claim 1, wherein at least one of the video streams comprises a video stream of a panoramic view.
3. The apparatus of claim 2, wherein the processor is further configured to transform the panoramic view to a substantially first-person perspective.
4. The apparatus of claim 1, wherein the processor is further configured to compress the video and audio streams.
5. The apparatus of claim 1, further comprising one or more time-based media capture devices configured to capture a plurality of time-based media streams of the event.
6. The apparatus of claim 5, wherein one or more of the time-based media capture devices comprises a video stream created from data provided by an electronic whiteboard.
7. The apparatus of claim 1, wherein one or more of the video capture devices are in wireless communication with the processor.
8. The apparatus of claim 7, wherein one or more of the video capture devices are cellular telephones.
9. The apparatus of claim 1, wherein one or more of the video capture devices are further configured to determine positional information and provide the positional information to the processor.
10. An audio and video capture apparatus, comprising:
a plurality of video capture devices configured to capture a plurality of video streams of an event;
a plurality of audio capture devices configured to capture a plurality of audio streams of the event;
a clock; and,
a communications network interface.
11. The apparatus of claim 10, further comprising a positional sensor.
12. The apparatus of claim 10, wherein at least one of the video capture devices is a panoramic camera configured to capture a panoramic view of the event.
13. The apparatus of claim 10, further comprising one or more time-based media capture devices configured to capture a plurality of time-based media streams of the event.
14. The apparatus of claim 13, wherein at least one of the time-based media capture devices is configured to create a video stream from data provided by an electronic whiteboard.
15. The apparatus of claim 10, wherein at least one of the video capture devices communicates the video streams wirelessly.
16. The apparatus of claim 10, wherein at least one of the video capture devices is a cellular telephone.
17. The apparatus of claim 10, wherein the video capture devices are further configured to determine positional information and associate the positional information with the video streams.
18. The apparatus of claim 10, wherein the video capture devices comprise clocks configurable to be synchronized with the clock, and associate synchronized timing information with the video streams.
19. The apparatus of claim 10, wherein at least one of the audio capture devices communicates audio streams wirelessly.
20. An audio and video capture system, comprising:
a plurality of electronic media capture devices configured to capture and provide electronic media streams at a plurality of locations at an event; and,
an electronic media recording device configured to receive, align, and associate the electronic media streams, and provide the associated electronic media streams as a single electronic media stream comprising a plurality of selectable audio-visual perspectives of the event.
21. The system of claim 20, wherein the electronic media recording device provides timing synchronization signals to the electronic media capture devices.
22. The system of claim 20, wherein the electronic media streams include information indicating the positions of the electronic media capture devices, and information indicating the times at which the electronic media streams were captured.
23. The system of claim 20, wherein the electronic media capture devices and the electronic media recording devices communicate wirelessly.
24. The system of claim 20, wherein the electronic media capture devices are audio capture devices.
25. The system of claim 20, wherein the electronic media capture devices are video capture devices.
26. The system of claim 25, wherein the video capture devices comprise optics configured to capture panoramic views of the event.
27. The system of claim 20, wherein the electronic media capture devices are cellular telephones.
28. The system of claim 20, wherein the electronic media capture devices are electronic whiteboards.
29. The system of claim 20, wherein the event is a classroom lecture.
30. A method of creating media content, comprising:
receiving, at a processing device, a plurality of electronic content streams comprising content captured from an event, each stream comprising information describing the time at which the electronic content was captured;
aligning, by the processing device, the electronic content streams; and,
creating, by the processing device, a collection of metadata which identifies each of the electronic content streams and the locations at which the electronic content streams were captured.
31. The method of claim 30, further comprising compressing the electronic content streams.
32. The method of claim 30, wherein the electronic content streams comprise a plurality of audio streams.
33. The method of claim 30, wherein the electronic content streams comprise a plurality of video streams.
34. The method of claim 33, wherein the plurality of video streams comprise panoramic video content.
35. The method of claim 33, wherein at least one of the video streams comprises panoramic video content, and at least one of the video streams comprises non-panoramic video content.
36. The method of claim 30, wherein the electronic content streams comprise video created from data provided by an electronic whiteboard.
37. The method of claim 30, wherein each of the electronic content streams further comprises metadata which describes timing information identifying when the electronic content stream was captured, and metadata which identifies the location at which the electronic content stream was captured.
38. The method of claim 30, wherein the event is a classroom lecture.
39. The method of claim 30, wherein each stream further comprises an identifier of a device which captured the electronic content.
40. The method of claim 30, wherein each stream further comprises information describing the orientation of the device at the event.
41. A method for presenting media content, comprising:
receiving, at a user device, data that describes a plurality of electronic media content streams comprising aligned content captured from an event;
presenting, by the user device, a selection of electronic media content streams; and,
presenting, by the user device in response to a user selection, a selected electronic media content stream.
42. The method of claim 41, wherein the electronic content streams comprise a plurality of audio streams.
43. The method of claim 41, wherein the electronic content streams comprise a plurality of video streams.
44. The method of claim 43, wherein the plurality of video streams comprise panoramic video content.
45. The method of claim 43, wherein at least one of the video streams comprises panoramic video content, and at least one of the video streams comprises non-panoramic video content.
46. The method of claim 41, wherein the electronic content streams comprise video created from data provided by an electronic whiteboard.
47. The method of claim 44, further comprising presenting user controls receptive to user inputs which direct the presentation of a selected subsection of the panoramic video content.
48. The method of claim 47, further comprising transforming the selected subsection from a panoramic perspective to a substantially first-person perspective.
49. The method of claim 47, further comprising requesting the selected subsection of the panoramic video content from a server, and receiving a video content stream from the server comprising a first-person perspective view of the selected subsection.
50. The method of claim 41, wherein the event is a classroom lecture.
51. The method of claim 41, further comprising identifying, by the user device in response to another user selection, a first time code associated with the selected electronic media content stream, and presenting another selected electronic media content stream starting at a second time code associated with the other electronic media content stream and aligned with the first time code.
PCT/US2012/021951 2011-01-20 2012-01-20 Multiple viewpoint electronic media system WO2012100114A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201161434584P 2011-01-20 2011-01-20
US61/434,584 2011-01-20

Publications (2)

Publication Number Publication Date
WO2012100114A2 true WO2012100114A2 (en) 2012-07-26
WO2012100114A3 WO2012100114A3 (en) 2012-10-26

Family

ID=46516390

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2012/021951 WO2012100114A2 (en) 2011-01-20 2012-01-20 Multiple viewpoint electronic media system

Country Status (1)

Country Link
WO (1) WO2012100114A2 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7057663B1 (en) * 2001-05-17 2006-06-06 Be Here Corporation Audio synchronization pulse for multi-camera capture systems
US20040034653A1 (en) * 2002-08-14 2004-02-19 Maynor Fredrick L. System and method for capturing simultaneous audiovisual and electronic inputs to create a synchronized single recording for chronicling human interaction within a meeting event
US20070220561A1 (en) * 2006-03-20 2007-09-20 Girardeau James W Jr Multiple path audio video synchronization

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10257404B2 (en) 2014-07-08 2019-04-09 International Business Machines Corporation Peer to peer audio video device communication
US10270955B2 (en) 2014-07-08 2019-04-23 International Business Machines Corporation Peer to peer audio video device communication
US9948846B2 (en) 2014-07-08 2018-04-17 International Business Machines Corporation Peer to peer audio video device communication
GB2528060B (en) * 2014-07-08 2016-08-03 Ibm Peer to peer audio video device communication
US9955062B2 (en) 2014-07-08 2018-04-24 International Business Machines Corporation Peer to peer audio video device communication
GB2528060A (en) * 2014-07-08 2016-01-13 Ibm Peer to peer audio video device communication
US9781320B2 (en) 2014-07-08 2017-10-03 International Business Machines Corporation Peer to peer lighting communication
US9742976B2 (en) 2014-07-08 2017-08-22 International Business Machines Corporation Peer to peer camera communication
US9307317B2 (en) 2014-08-29 2016-04-05 Coban Technologies, Inc. Wireless programmable microphone apparatus and system for integrated surveillance system devices
US9225527B1 (en) 2014-08-29 2015-12-29 Coban Technologies, Inc. Hidden plug-in storage drive for data integrity
TWI505242B (en) * 2014-09-29 2015-10-21 Vivotek Inc System and method for digital teaching
US9742780B2 (en) 2015-02-06 2017-08-22 Microsoft Technology Licensing, Llc Audio based discovery and connection to a service controller
US9660999B2 (en) 2015-02-06 2017-05-23 Microsoft Technology Licensing, Llc Discovery and connection to a service controller
WO2016202885A1 (en) * 2015-06-15 2016-12-22 Piksel, Inc Processing content streaming
US11425439B2 (en) 2015-06-15 2022-08-23 Piksel, Inc. Processing content streaming
WO2016202886A1 (en) * 2015-06-15 2016-12-22 Piksel, Inc Synchronisation of streamed content
US10791356B2 (en) 2015-06-15 2020-09-29 Piksel, Inc. Synchronisation of streamed content
WO2017083418A1 (en) * 2015-11-09 2017-05-18 Nexvidea Inc. Methods and systems for recording, producing and transmitting video and audio content
US10165171B2 (en) 2016-01-22 2018-12-25 Coban Technologies, Inc. Systems, apparatuses, and methods for controlling audiovisual apparatuses
US10152859B2 (en) 2016-05-09 2018-12-11 Coban Technologies, Inc. Systems, apparatuses and methods for multiplexing and synchronizing audio recordings
US10370102B2 (en) 2016-05-09 2019-08-06 Coban Technologies, Inc. Systems, apparatuses and methods for unmanned aerial vehicle
US10789840B2 (en) 2016-05-09 2020-09-29 Coban Technologies, Inc. Systems, apparatuses and methods for detecting driving behavior and triggering actions based on detected driving behavior
US10152858B2 (en) 2016-05-09 2018-12-11 Coban Technologies, Inc. Systems, apparatuses and methods for triggering actions based on data capture and characterization
CN108401167A (en) * 2017-02-08 2018-08-14 三星电子株式会社 Electronic equipment and server for video playback
CN113518260A (en) * 2021-09-14 2021-10-19 腾讯科技(深圳)有限公司 Video playing method and device, electronic equipment and computer readable storage medium

Also Published As

Publication number Publication date
WO2012100114A3 (en) 2012-10-26

Similar Documents

Publication Publication Date Title
WO2012100114A2 (en) Multiple viewpoint electronic media system
US10171769B2 (en) Sound source selection for aural interest
US10514885B2 (en) Apparatus and method for controlling audio mixing in virtual reality environments
US8831505B1 (en) Method and apparatus for effectively capturing and broadcasting a traditionally delivered classroom or a presentation
KR101270780B1 (en) Virtual classroom teaching method and device
Zhang et al. An automated end-to-end lecture capture and broadcasting system
US9240214B2 (en) Multiplexed data sharing
US20150124171A1 (en) Multiple vantage point viewing platform and user interface
US10296281B2 (en) Handheld multi vantage point player
US20190139312A1 (en) An apparatus and associated methods
US20200413152A1 (en) Video content switching and synchronization system and method for switching between multiple video formats
KR101367260B1 (en) A virtual lecturing apparatus for configuring a lecture picture during a lecture by a lecturer
US10156898B2 (en) Multi vantage point player with wearable display
US20180227501A1 (en) Multiple vantage point viewing platform and user interface
KR101351085B1 (en) Physical picture machine
US20150221334A1 (en) Audio capture for multi point image capture systems
US20140294366A1 (en) Capture, Processing, And Assembly Of Immersive Experience
US20220222881A1 (en) Video display device and display control method for same
US10664225B2 (en) Multi vantage point audio player
US20150304559A1 (en) Multiple camera panoramic image capture apparatus
EP3379379A1 (en) Virtual reality system and method
US20150304724A1 (en) Multi vantage point player
US20090153550A1 (en) Virtual object rendering system and method
US20180227694A1 (en) Audio capture for multi point image capture systems
WO2018027067A1 (en) Methods and systems for panoramic video with collaborative live streaming

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12736367

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12736367

Country of ref document: EP

Kind code of ref document: A2