CN112422808B - Photo acquisition method, media object processing device and electronic equipment

Info

Publication number
CN112422808B
CN112422808B (application CN201910785236.3A)
Authority
CN
China
Prior art keywords
data
photo
audio
media data
media
Prior art date
Legal status
Active
Application number
CN201910785236.3A
Other languages
Chinese (zh)
Other versions
CN112422808A (en)
Inventor
郑凯方
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201910785236.3A
Publication of CN112422808A
Application granted
Publication of CN112422808B
Status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60: Control of cameras or camera modules
    • H04N23/698: Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture
    • G: PHYSICS
    • G11: INFORMATION STORAGE
    • G11B: INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00: Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10: Digital recording or reproducing
    • G11B20/10527: Audio or video recording; Data buffering arrangements
    • G11B2020/10537: Audio or video recording
    • G11B2020/10546: Audio or video recording specifically adapted for audio data

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Signal Processing For Recording (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)

Abstract

The application discloses a photo acquisition method and a media object processing method and apparatus. The photo acquisition method comprises the following steps: collecting audio data in the process of collecting first media data, the first media data including photo data; generating second media data according to the collected audio data; establishing an association between the first media data and the second media data; and generating a target object according to the first media data, the second media data and the association, wherein when the target object is operated on, the first media data and the second media data are output according to the association. In this way, the content of the photo is enriched and photo applications become more interesting and versatile.

Description

Photo acquisition method, media object processing device and electronic equipment
Technical Field
The present invention relates to the field of media applications, and in particular to a photo acquisition method, a media object processing method and apparatus, and an electronic device.
Background
With the popularization of portable terminal devices such as smartphones, the functions that the software and hardware of these intelligent terminal devices offer to users have become increasingly rich. Taking photos is one of the most frequently used functions: through the shooting function of a commonly carried portable intelligent terminal, users can photograph people, objects, scenery and other subjects of interest anytime and anywhere. Like many other device functions, the photo-taking function has developed from simple to complex and diverse, and the refinement of this function in multiple directions allows it to better satisfy the different shooting needs of users' various application scenarios and their increasingly fine-grained requirements for the shooting function.
On the hardware side, the specifications of the cameras carried by terminal devices keep rising: camera pixels, photosensitive elements, image stabilization, zoom control and related components and functions are becoming ever more powerful, guaranteeing photos of higher imaging quality. Combined with mature imaging algorithms and post-processing (retouching) software, the quality of the obtained photo images has improved markedly, and photographing applications have been enriched at the same time. In the prior art, improving photo image quality and enriching photo applications are pursued mainly along these two directions, namely upgrading the functional configuration of hardware devices and enriching software processing means. In practice, photo applications can also be enriched in other ways.
Disclosure of Invention
Embodiments of the invention provide a photo acquisition method and a media object processing method and apparatus, which can enrich the content of a photo and make photo applications more interesting and versatile.
The invention provides the following solutions:
A photo acquisition method, comprising:
collecting audio data in the process of collecting first media data; the first media data includes photo data;
generating second media data according to the collected audio data;
establishing an association between the first media data and the second media data; and
generating a target object according to the first media data, the second media data and the association, wherein when the target object is operated on, the first media data and the second media data are output according to the association.
A method of processing a media object, the media object being generated based on first media data and second media data, the first media data comprising photo data and the second media data being generated from audio data collected in the process of collecting the photo data, the method comprising:
providing a first operation option for operating on the media object; and
when an operation request for loading the media object is received through the first operation option, loading the photo data and the second media data, displaying the photo data and playing the corresponding audio data content.
A panoramic photo, comprising:
first media data, the first media data including image data of the panoramic photo; and
second media data, the second media data being generated from audio data synchronously collected in the process of collecting the first media data, wherein the first media data and the second media data are output when the panoramic photo is operated on.
A photo acquisition apparatus, comprising:
an audio data collection unit, configured to collect audio data in the process of collecting the first media data, wherein the first media data includes photo data;
a second media data generating unit, configured to generate second media data according to the collected audio data;
an association establishing unit, configured to establish an association between the first media data and the second media data; and
a target object generating unit, configured to generate a target object according to the first media data, the second media data and the association, wherein when the target object is operated on, the first media data and the second media data are output according to the association.
A media object processing apparatus, the media object being generated based on first media data and second media data, the first media data comprising photo data and the second media data being generated from audio data collected in the process of collecting the photo data, the apparatus comprising:
an operation option providing unit, configured to provide a first operation option for operating on the media object; and
an object loading and display unit, configured to, when an operation request for loading the media object is received through the first operation option, load the photo data and the second media data, display the photo data and play the corresponding audio data content.
An electronic device, comprising:
one or more processors; and
a memory associated with the one or more processors, the memory being configured to store program instructions that, when read and executed by the one or more processors, perform the following operations:
collecting audio data in the process of collecting first media data; the first media data includes photo data;
generating second media data according to the collected audio data;
establishing an association between the first media data and the second media data; and
generating a target object according to the first media data, the second media data and the association, wherein when the target object is operated on, the first media data and the second media data are output according to the association.
The specific embodiments provided by the application achieve the following technical effects:
With the above method, audio data can be collected while the photo data is being collected, and second media data, for example audio content derived from the audio data, is generated from it. After the association between the first media data and the second media data is established, a target object can be generated according to the first media data, the second media data and the association; when the target object is operated on, the first media data and the second media data are output according to the association. The resulting photo combines the photo data with content generated from the audio captured while the photo data was being collected, so the second media data reflects the scene in which the photo was taken, and information about that shooting scene is integrated into the target object together with the photo content. For panoramic photos, burst (continuous-shooting) photos and other photo types whose capture takes comparatively long, particularly rich synchronous audio content can be collected. Compared with media content such as video, the target object produced in this way has the advantage of being relatively lightweight. When the obtained target object is output, information about the scene at shooting time is obtained along with the photo, which enriches the content of the photo and makes photo applications more interesting and versatile.
Of course, not all of the above-described advantages need be achieved at the same time in practicing any one of the products of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention; a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of the storage of an image and second media data according to an embodiment of the present application;
FIG. 2 is a flowchart of the first method provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of the correspondence between sub-photos and audio data;
FIG. 4 is a flowchart of the second method provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of the switching operation option for the second media data content;
FIG. 6 is a schematic diagram of the first apparatus provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of the second apparatus provided by an embodiment of the present application;
FIG. 8 is a schematic architecture diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments derived by a person skilled in the art from the embodiments of the invention without inventive effort fall within the scope of protection of the invention.
The photo-taking function provided by portable terminal devices such as smartphones is undergoing refinement in multiple directions, so as to better meet the shooting needs of different application scenarios and users' increasingly fine-grained requirements for the shooting function. In the related art, one main direction of development is hardware specification, for example increasing camera pixels, improving photosensitive elements, and adding cameras that work together; the other direction is the image acquisition and processing stage, for example using better imaging algorithms and applying feature-rich photo processing software. The method provided by the embodiments of the application aims to enrich the content of photos and the application scenarios of photo shooting in another way: by enriching the media types involved in the shooting and display of a photo, it makes photo acquisition and display more interesting and thereby enriches the application scenarios of photo-shooting applications.
To this end, the method provided by the application introduces, on top of the photo, second media data obtained from audio data; through the combined application of the photo and the second media data, a "rich media" photo form can be obtained. The second media data may be obtained from the collected audio data. Compared with video, collecting and processing audio is easier to implement, and recording, encoding and storing audio are lighter-weight operations. Moreover, the second media data obtained from the audio data can be processed and output as an overlay on the photo content: it conveys some information about the shooting scene while remaining in a supporting role, the acquisition and display of the photo are still handled with the photo data at the center, and the photo itself remains the main subject of the photo application. If video content were used as the second media data instead, the nature of the video medium would make switching between photo and video the more likely way of presentation, and the photo itself would easily be overwhelmed by the video information.
For storage, the photo image and the corresponding second media data can be stored separately, with the association between them expressed in some way. For example, in FIG. 1, 11 is the image file corresponding to the photo data and 12 is the file corresponding to the second media data in audio format; the two express their association through similar file names. Alternatively, a new file format can be developed to accommodate both the photo image and the second media data, i.e. the photo data, the second media data and so on are saved in the same file, and the parts of the file are parsed and loaded by a processing tool for that file format. Such storage may look like file 13 in FIG. 1, where file 13 includes photo data 14 and second media data 15, which can be regarded as different blocks or tracks of file 13. Of course, practical applications are not limited to the forms of these two examples. The photo acquisition method and apparatus and the media object processing method and apparatus provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Example 1
The first embodiment of the present application provides a photo acquisition method. Referring to FIG. 2, which is a flowchart of the method, the photo acquisition method may include the following steps:
S210: collecting audio data in the process of collecting the first media data; the first media data includes photo data;
First, audio data can be collected in the process of collecting the first media data, where the first media data includes photo data. A portable terminal device such as a smartphone can collect the photo data and the audio data through its corresponding functional components: the photo data is collected by shooting with the camera component, and the audio data is collected through the phone's microphone component. The audio data is usually collected synchronously while the photo data is being collected, so that the collected audio data, or data derived from it, represents the scene at the moment the photo is taken. In practical applications, the "rich media" photo can be obtained by providing a photo-taking APP, or through a photographing application or a functional module of such an application; once the application or functional module is started, audio data can be collected synchronously while photographing collects the photo data.
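As an illustration of this synchronous collection, the following Python sketch runs a microphone-style recorder on a background thread while the photos are captured; capture_sub_photos and AudioRecorder are hypothetical stand-ins for the device's camera and microphone components, not APIs described in the application.

```python
import threading
import time

def capture_sub_photos(count, interval_s=0.5):
    # Hypothetical stand-in for the camera component: each "shot" is a placeholder id.
    shots = []
    for i in range(count):
        time.sleep(interval_s)            # simulate the time one shot takes
        shots.append(f"sub_photo_{i}")
    return shots

class AudioRecorder(threading.Thread):
    # Hypothetical stand-in for the microphone component: it appends timestamps
    # instead of PCM samples so the sketch stays self-contained.
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._stop = threading.Event()

    def run(self):
        while not self._stop.is_set():
            self.chunks.append(time.time())
            time.sleep(0.1)

    def stop(self):
        self._stop.set()

recorder = AudioRecorder()
recorder.start()                          # start audio collection (S210)
photos = capture_sub_photos(5)            # first media data: photo data
recorder.stop()
recorder.join()
print(len(photos), "sub-photos,", len(recorder.chunks), "audio chunks collected")
```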
The photo is obtained with the collected photo data as its core, based on the photo data and the second media data. The photo data can belong to different photo types, for example an ordinary photo, a 3D photo, a panoramic photo, or a photo group consisting of several sub-photos. Panoramic photos, burst photos and the like require multiple shots during acquisition, so the acquisition and processing of such photo-based rich media photos can be handled more flexibly and best reflect the characteristics of this method; the following description is therefore mainly based on these photo types. The audio data serves as a medium synchronous with the shooting scene and can directly reflect information about that scene; the audio data, or information generated from it, can be used as the second media data. The second media data can be stored together with the photo data, for example in the same folder, or embedded into the photo data to form a single file. When the photo is displayed, not only can the photo itself be shown, but the scene information at shooting time can also be obtained by parsing the second media data.
When the photo is a panoramic photo or a burst (continuous-shooting) photo, the photo data can be treated as a photo group containing several sub-photos. A panoramic photo is obtained by capturing multiple shots at successive shooting positions and finally stitching them into one photo, so its sub-photos are a logical division of the panoramic photo: in a concrete implementation, the logical division into sub-photos can be determined from the photo data shot in sequence, or the sub-photos can be divided directly within the stitched panoramic photo. The sub-photos in the photo group then include the sub-photos obtained by dividing the panoramic photo, each corresponding to a part of the generated panoramic photo. A burst photo, by contrast, consists of multiple photos, and the sub-photos in a burst group are typically stored as different files, so they can be regarded as a physical division (as opposed to the logical division of a panoramic photo). For photo types such as panoramic or burst photos, the photo data may therefore include a photo group containing at least two sub-photos. The sub-photos may be physically divided and stored as separate sub-photo files, such as photo group 310 in FIG. 3, which consists of independent sub-photo files 311-315 that together form a photo group; or the photo group may be a logical division, such as dividing a panoramic photo into logical parts that serve as sub-photos, e.g. panoramic photo 340 in FIG. 3 divided into the parts identified as sub-photos 341-345, each of which can be treated separately as a sub-photo of the panoramic photo, the parts together forming a photo group.
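For the logical division just described, a minimal sketch is given below, assuming the panorama is split into equal-width column ranges; a real implementation could just as well follow the boundaries of the individually shot frames.

```python
def split_panorama(width_px, parts):
    """Divide a stitched panorama of width_px columns into `parts` logical sub-photo ranges."""
    step = width_px // parts
    ranges = []
    for i in range(parts):
        left = i * step
        right = width_px if i == parts - 1 else (i + 1) * step
        ranges.append((left, right))      # pixel columns covered by sub-photo i
    return ranges

# e.g. a panorama stitched from 5 shots of 1920 columns each
print(split_panorama(5 * 1920, 5))
```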
Acquiring the photo data of a photo group requires multiple shots, and the audio data during such photo shooting can be collected in different ways. In the first way, uninterrupted audio is collected over the whole process of acquiring the photo group; audio collected this way obviously corresponds to the photo group as a whole, and the sub-photos are in a many-to-one relationship with the audio data. For example, audio data 320 in FIG. 3 is such uninterrupted audio: sub-photos 311-315 of photo group 310, or sub-photos 341-345 of the panoramic photo, may all correspond to audio data 320. In the second way, audio is collected separately for each sub-photo while that sub-photo is being shot, so the collected audio data takes the form of multiple audio segments, such as segments 331-335 of audio data 330 in FIG. 3, each segment corresponding to a sub-photo: segments 331-335 may correspond respectively to sub-photos 311-315 of photo group 310, or to sub-photos 341-345 of the panoramic photo. In this case, the collected audio data includes multiple audio segments, where each segment corresponds to one or more sub-photos of the photo group.
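The two collection modes can be summarized with a small data model; the file names are only illustrative and follow the naming used in the later examples.

```python
# Mode 1: one uninterrupted recording for the whole photo group (many-to-one),
# as with audio data 320 in FIG. 3.
whole_group = {
    "sub_photos": ["A015.1.jpg", "A015.2.jpg", "A015.3.jpg"],
    "audio": "A015.wav",
}

# Mode 2: one audio segment per sub-photo (segments 331-335 in FIG. 3);
# a sub-photo may also end up with no valid segment.
per_sub_photo = {
    "A015.1.jpg": "A015.1.wav",
    "A015.2.jpg": None,
    "A015.3.jpg": "A015.3.wav",
}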
S220: generating second media data according to the collected audio data;
after the audio data in the process of acquiring the photo data is acquired, second media data can be generated according to the acquired audio data. The second media data may be audio content obtained according to audio data, collected audio data may be directly used as the second media data, audio or other types of media data may be processed according to the collected audio data to obtain audio or other types of media data as the second media data, or a combination of different types of media data obtained based on the audio data is determined to be the second media data. For example, voice recognition may be performed on the collected audio data, text information corresponding to the audio data may be determined, and the determined text information may be determined as the second media data. Or the audio content can be determined according to the collected audio data, the collected audio data is subjected to voice recognition, text information corresponding to the audio data is determined, and the second media data is determined together based on the audio content and the corresponding text information, namely, the combination of different types of media data is determined as the second media data.
The second media data may also include other types of content, for example the result of image recognition on the photo data, such as identifying whether a particular person or object, e.g. the phone owner, a car or a certain animal, appears in the image; corresponding information is then added to the photo based on the image recognition result, increasing the interest and applicability of the photo. In a concrete implementation, image recognition can be performed on the obtained photo data to determine the image recognition result of the photo data, and that result is added to the second media data. In addition, the geographical location and/or weather information during the collection of the photo data can be acquired and added to the second media data, so that the photo can later be displayed together with this information, for example showing the geographical location and weather at collection time above the photo content.
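A sketch of how such a composite second media record could be assembled is given below; recognize_speech, recognize_image and get_location_weather are placeholders for whichever recognizer and sensor services an implementation actually uses, not interfaces named in the application.

```python
def recognize_speech(audio_path):
    # Placeholder: a real implementation would call a speech-to-text engine here.
    return ""

def recognize_image(photo_path):
    # Placeholder: a real implementation would run image recognition on the photo.
    return []

def get_location_weather():
    # Placeholder: a real implementation would query location/weather services.
    return {"location": None, "weather": None}

def build_second_media(audio_path, photo_path, with_text=True, with_scene_info=True):
    """Combine audio, recognized text and optional scene metadata into second media data."""
    second = {"audio": audio_path}
    if with_text:
        second["text"] = recognize_speech(audio_path)
    if with_scene_info:
        second["objects"] = recognize_image(photo_path)
        second.update(get_location_weather())
    return second

print(build_second_media("A015.1.wav", "A015.1.jpg"))
```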
The second media data may also be based on the photographer's narration, i.e. what the photographer says while shooting. For example, while shooting a panoramic photo, multiple pictures are taken and stitched into the panorama; compared with an ordinary photo, shooting a panorama takes noticeably longer, and the photographer can narrate the content of each picture while taking it, producing narration audio. The audio data of the photographer's narration can be collected while the photo data is being collected, and second media data related to the narration is then generated from that audio data. When the corresponding narration audio is played while the photo is viewed, more of the scene at shooting time can be learned, which makes the photo more interesting.
The second media data may also be acquired or changed after the photo data has been determined; for example, the photographing application or the photo viewing application may provide an operation entry for acquiring or changing the second media data corresponding to the photo data. In this way, the corresponding audio content can be collected after the photo data has been acquired, and the audio data can even be collected several times, so that a more satisfactory target object is obtained.
In addition, when generating the second media data from the collected audio data, the collected audio data may first be preprocessed: valid audio is extracted from the collected audio data, and the second media data is generated from the extracted valid audio. Blank segments may be picked up during audio collection, i.e. the collected audio data may contain silent segments, and these blank segments can be removed as invalid data. Alternatively, only the parts containing speech are determined to be valid audio: the speech parts are extracted from the collected audio data, the extraction result is used as the valid audio, and the second media data is then generated from the extracted valid audio.
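A minimal sketch of this preprocessing step, assuming the audio arrives as normalized samples and using a simple energy threshold to drop near-silent frames (the threshold and frame size are illustrative, not values from the application):

```python
def extract_valid_audio(samples, threshold=0.02, frame=1600):
    """Keep only frames whose mean absolute amplitude exceeds the threshold."""
    kept = []
    for start in range(0, len(samples), frame):
        chunk = samples[start:start + frame]
        energy = sum(abs(s) for s in chunk) / max(len(chunk), 1)
        if energy >= threshold:           # treat the frame as speech-bearing audio
            kept.extend(chunk)
    return kept

# silent opening followed by a louder burst: only the burst survives
print(len(extract_valid_audio([0.0] * 3200 + [0.1, -0.1] * 1600)))
```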
S230: establishing an association between the first media data and the second media data;
after the first media data and the second media data are determined, the association relationship between the first media data and the second media data can be established, and in different applications, the association relationship between the first media data and the second media data can have different implementation manners and meanings. Because different media data types are integrated, the association relationship between the first data and the second media data exists objectively in most cases, and different applications of the photo can be realized according to the association relationship between the first data and the second media data. For example, in the display stage of the photo, to realize presentation of different types of media data, the first data and the corresponding second media data can be read according to the association relationship so as to load, present and play media content, for example, play audio content while displaying the photo content.
When the photo is a photo group of the aforementioned panoramic or burst type, the association may further include the association between the sub-photos and the second media data, i.e. an association is established between the sub-photos in the photo group and the second media data. The second media data may be the collected audio data itself, or media content obtained from the collected audio data; the former is used as the example here. As shown in FIG. 3, a correspondence can be established between sub-photos 311-315 and audio data 320; in a concrete implementation, all sub-photos may correspond to audio data 320, or the correspondence may be established for only some of them, e.g. when sub-photos 312-314 are the main content of the photo group, only the correspondence between sub-photos 312-314 and audio data 320 is established. When the audio data is collected as multiple audio segments, corresponding second media data is generated from each segment, and the correspondence between each sub-photo and its second media data is then established; specifically, an association can be established between the second media data corresponding to an audio segment and one or more sub-photos. For example, in FIG. 3, a correspondence can be established between sub-photos 341-345 of the panoramic photo and the respective audio segments in audio data 330. When valid-audio extraction has been performed on the audio data, some audio segments may have been judged invalid, in which case some sub-photos may not correspond to any audio segment, or several sub-photos may correspond to the same segment. Photo applications generally involve multiple items of photo data, and establishing the association between the photo data and the second media data both records the objectively existing relationship between them and makes subsequent applications such as displaying the photos more convenient; for example, each piece of media content can be loaded correctly according to the association between the photo data and its corresponding second media data.
The association between the photo data and the second media data can be implemented differently depending on how the data is organized. A simple file-level association can be established through the file names, for example by processing the file names to have the same character composition, or the same characters in some part of the name: the photo data file is named "A015.jpg" and the corresponding second media data "A015.wav", and when the files are read to display/play the media content, the matching content is simply loaded according to the file names. The image and audio formats in the examples, such as jpg and wav, are merely illustrative; in practical applications other suitable media formats for images and audio can be used.
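A sketch of resolving that file-name association at load time; the extension list is an assumption covering the audio and recognized-text forms of second media used in the examples.

```python
from pathlib import Path

def find_second_media(photo_path, media_dir="."):
    """Locate second media files sharing the photo's file-name stem (A015.jpg -> A015.wav)."""
    stem = Path(photo_path).stem
    companions = []
    for ext in (".wav", ".txt"):          # audio and recognized-text companions
        candidate = Path(media_dir) / (stem + ext)
        if candidate.exists():
            companions.append(candidate)
    return companions

# e.g. find_second_media("A015.jpg") -> [Path("A015.wav")] when that file exists
```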
When the photo data includes several sub-photos, the association between the sub-photos and the second media data can be recorded in a separate file, for example "A019.inf", or the association information can be embedded in a file tag of the photo or of the second media data. When a file is read to display/play the media content, the file tag of the photo/second media data can be read first, so that the corresponding media content is loaded according to the record in the file tag.
Examples of associations of sub-photos with second media data are shown in table 1:
TABLE 1
Image Audio
A015.1.jpg A015.1.wav
A015.2.jpg N/A
A015.3.jpg A015.3.wav
When a separate file or file-tag information is used to record the association between the second media data and the sub-photos obtained by dividing a panoramic photo, the following information may be recorded:
the number of sub-photos of the panoramic photo; the height/length of the panoramic photo; the sequence of second media data.
A specific example of the above information is as follows:
{parts=5;pixels=1920;audio=(A015.1.wav;A015.2.wav;null;A015.4.wav;null)}
When the file is read to display/play the media content, this information can be read and parsed first; the panoramic photo is divided according to it to determine the range and content of each sub-photo, the second media content corresponding to each sub-photo is indexed, and the corresponding media content is loaded according to the record in the information.
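A sketch of parsing such a record; the field syntax follows the example above and is otherwise an assumption.

```python
import re

def parse_panorama_record(record):
    """Parse '{parts=...;pixels=...;audio=(...)}' into parts, pixels and an audio list."""
    body = record.strip().strip("{}")
    audio = []
    m = re.search(r"audio=\(([^)]*)\)", body)
    if m:
        audio = [None if item.strip() == "null" else item.strip()
                 for item in m.group(1).split(";")]
        body = body.replace(m.group(0), "")
    fields = dict(kv.split("=", 1) for kv in body.split(";") if "=" in kv)
    return {"parts": int(fields["parts"]),
            "pixels": int(fields["pixels"]),
            "audio": audio}

record = "{parts=5;pixels=1920;audio=(A015.1.wav;A015.2.wav;null;A015.4.wav;null)}"
print(parse_panorama_record(record))
```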
In an implementation where the photo data and the second media data are merged into the same file, for example into different tracks of the same file, the association information described above can be embodied as the binding between the photo data and the second media data within that file; in an application involving several sub-photos, the association information can also be embedded into the target file, for example as tag information of the target file.
In addition, the generated target object may include some photo-related dynamic effects, such as birthday fireworks or a heart shape. These photo-related effects may be determined from a preset way of applying effects, from the user's selection in a library of preset effects, and so on. The interaction effect data can be stored together with the target object when the target object is generated, and it has a correspondence with the target object. The interaction effect data can be added into the photo data or stored independently. When the first media data, i.e. the photo data, is displayed, the corresponding interaction effect can be shown according to the interaction effect data. When the interaction effect data is stored independently, the correspondence between the interaction effect data and the target object, or between the interaction effect data and the photo data, can optionally be stored, for example within the association described above, so that when the first media data is displayed, the association is read to determine the interaction effect data and show the corresponding interaction effect.
S240: generating a target object according to the first media data, the second media data and the association; when the target object is operated on, the first media data and the second media data are output according to the association.
The target object generated according to the first media data, the second media data and the association can take different forms in different applications. In one implementation, the target object may be a file set, where the file set includes the photo data, the second media data, and the association information between the photo data and the second media data; an example of such a file set is shown in Table 2, which includes the following files:
TABLE 2
A015.1.jpg A015.1.wav A015.1.txt
A015.2.jpg A015.2.wav A015.2.txt
A015.3.jpg A015.3.wav A015.3.txt
A015.inf
The set includes photo data in jpg format, second media data in wav and txt formats, and a file in inf format recording the association between the photo data and the second media data.
Each file in the file set, including the photo data files and the second media files, may be stored separately, or the image files, the audio files and the association information of each sub-photo may be stored in a single package file. For example, packaging and storing the files listed in Table 2 in one package file makes different photo groups/file sets easy to identify, and storing them in compressed form also saves storage space. Another way to generate the target object is to merge the photo data and the second media data into the same file according to the association, finally saved for example as file 13 in FIG. 1. When merging the photo data and the second media data into the same file, the correspondence between multiple sub-photos and multiple audio segments has to be taken into account: for example, when panoramic photo 340 in FIG. 3 and the audio segments of audio 330 are merged into one file, each sub-photo and each audio segment can be stored in corresponding tracks, the correspondence being expressed by the positions of the sub-photos and the audio. Of course, the sub-photos of the panoramic photo and the corresponding second media files may also be stored as mutually independent files, with the association between them recorded in a record file or in file-tag information. When the sub-photos in the photo group are obtained by dividing the panoramic photo, each sub-photo corresponds to a part of the generated panoramic photo; the association between each sub-photo of the panoramic photo and the second media data can be saved, and when the image file is displayed and the audio file is played, the panoramic photo is shown and, while the part corresponding to a sub-photo is displayed, the audio file corresponding to the current sub-photo is played according to the association.
In short, in the process of generating the target object according to the photo data, the second media data and the association, the photo data can be saved to an image file, the audio data to an audio file, and the association saved as well; when the first media data and the second media data are later output according to the association, for example when the photo content is viewed through a photo viewing application, the saved association can be read, the image file displayed and the audio file played.
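A sketch of the packaging route described above, bundling the image files, the second media files and the association record into one compressed package; the zip container and the association.json name are assumptions, since the embodiment leaves the package format open.

```python
import json
import zipfile

def save_target_object(package_path, photo_files, second_media_files, association):
    """Write photo data, second media data and their association into one package file."""
    with zipfile.ZipFile(package_path, "w", zipfile.ZIP_DEFLATED) as pkg:
        for path in list(photo_files) + list(second_media_files):
            pkg.write(path)                           # store each media file as-is
        pkg.writestr("association.json", json.dumps(association, indent=2))

# e.g. save_target_object("A015.pkg",
#                         ["A015.1.jpg", "A015.2.jpg", "A015.3.jpg"],
#                         ["A015.1.wav", "A015.3.wav"],
#                         {"A015.1.jpg": "A015.1.wav",
#                          "A015.2.jpg": None,
#                          "A015.3.jpg": "A015.3.wav"})
```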
The above describes in detail the photo acquisition method provided in the first embodiment of the present application. With this method, audio data can be collected while photo data is being collected, and second media data, for example audio content derived from the audio data, is generated from it; after the association between the first media data and the second media data is established, a target object is generated according to the first media data, the second media data and the association, and when the target object is operated on, the first media data and the second media data are output according to the association. The resulting photo combines the photo data with content generated from the audio captured at shooting time, so the second media data reflects the scene in which the photo was taken, and information about that shooting scene is integrated into the target object together with the photo content; for panoramic photos, burst photos and other photo types whose capture takes comparatively long, particularly rich synchronous audio content can be collected. Compared with media content such as video, the target object produced in this way has the advantage of being relatively lightweight. When the obtained target object is output, information about the scene at shooting time is obtained along with the photo, which enriches the content of the photo and makes photo applications more interesting and versatile.
Example 2
The second embodiment of the present application provides a method for processing a media object. The method may be based on the target object generated in the first embodiment, with the generated target object processed here as a media object. This method focuses on the processing of such media objects: for example, a media object browsing application may be provided which handles this type of media object, reading and parsing the photo data and the second media data in the media object for the user to view. The media object may be generated based on first media data and second media data; the first media data may include photo data, and the second media data may be generated from audio data collected in the process of collecting the photo data. Collecting audio data during photo-data collection can be implemented by invoking the voice assistant in the system or by collecting audio through the system's recording interface. As shown in FIG. 4, which is a flowchart of the media object processing method, the method may include the following steps:
S410: providing a first operation option for operating on the media object;
first, a first operation option to operate on a media object may be provided. In particular, different implementations may be used according to the actual application. The processing method of the media object can be applied to a photo showing application, for example, and the photo data can be processed and shown by the photo showing application and the corresponding second media data can be played/shown. The specific form of the first operation option may be different according to the actual application environment of the software application, and the data organization form of the media object. For example, when the media object comprises a file set including photo data, second media data and association information of the photo data and the second media data, an item of any photo data displayed can be used as a first operation option; for example, when the file of the media object is a single file, the photograph file and the second media file are packaged into a packaged file as described above, or are combined into a file, thumbnail images of the packaged file or the combined file may be provided in the user interface, and multiple thumbnail images may form a file list, and the thumbnail images are used as the first operational options.
A file list can be provided in the user interface, with the items in the list corresponding to media objects; the second media data can be hidden in the user interface and only the visual content associated with the photo image displayed, making the display more intuitive and compact. The file list may include several items, for example thumbnails corresponding to the picture data in the media files, and the first operation option can be implemented on the items in the list: the thumbnails can be configured as operable objects, and when the user clicks a thumbnail, the data of the corresponding media object is read and parsed, the corresponding photo is displayed and the corresponding second media data, such as audio, is played. Icon information can also be shown on the thumbnail of a media object to mark it as a media object containing photo data as well as second media data, so as to distinguish it from an ordinary photo file.
S420: when an operation request for loading the media object is received through the first operation option, loading the photo data and the second media data, displaying the photo data and playing the corresponding audio data content.
When an operation request for loading the media object is received through the first operation option, the content data of the media object can be read, the photo data and the second media data loaded, the photo content in the photo data displayed, and the content of the second media data displayed/played, e.g. text obtained from the audio data is shown and the audio content is played. The concrete way of loading the photo data and the second media data may differ according to the data organization of the media object. For example, when the media object is a file set including the photo data, the second media data and their association information, the association information can be read, and when the photo data is loaded, the corresponding second media data is loaded according to it: when the photo data includes several sub-photos, each corresponding to a different audio segment, the audio segment corresponding to the current sub-photo can be loaded and played according to the association information when that sub-photo is loaded.
When the media object further includes the association information between the photo data and the second media data, the association information can be read when the photo data and the second media data are loaded, and the second media data corresponding to the photo data is determined from it, so that the corresponding audio content is played while the photo data is displayed. The photo data may comprise a photo group containing at least two sub-photos, and the audio data may comprise multiple audio segments, each segment corresponding to one or more sub-photos of the group. The association information may include the association between each sub-photo in the group and an audio segment; in this implementation, when the corresponding audio content is to be played while the photo data is displayed, the association between sub-photos and audio segments is read, the segment corresponding to each sub-photo is determined from it, and the corresponding segment is played while that sub-photo is displayed.
The sub-photos may include sub-photos obtained by dividing a panoramic photo, each corresponding to a part of the generated panoramic photo; correspondingly, the association may include the association between each sub-photo of the panoramic photo and an audio segment. In this implementation, when the association information is read and the second media data corresponding to the photo data is determined from it, the association between each sub-photo of the panoramic photo and the audio segments can be read, and the segment corresponding to the currently displayed sub-photo determined from it, so that while the panoramic photo is displayed, the audio segment corresponding to the current sub-photo is played.
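Combining the association record parsed earlier with the currently displayed portion of the panorama, a sketch of choosing which audio segment to play; the viewport handling is a simplification in which the sub-photo is picked from the left edge of the visible range.

```python
def audio_for_viewport(association, viewport_left, panorama_width):
    """Return the audio segment of the sub-photo containing the viewport's left edge."""
    parts = association["parts"]
    step = panorama_width / parts
    index = min(int(viewport_left // step), parts - 1)
    return association["audio"][index]    # may be None if that sub-photo has no audio

record = {"parts": 5, "pixels": 1920,
          "audio": ["A015.1.wav", "A015.2.wav", None, "A015.4.wav", None]}
print(audio_for_viewport(record, viewport_left=2500, panorama_width=5 * 1920))
```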
When the media object is a single file in which the photo file and the second media file have been packaged into a package file as described above, the package can first be unpacked and the photo data and the corresponding second media data in it loaded. When the media object is a single file in which the photo data and the second media data have been merged into one file, with the photo data and the second media data held in different tracks of the merged file, the tracks of the file can be parsed to determine the photo data to display and the second media data to play.
Besides the first operation option for requesting that the media object be loaded, a second operation option for operating on the displayed content can be provided. For example, while the photo content is displayed, a sliding operation can be offered through the screen of the terminal device; when a sliding operation is received through the second operation option, the displayed part of the panoramic photo is switched according to the sliding direction, e.g. to the next sub-photo part when sliding left, and the audio segment corresponding to the newly displayed part is determined.
The second media data may include a combination of several kinds of media information, for example when speech recognition is performed on the collected audio data, the corresponding text information is determined, and both the audio and the text are used as the second media data; in that case an operation option for switching can be provided during display. Specifically, a third operation option for switching the presentation form of the second media data can be provided, and when a switching request is received through it, the presentation switches between playing the audio data and displaying the text information. For example, as shown in FIG. 5, in FIG. 5(a) the current photo data is displayed in the user interface while the corresponding audio content is played, and a switching operation option 510 is provided; when a switching request is received through option 510, playback of the audio content of the second media data is switched to display of its text, as shown in FIG. 5(b). In the state of FIG. 5(b), the current photo data is displayed together with the corresponding text content and an operation option 520 is provided; when a switching request is received through option 520, the presentation can be switched back to playing the audio content as in FIG. 5(a).
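A minimal sketch of this third operation option, toggling between the audio and text presentations of the same second media data; display and playback are reduced to returned strings here.

```python
class SecondMediaView:
    """Switch the presentation of second media data between audio playback and text."""

    def __init__(self, second_media):
        self.second_media = second_media
        self.mode = "audio"               # default state, as in FIG. 5(a)

    def toggle(self):
        self.mode = "text" if self.mode == "audio" else "audio"
        return self.render()

    def render(self):
        if self.mode == "audio":
            return f"playing {self.second_media['audio']}"
        return f"showing text: {self.second_media.get('text', '')}"

view = SecondMediaView({"audio": "A015.1.wav", "text": "narration of the first shot"})
print(view.render())   # playing A015.1.wav
print(view.toggle())   # showing text: narration of the first shot
```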
When the second media data is played in the form of audio content, operation options for controlling the playback of the audio content can also be provided; for example, control buttons for the corresponding audio content can be shown when the photo data is displayed in the user interface. In one implementation, the audio content is configured to a stopped state by default and a play option is provided; the corresponding audio content is played when the user operates the play option. Specifically, a fourth operation option for controlling the audio data content can be provided, and when a control request is received through it, the playback of the audio content is controlled, e.g. play/pause, continue/stop, and so on.
Displaying the photo data and playing the corresponding audio content in the user interface can also be implemented in different ways. For example, a target page can be provided, the photo data loaded as the background image of that page and the content of the audio data loaded as its background sound. Alternatively, in applications such as photo display, a target window can be provided in which the photo data is displayed, while an audio playback interface is invoked in the background to play the corresponding audio data content.
The above describes in detail the media object processing method provided in the second embodiment of the present application. The media object may be generated based on first media data and second media data, the first media data including photo data and the second media data being generated from audio data collected in the process of collecting the photo data. A first operation option for operating on the media object can be provided; when an operation request for loading the media object is received through it, the photo data and the second media data are loaded, the photo data is displayed and the corresponding audio data content is played. Based on content that combines the photo data with the audio captured at collection time, the second media data reflects the scene in which the photo was taken, and information about that shooting scene is integrated into the media object together with the photo content; for panoramic photos, burst photos and other photo types whose capture takes comparatively long, particularly rich synchronous audio content can be collected. When the media object is output, information about the shooting scene is obtained while the photo content is displayed, which enriches the photo content and makes photo applications more interesting and versatile.
Example 3
The third embodiment of the present application provides a panoramic photo. The panoramic photo may include first media data, which may be the image data of the panoramic photo, and second media data, which may be generated from audio data synchronously collected in the process of collecting the first media data. In this way, when the panoramic photo is operated on, the first media data and the second media data can be output, achieving the effect that the synchronous audio content of the corresponding scene can be played while the panoramic photo is displayed, which enriches the application of the photo.
The panoramic photo provides an organizational form for "rich media" photo information, which can also be regarded as a new data form. On top of the photo, second media data obtained from audio data is introduced, and through the combined application of the photo and the second media data a "rich media" photo form can be obtained. The second media data may be obtained from the collected audio data; collecting and processing audio is relatively easy to implement, and recording, encoding and storing audio are lightweight operations. Moreover, the second media data obtained from the audio data can be processed and output as an overlay on the photo content: it conveys some information about the shooting scene while remaining in a supporting role, the acquisition and display of the photo are still handled with the photo data at the center, and the photo itself remains the main subject of the photo application. For storage, the photo image and the corresponding second media data can be stored separately, with the association between them expressed in some way, as with 11 and 12 in FIG. 1 described above, where 11 is the image file corresponding to the photo data and 12 the file corresponding to the second media data in audio format, the two being linked by similar file names. A new file format can also be developed to accommodate both the photo image and the second media data, i.e. the photo data, the second media data and so on are saved in the same file, whose parts are parsed and loaded by a processing tool for that file format; such storage may look like file 13 in FIG. 1, where file 13 includes photo data 14 and second media data 15, which can be regarded as different blocks or tracks of file 13. Of course, practical applications are not limited to the forms of these two examples. The panoramic photo may include several logically divided sub-photos, and different sub-photos may correspond to different audio segments, as in the organization of sub-photos and audio segments in FIG. 3; some sub-photos may also correspond to no audio segment, similar to the correspondence shown in Table 1 above.
The third embodiment of the present application thus provides a panoramic photo, which may include the image data of the panoramic photo and second media data, where the second media data is generated from audio data collected synchronously during the collection of the first media data. This provides a "rich media" photo form in which the audio data serves as media synchronous with the scene at shooting time and can directly reflect information related to that scene. The audio data, or information generated based on it, may be used as the second media data, and the second media data may be stored together with the photo data, for example under the same folder, or embedded into the photo data to form a single file. When the photo is displayed, not only can the photo itself be shown, but the audio scene information at shooting time can also be obtained by parsing the second media data. This enriches the photo information and increases the interest and richness of the photo content.
Corresponding to the first embodiment of the present application, there is further provided a photo acquisition apparatus. Fig. 6 is a schematic diagram of the photo acquisition apparatus, and the apparatus may include:
an audio data acquisition unit 610, configured to acquire audio data during the process of acquiring the first media data; wherein the first media data may include photo data;
a second media data generating unit 620, configured to generate second media data according to the collected audio data;
an association relationship establishing unit 630, configured to establish an association relationship between the first media data and the second media data; and
a target object generating unit 640, configured to generate a target object according to the first media data, the second media data, and the association relationship, wherein, when the target object is operated, the first media data and the second media data are output according to the corresponding relationship.
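By way of a non-authoritative sketch, the cooperation of units 610-640 could be expressed in Kotlin roughly as follows; the function-typed constructor parameters and the Map-based "target object" are assumptions made for brevity, not the claimed apparatus.

    // Illustrative only: units 610-640 reduced to plain Kotlin functions and a class.
    class PhotoAcquirer(
        private val recordAudio: () -> ByteArray,               // audio data acquisition unit 610
        private val makeSecondMedia: (ByteArray) -> ByteArray   // second media data generating unit 620
    ) {
        // Returns a simple "target object": photo data plus second media data plus their association.
        fun acquire(photoData: ByteArray): Map<String, ByteArray> {
            val audio = recordAudio()                    // captured while the photo data is being captured
            val secondMedia = makeSecondMedia(audio)     // e.g. trimmed audio or recognized text
            // association relationship establishing unit 630 + target object generating unit 640:
            return mapOf("photo" to photoData, "secondMedia" to secondMedia)
        }
    }

    fun main() {
        val acquirer = PhotoAcquirer(
            recordAudio = { ByteArray(0) },              // stub: a real device would use the microphone
            makeSecondMedia = { audio -> audio }         // stub: pass the audio through unchanged
        )
        val target = acquirer.acquire(photoData = ByteArray(0))
        println(target.keys)                             // prints [photo, secondMedia]
    }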
Corresponding to the second embodiment of the present application, there is also provided a media object processing apparatus. Fig. 7 is a schematic diagram of the media object processing apparatus. The media object may be generated based on the first media data and the second media data; the first media data comprises photo data, and the second media data is generated from the audio data collected during the collection of the photo data. The apparatus may include:
an operation option providing unit 710 for providing a first operation option for operating the media object;
an object loading and display unit 720, configured to, when an operation request for loading the media object is received through the first operation option, load the photo data and the second media data, display the photo data, and play the corresponding audio data content.
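A comparable, equally non-authoritative sketch of units 710-720, with the rendering and playback callbacks assumed rather than tied to any concrete UI or audio API:

    // Illustrative only: the processing device reduced to two callbacks and one handler.
    class MediaObjectViewer(
        private val showPhoto: (ByteArray) -> Unit,      // renders the loaded photo data
        private val playAudio: (ByteArray) -> Unit       // plays the loaded second media data content
    ) {
        // operation option providing unit 710: this method backs the first operation option.
        fun onLoadRequested(photoData: ByteArray, secondMedia: ByteArray) {
            // object loading display unit 720: load both parts, show the photo, play the audio.
            showPhoto(photoData)
            playAudio(secondMedia)
        }
    }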
In addition, the embodiment of the application also provides electronic equipment, which can include:
one or more processors; and
a memory associated with the one or more processors, the memory being configured to store program instructions that, when read and executed by the one or more processors, perform the following operations:
collecting audio data in the process of collecting the first media data; the first media data includes photo data;
generating second media data according to the collected audio data;
establishing an association relationship between the first media data and the second media data;
generating a target object according to the first media data, the second media data, and the association relationship, wherein, when the target object is operated, the first media data and the second media data are output according to the corresponding relationship.
Fig. 8 illustrates an exemplary architecture of the electronic device. For example, the device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, an aircraft, and so on.
Referring to fig. 8, device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions so as to perform all or part of the steps of the photo acquisition method or the media object processing method described above. Further, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation of the device 800. Examples of such data include instructions for any application or method operating on the device 800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The power supply component 806 provides power to the various components of the device 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each of the front camera and the rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.
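For context, on an Android-style device the audio component is usually driven through the platform recording APIs. The sketch below is an assumption-laden example, not part of the disclosure: it buffers raw PCM from the microphone for as long as a hypothetical isPanoramaCaptureRunning() hook reports that the shot is in progress, and it presumes the RECORD_AUDIO permission has already been granted.

    import android.media.AudioFormat
    import android.media.AudioRecord
    import android.media.MediaRecorder
    import java.io.ByteArrayOutputStream

    // Records raw PCM while the (hypothetical) panorama capture runs.
    fun recordWhileCapturing(isPanoramaCaptureRunning: () -> Boolean): ByteArray {
        val sampleRate = 44100
        val minBuf = AudioRecord.getMinBufferSize(
            sampleRate, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT
        )
        val recorder = AudioRecord(
            MediaRecorder.AudioSource.MIC, sampleRate,
            AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, minBuf
        )
        val pcm = ByteArrayOutputStream()
        val buffer = ByteArray(minBuf)
        recorder.startRecording()
        try {
            while (isPanoramaCaptureRunning()) {             // stop when the shot finishes
                val read = recorder.read(buffer, 0, buffer.size)
                if (read > 0) pcm.write(buffer, 0, read)
            }
        } finally {
            recorder.stop()
            recorder.release()
        }
        return pcm.toByteArray()                             // raw PCM; encoding (e.g. to AAC) is a separate step
    }

Whether the captured audio is kept as-is, trimmed to the effective segments, or transcribed to text before becoming the second media data is left to the generating step described earlier.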
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, or the like. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the device 800. For example, the sensor assembly 814 may detect an on/off state of the device 800 and the relative positioning of components, such as the display and keypad of the device 800; the sensor assembly 814 may also detect a change in position of the device 800 or of a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in temperature of the device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the device 800 and other devices. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements, for performing the methods described above.
In an exemplary embodiment, there is further provided a non-transitory computer-readable storage medium including instructions, for example, the memory 804 including instructions, where the instructions are executable by the processor 820 of the apparatus 800 to perform the photo acquisition method or the media object processing method provided by the embodiments of the present application. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by means of software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solutions of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments, or in some parts of the embodiments, of the present application.
In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment mainly describes its differences from the other embodiments. In particular, for the system or apparatus embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to the description of the method embodiments for the relevant parts. The systems and apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without creative effort.
Where the applicable laws and regulations of the relevant country are complied with (for example, where the user has given explicit consent or has been informed), user-specific personal data may be used in the solutions described herein within the scope permitted by such laws and regulations.
The photo acquisition method, the media object processing method, and the corresponding apparatuses and devices provided in the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method and core ideas of the present application. Meanwhile, for those of ordinary skill in the art, changes may be made to the specific implementations and the scope of application in accordance with the ideas of the present application. In view of the foregoing, the content of this specification should not be construed as limiting the present application.

Claims (28)

1. A method for obtaining a photograph, comprising:
collecting audio data in the process of collecting the first media data; the first media data comprises panoramic photo data, the panoramic photo data comprises a photo group, the photo group comprises at least two sub-photos, and the panoramic photo data comprises data in a picture format;
generating second media data according to the collected audio data, wherein the audio data comprises a plurality of audio segments, and each audio segment corresponds to one or more sub-photos of the photo group; the second media data comprises data in audio format;
establishing an association relationship between the first media data and the second media data;
according to the association relation, the first media data and the second media data are used as different data blocks or tracks in the file with the same target format so as to generate a target object; the target object, when operated, outputs the first media data and the second media data through a processing tool for the target format.
2. The method of claim 1, wherein the establishing the association of the first media data with the second media data comprises:
and establishing an association relationship between the sub-photos in the photo group and the second media data.
3. The method of claim 2, wherein the sub-photographs in the photograph group include sub-photographs obtained by dividing the panoramic photograph, each sub-photograph corresponding to a respective portion of the generated panoramic photograph.
4. The method of claim 2, wherein the sub-photos in the photo group include sub-photos obtained by continuous shooting, each sub-photo corresponding to an independent photo file.
5. The method of claim 2, wherein generating the second media data from the collected audio data comprises:
generating corresponding second media data according to each audio segment;
the establishing the association relationship between the first media data and the second media data includes:
and establishing association relation between the second media data corresponding to the audio segment and one or more sub-photos.
6. The method of claim 1, wherein generating the second media data from the collected audio data comprises:
and carrying out voice recognition on the collected audio data, determining text information corresponding to the audio data, and determining the text information as the second media data.
7. The method of claim 1, wherein generating the second media data from the collected audio data comprises:
determining audio content according to the collected audio data, performing voice recognition on the collected audio data, and determining text information corresponding to the audio data;
and determining the second media data according to the audio content and the text information.
8. The method of claim 1, wherein generating the second media data from the collected audio data comprises:
extracting effective audio from the collected audio data, and generating second media data based on the extracted effective audio.
9. The method according to any one of claims 1-8, further comprising:
performing image recognition on the acquired panoramic photo data, and determining an image recognition result of the panoramic photo data;
and adding the result of the image recognition to the second media data.
10. The method according to any one of claims 1-8, further comprising:
acquiring geographic position and/or weather state information in the process of acquiring the first media data;
adding the geographical location and/or weather status information to the second media data.
11. The method of any of claims 1-8, wherein the capturing of audio data during the capturing of the first media data comprises:
during the process of collecting the first media data, collecting audio data of shooting explanation of a photographer;
The generating second media data according to the collected audio data includes:
and generating second media data related to the shooting commentary according to the audio data of the shooting commentary.
12. The method according to any one of claims 1-8, further comprising:
determining interaction effect data; the interaction effect data has a corresponding relation with the target object; and when the first media data is displayed, displaying the corresponding interaction effect according to the interaction effect data.
13. A method for processing a media object, wherein the media object is generated by using first media data and second media data with an association relationship as different data blocks or tracks in a file with the same target format; the first media data comprises panoramic photo data, and the second media data is generated according to audio data collected in the process of collecting the panoramic photo data; the panoramic photo data comprises data in a picture format, and the second media data comprises data in an audio format; the panoramic photo data comprises a photo group, and the photo group comprises at least two sub-photos; the audio data comprises a plurality of audio segments, wherein each audio segment corresponds to one or more sub-photos of the photo group; the method comprises the following steps:
providing, by a processing tool for the target format, a first operation option to operate on the media object;
and when an operation request for loading the media object is received through the first operation option, loading the panoramic photo data and the second media data, displaying the panoramic photo data and playing corresponding audio data content.
14. The method of claim 13, wherein the association comprises an association of each sub-photo in the photo group with the audio segment;
the loading the panoramic photo data and the second media data, displaying the panoramic photo data and playing corresponding audio data content, includes:
and reading the association relation between each sub-photo and the audio segment, and determining the audio segment corresponding to each sub-photo according to the association relation so as to play the corresponding audio segment content when the sub-photo is displayed.
15. The method of claim 13, wherein the association comprises an association of each sub-photograph of the panoramic photograph with the audio segment;
the loading the panoramic photo data and the second media data, displaying the panoramic photo data and playing corresponding audio data content, includes:
And reading the association relation between each sub-photo of the panoramic photo and the audio segment, and determining the audio segment corresponding to the sub-photo of the panoramic photo currently displayed according to the association relation so as to play the corresponding audio segment content of the current sub-photo when the panoramic photo is displayed.
16. The method as recited in claim 15, further comprising:
providing a second operation option for operating the display content;
when a sliding operation is received through the second operation option, a part of the panoramic photo is displayed according to the sliding direction, and an audio segment corresponding to the switched part is determined.
17. The method as recited in claim 13, further comprising:
performing voice recognition on the collected audio data, and determining text information corresponding to the audio data;
providing a third operation option for switching the providing mode content of the second media data;
and when a switching operation request is received through the third operation option, switching between providing the audio data and displaying the text information.
18. The method of claim 13, wherein the collecting of the audio data in the process of collecting the panoramic photo data comprises:
collecting the audio data by invoking a voice assistant of the system or a recording interface of the system.
19. The method as recited in claim 13, further comprising:
providing a fourth operational option for controlling the audio data content;
and when a control request is received through the fourth operation option, controlling the audio data content to play or pause, and to resume playing after being paused.
20. The method of claim 13, wherein the displaying the panoramic photo data and playing the corresponding audio data content comprises:
and providing a target page, and loading the content of the audio data into the background sound of the page when loading the panoramic photo data into the background image of the target page.
21. The method of claim 13, wherein the displaying the panoramic photo data and playing the corresponding audio data content comprises:
providing a target window, displaying the panoramic photo data in the target window, and calling an audio playing interface to play corresponding audio data content in the background.
22. The method according to any one of claims 13-21, further comprising:
providing a file list in a user interface, wherein the items in the file list correspond to the media objects, and the second media data is hidden in the user interface.
23. The method as recited in claim 22, further comprising:
the first operational option is implemented on an item in the file list.
24. The method of claim 22, wherein the items in the file list comprise thumbnails of picture data.
25. The method of claim 24, wherein icon information is provided on the thumbnail of the picture data, the icon information being used to identify the target object as a media object comprising panoramic photo data and second media data.
26. A photograph acquisition apparatus, comprising:
the audio data acquisition unit is used for acquiring audio data in the process of acquiring the first media data; the first media data comprises panoramic photo data, the panoramic photo data comprises a photo group, the photo group comprises at least two sub-photos, and the panoramic photo data comprises data in a picture format;
a second media data generating unit, configured to generate second media data according to the collected audio data, where the audio data includes a plurality of audio segments, and each audio segment corresponds to one or more sub-photos of the photo group; the second media data comprises data in audio format;
An association relation establishing unit, configured to establish an association relation between the first media data and the second media data;
the target object generating unit is used for generating a target object by taking the first media data and the second media data as different data blocks or tracks in the file with the same target format according to the association relation; the target object, when operated, outputs the first media data and the second media data through a processing tool for the target format.
27. A processing device for a media object, wherein the media object is generated by taking first media data and second media data with an association relationship as different data blocks or tracks in a file with the same target format; the first media data comprises panoramic photo data, and the second media data is generated according to audio data collected in the process of collecting the panoramic photo data; the panoramic photo data comprises data in a picture format, and the second media data comprises data in an audio format; the panoramic photo data comprises a photo group, and the photo group comprises at least two sub-photos; the audio data comprises a plurality of audio segments, wherein each audio segment corresponds to one or more sub-photos of the photo group; the device comprises:
an operation option providing unit for providing a first operation option for operating the media object through a processing tool for the target format;
and the object loading display unit is used for loading the panoramic photo data and the second media data when receiving an operation request for loading the media object through the first operation option, displaying the panoramic photo data and playing corresponding audio data content.
28. An electronic device, comprising:
one or more processors; and
a memory associated with the one or more processors, the memory being configured to store program instructions that, when read and executed by the one or more processors, perform the following operations:
collecting audio data in the process of collecting the first media data; the first media data comprises panoramic photo data, the panoramic photo data comprises a photo group, the photo group comprises at least two sub-photos, and the panoramic photo data comprises data in a picture format;
generating second media data according to the collected audio data, wherein the audio data comprises a plurality of audio segments, and each audio segment corresponds to one or more sub-photos of the photo group; the second media data comprises data in audio format;
establishing an association relationship between the first media data and the second media data;
according to the association relation, the first media data and the second media data are used as different data blocks or tracks in the file with the same target format so as to generate a target object; when the target object is operated, the first media data and the second media data are output through a processing tool for the target format.
CN201910785236.3A 2019-08-23 2019-08-23 Photo acquisition method, media object processing device and electronic equipment Active CN112422808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910785236.3A CN112422808B (en) 2019-08-23 2019-08-23 Photo acquisition method, media object processing device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910785236.3A CN112422808B (en) 2019-08-23 2019-08-23 Photo acquisition method, media object processing device and electronic equipment

Publications (2)

Publication Number Publication Date
CN112422808A CN112422808A (en) 2021-02-26
CN112422808B true CN112422808B (en) 2023-05-19

Family

ID=74780212

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910785236.3A Active CN112422808B (en) 2019-08-23 2019-08-23 Photo acquisition method, media object processing device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112422808B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105516596A (en) * 2015-12-30 2016-04-20 完美幻境(北京)科技有限公司 Method, device, and system for processing panoramic photography
CN106610982A (en) * 2015-10-22 2017-05-03 中兴通讯股份有限公司 Media file generation method and apparatus

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US9648295B2 (en) * 2014-07-18 2017-05-09 Pankaj Sharma System and methods for simultaneously capturing audio and image data for digital playback
CN105794197A (en) * 2014-07-28 2016-07-20 联发科技股份有限公司 Portable device capable of generating panoramic file
CN104601880B (en) * 2014-12-11 2018-03-27 广东欧珀移动通信有限公司 A kind of method and mobile terminal for generating distant view photograph
CN105959773B (en) * 2016-04-29 2019-06-18 魔方天空科技(北京)有限公司 The treating method and apparatus of multimedia file
CN106776836A (en) * 2016-11-25 2017-05-31 努比亚技术有限公司 Apparatus for processing multimedia data and method

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN106610982A (en) * 2015-10-22 2017-05-03 中兴通讯股份有限公司 Media file generation method and apparatus
CN105516596A (en) * 2015-12-30 2016-04-20 完美幻境(北京)科技有限公司 Method, device, and system for processing panoramic photography

Non-Patent Citations (1)

Title
A survey of *** architecture research for panoramic media; Luo Ying et al.; Telecommunications Science (《电信科学》); 2018-02-20 (No. 02); full text *

Also Published As

Publication number Publication date
CN112422808A (en) 2021-02-26

Similar Documents

Publication Publication Date Title
CN108900902B (en) Method, device, terminal equipment and storage medium for determining video background music
CN105845124B (en) Audio processing method and device
CN106488251B (en) Realize the method and device, main broadcaster's client and user client for connecting wheat in live streaming
CN108900771B (en) Video processing method and device, terminal equipment and storage medium
KR101789783B1 (en) Method, apparatus, program, and recording medium for prompting call
CN106165430A (en) Net cast method and device
JP6474393B2 (en) Music playback method, apparatus and terminal device based on face album
CN111246283B (en) Video playing method and device, electronic equipment and storage medium
EP3796317A1 (en) Video processing method, video playing method, devices and storage medium
CN109413478B (en) Video editing method and device, electronic equipment and storage medium
US20210266633A1 (en) Real-time voice information interactive method and apparatus, electronic device and storage medium
CN113065008A (en) Information recommendation method and device, electronic equipment and storage medium
CN110719530A (en) Video playing method and device, electronic equipment and storage medium
CN113411516B (en) Video processing method, device, electronic equipment and storage medium
WO2020135643A1 (en) Target character video clip playback method, system and apparatus, and storage medium
CN112532931A (en) Video processing method and device and electronic equipment
CN104268151A (en) Contact person grouping method and device
CN108881766B (en) Video processing method, device, terminal and storage medium
CN107872620B (en) Video recording method and device and computer readable storage medium
CN112087653A (en) Data processing method and device and electronic equipment
CN112764636A (en) Video processing method, video processing device, electronic equipment and computer-readable storage medium
CN111832455A (en) Method, device, storage medium and electronic equipment for acquiring content image
JP2016063477A (en) Conference system, information processing method and program
CN112422808B (en) Photo acquisition method, media object processing device and electronic equipment
CN112396675A (en) Image processing method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant