CN114092674A - Multimedia data analysis method and system - Google Patents

Multimedia data analysis method and system

Info

Publication number
CN114092674A
CN114092674A (application CN202210076247.6A; granted publication CN114092674B)
Authority
CN
China
Prior art keywords
information
content
scene
marking
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210076247.6A
Other languages
Chinese (zh)
Other versions
CN114092674B (en)
Inventor
易星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Peiruiweihang Interconnection Technology Co ltd
Original Assignee
Beijing Peiruiweihang Interconnection Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Peiruiweihang Interconnection Technology Co ltd filed Critical Beijing Peiruiweihang Interconnection Technology Co ltd
Priority to CN202210076247.6A
Publication of CN114092674A
Application granted
Publication of CN114092674B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Graphics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a multimedia data analysis method and system. Through the arrangement of a VR output module, an eye movement marking module, a media analysis module and a content selection module, high-freedom VR touring is realized for the user. Scene objects in the VR scene are marked through the eye movement marking module, and similar or identical content screened synchronously through the cloud is recommended and displayed to the user, who can also share touring impressions. A virtual touring interaction platform based on the cloud and VR is thereby built, enabling people to tour conveniently and purposefully according to their own interests and greatly improving the VR touring experience.

Description

Multimedia data analysis method and system
Technical Field
The invention relates to the field of data processing, and in particular to a multimedia data analysis method and system.
Background
Multimedia data generally refers to carrier types such as text files, audio, pictures and video. It can convey information to people intuitively and quickly, and with the rapid development of technologies such as the Internet and VR, its display modes have become progressively richer and its applications wider.
In the prior art, VR technology has been applied to online touring of scenic spots and exhibitions, and has been widely recognized and sought after because it offers a sense of presence that traditional multimedia methods lack.
However, existing VR touring is limited to simple, passive scene output, and the experience is monotonous: online touring of a large scene becomes tedious, the user cannot conveniently find and actively select content of interest, and fatigue easily sets in after prolonged use.
Disclosure of Invention
The present invention is directed to a multimedia data analysis method and system that solve the problems set forth in the background art.
To achieve the above object, the invention provides the following technical solution:
A multimedia data analysis system, applicable to scenarios such as VR touring and VR exhibitions, comprising:
a VR output module, used for acquiring and outputting multimedia content in real time, wherein the multimedia content is a local VR scene of a display scene;
an eye movement marking module, used for collecting and generating eye movement information of a user, analyzing the eye movement information to generate gaze direction information, receiving and responding to object marking information, and marking the corresponding scene content according to the multimedia content and the gaze direction information;
a media analysis module, used for extracting the marked scene content from the multimedia content, performing content identification analysis on the scene content, and generating a mark tag according to the analysis result, wherein the mark tag represents feature information of the scene content and is uploaded through a cloud server and synchronized as a cloud tag;
and a content selection module, used for comparing and screening the cloud tags of the display scene against the mark tag, and marking the display scene according to the position information corresponding to the cloud tags in the screening result.
As a further scheme of the invention: the eye movement marking module comprises:
an eye tracking unit, used for tracking the movement of the user's eyes through sensing and image acquisition equipment and generating a set of eye movement information of the user;
a simulated focus calculation unit, used for performing motion analysis on the set of eye movement information to generate gaze direction information, wherein the motion analysis performs a focus calculation of the user's gaze direction from the eye movement information, and the gaze direction information represents the direction in which the user's eyes are gazing;
a scene focusing unit, used for giving a focus prompt on the corresponding scene content in the multimedia content according to the gaze direction information, wherein the focus prompt shows the user the system's identification of the content being gazed at, and the scene content corresponds to unique position information;
and an object marking unit, used for receiving and responding to object marking information and marking the scene content under the focus prompt.
As a further scheme of the invention: the content selection module comprises:
a tag acquisition unit, used for accessing the cloud tags of the display scene through the cloud server;
a tag screening unit, used for comparing and screening the cloud tags against the content of the mark tag and generating a screening result, wherein the content coincidence rate between each cloud tag in the screening result and the mark tag is greater than or equal to a preset screening standard;
and a tag output unit, used for acquiring the position information corresponding to the cloud tags in the screening result, and marking and updating the display scene according to the position information.
As a further scheme of the invention: the VR output module includes:
a scene acquisition unit, used for acquiring scene model data and a data distribution map of the display scene, wherein the scene model data is arranged in correspondence with the data distribution map and represents the spatial model and image information of the display scene at a given position;
a motion simulation unit, used for generating an observation point in the data distribution map, receiving motion control information from a user terminal, and controlling and updating the position information of the observation point relative to the data distribution map according to the motion control information, wherein the observation point represents the simulated observation position of the user and includes direction information;
and a scene output unit, used for acquiring the corresponding scene model data according to the observation point, and rendering and outputting the scene model data.
As a further scheme of the invention: the eye movement marking module further comprises:
and a content marking unit, used for receiving and responding to message marking information and marking the scene content under the focus prompt, wherein the message marking information represents the user's subjective marking content.
As a further scheme of the invention: the media analysis module comprises a subjective tag unit, and the content selection module comprises a subjective screening unit;
the subjective tag unit is used for extracting the content of the message marking information to generate a subjective tag, and the subjective tag is uploaded through the cloud server and synchronized as a subjective cloud tag;
and the subjective screening unit is used for screening the subjective cloud tags against the subjective tag and marking the display scene accordingly.
An embodiment of the invention further provides a multimedia data analysis method, comprising the following steps:
acquiring and outputting multimedia content in real time, wherein the multimedia content is a local VR scene of a display scene;
collecting and generating eye movement information of a user, analyzing the eye movement information to generate gaze direction information, receiving and responding to object marking information, and marking the corresponding scene content according to the multimedia content and the gaze direction information;
extracting the marked scene content from the multimedia content, performing content identification analysis on the scene content, and generating a mark tag according to the analysis result, wherein the mark tag represents feature information of the scene content and is synchronized and updated through a cloud server;
and comparing and screening the cloud tags of the display scene against the mark tag, and marking the display scene according to the position information corresponding to the cloud tags in the screening result.
As a further scheme of the invention: the steps of collecting and generating the eyeball motion information of the user, analyzing and generating the gazing direction information according to the eyeball motion information, receiving and responding to object marking information, and marking the corresponding scene content according to the multimedia content and the gazing direction information specifically comprise:
the method comprises the steps that the eyeball of a user is tracked through sensing and image acquisition equipment, and a group of eyeball movement information of the user is generated;
performing motion analysis on the group of eyeball motion information to generate gaze direction information, wherein the motion analysis is used for performing focusing calculation on the gaze direction of the user according to the eyeball motion information, and the gaze direction information is used for representing the gaze direction of the eyeballs of the user;
focusing prompt is carried out on the corresponding scene content in the multimedia content according to the gazing direction information, and the focusing prompt is used for displaying the recognition result of the system on the gazing content of the user to the user;
and receiving and responding to object marking information, and marking the scene content of the focusing prompt.
As a further scheme of the invention: the step of comparing and screening the cloud tags of the display scene according to the labeled tags and labeling the display scene according to the orientation information corresponding to the cloud tags in the comparison and screening result specifically includes:
accessing a cloud tag of the display scene through the cloud server;
comparing and screening the cloud labels through the content of the marked labels to generate a screening result, wherein the content coincidence rate of the cloud labels and the marked labels in the screening result is greater than or equal to a preset screening standard;
and acquiring position information corresponding to the cloud label in the screening result, and marking and updating the display scene according to the position information.
Compared with the prior art, the invention has the following beneficial effects: through the arrangement of the VR output module, the eye movement marking module, the media analysis module and the content selection module, high-freedom VR touring is realized for the user. Scene objects in the VR scene are marked through the eye movement marking module, and similar or identical content screened synchronously through the cloud is recommended and displayed to the user, who can also share touring impressions. A virtual touring interaction platform based on the cloud and VR is thereby built, enabling people to tour more conveniently and purposefully according to their own interests and greatly improving the VR touring experience.
Drawings
Fig. 1 is a block diagram showing the configuration of the multimedia data analysis system.
Fig. 2 is a block diagram of the eye movement marking module in the multimedia data analysis system.
Fig. 3 is a block diagram of the content selection module in the multimedia data analysis system.
Fig. 4 is a block diagram of the VR output module in the multimedia data analysis system.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The technical solution of the present invention is described in further detail below with reference to specific embodiments.
As shown in Fig. 1, a multimedia data analysis system provided by an embodiment of the present invention is suitable for VR touring and VR exhibitions, and includes:
The VR output module 100 is used for acquiring and outputting multimedia content in real time, where the multimedia content is a local VR scene of the display scene.
The eye movement marking module 300 is used for collecting and generating eye movement information of a user, analyzing the eye movement information to generate gaze direction information, receiving and responding to object marking information, and marking the corresponding scene content according to the multimedia content and the gaze direction information.
The media analysis module 500 is used for extracting the marked scene content from the multimedia content, performing content identification analysis on the scene content, and generating a mark tag according to the analysis result, where the mark tag represents feature information of the scene content and is uploaded through a cloud server and synchronized as a cloud tag.
The content selection module 700 is used for comparing and screening the cloud tags of the display scene against the mark tag, and marking the display scene according to the position information corresponding to the cloud tags in the screening result.
In this embodiment, in use, the user first selects a display scene. For example, user A wants to view a certain park through a VR device, so the VR data content of the display scene, that is, the whole park, is downloaded and then output through the VR device; the data content of the part of the display scene output through the VR device at a given moment is defined here as the multimedia content (the above is realized by the VR output module 100). While user A tours the park display scene through the VR device, the sensing equipment tracks user A's eye movement, so that the scene object in the park being gazed at is determined through simulated calculation (realized by the eye movement marking module 300). For example, user A is gazing at a plant m in the park never seen before, takes a great interest in it, and wants to find and view other plants of the same kind. User A then marks the plant m through the control unit, and the system performs feature analysis on the marked plant m to generate a mark tag (for example, feature 1, feature 2 or a name), which is realized by the media analysis module 500. The system then accesses all tags of the park according to this mark tag, finds the identical or similar ones, marks them in the park display scene, and outputs them through the user's VR device to assist the user's tour of the park (realized by the content selection module 700). Here the sensing equipment, the VR device and the like are each functional components of the respective modules.
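To make the data flow between the four modules concrete, the following minimal Python sketch models one pass of the embodiment above: mark an object, analyse it into a mark tag, synchronize the tag to the cloud, and screen the cloud tags for matching positions. All class and function names (MarkTag, CloudTagStore, select_and_mark, the 0.8 threshold and so on) are illustrative assumptions and do not appear in the patent itself.

```python
from dataclasses import dataclass, field

@dataclass
class MarkTag:
    features: list            # e.g. ["feature 1", "feature 2", "plant m"]
    position: tuple           # position of the marked object in the display scene

@dataclass
class CloudTagStore:
    tags: list = field(default_factory=list)

    def upload(self, tag):
        """Media analysis module: the mark tag is uploaded and synchronized as a cloud tag."""
        self.tags.append(tag)

    def query(self):
        return list(self.tags)

def analyse_marked_content(features, position):
    """Media analysis module: turn the feature information of the marked scene content into a mark tag."""
    return MarkTag(features=features, position=position)

def select_and_mark(user_tag, cloud, threshold=0.8):
    """Content selection module: return positions of cloud tags whose content
    coincidence rate with the user's mark tag meets the preset screening standard."""
    positions = []
    for tag in cloud.query():
        overlap = len(set(user_tag.features) & set(tag.features)) / max(len(tag.features), 1)
        if overlap >= threshold:
            positions.append(tag.position)   # these positions are marked in the VR scene
    return positions

# Example: user A marks plant m; two other plants of the same kind are found.
cloud = CloudTagStore()
cloud.upload(MarkTag(["plant", "white flower", "broad leaf"], (12.0, 48.5)))
cloud.upload(MarkTag(["plant", "white flower", "broad leaf"], (90.2, 15.1)))
cloud.upload(MarkTag(["statue", "bronze"], (55.0, 55.0)))
mark = analyse_marked_content(["plant", "white flower", "broad leaf"], (3.0, 7.0))
print(select_and_mark(mark, cloud))          # -> [(12.0, 48.5), (90.2, 15.1)]
```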
As shown in Fig. 2, as another preferred embodiment of the present invention, the eye movement marking module 300 includes:
An eye tracking unit 301, used for tracking the movement of the user's eyes through sensing and image acquisition equipment and generating a set of eye movement information of the user.
A simulated focus calculation unit 302, used for performing motion analysis on the set of eye movement information to generate gaze direction information, where the motion analysis performs a focus calculation of the user's gaze direction from the eye movement information, and the gaze direction information represents the direction in which the user's eyes are gazing.
A scene focusing unit 303, used for giving a focus prompt on the corresponding scene content in the multimedia content according to the gaze direction information, where the focus prompt shows the user the system's identification of the content being gazed at, and the scene content corresponds to unique position information.
An object marking unit 304, used for receiving and responding to object marking information and marking the scene content under the focus prompt.
In this embodiment, the functions of the eye movement marking module 300 are divided in more detail and some terms are explained. Eye tracking of the user can be realized by arranging a camera and sensors in the VR device. The focus prompt here can be understood as a frame-selection marker around an object displayed in the output image of the user's VR device, and this frame-selection marker changes as the user's viewing direction is continuously adjusted.
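As a rough illustration of how gaze direction information could be turned into a focus prompt, the sketch below averages recent eye-movement samples into a gaze ray and frame-selects the first scene object the ray intersects. Reducing objects to bounding spheres, and the function names themselves, are assumptions made for this example; the patent does not specify the intersection test.

```python
import numpy as np

def gaze_direction(eye_samples):
    """Simulated focus calculation: fit recent eye-movement samples (an N x 3 array)
    into a single unit gaze direction vector."""
    direction = np.asarray(eye_samples, dtype=float).mean(axis=0)
    return direction / np.linalg.norm(direction)

def focused_object(origin, direction, objects):
    """Scene focusing: return the nearest object hit by the gaze ray, or None.
    Each object is a dict with "id", "center" and "radius" (bounding sphere)."""
    origin = np.asarray(origin, dtype=float)
    best, best_t = None, float("inf")
    for obj in objects:
        to_center = np.asarray(obj["center"], dtype=float) - origin
        t = float(np.dot(to_center, direction))
        if t <= 0:
            continue                          # object lies behind the viewer
        closest = origin + t * direction
        if np.linalg.norm(closest - np.asarray(obj["center"])) <= obj["radius"] and t < best_t:
            best, best_t = obj, t
    return best                               # the VR output draws a selection frame around it
```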
As shown in Fig. 3, as another preferred embodiment of the present invention, the content selection module 700 includes:
A tag acquisition unit 701, used for accessing the cloud tags of the display scene through the cloud server.
A tag screening unit 702, used for comparing and screening the cloud tags against the content of the mark tag and generating a screening result, where the content coincidence rate between each cloud tag in the screening result and the mark tag is greater than or equal to a preset screening standard.
A tag output unit 703, used for acquiring the position information corresponding to the cloud tags in the screening result, and marking and updating the display scene according to the position information.
In this embodiment, the content selection module 700 is explained in more detail and its terms are defined. During screening, the cloud tags are judged against a preset screening standard, and the user may adjust this standard and the presentation of the screening results according to their own needs. For example, the matching standard may require a tag coincidence rate of eighty percent; the matching tags are then displayed only up to a certain number within a circular range around the user's current position in the display scene, while tags outside that range are hidden. These settings can be customized freely.
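The following sketch spells out that screening rule under the example settings stated above: an eighty percent content coincidence rate, a circular display radius around the user's current position, and a cap on the number of tags shown. The function name, the radius value and the cap are assumptions chosen for illustration.

```python
import math

def screen_cloud_tags(mark_features, cloud_tags, user_pos,
                      coincidence_standard=0.8, display_radius=200.0, max_shown=10):
    """cloud_tags: iterable of (features, position) pairs.
    Returns the positions to mark and display in the user's VR scene."""
    shown = []
    for features, pos in cloud_tags:
        # content coincidence rate between the cloud tag and the user's mark tag
        rate = len(set(mark_features) & set(features)) / max(len(features), 1)
        if rate < coincidence_standard:
            continue
        if math.dist(user_pos, pos) > display_radius:
            continue                     # outside the circular range: hidden, not displayed
        shown.append(pos)
        if len(shown) >= max_shown:
            break
    return shown

# Example: only the nearby, sufficiently similar tag is shown.
tags = [({"plant", "white flower"}, (10.0, 20.0)),
        ({"plant", "white flower"}, (900.0, 900.0)),
        ({"bench", "wood"}, (15.0, 25.0))]
print(screen_cloud_tags({"plant", "white flower"}, tags, (0.0, 0.0)))  # -> [(10.0, 20.0)]
```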
As shown in Fig. 4, as another preferred embodiment of the present invention, the VR output module 100 includes:
The scene acquisition unit 101 is used for acquiring scene model data and a data distribution map of the display scene, where the scene model data is arranged in correspondence with the data distribution map and represents the spatial model and image information of the display scene at a given position.
The motion simulation unit 102 is used for generating an observation point in the data distribution map, receiving motion control information from the user, and controlling and updating the position information of the observation point relative to the data distribution map according to the motion control information, where the observation point represents the simulated observation position of the user and includes direction information.
The scene output unit 103 is used for acquiring the corresponding scene model data according to the observation point, and rendering and outputting the scene model data.
In this embodiment, the VR output module 100 is further described. When a user tours a display scene with VR equipment, the content that can be acquired and viewed at any given moment is limited, namely a sector corresponding to the human field of view. It is therefore necessary to simulate the user actually walking through the park by establishing, in the display scene, an observation point that moves under the user's control. The motion simulation unit 102 updates the position and orientation of this observation point in the display scene, and the corresponding scene model data is then acquired and output to the user's VR equipment.
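A minimal sketch of that observation point follows: its position and heading on the data distribution map are updated from motion-control input, and only the map cells inside the resulting field-of-view sector are fetched for the scene output unit. The field names, the 90-degree field of view and the 50-unit range are assumptions for the example.

```python
import math
from dataclasses import dataclass

@dataclass
class ObservationPoint:
    x: float
    y: float
    heading: float                       # radians; direction the simulated viewer faces

    def apply_motion(self, forward, turn):
        """Motion simulation: update the point from one motion-control message."""
        self.heading = (self.heading + turn) % (2 * math.pi)
        self.x += forward * math.cos(self.heading)
        self.y += forward * math.sin(self.heading)

def visible_cells(point, cells, fov=math.radians(90), max_range=50.0):
    """Return the scene model data of map cells inside the viewer's field-of-view sector.
    cells: iterable of (x, y, scene_model_data)."""
    visible = []
    for cx, cy, data in cells:
        dx, dy = cx - point.x, cy - point.y
        if math.hypot(dx, dy) > max_range:
            continue
        # smallest unsigned angle between the cell direction and the heading
        angle = abs((math.atan2(dy, dx) - point.heading + math.pi) % (2 * math.pi) - math.pi)
        if angle <= fov / 2:
            visible.append(data)         # handed to the scene output unit for rendering
    return visible

# Example: walk forward 10 units facing east, then fetch what is in view.
p = ObservationPoint(0.0, 0.0, 0.0)
p.apply_motion(forward=10.0, turn=0.0)
print(visible_cells(p, [(30.0, 2.0, "lake"), (-20.0, 0.0, "gate")]))  # -> ['lake']
```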
As another preferred embodiment of the present invention, the eye movement marking module 300 further includes:
A content marking unit, used for receiving and responding to message marking information and marking the scene content under the focus prompt, where the message marking information represents the user's subjective marking content.
Further, the media analysis module 500 includes a subjective tag unit, and the content selection module 700 includes a subjective screening unit.
The subjective tag unit is used for extracting the content of the message marking information to generate a subjective tag, and the subjective tag is uploaded through the cloud server and synchronized as a subjective cloud tag.
The subjective screening unit is used for screening the subjective cloud tags against the subjective tag and marking the display scene accordingly.
In this embodiment, message marking information and subjective tags are newly introduced. Their function is comparable to a comment or review feature: for example, after user A marks the plant m, message marking information can also be set to share impressions with people who later see the mark. At the same time, this description of the plant serves as the basis for generating a subjective tag, which is used to further screen other similar scenery in the park and thereby widen the screening range.
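As a rough illustration of how a left message could become a subjective tag that widens the screening, the sketch below extracts keywords from the message text and adds the positions of any subjective cloud tags that share a keyword. The stop-word list and keyword extraction are deliberately naive stand-ins for the content extraction the patent leaves unspecified.

```python
import re

STOPWORDS = {"the", "a", "an", "is", "it", "of", "and", "this", "very", "i", "so"}

def subjective_tag(message):
    """Subjective tag unit: extract the content of the message marking information."""
    words = re.findall(r"[a-z']+", message.lower())
    return {w for w in words if w not in STOPWORDS}

def widened_screen(base_positions, user_subjective, subjective_cloud_tags):
    """Subjective screening unit: add positions whose subjective cloud tags
    share at least one keyword with the user's subjective tag."""
    extra = [pos for keywords, pos in subjective_cloud_tags if user_subjective & keywords]
    return list(base_positions) + extra

# Example: the message about plant m also pulls in a similarly described flower bed.
tag = subjective_tag("This white flower is so fragrant and delicate")
cloud = [({"fragrant", "rose", "garden"}, (40.0, 12.0)),
         ({"bronze", "statue"}, (70.0, 70.0))]
print(widened_screen([(10.0, 20.0)], tag, cloud))  # -> [(10.0, 20.0), (40.0, 12.0)]
```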
As shown in Fig. 4, the present invention also provides a multimedia data analysis method, comprising the following steps:
Acquiring and outputting multimedia content in real time, wherein the multimedia content is a local VR scene of a display scene.
Collecting and generating eye movement information of a user, analyzing the eye movement information to generate gaze direction information, receiving and responding to object marking information, and marking the corresponding scene content according to the multimedia content and the gaze direction information.
Extracting the marked scene content from the multimedia content, performing content identification analysis on the scene content, and generating a mark tag according to the analysis result, wherein the mark tag represents feature information of the scene content and is synchronized and updated through a cloud server.
Comparing and screening the cloud tags of the display scene against the mark tag, and marking the display scene according to the position information corresponding to the cloud tags in the screening result.
As another preferred embodiment of the present invention, the step of collecting and generating eye movement information of the user, analyzing the eye movement information to generate gaze direction information, receiving and responding to object marking information, and marking the corresponding scene content according to the multimedia content and the gaze direction information specifically comprises:
Tracking the movement of the user's eyes through sensing and image acquisition equipment to generate a set of eye movement information of the user.
Performing motion analysis on the set of eye movement information to generate gaze direction information, wherein the motion analysis performs a focus calculation of the user's gaze direction from the eye movement information, and the gaze direction information represents the direction in which the user's eyes are gazing.
Giving a focus prompt on the corresponding scene content in the multimedia content according to the gaze direction information, wherein the focus prompt shows the user the system's identification of the content being gazed at.
Receiving and responding to object marking information, and marking the scene content under the focus prompt.
As another preferred embodiment of the present invention, the step of comparing and screening the cloud tags of the display scene against the mark tag, and marking the display scene according to the position information corresponding to the cloud tags in the screening result, specifically comprises:
Accessing the cloud tags of the display scene through the cloud server.
Comparing and screening the cloud tags against the content of the mark tag to generate a screening result, wherein the content coincidence rate between each cloud tag in the screening result and the mark tag is greater than or equal to a preset screening standard.
Acquiring the position information corresponding to the cloud tags in the screening result, and marking and updating the display scene according to the position information.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the program is executed. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (9)

1. A multimedia data analysis system, applicable to VR touring and VR exhibition scenarios, characterized by comprising:
a VR output module, used for acquiring and outputting multimedia content in real time, wherein the multimedia content is a local VR scene of a display scene;
an eye movement marking module, used for collecting and generating eye movement information of a user, analyzing the eye movement information to generate gaze direction information, receiving and responding to object marking information, and marking the corresponding scene content according to the multimedia content and the gaze direction information;
a media analysis module, used for extracting the marked scene content from the multimedia content, performing content identification analysis on the scene content, and generating a mark tag according to the analysis result, wherein the mark tag represents feature information of the scene content and is uploaded through a cloud server and synchronized as a cloud tag;
and a content selection module, used for comparing and screening the cloud tags of the display scene against the mark tag, and marking the display scene according to the position information corresponding to the cloud tags in the screening result.
2. The multimedia data analysis system of claim 1, wherein the eye movement marking module comprises:
an eye tracking unit, used for tracking the movement of the user's eyes through sensing and image acquisition equipment and generating a set of eye movement information of the user;
a simulated focus calculation unit, used for performing motion analysis on the set of eye movement information to generate gaze direction information, wherein the motion analysis performs a focus calculation of the user's gaze direction from the eye movement information, and the gaze direction information represents the direction in which the user's eyes are gazing;
a scene focusing unit, used for giving a focus prompt on the corresponding scene content in the multimedia content according to the gaze direction information, wherein the focus prompt shows the user the system's identification of the content being gazed at, and the scene content corresponds to unique position information;
and an object marking unit, used for receiving and responding to object marking information and marking the scene content under the focus prompt.
3. The multimedia data analysis system of claim 2, wherein the content selection module comprises:
a tag acquisition unit, used for accessing the cloud tags of the display scene through the cloud server;
a tag screening unit, used for comparing and screening the cloud tags against the content of the mark tag and generating a screening result, wherein the content coincidence rate between each cloud tag in the screening result and the mark tag is greater than or equal to a preset screening standard;
and a tag output unit, used for acquiring the position information corresponding to the cloud tags in the screening result, and marking and updating the display scene according to the position information.
4. The multimedia data analysis system of claim 1, wherein the VR output module comprises:
a scene acquisition unit, used for acquiring scene model data and a data distribution map of the display scene, wherein the scene model data is arranged in correspondence with the data distribution map and represents the spatial model and image information of the display scene at a given position;
a motion simulation unit, used for generating an observation point in the data distribution map, receiving motion control information from a user terminal, and controlling and updating the position information of the observation point relative to the data distribution map according to the motion control information, wherein the observation point represents the simulated observation position of the user and includes direction information;
and a scene output unit, used for acquiring the corresponding scene model data according to the observation point, and rendering and outputting the scene model data.
5. The multimedia data analysis system of claim 2, wherein the eye movement marking module further comprises:
a content marking unit, used for receiving and responding to message marking information and marking the scene content under the focus prompt, wherein the message marking information represents the user's subjective marking content.
6. The system of claim 5, wherein the media analysis module comprises a subjective tag unit and the content selection module comprises a subjective screening unit;
the subjective tag unit is used for extracting the content of the message marking information to generate a subjective tag, and the subjective tag is uploaded through the cloud server and synchronized as a subjective cloud tag;
and the subjective screening unit is used for screening the subjective cloud tags against the subjective tag and marking the display scene accordingly.
7. A multimedia data analysis method, characterized by comprising the following steps:
acquiring and outputting multimedia content in real time, wherein the multimedia content is a local VR scene of a display scene;
collecting and generating eye movement information of a user, analyzing the eye movement information to generate gaze direction information, receiving and responding to object marking information, and marking the corresponding scene content according to the multimedia content and the gaze direction information;
extracting the marked scene content from the multimedia content, performing content identification analysis on the scene content, and generating a mark tag according to the analysis result, wherein the mark tag represents feature information of the scene content and is synchronized and updated through a cloud server;
and comparing and screening the cloud tags of the display scene against the mark tag, and marking the display scene according to the position information corresponding to the cloud tags in the screening result.
8. The method according to claim 7, wherein the step of collecting and generating eye movement information of the user, analyzing the eye movement information to generate gaze direction information, receiving and responding to object marking information, and marking the corresponding scene content according to the multimedia content and the gaze direction information specifically comprises:
tracking the movement of the user's eyes through sensing and image acquisition equipment to generate a set of eye movement information of the user;
performing motion analysis on the set of eye movement information to generate gaze direction information, wherein the motion analysis performs a focus calculation of the user's gaze direction from the eye movement information, and the gaze direction information represents the direction in which the user's eyes are gazing;
giving a focus prompt on the corresponding scene content in the multimedia content according to the gaze direction information, wherein the focus prompt shows the user the system's identification of the content being gazed at;
and receiving and responding to object marking information, and marking the scene content under the focus prompt.
9. The multimedia data analysis method according to claim 8, wherein the step of comparing and screening the cloud tags of the display scene against the mark tag, and marking the display scene according to the position information corresponding to the cloud tags in the screening result, specifically comprises:
accessing the cloud tags of the display scene through the cloud server;
comparing and screening the cloud tags against the content of the mark tag to generate a screening result, wherein the content coincidence rate between each cloud tag in the screening result and the mark tag is greater than or equal to a preset screening standard;
and acquiring the position information corresponding to the cloud tags in the screening result, and marking and updating the display scene according to the position information.
CN202210076247.6A 2022-01-24 2022-01-24 Multimedia data analysis method and system Active CN114092674B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210076247.6A CN114092674B (en) 2022-01-24 2022-01-24 Multimedia data analysis method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210076247.6A CN114092674B (en) 2022-01-24 2022-01-24 Multimedia data analysis method and system

Publications (2)

Publication Number Publication Date
CN114092674A 2022-02-25
CN114092674B 2022-04-22

Family

ID=80309163

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210076247.6A Active CN114092674B (en) 2022-01-24 2022-01-24 Multimedia data analysis method and system

Country Status (1)

Country Link
CN (1) CN114092674B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106843468A (en) * 2016-12-27 2017-06-13 努比亚技术有限公司 A kind of man-machine interaction method in terminal and VR scenes
CN107861629A (en) * 2017-12-20 2018-03-30 杭州埃欧哲建设工程咨询有限公司 A kind of practice teaching method based on VR
CN108958460A (en) * 2017-05-19 2018-12-07 深圳市掌网科技股份有限公司 Building sand table methods of exhibiting and system based on virtual reality
US20190130647A1 (en) * 2017-09-27 2019-05-02 Goertek Technology Co.,Ltd. Display control method and system, and virtual reality device
CN110285818A (en) * 2019-06-28 2019-09-27 武汉大学 A kind of Relative Navigation of eye movement interaction augmented reality
CN111105294A (en) * 2019-12-20 2020-05-05 武汉市奥拓智能科技有限公司 VR navigation method, system, client, server and storage medium thereof
CN112666714A (en) * 2015-08-07 2021-04-16 托比股份公司 Gaze direction mapping
CN113194410A (en) * 2021-04-28 2021-07-30 云景文旅科技有限公司 5G and virtual augmented reality fused tourism information processing method and system

Also Published As

Publication number Publication date
CN114092674B (en) 2022-04-22

Similar Documents

Publication Publication Date Title
DE102009049849B4 (en) Method for determining the pose of a camera, method for recognizing an object in a real environment and method for creating a data model
US10984602B1 (en) Facial expression tracking during augmented and virtual reality sessions
US20130022947A1 (en) Method and system for generating behavioral studies of an individual
CN114332374A (en) Virtual display method, equipment and storage medium
CN112040273B (en) Video synthesis method and device
WO2017177259A1 (en) System and method for processing photographic images
US20200257121A1 (en) Information processing method, information processing terminal, and computer-readable non-transitory storage medium storing program
CN113435236A (en) Home old man posture detection method, system, storage medium, equipment and application
CN113112612A (en) Positioning method and system for dynamic superposition of real person and mixed reality
CN112884556A (en) Shop display method, system, equipment and medium based on mixed reality
CN113641836A (en) Display method and related equipment thereof
CN117333645A (en) Annular holographic interaction system and equipment thereof
CN115933930A (en) Method, terminal and device for analyzing attention of learning object in education meta universe
CN115130493A (en) Face deformation recommendation method, device, equipment and medium based on image recognition
CN114387679A (en) System and method for realizing sight line estimation and attention analysis based on recursive convolutional neural network
CN114092674B (en) Multimedia data analysis method and system
CN109191229A (en) Augmented reality ornament recommended method and device
KR20160136833A (en) medical education system using video contents
CN117333644A (en) Virtual reality display picture generation method, device, equipment and medium
US20230103116A1 (en) Content utilization platform system and method of producing augmented reality (ar)-based image output
CN110866168A (en) Information recommendation method and device, terminal and server
Shasha et al. Object Recognition of Environmental Information in the Internet of Things Based on Augmented Reality
CN116484091B (en) Card information program interaction method and device
CN118364340B (en) Student course short board positioning method, system and storage medium based on deep learning
CN116091147A (en) Product recommendation method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant