CN111768729A - VR scene automatic explanation method, system and storage medium - Google Patents

VR scene automatic explanation method, system and storage medium

Info

Publication number
CN111768729A
CN111768729A (application CN201910263029.1A)
Authority
CN
China
Prior art keywords
target object
picture
scene
target
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910263029.1A
Other languages
Chinese (zh)
Inventor
邓涛
周鹏
冀德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Chuansong Technology Co ltd
Original Assignee
Beijing Chuansong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Chuansong Technology Co ltd
Priority to CN201910263029.1A
Publication of CN111768729A
Legal status: Pending

Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09F DISPLAYING; ADVERTISING; SIGNS; LABELS OR NAME-PLATES; SEALS
    • G09F25/00 Audible advertising
    • G09F27/00 Combined visual and audible advertising or displaying, e.g. for public address
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a VR scene automatic explanation method, system and storage medium, comprising: acquiring three-dimensional data of target objects in the VR scene to be explained and the categories corresponding to the three-dimensional data, and training a scene content identification module on the target features of the three-dimensional data; acquiring in real time the VR picture viewed by the user and inputting it to the scene content identification module to judge whether a target object exists in the VR picture; if a target object exists, playing the commentary content corresponding to the category of that target object to the user; if no target object exists, continuing to acquire in real time the VR picture viewed by the user, inputting it to the scene content identification module, and judging whether a target object exists in it. The method imposes no requirements on how the original VR scene is made and extracts VR pictures without modifying the original VR application. The invention realizes scene classification by means such as feature recognition or machine learning.

Description

VR scene automatic explanation method, system and storage medium
Technical Field
The invention relates to the fields of Virtual Reality (VR) and artificial intelligence, and in particular to a method, system and storage medium for automatically explaining VR scenes.
Background
Self-guided commentary systems are already on the market and are commonly used in large scenic spots or exhibition halls. Such systems usually acquire the user's coordinate position through a GPS positioning module or radio frequency identification technology and then play pre-recorded voice content. For example, patent 01274768.8 discloses an electronic tour guide device that locates the user's position through radio or infrared coding, and patent 200310110653.7 uses GPS positioning technology to locate the tourist and then plays the corresponding audio material. These self-guided commentary systems are mostly built for the real, physical world; whether based on radio frequency positioning or GPS positioning, the techniques are limited to the real world and are ineffective for VR scenes.
In recent years, more and more VR applications have come into public view, such as VR games, VR exhibition halls, VR education, VR scenic spots, and VR experiences in home-decoration scenarios. In such VR applications, text prompts are often placed in the VR picture to help users along, but because the prompt information is hard-coded into the program at design time, every user gets the same experience and personalization is difficult to achieve. In other VR experiences, a member of staff stands beside the user and verbally explains some of the content in the current VR picture during the experience. This kind of arrangement greatly reduces the immersion of VR.
For example, for an existing VR scene of a show home in the home-decoration field, the traditional solution is that while a client experiences the VR scene, a member of staff gives a targeted explanation to the client based on the picture mirrored to a screen. If 100 clients visit in a day, the staff member has to give the explanation 100 times, which greatly increases communication cost. An alternative is for the staff member to record the commentary in advance and control playback of the corresponding recording while the user experiences the VR scene; however, since the client may not follow a fixed sequence during the VR experience, it is difficult to control the voice playback order well.
Disclosure of Invention
To solve the above technical problems, the invention aims to provide an external self-guided commentary system applied to VR scenes, which can automatically identify targets from the real-time VR picture and play the corresponding voice content without interfering with the operation of the original VR scene. Different staff members can record their own commentary schemes for different styles or different objects in the scene.
Specifically, the invention discloses a VR scene automatic explanation method, comprising the following steps:
step 1, acquiring three-dimensional data of the target objects in the VR scene to be explained and their corresponding categories, and training a scene content identification module on the target features of the three-dimensional data;
step 2, acquiring in real time the VR picture viewed by the user, inputting the VR picture to the scene content identification module, and judging whether a target object exists in the VR picture; if so, executing step 3, otherwise continuing to execute step 2;
step 3, playing the commentary content corresponding to the category of the target object to the user, and executing step 2 after playback of the commentary content finishes.
In the VR scene automatic explanation method, the training process of the scene content identification module in step 1 specifically comprises:
for the three-dimensional data of each target object, capturing pictures of the target object from multiple angles, cropping in each picture the target image by the maximum bounding box of the target object to serve as a template of that object, obtaining the SURF feature points and their feature description vectors for each template and storing them in a dictionary, recording in the dictionary all target objects and the category corresponding to each target object, and saving the dictionary as the judgment basis of the scene content identification module.
In the VR scene automatic explanation method, the process by which the scene content identification module judges whether a target object exists in the VR picture in step 2 specifically comprises:
cropping the image in the visual center area of the VR picture as the target image, inputting all SURF feature points and feature description vectors of the target image to the scene content identification module, calculating the number of feature matches between the feature description vectors of the target image and the features of each category in the dictionary, and judging whether a target object exists in the VR picture by comparing this number with a preset value.
In the VR scene automatic explanation method, the training process of the scene content identification module in step 1 may alternatively comprise:
for each target object, capturing multiple images of the target object from multiple angles and distances to form a sample set, enlarging the sample set by rotating the sample images and/or randomly resizing them and/or dithering their colors and/or changing their brightness and contrast, dividing the sample set into a training sample set and a test sample set at a preset ratio, and generating annotation data text;
training a convolutional neural network model on the training sample set, with the number of nodes of its last fully connected layer changed to the number of target categories, verifying the model with the test sample set during training, and saving the convolutional neural network model as the scene content identification module once it passes verification;
in this case, the process by which the scene content identification module judges whether a target object exists in the VR picture in step 2 specifically comprises:
cropping the image in the visual center area of the VR picture as the target image and sending it to the scene content identification module for recognition, so as to judge whether a target object exists in the VR picture.
The invention also discloses a VR scene automatic explanation system, comprising:
module 1, which acquires three-dimensional data of the target objects in the VR scene to be explained and their corresponding categories, and trains a scene content identification module on the target features of the three-dimensional data;
module 2, which acquires in real time the VR picture viewed by the user, inputs the VR picture to the scene content identification module, and judges whether a target object exists in the VR picture; if so, module 3 is executed, otherwise module 2 continues to execute;
module 3, which plays the commentary content corresponding to the category of the target object to the user, and executes module 2 after playback of the commentary content finishes.
In the VR scene automatic explanation system, the training process of the scene content identification module in module 1 specifically comprises:
for the three-dimensional data of each target object, capturing pictures of the target object from multiple angles, cropping in each picture the target image by the maximum bounding box of the target object to serve as a template of that object, obtaining the SURF feature points and their feature description vectors for each template and storing them in a dictionary, recording in the dictionary all target objects and the category corresponding to each target object, and saving the dictionary as the judgment basis of the scene content identification module.
In the VR scene automatic explanation system, the process by which the scene content identification module judges whether a target object exists in the VR picture in module 2 specifically comprises:
cropping the image in the visual center area of the VR picture as the target image, inputting all SURF feature points and feature description vectors of the target image to the scene content identification module, calculating the number of feature matches between the feature description vectors of the target image and the features of each category in the dictionary, and judging whether a target object exists in the VR picture by comparing this number with a preset value.
In the VR scene automatic explanation system, the training process of the scene content identification module in module 1 may alternatively comprise:
for each target object, capturing multiple images of the target object from multiple angles and distances to form a sample set, enlarging the sample set by rotating the sample images and/or randomly resizing them and/or dithering their colors and/or changing their brightness and contrast, dividing the sample set into a training sample set and a test sample set at a preset ratio, and generating annotation data text;
training a convolutional neural network model on the training sample set, with the number of nodes of its last fully connected layer changed to the number of target categories, verifying the model with the test sample set during training, and saving the convolutional neural network model as the scene content identification module once it passes verification;
in this case, the process by which the scene content identification module judges whether a target object exists in the VR picture in module 2 specifically comprises:
cropping the image in the visual center area of the VR picture as the target image and sending it to the scene content identification module for recognition, so as to judge whether a target object exists in the VR picture.
The invention also discloses an implementation method for the above VR scene automatic explanation system.
The invention also discloses a storage medium storing a program for executing the above VR scene automatic explanation method.
The invention can be applied to all existing VR scenes and has the following technical advantages:
(1) External picture extraction. The method imposes no requirements on how the original VR scene is made, and extracts VR pictures without modifying the original VR application.
(2) Intelligent VR scene recognition. The invention realizes scene classification by means such as feature recognition or machine learning.
Drawings
Fig. 1 is a framework diagram of the external VR self-guided explanation system.
Detailed Description
The invention realizes a real-time VR intelligent commentary system, which uses computer vision techniques to classify and identify objects in the real-time VR scene picture and then plays the corresponding commentary recording according to the recognition result. The system comprises three modules, and the general workflow is as follows. (1) VR picture extraction module. The VR picture is captured by a suitable method; for example, the rendered VR picture can be captured by means of the OpenVR SDK or an OpenGL/DirectX rendering engine, or the picture displayed on the screen or glasses by the VR program can be collected with a video capture card. (2) VR scene identification module. Let N be the number of scenes requiring automatic explanation in the VR scene; a scene content identification module is trained by extracting and integrating the features of each target. While the system runs, the VR picture captured in real time is input to the scene classification model for scene recognition and classification. (3) Voice playing module. A corresponding recording is made for each scene in advance, and during system operation the recording matching the scene classification is retrieved and played.
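As a minimal illustration of how these three modules could be wired together (a sketch only, not part of the patent text), one possible driver loop in Python is shown below; the callables capture_frame, classify_view and play_commentary are placeholder names assumed here, and concrete sketches for each stage follow in the detailed description.

```python
# Hypothetical driver loop tying together the three modules described above:
# (1) VR picture extraction, (2) scene content identification, (3) voice playback.
# The three callables are placeholders standing in for the concrete stages.
import time


def commentary_loop(capture_frame, classify_view, play_commentary, interval_s=2.0):
    while True:
        frame = capture_frame()              # (1) grab the current VR picture
        category = classify_view(frame)      # (2) identify the object at the visual centre
        if category is not None:
            play_commentary(category)        # (3) play the recording for that category
        time.sleep(interval_s)               # frames sampled every 1-3 s, as in the text
```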
In order to make the aforementioned features and effects of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
The external VR scene self-guided explanation system mainly comprises three modules: a VR picture extraction module, a VR scene content recognition module and a voice playing module (see Fig. 1). The detailed workflow of the system is as follows:
1. VR picture extraction module. Most existing VR applications run on a computer; after the VR scene content is rendered on the computer, the picture is transmitted to the VR headset for display over HDMI or wirelessly. Without interfering with the original VR application, there are various ways to extract the VR picture. Common means include: a. using an HDMI splitter to obtain the picture at the current moment and sending it back to the computer through a video capture card; b. programming against the HDMI driver and capturing the VR picture directly from the computer's HDMI signal; c. for VR applications with a desktop mirroring function, taking a screenshot of the current screen to obtain the VR picture; d. for VR applications built on common frameworks (such as OpenVR), obtaining the VR picture through the corresponding SDK. The acquired VR picture is passed to the scene content identification module. Considering that in practical use a user will not switch views rapidly and frequently, the VR picture is captured at an interval of 1 to 3 seconds.
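For illustration, a minimal sketch of option c (screen capture of a desktop-mirrored VR window) is given below; the use of the mss and OpenCV libraries and the process_frame callback are assumptions made for this sketch rather than part of the patent.

```python
# Sketch of option c: periodically grab the desktop-mirrored VR picture and pass it on.
# Assumes the mss and opencv-python packages; process_frame stands in for the
# scene content identification stage.
import time

import cv2
import numpy as np
from mss import mss


def capture_loop(process_frame, interval_s=2.0):
    """Capture the primary monitor every interval_s seconds (1-3 s per the text)."""
    with mss() as grabber:
        monitor = grabber.monitors[1]                      # primary monitor
        while True:
            shot = grabber.grab(monitor)                   # raw BGRA screenshot
            frame = cv2.cvtColor(np.array(shot), cv2.COLOR_BGRA2BGR)
            process_frame(frame)                           # hand off to scene identification
            time.sleep(interval_s)
```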
2. VR scene content identification module. This module analyzes the scene content and identifies and classifies the objects in the picture, for example the objects within the visual center region. Object classification and recognition from image content is in most cases based on machine vision methods. The invention provides two solutions; suppose a given VR scene contains M target objects that need to be recognized and explained.
2.1. Object recognition scheme based on image features. The specific implementation process is as follows:
For each target object, one picture is captured from each of 8 directions, and in each picture the target image is cropped by the maximum bounding box of the target object to serve as a template of that object. SURF (Speeded-Up Robust Features) feature points and their feature description vectors are obtained for each target template. SURF is a robust local feature point detection and description algorithm; it is an improvement on the Scale-Invariant Feature Transform (SIFT) algorithm, its main advantages being higher speed and better suitability for real-time feature detection. Feature points are points of interest in the image, and feature description vectors describe the neighborhoods of the feature points. Given two feature vectors V1 and V2 taken from two images I1 and I2 respectively, if the vectors are close under the metric function, their corresponding feature points are considered a successful match.
Thus each target object corresponds to 8 feature description templates fi1, fi2, ..., fi8, which are stored in a dictionary D. The dictionary D has M elements; the key of each element is the classification Ci of an object, and the value is an array consisting of the feature description vectors of all feature points in the 8 template images under that classification:
D = {Ci : Fi}
Fi = [fi1, fi2, ..., fi8]
Here a dictionary is a key-value data structure whose basic elements are used for retrieval: the key is an index and the value is data. D is a dictionary whose key-value pairs are {Ci : Fi}, where Ci is the key and Fi is the value, with i ∈ [0, M) denoting the i-th object; Ci is the category of the i-th object, Fi is its array of feature vectors, and fi1 through fi8 are the 8 sets of feature description vectors. With M target objects in total, D contains M key-value pairs, one key-value pair {Ci : Fi} per object. Duplicate categories may occur among the M objects, but set notation is used here only to indicate which elements belong to a set, without regard to the data structure hierarchy.
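A minimal sketch of how the dictionary D = {Ci : Fi} could be built with OpenCV is given below; SURF is provided by the opencv-contrib xfeatures2d module (it may be unavailable in some builds), and the template folder layout is an assumption made for this sketch.

```python
# Sketch of building D = {Ci: Fi} from the 8 template images of each target object.
# Assumes opencv-contrib-python (for cv2.xfeatures2d.SURF_create) and the folder
# layout templates/<category>/<object_id>/*.png, both illustrative choices.
import glob

import cv2


def build_template_dictionary(template_root="templates"):
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    D = {}
    for path in sorted(glob.glob(f"{template_root}/*/*/*.png")):
        category = path.split("/")[-3]                       # Ci taken from the folder name
        image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, descriptors = surf.detectAndCompute(image, None)  # SURF feature description vectors
        if descriptors is not None:
            D.setdefault(category, []).append(descriptors)   # Fi: one array per template image
    return D
```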
For the VR picture extracted by the VR picture extraction module, the image in the visual center area is first cropped as the target image, and then all of its SURF feature points and feature description vectors Fcurr are obtained.
Using a brute-force matching algorithm, the number of feature points of Fcurr that match the features in the template images of each classification in D is calculated, and the classifications are sorted by this number in descending order. If the largest number of matched points exceeds N (N is generally 10 to 30), the classification corresponding to that image is taken as the target recognition result; otherwise, the current scene is considered to contain no target object. The time complexity of this scheme grows linearly with the number of target classes in the scene, so it is better suited to VR scenes with few target classes.
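A sketch of this brute-force matching step is shown below; the threshold N of 10 to 30 matches comes from the text, while the cv2.BFMatcher usage and the descriptor distance cutoff are illustrative assumptions.

```python
# Sketch of the brute-force matching step: count descriptor matches between the current
# view (F_curr) and every category's templates in D, then apply the threshold N (10-30).
# The L2 distance cutoff of 0.25 is an assumed value, not taken from the patent.
import cv2


def classify_view(descriptors_curr, D, match_threshold_n=20, max_distance=0.25):
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    best_category, best_count = None, 0
    for category, template_descriptor_sets in D.items():
        count = 0
        for template_descriptors in template_descriptor_sets:
            matches = matcher.match(descriptors_curr, template_descriptors)
            count += sum(1 for m in matches if m.distance < max_distance)
        if count > best_count:
            best_category, best_count = category, count
    # Return the best classification only if it clears N; otherwise no target is present.
    return best_category if best_count >= match_threshold_n else None
```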
2.2. Target recognition scheme based on machine learning.
(1) Preparation of training sample data. For each target object, no fewer than L sub-images (L is generally greater than 10,000) are cropped from different angles and different distances to serve as sample images. The sample set is then expanded to 2L by rotation, random resizing, color dithering and brightness/contrast transformation. The sample set is divided into a training sample set and a test sample set at a ratio of 5:1, and annotation data text is generated.
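A possible realization of this augmentation and 5:1 split is sketched below; the torchvision-based pipeline and the samples/<category>/ folder layout are assumptions made for the sketch.

```python
# Sketch of sample-set preparation: augment the cropped images (rotation, random resize,
# color dithering, brightness/contrast) and split them 5:1 into training and test sets.
# The torchvision pipeline and samples/<category>/*.jpg layout are illustrative assumptions.
import torch
from torchvision import datasets, transforms

augment = transforms.Compose([
    transforms.RandomRotation(15),
    transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    transforms.ToTensor(),
])

dataset = datasets.ImageFolder("samples", transform=augment)  # labels from folder names
n_test = len(dataset) // 6                                    # 5:1 train/test ratio
n_train = len(dataset) - n_test
train_set, test_set = torch.utils.data.random_split(dataset, [n_train, n_test])
```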
(2) Training the target classification model. The training sample set is used to train a classic convolutional neural network model (a ResNet-style CNN), with the number of nodes of the last fully connected layer changed to M (consistent with the number of target categories). During training, the network parameters are adjusted to reach the best classification accuracy. The trained model is verified on the test sample set; when the target classification accuracy reaches 90% or more, the trained model is considered usable, otherwise the network structure is further adjusted and optimized.
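The sketch below illustrates the key step of resizing the final fully connected layer to M outputs, using a ResNet from torchvision as a stand-in; the particular depth (resnet18), optimizer and learning rate are assumptions, not prescribed by the patent.

```python
# Sketch of the target classification model: a ResNet whose last fully connected layer
# is replaced so it has M output nodes (one per target category), then trained as usual.
# resnet18, Adam and the learning rate are illustrative choices made for this sketch.
import torch
import torch.nn as nn
from torchvision import models


def build_classifier(num_categories_m, learning_rate=1e-3):
    model = models.resnet18()                                      # no pretrained weights
    model.fc = nn.Linear(model.fc.in_features, num_categories_m)   # last FC layer -> M nodes
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    criterion = nn.CrossEntropyLoss()
    return model, optimizer, criterion


def train_one_epoch(model, loader, optimizer, criterion):
    model.train()
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```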
(3) Target classification and recognition. Each captured VR frame is fed into the trained target classification model for recognition, and the classification is output.
Compared with the object recognition scheme based on image features, this scheme supports scenes with more classifications, and its computational complexity does not grow as the number of classifications increases.
3. Voice playing module.
For each category of object, the user may record a piece of speech material as the commentary voice. The voices of all categories are stored in a specified directory, and playback by the voice module is triggered when a target object in the VR scene stays in the user's visual center for more than 2 seconds.
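A small sketch of this dwell-time trigger is shown below; the playsound library and the voices/<category>.mp3 naming are assumptions made for illustration.

```python
# Sketch of the playback trigger: play the recording for a category only after it has
# stayed at the user's visual centre for more than 2 seconds, and avoid re-triggering.
# The playsound package and the voices/<category>.mp3 layout are illustrative assumptions.
import time

from playsound import playsound


class CommentaryPlayer:
    def __init__(self, voice_dir="voices", dwell_seconds=2.0):
        self.voice_dir = voice_dir
        self.dwell_seconds = dwell_seconds
        self.current_category = None
        self.seen_since = None

    def update(self, category):
        """Call once per recognised frame with the detected category (or None)."""
        now = time.monotonic()
        if category != self.current_category:
            self.current_category, self.seen_since = category, now
            return
        if category is not None and now - self.seen_since >= self.dwell_seconds:
            playsound(f"{self.voice_dir}/{category}.mp3")   # blocking playback
            self.seen_since = float("inf")                  # play once per dwell
```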
For an existing VR application, without interfering with the operation of the original application, the VR picture can be conveniently extracted, content recognition can be performed on the picture, and personalized voice content can then be played according to the recognition result. The scheme can be applied to virtual home decoration or a virtual exhibition hall: staff can record different voice content for different user groups, and when a user experiences the VR scene, the system plays personalized commentary content according to information such as the user's age, gender and identity, achieving a personalized marketing effect.
The following are system embodiments corresponding to the above method embodiments, and they can be implemented in cooperation with the above embodiments. The relevant technical details mentioned in the above embodiments remain valid in these embodiments and are not repeated here in order to reduce duplication. Correspondingly, the relevant technical details mentioned in these embodiments can also be applied to the above embodiments.
The invention also discloses a VR scene automatic explanation system, comprising:
module 1, which acquires three-dimensional data of the target objects in the VR scene to be explained and their corresponding categories, and trains a scene content identification module on the target features of the three-dimensional data;
module 2, which acquires in real time the VR picture viewed by the user, inputs the VR picture to the scene content identification module, and judges whether a target object exists in the VR picture; if so, module 3 is executed, otherwise module 2 continues to execute;
module 3, which plays the commentary content corresponding to the category of the target object to the user, and executes module 2 after playback of the commentary content finishes.
In the VR scene automatic explanation system, the training process of the scene content identification module in module 1 specifically comprises:
for the three-dimensional data of each target object, capturing pictures of the target object from multiple angles, cropping in each picture the target image by the maximum bounding box of the target object to serve as a template of that object, obtaining the SURF feature points and their feature description vectors for each template and storing them in a dictionary, recording in the dictionary all target objects and the category corresponding to each target object, and saving the dictionary as the judgment basis of the scene content identification module.
In the VR scene automatic explanation system, the process by which the scene content identification module judges whether a target object exists in the VR picture in module 2 specifically comprises:
cropping the image in the visual center area of the VR picture as the target image, inputting all SURF feature points and feature description vectors of the target image to the scene content identification module, calculating the number of feature matches between the feature description vectors of the target image and the features of each category in the dictionary, and judging whether a target object exists in the VR picture by comparing this number with a preset value.
In the VR scene automatic explanation system, the training process of the scene content identification module in module 1 may alternatively comprise:
for each target object, capturing multiple images of the target object from multiple angles and distances to form a sample set, enlarging the sample set by rotating the sample images and/or randomly resizing them and/or dithering their colors and/or changing their brightness and contrast, dividing the sample set into a training sample set and a test sample set at a preset ratio, and generating annotation data text;
training a convolutional neural network model on the training sample set, with the number of nodes of its last fully connected layer changed to the number of target categories, verifying the model with the test sample set during training, and saving the convolutional neural network model as the scene content identification module once it passes verification;
in this case, the process by which the scene content identification module judges whether a target object exists in the VR picture in module 2 specifically comprises:
cropping the image in the visual center area of the VR picture as the target image and sending it to the scene content identification module for recognition, so as to judge whether a target object exists in the VR picture.
The invention also discloses an implementation method for the above VR scene automatic explanation system.
The invention also discloses a storage medium storing a program for executing the above VR scene automatic explanation method.

Claims (10)

1. A VR scene automatic explanation method, characterized by comprising the following steps:
step 1, acquiring three-dimensional data of the target objects in the VR scene to be explained and their corresponding categories, and training a scene content identification module on the target features of the three-dimensional data;
step 2, acquiring in real time the VR picture viewed by the user, inputting the VR picture to the scene content identification module, and judging whether a target object exists in the VR picture; if so, executing step 3, otherwise continuing to execute step 2;
step 3, playing the commentary content corresponding to the category of the target object to the user, and executing step 2 after playback of the commentary content finishes.
2. The VR scene automatic explanation method of claim 1, wherein the training process of the scene content identification module in step 1 specifically comprises:
for the three-dimensional data of each target object, capturing pictures of the target object from multiple angles, cropping in each picture the target image by the maximum bounding box of the target object to serve as a template of that object, obtaining the SURF feature points and their feature description vectors for each template and storing them in a dictionary, recording in the dictionary all target objects and the category corresponding to each target object, and saving the dictionary as the judgment basis of the scene content identification module.
3. The VR scene automatic explanation method of claim 2, wherein the process by which the scene content identification module judges whether a target object exists in the VR picture in step 2 specifically comprises:
cropping the image in the visual center area of the VR picture as the target image, inputting all SURF feature points and feature description vectors of the target image to the scene content identification module, calculating the number of feature matches between the feature description vectors of the target image and the features of each category in the dictionary, and judging whether a target object exists in the VR picture by comparing this number with a preset value.
4. The VR scene automatic explanation method of claim 1, wherein the training process of the scene content identification module in step 1 specifically comprises:
for each target object, capturing multiple images of the target object from multiple angles and distances to form a sample set, enlarging the sample set by rotating the sample images and/or randomly resizing them and/or dithering their colors and/or changing their brightness and contrast, dividing the sample set into a training sample set and a test sample set at a preset ratio, and generating annotation data text;
training a convolutional neural network model on the training sample set, with the number of nodes of its last fully connected layer changed to the number of target categories, verifying the model with the test sample set during training, and saving the convolutional neural network model as the scene content identification module once it passes verification;
and the process by which the scene content identification module judges whether a target object exists in the VR picture in step 2 specifically comprises:
cropping the image in the visual center area of the VR picture as the target image and sending it to the scene content identification module for recognition, so as to judge whether a target object exists in the VR picture.
5. A VR scene automatic explanation system, characterized by comprising:
module 1, which acquires three-dimensional data of the target objects in the VR scene to be explained and their corresponding categories, and trains a scene content identification module on the target features of the three-dimensional data;
module 2, which acquires in real time the VR picture viewed by the user, inputs the VR picture to the scene content identification module, and judges whether a target object exists in the VR picture; if so, module 3 is executed, otherwise module 2 continues to execute;
module 3, which plays the commentary content corresponding to the category of the target object to the user, and executes module 2 after playback of the commentary content finishes.
6. The VR scene automatic explanation system of claim 5, wherein the training process of the scene content identification module in module 1 specifically comprises:
for the three-dimensional data of each target object, capturing pictures of the target object from multiple angles, cropping in each picture the target image by the maximum bounding box of the target object to serve as a template of that object, obtaining the SURF feature points and their feature description vectors for each template and storing them in a dictionary, recording in the dictionary all target objects and the category corresponding to each target object, and saving the dictionary as the judgment basis of the scene content identification module.
7. The VR scene automatic explanation system of claim 6, wherein the process by which the scene content identification module judges whether a target object exists in the VR picture in module 2 specifically comprises:
cropping the image in the visual center area of the VR picture as the target image, inputting all SURF feature points and feature description vectors of the target image to the scene content identification module, calculating the number of feature matches between the feature description vectors of the target image and the features of each category in the dictionary, and judging whether a target object exists in the VR picture by comparing this number with a preset value.
8. The VR scene automatic explanation system of claim 5, wherein the training process of the scene content identification module in module 1 specifically comprises:
for each target object, capturing multiple images of the target object from multiple angles and distances to form a sample set, enlarging the sample set by rotating the sample images and/or randomly resizing them and/or dithering their colors and/or changing their brightness and contrast, dividing the sample set into a training sample set and a test sample set at a preset ratio, and generating annotation data text;
training a convolutional neural network model on the training sample set, with the number of nodes of its last fully connected layer changed to the number of target categories, verifying the model with the test sample set during training, and saving the convolutional neural network model as the scene content identification module once it passes verification;
and the process by which the scene content identification module judges whether a target object exists in the VR picture in module 2 specifically comprises:
cropping the image in the visual center area of the VR picture as the target image and sending it to the scene content identification module for recognition, so as to judge whether a target object exists in the VR picture.
9. An implementation method for the VR scene automatic explanation system of any one of claims 5 to 8.
10. A storage medium storing a program for executing the VR scene automatic explanation method of any one of claims 1 to 4.
CN201910263029.1A 2019-04-02 2019-04-02 VR scene automatic explanation method, system and storage medium Pending CN111768729A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910263029.1A CN111768729A (en) 2019-04-02 2019-04-02 VR scene automatic explanation method, system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910263029.1A CN111768729A (en) 2019-04-02 2019-04-02 VR scene automatic explanation method, system and storage medium

Publications (1)

Publication Number Publication Date
CN111768729A (en) 2020-10-13

Family

ID=72718525

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910263029.1A Pending CN111768729A (en) 2019-04-02 2019-04-02 VR scene automatic explanation method, system and storage medium

Country Status (1)

Country Link
CN (1) CN111768729A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113209640A (en) * 2021-07-09 2021-08-06 腾讯科技(深圳)有限公司 Comment generation method, device, equipment and computer-readable storage medium
CN113449122A (en) * 2021-07-09 2021-09-28 广州浩传网络科技有限公司 Method and device for generating explanation content of three-dimensional scene graph
CN113449122B (en) * 2021-07-09 2023-01-17 广州浩传网络科技有限公司 Method and device for generating explanation content of three-dimensional scene graph


Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201013