CN111797873A - Scene recognition method and device, storage medium and electronic equipment

Info

Publication number
CN111797873A
Authority
CN
China
Prior art keywords
preset
scene
feature vector
data
feature
Prior art date
Legal status
Pending
Application number
CN201910282441.8A
Other languages
Chinese (zh)
Inventor
何明
陈仲铭
李姬俊男
刘耀勇
陈岩
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201910282441.8A
Publication of CN111797873A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2250/00 Details of telephonic subscriber devices
    • H04M2250/12 Details of telephonic subscriber devices including a sensor for measuring a physical value, e.g. temperature or motion


Abstract

An embodiment of the application provides a scene recognition method and apparatus, a storage medium, and an electronic device. The scene recognition method includes: acquiring perception data of a current scene; acquiring a feature vector according to the perception data; sequentially calculating the similarity between the feature vector and each of a plurality of preset feature vectors to obtain a plurality of similarity values; determining a target feature vector from the preset feature vectors according to the similarity values; and determining the preset scene corresponding to the target feature vector as the current scene. With this method, the electronic device can obtain the feature vector of the current scene from the perception data of the current scene, determine the target feature vector from the similarity values between that feature vector and the preset feature vectors, and determine the preset scene corresponding to the target feature vector as the current scene, thereby recognizing the current scene and making it convenient for the electronic device to perform intelligent operations for it.

Description

Scene recognition method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of electronic technologies, and in particular, to a scene recognition method and apparatus, a storage medium, and an electronic device.
Background
With the development of electronic technology, electronic devices such as smart phones are capable of providing more and more services to users. For example, the electronic device may provide social services, navigation services, travel recommendation services, and the like for the user. In order to provide targeted and personalized services for users, the electronic device needs to identify the scene where the user is located, and then provide the services for the user based on the identified scene.
Disclosure of Invention
The embodiments of the application provide a scene recognition method and apparatus, a storage medium, and an electronic device, which enable an electronic device to recognize its current scene.
The embodiment of the application provides a scene identification method, which comprises the following steps:
acquiring perception data of a current scene;
acquiring a feature vector of the current scene according to the perception data;
sequentially calculating the similarity between the feature vector and each preset feature vector in a plurality of preset feature vectors to obtain a plurality of similarity values, wherein each preset feature vector corresponds to one preset scene;
determining a target feature vector from the preset feature vectors according to the similarity values;
and determining the preset scene corresponding to the target feature vector as the current scene.
An embodiment of the present application further provides a scene recognition apparatus, including:
the first acquisition module is used for acquiring the perception data of the current scene;
the second acquisition module is used for acquiring the feature vector of the current scene according to the perception data;
the calculation module is used for calculating the similarity between the feature vector and each preset feature vector in a plurality of preset feature vectors in sequence to obtain a plurality of similarity values, wherein each preset feature vector corresponds to one preset scene;
a first determining module, configured to determine a target feature vector from the plurality of preset feature vectors according to the plurality of similarity values;
and the second determining module is used for determining the preset scene corresponding to the target feature vector as the current scene.
An embodiment of the present application further provides a storage medium storing a computer program which, when run on a computer, causes the computer to execute the above scene recognition method.
The embodiment of the present application further provides an electronic device, which includes a processor and a memory, where the memory stores a computer program, and the processor is configured to execute the scene recognition method by calling the computer program stored in the memory.
The scene recognition method provided by the embodiment of the application includes: acquiring perception data of a current scene; acquiring a feature vector of the current scene according to the perception data; sequentially calculating the similarity between the feature vector and each of a plurality of preset feature vectors to obtain a plurality of similarity values, where each preset feature vector corresponds to one preset scene; determining a target feature vector from the preset feature vectors according to the similarity values; and determining the preset scene corresponding to the target feature vector as the current scene. In this method, the electronic device can obtain the feature vector of the current scene from its perception data, determine the target feature vector from the similarity values between the feature vector and the preset feature vectors, and determine the preset scene corresponding to the target feature vector as the current scene, thereby recognizing the current scene and making it convenient for the electronic device to perform intelligent operations for it.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments will be briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a schematic view of an application scenario of a scenario identification method according to an embodiment of the present application.
Fig. 2 is a schematic flowchart of a first method for scene recognition according to an embodiment of the present disclosure.
Fig. 3 is a schematic flowchart of a second method for scene recognition according to an embodiment of the present disclosure.
Fig. 4 is a third flowchart illustrating a scene recognition method according to an embodiment of the present application.
Fig. 5 is a fourth flowchart illustrating a scene recognition method according to an embodiment of the present application.
Fig. 6 is a fifth flowchart illustrating a scene recognition method according to an embodiment of the present application.
Fig. 7 is a sixth flowchart illustrating a scene recognition method according to an embodiment of the present application.
Fig. 8 is a schematic structural diagram of a first scene recognition device according to an embodiment of the present application.
Fig. 9 is a schematic structural diagram of a second scene recognition device according to an embodiment of the present application.
Fig. 10 is a schematic structural diagram of a first electronic device according to an embodiment of the present application.
Fig. 11 is a second structural schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without inventive step, are within the scope of the present application.
Referring to fig. 1, fig. 1 is a schematic view of an application scenario of a scenario identification method provided in an embodiment of the present application. The scene recognition method is applied to electronic equipment. A panoramic perception framework is arranged in the electronic equipment. The panoramic perception architecture is an integration of hardware and software for implementing the scene recognition method in an electronic device.
The panoramic perception architecture comprises an information perception layer, a data processing layer, a feature extraction layer, a scene modeling layer and an intelligent service layer.
The information perception layer is used for acquiring information about the electronic device itself or about the external environment. The information perception layer may include a plurality of sensors, for example a distance sensor, a magnetic field sensor, a light sensor, an acceleration sensor, a fingerprint sensor, a Hall sensor, a position sensor, a gyroscope, an inertial sensor, an attitude sensor, an image sensor, and an audio sensor.
Among other things, a distance sensor may be used to detect a distance between the electronic device and an external object. The magnetic field sensor may be used to detect magnetic field information of the environment in which the electronic device is located. The light sensor can be used for detecting light information of the environment where the electronic equipment is located. The acceleration sensor may be used to detect acceleration data of the electronic device. The fingerprint sensor may be used to collect fingerprint information of a user. The Hall sensor is a magnetic field sensor manufactured according to the Hall effect, and can be used for realizing automatic control of electronic equipment. The location sensor may be used to detect the geographic location where the electronic device is currently located. Gyroscopes may be used to detect angular velocity of an electronic device in various directions. Inertial sensors may be used to detect motion data of an electronic device. The gesture sensor may be used to sense gesture information of the electronic device. An image sensor, which may be, for example, a camera, may be used to capture images of the surrounding environment. An audio sensor, which may be a microphone, for example, may be used to capture sound signals in the surrounding environment.
And the data processing layer is used for processing the data acquired by the information perception layer. For example, the data processing layer may perform data cleaning, data integration, data transformation, data reduction, and the like on the data acquired by the information sensing layer.
Data cleaning refers to cleaning the large amount of data acquired by the information perception layer to remove invalid data and repeated data. Data integration refers to integrating multiple single-dimensional data acquired by the information perception layer into a higher or more abstract dimension, so that data of multiple single dimensions can be processed comprehensively. Data transformation refers to converting the type or format of the data acquired by the information perception layer so that the transformed data meet the processing requirements. Data reduction refers to reducing the data volume as much as possible while preserving the original character of the data.
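As a rough illustration of the data cleaning step, the following Python sketch removes invalid and repeated records; the record format (a flat dict of scalar sensor readings) is an assumption chosen for illustration, not part of the disclosure.

```python
from typing import Any

def clean(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Data cleaning sketch: drop records with missing values and exact duplicates."""
    seen: set[tuple] = set()
    kept: list[dict[str, Any]] = []
    for record in records:
        key = tuple(sorted(record.items()))  # hashable fingerprint of the record
        if None in record.values() or key in seen:
            continue  # invalid or repeated data is removed
        seen.add(key)
        kept.append(record)
    return kept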
The characteristic extraction layer is used for extracting characteristics of the data processed by the data processing layer so as to extract the characteristics included in the data. The extracted features may reflect the state of the electronic device itself or the state of the user or the environmental state of the environment in which the electronic device is located, etc.
The feature extraction layer may extract features, or further process the extracted features, using methods such as the filter method, the wrapper method, or the ensemble method.
The filter method filters the extracted features to remove redundant feature data. The wrapper method screens the extracted features. The ensemble method combines multiple feature extraction methods to construct a more efficient and accurate feature extraction method.
The scene modeling layer is used for building a model according to the features extracted by the feature extraction layer, and the obtained model can be used for representing the state of the electronic equipment, the state of a user, the environment state and the like. For example, the scenario modeling layer may construct a key value model, a pattern identification model, a graph model, an entity relation model, an object-oriented model, and the like according to the features extracted by the feature extraction layer.
The intelligent service layer is used for providing intelligent services for the user according to the model constructed by the scene modeling layer. For example, the intelligent service layer can provide basic application services for users, perform system intelligent optimization for electronic equipment, and provide personalized intelligent services for users.
In addition, the panoramic perception architecture may further include a number of algorithms, each of which can be used to analyse and process data; together they form an algorithm library. For example, the algorithm library may include the Markov algorithm, latent Dirichlet allocation, Bayesian classification, word vector models, K-means clustering, K-nearest neighbors, the cosine similarity algorithm, residual networks, long short-term memory networks, convolutional neural networks, and recurrent neural networks.
The embodiments of the application provide a scene recognition method that can be applied to an electronic device. The electronic device may be a smartphone, a tablet computer, a gaming device, an AR (Augmented Reality) device, an automobile, a data storage device, an audio playback device, a video playback device, a laptop computer, a desktop computing device, or a wearable device such as an electronic watch, electronic glasses, an electronic helmet, an electronic bracelet, an electronic necklace, or electronic clothing.
Referring to fig. 2, fig. 2 is a schematic flowchart of a first method for scene recognition according to an embodiment of the present disclosure. The scene recognition method comprises the following steps:
and 110, acquiring the perception data of the current scene.
The electronic device may obtain perceptual data of a current scene. The current scene is a scene of an environment where the electronic device is currently located, that is, a scene of an environment where a user of the electronic device is currently located. It should be noted that, since the electronic device identifies the current scene through the acquired perceptual data, the current scene is an unknown scene for the electronic device.
The electronic device can acquire the perception data of the current scene through the information perception layer in the panoramic perception architecture. The perception data may comprise arbitrary data. For example, the sensing data may include various data such as temperature, humidity, ambient light intensity, image information, audio information, and the like.
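For illustration only, one possible container for such perception data is sketched below; the field names and types are assumptions mirroring the examples in the text, not structures defined by the patent.

```python
from dataclasses import dataclass, field

@dataclass
class PerceptionData:
    """Assumed container for one round of perception data of the current scene."""
    temperature: float | None = None       # degrees Celsius
    humidity: float | None = None          # relative humidity, percent
    ambient_light: float | None = None     # illuminance, lux
    images: list[bytes] = field(default_factory=list)       # camera frames
    audio_clips: list[bytes] = field(default_factory=list)  # microphone clips
    texts: list[str] = field(default_factory=list)          # e.g. notification text
```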
For example, the current scene may be a conference scene. The perception data acquired by the electronic device may include a plurality of image information, a plurality of sound information, and the like in the conference scene.
And 120, acquiring a feature vector of the current scene according to the perception data.
After the electronic device acquires the sensing data of the current scene, the feature vector of the current scene can be acquired according to the sensing data. Wherein the feature vector may comprise a plurality of features. The feature vector is used to quantize the current scene so that the current scene can be represented by the feature vector.
For example, the feature vector may be P (a, B, C). Wherein A, B, C each represent a feature, e.g., a may represent an image feature, B may represent an audio feature, and C may represent a text feature. The situation of the current scene can be represented by the feature vector P (A, B, C).
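A minimal sketch of this quantization, assuming each of A, B, and C is a fixed-length numeric vector (the use of numpy arrays is an assumption for illustration):

```python
import numpy as np

def scene_feature_vector(image_feat: np.ndarray,
                         audio_feat: np.ndarray,
                         text_feat: np.ndarray) -> np.ndarray:
    """Quantize the current scene as P(A, B, C): the concatenation of its
    image, audio and text features (each assumed to be a 1-D vector)."""
    return np.concatenate([image_feat, audio_feat, text_feat])
```

Represented this way, the current scene can be compared against preset scenes vector-to-vector.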
And 130, sequentially calculating the similarity between the feature vector and each preset feature vector in a plurality of preset feature vectors to obtain a plurality of similarity values, wherein each preset feature vector corresponds to one preset scene.
A plurality of preset feature vectors may be preset in the electronic device, and each preset feature vector corresponds to one preset scene. That is, each preset feature vector is used to represent a preset scene. It will be appreciated that a preset scene is a scene known to the electronic device.
For example, preset feature vectors P1, P2, P3, and so on may be stored in the electronic device, where P1 may correspond to a conference scene, P2 to a restaurant scene, and P3 to a subway scene. That is, P1 is used to represent a conference scene, P2 a restaurant scene, and P3 a subway scene.
It should be noted that a large number of preset feature vectors may be set in the electronic device; for example, 100 preset feature vectors may be set. Accordingly, a large number of preset scenes can be preset in the electronic device, and each preset scene can be a specific scene in the user's life.
After the electronic equipment obtains the feature vector of the current scene, the similarity between the feature vector and each preset feature vector in a plurality of preset feature vectors is calculated in sequence, and a plurality of similarity values are obtained. The greater the similarity between the feature vector and a preset feature vector, the more similar the feature vector and the preset feature vector, that is, the more similar the current scene and the preset scene corresponding to the preset feature vector.
For example, after acquiring the feature vector P of the current scene, the electronic device sequentially calculates the similarity N1 between P and P1, the similarity N2 between P and P2, and the similarity N3 between P and P3. Here N1 denotes the similarity between P and P1, which can also be understood as the similarity between the current scene and the preset scene corresponding to P1; N2 denotes the similarity between P and P2, i.e. between the current scene and the preset scene corresponding to P2; and N3 denotes the similarity between P and P3, i.e. between the current scene and the preset scene corresponding to P3.
140, determining a target feature vector from the preset feature vectors according to the similarity values.
After the electronic device obtains the plurality of similarity values, a target feature vector can be determined from the plurality of preset feature vectors according to those values. The target feature vector is the preset feature vector with the greatest similarity to the feature vector of the current scene; that is, the scene represented by the target feature vector is the one most similar to the current scene.
And 150, determining a preset scene corresponding to the target feature vector as a current scene.
After the electronic device determines the target feature vector, a preset scene corresponding to the target feature vector can be determined as a current scene, so that the current scene is identified. Subsequently, the electronic device may perform an intelligent operation according to the identified scene, for example, the electronic device may automatically perform a mode switch, automatically adjust screen brightness, or provide an intelligent suggestion for the current scene to the user.
For example, if the electronic device determines the target feature vector to be P3, and the preset scene corresponding to P3 is a conference scene, the electronic device determines the conference scene as the current scene. Subsequently, the electronic device may perform an intelligent operation according to the determined scene; for example, it may automatically switch to silent mode.
In the embodiment of the application, the electronic device can acquire the feature vector of the current scene according to the sensing data of the current scene, determine the target feature vector according to the similarity values of the feature vector and the preset feature vectors, and determine the preset scene corresponding to the target feature vector as the current scene, so that the current scene is identified, and the electronic device can perform intelligent operation on the current scene conveniently.
In some embodiments, referring to fig. 3, fig. 3 is a second flowchart illustrating a scene recognition method according to an embodiment of the present disclosure.
Step 120, obtaining the feature vector of the current scene according to the perception data, including the following steps:
121, selecting a corresponding feature extraction model according to the data type of the perception data;
122, extracting data features from the perception data through the feature extraction model;
and 123, acquiring a feature vector of the current scene according to the data features.
A plurality of feature extraction models may be preset in the electronic device, and each feature extraction model is used for performing feature extraction on one type of data. For example, a convolutional neural network model, a recurrent neural network model, a word vector model, or the like may be set in advance in the electronic device. The convolutional neural network model is used for processing the image data so as to extract image features from the image data; the recurrent neural network model is used for processing the audio data so as to extract audio features from the audio data; the word vector model is used for processing the text data to extract text features from the text data.
After the electronic equipment acquires the perception data of the current scene, the corresponding feature extraction model can be selected according to the data type of the perception data. When the perception data comprises a plurality of data types, the electronic device can select a corresponding feature extraction model according to each data type.
And then, the electronic equipment extracts data features from the perception data through the selected feature extraction model and obtains the feature vector of the current scene according to the data features. For example, the electronic device may combine the extracted data features to obtain a feature vector for the current scene.
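The selection step can be pictured as a simple dispatch table. In the sketch below the extractor functions are placeholders standing in for the convolutional network, recurrent network/LSTM, and word-vector models named above; their internals are not specified by the patent.

```python
import numpy as np

# Placeholder extractors; real implementations would wrap trained models.
def cnn_image_features(image: bytes) -> np.ndarray: ...
def rnn_audio_features(audio: bytes) -> np.ndarray: ...
def word_vector_text_features(text: str) -> np.ndarray: ...

EXTRACTORS = {
    "image": cnn_image_features,        # convolutional neural network model
    "audio": rnn_audio_features,        # recurrent / long short-term memory model
    "text": word_vector_text_features,  # word vector model
}

def extract_features(data_type: str, data) -> np.ndarray:
    """Select the feature extraction model matching the data type."""
    return EXTRACTORS[data_type](data)
```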
In some embodiments, referring to fig. 4, fig. 4 is a third flowchart illustrating a scene recognition method according to an embodiment of the present application.
Step 122, extracting data features from the perception data through the feature extraction model, including the following steps:
1221, extracting image features from the image data through a convolutional neural network model;
1222 extracting audio features from the audio data through a recurrent neural network model or a long-short term memory network model;
1223, extracting text features from the text data through a word vector model;
step 123, obtaining the feature vector of the current scene according to the data features, including the following steps:
1231, obtaining the feature vector of the current scene according to the image feature, the audio feature and the text feature.
The data type of the perception data acquired by the electronic equipment comprises image data, audio data and text data. The feature extraction model selected by the electronic device for image data may be a convolutional neural network model, the feature extraction model selected for audio data may be a recurrent neural network model or a long-short term memory network model, and the feature extraction model selected for text data may be a word vector model.
Subsequently, the electronic device may extract image features from the image data through the convolutional neural network model, audio features from the audio data through the recurrent neural network model or long short-term memory network model, and text features from the text data through the word vector model.
And then, the electronic equipment acquires a feature vector of the current scene according to the extracted image feature, audio feature and text feature.
For example, the image feature extracted by the electronic device may be A, the audio feature may be B, and the text feature may be C. Subsequently, the electronic device may splice the extracted image, audio, and text features to obtain the feature vector P(A, B, C) of the current scene.
In some embodiments, the electronic device may further perform feature extraction on the obtained image features, audio features, and text features again to obtain new image features, new audio features, and new text features, and obtain a feature vector of the current scene according to the new image features, the new audio features, and the new text features.
For example, after the electronic device obtains the image feature A, the audio feature B, and the text feature C, it may perform feature extraction again on A, B, and C in sequence using a statistical method to obtain a new image feature A1, a new audio feature B1, and a new text feature C1, and splice A1, B1, and C1 to obtain the feature vector P(A1, B1, C1) of the current scene.
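A sketch of this second pass, assuming the "statistical method" computes simple summary statistics (an assumption for illustration; the patent does not name the statistics used):

```python
import numpy as np

def restat(feature: np.ndarray) -> np.ndarray:
    """Second-pass statistical re-extraction: summarise a first-pass feature
    by assumed statistics (mean, standard deviation, minimum, maximum)."""
    return np.array([feature.mean(), feature.std(), feature.min(), feature.max()])

def second_pass_scene_vector(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Build P(A1, B1, C1) by re-extracting A, B, C and splicing the results."""
    a1, b1, c1 = restat(a), restat(b), restat(c)
    return np.concatenate([a1, b1, c1])
```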
In some embodiments, referring to fig. 5, fig. 5 is a fourth flowchart illustrating a scene recognition method according to an embodiment of the present application.
Step 130, calculating the similarity between the feature vector and each preset feature vector in a plurality of preset feature vectors in sequence to obtain a plurality of similarity values, including the following steps:
131, calculating cosine similarity between the feature vector and each preset feature vector by sequentially adopting a cosine similarity algorithm to obtain a plurality of cosine similarity values;
132, determining the cosine similarity value between the feature vector and each of the predetermined feature vectors as the similarity value between the feature vector and the predetermined feature vector to obtain a plurality of similarity values.
The electronic device may sequentially calculate the cosine similarity between the feature vector and each of the preset feature vectors by using a cosine similarity algorithm, so as to obtain a plurality of cosine similarity values.
The cosine similarity value ranges over [-1, 1]. A cosine similarity value of 1 indicates that the two vectors point in the same direction, a value of 0 indicates that the two vectors are orthogonal, and a value of -1 indicates that they point in opposite directions. The closer the cosine similarity value is to 1, the closer the directions of the two vectors.
For example, suppose the electronic device obtains the feature vector P of the current scene and the preset feature vectors include P1, P2, P3. The electronic device then applies the cosine similarity algorithm in sequence to calculate the cosine similarity of P with each of P1, P2, P3, obtaining the cosine similarity value K1 of P and P1, the value K2 of P and P2, and the value K3 of P and P3.
Then, the electronic device determines the cosine similarity value between the feature vector of the current scene and each preset feature vector as the similarity value between the feature vector and the preset feature vector to obtain a plurality of similarity values.
For example, the electronic device may determine the cosine similarity value K1 as the similarity value of P and P1, the value K2 as the similarity value of P and P2, and the value K3 as the similarity value of P and P3.
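A minimal implementation of the cosine similarity computation in steps 131-132, using numpy; this is an illustrative sketch rather than the patent's exact procedure:

```python
import numpy as np

def cosine_similarity(p: np.ndarray, q: np.ndarray) -> float:
    """Cosine similarity in [-1, 1]; 1 means same direction, -1 opposite."""
    denom = float(np.linalg.norm(p) * np.linalg.norm(q))
    return float(np.dot(p, q)) / denom if denom else 0.0

def similarity_values(p: np.ndarray, presets: list[np.ndarray]) -> list[float]:
    """Steps 131-132: one similarity value per preset feature vector."""
    return [cosine_similarity(p, q) for q in presets]
```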
In some embodiments, with continued reference to fig. 5, the step 140 of determining the target feature vector from the plurality of preset feature vectors according to the plurality of similarity values includes the following steps:
141, determining the maximum similarity value from the plurality of similarity values;
and 142, determining the preset feature vector corresponding to the maximum similarity value as a target feature vector.
After obtaining the plurality of similarity values, the electronic device may compare the plurality of similarity values with each other to determine a maximum similarity value from the plurality of similarity values. And then, determining the preset feature vector corresponding to the maximum similarity value as a target feature vector.
For example, if among the three similarity values N1, N2, N3, N1 is less than N2 and N2 is less than N3, the electronic device determines that the maximum similarity value is N3. The electronic device then determines the preset feature vector P3 corresponding to N3 as the target feature vector.
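Steps 141-142 amount to an argmax over the similarity values. A self-contained sketch is given below; the dict mapping scene names to preset feature vectors is an assumption for illustration.

```python
import numpy as np

def target_scene(p: np.ndarray, preset_scenes: dict[str, np.ndarray]) -> str:
    """Steps 141-142: pick the preset scene whose preset feature vector has
    the maximum similarity value with the current feature vector p."""
    def sim(q: np.ndarray) -> float:  # cosine similarity, as in step 131
        return float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))
    return max(preset_scenes, key=lambda name: sim(preset_scenes[name]))
```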
In some embodiments, referring to fig. 6, fig. 6 is a fifth flowchart illustrating a scene recognition method according to an embodiment of the present application.
Before acquiring the perception data of the current scene in step 110, the method further includes the following steps:
161, obtaining preset sensing data of each preset scene in the plurality of preset scenes for multiple times to obtain a plurality of preset sensing data of each preset scene;
and 162, acquiring preset feature vectors corresponding to the preset scenes according to the preset perception data of each preset scene.
Wherein a plurality of preset scenes may be determined by a user first. For example, a user may determine a plurality of scenes, such as a meeting scene, a restaurant scene, a subway scene, and so on.
The electronic device may obtain preset sensing data of each preset scene in the plurality of preset scenes for multiple times to obtain a plurality of preset sensing data of each preset scene.
For example, the electronic device may acquire the preset perception data of the conference scene multiple times to obtain multiple preset perception data X1, X2, X3 of the conference scene; acquire the preset perception data of the restaurant scene multiple times to obtain multiple preset perception data Y1, Y2, Y3 of the restaurant scene; and acquire the preset perception data of the subway scene multiple times to obtain multiple preset perception data Z1, Z2, Z3 of the subway scene.
And then, the electronic equipment acquires preset feature vectors corresponding to the preset scenes according to the preset perception data of each preset scene. The electronic equipment can extract features of a plurality of preset sensing data of each preset scene, and obtains preset feature vectors corresponding to the preset scenes according to the extracted features.
For example, the electronic device may perform feature extraction on X1, X2, X3 and obtain the preset feature vector P1 of the conference scene from the extracted features; perform feature extraction on Y1, Y2, Y3 and obtain the preset feature vector P2 of the restaurant scene; and perform feature extraction on Z1, Z2, Z3 and obtain the preset feature vector P3 of the subway scene.
In some embodiments, referring to fig. 7, fig. 7 is a sixth flowchart illustrating a scene recognition method according to an embodiment of the present application.
Step 162, obtaining a preset feature vector corresponding to each preset scene according to a plurality of preset sensing data of the preset scene, includes the following steps:
1621, sequentially obtaining a preset sub-feature vector according to each preset sensing data to obtain a plurality of preset sub-feature vectors of the preset scene;
1622, calculating an average feature vector of the plurality of preset sub-feature vectors;
1623, determining the average feature vector as a preset feature vector corresponding to the preset scene.
After the electronic equipment obtains a plurality of preset perception data of each preset scene, a preset sub-feature vector is obtained in sequence according to each preset perception data, so that a plurality of preset sub-feature vectors of the preset scene are obtained. The electronic device can extract features of each preset sensing data, and obtains a corresponding preset sub-feature vector according to the extracted features. Thus, for each preset scene, the electronic device may obtain a plurality of preset sub-feature vectors.
For example, for the conference scene, the electronic device may perform feature extraction on the preset perception data X1 to obtain the corresponding preset sub-feature vector P11, on X2 to obtain the corresponding preset sub-feature vector P12, and on X3 to obtain the corresponding preset sub-feature vector P13. The electronic device thus obtains three preset sub-feature vectors P11, P12, P13 of the conference scene.
Then, the electronic device calculates an average feature vector of the plurality of preset sub-feature vectors, and determines the average feature vector as a preset feature vector corresponding to the preset scene.
For example, after obtaining the three preset sub-feature vectors P11, P12, P13 of the conference scene, the electronic device calculates their average feature vector P1 and determines P1 as the preset feature vector of the conference scene.
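A sketch of steps 1621-1623, where the preset feature vector of a scene is the element-wise mean of its sub-feature vectors (numpy is assumed for illustration):

```python
import numpy as np

def preset_feature_vector(sub_vectors: list[np.ndarray]) -> np.ndarray:
    """Steps 1621-1623: average the preset sub-feature vectors obtained from
    several rounds of preset perception data of one preset scene."""
    return np.mean(np.stack(sub_vectors), axis=0)

# e.g. P1 = preset_feature_vector([P11, P12, P13]) for the conference scene
```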
In particular implementation, the present application is not limited by the execution sequence of the described steps, and some steps may be performed in other sequences or simultaneously without conflict.
As can be seen from the above, the scene recognition method provided by the embodiment of the present application includes: acquiring perception data of a current scene; acquiring a feature vector of the current scene according to the perception data; sequentially calculating the similarity between the feature vector and each of a plurality of preset feature vectors to obtain a plurality of similarity values, where each preset feature vector corresponds to one preset scene; determining a target feature vector from the preset feature vectors according to the similarity values; and determining the preset scene corresponding to the target feature vector as the current scene. In this method, the electronic device can obtain the feature vector of the current scene from its perception data, determine the target feature vector from the similarity values, and determine the preset scene corresponding to the target feature vector as the current scene, thereby recognizing the current scene and making it convenient for the electronic device to perform intelligent operations for it.
The embodiment of the application also provides a scene recognition apparatus, which can be integrated in an electronic device. The electronic device may be a smartphone, a tablet computer, a gaming device, an AR (Augmented Reality) device, an automobile, a data storage device, an audio playback device, a video playback device, a laptop computer, a desktop computing device, or a wearable device such as an electronic watch, electronic glasses, an electronic helmet, an electronic bracelet, an electronic necklace, or electronic clothing.
Referring to fig. 8, fig. 8 is a schematic view of a first structure of a scene recognition apparatus according to an embodiment of the present application. Wherein the scene recognition apparatus 200 comprises: a first obtaining module 201, a second obtaining module 202, a calculating module 203, a first determining module 204, and a second determining module 205.
A first obtaining module 201, configured to obtain perceptual data of a current scene.
The first obtaining module 201 may obtain perceptual data of a current scene. The current scene is a scene of an environment where the electronic device is currently located, that is, a scene of an environment where a user of the electronic device is currently located. It should be noted that, since the electronic device identifies the current scene through the acquired perceptual data, the current scene is an unknown scene for the electronic device.
The first obtaining module 201 may collect the sensing data of the current scene through an information sensing layer in a panoramic sensing architecture in the electronic device. The perception data may comprise arbitrary data. For example, the sensing data may include various data such as temperature, humidity, ambient light intensity, image information, audio information, and the like.
For example, the current scene may be a conference scene. The perception data acquired by the first acquisition module 201 may include a plurality of image information, a plurality of sound information, and the like in a conference scene.
A second obtaining module 202, configured to obtain a feature vector of the current scene according to the sensing data.
After the first obtaining module 201 obtains the sensing data of the current scene, the second obtaining module 202 may obtain the feature vector of the current scene according to the sensing data. Wherein the feature vector may comprise a plurality of features. The feature vector is used to quantize the current scene so that the current scene can be represented by the feature vector.
For example, the feature vector may be P (a, B, C). Wherein A, B, C each represent a feature, e.g., a may represent an image feature, B may represent an audio feature, and C may represent a text feature. The situation of the current scene can be represented by the feature vector P (A, B, C).
The calculating module 203 is configured to sequentially calculate a similarity between the feature vector and each preset feature vector in a plurality of preset feature vectors to obtain a plurality of similarity values, where each preset feature vector corresponds to one preset scene.
A plurality of preset feature vectors may be preset in the electronic device, and each preset feature vector corresponds to one preset scene. That is, each preset feature vector is used to represent a preset scene. It will be appreciated that a preset scene is a scene known to the electronic device.
For example, preset feature vectors P1, P2, P3, and so on may be stored in the electronic device, where P1 may correspond to a conference scene, P2 to a restaurant scene, and P3 to a subway scene. That is, P1 is used to represent a conference scene, P2 a restaurant scene, and P3 a subway scene.
It should be noted that a large number of preset feature vectors may be set in the electronic device; for example, 100 preset feature vectors may be set. Accordingly, a large number of preset scenes can be preset in the electronic device, and each preset scene can be a specific scene in the user's life.
After the second obtaining module 202 obtains the feature vector of the current scene, the calculating module 203 sequentially calculates the similarity between the feature vector and each preset feature vector in the plurality of preset feature vectors to obtain a plurality of similarity values. The greater the similarity between the feature vector and a preset feature vector, the more similar the feature vector and the preset feature vector, that is, the more similar the current scene and the preset scene corresponding to the preset feature vector.
For example, after the second obtaining module 202 obtains the feature vector P of the current scene, the calculating module 203 sequentially calculates the similarity N1 between P and P1, the similarity N2 between P and P2, and the similarity N3 between P and P3. Here N1 denotes the similarity between P and P1, which can also be understood as the similarity between the current scene and the preset scene corresponding to P1; N2 denotes the similarity between P and P2, i.e. between the current scene and the preset scene corresponding to P2; and N3 denotes the similarity between P and P3, i.e. between the current scene and the preset scene corresponding to P3.
A first determining module 204, configured to determine a target feature vector from the plurality of preset feature vectors according to the plurality of similarity values.
After the calculating module 203 obtains the plurality of similarity values, the first determining module 204 may determine the target feature vector from the plurality of preset feature vectors according to those values. The target feature vector is the preset feature vector with the greatest similarity to the feature vector of the current scene; that is, the scene represented by the target feature vector is the one most similar to the current scene.
A second determining module 205, configured to determine a preset scene corresponding to the target feature vector as a current scene.
After the first determining module 204 determines the target feature vector, the second determining module 205 may determine a preset scene corresponding to the target feature vector as a current scene, so as to implement identification of the current scene. Subsequently, the electronic device may perform an intelligent operation according to the identified scene, for example, the electronic device may automatically perform a mode switch, automatically adjust screen brightness, or provide an intelligent suggestion for the current scene to the user.
For example, if the target feature vector determined by the first determining module 204 is P3, and the preset scene corresponding to P3 is a conference scene, the second determining module 205 determines the conference scene as the current scene. Subsequently, the electronic device may perform an intelligent operation according to the determined scene; for example, it may automatically switch to silent mode.
In the embodiment of the application, the electronic device can acquire the feature vector of the current scene according to the sensing data of the current scene, determine the target feature vector according to the similarity values of the feature vector and the preset feature vectors, and determine the preset scene corresponding to the target feature vector as the current scene, so that the current scene is identified, and the electronic device can perform intelligent operation on the current scene conveniently.
In some embodiments, the second obtaining module 202 is configured to perform the following steps:
selecting a corresponding feature extraction model according to the data type of the perception data;
extracting data features from the perception data through the feature extraction model;
and acquiring a feature vector of the current scene according to the data features.
A plurality of feature extraction models may be preset in the electronic device, and each feature extraction model is used for performing feature extraction on one type of data. For example, a convolutional neural network model, a recurrent neural network model, a word vector model, or the like may be set in advance in the electronic device. The convolutional neural network model is used for processing the image data so as to extract image features from the image data; the recurrent neural network model is used for processing the audio data so as to extract audio features from the audio data; the word vector model is used for processing the text data to extract text features from the text data.
After the first obtaining module 201 obtains the sensing data of the current scene, the second obtaining module 202 may select a corresponding feature extraction model according to a data type of the sensing data. When the sensing data includes a plurality of data types, the second obtaining module 202 may select a corresponding feature extraction model according to each data type.
Subsequently, the second obtaining module 202 extracts data features from the sensing data through the selected feature extraction model, and obtains feature vectors of the current scene according to the data features. For example, the second obtaining module 202 may combine the extracted data features to obtain a feature vector of the current scene.
In some embodiments, when extracting data features from the perceptual data through the feature extraction model, the second obtaining module 202 is configured to perform the following steps:
extracting image features from the image data through a convolutional neural network model;
extracting audio features from the audio data through a recurrent neural network model or a long-short term memory network model;
extracting text features from the text data through a word vector model;
when the feature vector of the current scene is obtained according to the data feature, the second obtaining module 202 is configured to perform the following steps:
and acquiring a feature vector of the current scene according to the image feature, the audio feature and the text feature.
The data type of the perception data acquired by the first acquiring module 201 includes image data, audio data, and text data. The feature extraction model selected for the image data may be a convolutional neural network model, the feature extraction model selected for the audio data may be a recurrent neural network model or a long-short term memory network model, and the feature extraction model selected for the text data may be a word vector model.
Subsequently, the second obtaining module 202 may extract image features from the image data through the convolutional neural network model, audio features from the audio data through the recurrent neural network model or long short-term memory network model, and text features from the text data through the word vector model.
Subsequently, the second obtaining module 202 obtains a feature vector of the current scene according to the extracted image feature, audio feature and text feature.
For example, the image feature extracted by the second obtaining module 202 may be A, the audio feature may be B, and the text feature may be C. Subsequently, the second obtaining module 202 may splice the extracted image, audio, and text features to obtain the feature vector P(A, B, C) of the current scene.
In some embodiments, the second obtaining module 202 may further perform feature extraction on the obtained image features, audio features, and text features again to obtain new image features, new audio features, and new text features, and obtain a feature vector of the current scene according to the new image features, the new audio features, and the new text features.
For example, after the second obtaining module 202 obtains the image feature A, the audio feature B, and the text feature C, it may perform feature extraction again on A, B, and C in sequence using a statistical method to obtain a new image feature A1, a new audio feature B1, and a new text feature C1, and splice A1, B1, and C1 to obtain the feature vector P(A1, B1, C1) of the current scene.
In some embodiments, the calculation module 203 is configured to perform the following steps:
calculating cosine similarity between the eigenvectors and each preset eigenvector by sequentially adopting a cosine similarity algorithm to obtain a plurality of cosine similarity values;
and determining the cosine similarity value of the feature vector and each preset feature vector as the similarity value of the feature vector and the preset feature vector to obtain a plurality of similarity values.
The calculating module 203 may sequentially calculate the cosine similarity between the feature vector and each of the preset feature vectors by using a cosine similarity algorithm, so as to obtain a plurality of cosine similarity values.
The cosine similarity value ranges over [-1, 1]. A cosine similarity value of 1 indicates that the two vectors point in the same direction, a value of 0 indicates that the two vectors are orthogonal, and a value of -1 indicates that they point in opposite directions. The closer the cosine similarity value is to 1, the closer the directions of the two vectors.
For example, suppose the second obtaining module 202 obtains the feature vector P of the current scene and the preset feature vectors include P1, P2, P3. The calculating module 203 then applies the cosine similarity algorithm in sequence to calculate the cosine similarity of P with each of P1, P2, P3, obtaining the cosine similarity value K1 of P and P1, the value K2 of P and P2, and the value K3 of P and P3.
Subsequently, the calculating module 203 determines the cosine similarity value between the feature vector of the current scene and each of the preset feature vectors as the similarity value between the feature vector and the preset feature vector, so as to obtain a plurality of similarity values.
For example, the calculating module 203 may determine the cosine similarity value K1 as the similarity value of P and P1, the value K2 as the similarity value of P and P2, and the value K3 as the similarity value of P and P3.
In some embodiments, the first determination module 204 is configured to perform the following steps:
determining a maximum similarity value from the plurality of similarity values;
and determining the preset feature vector corresponding to the maximum similarity value as a target feature vector.
After the calculating module 203 obtains the plurality of similarity values, the first determining module 204 may compare the plurality of similarity values with each other to determine a maximum similarity value from the plurality of similarity values. And then, determining the preset feature vector corresponding to the maximum similarity value as a target feature vector.
For example, if among the three similarity values N1, N2, N3, N1 is less than N2 and N2 is less than N3, the first determining module 204 may determine that the maximum similarity value is N3. The first determining module 204 then determines the preset feature vector P3 corresponding to N3 as the target feature vector.
In some embodiments, referring to fig. 9, fig. 9 is a schematic diagram of a second structure of a scene recognition apparatus provided in an embodiment of the present application.
The scene recognition apparatus 200 further includes a third obtaining module 206, where the third obtaining module 206 is configured to perform the following steps:
acquiring preset sensing data of each preset scene in a plurality of preset scenes for a plurality of times to obtain a plurality of preset sensing data of each preset scene;
and acquiring the preset feature vector corresponding to each preset scene according to the plurality of preset perception data of that scene.
Wherein a plurality of preset scenes may be determined by a user first. For example, a user may determine a plurality of scenes, such as a meeting scene, a restaurant scene, a subway scene, and so on.
The third obtaining module 206 may obtain the preset sensing data of each preset scene in the plurality of preset scenes for multiple times to obtain a plurality of preset sensing data of each preset scene.
For example, the third obtaining module 206 may acquire the preset perception data of the conference scene multiple times to obtain multiple preset perception data X1, X2, X3 of the conference scene; acquire the preset perception data of the restaurant scene multiple times to obtain multiple preset perception data Y1, Y2, Y3 of the restaurant scene; and acquire the preset perception data of the subway scene multiple times to obtain multiple preset perception data Z1, Z2, Z3 of the subway scene.
Subsequently, the third obtaining module 206 obtains the preset feature vector corresponding to each preset scene according to the plurality of preset sensing data of the preset scene. The third obtaining module 206 may perform feature extraction on a plurality of preset sensing data of each preset scene, and obtain a preset feature vector corresponding to the preset scene from the extracted features.
For example, the third obtaining module 206 may perform feature extraction on X1, X2, X3 and obtain the preset feature vector P1 of the conference scene from the extracted features; perform feature extraction on Y1, Y2, Y3 and obtain the preset feature vector P2 of the restaurant scene; and perform feature extraction on Z1, Z2, Z3 and obtain the preset feature vector P3 of the subway scene.
In some embodiments, when obtaining the preset feature vector corresponding to each preset scene according to the plurality of preset perception data of the preset scene, the third obtaining module 206 is configured to perform the following steps:
acquiring a preset sub-feature vector according to each preset sensing data in sequence to obtain a plurality of preset sub-feature vectors of the preset scene;
calculating an average feature vector of the preset sub-feature vectors;
and determining the average feature vector as the preset feature vector corresponding to the preset scene.
After obtaining the plurality of preset sensing data of each preset scene, the third obtaining module 206 acquires a preset sub-feature vector according to each preset sensing data in sequence, so as to obtain a plurality of preset sub-feature vectors of the preset scene. The third obtaining module 206 may perform feature extraction on each preset sensing data and obtain the corresponding preset sub-feature vector from the extracted features. Thus, for each preset scene, the third obtaining module 206 may obtain a plurality of preset sub-feature vectors.
For example, for the conference scene, the third obtaining module 206 may perform feature extraction on the preset sensing data X1 to obtain the corresponding preset sub-feature vector P11; perform feature extraction on the preset sensing data X2 to obtain the corresponding preset sub-feature vector P12; and perform feature extraction on the preset sensing data X3 to obtain the corresponding preset sub-feature vector P13. Thus, the third obtaining module 206 may obtain three preset sub-feature vectors P11, P12, P13 of the conference scene.
Subsequently, the third obtaining module 206 calculates an average feature vector of the preset sub-feature vectors, and determines the average feature vector as a preset feature vector corresponding to the preset scene.
For example, after obtaining the three preset sub-feature vectors P11, P12, P13 of the conference scene, the third obtaining module 206 calculates the average feature vector P1 of the three preset sub-feature vectors P11, P12, P13. Subsequently, the third obtaining module 206 determines the average feature vector P1 as the preset feature vector of the conference scene.
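A minimal sketch of this averaging step, assuming all preset sub-feature vectors share the same dimension (the values are invented for illustration):

    import numpy as np

    # Three preset sub-feature vectors from three acquisitions of the conference scene.
    P11 = np.array([0.1, 0.8, 0.1])
    P12 = np.array([0.2, 0.7, 0.1])
    P13 = np.array([0.0, 0.9, 0.1])

    # The element-wise mean is taken as the preset feature vector P1 of the scene.
    P1 = np.mean([P11, P12, P13], axis=0)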
In a specific implementation, the above modules may be implemented as independent entities, or may be combined arbitrarily and implemented as one or several entities.
As can be seen from the above, the scene recognition apparatus 200 according to the embodiment of the present application includes: a first obtaining module 201, configured to obtain perceptual data of a current scene; a second obtaining module 202, configured to obtain a feature vector of the current scene according to the sensing data; a calculating module 203, configured to sequentially calculate a similarity between the feature vector and each preset feature vector in a plurality of preset feature vectors to obtain a plurality of similarity values, where each preset feature vector corresponds to a preset scene; a first determining module 204, configured to determine a target feature vector from the plurality of preset feature vectors according to the plurality of similarity values; a second determining module 205, configured to determine a preset scene corresponding to the target feature vector as a current scene. The scene recognition device can acquire the feature vector of the current scene according to the sensing data of the current scene, determine the target feature vector according to the similarity values of the feature vector and the preset feature vectors, and determine the preset scene corresponding to the target feature vector as the current scene, so that the current scene is recognized, and the electronic equipment can perform intelligent operation on the current scene conveniently.
The embodiment of the application also provides an electronic device. The electronic device may be a smartphone, a tablet computer, a gaming device, an AR (Augmented Reality) device, an automobile, a data storage device, an audio playback device, a video playback device, a laptop computer, a desktop computing device, or a wearable device such as an electronic watch, electronic glasses, an electronic helmet, an electronic bracelet, an electronic necklace, or electronic clothing.
Referring to fig. 10, fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Electronic device 300 includes, among other things, a processor 301 and a memory 302. The processor 301 is electrically connected to the memory 302.
The processor 301 is the control center of the electronic device 300. It connects various parts of the entire electronic device using various interfaces and lines, and performs the various functions of the electronic device and processes data by running or calling the computer program stored in the memory 302 and by calling the data stored in the memory 302, thereby monitoring the electronic device as a whole.
In this embodiment, the processor 301 in the electronic device 300 loads instructions corresponding to one or more processes of the computer program into the memory 302, and runs the computer program stored in the memory 302, so as to implement the following functions:
acquiring perception data of a current scene;
acquiring a feature vector of the current scene according to the perception data;
sequentially calculating the similarity between the feature vector and each preset feature vector in a plurality of preset feature vectors to obtain a plurality of similarity values, wherein each preset feature vector corresponds to one preset scene;
determining a target feature vector from the preset feature vectors according to the similarity values;
and determining the preset scene corresponding to the target feature vector as the current scene.
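Taken together, these five steps can be sketched as a single recognition routine; this is a hedged outline under the assumption that feature extraction and similarity computation are supplied as functions (extract_feature_vector and cosine_similarity here are placeholders, not names defined by the embodiment):

    import numpy as np

    def recognize_scene(perception_data, preset_vectors, preset_scenes,
                        extract_feature_vector, cosine_similarity):
        # Steps 1-2: obtain the feature vector of the current scene.
        v = extract_feature_vector(perception_data)
        # Step 3: similarity against every preset feature vector.
        sims = [cosine_similarity(v, p) for p in preset_vectors]
        # Steps 4-5: the scene of the most similar preset vector is returned.
        return preset_scenes[int(np.argmax(sims))]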
In some embodiments, when obtaining the feature vector of the current scene according to the perception data, the processor 301 performs the following steps:
selecting a corresponding feature extraction model according to the data type of the perception data;
extracting data features from the perception data through the feature extraction model;
and acquiring a feature vector of the current scene according to the data features.
In some embodiments, the data types of the perception data include image data, audio data, and text data, and when the data features are extracted from the perception data by the feature extraction model, the processor 301 performs the following steps:
extracting image features from the image data by a convolutional neural network model;
extracting audio features from the audio data through a recurrent neural network model or a long-short term memory network model;
extracting text features from the text data through a word vector model;
when obtaining the feature vector of the current scene according to the data feature, the processor 301 executes the following steps:
and acquiring a feature vector of the current scene according to the image feature, the audio feature and the text feature.
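One way to picture this per-type dispatch and fusion is the following sketch, which uses small PyTorch modules as stand-ins; the concrete architectures, input shapes, and concatenation-based fusion are assumptions made for illustration, not the networks specified by the embodiment:

    import torch
    import torch.nn as nn

    # Stand-in models for the three data types.
    cnn = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
    lstm = nn.LSTM(input_size=40, hidden_size=16, batch_first=True)
    embed = nn.EmbeddingBag(num_embeddings=10000, embedding_dim=16)

    image = torch.randn(1, 3, 64, 64)        # image data
    audio = torch.randn(1, 100, 40)          # audio data, 100 frames of 40-dim features
    text = torch.randint(0, 10000, (1, 12))  # text data as token ids

    img_feat = cnn(image)                    # image features via a convolutional network
    _, (h, _) = lstm(audio)                  # audio features via a long short-term memory network
    aud_feat = h[-1]
    txt_feat = embed(text)                   # text features via a word-vector embedding

    # One possible fusion: concatenate per-type features into the scene feature vector.
    feature_vector = torch.cat([img_feat, aud_feat, txt_feat], dim=1)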
In some embodiments, when the similarity between the feature vector and each of the preset feature vectors is sequentially calculated to obtain a plurality of similarity values, the processor 301 performs the following steps:
sequentially calculating the cosine similarity between the feature vector and each preset feature vector by using a cosine similarity algorithm to obtain a plurality of cosine similarity values;
and determining the cosine similarity value of the feature vector and each preset feature vector as the similarity value of the feature vector and the preset feature vector to obtain a plurality of similarity values.
In some embodiments, when determining the target feature vector from the plurality of preset feature vectors according to the plurality of similarity values, the processor 301 performs the following steps:
determining a maximum similarity value from the plurality of similarity values;
and determining the preset feature vector corresponding to the maximum similarity value as a target feature vector.
In some embodiments, before obtaining the perceptual data of the current scene, the processor 301 further performs the following steps:
acquiring preset sensing data of each preset scene in a plurality of preset scenes for a plurality of times to obtain a plurality of preset sensing data of each preset scene;
and acquiring the preset feature vector corresponding to each preset scene according to the plurality of preset perception data of the preset scene.
In some embodiments, when obtaining the preset feature vector corresponding to each preset scene according to the plurality of preset sensing data of the preset scene, the processor 301 performs the following steps:
acquiring a preset sub-feature vector according to each preset sensing data in sequence to obtain a plurality of preset sub-feature vectors of the preset scene;
calculating an average feature vector of the preset sub-feature vectors;
and determining the average feature vector as the preset feature vector corresponding to the preset scene.
The memory 302 may be used to store computer programs and data. The memory 302 stores computer programs containing instructions executable by the processor. The computer programs may constitute various functional modules. The processor 301 executes various functional applications and performs data processing by calling the computer programs stored in the memory 302.
In some embodiments, referring to fig. 11, fig. 11 is a schematic view of a second structure of an electronic device provided in an embodiment of the present application.
Wherein, the electronic device 300 further comprises: a display 303, a control circuit 304, an input unit 305, a sensor 306, and a power supply 307. The processor 301 is electrically connected to the display 303, the control circuit 304, the input unit 305, the sensor 306, and the power source 307.
The display screen 303 may be used to display information entered by or provided to the user as well as various graphical user interfaces of the electronic device, which may be comprised of images, text, icons, video, and any combination thereof.
The control circuit 304 is electrically connected to the display 303, and is configured to control the display 303 to display information.
The input unit 305 may be used to receive input numbers, character information, or user characteristic information (e.g., a fingerprint), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. The input unit 305 may include a fingerprint recognition module.
The sensor 306 is used to collect information of the electronic device itself or information of the user or external environment information. For example, the sensor 306 may include a plurality of sensors such as a distance sensor, a magnetic field sensor, a light sensor, an acceleration sensor, a fingerprint sensor, a hall sensor, a position sensor, a gyroscope, an inertial sensor, an attitude sensor, a barometer, a heart rate sensor, and the like.
The power supply 307 is used to power the various components of the electronic device 300. In some embodiments, the power supply 307 may be logically coupled to the processor 301 through a power management system, such that functions of managing charging, discharging, and power consumption are performed through the power management system.
Although not shown in fig. 11, the electronic device 300 may further include a camera, a bluetooth module, and the like, which are not described in detail herein.
As can be seen from the above, an embodiment of the present application provides an electronic device, where the electronic device performs the following steps: acquiring perception data of a current scene; acquiring a feature vector of the current scene according to the perception data; sequentially calculating the similarity between the feature vector and each preset feature vector in a plurality of preset feature vectors to obtain a plurality of similarity values, wherein each preset feature vector corresponds to one preset scene; determining a target feature vector from the preset feature vectors according to the similarity values; and determining a preset scene corresponding to the target characteristic vector as a current scene. The electronic device provided by the embodiment of the application can acquire the feature vector of the current scene according to the sensing data of the current scene, determine the target feature vector according to the similarity values of the feature vector and the preset feature vectors, and determine the preset scene corresponding to the target feature vector as the current scene, so that the current scene is identified, and the electronic device can perform intelligent operation on the current scene conveniently.
An embodiment of the present application further provides a storage medium, where a computer program is stored in the storage medium, and when the computer program runs on a computer, the computer executes the scene recognition method according to any of the above embodiments.
It should be noted that all or part of the steps in the methods of the above embodiments may be implemented by hardware under the control of instructions of a computer program, and the computer program may be stored in a computer-readable storage medium, which may include, but is not limited to: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
The scene recognition method, the scene recognition device, the storage medium, and the electronic device provided in the embodiments of the present application are described in detail above. The principle and the implementation of the present application are explained herein by applying specific examples, and the above description of the embodiments is only used to help understand the method and the core idea of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A method for scene recognition, comprising:
acquiring perception data of a current scene;
acquiring a feature vector of the current scene according to the perception data;
sequentially calculating the similarity between the feature vector and each preset feature vector in a plurality of preset feature vectors to obtain a plurality of similarity values, wherein each preset feature vector corresponds to one preset scene;
determining a target feature vector from the preset feature vectors according to the similarity values;
and determining the preset scene corresponding to the target feature vector as the current scene.
2. The method according to claim 1, wherein said obtaining a feature vector of the current scene according to the perceptual data comprises:
selecting a corresponding feature extraction model according to the data type of the perception data;
extracting data features from the perception data through the feature extraction model;
and acquiring a feature vector of the current scene according to the data features.
3. The scene recognition method of claim 2, wherein the data types of the perception data comprise image data, audio data and text data, and the extracting data features from the perception data through the feature extraction model comprises:
extracting image features from the image data by a convolutional neural network model;
extracting audio features from the audio data through a recurrent neural network model or a long-short term memory network model;
extracting text features from the text data through a word vector model;
the obtaining of the feature vector of the current scene according to the data feature includes:
and acquiring a feature vector of the current scene according to the image feature, the audio feature and the text feature.
4. The method according to claim 1, wherein the sequentially calculating the similarity between the feature vector and each preset feature vector in a plurality of preset feature vectors to obtain a plurality of similarity values comprises:
calculating the cosine similarity between the feature vector and each preset feature vector by sequentially using a cosine similarity algorithm to obtain a plurality of cosine similarity values;
and determining the cosine similarity value of the feature vector and each preset feature vector as the similarity value of the feature vector and the preset feature vector to obtain a plurality of similarity values.
5. The method according to claim 1, wherein the determining a target feature vector from the plurality of preset feature vectors according to the plurality of similarity values comprises:
determining a maximum similarity value from the plurality of similarity values;
and determining the preset feature vector corresponding to the maximum similarity value as a target feature vector.
6. The scene recognition method according to claim 1, further comprising, before the obtaining the perception data of the current scene:
acquiring preset sensing data of each preset scene in a plurality of preset scenes for a plurality of times to obtain a plurality of preset sensing data of each preset scene;
and acquiring the preset feature vector corresponding to each preset scene according to the plurality of preset perception data of the preset scene.
7. The method according to claim 6, wherein the obtaining the preset feature vector corresponding to each preset scene according to the plurality of preset perception data of the preset scene comprises:
acquiring a preset sub-feature vector according to each preset sensing data in sequence to obtain a plurality of preset sub-feature vectors of the preset scene;
calculating an average feature vector of the preset sub-feature vectors;
and determining the average feature vector as the preset feature vector corresponding to the preset scene.
8. A scene recognition apparatus, comprising:
the first acquisition module is used for acquiring the perception data of the current scene;
the second acquisition module is used for acquiring the feature vector of the current scene according to the perception data;
the calculation module is used for calculating the similarity between the feature vector and each preset feature vector in a plurality of preset feature vectors in sequence to obtain a plurality of similarity values, wherein each preset feature vector corresponds to one preset scene;
a first determining module, configured to determine a target feature vector from the plurality of preset feature vectors according to the plurality of similarity values;
and the second determining module is used for determining the preset scene corresponding to the target characteristic vector as the current scene.
9. A storage medium having stored therein a computer program which, when run on a computer, causes the computer to execute the scene recognition method according to any one of claims 1 to 7.
10. An electronic device, characterized in that the electronic device comprises a processor and a memory, wherein the memory stores a computer program, and the processor is used for executing the scene recognition method according to any one of claims 1 to 7 by calling the computer program stored in the memory.
CN201910282441.8A 2019-04-09 2019-04-09 Scene recognition method and device, storage medium and electronic equipment Pending CN111797873A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910282441.8A CN111797873A (en) 2019-04-09 2019-04-09 Scene recognition method and device, storage medium and electronic equipment


Publications (1)

Publication Number Publication Date
CN111797873A true CN111797873A (en) 2020-10-20

Family

ID=72805364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910282441.8A Pending CN111797873A (en) 2019-04-09 2019-04-09 Scene recognition method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111797873A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113157889A (en) * 2021-04-21 2021-07-23 韶鼎人工智能科技有限公司 Visual question-answering model construction method based on theme loss
CN114065340A (en) * 2021-10-15 2022-02-18 南方电网数字电网研究院有限公司 Construction site safety monitoring method and system based on machine learning and storage medium
CN115396831A (en) * 2021-05-08 2022-11-25 ***通信集团浙江有限公司 Interaction model generation method, device, equipment and storage medium


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942523A (en) * 2013-01-18 2014-07-23 华为终端有限公司 Sunshine scene recognition method and device
CN103617432A (en) * 2013-11-12 2014-03-05 华为技术有限公司 Method and device for recognizing scenes
CN108710847A (en) * 2018-05-15 2018-10-26 北京旷视科技有限公司 Scene recognition method, device and electronic equipment
CN109241903A (en) * 2018-08-30 2019-01-18 平安科技(深圳)有限公司 Sample data cleaning method, device, computer equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination