WO2020238775A1 - Scene recognition method, scene recognition apparatus, and electronic device - Google Patents

Scene recognition method, scene recognition apparatus, and electronic device

Info

Publication number
WO2020238775A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
scene recognition
scene
information
shooting
Prior art date
Application number
PCT/CN2020/091690
Other languages
English (en)
French (fr)
Inventor
李�真
王冬
雷张源
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2020238775A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements

Definitions

  • the embodiments of the present application relate to the field of information processing technology, and in particular, to a scene recognition method, a scene recognition device, and an electronic device.
  • One usage scenario of artificial intelligence (AI) is the use of a neural network that simulates the human brain for analysis and learning, so as to perform intelligent recognition of objects. For example, identifying the species or category of a plant from a plant picture; or identifying the scene (such as a snow scene) of the preview image displayed on the preview interface when a photo is taken.
  • When an existing neural network performs scene recognition, it performs intelligent recognition based on the trained network alone. For example, after a neural network algorithm has been trained on a large number of flower pictures, actual flowers in a preview can be classified according to the trained algorithm to obtain a recognition result. However, this method relies solely on the algorithm, so the reliability of the recognition result is low. For example, since a scene of falling cherry blossoms is visually very close to a scene of falling snow, the neural network may misidentify the falling-cherry-blossom scene as a falling-snow scene.
  • the embodiments of the present application provide a scene recognition method, a scene recognition device, and an electronic device, which can improve the accuracy of image recognition.
  • In a first aspect, a scene recognition method includes: recognizing a first image and determining a scene recognition result of the first image, the scene recognition result including at least one scene category; acquiring shooting information captured when the first image is collected, the shooting information including one or more of time information, location information, weather information, and temperature information; and determining, according to the shooting information, a label of the first image from the scene recognition result, the label of the first image being used to indicate the scene category of the first image.
  • In the technical solution provided by the first aspect, performing image scene recognition in combination with the time, location, weather, and temperature information of the image can avoid the misrecognition caused by relying solely on algorithms for scene recognition, thereby improving the accuracy of image recognition.
  • The aforementioned at least one scene category includes one or more of image background information in the image, season information corresponding to the image, weather information corresponding to the image, and photographic subject information of the image.
  • the application combines the shooting information of the image to classify the shooting background, season, weather, or shooting object of the image, so as to improve the accuracy of image recognition.
  • The aforementioned at least one scene category is sorted in descending order of the degree of matching between the first image and each scene category. Through such processing, the shooting information of the image can be combined, and the scene category with the highest degree of matching with the shooting information can be used as the scene category of the first image.
  • Determining the label of the first image from the scene recognition result according to the shooting information includes: determining the label of the first image according to the degree of matching between the shooting information and each scene category, together with the descending order of the at least one scene category.
  • the scene category of the first image is determined according to the degree of matching between the shooting information of the image and each scene category, which can avoid misrecognition problems caused by relying solely on algorithms for scene recognition, thereby improving the accuracy of image recognition.
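  • To make this label-selection step concrete, below is a minimal Python sketch of how a scene recognition result (categories sorted by matching degree) could be combined with shooting-information matching degrees to pick the label. The fusion rule (a plain product of the two matching degrees) and all names are illustrative assumptions; the application does not specify a particular formula.

```python
from typing import Dict, List, Tuple

def determine_label(scene_result: List[Tuple[str, float]],
                    shooting_match: Dict[str, float]) -> str:
    """Pick the image label from the scene recognition result.

    scene_result: (category, matching degree with the image) pairs,
                  sorted in descending order by the recognition step.
    shooting_match: matching degree between the shooting information
                    (time/location/weather/temperature) and each category.
    """
    # Weight the image matching degree by the shooting-information matching
    # degree; a plain product is one plausible fusion rule (an assumption).
    return max(scene_result,
               key=lambda cs: cs[1] * shooting_match.get(cs[0], 0.0))[0]

# The cherry-blossom vs. snow example from the background:
scene_result = [("falling snow", 0.6), ("falling cherry blossoms", 0.4)]
shooting_match = {"falling snow": 0.2, "falling cherry blossoms": 0.9}
print(determine_label(scene_result, shooting_match))  # falling cherry blossoms
```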
  • The scene recognition method of the present application can be applied to an electronic device that includes a neural network processing unit (NPU) chip; recognizing the first image and determining the scene recognition result of the first image includes: recognizing the first image by using the NPU chip, and determining the scene recognition result of the first image.
  • the scene recognition method of this application can be implemented by an NPU chip.
  • the Cambricon instruction set is integrated in the NPU chip; the NPU chip uses the Cambricon instruction set to accelerate the process of determining the scene recognition result of the first image.
  • the speed of scene recognition can be improved and the user experience can be improved.
  • the first image is a preview image collected by a camera of the electronic device.
  • the scene recognition method of the present application may be performed on the preview image collected by the camera.
  • the first image is a stored picture; or, the first image is a picture obtained from another device.
  • the scene recognition method of the present application can also be performed on existing pictures, including those taken using electronic equipment and those obtained from a third party.
  • the method further includes: adjusting the shooting parameters of the camera so that the shooting parameters match the label of the first image.
  • A convolutional neural network is integrated in the above-mentioned NPU chip, and the method further includes: updating the first image and the label of the first image into the training set of the convolutional neural network; and retraining the convolutional neural network according to the updated training set.
  • In a second aspect, a scene recognition device includes: a scene recognition unit configured to recognize a first image and determine a scene recognition result of the first image, the scene recognition result including at least one scene category; and an information acquiring unit configured to acquire shooting information captured when the first image is collected, the shooting information including one or more of time information, location information, weather information, and temperature information. The scene recognition unit is further configured to determine, according to the shooting information, the label of the first image from the scene recognition result, the label being used to indicate the scene category of the first image.
  • In the technical solution provided by the second aspect, performing image scene recognition in combination with the time, location, weather, and temperature information of the image can avoid the misrecognition caused by relying solely on algorithms for scene recognition, thereby improving the accuracy of image recognition.
  • The aforementioned at least one scene category includes one or more of image background information in the image, season information corresponding to the image, weather information corresponding to the image, and photographic subject information of the image.
  • the application combines the shooting information of the image to classify the shooting background, season, weather, or shooting object of the image, so as to improve the accuracy of image recognition.
  • The aforementioned at least one scene category is sorted in descending order of the degree of matching between the first image and each scene category. Through such processing, the shooting information of the image can be combined, and the scene category with the highest degree of matching with the shooting information can be used as the scene category of the first image.
  • The scene recognition unit determining the label of the first image from the scene recognition result according to the shooting information includes: the scene recognition unit determines the label of the first image according to the degree of matching between the shooting information and each scene category, together with the descending order of the at least one scene category.
  • the scene category of the first image is determined according to the degree of matching between the shooting information of the image and each scene category, which can avoid the misrecognition problem caused by relying solely on the algorithm for scene recognition, thereby improving the accuracy of image recognition.
  • The scene recognition unit includes a neural network processing unit (NPU) chip; the scene recognition unit recognizing the first image and determining the scene recognition result of the first image includes: the scene recognition unit recognizes the first image through the NPU chip and determines the scene recognition result of the first image.
  • the scene recognition method of this application can be implemented by an NPU chip.
  • the Cambricon instruction set is integrated in the NPU chip; the NPU chip uses the Cambricon instruction set to accelerate the process of determining the scene recognition result of the first image.
  • the speed of scene recognition can be improved and the user experience can be improved.
  • The scene recognition device further includes a camera, and the first image is a preview image collected by the camera.
  • the scene recognition method of the present application may be performed on the preview image collected by the camera of the scene recognition device.
  • the first image is a stored picture; or, the first image is a picture obtained from another device.
  • the scene recognition method of the present application can also be performed on existing pictures, including those taken using electronic equipment and those obtained from a third party.
  • The device further includes: a parameter adjustment unit configured to adjust the shooting parameters of the camera after the scene recognition unit determines the label of the first image from the scene recognition result according to the shooting information, so that the shooting parameters match the label of the first image.
  • A convolutional neural network is integrated into the NPU chip, and the scene recognition unit is further configured to: update the first image and the label of the first image into the training set of the convolutional neural network; and retrain the convolutional neural network according to the updated training set.
  • the algorithm of the convolutional neural network can be continuously improved, and the accuracy of scene recognition by the convolutional neural network can be improved.
  • In a third aspect, a user equipment (UE) includes: a scene recognition device configured to implement the scene recognition method in any one of the possible implementations of the first aspect.
  • In a fourth aspect, a user equipment (UE) includes: a memory for storing computer program code, the computer program code including instructions; a radio frequency circuit for transmitting and receiving wireless signals; and a processor for executing the instructions to implement the scene recognition method in any possible implementation of the first aspect.
  • In a fifth aspect, a computer-readable storage medium stores computer-executable instructions; when the computer-executable instructions are executed by a processor, the scene recognition method in any one of the possible implementations of the first aspect is realized.
  • In a sixth aspect, a chip system includes a processor and a memory, and instructions are stored in the memory; when the instructions are executed by the processor, the scene recognition method in any possible implementation of the first aspect is realized.
  • the chip system can be composed of chips, or can include chips and other discrete devices.
  • FIG. 1 is a schematic diagram of the working process of a convolutional neural network provided by an embodiment of this application;
  • FIG. 2 is a schematic diagram of a pooling method provided by an embodiment of this application;
  • FIG. 3 is a schematic diagram of a mobile phone hardware structure provided by an embodiment of this application;
  • FIG. 4 is a first flowchart of a scene recognition method provided by an embodiment of this application;
  • FIG. 5 is a second flowchart of a scene recognition method provided by an embodiment of this application;
  • FIG. 6 is an accelerator architecture based on the Cambricon instruction set provided by an embodiment of this application;
  • FIG. 7 is a third flowchart of a scene recognition method provided by an embodiment of this application;
  • FIG. 8 is a fourth flowchart of a scene recognition method provided by an embodiment of this application;
  • FIG. 9 is a schematic structural diagram of a mobile phone provided by an embodiment of this application.
  • the embodiments of the present application provide a scene recognition method, a scene recognition device, and an electronic device.
  • The method can be used in the process of performing scene recognition on an image to be recognized through a convolutional neural network algorithm.
  • the image to be recognized may refer to a picture that has been taken, a preview of a camera, a picture obtained from other places, or a certain frame of image in a video, etc.
  • the embodiments of the present application do not limit the source, format, and acquisition method of the image to be recognized.
  • The scene recognition result in the embodiment of this application can identify image background information (such as a night scene, snow, or a beach), season information corresponding to the image (such as autumn), weather information corresponding to the image (such as rainy or cloudy days), and photographic subject information of the image (such as falling cherry blossoms, babies, or falling snow), etc.
  • the above-mentioned scene recognition results are just a few examples, and the embodiment of the present application does not limit the specific scene category in the specific scene recognition result.
  • Example 1: The user uses the camera of a user equipment (UE) to take a picture, and the UE performs scene recognition on the preview image displayed on the preview interface, then adjusts the shooting parameters of the camera based on the scene recognition result. In this way, when the user clicks the camera button to shoot the preview image, a picture that best matches the style and color of the scene is taken with the adjusted shooting parameters, giving a better shooting effect and a better user experience.
  • the preview interface refers to the interface where the UE enables the camera to preview the current image to be shot. After the UE starts the camera, the current preview interface of the camera will be displayed on the display screen of the UE, so that the user can determine whether the current picture is the picture that the user wants to take.
  • Example 2: A user uploads an existing picture to a website, and the website identifies the scene category of the picture; specifically, the web server of the website identifies the scene category. Alternatively, the user recognizes the scene category of the picture through an APP installed in the UE. For example, the user wants to know the family (such as Rosaceae) and the name (such as rose) of a plant in a picture they have taken. The user can upload the picture to a website, and the web server of the website identifies the family and name of the plant in the picture.
  • Example 3: The user wants to find, in a shopping APP, the clothing worn by a certain person (person A). The user can upload a photo of person A wearing the clothing to the shopping APP. The application server of the APP identifies the clothing, matches the same clothing in the shopping APP, and recommends it to the user.
  • Example 4: The user wants to render a picture 1 they have taken into a picture 2 that better suits its scene. For example, the user takes a photo of himself standing in a snow scene, hoping to render a dreamier snow scene in the background. The user can identify the scene of picture 1 through a picture-processing APP, and after the APP obtains the scene recognition result, picture 1 is rendered according to the scene recognition result to obtain a better snow-scene effect.
  • example 1 to example 4 are only used as a few examples to introduce several possible applications of the scene recognition method in the embodiment of the present application.
  • the scene recognition method in the embodiment of the present application may also be applied to other possible situations, which is not limited in the embodiment of the present application.
  • The electronic devices in the embodiments of the present application may be smart phones, tablet computers, or smart cameras, and may also be other desktop, laptop, or handheld devices, such as netbooks, personal digital assistants (PDAs), wearable devices (such as smart watches), and augmented reality (AR)/virtual reality (VR) devices; they may also be server-type devices (as in Example 2 and Example 3), or other devices.
  • the embodiment of the application does not limit the type of electronic device.
  • a convolutional neural network may be integrated in the electronic device.
  • the convolutional neural network of the electronic device is integrated in a neural-network processing unit (NPU) chip, and the scene recognition method of the embodiment of the present application is completed through the NPU.
  • the convolutional neural network of the electronic device is integrated in the scene recognition device, and the scene recognition method in the embodiment of the present application is completed by the scene recognition device.
  • the convolutional neural network is a feed-forward neural network.
  • Artificial neurons can respond to surrounding units and can perform large-scale image processing.
  • Convolutional neural networks include one or more convolutional layers and a fully connected layer at the top (corresponding to classic neural networks), as well as associated weights and pooling layers. This structure enables convolutional neural networks to exploit the two-dimensional structure of the input data. Compared with other deep learning structures, convolutional neural networks give better results in image and speech recognition. The model can also be trained using the backpropagation algorithm. Compared with other deep feed-forward neural networks, convolutional neural networks have fewer parameters to consider, making them an attractive deep learning structure.
  • The convolutional layer 120 obtains the image data that the data input layer 110 has preprocessed (for example, the preprocessing includes de-averaging (mean subtraction), normalization, and principal component analysis (PCA)/whitening) and performs feature extraction on it.
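  • As a concrete illustration of the preprocessing named above, here is a minimal NumPy sketch of de-averaging and normalization (PCA/whitening omitted for brevity); the function name and shapes are illustrative assumptions, not part of the application.

```python
import numpy as np

def preprocess(img: np.ndarray) -> np.ndarray:
    """De-average and normalize an H x W x C image before the convolutional layer."""
    x = img.astype(np.float32)
    x -= x.mean(axis=(0, 1), keepdims=True)                 # de-averaging (per channel)
    return x / (x.std(axis=(0, 1), keepdims=True) + 1e-8)   # normalization
```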
  • the activation function layer 130 performs a non-linear mapping on the result output by the convolution layer 120.
  • The activation function layer 130 adopts the rectified linear unit (ReLU) activation function to compress the output of the convolutional layer 120 to a certain fixed range, so that the numerical range remains controllable layer by layer.
  • ReLU is characterized by fast convergence and simple gradient computation.
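  • For reference, ReLU itself is a one-line non-linear mapping; this plain NumPy sketch (names illustrative) shows how negative activations from the convolutional layer are clipped away:

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    # max(0, x): negative activations are clipped to zero,
    # keeping the layer-by-layer numerical range controllable
    return np.maximum(x, 0.0)

print(relu(np.array([-2.0, 0.5, 3.0])))  # [0.  0.5 3. ]
```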
  • The pooling layer 140 downsamples the features, that is, replaces a region with a single value, mainly to reduce the number of network training parameters and the degree of overfitting of the model.
  • the fully connected layer 150 integrates the previously extracted features. Since each node of the fully connected layer 150 is connected to all nodes of the upper layer, it has the characteristic of being fully connected, which is the same as the connection method of the traditional neural network neurons.
  • Max pooling refers to selecting the largest number in each 2×2 window as the value of the corresponding element of the output matrix. As shown in Figure 2, the largest number in the first 2×2 window of the input matrix is 6, so the first element of the output matrix is 6, and so on.
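  • The Figure 2 example can be reproduced with a short NumPy sketch of 2×2 max pooling with stride 2; the input values outside the first window are made up for illustration:

```python
import numpy as np

def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    """Replace each non-overlapping 2x2 window with its maximum value."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

inp = np.array([[1, 6, 2, 3],
                [5, 4, 1, 0],
                [7, 2, 9, 8],
                [3, 1, 4, 5]])
print(max_pool_2x2(inp))
# [[6 3]
#  [7 9]]  -- the maximum of the first 2x2 window, 6, becomes the first output element
```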
  • A scene recognition device (including a UE, a server device, or a dedicated scene recognition apparatus) finally determines the label of the image to be recognized based on the scene recognition result obtained for that image by the convolutional neural network, combined with information such as time, location, weather, humidity, and temperature.
  • The mobile phone 300 may include a processor 310, a memory (including an external memory interface 320 and an internal memory 321), a universal serial bus (USB) interface 330, a charging management module 340, a power management module 341, a battery 342, an antenna 1, an antenna 2, a mobile communication module 350, a wireless communication module 360, an audio module 370, a speaker 370A, a microphone 370C, a sensor module 380, buttons 390, an indicator 392, a camera 393, a display screen 394, a subscriber identification module (SIM) card interface 395, and so on.
  • the sensor module 380 may include a gyroscope sensor 380A, a pressure sensor 380B, an acceleration sensor 380C, a temperature sensor 380D, a touch sensor 380E, an ambient light sensor 380F, and so on.
  • the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the mobile phone 300.
  • the mobile phone 300 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the processor 310 may include one or more processing units.
  • The processor 310 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural network processing unit (NPU) chip, etc.
  • the different processing units may be independent devices or integrated in one or more processors.
  • the controller can generate operation control signals according to the instruction operation code and timing signals to complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 310 to store instructions and data.
  • the memory in the processor 310 is a cache memory.
  • The memory can store instructions or data that the processor 310 has just used or uses cyclically. If the processor 310 needs to use the instructions or data again, it can call them directly from this memory, which avoids repeated accesses, reduces the waiting time of the processor 310, and improves system efficiency.
  • the processor 310 may include one or more interfaces.
  • The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the charging management module 340 is used to receive charging input from the charger.
  • the charger can be a wireless charger or a wired charger.
  • the charging management module 340 may receive the charging input of the wired charger through the USB interface 330.
  • the charging management module 340 may receive the wireless charging input through the wireless charging coil of the mobile phone 300. While the charging management module 340 charges the battery 342, it can also supply power to the electronic device through the power management module 341.
  • the power management module 341 is used to connect the battery 342, the charging management module 340 and the processor 310.
  • the power management module 341 receives input from the battery 342 and/or the charge management module 340, and supplies power to the processor 310, the internal memory 321, the display screen 394, the camera 393, and the wireless communication module 360.
  • the power management module 341 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance).
  • the power management module 341 may also be provided in the processor 310.
  • the power management module 341 and the charging management module 340 may also be provided in the same device.
  • the wireless communication function of the mobile phone 300 can be implemented by the antenna 1, the antenna 2, the mobile communication module 350, the wireless communication module 360, the modem processor, and the baseband processor.
  • the antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in the mobile phone 300 can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna can be used in combination with a tuning switch.
  • the mobile communication module 350 can provide a wireless communication solution including 2G/3G/4G/5G and the like applied on the mobile phone 300.
  • the mobile communication module 350 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and so on.
  • the mobile communication module 350 can receive electromagnetic waves by the antenna 1, and perform processing such as filtering, amplifying and transmitting the received electromagnetic waves to the modem processor for demodulation.
  • the mobile communication module 350 can also amplify the signal modulated by the modem processor, and convert it into electromagnetic waves for radiation by the antenna 1.
  • at least part of the functional modules of the mobile communication module 350 may be provided in the processor 310.
  • at least part of the functional modules of the mobile communication module 350 and at least part of the modules of the processor 310 may be provided in the same device.
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low-frequency baseband signal is processed by the baseband processor and then passed to the application processor.
  • the application processor outputs a sound signal through an audio device (not limited to a speaker 370A, a receiver 370B, etc.), or displays an image or video through the display screen 394.
  • the modem processor may be an independent device.
  • the modem processor may be independent of the processor 310 and be provided in the same device as the mobile communication module 350 or other functional modules.
  • The wireless communication module 360 can provide wireless communication solutions applied on the mobile phone 300, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and so on.
  • the wireless communication module 360 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 360 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 310.
  • The wireless communication module 360 may also receive the signal to be sent from the processor 310, perform frequency modulation and amplification on it, and convert it into electromagnetic waves for radiation through the antenna 2.
  • the antenna 1 of the mobile phone 300 is coupled with the mobile communication module 350, and the antenna 2 is coupled with the wireless communication module 360, so that the mobile phone 300 can communicate with the network and other devices through wireless communication technology.
  • The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite-based augmentation systems (SBAS).
  • the mobile phone 300 implements a display function through a GPU, a display screen 394, and an application processor.
  • the GPU is an image processing microprocessor, which is connected to the display screen 394 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • the processor 310 may include one or more GPUs that execute program instructions to generate or change display information. Specifically, in the embodiment of the present application, after the scene recognition result is determined, the mobile phone 300 may render the picture into an effect suitable for the picture label through the GPU.
  • the display screen 394 is used to display images, videos, etc.
  • the display screen 394 includes a display panel.
  • The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Miniled, a MicroLed, a Micro-oLed, a quantum dot light-emitting diode (QLED), etc.
  • the mobile phone 300 may include one or N display screens 394, and N is a positive integer greater than one.
  • the mobile phone 300 can realize a shooting function through an ISP, a camera 393, a video codec, a GPU, a display screen 394, and an application processor.
  • the ISP is used to process the data fed back by the camera 393. For example, when taking a picture, the shutter is opened, the light is transmitted to the photosensitive element of the camera through the lens, the light signal is converted into an electrical signal, and the photosensitive element of the camera transfers the electrical signal to the ISP for processing and is converted into an image visible to the naked eye.
  • The ISP can also optimize the noise, brightness, and skin color of the image, as well as parameters such as the exposure and color temperature of the shooting scene.
  • the ISP may be provided in the camera 393.
  • the camera 393 is used to capture still images or videos.
  • the object generates an optical image through the lens and projects it to the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats.
  • the mobile phone 300 may include one or N cameras 393, and N is a positive integer greater than one.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the mobile phone 300 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
  • Video codecs are used to compress or decompress digital video.
  • the mobile phone 300 may support one or more video codecs. In this way, the mobile phone 300 can play or record videos in a variety of encoding formats, such as: moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
  • The NPU is a neural-network processing unit. By drawing on the structure of biological neural networks, such as the transfer mode between human-brain neurons, it can quickly process input information and can also continuously learn.
  • the NPU can realize applications such as intelligent cognition of the mobile phone 300, such as image recognition, face recognition, scene recognition, speech recognition, text understanding, etc.
  • the NPU can be understood as a unit integrated with a convolutional neural network, or can be understood as a scene recognition device. Or it can be understood that the scene recognition device may include an NPU for performing scene recognition on the image to be recognized.
  • the external memory interface 320 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the mobile phone 300.
  • the external memory card communicates with the processor 310 through the external memory interface 320 to realize the data storage function. For example, save music, video and other files in an external memory card.
  • the internal memory 321 may be used to store computer executable program code, where the executable program code includes instructions.
  • the internal memory 321 may include a storage program area and a storage data area.
  • the storage program area can store an operating system, at least one application program (such as a sound playback function, an image playback function, etc.) required by at least one function.
  • the data storage area can store data (such as audio data, phone book, etc.) created during the use of the mobile phone 300.
  • the internal memory 321 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), etc.
  • the processor 310 executes various functional applications and data processing of the mobile phone 300 by running instructions stored in the internal memory 321 and/or instructions stored in a memory provided in the processor.
  • the mobile phone 300 can implement audio functions through the audio module 370, the speaker 370A, the receiver 370B, the microphone 370C, the earphone interface 370D, and the application processor. For example, music playback, recording, etc.
  • the audio module 370 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal.
  • the audio module 370 can also be used to encode and decode audio signals.
  • The speaker 370A, also called a "horn", is used to convert an audio electrical signal into a sound signal.
  • the mobile phone 300 can listen to music through the speaker 370A, or listen to a hands-free call.
  • the receiver 370B also called “earpiece” is used to convert audio electrical signals into sound signals.
  • the mobile phone 300 answers a call or a voice message, it can receive the voice by bringing the receiver 370B close to the human ear.
  • The microphone 370C, also called a "mic" or "mouthpiece", is used to convert a sound signal into an electrical signal.
  • When making a sound, the user can bring the mouth close to the microphone 370C to input the sound signal into the microphone 370C.
  • the mobile phone 300 can be provided with at least one microphone 370C.
  • the earphone interface 370D is used to connect wired earphones.
  • The earphone interface 370D may be the USB interface 330, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • The gyro sensor 380A can be used to determine the motion posture of the mobile phone 300, for example, the angular velocity of the mobile phone 300 around three axes (i.e., the x, y, and z axes).
  • The gyro sensor 380A can also be used for image stabilization. Specifically, in the embodiment of the present application, if the mobile phone 300 collects data such as its jitter angle through the gyro sensor 380A and, combined with the scene recognition result of the convolutional neural network, determines that the scene of the current preview image is parachuting, the mobile phone 300 can calculate, from the jitter angle, the distance that the lens module needs to compensate, and the lens counteracts the jitter of the mobile phone 300 through reverse movement to achieve image stabilization.
  • the gyro sensor 380A can also be used for navigation and somatosensory game scenes.
  • The pressure sensor 380B is used to measure air pressure.
  • For example, the mobile phone 300 calculates the altitude based on the air pressure value measured by the pressure sensor 380B and, combined with the GPS positioning of the mobile phone and the scene recognition result of the convolutional neural network, determines that the scene of the current preview image is Yulong Snow Mountain; the mobile phone 300 can then adjust the shooting parameters to make them more suitable for shooting the current scene.
  • the acceleration sensor 380C can detect the magnitude of the acceleration of the mobile phone 300 in various directions (generally three axes).
  • When the mobile phone 300 is stationary, the magnitude and direction of gravity can be detected. The acceleration sensor can also be used to identify the posture of the electronic device, and is used in applications such as horizontal/vertical screen switching and pedometers.
  • For example, if the mobile phone 300 combines data such as the magnitude and direction of gravity collected by the acceleration sensor 380C, the pressure data collected by the pressure sensor 380B, and the scene recognition result of the convolutional neural network to determine that the scene of the current preview image is an underwater world, the mobile phone 300 can adjust the shooting parameters to make them more suitable for underwater shooting.
  • the ambient light sensor 380F is used to sense the brightness of the ambient light.
  • For example, the mobile phone 300 can determine that the current preview scene is a dark night according to the ambient light brightness collected by the ambient light sensor 380F combined with the scene recognition result of the convolutional neural network; the mobile phone 300 can then supplement light for shooting, and the specific amount of supplemental light can also be determined by the ambient light brightness collected by the ambient light sensor 380F.
  • the temperature sensor 380D is used to detect temperature.
  • For example, the mobile phone 300 can determine, based on the temperature collected by the temperature sensor 380D combined with the scene recognition result of the convolutional neural network, that the scene in the picture is falling cherry blossoms rather than falling snow, and thus avoid rendering the picture as a winter snow scene.
  • the touch sensor 380E is also called a "touch device”.
  • the touch sensor 380E (also referred to as a touch panel) may be provided on the display screen 394, and the touch sensor 380E and the display screen 394 form a touch screen, also called a “touch screen”.
  • the touch sensor 380E is used to detect touch operations acting on or near it.
  • the touch sensor can transmit the detected touch operation to the application processor to determine the type of touch event.
  • the visual output related to the touch operation can be provided through the display screen 394.
  • the touch sensor 380E may also be disposed on the surface of the mobile phone 300, which is different from the position of the display screen 394. Specifically in this application, the mobile phone 300 can detect the user's pressing operation on the virtual shooting button of the display screen 394, and in response to the operation, shoot the current preview image.
  • the button 390 includes a power button, a volume button and so on.
  • The button 390 may be a mechanical button or a touch button.
  • the mobile phone 300 can receive key input, and generate key signal input related to user settings and function control of the mobile phone 300.
  • the motor 391 can generate vibration prompts.
  • the motor 391 can be used for incoming call vibration notification, and can also be used for touch vibration feedback.
  • touch operations applied to different applications can correspond to different vibration feedback effects.
  • For touch operations acting on different areas of the display screen 394, the motor 391 can also correspond to different vibration feedback effects. Touch operations in different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 392 can be an indicator light, which can be used to indicate the charging status, power change, and can also be used to indicate messages, missed calls, notifications, and so on.
  • the SIM card interface 395 is used to connect to the SIM card.
  • the SIM card can be connected to and separated from the mobile phone 300 by inserting into the SIM card interface 395 or pulling out from the SIM card interface 395.
  • the mobile phone 300 can support 1 or N SIM card interfaces, and N is a positive integer greater than 1.
  • the SIM card interface 395 can support Nano SIM cards, Micro SIM cards, SIM cards, etc.
  • the same SIM card interface 395 can insert multiple cards at the same time. The types of the multiple cards can be the same or different.
  • the SIM card interface 395 can also be compatible with different types of SIM cards.
  • the SIM card interface 395 can also be compatible with external memory cards.
  • the mobile phone 300 interacts with the network through the SIM card to implement functions such as call and data communication.
  • the mobile phone 300 uses an eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the mobile phone 300 and cannot be separated from the mobile phone 300.
  • The mobile phone 300 can perform some or all of the steps in the embodiments of the present application. These steps or operations are only examples, and the embodiments of the present application may also perform other operations or variations of various operations. In addition, each step may be executed in an order different from that presented in the embodiments of the present application, and it may not be necessary to perform all of the operations in the embodiments of the present application.
  • the scene recognition method in the embodiment of the present application can be implemented through S401-S403:
  • S401: The mobile phone 300 recognizes a first image and determines a scene recognition result of the first image.
  • the first image can be understood as the image to be identified above.
  • the first image may be a picture taken by the user through a local camera.
  • the user directly takes photos through the mobile phone camera, or the user calls the photos taken by the mobile phone camera through a certain application (Application, APP) installed in the mobile phone.
  • the first image may also be a picture obtained by the user from other devices.
  • a user receives a picture of falling snow from friends through WeChat, and a picture of falling cherry blossoms downloaded from the Internet.
  • the first image may also be an image from another source. For example, a certain frame of the video recorded by the user.
  • a neural network processing unit NPU chip may be integrated in the mobile phone 300.
  • the convolutional neural network can be integrated in the NPU chip.
  • the mobile phone 300 recognizing the first image and determining the scene recognition result of the first image may include: the mobile phone 300 inputs the first image into the convolutional neural network, and the convolutional neural network determines the scene recognition result of the first image.
  • The convolutional neural network can be pre-trained before the mobile phone 300 leaves the factory and solidified in the mobile phone 300. It is also possible to use the photos taken by the mobile phone 300 within a preset time period, or the pictures it has received and downloaded, as the training set to perform personalized training on the convolutional neural network, so as to improve the accuracy of the convolutional neural network in scene recognition. For example, if the user often takes photos of mountains and plants, then because the mobile phone 300 continuously trains the network with the photos taken by the user, the mobile phone 300 achieves high accuracy in scene recognition results for mountains and plants.
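  • A minimal sketch of this update-and-retrain step, assuming a PyTorch classifier; the function name, hyperparameters, and tensor shapes are illustrative assumptions, not a procedure specified by the application:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def update_and_retrain(model, images, labels, new_image, new_label, epochs=3):
    """Append a newly labeled image to the training set and fine-tune the CNN."""
    images = torch.cat([images, new_image.unsqueeze(0)])   # update the training set
    labels = torch.cat([labels, torch.tensor([new_label])])
    loader = DataLoader(TensorDataset(images, labels), batch_size=16, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):                                # retrain on the updated set
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
    return model, images, labels
```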
  • The scene recognition result may include at least one scene category. If the scene recognition result includes N scene categories, the N scene categories can be sorted in descending order of the degree of matching between each scene category and the first image, where N is an integer greater than 1.
  • The matching degree of each scene category with the first image may refer to the rate at which the features corresponding to that scene in the training set of the convolutional neural network are successfully matched with the features in the first image.
  • the N scene categories may also be ranked according to other factors, and the embodiment of the present application does not limit specific ranking rules, methods, etc.
  • S402: The mobile phone 300 acquires the shooting information captured when the first image is collected.
  • the shooting information is used to identify environmental information when the first image is collected.
  • the shooting information includes but is not limited to one or more of the following information: time information, location information, weather information, and temperature information.
  • the shooting information is collected by the mobile phone 300.
  • The mobile phone 300 can obtain current time information through synchronization with the network; obtain current location information through GPS; obtain current weather information through synchronization with the network; obtain current humidity information through a humidity sensor; obtain current temperature information through the temperature sensor 380D or from the network; determine current motion posture information through the gyro sensor 380A or the acceleration sensor 380C; obtain current altitude information through the pressure sensor 380B; and obtain current ambient light brightness information through the ambient light sensor 380F.
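  • The sources just listed can be grouped into a single record; the dataclass below is an illustrative assumption of how such shooting information might be carried around, not a structure defined by the application:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ShootingInfo:
    time: Optional[str] = None              # synchronized from the network
    location: Optional[str] = None          # from GPS
    weather: Optional[str] = None           # synchronized from the network
    humidity: Optional[float] = None        # from the humidity sensor
    temperature_c: Optional[float] = None   # from the temperature sensor or the network
    posture: Optional[str] = None           # from the gyro or acceleration sensor
    altitude_m: Optional[float] = None      # derived from the pressure sensor
    ambient_lux: Optional[float] = None     # from the ambient light sensor
```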
  • In other embodiments, for example, the user identifies the family (such as Chimonanthaceae) and name (such as winter plum) of a plant in a picture through an APP; the shooting information is collected by the mobile phone 300, and when the picture is input to the APP, the shooting information collected by the mobile phone 300 is provided to the APP at the same time.
  • The manner in which the mobile phone 300 collects the shooting information may refer to, but is not limited to, the manners listed above.
  • Alternatively, the shooting information is collected by the shooting device when the group photo is taken, and is recorded together with the group photo information.
  • the mobile phone 300 may also determine the shooting information through other methods, which is not limited in the embodiment of the present application.
  • S403: The mobile phone 300 determines the label of the first image from the scene recognition result according to the shooting information.
  • The label of the first image is used to indicate the scene category of the first image.
  • the scene category includes, but is not limited to, one or more of the following: image background information in the first image, season information corresponding to the first image, weather information corresponding to the first image, and subject information of the first image.
  • For example, in Example 1, the convolutional neural network in the mobile phone 300 performs scene recognition on the preview image, sorts the scene categories according to the scene recognition result, and determines the label of the preview image in combination with the shooting information obtained by the mobile phone 300.
  • For example, in Example 3, the shopping APP in the mobile phone 300 can call the convolutional neural network in the mobile phone 300, sort the scene categories in the scene recognition result, and determine, in combination with the shooting information, that the label of the group photo is a snow scene.
  • S403 can be implemented through the following process:
  • the mobile phone 300 determines the degree of matching between the shooting information and each scene category in the scene recognition result.
  • the mobile phone 300 sorts the matching degree of each scene category in the scene recognition result, and determines the label of the first image in combination with the matching degree of the shooting information and each scene category.
  • For example, the scene recognition result of the mobile phone 300 includes two scene categories, "sea world" and "undersea", and the degree of matching of the two scene categories with the first image is "sea world" > "undersea".
  • The shooting information shows that the current pressure is 5×10⁶ pascals (Pa) and the location is the Bohai Sea. Although the scene category "sea world" has a higher degree of matching with the first image, the degree of matching between the shooting information and "undersea" is greater than that between the shooting information and "sea world". Therefore, the mobile phone 300 determines, based on the shooting information, that the scene category is not "sea world" but "undersea"; that is, the label of the first image is "undersea".
  • For another example, the scene recognition result of the mobile phone 300 includes two scene categories, "winter jasmine" and "wintersweet (La Mei)", and the degree of matching of the two scene categories with the first image is "winter jasmine" > "wintersweet".
  • The shooting information shows that the current date is December 25, 2018, and the temperature is -5°C.
  • The degree of matching between the shooting information and "winter jasmine" is 5%, and the degree of matching between the shooting information and "wintersweet" is 95%. Therefore, although the scene category "winter jasmine" matches the first image to a higher degree, the mobile phone 300 determines, in combination with the shooting information, that the scene category is not "winter jasmine" but "wintersweet"; that is, the label of the first image is "wintersweet".
  • For another example, the scene recognition result of the mobile phone 300 includes two scene categories, "falling snow" and "falling cherry blossoms", and the degree of matching of the two scene categories with the first image is "falling snow" > "falling cherry blossoms".
  • the shooting information shows that the current location is in a park in Shanghai, the time is 10:00 am on April 15, 2019, the temperature is 23°C, and the weather is sunny.
  • the matching degree between the shooting information and "Floating Snow” is 20%, and the matching degree between the shooting information and "Sakura Falling" is 90%.
  • the mobile phone 300 determines that the scene category is not “Floating Snow” but "Sakura Falling Down” in combination with the shooting information. That is, the label of the first image is "Sakura Falling Down".
  • the method may further include:
  • The mobile phone 300 adjusts the shooting parameters of the camera so that the shooting parameters match the label of the first image.
  • the shooting parameters of the aforementioned camera include, but are not limited to, exposure, sensitivity, aperture, white balance, focal length, light metering mode, flash, etc.
  • After the mobile phone 300 recognizes the scene of the preview image, it can automatically adjust the shooting parameters according to the scene recognition result of the preview image, without manual adjustment, which improves the efficiency of adjusting the shooting parameters.
  • The shooting parameters automatically adjusted by the mobile phone 300 are generally better than those manually adjusted by non-professional users, are more suitable for the current preview image, and can produce higher-quality photos or videos.
  • the correspondence between different tags and different shooting parameters may be established in advance. After the label of the first image is determined, the corresponding shooting parameter can be searched for according to the determined label from the pre-established correspondence between different labels and different shooting parameters.
  • after shooting is completed with the adjusted shooting parameters, the shooting parameters may be restored to the initial parameters or to the default parameters, which makes it convenient to re-identify the scene during the next photo preview.
  • the initial or default parameters may be the shooting parameters corresponding to the scene the mobile phone shoots most frequently. For example, if the user most often photographs mountain-and-river landscapes, the mobile phone can set the shooting parameters corresponding to "mountain-and-river landscape" as the initial or default parameters. In this way, frequent adjustment of the shooting parameters can be avoided.
  • adjusting the shooting parameters may include: comparing the shooting parameters found for the determined label with the initial parameters (or default parameters); if the two are the same, no adjustment is required; if the two are different, the shooting parameters are adjusted from the initial parameters (or default parameters) to the shooting parameters corresponding to the label, as in the sketch below.
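A minimal sketch of this lookup-and-compare step, assuming a hypothetical label-to-parameter table; the labels, parameter names, and values below are invented for illustration and do not come from the patent:

```python
# Hypothetical pre-established correspondence between labels and shooting parameters.
PARAMS_BY_LABEL = {
    "falling cherry blossoms": {"iso": 100, "white_balance": "daylight", "flash": False},
    "seabed":                  {"iso": 400, "white_balance": "underwater", "flash": True},
}
# Hypothetical default parameters, e.g. those of the user's most-shot scene.
DEFAULT_PARAMS = {"iso": 200, "white_balance": "auto", "flash": False}

def adjust_shooting_parameters(label, current):
    """Look up the parameters for the label; adjust only if they differ."""
    target = PARAMS_BY_LABEL.get(label, DEFAULT_PARAMS)
    return current if target == current else dict(target)

params = adjust_shooting_parameters("seabed", DEFAULT_PARAMS)
print(params)  # {'iso': 400, 'white_balance': 'underwater', 'flash': True}
```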
  • the mobile phone 300 detects the user's shooting instruction.
  • for example, the mobile phone 300 detects that the user clicks the camera icon on the touch screen, or detects another preset action, such as pressing the volume key, where the preset action indicates "photograph" or "video".
  • in response to the user's shooting instruction, the mobile phone 300 shoots the preview picture using the adjusted shooting parameters.
  • the Cambricon instruction set is integrated into the NPU chip of the mobile phone 300.
  • the NPU chip uses the Cambricon instruction set to accelerate the process of determining the label of the first image by the convolutional neural network.
  • Cambricon's design principles are:
  • a reduced instruction set computer (Reduced Instruction Set Computer, RISC) based on the load-store access mode is adopted.
  • the selection of specific instructions is obtained by abstracting, at the computational level, the types of workloads involved. For a deep neural network (DNN), the main calculation and control tasks are vector calculation, matrix calculation, scalar calculation, and branch jumps.
  • no complex cache hierarchy or associated control logic is introduced: for AI workloads, data locality is weak and the cache affects performance far less than in conventional computing tasks, so trimming the cache-hierarchy control logic greatly improves the chip's performance per watt.
  • scratchpad memory, rather than a register file, is used as the main storage for computation data, because the data manipulated by AI instructions is often of variable length, which a fixed-width SIMD register file supports less flexibly.
  • the instruction set can be divided into four categories, namely calculation, logic, control and data access instructions.
  • Computational instructions mainly provide instruction set support for the common computational logic of neural networks, for example matrix-matrix multiplication, matrix-vector multiplication, vector-vector multiplication, and so on.
  • a feature of this type of instruction is that the length of the data manipulated by the instruction is variable, so as to flexibly support different sizes of matrices and vectors.
  • Logic instructions mainly perform logical judgment operations on vector or matrix data.
  • for example, the conditional merge instruction used to support max-pooling can conditionally assign values across multiple groups of feature maps to complete the max-pooling operation; a sketch of this operation follows below.
  • the control and data access instructions are relatively simple: they provide branch jumps and the loading and writing of data.
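To illustrate the computation that such a conditional merge instruction accelerates, here is a minimal NumPy sketch of 2×2 max-pooling written as repeated conditional assignment over shifted views of a feature map; the instruction-level behavior on the real hardware will differ:

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max-pooling with stride 2, expressed as conditional merges:
    start from one corner of each window, then conditionally overwrite
    with larger values from the other three corners."""
    h, w = fmap.shape
    out = fmap[0:h:2, 0:w:2].copy()
    for dy, dx in [(0, 1), (1, 0), (1, 1)]:
        view = fmap[dy:h:2, dx:w:2]
        mask = view > out          # the conditional part of the merge
        out[mask] = view[mask]     # the assignment part of the merge
    return out

x = np.array([[1, 6, 2, 3],
              [5, 4, 0, 7],
              [8, 1, 9, 2],
              [3, 2, 4, 5]])
print(max_pool_2x2(x))  # [[6 7]
                        #  [8 9]]
```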
  • FIG. 6 shows an accelerator architecture based on the Cambricon instruction set provided by this embodiment of the application.
  • in Figure 6, after decoding, scalar function unit (Scalar Func. Unit), vector function unit (Vector Func. Unit), and matrix function unit (Matrix Func. Unit) instructions are first placed in the issue queue (Issue Queue) to wait.
  • after the operation type is obtained from the scalar register file (Scalar Register File), the instructions are sent to different modules for processing. Control instructions and scalar calculations are sent directly to the scalar function unit for processing. Data transmission instructions need to access the L1 cache, and vector- and matrix-related instructions are eventually sent to the vector function unit and the matrix function unit, respectively.
  • These two units are specially designed for accelerating the operation of vectors and matrices.
  • the vector and matrix operation instructions in Figure 6 use the on-chip Scratchpad Memory.
  • Traditional processors perform their calculations on fixed-length data held in registers, but in neural networks the data is often of variable length, so using registers is impractical.
  • moreover, the traditional architecture has too few registers to suit vector and matrix calculations. This is the purpose of using the high-speed scratchpad memory.
  • the high-speed scratchpad memory replaces traditional registers, and the vector function unit and the matrix function unit can perform their calculations directly on data in the scratchpad.
  • in the Cambricon design, the vector scratchpad is 64 KB and the matrix scratchpad is 768 KB.
  • in addition, to accelerate access to the scratchpad, three direct memory access (DMA) channels are designed for the vector function unit and the matrix function unit, and a further direct memory access input/output is also designed.
  • Cambricon also designed a mechanism to divide the high-speed scratchpad into multiple different banks to allow simultaneous support of multiple input/output interfaces.
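The banking idea can be pictured with a toy model; the bank count and the interleaved address-to-bank mapping below are assumptions for illustration, not the actual Cambricon parameters:

```python
# Toy model of a banked scratchpad: accesses whose addresses fall in different
# banks can be served by different input/output interfaces simultaneously.
N_BANKS = 4
WORD_BYTES = 4

def bank_of(addr):
    return (addr // WORD_BYTES) % N_BANKS

def can_serve_in_parallel(addrs):
    banks = [bank_of(a) for a in addrs]
    return len(set(banks)) == len(banks)  # no two accesses hit the same bank

print(can_serve_in_parallel([0, 4, 8, 12]))  # True: four distinct banks
print(can_serve_in_parallel([0, 16]))        # False: both map to bank 0
```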
  • the scene recognition method in the embodiment of the present application further includes:
  • the mobile phone 300 adds the first image and the label of the first image into the training set of the convolutional neural network.
  • the mobile phone 300 retrains the convolutional neural network according to the updated training set.
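A minimal sketch of these two steps, written against PyTorch as an assumed framework; the model, the in-memory training set, and the hyperparameters are placeholders, not details given by the patent:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def update_and_retrain(model, images, labels, new_image, new_label, epochs=1, lr=1e-4):
    """Append the newly labeled first image to the training set, then fine-tune."""
    # Step 1: update the training set with the first image and its label.
    images = torch.cat([images, new_image.unsqueeze(0)])
    labels = torch.cat([labels, torch.tensor([new_label])])
    # Step 2: retrain (here: briefly fine-tune) the convolutional neural network.
    loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            criterion(model(x), y).backward()
            optimizer.step()
    return model, images, labels
```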
  • it can be understood that, to implement the functions of any of the above embodiments, the mobile phone 300 includes hardware structures and/or software modules corresponding to each function.
  • in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled professionals may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
  • the embodiment of the present application may divide the mobile phone 300 into functional modules.
  • each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software functional modules. It should be noted that the division of modules in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
  • FIG. 9 is a schematic structural diagram of a mobile phone provided in an embodiment of this application.
  • the mobile phone 300 may include a scene recognition unit 910 and an information acquisition unit 920.
  • the scene recognition unit 910 is used to recognize the first image and determine the scene recognition result of the first image.
  • the scene recognition result includes at least one scene category.
  • the information acquiring unit 920 is configured to acquire shooting information when the first image is collected, and the shooting information includes at least one or more of time information, location information, weather information, and temperature information.
  • the scene recognition unit 910 is further configured to determine the label of the first image from the scene recognition result according to the shooting information.
  • the label of the first image is used to indicate the scene category of the first image.
  • in one possible structure, the mobile phone 300 may further include a parameter adjustment unit 930, which is used to adjust the shooting parameters of the camera after the scene recognition unit determines the label of the first image from the scene recognition result according to the shooting information, so that the shooting parameters match the label of the first image.
  • the aforementioned mobile phone 300 may also include a radio frequency circuit.
  • the mobile phone 300 can receive and send wireless signals through a radio frequency circuit.
  • the radio frequency circuit includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like.
  • the radio frequency circuit can also communicate with other devices through wireless communication.
  • the wireless communication can use any communication standard or protocol, including but not limited to global system for mobile communications, general packet radio service, code division multiple access, broadband code division multiple access, long-term evolution, email, short message service, etc.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
  • the steps of the method or algorithm described in the embodiments of the present application may be implemented in a hardware manner, or may be implemented in a manner in which a processor executes software instructions.
  • Software instructions can be composed of corresponding software modules, which can be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor, so that the processor can read information from the storage medium and can write information to the storage medium.
  • the storage medium may also be an integral part of the processor.
  • the processor and the storage medium may be located in the ASIC.
  • the ASIC may be located in the detection device.
  • the processor and the storage medium may also exist as separate components in the detection device.
  • the disclosed user equipment and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the modules or units is only a logical function division.
  • there may be other division methods in actual implementation; for example, multiple units or components may be combined or integrated into another device, or some features may be omitted or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate parts may or may not be physically separate.
  • the parts displayed as units may be one physical unit or multiple physical units, that is, they may be located in one place or distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a readable storage medium.
  • the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product stored in a storage medium, including several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or some of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

This application discloses a scene recognition method, a scene recognition apparatus, and an electronic device, relating to the field of information processing technology, and can solve the problem that existing scene recognition methods have low scene recognition accuracy. By combining a scene recognition algorithm with the time information, location information, weather information, temperature information, and the like of an image to perform image scene recognition, this application can avoid the misrecognition caused by relying solely on an algorithm for scene recognition, thereby improving the accuracy of image recognition.

Description

一种场景识别方法、一种场景识别装置及一种电子设备
本申请要求在2019年5月28日提交中国国家知识产权局、申请号为201910452148.1的中国专利申请的优先权,发明名称为“一种场景识别方法、一种场景识别装置及一种电子设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及信息处理技术领域,尤其涉及一种场景识别方法、一种场景识别装置及一种电子设备。
背景技术
目前人工智能(Artificial Intelligence,AI)的使用越来越广泛。其中一种使用场景是通过模拟人脑进行分析学习的神经网络,进行智慧识物的场景。例如,根据植物图片识别该植物品种或类别;或者识别拍照时预览界面上显示的预览画面的场景(如雪景)。
现有的神经网络在进行场景识别时,是基于训练后的神经网络进行智能识别的。例如,神经网络算法经过大量的花卉图片训练后,在预览实际花卉的时候可以根据训练后的神经网络算法进行分类,进而识别出结果。但这种方法仅依靠算法进行,因此识别结果可靠性较低。例如,由于樱花飘落的景象与飘雪的景象极为接近,神经网络会将樱花飘落场景误识别为飘雪场景。
发明内容
本申请实施例提供一种场景识别方法、一种场景识别装置及一种电子设备,可以提高图像识别的准确度。
为达到上述目的,本申请实施例采用如下技术方案:
第一方面,提供一种场景识别方法,该方法包括:识别第一图像,确定该第一图像的场景识别结果,该场景识别结果中包括至少一个场景类别;获取采集该第一图像时的拍摄信息,该拍摄信息至少包括:时间信息、位置信息、天气信息和温度信息中的一个或多个;根据拍摄信息,从场景识别结果中确定该第一图像的标签,该第一图像的标签用于指示该第一图像的场景类别。
上述第一方面提供的技术方案,通过结合图像的时间信息、位置信息、天气信息和温度信息等进行图像场景识别,可以避免单纯依靠算法进行场景识别导致的误识别问题,从而提高图像识别的准确度。
在一种可能的实现方式中,上述至少一个场景类别,至少包括:图像中的图像背景信息、图像对应的季节信息、图像对应的天气信息和图像的拍摄对象信息中的一个或多个。本申请通过结合图像的拍摄信息,对图像的拍摄背景、季节、天气或拍摄对象等进行分类,从而提高图像识别的准确度。
在一种可能的实现方式中,上述至少一个场景类别按照第一图像与每个场景类别的匹配程度由大到小排序。通过这样的处理,以便可以结合图像的拍摄信息,将与该图像的拍摄信息匹配度最高的场景类别作为第一图像的场景类别。
在一种可能的实现方式中,根据拍摄信息,从场景识别结果中确定第一图像的标签,包括:根据拍摄信息与至少一个场景类别的匹配程度,以及至少一个场景类别由大到小的排序,确定第一图像的标签。根据图像的拍摄信息与每一个场景类别的匹配程度确定第一图像的场 景类别,可以避免单纯依靠算法进行场景识别导致的误识别问题,从而提高图像识别的准确度。
在一种可能的实现方式中,本申请的场景识别方法可以应用于包括神经网络处理单元NPU芯片的电子设备;所述识别第一图像,确定第一图像的场景识别结果,包括:通过NPU芯片识别第一图像,确定第一图像的场景识别结果。本申请的场景识别方法可以通过NPU芯片来实现。
在一种可能的实现方式中,NPU芯片中集成有寒武纪Cambricon指令集;该NPU芯片使用寒武纪Cambricon指令集加速确定第一图像的场景识别结果的过程。通过使用Cambricon指令集可以提高场景识别的速度,提高用户体验。
在一种可能的实现方式中,第一图像是电子设备的摄像头采集的预览图像。本申请的场景识别方法可以是针对摄像头采集的预览图像进行的。
在一种可能的实现方式中,第一图像是已存储的图片;或者,第一图像是从其他设备获取的图片。本申请的场景识别方法也可以是针对已有的图片进行的,包括使用电子设备拍摄的,以及从第三方获取的。
在一种可能的实现方式中,在根据拍摄信息,从场景识别结果中确定第一图像的标签之后,该方法还包括:调整摄像头的拍摄参数,使得拍摄参数与第一图像的标签匹配。通过提高预览图像场景识别结果的准确度,以便可以调整合适的拍摄参数拍摄该预览图像,获得更好的拍摄效果,提高用户体验度。
在一种可能的实现方式中,上述NPU芯片中集成有卷积神经网络,该方法还包括:将第一图像及第一图像的标签更新进卷积神经网络的训练集;根据更新后的训练集重新训练卷积神经网络。通过使用每一个图像及该图像对应的标签重新训练卷积神经网络,可以不断完善卷积神经网络的算法,提高卷积神经网络进行场景识别的准确度。
第二方面,提供一种场景识别装置,该场景识别装置包括:场景识别单元,用于识别第一图像,确定该第一图像的场景识别结果,该场景识别结果中包括至少一个场景类别;信息获取单元,用于获取采集该第一图像时的拍摄信息,该拍摄信息至少包括:时间信息、位置信息、天气信息和温度信息中的一个或多个;该场景识别单元还用于,根据拍摄信息,从场景识别结果中确定第一图像的标签,该第一图像的标签用于指示第一图像的场景类别。
上述第二方面提供的技术方案,通过结合图像的时间信息、位置信息、天气信息和温度信息等进行图像场景识别,可以避免单纯依靠算法进行场景识别导致的误识别问题,从而提高图像识别的准确度。
在一种可能的实现方式中,上述至少一个场景类别,至少包括:图像中的图像背景信息、图像对应的季节信息、图像对应的天气信息和图像的拍摄对象信息中的一个或多个。本申请通过结合图像的拍摄信息,对图像的拍摄背景、季节、天气或拍摄对象等进行分类,从而提高图像识别的准确度。
在一种可能的实现方式中,上述至少一个场景类别按照第一图像与每个场景类别的匹配程度由大到小排序。通过这样的处理,以便可以结合图像的拍摄信息,将与该图像的拍摄信息匹配度最高的场景类别作为第一图像的场景类别。
在一种可能的实现方式中,场景识别单元根据拍摄信息,从场景识别结果中确定所第一图像的标签,包括:该场景识别单元根据拍摄信息与至少一个场景类别的匹配程度,以及至少一个场景类别由大到小的排序,确定第一图像的标签。根据图像的拍摄信息与每一个场景类别的匹配程度确定第一图像的场景类别,可以避免单纯依靠算法进行场景识别导致的误识 别问题,从而提高图像识别的准确度。
在一种可能的实现方式中,场景识别单元包括神经网络处理单元NPU芯片;上述场景识别单元识别第一图像,确定第一图像的场景识别结果,包括:该场景识别单元通过NPU芯片识别第一图像,确定第一图像的场景识别结果。本申请的场景识别方法可以通过NPU芯片来实现。
在一种可能的实现方式中,NPU芯片中集成有寒武纪Cambricon指令集;该NPU芯片使用寒武纪Cambricon指令集加速确定第一图像的场景识别结果的过程。通过使用Cambricon指令集可以提高场景识别的速度,提高用户体验。
在一种可能的实现方式中,场景识别装置还包括:摄像头,第一图像是摄像头采集的预览图像。本申请的场景识别方法可以是针对场景识别装置摄像头采集的预览图像进行的。
在一种可能的实现方式中,第一图像是已存储的图片;或者,第一图像是从其他设备获取的图片。本申请的场景识别方法也可以是针对已有的图片进行的,包括使用电子设备拍摄的,以及从第三方获取的。
在一种可能的实现方式中,该装置还包括:参数调整单元,用于在场景识别单元根据拍摄信息,从场景识别结果中确定第一图像的标签之后,调整摄像头的拍摄参数,使得拍摄参数与第一图像的标签匹配。通过提高预览图像场景识别结果的准确度,以便可以调整合适的拍摄参数拍摄该预览图像,获得更好的拍摄效果,提高用户体验度。
在一种可能的实现方式中,上述NPU芯片中集成有卷积神经网络,上述场景识别单元还用于:将第一图像及第一图像的标签更新进卷积神经网络的训练集;根据更新后的训练集重新训练卷积神经网络。通过使用每一个图像及该图像对应的标签重新训练卷积神经网络,可以不断完善卷积神经网络的算法,提高卷积神经网络进行场景识别的准确度。
第三方面,提供一种用户设备UE,该UE包括:场景识别装置,用于实现如第一方面任一种可能的实现方式中的所述场景识别方法。
第四方面,提供一种用户设备UE,该UE包括:存储器,用于存储计算机程序代码,所述计算机程序代码包括指令;射频电路,用于进行无线信号的发送和接收;处理器,用于执行所述指令实现如第一方面任一种可能的实现方式中的场景识别方法。
第五方面,提供一种计算机可读存储介质,该计算机可读存储介质上存储有计算机执行指令,该计算机执行指令被处理器执行时实现如第一方面任一种可能的实现方式中的场景识别方法。
第六方面,提供一种芯片系统,该芯片系统包括处理器、存储器,存储器中存储有指令;所述指令被所述处理器执行时,实现如第一方面任一种可能的实现方式中的场景识别方法。该芯片系统可以由芯片构成,也可以包含芯片和其他分立器件。
附图说明
图1为本申请实施例提供的一种卷积神经网络的工作过程示意图;
图2为本申请实施例提供的一种池化方法示意图;
图3为本申请实施例提供的一种手机硬件结构示意图;
图4为本申请实施例提供的一种场景识别方法流程图一;
图5为本申请实施例提供的一种场景识别方法流程图二;
图6为本申请实施例提供的一种基于Cambricon指令集的加速器架构;
图7为本申请实施例提供的一种场景识别方法流程图三;
图8为本申请实施例提供的一种场景识别方法流程图四;
图9为本申请实施例提供的一种手机的结构示意图。
具体实施方式
本申请实施例提供一种场景识别方法、一种场景识别装置及一种电子设备。具体的,例如该方法可以用于通过卷积神经算法对待识别图像进行场景识别的过程中。
其中,待识别图像可以是指已拍摄的图片、摄像头的预览、从其他地方获取的图片或者视频中的某一帧图像等。本申请实施例对待识别图像的来源、格式、以及获取方式等不作限定。本申请实施例中的场景识别结果可以标识图像背景信息(如夜景、雪地、沙滩),可以标识图像对应的季节信息(如秋季),也可以标识图像对应的天气信息(如雨天、阴天),还可以标识图像的拍摄对象信息(如樱花飘落、婴儿、飘雪)等。以上几种场景识别结果仅作为几种示例,本申请实施例对具体的场景识别结果中的具体场景类别不作限定。
请参考以下示例,以下几种示例为本申请实施例中的场景识别方法几种可能的应用示例。
示例1:用户使用用户设备(User Equipment,UE)的摄像头拍照,在预览界面时,UE对预览图像进行场景识别。基于场景识别结果调整摄像头的拍摄参数。这样,在用户点击拍照按钮,拍摄该预览图像时,基于调整后的拍摄参数会拍摄出与该场景风格、色彩等最为匹配的图片,拍摄效果更好,用户体验度更好。
其中,预览界面是指UE启用相机预览当前要拍摄画面的界面。UE在启动相机之后,UE的显示屏上会显示相机的当前预览界面,以便用户确定当前画面是否为用户要拍摄的画面。
示例2:用户将已有的图片上传至某一网站,由该网站识别该图片的场景类别。具体的,由该网站的网站服务器识别该图片的场景类别。或者用户通过UE中安装的APP识别该图片的场景类别。例如,用户希望知道其拍摄的图片中的植物的科系(如蔷薇科)以及名称(如月季)。用户可以将该图片上传至某一网站,由该网站的网站服务器识别该图片中植物的科系以及名称。
示例3:用户希望在某一购物APP中找到甲某身穿的服饰。用户可以上传甲某身着该服饰的照片至该购物APP。由该APP的应用服务器完成该服饰的识别,以及从该购物APP中匹配出同款服饰,推荐给用户。
示例4:用户希望将其拍摄的图片1渲染为更符合其场景的图片2。例如,用户自拍了一张站在雪景中的照片,希望在该背景中渲染出更加梦幻的雪景。用户可以通过图片处理类APP,识别该图片1的场景。以及由该APP在获取场景识别结果后,根据该场景识别结果渲染图片1,获得更好的雪景效果。
需要说明的是,上述示例1-示例4仅作为几种示例介绍本申请实施例中的场景识别方法可能的几种应用。本申请实施例中的场景识别方法还可以应用于其他可能的情况中,本申请实施例对此不做限定。
另外,本申请实施例中的电子设备可以为智能手机、平板电脑、智能相机,还可以为其他桌面型、膝上型、手持型设备,例如上网本、个人数字助理(Personal Digital Assistant,PDA)、可穿戴设备(例如智能手表)、AR(增强现实)/VR(虚拟现实)设备等,也可以为服务器类设备(如示例2和示例3),或者其他设备。本申请实施例对电子设备的类型不作限定。
在一种可能的实现方式中,电子设备中可以集成有卷积神经网络。例如,电子设备的卷积神经网络集成在神经网络处理单元(neural-network processing unit,NPU)芯片中,通过NPU完成本申请实施例的场景识别方法。或者,电子设备的卷积神经网络集成在场景识别装置中,通过场景识别装置完成本申请实施例的场景识别方法。
其中,卷积神经网络是一种前馈神经网络,人工神经元可以响应周围单元,可以进行大 型图像处理。卷积神经网络包括一个或多个卷积层和顶端的全连通层(对应经典的神经网络),同时也包括关联权重和池化层(pooling layer)。这一结构使得卷积神经网络能够利用输入数据的二维结构。与其他深度学习结构相比,卷积神经网络在图像和语音识别方面能够给出更好的结果。这一模型也可以使用反向传播算法进行训练。相比较其他深度、前馈神经网络,卷积神经网络需要考量的参数更少,它是一种颇具吸引力的深度学习结构。
如图1所示,为一种卷积神经网络的工作过程示意图。如图1所示,卷积层120对数据输入层(Input layer)110获取且经过预处理(例如,预处理包括去均值、归一化和主成分分析(principal component analysis,PCA)/白化(whitening))的图像数据进行特征提取。激活函数层130对把卷积层120输出的结果做非线性映射。例如,激活函数层130采用激励函数修正线性单元(The Rectified Linear Unit,ReLU)将卷积层120输出的结果压缩到某一个固定的范围,这样可以一直保持一层一层下去的数值范围是可控的。其中,ReLU的特点是收敛快,求梯度简单。然后,池化层140对特征进行采样,即用一个数值替代一块区域,主要是为了降低网络训练参数及模型的过拟合程度。最后,全连接层150把前边提取到的特征综合起来。由于全连接层150的每一个结点都与上一层的所有结点相连,因此,其具有全相连的特性,也就是跟传统的神经网络神经元的连接方式是一样的。
其中,池化层140用的方法有Max pooling和average pooling。其中,Max pooling是指对于每个2×2的窗口选出最大的数作为输出矩阵的相应元素的值。如图2所示,输入矩阵第一个2×2窗口中最大的数是6,那么输出矩阵的第一个元素就是6,如此类推。
本申请实施例中的场景识别方法的基本原理是:场景识别设备(包括UE、服务器类设备或场景识别装置)基于卷积神经网络获取的待识别图像的场景识别结果,结合时间、位置、天气、湿度、温度等信息,最终确定待识别图像的标签。
请参考图3,如图3所示,为本申请实施例提供的一种手机的硬件结构示意图。如图3所示,手机300可以包括处理器310,存储器(包括外部存储器接口320和内部存储器321),通用串行总线(universal serial bus,USB)接口330,充电管理模块340,电源管理模块341,电池342,天线1,天线2,移动通信模块350,无线通信模块360,音频模块370,扬声器370A,麦克风370C,传感器模块380,按键390,指示器392,摄像头393,显示屏394,以及用户标识模块(subscriber identification module,SIM)卡接口395等。其中传感器模块380可以包括陀螺仪传感器380A,压力传感器380B,加速度传感器380C,温度传感器380D,触摸传感器380E,环境光传感器380F等。
可以理解的是,本发明实施例示意的结构并不构成对手机300的具体限定。在本申请另一些实施例中,手机300可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
处理器310可以包括一个或多个处理单元,例如:处理器310可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器NPU芯片等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器310中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器310中的存储器为高速缓冲存储器。该存储器可以保存处理器310刚用过或循环使用的指令或数 据。如果处理器310需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器310的等待时间,因而提高了***的效率。
在一些实施例中,处理器310可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。
充电管理模块340用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。在一些有线充电的实施例中,充电管理模块340可以通过USB接口330接收有线充电器的充电输入。在一些无线充电的实施例中,充电管理模块340可以通过手机300的无线充电线圈接收无线充电输入。充电管理模块340为电池342充电的同时,还可以通过电源管理模块341为电子设备供电。
电源管理模块341用于连接电池342,充电管理模块340与处理器310。电源管理模块341接收电池342和/或充电管理模块340的输入,为处理器310,内部存储器321,显示屏394,摄像头393,和无线通信模块360等供电。电源管理模块341还可以用于监测电池容量,电池循环次数,电池健康状态(漏电,阻抗)等参数。在其他一些实施例中,电源管理模块341也可以设置于处理器310中。在另一些实施例中,电源管理模块341和充电管理模块340也可以设置于同一个器件中。
手机300的无线通信功能可以通过天线1,天线2,移动通信模块350,无线通信模块360,调制解调处理器以及基带处理器等实现。
天线1和天线2用于发射和接收电磁波信号。手机300中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块350可以提供应用在手机300上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块350可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块350可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块350还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块350的至少部分功能模块可以被设置于处理器310中。在一些实施例中,移动通信模块350的至少部分功能模块可以与处理器310的至少部分模块被设置在同一个器件中。
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器370A,受话器370B等)输出声音信号,或通过显示屏394显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器310,与移动通信模块350或其他功能模块设置在同一个器件中。
无线通信模块360可以提供应用在手机300上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星***(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距 离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块360可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块360经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器310。无线通信模块360还可以从处理器310接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
在一些实施例中,手机300的天线1和移动通信模块350耦合,天线2和无线通信模块360耦合,使得手机300可以通过无线通信技术与网络以及其他设备通信。所述无线通信技术可以包括全球移动通讯系统(global system for mobile communications,GSM),通用分组无线服务(general packet radio service,GPRS),码分多址接入(code division multiple access,CDMA),宽带码分多址(wideband code division multiple access,WCDMA),时分码分多址(time-division code division multiple access,TD-SCDMA),长期演进(long term evolution,LTE),BT,GNSS,WLAN,NFC,FM,和/或IR技术等。所述GNSS可以包括全球卫星定位系统(global positioning system,GPS),全球导航卫星系统(global navigation satellite system,GLONASS),北斗卫星导航系统(beidou navigation satellite system,BDS),准天顶卫星系统(quasi-zenith satellite system,QZSS)和/或星基增强系统(satellite based augmentation systems,SBAS)。
手机300通过GPU,显示屏394,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏394和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器310可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。具体到本申请实施例中,在确定场景识别结果后,手机300可以通过GPU将图片渲染为适合该图片标签的效果。
显示屏394用于显示图像,视频等。显示屏394包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode的,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,手机300可以包括1个或N个显示屏394,N为大于1的正整数。
手机300可以通过ISP,摄像头393,视频编解码器,GPU,显示屏394以及应用处理器等实现拍摄功能。
ISP用于处理摄像头393反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头393中。
摄像头393用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,手机300可以包括1个或N个摄像头393,N为大于1的正整数。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当手机300在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。
视频编解码器用于对数字视频压缩或解压缩。手机300可以支持一种或多种视频编解码器。这样,手机300可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。
NPU为神经网络处理单元(Neural-network Processing Unit),通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现手机300的智能认知等应用,例如:图像识别,人脸识别,场景识别,语音识别,文本理解等。具体到本申请实施例中,NPU可以理解为集成有卷积神经网络的单元,或者可以理解为场景识别装置。或者可以理解为场景识别装置可以包括NPU,用于对待识别图像进行场景识别。
外部存储器接口320可以用于连接外部存储卡,例如Micro SD卡,实现扩展手机300的存储能力。外部存储卡通过外部存储器接口320与处理器310通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。
内部存储器321可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。内部存储器321可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储手机300使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器321可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。处理器310通过运行存储在内部存储器321的指令,和/或存储在设置于处理器中的存储器的指令,执行手机300的各种功能应用以及数据处理。
手机300可以通过音频模块370,扬声器370A,受话器370B,麦克风370C,耳机接口370D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
音频模块370用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块370还可以用于对音频信号编码和解码。
扬声器370A,也称“喇叭”,用于将音频电信号转换为声音信号。手机300可以通过扬声器370A收听音乐,或收听免提通话。
受话器370B,也称“听筒”,用于将音频电信号转换成声音信号。当手机300接听电话或语音信息时,可以通过将受话器370B靠近人耳接听语音。
麦克风370C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风370C发声,将声音信号输入到麦克风370C。手机300可以设置至少一个麦克风370C。
耳机接口370D用于连接有线耳机。耳机接口370D可以是USB接口330,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。
陀螺仪传感器380A可以用于确定手机300的运动姿态。在一些实施例中,可以通过陀螺仪传感器380A确定手机300围绕三个轴(即,x,y和z轴)的角速度。陀螺仪传感器380A可以用于拍摄防抖。具体到本申请的实施例中,若手机300通过陀螺仪传感器380A采集到的手机300抖动的角度等数据结合卷积神经网络的场景识别结果确定当前预览图像的场景是跳伞,手机300可以根据手机300抖动的角度计算出镜头模组需要补偿的距离,让镜头通过反向运动抵消手机300的抖动,实现防抖。陀螺仪传感器380A还可以用于导航,体感游戏场景。
压力传感器380B用于测量压力或压强。例如,手机300通过压力传感器380B测得的气压值计算海拔高度,结合手机GPS定位,以及卷积神经网络的场景识别结果确定当前预览图像的场景是玉龙雪山,手机300可以调整拍摄参数,使其更适合当前场景的拍摄。
加速度传感器380C可检测手机300在各个方向上(一般为三轴)加速度的大小。当手机300静止时可检测出重力的大小及方向。还可以用于识别电子设备姿态,应用于横竖屏切换,计步器等应用。示例性的,具体到本申请的实施例中,若手机300结合通过加速度传感器380C采集到的手机300重力的大小及方向等数据、通过压力传感器380B采集到的压力数据、结合卷积神经网络的场景识别结果确定当前预览图像的场景是海底世界,手机300可以调整拍摄参数,使其更适合水下拍摄。
环境光传感器380F用于感知环境光亮度。示例性的,具体到本申请的实施例,手机300可以根据环境光传感器380F采集到的环境光亮度,结合卷积神经网络的场景识别结果确定当前预览画面的场景是黑夜,手机300可以补光进行拍摄,具体的补光量也可以视环境光传感器380F采集到的环境光亮度而定。
温度传感器380D用于检测温度。示例性的,具体到本申请的实施例,手机300可以根据温度传感器380D采集到的温度,结合卷积神经网络的场景识别结果确定图片拍摄的场景是樱花飘落而非飘雪,手机300可以将该图片渲染为冬日雪景的气氛。
触摸传感器380E,也称“触控器件”。触摸传感器380E(也称为触控面板)可以设置于显示屏394,由触摸传感器380E与显示屏394组成触摸屏,也称“触控屏”。触摸传感器380E用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触控事件类型。可以通过显示屏394提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器380E也可以设置于手机300的表面,与显示屏394所处的位置不同。具体到本申请中,手机300可以检测用户在显示屏394的虚拟拍摄按钮的按压操作,以及响应于该操作,对当前预览图像进行拍摄。
按键390包括开机键,音量键等。按键390可以是机械按键。也可以是触摸式按键。手机300可以接收按键输入,产生与手机300的用户设置以及功能控制有关的键信号输入。
马达391可以产生振动提示。马达391可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。作用于显示屏394不同区域的触摸操作,马达391也可对应不同的振动反馈效果。不同的应用场景(例如:时间提醒,接收信息,闹钟,游戏等)也可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。
指示器392可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。
SIM卡接口395用于连接SIM卡。SIM卡可以通过***SIM卡接口395,或从SIM卡接口395拔出,实现和手机300的接触和分离。手机300可以支持1个或N个SIM卡接口,N为大于1的正整数。SIM卡接口395可以支持Nano SIM卡,Micro SIM卡,SIM卡等。同一个SIM卡接口395可以同时***多张卡。所述多张卡的类型可以相同,也可以不同。SIM卡接口395也可以兼容不同类型的SIM卡。SIM卡接口395也可以兼容外部存储卡。手机300通过SIM卡和网络交互,实现通话以及数据通信等功能。在一些实施例中,手机300采用eSIM,即:嵌入式SIM卡。eSIM卡可以嵌在手机300中,不能和手机300分离。
以下结合图3中的手机,具体介绍本申请实施例提供的场景识别方法。以下实施例中的方法均可以在具有上述硬件结构的手机300中实现。
可以理解的,本申请实施例中,手机300可以执行本申请实施例中的部分或全部步骤,这些步骤或操作仅是示例,本申请实施例还可以执行其它操作或者各种操作的变形。此外,各个步骤可以按照本申请实施例呈现的不同的顺序来执行,并且有可能并非要执行本申请实施例中的全部操作。
如图4所示,本申请实施例中的场景识别方法可以通过S401-S403实现:
S401、手机300识别第一图像,确定第一图像的场景识别结果。
其中,第一图像可以理解为上文中的待识别图像。第一图像可以是用户通过本地摄像头拍摄的图片。例如,用户直接通过手机摄像头拍摄的照片,或者用户通过手机中安装的某一应用程序(Application,APP)调用手机摄像头拍摄的照片。该第一图像也可以是用户从其他设备获取到的图片。例如,用户通过微信从朋友接收到的飘雪图片,用户从互联网下载的樱花飘落图片。或者,该第一图像还可以是其他来源的图像。例如,用户录的视频中的某一帧图像。
在一些实施例中,手机300中可以集成有神经网络处理单元NPU芯片。卷积神经网络可以集成在该NPU芯片中。
手机300识别第一图像,确定第一图像的场景识别结果,可以包括:手机300将第一图像输入卷积神经网络,由卷积神经网络确定第一图像的场景识别结果。
其中,卷积神经网络可以在手机300出厂前预先训练好,固化在手机300中。也可以使用手机300在预设时间段内所拍摄的照片,或者接收的、下载的图片作为训练集,对卷积神经网络进行个性化训练,使得该卷积神经网络在进行场景识别时的准确度。例如,由于用户经常拍摄山川植物的照片,由于手机300使用用户拍摄的照片不断训练训练集,因此,手机300对于山川植物的场景识别结果准确度较高。
其中,场景识别结果中可以包括至少一个场景类别。若场景识别结果包括N个场景类别,N个场景类别可以按照每一种场景类别与第一图像的匹配程度由大到小排序。其中,N大于1,N为整数。
其中,每一种场景类别与第一图像的匹配程度,可以指每一种场景在卷积神经网络的训练集中对应的特征与第一图像中特征的匹配成功率。或者,N个场景类别还可以依据其他因素排名,本申请实施例对具体的排名规则、方法等不作限定。
S402、手机300获取采集第一图像时的拍摄信息。
其中,该拍摄信息用于标识采集第一图像时的环境信息。该拍摄信息包括但不限于以下信息中的一个或多个:时间信息、位置信息、天气信息和温度信息。
以下列举几个具体的实例对手机300获取拍摄信息进行具体介绍:
实例(A):手机300在启动摄像头之后,通过卷积神经网络和拍摄信息确定预览图像的标签,进而选择与该标签对应的拍摄参数。
在该实例中,拍摄信息是手机300采集的。例如,手机300可以通过与互联网同步获取当前时间信息;通过GPS获取当前位置信息;通过与互联网同步获取当前天气信息;通过湿度传感器获取当前湿度信息;通过温度传感器180D,或者从网络中获取当前温度信息;通过陀螺仪传感器180A或加速度传感器180C确定当前运动姿态信息;通过压力传感器180B获取当前海拔信息;通过环境光传感器180F获取当前环境光亮度信息等。
实例(B):用户希望知道其拍摄的图片中的植物的科系(如蜡梅科)以及名称(如腊梅)。用户可以通过UE中安装的APP识别该图片中植物的科系以及名称。
在该实例中,拍摄信息是手机300采集的,且在向APP输入该图片时,同时告知该APP 手机300采集到的拍摄信息。其中,手机300采集拍摄信息的方式和方法可以参考但不限于上文中列举的方式和方法。
实例(C):用户通过微信从朋友接收到该用户与朋友的一张合影,用户希望通过图片处理APP对该合影进行背景渲染。在该实例中,拍摄信息由拍摄设备在拍摄该合影时采集,并与该合影照片信息一起记录下来。
需要说明的是,以上仅作为几种举例说明拍摄信息可能的获取方式和途径。手机300还可以通过其他方法确定拍摄信息,本申请实施例对此不作限定。
S403、手机300根据拍摄信息,从场景识别结果中确定第一图像的标签。
其中,第一图像的标签用于第一图像的场景类别。该场景类别包括但不限于以下中的一个或多个:第一图像中的图像背景信息、第一图像对应的季节信息、第一图像对应的天气信息、第一图像的拍摄对象信息。
例如,对于上述实例(A),手机300在启动摄像头之后,手机300中的卷积神经网络对预览图像进行场景识别,并根据场景识别结果中的场景类别排序,结合手机300获取的拍摄信息确定预览图像的标签。又例如,对于上述实例(C),手机300中的购物类APP可以调用手机300中的卷积神经网络根据场景识别结果中的场景类别排序,结合拍摄信息确定该合影的标签为雪景。
在一种可能的实现方式中,S403可以通过以下过程来实现:
手机300确定拍摄信息与场景识别结果中每一个场景类别的匹配程度。手机300根据场景识别结果中每一个场景类别的匹配程度排序,结合拍摄信息与每一个场景类别的匹配程度确定第一图像的标签。
例如,上述实例(A)中。手机300的场景识别结果包括两个场景类别,“海洋世界”和“海底”,且两个场景类别与第一图像的匹配程度“海洋世界”>“海底”。但是拍摄信息显示当前压强为5×10 6帕斯卡(Pa),位置为渤海。因此,虽然场景类别“海洋世界”与第一图像的匹配程度较高,但是拍摄信息与“海底”的匹配程度大于拍摄信息与“海洋世界”的匹配程度。因此,手机300结合拍摄信息确定该场景类别并非“海洋世界”,而是“海底”。即该第一别图像的标签为“海底”。
例如,上述实例(B)中。手机300的场景识别结果包括两个场景类别,“腊梅”和“迎春花”,且两个场景类别与第一图像的匹配程度“迎春花”>“腊梅”。但是拍摄信息显示当前时日期为2018年12月25日,温度为-5℃。拍摄信息与“迎春花”的匹配程度为5%,拍摄信息与“腊梅”的匹配程度为95%。因此,虽然场景类别“迎春花”与第一图像的匹配程度较高,但是手机300结合拍摄信息确定该场景类别并非“迎春花”,而是“腊梅”。即该第一图像的标签为“腊梅”。
又例如,上述实例(C)中。手机300的场景识别结果包括两个场景类别,“飘雪”和“樱花飘落”,且两个场景类别与第一图像的匹配程度“飘雪”>“樱花飘落”。但是拍摄信息显示当前位置为上海某公园内,时间为2019年4月15日上午10:00,温度为23℃,天气为晴。拍摄信息与“飘雪”的匹配程度为20%,拍摄信息与“樱花飘落”的匹配程度为90%。因此,虽然场景类别“飘雪”与第一图像的匹配程度较高,但是手机300结合拍摄信息确定该场景类别并非“飘雪”,而是“樱花飘落”。即该第一图像的标签为“樱花飘落”。
基于本申请实施例提供的场景识别方法,可以避免单纯依靠算法导致的识别结果有偏差的问题,如上述示例,单纯依靠算法会将“海底”误识别为“海洋世界”,将“腊梅”误识别为“迎春花”,将“樱花飘落”误识别为“飘雪”。
对于对预览图像进行场景识别的情景。在一些实施例中,如图5所示,在步骤S403之后 该方法还可以包括:
S404、手机300调整摄像头的拍摄参数,使得拍摄参数与第一图像的标签匹配。
其中,上述摄像头的拍摄参数包括但不限于曝光度、感光度、光圈、白平衡、焦距、测光方式、闪光灯等。手机300在识别出预览图像的场景后,可以根据预览图像的场景识别结果自动调整拍摄参数,而无需手动调整,提高了拍摄参数的调整效率。另外,手机300自动调整的拍摄参数相比于不是非常专业的用户手动调整的拍摄参数,通常是更优的拍摄参数,更加适合当前预览图像,可以拍摄出更加高质量的照片或者视频。
在一种可能的实现方式中,不同的标签与不同的拍摄参数的对应关系可以是预先建立的。在确定第一图像的标签后,可以从预先建立的不同的标签与不同的拍摄参数的对应关系中,根据确定的标签查找对应的拍摄参数。
在一些实施例中,在使用调整后的拍摄参数完成拍摄之后,可以将拍摄参数恢复至初始参数,或者恢复至默认参数。便于在下次拍照预览时,重新识别场景。其中,初始参数或默认参数可以是手机最常拍摄的场景对应的拍摄参数。例如,用户最常拍摄的是山川风景,手机便可以将“山川风景”对应的拍摄参数设为的初始参数或默认参数。通过这样的方式,可以避免经常调整拍摄参数。
在一种可能的实现方式中,调整拍摄参数可以包括:将查找到的确定的标签对应的拍摄参数与初始参数(或默认参数)比较,若两者相同,则无需调整;若两者不同,则将拍摄参数由初始参数(或默认参数)调整为该标签对应的拍摄参数。
S405、手机300检测用户的拍摄指令。
例如:手机300检测到用户点击触摸屏上的拍照图标。或者手机300检测到了其他预设的动作,例如按下音量键,该预设置动态指示的是“拍照”或者“摄像”。
S406、响应于用户的所述拍摄指令,手机300使用调整后的拍摄参数拍摄预览画面。
在一些实施例中,手机300的NPU芯片中集成有寒武纪Cambricon指令集。NPU芯片使用Cambricon指令集加速卷积神经网络确定第一图像的标签的处理过程。
其中,Cambricon的设计原则是:
1)采用基于load-store访存模式的精简指令集计算机(Reduced Instruction Set Computer,RISC)。具体指令的选取,根据workload的类型进行计算层面的抽象得出。对于深层神经网络(Deep Neural Network,DNN)来说,主要的计算和控制任务有向量计算、矩阵计算、标量计算和分支跳转。
2)不引入复杂的缓存Cache体系和相关控制逻辑。这跟AI算法的workload类型有强关联,对于AI算法来说,数据局部性data locality并不强,cache对性能的影响不像常规计算任务那么大,所以把用于实现缓存层级cache hierarchy的控制逻辑精简掉,对于提升芯片的计算功耗比会有很大的助益。
3)使用暂存器Scratchpad Memory而不是寄存器堆来作为计算数据的主存储。因为AI算法的计算任务与常规的多媒体计算任务不同,指令所操作的数据长度往往是不定长的,所以应用于多媒体指令优化单指令多数据结构(Single Instruction Multiple Data,SIMD)的寄存器堆就不如Scrathpad Memory灵活。
其中,指令集可以划分为四大类,分别是计算类、逻辑类、控制类和数据存取类指令。计算类指令主要是针对神经网络的常用计算逻辑提供了指令集的支持。比如矩阵与矩阵的相乘,矩阵与向量的相系,向量与向量的相乘,等等。这类指令的一个特点是,指令所操作数据的长度是不定长的,以灵活支持不同尺寸的矩阵和向量。逻辑类指令主要是针对向量或矩 阵数据,完成逻辑判断操作。比如用于支持max-pooling的条件merge指令就可以对多组feature map,通过条件赋值,完成max-pooling的操作。控制类和数据存取类指令比较简单,就是提供了分支跳转以及数据的加载和写入。
如图6所示,为本申请实施例提供的一种基于Cambricon指令集的加速器架构。其中,图6中的标量函数单元(Scalar Func.Unit),矢量函数单元(Vector Func.Unit),矩阵函数单元(Matrix Func.Unit)指令在译码后,会先放到特征分类队列(Issue Queue)中等待。等从标量寄存器文件(Scalar Register File)中取得操作类型后,将指令发送到不同的模块处理。控制指令和标量计算会直接发送到标量函数单元处理。数据传输指令需要访问L1缓存(L1Cache),而向量和矩阵相关指令最终会分别发送到矢量函数单元和矩阵函数单元,这两个单元为向量和矩阵的操作加速而专门设计。
图6中的向量和矩阵操作指令使用了片内的高速暂存器(Scratchpad Memorry)。传统的处理器使用固定长度的寄存器的数据来参与处理器的计算,而在神经网络中,数据往往是不定长的,使用寄存器不大现实。而且传统的架构,寄存器数量太少,不适合向量和矩阵计算。这就是使用高速暂存器的目的。高速暂存器替代传统的寄存器,矢量函数单元和矩阵函数单元可以用高速暂存器的数据进行计算。在Cambricon的设计中,矢量高速暂存器是64K,矩阵高速暂存器是768K。
另外,为加速高速暂存器的访问,还为矢量函数单元和矩阵函数单元设计了3个直接内存存取(Direct Memory Access,DMA);另外还设计了一个直接内存存取输入/输出。Cambricon还设计了一套机制,把高速暂存器分为多个个不同的bank,以允许同时支持多个输入/输出接口。
在一些实施例中,如图7所示,在S403之后,或者如图8所示,在S406之后,本申请实施例中的场景识别方法还包括:
S701、手机300将第一图像及第一图像的标签更新进卷积神经网络的训练集。
S702、手机300根据更新后的训练集重新训练卷积神经网络。
可以理解的是,手机300为了实现上述任一个实施例的功能,其包含了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,本申请能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
本申请实施例可以对手机300进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。需要说明的是,本申请实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
比如,以采用集成的方式划分各个功能模块的情况下,如图9所示,为本申请实施例提供的一种手机的结构示意图。该手机300可以包括场景识别单元910和信息获取单元920。
其中,场景识别单元910用于识别第一图像,确定第一图像的场景识别结果。其中,场景识别结果中包括至少一个场景类别。信息获取单元920用于获取采集第一图像时的拍摄信息,该拍摄信息至少包括:时间信息、位置信息、天气信息和温度信息中的一个或多个。场景识别单元910还用于,根据拍摄信息,从场景识别结果中确定第一图像的标签。其中,第一图像的标签用于指示第一图像的场景类别。
在一种可能得到结构中,该手机300还可以包括参数调整单元930,用于在场景识别单元根据拍摄信息,从场景识别结果中确定第一图像的标签之后,调整摄像头的拍摄参数,使得拍摄参数与第一图像的标签匹配。
需要说明的是,上述手机300还可以包括射频电路。具体的,手机300可以通过射频电路进行无线信号的接收和发送。通常,射频电路包括但不限于天线、至少一个放大器、收发信机、耦合器、低噪声放大器、双工器等。此外,射频电路还可以通过无线通信和其他设备通信。所述无线通信可以使用任一通信标准或协议,包括但不限于全球移动通讯***、通用分组无线服务、码分多址、宽带码分多址、长期演进、电子邮件、短消息服务等。
在一种可选的方式中,当使用软件实现数据传输时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地实现本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如软盘、硬盘、磁带)、光介质(例如DVD)、或者半导体介质(例如固态硬盘Solid State Disk(SSD))等。
结合本申请实施例所描述的方法或者算法的步骤可以硬件的方式来实现,也可以是由处理器执行软件指令的方式来实现。软件指令可以由相应的软件模块组成,软件模块可以被存放于RAM存储器、闪存、ROM存储器、EPROM存储器、EEPROM存储器、寄存器、硬盘、移动硬盘、CD-ROM或者本领域熟知的任何其它形式的存储介质中。一种示例性的存储介质耦合至处理器,从而使处理器能够从该存储介质读取信息,且可向该存储介质写入信息。当然,存储介质也可以是处理器的组成部分。处理器和存储介质可以位于ASIC中。另外,该ASIC可以位于探测装置中。当然,处理器和存储介质也可以作为分立组件存在于探测装置中。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。
在本申请所提供的几个实施例中,应该理解到,所揭露的用户设备和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅是示意性的,例如,所述模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个装置,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是一个物理单元或多个物理单元,即可以位于一个地方,或者也可以分布到多个不同地方。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以 采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该软件产品存储在一个存储介质中,包括若干指令用以使得一个设备(可以是单片机,芯片等)或处理器(processor)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何在本申请揭露的技术范围内的变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (24)

  1. 一种场景识别方法,其特征在于,所述方法包括:
    识别第一图像,确定所述第一图像的场景识别结果,所述场景识别结果中包括至少一个场景类别;
    获取采集所述第一图像时的拍摄信息,所述拍摄信息至少包括:时间信息、位置信息、天气信息和温度信息中的一个或多个;
    根据所述拍摄信息,从所述场景识别结果中确定所述第一图像的标签,所述第一图像的标签用于指示所述第一图像的场景类别。
  2. 根据权利要求1所述的方法,其特征在于,所述至少一个场景类别,至少包括:图像中的图像背景信息、图像对应的季节信息、图像对应的天气信息和图像的拍摄对象信息中的一个或多个。
  3. 根据权利要求1或2所述的方法,其特征在于,所述至少一个场景类别按照所述第一图像与每个场景类别的匹配程度由大到小排序。
  4. 根据权利要求3所述的方法,其特征在于,所述根据所述拍摄信息,从所述场景识别结果中确定所述第一图像的标签,包括:
    根据所述拍摄信息与所述至少一个场景类别的匹配程度,以及所述至少一个场景类别由大到小的排序,确定所述第一图像的标签。
  5. 根据权利要求1-4任一项所述的方法,其特征在于,所述方法应用于包括神经网络处理单元NPU芯片的电子设备;
    所述识别所述第一图像,确定所述第一图像的场景识别结果,包括:
    通过所述NPU芯片识别所述第一图像,确定所述第一图像的场景识别结果。
  6. 根据权利要求5所述的方法,其特征在于,所述NPU芯片中集成有寒武纪Cambricon指令集;所述NPU芯片使用寒武纪Cambricon指令集加速确定所述第一图像的场景识别结果的过程。
  7. 根据权利要求5或6所述的方法,其特征在于,所述第一图像是所述电子设备的摄像头采集的预览图像。
  8. 根据权利要求1-6任一项所述的方法,其特征在于,所述第一图像是已存储的图片;或者,所述第一图像是从其他设备获取的图片。
  9. 根据权利要求7所述的方法,其特征在于,在根据所述拍摄信息,从所述场景识别结果中确定所述第一图像的标签之后,所述方法还包括:
    调整所述摄像头的拍摄参数,使得所述拍摄参数与所述第一图像的标签匹配。
  10. 根据权利要求1-9任一项所述的方法,其特征在于,所述NPU芯片中集成有卷积神经网络,所述方法还包括:
    将所述第一图像及所述第一图像的标签更新进所述卷积神经网络的训练集;
    根据更新后的训练集重新训练所述卷积神经网络。
  11. 一种场景识别装置,其特征在于,所述场景识别装置包括:
    场景识别单元,用于识别第一图像,确定所述第一图像的场景识别结果,所述场景识别结果中包括至少一个场景类别;
    信息获取单元,用于获取采集所述第一图像时的拍摄信息,所述拍摄信息至少包括:时间信息、位置信息、天气信息和温度信息中的一个或多个;
    所述场景识别单元还用于,根据所述拍摄信息,从所述场景识别结果中确定所述第一图 像的标签,所述第一图像的标签用于指示所述第一图像的场景类别。
  12. 根据权利要求11所述的装置,其特征在于,所述至少一个场景类别,至少包括:图像中的图像背景信息、图像对应的季节信息、图像对应的天气信息和图像的拍摄对象信息中的一个或多个。
  13. 根据权利要求11或12所述的装置,其特征在于,所述至少一个场景类别按照所述第一图像与每个场景类别的匹配程度由大到小排序。
  14. 根据权利要求13所述的装置,其特征在于,所述场景识别单元根据所述拍摄信息,从所述场景识别结果中确定所述第一图像的标签,包括:
    所述场景识别单元根据所述拍摄信息与所述至少一个场景类别的匹配程度,以及所述至少一个场景类别由大到小的排序,确定所述第一图像的标签。
  15. 根据权利要求11-14任一项所述的装置,其特征在于,所述场景识别单元包括神经网络处理单元NPU;
    所述场景识别单元识别所述第一图像,确定所述第一图像的场景识别结果,包括:
    所述场景识别单元通过所述NPU芯片识别所述第一图像,确定所述第一图像的场景识别结果。
  16. 根据权利要求15所述的装置,其特征在于,所述NPU芯片中集成有寒武纪Cambricon指令集;所述NPU芯片使用寒武纪Cambricon指令集加速确定所述第一图像的场景识别结果的过程。
  17. 根据权利要求15或16所述的装置,其特征在于,所述场景识别装置还包括:摄像头,所述第一图像是所述摄像头采集的预览图像。
  18. 根据权利要求11-16任一项所述的装置,其特征在于,所述第一图像是已存储的图片;或者,所述第一图像是从其他设备获取的图片。
  19. 根据权利要求17所述的装置,其特征在于,所述装置还包括:
    参数调整单元,用于在所述场景识别单元根据所述拍摄信息,从所述场景识别结果中确定所述第一图像的标签之后,调整所述摄像头的拍摄参数,使得所述拍摄参数与所述第一图像的标签匹配。
  20. 根据权利要求11-19任一项所述的装置,其特征在于,所述NPU芯片中集成有卷积神经网络,所述场景识别单元还用于:
    将所述第一图像及所述第一图像的标签更新进所述卷积神经网络的训练集;
    根据更新后的训练集重新训练所述卷积神经网络。
  21. 一种用户设备UE,其特征在于,所述UE包括:场景识别装置,所述场景识别装置用于执行如权利要求1-10任一项所述的场景识别方法。
  22. 一种用户设备UE,其特征在于,所述UE包括:
    存储器,用于存储计算机程序代码,所述计算机程序代码包括指令;
    射频电路,用于进行无线信号的发送和接收;
    处理器,用于执行所述指令实现如权利要求1-10任一项所述的场景识别方法。
  23. 一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机执行指令,所述计算机执行指令被处理电路执行时实现如权利要求1-10任一项所述的场景识别方法。
  24. 一种芯片系统,其特征在于,所述芯片系统包括处理器、存储器,所述存储器中存储有指令;所述指令被所述处理器执行时,实现如权利要求1-10任一项所述的场景识别方法。
PCT/CN2020/091690 2019-05-28 2020-05-22 一种场景识别方法、一种场景识别装置及一种电子设备 WO2020238775A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910452148.1 2019-05-28
CN201910452148.1A CN110348291A (zh) 2019-05-28 2019-05-28 一种场景识别方法、一种场景识别装置及一种电子设备

Publications (1)

Publication Number Publication Date
WO2020238775A1 true WO2020238775A1 (zh) 2020-12-03

Family

ID=68174121

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/091690 WO2020238775A1 (zh) 2019-05-28 2020-05-22 一种场景识别方法、一种场景识别装置及一种电子设备

Country Status (2)

Country Link
CN (1) CN110348291A (zh)
WO (1) WO2020238775A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110348291A (zh) * 2019-05-28 2019-10-18 华为技术有限公司 一种场景识别方法、一种场景识别装置及一种电子设备
CN112101387A (zh) * 2020-09-24 2020-12-18 维沃移动通信有限公司 显著性元素识别方法及装置
CN112819064B (zh) * 2021-01-28 2022-04-22 南京航空航天大学 基于谱聚类的终端区时序气象场景识别方法
CN113095194A (zh) * 2021-04-02 2021-07-09 北京车和家信息技术有限公司 图像分类方法、装置、存储介质及电子设备

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102207966B (zh) * 2011-06-01 2013-07-10 华南理工大学 基于对象标签的视频内容快速检索方法
CN103220431A (zh) * 2013-05-07 2013-07-24 深圳市中兴移动通信有限公司 自动切换拍照模式的方法及装置
CN105447460B (zh) * 2015-11-20 2019-05-31 联想(北京)有限公司 一种信息处理方法及电子设备
CN108304821B (zh) * 2018-02-14 2020-12-18 Oppo广东移动通信有限公司 图像识别方法及装置、图像获取方法及设备、计算机设备及非易失性计算机可读存储介质
CN108921040A (zh) * 2018-06-08 2018-11-30 Oppo广东移动通信有限公司 图像处理方法和装置、存储介质、电子设备
CN108898174A (zh) * 2018-06-25 2018-11-27 Oppo(重庆)智能科技有限公司 一种场景数据采集方法、场景数据采集装置及电子设备

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389209A (zh) * 2017-08-09 2019-02-26 上海寒武纪信息科技有限公司 处理装置及处理方法
CN108764208A (zh) * 2018-06-08 2018-11-06 Oppo广东移动通信有限公司 图像处理方法和装置、存储介质、电子设备
CN109101931A (zh) * 2018-08-20 2018-12-28 Oppo广东移动通信有限公司 一种场景识别方法、场景识别装置及终端设备
CN109271899A (zh) * 2018-08-31 2019-01-25 朱钢 一种提高Ai智慧摄影场景识别准确率的实现方法
CN110348291A (zh) * 2019-05-28 2019-10-18 华为技术有限公司 一种场景识别方法、一种场景识别装置及一种电子设备

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114697516A (zh) * 2020-12-25 2022-07-01 花瓣云科技有限公司 三维模型重建方法、设备和存储介质
CN114697516B (zh) * 2020-12-25 2023-11-10 花瓣云科技有限公司 三维模型重建方法、设备和存储介质
CN113483283A (zh) * 2021-08-05 2021-10-08 威强科技(北京)有限公司 一种可根据使用场景自动调节姿态的照明装置
CN113824884A (zh) * 2021-10-20 2021-12-21 深圳市睿联技术股份有限公司 拍摄方法与装置、摄影设备及计算机可读存储介质
CN113824884B (zh) * 2021-10-20 2023-08-08 深圳市睿联技术股份有限公司 拍摄方法与装置、摄影设备及计算机可读存储介质
CN114339028A (zh) * 2021-11-17 2022-04-12 深圳天珑无线科技有限公司 拍照方法、电子设备以及计算机可读存储介质
WO2023123601A1 (zh) * 2021-12-27 2023-07-06 展讯通信(上海)有限公司 图像色彩处理方法、装置和电子设备
CN114422682A (zh) * 2022-01-28 2022-04-29 安谋科技(中国)有限公司 拍摄方法、电子设备和可读存储介质
CN114422682B (zh) * 2022-01-28 2024-02-02 安谋科技(中国)有限公司 拍摄方法、电子设备和可读存储介质
CN116074623A (zh) * 2022-05-30 2023-05-05 荣耀终端有限公司 一种摄像头的分辨率选择方法和装置
CN116074623B (zh) * 2022-05-30 2023-11-28 荣耀终端有限公司 一种摄像头的分辨率选择方法和装置
CN116055712A (zh) * 2022-08-16 2023-05-02 荣耀终端有限公司 成片率确定方法、装置、芯片、电子设备及介质
CN116055712B (zh) * 2022-08-16 2024-04-05 荣耀终端有限公司 成片率确定方法、装置、芯片、电子设备及介质
CN117133311A (zh) * 2023-02-09 2023-11-28 荣耀终端有限公司 音频场景识别方法及电子设备
CN117133311B (zh) * 2023-02-09 2024-05-10 荣耀终端有限公司 音频场景识别方法及电子设备

Also Published As

Publication number Publication date
CN110348291A (zh) 2019-10-18

Similar Documents

Publication Publication Date Title
WO2020238775A1 (zh) 一种场景识别方法、一种场景识别装置及一种电子设备
WO2021052232A1 (zh) 一种延时摄影的拍摄方法及设备
WO2020192461A1 (zh) 一种延时摄影的录制方法及电子设备
WO2021104485A1 (zh) 一种拍摄方法及电子设备
WO2021135707A1 (zh) 机器学习模型的搜索方法及相关装置、设备
WO2022017261A1 (zh) 图像合成方法和电子设备
CN112840635A (zh) 智能拍照方法、***及相关装置
CN111625670A (zh) 一种图片分组方法及设备
WO2021169515A1 (zh) 一种设备间数据交互的方法及相关设备
WO2021057752A1 (zh) 图像选优方法及电子设备
CN112446832A (zh) 一种图像处理方法及电子设备
WO2021068926A1 (zh) 模型更新方法、工作节点及模型更新***
WO2022022319A1 (zh) 一种图像处理方法、电子设备、图像处理***及芯片***
CN111552451A (zh) 显示控制方法及装置、计算机可读介质及终端设备
CN112700377A (zh) 图像泛光处理方法及装置、存储介质
CN111176465A (zh) 使用状态识别方法、装置、存储介质与电子设备
WO2022143921A1 (zh) 一种图像重建方法、相关装置及***
WO2022062884A1 (zh) 文字输入方法、电子设备及计算机可读存储介质
CN112188094B (zh) 图像处理方法及装置、计算机可读介质及终端设备
CN113467735A (zh) 图像调整方法、电子设备及存储介质
CN115150542A (zh) 一种视频防抖方法及相关设备
CN112584037A (zh) 保存图像的方法及电子设备
CN113536834A (zh) 眼袋检测方法以及装置
WO2022214004A1 (zh) 一种目标用户确定方法、电子设备和计算机可读存储介质
WO2022033344A1 (zh) 视频防抖方法、终端设备和计算机可读存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20814053

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20814053

Country of ref document: EP

Kind code of ref document: A1