WO2022204925A1 - Image acquisition method and related device - Google Patents

Image acquisition method and related device

Info

Publication number
WO2022204925A1
Authority
WO
WIPO (PCT)
Prior art keywords
vehicle
target
moment
image
image data
Prior art date
Application number
PCT/CN2021/083874
Other languages
English (en)
French (fr)
Inventor
黄怡
汤秋缘
李皓
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to CN202180000814.3A (patent CN113228620B)
Priority to PCT/CN2021/083874
Publication of WO2022204925A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60: Control of cameras or camera modules
    • G: PHYSICS
    • G07: CHECKING-DEVICES
    • G07C: TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
    • G07C5/00: Registering or indicating the working of vehicles
    • G07C5/08: Registering or indicating performance data other than driving, working, idle, or waiting time, with or without registering driving, working, idle or waiting time
    • G07C5/0841: Registering performance data

Definitions

  • The present application relates to the field of artificial intelligence, and in particular to an image acquisition method and related device.
  • The embodiments of the present application provide an image acquisition method and related device that no longer require the user to shoot images, thereby solving the problem that the driver cannot take pictures as well as the safety hazard of users taking pictures from a moving vehicle. The first moment is included in the first time period, and the interval between the shooting moment of the target image and the first moment is less than or equal to a target threshold, so that the scene the user wants to capture is not missed.
  • an embodiment of the present application provides an image acquisition method, which can be used in the field of vehicles in the field of artificial intelligence.
  • the method is applied to a vehicle, and one or more first camera devices are configured outside the vehicle.
  • The method includes: photographing the environment around the vehicle by the first camera device to obtain first image data, where the first image data corresponds to the environment around the vehicle in a first time period. The first image data may be a video obtained by recording the environment around the vehicle, or may include a plurality of first video frames (that is, images) extracted from that video, each first video frame corresponding to a moment in the first time period, or may include a plurality of first images obtained by photographing the environment around the vehicle within the first time period.
  • In response to the acquired photographing instruction, the vehicle acquires the first moment corresponding to the photographing instruction; the vehicle then acquires the target image from the first image data according to the first moment and outputs the target image. The first moment is included in the first time period, and the interval between the shooting moment of the target image and the first moment is less than or equal to the target threshold; the target threshold can be 5 seconds, 8 seconds, 10 seconds, 15 seconds, or another value.
  • In this way, the environment around the vehicle is photographed by the first camera device configured on the vehicle to obtain the first image data, and when a photographing instruction input by the user is received, the first moment corresponding to the instruction is obtained and an image whose shooting moment matches the first moment is selected from the first image data as the target image. The user is no longer required to shoot images, which solves the problem that the driver cannot take pictures and removes the safety hazard of the user taking pictures; in addition, since the first image data corresponds to the environment around the vehicle over the whole first time period, selecting the image collected at the first moment avoids missing the scene the user wanted to shoot.
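  • To make the selection step concrete, the following is a minimal sketch (an illustration, not the patent's implementation) of filtering timestamped frames around the first moment; the Frame type and the default threshold value are assumptions:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Frame:
    timestamp: float   # shooting moment of this frame, in seconds
    image: object      # decoded image data

def select_target_images(frames: List[Frame], first_moment: float,
                         target_threshold: float = 5.0) -> List[Frame]:
    """Keep frames whose shooting moment is within target_threshold
    seconds of the first moment, as the first aspect describes."""
    return [f for f in frames
            if abs(f.timestamp - first_moment) <= target_threshold]
```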
  • The method may further include: the vehicle generates a photographing instruction in response to a received target voice, where the intention corresponding to the target voice is to take a photo.
  • A speech recognition model may be configured in the vehicle. After the vehicle obtains any voice information input by the user (for convenience of description, referred to below as "the first voice information"), it converts the first voice information into text through the speech recognition model and then determines from the text whether the user intends to take a picture, that is, whether the first voice information is the target voice. If the first voice information is the target voice, the vehicle determines that it has obtained the photographing instruction input by the user.
  • Alternatively, the vehicle collects gesture information input by a second user, and when it determines from that gesture information that the second user has input a preset gesture, it generates a photographing instruction in response to the acquired preset gesture; the preset gesture may be a static gesture or a dynamic gesture.
  • The second user may be any user inside the vehicle, or may be limited to a passenger at a fixed position.
  • the user can trigger the vehicle to generate a photographing instruction by inputting a voice or gesture, and the operation is simple and easy to implement.
  • The following describes the process by which the vehicle acquires the first moment corresponding to the photographing instruction.
  • In one implementation, the vehicle determines a third moment corresponding to the photographing instruction in response to the acquired photographing instruction and obtains the first moment according to the third moment, where the third moment is the generation moment of the photographing instruction. In another implementation, if the vehicle generates the photographing instruction in response to a received target voice, the vehicle acquires a fourth moment corresponding to the target voice and obtains the first moment according to the fourth moment, where the fourth moment is the acquisition moment of the target voice.
  • The acquisition moment of the target voice can be any of the following: the moment acquisition of the target voice starts, the moment acquisition of the target voice ends, or the middle moment of acquiring the target voice.
  • In another implementation, if the vehicle generates the photographing instruction in response to an acquired preset gesture, the vehicle acquires a second moment corresponding to the gesture instruction, where the second moment is the shooting moment of the gesture image corresponding to the gesture instruction; the preset gesture may be a dynamic gesture or a static gesture.
  • A model for performing natural language processing (NLP) tasks and a semantic library may also be configured in the vehicle. The vehicle inputs the text content corresponding to the first voice information, together with the semantic library, into the model for performing the NLP task, so as to determine through that model whether the intention corresponding to the first voice information is to take a picture.
  • The intention refers to the purpose of the user and indicates the user's needs.
  • The vehicle can recognize the user's intention from the voice information input by the user.
  • The model for performing the NLP task can be implemented by a neural network or by a non-neural-network model. Specifically, it can be a machine reading comprehension (MRC) model such as bidirectional encoder representations from transformers (BERT), a recurrent neural network (RNN), or a question-answering network (QANet); other models with semantic understanding capability can also be used.
  • The method may further include: the vehicle acquires at least one target keyword from the target voice, where a target keyword is description information of the shooting object and/or the shooting direction. A keyword is the specific information of the intended content and is also the key information used to trigger a specific service.
  • For example, "right" and "black car" in the user input "take a picture of the black car on the right" are the keywords of the input information.
  • The semantic library may also include slot information, where slot information is the description information of a keyword. For any voice information input by the user (for convenience of description, referred to below as "the first voice information"), the vehicle can input the corresponding text content and the semantic library into the model for performing the NLP task, so that the model judges the intention corresponding to the first voice information, extracts the keywords in it, and outputs both the intention and the extracted keywords.
  • Acquiring the target image from the first image data may include: the vehicle acquires the target image from the first image data according to the target keyword.
  • If the at least one target keyword includes a keyword describing the shooting object, that keyword can describe the name, type, color, shape, or other attributes of the photographed object, and the target image contains the object indicated by it; and/or, if the at least one target keyword includes a keyword describing the shooting direction, the shooting direction of the target image is the direction pointed to by that keyword.
  • In this implementation, the photographing instruction is input in the form of voice, and after the target voice for triggering photographing is obtained, the target keyword is also extracted from it. The target keyword points to the object the user wants to photograph, or to the direction the user wants to shoot, so the vehicle can further understand what kind of image the user wants. This helps improve the accuracy of the output target image, that is, it helps output an image that meets the user's expectations, thereby further improving user stickiness of this solution.
  • In one implementation, the preset gesture is a preset static gesture, and acquiring the first moment corresponding to the photographing instruction includes: in response to the acquired photographing instruction, the vehicle obtains a second moment corresponding to the preset gesture and determines the second moment as the first moment, where the second moment is the shooting moment of the preset static gesture.
  • In this implementation, the shooting moment of the static gesture can be acquired directly and taken as the first moment, providing another way of acquiring the first moment and increasing the flexibility of this solution. In addition, since the interval between when the user sees the object they want to shoot and when they make the preset static gesture is generally short, directly determining the shooting moment of the preset static gesture as the first moment is also closer to the shooting moment the user actually wants.
  • In another implementation, acquiring the first moment includes: in response to the acquired photographing instruction, the vehicle determines a third moment corresponding to the photographing instruction, where the third moment is the generation moment of the instruction; the vehicle then determines the first moment according to the third moment, with the first moment located a first duration before the third moment. The value of the first duration can be 0.5 seconds, 1 second, 2 seconds, 3 seconds, 5 seconds, or another value, and can be determined in combination with factors such as the speed at which the vehicle processes voice information.
  • In this implementation, since the vehicle keeps moving forward while it receives the user's voice information and processes it to determine whether it is the target voice, the moment the photographing instruction is generated is already later than the moment the user wanted to take a picture. Determining the first moment before the third moment brings it closer to the moment the user wanted; using the first moment as the reference point for obtaining the image therefore yields an image that better matches what the user actually wanted.
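  • As a sketch of this compensation (the function name and the default duration are illustrative assumptions):

```python
def first_moment_from_generation(third_moment: float,
                                 first_duration: float = 1.0) -> float:
    """Back off from the photographing instruction's generation moment
    (the third moment) by the first duration, so the selected reference
    point is closer to the moment the user actually wanted to capture."""
    return third_moment - first_duration

# e.g. first_moment_from_generation(120.4) -> 119.4
```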
  • In one implementation, the vehicle is configured with at least two first camera devices, and the shooting ranges of different first camera devices do not overlap or partially overlap.
  • The method further includes: the vehicle acquires a first direction, where the first direction is determined according to any one or more of the following: the line-of-sight direction of a first user, the face orientation of the first user, the body orientation of the first user, or other directions, which are not limited here.
  • The first user may be any one or a combination of the following: the driver, a user located at a preset position in the vehicle, a user who issued the photographing instruction, or another type of user; which type of user is selected can be determined according to the actual situation.
  • The vehicle selects, from the at least two (for convenience of description, S) first camera devices, at least one (for convenience of description, M) target first camera device whose shooting range covers the first direction.
  • Obtaining the target image from the first image data then includes: the vehicle selects second image data from the first image data, where the second image data is the subset of the first image data collected by the M target camera devices among the S first camera devices; the vehicle acquires the target image from the second image data.
  • In this implementation, the first direction is also acquired, the target camera devices whose shooting ranges cover the first direction are selected from the multiple camera devices, the second image data captured by those target camera devices is selected from the first image data, and the target image is then obtained from the second image data. Since the second image data is smaller than the first image data, this improves the efficiency of the step of acquiring the target image compared with acquiring it directly from the first image data.
  • Moreover, the first direction is determined according to any one or more of the following: the user's line-of-sight direction, face orientation, body orientation, or gesture direction. Since a user will generally face, look at, or point to the area of interest, using the first direction to filter the captured image data helps select the image the user wants, increasing user stickiness of this solution.
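  • One way to picture this camera selection is sketched below, under the assumption that each first camera device's shooting range is modeled as an azimuth interval in degrees (the data layout is invented for illustration):

```python
def covers(shooting_range, direction_deg):
    """True if an azimuth interval (start_deg, end_deg), measured
    clockwise and possibly wrapping past 360, covers the direction."""
    start, end = shooting_range
    d = direction_deg % 360.0
    if start <= end:
        return start <= d <= end
    return d >= start or d <= end   # interval wraps around 0 degrees

def select_target_cameras(cameras, first_direction_deg):
    """cameras: mapping of camera id -> (start_deg, end_deg)."""
    return [cam for cam, rng in cameras.items()
            if covers(rng, first_direction_deg)]

# e.g. select_target_cameras({"front": (315, 45), "right": (45, 135)}, 30.0)
# -> ["front"]
```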
  • In one implementation, the vehicle is configured with at least two camera devices whose shooting ranges do not overlap or partially overlap.
  • The method further includes: acquiring target keywords from the target voice, where the target keywords are description information of the shooting object and/or shooting direction; acquiring a first direction determined according to any one or more of the following: line-of-sight direction, face orientation, and body orientation; and selecting, from the at least two camera devices, the target camera devices whose shooting ranges cover the first direction.
  • Acquiring the target image from the first image data then includes: selecting second image data, collected by the target camera devices, from the first image data; and acquiring the target image from the second image data according to the target keywords, where the target image contains the object indicated by the target keywords and/or the shooting direction of the target image is the direction pointed to by the target keywords.
  • an embodiment of the present application provides an image acquisition device, which can be used in the field of vehicles in the field of artificial intelligence.
  • the image acquisition device is applied to a vehicle, and the vehicle is equipped with a camera device.
  • The image acquisition device includes: a shooting module, configured to control the camera device to photograph the environment around the vehicle to obtain first image data, where the first image data corresponds to the environment around the vehicle in a first time period; and an acquisition module, configured to acquire, in response to an acquired photographing instruction, the first moment corresponding to the photographing instruction, and to acquire the target image from the first image data according to the first moment and output it, where the first moment is included in the first time period and the interval between the shooting moment of the target image and the first moment is less than or equal to the target threshold.
  • The image acquisition apparatus provided in the second aspect may also perform the steps performed by the vehicle in each possible implementation of the first aspect.
  • For the implementation steps and the beneficial effects of each possible implementation, reference may be made to the descriptions of the first aspect; details are not repeated here.
  • An embodiment of the present application provides a vehicle, which may include a processor coupled to a memory storing program instructions; when the program instructions stored in the memory are executed by the processor, the steps performed by the vehicle in the image acquisition method of the first aspect are implemented.
  • an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the program is run on a computer, the computer is made to execute the above-mentioned first aspect. The steps performed by the vehicle in the image acquisition method.
  • An embodiment of the present application provides a circuit system that includes a processing circuit configured to execute the steps performed by the vehicle in the image acquisition method of the first aspect.
  • an embodiment of the present application provides a computer program that, when running on a computer, causes the computer to execute the steps performed by the vehicle in the image acquisition method described in the first aspect above.
  • An embodiment of the present application provides a chip system that includes a processor for implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above method.
  • the chip system further includes a memory for storing necessary program instructions and data of the server or the communication device.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • FIG. 1a is a schematic structural diagram of a vehicle in the image acquisition method provided by the embodiment of the application;
  • FIG. 1b is a schematic flowchart of a method for acquiring an image provided by an embodiment of the present application
  • FIG. 2 is another schematic flowchart of the image acquisition method provided by the embodiment of the present application.
  • FIG. 3 is a schematic interface diagram of a function of triggering the shooting of the surrounding environment in the image acquisition method provided by the embodiment of the present application;
  • FIG. 4 is a schematic diagram of an interface for acquiring keywords in an image acquisition method provided by an embodiment of the present application
  • FIG. 5 is another schematic flowchart of the image acquisition method provided by the embodiment of the present application.
  • FIG. 6 is a schematic diagram of a target image and a first moment in an image acquisition method provided by an embodiment of the present application
  • FIG. 7 is a schematic diagram of an interface for outputting a target image in an image acquisition method provided by an embodiment of the present application.
  • FIG. 8 is another schematic flowchart of an image acquisition method provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of an image acquisition apparatus provided by an embodiment of the present application.
  • FIG. 10 is another schematic structural diagram of an image acquisition apparatus provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a vehicle provided by an embodiment of the application.
  • FIG. 12 is another schematic structural diagram of the vehicle provided by the embodiment of the application.
  • The embodiments of the present application can be applied to various scenarios where images need to be captured while a vehicle is being driven.
  • The aforementioned vehicles include but are not limited to cars, trucks, motorcycles, buses, boats, airplanes, helicopters, recreational vehicles, playground vehicles, construction equipment, trams, golf carts, trains, etc.
  • A user in the vehicle (including the driver or a passenger) may want to photograph the environment around the vehicle while it is running at high speed: the driver has no opportunity to take pictures, the passengers may not have time to do so, and taking pictures from a moving vehicle is also unsafe.
  • FIG. 1a is a schematic structural diagram of the vehicle, shown as a car as an example. The black dots in FIG. 1a represent the positions of the camera devices outside the vehicle: the exterior is equipped with multiple camera devices (six in FIG. 1a as an example) arranged at different positions, so that the shooting ranges of different camera devices do not overlap or partially overlap. It should be understood that the example in FIG. 1a is only for convenience of understanding and is not intended to limit this solution.
  • FIG. 1 b is a schematic flowchart of a method for acquiring an image provided by an embodiment of the present application.
  • S1. The vehicle continuously or non-continuously photographs the environment around the vehicle through the external camera device(s) to acquire first image data, where the first image data corresponds to the environment around the vehicle in a first time period; the first time period is the period during which the camera devices photograph, and it includes multiple moments, so the first image data includes images of the environment around the vehicle at multiple moments.
  • S2. The vehicle detects in real time the photographing instruction input by the user and, in response to the acquired instruction, obtains the first moment corresponding to it. The photographing instruction can be triggered by a voice or gesture input by the user. Alternatively, the vehicle can monitor the user's blink frequency and, when it is greater than or equal to a preset threshold, determine that a photographing instruction has been input; or monitor the user's heart rate and, when it is greater than or equal to a preset threshold, determine that a photographing instruction has been input; the vehicle can also collect other types of physiological information from the user to obtain the photographing instruction. The types of photographing instructions are not exhaustively listed here.
  • S3. After determining the first moment, the vehicle can obtain the target image from the first image data according to the first moment and output the target image. The first moment is included in the first time period, and the interval between the shooting moment of the target image and the first moment is less than or equal to the target threshold.
  • In the above manner, the first image data is captured by the camera devices configured on the vehicle, so the user is no longer required to capture images. This solves the problem that the driver or passengers cannot capture images, and also removes the safety hazard of the user taking pictures. In addition, because the first image data corresponds to the environment around the vehicle over the whole first time period, that is, the camera devices continuously photograph the environment around the vehicle, selecting the images captured at the first moment avoids missing the desired scene.
  • Since the user's voice, gestures, blink frequency, heart rate, and other personal information collected during driving may involve the user's personal privacy, in one implementation the user can input a first operation to the vehicle, and the vehicle, in response to the first operation, starts collecting the aforementioned information so that a photographing instruction can be triggered.
  • In another implementation, the vehicle may output inquiry information to the user to determine whether one or more kinds of the user's information may be collected. Specifically, the vehicle may output the inquiry by voice, text, or other means; for example, the vehicle may output "May the voice information you produce be collected?", and if the user replies "Yes", it is determined that the vehicle may collect the user's voice. It should be understood that the examples here are only for convenience of understanding and do not limit this solution.
  • As described above, the photographing instruction can take various forms. The image acquisition method provided by the embodiments of the present application is described in detail below, taking voice instructions and gesture instructions as examples.
  • 1. The photographing instruction is a voice instruction.
  • FIG. 2 is a schematic flowchart of a method for acquiring an image provided by an embodiment of the present application.
  • the method for acquiring an image provided by an embodiment of the present application may include:
  • 201. The vehicle controls the first camera devices to photograph the environment around the vehicle to acquire first image data.
  • Specifically, S first camera devices may be configured outside the vehicle, where S is greater than or equal to 1, and the environment around the vehicle is photographed by these S first camera devices to obtain the first image data; the shooting ranges of different first camera devices do not overlap or partially overlap. The first image data corresponds to the environment around the vehicle in the first time period.
  • The concept of the first image data can be understood with reference to the above description.
  • Specifically, the vehicle may record the environment around the vehicle through the S first camera devices to obtain video data corresponding to the environment around the vehicle in the first time period. Further, in one implementation, the vehicle may directly determine that video data as the first image data; that is, the first image data is a video obtained by photographing the environment around the vehicle within the first time period.
  • In another implementation, the vehicle may perform a frame-extraction operation on the video data corresponding to the environment around the vehicle in the first time period to obtain the first image data; that is, the first image data includes a plurality of first video frames, which correspond to the moments in the first time period.
  • In another implementation, the vehicle may use the first camera devices to take pictures of the environment around the vehicle at a target frequency to obtain the first image data; that is, the first image data includes a plurality of first images, which show the environment around the vehicle at the moments in the first time period.
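  • A sketch of the frame-extraction variant is given below, assuming OpenCV is available and treating the target frequency as a free parameter (the function name is invented for illustration):

```python
import cv2

def sample_frames(video_path: str, target_hz: float = 2.0):
    """Decode a recording of the vehicle's surroundings and keep frames
    at roughly target_hz, tagging each kept frame with its shooting
    moment in seconds so later steps can match it against the first moment."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0   # fall back if metadata is missing
    step = max(1, round(fps / target_hz))     # keep every step-th frame
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append((idx / fps, frame)) # (timestamp, image)
        idx += 1
    cap.release()
    return frames
```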
  • In one implementation, after the vehicle is started, the plurality of first camera devices outside the vehicle can be triggered to automatically and continuously photograph the environment around the vehicle.
  • the purpose of photographing the environment around the vehicle by a camera device may be not only to facilitate the user to photograph the surrounding environment, but also to assist the vehicle in path planning, etc., which is not limited here.
  • In another implementation, after the user turns on the vehicle's photographing function, the first camera devices outside the vehicle are triggered to continuously photograph the environment around the vehicle.
  • In another implementation, since the vehicle is pre-configured with a plurality of external first camera devices, the vehicle continuously photographs the environment through some of them after it is started, and after detecting the first operation input by the user, it triggers, in response to that operation, all the external first camera devices to continuously photograph the surrounding environment. The manner in which the vehicle triggers the first camera devices to photograph the surrounding environment is not limited here.
  • the first operation may be a voice command input by the user.
  • the vehicle may be preconfigured with a button for enabling the "vehicle-assisted photographing function", and when the user presses the aforementioned button, it is deemed that the vehicle detects the first operation input by the user.
  • one or more touch screens may be pre-configured in the vehicle, and a first icon for receiving the first operation is displayed on the aforementioned touch screen, and the user may perform a touch operation on the first icon to input the first operation
  • the aforementioned touch operations may be operations such as single-click, double-click, and long-press. It should be understood that the examples here are only to facilitate understanding of the manner in which the user inputs the first operation, and are not intended to limit the present solution.
  • FIG. 3 is a schematic interface diagram of a function of triggering the shooting of the surrounding environment in the image acquisition method provided by the embodiment of the present application.
  • Figure 3 includes two sub-schematics (a) and (b).
  • In FIG. 3, a first icon (A1 in sub-diagram (a) of FIG. 3) is displayed on the central control screen; the user can tap A1 to input the first operation, thereby triggering the vehicle to start photographing the environment around the vehicle through the external first camera devices. The vehicle may also be provided with a touch screen in the rear row on which a first icon (A2 in sub-diagram (b) of FIG. 3) is set; the user can tap A2 to input the first operation with the same effect. It should be understood that FIG. 3 is only an example to facilitate understanding of this solution; the vehicle may also set the first icon on the central control screen and the rear touch screen at the same time, which is not specifically limited here.
  • 202. The vehicle acquires the target voice and generates a photographing instruction in response to the received target voice.
  • Specifically, since the photographing instruction may be triggered by the voice input by the user, the vehicle may be pre-configured with a speech recognition model. After the vehicle obtains any voice information input by the user (for convenience of description, referred to below as "the first voice information"), it converts the first voice information into text through the speech recognition model and then determines from the text whether the user intends to take a picture, that is, whether the first voice information is the target voice; if it is, the vehicle determines that it has obtained the photographing instruction input by the user.
  • Further, a model for performing natural language processing (NLP) tasks, also referred to as a natural language understanding (NLU) model, and a semantic library may be configured in the vehicle. The vehicle inputs the text content corresponding to the first voice information, together with the semantic library, into this model, so as to determine whether the intention corresponding to the first voice information is to take a photo.
  • The intention refers to the purpose of the user and indicates the user's needs.
  • The vehicle can recognize the user's intention from the voice information input by the user.
  • For example, if the voice information input by the user is "the surface of that black car is velvet, so beautiful", the vehicle can recognize from it that the user's intention is to take a picture.
  • The intent recognition model can be obtained by training on a large corpus that expresses the same intent in different ways.
  • The model used to perform the NLP task can be implemented by a neural network or by a non-neural-network model; specifically, it can be a machine reading comprehension (MRC) model such as bidirectional encoder representations from transformers (BERT), a recurrent neural network (RNN), or a question-answering network (QANet), and other types of models can also be used.
  • the semantic library may include description information of the shooting intent, and the description information of the intent may support a flexible description manner.
  • This embodiment of the present application supports any natural language expression the user is accustomed to. The target voice input by the user may be relatively direct, for example, relatively standardized and formatted expressions such as "photograph the sunset", "photograph the building on the left", or "take a photo of the black car on the right"; the target voice can also be relatively implicit, for example, "the sunset is so beautiful today", "the flowers on the roadside are so beautiful", or "what kind of car is that in front", which are not limited here.
  • In one implementation, the semantic library may also include multiple trigger words. When the vehicle detects that a word from the semantic library appears in the voice information input by a user (which may be the driver or a passenger in the vehicle), it determines that the user has the intention to take a picture, determines that voice information as the target voice, and then treats the target voice as a photographing instruction.
  • For example, the words included in the semantic library may be: "photograph", "shoot", "take a picture", "so beautiful", or other words, which are not exhaustively listed here. It should be understood that the terms here are only to facilitate understanding of the solution and are not intended to limit it.
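  • A minimal sketch of this trigger-word match follows; the English word list is illustrative only, and a production system would match against the semantic library (likely in Chinese) with proper tokenization:

```python
TRIGGER_WORDS = {"photograph", "shoot", "take a picture", "so beautiful"}

def is_target_voice(recognized_text: str) -> bool:
    """Treat an utterance as a photographing instruction when any
    semantic-library trigger word appears in the recognized text."""
    text = recognized_text.lower()
    return any(word in text for word in TRIGGER_WORDS)

# e.g. is_target_voice("The sunset is so beautiful today") -> True
```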
  • In one implementation, the vehicle detects the voice information input by the user in real time to obtain the target voice after the user actively enables the "vehicle-assisted photographing" function.
  • In another implementation, the vehicle starts detecting the voice information input by the user in real time once the vehicle is started.
  • the vehicle can also acquire the target keyword from the target speech.
  • the keyword refers to specific information of the intended content, and is also key information used to trigger a specific service.
  • For example, "right" and "black car" in the user input "take a picture of the black car on the right" are the keywords of the input information; as another example, "sunset" in the user input "the sunset is so beautiful today" is the keyword of the input information.
  • the semantic library may also include slot information.
  • the slot information is the description information of the keyword.
  • The description information of a slot also supports flexible description methods. An attribute-like description can be used; for example, the description information for the slot "type of photographed object" may be "noun". A keyword-based, enumerated description can also be used, such as "shooting direction", "type of subject", "shape of subject", etc. It should be understood that the examples here are only to facilitate understanding of the concept of slot information and are not used to limit this solution.
  • In one implementation, the vehicle can input the text content corresponding to the first voice information, together with the semantic library, into the model for performing the NLP task.
  • The model judges the intention corresponding to the first voice information and extracts the keywords in it, outputting both the intention corresponding to the first voice information and the keywords extracted from it.
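  • The model's output can be pictured as an intent plus filled slots. The toy parser below stands in for the NLP model purely to show that output structure; the vocabularies, patterns, and names are invented for illustration:

```python
import re
from dataclasses import dataclass, field

@dataclass
class NlpResult:
    intent: str                                    # e.g. "take_photo"
    keywords: dict = field(default_factory=dict)   # slot name -> value

DIRECTIONS = ("left", "right", "front", "behind")
OBJECTS = ("black car", "sunset", "building", "flowers")

def parse_utterance(text: str) -> NlpResult:
    """Toy stand-in for the NLP model: detect a photographing intent
    and fill 'shooting direction' / 'photographed object' slots."""
    text = text.lower()
    keywords = {}
    for d in DIRECTIONS:
        if re.search(rf"\b{d}\b", text):
            keywords["direction"] = d
    for o in OBJECTS:
        if o in text:
            keywords["object"] = o
    has_intent = "picture" in text or "photo" in text or "beautiful" in text
    return NlpResult("take_photo" if has_intent else "other", keywords)

# parse_utterance("take a picture of the black car on the right")
# -> NlpResult(intent='take_photo',
#              keywords={'direction': 'right', 'object': 'black car'})
```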
  • The slot information in the semantic library may all be optional, or it may include one or more pieces of mandatory slot information. If the vehicle does not obtain a keyword corresponding to mandatory slot information from the target voice, it can output query information to the user, instructing the user to input the keyword corresponding to the mandatory slot information.
  • For example, the mandatory slot information may include "type of object to be photographed"; as another example, it may include both "type of object to be photographed" and "shooting direction". Which slot information is mandatory can be determined according to the actual situation and is not limited here.
  • The vehicle can output the query information in the form of voice, in the form of text, in the form of both voice and text, or in other ways, which are not limited here.
  • FIG. 4 is a schematic diagram of an interface for acquiring keywords in the image acquiring method provided by the embodiment of the present application.
  • In FIG. 4, the query information in text form is output on the display screen in the rear row of the vehicle, and the vehicle outputting the query information in both text and voice form is taken as an example; B1 indicates that the vehicle is playing the query information in voice form.
  • 203. In response to the acquired photographing instruction, the vehicle acquires the first moment corresponding to the photographing instruction.
  • the vehicle may acquire the first moment corresponding to the photographing instruction in response to the acquired photographing instruction. Specifically, in an implementation manner, the vehicle determines a third moment corresponding to the photographing instruction in response to the acquired photographing instruction.
  • the third moment is the generation moment of the photographing instruction, that is, the third moment is the moment when the vehicle receives the target voice input by the user and generates the photographing instruction in response to the received target voice.
  • the vehicle acquires the first time corresponding to the photographing instruction according to the third time. Specifically, in one case, the vehicle may directly determine the third moment as the first moment, that is, the first moment is the moment when the photographing instruction is generated.
  • In another case, the vehicle takes as the first moment a time point that is before the third moment and whose interval from the third moment is a first duration; the value of the first duration can be 0.5 seconds, 1 second, 2 seconds, 3 seconds, 5 seconds, or another value, and can be determined in combination with factors such as the speed at which the vehicle processes voice information.
  • Since the vehicle keeps moving forward while it receives the user's voice information and processes it to determine whether it is the target voice, the moment the photographing instruction is generated is already slightly later than the moment the user wanted to take a picture. Determining the first moment before the third moment brings it closer to the moment the user wanted; taking the first moment as the reference point for acquiring the image therefore yields an image that better matches what the user actually wanted.
  • In another implementation, in response to the acquired photographing instruction, the vehicle acquires the fourth moment corresponding to the target voice, where the fourth moment is the acquisition moment of the target voice. Since receiving the entire target voice spans a period of time, the acquisition moment of the target voice can be any of the following: the moment acquisition of the target voice starts, the moment acquisition ends (also the moment the target voice is successfully received), the middle moment of acquisition (the midpoint of the receiving duration of the entire target voice), or another time point in the acquisition process, which is not limited here.
  • The vehicle may directly determine the fourth moment as the first moment, or may take as the first moment a time point that is before the fourth moment and whose interval from the fourth moment is a second duration; the value of the second duration may be 0.5 seconds, 1 second, 2 seconds, 3 seconds, or another value, and can be determined in combination with factors such as which type of fourth moment is used.
  • 204. The vehicle acquires a first direction and selects, from at least two first camera devices, the target camera devices whose shooting ranges cover the first direction.
  • Specifically, at least two first camera devices may be configured on the vehicle, with the shooting ranges of different first camera devices not overlapping or partially overlapping; the vehicle acquires the first direction and selects, from the at least two first camera devices, at least one target camera device whose shooting range covers the first direction.
  • the first direction is determined according to any one or more of the following: the line of sight of the first user, the face orientation of the first user, the body orientation of the first user, or other directions, etc., which are not limited here.
  • The first user can be any one or a combination of the following: the driver, a user located at a preset position in the vehicle, a user who issued the photographing instruction, or another type of user; which type of user is selected can be determined according to the actual situation.
  • If the first direction is the line-of-sight direction of the first user, at least one second camera device may also be configured inside the vehicle. In response to the acquired photographing instruction, the vehicle uses the second camera device to capture an image of the first user, performs face detection on that image to determine the face region, and performs key-point localization on the face region to determine the eye region.
  • The key-point localization can be completed by a preset algorithm, including but not limited to a Roberts edge-detection operator, a Sobel operator, etc.; it can also be completed through a preset model, such as an active contour (snake) model or a neural network for detecting face key points. The methods for detecting face key points are not exhaustively listed here.
  • The vehicle then crops the eye-region image from the image of the first user and generates the corresponding line-of-sight direction through a neural network, thereby obtaining the line-of-sight direction of the first user.
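  • A sketch of the face-then-eye localization stage follows, using OpenCV's stock Haar cascades in place of the key-point algorithms named above (an assumption on my part; the gaze-estimation network itself is out of scope here):

```python
import cv2

# Haar cascades shipped with OpenCV; stand-ins for the face detection
# and key-point localization steps described in the text.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def eye_regions(user_image):
    """Locate the face region in the first user's image, then crop the
    eye regions inside it; the crops would be fed to a gaze-estimation
    neural network (not shown) to produce the line-of-sight direction."""
    gray = cv2.cvtColor(user_image, cv2.COLOR_BGR2GRAY)
    crops = []
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        face = gray[y:y + h, x:x + w]
        for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(face):
            crops.append(face[ey:ey + eh, ex:ex + ew])
    return crops
```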
  • Alternatively, an eye tracker may be configured inside the vehicle, and the vehicle uses it to collect the first user's line-of-sight direction. The eye tracker may be based on the pupil center corneal reflection (PCCR) technique, on visual tracking with a three-dimensional (3D) eyeball model, or on other technologies, which are not exhaustively listed here. It should be noted that the vehicle can also collect the first user's line-of-sight direction in other ways.
  • the vehicle can control the second camera to collect the image of the first user, and generate the face orientation of the first user through the neural network for face orientation recognition according to the image of the first user.
  • the neural network used for face orientation recognition may adopt a learning vector quantization (LVQ) neural network, a BP neural network, or other types of neural networks, etc., which are not exhaustive here.
  • If the first direction is the body orientation of the first user, a sensor for collecting point cloud data of the user may be configured inside the vehicle; the vehicle collects, through that sensor, point cloud data corresponding to the current posture of the first user, from which the body orientation of the first user can be generated.
  • Alternatively, the vehicle may collect the image of the first user through the second camera device and generate the body orientation of the first user through a neural network; the methods for generating the body orientation are not exhaustively listed here.
  • the first direction may also adopt other types of directions, for example, the first direction may adopt the gesture direction of the first user, etc., which will not be exhaustive here.
  • In one implementation, the vehicle may output inquiry information to the user to determine whether the first direction may be acquired. Specifically, the vehicle can output the inquiry by voice, text, or other means; for example, the vehicle can output by voice "May your gaze direction be collected?", and if the user replies "Yes", it is determined that the vehicle may collect the user's line-of-sight direction.
  • the user may input a second operation to the vehicle, and the vehicle, in response to the second operation input by the user, triggers to start executing the acquisition operation of the first direction.
  • The second operation may be the user turning on the device used to collect the first direction in the vehicle; for example, the eye tracker is turned off by default, and when the user actively turns it on, the second operation is considered to have been input.
  • the user may input a second operation through the central control screen configured in the vehicle, and it should be understood that the various examples here are only for the convenience of understanding the solution, and are not intended to limit the solution.
  • 205. The vehicle acquires the target image from the first image data according to the first moment and outputs the target image.
  • the vehicle may acquire one or more target images from the first image data according to the first time, and output the one or more target images.
  • The first moment is included in the first time period, and the interval between the shooting moment of the target image and the first moment is less than or equal to the target threshold; the value of the target threshold can be 5 seconds, 8 seconds, 10 seconds, 15 seconds, or another value, which is not limited here.
  • Step 204 is an optional step. If step 204 is not performed and no target keyword was obtained from the target voice in step 202, step 205 may include: if the first image data is represented as video data, the vehicle obtains from the first image data one or more target video frames whose shooting moment is the first moment and determines each obtained target video frame as a target image; that is, the shooting moment of the target image is the first moment.
  • In another case, a threshold on the number of target images may be pre-configured in the vehicle (for convenience of description, its value is taken as N); the vehicle obtains from the first image data the N images whose shooting moments are closest to the first moment and determines them as N target images. N can be 3, 4, 5, 6, 8, 9, or another value.
  • the specific value of N can be flexibly set according to the actual situation.
  • the vehicle may be pre-configured with the value of the target threshold, and the vehicle selects from the first image data all images whose interval duration between the shooting moment and the first moment is less than or equal to the target threshold, and uses All the images obtained above are determined as target images.
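  • A sketch of the N-closest variant (frames modeled as (timestamp, image) pairs; the pair layout and default N are assumptions):

```python
def n_closest_images(frames, first_moment, n=5):
    """Return the n frames whose shooting moments are closest to the
    first moment, mirroring the pre-configured number threshold N."""
    return sorted(frames, key=lambda f: abs(f[0] - first_moment))[:n]
```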
  • If at least one target keyword was obtained from the target voice, step 205 may include: the vehicle obtains the target image from the first image data according to the first moment and the at least one target keyword.
  • If the at least one target keyword includes a keyword describing the shooting object, that keyword can describe the name, type, color, shape, or other attributes of the photographed object; if it includes a keyword describing the shooting direction, the shooting direction of the target image is the direction pointed to by that keyword.
  • As an example, suppose a total of S first camera devices are configured on the vehicle, and the at least one target keyword contains only a keyword describing the photographed object and no keyword describing the shooting direction.
  • For example, if the voice information input by the user is "the sunset is beautiful today", the target keyword "sunset" can be obtained, which is a keyword describing the type of the photographed object.
  • the first image data is video data, that is, the first image data includes S first videos.
  • The vehicle obtains S second videos from the first image data; each of the S second videos starts at a fifth moment and ends at a sixth moment, where the interval between the fifth moment and the first moment equals the target threshold and the interval between the sixth moment and the first moment equals the target threshold.
  • The vehicle can then obtain at least one target video frame from the S second videos and determine each target video frame as a target image; the target video frame contains the object indicated by the target keyword, so the object indicated by the target keyword is present in the target image.
  • If the first image data includes S groups of first images, the vehicle obtains S groups of second images from the first image data, where the shooting moment of the earliest image in each group of second images is the fifth moment and the shooting moment of the latest image is the sixth moment. The vehicle can then acquire at least one target image from the S groups of second images, where the target image contains the object indicated by the target keyword.
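  • A sketch combining the time window with the keyword filter; here `detect` stands for an object detector returning the labels found in an image (a hypothetical callable, not a specific library):

```python
def find_object_frames(frames, first_moment, target_threshold, keyword, detect):
    """Within [first_moment - target_threshold, first_moment + target_threshold],
    keep the frames in which the detector finds the keyword's object.
    frames: iterable of (timestamp, image); detect(image) -> set of labels."""
    window = [f for f in frames
              if abs(f[0] - first_moment) <= target_threshold]
    return [f for f in window if keyword in detect(f[1])]
```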
  • In another case, the at least one target keyword contains only keywords describing the shooting direction and no keyword describing the photographed object; the keywords describing the shooting direction indicate one or more second directions.
  • For example, if the voice information input by the user is "Wow, the front looks so beautiful", the target keyword "front" may be obtained, which is a keyword describing the shooting direction.
  • If the first image data includes S first videos, the S first videos are in one-to-one correspondence with the S first camera devices, and the shooting ranges of different first camera devices among the S first camera devices do not overlap or partially overlap.
  • The vehicle selects, from the S first camera devices, the N first camera devices corresponding to all the second directions, so as to obtain N first videos from the S first videos. For each of the N first videos, the vehicle obtains the video frame whose shooting moment is the first moment and determines it as a target image, thereby obtaining N target images from the N first videos, the shooting moment of each target image being the first moment.
  • If the first image data includes S groups of first images, the vehicle selects the N first camera devices corresponding to all the second directions from the S first camera devices, so as to obtain N groups of first images from the S groups of first images; from the N groups of first images the vehicle acquires one or more images whose shooting moment is the first moment and determines them as target images, the shooting moment of each target image being the first moment.
  • In another case, the at least one target keyword contains both a keyword describing the shooting direction and a keyword describing the photographed object. If the first image data includes S first videos, the vehicle can first obtain N first videos from the S first videos (as described above), then, according to the first moment, obtain N second videos from the N first videos (as described above), and then obtain target images from the N second videos according to the target keyword describing the photographed object (as described above).
  • If the first image data includes S groups of first images, the vehicle can first obtain N groups of first images from the S groups of first images (as described above), then, according to the first moment, obtain N groups of second images from the N groups of first images (as described above), and then obtain target images from the N groups of second images according to the target keyword describing the photographed object (as described above).
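For the direction-keyword cases, the camera selection reduces to checking whether each first camera device's shooting range covers a second direction. The sketch below assumes a four-camera layout with illustrative azimuths and fields of view; the patent only requires that the shooting ranges not overlap or partially overlap, so the concrete angles and the keyword-to-angle table are assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical mounting table: camera id -> (center azimuth in degrees, half field of view).
CAMERA_FOV: Dict[int, Tuple[float, float]] = {
    0: (0.0, 60.0),     # front
    1: (90.0, 60.0),    # right
    2: (180.0, 60.0),   # rear
    3: (270.0, 60.0),   # left
}

# Illustrative mapping from direction keywords to azimuths in the vehicle frame.
DIRECTION_KEYWORDS: Dict[str, float] = {
    "front": 0.0, "front right": 45.0, "right": 90.0,
    "rear": 180.0, "left": 270.0,
}

def cameras_covering(direction_deg: float) -> List[int]:
    """Return the first camera devices whose shooting range covers the direction."""
    covering = []
    for cam, (center, half_fov) in CAMERA_FOV.items():
        # Smallest absolute angular difference, handling wrap-around at 360 degrees.
        diff = abs((direction_deg - center + 180.0) % 360.0 - 180.0)
        if diff <= half_fov:
            covering.append(cam)
    return covering

print(cameras_covering(DIRECTION_KEYWORDS["front right"]))  # -> [0, 1]
```

In the combined case, the frames from these N cameras are then passed through the time-window and object-keyword filter sketched earlier.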
  • FIG. 5 is a schematic flowchart of the image acquisition method provided by the embodiments of the present application. C1: after the vehicle is started, the four first camera devices outside the vehicle, located at the front, left, right, and rear of the vehicle respectively, are triggered to photograph the environment around the vehicle to obtain the first image data, the first image data corresponding to the environment around the vehicle during the first time period.
  • C2: in response to the first operation input by the user, the vehicle enables the "vehicle-assisted photographing" function, starts acquiring in real time the first voice information input by the user (that is, any voice information input by the user), and detects whether the first voice information is the target voice (that is, judges whether the first voice information input by the user is a photographing instruction). If the target voice is detected, the target keywords are obtained from it; for example, if the input target voice is "shoot the black car at the front right", the vehicle obtains the two target keywords "front right" and "black car".
  • C3: in response to the acquired photographing instruction, the vehicle acquires the first moment corresponding to the photographing instruction. C4: the vehicle acquires the target image from the first image data according to the first moment and the target keywords. It should be understood that the example in FIG. 5 is only to facilitate understanding of this solution and is not intended to limit this solution.
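The patent leaves the voice side to an NLP model plus a semantic library. Purely as a stand-in, the toy parser below shows the shape of the output step C2 needs: a photographing intent plus direction and object keywords. The vocabulary lists are invented for illustration and would in practice come from the semantic library and a trained model.

```python
# Toy semantic library: trigger words imply a photographing intent; the two
# slot types are filled by direction words and object words. All vocabulary
# here is illustrative only.
TRIGGER_WORDS = ("shoot", "photograph", "capture", "beautiful", "pretty")
DIRECTION_WORDS = ("front right", "front", "left", "right", "behind")
OBJECT_WORDS = ("sunset", "black car", "building", "flowers")

def parse_photo_instruction(text: str):
    """Return (is_photo_instruction, direction_keywords, object_keywords)."""
    lowered = text.lower()
    is_photo = any(w in lowered for w in TRIGGER_WORDS)
    # A real slot extractor would also de-duplicate overlapping matches.
    directions = [w for w in DIRECTION_WORDS if w in lowered]
    objects_ = [w for w in OBJECT_WORDS if w in lowered]
    return is_photo, directions, objects_

print(parse_photo_instruction("Shoot the black car at the front right"))
# -> (True, ['front right', 'front', 'right'], ['black car'])
```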
  • FIG. 6 is a schematic diagram of the target image and the first moment in the image acquisition method provided by the embodiment of the present application. FIG. 6 takes as an example first image data comprising multiple images, with 20 images in the S groups of first images, arranged in order of shooting time from earliest to latest.
  • In FIG. 6, one rectangle represents one image: D1 is the image captured at the first moment, D2 the image captured at the fifth moment, and D3 the image captured at the sixth moment. D4, D5, and D6 all represent images in which the object indicated by the target keyword is present, that is, they all represent target images, and the time interval between each of these three target images and the first moment is less than the target threshold.
  • When the photographing instruction is input in the form of voice, after the target voice that triggers photographing is obtained, the target keywords are also extracted from it. The target keywords point to the object the user wants to photograph, or to the direction in which the user wants to photograph; the vehicle can thus further understand what kind of image the user wants, which helps improve the accuracy of the output target image, that is, helps output images that meet the user's expectation, and further increases the user stickiness of this solution.
  • If step 204 is performed and no target keyword was obtained from the target voice in step 202, step 205 may include: after the vehicle selects, through step 204, the M target camera devices whose shooting ranges cover the first direction from the S first camera devices, it selects second image data from the first image data, where the second image data is a subset of the first image data collected by the M target camera devices among the S first camera devices, and then obtains the target image from the second image data.
  • In this case the vehicle acquires the target image from the second image data according to the first moment in a manner similar to the case "step 204 is not performed and no target keyword is obtained from the target voice in step 202", with the first image data in that description replaced by the second image data, which is not repeated here.
  • In other words, the first direction is also acquired, the target camera devices whose shooting ranges cover the first direction are selected from the multiple camera devices, the second image data captured by the target camera devices is selected from the first image data, and the target image is then obtained from the second image data. Since the second image data is smaller in volume, this improves the efficiency of the target-image acquisition step compared with retrieving the target image directly from the first image data. In addition, the first direction is determined from any one or more of the following: the user's gaze direction, the user's facial orientation, the user's body orientation, and the user's gesture direction; since users generally face, look at, or point toward the region they are interested in, filtering the captured image data by the first direction helps select the images the user expects, increasing the user stickiness of this solution.
  • If step 204 is performed and at least one target keyword is obtained from the target voice through step 202, then, in the case where the at least one target keyword contains only a keyword indicating the photographed object and none indicating the shooting direction, the vehicle can first select the second image data (collected by the M target camera devices) from the first image data according to the first direction, and then acquire the target image from the second image data according to the object keyword and the first moment.
  • This is done in the manner described above for "step 204 is not performed, at least one target keyword is obtained from the target voice through step 202, the at least one target keyword contains only a keyword describing the photographed object, and none describing the shooting direction", with the first image data in that description replaced by the second image data, which is not repeated here.
  • If the at least one target keyword obtained through step 202 contains a keyword indicating the shooting direction, then, because the reliability of a second direction stated directly by the user through voice is higher than that of the inferred first direction, the vehicle may skip step 204 regardless of whether the at least one target keyword also contains a keyword indicating the photographed object. The vehicle then obtains the target image from the first image data in the manner described above for "step 204 is not performed, at least one target keyword is obtained from the target voice through step 202, and the at least one target keyword contains a keyword indicating the shooting direction", which is not repeated here.
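Read together, these branches choose a camera subset first (voice-stated second directions outrank the gaze-derived first direction) and then apply the keyword and time filters. Below is one compact reading of that dispatch, reusing `Frame`, `select_target_images`, `cameras_covering`, and `DIRECTION_KEYWORDS` from the earlier sketches; the 50 ms tolerance for "shot at the first moment" is an assumption, not a value from the patent.

```python
def acquire_target_images(frames, first_moment, target_threshold,
                          direction_keywords=(), object_keyword=None,
                          first_direction_cams=None, detect_objects=None,
                          at_moment_tol=0.05):
    """Step-205-style dispatch: narrow by camera, then by keyword and time."""
    if direction_keywords:
        # Second directions stated by voice win over the inferred first direction.
        cams = {c for kw in direction_keywords
                for c in cameras_covering(DIRECTION_KEYWORDS[kw])}
        frames = [f for f in frames if f.camera_id in cams]
    elif first_direction_cams is not None:
        # Step 204 ran: keep only the second image data from the M target cameras.
        frames = [f for f in frames if f.camera_id in first_direction_cams]
    if object_keyword is not None:
        return select_target_images(frames, first_moment, target_threshold,
                                    object_keyword, detect_objects)
    # No object keyword: return the frames shot at the first moment.
    return [f for f in frames if abs(f.timestamp - first_moment) <= at_moment_tol]
```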
  • In one implementation, a display screen is configured in the vehicle, and the acquired target images can be displayed directly on it. The display screen may be the central control screen of the vehicle, a touch screen used to receive the first operation, or the like; which display screen to use can be set flexibly in light of the actual product form and is not limited here.
  • In another implementation, a wireless communication connection is pre-established between the vehicle and a terminal device carried by the user, and the vehicle can send the acquired target images directly to that device so that they are displayed through the device carried by the user. The vehicle can also output the target images to the user in other ways, which are not limited here. In addition, each target image may carry its shooting moment.
  • FIG. 7 is a schematic diagram of an interface for outputting the target images in the image acquisition method provided by the embodiment of the present application. FIG. 7 includes two sub-diagrams, (a) and (b), both of which take the output of six target images as an example. Sub-diagram (a) of FIG. 7 takes as an example outputting the target images through the display screen of the vehicle, and sub-diagram (b) of FIG. 7 takes as an example outputting the target images through the terminal device carried by the user. It should be understood that FIG. 7 is only for the convenience of understanding this solution and is not intended to limit this solution.
  • 2. The photographing instruction is a gesture instruction (the preceding embodiments covered the case where the photographing instruction is a voice instruction).
  • FIG. 8 is a schematic flowchart of the image acquisition method provided by an embodiment of the present application; the method may include the following steps.
  • 801. The vehicle photographs the environment around the vehicle through the first camera devices to acquire first image data. The specific implementation of step 801 is similar to that of step 201 in the embodiment corresponding to FIG. 2, which can be referred to directly and is not repeated here.
  • 802. The vehicle acquires a preset gesture and, in response to the acquired preset gesture, generates a photographing instruction.
  • The photographing instruction preset in the vehicle may be a gesture instruction, that is, the user may input the photographing instruction by making a preset gesture. Then, similarly to step 202 in the embodiment corresponding to FIG. 2, in one case the vehicle may start collecting the gesture information of the second user in real time after the vehicle is started, and it determines that the user has input a photographing instruction when the preset gesture is detected. In another case, if the vehicle enables the "vehicle-assisted photographing" function based on the first operation input by the user, the vehicle may start collecting the gesture information of the second user in real time, so as to acquire the preset gesture input by the user, once the user has actively enabled that function.
  • For the specific implementation of the first operation input by the user, refer to the description in the embodiment corresponding to FIG. 2, which is not repeated here. The preset gesture may be a static gesture or a dynamic gesture.
  • The second user may be any user inside the vehicle, or may be limited to a passenger at a fixed position; as an example, the preset gesture may be accepted only from the passenger in the front passenger seat. It should be noted that which users in the vehicle count as the second user can be determined based on the actual product form, which is not limited here.
  • The vehicle collects the user's gesture information regardless of whether the preset gesture is a static gesture (for example, a clenched-fist gesture, a five-fingers-open gesture, or another static gesture) or a dynamic gesture (for example, making a fist and then opening the five fingers).
  • In one implementation, a second camera device may be configured inside the vehicle. The vehicle collects images of the second user's gesture in real time through the second camera device, analyses them with a computer vision algorithm, and compares the second user's gesture image with the image corresponding to the preset gesture. If the two point to the same type of gesture, it is determined that the vehicle has acquired the preset gesture input by the second user; if they point to different types of gesture, it is determined that the vehicle has not.
  • In another implementation, a sensor for collecting the user's gesture information, such as a laser, a radar, or another type of sensor, may be configured inside the vehicle. The vehicle can collect point cloud data corresponding to the second user's gesture through this sensor and then judge from the point cloud data whether the gesture input by the user is the preset gesture. It should be understood that the vehicle can also determine whether the gesture input by the second user is the preset gesture based on other principles, which are not exhaustively listed here.
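Whatever front end produces them (the second camera device plus a vision model, or the point-cloud sensor), the per-frame gesture labels end up being compared against the preset gesture. A minimal comparison sketch follows, with the classifier itself deliberately out of scope and the label names invented:

```python
from typing import Iterable, Optional, Sequence

def is_preset_gesture(labels: Iterable[str],
                      preset_static: Optional[str] = None,
                      preset_sequence: Optional[Sequence[str]] = None) -> bool:
    """`labels` is the stream of per-frame gesture classes for the second user.
    Static preset gesture: one matching frame suffices. Dynamic preset gesture
    (e.g. fist, then five fingers open): the steps must appear in order."""
    labels = list(labels)
    if preset_static is not None:
        return preset_static in labels
    if preset_sequence is not None:
        it = iter(labels)  # subsequence check: consume labels left to right
        return all(any(lab == step for lab in it) for step in preset_sequence)
    return False

print(is_preset_gesture(["none", "fist", "fist", "open_palm"],
                        preset_sequence=["fist", "open_palm"]))  # -> True
```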
  • 803. In response to the acquired photographing instruction, the vehicle acquires the first moment corresponding to the photographing instruction.
  • After acquiring the photographing instruction, the vehicle needs to acquire, in response to it, the corresponding first moment. In one implementation, the vehicle determines a third moment corresponding to the photographing instruction, namely the moment at which the instruction was generated, and acquires the first moment corresponding to the photographing instruction according to the third moment.
  • In another implementation, in response to the acquired photographing instruction, the vehicle acquires the second moment corresponding to the gesture instruction, namely the shooting moment of the gesture image corresponding to the gesture instruction. Further, if the preset gesture is a static gesture, the second moment is a single determined moment; if the preset gesture is a dynamic gesture, the second moment may be any of the following: the moment at which acquisition of the preset gesture started, the moment at which it ended, or any moment during its acquisition, which is not limited here.
  • The vehicle then determines the first moment according to the second moment. If the preset gesture is a static gesture, the vehicle may directly determine the second moment as the first moment. If the preset gesture is a dynamic gesture, a time point lying before the second moment and separated from it by a third duration may also be determined as the first moment; the third duration may, for example, be 1 second, 2 seconds, or another value, and its specific value can be set in light of factors such as the type of the dynamic gesture.
  • In this way, for a preset static gesture the shooting moment of the gesture can be obtained directly and used as the first moment, which provides another way of obtaining the first moment and increases the implementation flexibility of this solution. Moreover, since the interval between the user seeing the object to be photographed and making the preset static gesture is generally short, directly determining the shooting moment of the preset static gesture as the first moment also matches well the shooting moment the user actually wants.
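The second-moment-to-first-moment rule in the last two bullets is small enough to state directly; the 1-second default for the third duration below is just one of the values the text allows:

```python
def first_moment_from_gesture(second_moment: float,
                              is_static: bool,
                              third_duration: float = 1.0) -> float:
    """Static preset gesture: use its shooting moment directly as the first
    moment. Dynamic preset gesture: step back by the third duration."""
    return second_moment if is_static else second_moment - third_duration

assert first_moment_from_gesture(100.0, is_static=True) == 100.0
assert first_moment_from_gesture(100.0, is_static=False, third_duration=2.0) == 98.0
```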
  • 804. The vehicle acquires a first direction and selects, from the at least two first camera devices, a target camera device whose shooting range covers the first direction. The specific implementation of step 804 is similar to that of step 204 in the embodiment corresponding to FIG. 2, which can be referred to directly and is not repeated here.
  • 805. According to the first moment, the vehicle acquires the target image from the first image data and outputs the target image. For the specific implementation of step 805, refer to the description of step 205 in the embodiment corresponding to FIG. 2. It should be noted that, because in this embodiment the vehicle is triggered to generate the photographing instruction by the input of a preset gesture, the specific implementation of step 805 covers only those situations of step 205 in which no target keyword is obtained from a target voice through step 202.
  • In this embodiment, the user can trigger the vehicle to generate the photographing instruction by inputting a voice or a gesture, which is simple to operate and easy to implement. The environment around the vehicle is photographed by the camera devices configured on the vehicle to obtain the first image data; when the photographing instruction input by the user is received, the first moment corresponding to it can be determined, and the target image at the first moment can be selected from the first image data. That is, the user is no longer required to shoot images, which solves the problem that the driver cannot capture images and removes the safety hazard of shooting by the user.
  • Moreover, since the first image data corresponds to the environment around the vehicle during the first time period, that is, the camera devices configured on the vehicle continuously photograph the environment around the vehicle, the image collected at the first moment can be selected afterwards, so the desired scenery is not missed.
  • FIG. 9 is a schematic structural diagram of an image acquisition device provided by an embodiment of the present application.
  • the image acquisition device 900 is applied to a vehicle, and the vehicle is equipped with a camera device.
  • The image acquisition apparatus 900 includes: a shooting module 901 configured to control the camera device to photograph the environment around the vehicle to obtain first image data, the first image data corresponding to the environment around the vehicle during a first time period; and an acquisition module 902 configured to acquire, in response to an acquired photographing instruction, the first moment corresponding to the photographing instruction. The acquisition module 902 is further configured to acquire the target image from the first image data according to the first moment and output the target image, where the first moment falls within the first time period and the interval between the shooting moment of the target image and the first moment is less than or equal to the target threshold.
  • FIG. 10 is a schematic structural diagram of an image acquiring apparatus provided by an embodiment of the present application.
  • The image acquisition apparatus 900 further includes a generating module 903 configured to generate a photographing instruction in response to a received target voice whose corresponding intention is to take a photograph; or the generating module 903 is configured to generate a photographing instruction in response to an acquired preset gesture.
  • In one implementation, the acquisition module 902 is further configured to acquire target keywords from the target voice, the target keywords being descriptive information of the photographed object and/or the shooting direction; the acquisition module 902 is specifically configured to acquire the target image from the first image data according to the target keywords, where the target image contains the object indicated by the target keyword and/or the shooting direction of the target image is the direction indicated by the target keyword.
  • In one implementation, the preset gesture is a preset static gesture, and the acquisition module 902 is specifically configured to, in response to the acquired photographing instruction, acquire the second moment corresponding to the preset gesture (the shooting moment of the preset static gesture) and determine the second moment as the first moment.
  • In one implementation, the acquisition module 902 is specifically configured to: in response to the acquired photographing instruction, determine the third moment corresponding to the photographing instruction, namely the generation moment of the photographing instruction; and determine the first moment according to the third moment, the first moment lying before the third moment.
  • In one implementation, the vehicle is configured with at least two camera devices whose shooting ranges do not overlap or only partially overlap, and the acquisition module 902 is further configured to acquire the first direction, determined from any one or more of the following: gaze direction, facial orientation, and body orientation.
  • In this implementation, the image acquisition apparatus 900 further includes a selection module 904 configured to select, from the at least two camera devices, a target camera device whose shooting range covers the first direction; the acquisition module 902 is configured to select from the first image data second image data collected by the target camera device, and to obtain the target image from the second image data.
  • FIG. 11 is a schematic structural diagram of a vehicle provided by an embodiment of the present application.
  • The image acquisition apparatus 900 described in the embodiment corresponding to FIG. 9 may be deployed on the vehicle 1100 to implement the functions of the vehicle in the embodiments corresponding to FIG. 1b to FIG. 8.
  • The vehicle 1100 includes: a receiver 1101, a transmitter 1102, a processor 1103, and a memory 1104 (the number of processors 1103 in the vehicle 1100 may be one or more; one processor is taken as an example in FIG. 11), where the processor 1103 may include an application processor 11031 and a communication processor 11032.
  • the receiver 1101, the transmitter 1102, the processor 1103, and the memory 1104 may be connected by a bus or otherwise.
  • Memory 1104 may include read-only memory and random access memory, and provides instructions and data to processor 1103 . A portion of memory 1104 may also include non-volatile random access memory (NVRAM).
  • The memory 1104 stores operating instructions executable by the processor, executable modules, or data structures, or a subset or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.
  • the processor 1103 controls the operation of the vehicle.
  • The various components of the vehicle are coupled together through a bus system, which may include, in addition to a data bus, a power bus, a control bus, a status signal bus, and the like; for clarity, the various buses are all referred to as the bus system in the figure.
  • the methods disclosed in the above embodiments of the present application may be applied to the processor 1103 or implemented by the processor 1103 .
  • the processor 1103 may be an integrated circuit chip, which has signal processing capability. In the implementation process, each step of the above-mentioned method can be completed by an integrated logic circuit of hardware in the processor 1103 or an instruction in the form of software.
  • The above-mentioned processor 1103 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the processor 1103 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the methods disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • The software module may be located in a random access memory, a flash memory, a read-only memory, a programmable read-only memory or an electrically erasable programmable memory, a register, or another storage medium mature in the field.
  • the storage medium is located in the memory 1104, and the processor 1103 reads the information in the memory 1104, and completes the steps of the above method in combination with its hardware.
  • the receiver 1101 can be used to receive input numerical or character information, and generate signal input related to the relevant settings and function control of the vehicle.
  • the transmitter 1102 can be used to output digital or character information through the first interface; the transmitter 1102 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; the transmitter 1102 can also include a display device such as a display screen .
  • The processor 1103 is configured to execute the image acquisition method performed by the vehicle in the embodiments corresponding to FIG. 1b to FIG. 8. It should be noted that the specific manner in which the application processor 11031 in the processor 1103 executes the above steps is based on the same concept as the method embodiments corresponding to FIG. 1b to FIG. 8, and the technical effects it brings are the same as those of the method embodiments corresponding to FIG. 1b to FIG. 8; for the specific content, reference may be made to the descriptions in the method embodiments shown above in this application, which are not repeated here.
  • FIG. 12 is a schematic structural diagram of a vehicle according to an embodiment of the application.
  • The vehicle 1100 may be configured in a fully or partially autonomous driving mode.
  • While in the autonomous driving mode, the vehicle 1100 can control itself and can, through human operation, determine the current state of the vehicle and its surrounding environment, determine the probable behavior of at least one other vehicle in the surrounding environment, determine a confidence level corresponding to the possibility that the other vehicle will perform that behavior, and control the vehicle 1100 based on the determined information.
  • The vehicle 1100 may also be set to operate without human interaction while in the autonomous driving mode.
  • Vehicle 1100 may include various subsystems, such as travel system 102 , sensor system 104 , control system 106 , one or more peripherals 108 and power supply 110 , computer system 112 , and user interface 116 .
  • vehicle 1100 may include more or fewer subsystems, and each subsystem may include multiple components. Additionally, each of the subsystems and components of the vehicle 1100 may be wired or wirelessly interconnected.
  • the travel system 102 may include components that provide powered motion for the vehicle 1100 .
  • travel system 102 may include engine 118 , energy source 119 , transmission 120 , and wheels/tires 121 .
  • the engine 118 may be an internal combustion engine, an electric motor, an air compression engine, or other types of engine combinations, such as a hybrid engine composed of a gasoline engine and an electric motor, and a hybrid engine composed of an internal combustion engine and an air compression engine.
  • Engine 118 converts energy source 119 into mechanical energy. Examples of energy sources 119 include gasoline, diesel, other petroleum-based fuels, propane, other compressed gas-based fuels, ethanol, solar panels, batteries, and other sources of electricity.
  • the energy source 119 may also provide energy to other systems of the vehicle 1100 .
  • Transmission 120 may transmit mechanical power from engine 118 to wheels 121 .
  • Transmission 120 may include a gearbox, a differential, and a driveshaft. In one embodiment, transmission 120 may also include other devices, such as clutches.
  • the drive shaft may include one or more axles that may be coupled to one or more wheels 121 .
  • The sensor system 104 may include several sensors that sense information about the environment around the vehicle 1100 and are used to acquire point cloud data corresponding to the environment at each moment and images corresponding to the environment at each moment.
  • The sensor system 104 may include a positioning system 122 (which may be a global positioning system (GPS), a BeiDou system, or another positioning system), an inertial measurement unit (IMU) 124, a radar 126, a laser rangefinder 128, and a camera 130.
  • the sensor system 104 may also include sensors that monitor the internal systems of the vehicle 1100 (eg, an in-vehicle air quality monitor, a fuel gauge, an oil temperature gauge, etc.). Sensing data from one or more of these sensors can be used to detect objects and their corresponding properties (position, shape, orientation, velocity, etc.). This detection and identification is a critical function for the safe operation of autonomous vehicle 1100 .
  • the positioning system 122 may be used to estimate the geographic location of the vehicle 1100 .
  • the IMU 124 is used to sense position and orientation changes of the vehicle 1100 based on inertial acceleration.
  • IMU 124 may be a combination of an accelerometer and a gyroscope.
  • the radar 126 may use radio signals to perceive objects in the surrounding environment of the vehicle 1100, and may specifically be represented by a millimeter-wave radar or a lidar. In some embodiments, in addition to sensing objects, radar 126 may be used to sense the speed and/or heading of objects.
  • the laser rangefinder 128 may utilize the laser light to sense objects in the environment in which the vehicle 1100 is located.
  • the laser rangefinder 128 may include one or more laser sources, laser scanners, and one or more detectors, among other system components.
  • Camera 130 may be used to capture multiple images of the surrounding environment of vehicle 1100 .
  • Camera 130 may be a still camera or a video camera.
  • Control system 106 controls the operation of the vehicle 1100 and its components.
  • Control system 106 may include various components, including a steering system 132, a throttle 134, a braking unit 136, a computer vision system 140, a route control system 142, and an obstacle avoidance system 144.
  • the steering system 132 is operable to adjust the heading of the vehicle 1100 .
  • it may be a steering wheel system.
  • the throttle 134 is used to control the operating speed of the engine 118 and thus the speed of the vehicle 1100 .
  • the braking unit 136 is used to control the deceleration of the vehicle 1100 .
  • the braking unit 136 may use friction to slow the wheels 121 .
  • the braking unit 136 may convert the kinetic energy of the wheels 121 into electrical current.
  • the braking unit 136 may also take other forms to slow the wheels 121 to control the speed of the vehicle 1100 .
  • Computer vision system 140 is operable to process and analyze images captured by camera 130 in order to identify objects and/or features in the environment surrounding vehicle 1100 .
  • the objects and/or features may include traffic signals, road boundaries and obstacles.
  • Computer vision system 140 may use object recognition algorithms, Structure from Motion (SFM) algorithms, video tracking, and other computer vision techniques.
  • the computer vision system 140 may be used to map the environment, track objects, estimate the speed of objects, and the like.
  • the route control system 142 is used to determine the travel route and travel speed of the vehicle 1100 .
  • The route control system 142 may include a lateral planning module 1421 and a longitudinal planning module 1422, which are used, respectively, to determine the travel route and travel speed of the vehicle 1100 in combination with data from the obstacle avoidance system 144, the GPS 122, and one or more predetermined maps.
  • The obstacle avoidance system 144 is used to identify, evaluate, and avoid or otherwise negotiate obstacles in the environment of the vehicle 1100; such obstacles may be embodied as actual obstacles or as virtual moving bodies that may collide with the vehicle 1100.
  • The control system 106 may additionally or alternatively include components other than those shown and described, or some of the components shown above may be omitted.
  • Vehicle 1100 interacts with external sensors, other vehicles, other computer systems, or users through peripherals 108 .
  • Peripherals 108 may include a wireless communication system 146 , an onboard computer 148 , a microphone 150 and/or a speaker 152 .
  • the peripheral device 108 provides a means for the user of the vehicle 1100 to interact with the user interface 116 .
  • the onboard computer 148 may provide information to the user of the vehicle 1100 .
  • User interface 116 may also operate on-board computer 148 to receive user input.
  • the onboard computer 148 can be operated via a touch screen.
  • peripherals 108 may provide a means for vehicle 1100 to communicate with other devices located within the vehicle.
  • microphone 150 may receive audio (eg, voice commands or other audio input) from a user of vehicle 1100 .
  • speakers 152 may output audio to a user of vehicle 1100 .
  • Wireless communication system 146 may include receiver 1201 and transmitter 1202 shown in FIG. 12 .
  • the power supply 110 may provide power to various components of the vehicle 1100 .
  • the power source 110 may be a rechargeable lithium-ion or lead-acid battery.
  • One or more battery packs of such batteries may be configured as a power source to provide power to various components of the vehicle 1100 .
  • power source 110 and energy source 119 may be implemented together, such as in some all-electric vehicles.
  • The computer system 112 may include at least one processor 1103 and a memory 1104; for the functions of the processor 1103 and the memory 1104, reference may be made to the description of FIG. 11, which is not repeated here.
  • Computer system 112 may control functions of vehicle 1100 based on input received from various subsystems (eg, travel system 102 , sensor system 104 , and control system 106 ) and from user interface 116 .
  • computer system 112 may utilize input from control system 106 to control steering system 132 to avoid obstacles detected by sensor system 104 and obstacle avoidance system 144 .
  • computer system 112 is operable to provide control of various aspects of vehicle 1100 and its subsystems.
  • One or more of these components described above may be installed separately from, or associated with, the vehicle 1100.
  • memory 1104 may exist partially or completely separate from vehicle 1100 .
  • the above-described components may be communicatively coupled together in a wired and/or wireless manner.
  • FIG. 12 should not be construed as a limitation on the embodiments of the present application.
  • An autonomous vehicle traveling on a road such as vehicle 1100 above, can recognize objects within its surroundings to determine adjustments to current speed.
  • the objects may be other vehicles, traffic control equipment, or other types of objects.
  • Each identified object may be considered independently, and the object's respective characteristics, such as its current speed, acceleration, and distance from the vehicle, may be used to determine the speed to which the autonomous vehicle is to adjust.
  • The vehicle 1100, or a computing device associated with it (such as the computer system 112, the computer vision system 140, or the memory 1104 of FIG. 12), may predict the behavior of the identified objects based on the characteristics of the identified objects and the state of the surrounding environment (for example, traffic, rain, or ice on the road).
  • The identified objects depend on each other's behavior, so it is also possible to predict the behavior of a single identified object by considering all the identified objects together.
  • The vehicle 1100 can adjust its speed based on the predicted behavior of the identified objects; in other words, the vehicle 1100 can determine what steady state it needs to adjust to (for example, accelerate, decelerate, or stop) based on the predicted behavior of the objects.
  • The computing device may also provide instructions to modify the steering angle of the vehicle 1100 so that the vehicle follows a given trajectory and/or maintains safe lateral and longitudinal distances from objects in its vicinity (for example, cars in the adjacent lanes of the road).
  • The above-mentioned vehicle 1100 may be a car, a truck, a motorcycle, a bus, a boat, an airplane, a helicopter, a lawn mower, a recreational vehicle, an amusement park vehicle, construction equipment, a tram, a golf cart, a train, a cart, or the like, which is not particularly limited in the embodiments of the present application.
  • Embodiments of the present application also provide a computer program product that, when running on a computer, causes the computer to execute the steps performed by the vehicle in the methods described in the embodiments shown in the foregoing FIGS. 1 b to 8 .
  • Embodiments of the present application also provide a computer-readable storage medium in which a program for performing signal processing is stored; when the program runs on a computer, it causes the computer to execute the steps performed by the vehicle in the methods described in the embodiments shown in FIG. 1b to FIG. 8 above.
  • The image acquisition apparatus provided by the embodiments of the present application may specifically be a chip, and the chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, a circuit, or the like.
  • The processing unit can execute the computer-executable instructions stored in the storage unit, so that the chip executes the image acquisition method described in the embodiments shown in FIG. 1b to FIG. 8 above.
  • Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
  • the processor mentioned in any one of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the program of the method in the first aspect.
  • The apparatus embodiments described above are only schematic; the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center by wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) means.
  • The computer-readable storage medium may be any available medium that a computer can store data on, or a data storage device, such as a training device or a data center, integrating one or more available media.
  • the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVD), or semiconductor media (eg, Solid State Disk (SSD)), and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Traffic Control Systems (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Embodiments of the present application relate to the field of vehicles within the field of artificial intelligence, and disclose an image acquisition method and related devices. The method is applied to a vehicle provided with a camera device and includes: controlling the camera device to photograph the environment around the vehicle to obtain first image data, the first image data corresponding to the environment around the vehicle during a first time period; in response to an acquired photographing instruction, acquiring a first moment corresponding to the photographing instruction; and acquiring a target image from the first image data according to the first moment and outputting the target image. The user is no longer required to shoot images, which solves the problem that the driver cannot shoot images and also removes the safety hazard of shooting by the user; since the first moment falls within the first time period and the interval between the shooting moment of the target image and the first moment is less than or equal to a target threshold, the scenery the user wants to photograph is not missed.

Description

An image acquisition method and related devices

Technical field

The present application relates to the field of artificial intelligence, and in particular to an image acquisition method and related devices.

Background

While a vehicle is being driven, the driver or a passenger sometimes finds the surrounding scenery beautiful and wants to photograph it as a record. The following problems often arise: the driver cannot take pictures because he or she has to drive the vehicle; by the time a passenger has taken out a mobile phone, the scenery to be photographed may already have been missed; and photographing the scenery outside the window may require rolling down the window and reaching out of it, which is cumbersome and poses a safety hazard.

For all of these reasons, it is difficult for the driver or passengers to capture the scenery they expect to photograph while the vehicle is traveling.
Summary

Embodiments of the present application provide an image acquisition method and related devices. The user is no longer required to shoot images, which solves the problem that the driver cannot shoot images and also removes the safety hazard of shooting by the user; since the first moment falls within the first time period and the interval between the shooting moment of the target image and the first moment is less than or equal to the target threshold, the scenery the user wants to photograph is not missed.

To solve the above technical problem, embodiments of the present application provide the following technical solutions:
In a first aspect, an embodiment of the present application provides an image acquisition method that can be used in the vehicle field within the field of artificial intelligence. The method is applied to a vehicle whose exterior is provided with one or more first camera devices, and includes: the vehicle photographs the environment around the vehicle through the first camera devices to obtain first image data, the first image data corresponding to the environment around the vehicle during a first time period. The first image data may take the form of a video recorded of the environment around the vehicle during the first time period; or the first image data may include a plurality of first video frames (that is, images) extracted from such a video, the plurality of first video frames corresponding to the respective moments within the first time period; or the first image data may include a plurality of first images obtained by photographing the environment around the vehicle during the first time period. In response to an acquired photographing instruction, the vehicle acquires a first moment corresponding to the photographing instruction; according to the first moment, the vehicle acquires a target image from the first image data and outputs the target image, where the first moment falls within the first time period and the interval between the shooting moment of the target image and the first moment is less than or equal to a target threshold, which may take a value of 5 seconds, 8 seconds, 10 seconds, 15 seconds, or another value.

In this implementation, the environment around the vehicle is photographed by the first camera devices provided on the vehicle to obtain the first image data; when a photographing instruction input by the user is received, the first moment corresponding to the photographing instruction can be acquired, and a target image whose shooting moment matches the first moment can be selected from the first image data. The user is therefore no longer required to shoot images, which solves the problem that the driver cannot shoot images and also removes the safety hazard of shooting by the user. In addition, since the first image data corresponds to the environment around the vehicle during the first time period, that is, the first camera devices continuously photograph the environment around the vehicle, the image collected at the first moment can be selected afterwards, so the scenery the user wants to photograph is not missed.
In a possible implementation of the first aspect, the method further includes: the vehicle generates the photographing instruction in response to a received target voice whose corresponding intention is to take a photograph. Specifically, the vehicle may be preconfigured with a speech recognition model; after acquiring any piece of voice information input by the user (hereinafter referred to as the "first voice information" for ease of description), the vehicle may convert the first voice information into text through the speech recognition model and then judge from the text whether the user intends to take a photograph, that is, whether the first voice information is the target voice; if so, the vehicle determines that a photographing instruction input by the user has been acquired. Alternatively, the vehicle collects gesture information input by a second user and, upon determining from the gesture information that the second user has input a preset gesture, generates the photographing instruction in response to the acquired preset gesture; the preset gesture may be a static gesture or a dynamic gesture, and the second user may be any user inside the vehicle or a passenger restricted to a fixed position.

In this implementation, the user can trigger the vehicle to generate the photographing instruction by inputting a voice or a gesture, which is simple to operate and easy to implement.

In a possible implementation of the first aspect, regarding how the vehicle acquires the first moment corresponding to the photographing instruction: in one implementation, in response to the acquired photographing instruction, the vehicle determines a third moment corresponding to the photographing instruction, namely the generation moment of the photographing instruction, and acquires the first moment according to the third moment. In another implementation, if the vehicle generated the photographing instruction in response to a received target voice, the vehicle acquires, in response to the photographing instruction, a fourth moment corresponding to the target voice and acquires the first moment according to it; the fourth moment is the acquisition moment of the target voice and, since the target voice is received over a period of time, may be any of the following: the moment at which acquisition of the target voice started, the moment at which it ended, or an intermediate moment of the acquisition. In yet another implementation, if the vehicle generated the photographing instruction in response to an acquired preset gesture, the vehicle acquires, in response to the photographing instruction, a second moment corresponding to the gesture instruction, namely the shooting moment of the gesture image corresponding to the gesture instruction; the preset gesture may be dynamic or static.

In a possible implementation of the first aspect, the vehicle may further be configured with a semantic library and a model for performing natural language processing (NLP) tasks; the vehicle feeds the text corresponding to the first voice information, together with the semantic library, into the model for performing NLP tasks to determine whether the intention corresponding to the first voice information is to take a photograph. An intent denotes the user's purpose and indicates the user's need; the vehicle can recognize the user's intent from the voice information input through the user device. The model for performing NLP tasks may be implemented by a neural network or by a non-neural-network model; as examples, it may be a model for machine reading comprehension (MRC) such as bidirectional encoder representations from transformers (BERT), a recurrent neural network (RNN), or a question answering network (QANet), or any other model capable of semantic understanding.

In a possible implementation of the first aspect, the method further includes: the vehicle acquires at least one target keyword from the target voice, the target keyword being descriptive information of the photographed object and/or of the shooting direction. A keyword is the specific information of the intent content and the key information for triggering a particular service; for example, in the user input "photograph the black car on the right", "right" and "black car" are the keywords of the input information. Specifically, the semantic library may also include slot information, which is descriptive information of keywords; for any piece of voice information input by the user (the "first voice information"), the vehicle may feed the corresponding text and the semantic library into the model for performing NLP tasks, which judges the intention corresponding to the first voice information, extracts the keywords from the first voice information, and outputs both the intention and the extracted keywords. The vehicle acquiring the target image from the first image data then includes: acquiring the target image from the first image data according to the target keyword. If the at least one target keyword contains a keyword describing the photographed object (which may describe the object's name, type, color, shape, or other descriptive information), the object indicated by the target keyword is present in the target image; and/or, if the at least one target keyword contains a keyword describing the shooting direction, the shooting direction of the target image is the direction indicated by the target keyword.

In this implementation, the photographing instruction is input in the form of voice; after the target voice that triggers photographing is obtained, target keywords are also extracted from it. The target keywords point to the object the user wants to photograph, or to the direction in which the user wants to photograph, so the vehicle can further understand what kind of image the user wants, which helps improve the accuracy of the output target image, that is, helps output images that meet the user's expectation, further increasing the user stickiness of this solution.

In a possible implementation of the first aspect, the preset gesture is a preset static gesture, and the vehicle acquiring the first moment in response to the acquired photographing instruction includes: acquiring the second moment corresponding to the preset gesture (the shooting moment of the preset static gesture) and determining the second moment as the first moment. In this implementation, when the preset gesture is a static gesture, the shooting moment of the static gesture can be obtained directly and used as the first moment, which provides another way of obtaining the first moment and increases the implementation flexibility of this solution; moreover, since the interval between the user seeing the object to be photographed and making the preset static gesture is generally short, directly determining the shooting moment of the preset static gesture as the first moment also matches well the shooting moment the user actually wants.

In a possible implementation of the first aspect, the vehicle acquiring the first moment in response to the acquired photographing instruction includes: determining a third moment corresponding to the photographing instruction, namely the generation moment of the photographing instruction, and determining the first moment according to the third moment, the first moment lying before the third moment; the first duration may take a value of 0.5 seconds, 1 second, 2 seconds, 3 seconds, 5 seconds, or another value, and may be determined in light of factors such as the speed at which the vehicle processes voice information.

In this implementation, because the vehicle keeps moving while it receives the user's voice information and processes it to determine whether it is the target voice, the moment at which the photographing instruction is generated is already slightly later than the moment at which the user wanted to take the photograph. Setting the first moment before the third moment brings the first moment closer to the moment the user actually wanted to take the photograph, so using the first moment as the reference point for retrieving images yields images that better match what the user actually wanted.

In a possible implementation of the first aspect, the vehicle is provided with at least two first camera devices whose shooting ranges do not overlap or only partially overlap, and the method further includes: the vehicle acquires a first direction, determined from any one or more of the following: the gaze direction of a first user, the facial orientation of the first user, the body orientation of the first user, or other directions, which is not limited here. Further, the first user may be any one or a combination of the following: the driver, a user at a preset position in the vehicle, the user who issued the photographing instruction, or another type of user; the specific choice can be determined in light of the actual situation. The vehicle selects, from the at least two (hereinafter S) first camera devices, at least one (hereinafter M) target first camera device whose shooting range covers the first direction. The vehicle acquiring the target image from the first image data then includes: the vehicle selects second image data from the first image data, the second image data being a subset of the first image data collected by the M target camera devices among the S first camera devices, and acquires the target image from the second image data.

In this implementation, after the first image data corresponding to the multiple camera devices is obtained, and given that the shooting ranges of the multiple camera devices do not overlap or only partially overlap, the first direction is also acquired; the target camera devices whose shooting ranges cover the first direction are selected from the multiple camera devices, the second image data captured by the target camera devices is selected from the first image data, and the target image is then acquired from the second image data. Since the second image data is smaller in volume than the first image data, this improves the efficiency of the target-image acquisition step compared with retrieving the target image directly from the first image data. Moreover, the first direction is determined from any one or more of the user's gaze direction, facial orientation, body orientation, and gesture direction, and users generally face, look at, or point toward the region they are interested in, so filtering the captured image data by the first direction helps select the images the user expects, increasing the user stickiness of this solution.

In a possible implementation of the first aspect, the vehicle is provided with at least two camera devices whose shooting ranges do not overlap or only partially overlap, and the method further includes: acquiring a target keyword from the target voice, the target keyword being descriptive information of the photographed object and/or the shooting direction; acquiring a first direction, determined from any one or more of the following: gaze direction, facial orientation, and body orientation; and selecting, from the at least two camera devices, a target camera device whose shooting range covers the first direction. The vehicle acquiring the target image from the first image data then includes: selecting from the first image data second image data collected by the target camera device, and acquiring the target image from the second image data according to the target keyword, where the object indicated by the target keyword is present in the target image and/or the shooting direction of the target image is the direction indicated by the target keyword.
In a second aspect, an embodiment of the present application provides an image acquisition apparatus that can be used in the vehicle field within the field of artificial intelligence. The apparatus is applied to a vehicle provided with a camera device, and includes: a shooting module configured to control the camera device to photograph the environment around the vehicle to obtain first image data, the first image data corresponding to the environment around the vehicle during a first time period; and an acquisition module configured to acquire, in response to an acquired photographing instruction, a first moment corresponding to the photographing instruction; the acquisition module is further configured to acquire a target image from the first image data according to the first moment and output the target image, where the first moment falls within the first time period and the interval between the shooting moment of the target image and the first moment is less than or equal to a target threshold.

The image acquisition apparatus provided in the second aspect may also perform the steps performed by the vehicle in the possible implementations of the first aspect; for the specific implementation steps of the second aspect and its possible implementations, and the benefits each brings, reference may be made to the descriptions of the corresponding implementations of the first aspect, which are not repeated here one by one.

In a third aspect, an embodiment of the present application provides a vehicle, which may include a processor coupled to a memory storing program instructions; when the program instructions stored in the memory are executed by the processor, the steps performed by the vehicle in the image acquisition method of the first aspect are implemented.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to perform the steps performed by the vehicle in the image acquisition method of the first aspect.

In a fifth aspect, an embodiment of the present application provides a circuit system including a processing circuit configured to perform the steps performed by the vehicle in the image acquisition method described above.

In a sixth aspect, an embodiment of the present application provides a computer program which, when run on a computer, causes the computer to perform the steps performed by the vehicle in the image acquisition method of the first aspect.

In a seventh aspect, an embodiment of the present application provides a chip system including a processor configured to implement the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods. In a possible design, the chip system further includes a memory for storing the program instructions and data necessary for a server or a communication device. The chip system may consist of chips, or may include chips and other discrete devices.
Brief description of the drawings

FIG. 1a is a schematic structural diagram of a vehicle in the image acquisition method provided by an embodiment of the present application;

FIG. 1b is a schematic flowchart of the image acquisition method provided by an embodiment of the present application;

FIG. 2 is another schematic flowchart of the image acquisition method provided by an embodiment of the present application;

FIG. 3 is a schematic interface diagram of triggering the function of photographing the surrounding environment in the image acquisition method provided by an embodiment of the present application;

FIG. 4 is a schematic interface diagram of acquiring keywords in the image acquisition method provided by an embodiment of the present application;

FIG. 5 is another schematic flowchart of the image acquisition method provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of the target image and the first moment in the image acquisition method provided by an embodiment of the present application;

FIG. 7 is a schematic interface diagram of outputting the target image in the image acquisition method provided by an embodiment of the present application;

FIG. 8 is another schematic flowchart of the image acquisition method provided by an embodiment of the present application;

FIG. 9 is a schematic structural diagram of an image acquisition apparatus provided by an embodiment of the present application;

FIG. 10 is another schematic structural diagram of the image acquisition apparatus provided by an embodiment of the present application;

FIG. 11 is a schematic structural diagram of a vehicle provided by an embodiment of the present application;

FIG. 12 is another schematic structural diagram of the vehicle provided by an embodiment of the present application.
Detailed description of the embodiments

The terms "first", "second", and the like in the specification, claims, and drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; this is merely the manner adopted in the embodiments of the present application to distinguish objects of the same attribute when describing them. Furthermore, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device comprising a series of units is not necessarily limited to those units, but may include other units not clearly listed or inherent to the process, method, product, or device.

The embodiments of the present application are described below with reference to the drawings. A person of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.

The embodiments of the present application can be applied to all kinds of scenarios in which photographs need to be taken while a vehicle is traveling; such vehicles include but are not limited to cars, trucks, motorcycles, buses, boats, airplanes, helicopters, recreational vehicles, amusement park vehicles, construction equipment, trams, golf carts, trains, and the like. Specifically, when a user (the driver or a passenger in the vehicle) wants to photograph the environment around the vehicle while it is moving, the driver has no time to do so, and since the vehicle is moving at speed a passenger also cannot do so in time; moreover, photographing the environment around the vehicle while it is moving offers the user poor safety.

To solve the above problems, an embodiment of the present application provides an image acquisition method applied to a vehicle whose exterior is provided with one or more camera devices, which may specifically be video cameras, still cameras, or other types of camera devices. For a more intuitive understanding of the vehicle provided by the embodiments of the present application, the vehicle used in the embodiments is first described with reference to FIG. 1a, a schematic structural diagram of a vehicle in the image acquisition method provided by an embodiment of the present application. FIG. 1a takes a car as an example of the vehicle's form; the black dots in FIG. 1a represent the positions of the camera devices on the outside of the vehicle, whose exterior is provided with a plurality of camera devices (six in the example of FIG. 1a). Different camera devices are arranged at different positions on the vehicle, so that the shooting ranges of different camera devices do not overlap or only partially overlap. It should be understood that the example in FIG. 1a is provided merely for ease of understanding and is not intended to limit this solution.

Specifically, as shown in FIG. 1b, a schematic flowchart of the image acquisition method provided by an embodiment of the present application: S1, the vehicle continuously photographs the environment around the vehicle through the external camera devices, or photographs it non-continuously, to obtain first image data; the first image data corresponds to the environment around the vehicle during a first time period, the first time period being the time during which the vehicle shoots through the camera devices; the first time period includes a plurality of moments, that is, the first image data includes images of the environment around the vehicle at a plurality of moments. S2, the vehicle may detect in real time a photographing instruction input by the user and, in response to the acquired photographing instruction, acquire a first moment corresponding to the photographing instruction; the photographing instruction may be triggered by a voice or gesture input by the user; alternatively, the vehicle may collect the user's blink frequency and determine that a photographing instruction has been input when the blink frequency is greater than or equal to a preset threshold; or a sensor may be arranged on the steering wheel to collect the user's heart rate, a photographing instruction being deemed input when the heart rate is greater than or equal to a preset threshold; the vehicle may also collect other types of physiological information from the user to obtain the photographing instruction input by the user, and so on; the types of photographing instruction are not exhaustively listed here. S3, after determining the first moment, the vehicle may acquire a target image from the first image data according to the first moment and output the target image; the first moment falls within the first time period, and the interval between the shooting moment of the target image and the first moment is less than the target threshold. Since the target image is retrieved from the first image data, which was captured by the camera devices provided on the vehicle, the user is no longer required to shoot images, which solves the problem that the driver or passengers cannot shoot images and also removes the safety hazard of shooting by the user; in addition, since the first image data corresponds to the environment around the vehicle during the first time period, that is, the camera devices provided on the vehicle continuously photograph the environment around the vehicle, the image collected at the first moment can be selected afterwards, so the scenery the user wants to photograph is not missed. It should be noted that, because personal information such as the user's voice, gestures, blink frequency, and heart rate during driving may involve the user's privacy, in one implementation the user may input a first operation to the vehicle (the specific implementation of the first operation is detailed in subsequent steps), and the vehicle, in response to the first operation input by the user, starts collecting the aforementioned information so as to trigger generation of the photographing instruction. In another implementation, the vehicle may output an inquiry to the user to confirm whether one or more kinds of the user's information may be collected; specifically, the inquiry may be output by voice, text, or other means. As an example, the vehicle may ask by voice "May the voice information you utter be collected?"; if the user replies "Yes", it is determined that the vehicle may collect the voice uttered by the user. It should be understood that these examples are provided merely for ease of understanding and are not intended to limit this solution.
As can be seen from the above description, the photographing instruction can take various forms. In the following embodiments, the image acquisition method provided by the embodiments of the present application is described in detail taking a voice instruction and a gesture instruction, respectively, as examples of the photographing instruction.

1. The photographing instruction is a voice instruction
Specifically, referring to FIG. 2, a schematic flowchart of the image acquisition method provided by an embodiment of the present application, the method may include:

201. The vehicle controls the first camera devices to photograph the environment around the vehicle to obtain first image data.

In this embodiment of the present application, the exterior of the vehicle may be provided with S first camera devices, S being an integer greater than or equal to 1, through which the environment around the vehicle is photographed to obtain the first image data; when S is greater than 1, the shooting directions of different first camera devices do not overlap or only partially overlap. The first image data corresponds to the environment around the vehicle during the first time period; the concepts of the first time period and the first image data can be understood in light of the above description.

Specifically, in one case, the vehicle may record video of the environment around the vehicle through the S first camera devices to obtain video data corresponding to the environment around the vehicle during the first time period. Further, in one implementation, the vehicle may directly take this video data as the first image data, that is, the first image data takes the form of a video shot of the environment around the vehicle during the first time period.

In another implementation, the vehicle may perform a video-frame extraction operation on the video data corresponding to the environment around the vehicle during the first time period to obtain the first image data, that is, the first image data includes a plurality of first video frames (that is, images) corresponding to the respective moments within the first time period.

In another case, during the first time period the vehicle may photograph the environment around the vehicle at a target frequency through the first camera devices to obtain the first image data, that is, the first image data includes a plurality of first images showing the environment around the vehicle at the respective moments within the first time period.
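The patent does not prescribe how the first image data is stored. One plausible realization of "images of the environment at each moment of the first time period" is a rolling, timestamped buffer per first camera device, as sketched below; the 30-second retention horizon is an assumption, not a value from the text.

```python
import time
from collections import deque

class FirstImageBuffer:
    """Rolling store of timestamped frames from one first camera device."""

    def __init__(self, horizon_s: float = 30.0):
        self.horizon_s = horizon_s
        self.frames = deque()  # (timestamp, frame) pairs, oldest first

    def push(self, frame, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        self.frames.append((ts, frame))
        # Drop frames older than the retention horizon.
        while self.frames and ts - self.frames[0][0] > self.horizon_s:
            self.frames.popleft()

    def window(self, center_ts: float, radius_s: float):
        """Frames within [center - radius, center + radius], i.e. the span
        between the fifth and sixth moments around a first moment."""
        return [f for t, f in self.frames if abs(t - center_ts) <= radius_s]
```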
More specifically, in one implementation, the plurality of first camera devices outside the vehicle may be triggered to start automatically and continuously photographing the environment around the vehicle once the vehicle is started. It should be noted that the purpose of photographing the environment around the vehicle through the external first camera devices may be not only to facilitate the user photographing the surroundings, but also, for example, to assist the vehicle in path planning, which is not limited here.

In another implementation, after detecting a first operation input by the user, the vehicle may, in response to the detected first operation, enable the vehicle's photographing function and trigger continuous shooting of the environment around the vehicle through the external first camera devices. In yet another implementation, since the vehicle is preconfigured with a plurality of external first camera devices, the vehicle may continuously photograph the surroundings through some of the external first camera devices once started, and, upon detecting the first operation input by the user, respond to the detected first operation by triggering continuous shooting of the environment around the vehicle through all of the external first camera devices; the manner in which the vehicle triggers shooting of the surroundings through the first camera devices is not limited here.

Further, in one case, the first operation may be a voice instruction input by the user; as an example, when the user says "enable the vehicle-assisted photographing function", the vehicle is deemed to have detected the first operation input by the user. In another case, the vehicle may be preconfigured with a button for enabling the "vehicle-assisted photographing" function; when the user presses the button, the vehicle is deemed to have detected the first operation input by the user. In another case, the vehicle may be preconfigured with one or more touch screens displaying a first icon for receiving the first operation, and the user may input the first operation by performing a touch operation, such as a tap, a double tap, or a long press, on the first icon. It should be understood that these examples merely illustrate ways in which the user may input the first operation and are not intended to limit this solution.

For a more intuitive understanding of this solution, refer to FIG. 3, a schematic interface diagram of triggering the function of photographing the surrounding environment in the image acquisition method provided by an embodiment of the present application. FIG. 3 includes sub-diagrams (a) and (b). Sub-diagram (a) of FIG. 3 takes as an example a first icon (A1 in sub-diagram (a) of FIG. 3) arranged on the central control screen of the vehicle; the user may tap A1 to input the first operation and thereby trigger the vehicle to start photographing the environment around the vehicle through the external first camera devices. Sub-diagram (b) of FIG. 3 takes as an example a first icon (A2) arranged on a touch screen additionally provided in the rear row; the user may tap A2 to input the first operation, thereby triggering the vehicle to start photographing the environment around the vehicle through the external first camera devices. It should be noted that FIG. 3 is merely an example for ease of understanding this solution; the vehicle may also provide the first icon on the central control screen and the rear touch screen simultaneously, which is not specifically limited here.
202. The vehicle acquires a target voice and generates a photographing instruction in response to the received target voice.

In some embodiments of the present application, the photographing instruction preset in the vehicle may be triggered by a voice input by the user. The vehicle may be preconfigured with a speech recognition model; after acquiring any piece of voice information input by the user (hereinafter referred to as the "first voice information" for ease of description), the vehicle may convert the first voice information into text through the speech recognition model and then judge from the text corresponding to the first voice information whether the user intends to take a photograph, that is, whether the first voice information is the target voice; if the first voice information is the target voice, the vehicle determines that a photographing instruction input by the user has been acquired.

Specifically, the vehicle may also be configured with a semantic library and a model for performing natural language processing (NLP) tasks, which may also be called a model for performing natural language understanding (NLU) tasks; the vehicle feeds the text corresponding to the first voice information and the semantic library into this model to determine whether the intention corresponding to the first voice information is to take a photograph.

An intention denotes the user's purpose and indicates the user's need; the vehicle can recognize the user's intent from the input voice information. For example, if the voice information input by the user is "that black car's surface is velvet, how beautiful", the vehicle can recognize from it that the user's intent is "to take a photograph". The intent recognition model may be trained on a large corpus, namely a corpus of utterances expressing the intent in different ways.

The model for performing NLP tasks may be implemented by a neural network or by a non-neural-network model; as examples, it may be a model for machine reading comprehension (MRC) such as bidirectional encoder representations from transformers (BERT), a recurrent neural network (RNN), or a question answering network (QANet), or another type of model.

In one case, the semantic library may include descriptive information of the photographing intent, and the descriptive information of the intent supports flexible phrasing; correspondingly, the embodiments of the present application support any natural-language expression the user is accustomed to. The target voice input by the user may be fairly direct; as examples, the user may use relatively standardized, formatted expressions such as "photograph the sunset", "photograph the building on the left", or "photograph the black car on the right". The target voice input by the user may also be fairly implicit; as examples, the target voice may be "the sunset is so beautiful today", "the flowers by the road are so pretty", or "which model is the car ahead", and so on, which is not limited here.

In another case, the semantic library may also include a plurality of words; when the vehicle detects that one of the words in the semantic library appears in the voice information input by a user (which may include the driver and passengers in the vehicle), it determines that the user has a photographing intent, thereby taking the uttered voice information as the target voice and then taking the target voice as the photographing instruction. As examples, the words included in the semantic library may be: shoot, capture, photograph, how pretty, how beautiful, or other words, which are not exhaustively listed here. It should be understood that these explanations of the terms are merely for ease of understanding this solution and are not intended to limit this solution.

More specifically, in one implementation, if the vehicle enables the "vehicle-assisted photographing" function based on the first operation input by the user, the vehicle may detect the voice information input by the user in real time, so as to acquire the target voice, once the user has actively enabled the "vehicle-assisted photographing" function. In another implementation, if the vehicle automatically enables the "vehicle-assisted photographing" function upon startup, it starts detecting the voice information input by the user in real time, so as to acquire the target voice, as soon as it is started.

Optionally, the vehicle may also acquire target keywords from the target voice. A keyword is the specific information of the intent content and the key information for triggering a particular service; a keyword is, for example, a keyword in the information input by the user. As examples, "right" and "black car" in the user input "photograph the black car on the right" are the keywords of that input information; as another example, "sunset" in the user input "the sunset is so beautiful today" is the keyword of that input information.

Specifically, the semantic library may also include slot information, which is descriptive information of keywords, and the descriptive information of slots likewise supports flexible phrasing. In one implementation, an attribute-like description may be used; for example, the description of the "type of photographed object" slot may be "noun" or a similar description. In another implementation, a keyword-style description may also be used; for example, the slot information may be descriptions such as "shooting direction", "type of photographed object", and "shape of photographed object". It should be understood that these examples are also merely for ease of understanding the concept of slot information and are not intended to limit this solution.

For any piece of voice information input by the user (the "first voice information"), the vehicle may feed the text corresponding to the first voice information and the semantic library into the model for performing NLP tasks, so as to judge, through this model, the intention corresponding to the first voice information, extract the keywords from the first voice information through this model, and output the intention corresponding to the first voice information together with the keywords extracted from it.

Further, the slot information in the semantic library may consist entirely of optional keywords, or the semantic library may include one or more mandatory slots; if the vehicle fails to obtain from the target voice a keyword corresponding to a mandatory slot, it may output an inquiry to the user, the inquiry instructing the user to input a keyword corresponding to the mandatory slot. As an example, the mandatory slot information may include "type of photographed object"; as another example, it may include "type of photographed object" and "shooting direction", and so on. Which types of mandatory slot information to configure can be determined in light of the actual situation and is not limited here.

The vehicle may output the inquiry in the form of voice, in the form of text, in both voice and text, or in other ways, which is not limited here.

For a more intuitive understanding of this solution, refer to FIG. 4, a schematic interface diagram of acquiring keywords in the image acquisition method provided by an embodiment of the present application. As shown in FIG. 4, the inquiry is output in text form on the display screen in the rear row of the vehicle; FIG. 4 takes as an example the vehicle outputting the inquiry in both text and voice, with B1 indicating that the vehicle is playing the inquiry in voice form. It should be understood that the example in FIG. 4 is merely for ease of understanding this solution and is not intended to limit this solution.
203. In response to the acquired photographing instruction, the vehicle acquires the first moment corresponding to the photographing instruction.

In this embodiment of the present application, after detecting the photographing instruction, the vehicle may, in response to the acquired photographing instruction, acquire the first moment corresponding to it. Specifically, in one implementation, the vehicle determines, in response to the acquired photographing instruction, a third moment corresponding to the photographing instruction; the third moment is the generation moment of the photographing instruction, that is, the moment at which the vehicle, having finished receiving the target voice input by the user, generates the photographing instruction in response to the received target voice. The vehicle acquires the first moment corresponding to the photographing instruction according to the third moment. In one case, the vehicle may directly determine the third moment as the first moment, that is, the first moment is the generation moment of the photographing instruction. In another case, the vehicle takes as the first moment a time point lying before the third moment and separated from it by a first duration; the first duration may take a value of 0.5 seconds, 1 second, 2 seconds, 3 seconds, 5 seconds, or another value, and may be determined in light of factors such as the speed at which the vehicle processes voice information.

In this embodiment of the present application, because the vehicle keeps moving while it receives the voice information input by the user and processes it to determine whether it is the target voice, the moment at which the photographing instruction is generated is already slightly later than the moment at which the user wanted to take the photograph. Setting the first moment before the third moment brings the first moment closer to the moment the user actually wanted to take the photograph, so using the first moment as the reference point for retrieving images yields images that better match what the user actually wanted.

In another implementation, in response to the acquired photographing instruction, the vehicle acquires a fourth moment corresponding to the target voice, the fourth moment being the acquisition moment of the target voice; since the entire target voice is received over a period of time, the acquisition moment of the target voice may be any of the following: the moment at which acquisition of the target voice started, the moment at which it ended (which may also be called the moment at which the target voice was successfully received), an intermediate moment of the acquisition (which may also be called the moment corresponding to the midpoint of the total reception duration of the target voice), or another time point during the acquisition of the target voice, which is not limited here.

Further, the vehicle may directly determine the fourth moment as the first moment, or may take as the first moment a time point lying before the fourth moment and separated from it by a second duration; the second duration may take a value of 0.5 seconds, 1 second, 2 seconds, 3 seconds, or another value, and may be determined in light of factors such as the type of the fourth moment.
204. The vehicle acquires a first direction and selects, from the at least two first camera devices, a target camera device whose shooting range covers the first direction.

In some embodiments of the present application, the vehicle may be configured with at least two first camera devices whose shooting ranges do not overlap or only partially overlap; the vehicle may then also acquire a first direction and select, from the at least two first camera devices, at least one target camera device whose shooting range covers the first direction. The first direction is determined from any one or more of the following: the gaze direction of a first user, the facial orientation of the first user, the body orientation of the first user, or other directions, which is not limited here. Further, the first user may be any one or a combination of the following: the driver, a user at a preset position in the vehicle, the user who issued the photographing instruction, or another type of user; which type of user is chosen can be determined in light of the actual situation.

Specifically, if the first direction is the user's gaze direction, then in one implementation at least one second camera device may also be configured inside the vehicle. In response to the acquired photographing instruction, the vehicle collects an image of the first user through the second camera device, performs face detection on the image of the first user to determine the face region in the image, and performs keypoint localization on the face region to determine the eye region within it. The keypoint localization operation may be completed by a preset algorithm, including but not limited to the Roberts edge-detection algorithm or the Sobel algorithm, or by a preset model, which may be an active contour (snake) model, or by a neural network for facial keypoint detection, and so on; the methods for facial keypoint detection are not exhaustively listed here. The vehicle crops the eye-region image out of the first user's image and generates, through a neural network, the gaze direction corresponding to the eye-region image, thereby obtaining the first user's gaze direction.

In another implementation, an eye tracker may be configured inside the vehicle, and the vehicle collects the first user's gaze direction through the eye tracker; the technique used by the eye tracker may be the pupil center corneal reflection technique (PCCR), visual tracking based on a three-dimensional (3D) eyeball model, or other techniques, which are not exhaustively listed here. It should be noted that the vehicle may also use other means to collect the first user's gaze direction, which are likewise not exhaustively listed here.

If the first direction is the user's facial orientation, the vehicle may control the second camera device to collect an image of the first user and, from the image of the first user, generate the first user's facial orientation through a neural network for face-orientation recognition. As examples, the neural network for face-orientation recognition may be a learning vector quantization (LVQ) neural network, a BP neural network, or another type of neural network, which are not exhaustively listed here.

If the first direction is the user's body orientation, then in one implementation a sensor for collecting the user's point cloud data may be configured inside the vehicle; from the point cloud data corresponding to the first user's current posture collected by this sensor, the vehicle can generate the first user's body orientation. In another implementation, the vehicle may collect an image of the first user through the second camera device and generate the first user's body orientation through a neural network, and so on; the ways of generating the first user's body orientation are not exhaustively listed here. It should be noted that the first direction may also be another type of direction; for example, the first direction may be the direction of the first user's gesture, which is not exhaustively listed here.

It should be noted that, since the acquisition of the first direction may involve the user's personal privacy, in one implementation the vehicle may output an inquiry to the user to confirm whether the first direction may be collected. Specifically, the inquiry may be output by voice, text, or other means; as an example, the vehicle may ask by voice "May your gaze direction be collected?", and if the user replies "Yes", it is determined that the vehicle may collect the user's gaze direction, and so on. In another implementation, the user may input a second operation to the vehicle, in response to which the vehicle triggers the start of the acquisition of the first direction. Specifically, in one case, the second operation may be the user switching on the in-vehicle device used to collect the first direction; as an example, if the eye tracker is off by default, the user actively switching it on is deemed input of the second operation. In another case, the user may input the second operation through the central control screen configured in the vehicle, and so on. It should be understood that all of these examples are merely for ease of understanding this solution and are not intended to limit this solution.
205. The vehicle obtains the target image from the first image data according to the first moment and outputs the target image.
In this embodiment, after determining the first moment in step 203, the vehicle may obtain one or more target images from the first image data according to the first moment and output them. The first moment is included in the first time period, and the interval between the shooting moment of a target image and the first moment is less than or equal to the target threshold, whose value may be 5 s, 8 s, 10 s, 15 s, or another value; this is not limited here.
Specifically, regarding the process of obtaining the target image from the first image data: step 204 is optional. If step 204 is not performed and no target keyword was obtained from the target speech in step 202, step 205 may include: if the first image data is video data, the vehicle obtains from it one or more target video frames whose shooting moment is the first moment and determines each obtained target video frame as a target image; that is, the shooting moment of the target image is exactly the first moment.
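Grabbing the video frame captured at the first moment might look like the OpenCV sketch below; video_start (the wall-clock time of the first frame) and the function name are assumptions.

```python
import cv2

def frame_at(video_path: str, video_start: float, t1: float):
    """Return the frame whose capture time is (approximately) t1."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    index = max(0, round((t1 - video_start) * fps))
    cap.set(cv2.CAP_PROP_POS_FRAMES, index)
    ok, frame = cap.read()
    cap.release()
    return frame if ok else None
```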
If the first image data includes multiple first images, then in one implementation a quantity threshold for target images may be preconfigured in the vehicle (for ease of description, let its value be N). The vehicle obtains from the first image data the N images whose shooting moments are closest to the first moment and determines them as N target images; N may be 3, 4, 5, 6, 8, 9, or another value, set flexibly according to the actual situation.
In another implementation, the value of the target threshold may be preconfigured in the vehicle. The vehicle selects from the first image data all images whose shooting moments differ from the first moment by no more than the target threshold and determines all of them as target images.
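Both selection variants, the N closest images and all images within the target threshold, can be sketched directly; images are modelled here as (capture_time, image) pairs, an assumed representation.

```python
def closest_n_images(images, t1, n=5):
    """Return the n images whose capture times are closest to t1."""
    return sorted(images, key=lambda item: abs(item[0] - t1))[:n]

def images_within_threshold(images, t1, threshold=10.0):
    """Return all images within the target threshold of t1."""
    return [item for item in images if abs(item[0] - t1) <= threshold]
```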
If step 204 is not performed and at least one target keyword was obtained from the target speech in step 202, step 205 may include: the vehicle obtains the target image from the first image data according to the first moment and the at least one target keyword. If the at least one target keyword contains a keyword describing the subject, then the object indicated by the target keyword is present in the target image; a subject-describing keyword may describe the subject's name, type, color, shape, or other descriptive information. Alternatively, if the at least one target keyword contains a keyword describing the shooting direction, the shooting direction of the target image is the direction indicated by the target keyword.
Further, suppose the vehicle is configured with S first camera devices in total. In one case, the at least one target keyword contains only subject-describing keywords and no direction-describing keyword. As an example, if the user's speech is "the sunset is so beautiful today", one target keyword "sunset" is obtained, a keyword describing the type of subject. If the first image data is video data, i.e., it includes S first videos, the vehicle obtains S second videos from the first image data, each second video starting at a fifth moment and ending at a sixth moment, where the interval between the fifth moment and the first moment equals the target threshold, and so does the interval between the sixth moment and the first moment. The vehicle may then obtain at least one target video frame from the S second videos and determine each target video frame as a target image; the object indicated by the target keyword is present in the target video frame and thus in the target image.
If the first image data includes multiple images, i.e., S groups of first images corresponding to the S first camera devices, the vehicle obtains S groups of second images from the first image data, where the earliest-shot image in the S groups of second images was shot at the fifth moment and the latest at the sixth moment. The vehicle may then obtain at least one target image from the S groups of second images, with the object indicated by the target keyword present in the target image.
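Filtering the window between the fifth and sixth moments by a subject keyword could be sketched as below; detect_objects() is a placeholder for an arbitrary object detector and is assumed, not specified by the patent.

```python
def detect_objects(image) -> set:
    """Placeholder detector; a real one would return labels like {"sunset"}."""
    return set()

def target_images_by_keyword(images, t1, threshold, keyword):
    t5, t6 = t1 - threshold, t1 + threshold        # fifth and sixth moments
    window = [(t, im) for t, im in images if t5 <= t <= t6]
    return [(t, im) for t, im in window if keyword in detect_objects(im)]
```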
In another case, the at least one target keyword contains only one or more direction-describing keywords and no subject-describing keyword; the direction-describing keywords indicate one or more second directions. As an example, if the user's speech is "wow, look how beautiful it is ahead", one target keyword "ahead" is obtained, a direction-describing keyword. If the first image data includes S first videos, then since the S first videos correspond one-to-one to the S first camera devices, whose shooting ranges do not overlap or partially overlap, the vehicle selects from the S first camera devices the N first camera devices corresponding to all the second directions, thereby obtaining N first videos from the S first videos. For each of the N first videos, the vehicle obtains one video frame whose shooting moment is the first moment and determines it as a target image, thereby obtaining N target images from the N first videos, each shot at the first moment.
If the first image data includes S groups of first images, then since these correspond one-to-one to the S first camera devices, whose shooting ranges do not overlap or partially overlap, the vehicle selects from the S first camera devices the N first camera devices corresponding to all the second directions, thereby obtaining N groups of first images from the S groups; it then obtains from the N groups one or more images whose shooting moment is the first moment and determines them as target images, each shot at the first moment.
In yet another case, the at least one target keyword contains both direction-describing and subject-describing keywords. If the first image data includes S first videos, the vehicle may first obtain N first videos from the S first videos (for the specific implementation, refer to the description above), then obtain N second videos from the N first videos according to the first moment (refer to the description above), and then obtain the target image from the N second videos according to the subject-describing target keyword (refer to the description above).
If the first image data includes S groups of first images, the vehicle may first obtain N groups of first images from the S groups (refer to the description above), then obtain N groups of second images from the N groups of first images according to the first moment (refer to the description above), and then obtain the target image from the N groups of second images according to the subject-describing target keyword (refer to the description above).
For a more intuitive understanding of this embodiment, refer to FIG. 5, a flow diagram of the image acquisition method provided by an embodiment of this application. C1: after starting, the vehicle triggers photographing of its surroundings through four first camera devices outside the vehicle, located at the front, left, right, and rear, to obtain the first image data, a video corresponding to the environment around the vehicle within the first time period. C2: in response to a first operation input by the user, the vehicle enables the "vehicle-assisted photographing" function, begins obtaining in real time the first speech information input by the user (i.e., any speech the user inputs), and checks whether the first speech information is the target speech (i.e., whether it is a photographing instruction); if the vehicle detects the target speech, it obtains the target keywords from it. For example, if the input target speech is "shoot the black car ahead on the right", the vehicle obtains the two target keywords "ahead right" and "black car". C3: in response to the obtained photographing instruction, the vehicle obtains the first moment corresponding to it. C4: the vehicle obtains the target image from the first image data according to the first moment and the target keywords. It should be understood that the example in FIG. 5 only serves to aid understanding and does not limit this solution.
For a more intuitive understanding, refer to FIG. 6, a schematic diagram of the target image and the first moment in the image acquisition method provided by an embodiment of this application. FIG. 6 takes as an example first image data comprising multiple images, with 20 images in the S groups of first images, arranged from earliest to latest shooting time; each rectangle represents one image. D1 denotes the image shot at the first moment, D2 the image shot at the fifth moment, and D3 the image shot at the sixth moment. D4, D5, and D6 each denote images in which the object indicated by the target keyword is present, i.e., they denote target images; the time interval between each of these three target images and the first moment is less than the target threshold. It should be understood that the example in FIG. 6 only serves to illustrate the relation between the shooting moment of the target image and the first moment and does not limit this solution.
In this embodiment of the application, the photographing instruction is input in the form of speech, and after the target speech that triggers photographing is obtained, the target keyword is also obtained from it. The target keyword points to the object the user wants to shoot or to the direction in which the user wants to shoot, so the vehicle learns more precisely what kind of image the user wants. This helps improve the accuracy of the output target image, i.e., helps output images that meet the user's expectations, further improving user engagement with this solution.
If step 204 is performed and no target keyword was obtained from the target speech in step 202, step 205 may include: after selecting through step 204 the M target camera devices corresponding to the first direction from the S first camera devices, the vehicle selects second image data from the first image data, where the second image data is a subset of the first image data collected by the M target camera devices among the S first camera devices, and then obtains the target image from the second image data. It should be noted that the specific way the vehicle obtains the target image from the second image data according to the first moment is similar to the way it obtains the target image from the first image data in the case "step 204 is not performed and no target keyword was obtained from the target speech in step 202", the difference being that the first image data in that case is replaced by the second image data; refer to the description above, which is not repeated here.
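Narrowing the first image data down to the second image data is essentially a filter by camera device, for example (streams keyed by camera name, an assumed layout):

```python
def second_image_data(first_image_data: dict, target_cameras: list) -> dict:
    """Keep only the streams captured by the M target camera devices."""
    return {cam: frames for cam, frames in first_image_data.items()
            if cam in target_cameras}
```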
In this embodiment of the application, after the first image data corresponding to multiple camera devices is obtained, since the shooting ranges of the camera devices do not overlap or partially overlap, the first direction is also obtained; the target camera device whose shooting range covers the first direction is selected from the multiple camera devices, the second image data shot by the target camera device is selected from the first image data, and the target image is then obtained from the second image data. Because the second image data is smaller in volume than the first image data, this improves the efficiency of the step of obtaining the target image compared with obtaining it directly from the first image data. Moreover, the first direction is determined from any one or more of the user's gaze direction, face orientation, body orientation, and gesture direction, and users generally face, look at, or point toward the region they are interested in, so using the first direction to filter the captured image data helps select the images the user expects, improving user engagement with this solution.
If step 204 is performed and at least one target keyword was obtained from the target speech in step 202, then in one case the at least one target keyword contains only subject-indicating keywords and no direction-indicating keyword. The vehicle may first select, according to the first direction, second image data from the first image data, the second image data being collected by the M target camera devices; it then obtains the target image from the second image data according to the subject-indicating target keyword and the first moment. For the specific implementation of these steps, refer to the description of "obtaining the target image from the first image data" in the case "step 204 is not performed, at least one target keyword is obtained from the target speech through step 202, and the at least one target keyword contains only subject-describing keywords and no direction-describing keyword", the difference being that the first image data there is replaced by the second image data here; this is not repeated.
In another case, the at least one target keyword obtained through step 202 contains a direction-indicating keyword. Since a second direction input directly by the user through speech is more reliable than the first direction, the vehicle may skip step 204 regardless of whether the at least one target keyword also contains a subject-indicating keyword. The specific way the vehicle then obtains the target image from the first image data is as described for the case "step 204 is not performed, at least one target keyword is obtained from the target speech through step 202, and the at least one target keyword contains a direction-indicating keyword"; this is not repeated here.
Regarding the process of the vehicle outputting the target image: after the vehicle obtains one or more target images, in one implementation the vehicle is configured with a display screen and may show the obtained target images directly on it. The display screen may be the center-console screen of the vehicle or the touch screen used to receive the first operation; which screen is used can be set flexibly according to the actual product form and is not limited here.
In another implementation, a wireless communication connection is established in advance between the vehicle and a terminal device carried by the user, and the vehicle may send the obtained target images directly to that terminal device so that it displays them. The vehicle may also output the target images to the user in other ways, not limited here.
Optionally, the target image may also carry its shooting time.
For a more intuitive understanding, refer to FIG. 7, a schematic interface diagram of outputting the target image in the image acquisition method provided by an embodiment of this application. FIG. 7 includes two sub-diagrams, (a) and (b), each taking the output of six target images as an example; sub-diagram (a) takes output through the vehicle's display screen as an example, sub-diagram (b) takes output through the terminal device carried by the user as an example, and FIG. 7 takes displaying the target images in a "Photos" application as an example. It should be understood that the example in FIG. 7 only serves to aid understanding and does not limit this solution.
II. The photographing instruction is a gesture instruction
Specifically, refer to FIG. 8, a flow diagram of the image acquisition method provided by an embodiment of this application; the method may include:
801. The vehicle photographs the environment around the vehicle through the first camera device to obtain first image data.
In this embodiment, the specific implementation of step 801 is similar to that of step 201 in the embodiment corresponding to FIG. 2 and can be understood by direct reference; it is not repeated here.
802. The vehicle obtains a preset gesture and, in response to the obtained preset gesture, generates a photographing instruction.
In some embodiments of this application, the photographing instruction preset in the vehicle may be a gesture instruction; that is, the user may input the photographing instruction by making a preset gesture. Similarly to step 202 in the embodiment corresponding to FIG. 2, in one case the vehicle may, after starting, begin collecting the second user's gesture information in real time, and when it determines from that information that the second user has input the preset gesture, it determines that the user has input a photographing instruction. In another case, the vehicle enables the "vehicle-assisted photographing" function based on a first operation input by the user; after the user actively enables the function, the vehicle begins collecting the second user's gesture information in real time so as to obtain the preset gesture input by the user.
For the specific implementation of the user inputting the first operation, refer to the description in the embodiment corresponding to FIG. 2; it is not repeated here. The preset gesture may be a static gesture or a dynamic gesture. The second user may be any user inside the vehicle or may be restricted to a passenger at a fixed position; as an example, only the passenger in the front passenger seat may be allowed to input the preset gesture. Which users in the vehicle qualify as the second user can be decided according to the actual product form and is not limited here.
Specifically, regarding the process of the vehicle collecting the user's gesture information: whether the preset gesture is static (for example a fist, an open hand with five fingers spread, or another static gesture) or dynamic (for example a fist followed by an open hand), in one implementation a second camera device may be configured inside the vehicle. The vehicle collects gesture images of the second user inside the vehicle in real time through the second camera device, analyzes them with a computer-vision algorithm, and compares the second user's gesture image with the image corresponding to the preset gesture. If the two point to the same type of gesture, it is determined that the vehicle has obtained the preset gesture input by the second user; if they point to different types of gesture, it is determined that it has not.
In another implementation, a sensor for collecting the user's gesture information may be configured inside the vehicle; it may be a laser, radar, or other type of sensor. The vehicle may collect through this sensor the point-cloud data corresponding to the second user's gesture and then judge from it whether the gesture input by the user is the preset gesture. It should be understood that the vehicle may also judge whether the second user's input gesture is the preset gesture based on other principles, not exhaustively listed here.
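The comparison step can be sketched as classifying the captured gesture images and comparing the label with the preset gesture; classify_gesture() is an assumed placeholder model, not the patent's implementation.

```python
PRESET_GESTURE = "fist_then_open"   # example label for a dynamic preset gesture

def classify_gesture(frames) -> str:
    """Placeholder classifier over one or more gesture images."""
    return "unknown"

def is_preset_gesture(frames) -> bool:
    """True if the captured gesture and the preset gesture are the same type."""
    return classify_gesture(frames) == PRESET_GESTURE
```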
803. In response to the obtained photographing instruction, the vehicle obtains the first moment corresponding to the photographing instruction.
In some embodiments of this application, after obtaining the photographing instruction, the vehicle needs to obtain, in response to it, the first moment corresponding to the photographing instruction.
Specifically, similarly to step 203 in the embodiment corresponding to FIG. 2, in one implementation the vehicle determines, in response to the obtained photographing instruction, the third moment corresponding to it, the generation moment of the photographing instruction, and obtains the first moment from the third moment. For the specific implementation, refer to the description above; it is not repeated here.
In another implementation, in response to the obtained photographing instruction, the vehicle obtains the second moment corresponding to the gesture instruction, the shooting moment of the gesture image corresponding to the gesture instruction. Further, if the preset gesture is static, the second moment is a single determinate moment; if it is dynamic, the second moment may be any of the following: the initial acquisition moment of the preset gesture, the final acquisition moment, or any moment during its acquisition; this is not limited here.
The vehicle determines the first moment from the second moment. Further, if the preset gesture is static, the vehicle may directly determine the second moment as the first moment. If it is dynamic, the vehicle may also determine as the first moment a time point before the second moment separated from it by a third duration, whose value may be the same as or different from the second duration; as examples, the third duration may be 1 s, 2 s, or another value, determined in combination with factors such as the type of the dynamic gesture and the type of the second moment.
In this embodiment, when the preset gesture is a static gesture instruction, the shooting moment of the static gesture can be obtained directly and determined as the first moment, providing another way of obtaining the first moment and increasing the implementation flexibility of this solution. Moreover, since the interval between the user seeing the object they want to shoot and making the preset static gesture is generally short, directly determining the shooting moment of the preset static gesture as the first moment also matches fairly well the shooting moment the user actually wants.
804. The vehicle obtains the first direction and selects, from the at least two first camera devices, a target camera device whose shooting range covers the first direction.
In this embodiment, the specific implementation of step 804 is similar to that of step 204 in the embodiment corresponding to FIG. 2 and can be understood by direct reference; it is not repeated here.
805. The vehicle obtains the target image from the first image data according to the first moment and outputs the target image.
In this embodiment, for the specific implementation of step 805, refer to the description of step 205 in the embodiment corresponding to FIG. 2. It should be noted that, since in this embodiment the photographing instruction is triggered by inputting a preset gesture, the specific implementation of step 805 includes only those cases of step 205 in which no target keyword is obtained from the target speech through step 202.
In this embodiment, the user can trigger the vehicle to generate the photographing instruction by inputting speech or a gesture, which is simple to operate and easy to implement.
In the embodiments of this application, the environment around the vehicle is photographed by the camera device configured on the vehicle to obtain the first image data. When a photographing instruction input by the user is received, the first moment corresponding to it can be determined and the target image whose shooting moment is the first moment is selected from the first image data; that is, the user no longer needs to take the picture, which solves both the problem that the driver cannot take pictures and the safety hazard of the user doing so. Moreover, since the first image data corresponds to the environment around the vehicle within the first time period, i.e., the surroundings are photographed continuously by the vehicle's camera device and the image captured at the first moment is then selected, the scene the user wants to shoot is not missed.
On the basis of the embodiments corresponding to FIG. 1a to FIG. 8, in order to better implement the above solutions, related devices for implementing them are provided below. Refer specifically to FIG. 9, a schematic structural diagram of the image acquisition apparatus provided by an embodiment of this application. The image acquisition apparatus 900 is applied to a vehicle configured with a camera device and includes: a photographing module 901, configured to control the camera device to photograph the environment around the vehicle to obtain first image data, the first image data corresponding to the environment around the vehicle within a first time period; an obtaining module 902, configured to obtain, in response to an obtained photographing instruction, the first moment corresponding to the photographing instruction; the obtaining module 902 is further configured to obtain the target image from the first image data according to the first moment and output the target image, where the first moment is included in the first time period and the interval between the shooting moment of the target image and the first moment is less than or equal to a target threshold.
In one possible design, refer to FIG. 10, a schematic structural diagram of the image acquisition apparatus provided by an embodiment of this application. The image acquisition apparatus 900 further includes a generation module 903, configured to generate the photographing instruction in response to received target speech, the intent corresponding to the target speech being photographing; or the generation module 903 is configured to generate the photographing instruction in response to an obtained preset gesture.
In one possible design, the obtaining module 902 is further configured to obtain a target keyword from the target speech, the target keyword being description information of the subject and/or the shooting direction; the obtaining module 902 is specifically configured to obtain the target image from the first image data according to the target keyword, where the object indicated by the target keyword is present in the target image, and/or the shooting direction of the target image is the direction indicated by the target keyword.
In one possible design, the preset gesture is a preset static gesture; the obtaining module 902 is specifically configured to obtain, in response to the obtained photographing instruction, the second moment corresponding to the preset gesture and determine the second moment as the first moment, the second moment being the shooting moment of the preset static gesture.
In one possible design, the obtaining module 902 is specifically configured to: determine, in response to the obtained photographing instruction, the third moment corresponding to the photographing instruction, the third moment being the generation moment of the photographing instruction; and determine the first moment from the third moment, the first moment being before the third moment.
In one possible design, refer to FIG. 10: the vehicle is configured with at least two camera devices whose shooting ranges do not overlap or partially overlap; the obtaining module 902 is further configured to obtain a first direction, determined from any one or more of: gaze direction, face orientation, and body orientation. The image acquisition apparatus 900 further includes a selection module 904, configured to select from the at least two camera devices a target camera device whose shooting range covers the first direction; the obtaining module 902 is specifically configured to select second image data from the first image data, the second image data being collected by the target camera device, and to obtain the target image from the second image data.
It should be noted that the information exchange and execution processes among the modules/units of the image acquisition apparatus 900 are based on the same conception as the method embodiments corresponding to FIG. 1a to FIG. 8 of this application; for details, refer to the descriptions in the method embodiments shown above, which are not repeated here.
Next, a vehicle provided by an embodiment of this application is introduced. Refer to FIG. 11, a schematic structural diagram of the vehicle provided by an embodiment of this application. The image acquisition apparatus 900 described in the embodiment corresponding to FIG. 9 may be deployed on the vehicle 1100 to implement the functions of the vehicle in the embodiments corresponding to FIG. 1b to FIG. 8. Specifically, the vehicle 1100 includes: a receiver 1101, a transmitter 1102, a processor 1103, and a memory 1104 (the number of processors 1103 in the vehicle 1100 may be one or more, one processor being taken as an example in FIG. 11), where the processor 1103 may include an application processor 11031 and a communication processor 11032. In some embodiments of this application, the receiver 1101, the transmitter 1102, the processor 1103, and the memory 1104 may be connected by a bus or in other ways.
The memory 1104 may include read-only memory and random access memory and provides instructions and data to the processor 1103. Part of the memory 1104 may also include non-volatile random access memory (NVRAM). The memory 1104 stores a processor and operating instructions, executable modules, or data structures, or subsets or extended sets thereof, where the operating instructions may include various operating instructions for implementing various operations.
The processor 1103 controls the operation of the vehicle. In specific applications, the components of the vehicle are coupled together by a bus system which, besides a data bus, may also include a power bus, a control bus, a status signal bus, and so on. For clarity, however, the various buses are all referred to as the bus system in the figure.
The methods disclosed in the above embodiments of this application may be applied to the processor 1103 or implemented by it. The processor 1103 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above methods may be completed by integrated logic circuits of hardware in the processor 1103 or by instructions in the form of software. The processor 1103 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic devices, or discrete hardware components. The processor 1103 may implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc. The steps of the methods disclosed in connection with the embodiments of this application may be directly embodied as being executed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory 1104; the processor 1103 reads the information in the memory 1104 and completes the steps of the above methods in combination with its hardware.
The receiver 1101 may be used to receive input numeric or character information and to generate signal inputs related to the settings and function control of the vehicle. The transmitter 1102 may be used to output numeric or character information through a first interface; it may also be used to send instructions to a disk group through the first interface to modify the data in the disk group; and it may also include a display device such as a display screen.
In this embodiment of the application, the processor 1103 is configured to execute the image acquisition method executed by the vehicle in the embodiments corresponding to FIG. 1b to FIG. 8. It should be noted that the specific manner in which the application processor 11031 in the processor 1103 executes the above steps is based on the same conception as the method embodiments corresponding to FIG. 1b to FIG. 8 of this application and brings the same technical effects; for details, refer to the descriptions in the method embodiments shown above, which are not repeated here.
Refer to FIG. 12, a schematic structural diagram of the vehicle provided by an embodiment of this application. When the vehicle 1100 is embodied as an autonomous vehicle, it is configured in a fully or partially autonomous driving mode. For example, the vehicle 1100 may control itself while in the autonomous driving mode; the current state of the vehicle and its surrounding environment may be determined with human operation, the likely behavior of at least one other vehicle in the surrounding environment determined, and a confidence level corresponding to the likelihood of the other vehicle performing that behavior determined, with the vehicle 1100 controlled based on the determined information. While in the autonomous driving mode, the vehicle 1100 may also be set to operate without interaction with a person.
The vehicle 1100 may include various subsystems, such as a travel system 102, a sensor system 104, a control system 106, one or more peripheral devices 108, a power supply 110, a computer system 112, and a user interface 116. Optionally, the vehicle 1100 may include more or fewer subsystems, and each subsystem may include multiple components. In addition, the subsystems and components of the vehicle 1100 may be interconnected by wire or wirelessly.
The travel system 102 may include components that provide powered motion for the vehicle 1100. In one embodiment, the travel system 102 may include an engine 118, an energy source 119, a transmission 120, and wheels/tires 121.
The engine 118 may be an internal combustion engine, an electric motor, an air-compression engine, or a combination of engine types, for example a hybrid engine composed of a gasoline engine and an electric motor, or a hybrid engine composed of an internal combustion engine and an air-compression engine. The engine 118 converts the energy source 119 into mechanical energy. Examples of the energy source 119 include gasoline, diesel, other petroleum-based fuels, propane, other compressed-gas-based fuels, ethanol, solar panels, batteries, and other sources of electric power. The energy source 119 may also provide energy for other systems of the vehicle 1100. The transmission 120 may transmit mechanical power from the engine 118 to the wheels 121 and may include a gearbox, a differential, and a drive shaft. In one embodiment, the transmission 120 may also include other devices, such as a clutch. The drive shaft may include one or more axles that can be coupled to one or more wheels 121.
The sensor system 104 may include several sensors that sense information about the environment around the vehicle 1100, used to obtain the point-cloud data corresponding to the environment at each moment and the images corresponding to the environment at each moment. For example, the sensor system 104 may include a positioning system 122 (which may be GPS, BeiDou, or another positioning system), an inertial measurement unit (IMU) 124, a radar 126, a laser rangefinder 128, and a camera 130. The sensor system 104 may also include sensors that monitor the internal systems of the vehicle 1100 (for example, an in-vehicle air-quality monitor, a fuel gauge, an oil temperature gauge). Sensing data from one or more of these sensors can be used to detect objects and their corresponding characteristics (position, shape, direction, speed, etc.). Such detection and recognition is a key function of the safe operation of the autonomous vehicle 1100.
The positioning system 122 may be used to estimate the geographic position of the vehicle 1100. The IMU 124 is used to sense changes in the position and orientation of the vehicle 1100 based on inertial acceleration; in one embodiment, it may be a combination of an accelerometer and a gyroscope. The radar 126 may use radio signals to sense objects in the surrounding environment of the vehicle 1100 and may be embodied as millimeter-wave radar or lidar; in some embodiments, besides sensing objects, the radar 126 may also be used to sense their speed and/or heading. The laser rangefinder 128 may use laser light to sense objects in the environment in which the vehicle 1100 is located; in some embodiments, it may include one or more laser sources, a laser scanner, one or more detectors, and other system components. The camera 130 may be used to capture multiple images of the surrounding environment of the vehicle 1100 and may be a still camera or a video camera.
The control system 106 controls the operation of the vehicle 1100 and its components. The control system 106 may include various components, including a steering system 132, a throttle 134, a braking unit 136, a computer vision system 140, a route control system 142, and an obstacle avoidance system 144.
The steering system 132 is operable to adjust the heading of the vehicle 1100; for example, in one embodiment it may be a steering wheel system. The throttle 134 is used to control the operating speed of the engine 118 and thus the speed of the vehicle 1100. The braking unit 136 is used to control the vehicle 1100 to decelerate; it may use friction to slow the wheels 121. In other embodiments, the braking unit 136 may convert the kinetic energy of the wheels 121 into electric current, or may take other forms to slow the rotation of the wheels 121 and thereby control the speed of the vehicle 1100. The computer vision system 140 is operable to process and analyze images captured by the camera 130 in order to recognize objects and/or features in the surrounding environment of the vehicle 1100, which may include traffic signals, road boundaries, and obstacles. The computer vision system 140 may use object-recognition algorithms, Structure from Motion (SFM) algorithms, video tracking, and other computer-vision techniques; in some embodiments, it may be used to map the environment, track objects, estimate object speeds, and so on. The route control system 142 is used to determine the driving route and driving speed of the vehicle 1100; in some embodiments, it may include a lateral planning module 1421 and a longitudinal planning module 1422, used respectively to determine the driving route and driving speed for the vehicle 1100 in combination with data from the obstacle avoidance system 144, the GPS 122, and one or more predetermined maps. The obstacle avoidance system 144 is used to identify, evaluate, and avoid or otherwise negotiate obstacles in the environment of the vehicle 1100, which may be embodied as actual obstacles and as virtual moving bodies that may collide with the vehicle 1100. In one instance, the control system 106 may additionally or alternatively include components other than those shown and described, or some of the components shown above may be omitted.
The vehicle 1100 interacts with external sensors, other vehicles, other computer systems, or users through the peripheral devices 108, which may include a wireless communication system 146, an in-vehicle computer 148, a microphone 150, and/or a speaker 152. In some embodiments, the peripheral devices 108 provide means for a user of the vehicle 1100 to interact with the user interface 116. For example, the in-vehicle computer 148 may provide information to a user of the vehicle 1100, and the user interface 116 may also operate the in-vehicle computer 148 to receive user input; the in-vehicle computer 148 may be operated through a touch screen. In other cases, the peripheral devices 108 may provide means for the vehicle 1100 to communicate with other devices located in the vehicle. For example, the microphone 150 may receive audio (e.g., voice commands or other audio input) from a user of the vehicle 1100; similarly, the speaker 152 may output audio to a user of the vehicle 1100. The wireless communication system 146 may include the receiver 1101 and the transmitter 1102 shown in FIG. 11.
The power supply 110 may provide electric power to various components of the vehicle 1100. In one embodiment, the power supply 110 may be a rechargeable lithium-ion or lead-acid battery; one or more battery packs of such batteries may be configured as the power supply to provide power to the various components of the vehicle 1100. In some embodiments, the power supply 110 and the energy source 119 may be implemented together, as in some all-electric vehicles.
Some or all of the functions of the vehicle 1100 are controlled by the computer system 112, which may include at least one processor 1103 and a memory 1104; for a description of the functions of the processor 1103 and the memory 1104, refer to the description of FIG. 11 above, which is not repeated here.
The computer system 112 may control the functions of the vehicle 1100 based on inputs received from the various subsystems (e.g., the travel system 102, the sensor system 104, and the control system 106) and from the user interface 116. For example, the computer system 112 may use inputs from the control system 106 to control the steering system 132 so as to avoid obstacles detected by the sensor system 104 and the obstacle avoidance system 144. In some embodiments, the computer system 112 is operable to provide control over many aspects of the vehicle 1100 and its subsystems.
Optionally, one or more of these components may be installed separately from or associated with the vehicle 1100. For example, the memory 1104 may exist partially or completely separate from the vehicle 1100. The above components may be communicatively coupled together in a wired and/or wireless manner.
Optionally, the above components are only an example; in practical applications, components in the above modules may be added or removed according to actual needs, and FIG. 12 should not be understood as limiting the embodiments of this application. An autonomous vehicle travelling on the road, such as the vehicle 1100 above, may recognize objects in its surrounding environment to determine adjustments to its current speed. The objects may be other vehicles, traffic control devices, or other types of objects. In some examples, each recognized object may be considered independently, and the object's respective characteristics, such as its current speed, acceleration, and distance from the vehicle, may be used to determine the speed the autonomous vehicle should adjust to.
Optionally, the vehicle 1100, or a computing device associated with it (such as the computer system 112, the computer vision system 140, or the memory 1104 of FIG. 12), may predict the behavior of a recognized object based on the characteristics of the recognized object and the state of the surrounding environment (for example, traffic, rain, ice on the road). Optionally, since the recognized objects all depend on each other's behavior, all the recognized objects may also be considered together to predict the behavior of a single recognized object. The vehicle 1100 can adjust its speed based on the predicted behavior of the recognized object; in other words, it can determine what steady state the vehicle will need to adjust to (for example, accelerate, decelerate, or stop) based on the predicted behavior of the object. In this process, other factors may also be considered to determine the speed of the vehicle 1100, such as its lateral position in the road it is travelling, the curvature of the road, and the proximity of static and dynamic objects. Besides providing instructions to adjust the speed of the autonomous vehicle, the computing device may also provide instructions to modify the steering angle of the vehicle 1100 so that it follows a given trajectory and/or maintains safe lateral and longitudinal distances from objects near the vehicle 1100 (for example, cars in adjacent lanes on the road).
The vehicle 1100 may be a car, a truck, a motorcycle, a bus, a boat, an airplane, a helicopter, a lawn mower, a recreational vehicle, an amusement-park vehicle, construction equipment, a tram, a golf cart, a train, a handcart, or the like; this is not specifically limited in the embodiments of this application.
An embodiment of this application also provides a computer program product which, when run on a computer, causes the computer to execute the steps executed by the vehicle in the methods described in the embodiments shown in FIG. 1b to FIG. 8.
An embodiment of this application also provides a computer-readable storage medium storing a program for signal processing which, when run on a computer, causes the computer to execute the steps executed by the vehicle in the methods described in the embodiments shown in FIG. 1b to FIG. 8.
The image acquisition apparatus provided by the embodiments of this application may specifically be a chip, including a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, pins, or circuits. The processing unit may execute the computer-executable instructions stored in a storage unit so that the chip executes the image acquisition method described in the embodiments shown in FIG. 1b to FIG. 8. Optionally, the storage unit is a storage unit within the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip within the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
The processor mentioned anywhere above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the program of the method of the first aspect above.
It should also be noted that the apparatus embodiments described above are merely illustrative; units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the solutions of the embodiments. In addition, in the drawings of the apparatus embodiments provided by this application, the connection relationships between modules indicate that they have communication connections between them, which may be specifically implemented as one or more communication buses or signal lines.
Through the description of the above implementations, those skilled in the art can clearly understand that this application can be implemented by software plus the necessary general-purpose hardware, and of course also by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and so on. In general, all functions completed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structures used to implement the same function can also be diverse, such as analog circuits, digital circuits, or dedicated circuits. For this application, however, a software-program implementation is in most cases the better implementation. Based on this understanding, the technical solutions of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc, including several instructions to cause a computer device (which may be a personal computer, a training device, a network device, etc.) to execute the methods described in the embodiments of this application.
In the above embodiments, the implementation may be wholly or partly by software, hardware, firmware, or any combination thereof. When implemented by software, it may be wholly or partly implemented in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, training device, or data center to another by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any usable medium that a computer can store, or a data storage device such as a training device or data center integrating one or more usable media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a solid-state drive (SSD)), etc.

Claims (15)

  1. An image acquisition method, wherein the method is applied to a vehicle configured with a camera device, and the method comprises:
    controlling the camera device to photograph the environment around the vehicle to obtain first image data, the first image data corresponding to the environment around the vehicle within a first time period;
    in response to an obtained photographing instruction, obtaining a first moment corresponding to the photographing instruction;
    obtaining a target image from the first image data according to the first moment, wherein the first moment is included in the first time period, and the interval between the shooting moment of the target image and the first moment is less than or equal to a target threshold.
  2. The method according to claim 1, wherein the method further comprises:
    in response to received target speech, generating the photographing instruction, the intent corresponding to the target speech being photographing; or,
    in response to an obtained preset gesture, generating the photographing instruction.
  3. The method according to claim 2, wherein the method further comprises:
    obtaining a target keyword from the target speech, the target keyword comprising description information of a subject and/or a shooting direction;
    the obtaining a target image from the first image data comprises:
    obtaining the target image from the first image data according to the target keyword, wherein the object indicated by the target keyword is present in the target image, and/or the shooting direction of the target image is the direction indicated by the target keyword.
  4. The method according to claim 2, wherein the preset gesture comprises a preset static gesture, and the obtaining, in response to an obtained photographing instruction, a first moment corresponding to the photographing instruction comprises:
    in response to the obtained photographing instruction, obtaining a second moment corresponding to the preset gesture and determining the second moment as the first moment, the second moment being the shooting moment of the preset static gesture.
  5. The method according to any one of claims 1 to 3, wherein the obtaining, in response to an obtained photographing instruction, a first moment corresponding to the photographing instruction comprises:
    in response to the obtained photographing instruction, determining a third moment corresponding to the photographing instruction, the third moment being the generation moment of the photographing instruction;
    determining the first moment according to the third moment, the first moment being before the third moment.
  6. The method according to claim 1 or 2, wherein the vehicle is configured with at least two camera devices whose shooting ranges do not overlap or partially overlap, and the method further comprises:
    obtaining a first direction, the first direction being determined from any one or more of: gaze direction, face orientation, and body orientation;
    selecting, from the at least two camera devices, a target camera device whose shooting range covers the first direction;
    the obtaining a target image from the first image data comprises:
    obtaining second image data from the first image data, the second image data being collected by the target camera device;
    obtaining the target image from the second image data.
  7. An image acquisition apparatus, wherein the apparatus is applied to a vehicle configured with a camera device, and the image acquisition apparatus comprises:
    a photographing module, configured to control the camera device to photograph the environment around the vehicle to obtain first image data, the first image data corresponding to the environment around the vehicle within a first time period;
    an obtaining module, configured to obtain, in response to an obtained photographing instruction, a first moment corresponding to the photographing instruction;
    the obtaining module being further configured to obtain a target image from the first image data according to the first moment, wherein the first moment is included in the first time period, and the interval between the shooting moment of the target image and the first moment is less than or equal to a target threshold.
  8. The apparatus according to claim 7, wherein the apparatus further comprises:
    a generation module, configured to generate the photographing instruction in response to received target speech, the intent corresponding to the target speech being photographing; or,
    the generation module being configured to generate the photographing instruction in response to an obtained preset gesture.
  9. The apparatus according to claim 8, wherein the obtaining module is further configured to obtain a target keyword from the target speech, the target keyword comprising description information of a subject and/or a shooting direction;
    the obtaining module is specifically configured to obtain the target image from the first image data according to the target keyword, wherein the object indicated by the target keyword is present in the target image, and/or the shooting direction of the target image is the direction indicated by the target keyword.
  10. The apparatus according to claim 8, wherein the preset gesture comprises a preset static gesture;
    the obtaining module is specifically configured to obtain, in response to the obtained photographing instruction, a second moment corresponding to the preset gesture and determine the second moment as the first moment, the second moment being the shooting moment of the preset static gesture.
  11. The apparatus according to any one of claims 7 to 9, wherein the obtaining module is specifically configured to:
    in response to the obtained photographing instruction, determine a third moment corresponding to the photographing instruction, the third moment being the generation moment of the photographing instruction;
    determine the first moment according to the third moment, the first moment being before the third moment.
  12. The apparatus according to claim 7 or 8, wherein the vehicle is configured with at least two camera devices whose shooting ranges do not overlap or partially overlap;
    the obtaining module is further configured to obtain a first direction, the first direction being determined from any one or more of: gaze direction, face orientation, and body orientation;
    the apparatus further comprises: a selection module, configured to select, from the at least two camera devices, a target camera device whose shooting range covers the first direction;
    the obtaining module is specifically configured to select second image data from the first image data, the second image data being collected by the target camera device, and to obtain the target image from the second image data.
  13. A computer program which, when run on a computer, causes the computer to execute the method according to any one of claims 1 to 6.
  14. A computer-readable storage medium, comprising a computer program which, when run on a computer, causes the computer to execute the method according to any one of claims 1 to 6.
  15. A vehicle, comprising a processor and a memory, the processor being coupled to the memory,
    the memory being configured to store a program;
    the processor being configured to execute the program in the memory, so that the vehicle executes the method according to any one of claims 1 to 6.
PCT/CN2021/083874 2021-03-30 2021-03-30 Image acquisition method and related device WO2022204925A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180000814.3A 2021-03-30 2021-03-30 Image acquisition method and related device
PCT/CN2021/083874 2021-03-30 2021-03-30 Image acquisition method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/083874 2021-03-30 2021-03-30 Image acquisition method and related device

Publications (1)

Publication Number Publication Date
WO2022204925A1 true WO2022204925A1 (zh) 2022-10-06

Family

ID=77081270

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/083874 WO2022204925A1 (zh) 2021-03-30 2021-03-30 Image acquisition method and related device

Country Status (2)

Country Link
CN (1) CN113228620B (zh)
WO (1) WO2022204925A1 (zh)


Also Published As

Publication number Publication date
CN113228620B (zh) 2022-07-22
CN113228620A (zh) 2021-08-06

