WO2024045616A1 - Directional sound channel selection method, electronic device, medium, and vehicle - Google Patents

Directional sound channel selection method, electronic device, medium, and vehicle

Info

Publication number
WO2024045616A1
Authority
WO
WIPO (PCT)
Prior art keywords
directional sound
sound channel
target
image
recognized
Prior art date
2022-08-29
Application number
PCT/CN2023/086442
Other languages
English (en)
French (fr)
Inventor
陈瑞
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-08-29
Filing date
2023-04-06
Publication date
2024-03-07
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2024045616A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B1/00 Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/38 Transceivers, i.e. devices in which transmitter and receiver form a structural unit and in which at least one part is used for functions of transmitting and receiving
    • H04B1/3822 Transceivers, i.e. devices in which transmitter and receiver form a structural unit and in which at least one part is used for functions of transmitting and receiving specially adapted for use in vehicles
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/20 Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/32 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/30 Services specially adapted for particular environments, situations or purposes
    • H04W4/40 Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
    • H04W4/48 Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P] for in-vehicle communication

Definitions

  • the present disclosure relates to the field of computer technology, and specifically, to a directional sound channel selection method, an electronic device, a computer-readable storage medium, and a vehicle.
  • Embodiments of the present disclosure provide a directional sound channel selection method, an electronic device, a computer-readable storage medium, and a vehicle.
  • a method for selecting a directional sound channel, including: obtaining an image to be recognized that reflects objects within a predetermined angle range around a target object; determining position information of a target person in the image to be recognized; and using, as a target directional sound channel, a directional sound channel whose sound propagation direction matches the position information of the target person.
  • an electronic device includes: a memory on which an executable program is stored; and one or more processors.
  • when the one or more processors call the executable program, the selection method provided by the first aspect of the present disclosure can be implemented.
  • a computer-readable storage medium is provided, on which an executable program is stored.
  • when the executable program is called, the selection method provided by the first aspect of the present disclosure can be implemented.
  • a vehicle includes a vehicle body, an electronic device, a camera device, and a plurality of intercom devices, wherein the electronic device is an electronic device provided by the second aspect of the present disclosure.
  • the vehicle body is the target object
  • the camera device is used to acquire the image to be recognized
  • the plurality of intercom devices respectively correspond to a plurality of directional sound channels with different sound propagation directions.
  • Figure 1 is a flow chart of an embodiment of a directional sound channel selection method provided by the present disclosure
  • FIG. 2 is a flow chart of an implementation of step S130;
  • FIG. 3 is a flow chart of an implementation of step S120;
  • Figure 4 is a flow chart of another embodiment of the directional sound channel selection method provided by the present disclosure.
  • Figure 5 is a schematic diagram showing the correspondence between the directional sound channel and the area of the image to be recognized.
  • Figure 6 is a schematic diagram showing turning on one intercom device of the vehicle and turning off other intercom devices.
  • a directional sound channel selection method is provided. As shown in FIG. 1, the selection method includes steps S110 to S130.
  • in step S110, an image to be recognized that reflects objects within a predetermined angle range around the target object is obtained.
  • in step S120, the position information of the target person in the image to be recognized is determined.
  • in step S130, the directional sound channel whose sound propagation direction matches the position information of the target person is used as the target directional sound channel.
  • the target object may be a vehicle or other equipment that requires safety, such as a counter window.
  • the predetermined angle range is not particularly limited and can be determined according to the specific type of the target object and the specific scene in which the target object is located.
  • the predetermined angle may be 360°, so that the image to be recognized may be a panoramic image around the target object.
  • the present disclosure is not limited to this.
  • for example, when the target object is a bank counter window, the predetermined angle may not exceed 180°.
  • after the image to be recognized within the predetermined angle range around the target object is acquired, it is recognized, and once a target person is recognized, the directional sound channel in the corresponding direction is used as the target directional sound channel. The intercom device corresponding to the target directional sound channel is then turned on, so that a person inside the target object can talk with a person outside it while the inside remains closed (for example, without opening the window).
  • AI: artificial intelligence
  • a target portrait in the panoramic image can be identified by inputting the image to be recognized into a deep learning neural network.
  • the "image to be recognized" here can be a panoramic image, captured by a panoramic camera, that covers the surroundings of the target object, or a combination of sub-images captured by multiple cameras facing different directions.
  • the location information of the target person is not particularly limited.
  • the position information of the target person may be the relative position of the target person relative to the target object.
  • the position of the target person relative to the target object may be the angle through which the target person is rotated relative to a reference line set on the target object.
  • the sound propagation direction of each directional sound channel can cover a certain angular range. After determining the angle that the target person rotates relative to the reference line, the target directional sound channel can be determined based on the angular range covered by each directional sound channel.
  • the position information of the target person includes the coordinates of the characteristic pixels that make up the portrait of the target person in the image to be recognized.
  • as shown in FIG. 2, using the directional sound channel whose sound propagation direction matches the position information of the target person as the target directional sound channel (i.e., step S130) includes the following steps S131 and S132.
  • in step S131, the identification information of the directional sound channel corresponding to the coordinates of the characteristic pixel is determined according to a mapping table, where the mapping table includes mapping relationships between pixel coordinate ranges and the identification information of directional sound channels.
  • in step S132, the directional sound channel identified by that identification information is used as the target directional sound channel.
  • each directional sound channel corresponds to a pixel coordinate range. If the characteristic pixel of the portrait falls within the pixel coordinate range of a certain directional sound channel, then this directional sound channel is the one that matches the target person.
  • the "identification information" may be a "number".
  • Table 1 below shows an implementation of the mapping table.
  • which pixel of the portrait serves as the "characteristic pixel of the portrait" is not particularly limited, as long as it reflects the position of the corresponding person.
  • the pixel at the center of the portrait can be used as the "characteristic pixel of the portrait".
  • each directional sound channel corresponds to a rectangular area on the image to be recognized.
  • in Table 1, (x1, y1) and (x1', y1') are the coordinates of the upper-left and lower-right pixels of the area corresponding to the directional sound channel numbered 1; (x2, y2) and (x2', y2') are those of the area for channel 2; (x3, y3) and (x3', y3') are those of the area for channel 3; (x4, y4) and (x4', y4') are those of the area for channel 4; and so on.
  • the characteristic pixels of the target person's portrait can be used as the pixels of the center point of the human face in the target person's portrait.
  • the image to be recognized may include sub-images captured by multiple cameras.
  • when the image to be recognized includes sub-images captured by multiple cameras, the mapping relationships in the mapping table may include mapping relationships among the identification information of the cameras, the pixel coordinate ranges, and the identification information of the directional sound channels.
  • the identification information of the camera may be a camera number.
  • Table 2 below shows an implementation of the above mapping table.
  • each directional sound channel corresponds to a rectangular area on the image to be recognized.
  • in Table 2, (x1, y1) and (x1', y1') are the coordinates of the upper-left and lower-right pixels of the area corresponding to the directional sound channel numbered 1; (x2, y2) and (x2', y2') are those of the area for channel 2; (x3, y3) and (x3', y3') are those of the area for channel 3; (x4, y4) and (x4', y4') are those of the area for channel 4; and so on.
  • determining the position information of the target person in the image to be recognized includes the following steps S121 to S124.
  • in step S121, the portraits in the image to be recognized are identified.
  • in step S122, the distance between the person corresponding to each portrait and the target object is determined.
  • in step S123, a person whose distance from the target object does not exceed a predetermined distance threshold is used as the target person.
  • in step S124, the location information of the target person is determined.
  • since there may be many people around the target object and only those close enough to it are likely to be talking with people inside it, a person whose distance from the target object does not exceed the predetermined distance threshold is used as the target person.
  • step S121 can be performed through AI recognition.
  • in the step of determining the distance between the person corresponding to the portrait and the target object (i.e., step S122), the following formula is used to calculate that distance.
  • distance: D = (Wf*F)/Wp, where Wf is the face width, F is the focal length of the camera that captured the portrait, and Wp is the face width in pixels;
  • face width can be an empirical value.
  • the width of a human face is usually between 12cm and 14cm.
  • a camera device (for example, a camera) is used to capture the image to be recognized.
  • as shown in FIG. 4, before step S110, the selection method further includes the following steps S102 and S104.
  • in step S102, a video stream is received.
  • in step S104, each video frame in the video stream is used as the image to be recognized.
  • each frame of image in the video stream can be used as an image to be recognized.
  • whenever a target person is recognized in any frame, the corresponding directional sound channel needs to be determined.
  • an electronic device includes: a memory on which an executable program is stored; and one or more processors.
  • when the one or more processors call the executable program, the directional sound channel selection method provided by the first aspect of the present disclosure can be implemented.
  • when the target object is a vehicle, the electronic device may be a vehicle-mounted device provided inside the vehicle.
  • the electronic device may also include one or more I/O interfaces, connected between the processor and the memory, and configured to implement information exchange between the processor and the memory.
  • a processor is a device with data processing capabilities, including but not limited to a central processing unit (CPU), etc.
  • a memory is a device with data storage capabilities, including but not limited to random access memory (RAM, more specifically SDRAM, DDR, etc.), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory (FLASH); the I/O interface (read-write interface) is connected between the processor and the memory, enables information exchange between them, and includes but is not limited to a data bus (Bus).
  • RAM: random access memory
  • ROM: read-only memory
  • EEPROM: electrically erasable programmable read-only memory
  • FLASH: flash memory
  • I/O interface: read-write interface
  • the processor, memory, and I/O interfaces are connected to each other and, in turn, to other components of the computing device via a bus.
  • a computer-readable storage medium is provided, on which an executable program is stored; when the executable program is called, the directional sound channel selection method provided by the first aspect of the present disclosure can be implemented.
  • a vehicle includes a vehicle body, electronic equipment, a camera device, and a plurality of intercom devices, wherein the electronic equipment is the above-mentioned electronic equipment provided by the present disclosure, the vehicle body is the target object, the camera device is used to acquire the image to be recognized, and the plurality of intercom devices respectively correspond to multiple directional sound channels with different sound propagation directions.
  • after the image to be recognized within the predetermined angle range around the vehicle body is acquired, it is recognized, and once a target person is recognized, the directional sound channel in the corresponding direction is used as the target directional sound channel. The intercom device corresponding to the target directional sound channel is then turned on, so that a person inside the target object can talk with a person outside it while the inside remains closed (for example, without opening the window).
  • the specific type and structure of the intercom device are not particularly limited.
  • the intercom device includes an intercom host and an intercom auxiliary unit matched with the intercom host.
  • the intercom host is provided inside the vehicle body, and the intercom auxiliary unit is provided on the outer surface of the vehicle body.
  • a plurality of the intercom devices are arranged around the vehicle body.
  • the main intercom may include a microphone and a speaker
  • the auxiliary intercom may include a sound directional microphone and a sound directional external speaker. Multiple sound-directional microphones and multiple sound-directional external speakers can cover all areas around the car body.
  • both the main intercom and the secondary intercom are turned off by default.
  • the camera device and the electronic device can be connected via wireless (e.g., Wi-Fi, Bluetooth) or wired communication
  • the intercom device can be communicatively connected to the electronic device via wireless (e.g., Wi-Fi, Bluetooth) or wired communication.
  • the camera device may include multiple cameras with different orientations; as another optional implementation, the camera device may be a panoramic camera.
  • the camera device uses Real Time Streaming Protocol (RTSP) to push the video stream to the electronic device.
  • RTSP: Real Time Streaming Protocol
  • each camera sends the collected images to the electronic device in real time, where the camera focal length is F.
  • based on image deep learning, the electronic device trains an image object classification model that supports face recognition.
  • the electronic device uses the frames of the received video stream as images to be recognized for face recognition, and obtains the pixel coordinates of the face's center pixel in the image.
  • the electronic device determines that there is a human body in the image captured by camera N, and calculates the distance between the person and the camera.
  • the usual calculation method is: (face width*F)/face pixel width.
  • if the electronic device determines that the distance between the person and the vehicle is less than a preset value, it matches against the mapping table of camera number, image pixel block (X, Y, X', Y'), and directional sound channel.
  • the camera is matched first (camera number N), and then the pixel block: the face-center pixel (X-p, Y-p) must satisfy X-p>=X, Y-p>=Y, X-p<=X' and Y-p<=Y'. Once both match, the corresponding directional sound channel number is found.
  • using this directional sound channel number, the electronic device turns on the intercom device of this directional sound channel and turns off the intercom devices of the other directional sound channels.
  • centered on the vehicle's center point, the 360° surroundings of the vehicle are divided into 8 areas (area 1 to area 8).
  • the 8 areas correspond to 8 directional sound channels respectively. If a target person is detected in area 1, the intercom devices in areas 2 to 8 are turned off and the intercom device in area 1 is turned on. In this way, people in the car can talk with a person outside the car in area 1 without opening the car window.
  • Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
  • computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer.
  • communication media typically embodies computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
  • Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, it will be apparent to those skilled in the art that, unless expressly stated otherwise, features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics, and/or elements described in connection with other embodiments. Accordingly, it will be understood by those skilled in the art that various changes in form and details may be made without departing from the scope of the present disclosure as set forth in the appended claims.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Acoustics & Sound (AREA)
  • Image Analysis (AREA)

Abstract

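The present disclosure provides a directional sound channel selection method, including: acquiring an image to be recognized that reflects objects within a predetermined angle range around a target object; determining position information of a target person in the image to be recognized; and using, as a target directional sound channel, a directional sound channel whose sound propagation direction matches the position information of the target person. The present disclosure further provides an electronic device, a storage medium, and a vehicle.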

Description

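Directional Sound Channel Selection Method, Electronic Device, Medium, and Vehicle
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a directional sound channel selection method, an electronic device, a computer-readable storage medium, and a vehicle.
Background
Vehicles have become a primary means of transportation. With growing attention to public health and safety, the number of times car windows are opened should be kept to a minimum. However, when a person inside a vehicle needs to talk with someone outside, opening the window is unavoidable, and doing so introduces various safety hazards.
Therefore, how to eliminate the safety hazards that arise when people inside a vehicle talk with people outside it has become a technical problem to be urgently solved in this field.
Summary
Embodiments of the present disclosure provide a directional sound channel selection method, an electronic device, a computer-readable storage medium, and a vehicle.
As a first aspect of the present disclosure, a directional sound channel selection method is provided, including: acquiring an image to be recognized that reflects objects within a predetermined angle range around a target object; determining position information of a target person in the image to be recognized; and using, as a target directional sound channel, a directional sound channel whose sound propagation direction matches the position information of the target person.
As a second aspect of the present disclosure, an electronic device is provided, including: a memory on which an executable program is stored; and one or more processors which, when calling the executable program, can implement the selection method provided by the first aspect of the present disclosure.
As a third aspect of the present disclosure, a computer-readable storage medium is provided, on which an executable program is stored; when the executable program is called, the selection method provided by the first aspect of the present disclosure can be implemented.
As a fourth aspect of the present disclosure, a vehicle is provided, including a vehicle body, an electronic device, a camera device, and a plurality of intercom devices, wherein the electronic device is the electronic device provided by the second aspect of the present disclosure, the vehicle body is the target object, the camera device is configured to acquire the image to be recognized, and the plurality of intercom devices respectively correspond to a plurality of directional sound channels with mutually different sound propagation directions.
Brief Description of the Drawings
FIG. 1 is a flowchart of an embodiment of the directional sound channel selection method provided by the present disclosure;
FIG. 2 is a flowchart of an implementation of step S130;
FIG. 3 is a flowchart of an implementation of step S120;
FIG. 4 is a flowchart of another embodiment of the directional sound channel selection method provided by the present disclosure;
FIG. 5 is a schematic diagram showing the correspondence between directional sound channels and regions of the image to be recognized; and
FIG. 6 is a schematic diagram showing one intercom device of the vehicle turned on and the other intercom devices turned off.
Detailed Description
To enable those skilled in the art to better understand the technical solutions of the present disclosure, the directional sound channel selection method, electronic device, computer-readable storage medium, and vehicle provided by the present disclosure are described in detail below with reference to the accompanying drawings.
Example embodiments are described more fully below with reference to the accompanying drawings, but they may be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the present disclosure to those skilled in the art.
Where no conflict arises, the embodiments of the present disclosure and the features in the embodiments may be combined with each other.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will further be understood that the terms "comprising" and/or "made of", when used in this specification, specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will further be understood that terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As a first aspect of the present disclosure, a directional sound channel selection method is provided. As shown in FIG. 1, the selection method includes steps S110 to S130.
In step S110, an image to be recognized that reflects objects within a predetermined angle range around a target object is acquired.
In step S120, position information of a target person in the image to be recognized is determined.
In step S130, the directional sound channel whose sound propagation direction matches the position information of the target person is used as the target directional sound channel.
A plurality of intercom devices are provided on the target object, and they respectively correspond to a plurality of directional sound channels with mutually different sound propagation directions. The target object may be a vehicle or another piece of equipment with safety requirements, such as a counter window.
In the present disclosure, the predetermined angle range is not particularly limited and may be determined according to the specific type of the target object and the specific scene in which it is located. As an optional implementation, the predetermined angle may be 360°, so that the image to be recognized may be a panoramic image around the target object.
Of course, the present disclosure is not limited to this. For example, when the target object is a bank counter window, the predetermined angle may not exceed 180°.
After the image to be recognized within the predetermined angle range around the target object is acquired, it is recognized; once a target person is recognized, the directional sound channel in the corresponding direction is used as the target directional sound channel. The intercom device corresponding to the target directional sound channel is then controlled to turn on, so that a person inside the target object can talk with a person outside it while the inside remains closed (for example, without opening the car window).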
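The per-image flow described above (an image from S110 goes through S120 and then S130) can be sketched as a small function. This is a minimal sketch, not the disclosure's implementation: it assumes the recognition of S120 and the matching of S130 are supplied as callables, and the names `select_target_channel`, `locate_target`, `match_channel` and the `Position` tuple are hypothetical.

```python
from typing import Any, Callable, Optional, Tuple

# Hypothetical position type: (camera number, feature-pixel x, feature-pixel y).
Position = Tuple[int, int, int]


def select_target_channel(
    image: Any,
    locate_target: Callable[[Any], Optional[Position]],
    match_channel: Callable[[Position], Optional[int]],
) -> Optional[int]:
    """Run steps S120 and S130 on one image acquired in step S110.

    Returns the number of the target directional sound channel, or None when
    no target person is found (every intercom device then stays closed).
    """
    position = locate_target(image)  # S120: position info of the target person
    if position is None:
        return None
    return match_channel(position)  # S130: channel whose direction matches


# Usage with stand-in callables (real ones would wrap a face detector and the
# mapping table): a person at pixel (120, 300) of camera 1 maps to channel 3.
channel = select_target_channel(
    image=None,
    locate_target=lambda img: (1, 120, 300),
    match_channel=lambda pos: 3,
)
print(channel)  # prints 3
```

Keeping detection and matching as injected callables lets either position representation described below (an angle or pixel coordinates) plug into the same flow.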
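In the present disclosure, artificial intelligence (AI) technology can be used to recognize whether a target portrait is present in the panoramic image. For example, the image to be recognized can be input into a deep learning neural network to recognize whether a target portrait is present in the panoramic image.
It should be noted that the "image to be recognized" here may be a panoramic image, captured by a panoramic camera, that covers the surroundings of the target object, or a combination of sub-images captured by multiple cameras facing different directions.
In the present disclosure, the position information of the target person is not particularly limited. As an optional implementation, the position information of the target person may be the position of the target person relative to the target object.
For example, a reference line may be set on the target object. The position of the target person relative to the target object may then be the angle through which the target person is rotated relative to the reference line. The sound propagation direction of each directional sound channel covers a certain angular range, so once the angle of the target person relative to the reference line is determined, the target directional sound channel can be determined from the angular ranges covered by the directional sound channels.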
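A minimal sketch of the reference-line variant, assuming angles are measured in degrees from the reference line and each directional sound channel covers a half-open arc [start, end) that may wrap past 360°. The four-channel `CHANNEL_RANGES` table and the function names are illustrative assumptions; the disclosure does not specify the ranges.

```python
from typing import Dict, Optional, Tuple

# Hypothetical coverage table: channel number -> (start, end) in degrees, measured
# from the reference line. Channel 1 wraps past 360 degrees.
CHANNEL_RANGES: Dict[int, Tuple[float, float]] = {
    1: (315.0, 45.0),
    2: (45.0, 135.0),
    3: (135.0, 225.0),
    4: (225.0, 315.0),
}


def angle_in_range(angle: float, start: float, end: float) -> bool:
    """True if `angle` lies in the half-open arc [start, end), handling wrap-around.

    An arc with start == end is treated as empty.
    """
    a, s, e = angle % 360.0, start % 360.0, end % 360.0
    if s <= e:
        return s <= a < e
    return a >= s or a < e  # the arc crosses the 0/360 degree boundary


def channel_for_angle(angle: float,
                      ranges: Dict[int, Tuple[float, float]] = CHANNEL_RANGES) -> Optional[int]:
    """Number of the channel whose arc contains the target person's angle, else None."""
    for channel, (start, end) in ranges.items():
        if angle_in_range(angle, start, end):
            return channel
    return None


print([channel_for_angle(a) for a in (10, 350, 90, 180, 300)])  # prints [1, 1, 2, 3, 4]
```

Half-open arcs keep two adjacent channels from both claiming a boundary angle such as 45°.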
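Of course, the present disclosure is not limited to this. As another optional implementation, the position information of the target person includes the coordinates, in the image to be recognized, of a characteristic pixel of the target person's portrait.
Accordingly, as shown in FIG. 2, using the directional sound channel whose sound propagation direction matches the position information of the target person as the target directional sound channel (i.e., step S130) includes the following steps S131 and S132.
In step S131, identification information of the directional sound channel corresponding to the coordinates of the characteristic pixel is determined according to a mapping table, where the mapping table includes mapping relationships between pixel coordinate ranges and identification information of directional sound channels.
In step S132, the directional sound channel identified by that identification information is used as the target directional sound channel.
In the above implementation, each directional sound channel corresponds to one pixel coordinate range. If the characteristic pixel of a portrait falls within the pixel coordinate range of a certain directional sound channel, that directional sound channel is the one matching the target person. As an optional implementation, the "identification information" may be a "number".
Table 1 below shows an implementation of the mapping table.
Table 1
In the present disclosure, which pixel of the portrait serves as the "characteristic pixel of the portrait" is not particularly limited, as long as it reflects the position of the corresponding person. As an optional implementation, the pixel at the center of the portrait may be used as the "characteristic pixel of the portrait".
As shown in FIG. 5, each directional sound channel corresponds to a rectangular region of the image to be recognized. Accordingly, in Table 1: (x1, y1) and (x1', y1') are the coordinates of the upper-left and lower-right pixels of the region corresponding to the directional sound channel numbered 1; (x2, y2) and (x2', y2') are those of the region for channel 2; (x3, y3) and (x3', y3') are those of the region for channel 3; (x4, y4) and (x4', y4') are those of the region for channel 4; and so on.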
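Steps S131 and S132 amount to a point-in-rectangle lookup. The sketch below assumes a single panoramic image, image coordinates with the origin at the upper-left corner and y growing downward, inclusive corners, and a hypothetical four-strip layout for a 1920x1080 panorama; `Region`, `TABLE_1`, `box_center` and `lookup_channel` are illustrative names, and the bounding-box center stands in for the "characteristic pixel".

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Region(NamedTuple):
    """One row of the mapping table: rectangle (x, y)-(x', y') and its channel number."""
    x: int        # upper-left corner x
    y: int        # upper-left corner y
    x2: int       # lower-right corner x (x' in Table 1)
    y2: int       # lower-right corner y (y' in Table 1)
    channel: int  # identification information (number) of the directional channel


# Hypothetical Table 1 for a 1920x1080 panorama split into four vertical strips.
TABLE_1 = [
    Region(0, 0, 479, 1079, 1),
    Region(480, 0, 959, 1079, 2),
    Region(960, 0, 1439, 1079, 3),
    Region(1440, 0, 1919, 1079, 4),
]


def box_center(left: int, top: int, right: int, bottom: int) -> Tuple[int, int]:
    """Characteristic pixel taken as the center of a portrait bounding box."""
    return (left + right) // 2, (top + bottom) // 2


def lookup_channel(table: Sequence[Region], px: int, py: int) -> Optional[int]:
    """S131/S132: number of the channel whose rectangle contains (px, py), else None."""
    for r in table:
        # Image coordinates: origin at the upper-left corner, y grows downward.
        if r.x <= px <= r.x2 and r.y <= py <= r.y2:
            return r.channel
    return None


print(lookup_channel(TABLE_1, *box_center(500, 200, 600, 320)))  # prints 2
```

If rectangles overlap, this sketch returns the first matching row; the disclosure does not say how overlaps would be resolved.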
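Because sound needs to be played to the target person, determining the target directional sound channel from the position of the face lets the person outside the vehicle hear the sound better, and also carries that person's voice into the vehicle better. Therefore, the characteristic pixel of the target person's portrait may be the pixel at the center point of the face in the portrait.
As described above, the image to be recognized may include sub-images captured by multiple cameras. In that case, the mapping relationships in the mapping table may include mapping relationships among camera identification information, pixel coordinate ranges, and directional sound channel identification information. The camera identification information may be a camera number.
Table 2 below shows an implementation of the above mapping table.
Table 2

As shown in FIG. 5, each directional sound channel corresponds to a rectangular region of the image to be recognized. Accordingly, in Table 2: (x1, y1) and (x1', y1') are the coordinates of the upper-left and lower-right pixels of the region corresponding to the directional sound channel numbered 1; (x2, y2) and (x2', y2') are those of the region for channel 2; (x3, y3) and (x3', y3') are those of the region for channel 3; (x4, y4) and (x4', y4') are those of the region for channel 4; and so on.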
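With several cameras, the same pixel coordinates occur in every sub-image, so the camera number has to be part of the key. The following is a minimal sketch of a Table 2 lookup that matches the camera number first and then the pixel block. It assumes two hypothetical 1280x720 cameras; the table values and the function name are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# (x, y, x', y', channel number); hypothetical values for two 1280x720 cameras.
Rect = Tuple[int, int, int, int, int]

TABLE_2: Dict[int, List[Rect]] = {
    1: [(0, 0, 639, 719, 1), (640, 0, 1279, 719, 2)],  # camera 1
    2: [(0, 0, 639, 719, 3), (640, 0, 1279, 719, 4)],  # camera 2
}


def lookup_channel(table: Dict[int, List[Rect]], camera: int,
                   px: int, py: int) -> Optional[int]:
    """Match the camera number first, then the pixel block containing (px, py)."""
    rects = table.get(camera)
    if rects is None:
        return None  # no rows for this camera number
    for x, y, x2, y2, channel in rects:
        if x <= px <= x2 and y <= py <= y2:
            return channel
    return None


# The same pixel maps to different channels depending on which camera saw it.
print(lookup_channel(TABLE_2, 1, 100, 100), lookup_channel(TABLE_2, 2, 100, 100))  # prints 1 3
```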
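In the present disclosure, as shown in FIG. 3, determining the position information of the target person in the image to be recognized (i.e., step S120) includes the following steps S121 to S124.
In step S121, portraits in the image to be recognized are recognized.
In step S122, the distance between the person corresponding to each portrait and the target object is determined.
In step S123, a person whose distance from the target object does not exceed a predetermined distance threshold is used as the target person.
In step S124, the position information of the target person is determined.
There may be many people around the target object, and only those close enough to it are likely to be target persons talking with people inside it. Therefore, in embodiments of the present disclosure, a person whose distance from the target object does not exceed the predetermined distance threshold is used as the target person.
As described above, step S121 can be performed by AI recognition.
In the present disclosure, in the step of determining the distance between the person corresponding to the portrait and the target object (i.e., step S122), the distance is calculated with the following formula:
D=(Wf*F)/Wp;
where D is the distance between the person corresponding to the portrait and the target object, Wf is the face width, F is the focal length of the camera that captured the portrait, and Wp is the face width in pixels.
It should be noted that the "face width" may be an empirical value; for example, a human face is typically 12 cm to 14 cm wide.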
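The distance formula is the similar-triangles relation of a pinhole camera. A minimal sketch, assuming F is expressed in pixels (so that D comes out in the same length unit as Wf), a face width of 13 cm taken from the 12 cm to 14 cm range above, and that the camera-to-person distance is used as the person-to-object distance because the camera is mounted on the target object. For example, with F = 1000 px and Wp = 65 px, D = 13 × 1000 / 65 = 200 cm. `Face`, `estimate_distance_cm` and `select_targets` are hypothetical names.

```python
from typing import List, NamedTuple

FACE_WIDTH_CM = 13.0  # assumed empirical face width (midpoint of 12 to 14 cm)


class Face(NamedTuple):
    camera: int       # number of the camera that captured the face
    cx: int           # face-center pixel x
    cy: int           # face-center pixel y
    width_px: float   # face width in pixels (Wp)


def estimate_distance_cm(width_px: float, focal_px: float,
                         face_width_cm: float = FACE_WIDTH_CM) -> float:
    """D = (Wf * F) / Wp under a pinhole-camera model; F is in pixels."""
    if width_px <= 0:
        raise ValueError("face width in pixels must be positive")
    return face_width_cm * focal_px / width_px


def select_targets(faces: List[Face], focal_px: float,
                   max_distance_cm: float) -> List[Face]:
    """S122/S123: keep only people within the distance threshold."""
    return [f for f in faces
            if estimate_distance_cm(f.width_px, focal_px) <= max_distance_cm]


faces = [Face(1, 550, 260, 65.0), Face(1, 1200, 240, 20.0)]
print([estimate_distance_cm(f.width_px, 1000.0) for f in faces])  # prints [200.0, 650.0]
print(select_targets(faces, focal_px=1000.0, max_distance_cm=300.0))
```

A smaller face in the image yields a larger estimated distance, which is why the distant second face is filtered out by a 300 cm threshold.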
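In the present disclosure, a camera device (for example, a camera) is used to capture the image to be recognized. To determine the target person accurately, as shown in FIG. 4, before the image to be recognized that reflects objects within the predetermined angle range around the target object is acquired (i.e., step S110), the selection method further includes the following steps S102 and S104.
In step S102, a video stream is received.
In step S104, each video frame in the video stream is used as an image to be recognized.
That is, in the present disclosure, every frame of the video stream can be used as an image to be recognized, and whenever a target person is recognized in any frame, the corresponding directional sound channel needs to be determined.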
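Treating every frame as an image to be recognized (S102/S104) is a loop over the stream. The sketch below assumes the frames arrive as any Python iterable and that a per-frame `select` callable returns a channel number or None; both names, and `channels_from_stream`, are hypothetical. In practice the frames might come from an RTSP source, for example via OpenCV's `cv2.VideoCapture`, whose `read()` returns an `(ok, frame)` pair; that adapter is omitted here to keep the example dependency-free.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple, TypeVar

Frame = TypeVar("Frame")


def channels_from_stream(
    frames: Iterable[Frame],
    select: Callable[[Frame], Optional[int]],
) -> Iterator[Tuple[int, int]]:
    """S102/S104: treat every frame as an image to be recognized.

    Yields (frame index, target channel number) for each frame in which a
    target person is found; frames without a target are skipped.
    """
    for index, frame in enumerate(frames):
        channel = select(frame)
        if channel is not None:
            yield index, channel


# Stand-in frames: each "frame" is just the channel its detector would return.
print(list(channels_from_stream([None, 2, None, 5], select=lambda f: f)))  # prints [(1, 2), (3, 5)]
```

Because the function is a generator, frames are processed lazily as they arrive rather than buffered.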
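As a second aspect of the present disclosure, an electronic device is provided, including: a memory on which an executable program is stored; and one or more processors which, when calling the executable program, can implement the directional sound channel selection method provided by the first aspect of the present disclosure.
In the present disclosure, when the target object is a vehicle, the electronic device may be an in-vehicle device installed inside the vehicle.
Optionally, the electronic device may further include one or more I/O interfaces connected between the processor and the memory and configured to enable information exchange between them.
The processor is a device with data processing capability, including but not limited to a central processing unit (CPU). The memory is a device with data storage capability, including but not limited to random access memory (RAM, more specifically SDRAM, DDR, etc.), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory (FLASH). The I/O interface (read-write interface) is connected between the processor and the memory, enables information exchange between them, and includes but is not limited to a data bus (Bus).
In some embodiments, the processor, the memory, and the I/O interface are connected to each other through a bus, and in turn to other components of the computing device.
As a third aspect of the present disclosure, a computer-readable storage medium is provided, on which an executable program is stored; when the executable program is called, the directional sound channel selection method provided by the first aspect of the present disclosure can be implemented.
As a fourth aspect of the present disclosure, a vehicle is provided, including a vehicle body, an electronic device, a camera device, and a plurality of intercom devices, wherein the electronic device is the above electronic device provided by the present disclosure, the vehicle body is the target object, the camera device is configured to acquire the image to be recognized, and the plurality of intercom devices respectively correspond to a plurality of directional sound channels with mutually different sound propagation directions.
After the image to be recognized within the predetermined angle range around the vehicle body is acquired, it is recognized; once a target person is recognized, the directional sound channel in the corresponding direction is used as the target directional sound channel. The intercom device corresponding to the target directional sound channel is then controlled to turn on, so that a person inside the target object can talk with a person outside it while the inside remains closed (for example, without opening the car window).
In the present disclosure, the specific type and structure of the intercom device are not particularly limited. As an optional implementation, the intercom device includes an intercom host and an intercom auxiliary unit matched with the intercom host; the intercom host is arranged inside the vehicle body, and the intercom auxiliary unit is arranged on the outer surface of the vehicle body. The plurality of intercom devices are arranged around the vehicle body.
In the present disclosure, the intercom host may include a microphone and a loudspeaker, and the intercom auxiliary unit may include a directional microphone and a directional external loudspeaker. Together, the multiple directional microphones and directional external loudspeakers can cover all areas around the vehicle body.
Optionally, both the intercom host and the intercom auxiliary unit are off by default.
As an optional implementation, the camera device may be communicatively connected to the electronic device in a wireless (for example, Wi-Fi or Bluetooth) or wired manner, and the intercom devices may likewise be communicatively connected to the electronic device in a wireless (for example, Wi-Fi or Bluetooth) or wired manner.
As an optional implementation, the camera device may include multiple cameras facing different directions; as another optional implementation, the camera device may be a panoramic camera.
As an optional implementation, the camera device pushes the video stream to the electronic device using the Real Time Streaming Protocol (RTSP).
The specific process for enabling conversation between people inside and outside the vehicle is described below.
Using RTSP video-stream pushing, each camera sends the captured images to the electronic device in real time, where the camera focal length is F.
Based on image deep learning technology, the electronic device trains an image object classification model that supports face recognition. The electronic device uses the frames of the received video stream as images to be recognized, performs face recognition on them, and obtains the pixel coordinates of the face's center pixel in the image.
When the electronic device determines that a human body is present in the image captured by camera number N, it calculates the distance between the person and the camera, typically as (face width*F)/face pixel width.
If the electronic device determines that the distance between the person and the vehicle is less than a preset value, it performs matching against the mapping table of camera number, image pixel block (X, Y, X', Y'), and directional sound channel. The camera is matched first, by finding camera number N; then the pixel block is matched: the pixel coordinates (X-p, Y-p) of the face center must lie within the image pixel block, i.e., X-p>=X, Y-p>=Y, X-p<=X' and Y-p<=Y'. Once both match, the corresponding directional sound channel number is found.
According to the directional sound channel number, the electronic device turns on the intercom device of that directional sound channel and turns off the intercom devices of the other directional sound channels.
As shown in FIG. 6, the 360° surroundings of the vehicle, centered on the vehicle's center point, are divided into 8 areas (area 1 to area 8), which correspond to 8 directional sound channels respectively. If a target person is detected in area 1, the intercom devices of areas 2 to 8 are turned off and the intercom device of area 1 is turned on. In this way, people inside the vehicle can talk with a person outside the vehicle in area 1 without opening the window.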
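The eight-area example can be sketched as a sector computation followed by an on/off map. FIG. 6 is not reproduced here, so the sketch assumes that positions are given in meters in a vehicle frame centered on the vehicle's center point (x forward, y to the left), that area 1 starts at the forward direction, and that areas are numbered counter-clockwise in 45° steps; all intercoms start closed, matching the default-off state described earlier. The function names are hypothetical.

```python
import math
from typing import Dict, Optional

NUM_AREAS = 8
SECTOR_DEG = 360.0 / NUM_AREAS  # 45 degrees per area


def area_for_point(x: float, y: float) -> int:
    """Area number (1..8) of a point relative to the vehicle center.

    Assumed convention: x points forward, y points left, angles are measured
    counter-clockwise from the forward axis, and area 1 starts at 0 degrees.
    """
    angle = math.degrees(math.atan2(y, x)) % 360.0
    # The final "% NUM_AREAS" guards against angle == 360.0 after rounding.
    return int(angle // SECTOR_DEG) % NUM_AREAS + 1


def switch_intercoms(target_area: Optional[int]) -> Dict[int, bool]:
    """Open only the target area's intercom; all others stay closed (default off)."""
    return {area: area == target_area for area in range(1, NUM_AREAS + 1)}


area = area_for_point(2.0, 0.3)  # person slightly left of straight ahead
print(area, switch_intercoms(area))
```

Returning the complete on/off map, rather than only the channel to open, mirrors the description: turning one intercom on always goes together with turning the other seven off.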
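Those of ordinary skill in the art will understand that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and devices, may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all physical components may be implemented as software executed by a processor such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, it will be apparent to those skilled in the art that, unless expressly stated otherwise, features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics, and/or elements described in connection with other embodiments. Accordingly, those skilled in the art will understand that various changes in form and details may be made without departing from the scope of the present disclosure as set forth in the appended claims.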

Claims (12)

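  1. A directional sound channel selection method, comprising:
    acquiring an image to be recognized that reflects objects within a predetermined angle range around a target object;
    determining position information of a target person in the image to be recognized; and
    using, as a target directional sound channel, a directional sound channel whose sound propagation direction matches the position information of the target person.
  2. The selection method according to claim 1, wherein the position information of the target person comprises coordinates, in the image to be recognized, of a characteristic pixel of a portrait of the target person, and
    using, as the target directional sound channel, the directional sound channel whose sound propagation direction matches the position information of the target person comprises:
    determining, according to a mapping table, identification information of the directional sound channel corresponding to the coordinates of the characteristic pixel, wherein the mapping table comprises a mapping relationship between pixel coordinate ranges and identification information of directional sound channels; and
    using, as the target directional sound channel, the directional sound channel corresponding to the identification information of the directional sound channel corresponding to the characteristic pixel.
  3. The selection method according to claim 2, wherein the characteristic pixel of the portrait of the target person is the pixel at the center point of the face in the portrait of the target person.
  4. The selection method according to claim 2, wherein the image to be recognized comprises sub-images captured by a plurality of cameras, and
    the mapping relationship in the mapping table comprises a mapping relationship among identification information of the cameras, the pixel coordinate ranges, and the identification information of the directional sound channels.
  5. The selection method according to claim 1, wherein determining the position information of the target person in the image to be recognized comprises:
    recognizing a portrait in the image to be recognized;
    determining a distance between a person corresponding to the portrait and the target object;
    using, as the target person, a person whose distance from the target object does not exceed a predetermined distance threshold; and
    determining the position information of the target person.
  6. The selection method according to claim 5, wherein, in the step of determining the distance between the person corresponding to the portrait and the target object, the distance is calculated using the following formula:
    D=(Wf*F)/Wp;
    where D is the distance between the person corresponding to the portrait and the target object;
    Wf is the face width;
    F is the focal length of the camera that captured the portrait; and
    Wp is the face width in pixels.
  7. The selection method according to any one of claims 1 to 6, wherein, before acquiring the image to be recognized that reflects objects within the predetermined angle range around the target object, the selection method further comprises:
    receiving a video stream; and
    using each video frame in the video stream as the image to be recognized.
  8. The selection method according to any one of claims 1 to 6, wherein the predetermined angle is 360°.
  9. An electronic device, comprising:
    a memory on which an executable program is stored; and
    one or more processors which, when calling the executable program, are capable of implementing the selection method according to any one of claims 1 to 8.
  10. A computer-readable storage medium on which an executable program is stored, wherein the selection method according to any one of claims 1 to 8 can be implemented when the executable program is called.
  11. A vehicle, comprising a vehicle body, an electronic device, a camera device, and a plurality of intercom devices, wherein
    the electronic device is the electronic device according to claim 9,
    the vehicle body is the target object,
    the camera device is configured to acquire the image to be recognized, and
    the plurality of intercom devices respectively correspond to a plurality of directional sound channels with mutually different sound propagation directions.
  12. The vehicle according to claim 11, wherein the intercom device comprises an intercom host and an intercom auxiliary unit matched with the intercom host, the intercom host is arranged inside the vehicle body, and the intercom auxiliary unit is arranged on the outer surface of the vehicle body.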
PCT/CN2023/086442 2022-08-29 2023-04-06 Directional sound channel selection method, electronic device, medium, and vehicle WO2024045616A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211039768.0 2022-08-29
CN202211039768.0A CN117674880A (zh) 2022-08-29 2022-08-29 Directional sound channel selection method, electronic device, medium, and vehicle

Publications (1)

Publication Number Publication Date
WO2024045616A1 true WO2024045616A1 (zh) 2024-03-07

Family

ID=90083091

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/086442 WO2024045616A1 (zh) 2022-08-29 2023-04-06 Directional sound channel selection method, electronic device, medium, and vehicle

Country Status (2)

Country Link
CN (1) CN117674880A (zh)
WO (1) WO2024045616A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
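JP2011201406A (ja) * 2010-03-25 2011-10-13 Denso It Laboratory Inc Vehicle-exterior sound providing device, vehicle-exterior sound providing method, and program
CN113301329A (zh) * 2021-05-21 2021-08-24 Konka Group Co., Ltd. TV sound field correction method and device based on image recognition, and display device
WO2022001204A1 (zh) * 2020-06-29 2022-01-06 Hisense Visual Technology Co., Ltd. Display device and screen sound-production method
CN113997863A (zh) * 2021-11-24 2022-02-01 Beijing ByteDance Network Technology Co., Ltd. Data processing method and device, and vehicle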

Also Published As

Publication number Publication date
CN117674880A (zh) 2024-03-08

Similar Documents

Publication Publication Date Title
US10083710B2 (en) Voice control system, voice control method, and computer readable medium
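CN106651955B (zh) Method and device for locating a target object in a picture
US11343445B2 (en) Systems and methods for implementing personal camera that adapts to its surroundings, both co-located and remote
CN109685740B (zh) Face correction method and device, mobile terminal, and computer-readable storage medium
WO2017215295A1 (zh) Camera parameter adjustment method, directing camera, and system
WO2022042168A1 (zh) Audio processing method and electronic device
CN107909113B (zh) Traffic accident image processing method, device, and storage medium
US20150146078A1 (en) Shift camera focus based on speaker position
WO2017031901A1 (zh) Face recognition method, device, and terminal
CN107767333B (zh) Beautification photographing method, device, and computer storage medium
US9263044B1 (en) Noise reduction based on mouth area movement recognition
WO2020134866A1 (zh) Keypoint detection method and device, electronic device, and storage medium
TW201901527A (zh) Video conferencing device and video conference management method
WO2018121385A1 (zh) Information processing method and device, and computer storage medium
US10681308B2 (en) Electronic apparatus and method for controlling thereof
CN110059590B (zh) Face liveness verification method and device, mobile terminal, and readable storage medium
WO2019223292A1 (zh) Shooting angle determination method and device, terminal, and readable storage medium
US20220230267A1 (en) Image processing method and apparatus based on video conference
CN111432115A (zh) Face tracking method based on sound-assisted positioning, terminal, and storage device
TWI588590B (zh) Image generation system and image generation method
US10191708B2 (en) Method, apparatrus and computer-readable medium for displaying image data
CN108900903A (zh) Video processing method and device, electronic device, and storage medium
WO2024045616A1 (zh) Directional sound channel selection method, electronic device, medium, and vehicle
WO2024001617A1 (zh) Method and device for recognizing phone-use behavior
WO2022007681A1 (zh) Shooting control method, mobile terminal, and computer-readable storage medium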

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23858657

Country of ref document: EP

Kind code of ref document: A1