CN115885316A - Height detection method, device and storage medium - Google Patents

Height detection method, device and storage medium

Info

Publication number
CN115885316A
CN115885316A
Authority
CN
China
Prior art keywords
face
pose
height
target object
video frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180006425.1A
Other languages
Chinese (zh)
Inventor
焦磊磊
马超群
张旭
段超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN115885316A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/62 Analysis of geometric attributes of area, perimeter, diameter or volume

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Geometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Length Measuring Devices By Optical Means (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to a height detection method, device, and storage medium. The method includes: performing semantic plane detection on a plurality of video frames captured by an image acquisition component of an electronic device to determine ground information in the video frames; performing face detection on the video frames to determine a face region; determining a first face pose of a target object in the video frames according to the face region and a preset three-dimensional face model; and determining a first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device. The height detection of the embodiments of the present application does not depend on professional equipment, can automatically detect the height of the target object without manual positioning, and is convenient to operate and highly accurate.

Description

Height detection method, device and storage medium

Technical Field
The present application relates to the field of image processing technologies, and in particular to a height detection method, device, and storage medium.
Background
Traditional height measurement usually requires an operator to manually use a professional instrument, such as a stadiometer. Measurement efficiency is therefore low, and the instrument is inconvenient to carry and unsuitable for personal use.
With the development of image processing technology, a depth image of the human target to be measured can be captured by professional equipment such as a binocular (stereo) camera or a depth camera, and the height of the human target can be measured by processing the depth image. However, this approach is limited: it depends on professional equipment such as a binocular camera or a depth camera, and a complete human body image (i.e., a whole-body image of the human target) must be captured.
Disclosure of Invention
In view of the above, a height detection method, a height detection device and a storage medium are provided.
In a first aspect, an embodiment of the present application provides a height detection method, including: performing semantic plane detection on a plurality of video frames captured by an image acquisition component of an electronic device to determine ground information in the video frames; performing face detection on the video frames to determine a face region; determining a first face pose of a target object in the video frames according to the face region and a preset three-dimensional face model; and determining a first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device.
According to this embodiment of the application, semantic plane detection can be performed on a plurality of video frames captured by an image acquisition component of an electronic device to determine ground information in the video frames; face detection is performed on the video frames to determine a face region, and a first face pose of a target object in the video frames is determined according to the face region and a preset three-dimensional face model; the first height of the target object is then determined according to the ground information, the first face pose, and the device pose of the electronic device. Height detection therefore does not depend on professional equipment (such as a binocular camera or a depth camera): the face pose of the target object is determined through face recognition and three-dimensional face modelling, and the height is determined from the face pose, the device pose, and the ground information. The target object does not need to be positioned manually, a complete human body image does not need to be captured, and the method is convenient to operate and highly accurate.
In a first possible implementation of the height detection method according to the first aspect, the method further comprises at least one of: prompting the user to capture the ground when the ground information is not detected within a preset time period; prompting the user to adjust the device pose when the pitch angle of the electronic device indicated by the device pose does not meet a first preset condition; prompting the user to adjust the device pose and/or change the face pose of the target object when the first face pose does not meet a second preset condition; or prompting the user to adjust the device pose when the face region does not meet a third preset condition.
According to this embodiment, the user is prompted when at least one of the following occurs: the ground information is not detected within the preset time period, the pitch angle of the electronic device indicated by the device pose does not meet the first preset condition, the first face pose does not meet the second preset condition, or the face region does not meet the third preset condition. The prompts, for example asking the user to capture the ground, adjust the device pose, or change the face pose of the target object, guide the user to make the corresponding adjustment and thereby improve the accuracy of height detection.
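The guard conditions above can be condensed into a simple check. This is an illustrative sketch only: the thresholds (pitch range, minimum face-box size) and the helper name `guidance_prompts` are assumptions, since the application leaves the preset conditions unspecified.

```python
def guidance_prompts(ground_found, pitch_deg, face_pose_ok, face_box_px):
    """Return user prompts for conditions that would degrade height detection.

    All thresholds are hypothetical example values, not figures from the
    patent application.
    """
    prompts = []
    if not ground_found:                   # ground not detected in time
        prompts.append("point the camera at the ground")
    if not (-30.0 <= pitch_deg <= 30.0):   # first preset condition (assumed)
        prompts.append("adjust the device pose")
    if not face_pose_ok:                   # second preset condition
        prompts.append("adjust the device pose and/or face the camera")
    if face_box_px < 40:                   # third preset condition (assumed)
        prompts.append("move closer so the face is larger in the frame")
    return prompts
```

In practice, each returned prompt would be rendered in the capture interface while the camera preview runs.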
In a second possible implementation of the height detection method according to the first aspect or the first possible implementation of the first aspect, the determining a first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device includes: determining a second height of the target object according to the ground information, the first face pose, and the device pose; and post-processing the second height to obtain the first height, wherein the post-processing comprises Kalman filtering.
According to this embodiment, a second height of the target object can be determined from the ground information, the first face pose, and the device pose, and post-processing such as Kalman filtering is applied to the second height to obtain the first height, improving the accuracy of height detection.
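The Kalman-filter post-processing can be illustrated with a minimal one-dimensional filter over per-frame height estimates. The noise variances below are assumed values for the example, since the application states only that the post-processing comprises Kalman filtering.

```python
class HeightKalman1D:
    """Minimal 1-D Kalman filter for smoothing per-frame height estimates."""

    def __init__(self, q=1e-4, r=1e-2):
        self.q, self.r = q, r        # process / measurement noise variances (assumed)
        self.x, self.p = None, 1.0   # state estimate and its variance

    def update(self, z):
        if self.x is None:           # initialise on the first measurement
            self.x = z
            return self.x
        self.p += self.q             # predict: variance grows by process noise
        k = self.p / (self.p + self.r)   # Kalman gain
        self.x += k * (z - self.x)   # correct with the new measurement
        self.p *= (1.0 - k)
        return self.x
```

Feeding a stream of noisy per-frame second heights into `update` yields a smoothed first height that settles as more frames arrive.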
In a third possible implementation of the height detection method according to the first aspect as such, or the first or second possible implementation of the first aspect, the method further comprises: displaying the first height in a display interface of the electronic device.
According to this embodiment, after the first height of the target object is determined, it can be displayed in a display interface of the electronic device in the form of animation, text, augmented reality (AR), and the like, which improves the user experience.
According to the first aspect, in a fourth possible implementation of the height detection method, the determining the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device includes: adjusting the first face pose according to a preset interpupillary distance reference value to obtain a second face pose; performing a coordinate transformation on the second face pose according to the device pose to obtain a third face pose of the target object, the third face pose being expressed in a world coordinate system; and determining the first height of the target object according to the third face pose and the ground information.
According to this embodiment, the first face pose in the camera coordinate system can be adjusted and coordinate-transformed to obtain the third face pose in the world coordinate system, and the first height of the target object is determined from the third face pose and the ground information. The first height can therefore be calculated in the world coordinate system, improving the accuracy of height detection.
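The coordinate transformation can be sketched with 4x4 homogeneous transforms: the third face pose is the device (camera-to-world) pose composed with the face pose in the camera frame. The function names below are illustrative, not from the application.

```python
def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def face_pose_in_world(T_world_cam, T_cam_face):
    """Compose the device pose with the face pose in the camera frame,
    yielding the face pose in the world coordinate system."""
    return matmul4(T_world_cam, T_cam_face)

def translation(t):
    """Identity-rotation homogeneous transform with translation t (helper)."""
    return [[1.0, 0.0, 0.0, t[0]],
            [0.0, 1.0, 0.0, t[1]],
            [0.0, 0.0, 1.0, t[2]],
            [0.0, 0.0, 0.0, 1.0]]
```

For example, a camera held 1.5 m above the world origin with a face 2 m in front of it places the face at world translation (0, 1.5, 2).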
According to the fourth possible implementation of the first aspect, in a fifth possible implementation of the height detection method, the adjusting the first face pose according to a preset interpupillary distance reference value to obtain a second face pose includes: determining a face size transform coefficient from the preset interpupillary distance reference value and the interpupillary distance value in the first face pose; and adjusting the first face pose according to the face size transform coefficient to obtain the second face pose of the target object.
According to this embodiment, the face size transform coefficient is determined from the interpupillary distance reference value and the interpupillary distance value in the first face pose, and the first face pose is adjusted by that coefficient to obtain the second face pose of the target object. The actual size and pose of the target object's face in the camera coordinate system can thus be obtained.
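The rescaling step can be sketched as follows. The reference value of 0.063 m is an assumed population-average interpupillary distance, not a figure from the application, and the function name is illustrative.

```python
IPD_REF = 0.063  # assumed interpupillary distance reference value, in metres

def rescale_face_pose(R, t, ipd_estimated):
    """Scale the face pose so its implied IPD matches the reference IPD.

    R: 3x3 rotation (unchanged by a uniform scale), t: translation in the
    camera frame, ipd_estimated: IPD implied by the unscaled face pose.
    Returns the second face pose (R, scaled t).
    """
    s = IPD_REF / ipd_estimated           # face size transform coefficient
    return R, [s * c for c in t]
```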
According to the fourth possible implementation of the first aspect, in a sixth possible implementation of the height detection method, the determining a first height of the target object according to the third face pose and the ground information includes: determining the head-top position of the target object according to the third face pose; and determining the first height of the target object according to the head-top position and the ground information.
According to this embodiment, the head-top position of the target object is determined, and the first height is determined from the head-top position and the ground information, so the first height can be determined simply, quickly, and accurately.
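Given the head-top position in the world frame and a ground plane n·x + d = 0 from semantic plane detection, the first height reduces to a point-to-plane distance. A minimal sketch, with names chosen for the example:

```python
import math

def height_above_plane(head_top, n, d):
    """Distance from the head-top point to the ground plane n.x + d = 0.

    head_top: (x, y, z) in the world frame; n: plane normal (need not be
    unit length); d: plane offset.
    """
    num = sum(ni * xi for ni, xi in zip(n, head_top)) + d
    return abs(num) / math.sqrt(sum(ni * ni for ni in n))
```

Note that normalising by |n| makes the result independent of how the plane equation is scaled.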
In a second aspect, an embodiment of the present application provides a height detection apparatus applied to an electronic device, including: an image acquisition component configured to capture a plurality of video frames; and a processing component configured to: perform semantic plane detection on the video frames and determine ground information in the video frames; perform face detection on the video frames to determine a face region; determine a first face pose of a target object in the video frames according to the face region and a preset three-dimensional face model; and determine a first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device.
According to this embodiment of the application, semantic plane detection can be performed on a plurality of video frames captured by an image acquisition component of an electronic device to determine ground information in the video frames; face detection is performed on the video frames to determine a face region, and a first face pose of a target object in the video frames is determined according to the face region and a preset three-dimensional face model; the first height of the target object is then determined according to the ground information, the first face pose, and the device pose of the electronic device. Height detection therefore does not depend on professional equipment (such as a binocular camera or a depth camera), does not require manually positioning the target object or capturing a complete human body image, and is convenient to operate and highly accurate.
In a first possible implementation of the height detection apparatus according to the second aspect, the processing component is further configured to perform at least one of: prompting the user to capture the ground when the ground information is not detected within a preset time period; prompting the user to adjust the device pose when the pitch angle of the electronic device indicated by the device pose does not meet a first preset condition; prompting the user to adjust the device pose and/or change the face pose of the target object when the first face pose does not meet a second preset condition; or prompting the user to adjust the device pose when the face region does not meet a third preset condition.
According to this embodiment, the user is prompted when at least one of the following occurs: the ground information is not detected within the preset time period, the pitch angle of the electronic device indicated by the device pose does not meet the first preset condition, the first face pose does not meet the second preset condition, or the face region does not meet the third preset condition. The prompts, for example asking the user to capture the ground, adjust the device pose, or change the face pose of the target object, guide the user to make the corresponding adjustment and thereby improve the accuracy of height detection.
In a second possible implementation of the height detection apparatus according to the second aspect or the first possible implementation of the second aspect, the determining a first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device includes: determining a second height of the target object according to the ground information, the first face pose, and the device pose; and post-processing the second height to obtain the first height, wherein the post-processing comprises Kalman filtering.
According to this embodiment, a second height of the target object can be determined from the ground information, the first face pose, and the device pose, and post-processing such as Kalman filtering is applied to the second height to obtain the first height, improving the accuracy of height detection.
In a third possible implementation of the height detection apparatus according to the second aspect as such, or the first or second possible implementation of the second aspect, the processing component is further configured to display the first height on a display interface of the electronic device.
According to this embodiment, after the first height of the target object is determined, it can be displayed in a display interface of the electronic device in the form of animation, text, augmented reality (AR), and the like, which improves the user experience.
In a fourth possible implementation of the height detection apparatus according to the second aspect, the determining the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device includes: adjusting the first face pose according to a preset interpupillary distance reference value to obtain a second face pose; performing a coordinate transformation on the second face pose according to the device pose to obtain a third face pose of the target object, the third face pose being expressed in a world coordinate system; and determining the first height of the target object according to the third face pose and the ground information.
According to this embodiment, the first face pose in the camera coordinate system can be adjusted and coordinate-transformed to obtain the third face pose in the world coordinate system, and the first height of the target object is determined from the third face pose and the ground information. The first height can therefore be calculated in the world coordinate system, improving the accuracy of height detection.
According to the fourth possible implementation of the second aspect, in a fifth possible implementation of the height detection apparatus, the adjusting the first face pose according to a preset interpupillary distance reference value to obtain a second face pose includes: determining a face size transform coefficient from the preset interpupillary distance reference value and the interpupillary distance value in the first face pose; and adjusting the first face pose according to the face size transform coefficient to obtain the second face pose of the target object.
According to this embodiment, the face size transform coefficient is determined from the interpupillary distance reference value and the interpupillary distance value in the first face pose, and the first face pose is adjusted by that coefficient to obtain the second face pose of the target object. The actual size and pose of the target object's face in the camera coordinate system can thus be obtained.
According to the fourth possible implementation of the second aspect, in a sixth possible implementation of the height detection apparatus, the determining the first height of the target object according to the third face pose and the ground information includes: determining the head-top position of the target object according to the third face pose; and determining the first height of the target object according to the head-top position and the ground information.
According to this embodiment, the head-top position of the target object is determined, and the first height is determined from the head-top position and the ground information, so the first height can be determined simply, quickly, and accurately.
In a third aspect, an embodiment of the present application provides a height detection device, comprising: an image acquisition component for capturing a plurality of video frames; a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to execute the instructions to implement the height detection method of the first aspect or one or more of its possible implementations.
As with the first aspect, height detection according to this embodiment does not depend on professional equipment such as a binocular camera or a depth camera: the face pose of the target object is determined through face recognition and three-dimensional face modelling, and the height is determined from the face pose, the device pose, and the ground information, without manually positioning the target object or capturing a complete human body image. The operation is convenient and the accuracy is high.
In a fourth aspect, an embodiment of the present application provides a non-transitory computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, implement the height detection method of the first aspect or one or more of its possible implementations.
As with the first aspect, height detection according to this embodiment does not depend on professional equipment such as a binocular camera or a depth camera, does not require manually positioning the target object or capturing a complete human body image, and is convenient to operate and highly accurate.
In a fifth aspect, an embodiment of the present application provides a computer program product comprising computer-readable code, or a non-transitory computer-readable storage medium carrying computer-readable code, which, when run in an electronic device, causes a processor in the electronic device to perform the height detection method of the first aspect or one or more of its possible implementations.
As with the first aspect, height detection according to this embodiment does not depend on professional equipment such as a binocular camera or a depth camera, does not require manually positioning the target object or capturing a complete human body image, and is convenient to operate and highly accurate.
These and other aspects of the present application will be more readily apparent in the following description of the embodiment(s).
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the application and, together with the description, serve to explain the principles of the application.
Fig. 1 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
Fig. 2 shows a software structure block diagram of an electronic device according to an embodiment of the present application.
Fig. 3 shows a flow chart of a height detection method according to an embodiment of the present application.
Fig. 4 shows a schematic diagram of a process of detecting ground information according to an embodiment of the present application.
Fig. 5 shows a schematic diagram of a process of determining a first face pose of a target object according to an embodiment of the present application.
Fig. 6 shows a schematic diagram of a height display of a target object according to an embodiment of the present application.
Fig. 7 shows a schematic diagram of a process of height detection according to an embodiment of the present application.
Fig. 8 shows a block diagram of a height detection apparatus according to an embodiment of the present application.
Detailed Description
Various exemplary embodiments, features and aspects of the present application will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present application. It will be understood by those skilled in the art that the present application may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present application.
In the related art, measuring the height of a human body generally requires professional equipment such as a binocular camera or a depth camera, creating a strong dependence on hardware, and a complete human body image must be captured, which imposes certain limitations. For example, in some technical solutions a binocular camera captures a scene image; the image coordinates of the head top of the human target are obtained from the scene image; the depth corresponding to the head top, generated by the binocular camera, is read at those image coordinates; the coordinates of the head top in the camera coordinate system are calculated from the image coordinates and the depth; and the height of the human target is measured from the head-top coordinates together with the installation height, pitch angle, and tilt angle of the binocular camera.
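The related-art computation just described amounts to a standard pinhole back-projection of the head-top pixel using its stereo depth. A minimal sketch, where the intrinsics fx, fy, cx, cy are hypothetical values for illustration:

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth Z into the camera frame.

    fx, fy: focal lengths in pixels; cx, cy: principal point.
    Returns the 3-D point (X, Y, Z) in camera coordinates.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)
```

The camera-frame point is then combined with the camera's installation height and pitch angle to obtain the height above the ground.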
This solution requires a binocular camera to be deployed (i.e., it is equipment-dependent), and requires the camera pose to be fixed and the installation height to be known, which limits the usage scenarios. In addition, the complete human body must be captured to measure the height, which is a further limitation.
As another example, in some technical solutions a dense semantic map may be generated based on semantic simultaneous localization and mapping (SLAM); plane semantic detection is then implemented on the dense semantic map; the height of an object is automatically identified from the relations between semantics; the length and width of the object are calculated after ground extraction, segmentation of the target of interest, and projection; and finally the bounding-box size (length, width, and height) of the object is obtained. Since the human body is a kind of generalized object, this solution can also measure human height.
However, generating a dense semantic map with semantic SLAM requires a depth camera (i.e., equipment dependence). The solution measures well only objects whose surfaces are parallel to the ground; the human body has a complex shape and the head top has no obvious plane, so measurement accuracy is low. In addition, the full outline of the target, including its top, must be captured, or the complete contour of the object cannot be reconstructed. For human height measurement this means the photographer must shoot the human target from a higher angle, which is inconvenient and a further limitation.
With the development of artificial intelligence (AI), some technical solutions use a face-to-height model obtained by machine learning to measure human height. For example, a face classifier and a face-height model may be trained separately; an image of the human target to be measured is input into the face classifier for face detection to obtain a face image of the human target; and the face image is input into the face-height model to obtain the height of the human target.
However, the core of this solution is the face-height model. A model obtained by machine learning has poor interpretability and depends heavily on its training data, and because the relation between face and height may differ across populations, the model is hard to generalize and the accuracy of its measurement results is low.
In other solutions, the height measurement is performed by manually operating an augmented reality (AR) scale. For example, a spatial equation of the ground can be obtained using plane detection and SLAM techniques; a virtual anchor point is then manually positioned at the foot of the human target (i.e., the measurement object), a virtual AR scale is pulled from bottom to top and stopped at the top of the head, and the length of the AR scale, i.e., the height of the human target, is obtained through the three-dimensional (3-dimension, 3D) spatial coordinate system established by SLAM.
However, this solution requires manual participation and its measurement efficiency is low. The virtual anchor point is clicked on a two-dimensional (2-dimension, 2D) image and projected onto a 3D plane through ray casting; owing to object occlusion, manual operation errors, and the like, an anchor point that appears to be at the feet of the human target may actually deviate considerably, i.e., the virtual anchor point is not accurately positioned, making the height measurement result inaccurate.
To solve the above technical problems, the present application provides a height detection method that can be applied to an electronic device. The height detection method of the embodiments of the present application may perform semantic plane detection on a plurality of video frames acquired by an image acquisition component of the electronic device to determine ground information in the video frames; meanwhile, face detection is performed on the video frames to determine a face region; a first face pose of a target object in the video frames is determined according to the face region and a preset face three-dimensional model, and a first height of the target object is determined according to the ground information, the first face pose, and the device pose of the electronic device.
Detecting the height of the target object in this manner does not depend on professional equipment (such as a binocular camera or a depth camera); the face pose of the target object can be determined through face recognition and three-dimensional face reconstruction, and the height of the target object is then determined from the face pose, the device pose, and the ground information. The target object does not need to be positioned manually, and a complete image of the target object's body does not need to be captured, so the height detection method is convenient to operate and highly accurate.
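The core geometric step just described, measuring the height as the distance from a head point (derived from the face pose) to the detected ground plane, can be sketched as follows. This is a minimal illustration with assumed variable names, not the embodiment's implementation; the ground is represented by a plane equation n·x + d = 0, as described in later sections.

```python
import numpy as np

def point_to_plane_distance(p, n, d):
    """Signed distance from 3-D point p to the plane n.x + d = 0 (n unit-length)."""
    return float(np.dot(n, p) + d)

def estimate_height(head_cam, R_cw, t_cw, ground_n, ground_d):
    """Transform a head-top point from camera coordinates to world coordinates
    using the device pose (R_cw, t_cw), then take its distance to the ground."""
    head_world = R_cw @ head_cam + t_cw
    return abs(point_to_plane_distance(head_world, ground_n, ground_d))
```

For example, with an identity device pose and the ground plane z = 0, a head point at (0, 0, 1.7) m yields a height of 1.7 m.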
The electronic device according to the embodiment of the application may be a touch screen or a non-touch screen, the touch screen electronic device may be controlled by clicking, sliding and the like on a display screen with a finger, a stylus and the like, and the non-touch screen electronic device may be connected to an input device such as a mouse, a keyboard, a touch panel and the like and controlled by the input device.
Fig. 1 shows a schematic structural diagram of an electronic device 100 according to an embodiment of the present application. The electronic device 100 may include at least one of a mobile phone, a foldable electronic device, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a Personal Digital Assistant (PDA), an Augmented Reality (AR) device, a Virtual Reality (VR) device, an Artificial Intelligence (AI) device, a wearable device, a vehicle-mounted device, a smart home device, or a smart city device. The embodiment of the present application does not particularly limit the specific type of the electronic device 100.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) connector 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identity Module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the illustrated structure of the embodiment of the present application does not specifically limit the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processor (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), among others. The different processing units may be separate devices or may be integrated into one or more processors. The processor can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 may be a cache. The memory may store instructions or data that the processor 110 has just used or uses frequently. If the processor 110 needs those instructions or data again, it can fetch them directly from the memory. Avoiding repeated accesses reduces the waiting time of the processor 110 and thus improves system efficiency.
In some embodiments, processor 110 may include one or more interfaces. The interface may include an integrated circuit (I2C) interface, an integrated circuit built-in audio (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, etc. The processor 110 may be connected to modules such as a touch sensor, an audio module, a wireless communication module, a display, a camera, etc. through at least one of the above interfaces.
It should be understood that the interface connection relationship between the modules illustrated in the embodiments of the present application is only an illustration, and does not limit the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also adopt different interface connection manners or a combination of multiple interface connection manners in the above embodiments.
The electronic device 100 may implement display functions via the GPU, the display screen 194, and the application processor, among others. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 194 is used to display images, video, and the like. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device 100 may include one or more display screens 194.
The electronic device 100 can implement photographing, video shooting, and other image and video acquisition functions through the camera 193, the ISP, the video codec, the GPU, the display screen 194, the application processor (AP), the neural-network processing unit (NPU), and the like.
The camera 193 may be used to acquire color image data of a photographic subject. In some embodiments, camera 193 may also be used to acquire depth data of the subject. That is, the camera in the electronic device 100 may be a general camera that does not collect depth data, such as a monocular camera, or may be a professional camera that can collect depth data, such as a binocular camera, a depth camera, or the like. The specific type of camera 193 is not limited by this application.
The ISP can be used to process color image data collected by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converting into an image visible to naked eyes. The ISP can also carry out algorithm optimization on the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene.
In some embodiments, electronic device 100 may include 1 or more cameras 193. Specifically, the electronic device 100 may include 1 front camera and at least 1 rear camera. Among them, the front camera may be generally used to collect the photographer's own color image data facing the display screen 194, and the rear camera may be used to collect the color image data of the photographic subject (e.g., a person, a landscape, etc.) faced by the photographer.
In some embodiments, a CPU or GPU or NPU in processor 110 may process a plurality of video frames captured by camera 193. Specifically, the processor 110 may detect a plurality of video frames acquired by an image acquisition component (i.e., the camera 193) of the electronic device 100, and determine ground information in the plurality of video frames; simultaneously carrying out face detection on the plurality of video frames to determine a face area, and determining a first face pose of a target object in the plurality of video frames according to the face area and a preset face three-dimensional model; and then determining the first height of the target object according to the ground information, the first face pose and the equipment pose of the electronic equipment. In some embodiments, the first height of the target object may also be displayed in the display screen 194 of the electronic device 100.
The gyro sensor 180B in the electronic device 100 may be used to determine the motion pose of the electronic device 100. In some embodiments, the angular velocity of electronic device 100 about three axes (i.e., the x, y, and z axes) may be determined by gyroscope sensor 180B. The gyro sensor 180B may be used for photographing anti-shake. Illustratively, when the shutter is pressed, the gyro sensor 180B detects the shake angle of the electronic device 100, calculates the distance to be compensated for the lens module according to the shake angle, controls the lens to move in the reverse direction to counteract the shake of the electronic device 100, and thereby achieves anti-shake. The gyroscope sensor 180B may also be used in navigation, motion sensing games, and other scenarios.
The acceleration sensor 180E in the electronic device 100 can detect the magnitude of acceleration of the electronic device 100 in various directions (typically along the three axes x, y, and z). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. The sensor can also be used to recognize the posture of the electronic device, and is applied to landscape/portrait switching, pedometers, and other applications.
In some embodiments, the gyroscope sensor 180B, the acceleration sensor 180E, and other components of the electronic device 100 may form an Inertial Measurement Unit (IMU) for measuring the device pose of the electronic device 100.
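As an illustration of how gyroscope and accelerometer readings combine into a pose estimate, the sketch below tracks a single tilt angle with a complementary filter. This is a toy stand-in with assumed names and filter constant; a real IMU pipeline estimates the full 6-DoF device pose.

```python
import numpy as np

def complementary_tilt(prev_angle, gyro_rate, accel, dt, alpha=0.98):
    """Blend the integrated gyroscope rate (rad/s) with the tilt implied by the
    accelerometer's gravity direction; alpha weights the gyro's short-term accuracy."""
    accel_angle = np.arctan2(accel[1], accel[2])  # tilt about the x axis from gravity
    return alpha * (prev_angle + gyro_rate * dt) + (1 - alpha) * accel_angle
```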
The touch sensor 180K in the electronic device 100 is also referred to as a "touch device". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is used to detect a touch operation applied thereto or nearby. The touch sensor can communicate the detected touch operation to the application processor to determine the touch event type. Visual output associated with the touch operation may be provided via the display screen 194. In other embodiments, the touch sensor 180K may be disposed on a surface of the electronic device 100, different from the position of the display screen 194.
The keys 190 in the electronic device 100 may include a power-on key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys. The electronic apparatus 100 may receive a key input, and generate a key signal input related to user setting and function control of the electronic apparatus 100. For example, when photographing or shooting is performed by a camera Application (APP) of the electronic apparatus 100, the camera APP may provide keys for the user to operate, such as start of photographing or shooting, and end of shooting.
The motor 191 in the electronic device 100 may generate a vibration cue. The motor 191 may be used for incoming call vibration cues, as well as for touch vibration feedback. For example, touch operations applied to different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The motor 191 may also respond to different vibration feedback effects for touch operations applied to different areas of the display screen 194. Different application scenes (such as time reminding, information receiving, alarm clock, game, photographing, camera shooting and the like) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
The software system of the electronic device 100 may employ a layered architecture, an event-driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture. The embodiment of the present application takes an Android system with a layered architecture as an example, and exemplarily illustrates a software structure of the electronic device 100.
Fig. 2 shows a block diagram of a software structure of the electronic device 100 according to an embodiment of the present application.
The layered architecture divides the software into several layers, each layer having a clear role and division of labor. The layers communicate with each other through a software interface. In some embodiments, the Android system is divided into five layers, from top to bottom, an application Layer, an application framework Layer, an Android Runtime (ART) and native C/C + + libraries, a Hardware Abstraction Layer (HAL), and a kernel Layer.
The application layer may include a series of application packages.
As shown in fig. 2, the application package may include camera, gallery, calendar, phone call, map, navigation, WLAN, bluetooth, music, video, short message, etc. applications.
The application framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer includes a number of predefined functions.
As shown in FIG. 2, the application framework layers may include a window manager, content provider, view system, resource manager, notification manager, activity manager, input manager, and the like.
The window manager provides a Window Manager Service (WMS), which may be used for window management, window animation management, and surface management, and serves as a transit station for the input system.
The content provider is used to store and retrieve data and make it accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phone books, etc.
The view system includes visual controls such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and the like.
The notification manager enables an application to display notification information in the status bar. It can be used to convey notification-type messages that automatically disappear after a short stay without user interaction, for example to notify of download completion or message alerts. The notification manager may also present notifications as a chart or scrolling text in the system status bar at the top, such as notifications of applications running in the background, or as a dialog window on the screen. For example, text information may be prompted in the status bar, a prompt tone may be sounded, the electronic device may vibrate, or an indicator light may flash.
The activity manager may provide an Activity Manager Service (AMS), which may be used for the startup, switching, and scheduling of system components (e.g., activities, services, content providers, and broadcast receivers), as well as the management and scheduling of application processes.
The Input Manager may provide an Input Manager Service (IMS) that may be used to manage inputs to the system, such as touch screen inputs, key inputs, sensor inputs, and the like. The IMS takes the event from the input device node and assigns the event to the appropriate window by interacting with the WMS.
The Android runtime includes the core libraries and the Android runtime environment. The Android runtime is responsible for converting bytecode into machine code, mainly using ahead-of-time (AOT) compilation and just-in-time (JIT) compilation.
The core library is mainly used for providing the functions of basic Java class libraries, such as basic data structure, mathematics, IO, tools, database, network and the like libraries. The core library provides an API for android application development of users.
The native C/C + + library may include a plurality of functional modules. For example: surface manager (surface manager), media Framework (Media Framework), libc, openGL ES, SQLite, webkit, etc. Wherein the surface manager is configured to manage the display subsystem and provide a fusion of the 2D and 3D layers for the plurality of applications. The media framework supports a variety of commonly used audio, video format playback and recording, as well as still image files, and the like. The media library may support a variety of audio-video encoding formats, such as MPEG4, h.264, MP3, AAC, AMR, JPG, PNG, and the like. OpenGL ES provides for the rendering and manipulation of 2D graphics and 3D graphics in applications. SQLite provides a lightweight relational database for applications of electronic device 100.
The hardware abstraction layer runs in a user space (user space), encapsulates the kernel layer driver and provides a calling interface for the upper layer.
The kernel layer is a layer between hardware and software. The inner core layer at least comprises a display driver, a camera driver, an audio driver and a sensor driver.
The following describes exemplary work flows of software and hardware of the electronic device 100 in conjunction with the height detection scenario of the embodiment of the present application.
Suppose height detection is implemented through a height application (APP) on the electronic device. To perform height detection, the user may touch the height APP icon on the display screen of the electronic device. When the touch sensor 180K receives the touch operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the touch operation into a raw input event (including the touch coordinates, the timestamp of the touch operation, and the like), and the raw input event is stored at the kernel layer. The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the event. Taking the touch operation being a tap on the control corresponding to the height APP icon as an example: the height APP calls an interface of the application framework layer to start, then starts the camera driver by calling the kernel layer, and acquires a plurality of video frames, i.e., a video stream, through the camera 193. The plurality of video frames may include a target object whose height is to be detected.
After acquiring the plurality of video frames, the electronic device 100 may perform related processing such as ground detection and face detection on the plurality of video frames through the processor 110, so as to determine the height of the target object.
FIG. 3 shows a flowchart of a height detection method according to an embodiment of the present application. As shown in FIG. 3, the height detection method includes: Step S310, performing semantic plane detection on a plurality of video frames acquired by an image acquisition component of the electronic device, and determining ground information in the plurality of video frames.
The image acquisition component can be a camera of the electronic device, the camera can be a common camera which does not acquire depth data, such as a monocular camera, and also can be a professional camera which can acquire depth data, such as a binocular camera and a depth camera.
In the case that the image capturing component is a general camera, the plurality of video frames captured by the image capturing component are color (RGB) video frames, and since the plurality of video frames may form a video stream, it may also be considered that the image capturing component captures an RGB video stream. In the case that the image capturing part is a professional camera, the plurality of video frames captured by the image capturing part may include depth data in addition to RGB image data. It should be noted that the present application does not limit the specific type of the image capturing component.
A plurality of video frames (namely RGB video streams) can be collected through an image collecting component of the electronic equipment, and the collected video frames are subjected to plane detection, semantic segmentation and other processing to determine the ground information in the video frames. The ground information may be represented by a plane equation in space, or may be represented by other ways, which is not limited in this application.
In one possible implementation, when determining the ground information in the plurality of video frames, plane detection may first be performed on the video frames acquired by the image acquisition component to determine the position information of a plurality of planes in the video frames. Optionally, the position information of the planes in the video frames may be determined by the SLAM technique.
For example, three-dimensional information can be extracted from a plurality of video frames acquired by the image acquisition component to obtain sparse point cloud data, and the device pose of the electronic device when acquiring each video frame is determined; and then according to the equipment pose when the electronic equipment collects each video frame, carrying out plane fitting on the sparse point cloud data through a plane fitting algorithm to obtain the position information of a plurality of planes in a plurality of video frames. The position information of each plane can be represented by a plane equation in space.
The sparse point cloud data is obtained by extracting the three-dimensional information in the video frames, and the sparse point cloud data is subjected to plane fitting according to the equipment pose of the electronic equipment when the electronic equipment collects each video frame to obtain the position information of the planes in the video frames, so that the processing efficiency can be improved, and the accuracy of the position information of each plane can be improved.
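The embodiment does not name a specific plane-fitting algorithm; a common choice for fitting planes to a sparse point cloud is RANSAC, sketched below under that assumption. It returns a plane n·x + d = 0 with a unit normal.

```python
import numpy as np

def fit_plane_ransac(points, iters=200, tol=0.02, rng=None):
    """Fit a plane n.x + d = 0 to an (N, 3) point cloud by sampling random
    3-point hypotheses and keeping the one with the most inliers within tol metres."""
    rng = rng if rng is not None else np.random.default_rng(0)
    best_n, best_d, best_count = None, None, -1
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:          # degenerate (collinear) sample, skip
            continue
        n /= norm
        d = -np.dot(n, p0)
        count = int(np.sum(np.abs(points @ n + d) < tol))
        if count > best_count:
            best_n, best_d, best_count = n, d, count
    return best_n, best_d
```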
While determining the position information of the planes in the video frames, semantic segmentation may also be performed on the video frames acquired by the image acquisition component to obtain a semantic segmentation result for each video frame. Specifically, for any one of the video frames, semantic recognition may be performed on the frame to identify the categories of the objects in it, such as ground, desk, and wall, and each pixel in the frame is labeled according to the identified object category, yielding the semantic segmentation result of that frame.
Then, according to the position information of the planes in the video frames and the semantic segmentation result of each video frame, performing semantic identification on the planes in the video frames to obtain multiple semantic plane information, such as plane information of a desktop, a wall surface, a ground surface and the like, and then selecting the ground information from the multiple semantic plane information.
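The combination step above, attaching a semantic label to each fitted plane and then picking out the ground, can be sketched as a majority vote over the segmentation labels of the pixels that each plane's inlier points project to. The data layout below is an assumption for illustration only.

```python
from collections import Counter

def label_plane(pixel_labels):
    """Majority-vote semantic label for one plane, given the segmentation labels
    of the pixels its inlier 3-D points project onto."""
    return Counter(pixel_labels).most_common(1)[0][0]

def select_ground(semantic_planes):
    """Keep only the planes whose semantic label is 'ground'."""
    return [p for p in semantic_planes if p["label"] == "ground"]
```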
Fig. 4 shows a schematic diagram of a detection process of ground information according to an embodiment of the present application. As shown in fig. 4, it is assumed that the height detection method of the embodiment of the present application is implemented by using a height APP of an application program on an electronic device (e.g., a mobile phone), and after the height APP is opened by a user, the height APP may perform image acquisition by using an image acquisition component (e.g., a camera) of the electronic device to obtain a plurality of video frames 410 (i.e., RGB video streams); then, three-dimensional information extraction is carried out on the plurality of video frames 410 to obtain sparse point cloud data 420, and meanwhile, the device poses 430 when the electronic device collects the plurality of video frames 410 are determined; and according to the device pose 430 when the electronic device collects a plurality of video frames 410, performing plane fitting on the sparse point cloud data 420 through a plane fitting algorithm to obtain position information 440 of a plurality of planes in the plurality of video frames 410.
While determining the position information 440 of the planes in the video frames 410, performing semantic segmentation 450 on the video frames 410 to obtain a semantic segmentation result 460; then, semantic recognition can be performed on the planes in the video frames 410 according to the position information 440 of the planes and the semantic segmentation result 460 to obtain semantic plane information 470, and the ground information 480 is selected from the semantic plane information 470, wherein the ground information 480 can be represented by a plane equation in space.
The above description has exemplarily explained the detection process of the ground information only by taking a plurality of video frames (i.e. RGB video streams) captured by the image capturing component as input. In some embodiments, the depth data acquired by the image acquisition component and the information such as the device pose of the electronic device acquired by the inertial measurement unit IMU can be simultaneously used as input, so as to improve the accuracy of ground detection.
Through plane detection and semantic segmentation of a plurality of video frames acquired by the image acquisition component, the electronic equipment can automatically sense a shot scene and acquire a plurality of semantic plane information, and further can automatically identify ground information, so that not only can manual operations of a user be avoided, such as manual operations of manually clicking the ground by the user, and the like, but also the accuracy of ground detection can be improved.
In one possible implementation, the user may be prompted to photograph the ground in the event that ground information in multiple video frames is not detected within a preset period of time (e.g., 5s, 10s, etc.). For example, prompt information such as "please shoot the ground", "not detect the ground" can be broadcasted to the user in a voice broadcast manner, and prompt information such as "please shoot the ground", "not detect the ground" can also be shown to the user in a text manner, an animation manner and the like on a display interface of the height APP, so that the user can adjust the shooting content in time, and the ground detection efficiency can be improved.
It should be noted that, a person skilled in the art may set the content and the prompting mode of the prompting information when the ground information in the plurality of video frames is not detected according to the actual situation, which is not limited in the present application.
Step S320, performing face detection on the plurality of video frames to determine a face region. While the ground information in the video frames is being determined, face detection may be performed on the video frames through feature extraction, key-point detection, and the like. When a complete face is detected, a face region can be determined from the video frames, and the object corresponding to the face region is taken as the target object. There may be one or more face regions, and likewise one or more target objects; neither is limited in the present application.
In a possible implementation, after the face region is determined from the video frames, whether the face region meets a third preset condition may be checked. The third preset condition is that the face region is located in a preset region of the video frame containing it; the preset region may be the central region of that video frame, for example, a region centered on the frame's center point and covering half of its area may be set as the central region. It should be noted that a person skilled in the art may set the preset region of the video frame where the face region is located according to the actual situation, which is not limited in the present application.
When the face region does not meet the third preset condition, the user can be prompted, by voice broadcast, text display, animation display, or the like, to adjust the device pose of the electronic device so that the face region meets the third preset condition, that is, so that the face region is located in the preset region of the video frame where it is located, thereby improving the accuracy of height detection.
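As an illustrative sketch of the third preset condition check (the one-half-area central region follows the example above; the function names and frame sizes are assumptions for illustration, not part of this application), simple bounding-box arithmetic suffices:

```python
def central_region(frame_w, frame_h, area_ratio=0.5):
    """Return the (x0, y0, x1, y1) box centered in the frame whose area is
    `area_ratio` times the frame area (aspect ratio preserved)."""
    scale = area_ratio ** 0.5  # side scale so the area scales by area_ratio
    w, h = frame_w * scale, frame_h * scale
    cx, cy = frame_w / 2, frame_h / 2
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def face_in_central_region(face_box, frame_w, frame_h):
    """Third preset condition: the face box lies inside the central region."""
    x0, y0, x1, y1 = central_region(frame_w, frame_h)
    fx0, fy0, fx1, fy1 = face_box
    return fx0 >= x0 and fy0 >= y0 and fx1 <= x1 and fy1 <= y1
```

When the check returns false, the prompting logic described above can be triggered.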
Step S330, determining a first face pose of a target object in the plurality of video frames according to the face region and a preset face three-dimensional model. To determine the first face pose of the target object, a face three-dimensional model of the target object can be established through a pre-trained neural network according to the face region and the preset face three-dimensional model (namely, an average face three-dimensional model). For example, the face region and the preset face three-dimensional model may be input into a pre-trained Convolutional Neural Network (CNN) for registration to obtain the face three-dimensional model of the target object.
Then, according to preset parameters of the face three-dimensional model, for example, constraint relations on the face structure such as the interpupillary distance and the distance from the tip of the nose to the top of the head, the position information and the rotation information of the face three-dimensional model of the target object relative to the image acquisition component of the electronic device are determined, and this position information and rotation information are determined as the first face pose of the target object. The rotation information may be represented by a pitch angle, a roll angle, and a yaw angle.
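As a hedged illustration of how one such structural constraint yields position information, the pinhole camera model relates a known real-world interpupillary distance to its projected pixel length (the focal length and distance values in the usage below are made-up numbers, not parameters fixed by this application):

```python
def face_distance_from_ipd(ipd_real_m, ipd_pixels, focal_length_pixels):
    """Pinhole model: Z = f * X / x, where X is the real interpupillary
    distance (meters), x its projection in pixels, and f the focal length
    in pixels. Returns the face depth from the camera in meters."""
    return focal_length_pixels * ipd_real_m / ipd_pixels
```

For instance, a 63 mm interpupillary distance spanning 63 pixels under a 1000-pixel focal length places the face roughly 1 m from the camera; a full pose estimate would combine several such constraints with the rotation recovered from the registered three-dimensional model.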
FIG. 5 illustrates a schematic diagram of a process for determining a first face pose of a target object according to an embodiment of the present application. As shown in fig. 5, a plurality of video frames 510 captured by an image capturing component of an electronic device may be subjected to face detection 520 to determine a face region 530; inputting the human face region 530 and the preset human face three-dimensional model 540 into a pre-trained convolutional neural network CNN 550 for registration to obtain a human face three-dimensional model 560 of the target object; according to the preset parameters of the three-dimensional model 540, the position information and the rotation information of the three-dimensional model of the face of the target object relative to the image acquisition component of the electronic device are determined, and the position information and the rotation information are determined as the first face pose 570 of the target object.
By the method, the face three-dimensional model of the target object can be determined according to the face area and the preset face three-dimensional model, and the first face pose of the target object is determined according to the parameters of the preset face three-dimensional model, so that the face three-dimensional reconstruction technology can be used for determining the first face pose of the target object, the processing efficiency can be improved, the accuracy of the first face pose of the target object can be improved, and the accuracy of height detection is improved.
In a possible implementation manner, a neural network (for generating a three-dimensional model of a face of a target object, such as the convolutional neural network CNN 550) may be trained in advance according to a plurality of sample face regions and a preset three-dimensional model of the face. For example, for any sample face region, the sample face region and a preset face three-dimensional model can be input into a neural network for registration to obtain a sample face three-dimensional model; then, performing reverse rendering (reverse render) on the sample human face three-dimensional model, namely projecting the sample human face three-dimensional model to a two-dimensional space to obtain a reverse rendering image; determining the network loss of the neural network according to the difference between each reverse rendering image and the corresponding sample face area; network parameters of the neural network are adjusted according to the network loss.
And when the neural network meets the preset training end condition, ending the training to obtain the trained neural network. The training end condition may be, for example, that the training round of the neural network reaches a preset training round threshold, that the network loss of the neural network converges within a certain interval, that the neural network passes the verification on the verification set, or the like. The training end condition of the neural network can be set by a person skilled in the art according to practical situations, and the present application is not limited thereto.
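A minimal sketch of the network-loss computation described above, assuming the reverse-rendered image and the sample face region are given as equally shaped grayscale pixel arrays (a mean-squared photometric difference is one common choice; the application does not fix the exact loss form):

```python
def photometric_loss(rendered, target):
    """Network loss as the mean squared pixel difference between a
    reverse-rendered image and its corresponding sample face region,
    both given as nested lists of pixel intensities."""
    if len(rendered) != len(target) or any(
            len(r) != len(t) for r, t in zip(rendered, target)):
        raise ValueError("images must have the same shape")
    n = sum(len(row) for row in rendered)  # total pixel count
    return sum((r - t) ** 2
               for row_r, row_t in zip(rendered, target)
               for r, t in zip(row_r, row_t)) / n
```

The network parameters would then be adjusted to reduce this loss, e.g., by gradient descent in a deep learning framework.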
In a possible implementation manner, after determining the first face pose of the target object, it may be determined whether the first face pose meets a second preset condition, where the second preset condition is that a pitch angle in the first face pose is located within a preset second angle interval, a roll angle in the first face pose is located within a preset third angle interval, and a yaw angle in the first face pose is located within a preset fourth angle interval. The second angle interval, the third angle interval and the fourth angle interval may be the same or different. It should be noted that, a person skilled in the art may set specific values of the second angle interval, the third angle interval, and the fourth angle interval according to an actual situation, and the application does not limit this.
Under the condition that the first face pose of the target object does not meet the second preset condition, the user can be prompted to adjust the device pose of the electronic device and/or change the face pose of the target object in the modes of voice broadcasting, text display, animation display and the like so that the first face pose of the target object meets the second preset condition, the face of the target object faces the image acquisition component of the electronic device, namely, the face in the video frame acquired by the image acquisition component can be the face of the target object, and the height detection accuracy is improved.
Step S340, determining a first height of the target object according to the ground information, the first face pose and the equipment pose of the electronic equipment. The device pose of the electronic device is the pose when the electronic device collects the video frame where the face area is located.
In a possible implementation manner, before determining the first height of the target object, it may be determined whether a pitch angle of the electronic device indicated by the device pose of the electronic device satisfies a first preset condition, where the first preset condition is that the pitch angle of the electronic device indicated by the device pose of the electronic device is within a preset first angle interval.
Under the condition that the pitch angle of the electronic equipment indicated by the equipment pose of the electronic equipment does not meet a first preset condition, a user can be prompted to adjust the equipment pose of the electronic equipment in the modes of voice broadcasting, text display, animation display and the like so as to avoid excessive upward shooting or downward shooting during video frame acquisition, and therefore the accuracy of height detection is improved.
In determining the first height of the target object, the coordinate systems of the ground information and of the first face pose of the target object may be determined first. When the SLAM technique is used, the ground information is located in the world coordinate system. The Y axis of the world coordinate system can be aligned with the vertical direction of the real world according to the data acquired by the inertial measurement unit (IMU). Because physical quantities such as distances and object sizes in this world coordinate system are the same as in the real world, a connection between the virtual world coordinate system and the real world can be established in this way, so that the size of an object calculated in the world coordinate system is its actual size in the real world.
The first face pose of the target object, however, is the face pose of the target object relative to the image acquisition component of the electronic device, which is located in the camera coordinate system. In the camera coordinate system, the image acquisition component of the electronic device is located at the coordinate origin, so from the perspective of the face three-dimensional model of the target object, the position of the image acquisition component is fixed. Because there is no size comparison between an object in the camera coordinate system and an object in the real world, the face size of the target object in the camera coordinate system differs from its actual face size in the real world by a certain scaling ratio. Therefore, face resizing and coordinate system transformation are required.
In a possible implementation manner, when determining the first height of the target object, the first face pose of the target object may be first adjusted according to a preset interpupillary distance reference value to obtain a second face pose of the target object, where a face size indicated by the first face pose is the same as a face size of the target object in the face area, a face size indicated by the second face pose is a face actual size of the target object, and the second face pose is located under a camera coordinate system. That is, the first face pose of the target object may be adjusted in the camera coordinate system, so that the face size indicated by the adjusted second face pose is the actual face size of the target object.
For example, the interpupillary distance value in the first face pose of the target object can be determined, and the face size transformation coefficient can be determined according to a preset interpupillary distance reference value and the interpupillary distance value in the first face pose; the first face pose is then adjusted according to the face size transformation coefficient to obtain the second face pose of the target object.
By determining the face size transformation coefficient from the interpupillary distance reference value and the interpupillary distance value in the first face pose, and adjusting the first face pose according to this coefficient, the actual face size and pose of the target object in the camera coordinate system can be obtained.
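This size adjustment can be sketched as follows (a minimal illustration; the 63 mm reference value in the usage is an assumed average interpupillary distance, not a value fixed by this application). The transformation coefficient scales the translation component of the pose so that the indicated face size becomes the actual size:

```python
def face_size_transform_coefficient(ipd_reference, ipd_in_pose):
    """Coefficient mapping the face size indicated by the first face pose
    to the actual face size of the target object."""
    return ipd_reference / ipd_in_pose

def adjust_pose(position, ipd_reference, ipd_in_pose):
    """Scale the position (translation) of the first face pose by the
    coefficient to obtain the second face pose position; the rotation
    component is unaffected by a uniform scale change."""
    s = face_size_transform_coefficient(ipd_reference, ipd_in_pose)
    return tuple(s * c for c in position)
```

For example, if the pose's interpupillary distance is 0.126 units but the reference value is 0.063 m, the coefficient 0.5 halves the translation, bringing the face to its metric position.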
After the second face pose of the target object is obtained, coordinate transformation can be performed on the second face pose according to the equipment pose of the electronic equipment, and a third face pose of the target object is obtained, wherein the third face pose of the target object is located under a world coordinate system.
In one possible implementation, the second face pose P_C can be coordinate-transformed by the following formula (1) to obtain the third face pose P_w of the target object:

P_w = T^(-1) P_C        (1)

In formula (1), T represents the rigid body transformation matrix determined from the device pose (R, t) of the electronic device:

T = [ R   t ]
    [ 0   1 ]

where R represents the rotation matrix in the device pose of the electronic device, and t represents the translation vector in the device pose of the electronic device.
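For a single point, formula (1) reduces to p_w = Rᵀ(p_c − t), since the inverse of a rigid body transform [R t; 0 1] is [Rᵀ −Rᵀt; 0 1]. A minimal pure-Python sketch (the helper name is illustrative):

```python
def camera_to_world(p_c, R, t):
    """Apply the inverse rigid body transform T^(-1) to a camera-frame
    point: p_w = R^T (p_c - t), with R a 3x3 rotation matrix (row-major
    nested lists) and t a 3-vector from the device pose."""
    d = [p_c[i] - t[i] for i in range(3)]  # p_c - t
    # multiply by R transpose: (R^T d)_i = sum_j R[j][i] * d[j]
    return tuple(sum(R[j][i] * d[j] for j in range(3)) for i in range(3))
```

Applying this to the key points of the second face pose yields their world-coordinate positions, from which the third face pose follows.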
Then, the first body height of the target object is determined according to the third face pose and the ground information. In one possible implementation, the vertex (top of head) position of the target object may be determined according to the third face pose, and the first body height of the target object may then be determined according to the vertex position and the ground information. For example, assume that the vertex position of the target object determined from the third face pose is (x_1, y_1, z_1) and the ground is expressed by the plane equation F(x, y, z) = 0 in space. The first distance L_1 from (x_1, y_1, z_1) to the ground plane F(x, y, z) = 0 along the Y-axis direction can then be calculated, and L_1 is determined as the first body height L of the target object, i.e., L = L_1.
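Assuming the ground plane is written in the world coordinate system as F(x, y, z) = a·x + b·y + c·z + d = 0 (the coefficient form is an assumption for illustration; the application only requires some plane representation), the Y-axis distance from the vertex position can be sketched as:

```python
def height_above_ground(vertex, plane):
    """Distance along the Y axis from point (x1, y1, z1) to the plane
    a*x + b*y + c*z + d = 0; requires b != 0, which holds for any
    ground plane whose normal has a vertical component."""
    x1, y1, z1 = vertex
    a, b, c, d = plane
    y_ground = -(a * x1 + c * z1 + d) / b  # y of the plane beneath the vertex
    return y1 - y_ground
```

For a level ground plane y = 0, this simply returns the vertex's y coordinate.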
The height detection method is simple and rapid and can improve the accuracy of height detection by determining the top position of the head of the target object and determining the first height of the target object according to the top position and the ground information.
In a possible implementation manner, the nose tip position of the target object may be determined according to the third face pose, then the vertex position of the target object is determined according to the nose tip position of the target object and a preset proportional relation between the nose tip to the chin and the nose tip to the vertex, and the first body height of the target object is determined according to the vertex position of the target object and the ground information.
For example, assume that the nose tip position of the target object determined from the third face pose is (x_2, y_2, z_2) and the ground is expressed by the plane equation F(x, y, z) = 0 in space. According to the nose tip position (x_2, y_2, z_2) and the preset proportional relation between the nose-tip-to-chin and nose-tip-to-vertex distances, the vertex position (x_3, y_3, z_3) of the target object can be determined. The second distance L_2 from (x_3, y_3, z_3) to the ground plane F(x, y, z) = 0 along the Y-axis direction is then calculated, and L_2 is determined as the first body height L of the target object, i.e., L = L_2.
The head top position of the target object is determined through the nose tip position of the target object and the proportional relation between the nose tip to the chin and the nose tip to the head top, and the first body height of the target object is determined according to the head top position of the target object and the ground information, so that the height detection accuracy can be improved.
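A minimal sketch of this variant, assuming a hypothetical proportional factor k relating the nose-tip-to-vertex distance to the nose-tip-to-chin distance, and a level extrapolation along the world Y axis (the values k = 1.2 and 0.07 m in the usage are illustrative assumptions, not ratios given by this application):

```python
def vertex_from_nose_tip(nose_tip, nose_to_chin, k):
    """Estimate the vertex (top of head) position from the nose tip by
    moving k * nose_to_chin upward along the world Y axis."""
    x2, y2, z2 = nose_tip
    return (x2, y2 + k * nose_to_chin, z2)

def body_height_from_nose_tip(nose_tip, nose_to_chin, k, plane):
    """Second distance L2: Y-axis distance from the estimated vertex to
    the ground plane a*x + b*y + c*z + d = 0."""
    x3, y3, z3 = vertex_from_nose_tip(nose_tip, nose_to_chin, k)
    a, b, c, d = plane
    return y3 - (-(a * x3 + c * z3 + d) / b)
```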
In a possible implementation manner, when there are a plurality of face regions corresponding to the target object, for any one of these face regions, a second height of the target object may be determined, in a manner similar to that described above, according to the ground information in the plurality of video frames, the first face pose of the target object, and the device pose of the electronic device when it captured the video frame in which that face region is located; post-processing such as Kalman filtering or averaging is then performed on the plurality of second heights to obtain the first height of the target object. In this way, the accuracy of height detection can be improved.
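The post-processing step can be sketched with either a simple average or a one-dimensional constant-state Kalman filter over the second heights (the noise parameters below are illustrative assumptions; the application does not specify filter settings):

```python
def fuse_heights_average(heights):
    """First height as the arithmetic mean of the second heights."""
    return sum(heights) / len(heights)

def fuse_heights_kalman(heights, process_var=1e-4, meas_var=1e-2):
    """1-D Kalman filter with a constant state (the true height): each
    second height is a noisy measurement; the final state estimate is
    taken as the first height."""
    x, p = heights[0], 1.0          # initial estimate and its variance
    for z in heights[1:]:
        p += process_var            # predict step (state is constant)
        k = p / (p + meas_var)      # Kalman gain
        x += k * (z - x)            # update with measurement z
        p *= (1.0 - k)
    return x
```

With measurement noise far below the initial uncertainty, the filtered estimate converges quickly toward the later measurements while smoothing outliers.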
In one possible implementation manner, after the first height of the target object is determined, the first height of the target object may further be displayed in a display interface of the electronic device, for example, by means of animation, text, Augmented Reality (AR), or the like. When height detection is realized through the height APP, the first height of the target object can be displayed in the display interface of the height APP. The display interface of the height APP may include a real-time image interface in which the image acquisition component of the electronic device captures video frames.
FIG. 6 is a diagram illustrating a height display of a target object according to an embodiment of the present application. As shown in fig. 6, the user performs height detection on the target object 630 through the height APP on the electronic device 600. The display interface 610 of the height APP displays video frames acquired in real time by an image acquisition component (not shown in the figure) of the electronic device 600. When the height APP detects the height of the target object 630, the height can be displayed at a preset position above the head of the target object 630 in the display interface 610 in the form of the augmented reality icon 620, and the displayed information can be "height: 175 cm".
The above-mentioned fig. 6 exemplarily illustrates the display mode of the height by taking only one target object as an example. It should be noted that the heights of a plurality of target objects may be displayed in the above manner. The person skilled in the art may also set a display mode, a display position, and the like of the height of the target object according to the actual situation, which is not limited in this application.
According to the height detection method of the embodiments of the present application, semantic plane detection can be performed on a plurality of video frames acquired by an image acquisition component of an electronic device to determine ground information in the plurality of video frames; face detection can be performed on the plurality of video frames to determine a face region, and a first face pose of a target object in the plurality of video frames can be determined according to the face region and a preset face three-dimensional model; the first height of the target object is then determined according to the ground information, the first face pose, and the device pose of the electronic device. Height detection therefore does not depend on professional equipment (such as a binocular camera or a depth camera): the face pose of the target object is determined through face recognition and face three-dimensional reconstruction, and the height of the target object is determined from the face pose, the device pose, and the ground information, so the target object does not need to be manually positioned and no complete human body image of the target object needs to be shot, which is convenient to operate and highly accurate.
FIG. 7 shows a schematic diagram of a process of height detection according to an embodiment of the present application. As shown in fig. 7, assuming that the user performs height detection on the target object through a height APP running on the electronic device, when the user opens the height APP, step S701 is executed, and the height APP acquires a plurality of video frames (i.e., video streams) through an image acquisition component of the electronic device. Optionally, during the height detection process, the image capturing component may continuously capture the video stream.
In the case that the semantic plane detection is implemented by the SLAM technology, it may be determined in step S702 whether the SLAM is successfully initialized, and if the SLAM is not successfully initialized, the user is prompted to move the electronic device, and step S701 is re-executed; if the SLAM initialization is successful, step S703 is executed to perform semantic plane detection on the plurality of video frames, and in step S704, it is determined whether the ground information is detected within a preset time period.
If the ground information is not detected within the preset time period, prompting the user to shoot the ground, and simultaneously continuing to execute the step S701; if the ground information is detected within the preset time period, step S705 is executed to perform face detection on the plurality of video frames, determine a face region, and in step S706, determine whether the face region meets a third preset condition, where the third preset condition is that the face region is located in a preset region of the video frame where the face region is located.
If the face area does not meet the third preset condition, prompting the user to adjust the device pose of the electronic device, and simultaneously executing the step S701 again; if the face region meets the third preset condition, executing step S707, determining a first face pose of the target object in the plurality of video frames according to the face region and the preset face three-dimensional model, and in step S708, determining whether the first face pose meets a second preset condition, where the second preset condition is that a pitch angle in the first face pose is within a preset second angle interval, a roll angle in the first face pose is within a preset third angle interval, and a yaw angle in the first face pose is within a preset fourth angle interval.
If the first face pose does not meet the second preset condition, prompting a user to adjust the equipment pose of the electronic equipment and/or change the face pose of the target object, and simultaneously executing the step S701 again; and executing step S709 when the first face pose meets the second preset condition, and determining whether the pitch angle of the electronic device indicated in the device pose of the electronic device meets a first preset condition, where the first preset condition is that the pitch angle of the electronic device indicated in the device pose of the electronic device is within a preset first angle interval.
If the pitch angle of the electronic equipment indicated in the equipment pose of the electronic equipment does not meet the first preset condition, prompting a user to adjust the equipment pose of the electronic equipment, and simultaneously executing the step S701 again; if the pitch angle of the electronic device indicated in the device pose of the electronic device meets a first preset condition, executing step S710, and determining a first body height of the target object according to the ground information, the first face pose and the device pose of the electronic device; then, step S711 is executed to display the first height of the target object on the display interface of the height APP in a manner of augmented reality AR or the like.
According to the height detection method, through the SLAM technology and semantic segmentation, the ground information can be automatically identified, the height of the target object can be automatically detected, manual operation (such as manual clicking operation or marking of the target object) is not needed, and the heights of a plurality of target objects can be simultaneously detected, so that the height detection process is simplified, and the height detection efficiency is improved. In addition, the embodiment of the application acquires the three-dimensional information through the SLAM technology, can also avoid the contact of the electronic equipment and the human body of the target object, and is safe and reliable.
According to the height detection method, height detection is carried out on the basis of a plurality of video frames acquired by a common camera (such as a monocular camera), professional equipment such as a depth camera is not needed, equipment dependence is reduced, and in some embodiments, height detection can be achieved through handheld equipment (such as a mobile phone and a smart watch) by a user. Meanwhile, the embodiment of the application acquires the face pose of the target object through the face recognition and face three-dimensional reconstruction technology, is rapid and accurate, can improve the accuracy of height detection, and is also suitable for scenes such as target object movement, shooting angle change and the like.
FIG. 8 shows a block diagram of a height detection apparatus according to an embodiment of the present application. As shown in fig. 8, the height detection apparatus is applied to an electronic device, and includes: an image capture component 810 for capturing a plurality of video frames; a processing component 820 configured to: performing semantic plane detection on the plurality of video frames, and determining ground information in the plurality of video frames; carrying out face detection on the plurality of video frames to determine a face area; determining a first face pose of a target object in the plurality of video frames according to the face image of the face area and a preset face three-dimensional model; and determining a first height of the target object according to the ground information, the first face pose and the equipment pose of the electronic equipment.
In one possible implementation, the processing component is further configured to at least one of: prompting a user to shoot the ground under the condition that the ground information is not detected within a preset time period; prompting a user to adjust the pose of the equipment under the condition that the pitch angle of the electronic equipment indicated by the pose of the equipment does not meet a first preset condition; prompting a user to adjust the device pose and/or change the face pose of the target object under the condition that the first face pose does not meet a second preset condition; or prompting a user to adjust the pose of the equipment under the condition that the face area does not meet a third preset condition.
In one possible implementation, the determining a first altitude of the target object according to the ground information, the first face pose, and the device pose of the electronic device includes: determining a second height of the target object according to the ground information, the first face pose and the equipment pose; and performing post-processing on the second height to obtain the first height, wherein the post-processing comprises Kalman filtering.
In one possible implementation, the processing component is further configured to: and displaying the first height on a display interface of the electronic equipment.
An embodiment of the present application provides a height detection device, including: an image acquisition component for acquiring a plurality of video frames; a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to implement the above method when executing the instructions.
Embodiments of the present application provide a non-transitory computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
Embodiments of the present application provide a computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, wherein when the computer readable code runs in a processor of an electronic device, the processor in the electronic device performs the above method.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable Compact Disc Read-Only Memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punch card or an in-groove protrusion structure having instructions stored thereon, and any suitable combination of the foregoing.
The computer readable program instructions or code described herein may be downloaded to the respective computing/processing device from a computer readable storage medium, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present application may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry such as programmable logic circuits, Field-Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs) can execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, thereby implementing aspects of the present application.
Various aspects of the present application are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
It is also noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by hardware (e.g., a circuit or an ASIC) that performs the corresponding function or action, or by combinations of hardware and software, such as firmware.
While the invention has been described in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a review of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the word "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Having described embodiments of the present application, it should be noted that the foregoing description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or the technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (11)

  1. A height detection method, comprising:
    performing semantic plane detection on a plurality of video frames acquired by an image acquisition component of an electronic device, and determining ground information in the plurality of video frames;
    performing face detection on the plurality of video frames to determine a face area;
    determining a first face pose of a target object in the plurality of video frames according to the face area and a preset three-dimensional face model; and
    determining a first height of the target object according to the ground information, the first face pose, and a device pose of the electronic device.
  2. The method of claim 1, further comprising at least one of:
    prompting a user to photograph the ground when the ground information is not detected within a preset time period;
    prompting the user to adjust the device pose when a pitch angle of the electronic device indicated by the device pose does not meet a first preset condition;
    prompting the user to adjust the device pose and/or change the face pose of the target object when the first face pose does not meet a second preset condition; or
    prompting the user to adjust the device pose when the face area does not meet a third preset condition.
  3. The method of claim 1 or 2, wherein the determining the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device comprises:
    determining a second height of the target object according to the ground information, the first face pose, and the device pose; and
    performing post-processing on the second height to obtain the first height, wherein the post-processing comprises Kalman filtering.
  4. The method according to any one of claims 1-3, further comprising:
    displaying the first height on a display interface of the electronic device.
  5. A height detection apparatus, applied to an electronic device, comprising:
    an image acquisition component configured to acquire a plurality of video frames; and
    a processing component configured to:
    perform semantic plane detection on the plurality of video frames, and determine ground information in the plurality of video frames;
    perform face detection on the plurality of video frames to determine a face area;
    determine a first face pose of a target object in the plurality of video frames according to a face image of the face area and a preset three-dimensional face model; and
    determine a first height of the target object according to the ground information, the first face pose, and a device pose of the electronic device.
  6. The apparatus of claim 5, wherein the processing component is further configured to perform at least one of the following:
    prompting a user to photograph the ground when the ground information is not detected within a preset time period;
    prompting the user to adjust the device pose when a pitch angle of the electronic device indicated by the device pose does not meet a first preset condition;
    prompting the user to adjust the device pose and/or change the face pose of the target object when the first face pose does not meet a second preset condition; or
    prompting the user to adjust the device pose when the face area does not meet a third preset condition.
  7. The apparatus of claim 5 or 6, wherein the determining the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device comprises:
    determining a second height of the target object according to the ground information, the first face pose, and the device pose; and
    performing post-processing on the second height to obtain the first height, wherein the post-processing comprises Kalman filtering.
  8. The apparatus according to any one of claims 5-7, wherein the processing component is further configured to:
    display the first height on a display interface of the electronic device.
  9. A height detection apparatus, comprising:
    an image acquisition component configured to acquire a plurality of video frames;
    a processor; and
    a memory configured to store processor-executable instructions,
    wherein the processor is configured to implement the method of any one of claims 1-4 when executing the instructions.
  10. A non-transitory computer readable storage medium having stored thereon computer program instructions, wherein the computer program instructions, when executed by a processor, implement the method of any one of claims 1-4.
  11. A computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, wherein when the computer readable code runs in an electronic device, a processor in the electronic device performs the method of any one of claims 1-4.
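The core computation recited in claim 1 — combining the detected ground plane, the first face pose, and the device pose into a height — can be read as a point-to-plane distance: the face pose locates a head point in the camera frame, the device pose lifts it into the world frame, and the height is its perpendicular distance to the ground plane. The sketch below is an illustrative reading only, not the claimed implementation; the function name, the camera-to-world convention, and the use of a top-of-head point are all assumptions:

```python
import numpy as np

def estimate_height(ground_point, ground_normal, head_top_cam,
                    cam_rotation, cam_translation):
    """Perpendicular distance from a head point to the ground plane.

    ground_point / ground_normal: a point on, and the normal of, the
        detected ground plane (world frame).
    head_top_cam: top-of-head position in the camera frame, derived from
        the face pose fitted against the 3D face model (assumed input).
    cam_rotation / cam_translation: device pose as a camera-to-world
        rotation matrix and translation vector.
    """
    # Lift the head point from the camera frame into the world frame.
    head_world = cam_rotation @ head_top_cam + cam_translation
    # Normalize the plane normal so the dot product is a metric distance.
    n = ground_normal / np.linalg.norm(ground_normal)
    # Height = unsigned distance from the head point to the ground plane.
    return float(abs(n @ (head_world - ground_point)))
```

With an identity device rotation and a camera 1.7 m above a horizontal ground plane, a head point at the camera origin yields a height of 1.7.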
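The Kalman-filtering post-processing of claims 3 and 7 — turning noisy per-frame second heights into a stable first height — can be illustrated with a scalar constant-state Kalman filter. This is a sketch under assumptions: the disclosure does not specify the filter model, and the noise variances below are placeholders:

```python
def kalman_smooth(heights, process_var=1e-4, meas_var=4e-2):
    """Smooth a sequence of noisy per-frame height measurements (metres)."""
    x, p = heights[0], 1.0           # initial state estimate and variance
    smoothed = [x]
    for z in heights[1:]:
        p = p + process_var          # predict: true height barely drifts
        k = p / (p + meas_var)       # Kalman gain
        x = x + k * (z - x)          # update toward the new measurement
        p = (1.0 - k) * p            # shrink the state variance
        smoothed.append(x)
    return smoothed
```

Because `process_var` is much smaller than `meas_var`, the gain decays over the frames and the filtered output varies less than the raw measurements while tracking their level.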
CN202180006425.1A 2021-07-29 2021-07-29 Height detection method, device and storage medium Pending CN115885316A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/109248 WO2023004682A1 (en) 2021-07-29 2021-07-29 Height measurement method and apparatus, and storage medium

Publications (1)

Publication Number Publication Date
CN115885316A true CN115885316A (en) 2023-03-31

Family ID: 85086031

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180006425.1A Pending CN115885316A (en) 2021-07-29 2021-07-29 Height detection method, device and storage medium

Country Status (2)

Country Link
CN (1) CN115885316A (en)
WO (1) WO2023004682A1 (en)


Also Published As

Publication number Publication date
WO2023004682A1 (en) 2023-02-02

Similar Documents

Publication Publication Date Title
US12020377B2 (en) Textured mesh building
US11704878B2 (en) Surface aware lens
US11960651B2 (en) Gesture-based shared AR session creation
CN113810587B (en) Image processing method and device
US11501499B2 (en) Virtual surface modification
US11615592B2 (en) Side-by-side character animation from realtime 3D body motion capture
KR20220138398A (en) Creating Augmented Reality Sessions Using Skeleton Tracking
US11810316B2 (en) 3D reconstruction using wide-angle imaging devices
US10748000B2 (en) Method, electronic device, and recording medium for notifying of surrounding situation information
KR102159767B1 (en) Visibility improvement method based on eye tracking, machine-readable storage medium and electronic device
WO2022057384A1 (en) Photographing method and device
CN115699096A (en) Tracking augmented reality device
WO2022261856A1 (en) Image processing method and apparatus, and storage medium
WO2023004682A1 (en) Height measurement method and apparatus, and storage medium
KR20200111144A (en) Visibility improvement method based on eye tracking, machine-readable storage medium and electronic device
KR102473669B1 (en) Visibility improvement method based on eye tracking, machine-readable storage medium and electronic device
US20240096031A1 (en) Graphical assistance with tasks using an ar wearable device
US20240073402A1 (en) Multi-perspective augmented reality experience

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination