CN113177553A - Method and device for identifying floor buttons of an inner panel of an elevator

Method and device for identifying floor buttons of an inner panel of an elevator

Info

Publication number
CN113177553A
CN113177553A
Authority
CN
China
Prior art keywords
button
character
area
floor
regions
Prior art date
Legal status
Granted
Application number
CN202110606042.XA
Other languages
Chinese (zh)
Other versions
CN113177553B (en)
Inventor
楼云江
李爽
张近民
孟雨皞
陈雨景
Current Assignee
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology filed Critical Shenzhen Graduate School Harbin Institute of Technology
Priority to CN202110606042.XA priority Critical patent/CN113177553B/en
Publication of CN113177553A publication Critical patent/CN113177553A/en
Application granted granted Critical
Publication of CN113177553B publication Critical patent/CN113177553B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63 Scene text, e.g. street names
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

The invention relates to a method and a device for identifying the floor buttons of an inner panel of an elevator. The method comprises two processes: a process of building and training the recognition network, and a process of detecting and identifying the floor buttons of the inner panel. In the first process, a convolutional neural network of a given structure is built and trained on an open-source data set. In the second process, candidate text regions are proposed, a clustering algorithm then yields the probable button positions, the network trained in the first process performs text recognition, and the recognition results are matched against the probable button positions to give a preliminary detection and recognition result. The device is a computer device that implements the method. The scheme of the invention has a low computational cost, does not require the network to be retrained for different operating environments, is strongly resistant to noise, and adapts well to different operating environments.

Description

Method and device for identifying floor buttons of an inner panel of an elevator
Technical Field
The invention relates to a method and a device for detecting and identifying the floor buttons of an inner panel of an elevator, and belongs to the technical field of machine-vision-based image recognition.
Background
With the continuous development and maturation of robotics, robots have gradually begun to appear in public life. Mobile robots are now being applied step by step to autonomous navigation, guidance, transportation and similar tasks. A mobile service robot that can navigate across floors in a multi-storey building has great application value: it can guide guests in a hotel or deliver items or dishes to a specified guest room; it can help doctors, or patients with limited mobility, transfer medicines or carry personal items in a hospital; and it can also be used for building security patrols, indoor cleaning and the like.
At present, single-floor navigation technology is mature, whereas autonomous cross-floor navigation and operation still need a long period of refinement. One key technical problem that cross-floor navigation must solve is how the robot can move between floors autonomously. An approach that has attracted much attention is for the robot to complete the floor transfer by riding a passenger elevator by itself. Compared with other approaches (such as using stairs or escalators), using an elevator requires no additional design of the robot's structure, and transferring floors by elevator is also more efficient. Correspondingly, however, completing floor transfers by elevator requires the robot to be able to operate the elevator autonomously. The most important requirement is that the robot be able to detect the buttons on the panel inside the elevator and recognize the floors and numbers they represent. Once the robot can identify the floor buttons on the elevator panel, it can operate the elevator autonomously in cooperation with a robotic arm.
Disclosure of Invention
The invention aims to solve the problem of a robot detecting and identifying the buttons on the inner panel of an elevator, and to make the detection and identification method more robust and more widely applicable; to this end, it provides detection and identification of the floor buttons of an inner panel of an elevator.
The technical scheme of the invention relates to a method for identifying floor buttons of an inner panel of an elevator, which comprises the following steps:
s1, providing a convolutional neural network for recognizing information on an elevator panel and training the convolutional neural network with a relevant character data set, so that predetermined characters, which at least include digits, can be recognized by the convolutional neural network;
s2, capturing, in a head-on shooting pose, a picture containing a plurality of buttons of an elevator panel, detecting a plurality of maximally stable extremal regions in the picture, and screening out one or more candidate character regions in combination with preset geometric constraint features;
s3, computing, by a clustering algorithm and from the horizontal and vertical coordinate relationships of the maximally stable extremal regions, the button regions whose likelihood exceeds a preset probability;
s4, feeding the candidate character regions into the built and trained convolutional neural network to obtain a recognition result for the characters in each candidate character region;
s5, combining the result of step S3 with the result of step S4, and outputting the character detection and recognition results for the button regions whose likelihood exceeds the preset probability.
Further, the step S1 includes:
s11, constructing the convolutional neural network, which sequentially applies a vector convolution operation, a first max-pooling layer, a two-dimensional matrix convolution operation, a second max-pooling layer and a fully connected layer to the picture data of the input layer, and then outputs a plurality of values between 0 and 1 representing the probabilities that the picture data belongs to each predetermined character;
and S12, training the network with a calibrated data set of given images and characters, so that the recognition accuracy of the convolutional neural network reaches a preset accuracy threshold.
Further, the step S2 includes:
s21, converting the collected picture containing a plurality of buttons of the elevator panel into a gray-scale map, and then updating the pixel value of each point according to the following formula
g(x, y) = 255, if f(x, y) ≥ T; g(x, y) = 0, if f(x, y) < T
Wherein f (x, y) is an original pixel value of a pixel at the image coordinate (x, y), g (x, y) is an updated pixel value of the pixel at the image coordinate (x, y), and T is a preset pixel threshold;
s22, detecting one or more maximally stable extremal regions in the gray-scale image by using a maximally stable extremal region (MSER) detector;
s23, using the aspect ratio of the bounding rectangles of the detected maximally stable extremal regions as a first constraint, and then screening the maximally stable extremal regions that satisfy the first constraint as candidate character regions.
Further, the step S3 includes:
s31, using the ratio of the area of each detected maximally stable extremal region to the area of its bounding rectangle as a second constraint, then keeping the candidate character regions that satisfy the second constraint as regions to be clustered, and computing the coordinates of those regions in the image;
s32, clustering the regions to be clustered by their abscissas to obtain a plurality of clusters, each element of each cluster being associated with the coordinates of one maximally stable extremal region, and then selecting the clusters containing more elements for the subsequent steps;
s33, clustering the clusters obtained from the abscissa clustering by their ordinates;
and S34, combining the abscissa centre and the ordinate centre of each cluster and marking the combined position as a button coordinate position, so that the region corresponding to the button coordinate position is marked as a button region whose likelihood exceeds the preset probability.
Further, the step S4 includes:
s41, converting the image of each candidate character region to gray scale and compressing it into a black-and-white image of 28 by 28 pixels;
and S42, feeding the black-and-white image into the built convolutional neural network and selecting the character corresponding to the largest of the output probabilities as the recognition result.
Further, the step S5 includes:
s51, calculating the distance between each candidate character region and the button regions, and matching the character region with a button region whose distance is smaller than a preset distance threshold;
s52, determining the total number of characters in the character regions matched with each button region,
if the total number of characters is 1, temporarily marking the character corresponding to that button region as representing the floor number,
if the total number of characters includes two digit characters and the characters in the button region are similar in size, shape and position, temporarily marking the combination of those characters corresponding to the button region as representing the floor number;
s53, checking the coordinates of all button regions temporarily marked as representing a floor number, calculating the average size of the character regions corresponding to those button regions,
and if a character region exceeds the average size by a percentage threshold, cancelling the floor number associated with that character region and its button region.
Further, the method also comprises the following steps:
s61, extracting an arithmetic progression and its common difference from the button-region clusters obtained by abscissa clustering in step S32 and the digit recognition results of the corresponding character regions;
and S62, combining the known information on the highest and lowest floors with the arithmetic progression, and completing and correcting the positions of all the buttons and the floor information they represent according to the known rule that the buttons are arranged at equal spatial intervals.
In some embodiments, the predetermined characters include the digits 0 through 9, the letter A, the letter G and a minus sign, and may also include other known elevator button symbols.
The invention also relates to a computer-readable storage medium, on which program instructions are stored, which program instructions, when executed by a processor, implement the above-mentioned method.
The invention also relates to a computer device arranged in the mobile robot, which comprises an image acquisition circuit, a processor and the computer readable storage medium.
The invention has the following beneficial effects.
1. The method uses a recognition-network building and training process and an inner-panel floor button detection and identification process to detect and identify the floor information of the buttons on the inner panel of an elevator, that is, to identify which floor each button corresponds to.
2. In the inner-panel floor button detection and identification process, the scheme of the invention proposes candidate text regions by detecting maximally stable extremal regions and applying designed hand-crafted features; gives probable button positions through a clustering algorithm; recognizes the candidate text regions with a trained, simple convolutional neural network; matches the recognition results against the probable button positions to give a preliminary detection and recognition result; and finally completes and corrects the detected positions and recognition results of all the buttons by combining other known information, such as the spatial arrangement rule of the buttons. In this way, the multiple buttons on an elevator panel can be identified simultaneously and accurately.
3. The scheme of the invention has a low computational cost and does not require the network to be trained separately for different operating environments; only a few individual parameters or thresholds need to be adjusted. It is strongly resistant to noise and adapts well to different operating environments.
4. The technical scheme of the invention can be integrated into the vision system of a robot so that the robot can automatically detect and identify the floor information of the buttons on the inner panel of an elevator, and can therefore operate the elevator buttons autonomously to move to a target floor.
Drawings
Fig. 1 is a main flow diagram of a method for detecting and identifying the floor buttons of an inner panel of an elevator according to the present invention.
Fig. 2 is a schematic diagram of the structure of a convolutional neural network in the method according to the present invention.
Fig. 3 is a detailed flow chart of the method according to the invention.
Fig. 4 is a schematic diagram of the maximum stable extremum region screened out from the picture according to the method of the present invention.
Fig. 5 is a schematic diagram of character candidate regions screened out in a picture according to the method of the present invention.
Fig. 6(a) to 6(e) are schematic diagrams of a clustering process in the method according to the present invention.
Fig. 7 is a schematic diagram of the recognition result of the character candidate area in the picture by the convolutional neural network according to the method of the present invention.
Fig. 8 is a schematic illustration of the preliminary detection and identification results of the method according to the invention.
Fig. 9 is a schematic illustration of the final detection and identification results of the method according to the invention.
Detailed Description
The conception, specific structure and technical effects of the present invention are described clearly and completely below in conjunction with the embodiments and the accompanying drawings, so that the objects, schemes and effects of the present invention can be fully understood.
In the technical scheme of the invention, detecting and identifying the floor buttons of the panel inside the elevator means detecting the positions of the elevator's floor buttons in an image containing the inner-panel floor buttons and identifying the floor information represented by each floor button (that is, the floor to which the elevator moves when the button is operated). Typically, such floor information is printed or embossed on the surface of the floor buttons as characters (e.g., numbers, letters and words), Braille, and the like. Button recognition in the solution of the invention is therefore based on image recognition: it mainly recognizes these characters from the surface image of the floor buttons and matches them to a specific floor button (or to the position of that floor button).
The technical scheme of the invention mainly uses two processes to detect and identify the floor buttons of the inner panel of an elevator: a recognition-network building and training process, and an inner-panel floor button detection and identification process. In the first process, a convolutional neural network of a given structure is built and trained on an open-source data set. In the second process, candidate text regions are first proposed, a clustering algorithm then gives the probable button positions, the network trained in the first process performs text recognition, and the recognition results are matched against the probable button positions to give a preliminary detection and recognition result. Finally, the detected positions and recognition results of all the buttons are completed and corrected in combination with other known information.
The elevator inner-panel floor button identification method and device of the present invention are described in detail below through several embodiments. The technical scheme is mainly described using numeric characters (i.e., 0-9) as an example. It will be appreciated that the solutions described in these embodiments apply equally to recognizing other characters on elevator buttons, such as the letters A and G, the mathematical symbol "-", and the like.
Referring to fig. 1, an elevator inner panel floor button recognition method according to some embodiments of the present invention, taking button detection and recognition of numeric characters as an example, includes the steps of:
s1: building a convolutional neural network for information recognition in an elevator panel and training the convolutional neural network by using a related data set so as to be capable of recognizing numbers 0-9;
s2: acquiring a picture containing a plurality of buttons of an elevator panel, detecting a Maximum Stable Extremum Region (MSER) in the picture, and screening out a digital candidate region by combining the characteristics of relevant presets (namely manual setting);
s3: calculating the button region result exceeding the preset probability (namely, higher probability, such as exceeding 90% probability) through a clustering algorithm according to the coordinate position relation of the maximum stable extremum regions;
s4: transmitting the digital candidate region into the convolutional neural network established and trained in step S1 to obtain a recognition result of the character text region (for example, the digital text region);
s5: using the result of step S3 and the result of step S4 to give preliminary detection and recognition results in cooperation with each other;
s6: and (4) complementing and correcting the result obtained in the step (S5) by combining other known information and the result obtained in the step (S5) to obtain a final detection and identification result.
Details of each of the above steps are described in various embodiments below in conjunction with the flow chart shown in fig. 3.
In one embodiment, step S1 is implemented as follows.
Step S11: the convolutional neural network is constructed as shown in figure 2. The picture data of the input layer sequentially undergoes a vector convolution operation (with a ReLU activation function), a first max-pooling layer, a two-dimensional matrix convolution operation (with a ReLU activation function), a second max-pooling layer and a fully connected layer (with a ReLU activation function), and the network then outputs a plurality of values between 0 and 1 representing the probabilities that the picture data belongs to each predetermined character. For example, for a 28 x 28 black-and-white picture at the input layer, the output is 10 numbers between 0 and 1 indicating the probabilities that the picture belongs to the digits 0 to 9.
Step S12: the network is trained with a calibrated data set so that its recognition accuracy exceeds 99 percent. For example, a large number of black-and-white pictures calibrated with the digits 0-9 are input into the convolutional neural network shown in fig. 2, and it is then determined from the output recognition results whether the accuracy against the calibration exceeds 99 percent. If not, step S12-2 is executed: new calibrated black-and-white pictures of the digits 0-9 continue to be input into the convolutional neural network for further training. In other embodiments, the letters A and G, the mathematical symbol "-" and the like may also be used for training.
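Purely as an illustration of steps S11 and S12, such a network could be sketched in Python with PyTorch as follows. The channel counts, kernel sizes and hidden-layer width are assumptions of this sketch (the embodiment fixes only the layer order, the 28 x 28 single-channel input and the 10 output probabilities), and the name ButtonCharNet is illustrative.

# Illustrative sketch only; the layer sizes are assumptions, not the disclosed design.
import torch
import torch.nn as nn

class ButtonCharNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2),   # first convolution + ReLU
            nn.ReLU(),
            nn.MaxPool2d(2),                              # first max-pooling layer -> 14 x 14
            nn.Conv2d(16, 32, kernel_size=5, padding=2),  # second convolution + ReLU
            nn.ReLU(),
            nn.MaxPool2d(2),                              # second max-pooling layer -> 7 x 7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, 128),                   # fully connected layer + ReLU
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Softmax gives the "plurality of values between 0 and 1" of step S11.
        return torch.softmax(self.classifier(self.features(x)), dim=1)

net = ButtonCharNet()
probs = net(torch.rand(1, 1, 28, 28))   # shape (1, 10): probabilities for the digits 0-9

Training on the calibrated data set of step S12 would then be an ordinary supervised classification loop (for example with a cross-entropy loss), repeated until the required accuracy is reached.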
In one embodiment, step S2 is implemented as follows.
Step S20: a picture containing a plurality of buttons of the elevator panel is acquired in a head-on shooting pose; alternatively, the angle of the acquired picture is corrected so that the objects in the picture are viewed from a normal observation angle rather than being inverted or tilted.
Step S21: the colour original image is first converted into a gray-scale image, and the pixel value of each point is then updated according to the following formula
g(x, y) = 255, if f(x, y) ≥ T; g(x, y) = 0, if f(x, y) < T
where f(x, y) is the original pixel value of the pixel at image coordinate (x, y), g(x, y) is the updated pixel value of that pixel, and T is a preset pixel threshold. The value of T can be adjusted so that the picture shows enough button-outline pixels to facilitate the binarization process.
Step S22: a maximally stable extremal region (MSER) detector is used to detect the maximally stable extremal regions in the gray-scale image. The MSER detector is a program module that encapsulates the MSER detection algorithm. The binarization threshold is swept over [0, 255], so that the binarized image goes through a process from completely black to completely white (like a terrain viewed from above as the water level rises). During this process, the area of some connected regions changes very little as the threshold rises; such regions are the MSERs. In the example of the invention, the maximally stable extremal regions detected in the captured elevator button picture are shown as the boxes in fig. 4.
Step S23: the maximally stable extremal regions that satisfy manually set feature conditions (preferably the aspect ratio, the ratio of the area of the region to the area of its bounding rectangle, and the like) are taken as digit candidate regions. The screened digit candidate regions are shown as the boxes in fig. 5. When the feature-condition check of all MSERs has been completed, the subsequent steps are executed.
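For illustration only, steps S21 to S23 might be sketched in Python with OpenCV roughly as follows. The threshold T and the aspect-ratio bounds are assumed example values, cv2.MSER_create is merely one available encapsulation of the MSER detection algorithm, and the function name is illustrative.

# Illustrative sketch of steps S21-S23; T and the aspect-ratio bounds are example values.
import cv2
import numpy as np

def candidate_digit_regions(image_bgr, T=120, min_aspect=0.3, max_aspect=1.2):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Step S21: update every pixel against the preset threshold T.
    updated = np.where(gray >= T, 255, 0).astype(np.uint8)
    # Step S22: detect maximally stable extremal regions.
    mser = cv2.MSER_create()
    _, boxes = mser.detectRegions(updated)
    # Step S23: keep regions whose bounding-rectangle aspect ratio satisfies the condition.
    return [(x, y, w, h) for (x, y, w, h) in boxes
            if min_aspect <= w / float(h) <= max_aspect]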
In one embodiment, step S3 is implemented as follows.
Step S31: obvious noise regions are filtered out using the manually set feature conditions (preferably the aspect ratio, the ratio of the area of the maximally stable extremal region to the area of its bounding rectangle, and the like); for example, the excessively long MSER along the lower edge of the sign above the buttons in fig. 4 is filtered out.
Step S32: the abscissas of all the regions in the result of step S31 are clustered to obtain a plurality of clusters (each element of each cluster is associated with one detected MSER), as shown in fig. 6(a); the clusters with more elements (e.g., 3 or more elements) are selected for the subsequent steps, as shown in fig. 6(b).
Step S33: within each cluster obtained in step S32, clustering is performed by ordinate, as shown in fig. 6(c).
Step S34: combining the results of steps S32 and S33, the abscissa centre and the ordinate centre of each cluster are combined to obtain the button coordinate positions with high likelihood, as shown in fig. 6(e).
In one embodiment, step S4 is implemented as follows.
Step S41: the gray-scale image of each digit candidate region obtained in step S23 is scaled to a black-and-white image of 28 x 28 pixels.
Step S42: the black-and-white image obtained in step S41 is fed into the neural network built in step S1, and the digit corresponding to the largest of the ten output probabilities is selected as the recognition result, as shown in fig. 7.
Step S43: it is judged whether all digit candidate regions have been recognized; if so, the subsequent steps are executed, otherwise the process returns to step S41.
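Steps S41 to S43 could, for example, reuse the ButtonCharNet sketched above (itself an assumption of this illustration) roughly as follows.

# Illustrative sketch of steps S41-S43; `net` is the ButtonCharNet sketched earlier.
import cv2
import torch

def recognize_digit_regions(gray, regions, net):
    results = []
    for (x, y, w, h) in regions:
        patch = cv2.resize(gray[y:y + h, x:x + w], (28, 28))        # scale to 28 x 28
        tensor = torch.from_numpy(patch).float().div(255.0).view(1, 1, 28, 28)
        with torch.no_grad():
            probs = net(tensor)                                      # ten probabilities
        results.append(int(probs.argmax(dim=1).item()))              # digit with the largest probability
    return results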
In one embodiment, step S5 is implemented as follows.
Step S51: for each digit text candidate region, the nearest button coordinate is found and the distance between them is calculated; if the distance is small (e.g., less than half the width of a button), the digit text region and the button coordinate are considered to match.
Step S52: the number of digit texts matched with each button coordinate is checked. If the number is 1, the button coordinate is considered to correspond to the floor represented by that digit. If the number is 2 and the two digit text regions are similar in size, shape and position, the button is considered to correspond to the two-digit floor formed by the two digits. If the number is greater than 2 or equal to zero, the button coordinate is temporarily considered not to correspond to any floor. In other embodiments, if a button coordinate is matched with both digits and a letter, for example "13A", the combination is converted to the corresponding floor 14 according to known rules, even though it contains a letter. In other embodiments, if the elevator serves more than 99 floors, the criterion is adjusted so that a button coordinate is temporarily considered not to correspond to any floor when the number of matched digit texts is greater than 3 or equal to zero.
Step S53: all button positions temporarily considered to represent floors are checked and the average size of their corresponding text regions is calculated; if a text region is clearly larger or smaller than the average (for example, its area exceeds 150% of the average area, or its edge length exceeds the average length by 50%), the corresponding button position is temporarily no longer considered to represent a floor. For example, the noisy text region in the lower-right corner of fig. 7 is eliminated in this way. This yields the preliminary recognition result shown in fig. 8.
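An illustrative sketch of the matching of steps S51 and S52 follows; the half-button-width threshold is taken from the example above, the data layout and function name are assumptions of the sketch, and the size check of step S53 is omitted for brevity.

# Illustrative sketch of steps S51-S52; boxes are (x, y, w, h), buttons are (x, y) centres.
import numpy as np

def match_digits_to_buttons(digit_boxes, digit_labels, buttons, button_width):
    matched = {i: [] for i in range(len(buttons))}
    # Step S51: attach each digit region to the nearest button if it is close enough.
    for box, label in zip(digit_boxes, digit_labels):
        cx, cy = box[0] + box[2] / 2.0, box[1] + box[3] / 2.0
        distances = [np.hypot(cx - bx, cy - by) for bx, by in buttons]
        nearest = int(np.argmin(distances))
        if distances[nearest] < button_width / 2.0:
            matched[nearest].append((box, label))
    # Step S52: one digit -> single-digit floor, two similar digits -> two-digit floor.
    floors = {}
    for idx, chars in matched.items():
        if len(chars) == 1:
            floors[idx] = str(chars[0][1])
        elif len(chars) == 2:
            chars.sort(key=lambda c: c[0][0])               # left-to-right reading order
            floors[idx] = "".join(str(label) for _, label in chars)
    return floors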
In one embodiment, step S6 is implemented as follows.
Step S61: in each column of the preliminary recognition result, proceeding from bottom to top, an arithmetic progression whose common difference equals the number of clusters finally obtained in step S32 is found.
Step S62: combining the known information on the highest and lowest floors with the arithmetic progression of step S61, the floors that each column should contain are completed, and the positions of all the buttons and the floors they represent are completed and corrected according to the rule that the buttons are arranged at equal spatial intervals, giving the final detection and recognition result shown in fig. 9. In addition, in other embodiments, floor labels may be converted according to known floor-information rules, for example treating floor G as equivalent to floor 1 and floor 3A as equivalent to floor 4, so that they fit the arithmetic progression.
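The completion rule of steps S61 and S62 can be illustrated with a small sketch: within one column of buttons, the floors form an arithmetic progression whose common difference is the number of columns, so the floors the column should contain, and hence the missing ones, can be enumerated from the known lowest and highest floors. Function and parameter names are illustrative.

# Illustrative sketch of steps S61-S62 for a single column of buttons.
def complete_column_floors(recognized_floors, n_columns, lowest_floor, highest_floor):
    # Walk the arithmetic progression down to the first floor this column serves ...
    start = min(recognized_floors)
    while start - n_columns >= lowest_floor:
        start -= n_columns
    # ... then enumerate every floor of the column up to the highest floor served.
    expected = list(range(start, highest_floor + 1, n_columns))
    missing = sorted(set(expected) - set(recognized_floors))
    return expected, missing

# Example: three columns, floors 1-18, one column recognised as {2, 8, 11, 17} -> missing 5 and 14.
print(complete_column_floors({2, 8, 11, 17}, n_columns=3, lowest_floor=1, highest_floor=18))

The image positions of the missing buttons can then be filled in by interpolation, using the rule that the buttons are spaced equally.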
In some embodiments, a computer program implementing the method of the present invention may be integrated into the vision processing system of a robot, so that the robot can autonomously detect and identify the floor information of the buttons on the inner panel of an elevator and can therefore operate the elevator buttons autonomously to move to a target floor.
It should be recognized that the method steps in embodiments of the present invention may be embodied or carried out by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The method may use standard programming techniques. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.
Further, the operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) collectively executed on one or more processors, by hardware, or combinations thereof. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any type of computing platform operatively connected to a suitable interface, including but not limited to a personal computer, mini computer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and the like. Aspects of the invention may be embodied in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optically read and/or write storage medium, RAM, ROM, or the like, such that it may be read by a programmable computer, which when read by the storage medium or device, is operative to configure and operate the computer to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described herein includes these and other different types of non-transitory computer-readable storage media when such media include instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The invention may also include the computer itself when programmed according to the methods and techniques described herein.
A computer program can be applied to input data to perform the functions described herein to transform the input data to generate output data that is stored to non-volatile memory. The output information may also be applied to one or more output devices, such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including particular visual depictions of physical and tangible objects produced on a display.
The above description is only a preferred embodiment of the present invention, and the present invention is not limited to the above embodiment, and any modifications, equivalent substitutions, improvements, etc. within the spirit and principle of the present invention should be included in the protection scope of the present invention as long as the technical effects of the present invention are achieved by the same means. The invention is capable of other modifications and variations in its technical solution and/or its implementation, within the scope of protection of the invention.

Claims (10)

1. A method for identifying floor buttons on a panel in an elevator, the method comprising the steps of:
s1, providing a convolutional neural network for recognizing information on an elevator panel and training the convolutional neural network with a relevant character data set, so that predetermined characters, which at least include digits, can be recognized by the convolutional neural network;
s2, acquiring a picture containing a plurality of buttons of an elevator panel, detecting a plurality of maximally stable extremal regions in the picture, and screening out one or more candidate character regions in combination with preset geometric constraint features;
s3, computing, by a clustering algorithm and from the horizontal and vertical coordinate relationships of the maximally stable extremal regions, the button regions whose likelihood exceeds a preset probability;
s4, feeding the candidate character regions into the built and trained convolutional neural network to obtain a recognition result for the characters in each candidate character region;
s5, combining the result of step S3 with the result of step S4, and outputting the character detection and recognition results for the button regions whose likelihood exceeds the preset probability.
2. The method according to claim 1, wherein the step S1 includes:
s11, constructing the convolutional neural network, which sequentially applies a vector convolution operation, a first max-pooling layer, a two-dimensional matrix convolution operation, a second max-pooling layer and a fully connected layer to the picture data of the input layer, and then outputs a plurality of values between 0 and 1 representing the probabilities that the picture data belongs to each predetermined character;
and S12, training the network with a calibrated data set of given images and characters, so that the recognition accuracy of the convolutional neural network reaches a preset accuracy threshold.
3. The method according to claim 1, wherein the step S2 includes:
s21, converting the collected picture containing a plurality of buttons of the elevator panel into a gray-scale map, and then updating the pixel value of each point according to the following formula
g(x, y) = 255, if f(x, y) ≥ T; g(x, y) = 0, if f(x, y) < T
Wherein f (x, y) is an original pixel value of a pixel at the image coordinate (x, y), g (x, y) is an updated pixel value of the pixel at the image coordinate (x, y), and T is a preset pixel threshold;
s22, detecting one or more maximally stable extremal regions in the gray-scale image by using a maximally stable extremal region (MSER) detector;
s23, using the aspect ratio of the bounding rectangles of the detected maximally stable extremal regions as a first constraint, and then screening the maximally stable extremal regions that satisfy the first constraint as candidate character regions.
4. The method according to claim 1, wherein the step S3 includes:
s31, using the ratio of the area of each detected maximally stable extremal region to the area of its bounding rectangle as a second constraint, then keeping the candidate character regions that satisfy the second constraint as regions to be clustered, and computing the coordinates of those regions in the image;
s32, clustering the regions to be clustered by their abscissas to obtain a plurality of clusters, each element of each cluster being associated with the coordinates of one maximally stable extremal region, and then selecting the clusters containing more elements for the subsequent steps;
s33, clustering the clusters obtained from the abscissa clustering by their ordinates;
and S34, combining the abscissa centre and the ordinate centre of each cluster and marking the combined position as a button coordinate position, so that the region corresponding to the button coordinate position is marked as a button region whose likelihood exceeds the preset probability.
5. The method according to claim 1, wherein the step S4 includes:
s41, converting the image of each candidate character region to gray scale and compressing it into a black-and-white image of 28 by 28 pixels;
and S42, feeding the black-and-white image into the built convolutional neural network and selecting the character corresponding to the largest of the output probabilities as the recognition result.
6. The method according to claim 4, wherein the step S5 includes:
s51, calculating the distance between each candidate character region and the button regions, and matching the character region with a button region whose distance is smaller than a preset distance threshold;
s52, determining the total number of characters in the character regions matched with each button region,
if the total number of characters is 1, temporarily marking the character corresponding to that button region as representing the floor number,
if the total number of characters includes two digit characters and the characters in the button region are similar in size, shape and position, temporarily marking the combination of those characters corresponding to the button region as representing the floor number;
s53, checking the coordinates of all button regions temporarily marked as representing a floor number, calculating the average size of the character regions corresponding to those button regions,
and if a character region exceeds the average size by a percentage threshold, cancelling the floor number associated with that character region and its button region.
7. The method of claim 6, further comprising the steps of:
s61, extracting an arithmetic progression and its common difference from the button-region clusters obtained by abscissa clustering in step S32 and the digit recognition results of the corresponding character regions;
and S62, combining the known information on the highest and lowest floors with the arithmetic progression, and completing and correcting the positions of all the buttons and the floor information they represent according to the known rule that the buttons are arranged at equal spatial intervals.
8. The method of any one of claims 1 to 7, wherein the predetermined characters include the digits 0 to 9, the letter A, the letter G, and a minus sign.
9. A computer readable storage medium having stored thereon program instructions which, when executed by a processor, implement the method of any one of claims 1 to 8.
10. A computer device disposed in a mobile robot, comprising an image acquisition circuit, a processor, and the computer-readable storage medium of claim 9.
CN202110606042.XA 2021-05-31 2021-05-31 Method and device for identifying floor buttons of inner panel of elevator Active CN113177553B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110606042.XA CN113177553B (en) 2021-05-31 2021-05-31 Method and device for identifying floor buttons of inner panel of elevator

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110606042.XA CN113177553B (en) 2021-05-31 2021-05-31 Method and device for identifying floor buttons of inner panel of elevator

Publications (2)

Publication Number Publication Date
CN113177553A true CN113177553A (en) 2021-07-27
CN113177553B CN113177553B (en) 2022-08-12

Family

ID=76927219

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110606042.XA Active CN113177553B (en) 2021-05-31 2021-05-31 Method and device for identifying floor buttons of inner panel of elevator

Country Status (1)

Country Link
CN (1) CN113177553B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107146253A (en) * 2017-05-04 2017-09-08 济南大学 A kind of elevator button recognition methods based on autonomous slant correction and projection histogram
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN109033772A (en) * 2018-08-09 2018-12-18 北京云测信息技术有限公司 A kind of input method and device of verification information
CN110610177A (en) * 2019-09-16 2019-12-24 卓尔智联(武汉)研究院有限公司 Training method of character recognition model, character recognition method and device
CN110696004A (en) * 2019-09-18 2020-01-17 五邑大学 Intelligent navigation robot and control method thereof
CN110705551A (en) * 2019-10-09 2020-01-17 北京百度网讯科技有限公司 Key position identification method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ELLEN KLINGBEIL ET AL: "Autonomous operation of novel elevators for robot navigation", IEEE International Conference on Robotics and Automation *
梁致凡: "Design and research of a robot that autonomously rides elevators" (自主搭乘电梯机器人设计与研究), China Excellent Doctoral and Master's Dissertations Full-text Database (Master's) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114663682A (en) * 2022-03-18 2022-06-24 北京理工大学 Target significance detection method for improving anti-interference performance
CN114419471A (en) * 2022-03-29 2022-04-29 北京云迹科技股份有限公司 Floor identification method and device, electronic equipment and storage medium
CN114422003A (en) * 2022-03-31 2022-04-29 桔帧科技(江苏)有限公司 Method, device and storage medium for detecting influence on MIMO data transmission ratio
CN114422003B (en) * 2022-03-31 2022-06-14 桔帧科技(江苏)有限公司 Method, device and storage medium for detecting influence on MIMO data transmission ratio

Also Published As

Publication number Publication date
CN113177553B (en) 2022-08-12

Similar Documents

Publication Publication Date Title
CN113177553B (en) Method and device for identifying floor buttons of inner panel of elevator
CN112348815B (en) Image processing method, image processing apparatus, and non-transitory storage medium
CN111435438A (en) Graphical fiducial mark recognition for augmented reality, virtual reality and robotics
CN111238465B (en) Map building equipment and map building method thereof
CN111275082A (en) Indoor object target detection method based on improved end-to-end neural network
CN111709310A (en) Gesture tracking and recognition method based on deep learning
CN111461213B (en) Training method of target detection model and target rapid detection method
CN113538574B (en) Pose positioning method, device and equipment and computer readable storage medium
JP5772572B2 (en) Image processing apparatus, image processing method, and program
US20180300591A1 (en) Depth-value classification using forests
JPH07104943B2 (en) Object recognition device
US20220048733A1 (en) Contactless Elevator Service for an Elevator Based on Augmented Datasets
CN102411705A (en) Method and interface of recognizing user's dynamic organ gesture and elec tric-using apparatus using the interface
CN115995039A (en) Enhanced semantic graph embedding for omni-directional location identification
CN116128883A (en) Photovoltaic panel quantity counting method and device, electronic equipment and storage medium
JP7126251B2 (en) CONSTRUCTION MACHINE CONTROL SYSTEM, CONSTRUCTION MACHINE CONTROL METHOD, AND PROGRAM
US11836960B2 (en) Object detection device, object detection method, and program
CN116883611B (en) Channel silt distribution active detection and identification method combining GIS channel information
Zhuang et al. Using scale coordination and semantic information for robust 3-D object recognition by a service robot
US20230326041A1 (en) Learning device, learning method, tracking device, and storage medium
CN111179222B (en) Intelligent cerebral hemorrhage point detection method and device and computer readable storage medium
CN113220114B (en) Face recognition-fused embeddable non-contact elevator key interaction method
JPH09305743A (en) Human face motion detecting system
CN114882511A (en) Handwritten Chinese character recognition method, system, equipment and storage medium based on AMNN and Chinese character structure dictionary
JP2022531029A (en) Image recognition method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant