CN113243886A - Vision detection system and method based on deep learning and storage medium - Google Patents

Vision detection system and method based on deep learning and storage medium

Info

Publication number
CN113243886A
Authority
CN
China
Prior art keywords
posture
user
layer
evaluation module
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110652556.9A
Other languages
Chinese (zh)
Other versions
CN113243886B (en)
Inventor
桑高丽
卢丽
闫超
韩强
陶陶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Yifei Technology Co ltd
Original Assignee
Sichuan Yifei Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Yifei Technology Co ltd filed Critical Sichuan Yifei Technology Co ltd
Priority to CN202110652556.9A priority Critical patent/CN113243886B/en
Publication of CN113243886A publication Critical patent/CN113243886A/en
Application granted granted Critical
Publication of CN113243886B publication Critical patent/CN113243886B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B3/00 Apparatus for testing the eyes; Instruments for examining the eyes
    • A61B3/02 Subjective types, i.e. testing apparatus requiring the active assistance of the patient
    • A61B3/028 Subjective types, i.e. testing apparatus requiring the active assistance of the patient for testing visual acuity; for determination of refraction, e.g. phoropters
    • A61B3/032 Devices for presenting test symbols or characters, e.g. test chart projectors
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B3/00 Apparatus for testing the eyes; Instruments for examining the eyes
    • A61B3/0016 Operational features thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G06V40/113 Recognition of static hand signs

Landscapes

  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Surgery (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Veterinary Medicine (AREA)
  • Public Health (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Animal Behavior & Ethology (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Ophthalmology & Optometry (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a vision detection system, method and storage medium based on deep learning. The vision detection system comprises an identification display module, an image acquisition module, a posture evaluation module and a result evaluation module. The image acquisition module acquires images of the limb actions made by the user's arm and inputs them into the posture evaluation module; the posture evaluation module detects and acquires the posture key points of the arm's limb actions; and the result evaluation module judges the state of the user's arm from the arm posture key points, determines whether the user's action is consistent with the vision detection identification, and outputs a detection result. The result evaluation module makes its judgment from the direction of the user's arm posture rather than from the direction of a finger; since the arm is a larger target than a finger, it is easier to recognize and the detection accuracy is higher.

Description

Vision detection system and method based on deep learning and storage medium
Technical Field
The invention belongs to the technical field of vision detection, and particularly relates to a vision detection system and method based on deep learning and a storage medium.
Background
With the development of science and technology and the popularization of information technology, people spend more and more time on high-tech devices such as mobile phones, computers and televisions, which increases the risk of vision damage; this is especially true for teenagers, for whom vision damage cannot be ignored. Conventional vision testing is generally performed manually at a professional institution, and a user cannot carry it out independently.
Meanwhile, some intelligent vision detection systems exist, such as the intelligent vision detector based on image analysis disclosed in patent CN106778597A, which uses a detector built on traditional machine-vision algorithms to judge the posture and pointing direction of a finger and constructs a vision detection system from the finger's direction information. However, such methods rely on traditional algorithms and have relatively low accuracy. Moreover, a finger is a small target with complex joints, so its posture and pointing direction are difficult to judge accurately, which limits the performance of the system.
In the field of deep learning, machine vision models built on the self-attention mechanism have in recent years achieved excellent accuracy in many fields, but the computational cost of the ordinary self-attention mechanism grows as the fourth power of the image side length as the input size increases, so on large input images the amount of computation becomes prohibitive.
Disclosure of Invention
The present invention aims to provide a vision detection system, method and storage medium based on deep learning that solve the above problems.
The invention is mainly realized by the following technical scheme:
a vision detection system based on deep learning comprises an identification display module, an image acquisition module, a posture evaluation module and a result evaluation module; the identification display module is used for displaying the vision detection identification, and the user utilizes the arm to make corresponding limb actions; the image acquisition module is used for acquiring images of limb actions made by the arms of the user and inputting the images into the posture evaluation module; the gesture evaluation module is used for detecting and acquiring gesture key points of limb actions made by the arms; and the result evaluation module judges the arm state of the user according to the arm posture key point of the user, further judges whether the action of the user is consistent with the vision detection identification, and outputs a detection result.
The invention displays the identification for vision detection through the display module; the user makes the corresponding limb action with an arm; after the image acquisition module captures an image of the limb action, the posture evaluation module, based on deep learning, detects the user's posture key points; and finally the result evaluation module judges whether the user's action is consistent with the vision detection identification. The result evaluation module makes its judgment from the direction of the user's arm posture rather than from the direction of a finger; since the arm is a larger target than a finger, it is easier to recognize and the detection accuracy is higher.
In order to better implement the present invention, further, the posture evaluation module includes a target detection submodule and a posture detection submodule, and the target detection submodule is used for detecting a coordinate frame of a human body; the input of the posture detection submodule is an image area corresponding to a human body, key points of the human body posture are detected, and coordinate information of the key points of the posture is output.
In order to better implement the present invention, further, the gesture detection sub-module includes a plurality of alternating local attention units and a result output unit, which are sequentially arranged from front to back, wherein the alternating local attention units are used for extracting gesture feature information and generating a feature map; and the result output unit is used for up-sampling the feature map to improve the resolution of the feature map and generating final posture key point coordinate information from the feature map.
The target detection submodule detects the coordinate frame (bounding box) of the human body and can be implemented with detectors such as YOLO. The image area corresponding to the human body is then cropped and used as the input of the posture detection submodule, which detects 17 key points of the human body posture. The result output unit up-samples the feature map using deconvolution. The output feature map has 17 channels, corresponding to the 17 key points of the human body posture, and the specific coordinates of each key point are given by the location of the maximum value on its feature map. The posture evaluation module is constructed with an alternating local attention mechanism, which offers higher accuracy and a smaller amount of computation than traditional machine vision methods or convolutional neural networks.
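The decoding rule above (each keypoint's coordinates are taken at the maximum of its heatmap channel) can be sketched in plain Python; the nested-list representation and function name are illustrative, not from the patent.

```python
def decode_keypoints(heatmaps):
    """Decode keypoint coordinates from pose heatmaps.

    heatmaps: a list of 17 2-D grids (lists of lists), one per body
    keypoint, as produced by the result output unit after deconvolution
    up-sampling. Returns a list of (row, col) positions of the maximum
    response, mirroring the rule that each keypoint's coordinates are
    given by the location of the maximum value on its feature map.
    """
    coords = []
    for hm in heatmaps:
        best_val, best_rc = float("-inf"), (0, 0)
        for r, row in enumerate(hm):
            for c, v in enumerate(row):
                if v > best_val:
                    best_val, best_rc = v, (r, c)
        coords.append(best_rc)
    return coords
```

In practice the heatmaps would be tensors and the argmax vectorized, but the rule is the same.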
In order to better implement the present invention, further, the alternating local attention unit includes a region embedding layer and several alternating local attention layers, which are sequentially arranged from front to back, the region embedding layer is used for down-sampling the input image or feature map to fuse the information of all spatial points in the region into a single feature vector; the alternating local attention layer comprises a first region division layer, a first region self-attention layer, a second region division layer and a second region self-attention layer which are sequentially arranged from front to back; the first area dividing layer and the second area dividing layer are respectively used for dividing the feature map into a plurality of areas, and the first area self-attention layer and the second area self-attention layer are respectively used for performing self-attention operation in each area. For a region embedding layer with a downsampling rate of k, it can be implemented by a convolution operation with a convolution kernel size of k and a convolution step size of k.
In order to better implement the present invention, if the size of the input feature map is HxW, the division size of the first region division layer is M and that of the second region division layer is N, then the two layers divide the feature map into (H/M)x(W/M) MxM regions and (H/N)x(W/N) NxN regions, respectively; the division sizes M and N are relatively prime integers. Because M and N are relatively prime, a given feature point is grouped with different feature points under the first and second partitions, which allows information to circulate between regions and lets the alternating local attention layer acquire global feature information.
In order to better implement the present invention, the division sizes of the first area division layer and the second area division layer are 7 and 5, respectively.
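A small sketch of why coprime division sizes mix information across regions: with the sizes M=7 and N=5 given above, two cells grouped together by one partition can fall into different regions under the other. The helper below is illustrative only.

```python
def region_id(r, c, size):
    """Region index of feature-map cell (r, c) when the map is split
    into non-overlapping size x size windows."""
    return (r // size, c // size)

# With coprime sizes M=7 and N=5, cells (0, 4) and (0, 5) share a 7x7
# region but land in different 5x5 regions, so alternating the two
# partitions lets feature information flow between regions.
M, N = 7, 5
a, b = (0, 4), (0, 5)
same_under_M = region_id(*a, M) == region_id(*b, M)  # True
same_under_N = region_id(*a, N) == region_id(*b, N)  # False
```

Stacking layers that alternate the two partitions therefore propagates information globally without any single global attention step.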
Further, when the down-sampling rate of each region embedding layer is 2, the numbers of alternating local attention layers in the 4 alternating local attention units are 2, 4, 10 and 1, respectively.
In order to better implement the invention, the result evaluation module further determines the state of the arm according to the relative position of the elbow and wrist key points in the human posture key points, if the state of the arm of the user is consistent with the vision detection identifier, the detection result is determined to be correct, otherwise, the detection result is determined to be wrong.
The result evaluation module judges which of 5 states the arm is in (leftward, rightward, upward, downward, or other) from the relative positions of the elbow key point, the wrist key point and the other key points among the human posture key points; this state represents the user's response to the vision detection identification. If the response is consistent with the vision detection identification, the result is judged correct; otherwise, it is judged wrong.
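A minimal sketch of this five-way decision, using only the elbow and wrist key points. The dominant-axis rule and the pixel dead-band `margin` are assumptions for illustration; the patent does not specify the exact thresholds.

```python
def arm_direction(elbow, wrist, margin=10):
    """Classify the arm state from elbow and wrist keypoints (x, y),
    in image coordinates with y growing downward. The wrist's dominant
    displacement relative to the elbow gives left/right/up/down;
    anything within the assumed dead-band `margin` is 'other'.
    """
    dx = wrist[0] - elbow[0]
    dy = wrist[1] - elbow[1]
    if abs(dx) >= abs(dy) and dx > margin:
        return "right"
    if abs(dx) >= abs(dy) and dx < -margin:
        return "left"
    if abs(dy) > abs(dx) and dy < -margin:
        return "up"
    if abs(dy) > abs(dx) and dy > margin:
        return "down"
    return "other"
```

The returned state is then compared with the displayed vision detection identification (e.g. the opening direction of a tumbling-E optotype) to score the response.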
A vision detection method based on deep learning is carried out by adopting the vision detection system, and comprises the following steps:
step S100: the vision detection identification is displayed through the identification display module, so that a user can observe the vision detection identification and make corresponding actions by using arms;
step S200: acquiring an image of limb actions of an arm of a user through an image acquisition module;
step S300: inputting the image collected in the step S200 into a posture evaluation module and detecting a posture key point of a user; firstly, detecting a coordinate frame of a human body in an image through a target detection submodule, then cutting an image area corresponding to the human body and inputting the image area into a posture detection submodule, detecting key points of the posture of the human body, and obtaining coordinate information of the posture key points;
step S400: and the result evaluation module judges the state of the arm according to the relative position of the key point of the human posture, if the state of the arm of the user is consistent with the vision detection identifier, the detection result is judged to be correct, and if not, the detection result is judged to be wrong.
In order to better implement the present invention, further, in step S300, the alternating local attention unit is used to extract the pose feature information and generate a feature map, and then the feature map is up-sampled by the result output unit, and coordinate information of the pose key point is generated from the feature map.
A computer readable storage medium storing computer program instructions which, when executed by a processor, implement the vision detection method described above.
The invention has the beneficial effects that:
(1) the result evaluation module in the invention detects the arm gesture direction of the user, but not the finger direction. Compared with the direction of the fingers, the gesture of the arm is larger in target and easier to identify, and the detection precision is higher;
(2) the invention judges the direction of the arm, and thus the user's response, from the relative positions of the elbow and wrist key points and the other key points. Compared with the common approach of judging the orientation of a finger, the arm is a larger target, so the judgment accuracy is higher;
(3) the attitude evaluation module is constructed by adopting an alternating local attention mechanism, and has the advantages of high precision, small calculated amount and the like compared with the traditional machine vision method, a convolutional neural network and the like;
(4) compared with the common self-attention unit, the alternating local attention unit adopted by the invention reduces the computational complexity from O((HW)^2) to O(M^2 x HW) or O(N^2 x HW). Typically M and N are much smaller than H and W, so the amount of computation is greatly reduced.
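As a rough arithmetic check of the claimed saving (the feature-map size below is assumed for illustration and chosen to be divisible by both division sizes):

```python
def global_attention_cost(H, W):
    """Pairwise self-attention over all H*W tokens: (HW)^2 interactions."""
    return (H * W) ** 2

def local_attention_cost(H, W, M):
    """Self-attention restricted to MxM windows: (H*W)/M^2 windows,
    each costing (M^2)^2 interactions, i.e. M^2 * H * W in total
    (assumes H and W are divisible by M)."""
    num_windows = (H // M) * (W // M)
    return num_windows * (M * M) ** 2

H, W = 70, 70                       # assumed size, divisible by 7 and 5
g = global_attention_cost(H, W)     # 4900^2 = 24,010,000
l7 = local_attention_cost(H, W, 7)  # 49 * 4900 = 240,100
l5 = local_attention_cost(H, W, 5)  # 25 * 4900 = 122,500
```

On this 70x70 map the 7x7 and 5x5 partitions cut the interaction count by roughly 100x and 196x respectively, matching the O((HW)^2) vs O(M^2 x HW) comparison above.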
Drawings
FIG. 1 is a functional block diagram of the present invention;
FIG. 2 is a process flow diagram of a pose estimation module;
FIG. 3 is a functional block diagram of a gesture detection sub-module;
FIG. 4 is a schematic diagram of an alternate local attention unit;
FIG. 5 is a schematic diagram of a structure of alternating local attention layers.
Detailed Description
Example 1:
a vision detection system based on deep learning is shown in figure 1 and comprises an identification display module, an image acquisition module, a posture evaluation module and a result evaluation module; the identification display module is used for displaying the vision detection identification, and the user utilizes the arm to make corresponding limb actions; the image acquisition module is used for acquiring images of limb actions made by the arms of the user and inputting the images into the posture evaluation module; the gesture evaluation module is used for detecting and acquiring gesture key points of limb actions made by the arms; and the result evaluation module judges the arm state of the user according to the arm posture key point of the user, further judges whether the action of the user is consistent with the vision detection identification, and outputs a detection result.
Further, the result evaluation module judges the state of the arm according to the relative position of the elbow key point and the wrist key point in the human posture key point, if the state of the arm of the user is consistent with the vision detection identifier, the detection result is judged to be correct, and if not, the detection result is judged to be wrong.
The invention displays the identification for vision detection through the display module; the user makes the corresponding limb action with an arm; after the image acquisition module captures an image of the limb action, the posture evaluation module, based on deep learning, detects the user's posture key points; and finally the result evaluation module judges whether the user's action is consistent with the vision detection identification. The result evaluation module makes its judgment from the direction of the user's arm posture rather than from the direction of a finger; since the arm is a larger target than a finger, it is easier to recognize and the detection accuracy is higher.
Example 2:
in this embodiment, optimization is performed on the basis of embodiment 1, and as shown in fig. 2, the posture evaluation module includes a target detection submodule and a posture detection submodule, where the target detection submodule is used to detect a coordinate frame of a human body; the input of the posture detection submodule is an image area corresponding to a human body, key points of the human body posture are detected, and coordinate information of the key points of the posture is output.
Further, as shown in fig. 3, the gesture detection sub-module includes a plurality of alternating local attention units and a result output unit, which are sequentially arranged from front to back, and the alternating local attention units are configured to extract gesture feature information and generate a feature map; and the result output unit is used for up-sampling the feature map to improve the resolution of the feature map and generating final posture key point coordinate information from the feature map.
Further, as shown in fig. 4, the alternating local attention unit includes a region embedding layer and several alternating local attention layers, which are sequentially arranged from front to back, and the region embedding layer is configured to down-sample an input image or feature map to fuse information of all spatial points in a region into a single feature vector; as shown in fig. 5, the alternating local attention layer includes a first region division layer, a first region self-attention layer, a second region division layer, and a second region self-attention layer, which are sequentially arranged from front to back; the first area dividing layer and the second area dividing layer are respectively used for dividing the feature map into a plurality of areas, and the first area self-attention layer and the second area self-attention layer are respectively used for performing self-attention operation in each area.
Further, if the size of the input feature map is HxW, the partition size of the first region partition layer is M, the partition size of the second region partition layer is N, and the first region partition layer and the second region partition layer partition the feature map into (H/M) x (W/M) MxM regions and (H/N) x (W/N) NxN regions, respectively; the partition sizes M and N are relatively prime integers. H, W is a conventional expression of the size of the feature map, and therefore, the description thereof is omitted.
Further, the division sizes of the first area division layer and the second area division layer are 7 and 5, respectively.
The target detection submodule detects the coordinate frame (bounding box) of the human body and can be implemented with detectors such as YOLO. The image area corresponding to the human body is then cropped and used as the input of the posture detection submodule, which detects 17 key points of the human body posture. The result output unit up-samples the feature map using deconvolution. The output feature map has 17 channels, corresponding to the 17 key points of the human body posture, and the specific coordinates of each key point are given by the location of the maximum value on its feature map. The posture evaluation module is constructed with an alternating local attention mechanism, which offers higher accuracy and a smaller amount of computation than traditional machine vision methods or convolutional neural networks.
Other parts of this embodiment are the same as embodiment 1, and thus are not described again.
Example 3:
a vision detection system based on deep learning is shown in figure 1 and comprises an identification display module, an image acquisition module, a posture evaluation module and a result evaluation module; the identification display module is used for displaying the vision detection identification, and the user utilizes the arm to make corresponding limb actions; the image acquisition module is used for acquiring images of limb actions made by the arms of the user and inputting the images into the posture evaluation module; the gesture evaluation module is used for detecting and acquiring gesture key points of limb actions made by the arms; and the result evaluation module judges the arm state of the user according to the arm posture key point of the user, further judges whether the action of the user is consistent with the vision detection identification, and outputs a detection result.
Further, as shown in fig. 2, the posture evaluation module includes a target detection submodule and a posture detection submodule, and the target detection submodule is used for detecting a coordinate frame of a human body; the input of the posture detection submodule is an image area corresponding to a human body, key points of the human body posture are detected, and coordinate information of the key points of the posture is output.
Further, as shown in fig. 3, the pose estimation module is constructed by using an alternating local attention mechanism, and includes several alternating local attention units, configured to extract pose feature information and generate a feature map, so as to obtain coordinate information of a final pose key point.
As shown in fig. 4, the alternating local attention unit includes a region embedding layer and several alternating local attention layers, which are sequentially arranged from front to back, and the region embedding layer is used to down-sample an input image or feature map to fuse information of all spatial points in a region into a single feature vector; as shown in fig. 5, the alternating local attention layer includes a first region division layer, a first region self-attention layer, a second region division layer, and a second region self-attention layer, which are sequentially arranged from front to back; the first area dividing layer and the second area dividing layer are respectively used for dividing the feature map into a plurality of areas, and the first area self-attention layer and the second area self-attention layer are respectively used for performing self-attention operation in each area.
Further, if the size of the input feature map is HxW, the partition size of the first region partition layer is M, the partition size of the second region partition layer is N, and the first region partition layer and the second region partition layer partition the feature map into (H/M) x (W/M) MxM regions and (H/N) x (W/N) NxN regions, respectively; the partition sizes M and N are relatively prime integers.
Further, the division sizes of the first area division layer and the second area division layer are 7 and 5, respectively.
The invention displays the identification for vision detection through the display module; the user makes the corresponding limb action with an arm; after the image acquisition module captures an image of the limb action, the posture evaluation module, based on deep learning, detects the user's posture key points; and finally the result evaluation module judges whether the user's action is consistent with the vision detection identification. The result evaluation module makes its judgment from the direction of the user's arm posture rather than from the direction of a finger; since the arm is a larger target than a finger, it is easier to recognize and the detection accuracy is higher.
Example 4:
a vision testing method based on deep learning is carried out by adopting the vision testing system as shown in figure 1, and comprises the following steps:
step S100: the vision detection identification is displayed through the identification display module, so that a user can observe the vision detection identification and make corresponding actions by using arms;
step S200: acquiring an image of limb actions of an arm of a user through an image acquisition module;
step S300: inputting the image collected in the step S200 into a posture evaluation module and detecting a posture key point of a user; as shown in fig. 2, firstly, a coordinate frame of a human body in an image is detected by a target detection submodule, then, an image area corresponding to the human body is cut and input into a posture detection submodule, key points of the posture of the human body are detected, and coordinate information of the posture key points is obtained;
step S400: and the result evaluation module judges the state of the arm according to the relative position of the key point of the human posture, if the state of the arm of the user is consistent with the vision detection identifier, the detection result is judged to be correct, and if not, the detection result is judged to be wrong.
Further, in step S300, the alternating local attention units are used to extract the posture feature information and generate a feature map; the result output unit then up-samples the feature map and generates the coordinate information of the posture key points from it.
The invention displays the identification for vision detection through the display module; the user makes the corresponding limb action with an arm; after the image acquisition module captures an image of the limb action, the posture evaluation module, based on deep learning, detects the user's posture key points; and finally the result evaluation module judges whether the user's action is consistent with the vision detection identification. The result evaluation module makes its judgment from the direction of the user's arm posture rather than from the direction of a finger; since the arm is a larger target than a finger, it is easier to recognize and the detection accuracy is higher.
Example 5:
A vision detection method based on deep learning is realized with the vision detection system. As shown in figure 1, the display module first displays the identification for vision detection, and the user makes the corresponding limb action with an arm; after the image acquisition module captures the limb action, the posture evaluation module, based on deep learning, detects the user's posture key points; finally, the result evaluation module judges whether the user's action is consistent with the vision detection identification.
Further, as shown in fig. 2, the pose estimation module is composed of a target detection submodule and a pose detection submodule constructed by using an alternating local attention method. The target detection submodule is used for detecting a coordinate frame of a human body and can be realized by detectors such as yolo and the like. And then, cutting an image area corresponding to the human body, and using the cut image area as the input of a posture detection submodule to detect 17 key points of the human body posture. In this embodiment, the object detection submodule detects a human image in an image by using a yolov5 object detector. The corresponding region is then cropped, scaled to 224x224 size, as input to the pose detection sub-module.
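Because the crop is resized to 224x224 before pose detection, keypoints predicted on the crop must be mapped back to the original image. A hypothetical helper for this coordinate mapping (it assumes the box is resized directly to the network input, omitting any aspect-ratio padding the real pipeline might use):

```python
def crop_to_image(box, keypoint_in_crop, input_size=224):
    """Map a keypoint predicted on the input_size x input_size crop
    back to original-image coordinates. `box` is the detector's
    (x1, y1, x2, y2) person box; the crop is assumed to be scaled
    directly to input_size x input_size (a simplification).
    """
    x1, y1, x2, y2 = box
    sx = (x2 - x1) / input_size   # horizontal scale crop -> image
    sy = (y2 - y1) / input_size   # vertical scale crop -> image
    kx, ky = keypoint_in_crop
    return (x1 + kx * sx, y1 + ky * sy)
```

With this mapping, elbow and wrist positions from the 224x224 crop can be compared in a common coordinate frame regardless of where the person stands in the camera view.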
Further, as shown in fig. 3, the posture detection submodule is constructed mainly with the alternating local attention method. Its structure consists of several repeated alternating local attention units followed by a result output unit. The alternating local attention units extract posture feature information and generate a feature map. The result output unit up-samples the feature map to raise its resolution and generates the final posture key point coordinate information from it. In this embodiment, 4 alternating local attention units are used.
Further, as shown in fig. 4, each alternating local attention unit is composed of a region embedding layer and several repeated alternating local attention layers. The region embedding layer down-samples the input image or feature map and fuses the information of all spatial points in a region into a single feature vector. A region embedding layer with down-sampling rate k can be implemented by a convolution with kernel size k and stride k.
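The equivalence between the region embedding layer and a stride-k convolution can be illustrated with a NumPy sketch: every k x k spatial region is flattened into one vector and linearly projected, which is what a convolution with kernel size k and stride k computes at each output position. The random projection weights and the channel sizes here are purely for demonstration, not taken from the patent.

```python
import numpy as np

def region_embedding(x, k, out_dim, weight=None):
    """Down-sample an (H, W, C) feature map by rate k: fuse each
    k x k region into a single vector, then project to out_dim.
    H and W are assumed divisible by k."""
    H, W, C = x.shape
    # group pixels into (H/k, W/k) regions of k*k*C values each
    regions = (x.reshape(H // k, k, W // k, k, C)
                 .transpose(0, 2, 1, 3, 4)
                 .reshape(H // k, W // k, k * k * C))
    if weight is None:  # hypothetical random projection for the demo
        rng = np.random.default_rng(0)
        weight = rng.standard_normal((k * k * C, out_dim))
    return regions @ weight

x = np.ones((56, 56, 32))
y = region_embedding(x, k=2, out_dim=64)
print(y.shape)  # (28, 28, 64): spatial size halved, as with stride 2
```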
As shown in fig. 5, the alternating local attention layer is composed of a first region division layer, a first region self-attention layer, a second region division layer and a second region self-attention layer. If the size of the input feature map is HxW, the division size of the first region division layer is M and the division size of the second region division layer is N; the first and second region division layers divide the feature map into (H/M)x(W/M) regions of size MxM and (H/N)x(W/N) regions of size NxN respectively. The division sizes M and N are relatively prime integers, so that feature information can be exchanged between regions and the alternating local attention layer can acquire global feature information.
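A minimal sketch of the two region divisions, assuming H and W are divisible by both division sizes. With the coprime sizes 7 and 5 of the embodiment, the window borders of the two partitions never coincide inside the map, so alternating the two self-attention layers spreads information across region boundaries.

```python
import numpy as np

def partition(x, m):
    """Split an (H, W, C) feature map into non-overlapping m x m
    windows; self-attention then runs independently inside each
    window. H and W are assumed divisible by m."""
    H, W, C = x.shape
    return (x.reshape(H // m, m, W // m, m, C)
              .transpose(0, 2, 1, 3, 4)
              .reshape(-1, m * m, C))

x = np.zeros((35, 35, 8))   # 35 is divisible by both 7 and 5
first = partition(x, 7)     # (25, 49, 8): 5x5 grid of 7x7 windows
second = partition(x, 5)    # (49, 25, 8): 7x7 grid of 5x5 windows
print(first.shape, second.shape)
```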
Further, the division size of the first region is 7 and the division size of the second region is 5. The computational complexity of a region self-attention unit is proportional to

(HxW)x(MxM)

or

(HxW)x(NxN),

while the computational complexity of an ordinary self-attention unit is proportional to

(HxW)x(HxW).

Typically M and N are much smaller than H and W, so the computational complexity is greatly reduced.
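The reduction can be checked numerically for a hypothetical 56x56 feature map by counting attended token pairs: region attention with M=7 costs a factor (HxW)/M^2 = 64 less than global self-attention. The feature-map size is an assumption for illustration only.

```python
# Token-pair counts behind the complexity claim, for a
# hypothetical 56 x 56 feature map with window sizes M=7, N=5.
H, W, M, N = 56, 56, 7, 5
full = (H * W) ** 2          # global self-attention: (HW)^2 pairs
windowed_M = H * W * M * M   # region attention: HW * M^2 pairs
windowed_N = H * W * N * N   # region attention: HW * N^2 pairs
print(full, windowed_M, windowed_N)
print(full // windowed_M)    # reduction factor = (H*W) / M^2 = 64
```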
Further, the down-sampling rate of the region embedding layer is 2. In the 4 alternating local attention units, the number of repetitions of the alternating local attention layer is 2, 4, 10 and 1 respectively.
Further, the result output unit up-samples the feature map with deconvolution. The output feature map has 17 channels, corresponding to the 17 key points of the human body posture; the coordinate of each key point is given by the position of the maximum value on its feature map channel.
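Decoding each key point as the per-channel maximum of the 17-channel output feature map can be sketched as follows (NumPy, illustrative only; the deconvolution up-sampling is not reproduced):

```python
import numpy as np

def decode_keypoints(heatmaps):
    """heatmaps: (17, H, W) output feature map, one channel per
    body key point. Each key point's coordinate is the location
    of the maximum value of its channel."""
    coords = []
    for hm in heatmaps:
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        coords.append((int(x), int(y)))
    return coords

hm = np.zeros((17, 56, 56))
hm[0, 10, 20] = 1.0            # peak of key point 0 at (x=20, y=10)
print(decode_keypoints(hm)[0])  # (20, 10)
```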
The result evaluation module judges whether the arm is in one of 5 states, namely leftward, rightward, upward, downward or other, by judging the relative positions of the elbow key point, the wrist key point and the other key points among the human posture key points; the arm state represents the user's response to the vision detection identifier. If the response is consistent with the vision detection identifier, the result is judged to be correct; if not, it is judged to be wrong.
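A hedged sketch of such a rule: classify the elbow-to-wrist displacement into the five states. The margin threshold and the exact decision rule are hypothetical choices for illustration; the patent specifies only that relative key point positions are used.

```python
def arm_direction(elbow, wrist, margin=10):
    """Classify the arm state from elbow and wrist key points,
    given as (x, y) image coordinates with y growing downward.
    The dominant displacement axis gives left/right/up/down;
    a displacement smaller than `margin` (hypothetical) in both
    axes is classed as 'other'."""
    dx = wrist[0] - elbow[0]
    dy = wrist[1] - elbow[1]
    if abs(dx) < margin and abs(dy) < margin:
        return "other"
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"

print(arm_direction((100, 100), (160, 105)))  # right
print(arm_direction((100, 100), (98, 40)))    # up
print(arm_direction((100, 100), (103, 104)))  # other
```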
The result evaluation module of the invention detects the direction of the user's arm posture rather than the direction of the fingers. Compared with the fingers, the arm posture is a larger target and easier to identify, so the detection precision is higher.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications and equivalent variations of the above embodiments according to the technical spirit of the present invention are included in the scope of the present invention.

Claims (10)

1. A vision detection system based on deep learning, characterized by comprising an identification display module, an image acquisition module, a posture evaluation module and a result evaluation module; the identification display module is used for displaying the vision detection identification, to which the user responds by making a corresponding limb action with an arm; the image acquisition module is used for acquiring images of the limb actions made by the user's arm and inputting the images into the posture evaluation module; the posture evaluation module is used for detecting and acquiring the posture key points of the limb actions made by the arm; and the result evaluation module judges the arm state of the user according to the user's arm posture key points, further judges whether the user's action is consistent with the vision detection identification, and outputs a detection result.
2. The deep learning based vision detection system of claim 1, wherein the gesture evaluation module comprises a target detection submodule and a gesture detection submodule, and the target detection submodule is used for detecting a coordinate frame of a human body; the input of the posture detection submodule is an image area corresponding to a human body, key points of the human body posture are detected, and coordinate information of the key points of the posture is output.
3. The vision detection system based on deep learning of claim 2, wherein the posture detection submodule includes a plurality of alternating local attention units and a result output unit, arranged in sequence from front to back; the alternating local attention units are used for extracting posture feature information and generating a feature map; and the result output unit is used for up-sampling the feature map to improve its resolution and generating the final posture key point coordinate information from the feature map.
4. The vision detection system based on deep learning of claim 3, wherein the alternating local attention unit comprises a region embedding layer and a plurality of alternating local attention layers arranged in sequence from front to back; the region embedding layer is used for down-sampling the input image or feature map to fuse the information of all spatial points in a region into a single feature vector; the alternating local attention layer comprises a first region division layer, a first region self-attention layer, a second region division layer and a second region self-attention layer arranged in sequence from front to back; the first and second region division layers are respectively used for dividing the feature map into a plurality of regions, and the first and second region self-attention layers are respectively used for performing the self-attention operation within each region.
5. The vision detection system based on deep learning of claim 4, wherein if the size of the input feature map is HxW, the division size of the first region division layer is M, the division size of the second region division layer is N, and the first region division layer and the second region division layer divide the feature map into (H/M) x (W/M) MxM regions, (H/N) x (W/N) NxN regions, respectively; the partition sizes M and N are relatively prime integers.
6. The vision testing system based on deep learning of claim 5, wherein the first region division layer and the second region division layer have division sizes of 7 and 5 respectively.
7. The vision detection system based on deep learning of claim 1, wherein the result evaluation module determines the state of the arm according to the relative positions of the elbow and wrist key points among the human posture key points; if the arm state of the user is consistent with the vision detection identification, the detection result is determined to be correct, and otherwise it is determined to be wrong.
8. A vision detection method based on deep learning, performed with the vision detection system of any one of claims 1-7, characterized by comprising the following steps:
step S100: the vision detection identification is displayed through the identification display module, so that a user can observe the vision detection identification and make corresponding actions by using arms;
step S200: acquiring an image of limb actions of an arm of a user through an image acquisition module;
step S300: inputting the image collected in the step S200 into a posture evaluation module and detecting a posture key point of a user; firstly, detecting a coordinate frame of a human body in an image through a target detection submodule, then cutting an image area corresponding to the human body and inputting the image area into a posture detection submodule, detecting key points of the posture of the human body, and obtaining coordinate information of the posture key points;
step S400: and the result evaluation module judges the state of the arm according to the relative position of the key point of the human posture, if the state of the arm of the user is consistent with the vision detection identifier, the detection result is judged to be correct, and if not, the detection result is judged to be wrong.
9. The vision detection method based on deep learning of claim 8, wherein in step S300, the alternating local attention units are used to extract the posture feature information and generate the feature map, after which the result output unit up-samples the feature map and generates the posture key point coordinate information from it.
10. A computer-readable storage medium storing computer program instructions, characterized in that the program instructions, when executed by a processor, implement the method of claim 8 or 9.
CN202110652556.9A 2021-06-11 2021-06-11 Vision detection system and method based on deep learning and storage medium Active CN113243886B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110652556.9A CN113243886B (en) 2021-06-11 2021-06-11 Vision detection system and method based on deep learning and storage medium


Publications (2)

Publication Number Publication Date
CN113243886A true CN113243886A (en) 2021-08-13
CN113243886B CN113243886B (en) 2021-11-09

Family

ID=77187718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110652556.9A Active CN113243886B (en) 2021-06-11 2021-06-11 Vision detection system and method based on deep learning and storage medium

Country Status (1)

Country Link
CN (1) CN113243886B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114013431A (en) * 2022-01-06 2022-02-08 宁波均联智行科技股份有限公司 Automatic parking control method and system based on user intention
CN114305317A (en) * 2021-12-23 2022-04-12 广州视域光学科技股份有限公司 Method and system for intelligently distinguishing user feedback optotypes

Citations (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5619291A (en) * 1995-09-01 1997-04-08 Putnam; Mark D. Patient-user interactive psychotherapy apparatus and method
CN103598870A (en) * 2013-11-08 2014-02-26 北京工业大学 Optometry method based on depth-image gesture recognition
CN203524640U (en) * 2013-11-09 2014-04-09 宋秋杰 Eyesight automatic detection instrument for ophthalmology
CN106778597A (en) * 2016-12-12 2017-05-31 朱明 Intellectual vision measurer based on graphical analysis
US20180103838A1 (en) * 2015-01-20 2018-04-19 Green C.Tech Ltd Method and system for automatic eyesight diagnosis
US10089556B1 (en) * 2017-06-12 2018-10-02 Konica Minolta Laboratory U.S.A., Inc. Self-attention deep neural network for action recognition in surveillance videos
CN109145867A (en) * 2018-09-07 2019-01-04 北京旷视科技有限公司 Estimation method of human posture, device, system, electronic equipment, storage medium
CN109785396A (en) * 2019-01-23 2019-05-21 中国科学院自动化研究所 Writing posture monitoring method based on binocular camera, system, device
CN110135243A (en) * 2019-04-02 2019-08-16 上海交通大学 A kind of pedestrian detection method and system based on two-stage attention mechanism
CN110353622A (en) * 2018-10-16 2019-10-22 武汉交通职业学院 A kind of vision testing method and eyesight testing apparatus
CN110825845A (en) * 2019-10-23 2020-02-21 中南大学 Hierarchical text classification method based on character and self-attention mechanism and Chinese text classification method
CN110858295A (en) * 2018-08-24 2020-03-03 广州汽车集团股份有限公司 Traffic police gesture recognition method and device, vehicle control unit and storage medium
CN210372949U (en) * 2019-08-19 2020-04-21 中国人民解放军总医院 Flashlight capable of measuring pupil diameter
CN111091604A (en) * 2019-11-18 2020-05-01 中国科学院深圳先进技术研究院 Training method and device of rapid imaging model and server
CN111178251A (en) * 2019-12-27 2020-05-19 汇纳科技股份有限公司 Pedestrian attribute identification method and system, storage medium and terminal
US20200234078A1 (en) * 2018-06-15 2020-07-23 Shenzhen Sensetime Technology Co., Ltd. Target matching method and apparatus, electronic device, and storage medium
CN112017198A (en) * 2020-10-16 2020-12-01 湖南师范大学 Right ventricle segmentation method and device based on self-attention mechanism multi-scale features
CN112149466A (en) * 2019-06-28 2020-12-29 富士通株式会社 Arm action recognition method and device and image processing equipment
US20210004087A1 (en) * 2018-02-19 2021-01-07 Valkyrie Industries Limited Haptic Feedback for Virtual Reality
CN112270283A (en) * 2020-11-04 2021-01-26 北京百度网讯科技有限公司 Abnormal driving behavior determination method, device, equipment, vehicle and medium
CN112418227A (en) * 2020-10-28 2021-02-26 北京工业大学 Monitoring video truck segmentation method based on double-self-attention mechanism
CN112686234A (en) * 2021-03-22 2021-04-20 杭州魔点科技有限公司 Face image quality evaluation method, electronic device and storage medium
CN112801069A (en) * 2021-04-14 2021-05-14 四川翼飞视科技有限公司 Face key feature point detection device, method and storage medium
CN112883149A (en) * 2021-01-20 2021-06-01 华为技术有限公司 Natural language processing method and device


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SALZER, YAEL et al., "Evaluation of the attention network test using vibrotactile stimulations", Behavior Research Methods *
ZHU Zhangli et al., "Research progress of attention mechanism in deep learning", Journal of Chinese Information Processing *
GAO Yang, "Intelligent Summarization and Deep Learning (Advanced Artificial Intelligence and Robotics Series)", 30 April 2019, Beijing Institute of Technology Press *


Also Published As

Publication number Publication date
CN113243886B (en) 2021-11-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant