CN114005181A - Interactive relationship identification method and device and electronic equipment - Google Patents

Info

Publication number
CN114005181A
CN114005181A
Authority
CN
China
Prior art keywords
characteristic information
determining
human body
relationship
interaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111277727.0A
Other languages
Chinese (zh)
Inventor
郝燕茹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111277727.0A priority Critical patent/CN114005181A/en
Publication of CN114005181A publication Critical patent/CN114005181A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides an interactive relationship identification method, apparatus, and electronic device, relating to the technical field of artificial intelligence and in particular to computer vision and deep learning. The method comprises the following steps: determining first characteristic information of a human body and second characteristic information of an object in an image to be detected through a first neural network; determining the spatial relationship between the human body and the object according to the first characteristic information and the second characteristic information; determining background characteristic information, third characteristic information of the human body, and fourth characteristic information of the object in the image to be detected through a second neural network; and determining the interactive relationship between the human body and the object according to the spatial relationship, the background characteristic information, the third characteristic information, and the fourth characteristic information. The method improves the accuracy of identifying the interactive relationship.

Description

Interactive relationship identification method and device and electronic equipment
Technical Field
The present disclosure relates to computer vision and deep learning technologies in the technical field of artificial intelligence, and in particular, to a method and an apparatus for identifying an interaction relationship, and an electronic device.
Background
Human-Object Interaction (HOI) recognition refers to recognizing the interactive relationship between a person and an object in an image, that is, recognizing what the person is doing with a certain object in the image. With the development of artificial intelligence technology, the demand for detecting interactive relationships between people and objects in various scenes is gradually increasing: for example, recognizing a person playing with or talking on a phone in surveillance scenes; recognizing a person making a phone call or smoking in scenes such as smart construction sites, gas stations, and safety production; recognizing a person with a suitcase in security scenes; and recognizing a person swinging a racket in sports scenes.
In the related art, a context-learning-based method can be adopted to identify the interactive relationship. However, this method depends on context information related to the object in the image, and if the context information is insufficient, the identification accuracy is low.
Disclosure of Invention
The disclosure provides an interactive relationship identification method, apparatus, and electronic device capable of improving identification accuracy.
According to an aspect of the present disclosure, there is provided a method for identifying an interaction relationship, including:
determining first characteristic information of a human body and second characteristic information of an object in an image to be detected through a first neural network;
determining the spatial relationship between the human body and the object according to the first characteristic information and the second characteristic information;
determining background characteristic information in the image to be detected, third characteristic information of the human body and fourth characteristic information of the object through a second neural network;
and determining the interactive relationship between the human body and the object according to the spatial relationship, the background characteristic information, the third characteristic information and the fourth characteristic information.
According to another aspect of the present disclosure, there is provided an apparatus for identifying an interaction relationship, including:
the first determining module is used for determining first characteristic information of a human body and second characteristic information of an object in the image to be detected through a first neural network;
the second determining module is used for determining the spatial relationship between the human body and the object according to the first characteristic information and the second characteristic information;
the third determining module is used for determining the background characteristic information in the image to be detected, the third characteristic information of the human body and the fourth characteristic information of the object through a second neural network;
and the fourth determining module is used for determining the interactive relationship between the human body and the object according to the spatial relationship, the background characteristic information, the third characteristic information and the fourth characteristic information.
According to still another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
According to yet another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the first aspect described above.
According to yet another aspect of the present disclosure, there is provided a computer program product, comprising a computer program stored in a readable storage medium, from which at least one processor of an electronic device can read the computer program; execution of the computer program by the at least one processor causes the electronic device to perform the method of the first aspect.
According to the technical solution of the present disclosure, the accuracy of identifying the interaction relationship between a human and an object is improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a schematic view of an application scenario of a method for identifying an interaction relationship provided in an embodiment of the present disclosure;
fig. 2 is a schematic flow chart of a method for identifying an interaction relationship provided according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of an apparatus for identifying an interaction relationship provided according to an embodiment of the present disclosure;
fig. 4 is a schematic block diagram of an electronic device for implementing the method for identifying an interaction relationship according to the embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic view of an application scenario of a method for identifying an interaction relationship according to an embodiment of the present disclosure. Identifying the interaction relationships requires performing image recognition on fig. 1, marking the people and objects therein, and determining the interaction relationship between each person and object, where an interaction relationship may be an action describing what a person does with an object; for example, with respect to fig. 1, the identification result should be that one person is smoking and another person is making a phone call. There may be multiple interaction relationships between each person and object, and each scene may contain multiple people and objects.
In order to improve the identification accuracy of the interactive relationship, the scheme of the embodiments of the present disclosure proposes to use one neural network to extract the characteristic information of the human body and of the object in the image, and to determine the spatial relationship between the human body and the object based on that extracted characteristic information; to use another neural network to extract visual features such as the characteristic information of the background, of the human body, and of the object in the image; and to determine the interactive relationship between the human body and the object by combining the spatial relationship and the visual features.
The present disclosure provides a method and an apparatus for identifying an interaction relationship, an electronic device, a storage medium, and a program product, which are applied to computer vision and deep learning technologies in the technical field of artificial intelligence, and in particular, can be applied in smart cities and smart traffic scenes to achieve the purpose of improving the identification accuracy of the interaction relationship.
Hereinafter, the method for identifying an interaction relationship provided by the present disclosure will be described in detail through specific embodiments. It is to be understood that the following embodiments may be combined with each other, and that the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 2 is a schematic flowchart of a method for identifying an interaction relationship according to an embodiment of the present disclosure. The execution subject of the method is an interactive relationship recognition apparatus, which can be implemented in software and/or hardware. As shown in fig. 2, the method includes:
s201, determining first characteristic information of a human body and second characteristic information of an object in an image to be detected through a first neural network.
The embodiment of the present disclosure does not limit the specific type of the first neural network. The input of the first neural network may be the image to be detected, and the output the first characteristic information of the human body and the second characteristic information of the object in the image to be detected. Alternatively, the input of the first neural network may be a human body detection frame and an object detection frame in the image to be detected, detected and identified in advance in other manners, and the output the first characteristic information of the human body and the second characteristic information of the object.
S202, determining the spatial relationship between the human body and the object according to the first characteristic information and the second characteristic information.
The spatial relationship between the human body and the object can be determined through the first characteristic information of the human body and the second characteristic information of the object, and the spatial relationship can represent the position relationship, the distance and the like between the human body and the object.
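As an illustration only, one plausible encoding of such a spatial relationship is sketched below: the relative offset and distance between the box centres plus an intersection-over-union overlap term. The disclosure does not fix a concrete encoding, so this particular form is an assumption.

```python
import math

def spatial_relation(human_box, object_box):
    """Encode the spatial relationship between a human detection frame and
    an object detection frame as (dx, dy, distance, iou).  Boxes are
    (x1, y1, x2, y2).  This encoding is illustrative, not the patent's."""
    hx = (human_box[0] + human_box[2]) / 2.0
    hy = (human_box[1] + human_box[3]) / 2.0
    ox = (object_box[0] + object_box[2]) / 2.0
    oy = (object_box[1] + object_box[3]) / 2.0
    dx, dy = ox - hx, oy - hy              # relative offset of box centres
    dist = math.hypot(dx, dy)              # centre-to-centre distance
    # intersection-over-union as an overlap measure
    ix1 = max(human_box[0], object_box[0])
    iy1 = max(human_box[1], object_box[1])
    ix2 = min(human_box[2], object_box[2])
    iy2 = min(human_box[3], object_box[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_h = (human_box[2] - human_box[0]) * (human_box[3] - human_box[1])
    area_o = (object_box[2] - object_box[0]) * (object_box[3] - object_box[1])
    iou = inter / (area_h + area_o - inter) if inter > 0 else 0.0
    return dx, dy, dist, iou
```

In practice the spatial relationship would be regressed by the first neural network from the characteristic information rather than computed in closed form; the sketch only makes the kinds of quantities involved concrete.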
S203, determining background characteristic information, third characteristic information of the human body and fourth characteristic information of the object in the image to be detected through a second neural network.
The second neural network in the embodiments of the present disclosure may be different from the first neural network, and its specific type is not limited. The second neural network can determine the third characteristic information of the human body and the fourth characteristic information of the object in the image to be detected, and can also determine the background characteristic information in the image to be detected.
The input of the second neural network may be the image to be detected, and the output the background characteristic information, the third characteristic information of the human body, and the fourth characteristic information of the object. Alternatively, the input of the second neural network may be the human body detection frame, the object detection frame, and a feature vector of the image to be detected, each detected and identified in advance in other manners; the output is then the background characteristic information, the third characteristic information of the human body, and the fourth characteristic information of the object, where the background characteristic information may be the characteristic information of the background within the human body detection frame and the object detection frame other than the human body and the object themselves. Alternatively, the input of the second neural network may be only the human body detection frame and the object detection frame in the image to be detected, with the same outputs as above.
And S204, determining the interactive relationship between the human body and the object according to the spatial relationship, the background characteristic information, the third characteristic information and the fourth characteristic information.
The spatial relationship between the human body and the object can be used to refine or strengthen the various kinds of visual characteristic information of the image to be detected (the background characteristic information, the characteristic information of the human body, and the characteristic information of the object), so that the interactive relationship between the human body and the object is determined by integrating the spatial relationship with the visual characteristic information.
In the above method, the first neural network is used to extract the characteristic information of the human body and of the object in the image, and the spatial relationship between the human body and the object is determined based on that characteristic information; the second neural network is used to extract visual features such as the characteristic information of the background, of the human body, and of the object; and the interactive relationship between the human body and the object is determined by combining the spatial relationship with the visual characteristic information, which improves the identification accuracy of the interactive relationship.
On the basis of the above embodiments, how to extract the above various kinds of feature information and how to determine the interaction relationship between the human body and the object are further explained.
Optionally, the image to be detected is input into a fourth neural network to obtain the human body detection frame and the object detection frame in the image to be detected. Optionally, the fourth neural network is Faster R-CNN, which is not limited in the embodiment of the present disclosure.
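A minimal sketch of this detection step follows. The detector's raw output format is an assumption here (a list of `(label, score, box)` tuples, standing in for the output of a detector such as Faster R-CNN); the sketch only shows how detections might be split into human detection frames and object detection frames.

```python
def split_detections(detections, person_label="person", score_thresh=0.5):
    """Split detector output into human and object detection frames.
    `detections` is assumed to be a list of (label, score, box) tuples;
    low-confidence detections are discarded."""
    humans, objects = [], []
    for label, score, box in detections:
        if score < score_thresh:
            continue
        (humans if label == person_label else objects).append(box)
    return humans, objects
```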
Correspondingly, in S201, determining the first characteristic information of the human body and the second characteristic information of the object in the image to be detected through the first neural network includes:
The human body detection frame and the object detection frame are input into the first neural network to obtain the first characteristic information of the human body and the second characteristic information of the object. Optionally, the first neural network is a CNN (convolutional neural network). The spatial relationship between the human body and the object can then be determined according to the first characteristic information and the second characteristic information.
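The disclosure does not describe the CNN's internals, so the following sketch substitutes a fixed-grid average pooling over each detection frame for the real convolutional feature extractor; it only illustrates turning a detection frame into a fixed-length characteristic-information vector.

```python
import numpy as np

def crop_and_pool(image, box, size=4):
    """Crop a detection frame from a 2-D image array and average-pool it
    to a size x size grid, as a placeholder for the CNN feature
    extractor.  The box (x1, y1, x2, y2) is assumed to span at least
    `size` pixels in each direction."""
    x1, y1, x2, y2 = box
    region = image[y1:y2, x1:x2]
    h, w = region.shape
    out = np.zeros((size, size))
    for i in range(size):
        for j in range(size):
            ys = slice(i * h // size, (i + 1) * h // size)
            xs = slice(j * w // size, (j + 1) * w // size)
            out[i, j] = region[ys, xs].mean()   # average pool each cell
    return out.ravel()  # fixed-length characteristic-information vector
```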
In S203, determining, by a second neural network, background feature information in the image to be detected, third feature information of the human body, and fourth feature information of the object, including:
The human body detection frame and the object detection frame are input into the second neural network to obtain the background characteristic information, the third characteristic information of the human body, and the fourth characteristic information of the object. Optionally, the second neural network is ResNet-50.
On the basis of the above, the following describes determining the interaction relationship between the human body and the object according to the spatial relationship, the background feature information, the third feature information, and the fourth feature information in S204.
First, a first interactive relationship between the human body and the object is determined according to the spatial relationship between them.
After the spatial relationship between the human body and the object in the image to be detected is determined, the attention characteristic information of the object is determined according to the spatial relationship; this attention characteristic information may also be called spatial attention characteristic information, and is obtained by refining or strengthening the second characteristic information of the object (for example, by pixel filling) based on the spatial relationship between the human body and the object. Then, the first interactive relationship between the human body and the object is determined according to the attention characteristic information of the object and the first characteristic information of the human body. Optionally, the attention characteristic information of the object and the first characteristic information of the human body may be input into an interaction relationship classification model to obtain the first interactive relationship and its probability as output. The interaction relationship classification model can be trained in advance on image samples.
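A toy version of this branch is sketched below. The form of the attention weight (a sigmoid of the mean of the spatial-relation vector) and the linear classifier `W` are illustrative assumptions, not the disclosed model; the point is only the data flow: spatial relation to attention, attention applied to the object features, then classification.

```python
import numpy as np

def first_interaction(human_feat, object_feat, spatial_rel, W):
    """Refine the object features into attention characteristic
    information using the spatial relation, then score interaction
    classes.  W is an (n_classes x feature_dim) classifier matrix;
    its form is an assumption for illustration."""
    # scalar spatial-attention weight derived from the spatial relation
    attn = 1.0 / (1.0 + np.exp(-spatial_rel.mean()))
    attended_obj = attn * object_feat          # attention characteristic information
    x = np.concatenate([human_feat, attended_obj])
    logits = W @ x                             # linear interaction classifier
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax over classes
    return probs
```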
And secondly, determining a second interaction relation and interaction strength between the human body and the object according to the spatial relation, the background characteristic information, the third characteristic information and the fourth characteristic information.
Optionally, the attention characteristic information of the object is determined according to the spatial relationship, and the second interaction relationship and the interaction strength are determined according to the attention characteristic information of the object, the background characteristic information, the third characteristic information, and the fourth characteristic information. It should be noted that if the attention characteristic information of the object was already determined when determining the first interactive relationship, it need not be re-determined in this step; the second interaction relationship and the interaction strength may be determined directly from the attention characteristic information of the object, the background characteristic information, the third characteristic information of the human body, and the fourth characteristic information of the object.
Optionally, the background feature information, the third feature information of the human body, the fourth feature information of the object and the attention feature information of the object are multiplied respectively, and the multiplication result is input into a pre-trained interaction relationship classification model to obtain a second interaction relationship; and meanwhile, obtaining the interaction strength according to the multiplication result.
The three kinds of characteristic information, namely the background characteristic information, the third characteristic information of the human body, and the fourth characteristic information of the object, are each multiplied by the attention characteristic information of the object; refining them with the spatial relationship in this way yields higher identification accuracy.
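The element-wise multiplication described above can be sketched as follows; the interaction-strength head shown here (the mean absolute activation of the fused vector) is a placeholder assumption, since the disclosure does not specify how the strength is computed.

```python
import numpy as np

def refine_visual_features(attn_feat, background_feat, human_feat, object_feat):
    """Element-wise multiply each visual feature vector by the object's
    attention characteristic information, so the spatial relation
    sharpens all three, then fuse them and derive a scalar strength."""
    refined = [f * attn_feat for f in (background_feat, human_feat, object_feat)]
    fused = np.concatenate(refined)            # input to the classification model
    strength = float(np.abs(fused).mean())     # stand-in interaction-strength head
    return fused, strength
```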
And finally, determining the interactive relationship between the human body and the object according to the first interactive relationship, the second interactive relationship and the interactive strength.
Two methods are thus used to determine interactive relationships between the human body and the object, yielding the first interactive relationship and the second interactive relationship. The two are then aggregated: the characteristic information corresponding to the first interactive relationship, the characteristic information corresponding to the second interactive relationship, and the interaction strength are multiplied together to obtain the interactive relationship between the human body and the object.
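Under the assumption that each interactive relationship is represented by a per-class score vector, the aggregation step reads as an element-wise product with the interaction strength:

```python
import numpy as np

def aggregate(rel1_scores, rel2_scores, strength):
    """Aggregate the two branch predictions: element-wise product of the
    two class-score vectors with the scalar interaction strength, then
    pick the highest-scoring interaction class."""
    final = rel1_scores * rel2_scores * strength
    return final, int(np.argmax(final))
```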
Optionally, the method of the embodiment of the present disclosure further includes:
inputting the third characteristic information of the human body and the fourth characteristic information of the object into a third neural network to obtain a third interactive relationship between the human body and the object; and updating the interactive relationship between the human body and the object according to the third interactive relationship. Optionally, the third neural network is a graph convolution network.
Based on the foregoing embodiment, in addition to determining the first interaction relationship, the second interaction relationship, and the interaction strength, the third characteristic information and the fourth characteristic information may be input into the third neural network to obtain the third interaction relationship between the human body and the object, and the three interaction relationships are aggregated to obtain the interaction relationship between the human body and the object. Illustratively, the characteristic information corresponding to the first interaction relationship, the characteristic information corresponding to the second interaction relationship, the characteristic information corresponding to the third interaction relationship, and the interaction strength are multiplied together to obtain the interaction relationship between the human body and the object.
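The disclosure only states that the third neural network is a graph convolution network. A single, generic GCN layer over a graph whose nodes are the human and object feature vectors is sketched below as an illustration; the self-loop normalisation and ReLU activation are standard choices, not taken from the disclosure.

```python
import numpy as np

def graph_conv_relation(node_feats, adjacency, W):
    """One graph-convolution layer H' = ReLU(D^-1 (A + I) H W) over
    human/object nodes.  node_feats: (n_nodes x d), adjacency:
    (n_nodes x n_nodes), W: (d x d_out) learned weights (assumed)."""
    A = adjacency + np.eye(len(adjacency))    # add self-loops
    D_inv = np.diag(1.0 / A.sum(axis=1))      # row-normalise by degree
    H = D_inv @ A @ node_feats @ W            # propagate and transform
    return np.maximum(H, 0.0)                 # ReLU activation
```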
Fig. 3 is a schematic structural diagram of an apparatus for identifying an interaction relationship according to an embodiment of the present disclosure. As shown in fig. 3, the interactive relationship recognition apparatus 300 includes:
the first determining module 301 is configured to determine, through a first neural network, first characteristic information of a human body and second characteristic information of an object in an image to be detected;
a second determining module 302, configured to determine a spatial relationship between the human body and the object according to the first characteristic information and the second characteristic information;
a third determining module 303, configured to determine, through a second neural network, background feature information in the image to be detected, third feature information of the human body, and fourth feature information of the object;
and a fourth determining module 304, configured to determine an interaction relationship between the human body and the object according to the spatial relationship, the background feature information, the third feature information, and the fourth feature information.
In one embodiment, the fourth determining module 304 includes:
the first determining unit is used for determining a first interaction relation between the human body and the object according to the spatial relation;
the second determining unit is used for determining a second interaction relation and interaction strength between the human body and the object according to the spatial relation, the background characteristic information, the third characteristic information and the fourth characteristic information;
and the third determining unit is used for determining the interactive relationship between the human body and the object according to the first interactive relationship, the second interactive relationship and the interactive strength.
In one embodiment, the first determination unit includes:
the first determining subunit is used for determining attention feature information of the object according to the spatial relationship;
and the second determining subunit is used for determining the first interaction relation according to the attention characteristic information and the first characteristic information of the object.
In one embodiment, the second determination unit comprises:
the third determining subunit is used for determining attention characteristic information of the object according to the spatial relationship;
and the fourth determining subunit is used for determining a second interaction relationship and interaction strength according to the attention feature information, the background feature information, the third feature information and the fourth feature information of the object.
In one embodiment, the fourth determining subunit is configured to:
multiplying the background characteristic information, the third characteristic information and the fourth characteristic information with the attention characteristic information of the object respectively, and inputting the multiplication result into a pre-trained interactive relation classification model to obtain a second interactive relation;
and obtaining the interaction strength according to the multiplication result.
In one embodiment, the apparatus 300 for identifying an interaction relationship further includes:
the fifth determining module is used for inputting the third characteristic information and the fourth characteristic information into a third neural network to obtain a third interactive relation between the human body and the object;
and the updating module is used for updating the interaction relation between the human body and the object according to the third interaction relation.
In one embodiment, the apparatus 300 for identifying an interaction relationship further includes:
the sixth determining module is used for inputting the image to be detected into the fourth neural network to obtain a human body detection frame and an object detection frame in the image to be detected;
the first determination module 301 includes:
the fourth determining unit is used for inputting the human body detection frame and the object detection frame into the first neural network to obtain first characteristic information and second characteristic information;
the second determination module 302 includes:
and the fifth determining unit is used for inputting the human body detection frame and the object detection frame into the second neural network to obtain the background characteristic information, the third characteristic information and the fourth characteristic information.
In one embodiment, the third determination unit comprises:
and the fifth determining subunit is used for multiplying the characteristic information corresponding to the first interactive relationship, the characteristic information corresponding to the second interactive relationship and the interactive intensity to obtain the interactive relationship between the human body and the object.
The apparatus of the embodiment of the present disclosure may be configured to execute the method for identifying an interaction relationship in the above method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
The present disclosure also provides an electronic device and a non-transitory computer-readable storage medium storing computer instructions, according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, the present disclosure also provides a computer program product comprising a computer program stored in a readable storage medium. At least one processor of the electronic device can read the computer program from the readable storage medium, and the at least one processor executes the computer program so that the electronic device performs the solution provided by any of the embodiments described above.
Fig. 4 is a schematic block diagram of an electronic device for implementing the method for identifying an interaction relationship according to an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in Fig. 4, the electronic device 400 includes a computing unit 401 that can perform various appropriate actions and processes according to a computer program stored in a Read-Only Memory (ROM) 402 or a computer program loaded from a storage unit 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data required for the operation of the device 400 can also be stored. The computing unit 401, ROM 402, and RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
A number of components in device 400 are connected to I/O interface 405, including: an input unit 406 such as a keyboard, a mouse, or the like; an output unit 407 such as various types of displays, speakers, and the like; a storage unit 408 such as a magnetic disk, optical disk, or the like; and a communication unit 409 such as a network card, modem, wireless communication transceiver, etc. The communication unit 409 allows the device 400 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 401 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 401 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 401 executes the respective methods and processes described above, such as the method for identifying an interaction relationship. For example, in some embodiments, the method for identifying an interaction relationship may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 400 via the ROM 402 and/or the communication unit 409. When the computer program is loaded into the RAM 403 and executed by the computing unit 401, one or more steps of the method for identifying an interaction relationship described above may be performed. Alternatively, in other embodiments, the computing unit 401 may be configured to perform the method for identifying an interaction relationship by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host; it is a host product in a cloud computing service system and addresses the drawbacks of difficult management and weak service scalability in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders; this is not limited here, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (19)

1. A method for identifying an interaction relationship comprises the following steps:
determining first characteristic information of a human body and second characteristic information of an object in an image to be detected through a first neural network;
determining the spatial relationship between the human body and the object according to the first characteristic information and the second characteristic information;
determining background characteristic information in the image to be detected, third characteristic information of the human body and fourth characteristic information of the object through a second neural network;
and determining the interactive relationship between the human body and the object according to the spatial relationship, the background characteristic information, the third characteristic information and the fourth characteristic information.
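By way of illustration only, the four steps of claim 1 can be sketched as a minimal pipeline. The claim does not specify the networks' internals, so every detail below — the random-projection stand-ins for the first and second neural networks, the feature dimension of 64, the five interaction classes, and the concatenation used as the "spatial relationship" — is a hypothetical assumption, not the patented implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def first_network(human_crop, object_crop):
    # Stand-in for the first neural network: project each crop to a
    # fixed-length feature (the first and second characteristic information).
    w = rng.standard_normal((human_crop.size, 64))
    return human_crop.ravel() @ w, object_crop.ravel() @ w

def spatial_relationship(feat_h, feat_o):
    # Toy "spatial relationship": concatenation of the two feature vectors.
    return np.concatenate([feat_h, feat_o])

def second_network(image):
    # Stand-in for the second neural network: background, third and
    # fourth characteristic information derived from the full image.
    w = rng.standard_normal((image.size, 64))
    flat = image.ravel() @ w
    return flat, flat * 0.5, flat * 0.25  # background, human, object

def interaction(spatial, background, feat3, feat4):
    # Fuse everything into scores over five hypothetical interaction classes.
    fused = np.concatenate([spatial, background, feat3, feat4])
    w = rng.standard_normal((fused.size, 5))
    logits = fused @ w
    e = np.exp(logits - logits.max())
    return e / e.sum()  # softmax: non-negative scores summing to 1

image = rng.random((16, 16))
scores = interaction(spatial_relationship(*first_network(image[:8], image[8:])),
                     *second_network(image))
print(scores.shape, round(float(scores.sum()), 6))
```

The softmax output plays the role of the final "interaction relationship"; in the actual disclosure, both branches are learned networks rather than fixed projections.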
2. The method of claim 1, wherein the determining an interactive relationship between the human body and the object according to the spatial relationship, the background feature information, the third feature information, and the fourth feature information comprises:
determining a first interaction relation between the human body and the object according to the spatial relation;
determining a second interaction relationship and interaction strength between the human body and the object according to the spatial relationship, the background feature information, the third feature information and the fourth feature information;
and determining the interactive relationship between the human body and the object according to the first interactive relationship, the second interactive relationship and the interactive strength.
3. The method of claim 2, wherein said determining a first interaction relationship between said human body and said object from said spatial relationship comprises:
according to the spatial relationship, determining attention feature information of the object;
and determining the first interaction relation according to the attention characteristic information and the first characteristic information of the object.
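One plausible reading of claim 3 — offered as a sketch, since the claim does not define how attention is computed — is that the spatial relationship is squashed into per-channel attention weights, which then gate the human-appearance feature. The sigmoid, the dimension of 8, and the elementwise gating are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8

spatial = rng.standard_normal(d)  # spatial relationship between human and object
feat1 = rng.standard_normal(d)    # first characteristic information (human)

# Hypothetical attention: squash the spatial relation into (0, 1) weights.
attention = 1.0 / (1.0 + np.exp(-spatial))   # attention feature information

# Gate the appearance feature with the attention weights; in the claim this
# gated feature determines the first interaction relationship.
first_relation_feature = attention * feat1

print(attention.shape, first_relation_feature.shape)
```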
4. The method according to claim 2, wherein the determining a second interaction relationship and interaction strength between the human body and the object according to the spatial relationship, the background feature information, the third feature information and the fourth feature information comprises:
according to the spatial relationship, determining attention feature information of the object;
and determining the second interaction relationship and the interaction strength according to the attention feature information, the background feature information, the third feature information and the fourth feature information of the object.
5. The method of claim 4, wherein the determining the second interaction relationship and the interaction strength according to the attention feature information, the background feature information, the third feature information, and the fourth feature information of the object comprises:
multiplying the background characteristic information, the third characteristic information and the fourth characteristic information respectively by the attention characteristic information of the object, and inputting the multiplication results into a pre-trained interaction relationship classification model to obtain the second interaction relationship;
and obtaining the interaction strength according to the multiplication results.
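The two steps of claim 5 can be sketched as follows. The "pre-trained interaction relationship classification model" is stood in for by a fixed linear layer plus softmax, and the strength is read off as a sigmoid of the pooled multiplication result — both are hypothetical choices; the patent only says the classifier is pre-trained and that the strength comes from the multiplication result.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_classes = 64, 5

attention = rng.random(d)            # attention feature information of the object
background = rng.standard_normal(d)  # background characteristic information
feat3 = rng.standard_normal(d)       # third characteristic information (human)
feat4 = rng.standard_normal(d)       # fourth characteristic information (object)

# Step 1 of claim 5: multiply each feature by the object's attention features.
products = [f * attention for f in (background, feat3, feat4)]
fused = np.concatenate(products)

# Hypothetical pre-trained classifier: fixed linear layer plus softmax.
w_cls = rng.standard_normal((fused.size, n_classes))
logits = fused @ w_cls
second_relation = np.exp(logits - logits.max())
second_relation /= second_relation.sum()

# Step 2 of claim 5: derive the interaction strength from the same
# multiplication result — here, a sigmoid of the mean activation.
strength = 1.0 / (1.0 + np.exp(-fused.mean()))

print(second_relation.shape, 0.0 < strength < 1.0)
```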
6. The method according to any one of claims 1-5, further comprising:
inputting the third characteristic information and the fourth characteristic information into a third neural network to obtain a third interaction relationship between the human body and the object;
and updating the interaction relationship between the human body and the object according to the third interaction relationship.
7. The method of any of claims 1-6, further comprising:
inputting the image to be detected into a fourth neural network to obtain a human body detection frame and an object detection frame in the image to be detected;
the determining of the first characteristic information of the human body and the second characteristic information of the object in the image to be detected through the first neural network comprises the following steps:
inputting the human body detection frame and the object detection frame into the first neural network to obtain the first characteristic information and the second characteristic information;
determining background characteristic information, third characteristic information of a human body and fourth characteristic information of an object in the image to be detected through a second neural network, wherein the determining comprises the following steps:
inputting the human body detection frame and the object detection frame into the second neural network to obtain the background characteristic information, the third characteristic information and the fourth characteristic information.
8. The method according to any one of claims 2-5, wherein the determining an interaction between a human body and an object according to the first interaction, the second interaction and the interaction strength comprises:
and multiplying the characteristic information corresponding to the first interaction relationship, the characteristic information corresponding to the second interaction relationship and the interaction strength to obtain the interaction relationship between the human body and the object.
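The fusion of claim 8 reduces to an elementwise product of the two branches' class scores scaled by the interaction strength. The numbers below are made up purely to show the arithmetic; the class count and the final argmax readout are assumptions.

```python
import numpy as np

# Hypothetical scores over the same three interaction classes from each branch.
first_relation = np.array([0.1, 0.6, 0.3])   # from the spatial branch
second_relation = np.array([0.2, 0.5, 0.3])  # from the fused branch
strength = 0.8                               # scalar interaction strength

# Claim 8: multiply the two relation scores and the strength, then take the
# highest-scoring class as the recognized interaction relationship.
combined = first_relation * second_relation * strength
predicted = int(np.argmax(combined))
print(combined.round(3).tolist(), predicted)
```

Note how the product keeps only classes that both branches agree on: class 1 dominates here because it scores highest in both branches.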
9. An apparatus for identifying an interaction relationship, comprising:
the first determining module is used for determining first characteristic information of a human body and second characteristic information of an object in the image to be detected through a first neural network;
the second determining module is used for determining the spatial relationship between the human body and the object according to the first characteristic information and the second characteristic information;
the third determining module is used for determining the background characteristic information in the image to be detected, the third characteristic information of the human body and the fourth characteristic information of the object through a second neural network;
and the fourth determining module is used for determining the interactive relationship between the human body and the object according to the spatial relationship, the background characteristic information, the third characteristic information and the fourth characteristic information.
10. The apparatus of claim 9, wherein the fourth determining means comprises:
the first determining unit is used for determining a first interaction relation between the human body and the object according to the spatial relation;
a second determining unit, configured to determine a second interaction relationship and interaction strength between the human body and the object according to the spatial relationship, the background feature information, the third feature information, and the fourth feature information;
and the third determining unit is used for determining the interactive relationship between the human body and the object according to the first interactive relationship, the second interactive relationship and the interactive strength.
11. The apparatus of claim 10, wherein the first determining unit comprises:
the first determining subunit is used for determining attention feature information of the object according to the spatial relationship;
and the second determining subunit is used for determining the first interaction relation according to the attention characteristic information of the object and the first characteristic information.
12. The apparatus of claim 10, wherein the second determining unit comprises:
the third determining subunit is used for determining attention feature information of the object according to the spatial relationship;
a fourth determining subunit, configured to determine the second interaction relationship and the interaction strength according to the attention feature information, the background feature information, the third feature information, and the fourth feature information of the object.
13. The apparatus of claim 12, wherein the fourth determining subunit is to:
multiplying the background characteristic information, the third characteristic information and the fourth characteristic information respectively by the attention characteristic information of the object, and inputting the multiplication results into a pre-trained interaction relationship classification model to obtain the second interaction relationship;
and obtaining the interaction strength according to the multiplication results.
14. The apparatus of any of claims 9-13, further comprising:
a fifth determining module, configured to input the third feature information and the fourth feature information into a third neural network, so as to obtain a third interaction relationship between the human body and the object;
and the updating module is used for updating the interactive relation between the human body and the object according to the third interactive relation.
15. The apparatus of any of claims 9-14, further comprising:
a sixth determining module, configured to input the image to be detected into a fourth neural network, so as to obtain a human detection frame and an object detection frame in the image to be detected;
the first determining module includes:
a fourth determining unit, configured to input the human body detection box and the object detection box into the first neural network to obtain the first feature information and the second feature information;
the second determining module includes:
a fifth determining unit, configured to input the human body detection box and the object detection box into the second neural network, so as to obtain the background feature information, the third feature information, and the fourth feature information.
16. The apparatus according to any one of claims 10-13, wherein the third determining unit comprises:
and the fifth determining subunit is configured to multiply the feature information corresponding to the first interaction relationship, the feature information corresponding to the second interaction relationship, and the interaction strength to obtain the interaction relationship between the human body and the object.
17. An electronic device, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-8.
19. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-8.
CN202111277727.0A 2021-10-29 2021-10-29 Interactive relationship identification method and device and electronic equipment Pending CN114005181A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111277727.0A CN114005181A (en) 2021-10-29 2021-10-29 Interactive relationship identification method and device and electronic equipment


Publications (1)

Publication Number Publication Date
CN114005181A true CN114005181A (en) 2022-02-01

Family

ID=79925703



Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190188866A1 (en) * 2017-12-19 2019-06-20 Canon Kabushiki Kaisha System and method for detecting interaction
CN110298332A (en) * 2019-07-05 2019-10-01 海南大学 Method, system, computer equipment and the storage medium of Activity recognition
CN111325141A (en) * 2020-02-18 2020-06-23 上海商汤临港智能科技有限公司 Interaction relation identification method, device, equipment and storage medium
CN111523378A (en) * 2020-03-11 2020-08-11 浙江工业大学 Human behavior prediction method based on deep learning
CN111914622A (en) * 2020-06-16 2020-11-10 北京工业大学 Character interaction detection method based on deep learning
CN113139483A (en) * 2021-04-28 2021-07-20 北京百度网讯科技有限公司 Human behavior recognition method, apparatus, device, storage medium, and program product
CN113420681A (en) * 2021-06-28 2021-09-21 北京百度网讯科技有限公司 Behavior recognition and model training method, apparatus, storage medium, and program product
CN113553905A (en) * 2021-06-16 2021-10-26 北京百度网讯科技有限公司 Image recognition method, device and system


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LUO Huilan; TONG Kang; KONG Fansheng: "A Survey of Progress in Deep-Learning-Based Human Action Recognition in Video", Acta Electronica Sinica, no. 05, 15 May 2019 (2019-05-15) *
JIANG Hongwei; ZHANG Shuxing: "Design Analysis of a Machine-Vision-Based Human-Computer Interaction ***", Modern Industrial Economy and Informationization, no. 03, 30 March 2020 (2020-03-30) *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination