US20230048581A1 - Data exchange and transformation in stream computing systems - Google Patents


Info

Publication number
US20230048581A1
US20230048581A1 (application US17/885,115)
Authority
US
United States
Prior art keywords
application
computer
container orchestration
implemented method
data stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US17/885,115
Inventor
Giuseppe Coviello
Kunal Rao
Murugan Sankaradas
Srimat Chakradhar
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Laboratories America Inc
Original Assignee
NEC Laboratories America Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Laboratories America Inc filed Critical NEC Laboratories America Inc
Priority to US17/885,115 (US20230048581A1)
Assigned to NEC LABORATORIES AMERICA, INC. Assignors: CHAKRADHAR, SRIMAT; COVIELLO, GIUSEPPE; RAO, KUNAL; SANKARADAS, MURUGAN
Publication of US20230048581A1
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 — Arrangements for program control, e.g. control units
    • G06F 9/06 — Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 — Multiprogramming arrangements
    • G06F 9/54 — Interprogram communication
    • G06F 9/543 — User-generated data transfer, e.g. clipboards, dynamic data exchange [DDE], object linking and embedding [OLE]
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 — Arrangements for program control, e.g. control units
    • G06F 9/06 — Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 — Multiprogramming arrangements
    • G06F 9/54 — Interprogram communication
    • G06F 9/542 — Event management; Broadcasting; Multicasting; Notifications
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 — Arrangements for program control, e.g. control units
    • G06F 9/06 — Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 — Multiprogramming arrangements
    • G06F 9/50 — Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 — Partitioning or combining of resources
    • G06F 9/5072 — Grid computing

Definitions

  • the present invention relates to data management in distributed computing systems, and, more particularly, to developing complex services using data streams.
  • Smart sensors collect information from a variety of sources, and the exponential growth in the number of such sensors has caused a similar growth in the number of data streams that need to be managed.
  • a method for executing an application includes extending a container orchestration system application programming interface (API) to handle objects that specify components of an application.
  • An application representation is executed using the extended container orchestration system API, including the instantiation of one or more services that define a data stream path from a sensor to a device.
  • a system for executing an application includes a hardware processor and a memory that stores a computer program.
  • When executed by the hardware processor, the computer program causes the hardware processor to extend a container orchestration system application programming interface (API) to handle objects that specify components of an application and to execute an application representation using the extended container orchestration system API, including the instantiation of one or more services that define a data stream path from a sensor to a device.
  • FIG. 1 is a block diagram of a distributed computing system, in accordance with an embodiment of the present invention.
  • FIG. 2 is a block diagram of a processing node in a distributed computing system, in accordance with an embodiment of the present invention.
  • FIG. 3A is a block diagram of a distributed computing application, in accordance with an embodiment of the present invention.
  • FIG. 3B is a block diagram of a distributed computing application, in accordance with an embodiment of the present invention.
  • FIG. 4 is a block diagram of a data stream representation of a distributed computing application, in accordance with an embodiment of the present invention.
  • FIG. 5 is a block/flow diagram of a method of executing a distributed computing application using an extension to a container orchestration engine, in accordance with an embodiment of the present invention.
  • FIG. 6 is a block diagram of a hardware processing system that executes a distributed computing application, in accordance with an embodiment of the present invention.
  • FIG. 7 is a block diagram of an extended container orchestration engine that executes a distributed computing application, in accordance with an embodiment of the present invention.
  • Scaling multi-sensor stream processing applications while ensuring reliable operation in the face of software and hardware failures is particularly challenging as the number of data streams increases. Barriers to developer productivity may be reduced by providing a layer of abstraction that enables easy exchange, transformation, and management of data streams in complex, multi-sensor distributed stream processing applications.
  • Applications may be designed by defining and registering abstract objects, such as drivers, sensors, streams, analytics units, actuators, and devices, which together can be used to specify the overall application pipeline.
  • appropriate data communication mechanisms among the application’s objects may be automatically determined, for example including network connections and serialization and deserialization of data streams. Developers need only provide logic for different types of analytics processing and the system will automatically handle application-specific allocation, scheduling, and execution on the underlying distributed computing resources, as well as providing auto-scaling and operational reliability.
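As a minimal sketch of the registration model described above, the abstract objects can be modeled as named components that declare their input streams, so the pipeline from sensor to actuator is recoverable by walking the registered connections. All class and method names here are illustrative assumptions, not the patent's actual interfaces.

```python
from dataclasses import dataclass, field

# Hypothetical object types mirroring the abstractions described above:
# sensors produce raw data, drivers turn sensor output into streams,
# analytics units transform streams, and actuators drive devices.

@dataclass
class Component:
    name: str
    kind: str                                   # "sensor" | "driver" | "analytics" | "actuator"
    inputs: list = field(default_factory=list)  # names of upstream components/streams

@dataclass
class Pipeline:
    components: dict = field(default_factory=dict)

    def register(self, comp: Component):
        # Registration validates that every declared input already exists,
        # so the pipeline is always wired to known streams.
        for stream in comp.inputs:
            if stream not in self.components:
                raise ValueError(f"unknown input stream: {stream}")
        self.components[comp.name] = comp

    def path(self, name: str):
        # Walk upstream from a component back to its sensor, recovering
        # the data stream path end to end.
        comp = self.components[name]
        if not comp.inputs:
            return [name]
        return self.path(comp.inputs[0]) + [name]

pipeline = Pipeline()
pipeline.register(Component("camera", "sensor"))
pipeline.register(Component("video-intake", "driver", inputs=["camera"]))
pipeline.register(Component("face-detection", "analytics", inputs=["video-intake"]))
pipeline.register(Component("alerts", "actuator", inputs=["face-detection"]))
print(pipeline.path("alerts"))
# ['camera', 'video-intake', 'face-detection', 'alerts']
```

Given such a registered pipeline, the framework has enough information to derive the communication mechanisms between components automatically, which is the point of the abstraction.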
  • a user 102 may execute a workload on the distributed computing system 100.
  • the user 102 communicates with manager system 104 .
  • the user 102 supplies information regarding the workload, including the number and type of processing nodes 106 that will be needed to execute the workload.
  • the information provided to the manager system 104 includes, for example, a number of processing nodes 106 , a processor type, an operating system, an execution environment, storage capacity, random access memory capacity, network bandwidth, and any other points that may be needed for the workload.
  • the user 102 can furthermore provide images or containers to the manager system 104 for storage in a registry there.
  • the distributed computing system 100 may include many thousands of processing nodes 106 , each of which can be idle or busy in accordance with the workloads being executed by the distributed computing system 100 at any given time. Although a single manager system 104 is shown, there may be multiple such manager systems 104 , with multiple registries distributed across the distributed computing system 100 .
  • the manager system 104 determines which processing nodes 106 will implement the microservices that make up the corresponding application.
  • the manager system 104 may configure the processing nodes 106 , for example based on node and resource availability at the time of provisioning.
  • the microservices may be hosted entirely on separate processing nodes 106 , or any number of microservices may be collocated at a same processing node 106 .
  • the manager system 104 and the distributed computing system 100 can handle multiple different workloads from multiple different users 102 , such that the availability of particular resources will depend on what is happening in the distributed computing system 100 generally.
  • Provisioning refers to the process by which resources in a distributed computing system 100 are allocated to a user 102 and are prepared for execution.
  • provisioning includes the determinations made by the manager system 104 as to which processing nodes 106 will be used for the workload, as well as the transmission of images and any configuration steps that are needed to prepare the processing nodes 106 for execution of the workload.
  • the configuration may include, for example, identifying communications methods to be used by the microservices.
  • the processing node 106 includes a hardware processor 202 , a memory 204 , and a network interface 206 .
  • the network interface 206 may be configured to communicate with the manager system 104 , with the user 102 , and with other processing nodes 106 as needed, using any appropriate communications medium and protocol.
  • the processing node 106 also includes one or more functional modules that may, in some embodiments, be implemented as software that is stored in the memory 204 and that may be executed by the hardware processor 202 . In other embodiments, one or more of the functional modules may be implemented as one or more discrete hardware components in the form of, e.g., application-specific integrated chips or field programmable gate arrays.
  • the processing node 106 may include one or more containers 208 . It is specifically contemplated that each container 208 represents a distinct operating environment.
  • the containers 208 each include a set of software applications, configuration files, workload datasets, and any other information or software needed to execute a specific workload. These containers 208 may implement one or more microservices for a distributed application.
  • the containers 208 are stored in memory 204 and are instantiated and decommissioned by the container orchestration engine 210 as needed. It should be understood that, as a general matter, an operating system of the processing node 106 exists outside the containers 208 . Thus, each container 208 interfaces with the same operating system kernel, reducing the overhead needed to execute multiple containers simultaneously. The containers 208 meanwhile may have no communication with one another outside of the determined methods of communication, reducing security concerns.
  • An abstraction engine 201 coordinates with the container orchestration engine 210 to handle configuration of the container(s) 208 , for example providing configurations for communications mechanisms and the various objects of the distributed system in accordance with an application specification.
  • the container orchestration engine 210 may be implemented as a KUBERNETES® system.
  • the abstraction engine 201 may be implemented as operators within KUBERNETES® that define custom resources that the KUBERNETES® system can handle directly, so that a definition of the distributed application can call the extended API of the abstraction engine 201 to instantiate components of the representation.
  • a video analytics application can perform real-time monitoring of a video stream, which may include monitoring a given area to determine whether specific individuals have entered the area.
  • the video analytics application can generate an alert or automated response to the detection of such an individual.
  • the application may include exemplary microservices such as video intake 304 , face detection 306 , face matching 308 , alerts manager 310 , and biometrics manager 312 .
  • a camera 302 generates visual data, such as a stream of images making up a video stream.
  • Video intake 304 processes this visual data and performs any appropriate filtering or formatting to generate frames that may be considered by downstream microservices.
  • Face detection 306 identifies faces within the frames of the video stream. This identification may include labeling the frame to indicate the presence of a face within the image and may further include coordinates for a bounding box of the face within the image. Face matching 308 may then connect the face image with information about a person shown in the image. This matching may draw on information from biometrics manager 312 , which may store profiles of people of interest. The profile may include biometric information, such as facial features that may be used to match with face images, as well as identifying information such as the person’s name and role.
  • a person’s role may include information regarding access authorization. For example, a person may be authorized to access a restricted area, or may be specifically forbidden from entering a restricted area.
  • the alerts manager 310 may generate an alert responsive to the detection of a person by face matching 308 . For example, an alert may indicate that an authorized person is present in the area, that a forbidden person is in the area, or that an unknown person is in the area.
  • a security system 312 may automatically respond to the alerts.
  • the response may include a security response, such as automatically locking or unlocking a door or other access point, sounding a visual and/or audible alarm, summoning security personnel, and requesting further authentication from the detected person.
  • multiple video streams can be processed at once.
  • multiple cameras 302 may generate respective video streams, and there may be respective microservices instances of video intake 304 , face detection 306 , and face matching 308 .
  • the various microservices may be implemented as containers 208 within a processing node 106 .
  • multiple microservices may be implemented on a single processing node 106 , for example using different respective containers 208 or by implementing multiple microservices within a single container 208 .
  • the microservices may be implemented using multiple different processing nodes 106 , with communications between the containers 208 of the different processing nodes 106 being handled over an appropriate network.
  • In FIG. 3B, an exemplary distributed application is shown, including a set of interconnected microservices.
  • multiple types of data are fused to control physical access through a gate 368 .
  • a first camera 352 operates in the visual range of the electromagnetic spectrum, taking pictures of an environment.
  • a second camera 354 operates in the infrared range of the electromagnetic spectrum, generating thermal images of the same environment.
  • the visual information is first processed to perform person detection 356 and face recognition 358 .
  • the person detection 356 takes frames of the video and identifies the locations of people within each frame, while face recognition 358 locates faces within each frame and compares them to stored faces in a database of registered faces 359 .
  • This information is fused with thermal imaging information from the thermal camera 354 at temporal fusion 360 , where images taken from the visual camera 352 and from the thermal camera 354 at roughly the same time are correlated with one another.
  • Spatial fusion 362 identifies regions of the thermal images that correspond to regions where a person is detected in the visual images. Spatial fusion 362 accounts for the possibility that the two cameras may not be collocated, so that their images will show different respective views of the environment.
  • This fusion of data can be used to perform fever screening 364 . Because the thermal information indicates the temperature of an object, a person’s body temperature can be accurately determined. For people with a higher than normal body temperature, fever screening 364 may indicate that the person has a fever.
  • the face recognition 358 and fever screening 364 may be used in tandem to perform gate control 366 , where access to a controlled area is determined according to a set of security policies. For example, access may be limited to people whose faces match the one of the registered faces 359 . Access may further be barred to those individuals who show signs of a fever, for example to lessen the spread of disease. Responsive to these security policies, gate control 366 may operate the gate 368 to allow or deny access.
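The fusion stages above can be sketched as follows: temporal fusion pairs each visual frame with the thermal frame captured closest in time, and fever screening flags pairs whose thermal reading exceeds a threshold. The frame format, field names, and the 38 °C threshold are illustrative assumptions, not values from the patent.

```python
# A minimal sketch of temporal fusion: pair each visual frame with the
# thermal frame captured closest in time, within a tolerance window.

def temporal_fusion(visual_frames, thermal_frames, tolerance=0.1):
    """Return (visual, thermal) pairs whose timestamps differ by <= tolerance."""
    pairs = []
    for v in visual_frames:
        best = min(thermal_frames, key=lambda t: abs(t["ts"] - v["ts"]), default=None)
        if best is not None and abs(best["ts"] - v["ts"]) <= tolerance:
            pairs.append((v, best))
    return pairs

def fever_screening(pairs, threshold_c=38.0):
    """Flag paired frames where the thermal reading exceeds the threshold."""
    return [(v, t) for v, t in pairs if t["temp_c"] > threshold_c]

visual = [{"ts": 0.00, "person": "A"}, {"ts": 0.50, "person": "B"}]
thermal = [{"ts": 0.02, "temp_c": 36.9}, {"ts": 0.49, "temp_c": 38.4}]

fused = temporal_fusion(visual, thermal)
flagged = fever_screening(fused)
print([v["person"] for v, _ in flagged])   # ['B']
```

Spatial fusion would additionally map person bounding boxes from the visual frame onto the thermal frame before reading temperatures; that geometric step is omitted here for brevity.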
  • a data stream may be a flow of homogeneous discrete messages. Some streams have only data that is produced by sensors 402 , while others may include insights gained by analyzing and fusing data from the sensors 402 .
  • a given application may include multiple sensors 402 that each generate respective streams.
  • a sensor 402 is a device that produces raw data. Examples of sensors include cameras, location sensors (e.g., global positioning satellite sensors), environmental sensors such as temperature and pressure sensors, light detection and ranging (LIDAR) sensors, and radar sensors. Applications process and analyze raw data from sensors 402 to generate insights. Sensors 402 may have wired or wireless networking capability, or they may be physically attached to a computing device through an interface. Sensors 402 may be the beginning of a data stream and so represent the first stage of a distributed application. In the examples of FIGS. 3 A and 3 B , the camera 302 , 352 , and 354 may each represent a sensor 402 .
  • a driver 404 may generate a data stream from a sensor’s output.
  • the driver 404 may perform any appropriate type of encoding or processing needed to generate the data stream; for example, video intake 304 may take the respective still images generated by the camera 302 and encode them as a video bitstream.
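A driver in this sense wraps raw sensor output into a stream of homogeneous, discrete messages. The sketch below is an assumption about what such an envelope might contain (stream name, sequence number, timestamp, payload); the patent does not specify a message format.

```python
import json
import time

# Sketch of a driver: it serializes each raw sensor reading into one
# self-describing, homogeneous stream message.

class Driver:
    def __init__(self, stream_name):
        self.stream_name = stream_name
        self.seq = 0

    def encode(self, raw_payload):
        """Serialize one sensor reading as a discrete stream message."""
        self.seq += 1
        return json.dumps({
            "stream": self.stream_name,   # which stream this message belongs to
            "seq": self.seq,              # ordering within the stream
            "ts": time.time(),            # capture timestamp
            "payload": raw_payload,       # the raw sensor data itself
        })

driver = Driver("camera-302")
msg = json.loads(driver.encode({"frame": "<jpeg bytes elided>"}))
print(msg["stream"], msg["seq"])   # camera-302 1
```

Downstream analytics units can then consume messages without knowing anything about the sensor hardware, since the envelope carries all the context they need.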
  • Analytics unit 406 processes the data stream and/or generates an augmented data stream.
  • One or more analytics units 406 may be used to perform different functions.
  • An analytics unit may fuse multiple streams, for example accepting inputs from multiple sensors 402 and generating an output that is based on the multiple inputs, as shown in FIG. 3 B .
  • analytics unit 406 may include person detection 356 and face recognition 358 .
  • the analytics unit 406 subscribes to the data streams of the visual camera 352 and the thermal camera 354 , processes the streams, and generates an output augmented stream that has, for example, indications about recognized faces and temperature information.
  • the augmented stream may be used as inputs to other analytics units 406 or to actuators 408 , which can control a device using the information in the input streams.
  • each individual function may be performed by a separate analytics unit 406 .
  • Distinct analytics units 406 may subscribe to a same input stream, making it a simple matter to reuse a stream.
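Stream reuse, as described above, amounts to multiple subscribers on one published stream. A minimal in-memory publish/subscribe sketch (illustrative only; the real system would use a message bus) shows two analytics units receiving every message of the same input stream independently:

```python
from collections import defaultdict

# Minimal in-memory publish/subscribe bus: each published message is
# delivered to every subscriber of the stream, so streams are trivially
# reusable by multiple analytics units.

class Bus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, stream, callback):
        self.subscribers[stream].append(callback)

    def publish(self, stream, message):
        for callback in self.subscribers[stream]:
            callback(message)

bus = Bus()
detections, recognitions = [], []
# Two distinct analytics units subscribe to the same input stream.
bus.subscribe("frames", lambda m: detections.append(f"detect:{m}"))
bus.subscribe("frames", lambda m: recognitions.append(f"recognize:{m}"))
bus.publish("frames", "frame-1")
print(detections, recognitions)   # ['detect:frame-1'] ['recognize:frame-1']
```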
  • a database 407 may be used by the analytics unit 406 to help maintain a state of the application.
  • the biometrics manager 312 of the above example may include such a database to track biometric information of the various users.
  • the alerts manager 310 may perform the function of an actuator 408 , generating instructions that may be used by a device 410 .
  • a device 410 may be physical or virtual and may be controlled using insights derived by the data analysis. Examples of devices 410 include entry/exit gates, displays, light arrays, and graphical user interfaces. Following the example above, system security 312 may represent such a device 410 .
  • the device(s) 410 may have networking capabilities or may be connected directly to computing devices.
  • the device(s) 410 may be the ending point of a particular distributed application. Not all applications will necessarily have a corresponding device 410 .
  • a given piece of equipment may represent both a sensor 402 and a device 410 if it generates data and can be controlled.
  • a camera 302 may produce a video stream, but may also allow remote management to configure camera parameters.
  • the developer may provide logic for generating a stream from a sensor 402 , for example using a driver 404 , analyzing a stream or fusing multiple streams using an analytics unit 406 , and/or controlling a device 410 using an actuator 408 .
  • This logic may be provided in the form of scripts that are interpreted by the abstraction engine 201 at run-time.
  • APIs may be implemented using the idiom of a given programming language and may be simple and lightweight.
  • Some exemplary APIs include:
  • the DataX class may be used from applications written in different programming languages. For example, in GO®, the opaque DataX struct may be instantiated and its methods may be used. In C++, a static DataX library can be linked with the application to use its functions.
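The exemplary API list itself is not reproduced in this excerpt, so as a purely hypothetical illustration of the *style* of interface described (lightweight, idiomatic, hiding connection and serialization details), a stream handle might expose receive and publish methods. Every name below is an assumption, not the actual DataX API.

```python
class DataX:
    """Hypothetical stand-in for the opaque stream-access object described
    above; all method names are illustrative, not the real DataX API."""

    def __init__(self, inbox):
        self._inbox = list(inbox)   # messages from subscribed input streams
        self._outbox = []           # messages published to the output stream

    def next_msg(self):
        # Receive the next message from the input stream, or None when drained.
        return self._inbox.pop(0) if self._inbox else None

    def send(self, payload):
        # Publish a result; connection management and serialization would be
        # handled by the framework, not by application code.
        self._outbox.append(payload)

# An analytics unit's business logic then reduces to a read/process/write loop:
dx = DataX(inbox=["frame-1", "frame-2"])
while (msg := dx.next_msg()) is not None:
    dx.send(msg.upper())
print(dx._outbox)   # ['FRAME-1', 'FRAME-2']
```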
  • a standalone utility may be used during application development and deployment.
  • a developer may interact with a client machine using the utility to deploy a distributed application.
  • Some functions that may be implemented include package and deploy drivers, analytics units, and actuators; register sensors, streams, and gadgets, including details like name, configuration, connections, resource requirements, etc.; publish a stream for unmanaged sensors; receive an existing stream; retrieve logs for components; list existing components; remove existing components; and execute a function locally on a client machine.
  • This framework can provide an abstraction that simplifies application development and that makes distributed stream processing applications easier to write. Data scientists, who may not be familiar with the details of the underlying container orchestration engine 210 , are able to use these abstractions to create a distributed application.
  • the abstractions may be implemented on top of an underlying container orchestration engine 210 in multiple ways.
  • a first option is to use the underlying container orchestration engine’s stock API server, where resources like Pods, StatefulSets, and ReplicaSets may be used to describe the application’s desired state.
  • the underlying container orchestration engine 210 handles deployment of the application and ensures that the current state matches the developer’s declarative description of the desired state. However, the underlying container orchestration engine 210 may not understand the workload that is running when the first option is used.
  • Another option is to extend the API server of the underlying container orchestration engine 210 (e.g., with abstraction engine 201 ) to add new functionality using operators, which may provide encapsulation of domain-specific knowledge of running a specific stream processing application.
  • the container system API may be extended by defining operators, allowing the abstraction engine to take advantage of the underlying container orchestration engine’s tools.
  • Custom resources may be defined as extensions of the container system’s API.
  • a resource may be an endpoint in the API that stores a collection of API objects of a certain kind.
  • the built-in Pods resource may include a collection of Pod objects.
  • a new operator may be defined as a software extension to the container orchestration engine 210 that makes use of custom resources to manage distributed applications and their components.
  • the operator pattern provides the logic for monitoring and maintaining resources that it defines, which means that the operator can take actions based on the resource’s state.
  • the operator may take actions needed to ensure that the distributed application is in a coherent state at all times. It also protects the system from users' actions that might bring the system into an unrecoverable incoherent state. For example, uninstalling a driver while a sensor is being used can bring the system into an incoherent state, but the operator can detect this and prevent it.
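The coherence guarantee described above can be sketched as a guard in the operator's state-changing operations: a request that would leave the system incoherent, such as uninstalling a driver a registered sensor still depends on, is refused. This is illustrative logic only, not the actual operator implementation.

```python
# Sketch of the operator's coherence protection: refuse any operation
# that would bring the system into an unrecoverable incoherent state.

class Operator:
    def __init__(self):
        self.drivers = set()
        self.sensors = {}          # sensor name -> driver name it depends on

    def install_driver(self, driver):
        self.drivers.add(driver)

    def register_sensor(self, sensor, driver):
        # A sensor can only be registered against an installed driver.
        if driver not in self.drivers:
            raise ValueError(f"driver not installed: {driver}")
        self.sensors[sensor] = driver

    def uninstall_driver(self, driver):
        in_use = [s for s, d in self.sensors.items() if d == driver]
        if in_use:
            # Refuse rather than break running sensors.
            raise RuntimeError(f"driver {driver} in use by: {in_use}")
        self.drivers.discard(driver)

op = Operator()
op.install_driver("usb-camera")
op.register_sensor("camera-302", "usb-camera")
try:
    op.uninstall_driver("usb-camera")
except RuntimeError as e:
    print(e)   # driver usb-camera in use by: ['camera-302']
```

The same pattern covers the deletion checks mentioned below: before removing any entity, the operator verifies that nothing running still depends on it.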
  • a user may provide the name, a script or image that includes the business logic, and optionally a configuration schema.
  • the operator may automatically create an image for executing the script.
  • the operator may automatically cascade the upgrade to running instances. The operator may ensure the coherency of the upgrade by enforcing that new configuration schemas are compatible with the schemas of the running instances.
  • the user may optionally provide a script to convert the configuration schemas, in which case the operator will accept the upgrade only if the script can be executed successfully for all running instances. If a user requests the deletion of a driver, analytics unit, or actuator, then the operator may check if the entity is currently in use and refuse the operation if there is already a running instance for that entity.
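The upgrade-compatibility rule above can be sketched as a simple accept/refuse decision: a new configuration schema is accepted only if every field required by running instances is still present with the same type. A real schema check would be richer (defaults, nested fields, the optional conversion script); the field names here are illustrative.

```python
# Sketch of the operator's schema-compatibility check for upgrades.

def schema_compatible(old_schema, new_schema):
    """A new schema is compatible if it keeps every field of the old
    schema with the same type; adding new fields is allowed."""
    for field_name, field_type in old_schema.items():
        if new_schema.get(field_name) != field_type:
            return False
    return True

running = {"fps": "int", "resolution": "str"}
upgrade_ok = {"fps": "int", "resolution": "str", "codec": "str"}   # adds a field
upgrade_bad = {"fps": "str"}                                       # changes a type, drops a field

print(schema_compatible(running, upgrade_ok))    # True
print(schema_compatible(running, upgrade_bad))   # False
```

An incompatible upgrade would then be refused unless the user-supplied conversion script succeeds for all running instances, as described above.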
  • When registering a sensor, the operator ensures that the associated driver is installed and that the driver configuration schema provided by the user is compatible with the configuration schema expected by the installed driver.
  • the operator may also maintain the driver’s running instance on appropriate computing resources as long as the sensor is registered. For example, if a sensor is physically attached to a computing node through a USB interface, then the operator may maintain a running instance on the same computing node.
  • a registered sensor may generate an output stream that has the same name as the sensor.
  • a user may request to create augmented streams by providing an analytics unit that generates the stream, the input streams, and a configuration for the analytics unit.
  • the operator checks that the analytics unit is available, that the configuration is compatible, and that the input streams are registered. Unless the user requests a fixed number of instances, the operator may then automatically scale the number of instances of the analytics unit. The operator may perform similar operations when the user registers a new gadget. Before deleting any sensors or streams, the operator may ensure that they are not inputs that are used to produce other streams.
  • Communication between microservices may be handled using, e.g., a message bus or distributed queue, or by a communications protocol such as HTTP. For example, a scalable message queue like the Neural Autonomic Transport System (NATS) may be used.
  • the operator manages the deployment and configuration of NATS, which then uses authentication and authorization so that only services associated with the abstraction engine 201 may connect to the NATS server, for example to subscribe and publish only on the defined and registered streams.
  • the operator that implements the abstraction engine 201 may further use a containerized application, called a sidecar, that the operator may run alongside each instance of a user-provided driver, analytics unit, or actuator.
  • the sidecar may automatically manage data communications with a message bus, including connection, subscription, and publishing messages.
  • the sidecar may further monitor the health of the user’s application, for example exposing metrics such as systems resource utilization and the number of messages received, dropped, and published.
  • the operator and the underlying container orchestration engine 210 may use those metrics to ensure that all the components are working correctly, and also to drive the auto-scaling process.
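The sidecar's bookkeeping role can be sketched as a wrapper that sits between the message bus and the user's code, counting messages received, dropped, and published so the operator has metrics to monitor health and drive auto-scaling. The metric names and queue-limit back-pressure policy are illustrative assumptions.

```python
# Sketch of sidecar metric bookkeeping: wrap the user's handler, track
# received/dropped/published counts, and shed load past a queue limit.

class Sidecar:
    def __init__(self, handler, queue_limit=2):
        self.handler = handler          # the user-provided processing logic
        self.queue = []
        self.queue_limit = queue_limit
        self.metrics = {"received": 0, "dropped": 0, "published": 0}

    def on_message(self, msg):
        # Called by the message-bus subscription for each incoming message.
        self.metrics["received"] += 1
        if len(self.queue) >= self.queue_limit:
            self.metrics["dropped"] += 1    # back-pressure: shed excess load
            return
        self.queue.append(msg)

    def drain(self):
        # Run the user's handler over queued messages and publish results.
        outputs = []
        while self.queue:
            outputs.append(self.handler(self.queue.pop(0)))
            self.metrics["published"] += 1
        return outputs

sc = Sidecar(handler=str.upper, queue_limit=2)
for m in ["a", "b", "c"]:
    sc.on_message(m)
out = sc.drain()
print(out, sc.metrics)   # ['A', 'B'] {'received': 3, 'dropped': 1, 'published': 2}
```

A rising `dropped` count is exactly the kind of signal the operator could use to scale out additional instances of the analytics unit.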
  • Block 502 extends the container orchestration engine 210 to include the abstraction engine 201 as described above.
  • Block 502 may use operators to add new functionality to the container orchestration engine 210 , for example to define objects that can be used to represent a distributed application so that the container orchestration engine 210 can interact with those objects.
  • a pre-existing container orchestration engine 210 can be extended to handle components like drivers, analytics units, sensors, devices, streams, and databases.
  • extending the container orchestration engine 210 may include using operators within a KUBERNETES® system.
  • the container orchestration engine 210 includes an operator 602 that extends the default functionality of the engine 210 .
  • pods 604 may be instantiated (e.g., as one or more containers) to perform the roles of drivers 604-1, analytics units 604-2, and actuators 604-3.
  • Each pod 604 may include a script 606 , which may interact with other elements of a system, and a sidecar 608 , which may communicate with other pods 604 .
  • the computing device 700 is configured to perform application abstraction and execution.
  • the computing device 700 may be embodied as any type of computation or computer device capable of performing the functions described herein, including, without limitation, a computer, a server, a rack-based server, a blade server, a workstation, a desktop computer, a laptop computer, a notebook computer, a tablet computer, a mobile computing device, a wearable computing device, a network appliance, a web appliance, a distributed computing system, a processor-based system, and/or a consumer electronic device. Additionally or alternatively, the computing device 700 may be embodied as one or more compute sleds, memory sleds, or other racks, sleds, computing chassis, or other components of a physically disaggregated computing device.
  • the computing device 700 illustratively includes the processor 710, an input/output subsystem 720, a memory 730, a data storage device 740, a communication subsystem 750, and/or other components and devices commonly found in a server or similar computing device.
  • the computing device 700 may include other or additional components, such as those commonly found in a server computer (e.g., various input/output devices), in other embodiments. Additionally, in some embodiments, one or more of the illustrative components may be incorporated in, or otherwise form a portion of, another component.
  • the memory 730 or portions thereof, may be incorporated in the processor 710 in some embodiments.
  • the processor 710 may be embodied as any type of processor capable of performing the functions described herein.
  • the processor 710 may be embodied as a single processor, multiple processors, a Central Processing Unit(s) (CPU(s)), a Graphics Processing Unit(s) (GPU(s)), a single or multi-core processor(s), a digital signal processor(s), a microcontroller(s), or other processor(s) or processing/controlling circuit(s).
  • the memory 730 may be embodied as any type of volatile or non-volatile memory or data storage capable of performing the functions described herein. In operation, the memory 730 may store various data and software used during operation of the computing device 700 , such as operating systems, applications, programs, libraries, and drivers.
  • the memory 730 is communicatively coupled to the processor 710 via the I/O subsystem 720 , which may be embodied as circuitry and/or components to facilitate input/output operations with the processor 710 , the memory 730 , and other components of the computing device 700 .
  • the I/O subsystem 720 may be embodied as, or otherwise include, memory controller hubs, input/output control hubs, platform controller hubs, integrated control circuitry, firmware devices, communication links (e.g., point-to-point links, bus links, wires, cables, light guides, printed circuit board traces, etc.), and/or other components and subsystems to facilitate the input/output operations.
  • the I/O subsystem 720 may form a portion of a system-on-a-chip (SOC) and be incorporated, along with the processor 710 , the memory 730 , and other components of the computing device 700 , on a single integrated circuit chip.
  • the data storage device 740 may be embodied as any type of device or devices configured for short-term or long-term storage of data such as, for example, memory devices and circuits, memory cards, hard disk drives, solid state drives, or other data storage devices.
  • the data storage device 740 can store program code 740 A for generating an abstract representation of a distributed application and program code 740 B for executing the abstract representation, including implementing the abstraction engine 201 with a container orchestration engine 210 .
  • the communication subsystem 750 of the computing device 700 may be embodied as any network interface controller or other communication circuit, device, or collection thereof, capable of enabling communications between the computing device 700 and other remote devices over a network.
  • the communication subsystem 750 may be configured to use any one or more communication technology (e.g., wired or wireless communications) and associated protocols (e.g., Ethernet, InfiniBand®, Bluetooth®, Wi-Fi®, WiMAX, etc.) to effect such communication.
  • the computing device 700 may also include one or more peripheral devices 760 .
  • the peripheral devices 760 may include any number of additional input/output devices, interface devices, and/or other peripheral devices.
  • the peripheral devices 760 may include a display, touch screen, graphics circuitry, keyboard, mouse, speaker system, microphone, network interface, and/or other input/output devices, interface devices, and/or peripheral devices.
  • computing device 700 may also include other elements (not shown), as readily contemplated by one of skill in the art, and may omit certain elements.
  • various other sensors, input devices, and/or output devices can be included in computing device 700 , depending upon the particular implementation of the same, as readily understood by one of ordinary skill in the art.
  • various types of wireless and/or wired input and/or output devices can be used.
  • additional processors, controllers, memories, and so forth, in various configurations can also be utilized.
  • Embodiments described herein may be entirely hardware, entirely software or including both hardware and software elements.
  • the present invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
  • Embodiments may include a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
  • a computer-usable or computer readable medium may include any apparatus that stores, communicates, propagates, or transports the program for use by or in connection with the instruction execution system, apparatus, or device.
  • the medium can be a magnetic, optical, electronic, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium.
  • the medium may include a computer-readable storage medium such as a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk, etc.
  • Each computer program may be tangibly stored in a machine-readable storage media or device (e.g., program memory or magnetic disk) readable by a general or special purpose programmable computer, for configuring and controlling operation of a computer when the storage media or device is read by the computer to perform the procedures described herein.
  • the inventive system may also be considered to be embodied in a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
  • a data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus.
  • the memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code to reduce the number of times code is retrieved from bulk storage during execution.
  • I/O devices including but not limited to keyboards, displays, pointing devices, etc. may be coupled to the system either directly or through intervening I/O controllers.
  • Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks.
  • Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.
  • the term “hardware processor subsystem” or “hardware processor” can refer to a processor, memory, software or combinations thereof that cooperate to perform one or more specific tasks.
  • the hardware processor subsystem can include one or more data processing elements (e.g., logic circuits, processing circuits, instruction execution devices, etc.).
  • the one or more data processing elements can be included in a central processing unit, a graphics processing unit, and/or a separate processor- or computing element-based controller (e.g., logic gates, etc.).
  • the hardware processor subsystem can include one or more on-board memories (e.g., caches, dedicated memory arrays, read only memory, etc.).
  • the hardware processor subsystem can include one or more memories that can be on or off board or that can be dedicated for use by the hardware processor subsystem (e.g., ROM, RAM, basic input/output system (BIOS), etc.).
  • the hardware processor subsystem can include and execute one or more software elements.
  • the one or more software elements can include an operating system and/or one or more applications and/or specific code to achieve a specified result.
  • the hardware processor subsystem can include dedicated, specialized circuitry that performs one or more electronic processing functions to achieve a specified result.
  • Such circuitry can include one or more application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), and/or programmable logic arrays (PLAs).
  • Phrasing such as “at least one of A, B, and/or C” is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C).
  • This may be extended to as many items as are listed.

Abstract

Methods and systems for executing an application include extending a container orchestration system application programming interface (API) to handle objects that specify components of an application. An application representation is executed using the extended container orchestration system API, including the instantiation of one or more services that define a data stream path from a sensor to a device.

Description

    RELATED APPLICATION INFORMATION
  • This application claims priority to U.S. Patent Application No. 63/232,562, filed on Aug. 12, 2021, incorporated herein by reference in its entirety.
  • BACKGROUND Technical Field
  • The present invention relates to data management in distributed computing systems, and, more particularly, to developing complex services using data streams.
  • Description of the Related Art
  • Smart sensors collect information from a variety of sources, and the exponential growth in the number of such sensors has caused a similar growth in the number of data streams that need to be managed.
  • SUMMARY
  • A method for executing an application includes extending a container orchestration system application programming interface (API) to handle objects that specify components of an application. An application representation is executed using the extended container orchestration system API, including the instantiation of one or more services that define a data stream path from a sensor to a device.
  • A system for executing an application includes a hardware processor and a memory that stores a computer program. When executed by the hardware processor, the computer program causes the hardware processor to extend a container orchestration system application programming interface (API) to handle objects that specify components of an application and to execute an application representation using the extended container orchestration system API, including the instantiation of one or more services that define a data stream path from a sensor to a device.
  • These and other features and advantages will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.
  • BRIEF DESCRIPTION OF DRAWINGS
  • The disclosure will provide details in the following description of preferred embodiments with reference to the following figures wherein:
  • FIG. 1 is a block diagram of a distributed computing system, in accordance with an embodiment of the present invention;
  • FIG. 2 is a block diagram of a processing node in a distributed computing system, in accordance with an embodiment of the present invention;
  • FIG. 3A is a block diagram of a distributed computing application, in accordance with an embodiment of the present invention;
  • FIG. 3B is a block diagram of a distributed computing application, in accordance with an embodiment of the present invention;
  • FIG. 4 is a block diagram of a data stream representation of a distributed computing application, in accordance with an embodiment of the present invention;
  • FIG. 5 is a block/flow diagram of a method of executing a distributed computing application using an extension to a container orchestration engine, in accordance with an embodiment of the present invention;
  • FIG. 6 is a block diagram of a hardware processing system that executes a distributed computing application, in accordance with an embodiment of the present invention; and
  • FIG. 7 is a block diagram of an extended container orchestration engine that executes a distributed computing application, in accordance with an embodiment of the present invention.
  • DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
  • Building multi-sensor distributed applications is a complex programming task. Application developers may express the functionality of an application as a collection of interacting microservices that form a processing pipeline. The developers design and specify appropriate data communication systems between the distributed microservices, which has a significant impact on the performance of the application. The underlying hardware that runs the microservices plays a role as well, and databases may need to be managed to maintain the state of any microservices that need it.
  • Scaling multi-sensor stream processing applications, while ensuring reliable operation in the face of software and hardware failures, is particularly challenging as the number of data streams increases. Barriers to developer productivity may be reduced by providing a layer of abstraction that enables easy exchange, transformation, and management of data streams in complex, multi-sensor distributed stream processing applications. Applications may be designed by defining and registering abstract objects, such as drivers, sensors, streams, analytics units, actuators, and devices, which together can be used to specify the overall application pipeline.
  • During runtime, appropriate data communication mechanisms among the application’s objects may be automatically determined, for example including network connections and serialization and deserialization of data streams. Developers need only provide logic for different types of analytics processing and the system will automatically handle application-specific allocation, scheduling, and execution on the underlying distributed computing resources, as well as providing auto-scaling and operational reliability.
  • Referring now to FIG. 1, a diagram of a distributed computing system 100 is shown. A user 102 may execute a workload on the distributed computing system 100. To this end, the user 102 communicates with manager system 104. The user 102 supplies information regarding the workload, including the number and type of processing nodes 106 that will be needed to execute the workload.
  • The information provided to the manager system 104 includes, for example, a number of processing nodes 106, a processor type, an operating system, an execution environment, storage capacity, random access memory capacity, network bandwidth, and any other parameters that may be needed for the workload. The user 102 can furthermore provide images or containers to the manager system 104 for storage in a registry there.
  • The distributed computing system 100 may include many thousands of processing nodes 106, each of which can be idle or busy in accordance with the workloads being executed by the distributed computing system 100 at any given time. Although a single manager system 104 is shown, there may be multiple such manager systems 104, with multiple registries distributed across the distributed computing system 100.
  • Before and during execution of the workload, the manager system 104 determines which processing nodes 106 will implement the microservices that make up the corresponding application. The manager system 104 may configure the processing nodes 106, for example based on node and resource availability at the time of provisioning. The microservices may be hosted entirely on separate processing nodes 106, or any number of microservices may be collocated at a same processing node 106. The manager system 104 and the distributed computing system 100 can handle multiple different workloads from multiple different users 102, such that the availability of particular resources will depend on what is happening in the distributed computing system 100 generally.
  • Provisioning, as the term is used herein, refers to the process by which resources in a distributed computing system 100 are allocated to a user 102 and are prepared for execution. Thus, provisioning includes the determinations made by the manager system 104 as to which processing elements 106 will be used for the workload as well as the transmission of images and any configuration steps that are needed to prepare the processing nodes 106 for execution of the workload. The configuration may include, for example, identifying communications methods to be used by the microservices.
  • Referring now to FIG. 2, additional detail on a processing node 106 is shown. The processing node 106 includes a hardware processor 202, a memory 204, and a network interface 206. The network interface 206 may be configured to communicate with the manager system 104, with the user 102, and with other processing nodes 106 as needed, using any appropriate communications medium and protocol. The processing node 106 also includes one or more functional modules that may, in some embodiments, be implemented as software that is stored in the memory 204 and that may be executed by the hardware processor 202. In other embodiments, one or more of the functional modules may be implemented as one or more discrete hardware components in the form of, e.g., application-specific integrated circuits or field-programmable gate arrays.
  • The processing node 106 may include one or more containers 208. It is specifically contemplated that each container 208 represents a distinct operating environment. The containers 208 each include a set of software applications, configuration files, workload datasets, and any other information or software needed to execute a specific workload. These containers 208 may implement one or more microservices for a distributed application.
  • The containers 208 are stored in memory 204 and are instantiated and decommissioned by the container orchestration engine 210 as needed. It should be understood that, as a general matter, an operating system of the processing node 106 exists outside the containers 208. Thus, each container 208 interfaces with the same operating system kernel, reducing the overhead needed to execute multiple containers simultaneously. The containers 208 meanwhile may have no communication with one another outside of the determined methods of communication, reducing security concerns.
  • An abstraction engine 201 coordinates with the container orchestration engine 210 to handle configuration of the container(s) 208, for example providing configurations for communications mechanisms and the various objects of the distributed system in accordance with an application specification.
  • In some examples, the container orchestration engine 210 may be implemented as a KUBERNETES® system. The abstraction engine 201 may be implemented as operators within KUBERNETES® that define custom resources that the KUBERNETES® system can handle directly, so that a definition of the distributed application can call the extended API of the abstraction engine 201 to instantiate components of the representation.
  • Referring now to FIG. 3A, an exemplary distributed application is shown, including a set of interconnected microservices. In this example, a video analytics application can perform real-time monitoring of a video stream, which may include monitoring a given area to determine whether specific individuals have entered the area. The video analytics application can generate an alert or automated response to the detection of such an individual.
  • The application may include exemplary microservices such as video intake 304, face detection 306, face matching 308, alerts manager 310, and biometrics manager 312. A camera 302 generates visual data, such as a stream of images making up a video stream. Video intake 304 processes this visual data and performs any appropriate filtering or formatting to generate frames that may be considered by downstream microservices.
  • Face detection 306 identifies faces within the frames of the video stream. This identification may include labeling the frame to indicate the presence of a face within the image and may further include coordinates for a bounding box of the face within the image. Face matching 308 may then connect the face image with information about a person shown in the image. This matching may draw on information from biometrics manager 312, which may store profiles of people of interest. The profile may include biometric information, such as facial features that may be used to match with face images, as well as identifying information such as the person’s name and role.
  • In the case of a security system, a person’s role may include information regarding access authorization. For example, a person may be authorized to access a restricted area, or may be specifically forbidden from entering a restricted area. The alerts manager 310 may generate an alert responsive to the detection of a person by face matching 308. For example, an alert may indicate that an authorized person is present in the area, that a forbidden person is in the area, or that an unknown person is in the area.
  • A security system 312 may automatically respond to the alerts. The response may include a security response, such as automatically locking or unlocking a door or other access point, sounding a visual and/or audible alarm, summoning security personnel, and requesting further authentication from the detected person.
  • In a distributed computing system, multiple video streams can be processed at once. For example, multiple cameras 302 may generate respective video streams, and there may be respective microservices instances of video intake 304, face detection 306, and face matching 308.
  • The various microservices may be implemented as containers 208 within a processing node 106. In some cases, multiple microservices may be implemented on a single processing node 106, for example using different respective containers 208 or by implementing multiple microservices within a single container 208. In some cases, the microservices may be implemented using multiple different processing nodes 106, with communications between the containers 208 of the different processing nodes 106 being handled over an appropriate network.
  • Referring now to FIG. 3B, an exemplary distributed application is shown, including a set of interconnected microservices. In this application, multiple types of data are fused to control physical access through a gate 368. For example, a first camera 352 operates in the visual range of the electromagnetic spectrum, taking pictures of an environment. A second camera 354 operates in the infrared range of the electromagnetic spectrum, generating thermal images of the same environment.
  • The visual information is first processed to perform person detection 356 and face recognition 358. The person detection 356 takes frames of the video and identifies the locations of people within each frame, while face recognition 358 locates faces within each frame and compares them to stored faces in a database of registered faces 359.
  • This information is fused with thermal imaging information from the thermal camera 354 at temporal fusion 360, where images taken from the visual camera 352 and from the thermal camera 354 at roughly the same time are correlated with one another. Spatial fusion 362 identifies regions of the thermal images that correspond to regions where a person is detected in the visual images. Spatial fusion 362 accounts for the possibility that the two cameras may not be collocated, so that their images will show different respective views of the environment.
  • This fusion of data can be used to perform fever screening 364. Because the thermal information indicates the temperature of an object, a person’s body temperature can be accurately determined. For people with a higher than normal body temperature, fever screening 364 may indicate that the person has a fever.
  • The face recognition 358 and fever screening 364 may be used in tandem to perform gate control 366, where access to a controlled area is determined according to a set of security policies. For example, access may be limited to people whose faces match one of the registered faces 359. Access may further be barred to those individuals who show signs of a fever, for example to lessen the spread of disease. Responsive to these security policies, gate control 366 may operate the gate 368 to allow or deny access.
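The temporal fusion and fever screening steps described above can be sketched in a simplified form. The pairing rule (match each visual frame to the nearest thermal frame within a small time skew) and the fever threshold are illustrative assumptions, not details of the actual implementation:

```python
# Hypothetical sketch of temporal fusion 360 and fever screening 364.
# Function names, the max_skew window, and the 38.0 C threshold are
# assumptions made for illustration only.
from bisect import bisect_left

def temporal_fusion(visual_frames, thermal_frames, max_skew=0.1):
    """Pair each visual frame with the thermal frame closest in time.

    Frames are (timestamp, payload) tuples sorted by timestamp; pairs
    whose timestamps differ by more than max_skew seconds are dropped.
    """
    thermal_times = [t for t, _ in thermal_frames]
    pairs = []
    for ts, visual in visual_frames:
        i = bisect_left(thermal_times, ts)
        # Candidate neighbors: the thermal frames just before and just after ts.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(thermal_frames)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(thermal_times[k] - ts))
        if abs(thermal_times[j] - ts) <= max_skew:
            pairs.append((visual, thermal_frames[j][1]))
    return pairs

def fever_screening(body_temp_c, threshold_c=38.0):
    """Flag a fever when the measured body temperature meets the threshold."""
    return body_temp_c >= threshold_c
```

Spatial fusion 362 would then operate on each returned pair, mapping person regions from the visual frame into the corresponding thermal frame before the temperature is read.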
  • Referring now to FIG. 4, a representation of a distributed application is shown, which may be configured to process a data stream. A data stream may be a flow of homogeneous discrete messages. Some streams have only data that is produced by sensors 402, while others may include insights gained by analyzing and fusing data from the sensors 402. A given application may include multiple sensors 402 that each generate respective streams.
  • A sensor 402 is a device that produces raw data. Examples of sensors include cameras, location sensors (e.g., global positioning satellite sensors), environmental sensors such as temperature and pressure sensors, light detection and ranging (LIDAR) sensors, and radar sensors. Applications process and analyze raw data from sensors 402 to generate insights. Sensors 402 may have wired or wireless networking capability, or they may be physically attached to a computing device through an interface. Sensors 402 may be the beginning of a data stream and so represent the first stage of a distributed application. In the examples of FIGS. 3A and 3B, the cameras 302, 352, and 354 may each represent a sensor 402.
  • A driver 404 may generate a data stream from a sensor’s output. The driver 404 may perform any appropriate type of encoding or processing needed to generate the data stream, for example as video intake 304 may take the respective still images generated by the camera 302 and may encode them as a video bitstream.
  • Analytics unit 406 processes the data stream and/or generates an augmented data stream. One or more analytics units 406 may be used to perform different functions. An analytics unit may fuse multiple streams, for example accepting inputs from multiple sensors 402 and generating an output that is based on the multiple inputs, as shown in FIG. 3B.
  • For example, analytics unit 406 may include person detection 356 and face recognition 358. The analytics unit 406 subscribes to the data streams of the visual camera 352 and the thermal camera 354, processes the streams, and generates an output augmented stream that has, for example, indications about recognized faces and temperature information. The augmented stream may be used as inputs to other analytics units 406 or to actuators 408, which can control a device using the information in the input streams. In some cases, each individual function may be performed by a separate analytics unit 406. Distinct analytics units 406 may subscribe to a same input stream, making it a simple matter to reuse a stream.
  • In some cases, a database 407 may be used by the analytics unit 406 to help maintain a state of the application. For example, the biometrics manager 312 of the above example may include such a database to track biometric information of the various users.
  • In this example, the alerts manager 310 may perform the function of an actuator 408, generating instructions that may be used by a device 410. Such a device 410 may be physical or virtual and may be controlled using insights derived by the data analysis. Examples of devices 410 include entry/exit gates, displays, light arrays, and graphical user interfaces. Following the example above, the security system 312 may represent such a device 410.
  • The device(s) 410 may have networking capabilities or may be connected directly to computing devices. The device(s) 410 may be the ending point of a particular distributed application. Not all applications will necessarily have a corresponding device 410. Also, a given device 410 may represent both a sensor 402 and a device 410 if it generates data and can be controlled. For example, a camera 302 may produce a video stream, but may also allow remote management to configure camera parameters.
  • When programming such an application, the developer may provide logic for generating a stream from a sensor 402, for example using a driver 404, analyzing a stream or fusing multiple streams using an analytics unit 406, and/or controlling a device 410 using an actuator 408. This logic may be provided in the form of scripts that are interpreted by the abstraction engine 201 at run-time.
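The abstract objects above, and the data stream path they form from a sensor to a device, might be modeled along the following lines. The class, field, and stream names are hypothetical and only illustrate how output streams link components into a pipeline:

```python
# Hypothetical model of pipeline components linked by named streams.
# The names ("frames", "faces", etc.) echo the FIG. 3A example but are
# invented for illustration.
from dataclasses import dataclass, field

@dataclass
class Component:
    name: str
    inputs: list = field(default_factory=list)  # names of subscribed input streams
    output: str = ""                            # name of the emitted output stream

def stream_path(components, source_stream):
    """Trace the chain of components along a data stream path, following
    which component consumes each stream in turn (one consumer per stream,
    for simplicity)."""
    consumers = {s: c for c in components for s in c.inputs}
    path, stream = [], source_stream
    while stream in consumers:
        component = consumers[stream]
        path.append(component.name)
        stream = component.output
    return path

# A driver emits the "frames" stream; analytics units and an actuator follow.
pipeline = [
    Component("face-detection", inputs=["frames"], output="faces"),
    Component("face-matching", inputs=["faces"], output="matches"),
    Component("alerts-manager", inputs=["matches"], output=""),
]
```

Calling `stream_path(pipeline, "frames")` yields the ordered chain of components between the sensor's stream and the device, which is the kind of path the abstraction engine instantiates at runtime.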
  • When creating a distributed application, a programmer may use an application programming interface (API) to handle configuration, receive data, and publish data. APIs may be implemented using the idiom of a given programming language and may be simple and lightweight. Some exemplary APIs include:
    • get-configuration: To receive the configuration provided at the time of registration of a particular sensor or stream.
    • next: To receive the first available message from any of the input streams. This function returns the message and the name of the stream which produced the message. When there are multiple input streams, the stream’s name can be used to identify the source of the input message. Drivers may not be able to use this function as they do not have an input stream.
    • emit: To publish a message in the output stream. All messages from a particular driver or analytics unit may go into the same output stream.
    • A “datax” module may be implemented in PYTHON® with a “DataX” class, having the following methods:
    • get-configuration: Returns a dict that has the content of the configuration file provided by the developer. If no configuration file is provided when registering the sensor or the stream, an empty dict may be returned.
    • next: Returns a tuple that has the name of the stream (str) and the message as a dict, where the keys are str.
    • emit(message): Message may be of type dict and the keys of the dict may be str, while the values can be any plain PYTHON® object.
  • The DataX class may be used from applications written in different programming languages. For example, in GO®, the opaque DataX struct may be instantiated and its methods may be used. In C++, a static DataX library can be linked with the application to use its functions.
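As an illustration only, a minimal queue-backed stand-in for such a DataX class, together with an analytics-unit loop, might look like the following. The `feed` helper and the threshold configuration are invented for this example and are not part of the described API:

```python
# Illustrative stand-in for a DataX-style class. Only get_configuration,
# next, and emit correspond to the API described above; feed is a test
# hook standing in for messages arriving from input streams.
from collections import deque

class DataX:
    def __init__(self, configuration=None):
        self._configuration = configuration or {}
        self._inbox = deque()  # (stream_name, message) tuples
        self.outbox = []       # messages published via emit

    def get_configuration(self):
        """Return the configuration dict (empty if none was registered)."""
        return self._configuration

    def feed(self, stream_name, message):
        """Hypothetical helper: enqueue a message as if it arrived on a stream."""
        self._inbox.append((stream_name, message))

    def next(self):
        """Return (stream_name, message) for the first available input."""
        return self._inbox.popleft()

    def emit(self, message):
        """Publish a message on the single output stream."""
        self.outbox.append(message)

# An analytics-unit loop: read inputs, apply a configured threshold, emit.
dx = DataX(configuration={"threshold": 2})
dx.feed("camera-0", {"value": 3})
dx.feed("camera-1", {"value": 5})
threshold = dx.get_configuration()["threshold"]
for _ in range(2):
    stream, message = dx.next()
    # Tag each output with its source stream, as next() makes possible.
    dx.emit({"from": stream, "passed": message["value"] > threshold})
```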
  • A standalone utility may be used during application development and deployment. A developer may interact with a client machine using the utility to deploy a distributed application. Some functions that may be implemented include:
    • package and deploy drivers, analytics units, and actuators;
    • register sensors, streams, and gadgets, including details like name, configuration, connections, and resource requirements;
    • publish a stream for unmanaged sensors;
    • receive an existing stream;
    • retrieve logs for components;
    • list existing components;
    • remove existing components; and
    • execute a function locally on a client machine.
  • This framework can provide an abstraction that simplifies application development and that makes distributed stream processing applications easier to write. Data scientists, who may not be familiar with the details of the underlying container orchestration engine 210, are able to use these abstractions to create a distributed application.
  • The abstractions may be implemented on top of an underlying container orchestration engine 210 in multiple ways. A first option is to use the underlying container orchestration engine’s stock API server, where resources like Pods, StatefulSets, and ReplicaSets may be used to describe the application’s desired state. The underlying container orchestration engine 210 handles deployment of the application and ensures that the current state matches the developer’s declarative description of the desired state. However, the underlying container orchestration engine 210 may not understand the workload that is running when the first option is used. Another option is to extend the API server of the underlying container orchestration engine 210 (e.g., with abstraction engine 201) to add new functionality using operators, which may provide encapsulation of domain-specific knowledge of running a specific stream processing application. The container system API may be extended by defining operators, allowing the abstraction engine to take advantage of the underlying container orchestration engine’s tools.
  • Custom resources may be defined as extensions of the container system’s API. A resource may be an endpoint in the API that stores a collection of API objects of a certain kind. For example, the built-in Pods resource may include a collection of Pod objects. Once a custom resource is installed, its objects can be accessed through the underlying container orchestration engine 210 in the same manner as built-in resources. Components like drivers, analytics units, actuators, sensors, gadgets, streams, and databases may thereby be implemented in the underlying container orchestration engine 210 as custom resources.
  • A new operator may be defined as a software extension to the container orchestration engine 210 that makes use of custom resources to manage distributed applications and their components. The operator pattern provides the logic for monitoring and maintaining resources that it defines, which means that the operator can take actions based on the resource’s state.
  • For example, the operator may take actions needed to ensure that the distributed application is in a coherent state at all times. It also protects the system from users' actions that might bring the system into an unrecoverable incoherent state. For example, uninstalling a driver while a sensor is being used can bring the system into an incoherent state, but the operator can detect this and prevent it.
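The coherence rule described above can be sketched as follows. This is a simplified, in-memory model of the operator's bookkeeping, not an actual operator implementation; a real operator would enforce the same rule in its admission or reconciliation logic:

```python
class OperatorState:
    """Toy model of the coherence rule: refuse to uninstall a driver
    while a registered sensor still depends on it."""

    def __init__(self):
        self.drivers = set()
        self.sensors = {}  # sensor name -> driver name it depends on

    def install_driver(self, name):
        self.drivers.add(name)

    def register_sensor(self, sensor, driver):
        if driver not in self.drivers:
            raise ValueError(f"driver {driver!r} is not installed")
        self.sensors[sensor] = driver

    def uninstall_driver(self, name):
        in_use = [s for s, d in self.sensors.items() if d == name]
        if in_use:
            # Refuse the operation instead of entering an incoherent state.
            raise RuntimeError(f"driver {name!r} is in use by {in_use}")
        self.drivers.discard(name)
```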
  • To register a driver, an analytics unit, or an actuator, a user may provide the name, a script or image that includes the business logic, and optionally a configuration schema. When the user provides just a script, the operator may automatically create an image for executing the script. When the user requests an upgrade of drivers, analytics units, and/or actuators, the operator may automatically cascade the upgrade to running instances. The operator may ensure the coherency of the upgrade by enforcing that new configuration schemas are compatible with the schemas of the running instances. When performing an upgrade, the user may optionally provide a script to convert the configuration schemas, in which case the operator will accept the upgrade only if the script can be executed successfully for all running instances. If a user requests the deletion of a driver, analytics unit, or actuator, then the operator may check if the entity is currently in use and refuse the operation if there is a running instance for that entity.
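One plausible way to model this upgrade-acceptance logic is sketched below. The compatibility rule (a new schema may not require fields the old one did not) and the function names are assumptions chosen for illustration, not the patent's exact rule:

```python
def schema_compatible(old_schema, new_schema):
    # Assumed rule: every field the new schema requires must already have
    # been required by the old schema, so existing configurations still
    # validate after the upgrade.
    return set(new_schema.get("required", [])) <= set(old_schema.get("required", []))

def accept_upgrade(old_schema, new_schema, running_configs, convert=None):
    # Accept the upgrade if the schemas are compatible, or if a
    # user-supplied conversion script succeeds for every running instance.
    if schema_compatible(old_schema, new_schema):
        return True
    if convert is None:
        return False
    try:
        return all(convert(cfg) is not None for cfg in running_configs)
    except Exception:
        return False
```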
  • When registering a sensor, the operator ensures that the associated driver is installed and that the driver configuration schema provided by the user is compatible with the configuration schema expected by the installed driver. The operator may also maintain the driver’s running instance on appropriate computing resources as long as the sensor is registered. For example, if a sensor is physically attached to a computing node through a USB interface, then the operator may maintain a running instance on the same computing node. A registered sensor may generate an output stream that has the same name as the sensor.
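The placement behavior described for physically attached sensors might be sketched as follows; the `interface` and `node` fields are hypothetical names for illustration:

```python
def placement_for(sensor):
    # Assumed placement rule: a USB-attached sensor pins its driver
    # instance to the node the sensor is attached to; otherwise the
    # scheduler is free to choose any node.
    if sensor.get("interface") == "usb":
        return {"nodeName": sensor["node"]}
    return {}  # no constraint
```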
  • A user may request to create augmented streams by providing an analytics unit that generates the stream, the input streams, and a configuration for the analytics unit. The operator checks that the analytics unit is available, that the configuration is compatible, and that the input streams are registered. Unless the user requests a fixed number of instances, the operator may then automatically scale the number of instances of the analytics unit. The operator may perform similar operations when the user registers a new gadget. Before deleting any sensors or streams, the operator may ensure that they are not inputs that are used to produce other streams.
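The pre-creation checks described above might look like the following sketch; the request field names are illustrative assumptions:

```python
def validate_augmented_stream(request, installed_units, registered_streams):
    # Checks performed before creating an augmented stream: the analytics
    # unit must be available and every input stream must be registered.
    if request["analytics_unit"] not in installed_units:
        return False, "analytics unit not available"
    missing = [s for s in request["inputs"] if s not in registered_streams]
    if missing:
        return False, f"unregistered input streams: {missing}"
    return True, "ok"
```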
  • Communication between microservices may be handled using, e.g., a message bus or distributed queue, or by a communications protocol such as HTTP. A scalable message queue, such as the Neural Autonomic Transport System (NATS), can automatically set up communications mechanisms among entities like drivers, analytics units, and actuators. The operator manages the deployment and configuration of NATS, which then uses authentication and authorization so that only services associated with the abstraction engine 201 may connect to the NATS server, for example to subscribe and publish only on the defined and registered streams.
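A minimal in-memory stand-in for such a bus, enforcing the restriction that clients may publish and subscribe only on registered streams, might look like the sketch below. A real deployment would use a NATS server with authentication and authorization rather than this simplification:

```python
import collections

class StreamBus:
    """In-memory stand-in for a message bus such as NATS, enforcing the
    policy that only defined, registered streams may be used."""

    def __init__(self, registered_streams):
        self.registered = set(registered_streams)
        self.subscribers = collections.defaultdict(list)

    def subscribe(self, stream, callback):
        if stream not in self.registered:
            raise PermissionError(f"stream {stream!r} is not registered")
        self.subscribers[stream].append(callback)

    def publish(self, stream, message):
        if stream not in self.registered:
            raise PermissionError(f"stream {stream!r} is not registered")
        for cb in self.subscribers[stream]:
            cb(message)
```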
  • The operator that implements the abstraction engine 201 may further use a containerized application, called a sidecar, that the operator may run alongside each instance of a user-provided driver, analytics unit, or actuator. The sidecar may automatically manage data communications with a message bus, including connection, subscription, and publishing messages. The sidecar may further monitor the health of the user’s application, for example exposing metrics such as systems resource utilization and the number of messages received, dropped, and published. The operator and the underlying container orchestration engine 210 may use those metrics to ensure that all the components are working correctly, and also to drive the auto-scaling process.
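The sidecar's metric-tracking role can be sketched as follows. The metric names mirror those mentioned above (received, dropped, published); the rest of the structure is an assumed simplification of a sidecar that wraps the user's processing function:

```python
class Sidecar:
    """Sketch of a sidecar: it forwards messages to the user's processing
    function while tracking health metrics for the operator to consume."""

    def __init__(self, process_fn):
        self.process_fn = process_fn
        self.metrics = {"received": 0, "dropped": 0, "published": 0}
        self.outbox = []  # stands in for publishing to the message bus

    def on_message(self, message):
        self.metrics["received"] += 1
        try:
            result = self.process_fn(message)
        except Exception:
            # A failed message is counted as dropped, not re-raised.
            self.metrics["dropped"] += 1
            return
        self.outbox.append(result)
        self.metrics["published"] += 1
```

An operator could poll `metrics` to verify component health and to drive auto-scaling decisions.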
  • Referring now to FIG. 5 , a method for creating and deploying a distributed application is shown. Block 502 extends the container orchestration engine 210 to include the abstraction engine 201 as described above. Block 502 may use operators to add new functionality to the container orchestration engine 210, for example to define objects that can be used to represent a distributed application so that the container orchestration engine 210 can interact with those objects. Thus, a pre-existing container orchestration engine 210 can be extended to handle components like drivers, analytics units, sensors, devices, streams, and databases. In some examples, extending the container orchestration engine 210 may include using operators within a KUBERNETES® system.
  • In block 504, a software developer creates a representation of the distributed application, for example defining a data stream that begins at a sensor 402 and that ends at a device 410, with any appropriate drivers 404, analytics units 406, and actuators 408 in between. The abstract representation may include any appropriate configuration information for physical components, including designating the components to be used and how those components are to be initialized and shut down. The analytics units 406 may include instructions for any appropriate type of processing, accepting information of one type and outputting augmented information. The abstract representation may identify relationships between each component of the application, for example specifying that the output of a given component forms the input of a next component.
  • The abstract representation may be stored in a format that can be read by the abstraction engine 201, for example with a set of instructions for the extended API of the container orchestration engine 210. The abstract representation may then be distributed to the processing nodes 106.
  • Block 506 implements the distributed application using the abstraction engine 201 and the container orchestration engine 210. Because the abstraction engine 201 adds functionality to the API of the container orchestration engine 210, the container orchestration engine 210 can treat the objects of the abstract representation as native elements, instantiating whatever containers 208 are called for to implement specified microservices.
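The kind of abstract representation described in blocks 504 and 506 might be modeled, purely for illustration, as an ordered chain of components in which each component's output stream feeds the next; all names here are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    name: str
    kind: str  # e.g., "sensor", "driver", "analytics", "actuator", "device"

@dataclass
class AppRepresentation:
    """Abstract representation of a distributed application as an ordered
    chain of components; each component's output feeds the next one."""
    components: list = field(default_factory=list)

    def edges(self):
        # Relationships between components: (producer, consumer) pairs.
        return [(a.name, b.name) for a, b in zip(self.components, self.components[1:])]

pipeline = AppRepresentation([
    Component("cam-1", "sensor"),
    Component("cam-driver", "driver"),
    Component("person-counter", "analytics"),
    Component("door-actuator", "actuator"),
    Component("door", "device"),
])
```

Such a representation could be serialized into custom resource objects and submitted to the extended API, which would then instantiate the corresponding containers.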
  • Referring now to FIG. 6, additional detail on an implementation of a distributed application is shown. The container orchestration engine 210 includes an operator 602 that extends the default functionality of the engine 210. By interfacing with the operator 602, pods 604 may be instantiated (e.g., as one or more containers) to perform the roles of drivers 604-1, analytics units 604-2, and actuators 604-3. Each pod 604 may include a script 606, which may interact with other elements of a system, and a sidecar 608, which may communicate with other pods 604.
  • Thus, information from a sensor 612 may be processed by the script 606 of a driver pod 604-1. The sidecar 608 of the driver pod 604-1 may communicate with the sidecar 608 of an analytics unit pod 604-2 via a message bus 610 (or any other appropriate communications mechanism). The analytics unit pod 604-2 may take the information from the driver pod 604-1 as input and may generate an augmented data stream, which may be sent to actuator pod 604-3 via the message bus 610. The script 606 of the actuator pod 604-3 may then make a determination regarding an action to take, based on the augmented data stream, and may issue commands to a device 614.
  • Referring now to FIG. 7 , an exemplary computing device 700 is shown, in accordance with an embodiment of the present invention. The computing device 700 is configured to perform application abstraction and execution.
  • The computing device 700 may be embodied as any type of computation or computer device capable of performing the functions described herein, including, without limitation, a computer, a server, a rack-based server, a blade server, a workstation, a desktop computer, a laptop computer, a notebook computer, a tablet computer, a mobile computing device, a wearable computing device, a network appliance, a web appliance, a distributed computing system, a processor-based system, and/or a consumer electronic device. Additionally or alternatively, the computing device 700 may be embodied as one or more compute sleds, memory sleds, or other racks, sleds, computing chassis, or other components of a physically disaggregated computing device.
  • As shown in FIG. 7, the computing device 700 illustratively includes the processor 710, an input/output subsystem 720, a memory 730, a data storage device 740, a communication subsystem 750, and/or other components and devices commonly found in a server or similar computing device. The computing device 700 may include other or additional components, such as those commonly found in a server computer (e.g., various input/output devices), in other embodiments. Additionally, in some embodiments, one or more of the illustrative components may be incorporated in, or otherwise form a portion of, another component. For example, the memory 730, or portions thereof, may be incorporated in the processor 710 in some embodiments.
  • The processor 710 may be embodied as any type of processor capable of performing the functions described herein. The processor 710 may be embodied as a single processor, multiple processors, a Central Processing Unit(s) (CPU(s)), a Graphics Processing Unit(s) (GPU(s)), a single or multi-core processor(s), a digital signal processor(s), a microcontroller(s), or other processor(s) or processing/controlling circuit(s).
  • The memory 730 may be embodied as any type of volatile or non-volatile memory or data storage capable of performing the functions described herein. In operation, the memory 730 may store various data and software used during operation of the computing device 700, such as operating systems, applications, programs, libraries, and drivers. The memory 730 is communicatively coupled to the processor 710 via the I/O subsystem 720, which may be embodied as circuitry and/or components to facilitate input/output operations with the processor 710, the memory 730, and other components of the computing device 700. For example, the I/O subsystem 720 may be embodied as, or otherwise include, memory controller hubs, input/output control hubs, platform controller hubs, integrated control circuitry, firmware devices, communication links (e.g., point-to-point links, bus links, wires, cables, light guides, printed circuit board traces, etc.), and/or other components and subsystems to facilitate the input/output operations. In some embodiments, the I/O subsystem 720 may form a portion of a system-on-a-chip (SOC) and be incorporated, along with the processor 710, the memory 730, and other components of the computing device 700, on a single integrated circuit chip.
  • The data storage device 740 may be embodied as any type of device or devices configured for short-term or long-term storage of data such as, for example, memory devices and circuits, memory cards, hard disk drives, solid state drives, or other data storage devices. The data storage device 740 can store program code 740A for generating an abstract representation of a distributed application and program code 740B for executing the abstract representation, including implementing the abstraction engine 201 with a container orchestration engine 210. The communication subsystem 750 of the computing device 700 may be embodied as any network interface controller or other communication circuit, device, or collection thereof, capable of enabling communications between the computing device 700 and other remote devices over a network. The communication subsystem 750 may be configured to use any one or more communication technology (e.g., wired or wireless communications) and associated protocols (e.g., Ethernet, InfiniBand®, Bluetooth®, Wi-Fi®, WiMAX, etc.) to effect such communication.
  • As shown, the computing device 700 may also include one or more peripheral devices 760. The peripheral devices 760 may include any number of additional input/output devices, interface devices, and/or other peripheral devices. For example, in some embodiments, the peripheral devices 760 may include a display, touch screen, graphics circuitry, keyboard, mouse, speaker system, microphone, network interface, and/or other input/output devices, interface devices, and/or peripheral devices.
  • Of course, the computing device 700 may also include other elements (not shown), as readily contemplated by one of skill in the art, as well as omit certain elements. For example, various other sensors, input devices, and/or output devices can be included in computing device 700, depending upon the particular implementation of the same, as readily understood by one of ordinary skill in the art. For example, various types of wireless and/or wired input and/or output devices can be used. Moreover, additional processors, controllers, memories, and so forth, in various configurations can also be utilized. These and other variations of the computing device 700 are readily contemplated by one of ordinary skill in the art given the teachings of the present invention provided herein.
  • Embodiments described herein may be entirely hardware, entirely software or including both hardware and software elements. In a preferred embodiment, the present invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
  • Embodiments may include a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. A computer-usable or computer readable medium may include any apparatus that stores, communicates, propagates, or transports the program for use by or in connection with the instruction execution system, apparatus, or device. The medium can be magnetic, optical, electronic, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. The medium may include a computer-readable storage medium such as a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk, etc.
  • Each computer program may be tangibly stored in a machine-readable storage media or device (e.g., program memory or magnetic disk) readable by a general or special purpose programmable computer, for configuring and controlling operation of a computer when the storage media or device is read by the computer to perform the procedures described herein. The inventive system may also be considered to be embodied in a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
  • A data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code to reduce the number of times code is retrieved from bulk storage during execution. Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) may be coupled to the system either directly or through intervening I/O controllers.
  • Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modem and Ethernet cards are just a few of the currently available types of network adapters.
  • As employed herein, the term “hardware processor subsystem” or “hardware processor” can refer to a processor, memory, software or combinations thereof that cooperate to perform one or more specific tasks. In useful embodiments, the hardware processor subsystem can include one or more data processing elements (e.g., logic circuits, processing circuits, instruction execution devices, etc.). The one or more data processing elements can be included in a central processing unit, a graphics processing unit, and/or a separate processor- or computing element-based controller (e.g., logic gates, etc.). The hardware processor subsystem can include one or more on-board memories (e.g., caches, dedicated memory arrays, read only memory, etc.). In some embodiments, the hardware processor subsystem can include one or more memories that can be on or off board or that can be dedicated for use by the hardware processor subsystem (e.g., ROM, RAM, basic input/output system (BIOS), etc.).
  • In some embodiments, the hardware processor subsystem can include and execute one or more software elements. The one or more software elements can include an operating system and/or one or more applications and/or specific code to achieve a specified result.
  • In other embodiments, the hardware processor subsystem can include dedicated, specialized circuitry that performs one or more electronic processing functions to achieve a specified result. Such circuitry can include one or more application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), and/or programmable logic arrays (PLAs).
  • These and other variations of a hardware processor subsystem are also contemplated in accordance with embodiments of the present invention.
  • Reference in the specification to “one embodiment” or “an embodiment” of the present invention, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment”, as well any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment. However, it is to be appreciated that features of one or more embodiments can be combined given the teachings of the present invention provided herein.
  • It is to be appreciated that the use of any of the following "/", "and/or", and "at least one of", for example, in the cases of "A/B", "A and/or B", and "at least one of A and B", is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of "A, B, and/or C" and "at least one of A, B, and C", such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended for as many items listed.
  • The foregoing is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the Detailed Description, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. It is to be understood that the embodiments shown and described herein are only illustrative of the present invention and that those skilled in the art may implement various modifications without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention. Having thus described aspects of the invention, with the details and particularity required by the patent laws, what is claimed and desired protected by Letters Patent is set forth in the appended claims.

Claims (20)

What is claimed is:
1. A computer-implemented method for executing an application, comprising:
extending a container orchestration system application programming interface (API) to handle objects that specify components of an application; and
executing an application representation using the extended container orchestration system API, including the instantiation of one or more services that define a data stream path from a sensor to a device.
2. The computer-implemented method of claim 1, wherein the container orchestration system uses a KUBERNETES® container management system that handles execution of the application representation on one or more physical processing nodes.
3. The computer-implemented method of claim 2, wherein extending the container orchestration system API includes adding one or more functions to the container orchestration system API using a KUBERNETES® operator.
4. The computer-implemented method of claim 3, wherein executing the application representation includes executing functions of the extended container orchestration system API that are specified by the application representation.
5. The computer-implemented method of claim 1, wherein the container orchestration system instantiates the one or more services as pods that include a script to perform processing on a data stream and a sidecar that communicates with other pods in the application.
6. The computer-implemented method of claim 1, wherein the data stream path includes the sensor, a driver that defines how information from the sensor is handled, the device, and an actuator that defines how inputs are applied to the device.
7. The computer-implemented method of claim 6, wherein the data stream path further includes a data analytics unit that performs data processing on an input and that generates an augmented output.
8. The computer-implemented method of claim 7, wherein the data analytics unit fuses inputs from multiple sensors to generate the augmented output.
9. The computer-implemented method of claim 7, wherein executing the application representation includes instantiating a database that maintains state information for the data analytics unit.
10. The computer-implemented method of claim 1, wherein instantiation of the one or more services includes instantiating microservices as containers in one or more hardware processing nodes and establishing communications paths between microservices along the data stream path.
11. The computer-implemented method of claim 1, wherein the data stream path includes one or more reusable data streams that can be accessed by a service of a second application.
12. A system for executing an application, comprising:
a hardware processor;
a memory that stores a computer program, which, when executed by the hardware processor, causes the hardware processor to:
extend a container orchestration system application programming interface (API) to handle objects that specify components of an application; and
execute an application representation using the extended container orchestration system API, including the instantiation of one or more services that define a data stream path from a sensor to a device.
13. The system of claim 12, wherein the container orchestration system uses a KUBERNETES® container management system that handles execution of the application representation on one or more physical processing nodes.
14. The system of claim 13, wherein the hardware processor is further configured to add one or more functions to the container orchestration system API using a KUBERNETES® operator.
15. The system of claim 14, wherein the hardware processor is further configured to execute functions of the extended container orchestration system API that are specified by the application representation.
16. The system of claim 12, wherein the container orchestration system instantiates the one or more services as pods that include a script to perform processing on a data stream and a sidecar that communicates with other pods in the application.
17. The system of claim 12, wherein the data stream path includes the sensor, a driver that defines how information from the sensor is handled, the device, and an actuator that defines how inputs are applied to the device.
18. The system of claim 17, wherein the data stream path further includes a data analytics unit that performs data processing on an input and that generates an augmented output.
19. The system of claim 18, wherein the data analytics unit fuses inputs from multiple sensors to generate the augmented output.
20. The system of claim 18, wherein the hardware processor is further configured to instantiate a database that maintains state information for the data analytics unit.
US17/885,115 2021-08-12 2022-08-10 Data exchange and transformation in stream computing systems Pending US20230048581A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/885,115 US20230048581A1 (en) 2021-08-12 2022-08-10 Data exchange and transformation in stream computing systems

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163232562P 2021-08-12 2021-08-12
US17/885,115 US20230048581A1 (en) 2021-08-12 2022-08-10 Data exchange and transformation in stream computing systems

Publications (1)

Publication Number Publication Date
US20230048581A1 true US20230048581A1 (en) 2023-02-16

Family

ID=85176318


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190065323A1 (en) * 2017-08-25 2019-02-28 Vmware, Inc. Containerized application snapshots
US20210240540A1 (en) * 2020-02-03 2021-08-05 International Business Machines Corporation Serverless platform request routing
US20220156631A1 (en) * 2020-11-17 2022-05-19 International Business Machines Corporation Machine-learning model to predict probability of success of an operator in a paas cloud enviornment
US20220188149A1 (en) * 2020-12-15 2022-06-16 International Business Machines Corporation Distributed multi-environment stream computing
US20220283792A1 (en) * 2021-03-03 2022-09-08 Verizon Patent And Licensing Inc. Containerized network function deployment during runtime resource creation



Legal Events

Date Code Title Description
AS Assignment

Owner name: NEC LABORATORIES AMERICA, INC., NEW JERSEY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:COVIELLO, GIUSEPPE;RAO, KUNAL;SANKARADAS, MURUGAN;AND OTHERS;SIGNING DATES FROM 20220808 TO 20220809;REEL/FRAME:060772/0545

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER