
WO2025085441A1 - Systems and methods of machine learning based spatial layout optimization via 3d reconstructions - Google Patents

Systems and methods of machine learning based spatial layout optimization via 3d reconstructions

Info

Publication number
WO2025085441A1
WO2025085441A1 (PCT/US2024/051416)
Authority
WO
WIPO (PCT)
Prior art keywords
medical
data
reconstruction
processors
session
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/051416
Other languages
French (fr)
Inventor
Omid MOHARERI
Zhuohong HE
Muhammad Abdullah Jamal
Reza KHODAYI MEHR
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intuitive Surgical Operations Inc
Original Assignee
Intuitive Surgical Operations Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intuitive Surgical Operations Inc filed Critical Intuitive Surgical Operations Inc
Publication of WO2025085441A1


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/40 ICT specially adapted for therapies or health-improving plans relating to mechanical, radiation or invasive therapies, e.g. surgery, laser therapy, dialysis or acupuncture
    • G16H40/00 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
    • G16H40/20 ICT specially adapted for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms

Definitions

  • the technology can capture 3D sensor data from a medical environment (e.g., a medical station or an operating room (OR)) from phases of a workflow of a medical procedure.
  • the technology can use one or more machine learning models or functions to automatically detect the phase of the workflow, and reconstruct a 3D spatial layout and topology of the medical environment for each phase.
  • the technology can provide a 3D viewer that can present the phase-based 3D reconstruction of the medical environment.
  • the system can evaluate workflow issues related to equipment placement and topology. The viewer can present the 3D reconstruction alongside, and in time synchronization with, the surgical video.
  • the technology can evaluate the 3D reconstructions for each phase to trigger alerts or notifications, generate objective performance indicator (“OPI”) metrics, evaluate in-room traffic and motion, or create guidelines for an optimal medical environment layout.
  • the evaluations can be based on, for example, one or more of medical environment workflow efficiency metrics (e.g., time duration of each activity), sequence of activities, motion patterns, people and equipment tracking functions, or historical data from a large, multi-institutional dataset.
  • the technology can provide such alerts or notifications pre-operation (e.g., suggest optimal layout), post-operation (e.g., identify inefficiencies), or real-time during the operation (e.g., suggest a change in layout responsive to detecting or predicting an event, such as a collision, sterility breach, excessive motion or inefficient motion).
  • At least one aspect is directed to a system.
  • the system can include one or more processors, coupled with memory.
  • the one or more processors can receive sensor data associated with a medical session and captured by a set of sensors situated in a medical environment.
  • the one or more processors can generate a three dimensional (3D) reconstruction of the medical environment based on the sensor data.
  • the one or more processors can identify a phase of the medical session based on the 3D reconstruction and via one or more models.
  • the one or more processors can identify the phase using a separate model trained with machine learning (e.g., a spatio-temporal video model) to detect activities without using the 3D reconstruction.
  • the one or more processors can utilize both the activity or phase detection model and the 3D reconstruction model.
  • the one or more processors can determine a metric indicative of quality of a spatial layout of the medical environment during the phase via the one or more models (e.g., a performance model).
  • the one or more processors can determine an action based on the metric.
  • the one or more processors can perform the action.
  • the one or more processors can determine the metric based on a count of personnel detected in the medical environment, a pattern of motion of the personnel or objects, a size of the medical environment, time elapsed in the phase or the medical session, occurrence of one or more adverse events, or the phase of the medical session.
  • the one or more processors can compare the metric with a threshold established for the phase of the medical session.
  • the one or more processors can determine the action based on the comparison of the metric with the threshold.
  • the one or more processors can determine the threshold based on metadata of the medical environment, the metadata indicating at least one of a type of medical environment, a geographic location of the medical environment, or an age of the medical environment.
  • the phase of the medical session can occur prior to a medical procedure in the medical session.
  • the one or more processors can determine, during the phase, that the spatial layout of the medical environment in the 3D reconstruction varies from a baseline layout of medical environments established based on historical data associated with medical procedures similar to the medical procedure.
  • the one or more processors can provide the action comprising an instruction to adjust the spatial layout to conform with the baseline layout prior to performance of the medical procedure in the medical session.
  • the one or more processors can determine the action based on the metric, and input the metric into an action model trained with machine learning on historical data to determine the action to perform.
  • the action can include to adjust at least one of a height, orientation, or location of an object located in the medical environment.
  • the action can include to automatically control an object located in the medical environment.
  • the action can include to provide at least one of an indication of the metric, an alert, or a guideline for an optimal spatial layout.
  • the one or more processors can predict an adverse event in the medical environment during the medical session.
  • the one or more processors can predict the adverse event based on at least one of the pattern of motion in the 3D reconstruction, the phase of the medical session, or user input received during the medical session.
  • the adverse event can include at least one of a collision, a sterility breach, or an inefficient motion.
  • the one or more processors can, to perform the action, provide an indication of the prediction of the adverse event prior to occurrence of the adverse event.
  • the one or more processors can, to perform the action, provide, prior to occurrence of the adverse event and responsive to the prediction, guidance to mitigate a likelihood of occurrence of the adverse event.
  • the one or more processors can, to perform the action, disable, prior to occurrence of the adverse event and responsive to the prediction, a function of a robotic medical system used to perform a medical procedure within the medical session.
  • the one or more processors can disable the function for at least one of a predetermined time interval or until detection of a predetermined event.
  • the one or more processors can determine, subsequent to performance of the action, a second metric indicative of quality of the spatial layout of the medical environment in the phase of the medical session.
  • the one or more processors can determine, based at least in part on the second metric, an improvement in the quality of the spatial layout of the medical environment.
  • the sensor data can include 3D point cloud data received from the set of sensors.
  • the one or more processors can register, via a function, the 3D point cloud data from the set of sensors to a common coordinate system to generate the 3D reconstruction.
  • the sensor data can include intensity data and depth data.
  • the sensor data can be multimodal sensor data, such as RGB, depth, intensity, thermal camera, WiFi-signal data, or data from a robotic medical system.
  • the one or more processors can determine the pattern of motion as corresponding to movement of personnel in the medical environment during the medical session.
  • the one or more processors can overlay, on a graphical user interface, the pattern of motion on the 3D reconstruction.
  • the one or more processors can establish, within the 3D reconstruction, a digital twin of an object associated with performance of the medical session in the medical environment.
  • the one or more processors can overlay, on the graphical user interface with the digital twin, a heatmap corresponding to the pattern of motion in the medical environment during the medical session.
  • the one or more processors can train the reconstruction model with machine learning using training data comprising at least one of labeled sets of 3D reconstructions, historical 3D point cloud data obtained from one or more medical environments, computer aided drawings, blueprints, multi-modal sensor data, or video stream data obtained from a robotic medical system configured to perform at least a portion of the medical session.
  • the one or more processors can train the performance model with machine learning using training data comprising at least one of historical metrics indicative of quality of medical environments, outcomes of medical sessions, or 3D reconstructions.
  • the historical metrics can include, for example, quality of the medical environment (e.g., safety, efficiency, or performance), outcomes of medical sessions, or cost efficiency (e.g., the amount of space the medical equipment or environment occupies in a facility or hospital as compared to the amount of space that is available or the amount of space that is wasted).
  • the one or more processors can train the action model with machine learning using training data comprising at least one of simulated actions performed in 3D reconstructions or historical event logs of medical sessions performed via one or more robotic medical systems.
  • the training data can include data or knowledge provided or established by experts that indicates or describes an appropriate action configured to resolve or mitigate an issue related to a particular spatial layout.
  • An aspect can be directed to a method.
  • the method can be performed by one or more processors coupled with memory.
  • the method can include the one or more processors receiving sensor data associated with a medical session and captured by a set of sensors situated in a medical environment.
  • the method can include the one or more processors generating a three dimensional (3D) reconstruction of the medical environment based on the sensor data.
  • the method can include the one or more processors identifying a phase of the medical session based on the 3D reconstruction and via one or more models (e.g., a multimodal model or a reconstruction model).
  • the method can include the one or more processors determining a metric indicative of quality of a spatial layout of the medical environment during the phase via the one or more models (e.g., the multi-modal model or a performance model).
  • the method can include the one or more processors determining an action based on the metric.
  • the method can include the one or more processors performing the action.
  • An aspect can be directed to a non-transitory computer-readable medium storing processor-executable instructions that, when executed by one or more processors, cause the one or more processors to receive sensor data associated with a medical session and captured by a set of sensors situated in a medical environment.
  • the instructions can cause the one or more processors to generate a three dimensional (3D) reconstruction of the medical environment based on the sensor data.
  • the instructions can cause the one or more processors to identify a phase of the medical session based on the 3D reconstruction and via one or more models.
  • the instructions can cause the one or more processors to determine a metric indicative of quality of a spatial layout of the medical environment during the phase via the one or more models.
  • the instructions can cause the one or more processors to determine an action based on the metric.
  • the instructions can cause the one or more processors to perform the action.
  • FIG. 1 depicts an example system of machine learning-based layout optimization via 3D reconstructions.
  • FIG. 2 depicts an example graphical user interface for presenting a 3D reconstruction.
  • FIG. 3 depicts an example process of machine learning-based spatial layout optimization via 3D reconstructions.
  • FIG. 4 depicts an example process of machine learning-based spatial layout optimization via 3D reconstructions.
  • FIG. 5A depicts an example medical environment.
  • FIG. 5B depicts an example graphical user interface depicting a layout of a medical environment overlaid with patterns of motion.
  • FIG. 6 is a block diagram depicting an architecture for a computer system that can be employed to implement elements of the systems and methods described and illustrated herein, including aspects of the system depicted in FIG. 1, the user interface depicted in FIG. 2 and FIG. 5B, and the methods or processes depicted in FIGS. 3 and 4.
  • While the present disclosure is discussed in the context of a surgical procedure, in some embodiments the present disclosure can be applicable to other medical sessions, environments, or activities, as well as non-medical activities where removal of irrelevant information is desired.
  • Technical solutions described herein are generally directed to machine learning based spatial layout optimization via 3-dimensional (3D) reconstructions.
  • the technology can capture 3D sensor data of a medical environment (e.g., operating room (OR)) from phases of a medical session, which can include a medical procedure or a workflow.
  • the technology can use one or more machine learning models or functions to automatically detect the phase of the medical session, and reconstruct a 3D room layout and topology of the medical environment for each phase.
  • the technology can reconstruct the 3D room layout using 3D point cloud data from multiple sensors in order to create a realistic and human viewable 3D reconstruction or model with the relevant or desired medical environment equipment in a representative location in the 3D reconstruction.
  • the technology can invoke a registration function configured to convert, translate or port the 3D data into a single coordinate system.
  • the technology can provide a 3D viewer that can present the phase-based 3D reconstruction of the medical environment.
  • the system can evaluate workflow issues related to equipment placement and topology.
  • the viewer can present the 3D reconstruction alongside, and in time synchronization with, the surgical video.
  • the 3D reconstructions can include an indication or representation of the equipment in the medical environment, including, for example, a digital twin of the equipment.
  • the digital twin can be obtained from a data repository or other storage, and be preconfigured with characteristics, settings or constraints that can correspond to the equipment, such as dimensions, orientation, movements, or adjustability.
  • the technology can evaluate (e.g., based on motion patterns, people and equipment tracking algorithms, or historical data from a large, multi-institutional dataset) the 3D reconstructions for each phase to trigger alerts or notifications, generate objective performance indicator (“OPI”) metrics, evaluate in-room traffic and motion, or create guidelines for an optimal spatial layout.
  • the technology can provide such alerts or notifications pre-op (e.g., suggest optimal layout), post-op (e.g., identify inefficiencies), or real-time during the operation (e.g., suggest a change in layout responsive to detecting or predicting an event, such as a collision, sterility breach, or excessive/inefficient motion).
  • the technical solutions of this disclosure can provide a 3D spatial layout that can be linked to specific phases of a medical session.
  • the phases of the medical session can include room preparation, robot setup, a surgical or medical procedure, turn over, or cleaning, for example.
  • the technology can associate the 3D room layout with positive or negative outliers in data associated with an entity, organization, site, medical environment, OR, or other category. To do so, the technology can evaluate or analyze non-operative metrics from datasets of one or more entities to create a distribution and identify outliers and trends in the distribution.
  • a spatial layout associated with positive and negative outliers can be presented or provided via an application in order to facilitate making adjustments or modifications to the spatial layout.
  • this technology can facilitate establishing or maintaining a standard in spatial layout, and the discovery of optimal or highly performing layouts for an entity, organization, or other institution, while addressing inefficiencies or issues in the layouts that are associated with outliers.
  • the technology can overlay traffic patterns over one another to illustrate potential adverse events, such as collisions between personnel or objects.
  • the technology can provide a heatmap or other graphical user element that indicates an intensity of movement along a particular path.
  • the heatmap can indicate a count of personnel traversing a particular path through the medical environment, an amount of time personnel spend traversing a particular path through the medical environment, an average speed or rate of movement along a path, a size of objects or equipment that traverse the path, or types of objects that traverse a path (e.g., personnel or equipment).
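
A minimal sketch of such a heatmap accumulation follows, assuming tracked floor-plane positions in meters; the function name, room dimensions, and cell size are illustrative placeholders rather than details from the disclosure.

    import numpy as np

    def motion_heatmap(positions_xy, room_size_m=(8.0, 6.0), cell_m=0.25):
        """Count how often tracked personnel or equipment occupy each floor cell."""
        nx = int(room_size_m[0] / cell_m)
        ny = int(room_size_m[1] / cell_m)
        heat = np.zeros((ny, nx))
        for x, y in positions_xy:  # positions in meters, room origin at (0, 0)
            i = min(int(y / cell_m), ny - 1)
            j = min(int(x / cell_m), nx - 1)
            heat[i, j] += 1
        return heat / max(heat.max(), 1)  # normalize to [0, 1] for overlay

    # Example: positions sampled from a tracker during one phase
    overlay = motion_heatmap([(1.0, 2.0), (1.1, 2.1), (1.2, 2.2), (4.0, 3.0)])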
  • the technology can identify or predict issues related to collision of equipment, personnel, or sterility breach issues that may be a result of the layout of the medical environment.
  • the technology can leverage historical data from a large multi-institutional data set to extract or generate spatial layouts that are associated with efficient medical or surgical workflows for each type of procedure.
  • the technology can facilitate adjustments to the layout or provide guidance prior to an operation or medical procedure (e.g., pre-op), subsequent to or after a medical procedure (e.g., post-op), or real-time guidance during a medical session.
  • Real-time guidance can include, for example, a notification, alert, or action to change a layout in response to detecting or predicting an adverse event (e.g., collision or sterility breach), or detecting inefficiencies (e.g., excessive motion), which may indicate a deviation from an optimal layout.
  • FIG. 1 depicts an example system 100 for machine learning-based spatial layout optimization using 3D reconstructions, in accordance with implementations.
  • the system 100 can be associated with a medical environment 105.
  • the medical environment 105 can be a medical surgical environment.
  • a medical surgical environment can include a surgical facility such as an operating room in which a surgical procedure, whether invasive, non-invasive, inpatient, or outpatient, can be performed on a patient.
  • the system 100 can be associated with other types of medical sessions or activities, or non-medical environments that can require removal of non-surgical information from a data stream captured from that environment.
  • the system 100 can include one or more data capture devices 110. Data capture devices 110 can collect images, kinematics data, or system events, for example.
  • the data capture device can include an image capture device designed, constructed, and operational to capture images from a particular viewpoint within the medical environment 105.
  • the data capture devices 110 can be positioned, mounted, or otherwise located to capture content from any viewpoint that facilitates the data processing system recognizing phases of a procedure.
  • a first data capture device can be positioned to capture one or more images of an area where a patient is located within the medical environment 105.
  • a second data capture device 110 can be positioned to capture one or more images of an area where one or more medical professionals are located within the medical environment 105.
  • a third data capture device 110 can be configured to capture one or more images of other designated areas within the medical environment 105.
  • the data capture devices 110 can include any of a variety of sensors, cameras, video imaging devices, infrared imaging devices, visible light imaging devices, intensity imaging devices (e.g., black, color, grayscale imaging devices, etc.), depth imaging devices (e.g., stereoscopic imaging devices, time-of-flight imaging devices, etc.), medical imaging devices such as endoscopic imaging devices, ultrasound imaging devices, etc., non-visible light imaging devices, any combination or sub-combination of the above-mentioned imaging devices, or any other type of imaging devices that can be suitable for the purposes described herein.
  • the medical environment can include multiple data capture devices 110 (e.g., sensors) which can be referred to as a set of sensors.
  • the data capture devices 110 can be referred to or include 3D sensors, such as time-of-flight sensors.
  • a time-of-flight (ToF) sensor can refer to or include a type of sensor that can be used to measure the distance between a sensor and an object based on the amount of time it takes for a light or sound signal to travel to the object and back to the sensor.
  • ToF sensors can include light detection and ranging (lidar) sensors, which can use laser light to measure the time it takes for a laser pulse to bounce off an object and return to the sensor.
  • the distance can be determined based on the time and the speed of the laser pulse (e.g., the speed of light).
  • the sensors can include ToF cameras or depth cameras. ToF cameras can use infrared light sources, such as light-emitting diodes (LEDs), to emit a pulse of light and then measure the time it takes for the light to reflect off objects and return to the sensor.
  • the 3D sensor can include a dot projector, such as a device that emits a pattern of dots or points of light onto a surface or object.
  • the 3D sensors can be used to create 3D depth maps or point clouds.
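
As a concrete illustration of the time-of-flight relation described above (not code from the disclosure), the distance is half of the round-trip time multiplied by the propagation speed:

    SPEED_OF_LIGHT_M_S = 299_792_458.0

    def tof_distance_m(round_trip_s: float) -> float:
        # The pulse travels to the object and back, so halve the round trip.
        return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0

    print(tof_distance_m(20e-9))  # a 20 ns round trip is roughly 3.0 m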
  • the images that are captured by the data capture device 110 can include still images, video images, vector images, bitmap images, other types of images, or combinations thereof.
  • one or more of the data capture devices 110 can be configured to capture other parameters (e.g., sound, motion, pressure, or temperature) within the medical environment 105.
  • the data capture devices 110 can capture the images at any suitable predetermined capture rate or frequency. Other settings, such as zoom settings or resolution, of each of the data capture devices 110 can vary as desired to capture suitable images from a particular viewpoint.
  • the data capture devices 110 can have fixed viewpoints, locations, positions, or orientations.
  • the data capture devices 110 can be portable, or otherwise configured to change orientation or telescope in various directions.
  • the data capture devices 110 can be part of a multi-sensor architecture including multiple sensors, with each sensor being configured to detect, measure, or otherwise capture a particular parameter (e.g., sound, images, or pressure).
  • the images captured by the data capture devices 110 can be sent as a data stream component to a visualization tool 170 or the data processing system 130.
  • a data stream component can be considered any sequence of digital encoded data or analog data from a data source such as the data capture devices 110.
  • the visualization tool 170 can be configured to receive a plurality of data stream components and combine the plurality of data stream components into a single data stream.
  • the visualization tool 170 can receive a data stream component from a medical tool 120.
  • the medical tool 120 can be any type and form of tool used for surgery, medical procedures or a tool in an operating room or environment associated with or having an image capture device.
  • the medical tool 120 can be an endoscope for visualizing organs or tissues, for example, within a body of the patient.
  • the medical tool 120 can include other or additional types of therapeutic or diagnostic medical imaging implements.
  • the medical tool 120 can be configured to be installed in a robotic medical system 125.
  • the robotic medical system 125 can be a computer-assisted system configured to perform a surgical or medical procedure or activity on a patient via or using or with the assistance of one or more robotic components or medical tools.
  • the robotic medical system 125 can include one or more manipulator arms that perform one or more computer-assisted medical tasks.
  • the medical tool 120 can be installed on a manipulator arm of the robotic medical system 125 to perform a surgical task.
  • the images (e.g., video images) captured by the medical tool 120 can be sent to the visualization tool 170.
  • the robotic medical system 125 can include one or more input ports to receive direct or indirect connection of one or more auxiliary devices.
  • the visualization tool 170 can be connected to the robotic medical system 125 to receive the images from the medical tool when the medical tool is installed in the robotic medical system (e.g., on a manipulator arm of the robotic medical system).
  • the visualization tool 170 can combine the data stream components from the data capture devices 110 and the medical tool 120 into a single combined data stream for presenting on a display 172 (e.g., display 630 depicted in FIG. 6).
  • the display 172 can be associated with a client device 174, user control system or other type of display system, whether within the medical environment 105 or remote, to view the single combined data stream.
  • the client device 174 can refer to or include a laptop computer, desktop computer, tablet, smartphone, portable computing device, or wearable device, for example.
  • the system 100 can include a data processing system 130.
  • the data processing system 130 can be associated with the medical environment 105, or cloud-based.
  • the data processing system 130 can include an interface 132 designed, constructed and operational to communicate with one or more component of system 100 via network 101, including, for example, the robotic medical system 125 or client device 174.
  • the data processing system 130 can include a data collector 134 to capture or otherwise receive or obtain data from one or more component or system associated with the medical environment 105 or via the network 101, including, for example, one or more data capture devices 110 or a set of sensors.
  • the data processing system 130 can include a 3D reconstructor 136 to generate a 3D reconstruction of the spatial layout of the medical environment 105.
  • the data processing system 130 can include a phase detector 138 to identify or determine a phase associated with the medical session.
  • the data processing system 130 can include a tracker 140 to identify a pattern of motion or otherwise track the movement of personnel or objects, based on a function 142, in the medical environment 105.
  • the data processing system 130 can include a model generator 144 that can maintain, manage, operate, or otherwise provide one or more models for utilization by the data processing system 130.
  • the data processing system 130 can include a performance controller 146 to determine OPIs associated with the quality, performance, or efficiency of a spatial layout of a medical environment 105 in which a medical session occurs.
  • the performance controller 146 can include or utilize an event predictor 148 to identify or predict adverse events, such as collisions or sterility breaches, in the medical environment during the medical session.
  • the performance controller 146 can, based on the OPI or predicted adverse event, provide an alert, warning, or notification that can improve the efficiency of the layout of the medical environment 105 or reduce the likelihood of, or prevent, the occurrence of the adverse event.
  • the performance controller 146 can automatically adjust a component, medical equipment, or object in the medical environment 105 to improve the quality, performance, or efficiency of the spatial layout in the medical environment 105.
  • the interface 132, data collector 134, 3D reconstructor 136, phase detector 138, tracker 140, model generator 144, performance controller 146, or event predictor 148 can each communicate with the data repository 150 or database.
  • the data processing system 130 can include or otherwise access the data repository 150.
  • the data repository 150 can include one or more data files, data structures, arrays, values, or other information that facilitates operation of the data processing system 130.
  • the data repository 150 can include one or more local or distributed databases, and can include a database management system.
  • the data repository 150 can include, maintain, or manage a data stream 152.
  • the data stream 152 can be received by the data collector 134 and stored in the data repository 150.
  • the data stream 152 can include or be formed from one or more of a video stream, event stream, or kinematics stream.
  • the data stream 152 can include data collected by one or more data capture devices 110, such as a set of 3D sensors from a variety of angles or vantage points with respect to the procedure activity (e.g., point or area of surgery).
  • the event stream can include a stream of event data or information, such as packets, that identify or convey a state of the robotic medical system 125 or an event that occurred in association with the robotic medical system 125 or a surgical or medical procedure being performed with the robotic medical system.
  • Data of the event stream can be captured by the robotic medical system 125 or a data capture device 110.
  • An example state of the robotic medical system 125 can indicate whether the medical tool 120 is installed on a manipulator arm of the robotic medical system or not, whether it was calibrated, or whether it was fully functional (e.g., without errors) during the procedure.
  • a signal or data packet(s) can be generated indicating that the medical tool has been installed on the manipulator arm of the robotic medical system 125.
  • the signal or data packet(s) can be sent to the data collector 134 as the event stream.
  • Another example state of the robotic medical system 125 can indicate whether the visualization tool 170 is connected, whether directly to the robotic medical system or indirectly through another auxiliary system that is connected to the robotic medical system.
  • Kinematics stream data can refer to or include data associated with one or more of the manipulator arms or medical tools 120 attached to manipulator arms, which can be captured or detected by one or more displacement transducers, orientational sensors, positional sensors, or other types of sensors and devices to measure parameters or generate kinematics information.
  • the kinematics data can include sensor data along with time stamps and an indication of the medical tool 120 or type of medical tool 120 associated with the sensor data.
  • the data repository 150 can include, store, maintain, or otherwise manage a 3D reconstruction 154.
  • the 3D reconstruction 154 can be generated by the 3D reconstructor 136.
  • the 3D reconstruction 154 can include a 3D layout of the medical environment 105 (e.g., operating room) in which the medical session takes place (e.g., where the medical procedure is performed or occurs).
  • the 3D reconstruction 154 can be stored in various file formats, including, for example, a glTF/GLB file, an .OBJ file (e.g., Wavefront object), a .USDZ/.USD file, a .PLY file (e.g., polygon file format), or another file format.
  • the 3D reconstruction 154 can include information about a geometry (e.g., shape of the medical environment), appearance (e.g., color), scene (e.g., position of objects in the medical environment), or animations (e.g., movement of 3D objects in the medical environment).
  • the file format can include a text format, comma separated files, data structures, or other formats.
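
As an illustrative sketch only (the disclosure does not prescribe a library), a fused reconstruction could be persisted in one of these formats, here PLY via the Open3D package:

    import numpy as np
    import open3d as o3d

    points = np.random.rand(1000, 3)  # stand-in for fused 3D sensor data
    colors = np.random.rand(1000, 3)  # per-point appearance (RGB)

    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    pcd.colors = o3d.utility.Vector3dVector(colors)
    o3d.io.write_point_cloud("reconstruction.ply", pcd)  # polygon file format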
  • the data repository 150 can include, store, manage, or otherwise maintain one or more models 156.
  • the models 156 can refer to or include machine learning models.
  • the models 156 can be trained, established, configured, updated, or otherwise provided by the model generator 144.
  • the models 156 can be configured to identify, predict, classify, categorize, or otherwise score aspects of a medical procedure, medical environment 105, or operating room to facilitate machine learning-based layout optimization via 3D reconstructions.
  • the data processing system 130 can utilize a single, multi-modal machine learning model, or multiple machine learning models.
  • the multiple machine learning models can each be multimodal, for example.
  • the models 156 can include a reconstruction model designed, constructed and operational to generate a 3D reconstruction of a medical environment based on sensor data.
  • the models 156 can include a performance model designed, constructed and operational to determine a metric or a value for a metric that indicates the quality of a spatial layout based on the 3D reconstruction of the medical environment.
  • the performance model can predict or detect adverse events in the medical environment.
  • the models 156 can include an action model designed, constructed and operational to identify, determine or select an action to perform in the medical environment based on the metric, value of the metric, quality of the medical environment, or other output from the performance model.
  • the action model can determine an action to perform that can improve the quality of the spatial layout of the medical environment, or mitigates the likelihood of a predicted adverse event occurring in the medical environment.
  • the models 156 can include a phase model designed, constructed and operational to determine, detect, identify, or predict a phase in a medical session based on input data (e.g., sensor data, multi-modal sensor data, the 3D reconstruction, event stream, or kinematics data).
  • the models 156 can include a tracker model designed, constructed and operational to determine a pattern of motion during a medical session that occurs in a medical environment 105.
  • the data repository 150 can include, store, manage, or otherwise maintain a threshold 158.
  • the threshold 158 can refer to or include a numerical value that can be used to determine whether a metric indicative of performance of a layout of an operating room generated by the performance controller 146 is satisfactory.
  • a metric for efficiency can be a length of a path between two objects in the layout of the operating room, or the amount of time taken to traverse between the two objects.
  • the value for the metric for a medical procedure can be compared with the threshold for the metric to determine whether the value for the metric exceeds the threshold for the metric, in which case the data processing system 130 can determine that the layout is inefficient and perform an action to cause a change in the layout.
  • the threshold data structure 158 can include or refer to a map of thresholds.
  • the thresholds can map to one or more attributes, factors, or categories associated with a medical environment 105 or medical procedure.
  • the threshold can map to an institution in which the medical procedure is to be performed, a type of medical procedure, or a phase in the medical procedure.
  • the data processing system 130 can select the corresponding threshold via a lookup in the threshold data structure 158 using the detected phase.
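
A minimal sketch of that phase-keyed lookup and comparison follows; the phase names, metric name, and threshold values are invented placeholders:

    THRESHOLDS = {
        ("room_preparation", "traversal_time_s"): 30.0,
        ("robot_setup", "traversal_time_s"): 45.0,
        ("procedure", "traversal_time_s"): 20.0,
    }

    def layout_action(phase: str, metric: str, value: float):
        threshold = THRESHOLDS.get((phase, metric))
        if threshold is not None and value > threshold:
            return f"alert: {metric} exceeds {threshold} during {phase}"
        return None  # layout satisfactory for this metric in this phase

    print(layout_action("procedure", "traversal_time_s", 27.5))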
  • the data repository 150 can include, store, manage, or otherwise maintain phases 160 data.
  • the phases 160 can refer to or include operative phases or non-operative phases (e.g., operating room related phases), such as room preparation, robot setup, performance of a medical procedure, turn over, or cleaning, for example.
  • phases 160 can refer to or include operative phases, such as exposure, dissection, transection, reconstruction, and extraction.
  • Exposure can refer to or include the process of visualizing and accessing a surgical site by creating a clear and adequate field of view.
  • dissection can refer to or include cutting, separating and removing tissues or anatomical structures to gain access to specific areas, identify structures, or perform surgical procedures.
  • Transection can refer to or include severing or cutting a structure, such as a blood vessel, nerve, or organ using a surgical instrument. Extraction can refer to or include the removal of a tissue, organ, foreign object, or other anatomical structure from the body. Reconstruction can refer to or include the process of restoring or rebuilding a damaged or missing tissue, organ, or body part, and can include techniques or tasks such as grafting, suturing, or using prosthetic materials to recreate the structure and restore form and function.
  • the data repository 150 can include, manage or maintain historical data 162. Historical data 162 can include prior video stream, event stream, or kinematic stream data.
  • Historical data 162 can include data associated with a location of the medical environment 105, workflows, facilities, layouts in a current institution or other institutions, room layout performance metrics, or other information that can facilitate machine learning-based layout optimization via 3D reconstructions. Historical data 162 can include training data used to train or update the models 156 using machine learning.
  • the data repository 150 can include, manage or maintain a digital twin 164.
  • the digital twin 164 can refer to or include a virtual representation of an object, such as equipment located in the medical environment 105.
  • the object can include, for example, one or more of an operating table, hospital bed, surgical light, room light fixtures, surgical boom, surgical displays, documentation stations, operating room integration systems, blanket warmers, intravenous dispenser, scrub sinks, defibrillators, anesthesia machine, patient monitors, sterilizers, or EKG/ECG machines.
  • the digital twin 164 of an object can include information about the object, such as dimensions, configurations, capabilities, position, orientation, or articulation.
  • the data processing system 130 can interface with, communicate with, or otherwise receive or provide information with one or more component of system 100 via network 101, including, for example, the robotic medical system 125 or client device 174.
  • the data processing system 130, robotic medical system 125 or client device 174 can each include at least one logic device such as a computing device having a processor to communicate via the network 101.
  • the data processing system 130, robotic medical system 125 or client device 174 can include at least one computation resource, server, processor or memory.
  • the data processing system 130 can include a plurality of computation resources or processors coupled with memory.
  • the data processing system 130 can be part of or include a cloud computing environment.
  • the data processing system 130 can include multiple, logically-grouped servers and facilitate distributed computing techniques.
  • the logical group of servers may be referred to as a data center, server farm or a machine farm.
  • the servers can also be geographically dispersed.
  • a data center or machine farm may be administered as a single entity, or the machine farm can include a plurality of machine farms.
  • the servers within each machine farm can be heterogeneous: one or more of the servers or machines can operate according to one or more types of operating system platforms.
  • the data processing system 130, or components thereof can include a physical or virtual computer system operatively coupled, or associated with, the medical environment 105.
  • the data processing system 130, or components thereof, can be coupled, or associated with, the medical environment 105 via a network 101, either directly or indirectly through an intermediate computing device or system.
  • the network 101 can be any type or form of network.
  • the geographical scope of the network can vary widely and can include a body area network (BAN), a personal area network (PAN), a local-area network (LAN) (e.g., Intranet), a metropolitan area network (MAN), a wide area network (WAN), or the Internet.
  • the topology of the network 101 can assume any form such as point-to-point, bus, star, ring, mesh, tree, etc.
  • the network 101 can utilize different techniques and layers or stacks of protocols, including, for example, the Ethernet protocol, the internet protocol suite (TCP/IP), the ATM (Asynchronous Transfer Mode) technique, the SONET (Synchronous Optical Networking) protocol, the SDH (Synchronous Digital Hierarchy) protocol, etc.
  • the TCP/IP internet protocol suite can include application layer, transport layer, internet layer (including, e.g., IPv6), or the link layer.
  • the network 101 can be a type of a broadcast network, a telecommunications network, a data communication network, a computer network, a Bluetooth network, or other types of wired and wireless networks.
  • the data processing system 130 can be located at least partially at the location of the surgical facility associated with the medical environment 105 or remotely therefrom. Elements of the data processing system 130, or components thereof can be accessible via portable devices such as laptops, mobile devices, wearable smart devices, etc.
  • the data processing system 130, the data collector 134, or components thereof can include other or additional elements that can be considered desirable to have in performing the functions described herein.
  • the data processing system 130, or components thereof can include, or be associated with, one or more components or functionality of computing system 600 depicted in FIG. 6, including, for example, one or more processors coupled with memory.
  • the data processing system 130 can include an interface 132 designed, constructed and operational to communicate with one or more component of system 100 via network 101, including, for example, the robotic medical system 125 or client device 174.
  • the interface 132 can include a network interface.
  • the interface 132 can include or provide a user interface, such as a graphical user interface.
  • the interface 132 can provide data for presentation via a 3D viewer 176 that can depict, illustrate, render, present, or otherwise provide a 3D reconstruction generated by the data processing system 130.
  • the interface 132 can provide the 3D viewer 176 application via a web browser.
  • the 3D viewer 176 can be a software-as-a-service application hosted by the data processing system 130 or one or more servers.
  • the 3D viewer 176 can be a native application hosted or executed on a client device 174.
  • An example of a 3D reconstruction presented by the 3D viewer 176 is illustrated in FIG. 2.
  • the data processing system 130 can include a data collector 134 designed, constructed and operational to receive sensor data associated with a medical procedure and captured by a set of sensors (e.g., multiple data capture devices 110) situated in an operating room (e.g., medical environment 105 depicted in FIGS. 5A or 5B).
  • the sensor data can include ToF sensor data collected, captured, or otherwise sensed by 3D sensors or data capture devices 110.
  • the sensor data can refer to or be a part of a data stream 152.
  • the sensor data can include multi-modal sensor data.
  • the multi-modal sensor data can include, for example, RGB, depth, intensity, thermal camera, WiFi-signal data, or data from the robotic medical system 125.
  • the data stream 152 can include sensor data, or be referred to as sensor data.
  • the data stream 152, or sensor data can include information about the sensor that collected the data, such as an identifier of the sensor (e.g., a unique identifier of the sensor or alphanumeric identifier), an indication of a position or a location of the sensor (e.g., location within the medical environment 105, operating room identifier, site identifier, or institution name), orientation of the sensor, or type of the sensor.
  • the data processing system 130 can, in some cases, use one or more of the location, position, or orientation information of each of the set of sensors to register the sensor data from each sensor to a common or single coordinate system.
  • the sensor data, or data stream 152 can include information such as 3D point cloud data from a set of ToF sensors.
  • the data stream can include a time stamp for the data sensed or detected by the sensor.
  • the data stream 152 can include multiple frames of 3D sensor data. Each frame can be associated with an identifier of a sensor and a timestamp at which the data was received, captured, or detected.
  • the multiple sensors in the medical environment 105 can be time synchronized such that timestamps associated with data captured by the set of sensors can be aligned, for example as sketched below.
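
The following is a hedged sketch of grouping frames across sensors by nearest timestamp; the tolerance and data layout are assumptions, not details from the disclosure:

    from bisect import bisect_left

    def nearest_index(frames, t):
        """frames: non-empty list of (timestamp, frame) sorted by timestamp."""
        ts = [f[0] for f in frames]
        i = bisect_left(ts, t)
        candidates = [c for c in (i - 1, i) if 0 <= c < len(frames)]
        return min(candidates, key=lambda c: abs(ts[c] - t))

    def align(streams, tolerance_s=0.05):
        """Yield one frame per sensor for each timestamp of the first stream."""
        for t, frame in streams[0]:
            group = [frame]
            for other in streams[1:]:
                j = nearest_index(other, t)
                if abs(other[j][0] - t) <= tolerance_s:
                    group.append(other[j][1])
            if len(group) == len(streams):  # all sensors contributed a frame
                yield t, group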
  • the data processing system 130 can receive sensor data from various phases of a medical procedure.
  • the data processing system 130 can receive sensor data that was detected before the medical procedure has commenced, during the medical procedure, or after the medical procedure.
  • the data processing system 130 can receive sensor data that was collected during all phases of a workflow associated with a medical procedure, including operative and non-operative phases such as: room preparation (e.g., preparing the operating room for a medical procedure), robot setup (e.g., setting up a robotic medical system or aspect thereof in order to perform a surgical or medical procedure), a surgical or medical procedure, turn over (e.g., the time between one patient exiting surgery to the time at which the next patient enters the operating room to begin surgery), or cleaning the operating room.
  • the data processing system 130 can receive the data in real-time as the data is collected, in a batch mode based on a time interval, or upon completion of the medical procedure.
  • the data stream 152 can capture various activities associated with a medical procedure in the operating room over a time interval with a robotic medical system 125.
  • the data collector 134 can access the data stream from the visualization tool 170 or the display 172.
  • the data collector 134 can receive an event stream or a kinematics stream from the robotic medical system 125.
  • the event stream can include a stream of event data or information, such as packets, that identify or convey a state of the robotic medical system 125 or an event that occurred in association with the robotic medical system or a surgical or medical procedure being performed with the robotic medical system.
  • An example state of the robotic medical system 125 can indicate whether the medical tool 120 is installed on a manipulator arm of the robotic medical system or not.
  • a signal or data packet(s) can be generated indicating that the medical tool has been installed on the manipulator arm of the robotic medical system 125.
  • the signal or data packet(s) can be sent to the data collector 134 as the event stream.
  • Another example state of the robotic medical system 125 can indicate whether the visualization tool 170 is connected, whether directly to the robotic medical system or indirectly through another auxiliary system that is connected to the robotic medical system.
  • the robotic medical system 125 can have other states, which can be detected by the data collector 134.
  • the data collector 134 can determine (e.g., record) or otherwise receive the event stream through an Application Programming Interface (API) of the robotic medical system 125.
  • the data collector 134 can determine or otherwise receive the event stream via other suitable mechanisms.
  • the data collector 134 can poll the robotic medical system 125 to determine the state of the robotic medical system 125.
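
As a purely hypothetical sketch of such polling (the robotic system's API is not specified in the disclosure, so the client object and its get_state method are invented placeholders):

    import time

    def poll_event_stream(client, interval_s=1.0):
        """Yield an event whenever the polled robotic system state changes."""
        last_state = None
        while True:
            state = client.get_state()  # hypothetical call, e.g. {"tool_installed": True}
            if state != last_state:
                yield {"timestamp": time.time(), "state": state}
                last_state = state
            time.sleep(interval_s)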
  • the data processing system 130 can include a 3D reconstructor 136 designed, configured and operational to generate a three dimensional (3D) reconstruction of the medical procedure based on the sensor data.
  • the 3D reconstruction of the medical procedure can refer to or include a 3D reconstruction of a layout or topology of the medical environment 105 or operating room in which the medical procedure is to be performed or performed.
  • the 3D reconstruction of the topology of the operating room can indicate the objects or personnel in the medical environment, and how the objects or personnel are interrelated or arranged.
  • the topology can be to scale to provide a realistic representation of the arrangement, distances, or interrelation between the objects and personnel in the operating room and associated with the medical procedure.
  • the 3D reconstructor 136 can receive sensor data that includes 3D point cloud data from the set of sensors (e.g., time-of-flight sensors).
  • Sensors can be positioned at predetermined or arbitrary locations that each allow a respective sensor to capture images or point clouds of a medical environment 105 from a particular viewpoint or viewpoints.
  • Any suitable location for a sensor may be considered an arbitrary location, which can include fixed locations that are not determined by system 100, random locations, and/or dynamic locations.
  • the viewpoint of a sensor (e.g., the position, orientation, and view settings such as zoom for the sensor) can determine which portion of the medical environment 105 the sensor captures.
  • the 3D reconstructor 136 can use one or more models 156 to perform 3D reconstruction for each phase of the medical session.
  • the models 156 can include a reconstruction model and a separate phase model.
  • the 3D reconstructor 136 can use the reconstruction model to perform the 3D reconstruction that is separate or different from the phase model used to identify the phases in the medical session.
  • the reconstruction model and the phase model can work in parallel.
  • the models can work in parallel to facilitate generating 3D reconstructions for each phase.
  • the reconstruction model and the phase model can be combined into one model 156 that can facilitate both phase identification (e.g., by the phase detector 138) and 3D reconstruction (e.g., by the 3D reconstructor 136).
  • the 3D reconstructor 136 can register, via a function, the 3D point cloud data to a single coordinate system to generate the 3D reconstruction of the medical procedure. For example, the 3D reconstructor 136 can stitch together the sensor data from the set of sensors into a common coordinate frame to provide a single, 3D reconstruction that indicates the layout or topology of the operating room.
  • the data processing system 130 can create a 3D reconstruction that is realistic and human viewable.
  • the 3D reconstruction can include significant operating room equipment in its actual locations.
  • the data processing system 130 can execute a registration function to put the received 3D sensor data into a coordinate system.
  • the data processing system 130 can identify a reference point in the operating room.
  • the reference point can be a coordinate or point in the operating room.
  • the reference point can refer to or include a landmark in the operating room.
  • the data processing system 130 can use the reference point or landmark as the basis of a coordinate system for the operating room.
  • the data processing system 130 can calibrate the set of sensors to each other to determine a common point cloud. For example, to calibrate the set of sensors, the data processing system 130 can perform feature detection on a predetermined pattern (e.g., a checkerboard) to extract predetermined or known features (e.g., corners). The data processing system 130 can use these correspondences to establish a transformation between different sensors.
  • the data processing system 130 can perform calibration or registration with 3D point clouds obtained from depth images in order to register a set of sensors facing the same direction and opposing depth sensors.
  • the data processing system 130 can use a 3D target to perform the registration among the multiple 3D depth sensors.
  • the data processing system 130 can perform a time-based alignment process to facilitate registering the set of sensors or data therefrom to a single coordinate system.
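
One standard way to realize that transformation step, offered here as a hedged sketch rather than the disclosure's mandated algorithm, is the Kabsch/SVD method over corresponding 3D points (e.g., checkerboard corners seen by two sensors):

    import numpy as np

    def rigid_transform(src, dst):
        """Find R, t with dst ~= R @ src_point + t, from (N, 3) correspondences."""
        src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
        H = (src - src_c).T @ (dst - dst_c)   # cross-covariance of centered points
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:              # correct an improper (reflected) rotation
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = dst_c - R @ src_c
        return R, t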
  • the data processing system 130 can receive data from different sensors at different times, and use the timestamps in the data stream from the respective sensors in order to align or synchronize the data streams.
  • the data processing system 130 can facilitate registering data from the set of sensors with different viewpoints to a single coordinate system, and synthesizing a 3D reconstruction of the medical procedure.
  • a 3-axis coordinate system can be denoted X, Y, and Z, and the coordinate (0, 0, 0) can correspond to the reference point or landmark in the operating room.
  • the data processing system 130 can determine a location of a sensor in the operating room relative to the reference point in the operating room.
  • the data processing system 130 can compute a coordinate transformation (e.g., a function) between the sensor and the reference point in the operating room.
  • the coordinate transformation can include an offset between the sensor location and the reference point, such as an offset amount in one or more of the X, Y, or Z axes.
  • the data processing system 130 can determine a coordinate transformation or offset function for each sensor in the operating room relative to the reference point in the operating room. To register all sensor data to a single coordinate system, the data processing system 130 can apply respective coordinate transformations established for sensor data from each sensor to put all the sensor data in the same coordinate system.
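A minimal sketch of applying per-sensor rigid transforms (assumed to have been established during calibration) to register every cloud into the single room coordinate system anchored at the reference point:

```python
# Minimal sketch: register per-sensor point clouds into one room frame.
import numpy as np

def to_room_frame(points_sensor, R, t):
    """Apply a rigid transform: x_room = R @ x_sensor + t (points are Nx3)."""
    return points_sensor @ R.T + t

def fuse_clouds(clouds, transforms):
    """Concatenate every sensor's cloud after mapping it into the room frame;
    transforms is a list of (R, t) pairs, one per sensor."""
    return np.vstack([to_room_frame(c, R, t)
                      for c, (R, t) in zip(clouds, transforms)])
```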
  • the 3D reconstruction can include a synthetic digital twin of known components.
  • the data processing system 130 (e.g., via 3D reconstructor 136) can identify objects or equipment in the medical environment 105, and replace the objects with a digital twin 164 when generating the 3D reconstruction.
  • the data processing system 130 can utilize one or more models 156 (e.g., a reconstruction model) trained with machine learning to recognize, identify, or classify objects in the data stream 152.
  • upon recognizing or detecting an object via the reconstruction model, the data processing system 130 can perform a lookup, query, or other search to determine whether a digital twin corresponding to the detected object is available in the digital twin 164 data structure or data repository.
  • upon determining that a digital twin is available for the object, the data processing system 130 can integrate the digital twin version of the object into the 3D reconstruction.
  • the digital twin data structure 164 can include multiple versions of a type of object or equipment.
  • the digital twin data structure 164 can include a version of the type of object that is parameterized or configured to be customized, adjusted, or modified.
  • the digital twin version can be parameterized, which can allow the data processing system 130 to modify the digital twin prior to integration into the 3D reconstruction.
  • the digital twin can include parameters or variables relating to the size of a piece of equipment, features of the equipment, or configuration of the equipment.
  • the data processing system 130 can position, orient, configure, or otherwise modify the digital twin to match the actual object in the medical environment 105 to provide a realistic 3D reconstruction of the medical environment 105.
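A minimal sketch of the digital twin lookup and parameterization flow, with hypothetical class, field, and library names that are not taken from the disclosure:

```python
# Minimal sketch: substitute a detected object with a parameterized digital
# twin before inserting it into the 3D reconstruction.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class DigitalTwin:
    object_type: str   # e.g., "operating_table" (hypothetical key)
    mesh_uri: str      # reference to the stored 3D asset
    height_m: float    # adjustable parameter
    pose: tuple        # (x, y, z, yaw) in the room frame

TWIN_LIBRARY = {
    "operating_table": DigitalTwin("operating_table", "assets/table.glb",
                                   0.9, (0.0, 0.0, 0.0, 0.0)),
}

def twin_for(detection):
    """Return a twin configured to match the detected object, or None to
    fall back to raw point-cloud geometry."""
    template = TWIN_LIBRARY.get(detection["type"])
    if template is None:
        return None
    return replace(template, height_m=detection["height_m"],
                   pose=detection["pose"])
```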
  • the phase detector 138 can identify activities associated with the medical procedure in order to determine the phase. To do so, the phase detector 138 can use one or more models 156 trained with machine learning.
  • the one or more models 156 can include a separate phase model configured to perform phase identification, or a single multi-modal model that can facilitate phase identification and other functions.
  • the models can be trained, established, maintained, or otherwise provided by the model generator 144.
  • the model 156 can be trained on a workflow of the procedure, which can include operative and non-operative phases of the medical procedure.
  • the non-operative and operative phases of the workflow associated with the medical procedure can include room preparation, robot setup, a surgical or medical procedure, turn over, or cleaning.
  • the operative phases of the workflow of the medical procedure can include exposure, dissection, transection, reconstruction, and extraction. The operative phases can, therefore, be a subset of the medical procedure phase in the full workflow that contains both the non-operative and operative phases.
  • Activities can correspond to a more granular (e.g., higher resolution) level of the workflow relative to phases. Activities can include, for example, the turn-over of a medical environment, setup of a medical environment, sterile preparation, robotic medical system draping, patient-in, patient preparation, intubation, patient draping, port placement, robotic medical system roll-up, robotic medical system docking, robotic medical system undocking, robotic medical system roll-back, patient close, robotic medical system undraping, patient undraping, patient out, or cleaning of the medical environment.
  • the data processing system 130 can perform activity recognition or detection, and map the detected activities to a corresponding phase.
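A minimal sketch of mapping recognized activities to phases via a lookup table; the specific groupings are illustrative assumptions:

```python
# Minimal sketch: map recognized activities to workflow phases.
ACTIVITY_TO_PHASE = {
    "sterile_preparation": "room_preparation",
    "robotic_system_draping": "robot_setup",
    "patient_in": "medical_procedure",
    "port_placement": "medical_procedure",
    "patient_out": "turn_over",
    "cleaning": "cleaning",
}

def phase_for(activity: str) -> str:
    """Return the workflow phase for a detected activity."""
    return ACTIVITY_TO_PHASE.get(activity, "unknown")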
  • the phase detector 138 can utilize the machine learning model 156 (e.g., a phase model) to determine, based on the data stream 152 from the set of sensors, an activity of the scene of the medical environment 105 captured by the set of sensors of data capture device 110.
  • the machine learning model 156 can be a viewpoint-agnostic machine learning model trained to determine the activity of the scene based on a data stream 152 that includes an arbitrary number of image streams captured from arbitrary viewpoints.
  • the configuration of data capture devices 110 may not be constrained by the model 156 (e.g., phase model) to a fixed number of data capture devices 110 located only at certain fixed or relative locations; rather, the data processing system 130 can be configured to receive inputs from any configuration of imaging data capture devices 110 suitable for the medical setting or environment.
  • system 100 can be a dynamic system or include dynamic components that have viewpoints that can dynamically change during a medical session (e.g., during any phase of the medical session, such as during pre-operative activities (e.g., setup activities), intra-operative activities, or post-operative activities).
  • the viewpoint of a data capture device 110 can dynamically change in any way that changes the field of view of the sensor, such as by changing a location, pose (e.g., position and orientation), orientation, zoom setting, or other parameter of the data capture device 110.
  • the machine learning model 156 (e.g., phase model) can be trained to determine the activity of the medical environment 105 (e.g., scene or operating room) based on the data stream 152 or the 3D reconstruction.
  • the machine learning model 156 (e.g., phase model) can include activity recognition functions or techniques, neural networks, or a recurrent neural network (RNN).
  • the phase detector 138, using a machine learning model 156, can perform activity recognition to extract features of the 3D reconstruction or the data stream 152 to determine an activity within the scene.
  • the phase detector 138, using activity recognition, can extract features from the 3D reconstruction or the data stream 152.
  • the data processing system 130 can be configured with any suitable activity recognition technique, such as a fine-tuned inflated 3D (I3D) model 156 (e.g., phase model) or any other neural network or deep neural network.
  • the phase detector 138 can perform activity recognition using an RNN model.
  • the RNN can use extracted features from the 3D reconstruction or the data stream to determine respective classifications of an activity of the scene (e.g., medical environment 105 or operating room).
  • the data processing system 130 can input features extracted from the 3D reconstruction into the RNN model to determine a classification of the activity of the scene as captured from the set of sensors (e.g., data capture devices 110) associated with the medical procedure in the medical environment 105.
  • the phase detector 138 can determine the activity of the scene directly using sensor data received from each sensor of the set of sensors, in which case the phase detector 138 can determine, via the RNN model, a first classification of the activity from the first sensor data, a second classification of the activity from the second sensor data, a third classification of the activity from third sensor data, etc.
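A minimal sketch of such an RNN classifier over per-frame feature vectors, assuming PyTorch; the GRU choice, feature dimension, and activity count are assumptions, since the disclosure specifies only an RNN model:

```python
# Minimal sketch: an RNN that classifies an activity from a sequence of
# per-frame feature vectors extracted from the 3D reconstruction or from a
# single sensor stream.
import torch
import torch.nn as nn

class ActivityRNN(nn.Module):
    def __init__(self, feature_dim=512, hidden_dim=256, num_activities=17):
        super().__init__()
        self.rnn = nn.GRU(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_activities)

    def forward(self, features):           # features: (batch, time, feature_dim)
        _, last_hidden = self.rnn(features)
        return self.head(last_hidden[-1])  # per-sequence activity logits

# One classification per input stream: invoke once on reconstruction features,
# or once per sensor stream to obtain first/second/third classifications.
logits = ActivityRNN()(torch.randn(1, 30, 512))
```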
  • the 3D reconstruction can improve the accuracy with which an activity can be recognized, as the 3D reconstruction can incorporate data from multiple viewpoints (which can include digital twins) into a single coordinate system and reconstruction, while reducing computing resource utilization by virtue of determining a single classification via the RNN model, as opposed to separately invoking the RNN model for sensor data received from each sensor of the set of sensors.
  • the data processing system 130 can determine the activity of the scene using both the 3D reconstruction and the data stream 152 from each of the sensors of the set of sensors.
  • the 3D reconstructor 136 can generate the 3D reconstruction using point cloud data captured by a set of 3D sensors (e.g., ToF sensors).
  • the phase detector 138 can use the phase model, RNN model, or other neural network or deep neural network with one or more hidden layers, to recognize an activity of the scene using the 3D reconstruction. Further, the phase detector 138 can use imaging data of the data stream 152 captured by each sensor to recognize activities, via the neural network, from the image data.
  • the phase detector 138 can generate, for a given time interval, a first classification of an activity of the scene using the 3D reconstruction, a second classification of the activity of the scene using sensor data from a first sensor, and a third classification of the activity of the scene using sensor data from a second sensor.
  • the sensor data can be multi-modal sensor data.
  • the neural network models can each provide a classification to a data fusion module of the phase detector 138, which can generate fused data for determining the activity of a medical session in the medical environment 105.
  • a data fusion module can receive a respective classification of the activity of a medical session in the medical environment 105 from each of neural network models and determine, based on the respective classifications, a final classification of the activity of medical session occurring in the medical environment 105.
  • the phase detector 138, using data fusion, can generate the fused data to determine the final classification in any suitable manner.
  • the phase detector 138 can fuse the classification by applying a weight to classifications from the neural network models to determine the final classification.
  • the phase detector 138 can receive additional information with each classification to generate the fused data to determine the activity of the scene of the medical environment 105.
  • the phase detector 138 can also receive an activity visibility metric for each video clip or image stream that rates how visible the activity of the medical session in the medical environment 105 is in corresponding imagery.
  • the activity visibility metric can include a score or any other metric that represents a rating of how visible an activity of the medical session in the medical environment 105 is in the imagery.
  • the phase detector 138 can weigh the classification based on a confidence score generated by the neural network model 156. In some cases, the phase detector 138 can apply the highest weight to the classification made using the 3D reconstruction, as the 3D reconstruction is generated from the set of sensors having varying viewpoints.
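A minimal sketch of weighted fusion of per-stream classifications, combining model confidence, the activity visibility metric, and a per-source weight (with the 3D reconstruction weighted highest); the multiplicative scheme is an illustrative assumption:

```python
# Minimal sketch: fuse per-stream activity classifications into a final label.
from collections import defaultdict

def fuse(classifications):
    """classifications: list of (label, confidence, visibility, source_weight)."""
    scores = defaultdict(float)
    for label, confidence, visibility, source_weight in classifications:
        scores[label] += confidence * visibility * source_weight
    return max(scores, key=scores.get)

final = fuse([
    ("docking", 0.80, 0.9, 2.0),  # 3D reconstruction stream, weighted highest
    ("docking", 0.60, 0.7, 1.0),  # sensor 1
    ("draping", 0.55, 0.4, 1.0),  # sensor 2, activity partially occluded
])
```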
  • the model 156 (e.g., phase model) can include multiple layers (e.g., stages) of functions. Such layers may refer to functions or processes (e.g., an activity recognition function or an RNN function), which can be referred to as "vertical" or "horizontal" layers in configurations, or channels of data processing.
  • the machine learning model 156 can include additional, fewer, or different layers (e.g., different configurations of layers). Further, layers (horizontal or vertical) of machine learning model 156 may be connected in any suitable manner such that connected layers may communicate and/or share data between or among layers.
  • a machine learning model 156 can include video long short-term memories (LSTMs) configured to determine a classification of an activity of the scene (e.g., medical environment 105) as captured by image data from one or more data capture devices 110.
  • a video LSTM can determine a first classification of the activity based on first image data and features extracted or processed by feature processing modules.
  • a video LSTM can determine a second classification of the activity based on second image data and features extracted or processed by feature processing modules, which can be fused together with the first classification to result in a more accurate determination of the activity of the scene than a classification based solely on individual image streams.
  • Machine learning model 156 can include a global LSTM configured to determine a global classification of the activity of the scene (e.g., medical environment 105) based on fused data generated by fusing classifications from the 3D reconstruction and individual data streams 152 from each sensor in the set of sensors. As the global classification can be based on fused data, the global classification can be a determination of the activity of the scene based on both the 3D reconstruction and image data.
  • the data processing system 130 can include a tracker 140 designed, constructed, and operational to determine, recognize, detect, or otherwise identify a pattern of motion in the medical environment 105.
  • the tracker 140 can identify the pattern of motion in the medical environment 105 using the 3D reconstruction.
  • the tracker 140 can determine the pattern of motion corresponding to movement of personnel, equipment, or other objects in the operating room during any phase of the workflow associated with the medical session.
  • the tracker 140 can determine the pattern of motion during operative or non-operative phases associated with the medical session, including, for example, room preparation, robot setup, the medical procedure, turn over, or cleaning, as well as operative phases within the medical procedure, such as exposure, dissection, transection, reconstruction, or extraction.
  • the data processing system 130 can provide, for display via a graphical user interface, the pattern of motion overlaid on the 3D reconstruction.
  • the tracker 140 can use a tracking function 142.
  • upon determining the motion pattern, the tracker 140 can overlay the traffic pattern on top of a 3D reconstruction of a spatial layout of the medical environment 105.
  • the tracker 140 can determine an amount of movement of people and objects in the medical environment 105 based on the motion pattern, and provide the information to a performance controller 146 to determine a level of quality, performance or efficiency of the spatial layout of the medical environment 105 based, at least in part, on the motion patterns.
  • the tracker 140 can provide motion pattern information or metrics in a motion graph, a heatmap, or an analytics dashboard, which can be presented via the interface 132.
  • the tracker 140 can use one or more functions 142 or other techniques to track, detect, or otherwise identify patterns of motion in the medical environment 105.
  • the pattern of motion can refer to or include a path between two or more locations, points, or objects in the medical environment 105.
  • the pattern of motion can refer to the movement of a person, an object, or piece of equipment in the medical environment 105. The movement can occur in any phase of the workflow associated with the medical session, including, for example, preoperative, intraoperative, or post-operative.
  • the tracker 140 can be configured with or utilize an optical flow-based tracking function 142.
  • the function 142 can track motion in pixels in consecutive video frames or frames of the 3D reconstruction to estimate a motion vector of an object in the scene.
  • the tracker 140 can perform feature detection to identify or detect key points or features in a first frame of the video or 3D reconstruction, such as corners, edges, or other distinctive points.
  • the 3D reconstruction can include digital twins, in which case the tracker 140 can identify or track movement of the digital twin representation of the object in the 3D reconstruction from one time interval to another.
  • the tracker 140 can identify or detect the same feature in subsequent frames of the 3D reconstruction or video by searching for the corresponding points in the new frame. For example, the tracker 140 can estimate how the intensity of each feature point changes between frames.
  • upon detecting the corresponding features in the current frame or a subsequent frame, the tracker 140 can use a motion estimation function 142 to determine a motion vector for the feature or object.
  • the motion vector can indicate a direction and distance the object or feature has moved from one frame to another in the 3D reconstruction.
  • the tracker 140 can apply a filtering or smoothing technique.
  • the tracker 140 can apply filtering or smoothing to the motion vectors for an object over multiple frames to determine the pattern of motion.
  • the tracker 140 can determine the pattern of motion of a particular object or personnel in the medical environment 105 over various phases associated with the medical session in the medical environment 105.
  • the tracker 140 can compute the trajectory of a person by connecting positions of tracked features associated with that person over a time interval in the 3D reconstruction.
  • Example functions 142 the tracker 140 can invoke or execute to determine a pattern of motion can include Lucas-Kanade, Horn-Schunck, or dense optical flow.
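A minimal sketch of the Lucas-Kanade option using OpenCV: detect corner features in one grayscale frame, track them into the next, and return per-feature motion vectors. Parameter values are illustrative assumptions:

```python
# Minimal sketch: Lucas-Kanade sparse optical flow between two consecutive
# grayscale (uint8) frames, yielding per-feature (dx, dy) motion vectors.
import cv2
import numpy as np

def motion_vectors(prev_gray, next_gray):
    # detect distinctive corner features in the first frame
    p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                 qualityLevel=0.01, minDistance=7)
    if p0 is None:
        return np.empty((0, 2), dtype=np.float32)
    # track the same features into the next frame
    p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, p0, None)
    good = status.ravel() == 1
    return (p1[good] - p0[good]).reshape(-1, 2)
```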
  • the tracker 140 can utilize a model 156 (e.g., a tracker model).
  • the tracker model can include a deep learning model (e.g., a convolutional neural network or recurrent neural network) to track an object, which can be configured to handle a wider range of scenarios.
  • the deep neural network (e.g., a model 156) can be trained to identify, recognize, determine, or otherwise detect a pattern of motion in a 3D reconstruction over one or more time intervals.
  • the neural network model 156 can perform object detection in each frame of the 3D reconstruction, and track the detected objects from one frame to the next frame.
  • the neural network model 156 can leverage functions such as a Kalman filter, particle filter, or another deep learning-based tracker to track the detected object across subsequent frames.
  • the deep learning-based tracker can utilize a combination of motion prediction and object or appearance matching to estimate new positions of tracked objects in each frame.
  • the tracker 140, using deep learning techniques, can handle occlusions or temporary disappearances of an object by predicting a likely position of the object based on previous motion information, and then continue tracking the object after it reappears.
  • the tracker 140 can perform smoothing and post-processing (e.g., via a Kalman filter or moving average filter) to improve the tracking results from the deep learning technique.
  • the data processing system 130 can include a model generator 144 designed, constructed, or operational to train, maintain, manage, or otherwise provide one or more models 156 generated with one or more machine learning techniques.
  • the data processing system 130 can utilize various types of models trained with various types of machine learning techniques (e.g., a spatio-temporal video model) and training data (e.g., multi-modal sensor data).
  • the models 156 can be generated, trained, or otherwise configured to receive a certain type of input and provide a certain type of output that facilitates machine learning-based layout optimization.
  • the models 156 can include, for example, a reconstruction model used by the 3D reconstructor 136 to generate a 3D reconstruction of a spatial layout of a medical environment 105.
  • the models 156 can include a phase model used by the phase detector 138 to detect an activity or phase of the medical session occurring in the medical environment 105.
  • the models 156 can include a tracker model used by the tracker 140 to identify a pattern of motion during the medical session that occurs in the medical environment.
  • the models 156 can include a performance model used by the performance controller 146 to determine a metric or quality of the medical session.
  • the performance controller 146 can use the performance model to determine or predict an adverse event during the medical session.
  • the models 156 can include an action model used by the performance controller 146 to determine an action to perform based on the quality of the spatial layout or determination or prediction of an adverse event.
  • a single multi-modal model 156 can perform the functions of two or more of the phase model, tracker model, reconstruction model, performance model, or action model, for example.
  • the data processing system 130 can utilize multiple models configured or trained for different types of input and output.
  • a first one or more phase models can be configured to detect activities or a phase in a data stream.
  • a second one or more phase models can be configured to detect activities or a phase in a 3D reconstruction.
  • a third one or more tracker models can be configured to detect a pattern of motion.
  • a fourth one or more performance models can be used by the performance controller 146 and configured to determine metrics associated with efficiency of a layout of the operating room.
  • a fifth one or more performance models can be used by the event predictor 148 and configured to predict adverse events that may occur in the operating room, or that have occurred during a phase in a workflow associated with a medical session.
  • the model generator 144 can train one or more models 156 based on training data using a machine learning technique, such as a neural network, convolutional neural network, recurrent neural network, LSTMs, deep learning, a transformer architecture, or a self-attention transformer architecture, for example.
  • the model 156 can be a spatio-temporal model, which can refer to a model generated based on data collected across time and space that has at least one spatial and one temporal property.
  • the model 156 can be a spatio-temporal video model trained, generated or established to make predictions about data that varies in both space and time in a video stream.
  • the spatio-temporal model can combine techniques from both spatial analysis and time series analysis.
  • the action can be to improve the quality or performance of the spatial layout of the medical environment, or reduce the likelihood of negative performance (e.g., adverse events) in the medical environment 105.
  • the performance controller 146 can determine one or more metrics that can indicate a level of quality or performance associated with the medical environment, the layout of the medical environment, or the medical session.
  • the performance controller 146 can determine metrics for each phase or aggregate metrics for the whole medical session.
  • the performance controller 146 can perform actions based on or responsive to phase-based metrics or aggregate metrics.
  • the value of the metric can include a numerical value, a letter grade (e.g., A, B, C, D, or F), a score, a binary value (e.g., good or bad, 0 or 1, efficient or inefficient) or other indicator of quality, performance or efficiency.
  • the numerical value of the metric can range from 0 to 1, 0 to 10, 0 to 100, or have any other range.
  • the numeric value of the metric can include a percentage.
  • the metric can be normalized based on a baseline or average metric established for a particular type of metric or particular phase associated with the medical session being performed in the medical environment.
  • the metrics can include statistical metrics, such as averages, standard deviations, or variances.
  • the metrics can be based on one or more factors or a combination of factors.
  • the metric can be based on, for example, a duration of a phase of a medical session, a pattern of motion, a layout of the medical environment, or a dimension of the medical environment.
  • the performance controller 146 can determine the metric based on a count of personnel detected in the medical environment, a size of the medical environment, and the phase of the medical session.
  • the data processing system 130 can determine the metric based on a count of personnel detected in the medical environment, a pattern of motion of the personnel or objects, a size of the medical environment, time elapsed in the phase or the medical session, occurrence of one or more adverse events, or the phase of the medical session.
  • the metric can be based on a weight function with inputs corresponding to one or more of the count of personnel, size of the room, or phase of the medical session.
  • the data processing system 130 can determine weights for the function that can optimize the quality, efficiency or performance of the spatial layout (e.g., performance of a medical session in the spatial layout of the medical environment).
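A minimal sketch of such a weighted metric over personnel count, motion, and room size, normalized by per-phase baselines; the weights, baselines, and 0-to-1 range are illustrative assumptions that would in practice be fit from historical data:

```python
# Minimal sketch: a weighted layout-quality metric for a given phase.
PHASE_BASELINES = {"robot_setup": {"personnel": 4, "path_m": 60.0}}

def layout_metric(phase, personnel, total_path_m, room_area_m2,
                  w_personnel=0.4, w_motion=0.4, w_density=0.2):
    base = PHASE_BASELINES[phase]
    personnel_score = base["personnel"] / max(personnel, 1)
    motion_score = base["path_m"] / max(total_path_m, 1e-6)
    density_score = room_area_m2 / max(personnel, 1) / 10.0
    score = (w_personnel * personnel_score
             + w_motion * motion_score
             + w_density * density_score)
    return min(score, 1.0)  # clamp to the metric's 0-to-1 range

print(layout_metric("robot_setup", personnel=6, total_path_m=95.0,
                    room_area_m2=50.0))
```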
  • the performance controller 146 can determine a quality level or a level of efficiency of a layout of a medical environment 105 based on a pattern of motion during a phase associated with the medical session.
  • the pattern of motion or movement can be quantified using one or more metrics that can indicate a duration of a type of motion or movement, distance of the movement of the path, speed of the motion or movement, angular speed or rotation rate associated with a motion or movement, vertices associated with the motion or movement, shape of a path of the movement, or idle time during traversal of the path.
  • a high idle time metric (e.g., based on comparison with a threshold) or a low speed of movement metric (e.g., based on comparison with a threshold) can indicate an inefficient pattern of motion or layout.
  • the performance controller 146 can identify an optimal path by adjusting the layout or topology in the 3D reconstruction of the medical environment to predict whether an improved path can be generated or created for the medical environment without negatively impacting the performance of the medical session. For example, the performance controller 146 can evaluate different positions, locations, or orientations for an object in the medical environment to determine if one of them would result in a more efficient path or pattern of motion. Upon identifying a more efficient path as a result of changing the location, position or orientation of an object in the medical environment, the performance controller 146 can provide a suggestion or otherwise perform an action to cause the change in the layout to improve the quality or performance of the layout.
  • the performance controller 146 can compare the metric with a threshold 158 established for the phase of the medical session.
  • the performance controller 146 can select the threshold 158 configured for a particular phase, type of medical environment, type of medical procedure, or other attribute or characteristic.
  • the performance controller 146 can determine the threshold based on metadata of the medical environment.
  • metadata can include a site, surgeon, institution, geographic location of the medical environment 105, age of the medical environment 105, surgeon identifier, type of the medical environment, or other information associated with the medical environment or medical session that can facilitate selecting or identifying a threshold configured to evaluate the quality or performance of the layout associated with the medical session or medical environment in which the medical procedure of the medical session is to be performed or is being performed.
  • the data processing system 130 can determine, based on the comparison of the metric with the threshold, to perform an action to improve the performance of the layout of the operating room.
  • the performance controller 146 can determine, identify, or select an action such as to provide an alert, notification, instruction, guideline for an optimal spatial layout, or warning to indicate to adjust a layout or configuration of the operating room.
  • the instruction can be to move the position or orientation of a piece of equipment in the operating room, such as a desk, operating table, operating lamp, light fixture, computer, or chair.
  • the guideline for the optimal spatial layout can be based on a baseline spatial layout having a desired quality metric or performance for a phase in the medical session.
  • the performance controller 146 can select an action to automatically adjust the layout or configuration of the medical environment.
  • the performance controller 146 can select or identify an action based on a rule, heuristic, or map data structure. For example, the performance controller 146 can perform a lookup in a table using the metric, phase, or other information to select an action to perform.
  • the performance controller 146 can utilize a machine learning model to determine the action to perform in order to improve the performance of the layout of the medical environment 105.
  • the performance controller 146 can input the metric into a second model 156 trained with machine learning on historical data to determine the action to perform.
  • the action can include, for example, to adjust at least one of a height (e.g., raise or lower an object), orientation, or location of an object located in the medical environment.
  • the performance controller 146 can evaluate the performance of the medical session during any phase in the workflow associated with the medical session, including, for example, pre-operatively, intraoperatively, or post-operatively.
  • the performance controller 146 can predict a performance of a layout in a phase prior to occurrence of the phase. For example, the performance controller 146 can determine, in a pre-operative phase, the performance of the layout during an intraoperative phase.
  • the performance controller 146 can determine, prior to performance of the medical session, that the layout of the medical environment in the 3D reconstruction varies from a baseline layout of medical environments established based on historical data associated with medical sessions corresponding to the medical procedure.
  • the performance controller 146 can provide an instruction to adjust the layout to conform with the baseline layout prior to performance of the medical session.
  • the performance controller 146 can access the 3D reconstruction of the layout of the medical environment 105 constructed from sensor data from the set of sensors via the 3D reconstructor 136.
  • the performance controller 146 can access historical data 162 in the data repository 150 to identify or establish a baseline layout.
  • the baseline layout can correspond to a layout associated with high performance or satisfactory performance greater than a threshold 158.
  • the baseline layout can be based on 3D reconstructions across various institutions, phases, or types of medical procedures.
  • the baseline layout can indicate a size of a room, count of personnel, count of objects or equipment, topology of the medical environment, or patterns of motion.
  • the performance controller 146 can compare the layout with the baseline to identify variances, such as variances in the topology, count of objects or equipment or personnel, or patterns of motion. In some cases, the performance controller 146 can map the factors or attributes of the layout and the baseline layout to a multidimensional space in order to compute a distance metric between the layout and the baseline layout. If the distance metric is greater than a threshold (e.g., 5%, 10%, 15%, 20% or more), then the performance controller 146 can predict that the layout may be low performing or inefficient. Responsive to predicting that the layout may be low performing, the data processing system 130 can provide an alert, notification, warning, instruction or perform another action to preemptively improve the performance of the layout prior to occurrence of the phase of the medical session.
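A minimal sketch of the multidimensional comparison against a baseline layout; the chosen features and the 15% threshold are illustrative assumptions:

```python
# Minimal sketch: map a layout and a baseline to feature vectors and flag the
# layout when its relative deviation exceeds a threshold.
import numpy as np

def layout_features(layout):
    return np.array([layout["room_area_m2"], layout["personnel"],
                     layout["equipment_count"], layout["total_path_m"]])

def variance_from_baseline(layout, baseline):
    f, b = layout_features(layout), layout_features(baseline)
    return np.linalg.norm((f - b) / b)  # per-feature relative deviation

baseline = {"room_area_m2": 55.0, "personnel": 5,
            "equipment_count": 8, "total_path_m": 120.0}
current = {"room_area_m2": 42.0, "personnel": 7,
           "equipment_count": 8, "total_path_m": 180.0}
if variance_from_baseline(current, baseline) > 0.15:
    print("layout predicted low performing; issue pre-phase alert")
```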
  • the performance controller 146 can determine to perform an action that includes disabling a function of the robotic medical system 125 used to perform a medical procedure within the medical session.
  • the data processing system 130 can disable the function prior to occurrence of the adverse event and responsive to the prediction of the adverse event.
  • the function can include, for example, an activity of a tool, cutting, re-positioning, moving, or stitching.
  • the data processing system 130 can disable the function for at least one of a predetermined time interval or until detection of a predetermined event. For example, the data processing system 130 can disable the function until a surgeon acknowledges a warning, alert, or notification.
  • the data processing system 130 can determine, during the phase, that the spatial layout of the medical environment in the 3D reconstruction varies from a baseline layout of medical environments established based on historical data associated with medical procedures similar to the medical procedure.
  • the data processing system 130 can provide an action that includes an instruction to adjust the spatial layout to conform with the baseline layout prior to performance of the medical procedure in the medical session.
  • the performance controller 146 can include an event predictor 148 designed, constructed and operational to predict, based at least on the pattern of motion in the 3D reconstruction and the phase of the medical session, an adverse event in the medical environment during the medical session.
  • the data processing system 130 e.g., performance controller 146 or event predictor 148, can provide, responsive to predicting the adverse event, an action including an indication of the prediction of the adverse event prior to occurrence of the adverse event.
  • the data processing system 130 can perform an action to prevent the occurrence of the adverse event, reduce the likelihood of occurrence of the adverse effect, or otherwise mitigate the negative consequence of effect of the adverse event.
  • the event predictor 148 can predict the adverse event based on at least one of the pattern of motion in the 3D reconstruction, the phase of the medical session, or user input (e.g., voice input, keyboard input, mouse input, touchpad input, or gesture) received during the medical session.
  • the adverse event can include at least one of a collision, a sterility breach, or an inefficient motion.
  • a collision can refer to or include two or more personnel colliding or coming into contact with one another in an undesired manner; a person and an object or piece of equipment colliding or coming into contact with one another in an undesired manner; or two or more objects colliding or coming into contact with one another in an undesired manner.
  • a sterility breach in a medical environment can refer to or include a situation in which the sterile environment within the medical environment is compromised, potentially leading to contamination.
  • Sterility breaches can occur when foreign contaminants, such as bacteria, viruses, or particles, enter the sterile field. Sterility breaches can occur as a result of, for example, unintentional contact (e.g., healthcare professionals or surgical team members accidentally touching non-sterile surfaces or objects and then touching sterile equipment or the surgical site without proper hand hygiene or glove changes), torn or damaged sterile wraps, or airborne contamination (e.g., airborne particles or microorganisms, such as dust or bacteria, entering the medical environment due to ventilation issues or inadequate air filtration).
  • An inefficient motion can refer to or include a surgeon, object or equipment having to move an excessive distance during a phase of the workflow, or excessively re-orienting or rotating during a phase.
  • An inefficient motion can refer to or include a duration of a motion, or idle time of a surgeon.
  • the event predictor 148 can evaluate the patterns of motions to determine a trend or predict how paths associated with various patterns of motions may change or evolve throughout the workflow associated with the medical session.
  • the event predictor 148 can utilize a performance model (e.g., a model 156) trained with machine learning.
  • the event predictor 148 can input the 3D reconstruction of the layout or topology of the medical environment 105 into a machine learning model 156 (e.g., a performance model) to predict whether a collision may occur.
  • the machine learning model 156 can be trained on historical layouts, patterns of motions, and collisions.
  • the machine learning model 156 can be configured to predict whether a collision may occur based on a given layout, topology, phase, type of medical session, or other metadata.
  • the machine learning model 156 can output, based on a 3D reconstruction of a topology of a medical environment 105, a prediction or score indicative of a collision.
  • the model 156 can indicate the phase of the workflow of the medical session in which the collision may occur.
  • the performance controller 146 can perform an action, such as providing an alert or warning that a collision may occur.
  • the data processing system 130 can provide a guideline for an optimal spatial layout.
  • the performance controller 146 can predict metrics indicative of performance for one or more phases in the workflow of the medical session. For example, the performance controller 146 can predict, based on at least one of the 3D reconstruction or the metric, a second metric indicative of performance of the layout of the medical environment for a second phase subsequent to the phase. The performance controller 146 can select, based on the second metric, a second action to perform subsequent to completion of the phase to improve performance of the layout of the medical environment during the second phase of the medical session.
  • the performance controller 146 can determine the second metric and second action independent of the first metric and first action.
  • the first action may change or modify the layout during a phase, which in turn may result in the data processing system 130 creating a second 3D reconstruction of the layout, or otherwise modifying the previously generated 3D reconstruction of the layout.
  • the performance controller 146 can determine the second metric based on the new 3D reconstruction that was updated or modified responsive to the first action performed during the phase of the medical session.
  • the data processing system 130 can continuously update the 3D reconstruction and generate new performance metrics and actions in real time.
  • the data processing system 130 can output the metric or action with visual output or auditory output (e.g., sound, alarm, or voice output).
  • the data processing system 130 can provide the output or guidance with visual signals or audio signals.
  • the guidance can be configured to mitigate a likelihood of occurrence of an adverse event prior to occurrence of the adverse event.
  • the data processing system 130 can include an interface 132 that can provide or present a graphical user interface executed or rendered by a 3D viewer 176 on a client device 174.
  • the 3D viewer 176 can provide the graphical user interface for display via a display device.
  • the interface 132 can provide a 3D viewer 176, such as a web-based 3D viewer 176 application or a native 3D viewer 176 application.
  • the interface 132 can transmit, via network 101, data or output that can be rendered or presented for display via a display device.
  • the data processing system 130 can provide a 3D reconstruction for display via the client device 174.
  • the data processing system 130 can overlay patterns of motions or performance metrics on the 3D reconstruction, alongside the 3D reconstruction, or otherwise.
  • the data processing system 130 can provide an analytics dashboard with various performance metrics.
  • the data processing system 130 can provide a video stream (e.g., data stream 152) of the medical session for display via the graphical user interface.
  • the data processing system 130 can provide the 3D reconstruction alongside or overlaid on the video stream.
  • the data processing system 130 can synchronize the playback (or real-time playback) of the video stream of the medical session with the 3D reconstruction, patterns of motions, or performance metrics.
  • the data processing system 130 can provide a graph of values of a performance metric with respect to time or phases in the workflow.
  • the data processing system 130 can present, via the graphical user interface, the 3D reconstruction including one or more digital twins of one or more objects in the medical environment.
  • the data processing system 130 can overlay, on the graphical user interface with the digital twin, a heatmap corresponding to the pattern of motion in the medical environment during the medical session.
  • the heatmap of motion (or motion heatmap) can visually depict an intensity or distribution of motion within the medical environment or specific region or area within the medical environment over a certain time interval.
  • the intensity can correspond to the degree of motion in a particular region or area within the medical environment.
  • the intensity can be based on factors such as speed, direction, or both.
  • the degree of motion can be determined from the 3D reconstruction or data stream 152 by the tracker 140.
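A minimal sketch of accumulating tracked floor positions into a normalized 2D motion heatmap for overlay; the room extent and bin size are illustrative assumptions:

```python
# Minimal sketch: bin tracked (x, y) floor positions into a 2D heatmap whose
# cell intensity reflects how much motion occurred in that region.
import numpy as np

def motion_heatmap(positions_xy, room_w_m=8.0, room_d_m=6.0, bin_m=0.25):
    bins = (int(room_w_m / bin_m), int(room_d_m / bin_m))
    heat, _, _ = np.histogram2d(positions_xy[:, 0], positions_xy[:, 1],
                                bins=bins,
                                range=[[0, room_w_m], [0, room_d_m]])
    return heat / max(heat.max(), 1.0)  # normalized intensity for rendering
```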
  • the data processing system 130 can allow for interaction with the analytics dashboard.
  • the user can interact with a metric, 3D reconstruction, video, or phase timeline to better evaluate various metrics at various phases or activities in the medical session.
  • the 3D viewer 176 can provide various perspectives or views of the medical environment, including a top view or different side views.
  • the user can zoom in or out of the 3D reconstruction using the 3D viewer 176, as well as pan through the 3D reconstruction.
  • the 3D reconstruction can include an annotation of the phase detected by the phase detector 138.
  • FIG. 2 depicts an example graphical user interface for presenting a 3D reconstruction.
  • the graphical user interface can be presented by one or more system or component depicted in FIG. 1, including for example, a data processing system 130 or 3D viewer 176.
  • the 3D viewer 176 depicted in FIG. 2 illustrates a 3D reconstruction 200 generated by 3D reconstructor 136.
  • the 3D reconstruction 200 depicted in FIG. 2 can include a representation of a surgeon 202 in a medical environment.
  • the 3D reconstruction 200 can include a robotic system 204, and other objects or equipment 206.
  • the robotic system 204 included in the 3D reconstruction can be a digital twin of the robotic medical system 125 in the actual medical environment 105.
  • the data processing system 130 can indicate a keep-out zone that can correspond to the adverse event location. Establishing a keep-out zone can prevent, or reduce the likelihood, of an adverse event occurring in the keep-out zone. To establish a keep-out zone, the data processing system 130 can use the patterns of motions determined by the tracker 140 to determine, predict, or otherwise identify potential adverse events, such as collisions or sterility breaches. For example, as depicted in FIG. 5B, the data processing system 130 can determine that an adverse event 575 can include a collision due to the first path 565 intersecting or colliding with the second path 570 during a same time interval or phase of the medical session.
  • the data processing system 130 can establish a keep-out zone to prevent, avoid, or otherwise reduce the likelihood of the occurrence of the adverse event.
  • the keep-out zone can include a geospatial zone.
  • the keep-out zone can include a temporal and geospatial zone.
  • the keep-out zone can prohibit personnel 530B depicted in FIG. 5B from traversing the second path 570 during a first time interval, while allowing personnel 530A to traverse the first path 565 during the first time interval.
  • the keep-out zone can prohibit the personnel 530A from traversing the first path 565 during a second time interval in which the personnel 530B is authorized to traverse the second path 570, thereby preventing or reducing the likelihood of the collision or adverse event at 575.
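A minimal sketch of a temporal-geospatial keep-out zone that authorizes only one person's path through a shared region per time interval; the data structure and schedule format are illustrative assumptions:

```python
# Minimal sketch: a keep-out zone that is both geospatial (a region of the
# room frame) and temporal (a per-interval authorization schedule).
from dataclasses import dataclass

@dataclass
class KeepOutZone:
    x_range: tuple  # (x_min, x_max) in the room frame, meters
    y_range: tuple  # (y_min, y_max)
    schedule: dict  # time-interval index -> personnel id allowed inside

    def permits(self, person_id, x, y, interval):
        inside = (self.x_range[0] <= x <= self.x_range[1]
                  and self.y_range[0] <= y <= self.y_range[1])
        return (not inside) or self.schedule.get(interval) == person_id

zone = KeepOutZone((2.0, 3.5), (1.0, 2.5), {0: "530A", 1: "530B"})
assert zone.permits("530A", 2.5, 1.5, interval=0)      # path 565 allowed now
assert not zone.permits("530B", 2.5, 1.5, interval=0)  # path 570 deferred
```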
  • FIG. 3 depicts an example method of machine learning-based spatial layout optimization via 3D reconstructions.
  • the method 300 can be performed by one or more system or component depicted in FIG. 1 or FIG. 6, including, for example, a data processing system.
  • the method 300 can include the data processing system receiving sensor data.
  • the data processing system can receive sensor data from a set of 3D sensors located in the medical environment.
  • the 3D sensors can be located in and around the medical environment.
  • the 3D sensors can capture optical, visual, infrared, or depth information in the medical environment.
  • the 3D sensors can continuously capture data, or capture data based on a time interval, sample rate, or other frequency. For example, the 3D sensors can capture data at a frame rate of 1 Hz, 2 Hz, 3 Hz, 10 Hz, 16 Hz, 20 Hz, 24 Hz, 40 Hz, 50 Hz, or other frame rate or sample rate.
  • the data processing system can generate a 3D reconstruction.
  • the data processing system can generate the 3D reconstruction of the layout or topology of the medical environment using the 3D sensor data.
  • the 3D reconstruction can include or indicate a layout or topology of the medical environment, which can include personnel, objects, or equipment.
  • the 3D reconstruction can include digital twin representation of certain objects.
  • An example 3D reconstruction is illustrated in FIG. 2.
  • the data processing system can identify a phase of the medical session.
  • the data processing system can identify the phase of the medical session based on the data stream received from various data capture devices.
  • the data processing system can determine the phase of the medical session based on the 3D reconstruction.
  • the data processing system can utilize a machine learning model.
  • the data processing system can determine operative or non-operative phases.
  • the data processing system can determine phases at various levels of granularity or resolution. For example, the data processing system can determine the phase as one of pre-operative, operative, or post-operative. In another example, the data processing system can determine the phase as one of room preparation, robot setup, a medical session, turn over, or cleaning. In another example, the data processing system can determine a phase of the medical session, such as exposure, dissection, transection, reconstruction, or extraction.
  • the data processing system can determine a metric indicative of a quality of the medical environment.
  • the metric can be indicative of quality of the spatial layout of the medical environment during the detected phase.
  • the metric can be based on the phase, or the performance can be based on comparing the metric with a threshold established for the detected phase.
  • the metric can be normalized based on phase. For example, if the metric corresponds to a count of personnel, then the metric can be normalized using a baseline count of personnel established for each phase using historical data.
  • the metric can indicate performance based on a particular phase.
  • the metric of performance can correspond to an intensity of motion, which can be normalized based on an average intensity of motion in each phase established from historical data.
  • the data processing system can determine an action.
  • the data processing system can determine, identify, or select an action to perform based on the metric and the phase.
  • the data processing system can determine that the metric indicates a satisfactory level of performance, and determine to not change the layout or provide an instruction or notification to maintain the same layout throughout the phase or subsequent phases of the workflow associated with the medical session.
  • the data processing system can determine, based on a comparison of the metric with a threshold, to perform an action to improve the performance of the layout of the medical environment.
  • the data processing system can perform an action.
  • the data processing system can provide an instruction, alert, warning, or other indication to cause a change in the layout of the medical environment to improve the performance metric.
  • the data processing system can provide guidance to mitigate or reduce the likelihood of an adverse event.
  • the data processing system can provide guidance to adjust the spatial layout to be closer or more in alignment with an optimal layout.
  • FIG. 4 depicts an example method of machine learning-based spatial layout optimization via 3D reconstructions.
  • the method 400 can be performed by one or more system or component depicted in FIG. 1 or FIG. 6, including, for example, a data processing system.
  • the method 400 can include one or more ACTs of process 300, including, for example, the data processing system receiving sensor data at ACT 302, generating a 3D reconstruction at ACT 304, identifying a phase of a medical session at ACT 306, and determining a metric indicative of quality at ACT 308.
  • the method 400 can proceed to decision block 410.
  • the data processing system can determine whether to perform an action based on the metric. For example, the data processing system can compare the metric with a threshold. The threshold can be selected based on the type of metric and phase identified at ACT 306. The threshold can be selected based on metadata associated with the medical environment or medical session, such as type of medical session, geographic location of medical environment, or surgeon identifier. In some cases, the data processing system can determine to perform an action if the metric is greater than or equal to the threshold. In some cases, the data processing system can determine to perform the action if the metric is less than or equal to the threshold. In some cases, the data processing system can select the action based on the amount of difference between the metric and the threshold, percentage difference, or type of metric.
  • If the data processing system determines to perform an action at decision block 410, the data processing system can proceed to ACT 414 to perform a selected action. If, however, the data processing system determines to not perform an action at decision block 410 based on the metric, the data processing system can proceed to decision block 412.
  • the data processing system can make a prediction with regard to an adverse event. If the data processing system predicts that an adverse event may occur based on the current 3D reconstruction of the layout and patterns of motions, the data processing system can proceed to ACT 414 to select and perform an action to prevent, mitigate, or reduce the likelihood of occurrence of the adverse event. If, however, the data processing system does not predict that an adverse event is going to occur in a given phase, subsequent phase, or throughout the workflow, then the data processing system can return to ACT 302 to collect sensor data. Adverse events can include, for example, collisions, sterility breaches, or inefficient motions.
  • FIG. 5A depicts an example medical environment 105, in accordance with embodiments.
  • the medical environment 105 can refer to or include a surgical environment or surgical system.
  • the medical environment 105 can include a robotic medical system 125, a user control system 510, and an auxiliary system 515 communicatively coupled one to another.
  • the medical environment 105 can include a visualization tool 520 (e.g., the visualization tool 170).
  • the auxiliary system 515 can be connected to the robotic medical system 125.
  • the visualization tool 520 can be considered connected to the robotic medical system 125 (e.g., by way of the auxiliary system 515).
  • the visualization tool 520 can additionally or alternatively be directly connected to the robotic medical system 125.
  • the medical environment 105 can be used to perform a computer-assisted medical procedure with a patient 525.
  • the surgical team can include a surgeon 530A and additional medical personnel 530B-530D, such as a medical assistant, nurse, and anesthesiologist, and other suitable team members who can assist with the surgical procedure or medical session.
  • the medical session can include the surgical procedure being performed on the patient 525, as well as any pre-operative processes (e.g., setup of the medical environment 105, including preparation of the patient 525 for the procedure), post-operative processes (e.g., clean-up or post-care of the patient), or other processes during the medical session.
  • the medical environment 105 can be implemented in a non-surgical procedure, or other types of medical procedures or diagnostics that can benefit from the accuracy and convenience of the surgical system.
  • the robotic medical system 125 can include a plurality of manipulator arms 535A-535D to which a plurality of medical tools (e.g., the medical tool 120) can be coupled or installed.
  • Each medical tool can be any suitable surgical tool (e.g., a tool having tissue-interaction functions), imaging device (e.g., an endoscope, an ultrasound tool, etc.), sensing instrument (e.g., a force-sensing surgical instrument), diagnostic instrument, or other suitable instrument that can be used for a computer-assisted surgical procedure on the patient 525 (e.g., by being at least partially inserted into the patient and manipulated to perform a computer-assisted surgical procedure on the patient).
  • While the robotic medical system 125 is shown as including four manipulator arms (e.g., the manipulator arms 535A-535D), in other embodiments the robotic medical system can include more or fewer than four manipulator arms. Further, not all manipulator arms may have a medical tool installed thereto at all times of the medical session. Moreover, in some embodiments, a medical tool installed on a manipulator arm can be replaced with another medical tool as suitable.
  • One or more of the manipulator arms 535A-535D or the medical tools attached to manipulator arms can include one or more displacement transducers, orientational sensors, positional sensors, or other types of sensors and devices to measure parameters or generate kinematics information.
  • One or more components of the medical environment 105 can be configured to use the measured parameters or the kinematics information to track (e.g., determine poses of) or control the medical tools, as well as anything connected to the medical tools or the manipulator arms 535A-535D.
  • the user control system 510 can be used by the surgeon 530A to control (e.g., move) one or more of the manipulator arms 535A-535D or the medical tools connected to the manipulator arms.
  • the user control system 510 can include a display (e.g., the display 172) that can provide the surgeon 530A with imagery (e.g., high-definition 3D imagery) of a surgical site associated with the patient 525 as captured by a medical tool (e.g., the medical tool 120, which can be an endoscope) installed to one of the manipulator arms 535A-535D.
  • the user control system 510 can include a stereo viewer having two or more displays where stereoscopic images of a surgical site associated with the patient 525 and generated by a stereoscopic imaging system can be viewed by the surgeon 530A. In some embodiments, the user control system 510 can also receive images from the auxiliary system 515 and the visualization tool 520.
  • the surgeon 530A can use the imagery displayed by the user control system 510 to perform one or more procedures with one or more medical tools attached to the manipulator arms 535A-535D.
  • the user control system 510 can include a set of controls. These controls can be manipulated by the surgeon 530A to control movement of the manipulator arms 535A-535D or the medical tools installed thereto.
  • the controls can be configured to detect a wide variety of hand, wrist, and finger movements by the surgeon 530A to allow the surgeon to intuitively perform a procedure on the patient 525 using one or more medical tools installed to the manipulator arms 535A-535D.
  • the auxiliary system 515 can include one or more computing devices configured to perform processing operations within the medical environment 105.
  • the one or more computing devices can control or coordinate operations performed by various other components (e.g., the robotic medical system 125, the user control system 510) of the medical environment 105.
  • a computing device included in the user control system 510 can transmit instructions to the robotic medical system 125 by way of the one or more computing devices of the auxiliary system 515.
  • the auxiliary system 515 can receive and process image data representative of imagery captured by one or more imaging devices (e.g., medical tools) attached to the robotic medical system 125, as well as other data stream sources received from the visualization tool 520.
  • one or more image capture devices can be located within the medical environment 105. These image capture devices can capture images from various viewpoints within the medical environment 105. These images (e.g., video streams) can be transmitted to the visualization tool 520, which can then pass those images through to the auxiliary system 515 as a single combined data stream (a minimal sketch of one such frame-combining step follows the items below). The auxiliary system 515 can then transmit the single video stream (including any data stream received from the medical tool(s) of the robotic medical system 125) for presentation on a display (e.g., the display 172) of the user control system 510.
  • the auxiliary system 515 can be configured to present visual content (e.g., the single combined data stream) to other team members (e.g., the medical personnel 530B-530D) who may not have access to the user control system 510.
  • the auxiliary system 515 can include a display 640 configured to display one or more user interfaces, such as images of the surgical site, information associated with the patient 525 or the surgical procedure, or any other visual content (e.g., the single combined data stream).
  • the display 640 can be a touchscreen display or include other features to allow the medical personnel 530A-530D to interact with the auxiliary system 515.
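For purposes of illustration only, the following is a minimal, non-limiting sketch (in Python, using NumPy) of one way a visualization tool could combine frames from several image capture devices into a single combined frame before forwarding a single data stream; the function name, frame sizes, and 2x2 tiling layout are hypothetical assumptions rather than an actual implementation of the visualization tool 520:

    import numpy as np

    def combine_frames(frames, rows=2, cols=2):
        """Tile equally sized HxWxC frames into one combined frame."""
        h, w, c = frames[0].shape
        combined = np.zeros((rows * h, cols * w, c), dtype=frames[0].dtype)
        for idx, frame in enumerate(frames[:rows * cols]):
            r, col = divmod(idx, cols)
            combined[r * h:(r + 1) * h, col * w:(col + 1) * w] = frame
        return combined

    # Example: four 480x640 RGB frames become one 960x1280 combined frame
    frames = [np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(4)]
    single_frame = combine_frames(frames)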
  • the robotic medical system 125, the user control system 510, and the auxiliary system 515 can be communicatively coupled to one another in any suitable manner.
  • the robotic medical system 125, the user control system 510, and the auxiliary system 515 can be communicatively coupled by way of control lines 645, which can represent any wired or wireless communication link as can serve a particular implementation.
  • the robotic medical system 125, the user control system 510, and the auxiliary system 515 can each include one or more wired or wireless communication interfaces, such as one or more local area network interfaces, Wi-Fi network interfaces, cellular interfaces, etc.
  • the medical environment 105 can include other or additional components or elements that can be needed or considered desirable to have for the medical session for which the surgical system is being used.
  • FIG. 5B depicts an example graphical user interface depicting a layout of a medical environment overlayed with patterns of motion.
  • the paths depicted in FIG. 5B can be identified and overlayed by one or more systems or components depicted in FIG. 1, including, for example, the data processing system 130.
  • the illustration depicted in FIG. 5B can correspond to a 3D reconstruction of the topology or layout of the medical environment 105.
  • the data processing system can overlay an indication of an ingress or egress 560, such as a door to the medical environment.
  • the data processing system can overlay one or more paths, such as a first path 565 and second path 570.
  • the first path 565 and second path 570 can represent a pattern of motion.
  • the first path 565 and second path 570 can have a weight or intensity based on a degree of motion along the respective path.
  • the first path and second path can be associated with a motion heatmap.
  • the data processing system can identify an adverse event 575 corresponding to a collision along the first path 565 and second path 570 between a first surgeon 530B and a second surgeon 530A. For example, during a particular phase, the data processing system can predict that the surgeon 530A will traverse the first path 565 from the user control system 510 to the manipulator arm 535A. During the same phase, the data processing system can predict that the surgeon 530B will traverse the second path 570. The data processing system can predict a likelihood of the adverse event 575 occurring based on the likelihood of the paths 565 and 570 being traversed during a same time interval or phase (a minimal sketch of one such overlap check follows the items below).
  • the data processing system can predict the adverse event as an inefficient movement or motion, such as one of the surgeons idling, pausing, or deviating from one of the paths 565 or 570 for the purpose of avoiding the collision.
  • This deviation or idling can be predicted by the data processing system as an adverse event.
  • the data processing system can, responsive to detecting an adverse event, provide an alert, warning, notification or other indication to adjust a layout of the medical environment in order to improve the efficiency of the layout of the medical environment.
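For purposes of illustration only, the following is a minimal, non-limiting sketch (in Python) of one way the overlap of two predicted paths could be checked for a potential collision, as referenced in the items above; the path format, one-second time window, and one-meter distance threshold are hypothetical assumptions:

    import math

    def collision_risk(path_a, path_b, min_distance_m=1.0, window_s=1.0):
        """Each path is a list of (time_s, x_m, y_m) predicted samples.
        Returns the fraction of co-occurring samples closer than
        min_distance_m, a crude proxy for collision likelihood."""
        close = total = 0
        for ta, xa, ya in path_a:
            for tb, xb, yb in path_b:
                if abs(ta - tb) <= window_s:
                    total += 1
                    if math.hypot(xa - xb, ya - yb) < min_distance_m:
                        close += 1
        return close / total if total else 0.0

    # Two personnel paths that cross near the same point at about t = 5 s
    path_1 = [(t, 0.5 * t, 1.0) for t in range(10)]
    path_2 = [(t, 5.0 - 0.5 * t, 1.0) for t in range(10)]
    print(collision_risk(path_1, path_2))  # nonzero indicates overlapping traffic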
  • FIG. 6 is a block diagram depicting an architecture for a computer system 600 that can be employed to implement elements of the systems and methods described and illustrated herein, including aspects of the systems depicted in FIG. 1, FIG. 5A, or FIG. 5B, the user interfaces depicted in FIG. 2 and FIG. 5B, and the methods or processes depicted in FIGS. 3 and 4.
  • the data processing system 130, robotic medical system 125, or client device 174 can include one or more component or functionality of computing system 600.
  • the computer system 600 can be any computing device used herein and can include or be used to implement a data processing system or its components.
  • the computer system 600 includes at least one bus 605 or other communication component or interface for communicating information between various elements of the computer system.
  • the computer system further includes at least one processor 610 or processing circuit coupled to the bus 605 for processing information.
  • the computer system 600 also includes at least one main memory 615, such as a random-access memory (RAM) or other dynamic storage device, coupled to the bus 605 for storing information and instructions to be executed by the processor 610.
  • the main memory 615 can be used for storing information during execution of instructions by the processor 610.
  • the computer system 600 can further include at least one read only memory (ROM) 620 or other static storage device coupled to the bus 605 for storing static information and instructions for the processor 610.
  • a storage device 625 such as a solid-state device, magnetic disk or optical disk, can be coupled to the bus 605 to persistently store information and instructions.
  • the computer system 600 can be coupled via the bus 605 to a display 630, such as a liquid crystal display or an active-matrix display, for displaying information.
  • An input device 635 such as a keyboard or voice interface can be coupled to the bus 605 for communicating information and commands to the processor 610.
  • the input device 635 can include a touch screen display (e.g., the display 630).
  • the input device 635 can include sensors to detect gestures.
  • the input device 635 can also include a cursor control, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to the processor 610 and for controlling cursor movement on the display 630.
  • the processes, systems and methods described herein can be implemented by the computer system 600 in response to the processor 610 executing an arrangement of instructions contained in the main memory 615. Such instructions can be read into the main memory 615 from another computer-readable medium, such as the storage device 625. Execution of the arrangement of instructions contained in the main memory 615 causes the computer system 600 to perform the illustrative processes described herein. One or more processors in a multiprocessing arrangement can also be employed to execute the instructions contained in the main memory 615. Hard-wired circuitry can be used in place of or in combination with software instructions together with the systems and methods described herein. Systems and methods described herein are not limited to any specific combination of hardware circuitry and software.
  • the processor 610 can execute one or more instructions associated with the system 100.
  • the processor 610 can include an electronic processor, an integrated circuit, or the like including one or more of digital logic, analog logic, digital sensors, analog sensors, communication buses, volatile memory, nonvolatile memory, and the like.
  • the processor 610 can include, but is not limited to, at least one microcontroller unit (MCU), microprocessor unit (MPU), central processing unit (CPU), graphics processing unit (GPU), physics processing unit (PPU), embedded controller (EC), or the like.
  • the processor 610 can include, or be associated with, a memory 615 operable to store or storing one or more non-transitory computer-readable instructions for operating components of the system 100 and operating components operably coupled to the processor 610.
  • the one or more instructions can include at least one of firmware, software, hardware, operating systems, or embedded operating systems, for example.
  • the processor 610 or the system 100 generally can include at least one communication bus controller to effect communication between the system processor and the other elements of the system 100.
  • the memory 615 can include one or more hardware memory devices to store binary data, digital data, or the like.
  • the memory 615 can include one or more electrical components, electronic components, programmable electronic components, reprogrammable electronic components, integrated circuits, semiconductor devices, flip flops, arithmetic units, or the like.
  • the memory 615 can include at least one of a non-volatile memory device, a solid-state memory device, a flash memory device, a NAND memory device, a volatile memory device, etc.
  • the memory 615 can include one or more addressable memory regions disposed on one or more physical memory arrays.
  • any two components so associated can also be viewed as being “operably connected,” or “operably coupled,” to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being “operably couplable,” to each other to achieve the desired functionality.
  • examples of “operably couplable” components include, but are not limited to, physically mateable or physically interacting components, wirelessly interactable or wirelessly interacting components, or logically interacting or logically interactable components.

Abstract

Machine learning-based spatial layout optimization via 3-dimensional (3D) reconstructions is provided. A system receives sensor data associated with a medical session and captured by a set of sensors situated in a medical environment. The system generates a three-dimensional (3D) reconstruction of the medical environment based on the sensor data. The system identifies a phase of the medical session based on the 3D reconstruction and via one or more models. The system determines a metric indicative of quality of a spatial layout of the medical environment during the phase via the one or more models. The system determines an action based on the metric. The system performs the action.

Description

SYSTEMS AND METHODS OF MACHINE LEARNING BASED SPATIAL LAYOUT OPTIMIZATION VIA 3D RECONSTRUCTIONS
CLAIM OF PRIORITY
[0001] This application claims priority to U.S. Patent Application No. 63/590,700, filed October 16, 2023, the full disclosure of which is incorporated herein in its entirety.
BACKGROUND
[0002] Medical procedures can be performed in an operating room. As the amount and variety of equipment in the operating room increases, or medical procedures become increasingly complex, it can be challenging to perform such medical procedures efficiently, reliably, or without incident.
SUMMARY
[0003] Technical solutions disclosed herein are generally related to machine learning based spatial layout optimization via 3-dimensional (3D) reconstructions. The technology can capture 3D sensor data from a medical environment (e.g., a medical station or an operating room (OR)) from phases of a workflow of a medical procedure. The technology can use one or more machine learning models or functions to automatically detect the phase of the workflow, and reconstruct a 3D spatial layout and topology of the medical environment for each phase. The technology can provide a 3D viewer that can present a 3D reconstruction of the phase-based medical environment reconstruction. The system can evaluate workflow issues related to equipment placement and topology. The viewer can present the 3D reconstruction alongside, and in time synchronization with, the surgical video. The technology can evaluate the 3D reconstructions for each phase to trigger alerts or notifications, generate objective performance indicator (“OPI”) metrics, evaluate in-room traffic and motion, or create guidelines for an optimal medical environment layout. The evaluations can be based on, for example, one or more of medical environment workflow efficiency metrics (e.g., time duration of each activity), sequence of activities, motion patterns, people and equipment tracking functions, or historical data from a large, multi-institutional dataset. The technology can provide such alerts or notifications pre-operation (e.g., suggest an optimal layout), post-operation (e.g., identify inefficiencies), or real-time during the operation (e.g., suggest a change in layout responsive to detecting or predicting an event, such as a collision, sterility breach, or excessive or inefficient motion).
[0004] At least one aspect is directed to a system. The system can include one or more processors, coupled with memory. The one or more processors can receive sensor data associated with a medical session and captured by a set of sensors situated in a medical environment. The one or more processors can generate a three-dimensional (3D) reconstruction of the medical environment based on the sensor data. The one or more processors can identify a phase of the medical session based on the 3D reconstruction and via one or more models. In some cases, the one or more processors can identify the phase using a separate model trained with machine learning (e.g., a spatio-temporal video model) to detect activities without using the 3D reconstruction. In some cases, the one or more processors can utilize both the activity or phase detection model and the 3D reconstruction model. The one or more processors can determine a metric indicative of quality of a spatial layout of the medical environment during the phase via the one or more models (e.g., a performance model). The one or more processors can determine an action based on the metric. The one or more processors can perform the action.
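By way of a non-limiting illustration, the following minimal sketch (in Python) mirrors the sequence of operations described above; the model objects and method names (e.g., reconstruction_model.reconstruct) are hypothetical placeholders and do not describe an actual application programming interface of the disclosed system:

    def optimize_layout(sensor_frames, reconstruction_model, phase_model,
                        performance_model, action_model):
        # Generate the 3D reconstruction from the captured sensor data
        reconstruction = reconstruction_model.reconstruct(sensor_frames)
        # Identify the phase of the medical session (e.g., "robot_setup")
        phase = phase_model.identify(reconstruction)
        # Score the quality of the spatial layout during that phase
        metric = performance_model.score(reconstruction, phase)
        # Select and perform an action (e.g., an alert or a layout adjustment)
        action = action_model.select(metric, phase)
        action.perform()
        return reconstruction, phase, metric, action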
[0005] In some implementations, the one or more processors can determine the metric based on a count of personnel detected in the medical environment, a pattern of motion of the personnel or objects, a size of the medical environment, time elapsed in the phase or the medical session, occurrence of one or more adverse events, or the phase of the medical session.
[0006] The one or more processors can compare the metric with a threshold established for the phase of the medical session. The one or more processors can determine the action based on the comparison of the metric with the threshold. The one or more processors can determine the threshold based on metadata of the medical environment, the metadata indicating at least one of a type of medical environment, a geographic location of the medical environment, or an age of the medical environment.
[0007] In some cases, the phase of the medical session can occur prior to a medical procedure in the medical session. The one or more processors can determine, during the phase, that the spatial layout of the medical environment in the 3D reconstruction varies from a baseline layout of medical environments established based on historical data associated with medical procedures similar to the medical procedure. The one or more processors can provide the action comprising an instruction to adjust the spatial layout to conform with the baseline layout prior to performance of the medical procedure in the medical session.
[0008] The one or more processors can determine the action based on the metric, and input the metric into an action model trained with machine learning on historical data to determine the action to perform. The action can include to adjust at least one of a height, orientation, or location of an object located in the medical environment. The action can include to automatically control an object located in the medical environment. The action can include to provide at least one of an indication of the metric, an alert, or a guideline for an optimal spatial layout.
[0009] The one or more processors can detect an occurrence of an adverse event in the medical environment during the medical session. The one or more processors can perform the action by providing an indication of the occurrence of the adverse event.
[0010] The one or more processors can predict an adverse event in the medical environment during the medical session. The one or more processors can predict the adverse event based on at least one of the pattern of motion in the 3D reconstruction, the phase of the medical session, or user input received during the medical session. The adverse event can include at least one of a collision, a sterility breach, or an inefficient motion.
[0011] The one or more processors can, to perform the action, provide an indication of the prediction of the adverse event prior to occurrence of the adverse event. The one or more processors can, to perform the action, provide, prior to occurrence of the adverse event and responsive to the prediction, guidance to mitigate a likelihood of occurrence of the adverse event. The one or more processors can, to perform the action, disable, prior to occurrence of the adverse event and responsive to the prediction, a function of a robotic medical system used to perform a medical procedure within the medical session. The one or more processors can disable the function for at least one of a predetermined time interval or until detection of a predetermined event.
[0012] The one or more processors can determine, subsequent to performance of the action, a second metric indicative of quality of the spatial layout of the medical environment in the phase of the medical session. The one or more processors can determine, based at least in part on the second metric, an improvement in the quality of the spatial layout of the medical environment.
[0013] In some cases, the sensor data can include 3D point cloud data received from the set of sensors. The one or more processors can register, via a function, the 3D point cloud data from the set of sensors to a common coordinate system to generate the 3D reconstruction. In some cases, the sensor data include intensity data and depth data. The sensor data can be multimodal sensor data, such as RGB, depth, intensity, thermal camera, WiFi-signal data, or data from a robotic medical system. The one or more processors can determine the pattern of motion as corresponding to movement of personnel in the medical environment during the medical session. The one or more processors can overlay, on a graphical user interface, the pattern of motion on the 3D reconstruction.
[0014] The one or more processors can establish, within the 3D reconstruction, a digital twin of an object associated with performance of the medical session in the medical environment. The one or more processors can overlay, on the graphical user interface with the digital twin, a heatmap corresponding to the pattern of motion in the medical environment during the medical session.
[0015] The one or more processors can train the reconstruction model with machine learning using training data comprising at least one of labeled sets of 3D reconstructions, historical 3D point cloud data obtained from one or more medical environments, computer aided drawings, blueprints, multi-modal sensor data, or video stream data obtained from a robotic medical system configured to perform at least a portion of the medical session.
[0016] The one or more processors can train the performance model with machine learning using training data comprising at least one of historical metrics indicative of quality of medical environments, outcomes of medical sessions, or 3D reconstructions. The historical metrics can include, for example, quality of the medical environment (e.g., safety, efficiency, or performance), outcomes of medical sessions, cost efficiency (e.g., the amount of space the medical equipment or environment occupies in a facility or hospital as compared to the amount of space that is available or the amount of space that is wasted).
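For purposes of illustration only, the cost-efficiency notion above can be expressed as short, non-limiting arithmetic (in Python); the equipment footprints and available floor area are hypothetical example values:

    footprints_m2 = {"robotic_medical_system": 4.5,
                     "operating_table": 3.0,
                     "anesthesia_machine": 1.2}
    available_m2 = 40.0

    occupied_m2 = sum(footprints_m2.values())  # 8.7 square meters occupied
    utilization = occupied_m2 / available_m2   # ~0.22 of the available space used
    unused_fraction = 1.0 - utilization        # remaining, potentially wasted, space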
[0017] The one or more processors can train the action model with machine learning using training data comprising at least one of simulated actions performed in 3D reconstructions or historical event logs of medical sessions performed via one or more robotic medical systems. The training data can include data or knowledge provided or established by experts that indicates or describes an appropriate action configured to resolve or mitigate an issue related to a particular spatial layout.
[0018] An aspect can be directed to a method. The method can be performed by one or more processors coupled with memory. The method can include the one or more processors receiving sensor data associated with a medical session and captured by a set of sensors situated in a medical environment. The method can include the one or more processors generating a three dimensional (3D) reconstruction of the medical environment based on the sensor data. The method can include the one or more processors identifying a phase of the medical session based on the 3D reconstruction and via one or more models (e.g., a multimodal model or a reconstruction model). The method can include the one or more processors determining a metric indicative of quality of a spatial layout of the medical environment during the phase via the one or more models (e.g., the multi-modal model or a performance model). The method can include the one or more processors determining an action based on the metric. The method can include the one or more processors performing the action.
[0019] An aspect can be directed to a non-transitory computer-readable medium storing processor executable instructions that, when executed by one or more processors, cause the one or more processors to receive sensor data associated with a medical session and captured by a set of sensors situated in a medical environment. The instructions can cause the one or more processors to generate a three dimensional (3D) reconstruction of the medical environment based on the sensor data. The instructions can cause the one or more processors to identify a phase of the medical session based on the 3D reconstruction and via one or more models. The instructions can cause the one or more processors to determine a metric indicative of quality of a spatial layout of the medical environment during the phase via the one or more models. The instructions can cause the one or more processors to determine an action based on the metric. The instructions can cause the one or more processors to perform the action.
[0020] These and other aspects and implementations are discussed in detail below. The foregoing information and the following detailed description include illustrative examples of various aspects and implementations and provide an overview or framework for understanding the nature and character of the claimed aspects and implementations. The drawings provide illustration and a further understanding of the various aspects and implementations and are incorporated in and constitute a part of this specification. The foregoing information and the following detailed description and drawings include illustrative examples and should not be considered as limiting.
BRIEF DESCRIPTION OF THE DRAWINGS
[0021] The accompanying drawings are not intended to be drawn to scale. Like reference numbers and designations in the various drawings indicate like elements. For purposes of clarity, not every component can be labeled in every drawing. In the drawings:
[0022] FIG. 1 depicts an example system of machine learning-based layout optimization via 3D reconstructions.
[0023] FIG. 2 depicts an example graphical user interface for presenting a 3D reconstruction.
[0024] FIG. 3 depicts an example process of machine learning-based spatial layout optimization via 3D reconstructions.
[0025] FIG. 4 depicts an example process of machine learning-based spatial layout optimization via 3D reconstructions.
[0026] FIG. 5A depicts an example medical environment.
[0027] FIG. 5B depicts an example graphical user interface depicting a layout of a medical environment overlayed with patterns of motion.
[0028] FIG. 6 is a block diagram depicting an architecture for a computer system that can be employed to implement elements of the systems and methods described and illustrated herein, including aspects of the system depicted in FIG. 1, the user interface depicted in FIG. 2 and FIG. 5B, and the methods or processes depicted in FIGS. 3 and 4.
DETAILED DESCRIPTION
[0029] Following below are more detailed descriptions of various concepts related to, and implementations of, methods, apparatuses, and systems of machine learning-based spatial layout optimization via 3D reconstructions. The various concepts introduced above and discussed in greater detail below can be implemented in any of numerous ways.
[0030] Although the present disclosure is discussed in the context of a surgical procedure, in some embodiments, the present disclosure can be applicable to other medical sessions, environments, or activities, as well as non-medical activities where removal of irrelevant information is desired.
[0031] Technical solutions described herein are generally directed to machine learning based spatial layout optimization via 3-dimensional (3D) reconstructions. The technology can capture 3D sensor data of a medical environment (e.g., operating room (OR)) from phases of a medical session, which can include a medical procedure or a workflow. The technology can use one or more machine learning models or functions to automatically detect the phase of the medical session, and reconstruct a 3D room layout and topology of the medical environment for each phase. The technology can reconstruct the 3D room layout using 3D point cloud data from multiple sensors in order to create a realistic and human-viewable 3D reconstruction or model with the relevant or desired medical environment equipment in a representative location in the 3D reconstruction. To do so, the technology can invoke a registration function configured to convert, translate, or port the 3D data into a single coordinate system.
[0032] The technology can provide a 3D viewer that can present a 3D reconstruction of the phase-based medical environment reconstruction. The system can evaluate workflow issues related to equipment placement and topology. The viewer can present the 3D reconstruction alongside, and in time synchronization with, the surgical video. The 3D reconstructions can include an indication or representation of the equipment in the medical environment, including, for example, a digital twin of the equipment. The digital twin can be obtained from a data repository or other storage, and be preconfigured with characteristics, settings or constraints that can correspond to the equipment, such as dimensions, orientation, movements, or adjustability.
[0033] The technology can evaluate (e.g., based on motion patterns, people and equipment tracking algorithms, or historical data from a large, multi-institutional dataset) the 3D reconstructions for each phase to trigger alerts or notifications, generate objective performance indicator (“OPI”) metrics, evaluate in-room traffic and motion, or create guidelines for an optimal spatial layout. The technology can provide such alerts or notifications pre-op (e.g., suggest an optimal layout), post-op (e.g., identify inefficiencies), or real-time during the operation (e.g., suggest a change in layout responsive to detecting or predicting an event, such as a collision, sterility breach, or excessive or inefficient motion).
[0034] Thus, the technical solutions of this disclosure can provide a 3D spatial layout that can be linked to specific phases of a medical session. The phases of the medical session can include room preparation, robot setup, a surgical or medical procedure, turn over, or cleaning, for example. The technology can associate the 3D room layout with positive or negative outliers in data associated with an entity, organization, site, medical environment, OR, or other category. To do so, the technology can evaluate or analyze non-operative metrics from datasets of one or more entities to create a distribution and identify outliers and trends in the distribution. A spatial layout associated with positive and negative outliers can be presented or provided via an application in order to facilitate making adjustments or modifications to the spatial layout. By providing the spatial layout associated with positive and negative outliers via an application, this technology can facilitate establishing or maintaining a standard in spatial layout, and the discovery of optimal or highly performing layouts for an entity, organization, or other institution, while addressing inefficiencies or issues in the layout that is associated with outliers.
[0035] The technology, using the 3D reconstruction of the medical environment and machine learning models, can extract a spatial layout that is configured to improve the medical session workflow in terms of non-operative metrics. The technology can present the extracted layouts via an application, display device, or web-based graphical user interface in order to facilitate making adjustments or modifications to the layout. To do so, the technology can invoke or execute tracking functions configured to determine motion patterns with respect to a spatial layout. Motion patterns can refer to or include the movement of personnel or equipment in a medical environment room during various phases of a medical session, including, for example, room preparation, robot setup, during a medical procedure, turn over, or cleaning. The technology can overlay the traffic patterns on top of the 3D reconstruction of the room layout. The technology can overlay traffic patterns over one another to illustrate potential adverse events, such as collisions between personnel or objects. The technology can provide a heatmap or other graphical user element that indicates an intensity of movement along a particular path. The heatmap can indicate a count of personnel traversing a particular path through the medical environment, an amount of time personnel spend traversing a particular path through the medical environment, an average speed or rate of movement along a path, a size of objects or equipment that traverse the path, or types of objects that traverse a path (e.g., personnel or equipment). Thus, using the overlayed traffic patterns, the technology can identify or predict issues related to collision of equipment, personnel, or sterility breach issues that may be a result of the layout of the medical environment.
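For purposes of illustration only, the following is a minimal, non-limiting sketch (in Python, using NumPy) of one way a motion heatmap could be computed from tracked floor positions before being overlayed on the 3D reconstruction; the room extent and bin counts are hypothetical assumptions:

    import numpy as np

    def motion_heatmap(positions, room_m=(10.0, 8.0), bins=(50, 40)):
        """positions: iterable of (x_m, y_m) samples tracked over a phase."""
        xs, ys = zip(*positions)
        heat, _, _ = np.histogram2d(
            xs, ys, bins=bins, range=[[0.0, room_m[0]], [0.0, room_m[1]]])
        return heat / heat.max()  # normalize so 1.0 marks the busiest cell

    # Example: samples concentrated along a path between two stations
    heat = motion_heatmap([(1.0 + 0.1 * i, 2.0) for i in range(50)])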
[0036] The technology can leverage historical data from a large multi-institutional data set to extract or generate spatial layouts that are associated with efficient medical or surgical workflows for each type of procedure. The technology can facilitate adjustments to the layout or provide guidance prior to an operation or medical procedure (e.g., pre-op), subsequent to or after a medical procedure (e.g., post-op), or real-time guidance during a medical session. Real-time guidance can include, for example, a notification, alert, or action to change a layout in response to detecting or predicting an adverse event (e.g., collision or sterility breach), or detecting an inefficiency (e.g., excessive motion), which may indicate a deviation from an optimal layout.
[0037] FIG. 1 depicts an example system 100 for machine learning-based spatial layout optimization using 3D reconstructions, in accordance with implementations. The system 100 can be associated with a medical environment 105. The medical environment 105 can be a medical surgical environment. A medical surgical environment can include a surgical facility, such as an operating room, in which a surgical procedure, whether invasive, non-invasive, in-patient, or out-patient, can be performed on a patient. The system 100 can be associated with other types of medical sessions or activities, or non-medical environments that can require removal of non-surgical information from a data stream captured from that environment. The system 100 can include one or more data capture devices 110. Data capture devices 110 can collect images, kinematics data, or system events, for example. For example, a data capture device can include an image capture device designed, constructed, and operational to capture images from a particular viewpoint within the medical environment 105. The data capture devices 110 can be positioned, mounted, or otherwise located to capture content from any viewpoint that facilitates the data processing system recognizing phases of a procedure.
[0038] For example, in some embodiments, a first data capture device can be positioned to capture one or more images of an area where a patient is located within the medical environment 105. A second data capture device 110 can be positioned to capture one or more images of an area where one or more medical professionals are located within the medical environment 105. A third data capture device 110 can be configured to capture one or more images of other designated areas within the medical environment 105. The data capture devices 110 can include any of a variety of sensors, cameras, video imaging devices, infrared imaging devices, visible light imaging devices, intensity imaging devices (e.g., black, color, grayscale imaging devices, etc.), depth imaging devices (e.g., stereoscopic imaging devices, time-of-flight imaging devices, etc.), medical imaging devices such as endoscopic imaging devices, ultrasound imaging devices, etc., non-visible light imaging devices, any combination or sub-combination of the above-mentioned imaging devices, or any other type of imaging devices that can be suitable for the purposes described herein.
[0039] The medical environment can include multiple data capture devices 110 (e.g., sensors), which can be referred to as a set of sensors. The data capture devices 110 can be referred to as or include 3D sensors, such as time-of-flight sensors. A time-of-flight (ToF) sensor can refer to or include a type of sensor that can be used to measure the distance between the sensor and an object based on the amount of time it takes for a light or sound signal to travel to the object and back to the sensor. ToF sensors can include light detection and ranging (e.g., lidar) sensors, which can use laser light to measure the time it takes for a laser pulse to bounce off an object and return to the sensor. The distance can be determined based on the time and the speed of the laser pulse (e.g., the speed of light). The sensors can include ToF cameras or depth cameras. ToF cameras can use infrared light sources, such as light-emitting diodes (LEDs), to emit a pulse of light and then measure the time it takes for the light to reflect off objects and return to the sensor. In some cases, the 3D sensor can include a dot projector, such as a device that emits a pattern of dots or points of light onto a surface or object. The 3D sensors can be used to create 3D depth maps or point clouds.
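By way of a non-limiting illustration, the round-trip relationship described above (distance equals signal speed multiplied by round-trip time, divided by two) can be expressed in Python as follows; the 20 ns example value is hypothetical:

    SPEED_OF_LIGHT_M_S = 299_792_458.0

    def tof_distance_m(round_trip_s, speed=SPEED_OF_LIGHT_M_S):
        # The signal travels to the object and back, so halve the product
        return speed * round_trip_s / 2.0

    print(tof_distance_m(20e-9))  # a 20 ns round trip corresponds to about 3.0 m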
[0040] The images that are captured by the data capture devices 110 can include still images, video images, vector images, bitmap images, other types of images, or combinations thereof. In some cases, one or more of the data capture devices 110 can be configured to capture other parameters (e.g., sound, motion, pressure, or temperature) within the medical environment 105. The data capture devices 110 can capture the images at any suitable predetermined capture rate or frequency. Other settings, such as zoom settings or resolution, of each of the data capture devices 110 can vary as desired to capture suitable images from a particular viewpoint. The data capture devices 110 can have fixed viewpoints, locations, positions, or orientations. The data capture devices 110 can be portable, or otherwise configured to change orientation or telescope in various directions. The data capture devices 110 can be part of a multi-sensor architecture including multiple sensors, with each sensor being configured to detect, measure, or otherwise capture a particular parameter (e.g., sound, images, or pressure).
[0041] The images captured by the data capture devices 110 can be sent as a data stream component to a visualization tool 170 or the data processing system 130. A data stream component can be considered any sequence of digital encoded data or analog data from a data source such as the data capture devices 110. The visualization tool 170 can be configured to receive a plurality of data stream components and combine the plurality of data stream components into a single data stream.
[0042] The visualization tool 170 can receive a data stream component from a medical tool 120. The medical tool 120 can be any type and form of tool used for surgery or medical procedures, or a tool in an operating room or environment that is associated with or has an image capture device. The medical tool 120 can be an endoscope for visualizing organs or tissues, for example, within a body of the patient. The medical tool 120 can include other or additional types of therapeutic or diagnostic medical imaging implements. The medical tool 120 can be configured to be installed in a robotic medical system 125.
[0043] The robotic medical system 125 can be a computer-assisted system configured to perform a surgical or medical procedure or activity on a patient via or using or with the assistance of one or more robotic components or medical tools. The robotic medical system 125 can include one or more manipulator arms that perform one or more computer-assisted medical tasks. The medical tool 120 can be installed on a manipulator arm of the robotic medical system 125 to perform a surgical task. The images (e.g., video images) captured by the medical tool 120 can be sent to the visualization tool 170. The robotic medical system 125 can include one or more input ports to receive direct or indirect connection of one or more auxiliary devices. For example, the visualization tool 170 can be connected to the robotic medical system 125 to receive the images from the medical tool when the medical tool is installed in the robotic medical system (e.g., on a manipulator arm of the robotic medical system). The visualization tool 170 can combine the data stream components from the data capture devices 110 and the medical tool 120 into a single combined data stream for presenting on a display 172 (e.g., display 630 depicted in FIG. 6). The display 172 can be associated with a client device 174, user control system or other type of display system, whether within the medical environment 105 or remote, to view the single combined data stream. The client device 174 can refer to or include a laptop computer, desktop computer, tablet, smartphone, portable computing device, or wearable device, for example.
[0044] The system 100 can include a data processing system 130. The data processing system 130 can be associated with the medical environment 105, or cloud-based. The data processing system 130 can include an interface 132 designed, constructed and operational to communicate with one or more component of system 100 via network 101, including, for example, the robotic medical system 125 or client device 174. The data processing system 130 can include a data collector 134 to capture or otherwise receive or obtain data from one or more component or system associated with the medical environment 105 or via the network 101, including, for example, one or more data capture devices 110 or a set of sensors. The data processing system 130 can include a 3D reconstructor 136 to generate a 3D reconstruction of the spatial layout of the medical environment 105. The data processing system 130 can include a phase detector 138 to identify or determine a phase associated with the medical session. The data processing system 130 can include a tracker 140 to identify a pattern of motion or otherwise track the movement of personnel or objects, based on a function 142, in the medical environment 105. The data processing system 130 can include a model generator 144 that can maintain, manage, operate, or otherwise provide one or more models for utilization by the data processing system 130. The data processing system 130 can include a performance controller 146 to determine OPIs associated with the quality, performance, or efficiency of a spatial layout of a medical environment 105 in which a medical session occurs. The performance controller 146 can include or utilize an event predictor 148 to identify or predict adverse events, such as collisions or sterility breaches, in the medical environment during the medical session. The performance controller 146 can, based on the OPI or predicted adverse event, provide an alert, warning, or notification that can improve the efficiency of the layout of the medical environment 105 or reduce the likelihood of, or prevent, the occurrence of the adverse event. In some cases, the performance controller 146 can automatically adjust a component, medical equipment, or object in the medical environment 105 to improve the quality, performance, or efficiency of the spatial layout in the medical environment 105.
[0045] The interface 132, data collector 134, 3D reconstructor 136, phase detector 138, tracker 140, model generator 144, performance controller 146, or event predictor 148 can each communicate with the data repository 150 or database. The data processing system 130 can include or otherwise access the data repository 150. The data repository 150 can include one or more data files, data structures, arrays, values, or other information that facilitates operation of the data processing system 130. The data repository 150 can include one or more local or distributed databases, and can include a database management system. The data repository 150 can include, maintain, or manage a data stream 152. The data stream 152 can be received by the data collector 134 and stored in the data repository 150. The data stream 152 can include or be formed from one or more of a video stream, event stream, or kinematics stream. The data stream 152 can include data collected by one or more data capture devices 110, such as a set of 3D sensors from a variety of angles or vantage points with respect to the procedure activity (e.g., point or area of surgery).
[0046] The event stream can include a stream of event data or information, such as packets, that identify or convey a state of the robotic medical system 125 or an event that occurred in association with the robotic medical system 125 or surgical or medical surgery being performed with the robotic medical system. Data of the event stream can be captured by the robotic medical system 125 or a data capture device 110. An example state of the robotic medical system 125 can indicate whether the medical tool 120 is installed on a manipulator arm of the robotic medical system or not, whether it was calibrated, or whether it was fully functional (e.g., without errors) during the procedure. For example, when the medical tool 120 is installed on a manipulator arm of the robotic medical system 125, a signal or data packet(s) can be generated indicating that the medical tool has been installed on the manipulator arm of the robotic medical system 125. The signal or data packet(s) can be sent to the data collector 134 as the event stream. Another example state of the robotic medical system 125 can indicate whether the visualization tool 170 is connected, whether directly to the robotic medical system or indirectly through another auxiliary system that is connected to the robotic medical system.
[0047] Kinematics stream data can refer to or include data associated with one or more of the manipulator arms or medical tools 120 attached to manipulator arms, which can be captured or detected by one or more displacement transducers, orientational sensors, positional sensors, or other types of sensors and devices to measure parameters or generate kinematics information. The kinematics data can include sensor data along with time stamps and an indication of the medical tool 120 or type of medical tool 120 associated with the sensor data.
[0048] The data repository 150 can include, store, maintain, or otherwise manage a 3D reconstruction 154. The 3D reconstruction 154 can be generated by the 3D reconstructor 136. The 3D reconstruction 154 can include a 3D layout of the medical environment 105 (e.g., operating room) in which the medical session takes place (e.g., where the medical procedure is performed or occurs). The 3D reconstruction 154 can be stored in various file formats, including, for example, a glTF/GLB file, an .OBJ file (e.g., wavefront object), a .USDZ/USD file, a PLY file (e.g., polygon file format), or other file format. The 3D reconstruction 154 can include information about a geometry (e.g., shape of the medical environment), appearance (e.g., color), scene (e.g., position of objects in the medical environment), or animations (e.g., movement of 3D objects in the medical environment). The file format can include a text format, comma separated files, data structures, or other formats.
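For purposes of illustration only, the following is a minimal, non-limiting sketch (in Python) of persisting a reconstructed point cloud in the ASCII PLY (polygon file format) mentioned above; only x, y, z vertices are written, whereas a full 3D reconstruction 154 could also carry color, faces, scene, or animation information:

    def write_ply(path, points):
        """points: iterable of (x, y, z) floats in a common coordinate system."""
        points = list(points)
        with open(path, "w") as f:
            f.write("ply\nformat ascii 1.0\n")
            f.write(f"element vertex {len(points)}\n")
            f.write("property float x\nproperty float y\nproperty float z\n")
            f.write("end_header\n")
            for x, y, z in points:
                f.write(f"{x} {y} {z}\n")

    write_ply("reconstruction.ply", [(0.0, 0.0, 0.0), (1.0, 2.0, 0.5)])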
[0049] The data repository 150 can include, store, manage, or otherwise maintain one or more models 156. The models 156 can refer to or include machine learning models. The models 156 can be trained, established, configured, updated, or otherwise provided by the model generator 144. The models 156 can be configured to identify, predict, classify, categorize, or otherwise score aspects of a medical procedure, medical environment 105, or operating room to facilitate machine learning-based layout optimization via 3D reconstructions. The data processing system 130 can utilize a single, multi-modal machine learning model, or multiple machine learning models. The multiple machine learning models can each be multi-modal, for example.
[0050] For example, the models 156 can include a reconstruction model designed, constructed and operational to generate a 3D reconstruction of a medical environment based on sensor data. The models 156 can include a performance model designed, constructed and operational to determine a metric or a value for a metric that indicates the quality of a spatial layout based on the 3D reconstruction of the medical environment. The performance model can predict or detect adverse events in the medical environment. The models 156 can include an action model designed, constructed and operational to identify, determine or select an action to perform in the medical environment based on the metric, value of the metric, quality of the medical environment, or other output from the performance model. The action model can determine an action to perform that can improve the quality of the spatial layout of the medical environment, or mitigates the likelihood of a predicted adverse event occurring in the medical environment. The models 156 can include a phase model designed, constructed and operational to determine, detect, identify, or predict a phase in a medical session based on input data (e.g., sensor data, multi-modal sensor data, the 3D reconstruction, event stream, or kinematics data). The models 156 can include a tracker model designed, constructed and operational to determine a pattern of motion during a medical session that occurs in a medical environment 105.
[0051] The data repository 150 can include, store, manage, or otherwise maintain a threshold 158. The threshold 158 can refer to or include a numerical value that can be used to determine whether a metric indicative of performance of a layout of an operating room generated by the performance controller 146 is satisfactory. For example, a metric for efficiency can be a length of a path between two objects in the layout of the operating room, or the amount of time taken to traverse between the two objects. The value for the metric for a medical procedure can be compared with the threshold for the metric to determine whether the value for the metric exceeds the threshold for the metric, in which case the data processing system 130 can determine that the layout is inefficient and perform an action to cause a change in the layout.
[0052] In some cases, the threshold data structure 158 can include or refer to a map of thresholds. The thresholds can map to one or more attributes, factors, or categories associated with a medical environment 105 or medical procedure. For example, the threshold can map to an institution in which the medical procedure is to be performed, a type of medical procedure, or a phase in the medical procedure. To evaluate a performance of the layout of the operating room for a particular phase in the medical procedure, the data processing system 130 can select the corresponding threshold via a lookup in the threshold data structure 158 using the detected phase.
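For purposes of illustration only, the following is a minimal, non-limiting sketch (in Python) of one form the threshold map and phase-based lookup described above could take; the keys, threshold values, and units are hypothetical assumptions:

    # Thresholds keyed by (institution, procedure type, phase), in minutes
    THRESHOLDS = {
        ("institution_a", "procedure_x", "room_preparation"): 15.0,
        ("institution_a", "procedure_x", "robot_setup"): 20.0,
    }

    def threshold_for(institution, procedure, phase, default=30.0):
        return THRESHOLDS.get((institution, procedure, phase), default)

    # Compare a measured phase-duration metric with the selected threshold
    metric = 24.5
    if metric > threshold_for("institution_a", "procedure_x", "robot_setup"):
        print("metric exceeds threshold; trigger a layout-change action")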
[0053] The data repository 150 can include, store, manage, or otherwise maintain phases 160 data. The phases 160 can refer to or include operative phases or non-operative phases (e.g., operating room related phases), such as room preparation, robot setup, performance of a medical procedure, turn over, or cleaning, for example. In some cases, phases 160 can refer to or include operative phases, such as exposure, dissection, transection, reconstruction, and extraction. Exposure can refer to or include the process of visualizing and accessing a surgical site by creating a clear and adequate field of view. Dissection can refer to or include cutting, separating, and removing tissues or anatomical structures to gain access to specific areas, identify structures, or perform surgical procedures. Transection can refer to or include severing or cutting a structure, such as a blood vessel, nerve, or organ, using a surgical instrument. Extraction can refer to or include the removal of a tissue, organ, foreign object, or other anatomical structure from the body. Reconstruction can refer to or include the process of restoring or rebuilding a damaged or missing tissue, organ, or body part, and can include techniques or tasks such as grafting, suturing, or using prosthetic materials to recreate the structure and restore form and function.
[0054] The data repository 150 can include, manage or maintain historical data 162. Historical data 162 can include prior video stream, event stream, or kinematic stream data. Historical data 162 can include data associated with a location of the medical environment 105, workflows, facilities, layouts in a current institution or other institutions, room layout performance metrics, or other information that can facilitate machine learning-based layout optimization via 3D reconstructions. Historical data 162 can include training data used to train or update the models 156 using machine learning.
[0055] The data repository 150 can include, manage or maintain a digital twin 164. The digital twin 164 can refer to or include a virtual representation of an object, such as equipment located in the medical environment 105. The object can include, for example, one or more of an operating table, hospital bed, surgical light, room light fixtures, surgical boom, surgical displays, documentation stations, operating room integration systems, blanket warmers, intravenous dispenser, scrub sinks, defibrillators, anesthesia machine, patient monitors, sterilizers, or EKG/ECG machines. The digital twin 164 of an object can include information about the object, such as dimensions, configurations, capabilities, position, orientation, or articulation.
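For purposes of illustration only, the following is a minimal, non-limiting sketch (in Python) of a record that could carry the per-object digital twin information listed above; the field names and example values are hypothetical assumptions:

    from dataclasses import dataclass

    @dataclass
    class DigitalTwin:
        name: str                        # e.g., "operating_table"
        dimensions_m: tuple              # (length, width, height)
        position_m: tuple = (0.0, 0.0, 0.0)
        orientation_deg: float = 0.0     # rotation about the vertical axis
        height_adjustable: bool = False  # an adjustability constraint

    table = DigitalTwin("operating_table", (2.0, 0.8, 0.9),
                        position_m=(3.0, 2.5, 0.0), height_adjustable=True)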
[0056] The data processing system 130 can interface with, communicate with, or otherwise receive information from or provide information to one or more components of the system 100 via the network 101, including, for example, the robotic medical system 125 or client device 174. The data processing system 130, robotic medical system 125, or client device 174 can each include at least one logic device, such as a computing device having a processor, to communicate via the network 101. The data processing system 130, robotic medical system 125, or client device 174 can include at least one computation resource, server, processor, or memory. For example, the data processing system 130 can include a plurality of computation resources or processors coupled with memory.
[0057] The data processing system 130 can be part of or include a cloud computing environment. The data processing system 130 can include multiple, logically-grouped servers and facilitate distributed computing techniques. The logical group of servers may be referred to as a data center, server farm or a machine farm. The servers can also be geographically dispersed. A data center or machine farm may be administered as a single entity, or the machine farm can include a plurality of machine farms. The servers within each machine farm can be heterogeneous - one or more of the servers or machines can operate according to one or more types of operating system platforms.
[0058] The data processing system 130, or components thereof can include a physical or virtual computer system operatively coupled, or associated with, the medical environment 105. In some embodiments, the data processing system 130, or components thereof can be coupled, or associated with, the medical environment 105 via a network 101, either directly or directly through an intermediate computing device or system. The network 101 can be any type or form of network. The geographical scope of the network can vary widely and can include a body area network (BAN), a personal area network (PAN), a local-area network (LAN) (e.g., Intranet), a metropolitan area network (MAN), a wide area network (WAN), or the Internet. The topology of the network 101 can assume any form such as point-to-point, bus, star, ring, mesh, tree, etc. The network 101 can utilize different techniques and layers or stacks of protocols, including, for example, the Ethernet protocol, the internet protocol suite (TCP/IP), the ATM (Asynchronous Transfer Mode) technique, the SONET (Synchronous Optical Networking) protocol, the SDH (Synchronous Digital Hierarchy) protocol, etc. The TCP/IP internet protocol suite can include application layer, transport layer, internet layer (including, e.g., IPv6), or the link layer. The network 101 can be a type of a broadcast network, a telecommunications network, a data communication network, a computer network, a Bluetooth network, or other types of wired and wireless networks.
[0059] The data processing system 130, or components thereof, can be located at least partially at the location of the surgical facility associated with the medical environment 105 or remotely therefrom. Elements of the data processing system 130, or components thereof can be accessible via portable devices such as laptops, mobile devices, wearable smart devices, etc. The data processing system 130, the data collector 134, or components thereof, can include other or additional elements that can be considered desirable to have in performing the functions described herein. The data processing system 130, or components thereof, can include, or be associated with, one or more components or functionality of computing system 600 depicted in FIG. 6, including, for example, one or more processors coupled with memory.
[0060] The data processing system 130 can include an interface 132 designed, constructed and operational to communicate with one or more component of system 100 via network 101, including, for example, the robotic medical system 125 or client device 174. The interface 132 can include a network interface. The interface 132 can include or provide a user interface, such as a graphical user interface.
[0061] The interface 132 can provide data for presentation via a 3D viewer 176 that can depict, illustrate, render, present, or otherwise provide a 3D reconstruction generated by the data processing system 130. The interface 132 can provide the 3D viewer 176 application via a web browser. The 3D viewer 176 can be a software-as-a-service application hosted by the data processing system 130 or one or more servers. The 3D viewer 176 can be a native application hosted or executed on a client device 174. An example of a 3D reconstruction presented by the 3D viewer 176 is illustrated in FIG. 2.
[0062] The data processing system 130 can include a data collector 134 designed, constructed and operational to receive sensor data associated with a medical procedure and captured by a set of sensors (e.g., multiple data capture devices 110) situated in an operating room (e.g., medical environment 105 depicted in FIGS. 5A or 5B). The sensor data can include ToF sensor data collected, captured, or otherwise sensed by 3D sensors or data capture devices 110. The sensor data can refer to or be a part of a data stream 152. The sensor data can include multi-modal sensor data. The multi-modal sensor data can include, for example, RGB, depth, intensity, thermal camera, WiFi-signal data, or data from the robotic medical system 125.
[0063] The data stream 152 can include sensor data, or be referred to as sensor data. The data stream 152, or sensor data, can include information about the sensor that collected the data, such as an identifier of the sensor (e.g., a unique identifier of the sensor or alphanumeric identifier), an indication of a position or a location of the sensor (e.g., location within the medical environment 105, operating room identifier, site identifier, or institution name), orientation of the sensor, or type of the sensor. The data processing system 130 can, in some cases, use one or more of the location, position, or orientation information of each of the set of sensors to register the sensor data from each sensor to a common or single coordinate system.
[0064] The sensor data, or data stream 152, can include information such as 3D point cloud data from a set of ToF sensors. The data stream can include a time stamp for the data sensed or detected by the sensor. For example, the data stream 152 can include multiple frames of 3D sensor data. Each frame can be associated with an identifier of a sensor and a timestamp the data was received, captured, or detected. The multiple sensors in the medical environment 105 can be time synchronized such that timestamps associated with data captured by the set of sensors can be synchronized.
[0065] The data processing system 130 can receive sensor data from various phases of a medical procedure. The data processing system 130 can receive sensor data that was detected before the medical procedure has commenced, during the medical procedure, or after the medical procedure. The data processing system 130 can receive sensor data that was collected during all phases of a workflow associated with a medical procedure, including operative and non-operative phases such as: room preparation (e.g., preparing the operating room for a medical procedure), robot setup (e.g., setting up a robotic medical system or aspect thereof in order to perform a surgical or medical procedure), a surgical or medical procedure, turn over (e.g., the time between one patient exiting surgery to the time at which the next patient enters the operating room to begin surgery), or cleaning the operating room. The data processing system 130 can receive the data in real-time as the data is collected, in a batch mode based on a time interval, or upon completion of the medical procedure.
[0066] Thus, the data stream 152 can capture various activities associated with a medical procedure in the operating room over a time interval with a robotic medical system 125. The data collector 134 can access the data stream from the visualization tool 170 or the display 172. The data collector 134 can receive an event stream or kinematic stream from the robotic medical system 125. The event stream can include a stream of event data or information, such as packets, that identify or convey a state of the robotic medical system 125 or an event that occurred in association with the robotic medical system or a surgical or medical procedure being performed with the robotic medical system. An example state of the robotic medical system 125 can indicate whether the medical tool 120 is installed on a manipulator arm of the robotic medical system or not. For example, when the medical tool 120 is installed on a manipulator arm of the robotic medical system 125, a signal or data packet(s) can be generated indicating that the medical tool has been installed on the manipulator arm of the robotic medical system 125. The signal or data packet(s) can be sent to the data collector 134 as the event stream.
Another example state of the robotic medical system 125 can indicate whether the visualization tool 170 is connected, whether directly to the robotic medical system or indirectly through another auxiliary system that is connected to the robotic medical system.
[0067] The robotic medical system 125 can have other states, which can be detected by the data collector 134. The data collector 134 can determine (e.g., record) or otherwise receive the event stream through an Application Programming Interface (API) of the robotic medical system 125. The data collector 134 can determine or otherwise receive the event stream via other suitable mechanisms. The data collector 134 can poll the robotic medical system 125 to determine the state of the robotic medical system 125.
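As a non-limiting illustration, the polling approach described above might be sketched as follows; the client object, its get_state() call, and the event fields are hypothetical stand-ins for this sketch, not an actual interface of the robotic medical system 125:

```python
# Hypothetical sketch of how the data collector might poll a robotic
# medical system's state; the client and its get_state() method are
# illustrative assumptions, not an actual product API.
import time
from dataclasses import dataclass

@dataclass
class RobotEvent:
    timestamp: float      # time the state was sampled
    tool_installed: bool  # whether a medical tool is on a manipulator arm
    viz_connected: bool   # whether the visualization tool is connected

def poll_robot_state(client, interval_s=1.0):
    """Yield a RobotEvent each polling interval, forming the event stream."""
    while True:
        state = client.get_state()  # assumed call returning a state dict
        yield RobotEvent(
            timestamp=time.time(),
            tool_installed=state.get("tool_installed", False),
            viz_connected=state.get("visualization_connected", False),
        )
        time.sleep(interval_s)
```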
[0068] The data processing system 130 can include a 3D reconstructor 136 designed, configured and operational to generate a three dimensional (3D) reconstruction of the medical procedure based on the sensor data. The 3D reconstruction of the medical procedure can refer to or include a 3D reconstruction of a layout or topology of the medical environment 105 or operating room in which the medical procedure is to be performed or has been performed. The 3D reconstruction of the topology of the operating room can indicate the objects or personnel in the medical environment, and how the objects or personnel are interrelated or arranged. The topology can be to scale to provide a realistic representation of the arrangement, distances, or interrelation between the objects and personnel in the operating room and associated with the medical procedure. For example, the 3D reconstructor 136 can receive sensor data that includes 3D point cloud data from the set of sensors (e.g., time-of-flight sensors). Sensors (e.g., data capture devices 110) can be positioned at predetermined or arbitrary locations that each allow a respective sensor to capture images or point clouds of a medical environment 105 from a particular viewpoint or viewpoints. Any suitable location for a sensor may be considered an arbitrary location, which can include fixed locations that are not determined by system 100, random locations, and/or dynamic locations. The viewpoint of a sensor (e.g., the position, orientation, and view settings such as zoom for a sensor) can determine the content of the images or data that are captured by the sensor.
[0069] The 3D reconstructor 136 can use one or more models 156 to perform 3D reconstruction for each phase of the medical session. In some cases, the models 156 can include a reconstruction model and a separate phase model. The 3D reconstructor 136 can use the reconstruction model to perform the 3D reconstruction that is separate or different from the phase model used to identify the phases in the medical session. The reconstruction model and the phase model can work in parallel. For example, the models can work in parallel to facilitate generating 3D reconstructions for each phase. In some cases, the reconstruction model and the phase model can be combined into one model 156 that can facilitate both phase identification (e.g., by the phase detector 138) and 3D reconstruction (e.g., by the 3D reconstructor 136).
[0070] The 3D reconstructor 136 can register, via a function, the 3D point cloud data to a single coordinate system to generate the 3D reconstruction of the medical procedure. For example, the 3D reconstructor 136 can stitch together the sensor data from the set of sensors into a common coordinate frame to provide a single, 3D reconstruction that indicates the layout or topology of the operating room. The data processing system 130 can create a 3D reconstruction that is realistic and human viewable. The 3D reconstruction can include significant pieces of operating room equipment in their actual locations.
[0071] To do so, the data processing system 130 can execute a registration function to put the received 3D sensor data into a coordinate system. The data processing system 130 can identify a reference point in the operating room. The reference point can be a coordinate or point in the operating room. The reference point can refer to or include a landmark in the operating room. The data processing system 130 can use the reference point or landmark as the basis of a coordinate system for the operating room. To determine the reference point, the data processing system 130 can calibrate the set of sensors to each other to determine a common point cloud. For example, to calibrate the set of sensors, the data processing system 130 can perform feature detection on a predetermined pattern (e.g., a checkerboard) to extract predetermined or known features (e.g., corners). The data processing system 130 can use these correspondences to establish a transformation between different sensors.
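A minimal sketch of this checkerboard-based calibration, assuming OpenCV, known camera intrinsics (K1, K2, dist1, dist2), and an illustrative pattern size, might estimate each sensor's pose relative to the board and compose them into a sensor-to-sensor transform:

```python
# Hedged sketch: estimate the rigid transform between two camera-style
# sensors that both observe the same checkerboard pattern.
import cv2
import numpy as np

def board_pose(image, K, dist, pattern=(9, 6), square=0.025):
    """Return the 4x4 pose of the checkerboard in this camera's frame."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if not found:
        return None
    # 3D corner coordinates in the board's own frame (meters).
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square
    _, rvec, tvec = cv2.solvePnP(objp, corners, K, dist)
    T = np.eye(4)
    T[:3, :3], _ = cv2.Rodrigues(rvec)
    T[:3, 3] = tvec.ravel()
    return T

def sensor_to_sensor(img1, img2, K1, dist1, K2, dist2):
    """Transform mapping sensor-1 coordinates into sensor-2 coordinates."""
    T1, T2 = board_pose(img1, K1, dist1), board_pose(img2, K2, dist2)
    return T2 @ np.linalg.inv(T1)  # cam1 -> board -> cam2
```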
[0072] In another example, the data processing system 130 can perform calibration or registration with 3D point clouds obtained from depth images in order to register a set of sensors facing the same direction and opposing depth sensors. The data processing system 130 can use a 3D target to perform the registration among the multiple 3D depth sensors.
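For the depth-based registration of 3D point clouds described above, one possible refinement step is point-to-point ICP; the sketch below assumes the Open3D library, roughly overlapping clouds, a coarse initial guess, and an assumed 5 cm correspondence distance:

```python
# A minimal sketch of depth-based point cloud registration using ICP.
import numpy as np
import open3d as o3d

reg = o3d.pipelines.registration

def register_depth_sensors(source_pcd, target_pcd, init=np.eye(4)):
    """Refine the transform aligning one sensor's cloud onto another's."""
    result = reg.registration_icp(
        source_pcd, target_pcd,
        max_correspondence_distance=0.05,  # 5 cm search radius (assumed)
        init=init,
        estimation_method=reg.TransformationEstimationPointToPoint(),
    )
    return result.transformation  # 4x4 matrix, source -> target
```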
[0073] In some cases, the data processing system 130 can perform a time-based alignment process to facilitate registering the set of sensors or data therefrom to a single coordinate system. The data processing system 130 can receive data from different sensors at different times, and use the timestamps in the data stream from the respective sensors in order to align or synchronize the data streams. By performing time-based alignment or synchronization of the data streams, the data processing system 130 can facilitate registering data from the set of sensors with different viewpoints to a single coordinate system, and synthesizing a 3D reconstruction of the medical procedure.
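As an illustrative sketch of the time-based alignment described above, frames from two streams can be paired by nearest timestamp under a tolerance; the frame dictionary layout and tolerance value are assumptions for this example:

```python
# Pair frames across two sensor streams by nearest synchronized timestamp.
import bisect

def align_frames(reference_stream, other_stream, tolerance_s=0.05):
    """Pair each reference frame with the closest-in-time other frame."""
    times = [f["timestamp"] for f in other_stream]  # assumed sorted
    pairs = []
    for frame in reference_stream:
        i = bisect.bisect_left(times, frame["timestamp"])
        # Check the neighbors on either side of the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        best = min(candidates, key=lambda j: abs(times[j] - frame["timestamp"]))
        if abs(times[best] - frame["timestamp"]) <= tolerance_s:
            pairs.append((frame, other_stream[best]))
    return pairs
```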
[0074] In an illustrative example, a 3-axis coordinate system can be denoted X, Y, and Z, and the coordinate (0, 0, 0) can correspond to the reference point or landmark in the operating room. The data processing system 130 can determine a location of a sensor in the operating room relative to the reference point in the operating room. The data processing system 130 can compute a coordinate transformation (e.g., a function) between the sensor and the reference point in the operating room. The coordinate transformation can include an offset between the sensor location and the reference point, such as an offset amount in one or more of the X, Y, or Z axes. The data processing system 130 can determine a coordinate transformation or offset function for each sensor in the operating room relative to the reference point in the operating room. To register all sensor data to a single coordinate system, the data processing system 130 can apply respective coordinate transformations established for sensor data from each sensor to put all the sensor data in the same coordinate system.
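Continuing the illustrative example, applying each sensor's coordinate transformation to its point cloud and stacking the results can be sketched as follows, assuming 4x4 homogeneous transforms produced by the calibration step:

```python
# Minimal sketch: apply each sensor's calibrated transform to put all
# point clouds into the room's common (reference-point) coordinates.
import numpy as np

def to_room_frame(points, T_sensor_to_room):
    """points: (N, 3) array in sensor coords; T: 4x4 rigid transform."""
    homo = np.hstack([points, np.ones((points.shape[0], 1))])  # (N, 4)
    return (T_sensor_to_room @ homo.T).T[:, :3]

def fuse_clouds(clouds_by_sensor, transforms_by_sensor):
    """Register every sensor's cloud and stack them into one cloud."""
    return np.vstack([
        to_room_frame(cloud, transforms_by_sensor[sid])
        for sid, cloud in clouds_by_sensor.items()
    ])
```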
[0075] The 3D reconstruction can include a synthetic digital twin of known components. The data processing system 130 (e.g., via 3D reconstructor 136) can identify objects or equipment in the medical environment 105, and replace the objects with a digital twin 164 when generating the 3D reconstruction. The data processing system 130 can utilize one or more models 156 (e.g., a reconstruction model) trained with machine learning to recognize, identify, or classify objects in the data stream 152. Upon recognizing or detecting an object via the reconstruction model, the data processing system 130 can perform a look up, query, or other search to determine whether a digital twin corresponding to the detected object is available in the digital twin 164 data structure or data repository. The data processing system 130, upon determining that a digital twin is available for the object, can integrate the digital twin version of the object into the 3D reconstruction.
[0076] The digital twin data structure 164 can include multiple versions of a type of object or equipment. The digital twin data structure 164 can include a version of the type of object that is parameterized or configured to be customized, adjusted, or modified. For example, the digital twin version can be parameterized, which can allow the data processing system 130 to modify the digital twin prior to integration into the 3D reconstruction. The digital twin can include parameters or variables relating to the size of a piece of equipment, features of the equipment, or configuration of the equipment. The data processing system 130 can position, orient, configure, or otherwise modify the digital twin to match the actual object in the medical environment 105 to provide a realistic 3D reconstruction of the medical environment 105.
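A hypothetical sketch of this lookup-and-substitute step appears below; the catalog structure, field names, and mesh files are invented for illustration and do not reflect an actual digital twin 164 schema:

```python
# Hypothetical sketch of digital-twin substitution: a detected object is
# matched against a twin catalog and, when a parameterized twin exists,
# configured to match the observed object's pose and attributes.
DIGITAL_TWIN_CATALOG = {
    "operating_table": {"mesh": "table.glb", "params": ["height", "tilt"]},
    "surgical_light":  {"mesh": "light.glb", "params": ["height", "yaw"]},
}

def substitute_twin(detection, catalog=DIGITAL_TWIN_CATALOG):
    """Return a configured twin for a detection, or None if unavailable."""
    twin = catalog.get(detection["class_name"])
    if twin is None:
        return None  # fall back to rendering the raw point cloud
    return {
        "mesh": twin["mesh"],
        "pose": detection["pose"],  # position/orientation from the scene
        # Copy only the parameters this twin exposes (e.g., table height).
        "config": {p: detection["attributes"].get(p) for p in twin["params"]},
    }
```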
[0077] The data processing system 130 can include a phase detector 138 designed, constructed and operational to identify, based on the 3D reconstruction and via a model trained with machine learning, a phase of the medical procedure. The phase detector 138 can detect, identify, or recognize the phase based on the 3D reconstruction or directly using the data stream obtained from the data capture devices 110. For example, the phase detector 138 can identify the phase using a model trained with machine learning (e.g., a spatio-temporal video model) to detect activities without using the 3D reconstruction. In some cases, the phase detector 138 can utilize both the machine learning model and the 3D reconstruction.
[0078] The phase detector 138 can identify activities associated with the medical procedure in order to determine the phase. To do so, the phase detector 138 can use one or more models 156 trained with machine learning. The one or more models 156 can include a separate phase model configured to perform phase identification, or a single multi-modal model that can facilitate phase identification and other functions. The models can be trained, established, maintained, or otherwise provided by the model generator 144. For example, the model 156 can be trained on a workflow of the procedure, which can include operative and non-operative phases of the medical procedure. For example, the non-operative and operative phases of the workflow associated with the medical procedure can include room preparation, robot setup, a surgical or medical procedure, turn over, or cleaning. The operative phases of the workflow of the medical procedure can include exposure, dissection, transection, reconstruction, and extraction. The operative phases can, therefore, be a subset of the medical procedure phase in the full workflow that contains both the non-operative and operative phases.
[0079] Activities can correspond to a more granular or lower-level function relative to phases. Activities can include, for example, the turn-over of a medical environment, setup of a medical environment, sterile preparation, robotic medical system draping, patient-in, patient preparation, intubation, patient draping, port placement, robotic medical system roll-up, robotic medical system docking, robotic medical system undocking, robotic medical system roll-back, patient close, robotic medical system undraping, patient undraping, patient out, or cleaning of the medical environment.
[0080] To determine a phase, the data processing system 130 can perform activity recognition or detection, and map the detected activities to a corresponding phase. The phase detector 138 can utilize the machine learning model 156 (e.g., a phase model) to determine, based on the data stream 152 from the set of sensors, an activity of the scene of the medical environment 105 captured by the set of sensors of the data capture devices 110. For example, the machine learning model 156 can be a viewpoint agnostic machine learning model trained to determine the activity of the scene based on a data stream 152 that includes an arbitrary number of image streams captured from arbitrary viewpoints. As a result, the configuration of data capture devices 110 may not be constrained by the model 156 (e.g., phase model) to a fixed number of data capture devices 110 located only at certain fixed or relative locations, and the data processing system 130 can be configured to receive inputs from any configuration of imaging data capture devices 110 suitable in the medical setting or environment.
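An illustrative mapping of recognized activities to workflow phases appears below; the groupings follow the activities and phases named above, but the specific assignments are assumed for this sketch:

```python
# Illustrative activity-to-phase mapping used after activity recognition.
ACTIVITY_TO_PHASE = {
    "sterile_preparation": "room_preparation",
    "robot_draping":       "robot_setup",
    "robot_roll_up":       "robot_setup",
    "patient_in":          "medical_procedure",
    "port_placement":      "medical_procedure",
    "robot_undocking":     "turn_over",
    "patient_out":         "turn_over",
    "cleaning":            "cleaning",
}

def map_activity_to_phase(activity: str) -> str:
    """Map a recognized activity label to its workflow phase."""
    return ACTIVITY_TO_PHASE.get(activity, "unknown")
```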
[0081] For example, system 100 can be a dynamic system or include dynamic components that can have viewpoints that dynamically change during a medical session (e.g., during any phase of the medical session such as during pre-operative activities (e.g., setup activities), intra-operative activities, or post-operative activities). The viewpoint of a data capture device 110 can dynamically change in any way that changes the field of view of the sensor, such as by changing a location, pose (e.g., position and orientation), orientation, zoom setting, or other parameter of the data capture device 110.
[0082] The machine learning model 156 (e.g., phase model) can be trained to determine the activity of the medical environment 105 (e.g., scene or operating room) based on the data stream 152 or the 3D reconstruction. The machine learning model 156 (e.g., phase model) can include activity recognition functions or techniques, neural networks, or a recurrent neural network (RNN). The phase detector 138, using a machine learning model 156, can perform activity recognition to extract features of the 3D reconstruction or the data stream 152 to determine an activity within the scene. For example, the phase detector 138, using activity recognition, can extract features from the 3D reconstruction or the data stream 152. The data processing system 130 can be configured with any suitable activity recognition technique, such as a fine-tuned inflated 3D (I3D) model 156 (e.g., phase model) or any other neural network or deep neural network.
[0083] The phase detector 138 can perform activity recognition using an RNN model. The RNN can use extracted features from the 3D reconstruction or the data stream to determine respective classifications of an activity of the scene (e.g., medical environment 105 or operating room). For example, the data processing system 130 can input features extracted from the 3D reconstruction into the RNN model to determine a classification of the activity of the scene as captured from the set of sensors (e.g., data capture devices 110) associated with the medical procedure in the medical environment 105. In some cases, the phase detector 138 can determine the activity of the scene directly using sensor data received from each sensor of the set of sensors, in which case the phase detector 138 can determine, via the RNN model, a first classification of the activity from the first sensor data, a second classification of the activity from the second sensor data, a third classification of the activity from third sensor data, etc. By using the 3D reconstruction to determine the activity of the scene, aspects of this technical solution can improve the accuracy with which an activity can be recognized, as the 3D reconstruction can incorporate data from multiple viewpoints (including digital twins) into a single coordinate system and reconstruction. This approach can also reduce computing resource utilization by determining a single classification via the RNN model, as opposed to separately invoking the RNN model for sensor data received from each sensor of the set of sensors.
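A minimal sketch, assuming PyTorch, of an RNN classifier over features extracted from the 3D reconstruction appears below; the layer sizes, GRU choice, and label count are illustrative assumptions, not the actual architecture described above:

```python
# Illustrative RNN-based activity classifier over per-frame features.
import torch
import torch.nn as nn

class ActivityRNN(nn.Module):
    def __init__(self, feat_dim=1024, hidden=256, n_activities=18):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_activities)

    def forward(self, features):
        # features: (batch, time_steps, feat_dim) extracted per frame.
        _, last_hidden = self.rnn(features)
        return self.head(last_hidden[-1])  # (batch, n_activities) logits

model = ActivityRNN()
clip_features = torch.randn(1, 16, 1024)   # 16 frames of extracted features
activity_logits = model(clip_features)
activity = activity_logits.argmax(dim=-1)  # classified activity index
```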
[0084] In some cases, the data processing system 130 can determine the activity of the scene using both the 3D reconstruction and the data stream 152 from each of the sensors of the set of sensors. For example, the 3D reconstructor 136 can generate the 3D reconstruction using point cloud data captured by a set of 3D sensors (e.g., ToF sensors). The phase detector 138 can use the phase model, RNN model, or other neural network or deep neural network with one or more hidden layers, to recognize an activity of the scene using the 3D reconstruction. Further, the phase detector 138 can use imaging data of the data stream 152 captured by each sensor to recognize activities, via the neural network, from the image data. Thus, the phase detector 138 can generate, for a given time interval, a first classification of an activity of the scene using the 3D reconstruction, a second classification of the activity of the scene using sensor data from a first sensor, and a third classification of the activity of the scene using sensor data from a second sensor. The sensor data can be multi-modal sensor data.
[0085] The neural network models can each provide a classification to a data fusion module of the phase detector 138, which can generate fused data for determining the activity of a medical session in the medical environment 105. For example, a data fusion module can receive a respective classification of the activity of a medical session in the medical environment 105 from each of the neural network models and determine, based on the respective classifications, a final classification of the activity of the medical session occurring in the medical environment 105. The phase detector 138, using data fusion, can generate the fused data to determine the final classification in any suitable manner. For example, the phase detector 138 can fuse the classifications by applying weights to the classifications from the neural network models to determine the final classification.
[0086] In some cases, the phase detector 138 can receive additional information with each classification to generate the fused data to determine the activity of the scene in the medical environment 105. For example, the phase detector 138 can also receive an activity visibility metric for each video clip or image stream that rates how visible the activity of the medical session in the medical environment 105 is in corresponding imagery. The activity visibility metric can include a score or any other metric that represents a rating of how visible an activity of the medical session in the medical environment 105 is in the imagery. The phase detector 138 can weigh the classification based on a confidence score generated by the neural network model 156. In some cases, the phase detector 138 can apply the highest weight to the classification made using the 3D reconstruction, as the 3D reconstruction is generated from the set of sensors having varying viewpoints.
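A sketch of this weighted fusion step, with invented example weights (the 3D-reconstruction stream weighted most heavily, as described above), might look like the following:

```python
# Sketch: combine per-source class probabilities with per-source weights
# (e.g., derived from visibility or confidence) into one fused decision.
import numpy as np

def fuse_classifications(probs_by_source, weights_by_source):
    """probs: dict of source -> (n_classes,) probability vectors."""
    total = sum(weights_by_source.values())
    fused = sum(
        (weights_by_source[s] / total) * p for s, p in probs_by_source.items()
    )
    return int(np.argmax(fused))  # final (fused) activity classification

final = fuse_classifications(
    {"reconstruction": np.array([0.7, 0.2, 0.1]),
     "sensor_1":       np.array([0.4, 0.5, 0.1]),
     "sensor_2":       np.array([0.6, 0.3, 0.1])},
    {"reconstruction": 0.5, "sensor_1": 0.2, "sensor_2": 0.3},
)
```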
[0087] The model 156 (e.g., phase model) can include multiple layers (e.g., stages) of functions. Such layers may refer to functions or processes (e.g., activity recognition function, RNN function), which can be referred to as "vertical" or "horizontal" layers in configurations, or channels of data processing. The machine learning model 156 can include additional, fewer, or different layers (e.g., different configurations of layers). Further, layers (horizontal or vertical) of machine learning model 156 may be connected in any suitable manner such that connected layers may communicate and/or share data between or among layers.
[0088] For example, a machine learning model 156 (e.g., phase model) can include video long short-term memories (LSTMs) configured to determine a classification of an activity of the scene in the medical environment 105 as captured by image data from one or more data capture devices 110. For example, a video LSTM can determine a first classification of the activity based on image data and features extracted or processed by feature processing modules. The video LSTM can determine a second classification of the activity based on second image data and features extracted or processed by feature processing modules, which can be fused together with the first classification to result in a more accurate determination of the activity of the scene than a classification based solely on individual image streams.
[0089] Machine learning model 156 can include a global LSTM configured to determine a global classification of the activity of the scene in the medical environment 105 based on fused data generated by fusing classifications from the 3D reconstruction and individual data streams 152 from each sensor in the set of sensors. As the global classification can be based on fused data, the global classification can be a determination of the activity of the scene based on both the 3D reconstruction and image data.
[0090] The data processing system 130 can include a tracker 140 designed, constructed and operational to determine, recognize, detect, or otherwise identify a pattern of motion in the medical environment 105. The tracker 140 can identify the pattern of motion in the medical environment 105 using the 3D reconstruction. The tracker 140 can determine the pattern of motion corresponding to movement of personnel, equipment, or other objects in the operating room during any phase of the workflow associated with the medical session. The tracker 140 can determine the pattern of motion during operative or non-operative phases associated with the medical session, including, for example, room preparation, robot setup, the medical procedure, turn over, or cleaning, as well as operative phases within the medical procedure, such as exposure, dissection, transection, reconstruction, or extraction. The data processing system 130 can provide, for display via a graphical user interface, the pattern of motion overlaid on the 3D reconstruction.
[0091] To determine motion patterns with respect to a spatial layout (e.g., the layout or topology of the medical environment 105, location of objects, equipment or personnel within the medical environment 105), the tracker 140 can use a tracking function 142. The tracker 140, upon determining the motion pattern, can overlay the motion pattern on top of a 3D reconstruction of a spatial layout of the medical environment 105. The tracker 140 can determine an amount of movement of people and objects in the medical environment 105 based on the motion pattern, and provide the information to a performance controller 146 to determine a level of quality, performance or efficiency of the spatial layout of the medical environment 105 based, at least in part, on the motion patterns. The tracker 140 can provide motion pattern information or metrics in a motion graph, a heatmap, or an analytics dashboard, which can be presented via the interface 132.
[0092] The tracker 140 can use one or more functions 142 or other techniques to track, detect, or otherwise identify patterns of motion in the medical environment 105. The pattern of motion can refer to or include a path between two or more locations, points, or objects in the medical environment 105. The pattern of motion can refer to the movement of a person, an object, or piece of equipment in the medical environment 105. The movement can occur in any phase of the workflow associated with the medical session, including, for example, preoperative, intraoperative, or post-operative.
[0093] For example, the tracker 140 can be configured with or utilize an optical flow-based tracking function 142. The function 142 can track motion in pixels in consecutive video frames or frames of the 3D reconstruction to estimate a motion vector of an object in the scene. To do so, the tracker 140 can perform feature detection to identify or detect key points or features in a first frame of the video or 3D reconstruction, such as corners, edges or other distinctive points. In some cases, the 3D reconstruction can include digital twins, in which case the tracker 140 can identify or track movement of the digital twin representation of the object in the 3D reconstruction from one time interval to another.
[0094] To determine the pattern of motion, the tracker 140 can identify or detect the same feature in subsequent frames of the 3D reconstruction or video by searching for the corresponding points in the new frame. For example, the tracker 140 can estimate how the intensity of each feature point changes between frames.
[0095] The tracker 140, upon detecting the corresponding features in the current frame or subsequent frame, can use a motion estimation function 142 to determine a motion vector for the feature or object. The motion vector can indicate a direction and distance the object or feature has moved from one frame to another in the 3D reconstruction. To reduce noise or improve accuracy, the tracker 140 can apply a filtering or smoothing technique. The tracker 140 can apply filtering or smoothing to the motion vectors for an object over multiple frames to determine the pattern of motion. With the filtered and smoothed motion vectors, the tracker 140 can determine the pattern of motion of a particular object or personnel in the medical environment 105 over various phases associated with the medical session in the medical environment 105. For example, the tracker 140 can compute the trajectory of a person by connecting positions of tracked features associated with that person over a time interval in the 3D reconstruction.
[0096] Example functions 142 the tracker 140 can invoke or execute to determine a pattern of motion can include Lucas-Kanade, Horn-Schunck, or dense optical flow. In some cases, the tracker 140 can utilize a model 156 (e.g., a tracker model). The tracker model can include a deep learning model (e.g., a convolution neural network or recurrent neural network) to track an object, which can be configured to handle a wider range of scenarios. The deep neural network (e.g., a model 156) can be trained to identify, recognize, determine, or otherwise detect a pattern of motion in a 3D reconstruction over one or more time intervals. To do so, the neural network model 156 (e.g., tracker model) can perform object detection in each frame of the 3D reconstruction, and track the detected objects from one frame to the next frame. To track the object from frame-to-frame in the 3D reconstruction, the neural network model 156 can leverage functions such as a Kalman filter, particle filter, or another deep learning-based tracker to track the detected object across subsequent frames. The deep learning-based tracker can utilize a combination of motion prediction and object or appearance matching to estimate new positions of tracked objects in each frame.
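As a hedged sketch of one of the functions named above, Lucas-Kanade sparse optical flow applied to consecutive frames can yield per-feature motion vectors; the OpenCV parameters below are typical defaults rather than tuned values:

```python
# Sketch of Lucas-Kanade sparse optical flow between consecutive frames.
import cv2
import numpy as np

def track_motion(prev_gray, next_gray):
    """Return (start, end) point pairs for features tracked frame-to-frame."""
    # Detect distinctive corner features in the first frame.
    p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                 qualityLevel=0.01, minDistance=7)
    if p0 is None:
        return []
    # Estimate where each feature moved in the next frame.
    p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, p0, None)
    good = status.ravel() == 1
    starts, ends = p0[good].reshape(-1, 2), p1[good].reshape(-1, 2)
    # Each (end - start) difference is a motion vector for one feature.
    return list(zip(starts, ends))
```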
[0097] The tracker 140, using deep learning techniques, can handle occlusions or temporary disappearances of an object by predicting a likely position of the object based on previous motion information, and then continuing to track the object after it reappears. The tracker 140 can perform smoothing and post-processing (e.g., via a Kalman filter or moving average filter) to improve the tracking results from the deep learning technique.
[0098] The data processing system 130 can include a model generator 144 designed, constructed or operational to train, maintain, manage, or otherwise provide one or more models 156 generated with one or more machine learning techniques. The data processing system 130 can utilize various types of models trained with various types of machine learning techniques (e.g., a spatio-temporal video model) and training data (e.g., multi-modal sensor data). The models 156 can be generated, trained, or otherwise configured to receive a certain type of input and provide a certain type of output that facilitates machine learning-based layout optimization. The models 156 can include, for example, a reconstruction model used by the 3D reconstructor 136 to generate a 3D reconstruction of a spatial layout of a medical environment 105. The models 156 can include a phase model used by the phase detector 138 to detect an activity or phase of the medical session occurring in the medical environment 105. The models 156 can include a tracker model used by the tracker 140 to identify a pattern of motion during the medical session that occurs in the medical environment. The models 156 can include a performance model used by the performance controller 146 to determine a metric or quality of the medical session. The performance controller 146 can use the performance model to determine or predict an adverse event during the medical session. The models 156 can include an action model used by the performance controller 146 to determine an action to perform based on the quality of the spatial layout or determination or prediction of an adverse event. In some cases, a single multi-modal model 156 can perform the functions of two or more of the phase model, performance model, tracker model, reconstruction model, or action model, for example.
[0099] Thus, the data processing system 130 can utilize multiple models configured or trained for different types of input and output. For example, a first one or more phase models can be configured to detect activities or a phase in a data stream. A second one or more phase models can be configured to detect activities or a phase in a 3D reconstruction. A third one or more tracker models can be configured to detect a pattern of motion. A fourth one or more performance models can be used by the performance controller 146 and configured to determine metrics associated with efficiency of a layout of the operating room. A fifth one or more performance models can be used by the event predictor 148 and configured to predict adverse events that may occur in the operating room, or that have occurred during a phase in a workflow associated with a medical session.
[0100] The model generator 144 can train one or more models 156 based on training data using a machine learning technique, such as a neural network, convolution neural network, recurrent neural network, LSTMs, deep learning, a transformer architecture, or a self-attention transformer architecture, for example. For example, the model 156 can be a spatio-temporal model, which can refer to a model generated based on data collected across time and space and that has at least one spatial and one temporal property. The model 156 can be a spatio-temporal video model trained, generated or established to make predictions about data that varies in both space and time in a video stream. The spatio-temporal model can combine techniques from both spatial analysis and time series analysis.
[0101] The training data can include labeled sets of imagery and labeled sets of 3D reconstructions. Training data sets can include imagery of medical procedures captured by data capture devices 110. For example, a particular medical session or medical procedure can be captured by four imaging devices and the video clips of the four image streams can be labeled to generate a training set. A subset including the video clips of three of the four image streams may be used as another training data set. Thus, using a same set of data streams, multiple training data sets may be generated. Video clips from two or more image streams may be interpolated and/or otherwise processed to generate additional video clips that may be included in additional training data sets. In this manner, a machine learning model 156 can be trained to be viewpoint agnostic, able to determine activities of scenes based on arbitrary numbers of image streams from arbitrary viewpoints.
[0102] The model generator 144 can train the models 156 using historical data 162. The historical data 162 can include multi-modal sensor data. The training data can include time series data, in which data can be organized in a sequential order or with timestamps. The model generator 144 can train or update the models 156 using data from various institutions. The model generator 144 can train or update the models 156 using feedback from a user. The training data used by the model generator 144 to train the models 156 can be similar to or compatible with the type of input the deployed model 156 is being configured to receive and process. For example, to train the reconstruction model, the training data can include historical 3D reconstruction data. The historical 3D reconstruction data can be labeled in a manner to allow the model 156 to learn what to predict. For example, the 3D reconstruction data can be labeled with objects, patterns of motions, or indications of efficiency of a layout of an operating room. Thus, to train the reconstruction model, the model generator 144 can utilize training data that includes at least one of labeled sets of 3D reconstructions, historical 3D point cloud data obtained from one or more medical environments, computer aided drawings, blueprints, or video stream data obtained from a robotic medical system configured to perform at least a portion of the medical session.
[0103] To train a performance model, the model generator 144 can utilize training data that includes at least one of historical metrics indicative of quality of medical environments, outcomes of medical sessions, or 3D reconstructions. The historical metrics can include, for example, quality of the medical environment (e.g., safety, efficiency, or performance), outcomes of medical sessions, or cost efficiency (e.g., the amount of space the medical equipment or environment occupies in a facility or hospital as compared to the amount of space that is available or the amount of space that is wasted). To train an action model, the model generator 144 can use training data that includes at least one of simulated actions performed in 3D reconstructions or historical event logs of medical sessions performed via one or more robotic medical systems. Training data can include data or knowledge provided or established by experts that indicates or describes an appropriate action configured to resolve or mitigate an issue related to a particular spatial layout.
[0104] The data processing system 130 can include a performance controller 146 designed, constructed and operational to determine, based at least on a pattern of motion in the 3D reconstruction, a metric indicative of quality of a spatial layout of the medical environment 105 during the phase of the medical procedure. The metric can correspond to an OPI that indicates a quality, performance, or efficiency of a spatial layout of a medical environment 105 (e.g., an operating room). The quality can refer to or include the performance associated with a spatial layout, or an efficiency associated with the spatial layout. The performance controller 146 can perform, based on the metric and the phase, an action. The action can be to improve the quality or performance of the spatial layout of the medical environment, or reduce the likelihood of negative performance (e.g., adverse events) in the medical environment 105.
[0105] The performance controller 146 can determine one or more metrics that can indicate a level of quality or performance associated with the medical environment, the layout of the medical environment, or the medical session. The performance controller 146 can determine metrics for each phase or aggregate metrics for the whole medical session. The performance controller 146 can perform actions based on or responsive to phase-based metrics or aggregate metrics.
[0106] The value of the metric can include a numerical value, a letter grade (e.g., A, B, C, D, or F), a score, a binary value (e.g., good or bad, 0 or 1, efficient or inefficient) or other indicator of quality, performance or efficiency. The numerical value of the metric can range from 0 to 1, 0 to 10, 0 to 100, or have any other range. The numeric value of the metric can include a percentage. The metric can be normalized based on a baseline or average metric established for a particular type of metric or particular phase associated with the medical session being performed in the medical environment. The metrics can include statistical metrics, such as averages, standard deviations, or variances.
[0107] The metrics can be based on one or more factors or a combination of factors. The metric can be based on, for example, a duration of a phase of a medical session, a pattern of motion, a layout of the medical environment, or a dimension of the medical environment. In an illustrative example, the performance controller 146 can determine the metric based on a count of personnel detected in the medical environment, a size of the medical environment, and the phase of the medical session. In some cases, the data processing system 130 can determine the metric based on a count of personnel detected in the medical environment, a pattern of motion of the personnel or objects, a size of the medical environment, time elapsed in the phase or the medical session, occurrence of one or more adverse events, or the phase of the medical session.
[0108] The metric can be based on a weight function with inputs corresponding to one or more of the count of personnel, size of the room, or phase of the medical session. The data processing system 130 can determine weights for the function that can optimize the quality, efficiency or performance of the spatial layout (e.g., performance of a medical session in the spatial layout of the medical environment).
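A minimal sketch of such a weight function appears below; the weights, normalizers, and per-phase expected headcounts are invented for illustration, whereas in practice they would be learned or tuned against historical performance data:

```python
# Illustrative weighted metric over personnel count, room size, and phase.
def layout_quality_metric(personnel_count, room_sq_ft, phase,
                          weights=(0.4, 0.3, 0.3)):
    """Return a 0-1 layout quality score (higher is better)."""
    w_density, w_size, w_phase = weights
    density = personnel_count / max(room_sq_ft, 1.0)   # people per sq ft
    density_score = max(0.0, 1.0 - density / 0.02)     # assumed comfort cap
    size_score = min(room_sq_ft / 600.0, 1.0)          # assumed baseline room
    # Assumed per-phase expected headcounts for illustration.
    expected = {"room_preparation": 3, "robot_setup": 4,
                "medical_procedure": 6, "turn_over": 3, "cleaning": 2}
    phase_score = 1.0 - min(
        abs(personnel_count - expected.get(phase, 4)) / 6.0, 1.0)
    return (w_density * density_score + w_size * size_score
            + w_phase * phase_score)
```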
[0109] The performance controller 146 can determine, for a particular phase in the medical session as determined by the phase detector 138, the number of personnel in the medical environment (e.g., operating room) and a square footage of the medical environment. To determine the metric, the performance controller 146 can utilize a function or a model 156 trained with machine learning. For example, the performance controller 146 can input the phase, count of personnel, and square footage of the medical environment into a machine learning model 156 (e.g., performance model) to predict the performance of the spatial layout of the medical environment 105. The machine learning model 156 (e.g., performance model) can be trained with training data that includes various counts of personnel, sizes of operating rooms, phases of medical sessions, and corresponding performance scores. Thus, the performance controller 146 can determine, based on the machine learning model 156, a quality metric or performance metric.
[0110] The performance controller 146 can determine a quality level or a level of efficiency of a layout of a medical environment 105 based on a pattern of motion during a phase associated with the medical session. The pattern of motion or movement can be quantified using one or more metrics that can indicate a duration of a type of motion or movement, distance of the movement of the path, speed of the motion or movement, angular speed or rotation rate associated with a motion or movement, vertices associated with the motion or movement, shape of a path of the movement, or idle time during traversal of the path. For example, a high idle time metric (e.g., based on comparison with a threshold) or low speed of movement metric (e.g., based on comparison with a threshold) during traversal of a path can indicate negative performance or inefficiency in the layout due to a surgeon having to unnecessarily slow down or stop while traversing a path through the medical environment.
[0111] In another example, a path that is circuitous through the layout of the medical environment (e.g., associated with unnecessary or excessive twists, bends, turns, or deviations from the most direct or efficient route), can be determined to be inefficient or associated with low performance. The performance controller 146, to determine whether a path is unnecessarily circuitous, can establish an optimal path for a person or object during a phase of the workflow associated with the medical session, and then compare the pattern of motion associated with the 3D reconstruction with the optimal path. In some cases, the performance controller can compare the pattern of motion with a baseline pattern of motion identified from historical data 162. In some cases, the performance controller 146 can identify an optimal path by adjusting the layout or topology in the 3D reconstruction of the medical environment to predict whether an improved path can be generated or created for the medical environment without negatively impacting the performance of the medical session. For example, the performance controller 146 can evaluate different positions, locations, or orientations for an object in the medical environment to determine if one of them would result in a more efficient path or pattern of motion. Upon identifying a more efficient path as a result of changing the location, position or orientation of an object in the medical environment, the performance controller 146 can provide a suggestion or otherwise perform an action to cause the change in the layout to improve the quality or performance of the layout.
[0112] The performance controller 146 can compare the metric with a threshold 158 established for the phase of the medical session. The performance controller 146 can select the threshold 158 configured for a particular phase, type of medical environment, type of medical procedure, or other attribute or characteristic. The performance controller 146 can determine the threshold based on metadata of the medical environment. For example, metadata can include a site, surgeon, institution, geographic location of the medical environment 105, age of the medical environment 105, surgeon identifier, type of the medical environment, or other information associated with the medical environment or medical session that can facilitate selecting or identifying a threshold configured to evaluate the quality or performance of the layout associated with the medical session or medical environment in which the medical procedure of the medical session is to be performed or is being performed.
[0113] The performance controller 146 can compare the metric with the threshold to determine whether the metric satisfies the threshold. For example, depending on the type of metric, if the metric is greater than the threshold, then the data processing system 130 can determine that the metric is indicative of a negative, poor, or low level of performance.
[0114] The data processing system 130 can determine, based on the comparison of the metric with the threshold, to perform an action to improve the performance of the layout of the operating room. For example, the performance controller 146 can determine, identify, or select an action such as to provide an alert, notification, instruction, guideline for an optimal spatial layout, or warning to indicate to adjust a layout or configuration of the operating room. The instruction can be to move the position or orientation of a piece of equipment in the operating room, such as a desk, operating table, operating lamp, light fixture, computer, or chair. The guideline for the optimal spatial layout can be based on a baseline spatial layout having a desired quality metric or performance for a phase in the medical session.
[0115] In some cases, the performance controller 146 can select an action to automatically adjust the layout or configuration of the medical environment. The performance controller 146 can select or identify an action based on a rule, heuristic, or map data structure. For example, the performance controller 146 can perform a lookup in a table using the metric, phase, or other information to select an action to perform. In some cases, the performance controller 146 can utilize a machine learning model to determine the action to perform in order to improve the performance of the layout of the medical environment 105. For example, the performance controller 146 can input the metric into a second model 156 trained with machine learning on historical data to determine the action to perform. The action can include, for example, to adjust at least one of a height (e.g., raise or lower an object), orientation, or location of an object located in the medical environment.
[0116] For example, the performance controller 146 can determine that the performance of the layout can be improved by adjusting a height of an object or piece of equipment in the medical environment, such as a table height or lamp height. In the event the lamp height is adjustable via an actuator or other motor, the performance controller 146 can transmit a command or instruction to the actuator, motor, or other electronic controller of the lamp to cause the adjustment to the height of the lamp. The performance controller 146, using the machine learning model 156, can predict the height of the lamp that can result in an improved performance metric. For example, the performance controller 146 can predict the optimal height of the lamp based on historical 3D reconstruction data and corresponding performance scores used to train the machine learning model 156.
[0117] The performance controller 146 can evaluate the performance of the medical session during any phase in the workflow associated with the medical session, including, for example, pre-operatively, intraoperatively, or post-operatively. The performance controller 146 can predict a performance of a layout in a phase prior to occurrence of the phase. For example, the performance controller 146 can determine, in a pre-operative phase, the performance of the layout during an intraoperative phase. For example, the performance controller 146 can determine, prior to performance of the medical session, that the layout of the medical environment in the 3D reconstruction varies from a baseline layout of medical environments established based on historical data associated with medical sessions corresponding to the medical procedure. The performance controller 146 can provide an instruction to adjust the layout to conform with the baseline layout prior to performance of the medical session.
[0118] To do so, the performance controller 146 can access the 3D reconstruction of the layout of the medical environment 105 constructed from sensor data from the set of sensors via the 3D reconstructor 136. The performance controller 146 can access historical data 162 in the data repository 150 to identify or establish a baseline layout. The baseline layout can correspond to a layout associated with high performance or satisfactory performance greater than a threshold 158. The baseline layout can be based on 3D reconstructions across various institutions, phases, or types of medical procedures. The baseline layout can indicate a size of a room, count of personnel, count of objects or equipment, topology of the medical environment, or patterns of motion. The performance controller 146 can compare the layout with the baseline to identify variances, such as variances in the topology, count of objects or equipment or personnel, or patterns of motion. In some cases, the performance controller 146 can map the factors or attributes of the layout and the baseline layout to a multidimensional space in order to compute a distance metric between the layout and the baseline layout. If the distance metric is greater than a threshold (e.g., 5%, 10%, 15%, 20% or more), then the performance controller 146 can predict that the layout may be low performing or inefficient. Responsive to predicting that the layout may be low performing, the data processing system 130 can provide an alert, notification, warning, instruction or perform another action to preemptively improve the performance of the layout prior to occurrence of the phase of the medical session.
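A sketch of the multidimensional comparison described above appears below; the layout features, normalizing scales, and the 10% threshold are assumptions chosen to mirror the example variance values mentioned in the text:

```python
# Sketch: normalize layout attributes into a feature vector and compare
# with the baseline by relative distance against a variance threshold.
import numpy as np

def layout_distance(layout, baseline, scales):
    """Relative distance between a layout and the baseline (0.1 = 10%)."""
    keys = sorted(baseline)
    v = np.array([layout[k] / scales[k] for k in keys])
    b = np.array([baseline[k] / scales[k] for k in keys])
    return np.linalg.norm(v - b) / max(np.linalg.norm(b), 1e-9)

baseline = {"personnel": 5, "equipment_items": 12, "room_sq_ft": 600}
scales   = {"personnel": 10, "equipment_items": 20, "room_sq_ft": 1000}
observed = {"personnel": 8, "equipment_items": 15, "room_sq_ft": 550}
if layout_distance(observed, baseline, scales) > 0.10:  # e.g., 10% threshold
    print("Layout varies from baseline; flag for preemptive review")
```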
[0119] The performance controller 146 can determine to perform an action that includes disabling a function of the robotic medical system 125 used to perform a medical procedure within the medical session. The data processing system 130 can disable the function prior to occurrence of the adverse event and responsive to the prediction of the adverse event. The function can include, for example, an activity of a tool, cutting, re-positioning, moving, or stitching. The data processing system 130 can disable the function for at least one of a predetermined time interval or until detection of a predetermined event. For example, the data processing system 130 can disable the function until a surgeon acknowledges a warning, alert, or notification. The data processing system 130 can disable the function until a user selects a button on a user interface or provides a user input (e.g., voice input, keyboard input, mouse input, touchpad input, or gesture input) acknowledging a warning, alert, or notification. The data processing system 130 can disable the function until a response to a prompt is received. The data processing system 130 can disable the function until a countdown timer expires (e.g., 5 seconds, 10 seconds, 30 seconds, 60 seconds, 2 minutes, or other time interval). The duration of the timer can be set based on the type of adverse event, or phase of the medical session.
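An illustrative gating sketch for this disable-until-acknowledgment or disable-until-timeout behavior follows; the robot control interface and its disable/enable calls are hypothetical stand-ins:

```python
# Illustrative sketch: hold a robot function disabled until the surgeon
# acknowledges the alert or a countdown timer expires, whichever is first.
import time

def gate_function(robot, timeout_s=30.0, acknowledged=lambda: False):
    """Disable a robot function until acknowledgment or timer expiry."""
    robot.disable_function()          # assumed control call
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if acknowledged():            # e.g., surgeon pressed a UI button
            break
        time.sleep(0.1)
    robot.enable_function()           # re-enable once the gate clears
```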
[0120] Thus, the data processing system 130 can determine, during the phase, that the spatial layout of the medical environment in the 3D reconstruction varies from a baseline layout of medical environments established based on historical data associated with medical procedures similar to the medical procedure. The data processing system 130 can provide an action that includes an instruction to adjust the spatial layout to conform with the baseline layout prior to performance of the medical procedure in the medical session.
[0121] The performance controller 146 can include an event predictor 148 designed, constructed and operational to predict, based at least on the pattern of motion in the 3D reconstruction and the phase of the medical session, an adverse event in the medical environment during the medical session. The data processing system 130 (e.g., performance controller 146 or event predictor 148) can provide, responsive to predicting the adverse event, an action including an indication of the prediction of the adverse event prior to occurrence of the adverse event. For example, the data processing system 130 can perform an action to prevent the occurrence of the adverse event, reduce the likelihood of occurrence of the adverse event, or otherwise mitigate the negative consequences or effects of the adverse event.
[0122] The event predictor 148 can predict the adverse event based on at least one of the pattern of motion in the 3D reconstruction, the phase of the medical session, or user input (e.g., voice input, keyboard input, mouse input, touchpad input, or gesture) received during the medical session. The adverse event can include at least one of a collision, a sterility breach, or an inefficient motion. A collision can refer to or include two or more personnel colliding or coming into contact with one another in an undesired manner; a person and an object or piece of equipment colliding or coming into contact with each other in an undesired manner; or two or more objects colliding or coming into contact with one another in an undesired manner.
[0123] A sterility breach in a medical environment can refer to or include a situation in which the sterile environment within the medical environment is compromised, potentially leading to contamination. Sterility breaches can occur when foreign contaminants, such as bacteria, viruses, or particles, enter the sterile field. Sterility breaches can occur as a result of, for example, unintentional contact (e.g., healthcare professionals or surgical team members accidentally touching non-sterile surfaces or objects and then touching sterile equipment or the surgical site without proper hand hygiene or glove changes), torn or damaged sterile wraps, or airborne contamination (e.g., airborne particles or microorganisms, such as dust or bacteria, entering the medical environment due to ventilation issues or inadequate air filtration).
[0124] An inefficient motion can refer to or include a surgeon, object, or equipment having to move an excessive distance during a phase of the workflow, or excessively re-orienting or rotating during a phase. An inefficient motion can also refer to or include an excessive duration of a motion, or idle time of a surgeon.
[0125] To predict the adverse event, the event predictor 148 can evaluate the patterns of motions to determine a trend or predict how paths associated with various patterns of motions may change or evolve throughout the workflow associated with the medical session. The event predictor 148 can utilize a performance model (e.g., a model 156) trained with machine learning. The event predictor 148 can input the 3D reconstruction of the layout or topology of the medical environment 105 into a machine learning model 156 (e.g., a performance model) to predict whether a collision may occur. For example, the machine learning model 156 can be trained on historical layouts, patterns of motions, and collisions. The machine learning model 156 can be configured to predict whether a collision may occur based on a given layout, topology, phase, type of medical session, or other metadata. The machine learning model 156 can output, based on a 3D reconstruction of a topology of a medical environment 105, a prediction or score indicative of a collision. The model 156 can indicate the phase of the workflow of the medical session in which the collision may occur. Responsive to the prediction that a collision may occur, the performance controller 146 can perform an action, such as providing an alert or warning that a collision may occur. The data processing system 130 can provide a guideline for an optimal spatial layout.
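A minimal sketch of such a performance model is shown below, assuming a scikit-learn-style classifier trained offline on features summarizing historical layouts and collision labels; the feature set, training rows, and threshold are illustrative placeholders, not the actual training data or model architecture described above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Illustrative features summarizing a 3D reconstruction for one phase:
# minimum clearance between tracked entities (m), aggregate motion
# intensity, personnel count, and a numeric phase identifier.
X_train = np.array([
    [0.25, 0.9, 6, 2],   # historical layout that preceded a collision
    [1.40, 0.3, 4, 2],   # historical layout with no collision
    [0.30, 0.8, 7, 3],
    [1.10, 0.4, 5, 3],
])
y_train = np.array([1, 0, 1, 0])  # 1 = collision occurred

model = GradientBoostingClassifier().fit(X_train, y_train)

# Score a new layout extracted from the current 3D reconstruction.
current_layout = np.array([[0.35, 0.85, 6, 2]])
collision_score = model.predict_proba(current_layout)[0, 1]
if collision_score > 0.5:
    print(f"Warning: predicted collision risk {collision_score:.2f}")
```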
[0126] The performance controller 146 can predict metrics indicative of performance for one or more phases in the workflow of the medical session. For example, the performance controller 146 can predict, based on at least one of the 3D reconstruction or the metric, a second metric indicative of performance of the layout of the medical environment for a second phase subsequent to the phase. The performance controller 146 can select, based on the second metric, a second action to perform subsequent to completion of the phase to improve performance of the layout of the medical environment during the second phase of the medical session.
[0127] The performance controller 146 can determine the second metric and second action independently of the first metric and first action. For example, the first action may change or modify the layout during a phase, which in turn may result in the data processing system 130 creating a second 3D reconstruction of the layout, or otherwise modifying the previously generated 3D reconstruction of the layout. The performance controller 146 can determine the second metric based on the new 3D reconstruction that was updated or modified responsive to the first action performed during the phase of the medical session. Thus, the data processing system 130 can continuously update the 3D reconstruction and generate new performance metrics and actions in real-time.
[0128] The data processing system 130 can output the metric or action with visual output or auditory output (e.g., sound, alarm, or voice output). The data processing system 130 can provide the output or guidance with visual signals or audio signals. The guidance can be configured to mitigate a likelihood of occurrence of an adverse event prior to occurrence of the adverse event. The data processing system 130 can include an interface 132 that can provide or present a graphical user interface executed or rendered by a 3D viewer 176 on a client device 174. The 3D viewer 176 can provide the graphical user interface for display via a display device. The interface 132 can provide a 3D viewer 176, such as a web-based 3D viewer 176 application or a native 3D viewer 176 application. The interface 132 can transmit, via the network 101, data or output that can be rendered or presented for display via a display device. For example, the data processing system 130 can provide a 3D reconstruction for display via the client device 174. The data processing system 130 can overlay patterns of motions or performance metrics on the 3D reconstruction, or present them alongside the 3D reconstruction. The data processing system 130 can provide an analytics dashboard with various performance metrics. The data processing system 130 can provide a video stream (e.g., data stream 152) of the medical session for display via the graphical user interface. The data processing system 130 can provide the 3D reconstruction alongside or overlayed on the video stream. The data processing system 130 can synchronize the playback (or real-time playback) of the video stream of the medical session with the 3D reconstruction, patterns of motions, or performance metrics. The data processing system 130 can provide a graph of values of a performance metric with respect to time or phases in the workflow.
[0129] The data processing system 130 can present, via the graphical user interface, the 3D reconstruction including one or more digital twins of one or more objects in the medical environment. The data processing system 130 can overlay, on the graphical user interface with the digital twin, a heatmap corresponding to the pattern of motion in the medical environment during the medical session. The heatmap of motion (or motion heatmap) can visually depict an intensity or distribution of motion within the medical environment or a specific region or area within the medical environment over a certain time interval. The intensity can correspond to the degree of motion in a particular region or area within the medical environment. The intensity can be based on factors such as speed, direction, or both. The degree of motion can be determined from the 3D reconstruction or data stream 152 by the tracker 140.
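One way such a motion heatmap can be computed from tracked floor-plane positions is sketched below using NumPy; the grid resolution, room dimensions, and the `motion_heatmap` helper are illustrative assumptions rather than the disclosed method.

```python
import numpy as np

def motion_heatmap(positions, room_size=(10.0, 10.0), bins=50):
    """Accumulate tracked (x, y) floor-plane positions (in meters) into a
    normalized 2D intensity grid over a time interval."""
    heat, _, _ = np.histogram2d(
        positions[:, 0], positions[:, 1], bins=bins,
        range=[[0.0, room_size[0]], [0.0, room_size[1]]])
    # Normalize so intensities are comparable across phases.
    return heat / max(heat.max(), 1e-9)

# Usage: positions sampled from the tracker at 10 Hz over one minute.
tracked = np.random.rand(600, 2) * 10.0  # placeholder for tracker output
grid = motion_heatmap(tracked)           # overlay this grid on the digital twin
```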
[0130] The data processing system 130 can allow for interaction with the analytics dashboard. For example, the user can interact with a metric, 3D reconstruction, video, or phase timeline to better evaluate various metrics at various phases or activities in the medical session. The 3D viewer 176 can provide various perspectives or views of the medical environment, including a top view or different side views. The user can zoom in and out of the 3D reconstruction using the 3D viewer, as well as pan through the 3D reconstruction. The 3D reconstruction can include an annotation of the phase detected by the phase detector 138.
[0131] FIG. 2 depicts an example graphical user interface for presenting a 3D reconstruction. The graphical user interface can be presented by one or more system or component depicted in FIG. 1, including for example, a data processing system 130 or 3D viewer 176. The 3D viewer 176 depicted in FIG. 2 illustrates a 3D reconstruction 200 generated by 3D reconstructor 136. The 3D reconstruction 200 depicted in FIG. 2 can include a representation of a surgeon 202 in a medical environment. The 3D reconstruction 200 can include a robotic system 204, and other objects or equipment 206. The robotic system 204 included in the 3D reconstruction can be a digital twin of the robotic medical system 125 in the actual medical environment 105.
[0132] The data processing system 130 can indicate a keep-out zone that can correspond to the adverse event location. Establishing a keep-out zone can prevent, or reduce the likelihood of, an adverse event occurring in the keep-out zone. To establish a keep-out zone, the data processing system 130 can use the patterns of motions determined by the tracker 140 to determine, predict, or otherwise identify potential adverse events, such as collisions or sterility breaches. For example, as depicted in FIG. 5B, the data processing system 130 can determine that an adverse event 575 can include a collision due to the first path 565 intersecting or colliding with the second path 570 during a same time interval or phase of the medical session. Responsive to detecting this potential adverse event, the data processing system 130 can establish a keep-out zone to prevent, avoid, or otherwise reduce the likelihood of the occurrence of the adverse event. The keep-out zone can include a geospatial zone. The keep-out zone can include a temporal and geospatial zone. For example, the keep-out zone can prohibit personnel 530B depicted in FIG. 5B from traversing the second path 570 during a first time interval, while allowing personnel 530A to traverse the first path 565 during the first time interval. The keep-out zone can prohibit the personnel 530A from traversing the first path 565 during a second time interval in which the second personnel 530B is authorized to traverse the second path 570, thereby preventing or reducing the likelihood of the collision or adverse event at 575.
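The following sketch illustrates a combined temporal and geospatial keep-out check of the kind described above; the circular zone geometry, coordinates, and time windows are hypothetical values chosen for the example, not parameters of the disclosed system.

```python
from dataclasses import dataclass

@dataclass
class KeepOutZone:
    """Hypothetical temporal-geospatial keep-out zone: a circular floor
    region restricted during a time window (seconds into the phase)."""
    center: tuple   # (x, y) in meters
    radius: float   # meters
    t_start: float
    t_end: float

def violates(zone: KeepOutZone, x: float, y: float, t: float) -> bool:
    # A tracked position violates the zone only when it falls inside
    # the region during the restricted time window.
    if not (zone.t_start <= t <= zone.t_end):
        return False
    dx, dy = x - zone.center[0], y - zone.center[1]
    return dx * dx + dy * dy <= zone.radius ** 2

# Usage: restrict the predicted crossing of paths 565 and 570 during
# the first time interval only.
zone = KeepOutZone(center=(4.2, 3.1), radius=0.8, t_start=0.0, t_end=90.0)
print(violates(zone, 4.0, 3.0, 45.0))   # True: inside zone, in window
print(violates(zone, 4.0, 3.0, 120.0))  # False: window has passed
```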
[0133] FIG. 3 depicts an example method of machine learning-based spatial layout optimization via 3D reconstructions. The method 300 can be performed by one or more system or component depicted in FIG. 1 or FIG. 6, including, for example, a data processing system. At ACT 302, the method 300 can include the data processing system receiving sensor data. The data processing system can receive sensor data from a set of 3D sensors located in the medical environment. The 3D sensors can be located in and around the medical environment. The 3D sensors can capture optical, visual, infrared, or depth information in the medical environment. The 3D sensors can continuously capture data, or capture data based on a time interval, sample rate, or other frequency. For example, the 3D sensors can capture data at a frame rate of 1 Hz, 2 Hz, 3 Hz, 10 Hz, 16 Hz, 20 Hz, 24 Hz, 40 Hz, 50 Hz, or other frame rate or sample rate.
[0134] The data processing system 130 can receive data from the 3D sensors in real-time as a continual data stream, based on a time interval (e.g., every 5 seconds, 10 seconds, 30 seconds, or 1 minute), or in a batch mode. The data processing system 130 can request the data from the 3D sensors, or the 3D sensors can push the data to the data processing system.
[0135] At ACT 304, the data processing system can generate a 3D reconstruction. The data processing system can generate the 3D reconstruction of the layout or topology of the medical environment using the 3D sensor data. The 3D reconstruction can include or indicate a layout or topology of the medical environment, which can include personnel, objects, or equipment. The 3D reconstruction can include a digital twin representation of certain objects. An example 3D reconstruction is illustrated in FIG. 2.
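As a sketch of how point clouds from multiple 3D sensors might be fused into a single reconstruction in a common coordinate system, the example below registers each cloud to a reference sensor's frame with point-to-point ICP via the Open3D library (an assumed third-party dependency); in practice calibrated extrinsics would typically seed the alignment, and the helper name is illustrative.

```python
import copy
import numpy as np
import open3d as o3d  # assumed dependency; any registration library works

def register_to_common_frame(clouds, reference_idx=0, max_dist=0.05):
    """Align point clouds from multiple 3D sensors into the coordinate
    frame of one reference sensor using point-to-point ICP."""
    reference = clouds[reference_idx]
    merged = copy.deepcopy(reference)
    for i, cloud in enumerate(clouds):
        if i == reference_idx:
            continue
        result = o3d.pipelines.registration.registration_icp(
            cloud, reference, max_dist, np.eye(4),
            o3d.pipelines.registration.TransformationEstimationPointToPoint())
        aligned = copy.deepcopy(cloud)
        aligned.transform(result.transformation)  # into the reference frame
        merged += aligned
    return merged
```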
[0136] At ACT 306, the data processing system can identify a phase of the medical session. The data processing system can identify the phase of the medical session based on the data stream received from various data capture devices. The data processing system can determine the phase of the medical session based on the 3D reconstruction. To determine the phase, the data processing system can utilize a machine learning model. The data processing system can determine operative or non-operative phases. The data processing system can determine phases at various levels of granularity or resolution. For example, the data processing system can determine the phase as one of pre-operative, operative, or post-operative. In another example, the data processing system can determine the phase as one of room preparation, robot setup, a medical session, turn over, or cleaning. In another example, the data processing system can determine a phase of the medical session, such as exposure, dissection, transection, reconstruction, or extraction.
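A minimal sketch of phase identification is shown below, assuming per-frame features summarized from the 3D reconstruction and a scikit-learn classifier; the phase labels follow the examples in the paragraph above, while the feature set and training rows are illustrative placeholders rather than the disclosed model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

PHASES = ["room_preparation", "robot_setup", "medical_session",
          "turn_over", "cleaning"]

# Illustrative per-frame features derived from the 3D reconstruction:
# personnel count, aggregate motion intensity, robot-activity flag.
X = np.array([[2, 0.7, 0], [3, 0.4, 1], [5, 0.2, 1],
              [3, 0.6, 0], [1, 0.5, 0]])
y = np.array([0, 1, 2, 3, 4])  # indices into PHASES

clf = RandomForestClassifier(n_estimators=50).fit(X, y)
frame = np.array([[4, 0.25, 1]])
print(PHASES[clf.predict(frame)[0]])  # e.g., "medical_session"
```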
[0137] At ACT 308, the data processing system can determine a metric indicative of a quality of the medical environment. The metric can be indicative of quality of the spatial layout of the medical environment during the detected phase. The metric can be based on the phase, or the performance can be based on comparing the metric with a threshold established for the detected phase. In some cases, the metric can be normalized based on phase. For example, if the metric corresponds to a count of personnel, then the metric can be normalized using a baseline count of personnel established for each phase using historical data. Thus, the metric can indicate performance based on a particular phase. In another example, the metric of performance can correspond to an intensity of motion, which can be normalized based on an average intensity of motion in each phase established from historical data.
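The per-phase normalization described above can be sketched as follows; the baseline values are illustrative and would, per the description, be established from historical data for each phase.

```python
def normalized_metric(raw_value: float, phase: str, baselines: dict) -> float:
    """Normalize a raw metric (e.g., personnel count or motion intensity)
    against a per-phase baseline from historical data, so a value of 1.0
    means 'typical for this phase'."""
    baseline = baselines[phase]
    return raw_value / baseline if baseline else float("inf")

# Illustrative baselines: average personnel count established per phase.
personnel_baselines = {"robot_setup": 3.0, "medical_session": 5.0}
score = normalized_metric(7, "medical_session", personnel_baselines)  # 1.4
```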
[0138] At ACT 310, the data processing system can determine an action. The data processing system can determine, identify, or select an action to perform based on the metric and the phase. In some cases, the data processing system can determine that the metric indicates a satisfactory level of performance, and determine to not change the layout or provide an instruction or notification to maintain the same layout throughout the phase or subsequent phases of the workflow associated with the medical session. In some cases, the data processing system can determine, based on a comparison of the metric with a threshold, to perform an action to improve the performance of the layout of the medical environment.
[0139] At ACT 312, the data processing system can perform an action. The data processing system can provide an instruction, alert, warning, or other indication to cause a change in the layout of the medical environment to improve the performance metric. The data processing system can provide guidance to mitigate or reduce the likelihood of an adverse event. The data processing system can provide guidance to adjust the spatial layout to be closer or more in alignment with an optimal layout.
[0140] The data processing system can perform the action in real-time, at the end of a current phase and before commencement of a subsequent phase, or after the medical session is completed so that the layout can be adjusted prior to commencement of the next medical session on the next patient.

[0141] FIG. 4 depicts an example method of machine learning-based spatial layout optimization via 3D reconstructions. The method 400 can be performed by one or more system or component depicted in FIG. 1 or FIG. 6, including, for example, a data processing system. The method 400 can include one or more ACTs of method 300, including, for example, the data processing system receiving sensor data at ACT 302, generating a 3D reconstruction at ACT 304, identifying a phase of a medical session at ACT 306, and determining a metric indicative of quality at ACT 308.
[0142] Subsequent to determining the metric at ACT 308, the method 400 can proceed to decision block 410. At decision block 410, the data processing system can determine whether to perform an action based on the metric. For example, the data processing system can compare the metric with a threshold. The threshold can be selected based on the type of metric and phase identified at ACT 306. The threshold can be selected based on metadata associated with the medical environment or medical session, such as type of medical session, geographic location of medical environment, or surgeon identifier. In some cases, the data processing system can determine to perform an action if the metric is greater than or equal to the threshold. In some cases, the data processing system can determine to perform the action if the metric is less than or equal to the threshold. In some cases, the data processing system can select the action based on the amount of difference between the metric and the threshold, percentage difference, or type of metric.
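Decision block 410 can be sketched as a simple comparison, as below; the action names and the deviation-based selection rule are illustrative assumptions rather than the disclosed logic.

```python
def decide_action(metric: float, threshold: float,
                  higher_is_worse: bool = True):
    """Sketch of decision block 410: compare the metric with a
    phase-specific threshold and select an action, or fall through."""
    exceeded = metric >= threshold if higher_is_worse else metric <= threshold
    if not exceeded:
        return None  # proceed to adverse-event prediction (block 412)
    # Scale the response to how far the metric deviates from the threshold.
    gap = abs(metric - threshold) / max(abs(threshold), 1e-9)
    return "alert" if gap < 0.25 else "adjust_layout"

print(decide_action(1.4, 1.2))   # "alert" (small deviation)
print(decide_action(2.5, 1.2))   # "adjust_layout" (large deviation)
print(decide_action(0.9, 1.2))   # None (no action based on the metric)
```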
[0143] If the data processing system determines to perform an action at decision block 410, the data processing system can proceed to ACT 414 to perform a selected action. If, however, the data processing system determines to not perform an action at decision block 410 based on the metric, the data processing system can proceed to decision block 412.
[0144] At decision block 412, the data processing system can make a prediction with regard to an adverse event. If the data processing system predicts that an adverse event may occur based on the current 3D reconstruction of the layout and patterns of motions, the data processing system can proceed to ACT 414 to select and perform an action to prevent, mitigate, or reduce the likelihood of occurrence of the adverse event. If, however, the data processing system does not predict that an adverse event is going to occur in a given phase, subsequent phase, or throughout the workflow, then the data processing system can return to ACT 302 to collect sensor data. Adverse events can include, for example, collisions, sterility breaches, or inefficient motions.

[0145] FIG. 5A depicts an example medical environment 105, in accordance with embodiments. The medical environment 105 can refer to or include a surgical environment or surgical system. The medical environment 105 can include a robotic medical system 125, a user control system 510, and an auxiliary system 515 communicatively coupled one to another. A visualization tool 520 (e.g., the visualization tool 170) can be connected to the auxiliary system 515, which in turn can be connected to the robotic medical system 125. Thus, when the visualization tool 520 is connected to the auxiliary system 515 and this auxiliary system is connected to the robotic medical system 125, the visualization tool can be considered connected to the robotic medical system. In some embodiments, the visualization tool 520 can additionally or alternatively be directly connected to the robotic medical system 125.
[0146] The medical environment 105 can be used to perform a computer-assisted medical procedure with a patient 525. In some embodiments, a surgical team can include a surgeon 530A and additional medical personnel 530B-530D, such as a medical assistant, a nurse, an anesthesiologist, and other suitable team members who can assist with the surgical procedure or medical session. The medical session can include the surgical procedure being performed on the patient 525, as well as any pre-operative (e.g., which can include setup of the medical environment 105, including preparation of the patient 525 for the procedure), post-operative (e.g., which can include clean up or post care of the patient), or other processes during the medical session. Although described in the context of a surgical procedure, the medical environment 105 can be implemented in a non-surgical procedure, or other types of medical procedures or diagnostics that can benefit from the accuracy and convenience of the surgical system.
[0147] The robotic medical system 125 can include a plurality of manipulator arms 535A-535D to which a plurality of medical tools (e.g., the medical tool 120) can be coupled or installed. Each medical tool can be any suitable surgical tool (e.g., a tool having tissue-interaction functions), imaging device (e.g., an endoscope, an ultrasound tool, etc.), sensing instrument (e.g., a force-sensing surgical instrument), diagnostic instrument, or other suitable instrument that can be used for a computer-assisted surgical procedure on the patient 525 (e.g., by being at least partially inserted into the patient and manipulated to perform a computer-assisted surgical procedure on the patient). Although the robotic medical system 125 is shown as including four manipulator arms (e.g., the manipulator arms 535A-535D), in other embodiments, the robotic medical system can include greater than or fewer than four manipulator arms. Further, not all manipulator arms need have a medical tool installed thereto at all times of the medical session. Moreover, in some embodiments, a medical tool installed on a manipulator arm can be replaced with another medical tool as suitable.
[0148] One or more of the manipulator arms 535A-535D or the medical tools attached to manipulator arms can include one or more displacement transducers, orientational sensors, positional sensors, or other types of sensors and devices to measure parameters or generate kinematics information. One or more components of the medical environment 105 can be configured to use the measured parameters or the kinematics information to track (e.g., determine poses of) or control the medical tools, as well as anything connected to the medical tools or the manipulator arms 535A-535D.
[0149] The user control system 510 can be used by the surgeon 530A to control (e.g., move) one or more of the manipulator arms 535A-535D or the medical tools connected to the manipulator arms. To facilitate control of the manipulator arms 535A-535D and track progression of the medical session, the user control system 510 can include a display (e.g., the display 172) that can provide the surgeon 530A with imagery (e.g., high-definition 3D imagery) of a surgical site associated with the patient 525 as captured by a medical tool (e.g., the medical tool 120, which can be an endoscope) installed to one of the manipulator arms 535A-535D. The user control system 510 can include a stereo viewer having two or more displays where stereoscopic images of a surgical site associated with the patient 525 and generated by a stereoscopic imaging system can be viewed by the surgeon 530A. In some embodiments, the user control system 510 can also receive images from the auxiliary system 515 and the visualization tool 520.
[0150] The surgeon 530A can use the imagery displayed by the user control system 510 to perform one or more procedures with one or more medical tools attached to the manipulator arms 535A-535D. To facilitate control of the manipulator arms 535A-535D or the medical tools installed thereto, the user control system 510 can include a set of controls. These controls can be manipulated by the surgeon 530A to control movement of the manipulator arms 535A-535D or the medical tools installed thereto. The controls can be configured to detect a wide variety of hand, wrist, and finger movements by the surgeon 530A to allow the surgeon to intuitively perform a procedure on the patient 525 using one or more medical tools installed to the manipulator arms 535A-535D.

[0151] The auxiliary system 515 can include one or more computing devices configured to perform processing operations within the medical environment 105. For example, the one or more computing devices can control or coordinate operations performed by various other components (e.g., the robotic medical system 125, the user control system 510) of the medical environment 105. A computing device included in the user control system 510 can transmit instructions to the robotic medical system 125 by way of the one or more computing devices of the auxiliary system 515. The auxiliary system 515 can receive and process image data representative of imagery captured by one or more imaging devices (e.g., medical tools) attached to the robotic medical system 125, as well as other data stream sources received from the visualization tool. For example, one or more image capture devices (e.g., the data capture devices 110) can be located within the medical environment 105. These image capture devices can capture images from various viewpoints within the medical environment 105. These images (e.g., video streams) can be transmitted to the visualization tool 520, which can then pass through those images to the auxiliary system 515 as a single combined data stream. The auxiliary system 515 can then transmit the single video stream (including any data stream received from the medical tool(s) of the robotic medical system 125) to present on a display (e.g., the display 172) of the user control system 510.
[0152] In some embodiments, the auxiliary system 515 can be configured to present visual content (e.g., the single combined data stream) to other team members (e.g., the medical personnel 530B-530D) who may not have access to the user control system 510. Thus, the auxiliary system 515 can include a display 640 configured to display one or more user interfaces, such as images of the surgical site, information associated with the patient 525 or the surgical procedure, or any other visual content (e.g., the single combined data stream). In some embodiments, the display 640 can be a touchscreen display or include other features to allow the medical personnel 530A-530D to interact with the auxiliary system 515.
[0153] The robotic medical system 125, the user control system 510, and the auxiliary system 515 can be communicatively coupled one to another in any suitable manner. For example, in some embodiments, the robotic medical system 125, the user control system 510, and the auxiliary system 515 can be communicatively coupled by way of control lines 645, which can represent any wired or wireless communication link as can serve a particular implementation. Thus, the robotic medical system 125, the user control system 510, and the auxiliary system 515 can each include one or more wired or wireless communication interfaces, such as one or more local area network interfaces, Wi-Fi network interfaces, cellular interfaces, etc.
[0154] It is to be understood that the medical environment 105 can include other or additional components or elements that can be needed or considered desirable for the medical session for which the surgical system is being used.
[0155] FIG. 5B depicts an example graphical user interface depicting a layout of a medical environment overlayed with patterns of motion. The paths depicted in FIG. 5B can be identified and overlayed by one or more system or component depicted in FIG. 1, including, for example, a data processing system. The illustration depicted in FIG. 5B can correspond to a 3D reconstruction of the topology or layout of the medical environment 105. The data processing system can overlay an indication of an ingress or egress 560, such as a door to the medical environment.
[0156] The data processing system can overlay one or more paths, such as a first path 565 and a second path 570. The first path 565 and second path 570 can represent a pattern of motion. The first path 565 and second path 570 can have a weight or intensity based on a degree of motion along the respective path. The first path and second path can be associated with a motion heatmap.
[0157] The data processing system can identify an adverse event 575 corresponding to a collision along the first path 565 and second path 570 between a first surgeon 530B and a second surgeon 530A. For example, during a particular phase, the data processing system can predict that the surgeon 530A can traverse the first path 565 from the user control system 510 to the manipulator arm 535A. During the same phase, the data processing system can predict that the surgeon 530B traverses the second path 570. The data processing system can predict a likelihood of the adverse event 575 occurring based on the likelihood of the paths 565 and 570 being traversed during a same time interval or phase.
[0158] In some cases, the data processing system can predict the adverse event as an inefficient movement or motion, such as one of the surgeons idling, pausing, or deviating from one of the paths 565 or 570 for the purpose of avoiding the collision. This deviation or idling can be predicted by the data processing system as an adverse event. The data processing system can, responsive to detecting an adverse event, provide an alert, warning, notification or other indication to adjust a layout of the medical environment in order to improve the efficiency of the layout of the medical environment. For example, the equipment or table used by surgeon 530B can be moved to a different location or position (e.g., height) such that the path 570 does not intersect with path 565 at any time during the workflow associated with the medical session or medical procedure, thereby improving the efficiency or performance of the layout of the medical environment.
[0159] FIG. 6 is a block diagram depicting an architecture for a computer system 600 that can be employed to implement elements of the systems and methods described and illustrated herein, including aspects of the systems depicted in FIG. 1, FIG. 5A, or FIG. 5B, the user interface depicted in FIG. 2, and the methods or processes depicted in FIGS. 3 and 4. For example, the data processing system 130, robotic medical system 125, or client device 174 can include one or more component or functionality of computing system 600. The computer system 600 can be any computing device used herein and can include or be used to implement a data processing system or its components. The computer system 600 includes at least one bus 605 or other communication component or interface for communicating information between various elements of the computer system. The computer system further includes at least one processor 610 or processing circuit coupled to the bus 605 for processing information. The computer system 600 also includes at least one main memory 615, such as a random-access memory (RAM) or other dynamic storage device, coupled to the bus 605 for storing information, and instructions to be executed by the processor 610. The main memory 615 can be used for storing information during execution of instructions by the processor 610. The computer system 600 can further include at least one read only memory (ROM) 620 or other static storage device coupled to the bus 605 for storing static information and instructions for the processor 610. A storage device 625, such as a solid-state device, magnetic disk or optical disk, can be coupled to the bus 605 to persistently store information and instructions.
[0160] The computer system 600 can be coupled via the bus 605 to a display 630, such as a liquid crystal display, or active-matrix display, for displaying information. An input device 635, such as a keyboard or voice interface can be coupled to the bus 605 for communicating information and commands to the processor 610. The input device 635 can include a touch screen display (e.g., the display 630). The input device 635 can include sensors to detect gestures. The input device 635 can also include a cursor control, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to the processor 610 and for controlling cursor movement on the display 630.
[0161] The processes, systems and methods described herein can be implemented by the computer system 600 in response to the processor 610 executing an arrangement of instructions contained in the main memory 615. Such instructions can be read into the main memory 615 from another computer-readable medium, such as the storage device 625. Execution of the arrangement of instructions contained in the main memory 615 causes the computer system 600 to perform the illustrative processes described herein. One or more processors in a multiprocessing arrangement can also be employed to execute the instructions contained in the main memory 615. Hard-wired circuitry can be used in place of or in combination with software instructions together with the systems and methods described herein. Systems and methods described herein are not limited to any specific combination of hardware circuitry and software.
[0162] The processor 610 can execute one or more instructions associated with the system 100. The processor 610 can include an electronic processor, an integrated circuit, or the like including one or more of digital logic, analog logic, digital sensors, analog sensors, communication buses, volatile memory, nonvolatile memory, and the like. The processor 610 can include, but is not limited to, at least one microcontroller unit (MCU), microprocessor unit (MPU), central processing unit (CPU), graphics processing unit (GPU), physics processing unit (PPU), embedded controller (EC), or the like. The processor 610 can include, or be associated with, a memory 615 operable to store or storing one or more non-transitory computer-readable instructions for operating components of the system 100 and operating components operably coupled to the processor 610. The one or more instructions can include at least one of firmware, software, hardware, operating systems, or embedded operating systems, for example. The processor 610 or the system 100 generally can include at least one communication bus controller to effect communication between the system processor and the other elements of the system 100.
[0163] The memory 615 can include one or more hardware memory devices to store binary data, digital data, or the like. The memory 615 can include one or more electrical components, electronic components, programmable electronic components, reprogrammable electronic components, integrated circuits, semiconductor devices, flip flops, arithmetic units, or the like. The memory 615 can include at least one of a non-volatile memory device, a solid-state memory device, a flash memory device, a NAND memory device, a volatile memory device, etc. The memory 615 can include one or more addressable memory regions disposed on one or more physical memory arrays.
[0164] Although an example computing system has been described in FIG. 6, the subject matter including the operations described in this specification can be implemented in other types of digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
[0165] The herein described subject matter sometimes illustrates different components contained within, or connected with, different other components. It is to be understood that such depicted architectures are illustrative, and that in fact many other architectures can be implemented which achieve the same functionality. In a conceptual sense, any arrangement of components to achieve the same functionality is effectively “associated” such that the desired functionality is achieved. Hence, any two components herein combined to achieve a particular functionality can be seen as “associated with” each other such that the desired functionality is achieved, irrespective of architectures or intermedial components. Likewise, any two components so associated can also be viewed as being “operably connected,” or “operably coupled,” to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being “operably couplable,” to each other to achieve the desired functionality. Specific examples of operably couplable include but are not limited to physically mateable or physically interacting components or wirelessly interactable or wirelessly interacting components or logically interacting or logically interactable components.
[0166] With respect to the use of plural or singular terms herein, those having skill in the art can translate from the plural to the singular or from the singular to the plural as is appropriate to the context or application. The various singular/plural permutations can be expressly set forth herein for sake of clarity.
[0167] It will be understood by those within the art that, in general, terms used herein, and especially in the appended claims (e.g., bodies of the appended claims) are generally intended as “open” terms (e.g., the term “including” should be interpreted as “including but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes but is not limited to,” etc.).
[0168] Although the figures and description can illustrate a specific order of method steps, the order of such steps can differ from what is depicted and described, unless specified differently above. Also, two or more steps can be performed concurrently or with partial concurrence, unless specified differently above. Such variation can depend, for example, on the software and hardware systems chosen and on designer choice. All such variations are within the scope of the disclosure. Likewise, software implementations of the described methods can be accomplished with standard programming techniques with rule-based logic and other logic to accomplish the various connection steps, processing steps, comparison steps, and decision steps.
[0169] It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation, no such intent is present. For example, as an aid to understanding, the following appended claims can contain usage of the introductory phrases “at least one” and “one or more” to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim recitation to inventions containing only one such recitation, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an” (e.g., “a” or “an” should typically be interpreted to mean “at least one” or “one or more”); the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should typically be interpreted to mean at least the recited number (e.g., the bare recitation of “two recitations,” without other modifiers, typically means at least two recitations, or two or more recitations).
[0170] Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, or A, B, and C together, etc.). In those instances where a convention analogous to “at least one of A, B, or C, etc.” is used, in general, such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, or C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.”

[0171] Further, unless otherwise noted, the use of the words “approximate,” “about,” “around,” “substantially,” etc., mean plus or minus ten percent.
[0172] The foregoing description of illustrative implementations has been presented for purposes of illustration and of description. It is not intended to be exhaustive or limiting with respect to the precise form disclosed, and modifications and variations are possible in light of the above teachings or can be acquired from practice of the disclosed implementations. It is intended that the scope of the invention be defined by the claims appended hereto and their equivalents.

Claims

CLAIMS

What is claimed is:
1. A computer-assisted medical system, comprising: one or more processors, coupled with memory, to: receive sensor data associated with a medical session and captured by a set of sensors situated in a medical environment; generate a three dimensional (3D) reconstruction of the medical environment based on the sensor data; identify a phase of the medical session based on the 3D reconstruction and via one or more models; determine a metric indicative of quality of a spatial layout of the medical environment based on the generated 3D reconstruction and the identified phase using the one or more models; determine an action based on the metric; and perform the action.
2. The system of claim 1, wherein the one or more processors are further configured to: determine the metric based on a count of personnel detected in the medical environment, a pattern of motion of the personnel or objects, a size of the medical environment, time elapsed in the phase or the medical session, an occurrence of one or more adverse events, or the phase of the medical session.
3. The system of claim 1, wherein the one or more processors are further configured to: compare the metric with a threshold established for the phase of the medical session; and determine the action based on the comparison of the metric with the threshold.
4. The system of claim 3, wherein the one or more processors are further configured to: determine the threshold based on metadata of the medical environment, the metadata indicating at least one of a type of medical environment, a geographic location of the medical environment, or an age of the medical environment.
5. The system of claim 1, wherein the phase of the medical session occurs prior to a medical procedure in the medical session, and: to determine the metric includes to determine that the spatial layout of the medical environment deviates from a baseline layout of medical environments established based on historical data associated with historic medical procedures similar to the medical procedure; and the action comprises an instruction to adjust the spatial layout to conform with the baseline layout prior to performance of the medical procedure in the medical session.
6. The system of claim 1, wherein to determine the action based on the metric, the one or more processors are further configured to: input the metric into an action model trained with machine learning on historical data to determine the action to perform.
7. The system of claims 1 or 6, wherein the action comprises to: adjust at least one of a height, orientation, or location of an object located in the medical environment; automatically control an object located in the medical environment; or provide at least one of an indication of the metric, an alert, or a guideline for an optimal spatial layout.
8. The system of claim 1, wherein the one or more processors are further configured to: predict an adverse event based on at least one of a pattern of motion in the 3D reconstruction, the phase of the medical session, or user input received during the medical session.
9. The system of claim 8, wherein the adverse event comprises at least one of a collision, a sterility breach, or an inefficient motion.
10. The system of one of claims 8 or 9, wherein to perform the action, the one or more processors are further configured to: provide an indication of the prediction of the adverse event prior to occurrence of the adverse event.
11. The system of one of claims 8 or 9, wherein to perform the action, the one or more processors are further configured to: provide, prior to occurrence of the adverse event and responsive to the prediction, guidance to mitigate a likelihood of occurrence of the adverse event.
12. The system of one of claims 8 or 9, wherein to perform the action, the one or more processors are further configured to: disable, prior to occurrence of the adverse event and responsive to the prediction, a function of a robotic medical system used to perform a medical procedure within the medical session, wherein the function is disabled for at least one of a predetermined time interval or until detection of a predetermined event.
13. The system of claim 1, wherein the one or more processors are further configured to: determine, subsequent to performance of the action, a second metric indicative of quality of the spatial layout of the medical environment in the phase of the medical session; and determine, based at least in part on the second metric, whether the quality of the spatial layout of the medical environment improved.
14. The system of claim 1, wherein the sensor data comprises intensity data, depth data, and 3D point cloud data received from the set of sensors, and the one or more processors are further configured to: register, via a function, the 3D point cloud data from the set of sensors to a common coordinate system to generate the 3D reconstruction.
15. The system of claim 1, wherein the one or more processors are further configured to: determine a pattern of motion corresponding to movement of personnel in the medical environment during the medical session; and overlay, on a graphical user interface, the pattern of motion on the 3D reconstruction.
16. The system of claim 1, wherein the one or more processors are further configured to: establish, within the 3D reconstruction, a digital twin of an object associated with performance of the medical session in the medical environment; and overlay, on a graphical user interface with the digital twin, a heatmap corresponding to a pattern of motion in the medical environment during the medical session.
17. The system of claim 1, wherein the one or more models include a reconstruction model, and the one or more processors are further configured to: train the reconstruction model with machine learning using training data comprising at least one of labeled sets of 3D reconstructions, historical 3D point cloud data obtained from one or more medical environments, computer aided drawings, blueprints, or video stream data obtained from a robotic medical system configured to perform at least a portion of the medical session.
18. The system of claim 1, wherein the one or more models include a performance model, and the one or more processors are further configured to: train the performance model with machine learning using training data comprising at least one of historical metrics indicative of quality of medical environments, outcomes of medical sessions, or 3D reconstructions.
19. The system of claim 1, wherein the one or more processors are further configured to: train an action model with machine learning using training data comprising at least one of simulated actions performed in 3D reconstructions or historical event logs of medical sessions performed via one or more robotic medical systems; and determine the action via the action model.
20. A method, comprising: receiving, by one or more processors coupled with memory, sensor data associated with a medical session and captured by a set of sensors situated in a medical environment; generating, by the one or more processors, a three dimensional (3D) reconstruction of the medical environment based on the sensor data; identifying, by the one or more processors, a phase of the medical session based on the 3D reconstruction and via one or more models; determining, by the one or more processors, a metric indicative of quality of a spatial layout of the medical environment based on the generated 3D reconstruction and the identified phase using the one or more models; determining, by the one or more processors, an action based on the metric; and performing, by the one or more processors, the action.
21. A non-transitory computer-readable medium storing processor executable instructions that, when executed by one or more processors, cause the one or more processors to: receive sensor data associated with a medical session and captured by a set of sensors situated in a medical environment; generate a three dimensional (3D) reconstruction of the medical environment based on the sensor data; identify a phase of the medical session based on the 3D reconstruction and via one or more models; determine a metric indicative of quality of a spatial layout of the medical environment based on the generated 3D reconstruction and the identified phase using the one or more models; determine an action based on the metric; and perform the action.
PCT/US2024/051416 2023-10-16 2024-10-15 Systems and methods of machine learning based spatial layout optimization via 3d reconstructions Pending WO2025085441A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363590700P 2023-10-16 2023-10-16
US63/590,700 2023-10-16

Publications (1)

Publication Number Publication Date
WO2025085441A1 true WO2025085441A1 (en) 2025-04-24

Family

ID=93378057

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/051416 Pending WO2025085441A1 (en) 2023-10-16 2024-10-15 Systems and methods of machine learning based spatial layout optimization via 3d reconstructions

Country Status (1)

Country Link
WO (1) WO2025085441A1 (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180247023A1 (en) * 2017-02-24 2018-08-30 General Electric Company Providing auxiliary information regarding healthcare procedure and system performance using augmented reality
US20210313050A1 (en) * 2020-04-05 2021-10-07 Theator inc. Systems and methods for assigning surgical teams to prospective surgical procedures
EP3975201A1 (en) * 2020-09-28 2022-03-30 Koninklijke Philips N.V. Sterility in an operation room
WO2022195304A1 (en) * 2021-03-19 2022-09-22 Digital Surgery Limited Generating augmented visualizations of surgical sites using semantic surgical representations
WO2022195306A1 (en) * 2021-03-19 2022-09-22 Digital Surgery Limited Detection of surgical states and instruments
US20220384011A1 * 2021-05-28 2022-12-01 Cilag GmbH International Efficiency of motion monitoring and analysis for a surgical procedure
US20230023083A1 (en) * 2021-07-22 2023-01-26 Cilag Gmbh International Method of surgical system power management, communication, processing, storage and display
WO2023021074A1 (en) * 2021-08-18 2023-02-23 Carl Zeiss Meditec Ag Method for giving feedback on a surgery and corresponding feedback system
US20230181258A1 (en) * 2021-12-10 2023-06-15 Ix Innovation Llc Robotic surgical system for insertion of surgical implants

Similar Documents

Publication Publication Date Title
US20230341932A1 (en) Two-way communication between head-mounted display and electroanatomic system
KR102471422B1 (en) Method and system for non-contact control in surgical environment
CN109567954B (en) Workflow assistance system and method for image guided programs
US20120256950A1 (en) Medical support apparatus, medical support method, and medical support system
JP7367990B2 (en) Remote monitoring of the operating room
US20140049465A1 (en) Gesture operated control for medical information systems
JP2022513013A (en) Systematic placement of virtual objects for mixed reality
JP2019534098A (en) Teleoperated surgical system with scan-based positioning
KR20180068336A (en) Surgical system with training or auxiliary functions
US12548159B2 (en) Scene perception systems and methods
EP3977406A1 (en) Composite medical imaging systems and methods
US20250032206A1 (en) Grounded virtual portal for robotic medical system
Li et al. An human-computer interactive augmented reality system for coronary artery diagnosis planning and training
WO2023237105A1 (en) Method for displaying virtual surgical instrument on surgeon console, and surgeon console
WO2025085441A1 (en) Systems and methods of machine learning based spatial layout optimization via 3d reconstructions
US10854005B2 (en) Visualization of ultrasound images in physical space
US20250364123A1 (en) Monitoring of a medical environment by fusion of egocentric and exocentric sensor data
WO2023145503A1 (en) Medical information processing system, medical information processing method, and program
Albu Vision-based user interfaces for health applications: a survey
WO2025145072A2 (en) Sensor optimization architecture for medical procedures
WO2025245005A1 (en) Endoscopic surgical navigation with a 3d surgical workspace
WO2025006764A1 (en) System and method for detecting and removing non-surgical data
US20250218126A1 (en) Multi-modal query and response architecture for medical procedures
WO2025136978A1 (en) Surgical instrument presence detection with noisy label machine learning
WO2025184384A1 (en) Computer vision architecture to identify and monitor sterility states in medical procedures

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24801380

Country of ref document: EP

Kind code of ref document: A1