
CN119166004A - Information interaction method, device, electronic device, storage medium and program product - Google Patents


Info

Publication number
CN119166004A
CN119166004A
Authority
CN
China
Prior art keywords
information
interaction
display window
display
user
Prior art date
Legal status
Pending
Application number
CN202411302934.0A
Other languages
Chinese (zh)
Inventor
何震
郭飞
Current Assignee
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN202411302934.0A
Publication of CN119166004A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0481 - Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F 3/04815 - Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 - Error detection; Error correction; Monitoring
    • G06F 11/30 - Monitoring
    • G06F 11/34 - Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F 11/3438 - Recording or statistical evaluation of computer activity; Recording or statistical evaluation of user activity, e.g. usability assessment - monitoring of user actions
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 - Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 3/013 - Eye tracking input arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses an information interaction method, an information interaction apparatus, an electronic device, a storage medium and a program product, belonging to the technical field of extended reality. The method includes: obtaining interaction behavior information generated on an interactive interface of an extended reality device, where the interaction behavior information includes a gaze duration for each of at least one display object in a display window of the interactive interface that is gazed at; when the gaze duration characterized by the interaction behavior information reaches a preset gaze duration, obtaining spatial attribute information of the display window on the interactive interface, semantic information of the display window, and the user's usage preference information for the extended reality device, where the semantic information is determined according to type information and state information of the display window, and the spatial attribute information is determined according to size information and position information of the display window; determining target interaction information according to the interaction behavior information, the spatial attribute information, the semantic information and the usage preference information; and displaying the target interaction information on the interactive interface.

Description

Information interaction method, information interaction device, electronic equipment, storage medium and program product
Technical Field
The application belongs to the technical field of augmented reality, and particularly relates to an information interaction method, an information interaction device, electronic equipment, a storage medium and a program product.
Background
With the development of science and technology, users increasingly use Extended Reality (XR) devices for online interaction. The interactive interfaces of existing XR devices largely follow the design approach of mobile platforms such as mobile phones and tend to fold operation entries into interface elements to achieve an overall minimalist design style. While this design improves the visual experience, it increases operational complexity to some extent.
Currently, a user may operate an XR device via a handle, gestures, eye movements, and the like. Because operation entries are usually folded into interface elements, the user's operations become cumbersome, which reduces interaction efficiency and accelerates the user's hand and eye fatigue.
Disclosure of Invention
Embodiments of the present application aim to provide an information interaction method, an information interaction apparatus, an electronic device, a storage medium and a program product, which can solve the problems that user operations are cumbersome, interaction efficiency is reduced, and the user's hand and eye fatigue is accelerated.
In a first aspect, an embodiment of the present application provides an information interaction method, where the method includes:
obtaining interaction behavior information generated on an interactive interface of an extended reality device, where the interactive interface includes at least one display window, the interaction behavior information is interaction process information for the at least one display window, and the interaction behavior information includes a gaze duration for each of at least one display object in the display window that is gazed at;
when the interaction behavior information characterizes that the gaze duration reaches a preset gaze duration, obtaining spatial attribute information of the display window on the interactive interface, semantic information of the display window, and the user's usage preference information for the extended reality device, where the semantic information is determined according to type information and state information of the display window, and the spatial attribute information is determined according to size information and position information of the display window;
determining target interaction information according to the interaction behavior information, the spatial attribute information, the semantic information and the usage preference information, where the target interaction information is associated with an operation intention of the user;
and displaying the target interaction information on the interactive interface.
In a second aspect, an embodiment of the present application provides an information interaction apparatus, including:
a first acquisition module, configured to acquire interaction behavior information generated on an interactive interface of the extended reality device, where the interactive interface includes at least one display window, the interaction behavior information is interaction process information for the at least one display window, and the interaction behavior information includes a gaze duration for each of at least one display object in the display window that is gazed at;
a second acquisition module, configured to acquire, when the interaction behavior information characterizes that the gaze duration reaches a preset gaze duration, spatial attribute information of the display window on the interactive interface, semantic information of the display window, and the user's usage preference information for the extended reality device, where the semantic information is determined according to type information and state information of the display window, and the spatial attribute information is determined according to size information and position information of the display window;
a determining module, configured to determine target interaction information according to the interaction behavior information, the spatial attribute information, the semantic information, and the usage preference information, where the target interaction information is associated with an operation intention of the user;
and a display module, configured to display the target interaction information on the interactive interface.
In a third aspect, an embodiment of the present application provides an electronic device comprising a processor and a memory storing a program or instructions executable on the processor, which when executed by the processor, implement the steps of the method as described in the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which when executed by a processor perform the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and where the processor is configured to execute a program or instructions to implement a method according to the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product stored in a storage medium, the program product being executable by at least one processor to implement the method according to the first aspect.
In the embodiments of the present application, the target interaction information is associated with the user's operation intention, so the interaction result can be displayed directly. Specifically, interaction behavior information generated on the interactive interface of the extended reality device is obtained; when the interaction behavior information characterizes that the user's gaze duration on a certain display object in the display window reaches the preset gaze duration, the target interaction information is determined according to the interaction behavior information, the spatial attribute information of the display window on the interactive interface, the semantic information of the display window, and the user's usage preference information for the device, and is then displayed on the interactive interface. In other words, the user's subsequent operation intention is predicted during interaction with the extended reality device, and target interaction information associated with that intention is displayed, so that the user can interact with the device based on the target interaction information. This simplifies the user's interaction operations, improves interaction efficiency, and relieves the user's hand and eye fatigue.
Drawings
Fig. 1 is a schematic flow chart of an information interaction method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of determining activation weights according to an embodiment of the present application;
FIG. 3 is a schematic diagram of determining an association relationship between two modality information based on a cross-attention mechanism according to an embodiment of the present application;
FIG. 4 is a schematic diagram of another embodiment of determining an association between two modality information based on a cross-attention mechanism;
FIG. 5 is a schematic diagram of determining target interaction information using a multi-modal large model according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a system architecture according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an interactive interface provided by an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an information interaction device according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
Fig. 10 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions of the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which are obtained by a person skilled in the art based on the embodiments of the present application, fall within the scope of protection of the present application.
The terms "first", "second" and the like in the description and claims are used to distinguish between similar objects, and are not necessarily used to describe a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, so that embodiments of the present application may be implemented in sequences other than those illustrated or described herein. The objects distinguished by "first", "second", etc. are generally of one type, and the number of objects is not limited; for example, the first object may be one or more than one. Furthermore, in the description and claims, "and/or" denotes at least one of the connected objects, and the character "/" generally denotes an "or" relationship between the associated objects.
For existing XR interaction designs, most manipulation modes can be summarized as two operation primitives, "navigation" and "selection". Common "navigation" operation implementations include ray navigation based on handle tracking, action navigation based on gesture tracking, action navigation based on eye movement tracking, etc., and common "select" operation implementations include button selection based on handle, action selection based on gesture recognition, etc.
Whether relying on navigation operations based on handle tracking, gesture tracking, and eye-movement tracking, or on selection operations based on handle key presses and gesture recognition, the underlying logic depends on frequent movements of the hands and eyes. Consequently, operating an XR device through hand-eye interaction for a long time causes significant eye strain and arm fatigue for the user.
In addition, as described in the background section, the interactive interfaces of existing XR devices largely follow the design approach of mobile platforms such as mobile phones and tend to fold operation entries into interface elements to achieve an overall minimalist design style. While this design trend improves the visual experience, it increases operational complexity to some extent. For example, to "copy text from a reader into a browser for searching", the user goes through a series of actions: a) selecting the text by operating the cursor with the handle, b) triggering a menu with a long key press and selecting the copy command, c) selecting the browser address bar, d) triggering the menu again with a long key press and selecting the paste command, and e) clicking the search button. Even if eye-tracking capability is integrated, it mainly simplifies the "handle movement" operation; the interaction efficiency of the whole chain is still not optimized, which can accelerate the user's eye and hand fatigue.
Based on the above, in order to solve the problems in the prior art, embodiments of the present application provide an information interaction method, an apparatus, an electronic device, a readable storage medium, and a computer program product. The information interaction method can be applied to an extended reality scene, and can also be applied to the fields of digital intelligent popularization (such as advertisements, services and the like), natural intuitional interaction in the field of robots and the like, and is not limited herein.
The following describes in detail the information interaction method provided by the embodiments of the present application through specific embodiments and their application scenarios, with reference to the accompanying drawings.
Fig. 1 shows a flow chart of an information interaction method according to an embodiment of the present application. As shown in fig. 1, the information interaction method provided in the embodiment of the present application includes the following steps S110 to S140, and each step is explained in detail below.
S110, acquiring interaction behavior information of a user on an interactive interface of the extended reality device, where the interactive interface includes at least one display window, the interaction behavior information is interaction process information for the at least one display window, and the interaction behavior information includes a gaze duration for each of at least one display object in the display window that is gazed at.
Here, a plurality of applications may be installed on an Extended Reality (XR) device. A display window corresponding to an application may be displayed in the interactive interface. The display window may display text labels, text boxes, images, controls, and the like. A text box may be described using text near the user's viewpoint. An image may be described using a region of interest (Region of Interest, ROI) near the user's viewpoint. Controls may correspond to functions such as "confirm", "cancel", and "share".
From the time dimension, the interaction behavior information may include the interactions that occur between the user starting the XR device and the interaction behavior information being obtained. From the interaction type dimension, the interaction behavior information may include viewpoint information and target operation behaviors. The viewpoint information may include the user's gaze sequence and gaze durations on the display objects in the display window, and may belong to the "navigation operation" information mentioned above. A target operation behavior may include the user's selection operation on a control, and may belong to the "selection operation" mentioned above. From the information type dimension, the interaction behavior information may include a display object sequence, dwell duration information, and operation fragments.
Based on this, in order to improve accuracy of the following prediction of the interaction behavior based on the interaction behavior information, in some embodiments, S110 may specifically include:
Receiving a first input of a user to a display object in at least one display window;
Responding to the first input, acquiring a display object sequence corresponding to a plurality of display objects, a gazing duration for gazing each display object and a target operation behavior for operating at least one display window;
Determining dwell duration information of the user on each display object according to the gaze duration corresponding to each display object within a first preset duration;
Splitting a display object sequence into a plurality of operation fragments according to the target operation behavior;
and determining the interaction behavior information according to the display object sequence, the dwell duration information, and the operation fragments.
Here, the first input includes, but is not limited to, gaze input through the user's eyes, touch input on a display object through a touch device such as a finger or stylus, a voice instruction, a specific gesture, or other feasible inputs. The first input may be determined according to actual use requirements, which is not limited by the embodiments of the present application. The specific gesture may be any one of a single-tap gesture, a slide gesture, a drag gesture, a pressure-sensitive gesture, a long-press gesture, an area-change gesture, a double-press gesture, or a double-tap gesture; a click input may be a single click, a double click, or a click of any number of times, and may also be a long-press or short-press input. The first input may be, for example, the user's browsing input on a display object in the at least one display window.
The sequence of display objects may characterize an interaction sequence generated in accordance with an input order when a user makes a first input to the plurality of display objects. In addition, the target operation behavior of the user for operating the display window may specifically be a selection operation of a control in the display window by the user. The selection operation may include, for example, a user clicking on a "share" control, clicking on an "edit" control, clicking on a "confirm" control, and so forth.
As an example, a display object sequence may be generated by monitoring the user's interaction behavior. The sequence may be recorded, for example, as: M1-viewpoint leaves text label | M2-viewpoint enters image display area | M3-viewpoint hovers near the image (0.2, 0.4) | M4-key press triggers menu | M5-press the "share" button | M6-viewpoint leaves image area | M7-viewpoint enters browser search box | M8-key input of text | M9-key press triggers search. Here, M1-M4 and M6-M8 may belong to interface-layer "move" and "navigate" operations that do not involve background functions; that is, M1-M4 and M6-M8 may belong to the "navigation operations" mentioned above. In addition, M5 may be the operation "trigger the sharing function", and M9 may be the operation "trigger the search function"; M5 and M9 may belong to "selection operations" that involve background functions. That is, in the above example, the target operation behaviors may include M5 and M9.
As an example, if the display object sequence is long, it may be split into multiple segments so that the user's next interaction behavior, i.e. the user's operation intention, can be better predicted from the sequence later. Specifically, since a "navigation operation" is the precondition of a "selection operation" and the "selection operation" is the final purpose of the "navigation operation", the target operation behaviors can be used as separators to split the display object sequence into a plurality of operation fragments. An operation fragment may be the minimal operation sequence that moves and navigates to a "selection operation". The length of an operation fragment may reflect the complexity of the operation: the longer the fragment, the higher its complexity and the higher its importance for the subsequent prediction of interaction behavior; conversely, the shorter the fragment, the lower its complexity and importance. Based on the above example, the operation fragments may include M1-M5 and M6-M9.
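The splitting described above can be sketched in a few lines of Python. The helper name, the event labels, and the list-of-lists return format are illustrative assumptions rather than details from this application.

```python
# Sketch (assumed details): split a monitored display-object sequence into
# operation fragments, using the target operation behaviors ("selection
# operations" such as M5 and M9) as separators, as described above.

def split_into_fragments(sequence, target_operations):
    """Each fragment is the minimal run of navigation events that ends in
    one selection operation; fragment length reflects operation complexity."""
    fragments, current = [], []
    for event in sequence:
        current.append(event)
        if event in target_operations:  # a background-function trigger closes a fragment
            fragments.append(current)
            current = []
    if current:  # trailing navigation not yet followed by a selection
        fragments.append(current)
    return fragments

sequence = ["M1", "M2", "M3", "M4", "M5", "M6", "M7", "M8", "M9"]
print(split_into_fragments(sequence, {"M5", "M9"}))
# [['M1', 'M2', 'M3', 'M4', 'M5'], ['M6', 'M7', 'M8', 'M9']]
```

A longer fragment, such as M1-M5 above, would then be weighted more heavily in the subsequent prediction than a shorter one.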
In addition, the first preset duration may be a preset time window, for example 10 minutes. The dwell duration information may represent the proportion of the first preset duration occupied by the gaze duration of each display object, or the proportional relationship among the gaze durations of multiple display objects. The dwell duration information may be represented as a sliding histogram, and may reflect the user's attention to the display window.
As an example, by monitoring the user's interaction behavior, the user's gaze duration on each display object may be obtained. Statistical analysis of these gaze durations within the first preset duration yields the user's dwell duration information for each display object.
As a more specific example, the gaze point is sampled at a preset frequency (e.g., 10 Hz) over the first preset duration (e.g., 10 minutes), and the proportion of samples in which the gaze point falls on each display window is counted, thereby obtaining the sliding histogram.
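As a rough sketch of this statistic (the window identifiers and the sample list are invented for illustration; real gaze samples would come from the device's eye tracker):

```python
# Count, per display window, the fraction of gaze-point samples (taken at a
# fixed frequency, e.g. 10 Hz) that fell on it within the time window; these
# per-window duty ratios form the bins of the sliding histogram described above.
from collections import Counter

def gaze_duty_ratios(samples):
    counts = Counter(samples)
    total = len(samples)
    return {window: n / total for window, n in counts.items()}

# 10 samples ~ 1 s at 10 Hz: the "reader" window held 60% of the user's gaze
samples = ["reader"] * 6 + ["browser"] * 4
print(gaze_duty_ratios(samples))  # {'reader': 0.6, 'browser': 0.4}
```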
In this way, the interaction behavior information is determined from the longer display object sequence, the shorter operation fragments, and the dwell duration information obtained by statistically analyzing each display object, making the interaction behavior information richer and improving the accuracy of the subsequent interaction behavior prediction based on it.
S120, when the interaction behavior information characterizes that the gaze duration reaches the preset gaze duration, acquiring the spatial attribute information of the display window on the interactive interface, the semantic information of the display window, and the user's usage preference information for the extended reality device, where the semantic information is determined according to the type information and state information of the display window, and the spatial attribute information is determined according to the size information and position information of the display window.
Here, if the user's gaze duration on a certain display object reaches the preset gaze duration, the user's next interaction behavior can be predicted based on the interaction behavior information generated by the interaction between the user and the extended reality device.
In order to improve the accuracy of predicting the user's next interaction behavior, before the prediction, the spatial attribute information of the display window on the interactive interface, the semantic information of the display window, and the user's usage preference information for the XR device can be obtained, and the interaction behavior can be predicted from this multi-modal information: the interaction behavior information, the spatial attribute information, the semantic information, and the usage preference information. In the time dimension, the spatial attribute information and the semantic information can be regarded as instantaneous information, the interaction behavior information as short-period information, and the usage preference information as long-period information.
The spatial attribute information of the display window on the interactive interface may include size information and position information of the display window on the interactive interface.
Based on this, in order to improve the prediction efficiency of predicting the interaction behavior, in some embodiments, the obtaining the spatial attribute information of the display window on the interaction interface may specifically include:
Acquiring first size information and anchor point information of a display window on an interactive interface;
mapping the first size information into a target coordinate system to obtain second size information;
projecting the anchor point corresponding to the anchor point information onto a unit sphere to obtain anchor point pose information;
and determining the spatial attribute information according to the second size information and the anchor point pose information.
Here, the first size information may be actual size information of the display window at the interactive interface. The second size information may be size information obtained after mapping the first size information to the target coordinate system. The second size information and the first size information may be the same or different. The target coordinate system may be, for example, a normalized device coordinate system (Normalized Device Coordinates, NDC). The first size information of the different display windows at the interactive interface may be in different data dimensions. Therefore, in order to improve the standardization of the subsequent data processing procedure, the first size information may be mapped into the target coordinate system, that is, the first size information may be subjected to the standardization process to obtain the second size information. In addition, the larger the size of the display window, the higher the user's attention to the display window may be. The smaller the size of the display window, the lower the user's attention to the display window may be.
In addition, the anchor information may be position information of an anchor of the display window. The unit sphere may be a sphere with a radius of 1. Anchor point pose information may be expressed in terms of quaternions. The anchor point position information is obtained by uniformly projecting the anchor points to the unit sphere, so that the position information of the anchor points can be subjected to standardized processing, and the standardization of the subsequent data processing process is improved. If the anchor point pose information indicates that the anchor point is positioned at the front main view angle position of the unit sphere, the higher the importance degree of the display content in the display window corresponding to the anchor point can be. If the anchor point pose information indicates that the anchor point is located at the edge position of the unit sphere, the importance level of the display content in the display window corresponding to the anchor point can be lower.
In this way, by determining the spatial attribute information according to the second size information and the anchor point pose information after the normalization processing, the normalization of the spatial attribute information can be improved, and further the prediction efficiency of the subsequent prediction of the interaction behavior can be improved.
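A minimal sketch of the two normalization steps follows, assuming a simple linear size mapping and a sphere centered at the origin; the viewport dimensions and coordinate conventions are assumptions not specified at this level of the description, and the full pose would additionally carry the quaternion orientation mentioned above, omitted here.

```python
import math

def size_to_ndc(width_px, height_px, viewport_w, viewport_h):
    # First size information -> second size information: scale the window's
    # actual size into a normalized [0, 1] range, NDC-style.
    return (width_px / viewport_w, height_px / viewport_h)

def project_to_unit_sphere(x, y, z):
    # Project the anchor point onto the sphere of radius 1 around the origin,
    # standardizing anchor positions across display windows.
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0:
        raise ValueError("anchor at the origin has no direction")
    return (x / norm, y / norm, z / norm)

print(size_to_ndc(960, 540, 1920, 1080))      # (0.5, 0.5)
print(project_to_unit_sphere(3.0, 0.0, 4.0))  # (0.6, 0.0, 0.8)
```

An anchor that projects near the forward viewing direction of the sphere would then be treated as more important than one near the edge, as described above.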
In addition, semantic information of the display window may be determined according to type information and state information of the display window. Wherein the type information may be determined according to the type of the display content in the display window. The type information may be expressed in terms of phrases. The type information may include dynamic types such as games, videos, etc., and static types such as pages, text, etc. In performing the prediction of the interaction behavior, the importance of the dynamic type may be higher than the importance of the static type. In addition, the status information may include whether the display window is in an active state or the display window is in an inactive state. That is, the display windows may include a first display window whose state information is in an active state and at least one second display window whose state information is in an inactive state.
Based on this, in order to improve the accuracy of the subsequent interaction behavior prediction, in some embodiments, determining the semantic information of the display window may specifically include:
Acquiring type information and state information of a display window, wherein the state information is used for indicating whether the display window is activated or not;
performing numerical processing on the type information under the condition that the type information is text information, and obtaining a numerical result corresponding to the type information;
Acquiring an activation weight corresponding to the state information;
And determining semantic information according to the numerical result and the activation weight.
Here, the numerical processing of the type information may map text information to numerical information to obtain the numerical result. For example, "game" may be mapped to "10" and "page" may be mapped to "5", where the magnitude of the value may represent the importance of the type information: the larger the value, the higher the importance of the type information in the subsequent interaction behavior prediction.
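A hypothetical sketch of this numerical processing, combined with the activation weight obtained in the later steps; the mapping values beyond "game" → 10 and "page" → 5 are assumptions for illustration only:

```python
# Illustrative mapping; dynamic types (game, video) rank above static ones.
TYPE_IMPORTANCE = {"game": 10, "video": 8, "page": 5, "text": 4}

def semantic_score(type_info, activation_weight):
    """Numerical result for the window type, scaled by the activation weight."""
    return TYPE_IMPORTANCE.get(type_info, 0) * activation_weight
```

Under this sketch, an active "game" window (weight 1) scores 10, while a "page" window with activation weight 0.375 scores 1.875.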
In addition, if the display window is in an activated state, the activation weight of the display window may be 1, and if the display window is in a deactivated state, the activation weight of the display window may be 0. In practice, however, even if the display window is in an inactive state, its data still contributes to the prediction of interaction behavior.
Therefore, in order to improve accuracy of the following prediction of the interaction behavior, in some embodiments, the acquiring the activation weight corresponding to the state information may specifically include:
Acquiring a window distance between the first display window and each second display window to obtain at least one window distance;
and determining the corresponding activation weight of each display window according to at least one window distance.
Typically, the first display window is a user-selected display window. The second display window may be a display window that is not selected by the user. The display window not selected by the user may be a "foreground" window or a "background" window. Wherein the "foreground" window may be closer to the first display window. The "background" window may be farther from the first display window.
As an example, if the interactive interface includes a first display window and two second display windows a and B, and the window distance between the first display window and the second display window a is 3, and the window distance between the first display window and the second display window B is 5, the activation weight corresponding to the second display window a may be 3/8, and the activation weight corresponding to the second display window B may be 5/8. In addition, the activation weight corresponding to the first display window may be 1. Wherein, the higher the activation weight may indicate the higher the user's attention to the display window.
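Read literally, the example computes each second display window's weight as its window distance divided by the sum of the window distances. A sketch of that reading (the general formula with the hyperparameter λ mentioned later in the text may differ):

```python
def example_activation_weights(distances):
    """Activation weight of each second display window as in the example:
    its window distance divided by the sum of all window distances."""
    total = sum(distances)
    return [d / total for d in distances]

# Distances 3 and 5 give weights 3/8 and 5/8, matching the example.
```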
According to the embodiment of the application, the contribution degree of the second display window to the interactive behavior prediction can be reserved by determining the activation weight according to the window distance, so that the accuracy of the subsequent interactive behavior prediction can be improved.
Based on this, in order to improve the accuracy of the window distance and further improve the accuracy of the subsequent interaction behavior prediction, in some embodiments, obtaining the window distance between the first display window and each second display window may specifically include:
acquiring a first projection of an anchor point of the first display window on a unit sphere and a second projection of an anchor point of each second display window on the unit sphere;
The spherical distance between the first projection and each of the second projections is determined as the window distance.
As shown in fig. 2, taking the example that the interactive interface includes four display windows, 200 may represent the sphere of the unit sphere, 201 may represent the first projection, 202 may represent a second projection, and 203 may represent the window distance d_i, where i may take values from 0 to N, and N may represent the number of second display windows. d_0 may be 0, representing the spherical distance between the first projection and itself, and d_1, d_2, d_3 may represent the spherical distances between the first projection and the three second projections, respectively.
By determining the window distance according to the spherical distance between the projections of the anchor points of the display window on the unit sphere, the accuracy of the window distance can be improved, and the accuracy of the subsequent interactive behavior prediction is further improved.
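The window distance computation above can be sketched as follows, assuming anchor positions are given as 3-D points; on a radius-1 sphere the great-circle distance reduces to the angle between the projected points:

```python
import math

def project_to_unit_sphere(point):
    """First/second projection: scale an anchor position onto the unit sphere."""
    norm = math.sqrt(sum(c * c for c in point))
    return tuple(c / norm for c in point)

def window_distance(anchor_a, anchor_b):
    """Spherical (great-circle) distance between the projections of two
    display-window anchors on the unit sphere."""
    u, v = project_to_unit_sphere(anchor_a), project_to_unit_sphere(anchor_b)
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    return math.acos(dot)

# d_0, the distance from the first projection to itself, is 0.
```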
In addition, after determining the at least one window distance, the activation weight may be calculated as described in the above example, or by a formula over the window distances in which w_i is the activation weight and λ is a settable hyperparameter, reserved for developers to make customized adjustments so as to control the overall weighting. Determining the activation weight through such a formula can improve the scientific soundness of the activation weight, and further improve the accuracy of the subsequent interaction behavior prediction.
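Since the formula itself is not reproduced here, the following is only one plausible form consistent with the surrounding description — a normalized weighting in which λ controls how sharply weight decays with distance, so nearer windows receive larger weights. This is an assumption, not the patent's actual formula:

```python
import math

def activation_weights(distances, lam=1.0):
    """Softmax over -lam * d_i: weights sum to 1, nearer windows weigh more,
    and lam controls how sharply weight decays with distance."""
    exps = [math.exp(-lam * d) for d in distances]
    total = sum(exps)
    return [e / total for e in exps]
```

With λ = 0 every window receives equal weight; larger λ concentrates weight on the first display window (for which d_0 = 0).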
In conclusion, semantic information is determined according to the numerical result and the activation weight, and further interaction behavior prediction is performed based on the semantic information, so that the accuracy of interaction behavior prediction can be improved.
Further, the user's usage preference information for the XR device may include user's usage preference information for a plurality of applications in the XR device. The usage preference information may indicate that the user prefers to use a certain application in the XR device, and a period of time that the user prefers to use a certain application.
Based on this, in order to further improve accuracy of the interaction behavior prediction, in some embodiments, the obtaining the usage preference information of the user for the augmented reality device may specifically include:
Acquiring the use proportion information of a user to a plurality of applications in the augmented reality equipment and the use frequency information of the user to a target application in a plurality of time periods within a second preset time period, wherein the second preset time period comprises the time periods, and the target application is any one of the applications;
the usage preference information is determined based on the usage proportion information and the usage frequency information.
Here, the usage proportion information may indicate a proportion of the usage period of the application by the user to the second preset period. The usage frequency information may represent how frequently the user uses the target application within the target time period. The usage proportion information and the usage frequency information may be regarded as a priori knowledge under long period (i.e. second preset duration) statistics. The second preset time period may be, for example, two weeks to one month. Events such as starting, switching and stopping running of the application can be recorded through an operating system of the augmented reality device, and then the using time information of the application in a second preset duration is tracked. The usage proportion information of the plurality of applications can be determined by performing statistical analysis on the usage time information of the plurality of applications within the second preset time period. The larger the use proportion of the user to the application in the second preset time period, the larger the use requirement of the user to the application can be indicated. The smaller the usage proportion of the user to the application within the second preset time period, the smaller the usage requirement of the user to the application can be indicated.
In addition, considering that the daily activities of most users have a certain regularity, the second preset duration may be divided into a plurality of time periods. The user's application usage habits can then be obtained by counting the frequency with which the user uses an application in each time period. The higher the frequency of use of an application within a certain time period, the more important that application is within that time period.
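A sketch of deriving both statistics from the application start/stop events recorded by the operating system; the event format `(app, start_hour, end_hour)` and the hour-of-day period granularity are illustrative assumptions:

```python
from collections import defaultdict

def usage_statistics(events):
    """From (app, start_hour, end_hour) records over the second preset
    duration, compute each app's usage proportion and per-period launch
    frequency (one period = one hour of the day here)."""
    total = 0.0
    time_per_app = defaultdict(float)
    launches = defaultdict(int)  # (app, period) -> launch count
    for app, start, end in events:
        span = end - start
        total += span
        time_per_app[app] += span
        launches[(app, int(start) % 24)] += 1
    proportion = {app: t / total for app, t in time_per_app.items()}
    return proportion, dict(launches)
```

The proportion reflects the usage requirement, and the per-period counts reflect the usage habit described above.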
Therefore, the use preference information is determined according to the use proportion information and the use frequency information, so that the use requirement and the use habit of the user on the XR equipment can be fully considered in the interactive behavior prediction, and the accuracy of the interactive behavior prediction is further improved.
S130, determining target interaction information according to the interaction behavior information, the spatial attribute information, the semantic information and the use preference information, wherein the target interaction information is associated with the operation intention of the user.
The target interaction information may characterize the user's next interaction behavior. The target interaction information may include shortcut interaction controls and/or interaction result information for performing a next operation. The shortcut interaction control and the interaction result information can be at least one. The shortcut interaction control may be, for example, a "share" control, a "screen capture" control, etc. The interaction result information may be, for example, explanatory information, price list information of the commodity, or the like.
Based on this, in order to improve accuracy of interaction behavior prediction, in some embodiments, determining the target interaction information according to the interaction behavior information, the spatial attribute information, the semantic information, and the usage preference information may specifically include:
The interactive behavior information, the spatial attribute information, the semantic information and the use preference information are input into a multi-modal large model, and the multi-modal large model is utilized to determine target interactive information.
The multimodal big model may be an artificial intelligence big model capable of handling multimodal information. The multimodal information may include interaction behavior information, spatial attribute information, semantic information, and usage preference information. The multimodal big model may be, for example, a multimodal Transformer model.
In some embodiments, the above-mentioned interactive behavior information, spatial attribute information, semantic information and usage preference information are input into a multi-modal large model, and the target interactive information is determined by using the multi-modal large model, which may specifically include:
Inputting interaction behavior information, spatial attribute information, semantic information and usage preference information into a multi-modal large model, and determining an association relationship between first modal information and second modal information for each piece of first modal information based on a cross-attention mechanism, wherein the first modal information is any one of the interaction behavior information, the spatial attribute information, the semantic information and the usage preference information, and the second modal information is any one of the interaction behavior information, the spatial attribute information, the semantic information and the usage preference information except the first modal information;
Based on a self-attention mechanism, determining an internal dependency relationship of the first modality information based on the association relationship for each first modality information;
And determining target interaction information according to the association relation and the internal dependency relation corresponding to each piece of first modality information.
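The cross-attention and self-attention steps above can be illustrated with a toy scaled dot-product attention over plain lists: with queries taken from one modality and keys/values from another it acts as cross-attention, and with all three from the same modality as self-attention. This is a didactic sketch, not the patent's model:

```python
import math

def _softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over toy feature vectors (lists of floats)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = _softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

When every key scores equally, each output is simply the average of the values — i.e., no position in the other modality is singled out.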
For example, in the case where the first modality information is spatial attribute information and the second modality information is semantic information, a schematic diagram for determining an association relationship between the first modality information and the second modality information based on a cross-attention mechanism using a multi-modality large model may be shown in fig. 3.
For example, in the case that the first modality information is spatial attribute information and the second modality information is interactive behavior information, a schematic diagram for determining an association relationship between the first modality information and the second modality information based on a cross-attention mechanism using a multi-modality large model may be shown in fig. 4.
For example, if the spatial attribute information is denoted as C1, the semantic information is denoted as C2, the interaction behavior information is denoted as C3, and the usage preference information is denoted as C4, a schematic diagram for determining the target interaction information by using the multi-modal large model according to the embodiment of the present application may be shown in fig. 5.
According to the method and the device for predicting the interactive behavior, the accuracy of the interactive behavior prediction can be improved by determining the target interactive information through the multi-mode large model.
And S140, displaying the target interaction information on the interaction interface.
The target interaction information may be displayed in a floating window on the interaction interface. The target interaction information displayed in the floating window may be dynamically adjusted as the user interacts with the XR device.
As an example, when the user reads text, an explanation corresponding to the phrase the user is focusing on can be displayed; when the user browses an e-commerce catalogue, a commodity price list can be displayed; and when the user's viewpoint first moves from the text display area to the video playing area and then focuses on a certain animal in the video content for a period of time, a screen capture control and a text introduction of the animal can be displayed simultaneously.
If the target interaction information is the shortcut interaction control, the user can continue to interact by directly clicking the shortcut interaction control on the interaction interface without finding the interaction control for interaction through layer-by-layer operation, so that the user operation can be simplified. If the target interaction information is interaction result information, the user can directly obtain the interaction result, and the user operation is further simplified.
In the embodiment of the application, because the target interaction information is associated with the operation intention of the user, the interaction result can be displayed directly. Specifically, the interaction behavior information generated in the interaction interface of the augmented reality device is acquired; in the case that the interaction behavior information characterizes that the user's gazing duration on a certain display object in the display window reaches the preset gazing duration, the target interaction information is determined according to the interaction behavior information, the spatial attribute information of the display window in the interaction interface, the semantic information, and the user's usage preference information for the augmented reality device; and the target interaction information is displayed on the interaction interface. That is, the user's next operation intention is predicted in the process of interacting with the augmented reality device, and the target interaction information associated with that operation intention is displayed, or the user can interact with the augmented reality device based on the target interaction information, thereby simplifying the user's interaction operations, improving interaction efficiency, and relieving the user's hand and eye fatigue.
In order to better describe the whole solution, some specific examples are given based on the above embodiments.
First, the multimodal large model may be deployed in an XR device or in the cloud, without limitation. Generally, since the multi-mode large model is large in scale, in order to ensure normal operation of the XR device, the multi-mode large model can be deployed at the cloud, and the scheme in the embodiment of the application is executed by adopting a system architecture of 'end+cloud'. In the case of a multimodal large model deployed in the cloud, a schematic diagram of a system architecture in an embodiment of the present application may be shown in fig. 6.
In addition, as shown in FIG. 7, 700 may represent a user and 701 may represent an interactive interface of the XR device. 702 may represent the movement trajectory of the user's viewpoint (in this example, the viewpoint first moves from the text display area to the video playing area, then gazes at a pet in the video content for a period of time, and finally moves to the browser search box). 703 may represent the prediction of the multimodal big model, which guesses that the user's first intent may be to learn about the breed, habits, etc. of the pet, and thus displays some introductory content within the floating window. 704 may represent the model inferring that the user's second intent is a screenshot, and thus additionally displaying a "screenshot" control. 705 may represent the user's current viewpoint staying within the search box, so the search keyword is predicted, that is, recommended phrases are displayed. 706 and 707 may represent the shortcut interaction controls ranked 2 and 3, corresponding to a "share" control and a "search" control, respectively.
It should be noted that, when predicting in this example, the interaction behavior information may include first sub-interaction behavior information from the current power-on of the XR device until the viewpoint moves into the text display area, and second sub-interaction behavior information from when the viewpoint moves from the text display area to the video playing area, then gazes at the pet in the video content for a period of time, and finally moves to the browser search box.
In this way, the interaction behavior information, semantic information, spatial attribute information, usage preference information, and other information are perceived through the XR device, input into the multimodal big model, and used to determine the target interaction information, which is then dynamically displayed on the interaction interface of the XR device. This helps enrich the information expression of the XR interface in combination with the user's potential real-time demands, improve interaction efficiency, and relieve the user's eye and hand fatigue.
According to the information interaction method provided by the embodiment of the application, the execution subject can be an information interaction device. In the embodiment of the application, a method for executing information interaction by an information interaction device is taken as an example, and the information interaction device provided by the embodiment of the application is described.
Fig. 8 shows a schematic structural diagram of an information interaction device according to an embodiment of the present application. As shown in fig. 8, an information interaction device 800 provided in an embodiment of the present application may include:
The first obtaining module 801 is configured to obtain interaction behavior information generated in an interaction interface of the augmented reality device, where the interaction interface includes at least one display window, the interaction behavior information is interaction process information of the at least one display window, and the interaction behavior information includes gazing duration of each of at least one display object in the display window;
A second obtaining module 802, configured to obtain, when the gaze duration represented by the interaction behavior information reaches a preset gaze duration, spatial attribute information of the display window on the interaction interface, semantic information of the display window, and preference information of the user for use of the augmented reality device, where the semantic information is determined according to type information and state information of the display window, and the spatial attribute information is determined according to size information and position information of the display window;
A determining module 803, configured to determine target interaction information according to the interaction behavior information, the spatial attribute information, the semantic information, and the usage preference information, where the target interaction information is associated with an operation intention of the user;
and the display module 804 is configured to display the target interaction information on the interaction interface.
The information interaction device 800 is described in detail below, and is specifically as follows:
In some embodiments, the first obtaining module 801 may specifically include:
A receiving unit for receiving a first input of a user to a display object in at least one display window;
a first obtaining unit configured to obtain, in response to a first input, a display object sequence corresponding to a plurality of display objects, a gazing duration for gazing at each display object, and a target operation behavior for operating at least one display window;
The first determining unit is used for determining stay time information of a user on each display object according to the gazing time length corresponding to each display object in the first preset time length;
The splitting unit is used for splitting the display object sequence into a plurality of operation fragments according to the target operation behavior;
and the second determining unit is used for determining interaction behavior information according to the display object sequence, the stay time information and the operation fragment.
In some embodiments, the second obtaining module 802 may specifically further include:
the second acquisition unit is used for acquiring type information and state information of the display window, wherein the state information is used for indicating whether the display window is activated or not;
the processing unit is used for carrying out numerical processing on the type information under the condition that the type information is text information to obtain a numerical result corresponding to the type information;
a third acquisition unit, configured to acquire an activation weight corresponding to the state information;
and the third determining unit is used for determining semantic information according to the numerical result and the activation weight.
In some of these embodiments, the display windows include a first display window with state information in an active state and at least one second display window with state information in an inactive state. Based on this, the second acquisition module 802 may specifically further include:
a fourth obtaining unit, configured to obtain a window distance between the first display window and each second display window, to obtain at least one window distance;
And the fourth determining unit is used for determining the corresponding activation weight of each display window according to at least one window distance.
In some embodiments, the fourth obtaining unit may specifically include:
The acquisition subunit is used for acquiring a first projection of an anchor point of the first display window on the unit sphere and a second projection of the second display window on the unit sphere;
A first determination subunit for determining a spherical distance between the first projection and each of the second projections as a window distance.
In some embodiments, the second obtaining module 802 may specifically further include:
a fifth obtaining unit, configured to obtain usage proportion information of a user on a plurality of applications in the augmented reality device in a second preset duration and usage frequency information of the user on a target application in a plurality of time periods, where the second preset duration includes a plurality of time periods, and the target application is any one of the plurality of applications;
and a fifth determining unit for determining the usage preference information based on the usage proportion information and the usage frequency information.
In the embodiment of the application, because the target interaction information is associated with the operation intention of the user, the interaction result can be displayed directly. Specifically, the interaction behavior information generated in the interaction interface of the augmented reality device is acquired; in the case that the interaction behavior information characterizes that the user's gazing duration on a certain display object in the display window reaches the preset gazing duration, the target interaction information is determined according to the interaction behavior information, the spatial attribute information of the display window in the interaction interface, the semantic information, and the user's usage preference information for the augmented reality device; and the target interaction information is displayed on the interaction interface. That is, the user's next operation intention is predicted in the process of interacting with the augmented reality device, and the target interaction information associated with that operation intention is displayed, or the user can interact with the augmented reality device based on the target interaction information, thereby simplifying the user's interaction operations, improving interaction efficiency, and relieving the user's hand and eye fatigue.
The information interaction device in the embodiment of the application may be an electronic device or a component in an electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal, or may be a device other than a terminal. The electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted electronic device, a mobile internet device (Mobile Internet Device, MID), an augmented reality (AR)/virtual reality (VR) device, a robot, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA), and may also be a server, a network attached storage (Network Attached Storage, NAS), a personal computer (PC), a television (TV), a teller machine, a self-service machine, etc., which is not specifically limited in the embodiments of the present application.
The information interaction device in the embodiment of the application can be a device with an operating system. The operating system may be an Android operating system, an ios operating system, or other possible operating systems, and the embodiment of the present application is not limited specifically.
The information interaction device provided by the embodiment of the present application can implement each process implemented by the embodiments of the methods of fig. 1 to 7, and in order to avoid repetition, a detailed description is omitted here.
Optionally, as shown in fig. 9, the embodiment of the present application further provides an electronic device 900, which includes a processor 901 and a memory 902, where a program or an instruction capable of running on the processor 901 is stored in the memory 902, and the program or the instruction implements each step of the above-mentioned embodiment of the information interaction method when being executed by the processor 901, and the steps can achieve the same technical effect, so that repetition is avoided, and no further description is given here.
The electronic device in the embodiment of the application includes the mobile electronic device and the non-mobile electronic device.
Fig. 10 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 1000 includes, but is not limited to, a radio frequency unit 1001, a network module 1002, an audio output unit 1003, an input unit 1004, a sensor 1005, a display unit 1006, a user input unit 1007, an interface unit 1008, a memory 1009, and a processor 1010.
Those skilled in the art will appreciate that the electronic device 1000 may also include a power source (e.g., a battery) for powering the various components, which may be logically connected to the processor 1010 by a power management system to perform functions such as managing charge, discharge, and power consumption by the power management system. The electronic device structure shown in fig. 10 does not constitute a limitation of the electronic device, and the electronic device may include more or less components than shown, or may combine certain components, or may be arranged in different components, which are not described in detail herein.
The processor 1010 is configured to: acquire interaction behavior information generated in an interaction interface of the augmented reality device, where the interaction interface includes at least one display window, the interaction behavior information is interaction process information of the at least one display window, and the interaction behavior information includes a gazing duration for each of at least one display object in the display window; and, in the case that the gazing duration represented by the interaction behavior information reaches a preset gazing duration, acquire spatial attribute information of the display window on the interaction interface, semantic information of the display window, and the user's usage preference information for the augmented reality device, where the semantic information is determined according to type information and state information of the display window, and the spatial attribute information is determined according to size information and position information of the display window;
and a display unit 1006, configured to display the target interaction information on the interaction interface.
According to the embodiment of the application, the next operation intention of the user is predicted in the process of interaction between the user and the augmented reality equipment, and the target interaction information related to the operation intention is displayed, so that the interaction result can be directly displayed, or the user can interact with the augmented reality equipment based on the target interaction information, the interaction operation of the user is simplified, the interaction efficiency is improved, and the hand and eye fatigue of the user is relieved.
Optionally, the user input unit 1007 is configured to receive a first input from the user on a display object in the at least one display window;
The processor 1010 is further configured to:
in response to the first input, acquire a display object sequence corresponding to a plurality of display objects, a gaze duration of gazing at each display object, and a target operation behavior of operating the at least one display window;
determine stay duration information of the user on each display object according to the gaze duration corresponding to each display object within a first preset duration, split the display object sequence into a plurality of operation segments according to the target operation behavior, and determine the interaction behavior information according to the display object sequence, the stay duration information, and the operation segments.
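The construction of the interaction behavior information from a gaze log can be sketched as follows. This is a minimal illustration: the record format, the object names, and the rule "a window-level operation closes the current segment" are assumptions of this example, not details fixed by the embodiment.

```python
# Sketch: derive interaction-behavior information (object sequence, stay
# durations, operation segments) from a gaze log. All data shapes here are
# illustrative assumptions.

def build_interaction_info(gaze_log, window_ops):
    """gaze_log: list of (object_id, gaze_seconds) in viewing order.
    window_ops: indices in gaze_log at which a target operation was
    performed on a display window."""
    sequence = [obj for obj, _ in gaze_log]          # display-object sequence
    dwell = {}                                       # per-object stay duration
    for obj, t in gaze_log:
        dwell[obj] = dwell.get(obj, 0.0) + t
    segments, current = [], []
    for i, obj in enumerate(sequence):               # split at each operation
        current.append(obj)
        if i in window_ops:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return {"sequence": sequence, "dwell": dwell, "segments": segments}

info = build_interaction_info(
    [("icon_a", 0.8), ("doc_b", 2.1), ("icon_a", 0.5), ("menu_c", 1.2)],
    window_ops={1},
)
```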
Optionally, the processor 1010 is further configured to:
acquiring type information and state information of the display window, where the state information indicates whether the display window is activated;
performing numerical processing on the type information under the condition that the type information is text information, and obtaining a numerical result corresponding to the type information;
acquiring an activation weight corresponding to the state information; and determining the semantic information according to the numerical result and the activation weight.
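One way to picture the numericization step is below. The window-type vocabulary, the one-hot encoding, and the example weight are all assumptions of this sketch; the embodiment only states that textual type information is numericized and combined with an activation weight.

```python
# Sketch: numericize a window's textual type information and scale it by
# the window's activation weight. Vocabulary and encoding are illustrative
# assumptions.

WINDOW_TYPES = ["browser", "video", "notes", "settings"]  # hypothetical

def semantic_vector(type_text, activation_weight):
    # One-hot encode the type text, then weight by activation state.
    one_hot = [1.0 if t == type_text else 0.0 for t in WINDOW_TYPES]
    return [activation_weight * v for v in one_hot]

vec = semantic_vector("video", activation_weight=0.9)
```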
Optionally, the display windows include a first display window whose state information indicates an active state and at least one second display window whose state information indicates an inactive state. Based on this, the processor 1010 is further configured to:
Acquiring a window distance between the first display window and each second display window to obtain at least one window distance;
and determining the activation weight corresponding to each display window according to the at least one window distance.
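A concrete mapping from window distances to activation weights might look like the following. The softmax-over-negative-distance rule (nearer inactive windows receive larger weights) is an illustrative assumption; the embodiment only states that the weights are determined from the window distances.

```python
import math

# Sketch: turn window distances into normalized activation weights.
# The exp(-distance) scoring is an illustrative choice.

def activation_weights(distances):
    scores = [math.exp(-d) for d in distances]
    total = sum(scores)
    return [s / total for s in scores]

w = activation_weights([0.2, 1.5, 3.0])  # nearer windows weigh more
```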
Optionally, the processor 1010 is further configured to:
acquiring a first projection of an anchor point of the first display window on a unit sphere and a second projection of each second display window on the unit sphere;
The spherical distance between the first projection and each of the second projections is determined as the window distance.
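The projection and spherical-distance computation can be made concrete as follows. The choice of the sphere's center (here the origin, e.g., the user's viewpoint) is an assumption of this sketch; on a unit sphere, the great-circle distance between two projections equals the angle between the corresponding unit vectors.

```python
import math

# Sketch: project window anchor points onto a unit sphere and compute the
# spherical (great-circle) distance between the projections. The sphere
# center is an illustrative assumption.

def project_to_unit_sphere(p, center=(0.0, 0.0, 0.0)):
    v = [p[i] - center[i] for i in range(3)]
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def spherical_distance(u, v):
    # Angle between unit vectors = arc length on the unit sphere.
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    return math.acos(dot)

p1 = project_to_unit_sphere((2.0, 0.0, 0.0))
p2 = project_to_unit_sphere((0.0, 3.0, 0.0))
d = spherical_distance(p1, p2)  # orthogonal anchors -> pi / 2
```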
Optionally, the processor 1010 is further configured to:
acquiring usage proportion information of the user for a plurality of applications in the extended reality device within a second preset duration, and usage frequency information of the user for a target application within a plurality of time periods, where the second preset duration includes the plurality of time periods and the target application is any one of the plurality of applications;
the usage preference information is determined based on the usage proportion information and the usage frequency information.
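The two inputs can be combined into a simple preference record as sketched below. The data shapes, the normalization of use times into ratios, and the "peak time slot" summary are assumptions of this example; the embodiment does not prescribe a particular combination rule.

```python
# Sketch: build usage-preference information from per-app usage proportions
# and per-time-slot usage frequency of a target app. Shapes are illustrative.

def usage_preference(app_seconds, target_counts_per_slot):
    """app_seconds: {app: total use time within the second preset duration}.
    target_counts_per_slot: launches of the target app in each time period."""
    total = sum(app_seconds.values())
    ratios = {app: t / total for app, t in app_seconds.items()}
    peak_slot = max(range(len(target_counts_per_slot)),
                    key=lambda i: target_counts_per_slot[i])
    return {"ratios": ratios, "peak_slot": peak_slot}

pref = usage_preference({"browser": 600, "video": 1400},
                        target_counts_per_slot=[1, 5, 2])
```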
By combining the user's potential needs in real time, this embodiment of the application helps enrich the information expression of the XR interface, improve interaction efficiency, and relieve the user's eye and hand fatigue.
It should be appreciated that in embodiments of the present application, the input unit 1004 may include a graphics processor (Graphics Processing Unit, GPU) 10041 and a microphone 10042, where the graphics processor 10041 processes image data of still pictures or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The display unit 1006 may include a display panel 10061, and the display panel 10061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 1007 includes at least one of a touch panel 10071 and other input devices 10072. The touch panel 10071 is also referred to as a touch screen. The touch panel 10071 can include two portions, a touch detection device and a touch controller. Other input devices 10072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and so forth, which are not described in detail herein.
The memory 1009 may be used to store software programs as well as various data. The memory 1009 may mainly include a first storage area storing programs or instructions and a second storage area storing data, where the first storage area may store an operating system and the application programs or instructions required for at least one function (such as a sound playing function or an image playing function). Further, the memory 1009 may include volatile memory or non-volatile memory, or both. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), a Static RAM (SRAM), a Dynamic RAM (DRAM), a Synchronous DRAM (SDRAM), a Double Data Rate SDRAM (DDR SDRAM), an Enhanced SDRAM (ESDRAM), a Synch-Link DRAM (SLDRAM), or a Direct Rambus RAM (DRRAM). The memory 1009 in the embodiments of the application includes, but is not limited to, these and any other suitable types of memory.
The processor 1010 may include one or more processing units. Optionally, the processor 1010 integrates an application processor and a modem processor, where the application processor mainly handles operations involving the operating system, the user interface, application programs, and the like, and the modem processor, such as a baseband processor, mainly handles wireless communication signals. It will be appreciated that the modem processor may alternatively not be integrated into the processor 1010.
An embodiment of the application further provides a readable storage medium storing a program or instructions which, when executed by a processor, implement the processes of the above information interaction method embodiments and achieve the same technical effects; to avoid repetition, details are not described here again.
The processor is the processor in the electronic device described in the above embodiments. The readable storage medium includes a computer-readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the application further provides a chip, including a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to run a program or instructions to implement the processes of the above information interaction method embodiments and achieve the same technical effects; to avoid repetition, details are not described here again.
It should be understood that the chip referred to in the embodiments of the present application may also be referred to as a system-level chip, a chip system, or a system-on-chip.
An embodiment of the present application provides a computer program product stored in a storage medium, where the program product is executed by at least one processor to implement the processes of the above information interaction method embodiments and achieve the same technical effects; to avoid repetition, details are not described here again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed; the functions may also be performed in a substantially simultaneous manner or in a reverse order, depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the preferred implementation. Based on such an understanding, the technical solution of the present application, or the part thereof contributing to the prior art, may be embodied in the form of a computer software product stored in a storage medium (e.g., ROM/RAM, a magnetic disk, or an optical disc), including instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods according to the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative rather than restrictive. Inspired by the present application, those of ordinary skill in the art may make many other forms without departing from the spirit of the present application and the scope of the claims, all of which fall within the protection of the present application.

Claims (11)

1. An information interaction method, comprising: acquiring interaction behavior information generated in an interaction interface of an extended reality device, wherein the interaction interface comprises at least one display window, the interaction behavior information is interaction process information of the at least one display window, and the interaction behavior information comprises a gaze duration for which each of at least one display object in the display window is gazed at; in a case where the interaction behavior information represents that the gaze duration reaches a preset gaze duration, acquiring spatial attribute information of the display window on the interaction interface, semantic information of the display window, and usage preference information of a user for the extended reality device, wherein the semantic information is determined according to type information and state information of the display window, and the spatial attribute information is determined according to size information and position information of the display window; determining target interaction information according to the interaction behavior information, the spatial attribute information, the semantic information, and the usage preference information, wherein the target interaction information is associated with an operation intention of the user; and displaying the target interaction information on the interaction interface.

2. The method according to claim 1, wherein the acquiring interaction behavior information generated in an interaction interface of an extended reality device comprises: receiving a first input of the user on the display object in the at least one display window; in response to the first input, acquiring a display object sequence corresponding to a plurality of the display objects, a gaze duration of gazing at each of the display objects, and a target operation behavior of operating the at least one display window; determining stay duration information of the user on each of the display objects according to the gaze duration corresponding to each of the display objects within a first preset duration; splitting the display object sequence into a plurality of operation segments according to the target operation behavior; and determining the interaction behavior information according to the display object sequence, the stay duration information, and the operation segments.

3. The method according to claim 1, wherein acquiring the semantic information of the display window comprises: acquiring type information and state information of the display window, the state information being used to indicate whether the display window is activated; in a case where the type information is text information, performing numerical processing on the type information to obtain a numerical result corresponding to the type information; acquiring an activation weight corresponding to the state information; and determining the semantic information according to the numerical result and the activation weight.

4. The method according to claim 3, wherein the display windows comprise a first display window whose state information indicates an active state and at least one second display window whose state information indicates an inactive state, and the acquiring an activation weight corresponding to the state information comprises: acquiring a window distance between the first display window and each of the second display windows, to obtain at least one window distance; and determining the activation weight corresponding to each of the display windows according to the at least one window distance.

5. The method according to claim 4, wherein the acquiring a window distance between the first display window and each of the second display windows comprises: acquiring a first projection of an anchor point of the first display window on a unit sphere and a second projection of each of the second display windows on the unit sphere; and determining a spherical distance between the first projection and each of the second projections as the window distance.

6. The method according to claim 1, wherein acquiring the usage preference information of the user for the extended reality device comprises: acquiring usage proportion information of the user for a plurality of applications in the extended reality device within a second preset duration, and usage frequency information of the user for a target application within a plurality of time periods, wherein the second preset duration comprises the plurality of time periods, and the target application is any one of the plurality of applications; and determining the usage preference information according to the usage proportion information and the usage frequency information.

7. An information interaction apparatus, comprising: a first acquisition module, configured to acquire interaction behavior information generated in an interaction interface of an extended reality device, wherein the interaction interface comprises at least one display window, the interaction behavior information is interaction process information of the at least one display window, and the interaction behavior information comprises a gaze duration for which each of at least one display object in the display window is gazed at; a second acquisition module, configured to: in a case where the interaction behavior information represents that the gaze duration reaches a preset gaze duration, acquire spatial attribute information of the display window on the interaction interface, semantic information of the display window, and usage preference information of a user for the extended reality device, wherein the semantic information is determined according to type information and state information of the display window, and the spatial attribute information is determined according to size information and position information of the display window; a determination module, configured to determine target interaction information according to the interaction behavior information, the spatial attribute information, the semantic information, and the usage preference information, wherein the target interaction information is associated with an operation intention of the user; and a display module, configured to display the target interaction information on the interaction interface.

8. The apparatus according to claim 7, wherein the first acquisition module comprises: a receiving unit, configured to receive a first input of the user on the display object in the at least one display window; a first acquisition unit, configured to, in response to the first input, acquire a display object sequence corresponding to a plurality of the display objects, a gaze duration of gazing at each of the display objects, and a target operation behavior of operating the at least one display window; a first determining unit, configured to determine stay duration information of the user on each of the display objects according to the gaze duration corresponding to each of the display objects within a first preset duration; a splitting unit, configured to split the display object sequence into a plurality of operation segments according to the target operation behavior; and a second determining unit, configured to determine the interaction behavior information according to the display object sequence, the stay duration information, and the operation segments.

9. An electronic device, comprising a processor and a memory, wherein the memory stores a program or instructions executable on the processor, and the program or instructions, when executed by the processor, implement the steps of the information interaction method according to any one of claims 1 to 6.

10. A readable storage medium, wherein the readable storage medium stores a program or instructions, and the program or instructions, when executed by a processor, implement the steps of the information interaction method according to any one of claims 1 to 6.

11. A computer program product, wherein the computer program product is stored in a storage medium, and the computer program product is executed by at least one processor to implement the steps of the information interaction method according to any one of claims 1 to 6.
CN202411302934.0A 2024-09-18 2024-09-18 Information interaction method, device, electronic device, storage medium and program product Pending CN119166004A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411302934.0A CN119166004A (en) 2024-09-18 2024-09-18 Information interaction method, device, electronic device, storage medium and program product

Publications (1)

Publication Number Publication Date
CN119166004A true CN119166004A (en) 2024-12-20

Family

ID=93892279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411302934.0A Pending CN119166004A (en) 2024-09-18 2024-09-18 Information interaction method, device, electronic device, storage medium and program product

Country Status (1)

Country Link
CN (1) CN119166004A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119806334A (en) * 2025-03-12 2025-04-11 湖南华焜数创科技有限公司 Laser projection interactive display method, device and system for improving timeliness


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination