WO2025188634A1

WO2025188634A1 - Techniques for capturing media

Info

Publication number: WO2025188634A1
Application number: PCT/US2025/018154
Authority: WO
Original assignee: Ferrix Industrial LLC
Current assignee: Ferrix Industrial LLC
Priority date: 2024-03-07
Filing date: 2025-03-03
Publication date: 2025-09-12
Anticipated expiration: 2026-09-07
Also published as: WO2025188634A8

Abstract

The present disclosure generally relates to capturing media. Some techniques are for selectively capturing media in accordance with some embodiments. Other techniques are for repositioning a camera in accordance with some embodiments. Other techniques are for providing composition guidance in accordance with some embodiments.

Description

TECHNIQUES FOR CAPTURING MEDIA

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] The present application claims priority to U.S. Provisional Patent Application Serial No. 63/562,635, entitled “TECHNIQUES FOR CAPTURING MEDIA,” filed March 7, 2024, which is hereby incorporated by reference in its entirety for all purposes.

BACKGROUND

[0002] Computer systems often capture media including still photos and video. Computer systems often frame an environment for capture based on how a user is holding the computer systems.

SUMMARY

[0003] Existing techniques for capturing media are generally cumbersome and inefficient. For example, some existing techniques use a complex and time-consuming user interface, which may include multiple key presses or keystrokes. Some existing techniques require more time than necessary, wasting user time and device energy. This latter consideration is particularly important in battery-operated devices.

[0004] Accordingly, the present technique provides electronic devices with faster, more efficient methods and interfaces for capturing media. Such methods and interfaces optionally complement or replace other methods capturing media. Such methods and interfaces reduce the cognitive burden on a user and produce a more efficient human -machine interface. For battery-operated computing devices, such methods and interfaces conserve power and increase the time between battery charges.

[0005] In some embodiments, a method that is performed at a computer system that is in communication with a media capture component and a microphone is described. In some embodiments, the method comprises: detecting, via the microphone, a first input corresponding to a request to capture media; and after detecting the first input corresponding to the request to capture media: in accordance with a determination that the first input corresponds to a first instruction, capturing, via the media capture component, media in response to a first set of one or more conditions being satisfied; and in accordance with a determination that the first input corresponds to a second instruction different from the first instruction, capturing, via the media capture component, media in response to a second set of one or more conditions being satisfied, wherein the second set of one or more conditions is different from the first set of one or more conditions.

[0006] In some embodiments, a non-transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a computer system that is in communication with a media capture component and a microphone is described. In some embodiments, the one or more programs includes instructions for: detecting, via the microphone, a first input corresponding to a request to capture media; and after detecting the first input corresponding to the request to capture media: in accordance with a determination that the first input corresponds to a first instruction, capturing, via the media capture component, media in response to a first set of one or more conditions being satisfied; and in accordance with a determination that the first input corresponds to a second instruction different from the first instruction, capturing, via the media capture component, media in response to a second set of one or more conditions being satisfied, wherein the second set of one or more conditions is different from the first set of one or more conditions.

[0007] In some embodiments, a transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a computer system that is in communication with a media capture component and a microphone is described. In some embodiments, the one or more programs includes instructions for: detecting, via the microphone, a first input corresponding to a request to capture media; and after detecting the first input corresponding to the request to capture media: in accordance with a determination that the first input corresponds to a first instruction, capturing, via the media capture component, media in response to a first set of one or more conditions being satisfied; and in accordance with a determination that the first input corresponds to a second instruction different from the first instruction, capturing, via the media capture component, media in response to a second set of one or more conditions being satisfied, wherein the second set of one or more conditions is different from the first set of one or more conditions.

[0008] In some embodiments, a computer system configured to communicate with a media capture component and a microphone is described. In some embodiments, the computer system comprises one or more processors and memory storing one or more programs configured to be executed by the one or more processors. In some embodiments, the one or more programs includes instructions for: detecting, via the microphone, a first input corresponding to a request to capture media; and after detecting the first input corresponding to the request to capture media: in accordance with a determination that the first input corresponds to a first instruction, capturing, via the media capture component, media in response to a first set of one or more conditions being satisfied; and in accordance with a determination that the first input corresponds to a second instruction different from the first instruction, capturing, via the media capture component, media in response to a second set of one or more conditions being satisfied, wherein the second set of one or more conditions is different from the first set of one or more conditions.

[0009] In some embodiments, a computer system configured to communicate with a media capture component and a microphone is described. In some embodiments, the computer system comprises means for performing each of the following steps: detecting, via the microphone, a first input corresponding to a request to capture media; and after detecting the first input corresponding to the request to capture media: in accordance with a determination that the first input corresponds to a first instruction, capturing, via the media capture component, media in response to a first set of one or more conditions being satisfied; and in accordance with a determination that the first input corresponds to a second instruction different from the first instruction, capturing, via the media capture component, media in response to a second set of one or more conditions being satisfied, wherein the second set of one or more conditions is different from the first set of one or more conditions.

[0010] In some embodiments, a computer program product is described. In some embodiments, the computer program product comprises one or more programs configured to be executed by one or more processors of a computer system that is in communication with a media capture component and a microphone. In some embodiments, the one or more programs include instructions for: detecting, via the microphone, a first input corresponding to a request to capture media; and after detecting the first input corresponding to the request to capture media: in accordance with a determination that the first input corresponds to a first instruction, capturing, via the media capture component, media in response to a first set of one or more conditions being satisfied; and in accordance with a determination that the first input corresponds to a second instruction different from the first instruction, capturing, via the media capture component, media in response to a second set of one or more conditions being satisfied, wherein the second set of one or more conditions is different from the first set of one or more conditions.

[0011] In some embodiments, a method that is performed at a computer system that is in communication with a media capture component and a movement component is described. In some embodiments, the method comprises: while capturing video via the media capture component: in accordance with a determination that a first set of one or more capture conditions is satisfied, moving, via the movement component, a portion of the computer system that includes the media capture component in a first movement pattern, wherein moving the portion of the computer system in the first movement pattern causes framing of the video to change as the portion of the computer system moves; and in accordance with a determination that a second set of one or more capture conditions is satisfied, wherein the second set of one or more capture conditions is different from the first set of one or more capture conditions, moving, via the movement component, the portion of the computer system in a second movement pattern different from the first movement pattern, wherein moving the portion of the computer system in the second movement pattern causes framing of the video to change as the portion of the computer system moves.

[0012] In some embodiments, a non-transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a computer system that is in communication with a media capture component and a movement component is described. In some embodiments, the one or more programs includes instructions for: while capturing video via the media capture component: in accordance with a determination that a first set of one or more capture conditions is satisfied, moving, via the movement component, a portion of the computer system that includes the media capture component in a first movement pattern, wherein moving the portion of the computer system in the first movement pattern causes framing of the video to change as the portion of the computer system moves; and in accordance with a determination that a second set of one or more capture conditions is satisfied, wherein the second set of one or more capture conditions is different from the first set of one or more capture conditions, moving, via the movement component, the portion of the computer system in a second movement pattern different from the first movement pattern, wherein moving the portion of the computer system in the second movement pattern causes framing of the video to change as the portion of the computer system moves. [0013] In some embodiments, a transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a computer system that is in communication with a media capture component and a movement component is described. In some embodiments, the one or more programs includes instructions for: while capturing video via the media capture component: in accordance with a determination that a first set of one or more capture conditions is satisfied, moving, via the movement component, a portion of the computer system that includes the media capture component in a first movement pattern, wherein moving the portion of the computer system in the first movement pattern causes framing of the video to change as the portion of the computer system moves; and in accordance with a determination that a second set of one or more capture conditions is satisfied, wherein the second set of one or more capture conditions is different from the first set of one or more capture conditions, moving, via the movement component, the portion of the computer system in a second movement pattern different from the first movement pattern, wherein moving the portion of the computer system in the second movement pattern causes framing of the video to change as the portion of the computer system moves.

[0014] In some embodiments, a computer system configured to communicate with a media capture component and a movement component is described. In some embodiments, the computer system comprises one or more processors and memory storing one or more programs configured to be executed by the one or more processors. In some embodiments, the one or more programs includes instructions for: while capturing video via the media capture component: in accordance with a determination that a first set of one or more capture conditions is satisfied, moving, via the movement component, a portion of the computer system that includes the media capture component in a first movement pattern, wherein moving the portion of the computer system in the first movement pattern causes framing of the video to change as the portion of the computer system moves; and in accordance with a determination that a second set of one or more capture conditions is satisfied, wherein the second set of one or more capture conditions is different from the first set of one or more capture conditions, moving, via the movement component, the portion of the computer system in a second movement pattern different from the first movement pattern, wherein moving the portion of the computer system in the second movement pattern causes framing of the video to change as the portion of the computer system moves. [0015] In some embodiments, a computer system configured to communicate with a media capture component and a movement component is described. In some embodiments, the computer system comprises means for performing each of the following steps: while capturing video via the media capture component: in accordance with a determination that a first set of one or more capture conditions is satisfied, moving, via the movement component, a portion of the computer system that includes the media capture component in a first movement pattern, wherein moving the portion of the computer system in the first movement pattern causes framing of the video to change as the portion of the computer system moves; and in accordance with a determination that a second set of one or more capture conditions is satisfied, wherein the second set of one or more capture conditions is different from the first set of one or more capture conditions, moving, via the movement component, the portion of the computer system in a second movement pattern different from the first movement pattern, wherein moving the portion of the computer system in the second movement pattern causes framing of the video to change as the portion of the computer system moves.

[0016] In some embodiments, a computer program product is described. In some embodiments, the computer program product comprises one or more programs configured to be executed by one or more processors of a computer system that is in communication with a media capture component and a movement component. In some embodiments, the one or more programs include instructions for: while capturing video via the media capture component: in accordance with a determination that a first set of one or more capture conditions is satisfied, moving, via the movement component, a portion of the computer system that includes the media capture component in a first movement pattern, wherein moving the portion of the computer system in the first movement pattern causes framing of the video to change as the portion of the computer system moves; and in accordance with a determination that a second set of one or more capture conditions is satisfied, wherein the second set of one or more capture conditions is different from the first set of one or more capture conditions, moving, via the movement component, the portion of the computer system in a second movement pattern different from the first movement pattern, wherein moving the portion of the computer system in the second movement pattern causes framing of the video to change as the portion of the computer system moves.

[0017] In some embodiments, a method that is performed at a computer system that is in communication with a media capture component, an input component, and an output component is described. In some embodiments, the method comprises: detecting, via the input component, a first set of one or more inputs corresponding to one or more instructions that include one or more spoken words; in response to detecting the first set of one or more inputs corresponding to the one or more instructions, preparing to capture media via the media capture component; and while preparing to capture media via the media capture component: in accordance with a determination that the one or more instructions includes first content, providing, via the output component, first composition guidance, wherein the first composition guidance includes one or more recommendations for changing a spatial arrangement of one or more objects in the field of view of the media capture component; and in accordance with a determination that the one or more instructions includes second content different from the first content, providing, via the output component, second composition guidance different from the first composition guidance, wherein the second composition guidance includes one or more recommendations for changing a spatial arrangement of one or more objects in the field of view of the media capture component.

[0018] In some embodiments, a non-transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a computer system that is in communication with a media capture component, an input component, and an output component is described. In some embodiments, the one or more programs includes instructions for: detecting, via the input component, a first set of one or more inputs corresponding to one or more instructions that include one or more spoken words; in response to detecting the first set of one or more inputs corresponding to the one or more instructions, preparing to capture media via the media capture component; and while preparing to capture media via the media capture component: in accordance with a determination that the one or more instructions includes first content, providing, via the output component, first composition guidance, wherein the first composition guidance includes one or more recommendations for changing a spatial arrangement of one or more objects in the field of view of the media capture component; and in accordance with a determination that the one or more instructions includes second content different from the first content, providing, via the output component, second composition guidance different from the first composition guidance, wherein the second composition guidance includes one or more recommendations for changing a spatial arrangement of one or more objects in the field of view of the media capture component. [0019] In some embodiments, a transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a computer system that is in communication with a media capture component, an input component, and an output component is described. In some embodiments, the one or more programs includes instructions for: detecting, via the input component, a first set of one or more inputs corresponding to one or more instructions that include one or more spoken words; in response to detecting the first set of one or more inputs corresponding to the one or more instructions, preparing to capture media via the media capture component; and while preparing to capture media via the media capture component: in accordance with a determination that the one or more instructions includes first content, providing, via the output component, first composition guidance, wherein the first composition guidance includes one or more recommendations for changing a spatial arrangement of one or more objects in the field of view of the media capture component; and in accordance with a determination that the one or more instructions includes second content different from the first content, providing, via the output component, second composition guidance different from the first composition guidance, wherein the second composition guidance includes one or more recommendations for changing a spatial arrangement of one or more objects in the field of view of the media capture component.

[0020] In some embodiments, a computer system configured to communicate with a media capture component, an input component, and an output component is described. In some embodiments, the computer system comprises one or more processors and memory storing one or more programs configured to be executed by the one or more processors. In some embodiments, the one or more programs includes instructions for: detecting, via the input component, a first set of one or more inputs corresponding to one or more instructions that include one or more spoken words; in response to detecting the first set of one or more inputs corresponding to the one or more instructions, preparing to capture media via the media capture component; and while preparing to capture media via the media capture component: in accordance with a determination that the one or more instructions includes first content, providing, via the output component, first composition guidance, wherein the first composition guidance includes one or more recommendations for changing a spatial arrangement of one or more objects in the field of view of the media capture component; and in accordance with a determination that the one or more instructions includes second content different from the first content, providing, via the output component, second composition guidance different from the first composition guidance, wherein the second composition guidance includes one or more recommendations for changing a spatial arrangement of one or more objects in the field of view of the media capture component.

[0021] In some embodiments, a computer system configured to communicate with a media capture component, an input component, and an output component is described. In some embodiments, the computer system comprises means for performing each of the following steps: detecting, via the input component, a first set of one or more inputs corresponding to one or more instructions that include one or more spoken words; in response to detecting the first set of one or more inputs corresponding to the one or more instructions, preparing to capture media via the media capture component; and while preparing to capture media via the media capture component: in accordance with a determination that the one or more instructions includes first content, providing, via the output component, first composition guidance, wherein the first composition guidance includes one or more recommendations for changing a spatial arrangement of one or more objects in the field of view of the media capture component; and in accordance with a determination that the one or more instructions includes second content different from the first content, providing, via the output component, second composition guidance different from the first composition guidance, wherein the second composition guidance includes one or more recommendations for changing a spatial arrangement of one or more objects in the field of view of the media capture component.

[0022] In some embodiments, a computer program product is described. In some embodiments, the computer program product comprises one or more programs configured to be executed by one or more processors of a computer system that is in communication with a media capture component, an input component, and an output component. In some embodiments, the one or more programs include instructions for: detecting, via the input component, a first set of one or more inputs corresponding to one or more instructions that include one or more spoken words; in response to detecting the first set of one or more inputs corresponding to the one or more instructions, preparing to capture media via the media capture component; and while preparing to capture media via the media capture component: in accordance with a determination that the one or more instructions includes first content, providing, via the output component, first composition guidance, wherein the first composition guidance includes one or more recommendations for changing a spatial arrangement of one or more objects in the field of view of the media capture component; and in accordance with a determination that the one or more instructions includes second content different from the first content, providing, via the output component, second composition guidance different from the first composition guidance, wherein the second composition guidance includes one or more recommendations for changing a spatial arrangement of one or more objects in the field of view of the media capture component.

[0023] Executable instructions for performing these functions are, optionally, included in a non-transitory computer-readable storage medium or other computer program product configured for execution by one or more processors. Executable instructions for performing these functions are, optionally, included in a transitory computer-readable storage medium or other computer program product configured for execution by one or more processors.

DESCRIPTION OF THE FIGURES

[0024] For a better understanding of the various described embodiments, reference should be made to the Detailed Description below, in conjunction with the following drawings in which like reference numerals refer to corresponding parts throughout the figures.

[0025] FIG. l is a block diagram illustrating a computer system in accordance with some embodiments.

[0026] FIGS. 2A-2C are diagrams illustrating exemplary components and user interfaces of electronic device in accordance with some embodiments.

[0027] FIG. 3 is a block diagram illustrating exemplary components of a device in accordance with some embodiments.

[0028] FIG. 4 is a functional diagram of an exemplary actuator device in accordance with some embodiments.

[0029] FIG. 5 is a functional diagram of an exemplary agent system in accordance with some embodiments.

[0030] FIGS. 6A-6M illustrate exemplary user interfaces for capturing media in accordance with some embodiments. [0031] FIG. 7 is a flow diagram illustrating a method for selectively capturing media in accordance with some embodiments.

[0032] FIG. 8 is a flow diagram illustrating a method for repositioning a camera in accordance with some embodiments.

[0033] FIG. 9 is a flow diagram illustrating a method for providing composition guidance in accordance with some embodiments.

DETAILED DESCRIPTION

[0034] The description to follow sets forth exemplary methods, components, parameters, and the like. While specific examples are set out below, it should be recognized that such embodiments should not be understood as limiting the scope of the present disclosure to the explicit descriptions of the examples set forth herein but instead should be understood as providing illustrative examples.

[0035] One or more steps of the methods described herein can rely on (be contingent on) one or more conditions being satisfied. In some embodiments, a method is performed by iterating a process multiple times. In some embodiments, contingent steps can be satisfied on different iterations of the same process and still be within the scope of the methods described herein. For example, for a given method that includes two steps that are contingent on different conditions, one of ordinary skill in the art would understand that the given method is considered performed even when a process is repeated multiple times until the contingent steps are satisfied. In some embodiments, multiple iterations of a process are not required to in order to practice claims as presented herein. For example, electronic device, system, or computer readable medium claims can be performed without iteratively repeating a process. In some embodiments, the electronic device, system, or computer readable medium claims include instructions for performing one or more steps that are contingent upon one or more conditions being satisfied. Because such instructions are stored in one or more processors and/or at one or more memory locations, the electronic device, system, or computer readable medium claims can include logic that determines whether the one or more conditions have been satisfied without needing to repeat steps of a process.

[0036] Although elements are described below using numerical descriptors, such as “a first” and/or “a second,” these elements do not correspond to order or distinct representations and should not be limited to the stated numerical term. In some embodiments, these terms simply used as prefix to distinguish a reference to one element from a reference to another element. For example, a “first” device and a “second” device can be two separate references to the same device. In contrast, for example, a “first” device and a “second” device can be a reference to two different devices (e.g., not the same device and/or not the same type of device). For example, a first computer system and a second computer system do not correspond to a first and a second in time, and merely are used to distinguish between two computer systems. As such, the first computer system can be termed a second computer system, and the second computer system can be termed a first computer system without departing from the scope of the various described embodiments.

[0037] For description of various elements and examples, the use of certain terminology is used to provide productive descriptions of the subject matter below and should not be read as limiting. As used to describe various examples herein, the singular forms of “a,” “an,” and “the” should not be interpreted as precluding or excluding the plural forms as well, unless the context clearly indicates otherwise. As well, “and/or” is used to encompasses any and all possible combinations of one or more associated listed items. For example, “x and/or y” should be interpreted as including “x,” or “y,” as well as “x and y” as possible permutations. Further, the use of the terms “includes,” “including,” “comprises,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

[0038] When describing choices and/or logical possibilities, the term “if’ is, optionally, construed to mean “when,” “upon,” “in response to determining,” “in response to detecting,” or “in accordance with a determination that” depending on the context. Similarly, the phrase “if it is determined” or “if [a stated condition or event] is detected” is, optionally, construed to mean “upon determining,” “in response to determining,” “upon detecting [the stated condition or event],” “in response to detecting [the stated condition or event],” or “in accordance with a determination that [the stated condition or event]” depending on the context.

[0039] The processes described below enhance the operability of the devices and make the user-device and/or user-device interfaces more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the device) through various techniques, including by providing improved feedback (e.g., visual, haptic, acoustic, and/or tactile feedback) to the user, reducing the number of inputs needed to perform an operation, providing additional control options without cluttering the user interface with additional displayed controls, performing an operation when a set of conditions has been met without requiring further input (e.g., input by a user), and/or additional techniques, such as increasing the security and/or privacy of the computer system and reducing burn-in of one or more portions of a user interface of a display. These techniques also reduce power usage and improve battery life of the device by enabling the user to use the device more quickly and efficiently.

[0040] FIG. 1 depicts a block diagram of computer system 100 (e.g., electronic device and/or electronic system) including a set of electronic components in communication with (e.g., connected to) (e.g., wired or wirelessly) to each other. It should be understood that computer system 100 is merely one example of a computer system that can be used to perform functionality described below and that one or more other computer systems can be used to perform the functionality described below. Additionally, while FIG. 1 depicts a computer architecture of computer system 100, other computer architectures (e.g., including more components, similar components, and/or fewer components) of a computer system can be used to perform functionality described herein.

[0041] In some embodiments, computer system 100 can correspond to (e.g., be and/or include) a system on a chip, a server system, a personal computer system, a smart phone, a smart watch, a wearable device, a tablet, a laptop computer, a fitness tracking device, a headmounted display (HMD) device, a desktop computer, a communal device (e.g., smart speaker, connected thermostat, and/or additional home based computer systems), an accessory (e.g., switch, light, speaker, air conditioner, heater, window cover, fan, lock, media playback device, television, and so forth), a controller, a hub, and/or a sensor.

[0042] In some embodiments, a sensor includes one or more hardware components capable of detecting (e.g., sensing, generating, and/or processing) information about a physical environment in proximity to the sensor. For example, a sensor can be configured to detect information surrounding the sensor, detect information in one or more directions casting away from the sensor, and/or detect information based on contact of the sensor with an element of the physical environment. In some embodiments, a hardware component of a sensor includes a sensing component (e.g., a temperature and/or image sensor), a transmitting component (e.g., a radio and/or laser transmitter), and/or a receiving component (e.g., a laser and/or radio receiver). In some embodiments, a sensor includes an angle sensor, a breakage sensor, a flow sensor, a force sensor, a gas sensor, a humidity or moisture sensor, a glass breakage sensor, a chemical sensor, a contact sensor, a non-contact sensor, an image sensor (e.g., a RGB camera and/or an infrared sensor), a particle sensor, a photoelectric sensor (e.g., ambient light and/or solar), a position sensor (e.g., a global positioning system), a precipitation sensor, a pressure sensor, a proximity sensor, a radiation sensor, an inertial measurement unit, a leak sensor, a level sensor, a metal sensor, a microphone, a motion sensor, a range or depth sensor (e.g., RADAR, LiDAR), a speed sensor, a temperature sensor, a time-of-flight sensor, a torque sensor, and an ultrasonic sensor, a vacancy sensor, a presence sensor, a voltage and/or current sensor, a conductivity sensor, a resistivity sensor, a capacitive sensor, and/or a water sensor. While only a single computer system is depicted in FIG. 1, functionality described below can be implemented with two or more computer systems operating together. Additionally, in some embodiments, computer system 100 includes one or more sensors as described above, and information about the physical environment is captured by combining data from one sensor with data from one or more additional sensors (e.g., that are part of the computer and/or one or more additional computer systems).

[0043] As illustrated in FIG. 1, computer system 100 consists of processor subsystem 110, memory 120, and VO interface 130. Memory 120 corresponds to system memory in communication with processor subsystem 110. The electronic components making up computer system 100 are electrically connected through interconnect 150, which allows communication between the components of computer system 100. For example, interconnect 150 can be a system bus, one or more memory locations, and/or additional electrical channels for connective multiple components of computer system 100. Also, I/O interface 130 is connected to, via a wired and/or wireless connection, I/O device 140. In some embodiments, computer system 100 includes a component made up of I/O interface 130 and I/O device 140 such that the functionality of the individual components is included in the component. Additionally, it should be understood that computer system 100 can include one or more I/O interfaces, communicating with one or more I/O devices. In some embodiments, computer system 100 consists of multiple processor subsystem 100s, each electrically connected through interconnect 150. [0044] In some embodiments, processor subsystem 110 includes one or more processors or individual processing units capable of executing instructions (e.g., program, system, and/or interrupt) to perform functionality described herein. For example, operating system level and/or application level instructions executed by processor subsystem 110. In some embodiments, processor subsystem 110 includes one or more components (e.g., implemented as hardware, software, and/or a combination thereof) capable of supporting, interpreting, and/or performing machine learning instructions and/or operations. For example, computer system 100 can perform operations according to a machine learning model locally. Alternatively, or in addition, computer system 100 can communicate with (e.g., performing calculations on and/or executing instructions corresponding to) a remote interactive knowledge base (e.g., a processing resource that implements a machine learning model, artificial intelligence model, and/or large language model) to perform operations that can be otherwise outside a set of capabilities of computer system 100. For example, computer system 100 can determine a set of inputs (e.g., instructions, data, and/or parameters) to the interactive knowledge base for performing desired machine learning operations.

[0045] Memory 120 in communication with processor subsystem 110 can be implemented by a variety of different physical, non-transitory memory media. In some embodiments, computer system 100 includes multiple memory components and/or multiple types of memory components, each connected to processor subsystem 110 directly and/or via interconnect 150. For example, memory 120 can be implemented using a removable flash drive, storage array, a storage area network (e.g., SAN), flash memory, hard disk storage, optical drive storage, floppy disk storage, removable disk storage, random access memory (e g., SDRAM, DDR SDRAM, RAM-SRAM, EDO RAM, and/or RAMBUS RAM), and/or read only memory (e.g., PROM and/or EEPROM). Additionally, in some embodiments, processor subsystem 110 and/or interconnect 150 is connected to a memory controller that is electrically connected to memory 120.

[0046] In some embodiments, instructions can be executed by processor subsystem 110. In this example, memory 120 can include a computer readable medium (e.g., non-transitory or transitory computer readable medium) usable to store (e.g., configured to store, assigned to store, and/or that stores) instructions to be executable by processor subsystem 110. In some embodiments, each instruction stored by memory 120 and executed by processor subsystem 110 corresponds to an operation for completing the functionality described herein. For example, memory 120 can store program instructions to implement the functionality associated with methods 700, 800, and 900 (FIGS. 7, 8, and 9) described below.

[0047] As mentioned above, I/O interface 130 can be one or more types of interfaces enabling computer system 100 to communicate with other devices. In some embodiments, I/O interface 130 includes a bridge chip (e.g., Southbridge) from a front-side bus to one or more back-side buses. In some embodiments, I/O interface 130 enables communication with one or more I/O devices, illustrated as I/O device 140, via one or more corresponding buses or other interfaces. For example, an VO device can include one or more: physical userinterface devices (e.g., a physical keyboard, a mouse, and/or a joystick), storage devices (e.g., as described above with respect to memory 120), network interface devices (e.g., to a local or wide-area network), sensor devices (e.g., as described above with respect to sensors), and/or auditory and/or visual output devices (e.g., screen, speaker, light, and/or projector). In some embodiments, the visual output device is referred to as a display component. For example, the display component can be configured to provide visual output, such as displaying images on a physically viewable medium via an LED display or image projection. As used herein, “displaying” content includes causing to display the content (e.g., video data rendered and/or decoded by a display controller) by transmitting, via a wired or wireless connection, data (e.g., image data and/or video data) to an integrated or external display component to visually produce the content.

[0048] In some embodiments, computer system 100 includes a component that integrates EO device 140 with other components (e.g., a component that includes EO interface 130 and EO device 140). In some embodiments, EO device 140 is separate from other components of computer system 100 (e.g., is a discrete component). In some embodiments, EO device 140 includes a network interface device that permits computer system 100 to connect to (e.g., communicate with) a network or other computer systems, in a wired or wireless manner. In some embodiments, a network interface device can include Wi-Fi, Bluetooth, NFC, USB, Thunderbolt, Ethernet, and so forth. For example, computer system 100 can utilize an NFC connection to facilitate a bank, credit, financial, token (e.g., fungible or non-fungible token), and/or cryptocurrency transaction between computer system 100 and another computer system within proximity.

[0049] In some embodiments, EO device 140 includes components for detecting a user (e.g., a user, a person, an animal, another computer system different from the computer system, and/or an object) and/or an input (e.g., a tap input and/or a non-tap input (e.g., a verbal input, an acoustic request, an acoustic command, an acoustic statement, a swipe input, a hold-and-drag input, a gaze input, an air gesture, and/or a mouse click)) from a detected user. In some embodiments, I/O device 140 enables computer system 100 to identify users associated with and/or without an account within an environment. For example, computer system 100 can detect a known user (e.g., a user that corresponds to an account) and access information about the user using the known user’s account. In some embodiments, as part of computer system 100 detecting a user, computer system 100 detects that the user’s account is associated with (e.g., is included in and/or identified with respect to) a group of users. For example, computer system 100 can access information associated with a family of accounts in response to detecting a member of the family that is defined as a group of accounts. In some embodiments, an account corresponding to a user can be connected with additional accounts and/or additional computer systems. For example, computer system 100 can detect such additional computer systems and/or detect such computer systems for detecting the user. In some embodiments, computer system 100 detects unknown users and enables guest accounts for the unknown users to utilize computer system 100.

[0050] In some embodiments, VO device 140 includes one or more cameras. In some embodiments, a camera includes an image sensor (e.g., one or more optical sensors and/or one or more depth camera sensors) that provides computer system 100 with the ability to detect a user and/or a user’s gestures (e.g., hand gestures and/or air gestures) as input. In some embodiments, an air gesture is a gesture that is detected without the user touching an input element that is part of the device (or independently of an input element that is a part of the device) and is based on detected motion of a portion of the user’s body through the air including motion of the user’s body relative to an absolute reference (e.g., an angle of the user’s arm relative to the ground or a distance of the user’s hand relative to the ground), relative to another portion of the user’s body (e.g., movement of a hand of the user relative to a shoulder of the user, movement of one hand of the user relative to another hand of the user, and/or movement of a finger of the user relative to another finger or portion of a hand of the user), and/or absolute motion of a portion of the user’s body (e.g., a tap gesture that includes movement of a hand in a predetermined pose by a predetermined amount and/or speed, or a shake gesture that includes a predetermined speed or amount of rotation of a portion of the user’s body). In some embodiments, the one or more cameras enable computer system 100 to transmit pictorial and/or video information to an application. For example, image data captured by a camera can enable computer system 100 to complete a video phone call by transmitting video data to an application for performing the video phone call.

[0051] In some embodiments, I/O device 140 includes one or more microphones. For example, a microphone can be used by 100 to obtain data and/or information from a user without a contact input. In some embodiments, a microphone enables computer system 100 to detect verbal and/or speech input from a user. In some embodiments, computer system 100 utilizes speech input to enable personal assistant functionality. For example, a user eliciting a request to computer system 100 to perform an action and/or obtain information for the user. In some embodiments, computer system 100 utilizes speech input (e.g., along with one or more other input and/or output techniques) to request and/or detect information from a user without requiring the user to make physical contact with computer system 100.

[0052] In some embodiments, VO device 140 includes physical input mediums for a user to interact directly with computer system 100. In some embodiments, a physical input medium includes one or more physical buttons (e.g., tactile depressible button and/or touch sensitive non-depressible component) on computer system 100 and/or connected to computer system 100, a mouse and keyboard input method (e.g., connected to computer system 100 together and/or separately with one or more I/O interfaces), and/or a touch sensitive display component.

[0053] In some embodiments, I/O device 140 includes one or more components for outputting information (e.g., a display component, an audio generation component, a speaker, a haptic output device, a display screen, a projector, and/or a touch-sensitive display). In some embodiments, computer system 100 uses I/O device 140 to convey information and/or a state of computer system 100. In some embodiments, I/O device 140 includes a tactile output component. For example, a tactile output component can be a haptic generation component that enables computer system 100 to convey information to a user in contact with (e.g., holding, touching, and/or nearby) computer system 100. In some embodiments, I/O device 140 includes one or more components for outputting visual outputs (e.g., video, image, animation, 3D rendering, augmented reality overlay, motion graphics, data visualization, digital art, etc.). For example, displaying content from one or more applications and/or system applications, and/or displaying a widget (e.g., a control that displays real-time information and/or data) corresponding to one or more applications. [0054] In some embodiments, I/O device 140 includes one or more components for outputting audio (e.g., smart speakers, home theater system, soundbars, headphones, earphones, earbuds, speakers, television speakers, augmented reality headset speakers, audio jacks, optical audio output, Bluetooth audio outputs, HDMI audio outputs, audio sensors, etc.). In some embodiments, computer system 100 is able to output audio through the one or more speakers. For example, computer system 100 outputting audio-based content and/or information to a user. In some embodiments, the one or more speakers enable spatial audio (e.g., an audio output corresponding to an environment (e.g., computer system 100 detecting materials and/or objects within the environment and/or computer system 100 altering the audio pattern, intensity, and/or waveform to compensate for varying characteristics of an environment)).

[0055] FIGS. 2-5 illustrate exemplary components and user interfaces of device 200 in accordance with some embodiments. Device 200 (sometimes referred to herein as device 200) can include one or more features of computer system 100. In the examples described with respect to FIGS. 2-5, device 200 is a laptop computer. In some embodiments, device 200 is not limited to being a laptop computer and one of ordinary skill in the art should recognize that device 200 can be one or more other devices (e.g., as described herein and/or that include one or more of the components and/or functions described herein with respect to device 200). For example, device 200 can be a communal device (such as a smart display, a smart speaker, and/or a television) and/or a personal device (such as a smart phone, a smart watch, a tablet, a desktop computer, a fitness tracking device, and/or a head mounted display device). In some embodiments, a communal device is configured to provide functionality to multiple users (e.g., at the same time and/or at different times). In such embodiments, the communal device can be administered and/or set up by a single user. In some embodiments, a personal device is configured to provide functionality to a single user (e.g., at a time, such as when the single user is logged into the personal device).

[0056] FIGS. 2A-2C illustrate device 200 in three different physical positions. As illustrated in FIG. 2A, device 200 is a laptop computer (also referred to herein as a “laptop”) that includes base portion 200-2 (e.g., that rests on a surface, such as a desk, horizontally as shown in FIG. 2A) and display portion 200-1 that is connected to base portion 200-2 at connection 200-3 (e.g., one or more connection points, a motorized arm, a hinge, and/or a joint) that enables display portion 200-1 to pivot and/or change orientation with respect to base portion 200-2. For example, device 200 can pivot at connection 200-3 to rotate display portion 200-1 and/or device 200 to one or more positions corresponding to an “OFF” internal state (e.g., as further described below in relation to FIG. 2C). In some embodiments, a position corresponding to an “OFF” internal state is a position in which device 200 is in a predetermined pose. For example, a predetermined pose can include display portion 200-1 positioned parallel to base portion 200-2 or display portion 200-1 forming a predetermined angle (e.g., 60-degree angle) with respect to base portion 200-2. In some embodiments, in the “OFF” internal state, an area in which content is displayed by device 200 is positioned in a manner that corresponds to (e.g., represents, is associated with, and/or is configured to accompany) the “OFF” internal state (e.g., facing down, not visible, and/or obscuring the area in which content is displayed). In some embodiments, in the “OFF” internal state, an area in which content is displayed by device 200 is not positioned in a manner that corresponds to (e.g., represents, is associated with, and/or is configured to accompany) the “OFF” internal state (e.g., instead is positioned in a manner that corresponds to an “ON” internal state). For example, when not in the “OFF” internal state, device 200 can be positioned within a range of different open positions (e.g., in which display portion 200-1 is not parallel to base portion 200-2 and the area in which content is displayed by device 200 is visible and/or not obscured). It should be recognized that display portion 200-1 being parallel to base portion 200-2 is an example of a position corresponding to an “OFF” internal state (e.g., a closed position) of device 200. In some embodiments, another configuration could set another orientation of display portion 200-1 with respect to base portion 200-2 as the closed position of device 200, such as illustrated in FIG. 2C.

[0057] FIG. 2A illustrates display screen 200-4 (representing the area in which content is displayed by device 200) on the left and device 200 in a corresponding pose on the right. As illustrated in FIG. 2A, device 200 is in a first position (e.g., display portion 200-1 is perpendicular to base portion 200-2 forming a 90-degree angle). In FIG. 2A, display screen 200-4 represents what is currently being displayed (e.g., via a display component) by device 200 while open in the first position. In FIG. 2A, display screen 200-4 illustrates an internal state in which device 200 is “ON” (e.g., operational, powered on, awake, a higher powered and/or more resource intensive state than the “OFF” state, and/or activated). In some embodiments, device 200 displays (e.g., via display screen 200-4) one or more user interfaces (e.g., user interface objects, windows, application user interfaces, system user interfaces, controls, and/or other visual content). In some embodiments, device 200 displays (e.g., via display screen 200-4) the one or more user interfaces while in the “ON” internal state. For example, in FIG. 2A, device 200 is in the “ON” internal state and display screen 200-4 displays a desktop user interface 200-5 that includes an application window. In some embodiments, a user interface includes (and/or is) one or more user interface objects (e.g., windows, icons, and/or other graphical objects). For example, a user interface (e.g., 200-5) can include one or more graphical objects different than, and/or the same as, an application window.

[0058] FIG. 2B illustrates display screen 200-4 on the left and device 200 in a corresponding pose on the right. As illustrated in FIG. 2B, device 200 is in a second position (e.g., display portion 200-1 is angled (e.g., via connection 200-3) with respect to base portion 200-2 forming at a 120-degree angle (e.g., a larger angle than in FIG. 2A)). In FIG. 2B, display screen 200-4 represents what is being displayed by device 200 while in the second position. Display screen 200-4 illustrates an internal state in which device 200 is “ON” (e.g., the same internal state as the top diagram of FIG. 2A). In FIG. 2B, device 200 displays (e.g., via display screen 200-4) desktop user interface 200-5 (e.g., and is the same as displayed in FIG. 2A). In some embodiments, device 200 displays a different user interface (e.g., other than desktop user interface 200-5). For example, although FIG. 2B illustrates device 200 displaying the same desktop user interface 200-5 as in FIGS. 2A while in a different position than in FIG. 2A, device 200 can display a different user interface. In some embodiments, device 200 displays a user interface that corresponds to (e.g., is based on, due to, caused by, related to, and/or configured to accompany) a physical state (e.g., position, location, and/or orientation), including content that is specific to a particular angle or specific to a current context.

[0059] FIG. 2C illustrates display screen 200-4 on the left and device 200 in a corresponding pose on the right. As illustrated in FIG. 2C, device 200 is in a third position (e.g., display portion 200-1 is angled (e.g., via connection 200-3) with respect to base portion 200-2 forming at a 60-degree angle (e.g., a smaller angle than in FIG. 2A and FIG. 2B)). In FIG. 2C, display screen 200-4 represents what is being displayed by device 200 while in the third position. In FIG. 2C, display screen 200-4 illustrates an internal state in which device 200 is “OFF” (e.g., not operational, not powered on, not awake, not activated, powered off, asleep, hibernating, inactive, and/or deactivated). In some embodiments, device 200 does not display (e.g., via display screen 200-4) (e.g., forgoes displaying) the one or more user interfaces while in the “OFF” internal state (e.g., does not display any visual content). In some embodiments, device 200 displays (e.g., via display screen 200-4) one or more user interfaces while in the “OFF” internal state (e.g., the same and/or different from one or more user interfaces displayed while in the “ON” internal state) (e.g., a user interface specific to the “OFF” state and/or a manner of displaying a user interface that is not specific to the “OFF” internal state). In FIG. 2C, display screen 200-4 is blank because nothing is being displayed on the display of device 200 (e.g., display screen 200-4 is off and/or not displaying a user interface) (e.g., desktop user interface 200-5 is not displayed on display screen 200-4).

[0060] In some embodiments, device 200 includes one or more components (also referred to herein as “movement components”) that enable device 200 to perform (e.g., cause and/or control) movement (and/or be moved). For example, performing movement can include moving a portion of device 200 (e.g., less than or all components of the device move), moving all of device 200 (e.g., the entire device (including all of its components) moves, such as by changing location), and/or moving one or more other devices and/or components (e.g., that are in communication with device 200 and/or movement components of device 200). For example, device 200 can automatically move (e.g., pivot), cause, and/or control movement of display portion 200-1 relative to base portion 200-2, such as to any of the positions illustrated in FIGS. 2A-2C. In some embodiments, device 200 performs movement based on an internal state of device 200. Performing movement based on an internal state can enable new (e.g., otherwise unavailable) interactions by device 200. For example, such new interactions of device 200 can be configured using special features, functions, modes, and/or programs that take advantage of the ability of device 200 to perform movement. Examples of such interaction include using movement to communicate (e.g., to a user) an internal state (e.g., on, off, sleeping, and/or hibernating) of the device, to assist with user input (e.g., reduce distance to a user), and/or to augment interaction behavior of the device (e.g., moving in particular ways, during an interaction with a user, that convey information such as importance and/or direction of attention). In some embodiments, the movement performed corresponds to (e.g., is caused by, is in response to, and/or is determined and/or performed based on) one or more of: detected input, detected context (e.g., environmental context and/or user context), and/or an internal state of device 200 (e.g., an internal state and/or a set of multiple internal states). For example, device 200 can perform a movement of the display portion such that device 200 moves from being in the first position illustrated in FIG. 2A to being in the second position illustrated in FIG. 2B. In this example, device 200 can detect that a user has repositioned with respect to device 200 (e.g., the user stood up), and in response, device 200 can perform the movement to the second position so that the display is at an optimized viewing angle based on the repositioned height and/or angle of the user’s eyes with respect to the display of device 200. As another example, device 200 can perform a movement such that device 200 moves from being in the first position illustrated in FIG. 2A to being in the third position illustrated in FIG. 2C. In this example, device 200 can perform the movement to the third position in response to detecting an internal state with reduced activity (e.g., the “OFF” internal state as described above). In this way, the movement of device 200 to one or more positions can indicate an internal state of device 200.

[0061] FIGS. 2A-2C illustrate device 200 having a display portion that is able to move with one degree of freedom via connection 200-3 (e.g., a hinge) connecting display portion 200-1 to base portion 200-2. In some embodiments, device 200 includes one or more components that have one or more degrees of freedom. For example, a movement component (e.g., an output component that causes and/or allows movement) (e.g., 200-26C of FIG. 5) of device 200 can include multiple degrees of freedom (e.g., six degrees of freedom including three components of translation and three components of rotation). For example, device 200 can be implemented to be able to move the display portion in a telescoping forward or backward motion (e.g., display portion 200-1 moves forward while base portion 200-2 remains stationary in space relative to the base portion (e.g., to reduce and/or extend viewing distance for a user)). As yet another example, device 200 can be implemented to be able to move the display portion to rotate about an axis that is perpendicular to the hinge such that the display portion can turn to position the display to follow a user as they walk around device 200. While the examples shown in FIGS. 2A-2C illustrate a hinge, other movement components can be included in device 200, such as an actuator (e.g., a pneumatic actuator, hydraulic actuator and/or an electric actuator), a movable base, a rotatable component, and/or a rotatable base. In some embodiments, one or more movement components can cause device 200 to move in different ways, such as to rotate (e.g., 0-360 degrees), to move laterally (e.g., right, left, down, up, and/or any combination thereof), and/or to tilt (e.g., 0-360 degrees).

[0062] FIG. 3 illustrates exemplary block diagram of device 200. In some embodiments, device 200 includes some or all of the components described with respect to FIGS. 1 A, IB, 3, and 5B. As illustrated in FIG. 3, device 200 has bus 200-13 that operatively couples VO section 200-12 (also referred to as an I/O subsection and/or an I/O interface) with processors 200-11 and memory 200-10. As illustrated in FIG. 3, I/O section 200-12 is connected to output devices 200-16 (also referred to herein as “output components”). In some embodiments, output devices 200-16 include one or more visual output devices (e.g., a display component, such as a display, a display screen, a projector, and/or a touch-sensitive display), one or more haptic output devices (e.g., a device that causes vibration and/or other tactile output), one or more audio output devices (e.g., a speaker), and/or one or more movement components (e.g., an actuator, a motor, a mechanical linkage, devices that cause and/or allow movement, and/or one or more movement components as described above). As illustrated in FIG. 3, output devices 200-16 include two exemplary movement components (e.g., movement controller 200-17 and actuator 200-18). Actuator 200-18 can be any component that performs physical movement (e.g., of a portion and/or of the entirety) of a device (e.g., device 200 and/or a device coupled to and/or in contact with device 200). Movement controller 200-17 can be any component (e.g., a control device) that controls (e.g., provides control signals to) actuator 200-18. For example, movement controller 200-17 can provide control signals that cause actuator 200-18 to actuate (e.g., cause physical movement). In some embodiments, movement controller 200-17 includes one or more logic component (e.g., a processor), one or more feedback component (e.g., sensor), and/or one or more control components (e.g., for applying control signals, such as a relay, a switch, and/or a control line). In some embodiments, movement controller 200-17 and actuator 200-18 are embodied in the same device and/or component as each other (e.g., a dedicated onboard movement controller 200-17 that is affixed to actuator 200-18). In some embodiments, movement controller 200-17 and actuator 200-18 are embodied in different devices and/or components from each other (e.g., one or more processors 200-11 can function as the movement controller 200-17 of actuator 200-18). In some embodiments, movement controller 200-17 and/or actuator 200-18 are embodied in a device (or one or more devices) other than device 200 (e.g., device 200 is coupled to (e.g., temporarily and/or removably) another device and can instruct movement controller 200-17 and/or control actuator 200-18 of the other device). Actuator 200-18 can function to cause one or more types of mechanical movement (e.g., linear and/or rotational) in one or more manners (e.g., using electric, magnetic, hydraulic, and/or pneumatic power). Examples of actuator 200-18 can include electromechanical actuators, linear actuators, and/or rotary actuators.

[0063] As illustrated in FIG. 3, I/O section 200-12 is connected to input devices 200-14. In some embodiments, input devices 200-14 include one or more visual input devices (e.g., a camera and/or a light sensor), one or more physical input devices (e.g., a button, a slider, a switch, a touch-sensitive surface, and/or a rotatable input mechanism), one or more audio input devices (e.g., a microphone), and/or other input devices (e.g., accelerometer, a pressure sensor (e.g., contact intensity sensor), a ranging sensor, a temperature sensor, a GPS sensor, an accelerometer, a directional sensor (e.g., compass), a gyroscope, a motion sensor, and/or a biometric sensor). In addition, I/O section 200-12 can be connected with communication unit 200-15 for receiving application and operating system data, using Wi-Fi, Bluetooth, near field communication (NFC), cellular, and/or other wireless (and/or wired) communication techniques.

[0064] Memory 200-10 of device 200 can include one or more non-transitory computer- readable storage mediums, for storing computer-executable instructions, which, when executed by one or more computer processors 200-11, for example, cause the computer processors to perform the techniques described below, including methods 700, 800, and 900 (FIGS. 7-9). A computer-readable storage medium can be any medium that can tangibly contain or store computer-executable instructions for use by or in connection with the instruction execution system, apparatus, or device. In some embodiments, the storage medium is a transitory computer-readable storage medium. In some embodiments, the storage medium is a non-transitory computer-readable storage medium. The non-transitory computer- readable storage medium can include, but is not limited to, magnetic, optical, and/or semiconductor storages. Examples of such storage include magnetic disks, optical discs based on CD, DVD, and Blu-ray technologies, as well as persistent solid-state memory such as flash and solid-state drives. Device 200 is not limited to the components and configuration of FIG. 3, but can include other and/or additional components in a multitude of possible configurations, all of which are intended to be within the scope of this disclosure.

[0065] FIG. 4 illustrates a functional diagram of actuator 200- 18B in accordance with some embodiments. As described above, actuator 200-18B can be any component that performs physical movement. In some embodiments, actuator 200- 18B operates using input that includes control signal 200-18A and/or energy source 200-18B. For example, actuator 200-18 can be a rotary actuator that converts electric energy into rotational movement. This rotational movement can cause the movement of the display portion of device 200 described above with respect to FIGS. 2A-2C (e.g., a counterclockwise rotational movement of the actuator causes device 200 to move to a position having a larger angle (e.g., the second position illustrated in FIG. 2B) and a clockwise (e.g., opposite) rotational movement of the actuator causes device 200 to move to a position having a smaller angle (e.g., the third position illustrated in FIG. 2C)). Control signal 200-18A can indicate one or more start and/or stop instructions, a movement and/or actuation direction, a movement and/or actuation speed, an amount of time to move and/or actuate, a goal position (e.g., pose and/or location) for movement and/or actuation, and/or one or more other characteristics of movement and/or actuation. In some embodiments, the control signal and the energy source are the same signal and/or input. In some embodiments, one or more additional components (e.g., mechanical and/or electric) are coupled (e.g., removably or permanently) to actuator 200-18B for affecting movement and/or actuation (e.g., mechanical linkage such as a lead screw, gears, and/or other component for changing (e.g., converting) a characteristic of movement and/or actuation). In some embodiments, actuator 200-18B includes one or more feedback components (e.g., position sensor, encoder, overcurrent sensor, and/or force sensor) that form part of a feedback loop for modifying and/or ceasing movement and/or actuation (e.g., slowing actuation as a goal position is reached and/or ceasing actuation if physical resistance to actuation is detected via a sensor). In some embodiments, the one or more feedback components are included (e.g., partially and/or wholly) in a movement controller (e.g., movement controller 200-13) operatively coupled to the actuator.

[0066] Attention is now turned to functionality (e.g., features and/or capabilities) of one or more devices (e.g., computer system 100 and/or device 200). One such functionality is implementing an “agent,” which can alternatively be referred to as a software agent, an intelligent agent, an interactive agent, a virtual assistant, an intelligent virtual assistant, an interactive virtual assistant, a personal assistant, an intelligent personal assistant, an interactive personal assistant, an intelligent interactive personal assistant, and/or an artificial intelligence (Al) assistant. In some embodiments, an agent refers to a set of one or more functions implemented in hardware and/or software (e.g., locally and/or remotely) on an agent system (e.g., a single device and/or multiple devices). In some embodiments, an agent performs operations to perceive an environment, acquire knowledge, retrieve knowledge, learn skills, interact with users, and/or perform tasks. The agent can, for example, perform these (and/or other) operations in response to user input and/or automatically (e.g., at an appropriate time determined based on a perceived context). A non-exhaustive list of exemplary operations that an agent can be used for and/or with includes: tracking a user’s eyes, face, and/or body (e.g., to move with the user and/or identify an intent and/or activity of the user); detecting, recognizing, and/or classifying a user in the environment; detecting and/or responding to input (e.g., verbal input, air gestures, and/or physical input, such as touch input and/or force inputs to physical hardware components (e.g., button, knobs, and/or sliders)); detecting context (e.g., user context, operating context, and/or environmental context); moving (e.g., changing pose, position, orientation, and/or location); performing one or more operations in response to input, context, and/or stimulus (e.g., an object or event (e.g., external and/or internal to a device) that causes one or more responsive operations by a device); providing intelligent interaction capabilities (e.g., due to in part to one or more machine learning (“ML”) models such as a large language model (“LLM”)) for responding and/or causing operations to be performed; and/or performing tasks (e.g., a set of operations for achieving a particular goal) (e.g., automatically and/or intelligently). In some embodiments, an agent performs operations in response to non-contact inputs (e.g., air gestures and/or natural language commands). The preceding list is meant to be illustrative of operations that can be performed using an agent but is not meant to be an exhaustive list. Other operations fall within the intended scope of the capabilities of an agent. Additionally, for the purposes of this disclosure, an agent does not need to include all of the functionality mentioned herein but can include less functionality or more functionality (e.g., an agent can be implemented on an agent system that does not have movement functionality but that otherwise includes an intelligent personal assistant that can interact with a user).

[0067] In some embodiments, a user is (e.g., represents, includes, and/or is included in) one or more of a user, person, object, and/or animal in an environment (e.g., a physical and/or virtual environment) (e.g., of the device). In some embodiments, a user is (e.g., represents, includes, and/or is included in) an entity that is perceived (e.g., detected by the device, one or more other devices, and/or one or more components thereof). In some embodiments, an entity is something that is distinguished from surrounding entities (e.g., pieces of environments and/or other users) and/or that is considered as a discrete logical construct via one or more components (e.g., perception components and/or other components). In some embodiments, a user is physical and/or virtual. For example, a physical user can represent a user standing in front of, and being perceived by, the device. As another example, a virtual user can represent an avatar in a virtual scene perceived by the device (e.g., the avatar is detected in a media stream received by the device and/or captured by a camera of the device). Although presented above as examples of a “user,” the terms and/or concepts referred to as “person,” “object,” and/or “animal” can be interchanged with “user” throughout this disclosure, unless explicitly indicated otherwise. For example, use the term “user” can likewise be understood to also refer to “user,” unless explicitly indicated otherwise.

[0068] As an example, and referring back to FIGS. 2A-2C, an agent implemented at least partially on device 200 can perform operations that cause display portion 200-1 of device 200 to move with respect to base portion 200-2. For example, the agent detects (e.g., perceives and determines the occurrence of) a context that includes the user standing up (e.g., based on facial detection and tracking); and, in response, the agent causes device 200 to open and/or device 200 opens display portion 200-1 to the larger angle. As another example, the agent can detect verbal input that corresponds to (e.g., is interpreted as and/or that refers to an operation that includes) a request to move the display (e.g., “Please move my display,” or “Please enter sleep mode.”); and, in response, the agent causes device 200 to move and/or device 200 moves display portion 200-1.

[0069] FIG. 5 illustrates a functional diagram of an exemplary agent system 200-20A. As illustrated in FIG. 5, agent system 200-20A has a dotted box boundary that encloses input components 200-22, agent components 200-24, and output components 200-26. In some embodiments, agent system 200-20A includes fewer, more, and/or different components than illustrated in FIG. 5. In some embodiments, agent system 200-20 is implemented on a single device (e.g., computer system 100 and/or device 200). In some embodiments, agent system 200-20 is implemented on multiple devices. In some embodiments, one or more components of agent system 200-20 illustrated in and/or described with respect to FIG. 5 are external to but operatively coupled to agent system 200-20 (e.g., an accessory, an external device, an external sensor, an external actuator, an external display component, an external speaker, and/or an external database). In some embodiments, one or more components of agent system 200-20 are local to one or more other components of agent system 200-20. In some embodiments, one or more components of agent system 200-20 are remote from one or more other components of agent system 200-20.

[0070] In some embodiments, input components 200-22 includes components for performing sensing and/or communications functions of agent system 200-20. As illustrated in FIG. 5, input components 200-22 includes one or more sensors 200-22A. One or more sensors 200-22A can include any component that functions to detect data corresponding to a physical environment. Examples of one or more sensors 200-22A can include: a camera, a light sensor, a microphone, an accelerometer, a position sensor, a pressure sensor, a temperature sensor, olfactory sensor, and/or a contact sensor. This list is not intended to be exhaustive, and one or more sensors 200-22A can include other sensors not explicitly identified herein that detect, generate, and/or otherwise provide data that can be used (e.g., processed, stored, and/or transformed) for detecting data corresponding to a physical environment. As illustrated in FIG. 5, input components 200-22 includes one or more communications components 200-22B. One or more communications components 200-22B can include any component that functions to send and/or receive communications (e.g., an antenna, a modem, a network interface component, an encoder, a decoder, and/or a communication protocol stack) internal and/or external to agent system 200-20. Communications components 200-22B can be between different devices and/or between components of the same device. The communications can include control signals and/or data (e.g., messages, instructions, files, application data, and/or media streams). In some embodiments, input components 200-22 includes fewer, more, and/or different components than those illustrated in FIG. 5. In some embodiments, input components 200-22 are implemented in hardware and/or software.

[0071] In some embodiments, agent components 200-24 includes components that manage and/or carry out functions of an agent of agent system 200-20. As illustrated in FIG. 5, agent components 200-24 includes the following functional components: task flow, coordination, and/or orchestration component 200-24A, administration component 200-24B, perception component 200-24C, evaluation component 200-24D, interaction component 200- 24E, policy and decision component 200-24F, knowledge component 200-24G, learning component 200-24H, models component 200-241, and APIs component 200-24J. Each of these components is described briefly below. Notably, this list of agent components 200-24 is not intended to be exhaustive, and agent components 200-24 can include other functional components not explicitly identified herein that can be used (e.g., processed, stored, and/or transformed) for performing any function of an agent, such as those described herein. In some embodiments, agent components 200-24 includes fewer, more, and/or different components than those illustrated in FIG. 5. In some embodiments, agent components 200-24 is implemented in hardware and/or software.

[0072] In some embodiments, task flow, coordination, and/or orchestration component 200-24A performs operations that enable an agent to handle coordination between various components. For example, operations can include handling a data processing task flow to move from perception component 200-24C (e.g., that detects speech input) to models component 200-241 (e.g., for processing the detected speech input using a large language model to determine content and/or intent of the speech input). In some embodiments, task flow, coordination, and/or orchestration component 200-24A performs operations that enable an agent to handle coordination between one or more external components (e.g., resources). For example, FIG. 5 illustrates examples of external components, such as external database 200-30. In some embodiments, administration component 200-24B includes functionality performed by an operating system of a device implementing agent system 200-20. In some embodiments, administration component 200-24B includes functionality performed by one or more applications of a device implementing agent system 200-20.

[0073] In some embodiments, administration component 200-24B performs operations that enable an agent system to handle administrative tasks like managing system and/or component updates, managing user accounts, managing system settings, and/or managing component settings. In some embodiments, administration component 200-24B includes functionality performed by an operating system of a device implementing agent system 200- 20. In some embodiments, administration component 200-24B includes functionality performed by one or more applications of a device implementing agent system 200-20.

[0074] In some embodiments, perception component 200-24C performs operations that enable an agent to perceive environmental input. For example, operations can include detecting that a context and/or environmental condition has occurred, detecting the presence of a user (e.g., user, person, object, and/or animal in an environment), detecting an input that includes speech, detecting an input that includes an air gesture, detecting facial expressions, detecting characteristics (e.g., visible and/or non-visible) of a user, and/or detecting verbal and/or physical cues. In some embodiments, perception component 200-24C includes functionality performed by an operating system of a device implementing agent system 200- 20. In some embodiments, perception component 200-24C includes functionality performed by one or more applications of a device implementing agent system 200-20.

[0075] In some embodiments, evaluation component 200-24D performs operations that enable an agent to process evaluate data (e.g., to determine a context such as a user context, an environmental context, and/or an operating context). For example, operations can include evaluating data gathered from perception component 200-24C, knowledge component 200- 24G, external database 200-30, and/or remote processing resource 200-32. In some embodiments, evaluation component 200-24D includes functionality performed by an operating system of a device implementing agent system 200-20. In some embodiments, evaluation component 200-24D includes functionality performed by one or more applications of a device implementing agent system 200-20.

[0076] Reference is made herein to environmental context (also referred to herein as a “context of an environment” and/or “a context corresponding to an environment”). In some embodiments, an environmental context is a context based on one or more characteristics of the environment (e.g., users, locations, time, weather, and/or lighting). For example, an environmental context can include that it is raining outside, that it is daytime, and/or that a device is currently located in a park. In some embodiments, a device (e.g., using an agent) determines an environmental context (e.g., to be currently true, occurring, and/or applicable) using one or more of detecting input (e.g., via one or more input components) and/or receiving data (e.g., from one or more other devices and/or components in communication with the device).

[0077] Reference is made herein to user context (also referred to herein as a “context of a user” and/or “a context corresponding to a user”) (and/or a user context). In some embodiments, a user context is a context based on one or more characteristics of the user (and/or a user). For example, a user context can include the user’s appearance and/or clothing, personality, actions, behavior, movement, location, and/or pose. In some embodiments, a device (e.g., using an agent) determines a user context (e.g., to be currently true, occurring, and/or applicable) using one or more of detecting input (e.g., via one or more input components) and/or receiving data (e.g., from one or more other devices and/or components in communication with the device). In some embodiments, a device determines user context based on historical context and/or learned characteristics of the user, where one or more characteristics of the user are learned and/or stored over a period of time by the device.

[0078] Reference is made herein to operational context (also referred to herein as a “context of operation” and/or an “operating context”). In some embodiments, an operational context is a context based on one or more characteristics of the operation of a device (e.g., the device determining and/or accessing the operational context and/or one or more other devices). For example, an operational context can include the internal state of the device (and/or of one or more components of the device), an internal dialogue of the device (e.g., the device’s understanding of a context), operations being performed by the device, applications and/processes that are executing (e.g., running and/or open) on the device. In some embodiments, a device (e.g., using an agent) determines an operational context (e.g., to be currently true, occurring, and/or applicable) using one or more of detecting input (e.g., via one or more input components) and/or receiving data (e.g., from one or more other devices and/or components in communication with the device). In some embodiments, a device (e.g., using an agent) determines an operational context (e.g., to be currently true, occurring, and/or applicable) using one or more internal states (e.g., accessed, retrieved, and/or queried by a process of the device).

[0079] In some embodiments, interaction component 200-24E performs operations that enable an agent to manage and/or perform interactions with users. For example, operations can include determining an appropriate interaction model for a particular context and/or in response to a particular input. In some embodiments, interaction component 200-24E includes functionality performed by an operating system of a device implementing agent system 200-20. In some embodiments, interaction component 200-24E includes functionality performed by one or more applications of a device implementing agent system 200-20.

[0080] In some embodiments, policy and decision component 200-24F performs operations that enable an agent to take actions in view of available data. For example, operations can include determining which operations to perform and/or which functional components to utilize in response to a detected context. In some embodiments, policy and decision component 200-24F includes functionality performed by an operating system of a device implementing agent system 200-20. In some embodiments, policy and decision component 200-24F includes functionality performed by one or more applications of a device implementing agent system 200-20.

[0081] In some embodiments, knowledge component 200-24G performs operations that enable an agent to access and use stored knowledge. For example, operations can include indexing, storing, and/or retrieving data from a data store, a database, and/or other resource. In some embodiments, knowledge component 200-24G includes functionality performed by an operating system of a device implementing agent system 200-20. In some embodiments, knowledge component 200-24G includes functionality performed by one or more applications of a device implementing agent system 200-20. [0082] In some embodiments, learning component 200-24H performs operations that enable an agent to learn through experiences. For example, operations can include observing and/or keeping track of data that includes preferences, routines, user characteristics, and/or environmental characteristics in a manner in which such data can be used to inform future operation by the agent and/or a component thereof (e.g., such as when performing tasks and/or interactions with users). In some embodiments, learning component 200-24H includes functionality performed by an operating system of a device implementing agent system 200- 20. In some embodiments, learning component 200-24H includes functionality performed by one or more applications of a device implementing agent system 200-20.

[0083] In some embodiments, models component 200-241 performs operations that enable an agent to apply ML models (e.g., such as a large language model (LLM)) to process data. For example, operations can include storing ML models, executing ML models, training and/or re-training ML models, and/or otherwise managing aspects of implementing ML models. In some embodiments, models component 200-241 includes functionality performed by an operating system of a device implementing agent system 200-20. In some embodiments, models component 200-241 includes functionality performed by one or more applications of a device implementing agent system 200-20.

[0084] In some embodiments, agent system 200-20 responds to natural language input. For example, agent system 200-20 responds to a natural language input that is in the form of a statement, a question, a command, and/or a request. In some embodiments, agent system 200-20 outputs text and/or speech output that is provided in a natural language or mimicking a natural language style. For example, agent system 200-20 can process the natural language question “How hot is it outside?” with a speech response that indicates the current temperature outside at the user’s location (e.g., “It is 18 degrees outside.”). In some embodiments, agent system 200-20 responds to natural language input by providing information (e.g., weather, travel, and/or calendar information) and/or performing a task (e.g., opening a document, searching a database, and/or opening an application).

[0085] In some embodiments, agent system 200-20 includes and/or relies on one or more data models to process input (e.g., natural language input, gesture input, visual input, and/or other data input) and/or provide output (e.g., output of information via natural language output, visual output, audio output, and/or textual output). Such data models can include and/or be trained using user data (e.g., based on particular interactions and/or data from the user being interacted with) and/or global data (e.g., general data based on interactions and/or data from many users). For example, user data (e.g., preferences, previous use of language and/or phrases, calendar entries, a contact list, and/or activity data) can be used to better infer user intent and/or provide responses that are more likely to address a user’s request. In some embodiments, data models used by agent system 200-20 include, are used by, and/or are implemented using one or more machine learning components (e.g., hardware and/or software) (e.g., one or more neural networks). Such machine learning components can be used to process verbal input to determine words and/or phrases therein, one or more contexts that correspond to the words, a user intent corresponding to the words, one or more confidence scores, and/or a set of one or more actions to take in response to the verbal input. Analogous operations can be performed to process other types of inputs, such as visual input, data input, and/or textual input. Such data models can include machine learning and/or data processing models, including, but not limited to, natural language processing models, language models, speech recognition models, object recognition models, visual processing models, ontologies, task flow models, and/or intent recognition models (e.g., used to determine user intent).

[0086] In some embodiments, Application Programming Interfaces (APIs) component 200-24J performs operations that enable an agent to interface with services, devices, and/or components. For example, operations can include relaying data (e.g., requests, responses, and/or other messages) between data interfaces (e.g., between software programs, between a system process and application process, between system processes, between application processes, between communication protocols, between a client and a server, between file systems, and/or between components on different sides of a trust boundary). In some embodiments, the data interfaces served by APIs component 200-24J are local (e.g., to the device, such as two application processes exchanging data) and/or remote (e.g., from the device, such as interfacing with a web service via a remote server). In some embodiments, APIs component 200-24J includes functionality performed by an operating system of a device implementing agent system 200-20. In some embodiments, APIs component 200-24J includes functionality performed by one or more applications of a device implementing agent system 200-20. [0087] In some embodiments, output components 200-26 includes components for performing output functions of agent system 200-20. The exemplary output components illustrated in FIG. 5 are described briefly below. In some embodiments, output components 200-26 include fewer components, more, and/or different components than those illustrated in FIG. 5. In some embodiments, input components is implemented in hardware and/or software.

[0088] As illustrated in FIG. 5, output components 200-26 includes one or more visual output components 200-26A. One or more visual output components 200-26A can include any component that functions to output (e.g., generate, create, and/or display), and/or cause output of, a visual output (e.g., an output that is visually perceptible, such as graphical user interface, playback of visual media content, and/or lighting). Examples of one or more visual output components 200-26A can include: a display component, a projector, a head mounted display (HMD), a light-emitting diode (“LED”), and/or a component that creates visually perceptible effects (e.g., movement). This list is not intended to be exhaustive, and one or more visual output components 200-26A can include other visual output components not explicitly identified herein that detect, generate, and/or otherwise provide data that can be used (e.g., processed, stored, and/or transformed) for outputting visual output.

[0089] As illustrated in FIG. 5, output components 200-26 include one or more audio output components 200-26B. One or more audio output components 200-26B can include any component that functions to output (e.g., generate and/or create), and/or cause output of, an audio output (e.g., an output that is audibly perceptible, such as a sound, music, speech, and/or audio media content). Examples of one or more audio output components 200-26B can include: a speaker, an audio amplifier, a tone generator, and/or a component that creates audibly perceptible effects (e.g., movement such as vibrations). This list is not intended to be exhaustive, and one or more audio output components 200-26B can include other audio output components not explicitly identified herein that detect, generate, and/or otherwise provide data that can be used (e.g., processed, stored, and/or transformed) for outputting audio output.

[0090] As illustrated in FIG. 5, output components 200-26 include one or more movement output components 200-26C (also referred to herein as a “movement component”). One or more movement output components 200-26C can include any component that functions to output (e.g., generate and/or create), and/or cause output of, a movement output (e.g., an output that includes physical movement of the device and/or another device/component). Examples of one or more movement output components 200- 26C can include: a movement controller, an actuator, a mechanical linkage, an electromechanical device, and/or a component that creates physical movement. This list is not intended to be exhaustive, and one or more movement output components 200-26C can include other movement output components not explicitly identified herein that detect, generate, and/or otherwise provide data that can be used (e.g., processed, stored, and/or transformed) for outputting movement output. As illustrated in FIG. 5, output components 200-26 include one or more haptic output components 200-26D. One or more haptic output components 200-26D can include any component that functions to output (e.g., generate, create, and/or display), and/or cause output of, a haptic output (e.g., an output that is physically perceptible using tactile sensation, such as a vibration, pressure, texture, and/or shape). Examples of one or more haptic output components 200-26D can include: a speaker, a component that generates vibrations, a component that generates texture changes, a component that generates pressure changes, and/or a component that creates perceivable tactile effects. This list is not intended to be exhaustive, and one or more haptic output components 200-26D can include other haptic output components not explicitly identified herein that detect, generate, and/or otherwise provide data that can be used (e.g., processed, stored, and/or transformed) for outputting haptic output.

[0091] As illustrated in FIG. 5, output components 200-26 include one or more communications components 200-26E. One or more communications components 200-26E can include any component that functions to send and/or receive communications (e.g., an antenna, a modem, a network interface component, an encoder, a decoder, and/or a communication protocol stack) internal and/or external to agent system 200-20. In some embodiments, the communications can be between different devices and/or between components of the same device. In some embodiments, the communications can include control signals and/or data (e.g., messages, instructions, files, application data, and/or media streams). In some embodiments, one or more communications components 200-26E includes one or more features of one or more communications components 200-22B (e.g., as described above). In some embodiments, one or more communications components 200-26E are the same as one or more communications components 200-22B (e.g., one or more components that handle communication inputs and outputs and thus be considered as either and/or both an input component and an output component). [0092] Throughout this disclosure, reference can be made to movement output (e.g., referred to in various forms such as: movement, device movement, output of movement, device motion, output of motion, and/or motion output). In some embodiments, outputting (e.g., causing output of) movement refers to movement of an electronic device (e.g., a portion or component thereof relative to another portion and/or of the whole electronic device). For example, referring back to FIG. 2B, movement output can refer to device 200 actuating movement component 200-3 to move display portion 200-1 to the position illustrated in FIG. 2B (e.g., from the position in FIG. 2A). In some embodiments, movement output is not (e.g., does not include and/or does not only include) haptic output (e.g., haptic movement output). In some embodiments, movement output is not (e.g., does not include and/or does not only include) vibration output. In some embodiments, movement output is not (e.g., does not include and/or does not only include) oscillating movement (e.g., movement of an actuator that merely causes vibration by moving a component repeatedly along a path that is internal to the device). In some embodiments, movement output includes (e.g., requires and/or results in) changing a location and/or pose of at least a portion of (and/or the entirety of) a component or the electronic device. In some embodiments, movement output includes output that moves at least a portion of (and/or the entirety of) a component or the electronic device from a first location and/or first pose to a second location and/or second pose. For example, with respect to FIGS. 2A-2C, display portion 200-1 is shown in a different location (e.g., in space) and pose (e.g., relative to base portion 200-2) in each of FIGS. 2A, 2B, and 2C. In some embodiments, movement output includes output that moves at least a portion (and/or the entirety of) a component or the electronic device to a third location and/or third pose (e.g., from the first location and/or first pose and/or from the second location and/or the second pose). In some embodiments, the third location and/or the third pose is the same as the first location and/or first pose and/or as the second location and/or the second pose. For example, movement output can include device 200 in FIG. 2A beginning from the first position illustrated in FIG. 2A, moving to the second position illustrated in FIG. 2B, and moving to return to the first position illustrated in FIG. 2 A. For example, movement output can include device 200 in FIG. 2A beginning from the first position illustrated in FIG. 2A, moving to the second position illustrated in FIG. 2B, and continuing movement to come to rest at the third position illustrated in FIG. 2C.

[0093] Throughout this disclosure, an electronic device can be illustrated in (and/or described as being in) different locations and/or poses at different times. For example, in FIG. 2A illustrates device 200 in the first position, FIG. 2B illustrates device 200 in the second position, and FIG. 2A illustrates device 200 in the third position. In some embodiments, the electronic device moves itself between such locations and/or poses (e.g., using movement output). For example, device 200 moves from the first position to the second position under its own power (e.g., using a power source and one or more actuators to cause movement). In particular, any example herein that illustrates and/or describes an electronic device being at different locations and/or poses (e.g., at different times) should be understood to cover a scenario in which the device moved itself between such locations and/or poses (e.g., unless otherwise clearly indicated).

[0094] Throughout this disclosure, reference can be made to “performing output,” “causing output,” and/or “outputting” (e.g., by one or more output generation devices and/or by one or more output generation components) (and/or similar such phrases). In some embodiments, outputting (e.g., or the aforementioned variants) includes (and/or is) outputting movement (e.g., movement output as described above).

[0095] Throughout this disclosure, reference can be made to “displaying,” “causing display of,” and/or “outputting visual content” (e.g., by one or more display components) (and/or similar such phrases). In some embodiments, displaying (e.g., or the aforementioned variants) includes displaying visual content in connection with outputting movement (e.g., movement output as described above).

[0096] Throughout this disclosure, reference can be made to “outputting audio,” “causing output of audio,” and/or “providing audio output” (e.g., by one or more audio generation components and/or by one or more audio output devices) (and/or similar such phrases). In some embodiments, outputting audio (e.g., or the aforementioned variants) includes outputting audio content in connection with outputting movement (e.g., movement output as described above).

[0097] Throughout this disclosure, reference can be made to movement of an avatar (e.g., or other representation of a user, an agent and/or a character that is displayed) (e.g., by one or more display components) (and/or similar such phrases). In some embodiments, moving an avatar (e.g., or the aforementioned variants) includes displaying movement of visual content in connection with outputting movement (e.g., movement output as described above). For example, displaying an avatar nodding in agreement can include movement of the electronic device in a similar manner as the avatar movement (e.g., mimicking nodding). In some embodiments, moving an avatar (e.g., or the aforementioned variants) includes outputting movement (e.g., movement output as described above) without displaying movement of visual content. For example, a device can perform movement output that mimics nodding without moving a displayed avatar (e.g., the avatar does not move relative to the display). As illustrated in FIG. 5, agent system 200-20 can optionally interface with external components such as external database 200-30, remote processing component 200-32, and/or remote administration component 200-34. In some embodiments, external database 200-30 represents one or more functions that provide data storage resources accessible to agent system 200-20. In some embodiments, access to the data of external database 200-30 is provided directly to agent system 200-20 (e.g., the agent system manages the database) and/or indirectly to agent system 200-20 (e.g., a database is managed by a different system, but data stored therein can be provided and/or stored for use by agent system 200-20). In some embodiments, external database 200-30 is dedicated to (e.g., only for use by) agent system 200-20, is not dedicated to agent system 200-20 (e.g., is a database of a web service accessible to different agent systems), and/or is a combination of both dedicated and nondedicated database resources. In some embodiments, remote processing component 200-32 represents one or more components that function as a data processing resource that is accessible to agent system 200-20. In some embodiments, access to remote processing component 200-32 is provided directly to agent system 200-20 (e.g., the agent system manages the processing resources) and/or indirectly to agent system 200-20 (e.g., a processing resource managed by a different system, but that can provide data processing for the benefit of agent system 200-20). In some embodiments, remote processing component 200-32 is dedicated to (e.g., only for use by) agent system 200-20, is not dedicated to agent system 200-20 (e.g., is a processing resource of a web service accessible to different agent systems), and/or is a combination of both dedicated and non-dedicated processing resources. Examples of data processing include processing image data (e.g., for feature extraction and/or object detection), processing audio data (e.g., for processing natural language speech input via a large language model), and/or training a machine learning algorithm and/or model. In some embodiments, remote administration component 200-34 represents functions that include and/or are related to administrative functions. For example, such administrative functions can include providing component updates to agent system 200-30 (e.g., software and/or firmware updates), managing accounts (e.g., permissions, access control, and/or preferences associated therewith), synchronizing between different agent systems and/or components thereof (e.g., such that an agent accessible via multiple devices of a user can provide a consistent user experience between such devices), managing cooperation with other services and/or agent systems, error reporting, managing backup resources to maintain agent system reliability and/or agent availability, and/or other functions required by agent system 200-20 to perform operations, such as those described herein.

[0098] The various components of agent system 200-20 described above with respect to FIG. 5 represent functional blocks that represent functionality. This functionality can be implemented on the same and/or different hardware (e.g., physical components) and/or by the same and/or different software. For example, the functional blocks can be implemented using one or more physical components, devices (e.g., computer system 100 and/or device 200), and/or software programs. In other words, each functional block does not necessarily represent a single, discrete physical component, device, and/or software program, but can be implemented using one or more of these. Further, agent system 200-20 can include multiple implementations of functionality represented by a respective functional block. For example, agent system 200-20 can include multiple different model components representing ML models that are used in different contexts, can include multiple different API components representing different APIs that are used for different services, and/or can include multiple different visual output components that are used for outputting different types of visual output.

[0099] Attention is now turned to discussion of concepts that can arise with respect to operation of an agent.

[0100] As discussed throughout, an agent can be capable of interacting with a user. In some embodiments, this capability includes the ability to process explicit requests, commands, and/or statements. In some embodiments, explicit requests, commands, and/or statements include and/or are interpreted as instructions directed to accomplishing a task (e.g., display X, complete task Y, and/or perform operation Z). In some embodiments, an agent includes the ability to process implicit requests, commands, and/or statements. In some embodiments, an implicit request, command, and/or statement does not include an explicit request, command, and/or statement. For example, “I like going to Europe,” can be interpreted as an implicit request, command, and/or statement which, in response to detecting, device 200 displays an itinerary in response to the statement. As another example, “This picture is for my grandmother,” can be interpreted as an implicit request, command, and/or statement which, in response to detecting, device 200 displays suggestions for modifying the picture). As another example, “I’m so tired,” can be interpreted as an implicit request, command, and/or statement which, in response to detecting, device 200 causes a sleep meditation application to begin a meditation session. As yet another example, “I miss my grandad” can be interpreted as an implicit request, command, and/or statement when, in response to detecting, device 200 can initiate a live communication session (e.g., telephone call, video call, and/or text messaging session) with grandad. In some embodiments, an implicit request is more likely to be processed according to one or more current environmental context, operational context, and/or user context, while an explicit request is less likely to be processed according to one or more current environmental context, operational context, and/or user context. For example, the phrase, “call my grandad,” can be an explicit request, and in response to detecting the request, device 200 will initiate a live communication session with grandad, irrespective of one or more current environmental context, operational context, and/or user context. However, the phrase, “I miss my grandad,” can be an implicit request, and in response to detecting the request, device 200 can display a list of gifts to buy for grandad if a user has been recently talking about buying gifts or could call grandad in another context that does not include the user recently discussing buying gifts. In some embodiments, a request can include one or more explicit requests and one or more implicit requests. In some embodiments, an implicit request is responded to independently from an explicit request; and in other embodiments, a response to an implicit request is dependent on an explicit request.

[0101] Reference can be made herein to a response by an agent that is output by a device. In some embodiments, a response includes an audio portion (e.g., audio output, acoustic output, sound, and/or speech) (also referred to herein as a “verbal response,” an “audio response,” and/or an “acoustic response) and/or a visual portion (e.g., display and/or movement of a representation and/or avatar). In some embodiments, a response includes a movement portion (e.g., movement of the device). In some embodiments, a response includes a haptic portion (e.g., touch and/or vibration).

Reference can be made herein to an internal dialogue, internal context, and/or an operational context, which can refer to a dynamic context or dynamic decision-making process of the device, an internal state of device 200, and/or internal data the device is partially basing its decision on. In some embodiments, an internal dialogue includes a set of one or more rules, characteristics, detections, and/or observations that the computer system uses to generate a response to one or more commands, questions, and/or statements). In some embodiments, the set of one or more rules, characteristics, detections, and/or observations are learned and/or generated via deep learning and/or one or more machine learning algorithms, and/or using one or more machine learning and/or system agents. In some embodiments, an internal dialogue is generated in real-time. In some embodiments, an internal dialogue is locally stored and/or stored via the cloud. In some embodiments, an internal dialogue can be modified, updated, and/or deleted. In some embodiments, an internal dialogue is generated based on other internal dialogues.

[0102] Reference can be made herein to personality and/or behavior (or a representation of personality /behavior) (e.g., of an agent, user, and/or character). In some embodiments, personality and/or behavior refers to a set of one or more characteristics that the device detects, has knowledge of, conforms to, applies, and/or tracks. In some embodiments, the personality or behavior is used as basis to perform operations. For example, an agent can detect a user’s personality and respond in a manner based on the personality (e.g., output different responses in response to different user personalities). As another example, the agent can output a response having characteristics that correspond to one or more characteristics that correspond to the personality and/or behavior (e.g., output a response in different ways that depend on personality of the agent). In some embodiments, such characteristics represent and/or mimic personality of a user, such as how the user acts and/or speaks. In some embodiments, such characteristics approximate a user’s personality.

[0103] In some embodiments, an agent is a system agent. In some embodiments, a system agent is an agent that corresponds to a process that originates from and/or is controlled by an operating system of the device (e.g., the device implementing the agent). In some embodiments, an agent is an application agent. In some embodiments, an application agent is an agent that corresponds to a process that originates from and/or is controlled by an application of (e.g., installed on and/or executed by) the device (e.g., the device implementing the agent).

[0104] Reference can be made herein to a representation (e.g., an avatar and/or avatar representation) of an agent (e.g., and/or of a user (e.g., person, object, and/or an animal) and/or a user interface object (e.g., an animated character)). In some embodiments, a representation of an agent refers to a set of output characteristics (e.g., visual and/or audio) of the agent (and/or the user and/or the user interface object). For example, a representation of an agent can include (and/or correspond to) a set of one or more visual characteristics (e.g., facial features of an animated face) and/or one or more audio characteristics (e.g., language and voice characteristics of audio output). In some embodiments, a representation (e.g., of an agent) is used to represent output by the agent. For example, a device implementing an interactive agent outputs audio in a voice of the agent and displays an animated face of the agent moving in a manner to simulate the agent speaking the audio output. In this way, a user can feel like they are having a normal conversation with the agent. In some embodiments, a representation of an agent is (or is not) inclusive of personality and/or behavior characteristics (e.g., as described above). For example, a representation of an agent can include (and/or correspond to) a set of visual characteristics (e.g., facial features of an animated face) and also a set of personality characteristics. In some embodiments, a representation of an agent includes a set of user characteristics that correspond to visual representation of a user (e.g., representations of a user’s appearance, voice, and/or personality are used as an avatar that appears to move and/or speak). In some embodiments, a representation is a representation of a face (e.g., a user interface object that is output having features that simulate a face and/or facial expressions of a person (e.g., for conveying information to a viewer)).

[0105] In some embodiments, a character (e.g., of an agent and/or avatar) refers to a particular set of characteristics of a representation. For example, an avatar can take on (e.g., use, apply, interact with, and/or output according to) characteristics of a fictional and/or non- fictional character (e.g., from a movie, a show, a book, a series, and/or popular culture).

[0106] In some embodiments, a voice (e.g., of an agent and/or avatar) refers to a set of one or more characteristics corresponding to sound output that resembles (e.g., represents, mimics, and/or recreates) vocal utterance (e.g., attributable and/or simulated as being output by an agent and/or avatar). For example, device 200 can output a sentence that sounds different depending on a voice used. In some embodiments, a particular character and/or avatar can be configured to use a particular voice (e.g., have a corresponding voice). In some embodiments, the particular voice can mimic a user’s voice.

[0107] In some embodiments, an appearance (e.g., of an agent and/or avatar) refers to a set of one or more characteristics corresponding to visual output that represents an avatar (and/or an agent). For example, device 200 can output an avatar that has a set of facial features forming an appearance that resembles a particular character from a movie.

[0108] In some embodiments, an expression of an avatar refers to a set of one or more characteristics corresponding to a particular visual appearance of a user, an avatar, and/or an agent. For example, device 200 can output an avatar that has a set of facial features arranged in a particular way to give the appearance of a facial expression (e.g., which can be used as a form of non-verbal communication to a user) (e.g., a frown is an expression of sadness, a smile is an expression of happiness, and/or wide open eyes is an expression of surprise). As another example, device 200 can output an avatar that has a set of body features (e.g., arms and/or legs) arranged in a particular way to give the appearance of a body expression (e.g., which can be used as a form of non-verbal communication to a user) (e.g., a hand gesture is an expression of approval, covering eyes is an expression of fear, and/or shrugging shoulders is an expression of lack of knowledge). In some embodiments, an expression includes movement (e.g., a head nod is an expression of agreement and/or disagreement) of the avatar. In some embodiments, device 200 can move, via the movement component, to indicate an expression with or without the avatar moving. In some embodiments, an agent performs one or more operations that depend on a user’s expression (e.g., detects if a person is sad and responds with a kind statement or question). In some embodiments, expressions (e.g., whether and/or how they are used and/or how they are output) depends on personality. For example, a first personality can use a particular expression more than a second personality. As another example, an expression (e.g., frown, smile, and/or how wide eyes are opened) for the first personality can appear different from the expression (and/or a similar and/or equivalent expression) for a second personality (e.g., the first personality smiles in a manner that reveals teeth, but the second personality smiles without revealing teeth).

[0109] In some embodiments, an agent (e.g., an avatar of the agent and/or an agent system (e.g., hardware and/or software) implementing the agent) mimics characteristics of another user, agent, and/or character (e.g., in personality, behavior, expressions, and/or voice). In some embodiments, mimicking includes mirroring a user (e.g., copying use of a phrase and/or movement detected from a user interacting with the agent). In some embodiments, mimicking characteristics of a user includes attempting to reproduce the characteristics of the user (e.g., in the exact same manner and/or in manner that resembles the characteristics but is not an exact reproduction of the characteristics). For example, an agent mimicking voice and/or expressions does not require the agent have the exact same voice and/or expressions as the user being mimicked (e.g., but rather simply resembles the user’s voice and/or expressions).

[0110] In some embodiments, a component and/or device uses (e.g., performs operations, makes decisions, and/or determines context based on) learned characteristics (e.g., characteristics of a context, user, and/or environment that the device has learned over time (e.g., via detection, prior experience, and/or feedback (e.g., from one or more users)). For example, characteristics learned over time can include a user’s routine. In such example, if a particular user asks an agent for a summary of any new messages for the user at the same time every day, the agent can learn to perform operations automatically based on the learned characteristics of the routine (e.g., what data is needed, when the data is needed, and/or for which user). In some embodiments, use of learned characteristics enables an agent (and/or device) to improve understanding of (and/or responses to) a context, user, and/or environment, and/or to understand a context, user, and/or environment that otherwise was not (and/or would not be) understood (e.g., not responded to or responded to incorrectly). In some embodiments, learned characteristics are formed (e.g., by and/or for an agent) using reinforcement learning. In some embodiments, learned characteristics correspond to one or more levels of confidence, certainty, and/or reward (e.g., that are shaped by one or more reward functions). In some embodiments, learned characteristics (and/or how they are used to affect output of an agent and/or device) can change over time (e.g., levels confidence, certainty, and/or reward change over time). For example, output of a device before learning a set of learned characteristics can be different from output of the device after learning the set of learned characteristics. In some embodiments, a component and/or device uses learned knowledge. For example, similar to described above with respect to learned characteristics, learned knowledge can refer to information used to update (e.g., enhance, add to, and/or augment) a knowledge base of a device (e.g., for use by an agent implemented thereon). In some embodiments, multiple sets of learned characteristics for a user can be stored and/or used. In some embodiments, different sets of learned characteristics for different users can be stored and/or used.

[OHl] Reference can be made herein to interaction with an agent (and/or a device). In some embodiments, an interaction refers to a set of one or more inputs and/or outputs of a device implementing the agent and one or more users. For example, an interaction can be an input by a user (e.g., “Please turn on the lights”) and a corresponding output (e.g., causing the lights to turn on and/or a response by the device of “Okay”). In some embodiments, interaction can include multiple inputs/outputs by one or more of the parties to the interaction (e.g., device and/or users). For example, an interaction can include a first input by a user (e.g., “Please turn on the lights”) and a corresponding first output (e.g., “Which lights?”), and also include a second input by the user (e.g., “Kitchen lights”) and a second output from the device (e.g., “Okay”). In some embodiments, which inputs and/or outputs are considered together as an interaction is based on a logical and/or contextual grouping (e.g., interactions within the previous thirty (30) seconds and/or interactions relating to turning on the lights). As one of skill will appreciate, an interaction can be considered in a manner that depends on the implementation (e.g., determining when an interaction is complete can involve determining if the user still present (e.g., speaking at all) and/or if the user still talking about the lights or has moved onto a different topic). In some embodiments, an interaction is a current interaction (e.g., ongoing, presently occurring, and/or active). In some embodiments, an interaction is a previous interaction. The examples above describe a device having a conversation with a user. In some embodiments, a conversation is between two or more users (e.g., users in an environment). For example, a device can detect a conversation between to users (e.g., the users are directing speech and responses to each other, rather than to the device).

[0112] In some embodiments, an agent (and/or device) determines and/or performs an operation based on an intent corresponding to a user. For example, a device detects user input and outputs a response that depends on an intent of the user input. For example, a device detects user input that includes a pointing gesture detected together with verbal instruction to “turn on that light,” and in response, the device turns on the light that is determined to correspond to the intent of the input (e.g., the light toward which the pointing gesture directed). In some embodiments, intent is determined (e.g., by the device that detects input and/or by one or more other devices) using one or more of: one or more inputs, knowledge (e.g., learned knowledge about a user based on a history of observed behavior, personality, and interactions), learned characteristics, and/or context. In some embodiments, intent is determined from one or more types of input (e.g., verbal input, visual input via a camera, and/or contextual input). [0113] Attention is now directed towards embodiments of user interfaces (“LT’) and associated processes that are implemented on an electronic device, such as computer system 100 and/or device 200.

[0114] FIGS. 6A-6M illustrate exemplary user interfaces for capturing content based on detected input in accordance with some embodiments. The user interfaces in these figures are used to illustrate the processes described below, including the processes in FIGS. 7-9.

[0115] The left side of FIGS. 6A-6M illustrates computer system 600 as a tablet displaying different user interface objects. It should be recognized that computer system 600 can be other types of computer systems such as a smart phone, a smart watch, a laptop, a communal device, a smart speaker, an accessory, a personal gaming system, a desktop computer, a fitness tracking device, and/or a head-mounted display (HMD) device. In some embodiments, computer system 600 includes and/or is in communication with one or more input devices and/or sensors (e.g., a camera, a LiDAR sensor, a motion sensor, an infrared sensor, a touch-sensitive surface, a physical input mechanism (such as a button or a slider), and/or a microphone). Such sensors can be used to detect presence of, attention of, statements from, inputs corresponding to, requests from, and/or instructions from a user in an environment. It should be recognized that, while some embodiments described herein refer to inputs being voice inputs, other types of inputs can be used with techniques described herein, such as touch inputs that are detected via a touch-sensitive surface and/or air-gestures detected via a camera (e.g., a camera that is in communication (e.g., wireless and/or wired communication) with computer system 600. In some embodiments, computer system 600 includes and/or is in communication with one or more output devices (e.g., a display screen, a projector, a touch-sensitive display, speaker, and/or a movement component). Such output devices are used to present information and/or cause different visual changes of computer system 600. In some embodiments, computer system 600 includes and/or is in communication with one or more movement components (e.g., an actuator, a moveable base, a rotatable component, and/or a rotatable base). Such movement components are used to change a position (e.g., location and/or orientation) of the entirety of computer system 600 or a portion (e.g., including one or more sensors, input components, and/or output components) of computer system 600. In some embodiments, computer system 600 includes one or more components and/or features described above in relation to computer system 100 and/or device 200. In some embodiments, computer system 600 includes one or more agents and/or functions of an agent as described above with respect to FIG. 5. In some embodiments, computer system 600 is, includes, implements, and/or is in communication with one or more agent systems, as described above with respect to FIG. 5, for performing (and/or causing performance of) one or more operations of an agent.

[0116] The right side of FIGS. 6A-6M includes diagram 618. Diagram 618 is a visual aid representing a physical space and/or environment that includes computer system 600 and a first user. Diagram 618 includes computer system representation 620 that is representative of computer system 600 and user representation 622 that is representative of the first user. The positioning of computer system representation 620 and user representation 622 within diagram 618 is representative of the real -world positioning of computer system 600 with respect to the first user. Diagram 618 includes representation of field-of-detection 654 which represents a field-of-detection and/or a field-of-view (sometimes referred to as the field-of- detection of camera 602) of one or more camera sensors of the computer system 600 (e.g., camera 602). The field-of-detection corresponds to the field-of-detection for one or more front facing sensors of computer system 600. In some embodiments, one or more other sensors of computer system 600 has a different field-of-detection than the field-of-detection represented by representation of field-of-detection 654 (e.g., overlapping but smaller or bigger and/or not overlapping) illustrated in FIG. 6A. In this embodiment, there is one user within the environment. In some embodiments, there is more than one user within the environment. Diagram 618 also includes representation of extent of field-of detection.

Representation of extent of field-of detection 652 indicates the extent of the field-of-detection of camera 602 (e.g., one or more sensors of the computer system 600) (e.g., sometimes referred to as a global field-of-detection) (e.g., via a different setting, such as a wide angle setting, and/or a different set of one or more sensors than being used with respect to representation of field-of-detection 654) and/or the extent of the range of motion of representation of field-of-detection 654 (e.g., via a moveable sensor and/or the one or more movement components described above).

[0117] FIGS. 6A-6M illustrate a process where a user requests computer system 600 to capture media in the form of photos and/or video. In some embodiments, the request includes instructions for how computer system 600 should capture a still photograph and/or a video. In such embodiments, computer system 600 guides the user differently depending on (1) the specificity of the request and/or (2) previous interactions with the user. In some embodiments, the request also includes instructions for computer system to mimic a style, such as of a media artist and/or other predefined style, while capturing the media. When capturing media based on the style, computer system 600 uses different settings, such as camera and/or movement, to capture media according to the requested style. In some embodiments, computer system 600 moves one or more camera sensors to change the framing or alignment of users, features, and/or objects within the field-of-detection of the one or more cameras sensors and/or computer system 600 to recreate the requested style. In some embodiments, computer system 600 outputs composition instructions for the user to help recreate the requested style. For example, in some embodiments, to recreate a requested style, computer system 600 instructs a user to move to the right, move to the left, and/or move an object that is within the field-of-detection of the one or more camera sensors.

[0118] As illustrated in FIG. 6A, computer system 600 includes camera 602 and display 604, both of which are front facing (e.g., facing the first user). As illustrated in FIG. 6A, display 604 occupies the majority of the front facing side of computer system 600. As illustrated in FIG. 6A, camera 602 is located above display 604. In some embodiments, camera 602 is located in a different position relative to display 604. For example, in some embodiments, camera 602 is located near one of the side edges of display 604. As illustrated in FIG. 6A, camera 602 is located outside of display 604. In some embodiments, camera 602 is located within or behind display 604. In some embodiments, camera 602 comprises one or more camera sensors. In some embodiments, camera 602 comprises multiple camera sensors of different designs and capabilities (e.g., wide angle, fisheye, macro, telephoto, and/or normal lenses).

[0119] As illustrated in FIG. 6A, computer system 600 displays camera user interface 606 via display 604. Camera user interface 606 includes live-view 608, which includes representations of objects and/or individuals that are within the field-of-detection of the one or more camera sensors (e.g., camera 602) that are connected to and/or in communication with computer system 600. The display of live-view 608 by computer system 600 within camera user interface 606 allows users of computer system 600 to view a preview of the content that will be included in resulting media items that are captured by computer system 600. In some embodiments, live-view 608 includes representations of the field-of-detection of multiple different camera sensors. In some embodiments, live-view 608 occupies the majority of camera user interface 606 and optionally expands to the edges of camera user interface 606. In some embodiments, live-view 608 occupies the entirety of camera user interface 606.

[0120] As illustrated in FIG. 6 A, camera user interface 606 includes frame 610, camera control 612, and content indicator 614. As illustrated in FIG. 6A, computer system 600 displays camera control 612 and content indicator 614 below frame 610. Frame 610 indicates the content that will be captured by computer system 600 while computer system 600 performs a media capture process (e.g., content within frame 610 will be visible in resulting media captured by computer system 600 and content outside of frame 610 will not be visible in resulting media captured by computer system 600). In some embodiments, computer system 600 displays live-view 608 beyond frame 610 to indicate content that is within the field-of-detection of the one or more camera sensors but will not be visible in resulting media that is captured by computer system 600. In some embodiments, computer system 600 moves frame 610 within camera user interface 606 to indicate different frame ratios as different camera modes are selected (e.g., by a user and/or by computer system 600). In some embodiments, in response to detecting a touch input on a location corresponding to camera control 612, computer system 600 initiates a media capture operation that captures content positioned within frame 610.

[0121] As illustrated in FIG. 6A, computer system 600 displays a representation of the most recently captured media content (e.g., an image of a flower) (e.g., first content 616) within content indicator 614. In some embodiments, in response to detecting a touch input on a location corresponding to content indicator 614, computer system 600 displays an enlarged version of the most recently captured media content.

[0122] As illustrated by diagram 618 at FIG. 6A, user representation 622 is within representation of field-of-detection 654, indicating that the first user is within the field-of- detection of the one or more camera sensors (e.g., field-of-detection of camera 602). Accordingly, at FIG. 6A, the first user is within the field-of-detection of the one or more camera sensors. As illustrated at FIG. 6A, because the first user is within the field-of- detection of the one or more camera sensors, computer system 600 displays user image 624 within live-view 608. User image 624 is a representation of the first user as detected by the one or more camera sensors (e.g., camera 602). At FIG. 6 A, computer system 600 detects first verbal input 605a from the first user corresponding to the first user requesting that computer system 600 capture a first image. It should be recognized that the request to capture the first image can be other types of inputs other than a verbal input, such as a tap input detected via display 604 and/or an air gesture (e.g., air point, air swipe, de-pinch gesture, and/or pinch gesture) detected via camera 602.

[0123] As illustrated in FIG. 6B, in response to detecting first verbal input 605a, computer system 600 outputs first audio content 626 corresponding to computer system 600 asking the first user in what style would the first user like the first image to be captured (e.g., how would the first user like to be framed, at what zoom level should computer system 600 capture the still photo at, and/or should computer system 600 track the first user). The content included in first audio content 626 is based on the content that is detected in first verbal input 605a. Given the brevity of first verbal input 605a, computer system 600 outputs first audio content 626 with content to extract additional information from the first user with respect to the type of photo the first user would like for computer system 600 to capture. Examples of additional information computer system 600 can query from the first user include the type of photo, the style of photo, the orientation of the photo, and/or a zoom level of the photo. As explained in greater detail below, when the first user provides computer system 600 with more information when prompting computer system 600 to capture a photo, computer system 600 either does not attempt to extract additional information or attempts to extract less information from the first user.

[0124] At FIG. 6B, at a time after outputting first audio content 626, computer system 600 detects composition directions 605b from the first user corresponding to the first user giving computer system 600 instructions for the style in which the first user would like the first image to be captured. Composition directions 605b indicates that the first user would like for computer system 600 to capture the first user in a style that corresponds to an individual named Kyle. Kyle is an individual and/or artist that does not track their subjects before and/or while capturing media. In response to detecting composition directions 605b, computer system 600 configures itself not to track the first user as the first user moves within the environment. In some embodiments, Kyle is a known artist with recognizable artistic style that has not previously interacted with computer system 600. In some embodiments, Kyle is a second user known to computer system 600 who’s image capturing habits are known to computer system 600.

[0125] In some embodiments, composition directions are based on an artistic style (e.g., style of a specific artist, art movement, artistic period, artistic technique, and/or artistic genre) specifically requested by the user. For example, in some embodiments, the first user specifies that the first user wants media captured to look like a suspense novel cover. In some embodiments, composition directions are based on specific conditions indicated by the user. For example, in some embodiments, the first user specifies that the first user wants to capture a portrait shot of themselves, captured from above, in black and white, with a soft filter. For another example, in some embodiments, the first user specifies that the first user wants a picture of their hand to show off their new championship ring, instructing computer system 600 to be sure the words on the ring are legible.

[0126] In some embodiments, composition directions include an instruction for computer system 600 to begin and/or stop a media capturing process once the user is detected in a certain pose. For example, in some embodiments, composition directions include instructions such as, “start recording video once I sit down,” causing computer system 600 to wait until computer system 600 detects the first user sitting down before capturing media content. For another example, in some embodiments, composition directions include instructions such as “stop taking pictures when I stand up,” causing computer system 600 to stop capturing media content when computer system 600 detects the first user standing up. In some embodiments, composition directions include an instruction for computer system 600 to begin the media capturing process once computer system 600 detects that the first user has performed a gesture. In some embodiments, composition directions include an instruction for computer system 600 to begin the media capturing process once computer system 600 detects a gaze of the first user in the direction of computer system 600.

[0127] In some embodiments, computer system 600 detects composition directions that includes instructions indicating temporal boundaries for capturing media content. In some embodiments, the indicated temporal boundaries are based on camera-detectable inputs such as poses, gestures, gazes, and/or facial expressions (e.g., the first user holds up three fingers indicating that computer system 600 should initiate a media capture operation in three seconds). In some embodiments, the temporal boundaries include information on when computer system 600 should start capturing media content. For example, in some embodiments, composition directions include instructions such as “take the picture in forty- five seconds,” causing computer system 600 to wait forty-five seconds before initiating the capture of media. In some embodiments, the composition directions include information on when computer system 600 should stop capturing media content. For example, in some embodiments, composition directions include instructions such as “stop capturing video after thirty seconds,” causing computer system 600 to cease capturing media thirty seconds after initiating the capture of media. In some embodiments, the indicated temporal boundaries include information with respect to an interval at which computer system 600 should capture different media. For example, in some embodiments, composition directions include information such as “take a picture every six seconds,” causing computer system 600 to capture am image every six seconds. For another example, in some embodiments, composition directions include information such as “capture a two second video every time I hit the ball,” causing computer system 600 to capture a two second video every time computer system 600 detects the first user hit a ball. In some embodiments, the indicated temporal boundaries includes a combination of the above-mentioned media content capturing start, stop, and/or interval information. For example, in some embodiments, compositional directions include information such as “start taking pictures when I pick up the flowers, taking one picture per second for two minutes.”

[0128] In some embodiments, the manner in which media is captured is automatically selected by the computer system 600. In some embodiments, the manner in which media is captured is automatically selected by the computer system 600 based on detected scenarios. For example, if computer system 600 detects the first user is in a graduation cap and gown, computer system 600 automatically selects to capture media using portrait settings that highlight the graduate, such as blurring the background and increasing the saturation of the colors, so the gown does not look washed out. For another example, if computer system 600 detects the first user doing an overly dramatic pose in harsh lighting, computer system 600 automatically selects to capture media in black and white with a grainy filter to mimic the look of an old movie. In some embodiments, composition guidelines are automatically selected based on media capture settings, such as capturing media in black and white versus color. In some embodiments, composition guidelines are automatically selected based on system default media capture rules, such as selecting to capture media with a certain amount of color, light, and/or contrast balance.

[0129] In some embodiments, as described in more detail below, computer system 600 moves a portion of computer system 600, via one or more movement components, to track the first user. In such embodiments, moving the portion of computer system 600 and thereby moving camera 602 moves the field-of-detection of camera 602 (e.g., represented by representation of field of detection 654) (e.g., the field-of-detection of camera 602). In some embodiments, computer system 600 moves the portion of computer system 600 using multiple types of movement (e.g., simultaneously movement or serial movement) including directional (e.g., left, right, up, down, forward, and/or back), rotation (e.g., yaw, roll, and/or pitch), and/or position (e.g., reaching, shortening, folding, leaning, and/or tilting).

[0130] At FIG. 6C, as indicated by user representation 622 being positioned to the right within diagram 618, the first user moves to the right within the field-of-detection of camera 602 of the one or more camera sensors. At FIG. 6C, a determination is made that the first user moves to the right within the field-of-detection of the one or more camera sensors. At FIG. 6C, because computer system 600 has configured itself not to track the first user (e.g., in response to detecting composition directions 605b), computer system 600 does not move the portion of computer system 600 in response to the first user moving to the right (e.g., a first movement pattern). That is, because composition directions 605b requested that computer system 600 capture an image like Kyle would and because Kyle does not track the movement of its subjects when capturing media, computer system 600 does not track the first user while computer system 600 is configured to capture media based on composition directions 605b. In some embodiments, as described in more detail below, computer system 600 moves the portion of computer system 600 automatically in response to detecting the first user move.

[0131] As illustrated in FIG. 6C, because the first user is within the right portion of the field-of-detection of camera 602 and computer system 600 does not move the portion of computer system 600, computer system 600 displays user image 624 within the right side of live-view 608.

[0132] At FIG. 6D, as indicated by user representation 622 being positioned to the left within diagram 618, the first user moves to the left within the field-of-detection of the one or more camera sensors. At FIG. 6D, a determination is made that the first user moves to the left within the field-of-detection of computer system 600. At FIG. 6D, because computer system 600 has configured itself not to track the first user (e.g., in response to detecting composition directions 605b), computer system 600 does not move the portion of computer system 600 to move the field-of-detection of camera 602 of the one or more camera sensors based on the movement of the first user. As illustrated in FIG. 6D, because the first user is within the left portion of field-of detection 654 of computer system 600 and computer system 600 and does not move the portion of computer system 600, computer system 600 displays user image 624 within the left side of live-view 608.

[0133] At FIG. 6D, at a time after the first user moves to the left within the field-of- detection of camera 602, the first user ceases moving and holds a pose. Examples of the first user holding a pose includes the first user holding a facial feature, limbs, extremities, and/or torso in a position for a period of time. At a time after detecting the first user move to the left within the field-of-detection of camera 602, a determination is made that the first user is holding the pose for a predetermined period of time. In some embodiments, computer system 600 detects the first user holding a pose based on the rate of change of the position of the first user over time as detected via camera 602. In some embodiments, computer system 600 automatically (e.g., without intervening user input) captures a photo of the first user based on a determination that the first user has stopped moving and/or held the pose for the predetermined period of time.

[0134] At FIG. 6D, based on the determination that the first user is holding the pose for the predetermined period of time, computer system 600 outputs second audio content 628 corresponding to computer system 600 asking the first user if the first user is ready for the first image to be captured (e.g., are you ready?). In some embodiments, computer system 600 outputs second audio content 628 in response to detecting a predetermined and/or known pose of the first user (e.g., a pose that computer system 600 recognizes as a pose the first user does often and/or a pose computer system 600 has been trained to recognize and look for). In some embodiments, computer system 600 outputs second audio content 628 in response to detecting a gaze in a particular direction and/or gesture of the first user. At FIG. 6D, at a time after outputting second audio content 628, computer system 600 detects third verbal input 605d from the first user corresponding to a positive response (e.g., yes) to second audio content 628. It should be recognized that the positive response can be other types of inputs other than a verbal input, such as a tap input detected via display 604 and/or an air gesture (e.g., air point, air swipe, de-pinch gesture, and/or pinch gesture) detected via camera 602.

[0135] At FIG. 6E, in response to detecting third verbal input 605d, computer system 600 outputs third audio content 630 corresponding to a countdown. At FIG. 6E, in response to computer system 600 reaching the end of the countdown within third audio content 630, computer system 600 captures the first image that includes content within frame 610. After capturing the first image, computer system 600 displays a representation of the first image (e.g., second content 632) within content indicator 614 and ceases displaying the representation of the previously captured media content (e.g., first content 616). In some embodiments, in response to detecting the first user holding a pose for a predetermined threshold of time, computer system 600 outputs third audio content 630 corresponding to a countdown without outputting second audio content 628 or detecting third verbal input 605d. In some embodiments, in response to detecting third verbal input 605d after outputting first audio content 626, computer system 600 captures media content without outputting third audio content 630. In some embodiments, computer system 600 displays the countdown.

[0136] In some embodiments, while computer system 600 is outputting third audio content 630 corresponding to a countdown, computer system 600 detects a verbal and/or an air gesture from the first user corresponding to a request to capture media. For example, if the first user does not want to wait until the end of the countdown for the media to be captured, such as when the first user finds the pose hard to hold, the first user can verbally request computer system 600 to capture media. In some embodiments, in response to detecting a verbal and/or an air gesture from the first user corresponding to a request to capture media while outputting third audio content 630, computer system 600 ceases outputting third audio content 630 and captures media. In some embodiments, while computer system 600 is outputting third audio content 630 corresponding to a countdown, computer system 600 detects a verbal and/or an air gesture corresponding to a request to cancel the media capture operation. For example, the first user can change their mind about liking the composition of the media as seen in live-view 608 and perform an air gesture corresponding to a request to cancel the media capture operation. In some embodiments, in response to detecting a verbal and/or an air gesture corresponding to a request to cancel the media capture operation while outputting third audio content 630, computer system 600 ceases outputting third audio content 630 and does not capture media. In some embodiments, in response to detecting a verbal and/or an air gesture corresponding to a request to cancel the media capture operation while outputting third audio content 630, computer system 600 outputs (e.g., visually outputs and/or audible outputs) an indication that computer system 600 will not capture a media.

[0137] In some embodiments, composition directions include conditions for computer system 600 performing the media capture process. In some embodiments, if conditions included in the composition directions are met, computer system 600 automatically (e.g., without intervening user input) captures media content. In some embodiments, if conditions included within the composition directions are not met, computer system 600 does not capture media. In some embodiments, while computer system 600 is outputting third audio content 630 corresponding to a countdown, computer system 600 detects a change in the environment that causes conditions included in the composition directions for the desired media to no longer be met. Examples of changes to the environment that can cause conditions included in the composition directions to no longer be met include a change in the lighting (e.g., the sun goes behind the clouds, the blinds are opened, a flashlight is turned on, and/or the lights are powered off), a change in one or more of features and/or positions of the first user (e.g., the first user stops smiling, looks away from computer system 600, changes positions, and/or moves out of the field-of-detection of camera 602), and/or a physical change in the environment (e.g., the family pet runs through the field-of-detection of the one or more camera sensors, a sanitation truck appears in the background, and/or an object falls over). In some embodiments, in response to detecting a change in the environment that causes conditions included in the composition directions to no longer be met while outputting third audio content 630, computer system 600 ceases outputting third audio content 630 and does not capture media.

[0138] At FIG. 6F, as indicated by user representation 622 being positioned with the right portion of representation of field-of-detection 654, the first user moves to the right within the field-of-detection of the one or more camera sensors. At FIG. 6F, a determination is made that the first user moves to the right within the field-of-detection of the one or more camera sensors. At FIG. 6F, because computer system 600 has configured itself not to track the first user (e.g., in response to detecting composition directions 605b), computer system 600 does not move the portion computer system 600 in response to the first user moving to the right. As illustrated in FIG. 6F, because the first user moves to the right within the field-of- detection of the one or more camera sensors and computer system 600 does not move the portion of computer system computer system 600 displays user image 624 within the right side of live-view 608. At FIG. 6F, computer system 600 detects the first user holding a second pose for a predetermined amount of time.

[0139] At FIG. 6F, in response to detecting the first user holding the second pose for a predetermined amount of time at a time after the first image, computer system 600 captures a second image. That is, at a time after computer system 600 captures the first image, computer system 600 captures the second image in response to detecting the first user holding the second pose without detecting another verbal and/or air gesture. In some embodiments, computer system 600 detects the first user in a series of poses and in response captures media content corresponding to each detected pose. In some embodiments, computer system 600 detects the first user holding the second pose within a predetermined threshold amount of time after capturing first image, and, in response, computer system 600 captures additional media content. In some embodiments, computer system 600 does not detect the first user holding the second pose within a predetermined threshold amount of time after capturing the first image, and, in response, computer system 600 does not capture additional media content. For example, if computer system 600 detects the first user holding the second pose at a time too long after capturing the first image, computer system 600 does not capture the second image.

[0140] In some embodiments, at a time after capturing media content, computer system 600 detects a pose of the first user that is a pose of a set of known poses, and, in response, computer system 600 captures media content. For example, computer system 600 captures media if computer system 600 detects a pose that is in a series of poses that computer system 600 has been configured to detect. In some embodiments, at a time after capturing media content, computer system 600 detects a pose of the first user that is not a pose of the set of known poses, and, in response, computer system 600 does not capture additional media content.

[0141] As illustrated in FIG. 6F, in response to capturing the second image, computer system 600 displays a representation of the second image (e.g., third content 634) within content indicator 614 and ceases displaying the representation of the first image within content indicator 614. At FIG. 6F, computer system 600 detects composition directions 605f from the first user corresponding to the first user instructing computer system 600 to capture a photo in a particular style (e.g., “capture a photo like Jane would”). Jane is an individual that tracks their subjects before and/or while capturing media. Further, Jane is an individual that captures subjects once the subject is centered in the field-of-detection of the one or more camera sensors. In response to detecting composition directions 605f, computer system 600 configures itself to track the first user as the first user moves within the environment and to capture images of the subject once the subject is centered within the field-of-detection of the one or more camera sensors. In some embodiments, composition directions 605f are similar to composition directions 605b. [0142] At FIG. 6G, because computer system 600 configured itself to track the first user, computer system 600 moves the portion of computer system 600 in a second movement pattern until the first user is centered within the field-of-detection of the one or more camera sensors (e.g., a different response than computer system 600 had to composition directions 605b). The second movement pattern is different than the first movement pattern (e.g., as described above in reference to FIG. 6C). In some embodiments, the second movement pattern is the same as the first movement pattern. In some embodiments, computer system 600 automatically moves the portion of computer system 600 in response to detecting the first user move.

[0143] In some embodiments, movements patterns include two or more types of movement (e.g., directional, rotational, and/or positional movement) (e.g., lateral movement (sideways, forward, backward and/or vertical movement) and/or rotational movement (e.g., clockwise rotation and/or counter-clockwise rotation). For example, in some embodiments, a movement pattern includes rotating the portion of computer system 600 to the left while extending the portion of computer system 600 forward. For another example, in some embodiments, a movement pattern includes movement in two different lateral directions (e.g., right and left and/or up and down). In some embodiments, different movement patterns have different speeds of movement. In some embodiments, different movement patterns have the same speeds of movement. In some embodiments, movement patterns have more than one speed of movement. For example, a movement pattern may start with slow movements and end with quick movements. In some embodiments, different movement patterns have different user and/or object following parameters (e.g., head, torso, hands, and/or within the thirds). For example, computer system 600 follows the hands of a user at a certain distance based on a current movement pattern of the portion of computer system 600. In some embodiments, different movement patterns have the same user and/or object following parameters. In some embodiments, movement patterns have more than one user and/or object following parameter. For example, a movement pattern can have a following parameter for the user’s torso and a following parameter for the user’s head. In some embodiments, computer system 600 moves the portion of computer system 600 in multiple types of movement (e.g., simultaneously movement or serial movement) including directional movement (e.g., left, right, up, down, forward, and/or back), rotational movement (e.g., yaw, roll, and/or pitch), and/or positional movement (e.g., reaching, shortening, folding, leaning, and/or tilting). [0144] In some embodiments, computer system 600 moves the portion of computer system 600 to improve media content composition. For example, computer system 600 moves the portion of computer system 600 so the user is more detectable by cameras 602. In some embodiments, improving the media content composition includes changing the media content composition to meet the composition directions requested by the first user. For example, in some embodiments, in response to detecting the first user requesting an image be captured in the style of an artist that usually has subjects captured at a particular downward angle, computer system 600 moves the portion of computer system 600 such that cameras 602 are positioned above the user. In some embodiments, improving media content composition includes changing a perspective of the composition to meet the composition guidelines automatically selected by computer system 600. For example, in some embodiments, in response to detecting that the first user requests an image be captured of the user and a second user on a beach, computer system 600 automatically selects an artistic style that is based on the rule of thirds and automatically moves the portion of computer system 600 to align the two users along the left third of the field-of-detection of the one or more camera sensors and the horizon along the bottom third of the field-of-detection of the one or more camera sensor.

[0145] In some embodiments, in response to moving the portion of computer system 600 (e.g., to improve media content composition), computer system 600 outputs audio content that is an explanation to the first user why and/or how computer system 600 is moving the portion of computer system 600. For example, in some embodiments, in response to and/or while moving the portion of computer system 600 to capture image content of a family, computer system 600 outputs audio content such as “I am moving to get everyone in frame.” In some embodiments, computer system 600 moves the portion of computer system 600 to follow a subject based on specific user instructions. For example, computer system 600 moves the portion of computer system 600 in response to detecting the first user say, “follow me around the room and take a picture every time I pose holding a book or a pen.” In some embodiments, computer system 600 moving the portion of computer system 600 to follow a user (e.g., to keep a user within and/or centered within the field-of-detection of camera 602) includes following the first user as a whole or following a portion of the first user such as the head, eyes, shoulders, torso, hands, and/or feet of the first user. In some embodiments, computer system 600 moving the portion of computer system 600 to follow the first user results in computer system 600 moving in different movement patterns depending on the portion of the user being followed. For example, in some embodiments, as part of following the head of the first user, computer system 600 moves the portion of computer system 600 in an even manner in level with the head of the first user, while as part of following the hands of the first user, computer system 600 moves the portion of computer system 600 in a more sweeping manner before changing elevation depending on the elevation of the hands of the first user. In some embodiments, computer system 600 moves the portion of computer system 600 while capturing a panorama image.

[0146] At FIG. 6G, a determination is made that the first user is within the center of the field-of-detection of the one or more camera sensors. Based on the determination that the first user is within the center of the field-of detection of computer system 600, computer system 600 ceases moving the portion of computer system 600. As illustrated in FIG. 6G, computer system 600 displays user image 624 centered within frame 610. In some embodiments, in response to detecting the first user centered within the field-of-detection of camera 602, computer system 600 repeats the steps described in FIGS. 6D-6E of (1) outputting audio content corresponding to asking if the first user is ready for the image content to be captured, (2) detecting a verbal input corresponding to a positive response (3) outputting audio content corresponding to a countdown and/or (4) capturing an image once computer system 600 reaches the end of the countdown.

[0147] At FIG. 6G, at a time after detecting the first user centered within the field-of- detection of camera 602, a determination is made that the first user is holding a pose for a predetermined amount of time. At FIG. 6G, based on the determination being made that the first user holds a pose for the predetermined amount of time, computer system 600 automatically (e.g., without intervening user input) captures a third image without outputting audio content corresponding to asking if the first user is ready for image content to be captured. As illustrated in FIG. 6G, in response to capturing the third image, computer system 600 displays a representation of the third image (e.g., fourth content 636) within content indicator 614 and ceases displaying the representation of the previously captured media content (e.g., third content 634). In some embodiments, while capturing media content, computer system 600 moves the portion of computer system 600 automatically to meet composition directions based on specific conditions indicated by the user. For example, in some embodiments, computer system 600 tracks the first user at a certain distance based on a time of day if the detected composition directions instruct computer system to track the first user based on the time of day. In some embodiments, before and/or while capturing media content, computer system 600 configures camera settings of camera 602 such that computer system 600 can satisfy detected composition directions. For example, in some embodiments, computer system 600 configures camera 602 to capture photos only in black and white in response to detecting composition directions that instruct computer system 600 to capture a still photo.

[0148] In some embodiments, computer system 600 automatically captures the third image in response to detecting the first user in a predetermined and/or known pose. In some embodiments, computer system 600 automatically captures the third image in response to detecting the first user holding a pose for a predetermined amount of time and detecting the first user centered within the field-of-detection of camera 602 (e.g., detecting the media content meeting the composition guidelines within composition directions 605f). In some embodiments, computer system 600 automatically captures image content in response to detecting the first user in a predetermined and/or known pose and detecting the first user centered within the field-of-detection of camera 602. In some embodiments, computer system 600 captures the third image after it is determined that the first user is centered within the field-of-detection of the one or more camera sensors in response to detecting a gaze in a particular direction and/or a gesture of the first user.

[0149] In some embodiments, computer system 600 does not move the portion of computer system 600 while capturing image content if the media capture operation corresponds to a still photo or a live photo (e.g., a photo operation where computer system 600 captures media data before and/or after capturing the photo). In some embodiments, if computer system 600 moves the portion of computer system 600 (e.g., to improve media content composition, while following the user, and/or in response to detecting user input), computer system 600 ceases moving the portion of computer system 600 while capturing image content if the media capture operation corresponds to a still photo or a live photo, as described above at FIGS. 6F-6G. For example, in some embodiments, computer system 600 moves the portion of computer system 600 to follow a user’s head as the user changes positions and poses. Computer system 600 ceases moving the portion of computer system 600 while capturing image content and reinitiates moving the portion of computer system 600 after the media capture operation is finished as the user changes to a new position and/or pose. In some embodiments, computer system 600 moves the portion of computer system 600 while capturing media. For example, in some embodiments, if the desired media is media of the first user running, computer system 600 moves the portion of computer system 600 to keep the first user within the field-of-detection of the one or more camera sensors while capturing the media.

[0150] At FIG. 6H, at a time after capturing the third image at FIG. 6G, diagram 618 includes object representation 638 to the right of and next to user representation 622, indicating an object is located next to the first user within the environment. As indicated by diagram 618 at FIG. 6H, user representation 622 and object representation 638 are located within representation of field-of-detection 654. Thus, at FIG. 6H, the first user and the object are located within the field-of-detection of the one or more camera sensors. As illustrated in FIG. 6H, because the first user and the object are within the field-of-detection of camera 602, computer system 600 displays user image 624 and object image 640 within live-view 608. Object image 640 is a representation of the object as detected by one or more camera sensors (e.g., camera 602). At FIG 6H, computer system 600 detects composition directions 605h from the first user corresponding to the first user instructing computer system 600 to capture video content (e.g., take a video).

[0151] At FIG. 61, in response to detecting composition directions 605h, computer system 600 configures itself to track the first user. At FIG. 61, in response to detecting composition directions 605h, computer system 600 outputs composition instructions 642. More specifically, at FIG. 61, computer system 600 outputs composition instructions 642 that state “move the triangle to the left.” Computer system 600 outputs composition instructions 642 because of the brevity and/or generality of composition directions 605h. Composition directions 605h does not contain any specific instructions and/or guidance for computer system 600 with respect to how the video should be taken. Accordingly, at FIG. 61, a determination is made with respect to the best manner in which to capture the video. The content of composition instructions 642 is based on the how it is determined to best capture the video.

[0152] In some embodiments, composition instructions output by computer system 600 are based on one or more objects that are detected by computer system 600 within the field- of-detection of the one or more camera sensors and/or the environment (e.g., objects, furniture, lighting, users, and/or type of background). For example, at FIG. 61, in response to detecting composition directions 605h, computer system 600 outputs composition instructions 642 which includes instructions corresponding to the first user and the object, both of which computer system 600 detects within the field-of-detection of camera 602. In some embodiments, composition instructions are based on a location of a user and/or object within the environment. For example, in some embodiments, computer system 600 outputs audio content corresponding to composition instructions such as, “stand in front of the triangle.”

[0153] In some embodiments, composition instructions output by computer system 600 includes prompts for one or more users to move within the environment. For example, in some embodiments, in response to detecting composition directions, computer system 600 outputs composition instructions that includes instructions for the first user to walk to the left within the environment. In some embodiments, composition instructions output by computer system 600 are based on the distance between an edge of the field-of-detection of the one or more camera sensors and the positioning of the subject (e.g., face of the user, body of the user, and/or extremity of the user). For example, based on a determination that the head of the user is a predetermined distance (e.g., .1-24 inches) from an edge of the field-of-detection of the one or more camera sensors, computer system 600 outputs audio content corresponding to instructions for the user to center themselves within the field-of-detection of the one or more camera sensors.

[0154] In some embodiments, composition instructions output by computer system 600 are based on horizontal lines detected within the field-of-detection of camera 602. For example, in some embodiments, computer system 600 outputs composition instructions that include instructions for the first user to straighten their shoulders so that the shoulders of the first user are horizontal within the field-of-detection of the one or more camera sensors.

[0155] In some embodiments, composition instructions output by computer system 600 are based on the position of the face of the first user. For example, in some embodiments, computer system 600 outputs composition instructions that include instructions for the first user to tilt the chin of the first user upward. In some embodiments, composition instructions output by computer system 600 are based on the position of the face of the first user relative to a fixed reference point. For example, in some embodiments, computer system 600 outputs composition instructions that include instructions for the first user to move their face until it is aligned with an edge of the triangle. For another example, in some embodiments, computer system 600 outputs composition instructions that include instructions for the first user to orient their face within the right third of the field-of-detection of the one or more camera sensors for a more dynamic layout. In some embodiments, composition instructions output by computer system 600 are based on the position of the face of the first user relative to the body of the first user. For example, in some embodiments, computer system 600 outputs composition instructions that include instructions for the first user to move their face forward until the face of the user is over the knee of the first user to create a foreground and a background in the media such that attention is drawn to the face of the user.

[0156] In some embodiments, composition instructions output by computer system 600 are more detailed when the instructions and/or composition directions are more general than when the instructions and/or composition directions are more detailed. For example, in some embodiments, computer system 600 outputs more detailed composition instructions in response to detecting composition directions such as, “take a photo of me and the dog,” in contrast to computer system 600 detecting composition directions such as, “take a photo of me and the dog in front of the tree with a soft filter and with me looking off and to the left.” In some embodiments, composition instructions output by computer system 600 are less detailed when the instructions and/or composition directions are more general than when the instructions and/or composition directions are more detailed. In some embodiments, composition instructions output by computer system 600 are different for the first user than for a second user, even if computer system 600 detects the same composition directions from the first user and the second user. For example, if both the first user and the second user ask computer system 600 to capture a video of them dancing using a style that mimics a respective musical from the 1940s, computer system 600 will output composition instructions such as, “squat down,” for the first user and, computer system 600 will output composition instructions such as, “bend at the knees” to the second user.”

[0157] In some embodiments, composition instructions output by computer system 600 include one or more prompts to change the lighting within the environment, such as change the amount of light (e.g., increase the amount of light or decrease the amount of light), change the type of lighting (e.g., direct lighting vs indirect lighting), change the color of the lighting, and/or change the color temperature of the lighting (e.g., make the lighting a warmer or cooler tone). In some embodiments, when computer system 600 is in communication with a lighting system (e.g., a smart lighting system) in the environment, computer system 600 changes the lighting in the environment in response to detecting composition directions from the first user. In some embodiments, computer system 600 asks permission prior to changing the lighting in the environment. In some embodiments, computer system 600 is granted permission from the first user to change the lighting in the environment prior to computer system 600 detecting composition directions from the user. In some embodiments, computer system 600 automatically (e.g., without detecting user input) changes the lighting in the environment.

[0158] At FIG. 6 J, as indicated by the positioning of user representation 622 and object representation 638 within diagram 618, the first user moves the object and themselves to the left within the environment. At FIG. 6J, a determination is made that the first user moves the object and themselves to the left. At FIG. 6 J, based on the determination that the first user moves the object and themselves to the left, computer system 600 moves the portion of computer system 600 in a third movement pattern (e.g., rotates the portion of computer system 600 to the left) such that the first user is centered within the field-of-detection of the one or more camera sensors at the initiation of the video capture operation. At FIG. 6J, a determination is made that a set of one or more criteria is satisfied (e.g., the first user gazes at computer system 600, the first user is in a particular pose and/or the first use performs a gesture). At FIG. 6J, based on the determination being made that the set of one or more criteria is satisfied, computer system 600 initiates the capture of video. In some embodiments, computer system 600 automatically selects an artistic stye for capturing the video based one on the detection of the occurrence of one or more conditions. For example, computer system 600 captures the video with an artistic style that is suitable for dark conditions when it is detected that the brightness of the environment is low.

[0159] Between FIGS. 6J and 6K, computer system 600 tracks the first user as the first user moves to the right within the environment. Computer system 600 captures a video of the first user as the first user moves to the right within the environment. At FIG. 6K, computer system 600 completes (e.g., ends) the video capturing process.

[0160] At FIG. 6K, as indicated the positioning of user representation 622 within diagram 618 and the positioning of object representation 638 within diagram 618, the first user has moved to the right, leaving the object behind. As illustrated in FIG. 6K, as a part of completing the video capturing process, computer system 600 displays a representation of the newly captured video content (e.g., fifth content 644) and ceases displaying the representation of the previously captured media content (e.g., fourth content 636). At FIG. 6K, computer system 600 detects composition directions 605k from the first user corresponding to the first user telling computer system 600 to capture video content in a style that John would. John is an individual that captures videos of subjects with a zoomed in appearance. Further, John typically captures videos of subjects as the individual moves within an environment. Composition directions 605k are of greater specificity than composition directions 605h. That is, composition directions 605k instruct computer system 600 to take a video in a specific style (e.g., similar to John) while composition directions 605h generically instructs computer system 600 to take a video.

[0161] At FIG. 6L, in response to detecting composition directions 605k, computer system 600 outputs composition instructions 646 corresponding to computer system 600 giving instructions to the first user (e.g., “walk slowly to your left along the wall”). Composition instructions 646 at FIG. 6L differs from composition instructions 642 at FIG. 6H because of the level in detail between composition directions 605h and composition directions 605k. At FIG. 6L, because composition directions 605k are more specific than composition direction 605h, computer system 600 does provide the first user with instructions on how to frame the environment (e.g., instructions to move objects within the environment to prepare for the video).

[0162] At FIG. 6L, as indicated by user image 624 being larger within live-view 608 than in FIG. 6K, in response to detecting composition directions 605k (e.g., composition directions within composition directions 605k), computer system 600 zooms in on the first user via camera 602. In some embodiments, computer system 600 outputs composition instructions based on detected elements in the physical environment. For example, in some embodiments, when computer system 600 detects that the physical environment is dark, computer system 600 outputs composition instructions that instruct the first user to increase the brightness of the physical environment.

[0163] Between FIGS. 6L and 6M, the first user moves to the left within the environment. Between FIGS. 6L and 6M, a determination is made that the first user is moving to the left. Based on the determination being made that the first user is moving to the left within the environment, computer system 600 initiates a video capturing operation and tracks the first user as the user moves. In some embodiments, computer system 600 initiates the video capturing operation based on a determination being made that the first user is following composition instructions 646. In some embodiments, while capturing video content, computer system 600 moves the portion of computer system 600 automatically (e.g., without intervening user input) to satisfy detected composition directions. For example, in some embodiments, while capturing video content, computer system 600 moves the portion of computer system 600 in a manner that creates the impression that the first user is being followed by an animal, based on detected composition directions from the first user. In some embodiments, while capturing video content, computer system 600 automatically moves the portion of computer system 600 based on detected conditions in the environment. For example, while capturing video media, in response to computer system 600 detecting a parent helping their infant walk, computer system 600 moves the portion of computer system 600 close to the ground to capture video of the infant walking, moving slowly with the infant then backing up to get video of the parent and infant walking together. In some embodiments, while capturing media content, computer system 600 moves the portion of computer system 600 based on one or more settings of camera 602. For example, when an active setting of camera 602 requires users to be in the left third of the field-of detection of the one or more camera sensors prior to capturing media, computer system 600 automatically moves the portion of computer system 600 until the first user is aligned within a left or right third of the field-of-detection of the one or more camera sensors. For another example, when an active setting of camera 602 results in media being captured in only black and white, computer system 600 automatically moves the portion of computer system 600 differently than when an active setting of camera 602 results in media being captured with color.

[0164] At FIG. 6M, computer system 600 has completed capturing the video of the first user. As illustrated in FIG. 6M, after computer system 600 has completed capturing the video content, computer system 600 displays a representation of the newly captured video content (e.g., sixth content 648) and ceases displaying the representation of the previously captured media content (e.g., fifth content 644). At FIG. 6M, though computer system 600 has completed capturing the video content, computer system 600 maintains displaying the representation of the first user at the increased zoom level of the first user. In some embodiments, as a part of completing the capture of video content, computer system 600 decreases the zoom level of the first user.

[0165] FIG. 7 is a flow diagram illustrating a method (e.g., process 700) for selectively capturing media in accordance with some embodiments. Some operations in process 700 are, optionally, combined, the orders of some operations are, optionally, changed, and some operations are, optionally, omitted.

[0166] As described below, process 700 provides an intuitive way for selectively capturing media. Process 700 reduces the cognitive burden on a user, thereby creating a more efficient human-machine interface. For battery-operated computing devices, enabling a user to interact with such devices faster and more efficiently conserves power and increases the time between battery charges.

[0167] In some embodiments, process 700 is performed at a computer system (e.g., 600) that is in communication with a media capture component (e.g., 602) (e.g., a sensor, an environmental sensor, a capture component, a camera (e.g., a periscope camera, a telephoto camera, a wide-angle camera, and/or an ultra-wide-angle camera), a depth sensor, a microphone, a heart monitor, and/or a temperature sensor) and a microphone. In some embodiments, the computer system is a phone, a watch, a tablet, a fitness tracking device, a wearable device, a display, a movable computer system, an accessory, a speaker, a light, a head-mounted display (HMD), and/or a personal computing device. In some embodiments, the media capture component includes and/or is the microphone. In some embodiments, the microphone is different from the media capture component.

[0168] The computer system detects (702) (and/or receives), via the microphone, a first input (e.g., a verbal media capture instruction) corresponding to a request to capture media (e.g., 605a, 605f, 605h, and/or 605k) (e.g., an image, a video, and/or audio). In some embodiments, the first input is detected while in a media capture mode (e.g., a mode in which the computer system is configured to capture an image, a video, and/or audio). In some embodiments, while in the media capture mode, the computer system displays, via a display component in communication with the computer system, a user interface corresponding to the media capture mode. In some embodiments, while in the media capture mode, the computer system displays, via a display component in communication with the computer system, a live preview of output (e.g., an image, a video, and/or audio) of the media capture component.

[0169] After (704) (and/or in conjunction with and/or in response to) detecting the first input corresponding to the request to capture media (e.g., 605a, 605f, 605h, and/or 605k) (and/or while in the media capture mode) and in accordance with a determination that the first input (e.g., 605a, 605f, 605h, and/or 605k) corresponds to a first instruction (and/or a first set of one or more instructions), the computer system captures (706), via the media capture component (e.g., 602), media (e.g., 616, 632, 634, 636, 644, and/or 648) (e.g., an image, a video, and/or audio) in response to a first set of one or more conditions being satisfied (e.g., the media is not captured while the first set of one or more conditions is not satisfied and/or until the first set of one or more conditions is satisfied). In some embodiments, in response to detecting the first input corresponding to the request to capture media and/or in accordance with a determination that the first input corresponds to the first instruction, the computer system captures media when, in accordance with, and/or in response to the first set of one or more conditions being satisfied. In some embodiments, in response to detecting the first input corresponding to the request to capture media and/or in accordance with a determination that the first input corresponds to the first instruction, the computer system is configured to capture media when, in accordance with, and/or in response to the first set of one or more conditions being satisfied. In some embodiments, after (and/or in conjunction with or in response to) detecting the first input corresponding to the request to capture media (and/or while in the media capture mode) and in accordance with a determination that the first input corresponds to a second instruction different from the first instruction, the computer system does not capture, via the media capture component, media (e.g., an image, a video, and/or audio) when, in accordance with, and/or in response to the first set of one or more conditions being satisfied. In some embodiments, in response to detecting the first input corresponding to the request to capture media and/or in accordance with a determination that the first input corresponds to the second instruction, the computer system does not capture media when, in accordance with, and/or in response to the second set of one or more conditions being satisfied. In some embodiments, in response to detecting the first input corresponding to the request to capture media and/or in accordance with a determination that the first input corresponds to the second instruction, the computer system is not configured to capture and/or configured to not capture media when, in accordance with, and/or in response to the second set of one or more conditions being satisfied. In some embodiments, after (and/or in conjunction with or in response to) detecting the first input corresponding to the request to capture media (and/or while in the media capture mode), media is not captured while the first set of one or more conditions is not satisfied.

[0170] After (704) detecting the first input corresponding to the request to capture media and in accordance with a determination that the first input (e.g., 605a, 605f, 605h, and/or 605k) corresponds to a second instruction different from the first instruction (and/or a second set of one or more instructions different from the first set of one or more instructions), the computer system captures (708), via the media capture component (e.g., 602), media (e.g., 616, 632, 634, 636, 644, and/or 648) (e.g., an image, a video, and/or audio) in response to a second set of one or more conditions being satisfied (e.g., the media is not captured while the second set of one or more conditions is not satisfied and/or until the second set of one or more conditions is satisfied), wherein the second set of one or more conditions is different from the first set of one or more conditions. In some embodiments, in response to detecting the first input corresponding to the request to capture media and/or in accordance with a determination that the first input corresponds to the second instruction, the computer system captures media when, in accordance with, and/or in response to the second set of one or more conditions being satisfied. In some embodiments, in response to detecting the first input corresponding to the request to capture media and/or in accordance with a determination that the first input corresponds to the second instruction, the computer system is configured to capture media when, in accordance with, and/or in response to the second set of one or more conditions being satisfied. In some embodiments, after (and/or in conjunction with or in response to) detecting the first input corresponding to the request to capture media (and/or while in the media capture mode) and in accordance with a determination that the first input corresponds to the first instruction, the computer system does not capture, via the media capture component, media (e.g., an image, a video, and/or audio) when, in accordance with, and/or in response to the second set of one or more being satisfied. In some embodiments, in response to detecting the first input corresponding to the request to capture media and/or in accordance with a determination that the first input corresponds to the first instruction, the computer system does not capture media when, in accordance with, and/or in response to the second set of one or more conditions being satisfied. In some embodiments, in response to detecting the first input corresponding to the request to capture media and/or in accordance with a determination that the first input corresponds to the first instruction, the computer system is not configured to capture and/or configured to not capture media when, in accordance with, and/or in response to the second set of one or more conditions being satisfied. In some embodiments, after (and/or in conjunction with or in response to) detecting the first input corresponding to the request to capture media (and/or while in the media capture mode), media is not captured and/or the computer system does not capture media while the second set of one or more conditions is not satisfied. In some embodiments, in accordance with a determination that the first input corresponds to the first instruction (and/or a first set of one or more instructions), the computer system displays and/or saves a representation of the media. In some embodiments, in accordance with a determination that the first input corresponds to the second instruction (and/or a first set of one or more instructions), the computer system displays and/or saves a representation of the media. Selectively capturing media when a set of prescribed conditions is met (e.g., the first input corresponds to the first instruction of the second instruction) automatically allows the computer system to perform the media capturing process based on specific guidelines that are expressed by a user such that the media capturing process is tailored to the user’s desires, thereby performing an operation when a set of conditions has been met without requiring further user input.

[0171] In some embodiments, the computer system (e.g., 600) is in communication (e.g., wireless communication and/or wired communication) with a first movement component (e.g., an actuator (e.g., a pneumatic actuator, hydraulic actuator, and/or an electric actuator), a movable base, a rotatable component, a motor, a lift, a level, and/or a rotatable base). In some embodiments, after detecting the first input corresponding to the request to capture media (e.g., 605a, 605f, 605h, and/or 605k) (e.g., and while the computer system is in the media capture mode for media capture component) (e.g., in response to detecting the first input corresponding to the request to capture media), the computer system moves, via the first movement component, a portion of the computer system (e.g., as described above with respect to FIG. 6G) (e.g., a portion of the computer system that includes the media capture component) (e.g., the computer system translates and/or rotates). In some embodiments, the portion of the computer system moves about a single axis of the movement component. In some embodiments, the portion of the computer system moves about two or more axes of the movement component. In some embodiments, the portion of the computer system moves in two different manners (e.g., the computer system rotates, tilts, and/or translates). In some embodiments, the portion of the computer system moves in two different directions (e.g., to the left and upwards, tilts up and moves right, or backwards and to the right). In some embodiments, the portion of the computer system moves in a single direction. In some embodiments, the portion of the computer system ceases moving while capturing and/or to capture the media. In some embodiments, the portion of the computer system continues to move while capturing the media. In some embodiments, a portion (e.g., a movable arm, a hinge, and/or a base) of the computer system moves while, in some embodiments, another portion of the computer system does not move. [0172] In some embodiments, after detecting the first input corresponding to the request to capture media (e.g., 605a, 605f, 605h, and/or 605k), the computer system moves, via the first movement component, a position of the media capture component (e.g., 602) so that a field of view (e.g., 654) of the media capture component (e.g., 602) moves from a first position to a second position different from the first position (e.g., as described above at FIG. 6G). In some embodiments, moving the computer system causes framing of a user to change as the computer system moves (e.g., the user is in a left portion of the field of view of the media capture component when the computer system begins to move, and the user is in a right portion of the field of view of the media capture component when the computer system stops moving). In some embodiments, the computer system moves as the user moves. In some embodiments, the computer system moves the position of the media capture component to keep the user within the field of view of the media capture component. Moving, via the first movement component, the position of the media capture component so that a field of view the first movement component moves from a first position to a second position after detecting the first input corresponding to the request to capture media allows the computer system to better position the media capture component to capture the media, thereby providing additional control options without cluttering the user interface with additional displayed controls and performing an operation when a set of conditions has been met without requiring further user input.

[0173] In some embodiments, after detecting the first input (e.g., 605a, 605f, 605h, and/or 605k) and in accordance with a determination that the first input includes one or more instructions (e.g., audible instructions and/or a gesture-based instruction) for the computer system (e.g., 600) to move in a first manner (e.g., the first input includes a voice command directing the computer system to move in the first manner) (e.g., the first manner includes translation in one or more directions and/or rotation about one or more axes of the movement component), a portion of the computer system (e.g., 600) (e.g., a portion of the computer system that includes the media capture component) moves in the first manner. In some embodiments, after detecting the first input and in accordance with a determination that the first input includes one or more instructions for the computer system to move in a second manner (e.g., the first input includes a voice command directing the computer system to move in the second manner) (e.g., the second manner includes translation in one or more directions and/or rotation about one or more axes of the movement component) different from the first manner, the portion of the computer system moves in the second manner (e.g., and not the first manner) different from the first manner (e.g. as described above in relation to FIGS. 6C and 6G). In some embodiments, in accordance with a determination that the first input includes instructions corresponding to the first manner and the second manner, the computer system moves in the first manner and the second manner. In some embodiments, the computer system moves in the first manner and the second manner sequentially. In some embodiments, the computer system concurrently moves in the first manner and the second manner. In some embodiments, the first manner is the same as the second manner. In some embodiments, moving in the first manner and/or the second manner causes the portion of the computer system to translate and/or rotate. In some embodiments, moving in the first manner and/or the second manner causes the portion of the computer system to move in a respective movement pattern (e.g., determined by the computer system and/or determined by a user of the computer system). In some embodiments, the portion of the computer system moves at a different speed and/or direction while the portion of the computer system moves in the first manner in contrast to when the portion of the computer system moves in the second manner. Moving in a respective manner based on which instructions are included in the first input automatically allows the computer system to move in various manners based on the instructions of the user, thereby performing an operation when a set of conditions has been met without requiring further user input.

[0174] In some embodiments, detecting the first input corresponding to the request to capture media (e.g., 605a, 605f, 605h, and/or 605k) includes capturing, via the microphone, one or more verbal instructions (e.g., the instructions included in 605a, 605f, 605h, and/or 605k). In some embodiments, the first and/or second instructions include one or more verbal instructions (e.g., spoken instructions and/or audible instructions). In some embodiments, the computer system makes a determination that the verbal instructions are provided by a primary user (e.g., a user that is registered with the computer system and/or a targeted user). In some embodiments, the computer system makes a determination that the verbal instructions are provided by a non-primary user (e.g., a user that is not registered with the computer system and/or a user who is a non-targeted user). Capturing, via the media capture component, media after one or more verbal instructions are detected allows the computer system to perform the media capturing operation without displaying a respective user interface, thereby providing additional control options without cluttering the user interface with additional displayed controls and providing improved feedback (e.g., that the first input was detected). [0175] In some embodiments, detecting the first input corresponding to the request to capture media (e.g., 605a, 605f, 605h, and/or 605k) includes capturing, via one or more input devices (e.g., a media capture component and/or another type of device or sensor, and/or a gesture being performed) one or more gesture-based instructions (e.g., as described above at FIG. 6 A) (e.g., one or more air gestures that include movement of a portion of a body of a user in the air). In some embodiments, the first instruction and/or second instruction includes gesture-based instructions (e.g., the user points in a direction, the user makes a hand gesture towards a direction, the user directs their body in a direction, and/or the user walks in a direction). In some embodiments, the instructions are a combination of verbal instructions and gesture-based instructions. In some embodiments, the computer system makes a determination that the gesture-based instructions are provided by a primary user (e.g., a user that is registered with the computer system and/or a targeted user). In some embodiments, the computer system makes a determination that the gesture-based instructions are provided by a non-primary user (e.g., a user that is not registered with the computer system and/or a user who is a non-targeted user). Capturing, via the media capture component, media after one or more gesture-based instructions are detected allows the computer system to perform the media capturing operation without displaying a respective user interface, thereby providing additional control options without cluttering the user interface with additional displayed controls and providing improved feedback.

[0176] In some embodiments, capturing, via the media capture component (e.g., 602), media (e.g., 616, 632, 634, 636, 644, and/or 648) (e.g., an image, a video, and/or audio) in response to the first set of one or more conditions being satisfied includes moving a portion of the computer system (e.g., 600) (e.g., a portion of the computer system that includes the media capture component) in a third manner (e.g., based on a first set of composition rules (e.g., whether a portion of user is within a certain area (e.g., high, middle, and/or lower third) of the field of view) (e.g., whether one or more objects are captured at a certain zoom level, with a certain filter, with a certain color, and/or with a certain amount of light) determined based on the first set of one or more conditions being satisfied and/or not based on the second set of one or more conditions being satisfied). In some embodiments, capturing, via the media capture component, media (e.g., an image, a video, and/or audio) in response to the second set of one or more conditions being satisfied includes moving the portion of the computer system in a fourth manner (e.g., based on a second set of composition rules determined based on the second set of one or more conditions being satisfied and/or not based on the first set of one or more conditions being satisfied) different from the third manner (e.g., as described above at FIG. 6L). In some embodiments, moving in the third manner causes the portion of the computer system to translate and/or rotate. In some embodiments, moving the in the third manner causes the portion of the computer system to move in a respective movement pattern. In some embodiments, the portion of the computer system moves at a different speed and/or direction while the portion of the computer system moves in the third manner in contrast to when the portion of the computer system moves in the first and/or second manner.

[0177] In some embodiments, the computer system (e.g., 600) is in communication with a second movement component (e.g., an actuator (e.g., a pneumatic actuator, hydraulic actuator and/or an electric actuator), a movable base, a rotatable component, a motor, a lift, a level, and/or a rotatable base). In some embodiments, while capturing, via the media capture component (e.g., 602), the media (e.g., 616, 632, 634, 636, 644, and/or 648) and in accordance with a determination that the media being captured is a first type of media (e.g., video media or a still photograph), the computer system moves (e.g., translational movement and/or rotational movement), via the second movement component, a portion of the computer system (e.g., a portion of the computer system that includes the media capture component) during (and/or while) the capture of the media (e.g., of the first type of media) (e.g., as described above at FIG. 6G and 6M). In some embodiments, while capturing, via the media capture component, the media and in accordance with a determination that the media being captured is a second type of media (e.g., video media or a still photograph) different from the first type of media, the computer system forgoes moving, via the second movement component, the portion of the computer system during the capture of the media (e.g., as described above at FIG. 6C) (of the second type of media). In some embodiments, the portion of the computer system stops moving in accordance with a determination that the computer system stops capturing the first type of media. In some embodiments, the portion of the computer system continues moving in accordance with a determination that the portion of the computer system stops capturing the first type of media. In some embodiments, the portion of the computer system moves such that the media capture component can track a user while the media capture component captures the media. In some embodiments, the portion of the computer system moves while capturing both the first type of media and the second type of media. Selectively moving the portion of the computer system based on the type of media that is being captured automatically allows the computer system to indicate whether the computer system is capturing a video or a still image, thereby performing an operation when a set of conditions has been met without requiring further user input and providing improved feedback.

[0178] In some embodiments, the first type of media (e.g., 616, 632, 634, 636, 644, and/or 648) is a video or a panoramic photo (e.g. a photo showing a field of view of an environment that is greater than the field of view of the media capture component). In some embodiments, the second type of media (e.g., 616, 632, 634, 636, 644, and/or 648) is a still photo (e.g., or a set of one or more animated images and/or photos (e.g., a photo that includes a representation of the field of view of the media capture component immediately before the request to capture the photo is detected and a representation of the field of view of the media capture component immediately after the request to capture the photo is detected)).

[0179] In some embodiments, the first input (e.g., 605a, 605f, 605h, and/or 605k) includes instructions (e.g., indication, voice command, directive, and/or order) for the computer system (e.g., 600) to delay capture of the media (e.g., 616, 632, 634, 636, 644, and/or 648) until the detection of a camera-detected input (e.g., as described above at FIG. 6B).

[0180] In some embodiments, the camera-detected input includes the detection (e.g., via the media capture component and/or a set of one or more cameras external to the computer system) of a gaze (e.g., as described above at FIG. 6B). In some embodiments, the computer system outputs an indication (e.g., a haptic indication, a graphical indication, and/or an audio indication) that the gaze is detected. In some embodiments, the gaze is directed towards the computer system. In some embodiments, the gaze is directed away from the computer system. In some embodiments, the gaze is sustained for a predetermined period of time (e.g., 1-15 seconds). Delaying the capture of the media until the computer system detects the gaze allows the computer system to perform a media capture operation without requiring a user to move outside of the field of view of the media capture component to select a user interface object that is displayed by the computer system, thereby providing additional control options without cluttering the user interface with additional displayed controls.

[0181] In some embodiments, the camera-detected input includes the detection of a gesture (e.g., smiling gesture, thumbs up gesture, pointing of index finger gesture, and/or hands raised above head gesture) (e.g., as described above at FIG. 6B). In some embodiments, the gesture is directed (e.g., targeted, focused, pointed at) at the computer system. In some embodiments, the gesture is directed (e.g., targeted, focused, pointed at) at an object (e.g., an individual, an inanimate object, and/or an animal). In some embodiments, the computer system outputs an indication (e.g., a haptic indication, a graphical indication, and/or an audio indication) that the gesture is detected. Delaying the capture of the media until the computer system detects the gesture allows the computer system to perform a media capture operation without requiring a user to move outside of the field of view of the media capture component to select a user interface object that is displayed by the computer system, thereby providing additional control options without cluttering the user interface with additional displayed controls.

[0182] In some embodiments, the camera-detected input includes the detection of a pose (e.g., as described above at FIG. 6B) (e.g., a seated pose, a standing pose, a pose involving two or more individuals (e.g., a human pyramid and/or the interlocking of arms) and/or a kneeling pose). In some embodiments, the computer system outputs an indication (e.g., a haptic indication, a graphical indication, and/or an audio indication) that the pose is detected. In some embodiments, the computer system outputs a graphical representation of the pose. Delaying the capture of the media until the computer system detects the pose allows the computer system to perform a media capture operation without requiring a user to move outside of the field of view of the media capture component to select a user interface object that is displayed by the computer system, thereby providing additional control options without cluttering the user interface with additional displayed controls.

[0183] In some embodiments, the first input (e.g., 605a, 605f, 605h, and/or 605k) includes a set of one or more temporal based instructions (e.g., when to begin capture of the media, when to end capture of the media, and/or how long to capture media for) (e.g., temporal boundaries and/or temporal guidelines) indicating one or more media capture parameters (e.g., as described above at FIG. 6B). In some embodiments, the computer system outputs an indication (e.g., a visual indication, a haptic indication and/or audible indication) that temporal based instructions have been detected (e.g., a series of beeps that represent an amount of seconds until the initiation of the media capture operation, a vocal indication of the length of the media capture operation (e.g.,” video will be captured for 5 seconds” and/or “video will last 30 seconds”)) and optionally indicates an associated temporal duration associated with the temporal based instructions. Capturing media after the set of one or more temporal based instructions are detected allows the computer system to capture media in accordance with the temporal guidelines included in the first input such that users included in the media are properly framed and aligned within the field of view of the media capture component while the media is captured, thereby providing additional control options without cluttering the user interface with additional displayed controls and performing an operation when a set of conditions has been met without requiring further user input.

[0184] In some embodiments, the set of one or more temporal based instructions (e.g., audible instructions and/or gesture-based instructions) includes one or more indications of when capture of media (e.g., 616, 632, 634, 636, 644, and/or 648) will be initiated (e.g., as described above at FIG. 6B) (e.g., “initiate the capture of media in 10 seconds” and/or “initiate the capture of media at 3:30 PM”). In some embodiments, the initiation of the capture of the media is based on timing (e.g., time of day, countdown, timing since user last interacted with the computer system, and/or timing since user last looked at the computer system). In some embodiments, the initiation of the capture of the media is based on an amount of time that the user is positioned in a specific pose. In some embodiments, the initiation of the capture of media is based on an amount of time that has elapsed since the user performed a gesture. Capturing media after one or more indications of when capture of media will be initiated are detected allows the computer system to initiate the capture media at an appropriate time such that the content of the media are properly framed and aligned within the field of view of the media capture component at the initiation of the media capturing process, thereby providing additional control options without cluttering the user interface with additional displayed controls and performing an operation when a set of conditions has been met without requiring further user input.

[0185] In some embodiments, the set of one or more temporal based instructions includes one or more indications (e.g., audible instructions and/or gesture-based instructions) of when capture of media (e.g., 616, 632, 634, 636, 644, and/or 648) will stop (e.g., “cease the capture of media in 10 seconds” and/or “cease the capture of media at 3:30 PM”) (e.g., as described above at FIG. 6B). In some embodiments, the ceasing of the capture of the media is based on timing (e.g., time of day, countdown, timing since user last interacted with computer, and/or timing since user last looked at the computer system). In some embodiments, the ceasing of the capture of the media is based on an amount of time that the user is positioned in a specific pose. In some embodiments, the ceasing of the capture of media is based on an amount of time that has elapsed since the user performed a gesture. Capturing media after one or more indications of when capture of media will stop are detected allows the computer system to cease the capture media at an appropriate time such that only desired content is captured by the media capture component, thereby providing additional control options without cluttering the user interface with additional displayed controls and performing an operation when a set of conditions has been met without requiring further user input.

[0186] In some embodiments, the set of one or more temporal based instructions include one or more indications of (e.g., audible instructions and/or gesture-based instructions) a time interval (e.g., 1-60 seconds) between the capture of separate (and/or different) media items (e.g., 616, 632, 634, 636, 644, and/or 648) (e.g., as described above at FIG. 6B). In some embodiments, the separate media items are different types of media items. In some embodiments, the separate types of media items are the same type of media items. In some embodiments, a second media item is automatically (e.g., without intervening user input) captured after a first media item in accordance with a determination that the time interval has expired. In some embodiments, the computer system outputs (e.g., audibly outputs and/or displays) a countdown of the time interval after capturing an initial media item. Capturing media after one or more indications of a time interval between the capture of separate media items are detected allows the computer system to pause for an adequate amount of time between the capture of different media items such that content of the media items are allowed to be realigned within the field of view of the media capture component, thereby providing additional control options without cluttering the user interface with additional displayed controls and performing an operation when a set of conditions has been met without requiring further user input.

[0187] In some embodiments, the first input (e.g., 605a, 605f, 605h, and/or 605k) includes an indication of a composition guidance (e.g., guidelines that dictate how users in the media should be spatially oriented, guidelines on how users should be positioned relative to each other, and/or guidelines on how objects should be spatially oriented within the field of view of the media capture component) (e.g., rule of thirds, golden ratio, golden triangles, rule of space, rule of odds, and/or user of black and white) related to (e.g., indicating how and/or why media will be captured) capturing the media (e.g., 616, 632, 634, 636, 644, and/or 648) (e.g., as described above at FIG. 6F). In some embodiments, the first set of one or more conditions and/or the second set of one or more conditions are satisfied based on a determination that the composition guidelines are adhered to. In some embodiments, the media is not captured until the composition guidelines are satisfied. In some embodiments, the media is captured without the composition guidelines being adhered to. Capturing media after an indication of a composition guidance is detected allows the computer system to capture media based on the spatial arrangement of users within the field of view of the media capture satisfying the composition guidelines, thereby providing additional control options without cluttering the user interface with additional displayed controls and performing an operation when a set of conditions has been met without requiring further user input.

[0188] In some embodiments, the first set of one or more conditions (and/or second set of one or more conditions) does not include a condition corresponding to a detection (e.g., by the computer system, media capture component, an external media capture component, and/or by an external computer system) of an input (e.g., a tap input, a swipe input, voice command, rotation of a rotatable input mechanism, and/or air hand gesture) corresponding to a first user (e.g., an input that is performed by a user) (e.g., as described above at FIG. 6E). Capturing media without detecting an input corresponding to the first user allows the computer system to perform the media capturing process without requiring that the first user physically interact with (e.g., select) a user interface object that is displayed by the computer system, thereby providing additional control options without cluttering the user interface with additional displayed controls.

[0189] In some embodiments, the first set of one or more conditions includes a condition that is satisfied when a determination (e.g., made by the computer system and/or made by the media capture component) is made that a person in a field of view (e.g., 654) of the media capture component (e.g., 602) has stopped moving (e.g., in a certain manner (e.g., stops walking, stops jumping, stops exercising, and/or stops running in place) and/or at all) for a threshold amount of time (e.g., 1-60 seconds) (e.g., as described above at FIG. 6D). In some embodiments, the progression of the expiration of the threshold amount of time begins in accordance with a determination that the person has not moved for a predefined period of time. In some embodiments, the progression of the expiration of the threshold amount of time is paused in accordance with a determination that the person transitions from not moving to moving. In some embodiments, the progression of the expiration of the threshold amount of time restarts based on a determination that the person begins to move. Capturing media based on a determination that the person has stopped moving for a threshold amount of time allows the computer system to perform the media capturing process without requiring that the person physically interact with (e.g., select) a user interface object that is displayed by the computer system, thereby providing additional control options without cluttering the user interface with additional displayed controls and performing an operation when a set of conditions has been met without requiring further user input.

[0190] In some embodiments, the first set of one or more conditions includes a condition that is satisfied when a determination is made that (e.g., made by the computer system and/or made by the media capture component) a person in a field of view (e.g., 654) of the media capture component (e.g., 602) is positioned in a respective pose (e.g., as described above at FIG. 6D) (e.g., while the person is within the field of view of the media capture component) (e.g., the person is sitting, the person is kneeling, the distance between the person and one or more respective users is below or above a threshold). In some embodiments, the conditions are satisfied based on a determination that the person has been positioned in the respective pose for a predetermined period of time (e.g., 1-30 seconds). Capturing media based on a determination that the person is positioned in a respective pose allows the computer system to perform the media capturing process without requiring that the person physically interact with (e.g., select) a user interface object that is displayed by the computer system, thereby providing additional control options without cluttering the user interface with additional displayed controls and performing an operation when a set of conditions has been met without requiring further user input.

[0191] In some embodiments, before capturing, via the media capture component (e.g., 602), the media (e.g., 616, 632, 634, 636, 644, and/or 648) in response to the first set of one or more conditions being satisfied and in accordance with a determination that the first set of one or more conditions continue to be satisfied, the computer system displays a countdown (e.g., 630) of a period of time that has to elapse before the media is captured (e.g., as described above at FIG. 6E). In some embodiments, the first set of one or more conditions includes a condition that is satisfied based on a determination that the period of time (e.g., 1- 30 seconds) has expired. In some embodiments, the computer system displays a sequence of graphical elements that represent the progression of the period of time. Displaying a countdown of a period of time that has to elapse when a set of prescribed conditions are met (e.g., the first set of one or more conditions continued to be satisfied) automatically allows the computer system to indicate to a user an amount of time before the computer system will initiate capture of the media, thereby performing an operation when a set of conditions has been met without requiring further user input.

[0192] In some embodiments, while displaying the countdown of the period of time that has to elapse before the media (e.g., 616, 632, 634, 636, 644, and/or 648) is captured, the computer system detects that the first set of one or more conditions do not continue to be satisfied. In some embodiments, in response to detecting that the first set of one or more conditions do not continue to be satisfied, the computer system interrupts display of (e.g., ceasing to display, pausing display of, ceasing to show the time elapsing and/or the countdown, and/or ceasing to animate) the countdown of the period of time that has to elapse before the media is captured. In some embodiments, the first set of one or more conditions is not satisfied based on a determination that a criterion is not met during the predefined period of time. In some embodiments, a progression of the predefined period of time is interrupted based on a determination being made that the criteria is not met during the predefined period of time. Interrupting display of the countdown of the period of time that has to elapse before the media is captured in response to detecting that the first set of one or more conditions do not continue to be satisfied allows the computer system to provide an indication to a user that the first set of one or more conditions is no longer satisfied, thereby providing improved visual feedback and providing additional control options without cluttering the user interface with additional displayed controls.

[0193] In some embodiments, while displaying the countdown of the period of time that has to elapse before the media (e.g., 616, 632, 634, 636, 644, and/or 648) is captured, the computer system detects a request (e.g., an explicit request and/or a direct request (e.g., “capture media anyway,” “capture media now,” “ignore countdown and take photo immediately”)) to capture the media before the period of time has elapsed before the media is captured. In some embodiments, in response to detecting the request to capture the media before the period of time has elapsed before the media is captured, the computer system interrupts display of (e.g., ceasing to display, pausing display of, ceasing to show the time elapsing and/or the countdown, and/or ceasing to animate) the countdown of the period of time that has to elapse before the media is captured (e.g., a s described above at FIG. 6E). In some embodiments, the first set of one or more conditions is not satisfied based on a detection of a request to capture media. In some embodiments, a progression of the predefined period of time is interrupted (e.g., a countdown is paused or a countdown progression is ceased) in response to detecting the request to capture media. In some embodiments, the progression of the predefined period is restarted after the request to capture media is detected. Interrupting the display of the countdown of the period of time that has to elapse before the media is captured in response to detecting the request to capture the media before the period of time has elapsed before the media is captured allows the computer system to indicate that the computer system will proceed with capturing the media without respect to the status of the countdown, thereby providing improved visual feedback and providing additional control options without cluttering the user interface with additional displayed controls

[0194] In some embodiments, the media is a first media item (e.g., 616, 632, 634, 636, 644, and/or 648). In some embodiments, after capturing the first media item (e.g., immediately after capturing the first media item or within a predefined period of time (e.g., 1- 600 seconds) after capturing the first media item), the computer system detects, via the media capture component (e.g., 602), a change in a pose of a person in a field of view (e.g., 654) of the media capture component (e.g., the first user transitions from sitting to standing or vice versa). In some embodiments, in response to detecting the change in the pose of the person in the field of view of the media capture component, the computer system captures, via the media capture component, a second media item (e.g., 616, 632, 634, 636, 644, and/or 648) (e.g., as described above at FIG. 6F) (e.g., different from the first media item) (e.g., a video and/or a still photo). In some embodiments, the computer system outputs an indication (e.g., a haptic indication, a graphical indication, and/or an audio indication) that the second media item will be captured. In some embodiments, the change in the pose of the person is detected automatically (e.g., without intervening user input). In some embodiments, the media capture component does not capture the second media item in response to detecting the change in the pose of the person in accordance with a determination that the change in the pose does not satisfy a set of one or more criteria. Capturing, via the media capture component, the second media item in response to detecting the change in the pose of the person allows the computer system to perform a media capture process without requiring that the person physically interact with (e.g., select) a respective user interface displayed by the computer system, thereby providing additional control options without cluttering the user interface with additional displayed controls. [0195] In some embodiments, the media is a third media item (e.g., 616, 632, 634, 636, 644, and/or 648). In some embodiments, after capturing the third media item (e.g., immediately after capturing the third media item or within a predefined period of time (e.g., 1-600 seconds) of capturing the third media item), the computer system detects a change in a pose of a respective person in a field of view (e.g., 654) of the media capture component (e.g., 602) (e.g., the respective person transitions from sitting to standing or vice versa). In some embodiments, in response to detecting the change in the pose of the respective person and in accordance with a determination the respective person is positioned in one or more target poses (e.g., a predefined pose (e.g., a pose that is predefined by the computer system or by the respective person), a pose that corresponds to the first input, a pose that causes the majority of the respective person to be positioned within the field of view of the media capture component, a pose that matches the pose of a respective user within that field of view of the media capture component), the computer system captures, via the media capture component, a fourth media item (e.g., 616, 632, 634, 636, 644, and/or 648) (e.g., different from or separate from the first media item). In some embodiments, in response to detecting the change in the pose of the respective person and in accordance with a determination that the respective person is not positioned in the one or more target poses, the computer system forgoes capturing the fourth media item. In some embodiments, the computer system outputs an indication (e.g., a haptic indication, a graphical indication, and/or an audio indication) that the fourth media item will be captured. In some embodiments, the fourth media item and the third media item are the same types of media items or the fourth media item and the third media item are different types of media items. In some embodiments, the computer system ceases to capture the fourth media item in accordance with a determination that the respective person is no longer positioned in the target pose. Selectively capturing the fourth media item based on whether the respective person is in a respective pose automatically allows the computer system to perform a media capturing process (e.g., or forgo performing the media capture process) based on the detected pose of the respective person, thereby performing an operation when a set of conditions has been met without requiring further user input and providing additional control options without cluttering the user interface with additional displayed controls.

[0196] In some embodiments, the media is a fifth media item (e.g., 616, 632, 634, 636, 644, and/or 648). In some embodiments, after capturing the fifth media item (e.g., immediately after capturing the third media item or within a predefined period of time (e.g., 1-600 seconds) of capturing the third media item), the computer system detects a change in a pose of a respective person in a field of view (e.g., 654) of the media capture component (e.g., 602) (e.g., the respective person transitions from sitting to standing or vice versa). In some embodiments, in response to detecting the change in the pose of the respective person and in accordance with a determination that change in the pose of the respective person was detected within a threshold amount of time (e.g., 1-60 seconds) of capturing the fifth media item, the computer system captures, via the media capture component, a sixth media item (e.g., 616, 632, 634, 636, 644, and/or 648) (e.g., a still photo and/or a video) (e.g., different from and/or separate from the fifth media item). In some embodiments, in response to detecting the change in the pose of the respective person and in accordance with a determination that change in the pose of the respective person was not detected within the threshold amount of time of capturing the fifth media item, the computer system forgoes capturing, via the media capture component, the sixth media item (and, in some embodiments, any media item). In some embodiments, the computer system outputs an indication (e.g., a haptic indication, a graphical indication, and/or an audio indication) that the fifth media item will be captured. In some embodiments, the fifth media item and the sixth media item are the same types of media items. In some embodiments, the fifth media item and the sixth media item are different types of media items. In some embodiments, the computer system ceases to capture the sixth media item in accordance with a determination that the respective person is no longer positioned in the target pose. Selectively capturing the sixth media item based on whether a change in pose of the respective person is detected within the threshold amount of time of capturing the fifth media item automatically allows the computer system to intelligently perform additional media capturing processes (e.g., or intelligently forgo performing additional media capturing processes) based on a change of pose of the fifth media item, thereby performing an operation when a set of conditions has been met without requiring further user input and providing additional control options without cluttering the user interface with additional displayed controls.

[0197] Note that details of the processes described above with respect to process 700 (e.g., FIG. 7) are also applicable in an analogous manner to other methods described herein. For example, process 800 optionally includes one or more of the characteristics of the various methods described above with reference to process 700. For example, the movement patterns of process 800 can be used to position a camera after detecting the first input of process 700. For brevity, these details are not repeated herein. [0198] FIG. 8 is a flow diagram illustrating a method (e.g., process 800) for repositioning a camera in accordance with some embodiments. Some operations in process 800 are, optionally, combined, the orders of some operations are, optionally, changed, and some operations are, optionally, omitted.

[0199] As described below, process 800 provides an intuitive way for repositioning a camera. Process 800 reduces the cognitive burden on a user, thereby creating a more efficient human-machine interface. For battery-operated computing devices, enabling a user to interact with such devices faster and more efficiently conserves power and increases the time between battery charges.

[0200] In some embodiments, process 800 is performed at a computer system (e.g., 600) that is in communication with a media capture component (e.g., 602) (e.g., a periscope camera, a telephoto camera, a wide-angle camera, and/or an ultra-wide-angle camera) and a movement component (e.g., an actuator (e.g., a pneumatic actuator, hydraulic actuator and/or an electric actuator), a movable base, a rotatable component, a motor, a lift, a level, and/or a rotatable base) (e.g., different and/or separate from the media capture component). In some embodiments, the computer system is a phone, a watch, a tablet, a fitness tracking device, a wearable device, a display, a movable computer system, an accessory, a speaker, a light, a head-mounted display (HMD), and/or a personal computing device. In some embodiments, the computer system includes the media capture component and/or the movement component.

[0201] While (802) capturing video (and/or media) via the media capture component (e.g. 602) (and/or a microphone) and in accordance with a determination that a first set of one or more capture conditions is satisfied, the computer system moves (804) (e.g., physically moves), via the movement component, a portion (e.g., a physical portion, the camera, a display component, a display, a center and/or particular portion of a display, and/or a hardware button) of the computer system (e.g., 600) that includes the media capture component (e.g., 602) in (e.g., via, using, and/or with) a first movement pattern, wherein moving the portion of the computer system in the first movement pattern causes framing of the video to change as the portion of the computer system moves (e.g., as described above at FIGS. 6G, 6J, 6M). In some embodiments, before capturing video via the media capture component, the computer system detects, via an input component (e.g., a camera, a depth sensor, a microphone, a hardware input mechanism, a rotatable input mechanism, a heart monitor, a temperature sensor, and/or a touch-sensitive surface) in communication with the computer system, an input corresponding to a request to capture media. In some embodiments, in response to detecting the input corresponding to the request to capture media, the computer system initiates capture of video via the media capture component.

[0202] While (802) capturing video via the media capture component and in accordance with a determination that a second set of one or more capture conditions is satisfied, wherein the second set of one or more capture conditions is different from the first set of one or more capture conditions, the computer system moves (806) (e.g., physically moves), via the movement component, the portion (e.g., a physical portion, the camera, a display component, a display, a center and/or particular portion of a display, and/or a hardware button) of the computer system (e.g., 600) in (e.g., via, using, and/or with) a second movement pattern different from the first movement pattern, wherein moving the portion of the computer system in the second movement pattern causes framing of the video (e.g., 644 and/or 648) to change as the portion of the computer system moves (e.g., as described above at FIGS. 6G, 6 J, 6M) (e.g., without moving the portion of the computer system in the first movement pattern). In some embodiments, while capturing video via the camera, the computer system moves (e.g., physically moves), via the movement component, the portion of the computer system, wherein moving the portion of the first computer system is in the first movement pattern when, while, in conjunction with, and/or in response to the first set of one or more capture conditions being satisfied, and wherein moving the portion of the first computer system is in the second movement pattern when, while, in conjunction with, and/or in response to the second set of one or more capture conditions being satisfied. Selectively moving the portion of the computer system in a respective pattern when a set of prescribed conditions is satisfied (e.g., the first set of one or more capture conditions is satisfied or the second set of one or more capture conditions is satisfied) automatically allows the computer system to intelligently reposition itself such that the media capturing process and the appearance the resulting media item is improved, thereby performing an operation when a set of conditions has been met without requiring further user input.

[0203] In some embodiments, in accordance with a determination that the computer system (e.g., 600) is configured to capture video (e.g., 644 and/or 648) using a first artistic style (e.g., a style of capture that a particular video, photographic, and/or graphics editor use to capture video) (e.g., a style associated with one or more visual characteristics such as tone, color, shading, positioning of user within a field of view (e.g., head or another body part of user being in the bottom third, middle third, or top third of the field of view), amount of zoom, amount of focus applied to different objects in the field of view, focus more on the background and/or objects in the background, focus on the foreground and/or objects in the foreground, a manner in which an artist selectively chooses to portray their subject matter that is determined based on characteristics such as form, color, and/or composition), a set of visual characteristics and/or audio characteristics of media that has become associated with an artist, the manner in which an artistic selectively chooses to create the media (e.g., using a respective medium, using a respective lighting setting, spatial orientation of the computer system), the first set of one or more capture conditions is satisfied (and the second set of one or more capture conditions is not satisfied) (e.g., as described above at FIG. 6L). In some embodiments, in accordance with a determination that the media capture component are and/or the computer system is configured to capture video with a (e.g., based on, according to) first set of visual characteristics (e.g., tone, color, shading, positioning of users within the field of view of the media capture component, amount of zoom, focus is on foreground, focus is on background), the first set of one or more capture conditions is satisfied (e.g., and the second set of one or more capture conditions is not satisfied). In some embodiments, in accordance with a determination that the computer system is configured to capture video using a second artistic style different from the first artistic style, the second set of one or more capture conditions is satisfied (e.g., as described above at FIG. 6L) (e.g., and the first set of one or more capture conditions is not satisfied). In some embodiments, in accordance with a determination that the media capture component are and/or the computer system are configured to capture video with a (e.g., based on, according to) second set of visual characteristics (e.g., tone, color, shading, positioning of users within the field of view of the media capture component, amount of zoom, focus is on foreground, focus is on background) different from the first set of visual characteristics, the second set of one or more capture conditions is satisfied and the first set of one or more camera conditions is not satisfied. In some embodiments, the first set of visual characteristics and/or the second set of visual characteristics are based on a user preference. In some embodiments, the first set of visual characteristics and/or the second set of visual characteristics is based on a preference of an individual other than the user. In some embodiments, the computer system displays an indication of the respective set of visual characteristics that the video will be captured with while the media capture component is configured to capture video with the respective set of visual characteristics. Selectively moving in a respective manner when a set of prescribed prescriptions are met (e.g., the computer system is configured to capture video using a first or second artistic style) automatically allows the computer system to indicate to a user what style the computer system will capture the video with, thereby performing an operation when a set of conditions has been met without requiring further user input.

[0204] In some embodiments, the computer system detects an input (e.g., tap input, swipe input, rotation of a rotatable input mechanism, voice command, gaze, and/or air gesture) corresponding to a respective artistic style (e.g., 605b and/or 605f) (e.g., before and/or while capturing video (and/or media) via the media capture component). In some embodiments, after detecting the input corresponding to a respective artistic style and in response to the occurrence of a triggering condition corresponding to capturing video (e.g., 644 and/or 648) (e.g., a user input or an automatic trigger) and in accordance with a determination that the respective artistic style corresponds to the first artistic style, the computer system captures video using the first artistic style (e.g., as described above at FIGS. 6J, 6K, 6L and 6M) (and not the second artistic style and/or without configuring the computer system to capture video using the second artistic style). In some embodiments, after detecting the input corresponding to the respective artistic style and in response to the occurrence of the triggering condition corresponding to capturing video and in accordance with a determination that the respective artistic style corresponds to the second artistic style, the computer system captures video using the second artistic style (e.g., as described above at FIGS. 6J, 6K, 6L and 6M) (and not the first artistic style and/or without configuring the computer system to capture video using the first artistic style). In some embodiments, the media capture component are and/or the computer system is configured to captured video with the second set of visual characteristics in response to the computer system detecting an input that corresponds to a user. Capturing video using a respective artistic style after detecting an input corresponding to the respective artistic style and in response to the occurrence of the first triggering condition corresponding to capturing video automatically allows the computer system to capture video using a user preferred/requested style, thereby performing an operation when a set of conditions has been met without requiring further user input.

[0205] In some embodiments, the computer system detects an occurrence of one or more respective conditions without detecting an input (e.g., 605a, 605f, 605h, and/or 605k) from a user (e.g., before and/or while capturing video (and/or media) via the media capture component). In some embodiments, after detecting the occurrence of one or more respective conditions without detecting the input from the user and in response to the occurrence of a triggering condition corresponding to capturing video (e.g., 644 and/or 648) (e.g., a user input or automatic trigger) and in accordance with a determination that the one or more respective conditions corresponds to the first artistic style, the computer system captures video using the first artistic style (and not the second artistic style and/or without configuring the computer system to capture video using the second artistic style). In some embodiments, after detecting the occurrence of one or more respective conditions without detecting the input from the user and in response to the occurrence of the triggering condition corresponding to capturing video and in accordance with a determination that the one or more respective conditions corresponds to the second artistic style, the computer system captures video using the second artistic style (and not the first artistic style and/or without configuring the computer system to capture video using the first artistic style) (e.g., as described above at FIG. 6J). Capturing video using a respective artistic style after detecting the occurrence of one or more respective conditions without detecting the input from the user allows the computer system to capture video using an artistic style that best complements the one or more respective conditions, thereby performing an operation when a set of conditions has been met without requiring further user input.

[0206] In some embodiments, in accordance with a determination that a first set of one or more media capture settings (e.g., a flash setting, a brightness setting, a setting that indicates whether the computer system is configured to capture a particular type of media (e.g., a still photo, a video, a series of animated images, a panoramic photo, and/or a portrait)) of the computer system (e.g., 600) is active, the first set of one or more capture conditions is satisfied (e.g., and the second set of one or more capture conditions is not satisfied) (e.g., as described above at FIGS. 6L and 6M). In some embodiments, in accordance with a determination that a second set of one or more media capture settings, different from the first set of one or more media capture settings, of the computer system is active (e.g., and the first set of one or more media capture settings of the computer system is inactive), the second set of one or more capture conditions is satisfied (e.g., and the first set of one or more capture conditions is not satisfied) (e.g., as described above at FIGS. 6L and 6M). In some embodiments, the computer system displays an indication with respect to which respective media capture setting of the computer system is active. In some embodiments, the respective media capture setting corresponds to the type of media (e.g., still photo or video) that is captured via the media capture component. In some embodiments, the respective media setting corresponds to a configuration of the media capture component (e.g., the media capture component is configured to capture portraits, panoramic photos, time-lapse photos, and/or videos in slow motion). Moving the portion of the computer system in a respective manner (e.g., the first manner or the second manner) based on which set of one or more media capture settings is active allows the computer system to indicate which guidelines (e.g., settings) the computer system is following while capturing the video, thereby performing an operation when a set of conditions has been met without requiring further user input and providing improved feedback.

[0207] In some embodiments, the first media capture setting of the computer corresponds to the capture of media (e.g., video and/or still photos) with a first set of one or more colors (e.g., the media is captured with shades (e.g., black and white)). In some embodiments, the second media capture setting of the computer system (e.g., 600) corresponds to the capture of media with a second set of one or more colors (e.g., more colors than black and white) different form the first set of one or more colors (e.g., as described above at FIGS. 6L and 6M). In some embodiments, the computer system has a respective media capture setting where a first portion of media is captured without color and a second portion of media is captured with color. In some embodiments, the first set of colors does not include one or more colors from the second set of colors. In some embodiments, the second set of colors includes one or more colors from the first set of colors. In some embodiments, a color filter and/or a black and white color filter is applied to an image captured with the first set of colors and the color filter and/or the black and white color filter is not applied to an image captured with the second set of colors.

[0208] In some embodiments, moving the portion of the computer system (e.g., 600) that includes the media capture component (e.g., 602) in the first movement pattern includes performing, via the movement component, a first type of movement (e.g., translational movement and/or rotational movement and/or movement) and performing, via the movement component, a second type of movement different from the first type of movement (e.g., as described above at FIG. 6G). In some embodiments, moving the portion of the computer system that includes the media capture component in the second movement pattern includes performing, via the movement component, a third type of movement and performing, via the movement component, a fourth type of movement different from the third type of movement. In some embodiments, the first type of movement, the second type of movement, the third type of movement, and the fourth type of movement are different types of movement. In some embodiments, the first type of movement and the third type of movement are the same types of movement. In some embodiments, the first type of movement and the fourth type of movement are the same types of movement. In some embodiments, the computer system concurrently performs the two or more types of movement. In some embodiments, the computer system performs the two or more types of movement in a serial manner. Performing a first type of movement and a second type of movement as part of moving the portion of the computer system in the first movement pattern allows the computer system to better frame the content of the video as one or more conditions change, thereby performing an operation when a set of conditions has been met without requiring further user input, providing improved feedback (e.g., that the video is being captured), and providing additional control options without cluttering the user interface with additional displayed controls.

[0209] In some embodiments, the first type of movement of the portion of the computer system (e.g., 600) that includes the media capture component (e.g., 602) is a movement in a first lateral (e.g., sideways, leftward, and/or to the side) direction (e.g., as described above at FIG. 6G) (e.g., along a lateral axis of the movement component). In some embodiments, the second type of movement of the computer system that includes the media capture component is movement in a second lateral (e.g., sideways, rightward, and/or to the side) direction, opposite the first lateral direction, along the lateral axis of the movement component. In some embodiments, the two or more types of movement (e.g., the first type of movement and the second type of movement) include movement in the same lateral direction by different magnitudes. In some embodiments, the two or more types of movement include movement in the same lateral direction at different speeds. In some embodiments, the two or more types of movement include movement in the same lateral direction while rotating in different directions. In some embodiments, the two or more or more types of movement include movement in the same lateral direction while rotating at different speeds. Moving the portion of the computer system that includes the media capture component in the first lateral direction as a part of moving the portion of the computer system that includes the media capture component in the first movement pattern allows the computer system to maintain the framing of a user as the user moves in a lateral direction during the capturing of the video, thereby performing an operation when a set of conditions has been met without requiring further user input, providing improved feedback (e.g., that the video is being captured), and providing additional control options without cluttering the user interface with additional displayed controls. [0210] In some embodiments, the first type of movement of the portion of the computer system (e.g., 600) that includes the media capture component (e.g., 602) includes movement in a first vertical direction (e.g., as described above at FIG. 6G) (e.g., upwards and/or downwards) (e.g., along a vertical axis of the movement component). In some embodiments, the second type of movement includes movement in a second vertical direction (e.g., upwards and/or downwards), opposite the first vertical direction, along the vertical axis of the movement component. In some embodiments, the two or more types of movement (e.g., the first type of movement and the second type of movement) include movement in the same vertical direction by different magnitudes. In some embodiments, the two or more types of movement include movement in the same vertical direction at different speeds. In some embodiments, the two or more types of movement include movement in the same vertical direction while rotating in different directions. In some embodiments, the two or more types of movement include movement in the same vertical direction while rotating at different speeds. Moving the portion of the computer system that includes the media capture component in the first vertical direction as a part of moving the portion of the computer system in the first movement pattern allows the computer system to maintain the framing of a user as the user moves in a vertical direction during the capturing of the video, thereby performing an operation when a set of conditions has been met without requiring further user input, providing improved feedback (e.g., that the video is being captured), and providing additional control options without cluttering the user interface with additional displayed controls.

[0211] In some embodiments, the first type of movement of the portion of the computer system (e.g., 600) that includes the media capture component (e.g., 602) includes movement in a first longitudinal (e.g., forward direction and/or backward direction) (e.g., movement towards a user and/or away from the user) direction (e.g., along a longitudinal axis of the movement component) (e.g., as described above at FIG. 6G). In some embodiments, the second type of movement includes movement in a second longitudinal (e.g., forward direction and/or backward direction) (e.g., movement towards a user and/or away from the user), opposite the first longitudinal direction, direction along the longitudinal axis of the movement component. In some embodiments, the two or more types of movement include movement in the same longitudinal direction by different magnitudes. In some embodiments, the two or more types of movement include movement in the same longitudinal direction at different speeds. In some embodiments, the two or more types of movement include movement in the same longitudinal direction while rotating in different directions. In some embodiments, the two or more types of movement include movement in the same longitudinal direction while rotating at different speeds. Moving the portion of the computer system that includes the media capture component in the first longitudinal direction as a part of moving the portion of the computer system in the first movement pattern allows the computer system to maintain and/or change a distance between the portion of the computer system and content of the video, thereby performing an operation when a set of conditions has been met without requiring further user input, providing improved feedback (e.g., that the video is being captured), and providing additional control options without cluttering the user interface with additional displayed controls.

[0212] In some embodiments, one or more of the first type of movement of the portion of the computer system (e.g., 600) that includes the media capture component (e.g., 602) and the second type of movement of the portion of the computer system that includes the media capture component includes rotational movement (e.g., as described above at FIG. 6G). In some embodiments, the first type of movement includes rotation (e.g., yaw, roll, and/or pitch rotation) about a first axis of the movement component. In some embodiments, the second type of movement includes rotation (e.g., yaw, roll, and/or pitch rotation) about a second axis of the movement component. In some embodiments, the first axis and the second axis are different. In some embodiments, the first axis and the second axis are the same. In some embodiments, the computer system rotates about the first axis and the second axis at different speeds. In some embodiments, the computer system rotates about the first axis and the second axis at the same speed. In some embodiments, the computer system translates about a first respective axis of the movement component while the computer system rotates about a second respective axis of the movement component.

[0213] In some embodiments, the portion of the computer system (e.g., 600) that includes the media capture component (e.g., 602) moves at a first rate (e.g., as measured in feet per second, meters per second, and/or inches per second) (e.g., speed and/or acceleration) while the portion of the computer system that includes the media capture component moves in the first movement pattern. In some embodiments, the portion of the computer system that includes the media capture component moves at a second rate (e.g., as measured in feet per second, meters per second, and/or inches per second) different from the first rate (e.g., greater than or less than the first speed) while the portion of the computer system that includes the media capture component moves in the second movement pattern. In some embodiments, the portion of the computer system accelerates and/or decelerates while the portion of the computer system moves in a respective movement pattern. In some embodiments, the portion of the computer system does not accelerate or decelerate while the portion of the computer system moves in the respective movement pattern. Moving the portion of the computer system that includes the media capture component at a respective rate when a set of conditions is met (e.g., the first set of one or more conditions or the second set of one or more conditions is satisfied) automatically allows the computer system to indicate which capture conditions are satisfied, thereby performing an operation when a set of conditions has been met without requiring further user input.

[0214] In some embodiments, the framing of the video (e.g., 644 and/or 648) changes (e.g., tracking the user and/or following the user) based on (e.g., according to and/or using) a first set of one or more tracking parameters (e.g., how closely a user is tracked, how long the user is tracked for, how long the user is out of frame for, criteria for ceasing the tracking of a user, a magnification level of the media capture component, and/or positioning of the user within the frame of the media capture component) while the portion of the computer system (e.g., 600) that includes the media capture component (e.g., 602) moves in the first movement pattern. In some embodiments, the framing of the video changes (e.g., tracking of the user and/or following the user) based on (e.g., according to and/or using) a second set of one or more tracking parameters (e.g., how closely a user is tracked, how long the user is tracked for, how long the user is out of frame for, criteria for ceasing the tracking of a user, a magnification level of the media capture component, and/or positioning of the user and/or a body part of the user within the frame of the one more cameras, such as a head of the user being kept in the middle third, top third, or bottom third of the field of view of the media capture component and/or of a portion of the field of view that corresponds to the top, middle, and/or bottom third of the media), different from the first set of one or more tracking parameters, while the portion of the computer system that includes the media capture component moves in the second movement pattern (e.g., as described above at FIG. 6G). In some embodiments, the framing of the video changes differently when the framing changes based on the first set of one or more tracking parameters than when the framing changes based on the second set of one or more tracking parameters. Changing the framing of the video differently based on whether the portion of the computer system that includes the media capture component is moving with respect to the first movement pattern or the second movement pattern allows the computer system to indicate, via the framing of the video, whether the computer system is moving with respect to the first movement pattern or the second movement pattern, thereby providing improved visual feedback.

[0215] Note that details of the processes described above with respect to process 800 (e.g., FIG. 8) are also applicable in an analogous manner to other methods described herein. For example, process 900 optionally includes one or more of the characteristics of the various methods described above with reference to process 800. For example, the composition guidance of process 900 can be provided before or after moving the computer system in a movement pattern of process 800. For brevity, these details are not repeated herein.

[0216] FIG. 9 is a flow diagram illustrating a method (e.g., process 900) for providing composition guidance in accordance with some embodiments. Some operations in process 900 are, optionally, combined, the orders of some operations are, optionally, changed, and some operations are, optionally, omitted.

[0217] As described below, process 900 provides an intuitive way for providing composition guidance. Process 900 reduces the cognitive burden on a user, thereby creating a more efficient human-machine interface. For battery-operated computing devices, enabling a user to interact with such devices faster and more efficiently conserves power and increases the time between battery charges.

[0218] In some embodiments, process 900 is performed at a computer system (e.g., 600) that is in communication with a media capture component (e.g., 602) (e.g., a sensor, an environmental sensor, a capture component, an input component (e.g., a camera, a depth sensor, a microphone, a hardware input mechanism, a rotatable input mechanism, a heart monitor, a temperature sensor, and/or a touch-sensitive surface), a camera (e.g., a periscope camera, a telephoto camera, a wide-angle camera, and/or an ultra-wide-angle camera), a depth sensor, a microphone, a heart monitor, and/or a temperature sensor), an input component (e.g., 602) (e.g., a camera, a depth sensor, a microphone, a hardware input mechanism, a rotatable input mechanism, a heart monitor, a temperature sensor, and/or a touch-sensitive surface), and an output component (e.g., a display component (e.g., a display screen, a projector, and/or a touch-sensitive display), an audio component (e.g., smart speaker, home theater system, soundbar, headphone, earphone, earbud, speaker, television speaker, augmented reality headset speaker, audio jack, optical audio output, Bluetooth audio output, and/or HDMI audio output), a speaker, and/or a haptic output device). In some embodiments, the computer system is a phone, a watch, a tablet, a fitness tracking device, a wearable device, a display, a movable computer system, an accessory, a speaker, a light, a head-mounted display (HMD), and/or a personal computing device. In some embodiments, the media capture component includes and/or is the input component and/or the output component (e.g., an input and output component). In some embodiments, the output component is different from the media capture component and/or the input component. In some embodiments, the input component is different from the media capture component and/or the output component.

[0219] The computer system detects (902), via the input component (e.g., 602), a first set of one or more inputs (e.g., 605a, 605f, 605h, and/or 605k) (e.g., a verbal input (e.g., a verbal utterance, a sound, an audible request, an audible command, and/or an audible statement) and/or a non-verbal input (e.g., a swipe input, a hold-and-drag input, a gaze input, an air gesture, and/or a mouse click)) corresponding to one or more instructions that include one or more spoken words (e.g., instructions included in 605a, 605f, 605h, and/or 605k). In some embodiments, the first set of one or more inputs are detected while in a media capture mode (e.g., a mode in which the computer system is configured to capture an image, a video, and/or audio). In some embodiments, the first set of one or more inputs are detected while the computer system displays, via a display component in communication with the computer system, a user interface (e.g., corresponding to the media capture mode). In some embodiments, the first set of one or more inputs are detected while the computer system displays, via a display component in communication with the computer system, a live preview of output (e.g., an image, a video, and/or audio) of the media capture component.

[0220] In response to detecting the first set of one or more inputs (e.g., 605a, 605f, 605h, and/or 605k) corresponding to the one or more instructions, the computer system prepares (904) to capture media (e.g., an image, a video, and/or audio) via the media capture component (e.g., 602). In some embodiments, preparing to capture media occurs before capturing one or more images and/or media, preparing to capture media occurs while displaying a representation of a field of view of one or more cameras and/or one or more media capture components.

[0221] While (906) (and/or in conjunction with) preparing to capture media via the media capture component and in accordance with a determination that the one or more instructions includes first content (e.g., a first set of one or more spoken words), the computer system provides (908) (and/or outputs), via the output component, first composition guidance (e.g., 642 and/or 646) (e.g., visual, haptic, and/or audio output by the computer system that includes instructions for changing the spatial orientation of one or more subjects within the field of view of the media capture component (e.g., the instructions instruct the one or more subjects to move closer, further, to the left, down and/or up within the field of view of the media capture component and/or the instructions instruct the one or more subjects to change the spatial relationship between the one or more subjects)), wherein the first composition guidance includes one or more recommendations for changing a spatial arrangement of one or more objects (e.g., 640) in the field of view (e.g., 654) of the media capture component (e.g., 602).

[0222] While (906) preparing to capture media via the media capture component and in accordance with a determination that the one or more instructions includes second content (e.g., a second set of one or more spoken words) different from the first content (e.g., different from the first set of one or more spoken words), the computer system provides (910) (and/or outputs), via the output component, second composition guidance (e.g., 642 and/or 646) (in conjunction with, with, and/or without providing, via the output component, the first composition guidance) different from the first composition guidance (e.g., 642 and/or 646), wherein the second composition guidance includes one or more recommendations for changing a spatial arrangement of one or more objects (e.g., 640) in the field of view (e.g., 654) of the media capture component (e.g., 602). Selectively outputting composition guidance when a set of prescribed conditions are met (e.g., the one or more instructions includes first content or second content) allows the computer system to automatically tailor the composition guidance to a level of proficiency of a user of the computer system such that users that are less proficient are provided greater guidance than users that are more proficient, thereby performing an operation when a set of conditions has been met without requiring further user input.

[0223] In some embodiments, while (and/or, in some embodiments, before, after, or in conjunction with) preparing to capture media via the media capture component (e.g., 602) and in accordance with a determination that a respective subject (e.g., 622) is positioned at a first location within the field of view (e.g., 654) of the media capture component (e.g., centered within the field of view of the media capture components, left or right of center of the field of view of the media capture component, and/or above or below the center of the field of view of the media capture component), the computer system provides, via the output component, third composition guidance (e.g., 642 and/or 646) (e.g., that includes one or more recommendations for moving one or more objects within the field of view of the media capture component and/or outside of the field of view of the media capture component). In some embodiments, while preparing to capture media via the media capture component and in accordance with a determination that the respective subject is positioned at a second location different from the first location within the field of view of the media capture component, the computer system provides, via the output component, fourth composition guidance (e.g., 642 and/or 646) different from the third composition guidance (e.g., that includes one or more recommendations for moving one or more objects within the field of view of the media capture component and/or outside of the field of view of the media capture component). In some embodiments, the third composition guidance is different or the same as the first and/or second composition guidance. In some embodiments, the fourth composition guidance is different or the same as the first and/or second composition guidance. Selectively providing composition guidance based on the positioning of the user allows the computer system to automatically provide an indication of the user’s positioning relative to the computer system, thereby performing an operation when a set of conditions has been met without requiring further user input.

[0224] In some embodiments, while (and/or, in some embodiments, before, after, or in conjunction with) preparing to capture media (e.g., 605a, 605f, 605h, and/or 605k) via the media capture component (e.g., 602) and in accordance with a determination that a first set of elements are detected (e.g., in the field of view of the media capture component and/or in the environment (e.g., the environment of the computer system and/or the environment of a user)) (e.g., animate elements and/or inanimate elements), the computer system provides, via the output component, fifth composition guidance (e.g., 642 and/or 646) (e.g., that includes one or more recommendations for changing a spatial arrangement of the first set of elements). In some embodiments, while preparing to capture media via the media capture component and in accordance with a determination that a second set of elements different from the first set of elements is detected (e.g., in the field of view of the media capture component and/or in the environment (e.g., the environment of the computer system and/or the environment of a user)), the computer system provides, via the output component, sixth composition guidance (e.g., 642 and/or 646) (e.g., that includes one or more recommendations for changing a spatial arrangement of the second set of elements) different from the fifth composition guidance. In some embodiments, the fifth composition guidance is different or the same as the first and/or second composition guidance. In some embodiments, the sixth composition guidance is different or the same as the first and/or second composition guidance. Selectively providing composition guidance based on which elements are detected automatically allows the computer system to provide an indication of which elements are within the field of view of the media capture component, thereby performing an operation when a set of conditions has been met without requiring further user input.

[0225] In some embodiments, in accordance with a determination that the first set of one or more inputs (e.g., 605a, 605f, 605h, and/or 605k) has a first level of detail, the first composition guidance (e.g., 642 and/or 646) (e.g., in accordance with a determination that the one or more instructions includes first content) (or the second composition guidance in accordance with a determination that the one or more instructions includes second content) has a first amount of information (e.g., direction, guidance, and/or instruction) (e.g., as described above at FIG. 61). In some embodiments, in accordance with a determination that the first set of one or more inputs has a second level of detail that is greater than the first level of detail, the first composition guidance (e.g., in accordance with a determination that the one or more instructions includes first content) (or the second composition guidance in accordance with a determination that the one or more instructions includes second content) has a second amount of information (e.g., direction, guidance, and/or instruction) that is greater than the first amount of information (e.g., as described above at FIG. 61) (e.g., the first amount of information instructs respective subjects to move closer to the computer system while the second amount of information instructs the respective subjects to move 8 feet closer to the computer system and/or the first amount of information instructs the respective subjects to move to the left while the second amount of information instructs the respective subjects to move to the left until the respective subjects are positioned in front of an object (e.g., a chair, a couch, and/or a television)). Selectively providing the first composition guidance with a particular amount of information automatically allows the computer system to tailor the content of its guidance based on the input, thereby performing an operation when a set of conditions has been met without requiring further user input.

[0226] In some embodiments, while (and/or, in some embodiments, before, after, or in conjunction with) preparing to capture media (e.g., 605a, 605f, 605h, and/or 605k) via the media capture component (e.g., 602) (and, in some embodiments, in accordance with a determination that the one or more instructions includes first content or, in some embodiments, in accordance with a determination that the one or more instructions includes second content) and in accordance with a determination that the first set of one or more inputs (e.g., 605a, 605f, 605h, and/or 605k) correspond to a first individual (e.g., 622) (e.g., the first set of one or more inputs is performed by the first individual and/or the first set of one or more inputs refer to the first individual) (e.g., an individual that is registered with the computer system or an individual that is not registered with the computer system) (e.g., a subject, a user, a person, an animal, and/or an object), the computer system provides, via the output component, eighth composition guidance (e.g., 642 and/or 646) (e.g., that includes one or more individual specific recommendations for changing a spatial arrangement of the first individual and/or other individuals positioned within the field of view of the media capture component). In some embodiments, while preparing to capture media via the media capture component and in accordance with a determination that the first set of one or more inputs correspond to a second individual (e.g., 622) (e.g., the first set of one or more inputs is performed by the second individual and/or the first set of one or more inputs refer to the second individual) (e.g., an individual that is registered with the computer system or an individual that is not registered with the computer system) different from the first individual (e.g., a subject, a user, a person, an animal, and/or an object), the computer system provides, via the output component, ninth composition guidance (e.g., 642 and/or 646) (e.g., that includes one or more individual specific recommendations for changing a spatial arrangement of the second individual and/or other individuals positioned within the field of view of the media capture component) different from the eighth composition guidance. In some embodiments, the composition guidance is based on which individual provided the first set of one or more inputs and the content of the first set of one or more inputs. Selectively providing a type of composition guidance when a set of conditions is met automatically allows the computer system to tailor the content of its guidance on a user-by-user basis, thereby performing an operation when a set of conditions has been met without requiring further user input.

[0227] In some embodiments, the first composition guidance (e.g., 642 and/or 646) (and/or second composition guidance) includes one or more recommendations (e.g., audible instructions and/or visual instructions) that one or more characteristics of lighting (e.g., amount of lighting, brightness of the lighting, color of the lighting, tone of the lighting, and/or hue of the lighting) in an environment in the field of view (e.g., 654) of the media capture component (e.g., 602) (e.g., an environment of the user and/or an environment of the computer system) should be changed (e.g., as described above at FIG. 61). In some embodiments, the second composition guidance includes instruction for changing one or more characteristics of light in the environment. In some embodiments, the computer system automatically (e.g., without intervening user input) changes one or more characteristics of lighting in response to detecting the first set of one or more inputs. Providing one or more recommendations that one or more characteristics of lighting in the environment in the field of view of the media capture component should be changed while preparing to capture media via the media capture component allows the computer system to provide an indication with respect to the state of the computer system (e.g., the computer system is preparing to capture media) and the state of the environment in the field of view of the media capture component (e.g., the current characteristics of the lighting are not optimal for capturing media), thereby providing improved feedback and providing additional control options without cluttering the user interface with additional displayed controls.

[0228] In some embodiments, the first composition guidance (e.g., 642 and/or 646) (and/or second composition guidance) includes (e.g., audible instructions and/or visual instructions) one or more recommendations that an amount of light (e.g., environmental light, ambient light, one or more lights in the environment that are not physically coupled to the computer system) in an environment in the field of view (e.g., 654) of the media capture component (e.g., 602) should be changed (e.g., as described above at FIG. 61) (e.g., increasing the amount of light, or decreasing the amount of light). Providing one or more recommendations that an amount of light in the environment in the field of view of the media capture component should be changed while preparing to capture media via the media capture component allows the computer system to provide an indication with respect of the state of the computer system (e.g., the computer system is preparing to capture media) and the state of the environment in the field of view of the media capture component (e.g., there is too little or too much light in the environment), thereby providing improved feedback and providing additional control options without cluttering the user interface with additional displayed controls.

[0229] In some embodiments, the first composition guidance (e.g., 642 and/or 646) (and/or second composition guidance) includes (e.g., audible instructions and/or visual instructions) one or more recommendations that a type of light (e.g., direct light, indirect light, artificial light, and/or natural light) in an environment in the field of view (e.g., 654) of the media capture component (e.g., 602) should be changed (e.g., as described above at FIG. 61). In some embodiments, the instructions for changing the one or more characteristics of lighting in the environment include instructions for changing two or more types of light in the environment. Providing one or more recommendations that a type of light in the environment in the field of view of the media capture component (should be changed while preparing to capture media via the media capture component allows the computer system to provide an indication with respect of the state of the computer system (e.g., the computer system is preparing to capture media) and the state of the environment in the field of view of the media capture component ( (e.g., the type of lighting in the environment is not optimal for capturing the media), thereby providing improved feedback and providing additional control options without cluttering the user interface with additional displayed controls.

[0230] In some embodiments, the first composition guidance (e.g., 642 and/or 646) (and/or second composition guidance) includes one or more recommendations that one or more colors of light in an environment in the field of view (e.g., 654) of the media capture component (e.g., 602) should be changed (e.g., as described above at FIG. 61). Providing one or more recommendations that one or more colors of light in the environment in the field of view of the media capture component ( should be changed while preparing to capture media via the media capture component allows the computer system to provide an indication with respect of the state of the computer system (e.g., the computer system is preparing to capture media) and the state of the environment in the field of view of the media capture component ( (e.g., the color of the light in the environment is not optimal for capturing the media), thereby providing improved feedback and providing additional control options without cluttering the user interface with additional displayed controls.

[0231] In some embodiments, the computer system (e.g., 600) is in communication with a set of one or more external lights (e.g., an illumination device, a point light source, a spotlight, and/or one or more light sources). In some embodiments, while preparing to capture media (e.g., 605a, 605f, 605h, and/or 605k) (and/or, in some embodiments, before, after, or in conjunction with) via the media capture component (e.g., 602) and without detecting an input corresponding to a subject (e.g., a user input and/or an input that is performed by the subject) (e.g., automatically), the computer system sends instructions to the set of one or more external lights, wherein sending the instructions to the set of the one or more external lights causes one or more characteristics of lighting in an environment in the field of view (e.g., 654) of the media capture component to change (e.g., as described above at FIG. 61). In some embodiments, the instructions are sent to the set of one or more external lights in response to the computer system receiving confirmation that the subject approves changing the one or more characteristics of lighting in the environment. In some embodiments, the instructions are sent to the set of one or more external lights without the computer system receiving confirmation that the subject approves changing the one or more characteristics of lighting in the environment. Sending instructions to the set of one or more external lights without detecting an input corresponding to the subject while preparing to capture media automatically allows the computer system to control one or more characteristics of the lighting in the environment in the field of view of the media capture component to improve the capturing of the media, thereby performing an operation when a set of conditions has been met (e.g., the computer system is preparing to capture media) without requiring further user input.

[0232] In some embodiments, the first composition guidance (e.g., 642 and/or 646) includes one or more recommendations (e.g., an audible prompt and/or visual prompt) that one or more objects (e.g., 640) (e.g., animate objects and/or inanimate objects) should be moved (e.g., that someone should move the object) (e.g., as described above at FIG.61). Providing one or more recommendations that one or more objects should be moved while the computer system is preparing to capture media allows the computer system to indicate the state of the computer system (e.g., that the computer system is preparing to capture media) and cause the movement of objects in the environment to improve the capturing of the media and the resulting media item, thereby providing improved feedback and performing an operation when a set of conditions has been met (e.g., the computer system is preparing to capture media) without requiring further user input.

[0233] In some embodiments, the first composition guidance (e.g., 642 and/or 646) includes one or more recommendations (e.g., an audible prompt and/or visual prompt) that a subject (e.g., 622) should move (e.g., to a particular position) (e.g., move towards the computer system and/or the media capture component, move away from the computer system and/or the media capture component, move to the left of the computer system and/or the media capture component, and/or move to the right of the computer system and/or the media capture component) from a first position to a second position (e.g., in an environment) (e.g., as described above at FIG. 6L). In some embodiments, the prompt for the subject to move includes instructions to rotate (e.g., bend over) and/or perform translational movement (e.g., move to the right and/or move to the left). Providing one or more recommendations that a subject should move while the computer system is preparing to capture media allows the computer system to indicate the state of the computer system (e.g., that the computer system is preparing to capture media) and recommend the movement of the subject in the environment to improve the capturing of the media and the resulting media item, thereby providing improved feedback and performing an operation when a set of conditions has been met (e.g., the computer system is preparing to capture media) without requiring further user input.

[0234] In some embodiments, while (and/or, in some embodiments, before, after, or in conjunction with) preparing to capture media (e.g., 605a, 605f, 605h, and/or 605k) via the media capture component (e.g., 602) and in accordance with a determination that the positioning of a portion of a respective subject (e.g., 622) (e.g., the face of the respective subject, the head of the respective subject, the torso of the respective subject, an arm of the respective subject, a leg of the respective subject, and/or an extremity of the respective subject) within the field of view (e.g., 654) of the media capture component satisfies a first set of one or more positioning criteria (e.g., the portion of the respective subject is left of center of the field of view of the media capture component, the portion of the respective subject is right of center of the field of view of the media capture component, the portion of the respective subject is beneath the center of the field of view of the media capture component, the portion of the respective subject is above the center of the field of view and/or the portion of the respective subject is at the center of the field of view of the media capture component) (e.g., the portion of the respective subject continues to be in a respective portion of the field of view and/or a portion of capture media corresponding to the field of view of the one or more cameras, such as the middle, top, and/or bottom third (or half, fourth, fifth, sixth, etc.) of the field of view of the one or more cameras) relative to the field of view of the media capture component, the computer system provides, via the output component, tenth composition guidance (e.g., 642 and/or 646) (e.g., the tenth composition guidance includes one or more recommendations for changing a spatial positioning of one or more users in the field of view of the one or more cameras such that the user’s head is centered within a third of the field of view of the one or more cameras) (e.g., as described above at FIG. 61). In some embodiments, while preparing to capture media via the media capture component and in accordance with a determination that the position of the portion of the respective subject within the field of view of the media capture component satisfies a second set of one or more positioning criteria (e.g., the portion of the user is left of center of the field of view of the media capture component, the portion of the user is right of center of the field of view of the media capture component, the portion of the user is beneath the center of the field of view of the media capture component, the portion of the user is above the center of the field of view and/or the portion of the user is at the center of the field of view of the media capture component), different from the first set of one of one or more positioning criteria, relative to the field of view of the media capture component, the computer system provides, via the output component, eleventh composition guidance (e.g., 642 and/or 646) different from the tenth composition guidance (e.g., as described above at FIG. 61). Selectively providing composition guidance based on the positioning of the portion of the respective subject automatically allows the computer system to provide tailored guidance for adjusting the positioning of the portion of the respective subject relative to the field of view of the media capture component such that the media capturing process is improved, thereby performing an operation when a set of conditions has been met without requiring further user input.

[0235] In some embodiments, the tenth composition guidance (e.g., 642 and/or 646) (and/or the eleventh composition guidance) is provided in reference to the positioning of the portion of the respective subject relative to a fixed reference point (e.g., as described above at FIG. 61) (e.g., middle, bottom, or top third (or half, fourth, fifth, sixth, etc.) of the field of view and/or a portion of media captured corresponding to a portion of the field of view) (e.g., center face horizontally and/or vertically). Providing composition guidance in reference to the positioning of the portion of the respective subject relative to the fixed reference point automatically allows the computer system to provide an indication of the spatial relationship between the portion of the respective subject and the fixed reference point, thereby performing an operation when a set of conditions has been met without requiring further user input.

[0236] In some embodiments, the tenth composition guidance (e.g., 642 and/or 646) is provided with respect to (e.g., according to, dependent upon, based on) a spatial relationship of the portion of the respective subject and the body (e.g., the upper torso, lower torso, arms, and/or legs) of the respective subject (e.g., as described above at FIG. 61) (e.g., mirror and/or follow head to body framing with subject to scene framing). In some embodiments, when the body of the respective subject is above the portion of the respective subject, the composition guidance is provided with respect to a lower boundary of a target zone of the field of view of the media capture component. In some embodiments, when the body of the respective subject is below the portion of the respective subject, the composition guidance is provided with respect to an upper boundary of a target zone of the field of view of the media capture component. Providing composition guidance in reference to the positioning of the portion of the respective subject relative to the body of the respective subject automatically allows the computer system to provide an indication of the spatial relationship between the portion of the respective subject and the body of the respective subject, thereby performing an operation when a set of conditions has been met without requiring further user input.

[0237] In some embodiments, while (and/or, in some embodiments, before, after, or in conjunction with) preparing to capture media (e.g., 605a, 605f, 605h, and/or 605k) via the media capture component (e.g., 602) and in accordance with a determination that a horizontal plane (e.g., a horizon, a pair of shoulders, the frame of a picture, a column, the surface of a table, and/or the surface of a desk) (e.g., horizontal lines and/or a level plane) (e.g., a physical and/or tangible physical plane) in the field of view (e.g., 654) of the media capture component has a first spatial orientation (e.g., relative to the spatial orientation of the media capture component, relative to a user, and/or relative to one or more objects positioned within the field of view of the media capture component), the computer system provides, via the output component, twelfth composition guidance (e.g., 642 and/or 646) (e.g., as described above at FIG. 61) (e.g., guidance that includes one or more recommendations for repositioning the horizontal plane and/or the media capture component such that the horizontal plane is level with the media capture component) (e.g., camera moves such that lines in the field of view (shoulders, horizon) are horizontal). In some embodiments, while preparing to capture media via the media capture component and in accordance with a determination that the horizontal plane in the field of view of the media capture component has a second spatial orientation (e.g., relative to the spatial orientation of the media capture component, relative to the user, and/or relative to one or more objects positioned within the field of view of the media capture component) different from the first spatial orientation, the computer system provides, via the output component, thirteenth composition guidance different from the twelfth composition guidance (e.g., 642 and/or 646) (e.g., as described above at FIG. 61) (e.g., guidance that includes one or more recommendations for repositioning the horizontal plane and/or the media capture component such that the horizontal plane is not angled within the field of view of the media capture component). In some embodiments, the computer system detects the horizontal plane in the field of view of the media capture component while preparing to capture media via the media capture component. Selectively providing composition guidance based on the orientation of the horizontal plane in the field of view of the media capture component automatically allows the computer system to provide guidance such that planes in the field of view of the media capture component are correctly aligned with the field of view of the media capture component, thereby performing an operation when a set of conditions has been met without requiring further user input.

[0238] In some embodiments, while (and/or, in some embodiments, before, after, or in conjunction with) preparing to capture media (e.g., 605a, 605f, 605h, and/or 605k) via the media capture component (e.g., 602) and in accordance with a determination that there is a first amount of distance (e.g.,.1-10 feet) between a boundary of the field of view (e.g., 654) of the media capture component and a portion of a respective subject (e.g., 622) (e.g., the end of a finger, the end of an arm, shoulder, the bottom of the user’s feet, the top of the user’s head, the edge of a desk, and/or the edge of a book case), the computer system provides, via the output component, fourteenth composition guidance (e.g., 642 and/or 646) (e.g., guidance that includes one or more recommendations for repositioning the user such that the distance between the extremity of the user and the boundary of the field of view increases or decreases such that the horizontal plane is not angled within the field of view of the media capture component). In some embodiments, while preparing to capture media via the media capture component and in accordance with a determination that there is a second amount of distance between the boundary of the field of view of the media capture component and the portion of the respective subject, the computer system provides, via the output component, fifteenth composition guidance (e.g., 642 and/or 646) different from the fourteenth composition guidance (e.g., guidance that includes one or more recommendations for repositioning the user such that the distance between the extremity of the user and the boundary of the field of view increases or decreases such that the horizontal plane is not angled within the field of view of the media capture component). Selectively providing composition guidance when prescribed conditions are met automatically allows the computer system to provide an indication of how close the respective subject is to a boundary of the field of view of media capture component such that the respective subject does not extend beyond the boundary of the field of view of the media capture component, thereby performing an operation when a set of conditions has been met without requiring further user input.

[0239] In some embodiments, the computer system (e.g., 600) is in communication (e.g., wired communication and/or wireless communication) with a movement component (e.g., an actuator (e.g., a pneumatic actuator, hydraulic actuator and/or an electric actuator), a movable base, a rotatable component, a motor, a lift, a level, and/or a rotatable base). In some embodiments, while (and/or, in some embodiments, before, after, or in conjunction with) preparing to capture media (e.g., 605a, 605f, 605h, and/or 605k) via the media capture component (e.g., 602) (e.g., before preparing to capture media via the media capture component), the computer system moves, via the movement component, a portion of the computer system (e.g., a portion of the computer system includes the media capture component) (e.g., and/or the media capture component) (e.g., to improve the alignment of users (animate and/or inanimate users) within the field of view of the media capture component ) (e.g., to improve the alignment of users within the field of view of the media capture component based on composition guidelines (e.g., rule of thirds, rule of odds, rule of space, fill the frame, golden triangles, and golden ratio)). In some embodiments, the computer system and/or the media capture component rotates and/or translates as a part of moving. Moving, via the movement component, the portion of the computer system while preparing to capture media allows the computer system to provide an indication of the state of the computer system (e.g., the computer system is preparing to capture media) while better aligning the media capture component with the content to be captured, thereby providing improved feedback and performing an operation when a set of conditions has been met (e.g., while the computer system is preparing to capture media) without requiring further user input.

[0240] In some embodiments, the computer system outputs, via the output component, an indication (e.g., an explanation, a description, a clarification, and/or a demonstration (e.g., a visual demonstration)) of why the portion of the computer system (e.g., 600) is moving (e.g., as explained above at FIG. 6G) (e.g., before, during, or after moving the computer system to improve the alignment of users within the field of view of the media capture component). In some embodiments, the explanation is an audible explanation. In some embodiments, the explanation is displayed via a display component of the computer system. In some embodiments, the computer system outputs an indication of how the computer system will move. In some embodiments, the indication of why the computer system is moving includes an indication that the alignment of the computer system and/or a camera of the computer system is changing. Outputting, via the output component, an indication of why the portion of the computer system is moving allows the computer system to provide an indication to users of the computer system what conditions (e.g., conditions in the environment and or conditions of the user) will trigger movement of the computer system, thereby providing improved feedback and providing additional control options without cluttering the user interface with additional displayed controls.

[0241] In some embodiments, after preparing to capture media (e.g., 605a, 605f, 605h, and/or 605k) via the media capture component (e.g., 602) (e.g., while the computer system is prepared to capture media) (e.g., or in response to preparing the computer system to capture media) and in accordance with a determination that a set of one or more conditions is satisfied (e.g., as described above in relation to process 700), the computer system captures, via the media capture component, media (e.g., still photo and/or video) without detecting an intervening input (e.g., a user input and/or an input that is performed by the user) (e.g., media is automatically captured). Capturing media when a set of prescribed conditions is met automatically allows the computer system to capture media based on instructions and/or guidance from the user such that the resulting media item is captured based on one or more preferences of the user, thereby performing an operation when a set of conditions has been met without requiring further user input.

[0242] In some embodiments, after preparing to capture media (e.g., 605a, 605f, 605h, and/or 605k) via the media capture component (e.g., 602) (e.g., or while preparing to capture media via the media capture component (e.g., while the computer system is prepared to capture media)) (e.g., or in response to preparing the computer system to capture media), the computer system detects a respective input (e.g., 605a, 605f, 605h, and/or 605k) from a user (e.g., 622) (e.g., a voice command, tap input, swipe input, rotation of a physical input mechanism and/or an air gesture) (e.g., a user that is performed by the user). In some embodiments, in response to detecting the respective input from the user, the computer system captures, via the media capture component, a second media item (e.g., 605a, 605f, 605h, and/or 605k) (e.g., video and/or still photo). Capturing the second media item in response to detecting the respective input from the user allows the computer system to

I l l indicate the state of the computer system (e.g., that the computer system has detected the input from the user), thereby providing improved feedback and providing additional control options without cluttering the user interface with additional displayed controls.

[0243] In some embodiments, the computer system (e.g., 600) is in communication with a display component (e.g., computer monitor, touch sensitive display, head mounted display, touch-sensitive display, and/or television) (e.g., in some embodiments, the display component is on the front of the computer system and/or is front-facing). In some embodiments, while preparing (and/or, in some embodiments, before, after, or in conjunction with) to capture media (e.g., 605a, 605f, 605h, and/or 605k) via the media capture component (e.g., 602), the computer system displays, via the display component (e.g., 604), a representation (e.g., a real-time representation and/or a live representation) (e.g., a live preview, a camera live feed, and/or a representation of data being captured by the camera) of the field of view (e.g., 654) of the media capture component. In some embodiments, the computer system displays, via the display component, a representation of the field of view of the computer system and/or a user. Displaying the representation of the field of view of the media capture component when a set of prescribed conditions are met (e.g., the computer system is preparing to capture media) automatically allows the computer system to provide an indication of the state of the computer system (e.g., that the computer system is preparing to capture media) and provide users with a representation of the content included in the field of view of the media capture component such that the camera only captures desired content when capturing media, thereby providing improved feedback and performing an operation when a set of conditions has been met without requiring further user input.

[0244] Note that details of the processes described above with respect to process 900 (e.g., FIG. 9) are also applicable in an analogous manner to the methods described herein. For example, method NoneOO optionally includes one or more of the characteristics of the various methods described herein with reference to process 900. For example, the composition guidance of process 900 can occur in response to the first input of process 700. For brevity, these details are not repeated herein.

[0245] The description above, has been described with reference to specific examples for the purpose of explanation. Such specific examples can be in the form of textual description above and/or in the accompanying drawings. However, such embodiments should not be interpreted as being exhaustive and/or limiting to the disclosure (e.g., limiting to the explicit manners described herein). Many modifications and variations are possible in view of the above teachings by one of ordinary skill in the art without departing from the scope of the present disclosure.

[0246] Aspects of the technology described above can include gathering and/or using data from various sources. Such data can be used to improve interactions that a device has with its environment (e.g., interactions with users). In some scenarios, such data can include personal information that is usable to uniquely identify a specific person. Such personal information can include demographic data, telephone numbers, email addresses, location and/or location-related data, home addresses, work addresses, and/or any other identifying information. The use of such personal information can be utilized for the benefit of users of the device. For example, a user’s personal information can be used to improve interactions that the device engages in with the user. Other benefits from the use for personal information data are also possible and within the scope of the present disclosure.

[0247] The use of personal information can require one or more entities handling such data. These entities can be involved in collecting, processing, disclosing, transferring, storing, or other functions that support the technologies described herein. The present disclosure expects that (e.g., does not preclude) that all use of personal information data complies with well-established privacy policies and/or privacy practices by such entities. As a general matter, such policies and practices should meet or exceed generally recognized industry standards and comply with all applicable data privacy and security-related governmental requirements. In particular, for example, entities should receive informed consent from users to collect and/or use such personal information, and such collection and/or use should only be for legitimate and reasonable uses. Further, personal information of a user should not be shared, disclosed, sold, and/or provided for uses other than legitimate and/or reasonable uses.

[0248] Various scenarios can arise in which personal information is not available, such as when a user selects not to share such information. For example, the user can withhold consent for collection and/or use of such data (e.g., “opt out” of sharing such data and/or not explicitly “opt in” during a registration process). The user can also employ the use of any of various hardware and/or software components that prevent collection and/or use of such data. While the use of personal information can benefit a user by improving the operation of the device, the present disclosure contemplates that embodiments of the present technology can be used without such data. For example, operations of the device can use non -personal information (e.g., instead of and/or in place of personal information). Other techniques include making inferences based on non-personal information data or a minimal amount of personal information.

Claims

CLAIMS What is claimed is:

1. A method, comprising: at a computer system that is in communication with a media capture component and a microphone: detecting, via the microphone, a first input corresponding to a request to capture media; and after detecting the first input corresponding to the request to capture media: in accordance with a determination that the first input corresponds to a first instruction, capturing, via the media capture component, media in response to a first set of one or more conditions being satisfied; and in accordance with a determination that the first input corresponds to a second instruction different from the first instruction, capturing, via the media capture component, media in response to a second set of one or more conditions being satisfied, wherein the second set of one or more conditions is different from the first set of one or more conditions.

2. The method of claim 1, wherein the computer system is in communication with a first movement component, the method further comprising: after detecting the first input corresponding to the request to capture media, moving, via the first movement component, a portion of the computer system.

3. The method of claim 2, further comprising: after detecting the first input corresponding to the request to capture media, moving, via the first movement component, a position of the media capture component so that a field of view of the media capture component moves from a first position to a second position different from the first position.

4. The method of any one of claims 2-3, wherein: after detecting the first input: in accordance with a determination that the first input includes one or more instructions for the computer system to move in a first manner, a portion of the computer system moves in the first manner; and in accordance with a determination that the first input includes one or more instructions for the computer system to move in a second manner different from the first manner, the portion of the computer system moves in the second manner different from the first manner.

5. The method of claim 1-4, wherein detecting the first input corresponding to the request to capture media includes capturing, via the microphone, one or more verbal instructions.

6. The method of any one of claims 1-5, wherein detecting the first input corresponding to the request to capture media includes capturing, via the one or more input devices, one or more gesture-based instructions.

7. The method of any one of claims 1-6, wherein capturing, via the media capture component, media in response to the first set of one or more conditions being satisfied includes moving a portion of the computer system in a third manner, and wherein capturing, via the media capture component, media in response to the second set of one or more conditions being satisfied includes moving the portion of the computer system in a fourth manner different from the third manner.

8. The method of any one of claims 1-7, wherein the computer system is in communication with a second movement component, the method further comprising: while capturing, via the media capture component, the media: in accordance with a determination that the media being captured is a first type of media, moving, via the second movement component, a portion of the computer system during the capture of the media; and in accordance with a determination that the media being captured is a second type of media different from the first type of media, forgoing moving, via the second movement component, the portion of the computer system during the capture of the media.

9. The method of claim 8, wherein the first type of media is a video or a panoramic photo, and wherein the second type of media is a still photo.

10. The method of any one of claims 1-9, wherein the first input includes instructions for the computer system to delay capture of the media until the detection of a camera-detected input.

11. The method of claim 10, wherein the camera-detected input includes the detection of a gaze.

12. The method of any one of claims 10-11, wherein the camera-detected input includes the detection of a gesture.

13. The method of any one of claims 10-12, wherein the camera-detected input includes the detection of a pose.

14. The method of any one of claims 1-13, wherein the first input includes a set of one or more temporal based instructions indicating one or more media capture parameters.

15. The method of claim 14, wherein the set of one or more temporal based instructions includes one or more indications of when capture of media will be initiated.

16. The method of any one of claims 14-15, wherein the set of one or more temporal based instructions includes one or more indications of when capture of media will stop.

17. The method of any one of claims 14-16, wherein the set of one or more temporal based instructions include one or more indications of a time interval between the capture of separate media items.

18. The method of any one of claims 1-17, wherein the first input includes an indication of a composition guidance related to capturing the media.

19. The method of any one of claims 1-18, wherein the first set of one or more conditions does not include a condition corresponding to a detection of an input corresponding to a first user.

20. The method of claim 19, wherein the first set of one or more conditions includes a condition that is satisfied when a determination is made that a person in a field of view of the media capture component has stopped moving for a threshold amount of time.

21. The method of any one of claims 19-20, wherein the first set of one or more conditions includes a condition that is satisfied when a determination is made that a person in a field of view of the media capture component is positioned in a respective pose.

22. The method of any one of claims 19-21, further comprising: before capturing, via the media capture component, the media in response to the first set of one or more conditions being satisfied and in accordance with a determination that the first set of one or more conditions continue to be satisfied, displaying a countdown of a period of time that has to elapse before the media is captured.

23. The method of claim 22, further comprising: while displaying the countdown of the period of time that has to elapse before the media is captured, detecting that the first set of one or more conditions do not continue to be satisfied; and in response to detecting that the first set of one or more conditions do not continue to be satisfied, interrupting display of the countdown of the period of time that has to elapse before the media is captured.

24. The method of any one of claims 22-23, further comprising: while displaying the countdown of the period of time that has to elapse before the media is captured, detecting a request to capture the media before the period of time has elapsed before the media is captured; and in response to detecting the request to capture the media before the period of time has elapsed before the media is captured, interrupting display of the countdown of the period of time that has to elapse before the media is captured.

25. The method of any one of claims 1-24, wherein the media is a first media item, the method further comprising: after capturing the first media item, detecting, via the media capture component, a change in a pose of a person in a field of view of the media capture component; and in response to detecting the change in the pose of the person in the field of view of the media capture component, capturing, via the media capture component, a second media item.

26. The method of any one of claims 1-25, wherein the media is a third media item, the method further comprising: after capturing the third media item, detecting a change in a pose of a respective person in a field of view of the media capture component; and in response to detecting the change in the pose of the respective person: in accordance with a determination the respective person is positioned in one or more target poses, capturing, via the media capture component, a fourth media item; and in accordance with a determination that the respective person is not positioned in the one or more target poses, forgoing capturing the fourth media item.

27. The method of any one of claims 1-26, wherein the media is a fifth media item, the method further comprising: after capturing the fifth media item, detecting a change in a pose of a respective person in a field of view of the media capture component; and in response to detecting the change in the pose of the respective person: in accordance with a determination that change in the pose of the respective person was detected within a threshold amount of time of capturing the fifth media item, capturing, via the media capture component, a sixth media item; and in accordance with a determination that change in the pose of the respective person was not detected within the threshold amount of time of capturing the fifth media item, forgoing capturing, via the media capture component, the sixth media item.

28. A non-transitory computer-readable medium storing one or more programs configured to be executed by one or more processors of a computer system that is in communication with a media capture component and a microphone, the one or more programs including instructions for performing the method of any one of claims 1-27.

29. A computer system that is configured to communicate with a media capture component and a microphone, the computer system comprising: one or more processors; and memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for performing the method of any one of claims 1-27.

30. A computer system that is configured to communicate with a media capture component and a microphone, the computer system comprising: means for performing the method of any one of claims 1-27.

31. A computer program product, comprising one or more programs configured to be executed by one or more processors of a computer system that is in communication with a media capture component and a microphone, the one or more programs including instructions for performing the method of any one of claims 1-27.

32. A non-transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a computer system that is in communication with a media capture component and a microphone, the one or more programs including instructions for: detecting, via the microphone, a first input corresponding to a request to capture media; and after detecting the first input corresponding to the request to capture media: in accordance with a determination that the first input corresponds to a first instruction, capturing, via the media capture component, media in response to a first set of one or more conditions being satisfied; and in accordance with a determination that the first input corresponds to a second instruction different from the first instruction, capturing, via the media capture component, media in response to a second set of one or more conditions being satisfied, wherein the second set of one or more conditions is different from the first set of one or more conditions.

33. A computer system configured to communicate with a media capture component and a microphone, comprising: one or more processors; and memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for: detecting, via the microphone, a first input corresponding to a request to capture media; and after detecting the first input corresponding to the request to capture media: in accordance with a determination that the first input corresponds to a first instruction, capturing, via the media capture component, media in response to a first set of one or more conditions being satisfied; and in accordance with a determination that the first input corresponds to a second instruction different from the first instruction, capturing, via the media capture component, media in response to a second set of one or more conditions being satisfied, wherein the second set of one or more conditions is different from the first set of one or more conditions.

34. A computer system configured to communicate with a media capture component and a microphone, comprising: means for, detecting, via the microphone, a first input corresponding to a request to capture media; and means, after detecting the first input corresponding to the request to capture media, for: in accordance with a determination that the first input corresponds to a first instruction, capturing, via the media capture component, media in response to a first set of one or more conditions being satisfied; and in accordance with a determination that the first input corresponds to a second instruction different from the first instruction, capturing, via the media capture component, media in response to a second set of one or more conditions being satisfied, wherein the second set of one or more conditions is different from the first set of one or more conditions.

35. A computer program product, comprising one or more programs configured to be executed by one or more processors of a computer system that is in communication with a media capture component and a microphone, the one or more programs including instructions for: detecting, via the microphone, a first input corresponding to a request to capture media; and after detecting the first input corresponding to the request to capture media: in accordance with a determination that the first input corresponds to a first instruction, capturing, via the media capture component, media in response to a first set of one or more conditions being satisfied; and in accordance with a determination that the first input corresponds to a second instruction different from the first instruction, capturing, via the media capture component, media in response to a second set of one or more conditions being satisfied, wherein the second set of one or more conditions is different from the first set of one or more conditions.

36. A method, comprising: at a computer system that is in communication with a media capture component and a movement component: while capturing video via the media capture component: in accordance with a determination that a first set of one or more capture conditions is satisfied, moving, via the movement component, a portion of the computer system that includes the media capture component in a first movement pattern, wherein moving the portion of the computer system in the first movement pattern causes framing of the video to change as the portion of the computer system moves; and in accordance with a determination that a second set of one or more capture conditions is satisfied, wherein the second set of one or more capture conditions is different from the first set of one or more capture conditions, moving, via the movement component, the portion of the computer system in a second movement pattern different from the first movement pattern, wherein moving the portion of the computer system in the second movement pattern causes framing of the video to change as the portion of the computer system moves.

37. The method of claim 36, wherein: in accordance with a determination that the computer system is configured to capture video using a first artistic style, the first set of one or more capture conditions is satisfied; and in accordance with a determination that the computer system is configured to capture video using a second artistic style different from the first artistic style, the second set of one or more capture conditions is satisfied.

38. The method of claim 37, further comprising: detecting an input corresponding to a respective artistic style; and after detecting the input corresponding to a respective artistic style and in response to the occurrence of a triggering condition corresponding to capturing video: in accordance with a determination that the respective artistic style corresponds to the first artistic style, capturing video using the first artistic style; and in accordance with a determination that the respective artistic style corresponds to the second artistic style, capturing video using the second artistic style.

39. The method of claim 37, further comprising: detecting an occurrence of one or more respective conditions without detecting an input from a user; and after detecting the occurrence of one or more respective conditions without detecting the input from the user and in response to the occurrence of a triggering condition corresponding to capturing video: in accordance with a determination that the one or more respective conditions corresponds to the first artistic style, capturing video using the first artistic style; and in accordance with a determination that the one or more respective conditions corresponds to the second artistic style, capturing video using the second artistic style.

40. The method of any one of claims 36-39, wherein: in accordance with a determination that a first set of one or more media capture settings of the computer system is active, the first set of one or more capture conditions is satisfied; and in accordance with a determination that a second set of one or more media capture settings, different from the first set of one or more media capture settings, of the computer system is active, the second set of one or more capture conditions is satisfied.

41. The method of claim 40, wherein the first media capture setting of the computer corresponds to the capture of media with a first set of one or more colors, and wherein the second media capture setting of the computer system corresponds to the capture of media with a second set of one or more colors different form the first set of one or more colors.

42. The method of any one of claims 36-41, wherein moving the portion of the computer system that includes the media capture component in the first movement pattern includes performing, via the movement component, a first type of movement and performing, via the movement component, a second type of movement different from the first type of movement.

43. The method of claim 42, wherein the first type of movement of the portion of the computer system that includes the media capture component is a movement in a first lateral direction.

44. The method of any one of claims 42-43, wherein the first type of movement of the portion of the computer system that includes the media capture component includes movement in a first vertical direction.

45. The method of any one of claims 42-44, wherein the first type of movement of the portion of the computer system that includes the media capture component includes movement in a first longitudinal direction.

46. The method of any one of claims 42-45, wherein one or more of the first type of movement of the portion of the computer system that includes the media capture component and the second type of movement of the portion of the computer system that includes the media capture component includes rotational movement.

47. The method of any one of claims 36-46, wherein: the portion of the computer system that includes the media capture component moves at a first rate while the portion of the computer system that includes the media capture component moves in the first movement pattern; and the portion of the computer system that includes the media capture component moves at a second rate different from the first rate while the portion of the computer system that includes the media capture component moves in the second movement pattern.

48. The method of any one of claims 36-47, wherein: the framing of the video changes based on a first set of one or more tracking parameters while the portion of the computer system that includes the media capture component moves in the first movement pattern; and the framing of the video changes based on a second set of one or more tracking parameters, different from the first set of one or more tracking parameters, while the portion of the computer system that includes the media capture component moves in the second movement pattern.

49. A non-transitory computer-readable medium storing one or more programs configured to be executed by one or more processors of a computer system that is in communication with a media capture component and a movement component, the one or more programs including instructions for performing the method of any one of claims 36-48.

50. A computer system that is configured to communicate with a media capture component and a movement component, the computer system comprising: one or more processors; and memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for performing the method of any one of claims 36-48.

51. A computer system that is configured to communicate with a media capture component and a movement component, the computer system comprising: means for performing the method of any one of claims 36-48.

52. A computer program product, comprising one or more programs configured to be executed by one or more processors of a computer system that is in communication with a media capture component and a movement component, the one or more programs including instructions for performing the method of any one of claims 36-48.

53. A non-transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a computer system that is in communication with a media capture component and a movement component, the one or more programs including instructions for: while capturing video via the media capture component: in accordance with a determination that a first set of one or more capture conditions is satisfied, moving, via the movement component, a portion of the computer system that includes the media capture component in a first movement pattern, wherein moving the portion of the computer system in the first movement pattern causes framing of the video to change as the portion of the computer system moves; and in accordance with a determination that a second set of one or more capture conditions is satisfied, wherein the second set of one or more capture conditions is different from the first set of one or more capture conditions, moving, via the movement component, the portion of the computer system in a second movement pattern different from the first movement pattern, wherein moving the portion of the computer system in the second movement pattern causes framing of the video to change as the portion of the computer system moves.

54. A computer system configured to communicate with a media capture component and a movement component, comprising: one or more processors; and memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for: while capturing video via the media capture component: in accordance with a determination that a first set of one or more capture conditions is satisfied, moving, via the movement component, a portion of the computer system that includes the media capture component in a first movement pattern, wherein moving the portion of the computer system in the first movement pattern causes framing of the video to change as the portion of the computer system moves; and in accordance with a determination that a second set of one or more capture conditions is satisfied, wherein the second set of one or more capture conditions is different from the first set of one or more capture conditions, moving, via the movement component, the portion of the computer system in a second movement pattern different from the first movement pattern, wherein moving the portion of the computer system in the second movement pattern causes framing of the video to change as the portion of the computer system moves.

55. A computer system configured to communicate with a media capture component and a movement component, comprising: means, while capturing video via the media capture component, for: in accordance with a determination that a first set of one or more capture conditions is satisfied, moving, via the movement component, a portion of the computer system that includes the media capture component in a first movement pattern, wherein moving the portion of the computer system in the first movement pattern causes framing of the video to change as the portion of the computer system moves; and in accordance with a determination that a second set of one or more capture conditions is satisfied, wherein the second set of one or more capture conditions is different from the first set of one or more capture conditions, moving, via the movement component, the portion of the computer system in a second movement pattern different from the first movement pattern, wherein moving the portion of the computer system in the second movement pattern causes framing of the video to change as the portion of the computer system moves.

56. A computer program product, comprising one or more programs configured to be executed by one or more processors of a computer system that is in communication with a media capture component and a movement component, the one or more programs including instructions for: while capturing video via the media capture component: in accordance with a determination that a first set of one or more capture conditions is satisfied, moving, via the movement component, a portion of the computer system that includes the media capture component in a first movement pattern, wherein moving the portion of the computer system in the first movement pattern causes framing of the video to change as the portion of the computer system moves; and in accordance with a determination that a second set of one or more capture conditions is satisfied, wherein the second set of one or more capture conditions is different from the first set of one or more capture conditions, moving, via the movement component, the portion of the computer system in a second movement pattern different from the first movement pattern, wherein moving the portion of the computer system in the second movement pattern causes framing of the video to change as the portion of the computer system moves.

57. A method, comprising: at a computer system that is in communication with a media capture component, an input component, and an output component: detecting, via the input component, a first set of one or more inputs corresponding to one or more instructions that include one or more spoken words; in response to detecting the first set of one or more inputs corresponding to the one or more instructions, preparing to capture media via the media capture component; and while preparing to capture media via the media capture component: in accordance with a determination that the one or more instructions includes first content, providing, via the output component, first composition guidance, wherein the first composition guidance includes one or more recommendations for changing a spatial arrangement of one or more objects in the field of view of the media capture component; and in accordance with a determination that the one or more instructions includes second content different from the first content, providing, via the output component, second composition guidance different from the first composition guidance, wherein the second composition guidance includes one or more recommendations for changing a spatial arrangement of one or more objects in the field of view of the media capture component.

58. The method of claim 57, further comprising: while preparing to capture media via the media capture component: in accordance with a determination that a respective subject is positioned at a first location within the field of view of the media capture component, providing, via the output component, third composition guidance; and in accordance with a determination that the respective subject is positioned at a second location different from the first location within the field of view of the media capture component, providing, via the output component, fourth composition guidance different from the third composition guidance.

59. The method of any one of claims 57-58, further comprising: while preparing to capture media via the media capture component: in accordance with a determination that a first set of elements are detected, providing, via the output component, fifth composition guidance; and in accordance with a determination that a second set of elements different from the first set of elements is detected, providing, via the output component, sixth composition guidance different from the fifth composition guidance.

60. The method of any one of claims 57-59, wherein: in accordance with a determination that the first set of one or more inputs has a first level of detail, the first composition guidance has a first amount of information; and in accordance with a determination that the first set of one or more inputs has a second level of detail that is greater than the first level of detail, the first composition guidance has a second amount of information that is greater than the first amount of information.

61. The method of any one of claims 57-60, further comprising: while preparing to capture media via the media capture component: in accordance with a determination that the first set of one or more inputs correspond to a first individual, providing, via the output component, eighth composition guidance; and in accordance with a determination that the first set of one or more inputs correspond to a second individual different from the first individual, providing, via the output component, ninth composition guidance different from the eighth composition guidance.

62. The method of any one of claims 57-61, wherein the first composition guidance includes one or more recommendations that one or more characteristics of lighting in an environment in the field of view of the media capture component should be changed.

63. The method of claim 62, wherein the first composition guidance includes one or more recommendations that an amount of light in an environment in the field of view of the media capture component should be changed.

64. The method of any one of claims 62-63, wherein the first composition guidance includes one or more recommendations that a type of light in an environment in the field of view of the media capture component should be changed.

65. The method of any one of claims 62-64, wherein the first composition guidance includes one or more recommendations that one or more colors of light in an environment in the field of view of the media capture component should be changed.

66. The method of any one of claims 62-65, wherein the computer system is in communication with a set of one or more external lights, the method further comprising: while preparing to capture media via the media capture component and without detecting an input corresponding to a subject, sending instructions to the set of one or more external lights, wherein sending the instructions to the set of the one or more external lights causes one or more characteristics of lighting in an environment in the field of view of the media capture component to change.

67. The method of any one of claims 57-66, wherein the first composition guidance includes one or more recommendations that one or more objects should be moved.

68. The method of claim 57-67, wherein the first composition guidance includes one or more recommendations that a subject should move from a first position to a second position.

69. The method of any one of claims 57-68, further comprising: while preparing to capture media via the media capture component: in accordance with a determination that the positioning of a portion of a respective subject within the field of view of the media capture component satisfies a first set of one or more positioning criteria relative to the field of view of the media capture component, providing, via the output component, tenth composition guidance; and in accordance with a determination that the position of the portion of the respective subject within the field of view of the media capture component satisfies a second set of one or more positioning criteria, different from the first set of one of one or more positioning criteria, relative to the field of view of the media capture component, providing, via the output component, eleventh composition guidance different from the tenth composition guidance.

70. The method of claim 69, wherein the tenth composition guidance is provided in reference to the positioning of the portion of the respective subject relative to a fixed reference point.

71. The method of any one of claims 69-70, wherein the tenth composition guidance is provided with respect to a spatial relationship of the portion of the respective subject and the body of the respective subject.

72. The method of any one of claims 57-71, further comprising: while preparing to capture media via the media capture component: in accordance with a determination that a horizontal plane in the field of view of the media capture component has a first spatial orientation, providing, via the output component, twelfth composition guidance; and in accordance with a determination that the horizontal plane in the field of view of the media capture component has a second spatial orientation different from the first spatial orientation, providing, via the output component, thirteenth composition guidance different from the twelfth composition guidance.

73. The method of any one of claims 57-72, further comprising: while preparing to capture media via the media capture component: in accordance with a determination that there is a first amount of distance between a boundary of the field of view of the media capture component and a portion of a respective subject, providing, via the output component, fourteenth composition guidance; and in accordance with a determination that there is a second amount of distance between the boundary of the field of view of the media capture component and the portion of the respective subject, providing, via the output component, fifteenth composition guidance different from the fourteenth composition guidance.

74. The method of any one of claims 57-73, wherein the computer system is in communication with a movement component, the method further comprising: while preparing to capture media via the media capture component, moving, via the movement component, a portion of the computer system.

75. The method of claim 74, further comprising: outputting, via the output component, an indication of why the portion of the computer system is moving.

76. The method of any one of claims 57-75, further comprising: after preparing to capture media via the media capture component and in accordance with a determination that a set of one or more conditions is satisfied, capturing, via the media capture component, media without detecting an intervening input.

77. The method of any one of claims 57-76, further comprising: after preparing to capture media via the media capture component, detecting a respective input from a user; and in response to detecting the respective input from the user, capturing, via the media capture component, a second media item.

78. The method of any one of claims 57-77, wherein the computer system is in communication with a display component, the method further comprising: while preparing to capture media via the media capture component, displaying, via the display component, a representation of the field of view of the media capture component.

79. A non-transitory computer-readable medium storing one or more programs configured to be executed by one or more processors of a computer system that is in communication with a media capture component, an input component, and an output component, the one or more programs including instructions for performing the method of any one of claims 57-78.

80. A computer system that is configured to communicate with a media capture component, an input component, and an output component, the computer system comprising: one or more processors; and memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for performing the method of any one of claims 57-78.

81. A computer system that is configured to communicate with a media capture component, an input component, and an output component, the computer system comprising: means for performing the method of any one of claims 57-78.

82. A computer program product, comprising one or more programs configured to be executed by one or more processors of a computer system that is in communication with a media capture component, an input component, and an output component, the one or more programs including instructions for performing the method of any one of claims 57-78.

83. A non-transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a computer system that is in communication with a media capture component, an input component, and an output component, the one or more programs including instructions for: detecting, via the input component, a first set of one or more inputs corresponding to one or more instructions that include one or more spoken words; in response to detecting the first set of one or more inputs corresponding to the one or more instructions, preparing to capture media via the media capture component; and while preparing to capture media via the media capture component: in accordance with a determination that the one or more instructions includes first content, providing, via the output component, first composition guidance, wherein the first composition guidance includes one or more recommendations for changing a spatial arrangement of one or more objects in the field of view of the media capture component; and in accordance with a determination that the one or more instructions includes second content different from the first content, providing, via the output component, second composition guidance different from the first composition guidance, wherein the second composition guidance includes one or more recommendations for changing a spatial arrangement of one or more objects in the field of view of the media capture component.

84. A computer system configured to communicate with a media capture component, an input component, and an output component, comprising: one or more processors; and memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for: detecting, via the input component, a first set of one or more inputs corresponding to one or more instructions that include one or more spoken words; in response to detecting the first set of one or more inputs corresponding to the one or more instructions, preparing to capture media via the media capture component; and while preparing to capture media via the media capture component: in accordance with a determination that the one or more instructions includes first content, providing, via the output component, first composition guidance, wherein the first composition guidance includes one or more recommendations for changing a spatial arrangement of one or more objects in the field of view of the media capture component; and in accordance with a determination that the one or more instructions includes second content different from the first content, providing, via the output component, second composition guidance different from the first composition guidance, wherein the second composition guidance includes one or more recommendations for changing a spatial arrangement of one or more objects in the field of view of the media capture component.

85. A computer system configured to communicate with a media capture component, an input component, and an output component, comprising: means for, detecting, via the input component, a first set of one or more inputs corresponding to one or more instructions that include one or more spoken words; means for, in response to detecting the first set of one or more inputs corresponding to the one or more instructions, preparing to capture media via the media capture component; and means, while preparing to capture media via the media capture component, for: in accordance with a determination that the one or more instructions includes first content, providing, via the output component, first composition guidance, wherein the first composition guidance includes one or more recommendations for changing a spatial arrangement of one or more objects in the field of view of the media capture component; and in accordance with a determination that the one or more instructions includes second content different from the first content, providing, via the output component, second composition guidance different from the first composition guidance, wherein the second composition guidance includes one or more recommendations for changing a spatial arrangement of one or more objects in the field of view of the media capture component.

86. A computer program product, comprising one or more programs configured to be executed by one or more processors of a computer system that is in communication with a media capture component, an input component, and an output component, the one or more programs including instructions for: detecting, via the input component, a first set of one or more inputs corresponding to one or more instructions that include one or more spoken words; in response to detecting the first set of one or more inputs corresponding to the one or more instructions, preparing to capture media via the media capture component; and while preparing to capture media via the media capture component: in accordance with a determination that the one or more instructions includes first content, providing, via the output component, first composition guidance, wherein the first composition guidance includes one or more recommendations for changing a spatial arrangement of one or more objects in the field of view of the media capture component; and in accordance with a determination that the one or more instructions includes second content different from the first content, providing, via the output component, second composition guidance different from the first composition guidance, wherein the second composition guidance includes one or more recommendations for changing a spatial arrangement of one or more objects in the field of view of the media capture component.