WO2025034441A1 - Asymmetrical microphone arrays - Google Patents
- Publication number: WO2025034441A1 (application PCT/US2024/040037)
- Authority: WIPO (PCT)
- Legal status: Pending (an assumption, not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H—ELECTRICITY
  - H04—ELECTRIC COMMUNICATION TECHNIQUE
    - H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
      - H04R1/00—Details of transducers, loudspeakers or microphones
        - H04R1/02—Casings; Cabinets; Supports therefor; Mountings therein
        - H04R1/08—Mouthpieces; Microphones; Attachments therefor
      - H04R2201/00—Details of transducers, loudspeakers or microphones covered by H04R1/00 but not provided for in any of its subgroups
        - H04R2201/40—Details of arrangements for obtaining desired directional characteristic by combining a number of identical transducers covered by H04R1/40 but not provided for in any of its subgroups
          - H04R2201/401—2D or 3D arrays of transducers
          - H04R2201/405—Non-uniform arrays of transducers or a plurality of uniform arrays with different transducer spacing
Definitions
- the present disclosure is related to consumer goods and, more particularly, to methods, systems, products, features, services, and other elements directed to media playback or some aspect thereof.
- Media content (e.g., songs, podcasts, video sound) can be played back via playback devices such that each room with a playback device can play back corresponding different media content.
- rooms can be grouped together for synchronous playback, so that the same media content is heard in all grouped rooms at the same time.
- Figure 1A is a partial cutaway view of an environment having a media playback system configured in accordance with examples of the present technology.
- Figure 1B is a schematic diagram of the media playback system of Figure 1A and one or more networks.
- Figure 1C is a block diagram of a playback device.
- Figure 1D is a block diagram of a playback device.
- Figure 1E is a block diagram of a network microphone device.
- Figure 1F is a block diagram of a network microphone device.
- Figure 1G is a block diagram of a playback device.
- Figure 1H is a partially schematic diagram of a control device.
- Figures 2A, 2B, and 2C show perspective, top (plan), and front views of a first playback device including an asymmetrical microphone array according to the invention.
- Figure 3 shows a top, plan view of a second playback device including an asymmetrical microphone array according to the invention.
- Figure 4 shows a top, plan view of a third playback device including an asymmetrical microphone array according to the invention.
- Figure 5 shows a top, plan view of a fourth playback device including an asymmetrical microphone array according to the invention.
- Figures 6A, 6B, and 6C show perspective, top, and front views of a fifth playback device including an asymmetrical microphone array according to the invention.
- Figure 7 is a schematic diagram of a first media playback system comprising a playback device having an asymmetric microphone array according to the invention.
- Figure 8 is a schematic diagram of a second media playback system comprising a plurality of playback devices having asymmetric microphone arrays according to the invention.
- Figure 9 is a schematic diagram of a third media playback system comprising a plurality of playback devices having asymmetric microphone arrays according to the invention.
- Figure 10 is a schematic diagram of a fourth media playback system comprising a playback device having an asymmetric microphone array according to the invention.
- modern playback devices and media playback systems may also be capable of performing other functions, such as receiving and taking actions based on voice commands, communicating information between devices using audio, or characterizing a listening environment using audio. These additional actions may involve receiving audio signals at one or more microphones at the playback device or another network microphone device (NMD), such as a smart home device, either directly from a source or via reflections from the listening environment.
- Embodiments of the present invention seek to improve performance of microphone arrays for playback devices, NMDs, and media playback systems. Performance may be improved by enabling determination of a wider array of angles and directions of audio signals, by enabling more efficient and accurate noise reduction in audio signals, by allowing detection of more direct audio signals, by improving disambiguation between audio signals received from the front and rear, and/or by reducing self-sound.
- Self-sound is sound that is output by one or more transducers of a playback device and that may be received by one or more microphones of the same playback device, where it acts as noise to mask or otherwise obscure other parts of the received audio signal. Self-sound is commonly, but not always, caused by low-frequency sound.
- Low-frequency sound may be more likely to reach a microphone of a playback device because it is less directional than high-frequency sound. Sound becomes less directional as the size of the wavelength of the sound increases relative to the sound source outputting the sound, and, because wavelength increases as frequency decreases, low-frequency sound is generally less directional than high-frequency sound.
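The wavelength-to-frequency relationship above can be made concrete with a short calculation. This is an illustrative sketch, not part of the disclosure; it assumes the standard speed of sound at room temperature (about 343 m/s).

```python
# Illustrative only: wavelength of sound at several frequencies. A 100 Hz
# tone has a wavelength of several metres, far larger than a typical driver,
# so it radiates almost omnidirectionally, while a 10 kHz tone's wavelength
# is only a few centimetres.
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def wavelength(frequency_hz: float) -> float:
    """Return the acoustic wavelength in metres for a given frequency."""
    return SPEED_OF_SOUND / frequency_hz

for f in (100, 1_000, 10_000):
    print(f"{f:>6} Hz -> {wavelength(f):.3f} m")
```

As frequency falls by a factor of ten, wavelength grows by the same factor, which is why bass output is the dominant contributor to self-sound at the microphones.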
- a playback device which comprises at least one audio transducer, such as an audio playback transducer, and an asymmetrical microphone array comprising at least four microphones.
- Asymmetry in a microphone array comprising at least four microphones may offer variation in the type and variety of processing of received audio signals that can be achieved. Signals may be analyzed in different ways because of the different relative positions of different groupings of microphones, when compared to, e.g., a symmetrical microphone array.
- An asymmetrical array may allow several different groups of microphones having different sized apertures (aperture being the distance between outermost microphones of the grouping), which may enable different techniques to be implemented to analyze signals. For example, large-aperture groupings may be useful for achieving high angular resolution and range estimation, while small-aperture groupings may enable certain beamforming techniques.
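A rough rule of thumb (not stated in the disclosure) is that the angular resolution of a line array is on the order of λ/D radians, where D is the aperture. The sketch below uses that approximation with invented example apertures to show why a large-aperture grouping resolves angles more finely at a given frequency.

```python
import math

C = 343.0  # speed of sound, m/s (assumed)

def angular_resolution_deg(aperture_m: float, frequency_hz: float) -> float:
    """Approximate resolvable angle (degrees) ~ lambda / D for aperture D."""
    wavelength = C / frequency_hz
    return math.degrees(wavelength / aperture_m)

# Hypothetical small and large groupings on the same device, compared at 8 kHz.
for aperture in (0.05, 0.50):
    print(f"D = {aperture} m -> ~{angular_resolution_deg(aperture, 8000):.1f} deg")
```

The ten-fold larger aperture yields roughly ten-fold finer angular resolution, consistent with the trade-off described above.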
- the microphone array may be arranged asymmetrically in different ways. Asymmetry may be defined about a center point of the playback device; the microphone array may be asymmetric about the center point of the playback device.
- the “center point” of the playback device may be determined as a center of the playback device along its length, longitudinally, or along a main or longitudinal axis, or may be a point of rotational symmetry of the device, or a centroid of the device.
- the center point may be defined by or on a line or axis of the playback device, or a plane passing through the playback device.
- the playback device may have mirror symmetry on either side of the plane, line, or axis.
- a majority of the at least four microphones of the microphone array may be to one side of the center point.
- the “majority” means more than half of the microphones of the array.
- three of the microphones may be to one side of the center point, and the other microphone may be to the other side.
- a microphone may be physically separated from the center point such that there is a spacing between the center point and the microphone, or alternatively, a main axis of the microphone may be spaced apart from the center point. In other examples, however, a majority of the microphones may be aligned with the center point and one or more additional microphones may be positioned asymmetrically on the device.
- the asymmetrical microphone array may be divided into different sets of microphones, and the asymmetry may be achieved based on differences between these sets of microphones.
- the array may have a first set of two or more microphones and a second set of two or more microphones.
- the sets may be distinct, meaning that there are no microphones in more than one set, or that each microphone is in a single set.
- the number of microphones in each set may be the same, or may differ.
- the first set may be symmetrically arranged, while the second set may be asymmetrically arranged.
- the first set may be symmetrical about a center point of the array, a center point of the playback device, or another point, while the second set may be asymmetrical about the same center point.
- the second set of two or more microphones may be asymmetrical about a center point by at least one microphone of the second set being a different distance from that point than at least one other microphone of the second set, or by a majority or all of the second set of two or more microphones being to one side of a center point.
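The notion of being "asymmetrical about a center point" can be made concrete with a small helper. This is an illustrative sketch, not from the disclosure: it treats microphone positions as 1-D coordinates along the device and checks whether every microphone has a mirror partner across the chosen center.

```python
# Hypothetical helper: True if a 1-D set of microphone positions is
# mirror-symmetric about `center`, within a small tolerance.
def is_symmetric(positions: list[float], center: float, tol: float = 1e-9) -> bool:
    offsets = sorted(p - center for p in positions)       # positions relative to center
    mirrored = sorted(-o for o in offsets)                # the mirror image of those offsets
    return all(abs(a - b) <= tol for a, b in zip(offsets, mirrored))

print(is_symmetric([-0.3, -0.1, 0.1, 0.3], center=0.0))   # mirror pairs -> True
print(is_symmetric([-0.3, 0.05, 0.1, 0.3], center=0.0))   # majority on one side -> False
```

The second example mirrors the description above: three of four microphones lie to one side of the center, so no mirror pairing exists.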
- Sets of microphones may be defined based on a spacing between adjacent microphones. Different sets of microphones may have different spacings between microphones.
- An array may include a first set of two or more microphones, whose microphones are separated by a first spacing, and a second set of two or more microphones, whose microphones are separated by a second spacing that is different to the first spacing.
- the second spacing may be smaller than the first spacing, or may be greater than the first spacing.
- the first spacing may be at least two times, three times, four times, or five times greater than the second spacing.
- the second spacing may be at least two times, three times, four times, or five times greater than the first spacing.
- a difference in spacing between sets may provide each of the sets with different properties and allow different capabilities with each. For example, a set having a wider spacing may provide useful range estimation characteristics as well as improving angular resolution of incident signals. A set having a smaller spacing may enable some beamforming techniques to be implemented.
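One standard way (not specific to this disclosure) to see why a smaller spacing favors beamforming is the half-wavelength criterion: a pair of microphones spaced d apart can be beamformed without spatial aliasing up to roughly f_max = c / (2d). The spacings below are invented example values.

```python
C = 343.0  # speed of sound, m/s (assumed)

def max_alias_free_frequency(spacing_m: float) -> float:
    """Highest frequency (Hz) at which spacing d still satisfies d <= lambda/2."""
    return C / (2.0 * spacing_m)

# Hypothetical closely spaced and widely spaced sets on the same device.
for d in (0.02, 0.10):
    print(f"d = {d * 100:.0f} cm -> f_max ~ {max_alias_free_frequency(d):.0f} Hz")
```

The closely spaced set supports alias-free beamforming over a much wider frequency band, while the widely spaced set trades that band for the resolution and range-estimation benefits noted above.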
- Sets of microphones may additionally, or alternatively, be defined based on an aperture.
- the sets may form sub-arrays having different apertures, wherein an aperture of the first set is smaller than or greater than an aperture of the second set of microphones.
- the microphones of one set may be positioned between the microphones of a different set.
- the second set of two or more microphones described above may be positioned between at least two microphones of the first set of two or more microphones. This may be described as the second set of microphones being within the aperture of the first set of microphones.
- the microphones of the second set may be on an axis connecting the microphones of the first set, or may be off-axis, as will be described below.
- the microphones of the second set being between the microphones of the first set may be described in terms of longitudinal position along the playback device, such that the second set of microphones is longitudinally between the microphones of the first set, or in terms of one or more planes defined by the first set, such that the second set of microphones is positioned between two planes defined by the first set of microphones or that the second set of microphones is within a plane bounded by the microphones of the first set.
- the second set of microphones may be positioned between a microphone of the first set of microphones and a center point of the playback device.
- Individual microphones in the first set may be separated from the second set by different spacings. Specifically, a first microphone and a second microphone of the first set, separated by the first spacing, may be different distances from the second set of microphones. The first microphone may be separated from the second set by a third spacing, and the second microphone may be separated from the second set by a fourth spacing, which is different to the third spacing.
- the distances or spacings between individual microphones of the first set and the second set may be measured between the microphone of the first set and a nearest microphone of the second set or between the microphone of the first set and a center point of the second set.
- the spacing between microphones of the first set and microphones of the second set may be greater than the spacing between the microphones of the second set; in the above example, the third and fourth spacings may be greater than the second spacing.
- the placing of microphones on the playback device may be varied. This may improve the properties of the array and/or enhance the effects achieved by introducing asymmetry in the microphone array.
- Microphones of the microphone array may be distributed over at least two different faces of the playback device.
- the microphones may be distributed over a curved face or surface of the playback device. Distributing the microphones over different faces or over a curved surface may result in some of the microphones being directed in a first direction and others in a second direction, different to the first.
- the different directions may be nonparallel, and an angle between, e.g., a vertical axis, and each direction may differ by between 20 and 90 degrees, for example.
- the microphones may be considered to be in different planes or to have different orientations.
- the microphones may be at different positions on the curved surface having non-parallel tangent planes, relative to the curved surface.
- a microphone at a face may be within an opening, aperture, or through-hole in that face, and may be oriented so as to receive sound that is incident on the face.
- a face may refer to a flat surface and/or a non-flat surface (e.g., a curved surface), which may be joined to one or more other faces along edges.
- a face may define a plane, such that different faces define different planes that are non-parallel.
- the first set may be placed on a different face or at a different angle to the second set.
- the first set may be oriented in a first direction, while the second set may be oriented in a second direction.
- the first set may be configured to be upward-facing during use, while the second set may be configured to face sideways or to be forward-facing in use.
- the first set may be provided on a first face, which may be a face of the playback device configured to face upwards, in use, so that the microphones are also upward-facing, in use.
- the second set may be provided on a second face, which may be a face of the playback device configured to face forwards, in use, so that the microphones are also forward-facing, in use.
- a forward-facing face of the playback device may include one or more horizontal-facing transducers and may face towards a listener location, in use.
- a forward-facing face of the playback device may be opposite a rearward-facing face.
- a rearward-facing face faces a wall or away from a listener, in use, and may include one or more ports for receiving cables such as a power cable, a USB cable, an HDMI cable, a networking cable (such as Ethernet), a combined power and networking/data cable (such as Power-over-Ethernet (PoE)), and/or another suitable cable (e.g., one or more digital and/or analog audio and/or video cables).
- An upward-facing face of the playback device may include one or more upward-facing transducers, and may include one or more buttons or controls for providing inputs to the device.
- An upward-facing face may be a face opposite a base of the playback device, on which the playback device stands or by which the playback device is mounted, in use.
- where a playback device includes a first subarray or set of microphones having a first spacing and a second subarray or set of microphones having a second spacing that is different from (e.g., smaller than) the first spacing, the first set may be upward-facing, in use, and the second set may be forward-facing, in use.
- the second set of microphones, whose smaller spacing may allow for beamforming techniques, may receive more direct sound from a source, e.g., a listener speaking a voice command, meaning that the source can be located more accurately using the second set of microphones than would otherwise be possible.
- the first set of microphones, with its larger spacing, may be better suited to improved angular resolution and range estimation, making it more useful for audio signals reflected from above the device, e.g., off a ceiling.
- the two sets of microphones arranged in this way, with their specific spacings, may also provide improved elevation estimation, enabling a height of an audio source relative to the playback device to be determined in addition to a distance of the audio source from the playback device and an azimuthal angle of the audio source relative to the playback device. This may be particularly useful when characterizing a listening environment using audio exchanged between different devices, because the relative positions of the devices in the environment may be more accurately determined.
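A common way to estimate arrival angle from a microphone pair, sketched below, is the far-field time-difference-of-arrival (TDOA) model: a plane wave from angle θ reaches one microphone d·sin(θ)/c seconds before the other. This is a textbook technique used here only for illustration, not the disclosure's prescribed method; the spacing and angle are invented. A second, differently oriented pair (e.g., the upward-facing set) can recover elevation the same way.

```python
import math

C = 343.0  # speed of sound, m/s (assumed)

def arrival_angle_deg(delay_s: float, spacing_m: float) -> float:
    """Estimate arrival angle (degrees from broadside) from an inter-mic delay."""
    s = max(-1.0, min(1.0, C * delay_s / spacing_m))  # clamp against noisy delays
    return math.degrees(math.asin(s))

d = 0.10                                   # hypothetical microphone spacing, metres
true_angle = math.radians(30.0)
tau = d * math.sin(true_angle) / C         # simulate the delay a 30-degree source produces
print(f"estimated angle: {arrival_angle_deg(tau, d):.1f} deg")  # recovers ~30.0
```

Note that the recoverable angle range and accuracy both depend on the spacing d, which is one reason differently spaced sets complement each other.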
- the positions of the microphones may be based on the positions of different audio transducers of the playback device. Particularly where it is desirable to reduce self-sound, it is useful to consider placement of the microphones relative to transducers. Microphones may be placed on parts of the playback device such that a majority of the microphones are at least a predetermined distance from a transducer. Doing so reduces the amount of self-sound picked up by those microphones. Positioning microphones in this way relative to a transducer may result in the asymmetry described above.
- Self-sound may be a particular issue for low-frequency sound. Accordingly, some microphones may be placed to be at least a predetermined distance from a low-frequency or bass transducer, such as a transducer whose output includes frequencies below 200 Hz, below 300 Hz, below 400 Hz, below 500 Hz, below 1kHz, or below 1.5 kHz. In some examples, some microphones may be placed to be at a predetermined distance or further from any transducer of the playback device. Self-sound may also be reduced by having the microphones spread across different faces of the playback device. Reducing self-sound may provide benefits for noise reduction techniques.
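A back-of-envelope check of why distance from a transducer helps: under a free-field point-source assumption (an idealization, not from the disclosure), direct sound falls off by about 6 dB per doubling of distance. The distances below are invented example values.

```python
import math

def attenuation_db(near_m: float, far_m: float) -> float:
    """Extra attenuation (dB) of direct sound at `far_m` compared with `near_m`,
    assuming inverse-distance (free-field point-source) spreading."""
    return 20.0 * math.log10(far_m / near_m)

# Hypothetical placements: 5 cm vs 20 cm from a bass transducer.
print(f"{attenuation_db(0.05, 0.20):.1f} dB quieter at 20 cm than at 5 cm")
```

In a real enclosure, near-field effects, diffraction, and structure-borne paths complicate this picture, but the trend motivates keeping a majority of microphones at least a predetermined distance from low-frequency transducers.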
- the microphones may have a lower noise content due to having a lower self-sound and by having different positionings, and this lower noise content can be used to identify and reduce the noise content of other microphones.
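The idea of using a lower-noise microphone as a reference to clean up a noisier one can be illustrated with a standard single-tap LMS adaptive canceller. This is a generic textbook technique, used here only as a sketch; the disclosure does not prescribe this algorithm, and the signals, coupling gain (0.8), and step size below are all invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
noise_ref = rng.standard_normal(n)                 # low-self-sound reference microphone
speech = np.sin(2 * np.pi * 0.01 * np.arange(n))   # toy "desired" signal at the noisy mic
primary = speech + 0.8 * noise_ref                 # noisy mic: speech plus coupled self-sound

w, mu = 0.0, 0.01                                  # single-tap LMS weight and step size
out = np.empty(n)
for i in range(n):
    y = w * noise_ref[i]                           # current estimate of the coupled noise
    out[i] = primary[i] - y                        # subtract the estimate from the primary mic
    w += 2 * mu * out[i] * noise_ref[i]            # LMS update drives w toward the coupling gain

print(f"noise power before: {np.mean((primary - speech) ** 2):.3f}")
print(f"noise power after:  {np.mean((out[-1000:] - speech[-1000:]) ** 2):.3f}")
```

After adaptation the residual noise power in the output is far below that of the raw primary microphone, which is the effect the bullet above describes: the quieter microphone identifies the noise so it can be removed from the others.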
- the predetermined distance may be at least a second spacing between the microphones of the second set, so that a microphone of the second set is closer to another microphone of the second set than it is to a transducer.
- the microphones may additionally or alternatively be placed to be a predetermined distance from a passive radiator or a port of the playback device.
- the microphones of the array may be provided on the same face as at least one transducer of the playback device, and other microphones may be provided on a different face.
- the second set may be provided on a same face of the playback device as the transducer, while the first set may be on a different face.
- the second set of microphones may be positioned opposite the transducer about the center point of the playback device.
- the second set of microphones may be symmetrically arranged with the transducer about the center point of the playback device.
- the playback device may take various different forms or have various different functions.
- the playback device may be a soundbar, or another elongated playback device such as a floorstanding or tower speaker.
- the playback device may be configured to output home theater audio.
- the playback device may be a portable playback device.
- the playback device may form part of a media playback system, which may also include other playback devices, one or more amplifiers, one or more control devices, one or more network microphone devices, or a television or other visual equipment.
- a NMD comprising an asymmetrical microphone array comprising at least four microphones.
- the asymmetrical microphone array may have some or all of the features of an asymmetrical microphone array as defined above.
- a media playback system comprising a first playback device or a first network microphone device including an asymmetrical microphone array having at least four microphones.
- the media playback system may comprise a plurality of playback devices and/or network microphone devices.
- the media playback system may comprise a second playback device or a second network microphone device having an asymmetrical microphone array.
- the media playback system may comprise a second playback device or a second network microphone device having a symmetrical microphone array.
- the second playback device or second network microphone device may be a portable device.
- U.S. App. 14/871,494 titled “Spatial Mapping of Audio Playback Devices in a Listening Environment,” filed on September 30, 2015 and issued on April 17, 2018, as U.S. Pat. 9,949,054 (“Kadri ’054”) discloses, among other features, analyzing a characteristic of a received audio signal from each of a plurality of microphones to determine an angular orientation of two playback devices relative to each other.
- “Kadri ’164” discloses, among other features, determining a position of a network microphone device relative to two audio drivers using audio output by the audio drivers and recorded by the network microphone device.
- Kadri ’054 discloses a network microphone device having a microphone array having a plurality of microphones. In some examples, a first subset of the microphone array may be sensitive to a first frequency range and a second subset of the microphone array may be sensitive to a second frequency range.
- U.S. App. 15/282,554 titled “Multi-Orientation Playback Device Microphones,” filed on September 30, 2016 and issued on August 22, 2017, as U.S. Pat. 9,743,204 (“Welch ’204”) discloses, among other features, a playback device that may have one or more microphone arrays installed or mounted on a housing or body of the playback device.
- the microphone arrays have a circular shape and the individual microphones are distributed around a circumference of the microphone array.
- Figure 1A is a partial cutaway view of a media playback system 100 distributed in an environment 101 (e.g., a house).
- the media playback system 100 comprises one or more playback devices 110 (identified individually as playback devices 110a-n), one or more network microphone devices (“NMDs”) 120 (identified individually as NMDs 120a-c), and one or more control devices 130 (identified individually as control devices 130a and 130b).
- a playback device can generally refer to a network device configured to receive, process, and output data of a media playback system.
- a playback device can be a network device that receives and processes audio content.
- a playback device includes one or more transducers or speakers powered by one or more amplifiers.
- a playback device includes one of (or neither of) the speaker and the amplifier.
- a playback device can comprise one or more amplifiers configured to drive one or more speakers external to the playback device via a corresponding wire or cable.
- a network microphone device can generally refer to a network device that is configured for audio detection.
- an NMD is a stand-alone device configured primarily for audio detection.
- an NMD is incorporated into a playback device (or vice versa).
- control device can generally refer to a network device configured to perform functions relevant to facilitating user access, control, and/or configuration of the media playback system 100.
- Each of the playback devices 110 is configured to receive audio signals or data from one or more media sources (e.g., one or more remote servers, one or more local devices) and play back the received audio signals or data as sound.
- the one or more NMDs 120 are configured to receive spoken word commands
- the one or more control devices 130 are configured to receive user input.
- the media playback system 100 can play back audio via one or more of the playback devices 110.
- the playback devices 110 are configured to commence playback of media content in response to a trigger.
- one or more of the playback devices 110 can be configured to play back a morning playlist upon detection of an associated trigger condition (e.g., presence of a user in a kitchen, detection of a coffee machine operation).
- the media playback system 100 is configured to play back audio from a first playback device (e.g., the playback device 110a) in synchrony with a second playback device (e.g., the playback device 110b). Interactions between the playback devices 110, NMDs 120, and/or control devices 130 of the media playback system 100 configured in accordance with the various examples of the disclosure are described in greater detail below.
- the environment 101 comprises a household having several rooms, spaces, and/or playback zones, including (clockwise from upper left) a master bathroom 101a, a master bedroom 101b, a second bedroom 101c, a family room or den 101d, an office 101e, a living room 101f, a dining room 101g, a kitchen 101h, and an outdoor patio 101i. While certain examples are described below in the context of a home environment, the technologies described herein may be implemented in other types of environments.
- the media playback system 100 can be implemented in one or more commercial settings (e.g., a restaurant, mall, airport, hotel, a retail or other store), one or more vehicles (e.g., a sports utility vehicle, bus, car, a ship, a boat, an airplane), multiple environments (e.g., a combination of home and vehicle environments), and/or another suitable environment where multi-zone audio may be desirable.
- Each room, space, or playback zone other than the patio 101i is bounded by a ceiling. Ceiling characteristics may differ between rooms, spaces, or playback zones.
- the media playback system 100 can comprise one or more playback zones, some of which may correspond to the rooms in the environment 101. Each of the playback zones and/or the individual rooms may be referred to as a listening environment.
- the media playback system 100 can be established with one or more playback zones, after which additional zones may be added or removed to form, for example, the configuration shown in Figure 1A.
- Each zone may be given a name according to a different room or space such as the office 101e, master bathroom 101a, master bedroom 101b, the second bedroom 101c, kitchen 101h, dining room 101g, living room 101f, and/or the balcony 101i.
- a single playback zone may include multiple rooms or spaces. In certain examples, a single room or space may include multiple playback zones.
- the master bathroom 101a, the second bedroom 101c, the office 101e, the living room 101f, the dining room 101g, the kitchen 101h, and the outdoor patio 101i each include one playback device 110
- the master bedroom 101b and the den 101d include a plurality of playback devices 110
- the playback devices 110l and 110m may be configured, for example, to play back audio content in synchrony as individual ones of playback devices 110, as a bonded playback zone, as a consolidated playback device, and/or any combination thereof.
- the playback devices 110h-j can be configured, for instance, to play back audio content in synchrony as individual ones of playback devices 110, as one or more bonded playback devices, and/or as one or more consolidated playback devices. Additional details regarding bonded and consolidated playback devices are described below with respect to Figures 1B and 1E.
- one or more of the playback zones in the environment 101 may each be playing different audio content.
- a user may be grilling on the patio 101i and listening to hip hop music being played by the playback device 110c while another user is preparing food in the kitchen 101h and listening to classical music played by the playback device 110b.
- a playback zone may play the same audio content in synchrony with another playback zone.
- the user may be in the office 101e listening to the playback device 110f playing back the same hip hop music being played back by playback device 110c on the patio 101i.
- Figure 1B is a schematic diagram of the media playback system 100 and a cloud network 102. For ease of illustration, certain devices of the media playback system 100 and the cloud network 102 are omitted from Figure 1B.
- One or more communication links 103 (referred to hereinafter as “the links 103”) communicatively couple the media playback system 100 and the cloud network 102.
- the links 103 can comprise, for example, one or more wired networks, one or more wireless networks, one or more wide area networks (WAN), one or more local area networks (LAN), one or more personal area networks (PAN), one or more telecommunication networks (e.g., one or more Global System for Mobiles (GSM) networks, Code Division Multiple Access (CDMA) networks, Long-Term Evolution (LTE) networks, 5G communication networks, and/or other suitable data transmission protocol networks), etc.
- the cloud network 102 is configured to deliver media content (e.g., audio content, video content, photographs, social media content) to the media playback system 100 in response to a request transmitted from the media playback system 100 via the links 103.
- the cloud network 102 is further configured to receive data (e.g., voice input data) from the media playback system 100 and correspondingly transmit commands and/or media content to the media playback system 100.
- the cloud network 102 comprises computing devices 106 (identified separately as a first computing device 106a, a second computing device 106b, and a third computing device 106c).
- the computing devices 106 can comprise individual computers or servers, such as, for example, a media streaming service server storing audio and/or other media content, a voice service server, a social media server, a media playback system control server, etc.
- one or more of the computing devices 106 comprise modules of a single computer or server.
- one or more of the computing devices 106 comprise one or more modules, computers, and/or servers.
- while the cloud network 102 is described above in the context of a single cloud network, in some examples the cloud network 102 comprises a plurality of cloud networks comprising communicatively coupled computing devices. Furthermore, while the cloud network 102 is shown in Figure 1B as having three of the computing devices 106, in some examples the cloud network 102 comprises fewer than (or more than) three computing devices 106.
- the media playback system 100 may be configured to receive media content from the networks 102 via the links 103.
- the received media content can comprise, for example, a Uniform Resource Identifier (URI) and/or a Uniform Resource Locator (URL).
- the media playback system 100 can stream, download, or otherwise obtain data from a URI or a URL corresponding to the received media content.
- a network 104 communicatively couples the links 103 and at least a portion of the devices (e.g., one or more of the playback devices 110, NMDs 120, and/or control devices 130) of the media playback system 100.
- the network 104 can include, for example, a wireless network (e.g., a WiFi network, a Bluetooth network, a Z-Wave network, a ZigBee network, and/or another suitable wireless communication protocol network) and/or a wired network (e.g., a network comprising Ethernet, Universal Serial Bus (USB), and/or another suitable wired communication).
- WiFi can refer to several different communication protocols including, for example, Institute of Electrical and Electronics Engineers (IEEE) 802.11a, 802.11b, 802.11g, 802.11n, 802.11ac, 802.11ad, 802.11af, and 802.11ah
- the network 104 comprises a dedicated communication network that the media playback system 100 uses to transmit messages between individual devices and/or to transmit media content to and from media content sources (e.g., one or more of the computing devices 106).
- the network 104 is configured to be accessible only to devices in the media playback system 100, thereby reducing interference and competition with other household devices.
- the network 104 comprises an existing household communication network (e.g., a household WiFi network).
- the links 103 and the network 104 comprise one or more of the same networks.
- the links 103 and the network 104 comprise a telecommunication network (e.g., an LTE network, a 5G network).
- the media playback system 100 is implemented without the network 104, and devices comprising the media playback system 100 can communicate with each other, for example, via one or more direct connections, PANs, telecommunication networks, and/or other suitable communication links.
- audio content sources may be regularly added or removed from the media playback system 100.
- the media playback system 100 performs an indexing of media items when one or more media content sources are updated, added to, and/or removed from the media playback system 100.
- the media playback system 100 can scan identifiable media items in some or all folders and/or directories accessible to the playback devices 110, and generate or update a media content database comprising metadata (e.g., title, artist, album, track length) and other associated information (e.g., URIs, URLs) for each identifiable media item found.
- the media content database is stored on one or more of the playback devices 110, network microphone devices 120, and/or control devices 130.
- the playback devices 110l and 110m comprise a group 107a.
- the playback devices 110l and 110m can be positioned in different rooms in a household and be grouped together in the group 107a on a temporary or permanent basis based on user input received at the control device 130a and/or another control device 130 in the media playback system 100.
- the playback devices 110l and 110m can be configured to play back the same or similar audio content in synchrony from one or more audio content sources.
- the group 107a comprises a bonded zone in which the playback devices 110l and 110m comprise left audio and right audio channels, respectively, of multi-channel audio content, thereby producing or enhancing a stereo effect of the audio content.
- the group 107a includes additional playback devices 110.
- the media playback system 100 omits the group 107a and/or other grouped arrangements of the playback devices 110.
- the media playback system 100 of Figure 1B includes the NMDs 120a and 120d, each comprising one or more microphones configured to receive voice utterances from a user.
- the NMD 120a is a standalone device and the NMD 120d is integrated into the playback device 110n.
- the NMD 120a, for example, is configured to receive voice input 121 from a user 123.
- the NMD 120a transmits data associated with the received voice input 121 to a voice assistant service (VAS) configured to (i) process the received voice input data and (ii) transmit a corresponding command to the media playback system 100.
- the computing device 106c comprises one or more modules and/or servers of a VAS (e.g., a VAS operated by one or more of SONOS®, AMAZON®, GOOGLE®, APPLE®, MICROSOFT®).
- the computing device 106c can receive the voice input data from the NMD 120a via the network 104 and the links 103.
- the computing device 106c processes the voice input data (i.e., “Play Hey Jude by The Beatles”), and determines that the processed voice input includes a command to play a song (e.g., “Hey Jude”).
- the computing device 106c accordingly transmits commands to the media playback system 100 to play back “Hey Jude” by the Beatles from a suitable media service (e.g., via one or more of the computing devices 106) on one or more of the playback devices 110.
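The server-side step in the "Hey Jude" example above can be pictured with a toy intent parser. This sketch is purely illustrative; the function name and command fields are hypothetical, and a production VAS uses trained language-understanding models rather than string matching.

```python
def process_voice_input(transcript):
    """Hypothetical server-side VAS step: turn a transcribed voice input
    into a playback command for the media playback system."""
    text = transcript.strip()
    if text.lower().startswith("play "):
        request = text[5:]
        # Split e.g. "Hey Jude by The Beatles" into track and artist.
        track, sep, artist = request.partition(" by ")
        command = {"action": "play", "track": track}
        if sep:
            command["artist"] = artist
        return command
    return {"action": "unknown", "utterance": text}
```

The returned command would then be transmitted back to the media playback system, which selects a suitable media service and playback device(s).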
- the media playback system 100 is shown as including a plurality of playback devices 110a-110n, an NMD 120a, a control device 130a, and a network 104, in other examples the media playback system 100 may include one playback device incorporating an upward-firing transducer and one or more microphones, as well as a processor and memory stored at, for example, the playback device, a network microphone device, or a control device.
- FIG. 1C is a block diagram of the playback device 110a comprising an input/output 111.
- the input/output 111 can include an analog I/O 111a (e.g., one or more wires, cables, and/or other suitable communication links configured to carry analog signals) and/or a digital I/O 111b (e.g., one or more wires, cables, or other suitable communication links configured to carry digital signals).
- the analog I/O 111a is an audio line-in input connection comprising, for example, an auto-detecting 3.5 mm audio line-in connection.
- the digital I/O 111b comprises a Sony/Philips Digital Interface Format (S/PDIF) communication interface and/or cable and/or a Toshiba Link (TOSLINK) cable.
- the digital I/O 111b comprises a High-Definition Multimedia Interface (HDMI) interface and/or cable.
- the digital I/O 111b includes one or more wireless communication links comprising, for example, a radio frequency (RF), infrared, WiFi, Bluetooth, or another suitable communication protocol.
- the analog I/O 111a and the digital I/O 111b comprise interfaces (e.g., ports, plugs, jacks) configured to receive connectors of cables transmitting analog and digital signals, respectively, without necessarily including cables.
- the playback device 110a can receive media content (e.g., audio content comprising music and/or other sounds) from a local audio source 105 via the input/output 111 (e.g., a cable, a wire, a PAN, a Bluetooth connection, an ad hoc wired or wireless communication network, and/or another suitable communication link).
- the local audio source 105 can comprise, for example, a mobile device (e.g., a smartphone, a tablet, a laptop computer) or another suitable audio component (e.g., a television, a desktop computer, an amplifier, a phonograph, a Blu-ray player, a memory storing digital media files).
- the local audio source 105 includes local music libraries on a smartphone, a computer, a network-attached storage (NAS), and/or another suitable device configured to store media files.
- one or more of the playback devices 110, NMDs 120, and/or control devices 130 comprise the local audio source 105.
- the media playback system omits the local audio source 105 altogether.
- the playback device 110a does not include an input/output 111 and receives all audio content via the network 104.
- the playback device 110a further comprises electronics 112, a user interface 113 (e.g., one or more buttons, knobs, dials, touch-sensitive surfaces, displays, touchscreens), and one or more transducers 114 (referred to hereinafter as “the transducers 114”).
- the one or more transducers may include further upward-firing transducers and/or a horizontal-firing transducer.
- the electronics 112 is configured to receive audio from an audio source (e.g., the local audio source 105) via the input/output 111 and/or one or more of the computing devices 106a-c via the network 104 (Figure 1B), amplify the received audio, and output the amplified audio for playback via one or more of the transducers 114.
- the playback device 110a includes one or more microphones 115 (hereinafter referred to as “the microphones 115”).
- the microphones 115 may comprise a plurality of microphones, and may be arranged as a microphone array.
- the microphone array may be an asymmetrical microphone array.
- the playback device 110a having the microphones 115 can operate as an NMD configured to receive voice input from a user and correspondingly perform one or more operations based on the received voice input.
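The document does not prescribe a processing method for the asymmetrical microphone array, but a common way to exploit an arbitrary (non-uniform) microphone layout is delay-and-sum beamforming. The sketch below is illustrative only, with all names assumed: it computes per-microphone steering delays from arbitrary planar positions and averages the time-aligned channels.

```python
import math

SPEED_OF_SOUND = 343.0  # meters per second, assumed room-temperature value

def steering_delays(mic_positions, azimuth_deg, sample_rate):
    """Per-microphone delays (in samples) that align a far-field source
    arriving from `azimuth_deg` across an arbitrary, possibly asymmetrical,
    planar microphone layout. Positions are (x, y) in meters."""
    azimuth = math.radians(azimuth_deg)
    direction = (math.cos(azimuth), math.sin(azimuth))
    # Projecting each mic onto the arrival direction gives its relative path
    # length; the mic with the largest projection hears the source first.
    projections = [x * direction[0] + y * direction[1] for x, y in mic_positions]
    reference = max(projections)
    return [round((reference - p) / SPEED_OF_SOUND * sample_rate)
            for p in projections]

def delay_and_sum(channels, delays):
    """Advance each channel by its steering delay and average the result."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [sum(ch[d + n] for ch, d in zip(channels, delays)) / len(channels)
            for n in range(length)]
```

Because the delays are computed from each microphone's actual position, the same code serves symmetric and asymmetric arrays alike; asymmetry only changes the projection values.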
- the electronics 112 comprise one or more processors 112a (referred to hereinafter as “the processors 112a”), memory 112b, software components 112c, a network interface 112d, one or more audio processing components 112g (referred to hereinafter as “the audio components 112g”), one or more audio amplifiers 112h (referred to hereinafter as “the amplifiers 112h”), and power 112i (e.g., one or more power supplies, power cables, power receptacles, batteries, induction coils, Power-over-Ethernet (POE) interfaces, and/or other suitable sources of electric power).
- the electronics 112 optionally include one or more other components 112j (e.g., one or more sensors, video displays, touchscreens, battery charging bases).
- the processors 112a can comprise clock-driven computing component(s) configured to process data.
- the memory 112b can comprise a computer-readable medium (e.g., a tangible, non-transitory computer-readable medium, data storage loaded with one or more of the software components 112c) configured to store instructions for performing various operations and/or functions.
- the processors 112a are configured to execute the instructions stored on the memory 112b to perform one or more of the operations.
- the operations can include, for example, causing the playback device 110a to retrieve audio data from an audio source (e.g., one or more of the computing devices 106a-c (Figure 1B)), and/or another one of the playback devices 110.
- the operations further include causing the playback device 110a to send audio data to another one of the playback devices 110 and/or another device (e.g., one of the NMDs 120).
- Certain examples include operations causing the playback device 110a to pair with another of the one or more playback devices 110 to enable a multi-channel audio environment (e.g., a stereo pair, a bonded zone).
- the processors 112a can be further configured to perform operations causing the playback device 110a to synchronize playback of audio content with another of the one or more playback devices 110.
- a listener will preferably be unable to perceive time-delay differences between playback of the audio content by the playback device 110a and the one or more other playback devices 110. Additional details regarding audio playback synchronization among playback devices can be found, for example, in U.S. Patent No. 8,234,395, which was incorporated by reference above.
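One way to picture keeping two players aligned (the actual synchronization protocol is described in U.S. Patent No. 8,234,395, not reproduced here; the function and its conventions are assumptions): given a reference device's reported playback position, timestamps on a shared clock, and the sample rate, the local device can compute how many samples it is behind or ahead.

```python
def sync_adjustment(ref_position, ref_timestamp,
                    local_position, local_timestamp, sample_rate):
    """Samples by which the local device should advance (positive) or hold
    back (negative) its playback pointer to match the reference device.
    Timestamps are seconds on a common clock; positions are sample indices."""
    elapsed = local_timestamp - ref_timestamp
    # Where the reference device's playback pointer should be now.
    expected_position = ref_position + elapsed * sample_rate
    return round(expected_position - local_position)
```

A correction of zero means the devices are already sample-aligned; small nonzero corrections would be absorbed gradually so the adjustment itself stays imperceptible.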
- the memory 112b is further configured to store data associated with the playback device 110a, such as one or more zones and/or zone groups of which the playback device 110a is a member, audio sources accessible to the playback device 110a, and/or a playback queue that the playback device 110a (and/or another of the one or more playback devices) can be associated with.
- the stored data can comprise one or more state variables that are periodically updated and used to describe a state of the playback device 110a.
- the memory 112b can also include data associated with a state of one or more of the other devices (e.g., the playback devices 110, NMDs 120, control devices 130) of the media playback system 100.
- the state data is shared during predetermined intervals of time (e.g., every 5 seconds, every 10 seconds, every 60 seconds) among at least a portion of the devices of the media playback system 100, so that one or more of the devices have the most recent data associated with the media playback system 100.
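Reconciling shared state variables can be sketched minimally as follows, assuming each variable carries a timestamp (an illustrative convention, not one the document specifies):

```python
def merge_state(local_state, received_state):
    """Keep, per state variable, whichever copy carries the newer timestamp,
    so each device converges on the most recent system state. State entries
    are (value, timestamp) pairs."""
    merged = dict(local_state)
    for key, (value, timestamp) in received_state.items():
        if key not in merged or timestamp > merged[key][1]:
            merged[key] = (value, timestamp)
    return merged
```

Running such a merge at each sharing interval (e.g., every 5, 10, or 60 seconds) is one simple way for a portion of the devices to hold the most recent data associated with the media playback system.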
- the network interface 112d is configured to facilitate a transmission of data between the playback device 110a and one or more other devices on a data network such as, for example, the links 103 and/or the network 104 ( Figure IB).
- the network interface 112d is configured to transmit and receive data corresponding to media content (e.g., audio content, video content, text, photographs) and other signals (e.g., non-transitory signals) comprising digital packet data including an Internet Protocol (IP)-based source address and/or an IP-based destination address.
- the network interface 112d can parse the digital packet data such that the electronics 112 properly receives and processes the data destined for the playback device 110a.
- the network interface 112d comprises one or more wireless interfaces 112e (referred to hereinafter as “the wireless interface 112e”) (e.g., a suitable interface comprising one or more antennae).
- the wireless interface 112e can be configured to wirelessly communicate with one or more other devices (e.g., one or more of the other playback devices 110, NMDs 120, and/or control devices 130) that are communicatively coupled to the network 104 ( Figure IB) in accordance with a suitable wireless communication protocol (e.g., WiFi, Bluetooth, LTE).
- the network interface 112d optionally includes a wired interface 112f (e.g., an interface or receptacle configured to receive a network cable such as an Ethernet, a USB-A, USB-C, and/or Thunderbolt cable) configured to communicate over a wired connection with other devices in accordance with a suitable wired communication protocol.
- the network interface 112d includes the wired interface 112f and excludes the wireless interface 112e.
- the electronics 112 excludes the network interface 112d altogether and transmits and receives media content and/or other data via another communication path (e.g., the input/output 111).
- the audio components 112g are configured to process and/or filter data comprising media content received by the electronics 112 (e.g., via the input/output 111 and/or the network interface 112d) to produce output audio signals.
- the audio processing components 112g comprise, for example, one or more digital-to-analog converters (DACs), audio preprocessing components, audio enhancement components, one or more digital signal processors (DSPs), and/or other suitable audio processing components, modules, circuits, etc.
- one or more of the audio processing components 112g can comprise one or more subcomponents of the processors 112a.
- the electronics 112 omits the audio processing components 112g.
- the processors 112a execute instructions stored on the memory 112b to perform audio processing operations to produce the output audio signals.
- the amplifiers 112h are configured to receive and amplify the audio output signals produced by the audio processing components 112g and/or the processors 112a.
- the amplifiers 112h can comprise electronic devices and/or components configured to amplify audio signals to levels sufficient for driving one or more of the transducers 114.
- the amplifiers 112h include one or more switching or class-D power amplifiers.
- the amplifiers include one or more other types of power amplifiers (e.g., linear gain power amplifiers, class-A amplifiers, class-B amplifiers, class-AB amplifiers, class-C amplifiers, class-D amplifiers, class-E amplifiers, class-F amplifiers, class-G and/or class H amplifiers, and/or another suitable type of power amplifier).
- the amplifiers 112h comprise a suitable combination of two or more of the foregoing types of power amplifiers.
- individual ones of the amplifiers 112h correspond to individual ones of the transducers 114.
- the electronics 112 includes a single one of the amplifiers 112h configured to output amplified audio signals to a plurality of the transducers 114. In some other examples, the electronics 112 omits the amplifiers 112h.
- the transducers 114 receive the amplified audio signals from the amplifier 112h and render or output the amplified audio signals as sound (e.g., audible sound waves having a frequency between about 20 Hertz (Hz) and 20 kilohertz (kHz)).
- the transducers 114 can comprise a single transducer. In other examples, however, the transducers 114 comprise a plurality of audio transducers. In some examples, the transducers 114 comprise more than one type of transducer.
- the transducers 114 can include one or more low frequency transducers (e.g., subwoofers, woofers), mid-range frequency transducers (e.g., mid-range transducers, mid-woofers), and one or more high frequency transducers (e.g., one or more tweeters).
- “low frequency” can generally refer to audible frequencies below about 500 Hz.
- “mid-range frequency” can generally refer to audible frequencies between about 500 Hz and about 2 kHz.
- “high frequency” can generally refer to audible frequencies above 2 kHz.
- one or more of the transducers 114 comprise transducers that do not adhere to the foregoing frequency ranges.
- one of the transducers 114 may comprise a mid-woofer transducer configured to output sound at frequencies between about 200 Hz and about 5 kHz.
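Using the approximate ranges above, routing a frequency component to a transducer class can be sketched as follows (the function name and the exact boundary handling are assumptions; the ranges are the "about" figures stated above):

```python
def transducer_for(frequency_hz):
    """Route a frequency to a transducer class using the approximate ranges:
    below ~500 Hz low, ~500 Hz to ~2 kHz mid-range, above ~2 kHz high."""
    if frequency_hz < 500:
        return "low frequency transducer"
    if frequency_hz <= 2000:
        return "mid-range frequency transducer"
    return "high frequency transducer"
```

A transducer such as the mid-woofer above (about 200 Hz to about 5 kHz) deliberately spans these nominal classes, which is why the document notes that transducers need not adhere to the ranges.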
- one or more playback devices 110 comprises wired or wireless headphones (e.g., over-the-ear headphones, on-ear headphones, in-ear earphones).
- one or more of the playback devices 110 comprise a docking station and/or an interface configured to interact with a docking station for personal mobile media playback devices.
- a playback device may be integral to another device or component such as a television, a lighting fixture, or some other device for indoor or outdoor use.
- a playback device omits a user interface and/or one or more transducers.
- FIG. 1D is a block diagram of a playback device 110p comprising the input/output 111 and electronics 112 without the user interface 113 or transducers 114.
- Figure 1E is a block diagram of a bonded playback device 110q comprising the playback device 110a (Figure 1C) sonically bonded with the playback device 110i (e.g., a subwoofer) (Figure 1A).
- the playback devices 110a and 110i are separate ones of the playback devices 110 housed in separate enclosures.
- the bonded playback device 110q comprises a single enclosure housing both the playback devices 110a and 110i.
- the bonded playback device 110q can be configured to process and reproduce sound differently than an unbonded playback device (e.g., the playback device 110a of Figure 1C) and/or paired or bonded playback devices (e.g., the playback devices 110l and 110m of Figure 1B).
- the playback device 110a is a full-range playback device configured to render low frequency, mid-range frequency, and high frequency audio content
- the playback device 110i is a subwoofer configured to render low frequency audio content.
- the playback device 110a, when bonded with the playback device 110i, is configured to render only the mid-range and high frequency components of particular audio content, while the playback device 110i renders the low frequency component of the particular audio content.
- the bonded playback device 110q includes additional playback devices and/or another bonded playback device. Additional playback device examples are described in further detail below with respect to Figures 2A-2C.
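A bonded pair like 110q implies a crossover that sends low frequencies to the subwoofer and the remainder to the full-range device. As an illustrative sketch only (a one-pole low-pass stands in for a production crossover filter; all names are assumptions), the two bands sum back to the original signal:

```python
import math

def split_for_bonded_pair(samples, crossover_hz, sample_rate):
    """Split audio into a low band for the subwoofer (110i) and the
    complementary band for the full-range device (110a)."""
    alpha = math.exp(-2.0 * math.pi * crossover_hz / sample_rate)
    low, state = [], 0.0
    for x in samples:
        # One-pole low-pass: smooth toward the input at the crossover rate.
        state = alpha * state + (1.0 - alpha) * x
        low.append(state)
    # The high band is whatever the low-pass removed, so low + high == input.
    high = [x - l for x, l in zip(samples, low)]
    return low, high
```

Because the high band is defined as the residual, playing the low band on 110i and the high band on 110a in synchrony reconstructs the original content acoustically, at least in this idealized model.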
- Suitable Network Microphone Devices (NMDs)
- the NMD 120a includes one or more voice processing components 124 (hereinafter “the voice components 124”) and several components described with respect to the playback device 110a (Figure 1C), including the processors 112a, the memory 112b, the microphones 115, the software components 112c, the network interface 112d, and power 112i.
- the NMD 120a optionally comprises other components also included in the playback device 110a (Figure 1C), such as the user interface 113 and/or the transducers 114, as well as other components 112j.
- the NMD 120a is configured as a media playback device (e.g., one or more of the playback devices 110), and further includes, for example, one or more of the audio components 112g (Figure 1C), the amplifiers 112h, and/or other playback device components.
- the NMD 120a comprises an Internet of Things (IoT) device such as, for example, a thermostat, alarm panel, fire and/or smoke detector, etc.
- the NMD 120a comprises the microphones 115, the voice processing components 124, and only a portion of the components of the electronics 112 described above with respect to Figure 1B.
- the NMD 120a includes the processor 112a and the memory 112b (Figure 1B), while omitting one or more other components of the electronics 112.
- the NMD 120a includes additional components (e.g., one or more sensors, cameras, thermometers, barometers, hygrometers).
- an NMD can be integrated into a playback device.
- Figure 1G is a block diagram of a playback device 110r comprising an NMD 120d.
- the playback device 110r can comprise many or all of the components of the playback device 110a and further include the microphones 115 and voice processing components 124 (Figure 1F).
- the playback device 110r optionally includes an integrated control device 130c.
- the control device 130c can comprise, for example, a user interface (e.g., the user interface 113 of Figure 1B) configured to receive user input (e.g., touch input, voice input) without a separate control device.
- the playback device 110r receives commands from another control device (e.g., the control device 130a of Figure 1B).
- the microphones 115 are configured to acquire, capture, and/or receive sound from an environment (e.g., the environment 101 of Figure 1A) and/or a room in which the NMD 120a is positioned.
- the received sound can include, for example, vocal utterances, audio played back by the NMD 120a and/or another playback device, background voices, ambient sounds, etc.
- the microphones 115 convert the received sound into electrical signals to produce microphone data.
- the voice processing components 124 receive and analyze the microphone data to determine whether a voice input is present in the microphone data.
- the voice input can comprise, for example, an activation word followed by an utterance including a user request.
- an activation word is a word or other audio cue that signifies a user voice input. For instance, in querying the AMAZON® VAS, a user might speak the activation word “Alexa.” Other examples include “Ok, Google” for invoking the GOOGLE® VAS and “Hey, Siri” for invoking the APPLE® VAS.
- the voice processing components 124 monitor the microphone data for an accompanying user request in the voice input.
- the user request may include, for example, a command to control a third-party device, such as a thermostat (e.g., NEST® thermostat), an illumination device (e.g., a PHILIPS HUE ® lighting device), or a media playback device (e.g., a Sonos® playback device).
- a user might speak the activation word “Alexa” followed by the utterance “set the thermostat to 68 degrees” to set a temperature in a home (e.g., the environment 101 of Figure 1A).
- the user might speak the same activation word followed by the utterance “turn on the living room” to turn on illumination devices in a living room area of the home.
- the user may similarly speak an activation word followed by a request to play a particular song, an album, or a playlist of music on a playback device in the home.
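The device-side split of such inputs into an activation word and an accompanying request can be sketched as a prefix match over a transcript. This is purely illustrative: real NMDs detect activation words with acoustic wake-word models on the microphone data itself, not with string matching, and the function name and word set are assumptions.

```python
# Activation words named in the examples above (lowercased for matching).
ACTIVATION_WORDS = {"alexa", "ok, google", "hey, siri"}

def parse_voice_input(transcript):
    """Split a transcript into (activation word, utterance) if it begins
    with a known activation word; otherwise return None."""
    lowered = transcript.lower()
    # Try longer activation phrases first so prefixes do not shadow them.
    for word in sorted(ACTIVATION_WORDS, key=len, reverse=True):
        if lowered.startswith(word):
            return word, transcript[len(word):].strip(" ,")
    return None
```

The extracted utterance (e.g., "set the thermostat to 68 degrees") would then be forwarded for request processing as described above.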
- FIG. 1H is a partially schematic diagram of the control device 130a (Figures 1A and 1B).
- the term “control device” can be used interchangeably with “controller” or “control system.”
- the control device 130a is configured to receive user input related to the media playback system 100 and, in response, cause one or more devices in the media playback system 100 to perform an action(s) or operation(s) corresponding to the user input.
- the control device 130a comprises a smartphone (e.g., an iPhone™, an Android phone) on which media playback system controller application software is installed.
- the control device 130a comprises, for example, a tablet (e.g., an iPad™), a computer (e.g., a laptop computer, a desktop computer), and/or another suitable device (e.g., a television, an automobile audio head unit, an IoT device).
- the control device 130a comprises a dedicated controller for the media playback system 100.
- the control device 130a is integrated into another device in the media playback system 100 (e.g., one or more of the playback devices 110, NMDs 120, and/or other suitable devices configured to communicate over a network).
- the control device 130a includes electronics 132, a user interface 133, one or more speakers 134, and one or more microphones 135.
- the electronics 132 comprise one or more processors 132a (referred to hereinafter as “the processors 132a”), a memory 132b, software components 132c, and a network interface 132d.
- the processor 132a can be configured to perform functions relevant to facilitating user access, control, and configuration of the media playback system 100.
- the memory 132b can comprise data storage that can be loaded with one or more of the software components executable by the processor 132a to perform those functions.
- the software components 132c can comprise applications and/or other executable software configured to facilitate control of the media playback system 100.
- the memory 132b can be configured to store, for example, the software components 132c, media playback system controller application software, and/or other data associated with the media playback system 100 and the user.
- the network interface 132d is configured to facilitate network communications between the control device 130a and one or more other devices in the media playback system 100, and/or one or more remote devices.
- the network interface 132d is configured to operate according to one or more suitable communication industry standards (e.g., infrared, radio, wired standards including IEEE 802.3, wireless standards including IEEE 802.11a, 802.11b, 802.11g, 802.11n, 802.11ac, 802.15, 4G, LTE).
- the network interface 132d can be configured, for example, to transmit data to and/or receive data from the playback devices 110, the NMDs 120, other ones of the control devices 130, one of the computing devices 106 of Figure IB, devices comprising one or more other media playback systems, etc.
- the transmitted and/or received data can include, for example, playback device control commands, state variables, and playback zone and/or zone group configurations.
- the network interface 132d can transmit a playback device control command (e.g., volume control, audio playback control, audio content selection) from the control device 130 to one or more of the playback devices 110.
- the network interface 132d can also transmit and/or receive configuration changes such as, for example, adding/removing one or more playback devices 110 to/from a zone, adding/removing one or more zones to/from a zone group, forming a bonded or consolidated player, separating one or more playback devices from a bonded or consolidated player, among others.
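The wire format of such control commands is not specified here; as a hypothetical illustration (the function, field names, and JSON encoding are all assumptions), a volume-control command from the control device to a playback device could be serialized as:

```python
import json

def make_control_command(device_id, action, **parameters):
    """Build a hypothetical playback-device control message of the kind the
    network interface 132d might transmit to one of the playback devices."""
    return json.dumps(
        {"target": device_id, "action": action, "parameters": parameters},
        sort_keys=True)
```

The same envelope could carry configuration changes (adding or removing a device from a zone, forming or separating a bonded player) by varying the action and parameters.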
- the user interface 133 is configured to receive user input and can facilitate control of the media playback system 100.
- the user interface 133 includes media content art 133a (e.g., album art, lyrics, videos), a playback status indicator 133b (e.g., an elapsed and/or remaining time indicator), media content information region 133c, a playback control region 133d, and a zone indicator 133e.
- the media content information region 133c can include a display of relevant information (e.g., title, artist, album, genre, release year) about media content currently playing and/or media content in a queue or playlist.
- the playback control region 133d can include selectable (e.g., via touch input and/or via a cursor or another suitable selector) icons to cause one or more playback devices in a selected playback zone or zone group to perform playback actions such as, for example, play or pause, fast forward, rewind, skip to next, skip to previous, enter/exit shuffle mode, enter/exit repeat mode, enter/exit cross fade mode, etc.
- the playback control region 133d may also include selectable icons to modify equalization settings, playback volume, and/or other suitable playback actions.
- the user interface 133 comprises a display presented on a touch screen interface of a smartphone (e.g., an iPhone™, an Android phone). In some examples, however, user interfaces of varying formats, styles, and interactive sequences may alternatively be implemented on one or more network devices to provide comparable control access to a media playback system.
- the one or more speakers 134 can be configured to output sound to the user of the control device 130a.
- the one or more speakers comprise individual transducers configured to correspondingly output low frequencies, mid-range frequencies, and/or high frequencies.
- the control device 130a is configured as a playback device (e.g., one of the playback devices 110).
- the control device 130a is configured as an NMD (e.g., one of the NMDs 120), receiving voice commands and other sounds via the one or more microphones 135.
- the one or more microphones 135 can comprise, for example, one or more condenser microphones, electret condenser microphones, dynamic microphones, and/or other suitable types of microphones or transducers. In some examples, two or more of the microphones 135 are arranged to capture location information of an audio source (e.g., voice, audible sound) and/or configured to facilitate filtering of background noise. Moreover, in certain examples, the control device 130a is configured to operate as a playback device and an NMD. In other examples, however, the control device 130a omits the one or more speakers 134 and/or the one or more microphones 135.
- the control device 130a may comprise a device (e.g., a thermostat, an IoT device, a network device) comprising a portion of the electronics 132 and the user interface 133 (e.g., a touch screen) without any speakers or microphones.
- a playback device can generally refer to a network device configured to receive, process, and output data of a media playback system.
- a playback device can be a network device that receives and processes audio content.
- a playback device includes one or more transducers or speakers powered by one or more amplifiers.
- a playback device includes one of (or neither of) the speaker and the amplifier.
- a playback device can comprise one or more amplifiers configured to drive one or more speakers external to the playback device via a corresponding wire or cable. Playback devices described herein include at least four microphones.
- an NMD (i.e., a “network microphone device”) can generally refer to a network device that is configured for audio detection.
- an NMD is a stand-alone device configured primarily for audio detection.
- an NMD is incorporated into a playback device (or vice versa).
- Figures 2A to 2C illustrate three schematic views of an example of a first playback device 200, which takes the form of a soundbar.
- Figure 2A provides a perspective view of the playback device 200.
- Figure 2B provides a top, plan view of the first playback device 200, while Figure 2C shows a front view of the first playback device 200.
- an outer housing, surface, or a grille 205 of the playback device 200 is shown, but in Figures 2B and 2C the grille 205 is not shown, so that the microphones and transducers of the device 200 can be seen more clearly.
- a top face or an upper portion 202, and a front face, an azimuthal portion, or a front portion 204 of the playback device 200 are visible.
- the upper portion 202 is arranged to face upward, while the front portion 204 is configured to face forward, in use.
- the front portion 204 is curved at its edges, such that the front portion 204 is continuous with a rear face or rear portion (not shown) of the playback device 200.
- a grille 205 forms an outer surface of the front portion 204 and covers transducers and other components.
- a user interface 206 is provided on the upper portion 202.
- the user interface 206, in this example, comprises three buttons, although in other playback devices a user interface may comprise one or more buttons, knobs, dials, touch-sensitive surfaces, displays, touchscreens, or any combination thereof.
- the user interface 206 comprises one or more transport controls (e.g., play/pause, skip, back, volume up, volume down).
- the playback device 200 may lack a user interface.
- the playback device 200 also includes a microphone array 210, comprising a plurality of microphones 212, and a plurality of transducers 214.
- the asymmetrical microphone array 210 is provided on the upper portion 202 of the playback device 200, and is visible in Figure 2B.
- the plurality of transducers 214 comprises five transducers that are provided in the front portion 204, as can be seen in Figure 2C.
- the transducers include a first forward-firing transducer 214a that is positioned centrally on the front portion 204, second and third forward-firing transducers 214b, 214c that are positioned symmetrically to either side of the first forward-firing transducer 214a, and a fourth transducer 214d and a fifth transducer 214e that are positioned symmetrically towards each end of the front portion 204.
- the fourth and fifth transducers 214d, 214e are angled relative to the front portion 204, such that sound radiates from these transducers 214d, 214e in a substantially different direction, or along a different main axis, to sound radiating from the forward-firing transducers 214a-c.
- the fourth and fifth transducers 214d, 214e may be referred to as side-firing transducers.
- both Figures 2B and 2C include a center line 216 of each face, and a cross indicating a center point 218 of the playback device 200. References to features being positioned to a particular side relative to the center point 218 below may be considered to mean on one side of an axis passing through the center point 218, on one side of the center line 216, or on one side of a plane passing through the center point 218 or defined by one or more center lines 216.
- Figure 2B also shows the positions of the buttons of the user interface 206 for context.
- the microphones 212 are arranged in the microphone array 210 so that the microphone array 210 is asymmetrical.
- the array 210 is asymmetrical about the center line 216 and center point 218 of the playback device 200.
- the array 210 is also asymmetrical about its own center line and center point, because these are the same as the center line 216 and center point 218.
- the microphone array 210 comprises a plurality of microphones 212, which will be referred to, from left to right as shown in Figure 2B, as a first microphone 212a, a second microphone 212b, a third microphone 212c, and a fourth microphone 212d.
- the first microphone 212a is positioned towards one end of the upper portion 202, which, in Figure 2B, is the left-hand end of the playback device 200.
- the fourth microphone 212d is positioned in a similar position towards the other end of the upper portion 202, the right-hand end in Figure 2B.
- the second and third microphones 212b, 212c are positioned between the first and fourth microphones 212a, 212d.
- the microphones of the array are aligned with one another, and are also aligned with a longitudinal axis 224 of the playback device 200.
- the microphones may lack such alignment in other examples.
- the microphone array 210 includes four microphones 212. In some examples, however, other suitable numbers of microphones are included. In certain examples, for instance, the microphone array 210 includes an even number of microphones 212 (e.g., 2, 4, 6, 8, 10, etc.). In other examples, however, the microphone array 210 includes an odd number of microphones 212 (e.g., 1, 3, 5, 7, 9, 11, etc.).
- the array 210 is asymmetrical for at least the reason that the first and fourth microphones 212a, 212d are symmetrical about the center of the device but the second and third microphones 212b, 212c are not symmetrically arranged about the center line 216 or point 218. Instead, they are provided to one side of the center of the device 200, such that the first, second, and third microphones 212a-c are to the left-hand side of the center (relative to the orientation of the upper portion shown in Figure 2B) and the fourth microphone 212d is to the right-hand side of the center. Accordingly, a majority of the microphones 212a-d of the microphone array 210 can be positioned at/in/on one side of the center of the playback device.
- the microphones 212a-d may be separated into sets. The sets are defined based on how the microphones may operate together. Such operation can be dependent upon a spacing between the microphones. Accordingly, the array 210 includes a first set 220 of two microphones, comprising the first microphone 212a and the fourth microphone 212d, and a second set 222 of two microphones, comprising the second microphone 212b and the third microphone 212c.
- the microphones 212a, 212d of the first set 220 are spaced apart or separated by a first distance or spacing, indicated in Figure 2B by the length d1.
- the microphones 212b, 212c of the second set 222 are spaced apart or separated by a second distance or spacing, which is smaller than the first spacing d1 and is indicated in Figure 2B as length d2.
- Microphones having a larger spacing such as those of the first set 220 may be useful for achieving high angular resolution when determining a direction from which a sound signal is received, because the times at which a signal is received at microphones having a larger spacing can be compared with greater precision. Microphones having a larger spacing may also be useful for range estimation.
- microphones having a smaller spacing, such as those of the second set 222 may facilitate beamforming. Together, the microphones of the array may be useful for improving signal quality, by allowing improved noise reduction, improved characterization of a listening environment, and identification of a location of a sound source and subsequently determining information from that source.
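The angular-resolution benefit of a wide spacing can be sketched numerically. The following is a minimal illustration, not taken from the document: the spacing, sample angle, and function names are assumptions. It estimates a far-field direction of arrival from the time difference of arrival (TDOA) at a widely spaced pair such as the first set 220:

```python
import numpy as np

# Far-field direction-of-arrival estimate from one widely spaced
# microphone pair (like the first set 220). All values illustrative.
SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees C

def doa_from_tdoa(tdoa_s: float, spacing_m: float) -> float:
    """Angle (degrees) of a far-field source relative to the broadside
    of a two-microphone pair, from the time difference of arrival.
    A larger spacing makes the same timing error produce a smaller
    angular error, i.e. higher angular resolution."""
    sin_theta = np.clip(SPEED_OF_SOUND * tdoa_s / spacing_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))

# A source 30 degrees off broadside, observed with an 0.8 m pair:
d1 = 0.8
tdoa = d1 * np.sin(np.radians(30.0)) / SPEED_OF_SOUND
print(round(doa_from_tdoa(tdoa, d1), 1))  # 30.0
```

The same one-sample timing error corresponds to a smaller arcsin argument when the spacing is larger, which is the precision advantage attributed to the first set above.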
- the microphones of the second set 222 may also be closer to one another than they are to the microphones of the first set 220.
- the first and second microphones 212a, 212b are separated by a third spacing d3, which is greater than d2 but smaller than d1.
- the third and fourth microphones 212c, 212d are separated by a fourth spacing d4, which is also greater than d2 but smaller than d1.
- d4 is also greater in length than d3, because of the asymmetry of the array 210.
- the length d1 is greater than the length d4, the length d4 is greater than the length d3, and the length d3 is greater than the length d2.
- the individual lengths can have any suitable relationship with the other lengths.
- the lengths d2 and d3 may be similar or substantially the same. As one of ordinary skill in the art will appreciate, if the lengths d2 and d3 are substantially equal, the resulting array could still be asymmetric.
- Figure 3 is a plan view of a top face or upper portion 302 of a second playback device 300, having an asymmetrical microphone array 310 (e.g., similar to the microphone array 210 of Figures 2A-2C).
- the microphone array 310 includes a plurality of microphones 312, referred to as a first microphone 312a, a second microphone 312b, a third microphone 312c, and a fourth microphone 312d.
- the first and fourth microphones 312a, 312d are in the same positions as they were in the first playback device 200, thereby being spaced apart by a spacing of d1, like the first and fourth microphones 212a and 212d of the first playback device 200 shown in Figure 2B.
- the second and third microphones 312b, 312c are positioned to be to one side of the center point 318 and/or center line 316 of the playback device. Accordingly, a majority of the microphones 312a-d are to one side of the center point 318 and/or center line 316 of the playback device, which results in an asymmetrical microphone array.
- the second and third microphones 312b, 312c of the second playback device are spaced apart by a distance d2, but in this example are aligned with one another longitudinally along the device 300 and are displaced from the longitudinal axis 324 so as to be symmetrical about that axis 324.
- the microphone array can still be split into two sets 320, 322, due to the similarity in spacing between the microphones.
- the second microphone 312b is the same distance from the first and fourth microphones 312a, 312d as the third microphone 312c.
- Figure 4 is a plan view of a top face or an upper portion 402 of a third playback device 400, having an asymmetrical microphone array 410.
- the microphone array 410 includes a plurality of microphones 412, referred to as a first microphone 412a, a second microphone 412b, a third microphone 412c, and a fourth microphone 412d.
- the microphone array is asymmetrical and has the same number of microphones to each side of the center point 418 of the playback device. The asymmetry therefore arises due to the placement of the microphones to each side of the center point 418 and/or center line 416.
- the first and fourth microphones 412a, 412d are symmetrical about the center line 416 and/or center point 418 of the playback device, but the second and third microphones 412b, 412c are asymmetrical, rendering the entire array asymmetrical.
- the microphones of the array 410 in Figure 4 are aligned with a longitudinal axis 424 of the upper portion 402 of the third playback device.
- Figure 5 is a plan view of a top face or an upper portion 502 of a fourth playback device, having an asymmetrical microphone array 510.
- the microphone array 510 includes five microphones, rather than four microphones as shown in Figures 2A to 4.
- the microphones 512 of the microphone array 510 are referred to as a first microphone 512a, a second microphone 512b, a third microphone 512c, a fourth microphone 512d, and a fifth microphone 512e.
- a majority of the microphones, namely the first to third microphones 512a-c, are to one side of the center point 518 and/or center line 516 of the playback device, while the other two, the fourth and fifth microphones 512d-e, are to the other side of the center point 518 and/or center line 516 of the playback device.
- the microphones 512 may be split into two sets or subarrays, having different spacings.
- a first set 520 comprises the first to third microphones 512a- c, which have a first spacing
- a second set 522 comprises the fourth and fifth microphones 512d-e, which have a second spacing that is different (e.g., larger) than the first spacing.
- Figures 6A-6C are isometric, plan and front views, respectively, of an example of a fifth playback device 600 (e.g., a soundbar).
- a fifth playback device 600 e.g., a soundbar
- Figure 6A shows a curved outer housing portion 630 (e.g., a grille, cover, enclosure) of the playback device 600.
- the outer housing 630 is not shown, so that microphones and transducers of the playback device 600 can be seen.
- a top face or upper portion 602 of the playback device 600 is continuous with a front face or front portion 604 of the playback device 600, as well as a lower and rear face (not visible).
- the upper portion 602 and front portion 604 as shown in Figures 6B and 6C are treated as if they are separate portions rather than having continuity between them, and are explained in relation to an upper and a front portion.
- References to the upper and front portion may be considered to refer to an upward-facing surface and a forward-facing surface of the playback device, in use. In some examples, the two surfaces may be substantially orthogonal.
- a plurality of transducers 614 are disposed in and/or on upper portion 602 and/or the front portion 604.
- a first and second transducer 614a, 614b can comprise upward-firing transducers and are provided on the upper portion 602.
- a third and fourth transducer 614c, 614d comprise side-firing transducers and are provided on either end of the playback device 600.
- a fifth to eighth transducer 614e-h are provided on the front portion 604.
- the fifth and sixth transducers 614e, 614f are provided close to each end of the playback device 600 and are symmetrically arranged on the front portion 604.
- the seventh transducer 614g is provided centrally in the front portion 604, and the eighth transducer 614h is provided to one side of a centerline 616 of the front portion 604.
- the eighth transducer 614h is asymmetrically positioned on the front portion, such that the corresponding position on the opposite side of the centerline 616 is not occupied by another transducer.
- the eighth transducer 614h may be a transducer configured to output low-frequency audio.
- the eighth transducer 614h may be specifically designed for low-frequency output or may be configured to output such audio by the electronics of the device.
- the eighth transducer may comprise a multiple membrane, multiple motor transducer. Additional details regarding multiple membrane, multiple motor transducers can be found in, for example, U.S. Patent Application 16/760,049, titled “Low Profile Loudspeaker Device,” the contents of which are incorporated herein by reference in their entirety.
- the playback device 600 includes a microphone array 610, comprising a plurality of microphones 612, referred to individually as a first microphone 612a and a fourth microphone 612d (Figure 6B), and a second microphone 612b and a third microphone 612c (Figure 6C), and a plurality of transducers 614.
- the microphone array 610 is provided across the front and upper portions 604, 602 of the playback device 600, and is visible in Figures 6B and 6C.
- the microphones 612a-d are arranged in the microphone array 610 so that the microphone array 610 is asymmetrical.
- the array 610 is asymmetrical about the center line 616 and/or center point 618 of the playback device 600.
- the microphone array 610 comprises a plurality of microphones 612, which will be referred to, from left to right as shown in Figure 6B, as a first microphone 612a, a second microphone 612b, a third microphone 612c, and a fourth microphone 612d.
- the first microphone 612a is positioned towards one end of the upper portion 602, which, in Figure 6B, is the left-hand end of the playback device 600.
- the fourth microphone 612d is positioned in a similar position towards the other end of the upper portion 602, the right-hand end in Figure 6B.
- the second and third microphones 612b, 612c are positioned between the first and fourth microphones 612a, 612d but are positioned on the front portion 604 rather than the upper portion 602.
- the array 610 can be characterized as asymmetrical for at least the reason that the first and fourth microphones 612a, 612d (Figure 6B) are symmetrical about the center line 616 of the device but the second and third microphones 612b, 612c (Figure 6C) are not symmetrically arranged about the center line 616 and/or point 618.
- the first, second, and third microphones 612a-c are disposed on a first side (e.g., the left-hand side of the center relative to the orientation of the upper and front portions shown in Figures 6B and 6C) and the fourth microphone 612d is disposed on a second side (e.g., the right-hand side of the center). Accordingly, a majority of the microphones 612 of the microphone array 610 are located on one side of the center of the playback device.
- the microphones 612 may be arranged into apertures and/or sets of microphones.
- the sets can be defined, for instance, based on how the microphones may operate together. Such operation can be based on a spacing between two or more of the microphones 612.
- the array 610 includes a first set 620 of two microphones, comprising the first microphone 612a and the fourth microphone 612d, and a second set 622 of two microphones, comprising the second microphone 612b and the third microphone 612c.
- the microphones 612a, 612d of the first set 620 are spaced apart or separated by a first distance or spacing, indicated in Figure 6B by the length d1.
- the microphones 612b, 612c of the second set 622 are spaced apart or separated by a second distance or spacing, which is smaller than the first spacing d1 and is indicated in Figure 6C as length d2.
- microphones having a larger spacing such as those of the first set 620 may be useful for achieving high angular resolution when determining a direction from which a sound signal is received, because the times at which a signal is received at microphones having a larger spacing can be compared with greater precision.
- Microphones having a larger spacing may also be useful for range estimation.
- Microphones having a smaller spacing, such as those of the second set 622 may facilitate beamforming.
- d1 may be at least 1 meter, at least 0.9m, at least 0.8m, at least 0.7m, at least 0.6m, or at least 0.5m. In some examples, d1 may be less than 0.5m. d2 may be less than 0.05m, less than 0.
- d2 may be less than two times smaller than d1, less than three times smaller than d1, less than four times smaller than d1, or less than five times smaller than d1.
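One reason a beamforming set favors a small spacing like d2 is spatial aliasing: to steer a beam without grating lobes, adjacent microphones should be no more than half a wavelength apart, i.e. d ≤ c/(2f), equivalently f ≤ c/(2d). The sketch below is illustrative; the 0.03 m and 0.8 m spacings are assumptions, not values from the document:

```python
# Half-wavelength (anti-aliasing) rule for a uniformly spaced pair.
SPEED_OF_SOUND = 343.0  # m/s

def max_unaliased_frequency(spacing_m: float) -> float:
    """Highest frequency (Hz) the pair can beamform toward without
    spatial aliasing: f <= c / (2 * d)."""
    return SPEED_OF_SOUND / (2.0 * spacing_m)

print(round(max_unaliased_frequency(0.03)))  # 5717 -> a 3 cm spacing covers speech
print(round(max_unaliased_frequency(0.8)))   # 214  -> an 80 cm spacing aliases early
```

This is consistent with the division of labor described above: the closely spaced second set handles beamforming across the voice band, while the widely spaced first set trades aliasing-free bandwidth for angular resolution.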
- the microphone array 610 in the playback device 600 shown in Figures 6A to 6C is spread across two different faces or at least spread around a curved surface, such that the microphones are provided in at least two different orientations.
- the microphones 612a, 612d of the first set 620 are provided at the upper portion 602 of the playback device 600, and so are oriented to face upwards, in use.
- the microphones 612b, 612c of the second set 622 are provided at the front portion 604 of the playback device 600, and so are oriented to face forward, in use.
- Utilizing microphones facing in two different directions, particularly via an asymmetric array, is expected to facilitate and/or enable improvements over conventional arrays, such as enabling an angle of elevation of a sound source relative to the playback device to be estimated in addition to being able to estimate an azimuthal angle of said sound source. This may enable better identification in three dimensions of a location of a source within a listening environment. Subsequently, beamforming may be used to focus on audio output from the source, using the second set 622 of microphones, for example. The positioning of the second set 622 on the front portion 604 and the first set 620 on the upper portion provides improvements in addition to those described above.
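Recovering both azimuth and elevation requires microphone positions that span more than one axis, which the split between top-face and front-face microphones provides. The following is a minimal far-field sketch under assumed positions (loosely inspired by two top-face and two front-face microphones, slightly perturbed so the geometry is not coplanar, since a fully coplanar array leaves a front/back ambiguity); none of the coordinates come from the document:

```python
import numpy as np

C = 343.0  # speed of sound, m/s
# Illustrative microphone positions (meters) in device coordinates.
mics = np.array([
    [-0.40, 0.00, 0.05],  # top face, left (like 612a)
    [ 0.40, 0.00, 0.05],  # top face, right (like 612d)
    [-0.10, 0.08, 0.00],  # front face (like 612b)
    [-0.07, 0.06, 0.01],  # front face (like 612c), perturbed off-plane
])

def simulate_tdoas(unit_dir):
    """Far-field TDOAs (s) relative to microphone 0 for a plane wave
    arriving from direction unit_dir."""
    delays = -(mics @ unit_dir) / C
    return delays[1:] - delays[0]

def estimate_direction(tdoas):
    """Least-squares unit direction u from (m_i - m_0) . u = -c * t_i."""
    A = mics[1:] - mics[0]
    b = -C * np.asarray(tdoas)
    u, *_ = np.linalg.lstsq(A, b, rcond=None)
    return u / np.linalg.norm(u)

true_dir = np.array([0.5, 0.6, 0.3])
true_dir /= np.linalg.norm(true_dir)
est = estimate_direction(simulate_tdoas(true_dir))
azimuth = np.degrees(np.arctan2(est[0], est[1]))   # within the horizontal plane
elevation = np.degrees(np.arcsin(est[2]))          # angle above that plane
print(np.allclose(est, true_dir, atol=1e-6))  # True
```

With all microphones on a single line (as in Figure 2B) the least-squares system is rank deficient and only a cone of directions can be resolved; spreading the array across two faces makes the 3-D solve well-posed.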
- Characterization of a ceiling portion of a room may be improved by having upward-facing microphones with good angular resolution, as the widely spaced first set 620 will have. This may be particularly useful in a home-theater setting.
- the second set 622 may enable better pickup of voice commands by being on a front portion, because commands from a user are likely to be incident on the front portion due to the expected positioning of a listener relative to the device 600.
- allowing at least some microphones to be placed on the front of the device enables direct audio signals to be picked up, meaning there is less possibility for noise.
- the placement of the microphones of the array 610 may also reduce self-sound, which can cause unwanted noise in received signals due to the operation of nearby transducers. Self-sound is most likely to be caused by low-frequency audio, such as that below 500 Hz. Accordingly, to avoid self-sound, the microphones may be positioned to maximize a distance of at least some of the microphones 612 from corresponding adjacent transducers 614 outputting low-frequency audio, such as the eighth transducer 614h.
- the arrangement of microphones in Figures 6B and 6C positions the second set 622 of microphones on an opposite side of the center point 618 to the eighth transducer 614h.
- At least one of the second and third microphones 612b, 612c is positioned farther from the eighth transducer 614h than it would be were the microphone array 610 not asymmetric. This separation distance therefore may reduce overall self-sound and result in a majority of the microphones 612 being at least a fifth distance d5 from the transducer 614h (Figure 6C).
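The self-sound argument above can be illustrated with a toy placement check. All positions below are assumed 1-D x-coordinates (meters) chosen for illustration only; they are not taken from the document:

```python
# Keeping a majority of the microphones far from the asymmetric
# low-frequency transducer, the dominant self-sound source (roughly
# below 500 Hz as noted above). Positions are illustrative.
woofer_x = 0.35                           # like the eighth transducer 614h
mic_xs = [-0.45, -0.12, -0.08, 0.45]      # like microphones 612a-d

distances = [abs(x - woofer_x) for x in mic_xs]
far_mics = [d for d in distances if d >= 0.4]
print(len(far_mics), "of", len(mic_xs), "microphones are >= 0.4 m away")
```

Because the array is skewed to the side opposite the woofer, three of the four microphones clear the distance threshold; a symmetric array of the same aperture would place more microphones near the transducer.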
- the microphones of the second set 622 are also positioned in a space on the playback device 600 that is unoccupied by any of the transducers 614, and so are a sixth distance d6 from any of the transducers.
- the sixth distance d6 (Figure 6C) is greater than the spacing between the microphones of the second set 622, which is d2. Achieving a reduced self-sound may also enable improved noise reduction in the microphone 612d that is closer to the transducer 614h.
- Figures 2A to 6 show examples of microphone arrays 210, 310, 410, 510, 610 on corresponding playback devices 200, 300, 400, 500, 600 including a particular number (e.g., four or five) of microphones 212, 312, 412, 512, 612.
- microphone arrays may include any other suitable number of microphones, (e.g., more than five microphones).
- asymmetrical microphone arrays may include six, seven, eight, nine, ten, twelve, sixteen, or twenty microphones.
- the microphones of an array may be divided into different apertures or sets having different spacings and/or positionings.
- Figures 2A-5 show microphone arrays on a top face of the playback device and Figure 6 shows the microphone array spread across two faces of the playback device; in other examples the microphones may be wholly on a different face, such as all on the front portion or rear face, for example, or spread across multiple different faces.
- the microphones 212, 312, 412, 512, 612 of the microphone arrays 210, 310, 410, 510, 610 are considered to be the same type of microphone.
- one or more microphones of an array may be a different type of microphone to the other microphones in the array.
- the microphone arrays may therefore comprise, for example, one or more condenser microphones, one or more electret condenser microphones, one or more dynamic microphones, one or more MEMS microphones, and/or one or more microphones of another suitable type.
- the playback devices take the form of a soundbar that is elongated along a horizontal axis and is configured to face along a primary sound axis that is substantially orthogonal to that horizontal axis.
- the playback devices can assume other forms, for example having more or fewer transducers, having other form-factors, or having any other suitable modifications with respect to the example shown in Figures 2A-6.
- the playback devices can serve as a home theatre primary playback device, and may be placed in a center front position of a home theatre listening environment. In such a configuration, the playback devices can play back home theatre audio synchronously with playback via one or more satellite playback devices, which can be arranged about the listening environment in a suitable configuration.
- various techniques described herein may be carried out with a playback device that includes multiple audio transducers, and may optionally be used as a multichannel satellite playback device for home theatre applications.
- the microphone arrays may be incorporated into other NMDs that lack transducers.
- a system includes two or more devices each with their own microphone(s)
- combinations of microphones across two or more devices may be used to form new subarrays or subsets and/or to form a single ‘super’ array comprising all of the microphones across the devices.
- Relative positions of the microphones may be known, for example using range-finding techniques.
- the devices may have synchronized clocks and/or a common time reference. As discussed above, symmetric positioning is not essential, so such methods using microphones from two or more devices as an array can be used whether or not the devices are symmetrically positioned.
- the effects of asymmetry in the microphones of one device may be enhanced by including additional microphones at another device.
- Figure 7 shows a first media playback system 700, which combines two playback devices, one having an asymmetric microphone array and the other a symmetric microphone array.
- the first media playback system comprises the fifth playback device 600 as shown and described in Figures 6A to 6C, and at least a further, sixth playback device 702.
- the media playback system 700 may include further playback devices, although for clarity these are not shown in Figure 7.
- the fifth playback device 600 as described above, has a plurality of transducers 614 (not visible in Figure 7) and an asymmetric microphone array 610 comprising four microphones 612a-d arranged into two sets or subarrays 620, 622.
- although the outer grille or portion 630 of the fifth playback device 600 is shown in Figure 7, the locations of the microphones 612a-d are indicated with dots and labelled with the appropriate reference numerals. As in Figures 6A-6C, the fifth playback device 600 is shown in a perspective view such that its front and upper portions 604, 602 are visible.
- the sixth playback device 702 has at least one transducer (not shown).
- the sixth playback device 702 also includes a plurality of microphones, whose positions are indicated with dots and labelled with reference numerals 706a-c (referred to hereafter as “the microphones 706a-c”).
- the sixth playback device 702 comprises three microphones 706a-c that are provided on an upper portion 712 of the device 702 and are arranged in a microphone array 708.
- the microphone array 708 is a symmetric microphone array, and is symmetric about a center line 710 of the sixth playback device 702.
- the sixth playback device 702 is depicted so that a rear portion 716 and the upper portion 712 are visible.
- a front portion of the sixth playback device 702 is not visible, as it is arranged to face the fifth playback device 600, so that audio output from the transducers of the sixth playback device 702 is directed substantially towards the fifth playback device 600.
- the fifth and sixth playback devices 600, 702 are positioned in a listening environment relative to a listener 714.
- the fifth playback device 600 and the sixth playback device 702 are oriented towards the listener 714.
- a distance between the fifth and sixth playback device 600, 702 may be determined or known and the devices 600, 702 may be time-synchronized.
- the devices may be time-synchronized by sharing a synchronized clock system or by being part of a time-synchronized network. Timing synchronization may enable a more accurate determination of measurements of signals received by one or both devices, because a shared time reference is utilized. Details regarding synchronization among playback devices can be found, for example, in U.S. Patent No. 8,234,395 entitled, “System and method for synchronizing operations among a plurality of independently clocked digital data processing devices,” which is incorporated herein by reference in its entirety.
- a distance d7 between the fifth and sixth playback devices 600, 702 may be determined using audio transmission between the devices, using another short-range distance determination method, or by receiving input from a user indicating said distance.
- the distance d7 may be a distance between the centers of each device or another arbitrary point, and a plurality of distances may subsequently be determined between respective pairs of microphones of the devices using the distance d7 and known parameters of the playback devices 600, 702, such as the known distances d1-d3 between the microphones 612a-d.
- the use of audio transmission to determine a distance between devices may be more accurate than in other systems because of the presence of the asymmetric microphone array at the fifth playback device 600.
- the sixth playback device 702 may be configured to output one or more signals via its at least one transducer.
- the signal(s) which may, for example, comprise a sweep signal or a series of tones, may be received by the microphones 612a-d of the microphone array 610.
- the signal(s) received at the microphone array 610 may be processed to determine the distance d7 between the fifth and sixth playback devices 600, 702.
- the fifth playback device 600 may also be configured to output one or more signals via its transducers 614 for receipt by the sixth playback device 702 via its symmetrical microphone array 708.
- the sixth playback device 702 may receive the signal(s) and processing may be performed to determine the distance d7 between the two devices 600, 702. Because the devices 600, 702 are time-synchronized, the precise time at which the signal is output and received may be determined and compared to determine said distance d7.
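With a shared time reference, one-way time-of-flight ranging reduces to a single subtraction. The sketch below is illustrative; the timestamps and function name are assumptions, not values from the document:

```python
# One-way time-of-flight ranging between two time-synchronized
# playback devices: d = c * (t_receive - t_emit).
SPEED_OF_SOUND = 343.0  # m/s

def distance_from_timestamps(t_emit_s: float, t_receive_s: float) -> float:
    """Distance (m) between emitter and receiver when both record the
    signal's emit/receive times against a common synchronized clock."""
    return SPEED_OF_SOUND * (t_receive_s - t_emit_s)

# A sweep emitted at t = 1.000000 s and detected 8.746 ms later:
print(round(distance_from_timestamps(1.000000, 1.008746), 3))  # 3.0
```

Without synchronization the clock offset between the devices adds directly to the measured delay, which is why the shared time reference discussed above is a prerequisite for one-way (rather than round-trip) ranging.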
- the processing described herein may be performed at one or both playback devices 600, 702, at a separate control device, such as control device 130a, or at a remote server or computing device. Distances may be determined based on ultrasonic signals.
- relative height and angles between the devices 600, 702 may be determined using the asymmetric array.
- the asymmetric microphone array 610 may enable an elevation angle and an azimuthal angle of a sound source relative to the playback device to be determined.
- An elevation angle 722 and/or an azimuthal angle 724 of the sixth playback device 702 relative to the fifth playback device 600 may be determined based on signals output by the sixth playback device 702 and received by the fifth playback device 600. Details regarding spatial mapping of devices, direction and angle determination may be found in U.S. Patent Applications 14/871,494 and 15/229,855, which are incorporated herein above.
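One way such angles might be estimated is from time differences of arrival (TDOA) between microphone pairs under a far-field assumption. The sketch below shows the basic trigonometry; the spacings and delays are illustrative assumptions, not values from the disclosure.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def angle_from_tdoa(delay_s: float, mic_spacing_m: float) -> float:
    """Far-field arrival angle (radians, from broadside) for one mic pair.

    A positive delay means the sound reached the second mic later.
    """
    s = SPEED_OF_SOUND * delay_s / mic_spacing_m
    s = max(-1.0, min(1.0, s))  # clamp numerical noise / aliased delays
    return math.asin(s)

# A horizontally separated pair can give azimuth; a pair with a vertical
# offset (e.g., upward- vs forward-facing sets) can give elevation.
azimuth = angle_from_tdoa(2.0e-4, 0.12)   # assumed 0.2 ms delay, 12 cm spacing
elevation = angle_from_tdoa(5.0e-5, 0.06) # assumed 50 us delay, 6 cm spacing
print(round(math.degrees(azimuth), 1), round(math.degrees(elevation), 1))
```

Pairs at different orientations, as an asymmetric array distributed over multiple faces provides, are what make both angles recoverable from a single device.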
- the playback devices 600, 702 may be used together to provide, e.g., improved determination of a location of the listener 714, improved beamforming for receiving utterances from the listener 714, enhanced noise reduction, and/or improved characterization of the listening environment.
- the playback devices 600, 702 may be used together by using their microphone arrays 610, 708 as a combined array or so that combinations of the microphones 612a-d, 706a-c of the microphone arrays 610, 708 can be used to form further sets of microphones to provide specific functions or benefits.
- such microphone arrays may comprise either a combination of asymmetric and symmetric arrays (as described here in relation to Figure 7) or two asymmetric arrays (as will be described in relation to Figure 8 below).
- the listener 714 may utter one or more voice commands 726.
- the voice command 726 may be received by one or both microphone arrays 610, 708.
- a location of the listener 714 may be determined based on the one or more voice commands 726.
- a distance between the listener 714 and each of the playback devices 600, 702 may be determined based on the voice commands 726 and the distance d7 and/or one or more of the angles 722, 724.
- An angle or direction of the listener 714 relative to one or both playback devices 600, 702 may also be determined based on the voice commands 726. Such determinations may be made using the first set 620 of microphones or the second set 622 of microphones, or a combination of the microphones from the arrays 610, 708 of the devices 600, 702.
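Given the distance d7 between the devices and a range estimate from each device to the listener, a two-dimensional listener position can be recovered by standard trilateration. The following is a minimal sketch; the coordinate convention and example values are assumptions.

```python
import math

def locate_listener_2d(baseline_m: float, r_left: float, r_right: float):
    """2D position of a sound source from its distances to two devices.

    Devices are assumed at (0, 0) and (baseline_m, 0); returns (x, y)
    with y >= 0 (the half-plane in front of the devices).
    """
    x = (r_left**2 - r_right**2 + baseline_m**2) / (2.0 * baseline_m)
    y_sq = r_left**2 - x**2
    if y_sq < 0:
        raise ValueError("ranges inconsistent with baseline")
    return x, math.sqrt(y_sq)

# A listener equidistant (2.5 m) from two devices 2 m apart sits on the
# perpendicular bisector of the baseline.
x, y = locate_listener_2d(2.0, 2.5, 2.5)
```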
- the combination of an asymmetric array 610 with a symmetric microphone array 708 provides a combined array that is also asymmetric.
- the combination overall may be asymmetric although in some examples the sets 620, 622 of microphones forming the array 610 may be used in combination with the entire symmetric array 708 to form a further array or set that may be symmetric. Accordingly, the benefits of symmetric and asymmetric arrays may be enhanced by using a combination of a symmetric and asymmetric array.
- Figure 7 shows a playback system 700 having a device 600 with an asymmetric microphone array 610 and a device 702 with a symmetric microphone array 708. In some examples, two asymmetric microphone arrays may be provided at two separate devices. A combination of two asymmetric microphone arrays may provide further improvements when they are used in combination.
- An example of a media playback system 800 including two asymmetric microphone arrays is shown in Figure 8.
- the second media playback system 800 includes two fifth playback devices 600L, 600R.
- the media playback system 800 may include further playback devices, although for clarity these are not shown in Figure 8.
- the playback devices 600L, 600R are referred to as “the left fifth playback device 600L” and “the right fifth playback device 600R”, as shown in Figure 8.
- features of the left fifth playback device 600L have a reference numeral ending with ‘L’ and features of the right fifth playback device 600R have a reference numeral ending with ‘R’ to enable them to be referred to individually.
- Each of the left and right fifth playback devices 600L, 600R has a plurality of transducers (not shown) and a respective asymmetric microphone array 610L, 610R comprising four microphones 612a-dL, 612a-dR arranged into a first set or subarray 620L, 620R, and a second set or subarray 622L, 622R.
- the outer grilles or portions 630L, 630R of the fifth playback devices 600L, 600R are shown in Figure 8; the locations of the microphones 612a-dL, 612a-dR are indicated with dots and labelled with the appropriate reference numerals.
- the fifth playback devices 600L, 600R are shown in a perspective view such that their front portions 604L, 604R and upper portions 602L, 602R are visible.
- the fifth playback devices 600L, 600R are positioned in a listening environment relative to a listener 814.
- the fifth playback devices 600L, 600R are oriented towards the listener 814.
- Both fifth playback devices 600L, 600R are positioned to be on the same side of the listener 814, although in other examples the devices 600L, 600R may be to opposite sides of the listener 814 or positioned in other ways relative to the listener 814.
- Two of the same fifth playback devices 600L, 600R are depicted in Figure 8, each having an asymmetric microphone array 610L, 610R with the same arrangement.
- the two or more playback devices may be different, or may be the same but have different arrangements of microphones to form asymmetric microphone arrays.
- the arrays lack symmetry with one another.
- the asymmetric microphone arrays may be arranged so that the asymmetric arrays exhibit individual asymmetry but collective symmetry. An example of this may be having two playback devices whose microphone arrays are mirror images of each other.
- the two playback devices 600L, 600R in the second media playback system 800 of Figure 8 may be time-synchronized and a distance between them may be known or may have been determined.
- the media playback system 800 may be configured to determine distances and/or angles between the devices 600L, 600R. Distances and/or angles between the devices and a listener 814 may also be determined.
- each individual playback device 600L, 600R comprises a first set 620L, 620R and a second set 622L, 622R of microphones
- both microphone arrays 610L, 610R may be combined into one combined microphone array or subdivided into further sets.
- a further set may be generated by combining the two first sets 620L, 620R, which may be symmetric about a position between the playback devices.
- This further set may be subdivided into two further subsets comprising the outermost microphones 612aL, 612dR, which are separated by a distance d8, and two inner microphones 612dL, 612aR, which are separated by a distance d9.
- other combinations of microphones may be determined along with the distances therebetween, such as the distance d10 between the fourth microphone 612dL of the left fifth playback device 600L and the second microphone 612bR of the right fifth playback device 600R.
- the distances d8, d9, d10 or any other distance between any two microphones of the playback devices 600L, 600R may be determined based on determining a distance and angle between the devices and known distances of the individual microphones along a length of the device.
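A sketch of that computation: given each device's position and heading and each microphone's known signed offset along the device's length, pairwise distances such as d8 and d9 follow from coordinate geometry. All numeric values here are illustrative assumptions.

```python
import math

def mic_world_position(device_xy, heading_rad, mic_offset_m):
    """Position of a microphone, given its device's center position, the
    device's heading, and the mic's known signed offset along the device."""
    dx = mic_offset_m * math.cos(heading_rad)
    dy = mic_offset_m * math.sin(heading_rad)
    return (device_xy[0] + dx, device_xy[1] + dy)

def mic_pair_distance(dev_a, head_a, off_a, dev_b, head_b, off_b):
    """Distance between one mic on each of two devices."""
    ax, ay = mic_world_position(dev_a, head_a, off_a)
    bx, by = mic_world_position(dev_b, head_b, off_b)
    return math.hypot(bx - ax, by - ay)

# Two devices 2 m apart along the x axis, both with heading 0:
# outermost mics (assumed offsets -0.15 m and +0.15 m) give the larger d8,
d8 = mic_pair_distance((0, 0), 0.0, -0.15, (2, 0), 0.0, +0.15)
# while the inner mics give the smaller d9.
d9 = mic_pair_distance((0, 0), 0.0, +0.15, (2, 0), 0.0, -0.15)
```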
- Providing two playback devices 600L, 600R having asymmetric microphone arrays may therefore provide combinations of microphones that have even greater distances therebetween, allowing even greater height estimation or angular resolution, as well as other benefits.
- a third media playback system 900 including at least three playback devices, two of which have an asymmetric microphone array and one of which has a symmetric microphone array, may provide the benefits associated with both the first and second media playback systems 700, 800.
- the third media playback system 900 in this example includes two fifth playback devices 600L, 600R as shown in Figure 8 and the sixth playback device 702 shown in Figure 7.
- combinations of three or more playback devices may enable further techniques to be implemented.
- the position of each of the playback devices 600L, 600R, 702 may be determined, and a listening area 902 therebetween may be defined.
- the listening environment in the listening area 902 may be more precisely defined and characterized by combining the effects achieved with the asymmetric and symmetric arrays individually, such that audio output by the playback devices 600L, 600R, 702 can be tuned or tailored to the listening environment or area 902.
- Sound sources, such as voice commands from a listener 904 within the listening area 902, may be triangulated within the listening area.
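Triangulation from two or more devices can be sketched as intersecting bearing rays. The function below, with an assumed 2D world frame and illustrative values, is a sketch only.

```python
import math

def triangulate_from_bearings(p1, theta1, p2, theta2):
    """Intersect two bearing rays (angles in radians, world frame) from
    device positions p1 and p2; returns the source position (x, y)."""
    # Ray i: p_i + t_i * (cos theta_i, sin theta_i). Solve the 2x2 system
    # t1 * d1 - t2 * d2 = p2 - p1 by Cramer's rule.
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-9:
        raise ValueError("bearings are parallel; no unique intersection")
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (rx * d2[1] - ry * d2[0]) / denom
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])

# Devices at (0, 0) and (4, 0) both hearing the source at 45 and 135
# degrees respectively locate it at (2, 2).
src = triangulate_from_bearings((0, 0), math.pi / 4, (4, 0), 3 * math.pi / 4)
```

A third device, as in system 900, would over-determine the position and allow the estimates to be cross-checked or averaged.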
- Figure 10 illustrates a fourth media playback system 1000 that may be used to achieve this, through the use of a portable playback device 1002.
- the fourth media playback system includes a fifth playback device 600 and the portable playback device 1002.
- the portable playback device 1002 comprises at least one transducer, not visible in Figure 10.
- the portable playback device 1002 may also include one or more microphones. The microphones may be arranged in a symmetric or asymmetric array or as an individual microphone.
- the portable playback device 1002 and the fifth playback device 600 may be time-synchronized.
- the devices 1002, 600 may be time-synchronized using time-synchronization software. Time synchronization may be useful in ensuring precise and accurate determination of times of arrival of different reflections and audio signals and/or when using beam forming techniques for sets of microphones across more than one device. Details regarding synchronization among playback devices can be found, for example, in U.S. Patent No. 8,234,395 entitled, “System and method for synchronizing operations among a plurality of independently clocked digital data processing devices,” which is incorporated herein above.
- the portable playback device 1002 may be configured to output one or more audio signals via its at least one transducer from a position within the listening environment.
- the one or more audio signals may be received at the asymmetric microphone array 610 of the fifth playback device 600.
- processing may be performed to determine a position of the portable playback device 1002 within the listening environment and/or to characterize the listening environment.
- a listener 1014 may then be prompted to move the portable playback device to a different location within the listening environment and the process of outputting, receiving, and processing audio signals may be performed again.
- the listener 1014 may be prompted to move the portable playback device 1002 around the listening environment while it is outputting the one or more audio signals.
- the asymmetric microphone array is incorporated into a portable playback device.
- One or more other playback devices of a media playback system may be configured to output audio signals for receipt by the asymmetric microphone array as it is moved around the room.
- the features of the above embodiments can also be combined, for example the time synchronization between devices discussed above for Figure 10 can be used in any of the embodiments discussed herein.
- references herein to “example” means that a particular feature, structure, or characteristic described in connection with the example can be included in at least one example of an invention.
- the appearances of this phrase in various places in the specification are not necessarily all referring to the same example, nor are separate or alternative examples mutually exclusive of other examples.
- the examples described herein, explicitly and implicitly understood by one skilled in the art, can be combined with other examples.
- At least one of the elements in at least one example is hereby expressly defined to include a tangible, non-transitory medium such as a memory, DVD, CD, Blu-ray, and so on, storing the software and/or firmware.
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
There is provided a playback device comprising at least one audio transducer and an asymmetrical microphone array comprising at least four microphones. A network microphone device is also provided, comprising an asymmetrical microphone array comprising at least four microphones. The asymmetrical microphone arrays may include a first set of two or more microphones, wherein the microphones of the first set are separated by a first spacing and a second set of two or more microphones, wherein the microphones of the second set are separated by a second spacing that is smaller than the first spacing.
Description
ASYMMETRICAL MICROPHONE ARRAYS
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims the benefit of priority to U.S. Patent Application No. 63/517,820 filed on 4 August 2023, which is incorporated herein by reference in its entirety.
FIELD OF THE DISCLOSURE
[0002] The present disclosure is related to consumer goods and, more particularly, to methods, systems, products, features, services, and other elements directed to media playback or some aspect thereof.
BACKGROUND
[0003] Options for accessing and listening to digital audio in an out-loud setting were limited until 2002, when SONOS, Inc. began development of a new type of playback system. Sonos then filed one of its first patent applications in 2003, entitled “Method for Synchronizing Audio Playback between Multiple Networked Devices,” and began offering its first media playback systems for sale in 2005. The Sonos Wireless Home Sound System enables people to experience music from many sources via one or more networked playback devices. Through a software control application installed on a controller (e.g., smartphone, tablet, computer, voice input device), one can play what she wants in any room having a networked playback device. Media content (e.g., songs, podcasts, video sound) can be streamed to playback devices such that each room with a playback device can play back corresponding different media content. In addition, rooms can be grouped together for synchronous playback of the same media content, and/or the same media content can be heard in all rooms synchronously.
BRIEF DESCRIPTION OF THE DRAWINGS
[0004] Features, examples, and advantages of the presently disclosed technology may be better understood with regard to the following description, appended claims, and accompanying drawings, as listed below. A person skilled in the relevant art will understand that the features shown in the drawings are for purposes of illustration, and variations, including different and/or additional features and arrangements thereof, are possible.
[0005] Figure 1A is a partial cutaway view of an environment having a media playback system configured in accordance with examples of the present technology.
[0006] Figure IB is a schematic diagram of the media playback system of Figure 1A and one or more networks.
[0007] Figure 1C is a block diagram of a playback device.
[0008] Figure ID is a block diagram of a playback device.
[0009] Figure IE is a block diagram of a network microphone device.
[0010] Figure IF is a block diagram of a network microphone device.
[0011] Figure 1G is a block diagram of a playback device.
[0012] Figure 1H is a partially schematic diagram of a control device.
[0013] Figures 2A, 2B, and 2C show perspective, top (plan), and front views of a first playback device including an asymmetrical microphone array according to the invention.
[0014] Figure 3 shows a top, plan view of a second playback device including an asymmetrical microphone array according to the invention.
[0015] Figure 4 shows a top, plan view of a third playback device including an asymmetrical microphone array according to the invention.
[0016] Figure 5 shows a top, plan view of a fourth playback device including an asymmetrical microphone array according to the invention.
[0017] Figures 6A, 6B, and 6C show perspective, top, and front views of a fifth playback device including an asymmetrical microphone array according to the invention.
[0018] Figure 7 is a schematic diagram of a first media playback system comprising a playback device having an asymmetric microphone array according to the invention;
[0019] Figure 8 is a schematic diagram of a second media playback system comprising a plurality of playback devices having asymmetric microphone arrays according to the invention;
[0020] Figure 9 is a schematic diagram of a third media playback system comprising a plurality of playback devices having asymmetric microphone arrays according to the invention;
[0021] Figure 10 is a schematic diagram of a fourth media playback system comprising a playback device having an asymmetric microphone array according to the invention;
[0022] The drawings are for the purpose of illustrating examples, but those of ordinary skill in the art will understand that the technology disclosed herein is not limited to the arrangements and/or instrumentality shown in the drawings.
DETAILED DESCRIPTION
I. Overview
[0023] In addition to reproducing audio for a listener to enjoy, modern playback devices and media playback systems may also be capable of performing other functions, such as receiving and taking actions based on voice commands, communicating information between devices using audio, or characterizing a listening environment using audio. These additional actions
may involve receiving audio signals at one or more microphones at the playback device or another network microphone device (NMD), such as a smart home device, either directly from a source or via reflections from the listening environment.
[0024] Embodiments of the present invention seek to improve performance of microphone arrays for playback devices, NMDs, and media playback systems. Performance may be improved by enabling determination of a wider array of angles and directions of audio signals, by enabling more efficient and accurate noise reduction in audio signals, by allowing detection of more direct audio signals, by improving disambiguation between audio signals received from the front and rear, and/or by reducing self-sound. Self-sound is sound that is output by one or more transducers of a playback device and that may be received by one or more microphones of the same playback device, where it acts as noise to mask or otherwise obscure other parts of the received audio signal. Self-sound is commonly, but not always, caused by low-frequency sound. Low-frequency sound may be more likely to reach a microphone of a playback device because it is less directional than high-frequency sound. Sound becomes less directional as the wavelength of the sound increases relative to the size of the sound source outputting the sound, and, because wavelength increases as frequency decreases, low-frequency sound is generally less directional than high-frequency sound.
[0025] According to a first aspect of the invention, there is provided a playback device, which comprises at least one audio transducer, such as an audio playback transducer, and an asymmetrical microphone array comprising at least four microphones. Asymmetry in a microphone array comprising at least four microphones may offer variation in the type and variety of processing of received audio signals that can be achieved. Signals may be analyzed in different ways because of the different relative positions of different groupings of microphones, when compared to, e.g., a symmetrical microphone array. An asymmetrical array may allow several different groups of microphones having different sized apertures (aperture being the distance between outermost microphones of the grouping), which may enable different techniques to be implemented to analyze signals. For example, large-aperture groupings may be useful for achieving high angular resolution and range estimation, while small-aperture groupings may enable certain beamforming techniques.
[0026] The microphone array may be arranged asymmetrically in different ways. Asymmetry may be defined about a center point of the playback device; the microphone array may be asymmetric about the center point of the playback device. The “center point” of the playback device may be determined as a center of the playback device along its length, longitudinally,
or along a main or longitudinal axis, or may be a point of rotational symmetry of the device, or a centroid of the device. The center point may be defined by or on a line or axis of the playback device, or a plane passing through the playback device. The playback device may have mirror symmetry on either side of the plane, line, or axis.
[0027] In some examples, to achieve the asymmetry, a majority of the at least four microphones of the microphone array may be to one side of the center point. The “majority” means more than half of the microphones of the array. For example, for an array including four microphones, three of the microphones may be to one side of the center point, and the other microphone may be to the other side. To be to one side of the center point, a microphone may be physically separate from the center point such that there is a spacing between the center point and the microphone, or alternatively, a main axis of the microphone may be spaced apart from the center point. In other examples, however, a majority of the microphones may be aligned with the center point and one or more additional microphones may be positioned asymmetrically on the device.
[0028] The asymmetrical microphone array may be divided into different sets of microphones, and the asymmetry may be achieved based on differences between these sets of microphones. The array may have a first set of two or more microphones and a second set of two or more microphones. The sets may be distinct, meaning that there are no microphones in more than one set, or that each microphone is in a single set. The number of microphones in each set may be the same, or may differ.
[0029] To achieve asymmetry, the first set may be symmetrically arranged, while the second set may be asymmetrically arranged. The first set may be symmetrical about a center point of the array, a center point of the playback device, or another point, while the second set may be asymmetrical about the same center point. The second set of two or more microphones may be asymmetrical about a center point by at least one microphone of the second set being a different distance from that point than at least one other microphone of the second set, or by a majority or all of the second set of two or more microphones being to one side of a center point.
[0030] Sets of microphones may be defined based on a spacing between adjacent microphones. Different sets of microphones may have different spacings between microphones. An array may include a first set of two or more microphones, whose microphones are separated by a first spacing, and a second set of two or more microphones, whose microphones are separated by a second spacing that is different to the first spacing. The second spacing may be smaller than the first spacing, or may be greater than the first spacing. The first
spacing may be at least two times, three times, four times, or five times greater than the second spacing. Alternatively, the second spacing may be at least two times, three times, four times, or five times greater than the first spacing. A difference in spacing between sets may provide each of the sets with different properties and allow different capabilities with each. For example, a set having a wider spacing may provide useful range estimation characteristics as well as improving angular resolution of incident signals. A set having a smaller spacing may enable some beamforming techniques to be implemented.
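As a rough illustration of why a smaller spacing suits beamforming: a microphone pair spaced a distance d apart avoids spatial aliasing only up to the frequency whose half-wavelength equals d. The spacings below are assumptions for illustration, not values from the disclosure.

```python
# Half-wavelength rule of thumb for a mic pair: f_max = c / (2 * d).
# Wider pairs alias at lower frequencies but resolve angles more finely.
SPEED_OF_SOUND = 343.0  # m/s, assumed

def max_unaliased_frequency_hz(spacing_m: float) -> float:
    """Highest frequency the pair can beamform without spatial aliasing."""
    return SPEED_OF_SOUND / (2.0 * spacing_m)

wide = max_unaliased_frequency_hz(0.12)    # assumed wide (first) spacing
narrow = max_unaliased_frequency_hz(0.02)  # assumed narrow (second) spacing
```

This trade-off is one way to read the text above: the widely spaced set favors angular resolution and range estimation, while the closely spaced set remains alias-free across more of the voice band and so suits beamforming.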
[0031] Sets of microphones may additionally, or alternatively, be defined based on an aperture. The sets may form sub-arrays having different apertures, wherein an aperture of the first set is smaller than or greater than an aperture of the second set of microphones.
[0032] Although it is indicated above that the sets of microphones may be distinct, the microphones of one set may be positioned between the microphones of a different set. For example, the second set of two or more microphones described above may be positioned between at least two microphones of the first set of two or more microphones. This may be described as the second set of microphones being within the aperture of the first set of microphones. The microphones of the second set may be on an axis connecting the microphones of the first set, or may be off-axis, as will be described below. Accordingly, the microphones of the second set being between the microphones of the first set may be described in terms of longitudinal position along the playback device, such that the second set of microphones is longitudinally between the microphones of the first set, or in terms of one or more planes defined by the first set, such that the second set of microphones is positioned between two planes defined by the first set of microphones or that the second set of microphones is within a plane bounded by the microphones of the first set. In some examples, due to the asymmetry, the second set of microphones may be positioned between a microphone of the first set of microphones and a center point of the playback device.
[0033] Individual microphones in the first set may be separated from the second set by different spacings. Specifically, a first microphone and a second microphone of the first set, separated by the first spacing, may be different distances from the second set of microphones. The first microphone may be separated from the second set by a third spacing, and the second microphone may be separated from the second set by a fourth spacing, which is different to the third spacing. The distances or spacings between individual microphones of the first set and the second set may be measured between the microphone of the first set and a nearest microphone of the second set or between the microphone of the first set and a center point of
the second set. The spacing between microphones of the first set and microphones of the second set may be greater than the spacing between the microphones of the second set; in the above example, the third and fourth spacings may be greater than the second spacing.
[0034] The placing of microphones on the playback device may be varied. This may improve the properties of the array and/or enhance the effects achieved by introducing asymmetry in the microphone array. Microphones of the microphone array may be distributed over at least two different faces of the playback device. The microphones may be distributed over a curved face or surface of the playback device. Distributing the microphones over different faces or over a curved surface may result in some of the microphones being directed in a first direction and others in a second direction, different to the first. The different directions may be nonparallel, and an angle between, e.g., a vertical axis, and each direction may differ by between 20 and 90 degrees, for example. The microphones may be considered to be in different planes or to have different orientations. Over a curved surface, the microphones may be at different positions on the curved surface having non-parallel tangent planes, relative to the curved surface. A microphone at a face may be within an opening, aperture, or through-hole in that face, and may be oriented so as to receive sound that is incident on the face. As those of ordinary skill in the art will appreciate, the term “face” may refer to a flat surface and/or a non-flat surface (e.g., a curved surface), that may be joined to one or more other faces along edges. A face may define a plane, such that different faces define different planes that are non-parallel.
[0035] Considering a microphone array having first and second subarrays or sets of microphones, the first set may be placed on a different face or at a different angle to the second set. The first set may be oriented in a first direction, while the second set may be oriented in a second direction. The first set may be configured to be upward-facing during use, while the second set may be configured to face sideways or to be forward-facing in use. The first set may be provided on a first face, which may be a face of the playback device configured to face upwards, in use, so that the microphones are also upward-facing, in use. The second set may be provided on a second face, which may be a face of the playback device configured to face forwards, in use, so that the microphones are also forward-facing, in use. A forward-facing face of the playback device may include one or more horizontal-facing transducers and may face towards a listener location, in use. A forward-facing face of the playback device may be opposite a rearward-facing face. A rearward-facing face faces a wall or away from a listener, in use, and may include one or more ports for receiving cables such as a power cable, a USB cable, an HDMI cable, a networking cable (such as Ethernet), a combined power and networking/data cable (such as Power-over-Ethernet (PoE)), and/or another suitable cable(s) (e.g., one or more digital and/or analog audio and/or video cables). An upward-facing face of the playback device may include one or more upward-facing transducers, and may include one or more buttons or controls for providing inputs to the device. An upward-facing face may be a face opposite a base of the playback device, on which the playback device stands or by which the playback device is mounted, in use.
[0036] In some examples, a playback device includes a first subarray or set of microphones having a first spacing and a second subarray or set of microphones having a second spacing that is different (e.g., smaller) than the first spacing; the first set may be upward-facing, in use, and the second set may be forward-facing, in use. This may provide further benefits. The second set of microphones, whose smaller spacing may allow for beamforming techniques, may receive more direct sound from a source, e.g., a listener speaking a voice command, meaning that the source can be located better using the second set of microphones than would be possible otherwise. The first set of microphones, with its larger spacing, may be better used for improved angular resolution and range estimation, making it more useful for audio signals reflected from above the device, e.g., off a ceiling. Together, the two sets of microphones arranged in this way, with their specific spacings, may also provide improved elevation estimation, enabling not only a distance of the audio source from the playback device and an azimuthal angle of the audio source relative to the playback device to be determined, but also a height of the audio source relative to the playback device. This may be particularly useful when characterizing a listening environment using audio exchanged between different devices, because the relative positions of the devices in the environment may be more accurately determined.
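Once range, elevation, and azimuth are all available, the source position relative to the device follows from a spherical-to-Cartesian conversion, sketched below with an assumed coordinate convention (x forward, y left, z up).

```python
import math

def source_position(range_m: float, elevation_rad: float, azimuth_rad: float):
    """Cartesian position of an audio source relative to the device, given
    its range, elevation (up from horizontal), and azimuth (about the
    vertical axis). Coordinate convention is an assumption."""
    horiz = range_m * math.cos(elevation_rad)  # projection onto the floor plane
    return (horiz * math.cos(azimuth_rad),
            horiz * math.sin(azimuth_rad),
            range_m * math.sin(elevation_rad))

# e.g., a source 2 m away, 30 degrees up, straight ahead sits 1 m above
# the device's horizontal plane.
pos = source_position(2.0, math.radians(30), 0.0)
```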
[0037] The positions of the microphones may be based on the positions of different audio transducers of the playback device. Particularly where it is desirable to reduce self-sound, it is useful to consider placement of the microphones relative to transducers. Microphones may be placed on parts of the playback device such that a majority of the microphones are at least a predetermined distance from a transducer. Doing so reduces the amount of self-sound picked up by those microphones. Positioning microphones in this way relative to a transducer may result in the asymmetry described above.
[0038] Self-sound may be a particular issue for low-frequency sound. Accordingly, some microphones may be placed to be at least a predetermined distance from a low-frequency or bass transducer, such as a transducer whose output includes frequencies below 200 Hz, below
300 Hz, below 400 Hz, below 500 Hz, below 1 kHz, or below 1.5 kHz. In some examples, some microphones may be placed at a predetermined distance or further from any transducer of the playback device. Self-sound may also be reduced by spreading the microphones across different faces of the playback device. Reducing self-sound may provide benefits for noise reduction techniques. Specifically, at least some of the microphones may have a lower noise content due to their lower self-sound and their different positions, and this lower noise content can be used to identify and reduce the noise content of other microphones. The predetermined distance may be at least a second spacing between the microphones of the second set, so that a microphone of the second set is closer to another microphone of the second set than it is to a transducer. Although a low-frequency or bass transducer is referred to here, the microphones may additionally or alternatively be placed a predetermined distance from a passive radiator or a port of the playback device.
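The noise-reduction idea described above, in which microphones with lower self-sound serve as a noise reference for noisier microphones, can be sketched roughly as frame-by-frame spectral subtraction. This is a simplified stand-in: the function name and parameters are hypothetical, and a practical system would use adaptive filtering and per-channel calibration rather than direct magnitude subtraction.

```python
import numpy as np

def suppress_with_reference(noisy, reference, frame=512):
    """Illustrative spectral subtraction: treat a low-self-sound
    microphone's spectrum as a noise estimate and subtract it, per
    frame, from the magnitude spectrum of a noisier microphone,
    keeping the noisy signal's phase."""
    out = np.zeros_like(noisy)
    for start in range(0, len(noisy) - frame + 1, frame):
        n_spec = np.fft.rfft(noisy[start:start + frame])
        r_mag = np.abs(np.fft.rfft(reference[start:start + frame]))
        # Subtract the reference magnitude, flooring at 10% of the
        # original magnitude to avoid negative (unphysical) values.
        mag = np.maximum(np.abs(n_spec) - r_mag, 0.1 * np.abs(n_spec))
        out[start:start + frame] = np.fft.irfft(mag * np.exp(1j * np.angle(n_spec)))
    return out
```

In this sketch the reference channel is assumed to contain mostly noise; the closer that assumption holds (i.e., the lower the reference microphone's self-sound and other noise pickup), the better the subtraction works.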
[0039] In some examples, at least some of the microphones of the array may be provided on the same face as at least one transducer of the playback device, and other microphones may be provided on a different face. Where a first set of two or more microphones and a second set of two or more microphones are included in the array, the second set may be provided on a same face of the playback device as the transducer, while the first set may be on a different face. The second set of microphones may be positioned opposite the transducer about the center point of the playback device. The second set of microphones may be symmetrically arranged with the transducer about the center point of the playback device.
[0040] The playback device may take various different forms or have various different functions. In some examples, the playback device may be a soundbar, or another elongated playback device such as a floorstanding or tower speaker. The playback device may be configured to output home theater audio. In some examples, the playback device may be a portable playback device.
[0041] The playback device may form part of a media playback system, which may also include other playback devices, one or more amplifiers, one or more control devices, one or more network microphone devices, or a television or other visual equipment.
[0042] According to a further aspect, there is provided an NMD comprising an asymmetrical microphone array comprising at least four microphones. The asymmetrical microphone array may have some or all of the features of an asymmetrical microphone array as defined above.
[0043] According to a further aspect, there is provided a media playback system comprising a first playback device or a first network microphone device including an asymmetrical
microphone array having at least four microphones. The media playback system may comprise a plurality of playback devices and/or network microphone devices. The media playback system may comprise a second playback device or a second network microphone device having an asymmetrical microphone array. The media playback system may comprise a second playback device or a second network microphone device having a symmetrical microphone array. The second playback device or second network microphone device may be a portable device.
[0044] Many of the details, dimensions, angles and other features shown in the Figures are merely illustrative of particular examples of the disclosed technology. Accordingly, other examples can have other details, dimensions, angles and features without departing from the spirit or scope of the disclosure. In addition, those of ordinary skill in the art will appreciate that further examples of the various disclosed technologies can be practiced without several of the details described below.
[0045] These and other features described herein improve upon earlier-developed systems and methods including, for example, the systems and methods disclosed and described in the following earlier-filed patent applications assigned to Sonos, Inc.
[0046] U.S. App. 14/871,494 titled “Spatial Mapping of Audio Playback Devices in a Listening Environment,” filed on September 30, 2015 and issued on April 17, 2018, as U.S. Pat. 9,949,054 (“Kadri ‘054”) discloses, among other features, analyzing a characteristic of a received audio signal from each of a plurality of microphones to determine an angular orientation of two playback devices relative to each other.
[0047] U.S. App. 15/229,855 titled “Determining Direction of Networked Microphone Device Relative to Audio Playback Device,” filed on August 5, 2016 and issued on June 27, 2017, as U.S. Pat. 9,963,164 (“Kadri ‘164”) discloses, among other features, determining a position of a network microphone device relative to two audio drivers using audio output by the audio drivers and recorded by the network microphone device. In embodiments, Kadri ‘054 discloses a network microphone device having a microphone array having a plurality of microphones. In some examples, a first subset of the microphone array may be sensitive to a first frequency range and a second subset of the microphone array may be sensitive to a second frequency range.
[0048] U.S. App. 15/282,554 titled “Multi-Orientation Playback Device Microphones,” filed on September 30, 2016 and issued on August 22, 2017, as U.S. Pat. 9,743,204 (“Welch ‘204”) discloses, among other features, a playback device that may have one or more microphone
arrays installed or mounted on a housing or body of the playback device. The microphone arrays have a circular shape and the individual microphones are distributed around a circumference of the microphone array.
[0049] U.S. App. 15/984,073 titled “Linear Filtering for Noise-Suppressed Speech Detection,” filed on May 18, 2018 and issued on November 24, 2020, as U.S. Pat. 10,847,178 (“Sereshki ‘178”) discloses, among other features, a network device having microphones arranged in a disarrayed fashion, where an arrangement is referred to as “disarrayed” because the microphones are too spread out from one another to perform beamforming, or at least too spread out from one another to perform beamforming effectively. Another embodiment of Sereshki ‘178 discloses microphones arranged over multiple network devices. The network devices employ multi-microphone noise suppression techniques that do not necessarily rely on the geometrical arrangement of the microphones.
[0050] However, none of the aforementioned earlier-filed applications, individually or in combination, disclose the particular combinations of features and functions shown and described herein that relate to a playback device or a network microphone device having an asymmetrical array comprising at least four microphones, or to a media playback system including such a device.
[0051] The entire contents of U.S. Apps 14/871,494; 15/229,855; 15/282,554; and 15/984,073 are incorporated herein by reference.
II. Suitable Operating Environment
[0052] Figure 1A is a partial cutaway view of a media playback system 100 distributed in an environment 101 (e.g., a house). The media playback system 100 comprises one or more playback devices 110 (identified individually as playback devices 110a-n), one or more network microphone devices (“NMDs”) 120 (identified individually as NMDs 120a-c), and one or more control devices 130 (identified individually as control devices 130a and 130b).
[0053] As used herein, the term “playback device” can generally refer to a network device configured to receive, process, and output data of a media playback system. For example, a playback device can be a network device that receives and processes audio content. In some examples, a playback device includes one or more transducers or speakers powered by one or more amplifiers. In other examples, however, a playback device includes one of (or neither of) the speaker and the amplifier. For instance, a playback device can comprise one or more amplifiers configured to drive one or more speakers external to the playback device via a corresponding wire or cable.
[0054] Moreover, as used herein the term NMD (i.e., a “network microphone device”) can generally refer to a network device that is configured for audio detection. In some examples, an NMD is a stand-alone device configured primarily for audio detection. In other examples, an NMD is incorporated into a playback device (or vice versa).
[0055] The term “control device” can generally refer to a network device configured to perform functions relevant to facilitating user access, control, and/or configuration of the media playback system 100.
[0056] Each of the playback devices 110 is configured to receive audio signals or data from one or more media sources (e.g., one or more remote servers, one or more local devices) and play back the received audio signals or data as sound. The one or more NMDs 120 are configured to receive spoken word commands, and the one or more control devices 130 are configured to receive user input. In response to the received spoken word commands and/or user input, the media playback system 100 can play back audio via one or more of the playback devices 110. In certain examples, the playback devices 110 are configured to commence playback of media content in response to a trigger. For instance, one or more of the playback devices 110 can be configured to play back a morning playlist upon detection of an associated trigger condition (e.g., presence of a user in a kitchen, detection of a coffee machine operation). In some examples, for instance, the media playback system 100 is configured to play back audio from a first playback device (e.g., the playback device 110a) in synchrony with a second playback device (e.g., the playback device 110b). Interactions between the playback devices 110, NMDs 120, and/or control devices 130 of the media playback system 100 configured in accordance with the various examples of the disclosure are described in greater detail below.
[0057] In the illustrated example of Figure 1A, the environment 101 comprises a household having several rooms, spaces, and/or playback zones, including (clockwise from upper left) a master bathroom 101a, a master bedroom 101b, a second bedroom 101c, a family room or den 101d, an office 101e, a living room 101f, a dining room 101g, a kitchen 101h, and an outdoor patio 101i. While certain examples are described below in the context of a home environment, the technologies described herein may be implemented in other types of environments. In some examples, for instance, the media playback system 100 can be implemented in one or more commercial settings (e.g., a restaurant, mall, airport, hotel, a retail or other store), one or more vehicles (e.g., a sports utility vehicle, bus, car, a ship, a boat, an airplane), multiple environments (e.g., a combination of home and vehicle environments), and/or another suitable environment where multi-zone audio may be desirable. Each room,
space, or playback zone other than the patio 101i is bounded by a ceiling. Ceiling characteristics may differ between rooms, spaces, or playback zones.
[0058] The media playback system 100 can comprise one or more playback zones, some of which may correspond to the rooms in the environment 101. Each of the playback zones and/or the individual rooms may be referred to as a listening environment. The media playback system 100 can be established with one or more playback zones, after which additional zones may be added or removed to form, for example, the configuration shown in Figure 1A. Each zone may be given a name according to a different room or space such as the office 101e, master bathroom 101a, master bedroom 101b, the second bedroom 101c, kitchen 101h, dining room 101g, living room 101f, and/or the balcony 101i. In some examples, a single playback zone may include multiple rooms or spaces. In certain examples, a single room or space may include multiple playback zones.
[0059] In the illustrated example of Figure 1A, the master bathroom 101a, the second bedroom 101c, the office 101e, the living room 101f, the dining room 101g, the kitchen 101h, and the outdoor patio 101i each include one playback device 110, and the master bedroom 101b and the den 101d include a plurality of playback devices 110. In the master bedroom 101b, the playback devices 110l and 110m may be configured, for example, to play back audio content in synchrony as individual ones of playback devices 110, as a bonded playback zone, as a consolidated playback device, and/or any combination thereof. Similarly, in the den 101d, the playback devices 110h-j can be configured, for instance, to play back audio content in synchrony as individual ones of playback devices 110, as one or more bonded playback devices, and/or as one or more consolidated playback devices. Additional details regarding bonded and consolidated playback devices are described below with respect to Figures 1B and 1E.
[0060] In some examples, one or more of the playback zones in the environment 101 may each be playing different audio content. For instance, a user may be grilling on the patio 101i and listening to hip hop music being played by the playback device 110c while another user is preparing food in the kitchen 101h and listening to classical music played by the playback device 110b. In another example, a playback zone may play the same audio content in synchrony with another playback zone. For instance, the user may be in the office 101e listening to the playback device 110f playing back the same hip hop music being played back by playback device 110c on the patio 101i. In some examples, the playback devices 110c and 110f play back the hip hop music in synchrony such that the user perceives that the audio
content is being played seamlessly (or at least substantially seamlessly) while moving between different playback zones. Additional details regarding audio playback synchronization among playback devices and/or zones can be found, for example, in U.S. Patent No. 8,234,395 entitled, “System and method for synchronizing operations among a plurality of independently clocked digital data processing devices,” which is incorporated by reference above.
a. Suitable Media Playback System
[0061] Figure 1B is a schematic diagram of the media playback system 100 and a cloud network 102. For ease of illustration, certain devices of the media playback system 100 and the cloud network 102 are omitted from Figure 1B. One or more communication links 103 (referred to hereinafter as “the links 103”) communicatively couple the media playback system 100 and the cloud network 102.
[0062] The links 103 can comprise, for example, one or more wired networks, one or more wireless networks, one or more wide area networks (WAN), one or more local area networks (LAN), one or more personal area networks (PAN), one or more telecommunication networks (e.g., one or more Global System for Mobiles (GSM) networks, Code Division Multiple Access (CDMA) networks, Long-Term Evolution (LTE) networks, 5G communication networks, and/or other suitable data transmission protocol networks), etc. The cloud network 102 is configured to deliver media content (e.g., audio content, video content, photographs, social media content) to the media playback system 100 in response to a request transmitted from the media playback system 100 via the links 103. In some examples, the cloud network 102 is further configured to receive data (e.g., voice input data) from the media playback system 100 and correspondingly transmit commands and/or media content to the media playback system 100.
[0063] The cloud network 102 comprises computing devices 106 (identified separately as a first computing device 106a, a second computing device 106b, and a third computing device 106c). The computing devices 106 can comprise individual computers or servers, such as, for example, a media streaming service server storing audio and/or other media content, a voice service server, a social media server, a media playback system control server, etc. In some examples, one or more of the computing devices 106 comprise modules of a single computer or server. In certain examples, one or more of the computing devices 106 comprise one or more modules, computers, and/or servers. Moreover, while the cloud network 102 is described above in the context of a single cloud network, in some examples the cloud network 102 comprises a plurality of cloud networks comprising communicatively coupled computing
devices. Furthermore, while the cloud network 102 is shown in Figure 1B as having three of the computing devices 106, in some examples, the cloud network 102 comprises fewer (or more) than three computing devices 106.
[0064] The media playback system 100 may be configured to receive media content from the cloud network 102 via the links 103. The received media content can comprise, for example, a Uniform Resource Identifier (URI) and/or a Uniform Resource Locator (URL). For instance, in some examples, the media playback system 100 can stream, download, or otherwise obtain data from a URI or a URL corresponding to the received media content. A network 104 communicatively couples the links 103 and at least a portion of the devices (e.g., one or more of the playback devices 110, NMDs 120, and/or control devices 130) of the media playback system 100. The network 104 can include, for example, a wireless network (e.g., a WiFi network, a Bluetooth network, a Z-Wave network, a ZigBee network, and/or another suitable wireless communication protocol network) and/or a wired network (e.g., a network comprising Ethernet, Universal Serial Bus (USB), and/or another suitable wired communication). As those of ordinary skill in the art will appreciate, as used herein, “WiFi” can refer to several different communication protocols including, for example, Institute of Electrical and Electronics Engineers (IEEE) 802.11a, 802.11b, 802.11g, 802.11n, 802.11ac, 802.11ad, 802.11af, 802.11ah, 802.11ai, 802.11aj, 802.11aq, 802.11ax, 802.11ay, 802.15, etc. transmitted at 2.4 Gigahertz (GHz), 5 GHz, and/or another suitable frequency.
[0065] In some examples, the network 104 comprises a dedicated communication network that the media playback system 100 uses to transmit messages between individual devices and/or to transmit media content to and from media content sources (e.g., one or more of the computing devices 106). In certain examples, the network 104 is configured to be accessible only to devices in the media playback system 100, thereby reducing interference and competition with other household devices. In other examples, however, the network 104 comprises an existing household communication network (e.g., a household WiFi network). In some examples, the links 103 and the network 104 comprise one or more of the same networks. In some examples, for instance, the links 103 and the network 104 comprise a telecommunication network (e.g., an LTE network, a 5G network). Moreover, in some examples, the media playback system 100 is implemented without the network 104, and devices comprising the media playback system 100 can communicate with each other, for example, via one or more direct connections, PANs, telecommunication networks, and/or other suitable communication links.
[0066] In some examples, audio content sources may be regularly added or removed from the media playback system 100. In some examples, for instance, the media playback system 100 performs an indexing of media items when one or more media content sources are updated, added to, and/or removed from the media playback system 100. The media playback system 100 can scan identifiable media items in some or all folders and/or directories accessible to the playback devices 110, and generate or update a media content database comprising metadata (e.g., title, artist, album, track length) and other associated information (e.g., URIs, URLs) for each identifiable media item found. In some examples, for instance, the media content database is stored on one or more of the playback devices 110, network microphone devices 120, and/or control devices 130.
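The indexing described above can be sketched as a folder scan that records metadata for each identifiable media item. The helper below is an illustrative assumption, not code from this disclosure; a real implementation would parse title, artist, album, and track length from the files' tags rather than deriving a title from the filename.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".mp3", ".flac", ".wav", ".aac"}

def index_media(root):
    """Walk the folders accessible to a playback device and build a
    simple media content database keyed by URI, with placeholder
    metadata for each identifiable media item found."""
    database = {}
    for path in Path(root).rglob("*"):
        if path.suffix.lower() in AUDIO_EXTENSIONS:
            database[path.as_uri()] = {
                "title": path.stem,  # placeholder: real code reads embedded tags
                "size_bytes": path.stat().st_size,
            }
    return database
```

Re-running such a scan when sources are added or removed, and diffing against the stored database, is one simple way to keep the index current.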
[0067] In the illustrated example of Figure 1B, the playback devices 110l and 110m comprise a group 107a. The playback devices 110l and 110m can be positioned in different rooms in a household and be grouped together in the group 107a on a temporary or permanent basis based on user input received at the control device 130a and/or another control device 130 in the media playback system 100. When arranged in the group 107a, the playback devices 110l and 110m can be configured to play back the same or similar audio content in synchrony from one or more audio content sources. In certain examples, for instance, the group 107a comprises a bonded zone in which the playback devices 110l and 110m comprise left audio and right audio channels, respectively, of multi-channel audio content, thereby producing or enhancing a stereo effect of the audio content. In some examples, the group 107a includes additional playback devices 110. In other examples, however, the media playback system 100 omits the group 107a and/or other grouped arrangements of the playback devices 110.
[0068] The media playback system 100 of Figure 1B includes the NMDs 120a and 120d, each comprising one or more microphones configured to receive voice utterances from a user. In the illustrated example of Figure 1B, the NMD 120a is a standalone device and the NMD 120d is integrated into the playback device 110n. The NMD 120a, for example, is configured to receive voice input 121 from a user 123. In some examples, the NMD 120a transmits data associated with the received voice input 121 to a voice assistant service (VAS) configured to (i) process the received voice input data and (ii) transmit a corresponding command to the media playback system 100. In some examples, for instance, the computing device 106c comprises one or more modules and/or servers of a VAS (e.g., a VAS operated by one or more of SONOS®, AMAZON®, GOOGLE®, APPLE®, MICROSOFT®). The computing device 106c can receive the voice input data from the NMD 120a via the network
104 and the links 103. In response to receiving the voice input data, the computing device 106c processes the voice input data (i.e., “Play Hey Jude by The Beatles”), and determines that the processed voice input includes a command to play a song (e.g., “Hey Jude”). The computing device 106c accordingly transmits commands to the media playback system 100 to play back “Hey Jude” by the Beatles from a suitable media service (e.g., via one or more of the computing devices 106) on one or more of the playback devices 110. Although the media playback system 100 is shown as including a plurality of playback devices 110a-110n, an NMD 120a, a control device 130a, and a network 104, in other examples the media playback system 100 may include one playback device incorporating an upward-firing transducer and one or more microphones, as well as a processor and memory stored at, for example, the playback device, a network microphone device, or a control device.
b. Suitable Playback Devices
[0069] Figure 1C is a block diagram of the playback device 110a comprising an input/output 111. The input/output 111 can include an analog I/O 111a (e.g., one or more wires, cables, and/or other suitable communication links configured to carry analog signals) and/or a digital I/O 111b (e.g., one or more wires, cables, or other suitable communication links configured to carry digital signals). In some examples, the analog I/O 111a is an audio line-in input connection comprising, for example, an auto-detecting 3.5mm audio line-in connection. In some examples, the digital I/O 111b comprises a Sony/Philips Digital Interface Format (S/PDIF) communication interface and/or cable and/or a Toshiba Link (TOSLINK) cable. In some examples, the digital I/O 111b comprises a High-Definition Multimedia Interface (HDMI) interface and/or cable. In some examples, the digital I/O 111b includes one or more wireless communication links comprising, for example, a radio frequency (RF), infrared, WiFi, Bluetooth, or another suitable communication protocol. In certain examples, the analog I/O 111a and the digital I/O 111b comprise interfaces (e.g., ports, plugs, jacks) configured to receive connectors of cables transmitting analog and digital signals, respectively, without necessarily including cables.
[0070] The playback device 110a, for example, can receive media content (e.g., audio content comprising music and/or other sounds) from a local audio source 105 via the input/output 111 (e.g., a cable, a wire, a PAN, a Bluetooth connection, an ad hoc wired or wireless communication network, and/or another suitable communication link). The local audio source 105 can comprise, for example, a mobile device (e.g., a smartphone, a tablet, a laptop computer) or another suitable audio component (e.g., a television, a desktop computer,
an amplifier, a phonograph, a Blu-ray player, a memory storing digital media files). In some examples, the local audio source 105 includes local music libraries on a smartphone, a computer, a network-attached storage (NAS), and/or another suitable device configured to store media files. In certain examples, one or more of the playback devices 110, NMDs 120, and/or control devices 130 comprise the local audio source 105. In other examples, however, the media playback system omits the local audio source 105 altogether. In some examples, the playback device 110a does not include an input/output 111 and receives all audio content via the network 104.
[0071] The playback device 110a further comprises electronics 112, a user interface 113 (e.g., one or more buttons, knobs, dials, touch-sensitive surfaces, displays, touchscreens), and one or more transducers 114 (referred to hereinafter as “the transducers 114”). The one or more transducers may include further upward-firing transducers and/or a horizontal-firing transducer. The electronics 112 is configured to receive audio from an audio source (e.g., the local audio source 105) via the input/output 111 or one or more of the computing devices 106a-c via the network 104 (Figure 1B), amplify the received audio, and output the amplified audio for playback via one or more of the transducers 114. The playback device 110a includes one or more microphones 115 (hereinafter referred to as “the microphones 115”). The microphones 115 may comprise a plurality of microphones, and may be arranged as a microphone array. The microphone array may be an asymmetrical microphone array. In certain examples, for instance, the playback device 110a having the microphones 115 can operate as an NMD configured to receive voice input from a user and correspondingly perform one or more operations based on the received voice input.
[0072] In the illustrated example of Figure 1C, the electronics 112 comprise one or more processors 112a (referred to hereinafter as “the processors 112a”), memory 112b, software components 112c, a network interface 112d, one or more audio processing components 112g (referred to hereinafter as “the audio components 112g”), one or more audio amplifiers 112h (referred to hereinafter as “the amplifiers 112h”), and power 112i (e.g., one or more power supplies, power cables, power receptacles, batteries, induction coils, Power-over-Ethernet (PoE) interfaces, and/or other suitable sources of electric power). In some examples, the electronics 112 optionally include one or more other components 112j (e.g., one or more sensors, video displays, touchscreens, battery charging bases).
[0073] The processors 112a can comprise clock-driven computing component(s) configured to process data, and the memory 112b can comprise a computer-readable medium
(e.g., a tangible, non-transitory computer-readable medium, data storage loaded with one or more of the software components 112c) configured to store instructions for performing various operations and/or functions. The processors 112a are configured to execute the instructions stored on the memory 112b to perform one or more of the operations. The operations can include, for example, causing the playback device 110a to retrieve audio data from an audio source (e.g., one or more of the computing devices 106a-c (Figure 1B)), and/or another one of the playback devices 110. In some examples, the operations further include causing the playback device 110a to send audio data to another one of the playback devices 110 and/or another device (e.g., one of the NMDs 120). Certain examples include operations causing the playback device 110a to pair with another of the one or more playback devices 110 to enable a multi-channel audio environment (e.g., a stereo pair, a bonded zone).
[0074] The processors 112a can be further configured to perform operations causing the playback device 110a to synchronize playback of audio content with another of the one or more playback devices 110. As those of ordinary skill in the art will appreciate, during synchronous playback of audio content on a plurality of playback devices, a listener will preferably be unable to perceive time-delay differences between playback of the audio content by the playback device 110a and the one or more other playback devices 110. Additional details regarding audio playback synchronization among playback devices can be found, for example, in U.S. Patent No. 8,234,395, which was incorporated by reference above.
[0075] In some examples, the memory 112b is further configured to store data associated with the playback device 110a, such as one or more zones and/or zone groups of which the playback device 110a is a member, audio sources accessible to the playback device 110a, and/or a playback queue that the playback device 110a (and/or another of the one or more playback devices) can be associated with. The stored data can comprise one or more state variables that are periodically updated and used to describe a state of the playback device 110a. The memory 112b can also include data associated with a state of one or more of the other devices (e.g., the playback devices 110, NMDs 120, control devices 130) of the media playback system 100. In some examples, for instance, the state data is shared during predetermined intervals of time (e.g., every 5 seconds, every 10 seconds, every 60 seconds) among at least a portion of the devices of the media playback system 100, so that one or more of the devices have the most recent data associated with the media playback system 100.
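The periodic sharing of state variables described above can be sketched as a last-writer-wins merge of timestamped variables, so that each device converges on the most recent data associated with the media playback system. The class and method names below are illustrative assumptions, not details taken from this disclosure.

```python
import time

class PlaybackState:
    """Sketch of the state-variable pattern: each device keeps a
    small dictionary of named state variables and merges in peers'
    updates, keeping whichever copy carries the newer timestamp."""

    def __init__(self, device_id):
        self.device_id = device_id
        self.variables = {}  # name -> (value, timestamp)

    def set(self, name, value, timestamp=None):
        """Record a local update, stamped with the current time."""
        self.variables[name] = (value, timestamp or time.time())

    def merge(self, peer_variables):
        """Fold in a peer's variables, newest timestamp winning."""
        for name, (value, ts) in peer_variables.items():
            if name not in self.variables or ts > self.variables[name][1]:
                self.variables[name] = (value, ts)
```

Exchanging `variables` dictionaries at a fixed interval (e.g., every 5, 10, or 60 seconds, as in the paragraph above) would let every device hold the most recent state without any single coordinator.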
[0076] The network interface 112d is configured to facilitate a transmission of data between the playback device 110a and one or more other devices on a data network such as,
for example, the links 103 and/or the network 104 (Figure 1B). The network interface 112d is configured to transmit and receive data corresponding to media content (e.g., audio content, video content, text, photographs) and other signals (e.g., non-transitory signals) comprising digital packet data including an Internet Protocol (IP)-based source address and/or an IP-based destination address. The network interface 112d can parse the digital packet data such that the electronics 112 properly receives and processes the data destined for the playback device 110a.
[0077] In the illustrated example of Figure 1C, the network interface 112d comprises one or more wireless interfaces 112e (referred to hereinafter as “the wireless interface 112e”). The wireless interface 112e (e.g., a suitable interface comprising one or more antennae) can be configured to wirelessly communicate with one or more other devices (e.g., one or more of the other playback devices 110, NMDs 120, and/or control devices 130) that are communicatively coupled to the network 104 (Figure 1B) in accordance with a suitable wireless communication protocol (e.g., WiFi, Bluetooth, LTE). In some examples, the network interface 112d optionally includes a wired interface 112f (e.g., an interface or receptacle configured to receive a network cable such as an Ethernet, a USB-A, USB-C, and/or Thunderbolt cable) configured to communicate over a wired connection with other devices in accordance with a suitable wired communication protocol. In certain examples, the network interface 112d includes the wired interface 112f and excludes the wireless interface 112e. In some examples, the electronics 112 excludes the network interface 112d altogether and transmits and receives media content and/or other data via another communication path (e.g., the input/output 111).
[0078] The audio processing components 112g are configured to process and/or filter data comprising media content received by the electronics 112 (e.g., via the input/output 111 and/or the network interface 112d) to produce output audio signals. In some examples, the audio processing components 112g comprise, for example, one or more digital-to-analog converters (DACs), audio preprocessing components, audio enhancement components, digital signal processors (DSPs), and/or other suitable audio processing components, modules, circuits, etc. In certain examples, one or more of the audio processing components 112g can comprise one or more subcomponents of the processors 112a. In some examples, the electronics 112 omits the audio processing components 112g. In some examples, for instance, the processors 112a execute instructions stored on the memory 112b to perform audio processing operations to produce the output audio signals.
[0079] The amplifiers 112h are configured to receive and amplify the audio output signals produced by the audio processing components 112g and/or the processors 112a. The amplifiers 112h can comprise electronic devices and/or components configured to amplify audio signals to levels sufficient for driving one or more of the transducers 114. In some examples, for instance, the amplifiers 112h include one or more switching or class-D power amplifiers. In other examples, however, the amplifiers include one or more other types of power amplifiers (e.g., linear gain power amplifiers, class-A amplifiers, class-B amplifiers, class-AB amplifiers, class-C amplifiers, class-D amplifiers, class-E amplifiers, class-F amplifiers, class-G and/or class H amplifiers, and/or another suitable type of power amplifier). In certain examples, the amplifiers 112h comprise a suitable combination of two or more of the foregoing types of power amplifiers. Moreover, in some examples, individual ones of the amplifiers 112h correspond to individual ones of the transducers 114. In other examples, however, the electronics 112 includes a single one of the amplifiers 112h configured to output amplified audio signals to a plurality of the transducers 114. In some other examples, the electronics 112 omits the amplifiers 112h.
[0080] The transducers 114 (e.g., one or more speakers and/or speaker drivers) receive the amplified audio signals from the amplifiers 112h and render or output the amplified audio signals as sound (e.g., audible sound waves having a frequency between about 20 Hertz (Hz) and 20 kilohertz (kHz)). In some examples, the transducers 114 can comprise a single transducer. In other examples, however, the transducers 114 comprise a plurality of audio transducers. In some examples, the transducers 114 comprise more than one type of transducer. For example, the transducers 114 can include one or more low frequency transducers (e.g., subwoofers, woofers), one or more mid-range frequency transducers (e.g., mid-range transducers, mid-woofers), and one or more high frequency transducers (e.g., one or more tweeters). As used herein, “low frequency” can generally refer to audible frequencies below about 500 Hz, “mid-range frequency” can generally refer to audible frequencies between about 500 Hz and about 2 kHz, and “high frequency” can generally refer to audible frequencies above 2 kHz. In certain examples, however, one or more of the transducers 114 comprise transducers that do not adhere to the foregoing frequency ranges. For example, one of the transducers 114 may comprise a mid-woofer transducer configured to output sound at frequencies between about 200 Hz and about 5 kHz.
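The band boundaries given above can be expressed as a small helper function. This is an illustrative sketch only: the 500 Hz and 2 kHz edges come from the paragraph above, while the function name and the treatment of the exact boundary values are assumptions.

```python
def classify_band(frequency_hz: float) -> str:
    """Map an audible frequency to the band names used in the text.

    Band edges follow the description above: "low frequency" below about
    500 Hz, "mid-range frequency" between about 500 Hz and 2 kHz, and
    "high frequency" above 2 kHz. The boundary handling is an assumption.
    """
    if frequency_hz < 500.0:
        return "low"
    if frequency_hz <= 2000.0:
        return "mid-range"
    return "high"

# A mid-woofer per the example above may span bands, e.g. 200 Hz to 5 kHz:
mid_woofer_range = [classify_band(f) for f in (200.0, 1000.0, 5000.0)]
print(mid_woofer_range)  # ['low', 'mid-range', 'high']
```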
[0081] By way of illustration, SONOS, Inc. presently offers (or has offered) for sale certain playback devices including, for example, a “SONOS ONE,” “MOVE,” “PLAYA,”
“BEAM,” “PLAYBAR,” “PLAYBASE,” “PORT,” “BOOST,” “AMP,” and “SUB.” Other suitable playback devices may additionally or alternatively be used to implement the playback devices of the examples disclosed herein. Additionally, one of ordinary skill in the art will appreciate that a playback device is not limited to the examples described herein or to SONOS product offerings. In some examples, for instance, one or more of the playback devices 110 comprises wired or wireless headphones (e.g., over-the-ear headphones, on-ear headphones, in-ear earphones). In other examples, one or more of the playback devices 110 comprise a docking station and/or an interface configured to interact with a docking station for personal mobile media playback devices. In certain examples, a playback device may be integral to another device or component such as a television, a lighting fixture, or some other device for indoor or outdoor use. In some examples, a playback device omits a user interface and/or one or more transducers. For example, Figure 1D is a block diagram of a playback device 110p comprising the input/output 111 and electronics 112 without the user interface 113 or transducers 114.
[0082] Figure 1E is a block diagram of a bonded playback device 110q comprising the playback device 110a (Figure 1C) sonically bonded with the playback device 110i (e.g., a subwoofer) (Figure 1A). In the illustrated example, the playback devices 110a and 110i are separate ones of the playback devices 110 housed in separate enclosures. In some examples, however, the bonded playback device 110q comprises a single enclosure housing both the playback devices 110a and 110i. The bonded playback device 110q can be configured to process and reproduce sound differently than an unbonded playback device (e.g., the playback device 110a of Figure 1C) and/or paired or bonded playback devices (e.g., the playback devices 110l and 110m of Figure 1B). In some examples, for instance, the playback device 110a is a full-range playback device configured to render low frequency, mid-range frequency, and high frequency audio content, and the playback device 110i is a subwoofer configured to render low frequency audio content. In some examples, the playback device 110a, when bonded with the playback device 110i, is configured to render only the mid-range and high frequency components of a particular audio content, while the playback device 110i renders the low frequency component of the particular audio content. In some examples, the bonded playback device 110q includes additional playback devices and/or another bonded playback device. Additional playback device examples are described in further detail below with respect to Figures 2A-2C. c. Suitable Network Microphone Devices (NMDs)
[0083] Figure 1F is a block diagram of the NMD 120a (Figures 1A and 1B). The NMD 120a includes one or more voice processing components 124 (hereinafter “the voice components 124”) and several components described with respect to the playback device 110a (Figure 1C), including the processors 112a, the memory 112b, the microphones 115, the software components 112c, the network interface 112d, and power 112i. The NMD 120a optionally comprises other components also included in the playback device 110a (Figure 1C), such as the user interface 113 and/or the transducers 114, as well as other components 112j. In some examples, the NMD 120a is configured as a media playback device (e.g., one or more of the playback devices 110), and further includes, for example, one or more of the audio processing components 112g (Figure 1C), the amplifiers 112h, and/or other playback device components. In certain examples, the NMD 120a comprises an Internet of Things (IoT) device such as, for example, a thermostat, alarm panel, fire and/or smoke detector, etc. In some examples, the NMD 120a comprises the microphones 115, the voice processing components 124, and only a portion of the components of the electronics 112 described above with respect to Figure 1B. In some examples, for instance, the NMD 120a includes the processor 112a and the memory 112b (Figure 1B), while omitting one or more other components of the electronics 112. In some examples, the NMD 120a includes additional components (e.g., one or more sensors, cameras, thermometers, barometers, hygrometers).
[0084] In some examples, an NMD can be integrated into a playback device. Figure 1G is a block diagram of a playback device 110r comprising an NMD 120d. The playback device 110r can comprise many or all of the components of the playback device 110a and further include the microphones 115 and voice processing components 124 (Figure 1F). The playback device 110r optionally includes an integrated control device 130c. The control device 130c can comprise, for example, a user interface (e.g., the user interface 113 of Figure 1B) configured to receive user input (e.g., touch input, voice input) without a separate control device. In other examples, however, the playback device 110r receives commands from another control device (e.g., the control device 130a of Figure 1B).
[0085] Referring again to Figure IF, the microphones 115 are configured to acquire, capture, and/or receive sound from an environment (e.g., the environment 101 of Figure 1A) and/or a room in which the NMD 120a is positioned. The received sound can include, for example, vocal utterances, audio played back by the NMD 120a and/or another playback device, background voices, ambient sounds, etc. The microphones 115 convert the received sound into electrical signals to produce microphone data. The voice processing components
124 receive and analyze the microphone data to determine whether a voice input is present in the microphone data. The voice input can comprise, for example, an activation word followed by an utterance including a user request. As those of ordinary skill in the art will appreciate, an activation word is a word or other audio cue that signifies a user voice input. For instance, in querying the AMAZON® VAS, a user might speak the activation word "Alexa." Other examples include "Ok, Google" for invoking the GOOGLE® VAS and "Hey, Siri" for invoking the APPLE® VAS.
[0086] After detecting the activation word, voice processing components 124 monitor the microphone data for an accompanying user request in the voice input. The user request may include, for example, a command to control a third-party device, such as a thermostat (e.g., NEST® thermostat), an illumination device (e.g., a PHILIPS HUE ® lighting device), or a media playback device (e.g., a Sonos® playback device). For example, a user might speak the activation word “Alexa” followed by the utterance “set the thermostat to 68 degrees” to set a temperature in a home (e.g., the environment 101 of Figure 1A). The user might speak the same activation word followed by the utterance “turn on the living room” to turn on illumination devices in a living room area of the home. The user may similarly speak an activation word followed by a request to play a particular song, an album, or a playlist of music on a playback device in the home. d. Suitable Control Devices
[0087] Figure 1H is a partially schematic diagram of the control device 130a (Figures 1A and 1B). As used herein, the term “control device” can be used interchangeably with “controller” or “control system.” Among other features, the control device 130a is configured to receive user input related to the media playback system 100 and, in response, cause one or more devices in the media playback system 100 to perform an action(s) or operation(s) corresponding to the user input. In the illustrated example, the control device 130a comprises a smartphone (e.g., an iPhone™, an Android phone) on which media playback system controller application software is installed. In some examples, the control device 130a comprises, for example, a tablet (e.g., an iPad™), a computer (e.g., a laptop computer, a desktop computer), and/or another suitable device (e.g., a television, an automobile audio head unit, an IoT device). In certain examples, the control device 130a comprises a dedicated controller for the media playback system 100. In other examples, as described above with respect to Figure 1G, the control device 130a is integrated into another device in the media
playback system 100 (e.g., one or more of the playback devices 110, NMDs 120, and/or other suitable devices configured to communicate over a network).
[0088] The control device 130a includes electronics 132, a user interface 133, one or more speakers 134, and one or more microphones 135. The electronics 132 comprise one or more processors 132a (referred to hereinafter as “the processors 132a”), a memory 132b, software components 132c, and a network interface 132d. The processor 132a can be configured to perform functions relevant to facilitating user access, control, and configuration of the media playback system 100. The memory 132b can comprise data storage that can be loaded with one or more of the software components executable by the processor 132a to perform those functions. The software components 132c can comprise applications and/or other executable software configured to facilitate control of the media playback system 100. The memory 132b can be configured to store, for example, the software components 132c, media playback system controller application software, and/or other data associated with the media playback system 100 and the user.
[0089] The network interface 132d is configured to facilitate network communications between the control device 130a and one or more other devices in the media playback system 100, and/or one or more remote devices. In some examples, the network interface 132d is configured to operate according to one or more suitable communication industry standards (e.g., infrared, radio, wired standards including IEEE 802.3, wireless standards including IEEE 802.11a, 802.11b, 802.11g, 802.11n, 802.11ac, 802.15, 4G, LTE). The network interface 132d can be configured, for example, to transmit data to and/or receive data from the playback devices 110, the NMDs 120, other ones of the control devices 130, one of the computing devices 106 of Figure 1B, devices comprising one or more other media playback systems, etc. The transmitted and/or received data can include, for example, playback device control commands, state variables, playback zone and/or zone group configurations. For instance, based on user input received at the user interface 133, the network interface 132d can transmit a playback device control command (e.g., volume control, audio playback control, audio content selection) from the control device 130a to one or more of the playback devices 110. The network interface 132d can also transmit and/or receive configuration changes such as, for example, adding/removing one or more playback devices 110 to/from a zone, adding/removing one or more zones to/from a zone group, forming a bonded or consolidated player, separating one or more playback devices from a bonded or consolidated player, among others.
[0090] The user interface 133 is configured to receive user input and can facilitate control of the media playback system 100. The user interface 133 includes media content art 133a (e.g., album art, lyrics, videos), a playback status indicator 133b (e.g., an elapsed and/or remaining time indicator), a media content information region 133c, a playback control region 133d, and a zone indicator 133e. The media content information region 133c can include a display of relevant information (e.g., title, artist, album, genre, release year) about media content currently playing and/or media content in a queue or playlist. The playback control region 133d can include selectable (e.g., via touch input and/or via a cursor or another suitable selector) icons to cause one or more playback devices in a selected playback zone or zone group to perform playback actions such as, for example, play or pause, fast forward, rewind, skip to next, skip to previous, enter/exit shuffle mode, enter/exit repeat mode, enter/exit cross fade mode, etc. The playback control region 133d may also include selectable icons to modify equalization settings, playback volume, and/or other suitable playback actions. In the illustrated example, the user interface 133 comprises a display presented on a touch screen interface of a smartphone (e.g., an iPhone™, an Android phone). In some examples, however, user interfaces of varying formats, styles, and interactive sequences may alternatively be implemented on one or more network devices to provide comparable control access to a media playback system.
[0091] The one or more speakers 134 (e.g., one or more transducers) can be configured to output sound to the user of the control device 130a. In some examples, the one or more speakers comprise individual transducers configured to correspondingly output low frequencies, mid-range frequencies, and/or high frequencies. In some examples, for instance, the control device 130a is configured as a playback device (e.g., one of the playback devices 110). Similarly, in some examples the control device 130a is configured as an NMD (e.g., one of the NMDs 120), receiving voice commands and other sounds via the one or more microphones 135.
[0092] The one or more microphones 135 can comprise, for example, one or more condenser microphones, electret condenser microphones, dynamic microphones, and/or other suitable types of microphones or transducers. In some examples, two or more of the microphones 135 are arranged to capture location information of an audio source (e.g., voice, audible sound) and/or configured to facilitate filtering of background noise. Moreover, in certain examples, the control device 130a is configured to operate as a playback device and an NMD. In other examples, however, the control device 130a omits the one or more speakers
134 and/or the one or more microphones 135. For instance, the control device 130a may comprise a device (e.g., a thermostat, an IoT device, a network device) comprising a portion of the electronics 132 and the user interface 133 (e.g., a touch screen) without any speakers or microphones.
III. Example Asymmetrical Microphone Arrays
[0093] As used herein the term “playback device” can generally refer to a network device configured to receive, process, and output data of a media playback system. For example, a playback device can be a network device that receives and processes audio content. In some examples, a playback device includes one or more transducers or speakers powered by one or more amplifiers. In other examples, however, a playback device includes only one of (or neither of) the speakers and the amplifiers. For instance, a playback device can comprise one or more amplifiers configured to drive one or more speakers external to the playback device via a corresponding wire or cable. Playback devices described herein include at least four microphones.
[0094] Moreover, as used herein the term NMD (i.e., a “network microphone device”) can generally refer to a network device that is configured for audio detection. In some examples, an NMD is a stand-alone device configured primarily for audio detection. In other examples, an NMD is incorporated into a playback device (or vice versa).
[0095] Figures 2A to 2C illustrate three schematic views of an example of a first playback device 200, which takes the form of a soundbar. Figure 2A provides a perspective view of the playback device 200. Figure 2B provides a top, plan view of the first playback device 200, while Figure 2C shows a front view of the first playback device 200. In Figure 2A, an outer housing, surface, or grille 205 of the playback device 200 is shown, but in Figures 2B and 2C the grille 205 is not shown, so that the microphones and transducers of the device 200 can be seen more clearly.
[0096] In Figure 2A, a top face or an upper portion 202, and a front face, an azimuthal portion, or a front portion 204 of the playback device 200 are visible. In use, when the playback device 200 is placed on a surface such as a credenza or installed on a wall, the upper portion 202 is arranged to face upward, while the front portion 204 is configured to face forward. The front portion 204 is curved at its edges, such that the front portion 204 is continuous with a rear face or rear portion (not shown) of the playback device 200. A grille 205 forms an outer surface of the front portion 204 and covers transducers and other components.
[0097] A user interface 206 is provided on the upper portion 202. The user interface 206, in this example, comprises three buttons, although in other playback devices a user interface may comprise one or more buttons, knobs, dials, touch-sensitive surfaces, displays, touchscreens, or any combination thereof. In some examples, the user interface 206 comprises one or more transport controls (e.g., play/pause, skip, back, volume up, volume down). In certain examples, the playback device 200 may lack a user interface.
[0098] The playback device 200 also includes a microphone array 210, comprising a plurality of microphones 212, and a plurality of transducers 214. The asymmetrical microphone array 210 is provided on the upper portion 202 of the playback device 200, and is visible in Figure 2B. The plurality of transducers 214 comprises five transducers that are provided in the front portion 204, as can be seen in Figure 2C. The transducers include a first forward-firing transducer 214a that is positioned centrally on the front portion 204, second and third forward-firing transducers 214b, 214c that are positioned symmetrically to either side of the first forward-firing transducer 214a, and a fourth transducer 214d and a fifth transducer 214e that are positioned symmetrically towards each end of the front portion 204. The fourth and fifth transducers 214d, 214e are angled relative to the front portion 204, such that sound radiates from these transducers 214d, 214e in a substantially different direction, or along a different main axis, to sound radiating from the forward-firing transducers 214a-c. The fourth and fifth transducers 214d, 214e may be referred to as side-firing transducers. For the purpose of explanation, both Figures 2B and 2C include a center line 216 of each face, and a cross indicating a center point 218 of the playback device 200. References below to features being positioned to a particular side relative to the center point 218 may be considered to mean on one side of an axis passing through the center point 218, on one side of the center line 216, or on one side of a plane passing through the center point 218 or defined by one or more center lines 216. Figure 2B also shows the positions of the buttons of the user interface 206 for context.
[0099] The microphones 212 are arranged in the microphone array 210 so that the microphone array 210 is asymmetrical. The array 210 is asymmetrical about the center line 216 and center point 218 of the playback device 200. The array 210 is also asymmetrical about its own center line and center point, because these are the same as the center line 216 and center point 218.
[0100] The microphone array 210 comprises a plurality of microphones 212, which will be referred to, from left to right as shown in Figure 2B, as a first microphone 212a, a second
microphone 212b, a third microphone 212c, and a fourth microphone 212d. The first microphone 212a is positioned towards one end of the upper portion 202, which, in Figure 2B, is the left-hand end of the playback device 200. The fourth microphone 212d is positioned in a similar position towards the other end of the upper portion 202, the right-hand end in Figure 2B. The second and third microphones 212b, 212c are positioned between the first and fourth microphones 212a, 212d. In the example of Figure 2B, the microphones of the array are aligned with one another, and are also aligned with a longitudinal axis 224 of the playback device 200. As will be described in relation to Figures 3 and 6 below, the microphones may lack such alignment in other examples. Moreover, in the illustrated example of Figures 2A-2C, the microphone array 210 includes four microphones 212. In some examples, however, other suitable numbers of microphones are included. In certain examples, for instance, the microphone array 210 includes an even number of microphones 212 (e.g., 2, 4, 6, 8, 10, etc.). In other examples, however, the microphone array 210 includes an odd number of microphones 212 (e.g., 1, 3, 5, 7, 9, 11, etc.).
[0101] The array 210 is asymmetrical for at least the reason that the first and fourth microphones 212a, 212d are symmetrical about the center of the device but the second and third microphones 212b, 212c are not symmetrically arranged about the center line 216 or point 218. Instead, they are provided to one side of the center of the device 200, such that the first, second, and third microphones 212a-c are to the left-hand side of the center (relative to the orientation of the upper portion shown in Figure 2B) and the fourth microphone 212d is to the right-hand side of the center. Accordingly, a majority of the microphones 212a-d of the microphone array 210 can be positioned on one side of the center of the playback device. [0102] The microphones 212a-d may be separated into sets. The sets are defined based on how the microphones may operate together. Such operation can be dependent upon a spacing between the microphones. Accordingly, the array 210 includes a first set 220 of two microphones, comprising the first microphone 212a and the fourth microphone 212d, and a second set 222 of two microphones, comprising the second microphone 212b and the third microphone 212c. The microphones 212a, 212d of the first set 220 are spaced apart or separated by a first distance or spacing, indicated in Figure 2B by the length d1. The microphones 212b, 212c of the second set 222 are spaced apart or separated by a second distance or spacing, which is smaller than the first spacing d1 and is indicated in Figure 2B as length d2. Microphones having a larger spacing, such as those of the first set 220, may be useful for achieving high angular resolution when determining a direction from which a sound signal is received, because
the times at which a signal is received at microphones having a larger spacing can be compared with greater precision. Microphones having a larger spacing may also be useful for range estimation. Conversely, microphones having a smaller spacing, such as those of the second set 222, may facilitate beamforming. Together, the microphones of the array may be useful for improving signal quality, by allowing improved noise reduction, improved characterization of a listening environment, and identification of a location of a sound source and subsequently determining information from that source.
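The relationship between microphone spacing and angular resolution noted above can be illustrated with the standard far-field model, in which a plane wave arrives at one microphone of a pair delayed by d·sin(θ)/c relative to the other. The sketch below assumes hypothetical spacings and a 16 kHz sample rate (none of these values appear in this disclosure) to show that, for a fixed one-sample timing uncertainty, a wider pair such as the first set 220 resolves direction more finely than a narrow pair such as the second set 222.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s at roughly room temperature

def arrival_angle(delay_s: float, spacing_m: float) -> float:
    """Far-field direction of arrival (degrees from broadside) for a mic pair.

    A plane wave from angle theta reaches one microphone of the pair
    delay = spacing * sin(theta) / c later than the other, so
    theta = asin(c * delay / spacing).
    """
    return math.degrees(math.asin(SPEED_OF_SOUND * delay_s / spacing_m))

# With a fixed timing uncertainty of one sample at 16 kHz, a widely spaced
# pair maps that uncertainty to a much smaller angle step than a narrow
# pair. The 0.60 m and 0.05 m spacings below are assumed, not from the text.
one_sample = 1.0 / 16000.0
for spacing in (0.60, 0.05):
    step = arrival_angle(one_sample, spacing)
    print(f"spacing {spacing} m -> one-sample angle step of about {step:.2f} degrees")
```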
[0103] The microphones of the second set 222 may also be closer to one another than they are to the microphones of the first set 220. As indicated in Figure 2B, the first and second microphones 212a, 212b are separated by a third spacing d3, which is greater than d2 but smaller than d1. The third and fourth microphones 212c, 212d are separated by a fourth spacing d4, which is also greater than d2 but smaller than d1. The spacing d4 is also greater in length than d3, because of the asymmetry of the array 210. In the illustrated example of Figure 2B, the length d1 is greater than the length d4, the length d4 is greater than the length d3, and the length d3 is greater than the length d2. In some examples, however, the individual lengths can have any suitable relationship with the other lengths. In certain examples, for instance, the lengths d2 and d3 may be similar or substantially the same. As one of ordinary skill in the art will appreciate, if the lengths d2 and d3 are substantially equal, the resulting array could still be asymmetric.
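The spacing relationships described above (d1 greater than d4, d4 greater than d3, and d3 greater than d2) can be checked from coordinates. The positions below are invented for illustration; only the ordering they produce mirrors the text. The same coordinates also show one way to test asymmetry: mirroring each microphone about the array center does not land every microphone on another microphone position.

```python
# Hypothetical 1-D positions (meters) along the top face for microphones
# 212a-212d. The values are invented, chosen only so the spacings reproduce
# the ordering described in the text: d1 > d4 > d3 > d2.
positions = {"212a": 0.00, "212b": 0.22, "212c": 0.30, "212d": 0.70}

d1 = positions["212d"] - positions["212a"]  # first set 220 spacing
d2 = positions["212c"] - positions["212b"]  # second set 222 spacing
d3 = positions["212b"] - positions["212a"]
d4 = positions["212d"] - positions["212c"]

assert d1 > d4 > d3 > d2, "spacings should follow the ordering in the text"

# Asymmetry check: reflect each microphone about the array's center point
# and compare the reflected layout to the original layout.
center = (positions["212a"] + positions["212d"]) / 2.0
mirrored = sorted(round(2 * center - p, 6) for p in positions.values())
print("asymmetric:", mirrored != sorted(positions.values()))  # asymmetric: True
```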
[0104] Figure 3 is a plan view of a top face or upper portion 302 of a second playback device 300, having an asymmetrical microphone array 310 (e.g., similar to the microphone array 210 of Figures 2A-2C). The microphone array 310 includes a plurality of microphones 312, referred to as a first microphone 312a, a second microphone 312b, a third microphone 312c, and a fourth microphone 312d. In the microphone array 310, the first and fourth microphones 312a, 312d are in the same positions as they were in the first playback device 200, thereby being spaced apart by a spacing of d1, like the first and fourth microphones 212a and 212d of the first playback device 200 shown in Figure 2B. The second and third microphones 312b, 312c are positioned to one side of the center point 318 and/or center line 316 of the playback device. Accordingly, a majority of the microphones 312a-d are to one side of the center point 318 and/or center line 316 of the playback device, which results in an asymmetrical microphone array.
[0105] Like the second and third microphones 212b, 212c in the first playback device 200 in Figure 2B, the second and third microphones 312b, 312c of the second playback device are spaced apart by a distance d2, but in this example are aligned with one another longitudinally
along the device 300 and are displaced from the longitudinal axis 324 so as to be symmetrical about that axis 324. The microphone array can still be split into two sets 320, 322, due to the similarity in spacing between the microphones. In contrast to the playback device 200, however, in the playback device 300 of Figure 3, the second microphone 312b is the same distance from the first and fourth microphones 312a, 312d as the third microphone 312c.
[0106] Figure 4 is a plan view of a top face or an upper portion 402 of a third playback device 400, having an asymmetrical microphone array 410. The microphone array 410 includes a plurality of microphones 412, referred to as a first microphone 412a, a second microphone 412b, a third microphone 412c, and a fourth microphone 412d. In the example of Figure 4, the microphone array is asymmetrical and has the same number of microphones to each side of the center point 418 of the playback device. The asymmetry therefore arises due to the placement of the microphones to each side of the center point 418 and/or center line 416. In this example, the first and fourth microphones 412a, 412d are symmetrical about the center line 416 and/or center point 418 of the playback device, but the second and third microphones 412b, 412c are asymmetrical, rendering the entire array asymmetrical. In contrast to the array 310 (Figure 3), the microphones of the array 410 in Figure 4 are aligned with a longitudinal axis 424 of the upper portion 402 of the third playback device.
[0107] Figure 5 is a plan view of a top face or an upper portion 502 of a fourth playback device 500, having an asymmetrical microphone array 510. In the illustrated example of Figure 5, the microphone array 510 includes five microphones, rather than four microphones as shown in Figures 2A to 4. The microphones 512 of the microphone array 510 are referred to as a first microphone 512a, a second microphone 512b, a third microphone 512c, a fourth microphone 512d, and a fifth microphone 512e. In Figure 5, a majority of the microphones, namely the first to third microphones 512a-c, are to one side of the center point 518 and/or center line 516 of the playback device, while the other two, the fourth and fifth microphones 512d-e, are to the other side of the center point 518 and/or center line 516 of the playback device. This split results in asymmetry in the microphone array. The microphones 512 may be split into two sets or subarrays, having different spacings. A first set 520 comprises the first to third microphones 512a-c, which have a first spacing, and a second set 522 comprises the fourth and fifth microphones 512d-e, which have a second spacing that is different from (e.g., larger than) the first spacing. As in Figures 2B and 4, but in contrast to Figure 3, the microphones 512a-e of Figure 5 are aligned along a longitudinal axis 524 of the playback device 500.
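As noted in paragraph [0102], closely spaced microphones such as those of a subarray can facilitate beamforming. A minimal delay-and-sum sketch is shown below, under a far-field assumption and with assumed spacing and sample-rate values (none of which appear in this disclosure); real implementations use fractional delays and per-channel filtering.

```python
import math

def delay_and_sum(signals, positions_m, steer_deg, fs_hz, c=343.0):
    """Minimal far-field delay-and-sum beamformer for a 1-D microphone set.

    signals: one equal-length list of samples per microphone.
    positions_m: each microphone's position along the array axis (meters).
    Integer-sample delays align the channels toward steer_deg from broadside
    before averaging. (A sketch: real beamformers use fractional delays.)
    """
    raw = [fs_hz * p * math.sin(math.radians(steer_deg)) / c for p in positions_m]
    delays = [int(round(r - min(raw))) for r in raw]  # non-negative sample delays
    shift = max(delays)
    length = len(signals[0]) - shift
    return [sum(sig[t + shift - d] for sig, d in zip(signals, delays)) / len(signals)
            for t in range(length)]

# Illustrative use with a closely spaced pair (the 0.03 m spacing and
# 16 kHz rate are assumed): a tone arriving from broadside (0 degrees)
# reaches both microphones simultaneously, so steering to 0 degrees
# sums the channels coherently and preserves the tone's amplitude.
fs = 16000.0
tone = [math.sin(2 * math.pi * 1000.0 * t / fs) for t in range(256)]
out = delay_and_sum([tone, tone], [0.0, 0.03], steer_deg=0.0, fs_hz=fs)
print(max(abs(s) for s in out) > 0.99)  # True
```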
[0108] Figures 6A-6C are isometric, plan, and front views, respectively, of an example of a fifth playback device 600 (e.g., a soundbar). Figure 6A shows a curved outer housing portion 630 (e.g., a grille, cover, enclosure) of the playback device 600. In Figures 6B and 6C, however, the outer housing 630 is not shown, so that microphones and transducers of the playback device 600 can be seen. A top face or upper portion 602 of the playback device 600 is continuous with a front face or front portion 604 of the playback device 600, as well as a lower and rear face (not visible). For convenience and ease of explanation, the upper portion 602 and front portion 604 as shown in Figures 6B and 6C are treated as if they are separate portions rather than having continuity between them, and are explained in relation to an upper portion and a front portion. References to the upper and front portions may be considered to refer to an upward-facing surface and a forward-facing surface of the playback device, in use. In some examples, the two surfaces may be substantially orthogonal.
[0109] A plurality of transducers 614 are disposed in and/or on the upper portion 602 and/or the front portion 604. First and second transducers 614a, 614b can comprise upward-firing transducers and are provided on the upper portion 602. Third and fourth transducers 614c, 614d comprise side-firing transducers and are provided at either end of the playback device 600. Fifth to eighth transducers 614e-h are provided on the front portion 604. The fifth and sixth transducers 614e, 614f are provided close to each end of the playback device 600 and are symmetrically arranged on the front portion 604. The seventh transducer 614g is provided centrally in the front portion 604, and the eighth transducer 614h is provided to one side of a centerline 616 of the front portion 604. The eighth transducer 614h is asymmetrically positioned on the front portion, such that the corresponding position on the opposite side of the centerline 616 is not occupied by another transducer. The eighth transducer 614h may be a transducer configured to output low-frequency audio. The eighth transducer 614h may be specifically designed for low-frequency output or may be configured to output such audio by the electronics of the device. In some examples, for instance, the eighth transducer may comprise a multiple membrane, multiple motor transducer. Additional details regarding multiple membrane, multiple motor transducers can be found in, for example, U.S. Patent Application 16/760,049 titled “Low Profile Loudspeaker Device,” the contents of which are incorporated herein by reference in their entirety.
[0110] The playback device 600 includes a microphone array 610, comprising a plurality of microphones 612, referred to individually as a first microphone 612a and a fourth microphone 612d (Figure 6B), and a second microphone 612b and a third microphone 612c (Figure 6C),
and a plurality of transducers 614. The microphone array 610 is provided across the front and upper portions 604, 602 of the playback device 600, and is visible in Figures 6B and 6C. The microphones 612a-d are arranged in the microphone array 610 so that the microphone array 610 is asymmetrical. The array 610 is asymmetrical about the center line 616 and/or center point 618 of the playback device 600.
[0111] The microphone array 610 comprises a plurality of microphones 612, which will be referred to, from left to right as shown in Figure 6B, as a first microphone 612a, a second microphone 612b, a third microphone 612c, and a fourth microphone 612d. The first microphone 612a is positioned towards one end of the upper portion 602, which, in Figure 6B, is the left-hand end of the playback device 600. The fourth microphone 612d is positioned in a similar position towards the other end of the upper portion 602, the right-hand end in Figure 6B. The second and third microphones 612b, 612c are positioned between the first and fourth microphones 612a, 612d but are positioned on the front portion 604 rather than the upper portion 602.
[0112] The array 610 can be characterized as asymmetrical for at least the reason that the first and fourth microphones 612a, 612d (Figure 6B) are symmetrical about the center line 616 of the device but the second and third microphone 612b, 612c (Figure 6C) are not symmetrically arranged about the center line 616 and/or point 618. Instead, they are provided to one side of the center of the device 600, such that the first, second, and third microphones 612a-c are disposed on a first side (e.g., the left-hand side of the center relative to the orientation of the upper and front portions shown in Figures 6B and 6C) and the fourth microphone 612d is disposed on a second side (e.g., the right-hand side of the center). Accordingly, a majority of the microphones 612 of the microphone array 610 are located on one side of the center of the playback device.
[0113] In some examples, the microphones 612 may be arranged into apertures and/or sets of microphones. The sets can be defined, for instance, based on how the microphones may operate together. Such operation can be based on a spacing between two or more of the microphones 612. Accordingly, the array 610 includes a first set 620 of two microphones, comprising the first microphone 612a and the fourth microphone 612d, and a second set 622 of two microphones, comprising the second microphone 612b and the third microphone 612c. The microphones 612a, 612d of the first set 620 are spaced apart or separated by a first distance or spacing, indicated in Figure 6B by the length dl. The microphones 612b, 612c of the second set 622 are spaced apart or separated by a second distance or spacing, which is smaller than the
first spacing d1 and is indicated in Figure 6C as length d2. As those of ordinary skill in the art will appreciate, microphones having a larger spacing, such as those of the first set 620, may be useful for achieving high angular resolution when determining a direction from which a sound signal is received, because the times at which a signal is received at microphones having a larger spacing can be compared with greater precision. Microphones having a larger spacing may also be useful for range estimation. Microphones having a smaller spacing, such as those of the second set 622, may facilitate beamforming. Together, the asymmetric arrangement of the microphones 612 is expected to be advantageous over conventional symmetric or other arrays for at least the benefits of improving received or captured audio signal quality, allowing improved noise reduction, improved characterization of a listening environment, and/or enhanced sound source location identification and subsequent determining of information from that source. The microphones of the second set 622 are also closer to one another than they are to the microphones of the first set 620. The spacing d1 may be at least 1 meter or greater, at least 0.9m, at least 0.8m, at least 0.7m, at least 0.6m, or at least 0.5m. In some examples, d1 may be less than 0.5m. The spacing d2 may be less than 0.05m, less than 0.1m, less than 0.2m, less than 0.3m, less than 0.4m, or less than 0.5m. In some examples, d2 may be at least two times smaller than d1, at least three times smaller than d1, at least four times smaller than d1, or at least five times smaller than d1.
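The relationship between microphone spacing and angular precision described above can be illustrated with a short far-field direction-of-arrival sketch. This is an illustrative example only, not part of the disclosed embodiments: the 0.8 m and 0.05 m spacings are hypothetical values chosen from within the ranges given for d1 and d2, and a 48 kHz sample rate is assumed.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees C (assumed)

def doa_from_tdoa(delta_t, spacing):
    """Estimate direction of arrival (radians from broadside) for a
    two-microphone pair from the inter-microphone time difference of
    arrival, under a far-field (plane-wave) approximation."""
    path_diff = SPEED_OF_SOUND * delta_t
    # Clamp so measurement noise cannot push the ratio past +/-1
    ratio = np.clip(path_diff / spacing, -1.0, 1.0)
    return np.arcsin(ratio)

# A fixed timing error of one sample at 48 kHz corrupts the angle
# estimate far less for a widely spaced pair than for a narrow one.
dt_err = 1.0 / 48000.0
wide = np.degrees(doa_from_tdoa(dt_err, 0.8))    # wide set, ~0.5 deg
narrow = np.degrees(doa_from_tdoa(dt_err, 0.05)) # narrow set, ~8 deg
```

The wide pair's angular error is over an order of magnitude smaller for the same timing error, which is one way to read the statement that widely spaced microphones allow received times to be "compared with greater precision."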
[0114] In contrast to the arrays of Figures 2A-5, the microphone array 610 in the playback device 600 shown in Figures 6A to 6C is spread across two different faces or at least spread around a curved surface, such that the microphones are provided in at least two different orientations. The microphones 612a, 612d of the first set 620 are provided at the upper portion 602 of the playback device 600, and so are oriented to face upwards, in use. The microphones 612b, 612c of the second set 622 are provided at the front portion 604 of the playback device 600, and so are oriented to face forward, in use.
[0115] Utilizing microphones facing in two different directions, particularly via an asymmetric array, is expected to facilitate and/or enable improvements over conventional arrays, such as enabling an angle of elevation of a sound source relative to the playback device to be estimated in addition to an azimuthal angle of said sound source. This may enable better identification in three dimensions of a location of a source within a listening environment. Subsequently, beamforming may be used to focus on audio output from the source, using the second set 622 of microphones, for example. The positioning of the second set 622 on the front portion 604 and the first set 620 on the upper portion provides
improvements in addition to those described above. Characterization of a ceiling portion of a room may be improved by having upward-facing microphones with good angular resolution, as the widely spaced first set 620 will have. This may be particularly useful in a home-theater setting. The second set 622 may enable better pickup of voice commands by being on a front portion, because commands from a user are likely to be incident on the front portion due to the expected positioning of a listener relative to the device 600. Furthermore, allowing at least some microphones to be placed on the front of the device enables direct audio signals to be picked up, reducing the possibility of noise.
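As a hedged illustration of the beamforming the closely spaced set may facilitate, the sketch below implements a basic integer-sample delay-and-sum beamformer for a small linear pair. The sample rate, spacing, and test signal are hypothetical and not taken from the disclosure.

```python
import numpy as np

def delay_and_sum(signals, mic_offsets, angle, fs, c=343.0):
    """Steer a linear microphone pair/array toward `angle` (radians
    from broadside) by delaying each channel so a far-field wavefront
    from that direction is time-aligned, then averaging the channels.
    signals: (n_mics, n_samples); mic_offsets: meters along the array."""
    n_mics, n_samples = signals.shape
    out = np.zeros(n_samples)
    for m in range(n_mics):
        # Extra propagation path for this microphone, in whole samples
        delay = int(round(mic_offsets[m] * np.sin(angle) / c * fs))
        out += np.roll(signals[m], -delay)
    return out / n_mics

# Sanity check: identical channels steered to broadside (0 rad)
# sum coherently and reproduce the input signal.
fs = 48000
t = np.arange(1024) / fs
s = np.sin(2 * np.pi * 1000.0 * t)
out = delay_and_sum(np.stack([s, s]), np.array([0.0, 0.05]), 0.0, fs)
```

Signals arriving off the steered direction are summed with residual misalignment and are attenuated, which is the mechanism by which a small closely spaced set can favor sound from the expected listener direction.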
[0116] When considering noise, the placement of the microphones of the array 610 may also reduce self-sound, which can cause unwanted noise in received signals due to the operation of nearby transducers. Self-sound is most likely to be caused by low-frequency audio, such as that below 500 Hz. Accordingly, to avoid self-sound, the microphones may be positioned to maximize a distance of at least some of the microphones 612 from corresponding adjacent transducers 614 outputting low-frequency audio, such as the eighth transducer 614h. The arrangement of microphones in Figures 6B and 6C positions the second set 622 of microphones on an opposite side of the center point 618 to the eighth transducer 614h. Accordingly, at least one of the second and third microphones 612b, 612c is positioned farther from the eighth transducer 614h than it would be were the microphone array 610 not asymmetric. This separation distance therefore may reduce overall self-sound and result in a majority of the microphones 612 being at least a fifth distance d5 from the transducer 614h (Figure 6C).
[0117] The microphones of the second set 622 are also positioned in a space on the playback device 600 that is unoccupied by any of the transducers 614, and so are a sixth distance d6 from any of the transducers. The sixth distance d6 (Figure 6C) is greater than the spacing between the microphones of the second set 622, which is d2. Achieving a reduced self-sound may also enable improved noise reduction in the microphone 612d that is closer to the transducer 614h. [0118] Although Figures 2A to 6 show examples of microphone arrays 210, 310, 410, 510, 610 on corresponding playback devices 200, 300, 400, 500, 600 including a particular number (e.g., four or five) of microphones 212, 312, 412, 512, 612, in other examples microphone arrays may include any other suitable number of microphones (e.g., more than five microphones). In some examples, for instance, asymmetrical microphone arrays may include six, seven, eight, nine, ten, twelve, sixteen, or twenty microphones. The microphones of an array may be divided into different apertures or sets having different spacings and/or positionings. Furthermore, although Figures 2A-5 show microphone arrays on a top face of
the playback device and Figure 6 shows the microphone array spread across two faces of the playback device, in other examples the microphones may be wholly on a different face, such as all on the front portion or rear face, for example, or spread across multiple different faces. [0119] In Figures 2A to 6, the microphones 212, 312, 412, 512, 612 of the microphone arrays 210, 310, 410, 510, 610 are considered to be the same type of microphone. In other examples, one or more microphones of an array may be a different type of microphone to the other microphones in the array. The microphone arrays may therefore comprise, for example, one or more condenser microphones, one or more electret condenser microphones, one or more dynamic microphones, one or more MEMS microphones, and/or one or more microphones of another suitable type.
[0120] In the examples of Figures 2A-6, the playback devices take the form of a soundbar that is elongated along a horizontal axis and is configured to face along a primary sound axis that is substantially orthogonal to the horizontal axis. In other examples, the playback devices can assume other forms, for example having more or fewer transducers, having other form factors, or having any other suitable modifications with respect to the examples shown in Figures 2A-6. In various implementations, the playback devices can serve as a home theatre primary playback device, and may be placed in a center front position of a home theatre listening environment. In such a configuration, the playback devices can play back home theatre audio synchronously with playback via one or more satellite playback devices, which can be arranged about the listening environment in a suitable configuration.
[0121] In example implementations, various techniques described herein may be carried out with a playback device that includes multiple audio transducers, and may optionally be used as a multichannel satellite playback device for home theatre applications. Although the techniques above are described in relation to playback devices, the microphone arrays may be incorporated into other NMDs that lack transducers.
IV. Example Media Playback System and Arrangements of Playback and Network Microphone Devices
[0122] If a system includes two or more devices each with their own microphone(s), combinations of microphones across two or more devices may be used to form new subarrays or subsets and/or to form a single ‘super’ array comprising all of the microphones across the devices. Relative positions of the microphones may be known, for example using range-finding techniques. The devices may have synchronized clocks and/or a common time reference. As discussed above, symmetric positioning is not essential, so such methods using microphones
from two or more devices as an array can be used whether or not the devices are symmetrically positioned. The effects of asymmetry in the microphones of one device may be enhanced by including additional microphones at another device.
[0123] Figure 7 shows a first media playback system 700, which combines two playback devices, one having an asymmetric microphone array and the other a symmetric microphone array. The first media playback system comprises the fifth playback device 600 as shown and described in Figures 6A to 6C, and at least a further, sixth playback device 702. The media playback system 700 may include further playback devices, although for clarity these are not shown in Figure 7. The fifth playback device 600, as described above, has a plurality of transducers 614 (not visible in Figure 7) and an asymmetric microphone array 610 comprising four microphones 612a-d arranged into two sets or subarrays 620, 622. Although the outer grille or portion 630 of the fifth playback device 600 is shown in Figure 7, the locations of the microphones 612a-d are indicated with dots and labelled with the appropriate reference numeral. As in Figures 6A-6C, the fifth playback device 600 is shown in a perspective view such that its front and upper portions 604, 602 are visible.
[0124] The sixth playback device 702 has at least one transducer (not shown). The sixth playback device 702 also includes a plurality of microphones, whose positions are indicated with dots and labelled with reference numerals 706a-c (referred to hereafter as “the microphones 706a-c”). The sixth playback device 702 comprises three microphones 706a-c that are provided on an upper portion 712 of the device 702 and are arranged in a microphone array 708. The microphone array 708 is a symmetric microphone array, and is symmetric about a center line 710 of the sixth playback device 702. In Figure 7, the sixth playback device 702 is depicted so that a rear portion 716 and the upper portion 712 are visible. A front portion of the sixth playback device 702 is not visible, as it is arranged to face the fifth playback device 600, so that audio output from the transducers of the sixth playback device 702 is directed substantially towards the fifth playback device 600.
[0125] The fifth and sixth playback devices 600, 702 are positioned in a listening environment relative to a listener 714. The fifth playback device 600 and the sixth playback device 702 are oriented towards the listener 714.
[0126] To enable the fifth and sixth playback devices 600, 702 to operate in tandem, a distance between the fifth and sixth playback device 600, 702 may be determined or known and the devices 600, 702 may be time-synchronized. The devices may be time-synchronized by sharing a synchronized clock system or by being part of a time-synchronized network.
Timing synchronization may enable a more accurate determination of measurements of signals received by one or both devices, because a shared time reference is utilized. Details regarding synchronization among playback devices can be found, for example, in U.S. Patent No. 8,234,395 entitled, “System and method for synchronizing operations among a plurality of independently clocked digital data processing devices,” which is incorporated herein by reference in its entirety.
[0127] A distance d7 between the fifth and sixth playback devices 600, 702 may be determined using audio transmission between the devices, using another short-range distance determination method, or by receiving input from a user indicating said distance. The distance d7 may be a distance between the centers of each device or between other arbitrary points, and a plurality of distances may subsequently be determined between respective pairs of microphones of the devices using the distance d7 and known parameters of the playback devices 600, 702, such as the known distances d1-d3 between the microphones 612a-d. The use of audio transmission to determine a distance between devices may be more accurate than in other systems because of the presence of the asymmetric microphone array at the fifth playback device 600.
[0128] To determine the distance d7, the sixth playback device 702 may be configured to output one or more signals via its at least one transducer. The signal(s), which may, for example, comprise a sweep signal or a series of tones, may be received by the microphones 612a-d of the microphone array 610. The signal(s) received at the microphone array 610 may be processed to determine the distance d7 between the fifth and sixth playback devices 600, 702. Alternatively, or additionally, the fifth playback device 600 may also be configured to output one or more signals via its transducers 614 for receipt by the sixth playback device 702 via its symmetrical microphone array 708. The sixth playback device 702 may receive the signal(s) and processing may be performed to determine the distance d7 between the two devices 600, 702. Because the devices 600, 702 are time-synchronized, the precise time at which the signal is output and received may be determined and compared to determine said distance d7. The processing described herein may be performed at one or each playback device 600, 702 or at a separate control device, such as control device 130a, or at a remote server or computing device. Distances may be determined based on ultrasonic signals.
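The time-synchronized distance determination described above can be sketched as follows. With a shared time reference, the emitting device records when the probe signal is output, and the receiving device locates the probe within its capture by cross-correlation; the one-way flight time then gives the distance. The probe signal, 48 kHz sample rate, and 420-sample delay below are hypothetical values for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed)

def distance_from_time_of_flight(t_emit, t_receive):
    """With a shared clock, inter-device distance is the one-way
    propagation delay multiplied by the speed of sound."""
    return SPEED_OF_SOUND * (t_receive - t_emit)

def arrival_time(emitted, received, fs, t_emit=0.0):
    """Locate the emitted probe inside the received capture by
    cross-correlation and convert the best lag to an arrival time."""
    corr = np.correlate(received, emitted, mode="valid")
    lag = int(np.argmax(corr))
    return t_emit + lag / fs

# Hypothetical capture: the probe arrives 420 samples after emission
fs = 48000
rng = np.random.default_rng(0)
probe = rng.standard_normal(256)
received = np.concatenate([np.zeros(420), probe, np.zeros(100)])
d7 = distance_from_time_of_flight(0.0, arrival_time(probe, received, fs))
# 420 samples at 48 kHz is 8.75 ms, i.e. roughly 3 m
```

In practice the received signal would also contain noise and reverberation, which is one reason a wide, asymmetric array (multiple independent receivers) can make this estimate more robust.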
[0129] In addition to determining distances between devices, relative height and angles between the devices 600, 702 may be determined using the asymmetric array. Specifically, the asymmetric microphone array 610 may enable an elevation angle and an azimuthal angle of a
sound source relative to the playback device to be determined. An elevation angle 722 and/or an azimuthal angle 724 of the sixth playback device 702 relative to the fifth playback device 600 may be determined based on signals output by the sixth playback device 702 and received by the fifth playback device 600. Details regarding spatial mapping of devices, direction and angle determination may be found in U.S. Patent Applications 14/871,494 and 15/229,855, which are incorporated herein above.
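The combined elevation-and-azimuth estimation described above can be sketched under an idealizing assumption: one microphone baseline lies along the horizontal axis (e.g., the widely spaced upper pair) and a second, effectively vertical baseline is formed across the upper and front portions. All baseline lengths and angles below are hypothetical.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)

def az_el_from_orthogonal_pairs(dt_x, d_x, dt_z, d_z):
    """Far-field azimuth and elevation (degrees) from the TDOAs of
    two orthogonal microphone pairs: one along the horizontal x axis
    and one along the vertical z axis. The direction cosine along a
    baseline of length d with time difference dt is c*dt/d."""
    ux = np.clip(C * dt_x / d_x, -1.0, 1.0)
    uz = np.clip(C * dt_z / d_z, -1.0, 1.0)
    # Remaining component assumed to point toward the listener (y > 0)
    uy = np.sqrt(max(0.0, 1.0 - ux * ux - uz * uz))
    azimuth = np.degrees(np.arctan2(ux, uy))
    elevation = np.degrees(np.arcsin(uz))
    return azimuth, elevation

# Synthesize exact TDOAs for a source at 30 deg azimuth, 20 deg
# elevation, then recover both angles.
az_t, el_t = np.radians(30.0), np.radians(20.0)
ux_t = np.cos(el_t) * np.sin(az_t)
uz_t = np.sin(el_t)
d_x, d_z = 0.8, 0.07  # hypothetical baseline lengths in meters
az, el = az_el_from_orthogonal_pairs(ux_t * d_x / C, d_x,
                                     uz_t * d_z / C, d_z)
```

A single horizontal pair alone would yield only the azimuthal cosine; it is the second, differently oriented baseline that makes the elevation component observable, which is the geometric point made in the paragraph above.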
[0130] Based on knowledge of the relative positions of the devices, as well as the time synchronization between the devices, the playback devices 600, 702 may be used together to provide, e.g., improved determination of a location of the listener 714, improved beamforming for receiving utterances from the listener 714, enhanced noise reduction, and/or improved characterization of the listening environment. The playback devices 600, 702 may be used together by using their microphone arrays 610, 708 as a combined array, or so that combinations of the microphones 612a-d, 706a-c of the microphone arrays 610, 708 can be used to form further sets of microphones to provide specific functions or benefits. The formation of further sets of microphones is possible where at least two playback devices are provided with microphone arrays, comprising either a combination of asymmetric and symmetric arrays (as described here in relation to Figure 7) or two asymmetric arrays (as will be described in relation to Figure 8 below).
[0131] As an example, the listener 714 may utter one or more voice commands 726. The voice command 726 may be received by one or both microphone arrays 610, 708. A location of the listener 714 may be determined based on the one or more voice commands 726. A distance between the listener 714 and each of the playback devices 600, 702 may be determined based on the voice commands 726 and the distance d7 and/or one or more of the angles 722, 724. An angle or direction of the listener 714 relative to one or both playback devices 600, 702 may also be determined based on the voice commands 726. Such determinations may be made using the first set 620 of microphones or the second set 622 of microphones, or a combination of the microphones from the arrays 610, 708 of the devices 600, 702.
[0132] The combination of an asymmetric array 610 with a symmetric microphone array 708 provides a combined array that is also asymmetric. The combination overall may be asymmetric although in some examples the sets 620, 622 of microphones forming the array 610 may be used in combination with the entire symmetric array 708 to form a further array or set that may be symmetric. Accordingly, the benefits of symmetric and asymmetric arrays may be enhanced by using a combination of a symmetric and asymmetric array.
[0133] While Figure 7 shows a playback system 700 having a device 600 with an asymmetric microphone array 610 and a device 702 with a symmetric microphone array 708, in some examples, two asymmetric microphone arrays may be provided at two separate devices. A combination of two asymmetric microphone arrays may provide further improvements when they are used in combination.
[0134] An example of a media playback system 800 including two asymmetric microphone arrays is shown in Figure 8. The second media playback system 800 includes two fifth playback devices 600L, 600R. The media playback system 800 may include further playback devices, although for clarity these are not shown in Figure 8. To distinguish between them, the playback devices 600L, 600R are referred to as “the left fifth playback device 600L” and “the right fifth playback device 600R”, as shown in Figure 8. Hereafter, features of the left fifth playback device 600L have a reference numeral ending with ‘L’ and features of the right fifth playback device 600R have a reference numeral ending with ‘R’ to enable them to be referred to individually.
[0135] Each of the left and right fifth playback devices 600L, 600R has a plurality of transducers (not shown) and a respective asymmetric microphone array 610L, 610R comprising four microphones 612a-dL, 612a-dR arranged into a first set or subarray 620L, 620R, and a second set or subarray 622L, 622R. Although the outer grilles or portions 630L, 630R of the fifth playback devices 600L, 600R are shown in Figure 8, the locations of the microphones 612a-dL, 612a-dR are indicated with dots and labelled with the appropriate reference numerals. As in Figures 6A-6C, the fifth playback devices 600L, 600R are shown in a perspective view such that their front portions 604L, 604R and upper portions 602L, 602R are visible.
[0136] The fifth playback devices 600L, 600R are positioned in a listening environment relative to a listener 814. The fifth playback devices 600L, 600R are oriented towards the listener 814. Both fifth playback devices 600L, 600R are positioned to be on the same side of the listener 814 although in other examples the devices 600L, 600R may be to opposite sides of the listener 814 or positioned in other ways relative to the listener 814.
[0137] Two of the same fifth playback devices 600L, 600R are depicted in Figure 8, each having an asymmetric microphone array 610L, 610R having the same arrangement. In other examples, the two or more playback devices may be different, or may be the same but have different arrangements of microphones to form asymmetric microphone arrays. In a majority of the possible arrangements of microphone arrays in two or more playback devices, the arrays
lack symmetry with one another. In some examples, the asymmetric microphone arrays may be arranged so that the asymmetric arrays exhibit individual asymmetry but collective symmetry. An example of this may be having two playback devices whose microphone arrays are mirror images of each other.
[0138] As described above in relation to Figure 7, the two playback devices 600L, 600R in the second media playback system 800 of Figure 8 may be time-synchronized and a distance between them may be known or may have been determined. The media playback system 800 may be configured to determine distances and/or angles between the devices 600L, 600R. Distances and/or angles between the devices and a listener 814 may also be determined.
[0139] In addition, and as also indicated above, while each individual playback device 600L, 600R comprises a first set 620L, 620R and a second set 622L, 622R of microphones, both microphone arrays 610L, 610R may be combined into one combined microphone array or subdivided into further sets. For example, a further set may be generated by combining the two first sets 620L, 620R, which may be symmetric about a position between the playback devices. This further set may be subdivided into two further subsets comprising the outermost microphones 612aL, 612dR, which are separated by a distance d8, and two inner microphones 612dL, 612aR, which are separated by a distance d9. Other combinations of microphones may be determined along with the distances therebetween, such as distance d10 between the fourth microphone 612dL of the left fifth playback device 600L and the second microphone 612bR of the right fifth playback device 600R. The distances d8, d9, d10, or any other distance between any two microphones of the playback devices 600L, 600R may be determined based on determining a distance and angle between the devices and known distances of the individual microphones along a length of the device.
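The derivation of cross-device microphone distances such as d8, d9, and d10 from the inter-device distance and angle plus each device's known local microphone offsets can be sketched as a rigid-body transform followed by pairwise distances. The coordinates below are hypothetical; the 0.8 m outer spacing only loosely echoes the wide set described earlier.

```python
import numpy as np

def cross_device_mic_distances(offset, yaw_deg, mics_a, mics_b):
    """Pairwise distances between microphones of device A and device B.
    mics_a / mics_b: (n, 3) microphone positions in each device's local
    frame; device B is rotated by yaw_deg about the vertical axis and
    translated by `offset` relative to device A's frame."""
    yaw = np.radians(yaw_deg)
    rot = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
                    [np.sin(yaw),  np.cos(yaw), 0.0],
                    [0.0,          0.0,         1.0]])
    b_world = mics_b @ rot.T + np.asarray(offset, dtype=float)
    diff = mics_a[:, None, :] - b_world[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# Two identical devices side by side, 1.5 m apart, facing the same way
wide_set = np.array([[-0.4, 0.0, 0.0], [0.4, 0.0, 0.0]])
dists = cross_device_mic_distances((1.5, 0.0, 0.0), 0.0,
                                   wide_set, wide_set)
d8 = dists[0, 1]  # outermost microphones: 2.3 m apart here
d9 = dists[1, 0]  # innermost microphones: 0.7 m apart here
```

The outermost pair's separation exceeds anything achievable within a single device, which is the source of the improved angular and height resolution claimed for combined arrays.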
[0140] Providing two playback devices 600L, 600R having asymmetric microphone arrays may therefore provide combinations of microphones that have even greater distances therebetween, allowing even greater height estimation or angular resolution, as well as other benefits.
[0141] Further benefits may be achieved by combining multiple asymmetric microphone arrays with a symmetric microphone array. Particularly, as shown in Figure 9, a third media playback system 900 including at least three playback devices, two of which have an asymmetric microphone array and one of which has a symmetric microphone array, may provide the benefits associated with both the first and second media playback systems 700, 800.
[0142] As can be seen in Figure 9, the third media playback system 900, in this example, includes the two fifth playback devices 600L, 600R as shown in Figure 8 and the sixth playback device 702 shown in Figure 7. In addition to the benefits and features of combining two or more microphone arrays described in relation to Figures 7 and 8 above, combinations of three or more playback devices may enable further techniques to be implemented. For example, as illustrated in Figure 9, relative locations of each of the playback devices 600L, 600R, 702 may be determined, and a listening area 902 therebetween may be defined. The listening environment in the listening area 902 may be more precisely defined and characterized by combining the effects achieved with the asymmetric and symmetric arrays individually, such that audio output by the playback devices 600L, 600R, 702 can be tuned or tailored to the listening environment or area 902. Sound sources, such as voice commands from a listener 904 within the listening area 902, may be triangulated within the listening area.
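The triangulation of a sound source within the listening area, given ranges from the source to each of three devices at known positions, can be sketched with a standard linearized least-squares trilateration. The device coordinates and source position below are hypothetical.

```python
import numpy as np

def trilaterate_2d(anchors, distances):
    """Least-squares 2D position from distances to three or more known
    anchor points (e.g., three playback devices). The squared range
    equations |x - p_i|^2 = d_i^2 are linearized against anchor 0:
        2*(p_i - p0) . x = d0^2 - d_i^2 + |p_i|^2 - |p0|^2"""
    anchors = np.asarray(anchors, dtype=float)
    d = np.asarray(distances, dtype=float)
    p0, d0 = anchors[0], d[0]
    A = 2.0 * (anchors[1:] - p0)
    b = (d0**2 - d[1:]**2
         + np.sum(anchors[1:]**2, axis=1) - np.sum(p0**2))
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

# Three devices around a listening area and a talker between them
devices = np.array([[0.0, 0.0], [3.0, 0.0], [1.5, 2.5]])
source = np.array([1.2, 1.0])
ranges = np.linalg.norm(devices - source, axis=1)
estimate = trilaterate_2d(devices, ranges)
```

With only two devices the intersection of two range circles is generally ambiguous; the third device resolves the ambiguity, which is one concrete benefit of the three-device arrangement of Figure 9.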
[0143] To improve characterization of a listening environment, or to improve sound reproduction for a listener location, such as reproduction of height audio, it may be useful to utilize a device that can be moved around the listening environment to specific locations. Figure 10 illustrates a fourth media playback system 1000 that may be used to achieve this, through the use of a portable playback device 1002. The fourth media playback system includes a fifth playback device 600 and the portable playback device 1002. The portable playback device 1002 comprises at least one transducer, not visible in Figure 10. The portable playback device 1002 may also include one or more microphones. The microphones may be arranged in a symmetric or asymmetric array or as an individual microphone.
[0144] The portable playback device 1002 and the fifth playback device 600 may be time-synchronized. The devices 1002, 600 may be time-synchronized using time-synchronization software. Time synchronization may be useful in ensuring precise and accurate determination of times of arrival of different reflections and audio signals and/or when using beamforming techniques for sets of microphones across more than one device. Details regarding synchronization among playback devices can be found, for example, in U.S. Patent No. 8,234,395 entitled, “System and method for synchronizing operations among a plurality of independently clocked digital data processing devices,” which is incorporated herein above.
[0145] The portable playback device 1002 may be configured to output one or more audio signals via its at least one transducer from a position within the listening environment. The one or more audio signals may be received at the asymmetric microphone array 610 of the fifth playback device 600. Based on the audio signals, processing may be performed to determine a
position of the portable playback device 1002 within the listening environment and/or to characterize the listening environment. A listener 1014 may then be prompted to move the portable playback device to a different location within the listening environment and the process of outputting, receiving, and processing audio signals may be performed again. In some embodiments, the listener 1014 may be prompted to move the portable playback device 1002 around the listening environment while it is outputting the one or more audio signals.
[0146] In some embodiments, the asymmetric microphone array is incorporated into a portable playback device. One or more other playback devices of a media playback system may be configured to output audio signals for receipt by the asymmetric microphone array as it is moved around the room. The features of the above embodiments can also be combined, for example the time synchronization between devices discussed above for Figure 10 can be used in any of the embodiments discussed herein.
V. Conclusion
[0147] The above discussions relating to playback devices, controller devices, play back zone configurations, and media content sources provide only some examples of operating environments within which functions and methods described below may be implemented. Other operating environments and/or configurations of media playback systems, playback devices, and network devices not explicitly described herein may also be applicable and suitable for implementation of the functions and methods.
[0148] The description above discloses, among other things, various example systems, methods, apparatus, and articles of manufacture including, among other components, firmware and/or software executed on hardware. It is understood that such examples are merely illustrative and should not be considered as limiting. For example, it is contemplated that any or all of the firmware, hardware, and/or software examples or components can be embodied exclusively in hardware, exclusively in software, exclusively in firmware, or in any combination of hardware, software, and/or firmware. Accordingly, the examples provided are not the only ways to implement such systems, methods, apparatus, and/or articles of manufacture.
[0149] Additionally, references herein to “example” mean that a particular feature, structure, or characteristic described in connection with the example can be included in at least one example of an invention. The appearances of this phrase in various places in the specification are not necessarily all referring to the same example, nor are separate or alternative examples mutually exclusive of other examples. As such, the examples described
herein, explicitly and implicitly understood by one skilled in the art, can be combined with other examples.
[0150] The specification is presented largely in terms of illustrative environments, systems, procedures, steps, logic blocks, processing, and other symbolic representations that directly or indirectly resemble the operations of data processing devices coupled to networks. These process descriptions and representations are typically used by those skilled in the art to most effectively convey the substance of their work to others skilled in the art. Numerous specific details are set forth to provide a thorough understanding of the present disclosure. However, it is understood by those skilled in the art that certain examples of the present disclosure can be practiced without certain, specific details. In other instances, well known methods, procedures, components, and circuitry have not been described in detail to avoid unnecessarily obscuring aspects of the examples. Accordingly, the scope of the present disclosure is defined by the appended claims rather than the foregoing description of examples.
[0151] When any of the appended claims are read to cover a purely software and/or firmware implementation, at least one of the elements in at least one example is hereby expressly defined to include a tangible, non-transitory medium such as a memory, DVD, CD, Blu-ray, and so on, storing the software and/or firmware.
Claims
1. A playback device comprising: at least one audio transducer; and an asymmetrical microphone array comprising at least four microphones.
2. The playback device of claim 1, wherein the microphone array is asymmetric about a center point of the playback device.
3. The playback device of claim 1 or 2, wherein a majority of the at least four microphones of the microphone array are to one side of a center point of the playback device.
4. The playback device of claim 1, 2 or 3, wherein the asymmetrical microphone array comprises: a first set of two or more microphones, wherein the microphones of the first set are separated by a first spacing; and a second set of two or more microphones, wherein the microphones of the second set are separated by a second spacing that is smaller than the first spacing.
5. The playback device of claim 4, wherein the second set of two or more microphones are to one side of a center point of the playback device.
6. The playback device of claim 4 or 5, wherein the first set of two or more microphones is symmetrical about a center point of the playback device.
7. The playback device of claim 4, 5 or 6, wherein the first set of two or more microphones comprises a first microphone and a second microphone that are separated by the first spacing, and wherein the first microphone is separated from the second set of two or more microphones by a third spacing, and wherein the second microphone is separated from the second set of two or more microphones by a fourth spacing that is different to the third spacing.
8. The playback device of any of claims 4 to 7, wherein the first spacing is at least 5 times greater than the second spacing.
9. The playback device of any of claims 4 to 8, wherein the first set of two or more microphones are provided on a first face of the playback device and the second set of two or more microphones are provided on a second face of the playback device that is different to the first face.
10. The playback device of any of claims 4 to 9, wherein the first set of two or more microphones are upward-facing, in use.
11. The playback device of any of claims 4 to 10, wherein the second set of two or more microphones are forward-facing, in use.
12. The playback device of any of claims 4 to 11, wherein a first audio transducer is positioned opposite to the second set of two or more microphones about a center point of the playback device.
13. The playback device of claim 12, wherein the second set of microphones are provided on a same face of the playback device as the first audio transducer.
14. The playback device of claim 12 or 13, wherein the first audio transducer is a low-frequency audio transducer.
15. The playback device of any of claims 4 to 14, comprising a plurality of audio transducers, and wherein the second set of microphones is separated from any of the plurality of audio transducers by at least the second spacing.
16. The playback device of any of claims 4 to 15, wherein the first set of two or more microphones forms a first sub-array having a first aperture and the second set of two or more microphones forms a second sub-array having a second aperture that is smaller than the first aperture.
17. The playback device of any of claims 4 to 16, wherein the second set of two or more microphones is positioned between at least two microphones of the first set of two or more microphones.
18. The playback device of any preceding claim, wherein the microphones of the microphone array are at least one of: distributed over at least two different faces of the playback device; or distributed over a curved surface.
19. A network microphone device comprising an asymmetrical microphone array comprising at least four microphones, the asymmetrical microphone array including: a first set of two or more microphones, wherein the microphones of the first set are separated by a first spacing; and a second set of two or more microphones, wherein the microphones of the second set are separated by a second spacing that is smaller than the first spacing.
20. A media playback system comprising: a first playback device according to any one of claims 1 to 18; and a second playback device comprising at least one audio transducer.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363517820P | 2023-08-04 | 2023-08-04 | |
| US63/517,820 | 2023-08-04 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025034441A1 (en) | 2025-02-13 |
Family
ID=92408994
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/040037 (pending) | Asymmetrical microphone arrays | 2023-08-04 | 2024-07-29 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025034441A1 (en) |
Citations (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20090226004A1 (en) * | 2004-01-29 | 2009-09-10 | Soerensen Ole Moeller | Microphone aperture |
| US8234395B2 (en) | 2003-07-28 | 2012-07-31 | Sonos, Inc. | System and method for synchronizing operations among a plurality of independently clocked digital data processing devices |
| US9743204B1 (en) | 2016-09-30 | 2017-08-22 | Sonos, Inc. | Multi-orientation playback device microphones |
| US9949054B2 (en) | 2015-09-30 | 2018-04-17 | Sonos, Inc. | Spatial mapping of audio playback devices in a listening environment |
| US20180109873A1 (en) * | 2015-05-04 | 2018-04-19 | Rensselaer Polytechnic Institute | Coprime microphone array system |
| US9963164B2 (en) | 2015-02-27 | 2018-05-08 | Yamada Manufacturing Co., Ltd. | Steering device |
| US20190132672A1 (en) * | 2017-10-31 | 2019-05-02 | Bose Corporation | Asymmetric microphone array for speaker system |
| US10847178B2 (en) | 2018-05-18 | 2020-11-24 | Sonos, Inc. | Linear filtering for noise-suppressed speech detection |
| US20230047408A1 (en) * | 2021-08-06 | 2023-02-16 | Qsc, Llc | Acoustic microphone arrays |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24755454; Country of ref document: EP; Kind code of ref document: A1 |
| | DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101) | |