WO2012150731A1 - Object control using heterogeneous input method - Google Patents
- Publication number
- WO2012150731A1 (PCT/KR2011/003363)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- characteristic information
- image
- display apparatus
- controller
- image display
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/03—Arrangements for converting the position or the displacement of a member into a coded form
- G06F3/033—Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
- G06F3/038—Control and interface arrangements therefor, e.g. drivers or device-embedded control circuitry
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/42203—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/42204—User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/42204—User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
- H04N21/42206—User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor characterized by hardware details
- H04N21/42222—Additional components integrated in the remote control device, e.g. timer, speaker, sensors for detecting position, direction or movement of the remote control, microphone or battery charging device
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/4223—Cameras
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/440236—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/442—Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
- H04N21/44213—Monitoring of end-user related data
- H04N21/44218—Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
- H04N21/47217—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
- H04N21/4728—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/482—End-user interface for program selection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/485—End-user interface for client configuration
- H04N21/4852—End-user interface for client configuration for modifying audio parameters, e.g. switching between mono and stereo
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2203/00—Indexing scheme relating to G06F3/00 - G06F3/048
- G06F2203/038—Indexing scheme relating to G06F3/038
- G06F2203/0381—Multimodal input, i.e. interface arrangements enabling the user to issue commands by simultaneous use of input devices of different nature, e.g. voice plus gesture on digitizer
Definitions
- the present disclosure relates to an electronic device, and more particularly, to a method of controlling an object in an image display apparatus and the image display apparatus using the same.
- An image display apparatus is an apparatus having a function of displaying an image that can be viewed by a user.
- the user can view broadcasts through the image display apparatus.
- the image display apparatus displays, on the display unit, a broadcast selected by the user from among broadcast signals transmitted from broadcast stations.
- broadcast services are in the midst of transitioning from analog broadcast to digital broadcast.
- digital broadcast refers to broadcast that transmits digital image and audio signals.
- compared to analog broadcast, digital broadcast is robust to external noise, resulting in low data loss, is advantageous for error correction, and provides high-resolution, clear images.
- An aspect of the present disclosure is to provide a method of controlling an object, allowing the user to control an object in a more convenient manner using a voice and a gesture, and an image display apparatus using the same.
- an image display apparatus may include a microphone configured to acquire a voice; a camera configured to acquire an image; and a controller configured to extract first characteristic information from the voice, designate an interest object based on the first characteristic information, extract second characteristic information from the image based on the interest object, and reflect the second characteristic information on the interest object.
- the controller may extract the second characteristic information from the image without displaying an interface for the interest object.
- the second characteristic information may include the depth information of a detection target.
- the controller may determine a level of the interest object based on the depth information.
- the second characteristic information may include the shape information of a detection target.
- the controller may extract the second characteristic information from a region excluding a display region of the image.
- the interest object may be an application being executed in the background.
- the controller may reflect the second characteristic information on the interest object based on at least one of an acquisition time and an acquisition order of the voice and image, respectively.
- an image display apparatus may include a microphone configured to acquire a voice; a camera configured to acquire an image; and a controller configured to extract first and second characteristic information from the voice, designate a first and a second interest object based on the first and the second characteristic information, respectively, extract third and fourth characteristic information from the image based on the first and the second interest object, respectively, and reflect the third and the fourth characteristic information on the first and the second interest object, respectively.
- the controller may extract the third and the fourth characteristic information from the image without displaying an interface for the first and the second interest object, respectively.
- the controller may sequentially reflect the third and the fourth characteristic information on the first and the second interest object, respectively.
- the controller may sequentially reflect the third and the fourth characteristic information on the first and the second interest object, respectively, upon receiving an execution command.
- the controller may store the third and the fourth characteristic information.
- the controller may simultaneously reflect the third and the fourth characteristic information on the first and the second interest object, respectively.
- the first and the second interest object may be applications being executed at the same time, respectively.
- a method of controlling an object in an image display apparatus may include acquiring a voice and an image; extracting first characteristic information from the voice; designating an interest object based on the first characteristic information; extracting second characteristic information from the image based on the interest object; and reflecting the second characteristic information on the interest object.
- said extraction step may extract the second characteristic information from the image without displaying an interface for the interest object.
- the second characteristic information may include the depth information of a detection target.
- said reflection step may determine a level of the interest object based on the depth information.
- the second characteristic information may include the shape information of a detection target.
- an object may be controlled by using a voice and an image, thereby providing an intuitive and convenient user interface environment.
- the user may select a menu using a voice and control the selected menu using a gesture, thereby maximizing the use of a menu provided in an application being executed.
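- as a minimal illustration of the flow just summarized, the Python sketch below outlines the voice-then-gesture pipeline; all function and object names are illustrative assumptions rather than part of the disclosure, and the step labels (S120-S150) refer to the flow chart of FIG. 6 described later.

```python
# Minimal sketch of the disclosed voice-then-gesture control flow.
# All function and object names are illustrative assumptions; step labels
# (S120-S150) refer to the flow chart of FIG. 6, described later.

def recognize_voice(audio) -> str:
    """S120: extract first characteristic information (e.g. a text command)."""
    raise NotImplementedError  # speech recognizer stub

def designate_interest_object(command: str):
    """S130: designate the object the utterance refers to (e.g. a volume menu)."""
    raise NotImplementedError

def extract_gesture(frames, interest_object):
    """S140: extract second characteristic information from the captured image,
    e.g. a Y-axis coordinate change or depth change of the detection target."""
    raise NotImplementedError

def control_object(audio, frames):
    command = recognize_voice(audio)                        # voice from the microphone
    interest_object = designate_interest_object(command)
    if interest_object is not None:
        gesture = extract_gesture(frames, interest_object)  # image from the camera
        interest_object.reflect(gesture)                    # S150: apply to the object
```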
- FIG. 1 is a view schematically illustrating an example of an entire broadcast system including an image display apparatus according to an embodiment disclosed herein;
- FIG. 2 is a view specifically illustrating an example of an image display apparatus illustrated in FIG. 1;
- FIGS. 3 and 4 are views illustrating that any one of the image display apparatuses according to the embodiments disclosed herein is separated into a set-top box and a display device;
- FIG. 5 is a view specifically illustrating a camera unit in an image display apparatus according to an embodiment disclosed herein;
- FIG. 6 is a flow chart illustrating the process of controlling an object in an image display apparatus according to a first embodiment disclosed herein;
- FIG. 7 is an exemplary view illustrating the process of controlling an object using the depth information of a detection target according to a first embodiment disclosed herein;
- FIG. 8 is an exemplary view illustrating the process of controlling an object using the shape information of a detection target according to a first embodiment disclosed herein;
- FIG. 9 is an exemplary view illustrating the process of controlling an object in a region excluding a display region according to a first embodiment disclosed herein;
- FIG. 10 is an exemplary view illustrating the process of controlling an application being executed in the background according to a first embodiment disclosed herein;
- FIGS. 11 and 12 are views for explaining the process of controlling an object based on an acquisition time/acquisition order of the voice and image according to a first embodiment disclosed herein;
- FIG. 13 is a flow chart illustrating the process of controlling an object in an image display apparatus according to a second embodiment disclosed herein;
- FIGS. 14a and 14b are exemplary views illustrating the process of controlling a plurality of objects according to a second embodiment disclosed herein;
- FIGS. 15a and 15b are exemplary views illustrating the process of sequentially controlling a plurality of objects according to a second embodiment disclosed herein.
- FIG. 16 is an exemplary view illustrating the process of simultaneously controlling a plurality of objects according to a second embodiment disclosed herein.
- Multi-modal stands for multiple modalities, where a modality refers to an individual sense channel such as sight, hearing, touch, taste, smell, and the like. Exchanging information through these modalities in a collective manner is referred to as multi-modal interaction.
- a multi-modal system allows the user to access information, such as e-mail, weather updates, bank account transactions, and news, in the form of voice, data, video, audio, or other media, by one or more modalities through a user agent program such as a graphic browser or voice browser, and to receive the information by another modality.
- the user may submit an information fetch request by one or more modalities, such as speaking the request into a microphone, and then receive the fetched information by the same modality (i.e., voice) or by another modality (e.g., an image), for instance through a graphic browser that presents the returned information as visual information on a display screen.
- the user agent program may operate in a manner similar to a web browser or another suitable software program residing in a device connected to a network or in another terminal device.
- an image display apparatus disclosed herein is an intelligent image display apparatus in which a computer-supported function is added to a broadcast receiving function, for example.
- an Internet function or the like may be added thereto while remaining faithful to the broadcast receiving function, and thus the apparatus may include an easy-to-use interface such as a handwriting type input device, a touch screen, a space remote controller, or the like.
- with the support of a wired or wireless Internet function, it may be connected to the Internet and a computer, thereby allowing functions such as e-mail, web browsing, banking, games, or the like to be implemented.
- a standardized general-purpose OS may be used for such various functions.
- in the image display apparatus, for example, various applications can be freely added to or removed from a general-purpose OS kernel, thereby allowing various user-friendly functions to be carried out.
- the image display apparatus may be, for example, a network TV, an HbbTV, a smart TV, and the like, and may also be applicable to a smartphone according to circumstances.
- FIG. 1 is a view schematically illustrating an example of an overall broadcast system including an image display apparatus according to an embodiment of the present invention.
- an overall broadcast system including an image display apparatus according to an embodiment of the present invention may be divided into a Content Provider (CP) 10, a Service Provider (SP) 20, a Network Provider (NP) 30, and a Home Network End Device (HNED) 40.
- the HNED 40 corresponds to a client 100, which is an image display apparatus according to an embodiment of the present disclosure, for example.
- the client 100 falls under an image display apparatus according to an embodiment of the present disclosure, and for example, the image display apparatus may be network TV, smart TV, IPTV, and the like.
- the content provider 10 produces and provides various contents.
- examples of the content provider 10 include a terrestrial broadcaster, a cable System Operator (SO) or Multiple System Operator (MSO), a satellite broadcaster, an Internet broadcaster, and the like.
- the content provider 10 may provide various applications or the like in addition to broadcast contents, which will be described in detail below.
- the service provider 20 may provide contents that are provided by the content provider 10 in a service package form.
- the service provider 20 in FIG. 1 may package first terrestrial broadcast services, second terrestrial broadcast services, cable MSO, satellite broadcast services, various kinds of Internet broadcast services, applications and the like into a package to provide them to the user.
- the service provider 20 may provide services to the side of the client 100 using the unicast or multicast method.
- the unicast method is a process of transmitting data on a one-to-one basis between one transmitter and one receiver.
- in the unicast method, for example, when a receiver requests data from a server, the server transmits the requested data to that receiver.
- the multicast method is a process of transmitting data to a plurality of receivers in a specific group.
- in the multicast method, the server can transmit data at once to a plurality of previously registered receivers.
- the Internet Group Management Protocol (IGMP) or the like may be used for the multicast registration.
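- purely as an illustration of that registration step (not part of the disclosure), the Python sketch below joins a multicast group the way a client-side receiver might; the group address and port are assumed example values, and the kernel emits the IGMP membership report as a side effect of the `IP_ADD_MEMBERSHIP` socket option.

```python
import socket
import struct

GROUP = "239.1.1.1"   # assumed administratively scoped multicast group
PORT = 5004           # assumed port (RTP-style media delivery)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))

# Joining the group makes the kernel emit an IGMP membership report,
# registering this receiver with the multicast router.
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

data, addr = sock.recvfrom(1500)  # receive one transport packet from the group
```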
- the network provider 30 may provide a network for providing services to the client 100.
- the client 100 may establish a Home Network End Device (HNED) to receive services.
- as a means of protecting content transmitted in the foregoing image display apparatus system, conditional access, content protection, or the like may be used.
- as a scheme for such conditional access or content protection, CableCARD, DCAS (Downloadable Conditional Access System), or the like may be used.
- the client 100 may also provide content through a network.
- the client 100 may be a content provider, and the content provider 10 may receive content from the client 100.
- this has the advantage of enabling bi-directional content services or data services.
- FIG. 2 is a view more specifically illustrating another example of an image display apparatus illustrated in FIG. 1.
- the image display apparatus 100 may include a broadcast receiver 105, an external device interface unit 135, a storage unit 140, a user input interface unit 150, a controller 170, a display unit 180, an audio output unit 185, a power supply unit 190, and a camera unit (not shown).
- the broadcast receiver 105 may include a tuner 110, a demodulation unit 120, and a network interface unit 130.
- a design can be made to have the tuner 110 and the demodulation unit 120 without including the network interface unit 130, and on the contrary, a design can be made to have the network interface unit 130 without including the tuner 110 and the demodulation unit 120.
- the tuner 110 selects an RF broadcast signal corresponding to the channel selected by the user, or all prestored channels, from among the radio frequency (RF) broadcast signals received through an antenna. Furthermore, the tuner 110 transforms the selected RF broadcast signal into an intermediate frequency signal or a baseband video or audio signal.
- the selected RF broadcast signal may be transformed into a digital IF (DIF) signal if it is a digital broadcast signal, and may be transformed into an analog baseband video or audio signal (CVBS/SIF) if it is an analog broadcast signal.
- the tuner 110 can process both digital broadcast signals and analog broadcast signals.
- the analog baseband video or audio signal (CVBS/SIF) outputted from the tuner 110 may be directly input to the controller 170.
- the tuner 110 may receive RF broadcast signals with a single carrier according to the Advanced Television System Committee (ATSC) method, or RF broadcast signals with a plurality of carriers according to the Digital Video Broadcasting (DVB) method.
- the tuner 110 may sequentially select the RF broadcast signals of all broadcast channels stored through a channel storage function from among the RF broadcast signals received through the antenna, and transform them into intermediate frequency signals or baseband video or audio signals.
- the demodulation unit 120 receives a digital IF (DIF) signal that has been transformed by the tuner 110 to perform a demodulation operation.
- the demodulation unit 120 may perform 8-vestigial sideband (8-VSB) demodulation, for instance. Furthermore, the demodulation unit 120 may perform channel decoding.
- the demodulation unit 120 may include a Trellis decoder, a de-interleaver, a Reed-Solomon decoder, and the like, to perform Trellis decoding, de-interleaving, and Reed-Solomon decoding.
- the demodulation unit 120 may perform Coded Orthogonal Frequency Division Multiplexing (COFDM) demodulation, for instance. Furthermore, the demodulation unit 120 may perform channel decoding.
- the demodulation unit 120 may include a convolution decoder, a de-interleaver, a Reed-Solomon decoder and the like to perform convolution decoding, de-interleaving, and Reed-Solomon decoding.
- the demodulation unit 120 may perform demodulation and channel decoding and then output a stream signal (TS).
- the stream signal may be a signal in which video, audio, or data signals are multiplexed.
- for example, the stream signal may be an MPEG-2 Transport Stream (TS) in which an MPEG-2 video signal, a Dolby AC-3 audio signal, and the like are multiplexed.
- an MPEG-2 TS packet may include a 4-byte header and a 184-byte payload.
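- as a concrete illustration of that 188-byte packet layout (an assumption-level example, not part of the disclosure), the Python sketch below parses the 4-byte header; the bit positions follow the MPEG-2 systems standard (ISO/IEC 13818-1).

```python
def parse_ts_header(packet: bytes) -> dict:
    """Parse the 4-byte MPEG-2 TS header; the remaining 184 bytes are payload."""
    assert len(packet) == 188 and packet[0] == 0x47  # 0x47 is the TS sync byte
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return {
        "transport_error":    bool(b1 & 0x80),
        "payload_unit_start": bool(b1 & 0x40),
        "pid":                ((b1 & 0x1F) << 8) | b2,  # 13-bit packet identifier
        "scrambling":         (b3 >> 6) & 0x03,
        "adaptation_field":   (b3 >> 4) & 0x03,
        "continuity_counter": b3 & 0x0F,
        "payload":            packet[4:],               # 184-byte payload
    }
```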
- the foregoing demodulation unit 120 may be provided in a separate manner according to the ATSC method or DVB method. In other words, it can be provided with an ATSC demodulation unit and a DVB demodulation unit.
- the stream signal outputted from the demodulation unit 120 may be input to the controller 170.
- the controller 170 may perform inverse-multiplexing, video/audio signal processing and the like, and then output video to the display unit 180, and output audio to the audio output unit 185.
- the external device interface unit 135 may be provided to connect an external device with the image display apparatus 100.
- the external device interface unit 135 may include an A/V input and output unit (not shown) or wireless communication unit (not shown).
- the external device interface unit 135 may be connected to an external device such as a digital versatile disc (DVD), a Blu-ray disc, a gaming device, a camera, a camcorder, a computer (notebook) and the like in a wired/wireless manner.
- the external device interface unit 135 may transfer video, audio or data signals received from the outside through an external device connected thereto to the controller 170 of the image display apparatus 100.
- the external device interface unit 135 may output video, audio or data signals processed by the controller 170 to the external device connected thereto.
- the A/V input and output unit may include a USB terminal, a Composite Video Banking Sync (CVBS) terminal, a component terminal, an S-video terminal (analog), a Digital Visual Interface (DVI) terminal, a High Definition Multimedia Interface (HDMI) terminal, an RGB terminal, a D-SUB terminal, and the like.
- the wireless communication unit may perform short-range wireless communication with other electronic devices.
- the image display apparatus 100 may be connected to other electronic devices in a network according to a communication standard such as Bluetooth, Radio Frequency Identification (RFID), Infrared Data Association (IrDA), Ultra Wideband (UWB), ZigBee, Digital Living Network Alliance (DLNA), and the like.
- the external device interface unit 135 may be connected to at least one of various set-top boxes and the foregoing various terminals to perform an input and output operation with the set-top box.
- the external device interface unit 135 may receive an application or application list from an adjacent external device and transfer it to the controller 170 or the storage unit 140.
- the network interface unit 130 provides an interface for connecting the image display apparatus 100 to a wired/wireless network including the Internet network.
- the network interface unit 130 may include an Ethernet terminal or the like for connection with a wired network, and may use communication standards such as Wireless LAN (WLAN, Wi-Fi), Wireless Broadband (WiBro), World Interoperability for Microwave Access (WiMAX), and High Speed Downlink Packet Access (HSDPA) for connection with a wireless network.
- the network interface unit 130 may transmit or receive data to or from another user or another electronic device through a connected network or another network linked with the connected network.
- the network interface unit 130 may send part of the content data stored in the image display apparatus 100 to a user or electronic device selected from among other users or electronic devices registered in advance.
- the network interface unit 130 may be connected to a specific web page through the connected network or another network linked with the connected network. In other words, the network interface unit 130 may be connected to a specific web page through a network to send or receive data to or from the relevant server.
- the network interface unit 130 may receive content or data provided by the content provider or network operator. In other words, the network interface unit 130 may receive content and information related to the content such as a movie, an advertisement, a game, VOD, a broadcast signal and the like, provided from the content provider or network provider through a network.
- the network interface unit 130 may receive firmware update information or an update file provided by the network operator.
- the network interface unit 130 may send data to the Internet, content provider, or network operator.
- the network interface unit 130 may receive a desired application among the applications open to the public through a network.
- the storage unit 140 may store programs for each signal processing or control within the controller 170 and may store signal-processed video, audio, or data signals.
- the storage unit 140 may perform a function for temporarily storing video, audio, or data signals received from the external device interface unit 135 or network interface unit 130. Furthermore, the storage unit 140 may store information for a predetermined broadcast channel through a channel storage function.
- the storage unit 140 may store an application or application list received from the external device interface unit 135 or network interface unit 130.
- the storage unit 140 may store mapping data for the user's gesture with the operation of an image display apparatus or the operation on an application.
- the storage unit 140 may include at least one type of storage medium including a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (e.g., SD or XD memory, etc.), a Random Access Memory (RAM), a Read-Only Memory (EPROM, etc.), and the like.
- the image display apparatus 100 may reproduce a content file (a video file, a still image file, a music file, a document file, an application file, etc.) stored in the storage unit 140 and provide it to the user.
- FIG. 2 illustrates an embodiment in which the storage unit 140 is provided in a separate manner from the controller 170, but the scope of the present invention is not limited to this.
- the storage unit 140 may be included in the controller 170.
- the user input interface unit 150 may transfer the user's input signals to the controller 170 or transfer signals received from the controller 170 to the user.
- the user input interface unit 150 may receive and process control signals, such as power on/off, channel selection, screen setup, and the like, generated from the remote control device 200, or may transmit control signals generated from the controller 170 to the remote control device 200, according to various communication methods such as radio frequency (RF) communication, infrared (IR) communication, and the like.
- the user input interface unit 150 may transfer control signals received from a local key (not shown), such as a power key, a channel key, a volume key, a setting key and the like, to the controller 170.
- the user input interface unit 150 may transfer control signals received from a sensing unit (not shown) for sensing the user's gesture to the controller 170 or transmit signals generated from the controller 170 to the sensing unit (not shown).
- the sensing unit may include a touch sensor, a voice sensor, a location sensor, a motion sensor, and the like.
- the controller 170 may inverse-multiplex a stream received from the tuner 110, the demodulation unit 120, or the external device interface unit 135, and process the inverse-multiplexed signals to generate and output signals for video or audio output.
- the video signal that has been image-processed in the controller 170 may be input to the display unit 180 and displayed as video corresponding to the relevant video signal. Furthermore, the video signal that has been image-processed in the controller 170 may be input to an external output device through the external device interface unit 135.
- the audio signal processed in the controller 170 may be audio-outputted to the audio output unit 185. Furthermore, the audio signal processed in the controller 170 may be input to an external output device through the external device interface unit 135.
- the controller 170 may include an inverse-multiplexing unit, a video processing unit and the like.
- the controller 170 may control the overall operation of the image display apparatus 100.
- for example, the controller 170 may control the tuner 110 to tune an RF broadcast signal corresponding to the channel selected by the user or a prestored channel.
- the controller 170 may control the image display apparatus 100 by a user command received through the user input interface unit 150 or by an internal program.
- in particular, the apparatus may be connected to a network, thereby allowing an application or application list desired by the user to be downloaded into the image display apparatus 100.
- the controller 170 may control the tuner 110 to receive a signal of the tuned channel according to a predetermined channel select command received through the user input interface unit 150. Then, the controller 170 processes video, audio or data signals of the tuned channel. The controller 170 may allow the user's tuned channel information or the like to be outputted through the display unit 180 or the audio output unit 185 together with the processed video or audio signal.
- the controller 170 may allow video or audio signals generated from an external device, for example, a camera or camcorder, received through the external device interface unit 135, to be outputted through the display unit 180 or the audio output unit 185 according to an external device video play command received through the user input interface unit 150.
- the controller 170 may control the display unit 180 to display an image.
- the controller 170 may control a broadcast image received through the tuner 110, an external input image received through the external device interface unit 135, an image received through a network interface unit, or an image stored in the storage unit 140, to be displayed on the display unit 180.
- the image displayed on the display unit 180 may be a still or moving image, and may be a 2D or 3D image.
- the controller 170 may control content to be reproduced.
- the content at this time may be content stored within the image display apparatus 100, received broadcast content, or external input content received from the outside.
- the content may be at least one of a broadcast image, an external input image, an audio file, a still image, a connected web screen, and a document file.
- the controller 170 may control to allow a home screen to be displayed on the display unit 180 according to a movement input to the home screen.
- the home screen may have a plurality of card objects classified by content sources.
- the card object may include at least one of a card object indicating a thumbnail list of broadcast channels, a card object indicating a broadcast guide list, a card object indicating a broadcast reservation list or recording list, a card object indicating a media list within the image display apparatus or within a device connected to the image display apparatus. Furthermore, it may further include at least one of a card object indicating a connected external device list and a card object indicating a phone call related list.
- the home screen may further include an application menu having at least one executable application item.
- the controller 170 may control the relevant card object to be moved and displayed, or a card object that has not been displayed on the display unit 180 to be moved and displayed on the display unit 180.
- the controller 170 may control an image corresponding to the relevant card object to be displayed on the display unit 180.
- the controller 170 may control a received broadcast image and an object indicating the relevant broadcast image related information to be displayed within a card object displaying the broadcast image. Furthermore, the controller 170 may control the size of the broadcast image to be fixed by a lockout setup.
- the controller 170 may control a setting object for at least one of an image setup, a screen setup, a reservation setup within the image display apparatus, a pointer setup in the remote control device, and a network setup, to be displayed in the home screen.
- the controller 170 may control an object for a login, help, or logout item to be displayed in a region of the home screen.
- the controller 170 may control an object indicating the number of all card objects, or the number of card objects displayed on the display unit 180 among them, to be displayed in a region of the home screen.
- the controller 170 may control the relevant card object to be displayed on the display unit 180 as a whole screen.
- the controller 170 may control a call related card object to be focused and displayed among a plurality of card objects or a call related card object to be moved and displayed into the display unit 180.
- the controller 170 may control an application or application list within the image display apparatus 100, or an application or application list downloadable from an external network, to be displayed.
- the controller 170 may control an application downloaded from an external network to be installed and driven, in addition to various user interfaces. Furthermore, the controller 170 may control an image related to an application being executed to be displayed on the display unit 180 by the user's selection.
- the image display apparatus 100 may further include a channel browsing processing unit for generating a thumbnail image corresponding to a channel signal or an external input signal.
- the channel browsing processing unit may receive a stream signal (TS) outputted from the demodulation unit 120, a stream signal outputted from the external device interface unit 135, or the like to extract an image from the received stream signal, thereby generating a thumbnail image.
- the generated thumbnail image may be encoded as it is, to be input to the controller 170.
- the generated thumbnail image may be also encoded in a stream type to be input to the controller 170.
- the controller 170 may display a thumbnail list having a plurality of thumbnail images on the display unit 180 using an input thumbnail image.
- thumbnail images within the thumbnail list may be sequentially or simultaneously updated. As a result, the user may grasp the content of a plurality of broadcast channels in a convenient manner.
- the display unit 180 may convert video, data and OSD signals that are processed by the controller 170, video and data signals that are received from the external device interface unit 135, or the like, into R, G, and B signals, respectively, to generate a drive signal.
- the display unit 180 may be provided with a PDP, an LCD, an OLED, a flexible display, a 3D display, and the like.
- the display unit 180 may be configured with a touch screen to be used as an input device in addition to an output device.
- the audio output unit 185 may receive an audio-processed signal, for example, a stereo signal, a 3.1-channel signal or a 5.1-channel signal from the controller 170 to output it as audio.
- the audio output unit 185 may be implemented by various types of speakers.
- the image display apparatus 100 may further include a sensing unit (not shown) having at least one of a touch sensor, a voice sensor (microphone), a location sensor and a motion sensor as described above.
- the signal detected by the sensing unit (not shown) may be transferred to the controller 170 through the user input interface unit 150.
- the camera unit (not shown) for capturing a subject (object or scene) may be further provided therein.
- Image information captured by the camera unit (not shown) may be input to the controller 170.
- the camera unit (not shown) will be described in detail below with reference to FIG. 5.
- the controller 170 may receive a captured image from the camera unit (not shown) or a detected signal from the sensing unit (not shown) respectively or in a combined manner to detect the user's gesture.
- the power supply unit 190 supplies the relevant power throughout the entire image display apparatus 100.
- the power supply unit 190 may supply power to the controller 170, which can be implemented in a system-on-chip (SOC) form, the display unit 180 for displaying video, and the audio output unit 185 for outputting audio.
- the power supply unit 190 may include a converter (not shown) for converting alternating-current power into direct-current power.
- an inverter capable of performing a PWM operation may be further included therein for brightness variation or dimming driving.
- the remote control device 200 transmits a user input to the user input interface unit 150.
- the remote control device 200 may use various communication techniques such as BluetoothTM, Radio Frequency (RF) communication, Infrared (IR) communication, Ultra Wideband (UWB), ZigBeeTM, and the like.
- the remote control device 200 may receive video, audio, or data signals outputted from the user input interface unit 150 and display them on the remote control device 200 or output audio or vibration.
- the foregoing image display apparatus 100 may be a fixed-type digital broadcast receiver capable of receiving at least one of ATSC (8-VSB) broadcast services, DVB-T (COFDM) broadcast services, and ISDB-T (BST-OFDM) broadcast services.
- the block diagram of the image display apparatus 100 illustrated in FIG. 2 is a block diagram for an embodiment of the present invention.
- Each constituent element in the block diagram may be integrated, added, or deleted according to the specification of an actually implemented image display apparatus 100.
- two or more constituent elements may be integrated into one constituent element, or one constituent element may be divided into two or more constituent elements.
- the function carried out in each block is provided to describe the embodiments of the present invention, and the detailed operations or devices do not limit the scope of rights of the present invention.
- the image display apparatus 100 may not have the tuner 110 and the demodulation unit 120 as illustrated in FIG. 2, but may receive or play video content through the network interface unit 130 or external device interface unit 135.
- the image display apparatus 100 is an example of the video signal-processing device that performs signal processing for an image stored in the device or input to the device.
- Other examples of the video signal processing device may include a set-top box excluding the display unit 180 and the audio output unit 185, the foregoing DVD player, a Blu-ray player, a gaming device, a computer, and the like. Of these, the set-top box will be described below with reference to FIGS. 3 and 4.
- FIGS. 3 and 4 are views illustrating that any one of the image display apparatuses according to the embodiments of the present invention is separated into a set-top box and a display device.
- a set-top box 250 and a display device 300 may transmit or receive data in a wired or wireless manner.
- the set-top box 250 may include a network interface unit 255, a storage unit 258, a signal-processing unit 260, a user input interface unit 263, and an external device interface unit 265.
- the network interface unit 255 provides an interface for connecting to a wired/wireless network including the Internet network. Furthermore, the network interface unit 255 may transmit or receive data to or from another user or another electronic device through a connected network or another network linked with the connected network.
- the storage unit 258 may store programs for each signal processing or control within the signal processing unit 260, and perform a function for temporarily storing video, audio, or data signals received from the external device interface unit 265 or network interface unit 255.
- the signal-processing unit 260 performs signal processing for input signals.
- the signal-processing unit 260 may perform inverse-multiplexing or decoding for input video signals, and perform inverse-multiplexing or decoding for input audio signals.
- the signal-processing unit 260 may further include a video decoder or audio decoder.
- the signal-processed video or audio signals may be transmitted to the display device 300 through the external device interface unit 265.
- the user input interface unit 263 transfers the user's input signals to the signal processing unit 260, or transfers signals generated from the signal-processing unit 260 to the user.
- the user input interface unit 263 may receive various control signals, such as power on/off, operation input, setup input and the like, received through the local key (not shown) or the remote control device 200, and transfer them to the signal processing unit 260.
- the external device interface unit 265 provides an interface for transmitting or receiving data to or from an external device connected in a wired/wireless manner.
- the external device interface unit 265 provides an interface for transmitting or receiving data to or from the display device 300.
- the set-top box 250 may further include a media input unit (not shown) for reproducing media.
- the examples of such a media input unit may include a Blu-ray input unit (not shown).
- the set-top box 250 may be provided with a Blu-ray player.
- Input media such as a Blu-ray disc or the like may be signal-processed with inverse-multiplexing or decoding in the signal processing unit 260, and then transmitted to the display device 300 through the external device interface unit 265 for display.
- the display device 300 may include a tuner 270, an external device interface unit 273, a demodulation unit 275, a storage unit 278, a controller 280, a user input interface unit 283, a display unit 290, and an audio output unit 295.
- the tuner 270, demodulation unit 275, storage unit 278, controller 280, user input interface unit 283, display unit 290, and audio output unit 295 correspond to the tuner 110, demodulation unit 120, storage unit 140, controller 170, user input interface unit 150, display unit 180, and audio output unit 185 as described above in FIG. 2, and thus the description thereof will be omitted.
- the external device interface unit 273 provides an interface for transmitting or receiving data to or from an external device connected in a wired/wireless manner.
- the external device interface unit 273 provides an interface for transmitting or receiving data to or from the set-top box 250.
- video or audio signals received through the set-top box 250 pass through the controller 280, and are then outputted through the display unit 290 or the audio output unit 295.
- referring to FIG. 4, the set-top box 250 and display device 300 are similar to the set-top box 250 and display device 300 illustrated in FIG. 3, but differ in that the tuner 270 and demodulation unit 275 are located within the set-top box 250 rather than within the display device 300.
- hereinafter, this difference will be primarily described.
- the signal-processing unit 260 may perform signal processing for broadcast signals received through the tuner 270 and demodulation unit 275. Furthermore, the user input interface unit 263 may receive an input such as channel selection, channel storage and the like.
- FIG. 5 is a view specifically illustrating a camera unit of the image display apparatus according to an embodiment of the present invention.
- the camera unit 400 may include a plurality of cameras, each capable of acquiring different information, in order to acquire various information through the camera unit 400.
- the camera unit 400 may include depth cams 401, 402, an RGB cam 403 (hereinafter, a camera refers to at least one of the depth cams 401, 402 and the RGB cam 403), a camera memory 404, a camera controller 405, and audio receivers 406, 407.
- the depth cam may include a depth CMOS image sensor 401, and an infrared (IR) light source 402.
- the audio receiver may include a microphone 406 and a sound source recognition unit 407.
- the depth cam is a camera in which each pixel value recognized from a captured image represents the distance from the depth cam to the subject.
- the depth cam may use a time-of-flight (TOF) method, in which infrared light is emitted from the infrared light source 402 and distance information between the subject and the depth cam is acquired from a phase difference between the emitted infrared light and the infrared light reflected back from the subject.
- alternatively, the depth cam may use a structured light method, in which an infrared light pattern (numerous infrared light points) is emitted from the infrared light source 402, the pattern reflected from the subject is captured by the image sensor 401 having a filter, and distance information between the subject and the depth cam is acquired based on the degree to which the pattern is distorted.
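- for the time-of-flight case, the standard phase-to-distance relation is d = c·Δφ / (4π·f), where f is the IR modulation frequency and Δφ the measured phase difference; the factor 4π (rather than 2π) accounts for the round trip to the subject and back. A minimal Python sketch follows, with an assumed 20 MHz modulation frequency that is not a value from the disclosure.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def tof_distance(phase_shift_rad: float, mod_freq_hz: float = 20e6) -> float:
    """Distance from the phase difference between emitted and reflected IR light.

    The light travels to the subject and back, hence the factor of 2 in the
    round trip: d = c * delta_phi / (4 * pi * f).
    """
    return C * phase_shift_rad / (4 * math.pi * mod_freq_hz)

# e.g. a phase shift of pi/2 at 20 MHz corresponds to ~1.87 m
print(tof_distance(math.pi / 2))
```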
- the image display apparatus may recognize the location information of the subject through the depth cam.
- the image display apparatus may acquire the location coordinates of each portion of the body and track the movement of each portion, thereby acquiring information on the detailed motion of the body.
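- as a hypothetical illustration of such tracking (not part of the disclosure), the sketch below computes per-part displacement between two depth-cam frames; the frame format, a dict of body part names to (x, y, z) coordinates with z being the depth-cam distance, is an assumption.

```python
import math

def part_displacements(prev_frame, cur_frame):
    """Per-part Euclidean displacement between two consecutive frames."""
    moves = {}
    for part, (x0, y0, z0) in prev_frame.items():
        x1, y1, z1 = cur_frame[part]
        moves[part] = math.dist((x0, y0, z0), (x1, y1, z1))
    return moves

# e.g. part_displacements({"hand": (0.0, 1.0, 2.0)}, {"hand": (0.0, 1.2, 2.0)})
# -> {"hand": 0.2}, i.e. the hand rose by 0.2 in the Y direction
```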
- the RGB cam 403 is a camera that acquires color information as a pixel value.
- the RGB cam may include three CMOS image sensors for acquiring information on each color of red (R), green (G), and blue (B).
- the RGB cam may acquire a relatively high-resolution image compared to the depth cam.
- the camera memory 404 stores a setting value of the depth cam and the RGB cam.
- the camera controller 405 analyzes an image received from the camera unit 400, and loads a camera setting value based on the analysis result to configure the capturing environment of the depth cam and the RGB cam.
- the camera memory 404 may store an image captured through the depth cam and the RGB cam, and the stored image may be loaded when a call signal of the stored image is received from the user.
- the microphone 406 receives sound waves or ultrasonic waves existing in the neighborhood of the camera unit 400 and transmits an electrical signal based on the vibration to the camera controller 405.
- the user may have his or her voice input to the image display apparatus through the microphone 406 and stored together with an image input through the camera unit 400, and may have a predetermined operation carried out in the image display apparatus through the input audio.
- the sound source recognition unit 407 receives an audio signal of the content or service being used and transmits a corresponding electrical signal to the camera controller 405. In other words, contrary to the microphone 406, the sound source recognition unit 407 extracts and recognizes an audio signal from broadcast signals received by the image display apparatus.
- the camera controller 405 controls the operation of each module.
- the camera controller 405 may control a subject to be captured through the depth cam and RGB cam, and the captured image may be analyzed to load setup information from the camera memory 404, thereby controlling the depth cam and RGB cam.
- the camera controller 405 may store the input audio signal together with an image signal captured through the depth cam and RGB cam in the camera memory 404.
- the user may input a predetermined image and voice to the image display apparatus, and control the image display apparatus through the input image or voice.
- FIG. 6 is a flow chart illustrating the process of controlling an object in an image display apparatus according to a first embodiment disclosed herein.
- the microphone refers to a means for acquiring a voice
- the camera refers to a means for acquiring an image.
- the microphone and the camera may be included in the image display apparatus as described in FIG. 2.
- alternatively, the microphone and the camera may be included in the camera unit 400 as described in FIG. 5.
- the microphone acquires a voice
- the camera acquires an image (S110).
- the microphone detects the user's voice (natural voice) existing in the neighborhood, and outputs the detected voice to the controller 170. Then, the camera captures a detection target, and outputs the captured image to the controller 170. Then, the controller 170 extracts first characteristic information from the voice (S120).
- the controller 170 recognizes a voice input from the microphone by converting it into a text string or the like, and converts the recognized voice in the form of a text string or the like into the meaning structure having a previously formulated form to extract the meaning of the voice.
- the meaning structure having a previously formulated form refers to a meaning structure having the form of a command associated with the control of the image display apparatus 100 or an application provided by the image display apparatus 100, such as "thickness", "double", "volume", "TV volume", and the like, which will be described later.
- the controller 170 designates an interest object based on the first characteristic information (S130).
- the object refers to an entity, such as an application, application data, content data, a menu, and the like, to which the user can apply an input to control the operation of an image display apparatus.
- the interest object may refer to an object that can be controlled by an input currently applied by the user among a plurality of objects capable of controlling the operation of an image display apparatus.
- the interest object refers to an object over which the user has control authority.
- the controller 170 designates an object corresponding to the meaning of a voice extracted in the step of S120 as an interest object among a plurality of objects. For example, if "volume" is extracted in the step of S120, then a volume menu may be designated as an interest object.
- the controller 170 extracts second characteristic information from the image based on the interest object (S140).
- the controller 170 recognizes an image received from the camera by converting it into the form of a gesture or the like, and converts the recognized gesture into the meaning structure having a previously formulated form to extract the meaning of the image for an interest object.
- the meaning structure having a previously formulated form refers to a meaning structure having the form of a command associated with the control of the image display apparatus 100 or an application provided by the image display apparatus 100, such as "the motion of raising a hand" or "the shape of a finger indicating the right side" of a detection target, which will be described later.
- the controller 170 may extract a change of the coordinate value in the Y-axis direction from "the motion of raising a hand".
- the controller 170 reflects the second characteristic information on the interest object (S150).
- the controller 170 may perform an operation for increasing the volume based on a change of the coordinate value in the Y-axis direction of the detection target in an image.
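The S110 to S150 flow can be summarized as a small dispatch loop. The following Python sketch is purely illustrative; every name in it (the recognizer stubs, the object table) is our assumption, not part of the disclosed apparatus:

```python
# Minimal sketch of the FIG. 6 control flow (S110-S150). Every name,
# table entry, and stub recognizer here is an illustrative assumption.

OBJECT_TABLE = {
    "volume": "volume_menu",
    "thickness": "pen_thickness_menu",
    "fast": "reproduction_speed_menu",
}

def recognize_speech(voice):
    # Stand-in for voice -> text -> meaning-structure conversion (S120).
    return voice.strip().lower()

def recognize_gesture(image):
    # Stand-in for image -> gesture conversion (S140); here the "image"
    # is already reduced to a dict of extracted coordinate deltas.
    return image

def control_object(voice, image, state):
    command = recognize_speech(voice)            # first characteristic info
    interest_object = OBJECT_TABLE.get(command)  # S130: designate object
    if interest_object is None:
        return state                             # no interest object: ignore
    gesture = recognize_gesture(image)
    delta = gesture.get("dy", 0)                 # second characteristic info
    state[interest_object] = state.get(interest_object, 0) + delta  # S150
    return state

print(control_object("volume", {"dy": 3}, {}))   # -> {'volume_menu': 3}
```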
- FIG. 7 is an exemplary view illustrating the process of controlling an object using the depth information of a detection target according to a first embodiment disclosed herein.
- an application 510 capable of displaying a line along the movement of a detection target, for example, a finger, a palm, a stylus or the like, may be executed in the image display apparatus 100.
- a change of the coordinate value may be extracted from an image to display a line 512a corresponding to the change of the coordinate value on the display unit.
- if the user wants to change the pen thickness, the pen thickness setup should ordinarily be changed by using a tool provided by the application.
- the user may take a gesture (1) for moving a finger forward while uttering "thickness”.
- the microphone may acquire a voice of "thickness”
- the camera may acquire an image including the gesture (1) for moving a finger forward.
- the controller 170 may extract a command of "thickness" from the voice, and select a pen thickness menu 520.
- the pen thickness menu 520 may not be displayed on the display unit of the image display apparatus 100 during the process of setting up a pen thickness.
- the controller 170 may extract a change value in the Z-axis direction of a fingertip from the acquired image, and set up a thickness value of the pen thickness menu 520 using the extracted change value. For example, if the difference of the depth value between the start point and the end point of the gesture (1) is 4, then the controller 170 may change the thickness value of the pen thickness menu 520, for example, from 1 to 5. The user may subsequently take a gesture (2) for moving a finger to the side, and a line 512b corresponding to the gesture (2) may be displayed with a thickness of 5.
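Under the assumption that the controller exposes per-frame fingertip depth values (our naming), the thickness update in this example reduces to a one-line mapping:

```python
def thickness_from_depth(start_depth, end_depth, base_thickness=1):
    # Signed difference of the depth (Z) value between the gesture's
    # start and end points becomes the thickness increment; a depth
    # difference of 4 turns thickness 1 into 5, as in the example.
    return base_thickness + (end_depth - start_depth)

print(thickness_from_depth(10, 14))  # -> 5
```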
- FIG. 8 is an exemplary view illustrating the process of controlling an object using the shape information of a detection target according to a first embodiment disclosed herein.
- an application 610 capable of reproducing video in the image display apparatus 100 may be executed. If the user wants to control a reproduction speed during the reproduction of video, then the reproduction speed setup should be changed by using a reproduction speed menu provided in the application.
- the user may take a gesture for unfolding two fingers only while uttering "fast”.
- the microphone may acquire a voice of "fast”
- the camera may acquire an image including a shape for unfolding two fingers only.
- the controller 170 may extract a command of "fast” from the voice and select a reproduction speed menu 620.
- the reproduction speed menu 620 may not be displayed on the display unit of the image display apparatus 100 during the process of setting up a reproduction speed.
- the controller 170 may extract the number of fingers from the shape information of a detection target in the acquired image, and set up a speed value of the reproduction speed menu 620 using the extracted number of fingers. For example, if the number of unfolded fingers is 2, then the controller 170 may change the reproduction speed from a single speed to a double speed.
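A hypothetical mapping from the extracted finger count to a speed value might look as follows; only the two-finger/double-speed pair comes from the example above, the rest is our illustrative extrapolation:

```python
SPEED_BY_FINGERS = {1: 1.0, 2: 2.0, 3: 4.0}  # 2 fingers -> double speed

def reproduction_speed(finger_count, current=1.0):
    # Unknown finger counts leave the current speed unchanged.
    return SPEED_BY_FINGERS.get(finger_count, current)

print(reproduction_speed(2))  # -> 2.0
```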
- FIG. 9 is an exemplary view illustrating the process of controlling an object in a region excluding a display region according to a first embodiment disclosed herein.
- an application 710 capable of receiving broadcast in the image display apparatus 100 may be executed. If the user wants to increase a volume of the channel being viewed while receiving broadcast, then the volume setup should be changed by using a volume menu provided in the application.
- the user may take a gesture for moving a finger from the bottom to the top while uttering "volume”.
- the microphone may acquire a voice of "volume”
- the camera may acquire an image including a gesture for moving a finger from the bottom to the top.
- the controller 170 may extract a command of "volume” from the voice and select a volume setup menu 720.
- the volume setup menu 720 may not be displayed on the display unit of the image display apparatus 100 during the process of setting up a volume.
- the controller 170 may extract a change value in the Y-axis direction of a fingertip from the acquired image, and set up a level of the volume setup menu 720 using the extracted change value. For example, if the user takes a gesture for moving a finger from the bottom to the top, then the controller 170 may increase the volume. Furthermore, the controller 170 may determine a level of increasing the volume based on a change of the coordinate value between the start point and the end point of a gesture, a speed, an acceleration, or the like while taking a gesture, and reflect the determined level on the volume setup menu 720.
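Since the level may depend on displacement, speed, or acceleration, one plausible (entirely assumed) weighting is sketched below:

```python
def volume_step(y_start, y_end, duration_s, gain=1.0):
    # Upward motion gives a positive displacement; faster gestures add
    # to the step size. The 0.5 speed weight is an arbitrary assumption.
    displacement = y_end - y_start
    speed = displacement / duration_s if duration_s else 0.0
    return round(gain * (displacement + 0.5 * speed))

print(volume_step(0.2, 0.8, 0.3))  # -> 2 (raise volume by two levels)
```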
- the gesture need not directly (or explicitly) indicate an interest object in the display region. For example, any gesture made in a region (surrounded by a dotted line) that can be detected by the camera, as well as in a region (surrounded by a solid line) extended in the Z-axis direction with respect to the display region, may be reflected on the interest object. As long as characteristic information that can be reflected on the interest object can be extracted from the gesture, the gesture does not have to indicate the interest object even when the interest object is displayed on the display unit.
- FIG. 10 is an exemplary view illustrating the process of controlling an application being executed in the background according to a first embodiment disclosed herein.
- the image display apparatus 100 may support multitasking.
- the image display apparatus 100 may execute a web browsing application in the foreground while executing a broadcast receiving program in the background.
- the image of the application being executed in the foreground may be displayed on the screen. If the user wants to increase the volume of the channel being viewed while executing a web browsing application 810, then ordinarily the broadcast receiving application should be called to the foreground, and the volume setup should be changed by using a volume menu provided in the broadcast receiving application.
- the user may take a gesture for moving a finger from the bottom to the top while uttering "TV volume”.
- the microphone may acquire a voice of "TV volume”
- the camera may acquire an image including a gesture for moving a finger from the bottom to the top.
- the controller 170 may extract a command of "TV volume” from the voice and select a volume setup menu 820 of the broadcast receiving application.
- the volume setup menu 820 may not be displayed on the display unit of the image display apparatus 100 during the process of setting up a volume.
- the controller 170 may extract a change value in the Y-axis direction of a fingertip from the acquired image, and set up a level value of the volume setup menu 820 using the extracted change value. For example, if the user takes a gesture for moving a finger from the bottom to the top, then the controller 170 may increase the volume. Furthermore, the controller 170 may determine a level of increasing the volume based on a change of the coordinate value between the start point and the end point of a gesture, a speed, an acceleration, or the like while taking a gesture, and reflect the determined level on the volume setup menu 820.
- FIGS. 11 and 12 are views for explaining the process of controlling an object based on an acquisition time/acquisition order of the voice and image according to a first embodiment disclosed herein.
- the controller 170 may extract first characteristic information (C1) from a voice, and detect an acquisition start time (t1) and an acquisition end time (t2) of the first characteristic information (C1).
- the second characteristic information (characteristic information extracted from an image) in which the acquisition start time is located between time t1 and t3 may be reflected on an interest object designated by the first characteristic information (C1).
- the interval between t2 and t3 is a threshold time, that is, the minimum time for which the second characteristic information can still be reflected on an interest object designated by the first characteristic information.
- the threshold time is a minimum time required for allowing the user to combine a voice and a gesture to control the interest object, which may be determined by an experiment or may be changed by the user.
- second characteristic information (C2) in which the acquisition start time is located between t1 and t2 and third characteristic information (C3) in which the acquisition start time is located between t2 and t3 may be reflected on an interest object designated by the first characteristic information (C1).
- second characteristic information (C4) in which the acquisition start time is later than t3 may not be reflected on the interest object designated by the first characteristic information (C1).
- the same standard can of course be applied according to the acquisition start time of the first characteristic information and the acquisition end time of the second characteristic information. For example, if the acquisition end time of the second characteristic information falls within a threshold time before the acquisition start time t1 of the first characteristic information (C1), then the second characteristic information may be reflected on the interest object designated by the first characteristic information (C1).
- the second characteristic information (C2) in which the acquisition end time is located within a threshold time from the acquisition start time of the first characteristic information (C1) as well as the second characteristic information (C3) in which the acquisition start time is located within a threshold time from the acquisition end time of the first characteristic information (C1) may be reflected on an interest object designated by the first characteristic information (C1).
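The timing rule of FIGS. 11 and 12 amounts to a symmetric window check around the voice's acquisition interval. A sketch, with the threshold value chosen arbitrarily:

```python
THRESHOLD = 0.5  # seconds; determined by experiment or set by the user

def is_associated(voice_start, voice_end, gesture_start, gesture_end):
    # Reflected if the gesture starts between the voice's start (t1)
    # and its end plus the threshold (t3) ...
    if voice_start <= gesture_start <= voice_end + THRESHOLD:
        return True
    # ... or, symmetrically, if the gesture ends within the threshold
    # before the voice starts.
    return voice_start - THRESHOLD <= gesture_end <= voice_start

print(is_associated(1.0, 2.0, 2.3, 2.9))  # True: starts before t3 = 2.5
print(is_associated(1.0, 2.0, 2.7, 3.0))  # False: starts after t3
```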
- for example, if the user takes a first gesture while uttering "volume" and subsequently takes a second gesture, the first gesture and the second gesture may be sequentially reflected on a volume menu, respectively.
- likewise, if the user takes a first gesture indicating a specific region on the display unit, utters "copy", and then takes a second gesture indicating another specific region on the display unit, then only the first gesture or only the second gesture may be reflected on the copy menu according to the setup of the image display apparatus.
- FIG. 13 is a flow chart illustrating the process of controlling an object in an image display apparatus according to a second embodiment disclosed herein.
- the microphone acquires a voice
- the camera acquires an image (S210).
- the microphone detects the user's voice (natural voice) existing in the neighborhood, and outputs the detected voice to the controller 170. Then, the camera captures a detection target, and outputs the captured image to the controller 170. Then, the controller 170 extracts first and second characteristic information from the voice (S220).
- the controller 170 recognizes a voice input from the microphone by converting it into a series of text strings or the like, and converts the recognized voice in the form of a series of text strings or the like into the meaning structure having a previously formulated form to extract the meaning of the voice.
- the controller 170 designates a first and a second interest object based on the first and the second characteristic information (S230).
- the controller 170 designates objects corresponding to the meanings of the voice extracted in the step of S220 as interest objects among a plurality of objects. For example, if "move" and "enlarge" are extracted in the step of S220, then a move menu and an enlarge menu may be designated as the interest objects, respectively.
- the controller 170 extracts third and fourth characteristic information from the image based on the first and the second interest object (S240).
- the controller 170 recognizes an image received from the camera by converting it into the form of a gesture or the like, and converts the recognized gesture into the meaning structure having a previously formulated form to extract the meaning of the image for each interest object.
- the meaning structure having a previously formulated form refers to a meaning structure having the form of a command associated with the control of the image display apparatus 100 or an application provided by the image display apparatus 100, such as "the motion of moving a hand to the side" and "the motion of moving a hand forward" of a detection target, which will be described later.
- the controller 170 may extract the x, y coordinates of the start point and the end point from "the motion of moving a hand to the side" if the interest object is a move menu, and extract a change of the coordinate value in the Z-axis direction from "the motion of moving a hand forward" if the interest object is an enlarge menu.
- the controller 170 reflects the third and the fourth characteristic information on the first and the second interest object, respectively (S250).
- the controller 170 may perform an operation of movement based on a change of the x, y coordinate value of the start point and the end point extracted in "the motion of moving a hand to the side" of the image, and may perform an operation of enlargement based on a change of the coordinate value in the Z-axis direction extracted in "the motion of moving a hand forward" of the image.
- contents to which a move menu and an enlarge menu are sequentially applied may be displayed on the screen.
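The second-embodiment flow differs from the first only in that one utterance yields several interest objects, each consuming its own feature from the image. A minimal sketch with assumed names:

```python
MENU_FOR = {"move": "move_menu", "enlarge": "enlarge_menu"}
FEATURE_FOR = {"move_menu": "xy_delta", "enlarge_menu": "z_delta"}

def control_objects(words, gesture_features):
    # S220/S230: designate one interest object per recognized word.
    objects = [MENU_FOR[w] for w in words if w in MENU_FOR]
    # S240/S250: pair each object with the feature type it consumes.
    return [(o, gesture_features.get(FEATURE_FOR[o])) for o in objects]

print(control_objects(["move", "enlarge"],
                      {"xy_delta": (40, 0), "z_delta": 6}))
# -> [('move_menu', (40, 0)), ('enlarge_menu', 6)]
```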
- FIGS. 14a and 14b are exemplary views illustrating the process of controlling a plurality of objects according to a second embodiment disclosed herein.
- if the user wants to designate and copy a specific region in a first region 910 and then paste the copied region to a second region 920 while executing a content reproduction application in the image display apparatus 100, then he or she may designate the specific region and call a copy command, and then designate a region to which the copied region is to be pasted and call a paste command.
- the user may take a gesture for designating a region to be copied while uttering "copy”.
- the microphone may acquire a voice of "copy”
- the camera may acquire an image including a gesture for designating a region to be copied.
- the controller 170 may extract an instruction of "copy” from the voice, and call a copy command.
- the controller 170 may extract the points indicated by the ends of two fingers from the acquired image, and select a rectangular region in which the line segment connecting the two points forms a diagonal. Then, the controller 170 may store the content in the rectangular region.
- the user may take a gesture for designating a region to which the content of the copied region is to be pasted while uttering "paste".
- the microphone may acquire a voice of "paste”
- the camera may acquire an image including a gesture for designating a region to which the content of the copied region is to be pasted.
- the controller 170 may extract an instruction of "paste" from the voice, and call a paste command.
- the controller 170 may extract a point designated by an end of the finger from the acquired image, and paste the stored content to the extracted point.
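The copy/paste geometry in FIGS. 14a and 14b reduces to a rectangle built from two fingertip points and a single paste anchor. The placement convention below is an assumption:

```python
def region_from_fingertips(p1, p2):
    # The segment between the two fingertips is the rectangle's
    # diagonal; normalize to (left, top, right, bottom).
    (x1, y1), (x2, y2) = p1, p2
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))

def paste_at(clip, point):
    # Anchor the stored content at the single fingertip point.
    return {"content": clip, "at": point}

clip = region_from_fingertips((120, 40), (60, 200))
print(clip)                       # -> (60, 40, 120, 200)
print(paste_at(clip, (300, 80)))  # -> {'content': ..., 'at': (300, 80)}
```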
- the gesture of FIG. 14a and the gesture of FIG. 14b may be characteristic information that can be reflected on a copy menu.
- the gesture of FIG. 14b, in which only one point has been selected, may not be reflected on the copy menu, since both the start point and the end point should be designated as in the gesture of FIG. 14a. Accordingly, the controller 170 may reflect only the gesture of FIG. 14a on the copy menu.
- if the gesture of FIG. 14b is a gesture in which two points have been selected, similarly to the gesture of FIG. 14a, then either only the gesture of FIG. 14a that was acquired first or only the gesture of FIG. 14b that was acquired later may be reflected on the copy menu.
- FIGS. 15a and 15b are exemplary views illustrating the process of sequentially controlling a plurality of objects according to a second embodiment disclosed herein.
- the image display apparatus 100 may execute a map application according to the user's input.
- the map application may be executed, and a map corresponding to the region selected by the user's input may be displayed on the screen 1010. If the user wants to move to a specific point 1012 and enlarge the corresponding portion in the map, then he or she may first select a move menu to move to the specific point 1012, and select an enlarge menu to choose a magnification factor, thereby allowing the specific point 1012 to be enlarged and displayed on the screen 1010.
- the user may take a gesture (1) for moving a finger from the left side to the right side while uttering "move”, and subsequently take a gesture (2) for moving a finger forward while uttering "enlarge”.
- the microphone may acquire a voice of "move” and a voice of "enlarge”
- the camera may acquire an image including a gesture (1) for moving a finger from the left side to the right side and a gesture (2) for moving a finger forward.
- the controller 170 may extract the commands of "move” and "enlarge” from the voice and designate a move menu and an enlarge menu as the interest objects, respectively.
- the move menu and enlarge menu may not be displayed on the display unit of the image display apparatus 100 during the process of moving and enlarging the content.
- the controller 170 may extract a change value of the x, y coordinate of the start point and the end point of the gesture (1) from the acquired image, and extract a change of the coordinate value in the Z-axis direction from gesture (2).
- the controller 170 stores the extracted values in association with the corresponding interest object. Then, upon receiving an execution command, the controller 170 may call each of the extracted values, move a map according to a change value of the x, y coordinate of the start point and the end point, and subsequently enlarge a map as much as a magnification factor corresponding to the change value of the Z-axis coordinate in a state that the map has been moved. As a result, referring to FIG. 15b, the content may be moved to be displayed at the center of the specific region 1012, and the content may be enlarged and displayed on the screen 1020 around the center of the specific region 1012.
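The store-then-execute behavior described here is essentially a deferred command queue. A sketch under assumed names, where each stored value is replayed against its interest object in acquisition order:

```python
class PendingOperations:
    def __init__(self):
        self.queue = []  # (operation, extracted_value) pairs

    def store(self, operation, value):
        self.queue.append((operation, value))

    def execute(self, view):
        # Replay in order: move first, then enlarge the moved map.
        for operation, value in self.queue:
            view = operation(view, value)
        self.queue.clear()
        return view

def move(view, delta):
    cx, cy = view["center"]
    return {**view, "center": (cx + delta[0], cy + delta[1])}

def enlarge(view, z_change):
    # Z-axis change mapped to a magnification factor (our assumption).
    return {**view, "zoom": view["zoom"] * (1 + z_change / 10)}

ops = PendingOperations()
ops.store(move, (35, -10))  # gesture (1): x, y change
ops.store(enlarge, 5)       # gesture (2): Z-axis change
print(ops.execute({"center": (0, 0), "zoom": 1.0}))
# -> {'center': (35, -10), 'zoom': 1.5}
```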
- FIG. 16 is an exemplary view illustrating the process of simultaneously controlling a plurality of objects according to a second embodiment disclosed herein.
- the image display apparatus 100 may support a multitasking function for simultaneously executing two or more applications. For example, the image display apparatus 100 may execute an e-book application according to the user’s first input while the image display apparatus 100 may execute a broadcast receiving application according to the user's second input. Furthermore, the image display apparatus 100 may divide a screen into a first region 1110 and a second region 1120, and display an execution image of the e-book application on the first region 1110 and an execution image of the broadcast receiving application on the second region 1120, respectively.
- suppose the first user wants to change a page on the e-book application and the second user wants to change a channel on the broadcast receiving application.
- the first user may acquire the control for a page change menu of the e-book application to change the page
- the second user may acquire the control for a channel change menu of the broadcast receiving application to change the channel.
- the page turn of the e-book application may occur by the first user's gesture and the channel switching of the broadcast receiving application may occur by the second user's gesture.
- the microphone may acquire a voice of "page” and a voice of "channel”
- the camera may acquire an image including a gesture for moving a finger from the top to the bottom and a gesture for moving a finger from the bottom to the top.
- the controller 170 may extract a command of "page” from the voice and select a page setup menu of the e-book application.
- the controller 170 may extract a command of "channel” from the voice and select a channel setup menu of the broadcast receiving application.
- the page setup menu and channel setup menu may not be displayed on the display unit of the image display apparatus 100 during the process of setting up the page and channel.
- the controller 170 may extract a change value in the Y-axis direction of the finger from the first user's gesture of the acquired image and a change value in the Y-axis direction of the finger from the second user's gesture thereof, and the extracted change values may be associated with the selected menus, respectively. Furthermore, the controller 170 may set up the page and channel using the extracted change values, respectively.
- the controller 170 may control a page subsequent to the currently displayed page to be displayed in the first region according to the first user's gesture. At the same time, the controller 170 may control a channel prior to the currently displayed channel to be displayed in the second region according to the second user's gesture.
- the controller 170 may use various methods to associate "page" with the first user's gesture and "channel" with the second user's gesture.
- a gesture generated from a detection region corresponding to the first region displayed with an execution image of the e-book application may be associated with a page setup menu
- a gesture generated from a detection region corresponding to the second region displayed with an execution image of the broadcast receiving application may be associated with a channel setup menu.
- alternatively, the voice patterns of "page" and "channel" may be recognized and compared with each user's voice pattern to determine which user's gesture each utterance corresponds to. Otherwise, if the type of gesture applicable to each of "page" and "channel" is different, then each voice and image may be associated according to the type of the gesture.
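The first association method, routing each gesture by the screen region in which it was detected, could be realized as simply as the following; the half-screen boundaries are assumptions:

```python
REGIONS = {
    "page_setup_menu": (0.0, 0.5),     # left half: e-book application
    "channel_setup_menu": (0.5, 1.0),  # right half: broadcast application
}

def menu_for_gesture(x_normalized):
    # Route a gesture to the menu of the region it occurred in.
    for menu, (left, right) in REGIONS.items():
        if left <= x_normalized < right:
            return menu
    return None

print(menu_for_gesture(0.25))  # -> page_setup_menu
print(menu_for_gesture(0.75))  # -> channel_setup_menu
```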
- a method of controlling an object in an electronic device may be implemented as codes readable by a processor on a medium readable by the processor provided in the electronic device.
- the processor-readable media include all types of recording devices in which data readable by the processor can be stored. Examples of the processor-readable media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, and also include media implemented in the form of a carrier wave, such as transmission via the Internet. Furthermore, the processor-readable media may be distributed over computer systems connected to a network, in which processor-readable codes may be stored and executed in a distributed manner.
Abstract
Disclosed herein is a method of controlling an object, allowing the user to control an object in a more convenient manner using a voice and a gesture, and an image display apparatus using the same. For this purpose, an image display apparatus according to a first embodiment disclosed herein may include a microphone configured to acquire a voice; a camera configured to acquire an image; and a controller configured to extract first characteristic information from the voice, designate an interest object based on the first characteristic information, extract second characteristic information from the image based on the interest object, and reflect the second characteristic information on the interest object.
Description
The present disclosure relates to an electronic device, and more particularly, to a method of controlling an object in an image display apparatus and the image display apparatus using the same.
An image display apparatus is an apparatus having a function of displaying an image that can be viewed by a user. The user can view broadcasts through the image display apparatus, which displays, on the display unit, a broadcast selected by the user from among the broadcast signals transmitted by broadcast stations. In recent years, broadcast services all over the world have been transitioning from analog broadcast to digital broadcast.
Digital broadcast refers to broadcast that transmits digital image and audio signals. Compared to analog broadcast, digital broadcast is robust to external noise, resulting in less data loss, is advantageous for error correction, and provides high-resolution, clear images.
An aspect of the present disclosure is to provide a method of controlling an object, allowing the user to control an object in a more convenient manner using a voice and a gesture, and an image display apparatus using the same.
In order to accomplish the foregoing task, there is provided an image display apparatus according to a first embodiment disclosed herein, and the image display apparatus may include a microphone configured to acquire a voice; a camera configured to acquire an image; and a controller configured to extract first characteristic information from the voice, designate an interest object based on the first characteristic information, extract second characteristic information from the image based on the interest object, and reflect the second characteristic information on the interest object.
According to an embodiment, it is characterized in that the controller may extract the second characteristic information from the image without displaying an interface for the interest object.
Furthermore, according to an embodiment, it is characterized in that the second characteristic information may include the depth information of a detection target.
Furthermore, according to an embodiment, it is characterized in that the controller may determine a level of the interest object based on the depth information.
Furthermore, according to an embodiment, it is characterized in that the second characteristic information may include the shape information of a detection target.
Furthermore, according to an embodiment, it is characterized in that the controller may extract the second characteristic information from a region excluding a display region of the image.
Furthermore, according to an embodiment, it is characterized in that the interest object may be an application being executed on the background.
Furthermore, according to an embodiment, it is characterized in that the controller may reflect the second characteristic information on the interest object based on at least one of an acquisition time and an acquisition order of the voice and image, respectively.
Furthermore, in order to accomplish the foregoing task, there is provided an image display apparatus according to a second embodiment disclosed herein, and the image display apparatus may include a microphone configured to acquire a voice; a camera configured to acquire an image; and a controller configured to extract first and second characteristic information from the voice, designate a first and a second interest object based on the first and the second characteristic information, respectively, extract third and fourth characteristic information from the image based on the first and the second interest object, respectively, and reflect the third and the fourth characteristic information on the first and the second interest object, respectively.
Furthermore, according to an embodiment, it is characterized in that the controller may extract the third and the fourth characteristic information from the image without displaying an interface for the first and the second interest object, respectively.
Furthermore, according to an embodiment, it is characterized in that the controller may sequentially reflect the third and the fourth characteristic information on the first and the second interest object, respectively.
Furthermore, according to an embodiment, it is characterized in that the controller may sequentially reflect the third and the fourth characteristic information on the first and the second interest object, respectively, upon receiving an execution command.
Furthermore, according to an embodiment, it is characterized in that the controller may store the third and the fourth characteristic information.
Furthermore, according to an embodiment, it is characterized in that the controller may simultaneously reflect the third and the fourth characteristic information on the first and the second interest object, respectively.
Furthermore, according to an embodiment, it is characterized in that the first and the second interest object may be applications being executed at the same time, respectively.
On the other hand, in order to accomplish the foregoing task, there is provided a method of controlling an object in an image display apparatus according to a first embodiment disclosed herein, and the method may include acquiring a voice and an image; extracting first characteristic information from the voice; designating an interest object based on the first characteristic information; extracting second characteristic information from the image based on the interest object; and reflecting the second characteristic information on the interest object.
Furthermore, according to an embodiment, it is characterized in that said extraction step may extract the second characteristic information from the image without displaying an interface for the interest object.
Furthermore, according to an embodiment, it is characterized in that the second characteristic information may include the depth information of a detection target.
Furthermore, according to an embodiment, it is characterized in that said reflection step may determine a level of the interest object based on the depth information.
Furthermore, according to an embodiment, it is characterized in that the second characteristic information may include the shape information of a detection target.
According to a method of controlling an object and an image display apparatus using the same disclosed herein, an object may be controlled by using a voice and an image, thereby providing an intuitive and convenient user interface environment.
In particular, according to a method of controlling an object and an image display apparatus using the same disclosed herein, the user may select a menu using a voice and control the selected menu using a gesture, thereby maximizing the use of a menu provided in an application being executed.
Furthermore, according to a method of controlling an object and an image display apparatus using the same disclosed herein, it may be possible to overcome the inconvenience in which the user should access a user interface provided in the control process when sequentially controlling one or more objects.
Furthermore, according to a method of controlling an object and an image display apparatus using the same disclosed herein, it may be possible to provide a user interface environment allowing the user to control a plurality of objects at the same time.
FIG. 1 is a view schematically illustrating an example of an entire broadcast system including an image display apparatus according to an embodiment disclosed herein;
FIG. 2 is a view specifically illustrating an example of an image display apparatus illustrated in FIG. 1;
FIGS. 3 and 4 are views illustrating that any one of image display apparatuses according to the embodiments disclosed herein is distinguished into a set-top box and a display unit.
FIG. 5 is a view specifically illustrating a camera unit in an image display apparatus according to an embodiment disclosed herein;
FIG. 6 is a flow chart illustrating the process of controlling an object in an image display apparatus according to a first embodiment disclosed herein;
FIG. 7 is an exemplary view illustrating the process of controlling an object using the depth information of a detection target according to a first embodiment disclosed herein;
FIG. 8 is an exemplary view illustrating the process of controlling an object using the shape information of a detection target according to a first embodiment disclosed herein;
FIG. 9 is an exemplary view illustrating the process of controlling an object in a region excluding a display region according to a first embodiment disclosed herein;
FIG. 10 is an exemplary view illustrating the process of controlling an application being executed in the background according to a first embodiment disclosed herein;
FIGS. 11 and 12 are views for explaining the process of controlling an object based on an acquisition time/acquisition order of the voice and image according to a first embodiment disclosed herein;
FIG. 13 is a flow chart illustrating the process of controlling an object in an image display apparatus according to a second embodiment disclosed herein;
FIGS. 14a and 14b are exemplary views illustrating the process of controlling a plurality of objects according to a second embodiment disclosed herein;
FIGS. 15a and 15b are exemplary views illustrating the process of sequentially controlling a plurality of objects according to a second embodiment disclosed herein; and
FIG. 16 is an exemplary view illustrating the process of simultaneously controlling a plurality of objects according to a second embodiment disclosed herein.
General description for multi-modal
Multi-modal stands for multiple modalities, and the modality implies each sense channel such as sight, hearing, touch, taste, smell, and the like. Exchanging each modality in a collective manner refers to multi-modal interaction.
A multi-modal system allows the user to access information such as voice, data, video, audio or other information, and e-mail, weather update, bank account transaction, news, or other information by one or more modalities through a user agent program such as a graphic browser or voice browser, and receive information by another modality.
In particular, the user may submit an information fetch request by one or more modalities, such as speaking a fetch request into a microphone, and then receive the fetched information by the same modality (i.e., voice) or another modality (i.e., image), for example through a graphic browser that presents the returned information as visual information on a display screen. In a communication device, the user agent program may operate in a manner similar to a web browser or another suitable software program residing in the device connected to a network or another terminal device.
General description for image display apparatus
Hereinafter, the present invention will be described in more detail with reference to the accompanying drawings. The suffix "module" or "unit" used for constituent elements disclosed in the following description is merely intended for easy description of the specification, and the suffixes "module" and "unit" may be used interchangeably.
On the other hand, an image display apparatus disclosed herein is an intelligent image display apparatus in which a computer-supported function is added to a broadcast receiving function, for example. The Internet function or the like may be added thereto while being faithful to the broadcast receiving function, and thus it may include an easy-to-use interface such as a handwriting type input device, a touch screen, a space remote controller or the like. Furthermore, due to the support of a wired or wireless Internet function, it may be connected to the Internet and a computer, thereby allowing functions, such as e-mail, web browsing, banking, game or the like, to be implemented. A standardized general-purpose OS may be used for such various functions.
Accordingly, for the image display apparatus disclosed herein, for example, various applications can be freely added or removed on a general-purpose OS kernel, thereby allowing various user-friendly functions to be carried out. More specifically, the image display apparatus may be network TV, HBBTV, smart TV and the like, and may be applicable to a smart phone according to circumstances.
Moreover, the embodiments of the present invention will be described in detail with reference to the accompanying drawings and the description disclosed therein, but the present invention will not be limited or restricted by the embodiments.
For the terminology used herein, general terms are selected which are widely used at present while taking the functions in the present invention into consideration, but it may vary according to the intention of those skilled in the art, general practices, or the advent of a new technology. Furthermore, in a specific case, terms arbitrarily selected by the present applicant may be used, and in this case, the meaning of the used terms will be disclosed in the corresponding portion of the detailed description. It should be noted that the terms used in this specification should not be merely construed as the nominal meaning thereof, but construed by the implied meaning thereof and the overall description of the specification.
FIG. 1 is a view schematically illustrating an example of an overall broadcast system including an image display apparatus according to an embodiment of the present invention.
As illustrated in FIG. 1, an overall broadcast system including an image display apparatus according to an embodiment of the present invention may be divided into a Content Provider (CP) 10, a Service Provider (SP) 20, a Network Provider (NP) 30, and a Home Network End Device (HNED) 40. The HNED 40 corresponds to the client 100, which is an image display apparatus according to an embodiment of the present disclosure; for example, the image display apparatus may be network TV, smart TV, IPTV, and the like.
The content provider 10 produces and provides various contents. For the content provider 10, for example, there may be a terrestrial broadcaster, a cable system operator (SO) or MSO (Multiple System Operator), a satellite broadcaster, an Internet broadcaster, and the like.
Furthermore, the content provider 10 may provide various applications or the like in addition to broadcast contents. In this regard, it will be described in detail below.
The service provider 20 may provide contents that are provided by the content provider 10 in a service package form. For example, the service provider 20 in FIG. 1 may package first terrestrial broadcast services, second terrestrial broadcast services, cable MSO, satellite broadcast services, various kinds of Internet broadcast services, applications and the like into a package to provide them to the user.
On the other hand, the service provider 20 may provide services to the client 100 using a unicast or multicast method. The unicast method transmits data on a one-to-one basis between one transmitter and one receiver; for example, if the receiver requests data from the server, the server can transmit the data to the receiver accordingly. The multicast method transmits data to a plurality of receivers in a specific group; for example, the server can transmit data at once to a plurality of previously registered receivers. The Internet Group Management Protocol (IGMP) or the like may be used for multicast registration.
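As general background (not specific to this disclosure), a receiver typically joins a multicast group through the standard sockets API, which triggers the IGMP membership report mentioned above; the group address and port here are arbitrary examples:

```python
import socket
import struct

GROUP, PORT = "239.1.2.3", 5004  # arbitrary example group and port

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))

# Joining the group makes the kernel emit the IGMP membership report.
mreq = struct.pack("4s4s", socket.inet_aton(GROUP),
                   socket.inet_aton("0.0.0.0"))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

data, addr = sock.recvfrom(2048)  # blocks until a datagram arrives
```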
The network provider 30 may provide a network for providing services to the client 100. The client 100 may establish a Home Network End Device (HNED) to receive services.
For a means of protecting content transmitted in the foregoing image display apparatus system, conditional access, content protection, or the like may be used. As examples of such conditional access or content protection, schemes such as CableCARD or DCAS (Downloadable Conditional Access System) may be used.
On the other hand, the client 100 may also provide content through a network. In this case, inversely to the foregoing, the client 100 may serve as a content provider, and the content provider 10 may receive content from the client 100. Such a design has the advantage of enabling bi-directional content services or data services.
FIG. 2 is a view more specifically illustrating another example of an image display apparatus illustrated in FIG. 1.
Referring to FIG. 2, the image display apparatus 100 according to an embodiment of the present disclosure may include a broadcast receiver 105, an external device interface unit 135, a storage unit 140, a user input interface unit 150, a controller 170, a display unit 180, an audio output unit 185, a power supply unit 190, and a camera unit (not shown). The broadcast receiver 105 may include a tuner 110, a demodulation unit 120, and a network interface unit 130. Of course, according to circumstances, a design can be made to have the tuner 110 and the demodulation unit 120 without including the network interface unit 130, and on the contrary, a design can be made to have the network interface unit 130 without including the tuner 110 and the demodulation unit 120.
The tuner 110 selects the RF broadcast signal corresponding to the channel selected by the user, or the RF broadcast signals of all prestored channels, from among the radio frequency (RF) broadcast signals received through an antenna. Furthermore, the tuner 110 transforms the selected RF broadcast signal into an intermediate frequency signal or a baseband video or audio signal.
For example, the selected RF broadcast signal may be transformed into a digital IF (DIF) signal if it is a digital broadcast signal, and may be transformed into an analog baseband video or audio signal (CVBS/SIF) if it is an analog broadcast signal. In other words, the tuner 110 can process both digital broadcast signals and analog broadcast signals. The analog baseband video or audio signal (CVBS/SIF) outputted from the tuner 110 may be directly input to the controller 170.
Furthermore, the tuner 110 may receive RF broadcast signals with a single carrier according to the Advanced Television System Committee (ATSC) method or RF broadcast signals with a plurality of carriers according to the Digital Video Broadcasting (DVB) method.
On the other hand, the tuner 110 may sequentially select the RF broadcast signals of all broadcast channels stored through a channel storage function from among the RF broadcast signals received through the antenna, and transform them into intermediate frequency signals or baseband video or audio signals.
The demodulation unit 120 receives a digital IF (DIF) signal that has been transformed by the tuner 110 to perform a demodulation operation.
For example, if the digital IF signal outputted from the tuner 110 conforms to the ATSC method, then the demodulation unit 120 may perform 8-vestigial sideband (8-VSB) demodulation, for instance. Furthermore, the demodulation unit 120 may perform channel decoding. For this purpose, the demodulation unit 120 may include a Trellis decoder, a de-interleaver, a Reed-Solomon decoder, and the like, to perform Trellis decoding, de-interleaving, and Reed-Solomon decoding.
For example, if the digital IF signal conforms to the DVB method, then the demodulation unit 120 may perform Coded Orthogonal Frequency Division Multiplexing (COFDM) demodulation, for instance. Furthermore, the demodulation unit 120 may perform channel decoding. For this purpose, the demodulation unit 120 may include a convolution decoder, a de-interleaver, a Reed-Solomon decoder, and the like, to perform convolution decoding, de-interleaving, and Reed-Solomon decoding.
The demodulation unit 120 may perform demodulation and channel decoding and then output a stream signal (TS). Here, the stream signal may be a multiplexed signal with video, audio, or data signals. For example, the stream signal may be a multiplexed MPEG-2 Transport Stream (TS) with an MPEG-2 video signal, a Dolby AC-3 audio signal, and the like. More specifically, MPEG-2 TS may include a 4-byte header, and a 184-byte payload.
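The 4-byte header / 184-byte payload split mentioned here is the standard 188-byte MPEG-2 TS packet layout, which can be parsed directly; this sketch is generic background rather than the apparatus's actual demultiplexer:

```python
def parse_ts_packet(packet: bytes):
    # 188 bytes total: a 4-byte header starting with the 0x47 sync byte,
    # followed by a 184-byte payload.
    assert len(packet) == 188 and packet[0] == 0x47
    pid = ((packet[1] & 0x1F) << 8) | packet[2]  # 13-bit packet ID
    payload_unit_start = bool(packet[1] & 0x40)
    continuity = packet[3] & 0x0F
    return pid, payload_unit_start, continuity, packet[4:]

pkt = bytes([0x47, 0x40, 0x11, 0x10]) + bytes(184)
print(parse_ts_packet(pkt)[:3])  # -> (17, True, 0)
```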
On the other hand, the foregoing demodulation unit 120 may be provided in a separate manner according to the ATSC method or DVB method. In other words, it can be provided with an ATSC demodulation unit and a DVB demodulation unit.
The stream signal outputted from the demodulation unit 120 may be input to the controller 170. The controller 170 may perform inverse-multiplexing, video/audio signal processing and the like, and then output video to the display unit 180, and output audio to the audio output unit 185.
The external device interface unit 135 may be provided to connect an external device with the image display apparatus 100. For this purpose, the external device interface unit 135 may include an A/V input and output unit (not shown) or wireless communication unit (not shown).
The external device interface unit 135 may be connected to an external device such as a digital versatile disc (DVD), a Blu-ray disc, a gaming device, a camera, a camcorder, a computer (notebook) and the like in a wired/wireless manner. The external device interface unit 135 may transfer video, audio or data signals received from the outside through an external device connected thereto to the controller 170 of the image display apparatus 100. Furthermore, the external device interface unit 135 may output video, audio or data signals processed by the controller 170 to the external device connected thereto. For this purpose, the external device interface unit 135 may include an A/V input and output unit (not shown) or wireless communication unit (not shown).
The A/V input and output unit may include a USB terminal, a Composite Video Banking Sync (CVBS) terminal, a component terminal, a S-video terminal (analog), a Digital Visual Interface (DVI) terminal, a High Definition Multimedia Interface (HDMI) terminal, a RGB terminal, a D-SUB terminal, and the like.
The wireless communication unit may perform short-range wireless communication with other electronic devices. The image display apparatus 100 may be connected to other electronic devices in a network according to a communication standard such as Bluetooth, Radio Frequency Identification (RFID), Infrared Data Association (IrDA), Ultra Wideband (UWB), ZigBee, Digital Living Network Alliance (DLNA), and the like.
Furthermore, the external device interface unit 135 may be connected to at least one of various set-top boxes and the foregoing various terminals to perform an input and output operation with the set-top box.
On the other hand, the external device interface unit 135 may receive an application or application list within the adjoining external device to transfer it to the controller 170 or the storage unit 140.
The network interface unit 130 provides an interface for connecting the image display apparatus 100 to a wired/wireless network including the Internet network. The network interface unit 130 may include an Ethernet terminal, or the like, for example, for the connection with a wired network, and a communication standard such as Wireless LAN (WLAN, Wi-Fi), Wireless broadband (Wibro), World Interoperability for Microwave Access (Wimax), High Speed Downlink Packet Access (HSDPA), for example, for the connection with a wireless network.
The network interface unit 130 may transmit or receive data to or from another user or another electronic device through a connected network or another network linked with the connected network. In particular, the network interface unit 130 may send part of the content data stored in the image display apparatus 100 to a previously registered user or a selected user or selected electronic device among other electronic devices.
On the other hand, the network interface unit 130 may be connected to a specific web page through a connected network or another network linked with the connected network. In other words, the network interface unit 130 may be connected to a specific web page through a network to send or receive data to or from the relevant server. In addition, the network interface unit 130 may receive content or data provided by a content provider or network operator. In other words, the network interface unit 130 may receive content, such as a movie, an advertisement, a game, VOD, or a broadcast signal, and information related to the content, provided from the content provider or network provider through a network. Furthermore, the network interface unit 130 may receive firmware update information or an update file provided by the network operator, and may send data to the Internet, content provider, or network operator.
Furthermore, the network interface unit 130 may receive a desired application among the applications open to the public through a network.
The storage unit 140 may store programs for each signal processing or control within the controller 170 and may store signal-processed video, audio, or data signals.
Furthermore, the storage unit 140 may perform a function for temporarily storing video, audio, or data signals received from the external device interface unit 135 or network interface unit 130. Furthermore, the storage unit 140 may store information for a predetermined broadcast channel through a channel storage function.
Furthermore, the storage unit 140 may store an application or application list received from the external device interface unit 135 or network interface unit 130.
Furthermore, according to an embodiment of the present disclosure, the storage unit 140 may store mapping data for the user's gesture with the operation of an image display apparatus or the operation on an application.
The storage unit 140 may, for example, include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (e.g., SD or XD memory), a Random Access Memory (RAM), a Read-Only Memory (EPROM, etc.), and the like. The image display apparatus 100 may reproduce a content file (a video file, a still image file, a music file, a document file, an application file, etc.) stored in the storage unit 140 and provide it to the user.
FIG. 2 illustrates an embodiment in which the storage unit 140 is provided in a separate manner from the controller 170, but the scope of the present invention is not limited to this. The storage unit 140 may be included in the controller 170.
The user input interface unit 150 may transfer the user's input signals to the controller 170 or transfer signals received from the controller 170 to the user.
For example, the user input interface unit 150 may receive and process control signals, such as power on/off, channel selection, screen setup and the like, generated from the remote control device 200 or transmit and process control signals generated from the controller 170 to the remote control device 200 according to various communication methods, such as radio frequency (RF) communication, infrared (IR) communication and the like.
Furthermore, for example, the user input interface unit 150 may transfer control signals received from a local key (not shown), such as a power key, a channel key, a volume key, a setting key and the like, to the controller 170.
Furthermore, for example, the user input interface unit 150 may transfer control signals received from a sensing unit (not shown) for sensing the user's gesture to the controller 170 or transmit signals generated from the controller 170 to the sensing unit (not shown). Here, the sensing unit (not shown) may include a touch sensor, a voice sensor, a location sensor, a motion sensor, and the like.
The controller 170 may inverse-multiplex a stream received from the tuner 110, the demodulation unit 120, or the external device interface unit 135, and process the inverse-multiplexed signals to generate and output signals for video or audio output.
The video signal that has been image-processed in the controller 170 may be input to the display unit 180 and displayed as video corresponding to the relevant video signal. Furthermore, the video signal that has been image-processed in the controller 170 may be input to an external output device through the external device interface unit 135.
The audio signal processed in the controller 170 may be audio-outputted to the audio output unit 185. Furthermore, the audio signal processed in the controller 170 may be input to an external output device through the external device interface unit 135.
Though not shown in FIG. 2, the controller 170 may include an inverse-multiplexing unit, a video processing unit and the like.
In addition, the controller 170 may control an overall operation within the image display apparatus 100. For example, the controller 170 may control the tuner 110 to tune a RF broadcast signal corresponding to the user's tuned channel or prestored channel.
Furthermore, the controller 170 may control the image display apparatus 100 by the user's command received through the user input interface unit 150 or internal program. In particular, a network may be connected thereto, thereby allowing the user's desired application or application list to be downloaded into the image display apparatus 100.
For example, the controller 170 may control the tuner 110 to receive a signal of the channel tuned according to a predetermined channel select command received through the user input interface unit 150. Then, the controller 170 processes the video, audio, or data signals of the tuned channel. The controller 170 may allow information on the channel tuned by the user to be outputted through the display unit 180 or the audio output unit 185 together with the processed video or audio signal.
For another example, the controller 170 may allow video or audio signals generated from an external device, for example, a camera or camcorder, received through the external device interface unit 135, to be outputted through the display unit 180 or the audio output unit 185 according to an external device video play command received through the user input interface unit 150.
On the other hand, the controller 170 may control the display unit 180 to display an image. For example, the controller 170 may control a broadcast image received through the tuner 110, an external input image received through the external device interface unit 135, an image received through a network interface unit, or an image stored in the storage unit 140, to be displayed on the display unit 180. Here, the image displayed on the display unit 180 may be a still or moving image, and otherwise, may be a 2D or 3D image.
Furthermore, the controller 170 may control content to be reproduced. The content at this time may be content stored within the image display apparatus 100, received broadcast content, or external input content received from the outside. The content may be at least one of a broadcast image, an external input image, an audio file, a still image, a connected web screen, and a document file.
On the other hand, in connection with an embodiment of the present invention, the controller 170 may control a home screen to be displayed on the display unit 180 according to an input for moving to the home screen.
The home screen may have a plurality of card objects classified by content sources. The card objects may include at least one of a card object indicating a thumbnail list of broadcast channels, a card object indicating a broadcast guide list, a card object indicating a broadcast reservation list or recording list, and a card object indicating a media list within the image display apparatus or within a device connected to the image display apparatus. Furthermore, the home screen may further include at least one of a card object indicating a connected external device list and a card object indicating a phone call related list.
Furthermore, the home screen may further include an application menu having at least one executable application item.
On the other hand, if there is a card object movement input, then the controller 170 may control the relevant card object to be moved and displayed, or a card object that has not been displayed on the display unit 180 to be moved and displayed on the display unit 180.
On the other hand, if a predetermined card object is selected from a plurality of card objects within the home screen, then the controller 170 may control an image corresponding to the relevant card object to be displayed on the display unit 180.
On the other hand, the controller 170 may control a received broadcast image and an object indicating the relevant broadcast image related information to be displayed within a card object displaying the broadcast image. Furthermore, the controller 170 may control the size of the broadcast image to be fixed by a lockout setup.
On the other hand, the controller 170 may control a setting object for at least one of an image setup, a screen setup, a reservation setup within the image display apparatus, a pointer setup in the remote control device, and a network setup, to be displayed in the home screen.
On the other hand, the controller 170 may control an object for a login, help, or logout item to be displayed in a region of the home screen.
On the other hand, the controller 170 may control an object indicating the total number of card objects, or the number of card objects displayed on the display unit 180 among them, to be displayed in a region of the home screen.
On the other hand, when a card object name within a predetermined card object among the card objects displayed on the display unit 180 is selected, the controller 170 may control the relevant card object to be displayed on the display unit 180 as a full screen.
On the other hand, when an incoming call is received at the connected external device or the image display apparatus, the controller 170 may control a call-related card object to be focused and displayed among the plurality of card objects, or a call-related card object to be moved and displayed on the display unit 180.
On the other hand, when entering an application view item, the controller 170 may control an application or application list within the image display apparatus 100, or an application or application list downloadable from an external network, to be displayed.
The controller 170 may control an application downloaded from an external network to be installed and driven, in addition to various user interfaces. Furthermore, the controller 170 may control an image related to an application being executed to be displayed on the display unit 180 by the user's selection.
On the other hand, though not shown in the drawing, the image display apparatus 100 may further include a channel browsing processing unit for generating a thumbnail image corresponding to a channel signal or an external input signal.
The channel browsing processing unit may receive a stream signal (TS) outputted from the demodulation unit 120, a stream signal outputted from the external device interface unit 135, or the like, extract an image from the received stream signal, and thereby generate a thumbnail image. The generated thumbnail image may be encoded as it is and input to the controller 170, or may also be encoded in a stream type and input to the controller 170. The controller 170 may display a thumbnail list having a plurality of thumbnail images on the display unit 180 using the input thumbnail images. On the other hand, the thumbnail images within the thumbnail list may be updated sequentially or simultaneously. As a result, the user may grasp the content of a plurality of broadcast channels in a convenient manner.
The display unit 180 may convert video, data and OSD signals that are processed by the controller 170, video and data signals that are received from the external device interface unit 135, or the like, into R, G, and B signals, respectively, to generate a drive signal.
The display unit 180 may be provided with a PDP, an LCD, an OLED, a flexible display, a 3D display, and the like.
On the other hand, the display unit 180 may be configured with a touch screen to be used as an input device in addition to an output device.
The audio output unit 185 may receive an audio-processed signal, for example, a stereo signal, a 3.1-channel signal or a 5.1-channel signal from the controller 170 to output it as audio. The audio output unit 185 may be implemented by various types of speakers.
On the other hand, to detect the user's gesture, the image display apparatus 100 may further include a sensing unit (not shown) having at least one of a touch sensor, a voice sensor (microphone), a location sensor and a motion sensor as described above. The signal detected by the sensing unit (not shown) may be transferred to the controller 170 through the user input interface unit 150.
On the other hand, a camera unit (not shown) for capturing a subject (an object or scene) may be further provided. Image information captured by the camera unit (not shown) may be input to the controller 170.
The camera unit (not shown) will be described in detail below with reference to FIG. 5.
The controller 170 may receive a captured image from the camera unit (not shown) or a detected signal from the sensing unit (not shown) respectively or in a combined manner to detect the user's gesture.
The power supply unit 190 may supply power throughout the entire image display apparatus 100.
In particular, the power supply unit 190 may supply power to the controller 170, which can be implemented in a system-on-chip (SOC) form, the display unit 180 for displaying video, and the audio output unit 185 for outputting audio.
For this purpose, the power supply unit 190 may include a converter (not shown) for converting alternating-current power into direct-current power. On the other hand, for example, if the display unit 180 is implemented as a liquid crystal panel having a plurality of backlight lamps, then an inverter (not shown) capable of performing a PWM operation may be further included therein for brightness variation or dimming driving.
The remote control device 200 transmits a user input to the user input interface unit 150. For this purpose, the remote control device 200 may use various communication techniques such as Bluetooth™, Radio Frequency (RF) communication, Infrared (IR) communication, Ultra Wideband (UWB), ZigBee™, and the like.
In addition, the remote control device 200 may receive video, audio, or data signals outputted from the user input interface unit 150 and display them on the remote control device 200 or output them as audio or vibration.
The foregoing image display apparatus 100 may be a fixed-type digital broadcast receiver capable of receiving at least one of ATSC (8-VSB) broadcast services, DVB-T (COFDM) broadcast services, and ISDB-T (BST-OFDM) broadcast services.
On the other hand, the block diagram of the image display apparatus 100 illustrated in FIG. 2 is a block diagram for an embodiment of the present invention. Each constituent element in the block diagram may be integrated, added, or deleted according to the specification of an actually implemented image display apparatus 100. In other words, according to circumstances, two or more constituent elements may be integrated into one constituent element, or one constituent element may be divided into two or more constituent elements. Furthermore, the function carried out in each block is provided to describe an embodiment of the present invention, and the detailed operation or device does not limit the scope of rights of the present invention.
On the other hand, unlike FIG. 2, the image display apparatus 100 may not include the tuner 110 and the demodulation unit 120, and may instead receive or play video content through the network interface unit 130 or the external device interface unit 135.
On the other hand, the image display apparatus 100 is an example of a video signal-processing device that performs signal processing for an image stored in the device or input to the device. Other examples of the video signal-processing device may include a set-top box excluding the display unit 180 and the audio output unit 185, the foregoing DVD player, a Blu-ray player, a gaming device, a computer, and the like. Among them, the set-top box will be described below with reference to FIGS. 3 and 4.
FIGS. 3 and 4 are views illustrating that any one of the image display apparatuses according to the embodiments of the present invention is separated into a set-top box and a display device.
First, referring to FIG. 3, a set-top box 250 and a display device 300 may transmit or receive data in a wired or wireless manner.
The set-top box 250 may include a network interface unit 255, a storage unit 258, a signal-processing unit 260, a user input interface unit 263, and an external device interface unit 265.
The network interface unit 255 provides an interface for connecting to a wired/wireless network including the Internet network. Furthermore, the network interface unit 255 may transmit or receive data to or from another user or another electronic device through a connected network or another network linked with the connected network.
The storage unit 258 may store programs for each signal processing or control within the signal processing unit 260, and perform a function for temporarily storing video, audio, or data signals received from the external device interface unit 265 or network interface unit 255.
The signal-processing unit 260 performs signal processing for input signals. For example, the signal-processing unit 260 may perform inverse-multiplexing or decoding for input video signals, and inverse-multiplexing or decoding for input audio signals. For this purpose, the signal-processing unit 260 may further include a video decoder or an audio decoder. The signal-processed video or audio signals may be transmitted to the display device 300 through the external device interface unit 265.
The user input interface unit 263 transfers the user's input signals to the signal processing unit 260, or transfers signals generated from the signal-processing unit 260 to the user. For example, the user input interface unit 263 may receive various control signals, such as power on/off, operation input, setup input and the like, received through the local key (not shown) or the remote control device 200, and transfer them to the signal processing unit 260.
The external device interface unit 265 provides an interface for transmitting or receiving data to or from an external device connected in a wired/wireless manner. In particular, the external device interface unit 265 provides an interface for transmitting or receiving data to or from the display device 300. In addition, it may be also possible to provide an interface for transmitting or receiving data to or from an external device, such as a gaming device, a camera, a camcorder, a computer (notebook) and the like.
On the other hand, the set-top box 250 may further include a media input unit (not shown) for reproducing media. Examples of such a media input unit include a Blu-ray input unit (not shown). In other words, the set-top box 250 may be provided with a Blu-ray player. Input media such as a Blu-ray disc may be signal-processed with inverse-multiplexing or decoding in the signal-processing unit 260, and then transmitted to the display device 300 through the external device interface unit 265 for display.
The display device 300 may include a tuner 270, an external device interface unit 273, a demodulation unit 275, a storage unit 278, a controller 280, a user input interface unit 283, a display unit 290, and an audio output unit 295.
The tuner 270, demodulation unit 275, storage unit 278, controller 280, user input interface unit 283, display unit 290, and audio output unit 295 correspond to the tuner 110, demodulation unit 120, storage unit 140, controller 170, user input interface unit 150, display unit 180, and audio output unit 185 described above with reference to FIG. 2, and thus their description will be omitted.
On the other hand, the external device interface unit 273 provides an interface for transmitting or receiving data to or from an external device connected in a wired/wireless manner. In particular, the external device interface unit 273 provides an interface for transmitting or receiving data to or from the set-top box 250.
As a result, video or audio signals received through the set-top box 250 are passed through the controller 280, and then outputted through the display unit 290 or the audio output unit 295.
On the other hand, referring to FIG. 4, the set-top box 250 and display device 300 are similar to the set-top box 250 and display device 300 illustrated in FIG. 3, but there exists a difference in that the tuner 270 and demodulation unit 275 are not located within the display device 300 but located within the set-top box 250. Hereinafter, the difference will be primarily described.
The signal-processing unit 260 may perform signal processing for broadcast signals received through the tuner 270 and demodulation unit 275. Furthermore, the user input interface unit 263 may receive an input such as channel selection, channel storage and the like.
FIG. 5 is a view specifically illustrating a camera unit of the image display apparatus according to an embodiment of the present invention.
According to an embodiment, the camera unit 400 may include a plurality of cameras capable of acquiring different kinds of information from one another, in order to acquire various information through the camera unit 400.
Referring to FIG. 5, a camera unit 400 according to an embodiment of the present invention may include depth cams 401, 402, a RGB cam 403 (hereinafter, a camera refers to at least one of the depth cams 401, 402 and the RGB cam 403), a camera memory 404, a camera controller 405, and audio receivers 406, 407. The depth cam may include a depth CMOS image sensor 401, and an infrared (IR) light source 402. The audio receiver may include a microphone 406 and a sound source recognition unit 407.
The depth cam is a camera in which each pixel value recognized from a captured image represents the distance from the depth cam to the subject.
The depth cam may include an image sensor 401 and an infrared light source 402. Two methods exist: a time-of-flight (TOF) method, in which infrared light is emitted from the infrared light source 402 and distance information between the subject and the depth cam is acquired from the phase difference between the emitted infrared light and the infrared light reflected back from the subject; and a structured light method, in which an infrared light pattern (numerous infrared light points) is emitted from the infrared light source 402, the patterns reflected from the subject are captured by the image sensor 401 having a filter, and distance information between the subject and the depth cam is acquired based on the manner in which the patterns are distorted.
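By way of illustration, the time-of-flight relation above can be expressed numerically. The following sketch is not part of the disclosure; the continuous-wave formulation, modulation frequency, and phase value are illustrative assumptions.

```python
import math

C = 3.0e8  # speed of light (m/s)

def tof_distance(phase_shift_rad: float, modulation_freq_hz: float) -> float:
    """Distance from a continuous-wave time-of-flight measurement.

    The IR source is modulated at modulation_freq_hz; the sensor measures
    the phase shift between emitted and reflected light. The round trip
    takes phase/(2*pi*f) seconds, so the one-way distance is
        d = c * phase / (4 * pi * f).
    """
    return C * phase_shift_rad / (4.0 * math.pi * modulation_freq_hz)

# Example: a 90-degree phase shift at a 20 MHz modulation frequency
print(tof_distance(math.pi / 2, 20e6))  # ~1.87 m
```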
Through such depth measurements, the image display apparatus may recognize the location information of the subject. In particular, if the subject is a person, then the image display apparatus may acquire the location coordinates of each portion of the body and track the movement of each portion, thereby acquiring information on the detailed motion of the body.
In addition, the RGB cam 403 is a camera that acquires color information as a pixel value. The RGB cam may include three CMOS image sensors for acquiring information on each color of red (R), green (G), and blue (B). Furthermore, the RGB cam may acquire a relatively high-resolution image compared to the depth cam.
The camera memory 404 stores setting values of the depth cam and the RGB cam. In other words, when a signal for capturing a subject is received from the user, the camera controller 405 analyzes an image received from the cameras and loads a camera setting value based on the analysis result to configure the capturing environment of the depth cam and the RGB cam.
Furthermore, the camera memory 404 may store an image captured through the depth cam and the RGB cam, and the stored image may be loaded when a call signal of the stored image is received from the user.
The microphone 406 receives sound waves or ultrasonic waves existing in the neighborhood of the camera unit 400 and transmits an electrical signal based on the vibration to the camera controller 405. In other words, the user may cause his or her voice to be input to the image display apparatus through the microphone 406 and stored together with an image input through the camera unit 400, and may cause a predetermined operation to be carried out in the image display apparatus through the input audio.
If predetermined content or a service is being used in the image display apparatus, then the sound source recognition unit 407 receives an audio signal of the content or service being used and transmits a corresponding electrical signal to the camera controller 405. In other words, unlike the microphone 406, the sound source recognition unit 407 extracts and recognizes an audio signal from broadcast signals received by the image display apparatus.
The camera controller 405 controls the operation of each module. In other words, upon receiving a capture start signal, the camera controller 405 may control a subject to be captured through the depth cam and the RGB cam, analyze the captured image, and load setup information from the camera memory 404, thereby controlling the depth cam and the RGB cam. Furthermore, when an audio signal is received through the audio receiver, the camera controller 405 may store the input audio signal together with an image signal captured through the depth cam and the RGB cam in the camera memory 404.
Through the foregoing configuration, the user may input a predetermined image and voice to the image display apparatus, and control the image display apparatus through the input image or voice.
FIG. 6 is a flow chart illustrating the process of controlling an object in an image display apparatus according to a first embodiment disclosed herein.
Hereinafter, the microphone refers to a means for acquiring a voice, and the camera refers to a means for acquiring an image. The microphone and the camera are included in the image display apparatus as described in FIG. 1. Alternatively, the microphone and the camera are included in the camera unit 400 as described in FIG. 5.
The microphone acquires a voice, and the camera acquires an image (S110).
For example, the microphone detects the user's voice (natural type voice) existing in the neighborhood, and outputs the detected voice to the controller 170. Then, the camera captures a detection target, and outputs the captured image to the controller 170. Then, the controller 170 extracts first characteristic information from the voice (S120).
The controller 170 recognizes a voice input from the microphone by converting it into a text string or the like, and converts the recognized voice in the form of a text string or the like into the meaning structure having a previously formulated form to extract the meaning of the voice.
Here, the meaning structure having a previously formulated form refers to a meaning structure having the form of a command associated with the control of the image display apparatus 100 or an application provided by the image display apparatus 100, such as "thickness", "double", "volume", "TV volume", and the like, which will be described later.
Furthermore, the controller 170 designates an interest object based on the first characteristic information (S130).
Here, an object refers to an entity, such as an application, application data, content data, a menu, and the like, to which the user may apply an input to control the operation of the image display apparatus. Furthermore, the interest object may refer to an object that can be controlled by an input currently applied by the user, among a plurality of objects capable of controlling the operation of the image display apparatus. In other words, the interest object refers to an object over which the user has control authority.
The controller 170 designates an object corresponding to the meaning of a voice extracted in the step of S120 as an interest object among a plurality of objects. For example, if "volume" is extracted in the step of S120, then a volume menu may be designated as an interest object.
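A minimal sketch of this designation step, assuming a simple table that maps recognized text strings to interest objects (the table entries and object names are hypothetical, not from the disclosure):

```python
# Hypothetical command table: recognized text string -> interest object.
COMMAND_TABLE = {
    "thickness": "pen_thickness_menu",
    "volume": "volume_menu",
    "tv volume": "broadcast_volume_menu",
}

def designate_interest_object(recognized_text: str):
    """Step S130: return the interest object designated by the first
    characteristic information, or None if no formulated command matches."""
    return COMMAND_TABLE.get(recognized_text.strip().lower())

print(designate_interest_object("Volume"))  # -> 'volume_menu'
```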
Then, the controller 170 extracts second characteristic information from the image based on the interest object (S140).
The controller 170 recognizes an image received from the camera by converting it into the form of a gesture or the like, and converts the recognized gesture into the meaning structure having a previously formulated form to extract the meaning of the image for an interest object.
Here, the meaning structure having a previously formulated form refers to a meaning structure having the form of a command associated with the control of the image display apparatus 100 or an application provided by the image display apparatus 100, such as "the motion of raising up a hand" or "the shape of a finger indicating the right side" of a detection target, which will be described later.
If the interest object is a volume menu, then the controller 170 may extract a change of the coordinate value in the Y-axis direction from "the motion of raising up a hand".
Then, the controller 170 reflects the second characteristic information on the interest object (S150).
For example, if the interest object is a volume menu, then the controller 170 may perform an operation for increasing the volume based on a change of the coordinate value in the Y-axis direction of the detection target in an image.
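Steps S140 and S150 for this example can be sketched as follows, assuming the gesture recognizer reports fingertip coordinates per frame; the frame format, scaling, and volume range are illustrative assumptions.

```python
def reflect_gesture_on_volume(frames, volume, step_per_unit=1):
    """Steps S140-S150 for a volume menu: extract the Y-axis change of the
    detection target (second characteristic information) and apply it.

    frames: list of (x, y, z) fingertip coordinates over the gesture.
    """
    if len(frames) < 2:
        return volume
    delta_y = frames[-1][1] - frames[0][1]
    return max(0, min(100, volume + int(delta_y * step_per_unit)))

# Raising the hand by 10 units increases the volume by 10 levels:
print(reflect_gesture_on_volume([(0, 2, 5), (0, 12, 5)], volume=30))  # 40
```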
FIG. 7 is an exemplary view illustrating the process of controlling an object using the depth information of a detection target according to a first embodiment disclosed herein.
Referring to FIG. 7, an application 510 capable of displaying a line along the movement of a detection target, for example, a finger, a palm, a stylus or the like, may be executed in the image display apparatus 100. When the user moves a fingertip, a change of the coordinate value may be extracted from an image to display a line 512a corresponding to the change of the coordinate value on the display unit. In this state, if the user wants to display a thicker line, then the pen thickness setup should be changed by using a tool provided by the application.
For example, the user may take a gesture (1) for moving a finger forward while uttering "thickness". In this case, the microphone may acquire a voice of "thickness", and the camera may acquire an image including the gesture (1) for moving a finger forward.
Then, the controller 170 may extract a command of "thickness" from the voice, and select a pen thickness menu 520. The pen thickness menu 520 may not be displayed on the display unit of the image display apparatus 100 during the process of setting up a pen thickness.
Then, the controller 170 may extract a change value in the Z-axis direction of a fingertip from the acquired image, and set up a thickness value of the pen thickness menu 520 using the extracted change value. For example, if a difference of the depth value between the start point and the end point in the gesture (1) is 4, then the controller 170 may change a thickness value of the pen thickness menu 520, for example, from 1 to 5. The user may subsequently take a gesture (2) for moving a finger to the side, and a line 512b corresponding to the gesture (2) may be displayed with the thickness of 5.
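A minimal sketch of this thickness mapping, assuming one depth unit corresponds to one thickness step (an illustrative assumption, not part of the disclosure):

```python
def updated_pen_thickness(start_depth, end_depth, current_thickness=1):
    """Map the Z-axis (depth) change of the fingertip onto the pen
    thickness menu; one depth unit is assumed to equal one thickness step."""
    return max(1, current_thickness + (end_depth - start_depth))

# A depth difference of 4 changes the thickness from 1 to 5, as in FIG. 7:
print(updated_pen_thickness(start_depth=0, end_depth=4))  # 5
```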
FIG. 8 is an exemplary view illustrating the process of controlling an object using the shape information of a detection target according to a first embodiment disclosed herein.
Referring to FIG. 8, an application 610 capable of reproducing video in the image display apparatus 100 may be executed. If the user wants to control a reproduction speed during the reproduction of video, then the reproduction speed setup should be changed by using a reproduction speed menu provided in the application.
For example, the user may take a gesture for unfolding two fingers only while uttering "fast". In this case, the microphone may acquire a voice of "fast", and the camera may acquire an image including a shape for unfolding two fingers only.
Then, the controller 170 may extract a command of "fast" from the voice and select a reproduction speed menu 620. The reproduction speed menu 620 may not be displayed on the display unit of the image display apparatus 100 during the process of setting up a reproduction speed.
Then, the controller 170 may extract the number of fingers from the shape information of a detection target in the acquired image, and set up a speed value of the reproduction speed menu 620 using the extracted number of fingers. For example, if the number of unfolded fingers is 2, then the controller 170 may change the reproduction speed from a single speed to a double speed.
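A minimal sketch of this shape-based mapping, assuming the number of unfolded fingers directly selects the speed multiplier (an illustrative assumption):

```python
def reproduction_speed(unfolded_fingers: int) -> float:
    """Shape information -> speed: one finger = single (1x) speed,
    two fingers = double (2x) speed, and so on."""
    return float(max(1, unfolded_fingers))

print(reproduction_speed(2))  # 2.0 (double speed)
```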
FIG. 9 is an exemplary view illustrating the process of controlling an object in a region excluding a display region according to a first embodiment disclosed herein.
Referring to FIG. 9, an application 710 capable of receiving broadcast in the image display apparatus 100 may be executed. If the user wants to increase a volume of the channel being viewed while receiving broadcast, then the volume setup should be changed by using a volume menu provided in the application.
For example, the user may take a gesture for moving a finger from the bottom to the top while uttering "volume". In this case, the microphone may acquire a voice of "volume", and the camera may acquire an image including a gesture for moving a finger from the bottom to the top.
Then, the controller 170 may extract a command of "volume" from the voice and select a volume setup menu 720. The volume setup menu 720 may not be displayed on the display unit of the image display apparatus 100 during the process of setting up a volume.
Then, the controller 170 may extract a change value in the Y-axis direction of a fingertip from the acquired image, and set up a level of the volume setup menu 720 using the extracted change value. For example, if the user takes a gesture for moving a finger from the bottom to the top, then the controller 170 may increase the volume. Furthermore, the controller 170 may determine a level of increasing the volume based on a change of the coordinate value between the start point and the end point of a gesture, a speed, an acceleration, or the like while taking a gesture, and reflect the determined level on the volume setup menu 720.
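A sketch of how such a level could be weighted by both displacement and gesture speed, as suggested above; the gain constants are illustrative assumptions.

```python
def volume_step(start_y, end_y, duration_s, gain=1.0, speed_gain=0.5):
    """Level change from both the Y-axis displacement of the gesture and
    the speed at which it was taken; the gains are assumed constants."""
    displacement = end_y - start_y
    speed = abs(displacement) / max(duration_s, 1e-3)
    return int(displacement * (gain + speed_gain * speed))

# The same 10-unit upward gesture raises the volume more when done quickly:
print(volume_step(0, 10, duration_s=1.0))  # 60
print(volume_step(0, 10, duration_s=5.0))  # 20
```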
Here, the gesture need not be a gesture that directly (or explicitly) indicates the interest object in the display region. For example, any gesture taken within the region that can be detected by the camera (surrounded by a dotted line), as well as within the region (surrounded by a solid line) extended in the Z-axis direction with respect to the display region, may be reflected on the interest object. In other words, as long as the gesture allows characteristic information that can be reflected on the interest object to be extracted, the gesture need not indicate the interest object even when the interest object is displayed on the display unit.
FIG. 10 is an exemplary view illustrating the process of controlling an application being executed in the background according to a first embodiment disclosed herein.
Referring to FIG. 10, the image display apparatus 100 may support multitasking. In other words, the image display apparatus 100 may execute a web browsing application on the foreground while executing a broadcast receiving program on the background. The image of an application being executed on the foreground may be displayed on the screen. If the user wants to increase a volume of the channel being viewed while executing a web browsing application 810, then a broadcast receiving application should be called to be executed on the foreground and the volume setup should be changed by using a volume menu provided in the broadcast receiving application.
For example, the user may take a gesture for moving a finger from the bottom to the top while uttering "TV volume". In this case, the microphone may acquire a voice of "TV volume", and the camera may acquire an image including a gesture for moving a finger from the bottom to the top.
Then, the controller 170 may extract a command of "TV volume" from the voice and select a volume setup menu 820 of the broadcast receiving application. The volume setup menu 820 may not be displayed on the display unit of the image display apparatus 100 during the process of setting up a volume.
Then, the controller 170 may extract a change value in the Y-axis direction of a fingertip from the acquired image, and set up a level value of the volume setup menu 820 using the extracted change value. For example, if the user takes a gesture for moving a finger from the bottom to the top, then the controller 170 may increase the volume. Furthermore, the controller 170 may determine a level of increasing the volume based on a change of the coordinate value between the start point and the end point of a gesture, a speed, an acceleration, or the like while taking a gesture, and reflect the determined level on the volume setup menu 820.
FIGS. 11 and 12 are views for explaining the process of controlling an object based on an acquisition time/acquisition order of the voice and image according to a first embodiment disclosed herein.
Referring to FIG. 11, the controller 170 may extract first characteristic information (C1) from a voice, and detect an acquisition start time (t1) and an acquisition end time (t2) of the first characteristic information (C1). In this case, second characteristic information (characteristic information extracted from an image) whose acquisition start time is located between t1 and t3 may be reflected on an interest object designated by the first characteristic information (C1). Here, the time between t2 and t3 is a threshold time, which defines how long the second characteristic information can still be reflected on an interest object designated by the first characteristic information.
The threshold time is a minimum time required for allowing the user to combine a voice and a gesture to control the interest object, which may be determined by an experiment or may be changed by the user.
For example, second characteristic information (C2) whose acquisition start time is located between t1 and t2, and second characteristic information (C3) whose acquisition start time is located between t2 and t3, may be reflected on an interest object designated by the first characteristic information (C1). On the contrary, second characteristic information (C4) whose acquisition start time is later than t3 may not be reflected on the interest object designated by the first characteristic information (C1).
Referring to FIG. 12, the same standard may of course be applied according to the acquisition start time of the first characteristic information and the acquisition end time of the second characteristic information. For example, if the acquisition end time of the second characteristic information falls within a threshold time of the acquisition start time of the first characteristic information (C1), then the second characteristic information may be reflected on the interest object designated by the first characteristic information (C1).
For example, the second characteristic information (C2) in which the acquisition end time is located within a threshold time from the acquisition start time of the first characteristic information (C1) as well as the second characteristic information (C3) in which the acquisition start time is located within a threshold time from the acquisition end time of the first characteristic information (C1) may be reflected on an interest object designated by the first characteristic information (C1).
On the other hand, if there exist a plurality of pieces of second characteristic information (C2 and C3) whose acquisition end time and acquisition start time fall within a threshold time of the acquisition start time and the acquisition end time of the first characteristic information (C1), respectively, then at least one of the second characteristic information (C2) acquired first and the second characteristic information (C3) acquired later may be reflected on the interest object.
For example, if the user takes a first gesture of raising up a finger, utters "volume", and takes a second gesture of raising up a finger again, then the first gesture and the second gesture may be sequentially reflected on a volume menu, respectively. On the contrary, if the user takes a first gesture indicating a specific region on the display unit, utters "copy", and then takes a second gesture indicating another specific region on the display unit, then only one of the first gesture and the second gesture may be reflected on the copy menu according to the setup of the image display apparatus.
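The timing rule of FIGS. 11 and 12 can be sketched as a simple window test; the 0.5-second threshold is an assumed value, consistent with the statement above that the threshold may be determined by experiment or changed by the user.

```python
THRESHOLD_S = 0.5  # assumed threshold time

def is_associated(voice_start, voice_end, gesture_start, gesture_end):
    """A gesture is associated with a voice command if it starts between
    the start of the voice and a threshold time after its end (FIG. 11),
    or ends within the threshold time of the voice's start (FIG. 12)."""
    starts_in_window = voice_start <= gesture_start <= voice_end + THRESHOLD_S
    ends_near_voice_start = abs(gesture_end - voice_start) <= THRESHOLD_S
    return starts_in_window or ends_near_voice_start

# Voice "volume" acquired between t=1.0 s and t=2.0 s:
print(is_associated(1.0, 2.0, 1.2, 1.6))  # True  (starts during the voice)
print(is_associated(1.0, 2.0, 2.3, 2.8))  # True  (starts within the threshold)
print(is_associated(1.0, 2.0, 0.2, 0.7))  # True  (ends just before the voice)
print(is_associated(1.0, 2.0, 3.0, 3.4))  # False (outside the window)
```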
FIG. 13 is a flow chart illustrating the process of controlling an object in an image display apparatus according to a second embodiment disclosed herein.
The microphone acquires a voice, and the camera acquires an image (S210).
For example, the microphone detects the user's voice (natural type voice) existing in the neighborhood, and outputs the detected voice to the controller 170. Then, the camera captures a detection target, and outputs the captured image to the controller 170. Then, the controller 170 extracts first and second characteristic information from the voice (S220).
The controller 170 recognizes a voice input from the microphone by converting it into a series of text strings or the like, and converts the recognized voice in the form of a series of text strings or the like into the meaning structure having a previously formulated form to extract the meaning of the voice.
The controller 170 designates a first and a second interest object based on the first and the second characteristic information (S230).
The controller 170 designates objects corresponding to the meaning of the voice extracted in the step of S220 as interest objects among a plurality of objects. For example, if "move" and "enlarge" are extracted in the step of S220, then a move menu and an enlarge menu may be designated as the interest objects, respectively.
The controller 170 extracts third and fourth characteristic information from the image based on the first and the second interest object (S240).
The controller 170 recognizes an image received from the camera by converting it into the form of a gesture or the like, and converts the recognized gesture into the meaning structure having a previously formulated form to extract the meaning of the image for each interest object.
Here, the meaning structure having a previously formulated form refers to a meaning structure having the form of a command associated with the control of the image display apparatus 100 or an application provided by the image display apparatus 100, such as "the motion of moving a hand to the side" and "the motion of moving a hand forward" of a detection target, which will be described later.
The controller 170 may extract the x, y coordinates of the start point and the end point from "the motion of moving a hand to the side" if the interest object is a move menu, and extract a change of the coordinate value in the Z-axis direction from "the motion of moving a hand forward" if the interest object is an enlarge menu.
Then, the controller 170 reflects the third and the fourth characteristic information on the first and the second interest object, respectively (S250).
For example, if the interest objects are a move menu and an enlarge menu, respectively, then the controller 170 may perform a movement operation based on a change of the x, y coordinate values between the start point and the end point extracted from "the motion of moving a hand to the side" in the image, and may perform an enlargement operation based on a change of the coordinate value in the Z-axis direction extracted from "the motion of moving a hand forward" in the image. As a result, content to which the move menu and the enlarge menu are sequentially applied may be displayed on the screen.
FIGS. 14a and 14b are exemplary views illustrating the process of controlling a plurality of objects according to a second embodiment disclosed herein.
Referring to FIG. 14a, if the user wants to designate and copy a specific region in a first region 910 and then paste the copied specific region to a second region 920 while executing a content reproduction application in the image display apparatus 100, then he or she may designate the specific region and then call a copy command, and designate a region to which the copied region is pasted and then call a paste command.
For example, referring to FIG. 14a, the user may take a gesture for designating a region to be copied while uttering "copy". In this case, the microphone may acquire a voice of "copy", and the camera may acquire an image including a gesture for designating a region to be copied.
Then, the controller 170 may extract an instruction of "copy" from the voice, and call a copy command.
Then, the controller 170 may extract the points indicated by the ends of two fingers from the acquired image, and select a rectangular region in which the line segment connecting the two points is a diagonal. Then, the controller 170 may store the content in the rectangular region.
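A minimal sketch of this region selection, assuming two fingertip coordinates are available (the coordinate values are illustrative):

```python
def rect_from_diagonal(p1, p2):
    """Return (left, top, width, height) of the rectangle whose diagonal
    is the line segment connecting the two fingertip points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    left, top = min(x1, x2), min(y1, y2)
    return left, top, abs(x2 - x1), abs(y2 - y1)

print(rect_from_diagonal((40, 120), (200, 30)))  # (40, 30, 160, 90)
```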
Subsequently, referring to FIG. 14b, the user may take a gesture for designating a region to which the content of the copied region is to be pasted while uttering "paste". In this case, the microphone may acquire a voice of "paste", and the camera may acquire an image including a gesture for designating a region to which the content of the copied region is to be pasted.
Then, the controller 170 may extract an instruction of "paste" from the voice, and call a paste command.
Then, the controller 170 may extract a point designated by an end of the finger from the acquired image, and paste the stored content to the extracted point.
If the user takes the gesture of FIG. 14a and utters a voice of "copy" within a threshold time, and then takes the gesture of FIG. 14b within a threshold time again, then both the gesture of FIG. 14a and the gesture of FIG. 14b may be characteristic information that can be reflected on the copy menu. However, since the copy menu requires both a start point and an end point to be designated, as in the gesture of FIG. 14a, the gesture of FIG. 14b, in which only one point has been selected, may not be reflected on the copy menu. Accordingly, the controller 170 may reflect only the gesture of FIG. 14a on the copy menu.
Contrary to the drawing, if the gesture of FIG. 14b were a gesture in which two points have been selected, similarly to the gesture of FIG. 14a, then only the gesture of FIG. 14a, which was acquired first, may be reflected on the copy menu, or only the gesture of FIG. 14b, which was acquired later, may be reflected on the copy menu.
FIGS. 15a and 15b are exemplary views illustrating the process of sequentially controlling a plurality of objects according to a second embodiment disclosed herein.
Referring to FIG. 15a, the image display apparatus 100 may execute a map application according to the user's input. The map application may be executed, and a map corresponding to the region selected by the user's input may be displayed on the screen 1010. If the user wants to move to a specific point 1012 and enlarge the corresponding portion in the map, then he or she may first select a move menu to move to the specific point 1012, and select an enlarge menu to choose a magnification factor, thereby allowing the specific point 1012 to be enlarged and displayed on the screen 1010.
For example, the user may take a gesture (1) for moving a finger from the left side to the right side while uttering "move", and subsequently take a gesture (2) for moving a finger forward while uttering "enlarge". In this case, the microphone may acquire a voice of "move" and a voice of "enlarge", and the camera may acquire an image including a gesture (1) for moving a finger from the left side to the right side and a gesture (2) for moving a finger forward.
Then, the controller 170 may extract the commands of "move" and "enlarge" from the voice and designate a move menu and an enlarge menu as the interest objects, respectively. The move menu and enlarge menu may not be displayed on the display unit of the image display apparatus 100 during the process of moving and enlarging the content.
Then, the controller 170 may extract a change value of the x, y coordinate of the start point and the end point of the gesture (1) from the acquired image, and extract a change of the coordinate value in the Z-axis direction from gesture (2).
Then, the controller 170 stores the extracted values in association with the corresponding interest objects. Then, upon receiving an execution command, the controller 170 may call each of the extracted values, move the map according to the change value of the x, y coordinates between the start point and the end point, and subsequently enlarge the map by a magnification factor corresponding to the change value of the Z-axis coordinate in the state in which the map has been moved. As a result, referring to FIG. 15b, the content may be moved so that the specific region 1012 is displayed at the center, and then enlarged and displayed on the screen 1020 around the center of the specific region 1012.
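A sketch of this store-then-execute behavior; the map view API (pan/zoom) is a hypothetical stand-in, not from the disclosure.

```python
class MapViewStub:
    """Hypothetical stand-in for the map application's view."""
    def pan(self, dx, dy):
        print(f"pan by ({dx}, {dy})")
    def zoom(self, factor):
        print(f"zoom x{factor}")

class DeferredMapCommands:
    """Store each extracted value with its interest object, then replay
    them in acquisition order when the execution command arrives."""
    def __init__(self, map_view):
        self.map_view = map_view
        self.pending = []  # (interest_object, extracted_value) pairs

    def store(self, interest_object, value):
        self.pending.append((interest_object, value))

    def execute(self):
        for interest_object, value in self.pending:
            if interest_object == "move":
                self.map_view.pan(*value)      # x, y change from gesture (1)
            elif interest_object == "enlarge":
                self.map_view.zoom(1 + value)  # Z-axis change from gesture (2)
        self.pending.clear()

cmds = DeferredMapCommands(MapViewStub())
cmds.store("move", (120, -40))  # gesture (1)
cmds.store("enlarge", 0.5)      # gesture (2)
cmds.execute()                  # moves first, then enlarges
```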
FIG. 16 is an exemplary view illustrating the process of simultaneously controlling a plurality of objects according to a second embodiment disclosed herein.
The image display apparatus 100 may support a multitasking function for simultaneously executing two or more applications. For example, the image display apparatus 100 may execute an e-book application according to the user's first input while executing a broadcast receiving application according to the user's second input. Furthermore, the image display apparatus 100 may divide a screen into a first region 1110 and a second region 1120, and display an execution image of the e-book application on the first region 1110 and an execution image of the broadcast receiving application on the second region 1120, respectively.
If the first user wants to change a page in the e-book application and the second user wants to change a channel in the broadcast receiving application, then the first user may acquire control of a page change menu of the e-book application to change the page, and the second user may acquire control of a channel change menu of the broadcast receiving application to change the channel.
If the first user takes a gesture for moving a finger from the top to the bottom while uttering "page", and the second user takes a gesture for moving a finger from the bottom to the top while uttering "channel", then a page turn of the e-book application may occur according to the first user's gesture, and a channel switch of the broadcast receiving application may occur according to the second user's gesture.
In other words, the microphone may acquire a voice of "page" and a voice of "channel", and the camera may acquire an image including a gesture for moving a finger from the top to the bottom and a gesture for moving a finger from the bottom to the top.
Then, the controller 170 may extract a command of "page" from the voice and select a page setup menu of the e-book application. At the same time, the controller 170 may extract a command of "channel" from the voice and select a channel setup menu of the broadcast receiving application. The page setup menu and channel setup menu may not be displayed on the display unit of the image display apparatus 100 during the process of setting up the page and channel.
Then, the controller 170 may extract a change value in the Y-axis direction of the finger from the first user's gesture in the acquired image and a change value in the Y-axis direction of the finger from the second user's gesture, and associate the extracted change values with the selected menus, respectively. Furthermore, the controller 170 may set up the page and the channel using the extracted change values, respectively.
For example, the controller 170 may control a page subsequent to the currently displayed page to be displayed in the first region according to the first user's gesture. At the same time, the controller 170 may control a channel prior to the currently displayed channel to be displayed in the second region according to the second user's gesture.
On the other hand, during this process, the controller 170 may use various methods to associate "page" with the first user's gesture and "channel" with the second user's gesture.
For example, a gesture generated from a detection region corresponding to the first region displayed with an execution image of the e-book application may be associated with a page setup menu, and a gesture generated from a detection region corresponding to the second region displayed with an execution image of the broadcast receiving application may be associated with a channel setup menu.
Alternatively, the voice patterns of "page" and "channel" may be recognized and compared with each user's voice pattern to discern which gesture corresponds to which voice. Alternatively, if the types of gesture applicable to "page" and "channel" are different, then each voice and image may be associated according to the type of the gesture.
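The first, region-based association method can be sketched as a simple hit test; the region coordinates and application names are illustrative assumptions.

```python
# Assumed screen split: e-book on the left half, broadcast on the right.
REGIONS = {
    "e_book":    (0, 0, 960, 1080),    # first region (page setup menu)
    "broadcast": (960, 0, 960, 1080),  # second region (channel setup menu)
}

def route_gesture(x, y):
    """Return the application associated with a gesture detected at (x, y)."""
    for app, (left, top, w, h) in REGIONS.items():
        if left <= x < left + w and top <= y < top + h:
            return app
    return None

print(route_gesture(300, 500))   # 'e_book'    -> page setup menu
print(route_gesture(1500, 500))  # 'broadcast' -> channel setup menu
```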
On the other hand, a method of controlling an object in an electronic device according to an embodiment disclosed herein may be implemented as codes readable by a processor on a medium readable by the processor provided in the electronic device. The processor-readable media include all types of recording devices in which data readable by the processor can be stored. Examples of the processor-readable media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, optical data storage devices, and the like, and also include implementation in the form of a carrier wave, such as transmission via the Internet. Furthermore, the processor-readable media may be distributed over a computer system connected via a network, so that processor-readable codes may be stored and executed in a distributed manner.
While the invention has been shown and described with respect to various embodiments of the present invention, it will of course be understood by those skilled in the art that various modifications may be made without departing from the gist of the invention as defined in the following claims, and it is to be noted that those modifications should not be understood separately from the technical spirit and scope of the present invention.
Claims (20)
- An image display apparatus, comprising: a microphone configured to acquire a voice; a camera configured to acquire an image; and a controller configured to extract first characteristic information from the voice, designate an interest object based on the first characteristic information, extract second characteristic information from the image based on the interest object, and reflect the second characteristic information on the interest object.
- The image display apparatus of claim 1, wherein the controller extracts the second characteristic information from the image without displaying an interface for the interest object.
- The image display apparatus of claim 1, wherein the second characteristic information comprises the depth information of a detection target.
- The image display apparatus of claim 3, wherein the controller determines a level of the interest object based on the depth information.
- The image display apparatus of claim 1, wherein the second characteristic information comprises the shape information of a detection target.
- The image display apparatus of claim 1, wherein the controller extracts the second characteristic information from a region excluding a display region of the image.
- The image display apparatus of claim 1, wherein the interest object is an application being executed on the background.
- The image display apparatus of claim 1, wherein the controller reflects the second characteristic information on the interest object based on at least one of an acquisition time and an acquisition order of the voice and image, respectively.
- An image display apparatus, comprising: a microphone configured to acquire a voice; a camera configured to acquire an image; and a controller configured to extract first and second characteristic information from the voice, designate a first and a second interest object based on the first and the second characteristic information, respectively, extract third and fourth characteristic information from the image based on the first and the second interest object, respectively, and reflect the third and the fourth characteristic information on the first and the second interest object, respectively.
- The image display apparatus of claim 9, wherein the controller extracts the third and the fourth characteristic information from the image without displaying an interface for the first and the second interest object, respectively.
- The image display apparatus of claim 9, wherein the controller sequentially reflects the third and the fourth characteristic information on the first and the second interest object, respectively.
- The image display apparatus of claim 11, wherein the controller sequentially reflects the third and the fourth characteristic information on the first and the second interest object, respectively, upon receiving an execution command.
- The image display apparatus of claim 11, wherein the controller stores the third and the fourth characteristic information.
- The image display apparatus of claim 9, wherein the controller simultaneously reflects the third and the fourth characteristic information on the first and the second interest object, respectively.
- The image display apparatus of claim 14, wherein the first and the second interest object are applications being executed at the same time, respectively.
- A method of controlling an object in an image display apparatus, the method comprising: acquiring a voice and an image; extracting first characteristic information from the voice; designating an interest object based on the first characteristic information; extracting second characteristic information from the image based on the interest object; and reflecting the second characteristic information on the interest object.
- The method of claim 16, wherein said extraction step extracts the second characteristic information from the image without displaying an interface for the interest object.
- The method of claim 16, wherein the second characteristic information comprises the depth information of a detection target.
- The method of claim 18, wherein said reflection step determines a level of the interest object based on the depth information.
- The method of claim 16, wherein the second characteristic information comprises the shape information of a detection target.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/KR2011/003363 WO2012150731A1 (en) | 2011-05-04 | 2011-05-04 | Object control using heterogeneous input method |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/KR2011/003363 WO2012150731A1 (en) | 2011-05-04 | 2011-05-04 | Object control using heterogeneous input method |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2012150731A1 true WO2012150731A1 (en) | 2012-11-08 |
Family
ID=47107912
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/KR2011/003363 Ceased WO2012150731A1 (en) | Object control using heterogeneous input method | 2011-05-04 | 2011-05-04 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2012150731A1 (en) |
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6243683B1 (en) * | 1998-12-29 | 2001-06-05 | Intel Corporation | Video control of speech recognition |
| US20020181773A1 (en) * | 2001-03-28 | 2002-12-05 | Nobuo Higaki | Gesture recognition system |
| US20030001908A1 (en) * | 2001-06-29 | 2003-01-02 | Koninklijke Philips Electronics N.V. | Picture-in-picture repositioning and/or resizing based on speech and gesture control |
| US20060077174A1 (en) * | 2004-09-24 | 2006-04-13 | Samsung Electronics Co., Ltd. | Integrated remote control device receiving multimodal input and method of the same |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105491411A (en) * | 2014-09-16 | 2016-04-13 | 洪永川 | Television system with function of signal conversion |
| CN105491411B (en) * | 2014-09-16 | 2018-11-02 | 深圳市冠凯科技有限公司 | Television system with function switching signal |
| US9864430B2 (en) | 2015-01-09 | 2018-01-09 | Microsoft Technology Licensing, Llc | Gaze tracking via eye gaze model |
| US10048749B2 (en) | 2015-01-09 | 2018-08-14 | Microsoft Technology Licensing, Llc | Gaze detection offset for gaze tracking models |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| WO2012070812A2 (en) | Control method using voice and gesture in multimedia device and multimedia device thereof | |
| WO2018026162A1 (en) | Electronic apparatus and method of operating the same | |
| WO2012093767A2 (en) | Method for providing remote control service and image display apparatus thereof | |
| WO2019035657A1 (en) | Image display apparatus | |
| WO2017111252A1 (en) | Electronic device and method of scanning channels in electronic device | |
| WO2017105021A1 (en) | Display apparatus and method for controlling display apparatus | |
| WO2011028073A2 (en) | Image display apparatus and operation method therefore | |
| WO2011025118A1 (en) | Image display apparatus and operation method thereof | |
| WO2012053764A2 (en) | Method for moving pointer in video display apparatus and video display apparatus thereof | |
| WO2012070789A2 (en) | System, method and apparatus of providing/receiving service of plurality of content providers and client | |
| WO2020171657A1 (en) | Display device and image display method of the same | |
| WO2012074189A1 (en) | Method for controlling screen display and image display device using same | |
| EP3430811A1 (en) | Electronic apparatus and method of operating the same | |
| WO2011149315A2 (en) | Content control method and content player using the same | |
| WO2015046747A1 (en) | Tv and operating method therefor | |
| WO2016204520A1 (en) | Display device and operating method thereof | |
| WO2011021854A2 (en) | Image display apparatus and method for operating an image display apparatus | |
| WO2012070742A1 (en) | Method for installing applications, and image display device using same | |
| WO2015186951A1 (en) | Broadcast receiving apparatus and audio output method thereof | |
| WO2022030857A1 (en) | Audio signal processing device and operating method therefor | |
| WO2019160260A1 (en) | Electronic device and operation method thereof | |
| WO2011136402A1 (en) | Image display device and method for operating same | |
| WO2016182319A1 (en) | Image display device and method for controlling the same | |
| WO2012150731A1 (en) | Object control using heterogeneous input method | |
| WO2013015471A1 (en) | Electronic device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 11864782; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 11864782; Country of ref document: EP; Kind code of ref document: A1 |