US20180084221A1 - System and method for automatic video scaling - Google Patents
System and method for automatic video scaling
- Publication number
- US20180084221A1 (application US 15/374,976)
- Authority
- US
- United States
- Prior art keywords
- region
- interest
- channel
- difference
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/01—Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
- H04N7/0117—Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level involving conversion of the spatial resolution of the incoming video signal
- H04N7/0122—Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level involving conversion of the spatial resolution of the incoming video signal the input and the output signals having different aspect ratios
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G06K9/4604—
-
- G06K9/4671—
-
- G06K9/6215—
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/56—Extraction of image or video features relating to colour
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/753—Transform-based matching, e.g. Hough transform
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/23418—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/234363—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the spatial resolution, e.g. for clients with a lower screen resolution
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/258—Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
- H04N21/25808—Management of client data
- H04N21/25825—Management of client data involving client display capabilities, e.g. screen resolution of a mobile phone
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/440263—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the spatial resolution, e.g. for displaying on a connected PDA
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
- H04N21/47202—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting content on demand, e.g. video on demand
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/14—Picture signal circuitry for video frequency region
- H04N5/142—Edging; Contouring
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/2628—Alteration of picture size, shape, position or orientation, e.g. zooming, rotation, rolling, perspective, translation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/01—Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
- H04N7/0117—Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level involving conversion of the spatial resolution of the incoming video signal
-
- G06K2009/4666—
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
- H04N21/4728—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computer Graphics (AREA)
- Human Computer Interaction (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- Computer Security & Cryptography (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Controls And Circuits For Display Device (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
Description
- This application is related to and claims the benefit of U.S. Provisional Patent Application 62/397,825, entitled "A System and Method for Automatic Video Scaling," filed on Sep. 21, 2016. The provisional application is hereby incorporated by reference for all purposes.
- With the growth of Internet streaming services and online media sites such as YouTube, Netflix, Hulu, and Amazon, among others, wireless streaming technologies are becoming ever more popular, allowing users to stream digital media from a network-attached device to a display device. Wireless streaming technologies such as MiraCast, WiDi, ChromeCast, and AirPlay are supported by a wide range of televisions, streaming sticks, dongles, and set-top boxes, and are able to stream media from network-attached devices to a display device without the hassle of establishing wired connections. The network-attached device may be a mobile phone, a tablet, or a smart TV. Video source information from the network-attached device is cast to a display device, which may also have wireless capabilities.
- In some cases, video content from streaming services includes an active video region surrounded by a background image, which is either a static image or a series of slowly updating images. In many, if not most, cases the user is interested only in viewing the content of the active video region, not the surrounding background image, yet the active video region is not scaled to full screen.
- A method for identifying an active video region of streaming video and automatically scaling the active video region is desired.
- A method of identifying and scaling a region of interest on a display device is presented. The region of interest is detected based on the rate of change between frames. A size of the display area of the display device and an aspect ratio are determined. The detected region of interest is scaled to fit the display area in full screen mode based on the display area and the aspect ratio.
- These and other features and advantages of the present invention will be appreciated and understood with reference to the specification, claims, and appended drawings wherein:
- FIG. 1 depicts an example wireless display receiver block diagram according to one embodiment;
- FIG. 2 includes a flowchart depicting a region of interest detection process according to one embodiment;
- FIG. 3 depicts a flowchart that shows details of the difference accumulation process according to one embodiment;
- FIG. 4 depicts a region of interest scaling method according to one embodiment;
- FIG. 5A is a visual depiction of a region on the display device having a high Accumulated Difference according to one embodiment;
- FIG. 5B depicts example edge binary image output from edge detection according to one embodiment;
- FIG. 5C depicts an example of a region of interest scaled up to fit the full display area according to one embodiment; and
- FIG. 6 depicts an example functional block diagram of a display processor that may be used to implement one embodiment of the region of interest detection and scale-up method disclosed herein.
- The detailed description set forth below, in connection with the appended drawings, is intended as a description of exemplary embodiments of a system and method for detecting and scaling up a region of interest to display in full screen mode. The description sets forth the features of the inventive technique in connection with the illustrated embodiments. It is to be understood, however, that the same or equivalent functions and structures may be accomplished by different embodiments that are also intended to be encompassed within the spirit and scope of the disclosure. As denoted elsewhere herein, like element numbers are intended to indicate like elements or features.
- Embodiments of the present inventive concept provide a system and method for automatic video scaling. The disclosure pertains to a technique for processing input data to identify a region of interest (ROI) in video content in a display system. The disclosure provides a method for determining, from the detected ROI, a video region to be scaled and a zoom factor for the output display. The disclosure further provides a method for automatically adjusting the video center and zoom factor according to the size of the display area, such that the region of interest is mapped to the whole display area in full screen mode. The region of interest may be a region that shows active video content. The disclosure includes a method for accurately detecting the region of interest, for example based on a rate of data change using accumulated difference data. The disclosure also includes a method for applying the region of interest to the full display size with aspect ratio adjustment. In some embodiments, a user interface for selecting the ROI and adjusting the region of interest to the full display area size is provided.
- FIG. 1 depicts the functional relationship among the components of an example wireless display system.
- A sender 2, which transmits the video and/or audio data to be displayed, can attach to a network, such as the Internet, by wireless or wired means. The sender 2 can further be configured to handle wireless communication with a receiver 4 by implementing a wireless streaming technology. For example, the sender 2 may be configured to operate using ChromeCast (Google Cast), AirPlay, MiraCast, WiDi, or any other like technology. The sender 2 may be a server, a computer, a mobile device such as a smartphone or a tablet, or any other device with the above-described capabilities. The sender 2 is often not physically connected to the receiver 4.
- The sender 2 may include a sender app running on the sender device. The sender app may allow a user to select the display device on which content is to be displayed, provide media controls such as play/pause/record functionalities, and/or allow content discovery by the user.
- The receiver 4 may receive digital media from the sender. The digital media may include video data (e.g., in a format such as MPEG2 or MPEG4, or the like) and/or audio data (e.g., in a format such as MP3, AAC, AC3, or the like) which is to be streamed. The receiver 4 may be configured to operate using a wireless streaming technology (as listed above for the sender 2) corresponding to the wireless display technology used by the sender 2. The receiver 4 may be a device such as a Chromecast dongle, an Apple TV, or a personal computer.
- The receiver 4 may include a decoder 6. The decoder 6 may include codecs included on a system on a chip (SoC) contained in the receiver. The codecs have the capability to decode the video/audio compression formats of the video/audio data. After the codecs decompress the video/audio data, the receiver 4 may feed the uncompressed video/audio data to a display processor 8.
- The display processor 8 may be configured to perform video enhancement and other processing related to displaying the video. The video is then output to the display device 12 via a secure digital/analog connection 10 (e.g., an HDCP-compliant connection).
- FIG. 2 includes a flowchart depicting an embodiment of the region of interest detection process. The input from the video decoder 6 allows the system to receive an uncompressed video frame. The system accumulates differences between a plurality of frames.
- As shown in FIG. 2, the system receives at least two uncompressed video frames that are sequentially generated from the receiver (20). Then, the system accumulates differences between the at least two uncompressed video frames (22). This difference accumulation process (22) is detailed further below in reference to FIG. 3.
- Subsequent to accumulating differences (22), the system determines whether there is enough difference data to proceed with identifying the region of interest (24). In one embodiment, the system makes this determination by comparing the value stored in the accumulated difference buffer with a difference threshold. The difference threshold is a predefined value that corresponds to a difference level high enough to identify a region of interest. By comparing the value of the accumulated difference buffer with the difference threshold, the system ensures that it has enough information to identify the region of interest. If the system determines there is not enough difference data to proceed, it loops back to receive another uncompressed video frame.
- According to another embodiment, the system does not calculate the accumulated difference, but instead proceeds once a threshold number of frames have been displayed. However, this frame-count method may be affected by the amount of activity in the content of the video stream. For example, if the video being displayed shows a very static image, the threshold number of frames would be adjusted up.
- If the system determines there is enough difference data to proceed, the system proceeds down a left path and a right path of the flowchart. Parts of the left and right branches may be executed sequentially or simultaneously. On the left path, the system performs edge detection on the accumulated difference data to obtain an edge binary image output (26). Then, a Hough transform is applied to the edge binary image output (28) to obtain lines from it. In one embodiment, the system detects only lines (e.g., straight lines) extending at angles of approximately 0 and 90 degrees with respect to a first direction, where the first direction may be parallel to a longer side of the video. However, the present inventive concept is not limited thereto. For example, the system may assume that the region of interest is displayed with a straight-line boundary that extends at an arbitrary angle with respect to the first direction.
- On the right path of the flowchart of FIG. 2, the system performs corner detection on the accumulated difference data to obtain potential corner locations (32). Then, the system matches the potential corner locations against the detected lines (30) from the edge binary image output (34), and applies a rectangle rule to eliminate potential corner locations that are not matched with a line (36). The rectangle rule dictates that the selected corner candidates must form a rectangle. Thus, each corner candidate must have two perpendicular lines, provided by the Hough transform, and two counter corners on those lines. The two counter corners are corners located at an endpoint of each respective line, where each endpoint is not located at the vertex of the two perpendicular lines. After determining the corner candidates, the system identifies a rectangle outlining a region of interest candidate and determines the region of interest (38). The system can then scale the region of interest using the display area specification that was previously received and display the scaled region of interest in the full display area (40).
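- A simplified sketch of the right path (stages 32 through 38) follows. It is hypothetical: Shi-Tomasi corners stand in for whatever corner detector an implementation might use, and the full counter-corner pairing of the rectangle rule is reduced here to a bounding box over the corners that lie on detected lines.

```python
import cv2
import numpy as np

def find_roi_candidate(norm_diff, lines, dist_tol=3.0):
    """Stages 32-38, simplified: corners on the difference data, kept only
    when they lie on a detected Hough line, then boxed into a rectangle."""
    corners = cv2.goodFeaturesToTrack(norm_diff, maxCorners=50,
                                      qualityLevel=0.01, minDistance=10)  # stage 32
    if corners is None:
        return None

    def lies_on_a_line(x, y):
        # Stages 34/36: a corner survives only if some line passes through it;
        # a Hough line (rho, theta) satisfies x*cos(theta) + y*sin(theta) = rho.
        return any(abs(x * np.cos(t) + y * np.sin(t) - r) <= dist_tol
                   for r, t in lines)

    matched = [(x, y) for x, y in corners.reshape(-1, 2) if lies_on_a_line(x, y)]
    if len(matched) < 4:
        return None  # the rectangle rule needs four corner candidates
    xs, ys = zip(*matched)
    # Stage 38: treat the bounding box of the surviving corners as the ROI.
    return int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys))
```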
- FIG. 3 depicts a flowchart that shows details of the difference accumulation process 22 between frames. The difference accumulation process 22, in one aspect, measures the rate of data change over a number of frames. Stages 50 through 56 are part of the process 22 shown in FIG. 2. Once the system receives the uncompressed video frame, the system extracts data from each of the R, G, and B channels of the current frame at time t (50). The data of each current channel R(t), G(t), and B(t) is respectively subtracted from the data of each last channel R(t−1), G(t−1), and B(t−1) (herein also referred to as data of a first prior frame t−1 for the respective R, G, and B channels), and each difference (Diff) is added to a difference buffer Diff (52). At this point, the parameter "Accumulated Difference" may be a sum of the differences between a number of previous channels, for example R(t−2), R(t−3), and R(t−4) for the R channel (herein also referred to as data of the second, third, and fourth prior frames for the R channel). This "Accumulated Difference" is updated to include the latest Diff (54). If a static image is being displayed, the Accumulated Difference may stay roughly zero for a large number of frames; if a video is being displayed, however, the Accumulated Difference may grow rapidly. Once the difference buffer Diff is added to the Accumulated Difference, each of the current values R(t), G(t), and B(t) is redefined as the value of the previous channel R(t−1), G(t−1), and B(t−1) so that updated current values can be extracted from the channels (56).
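- As a rough illustration, the accumulation loop of FIG. 3 and the threshold check of stage 24 might be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: absolute-value differences, a per-pixel sum over the three channels, and an arbitrary threshold constant, none of which are specified by the patent.

```python
import numpy as np

DIFF_THRESHOLD = 1e7  # assumed tuning value for the stage 24 check

def update_accumulated_difference(accumulated, prev_frame, cur_frame):
    """One pass of stages 50 through 56 for a pair of H x W x 3 (R, G, B) frames."""
    prev = prev_frame.astype(np.int32)      # data of the prior frame, per channel (50)
    cur = cur_frame.astype(np.int32)        # data of the current frame, per channel (50)
    diff = np.abs(cur - prev).sum(axis=2)   # per-channel Diff, summed per pixel (52)
    accumulated += diff                     # fold Diff into the Accumulated Difference (54)
    return accumulated                      # caller then treats cur as the new prev (56)

def enough_difference(accumulated):
    """Stage 24: proceed to ROI detection only once enough change has built up."""
    return accumulated.sum() >= DIFF_THRESHOLD

# Usage: accumulated = np.zeros(frame.shape[:2]); call update once per decoded frame.
```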
- FIG. 4 depicts a region of interest scaling process according to one embodiment. After identifying a region of interest candidate (60, or 38 in FIG. 2), a region of interest candidate border is shown as a rectangle (62) and the user is asked whether the detected region is the correct one to scale to full screen (64). If the region of interest candidate is not the correct region to scale, the system will look for an additional region of interest candidate (68). If the system finds an additional candidate, it will display that candidate to the user for selection. On the other hand, if the region of interest candidate is the correct region to scale to full screen, the system will confirm the aspect ratio of the display device (70) and adjust the aspect ratio of the video if necessary (72). The system may determine the aspect ratio based on the device type, which may be detected or provided by the user. The region of interest is then scaled to be displayed full screen on the display device (74).
- Although not depicted, if the region of interest is tilted (such that no edge extends parallel to an edge of the display device), the system will perform a rotation to align the region of interest, either following or preceding the scaling of the region of interest to full screen.
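- A rotation of this kind could be sketched with an affine warp, for example. This is hypothetical: how the tilt angle is obtained (e.g., from the dominant Hough line) is left open, and the names are invented.

```python
import cv2

def align_tilted_roi(frame, tilt_deg, roi_center):
    """Rotate the full frame about the ROI center so its edges become axis-aligned."""
    h, w = frame.shape[:2]
    m = cv2.getRotationMatrix2D(roi_center, tilt_deg, 1.0)  # 2x3 affine matrix
    return cv2.warpAffine(frame, m, (w, h))
```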
- FIGS. 5A, 5B, and 5C visually depict an embodiment of region of interest detection. FIG. 5A, for example, provides a visual depiction of a region of interest 94 within a full display area 92, which may be a display area of a display device. In this particular figure, the region 94 having a high Accumulated Difference is shown brighter than the other parts of the display area 92, indicating that the bright region 94 is the region of interest (i.e., the white areas correspond to the region of interest and the black areas correspond to static images). FIG. 5A may be thought of as depicting the process that happens at stage 22 of FIG. 2.
- FIG. 5B shows example edge binary image output from edge detection, such as that produced at stage 26 of FIG. 2. As the process moves forward to determine the region of interest, the edges/lines around the rectangular area are highlighted as a candidate for scaling up (stage 62 of FIG. 4). FIG. 5C depicts the case where a user confirms that the highlighted rectangular area is indeed the region of interest, and the region is scaled up to fit the entire display area 92. In the scaled-up version, the data shown outside the active video area in FIG. 5A and FIG. 5B remains hidden. As mentioned above, in one embodiment, the scaling entails identifying a center of the region of interest and aligning it with a center of the display area, then applying a zoom factor that is determined mathematically from the dimensions of the region of interest and the aspect ratio of the display device.
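- The center-and-zoom computation might be sketched as follows. The policy shown is an assumption: taking the smaller of the two axis ratios keeps the ROI's aspect ratio and letterboxes any mismatch rather than distorting the video.

```python
def scale_roi_to_display(roi, display_w, display_h):
    """Center the ROI on the display and zoom as far as the aspect ratio allows."""
    x0, y0, x1, y1 = roi
    roi_w, roi_h = x1 - x0, y1 - y0
    zoom = min(display_w / roi_w, display_h / roi_h)  # largest distortion-free zoom
    out_w, out_h = round(roi_w * zoom), round(roi_h * zoom)
    # Offsets that align the scaled ROI center with the display center.
    off_x = (display_w - out_w) // 2
    off_y = (display_h - out_h) // 2
    return zoom, (off_x, off_y, out_w, out_h)

# A 640x360 ROI on a 1920x1080 display yields zoom 3.0 and fills the screen exactly.
print(scale_roi_to_display((0, 0, 640, 360), 1920, 1080))
# -> (3.0, (0, 0, 1920, 1080))
```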
- FIG. 6 depicts a functional block diagram of an embodiment of a display processor that may be used to implement the region of interest detection and automatic scaling processes described above. As shown, video frames are received by a Difference Accumulator 80 having a Buffer 82. An Edge Detector 82, a Corner Detector 84, a Line Detector 86, and a Region Detector 88 work together to determine a region of interest, as depicted in FIG. 2. A "detector" 96, as used herein, includes at least one of the Edge Detector 82, Corner Detector 84, Line Detector 86, and Region Detector 88, as shown, for example, in FIG. 6. Furthermore, an Adjuster 90 may determine the aspect ratio and may adjust the aspect ratio if necessary. These detectors and the adjuster may be implemented as software, i.e., non-transitory computer-readable instructions stored in a medium. In various embodiments, operating system software of a system that includes the display processor may provide an operating environment for software executing in the computer system, and may coordinate activities of the components of the computer system.
- Various embodiments of the system that includes the display processor may be implemented with or involve one or more computer systems. The computer system is not intended to suggest any limitation as to the scope of use or functionality of described embodiments. The computer system includes at least one processor and memory, one or more input devices, one or more output devices, and one or more communication connections. The memory may be volatile memory (e.g., registers, cache, random access memory (RAM)), non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, etc.), or a combination thereof. In one embodiment, the memory may store software for implementing various embodiments of the disclosed concept.
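- Purely for illustration, the FIG. 6 blocks could be wired together as plain objects, reusing the sketches above. All class, method, and variable names here are invented for the example and are not taken from the patent.

```python
import cv2
import numpy as np

class DisplayProcessorSketch:
    """Toy wiring of the FIG. 6 blocks, reusing the earlier sketches."""

    def __init__(self, display_w, display_h):
        self.display_w, self.display_h = display_w, display_h
        self.accumulated = None  # plays the role of Difference Accumulator 80 / Buffer 82
        self.prev = None

    def on_frame(self, frame):
        if self.prev is not None:
            if self.accumulated is None:
                self.accumulated = np.zeros(frame.shape[:2], dtype=np.float64)
            update_accumulated_difference(self.accumulated, self.prev, frame)
        self.prev = frame
        if self.accumulated is None or not enough_difference(self.accumulated):
            return frame  # stage 24: not enough difference data yet
        # Edge/Line Detectors (left path), then Corner + Region Detectors (right path).
        edges, lines = detect_axis_aligned_lines(self.accumulated)
        norm = cv2.normalize(self.accumulated, None, 0, 255,
                             cv2.NORM_MINMAX).astype(np.uint8)
        roi = find_roi_candidate(norm, lines)
        if roi is None:
            return frame
        x0, y0, x1, y1 = roi
        # Adjuster 90: scale the ROI to full screen while respecting the aspect ratio.
        _, (_, _, out_w, out_h) = scale_roi_to_display(roi, self.display_w, self.display_h)
        return cv2.resize(frame[y0:y1, x0:x1], (out_w, out_h))
```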
- To provide for interaction between a user and the display processor, embodiments can be implemented using a computer having a display device, e.g., a CRT (cathode ray tube), LCD (liquid crystal display), projection screen, OLED display, 3D display, etc. for displaying information to the participants. A touchscreen, a keyboard and a pointing device, e.g., a mouse or a trackball, by which a user can provide input to the computer are also provided. Other kinds of devices can be used to provide for interaction with participants as well; for example, feedback provided to the player can be any form of sensory feedback, e.g. visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, brain waves, other physiological input, eye movements, gestures, body movements, or tactile input. For example, any of the above methods may be used to make a “selection” of the region of interest by confirming the highlighted rectangular area.
- While this specification contains many specifics, these should not be construed as limitations on the scope of the disclosure as a whole or of what can be claimed. Rather, the examples provided should be viewed as descriptions of features specific to particular embodiments. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features can be described above as being executed in certain combinations and in a certain order, one or more features from a disclosed combination may in some cases be omitted from the combination, and the claimed combination can be directed to a subcombination or variation of a subcombination.
- Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing can be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
- Various embodiments of the present invention may be described in the general context of computer-readable media. Computer-readable media are any available media that may be accessed within a computer system. By way of example, and not limitation, within the computer system, computer-readable media include memory, storage, communication media, and combinations thereof.
- It should be understood that the invention can be practiced with modification and alteration within the spirit and scope of the disclosure and the appended claims. The description is not intended to be exhaustive or to limit the inventive concept to the precise form disclosed.
Claims (21)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US15/374,976 US9924131B1 (en) | 2016-09-21 | 2016-12-09 | System and method for automatic video scaling |
| KR1020170116034A KR102427156B1 (en) | 2016-09-21 | 2017-09-11 | A system and method for automatic video scaling |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201662397825P | 2016-09-21 | 2016-09-21 | |
| US15/374,976 US9924131B1 (en) | 2016-09-21 | 2016-12-09 | System and method for automatic video scaling |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US9924131B1 US9924131B1 (en) | 2018-03-20 |
| US20180084221A1 true US20180084221A1 (en) | 2018-03-22 |
Family ID: 61598633
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US15/374,976 Active US9924131B1 (en) | 2016-09-21 | 2016-12-09 | System and method for automatic video scaling |
Country Status (2)
| Country | Link |
|---|---|
| US (1) | US9924131B1 (en) |
| KR (1) | KR102427156B1 (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108810538A (en) * | 2018-06-08 | 2018-11-13 | 腾讯科技(深圳)有限公司 | Method for video coding, device, terminal and storage medium |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2022154342A1 (en) * | 2021-01-12 | 2022-07-21 | Samsung Electronics Co., Ltd. | Methods and electronic device for processing image |
Family Cites Families (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR101017362B1 (en) * | 2004-01-08 | 2011-02-28 | 삼성전자주식회사 | Automatic zoom device and method for dynamic video playback |
| US7911536B2 (en) | 2004-09-23 | 2011-03-22 | Intel Corporation | Screen filled display of digital video content |
| MX2007012564A (en) | 2005-04-13 | 2007-11-15 | Nokia Corp | Coding, storage and signalling of scalability information. |
| US20070024706A1 (en) | 2005-08-01 | 2007-02-01 | Brannon Robert H Jr | Systems and methods for providing high-resolution regions-of-interest |
| KR101255226B1 (en) | 2005-09-26 | 2013-04-16 | 한국과학기술원 | Method and Apparatus for defining and reconstructing ROIs in Scalable Video Coding |
| JP5093557B2 (en) * | 2006-10-10 | 2012-12-12 | ソニー株式会社 | Image processing apparatus, image processing method, and program |
| JP5263565B2 (en) * | 2006-10-12 | 2013-08-14 | ソニー株式会社 | Image processing apparatus, image processing method, and program |
| US8189945B2 (en) | 2009-05-27 | 2012-05-29 | Zeitera, Llc | Digital video content fingerprinting based on scale invariant interest region detection with an array of anisotropic filters |
| WO2010141023A1 (en) | 2009-06-04 | 2010-12-09 | Hewlett-Packard Development Company, L.P. | Video conference |
| KR101272448B1 (en) * | 2011-07-11 | 2013-06-07 | 광주과학기술원 | Apparatus and method for detecting region of interest, and the recording media storing the program performing the said method |
| KR101968070B1 (en) | 2012-10-12 | 2019-04-10 | 캐논 가부시끼가이샤 | Method for streaming data, method for providing data, method for obtaining data, computer-readable storage medium, server device, and client device |
| CN105075271A (en) | 2013-04-08 | 2015-11-18 | 索尼公司 | Region of interest scalability with SHVC |
| US9591300B2 (en) | 2014-12-04 | 2017-03-07 | Spirent Communications, Inc. | Video streaming and video telephony downlink performance analysis system |
- 2016-12-09: US application US15/374,976 (US9924131B1), status Active
- 2017-09-11: KR application KR1020170116034A (KR102427156B1), status Active
Also Published As
| Publication number | Publication date |
|---|---|
| KR102427156B1 (en) | 2022-07-29 |
| KR20180032499A (en) | 2018-03-30 |
| US9924131B1 (en) | 2018-03-20 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| EP2893706B1 (en) | Augmented reality for video system | |
| US8644467B2 (en) | Video conferencing system, method, and computer program storage device | |
| JP6336206B2 (en) | Method, apparatus, program and recording medium for processing moving picture file identifier | |
| US9389706B2 (en) | Method and system for mouse control over multiple screens | |
| AU2013200108A1 (en) | Apparatus and method for scaling layout of application in image display device | |
| US10893206B1 (en) | User experience with digital zoom in video from a camera | |
| US9749574B2 (en) | Image matching-based pointing techniques | |
| US10573277B2 (en) | Display device, display system, and non-transitory recording medium, to adjust position of second image in accordance with adjusted zoom ratio of first image | |
| KR20140039762A (en) | Image processing apparatus and control method thereof | |
| WO2023060056A3 (en) | Spatial motion attention for intelligent video analytics | |
| US9924131B1 (en) | System and method for automatic video scaling | |
| US20190027118A1 (en) | Terminal device and display method | |
| US20180367836A1 (en) | A system and method for controlling miracast content with hand gestures and audio commands | |
| US10609305B2 (en) | Electronic apparatus and operating method thereof | |
| US11741570B2 (en) | Image processing device and image processing method of same | |
| CN109766530B (en) | Method and device for generating chart frame, storage medium and electronic equipment | |
| US11631159B2 (en) | Zoom control of digital images on a display screen | |
| CN106131628B (en) | A kind of method of video image processing and device | |
| US10632379B2 (en) | Method and apparatus for performing interaction in chessboard interface | |
| US20130009949A1 (en) | Method, system and computer program product for re-convergence of a stereoscopic image | |
| CN107547913B (en) | Video data playing and processing method, client and equipment | |
| KR102836439B1 (en) | Electronic apparatus and controlling method thereof | |
| US11557065B2 (en) | Automatic segmentation for screen-based tutorials using AR image anchors | |
| US9292906B1 (en) | Two-dimensional image processing based on third dimension data | |
| US10353490B2 (en) | Image display apparatus, driving method of image display apparatus, and computer readable recording medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: SAMSUNG DISPLAY CO., LTD., KOREA, REPUBLIC OF. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: LEE, JANGHWAN; REEL/FRAME: 040716/0386. Effective date: 20161206 |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 4 |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 8 |