WO2022031723A1 - System and method for preparing digital composites for incorporation into digital visual media
- Publication number
- WO2022031723A1 (application PCT/US2021/044374)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- interest
- asset
- shot
- insert
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
- G11B27/036—Insert-editing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/23418—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/60—Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
- H04N21/63—Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
- H04N21/643—Communication protocols
- H04N21/64322—IP
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/812—Monomedia components thereof involving advertisement data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/854—Content authoring
Definitions
- the present application relates in general to the field of digital media compositing.
- the present disclosure is directed to a system and method for generating media overlays and integrating said overlays into digital media.
- This digital media may be provided to consumers through various services, including over-the-top (OTT) delivery systems.
- the system as provided in the present disclosure may include an automated identification module.
- This automated identification module may execute a custom Automated Placement Opportunity Identification (APOI) engine.
- This APOI engine may be used to tag and/or label content based on visual features.
- the visual features being identified may include flat surfaces, locations, particular objects, scenery characteristics, etc.
- the APOI engine may incorporate one or more neural networks for detecting individual shots of a digital media set, generating labels associated with the visual features identified in each shot, and determining objects of interest that are mapped across the individual shots.
- the APOI engine and the one or more neural networks therein may be trained by analyzing labels generated in the past and confirmed as accurate.
- the system as provided in the present disclosure may further include a Placement Insertion Interface (PII) system that allows digital media clients to easily explore available placements for composites to be inserted throughout available digital media.
- This PII system may further include an upload tool for digital media clients to upload their own visual assets to be composited.
- the present disclosure provides for a method of and a system for pre-processing digital media, the system executing the method comprising: receiving a digital media dataset; detecting, by way of one or more neural networks, one or more shots within the digital media dataset, wherein each shot is identified by way of boundary indicators; generating, by way of the one or more neural networks, contextual labels for each shot, wherein each contextual label correlates to a characteristic of each respective shot of the digital media dataset; extracting an array of images for each shot, wherein one or more images of the array comprise one or more objects of interest; detecting, by way of the one or more neural networks, objects of interest for each image of the array of images of each shot; determining, by way of the one or more neural networks, objects of interest to be mapped; mapping, by way of the one or more neural networks, an object of interest of a first image of the array of images of a first shot to an object of interest of a second image of the array of images of the first shot, wherein the object of interest of
- the primary image asset is indicative of a digital video asset comprising a series of image assets, wherein the method is programmatically repeated for each image asset of the series.
- each series of image assets is extracted from a digital video asset by: receiving the digital video asset; processing, by way of one or more neural networks, pixels of the digital video asset; identifying, by the one or more neural networks, a first shot boundary of the digital video asset and a second shot boundary of the digital video asset; extracting one or more video frames located between the first shot boundary and the second shot boundary of the digital video asset; and generating a series of image assets from the one or more video frames as extracted.
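The extraction step described above can be sketched as follows. This is a minimal illustration that assumes the shot boundaries have already been detected and are given as frame indices; the function names `split_into_shots` and `sample_frames`, the convention that a boundary marks the first frame of a new shot, and the sampling stride are all hypothetical and not taken from the disclosure.

```python
from typing import List, Tuple


def split_into_shots(boundaries: List[int], total_frames: int) -> List[Tuple[int, int]]:
    """Turn boundary indices into (start, end) frame ranges, end exclusive.

    Assumption: each boundary is the index of the first frame of a new shot.
    """
    cuts = sorted(set(b for b in boundaries if 0 < b < total_frames))
    starts = [0] + cuts
    ends = cuts + [total_frames]
    return list(zip(starts, ends))


def sample_frames(shot: Tuple[int, int], stride: int = 12) -> List[int]:
    """Pick every `stride`-th frame index inside one shot (hypothetical policy)."""
    start, end = shot
    return list(range(start, end, stride))


shots = split_into_shots([48, 120], total_frames=150)
image_series = [sample_frames(s) for s in shots]
print(shots)            # [(0, 48), (48, 120), (120, 150)]
print(image_series[0])  # [0, 12, 24, 36]
```

Each inner list of `image_series` stands for the series of image assets generated from one shot; a real pipeline would decode the frames at those indices rather than keep only the indices.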
- area corresponding to the insert layer area for applying additional effect layers thereto.
- the combining comprises blending the shadow layer image with the composite image.
- the present disclosure further comprises creating a reflection layer image comprising one or more reflections of one or more objects depicted in the base layer image, wherein the one or more reflections are disposed within the insert layer area.
- the combining comprises blending the reflection layer image with the composite image.
- the present disclosure further comprises adding motion blur to the insert image within the insert layer area to simulate motion over a period of time.
- the present disclosure further comprises adding depth of field blur to the insert image within the insert layer area to simulate a difference in focus.
- the present disclosure further comprises generating respective composite images for a sequence of base layer images corresponding to frames in a video using the insert image.
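As a rough illustration of generating composite images for a sequence of base layer images, the sketch below blends one insert image into the same insert layer area of every frame. It assumes grayscale images stored as nested lists, a static insert area, and a single uniform opacity; the names `composite_over` and `composite_sequence` are hypothetical, and the disclosure does not specify a blending rule, so plain alpha blending is used here as a stand-in.

```python
from typing import List

Image = List[List[float]]  # grayscale, values in [0, 1]


def composite_over(base: Image, insert: Image, top: int, left: int, alpha: float = 1.0) -> Image:
    """Alpha-blend `insert` onto `base` inside the insert layer area."""
    out = [row[:] for row in base]  # copy so the base layer is left unchanged
    for r, insert_row in enumerate(insert):
        for c, value in enumerate(insert_row):
            y, x = top + r, left + c
            if 0 <= y < len(out) and 0 <= x < len(out[y]):
                out[y][x] = alpha * value + (1.0 - alpha) * out[y][x]
    return out


def composite_sequence(frames: List[Image], insert: Image, top: int, left: int,
                       alpha: float = 1.0) -> List[Image]:
    """Reuse one insert image for every base layer frame of a video."""
    return [composite_over(frame, insert, top, left, alpha) for frame in frames]


frames = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]  # two 3x4 black frames
logo = [[1.0, 1.0]]                                          # 1x2 white insert
result = composite_sequence(frames, logo, top=1, left=1, alpha=0.5)
print(result[0][1])  # [0.0, 0.5, 0.5, 0.0]
```

Shadow, reflection, and blur layers would be further blending passes over the same insert layer area; they are left out to keep the sketch short.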
- Figure 2 illustrates an automated placement opportunity identification engine, according to some embodiments.
- Figure 4 illustrates a flowchart detailing the methods performable by a placement video clip tool, according to some embodiments.
- Figure 7 illustrates an on-top composite logic process, according to some embodiments.
- Figure 8 illustrates an on-top composite logic, according to some embodiments.
- Figure 9 illustrates graphic insertion compositing logic, according to some embodiments.
- Figure 10 illustrates an exemplary insertion of a motion blur effect, according to some embodiments.
- Figure 11 illustrates an automated compositing service, according to some embodiments.
- a neural network may be used as a pre-processing mechanism for other neural networks.
- Figure 1 illustrates a flowchart of the main components of the present disclosure presented for demonstrative purposes only, according to some embodiments.
- the main components of the present disclosure may include content analysis for placement identification at 102.
- placement identification may include identifying a placement video for placement opportunities as described below.
- the main components of the present disclosure may further include selecting a graphic at 104.
- Graphic selection 104 may include selecting a pre-uploaded or previously available graphic for compositing into a placement video. Graphic selection 104 may further include uploading a new graphic by way of a Graphical User Interface displayed to a user. Graphic selection 104 allows a user to select which graphic is desired for compositing.
- the main components of the present disclosure may further include manipulating the desired graphic in order to best fit the placement video at 106.
- This process may include manipulating the graphic, either programmatically or manually, in order to alter the rotation, skew, and/or color of said graphic to more closely resemble the placement video, according to some embodiments.
- Some embodiments may further include manipulating the graphic using one or more compositing or combination procedures. These procedures may be used to generate a manipulated graphic based on a combination of graphics, logos, texts, or other creatives provided or otherwise indicated by the user. Alternatively, these procedures may generate the manipulated graphic according to instructions determined or otherwise calculated by the system without instruction from a user.
- the main components of the present disclosure may further include compositing the manipulated graphic onto a placement location of the placement video at 108.
- compositing procedure at 108 may include a predetermined, programmatic methodology or automated process as indicated in Figure 1.
- the main components of the present disclosure may further include displaying for the user a preview of the manipulated graphic, composited onto the placement location of the placement video at 110.
- Preview procedure at 110 may include generating a graphical user interface that displays for the user a generated preview, according to some embodiments.
- the main components of the present disclosure may further include delivering to the user a final output video comprising the manipulated graphic composited onto the placement location therein as shown at 112.
- Delivery procedure at 112 may include delivering the final output video by way of a communication protocol designed for file transfer, such as the IP protocol suite (e.g., TCP, UDP, FTP), or any other digital delivery method.
- Compositing images onto digital media may be implemented through numerous steps as provided by the present system.
- the first step in order to implement the present system involves an Automated Placement Opportunity Identification engine.
- the Automated Placement Opportunity Identification engine may use one or more machine learning algorithms to identify placement opportunities within digital media.
- placement opportunities may include flat surfaces such as billboards, walls, sides of buildings, tables and desks, counter tops and bars, screens (e.g., digital screens, computer screens, monitors, etc.), signage, and/or posters.
- FIG. 2 illustrates an automated placement opportunity identification engine, according to some embodiments.
- the Automated Placement Opportunity Identification engine may receive a digital media dataset at 202, according to some embodiments. In order to identify the boundary (e.g., cuts, dissolves, fades) of a single shot, the Automated Placement Opportunity Identification engine may rapidly preprocess the digital media using a
- the shot boundary detection mechanism may utilize a pretrained neural network model that receives as input the pixels of digital media and outputs final shot boundaries therefrom.
- This neural network may be fully convolutional in time, allowing it to use a large temporal context without continuously processing frames. More information regarding such a shot boundary detection mechanism is described in "Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks" (Gygli, Michael, May 23, 2017), which is hereby incorporated by reference.
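To make the input and output of shot boundary detection concrete, the sketch below uses a much simpler classical technique, thresholding the difference between consecutive intensity histograms. It is not the fully convolutional network referenced above, and it only catches hard cuts, not dissolves or fades. The function names, the bin count, and the threshold are assumptions chosen for illustration.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 8) -> List[float]:
    """Normalized intensity histogram of one frame (pixel values in 0..255)."""
    counts = [0] * bins
    for p in frame:
        counts[min(p * bins // 256, bins - 1)] += 1
    total = float(len(frame)) or 1.0
    return [c / total for c in counts]


def detect_cuts(frames: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Return indices i where frame i starts a new shot.

    A cut is declared when the L1 distance between consecutive histograms,
    halved so that it lies in [0, 1], exceeds `threshold` (a hypothetical value).
    """
    cuts = []
    for i in range(1, len(frames)):
        h_prev, h_cur = histogram(frames[i - 1]), histogram(frames[i])
        distance = 0.5 * sum(abs(a - b) for a, b in zip(h_prev, h_cur))
        if distance > threshold:
            cuts.append(i)
    return cuts


dark = [10] * 16     # a flat, dark 16-pixel frame
bright = [240] * 16  # a flat, bright 16-pixel frame
video = [dark, dark, bright, bright, dark]
print(detect_cuts(video))  # [2, 4]
```

Like the mechanism described above, the function takes raw pixels and returns shot boundaries; a learned model would replace the histogram comparison.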
- One or more neural networks may also be used to label the context of each shot at
- the context recognition engine may implement one or more various neural networks (pre-trained or otherwise) to identify the context of a scene, environment, location, or other data used to describe the context of a particular media.
- the contextual labels may be used as input to one or more neural networks to identify objects of interest, including placement opportunities, at 208.
- Placement opportunities may include object type recognition (e.g., cars, computers, food, beverage, etc.), scene/contextual recognition (e.g., office, outdoors, mountains, home, kitchen, etc.), audio/speech recognition and categorization (e.g., subject of conversation/dialogue, keyword mapping, full transcriptions, etc.), and sensitive content (violence, nudity, alcohol, illicit drugs, etc.), according to some embodiments.
- the Automated Placement Opportunity Identification engine may be implemented using various techniques.
- the Automated Placement Opportunity Identification engine may incorporate pre-trained neural networks that have been trained using publicly available computer vision datasets (e.g., Imagenet). These neural networks may be trained using this public data to learn and identify different labels, each of which may be associated with a placement opportunity as described above.
- the Automated Placement Opportunity Identification engine may further incorporate a transformation of pre-trained neural networks to more accurately represent the intended models, according to some embodiments. Transforming pre-trained neural networks may include re-training, layer manipulation, progressive mutations, recurrent training, or any other alterations to a publicly available, pre-trained neural net model.
- the Automated Placement Opportunity Identification engine may implement a custom neural net model that is trained by humans using a manual computer vision annotation tool, according to some embodiments.
- the computer vision annotation tool may allow a user to gather images for annotation. The user may then annotate and assign labels (e.g., using bounding boxes) to areas of the gathered images that the user identifies as placement opportunities. These labels are then used to train a custom neural net model for label identification purposes.
- Automated Placement Opportunity Identification engine may further implement an object tracking mechanism at 210, according to some embodiments.
- the object tracking mechanism may be used to match objects (e.g., placement opportunities) across various frames of a moving scene. The object tracking mechanism may further be used to identify objects across various camera angles of the same scene at 212. By estimating depth and 3D geometry from 2D frames, the object tracking mechanism may be able to identify placement opportunities, according to some embodiments.
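One common way to match detected objects between two frames is to pair bounding boxes by overlap. The sketch below is a minimal, assumed approach using greedy intersection-over-union (IoU) matching; it does not estimate depth or 3D geometry and would not handle a change of camera angle. The names `iou` and `match_boxes` and the `min_iou` threshold are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_boxes(prev: List[Box], cur: List[Box], min_iou: float = 0.3) -> Dict[int, int]:
    """Greedily map indices in `prev` to indices in `cur`, best overlap first."""
    pairs = sorted(
        ((iou(p, c), i, j) for i, p in enumerate(prev) for j, c in enumerate(cur)),
        reverse=True,
    )
    mapping: Dict[int, int] = {}
    used = set()
    for score, i, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same object
        if i not in mapping and j not in used:
            mapping[i] = j
            used.add(j)
    return mapping


frame_1 = [(0, 0, 10, 10), (50, 50, 60, 60)]
frame_2 = [(51, 50, 61, 60), (1, 0, 11, 10)]  # same objects, shifted and reordered
print(sorted(match_boxes(frame_1, frame_2).items()))  # [(0, 1), (1, 0)]
```

Chaining these per-frame mappings across a shot gives each placement opportunity a consistent identity over time.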
- the Automated Placement Opportunity Identification engine may output, at 214, the digital media dataset, indications of placement opportunities, as well as the labels added thereto.
- Compositing images onto still images may require a trivial amount of work. Analyzing a still image to detect availability for placing a composite image requires analysis of only one frame of a single image. Expanding this service to media formats other than still images (e.g., video data) will benefit from further analysis and/or additional machine learning methods.
- the Automated Placement Opportunity Identification engine may use the above placement opportunity data points in at least two ways.
- a first way that the placement opportunity data points may be used is as an auditing tool for use by a human to evaluate the identified placement opportunity to determine whether or not to proceed with compositing. This may be used to reduce labor costs of analyzing video data for placement opportunities.
- the placement opportunity data points may be used in a search query to filter through the digital media available for compositing.
- the implementation of this search query may be used to identify many various aspects of a scene, including particular objects, scenery, dialogue category, presence of sensitive content, etc.
- This search query implementation may be integrated into a Placement Insertion Interface (PII) system as an inventory browsing tool.
- FIG. 3 illustrates a placement inventory browsing tool 300 of a PII system, according to some embodiments.
- a user may use the placement inventory browsing tool to browse the inventory of digital media available to receive composites.
- This inventory may be organized by highly specific, individual placements of composites.
- This inventory may also be browsed by context as identified by the Automated Placement Opportunity Identification engine.
- Some embodiments may be browsed using other features as identified by the Automated Placement Opportunity Identification engine, the features including keywords, genres, formats, etc.
- Placement inventory browsing tool 300 includes a graphical user interface that displays options for browsing through the inventory of digital media available to receive composites.
- placement inventory browsing tool 300 includes a search bar utility 302, a genre selection utility 304, and a format selection utility 306.
- search bar utility 302 may receive keyword searches as shown in Figure 3.
- Search bar utility 302 may also be a drop-down list, radio button, or any other graphical user interface element used to receive input, according to some embodiments.
- Genre selection utility 304 may receive one or more user selections from a drop-down list as shown in Figure 3.
- Genre selection utility 304 may also be a search bar, radio button, or any other graphical user interface element used to receive input, according to some embodiments.
- Genre selection utility 304 may provide selections such as Comedy, Horror, Action, Reality, and many other genres. Further yet, according to some embodiments, format selection utility 306 may receive one or more user selections from a drop-down list as shown in Figure 3. Format selection utility 306 may also be a search bar, radio button, or any other graphical user interface element used to receive input, according to some embodiments. Format selection utility 306 may provide selections such as In- Action Six, Overlay, Brand Insertion, Product Insertion, and many other formats.
- the graphical user interface of placement inventory browsing tool 300 may further include a search button 308, shown as "GO" in Figure 3.
- button 308 may be used to activate a search query.
- button 308 may fetch the query terms as provided by the user by way of GUI elements displayed on screen, such as search bar utility 302, genre selection utility 304, and format selection utility 306, according to some embodiments.
- Activation of search button 308 may further return results 310 based on a user's selections.
- the search query as shown in Figure 3 includes a keyword search for "New York City" in search bar utility 302.
- the search query as shown in Figure 3 further includes comedy in the genre selection utility 304, and all formats in the format selection utility 306.
- the search query as shown in Figure 3 returns at least two results 310: result 310A ("Broad City") and result 310B ("Jimmy Kimmel").
- Each of the results 310 may include a preview of the clip, a placement ID number, a program title, and a supply source, according to some embodiments.
- result 310A includes a placement ID number of 10124, a program title of "Broad City," and a supply source of "Viacom."
- the graphical user interface of placement inventory browsing tool 300 may further include an upload button 312, shown as "Upload Asset" in Figure 3.
- upload button 312 may display for the user a graphical user interface whereby the user may upload his/her own digital media asset, according to some embodiments.
- the returned results in response to activation of a search button 308 may be fetched from a supply database of placements.
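The search described above amounts to filtering a supply database of placements by keyword, genre, and format. The sketch below is one assumed way to express that filter over an in-memory list. Only the placement ID 10124, the title "Broad City", and the supply source "Viacom" come from the text; the `Placement` record layout, the `search` function, and the second record are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Placement:
    placement_id: int
    program_title: str
    supply_source: str
    genre: str
    fmt: str
    keywords: List[str] = field(default_factory=list)


def search(inventory: List[Placement], keyword: Optional[str] = None,
           genre: Optional[str] = None, fmt: Optional[str] = None) -> List[Placement]:
    """Return placements matching every filter that is set; None means 'all'."""
    results = []
    for p in inventory:
        if keyword and keyword.lower() not in (k.lower() for k in p.keywords):
            continue
        if genre and p.genre.lower() != genre.lower():
            continue
        if fmt and p.fmt.lower() != fmt.lower():
            continue
        results.append(p)
    return results


# The second record and all keyword/genre/format assignments are invented.
inventory = [
    Placement(10124, "Broad City", "Viacom", "Comedy", "Brand Insertion", ["New York City"]),
    Placement(20001, "Example Drama", "Example Supply", "Drama", "Overlay", ["New York City"]),
]
hits = search(inventory, keyword="New York City", genre="comedy")  # all formats
print([p.program_title for p in hits])  # ['Broad City']
```

Leaving `fmt` unset corresponds to selecting all formats in the format selection utility.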
- the video clips as previewed in results 310 may be activated by a play button shown in the center of the video clip, according to some embodiments.
- this play button may activate a fetching protocol in which a preview video clip may be fetched from a server that hosts actual video assets of the placement video clips returned as results.
- activation of this play button may activate a preview of the video clip for the user's viewing, according to some embodiments.
- a preview of the video clip may provide completed composites previously rendered by other users. This may be done in order to show an example of how a composite looks when inserted into a particular video clip.
- FIG. 4 illustrates a placement video clip tool 400, which may allow users to set up new placement video clips (also known as pre-composited versions of a specific shot from a specific digital media content video).
- a user may be able to upload a new creative graphic, according to some embodiments.
- the user may then select a placement video clip to preview a creative graphic composited thereon as shown in Figure 4 at 404.
- a user may also be able to simply select a placement video clip to preview without uploading a new creative graphic, according to some embodiments.
- a creative graphic may be newly uploaded at 406 or, alternatively, a previously uploaded creative graphic, such as the creative graphic uploaded at 402, may be selected for the placement video clip at 407.
- a creative graphic may be uploaded either directly from a specific placement preview or from a distinct "upload" page. In either case, an uploaded creative can be inserted into any matching placements.
- the creative graphic may be programmatically adjusted to best fit the placement within the placement video clip.
- this programmatic adjustment may be accomplished through computer vision, permutationary rendering, or any other rendering technologies to provide one or more "best fit" options to be selected by the user.
- the user may then select one of the "best fit" options.
- the user may then edit the creative graphic by way of creative graphic editing tools for manual adjustments to more closely fit the placement at 408.
- a composited video clip will then be created and a preview rendering may be generated in order for the user to preview the composited video clip at 410.
- Figure 5A illustrates some "best fit" options that may be presented to a user as described above, according to some embodiments.
- the options may be presented in various ways by way of a user-interactive GUI, such as the GUI shown in Figure 5.
- best fit options 500 may include a fill mode 502 and a fit mode 504, among others.
- Other best fit options may be presented to a user, such as "stretch to fit," "fit entirely," and even more advanced modes such as programmatic skewing to account for various angles presented in placement video clips.
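The text does not define the fill and fit modes precisely. The sketch below assumes the usual convention: "fit" scales the graphic uniformly until it lies entirely inside the placement, "fill" scales it uniformly until the placement is fully covered (cropping the overflow), and "stretch" ignores the aspect ratio. The function name `scaled_size` and the mode strings are hypothetical.

```python
from typing import Tuple


def scaled_size(graphic: Tuple[int, int], placement: Tuple[int, int], mode: str) -> Tuple[int, int]:
    """Return the (width, height) of the graphic after scaling for `mode`.

    'fit'     keeps aspect ratio; whole graphic visible, may leave empty bands.
    'fill'    keeps aspect ratio; placement fully covered, graphic may be cropped.
    'stretch' ignores aspect ratio and matches the placement exactly.
    """
    gw, gh = graphic
    pw, ph = placement
    if mode == "stretch":
        return pw, ph
    ratios = (pw / gw, ph / gh)
    if mode == "fit":
        scale = min(ratios)
    elif mode == "fill":
        scale = max(ratios)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return round(gw * scale), round(gh * scale)


print(scaled_size((400, 200), (300, 300), "fit"))      # (300, 150)
print(scaled_size((400, 200), (300, 300), "fill"))     # (600, 300)
print(scaled_size((400, 200), (300, 300), "stretch"))  # (300, 300)
```

The only difference between the two aspect-preserving modes is whether the smaller or the larger of the two axis ratios is used as the scale factor.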
- the present technology may further recognize the best area of a creative graphic to display in a particular placement video clip.
- the present technology may include an area selection optimizer (ASO) engine, according to some embodiments.
- Area selection optimizer (ASO) engine may be used to programmatically recognize the optimal area of a creative graphic to display within the placement area of a placement video.
- ASO engine may be used to identify various features that typically indicate the focus of a graphic and may, according to some embodiments, extract such a feature for insertion into a placement video.
- ASO engine may further include logo identification, intelligent cropping, and optimal resizing.
- the ASO engine and APOI engine may implement a Gaussian, machine learning, or other computer vision algorithm to identify logos, faces, or other important features from a user's uploaded media or other media for use as a creative graphic.
- the ASO engine and APOI engine may use computer vision algorithms to analyze the pixel colors, brightness, and intensity to select a region that is a local minimum with respect to brightness, as well as large enough for placement of a creative graphic, such as a logo, text or other overlay of interest.
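The brightness-based region selection can be illustrated with a sliding window over a brightness grid. This is a simplified sketch under stated assumptions: it searches exhaustively and returns the globally darkest window of the requested size (which is also a local minimum), it looks at brightness only, and the function name `darkest_window` is hypothetical.

```python
from typing import List, Optional, Tuple


def darkest_window(brightness: List[List[float]], win_h: int, win_w: int
                   ) -> Optional[Tuple[int, int, float]]:
    """Slide a win_h x win_w window over a brightness grid.

    Returns (top, left, mean_brightness) of the darkest window, or None when
    the window is empty or does not fit inside the grid.
    """
    rows = len(brightness)
    cols = len(brightness[0]) if rows else 0
    if win_h <= 0 or win_w <= 0 or win_h > rows or win_w > cols:
        return None
    best = None
    for top in range(rows - win_h + 1):
        for left in range(cols - win_w + 1):
            total = sum(
                brightness[r][c]
                for r in range(top, top + win_h)
                for c in range(left, left + win_w)
            )
            mean = total / (win_h * win_w)
            if best is None or mean < best[2]:
                best = (top, left, mean)
    return best


grid = [
    [0.9, 0.9, 0.8, 0.7],
    [0.9, 0.2, 0.1, 0.7],
    [0.8, 0.1, 0.2, 0.6],
]
result = darkest_window(grid, 2, 2)
print(result[:2])  # (1, 1)
```

The window size plays the role of the "large enough" requirement: it would be set from the dimensions of the logo, text, or other overlay to be placed.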
- a creative graphic provided by the user may be altered according to instructions determined by the ASO engine.
- the creative graphic may also include a combination of one or more creative graphics composited onto or otherwise combined with each other.
- ASO engine may be used to recognize the most important features or otherwise an optimal area of the creative graphic before editing (e.g., 408) the creative graphic.
- the editing process may be programmatically enabled to include the features as recognized by the ASO engine.
- ASO engine may perform analytics on a creative graphic without altering or otherwise permanently changing the creative graphic.
- the ASO engine may perform analytics on a copy of the creative graphic in order to preserve the original creative graphic file.
- a creative graphic may be repeatedly analyzed, copied, and/or manipulated for placement in an unlimited number of placement video clips. For example, if a user uploads a creative graphic for placement in a first placement video clip, the said creative graphic may be copied, analyzed, further placed into the first placement video clip, preserving a copy of the said creative graphic.
- a user may then analyze and further place the same creative graphic (or a copy thereof) preserved from the previous upload across any number of placement video clips in the future.
- the ASO engine may use one or more machine learning algorithms to identify important features or otherwise an optimal area of the creative graphic to include in a placement video clip. Similar to the Automated Placement Opportunity Identification engine, the machine learning algorithms as applied herein may be trained using training data provided by successful manipulation and placements of creative graphics, according to some embodiments.
- Some examples of important features identified by the ASO engine may include, but are not limited to, a face of an individual, faces of a group of individuals, a group of people more generally, a prominent object of interest provided in the creative graphic, multiple objects of interest as provided in the creative graphic, objects or people at the center of the frame or alternatively in focus as provided in the creative graphic, among others.
- Important features identifiable by the ASO engine may further include, according to some embodiments, logos, icons, emblems, marks, designs, logotype designs, or other unique symbols associated with a company, organization, group, or individual.
- FIG. 5B illustrates an exemplary area selection optimization procedure, according to some embodiments.
- Exemplary ASO procedure 508 may include receiving a creative graphic 510 to identify or otherwise extract an important feature therein.
- Creative graphic 510 may include therein one or more important features identifiable by an ASO engine.
- creative graphic 510 includes features such as buildings, street lights, and a group of people 512.
- ASO engine 514 may be trained using training data including other creative graphics with prelabeled important features.
- ASO engine 514 may receive creative graphic 510 to identify important features therein and label them accordingly. Labeling may include applying a bounding box or other notation to a portion of creative graphic 510 to indicate that an important feature may be located therein.
- ASO engine 514 may determine that group of people 512 is an important feature of creative graphic 510 and apply thereto a label 516. According to some embodiments, ASO engine 514 may extract important features from creative graphic 510 (or a copy thereof) in addition to or instead of labeling. For example, ASO engine 514 may extract an identifiable feature 520 from creative graphic 510 (or a copy thereof) by eliminating therefrom features not identified as important by ASO engine 514 (e.g., buildings and street lights), leaving only an extracted group of people as the identified important feature 520.
- Figure 5C illustrates an exemplary area selection optimization procedure, according to some embodiments.
- Exemplary ASO procedure 520 includes receiving one or more creative graphics to identify or otherwise extract a logo or icon therefrom.
- ASO procedure 520 demonstrates ASO engine 526 receiving two different creative graphics, such as bottle graphic 522 and automobile graphic 524, both of which have a logo contained
- ASO engine 526 may be the same as ASO engine 514 described in ASO procedure 508, trained using training data similar to that of ASO engine 514 along with additional training data. Alternatively, ASO engine 526 may be separate from ASO engine 514. According to some embodiments, ASO engine 526 may be trained using training data including other creative graphics with pre-labeled logos contained therein. For example, ASO engine 526 may receive bottle graphic 522 for analysis, identifying and further extracting an important feature, such as logo 528, therefrom.
- ASO engine 526 may receive a different graphic for analysis, such as automobile graphic 524, to identify and further extract an important feature, such as logo 528, therefrom.
- ASO engine 526 may extract important features (e.g., logo 528) from a creative graphic (e.g., bottle graphic 522, automobile graphic 524) irrespective of what the creative graphic displays.
- FIG. 6 illustrates the events that precede and succeed an automated compositing service, according to some embodiments.
- an HTTP request may be triggered at 602.
- This HTTP request at 602 may transmit information by way of a compositing service API.
- This information may include, but is not limited to the following data: placement ID, placement format number, creative asset ID, BG color, and video fit.
- Placement format number may include one or more of the following:
- HTTP request at 602 may be an automated scheduled job that continually checks for newly uploaded creative graphics.
- the Compositing service API 604 as shown in Figure 6 may query database tables to gather more information and assets that the compositing job may need, such as those indicated or otherwise requested by HTTP post 602.
- compositing service API 604 may transmit a query request 606 to a first database table, OTT placements table 608.
- OTT placements table 608 may transmit a response 610 containing bounding box coordinates that specify the positions of video and creative assets in the composited output.
- the coordinates transmitted at response 610 may be static or otherwise dynamic for the duration of the placement video clip.
- compositing service API 604 may further transmit a query request 612 to a second database table, creative assets table 614.
- creative assets table 614 may transmit a response 616 containing a creative ID to get the public URLs of the actual creative graphics (e.g., images, GIFs, video), as well as a headline and caption.
- compositing service API 604 may generate a compositing job using information received from responses 610 and 616, among other data. Compositing service API 604 may further transmit compositing job 618 as a queue request into queuing system 620. Compositing job 618 may contain data gathered by compositing service API 604, including one or more of: placement ID(s), format number(s), creative asset ID(s), video fit type(s), compositing variables, original content clips, and a combination thereof, among other data. Compositing variables may include, but are not limited to, bounding boxes and background colors, among others. According to some embodiments, queuing system 620 may transmit a response 622 to compositing service API 604, response 622 including a task ID and a queue time, among others.
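The job hand-off described above might look roughly like the sketch below. The field names mirror the data items listed in the text, but the class names (`CompositingJob`, `QueuingSystem`), the task ID format, and the in-memory queue are assumptions made for illustration; the specification does not say which queuing technology is used.

```python
import itertools
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CompositingJob:
    """Hypothetical container for the data gathered by the compositing service API."""
    placement_id: str
    format_number: int
    creative_asset_id: str
    video_fit: str
    bounding_boxes: list = field(default_factory=list)  # compositing variable
    background_color: str = "#000000"                   # compositing variable


class QueuingSystem:
    """In-memory stand-in for the queuing system; a real deployment would differ."""

    def __init__(self):
        self._jobs = deque()
        self._ids = itertools.count(1)

    def enqueue(self, job):
        """Store the job and return a response with a task ID and a queue time."""
        task_id = f"task-{next(self._ids)}"
        self._jobs.append((task_id, job))
        return {"task_id": task_id, "queue_time": time.time()}

    def next_job(self):
        """Hand the oldest queued job to the compositing service."""
        return self._jobs.popleft()


queue = QueuingSystem()
job = CompositingJob("placement-1", 1, "creative-9", "contain", [(0, 0, 640, 360)])
response = queue.enqueue(job)
```

The response dictionary plays the role of response 622: the caller can persist the task ID and queue time for later reference.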
- Data received by compositing service API 604 from responses 610, 616, and 622 may be transmitted to and stored in composite processes database table 626 for later reference or retrieval.
- Compositing job 618 may be stored at queuing system 620 until it is passed into compositing service 628, shown as a complex web of logic nodes. Compositing service 628 is further described below.
- the output 630 of compositing service 628 may be a composited version of the original placement video clip, according to some embodiments.
- the name of output 630 may use a variety of naming conventions, including those based on the placement video ID.
- output 630 of compositing service 628 (e.g., composited version of the original placement video clip) may be uploaded to composite directory 632 and stored with a render ID for later use.
- the naming convention of output 630 may be used to generate an access URL 634, which may be stored at composite processes database table 626 for later reference and retrieval.
- compositing service API 604 may query composite processes database table 626 to receive a response 632 containing access URL 634.
- compositing service API may use the access URL 634 to fetch and reuse the already composited output 630 from composite directory 632.
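The reuse of an already composited output can be illustrated with a small cache keyed on the identifiers that go into the output name. Everything concrete here is an assumption: the file naming pattern, the `example.invalid` base URL, and the `CompositeDirectory` class are hypothetical, and the dictionary merely stands in for the composite processes database table.

```python
class CompositeDirectory:
    """Hypothetical stand-in for the composite directory plus its lookup table."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self._urls = {}        # stands in for the composite processes table
        self.render_count = 0  # how many times compositing actually ran

    def access_url(self, placement_video_id, creative_asset_id):
        """Derive an access URL from the (assumed) naming convention."""
        return f"{self.base_url}/{placement_video_id}_{creative_asset_id}.mp4"

    def get_or_render(self, placement_video_id, creative_asset_id, render):
        """Return the stored URL, rendering only when no output exists yet."""
        key = (placement_video_id, creative_asset_id)
        if key not in self._urls:
            render(placement_video_id, creative_asset_id)
            self.render_count += 1
            self._urls[key] = self.access_url(*key)
        return self._urls[key]


directory = CompositeDirectory("https://example.invalid/composites/")
first = directory.get_or_render("clip42", "creative7", lambda p, c: None)
second = directory.get_or_render("clip42", "creative7", lambda p, c: None)
```

The second call returns the same URL without invoking the render callback again, which is the behaviour the paragraph describes as fetching and reusing the composited output.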
- the Automated Compositing service as described above may encompass generating at least four OTT formats:
- the In-Action Six format may be used to composite a second video into a small portion of the frame while a first video is shrunk into another small corner of the same frame.
- the Overlay format may be used to simply overlay a second video onto the corner of a first video.
- the Brand Insertion format may be used to realistically composite still images into a scene of a video.
- the Product Insertion format may be used to composite 3D objects into a scene of a video. Both the In-Action Six format and the Overlay format may be considered on-top compositing, while the Brand Insertion format and Product Insertion format may be considered compositing into the scene.
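The grouping of the four formats into on-top compositing and compositing into the scene can be captured in a small lookup. The enum names below are hypothetical; only the format names and their grouping are taken from the text.

```python
from enum import Enum


class CompositeKind(Enum):
    ON_TOP = "on-top"
    INTO_SCENE = "into the scene"


class OttFormat(Enum):
    """The four OTT formats named in the text, each tagged with its compositing kind."""
    IN_ACTION_SIX = ("In-Action Six", CompositeKind.ON_TOP)
    OVERLAY = ("Overlay", CompositeKind.ON_TOP)
    BRAND_INSERTION = ("Brand Insertion", CompositeKind.INTO_SCENE)
    PRODUCT_INSERTION = ("Product Insertion", CompositeKind.INTO_SCENE)

    def __init__(self, label, kind):
        self.label = label
        self.kind = kind


into_scene = [fmt.label for fmt in OttFormat if fmt.kind is CompositeKind.INTO_SCENE]
```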
- Figure 7 illustrates an on-top composite process, specifically an in-action six compositing logic 700.
- a Super Bowl video stream could be used as the original content video clip 702 as shown in Figure 7.
- the original content video clip 702 may be shrunk into a smaller portion of the frame as shown as "Squeezing Back" at 704.
- the original content video clip 702 squeezes back from filling the full screen to at least a partial portion of the screen.
- original content video clip 702 is confined by the original content bounding box detailed by the compositing variables (e.g., bounding boxes, background colors, etc.) as described above.
- creative content 708 may include one or more of the following: a creative video clip 710 and a headline & caption 712. Creative video clip 710 may be confined by a bounding box as shown in Figure 7. At 706, creative content 708 may fade onto the screen at various places and sizes, according to some embodiments. This fading process at 706 may be described as a sliding gradient from 0% opacity to 100% opacity within a predetermined time frame (e.g., 2 seconds).
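The fade described above can be sketched as an opacity ramp. The sketch assumes the "sliding gradient" is linear in time, which the text does not state, and the function names are hypothetical.

```python
def fade_opacity(elapsed_seconds, fade_seconds=2.0):
    """Opacity in [0.0, 1.0] after `elapsed_seconds`, assuming a linear ramp."""
    if fade_seconds <= 0:
        return 1.0
    return min(1.0, max(0.0, elapsed_seconds / fade_seconds))


def blend(creative_value, background_value, opacity):
    """Mix one channel of the creative content over the background at `opacity`."""
    return opacity * creative_value + (1.0 - opacity) * background_value


# Halfway through a 2-second fade the creative content is 50% opaque.
halfway = fade_opacity(1.0)
mixed = blend(200, 100, halfway)
```

At 30 frames per second, a 2-second fade would evaluate `fade_opacity` for 60 consecutive frames, with `elapsed_seconds` advancing by 1/30 each frame.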
- Creative content 708 may or may not be dynamic, according to some embodiments.
- Static and dynamic content may be displayed by creative content 708, according to some embodiments.
- original content video clip 702 scales back to 100% of the frame size.
- creative content 708 is static, for example, the original content video clip 702 may scale back to 100% of the frame size after a predetermined period of time (e.g., 6 seconds).
- the compositing process used to accomplish such a scaling effect of the original content video clip 702 and the insertion of creative content 708 may be described as the on-top compositing logic. According to some embodiments, this on-top compositing logic may utilize the following elements:
- Bounding box (x, y, w, h) for creative content such as creative content 708;
- the graphic insertion compositing service may use one or more of the following as inputs:
- a third layer of compositing service 900 as executed by graphic insertion formats may be alpha layer 906.
- Alpha layer 906 closely resembles the original base layer 902; however, alpha layer 906 contains a "cut-out" or an application area in which a creative graphic may be inserted. The cut-out or application area may be added on top of the creative graphic layer 904 in order to generate shadows, objects, or any elements in the scene that may cover up the creative graphic.
- this layer handles characters blocking the creative graphic and illustrates the motion thereof.
- generating an alpha layer may further include identifying measurements on a z-axis for objects within the
- a fourth layer of compositing service 900 as executed by graphic insertion formats may be shadow layer 908.
- Shadow layer 908 may generate realistic shadows blended into the environment of the scene. According to various embodiments, these shadows may be realistically inserted by using a multiply blend mode.
- a fifth layer of compositing service 900 as executed by graphic insertion formats may be reflect layer 910.
- Reflect layer 910 may generate reflections over the layers as described above in order to match the environment of the scene. According to various embodiments, these reflections may be realistically inserted by using a screen blend mode.
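For reference, the multiply and screen blend modes mentioned for the shadow and reflect layers are commonly defined per channel as follows, with values normalized to the range 0.0 to 1.0. These are the standard definitions of the two modes, not formulas quoted from the specification.

```python
def multiply_blend(base, layer):
    """Multiply mode: darkens the base, as a shadow would. Inputs in [0.0, 1.0]."""
    return base * layer


def screen_blend(base, layer):
    """Screen mode: lightens the base, as a reflection would. Inputs in [0.0, 1.0]."""
    return 1.0 - (1.0 - base) * (1.0 - layer)


shadowed = multiply_blend(0.8, 0.5)   # darker than the base value of 0.8
reflected = screen_blend(0.5, 0.5)    # lighter than the base value of 0.5
```

A white layer value (1.0) leaves the base unchanged under multiply, and a black layer value (0.0) leaves it unchanged under screen, which is why shadows and reflections can be stored as separate layers and applied only where needed.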
- the layers as described above are combined or otherwise composited together for a single frame of the entire placement video. The layering and compositing process are performed repeatedly for each frame of a placement video clip. For example, if a 1 minute video clip has a frame rate of 30 frames per second, this layering and compositing process may be performed once per frame for a total of about 1,800 times.
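The per-frame repetition and the frame count in the example above can be checked with a short sketch. The layer order follows the text (base, creative graphic, alpha, shadow, reflect); the function names and the callback signature are assumptions made for illustration.

```python
# Layer order as described in the text, bottom to top.
LAYER_ORDER = ["base", "creative graphic", "alpha", "shadow", "reflect"]


def frame_count(duration_seconds, frames_per_second):
    """Number of frames in a clip of the given duration and frame rate."""
    return round(duration_seconds * frames_per_second)


def composite_clip(duration_seconds, frames_per_second, composite_frame):
    """Run the layering step once per frame and collect the results."""
    total = frame_count(duration_seconds, frames_per_second)
    return [composite_frame(index, LAYER_ORDER) for index in range(total)]


# A 1-minute clip at 30 frames per second needs 60 * 30 = 1,800 passes.
passes = frame_count(60, 30)
frames = composite_clip(60, 30, lambda index, layers: (index, len(layers)))
```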
- Figure 10 illustrates an exemplary insertion of a motion blur effect, according to some embodiments.
- a motion blur effect procedure 1000 may analyze an image before applying motion blur to a creative graphic.
- the creative graphic 1002 shows what a creative graphic may look like before a motion blur effect is applied using compositing logic.
- Creative graphic 1004 demonstrates what creative graphic 1002 would look like with motion blur effects applied using compositing logic as described above.
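One simple way to approximate a motion blur on a creative graphic is a directional box blur, in which each pixel is averaged with the pixels that trail it along the direction of motion. The sketch below assumes purely horizontal motion and a blur length chosen by the caller; the specification does not say how the blur is computed or how its strength is derived from the analyzed image, so this is an illustrative substitute rather than the described method.

```python
def horizontal_motion_blur(image, length):
    """Average each pixel with up to `length - 1` pixels to its left (trailing blur)."""
    if length < 1:
        raise ValueError("length must be at least 1")
    blurred = []
    for row in image:
        out = []
        for col in range(len(row)):
            start = max(0, col - length + 1)
            window = row[start:col + 1]
            out.append(sum(window) / len(window))
        blurred.append(out)
    return blurred


# A single bright pixel is smeared across the three pixels that follow its position.
sharp = [[0, 0, 9, 0, 0]]
smeared = horizontal_motion_blur(sharp, 3)
```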
- depth of field blur may be generated and otherwise applied in a similar manner, wherein the pixels surrounding the placement may be analyzed for depth
- FIG. 11 illustrates an automated compositing service, according to some embodiments.
- the automated compositing service 1100 receives as input at least a base image 1102 and a creative graphic 1110,
- base image 1102 may be one or more of a media dataset, such as an image, a single video frame, or multiple video frames.
- Automated compositing service 1100 may include one or more neural networks, such as a computer vision neural network 1104 and a compositing neural network 1108.
- Base image 1102 may be analyzed by computer vision neural network 1104 to determine scene parameters 1106.
- Scene parameters 1106 may include various characteristics of base image 1102, including, but not limited to, camera data, objects in the scene, context of the scene, transformations performed on the scene, light data of the scene, materials in the scene, geometry data of the scene, among other data related to base image 1102.
- the output of computer vision neural network 1104 may be used as input for compositing neural network 1108.
- compositing neural network 1108 may receive as input base image 1102, scene parameters 1106, as well as creative graphic 1110.
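The data flow between the two networks can be sketched with plain callables standing in for the models. Nothing here reflects an actual network architecture: `SceneParameters` lists only a few of the characteristics named in the text, and the stub functions are hypothetical placeholders that exist solely to show which inputs each stage receives.

```python
from dataclasses import dataclass, field


@dataclass
class SceneParameters:
    """A few of the scene characteristics named in the text (hypothetical layout)."""
    camera: dict = field(default_factory=dict)
    objects: list = field(default_factory=list)
    lighting: dict = field(default_factory=dict)


def run_compositing_service(base_image, creative_graphic, vision_model, compositing_model):
    """Stage 1 derives scene parameters; stage 2 receives all three inputs."""
    scene_parameters = vision_model(base_image)
    return compositing_model(base_image, scene_parameters, creative_graphic)


def stub_vision_model(base_image):
    """Placeholder for the computer vision network."""
    return SceneParameters(objects=["table"], lighting={"direction": "left"})


def stub_compositing_model(base_image, scene_parameters, creative_graphic):
    """Placeholder for the compositing network: reports what it was given."""
    return {
        "base": base_image,
        "graphic": creative_graphic,
        "placed_on": scene_parameters.objects[0],
    }


result = run_compositing_service(
    "frame-0001", "logo.png", stub_vision_model, stub_compositing_model
)
```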
- Creative graphic 1110 may include an
- some or all of the processing described above can be carried out on a personal computing device, on one or more centralized computing devices, or via cloud-based processing by one or more servers.
- some types of processing occur on one device and other types of processing occur on another device.
- some or all of the data described above can be stored on a personal computing device, in data storage hosted on one or more centralized computing devices, or via cloud-based storage.
- some data are stored in one location and other data are stored in another location.
- quantum computing can be used.
- functional programming languages can be used.
- electrical memory, such as flash-based memory, can be used.
- General-purpose computers, network appliances, mobile devices, or other electronic systems may also be included in an example system implementing the processes described herein.
- a system can include a processor, a memory, a storage device, and an input/output device. Each of the components may be interconnected, for example, using a system bus.
- the processor is capable of processing instructions for execution within the system.
- the processor is a single-threaded processor.
- the processor is a multi-threaded processor.
- the processor is capable of processing instructions stored in the memory or on the storage device.
- the memory stores information within the system.
- the memory is a non-transitory computer-readable medium.
- the memory is a volatile memory unit.
- the memory is a non-volatile memory unit.
- the storage device is capable of providing mass storage for the system.
- the storage device is a non-transitory computer-readable medium.
- the storage device may include, for example, a hard disk device, an optical disk device, a solid-state drive, a flash drive, or some other large capacity storage device.
- the storage device may store long-term data (e.g., database data, file system data, etc.).
- the input/output device provides input/output operations for the system.
- the input/output device may include one or more of a network interface device, e.g., an Ethernet card, a serial communication device, e.g., an RS-232 port, and/or a wireless interface device, e.g., an 802.11 card, a 3G wireless modem, or a 4G wireless modem.
- the input/output device may include driver devices configured to receive input data and send output data to other input/output devices, e.g., keyboard, printer and display devices.
- mobile computing devices, mobile communication devices, and other devices may be used.
- At least a portion of the approaches described above may be realized by instructions that upon execution cause one or more processing devices to carry out the processes and functions described above.
- Such instructions may include, for example, interpreted instructions such as script instructions, or executable code, or other instructions stored in a non-transitory computer readable medium.
- the storage device may be implemented in a distributed way over a network, such as a server farm or a set of widely distributed servers, or may be implemented in a single computing device.
- the program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.
- the computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
- system may encompass all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers
- a processing system may include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
- a processing system may include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
- a computer program (which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
- a computer program may, but need not, correspond to a file in a file system.
- a program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code).
- a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
- Computers suitable for the execution of a computer program can include, by way of example, general or special purpose microprocessors or both, or any other kind of central processing unit.
- a central processing unit will receive instructions and data from a read-only memory or a random access memory or both.
- a computer generally includes a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data.
- a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks.
- a computer need not have such devices.
- a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device (e.g., a universal serial bus (USB) flash drive), to name just a few.
- Computer readable media suitable for storing computer program instructions and data include all forms of nonvolatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD-ROM and DVD-ROM disks.
- the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
- a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer.
- Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
- a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages
- Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back end, middleware, or front end components.
- the components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.
- the computing system can include clients and servers.
- a client and server are generally remote from each other and typically interact through a communication network.
- The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
- “or” or “and/or” shall be interpreted as being inclusive, i.e., the inclusion of at least one, but also including more than one, of a number or list of elements, and, optionally, additional unlisted items. Only terms clearly indicated to the contrary, such as “only one of” or “exactly one of,” or, when used in the claims, “consisting of,” will refer to the inclusion of exactly one element of a number or list of elements. In general, the term “or” as used shall only be interpreted as indicating exclusive alternatives (i.e.
- the phrase "at least one," in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements.
- This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase "at least one" refers, whether related or unrelated to those elements specifically identified.
- At least one of A and B can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- Marketing (AREA)
- Computer Security & Cryptography (AREA)
- Computing Systems (AREA)
- Processing Or Creating Images (AREA)
- Image Analysis (AREA)
Abstract
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP21759478.7A EP4189591A1 (fr) | 2020-08-03 | 2021-08-03 | System and method for preparing digital composites for incorporating into digital visual media |
Applications Claiming Priority (6)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/983,064 US11301715B2 (en) | 2020-08-03 | 2020-08-03 | System and method for preparing digital composites for incorporating into digital visual media |
| US16/983,064 | 2020-08-03 | | |
| US16/984,608 US11625874B2 (en) | 2020-08-04 | 2020-08-04 | System and method for intelligently generating digital composites from user-provided graphics |
| US16/984,608 | 2020-08-04 | | |
| US16/986,617 | 2020-08-06 | | |
| US16/986,617 US10984572B1 (en) | 2020-08-06 | 2020-08-06 | System and method for integrating realistic effects onto digital composites of digital visual media |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022031723A1 (fr) | 2022-02-10 |
Family
ID=77499939
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2021/044374 Ceased WO2022031723A1 (fr) | 2020-08-03 | 2021-08-03 | System and method for preparing digital composites for incorporating into digital visual media |
Country Status (2)
| Country | Link |
|---|---|
| EP (1) | EP4189591A1 (fr) |
| WO (1) | WO2022031723A1 (fr) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20130091519A1 (en) * | 2006-11-23 | 2013-04-11 | Mirriad Limited | Processing and apparatus for advertising component placement |
| US20140359656A1 (en) * | 2013-05-31 | 2014-12-04 | Adobe Systems Incorporated | Placing unobtrusive overlays in video content |
| US20170330363A1 (en) * | 2016-05-13 | 2017-11-16 | Yahoo Holdings Inc. | Automatic video segment selection method and apparatus |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9711182B2 (en) * | 2011-06-07 | 2017-07-18 | In Situ Media Corporation | System and method for identifying and altering images in a digital video |
-
2021
- 2021-08-03 WO PCT/US2021/044374 patent/WO2022031723A1/fr not_active Ceased
- 2021-08-03 EP EP21759478.7A patent/EP4189591A1/fr not_active Withdrawn
Also Published As
| Publication number | Publication date |
|---|---|
| EP4189591A1 (fr) | 2023-06-07 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11625874B2 (en) | System and method for intelligently generating digital composites from user-provided graphics | |
| US11783461B2 (en) | Facilitating sketch to painting transformations | |
| US10657652B2 (en) | Image matting using deep learning | |
| US10049308B1 (en) | Synthesizing training data | |
| US10956784B2 (en) | Neural network-based image manipulation | |
| US20200111241A1 (en) | Method and apparatus for processing video image and computer readable medium | |
| US10049477B1 (en) | Computer-assisted text and visual styling for images | |
| RU2542923C2 (ru) | Размещение рекламы с учетом видеоконтента | |
| EP2587826A1 (fr) | Procédé et système d'extraction et d'association pour objets d'intérêt dans vidéo | |
| US20140189476A1 (en) | Image manipulation for web content | |
| WO2017190639A1 (fr) | Procédé d'affichage d'informations multimédias, client et serveur | |
| EP1887526A1 (fr) | Système de réalité vidéo numériquement amplifiée | |
| CN110390048A (zh) | 基于大数据分析的信息推送方法、装置、设备及存储介质 | |
| CN105141987A (zh) | 广告植入方法和广告植入系统 | |
| US10984572B1 (en) | System and method for integrating realistic effects onto digital composites of digital visual media | |
| WO2019089097A1 (fr) | Systèmes et procédés permettant de générer un scénarimage récapitulatif à partir d'une pluralité de trames d'image | |
| US11126788B2 (en) | Font capture from images of target decorative character glyphs | |
| Pęśko et al. | Comixify: Transform video into comics | |
| CN116954605A (zh) | 页面生成方法、装置及电子设备 | |
| Hu et al. | Video summarization via exploring the global and local importance | |
| EP3396964B1 (fr) | Placement de contenu dynamique dans une image fixe ou une vidéo | |
| US20150181288A1 (en) | Video sales and marketing system | |
| US11301715B2 (en) | System and method for preparing digital composites for incorporating into digital visual media | |
| WO2022031723A1 (fr) | Système et procédé de préparation de composites numériques en vue d'incorporation dans des supports visuels numériques | |
| CN108737892A (zh) | 媒体中的动态内容渲染 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21759478; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2021759478; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2021759478; Country of ref document: EP; Effective date: 20230303 |
| | WWW | Wipo information: withdrawn in national office | Ref document number: 2021759478; Country of ref document: EP |