
WO2025170579A1 - Generative model tuning and inference utilizing quality signals - Google Patents

Generative model tuning and inference utilizing quality signals

Info

Publication number
WO2025170579A1
WO2025170579A1 (PCT/US2024/014738)
Authority
WO
WIPO (PCT)
Prior art keywords
model
generated
computing system
output
machine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/014738
Other languages
French (fr)
Inventor
Victor Carbune
Matthew Sharifi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google LLC filed Critical Google LLC
Priority to PCT/US2024/014738 priority Critical patent/WO2025170579A1/en
Publication of WO2025170579A1 publication Critical patent/WO2025170579A1/en
Pending legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation
    • G06F16/33295Natural language query formulation in dialogue systems
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0475Generative networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/0895Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks

Definitions

  • the present disclosure relates generally to machine learning processes and machine-learned devices and systems. More particularly, the present disclosure relates to leveraging quality signals associated with training resources to prioritize particular resources during training and model inference, which may include training and leveraging a machine-learned reward model.
  • a computer can receive input(s).
  • the computer can execute instructions to process the input(s) to generate output(s) using a parameterized model.
  • the computer can obtain feedback on its performance in generating the outputs with the model.
  • the computer can generate feedback by evaluating its performance.
  • the computer can receive feedback from an external source.
  • the computer can update parameters of the model based on the feedback to improve its performance. In this manner, the computer can iteratively “learn” to generate the desired outputs.
  • the resulting model is often referred to as a machine-learned model.
  • Foundation models can be used for a wide variety of tasks.
  • the general foundation models can generate outputs for a plurality of different tasks; however, the outputs may be of low quality.
  • the foundation models can include generative models that may generate outputs via next token prediction.
  • the generative models may be trained on training datasets that include variance in the quality of training examples.
  • Reinforcement learning from human feedback can be utilized to further train and/or tune a model by asking humans to rate preferred responses.
  • the reinforcement learning from human feedback can increase perceived quality; however, the training loop may be dependent on requesting feedback, which may further propagate human biases.
  • the method can include obtaining, by a computing system including one or more processors, a content dataset.
  • the content dataset can include a plurality of content items associated with a plurality of respective resources and a plurality of quality scores associated with the plurality of respective resources.
  • the plurality of quality scores can be descriptive of a quality of a respective resource as a search result.
  • the method can include processing, by the computing system, a prompt with a generative model to generate a plurality of candidate model-generated responses.
  • the prompt can include a request for information.
  • determining the particular model-generated response of the plurality of candidate model-generated responses to provide as an output can include determining, by the computing system and based on the content dataset, the particular model-generated response of the plurality of candidate model-generated responses is associated with a respective content item of the subset of content items with a respective quality score greater than the other quality scores of the set of respective quality scores associated with other content items of the subset of content items associated with a set of other candidate model-generated responses.
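  • A minimal sketch of this selection step, in Python; the data structures and the mapping from candidate responses to resources are illustrative assumptions, not language from the claims:

```python
# Sketch: pick the candidate response whose supporting resource has the
# highest quality score. The candidate-to-resource association is assumed
# given (e.g., via sequence similarity, as described in the disclosure).

def select_response(candidates, quality_scores):
    """candidates: dict mapping response text -> associated resource id.
    quality_scores: dict mapping resource id -> quality score (float)."""
    return max(candidates, key=lambda response: quality_scores[candidates[response]])

candidates = {
    "Answer A": "resource_1",  # e.g., a scholastic publication
    "Answer B": "resource_2",  # e.g., a low-traffic forum post
}
quality_scores = {"resource_1": 0.92, "resource_2": 0.35}
print(select_response(candidates, quality_scores))  # -> "Answer A"
```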
  • the content dataset can include a plurality of pre-existing content items published on the internet.
  • the generative model can include a pretrained autoregressive language model.
  • the particular model-generated response can include a natural language response to the prompt.
  • the method can include processing, by the computing system, a second prompt with the generative model to generate a plurality of model-generated fragments.
  • the plurality of model-generated fragments can include a plurality of different candidate responses to the second prompt.
  • the method can include processing, by the computing system, the plurality of model-generated fragments with the machine-learned reward model to generate a plurality of respective scores.
  • the plurality of respective scores can be associated with evaluating a quality of the plurality of model-generated fragments.
  • the method can include providing, by the computing system, a particular model-generated fragment of the plurality of model-generated fragments for display based on the plurality of respective scores.
  • the plurality of quality scores may have been determined by processing the plurality of content items and plurality of respective metadata sets associated with the plurality of respective resources with a ranking engine to generate a plurality of ranking scores.
  • the ranking engine can be associated with a search engine.
  • the ranking engine can be configured to rank resources to determine particular resources to provide as search results.
  • the plurality of quality scores may have been determined based on incoming links and outgoing links for the plurality of respective resources.
  • the prompt may include multimodal data.
  • the multimodal data can include an image and text descriptive of a question associated with the image.
  • the particular model-generated response may include a predicted answer responsive to the question that is based on one or more image features of the image.
  • the system can include one or more processors and one or more non-transitory computer-readable media that collectively store instructions that, when executed by the one or more processors, cause the computing system to perform operations.
  • the operations can include obtaining a training dataset.
  • the training dataset can include a plurality of content items associated with a plurality of respective web resources and a plurality of quality scores associated with the plurality of respective web resources.
  • the plurality of quality scores can be determined based on incoming links and outgoing links for the plurality of respective web resources.
  • the operations can include processing a prompt with a generative model to generate a plurality of probabilities associated with a plurality of candidate model outputs.
  • the operations can include determining a first ground truth example from the training dataset is associated with a first candidate model output of the plurality of candidate model outputs and a second ground truth example from the training dataset is associated with a second candidate model output of the plurality of candidate model outputs.
  • the operations can include determining the first ground truth example is associated with a first web resource with a higher quality score than a second web resource associated with the second ground truth example.
  • the operations can include evaluating a loss function that evaluates a difference between a first probability associated with the first candidate model output and a second probability associated with the second candidate model output and adjusting one or more parameters of the generative model based on the loss function.
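  • Below is a hedged sketch of such a loss; the hinge form and the margin value are assumptions, as the disclosure only requires that the loss evaluate the difference between the two probabilities:

```python
import torch

def quality_pairwise_loss(p_high, p_low, margin=0.1):
    """Penalize the generative model when the candidate output backed by the
    higher-quality resource (p_high) is not preferred over the candidate
    backed by the lower-quality resource (p_low)."""
    # Hinge-style ranking loss: zero once p_high exceeds p_low by `margin`.
    return torch.clamp(margin - (p_high - p_low), min=0.0).mean()

p_high = torch.tensor([0.62], requires_grad=True)  # first candidate probability
p_low = torch.tensor([0.58], requires_grad=True)   # second candidate probability
loss = quality_pairwise_loss(p_high, p_low)
loss.backward()  # gradients are then used to adjust the generative model
```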
  • the operations can include obtaining input data.
  • the input data can be descriptive of a user prompt.
  • the operations can include processing the input data with the generative model to generate a model-generated response.
  • the model-generated response can be responsive to the user prompt.
  • the operations can include providing the model-generated response as an output.
  • the user prompt can include a natural language question.
  • the model-generated response can include a plurality of predicted words responsive to the question.
  • the model-generated response can include a sequence of words that differs from the plurality of content items.
  • the user prompt can include multimodal data.
  • the user prompt can include an image and a question associated with the image.
  • the plurality of quality scores can be determined based on an amount of references to a respective web resource within other resources. In some implementations, the plurality of quality scores can be determined based on how the respective web resource is referenced.
  • Another example aspect of the present disclosure is directed to one or more non-transitory computer-readable media that collectively store instructions that, when executed by one or more computing devices, cause the one or more computing devices to perform operations.
  • the operations can include obtaining a training dataset.
  • the training dataset can include a plurality of content items associated with a plurality of respective web resources and a plurality of quality scores associated with the plurality of respective web resources.
  • the plurality of quality scores can be determined based on incoming links and outgoing links for the plurality of respective web resources.
  • the operations can include training a machine-learned reward model on the training dataset.
  • the machine-learned reward model can be trained to rank a set of data based on a determined quality score.
  • the operations can include obtaining a prompt and processing the prompt with a generative model to generate a plurality of model-generated fragments.
  • the plurality of model-generated fragments can include a plurality of different candidate responses to the prompt.
  • the operations can include processing the plurality of model-generated fragments with the machine-learned reward model to generate a plurality of respective scores.
  • the plurality of respective scores can be associated with evaluating a quality of the plurality of model-generated fragments.
  • the operations can include providing a particular model-generated fragment of the plurality of model-generated fragments as an output based on the plurality of respective scores.
  • the operations can include obtaining a plurality of interaction datasets associated with a plurality of additional resources.
  • the plurality of additional resources can include a plurality of model-generated content items that were previously generated with the generative model.
  • the plurality of interaction datasets can be descriptive of respective interactions with the plurality of additional resources by a plurality of users.
  • the operations can include adjusting one or more parameters of the machine-learned reward model based on the plurality of interaction datasets and the plurality of model-generated content items.
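  • A hedged sketch of such an adjustment, treating interaction as a preference signal (the model, features, and the Bradley-Terry-style objective are illustrative assumptions):

```python
import torch

# Adjust the reward model so that a previously published model-generated
# item that attracted more user interaction is scored above one that
# attracted less. All inputs below are stand-ins.

reward_model = torch.nn.Linear(16, 1)
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

item_more_interaction = torch.randn(1, 16)  # e.g., widely viewed/shared item
item_less_interaction = torch.randn(1, 16)  # e.g., rarely viewed item

score_hi = reward_model(item_more_interaction)
score_lo = reward_model(item_less_interaction)
# Preference loss: push score_hi above score_lo.
loss = -torch.nn.functional.logsigmoid(score_hi - score_lo).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
```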
  • the method can include determining, by the computing system, a ground truth example from the training dataset based on the plurality of quality scores and evaluating, by the computing system, a loss function that evaluates a difference between the model-generated output and the ground truth example.
  • the method can include adjusting, by the computing system, one or more parameters of the generative model based on the loss function.
  • the method can include obtaining, by the computing system, a plurality of interaction datasets associated with a plurality of additional resources.
  • the plurality of additional resources can include a plurality of model-generated content items that were previously generated with the generative model.
  • the plurality of interaction datasets can be descriptive of respective interactions with the plurality of additional resources by a plurality of users.
  • the method can include training, by the computing system, a machine-learned reward model based on the plurality of interaction datasets and the plurality of model-generated content items and storing, by the computing system, the machine-learned reward model and the generative model.
  • the reward model can be further trained on the training dataset and the plurality of scores.
  • the method can include processing, by the computing system, a second prompt with the generative model to generate a plurality of model-generated fragments.
  • the plurality of model-generated fragments can include a plurality of different candidate responses to the second prompt.
  • the method can include processing, by the computing system, the plurality of model-generated fragments with the machine-learned reward model to generate a plurality of respective scores.
  • the plurality of respective scores can be associated with evaluating a quality of the plurality of model-generated fragments.
  • the method can include providing, by the computing system, a particular model-generated fragment of the plurality of model-generated fragments as an output based on the plurality of respective scores.
  • Figure 2 depicts a block diagram of an example generative model tuning system according to example embodiments of the present disclosure.
  • the present disclosure is directed to systems and methods for generative model tuning and/or generative model output generation based on signals associated with resources of a training dataset.
  • the signals can include quality signals that may include a resource rank (e.g., a page rank based on a number and/or quality of links to and/or from the specific resource), interaction data (e.g., view traffic, view times, selection instances, etc.), content type, and/or resource type (e.g., scholastic versus social media).
  • the systems and methods disclosed herein can leverage the quality signals for tuning the generative model, selecting a particular candidate output, and/or model selection.
  • the plurality of quality scores may be obtained through large-scale processing of the corpus where relationships between documents are inferred (e.g., processing which may include (1) incoming/outgoing links to respective web resources, (2) information present in multiple training examples, (3) meta-quality signals present in the training corpus, such as information about one piece of information being incorrect and/or another being correct, and/or (4) human action present within the corpus, such as content being identified as shared, etc.).
  • the systems and methods can then obtain a prompt.
  • the systems and methods can process the prompt with a generative model to generate a model-generated output responsive to the prompt.
  • the systems and methods can determine a ground truth example from the training dataset based on the plurality of quality scores.
  • the systems and methods may then evaluate a loss function that evaluates a difference between the model-generated output and the ground truth example and adjust one or more parameters of the generative model based on the loss function.
  • the systems and methods may leverage the quality signals for model inference.
  • the generative model may process the prompt to generate a plurality of candidate model-generated outputs.
  • One or more first candidate model-generated outputs may be associated with a first content item of a first resource, and one or more second candidate model-generated outputs may be associated with a second content item of a second resource.
  • the association may be determined based on the sequence prediction being similar to a sequence within the respective content item.
  • the first content item may be associated with a first quality score, and the second content item may be associated with a second quality score.
  • the systems and methods can determine the first score is descriptive of a higher quality resource than the resource associated with the second score.
  • the systems and methods may select the one or more first candidate model-generated outputs as the model-generated response to the prompt.
  • the plurality of candidate model-generated outputs may include candidate response fragments. The selected candidate response fragment can then be fed back into the generative model to generate a full-length model-generated response.
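  • A sketch of this fragment-first flow (the model and scoring function are hypothetical stand-ins, not APIs from the disclosure):

```python
# Generate short candidate fragments, pick one (e.g., by reward-model
# score), and spend the full generation budget only on the winner.

def generate_fragments(model, prompt, n=4, max_tokens=16):
    return [model(prompt, max_tokens=max_tokens) for _ in range(n)]

def best_fragment(fragments, score_fn):
    return max(fragments, key=score_fn)

def full_response(model, prompt, fragment, max_tokens=256):
    # Feed the selected fragment back into the generative model as a prefix.
    return model(prompt + "\n" + fragment, max_tokens=max_tokens)

# Toy stand-ins so the sketch runs end to end:
toy_model = lambda prompt, max_tokens: (prompt.splitlines()[-1] + " ...")[:max_tokens]
fragments = generate_fragments(toy_model, "Why is the sky blue?")
best = best_fragment(fragments, score_fn=len)  # stand-in for a reward score
print(full_response(toy_model, "Why is the sky blue?", best))
```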
  • the systems and methods can include obtaining an additional machine-learned model (e.g., a machine-learned reward model) that may be part of and/or separate from the generative model.
  • the additional machine-learned model may be trained on the training dataset and the plurality of quality scores to train the machine-learned reward model to generate quality scores for the model-generated outputs to identify candidate model-generated outputs that are of high quality.
  • the machine-learned reward model may be trained and/or tuned on interaction data associated with previously generated model-generated outputs.
  • the view data, web traffic, reviews, and/or other interaction data associated with previously generated model-generated outputs can be leveraged to train and/or tune the machine-learned reward model to identify which model-generated outputs are of high quality and/or more likely to receive interactions.
  • the trained machine-learned reward model can then be leveraged for generative model tuning and/or for candidate modelgenerated output selection.
  • the machine-learned reward model can be utilized to determine a particular generative model of a plurality of candidate generative models to utilize for a particular task.
  • the machine-learned reward model can include one or more models that output pointwise or pairwise scores.
  • the machine-learned reward model can be implemented via a plurality of different architectures, sizes, weights, and/or configurations.
  • foundation models can be used for various tasks, such as creative writing, summarization, and coding.
  • the foundation models may be utilized in increasingly complex tasks where the overall goal might involve solving multiple subtasks in order to fulfill a complex query from a user.
  • the output of such models may be obtained through a decoding process (e.g., beam search, greedy) that takes into account the likelihood of the next token, trained in an unsupervised manner from a large corpus of content available (e.g., on the web).
  • the process may include a lot of variability in the quality of the training data, which can lead to the models sometimes generating low quality outputs.
  • the systems and methods disclosed herein can leverage quality signals (e.g., reference data, interaction data, and/or type classification) to tune and/or guide a generative model to generate high quality outputs, which may include solving multiple subtasks in order to fulfill a complex query.
  • the quality signals can include determined quality scores, which may be determined by evaluating a quantity and/or quality of incoming and outgoing references to a resource associated with a content item training example. For example, resources with more incoming links than outgoing links may be assigned a higher quality score than resources with fewer incoming links than outgoing links. Additionally and/or alternatively, citations in academic papers and/or explicit references may be weighted more heavily than a link at the bottom of a web page.
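  • As a toy illustration only (the weighting scheme and values are assumptions, not taken from the disclosure), such a link-based score might look like:

```python
# Score a resource from its references: weighted incoming references count
# for the score, weighted outgoing references against it, and
# citation-style references weigh more than footer links.

def link_quality_score(incoming, outgoing):
    """incoming/outgoing: lists of reference records, each a dict with a
    "kind" field ("citation" or "footer_link")."""
    weights = {"citation": 3.0, "footer_link": 0.5}
    in_weight = sum(weights.get(ref["kind"], 1.0) for ref in incoming)
    out_weight = sum(weights.get(ref["kind"], 1.0) for ref in outgoing)
    # More (weighted) incoming than outgoing references -> higher score.
    return in_weight / (in_weight + out_weight + 1e-9)

incoming = [{"kind": "citation"}, {"kind": "citation"}, {"kind": "footer_link"}]
outgoing = [{"kind": "footer_link"}]
print(round(link_quality_score(incoming, outgoing), 3))  # -> 0.929
```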
  • the use of the quality signals can eliminate and/or mitigate the reliance on additional training data.
  • a smaller training dataset can be leveraged for tuning and/or model inference for the same and/or better quality than much larger datasets.
  • a particular model-generated fragment of a plurality of candidate model-generated fragments can be determined before a full response is generated, which may then be leveraged for generating the full response to reduce the computational cost of generating a plurality of full responses.
  • the systems and methods can be utilized for large language models (e.g., an autoregressive language model), image generation models (e.g., text-to-image generation models, such as a diffusion model), audio generation models (e.g., a song generation model), multimodal data generation models, and/or other generative models.
  • the systems and methods can be utilized to tune pre-trained generative models, to determine which of a plurality of model-generated datasets to utilize as the output, and/or to determine which generative model to utilize.
  • the systems and methods disclosed herein can leverage signals that more directly estimate the response value, which may enable better separation of low-value and high-value results.
  • the signals can be obtained from existing training data and/or from additional metadata, rather than via a separate external process involving human raters.
  • a search ranking engine may be utilized as an intrinsic value signal for distinguishing two or more model-generated outputs (e.g., two or more large language model (LLM) responses).
  • the search ranking engine may rank web pages and/or other resources by evaluating the number and quality of links to a web page and/or other resource.
  • the search ranking engine may prioritize resources (e.g., resources associated with search results) that receive more links from other web pages and/or other resources.
  • When a user receives multiple model-generated outputs (e.g., LLM responses), the user may be presented with the answer which is most likely found in the data.
  • the systems and methods may prefer, and return to the user, the response that is associated more closely with a page that has a higher page rank score.
  • the underlying training/tuning documents may be generated by a generative model (e.g., fully or partially LLM-generated).
  • a user can have a content item (e.g., a document) generated using a generative model (e.g., an LLM) and may publish the content item (e.g., publish a document on the web, via an email, etc.).
  • the signals can then be fed back to fine-tune the generative model’s (e.g., the LLM’s) reward model and can guide the reward model towards creating content which is measurably higher quality on subsequent uses.
  • LLM cloud offerings can provide analytics about the value of the answers given to users.
  • the reward model (e.g., a value estimation model) and/or the generative model may be used for analyzing conversations and identifying average values a model generates.
  • when switching between models (e.g., a language model trained for multilinguality, reasoning, and/or coding versus a foundation model trained for providing access to machine-learned model capabilities), developers may observe a change in the score depending on the model capabilities to solve tasks asked by their user bases.
  • the systems and methods can perform task analytics and personalization for users based on their needs from the generative model (e.g., an LLM) they interact with.
  • Generative model cloud offerings may be able to provide, in addition to input/output token costs, task specific costs based on how much value these bring to the user.
  • the systems and methods may determine the skills of a user, which can then be leveraged to determine when to provide generative model aid and/or suggestions. For example, there may be users better skilled at math than at computer science. For these users, aid in solving computer science problems may bring higher value.
  • the experience-based generative model usage can be used in an interactive human-in-the-loop scenario, where the model proactively offers to solve subtasks according to their perceived value for a given user in a given context.
  • the systems and methods can include a pretraining procedure and/or decoding changes.
  • the text-based training datasets that are normally used for generative models (e.g., LLMs) can be annotated with the document quality signals.
  • the data can be extracted from various pipelines and can be extended to any other information quality signals that are available and used in information (e.g., content item) retrieval.
  • corpuses with (query, answer) pairs may benefit from quality metrics conditioned on the query.
  • a transformation can be applied (e.g., for page rank, a naive approach may be to take various connections to the extracted paragraph and citations to it instead of the entire document score). Additional signals may then be used here (e.g., whether the task being solved has some universal notion of how difficult the task is to solve (the difficulty may be self-inferred by the number of actions needed to solve it (or total compute used), which may be tied to a model capacity) or how much value the task brings (e.g., organizing the shopping cart for a user will have some intrinsic value)).
  • the model architecture may include a transformer encoder/decoder (and/or other variants).
  • the next step can include either using the reward model at training time (e.g., through reinforcement learning) and/or alternatively at inference time directly through a scoring mechanism.
  • the decoder can either take the trained reward model as a scoring signal that is applied during beam search (where the scoring may be applied not per-token but per-sentence, and/or at another chosen granularity) and/or may apply the scoring as a post-processing step in which the response is re-ranked using the scoring model.
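  • A sketch of the post-processing variant at sentence granularity (the sentence splitter and the stand-in reward function are assumptions):

```python
# Re-rank full candidate responses by the mean reward of their sentences,
# rather than scoring per-token during decoding.

def rerank_responses(responses, reward_score):
    def response_score(text):
        sentences = [s.strip() for s in text.split(".") if s.strip()]
        return sum(reward_score(s) for s in sentences) / max(len(sentences), 1)
    return sorted(responses, key=response_score, reverse=True)

responses = [
    "Water boils at 100 C at sea level. Altitude lowers the boiling point.",
    "Water boils when it gets hot. It just does.",
]
# Stand-in reward: here, longer and more specific sentences score higher.
print(rerank_responses(responses, reward_score=len)[0])
```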
  • interaction data (e.g., web traffic, time of viewing, search result page selection data, view trends, selection trends, sharing trends, etc.) can be utilized as quality signals.
  • the type of content and/or type of resource can be determined and utilized to determine which content items to prioritize during training and/or inference selection.
  • the use of the quality signals can reduce the number of training examples for tuning and/or training models to achieve a particular quality of generative model output, can increase the quality of generative model outputs, and can reduce the data transmission that is relied on during reinforcement learning from human feedback.
  • Another example technical effect and benefit relates to improved computational efficiency and improvements in the functioning of a computing system.
  • Figure 1 depicts a block diagram of an example content generation system 10 according to example embodiments of the present disclosure.
  • the content generation system 10 can be configured to receive a set of input data descriptive of a prompt 12 (for example providing the prompt 12 or providing data interpretable by the content generation system 10 to obtain the prompt 12) and, as a result of receipt of the input data, provide output data that includes a model-generated output 18.
  • the content generation system 10 can include a generative model 14 that is operable to process the prompt 12 and generate a plurality of token predictions for generating a content item.
  • the content generation system 10 can obtain a prompt 12.
  • the prompt 12 can include a text string descriptive of a query and/or question.
  • the prompt 12 may include multimodal data.
  • the prompt 12 can include one or more images.
  • a generative model 14 can process the prompt 12 to generate a model-generated output 18.
  • the generative model 14 may include a pre-trained language model, a pre-trained image generation model, a pre-trained audio generation model, and/or another model.
  • the generative model 14 may have been trained for next token prediction.
  • the plurality of quality signals can include a page ranking based on link data associated with the respective content item.
  • the page ranking may be based on the quantity and/or quality of incoming links and/or outgoing links.
  • the plurality of quality signals can include interaction data (e.g., view count, view time, view traffic data, view trends, selection trends, and/or other interaction data), the type of content (e.g., a blog, an academic paper, a social media post, a conversation, a to do list, a memoir, and/or other content type), the type of resource (e.g., a personal web page, a social media platform, a news website, a scholastic publisher, and/or other resource type), and/or other quality signals.
  • the plurality of quality signals may be utilized for model inference.
  • the model-generated output 18 can include a model-generated response to the prompt 12.
  • the model-generated output 18 can include text data, image data, audio data, latent encoding data, multimodal data, and/or other data.
  • the model-generated output 18 can include a plurality of predicted words, a plurality of predicted pixels, and/or a plurality of predicted audio signals.
  • the model-generated output 18 can be descriptive of a predicted answer to a question of the prompt 12.
  • the model-generated output 18 can include a natural language response, a generated image, a generated audio clip, and/or other generated data.
  • Figure 2 depicts a block diagram of an example generative model tuning system 200 according to example embodiments of the present disclosure.
  • the generative model tuning system 200 can include tuning a generative model 214 based on quality signals associated with content items of a training dataset 216.
  • the tuning can include leveraging a machine-learned reward model 220 to select the training example used for evaluating a generative model 214 output.
  • the generative model tuning system 200 can obtain a prompt 212.
  • the prompt 212 can include one or more inputs for the generative model 214.
  • the prompt 212 can include a query, a conversational question, a question about another input (e.g., an image), and/or other inputs.
  • the generative model 214 can process a prompt 212 to generate a model-generated output 218.
  • the generative model 214 can include a pre-trained model, which may include a large language model, a vision language model, an image generation model, an audio generation model, and/or other generative models.
  • the model-generated output 218 can include a predicted response to the prompt 212. The predicted response can be generated based on one or more learned sequences, one or more learned knowledge graphs, one or more learned representations, and/or one or more learned features for token prediction.
  • the model-generated output 218 can include text data, image data, audio data, structure data, latent encoding data, multimodal data, and/or other data.
  • the generative model tuning system 200 can then process the model-generated output 218 to determine a particular content item 222 from the training dataset 216 to leverage for evaluating the model-generated output 218.
  • the generative model tuning system 200 can determine a subset of the training dataset 216 is associated with the model-generated output 218.
  • the generative model tuning system 200 can then determine the one or more respective quality signals associated with a particular content item 222 are descriptive of a highest quality content item from the subset of the training dataset 216.
  • the particular content item 222 determination may be performed by a machine-learned reward model 220.
  • the machine-learned reward model 220 may have been trained on quality signals associated with one or more content items.
  • the reward model 220 may have been trained on the training dataset 216.
  • the reward model 220 may have been trained on interaction data associated with previously generated model-generated content items that were published on a web page and/or other web resource.
  • the particular content item 222 may be determined and/or selected without a machine-learned reward model 220.
  • a loss function 224 can then be evaluated based on the model-generated output 218 and the particular content item 222.
  • the evaluation may include comparing the model-generated output 218 and the particular content item 222 to determine a gradient.
  • the gradient can then be backpropagated to the generative model 214 to adjust one or more parameters of the generative model 214.
  • the gradient may incentivize the generative model 214 to generate outputs that more closely follow the sequences, reasoning, and/or logic of the particular content item 222 determined to be of high quality based on the respective quality signals.
  • the quality signals may be directly utilized to evaluate a loss function 224. Additionally and/or alternatively, the quality scores generated with the machine-learned reward model 220 may be utilized to evaluate the loss function 224.
  • the reward model-based content generation 300 can obtain a prompt 312.
  • the prompt 312 can include one or more inputs for the generative model 314.
  • the prompt 312 can include a query, a conversational question, a question about another input (e.g., an image), and/or other inputs.
  • the generative model 314 can process the prompt 312 to generate a plurality of candidate model-generated outputs 318, which can include a first candidate output, a second candidate output, and/or an nth candidate output.
  • the generative model 314 can include a pre-trained model, which may include a large language model, a vision language model, an image generation model, an audio generation model, and/or other generative models.
  • the plurality of candidate model-generated outputs 318 can include a plurality of candidate predicted responses to the prompt 312.
  • the plurality of candidate predicted responses can be generated based on one or more learned sequences, one or more learned knowledge graphs, one or more learned representations, and/or one or more learned features for token prediction.
  • the plurality of candidate model-generated outputs 318 can include text data, image data, audio data, structure data, latent encoding data, multimodal data, and/or other data.
  • Each of the plurality of candidate model-generated outputs 318 may be associated with content, style, sequence, and/or logic associated with a different content item from the training dataset 316.
  • the plurality of candidate model-generated outputs 318 can be processed with a machine-learned reward model 320 to determine a particular model-generated output 326 to provide as the output.
  • the reward model 320 can be leveraged to determine a quality score (and/or level) of each of the plurality of candidate model-generated outputs 318.
  • the particular model-generated output 326 may be determined based on the respective candidate model-generated output having a highest rank and/or highest quality score.
  • the machine-learned reward model 320 may have been trained on quality signals associated with one or more content items.
  • the reward model 320 may have been trained on the training dataset 316.
  • the reward model 320 may have been trained on interaction data associated with previously generated model-generated content items that were published on a web page and/or other web resource.
  • the plurality of candidate model-generated outputs 318 can include a plurality of candidate response fragments. The particular model-generated output 326 can then be fed back to the generative model 314 to generate a full model-generated response.
  • Figure 4 depicts a block diagram of an example reward model training system 400 according to example embodiments of the present disclosure.
  • the reward model training system 400 can be configured to process a content item 402 with a reward model 404 to generate a score 406 (and/or rank), which can then be utilized along with a respective quality signal 408 for the content item 402 to evaluate a loss function for reward model 404 training and/or tuning.
  • the reward model training system 400 can obtain a content item 402.
  • the content item 402 may be a pre-existing human-authored content item and/or a model-generated content item.
  • the content item 402 can include a training example from a training dataset.
  • the content item 402 can include an article, a blog, a short story, a conversation, a memoir, an academic paper, a social media post, a tutorial, instructions, an encyclopedia entry, and/or other content item.
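  • A minimal sketch of one training step of this loop (the architecture, feature extraction, and regression objective are illustrative assumptions):

```python
import torch
from torch import nn

# Score an encoded content item with the reward model, then penalize
# disagreement between the score and the item's quality signal.

reward_model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(reward_model.parameters(), lr=0.01)

item_features = torch.randn(16)       # stand-in for an encoded content item
quality_signal = torch.tensor([0.8])  # e.g., a normalized page-rank score

score = torch.sigmoid(reward_model(item_features))
loss = nn.functional.mse_loss(score, quality_signal)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```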
  • Figure 6 depicts a flow chart diagram of an example method to perform according to example embodiments of the present disclosure. Although Figure 6 depicts steps performed in a particular order for purposes of illustration and discussion, the methods of the present disclosure are not limited to the particularly illustrated order or arrangement. The various steps of the method 600 can be omitted, rearranged, combined, and/or adapted in various ways without deviating from the scope of the present disclosure.
  • a computing system can obtain a training dataset.
  • the training dataset can include a plurality of content items associated with a plurality of respective web resources and a plurality of quality scores associated with the plurality of respective web resources.
  • the plurality of quality scores can be determined based on incoming links and outgoing links for the plurality of respective web resources, interaction data, knowledge graph data, resource type, content type, and/or other quality signals.
  • the computing system can obtain a prompt.
  • the prompt may include a query.
  • the prompt may include an image and text descriptive of a question associated with the image.
  • the prompt may be a user-input prompt and/or an automatically generated prompt (e.g., a prompt generated based on a context).
  • Figure 10 depicts a flowchart of a method 1000 for training one or more machine-learned models according to aspects of the present disclosure.
  • an example machine-learned model can include a generative model (e.g., a large language model, a foundation model, a vision language model, an image generation model, a text-to-image model, an audio generation model, and/or other generative models) and/or a machine-learned reward model.
  • example method 1000 can include obtaining a training instance.
  • a set of training data can include a plurality of training instances divided between multiple datasets (e.g., a training dataset, a validation dataset, or a testing dataset).
  • a training instance can be labeled or unlabeled.
  • runtime inferences can form training instances when a model is trained using an evaluation of the model’s performance on that runtime instance (e.g., online training/learning).
  • Example data types for the training instance and various tasks associated therewith are described throughout the present disclosure.
  • example method 1000 can include processing, using one or more machine-learned models, the training instance to generate an output.
  • the output can be directly obtained from the one or more machine-learned models or can be a downstream result of a chain of processing operations that includes an output of the one or more machine-learned models.
  • example method 1000 can include updating the machine-learned model using the evaluation signal.
  • values for parameters of the machine-learned model(s) can be learned, in some embodiments, using various training or learning techniques, such as, for example, backwards propagation.
  • the evaluation signal can be backpropagated from the output (or another source of the evaluation signal) through the machine-learned model(s) to update one or more parameters of the model(s) (e.g., based on a gradient of the evaluation signal with respect to the parameter value(s)).
  • system(s) containing one or more machine-learned models can be trained in an end-to-end manner.
  • example method 1000 can be implemented for training a machine-learned model from an initialized state to a fully trained state (e.g., when the model exhibits a desired performance profile, such as based on accuracy, precision, recall, etc.).
  • example method 1000 can be implemented for particular stages of a training procedure.
  • example method 1000 can be implemented for pre-training a machine-learned model.
  • Pre-training can include, for instance, large-scale training over potentially noisy data to achieve a broad base of performance levels across a variety of tasks/data types.
  • example method 1000 can be implemented for fine-tuning a machine-learned model. Fine-tuning can include, for instance, smaller-scale training on higher-quality (e.g., labeled, curated, etc.) data. Fine-tuning can affect all or a portion of the parameters of a machine-learned model.
  • various portions of the machine-learned model can be “frozen” for certain training stages.
  • parameters associated with an embedding space can be “frozen” during fine-tuning (e.g., to retain information learned from a broader domain(s) than present in the fine-tuning dataset(s)).
  • An example fine-tuning approach includes reinforcement learning. Reinforcement learning can be based on user feedback on model performance during use.
  • the machine-learned models can include one or more generative models and/or one or more reward models.
  • the reward models can be trained and/or tuned to be utilized for generative model tuning and/or generative model inference guidance.
  • the systems and methods disclosed herein can include the utilization of intrinsic document quality signals (e.g., linking-based resource ranking and other metadata) in the inference process of a generative model (e.g., a large language model (LLM) foundation model).
  • a text-based training dataset of a generative model can include the intrinsic document quality signals (e.g., a number of page views) that can help to directly estimate the response value, which can enable better separation of low-value and high-value results.
  • the model can obtain the signals from existing training data along with the intrinsic document quality signals as an additional metadata (e.g., link-based resource ranking, user data-based resource ranking, number of page visits, number/quality of incoming links, etc.) rather than via a separate external process involving human raters.
  • the additional metadata can include reference-based page ranking, model-generated content interaction data, value analytics, and/or task analytics.
  • Reference-based page ranking can be used as an intrinsic value signal for distinguishing two or more machine-learned model responses (e.g., where a user receives responses from the machine-learned model that include fragments of web pages that inherently have an associated page rank score, and the response that is associated more closely with a page that has a higher page rank score may be preferred and returned to the user).
  • the reference-based resource ranking can be utilized for distinguishing two or more machine-learned model responses, where the underlying documents are model-generated (fully or partially) (e.g., the user has generated the document using the generative model and published the document (e.g., on the web, via an email, etc.), and the document quality signals can then be fed back to fine-tune the generative model’s reward model and guide the generative model towards creating content which is measurably higher quality on subsequent uses).
  • Machine-learned model cloud offerings that provide analytics about the value of the answers given to users may be leveraged for fine-tuning and/or inference (e.g., the reward model may assess answer quality without altering the language model’s responses, offering developers scores that reflect model performance and help compare capabilities when switching models).
  • the quality signals may include task analytics and personalization for users based on their needs from the generative model they interact with.
  • the reward model may prioritize responses associated with resources on topics in which the user lacks expertise.
  • the metadata can be extracted from various pipelines (e.g., various sources) and can be extended to other available information quality signals (e.g., utility function signals) used in information retrieval.
  • utility function signals can refer to the text in a how-to tutorial that provides inherent value by explaining how to solve a specific task, compared to a text that provides a discussion of a problem without reaching a solution.
  • the text-based training dataset can help to create a partial scoring model (e.g., a reward model used in the reinforcement learning from human feedback (RLHF) loop) that helps the machine-learned model to generate better quality results, where the annotation between the data and document quality signals can be established.
  • the model (e.g., the generative model and/or the reward model) can split the document into smaller units, translate the document-level signals to paragraph-level signals by preserving the document-level signal and applying the signal to the smaller units, apply the transformation for other metadata (e.g., page ranking) by using a naive approach that takes various connections to the paragraph extracted and citations to the paragraph instead of the entire document score, and incorporate other signals (e.g., assessing the intrinsic difficulty or value of the task being addressed (e.g., organizing a shopping cart for a user having some intrinsic value)).
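  • A sketch of the document-to-paragraph signal translation (the splitting rule and the per-paragraph citation adjustment are illustrative assumptions):

```python
# Split a document into paragraphs, carry the document-level score onto
# each unit, and nudge it by citations pointing at that specific paragraph.

def paragraph_signals(document_text, doc_score, paragraph_citations=None):
    paragraphs = [p for p in document_text.split("\n\n") if p.strip()]
    paragraph_citations = paragraph_citations or [0] * len(paragraphs)
    signals = []
    for text, citations in zip(paragraphs, paragraph_citations):
        # Preserve the document-level signal, adjusted per paragraph.
        signals.append({"text": text, "score": doc_score * (1 + 0.1 * citations)})
    return signals

doc = "Intro paragraph.\n\nKey result paragraph.\n\nClosing remarks."
for unit in paragraph_signals(doc, doc_score=0.7, paragraph_citations=[0, 3, 0]):
    print(round(unit["score"], 2), unit["text"])
```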
  • the dataset may be used to generate a transformer encoder/decoder based model architecture using a pre-trained machine-learned model.
  • the machine-learned model may utilize the reward model during the training time and/or may utilize a scoring mechanism directly at the inference stage.
  • the decoder may utilize the trained reward model as a scoring signal that is applied during beam search and/or may apply the signal as a post-processing step when the response is re-ranked using the scoring model.
  • the generative model and/or the reward model may match the response through an embedding look-up to parts of documents that include the responses and can then use an associated (partial) score for the generated (part of) response.
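  • A toy sketch of this look-up (a real system would use a learned text encoder; simple token-overlap similarity stands in for the embedding match here):

```python
# Match a generated response fragment to the closest indexed passage and
# reuse that passage's (partial) quality score.

def similarity(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)  # Jaccard overlap

passage_scores = {
    "Photosynthesis converts light into chemical energy.": 0.9,
    "Plants do something with light, probably.": 0.2,
}

def partial_score(response_fragment):
    best = max(passage_scores, key=lambda p: similarity(response_fragment, p))
    return passage_scores[best]

print(partial_score("Photosynthesis converts light energy into chemical energy."))
# -> 0.9 (the fragment matches the higher-quality passage)
```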
  • the reward model and/or the generative model can revise the learned score using a form of system-level feedback (e.g., the resource ranking and/or other signal data).
  • the reward model may better evaluate the responses based on their value, where the user is allowed to select the signal preference.
  • the reward model and/or the generative model may decide to stop if the remaining individual tasks have low value according to the reward model.
  • the one or more generative models can include one or more autoregressive models (e.g., a machine-learned model trained to generate predictive values based on previous behavior data) and/or one or more diffusion models (e.g., a machine-learned model trained to generate predicted data based on generating and processing distribution data associated with the input data).
  • the one or more generative models can be trained to process input data and generate model-generated content items.
  • the input data and/or model-generated content items may include a plurality of predicted words, pixels, image frames, signals, and/or other data.
  • the model-generated content items may include novel content items that are not the same as any pre-existing work.
  • the one or more generative models 90 can leverage learned representations, sequences, and/or probability distributions to generate the content items, which may include phrases, storylines, settings, objects, characters, beats, lyrics, and/or other aspects that are not included in pre-existing content items.
  • the one or more generative models may include a vision language model.
  • the vision language model can be trained, tuned, and/or configured to process image data and/or text data to generate a natural language output.
  • the vision language model may leverage a pre-trained large language model (e.g., a large autoregressive language model) with one or more encoders (e.g., one or more image encoders and/or one or more text encoders) to provide detailed natural language outputs that emulate natural language composed by a human.
  • the vision language model may leverage a pre-trained language model that may then be tuned for multimodality. Training and/or tuning of the vision language model can include image-text matching, masked-language modeling, multimodal fusing with cross attention, contrastive learning, prefix language model training, and/or other training techniques.
  • the vision language model may be trained to process an image to generate predicted text that is similar to ground truth text data (e.g., a ground truth caption for the image).
  • the vision language model may be trained to replace masked tokens of a natural language template with textual tokens descriptive of features depicted in an input image.
  • the training, tuning, and/or model inference may include multi-layer concatenation of visual and textual embedding features.
  • the vision language model may be trained and/or tuned via jointly learning image embedding and text embedding generation, which may include training and/or tuning a system to map embeddings to a joint feature embedding space that maps text features and image features into a shared embedding space.
  • the joint training may include image-text pair parallel embedding and/or may include triplet training.
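  • A hedged sketch of the joint-embedding training described above, using a symmetric contrastive objective over image-text pairs (the encoders, dimensions, and temperature are placeholders):

```python
import torch
import torch.nn.functional as F

image_encoder = torch.nn.Linear(64, 32)  # stand-in for a vision encoder
text_encoder = torch.nn.Linear(128, 32)  # stand-in for a text encoder

images = torch.randn(8, 64)
texts = torch.randn(8, 128)              # texts[i] describes images[i]

# Map both modalities into the shared embedding space.
img_emb = F.normalize(image_encoder(images), dim=-1)
txt_emb = F.normalize(text_encoder(texts), dim=-1)

# Similarity of every image to every text; matching pairs lie on the diagonal.
logits = img_emb @ txt_emb.t() / 0.07    # 0.07: assumed temperature
labels = torch.arange(8)
loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
loss.backward()
```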
  • the images may be utilized and/or processed as prefixes to the language model.
  • Although Figures 6-10 are directed to example methods, the methods disclosed herein can be implemented as operations stored in a non-transitory computer-readable medium that may be executed by one or more processors of a computing system. Moreover, systems, computer-readable media, and/or methods disclosed herein may be implemented additionally and/or alternatively via other devices, components, and/or mediums. Different implementations of the systems, computer-readable media, and/or methods disclosed herein may be compatible with one another.
  • Figure 11 is a block diagram of an example processing flow for using machine-learned model(s) 1 to process input(s) 2 to generate output(s) 3.
  • Example neural networks can include feed-forward neural networks, recurrent neural networks (RNNs), including long short-term memory (LSTM) based recurrent neural networks, convolutional neural networks (CNNs), diffusion models, generative-adversarial networks, or other forms of neural networks.
  • Example neural networks can be deep neural networks.
  • Some example machine-learned models can leverage an attention mechanism such as self-attention.
  • some example machine-learned models can include multi-headed self-attention models.
  • Machine-learned model(s) 1 can include a single or multiple instances of the same model configured to operate on data from input(s) 2.
  • Machine-learned model(s) 1 can include an ensemble of different models that can cooperatively interact to process data from input(s) 2.
  • machine-learned model(s) 1 can employ a mixture-of-experts structure. See, e.g., Zhou et al., Mixture-of-Experts with Expert Choice Routing, arXiv:2202.09368v2 (Oct. 14, 2022).
  • Example input data types can include assembly code data (e.g., low-level programming languages that use symbolic representations of machine code instructions to program a processing unit); genetic data or other chemical or biochemical data; image data, such as pixel values and/or video frames; audio data, audiovisual data, haptic data, and biometric data; medical data; financial data; statistical data; geographical data, astronomical data, and historical data; and sensor data generally (e.g., digital or analog values, such as voltage or other absolute or relative level measurement values from a real or artificial input, such as from an audio sensor, light sensor, displacement sensor, etc.), and the like.
  • Data can be raw or processed and can be in any format or schema.
  • Figure 12 is a block diagram of an example implementation of an example machine-learned model configured to process sequences of information.
  • an example implementation of machine-learned model(s) 1 can include machine-learned sequence processing model(s) 4.
  • An example system can pass input(s) 2 to sequence processing model(s) 4.
  • Sequence processing model(s) 4 can include one or more machine- learned components.
  • Sequence processing model(s) 4 can process the data from input(s) 2 to obtain an input sequence 5.
  • Input sequence 5 can include one or more input elements 5-1, 5-2, . . . , 5-M, etc. obtained from input(s) 2.
  • Sequence processing model 4 can process input sequence 5 using prediction layer(s) 6 to generate an output sequence 7.
  • Output sequence 7 can include one or more output elements 7-1, 7-2, . . . , 7-N, etc. generated based on input sequence 5.
  • the system can generate output(s) 3 based on output sequence 7.
  • Sequence processing model(s) 4 can include one or multiple machine-learned model components configured to ingest, generate, or otherwise reason over sequences of information.
  • some example sequence processing models in the text domain are referred to as “Large Language Models,” or LLMs. See, e.g., PaLM 2 Technical Report, GOOGLE, https://ai.google/static/documents/palm2techreport.pdf (n.d.).
  • Other example sequence processing models can operate in other domains, such as image domains, see, e.g., Dosovitskiy et al., An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ARXIV:2010.11929V2 (Jun. 3, 2021).
  • Kudo et al., SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing, PROCEEDINGS OF THE 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (System Demonstrations), pages 66-71 (October 31-November 4, 2018), https://aclanthology.org/D18-2012.pdf.
  • Image-based input source(s) can be tokenized by extracting and serializing patches from an image.
  • Prediction layer(s) 6 can predict one or more output elements 7-1, 7-2, . . . , 7-N based on the input elements.
  • Prediction layer(s) 6 can include one or more machine-learned model architectures, such as one or more layers of learned parameters that manipulate and transform the input(s) to extract higher-order meaning from, and relationships between, input element(s) 5-1, 5-2, . . . , 5-M. In this manner, for instance, example prediction layer(s) 6 can predict new output element(s) in view of the context provided by input sequence 5.
  • Prediction layer(s) 6 can evaluate associations between portions of input sequence 5 and a particular output element. These associations can inform a prediction of the likelihood that a particular output follows the input context. For example, consider the textual snippet, “The carpenter’s toolbox was small and heavy. It was full of ____.” Example prediction layer(s) 6 can identify that “It” refers back to “toolbox” by determining a relationship between the respective embeddings. Example prediction layer(s) 6 can also link “It” to the attributes of the toolbox, such as “small” and “heavy.” Based on these associations, prediction layer(s) 6 can, for instance, assign a higher probability to the word “nails” than to the word “sawdust” (a minimal sketch of this scoring follows this item).
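The following sketch is an illustration only, not part of the disclosure: it scores two hypothetical candidate words against a pooled context embedding and converts the scores to probabilities with a softmax. All embedding values are invented for illustration.

```python
# Minimal sketch (assumed, not from the disclosure) of assigning probabilities
# to candidate next tokens given associations with the input context.
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Toy embeddings: hypothetical 4-dim vectors for the context and candidates.
context = np.array([0.9, 0.1, 0.8, 0.2])        # pooled embedding of the snippet
candidates = {
    "nails":   np.array([1.0, 0.0, 0.7, 0.1]),  # close to the context embedding
    "sawdust": np.array([0.1, 0.9, 0.0, 0.8]),  # far from the context embedding
}

scores = {w: float(context @ v) for w, v in candidates.items()}
probs = softmax(np.array(list(scores.values())))
for word, p in zip(scores, probs):
    print(word, round(float(p), 3))  # "nails" receives the higher probability
```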
  • a transformer is an example architecture that can be used in prediction layer(s) 6. See, e.g., Vaswani et al., Attention Is All You Need, ARXIV:1706.03762V7 (Aug. 2, 2023).
  • a transformer is an example of a machine-learned model architecture that uses an attention mechanism to compute associations between items within a context window.
  • the context window can include a sequence that contains input sequence 5 and potentially one or more output element(s) 7-1, 7-2, . . . , 7-N.
  • a transformer block can include one or more attention layer(s) and one or more post-attention layer(s) (e.g., feedforward layer(s), such as a multi-layer perceptron); see the sketch after this item.
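A minimal transformer block sketch is given below, assuming a PyTorch implementation and a pre-norm layout; the disclosure does not prescribe a particular framework, layer arrangement, or dimensions.

```python
# A minimal transformer block sketch (assumptions: PyTorch, pre-norm layout).
import torch
from torch import nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4, d_ff: int = 256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # Post-attention feedforward layers (a multi-layer perceptron).
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                nn.Linear(d_ff, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        # Self-attention computes associations between items in the context window.
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        x = x + self.ff(self.norm2(x))
        return x

block = TransformerBlock()
print(block(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```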
  • Prediction layer(s) 6 can include other machine-learned model architectures in addition to or in lieu of transformer-based architectures. For example, recurrent neural networks (RNNs) and long short-term memory (LSTM) models can also be used, as well as convolutional neural networks (CNNs). In general, prediction layer(s) 6 can leverage various kinds of artificial neural networks that can understand or generate sequences of information.
  • Output sequence 7 can include or otherwise represent the same or different data types as input sequence 5. For instance, input sequence 5 can represent textual data, and output sequence 7 can represent textual data.
  • Output sequence 7 can also be generated non-autoregressively. For instance, multiple output elements of output sequence 7 can be predicted together without explicit sequential conditioning on each other. See, e.g., Saharia et al., Non-Autoregressive Machine Translation with Latent Alignments, ARXIV:2004.07437V3 (Nov. 16, 2020).
  • Output sequence 7 can include one or multiple portions or elements.
  • output sequence 7 can include multiple elements corresponding to multiple portions of a generated output sequence (e.g., a textual sentence, values of a discretized waveform, computer code, etc.).
  • output sequence 7 can include a single element associated with a classification output.
  • an output “vocabulary” can include a set of classes into which an input sequence is to be classified.
  • a vision transformer block can pass latent state information to a multilayer perceptron that outputs a likely class value associated with an input image.
  • elements 8-0, . . . , 8-9 can indicate particular locations within a multidimensional embedding space. Some elements can map to a set of discrete locations in the embedding space. For instance, elements that correspond to discrete members of a predetermined vocabulary of tokens can map to discrete locations in the embedding space that are associated with those tokens. Other elements can be continuously distributed across the embedding space. For instance, some datatypes can be broken down into continuously defined portions (e.g., image patches) that can be described using continuously distributed locations within the embedding space (the sketch after this item illustrates both kinds of mapping).
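The sketch below is an assumed illustration of the two kinds of mapping: a discrete token lookup table and a continuous linear projection of flattened image patches into the same embedding space. The dimensions and the vocabulary are hypothetical.

```python
# Sketch (assumed) of discrete vs. continuous mappings into an embedding space.
import torch
from torch import nn

d_embed = 32
vocab = {"dog": 0, "grass": 1, "toolbox": 2}      # hypothetical token vocabulary
token_table = nn.Embedding(len(vocab), d_embed)    # discrete location per token
patch_proj = nn.Linear(16 * 16 * 3, d_embed)       # continuous projection of patches

token_embedding = token_table(torch.tensor([vocab["dog"]]))  # fixed point in space
patch = torch.rand(1, 16 * 16 * 3)                            # a flattened 16x16 RGB patch
patch_embedding = patch_proj(patch)                           # continuously distributed
print(token_embedding.shape, patch_embedding.shape)  # both: torch.Size([1, 32])
```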
  • the expressive power of the embedding space may not be limited to meanings associated with any particular set of tokens or other building blocks.
  • a continuous embedding space can encode a spectrum of high-order information.
  • An individual piece of information (e.g., a token) can map to a particular point in that space: for instance, a token for the word “dog” can be projected to an embedded value that points to a particular location in the embedding space associated with canine-related information.
  • an image patch of an image of a dog on grass can also be projected into the embedding space.
  • the input value can be provided as a data type associated with an input modality and projected along with that input modality (e.g., the input value can be a textual task label that is embedded along with other textual data in the input; the input value can be a pixel-based representation of a task that is embedded along with other image data in the input; etc.).
  • the input value can be provided as a data type that differs from or is at least independent from other input(s). For instance, the input value represented by element 8-0 can be learned within a continuous embedding space.
  • the task is a generative task
  • machine-learned model(s) 1 can be configured to output content generated in view of input(s) 2.
  • input(s) 2 can be or otherwise represent data of one or more modalities that encodes context for generating additional content.
  • the task can be an instruction following task.
  • Machine-learned model(s) 1 can be configured to process input(s) 2 that represent instructions to perform a function and to generate output(s) 3 that advance a goal of satisfying the instruction function (e.g., at least a step of a multi-step procedure to perform the function).
  • Output(s) 3 can represent data of the same or of a different modality as input(s) 2.
  • input(s) 2 can represent textual data (e.g., natural language instructions for a task to be performed) and machine-learned model(s) 1 can process input(s) 2 to generate output(s) 3 that represent textual data responsive to the instructions (e.g., natural language responses, programming language responses, machine language responses, etc.).
  • Input(s) 2 can represent image data (e.g., image-based instructions for a task to be performed, optionally accompanied by textual instructions) and machine-learned model(s) 1 can process input(s) 2 to generate output(s) 3 that represent textual data responsive to the instructions (e.g., natural language responses, programming language responses, machine language responses, etc.).
  • the task can be an audio generation task.
  • Machine- learned model(s) 1 can be configured to process input(s) 2 that represent context regarding a desired portion of audio content.
  • the context can include text data, image data, audio data, etc.
  • Machine-learned model(s) 1 can be configured to generate output(s) 3 that represent audio data related to the context.
  • machine-learned model(s) 1 can be configured to generate waveform data in the form of an image (e.g., a spectrogram). Values for channel(s) associated with pixels of the image can be selected based on the context.
  • Machine- learned model(s) 1 can be configured to generate waveform data in the form of a sequence of discrete samples of a continuous waveform. Values of the sequence can be selected based on the context (e.g., based on a probability determined based on the context).
  • Figure 17 is a block diagram of an example networked computing system that can perform aspects of example implementations of the present disclosure.
  • the system can include a number of computing devices and systems that are communicatively coupled over a network 49.
  • An example computing device 50 is described to provide an example of a computing device that can perform any aspect of the present disclosure (e.g., implementing model host 31, client(s) 32, or both).
  • An example server computing system 60 is described as an example of a server computing system that can perform any aspect of the present disclosure (e.g., implementing model host 31, client(s) 32, or both).
  • Server computing system 60 can store or otherwise include one or more machine-learned models 65.
  • Machine-learned model(s) 65 can be the same as or different from machine-learned model(s) 55.
  • Machine-learned models 65 can include one or more machine-learned model(s) 1, such as a sequence processing model 4.
  • Machine-learned models 65 can include one or multiple model instance(s) 31-1.
  • Machine-learned model(s) 65 can be received from computing device 50, model development platform system 70, third-party system(s) 80, or developed locally on server computing system(s) 60.
  • Machine-learned model(s) 65 can be loaded into memory 62 and used or otherwise implemented by processor(s) 61.
  • Server computing system(s) 60 can implement multiple parallel instances of machine-learned model(s) 65.
  • machine-learned models 65 can be included in or otherwise stored and implemented by server computing system 60 to establish a client-server relationship with computing device 50 for serving model inferences.
  • server computing system(s) 60 can implement model host 31 on behalf of client(s) 32 on computing device 50.
  • machine-learned models 65 can be implemented by server computing system 60 as a portion of a web service (e.g., remote machine-learned model hosting service, such as an online interface for performing machine-learned model operations over a network on server computing system(s) 60).
  • server computing system(s) 60 can communicate with computing device 50 over a local intranet or internet connection.
  • Third-party system(s) 80 can include one or more processors 81 and a memory 82.
  • Processor(s) 81 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, an FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected.
  • Memory 82 can include one or more non-transitory computer-readable storage media, such as HBM, RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof.
  • Memory 82 can store data 83 and instructions 84 which can be executed by processor(s) 81 to cause third-party system(s) 80 to perform operations.
  • the operations can implement any one or multiple features described herein.
  • the operations can implement example methods and techniques described herein.
  • Example operations include the functionality described herein with respect to tools and other external resources called when training or performing inference with machine-learned model(s) 1, 4, 16, 20, 55, 65, etc. (e.g., third-party resource(s) 85).
  • Figure 17 illustrates one example arrangement of computing systems that can be used to implement the present disclosure.
  • computing system 50 or server computing system(s) 60 can implement all or a portion of the operations of model development platform system 70.
  • computing system 50 or server computing system(s) 60 can implement developer tool(s) 75 (or extensions thereof) to develop, update/train, or refine machine-learned models 1, 4, 16, 20, 55, 65, etc. using one or more techniques described herein with respect to model alignment toolkit 17.
  • computing system 50 or server computing system(s) 60 can develop, update/train, or refine machine-learned models based on local datasets (e.g., for model personalization/customization, as permitted by user data preference selections).
  • the central intelligence layer can include a number of machine-learned models. For example, as illustrated in Figure 19, a respective machine-learned model can be provided for each application and managed by the central intelligence layer. In other implementations, two or more applications can share a single machine-learned model. For example, in some implementations, the central intelligence layer can provide a single model for all of the applications. In some implementations, the central intelligence layer is included within or otherwise implemented by an operating system of computing device 99.

Abstract

Systems and methods for model tuning and inference can include leveraging quality signals to determine which resources to prioritize during training and inference. The systems and methods can include obtaining a training dataset that includes a plurality of content items associated with a plurality of resources. The systems and methods can determine a plurality of quality scores for the plurality of resources. The systems and methods may then tune the parameters of a generative model based on the plurality of content items and the plurality of quality scores. The plurality of quality scores may be leveraged to train a reward model to be utilized for model-generated output selection.

Description

GENERATIVE MODEL TUNING AND INFERENCE UTILIZING QUALITY SIGNALS
FIELD
[0001] The present disclosure relates generally to machine learning processes and machine-learned devices and systems. More particularly, the present disclosure relates to leveraging quality signals associated with training resources to prioritize particular resources during training and model inferences, which may include training and leveraging a machine-learned reward model.
BACKGROUND
[0002] A computer can receive input(s). The computer can execute instructions to process the input(s) to generate output(s) using a parameterized model. The computer can obtain feedback on its performance in generating the outputs with the model. The computer can generate feedback by evaluating its performance. The computer can receive feedback from an external source. The computer can update parameters of the model based on the feedback to improve its performance. In this manner, the computer can iteratively “learn” to generate the desired outputs. The resulting model is often referred to as a machine-learned model.
[0003] Foundation models can be used for a wide variety of tasks. The general foundation models can generate outputs for a plurality of different tasks; however, the outputs may be of low quality. The foundation models can include generative models that may generate outputs via next token prediction. The generative models may be trained on training datasets that include variance in the quality of training examples.
[0004] Reinforcement learning from human feedback can be utilized to further train and/or tune a model by asking humans to rate preferred responses. Reinforcement learning from human feedback can increase perceived quality; however, the training loop may be dependent on requesting feedback, which may further propagate human biases.
SUMMARY
[0005] Aspects and advantages of embodiments of the present disclosure will be set forth in part in the following description, or can be learned from the description, or can be learned through practice of the embodiments.
[0006] One example aspect of the present disclosure is directed to a computer-implemented method. The method can include obtaining, by a computing system including one or more processors, a content dataset. The content dataset can include a plurality of content items associated with a plurality of respective resources and a plurality of quality scores associated with the plurality of respective resources. The plurality of quality scores can be descriptive of a quality of a respective resource as a search result. The method can include processing, by the computing system, a prompt with a generative model to generate a plurality of candidate model-generated responses. In some implementations, the prompt can include a request for information. The method can include determining, by the computing system and based on the content dataset, a subset of the content items of the plurality of content items that are associated with at least a subset of the plurality of candidate model-generated responses. The method can include determining, by the computing system and based on a set of respective quality scores associated with the subset of the content items, a particular model-generated response of the plurality of candidate model-generated responses to provide as an output. The method can include providing, by the computing system, the particular model-generated response as an output response to the prompt.
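A minimal sketch of the selection step of this method is shown below. The matching function, data structures, and quality scores are hypothetical placeholders; in practice, the association between a candidate response and a content item may be determined by sequence similarity or embedding look-up, as described elsewhere herein.

```python
# Hedged sketch: pick the candidate response whose associated content item
# belongs to the highest-quality resource. Names and scoring are illustrative.
def select_response(candidates, content_items, quality_scores, match_fn):
    """Return the candidate backed by the highest-quality content item."""
    best, best_score = None, float("-inf")
    for response in candidates:
        for item_id, item_text in content_items.items():
            if match_fn(response, item_text):     # candidate associated with item
                score = quality_scores[item_id]   # quality of the item's resource
                if score > best_score:
                    best, best_score = response, score
    return best

content_items = {"page_a": "nails are stored in toolboxes",
                 "page_b": "sawdust fills workshops"}
quality_scores = {"page_a": 0.92, "page_b": 0.31}  # e.g., link-derived scores
candidates = ["It was full of nails.", "It was full of sawdust."]
match = lambda resp, text: any(w.strip(".,").lower() in text for w in resp.split())
print(select_response(candidates, content_items, quality_scores, match))
```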
[0007] In some implementations, determining the particular model-generated response of the plurality of candidate model-generated responses to provide as an output can include determining, by the computing system and based on the content dataset, the particular model-generated response of the plurality of candidate model-generated responses is associated with a respective content item of the subset of content items with a respective quality score greater than the other quality scores of the set of respective quality scores associated with other content items of the subset of content items associated with a set of other candidate model-generated responses. The content dataset can include a plurality of pre-existing content items published on the internet. The generative model can include a pretrained autoregressive language model. The particular model-generated response can include a natural language response to the prompt.
[0008] In some implementations, the method can include selecting, by the computing system, a set of the content items for inclusion in a training dataset based on the plurality of quality scores; training, by the computing system, a machine-learned reward model using the training dataset; and storing, by the computing system, the machine-learned reward model and the generative model. The method can include obtaining, by the computing system, a plurality of interaction datasets associated with a plurality of additional resources. In some implementations, the plurality of additional resources can include a plurality of model-generated content items that were previously generated with the generative model. The plurality of interaction datasets can be descriptive of respective interactions with the plurality of additional resources by a plurality of users. The method can include training, by the computing system, the machine-learned reward model based on the plurality of interaction datasets and the plurality of model-generated content items.
[0009] In some implementations, the method can include processing, by the computing system, a second prompt with the generative model to generate a plurality of model-generated fragments. The plurality of model-generated fragments can include a plurality of different candidate responses to the second prompt. The method can include processing, by the computing system, the plurality of model-generated fragments with the machine-learned reward model to generate a plurality of respective scores. The plurality of respective scores can be associated with evaluating a quality of the plurality of model-generated fragments. The method can include providing, by the computing system, a particular model-generated fragment of the plurality of model-generated fragments for display based on the plurality of respective scores.
[0010] In some implementations, the plurality of quality scores may have been determined by processing the plurality of content items and plurality of respective metadata sets associated with the plurality of respective resources with a ranking engine to generate a plurality of ranking scores. The ranking engine can be associated with a search engine. The ranking engine can be configured to rank resources to determine particular resources to provide as search results. In some implementations, the plurality of quality scores may have been determined based on incoming links and outgoing links for the plurality of respective resources. The prompt may include multimodal data. The multimodal data can include an image and text descriptive of a question associated with the image. The particular model-generated response may include a predicted answer responsive to the question that is based on one or more image features of the image. In some implementations, the particular model-generated response may include an augmented image. The augmented image may include one or more annotations responsive to the question. The plurality of quality scores may have been determined based on a quantity and quality of incoming links for the plurality of respective resources.
[0011] Another example aspect of the present disclosure is directed to a computing system for parameter adjustment. The system can include one or more processors and one or more non-transitory computer-readable media that collectively store instructions that, when executed by the one or more processors, cause the computing system to perform operations. The operations can include obtaining a training dataset. The training dataset can include a plurality of content items associated with a plurality of respective web resources and a plurality of quality scores associated with the plurality of respective web resources. In some implementations, the plurality of quality scores can be determined based on incoming links and outgoing links for the plurality of respective web resources. The operations can include processing a prompt with a generative model to generate a plurality of probabilities associated with a plurality of candidate model outputs. The operations can include determining a first ground truth example from the training dataset is associated with a first candidate model output of the plurality of candidate model outputs and a second ground truth example from the training dataset is associated with a second candidate model output of the plurality of candidate model outputs. The operations can include determining the first ground truth example is associated with a first web resource with a higher quality score than a second web resource associated with the second ground truth example. The operations can include evaluating a loss function that evaluates a difference between a first probability associated with the first candidate model output and a second probability associated with the second candidate model output and adjusting one or more parameters of the generative model based on the loss function.
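One plausible form of such an objective is sketched below as a pairwise logistic preference loss; the disclosure does not fix a specific functional form, so the loss and the index choices are assumptions for illustration.

```python
# Hedged sketch of the pairwise tuning objective described above: push the
# probability of the output tied to the higher-quality resource above the
# probability of the output tied to the lower-quality one. (Assumed form.)
import torch
import torch.nn.functional as F

logits = torch.tensor([1.2, 2.0, 0.3], requires_grad=True)  # 3 candidate outputs
probs = F.softmax(logits, dim=-1)

hi_idx, lo_idx = 0, 1  # candidate 0 matched the higher-quality ground truth
# Preference loss: -log sigmoid(p_hi - p_lo); minimized when the candidate
# tied to the higher-quality resource receives the larger probability.
loss = -F.logsigmoid(probs[hi_idx] - probs[lo_idx])
loss.backward()
print(float(loss), logits.grad)  # gradients usable for parameter adjustment
```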
[0012] In some implementations, the operations can include obtaining input data. The input data can be descriptive of a user prompt. The operations can include processing the input data with the generative model to generate a model-generated response. The model-generated response can be responsive to the user prompt. The operations can include providing the model-generated response as an output. In some implementations, the user prompt can include a natural language question. The model-generated response can include a plurality of predicted words responsive to the question. The model-generated response can include a sequence of words that differs from the plurality of content items. In some implementations, the user prompt can include multimodal data. The user prompt can include an image and a question associated with the image. The plurality of quality scores can be determined based on an amount of references to a respective web resource within other resources. In some implementations, the plurality of quality scores can be determined based on how the respective web resource is referenced.
[0013] Another example aspect of the present disclosure is directed to one or more non-transitory computer-readable media that collectively store instructions that, when executed by one or more computing devices, cause the one or more computing devices to perform operations. The operations can include obtaining a training dataset. The training dataset can include a plurality of content items associated with a plurality of respective web resources and a plurality of quality scores associated with the plurality of respective web resources. In some implementations, the plurality of quality scores can be determined based on incoming links and outgoing links for the plurality of respective web resources. The operations can include training a machine-learned reward model on the training dataset. The machine-learned reward model can be trained to rank a set of data based on a determined quality score. The operations can include obtaining a prompt and processing the prompt with a generative model to generate a plurality of model-generated fragments. The plurality of model-generated fragments can include a plurality of different candidate responses to the prompt. The operations can include processing the plurality of model-generated fragments with the machine-learned reward model to generate a plurality of respective scores. In some implementations, the plurality of respective scores can be associated with evaluating a quality of the plurality of model-generated fragments. The operations can include providing a particular model-generated fragment of the plurality of model-generated fragments as an output based on the plurality of respective scores.
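The fragment-scoring and selection operations might look like the following sketch, in which the machine-learned reward model is stubbed out as a simple callable; a real implementation would score fragments with the trained reward model.

```python
# Hedged sketch (assumed interfaces): score each candidate fragment with a
# reward model and return the highest-scoring fragment.
from typing import Callable, List

def pick_fragment(fragments: List[str], reward_model: Callable[[str], float]) -> str:
    scores = [reward_model(f) for f in fragments]  # quality score per fragment
    return fragments[max(range(len(scores)), key=scores.__getitem__)]

# Hypothetical stand-in reward model: favors fragments naming a concrete step.
toy_reward = lambda f: 1.0 if "step" in f else 0.2
print(pick_fragment(["First step: preheat the oven.",
                     "Ovens are discussed widely."], toy_reward))
```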
[0014] In some implementations, the operations can include obtaining a plurality of interaction datasets associated with a plurality of additional resources. The plurality of additional resources can include a plurality of model-generated content items that were previously generated with the generative model. In some implementations, the plurality of interaction datasets can be descriptive of respective interactions with the plurality of additional resources by a plurality of users. The operations can include adjusting one or more parameters of the machine-learned reward model based on the plurality of interaction datasets and the plurality of model-generated content items.
[0015] In some implementations, providing the particular model-generated fragment of the plurality of model-generated fragments as the output based on the plurality of respective scores can include determining the particular model-generated fragment is associated with a particular respective score that is higher than a plurality of other respective scores. The operations can include processing the prompt with an additional generative model to generate a plurality of additional model-generated fragments, processing the plurality of additional model-generated fragments with the machine-learned reward model to generate a plurality of additional scores, determining the additional generative model is associated with a task of the prompt based on the plurality of respective scores and the plurality of additional scores, and generating a long-form model-generated response with the additional generative model.
[0016] Another example aspect of the present disclosure is directed to a computer- implemented method for tuning a machine-learned model. The method can include obtaining, by a computing system including one or more processors, a training dataset. The training dataset can include a plurality of content items associated with a plurality of respective resources. The method can include determining, by the computing system, a plurality of quality scores associated with the plurality of respective resources. In some implementations, the plurality of quality scores can be determined based on incoming links and outgoing links for the plurality of respective resources. The method can include obtaining, by the computing system, a prompt and processing, by the computing system, the prompt with a generative model to generate a model-generated output responsive to the prompt. The method can include determining, by the computing system, a ground truth example from the training dataset based on the plurality of quality scores and evaluating, by the computing system, a loss function that evaluates a difference between the model-generated output and the ground truth example. The method can include adjusting, by the computing system, one or more parameters of the generative model based on the loss function.
[0017] In some implementations, determining, by the computing system, the ground truth example from the training dataset based on the plurality of quality scores can include determining, by the computing system, a first ground truth example from the training dataset is associated with the model-generated output and determining, by the computing system, a second ground truth example from the training dataset is associated with the model-generated output. Determining, by the computing system, the ground truth example from the training dataset based on the plurality of quality scores can include determining, by the computing system, a first score of the plurality of quality scores is associated with the first ground truth example and determining, by the computing system, a second score of the plurality of quality scores is associated with the second ground truth example. In some implementations, determining, by the computing system, the ground truth example from the training dataset based on the plurality of quality scores can include determining, by the computing system, the ground truth example from the first ground truth example and the second ground truth example based on the first score and the second score.
[0018] In some implementations, the method can include obtaining, by the computing system, a plurality of interaction datasets associated with a plurality of additional resources. The plurality of additional resources can include a plurality of model-generated content items that were previously generated with the generative model. The plurality of interaction datasets can be descriptive of respective interactions with the plurality of additional resources by a plurality of users. The method can include training, by the computing system, a machine-learned reward model based on the plurality of interaction datasets and the plurality of model-generated content items and storing, by the computing system, the machine-learned reward model and the generative model. The reward model can be further trained on the training dataset and the plurality of scores.
[0019] In some implementations, the method can include processing, by the computing system, a second prompt with the generative model to generate a plurality of model-generated fragments. The plurality of model-generated fragments can include a plurality of different candidate responses to the second prompt. The method can include processing, by the computing system, the plurality of model-generated fragments with the machine-learned reward model to generate a plurality of respective scores. The plurality of respective scores can be associated with evaluating a quality of the plurality of model-generated fragments. The method can include providing, by the computing system, a particular model-generated fragment of the plurality of model-generated fragments as an output based on the plurality of respective scores.
[0020] In some implementations, the method can include obtaining, by the computing system, an additional prompt and processing, by the computing system, the additional prompt with the generative model to generate an additional model-generated output. The method can include processing, by the computing system, the model-generated output with the machine-learned reward model to generate an output score and adjusting, by the computing system, one or more parameters of the generative model based on the output score.
[0021] In some implementations, the generative model can include a pretrained autoregressive language model. The generative model may have been pretrained on a plurality of textual content items associated with the plurality of respective resources. In some implementations, the generative model can include an image generation model. The image generation model may have been trained for generation based on a plurality of images from the plurality of content items. The image generation model can include a diffusion model.
[0022] Other aspects of the present disclosure are directed to various systems, apparatuses, non-transitory computer-readable media, user interfaces, and electronic devices.
[0023] These and other features, aspects, and advantages of various embodiments of the present disclosure will become better understood with reference to the following description and appended claims. The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate example embodiments of the present disclosure and, together with the description, serve to explain the related principles.
BRIEF DESCRIPTION OF THE DRAWINGS
[0024] Detailed discussion of embodiments directed to one of ordinary skill in the art is set forth in the specification, which makes reference to the appended figures, in which:
[0025] Figure 1 depicts a block diagram of an example content generation system according to example embodiments of the present disclosure.
[0026] Figure 2 depicts a block diagram of an example generative model tuning system according to example embodiments of the present disclosure.
[0027] Figure 3 depicts a block diagram of an example reward model-based content generation according to example embodiments of the present disclosure.
[0028] Figure 4 depicts a block diagram of an example reward model training system according to example embodiments of the present disclosure.
[0029] Figure 5 depicts a block diagram of an example generative model selection system according to example embodiments of the present disclosure.
[0030] Figure 6 depicts a flow chart diagram of an example method to perform generative model inference based on quality signals according to example embodiments of the present disclosure.
[0031] Figure 7 depicts a flow chart diagram of an example method to perform resource-based tuning according to example embodiments of the present disclosure.
[0032] Figure 8 depicts a flow chart diagram of an example method to perform model inference according to example embodiments of the present disclosure.
[0033] Figure 9 depicts a flow chart diagram of an example method to perform generative model tuning according to example embodiments of the present disclosure.
[0034] Figure 10 is a flow chart diagram illustrating an example method for training a machine-learned model according to example implementations of aspects of the present disclosure.
[0035] Figure 11 is a block diagram of an example processing flow for using machine-learned model(s) to process input(s) to generate output(s) according to example implementations of aspects of the present disclosure.
[0036] Figure 12 is a block diagram of an example sequence processing model according to example implementations of aspects of the present disclosure.
[0037] Figure 13 is a block diagram of an example technique for populating an example input sequence for processing by a sequence processing model according to example implementations of aspects of the present disclosure.
[0038] Figure 14 is a block diagram of an example model development platform according to example implementations of aspects of the present disclosure.
[0039] Figure 15 is a block diagram of an example training workflow for training a machine-learned model according to example implementations of aspects of the present disclosure.
[0040] Figure 16 is a block diagram of an inference system for operating one or more machine-learned model(s) to perform inference according to example implementations of aspects of the present disclosure.
[0041] Figure 17 is a block diagram of an example networked computing system according to example implementations of aspects of the present disclosure.
[0042] Figure 18 is a block diagram of an example computing device according to example implementations of aspects of the present disclosure.
[0043] Figure 19 is a block diagram of an example computing device according to example implementations of aspects of the present disclosure.
[0044] Reference numerals that are repeated across plural figures are intended to identify the same features in various implementations.
DETAILED DESCRIPTION
[0045] Generally, the present disclosure is directed to systems and methods for generative model tuning and/or generative model output generation based on signals associated with resources of a training dataset. The signals can include quality signals that may include a resource rank (e.g., a page rank based on a number and/or quality of links to and/or from the specific resource), interaction data (e.g., view traffic, view times, selection instances, etc.), content type, and/or resource type (e.g., scholastic versus social media). In particular, the systems and methods disclosed herein can leverage the quality signals for tuning the generative model, selecting a particular candidate output, and/or model selection. The systems and methods may train and/or utilize an additional machine-learned model (e.g., a machine-learned reward model). For example, the systems and methods may leverage the additional machine-learned model (e.g., the machine-learned reward model) to determine particular training examples to utilize for a particular training instance, to determine which candidate model-generated output to select based on a determined quality, and/or to determine which generative model to utilize for a particular data generation task.
[0046] The systems and methods can obtain a training dataset. The training dataset can include a plurality of content items associated with a plurality of respective resources. The systems and methods can determine a plurality of quality scores associated with the plurality of respective resources. The plurality of quality scores can be determined based on incoming links and outgoing links for the plurality of respective web resources. The plurality of quality scores may be obtained through large-scale processing of the corpus where relationships between documents are inferred (e.g., processing which may include (1) incoming/outgoing links to respective web resources, (2) information present in multiple training examples, (3) meta-quality signals present in the training corpus, such as information about one piece of information being incorrect and/or another being correct, and/or (4) human action present within the corpus, such as content being identified as shared, etc.). The systems and methods can then obtain a prompt. The systems and methods can process the prompt with a generative model to generate a model-generated output responsive to the prompt. The systems and methods can determine a ground truth example from the training dataset based on the plurality of quality scores. The systems and methods may then evaluate a loss function that evaluates a difference between the model-generated output and the ground truth example and adjust one or more parameters of the generative model based on the loss function.
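As an illustration of deriving quality scores from incoming and outgoing links, the sketch below computes a simplified link-based score. The formula is an assumption standing in for a full page-rank-style computation.

```python
# Hedged sketch: derive per-resource quality scores from link structure.
def link_quality(links):
    """links: dict mapping resource -> set of resources it links to."""
    incoming = {r: 0 for r in links}
    for src, dests in links.items():
        for d in dests:
            if d in incoming:
                incoming[d] += 1
    # More incoming than outgoing links -> higher score (assumed formula).
    return {r: incoming[r] / (1 + len(links[r])) for r in links}

links = {"a": {"b"}, "b": set(), "c": {"b", "a"}}
print(link_quality(links))  # "b" scores highest: 2 incoming, 0 outgoing
```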
[0047] Alternatively and/or additionally, the systems and methods may leverage the quality signals for model inference. For example, the generative model may process the prompt to generate a plurality of candidate model-generated outputs. One or more first candidate model-generated outputs may be associated with a first content item of a first resource, and one or more second candidate model-generated outputs may be associated with a second content item of a second resource. The association may be determined based on the sequence prediction being similar to a sequence within the respective content item. The first content item may be associated with a first quality score, and the second content item may be associated with a second quality score. The systems and methods can determine the first score is descriptive of a higher quality resource than the resource associated with the second score. Based on the first score being associated with a higher quality than the second score, the systems and methods may select the one or more first candidate model-generated outputs as the model-generated response to the prompt. In some implementations, the plurality of candidate model-generated outputs may include candidate response fragments. The selected candidate response fragment can then be fed back into the generative model to generate a full-length model-generated response.
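The two-stage decoding described above might be structured as in the following sketch; the model interface and the quality function are hypothetical placeholders.

```python
# Hedged sketch: generate short candidate fragments, keep the fragment whose
# quality signal is highest, then condition the full response on it.
class ToyModel:
    def generate(self, prompt, n, max_tokens):
        # Placeholder decoding; a real generative model would sample here.
        return [f"{prompt} (candidate {i})" for i in range(n)]

def respond(prompt, model, quality_of):
    fragments = model.generate(prompt, n=4, max_tokens=16)  # cheap short candidates
    best = max(fragments, key=quality_of)                   # quality-signal selection
    return model.generate(prompt + " " + best, n=1, max_tokens=256)[0]

quality = lambda frag: 1.0 if "candidate 2" in frag else 0.1  # stand-in score
print(respond("Why is the sky blue?", ToyModel(), quality))
```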
[0048] In some implementations, the systems and methods can include obtaining an additional machine-learned model (e.g., a machine-learned reward model) that may be part of and/or separate from the generative model. The additional machine-learned model (e.g., the machine-learned reward model) may be trained on the training dataset and the plurality of quality scores to train the machine-learned reward model to generate quality scores for the model-generated outputs to identify candidate model-generated outputs that are of high quality. Additionally and/or alternatively, the machine-learned reward model may be trained and/or tuned on interaction data associated with previously generated model-generated outputs. For example, the view data, web traffic, reviews, and/or other interaction data associated with previously generated model-generated outputs can be leveraged to train and/or tune the machine-learned reward model to identify which model-generated outputs are of high quality and/or more likely to receive interactions. The trained machine-learned reward model can then be leveraged for generative model tuning and/or for candidate model-generated output selection. In some implementations, the machine-learned reward model can be utilized to determine a particular generative model of a plurality of candidate generative models to utilize for a particular task. The machine-learned reward model can include one or more models that output pointwise or pairwise scores. The machine-learned reward model can be implemented via a plurality of different architectures, sizes, weights, and/or configurations.
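The interaction-data training described here could, for example, be framed as regressing reward-model scores toward a normalized interaction signal (e.g., page views), as in the sketch below; the model size, featurization, and loss are assumptions for illustration.

```python
# Hedged sketch: tune a small reward model on interaction signals attached
# to previously generated content. Shapes and features are illustrative.
import torch
from torch import nn

reward_model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

features = torch.randn(32, 8)  # stand-in embeddings of generated content items
views = torch.rand(32, 1)      # normalized interaction signal (e.g., page views)

for _ in range(100):
    opt.zero_grad()
    # Regress predicted quality toward the interaction-derived signal.
    loss = nn.functional.mse_loss(reward_model(features), views)
    loss.backward()
    opt.step()
print(float(loss))
```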
[0049] Large foundation models can be used for various tasks, such as creative writing, summarization, and coding. The foundation models may be utilized in increasingly complex tasks where the overall goal might involve solving multiple subtasks in order to fulfill a complex query from a user.
[0050] The output of such models may be obtained through a decoding process (e.g., beam search, greedy decoding) that takes into account the likelihood of the next token, trained in an unsupervised manner from a large corpus of content available (e.g., on the web). However, the training data may vary widely in quality, which can lead to the models sometimes generating low quality outputs.
[0051] Approaches such as reinforcement learning from a human feedback loop may address the low quality problem by asking humans to rate preferred responses. However, reinforcement learning from human feedback (RLHF) may tie the decoding to the perceived quality and not the inherent quality of the content or the actual value created (e.g., how often was a document viewed, how authoritative is it, etc.). The systems and methods disclosed herein can leverage quality signals such as page rank and/or intrinsic task value.
[0052] The systems and methods disclosed herein can leverage quality signals (e.g., reference data, interaction data, and/or type classification) to tune and/or guide a generative model to generate high quality outputs, which may include solving multiple subtasks in order to fulfill a complex query. The quality signals can include determined quality scores, which may be determined by evaluating a quantity and/or quality of incoming and outgoing references to a resource associated with a content item training example. For example, resources with more incoming links than outgoing links may have a higher determined quality score than resources with fewer incoming links than outgoing links. Additionally and/or alternatively, citations in academic papers and/or explicit references may be weighted more heavily than a link at the bottom of a web page.
[0053] The use of the quality signals can eliminate and/or mitigate the reliance on additional training data. In particular, a smaller training dataset can be leveraged for tuning and/or model inference at the same and/or better quality than much larger datasets. Additionally and/or alternatively, a particular model-generated fragment of a plurality of candidate model-generated fragments can be determined before a full response is generated, which may then be leveraged for generating the full response to reduce the computational cost of generating a plurality of full responses.
[0054] The systems and methods can be utilized for large language models (e.g., an autoregressive language model), image generation models (e.g., text-to-image generation models, such as a diffusion model), audio generation models (e.g., a song generation model), multimodal data generation models, and/or other generative models. The systems and methods can be utilized to tune pre-trained generative models, to determine which of a plurality of model-generated datasets to utilize as the output, and/or to determine which generative model to utilize.
[0055] Large language models can be trained on enormous datasets where the loss function can optimize the next token prediction and/or variations of the next token prediction, such as corrupting text and predicting the corruption gaps. After the initial pretraining, task-specific fine-tuning can be performed using reinforcement learning and a trained reward model on human preferences. The reinforcement learning and/or the trained reward model can guide the decoding into outputting answers preferred by humans. However, not only does the human-feedback-based learning bias towards certain characteristics of responses not necessarily relevant to the task, but the reinforcement learning from human feedback may not correlate with the actual value that the response creates.
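For reference, the next-token-prediction objective mentioned above is commonly implemented as a cross-entropy loss over the shifted token sequence, as in this minimal sketch (shapes and values are illustrative):

```python
# Minimal sketch of a next-token-prediction pretraining loss: standard
# cross-entropy between model logits and the sequence shifted by one token.
import torch
import torch.nn.functional as F

vocab, seq = 100, 12
logits = torch.randn(1, seq, vocab)             # model outputs per position
tokens = torch.randint(0, vocab, (1, seq + 1))  # input sequence plus one target
loss = F.cross_entropy(logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1))
print(float(loss))
```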
[0056] The systems and methods disclosed herein can leverage signals that more directly estimate the response value, which may enable better separation of low-value and high-value results. The signals can be obtained from existing training data and/or from additional metadata, rather than via a separate external process involving human raters.
[0057] For example, a search ranking engine may be utilized as an intrinsic value signal for distinguishing two or more model-generated outputs (e.g., two or more large language model (LLM) responses). The search ranking engine may rank web pages and/or other resources by evaluating the number and quality of links to a web page and/or other resource. The search ranking engine may prioritize resources (e.g., resources associated with search results) that receive more links from other web pages and/or other resources. In some implementations, model-generated outputs (e.g., LLM responses) may include fragments of pages that inherently have an associated page rank score. In some implementations, the user may be presented with the answer that is most likely to be found in the data. The systems and methods may prefer, and return to the user, the response that is most closely associated with a page that has a higher page rank score.
[0058] In another example, the underlying training/tuning documents may be generative model generated (e.g., LLM generated (fully or partially)). A user can have a content item (e.g., a document) generated using a generative model (e.g., an LLM) and may publish the content item (e.g., publish a document on the web, via an email, etc.). There may be value signals that can be derived from the document over time. For instance, the number of page views may be indicative of the quality. The signals can then be fed back to fine tune the generative model’s (e.g., the LLM’s) reward model and can guide the reward model towards creating content which is measurably higher quality on subsequent uses.
[0059] In some implementations, LLM cloud offerings can provide analytics about the value of the answers given to users. For example, without modifying the decoding process to change the generative model outputs (e.g., the language model outputs), the reward model (e.g., a value estimation model) can be used to inspect the quality of the answers generated and provide some score to developers. The reward model and/or the generative model may be used for analyzing conversations and identifying average values a model generates. When switching between models (e.g., a language model trained for multilinguality, reasoning, and/or coding versus a foundation model trained for providing access to machine-learned model capabilities), developers may observe a change in the score depending on the model capabilities to solve tasks asked by their user bases.
[0060] In some implementations, the systems and methods can perform task analytics and personalization for users based on their needs from the generative model (e.g., an LLM) they interact with. Generative model cloud offerings may be able to provide, in addition to input/output token costs, task-specific costs based on how much value these bring to the user. In some implementations, the systems and methods may determine the skills of a user, which can then be leveraged to determine when to provide generative model aid and/or suggestions. For example, there may be users better skilled at math than at computer science. For these users, help with solving computer science problems may bring higher value. The experience-based generative model usage can be used in an interactive human-in-the-loop scenario, where the model proactively offers to solve subtasks according to their perceived value for a given user in a given context.
[0061] The systems and methods can include a pretraining procedure and/or decoding changes. For example, the text-based training datasets that are normally used for generative models (e.g., LLMs) can be extended to incorporate intrinsic document quality signals as additional metadata (e.g., this can include reference data, user data, number of page visits, number/quality of incoming links, etc.). The data can be extracted from various pipelines and can be extended to any other information quality signals that are available and used in information (for example, content item) retrieval. In particular, corpuses with (query, answer) pairs may benefit from quality metrics conditioned on the query. In some implementations, there may be a source of signal that relates to some utility functions. For example, the text in a how-to tutorial may provide inherent value by explaining how to solve a specific task, compared to a text which provides a discussion of a problem without reaching a solution.
[0062] The training data at the previous step can then be used to create a partial scoring model (e.g., as a reward model used in a reinforcement learning loop). The annotation between training data and document quality signals can be leveraged for tuning and inferences. Documents may be split into smaller units, for example sentences or paragraphs, since this may be the level at which the reward model may operate and/or be trained. The signal can be translated from a document-level signal to a paragraph-level signal. This can be done naively by preserving the document-level signal and applying it to the smaller units. Alternatively and/or additionally, a transformation can be applied (e.g., for page rank, a naive approach may be to take the various connections and citations to the extracted paragraph instead of the entire document score). Additional signals may then be used here (e.g., whether the task being solved has some universal notion of how difficult the task is to solve (the difficulty may be self-inferred by the number of actions needed to solve it (or the total compute used), which may be tied to a model capacity) or how much value the task brings (e.g., organizing the shopping cart for a user will have some intrinsic value)). The model architecture may include a transformer encoder/decoder (and/or other variants).
[0063] The next step can include either using the reward model at training time (e.g., through reinforcement learning) and/or alternatively at inference time directly through a scoring mechanism. At inference time, the decoder can either take the trained reward model as a scoring signal which can be applied during beam search (however, the scoring may not be applied per-token, but rather per-sentence and/or at an alternatively chosen granularity) and/or may apply the scoring as a post-processing step when the response is re-ranked using the scoring model. In some implementations, there may be a third approach, where parts of the model-generated response may be matched through an embedding look-up to parts of documents containing responses, and then an associated (partial) score can be used for the generated (part of the) response. Additionally and/or alternatively, the reward model training can include a training dataset preprocessing technique, where the resulting training dataset (with scores for document fragments) can be the one used for training the reward model. A sketch of the per-sentence re-ranking variant appears after this paragraph.
In some implementations, the training may include direct preference optimization, which may avoid reinforcement learning and/or reward models, while still utilizing the same training corpus.
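Applying the reward model at inference time at a per-sentence granularity, as described in [0063] above, might look like the following re-ranking sketch; the combination weight and the naive sentence splitting are simplifying assumptions.

```python
# Hedged sketch: re-rank full responses by combining a language-model score
# with a per-sentence reward-model score (assumed weighting scheme).
def rerank(responses, lm_scores, reward_model, alpha=0.5):
    combined = []
    for resp, lm in zip(responses, lm_scores):
        sentences = [s for s in resp.split(".") if s.strip()]
        reward = sum(reward_model(s) for s in sentences) / len(sentences)
        combined.append(alpha * lm + (1 - alpha) * reward)  # blended score
    return responses[max(range(len(combined)), key=combined.__getitem__)]

# Stand-in reward model: prefers sentences that explain how to do something.
toy_reward = lambda s: 0.9 if "how to" in s.lower() else 0.2
print(rerank(["Here is how to fix it. Do X.", "People disagree about this."],
             [0.4, 0.6], toy_reward))
```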
[0064] In some implementations, revising the learned score may rely on system-level feedback. As page rank (and/or any other signal) may be subject to change, reward models may follow the drift and be updated regularly. From a user interface perspective, the model may be able to better evaluate the model-generated responses through the lens of such a value signal. The underlying controls may be exposed to users, enabling the user to select preferences around which type of signal to prefer more.
[0065] In some implementations, when solving the subtasks recursively is expensive, the model may decide to stop early if all remaining individual tasks are low-value according to the reward model (e.g., the value estimation model). The user may obtain benefit from the first, more valuable subtasks solved and may be able to piece together the remaining solutions easily. For example, a ReAct framework may be configured for this type of interruption by a reward model.
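A minimal sketch of this early-stopping behavior is shown below; the `value_model` and `solve` callables and the value threshold are illustrative assumptions. Because the subtasks are processed in descending value order, reaching one low-value subtask implies all remaining subtasks are also low-value.

```python
# A minimal sketch of reward-model-guided early stopping over subtasks.
# `value_model`, `solve`, and `min_value` are hypothetical stand-ins.
from typing import Callable, Dict, List

def solve_until_low_value(
    subtasks: List[str],
    value_model: Callable[[str], float],
    solve: Callable[[str], str],
    min_value: float = 0.5,
) -> Dict[str, str]:
    solutions: Dict[str, str] = {}
    # Solve the most valuable subtasks first.
    for task in sorted(subtasks, key=value_model, reverse=True):
        # Stop early once every remaining subtask is low-value.
        if value_model(task) < min_value:
            break
        solutions[task] = solve(task)
    return solutions
```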
[0066] The document signals may be used as a form of LLM feedback over time. For instance, if an LLM was used to create a document on the web, the systems and methods can gather the feedback signals to further refine the reward model, which can then be used to guide generation toward progressively higher quality over time.
[0067] The systems and methods of the present disclosure provide a number of technical effects and benefits. As one example, the systems and methods can be utilized to tune a generative model and/or guide generative model content item generation. In particular, the systems and methods disclosed herein can leverage quality signals including reference values, sources, and/or type of content to influence training and/or model inference. For example, the number and/or quality of references to and from a particular resource associated with a content item can be leveraged for determining which content items to prioritize during training and/or inference selection. Alternatively and/or additionally, interaction data (e.g., web traffic, time of viewing, search result page selection data, view trends, selection trends, sharing trends, etc.) can be leveraged for determining which content items to prioritize during training and/or inference selection. In some implementations, the type of content and/or type of resource can be determined and utilized to determine which content items to prioritize during training and/or inference selection. The use of the quality signals can reduce the number of training examples for tuning and/or training models to achieve a particular quality of generative model output, can increase the quality of generative model outputs, and can reduce the data transmission relied upon during reinforcement learning from human feedback.
[0068] Another example technical effect and benefit relates to improved computational efficiency and improvements in the functioning of a computing system. For example, a technical benefit of the systems and methods of the present disclosure is the ability to reduce the computational resources needed for training and/or tuning a generative model for generating high quality outputs for downstream tasks. In particular, the amount of training loops and/or the size of the training dataset can be reduced for the same and/or better quality due to the leveraging of the quality signals. The dilution of quality based on low quality training examples can be mitigated, and high quality training examples can be emphasized during tuning and/or inference. Moreover, the machine-learned reward model can be leveraged during tuning and/or inference to perform generative model selection, output selection, and/or training example selection. For example, the machine-learned reward model may be used to determine a quality associated with outputs during inference and to determine that outputs of a sufficiently high quality have been generated such that further inference is no longer required. The machine-learned reward model can be smaller than the one or more generative models, which can reduce computational cost for the selection and/or tuning.
[0069] With reference now to the Figures, example embodiments of the present disclosure will be discussed in further detail.
[0070] Various example implementations are described herein with respect to the accompanying Figures.
[0071] Figure 1 depicts a block diagram of an example content generation system 10 according to example embodiments of the present disclosure. In some implementations, the content generation system 10 can be configured to receive a set of input data descriptive of a prompt 12 (for example, providing the prompt 12 or providing data interpretable by the content generation system 10 to obtain the prompt 12) and, as a result of receipt of the input data, provide output data that includes a model-generated output 18. Thus, in some implementations, the content generation system 10 can include a generative model 14 that is operable to process the prompt 12 and generate a plurality of token predictions for generating a content item.
[0072] In particular, the content generation system 10 can obtain a prompt 12. The prompt 12 can include a text string descriptive of a query and/or question. The prompt 12 may include multimodal data. For example, the prompt 12 can include one or more images.
[0073] A generative model 14 can process the prompt 12 to generate a model-generated output 18. The generative model 14 may include a pre-trained language model, a pre-trained image generation model, a pre-trained audio generation model, and/or another model. The generative model 14 may have been trained for next token prediction.
[0074] Training of the generative model 14 can be performed based on a training dataset 16 that includes a plurality of content items and a plurality of respective quality signals associated with the plurality of content items. The plurality of content items can include articles, academic papers, social media posts, blogs, videos, audio files, and/or other content items. In some implementations, the plurality of content items can be associated with a plurality of different resources. The plurality of different resources can include blog pages, news websites, academic paper databases, a social media platform, and/or other resources. The plurality of resources can include web resources, application resources, local resources, and/or other resources.
[0075] The plurality of quality signals can include a page ranking based on link data associated with the respective content item. The page ranking may be based on the quantity and/or quality of incoming links and/or outgoing links. In some implementations, the plurality of quality signals can include interaction data (e.g., view count, view time, view traffic data, view trends, selection trends, and/or other interaction data), the type of content (e.g., a blog, an academic paper, a social media post, a conversation, a to-do list, a memoir, and/or other content type), the type of resource (e.g., a personal web page, a social media platform, a news website, a scholastic publisher, and/or other resource type), and/or other quality signals. In some implementations, the plurality of quality signals may be utilized for model inference.
[0076] The model-generated output 18 can include a model-generated response to the prompt 12. The model-generated output 18 can include text data, image data, audio data, latent encoding data, multimodal data, and/or other data. In some implementations, the model-generated output 18 can include a plurality of predicted words, a plurality of predicted pixels, and/or a plurality of predicted audio signals. The model-generated output 18 can be descriptive of a predicted answer to a question of the prompt 12. The model-generated output 18 can include a natural language response, a generated image, a generated audio clip, and/or other generated data.
[0077] Figure 2 depicts a block diagram of an example generative model tuning system 200 according to example embodiments of the present disclosure. In particular, the generative model tuning system 200 can include tuning a generative model 214 based on quality signals associated with content items of a training dataset 216. The tuning can include leveraging a machine-learned reward model 220 to select training examples for evaluating an output of the generative model 214.
[0078] For example, the generative model tuning system 200 can obtain a prompt 212. The prompt 212 can include one or more inputs for the generative model 214. The prompt 212 can include a query, a conversational question, a question about another input (e.g., an image), and/or other inputs.
[0079] The generative model 214 can process a prompt 212 to generate a model-generated output 218. The generative model 214 can include a pre-trained model, which may include a large language model, a vision language model, an image generation model, an audio generation model, and/or other generative models. The model-generated output 218 can include a predicted response to the prompt 212. The predicted response can be generated based on one or more learned sequences, one or more learned knowledge graphs, one or more learned representations, and/or one or more learned features for token prediction. The model-generated output 218 can include text data, image data, audio data, structure data, latent encoding data, multimodal data, and/or other data.
[0080] The generative model tuning system 200 can then process the model-generated output 218 to determine a particular content item 222 from the training dataset 216 to leverage for evaluating the model-generated output 218. In particular, the generative model tuning system 200 can determine that a subset of the training dataset 216 is associated with the model-generated output 218. The generative model tuning system 200 can then determine that the one or more respective quality signals associated with a particular content item 222 are descriptive of a highest quality content item from the subset of the training dataset 216. In some implementations, the particular content item 222 determination may be performed by a machine-learned reward model 220.
[0081] The machine-learned reward model 220 may have been trained on quality signals associated with one or more content items. For example, the reward model 220 may have been trained on the training dataset 216. Alternatively and/or additionally, the reward model 220 may have been trained on interaction data associated with previously generated model-generated content items that were published on a web page and/or other web resource. In some implementations, the particular content item 222 may be determined and/or selected without a machine-learned reward model 220.
[0082] A loss function 224 can then be evaluated based on the model-generated output 218 and the particular content item 222. The evaluation may include comparing the model-generated output 218 and the particular content item 222 to determine a gradient. The gradient can then be backpropagated to the generative model 214 to adjust one or more parameters of the generative model 214. The gradient update may incentivize the generative model 214 to generate outputs that more closely follow the sequences, reasoning, and/or logic of the particular content item 222 determined to be of high quality based on the respective quality signals. In some implementations, the quality signals may be directly utilized to evaluate the loss function 224. Additionally and/or alternatively, the quality scores generated with the machine-learned reward model 220 may be utilized to evaluate the loss function 224.
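For illustration, a minimal sketch of the target selection in Figure 2 is shown below; the `similarity` callable and the 0.5 similarity threshold are assumptions for brevity. The returned quality signal could, for example, weight the loss during backpropagation.

```python
# A minimal sketch of quality-guided target selection for tuning.
# `similarity` and the 0.5 threshold are hypothetical stand-ins.
from typing import Callable, List, Tuple

def select_target_and_weight(
    model_output: str,
    training_items: List[Tuple[str, float]],  # (content item, quality signal)
    similarity: Callable[[str, str], float],
) -> Tuple[str, float]:
    """Pick the highest-quality item among those similar to the output."""
    # Restrict to the subset of the dataset associated with the output.
    related = [
        (item, quality)
        for item, quality in training_items
        if similarity(model_output, item) > 0.5
    ]
    if not related:
        return model_output, 0.0
    # The particular content item is the highest-quality related item;
    # its quality signal can serve as a loss weight during tuning.
    return max(related, key=lambda pair: pair[1])
```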
[0083] Figure 3 depicts a block diagram of an example reward model-based content generation 300 according to example embodiments of the present disclosure. In particular, the reward model-based content generation 300 can be configured to obtain and process a prompt 312 to generate a model-generated response based on reward model 320 guidance.
[0084] For example, the reward model-based content generation 300 can obtain a prompt 312. The prompt 312 can include one or more inputs for the generative model 314. The prompt 312 can include a query, a conversational question, a question about another input (e.g., an image), and/or other inputs.
[0085] The generative model 314 can process the prompt 312 to generate a plurality of candidate model-generated outputs 318, which can include a first candidate output, a second candidate output, and/or an nth candidate output. The generative model 314 can include a pre-trained model, which may include a large language model, a vision language model, an image generation model, an audio generation model, and/or other generative models. The plurality of candidate model-generated outputs 318 can include a plurality of candidate predicted responses to the prompt 312. The plurality of candidate predicted responses can be generated based on one or more learned sequences, one or more learned knowledge graphs, one or more learned representations, and/or one or more learned features for token prediction. The plurality of candidate model-generated outputs 318 can include text data, image data, audio data, structure data, latent encoding data, multimodal data, and/or other data.
[0086] Each of the plurality of candidate model-generated outputs 318 may be associated with content, style, sequence, and/or logic associated with a different content item from the training dataset 316. The plurality of candidate model-generated outputs 318 can be processed with a machine-learned reward model 320 to determine a particular model-generated output 326 to provide as the output. The reward model 320 can be leveraged to determine a quality score (and/or level) of each of the plurality of candidate model-generated outputs 318. The particular model-generated output 326 may be determined based on the respective candidate model-generated output having a highest rank and/or highest quality score.
[0087] The machine-learned reward model 320 may have been trained on quality signals associated with one or more content items. For example, the reward model 320 may have been trained on the training dataset 316. Alternatively and/or additionally, the reward model 320 may have been trained on interaction data associated with previously generated model-generated content items that were published on a web page and/or other web resource.
[0088] In some implementations, the plurality of candidate model-generated outputs 318 can include a plurality of candidate response fragments. The particular model-generated output 326 can then be fed back to the generative model 314 to generate a full model-generated response.
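A minimal sketch of the fragment-selection variant of Figure 3 is shown below; the `reward_model` and `generative_model` callables are hypothetical stand-ins, and the simple prompt concatenation is an assumption for brevity.

```python
# A minimal sketch of selecting the best candidate fragment with a
# reward model and feeding it back for full response generation.
from typing import Callable, List

def pick_and_expand(
    prompt: str,
    fragments: List[str],
    reward_model: Callable[[str], float],    # hypothetical scorer
    generative_model: Callable[[str], str],  # hypothetical generator
) -> str:
    # Score each candidate fragment and keep the highest-quality one.
    best_fragment = max(fragments, key=reward_model)
    # Feed the selected fragment back to the generative model so it
    # can expand the fragment into a full model-generated response.
    return generative_model(prompt + "\n" + best_fragment)
```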
[0089] Figure 4 depicts a block diagram of an example reward model training system 400 according to example embodiments of the present disclosure. In particular, the reward model training system 400 can be configured to process a content item 402 with a reward model 404 to generate a score 406 (and/or rank), which can then be utilized along with a respective quality signal 408 for the content item 402 to evaluate a loss function for reward model 404 training and/or tuning.
[0090] For example, the reward model training system 400 can obtain a content item 402. The content item 402 may be a pre-existing human-authored content item and/or a model-generated content item. The content item 402 can include a training example from a training dataset. The content item 402 can include an article, a blog, a short story, a conversation, a memoir, an academic paper, a social media post, a tutorial, instructions, an encyclopedia entry, and/or other content item.
[0091] A reward model 404 can process the content item 402 to generate a score 406 (and/or rank) associated with a determined quality of the content item 402. The score 406 may include a classification, a numerical value, and/or a comparison to one or more other content items.
[0092] One or more quality signals 408 associated with the content item 402 can be obtained. The one or more quality signals 408 can be obtained from and/or determined based on metadata for the content item 402. The one or more quality signals can include a resource rank, interaction data, and/or other quality signals.
[0093] A loss function 410 can then be evaluated based on the score 406 and the one or more quality signals 408. A gradient may be generated based on the evaluation of the loss function 410. The gradient can then be backpropagated to the reward model 404 to adjust one or more parameters of the machine-learned reward model 404.
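For illustration, a minimal, self-contained sketch of the Figure 4 loop is shown below, using a one-parameter linear stand-in for the reward model and a squared-error loss; the feature representation, learning rate, and loss form are assumptions rather than part of the disclosure.

```python
# A minimal sketch of reward model training: regress the model's
# score 406 toward the content item's quality signal 408.

def train_reward_model(examples, lr=0.01, epochs=10):
    """examples: list of (feature, quality_signal) pairs; the single
    weight `w` is a toy stand-in for the reward model's parameters."""
    w = 0.0
    for _ in range(epochs):
        for feature, quality in examples:
            score = w * feature                       # forward pass
            grad = 2.0 * (score - quality) * feature  # d(loss)/dw
            w -= lr * grad                            # backpropagated update
    return w

# Toy usage: features might be, e.g., normalized incoming-link counts.
w = train_reward_model([(1.0, 0.9), (0.2, 0.1), (0.6, 0.5)])
```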
[0094] Figure 5 depicts a block diagram of an example generative model selection system 500 according to example embodiments of the present disclosure. In particular, the generative model selection system 500 can leverage a machine-learned reward model 516 to determine a particular generative model of a plurality of candidate generative models to utilize for a particular task associated with an obtained prompt 502.
[0095] For example, the generative model selection system 500 can obtain a prompt 502. The prompt 502 can be processed with a plurality of candidate generative models (e.g., a first generative model 504, a second generative model 506, and/or an nth generative model 508) to generate a plurality of candidate output fragments (e.g., a first output fragment 510, a second output fragment 512, and/or an nth output fragment 514), which can be associated with partially generated responses. The plurality of candidate generative models can be trained on different training datasets and/or may have varying architectures.
[0096] The plurality of candidate output fragments may be processed with a machine-learned reward model 516 to perform a model determination 518 (or model selection from the plurality of candidate generative models). Alternatively and/or additionally, the machine-learned reward model 516 can be trained to process a prompt 502 to determine which generative model to utilize. In particular, the machine-learned reward model 516 may be trained and/or configured to determine a particular generative model to utilize for a particular prompt task and/or subtask based on a prompt classification, a gating determination, and/or candidate output fragment ranking.
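A minimal sketch of the fragment-based model determination in Figure 5 is shown below; the per-model generator callables and the `reward_model` scorer are hypothetical stand-ins.

```python
# A minimal sketch of reward-model-based generative model selection:
# each candidate model drafts a fragment, and the fragment scores
# decide which model handles the task.
from typing import Callable, Dict

def select_model(
    prompt: str,
    models: Dict[str, Callable[[str], str]],  # model name -> generator
    reward_model: Callable[[str], float],     # hypothetical scorer
) -> str:
    fragments = {name: generate(prompt) for name, generate in models.items()}
    # Rank the candidate fragments; the best fragment's model is selected.
    return max(fragments, key=lambda name: reward_model(fragments[name]))
```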
Example Methods
[0097] Figure 6 depicts a flow chart diagram of an example method to perform according to example embodiments of the present disclosure. Although Figure 6 depicts steps performed in a particular order for purposes of illustration and discussion, the methods of the present disclosure are not limited to the particularly illustrated order or arrangement. The various steps of the method 600 can be omitted, rearranged, combined, and/or adapted in various ways without deviating from the scope of the present disclosure.
[0098] At 602, a computing system can obtain a content dataset. The content dataset can include a plurality of content items associated with a plurality of respective resources and a plurality of quality scores associated with the plurality of respective resources. The content dataset may be part of a training dataset that was utilized to train and/or tune a generative model. The plurality of quality scores can be descriptive of a quality of a respective resource as a search result. In some implementations, the plurality of quality scores may have been determined based on incoming links and outgoing links for the plurality of respective resources. Additionally and/or alternatively, the plurality of quality scores may have been determined based on a quantity and quality of incoming links for the plurality of respective resources.
[0099] In some implementations, the plurality of quality scores may have been determined by processing the plurality of content items and plurality of respective metadata sets associated with the plurality of respective resources with a ranking engine to generate a plurality of ranking scores. The ranking engine can be associated with a search engine. In some implementations, the ranking engine can be configured to rank resources to determine particular resources to provide as search results.
[0100] At 604, the computing system can process a prompt with a generative model to generate a plurality of candidate model-generated responses. The prompt can include a request for information. The prompt may include a question and/or a query. The prompt may include a text string, an image, an embedding, a multimodal input, and/or other data. The generative model may include a pre-trained model. In some implementations, the generative model may include a large language model, a vision language model, an image generation model, an audio generation model, and/or other generative models. The generative model may include one or more encoders and/or one or more decoders.
[0101] At 606, the computing system can determine, based on the content dataset, a subset of the content items of the plurality of content items that are associated with at least a subset of the plurality of candidate model-generated responses. The subset of content items may be determined by determining that the plurality of candidate model-generated responses include respective sequence predictions similar to the subset of content items. The similarity may be determined based on embedding similarity, semantic similarity, and/or data sequence matching (e.g., text matching).
[0102] At 608, the computing system can determine, based on a set of respective quality scores associated with the subset of the content items, a particular model-generated response of the plurality of candidate model-generated responses to provide as an output. The particular model-generated response may be associated with a respective content item with a highest quality score of the subset of content items. The particular model-generated response may be fed back to the generative model to generate a long-form response to the prompt.
[0103] In some implementations, determining the particular model-generated response of the plurality of candidate model-generated responses to provide as an output can include determining, based on the content dataset, that the particular model-generated response of the plurality of candidate model-generated responses is associated with a respective content item of the subset of content items with a respective quality score greater than the other quality scores of the set of respective quality scores associated with other content items of the subset of content items associated with a set of other candidate model-generated responses.
[0104] In some implementations, the computing system can provide the particular model-generated response as an output. For example, the computing system can provide the particular model-generated response as an output response to the prompt. The content dataset can include a plurality of pre-existing content items published on the internet. The generative model may include a pretrained autoregressive language model. In some implementations, the particular model-generated response can include a natural language response to the prompt.
[0105] Additionally and/or alternatively, the computing system may select a set of the content items for inclusion in a training dataset based on the plurality of quality scores. The computing system may then train a machine-learned reward model using the training dataset. The machine-learned reward model and the generative model can then be stored.
[0106] In some implementations, the computing system may obtain a plurality of interaction datasets associated with a plurality of additional resources. The plurality of additional resources can include a plurality of model-generated content items that were previously generated with the generative model. In some implementations, the plurality of interaction datasets can be descriptive of respective interactions with the plurality of additional resources by a plurality of users. The computing system can train the machine-learned reward model based on the plurality of interaction datasets and the plurality of model-generated content items.
[0107] Alternatively and/or additionally, the computing system can process a second prompt with the generative model to generate a plurality of model-generated fragments. The plurality of model-generated fragments can include a plurality of different candidate responses to the second prompt. The computing system can process the plurality of model-generated fragments with the machine-learned reward model to generate a plurality of respective scores. In some implementations, the plurality of respective scores can be associated with evaluating a quality of the plurality of model-generated fragments. The computing system can provide a particular model-generated fragment of the plurality of model-generated fragments as an output based on the plurality of respective scores.
[0108] Figure 7 depicts a flow chart diagram of an example method to perform according to example embodiments of the present disclosure. Although Figure 7 depicts steps performed in a particular order for purposes of illustration and discussion, the methods of the present disclosure are not limited to the particularly illustrated order or arrangement. The various steps of the method 700 can be omitted, rearranged, combined, and/or adapted in various ways without deviating from the scope of the present disclosure.
[0109] At 702, a computing system can obtain a training dataset. The training dataset can include a plurality of content items associated with a plurality of respective web resources and a plurality of quality scores associated with the plurality of respective web resources. The plurality of quality scores can be determined based on incoming links and outgoing links for the plurality of respective web resources and/or other quality signals. In some implementations, the plurality of quality scores can be determined based on an amount of references to a respective web resource within other resources. The plurality of quality scores can be determined based on how the respective web resource is referenced.
[0110] At 704, the computing system can process a prompt with a generative model to generate a plurality of probabilities associated with a plurality of candidate model outputs. The generative model may include a pre-trained generative model. The generative model may include a natural language processing model, an image generation model, an audio generation model, a vision language model, and/or other models. The generative model can include one or more transformer models. The plurality of probabilities can be descriptive of a probability that the respective candidate model output is responsive to the prompt and sequentially coherent. The plurality of candidate model outputs may include full responses and/or response fragments. Each of the plurality of candidate model outputs may be associated with a different sequence representation and/or logic representation associated with a different content item within the training dataset.
[0111] At 706, the computing system can determine a first ground truth example from the training dataset is associated with a first candidate model output of the plurality of candidate model outputs and a second ground truth example from the training dataset is associated with a second candidate model output of the plurality of candidate model outputs. The associations can be determined based on semantic comparison, stylistic comparison, and/or other similarity measures. In some implementations, the association can be determined based on embedding the content items and the candidate model outputs to determine embedding neighbors for the association determination.
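A minimal sketch of the embedding-based association at 706 is shown below; the `embed` callable is a hypothetical embedding function returning fixed-length vectors, and cosine similarity is one reasonable choice of similarity measure.

```python
# A minimal sketch of associating a candidate model output with its
# nearest ground truth example via embedding neighbors.
import math
from typing import Callable, List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest_ground_truth(
    candidate: str,
    ground_truths: List[str],
    embed: Callable[[str], List[float]],  # hypothetical embedding model
) -> str:
    candidate_vec = embed(candidate)
    # The ground truth example whose embedding is closest is associated
    # with the candidate model output.
    return max(ground_truths, key=lambda g: cosine(candidate_vec, embed(g)))
```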
[0112] At 708, the computing system can determine the first ground truth example is associated with a first web resource with a higher quality score than a second web resource associated with the second ground truth example. The computing system can determine the quality signals of the first content item are descriptive of a higher quality than the quality signals of the second content item. The quality scores can be utilized to determine a particular candidate model output is of higher quality. The particular candidate model output can then be selected as the output and/or for model tuning.
[0113] At 710, the computing system can evaluate a loss function that evaluates a difference between a first probability associated with the first candidate model output and a second probability associated with the second candidate model output. The loss function can include one or more terms for penalizing the second probability and/or one or more terms for increasing the first probability. The loss function may include one or more terms for prioritizing candidate model outputs with higher quality scores.
[0114] At 712, the computing system can adjust one or more parameters of the generative model based on the loss function. Adjusting the one or more parameters can be utilized to adjust the probability predictions to increase the first probability and/or reduce the second probability. The adjustment may prioritize prediction probabilities for content items with higher quality signals.
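For illustration, one possible form of such a loss is sketched below as a pairwise logistic loss over the two probabilities; this particular functional form is an assumption, not a definition from the disclosure.

```python
# A minimal sketch of a pairwise loss over steps 710-712: penalize the
# model when the candidate tied to the higher-quality resource (the
# first) is not preferred over the lower-quality one (the second).
import math

def preference_loss(p_first: float, p_second: float) -> float:
    # Logistic loss over the probability margin; minimizing it raises
    # the first probability and lowers the second.
    return -math.log(1e-9 + p_first / (p_first + p_second))

# If the model assigns 0.3 to the high-quality candidate and 0.7 to
# the low-quality one, the loss is large (about 1.2), pushing the
# parameters to reverse the preference.
loss = preference_loss(0.3, 0.7)
```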
[0115] In some implementations, the computing system can obtain input data. The input data can be descriptive of a user prompt. The computing system can process the input data with the generative model to generate a model-generated response. The model-generated response can be responsive to the user prompt. The computing system can provide the model-generated response as an output. In some implementations, the user prompt can include a natural language question. The model-generated response can include a plurality of predicted words responsive to the question. The model-generated response can include a sequence of words that differs from the plurality of content items. In some implementations, the user prompt can include multimodal data. The user prompt can include an image and a question associated with the image.
[0116] Figure 8 depicts a flow chart diagram of an example method to perform according to example embodiments of the present disclosure. Although Figure 8 depicts steps performed in a particular order for purposes of illustration and discussion, the methods of the present disclosure are not limited to the particularly illustrated order or arrangement. The various steps of the method 800 can be omitted, rearranged, combined, and/or adapted in various ways without deviating from the scope of the present disclosure.
[0117] At 802, a computing system can obtain a training dataset. The training dataset can include a plurality of content items associated with a plurality of respective web resources and a plurality of quality scores associated with the plurality of respective web resources. The plurality of quality scores can be determined based on incoming links and outgoing links for the plurality of respective web resources, interaction data, knowledge graph data, resource type, content type, and/or other quality signals.
[0118] At 804, the computing system can train a machine-learned reward model on the training dataset. The machine-learned reward model can be an additional machine-learned model that may be utilized to evaluate content items and/or determine similar pre-existing content items associated with a received model-generated content item, which can then be utilized to determine a quality of the received model-generated content item. The machine-learned reward model can be trained to rank a set of data based on a determined quality score. The machine-learned reward model can be separate from the generative model. Alternatively and/or additionally, the machine-learned reward model may be part of the generative model. The machine-learned model may be trained to process a content item and generate a quality score descriptive of a determined level of quality. The machine-learned reward model may determine a similarity between a model-generated output and a training content item, which can then be leveraged to look up a quality score for the similar training content item; this look-up variant is sketched below. Alternatively and/or additionally, the machine-learned reward model may be trained to generate the quality score without a similarity determination.
[0119] In some implementations, the computing system can obtain a plurality of interaction datasets associated with a plurality of additional resources. The plurality of additional resources can include a plurality of model-generated content items that were previously generated with the generative model. The plurality of interaction datasets can be descriptive of respective interactions with the plurality of additional resources by a plurality of users. The computing system can adjust one or more parameters of the machine-learned reward model based on the plurality of interaction datasets and the plurality of model-generated content items.
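A minimal sketch of the similarity look-up variant described at 804 is shown below; the `similarity` callable and the stored per-item scores are hypothetical stand-ins.

```python
# A minimal sketch of look-up-based quality scoring: score a
# model-generated output by finding the most similar training content
# item and reusing that item's stored quality score.
from typing import Callable, Dict

def lookup_quality_score(
    output: str,
    item_scores: Dict[str, float],            # content item -> quality score
    similarity: Callable[[str, str], float],  # hypothetical similarity fn
) -> float:
    most_similar = max(item_scores, key=lambda item: similarity(output, item))
    return item_scores[most_similar]
```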
[0120] At 806, the computing system can obtain a prompt. The prompt may include a query. In some implementations, the prompt may include an image and text descriptive of a question associated with the image. The prompt may be a user-input prompt and/or an automatically generated prompt (e.g., a prompt generated based on a context).
[0121] At 808, the computing system can process the prompt with a generative model to generate a plurality of model-generated fragments. The plurality of model-generated fragments can include a plurality of different candidate responses to the prompt. The plurality of model-generated fragments can be parts of larger candidate responses and/or short-form responses that can be expanded.
[0122] At 810, the computing system can process the plurality of model-generated fragments with the machine-learned reward model to generate a plurality of respective scores. The plurality of respective scores can be associated with evaluating a quality of the plurality of model-generated fragments. The plurality of respective scores may be determined based on learned quality characteristics associated with the training dataset. The plurality of respective scores may be associated with a ranking for each of the plurality of model-generated fragments.
[0123] At 812, the computing system can provide a particular model-generated fragment of the plurality of model-generated fragments as an output based on the plurality of respective scores. In some implementations, providing the particular model-generated fragment of the plurality of model-generated fragments as the output based on the plurality of respective scores can include determining the particular model-generated fragment is associated with a particular respective score that is higher than a plurality of other respective scores.
[0124] In some implementations, the computing system can process the prompt with an additional generative model to generate a plurality of additional model-generated fragments, process the plurality of additional model-generated fragments with the machine-learned reward model to generate a plurality of additional scores, determine the additional generative model is associated with a task of the prompt based on the plurality of respective scores and the plurality of additional scores, and generate a long-form model-generated response with the additional generative model.
[0125] Figure 9 depicts a flow chart diagram of an example method to perform according to example embodiments of the present disclosure. Although Figure 9 depicts steps performed in a particular order for purposes of illustration and discussion, the methods of the present disclosure are not limited to the particularly illustrated order or arrangement. The various steps of the method 900 can be omitted, rearranged, combined, and/or adapted in various ways without deviating from the scope of the present disclosure.
[0126] At 902, a computing system can obtain a training dataset. The training dataset can include a plurality of content items associated with a plurality of respective resources. The plurality of content items can include articles, academic papers, blogs, product listings, reviews, social media posts, videos, songs, and/or other content items. The plurality of content items can include text data, image data, audio data, statistical data, structure data, latent encoding data, multimodal data, and/or other data. The plurality of respective resources can include a plurality of different web domains, a plurality of different web pages, a plurality of different applications, a plurality of different web platforms, a plurality of different databases, a plurality of different knowledge graphs, a plurality of different social graphs, and/or other resources. The training dataset may include a plurality of metadata sets associated with the plurality of content items.
[0127] At 904, the computing system can determine a plurality of quality scores associated with the plurality of respective resources. The plurality of quality scores can be determined based on incoming links and outgoing links for the plurality of respective resources. For example, a link difference can be determined between incoming and outgoing links, which may be utilized for determining the quality score for the respective content item. Additionally and/or alternatively, the form, location, and/or prominence of the link may be utilized for weighting the links in quality score determination. For example, links in a citation may be weighted differently than a standalone link in a list of links. In some implementations, short-form URLs, encoded links, full URLs, and/or selectable widgets may be weighted differently. Links may be weighted in the quality score determination based on interaction data associated with the respective links.
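The following sketch illustrates one way the weighted link difference at 904 could be computed; the link-form weights and the interaction-weight inputs are illustrative assumptions, not values from the disclosure.

```python
# A minimal sketch of link-based quality scoring: weight each link by
# its form, scale by its interaction data, and take the difference
# between incoming and outgoing totals.
from typing import List, Tuple

# Assumed, illustrative weights per link form (e.g., a citation link
# counts more than a standalone link in a list of links).
LINK_FORM_WEIGHTS = {"citation": 1.0, "standalone": 0.5, "widget": 0.3}

def quality_score(
    incoming: List[Tuple[str, float]],  # (link form, interaction weight)
    outgoing: List[Tuple[str, float]],
) -> float:
    def total(links: List[Tuple[str, float]]) -> float:
        return sum(
            LINK_FORM_WEIGHTS.get(form, 0.5) * interaction
            for form, interaction in links
        )
    # Link difference between weighted incoming and outgoing links.
    return total(incoming) - total(outgoing)
```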
[0128] Additionally and/or alternatively, the plurality of quality scores may be determined based at least in part on reference data, user data, interaction data, search engine ranking data, type of content, type of resource, and/or other signals. The reference data can include data descriptive of references to the content item (e.g., title reference and/or link to the content item), the author, and/or the resource within other content items. The user data can be descriptive of preferences, trends, and/or data associated with a particular user, a set of users, and/or global users. The interaction data can be descriptive of view traffic, view time, view trends, selection metrics, sharing metrics, and/or other interaction data associated with the resource and/or the content item. The search engine ranking data can be descriptive of resource rankings as determined by a search engine and/or other ranking engine. The type of content and/or type of resource may be determined based on the metadata and/or by a classification model. Academic papers may receive higher quality scores, while social media posts may receive lower quality scores. Additionally and/or alternatively, a scholastic resource and/or an encyclopedia resource may receive a higher quality score than an unfiltered and/or unmoderated blog platform.
[0129] At 906, the computing system can obtain a prompt and process the prompt with a generative model to generate a model-generated output responsive to the prompt. In some implementations, the generative model may include a pretrained autoregressive language model. The generative model may have been pretrained on a plurality of textual content items associated with the plurality of respective resources. Alternatively and/or additionally, the generative model may include an image generation model. The image generation model may have been trained for generation based on a plurality of images from the plurality of content items. The image generation model can include a diffusion model. The model-generated output can include text data, image data, audio data, multimodal data, and/or other data. The prompt may include a question, and the model-generated output may be responsive to the question. The model-generated output may include a plurality of token predictions, which may include a novel sequence of words, pixels, and/or other signals. The token predictions may be determined based on learned sequences and/or learned representations.
[0130] At 908, the computing system can determine a ground truth example from the training dataset based on the plurality of quality scores. The ground truth example can include a particular content item from the plurality of content items. The ground truth example may include the full content item and/or a portion of the particular content item. The ground truth example may be determined based on the respective quality score associated with the particular content item of the ground truth example being higher than at least a subset of the other quality scores. The ground truth example may be determined by selecting a subset of content items that are determined to be similar to the model-generated output and then selecting the particular content item based on the quality scores.
[0131] In some implementations, determining the ground truth example from the training dataset based on the plurality of quality scores can include determining a first ground truth example from the training dataset is associated with the model-generated output and determining a second ground truth example from the training dataset is associated with the model-generated output. Determining the ground truth example from the training dataset based on the plurality of quality scores can include determining a first score of the plurality of quality scores is associated with the first ground truth example and determining a second score of the plurality of quality scores is associated with the second ground truth example. The computing system can then determine the ground truth example from the first ground truth example and the second ground truth example based on the first score and the second score.
[0132] At 910, the computing system can evaluate a loss function that evaluates a difference between the model-generated output and the ground truth example. The loss function can include an L2 loss, a perceptual loss, and/or other losses. In some implementations, a gradient can be generated based on the loss function and then backpropagated to the generative model.
[0133] At 912, the computing system can adjust one or more parameters of the generative model based on the loss function. Adjusting the one or more parameters can include tuning a pre-trained generative model. In some implementations, a subset of the parameters of the generative model may be fixed, while the remaining parameters may be adjusted during tuning. The training loop may be performed iteratively. The training loop may be guided and/or influenced based on machine-learned reward model processing.
[0134] In some implementations, the computing system can obtain a plurality of interaction datasets associated with a plurality of additional resources. The plurality of additional resources can include a plurality of model-generated content items that were previously generated with the generative model. The plurality of interaction datasets can be descriptive of respective interactions with the plurality of additional resources by a plurality of users. The computing system can train a machine-learned reward model based on the plurality of interaction datasets and the plurality of model-generated content items. The computing system can then store the machine-learned reward model and the generative model. The reward model may be further trained on the training dataset and the plurality of scores.
[0135] In some implementations, the computing system can process a second prompt with the generative model to generate a plurality of model-generated fragments. The plurality of model-generated fragments can include a plurality of different candidate responses to the second prompt. The computing system can process the plurality of model-generated fragments with the machine-learned reward model to generate a plurality of respective scores. The plurality of respective scores can be associated with evaluating a quality of the plurality of model-generated fragments. The computing system can then provide a particular model-generated fragment of the plurality of model-generated fragments as an output based on the plurality of respective scores.
[0136] Alternatively and/or additionally, the computing system can obtain an additional prompt, process the additional prompt with the generative model to generate an additional model-generated output, process the additional model-generated output with the machine-learned reward model to generate an output score, and adjust one or more parameters of the generative model based on the output score.
[0137] Figure 10 depicts a flowchart of a method 1000 for training one or more machine-learned models according to aspects of the present disclosure. For instance, an example machine-learned model can include a generative model (e.g., a large language model, a foundation model, a vision language model, an image generation model, a text-to-image model, an audio generation model, and/or other generative models) and/or a machine-learned reward model.
[0138] One or more portion(s) of example method 1000 can be implemented by a computing system that includes one or more computing devices such as, for example, computing systems described with reference to the other figures. Each respective portion of example method 1000 can be performed by any (or any combination) of one or more computing devices. Moreover, one or more portion(s) of example method 1000 can be implemented on the hardware components of the device(s) described herein, for example, to train one or more systems or models. Figure 10 depicts elements performed in a particular order for purposes of illustration and discussion. Those of ordinary skill in the art, using the disclosures provided herein, will understand that the elements of any of the methods discussed herein can be adapted, rearranged, expanded, omitted, combined, or modified in various ways without deviating from the scope of the present disclosure. Figure 10 is described with reference to elements/terms described with respect to other systems and figures for exemplary illustrated purposes and is not meant to be limiting. One or more portions of example method 1000 can be performed additionally, or alternatively, by other systems.
[0139] At 1002, example method 1000 can include obtaining a training instance. A set of training data can include a plurality of training instances divided between multiple datasets (e.g., a training dataset, a validation dataset, or a testing dataset). A training instance can be labeled or unlabeled. Although referred to in example method 1000 as a “training” instance, it is to be understood that runtime inferences can form training instances when a model is trained using an evaluation of the model’s performance on that runtime instance (e.g., online training/learning). Example data types for the training instance and various tasks associated therewith are described throughout the present disclosure.
[0140] At 1004, example method 1000 can include processing, using one or more machine-learned models, the training instance to generate an output. The output can be directly obtained from the one or more machine-learned models or can be a downstream result of a chain of processing operations that includes an output of the one or more machine-learned models.
[0141] At 1006, example method 1000 can include receiving an evaluation signal associated with the output. The evaluation signal can be obtained using a loss function. Various determinations of loss can be used, such as mean squared error, likelihood loss, cross entropy loss, hinge loss, contrastive loss, or various other loss functions. The evaluation signal can be computed using known ground-truth labels (e.g., supervised learning), predicted or estimated labels (e.g., semi- or self-supervised learning), or without labels (e.g., unsupervised learning). The evaluation signal can be a reward (e.g., for reinforcement learning). The reward can be computed using a machine-learned reward model configured to generate rewards based on output(s) received. The reward can be computed using feedback data describing human feedback on the output(s).
[0142] At 1008, example method 1000 can include updating the machine-learned model using the evaluation signal. For example, values for parameters of the machine-learned model(s) can be learned, in some embodiments, using various training or learning techniques, such as, for example, backwards propagation. For example, the evaluation signal can be backpropagated from the output (or another source of the evaluation signal) through the machine-learned model(s) to update one or more parameters of the model(s) (e.g., based on a gradient of the evaluation signal with respect to the parameter value(s)). For example, system(s) containing one or more machine-learned models can be trained in an end-to-end manner. Gradient descent techniques can be used to iteratively update the parameters over a number of training iterations. In some implementations, performing backwards propagation of errors can include performing truncated backpropagation through time. Example method 1000 can include implementing a number of generalization techniques (e.g., weight decays, dropouts, etc.) to improve the generalization capability of the models being trained.
[0143] In some implementations, example method 1000 can be implemented for training a machine-learned model from an initialized state to a fully trained state (e.g., when the model exhibits a desired performance profile, such as based on accuracy, precision, recall, etc.).
[0144] In some implementations, example method 1000 can be implemented for particular stages of a training procedure. For instance, in some implementations, example method 1000 can be implemented for pre-training a machine-learned model. Pre-training can include, for instance, large-scale training over potentially noisy data to achieve a broad base of performance levels across a variety of tasks/data types. In some implementations, example method 1000 can be implemented for fine-tuning a machine-learned model. Fine-tuning can include, for instance, smaller-scale training on higher-quality (e.g., labeled, curated, etc.) data. Fine-tuning can affect all or a portion of the parameters of a machine-learned model. For example, various portions of the machine-learned model can be “frozen” for certain training stages. For example, parameters associated with an embedding space can be “frozen” during fine-tuning (e.g., to retain information learned from a broader domain(s) than present in the fine-tuning dataset(s)). An example fine-tuning approach includes reinforcement learning. Reinforcement learning can be based on user feedback on model performance during use.
[0145] More particularly, the machine-learned models can include one or more generative models and/or one or more reward models. The reward models can be trained and/or tuned to be utilized for generative model tuning and/or generative model inference guidance.
[0146] For example, the systems and methods disclosed herein can include the utilization of intrinsic document quality signals (e.g., linking-based resource ranking and other metadata) in the inference process of a generative model (e.g., a large language model (LLM) foundation model).
[0147] A text-based training dataset of a generative model (e.g., a large language foundation model (LLM foundation model)) can include the intrinsic document quality signals (e.g., a number of page views) that can help to directly estimate the response value, which can enable better separation of low-value and high-value results. The model can obtain the signals from existing training data along with the intrinsic document quality signals as additional metadata (e.g., link-based resource ranking, user data-based resource ranking, number of page visits, number/quality of incoming links, etc.) rather than via a separate external process involving human raters. The additional metadata can include reference-based page ranking, model-generated content interaction data, value analytics, and/or task analytics.
[0148] Reference-based page ranking can be used as an intrinsic value signal for distinguishing two or more machine-learned model responses (e.g., where a user receives responses from the machine-learned model that include fragments of web pages that inherently have an associated page rank score, and the response that is associated more closely with a page that has a higher page rank score may be preferred and returned to the user).
[0149] In some implementations, the reference-based resource ranking can be utilized for distinguishing two or more machine-learned model responses, where the underlying documents are model-generated (fully or partially) (e.g., the user has generated the document using the generative model and published the document (e.g., on the web, via an email, etc.), and the document quality signals can then be fed back to fine-tune the generative model's reward model and guide the generative model towards creating content which is measurably higher quality on subsequent uses).
[0150] Machine-learned model cloud offerings that provide analytics about the value of the answers given to users may be leveraged for fine-tuning and/or inference (e.g., the reward model may assess answer quality without altering the language model's responses, offering developers scores that reflect model performance and help compare capabilities when switching models).
[0151] The quality signals may include task analytics and personalization for users based on their needs from the generative model they interact with. For example, the reward model may prioritize responses associated with resources on topics in which the user lacks expertise.
[0152] The metadata can be extracted from various pipelines (e.g., various sources) and can be extended to other available information quality signals (e.g., utility function signals) used in information retrieval. For example, utility function signals can refer to the text in a how-to tutorial that provides inherent value by explaining how to solve a specific task, compared to a text that provides a discussion of a problem without reaching a solution.
[0153] The text-based training dataset can help to create a partial scoring model (e.g., a reward model used in the reinforcement learning from human feedback (RLHF) loop) that helps the machine-learned model to generate better quality results, where the annotation between the data and document quality signals can be established.
[0154] To create the partial scoring model and train at a sentence and/or paragraph level, the model (e.g., the generative model and/or the reward model) can split the document into smaller units, translate the document-level signals to paragraph-level signals by preserving the document-level signal and applying the signal to the smaller units, apply the transformation for other metadata (e.g., page ranking) by using a naive approach that takes various connections to the extracted paragraph and citations to the paragraph instead of the entire document score, and incorporate other signals (e.g., assessing the intrinsic difficulty or value of the task being addressed (e.g., organizing a shopping cart for a user having some intrinsic value)).
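A minimal sketch of the document-to-paragraph signal translation described above is shown below; the paragraph splitting on blank lines and the 0.1 citation increment are illustrative assumptions.

```python
# A minimal sketch of translating a document-level quality signal to
# paragraph-level signals: naively preserve the document score, then
# adjust per paragraph using citations that point at that paragraph.
from typing import Dict, List, Optional, Tuple

def paragraph_signals(
    document: str,
    doc_score: float,
    citations_per_paragraph: Optional[Dict[int, int]] = None,
) -> List[Tuple[str, float]]:
    citations = citations_per_paragraph or {}
    paragraphs = [p for p in document.split("\n\n") if p.strip()]
    scored = []
    for index, paragraph in enumerate(paragraphs):
        # Preserve the document-level score and nudge it by the number
        # of citations to this particular paragraph.
        scored.append((paragraph, doc_score + 0.1 * citations.get(index, 0)))
    return scored
```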
[0155] Additionally and/or alternatively, the dataset may be used to generate a transformer encoder/decoder based model architecture using a pre-trained machine-learned model. The machine-learned model may utilize the reward model during the training time and/or may utilize a scoring mechanism directly at the inference stage. During the inference stage, the decoder may utilize the trained reward model as a scoring signal that is applied during beam search and/or may apply the signal as a post-processing step when the response is re-ranked using the scoring model.
[0156] In some implementations, the generative model and/or the reward model may match the response through an embedding look-up to parts of documents that include the responses and can then use an associated (partial) score for the generated (part of) response. The reward model and/or the generative model can revise the learned score using a form of system-level feedback (e.g., the resource ranking and/or other signal data).
[0157] In some implementations, the reward model may better evaluate the responses through the lens of such a value signal, where the user is allowed to select the signal preference. The reward model and/or the generative model may decide to stop if the remaining individual tasks have low value according to the reward model.
[0158] In some implementations, the generative model may include language models (e.g., large language models and/or vision language models), image generation models (e.g., text-to-image generation models and/or image augmentation models), audio generation models, video generation models, graph generation models, and/or other data generation models (e.g., other content generation models). The one or more generative models can include one or more transformer models, one or more convolutional neural networks, one or more recurrent neural networks, one or more feedforward neural networks, one or more generative adversarial networks, one or more self-attention models, one or more embedding models, one or more encoders, one or more decoders, and/or one or more other models. In some implementations, the one or more generative models can include one or more autoregressive models (e.g., a machine-learned model trained to generate predictive values based on previous behavior data) and/or one or more diffusion models (e.g., a machine-learned model trained to generate predicted data based on generating and processing distribution data associated with the input data).
[0159] The one or more generative models can be trained to process input data and generate model-generated content items. The input data and/or model-generated content items may include a plurality of predicted words, pixels, image frames, signals, and/or other data. The model-generated content items may include novel content items that are not the same as any pre-existing work. The one or more generative models 90 can leverage learned representations, sequences, and/or probability distributions to generate the content items, which may include phrases, storylines, settings, objects, characters, beats, lyrics, and/or other aspects that are not included in pre-existing content items.
[0160] The one or more generative models may include a vision language model.
[0161] The vision language model can be trained, tuned, and/or configured to process image data and/or text data to generate a natural language output. The vision language model may leverage a pre-trained large language model (e.g., a large autoregressive language model) with one or more encoders (e.g., one or more image encoders and/or one or more text encoders) to provide detailed natural language outputs that emulate natural language composed by a human.
[0162] The vision language model may be utilized for zero-shot image classification, few-shot image classification, image captioning, multimodal query distillation, multimodal question and answering, and/or may be tuned and/or trained for a plurality of different tasks. The vision language model can perform visual question answering, image caption generation, feature detection (e.g., content monitoring (e.g., for inappropriate content)), object detection, scene recognition, and/or other tasks.
[0163] The vision language model may leverage a pre-trained language model that may then be tuned for multimodality. Training and/or tuning of the vision language model can include image-text matching, masked-language modeling, multimodal fusing with cross attention, contrastive learning, prefix language model training, and/or other training techniques. For example, the vision language model may be trained to process an image to generate predicted text that is similar to ground truth text data (e.g., a ground truth caption for the image). In some implementations, the vision language model may be trained to replace masked tokens of a natural language template with textual tokens descriptive of features depicted in an input image. Alternatively and/or additionally, the training, tuning, and/or model inference may include multi-layer concatenation of visual and textual embedding features. In some implementations, the vision language model may be trained and/or tuned via jointly learning image embedding and text embedding generation, which may include training and/or tuning a system to map embeddings to a joint feature embedding space that maps text features and image features into a shared embedding space. The joint training may include image-text pair parallel embedding and/or may include triplet training. In some implementations, the images may be utilized and/or processed as prefixes to the language model.
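One of the listed techniques, contrastive learning toward a joint image-text embedding space, can be sketched as follows; the encoders producing the embeddings are assumed and not shown:

```python
import numpy as np

# Sketch: contrastive image-text training toward a shared embedding space
# (a CLIP-style symmetric loss). The image and text encoders producing
# `image_emb` and `text_emb`, each of shape (B, d), are assumptions.

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over a batch of matched image-text pairs."""
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = image_emb @ text_emb.T / temperature  # (B, B); matches on diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))  # true pair is the diagonal entry

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```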
[0164] Although Figures 6-10 are directed to example methods, the methods disclosed herein can be implemented as operations stored in a non-transitory computer-readable medium that may be executed by one or more processors of a computing system. Moreover, systems, computer-readable media, and/or methods disclosed herein may be implemented additionally and/or alternatively via other devices, components, and/or mediums. Different implementations of the systems, computer-readable media, and/or methods disclosed herein may be compatible with one another.
Example Machine-Learned Models
[0165] Figure 11 is a block diagram of an example processing flow for using machine-learned model(s) 1 to process input(s) 2 to generate output(s) 3.
[0166] Machine-learned model(s) 1 can be or include one or multiple machine-learned models or model components. Example machine-learned models can include neural networks (e.g., deep neural networks). Example machine-learned models can include nonlinear models or linear models. Example machine-learned models can use other architectures in lieu of or in addition to neural networks. Example machine-learned models can include decision-tree-based models, support vector machines, hidden Markov models, Bayesian networks, linear regression models, k-means clustering models, etc.
[0167] Example neural networks can include feed-forward neural networks, recurrent neural networks (RNNs), including long short-term memory (LSTM) based recurrent neural networks, convolutional neural networks (CNNs), diffusion models, generative-adversarial networks, or other forms of neural networks. Example neural networks can be deep neural networks. Some example machine-learned models can leverage an attention mechanism such as self-attention. For example, some example machine-learned models can include multi-headed self-attention models.
[0168] Machine-learned model(s) 1 can include a single or multiple instances of the same model configured to operate on data from input(s) 2. Machine-learned model(s) 1 can include an ensemble of different models that can cooperatively interact to process data from input(s) 2. For example, machine-learned model(s) 1 can employ a mixture-of-experts structure. See, e.g., Zhou et al., Mixture-of-Experts with Expert Choice Routing, ARXIV:2202.09368V2 (Oct. 14, 2022).
[0169] Input(s) 2 can generally include or otherwise represent various types of data. Input(s) 2 can include one type or many different types of data. Output(s) 3 can be data of the same type(s) or of different types of data as compared to input(s) 2. Output(s) 3 can include one type or many different types of data.
[0170] Example data types for input(s) 2 or output(s) 3 include natural language text data, software code data (e.g., source code, object code, machine code, or any other form of computer-readable instructions or programming languages), machine code data (e.g., binary code, assembly code, or other forms of machine-readable instructions that can be executed directly by a computer's central processing unit), assembly code data (e.g., low-level programming languages that use symbolic representations of machine code instructions to program a processing unit), genetic data or other chemical or biochemical data, image data such as pixel values and/or video frames, audio data, audiovisual data, haptic data, biometric data, medical data, financial data, statistical data, geographical data, astronomical data, historical data, sensor data generally (e.g., digital or analog values, such as voltage or other absolute or relative level measurement values from a real or artificial input, such as from an audio sensor, light sensor, displacement sensor, etc.), and the like. Data can be raw or processed and can be in any format or schema.
[0171] In multimodal inputs 2 or outputs 3, example combinations of data types include image data and audio data, image data and natural language data, natural language data and software code data, image data and biometric data, sensor data and medical data, etc. It is to be understood that any combination of data types in an input 2 or an output 3 can be present.
[0172] An example input 2 can include one or multiple data types, such as the example data types noted above. An example output 3 can include one or multiple data types, such as the example data types noted above. The data type(s) of input 2 can be the same as or different from the data type(s) of output 3. It is to be understood that the example data types noted above are provided for illustrative purposes only. Data types contemplated within the scope of the present disclosure are not limited to those examples noted above.
Example Machine-Learned Sequence Processing Models
[0173] Figure 12 is a block diagram of an example implementation of an example machine-learned model configured to process sequences of information. For instance, an example implementation of machine-learned model(s) 1 can include machine-learned sequence processing model(s) 4. An example system can pass input(s) 2 to sequence processing model(s) 4. Sequence processing model(s) 4 can include one or more machine-learned components. Sequence processing model(s) 4 can process the data from input(s) 2 to obtain an input sequence 5. Input sequence 5 can include one or more input elements 5-1, 5-2, . . . , 5-M, etc. obtained from input(s) 2. Sequence processing model 4 can process input sequence 5 using prediction layer(s) 6 to generate an output sequence 7. Output sequence 7 can include one or more output elements 7-1, 7-2, . . . , 7-N, etc. generated based on input sequence 5. The system can generate output(s) 3 based on output sequence 7.
[0174] Sequence processing model(s) 4 can include one or multiple machine-learned model components configured to ingest, generate, or otherwise reason over sequences of information. For example, some example sequence processing models in the text domain are referred to as “Large Language Models,” or LLMs. See, e.g., PaLM 2 Technical Report, GOOGLE, https://ai.google/static/documents/palm2techreport.pdf (n.d.). Other example sequence processing models can operate in other domains, such as image domains, see, e.g., Dosovitskiy et al., An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ARXIV:2010.11929V2 (Jun. 3, 2021), audio domains, see, e.g., Agostinelli et al., MusicLM: Generating Music From Text, ARXIV:2301.11325V1 (Jan. 26, 2023), biochemical domains, see, e.g., Jumper et al., Highly accurate protein structure prediction with AlphaFold, 596 Nature 583 (Aug. 26, 2021), by way of example. Sequence processing model(s) 4 can process one or multiple types of data simultaneously. Sequence processing model(s) 4 can include relatively large models (e.g., more parameters, computationally expensive, etc.), relatively small models (e.g., fewer parameters, computationally lightweight, etc.), or both.
[0175] In general, sequence processing model(s) 4 can obtain input sequence 5 using data from input(s) 2. For instance, input sequence 5 can include a representation of data from input(s) 2 in a format understood by sequence processing model(s) 4. One or more machine-learned components of sequence processing model(s) 4 can ingest the data from input(s) 2, parse the data into pieces compatible with the processing architectures of sequence processing model(s) 4 (e.g., via “tokenization”), and project the pieces into an input space associated with prediction layer(s) 6 (e.g., via “embedding”).
[0176] Sequence processing model(s) 4 can ingest the data from input(s) 2 and parse the data into a sequence of elements to obtain input sequence 5. For example, a portion of input data from input(s) 2 can be broken down into pieces that collectively represent the content of the portion of the input data. The pieces can provide the elements of the sequence.
[0177] Elements 5-1, 5-2, . . . , 5-M can represent, in some cases, building blocks for capturing or expressing meaningful information in a particular data domain. For instance, the elements can describe “atomic units” across one or more domains. For example, for textual input source(s), the elements can correspond to groups of one or more words or sub-word components, such as sets of one or more characters.
[0178] For example, elements 5-1, 5-2, . . . , 5-M can represent tokens obtained using a tokenizer. For instance, a tokenizer can process a given portion of an input source and output a series of tokens (e.g., corresponding to input elements 5-1, 5-2, . . . , 5-M) that represent the portion of the input source. Various approaches to tokenization can be used. For instance, textual input source(s) can be tokenized using a byte-pair encoding (BPE) technique. See, e.g., Kudo et al., SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing, PROCEEDINGS OF THE 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (System Demonstrations), pages 66-71 (October 31-November 4, 2018), https://aclanthology.org/D18-2012.pdf. Image-based input source(s) can be tokenized by extracting and serializing patches from an image.
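A minimal sketch of BPE-style tokenization follows, with a toy merge table standing in for one learned from a corpus:

```python
# Sketch: byte-pair-encoding style tokenization, in the spirit of the
# SentencePiece reference above. The merge table is a toy stand-in for a
# learned one; lower rank means higher merge priority.

def bpe_tokenize(word, merges):
    """Greedily apply learned merges to a character sequence."""
    tokens = list(word)
    while True:
        # Find the highest-priority adjacent pair present in the merge table.
        pairs = [(merges[p], i) for i in range(len(tokens) - 1)
                 if (p := (tokens[i], tokens[i + 1])) in merges]
        if not pairs:
            return tokens
        _, i = min(pairs)
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]  # merge the pair in place

# Toy usage with a hand-written merge table.
merges = {("t", "o"): 0, ("o", "l"): 1, ("to", "ol"): 2}
print(bpe_tokenize("toolbox", merges))  # ['tool', 'b', 'o', 'x']
```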
[0179] In general, arbitrary data types can be serialized and processed into input sequence 5. It is to be understood that element(s) 5-1, 5-2, . . . , 5-M depicted in Figure 12 can be the tokens or can be the embedded representations thereof.
[0180] Prediction layer(s) 6 can predict one or more output elements 7-1, 7-2, . . . , 7-N based on the input elements. Prediction layer(s) 6 can include one or more machine-learned model architectures, such as one or more layers of learned parameters that manipulate and transform the input(s) to extract higher-order meaning from, and relationships between, input element(s) 5-1, 5-2, . . . , 5-M. In this manner, for instance, example prediction layer(s) 6 can predict new output element(s) in view of the context provided by input sequence 5.
[0181] Prediction layer(s) 6 can evaluate associations between portions of input sequence 5 and a particular output element. These associations can inform a prediction of the likelihood that a particular output follows the input context. For example, consider the textual snippet, “The carpenter’s toolbox was small and heavy. It was full of ____.” Example prediction layer(s) 6 can identify that “It” refers back to “toolbox” by determining a relationship between the respective embeddings. Example prediction layer(s) 6 can also link “It” to the attributes of the toolbox, such as “small” and “heavy.” Based on these associations, prediction layer(s) 6 can, for instance, assign a higher probability to the word “nails” than to the word “sawdust.”
[0182] A transformer is an example architecture that can be used in prediction layer(s) 6. See, e.g., Vaswani et al., Attention Is All You Need, ARXIV:1706.03762V7 (Aug. 2, 2023). A transformer is an example of a machine-learned model architecture that uses an attention mechanism to compute associations between items within a context window. The context window can include a sequence that contains input sequence 5 and potentially one or more output element(s) 7-1, 7-2, . . . , 7-N. A transformer block can include one or more attention layer(s) and one or more post-attention layer(s) (e.g., feedforward layer(s), such as a multi-layer perceptron).
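The attention computation at the core of such a block can be sketched as follows (scaled dot-product attention, per the Vaswani et al. reference above):

```python
import numpy as np

# Sketch: scaled dot-product attention over a context window.
# Shapes: queries (n, d), keys (m, d), values (m, dv); `mask` is an
# optional boolean (n, m) array (e.g., a causal mask).

def attention(queries, keys, values, mask=None):
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # block masked positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ values  # attention-weighted combination of values
```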
[0183] Prediction layer(s) 6 can include other machine-learned model architectures in addition to or in lieu of transformer-based architectures. For example, recurrent neural networks (RNNs) and long short-term memory (LSTM) models can also be used, as well as convolutional neural networks (CNNs). In general, prediction layer(s) 6 can leverage various kinds of artificial neural networks that can understand or generate sequences of information.
[0184] Output sequence 7 can include or otherwise represent the same or different data types as input sequence 5. For instance, input sequence 5 can represent textual data, and output sequence 7 can represent textual data. Input sequence 5 can represent image, audio, or audiovisual data, and output sequence 7 can represent textual data (e.g., describing the image, audio, or audiovisual data). It is to be understood that prediction layer(s) 6, and any other interstitial model components of sequence processing model(s) 4, can be configured to receive a variety of data types in input sequence(s) 5 and output a variety of data types in output sequence(s) 7.
[0185] Output sequence 7 can have various relationships to input sequence 5. Output sequence 7 can be a continuation of input sequence 5. Output sequence 7 can be complementary to input sequence 5. Output sequence 7 can translate, transform, augment, or otherwise modify input sequence 5. Output sequence 7 can answer, evaluate, confirm, or otherwise respond to input sequence 5. Output sequence 7 can implement (or describe instructions for implementing) an instruction provided via input sequence 5.
[0186] Output sequence 7 can be generated autoregressively. For instance, for some applications, an output of one or more prediction layer(s) 6 can be passed through one or more output layers (e.g., a softmax layer) to obtain a probability distribution over an output vocabulary (e.g., a textual or symbolic vocabulary) conditioned on a set of input elements in a context window. In this manner, for instance, output sequence 7 can be autoregressively generated by sampling a likely next output element, adding that element to the context window, regenerating the probability distribution based on the updated context window, sampling the next output element, and so forth.
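A minimal sketch of the autoregressive loop, assuming a hypothetical model callable that maps a token-id context to a probability distribution over the vocabulary:

```python
import numpy as np

# Sketch: the autoregressive decoding loop described above. `model` is a
# hypothetical callable mapping a list of token ids to a 1-D probability
# distribution (summing to 1) over the output vocabulary.

def generate(model, context, max_new_tokens, eos_id, seed=0):
    rng = np.random.default_rng(seed)
    context = list(context)
    for _ in range(max_new_tokens):
        probs = model(context)                          # softmax output
        next_id = int(rng.choice(len(probs), p=probs))  # sample next element
        context.append(next_id)                         # grow the context window
        if next_id == eos_id:                           # stop token reached
            break
    return context
```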
[0187] Output sequence 7 can also be generated non-autoregressively. For instance, multiple output elements of output sequence 7 can be predicted together without explicit sequential conditioning on each other. See, e.g., Saharia et al., Non-Autoregressive Machine Translation with Latent Alignments, ARXIV:2004.07437V3 (Nov. 16, 2020).
[0188] Output sequence 7 can include one or multiple portions or elements. In an example content generation configuration, output sequence 7 can include multiple elements corresponding to multiple portions of a generated output sequence (e.g., a textual sentence, values of a discretized waveform, computer code, etc.). In an example classification configuration, output sequence 7 can include a single element associated with a classification output. For instance, an output “vocabulary” can include a set of classes into which an input sequence is to be classified. For instance, a vision transformer block can pass latent state information to a multilayer perceptron that outputs a likely class value associated with an input image.
[0189] Figure 13 is a block diagram of an example technique for populating an example input sequence 8. Input sequence 8 can include various functional elements that form part of the model infrastructure, such as an element 8-0 obtained from a task indicator 9 that signals to any model(s) that process input sequence 8 that a particular task is being performed (e.g., to help adapt a performance of the model(s) to that particular task). Input sequence 8 can include various data elements from different data modalities. For instance, an input modality 10-1 can include one modality of data. A data-to-sequence model 11-1 can process data from input modality 10-1 to project the data into a format compatible with input sequence 8 (e.g., one or more vectors dimensioned according to the dimensions of input sequence 8) to obtain elements 8-1, 8-2, 8-3. Another input modality 10-2 can include a different modality of data. A data-to-sequence model 11-2 can project data from input modality 10-2 into a format compatible with input sequence 8 to obtain elements 8-4, 8-5, 8-6. Another input modality 10-3 can include yet another different modality of data. A data-to-sequence model 11-3 can project data from input modality 10-3 into a format compatible with input sequence 8 to obtain elements 8-7, 8-8, 8-9.
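A minimal sketch of populating such a sequence, with hypothetical projection functions standing in for data-to-sequence models 11-1, 11-2, and 11-3:

```python
import numpy as np

# Sketch: populating a multimodal input sequence in the spirit of Figure 13.
# Each projection is a hypothetical stand-in for a data-to-sequence model;
# a task-indicator element is prepended, mirroring element 8-0.

P = 64  # shared embedding width (illustrative)

def to_elements(chunks, projection):
    """Project modality-specific chunks into P-dimensional elements."""
    return [projection(c) for c in chunks]

def build_input_sequence(task_embedding, modality_streams):
    """Concatenate the task element with elements from each modality.

    `modality_streams` is a list of (chunks, projection) pairs, one per
    input modality (text tokens, image patches, etc.); each projection
    is assumed to return an array of shape (P,).
    """
    sequence = [task_embedding]
    for chunks, projection in modality_streams:
        sequence.extend(to_elements(chunks, projection))
    return np.stack(sequence)  # shape (sequence_length, P)
```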
[0190] Input sequence 8 can be the same as or different from input sequence 5. Input sequence 8 can be a multimodal input sequence that contains elements that represent data from different modalities using a common dimensional representation. For instance, an embedding space can have P dimensions. Input sequence 8 can be configured to contain a plurality of elements that have P dimensions. In this manner, for instance, example implementations can facilitate information extraction and reasoning across diverse data modalities by projecting data into elements in the same embedding space for comparison, combination, or other computations therebetween.
[0191] For example, elements 8-0, . . . , 8-9 can indicate particular locations within a multidimensional embedding space. Some elements can map to a set of discrete locations in the embedding space. For instance, elements that correspond to discrete members of a predetermined vocabulary of tokens can map to discrete locations in the embedding space that are associated with those tokens. Other elements can be continuously distributed across the embedding space. For instance, some datatypes can be broken down into continuously defined portions (e.g., image patches) that can be described using continuously distributed locations within the embedding space.
[0192] In some implementations, the expressive power of the embedding space may not be limited to meanings associated with any particular set of tokens or other building blocks. For example, a continuous embedding space can encode a spectrum of high-order information. An individual piece of information (e.g., a token) can map to a particular point in that space: for instance, a token for the word “dog” can be projected to an embedded value that points to a particular location in the embedding space associated with canine-related information. Similarly, an image patch of an image of a dog on grass can also be projected into the embedding space. In some implementations, the projection of the image of the dog can be similar to the projection of the word “dog” while also having similarity to a projection of the word “grass,” while potentially being different from both. In some implementations, the projection of the image patch may not exactly align with any single projection of a single word. In some implementations, the projection of the image patch can align with a combination of the projections of the words “dog” and “grass.” In this manner, for instance, a high-order embedding space can encode information that can be independent of data modalities in which the information is expressed.
[0193] Task indicator 9 can include a model or model component configured to identify a task being performed and inject, into input sequence 8, an input value represented by element 8-0 that signals which task is being performed. For instance, the input value can be provided as a data type associated with an input modality and projected along with that input modality (e.g., the input value can be a textual task label that is embedded along with other textual data in the input; the input value can be a pixel-based representation of a task that is embedded along with other image data in the input; etc.). The input value can be provided as a data type that differs from or is at least independent from other input(s). For instance, the input value represented by element 8-0 can be learned within a continuous embedding space.
[0194] Input modalities 10-1, 10-2, and 10-3 can be associated with various different data types (e.g., as described above with respect to input(s) 2 and output(s) 3).
[0195] Data-to-sequence models 11-1, 11-2, and 11-3 can be the same or different from each other. Data-to-sequence models 11-1, 11-2, and 11-3 can be adapted to each respective input modality 10-1, 10-2, and 10-3. For example, a textual data-to-sequence model can subdivide a portion of input text and project the subdivisions into element(s) in input sequence 8 (e.g., elements 8-1, 8-2, 8-3, etc.). An image data-to-sequence model can subdivide an input image and project the subdivisions into element(s) in input sequence 8 (e.g., elements 8-4, 8-5, 8-6, etc.). An arbitrary data type data-to-sequence model can subdivide an input of that arbitrary data type and project the subdivisions into element(s) in input sequence 8 (e.g., elements 8-7, 8-8, 8-9, etc.).
[0196] Data-to-sequence models 11-1, 11-2, and 11-3 can form part of machine- learned sequence processing model(s) 4. Data-to-sequence models 11-1, 11-2, and 11-3 can be jointly trained with or trained independently from machine-learned sequence processing model(s) 4. Data-to-sequence models 11-1, 11-2, and 11-3 can be trained end-to-end with machine-learned sequence processing model(s) 4.
Example Machine-Learned Model Development Platform
[0197] Figure 14 is a block diagram of an example model development platform 12 that can facilitate creation, adaptation, and refinement of example machine-learned models (e.g., machine-learned model(s) 1, sequence processing model(s) 4, etc.). Model development platform 12 can provide a number of different toolkits that developer systems can employ in the development of new or adapted machine-learned models.
[0198] Model development platform 12 can provide one or more model libraries 13 containing building blocks for new models. Model libraries 13 can include one or more pre-trained foundational models 13-1, which can provide a backbone of processing power across various tasks. Model libraries 13 can include one or more pre-trained expert models 13-2, which can be focused on performance in particular domains of expertise. Model libraries 13 can include various model primitives 13-3, which can provide low-level architectures or components (optionally pre-trained), which can be assembled in various arrangements as desired.
[0199] Model development platform 12 can receive selections of various model components 14. Model development platform 12 can pass selected model components 14 to a workbench 15 that combines selected model components 14 into a development model 16.
[0200] Workbench 15 can facilitate further refinement and adaptation of development model 16 by leveraging a number of different toolkits integrated with model development platform 12. For example, workbench 15 can facilitate alignment of the development model 16 with a desired performance profile on various tasks using a model alignment toolkit 17.
[0201] Model alignment toolkit 17 can provide a number of tools for causing development model 16 to generate outputs aligned with desired behavioral characteristics.
Alignment can include increasing an accuracy, precision, recall, etc. of model outputs.
Alignment can include enforcing output styles, schema, or other preferential characteristics of model outputs. Alignment can be general or domain-specific. For instance, a pre-trained foundational model 13-1 can begin with an initial level of performance across multiple domains. Alignment of the pre-trained foundational model 13-1 can include improving a performance in a particular domain of information or tasks (e.g., even at the expense of performance in another domain of information or tasks).
[0202] Model alignment toolkit 17 can integrate one or more dataset(s) 17-1 for aligning development model 16. Curated dataset(s) 17-1 can include labeled or unlabeled training data. Dataset(s) 17-1 can be obtained from public domain datasets. Dataset(s) 17-1 can be obtained from private datasets associated with one or more developer system(s) for the alignment of bespoke machine-learned model(s) customized for private use-cases.
[0203] Pre-training pipelines 17-2 can include a machine-learned model training workflow configured to update development model 16 over large-scale, potentially noisy datasets. For example, pre-training can leverage unsupervised learning techniques (e.g., denoising, etc.) to process large numbers of training instances to update model parameters from an initialized state and achieve a desired baseline performance. Pre-training pipelines 17-2 can leverage unlabeled datasets in dataset(s) 17-1 to perform pre-training. Workbench 15 can implement a pre-training pipeline 17-2 to pre-train development model 16.
[0204] Fine-tuning pipelines 17-3 can include a machine-learned model training workflow configured to refine the model parameters of development model 16 with higher-quality data. Fine-tuning pipelines 17-3 can update development model 16 by conducting supervised training with labeled dataset(s) in dataset(s) 17-1. Fine-tuning pipelines 17-3 can update development model 16 by conducting reinforcement learning using reward signals from user feedback signals. Workbench 15 can implement a fine-tuning pipeline 17-3 to fine-tune development model 16.
[0205] Prompt libraries 17-4 can include sets of inputs configured to induce behavior aligned with desired performance criteria. Prompt libraries 17-4 can include few-shot prompts (e.g., inputs providing examples of desired model outputs for prepending to a desired runtime query), chain-of-thought prompts (e.g., inputs providing step-by-step reasoning within the exemplars to facilitate thorough reasoning by the model), and the like.
[0206] Example prompts can be retrieved from an available repository of prompt libraries 17-4. Example prompts can be contributed by one or more developer systems using workbench 15.
[0207] In some implementations, pre-trained or fine-tuned models can achieve satisfactory performance without exemplars in the inputs. For instance, zero-shot prompts can include inputs that lack exemplars. Zero-shot prompts can be within a domain within a training dataset or outside of the training domain(s).
[0208] Prompt libraries 17-4 can include one or more prompt engineering tools. Prompt engineering tools can provide workflows for retrieving or learning optimized prompt values. Prompt engineering tools can facilitate directly learning prompt values (e.g., input element values) based on one or more training iterations. Workbench 15 can implement prompt engineering tools in development model 16.
[0209] Prompt libraries 17-4 can include pipelines for prompt generation. For example, inputs can be generated using development model 16 itself or other machine-learned models. In this manner, for instance, a first model can process information about a task and output an input for a second model to process in order to perform a step of the task. The second model can be the same as or different from the first model. Workbench 15 can implement prompt generation pipelines in development model 16.
[0210] Prompt libraries 17-4 can include pipelines for context injection. For instance, a performance of development model 16 on a particular task can improve if provided with additional context for performing the task. Prompt libraries 17-4 can include software components configured to identify desired context, retrieve the context from an external source (e.g., a database, a sensor, etc.), and add the context to the input prompt. Workbench 15 can implement context injection pipelines in development model 16.
[0211] Although various training examples described herein with respect to model development platform 12 refer to “pre-training” and “fine-tuning,” it is to be understood that model alignment toolkit 17 can generally support a wide variety of training techniques adapted for training a wide variety of machine-learned models. Example training techniques can correspond to the example training method 1000 described above.
[0212] Model development platform 12 can include a model plugin toolkit 18. Model plugin toolkit 18 can include a variety of tools configured for augmenting the functionality of a machine-learned model by integrating the machine-learned model with other systems, devices, and software components. For instance, a machine-learned model can use tools to increase performance quality where appropriate. For instance, deterministic tasks can be offloaded to dedicated tools in lieu of probabilistically performing the task with an increased risk of error. For instance, instead of autoregressively predicting the solution to a system of equations, a machine-learned model can recognize a tool to call for obtaining the solution and pass the system of equations to the appropriate tool. The tool can be a traditional system of equations solver that can operate deterministically to resolve the system of equations. The output of the tool can be returned in response to the original query. In this manner, tool use can allow some example models to focus on the strengths of machine-learned models — e.g., understanding an intent in an unstructured request for a task — while augmenting the performance of the model by offloading certain tasks to a more focused tool for rote application of deterministic algorithms to a well-defined problem.
[0213] Model plugin toolkit 18 can include validation tools 18-1. Validation tools 18-1 can include tools that can parse and confirm output(s) of a machine-learned model.
Validation tools 18-1 can include engineered heuristics that establish certain thresholds applied to model outputs. For example, validation tools 18-1 can ground the outputs of machine-learned models to structured data sources (e.g., to mitigate “hallucinations”).
[0214] Model plugin toolkit 18 can include tooling packages 18-2 for implementing one or more tools that can include scripts or other executable code that can be executed alongside development model 16. Tooling packages 18-2 can include one or more inputs configured to cause machine-learned model(s) to implement the tools (e.g., few-shot prompts that induce a model to output tool calls in the proper syntax, etc.). Tooling packages 18-2 can include, for instance, fine-tuning training data for training a model to use a tool.
[0215] Model plugin toolkit 18 can include interfaces for calling external application programming interfaces (APIs) 18-3. For instance, in addition to or in lieu of implementing tool calls or tool code directly with development model 16, development model 16 can be aligned to output instructions that initiate API calls to send or obtain data via external systems.
[0216] Model plugin toolkit 18 can integrate with prompt libraries 17-4 to build a catalog of available tools for use with development model 16. For instance, a model can receive, in an input, a catalog of available tools, and the model can generate an output that selects a tool from the available tools and initiates a tool call for using the tool.
[0217] Model development platform 12 can include a computational optimization toolkit 19 for optimizing a computational performance of development model 16. For instance, tools for model compression 19-1 can allow development model 16 to be reduced in size while maintaining a desired level of performance. For instance, model compression 19-1 can include quantization workflows, weight pruning and sparsification techniques, etc. Tools for hardware acceleration 19-2 can facilitate the configuration of the model storage and execution formats to operate optimally on different hardware resources. For instance, hardware acceleration 19-2 can include tools for optimally sharding models for distributed processing over multiple processing units for increased bandwidth, lower unified memory requirements, etc. Tools for distillation 19-3 can provide for the training of lighter-weight models based on the knowledge encoded in development model 16. For instance, development model 16 can be a highly performant, large machine-learned model optimized using model development platform 12. To obtain a lightweight model for running in resource-constrained environments, a smaller model can be a “student model” that learns to imitate development model 16 as a “teacher model.” In this manner, for instance, the investment in learning the parameters and configurations of development model 16 can be efficiently transferred to a smaller model for more efficient inference.
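A minimal sketch of the distillation objective, assuming hypothetical teacher and student models that output logits:

```python
import numpy as np

# Sketch: knowledge distillation as described above. The student is
# trained toward the teacher's softened output distribution; the models
# producing these logits are assumptions, not shown.

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    def softened(logits):
        logits = logits / temperature
        e = np.exp(logits - logits.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    p_teacher = softened(teacher_logits)
    p_student = softened(student_logits)
    # Mean KL(teacher || student) over the batch; epsilon avoids log(0).
    return float(np.mean(np.sum(
        p_teacher * (np.log(p_teacher + 1e-9) - np.log(p_student + 1e-9)),
        axis=-1)))
```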
[0218] Workbench 15 can implement one, multiple, or none of the toolkits implemented in model development platform 12. Workbench 15 can output an output model 20 based on development model 16. Output model 20 can be a deployment version of development model 16. Output model 20 can be a development or training checkpoint of development model 16. Output model 20 can be a distilled, compressed, or otherwise optimized version of development model 16.
[0219] Figure 15 is a block diagram of an example training flow for training a machine-learned development model 16. One or more portion(s) of the example training flow can be implemented by a computing system that includes one or more computing devices such as, for example, computing systems described with reference to the other figures. Each respective portion of the example training flow can be performed by any (or any combination) of one or more computing devices. Moreover, one or more portion(s) of the example training flow can be implemented on the hardware components of the device(s) described herein, for example, to train one or more systems or models. Figure 15 depicts elements performed in a particular order for purposes of illustration and discussion. Those of ordinary skill in the art, using the disclosures provided herein, will understand that the elements of any of the methods discussed herein can be adapted, rearranged, expanded, omitted, combined, or modified in various ways without deviating from the scope of the present disclosure. Figure 15 is described with reference to elements/terms described with respect to other systems and figures for exemplary illustrated purposes and is not meant to be limiting. One or more portions of the example training flow can be performed additionally, or alternatively, by other systems.
[0220] Initially, development model 16 can persist in an initial state as an initialized model 21. Development model 16 can be initialized with weight values. Initial weight values can be random or based on an initialization schema. Initial weight values can be based on prior pre-training for the same or for a different model.
[0221] Initialized model 21 can undergo pre-training in a pre-training stage 22. Pre-training stage 22 can be implemented using one or more pre-training pipelines 17-2 over data from dataset(s) 17-1. Pre-training can be omitted, for example, if initialized model 21 is already pre-trained (e.g., development model 16 contains, is, or is based on a pre-trained foundational model or an expert model).
[0222] Pre-trained model 23 can then be a new version of development model 16, which can persist as development model 16 or as a new development model. Pre-trained model 23 can be the initial state if development model 16 was already pre-trained. Pre-trained model 23 can undergo fine-tuning in a fine-tuning stage 24. Fine-tuning stage 24 can be implemented using one or more fine-tuning pipelines 17-3 over data from dataset(s) 17-1. Fine-tuning can be omitted, for example, if a pre-trained model has satisfactory performance, if the model was already fine-tuned, or if other tuning approaches are preferred.
[0223] Fine-tuned model 25 can then be a new version of development model 16, which can persist as development model 16 or as a new development model. Fine-tuned model 25 can be the initial state if development model 16 was already fine-tuned. Fine-tuned model 25 can undergo refinement with user feedback 26. For instance, refinement with user feedback 26 can include reinforcement learning, optionally based on human feedback from human users of fine-tuned model 25. As reinforcement learning can be a form of fine-tuning, it is to be understood that fine-tuning stage 24 can subsume the stage for refining with user feedback 26. Refinement with user feedback 26 can produce a refined model 27. Refined model 27 can be output to downstream system(s) 28 for deployment or further development.
[0224] In some implementations, computational optimization operations can be applied before, during, or after each stage. For instance, initialized model 21 can undergo computational optimization 29-1 (e.g., using computational optimization toolkit 19) before pre-training stage 22. Pre-trained model 23 can undergo computational optimization 29-2 (e.g., using computational optimization toolkit 19) before fine-tuning stage 24. Fine-tuned model 25 can undergo computational optimization 29-3 (e.g., using computational optimization toolkit 19) before refinement with user feedback 26. Refined model 27 can undergo computational optimization 29-4 (e.g., using computational optimization toolkit 19) before output to downstream system(s) 28. Computational optimization(s) 29-1, . . . , 29-4 can all be the same, all be different, or include at least some different optimization techniques.
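A minimal sketch of the staged flow of Figure 15 as plain control flow; every stage function below is a trivial stub standing in for the corresponding pipeline:

```python
# Sketch: the staged training flow of Figure 15 as plain control flow.
# Each stage function is a stub standing in for pre-training pipeline 17-2,
# fine-tuning pipeline 17-3, feedback refinement 26, and the optional
# computational optimizations 29-1 through 29-4.

def pretrain(model, data):            # stand-in for pre-training stage 22
    return model

def finetune(model, data):            # stand-in for fine-tuning stage 24
    return model

def refine_with_feedback(model, fb):  # stand-in for refinement 26
    return model

def optimize(model):                  # stand-in for optimizations 29-1..29-4
    return model

def develop(model, datasets, feedback, already_pretrained=False):
    if not already_pretrained:                    # pre-training can be omitted
        model = pretrain(optimize(model), datasets["unlabeled"])
    model = finetune(optimize(model), datasets["labeled"])
    model = refine_with_feedback(optimize(model), feedback)
    return optimize(model)                        # version for downstream systems
```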
Example Machine-Learned Model Inference System
[0225] Figure 16 is a block diagram of an inference system for operating one or more machine-learned model(s) 1 to perform inference (e.g., for training, for deployment, etc.). A model host 31 can receive machine-learned model(s) 1. Model host 31 can host one or more model instance(s) 31-1, which can be one or multiple instances of one or multiple models. Model host 31 can host model instance(s) 31-1 using available compute resources 31-2 associated with model host 31.
[0226] Model host 31 can perform inference on behalf of one or more client(s) 32. Client(s) 32 can transmit an input request 33 to model host 31. Using input request 33, model host 31 can obtain input(s) 2 for input to machine-learned model(s) 1. Machine-learned model(s) 1 can process input(s) 2 to generate output(s) 3. Using output(s) 3, model host 31 can return an output payload 34 for responding to input request 33 from client(s) 32. Output payload 34 can include or be based on output(s) 3.
[0227] Model host 31 can leverage various other resources and tools to augment the inference task. For instance, model host 31 can communicate with tool interfaces 35 to facilitate tool use by model instance(s) 31-1. Tool interfaces 35 can include local or remote APIs. Tool interfaces 35 can include integrated scripts or other software functionality. Model host 31 can engage online learning interface(s) 36 to facilitate ongoing improvements to machine-learned model(s) 1. For instance, online learning interface(s) 36 can be used within reinforcement learning loops to retrieve user feedback on inferences served by model host 31. Model host 31 can access runtime data source(s) 37 for augmenting input(s) 2 with additional contextual information. For instance, runtime data source(s) 37 can include a knowledge graph 37-1 that facilitates structured information retrieval for information associated with input request(s) 33 (e.g., a search engine service). Runtime data source(s) 37 can include public or private, external or local database(s) 37-2 that can store information associated with input request(s) 33 for augmenting input(s) 2. Runtime data source(s) 37 can include account data 37-3 which can be retrieved in association with a user account corresponding to a client 32 for customizing the behavior of model host 31 accordingly.
[0228] Model host 31 can be implemented by one or multiple computing devices or systems. Client(s) 32 can be implemented by one or multiple computing devices or systems, which can include computing devices or systems shared with model host 31.
[0229] For example, model host 31 can operate on a server system that provides a machine-learning service to client device(s) that operate client(s) 32 (e.g., over a local or wide-area network). Client device(s) can be end-user devices used by individuals. Client device(s) can be server systems that operate client(s) 32 to provide various functionality as a service to downstream end-user devices.
[0230] In some implementations, model host 31 can operate on a same device or system as client(s) 32. Model host 31 can be a machine-learning service that runs on-device to provide machine-learning functionality to one or multiple applications operating on a client device, which can include an application implementing client(s) 32. Model host 31 can be a part of a same application as client(s) 32. For instance, model host 31 can be a subroutine or method implemented by one part of an application, and client(s) 32 can be another subroutine or method that engages model host 31 to perform inference functions within the application. It is to be understood that model host 31 and client(s) 32 can have various different configurations.
[0231] Model instance(s) 31-1 can include one or more machine-learned models that are available for performing inference. Model instance(s) 31-1 can include weights or other model components that are stored in persistent storage, temporarily cached, or loaded into high-speed memory. Model instance(s) 31-1 can include multiple instance(s) of the same model (e.g., for parallel execution of more requests on the same model). Model instance(s) 31-1 can include instance(s) of different model(s). Model instance(s) 31-1 can include cached intermediate states of active or inactive model(s) used to accelerate inference of those models. For instance, an inference session with a particular model may generate significant amounts of computational results that can be re-used for future inference runs (e.g., using a KV cache for transformer-based models). These computational results can be saved in association with that inference session so that the session can be executed more efficiently when resumed.
[0232] Compute resource(s) 31-2 can include one or more processors (central processing units, graphical processing units, tensor processing units, machine-learning accelerators, etc.) connected to one or more memory devices. Compute resource(s) 31-2 can include a dynamic pool of available resources shared with other processes. Compute resource(s) 31-2 can include memory devices large enough to fit an entire model instance in a single memory instance. Compute resource(s) 31-2 can also share model instance(s) across multiple memory devices (e.g., using data parallelization or tensor parallelization, etc.). This can be done to increase parallelization or to execute a large model using multiple memory devices which individually might not be able to fit the entire model into memory.
[0233] Input request 33 can include data for input(s) 2. Model host 31 can process input request 33 to obtain input(s) 2. Input(s) 2 can be obtained directly from input request 33 or can be retrieved using input request 33. Input request 33 can be submitted to model host 31 via an API.
[0234] Model host 31 can perform inference over batches of input requests 33 in parallel. For instance, a model instance 31-1 can be configured with an input structure that has a batch dimension. Separate input(s) 2 can be distributed across the batch dimension (e.g., rows of an array). The separate input(s) 2 can include completely different contexts. The separate input(s) 2 can be multiple inference steps of the same task. The separate input(s) 2 can be staggered in an input structure, such that any given inference cycle can be operating on different portions of the respective input(s) 2. In this manner, for instance, model host 31 can perform inference on the batch in parallel, such that output(s) 3 can also contain the batch dimension and return the inference results for the batched input(s) 2 in parallel. In this manner, for instance, batches of input request(s) 33 can be processed in parallel for higher throughput of output payload(s) 34.
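A minimal sketch of such batching, assuming a hypothetical batched model function over padded token-id arrays:

```python
import numpy as np

# Sketch: batching independent input requests along a batch dimension so
# one inference call serves many requests in parallel. `model_fn` is a
# hypothetical batched model taking and returning (batch, ...) arrays.

def batched_inference(model_fn, requests, pad_id=0):
    """Pad variable-length requests into one array and run them together."""
    max_len = max(len(r) for r in requests)
    batch = np.full((len(requests), max_len), pad_id, dtype=np.int64)
    for row, request in enumerate(requests):
        batch[row, :len(request)] = request  # one request per batch row
    outputs = model_fn(batch)                # outputs keep the batch dimension
    return list(outputs)                     # one result per original request
```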
[0235] Output payload 34 can include or be based on output(s) 3 from machine-learned model(s) 1. Model host 31 can process output(s) 3 to obtain output payload 34. This can include chaining multiple rounds of inference (e.g., iteratively, recursively, across the same model(s) or different model(s)) to arrive at a final output for a task to be returned in output payload 34. Output payload 34 can be transmitted to client(s) 32 via an API.
[0236] Online learning interface(s) 36 can facilitate reinforcement learning of machine-learned model(s) 1. Online learning interface(s) 36 can facilitate reinforcement learning with human feedback (RLHF). Online learning interface(s) 36 can facilitate federated learning of machine-learned model(s) 1.
[0237] Model host 31 can execute machine-learned model(s) 1 to perform inference for various tasks using various types of data. For example, various different input(s) 2 and output(s) 3 can be used for various different tasks. In some implementations, input(s) 2 can be or otherwise represent image data. Machine-learned model(s) 1 can process the image data to generate an output. As an example, machine-learned model(s) 1 can process the image data to generate an image recognition output (e.g., a recognition of the image data, a latent embedding of the image data, an encoded representation of the image data, a hash of the image data, etc.). As another example, machine-learned model(s) 1 can process the image data to generate an image segmentation output. As another example, machine-learned model(s) 1 can process the image data to generate an image classification output. As another example, machine-learned model(s) 1 can process the image data to generate an image data modification output (e.g., an alteration of the image data, etc.). As another example, machine-learned model(s) 1 can process the image data to generate an encoded image data output (e.g., an encoded and/or compressed representation of the image data, etc.). As another example, machine-learned model(s) 1 can process the image data to generate an upscaled image data output. As another example, machine-learned model(s) 1 can process the image data to generate a prediction output.
[0238] In some implementations, the task is a computer vision task. In some cases, input(s) 2 includes pixel data for one or more images and the task is an image processing task. For example, the image processing task can be image classification, where the output is a set of scores, each score corresponding to a different object class and representing the likelihood that the one or more images depict an object belonging to the object class. The image processing task may be object detection, where the image processing output identifies one or more regions in the one or more images and, for each region, a likelihood that region depicts an object of interest. As another example, the image processing task can be image segmentation, where the image processing output defines, for each pixel in the one or more images, a respective likelihood for each category in a predetermined set of categories. For example, the set of categories can be foreground and background. As another example, the set of categories can be object classes. As another example, the image processing task can be depth estimation, where the image processing output defines, for each pixel in the one or more images, a respective depth value. As another example, the image processing task can be motion estimation, where the network input includes multiple images, and the image processing output defines, for each pixel of one of the input images, a motion of the scene depicted at the pixel between the images in the network input.
[0239] In some implementations, input(s) 2 can be or otherwise represent natural language data. Machine-learned model(s) 1 can process the natural language data to generate an output. As an example, machine-learned model(s) 1 can process the natural language data to generate a language encoding output. As another example, machine-learned model(s) 1 can process the natural language data to generate a latent text embedding output. As another example, machine-learned model(s) 1 can process the natural language data to generate a translation output. As another example, machine-learned model(s) 1 can process the natural language data to generate a classification output. As another example, machine-learned model(s) 1 can process the natural language data to generate a textual segmentation output. As another example, machine-learned model(s) 1 can process the natural language data to generate a semantic intent output. As another example, machine-learned model(s) 1 can process the natural language data to generate an upscaled text or natural language output (e.g., text or natural language data that is higher quality than the input text or natural language, etc.). As another example, machine-learned model(s) 1 can process the natural language data to generate a prediction output (e.g., one or more predicted next portions of natural language content).
[0240] In some implementations, input(s) 2 can be or otherwise represent speech data (e.g., data describing spoken natural language, such as audio data, textual data, etc.). Machine-learned model(s) 1 can process the speech data to generate an output. As an example, machine-learned model(s) 1 can process the speech data to generate a speech recognition output. As another example, machine-learned model(s) 1 can process the speech data to generate a speech translation output. As another example, machine-learned model(s) 1 can process the speech data to generate a latent embedding output. As another example, machine-learned model(s) 1 can process the speech data to generate an encoded speech output (e.g., an encoded and/or compressed representation of the speech data, etc.). As another example, machine-learned model(s) 1 can process the speech data to generate an upscaled speech output (e.g., speech data that is higher quality than the input speech data, etc.). As another example, machine-learned model(s) 1 can process the speech data to generate a textual representation output (e.g., a textual representation of the input speech data, etc.). As another example, machine-learned model(s) 1 can process the speech data to generate a prediction output.
[0241] In some implementations, input(s) 2 can be or otherwise represent latent encoding data (e.g., a latent space representation of an input, etc.). Machine-learned model(s) 1 can process the latent encoding data to generate an output. As an example, machine-learned model(s) 1 can process the latent encoding data to generate a recognition output. As another example, machine-learned model(s) 1 can process the latent encoding data to generate a reconstruction output. As another example, machine-learned model(s) 1 can process the latent encoding data to generate a search output. As another example, machine-learned model(s) 1 can process the latent encoding data to generate a reclustering output. As another example, machine-learned model(s) 1 can process the latent encoding data to generate a prediction output.
[0242] In some implementations, input(s) 2 can be or otherwise represent statistical data. Statistical data can be, represent, or otherwise include data computed and/or calculated from some other data source. Machine-learned model(s) 1 can process the statistical data to generate an output. As an example, machine-learned model(s) 1 can process the statistical data to generate a recognition output. As another example, machine-learned model(s) 1 can process the statistical data to generate a prediction output. As another example, machine-learned model(s) 1 can process the statistical data to generate a classification output. As another example, machine-learned model(s) 1 can process the statistical data to generate a segmentation output. As another example, machine-learned model(s) 1 can process the statistical data to generate a visualization output. As another example, machine-learned model(s) 1 can process the statistical data to generate a diagnostic output.
[0243] In some implementations, input(s) 2 can be or otherwise represent sensor data. Machine-learned model(s) 1 can process the sensor data to generate an output. As an example, machine-learned model(s) 1 can process the sensor data to generate a recognition output. As another example, machine-learned model(s) 1 can process the sensor data to generate a prediction output. As another example, machine-learned model(s) 1 can process the sensor data to generate a classification output. As another example, machine-learned model(s) 1 can process the sensor data to generate a segmentation output. As another example, machine-learned model(s) 1 can process the sensor data to generate a visualization output. As another example, machine-learned model(s) 1 can process the sensor data to generate a diagnostic output. As another example, machine-learned model(s) 1 can process the sensor data to generate a detection output.
[0244] In some implementations, machine-learned model(s) 1 can be configured to perform a task that includes encoding input data for reliable and/or efficient transmission or storage (and/or corresponding decoding). For example, the task may be an audio compression task. The input may include audio data and the output may comprise compressed audio data. In another example, the input includes visual data (e.g., one or more images or videos), the output comprises compressed visual data, and the task is a visual data compression task. In another example, the task may comprise generating an embedding for input data (e.g., input audio or visual data). In some cases, the input includes audio data representing a spoken utterance and the task is a speech recognition task. The output may comprise a text output which is mapped to the spoken utterance. In some cases, the task comprises encrypting or decrypting input data. In some cases, the task comprises a microprocessor performance task, such as branch prediction or memory address translation.
[0245] In some implementations, the task is a generative task, and machine-learned model(s) 1 can be configured to output content generated in view of input(s) 2. For instance, input(s) 2 can be or otherwise represent data of one or more modalities that encodes context for generating additional content.
[0246] In some implementations, the task can be a text completion task. Machine-learned model(s) 1 can be configured to process input(s) 2 that represent textual data and to generate output(s) 3 that represent additional textual data that completes a textual sequence that includes input(s) 2. For instance, machine-learned model(s) 1 can be configured to generate output(s) 3 to complete a sentence, paragraph, or portion of text that follows from a portion of text represented by input(s) 2.
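For illustration only, the following Python sketch shows the decoding loop underlying a text completion task: tokens are appended greedily until an end-of-sequence token or a length cap. A fixed bigram table stands in for machine-learned model(s) 1; the vocabulary, table, and function names are assumptions made so the example runs on its own.

    import numpy as np

    VOCAB = ["<eos>", "the", "cat", "sat", "on", "mat"]

    def next_token_probs(ids):
        """Stand-in for a learned model: a fixed bigram table over VOCAB."""
        table = {1: 2, 2: 3, 3: 4, 4: 1, 5: 0}  # the->cat->sat->on->the; mat-><eos>
        probs = np.full(len(VOCAB), 0.01)
        probs[table.get(ids[-1], 0)] = 1.0
        return probs / probs.sum()

    def complete(prompt_ids, max_new_tokens=8, eos_id=0):
        """Greedy autoregressive completion: append the most likely next
        token until the end-of-sequence token or a length cap is reached."""
        ids = list(prompt_ids)
        for _ in range(max_new_tokens):
            token = int(np.argmax(next_token_probs(ids)))
            if token == eos_id:
                break
            ids.append(token)
        return ids

    print(" ".join(VOCAB[i] for i in complete([1])))  # "the cat sat on the ..."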
[0247] In some implementations, the task can be an instruction following task. Machine-learned model(s) 1 can be configured to process input(s) 2 that represent instructions to perform a function and to generate output(s) 3 that advance a goal of satisfying the instruction function (e.g., at least a step of a multi-step procedure to perform the function). Output(s) 3 can represent data of the same or of a different modality as input(s) 2. For instance, input(s) 2 can represent textual data (e.g., natural language instructions for a task to be performed) and machine-learned model(s) 1 can process input(s) 2 to generate output(s) 3 that represent textual data responsive to the instructions (e.g., natural language responses, programming language responses, machine language responses, etc.). Input(s) 2 can represent image data (e.g., image-based instructions for a task to be performed, optionally accompanied by textual instructions) and machine-learned model(s) 1 can process input(s) 2 to generate output(s) 3 that represent textual data responsive to the instructions (e.g., natural language responses, programming language responses, machine language responses, etc.). One or more output(s) 3 can be iteratively or recursively generated to sequentially process and accomplish steps toward accomplishing the requested functionality. For instance, an initial output can be executed by an external system or be processed by machine-learned model(s) 1 to complete an initial step of performing a function. Multiple steps can be performed, with a final output being obtained that is responsive to the initial instructions.
[0248] In some implementations, the task can be a question answering task. Machine-learned model(s) 1 can be configured to process input(s) 2 that represent a question to answer and to generate output(s) 3 that advance a goal of returning an answer to the question (e.g., at least a step of a multi-step procedure to perform the function). Output(s) 3 can represent data of the same or of a different modality as input(s) 2. For instance, input(s) 2 can represent textual data (e.g., natural language instructions for a task to be performed) and machine-learned model(s) 1 can process input(s) 2 to generate output(s) 3 that represent textual data responsive to the question (e.g., natural language responses, programming language responses, machine language responses, etc.). Input(s) 2 can represent image data (e.g., image-based instructions for a task to be performed, optionally accompanied by textual instructions) and machine-learned model(s) 1 can process input(s) 2 to generate output(s) 3 that represent textual data responsive to the question (e.g., natural language responses, programming language responses, machine language responses, etc.). One or more output(s) 3 can be iteratively or recursively generated to sequentially process and accomplish steps toward answering the question. For instance, an initial output can be executed by an external system or be processed by machine-learned model(s) 1 to complete an initial step of obtaining an answer to the question (e.g., querying a database, performing a computation, executing a script, etc.). Multiple steps can be performed, with a final output being obtained that is responsive to the question.
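For illustration only, the following Python sketch shows the iterative pattern shared by the two preceding paragraphs: each intermediate output is either executed by an external system (here, a toy calculator tool) or treated as the final answer. The scripted planner, tool registry, and field names are assumptions made for the example, not part of any claimed implementation.

    def toy_model(question, history):
        """Stand-in planner: first request a computation, then answer."""
        if not history:
            return {"action": "calculate", "input": "6 * 7"}
        return {"action": "final", "input": f"The answer is {history[-1]}."}

    TOOLS = {"calculate": lambda expr: str(eval(expr, {"__builtins__": {}}))}

    def answer(question, max_steps=4):
        """Iteratively generate outputs, executing tool steps with an
        external system until a final answer is produced."""
        history = []
        for _ in range(max_steps):
            step = toy_model(question, history)
            if step["action"] == "final":
                return step["input"]
            history.append(TOOLS[step["action"]](step["input"]))
        return "No answer within step budget."

    print(answer("What is six times seven?"))  # -> "The answer is 42."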
[0249] In some implementations, the task can be an image generation task. Machine-learned model(s) 1 can be configured to process input(s) 2 that represent context regarding a desired portion of image content. The context can include text data, image data, audio data, etc. Machine-learned model(s) 1 can be configured to generate output(s) 3 that represent image data that depicts imagery related to the context. For instance, machine-learned model(s) 1 can be configured to generate pixel data of an image. Values for channel(s) associated with the pixels in the pixel data can be selected based on the context (e.g., based on a probability determined based on the context).
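For illustration only, the following Python sketch shows channel values for pixels being selected based on a probability determined from the context. The hand-built conditional distribution is a stand-in for a learned model, and all names are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    def channel_logits(context):
        """Stand-in for a learned conditional distribution over 8-bit
        values: favors bright values for 'day', dark values for 'night'."""
        levels = np.arange(256, dtype=np.float64)
        return -np.abs(levels - (200.0 if context == "day" else 40.0)) / 25.0

    def generate_image(context, height=4, width=4):
        """Sample one 8-bit channel value per pixel from the
        context-conditioned categorical distribution."""
        img = np.zeros((height, width), dtype=np.uint8)
        logits = channel_logits(context)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        for y in range(height):
            for x in range(width):
                img[y, x] = rng.choice(256, p=probs)
        return img

    print(generate_image("day"))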
[0250] In some implementations, the task can be an audio generation task. Machine-learned model(s) 1 can be configured to process input(s) 2 that represent context regarding a desired portion of audio content. The context can include text data, image data, audio data, etc. Machine-learned model(s) 1 can be configured to generate output(s) 3 that represent audio data related to the context. For instance, machine-learned model(s) 1 can be configured to generate waveform data in the form of an image (e.g., a spectrogram). Values for channel(s) associated with pixels of the image can be selected based on the context. Machine-learned model(s) 1 can be configured to generate waveform data in the form of a sequence of discrete samples of a continuous waveform. Values of the sequence can be selected based on the context (e.g., based on a probability determined based on the context).
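For illustration only, the following Python sketch generates waveform data as a sequence of discrete samples whose values are drawn one at a time from a context-conditioned distribution. The sinusoid-biased distribution stands in for a learned model; the conditioning signal (a target frequency), bin count, and names are assumptions made for the example.

    import numpy as np

    rng = np.random.default_rng(0)
    LEVELS = np.linspace(-1.0, 1.0, 256)  # discrete amplitude bins

    def sample_probs(context_hz, t, sample_rate=16000.0):
        """Stand-in conditional: concentrate probability mass near a sine
        wave whose frequency comes from the conditioning context."""
        target = np.sin(2.0 * np.pi * context_hz * t / sample_rate)
        logits = -np.abs(LEVELS - target) * 40.0
        p = np.exp(logits - logits.max())
        return p / p.sum()

    def generate_waveform(context_hz, n_samples=160):
        """Draw each discrete sample in sequence, as described above."""
        return np.array([LEVELS[rng.choice(256, p=sample_probs(context_hz, t))]
                         for t in range(n_samples)])

    print(generate_waveform(context_hz=440.0)[:8])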
[0251] In some implementations, the task can be a data generation task. Machine-learned model(s) 1 can be configured to process input(s) 2 that represent context regarding a desired portion of data (e.g., data from various data domains, such as sensor data, image data, multimodal data, statistical data, etc.). The desired data can be, for instance, synthetic data for training other machine-learned models. The context can include arbitrary data type(s). Machine-learned model(s) 1 can be configured to generate output(s) 3 that represent data that aligns with the desired data. For instance, machine-learned model(s) 1 can be configured to generate data values for populating a dataset. Values for the data object(s) can be selected based on the context (e.g., based on a probability determined based on the context).
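For illustration only, the following Python sketch populates a synthetic dataset with values sampled from context-dependent distributions, e.g., to produce training data for other machine-learned models. The schema, the climate labels, and the distributions are invented for the example.

    import numpy as np

    rng = np.random.default_rng(0)

    def synth_rows(context, n_rows=5):
        """Populate dataset rows whose value distributions depend on the
        conditioning context (here, a climate label)."""
        mean_temp = {"temperate": 15.0, "tropical": 27.0}[context]
        return [{"temp_c": round(float(rng.normal(mean_temp, 5.0)), 1),
                 "humidity": round(float(rng.uniform(0.2, 0.9)), 2),
                 "label": context}
                for _ in range(n_rows)]

    for row in synth_rows("tropical"):
        print(row)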
Example Computing Systems and Devices
[0252] Figure 17 is a block diagram of an example networked computing system that can perform aspects of example implementations of the present disclosure. The system can include a number of computing devices and systems that are communicatively coupled over a network 49. An example computing device 50 is described to provide an example of a computing device that can perform any aspect of the present disclosure (e.g., implementing model host 31, client(s) 32, or both). An example server computing system 60 is described as an example of a server computing system that can perform any aspect of the present disclosure (e.g., implementing model host 31, client(s) 32, or both). Computing device 50 and server computing system(s) 60 can cooperatively interact (e.g., over network 49) to perform any aspect of the present disclosure (e.g., implementing model host 31, client(s) 32, or both). Model development platform system 70 is an example system that can host or serve model development platform(s) 12 for development of machine-learned models. Third-party system(s) 80 are example system(s) with which any of computing device 50, server computing system(s) 60, or model development platform system(s) 70 can interact in the performance of various aspects of the present disclosure (e.g., engaging third-party tools, accessing third-party databases or other resources, etc.).
[0253] Network 49 can be any type of communications network, such as a local area network (e.g., intranet), wide area network (e.g., Internet), or some combination thereof and can include any number of wired or wireless links. In general, communication over network 49 can be carried via any type of wired or wireless connection, using a wide variety of communication protocols (e.g., TCP/IP, HTTP, SMTP, FTP), encodings or formats (e.g., HTML, XML), or protection schemes (e.g., VPN, secure HTTP, SSL). Network 49 can also be implemented via a system bus. For instance, one or more devices or systems of Figure 17 can be co-located with, contained by, or otherwise integrated into one or more other devices or systems.
[0254] Computing device 50 can be any type of computing device, such as, for example, a personal computing device (e.g., laptop or desktop), a mobile computing device (e.g., smartphone or tablet), a gaming console or controller, a wearable computing device, an embedded computing device, a server computing device, a virtual machine operating on a host device, or any other type of computing device. Computing device 50 can be a client computing device. Computing device 50 can be an end-user computing device. Computing device 50 can be a computing device of a service provider that provides a service to an end user (who may use another computing device to interact with computing device 50).
[0255] Computing device 50 can include one or more processors 51 and a memory 52. Processor(s) 51 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, an FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected. Memory 52 can include one or more non-transitory computer-readable storage media, such as HBM, RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof. Memory 52 can store data 53 and instructions 54 which can be executed by processor(s) 51 to cause computing device 50 to perform operations. The operations can implement any one or multiple features described herein. The operations can implement example methods and techniques described herein.
[0256] Computing device 50 can also include one or more input components that receive user input. For example, a user input component can be a touch-sensitive component (e.g., a touch-sensitive display screen or a touch pad) that is sensitive to the touch of a user input object (e.g., a finger or a stylus). The touch-sensitive component can serve to implement a virtual keyboard. Other example user input components include a microphone, camera, LIDAR, a physical keyboard or other buttons, or other means by which a user can provide user input.
[0257] Computing device 50 can store or include one or more machine-learned models 55. Machine-learned models 55 can include one or more machine-learned model(s) 1, such as a sequence processing model 4. Machine-learned models 55 can include one or multiple model instance(s) 31-1. Machine-learned model(s) 55 can be received from server computing system(s) 60, model development platform system 70, third-party system(s) 80 (e.g., an application distribution platform), or developed locally on computing device 50. Machine-learned model(s) 55 can be loaded into memory 52 and used or otherwise implemented by processor(s) 51. Computing device 50 can implement multiple parallel instances of machine-learned model(s) 55.
[0258] Server computing system(s) 60 can include one or more processors 61 and a memory 62. Processor(s) 61 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, an FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected. Memory 62 can include one or more non-transitory computer-readable storage media, such as HBM, RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof. Memory 62 can store data 63 and instructions 64 which can be executed by processor(s) 61 to cause server computing system(s) 60 to perform operations. The operations can implement any one or multiple features described herein. The operations can implement example methods and techniques described herein.
[0259] In some implementations, server computing system 60 includes or is otherwise implemented by one or multiple server computing devices. In instances in which server computing system 60 includes multiple server computing devices, such server computing devices can operate according to sequential computing architectures, parallel computing architectures, or some combination thereof.
[0260] Server computing system 60 can store or otherwise include one or more machine-learned models 65. Machine-learned model(s) 65 can be the same as or different from machine-learned model(s) 55. Machine-learned models 65 can include one or more machine-learned model(s) 1, such as a sequence processing model 4. Machine-learned models 65 can include one or multiple model instance(s) 31-1. Machine-learned model(s) 65 can be received from computing device 50, model development platform system 70, third-party system(s) 80, or developed locally on server computing system(s) 60. Machine-learned model(s) 65 can be loaded into memory 62 and used or otherwise implemented by processor(s) 61. Server computing system(s) 60 can implement multiple parallel instances of machine-learned model(s) 65.
[0261] In an example configuration, machine-learned models 65 can be included in or otherwise stored and implemented by server computing system 60 to establish a client-server relationship with computing device 50 for serving model inferences. For instance, server computing system(s) 60 can implement model host 31 on behalf of client(s) 32 on computing device 50. For instance, machine-learned models 65 can be implemented by server computing system 60 as a portion of a web service (e.g., remote machine-learned model hosting service, such as an online interface for performing machine-learned model operations over a network on server computing system(s) 60). For instance, server computing system(s) 60 can communicate with computing device 50 over a local intranet or internet connection. For instance, computing device 50 can be a workstation or endpoint in communication with server computing system(s) 60, with implementation of machine-learned models 65 being managed by server computing system(s) 60 to remotely perform inference (e.g., for runtime or training operations), with output(s) returned (e.g., cast, streamed, etc.) to computing device 50. Machine-learned models 65 can work cooperatively or interoperatively with machine-learned models 55 on computing device 50 to perform various tasks.
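For illustration only, the following Python sketch shows the client-server pattern described above: a host process exposes a stand-in model behind an HTTP endpoint and returns inferences as JSON to a client such as computing device 50. The route, port, field names, and the trivial model are assumptions made for the example.

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    def model_infer(prompt):
        """Stand-in for machine-learned model inference on the server."""
        return {"response": prompt[::-1], "score": len(prompt) / 100.0}

    class ModelHost(BaseHTTPRequestHandler):
        def do_POST(self):
            # Read the client's JSON request body and run inference.
            body = self.rfile.read(int(self.headers["Content-Length"]))
            prompt = json.loads(body)["prompt"]
            payload = json.dumps(model_infer(prompt)).encode("utf-8")
            # Return the model output to the client as JSON.
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(payload)

    if __name__ == "__main__":
        # A client may POST {"prompt": "..."} to http://localhost:8080/
        HTTPServer(("localhost", 8080), ModelHost).serve_forever()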
[0262] Model development platform system(s) 70 can include one or more processors 71 and a memory 72. Processor(s) 71 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, an FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected. Memory 72 can include one or more non-transitory computer-readable storage media, such as HBM, RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof. Memory 72 can store data 73 and instructions 74 which can be executed by processor(s) 71 to cause model development platform system(s) 70 to perform operations. The operations can implement any one or multiple features described herein. The operations can implement example methods and techniques described herein. Example operations include the functionality described herein with respect to model development platform 12. This and other functionality can be implemented by developer tool(s) 75.
[0263] Third-party system(s) 80 can include one or more processors 81 and a memory 82. Processor(s) 81 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, an FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected. Memory 82 can include one or more non-transitory computer-readable storage media, such as HBM, RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof. Memory 82 can store data 83 and instructions 84 which can be executed by processor(s) 81 to cause third-party system(s) 80 to perform operations. The operations can implement any one or multiple features described herein. The operations can implement example methods and techniques described herein. Example operations include the functionality described herein with respect to tools and other external resources called when training or performing inference with machine-learned model(s) 1, 4, 16, 20, 55, 65, etc. (e.g., third-party resource(s) 85).
[0264] Figure 17 illustrates one example arrangement of computing systems that can be used to implement the present disclosure. Other computing system configurations can be used as well. For example, in some implementations, one or both of computing device 50 or server computing system(s) 60 can implement all or a portion of the operations of model development platform system 70. For example, computing device 50 or server computing system(s) 60 can implement developer tool(s) 75 (or extensions thereof) to develop, update/train, or refine machine-learned models 1, 4, 16, 20, 55, 65, etc. using one or more techniques described herein with respect to model alignment toolkit 17. In this manner, for instance, computing device 50 or server computing system(s) 60 can develop, update/train, or refine machine-learned models based on local datasets (e.g., for model personalization/customization, as permitted by user data preference selections).
[0265] Figure 18 is a block diagram of an example computing device 98 that performs according to example embodiments of the present disclosure. Computing device 98 can be a user computing device or a server computing device (e.g., computing device 50, server computing system(s) 60, etc.). Computing device 98 can implement model host 31. For instance, computing device 98 can include a number of applications (e.g., applications 1 through N). Each application can contain its own machine learning library and machine-learned model(s). For example, each application can include a machine-learned model. Example applications include a text messaging application, an email application, a dictation application, a virtual keyboard application, a browser application, etc. As illustrated in Figure 18, each application can communicate with a number of other components of the computing device, such as, for example, one or more sensors, a context manager, a device state component, or additional components. In some implementations, each application can communicate with each device component using an API (e.g., a public API). In some implementations, the API used by each application is specific to that application.
[0266] Figure 19 is a block diagram of an example computing device 99 that performs according to example embodiments of the present disclosure. Computing device 99 can be the same as or different from computing device 98. Computing device 99 can be a user computing device or a server computing device (e.g., computing device 50, server computing system(s) 60, etc.). Computing device 99 can implement model host 31. For instance, computing device 99 can include a number of applications (e.g., applications 1 through N). Each application can be in communication with a central intelligence layer. Example applications include a text messaging application, an email application, a dictation application, a virtual keyboard application, a browser application, etc. In some implementations, each application can communicate with the central intelligence layer (and model(s) stored therein) using an API (e.g., a common API across all applications).
[0267] The central intelligence layer can include a number of machine-learned models. For example, as illustrated in Figure 19, a respective machine-learned model can be provided for each application and managed by the central intelligence layer. In other implementations, two or more applications can share a single machine-learned model. For example, in some implementations, the central intelligence layer can provide a single model for all of the applications. In some implementations, the central intelligence layer is included within or otherwise implemented by an operating system of computing device 99.
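For illustration only, the following Python sketch shows one way the central intelligence layer's model management could be organized: a registry that returns a per-application model when one has been provided and otherwise falls back to a single shared model. All class, method, and application names are illustrative.

    from typing import Callable, Dict

    class CentralIntelligenceLayer:
        """Maps applications to machine-learned models, with a shared
        fallback model for applications without a dedicated one."""

        def __init__(self, shared_model: Callable[[str], str]):
            self._shared = shared_model
            self._per_app: Dict[str, Callable[[str], str]] = {}

        def register(self, app_name: str, model: Callable[[str], str]) -> None:
            self._per_app[app_name] = model

        def infer(self, app_name: str, inputs: str) -> str:
            # Dedicated model if registered, otherwise the shared model.
            model = self._per_app.get(app_name, self._shared)
            return model(inputs)

    layer = CentralIntelligenceLayer(shared_model=lambda s: s.lower())
    layer.register("keyboard", lambda s: s + "?")
    print(layer.infer("keyboard", "next word"))  # dedicated model
    print(layer.infer("browser", "SUMMARIZE"))   # shared fallback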
[0268] The central intelligence layer can communicate with a central device data layer. The central device data layer can be a centralized repository of data for computing device 99. As illustrated in Figure 19, the central device data layer can communicate with a number of other components of the computing device, such as, for example, one or more sensors, a context manager, a device state component, or additional components. In some implementations, the central device data layer can communicate with each device component using an API (e.g., a private API).
Additional Disclosure
[0269] The technology discussed herein makes reference to servers, databases, software applications, and other computer-based systems, as well as actions taken and information sent to and from such systems. The inherent flexibility of computer-based systems allows for a great variety of possible configurations, combinations, and divisions of tasks and functionality between and among components. For instance, processes discussed herein can be implemented using a single device or component or multiple devices or components working in combination. Databases and applications can be implemented on a single system or distributed across multiple systems. Distributed components can operate sequentially or in parallel.
[0270] While the present subject matter has been described in detail with respect to various specific example embodiments thereof, each example is provided by way of explanation, not limitation of the disclosure. Those skilled in the art, upon attaining an understanding of the foregoing, can readily produce alterations to, variations of, and equivalents to such embodiments. Accordingly, the subject disclosure does not preclude inclusion of such modifications, variations or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art. For instance, features illustrated or described as part of one embodiment can be used with another embodiment to yield a still further embodiment. Thus, it is intended that the present disclosure cover such alterations, variations, and equivalents.
[0271] Aspects of the disclosure have been described in terms of illustrative embodiments thereof. Any and all features in the following claims can be combined or rearranged in any way possible, including combinations of claims not explicitly enumerated in combination together, as the example claim dependencies listed herein should not be read as limiting the scope of possible combinations of features disclosed herein. Accordingly, the scope of the present disclosure is by way of example rather than by way of limitation, and the subject disclosure does not preclude inclusion of such modifications, variations or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art. Moreover, terms are described herein using lists of example elements joined by conjunctions such as “and,” “or,” “but,” etc. It should be understood that such conjunctions are provided for explanatory purposes only. Clauses and other sequences of items joined by a particular conjunction such as “or,” for example, can refer to “and/or,” “at least one of,” “any combination of” example elements listed therein, etc. Terms such as “based on” should be understood as “based at least in part on.”
[0272] The term “can” should be understood as referring to a possibility of a feature in various implementations and not as prescribing an ability that is necessarily present in every implementation. For example, the phrase “X can perform Y” should be understood as indicating that, in various implementations, X has the potential to be configured to perform Y, and not as indicating that in every instance X must always be able to perform Y. It should be understood that, in various implementations, X might be unable to perform Y and remain within the scope of the present disclosure.
[0273] The term “may” should be understood as referring to a possibility of a feature in various implementations and not as prescribing an ability that is necessarily present in every implementation. For example, the phrase “X may perform Y” should be understood as indicating that, in various implementations, X has the potential to be configured to perform Y, and not as indicating that in every instance X must always be able to perform Y. It should be understood that, in various implementations, X might be unable to perform Y and remain within the scope of the present disclosure.

Claims

WHAT IS CLAIMED IS:
1. A computer-implemented method, the method comprising:
obtaining, by a computing system comprising one or more processors, a content dataset, wherein the content dataset comprises a plurality of content items associated with a plurality of respective resources and a plurality of quality scores associated with the plurality of respective resources, wherein the plurality of quality scores are descriptive of a quality of a respective resource as a search result;
processing, by the computing system, a prompt with a generative model to generate a plurality of candidate model-generated responses, wherein the prompt comprises a request for information;
determining, by the computing system and based on the content dataset, a subset of the content items of the plurality of content items that are associated with at least a subset of the plurality of candidate model-generated responses;
determining, by the computing system and based on a set of respective quality scores associated with the subset of the content items, a particular model-generated response of the plurality of candidate model-generated responses to provide as an output; and
providing, by the computing system, the particular model-generated response as an output response to the prompt.
2. The method of any preceding claim, wherein determining the particular model-generated response of the plurality of candidate model-generated responses to provide as an output comprises:
determining, by the computing system and based on the content dataset, the particular model-generated response of the plurality of candidate model-generated responses is associated with a respective content item of the subset of content items with a respective quality score greater than the other quality scores of the set of respective quality scores associated with other content items of the subset of content items associated with a set of other candidate model-generated responses.
3. The method of any preceding claim, wherein the content dataset comprises a plurality of pre-existing content items published on the internet, wherein the generative model comprises a pretrained autoregressive language model, and wherein the particular model-generated response comprises a natural language response to the prompt.
4. The method of any preceding claim, further comprising:
selecting, by the computing system, a set of the content items for inclusion in a training dataset based on the plurality of quality scores;
training, by the computing system, a machine-learned reward model using the training dataset; and
storing, by the computing system, the machine-learned reward model and the generative model.
5. The method of claim 4, further comprising:
obtaining, by the computing system, a plurality of interaction datasets associated with a plurality of additional resources, wherein the plurality of additional resources comprise a plurality of model-generated content items that were previously generated with the generative model, wherein the plurality of interaction datasets are descriptive of respective interactions with the plurality of additional resources by a plurality of users; and
training, by the computing system, the machine-learned reward model based on the plurality of interaction datasets and the plurality of model-generated content items.
6. The method according to claim 4 or 5, further comprising:
processing, by the computing system, a second prompt with the generative model to generate a plurality of model-generated fragments, wherein the plurality of model-generated fragments comprise a plurality of different candidate responses to the second prompt;
processing, by the computing system, the plurality of model-generated fragments with the machine-learned reward model to generate a plurality of respective scores, wherein the plurality of respective scores are associated with evaluating a quality of the plurality of model-generated fragments; and
providing, by the computing system, a particular model-generated fragment of the plurality of model-generated fragments for display based on the plurality of respective scores.
7. The method of any preceding claim, wherein the plurality of quality scores were determined by:
processing the plurality of content items and a plurality of respective metadata sets associated with the plurality of respective resources with a ranking engine to generate a plurality of ranking scores.
8. The method of claim 7, wherein the ranking engine is associated with a search engine, wherein the ranking engine is configured to rank resources to determine particular resources to provide as search results.
9. The method of any preceding claim, wherein the plurality of quality scores were determined based on incoming links and outgoing links for the plurality of respective resources.
10. The method of any preceding claim, wherein the prompt comprises multimodal data, wherein the multimodal data comprises an image and text descriptive of a question associated with the image.
11. The method of claim 10, wherein the particular model-generated response comprises a predicted answer based on one or more image features of the image.
12. The method of claim 10, wherein the particular model-generated response comprises an augmented image.
13. The method of claim 12, wherein the augmented image comprises one or more annotations responsive to the question.
14. The method of any preceding claim, wherein the plurality of quality scores were determined based on a quantity and quality of incoming links for the plurality of respective resources.
15. A computing system for parameter adjustment, the system comprising:
one or more processors; and
one or more non-transitory computer-readable media that collectively store instructions that, when executed by the one or more processors, cause the computing system to perform operations, the operations comprising:
obtaining a training dataset, wherein the training dataset comprises a plurality of content items associated with a plurality of respective web resources and a plurality of quality scores associated with the plurality of respective web resources, wherein the plurality of quality scores are determined based on incoming links and outgoing links for the plurality of respective web resources;
processing a prompt with a generative model to generate a plurality of probabilities associated with a plurality of candidate model outputs;
determining a first ground truth example from the training dataset is associated with a first candidate model output of the plurality of candidate model outputs and a second ground truth example from the training dataset is associated with a second candidate model output of the plurality of candidate model outputs;
determining the first ground truth example is associated with a first web resource with a higher quality score than a second web resource associated with the second ground truth example;
evaluating a loss function that evaluates a difference between a first probability associated with the first candidate model output and a second probability associated with the second candidate model output; and
adjusting one or more parameters of the generative model based on the loss function.
16. The system of claim 15, wherein the operations further comprise:
obtaining input data, wherein the input data is descriptive of a user prompt;
processing the input data with the generative model to generate a model-generated response, wherein the model-generated response is responsive to the user prompt; and
providing the model-generated response as an output.
17. The system of claim 16, wherein the user prompt comprises a natural language question, and wherein the model-generated response comprises a plurality of predicted words responsive to the question.
18. The system of claim 17, wherein the model-generated response comprises a sequence of words that differs from the plurality of content items.
19. The system according to claim 16 or 17, wherein the user prompt comprises multimodal data, wherein the user prompt comprises an image and a question associated with the image.
20. The system as in any of claims 15 - 19, wherein the plurality of quality scores are determined based on an amount of references to a respective web resource within other resources, and wherein the plurality of quality scores are determined based on how the respective web resource is referenced.
21. One or more non-transitory computer-readable media that collectively store instructions that, when executed by one or more computing devices, cause the one or more computing devices to perform operations, the operations comprising:
obtaining a training dataset, wherein the training dataset comprises a plurality of content items associated with a plurality of respective web resources and a plurality of quality scores associated with the plurality of respective web resources, wherein the plurality of quality scores are determined based on incoming links and outgoing links for the plurality of respective web resources;
training a machine-learned reward model on the training dataset, wherein the machine-learned reward model is trained to rank a set of data based on a determined quality score;
obtaining a prompt;
processing the prompt with a generative model to generate a plurality of model-generated fragments, wherein the plurality of model-generated fragments comprise a plurality of different candidate responses to the prompt;
processing the plurality of model-generated fragments with the machine-learned reward model to generate a plurality of respective scores, wherein the plurality of respective scores are associated with evaluating a quality of the plurality of model-generated fragments; and
providing a particular model-generated fragment of the plurality of model-generated fragments as an output based on the plurality of respective scores.
22. The one or more non-transitory computer-readable media of claim 21, the operations further comprising:
obtaining a plurality of interaction datasets associated with a plurality of additional resources, wherein the plurality of additional resources comprise a plurality of model-generated content items that were previously generated with the generative model, wherein the plurality of interaction datasets are descriptive of respective interactions with the plurality of additional resources by a plurality of users; and
adjusting one or more parameters of the machine-learned reward model based on the plurality of interaction datasets and the plurality of model-generated content items.
23. The one or more non-transitory computer-readable media according to claim 21 or 22, wherein providing the particular model-generated fragment of the plurality of model-generated fragments as the output based on the plurality of respective scores comprises:
determining the particular model-generated fragment is associated with a particular respective score that is higher than a plurality of other respective scores.
24. The one or more non-transitory computer-readable media as in any of claims 21 - 23, wherein the operations further comprise:
processing the prompt with an additional generative model to generate a plurality of additional model-generated fragments;
processing the plurality of additional model-generated fragments with the machine-learned reward model to generate a plurality of additional scores;
determining the additional generative model is associated with a task of the prompt based on the plurality of respective scores and the plurality of additional scores; and
generating a long-form model-generated response with the additional generative model.
25. A computer-implemented method for tuning a machine-learned model, the method comprising:
obtaining, by a computing system comprising one or more processors, a training dataset, wherein the training dataset comprises a plurality of content items associated with a plurality of respective resources;
determining, by the computing system, a plurality of quality scores associated with the plurality of respective resources, wherein the plurality of quality scores are determined based on incoming links and outgoing links for the plurality of respective resources;
obtaining, by the computing system, a prompt;
processing, by the computing system, the prompt with a generative model to generate a model-generated output responsive to the prompt;
determining, by the computing system, a ground truth example from the training dataset based on the plurality of quality scores;
evaluating, by the computing system, a loss function that evaluates a difference between the model-generated output and the ground truth example; and
adjusting, by the computing system, one or more parameters of the generative model based on the loss function.
26. The method of claim 25, wherein determining, by the computing system, the ground truth example from the training dataset based on the plurality of quality scores comprises:
determining, by the computing system, a first ground truth example from the training dataset is associated with the model-generated output;
determining, by the computing system, a second ground truth example from the training dataset is associated with the model-generated output;
determining, by the computing system, a first score of the plurality of quality scores is associated with the first ground truth example;
determining, by the computing system, a second score of the plurality of quality scores is associated with the second ground truth example; and
determining, by the computing system, the ground truth example from the first ground truth example and the second ground truth example based on the first score and the second score.
27. The method according to claim 25 or 26, further comprising:
obtaining, by the computing system, a plurality of interaction datasets associated with a plurality of additional resources, wherein the plurality of additional resources comprise a plurality of model-generated content items that were previously generated with the generative model, wherein the plurality of interaction datasets are descriptive of respective interactions with the plurality of additional resources by a plurality of users;
training, by the computing system, a machine-learned reward model based on the plurality of interaction datasets and the plurality of model-generated content items; and
storing, by the computing system, the machine-learned reward model and the generative model.
28. The method of claim 27, wherein the reward model is further trained on the training dataset and the plurality of quality scores.
29. The method according to claim 27 or 28, further comprising:
processing, by the computing system, a second prompt with the generative model to generate a plurality of model-generated fragments, wherein the plurality of model-generated fragments comprise a plurality of different candidate responses to the second prompt;
processing, by the computing system, the plurality of model-generated fragments with the machine-learned reward model to generate a plurality of respective scores, wherein the plurality of respective scores are associated with evaluating a quality of the plurality of model-generated fragments; and
providing, by the computing system, a particular model-generated fragment of the plurality of model-generated fragments as an output based on the plurality of respective scores.
30. The method according to claim 27 or 28, further comprising:
obtaining, by the computing system, an additional prompt;
processing, by the computing system, the additional prompt with the generative model to generate an additional model-generated output;
processing, by the computing system, the model-generated output with the machine-learned reward model to generate an output score; and
adjusting, by the computing system, one or more parameters of the generative model based on the output score.
31. The method as in any of claims 25 - 30, wherein the generative model comprises a pretrained autoregressive language model.
32. The method of claim 31, wherein the generative model was pretrained on a plurality of textual content items associated with the plurality of respective resources.
33. The method as in any of claims 25 - 32, wherein the generative model comprises an image generation model, wherein the image generation model was trained for generation based on a plurality of images from the plurality of content items.
34. The method of claim 33, wherein the image generation model comprises a diffusion model.
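For illustration only, and not by way of limitation, the following Python sketch shows one possible form of the pairwise tuning objective recited in claims 15 and 25: the candidate output whose ground truth example comes from the higher-quality resource is encouraged to receive higher probability. The logistic pairwise form and the toy probabilities are assumptions made for the example; the claims do not mandate any particular loss shape.

    import numpy as np

    def pairwise_quality_loss(p_higher_quality, p_lower_quality):
        """Logistic pairwise loss, -log(sigmoid(log p_hi - log p_lo)):
        small when the candidate tied to the higher-quality resource is
        already more probable, large otherwise."""
        delta = np.log(p_higher_quality) - np.log(p_lower_quality)
        return float(np.log1p(np.exp(-delta)))

    # Toy probabilities for two candidate outputs whose ground truth
    # examples come from resources with quality scores 0.9 and 0.3.
    p_high, p_low = 0.20, 0.35
    print("loss:", pairwise_quality_loss(p_high, p_low))
    # A gradient step driven by this loss would push p_high upward
    # relative to p_low, aligning generation with resource quality.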