SYSTEM AND METHOD FOR PERFORMING TRANSLATION OF COLLOQUIAL CONTENT

TECHNICAL FIELD

[0001] Embodiments of the present disclosure generally relate to the field of machine translation. More particularly, embodiments of the present disclosure relate to translation of colloquial language and related content using an ensemble of advanced language models and contextual refinement techniques.

BACKGROUND

[0002] The following description of the related art is intended to provide background information pertaining to the field of the disclosure. This section may include certain aspects of the art that may be related to various features of the present disclosure. However, it should be appreciated that this section is used only to enhance the understanding of the reader with respect to the present disclosure, and not as admissions of prior art.

[0003] Machine Translation has witnessed unparalleled advancements in the field of language translation services. The technology aims to facilitate communication and comprehension between people who speak various languages, without the use of human translators. Essentially, it is like having a virtual translator that helps a user understand and communicate with people who speak different languages by converting what they say into a language the user understands, in a fast and efficient way.

[0004] In the contemporary landscape, Machine Translation has unfolded a diverse array of applications. Primarily, Machine Translation is effective in providing enhanced automated translation services, particularly for colloquial and low-resource languages, where most conventional approaches fail to meet the expectations of a desired outcome. Some of the most commonly observed applications of Machine Translation include translation of specific user language(s) that may not have significant digital resources available, such as particular slang words, idioms, banter, etc.
A few other examples include improving customer service in the global business domain, content localization, mining data from content available in multilingual formats, etc.

[0005] Recently, there have been significant efforts and developments in utilizing various automated translation and Natural Language Processing (NLP) related techniques and models,
such as Neural Machine Translation (NMT), Transformer models, GPT models, BERT models, etc., for the purpose of performing automated translation of colloquial content. While these techniques are superior to traditional phrase-based machine translation systems, most existing solutions still face several challenges, such as lack of support for low-resource languages, lack of accurate contextual understanding, reliance on computationally heavy and resource-intensive mechanisms, and lack of scalability. These challenges are conventionally addressed by manual translation. Additionally, various efforts have been made to overcome the abovementioned challenges. These include the state-of-the-art machine translation system by Google, Neural Machine Translation (NMT) (Wu et al., 2016), which employs neural networks to translate languages, thereby producing fluent and accurate results. Similarly, the "Attention is All You Need" paper by Vaswani et al. (2017) introduced the Transformer model, which forms the foundation of many state-of-the-art NLP (Natural Language Processing) models.

[0006] While these methods demonstrate superior translations, they still fail to overcome the problems of contextual ambiguity, local language idioms, and complex expressions. Similarly, techniques such as OpenAI's GPT models and Google's BERT (Bidirectional Encoder Representations from Transformers) are pre-trained language models, which, however, are also dependent on training data and simultaneously suffer from resource intensiveness.
[0007] Some other approaches dedicated to improving translation for low-resource languages, such as multilingual learning, exploiting monolingual data, ensemble methods (combining multiple models), and transfer learning, wherein a model trained on a high-resource language is fine-tuned on a low-resource language (Zoph et al., 2016), are likewise not immune to the challenges of contextual ambiguity and the constant updating of resources.

[0008] For the abovementioned reasons, these translation systems are generally not able to translate colloquial content with contextual accuracy while maintaining translation consistency across multiple languages involving complex expressions.

[0009] Therefore, there exists an imperative need in the art for a system for colloquial language translation that can overcome the shortcomings of the conventional state-of-the-art Machine Translation approaches and that can effectively deal with the known formidable challenges, which the present disclosure aims to address.
SUMMARY

[0010] Some of the objects of the present disclosure, which at least one embodiment disclosed herein satisfies, are listed below.

[0011] It is an object of the present disclosure to provide a translation system that can efficiently and accurately translate colloquial language text into a user-desired language without the need for any manual intervention.

[0012] It is another object of the present disclosure to provide an ensemble machine translation method based on multiple models, such as state-of-the-art (SOTA) NMT and GPT models, which are well suited to translating low-resource languages.

[0013] It is another object of the present disclosure to provide an ensemble machine translation method based on SOTA NMT and GPT models, which allows for scalability across a plurality of languages.

[0014] It is another object of the present disclosure to provide an ensemble machine translation method based on SOTA NMT and GPT models, which can efficiently utilize computational resources.

[0015] It is yet another object of the present disclosure to provide an ensemble machine translation method based on SOTA NMT and GPT models with mutable language resource(s).

[0016] In an aspect, a method for processing and translating colloquial content is disclosed. The method includes receiving, by a data reception engine, an input text in at least one source language. The input text includes at least one of colloquial expressions and content elements. In addition, the method includes pre-processing, by a pre-processing engine with implementation of a large language model (LLM), the input text for simplifying the content elements present in the input text. The pre-processing step converts the input text into a simplified text. Further, the method includes translating, by an ensemble engine comprising two or more translation models, the simplified text into a target language text. Each translation model is optimized for translating a specific language group in the simplified text.
Furthermore, the method includes post-processing, by a post-processing engine, the target language text for refining the target language text, based on one or more contextual parameters. [0017] In an embodiment, the method includes generating, by an output engine, an output text from the target language text. Next, the method includes displaying, by a display engine, the output text on a display of a computing unit.
[0018] In an embodiment, for simplifying the content elements present in the input text, the method comprises removing, by the pre-processing engine, redundant and obsolete content elements from the input text while maintaining the semantic integrity of the input text.

[0019] In an embodiment, the one or more contextual parameters include at least one of input text demographics and situational context.

[0020] In an embodiment, the LLM is fine-tuned on sentence simplification datasets to reduce the complexity of the input text and enhance translation quality.

[0021] In another aspect, a system to process and translate colloquial content is disclosed. The system includes a data reception engine configured to receive an input text in at least one source language. The input text includes at least one of colloquial expressions and content elements. In addition, the system includes a pre-processing engine configured to pre-process, with implementation of a large language model (LLM), the input text to simplify the content elements present in the input text. The pre-processing engine is configured to convert the input text into a simplified text. Further, the system includes an ensemble engine configured to translate the simplified text into a target language text. The ensemble engine includes two or more translation models. Each translation model is optimized to translate a specific language group in the simplified text. Furthermore, the system includes a post-processing engine configured to post-process the target language text to refine the target language text based on one or more contextual parameters.

[0022] In an embodiment, the system includes an output engine configured to generate an output text from the target language text. Further, the system includes a display engine configured to display the output text on a display of a computing unit.
[0023] In an embodiment, to simplify the content elements present in the input text, the pre-processing engine is configured to remove redundant and obsolete content elements from the input text while maintaining the semantic integrity of the input text.

[0024] In an embodiment, the one or more contextual parameters include at least one of input text demographics and situational context.

[0025] In an embodiment, the LLM is fine-tuned on sentence simplification datasets to reduce the complexity of the input text and enhance translation quality.

[0026] In an embodiment, the post-processing engine is configured to implement a referential prompt system to dynamically adjust the target language text.
[0027] In an embodiment, the post-processing engine is configured to implement a rule-based model along with the referential prompt system to generate the target language text.

BRIEF DESCRIPTION OF THE DRAWINGS

[0028] The accompanying drawings, which are incorporated herein and constitute a part of this disclosure, illustrate exemplary embodiments of the disclosed methods and systems, in which like reference numerals refer to the same parts throughout the different drawings. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Also, the embodiments shown in the figures are not to be construed as limiting the disclosure; rather, possible variants of the method and system according to the disclosure are illustrated herein to highlight the advantages of the disclosure. It will be appreciated by those skilled in the art that disclosure of such drawings includes disclosure of electrical components or circuitry commonly used to implement such components.

[0029] Figure 1a illustrates an exemplary block diagram of a system for translation of content involving colloquial languages, in accordance with an exemplary embodiment of the present disclosure.

[0030] Figure 1b illustrates an exemplary block diagram of a translation engine, in accordance with an exemplary embodiment of the present disclosure.

[0031] Figure 2 illustrates an exemplary method flow diagram indicating the process for translating content involving colloquial languages, in accordance with exemplary embodiments of the present disclosure.

[0032] The foregoing shall be more apparent from the following more detailed description of the disclosure.

DETAILED DESCRIPTION

[0033] In the following description, for the purposes of explanation, various specific details are set forth in order to provide a thorough understanding of embodiments of the present disclosure.
It will be apparent, however, that embodiments of the present disclosure may be practiced without these specific details. Several features described hereafter may each be used independently of one another or with any combination of other features. An individual feature may not address any of the problems discussed above or might address only some of the problems discussed above.
[0034] The ensuing description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the disclosure as set forth. [0035] Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. [0036] Also, it is noted that individual embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations may be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed but could have additional steps not included in a figure. [0037] The word “exemplary” and/or “demonstrative” is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as “exemplary” and/or “demonstrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. 
Furthermore, to the extent that the terms “includes,” “has,” “contains,” and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive—in a manner similar to the term “comprising” as an open transition word—without precluding any additional or other elements. [0038] As used herein, a “processing unit” or “processor” or “operating processor” includes one or more processors, wherein processor refers to any logic circuitry for processing instructions. A processor may be a general-purpose processor, a special purpose processor, a conventional processor, a digital signal processor, a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated
Circuits, Field Programmable Gate Array circuits, any other type of integrated circuit, etc. The processor may perform signal coding, data processing, input/output processing, and/or any other functionality that enables the working of the system according to the present disclosure. More specifically, the processor or processing unit is a hardware processor.

[0039] As used herein, “a user equipment”, “a user device”, “a smart-user-device”, “a smart-device”, “an electronic device”, “a mobile device”, “a handheld device”, “a wireless communication device”, “a mobile communication device”, “a communication device” may be any electrical, electronic and/or computing device or equipment capable of implementing the features of the present disclosure. The user equipment/device may include, but is not limited to, a mobile phone, smart phone, laptop, general-purpose computer, desktop, personal digital assistant, tablet computer, wearable device, or any other computing device which is capable of implementing the features of the present disclosure. Also, the user device may contain at least one input means configured to receive an input from at least one of a transceiver unit, a processing unit, a storage unit, a detection unit, and any other such unit(s) which are required to implement the features of the present disclosure.

[0040] As used herein, “storage unit” or “memory unit” refers to a machine- or computer-readable medium including any mechanism for storing information in a form readable by a computer or similar machine. For example, a computer-readable medium includes read-only memory (“ROM”), random access memory (“RAM”), magnetic disk storage media, optical storage media, flash memory devices, or other types of machine-accessible storage media. The storage unit stores at least the data that may be required by one or more units of the system to perform their respective functions.
[0041] As discussed in the background section, conventional machine translation systems for translating content including colloquial expressions have several shortcomings, such as the inability to accurately translate region-specific expressions, lack of training data, etc. Further, resource-intensive translation processes lead to inefficient utilization of computational resources. In conventional methods, human intervention is necessary to ensure semantic consistency. Additionally, incorporating updates to language translation models is a daunting task, as languages around the world are always evolving. Moreover, existing systems struggle to scale across multiple languages. There are innumerable languages with countless
idioms, slang, nuances, contextual differences, and colloquial distinctions, and managing this disparity in a multi-lingual setting is a significant challenge.

[0042] The present disclosure aims to overcome the above-mentioned and other existing problems in this field of technology, for the purpose of machine translation, particularly for translating content involving colloquial language, by providing a system and a method that are based on an ensemble of state-of-the-art (SOTA) neural machine translation (NMT) and generative pre-trained transformer (GPT) models. Particularly, the proposed method combines the strengths of existing models, such as NMT and GPT models. NMT models are known for their ability to handle complex sentence structures and long-distance dependencies, while GPT models excel at generating fluent and natural-sounding language. By combining these two model types, the proposed method aims to produce translations that are both accurate and natural-sounding. Accordingly, the main objective of the proposed method is to improve the accuracy and fluency of machine translation for colloquial content. Hereinafter, exemplary embodiments of the present disclosure will be described with reference to the accompanying drawings.

[0043] Figure 1a illustrates an exemplary block diagram of a system [100] for translation of content involving colloquial languages, in accordance with an exemplary embodiment of the present disclosure.

[0044] The system [100] includes a server [106] adapted to receive an input text from a computing unit [102] through a communication medium [104] (e.g., network [104]) for efficient and accurate language translation. The server [106] comprises at least one processing unit [112] and at least one storage unit [110]. Also, all of the components/units of the system [100] are assumed to be connected to each other unless otherwise indicated below.
Also, in Figure 1a, only a few units are shown; however, the system [100] may comprise multiple such units, or any number of said units, as required to implement the features of the present disclosure. Further, in an implementation, the system [100] may be present in a user device (e.g., the computing unit [102]) to implement the features of the present disclosure. The system [100] may be a part of the user device (e.g., the computing unit [102]) or may be independent of, but in communication with, the user device. In another implementation, the system [100] may reside in a server. In yet another implementation, the system [100] may reside partly in the server [106] and partly in the user device.
[0045] The server [106] includes a translation engine [108]. The translation engine [108] is configured to deliver efficient and accurate language translations. This is made possible through the combined use of various state-of-the-art (SOTA) Neural Machine Translation (NMT) models that work together as an ensemble. The ensemble includes two or more translation models (e.g., NMT models), as well as other programming models that are part of the present disclosure.

[0046] The translation engine [108] is configured to leverage the interconnection between the different components and units of the system [100] to deliver high-quality translations. In a preferred embodiment, each of the two or more NMT models in the ensemble model is selected based on its proficiency with a particular language group. This ensures that the translation engine [108] is able to handle a wide range of languages and deliver accurate translations every time. In addition to the NMT models, the translation engine [108] also uses other programming models to deliver accurate translations for specific language regimes. For example, the indicTrans programming model is used for Indic languages, while the DeepL model is used for European languages. The translation engine [108] also uses popular tools, such as Google Translate, to deliver accurate translations for different languages, along with other custom neural machine translation language regimes.

[0047] Figure 1b illustrates an exemplary block diagram of the translation engine [108], in accordance with an exemplary embodiment of the present disclosure.

[0048] As illustrated in Figure 1b, the translation engine [108] includes various sub-engines, such as, for example, a data reception engine [108a], a pre-processing engine [108b], an ensemble engine [108c], a post-processing engine [108d], an output engine [108e], and a display engine [108f].
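The language-group selection described in paragraph [0046] may be sketched, purely for illustration, as a simple routing function. The language codes placed in each group and the model names returned here are assumptions for the sketch, not part of the disclosure itself:

```python
# Hypothetical sketch of the language-group routing performed by the
# translation engine [108]. Group membership and model names are
# illustrative assumptions only.

def pick_translation_model(target_language: str) -> str:
    """Return the translation model best suited to the language group
    to which the target language belongs."""
    language_groups = {
        "indic": {"hi", "bn", "ta", "te", "mr"},     # e.g. handled by indicTrans
        "european": {"fr", "de", "es", "it", "nl"},  # e.g. handled by DeepL
    }
    model_for_group = {"indic": "indicTrans", "european": "DeepL"}
    for group, codes in language_groups.items():
        if target_language in codes:
            return model_for_group[group]
    # Fall back to a general-purpose service for all other languages.
    return "GoogleTranslate"
```

Under these assumptions, a Hindi target would be routed to indicTrans and a German target to DeepL, while any language outside the configured groups falls through to the general-purpose service.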
[0049] In one implementation, the system [100] includes the data reception engine [108a] configured to receive an input text in at least one source language. In one implementation, the data reception engine [108a] is configured to receive an input file in one or more formats. For example, the one or more formats may include at least one of text, audio, video, image, and combinations thereof. The server [106] is then configured to extract the input text from the input file. The input text includes at least one of colloquial expressions and content elements. [0050] The pre-processing engine [108b] is a sub-engine adapted to simplify the input text in such a way that the simplified text thus obtained retains the core meaning and essence of the original sentences while making it more comprehensible. In particular, the pre-processing engine
[108b] is configured to pre-process, with implementation of a large language model (LLM), the input text to simplify content elements present in the input text. The pre-processing engine [108b] converts the input text into a simplified text. In particular, the pre-processing engine [108b] is configured to remove redundant and obsolete content elements from the input text while maintaining the semantic integrity of the input text.

[0051] In a preferred embodiment, the pre-processing engine [108b] is provided in the form of a programming model formed using the Vicuna 13B language model, fine-tuned on sentence simplification datasets. Such a programming model is adapted to fine-tune the input text by filtering redundant and obsolete keywords while simultaneously retaining the core meaning of the input text. This pre-processing step not only reduces the complexity of the source data but also primes it for more accurate translation, thereby enhancing the overall translation output. Further, the process of sentence simplification enhances the quality of downstream translations and, beyond improving translation quality, addresses the critical issue of accessibility for low-resource languages. In other embodiments, the pre-processing engine [108b] may be any custom-trained language model dedicated to sentence simplification. This model could be fine-tuned on an array of text simplification datasets, encompassing a wide spectrum of language complexities. In yet other embodiments, any suitable programming engine fine-tuned for sentence simplification may be utilized without deviating from the scope of the present disclosure.

[0052] The ensemble engine [108c] is configured to translate the simplified text into a target language text. The ensemble engine [108c] includes two or more translation models.
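The sentence-simplification step of paragraph [0051] may be sketched as follows. The prompt wording and the `toy_llm` stand-in are illustrative assumptions; in the preferred embodiment the call would instead go to a Vicuna 13B model fine-tuned on sentence simplification datasets:

```python
# Sketch of the pre-processing engine's sentence simplification.
# The prompt text and toy_llm stand-in are assumptions for
# illustration, not the disclosed Vicuna 13B embodiment.

def simplify(input_text: str, llm) -> str:
    """Ask a simplification-tuned LLM to filter redundant keywords
    while retaining the core meaning of the input text."""
    prompt = (
        "Simplify the following sentence, removing redundant and obsolete "
        "words while keeping its meaning intact:\n" + input_text
    )
    return llm(prompt).strip()

def toy_llm(prompt: str) -> str:
    # Trivial stand-in for demonstration: drop a few filler words.
    sentence = prompt.split("\n", 1)[1]
    filler = {"basically", "really", "actually"}
    return " ".join(w for w in sentence.split() if w.lower() not in filler)
```

For example, `simplify("It is basically really done", toy_llm)` yields `"It is done"`; a real simplification model would perform the same role with far richer rewriting.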
In one implementation, each translation model is optimized to translate a specific language group in the simplified text.

[0053] The ensemble engine [108c] is preferably a combination of two or more state-of-the-art (SOTA) Neural Machine Translation (NMT) models configured to work together collaboratively. Particularly, the ensemble engine [108c] is configured to convert the simplified text into the target language (i.e., the desired language) while maintaining accuracy and fluency. In a preferred embodiment, each of the NMT models is trained to excel in different language pairs or linguistic aspects, thereby mitigating the limitations of individual models and providing a more comprehensive translation output. The NMT models may be any suitably fine-tuned programming models known in the art without deviating from the scope of the current disclosure. Moreover, such
programming models are continually updated and expanded, thereby ensuring that the system stays up to date.

[0054] The post-processing engine [108d] is a sub-engine adapted to refine and fine-tune the translated text using a referential prompt system, preferably powered by a Generative Pre-trained Transformer (GPT-4) referential prompt system. In particular, the post-processing engine [108d] is configured to post-process the target language text to refine the target language text based on one or more contextual parameters. The one or more contextual parameters include at least one of input text demographics and situational context. The post-processing engine [108d] is configured to implement the referential prompt system to dynamically adjust the target language text. Also, the post-processing engine [108d] is configured to implement a rule-based model along with the referential prompt system to generate the target language text.

[0055] The GPT-4 referential prompt system possesses the ability to dynamically adjust and revise translations based on the one or more contextual parameters. The one or more contextual parameters include at least one of age, gender, formality, and tense, all of which play a crucial role in accurately conveying the subtleties of language use. Particularly, the post-processing engine [108d] is configured to ensure that the translated text not only maintains linguistic precision but also adapts to the specific contextual requirements of the given situation. This capability is particularly valuable in scenarios where accurate communication depends on more than just words, relying also on the correct choice of vocabulary, tone, and style that align with the cultural and situational context. In a preferred embodiment, the post-processing engine [108d] is in the form of the GPT-4 programming model.
However, in other embodiments, any suitable advanced language model specifically designed for contextual adjustment may be utilized without deviating from the scope of the present disclosure. In some embodiments, the rule-based model may additionally be provided in the post-processing engine [108d] to handle certain aspects of contextual adaptation, such as modifying formality or tense.

[0056] The output engine [108e] is configured to generate an output text from the target language text. The output engine [108e] is a sub-engine adapted to prepare a final translation of the input text, wherein the final translation embodies both contextual accuracy and linguistic simplicity in a target language, as desired. The output engine [108e] may be in the form of any suitable programming language model known in the art that may be utilized to combine the output and display it in a user-friendly manner in a target language.
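The construction of a referential prompt for the contextual adjustment described above may be sketched as follows. The template text and parameter names are assumptions for illustration; in the preferred embodiment the resulting prompt would be submitted to a GPT-4 model:

```python
# Sketch of the referential prompt used by the post-processing engine
# [108d]. The template wording is an illustrative assumption.

def build_referential_prompt(translated_text: str, context: dict) -> str:
    """Compose a prompt asking a language model to revise a translation
    for contextual parameters such as age, gender, formality, and tense."""
    constraints = "; ".join(f"{k}: {v}" for k, v in sorted(context.items()))
    return (
        "Revise the following translation so that it respects these "
        f"contextual parameters ({constraints}) while preserving its "
        f"meaning:\n{translated_text}"
    )

prompt = build_referential_prompt(
    "Bonjour, comment vas-tu ?",
    {"formality": "formal", "tense": "present"},
)
```

A rule-based pass, as contemplated in some embodiments, could adjust deterministic aspects (e.g. tense markers) before or after the prompt-driven revision.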
[0057] In order to provide an accurate translation, the processing unit [112] of the server [106] is configured to pre-process the input text in order to refine and reduce the complexity of the input text while simultaneously retaining the core meaning. Thus, the received original content, typically in the form of text that needs to be processed and translated, is pre-processed with a first language model.

[0058] In order to align the translated text in accordance with various contextual factors, the processing unit [112] of the server [106] is configured to post-process the input data using a referential prompt system. Examples of contextual factors include, but are not limited to, age, gender, formality, and tense. An example of a referential prompt system includes, but is not limited to, the GPT-4 referential prompt.

[0059] The storage unit [110] is configured to store translation-related information, such as original content, translations, output, contextual parameters, various NLP and/or processing models, and the like.

[0060] The display engine [108f] is configured to display the output text on the display of the computing unit [102]. The output text is the final translation of the input text. For example, the input text is in the English language and the output text is in the French language. In another example, the input text is in the Spanish language and the output text is in the Chinese language.

[0061] Further, the disclosed framework includes all possibilities of data pick-up to prepare machine translations. The content may be received directly in the form of text; some translation is performed from local databases, and some data is translated using GPT's master reference data.

[0062] Figure 2 illustrates an exemplary method flow diagram [200] indicating the process for translating content involving colloquial languages, in accordance with exemplary embodiments of the present disclosure.
In an implementation, the method [200] is performed by the system [100]. As shown in Figure 2, the method [200] starts at step [202].

[0063] At step [204], the method [200] comprises receiving, by the data reception engine [108a], the input text in at least one source language. The input text represents original content that needs to be translated. The input text includes at least one of colloquial expressions and content elements. In one example, the input text is in the English language. In another example, the input text can be in any other language.
[0064] Next, at step [206], the method [200] comprises pre-processing, by the pre-processing engine [108b] with implementation of the large language model (LLM), the input text for simplifying the content elements present in the input text. The pre-processing step is performed for converting the input text into the simplified text. Particularly, the pre-processing engine [108b] (preferably in the form of the Vicuna 13B language model) is utilized to simplify the input text while maintaining its meaning. This step is crucial for enhancing the quality of downstream translations and improving accessibility, especially for low-resource languages.

[0065] At the next step [208], the method [200] comprises translating, by the ensemble engine [108c], the simplified text into the target language text. The ensemble engine [108c] includes two or more translation models. Each translation model is optimized for translating a specific language group in the simplified text. Particularly, at step [208], the method [200] includes translating the simplified text generated from the previous step using the ensemble engine [108c]. The ensemble engine [108c] is preferably a combination of two or more state-of-the-art (SOTA) Neural Machine Translation (NMT) models. This translation step is a crucial part of the overall machine translation process, aiming to convert the simplified input text into the desired target language text while maintaining accuracy and fluency.

[0066] The method [200] then proceeds to step [210]. At step [210], the method [200] comprises post-processing, by the post-processing engine [108d], the target language text for refining the target language text based on the one or more contextual parameters. Particularly, at this step, the method [200] includes fine-tuning the translated target language text using a referential prompt system, preferably powered by a GPT-4 referential prompt block.
This essential step is dedicated to correcting and adapting the translated target language text to ensure that it aligns with a broad spectrum of contextual nuances, ultimately yielding linguistically and culturally appropriate translations that resonate with the intended context. In a preferred embodiment, the translated target language text is corrected and adjusted on the basis of the one or more contextual parameters including, but not limited to, age, gender, formality, and tense for producing linguistically appropriate translations for a given context. [0067] In one implementation, the method [200] includes an additional step (not shown in figures) performed prior to step [208], where the simplified text is contextually analyzed before the translation in step [208]. This step is conceptually similar to step [210].
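By way of a non-limiting illustration, the assembly of a contextual refinement prompt from the parameters discussed above (age, gender, formality, and tense) may be sketched in Python as follows. The function name, field names, and prompt wording are hypothetical and illustrative only; they are not the exact prompt used by the referential prompt system.

```python
import json

def build_contextual_prompt(translated_text, age, gender, formality, tense):
    """Assemble a refinement prompt from the one or more contextual
    parameters. A hypothetical sketch: the field names and wording are
    assumptions for illustration, not the system's actual prompt."""
    context = {
        "age": age,
        "gender": gender,
        "formality": formality,  # e.g. "formal" or "informal"
        "tense": tense,
    }
    return (
        "Adjust the following translation so that it is appropriate for "
        "the given context, without changing its meaning.\n"
        f"Context: {json.dumps(context)}\n"
        f"Translation: {translated_text}"
    )
```

Such a prompt could then be submitted to the post-processing language model, which returns the contextually adjusted translation.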
[0068] Further, the method [200] includes generating, by the output engine [108e], the output text from the target language text. The output text may represent a final translated text that embodies both contextual accuracy and linguistic simplicity in a target language, as desired. [0069] Accordingly, it may be noted that the method [200] provides a final output that is the result of a holistic approach that optimizes the translation process from multiple angles. By combining sentence simplification, ensemble NMT models, and context-aware adjustments, the system [100] attains an output that is not only comprehensible but also culturally and situationally appropriate. The output text, in the target language, embodies the core purpose of the system [100]: to facilitate meaningful and accurate communication across linguistic boundaries, thereby enhancing global connectivity and understanding. While the present invention leverages NMT and GPT models, alternative embodiments may include alternative translation, simplification, and contextual adjustment models for resolving contextual ambiguity and optimizing the accuracy of the final output text. Any known programming engines may be fine-tuned to achieve the desired functionality of the steps without deviating from the scope of the present disclosure. In some embodiments, the system [100] may be adapted to incorporate human reviewers at different stages of the translation pipeline. For instance, after the sentence simplification stage, human reviewers could assess and correct the simplified sentences. Similarly, post-translation, human reviewers could review and adjust the translated text for better contextual accuracy.
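By way of a non-limiting illustration, the three-stage flow of the method [200] may be sketched in Python as follows, where `simplify`, `ensemble_translate`, and `refine` are hypothetical placeholders for the pre-processing engine [108b], the ensemble engine [108c], and the post-processing engine [108d] respectively. The function name and signature are assumptions for illustration only.

```python
def translate_colloquial(text, source_lang, target_lang,
                         simplify, ensemble_translate, refine):
    """One possible wiring of the three-stage pipeline: sentence
    simplification, ensemble NMT translation, and context-aware
    refinement. Each callable stands in for the corresponding engine."""
    simplified = simplify(text)                                       # step [206]
    draft = ensemble_translate(simplified, source_lang, target_lang)  # step [208]
    return refine(draft, target_lang)                                 # step [210]
```

In practice, each callable would wrap a model invocation (e.g. the Vicuna-based simplifier or an NMT model selected for the relevant language group); the modular design allows any stage to be individually replaced.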
[0070] In some embodiments, the system [100] may, instead of using distinct sub-engines of the translation engine [108] for the purpose of sentence simplification, translation, and contextual adjustment, utilize a single large language model that is trained with a multi-task learning or transfer learning approach, teaching it to perform all three tasks in a unified manner. [0071] Example: [0072] As an example, dialogues are given as a list of numbered sentences. [0073] The ensemble engine [108c], based on SOTA NMT and GPT models, understands the full context of the dialogues as a whole and then, for each sentence: 1. if the sentence does not require any modification, it is left as is; 2. if it does, the steps above are followed. Each complex sentence is split into a series of simpler, more descriptive sentences, which are then recombined into one full dialogue in the simplest way possible.
[0074] The input JSON in a preferred embodiment is provided hereinbelow: { "1": { "original": "<Original sentence>" }, "2": { "original": "<Original sentence>" }, # ... rest of the original sentences } The output JSON should be in the following format: { "1": { "simplified": ["<Rewritten sentence>"] }, "2": { "simplified": ["<Rewritten sentence>"] }, # ... rest of the rewritten sentences } The numbered entries in the output JSON should be the same as those in the input JSON.
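By way of a non-limiting illustration, the consistency requirement between the input and output JSON may be checked in Python as follows. The function name is a hypothetical helper introduced for illustration only.

```python
import json

def check_simplification_output(input_json, output_json):
    """Verify that the output JSON carries the same numbered entries as
    the input JSON and that each entry holds a 'simplified' list, per
    the format described above."""
    inp = json.loads(input_json)
    out = json.loads(output_json)
    # The numbered keys must correspond one-to-one.
    if inp.keys() != out.keys():
        return False
    # Each output entry must contain a list of rewritten sentences.
    return all(isinstance(v.get("simplified"), list) for v in out.values())
```

Such a check could be applied before the simplified text is passed to the ensemble engine [108c], rejecting malformed model output early.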
""" Referential prompt: The input is a video transcript file formatted in a particular JSON format. Here is a sample input format: { "1": { "original_dialogue": "<original dialogue in English>", "reference_translation": "<an NMT generated reference translation>", "speedup_factor": <a float that represents the ratio of 'time taken to speak reference_translation / time take to speaker original_dialogue'>, "gender": "<gender of the speaker of this dialogue>", "age": <an int that informs the age of the speaker of this dialogue>, "target_audience": "<the type of audience that the speaker is speaking to; can be set to informal or informa>l" }, # rest of the dialogues } For each dialogue, the reference translation is improved by paying attention to the gender, age, target audience, and speedup_factor. 1. If the target audience is informal, informal speech is used rather than honorific. When the target_audience is formal, honorific speech is used. 2. For each improved translation, the subject-verb agreement is corrected. 3. For each translation, the number, tense, mood, case, and person agreement is retained, any modification of which may result in poor performance.
4. For each improved translation, the meaning and context are retained as those of the original dialogue. The final result is output in the following JSON format: { "1": { "final_translation": "<improved translation for the corresponding dialogue>" }, # … rest of the dialogues } [0075] The method [200] comprises displaying, by the display engine [108f], the output text on the display of the computing unit [102] (not shown in figures). [0076] The method [200] terminates at step [212]. [0077] As is evident from the above, the present disclosure provides a technically advanced solution for translating colloquial texts. The proposed system [100], which is based on an ensemble of state-of-the-art NMT models, presents a multitude of advantages over existing translation technologies. [0078] Firstly, the use of a Vicuna 13B Language Model, fine-tuned on sentence simplification datasets, for pre-processing by the system [100] reduces sentence complexity and thereby increases the accuracy of downstream translations. [0079] Secondly, the referential prompts powered by GPT-4 ensure improved contextual relevance, adjusting for factors such as age, gender, formality, and tense. This leads to translations that are not only linguistically correct but also appropriate in a given context. [0080] Thirdly, the system [100] excels at handling colloquial language due to the sentence simplification and referential prompts, thereby resulting in translations that are more meaningful and natural to the users. Fourthly, with support for over 100 languages, including low-resource languages, the system [100] is a versatile and inclusive translation technology.
[0081] Fifthly, the ensemble of NMT models is designed to grow and adapt by incorporating new and improved models as they become available, thereby ensuring that the system [100] remains on the cutting edge of machine translation technology. [0082] Further, the modular design of the system [100] allows for scalable improvements, as each component can be individually updated or replaced with more advanced models over time. Additionally, the sentence simplification step improves translation accessibility for low-resource languages, thus expanding the range and utility of machine translation. Together, these advantages present a significant leap forward in the capabilities and quality of machine translation technology. [0083] It may be noted that the system [100] proposed in the current disclosure may be utilized for the purpose of numerous immediate and potential applications due to its enhanced capacity for accurate and culturally sensitive machine translation. For instance, websites, applications, and digital platforms can utilize the system [100] of the present disclosure to provide localized content for users from different linguistic backgrounds. Businesses that offer customer support in multiple languages can benefit from the high-quality translations of the system [100]. The system [100] may further be used to translate customer inquiries and automate responses, thereby improving the efficiency of multilingual customer service. In some instances, the system [100] may be utilized by various online learning platforms for the purpose of translating educational materials, thereby making them accessible to a broader range of students across the globe. Social media platforms may use the system [100] to translate user-generated content, thereby facilitating communication and understanding among users who speak different languages.
For communities and cultures whose languages have limited digital resources, the system [100] may be a vital tool in preserving and promoting their languages by providing high-quality translation services. In academia and research, the system [100] can be employed to translate research papers and articles, thereby facilitating the sharing and exchange of knowledge across linguistic boundaries. In media and entertainment, the system [100] could be used for automated subtitling and dubbing of movies, shows, and video content, thereby making it accessible to a diverse audience. Governments can use the system [100] to offer multilingual services, including translation of official documents, to cater to a linguistically diverse population. More particularly, the system [100] may be used to facilitate cross-cultural communication, thereby helping people understand and appreciate different cultures better. While these are the immediate applications, the system [100] of the current disclosure may be used in a vast number of applications where the need for high-quality machine translation grows.
[0084] Therefore, the proposed system [100] offers significant advantages over existing translation technologies due to its enhanced accuracy, improved contextual relevance, superior handling of colloquial language, broad language support, dynamic adaptability, optimized use of NMT models, increased accessibility, and scalability. Accordingly, the system [100] represents a significant step forward in the capabilities and quality of machine translation technology. [0085] While considerable emphasis has been placed herein on the disclosed embodiments, it will be appreciated that many embodiments can be made and that many changes can be made to the embodiments without departing from the principles of the present disclosure. These and other changes in the embodiments of the present disclosure will be apparent to those skilled in the art, whereby it is to be understood that the foregoing descriptive matter is to be construed as illustrative and non-limiting.