WO2025224665A1

WO2025224665A1 - Artificial intelligence (ai)-interpretable and human-uninterpretable anonymization of user data

Info

Publication number: WO2025224665A1
Application number: PCT/IB2025/054267
Authority: WO
Inventors: Krishna Prasad Agara Venkatesha Rao; Akshay Shekhar Kadakol; Dev Prasad Kode
Original assignee: Sony Group Corp
Current assignee: Sony Group Corp
Priority date: 2024-04-24
Filing date: 2025-04-24
Publication date: 2025-10-30
Anticipated expiration: 2026-10-24

Abstract

A system for anonymization of user data is provided. The system receives input data that includes specific information. The system extracts major features in a spatial domain corresponding to the input data. The system transforms the extracted major features into a frequency domain. The system identifies frequency components corresponding to the major features transformed into the frequency domain. The system suppresses one or more frequency components among the identified frequency components. The system generates signal domain data corresponding to the extracted major features based on the suppression of the frequency components. The system transforms the generated signal domain data from the frequency domain to the spatial domain based on an inverse transformation of the generated signal domain data. The system generates output metadata based on the inverse transformation of the generated signal domain data. The output metadata includes an anonymized version of the specific information included in the input data.

Description

ARTIFICIAL INTELLIGENCE (AI)-INTERPRETABLE AND HUMAN-UNINTERPRETABLE

ANONYMIZATION OF USER DATA

CROSS-REFERENCE TO RELATED APPLICATIONS/INCORPORATION BY REFERENCE

[0001] This application claims priority to Indian Provisional Application No. IN202411032499, filed on April 24, 2024, which is hereby incorporated by reference in its entirety.

FIELD OF INVENTION

[0002] Various embodiments of the disclosure relate to data anonymization. More specifically, various embodiments of the disclosure relate to a system and method for Artificial Intelligence (Al)-interpretable and human-uninterpretable anonymization of user data.

BACKGROUND

[0003] Data privacy and protection have become increasingly important concerns in the digital age. As organizations collect and process vast amounts of personal information, there is a growing need for robust methods to anonymize sensitive data while preserving its utility for analysis and machine learning applications.

[0004] Traditional anonymization techniques often involve masking or removing personally identifiable information (PH) from datasets. Common approaches include data masking, tokenization, and generalization of attributes. However, these methods may significantly reduce the analytical value of the data or fail to fully protect against re-identification risks. More advanced techniques like differential privacy add statistical noise to datasets but may impact data utility for certain use cases. Additionally, many existing anonymization solutions produce outputs that are still human-readable, potentially exposing sensitive information if accessed by unauthorized parties. [0005] Limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of described systems with some aspects of the present disclosure, as set forth in the remainder of the present application and with reference to the drawings.

SUMMARY

[0006] A system and method for Artificial Intelligence (Al)-interpretable and human- uninterpretable anonymization of user data is provided substantially as shown in, and/or described in connection with, at least one of the figures, as set forth more completely in the claims.

[0007] These and other features and advantages of the present disclosure may be appreciated from a review of the following detailed description of the present disclosure, along with the accompanying figures in which like reference numerals refer to like parts throughout.

BRIEF DESCRIPTION OF THE DRAWINGS

[0008] FIG. 1 is a block diagram that illustrates an exemplary network environment for Artificial Intelligence (Al)-interpretable and human-uninterpretable anonymization of user data, in accordance with an embodiment of the disclosure.

[0009] FIG. 2 is a block diagram that illustrates an exemplary system of FIG. 1 , in accordance with an embodiment of the disclosure.

[0010] FIG. 3A and FIG. 3B are flow diagrams that collectively illustrate execution pipeline for processing input data, in accordance with an embodiment of the disclosure.

[0011] FIG. 4 is a flow diagram that illustrates execution pipeline for automating generation of output metadata based on knob parameters, in accordance with an embodiment of the disclosure.

[0012] FIG. 5 is a flow diagram that illustrates consumer-side operations, in accordance with an embodiment of the disclosure.

[0013] FIG. 6A is a diagram that illustrates producer plugin flowchart, in accordance with an embodiment of the disclosure.

[0014] FIG. 6B is a diagram that illustrates consumer plugin flowchart, in accordance with an embodiment of the disclosure.

[0015] FIG. 7 is a flowchart that illustrates exemplary operations of a method for Artificial Intelligence (Al)-interpretable and human-uninterpretable anonymization of user data, in accordance with an embodiment of the disclosure.

DETAILED DESCRIPTION

[0016] The following described implementation may be found in a system and a method for Artificial Intelligence (Al)-interpretable and human-uninterpretable anonymization of user data. Exemplary aspects of the disclosure may provide a system, which may include circuitry that may be configured to receive input data that includes specific information. The circuitry may further extract one or more major features in a spatial domain corresponding to the input data. The extracted one or more major features may represent the input data. The circuitry may further transform the extracted one or more major features into a frequency domain. The circuitry may further identify frequency components corresponding to the extracted one or more major features transformed into the frequency domain. The circuitry may further suppress one or more frequency components among the identified frequency components. The circuitry may further generate signal domain data corresponding to the extracted one or more major features based on the suppression of the one or more frequency components. The circuitry may further transform the generated signal domain data from the frequency domain to the spatial domain based on an inverse transformation of the generated signal domain data. The circuitry may further generate output metadata based on the inverse transformation of the generated signal domain data. The generated output metadata may include an anonymized version of the specific information included in the input data. The anonymized version of the specific information may be interpretable by an Artificial Intelligence (Al) model and may be uninterpretable by a human.

[0017] Traditional data anonymization techniques often fall short in balancing data utility and privacy protection. Existing methods such as data masking, tokenization, and generalization may significantly reduce the analytical value of data or fail to fully protect against re-identification risks. More advanced techniques like differential privacy, while offering stronger privacy guarantees, may impact data utility for certain applications. Additionally, many current anonymization solutions produce outputs that are still human- readable, potentially exposing sensitive information if accessed by unauthorized parties. These limitations highlight the need for a solution that may maintain data utility for Al applications while ensuring robust privacy protection while still keeping the data usable for analytics.

[0018] The disclosed technique introduces a novel approach to data anonymization that leverages signal domain transformations and feature extraction to generate Al-interpretable yet human-uninterpretable metadata. By transforming extracted features into the frequency domain, suppressing specific frequency components, and then inverse-transforming the data, the system produces anonymized metadata that retains utility for Al models while being uninterpretable by humans. This approach offers several key advantages over existing solutions for instance, the disclosed system 102 preserves data utility for Al applications, provides stronger privacy protection by making the output uninterpretable to humans, and allows for flexible control over the degree of anonymization through feature selection and frequency component suppression. [0019] The present disclosure relates to a system and method for Al-interpretable and human-uninterpretable anonymization of user data. The present disclosure addresses the growing need for privacy-preserving data processing techniques in the field of artificial intelligence and machine learning. The system may be configured to generate anonymized metadata from input data containing sensitive information such that a utility of the data is maintained for Al applications. In some cases, the system may support multiple modalities of data, which may extend beyond just image processing to encompass various types of input data. This versatility allows for broad applicability across different domains and use cases.

[0020] To enhance the flexibility and utility of the disclosed technique, the system may support various data augmentation techniques that may be applied to the anonymized data. Such data augmentation techniques may help improve the robustness and generalization capabilities of Al models trained on the anonymized data. In some implementations, the disclosed system may allow for specifying a configurable parameter to control the level of privacy of the anonymized data. Hence, the system may provide users with the ability to balance the trade-off between data utility and privacy protection based on their specific requirements and use cases. The system may utilize a trained neural network to automate generation of the output metadata from the input data. The trained neural network may control the set of knob parameters to protect privacy of the input data and to hinder any possibility of cyberattacks on the input data.

[0021] The output metadata generated by the system may include a watermark for infringement detection. This watermarking scheme may serve as a mechanism to identify and track the usage of the anonymized data, that potentially may aid in the detection of unauthorized use or distribution of the protected information. Based on a combination of these features, the disclosed system may offer a comprehensive solution for data anonymization that addresses the challenges of management of data utility for Al applications and may also ensure robust privacy protection. This approach may have significant implications for various industries and applications where the processing of sensitive data is required, such as healthcare, finance, and personal data management.

[0022] FIG. 1 is a block diagram that illustrates an exemplary network environment for Artificial Intelligence (Al)-interpretable and human-uninterpretable anonymization of user data, in accordance with an embodiment of the disclosure. With reference to FIG. 1 , there is shown a network environment 100. The network environment 100 may include a system 102, a set of transformation models 104, a server 108, a database 110, and a communication network 112. The system 102 may include a set of transformation models 104 and a masking model 106. The set of transformation models 104 may include a first transformation model 104A, a second transformation model 104B, a third transformation model 104C ... and a Nth transformation model 104N. In FIG. 1 , there is further shown input data 114 associated with the system 102. The input data 114 may include personal identifiable information (PH) 114A.

[0023] Though FIG. 1 shows “N” transformation models in the set of transformation models 104, the scope of the disclosure may not be so limited. In an embodiment, the set of transformation models 104 may include only two or more than “N” transformation models, without departure from the scope of the disclosure.

[0024] The system 102 may include suitable logic, circuitry, interfaces, and/or code that may be configured to receive input data 114 that includes specific information, for instance the PH 114A. The system 102 may further extract one or more major features in a spatial domain corresponding to the input data 114. In an embodiment, the extracted one or more major features may represent the input data 114. The system 102 may further transform the extracted one or more major features into a frequency domain. The system 102 may further identify frequency components corresponding to the extracted one or more major features transformed into the frequency domain. The system 102 may further suppress one or more frequency components among the identified frequency components. The system 102 may further generate signal domain data corresponding to the extracted one or more major features based on the suppression of the one or more frequency components. The system 102 may further transform the generated signal domain data from the frequency domain to the spatial domain based on an inverse transformation of the generated signal domain data. The system 102 may further generate output metadata based on the inverse transformation of the generated signal domain data. In an embodiment, the generated output metadata may include an anonymized version of the specific information included in the input data 114. In an embodiment, the anonymized version of the specific information may be interpretable by an Artificial Intelligence (Al) model and is uninterpretable by a human. Examples of the system 102 may include, but are not limited to, a computing device, a server, a network provider, a base station, a router, a smartphone, a cellular phone, a mobile phone, a gaming device, a mainframe machine, a computer workstation, a consumer electronic (CE) device and/or the likes.

[0025] The system 102 may store the set of transformation models 104 or may be remotely connected to another system (such as the server 108) that hosts the set of transformation models 104. When hosted on another system, the system 102 may send instructions to control training or inference of the set of transformation models 104 via remote calls (e.g., API calls).

[0026] Each transformation model of the set of transformation models 104 may be configured to perform specific data transformations or feature extractions on data. In an instance, a feature extraction model (not shown) may be a part of the set of transformation models 104. The circuitry 202 may apply the feature extraction model on the input data 114 to extract features of the input data 114.

[0027] In accordance with an embodiment, at least one transformation model of the set of transformation models 104 may be based on a machine learning (ML) model. The ML model may be a classifier or regression or clustering model, which may be trained to identify a relationship between inputs, such as features in a training dataset and output labels. The ML model may be defined by its hyper-parameters, for example, number of weights, cost function, input size, number of layers, and the like. The parameters of the ML model may be tuned and weights may be updated so as to move towards a global minima of a cost function for the ML model. After several epochs of the training on the feature information in the training dataset, the ML model may be trained to output a prediction/classification result for a set of inputs. The prediction result may be indicative of a class label for each input of the set of inputs (e.g., input features extracted from new/unseen instances).

[0028] The ML model may include electronic data, which may be implemented as, for example, a software component of an application executable on the system 102. The ML model may rely on libraries, external scripts, or other logic/instructions for execution by a processing device (for instance, circuitry 202 of FIG. 2). The ML model may include code and routines configured to enable a computing device, such as the circuitry 202 to perform one or more operations, such as major features transformation into frequency domain, signal domain data transformation from frequency domain to spatial domain, and inverse transformation of signal domain data. Additionally or alternatively, the ML model may be implemented using hardware including a processor, a microprocessor (e.g., to perform or control performance of the one or more operations), a field-programmable gate array (FPGA), or an applicationspecific integrated circuit (ASIC). Alternatively, in some embodiments, the ML model may be implemented using a combination of hardware and software. Examples of the set of transformation models 104 may include, but are not limited to, a Scale-Invariant Feature Transform (SIFT) model or a Hough Transformation model, a feature extraction model, or a pattern recognition model.

[0029] The masking model 106 may be configured to apply additional data transformation or anonymization processes to the input data and add specific metadata to the anonymized input data. In some embodiments, the masking model 106 may employ techniques such as data obfuscation, tokenization, or generalization to further protect sensitive information. The masking model 106 may work in conjunction with other transformation models to enhance privacy protection while data utility may be maintained for Al applications.

[0030] The masking model 106 may be type of artificial neural network that may typically include multiple layers, including an input layer, one or more hidden layers, and an output layer. Each layer of the multiple layers may include one or more nodes (or artificial neurons). Outputs of all nodes in the input layer may be coupled to at least one node of hidden layer(s). Similarly, inputs of each hidden layer may be coupled to outputs of at least one node in other layers of the masking model 106. Outputs of each hidden layer may be coupled to inputs of at least one node in other layers of the masking model 106. Node(s) in the final layer may receive inputs from at least one hidden layer to output a result. The number of layers and the number of nodes in each layer may be determined from hyper-parameters of the neural network. Such hyper-parameters may include weighted connections and activation functions, which may be set before or after training the masking model 106 on the training data.

[0031] During the training phase, the masking model 106 may adjust weights of the weighted connections and biases of the activation functions based on an error between the applied masking technique and required masking technique. This process, known as backpropagation, involves calculating the gradient of the loss function with respect to the network’s parameters and updating them to minimize the error. Once trained, the masking model 106 may generalize from the training data to perform accurate masking on new data. The performance of the masking model 106 may be often evaluated using metrics such as accuracy, precision, recall, and F1 -score, which may provide insights into how well the masking model 106 is performing.

[0032] In an example, the masking model 106 may apply techniques associated with Natural Language Processing (NLP) and computer vision. The masking model 106 may utilize transformer models like BERT (Bidirectional Encoder Representations from Transformers). During the training phase, the masking model 106 may mask a certain percentage of input features with masking techniques such as substitution, shuffling, encryption, redaction, averaging, nulling out, synthetic data generation, and the like. In one instance, the masking model 106 may blur the PH 114A from the input data 114 corresponding to a face in an image to mask the face. In another instance, the masking model 106 may modify facial features of the face in the image to mask the face. In another instance, the masking model 106 may add watermarking scheme to data associated with the image to mask the image. The masking model 106 may be based on the masking techniques. The training enables the masking model 106 to understand the portions of an image that are required to be masked. The masking model 106 may utilize several techniques, such as image inpainting, attention mechanisms, and data augmentation.

[0033] The server 108 may include suitable logic, circuitry, interfaces, and/or code that may be configured to manage data storage and retrieval operations and facilitate the processing and transformation of input data. In some implementations, the server 108 may perform one or more operations (e.g., by implementation of one or more Al models) of the system 102 to anonymize the PH 114A in the input data 114 such that after the anonymization, the PH 114A is still interpretable by an Al model but may be incomprehensible for human users. The server 108 may be implemented as a cloud server and may execute operations through web applications, cloud applications, HTTP requests, repository operations, file transfer, and the like. Other example implementations of the server 108 may include, but are not limited to, a database server, a file server, a web server, a media server, an application server, a mainframe server, a machine learning server (enabled with or hosting, for example, a computing resource, a memory resource, and a networking resource), or a cloud computing server.

[0034] In at least one embodiment, the server 108 may be implemented as a plurality of distributed cloud-based resources by use of several technologies that are well known to those ordinarily skilled in the art. A person with ordinary skill in the art will understand that the scope of the disclosure may not be limited to the implementation of the server 108 and the system 102, as two separate entities. In certain embodiments, the functionalities of the server 108 may be incorporated in its entirety or at least partially in the system 102, without a departure from the scope of the disclosure. In certain embodiments, the server 108 may host the database 110. Alternatively, the server 108 may be separate from the database 110 and may be communicatively coupled to the database 110.

[0035] The database 110 may include suitable logic, interfaces, and/or code that may be configured to store the input data 114, which may include specific information such as the PH 114A. The database 110 may be derived from data off a relational or non-relational database, or a set of comma-separated values (csv) files in conventional or big-data storage. The database 110 may be stored or cached on a device, such as a server (e.g., the server 108) or the system 102. The device storing the database 110 may be configured to receive commands or instructions from the system 102 or the server 108. In response, the device of the database 110 may be configured to retrieve and provide the input data 114 and the corresponding PH 114A. In some embodiments, the database 110 may be hosted on a plurality of servers stored at the same or different locations. The operations of the database 110 may be executed using hardware including a processor, a microprocessor (e.g., to perform or control performance of one or more operations), a field-programmable gate array (FPGA), or an application-specific integrated circuit (ASIC). In some other instances, the database 110 may be implemented using software.

[0036] The communication network 112 may include a communication medium through which the system 102 and the server 108 may communicate with one another. The communication network 112 may be one of a wired connection or a wireless connection. Examples of the communication network 112 may include, but are not limited to, the Internet, a cloud network, Cellular or Wireless Mobile Network (such as Long-Term Evolution and 5^th Generation (5G) New Radio (NR)), satellite communication system (using, for example, low earth orbit satellites), a Wireless Fidelity (Wi-Fi) network, a Personal Area Network (PAN), a Local Area Network (LAN), or a Metropolitan Area Network (MAN). Various devices in the network environment 100 may be configured to connect to the communication network 112 in accordance with various wired and wireless communication protocols. Examples of such wired and wireless communication protocols may include, but are not limited to, a Transmission Control Protocol and Internet Protocol (TIP/IP), User Datagram Protocol (UDP), Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), Zig Bee, EDGE, IEEE 802.11 , light fidelity (Li-Fi), 802.16, IEEE 802.11s, IEEE 802.11g, multi-hop communication, wireless access point (AP), device to device communication, cellular communication protocols, and Bluetooth (BT) communication protocols. [0037] In operation, the system 102 may be configured to receive the input data 114 that includes specific information. In some embodiments, the input data 114 may be one of audio data or image data. Further, the specific information may be PH 114A included in the input data 114. In an instance, for the input data 114 including an image of a person, the PH 114A may be facial features of the person. In another instance, for the input data 114 including an audio of a person, the PH 114A may be voice, pitch, or tone of the person. The system 102 may receive the input data 114 from a user through a user-interface. Alternatively, the system 102 may receive the input data 114 through the server 108 or the database 110. Details related to the reception of the input data 114 are further described, for example, in FIG. 3.

[0038] The system 102 may further be configured to extract features of the input data 114 based on the application of the feature extraction model. The system 102 may execute at least one of a Scale-Invariant Feature Transform (SIFT) or a Hough Transformation, based on the application of the feature extraction model, to extract the features of the input data 114. Further, the system 102 may extract one or more major features in a spatial domain corresponding to the input data 114 from the extracted features of the input data 114. In an instance, when the image data includes a human face, the features associated with the image data may include facial features, texture, color, background, dimensions, and the like. The major features extracted may include the facial features of the human face. In an instance, when the audio data includes a human voice, the features associated with the audio data may include pitch, tone, volume, articulation, and vocal qualities of the human voice, background noise, a file format, duration, sample rate, and the like. The major features extracted may include the pitch, the tone, the volume, the articulation, and the vocal qualities of the human voice. In an embodiment, the system 102 may extract the one or more major features based on one of a selection of a number of the one or more features to be extracted or a degree of anonymization of the specific information. In an embodiment, the system 102 may extract the one or more major features in the spatial domain based on a statistical method. In an instance, the statistical method may include at least one of a t-Distributed Neighbor Embedding (t-SNE) method or a Principal Component Analysis (PCA) method. Details related to the extraction of the one or more major features are further described, for example, in FIG. 3.

[0039] The system 102 may further be configured to transform the extracted one or more major features into a frequency domain. In an instance, the system 102 may transform the extracted one or more major features into the frequency domain based on at least one of a fast Fourier transform (FFT) or a discrete Fourier transform (DFT). The system 102 may process each of the extracted one or more major features to transform the one or more major features into the frequency domain. In an instance, in image data, the transformed data may include broad structures as well as fine details and edges in the frequency domain. Details related to the transformation of the one or more major features are further described, for example, in FIG. 3.

[0040] The system 102 may further be configured to identify frequency components corresponding to the extracted one or more major features transformed into the frequency domain. In an instance, FFT or DFT may reveal frequency components corresponding to the one or more major features transformed into the frequency domain. In an instance, in image data, low-frequency components typically represent broad structures, while high-frequency components correspond to fine details and edges. In another instance, for audio data, the FFT may be used to identify dominant frequencies or harmonic structures. Details related to the identification of the frequency components are further described, for example, in FIG. 3.

[0041] The system 102 may further be configured to suppress one or more frequency components among the identified frequency components. In an embodiment, the system 102 may detect one or more undesired frequency components corresponding to the extracted one or more major features transformed into the frequency domain. Once the undesired frequency components are detected, the system 102 may employ filtering techniques to suppress the one or more undesired frequency components. In an instance, the system 102 may employ notch filters, which are configured to attenuate specific narrow frequency bands. Alternatively, the system 102 may employ adaptive filters, which dynamically adjust corresponding parameters to target and reduce the undesired frequency components. The filters may be implemented in either the analog or digital domain, depending on the application requirements. The system 102 may calibrate the suppression of the one or more frequency components to ensure that only the undesired frequency components are attenuated while preserving the integrity of the desired frequency components. The selective attenuation may improve signal-to-noise ratio and enhance overall quality and usability of the major features (transformed into the frequency domain) for subsequent processing or analysis. Details related to the suppression of one or more frequency components are further described, for example, in FIG. 3.

[0042] The system 102 may further be configured to generate signal domain data corresponding to the extracted one or more major features based on the suppression of the one or more frequency components. In an exemplary embodiment, the system 102 may process the extracted one or more major features to generate sparse compressed data in the spatial domain corresponding to the input data 114. The system 102 may further apply a transformation model (for instance, the first transformation model 104A) of the set of transformation models 104 to transform the generated sparse compressed data in the spatial domain into a signal domain. Details related to the generation of the signal domain data are further described, for example, in FIG. 3. [0043] The system 102 may further be configured to transform the generated signal domain data from the frequency domain to the spatial domain based on an inverse transformation of the generated signal domain data. The inverse transformation may include applying inverse mathematical operations to reconstruct the generated signal domain data from the frequency domain to the spatial domain. The process of reconstruction may involve summing up all the sinusoidal components (sine and cosine waves) with respective amplitudes and phases, as determined in the frequency domain. Details related to the transformation of the generated signal domain data are further described, for example, in FIG. 3.

[0044] The system 102 may further be configured to generate output metadata based on the inverse transformation of the generated signal domain data. The generated output metadata may include an anonymized version of the specific information included in the input data. The anonymized version of the specific information may be interpretable by an Artificial Intelligence (Al) model and uninterpretable by a human. In an exemplary embodiment, the system 102 may anonymize the PH 114A, which may be included in the input data 114. The anonymized PH 114A may be interpretable by the Artificial Intelligence (Al) model and may be uninterpretable by the human. Details related to the generation of the output metadata are further described, for example, in FIG. 3.

[0045] The system 102 may further be configured to encode the generated output metadata in a specific file format. In an embodiment, the system 102 may encode the generated output metadata in the same file format as that of the input data 114. For instance, if the input data 114 incudes the image data in JPG format, then the system 102 may encode the generated output metadata in the JPG format itself. In another instance, if the input data 114 incudes audio data in MP3/ WAV format, then the system 102 may encode the generated output metadata in the same format. The encoding process may enhance the security and portability of the transformed data.

[0046] The system 102 may further be configured to add specific metadata to the encoded generated output metadata. The added specific metadata may correspond to a watermarking scheme. In an instance, the watermarking scheme may be used to track or verify (based on the masking) the input data 114, as the input data 114 may be transmitted to other devices or retrieved from the database 110.

[0047] FIG. 2 is a block diagram that illustrates an exemplary system of FIG. 1 , in accordance with an embodiment of the disclosure. FIG. 2 is explained in conjunction with elements from FIG. 1. With reference to FIG. 2, there is shown a block diagram 200 of the system 102. The system 102 may include circuitry 202, a memory 204, a network interface 206, and an input/output (I/O) device 208. The I/O device 208 may include a display device 208A. The memory 204 may include the set of transformation models 104, the masking model 106, and the input data 114. The network interface 206 may connect the system 102 with the server 108, via the communication network 112.

[0048] The circuitry 202 may include suitable logic, circuitry, and/or interfaces that may be configured to execute program instructions associated with different operations to be executed by the system 102. The operations may include, for instance, input data reception, major features extraction, frequency components identification, signal domain data generation, and output metadata generation. The circuitry 202 may include one or more processing units, which may be implemented as a separate processor. In an embodiment, the one or more processing units may be implemented as an integrated processor or a cluster of processors that perform functions of one or more specialized processing units, collectively. The circuitry 202 may be implemented based on a number of processor technologies known in the art. Examples of implementations of the circuitry 202 may be an X86-based processor, a Graphics Processing Unit (GPU), a Reduced Instruction Set Computing (RISC) processor, an Application-Specific Integrated Circuit (ASIC) processor, a Complex Instruction Set Computing (CISC) processor, a microcontroller, a central processing unit (CPU), and/or a combination thereof.

[0049] The memory 204 may include suitable logic, circuitry, interfaces, and/or code that may be configured to store one or more instructions to be executed by the circuitry 202. The one or more instructions stored in the memory 204 may be configured to execute the different operations of the circuitry 202 (and/or the system 102). The memory 204 may be further configured to store the set of transformation models 104, the masking model 106, and the input data 114. Examples of implementation of the memory 204 may include, but are not limited to, Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Hard Disk Drive (HDD), a Solid-State Drive (SSD), a CPU cache, and/or a Secure Digital (SD) card.

[0050] The network interface 206 may include suitable logic, circuitry, interfaces, and/or code that may be configured to facilitate communication between the system 102 and the server 108, via the communication network 112. The network interface 206 may be implemented by use of various known technologies to support wired or wireless communication of the system 102 with the communication network 112. The network interface 206 may include, but is not limited to, an antenna, a radio frequency (RF) transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a coder-decoder (CODEC) chipset, a subscriber identity module (SIM) card, or a local buffer circuitry.

[0051] The network interface 206 may be configured to communicate via a wireless communication with networks, such as the Internet, an Intranet, a wireless network, a cellular telephone network, a wireless local area network (LAN), or a metropolitan area network (MAN). The wireless communication may be configured to use one or more of a plurality of communication standards, protocols and technologies, such as Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), wideband code division multiple access (W-CDMA), Long Term Evolution (LTE), 5^th Generation (5G) New Radio (NR), code division multiple access (CDMA), time division multiple access (TDMA), Bluetooth, Wireless Fidelity (Wi-Fi) (such as IEEE 802.11a, IEEE 802.11 b, IEEE 802.11g or IEEE 802.11n), voice over Internet Protocol (VoIP), light fidelity (Li-Fi), Worldwide Interoperability for Microwave Access (Wi-MAX), a protocol for email, instant messaging, and a Short Message Service (SMS).

[0052] The I/O device 208 may include suitable logic, circuitry, interfaces, and/or code that may be configured to receive the input data 114 and provide the generated output metadata including an anonymized version of the specific information included in the input data 114. For example, the I/O device 208 may receive audio data or image data as the input data 114. The I/O device 208 may be further configured to render the generated output metadata on a user interface, for instance, a user device. Examples of the I/O device 208 may include, but are not limited to, a display (e.g., a touch screen), a keyboard, a mouse, a joystick, a microphone, or a speaker. Examples of the I/O device 208 may further include braille I/O devices, such as, braille keyboards and braille readers.

[0053] The display device 208A may include suitable logic, circuitry, and interfaces that may be configured to display or render the anonymized version of the specific information included in the input data 114. In some embodiments, the display device 208A may be a touch screen which may enable a user to provide a user-input via the display device 208A. The display device 208A may be realized through several known technologies such as, but not limited to, at least one of a Liquid Crystal Display (LCD) display, a Light Emitting Diode (LED) display, a plasma display, or an Organic LED (OLED) display technology, or other display devices. In accordance with an embodiment, the display device 208A may refer to a display screen of a head mounted device (HMD), a smart-glass device, a see-through display, a projection-based display, an electro-chromic display, or a transparent display. Various operations of the circuitry 202 are described further, for example, in FIG. 3A and FIG. 3B.

[0054] FIG. 3A and FIG. 3B are diagrams that collectively illustrate an execution pipeline for processing input data, in accordance with an embodiment of the disclosure. FIG. 3A and FIG. 3B are explained in conjunction with elements from FIG. 1 and FIG. 2. With reference to FIG. 3A and FIG. 3B, there is shown an exemplary execution pipeline 300 including a sequence of operations that may be executed by the circuitry 202 or the system 102. The sequence of operations may be executed for anonymization of PH 114A in the input data 114, for instance image data. The sequence of operations that may start at 302 and may terminate at 314.

[0055] With reference to FIG. 3A, at 302, an input data may be received. The circuitry 202 may receive the input data 114. For instance, an input image may be received as the input data 114. In an instance, the input image may include a human face, and the corresponding personally identifiable information may include facial features of the human face. In another case, the received input image may include a vehicle, and the corresponding personally identifiable information may include a license plate number of the vehicle. Further, the received input image may be an independent image. Alternatively, the input image may be extracted from a video. Further, the input image may be transmitted to the circuitry 202 through a user device. Alternatively, the circuitry 202 may retrieve the input image from the server 108, the database 110, or the memory 204. Alternatively, the input image may be captured through an imaging device in real time (or near-real time) and transmitted to the circuitry 202.

[0056] At 304A, a first transformation model may be applied to the input data. The circuitry 202 may be configured to apply the first transformation model 104A of the set of transformation models 104 on the input data 114. For example, the circuitry 202 may be configured to extract a first set of features of the input image based on the application of the first transformation model 104A. The circuitry 202 may extract one or more major features from the first set of features in a spatial domain corresponding to the input image.

[0057] At 304B, a second transformation model may be applied to the input data. The circuitry 202 may be configured to apply the second transformation model 104B of the set of transformation models 104 on the input data 114. For example, the circuitry 202 may be configured to extract a second set of features of the input image based on the application of the second transformation model 104B. Further, the circuitry 202 may extract one or more major features from the second set of features in the spatial domain corresponding to the input image. The extracted one or more major features may represent key characteristics of the input image. In an embodiment, the first transformation model 104A and the second transformation model 104B may be feature extraction models. Further, the circuitry 202 may be configured to execute at least one of a Scale-Invariant Feature Transform (SIFT) or a Hough Transformation, based on the application of the feature extraction model. In an embodiment, each of the first set of features and the second set of features may form a part of the extracted features of the input data 114.

[0058] In an embodiment, the first transformation model 104A and the second transformation model 104B may be applied simultaneously to the input image. For example, the first transformation model 104A may employ edge detection techniques to identify object boundaries in the input image, while the second transformation model 104B may use texture analysis to capture surface patterns in the input image.

[0059] In a similar manner, the first transformation model 104A and the second transformation model 104B may be applied to audio data. The first transformation model 104A may extract temporal features like rhythm from the audio data, while the second transformation model 104B may extract spectral characteristics of the audio data. Alternative implementations may also include wavelet transforms for multi-resolution analysis or deep learning-based feature extractors trained on domain-specific datasets.

[0060] At 306, a third transformation model may be applied to the input data. The circuitry 202 may be configured to apply a third transformation model (for instance, the third transformation model 104C) of the set of transformation models 104 on the input data 114. The third transformation model 104C may encompass multiple methods that may be executed in parallel. Examples of such methods may include Principal Component Analysis (PCA) and t-Distributed Neighbor Embedding (t-SNE). The third transformation model 104C may be applied for dimensionality reduction and data visualization.

[0061] For instance, in a scenario that involves high-dimensional input data, the PCA method may be used to identify the most significant variables that contribute to data variance in the input data 114. In case of the t-SNE method, a two-dimensional representation of the input data 114 may be created such that local relationships between data points associated with the high-dimensional input data may be preserved.

[0062] As shown in FIG. 3A, the application of the third transformation model at 306 may include various operations, such as, an application of the t-SNE method at 308A, an application of PCA method at 310A, an application of the t-SNE method at 308B, and an application of PCA method at 310B. The operations at 308A and 310A may be executed on an output determined (for instance, first set of features) based on the application of the first transformation model 104A, at 304A. The operations at 308B and 31 OB may be executed on an output determined (for instance, second set of features) based on the application of the second transformation model 104B, at 304B. The operations at 308A, 31 OA, 308B, and 31 OB are explained herein.

[0063] At 308A, the t-SNE method may be applied. The circuitry 202 may apply the t-SNE method to extract the one or more major features in the spatial domain based on the application of the first transformation model 104A. The t-SNE method involves a machine learning algorithm for visualization of high-dimensional data associated with the extracted features, which may convert similarities between data points to joint probabilities and may minimize a Kullback-Leibler divergence between joint probabilities of a lower-dimensional embedding and higher-dimensional data. The t-SNE method may be employed to create a low-dimensional representation of the input data 114 that may preserve local relationships between the data points. For instance, in natural language processing, the t-SNE method may help visualize semantic relationships between words.

[0064] At 310A, the PCA method may be applied. The circuitry 202 may apply the PCA method to extract the one or more major features in the spatial domain based on the application of the first transformation model 104A. The PCA method involves a statistical procedure that uses orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. In an instance, the PCA method may be used to reduce the dimensionality of the extracted features while the most significant variations may be retained. For example, in image processing, the PCA method may be applied to compress facial features for efficient face recognition. In audio data analysis, the PCA method may be used to identify the most significant key notes. Alternative implementations may include Kernel PCA for non-linear dimensionality reduction or Incremental PCA to handle large datasets that don't fit in the memory 204.

[0065] At 308B, the t-SNE method may be applied. The circuitry 202 may apply the t-SNE method to extract the one or more major features in the spatial domain based on the application of the second transformation model 104B (the application of the t-SNE method is elaborated at 308A).

[0066] At 31 OB, the PCA method may be applied. The circuitry 202 may apply the PCA method to extract the one or more major features in the spatial domain based on the application of the second transformation model 104B (the application of the PCA method is elaborated at 31 OA).

[0067] At 312, data related to major features may be concatenated. The circuitry 202 may perform the data concatenation based on a combination of data related to the one or more major features extracted at operations 308A, 308B, 31 OA, and 31 OB of the third transformation model (for instance, the third transformation model 104C), at 306. To concatenate the data related to the extracted one or more major features, the circuitry 202 may align and merge feature vectors associated with the one or more major features from the different transformations. For example, in a multi-modal data analysis scenario, at least one of the one or more major features extracted from the image data may be concatenated with major features from corresponding text data. In time series analysis, major features from different time scales or domains (e.g., time domain and frequency domain) may be combined. In an instance, the circuitry 202 may perform weighted concatenation, where major features from different sources may be assigned different importance based on the data. In another instance, the circuitry 202 may perform adaptive concatenation, where the concatenation strategy may be learned based on the data. [0068] At 314, an operation for sparsification of compressed data in feature space may be performed. The circuitry 202 may be configured to perform the sparsification of compressed data in feature space (for example, the feature vectors associated with the concatenated data), which may involve creation of a more compact representation of the compressed data while retaining essential features. Techniques such as thresholding, where small values are set to zero, or L1 regularization, which encourages sparsity, may be employed for the sparsification. For instance, in natural language processing, the sparsification may involve use of only the most informative words in a document representation. In image processing, the sparsification may involve use of only the most significant wavelet coefficients.

[0069] With reference to FIG. 3B, at 316, a fourth transformation model may be applied. The circuitry 202 may be configured to apply the fourth transformation model (for instance, the Nth transformation model 104N) of the set of transformation models 104 on the sparsified data. The circuitry 202, based on the application of the fourth transformation model, may convert the sparsified data from the spatial domain to a frequency domain. The process of conversion may utilize advanced signal processing techniques such as wavelet transforms or Gabor filters to capture both spatial and frequency information simultaneously. For example, in image processing applications, the conversion may help identify texture patterns or recurring structures that are not easily discernible in the spatial domain.

[0070] As shown in FIG. 3A, the operation 316 (i.e., the application of a fourth transformation model) may include sub-operations, such as, an application of FFT at 318, a generation of signal domain data corresponding to major features at 320, a suppression of top-N major features at 322, and an application of I FFT at_324. The sub-operations 318, 320, 322, and 324 are described herein.

[0071] At 318, a Fast Fourier Transform (FFT) may be applied. The circuitry 202 may be configured to apply the FFT to convert the spatial domain representation of the sparsified data into the frequency domain representation using the Fast Fourier Transform. The FFT may reveal frequency components corresponding to various characteristics of the sparsified data. In image processing, low-frequency components typically represent broad structures, while high-frequency components correspond to fine details and edges. For audio signal processing, the FFT may be used to identify dominant frequencies or harmonic structures. In another embodiment, the circuitry 202 may apply the DFT to convert the spatial domain representation of the sparsified data into the frequency domain representation (which works in a similar manner).

[0072] At 320, signal domain data corresponding to major features may be generated. The circuitry 202 may be configured to generate the signal domain data corresponding to the major features based on the application of the FFT. The generation of the signal domain data may involve analysis of all the frequency components associated with the major features transformed into the frequency domain to identify the frequency components most significant or representative of the input image. Techniques such as spectral kurtosis or higher-order statistics may be employed to detect non-Gaussian components that could represent the major features.

[0073] At 322, significant major features may be suppressed. The circuitry 202 may be configured to suppress of the significant major features associated with the input image. The suppression of the significant major features (say, top-N major features) may be crucial in the process of anonymization, which may involve selective suppression or modification of certain frequency components identified as potentially revealing sensitive information. The number N may be dynamically adjusted based on a degree of anonymization required or a selection of a number of the major features required. In an instance, advanced techniques such as adaptive thresholding or machine learning-based feature selection may be employed to optimize the suppression process. In scenarios involving facial recognition data, the process of suppression may involve suppression of features that correspond to unique facial characteristics. For voice data anonymization, the process of suppression may involve modification of formant frequencies while overall speech intelligibility may be preserved.

[0074] At 324, an Inverse Fast Fourier Transform (IFFT) may be applied. The circuitry 202 may be configured to apply the IFFT to convert the modified frequency domain representation back into the spatial domain. The process of conversion may involve techniques to mitigate artifacts introduced during the frequency domain modifications, such as windowing or overlapadd methods. In image processing applications, the process of IFFT may include postprocessing to enhance edge preservation or reduce ringing artifacts. For audio signals, phase vocoder techniques may be employed to maintain time-scale consistency.

[0075] At 326, output metadata may be generated. The circuitry 202 may be configured to generate the output metadata based on the inverse transformation (e.g., through the application of the IFFT) of the generated signal domain data. The process of generation of the output metadata may incorporate advanced data structuring techniques to ensure compatibility with various Al frameworks. For instance, in scenarios that involve medical imaging data, the output metadata may be formatted to comply with Digital Imaging and Communications in Medicine (DICOM) standards while anonymization may be preserved. In natural language processing applications, the output metadata may be structured as word embeddings or contextual representations.

[0076] The output metadata may include an anonymized version of the specific information (for instance, the PI I 114A such as facial features) included in the input data 114 (such as the input image). The anonymized version of the specific information may be interpretable by an Artificial Intelligence (Al) model and uninterpretable by a human.

[0077] The circuitry 202 may generate metadata in the same format as the input data 114 to maintain consistency and allow seamless integration with existing Al models and processing pipelines. For instance, if the input data 114 is an RGB image with dimensions of 224x224x3, the generated output metadata may maintain the same dimensions and structure. The approach could be extended to various data types, such as, to maintain the time series structure for sensor data or preserve the hierarchical structure of extensible Markup Language (XML) documents.

[0078] FIG. 4 is a flow diagram that illustrates execution pipeline for automating generation of metadata based on knob parameters, in accordance with an embodiment of the disclosure. FIG. 4 is described in conjunction with elements from FIG. 1 , FIG. 2, FIG. 3A, and FIG. 3B. With reference to FIG. 4, there is shown an exemplary flow diagram 400 including a sequence of operations for automating generation of output metadata that may be executed by the circuitry 202 or the system 102. The system 102 may include a neural network model (not shown).

[0079] At 402, the circuitry 202 may retrieve a set of knob parameters associated with the extraction of the one or more major features. At 404, the set of knob parameters associated with the extraction of the one or more major features may be fed into the neural network model. In an instance, the set of knob parameters may correspond to a privacy requirement of a downstream training task. Further, the circuitry 202 may train the neural network model to learn the set of knob parameters associated with the extraction of the one or more major features. In an example, the set of knob parameters may correspond to weights, hyperparameters, or other parameters for training/finetuning of the neural network model. In an example, the neural network model may correspond to the masking model 106. [0080] The circuitry 202 may further apply the trained neural network model on the input data 114 for automating generation of the output metadata. The application of the trained neural network model may control the set of knob parameters associated with the extraction of the one or more major features to protect privacy of the input data 114 and to reduce any possibility of cyberattacks on the input data 114.

[0081] FIG. 5 is a diagram that illustrates an exemplary scenario for conversion of output metadata generated based on anonymization to machine-interpretable data, in accordance with an embodiment of the disclosure. FIG. 5 is described in conjunction with elements from FIG. 1 , FIG. 2, FIG. 3A, and FIG. 3B. With reference to FIG. 5, there is shown an exemplary scenario 500. The scenario 500 may include reception of output metadata at 502, an operation for an extra-sparse representation in compressed space at 504, an operation for a stepwise feature upscaling at 506, and generation of machine-interpretable data at 508. The various operations (i.e., at 504 and 506) of the scenario 500 may be executed by the system 102 or the circuitry 202.

[0082] As shown in FIG. 5, the circuitry 202 may be configured to transmit the generated output metadata to a user interface, at 502. In some instances, the output metadata may include anonymized specific information (e.g., the PH 114A) derived from the input data 114. The generation of the output metadata is described further, for example, in FIG. 3B (at 326). [0083] At 504, the operation for extra-sparse representation of the output metadata in a compressed space may be performed, the circuitry 202 may be configured to perform the extra-sparse representation of the output metadata in the compressed space. The process of the extra-sparse representation may involve further compression of the output metadata to create a more compact representation. In some instances, the extra-sparse representation may be achieved through techniques such as sparse coding or dictionary learning. For example, in a scenario that involves image data, the extra-sparse representation may capture essential visual patterns using a minimal set of basis-functions, based on techniques such as, K-SVD (K-Singular Value Decomposition) for dictionary learning. In natural language processing applications, the extra-sparse representation may involve representation of text data using latent semantic structures, through techniques like Non-negative Matrix Factorization (NMF) or Latent Dirichlet Allocation (LDA).

[0084] At 506, the operation for stepwise feature upscaling may be performed. The circuitry 202 may perform the stepwise feature upscaling based on gradual reconstruction of the output metadata from a compressed form into the machine-interpretable data at 508. The machine- interpretable data may be suitable for Al model interpretation. In some instances, the process of stepwise feature upscaling may employ techniques such as progressive growth of neural networks or multi-scale feature synthesis. For instance, in case of image data, the process of feature upscaling may involve application of super-resolution techniques to enhance detail and maintain anonymization, using methods like SRGAN (Super-Resolution Generative Adversarial Network) or ESRGAN (Enhanced SRGAN). In audio processing scenarios, the stepwise feature upscaling may reconstruct frequency components in a manner that preserves overall acoustics characteristics without revealing identifiable voice features, using techniques like WaveNet for high-fidelity audio generation. For text data, the process of feature upscaling may involve hierarchical decoding to reconstruct linguistic structures from compressed representations, using transformer-based models with progressive decoding strategies.

[0085] Based on the feature upscaling (at 506), the circuitry 202 may be configured to generate the machine-interpretable data (at 508) that may retain the essential characteristics and statistical properties of the input data 114 and allow Al models to perform effectively on the generated data. In some instances, the machine-interpretable data may be optimized for specific Al architectures. For example, in computer vision applications, the machine- interpretable data may be structured as feature maps compatible with convolutional neural networks, using techniques like channel-wise attention mechanisms or spatial pyramid pooling for enhanced feature representation. In natural language processing tasks, the machine-interpretable data may take the form of contextualized word embeddings or attention maps, using methods like BERT (Bidirectional Encoder Representations from Transformers) or GPT (Generative Pre-trained Transformer) for contextual encoding. For time-series analysis, the machine-interpretable data may be formatted as recurrent neural network states or wavelet coefficients.

[0086] The machine-interpretable data may be in a format same as a format of the input data 114. The consistency in format may allow for seamless integration with existing Al models and processing pipelines. For instance, if the input data 114 is an RGB image with specific dimensions, the generated machine-interpretable data may maintain the same dimensions and structure, albeit with transformed content that is no longer visually interpretable as an image by humans. In the case of audio data, the generated machine- interpretable data may preserve the same sampling rate and duration as the input, but with frequency components altered to mask identifiable characteristics. For text data, the generated machine-interpretable data may maintain the same sequence length and tokenization scheme as the input, but with individual tokens replaced by anonymized representations.

[0087] The machine-interpretable data may be usable to train Al models. In some embodiments, the anonymized yet statistically relevant nature of the machine-interpretable data may allow for effective training of the Al models while individual privacy may not be compromised. For example, in healthcare applications, Al models may be trained on anonymized patient data to identify disease patterns without exposure to personal health information, using federated learning techniques to keep data decentralized.

[0088] The combination of the operations such as, the extra-sparse representation at 504 and the stepwise feature upscaling at 506 for the generation of machine-interpretable data at 508 may ensure that the generated machine-interpretable data remains useful for Al applications while being uninterpretable by humans. The generated machine-interpretable data may address privacy concerns in various domains where sensitive data needs to be analyzed without risk to individual identification. For instance, in smart city applications, data from various sensors may be processed to enable Al-driven urban planning while citizens' privacy may be protected, using techniques like edge computing for local data processing and aggregation.

[0089] FIG. 6A is a diagram that illustrates producer plugin flowchart, in accordance with an embodiment of the disclosure. FIG. 6A is described in conjunction with elements from FIG. 1 , FIG. 2, FIG. 3A, FIG. 3B, FIG. 4, and FIG. 5. With reference to FIG. 6A, there is shown a producer plugin flowchart 600A. The flowchart 600A may include operations from 602 to 614 and may be implemented by the system 102 of FIG. 1 or by the circuitry 202 of FIG. 2. The flowchart 600A may start at 602 and proceed to 604.

[0090] At 604, the circuitry 202 may receive the input data 114 including PH 114A. In an embodiment, for the input data 114 including an image of a person, the PH 114A may be facial features of the person. In another embodiment, for the input data 114 including an audio of a person, the PH 114A may be voice, pitch, or tone of the person.

[0091] At 606, the circuitry 202 may extract one or more major features in a spatial domain corresponding to the input data 114. In an embodiment, the circuitry 202 may extract features of the input data 114 based on the application of the feature extraction model. Further, the circuitry 202 may extract one or more major features in a spatial domain corresponding to the input data 114 from the extracted features of the input data 114. In an instance, when the image data includes a human face, the major features extracted may include the facial features of the human face. In another instance, when the audio data includes a human voice, the major features extracted may include the pitch, the tone, the volume, the articulation, and the vocal qualities of the human voice. In an embodiment, the circuitry 202 may extract the one or more major features based on one of a selection of a number of the one or more features to be extracted or a degree of anonymization of the specific information.

[0092] At 608, the circuitry 202 may apply feature transformations to the one or more major features. In an embodiment, the circuitry 202 may identify frequency components corresponding to the extracted one or more major features transformed into the frequency domain. In an instance, FFT or DFT may reveal frequency components corresponding to the one or more major features transformed into the frequency domain. Further, the circuitry 202 may detect one or more undesired frequency components corresponding to the extracted one or more major features transformed into the frequency domain, and may further employ filtering techniques to suppress the one or more undesired frequency components.

[0093] At 610, the circuitry 202 may apply domain transformations based on the suppression of the one or more frequency components. The circuitry 202 may generate signal domain data corresponding to the extracted one or more major features based on the suppression of the one or more frequency components. In an instance, the circuitry 202 may process the extracted one or more major features to generate sparse compressed data in the spatial domain corresponding to the input data 114. The circuitry 202 may further apply a transformation model (for instance, the first transformation model 104A) of the set of transformation models 104 to transform the generated sparse compressed data in the spatial domain into a signal domain.

[0094] At 612, the circuitry 202 may apply the trained neural network model on the input data 114 for automating generation of the output metadata. The application of the trained neural network model may control the set of knob parameters associated with the extraction of the one or more major features to protect privacy of the input data 114 and to reduce any possibility of cyberattacks on the input data 114.

[0095] At 614, the circuitry 202 may save the generated output metadata in a specific file format. In an embodiment, the circuitry 202 may encode and save the generated output metadata in the same file format as that of the input data 114. Control may pass to end.

[0096] Although the flowchart 600A is illustrated as discrete operations, such as, 604, 606, 608, 610, 612, and 614, the disclosure is not so limited. Accordingly, in certain embodiments, such discrete operations may be further divided into additional operations, combined into fewer operations, or eliminated, depending on the implementation without detracting from the essence of the disclosed embodiments.

[0097] FIG. 6B is a diagram that illustrates consumer plugin flowchart, in accordance with an embodiment of the disclosure. FIG. 6B is described in conjunction with elements from FIG.

1 , FIG. 2, FIG. 3A, FIG. 3B, FIG. 4, FIG. 5, and FIG. 6A. With reference to FIG. 6B, there is shown a producer plugin flowchart 600B. The flowchart 600B may include operations from 616 to 622 and may be implemented by the system 102 of FIG. 1 or by the circuitry 202 of FIG. 2. The flowchart 600B may start at 616 and proceed to 618.

[0098] At 618, the circuitry 202 may load the output metadata (saved at 614) to a consumer plugin. The consumer plugin may be an application program of a user computing device. Further, at 620, the circuitry 202 may perform stepwise feature upscaling based on gradual reconstruction of the output metadata from a compressed form into the machine-interpretable data. The machine-interpretable data may be suitable for Al model interpretation. In an embodiment, the machine-interpretable data may retain the essential characteristics and statistical properties of the input data 114.

[0099] At 622, the circuitry 202 may convert the machine-interpretable data into a trainable format. In an embodiment, the machine-interpretable data may be in a format same as a format of the input data 114. The consistency in format may allow for seamless integration with existing Al models and processing pipelines. The machine-interpretable data may be usable to train Al models. The anonymized yet statistically relevant nature of the machine- interpretable data may allow for effective training of the Al models while individual privacy may not be compromised. Control may pass to end.

[0100] Although the flowchart 600B is illustrated as discrete operations, such as, 618, 620, and 622, the disclosure is not so limited. Accordingly, in certain embodiments, such discrete operations may be further divided into additional operations, combined into fewer operations, or eliminated, depending on the implementation without detracting from the essence of the disclosed embodiments.

[0101] FIG. 7 is a flowchart that illustrates exemplary operations of a method for Artificial Intelligence (Al)-interpretable and human-uninterpretable anonymization of user data, in accordance with an embodiment of the disclosure. FIG. 7 is described in conjunction with elements from FIG. 1 , FIG. 2, FIG. 3A, FIG. 3B, FIG. 4, FIG. 5, FIG. 6A, and FIG. 6B. With reference to FIG. 7, there is shown a flowchart 700. The flowchart 700 may include operations from 702 to 718 and may be implemented by the system 102 of FIG. 1 or by the circuitry 202 of FIG. 2. The flowchart 700 may start at 702 and proceed to 704.

[0102] At 704, input data including specific information may be received. The circuitry 202 may be configured to receive the input data 114, which may include audio data or image data. Further, the specific information may be personal identifiable information 114A included in the input data 114. In an instance, for the input data 114 including an image of a person, the specific information may include personal identifiable information 114A (for instance, facial features) of the person. In another instance, for the input data 114 including an audio of a person, the specific information may include corresponding personal identifiable information 114A (for instance, voice, pitch, tone, etc.) of the person. Details related to the reception of the input data 114 are further described, for example, in FIG.3.

[0103] At 706, one or more major features may be extracted. The circuitry 202 may be configured to extract one or more major features in a spatial domain corresponding to the input data 114. In an embodiment, the system 102 may extract features of the input data 114 based on the application of the feature extraction model. The circuitry may be configured to execute at least one of the SIFT or the Hough Transformation to extract the features of the input data 114, based on the application of the feature extraction model. Further, the system 102 may extract the one or more major features corresponding to the input data 114 from the extracted features of the input data 114. In an instance, the image data may include a human face, the features associated with the image data may include facial features, texture, color, background, dimensions, and the like. The major features extracted may include facial features of the human face. In an instance, the audio data may include a human voice, the features associated with the audio data may include pitch, tone, volume, articulation, and vocal qualities of the human voice, background noise, file format, duration, sample rate, and the like. The major features extracted may include pitch, tone, volume, articulation, and vocal qualities of the human voice. In an embodiment, the system 102 may extract the one or more major features based on one of a selection of a number of the one or more features to be extracted or a degree of anonymization of the specific information required. In an embodiment, the system 102 may extract the one or more major features in the spatial domain based on a statistical method. In an exemplary embodiment, the statistical method may include at least one of a t-Distributed Neighbor Embedding (t-SNE) method or a Principal Component Analysis (PCA) method. Details related to the extraction of the one or more major features are further described, for example, in FIG.3.

[0104] At 708, extracted one or more major features may be transformed into a frequency domain. In an instance, the system 102 may transform the extracted one or more major features into the frequency domain based on at least one of a fast Fourier transform (FFT) or a discrete Fourier transform (DFT). The system 102 may process each of the extracted one or more major features to transform into the frequency domain. In an instance, in image data, the transformed data may include broad structures as well as fine details and edges in the frequency domain. Details related to the transformation of the one or more major features are further described, for example, in FIG.3.

[0105] At 710, frequency components corresponding to the one or more transformed major features may be identified. The system 102 may further be configured to identify frequency components corresponding to the extracted one or more major features transformed into the frequency domain. The FFT or DFT may reveal frequency components corresponding to the one or more major features transformed into the frequency domain. In an instance, in image data, low-frequency components typically represent broad structures, while high-frequency components correspond to fine details and edges. In another instance, for audio data, the FFT may be used to identify dominant frequencies or harmonic structures. Details related to the identification of the frequency components are further described, for example, in FIG.3.

[0106] At 712, one or more frequency components among the identified frequency components may be suppressed. In an embodiment, the system 102 may detect one or more undesired frequency components corresponding to the extracted one or more major features transformed into the frequency domain. Once the undesired frequency components are detected, the system 102 may employ filtering techniques to suppress the one or more undesired frequency components. In an instance, the system 102 may employ notch filters, which are configured to attenuate specific narrow frequency bands. Alternatively, the system 102 may employ adaptive filters, which dynamically adjust corresponding parameters to target and reduce the undesired frequency components. The filters may be implemented in either the analog or digital domain, depending on the application requirements. The system 102 may calibrate the suppression of the one or more frequency components to ensure that only the undesired frequency components are attenuated while preserving the integrity of the desired frequency components. The selective attenuation may improve signal-to-noise ratio and enhance overall quality and usability of the major features (transformed into the frequency domain) for subsequent processing or analysis. Details related to the suppression of one or more frequency components are further described, for example, in FIG. 3.

[0107] At 714, signal domain data corresponding to the extracted one or more major features may be generated. The system 102 may further be configured to generate signal domain data corresponding to the extracted one or more major features based on the suppression of the one or more frequency components among the identified frequency components. In an exemplary embodiment, the system 102 may process the extracted one or more major features to generate sparse compressed data in the spatial domain corresponding to the input data. The system 102 may further apply a transformation model (for instance, first transformation model 104A) of the set of transformation models 104 to transform the generated sparse compressed data in the spatial domain into a signal domain. Details related to the generation of the signal domain data are further described, for example, in FIG.3.

[0108] At 716, the generated signal domain data may be transformed from the frequency domain to the spatial domain. The system 102 may further be configured to transform the generated signal domain data from the frequency domain to the spatial domain based on an inverse transformation of the generated signal domain data. The inverse transformation may include applying inverse mathematical operations to reconstruct the generated signal domain data from the frequency domain to the spatial domain. The process of reconstruction may involve summing up all the sinusoidal components (sine and cosine waves) with respective amplitudes and phases, as determined in the frequency domain. Details related to the transformation of the generated signal domain data are further described, for example, in FIG. 3.

[0109] At 718, output metadata may be generated. The system 102 may further be configured to generate output metadata based on inverse transformation of the generated signal domain data from the frequency domain to the spatial domain. The generated output metadata may include an anonymized version of the specific information included in the input data. The anonymized version of the specific information may be interpretable by an Artificial Intelligence (Al) model and may be uninterpretable by a human. In an exemplary embodiment, the system 102 may anonymize the PH. which may be included in the input data. The anonymized PH may be interpretable by the Artificial Intelligence (Al) model and may be uninterpretable by the human. The system 102 may further be configured to generate signal domain data corresponding to the extracted one or more major features based on suppression of one or more frequency components among the identified frequency components. In an exemplary embodiment, the system 102 may process the extracted one or more major features to generate sparse compressed data in the spatial domain corresponding to the input data. The system 102 may further apply a transformation model (for instance, first transformation model 104A) of the set of transformation models 104 to transform the generated sparse compressed data in the spatial domain into a signal domain. Details related to the generation of the output metadata are further described, for example, in FIG.3. Control may pass to end.

[0110] Although the flowchart 700 is illustrated as discrete operations, such as, 704, 706, 708, 710, 712, 714, 716, and 718, the disclosure is not so limited. Accordingly, in certain embodiments, such discrete operations may be further divided into additional operations, combined into fewer operations, or eliminated, depending on the implementation without detracting from the essence of the disclosed embodiments.

[0111] Various embodiments of the disclosure may provide a non-transitory computer- readable medium and/or storage medium having stored thereon, computer-executable instructions executable by a machine and/or a computer to operate a system (for example, the system 102 of FIG. 1). Such instructions may cause the system 102 to perform operations that may include receipt of input data (for example, the input data 114 of FIG. 1) that includes specific information. The operations may further include extraction of one or more major features in a spatial domain corresponding to the input data 114. The extracted one or more major features represent the input data 114. The operations may further include transformation of the extracted one or more major features into a frequency domain. The operations may further include identification of frequency components corresponding to the extracted one or more major features transformed into the frequency domain. The operations may further include suppression of one or more frequency components among the identified frequency components. The operations may further include generation of signal domain data corresponding to the extracted one or more major features based on the suppression of the one or more frequency components. The operations may further include transformation of the generated signal domain data from the frequency domain to the spatial domain based on an inverse transformation of the generated signal domain data. The operations may further include generation of output metadata based on the inverse transformation of the generated signal domain data. The generated output metadata may include an anonymized version of the specific information included in the input data 114.

[0112] Exemplary aspects of the disclosure may provide a system (such as, the system 102 of FIG. 1) that includes circuitry (such as, the circuitry 202 of FIG. 2). The circuitry 202 may be configured to receive input data (for example, the input data 114 of FIG. 1) that includes specific information. The circuitry 202 may be configured to extract one or more major features in a spatial domain corresponding to the input data 114. The extracted one or more major features represent the input data 114. The circuitry 202 may be configured to transform the extracted one or more major features into a frequency domain. The circuitry 202 may be configured to identify frequency components corresponding to the extracted one or more major features transformed into the frequency domain. The circuitry 202 may be configured to suppress one or more frequency components among the identified frequency components. The circuitry 202 may be configured to generate signal domain data corresponding to the extracted one or more major features based on the suppression of the one or more frequency components. The circuitry 202 may be configured to transform the generated signal domain data from the frequency domain to the spatial domain based on an inverse transformation of the generated signal domain data. The circuitry 202 may be configured to generate output metadata based on the inverse transformation of the generated signal domain data. The generated output metadata may include an anonymized version of the specific information included in the input data 114.

[0113] In an embodiment, the circuitry may further be configured to encode the generated output metadata in a specific file format.

[0114] In an embodiment, the circuitry may further be configured to add specific metadata to the encoded generated output metadata. The added specific metadata may correspond to a watermarking scheme.

[0115] In an embodiment, the circuitry 202 may further be configured to apply a feature extraction model to the input data. The circuitry 202 may further be configured to extract features of the input data 114 based on the application of the feature extraction model. The circuitry 202 may further be configured to extract the one or more major features corresponding to the input data 114 from the extracted features of the input data 114.

[0116] In an embodiment, the circuitry 202 may be configured to execute at least one of a Scale-Invariant Feature Transform (SIFT) or a Hough Transformation, based on the application of the feature extraction model.

[0117] In an embodiment, the circuitry 202 may further be configured to extract the one or more major features based on one of a selection of a number of the one or more features to be extracted or a degree of anonymization of the specific information.

[0118] In an embodiment, the circuitry 202 may further be configured to train a neural network model to learn a set of knob parameters associated with the extraction of the one or more major features. The set of knob parameters may correspond to a privacy requirement of a downstream training task. The circuitry 202 may further be configured to apply the trained neural network model on the input data. The generation of the output metadata may further based on the application of the trained neural network model.

[0119] In an embodiment, the input data 114 may be one of audio data or image data. The specific information may be personal identifiable information (PH) (for example, the personal identifiable information 114A of FIG. 1) included in the input data 114. The circuitry 202 may further be configured to anonymize the PH 114A included in the input data 114. The anonymized PH may be interpretable by the Artificial Intelligence (Al) model and may be uninterpretable by the human.

[0120] In an embodiment, the circuitry 202 may further be configured to extract the one or more major features in the spatial domain based on a statistical method. The statistical method may include at least one of a t-Distributed Neighbor Embedding (t-SNE) method or a Principal Component Analysis (PCA) method.

[0121] In an embodiment, the circuitry 202 may further be configured to generate, based on the extracted one or more major features, sparse compressed data in the spatial domain corresponding to the input data 114. The circuitry 202 may further be configured to transform, based on application of a transformation model (for instance, the first transformation model 104A), the generated sparse compressed data in the spatial domain into a signal domain.

[0122] In an embodiment, the circuitry 202 may further be configured to transform the extracted one or more major features into the frequency domain based on at least one of a fast Fourier transform (FFT) or a discrete Fourier transform (DFT).

[0123] In an embodiment, the circuitry may be configured to convert the generated output metadata into machine interpretable data based on a stepwise feature space upscaling. The machine interpretable data may be in a format same as a format of the input data, the anonymized version of the specific information is interpretable by an Artificial Intelligence (Al) model and is uninterpretable by a human. The machine interpretable data may be usable to train the Al model.

[0124] The present disclosure may be realized in hardware, or a combination of hardware and software. The present disclosure may be realized in a centralized fashion, in at least one computer system, or in a distributed fashion, where different elements may be spread across several interconnected computer systems. A computer system or other apparatus adapted to carry out the methods described herein may be suited. A combination of hardware and software may be a general-purpose computer system with a computer program that, when loaded and executed, may control the computer system such that it carries out the methods described herein. The present disclosure may be realized in hardware that comprises a portion of an integrated circuit that also performs other functions.

[0125] The present disclosure may also be embedded in a computer program product, which comprises all the features that enable the implementation of the methods described herein, and which when loaded in a computer system is able to carry out these methods. Computer program, in the present context, means any expression, in any language, code or notation, of a set of instructions intended to cause a system with information processing capability to perform a particular function either directly, or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form.

[0126] While the present disclosure is described with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made, and equivalents may be substituted without departure from the scope of the present disclosure. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the present disclosure without departure from its scope. Therefore, it is intended that the present disclosure is not limited to the embodiment disclosed, but that the present disclosure will include all embodiments that fall within the scope of the appended claims.

Claims

CLAIMS What is claimed is:

1. A system, comprising: circuitry configured to: receive input data that includes specific information; extract one or more major features in a spatial domain corresponding to the input data, wherein the extracted one or more major features represent the input data; transform the extracted one or more major features into a frequency domain; identify frequency components corresponding to the extracted one or more major features transformed into the frequency domain; suppress one or more frequency components among the identified frequency components; generate signal domain data corresponding to the extracted one or more major features based on the suppression of the one or more frequency components; transform the generated signal domain data from the frequency domain to the spatial domain based on an inverse transformation of the generated signal domain data; and generate output metadata based on the inverse transformation of the generated signal domain data, wherein the generated output metadata includes an anonymized version of the specific information included in the input data.

2. The system according to claim 1 , wherein the circuitry is further configured to encode the generated output metadata in a specific file format.

3. The system according to claim 2, wherein the circuitry is further configured to add specific metadata to the encoded generated output metadata, wherein the added specific metadata corresponds to a watermarking scheme.

4. The system according to claim 1 , wherein the circuitry is further configured to: apply a feature extraction model to the input data; extract features of the input data based on the application of the feature extraction model; and extract the one or more major features corresponding to the input data from the extracted features of the input data.

5. The system according to claim 4, wherein the circuitry is configured to execute at least one of a Scale-Invariant Feature Transform (SIFT) or a Hough Transformation, based on the application of the feature extraction model.

6. The system according to claim 1 , wherein the circuitry is further configured to extract the one or more major features based on one of a selection of a number of the one or more features to be extracted or a degree of anonymization of the specific information.

7. The system according to claim 1 , wherein the circuitry is further configured to: train a neural network model to learn a set of knob parameters associated with the extraction of the one or more major features, wherein the set of knob parameters correspond to a privacy requirement of a downstream training task; and apply the trained neural network model on the input data, wherein the generation of the output metadata is further based on the application of the trained neural network model.

8. The system according to claim 1 , wherein the input data is one of audio data or image data, the specific information is personal identifiable information (PH) included in the input data, and the circuitry is further configured to anonymize the PH included in the input data, wherein the anonymized PH is interpretable by the Artificial Intelligence (Al) model and is uninterpretable by the human.

9. The system according to claim 1 , wherein the circuitry is further configured to extract the one or more major features in the spatial domain based on a statistical method, wherein the statistical method includes at least one of a t-Distributed Neighbor Embedding (t-SNE) method or a Principal Component Analysis (PCA) method.

10. The system according to claim 1 , wherein the circuitry is further configured to: generate, based on the extracted one or more major features, sparse compressed data in the spatial domain corresponding to the input data; and transform, based on application of a transformation model, the generated sparse compressed data in the spatial domain into a signal domain.

11. The system according to claim 1 , wherein the circuitry is configured to convert the generated output metadata into machine interpretable data based on a stepwise feature space upscaling, wherein: the machine interpretable data is in a format same as a format of the input data, the anonymized version of the specific information is interpretable by an Artificial Intelligence (Al) model and is uninterpretable by a human, and the machine interpretable data is usable to train the Al model.

12. A method, comprising: in an electronic device: receiving input data that includes specific information; extracting one or more major features in a spatial domain corresponding to the input data, wherein the extracted one or more major features represent the input data; transforming the extracted one or more major features into a frequency domain; identifying frequency components corresponding to the extracted one or more major features transformed into the frequency domain; suppressing one or more frequency components among the identified frequency components; generating signal domain data corresponding to the extracted one or more major features based on the suppression of the one or more frequency components; transforming the generated signal domain data from the frequency domain to the spatial domain based on an inverse transformation of the generated signal domain data; and generating output metadata based on the inverse transformation of the generated signal domain data, wherein the generated output metadata includes an anonymized version of the specific information included in the input data.

13. The method according to claim 12, further comprising encoding the generated output metadata in a specific file format.

14. The method according to claim 13, further comprising adding specific metadata to the encoded generated output metadata, wherein the added specific metadata corresponds to a watermarking scheme.

15. The method according to claim 12, further comprising: applying a feature extraction model to the input data; extracting features of the input data based on the application of the feature extraction model; and extracting the one or more major features corresponding to the input data from the extracted features of the input data.

16. The method according to claim 12, further comprising extracting the one or more major features based on one of a selection of a number of the one or more features to be extracted or a degree of anonymization of the specific information.

17. The method according to claim 12, further comprising: training a neural network model to learn a set of knob parameters associated with the extraction of the one or more major features, wherein the set of knob parameters correspond to a privacy requirement of a downstream training task; and applying the trained neural network model on the input data, wherein the generation of the output metadata is further based on the application of the trained neural network model.

18. The method according to claim 12, wherein the input data is one of audio data or image data, the specific information is personal identifiable information (PH) included in the input data, and the method further comprising anonymizing the PH included in the input data, wherein the anonymized PH is interpretable by the Artificial Intelligence (Al) model and is uninterpretable by the human.

19. The method according to claim 12, further comprising converting the generated output metadata into machine interpretable data based on a stepwise feature space upscaling, wherein: the machine interpretable data is in a format same as a format of the input data, the anonymized version of the specific information is interpretable by an Artificial

Intelligence (Al) model and is uninterpretable by a human, and the machine interpretable data is usable to train the Al model.

20. A non-transitory computer-readable medium having stored thereon, computer-executable instructions that when executed by an electronic device, causes the electronic device to execute operations, the operations comprising: receiving input data that includes a specific information; extracting one or more major features in a spatial domain corresponding to the input data, wherein the extracted one or more major features represent the input data; transforming the extracted one or more major features into a frequency domain; identifying frequency components corresponding to the extracted one or more major features transformed into the frequency domain; suppressing one or more frequency components among the identified frequency components; generating signal domain data corresponding to the extracted one or more major features based on the suppression of the one or more frequency components ; transforming the generated signal domain data from the frequency domain to the spatial domain based on an inverse transformation of the generated signal domain data; and generating output metadata based on the inverse transformation of the generated signal domain data, wherein the generated output metadata includes an anonymized version of the specific information included in the input data.