
TR2024005102A2 - A SYSTEM TO ANONYMIZE DATA SENT TO ARTIFICIAL INTELLIGENCE

Info

Publication number: TR2024005102A2
Application number: TR2024/005102A
Authority: TR
Inventors: Demiray Emre
Original assignee: Deytek Bilişim Mühendislik Sanayi Ve Ticaret Anonim Şirketi
Priority date: 2024-04-29
Filing date: 2024-04-29
Publication date: 2024-05-21
Also published as: WO2025230497A1

Abstract

This invention is designed to protect and manage personal data within artificial intelligence (AI) systems used in business processes. The system aims to keep the data sent to AI under control and to prevent unauthorized access to personal data. It secures personal data through tokenization before the data is sent to the AI: personal data is transmitted to the AI as user-specific tokens, and on the return path these tokens are resolved back into the original data.

Description

DESCRIPTION: A SYSTEM FOR ANONYMIZING DATA SENT TO ARTIFICIAL INTELLIGENCE

Technological Field: The invention relates to a system that, in file transfer or communication processes carried out in any field where artificial intelligence software is used, anonymizes personal data and hides it from the AI software, thereby preventing personal data leaks originating from that software.

State of the Art: Today, artificial intelligence is frequently used in many areas of life, including business processes. However, this widespread use brings risks such as data leakage and the sharing of confidential information with third parties. Because AI speeds up and simplifies business processes, its use is unavoidable, so it is important to keep the data sent to it under control. In the prior art, the data sent to AI software is not restricted. Users employ AI software in communication and file transfer processes, but personal data in correspondence or files is not hidden from the AI software, which can lead to personal data leaks originating from the AI software. A literature review found a Chinese patent document (CN) describing an AI-based data anonymization system. In that document, the anonymization is performed by the AI software itself: a file or communication data received from a sender is anonymized by the AI software and then sent to a destination.
In the present invention, by contrast, personal data within a file or correspondence is anonymized before it is transferred to the AI software and restored when it leaves the AI on its way to the destination. Personal data is thus hidden from the AI, and personal data leaks originating from the AI software are prevented.

Brief Description of the Invention: The invention is a system for anonymizing data sent to AI software that overcomes the limitations of the prior art, eliminates its disadvantages, and offers additional advantages. Its aim is to provide a system in which, in business processes that use AI software, the personal data in a dataset is transferred to the AI software in anonymized form. A further aim is to provide a system that prevents personal data leaks originating from AI software. In the invention, the AI software used in business processes is fed continuously, in real time and in a controlled way, while the transmission of personal data to the AI is restricted. To this end, every file added to feed the AI is examined. If the file contains personal data, it is first submitted for approval; if approval is withheld, the file is not added, and if it is granted, the personal data in the file is replaced with tokens before being transmitted to the AI. The AI referred to in the invention can be a chatbot or a component that manages an information pool used in another business process. The solution can be easily integrated into any system via an API. When personal data is to be anonymized, a value such as a name is converted into a unique token code, and the AI receives only this encoded form. The AI is trained on the tokenized data. Taking the AI to be a chatbot, when the question "What unit does Ahmet Yavuz work in?"
is asked, it first goes to the API and is passed to the AI as "What unit does A96DAD5S.!+2368lANA89# work in?". If the returned answer contains a token, the API converts the token back, and the answer is returned as "Ahmet Yavuz works in the IT unit."

The advantages of the invention are listed below:
- It can be integrated into an AI system (the AI here may be a chatbot or a different AI application).
- Personal data is secured through tokenization before it is transmitted to the AI.
- Personal information contained in data sent to the AI system is first submitted for approval.
- Personal data is tokenized automatically for every file added to the system.
- The system can be easily integrated with other systems via an API.
- The AI system is fed in real time with new files added to the system.
- Queries containing personal data are transmitted to the AI in tokenized form, and the tokens in the returned responses are resolved back to the original data before being presented to the user.

Explanation of Figures: The invention will be described with reference to the attached figures so that its features are understood more clearly. This is not intended to limit the invention to these particular embodiments; on the contrary, all alternatives, modifications, and equivalents that fall within the scope defined by the attached claims are intended to be covered. The details shown are presented solely to describe preferred embodiments of the invention and to give the most useful and readily understood account of both the methods and the principles and conceptual features of the invention.
In these drawings, Figure 1 is a schematic view of the system of the invention. The parts shown in the figures, which will help in understanding this invention, are numbered as in the attached drawing and listed below with their names.

References Explanation:
10. User
20. File and communication system
30. Artificial intelligence engine
40. Artificial intelligence software

Description of the Invention: In this detailed description, the system of the invention is explained through non-limiting examples, solely to aid understanding of the subject. Figure 1 is a schematic view showing the elements of the system. The user (10) is anyone who uses the artificial intelligence software (40) in business processes. The user communicates with the AI software (40) through the artificial intelligence engine (30), via a file and communication system (20). The file and communication system (20) is the system that holds all of an organization's files and data and through which its communication processes are carried out. The user's (10) communication with the AI software (40) can take the form of a message or a file transfer. Personal data in the dataset of a message or file transferred to the AI software (40) is detected by the personal data capture tool (50), which assigns a token code to it so that the data reaches the AI software (40) in hidden form. In the responses the AI software (40) returns to the user's (10) input, the token-coded data in the dataset are decoded by the personal data capture tool (50) and delivered to the user (10) via the file and communication system (20).
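The round trip just described (detect personal data, swap it for a token, send the tokenized text to the AI, then swap tokens in the response back) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: detection is a plain match against a hypothetical list of known names, tokens are random `TKN_`-prefixed hex strings rather than the format shown in the example earlier, the mapping lives in an in-memory dictionary, and `fake_ai` stands in for the AI software (40). `TokenVault` and `PersonalDataCaptureTool` are hypothetical names.

```python
import re
import secrets


class TokenVault:
    """Hypothetical in-memory mapping between personal data and tokens."""

    def __init__(self) -> None:
        self._to_token: dict[str, str] = {}
        self._to_value: dict[str, str] = {}

    def token_for(self, value: str) -> str:
        # The same value always gets the same token, so tokenized training
        # data and tokenized queries refer to the same (hidden) entity.
        if value not in self._to_token:
            token = "TKN_" + secrets.token_hex(8).upper()  # assumed token format
            self._to_token[value] = token
            self._to_value[token] = value
        return self._to_token[value]

    def value_for(self, token: str) -> str:
        # Unknown tokens are left unchanged.
        return self._to_value.get(token, token)


class PersonalDataCaptureTool:
    """Hypothetical sketch of the personal data capture tool (50)."""

    TOKEN_RE = re.compile(r"TKN_[0-9A-F]{16}")

    def __init__(self, known_names: list[str], vault: TokenVault) -> None:
        # Assumption: personal data = exact matches of known names. Longest
        # names first so "Ahmet Yavuz" wins over a shorter overlapping entry.
        names = sorted(known_names, key=len, reverse=True)
        pattern = "|".join(re.escape(n) for n in names) or r"(?!)"  # (?!) never matches
        self._name_re = re.compile(pattern)
        self._vault = vault

    def contains_personal_data(self, text: str) -> bool:
        return self._name_re.search(text) is not None

    def tokenize(self, text: str) -> str:
        # Outbound: replace each detected value with its token.
        return self._name_re.sub(lambda m: self._vault.token_for(m.group(0)), text)

    def detokenize(self, text: str) -> str:
        # Inbound: replace each token in the AI's output with the original value.
        return self.TOKEN_RE.sub(lambda m: self._vault.value_for(m.group(0)), text)


def fake_ai(prompt: str, knowledge: dict[str, str]) -> str:
    # Stand-in for the AI software (40); it only ever sees tokens.
    for token, unit in knowledge.items():
        if token in prompt:
            return f"{token} works in the {unit} unit."
    return "I don't know."


vault = TokenVault()
tool = PersonalDataCaptureTool(["Ahmet Yavuz"], vault)

# Training data reaches the AI already tokenized.
knowledge = {vault.token_for("Ahmet Yavuz"): "IT"}

question = "What unit does Ahmet Yavuz work in?"
prompt = tool.tokenize(question)  # "What unit does TKN_<16 hex digits> work in?"
answer = tool.detokenize(fake_ai(prompt, knowledge))
print(answer)  # Ahmet Yavuz works in the IT unit.
```

Reusing one token per value is a deliberate choice in this sketch: the "Ahmet Yavuz" example only works if the token the AI was trained on is the same token that appears in the later query.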
The part that performs the main function of the invention is the personal data capture tool (50): it hides the personal data in file and communication data sent to the AI software (40) behind token codes and, in responses from the AI software (40), replaces those token codes with the original personal data. In this way, the user (10) receives the responses of the AI software (40) in their original form. Although the main function is performed by the personal data capture tool (50), another important aspect is how the AI software (40) is trained. In the invention, the AI software (40) is not fed with personal data but is trained on the token codes assigned to it. Continuing the earlier example, the AI software (40) does not learn the personal data "Ahmet Yavuz" itself: every item of personal data in the dataset of the communication system (20) is assigned a unique token code, and the tokenized data are fed to the AI software (40) in real time to train it.

The working principle of the invention is as follows. Personal data in the dataset held in the file and communication system (20) are captured by the personal data capture tool (50) and assigned token codes. Files or datasets containing the tokenized personal data are fed to the AI software (40) in real time. When a user (10) wants to ask the AI software (40) a question or manage a business process, the user communicates with the AI software (40) through the AI engine (30). During this communication, the personal data capture tool (50) checks whether the dataset used contains personal data and, if it does, converts that personal data into token codes before transferring it to the AI software (40). When the AI software (40) completes the incoming task, it presents an output to the user (10).
The token-coded personal data in the output's dataset are decoded by the personal data capture tool (50) back into the real personal data, which is then delivered to the user (10). Alternatively, sharing datasets that contain personal data with the AI software (40) can be made subject to approval. During communication with the AI software (40), the personal data capture tool (50) checks whether the dataset contains personal data; if it does, the dataset is submitted to an authorized person for approval. If approval is not given, the file or communication set containing the personal data is not transferred to the AI software (40). If approval is given, the personal data is assigned token codes before being transferred to the AI software (40). With the system of the invention, unstructured data is fed to the AI software (40) systematically and kept up to date. Known systems not only lack any focus on data privacy with respect to AI but also have difficulty feeding data to the AI. The file and communication system (20) of the invention addresses this feeding problem by supplying the AI software (40) systematically and continuously with unstructured data, that is, data that cannot be kept in databases and has no fixed format or order. The file and communication system (20) also retrains the AI software (40) continuously whenever files in the system change. There are no restrictions on what can feed the AI software (40): it can be fed from any file environment, whether cloud or on-premise. The AI software (40) can be a chatbot, and its outputs can be text (an answer to a question) or a file. For the AI software (40) to output a file, authorization is required for the file in question. The system therefore automatically evaluates file contents and permissions to decide which files may be used in AI queries.
For example, if a file that is important to the company, and for which the user has no sharing permission, is requested from the AI software (40), the AI software (40) can request authorization to transmit the file or return an unauthorized-operation response to the user. In addition, the AI software (40) can be given a parameter specifying which file sets it may use when responding: only the user's own personal files, only the files the user is authorized for, or the entire file set.
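The approval and permission rules described above can likewise be sketched in Python. Everything here is an assumption for illustration: `FileScope`, `StoredFile`, `select_files`, `request_file_output`, and `admit_for_feeding` are hypothetical names, permissions are modeled as an owner plus a set of users the file is shared with, and the detection, tokenization, approval, and authorization steps are passed in as callables because the description does not say how they are implemented.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class FileScope(Enum):
    # Hypothetical names for the three file-set options in the description.
    OWN_FILES = "own"
    PERMITTED_FILES = "permitted"
    ALL_FILES = "all"


@dataclass
class StoredFile:
    # Assumed permission model: an owner plus the users the file is shared with.
    name: str
    owner: str
    content: str
    shared_with: set[str] = field(default_factory=set)

    def can_access(self, user: str) -> bool:
        return user == self.owner or user in self.shared_with


def select_files(files: list[StoredFile], user: str, scope: FileScope) -> list[StoredFile]:
    """Choose the files the AI may draw on when answering this user."""
    if scope is FileScope.OWN_FILES:
        return [f for f in files if f.owner == user]
    if scope is FileScope.PERMITTED_FILES:
        return [f for f in files if f.can_access(user)]
    return list(files)  # ALL_FILES


def request_file_output(
    f: StoredFile, user: str, ask_authorization: Callable[[StoredFile, str], bool]
) -> Optional[str]:
    """Return the file content, or None as an unauthorized-operation result."""
    if f.can_access(user):
        return f.content
    # No sharing permission: request authorization before releasing the file.
    return f.content if ask_authorization(f, user) else None


def admit_for_feeding(
    content: str,
    contains_personal_data: Callable[[str], bool],
    tokenize: Callable[[str], str],
    approve: Callable[[str], bool],
) -> Optional[str]:
    """Hypothetical approval gate applied before content is fed to the AI."""
    if not contains_personal_data(content):
        return content
    if not approve(content):
        return None  # Not approved: nothing is transferred to the AI.
    return tokenize(content)  # Approved: personal data replaced by tokens.


files = [
    StoredFile("cv_ahmet.txt", owner="ahmet", content="Ahmet Yavuz, IT"),
    StoredFile("budget.xlsx", owner="cfo", content="Q3 figures", shared_with={"ahmet"}),
    StoredFile("strategy.docx", owner="ceo", content="confidential plan"),
]

print([f.name for f in select_files(files, "ahmet", FileScope.PERMITTED_FILES)])
# ['cv_ahmet.txt', 'budget.xlsx']
print(request_file_output(files[2], "ahmet", lambda f, u: False))  # None
print(admit_for_feeding(
    "Ahmet Yavuz, IT",
    contains_personal_data=lambda t: "Ahmet Yavuz" in t,
    tokenize=lambda t: t.replace("Ahmet Yavuz", "TKN_1"),
    approve=lambda t: True,
))  # TKN_1, IT
```

Passing the detector, tokenizer, and approver as callables keeps the gating logic independent of how personal data is actually found or who grants approval, which the description leaves open.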