
WO2019037093A1 - Spark distributed computing data processing method and system - Google Patents

Spark distributed computing data processing method and system

Info

Publication number
WO2019037093A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage area
memory storage
eviction
data
space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2017/099083
Other languages
English (en)
Chinese (zh)
Inventor
毛睿
陆敏华
陆克中
朱金彬
隋秀峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to PCT/CN2017/099083 priority Critical patent/WO2019037093A1/fr
Publication of WO2019037093A1 publication Critical patent/WO2019037093A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current


Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers

Definitions

  • The present invention relates to the field of computers, and in particular to a Spark distributed computing data processing method and system.
  • Spark has become a popular computing framework for big data applications, especially in iterative computing fields such as graph computing and machine learning.
  • When memory is insufficient, some partition data cannot be cached to memory, or data already cached in memory must be migrated to disk, causing Spark performance to drop.
  • To address this, Spark proposes and designs a unified memory management model: when partition data is being cached and a task cannot apply for enough storage space, the model actively migrates cached data in the storage area to disk or discards it outright.
  • By migrating or discarding cached data, the unified memory management model flexibly eases the pressure of Spark's cache demand against insufficient storage space.
  • However, the Spark unified memory management model forces some Spark tasks to recompute data or re-read it from disk.
  • This repeated computation or disk reading has a detrimental impact on Spark performance.
  • The main purpose of the present invention is to provide a Spark distributed computing data processing method and system, aiming to solve the prior-art technical problem that, under the Spark unified memory management model, Spark tasks must repeat computation or read from disk.
  • A first aspect of the present invention provides a Spark distributed computing data processing method, the method comprising:
  • when a storage task is performed on resilient distributed dataset (RDD) partition data that the user has marked for caching, if the application for space in the Spark memory storage area fails, sending to the eviction logic unit a command to evict cached data from the memory storage area;
  • calculating the size of the evictable space in the memory storage area and, if the space available after eviction meets the storage task's space requirement, setting a migration address in the SSD-and-HDD-based hybrid storage system according to the access heat of the evictable cached data;
  • reading and releasing the evictable cached data in the memory storage area, migrating it to the migration address, modifying its persistence level, and feeding back an eviction success signal and eviction information.
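As an illustration only, the three claimed steps (apply for space; on failure, evict by access heat; migrate to SSD or HDD) can be condensed into a small Python sketch. The class name, the heat threshold of 10, and the byte figures are invented for this example; the patent does not prescribe them, and Spark's real MemoryStore bookkeeping is more involved.

```python
# Illustrative sketch of the claimed method: a storage task tries to cache an
# RDD partition; on failure, evictable cached blocks are migrated to an
# SSD/HDD hybrid store chosen by access heat. All names are hypothetical.

class HybridMemoryStore:
    def __init__(self, capacity, hot_threshold=10):
        self.capacity = capacity          # bytes available in the memory storage area
        self.used = 0
        self.blocks = {}                  # block_id -> (size, access_heat)
        self.hot_threshold = hot_threshold
        self.migrated = {}                # block_id -> ("SSD" | "HDD")

    def put(self, block_id, size, heat=0):
        """Try to store a partition; evict and migrate if space is short."""
        if self.used + size > self.capacity:
            if not self._evict(size):
                return False              # eviction could not free enough space
        self.blocks[block_id] = (size, heat)
        self.used += size
        return True

    def _evict(self, needed):
        # Evict coldest blocks first until the storage task fits.
        victims = sorted(self.blocks, key=lambda b: self.blocks[b][1])
        for b in victims:
            if self.used + needed <= self.capacity:
                break
            size, heat = self.blocks.pop(b)
            self.used -= size
            # Heat decides the migration address: hot data -> SSD, cold -> HDD.
            self.migrated[b] = "SSD" if heat >= self.hot_threshold else "HDD"
        return self.used + needed <= self.capacity
```

In a run where a 50-byte partition no longer fits, a cold cached block is migrated to the HDD and a hot one to the SSD, rather than either being discarded.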
  • A second aspect of the present invention further provides a Spark distributed computing data processing system, the system comprising:
  • an application storage module configured to, when the application for space fails, send to the eviction logic unit a command to evict cached data from the memory storage area;
  • a calculation address module configured to calculate the size of the evictable space in the memory storage area and, if the space available after eviction meets the storage task's space requirement, set a migration address in the SSD-and-HDD-based hybrid storage system according to the access heat of the evictable cached data;
  • a data migration module configured to read and release the evictable cached data in the memory storage area, migrate it to the migration address, modify its persistence level, and feed back an eviction success signal and eviction information.
  • With the eviction logic unit and the cache data migration unit, partition data can be flexibly migrated to the SSD or HDD according to its heat, instead of the buffered intermediate data being migrated directly to disk or the cached data being discarded outright; this effectively relieves the pressure that Spark's partition data cache places on storage space when memory is insufficient.
  • When the partition data is later called, the high-speed read/write performance of the hybrid storage system and the heat-based separated storage of the partition data allow partition data of different access heats to be read quickly, improving Spark performance.
  • FIG. 1 is a schematic flowchart of a Spark distributed computing data processing method according to an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of a refinement step of step 101 of a Spark distributed computing data processing method according to an embodiment of the present invention
  • FIG. 3 is a schematic flowchart of a refinement step of step 102 of a Spark distributed computing data processing method according to an embodiment of the present invention
  • FIG. 4 is a schematic flowchart of a refinement step in step 304 of a Spark distributed computing data processing method according to an embodiment of the present invention
  • FIG. 5 is a schematic flowchart of a refinement step for data migration in step 103 of a Spark distributed computing data processing method according to an embodiment of the present invention;
  • FIG. 6 is a schematic flowchart of a refinement step for the data persistence level in step 103 of a Spark distributed computing data processing method according to an embodiment of the present invention;
  • FIG. 7 is a schematic diagram of functional modules of a Spark distributed computing data processing system according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of refinement function modules of the application storage module 601 of a Spark distributed computing data processing system according to an embodiment of the present invention;
  • FIG. 9 is a schematic diagram of refinement function modules of the calculation address module 602 of the Spark distributed computing data processing system according to an embodiment of the present invention;
  • FIG. 10 is a schematic diagram of refinement function modules of the data migration module 603 of the Spark distributed computing data processing system according to an embodiment of the present invention.
  • FIG. 1 is a schematic flowchart of a Spark distributed computing data processing method according to an embodiment of the present invention, where the processing method includes:
  • setting a migration address in the SSD-and-HDD-based hybrid storage system according to the access heat of the evictable cached data in the memory storage area.
  • In this embodiment, a hybrid storage system is constructed by introducing an SSD and an HDD, and an eviction logic unit and a cache data migration unit are designed, so that partition data is flexibly migrated to the SSD or HDD according to its heat instead of the buffered intermediate data being migrated directly to disk or the cached data being discarded; this effectively relieves the pressure that Spark's partition data cache places on storage space when memory is insufficient.
  • When the partition data is called, the high-speed read/write performance of the hybrid storage system and the heat-based separated storage allow partition data of different access heats to be read quickly, improving Spark performance.
  • FIG. 2 is a schematic flowchart of a refinement step of step S101 of the Spark distributed computing data processing method according to an embodiment of the present invention, where the refinement step includes:
  • The Spark execution engine schedules subtasks through the task scheduler; when a subtask at runtime performs a storage task on RDD partition data that the user has marked for caching, it attempts to apply for storage space in the Spark memory storage area. If the application succeeds, the RDD partition data is stored directly.
  • FIG. 3 is a schematic flowchart of a refinement step of step S102 of the Spark distributed computing data processing method according to an embodiment of the present invention, where the refinement step includes:
  • The eviction logic unit receives the eviction command and, because the storage space for performing the storage task on the RDD partition data is insufficient, sends to the memory storage area an application to evict storage space.
  • The memory storage area determines whether it has evictable space and feeds the result back to the eviction logic unit.
  • The evictable space is calculated with the least recently used (LRU) strategy, that is, an algorithm that selects data for eviction according to the historical access-heat record of the data in the memory storage area.
  • Its core idea is that recently accessed data has a higher probability of being accessed again in the future; the size of the evictable space in the memory storage area is determined according to this access probability.
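The LRU accounting described above might look like the following sketch: walk a recency-ordered block map from least to most recently used, accumulating evictable bytes until the requested space is covered. The function and variable names are hypothetical, not Spark's actual MemoryStore internals.

```python
from collections import OrderedDict

def evictable_space(lru_blocks, needed):
    """Walk blocks from least to most recently used, summing sizes until the
    requested space is covered. Returns (evictable_size, victim_ids); the
    caller compares evictable_size with `needed` to decide whether eviction
    can succeed. `lru_blocks` maps block_id -> size, oldest entry first."""
    freed, victims = 0, []
    for block_id, size in lru_blocks.items():
        if freed >= needed:
            break
        freed += size
        victims.append(block_id)
    return freed, victims
```

For example, with blocks a (10 bytes), b (20), c (30) ordered oldest first, a request for 25 bytes selects a and b.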
  • The evictable space is then compared with the space that the storage task needs to occupy.
  • If it is insufficient, the migration task for the evictable cached data is terminated, and a signal indicating that evicting cached data from the memory storage area failed is fed back.
  • FIG. 4 is a schematic flowchart of a refinement step of step S304 of the Spark distributed computing data processing method according to an embodiment of the present invention, where the refinement step includes:
  • The first preset heat value range indicates that the access heat of the evictable cached data in the memory storage area is high; the specific range can be freely set by the user.
  • The first preset heat value is greater than the second preset heat value.
  • The second preset heat value range indicates that the access heat of the evictable cached data in the memory storage area is low; the specific range can likewise be freely set by the user.
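The two preset heat-value ranges might be realized as a simple threshold check, as in this sketch. The concrete ranges are arbitrary placeholders, since the text leaves them user-configurable, and the address strings are invented for the example.

```python
def pick_migration_address(heat, ssd_addr, hdd_addr,
                           high_range=(10, float("inf")), low_range=(0, 10)):
    """Choose the migration target in the SSD/HDD hybrid store by access heat.
    The two ranges stand in for the first and second preset heat-value
    ranges, which the patent leaves user-configurable."""
    if high_range[0] <= heat < high_range[1]:
        return ssd_addr      # hot data goes to the faster SSD
    if low_range[0] <= heat < low_range[1]:
        return hdd_addr      # cold data goes to the larger, cheaper HDD
    raise ValueError("heat outside both configured ranges")
```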
  • FIG. 5 is a schematic flowchart of a refinement step for data migration in step S103 of the Spark distributed computing data processing method according to an embodiment of the present invention.
  • The refinement step includes:
  • The cache data migration unit receives the migration information and the migration command for the evictable cached data in the memory storage area, and stores the evicted data of the memory storage area to the SSD or HDD according to the migration information.
  • Specifically, after the cache data migration unit receives the migration information and the migration command, the cached data in the specified memory storage area is first read and the corresponding memory space is released; the cached data is then written to the SSD or HDD according to the migration address.
  • The migration information for the evictable data of the memory storage area includes: the address of the evictable cached data, the space size of the evictable cached data, and the migration address.
  • A migration completion signal for the evictable cached data of the memory storage area is then sent to the eviction logic unit.
  • FIG. 6 is a schematic flowchart of a refinement step for the data persistence level in step S103 of the Spark distributed computing data processing method according to an embodiment of the present invention.
  • The refinement step includes:
  • If the migration address of the evictable cached data in the memory storage area is on the SSD,
  • the persistence level of that cached data is modified to SSD_ONLY.
  • After the modification is completed, an eviction success signal and the migration information for the evictable data are fed back, so that the RDD partition data enters the memory storage area and the storage task is completed.
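Rewriting the persistence level after migration could look like the following sketch. SSD_ONLY and HDD_ONLY are the level names used in the text; the metadata dictionary layout and the path prefixes are hypothetical stand-ins for Spark's block metadata.

```python
def update_persistence_level(block_meta):
    """After migration, rewrite each block's persistence level to match its
    new home: blocks migrated to the SSD become SSD_ONLY, all others
    HDD_ONLY. `block_meta` maps block_id -> metadata dict (hypothetical)."""
    for meta in block_meta.values():
        if meta["migration_address"].startswith("/ssd"):
            meta["persistence_level"] = "SSD_ONLY"
        else:
            meta["persistence_level"] = "HDD_ONLY"
    return block_meta
```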
  • FIG. 7 is a schematic diagram of functional modules of a Spark distributed computing data processing system according to an embodiment of the present invention.
  • The functional modules include:
  • the application storage module 601, configured to, when a storage task performed on resilient distributed dataset RDD partition data that the user has marked for caching fails to apply for space in the Spark memory storage area, send to the eviction logic unit a command to evict cached data from the memory storage area;
  • the calculation address module 602, configured to calculate the size of the evictable space in the memory storage area and, if the space available after eviction meets the storage task's space requirement on the memory storage area, set a migration address in the SSD-and-HDD-based hybrid storage system according to the access heat of the evictable cached data;
  • the data migration module 603, configured to read and release the evictable cached data in the memory storage area, migrate it to the migration address, modify its persistence level, and feed back an eviction success signal and eviction information.
  • FIG. 8 is a schematic diagram of refinement function modules of the application storage module 601 of a Spark distributed computing data processing system according to an embodiment of the present disclosure, where the refinement function modules include:
  • the first application module 6011, configured to calculate the size of the memory storage space occupied by performing the storage task on the RDD partition data, apply for space in the Spark memory storage area, and compare that size with the unoccupied space of the memory storage area;
  • the first feedback module 6012, configured to, if the space occupied by the storage task is larger than the unoccupied space of the memory storage area, fail the application for space in the Spark memory storage area and send to the eviction logic unit the command to evict cached data from the memory storage area together with the size of the memory storage space that the storage task requires.
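Modules 6011 and 6012 together amount to a size comparison followed by a conditional eviction command, which might be sketched as follows; the function signature and return-dict fields are invented for the example.

```python
def apply_for_space(task_size, capacity, used):
    """Sketch of modules 6011/6012: compare the space the storage task needs
    with the unoccupied space of the memory storage area; on failure, return
    the eviction-command payload sent to the eviction logic unit."""
    free = capacity - used
    if task_size <= free:
        return {"granted": True}
    return {"granted": False,
            "command": "EVICT_CACHED_DATA",
            "required_bytes": task_size}
```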
  • FIG. 9 is a schematic diagram of refinement function modules of the calculation address module 602 of a Spark distributed computing data processing system according to an embodiment of the present disclosure, where the refinement function modules include:
  • the second application module 6021, configured so that the eviction logic unit receives the eviction command and, because the storage space for performing the storage task on the RDD partition data is insufficient, sends an application to the memory storage area; if the application succeeds, the size of the evictable space in the memory storage area is calculated with the least recently used (LRU) strategy;
  • the migration address module 6022, configured to, if the unoccupied space of the memory storage area after eviction is greater than or equal to the size required by the RDD partition data storage task, set a migration address in the SSD-and-HDD-based hybrid storage system according to the access heat of the evictable cached data,
  • and send the migration information and the migration command for the evictable cached data of the memory storage area to the cache data migration unit;
  • the second feedback module 6023, configured to, if the unoccupied space of the memory storage area after eviction is smaller than the size required by the RDD partition data storage task, terminate the migration task for the evictable cached data and feed back a signal indicating that evicting cached data from the memory storage area failed;
  • the SSD migration address module 6024, configured to, if the access heat of the evictable cached data is within the first preset heat value range, read an SSD address and set it as the migration address;
  • the HDD migration address module 6025, configured to, if the access heat of the evictable cached data is within the second preset heat value range, read an HDD address and set it as the migration address.
  • FIG. 10 is a schematic diagram of refinement function modules of the data migration module 603 of a Spark distributed computing data processing system according to an embodiment of the present invention.
  • The refinement function modules include:
  • the third feedback module 6031, configured to send to the eviction logic unit a migration completion signal for the evictable cached data of the memory storage area;
  • the SSD persistence level module 6032, configured to, if the migration address of the evictable cached data is on the SSD, modify the persistence level of that cached data to SSD_ONLY;
  • the HDD persistence level module 6033, configured to, if the migration address of the evictable cached data is on the HDD, modify the persistence level of that cached data to HDD_ONLY;
  • the fourth feedback module 6034, configured to feed back an eviction success signal and the migration information for the evictable data of the memory storage area, so that the RDD partition data enters the memory storage area and the storage task is completed.
  • The disclosed methods and systems may be implemented in other manners.
  • The system embodiments described above are merely illustrative.
  • The division into modules is only a logical functional division.
  • Multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • The mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device, or module, and may be electrical, mechanical, or in other forms.
  • The modules described as separate components may or may not be physically separate.
  • The components displayed as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • Each functional module in each embodiment of the present invention may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module.
  • The above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • An integrated module, if implemented as a software functional module and sold or used as a standalone product, can be stored in a computer-readable storage medium.
  • The technical solution of the present invention, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
  • The software product includes a number of instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the various embodiments of the present invention.
  • The foregoing storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The present invention relates to the field of computers, and provides a Spark distributed computing data processing method. The method comprises the steps of: scheduling a subtask by means of a task scheduler, performing a storage task on RDD partition data, and applying for space in a storage area; calculating the size of an evictable space in the storage area, and specifying a migration address in a hybrid storage system according to the access popularity of the partition data (S102); and reading cached data in a specified storage area, releasing the corresponding memory space, migrating the partition data to the specified address, modifying the persistence level of the migrated data, and feeding back an eviction success signal and evicted-space information (S103). Also provided is a Spark distributed computing system. By introducing the hybrid storage system and designing an eviction logic unit and a cache data migration unit, data is migrated to an SSD or an HDD according to the popularity of the partition data instead of being migrated directly to a magnetic disk or the cached data being deleted, so that the pressure of insufficient memory space can be effectively relieved and Spark performance is improved.
PCT/CN2017/099083 2017-08-25 2017-08-25 Spark distributed computing data processing method and system Ceased WO2019037093A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/099083 WO2019037093A1 (fr) 2017-08-25 2017-08-25 Spark distributed computing data processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/099083 WO2019037093A1 (fr) 2017-08-25 2017-08-25 Spark distributed computing data processing method and system

Publications (1)

Publication Number Publication Date
WO2019037093A1 (fr)

Family

ID=65438348

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/099083 Ceased WO2019037093A1 (fr) 2017-08-25 2017-08-25 Spark distributed computing data processing method and system

Country Status (1)

Country Link
WO (1) WO2019037093A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109947778A (zh) * 2019-03-27 2019-06-28 Lenovo (Beijing) Co., Ltd. Spark storage method and system
CN115145841A (zh) * 2022-07-18 2022-10-04 Henan University Method for reducing memory contention in a Spark computing platform

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101907978A (zh) * 2010-07-27 2010-12-08 Zhejiang University Hybrid storage system and storage method based on solid-state disk and magnetic hard disk
US20110191556A1 (en) * 2010-02-01 2011-08-04 International Business Machines Corporation Optimization of data migration between storage mediums
CN102831088A (zh) * 2012-07-27 2012-12-19 National Supercomputing Center in Shenzhen (Shenzhen Cloud Computing Center) Data migration method and device based on hybrid memory
CN103186350A (zh) * 2011-12-31 2013-07-03 Beijing Kuaiwang Technology Co., Ltd. Hybrid storage system and migration method for hot data blocks
CN103631730A (zh) * 2013-11-01 2014-03-12 Research Institute of Tsinghua University in Shenzhen Cache optimization method for in-memory computing

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110191556A1 (en) * 2010-02-01 2011-08-04 International Business Machines Corporation Optimization of data migration between storage mediums
CN101907978A (zh) * 2010-07-27 2010-12-08 Zhejiang University Hybrid storage system and storage method based on solid-state disk and magnetic hard disk
CN103186350A (zh) * 2011-12-31 2013-07-03 Beijing Kuaiwang Technology Co., Ltd. Hybrid storage system and migration method for hot data blocks
CN102831088A (zh) * 2012-07-27 2012-12-19 National Supercomputing Center in Shenzhen (Shenzhen Cloud Computing Center) Data migration method and device based on hybrid memory
CN103631730A (zh) * 2013-11-01 2014-03-12 Research Institute of Tsinghua University in Shenzhen Cache optimization method for in-memory computing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LU, KEZHONG ET AL.: "Design of RDD Persistence Method in Spark for SSDs", JOURNAL OF COMPUTER RESEARCH AND DEVELOPMENT, vol. 54, no. 6, 30 June 2017 (2017-06-30), pages 1382, XP055578521 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109947778A (zh) * 2019-03-27 2019-06-28 Lenovo (Beijing) Co., Ltd. Spark storage method and system
CN115145841A (zh) * 2022-07-18 2022-10-04 Henan University Method for reducing memory contention in a Spark computing platform
CN115145841B (zh) * 2022-07-18 2023-05-12 Henan University Method for reducing memory contention in a Spark computing platform

Similar Documents

Publication Publication Date Title
CN107526546A (zh) Spark distributed computing data processing method and system
CN108810041B (zh) Data writing and capacity expansion method and device for a distributed cache system
CN107402722B (zh) Data migration method and storage device
TWI771933B (zh) Method, host device and storage server for performing deduplication management by means of command-related filters
CN105474127B (zh) Virtual per-processor timers for multiprocessor systems
WO2024113568A1 (fr) Data migration method and apparatus for solid-state drive, electronic device, and storage medium
CN102063406B (zh) Network shared cache for multi-core processors and directory control method thereof
US9218287B2 (en) Virtual computer system, virtual computer control method, virtual computer control program, recording medium, and integrated circuit
US20190034302A1 (en) Transfer track format information for tracks in cache at a primary storage system to a secondary storage system to which tracks are mirrored to use after a failover or failback
WO2019037093A1 (fr) Spark distributed computing data processing method and system
WO2017157125A1 (fr) Method and apparatus for deleting a cloud host in a cloud computing environment, server, and storage medium
TWI828307B (zh) Computing system for memory management opportunities and memory swapping tasks and method of managing the same
CN103412800B (zh) Virtual machine hot backup method and device
CN105718320A (zh) Clock task processing method, apparatus and device
US20250165400A1 (en) Data reduction method, apparatus, and system
WO2015024532A1 (fr) System and method for high-performance instruction caching
JP2014186675A (ja) Arithmetic processing device, information processing device, and control method for the information processing device
JP2008293472A (ja) Computer device and cache recovery method therefor
KR102884533B1 (ko) Scoreboard save and restore
WO2020235858A1 (fr) Server and control method therefor
US12455831B2 (en) Address translation service management
CN115087961A (zh) Arbitration scheme for coherent and non-coherent memory requests
WO2025091914A1 (fr) Method and apparatus for controlling data access in a distributed storage system
CN110633132B (zh) Memory module
CN115936137A (zh) System and method for automatic tiered data storage for data-intensive applications

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17922440

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 24.09.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 17922440

Country of ref document: EP

Kind code of ref document: A1