WO2024074042A1 - A data storage method and device, data reading method and device, and equipment - Google Patents
- Publication number
- WO2024074042A1 (PCT/CN2023/094310)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- size
- target
- block
- item
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/0644—Management of space entities, e.g. partitions, extents, pools
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present application relates to the field of data processing technology, and in particular to a data storage method and device, a data reading method and device, a device and a non-volatile storage medium.
- Machine learning in artificial intelligence requires data sets to be collected, labeled, and preprocessed before they can be read and used in the training and inference of machine learning and deep learning.
- The reading and writing of data sets can have a large negative impact on the overall performance of artificial intelligence training and inference.
- The main reasons include: (1) depending on the needs of different algorithms, a data set may contain tens of thousands of items or more (each item being, for example, an image, a piece of text, or a voice sample); (2) the data set needs to be preprocessed and written to the hard disk as usable training/test data; (3) after preprocessing, each data item usually becomes smaller and its size is fixed; (4) once the above three steps are complete, training and inference in practice consist of repeatedly "reading" tens of thousands of small data items for calculation.
- An object of the embodiments of the present application is to provide a data storage method that saves data reading time and improves data reading and writing efficiency; further objects of the present application are to provide a corresponding data storage device, a data reading method and device, an electronic device, and a non-volatile storage medium.
- a data storage method including:
- Each item of data in the target data set is stored into consecutive, equally sized target blocks of the hard disk; wherein the block size of each target block is determined according to the data size.
- After receiving the target data set to be stored and before obtaining the data size of each item of data in the target data set, the method further includes:
- a first preprocessing operation is performed on the target data set; wherein the first preprocessing operation is a preprocessing operation that does not increase the data size.
- a first preprocessing operation is performed on the target data set, including:
- receiving a target data set to be stored includes:
- obtaining the data size of each item of data in the target data set includes:
- a process of determining a block size of a target block according to the data size includes:
- the block size of the target block is obtained by selecting from various optional block sizes that are larger than the data size.
- the block size of the target block is selected from various optional block sizes that are larger than the data size, including:
- the optional block size with the smallest difference from the data size is determined as the block size of the target block.
- After obtaining each preset optional block size, the method further includes:
- step of selecting a block size of the target block from each optional block size larger than the data size is performed;
- the maximum value among the optional block sizes is determined as the block size of the target block.
- a data reading method comprising:
- the read data are returned to the sender of the data read command.
- After reading each item of data of the target data set from the consecutive, equally sized target blocks in the hard disk, and before returning each item of data read to the sender of the data read command, the method further includes:
- a second preprocessing operation is performed on each of the read data; wherein the second preprocessing operation is a preprocessing operation for increasing the size of the data.
- a second preprocessing operation is performed on each piece of data read, including:
- receiving a data read command includes:
- reading each item of data of the target data set from consecutive target blocks of the same size in the hard disk includes:
- each data item of the target data set is read from each consecutive target block of the same size in the hard disk according to the one-to-one relationship between the target block and each data item.
- reading each item of data of the target data set from consecutive target blocks of the same size in the hard disk includes:
- each data item of the target data set is read from each consecutive target block of the same size in the hard disk according to the many-to-one relationship between the target block and each data item; wherein each data item is pre-stored in adjacent consecutive blocks.
- a data storage device including:
- a data set receiving module used for receiving a target data set to be stored
- a data size acquisition module is used to acquire the data size of each item of data in the target data set; wherein the size of each item of data in the target data set is the same;
- the data storage module is used to store each data in the target data set into each continuous target block of the same size in the hard disk; wherein the block size of each target block is determined according to the data size.
- a data reading device including:
- a read command receiving module used for receiving a data read command
- the data reading module is used to read each item of data of the target data set from consecutive target blocks of the same size in the hard disk; wherein the block size of each target block is determined according to the data size of each item of data, and the size of each item of data in the target data set is the same;
- the data return module is used to return the read data to the sender of the data read command.
- an electronic device including:
- the processor is used to implement the steps of the above data storage method or data reading method when executing a computer program.
- a non-volatile readable storage medium on which a computer program is stored.
- the computer program is executed by a processor, the steps of the above data storage method or data reading method are implemented.
- the data storage method provided in the embodiment of the present application receives a target data set to be stored; obtains the data size of each data item in the target data set; wherein the size of each data item in the target data set is the same; stores each data item in the target data set into consecutive target blocks of the same size in the hard disk; wherein the block size of each target block is determined according to the data size.
- the embodiments of the present application also provide a data storage device, a data reading method and device, an apparatus and a non-volatile storage medium corresponding to the above-mentioned data storage method, which have the above-mentioned technical effects and will not be repeated here.
- FIG1 is a flowchart of an implementation method of data storage in some embodiments of the present application.
- FIG2 is another implementation flow chart of the data storage method in some embodiments of the present application.
- FIG3 is a flowchart of an implementation of a data reading method in some embodiments of the present application.
- FIG4 is another implementation flow chart of the data reading method in some embodiments of the present application.
- FIG5 is a schematic diagram of a handwriting recognition data set image file in some embodiments of the present application.
- FIG6 is a schematic diagram of a normalized handwriting recognition data set image file in some embodiments of the present application.
- FIG7 is a schematic diagram of a data set image classification in some embodiments of the present application.
- FIG8 is a structural block diagram of a data storage device in some embodiments of the present application.
- FIG9 is a structural block diagram of a data reading device in some embodiments of the present application.
- FIG10 is a structural block diagram of an electronic device in some embodiments of the present application.
- FIG. 11 is a schematic diagram of a specific structure of an electronic device provided in some embodiments of the present application.
- the operating system and the file system do not guarantee the continuity of each piece of data stored in the hard disk.
- each piece of data will be cut into data blocks because of the block size planned by the hard disk and the file system.
- the continuity cannot be guaranteed and the data is often stored discontinuously in the hard disk.
- the part of reading data still requires the CPU (Central Processing Unit), main memory and hard disk I/O (Input/Output) system plus related software and operating system to complete data reading and writing.
- The GPU (Graphics Processing Unit), CPU, main memory, hard disk I/O (Input/Output), related software, and operating system need to communicate with each other and transmit non-contiguous data frequently.
- With the data storage method provided in the present application, it is ensured that all data in the target data set are stored in continuous blocks of the hard disk, so that data storage is greatly optimized.
- it can be read directly from continuous blocks of the hard disk, saving data reading time and improving data reading and writing efficiency.
- FIG. 1 is a flowchart of an implementation method of a data storage method in an embodiment of the present application. The method may include the following steps:
- the target data set to be stored is sent to the accelerator.
- the target data set to be stored may be sent to the accelerator via a CPU or a GPU, and the accelerator receives the target data set to be stored.
- the target dataset can be a dataset used for artificial intelligence machine learning training, such as a dataset used for training an image recognition model, or a dataset used for an item recommendation model, etc.
- the data types in the target dataset can be pictures, text, voice, etc.
- the CPU host or GPU host and the accelerator may be connected via a physical connection or via a network, which is not limited in the embodiments of the present application.
- the sizes of each data item in the target data set are the same.
- After receiving the target data set to be stored, the target data set can be preprocessed and transformed into usable training data or test data, so that the size of each data item in the target data set is the same; the data size of each data item can then be obtained.
- Image decoding: decode the compressed image. Color images are decoded and stored in three pixel channels: R (Red), G (Green), and B (Blue). Some model algorithms later need to be trained on one or more of the R, G, and B channels;
- Grayscale conversion: convert an image from color to black and white. It is often used to reduce computational complexity in artificial intelligence algorithms. Since most tasks do not require color recognition, grayscale conversion is a sensible choice: it reduces the amount of data per pixel and thus the amount of computation required;
- Normalization: project image pixel intensities into a predefined range, usually [0, 1] or [-1, 1], though different algorithms use different definitions. Its purpose is to make all images contribute fairly. For example, scaling all images to the same range of [0, 1] or [-1, 1] allows every image to contribute equally to the total loss, rather than images with high pixel ranges producing strong losses and images with low pixel ranges producing weak ones.
- The purpose of normalization also includes providing a standard learning rate: since high-intensity images would require a low learning rate and low-intensity images a high one, rescaling allows a single standard learning rate for all images.
- Data augmentation: make small changes to existing data to increase its diversity without collecting new data; this technique is used to expand a data set. Standard data augmentation techniques include horizontal and vertical flipping, rotation, cropping, shearing, etc. Data augmentation helps prevent neural networks from learning irrelevant features and improves model performance.
- Standardization: scale and preprocess images so that they have similar or consistent height and width. Training, testing, and inference of artificial intelligence are more efficient when image sizes are consistent.
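The write-time preprocessing operations above can be sketched as follows. This is a minimal illustration using NumPy; the function names and the BT.601 grayscale weights are this sketch's assumptions, not the patent's implementation:

```python
import numpy as np

def grayscale(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB image to single-channel grayscale
    using the common ITU-R BT.601 luma weights."""
    return (rgb @ np.array([0.299, 0.587, 0.114])).astype(np.uint8)

def normalize(img: np.ndarray) -> np.ndarray:
    """Project pixel intensities from [0, 255] into [0, 1] as float32.
    Note: this quadruples the per-pixel size (1 byte -> 4 bytes),
    which is why the patent defers it to read time."""
    return img.astype(np.float32) / 255.0

def augment(img: np.ndarray) -> list[np.ndarray]:
    """Size-preserving augmentations: original, horizontal flip, vertical flip."""
    return [img, np.fliplr(img), np.flipud(img)]

img = np.random.randint(0, 256, (28, 28), dtype=np.uint8)
assert normalize(img).nbytes == 4 * img.nbytes      # float32 is 4x larger
assert all(a.shape == img.shape for a in augment(img))  # flips keep size
```

Flips and rotations keep the byte size unchanged, while normalization enlarges the data; this split is exactly what motivates the first/second preprocessing distinction later in the description.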
- the minimum write block size available for selection is pre-set, such as 256 bytes, 512 bytes, 1024 bytes, 2048 bytes, 4096 bytes, etc.
- Some solid state drives can support a larger range of block sizes.
- The data in the target data set is stored in consecutive target blocks of the same size in the hard disk.
- the present application can directly read from the continuous blocks of the hard disk when reading data, saving the time of data reading and improving the efficiency of data reading and writing.
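The storage layout described above can be sketched as follows: each fixed-size data item is padded to the chosen block size and written back-to-back, so item i always begins at offset i * block_size and can be read with no index lookup. The helper names and the in-memory stand-in for the hard disk are this sketch's assumptions:

```python
import io

BLOCK_SIZE = 1024  # chosen so that one data item fits in one block

def write_dataset(buf, items: list[bytes], block_size: int = BLOCK_SIZE):
    """Write equally sized items into consecutive, equally sized blocks."""
    for i, item in enumerate(items):
        assert len(item) <= block_size
        buf.seek(i * block_size)                    # consecutive blocks
        buf.write(item.ljust(block_size, b"\x00"))  # pad to block size

def read_item(buf, index: int, item_size: int, block_size: int = BLOCK_SIZE) -> bytes:
    """Read item `index` directly from its block -- the offset is computable."""
    buf.seek(index * block_size)
    return buf.read(block_size)[:item_size]

disk = io.BytesIO()                            # stands in for the hard disk
items = [bytes([i]) * 785 for i in range(3)]   # e.g. 28*28 image + 1 label byte
write_dataset(disk, items)
assert read_item(disk, 2, 785) == items[2]
```

Because every item occupies exactly one block at a known offset, reads are sequential and contiguous, which is the continuity property the operating system's default file layout does not guarantee.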
- the embodiments of the present application also provide corresponding improved solutions.
- the same steps or corresponding steps as those in the above embodiments can be referenced to each other, and the corresponding beneficial effects can also be referenced to each other, which will not be repeated one by one in the following improved embodiments.
- FIG. 2 is another implementation flow chart of the data storage method in an embodiment of the present application.
- the method may include the following steps:
- step S201 may include the following steps:
- a target data set for artificial intelligence model training is collected in advance and sent to an accelerator, which receives the target data set for artificial intelligence model training to be stored.
- the target dataset can contain training set, validation set and test set.
- the model is fitted on a training dataset.
- the training set is a collection of examples used to fit parameters (such as the weights of the connections between neurons in an artificial neural network).
- the training set is usually a data pair consisting of an input vector and an output vector.
- the output vector is called the target.
- the current model makes predictions for each example in the training set and compares the predictions with the target. Based on the results of the comparison, the learning algorithm updates the parameters of the model.
- the process of model fitting may include both feature selection and parameter estimation.
- the fitted model is used to make predictions on a validation dataset.
- the validation set provides an unbiased evaluation of the model fitted on the training set when tuning the model's hyperparameters (e.g., the number of neurons in the hidden layer of a neural network).
- the validation set can be used for early stopping in regularization, i.e., stopping training when the validation set error rises (a sign of overfitting on the training set).
- test dataset can be used to provide an unbiased evaluation of the final model. If the test dataset is never used during training (for example, not used in cross-validation), it is also called a holdout set.
- test set and validation set can be the same set.
- the method may further include the following steps:
- a first preprocessing operation is performed on the target data set; wherein the first preprocessing operation is a preprocessing operation that does not increase the data size.
- a first preprocessing operation is performed on the target data set.
- The first preprocessing operation comprises the partial preprocessing that does not increase the size of the data.
- Performing only such operations at storage time (for example, horizontal flipping, vertical flipping, and rotation of the image to be recognized) avoids preprocessing that enlarges the data before storage, saving storage space and reducing costs.
- performing a first preprocessing operation on the target data set may include the following steps:
- the sizes of each data item in the target data set are the same.
- step S202 may include the following steps:
- Each data item in the target data set also includes a data label and a data file name, so that the data itself, the data label, and the data file name together constitute a complete data item.
- The data label is the reference standard corresponding to the data item, and the data file name is the identification information that uniquely identifies the data item.
- the minimum write block size that can be selected is preset. For example, for a solid state drive (SSD), it can usually be set to 256 bytes, 512 bytes, 1024 bytes, 2048 bytes, 4096 bytes, etc. Some solid state drives can support a larger range of block sizes. After obtaining the data size of each data item in the target data set, obtain the preset optional block sizes.
- SSD solid state drive
- Step S204: determine whether the data size is less than or equal to the maximum of the optional block sizes. If it is, execute step S205; otherwise, execute step S206.
- If the data size is less than or equal to the maximum optional block size, there exists a selectable block size such that each data item can be stored in a single complete block, and step S205 is executed. If the data size is greater than the maximum optional block size, the size of each data item exceeds the maximum block size supported by the hard disk, multiple blocks are required to accommodate each data item, and step S206 is executed.
- S205 Selecting a block size of a target block from various optional block sizes that are larger than the data size.
- the data size is less than or equal to the maximum value of the optional block sizes, it means that there is an optional block size, so that each data item is stored in only one complete block, and the block size of the target block is selected from the optional block sizes that are larger than the data size.
- step S205 may include the following steps:
- Step 1: select, from the optional block sizes larger than the data size, the one with the smallest difference from the data size;
- Step 2: determine that optional block size as the block size of the target block.
- That is, among the optional block sizes larger than the data size, the one closest to the data size is selected and determined as the block size of the target block.
- For example, when each data item is 3200 bytes, the target block size is determined to be 4096 bytes.
- S206 Determine the maximum value among the optional block sizes as the block size of the target block.
- For example, suppose each data item is 3200 bytes and the maximum optional block size is 2048 bytes.
- Since 3200 bytes exceeds the maximum, the block size of the target block is set to 2048 bytes, and each data item occupies two adjacent blocks.
- FIG. 3 is a flowchart of an implementation of a data reading method in an embodiment of the present application. The method may include the following steps:
- a data read command is sent to the accelerator, such as the CPU or GPU sending the data read command to the accelerator, and the accelerator receives the data read command.
- S302 reading each data item of the target data set from each consecutive target block of the hard disk with the same size; wherein the block size of each target block is determined according to the data size of each data item, and the size of each data item in the target data set is the same.
- the target data set pre-stored in the accelerator is pre-processed before storage so that the size of each data item in the target data set is the same, and the block size of the target block is determined according to the data size of each data item in the target data set, so that the target data set is stored in a continuous block.
- After receiving the data read command, the accelerator reads each data item of the target data set from the continuous, equally sized target blocks in the hard disk. Reading each data item from continuous blocks greatly improves the data reading rate.
- After reading each item of data in the target data set from the continuous blocks of the target block size in the hard disk, the read data is returned to the sender of the data read command, completing the fast reading of each item of data in the target data set.
- the sending end is generally the host CPU or host GPU that interacts with the accelerator to read and write data.
- FIG. 4 is another implementation flow chart of the data reading method in the embodiment of the present application.
- the method may include the following steps:
- step S401 may include the following steps:
- the target data set pre-stored in the accelerator may be a data set for artificial intelligence model training.
- a data read command for reading the target data set for artificial intelligence model training is sent to the accelerator.
- the accelerator receives the data read command for reading the target data set for artificial intelligence model training.
- S402 Read each data item of the target data set from each consecutive target block of the hard disk with the same size; wherein the block size of each target block is determined according to the data size of each data item, and the size of each data item in the target data set is the same.
- step S402 may include the following steps:
- each data item of the target data set is read from each consecutive target block of the same size in the hard disk according to the one-to-one relationship between the target block and each data item.
- each data item of the target data set is read from each continuous and same-sized target block in the hard disk, thereby realizing fast reading of each data item in the target data set.
- step S402 may include the following steps:
- each data item of the target data set is read from each consecutive target block of the same size in the hard disk according to the many-to-one relationship between the target block and each data item; wherein each data item is pre-stored in adjacent consecutive blocks.
- each data item of the target data set is read from each continuous and same-sized target block in the hard disk. This enables continuous reading of each data item in the target data set, improving data reading efficiency.
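The many-to-one case above (each item larger than the maximum block size, pre-stored in adjacent blocks) still allows direct offset arithmetic; a sketch under this sketch's own naming, with an in-memory buffer standing in for the hard disk:

```python
import io
import math

def read_item_multiblock(buf, index: int, item_size: int, block_size: int) -> bytes:
    """Many-to-one case: each item spans several adjacent blocks, and items
    are stored contiguously, so the start offset is still computable."""
    blocks_per_item = math.ceil(item_size / block_size)  # e.g. ceil(3200/2048) = 2
    start = index * blocks_per_item * block_size
    buf.seek(start)
    return buf.read(blocks_per_item * block_size)[:item_size]

# Write two 3200-byte items into 2048-byte blocks (2 adjacent blocks per item).
block_size, item_size = 2048, 3200
span = math.ceil(item_size / block_size) * block_size
disk = io.BytesIO()
items = [bytes([i]) * item_size for i in range(2)]
for i, item in enumerate(items):
    disk.seek(i * span)
    disk.write(item.ljust(span, b"\x00"))
assert read_item_multiblock(disk, 1, item_size, block_size) == items[1]
```

Because the blocks of one item are adjacent and items follow one another, the read is still a single sequential access rather than scattered seeks.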
- S403 performing a second preprocessing operation on each item of the read data; wherein the second preprocessing operation is a preprocessing operation for increasing the size of the data.
- A second preprocessing operation that increases the data size is performed on each data item read.
- The accelerator first reads the previously partially preprocessed data set from the continuous blocks of the hard disk, then performs the remaining preprocessing that was deferred at write time, and finally transmits the fully preprocessed data set back to the CPU host or GPU host system. In this way, the write-time part of the preprocessing (the operations that do not increase data size) is moved from the CPU host or GPU host system to the accelerator, further freeing up the host, allowing a more coordinated arrangement of the preprocessing steps and their order, and reducing the size of the data set when it is stored.
- this process embodies the concept of computational storage.
- the read and write performance of the data set is optimized, that is, "computing power" is exchanged for space reduction.
- the accelerator implements normalization preprocessing, which actually only passes through the fast matrix operation circuit and then returns the data item to the CPU host or GPU host. This process only requires a small amount of additional computing time and circuit cost, which can greatly reduce the storage space of the data set.
- In-storage computing/computing storage refers to placing computing components closer to the storage device, while minimizing the use of the CPU's computing power. This speeds up the overall storage performance and offloads the CPU.
- step S403 may include the following steps:
- normalization preprocessing operations may be performed on each item of data read.
- In the 28x28 array that has not yet been normalized, the value of each point represents its color intensity as an integer from 0 to 255, so only 1 byte is needed per point; after normalization, the floating-point value of each point requires 4 bytes. Therefore, normalization is not performed during write-time preprocessing: the partially preprocessed data set is written to the hard disk, and normalization is performed in the accelerator at read time before the data is transmitted back to the CPU host or GPU host. Compared with storing the normalized floating-point values, each data item in the present application occupies only 1/4 of the size on the hard disk.
- Figure 5 is a schematic diagram of a handwriting recognition data set image file in an embodiment of the present application
- Figure 6 is a schematic diagram of a handwriting recognition data set image file after normalization in an embodiment of the present application.
- the left side of Figure 5 is a 28x28 grayscale monochrome image of "9"
- the right side shows the value of each image point in the 28x28 array (normalization preprocessing has not yet been performed), with its color intensity represented by 0 to 255.
- the left side of Figure 6 is a picture, which is finally represented by a 28x28 floating point array after normalization preprocessing (each array element is divided by 255).
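The 4x space saving described above can be checked directly; a small NumPy sketch (the random image stands in for a handwriting sample):

```python
import numpy as np

# A 28x28 grayscale digit stored as raw bytes (one byte per pixel)...
raw = np.random.randint(0, 256, (28, 28), dtype=np.uint8)
# ...versus the same image after normalization (each value divided by 255).
normalized = raw.astype(np.float32) / 255.0

assert raw.nbytes == 784                     # 28 * 28 * 1 byte
assert normalized.nbytes == 3136             # 28 * 28 * 4 bytes
assert normalized.nbytes == 4 * raw.nbytes   # deferring normalization saves 3/4
assert 0.0 <= normalized.min() and normalized.max() <= 1.0
```

Storing the uint8 form and normalizing at read time is what lets the on-disk data set stay at a quarter of the normalized size.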
- Figure 7 is a schematic diagram of a data set image classification in an embodiment of the present application.
- Even the relatively simple MNIST data set has 60,000 training images and 10,000 test images, and all 70,000 records need to be preprocessed.
- The image of each digit is stored in a 28x28 array, and each one has a label annotating its real digit. Since the label range is 0 to 9, only one more byte is needed to store the label.
- Subsequent model training will repeatedly read the 60,000 training items depending on the training process; after each batch of training is completed, the 10,000 test items may also be read multiple times to confirm whether the test is successful.
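Applying the block-size rule to this MNIST layout gives a worked example (the arithmetic is this sketch's own; the file-name field mentioned elsewhere in the description is ignored for simplicity):

```python
IMAGE_BYTES = 28 * 28   # one byte per pixel, values 0-255 (not yet normalized)
LABEL_BYTES = 1         # a label in 0-9 fits in one byte
item_size = IMAGE_BYTES + LABEL_BYTES        # 785 bytes per data item

optional_sizes = [256, 512, 1024, 2048, 4096]
# Smallest optional block size that holds one whole item:
block_size = min(s for s in optional_sizes if s >= item_size)

train_items, test_items = 60_000, 10_000
assert item_size == 785
assert block_size == 1024
# On-disk footprint with one item per block, all blocks consecutive:
assert (train_items + test_items) * block_size == 71_680_000  # ~71.7 MB
```

The padding cost (1024 - 785 = 239 bytes per item) is the price paid for fixed, computable offsets and fully sequential reads.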
- CPU/GPU host systems 0/1/2 write to and read from data set access accelerators 0/1/2, respectively.
- With this more advanced data set hardware core architecture, the writing process for multiple data sets is as follows:
- the CPU/GPU host system 0/1/2 directly transmits the data set that is “not pre-processed at all” to the corresponding data set access accelerator 0/1/2;
- Data set access accelerator 0/1/2 receives the written data set and first performs partial preprocessing that does not cause the data to become larger. In this way, the written data set requires the least storage space. This step transfers part of the preprocessing operations of the CPU/GPU to the accelerator;
- Settings and parameters for read-time preprocessing are received from CPU/GPU host system 0/1/2 via the transmission medium. This step usually only needs to be performed once and applies to the reading of all data items;
- the pre-processed data set is transmitted back to the CPU/GPU host system 0/1/2.
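The numbered write/read flow above can be condensed into a single sketch. The accelerator class, its method names, and the in-memory block list are all hypothetical stand-ins for the hardware described:

```python
class DatasetAccessAccelerator:
    """Toy model of one data set access accelerator: receive raw items,
    apply only the size-preserving part of preprocessing, and store them
    in consecutive, equally sized blocks (here, a plain list of blocks)."""

    def __init__(self, block_size: int):
        self.block_size = block_size
        self.blocks: list[bytes] = []

    def _partial_preprocess(self, item: bytes) -> bytes:
        # Placeholder for size-preserving operations (flips, rotation, ...).
        return item

    def write(self, raw_items: list[bytes]):
        for item in raw_items:
            p = self._partial_preprocess(item)
            assert len(p) <= self.block_size
            self.blocks.append(p.ljust(self.block_size, b"\x00"))

    def read(self, index: int, item_size: int) -> bytes:
        item = self.blocks[index][:item_size]
        # Read-time preprocessing (e.g. normalization) would run here,
        # before the item is returned to the CPU/GPU host.
        return item

acc = DatasetAccessAccelerator(block_size=1024)
acc.write([b"a" * 785, b"b" * 785])
assert acc.read(1, 785) == b"b" * 785
```

The point of the split is visible in the two methods: `write` only ever shrinks or preserves data before storing it, while `read` is where size-increasing work would run, close to the storage, before results return to the host.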
- the present application also provides a data storage device.
- the data storage device described below and the data storage method described above can be referenced to each other.
- FIG. 8 is a structural block diagram of a data storage device in an embodiment of the present application, and the device may include:
- the data set receiving module 81 is used to receive the target data set to be stored
- the data size acquisition module 82 is used to acquire the data size of each item of data in the target data set; wherein the size of each item of data in the target data set is the same;
- the data storage module 83 is used to store each data in the target data set into each continuous target block of the same size in the hard disk; wherein the block size of each target block is determined according to the data size.
- the device may further include:
- the first preprocessing module is used to perform a first preprocessing operation on the target data set after receiving the target data set to be stored and before obtaining the data size of each data item in the target data set; wherein the first preprocessing operation is a preprocessing operation that does not increase the data size.
- the first preprocessing module is specifically a module that performs preprocessing operations other than normalization preprocessing on the target data set.
- the data set receiving module 81 is specifically a module for receiving a target data set to be stored for artificial intelligence model training.
- the data size acquisition module 82 is specifically a module for acquiring the data size of each data item in the target data set, where each item is composed of the data itself, the data label, and the data file name.
- the device may further include a block size determination module, and the block size determination module may include:
- the optional block size acquisition submodule is used to obtain preset optional block sizes
- the block size selection submodule is used to select the block size of the target block from various optional block sizes that are larger than the data size.
- the block size selection submodule includes:
- the block size selection unit is used to select, from the optional block sizes larger than the data size, the optional block size with the smallest difference from the data size.
- the block size determining unit is used to determine the optional block size with the smallest difference from the data size as the block size of the target block.
- the device may further include:
- a judging module used for judging whether the data size is less than or equal to the maximum value of the optional block sizes after obtaining the preset optional block sizes
- the block size selection submodule is specifically a module for selecting the block size of the target block from the optional block sizes that are larger than the data size when it is determined that the data size is less than or equal to the maximum value of the optional block sizes;
- the block size determination module is specifically a module that determines the maximum value among the optional block sizes as the block size of the target block when it is determined that the data size is greater than the maximum value among the optional block sizes.
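The selection logic these submodules describe can be sketched in a few lines. This is a minimal illustration under assumptions: the function name `choose_block_size` and the default candidate list are ours, not part of the application.

```python
def choose_block_size(data_size, optional_sizes=(256, 512, 1024, 2048, 4096)):
    """Pick the target block size for items of `data_size` bytes.

    If some candidate can hold one whole item, take the smallest such
    candidate (smallest difference from the data size), giving one item
    per block. Otherwise fall back to the largest candidate; each item
    then spans several adjacent contiguous blocks.
    """
    largest = max(optional_sizes)
    if data_size <= largest:
        # Candidates that can hold a whole item; the minimum of these is
        # the one with the smallest difference from the data size.
        return min(s for s in optional_sizes if s >= data_size)
    return largest
```

For example, 3136-byte items select a 4096-byte block, and 3200-byte items on a disk whose largest supported block is 2048 bytes fall back to 2048, matching the examples in the description.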
- the present application further provides a data reading device.
- the data reading device described below and the data reading method described above can refer to each other.
- FIG. 9 is a structural block diagram of a data reading device in an embodiment of the present application, and the device may include:
- a read command receiving module 91 used for receiving a data read command
- the data reading module 92 is used to read each data item of the target data set from each target block of the hard disk that is continuous and has the same size; wherein the block size of each target block is determined according to the data size of each data item, and the size of each data item in the target data set is the same;
- the data return module 93 is used to return the read data to the sending end of the data read command.
- the device may further include:
- the second preprocessing module is used to perform a second preprocessing operation on each item of data read from the contiguous, equally sized target blocks of the hard disk before the read data is returned to the sender of the data read command; wherein the second preprocessing operation is a preprocessing operation that increases the data size.
- the second preprocessing module is specifically a module that performs normalization preprocessing operations on each piece of data read.
- the read command receiving module 91 is specifically a module that receives a data read command to read a target data set for artificial intelligence model training.
- the data reading module 92 is specifically a module that reads each data item of the target data set from consecutive target blocks of the same size in the hard disk according to a one-to-one relationship between the target block and each data item when the block size of the target block is greater than or equal to the data size.
- the data reading module 92 is specifically a module that reads each data item of the target data set from consecutive target blocks of the same size in the hard disk according to the many-to-one relationship between the target block and each data item when the block size of the target block is smaller than the data size; wherein each data item is pre-stored in adjacent consecutive blocks.
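The one-to-one and many-to-one mappings the data reading module 92 relies on reduce to a small address calculation; `item_extent` is a hypothetical helper name, not one used in the application.

```python
import math


def item_extent(index, data_size, block_size):
    """Locate the item at `index` among contiguous, equally sized blocks.

    Returns (first_block, n_blocks). When block_size >= data_size the
    mapping is one-to-one (one block per item); otherwise each item spans
    ceil(data_size / block_size) adjacent contiguous blocks (many-to-one).
    """
    blocks_per_item = math.ceil(data_size / block_size)
    return index * blocks_per_item, blocks_per_item
```

Because every item occupies the same number of blocks, locating any item is a constant-time stride computation rather than a search of scattered blocks.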
- FIG. 10 is a schematic diagram of an electronic device provided by the present application, and the device may include:
- the processor 322 is used to implement the steps of the data storage method or the data reading method of the above method embodiment when executing a computer program.
- FIG. 11 is a schematic diagram of the specific structure of an electronic device provided in this embodiment.
- the electronic device may have relatively large differences due to different configurations or performances, and may include a processor (central processing units, CPU) 322 (for example, one or more processors) and a memory 332, and the memory 332 stores one or more computer applications 342 or data 344.
- the memory 332 may be transient storage or persistent storage.
- the program stored in the memory 332 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations in the data processing device.
- the processor 322 can be configured to communicate with the memory 332 to execute a series of instruction operations in the memory 332 on the electronic device 301.
- the electronic device 301 may further include one or more power supplies 326 , one or more wired or wireless network interfaces 350 , one or more input and output interfaces 358 , and/or one or more operating systems 341 .
- the steps in the data storage method or data reading method described above can be implemented by the structure of an electronic device.
- the present application further provides a non-volatile readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the data storage method or data reading method described above.
- the non-volatile readable storage medium may include media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
- the embodiments are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments can be cross-referenced.
- for the devices disclosed in the embodiments, since they correspond to the methods disclosed in the embodiments, the description is relatively brief; for relevant details, refer to the description of the method part.
- the solution provided by the embodiment of the present application can be applied to the field of data processing technology.
- a target data set to be stored is received; the data size of each data item in the target data set is obtained, wherein the size of each data item in the target data set is the same; and each data item in the target data set is stored in each continuous target block of the same size in the hard disk, wherein the block size of each target block is determined according to the data size, thereby achieving the technical effect of saving data reading time and improving data reading and writing efficiency.
Abstract
The present application discloses a data storage method comprising the following steps: receiving a target data set to be stored; obtaining the data size of each item of data in the target data set, wherein each item of data in the target data set has the same size; and storing each item of data in the target data set into contiguous, equally sized target blocks of a hard disk, wherein the block size of each target block is determined according to the data size. The present application also discloses a data storage device, a data reading method and device, equipment, and a storage medium.
Description
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202211219584.2, filed with the China National Intellectual Property Administration on October 8, 2022 and entitled "Data storage method and device, data reading method and device, and equipment", the entire contents of which are incorporated herein by reference.
The present application relates to the field of data processing technology, and in particular to a data storage method and device, a data reading method and device, equipment, and a non-volatile storage medium.
Artificial intelligence has developed rapidly in recent years. Machine learning requires data sets to be collected, labeled, and preprocessed before they can be read and used in machine learning and deep learning training and inference.
However, reading and writing data sets can have a severe negative impact on the overall performance of AI training and inference, mainly because: (1) depending on the algorithm, a data set may contain thousands of items or more (each being, for example, an image, text, or audio file); (2) the data set must be preprocessed into usable training/test data and written to the hard disk; (3) after preprocessing, each item of data usually becomes smaller and has a fixed size; (4) once the above three steps are complete, training and inference actually consist of "reading" thousands of small data items for computation. In other words, accessing a data set requires executing many system routines and spending time searching the hard disk for all the items of that data set in order to restore the original data set. A great deal of time is spent searching mostly non-contiguous blocks of the hard disk before they can be assembled into the original data set, resulting in low data read/write efficiency.
In summary, how to effectively solve the problem that a large amount of time is spent searching mostly non-contiguous blocks of the hard disk before the original data set can be assembled, resulting in low data read/write efficiency, is an urgent problem for those skilled in the art.
Summary of the Invention
An object of the embodiments of the present application is to provide a data storage method that saves data reading time and improves data read/write efficiency; another object of the present application is to provide a data storage device, a data reading method and device, equipment, and a non-volatile storage medium.
To solve the above technical problem, the present application provides the following technical solutions:
According to a first aspect of the embodiments of the present application, a data storage method is provided, comprising:
receiving a target data set to be stored;
obtaining the data size of each item of data in the target data set, wherein each item of data in the target data set has the same size;
storing each item of data in the target data set into contiguous, equally sized target blocks of a hard disk, wherein the block size of each target block is determined according to the data size.
In a specific embodiment of the present application, after receiving the target data set to be stored and before obtaining the data size of each item of data in the target data set, the method further comprises:
performing a first preprocessing operation on the target data set, wherein the first preprocessing operation is a preprocessing operation that does not increase the data size.
In a specific embodiment of the present application, performing the first preprocessing operation on the target data set comprises:
performing preprocessing operations other than normalization preprocessing on the target data set.
In a specific embodiment of the present application, receiving the target data set to be stored comprises:
receiving a target data set to be stored for artificial intelligence model training.
In a specific embodiment of the present application, obtaining the data size of each item of data in the target data set comprises:
obtaining the data size of each item of data in the target data set, where each item consists of the data itself, a data label, and a data file name.
In a specific embodiment of the present application, the method further comprises a process of determining the block size of the target block according to the data size, which comprises:
obtaining preset optional block sizes;
selecting the block size of the target block from the optional block sizes that are larger than the data size.
In a specific embodiment of the present application, selecting the block size of the target block from the optional block sizes that are larger than the data size comprises:
selecting, from the optional block sizes larger than the data size, the optional block size with the smallest difference from the data size;
determining the optional block size with the smallest difference from the data size as the block size of the target block.
In a specific embodiment of the present application, after obtaining the preset optional block sizes, the method further comprises:
judging whether the data size is less than or equal to the maximum of the optional block sizes;
if so, executing the step of selecting the block size of the target block from the optional block sizes that are larger than the data size;
if not, determining the maximum of the optional block sizes as the block size of the target block.
According to a second aspect of the embodiments of the present application, a data reading method is provided, comprising:
receiving a data read command;
reading each item of data of the target data set from contiguous, equally sized target blocks of the hard disk, wherein the block size of each target block is determined according to the data size of each item, and each item of data in the target data set has the same size;
returning the read data to the sender of the data read command.
In a specific embodiment of the present application, after reading each item of data of the target data set from the contiguous, equally sized target blocks of the hard disk and before returning the read data to the sender of the data read command, the method further comprises:
performing a second preprocessing operation on each item of read data, wherein the second preprocessing operation is a preprocessing operation that increases the data size.
In a specific embodiment of the present application, performing the second preprocessing operation on each item of read data comprises:
performing a normalization preprocessing operation on each item of read data.
In a specific embodiment of the present application, receiving a data read command comprises:
receiving a data read command to read a target data set for artificial intelligence model training.
In a specific embodiment of the present application, reading each item of data of the target data set from contiguous, equally sized target blocks of the hard disk comprises:
when the block size of the target block is greater than or equal to the data size, reading each item of data of the target data set from the contiguous, equally sized target blocks of the hard disk according to a one-to-one relationship between target blocks and data items.
In a specific embodiment of the present application, reading each item of data of the target data set from contiguous, equally sized target blocks of the hard disk comprises:
when the block size of the target block is smaller than the data size, reading each item of data of the target data set from the contiguous, equally sized target blocks of the hard disk according to a many-to-one relationship between target blocks and data items, wherein each item of data is pre-stored in adjacent contiguous blocks.
According to a third aspect of the embodiments of the present application, a data storage device is provided, comprising:
a data set receiving module for receiving a target data set to be stored;
a data size acquisition module for obtaining the data size of each item of data in the target data set, wherein each item of data in the target data set has the same size;
a data storage module for storing each item of data in the target data set into contiguous, equally sized target blocks of a hard disk, wherein the block size of each target block is determined according to the data size.
According to a fourth aspect of the embodiments of the present application, a data reading device is provided, comprising:
a read command receiving module for receiving a data read command;
a data reading module for reading each item of data of the target data set from contiguous, equally sized target blocks of the hard disk, wherein the block size of each target block is determined according to the data size of each item, and each item of data in the target data set has the same size;
a data return module for returning the read data to the sender of the data read command.
According to a fifth aspect of the embodiments of the present application, an electronic device is provided, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the foregoing data storage method or data reading method when executing the computer program.
According to a sixth aspect of the embodiments of the present application, a non-volatile readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the steps of the foregoing data storage method or data reading method.
In the data storage method provided by the embodiments of the present application, a target data set to be stored is received; the data size of each item of data in the target data set is obtained, wherein each item of data in the target data set has the same size; and each item of data in the target data set is stored into contiguous, equally sized target blocks of a hard disk, wherein the block size of each target block is determined according to the data size.
It can be seen from the above technical solution that by setting the block size of the hard disk's target blocks according to the fixed size of each item of data in the target data set to be stored, each item of data in the target data set is guaranteed to be stored in contiguous blocks of the hard disk. Data storage is thus greatly optimized: during reading, data can be read directly from contiguous hard disk blocks, saving data reading time and improving data read/write efficiency.
Correspondingly, the embodiments of the present application also provide a data storage device, a data reading method and device, equipment, and a non-volatile storage medium corresponding to the above data storage method, which have the above technical effects and are not described again here.
In order to more clearly illustrate the technical solutions in the embodiments of the present application or the related art, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of one implementation of a data storage method in some embodiments of the present application;
FIG. 2 is a flowchart of another implementation of a data storage method in some embodiments of the present application;
FIG. 3 is a flowchart of one implementation of a data reading method in some embodiments of the present application;
FIG. 4 is a flowchart of another implementation of a data reading method in some embodiments of the present application;
FIG. 5 is a schematic diagram of a handwriting recognition data set image in some embodiments of the present application;
FIG. 6 is a schematic diagram of a handwriting recognition data set image after normalization in some embodiments of the present application;
FIG. 7 is a schematic diagram of data set image classification in some embodiments of the present application;
FIG. 8 is a structural block diagram of a data storage device in some embodiments of the present application;
FIG. 9 is a structural block diagram of a data reading device in some embodiments of the present application;
FIG. 10 is a structural block diagram of an electronic device in some embodiments of the present application;
FIG. 11 is a schematic diagram of the specific structure of an electronic device provided in some embodiments of the present application.
In existing data storage methods, the operating system and file system do not guarantee the contiguity of each item of data stored on the hard disk. That is, to fit the block size planned by the hard disk and the file system, each item of data is split into data blocks whose contiguity cannot be guaranteed; in practice, data is often stored non-contiguously on the hard disk. Reading data still requires the CPU (Central Processing Unit), main memory, and hard disk I/O (Input/Output) system, together with the relevant software and the operating system, to complete reads and writes. This means that the GPU (Graphics Processing Unit), CPU, main memory, hard disk I/O, relevant software, and operating system must frequently communicate with each other and transfer data whose addresses are not contiguous. In other words, for a user to access one item of data at the application layer, many operating system and file system routines must actually be executed, and time must be spent searching the hard disk for all the data blocks of that item in order to assemble and restore the original file. As a result, reading and writing data sets can have a severe negative impact on the overall performance of AI training and inference, degrading read/write performance.
Therefore, in the data storage method provided in the present application, each item of data in the target data set is guaranteed to be stored in contiguous blocks of the hard disk, so that data storage is greatly optimized: during reading, data can be read directly from contiguous hard disk blocks, saving data reading time and improving data read/write efficiency.
In order to enable those skilled in the art to better understand the solutions of the present application, the present application is further described in detail below with reference to the drawings and specific embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of the present application.
Referring to FIG. 1, FIG. 1 is a flowchart of an implementation of a data storage method in an embodiment of the present application; the method may include the following steps:
S101: Receive a target data set to be stored.
When a previously obtained target data set (DataSet) needs to be stored, the target data set to be stored is sent to the accelerator, for example by the CPU or GPU, and the accelerator receives the target data set to be stored.
The target data set may be a data set for AI machine learning training, for example a data set for training an image recognition model, a data set for an item recommendation model, and so on. The data types in the target data set may be images, text, audio, etc.
The CPU host or GPU host may be connected to the accelerator via a physical link or via a network; the embodiments of the present application do not limit this.
S102: Obtain the data size of each item of data in the target data set.
Each item of data in the target data set has the same size.
After receiving the target data set to be stored, the target data set may be preprocessed and transformed into usable training or test data, so that each item of data in the target data set to be stored has the same size, and the data size of each item in the target data set is obtained.
Take image preprocessing for deep learning as an example. Given a compressed image, one or more of the following preprocessing steps are usually performed:
(1) Image decode: the compressed image is decoded; a color image is decoded into images on three pixel channels, R (Red), G (Green), and B (Blue). Some model algorithms later need to train on one or more of the R, G, and B channels;
(2) Grayscale conversion: grayscale conversion simply converts the image from color to black and white. It is usually used to reduce computational complexity in AI algorithms. Since most images do not require color recognition, grayscale conversion is a sensible choice; it reduces the number of pixels in the image and thus the amount of computation required;
(3) Normalization: normalization is the process of projecting image pixel (intensity) values into a predefined range, usually (0,1) or (-1,1), although different algorithms define it differently; its purpose is to improve fairness across all images. For example, scaling all images to the equal range [0,1] or [-1,1] allows all images to contribute equally to the total loss, instead of images with high and low pixel ranges producing strong and weak losses respectively. Normalization also serves to provide a standard learning rate: since high-pixel images require a low learning rate and low-pixel images require a high learning rate, rescaling helps provide a standard learning rate for all images;
(4) Data augmentation: data augmentation is the process of making minor changes to existing data to increase its diversity without collecting new data. It is a technique for enlarging a data set. Standard data augmentation techniques include horizontal and vertical flips, rotation, cropping, shearing, etc. Performing data augmentation helps prevent the neural network from learning irrelevant features and improves model performance;
(5) Image standardization: standardization is a method of scaling and preprocessing images so that they have similar or consistent heights and widths. AI training, testing, and inference are more efficient when image dimensions are consistent.
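The normalization step (3) above can be illustrated in a few lines; the function name and the choice of [0, 1] as the target range here are just one of the conventions the text mentions, not the only one.

```python
def normalize(pixels, lo=0.0, hi=1.0):
    """Project 0-255 pixel intensities onto the range [lo, hi]."""
    return [lo + (p / 255.0) * (hi - lo) for p in pixels]
```

Passing `lo=-1.0, hi=1.0` would give the alternative (-1, 1) convention instead.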
S103: Store each item of data in the target data set into contiguous, equally sized target blocks of the hard disk, wherein the block size of each target block is determined according to the data size.
Selectable minimum write block sizes are preset; typically they can be set to 256 bytes, 512 bytes, 1024 bytes, 2048 bytes, 4096 bytes, etc., and some solid-state drives support a wider range of block sizes. After the data size of each item in the target data set is obtained, the block size of the target block is determined according to the data size. For example, when optional block sizes greater than or equal to the data size exist, the block size closest to the data size is selected from them.
After the block size of the target block is determined according to the data size, each item of data in the target data set is stored into contiguous, equally sized target blocks of the hard disk. Compared with the existing practice of reading data stored non-contiguously on the hard disk, the present application can read directly from contiguous hard disk blocks, saving data reading time and improving data read/write efficiency.
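The write step of S103 can be sketched as packing fixed-size items into consecutive fixed-size blocks of a single file. The function name `write_dataset`, the zero padding, and the use of a plain file in place of raw disk blocks are our assumptions for illustration.

```python
def write_dataset(path, items, block_size):
    """Write equally sized items into consecutive fixed-size blocks.

    Each item is zero-padded to a whole number of blocks so that item
    boundaries coincide with block boundaries and all items are stored
    contiguously, one after another.
    """
    blocks_per_item = -(-len(items[0]) // block_size)  # ceil division
    stride = blocks_per_item * block_size
    with open(path, "wb") as f:
        for item in items:
            f.write(item.ljust(stride, b"\x00"))
    return stride
```

Because every item occupies the same stride, the byte offset of item `i` is simply `i * stride`, which is what makes the later reads contiguous and search-free.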
It can be seen from the above technical solution that by setting the block size of the hard disk's target blocks according to the fixed size of each item of data in the target data set to be stored, each item of data in the target data set is guaranteed to be stored in contiguous blocks of the hard disk. Data storage is thus greatly optimized: during reading, data can be read directly from contiguous hard disk blocks, saving data reading time and improving data read/write efficiency.
It should be noted that, based on the above embodiment, the embodiments of the present application also provide corresponding improvements. Steps in the subsequent embodiments that are the same as, or correspond to, those in the above embodiment, and the corresponding beneficial effects, can be cross-referenced and are not repeated one by one in the improved embodiments below.
Referring to FIG. 2, FIG. 2 is a flowchart of another implementation of a data storage method in an embodiment of the present application; the method may include the following steps:
S201: Receive a target data set to be stored.
In a specific embodiment of the present application, step S201 may include the following step:
receiving a target data set to be stored for artificial intelligence model training.
When an AI model needs to be trained, a target data set for AI model training is collected in advance and sent to the accelerator, and the accelerator receives the target data set to be stored for AI model training.
The target data set may include a training set, a validation set, and a test set.
First, the model is fitted on the training dataset. For supervised learning, the training set is a collection of samples used to fit the parameters (for example, the weights of the connections between neurons in an artificial neural network). In practice, the training set usually consists of pairs of input vectors and output vectors, where the output vector is called the target. During training, the current model makes a prediction for each sample in the training set and compares the prediction with the target. Based on the comparison, the learning algorithm updates the model parameters. Model fitting may involve both feature selection and parameter estimation.
Next, the fitted model makes predictions on the validation dataset. When tuning the model's hyperparameters (for example, the number of neurons in a hidden layer of a neural network), the validation set provides an unbiased evaluation of the model fitted on the training set. The validation set can be used for early stopping in regularization, i.e., stopping training when the validation error rises (a signal of overfitting on the training set).
Finally, the test dataset can be used to provide an unbiased evaluation of the final model. If the test set is never used during training (for example, not used in cross-validation), it is also called a holdout set.
Many AI algorithms only need the training set for training, then use the test set or validation set to test the trained model, and are subsequently deployed for inference; that is, the test set and validation set may be the same set.
In a specific embodiment of the present application, after step S201 and before step S202, the method may further include the following step:
performing a first preprocessing operation on the target data set, wherein the first preprocessing operation is a preprocessing operation that does not increase the data size.
After the target data set to be stored is received, a first preprocessing operation is performed on it. In the process of storing the target data set to the accelerator, the first preprocessing operation covers part of the preprocessing, generally preprocessing operations that do not increase the data size, such as horizontally flipping, vertically flipping, or rotating the images to be recognized. This prevents preprocessing during storage from enlarging the data, saving storage space and reducing cost.
In a specific embodiment of the present application, performing the first preprocessing operation on the target data set may include the following step:
performing preprocessing operations other than normalization preprocessing on the target data set.
Since the floating-point values produced by normalization occupy more storage space, when the target data set is preprocessed after being received, preprocessing operations other than normalization are performed on it, thereby saving storage space.
S202: Obtain the data size of each item of data in the target data set.
Each item of data in the target data set has the same size.
In a specific embodiment of the present application, step S202 may include the following step:
obtaining the data size of each item of data in the target data set, where each item consists of the data itself, a data label, and a data file name.
Besides the data itself, each item of data in the target data set also includes a data label and a data file name, which together constitute one complete data item. After the target data set to be stored is received, the data size of each item composed of the data itself, the data label, and the data file name is obtained.
The data label is the reference ground truth corresponding to the data item, and the data file name is identification information that uniquely identifies the data item.
S203: Obtain preset optional block sizes.
Selectable minimum write block sizes are preset. Taking a solid-state drive (SSD) as an example, they can typically be set to 256 bytes, 512 bytes, 1024 bytes, 2048 bytes, 4096 bytes, etc., and some SSDs support a wider range of block sizes. After the data size of each item in the target data set is obtained, the preset optional block sizes are obtained.
S204: Judge whether the data size is less than or equal to the maximum of the optional block sizes. If the data size is less than or equal to the maximum of the optional block sizes, execute step S205; if the data size is greater than the maximum of the optional block sizes, execute step S206.
After the preset optional block sizes are obtained, judge whether the data size is less than or equal to their maximum. If it is, a selectable block size exists such that each data item is stored in exactly one complete block, and step S205 is executed. If the data size is greater than the maximum of the optional block sizes, the size of each item exceeds the largest block size the hard disk supports, multiple blocks are needed to hold each data item, and step S206 is executed.
S205: Select the block size of the target block from the optional block sizes that are larger than the data size.
When the data size is determined to be less than or equal to the maximum of the optional block sizes, a selectable block size exists such that each data item is stored in exactly one complete block, and the block size of the target block is selected from the optional block sizes larger than the data size. Each item of data in the target data set is thus stored in its own contiguous individual block on the hard disk, optimizing the writing of the target data set to the hard disk; during subsequent reads, data can be read directly from contiguous hard disk blocks, saving data reading time and improving data read/write efficiency.
In a specific embodiment of the present application, step S205 may include the following steps:
Step 1: select, from the optional block sizes larger than the data size, the optional block size with the smallest difference from the data size;
Step 2: determine the optional block size with the smallest difference from the data size as the block size of the target block.
For convenience of description, the above two steps are explained together.
When the data size is determined to be less than or equal to the maximum of the optional block sizes, the optional block size with the smallest difference from the data size is selected from the optional block sizes larger than the data size and is determined as the block size of the target block. Each item of data in the target data set is thus stored in its own contiguous individual block on the hard disk, so during subsequent reads data can be read directly from contiguous hard disk blocks, saving data reading time and improving data read/write efficiency.
For example, when each item of data is 3136 bytes and the selectable block sizes are 256 bytes, 512 bytes, 1024 bytes, 2048 bytes, 4096 bytes, etc., the target block size is determined to be 4096 bytes.
S206: Determine the maximum of the optional block sizes as the block size of the target block.
When the data size is determined to be greater than the maximum of the optional block sizes, the size of each item exceeds the largest block size the hard disk supports, and multiple blocks are needed to hold each data item. In this case, the maximum of the optional block sizes is determined as the block size of the target block. Taking a handwriting recognition data set with 3200-byte items as an example, when the maximum optional block size is 2048 bytes, the target block size is set to 2048 bytes; although each item is stored across multiple blocks, contiguous blocks are still read during data reading, so performance remains high.
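As a quick check of the 3200-byte example (plain arithmetic; the variable names are ours):

```python
# Each 3200-byte item needs ceil(3200 / 2048) = 2 adjacent contiguous
# 2048-byte blocks, so a read still touches only contiguous blocks.
blocks_per_item = -(-3200 // 2048)        # ceil division
bytes_reserved = blocks_per_item * 2048   # space reserved per item on disk
```

The per-item overhead is the padding in the last block (here 4096 - 3200 = 896 bytes), traded for the ability to read every item with a single contiguous access.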
S207: Store each item of data in the target data set into contiguous, equally sized target blocks of the hard disk, wherein the block size of each target block is determined according to the data size.
Referring to FIG. 3, FIG. 3 is a flowchart of an implementation of a data reading method in an embodiment of the present application; the method may include the following steps:
S301: Receive a data read command.
After the target data set has been stored to the accelerator, when the target data set needs to be read, a data read command is sent to the accelerator, for example by the CPU or GPU, and the accelerator receives the data read command.
S302: Read each item of data of the target data set from contiguous, equally sized target blocks of the hard disk, wherein the block size of each target block is determined according to the data size of each item, and each item of data in the target data set has the same size.
Before storage, the target data set pre-stored in the accelerator is preprocessed so that each item of data in it has the same size, and the block size of the target block is determined according to the data size of each item, so that the target data set is stored in contiguous blocks. After receiving the data read command, the accelerator reads each item of data of the target data set from the contiguous, equally sized target blocks of the hard disk. Reading the items of the target data set from contiguous blocks greatly improves the data reading rate.
S303: Return the read data to the sender of the data read command.
After each item of data of the target data set is read from the contiguous blocks of the target block size on the hard disk, the read data is returned to the sender of the data read command, completing the fast reading of every item in the target data set.
The sender is generally the host CPU or host GPU that performs data read/write interaction with the accelerator.
Referring to FIG. 4, FIG. 4 is a flowchart of another implementation of a data reading method in an embodiment of the present application; the method may include the following steps:
S401: Receive a data read command.
In a specific embodiment of the present application, step S401 may include the following step:
receiving a data read command to read a target data set for artificial intelligence model training.
The target data set pre-stored in the accelerator may be a data set for AI model training. When an AI model needs to be trained, a data read command to read the target data set for AI model training is sent to the accelerator, and the accelerator receives the data read command.
S402: Read each item of data of the target data set from contiguous, equally sized target blocks of the hard disk, wherein the block size of each target block is determined according to the data size of each item, and each item of data in the target data set has the same size.
In a specific embodiment of the present application, step S402 may include the following step:
when the block size of the target block is greater than or equal to the data size, reading each item of data of the target data set from the contiguous, equally sized target blocks of the hard disk according to a one-to-one relationship between target blocks and data items.
When the block size of the target block is greater than or equal to the data size, each item of data was stored in a single storage block at write time; each item of data of the target data set is read from the contiguous, equally sized target blocks of the hard disk according to the one-to-one relationship between target blocks and data items, thereby achieving fast reading of every item in the target data set.
In a specific embodiment of the present application, step S402 may include the following step:
when the block size of the target block is smaller than the data size, reading each item of data of the target data set from the contiguous, equally sized target blocks of the hard disk according to a many-to-one relationship between target blocks and data items, wherein each item of data is pre-stored in adjacent contiguous blocks.
When the block size of the target block is smaller than the data size, each item of data was stored in adjacent contiguous blocks at write time; each item of data of the target data set is read from the contiguous, equally sized target blocks of the hard disk according to the many-to-one relationship between target blocks and data items, thereby achieving contiguous reading of every item in the target data set and improving data reading efficiency.
S403: Perform a second preprocessing operation on each item of read data, wherein the second preprocessing operation is a preprocessing operation that increases the data size.
After the items of the target data set are read, a second preprocessing operation that increases the data size is performed on them. When the CPU host or GPU host system needs to read the data set, the accelerator first reads the previously partially preprocessed data set from contiguous hard disk blocks, then performs all the remaining preprocessing that was not performed at write time, and finally transmits the fully preprocessed data set back to the CPU host or GPU host system. In this way, the partial preprocessing at write time (processing that does not increase size) is moved from the CPU host or GPU host system to the accelerator, further offloading the host, allowing the preprocessing steps and their order to be adjusted cooperatively, and reducing the size of the data set during storage.
By performing only the non-size-increasing first preprocessing operation at storage time and performing the size-increasing second preprocessing operation on the items of the target data set in the accelerator only at read time, this process embodies the concept of computational storage. Adopting a computational storage strategy optimizes data set read/write performance, trading "computing power" for reduced space. The accelerator implements normalization preprocessing simply by passing the data through a fast matrix-operation circuit before returning the data items to the CPU host or GPU host; this requires only a small amount of extra computation time and circuit cost while greatly reducing the storage space of the data set.
In-storage computing (computational storage) refers to placing computing elements closer to the storage device and using the central processing unit's computing power as little as possible, thereby accelerating overall storage performance and offloading the central processing unit.
In a specific embodiment of the present application, step S403 may include the following step:
performing a normalization preprocessing operation on each item of read data.
After the items of the target data set are read, a normalization preprocessing operation may be performed on them. For example, in a handwriting recognition data set, before normalization each point of the 28x28 array takes a value of 0-255 representing its color intensity, so only 1 byte is needed per point; after normalization, however, each point's floating-point value requires 4 bytes. Therefore normalization is skipped at preprocessing time and the partially preprocessed data set is written to the hard disk; at read time, normalization is performed in the accelerator and the result is then returned to the CPU host or GPU host. In this way, compared with storing the normalized floating-point values, each data item occupies only 1/4 of the size on the hard disk, taking up less space.
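The 4x saving claimed here is easy to verify with plain arithmetic (the variable names are ours, not from the application):

```python
raw_item = 28 * 28 * 1         # one byte per pixel before normalization
normalized_item = 28 * 28 * 4  # four-byte float per pixel after normalization
ratio = normalized_item // raw_item
```

Note that 3136 bytes (28 * 28 * 4) is exactly the per-item size used in the earlier block-size example, where a 4096-byte block is selected.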
S404: Return the read data to the sender of the data read command.
In a specific example application, see FIGS. 5 and 6. FIG. 5 is a schematic diagram of a handwriting recognition data set image in an embodiment of the present application, and FIG. 6 is a schematic diagram of the handwriting recognition data set image after normalization. Taking the well-known introductory 0-9 handwriting recognition data set (MNIST handwritten digit database) as an example, the left side of FIG. 5 is a 28x28 grayscale monochrome image of a "9", and the right side shows the value of each image point of the 28x28 array (before normalization), with 0-255 representing color intensity. The left side of FIG. 6 is the image; after normalization preprocessing (dividing each array element by 255), it is finally represented as a 28x28 floating-point array.
In a specific example application, see FIG. 7, which is a schematic diagram of data set image classification in an embodiment of the present application. Even the relatively simple introductory MNIST data set has 60,000 training items and 10,000 test items. These 70,000 items must be preprocessed; each digit image is stored as a 28x28 array, and each has a label recording its true digit. If the label range is 0-9, only one extra byte is needed to store the label. Subsequent model training will frequently read the 60,000 items (the training set), depending on the training process, and after each batch of training the 10,000 items (the test set) may also be read multiple times to confirm whether the test succeeds.
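The MNIST bookkeeping above works out as follows (plain arithmetic; the variable names are ours):

```python
items = 60_000 + 10_000   # training items plus test items
item_size = 28 * 28 + 1   # 784 image bytes plus 1 label byte per item
total_bytes = items * item_size
```

Before normalization the whole data set is therefore under 55 MB, and every item has the same fixed size, which is what makes the fixed-block contiguous layout possible.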
In a specific example application, CPU/GPU host systems 0/1/2 write to and read from data set access accelerators 0/1/2, respectively. The write flow for multiple data sets with the more advanced data set hardware core architecture is as follows:
(1) CPU/GPU host system 0/1/2 transmits the completely unpreprocessed data set directly to the corresponding data set access accelerator 0/1/2;
(2) data set access accelerator 0/1/2 receives the written data set and first performs the partial preprocessing that does not enlarge the data, so that the written data set requires the least storage space; this step moves part of the CPU/GPU preprocessing computation to the accelerator;
(3) the size of each partially preprocessed data set item (including its label or file name) is determined;
(4) the optimal minimum read block of the hard disk is set according to the size of each data item;
(5) all data sets are written sequentially into contiguous hard disk blocks.
The accelerated read flow for multiple data sets is as follows:
(1) the read-preprocessing settings and parameters of CPU/GPU host system 0/1/2 are received via the transmission medium; this step usually needs to be performed only once and applies to the reading of all data items;
(2) the CPU/GPU host system's command to read the preprocessed data set is received via the transmission medium;
(3) according to the command, the previously written, partially preprocessed data set is read from contiguous hard disk blocks;
(4) all the remaining preprocessing not performed at write time is carried out, completing all the preprocessing required by the AI model;
(5) the fully preprocessed data set is transmitted back to CPU/GPU host system 0/1/2.
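Steps (3) and (4) of the read flow can be sketched end to end: read the partially preprocessed items back from contiguous fixed-size blocks, then apply the deferred normalization before returning them. The name `read_items` and the file-based stand-in for raw disk blocks are illustrative assumptions.

```python
def read_items(path, n_items, data_size, block_size):
    """Read equally sized items from contiguous fixed-size blocks, then
    apply the deferred normalization (divide each byte by 255)."""
    stride = -(-data_size // block_size) * block_size  # bytes reserved per item
    out = []
    with open(path, "rb") as f:
        for i in range(n_items):
            f.seek(i * stride)  # contiguous layout: offsets are a simple stride
            raw = f.read(data_size)
            out.append([b / 255.0 for b in raw])
    return out
```

No per-item search is needed: every read is a single seek to a computed offset followed by one contiguous read, which is the performance point the flow above is making.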
Corresponding to the above data storage method embodiments, the present application also provides a data storage device; the data storage device described below and the data storage method described above may be cross-referenced with each other.
Referring to FIG. 8, FIG. 8 is a structural block diagram of a data storage device in an embodiment of the present application; the device may include:
a data set receiving module 81 for receiving the target data set to be stored;
a data size acquisition module 82 for obtaining the data size of each item of data in the target data set, wherein each item of data in the target data set has the same size;
a data storage module 83 for storing each item of data in the target data set into contiguous, equally sized target blocks of the hard disk, wherein the block size of each target block is determined according to the data size.
It can be seen from the above technical solution that by setting the block size of the hard disk's target blocks according to the fixed size of each item of data in the target data set to be stored, each item of data in the target data set is guaranteed to be stored in contiguous blocks of the hard disk. Data storage is thus greatly optimized: during reading, data can be read directly from contiguous hard disk blocks, saving data reading time and improving data read/write efficiency.
In a specific embodiment of the present application, the device may further include:
a first preprocessing module for performing a first preprocessing operation on the target data set after the target data set to be stored is received and before the data size of each item of data in the target data set is obtained, wherein the first preprocessing operation is a preprocessing operation that does not increase the data size.
In a specific embodiment of the present application, the first preprocessing module is specifically a module that performs preprocessing operations other than normalization preprocessing on the target data set.
In a specific embodiment of the present application, the data set receiving module 81 is specifically a module for receiving a target data set to be stored for artificial intelligence model training.
In a specific embodiment of the present application, the data size acquisition module 82 is specifically a module for obtaining the data size of each item of data in the target data set, where each item consists of the data itself, a data label, and a data file name.
In a specific embodiment of the present application, the device may further include a block size determination module, and the block size determination module may include:
an optional block size acquisition submodule for obtaining preset optional block sizes;
a block size selection submodule for selecting the block size of the target block from the optional block sizes that are larger than the data size.
In a specific embodiment of the present application, the block size selection submodule includes:
a block size selection unit for selecting, from the optional block sizes larger than the data size, the optional block size with the smallest difference from the data size;
a block size determination unit for determining the optional block size with the smallest difference from the data size as the block size of the target block.
In a specific embodiment of the present application, the device may further include:
a judging module for judging, after the preset optional block sizes are obtained, whether the data size is less than or equal to the maximum of the optional block sizes;
the block size selection submodule is specifically a module that, when the data size is determined to be less than or equal to the maximum of the optional block sizes, selects the block size of the target block from the optional block sizes that are larger than the data size;
the block size determination module is specifically a module that, when the data size is determined to be greater than the maximum of the optional block sizes, determines the maximum of the optional block sizes as the block size of the target block.
Corresponding to the above data reading method embodiments, the present application also provides a data reading device; the data reading device described below and the data reading method described above may be cross-referenced with each other.
Referring to FIG. 9, FIG. 9 is a structural block diagram of a data reading device in an embodiment of the present application; the device may include:
a read command receiving module 91 for receiving a data read command;
a data reading module 92 for reading each item of data of the target data set from contiguous, equally sized target blocks of the hard disk, wherein the block size of each target block is determined according to the data size of each item, and each item of data in the target data set has the same size;
a data return module 93 for returning the read data to the sender of the data read command.
In a specific embodiment of the present application, the device may further include:
a second preprocessing module for performing a second preprocessing operation on each item of read data after each item of data of the target data set is read from the contiguous, equally sized target blocks of the hard disk and before the read data is returned to the sender of the data read command, wherein the second preprocessing operation is a preprocessing operation that increases the data size.
In a specific embodiment of the present application, the second preprocessing module is specifically a module that performs a normalization preprocessing operation on each item of read data.
In a specific embodiment of the present application, the read command receiving module 91 is specifically a module that receives a data read command to read a target data set for artificial intelligence model training.
In a specific embodiment of the present application, the data reading module 92 is specifically a module that, when the block size of the target block is greater than or equal to the data size, reads each item of data of the target data set from the contiguous, equally sized target blocks of the hard disk according to a one-to-one relationship between target blocks and data items.
In a specific embodiment of the present application, the data reading module 92 is specifically a module that, when the block size of the target block is smaller than the data size, reads each item of data of the target data set from the contiguous, equally sized target blocks of the hard disk according to a many-to-one relationship between target blocks and data items, wherein each item of data is pre-stored in adjacent contiguous blocks.
Corresponding to the above method embodiments, see FIG. 10, which is a schematic diagram of an electronic device provided by the present application; the device may include:
a memory 332 for storing a computer program;
a processor 322 for implementing the steps of the data storage method or data reading method of the above method embodiments when executing the computer program.
Specifically, refer to FIG. 11, a schematic diagram of the specific structure of an electronic device provided in this embodiment. The electronic device may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 322 and a memory 332, the memory 332 storing one or more computer application programs 342 or data 344. The memory 332 may be transient or persistent storage. The program stored in the memory 332 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the data processing device. Optionally, the processor 322 may be configured to communicate with the memory 332 and execute the series of instruction operations in the memory 332 on the electronic device 301.
The electronic device 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341.
The steps of the data storage method or data reading method described above can be implemented by the structure of the electronic device.
Corresponding to the above method embodiments, the present application also provides a non-volatile readable storage medium on which a computer program is stored; when executed by a processor, the computer program can implement the following steps:
receiving a target data set to be stored; obtaining the data size of each item of data in the target data set, wherein each item of data in the target data set has the same size; storing each item of data in the target data set into contiguous, equally sized target blocks of a hard disk, wherein the block size of each target block is determined according to the data size.
The non-volatile readable storage medium may include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
For an introduction to the non-volatile readable storage medium provided by the present application, refer to the above method embodiments; details are not repeated here.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be cross-referenced. For the devices, equipment, and non-volatile storage media disclosed in the embodiments, since they correspond to the methods disclosed in the embodiments, the description is relatively brief; for relevant details, refer to the description of the method part.
Specific examples are used herein to explain the principles and implementations of the present application; the descriptions of the above embodiments are only intended to help understand the technical solution and core ideas of the present application. It should be noted that a person of ordinary skill in the art can make several improvements and modifications to the present application without departing from its principles, and these improvements and modifications also fall within the scope of protection of the claims of the present application.
The solution provided by the embodiments of the present application can be applied to the field of data processing technology. In the embodiments of the present application, a target data set to be stored is received; the data size of each item of data in the target data set is obtained, wherein each item of data in the target data set has the same size; and each item of data in the target data set is stored into contiguous, equally sized target blocks of a hard disk, wherein the block size of each target block is determined according to the data size, achieving the technical effects of saving data reading time and improving data read/write efficiency.
Claims (22)
- A data storage method, characterized by comprising: receiving a target data set to be stored; obtaining the data size of each item of data in the target data set, wherein each item of data in the target data set has the same size; and storing each item of data in the target data set into contiguous, equally sized target blocks of a hard disk, wherein the block size of each target block is determined according to the data size.
- The data storage method according to claim 1, characterized in that, after receiving the target data set to be stored and before obtaining the data size of each item of data in the target data set, the method further comprises: performing a first preprocessing operation on the target data set, wherein the first preprocessing operation is a preprocessing operation that does not increase the data size.
- The data storage method according to claim 2, characterized in that performing the first preprocessing operation on the target data set comprises: performing preprocessing operations other than normalization preprocessing on the target data set.
- The data storage method according to claim 2, characterized in that, when the data type in the target data set is image, performing the first preprocessing operation on the target data set further comprises: performing at least one of the following processing operations on the image: horizontal flip, vertical flip, and rotation.
- The data storage method according to claim 1, characterized in that receiving the target data set to be stored comprises: receiving a target data set to be stored for artificial intelligence model training.
- The data storage method according to claim 5, wherein the target data set comprises a training set, a validation set, and a test set, wherein the training set is configured to fit the artificial intelligence model; the validation set is configured to make predictions with the fitted artificial intelligence model; and the test set is configured to evaluate the final artificial intelligence model.
- The data storage method according to claim 1, characterized in that obtaining the data size of each item of data in the target data set comprises: obtaining the data size of each item of data in the target data set, where each item consists of the data itself, a data label, and a data file name.
- The data storage method according to any one of claims 1 to 7, characterized by further comprising a process of determining the block size of the target block according to the data size, the process comprising: obtaining preset optional block sizes; and selecting the block size of the target block from the optional block sizes that are larger than the data size.
- The data storage method according to claim 8, characterized in that selecting the block size of the target block from the optional block sizes that are larger than the data size comprises: selecting, from the optional block sizes larger than the data size, the optional block size with the smallest difference from the data size; and determining the optional block size with the smallest difference from the data size as the block size of the target block.
- The data storage method according to claim 8, characterized in that, after obtaining the preset optional block sizes, the method further comprises: judging whether the data size is less than or equal to the maximum of the optional block sizes; if the data size is less than or equal to the maximum of the optional block sizes, executing the step of selecting the block size of the target block from the optional block sizes that are larger than the data size; and if the data size is greater than the maximum of the optional block sizes, determining the maximum of the optional block sizes as the block size of the target block.
- The data storage method according to claim 1, characterized in that receiving the target data set to be stored comprises: receiving, by an accelerator, the target data set to be stored, wherein the target data set to be stored is sent to the accelerator by a central processing unit or a graphics processing unit.
- A data reading method, characterized by comprising: receiving a data read command; reading each item of data of a target data set from contiguous, equally sized target blocks of a hard disk, wherein the block size of each target block is determined according to the data size of each item, and each item of data in the target data set has the same size; and returning the read data to the sender of the data read command.
- The data reading method according to claim 12, characterized in that the sender is a central processing unit or a graphics processing unit that performs data read/write interaction with an accelerator.
- The data reading method according to claim 10, characterized in that, after reading each item of data of the target data set from the contiguous, equally sized target blocks of the hard disk and before returning the read data to the sender of the data read command, the method further comprises: performing a second preprocessing operation on each item of read data, wherein the second preprocessing operation is a preprocessing operation that increases the data size.
- The data reading method according to claim 14, characterized in that performing the second preprocessing operation on each item of read data comprises: performing a normalization preprocessing operation on each item of read data.
- The data reading method according to claim 12, characterized in that receiving a data read command comprises: receiving a data read command to read a target data set for artificial intelligence model training.
- The data reading method according to claim 12, characterized in that reading each item of data of the target data set from contiguous, equally sized target blocks of the hard disk comprises: when the block size of the target block is greater than or equal to the data size, reading each item of data of the target data set from the contiguous, equally sized target blocks of the hard disk according to a one-to-one relationship between target blocks and data items.
- The data reading method according to claim 12, characterized in that reading each item of data of the target data set from contiguous, equally sized target blocks of the hard disk comprises: when the block size of the target block is smaller than the data size, reading each item of data of the target data set from the contiguous, equally sized target blocks of the hard disk according to a many-to-one relationship between target blocks and data items, wherein each item of data is pre-stored in adjacent contiguous blocks.
- A data storage device, characterized by comprising: a data set receiving module configured to receive a target data set to be stored; a data size acquisition module configured to obtain the data size of each item of data in the target data set, wherein each item of data in the target data set has the same size; and a data storage module configured to store each item of data in the target data set into contiguous, equally sized target blocks of a hard disk, wherein the block size of each target block is determined according to the data size.
- A data reading device, characterized by comprising: a read command receiving module configured to receive a data read command; a data reading module configured to read each item of data of a target data set from contiguous, equally sized target blocks of a hard disk, wherein the block size of each target block is determined according to the data size of each item, and each item of data in the target data set has the same size; and a data return module configured to return the read data to the sender of the data read command.
- An electronic device, characterized by comprising: a memory configured to store a computer program; and a processor configured to implement, when executing the computer program, the steps of the data storage method of any one of claims 1 to 11 or the data reading method of any one of claims 12 to 18.
- A non-volatile readable storage medium, characterized in that a computer program is stored on the non-volatile readable storage medium, and when executed by a processor, the computer program implements the steps of the data storage method of any one of claims 1 to 11 or the data reading method of any one of claims 12 to 18.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211219584.2A CN115291813B (zh) | 2022-10-08 | 2022-10-08 | Data storage method and device, data reading method and device, and equipment |
| CN202211219584.2 | 2022-10-08 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024074042A1 true WO2024074042A1 (zh) | 2024-04-11 |
Family
ID=83834698
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2023/094310 Ceased WO2024074042A1 (zh) | 2022-10-08 | 2023-05-15 | 一种数据存储方法及装置、数据读取方法及装置、设备 |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN115291813B (zh) |
| WO (1) | WO2024074042A1 (zh) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115291813B (zh) * | 2022-10-08 | 2023-04-07 | 苏州浪潮智能科技有限公司 | Data storage method and device, data reading method and device, and equipment |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102063379A (zh) * | 2010-12-28 | 2011-05-18 | 天津市亚安科技电子有限公司 | Data storage method for a flash memory |
| CN112394876A (zh) * | 2019-08-14 | 2021-02-23 | 深圳市特思威尔科技有限公司 | Large file storage/reading method, storage/reading apparatus and computer device |
| CN115291813A (zh) * | 2022-10-08 | 2022-11-04 | 苏州浪潮智能科技有限公司 | Data storage method and device, data reading method and device, and equipment |
-
2022
- 2022-10-08 CN CN202211219584.2A patent/CN115291813B/zh active Active
-
2023
- 2023-05-15 WO PCT/CN2023/094310 patent/WO2024074042A1/zh not_active Ceased
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102063379A (zh) * | 2010-12-28 | 2011-05-18 | 天津市亚安科技电子有限公司 | Data storage method for a flash memory |
| CN112394876A (zh) * | 2019-08-14 | 2021-02-23 | 深圳市特思威尔科技有限公司 | Large file storage/reading method, storage/reading apparatus and computer device |
| CN115291813A (zh) * | 2022-10-08 | 2022-11-04 | 苏州浪潮智能科技有限公司 | Data storage method and device, data reading method and device, and equipment |
Also Published As
| Publication number | Publication date |
|---|---|
| CN115291813A (zh) | 2022-11-04 |
| CN115291813B (zh) | 2023-04-07 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11967151B2 (en) | Video classification method and apparatus, model training method and apparatus, device, and storage medium | |
| JP2022058915A (ja) | 画像認識モデルをトレーニングするための方法および装置、画像を認識するための方法および装置、電子機器、記憶媒体、並びにコンピュータプログラム | |
| EP4220555B1 (en) | Training method and apparatus for image segmentation model, image segmentation method and apparatus, and device | |
| WO2022116856A1 (zh) | 一种模型结构、模型训练方法、图像增强方法及设备 | |
| WO2021135254A1 (zh) | 车牌号码识别方法、装置、电子设备及存储介质 | |
| EP4390725A1 (en) | Video retrieval method and apparatus, device, and storage medium | |
| CN113343958B (zh) | 一种文本识别方法、装置、设备及介质 | |
| CN111832666B (zh) | 医疗影像数据扩增方法、装置、介质及电子设备 | |
| CN112434746B (zh) | 基于层次化迁移学习的预标注方法及其相关设备 | |
| CN119810428B (zh) | 轻量型实时目标检测方法、装置、服务器及存储介质 | |
| CN118470581A (zh) | 面向目标检测任务的无人机跨模态语义通信方法及系统 | |
| KR20230103790A (ko) | 이질적 이미지의 딥러닝 분석을 위한 적대적 학습기반 이미지 보정 방법 및 장치 | |
| WO2024074042A1 (zh) | 一种数据存储方法及装置、数据读取方法及装置、设备 | |
| CN114742999A (zh) | 一种深度三网络半监督语义分割方法和系统 | |
| CN110717405B (zh) | 人脸特征点定位方法、装置、介质及电子设备 | |
| CN116051686B (zh) | 图上文字擦除方法、系统、设备及存储介质 | |
| CN115188000A (zh) | 基于ocr的文本识别方法、装置、存储介质及电子设备 | |
| CN113596576A (zh) | 一种视频超分辨率的方法及装置 | |
| CN114756425B (zh) | 智能监控方法、装置、电子设备及计算机可读存储介质 | |
| CN117459773A (zh) | 一种跨设备内容同步的智能电视图像显示方法及相关装置 | |
| CN116189228A (zh) | 牛脸检测方法、装置、计算机设备及可读存储介质 | |
| CN114693985A (zh) | 用于图像分割的优化装置和方法以及预测装置和方法 | |
| CN119762510B (zh) | 医学图像分割方法、装置、设备及存储介质 | |
| US20230237780A1 (en) | Method, device, and computer program product for data augmentation | |
| TWI851149B (zh) | 資料擴增裝置、方法以及非揮發性電腦可讀取記錄媒體 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23874237 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 23874237 Country of ref document: EP Kind code of ref document: A1 |
|
| 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04/09/2025) |