HDFS
DSS can connect to filesystems based on the “Hadoop Filesystem” API to:
- Read and write datasets
- Read and write managed folders
Compatible filesystems
DSS can read from and write to any kind of Hadoop Filesystem, and has been tested with the following URL schemes:
hdfs://
maprfs://
s3a://
wasb://
adl://
Note
DSS refers to all “Hadoop Filesystem” URIs collectively as the “HDFS” dataset, even though it supports more than hdfs:// URIs.
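As an illustration of the scheme list above, the following minimal sketch checks whether a URI uses one of the schemes DSS has been tested with. It is not part of DSS: the function name `is_tested_scheme` and the example host and bucket names are hypothetical, and an untested scheme may still work, since DSS can use any Hadoop Filesystem.

```python
from urllib.parse import urlparse

# URL schemes listed above as tested with DSS.
TESTED_SCHEMES = {"hdfs", "maprfs", "s3a", "wasb", "adl"}


def is_tested_scheme(uri: str) -> bool:
    """Return True if the URI's scheme is in the tested list.

    Hypothetical helper for illustration only; a False result does not
    mean the filesystem is unusable, only that it is not in the list.
    """
    return urlparse(uri).scheme.lower() in TESTED_SCHEMES


# Example URIs (host and bucket names are made up).
print(is_tested_scheme("hdfs://namenode:8020/user/data"))
print(is_tested_scheme("s3a://my-bucket/datasets/events"))
print(is_tested_scheme("gs://my-bucket/datasets/events"))
```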
Using multiple Hadoop filesystems
All Hadoop clusters define a ‘default’ filesystem, which is traditionally an HDFS filesystem on the cluster. Access to HDFS filesystems on other clusters, or to other filesystem types such as cloud storage (S3, Azure Blob Storage, Google Cloud Storage), is also possible. The main benefit of exposing other filesystems as Hadoop filesystems is that it enables the use of the Hadoop I/O layers and, as a corollary, of important Hadoop file formats: Parquet and ORC.
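The role of the default filesystem can be sketched as follows: a path with no scheme is resolved against the cluster default (set by the `fs.defaultFS` property in Hadoop), while a fully qualified URI points at whichever filesystem it names. This is a simplified illustration, not DSS or Hadoop code: the `qualify` helper, the namenode address, and the bucket name are hypothetical, and only absolute paths are handled.

```python
from urllib.parse import urlparse


def qualify(path: str, default_fs: str) -> str:
    """Return a fully qualified URI for an absolute path.

    Hypothetical helper: paths that already carry a scheme are returned
    unchanged; scheme-less paths are attached to the default filesystem.
    """
    if urlparse(path).scheme:
        return path
    return default_fs.rstrip("/") + "/" + path.lstrip("/")


# Assumed default filesystem of the cluster (made-up address).
DEFAULT_FS = "hdfs://namenode:8020"

# A scheme-less path lands on the cluster's default filesystem.
print(qualify("/user/data/events", DEFAULT_FS))

# A qualified URI targets another filesystem, here an S3 bucket via s3a.
print(qualify("s3a://my-bucket/events", DEFAULT_FS))
```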
For more information about connecting to multiple Hadoop filesystems and connection details, see hadoop/multi-hdfs.