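Java API Operations
Contents:
Creating a Maven project and adding the jars to the pom file
Ways to obtain a FileSystem
Creating, deleting, modifying, querying, uploading, downloading, and merging small files
Classes involved
This section uses the Java API to create, delete, modify, and query directories and data on HDFS.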
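Creating a Maven project and adding the jars to the pom file
Because of licensing issues, the CDH builds of these components are not all published to the central Maven repository; they are hosted on Cloudera's own server instead. The default Maven repository therefore cannot resolve them, and you have to add a repository entry manually so that Maven downloads them from the CDH repository. The two pages below are the official documentation on this; read them carefully.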
cdh_vd_cdh5_maven_repo.html
cdh_vd_cdh5_maven_repo_514x.html
Dependencies (pom.xml fragment):
<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.0-mr1-cdh5.14.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.6.0-cdh5.14.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.6.0-cdh5.14.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.6.0-cdh5.14.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/junit/junit -->
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.testng</groupId>
        <artifactId>testng</artifactId>
        <version>RELEASE</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.0</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
                <encoding>UTF-8</encoding>
                <!-- <verbal>true</verbal>-->
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.4.3</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <minimizeJar>true</minimizeJar>
                    </configuration>
                </execution>
            </executions>
        </plugin>
        <!-- <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
                <archive>
                    <manifest>
                        <mainClass>cn.itcast.hadoop.db.DBToHdfs2</mainClass>
                    </manifest>
                </archive>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>-->
    </plugins>
</build>
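Ways to obtain a FileSystem
In every snippet below, replace node-ip with the address of your NameNode.
First way: FileSystem.get with an explicit HDFS URI
// Imports used by the snippets in this section:
// org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.FileSystem, java.net.URI
Configuration configuration = new Configuration();
FileSystem fileSystem = FileSystem.get(new URI("hdfs://node-ip:8020"), configuration);
Second way: set fs.defaultFS and pass a URI without a scheme
Configuration configuration = new Configuration();
configuration.set("fs.defaultFS", "hdfs://node-ip:8020");
FileSystem fileSystem = FileSystem.get(new URI("/"), configuration);
Third way: FileSystem.newInstance with an explicit HDFS URI
Configuration configuration = new Configuration();
FileSystem fileSystem = FileSystem.newInstance(new URI("hdfs://node-ip:8020"), configuration);
Fourth way: set fs.defaultFS and call FileSystem.newInstance with the configuration only
Configuration configuration = new Configuration();
configuration.set("fs.defaultFS", "hdfs://node-ip:8020");
FileSystem fileSystem = FileSystem.newInstance(configuration);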
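The second way works because a URI with neither a scheme nor an authority makes FileSystem.get fall back to the configured fs.defaultFS. The following is a minimal JDK-only sketch of that fallback, not Hadoop's actual code: the class name FsUriResolution, the method effectiveUri, and the host names are hypothetical and exist only for illustration.

```java
import java.net.URI;

public class FsUriResolution {

    /**
     * Returns the URI that decides which file system is used.
     * Simplified illustration of the fallback to fs.defaultFS.
     */
    public static URI effectiveUri(URI requested, URI defaultFs) {
        String scheme = requested.getScheme();
        String authority = requested.getAuthority();
        // "/" has no scheme and no authority: use the default file system.
        if (scheme == null && authority == null) {
            return defaultFs;
        }
        // Same scheme as the default but no host given: borrow the default's host.
        if (scheme != null && authority == null
                && scheme.equals(defaultFs.getScheme())
                && defaultFs.getAuthority() != null) {
            return defaultFs;
        }
        // Otherwise the requested URI is used as given.
        return requested;
    }

    public static void main(String[] args) {
        // Hypothetical NameNode address standing in for fs.defaultFS.
        URI defaultFs = URI.create("hdfs://namenode.example:8020");
        System.out.println(effectiveUri(URI.create("/"), defaultFs));
        System.out.println(effectiveUri(URI.create("hdfs://other.example:8020"), defaultFs));
    }
}
```

With this in mind, the first and third ways name the cluster in the URI itself, while the second and fourth rely entirely on the value set for fs.defaultFS.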
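Creating, deleting, modifying, querying, uploading, downloading, and merging small files
// Create a directory (parent directories are created as needed)
boolean mkdirs = fileSystem.mkdirs(new Path("/java"));
if (mkdirs) {
    System.out.println("Directory created");
} else {
    System.out.println("Failed to create directory");
}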
// Create an empty file
fileSystem.createNewFile(new Path("/Test001/abc.txt"));
// Delete a path; the second argument enables recursive deletion
boolean delete = fileSystem.delete(new Path("/java"), true);
if (delete) {
    System.out.println("Deleted");
} else {
    System.out.println("Delete failed");
}
// Rename (move) a path
boolean rename = fileSystem.rename(new Path("/java"), new Path("/newjava"));
if (rename) {
    System.out.println("Renamed");
} else {
    System.out.println("Rename failed");
}
// List the entries directly under the root directory (new Date needs java.util.Date)
FileStatus[] fileStatuses = fileSystem.listStatus(new Path("/"));
for (FileStatus file : fileStatuses) {
    System.out.println(file.getPath().getName());
    System.out.println(file.getPermission() + "\t" + file.getOwner() + "\t" + file.getGroup() + "\t"
            + new Date(file.getModificationTime()) + "\t" + file.getReplication() + "\t" + file.getBlockSize() + "\t"
            + file.getPath());
}
// Upload a local file to the HDFS root directory
fileSystem.copyFromLocalFile(new Path("E:\\123.txt"), new Path("/"));
fileSystem.close();
// Download a file from HDFS to a local directory
fileSystem.copyToLocalFile(new Path("/123.txt"), new Path("E:\\test"));
fileSystem.close();
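// Append data to an existing file
FSDataOutputStream append = fileSystem.append(new Path("/Test001/abc.txt"));
append.write("The weather is nice today".getBytes());
// Close the stream so the appended bytes are flushed to HDFS
append.close();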
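// Merge small local files into one large file on HDFS.
// IOUtils here is org.apache.commons.io.IOUtils (copy and closeQuietly).
// The third argument of FileSystem.get is the user to act as.
FileSystem fileSystem = FileSystem.get(new URI("hdfs://node-ip:8020"), new Configuration(), "root");
FSDataOutputStream outputStream = fileSystem.create(new Path("/bigfile.xml"));
LocalFileSystem local = FileSystem.getLocal(new Configuration());
// Replace local-windows-path with the local directory that holds the small files
FileStatus[] fileStatuses = local.listStatus(new Path("file:///local-windows-path"));
for (FileStatus fileStatus : fileStatuses) {
    // Skip subdirectories; only regular files can be opened for reading
    if (!fileStatus.isFile()) {
        continue;
    }
    FSDataInputStream inputStream = local.open(fileStatus.getPath());
    IOUtils.copy(inputStream, outputStream);
    IOUtils.closeQuietly(inputStream);
}
IOUtils.closeQuietly(outputStream);
local.close();
fileSystem.close();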
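The merge above follows a simple pattern: open one output stream, then copy each small file into it in turn. As a rough illustration, here is the same pattern against the local file system using only the JDK, so the logic can be tried without a cluster. The class name LocalSmallFileMerge, the method merge, and the file names are hypothetical; sorting the inputs by path is an assumption added here to make the result deterministic.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LocalSmallFileMerge {

    /** Concatenates every regular file in sourceDir into target; returns bytes written. */
    public static long merge(Path sourceDir, Path target) throws IOException {
        List<Path> inputs;
        // Collect the inputs first so the listing is closed before any writing starts.
        try (Stream<Path> entries = Files.list(sourceDir)) {
            inputs = entries.filter(Files::isRegularFile)
                    .sorted()
                    .collect(Collectors.toList());
        }
        long total = 0;
        // One output stream for the whole merge, closed automatically at the end.
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path input : inputs) {
                total += Files.copy(input, out);
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("small-files");
        Files.write(dir.resolve("a.txt"), "hello ".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("b.txt"), "world".getBytes(StandardCharsets.UTF_8));
        // Keep the target outside the source directory so it is never merged into itself.
        Path target = Files.createTempFile("merged", ".txt");
        long written = merge(dir, target);
        System.out.println(written + " bytes: "
                + new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
        // prints "11 bytes: hello world"
    }
}
```

The try-with-resources blocks play the role of the closeQuietly calls in the HDFS version: each stream is closed even if a copy fails partway through.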
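Classes involved
Configuration: an object of this class encapsulates the configuration of a client or server.
FileSystem: an object of this class represents a file system, and its methods are used to operate on files. It is obtained through the static method get of FileSystem:
FileSystem fs = FileSystem.get(conf)
The get method reads the fs.defaultFS parameter from conf to decide which type of file system to return. If the code does not set fs.defaultFS and no configuration on the project classpath provides it, the value in conf comes from core-default.xml inside the Hadoop jar. That default is file:///, so the object returned is not a DistributedFileSystem instance but a client for the local file system.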
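The fallback described above can be modeled with layered properties: a built-in default that an explicit setting overrides. The following is only a sketch using java.util.Properties, not Hadoop's Configuration class. The names DefaultFsLookup, newConf, and fileSystemType are hypothetical, and the scheme-to-class mapping covers just the two cases mentioned in the text.

```java
import java.net.URI;
import java.util.Properties;

public class DefaultFsLookup {

    public static final String KEY = "fs.defaultFS";

    /** Builds a configuration whose built-in default mimics core-default.xml. */
    public static Properties newConf() {
        Properties coreDefault = new Properties();
        coreDefault.setProperty(KEY, "file:///");
        // Values set on the returned object take precedence over coreDefault.
        return new Properties(coreDefault);
    }

    /** Maps the scheme of fs.defaultFS to the kind of client that would be returned. */
    public static String fileSystemType(Properties conf) {
        String scheme = URI.create(conf.getProperty(KEY)).getScheme();
        switch (scheme) {
            case "file":
                return "LocalFileSystem";
            case "hdfs":
                return "DistributedFileSystem";
            default:
                return "other (" + scheme + ")";
        }
    }

    public static void main(String[] args) {
        Properties conf = newConf();
        // Nothing set explicitly: the built-in default file:/// applies.
        System.out.println(fileSystemType(conf)); // prints "LocalFileSystem"

        // Hypothetical NameNode address; setting it switches to the HDFS client.
        conf.setProperty(KEY, "hdfs://namenode.example:8020");
        System.out.println(fileSystemType(conf)); // prints "DistributedFileSystem"
    }
}
```

This is why code that runs without errors but writes to the local disk usually points to a missing fs.defaultFS setting rather than a problem with the cluster.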