A Detailed Tutorial on Installing Hadoop 2.6.0 and Hive on Ubuntu
- Contents:
- 1. Installing and configuring Hadoop 2.6.0
Apache Hadoop is a very common framework for big data development. Within the Hadoop ecosystem, Apache Hive serves as a data warehouse tool that helps us query, analyze, and process data at large scale. This article explains how to install Hadoop 2.6.0 and Hive on an Ubuntu system and set up a simple cluster environment.
Installing and configuring Hadoop 2.6.0
First, download the Hadoop 2.6.0 release on your Ubuntu system and unpack it. Then, in the etc/hadoop/ directory of the unpacked distribution, edit core-site.xml, yarn-site.xml, mapred-site.xml, and hdfs-site.xml (mapred-site.xml can be created by copying mapred-site.xml.template), configuring them as follows.
– core-site.xml:

```
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000/</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>
```
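All of the Hadoop configuration files share the same `<property>` name/value layout, so it can be convenient to generate them from a dictionary of settings. The helper below is an illustrative sketch (it is not part of the Hadoop distribution):

```python
from xml.sax.saxutils import escape

def to_hadoop_xml(props):
    """Render a dict of settings in the Hadoop configuration-file format."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append("    <name>%s</name>" % escape(name))
        lines.append("    <value>%s</value>" % escape(str(value)))
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)

# The two core-site.xml settings used in this tutorial.
core_site = to_hadoop_xml({
    "fs.defaultFS": "hdfs://localhost:9000/",
    "io.file.buffer.size": 131072,
})
print(core_site)
```

Writing the returned string to core-site.xml produces the same file shown above.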
– yarn-site.xml:

```
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
  </property>
  <!--
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
  </property>
  -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>16384</value>
  </property>
  <property>
    <description>The address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>
</configuration>
```

– mapred-site.xml:

```
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```
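The memory settings interact: YARN hands out containers in multiples of yarn.scheduler.minimum-allocation-mb, capped at yarn.scheduler.maximum-allocation-mb, out of the yarn.nodemanager.resource.memory-mb pool on each NodeManager. A quick sanity check of the values used in this tutorial (a rough estimate that ignores per-container overhead):

```python
node_memory_mb = 16384      # yarn.nodemanager.resource.memory-mb
min_allocation_mb = 1024    # yarn.scheduler.minimum-allocation-mb
max_allocation_mb = 8192    # yarn.scheduler.maximum-allocation-mb

# Most containers one NodeManager can host if every request is minimal.
max_small_containers = node_memory_mb // min_allocation_mb

# Fewest containers if every request is as large as the scheduler allows.
min_large_containers = node_memory_mb // max_allocation_mb

print(max_small_containers)   # 16
print(min_large_containers)   # 2
```

If a job asks for more than yarn.scheduler.maximum-allocation-mb per container, the request is rejected, so these three values should be tuned together.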
yarn-site.xml also needs the log-aggregation and Timeline Service settings:

```
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.log.server.url</name>
  <value></value>
</property>
<property>
  <name>yarn.timeline-service.address</name>
  <value>${yarn.timeline-service.hostname}:10200</value>
</property>
<property>
  <name>yarn.timeline-service.webapp.address</name>
  <value>${yarn.timeline-service.hostname}:8188</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/hadoop/yarn/local/logs</value>
</property>
```
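Values such as ${yarn.timeline-service.hostname}:8188 rely on Hadoop's configuration variable expansion: a ${other.property} reference is replaced by that property's value when the configuration is read. The helper below is a simplified sketch of that mechanism, not Hadoop's actual implementation:

```python
import re

def resolve(props, name):
    """Expand ${other.property} references in a property value."""
    value = props[name]
    # Substitute repeatedly until no ${...} references remain.
    while True:
        match = re.search(r"\$\{([^}]+)\}", value)
        if match is None:
            return value
        value = value.replace(match.group(0), props[match.group(1)])

props = {
    "yarn.timeline-service.hostname": "localhost",
    "yarn.timeline-service.webapp.address":
        "${yarn.timeline-service.hostname}:8188",
}
print(resolve(props, "yarn.timeline-service.webapp.address"))  # localhost:8188
```

This is why setting a single hostname property is enough to fix up all the addresses derived from it.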
Finally, configure the MapReduce JobHistory Server, also in mapred-site.xml:

```
<property>
  <description>The address of the MR history server web application.</description>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>${yarn.timeline-service.hostname}:19888</value>
</property>
<property>
  <name>mapreduce.jobhistory.recovery.store.class</name>
  <value>org.apache.hadoop.mapreduce.v2.hs.HistoryServerLeveldbStateStoreService</value>
</property>
<property>
  <name>mapreduce.jobhistory.intermediate-done-dir</name>
  <value>/tmp/hadoop-yarn/staging/history/done_intermediate</value>
</property>
```
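After editing this many XML files by hand, it is easy to leave a tag unclosed, and the daemons will fail to start with a parse error. A quick well-formedness check before starting Hadoop can save a restart cycle; the snippet below is a minimal sketch using only the Python standard library:

```python
import xml.etree.ElementTree as ET

def check_config(xml_text):
    """Parse a Hadoop config; return {name: value} or raise on bad XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        props[name] = value
    return props

# In practice you would read the text from etc/hadoop/mapred-site.xml.
sample = """<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>"""
print(check_config(sample))
```

Running the check over each of the four files listed earlier confirms they parse before you bring up HDFS and YARN.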