A Detailed Tutorial on Installing Hadoop 2.6.0 and Hive on Ubuntu


Apache Hadoop is a very common framework in big data development. Within the Hadoop ecosystem, Apache Hive is a data warehouse tool that helps us query, analyze, and process large-scale data. This article explains how to install Hadoop 2.6.0 and Hive on an Ubuntu system and set up a simple cluster environment.

Installing and Configuring Hadoop 2.6.0

First, download the Hadoop 2.6.0 release on your Ubuntu system and unpack it. Then, in the etc/hadoop/ directory of the unpacked installation, create (or edit) core-site.xml, yarn-site.xml, mapred-site.xml, and hdfs-site.xml, and configure them as follows.
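
As a concrete illustration, the commands below fetch the 2.6.0 tarball from the Apache archive and unpack it under /usr/local/hadoop; the mirror URL, installation path, and JDK path are illustrative assumptions rather than values prescribed by this tutorial.

```
# Download and unpack Hadoop 2.6.0 (URL and paths are illustrative)
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
tar -xzf hadoop-2.6.0.tar.gz
sudo mv hadoop-2.6.0 /usr/local/hadoop

# Point the shell at the installation and at a JDK
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64   # adjust to your installed JDK
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# The configuration files edited below live here
cd $HADOOP_HOME/etc/hadoop
```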

– core-site.xml:

```
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000/</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>
```
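
– hdfs-site.xml:

A minimal single-node sketch is shown here; the replication factor of 1 and the local storage paths under /usr/local/hadoop/data are illustrative assumptions, not values taken from a reference configuration.

```
<configuration>
  <!-- One copy of each block is enough for a single-machine test cluster -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- Local directories for NameNode metadata and DataNode blocks (adjust the paths) -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/local/hadoop/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/hadoop/data/datanode</value>
  </property>
</configuration>
```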

– yarn-site.xml:

```
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <!--
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
  </property>
  -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>16384</value>
  </property>
  <property>
    <description>The address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>
  <!-- Optional Kerberos/SPNEGO authentication for the timeline service web UI
  <property>
    <name>yarn.timeline-service.http-authentication.type</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>yarn.timeline-service.http-authentication.kerberos.principal</name>
    <value>HTTP/_HOST@HADOOP.COM</value>
  </property>
  <property>
    <name>yarn.timeline-service.http-authentication.kerberos.keytab</name>
    <value>/etc/security/keytabs/spnego.service.keytab</value>
  </property>
  -->
  <property>
    <description>
      Timeline store configuration properties.
      LevelDB Timeline Store specific configuration properties.
      If not specified, then a local directory is used as the LevelDB store.
      LevelDB options:
        leveldb.timetolive.seconds - default: 259200 (3 days)
          The number of seconds for which a timeline entity or domain
          should be retained in the LevelDB store.
        leveldb.max.open.files - default: 32
          The maximum number of files that can be open by the LevelDB store.
        leveldb.write.buffer.size - default: 4MB
          The size of the write buffer for the LevelDB store. Smaller values
          cause more frequent flushes to disk; larger values consume more
          memory and provide better I/O throughput at the expense of higher
          memory usage.
    </description>
    <name>yarn.timeline-service.leveldb-timeline-store.path</name>
    <value>file:///tmp/leveldb-timeline-store</value>
  </property>
  <property>
    <description>How long a timeline entity or domain is retained in the
      timeline store, in milliseconds.</description>
    <name>yarn.timeline-service.ttl-ms</name>
    <value>2592000000</value>
  </property>
  <!--
    The Timeline Service writes some data to HDFS, including:
      * Application Timeline Data (ATS) events for all running applications,
        including MRv1, MRv2, Tez, Hive etc.
      * User-defined timeline entities and metrics that are pushed via the YARN API.
    By default, ATS data is stored under /tmp/hadoop-yarn/timeline on HDFS.
    You can customize this location by setting:
    <property>
      <name>yarn.timeline-service.store-class</name>
      <value>org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore</value>
    </property>
  -->
  <!--
  <property>
    <name>yarn.timeline-service.generic-application-history.enabled</name>
    <value>true</value>
  </property>
  -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log.server.url</name>
    <value></value>
  </property>
  <property>
    <name>yarn.timeline-service.address</name>
    <value>${yarn.timeline-service.hostname}:10200</value>
  </property>
  <property>
    <name>yarn.timeline-service.webapp.address</name>
    <value>${yarn.timeline-service.hostname}:8188</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/hadoop/yarn/local/logs</value>
  </property>
</configuration>
```
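
Once the daemons are running (the start-up commands appear at the end of this section), the web endpoints configured above can be checked over HTTP; this assumes the ResourceManager and the timeline server run on localhost with the ports shown in the configuration.

```
# ResourceManager REST API (yarn.resourcemanager.webapp.address)
curl -s http://localhost:8088/ws/v1/cluster/info

# Timeline server REST API (yarn.timeline-service.webapp.address)
curl -s http://localhost:8188/ws/v1/timeline/
```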

– mapred-site.xml:

```
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <description>The address of the MR history server web application.</description>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>${yarn.timeline-service.hostname}:19888</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.recovery.store.class</name>
    <value>org.apache.hadoop.mapreduce.v2.hs.HistoryServerLeveldbStateStoreService</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/tmp/hadoop-yarn/staging/history/done_intermediate</value>
  </property>
  <!--
    MR Job History Store configuration properties.
    If not specified, then a local directory is used as the job history store.
    Mapred-DB options:
      mapred.jobtracker.retirejobs.cache.size - default: 1000
        The maximum number of jobs that can be kept in cache for quick access.
      mapred.jobtracker.retirejob.interval - default: 24 hours (86400000 ms)
        The interval after which a job is retired from memory and moved to disk.
      mapred.job.tracker.persist.jobstatus.active - default: true
        Whether to persist the job status of active jobs.
      mapred.job.tracker.persist.jobstatus.hours
        For how many hours the job status of active jobs is persisted.

    LevelDB Job History Store specific configuration properties.
    If not specified, then a local directory is used as the LevelDB store,
    and completed jobs are retained there for a configurable number of seconds.
  -->
</configuration>
```
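
With the configuration files in place, the usual next steps are to format the NameNode and start the daemons. The commands below assume $HADOOP_HOME/bin and $HADOOP_HOME/sbin are on your PATH as set up earlier.

```
# One-time initialization of the NameNode metadata directory
hdfs namenode -format

# Start HDFS and YARN
start-dfs.sh
start-yarn.sh

# Start the MapReduce JobHistory server and the YARN timeline server
mr-jobhistory-daemon.sh start historyserver
yarn-daemon.sh start timelineserver

# List the running Java daemons as a quick sanity check
jps
```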