DevOps is a set of software development practices that combine Software Development (Dev) and Information Technology Operations (Ops). In this blog we will use Ansible, an IT configuration management and deployment tool, to automate the setup of a Hadoop cluster.
What is Hadoop?
Hadoop is a collection of open-source software utilities that uses many computers connected over a network to solve problems involving huge amounts of data and computation. It is maintained under the Apache Software Foundation.
What issue do we solve by automating Hadoop?
With the advancement of technology, time has become a major constraint. Many tasks are still done manually, and that takes time. Today, almost every company faces the problem of storing and processing large amounts of data. Whenever a new system is brought into the environment, all of the configuration that already exists in the environment has to be deployed onto it according to our needs, and making all of those changes by hand takes a lot of time. These tasks can now be handled through automation.
Why do we need to automate Hadoop?
Deploying an infrastructure-grade Hadoop cluster is a monumental task and can take a lot of time, as every system needs to be configured for its specific purpose: data nodes for storage, nodes for job scheduling and processing, and so on. In this setup we implement HDFS, which is mainly used for data storage.
Hadoop Architecture
Ansible Architectural Diagram:
The software and hardware requirements of this project are as follows:
| Sr. No. | SOFTWARE | HARDWARE |
|---|---|---|
| 1. | RHEL 7.5 and above | A minimum of 1 GHz processor |
| 2. | YAML, Jinja2 | A minimum of 1 GB RAM |
| 3. | Ansible, HTTP, Hadoop | No strict specifications for the hard disk |
METHODOLOGY
Hadoop:
The main steps used to automate this infrastructure are as follows:
Step 1: Install the Ansible package on Linux using yum, i.e. with the command "yum install ansible". Yum must already be configured before this step.
Step 2: Then create an Ansible Galaxy role skeleton for the Hadoop cluster in a separate folder (for example, a playbooks directory) using the command "ansible-galaxy init hadoop_cluster". This Hadoop cluster is what will solve the big data problem.
Step 3: Next, put the managed (client) node's IP address in the hosts file so that Ansible can read the IP from there and run the playbook on that system automatically. The default location of the hosts file is "/etc/ansible/hosts". For example:
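A minimal sketch of such an inventory, assuming one master, two slaves and one client (the IP addresses and group names below are placeholders, not values from the original setup):

```ini
# /etc/ansible/hosts -- illustrative inventory; replace the IPs with your own
[namenode]
192.168.1.10

[datanode]
192.168.1.11
192.168.1.12

[client]
192.168.1.20
```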
Step 4: Configure Ansible according to your needs in the ansible.cfg file, for example:
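As a rough sketch, a lab-style ansible.cfg could look like the following; the values shown are typical defaults, and host_key_checking is disabled here purely for convenience:

```ini
# /etc/ansible/ansible.cfg -- illustrative settings only
[defaults]
inventory         = /etc/ansible/hosts
roles_path        = /etc/ansible/roles
host_key_checking = False
```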
Step 5: Write a Hadoop cluster role to set up the Master Node and the Slave Nodes, for example:
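One possible layout, assuming the role created in Step 2 is named hadoop_cluster and the inventory groups from Step 3 are used, is a tasks/main.yml that branches per node type:

```yaml
# roles/hadoop_cluster/tasks/main.yml -- minimal sketch; the task-file and
# group names (client.yml, master.yml, slave.yml, client, namenode, datanode)
# are assumptions made for this example.
- name: Set up the client node
  include_tasks: client.yml
  when: "'client' in group_names"

- name: Set up the master node (NameNode)
  include_tasks: master.yml
  when: "'namenode' in group_names"

- name: Set up the slave nodes (DataNodes)
  include_tasks: slave.yml
  when: "'datanode' in group_names"
```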
Step 6: Then create a site.yml file containing the code to import the Hadoop role, for example:
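A minimal site.yml sketch, assuming the role is named hadoop_cluster:

```yaml
# site.yml -- imports the Hadoop role and runs it on every host in the inventory
- hosts: all
  become: true
  roles:
    - hadoop_cluster
```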
Step 7: Execute the file using the command “ansible-playbook site.yml”.
Step 8: In the Client node role, we copy the Java and Hadoop setup files to the respective nodes, for example:
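An illustrative tasks file for this step; the RPM file names are placeholders for whichever JDK and Hadoop versions you have downloaded into the role's files/ directory:

```yaml
# roles/hadoop_cluster/tasks/client.yml -- minimal sketch with placeholder file names
- name: Copy the JDK installer to the node
  copy:
    src: jdk-8u171-linux-x64.rpm
    dest: /root/jdk-8u171-linux-x64.rpm

- name: Copy the Hadoop installer to the node
  copy:
    src: hadoop-1.2.1-1.x86_64.rpm
    dest: /root/hadoop-1.2.1-1.x86_64.rpm

- name: Install both packages from the copied RPMs
  yum:
    name:
      - /root/jdk-8u171-linux-x64.rpm
      - /root/hadoop-1.2.1-1.x86_64.rpm
    state: present
    disable_gpg_check: true
```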
Step 9: In the Master node role, we copy the master node configuration, i.e. core-site.xml and hdfs-site.xml, to the master node machine, for example:
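A rough sketch of the master-node tasks, assuming the two XML files are kept as Jinja2 templates in the role's templates/ directory, that Hadoop reads its configuration from /etc/hadoop, and that /nn is the chosen NameNode storage path:

```yaml
# roles/hadoop_cluster/tasks/master.yml -- minimal sketch; paths are assumptions
- name: Create the NameNode storage directory
  file:
    path: /nn
    state: directory

- name: Push the master copies of core-site.xml and hdfs-site.xml
  template:
    src: "{{ item }}.j2"
    dest: "/etc/hadoop/{{ item }}"
  loop:
    - core-site.xml
    - hdfs-site.xml
```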
Step 10: In the Slave node role, we copy the slave node configuration, i.e. core-site.xml and hdfs-site.xml, to the slave node machines.
Step 11: Now we have to run the following command:
On Name Node - “hadoop-daemon.sh start namenode”
On Data Node - "hadoop-daemon.sh start datanode"
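These daemon commands can also be driven from the playbook itself rather than run by hand; a rough sketch, assuming hadoop-daemon.sh is on the PATH after installation:

```yaml
# Illustrative tasks for starting the daemons from the playbook
- name: Start the NameNode daemon on the master
  command: hadoop-daemon.sh start namenode
  when: "'namenode' in group_names"

- name: Start the DataNode daemon on the slaves
  command: hadoop-daemon.sh start datanode
  when: "'datanode' in group_names"
```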
Step 12: On the client machine, check the Hadoop setup by running the following command:
"hadoop dfsadmin -report"
Step 13: To upload a file, use the command: "hadoop fs -put filename /"
Step 14: To read the uploaded file, use the command: "hadoop fs -cat /filename"
All the respective yml files are listed below:
NOTE: STRICT INDENTATION IS TO BE USED