ANSIBLE PLAYBOOK TO CONFIGURE A HADOOP CLUSTER
Today we live in a world where millions of users are online, consuming and creating enormous amounts of data: we are talking about quintillions of bytes. Companies like Facebook and Google manage petabytes of data every day, and for that they use technologies like Hadoop and Spark. But deploying these technologies is a big problem in itself, because we have to set up a large number of systems and then configure the software on each of them.
That's where automation comes into play: we can use automation tools like Ansible. Here I am going to show how to deploy a Hadoop cluster using Ansible.
I am going to use two systems: one master node and one slave node.
We retrieve their IP addresses and put them in our configuration file (the Ansible inventory), so that Ansible knows which hosts to manage.
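A minimal sketch of what such an inventory could look like, written in Ansible's YAML inventory format. The file name, the group names `namenode` and `datanode`, and both IP addresses are placeholders I chose for illustration, not values from the original setup.

```yaml
# inventory.yml (hypothetical file name)
# The IPs below are documentation placeholders; replace them with
# the addresses retrieved from your own master and slave systems.
all:
  children:
    namenode:            # assumed group name for the master node
      hosts:
        192.0.2.10:
    datanode:            # assumed group name for the slave node
      hosts:
        192.0.2.11:
```

If the inventory is not already set in your Ansible configuration, it can be passed on the command line with the -i option, for example: ansible-playbook -i inventory.yml file_name.yml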
Now we write an Ansible playbook to configure the master node and the slave node. In Hadoop terms, we call them the name node and the data node.
Some tasks need to be done on both systems, so we write them first.
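As an illustration, the common part of the playbook might look roughly like this. It is only a sketch under several assumptions: a RHEL-style system where the `java-1.8.0-openjdk` package is available, a Hadoop tarball already present on the machine running Ansible, and the hypothetical paths `/tmp/hadoop.tar.gz` and `/opt/hadoop`.

```yaml
- name: Common setup for the name node and the data node
  hosts: all
  become: true
  vars:
    hadoop_tarball: /tmp/hadoop.tar.gz   # hypothetical: tarball on the control machine
    hadoop_home: /opt/hadoop             # hypothetical install directory
  tasks:
    - name: Install a Java runtime (package name varies by distribution)
      ansible.builtin.package:
        name: java-1.8.0-openjdk
        state: present

    - name: Create the Hadoop install directory
      ansible.builtin.file:
        path: "{{ hadoop_home }}"
        state: directory
        mode: "0755"

    - name: Copy and unpack Hadoop on the managed node
      ansible.builtin.unarchive:
        src: "{{ hadoop_tarball }}"
        dest: "{{ hadoop_home }}"
        # Drop the top-level folder inside the tarball so files land in hadoop_home
        extra_opts: ["--strip-components=1"]
        # Skip this task when Hadoop is already unpacked
        creates: "{{ hadoop_home }}/bin/hadoop"
```

Because this play targets `all`, it runs on both systems before the node-specific plays.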
Then we write the tasks to be run on the name node.
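A possible shape for the name node play is sketched below. It assumes Hadoop 3.x property names and commands, a group called `namenode` in the inventory, and the hypothetical values `/opt/hadoop`, `/nn` and port 9000. Depending on your installation, the Hadoop commands may also need environment settings such as JAVA_HOME.

```yaml
- name: Configure the name node
  hosts: namenode
  become: true
  vars:
    hadoop_home: /opt/hadoop                  # hypothetical install directory
    hadoop_conf_dir: /opt/hadoop/etc/hadoop   # assumed Hadoop 3.x config location
    namenode_dir: /nn                         # hypothetical metadata directory
    namenode_port: 9000                       # assumed port
  tasks:
    - name: Create the directory that stores HDFS metadata
      ansible.builtin.file:
        path: "{{ namenode_dir }}"
        state: directory
        mode: "0755"

    - name: Write hdfs-site.xml
      ansible.builtin.copy:
        dest: "{{ hadoop_conf_dir }}/hdfs-site.xml"
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>dfs.namenode.name.dir</name>
              <value>{{ namenode_dir }}</value>
            </property>
          </configuration>

    - name: Write core-site.xml
      ansible.builtin.copy:
        dest: "{{ hadoop_conf_dir }}/core-site.xml"
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>fs.defaultFS</name>
              <value>hdfs://{{ inventory_hostname }}:{{ namenode_port }}</value>
            </property>
          </configuration>

    - name: Format the name node (only when it has not been formatted yet)
      ansible.builtin.command:
        cmd: "{{ hadoop_home }}/bin/hdfs namenode -format -nonInteractive"
        creates: "{{ namenode_dir }}/current/VERSION"

    # Not idempotent: this step may report an error if the daemon is already running
    - name: Start the name node daemon
      ansible.builtin.command:
        cmd: "{{ hadoop_home }}/bin/hdfs --daemon start namenode"
```

The `creates` argument keeps the format step from running again on later playbook runs, which matters because reformatting would wipe the existing HDFS metadata.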
Then we write the tasks to be run on the data node.
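The data node play could follow the same pattern. Again this is a sketch with assumptions: Hadoop 3.x property names and commands, inventory groups named `namenode` and `datanode`, and the hypothetical values `/opt/hadoop`, `/dn` and port 9000. The data node has to point at the name node's address, which is looked up here from the inventory.

```yaml
- name: Configure the data node
  hosts: datanode
  become: true
  vars:
    hadoop_home: /opt/hadoop                  # hypothetical install directory
    hadoop_conf_dir: /opt/hadoop/etc/hadoop   # assumed Hadoop 3.x config location
    datanode_dir: /dn                         # hypothetical block storage directory
    namenode_port: 9000                       # assumed port, must match the name node
    # First host of the assumed "namenode" inventory group
    namenode_ip: "{{ groups['namenode'][0] }}"
  tasks:
    - name: Create the directory that stores HDFS blocks
      ansible.builtin.file:
        path: "{{ datanode_dir }}"
        state: directory
        mode: "0755"

    - name: Write hdfs-site.xml
      ansible.builtin.copy:
        dest: "{{ hadoop_conf_dir }}/hdfs-site.xml"
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>dfs.datanode.data.dir</name>
              <value>{{ datanode_dir }}</value>
            </property>
          </configuration>

    - name: Write core-site.xml pointing at the name node
      ansible.builtin.copy:
        dest: "{{ hadoop_conf_dir }}/core-site.xml"
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>fs.defaultFS</name>
              <value>hdfs://{{ namenode_ip }}:{{ namenode_port }}</value>
            </property>
          </configuration>

    # Not idempotent: this step may report an error if the daemon is already running
    - name: Start the data node daemon
      ansible.builtin.command:
        cmd: "{{ hadoop_home }}/bin/hdfs --daemon start datanode"
```

Adding more data nodes later would then only require adding their IPs to the `datanode` group in the inventory.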
Now we run this playbook from our own system. To run it, we use:
ansible-playbook file_name.yml
Now we can see that Ansible has completed all the tasks on both systems. Our Hadoop cluster is ready.