Kerberos Authentication in a Hadoop Cluster



What is Kerberos?

Kerberos is a network authentication protocol designed to provide strong authentication for client/server applications. Hadoop supports two types of authentication:

1. Simple
2. Kerberos

Simple is the default authentication mode in Hadoop; to use Kerberos, you need to set it up across your Hadoop cluster.

Setting up Kerberos

To set up Kerberos on a cluster, one node acts as the KDC (Key Distribution Centre) server and the other nodes act as clients or workstations. Kerberos is sensitive to misconfiguration in some areas, so there are a few prerequisites to take care of before installing it on the nodes.

1. Install NTP: In a Kerberos client/server setup, all servers need synchronized clocks (typically within five minutes of each other), because Kerberos is a time-sensitive protocol whose authentication is based partly on the timestamps of the tickets. Run the following commands to install NTP on a CentOS machine.


yum -y install ntp

ntpdate 0.rhel.pool.ntp.org

service ntpd start


2. DNS Lookup: Configure DNS correctly and test that forward and reverse DNS lookups work. Use the commands below.

yum install bind-utils

For Forward Lookup:

nslookup <fqdn>

For Reverse Lookup:

nslookup <public IP address>

KDC Installation:

The following packages are required for the KDC setup:

1. krb5-server : the KDC server itself
2. krb5-libs : shared Kerberos libraries
3. krb5-workstation : client utilities (kinit, klist, kadmin)

Run the following command to install these packages.

yum -y install krb5-server krb5-libs krb5-workstation


Configuring the KDC.

This setup requires one server to act as the Kerberos KDC server, while all other nodes in the cluster act as clients; in a single-node setup, the KDC server can also act as a client. Let's assume a two-server setup: one server acts as the KDC server and the other is the client.

Let's assume that our domain name is random.com and that the fully qualified hostname of our KDC server is:

kerberos.random.com

There are a few files that need to be changed to match your configuration. The KDC's own files generally live under "/var/kerberos/krb5kdc/".

1. krb5.conf - Located at "/etc/krb5.conf". Ensure that your default realm is set to your domain name in upper case.

[logging]

 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]

 default_realm = RANDOM.COM
 dns_lookup_realm = false
 dns_lookup_kdc = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true

[realms]

 RANDOM.COM = {
  kdc = kerberos.random.com
  admin_server = kerberos.random.com
 }

[domain_realm]

 .random.com = RANDOM.COM
 random.com = RANDOM.COM 


Here, in the [realms] section, we set kdc to the hostname of the KDC server.
The first entry in [domain_realm] maps all hosts under the domain .random.com into the RANDOM.COM realm, but not a host named simply random.com; that host is matched by the second entry. Remember that krb5.conf is also the client-side configuration, so it must be present on all client nodes.

2. kdc.conf - Located at "/var/kerberos/krb5kdc/kdc.conf". Change the realm name here. (The default realm itself is set in krb5.conf's [libdefaults] section, as shown above.)

[kdcdefaults] 

 kdc_ports = 88
 kdc_tcp_ports = 88

[realms]

 RANDOM.COM = {
  #master_key_type = aes256-cts
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
  supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
 }

 3. kadm5.acl - Located at "/var/kerberos/krb5kdc/kadm5.acl". Change the realm here as well.

 */admin@RANDOM.COM     *

 This line grants full administrative privileges (the trailing * ) to any principal in the RANDOM.COM realm with an admin instance. For more examples, see https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/kadm5_acl.html.


 The KDC server is now configured. Let's create the database that will hold all the principals. The command below creates the database for your realm; it will ask for a master password, which it stashes so you don't have to enter it again in the future.


kdb5_util create -r RANDOM.COM -s


Now, on the KDC server, create an admin principal with the following command.

kadmin.local




This opens the kadmin.local prompt.


kadmin.local:  addprinc root/admin

kadmin.local:  ktadd -kt /var/kerberos/krb5kdc/kadm5.keytab kadmin/admin

kadmin.local:  ktadd -kt /var/kerberos/krb5kdc/kadm5.keytab kadmin/changepw

kadmin.local:  exit


"addprinc root/admin" will add a principal with root permissions, 

"ktadd -kt <keytab> <principal>" will add the principal to the keytab file which is used by the client. After that just start the Kerberos with the following commands.


service krb5kdc start

service kadmin start

Now that our KDC server is up and running, let's set up our clients, which in our case are the nodes of a Hadoop cluster.


Note that when configuring Hadoop in secure mode, each user and service must be authenticated by Kerberos in order to use Hadoop services.

It is recommended that each Hadoop service run as a different Unix user; e.g., for HDFS and YARN we can have hdfs and yarn as users.
To run Hadoop service daemons in secure mode, Kerberos principals are required. Each service reads its authentication credentials from a keytab file readable only by that particular service user.

Configuring HDFS:

For HDFS we will create a keytab file for each of the NameNode, Secondary NameNode, and DataNode. To create a keytab, follow these steps.

1. Connect to the KDC server and add principals for the respective service, e.g., for the NameNode:

kadmin -p root/admin

This command connects you to the KDC server, asks for the password (root/admin is the principal we added above), and opens the kadmin prompt.

2. Add a principal nn/full.qualified.domain.name@RANDOM.COM to the KDC database with the step below; also add a principal for the host.

addprinc -randkey nn/full.qualified.domain.name@RANDOM.COM

addprinc -randkey host/full.qualified.domain.name@RANDOM.COM

Note: -randkey generates a random password for the principal automatically.

3. Create a keytab file for the NameNode.


ktadd -kt /etc/security/keytab/nn.service.keytab  nn/full.qualified.domain.name@RANDOM.COM

ktadd -kt /etc/security/keytab/nn.service.keytab  host/full.qualified.domain.name@RANDOM.COM

ktadd -kt /etc/security/keytab/nn.service.keytab  HTTP/full.qualified.domain.name@RANDOM.COM

4. You can verify the keytab's contents with the command below.


klist -e -k -t /etc/security/keytab/nn.service.keytab

Restrict the keytab file's permissions so that only the service user can read it (in this case hdfs), e.g., with chown and chmod 400.

Repeat the same steps for the Secondary NameNode and DataNode.

Configure hdfs-site.xml for Kerberos.

Namenode Properties:
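A typical set of NameNode properties for hdfs-site.xml, based on the standard Hadoop secure-mode property names and the keytab path created above (values are illustrative; adjust the realm and paths to your setup):

```xml
<!-- hdfs-site.xml: NameNode Kerberos settings (illustrative values) -->
<property>
  <name>dfs.block.access.token.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.keytab.file</name>
  <value>/etc/security/keytab/nn.service.keytab</value>
</property>
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>nn/_HOST@RANDOM.COM</value>
</property>
<property>
  <name>dfs.namenode.kerberos.internal.spnego.principal</name>
  <value>HTTP/_HOST@RANDOM.COM</value>
</property>
```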



Note: Hadoop will automatically infer _HOST from the server. 

Journalnode Properties:
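An illustrative set of JournalNode properties for hdfs-site.xml, using the standard secure-mode property names (the jn principal and keytab path are assumptions following the naming pattern above):

```xml
<!-- hdfs-site.xml: JournalNode Kerberos settings (illustrative values) -->
<property>
  <name>dfs.journalnode.keytab.file</name>
  <value>/etc/security/keytab/jn.service.keytab</value>
</property>
<property>
  <name>dfs.journalnode.kerberos.principal</name>
  <value>jn/_HOST@RANDOM.COM</value>
</property>
<property>
  <name>dfs.journalnode.kerberos.internal.spnego.principal</name>
  <value>HTTP/_HOST@RANDOM.COM</value>
</property>
```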



Datanode Properties:
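An illustrative set of DataNode properties for hdfs-site.xml, following the standard secure-mode property names (the dn principal and keytab path are assumptions matching the pattern above). Note that a secure DataNode traditionally binds to privileged ports (below 1024), which requires starting it as root via jsvc; newer Hadoop versions can instead use SASL (dfs.data.transfer.protection) to avoid privileged ports.

```xml
<!-- hdfs-site.xml: DataNode Kerberos settings (illustrative values) -->
<property>
  <name>dfs.datanode.data.dir.perm</name>
  <value>700</value>
</property>
<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:1004</value>
</property>
<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:1006</value>
</property>
<property>
  <name>dfs.datanode.keytab.file</name>
  <value>/etc/security/keytab/dn.service.keytab</value>
</property>
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>dn/_HOST@RANDOM.COM</value>
</property>
```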



YARN (ResourceManager) Properties:
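An illustrative set of YARN properties for yarn-site.xml, using the standard secure-mode property names (the rm and nm principals and keytab paths are assumptions following the same naming pattern):

```xml
<!-- yarn-site.xml: ResourceManager and NodeManager Kerberos settings (illustrative values) -->
<property>
  <name>yarn.resourcemanager.keytab</name>
  <value>/etc/security/keytab/rm.service.keytab</value>
</property>
<property>
  <name>yarn.resourcemanager.principal</name>
  <value>rm/_HOST@RANDOM.COM</value>
</property>
<property>
  <name>yarn.nodemanager.keytab</name>
  <value>/etc/security/keytab/nm.service.keytab</value>
</property>
<property>
  <name>yarn.nodemanager.principal</name>
  <value>nm/_HOST@RANDOM.COM</value>
</property>
```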


In a principal name such as "nn/_HOST@REALM.TLD", nn represents the service. We need to map this Kerberos name to our hdfs Unix user (and likewise for the other services); this can be done with the hadoop.security.auth_to_local property in core-site.xml.
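An illustrative auth_to_local mapping for core-site.xml, assuming the service principals used in this guide (nn, dn, rm, nm; the sn rule for the Secondary NameNode is an assumption) and the RANDOM.COM realm:

```xml
<!-- core-site.xml: map Kerberos principals to local Unix users (illustrative rules) -->
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[2:$1@$0](nn/.*@RANDOM\.COM)s/.*/hdfs/
    RULE:[2:$1@$0](sn/.*@RANDOM\.COM)s/.*/hdfs/
    RULE:[2:$1@$0](dn/.*@RANDOM\.COM)s/.*/hdfs/
    RULE:[2:$1@$0](rm/.*@RANDOM\.COM)s/.*/yarn/
    RULE:[2:$1@$0](nm/.*@RANDOM\.COM)s/.*/yarn/
    DEFAULT
  </value>
</property>
```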



Similarly, add a mapping rule for every principal associated with a particular service user.


Configure core-site.xml for Kerberos.
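The two core properties that switch Hadoop into secure mode, per the standard Hadoop secure-mode configuration:

```xml
<!-- core-site.xml: enable Kerberos authentication and service-level authorization -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```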


To verify everything is working, obtain a ticket first (kinit as a mapped user), then enter the following command, which should list the contents of the HDFS root directory.

hadoop fs -ls /

That's all, folks! Please share your views in the comments below.
