This case study is built on a modified [example Kite SDK web application](https://github.com/kite-sdk/kite-spring-hbase-example) that uses Spring MVC and HBase. This application is a web caching app that can be used to fetch web pages and store their content in a local HBase cluster. The cached web page can be viewed, and metadata about that page, like size, time to fetch, and outlinks can be queried.

Download Tomcat:

    wget http://apache.mesi.com.ar/tomcat/tomcat-7/v7.0.57/bin/apache-tomcat-7.0.57.tar.gz
    tar -zxf apache-tomcat-7.0.57.tar.gz

Download the source code for this project by cloning the Git repository:

    git clone https://github.com/hadoop-security/kite-spring-hbase-example.git
    cd kite-spring-hbase-example

Download the Kite SDK CLI tool:

    wget http://central.maven.org/maven2/org/kitesdk/kite-tools-cdh5/0.17.1/kite-tools-cdh5-0.17.1.tar.gz
    tar -zxf kite-tools-cdh5-0.17.1.tar.gz

Before building and deploying the application you need to configure your HBase cluster with the necessary security settings. In particular, you need to:

- Enable Kerberos authentication for Hadoop (HDFS and YARN)
- Enable Kerberos authentication for HBase
- Enable HBase authorization
- Create a Kerberos principal to perform HBase admin functions:

      kadmin
      addprinc hbase
      quit

- Create a Kerberos principal for the application and export it to a keytab:

      kadmin
      addprinc web-page-snapshots
      ktadd -k app.keytab web-page-snapshots
      quit

- Place the app.keytab file in the home directory of the user that will run the app.
- Grant the application principal create table permissions:

      kinit hbase
      hbase shell
      grant 'web-page-snapshots', 'RWXCA'
      quit

- Create the HBase datasets:

      kinit -kt ~/app.keytab web-page-snapshots
      export KITE_USER_CLASSPATH=$HBASE_CONF_DIR
      kite-tools-cdh5-0.17.1/bin/kite-dataset create dataset:hbase:<ZK HOSTS>:<ZK PORT>/webpagesnapshots.WebPageSnapshotModel -s src/main/avro/hbase-models/WebPageSnapshotModel.avsc
      kite-tools-cdh5-0.17.1/bin/kite-dataset create dataset:hbase:<ZK HOSTS>:<ZK PORT>/webpageredirects.WebPageRedirectModel -s src/main/avro/hbase-models/WebPageRedirectModel.avsc

  Replace `<ZK HOSTS>` with the comma-separated list of ZooKeeper server hostnames and `<ZK PORT>` with the ZooKeeper port (typically 2181). Make sure `HBASE_CONF_DIR` is set to the location of your HBase client configuration files.

- Grant Alice and Bob access to the public tables/columns:

      hbase shell
      grant 'alice', 'RW', 'webpagesnapshots', 'content', 'public'
      grant 'alice', 'RW', 'webpagesnapshots', '_s'
      grant 'alice', 'RW', 'webpagesnapshots', 'meta'
      grant 'alice', 'RW', 'webpagesnapshots', 'observable'
      grant 'alice', 'RW', 'webpageredirects'
      grant 'alice', 'RW', 'managed_schemas'
      grant 'bob', 'RW', 'webpagesnapshots', 'content', 'public'
      grant 'bob', 'RW', 'webpagesnapshots', '_s'
      grant 'bob', 'RW', 'webpagesnapshots', 'meta'
      grant 'bob', 'RW', 'webpagesnapshots', 'observable'
      grant 'bob', 'RW', 'webpageredirects'
      grant 'bob', 'RW', 'managed_schemas'
      quit

- Grant Alice and Bob access to their private columns:

      hbase shell
      grant 'alice', 'RW', 'webpagesnapshots', 'content', 'alice'
      grant 'bob', 'RW', 'webpagesnapshots', 'content', 'bob'
      quit

- Edit `~/apache-tomcat-7.0.57/conf/tomcat-users.xml` and add the following before the `</tomcat-users>` closing tag:

      <user name="alice" password="secret" roles="user" />
      <user name="bob" password="secret" roles="user" />

- Create `~/apache-tomcat-7.0.57/bin/setenv.sh` with the following content:

      #!/bin/bash
      export HADOOP_CONF_DIR=/etc/hadoop/conf
      export HBASE_CONF_DIR=/etc/hbase/conf
      export CLASSPATH=${HADOOP_CONF_DIR}:${HBASE_CONF_DIR}

  Replace `/etc/hadoop/conf` and `/etc/hbase/conf` with your Hadoop and HBase configuration directories if you use another location.

- Edit `src/main/resources/hbase-prod.properties` and set the following values:

      hbase.zk.host=<ZK HOSTS>
      hbase.zk.port=<ZK PORT>
      application.kerberos.principal=web-page-snapshots
      application.kerberos.keytab=/home/<USER>/app.keytab

  Replace `<ZK HOSTS>` with the comma-separated list of ZooKeeper server hostnames and `<ZK PORT>` with the ZooKeeper port (typically 2181). Also replace `<USER>` with the username that the app will run as.

- Add the following parameters to `hbase-site.xml` on all of the HBase nodes to enable user impersonation by the `web-page-snapshots` principal:

      <property>
        <name>hadoop.proxyuser.web-page-snapshots.groups</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.web-page-snapshots.hosts</name>
        <value>*</value>
      </property>

- Restart HBase

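
The per-user grants in the steps above follow the same pattern for Alice and Bob, so for a longer list of users they could be generated rather than typed by hand. The following is a minimal sketch of that idea, not part of the example project: `print_user_grants` is a hypothetical helper name, and the table, column family, and qualifier names are copied from the grant statements above. It only prints text and does not contact HBase.

```shell
#!/bin/bash
# Hypothetical helper: prints the HBase shell grant statements that the
# setup steps list by hand, for a single user name given as $1.
print_user_grants() {
  local user="$1"
  local family
  # Public content column, shared by all users.
  echo "grant '${user}', 'RW', 'webpagesnapshots', 'content', 'public'"
  # Column families that every user needs read/write access to.
  for family in _s meta observable; do
    echo "grant '${user}', 'RW', 'webpagesnapshots', '${family}'"
  done
  # Whole-table grants.
  echo "grant '${user}', 'RW', 'webpageredirects'"
  echo "grant '${user}', 'RW', 'managed_schemas'"
  # Private content column, named after the user.
  echo "grant '${user}', 'RW', 'webpagesnapshots', 'content', '${user}'"
}
```

One way to use it, assuming you have already run `kinit hbase`, is to pipe the output into the HBase shell: `{ print_user_grants alice; print_user_grants bob; } | hbase shell`. Review the generated statements before applying them to a real cluster.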
There are two build profiles in the application: dev and prod.
The default build profile is dev, and in that mode, it will be built so that an in-process HBase cluster is launched and configured on startup. That cluster will re-use the same data directory across restarts, so data remains persistent. This enables us to quickly build a web application on this framework without having to install Hadoop and HBase for dev purposes.
The prod build profile will construct a WAR that won't launch an in-process HBase cluster on startup. Before deployment, you must modify the configuration file src/main/resources/hbase-prod.properties with the appropriate properties.
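
Because a production build with leftover placeholders would only fail once the app is deployed, it may be worth checking the properties file before building. The sketch below is one possible pre-build check and is not part of the example project: `check_prod_properties` is a hypothetical name, the four keys are the ones listed in the setup steps, and it assumes plain `key=value` lines with no spaces around the `=`.

```shell
#!/bin/bash
# Hypothetical pre-build check: verifies that a properties file defines the
# four settings from the setup steps and contains no <PLACEHOLDER> text.
check_prod_properties() {
  local file="$1" key status=0
  for key in hbase.zk.host hbase.zk.port \
             application.kerberos.principal application.kerberos.keytab; do
    # Look for an exact "key=value" line whose value is not empty.
    if ! awk -F= -v k="$key" \
        '$1 == k && $2 != "" { found = 1 } END { exit !found }' "$file"; then
      echo "missing or empty: ${key}" >&2
      status=1
    fi
  done
  # Placeholders such as <ZK HOSTS> or <USER> must have been replaced.
  if grep -Eq '<[A-Z][A-Z ]*>' "$file"; then
    echo "unreplaced placeholder in ${file}" >&2
    status=1
  fi
  return "$status"
}
```

It could be chained in front of the build, for example `check_prod_properties src/main/resources/hbase-prod.properties && mvn -Pprod clean install`, so that Maven only runs when the check passes.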
You can build the production WAR file using the following command:

    mvn -Pprod clean install

Deploy the built WAR file (webapps/ROOT.war) to your production Tomcat instance.

Go to the appropriate URL, for example:

    http://app.example.com:8080/home

Once there, you can take snapshots and view older snapshots of web pages. By default, the contents of a snapshot are private and only visible to the user who made the snapshot. You can check the "Make snapshot public?" checkbox before taking a snapshot if you want a public snapshot.
When viewing a snapshot, you can always see the metadata. If you don't have permissions to see the content, you'll see the message "Insufficient privilleges to view snapshot" in the content pane of the page.
To run locally in dev mode, run the following Maven command, which launches an in-process Tomcat to run the app. The memory settings are fairly high because the process runs not only Tomcat but also an HDFS and HBase cluster:

    env MAVEN_OPTS="-Xmx2048m -XX:MaxPermSize=256m" mvn clean install tomcat7:run

Once launched, you can view the web application in your browser by going to the appropriate URL.

Once there, you can take snapshots, and view older snapshots of web pages.
Note: HBase security is not enabled when running in development mode. You need to deploy to a cluster to test the security features.