This repo contains Azure management scripts. The scripts here are the "experimental" versions; the final ones can be found in the Iaas Applications repo (e.g. the final Apache Ignite script is pushed there).
This script installs Apache Ignite on an HDInsight cluster, regardless of how many nodes your HDInsight cluster has.

The script is designed to run as a ScriptAction AFTER the cluster is provisioned, as it needs information about the head & worker nodes.
The following snippet shows how to pass the arguments for a cluster named myHDICluster with 2 head nodes and 2 worker nodes:

```
./install-apache-ignite.sh wasb://mycontainer@myblob.blob.core.windows.net admin AmbariPwd_01 100.8.17.254 myHDICluster adminssh 10.0.0.1 10.0.0.2 10.0.0.4 10.0.0.9
```

Running the script as a ScriptAction or manually is simple; all you need to do is submit the correct arguments, separated by a space:
- The WASB storage URL you want Apache Ignite to interface with
  - The URL should be as follows; you can find it in your HDFS `core-site` configuration:
    `wasb://container@account.blob.core.windows.net`
- The Ambari Admin username
- The Ambari Admin password
  - The Ambari Admin name & password are needed to automatically push Ignite's configuration into the HDFS `core-site.xml` via Ambari's `config.sh` command.
- The IP address of your namenode, where the Ambari server is running
  - This could be the IP address of headnode0 or headnode1
  - I haven't tested it with the node's FQDN, but you can try; the worst-case scenario is having to push the correct configuration again.
- The Ambari cluster name
  - This is the name you see at the top left after you log in to the Ambari web console
- The SSH username of your account
  - Why is this needed? Because the script needs to give you read/write/execute permission on the `$IGNITE_HOME/work` directory; otherwise the Ignite process will fail during startup.
- The IP addresses of ALL your head nodes & worker nodes, separated by a SPACE
  - Why is this needed? The script configures the Apache Ignite `default-config.xml` and enables cluster discovery
  - What is cluster discovery? Cluster discovery enables all of the Ignite processes running on your nodes to sync with each other
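For context, static cluster discovery in an Ignite `default-config.xml` is typically wired up with a TCP discovery SPI and a static IP finder along these lines. This is a sketch based on Ignite's standard Spring configuration, not the exact output of this script; the IPs shown are the example values from above:

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
  <property name="discoverySpi">
    <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
      <property name="ipFinder">
        <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
          <property name="addresses">
            <list>
              <!-- one entry per head & worker node IP passed to the script -->
              <value>10.0.0.1:47500..47509</value>
              <value>10.0.0.2:47500..47509</value>
              <value>10.0.0.4:47500..47509</value>
              <value>10.0.0.9:47500..47509</value>
            </list>
          </property>
        </bean>
      </property>
    </bean>
  </property>
</bean>
```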
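To make the argument order above concrete, here is a minimal sketch of how a script could consume these positional arguments. The variable and function names are illustrative assumptions, not the ones used by `install-apache-ignite.sh`:

```shell
# Illustrative argument parsing; names are my own, not the script's.
parse_args() {
  STORAGE_URL=$1    # wasb://container@account.blob.core.windows.net
  AMBARI_USER=$2    # Ambari Admin username
  AMBARI_PWD=$3     # Ambari Admin password
  AMBARI_HOST=$4    # IP address of the node running the Ambari server
  CLUSTER_NAME=$5   # Ambari cluster name
  SSH_USER=$6       # SSH username (for the $IGNITE_HOME/work permissions)
  shift 6
  NODE_IPS=("$@")   # all remaining arguments: head & worker node IPs
}

parse_args wasb://mycontainer@myblob.blob.core.windows.net admin AmbariPwd_01 \
  100.8.17.254 myHDICluster adminssh 10.0.0.1 10.0.0.2 10.0.0.4 10.0.0.9
echo "Cluster ${CLUSTER_NAME}: ${#NODE_IPS[@]} node IPs for discovery"
# prints: Cluster myHDICluster: 4 node IPs for discovery
```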
- Check that the Ignite process is running on your nodes, for example using:

  ```
  ps -aef | grep default-config.xml
  ```

- Check the Ambari HDFS configuration by searching for `igfs`
- Using HDFS commands:
  - Browse your blob storage: `hdfs dfs -ls wasb://container@account.blob.core.windows.net/HdiNotebooks`
  - Browse Ignite: `hdfs dfs -ls igfs:///HdiNotebooks`

  The above commands should return the same results
- Using Spark-Shell: open `spark-shell` and run an example as follows:
```scala
val textdata = sc.textFile("wasb://container@account.blob.core.windows.net/Folder/textFile.ext")
val count = textdata.count
val first = textdata.first
val dataWithoutHeader = textdata.filter(line => line != first)
val datacount = dataWithoutHeader.count

val igtextdata = sc.textFile("igfs:///Folder/textFile.ext")
val igcount = igtextdata.count
val igfirst = igtextdata.first
val igdataWithoutHeader = igtextdata.filter(line => line != igfirst)
val igdatacount = igdataWithoutHeader.count
```

If the above experiments work, then congratulations: Apache Ignite is acting as a secondary in-memory file system for your blob. You can start testing its performance against pulling directly from your blob storage.
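To automate the "same results" check between the `wasb://` and `igfs:///` listings, a small helper like the following can compare two listings regardless of ordering. The helper name is my own, not something from the repo:

```shell
# compare_listings: succeeds when two newline-separated listings contain
# the same entries, regardless of order.
compare_listings() {
  [ "$(printf '%s\n' "$1" | sort)" = "$(printf '%s\n' "$2" | sort)" ]
}
```

For example, feed it just the path column of each listing (field positions may vary by Hadoop version, so adjust the `awk` accordingly):
`compare_listings "$(hdfs dfs -ls igfs:///HdiNotebooks | awk '{print $NF}')" "$(hdfs dfs -ls wasb://container@account.blob.core.windows.net/HdiNotebooks | awk '{print $NF}')" && echo "IGFS mirrors blob storage"`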