From my home directory (type cd ~ to get to your home directory):

richard@ubuntu:~$ ls -l
drwxrwxr-x 6 richard richard 1024 Nov 20 15:25 cassandra

richard@ubuntu:~$ cd cassandra
richard@ubuntu:~/cassandra$ ls -l
drwxrwxr-x 9 richard richard 1024 Nov 20 15:04 dsc-cassandra-2.0.1

richard@ubuntu:~/cassandra$ cd dsc-cassandra-2.0.1
richard@ubuntu:~/cassandra/dsc-cassandra-2.0.1$ ls -l
drwxr-xr-x 2 richard richard 1024 Sep 24 14:14 bin

All Cassandra commands are in the bin directory, but you run them from the directory above as bin/whatever (where whatever is cassandra, cqlsh and so on). Typing the bare command name from inside bin fails with "command not found".

This is the output in the command window where I started Cassandra, which runs the server in the foreground (bin/cassandra -f):

INFO 10:47:40,179 Node localhost/127.0.0.1 state jump to normal
INFO 10:47:40,191 Startup completed! Now serving reads.
INFO 10:47:40,393 Starting listening for CQL clients on localhost/127.0.0.1:9042…
INFO 10:47:40,537 Using TFramedTransport with a max frame size of 15728640 bytes.
INFO 10:47:40,538 Binding thrift service to localhost/127.0.0.1:9160
INFO 10:47:40,550 Using synchronous/threadpool thrift server on localhost : 9160
INFO 10:47:40,550 Listening for thrift clients…

Now open a second command window by right-clicking the command window icon.

Don’t use the CLI (cassandra-cli); it has been superseded by cqlsh, as its own startup message says:

richard@ubuntu:~/cassandra/dsc-cassandra-2.0.1/bin$ cassandra-cli
cassandra-cli: command not found
richard@ubuntu:~/cassandra/dsc-cassandra-2.0.1/bin$ cd ..
richard@ubuntu:~/cassandra/dsc-cassandra-2.0.1$ bin/cassandra-cli
Connected to: “Test Cluster” on 127.0.0.1/9160
Welcome to Cassandra CLI version 2.0.1

Please consider using the more convenient cqlsh instead of CLI
CQL3 is fully backwards compatible with Thrift data; see http://www.datastax.com/dev/blog/thrift-to-cql3

Type ‘help;’ or ‘?’ for help.
Type ‘quit;’ or ‘exit;’ to quit.

So use cqlsh:

richard@ubuntu:~/cassandra/dsc-cassandra-2.0.1$ bin/cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 4.0.1 | Cassandra 2.0.1 | CQL spec 3.1.1 | Thrift protocol 19.37.0]
Use HELP for help.
cqlsh>

Here I will explain the classic word-counting example used to demonstrate how Hadoop works, in this case using the sixty-six books of the Protestant canon. Hadoop does something called “mapping,” which here means indexing each word. This is a big deal because mapping turns the unstructured text below into a structured index that Hadoop can then reduce.

It is time for me to speak of the books of the New Testament.
Receive only four evangelists:
Matthew, then Mark, to whom, having added Luke
As third, count John as fourth in time,
But first in height of teachings,
For I call this one rightly a son of thunder,
Sounding out most greatly with the word of God.

Today we will map using white space as the delimiter between words, dropping punctuation altogether and ignoring capitalization. Since Hadoop indexes by key/value pairs, each unique word is a key and its value is the number of occurrences. In the example, “it” occurs once and “of” occurs five times.
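The mapping step described above can be sketched in a few lines of plain Python. This is a single-machine illustration, not Hadoop code. The function name count_words is made up for this example, and folding everything to lowercase (so that “It” and “it” count as the same word) is an assumption.

```python
import string
from collections import Counter

PASSAGE = """It is time for me to speak of the books of the New Testament.
Receive only four evangelists:
Matthew, then Mark, to whom, having added Luke
As third, count John as fourth in time,
But first in height of teachings,
For I call this one rightly a son of thunder,
Sounding out most greatly with the word of God."""


def count_words(text):
    """Return a Counter mapping each unique word to its number of occurrences."""
    # Drop ASCII punctuation and fold to lowercase (assumption: case is ignored).
    cleaned = text.translate(str.maketrans("", "", string.punctuation)).lower()
    # split() with no arguments uses any run of white space as the delimiter.
    return Counter(cleaned.split())


counts = count_words(PASSAGE)
print(counts["it"], counts["of"])  # prints: 1 5
```

Each key in counts is a unique word and each value is its count, which is the key/value index the paragraph above describes.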

Stop to think about that for a minute. After mapping, instead of blah, blah, blah, there is now a neat, organized set of unique words, each with a count of the number of times it occurs.

Hadoop takes it one step further. Since there are sixty-six books, Hadoop can split the work into 66 tasks and parse all the books at the same time. Then Hadoop “reduces,” or combines, the 66 indexes into 1 automagically.
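To make the map-then-reduce idea concrete, here is a minimal sketch in plain Python. It is not the Hadoop API: the three tiny "books" are stand-in text invented for illustration, the names map_book and reduce_counts are hypothetical, and the map step runs one book after another here, whereas Hadoop would run those tasks in parallel.

```python
from collections import Counter
from functools import reduce

# Stand-in texts for illustration only; the real input would be the 66 books.
books = {
    "book_a": "in the beginning the word",
    "book_b": "the word of the prophet",
    "book_c": "of the beginning",
}


def map_book(text):
    """Map step: build a word -> count index for a single book."""
    return Counter(text.lower().split())


def reduce_counts(left, right):
    """Reduce step: merge two indexes by adding the counts of matching words."""
    return left + right


# One index per book (Hadoop would produce these at the same time).
per_book = [map_book(text) for text in books.values()]

# Combine the per-book indexes into a single index.
total = reduce(reduce_counts, per_book, Counter())
print(total["the"], total["word"], total["of"])  # prints: 5 2 2
```

Because adding two indexes gives the same result in any order, the per-book indexes can be built independently and merged afterwards, which is what lets the work be spread out.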

Now it is up to you to analyze this index of unique words and their number of occurrences to gain business insight.

Okay, let me simplify this. Say you have 6 billion tweets and you need to know who is trending, Justin Bieber or Philip Hoffman. Hadoop does exactly the same thing, but you only need to check which count is larger to find who is trending.
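The trending check above comes down to counting mentions and comparing two numbers. The sketch below shows that on a handful of made-up tweets; the sample tweets and the function name count_mentions are invented for illustration, and a real job would run the same counting over the full data set.

```python
from collections import Counter

# Made-up sample tweets, for illustration only.
tweets = [
    "Justin Bieber announces a new tour",
    "Remembering Philip Hoffman tonight",
    "Philip Hoffman was brilliant in that film",
    "Can't stop listening to Justin Bieber",
    "Philip Hoffman retrospective this weekend",
]
NAMES = ["Justin Bieber", "Philip Hoffman"]


def count_mentions(tweets, names):
    """Count how many tweets mention each name, ignoring case."""
    counts = Counter()
    for tweet in tweets:
        lowered = tweet.lower()
        for name in names:
            if name.lower() in lowered:
                counts[name] += 1
    return counts


mentions = count_mentions(tweets, NAMES)
# The name with the larger count is the one that is trending.
trending, _ = mentions.most_common(1)[0]
print(trending)  # prints: Philip Hoffman
```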

Now that you have a thorough understanding of the functioning of Hadoop, how would you use it to gain business insight?