Posts

Twitter sentiment analysis in Hadoop using Apache Pig

We have seen Twitter analysis using Hive in many places; here I am going to present my way of analyzing tweet sentiments using Apache Pig.

What do we want to do? We want to analyse tweets to check whether they contain positive or negative emotions. A tweet reflects a person's emotions at the moment he or she posted it, like "got this Job done...Hurrahh.." or "xyz Movie sucks!!!! worst movie I ever saw....".

What do we need? We need to fetch tweets from Twitter into HDFS so that we can do our analysis using the Hadoop ecosystem (Apache Pig here).

How will we do it? We will use Apache Flume to fetch tweets from Twitter into HDFS; the Flume version I am using here is apache-flume-1.4.0. Then we will do some text analysis on the tweets posted by Twitter users to check whether they contain positive or negative emotions. We will use Apache Pig for this purpose (the version I have used is apache-pig-0.11.0), and we will write ...
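The word-level scoring described above can be sketched in Pig Latin roughly as follows. This is only an illustration, not the post's actual script: the file names `tweets.txt` and `dictionary.txt` are placeholders I have assumed, with one tweet per line and tab-separated word/rating pairs respectively.

```
-- Hedged sketch: score each tweet by summing per-word sentiment ratings.
-- 'tweets.txt' and 'dictionary.txt' are assumed placeholder inputs.
tweets = LOAD 'tweets.txt' AS (text:chararray);

-- split each tweet into lowercase words, keeping the original tweet text
words  = FOREACH tweets GENERATE text, FLATTEN(TOKENIZE(LOWER(text))) AS word;

-- dictionary of word -> rating, e.g. "sucks<TAB>-5", "hurrah<TAB>4"
dict   = LOAD 'dictionary.txt' AS (word:chararray, rating:int);

-- left outer join so words missing from the dictionary are kept with a null rating
rated  = JOIN words BY word LEFT OUTER, dict BY word;

-- SUM ignores nulls, so unrated words contribute nothing to the score
scored = FOREACH (GROUP rated BY words::text)
         GENERATE group AS tweet, SUM(rated.dict::rating) AS sentiment;

DUMP scored;
```

A tweet whose summed rating comes out positive would be classed as carrying positive emotion, and negative likewise.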
Recent posts

Apache Flume to fetch Twitter data

We are using Apache Flume to fetch Twitter data and store it into HDFS. So let's get started; the Flume version I am using here is apache-flume-1.4.0.

Download Apache Flume - on your unix terminal, type this command:
wget http://apache.mirrors.hoobly.com/flume/1.4.0/apache-flume-1.4.0-bin.tar.gz

Create the directory "flume-ng" - create a directory in your /usr/lib folder with this command:
sudo mkdir /usr/lib/flume-ng

Now copy the Flume tar file you downloaded into the /usr/lib/flume-ng directory you just created. The command is:
sudo cp -r apache-flume-1.4.0-bin.tar.gz /usr/lib/flume-ng/

Check that your tar file was copied to the flume-ng directory:
ls /usr/lib/flume-ng/

Untar the tar file in the flume-ng directory. First change your directory from your home directory to /usr/lib/flume-ng/:
cd /usr/lib/flume-ng/
and now untar the file with:
sudo tar -xvf /usr/lib/flume-ng/apache-flume-1.4.0-bin.tar.gz
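Once Flume is unpacked, fetching tweets requires a Flume agent configuration wiring a Twitter source to an HDFS sink. The sketch below is an assumed example, not the post's actual config: the agent name `TwitterAgent`, the HDFS path, and the credential placeholders are all mine, and you would need keys from your own Twitter developer app.

```
# Hedged sketch of a flume.conf for Twitter ingestion (names are assumptions).
TwitterAgent.sources  = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks    = HDFS

# Twitter source bundled with Flume 1.4; fill in your own app credentials
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <your-consumer-key>
TwitterAgent.sources.Twitter.consumerSecret = <your-consumer-secret>
TwitterAgent.sources.Twitter.accessToken = <your-access-token>
TwitterAgent.sources.Twitter.accessTokenSecret = <your-access-token-secret>

# HDFS sink; the path is a placeholder for wherever you want tweets to land
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/flume/tweets
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream

# simple in-memory channel between source and sink
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
```

The agent would then be started with something like bin/flume-ng agent -n TwitterAgent -f flume.conf from the Flume directory.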

Apache Hadoop pseudo-distributed cluster on an Ubuntu virtual machine

Hi, here I am going to show you how to set up a pseudo-distributed (single node) Hadoop cluster on an Ubuntu VM.

Prerequisites - an understanding of Hadoop, VMware Player, and Ubuntu.

Things you need - VMware Player: a simple Google search will take you to the VMware web site, which gives you info and a download link for the latest version of the player; download and install it. An Ubuntu VM image: again, Google will help you here (the version I used is ubuntu-14.04). And of course a laptop :-).

Setting up the VM - after installing VMware Player and extracting Ubuntu to a directory of your choice, double-click the VMware icon on your desktop, click on "Open a Virtual Machine", and go to the directory where you extracted Ubuntu; you will find a ubuntu.vmx file there, so double-click on it and then play the VM. (You can edit the VM's settings later if you want to.)

Updating Ubuntu - go to the terminal and give this command:
sudo apt-get update