Hadoop Streaming:
Hadoop streaming is a utility that comes with the Hadoop distribution. This utility allows you to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer.
Example Using Python:
For Hadoop streaming, we consider the word-count problem. Any job in Hadoop must have two phases: mapper and reducer. We have written the mapper and the reducer as Python scripts so that they can run under Hadoop. The same could also be written in Perl or Ruby.
Mapper Phase Code
#!/usr/bin/python
import sys

# Input comes from standard input
for myline in sys.stdin:
    # Remove leading and trailing whitespace
    myline = myline.strip()
    # Break the line into words
    words = myline.split()
    # Iterate over the list of words
    for myword in words:
        # Write "word<TAB>1" to standard output
        print('%s\t%s' % (myword, 1))

Make sure this file has execute permission (chmod +x /home/master/hadoop-1.2.1/mapper.py).
Reducer Phase Code
#!/usr/bin/python
import sys

current_word = ""
current_count = 0

# Input comes from standard input
for myline in sys.stdin:
    # Remove leading and trailing whitespace
    myline = myline.strip()
    try:
        # Split the input we got from mapper.py and convert the count to an integer
        word, count = myline.split('\t', 1)
        count = int(count)
    except ValueError:
        # The line had no tab or the count was not a number, so silently ignore it
        continue
    # The reducer input is sorted by key, so all lines for one word arrive together
    if current_word == word:
        current_count += count
    else:
        if current_word:
            # Write the finished word's total to standard output
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# Do not forget to output the last word, if there was one
if current_word:
    print('%s\t%s' % (current_word, current_count))

Save the mapper and reducer code in mapper.py and reducer.py in the Hadoop home directory. Make sure these files have execute permission (chmod +x mapper.py and chmod +x reducer.py). Python is sensitive to indentation, so keep the indentation exactly as shown when copying the code.
How Streaming Works:
When a script is specified for mappers, each mapper task launches the script as a separate process when the mapper is initialized. As the mapper task runs, it converts its inputs into lines and feeds the lines to the standard input (STDIN) of the process. In the meantime, the mapper collects the line-oriented outputs from the standard output (STDOUT) of the process and converts each line into a key/value pair, which is collected as the output of the mapper. By default, the prefix of a line up to the first tab character is the key and the rest of the line (excluding the tab character) is the value. If there is no tab character in the line, then the entire line is considered the key and the value is null. However, this can be customized as needed.
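The default key/value rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not Hadoop's own code: the function name split_key_value and its separator parameter are hypothetical, and None stands in for the null value.

```python
def split_key_value(line, separator="\t"):
    """Split one output line into (key, value) at the first separator.

    Illustrative only: mirrors the default rule described in the text,
    with None standing in for the null value.
    """
    line = line.rstrip("\n")
    key, found, value = line.partition(separator)
    if not found:
        # No tab in the line: the whole line is the key, the value is null
        return line, None
    return key, value


if __name__ == "__main__":
    for sample in ["hadoop\t1", "a\tb\tc", "no-tab-here"]:
        print(split_key_value(sample))
```

Only the first tab counts as the boundary, so any further tabs stay inside the value.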
When a script is specified for reducers, each reducer task launches the script as a separate process when the reducer is initialized. As the reducer task runs, it converts its input key/value pairs into lines and feeds the lines to the standard input (STDIN) of the process. In the meantime, the reducer collects the line-oriented outputs from the standard output (STDOUT) of the process and converts each line into a key/value pair, which is collected as the output of the reducer. By default, the prefix of a line up to the first tab character is the key and the rest of the line (excluding the tab character) is the value. However, this can be customized to meet specific requirements.
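To see how the two phases fit together without a cluster, the flow can be imitated in plain Python. The sketch below is an assumption-laden stand-in, not part of Hadoop: map_lines, reduce_lines and run_word_count are hypothetical names, and the call to sorted() plays the role of the sort by key that Hadoop performs between the map and reduce phases.

```python
from itertools import groupby


def line_key(line):
    """Return the key: the part of the line before the first tab."""
    return line.split("\t", 1)[0]


def map_lines(lines):
    """Mapper step: emit one 'word<TAB>1' line per word."""
    for line in lines:
        for word in line.strip().split():
            yield "%s\t%s" % (word, 1)


def reduce_lines(sorted_lines):
    """Reducer step: sum the counts of consecutive lines sharing a key."""
    for word, group in groupby(sorted_lines, key=line_key):
        total = sum(int(line.split("\t", 1)[1]) for line in group)
        yield "%s\t%s" % (word, total)


def run_word_count(lines):
    """Hypothetical local driver: map, sort by key, then reduce."""
    # sorted() stands in for the sort Hadoop does between the two phases
    mapped = sorted(map_lines(lines), key=line_key)
    return list(reduce_lines(mapped))


if __name__ == "__main__":
    for result in run_word_count(["to be or not to be"]):
        print(result)
```

The reducer only compares neighbouring lines, which is why the sort step matters: without it, the same word could appear in several separate groups and be counted more than once in the output.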