Apache Pig interview questions

6. What are the complex data types in Pig?
Complex data types:
tuple: An ordered set of fields. Example: (19,2)
bag: A collection of tuples. Example: {(19,2), (18,1)}
map: A set of key-value pairs. Example: [open#apache]
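
As a rough illustration of how these three types appear in a script, the sketch below declares them in a load schema and then reads from each one. The file name complex.txt and the field names t, bg, m, t1, t2, b1, and b2 are assumptions made up for this example.

```pig
-- 'complex.txt' is a hypothetical tab-delimited file with one line such as:
-- (19,2)<TAB>{(19,2),(18,1)}<TAB>[open#apache]
A = LOAD 'complex.txt' AS (t:tuple(t1:int, t2:int),
                           bg:bag{r:tuple(b1:int, b2:int)},
                           m:map[]);
-- t.t1 projects a field of the tuple, m#'open' looks up a map key,
-- and FLATTEN(bg) turns each tuple in the bag into its own row.
B = FOREACH A GENERATE t.t1, m#'open', FLATTEN(bg);
DUMP B;
```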

7. Is the Pig Latin language case-sensitive?
The names (aliases) of relations (for example A, B, and C) are case sensitive.
The names (aliases) of fields (for example f1, f2, and f3) are case sensitive.
Function names such as PigStorage and COUNT are case sensitive.
Keywords such as LOAD, USING, AS, GROUP, BY, FOREACH, GENERATE, and DUMP are case insensitive. They can also be written as load, using, as, group, by, and so on.
Fields can also be referred to by positional notation: in a FOREACH statement over a relation B, $0 is the first field of B.
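
The rules above can be sketched in a short script. The input path 'data' is a placeholder, and the aliases and field names are illustrative only.

```pig
-- 'data' is a placeholder input path; f1..f3 are illustrative field names.
A = LOAD 'data' USING PigStorage() AS (f1:int, f2:int, f3:int);
B = GROUP A BY f1;
-- $0 is the first field of B (the group key); COUNT must keep this exact case.
C = FOREACH B GENERATE $0, COUNT(A);
DUMP C;

-- Keywords may be written in lowercase, but aliases and function names keep their case.
d = foreach B generate $0, COUNT(A);
dump d;
```

Writing count(A) instead of COUNT(A), or referring to the relation as b instead of B, would fail because function names and aliases are matched exactly.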

8. How is the 'load' keyword useful in Pig scripts?
The first step in a dataflow language is to specify the input, which is done with the 'load' keyword. By default, load looks for your data on HDFS in a tab-delimited file, using the default load function PigStorage. To load data from HBase instead, use the HBase loader, HBaseStorage.
Example of the PigStorage loader:
A = LOAD '/home/ravi/work/flight.tsv' USING PigStorage('\t') AS (origincode:chararray, destinationcode:chararray, origincity:chararray, destinationcity:chararray, passengers:int, seats:int, flights:int, distance:int, year:int, originpopulation:int, destpopulation:int);
Example of the HBaseStorage loader:
x = load 'a' using HBaseStorage();

9. How is the 'store' keyword useful in Pig scripts?
Once processing is complete, the result needs to be written somewhere. Pig provides the store statement for this purpose:
store processed into '/data/ex/process';

10. What is the purpose of the 'dump' keyword in Pig?
dump displays the output of a relation on the screen:
dump processed;
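
To show how load, dump, and store fit together, here is a minimal end-to-end sketch. The input path '/data/ex/input', the (name, score) schema, and the filter condition are assumptions invented for this example; only the output path comes from the store example above.

```pig
-- '/data/ex/input' and the (name, score) schema are hypothetical.
raw = LOAD '/data/ex/input' AS (name:chararray, score:int);
processed = FILTER raw BY score > 10;
-- dump takes the relation alias itself, not a quoted string.
DUMP processed;
-- store writes tab-delimited text by default; the output directory must not already exist.
STORE processed INTO '/data/ex/process';
```

dump is mainly useful for debugging small results, while store is the usual choice for keeping output on HDFS.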

Author: user
