Apache Pig interview questions

11. Relational Operators in Pig
Pig Latin provides the following relational operators:
COGROUP
CROSS
DISTINCT
FILTER
FOREACH
GROUP
JOIN (inner)
JOIN (outer)
LIMIT
LOAD
ORDER
SAMPLE
SPLIT
STORE
STREAM
UNION

12. How do you use the ‘foreach’ operation in Pig scripts?
Foreach takes a set of expressions and applies them to every record in the data pipeline. In the example below, only the user and id fields of each record are kept:
A = load 'input' as (user:chararray, id:long, address:chararray, phone:chararray, preferences:map[]);
B = foreach A generate user, id;
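The expressions in a foreach are not limited to plain field names. A minimal sketch, assuming the same 'input' relation as above; the aliases next_id and theme and the map key 'theme' are made up for illustration:

```pig
A = load 'input' as (user:chararray, id:long, address:chararray, phone:chararray, preferences:map[]);
-- arithmetic on a field, and a lookup of one key in the map field
B = foreach A generate user, id + 1 as next_id, preferences#'theme' as theme;
-- fields can also be referenced by position: $0 is user, $1 is id
C = foreach A generate $0, $1;
```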

13. Why should we use filters in Pig scripts?
Filters are similar to the WHERE clause in SQL. A filter statement contains a predicate. If that predicate evaluates to true for a given record, the record is passed down the pipeline; otherwise it is dropped.
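A minimal sketch of a filter, assuming the 'input' relation and schema from question 12; the threshold 1000 and the pattern 'a.*' are arbitrary illustrative values:

```pig
A = load 'input' as (user:chararray, id:long, address:chararray, phone:chararray, preferences:map[]);
-- keep only the records whose id is greater than 1000
B = filter A by id > 1000;
-- predicates can be combined with and, or, and not;
-- matches tests the whole string against a regular expression
C = filter A by id > 1000 and user matches 'a.*';
```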

14. Why should we use the ‘order by’ keyword in Pig scripts?
The order statement sorts your data, producing a total order of your output data. The syntax of order is similar to that of group: you indicate a key or set of keys by which you wish to order your data.
input2 = load 'daily' as (exchanges, stocks);
grpds = order input2 by exchanges;
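Ordering by a set of keys might look like the sketch below, which assumes the same 'daily' input; the alias sorted is made up for illustration. The desc keyword reverses the sort direction for the key it follows:

```pig
input2 = load 'daily' as (exchanges, stocks);
-- sort by exchanges ascending, then by stocks descending within each exchange
sorted = order input2 by exchanges, stocks desc;
```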

15. Why should we use the ‘distinct’ keyword in Pig scripts?
The distinct statement is very simple: it removes duplicate records. It works only on entire records, not on individual fields, so it is applied to a relation rather than to a field:
input2 = load 'daily' as (exchanges, stocks);
grpds = distinct input2;
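Because distinct compares whole records, one common way to get the unique values of a single field is to project that field first. A minimal sketch, assuming the same 'daily' input; the aliases exch and uniq are made up for illustration:

```pig
input2 = load 'daily' as (exchanges, stocks);
-- keep only the exchanges field, so each record has a single column
exch = foreach input2 generate exchanges;
-- remove duplicate records, leaving one record per exchange
uniq = distinct exch;
```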

Author: user
