21. How does Pig integrate with MapReduce?
As a Pig Latin program is executed, each statement is parsed in turn. If there are syntax errors or other (semantic) problems, such as undefined aliases, the interpreter will halt and display an error message. The interpreter builds a logical plan for every relational operation, which forms the core of a Pig Latin program. The logical plan for the statement is added to the logical plan for the program so far, and then the interpreter moves on to the next statement. It’s important to note that no data processing takes place while the logical plan of the program is being constructed. For example, consider again the Pig Latin program from the first example:
-- max_temp.pig: Finds the maximum temperature by year
records = LOAD 'input/ncdc/micro-tab/sample.txt'
AS (year:chararray, temperature:int, quality:int);
filtered_records = FILTER records BY temperature != 9999 AND
quality IN (0, 1, 4, 5, 9);
grouped_records = GROUP filtered_records BY year;
max_temp = FOREACH grouped_records GENERATE group,
MAX(filtered_records.temperature);
DUMP max_temp;
When the Pig Latin interpreter sees the first line containing the LOAD statement, it confirms that it is syntactically and semantically correct and adds it to the logical plan, but it does not load the data from the file (or even check whether the file exists). Indeed, where would it load it? Into memory? Even if it did fit into memory, what would it do with it? The trigger for Pig to start execution is the DUMP statement. At that point, the logical plan is compiled into a physical plan (a series of MapReduce jobs) and executed.
22. What does FLATTEN do in Pig?
Syntactically, FLATTEN looks like a UDF, but it is more powerful than one: it is an operator that changes the structure of tuples and bags by removing a level of nesting. Flattening a bag produces one output tuple per element of the bag; flattening a tuple promotes its fields into the enclosing tuple.
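For example, a minimal sketch (the file path, alias names, and schema here are hypothetical, not from the original script):
-- Each tuple of 'grouped' is (group, {bag of records}); FLATTEN un-nests the bag,
-- producing one output tuple per temperature value in the bag.
records = LOAD 'input/sample.txt' AS (year:chararray, temperature:int);
grouped = GROUP records BY year;
flattened = FOREACH grouped GENERATE group, FLATTEN(records.temperature);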
23. How do you debug in Pig?
DESCRIBE : Reviews the schema of a relation.
EXPLAIN : Shows the logical, physical, and MapReduce execution plans.
ILLUSTRATE : Shows, step by step, how each statement is executed, using a small sample of the data.
These commands are used to debug a Pig Latin script; an example follows below.
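A minimal sketch, applying the three operators to the max_temp relation from the script in question 21:
DESCRIBE max_temp;    -- prints the schema of max_temp
EXPLAIN max_temp;     -- prints the logical, physical, and MapReduce plans
ILLUSTRATE max_temp;  -- walks through the script on generated sample data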
24. Rack Topology Script
Topology scripts are used by Hadoop to determine the rack location of nodes. Hadoop uses this information when placing block replicas, so that copies of a block land on different racks for fault tolerance.
25. Co-Group in PIG
COGROUP is like GROUP applied to two (or more) relations at once: Pig groups each relation by the given key and, for each key, produces a tuple containing one bag of matching tuples per relation. A join on the grouped column can then be obtained by flattening the cogrouped bags.
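A minimal sketch, assuming two hypothetical relations A and B that share an id field (paths, aliases, and schemas are illustrative):
A = LOAD 'input/a.txt' AS (id:int, name:chararray);
B = LOAD 'input/b.txt' AS (id:int, score:int);
-- For each id, C holds (group, {tuples of A with that id}, {tuples of B with that id}).
C = COGROUP A BY id, B BY id;
-- Flattening both bags yields the equivalent of an inner join of A and B on id.
D = FOREACH C GENERATE FLATTEN(A), FLATTEN(B);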