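Apache Pig is a platform for analyzing large datasets. It consists of a high-level language, Pig Latin, for expressing data analysis programs, together with the infrastructure needed to evaluate them. The most important property of Pig programs is that their structure lends itself to parallelization, which lets them handle very large datasets.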
Why Do We Need Apache Pig?
Programmers who are not comfortable with Java have often struggled to work with Hadoop, especially when writing MapReduce tasks. Apache Pig is a welcome alternative for such developers.
- Using Pig Latin, programmers can perform MapReduce tasks easily without having to write complex Java code.
- Apache Pig uses a multi-query approach, which reduces the length of the code.
- Pig Latin is an SQL-like language, so it is easy to learn if you are already familiar with SQL.
- Apache Pig provides many built-in operators to support data operations such as joins, filters, and ordering. It also provides nested data types such as tuples, bags, and maps, which are missing from MapReduce.
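To make the nested data model and the filter/group/order operators more concrete, the following is a minimal plain-Python sketch. It assumes a simple mapping of Pig's types onto Python (tuple as a Python tuple, bag as a list of tuples, map as a dict with string keys), and the helper names `pig_filter`, `pig_group`, and `pig_order` are hypothetical illustrations, not part of Pig or any library. The sketch runs in memory on one machine, whereas Pig compiles the equivalent operators into parallel jobs on a Hadoop cluster.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

# Assumed mapping of Pig's data model onto Python (illustration only):
#   tuple -> Python tuple, bag -> list of tuples, map -> dict with str keys.
Row = Tuple[Any, ...]
Bag = List[Row]


def pig_filter(bag: Bag, predicate: Callable[[Row], bool]) -> Bag:
    """Hypothetical helper mimicking FILTER rel BY cond."""
    return [row for row in bag if predicate(row)]


def pig_group(bag: Bag, key: Callable[[Row], Any]) -> Bag:
    """Hypothetical helper mimicking GROUP rel BY key.

    Each output row is (group_key, inner_bag): a tuple that nests a bag.
    """
    groups: Dict[Any, Bag] = defaultdict(list)
    for row in bag:
        groups[key(row)].append(row)
    return [(k, inner) for k, inner in groups.items()]


def pig_order(bag: Bag, key: Callable[[Row], Any], desc: bool = False) -> Bag:
    """Hypothetical helper mimicking ORDER rel BY field [DESC]."""
    return sorted(bag, key=key, reverse=desc)


# Hypothetical input relation: (name, dept, salary, attributes map)
employees: Bag = [
    ("alice", "eng", 120, {"level": "senior"}),
    ("bob", "eng", 95, {"level": "junior"}),
    ("carol", "ops", 80, {"level": "senior"}),
    ("dave", "ops", 60, {"level": "junior"}),
]

# Filter on a plain field, then group: each group carries a nested bag.
well_paid = pig_filter(employees, lambda r: r[2] >= 80)
by_dept = pig_group(well_paid, key=lambda r: r[1])

# Aggregate each inner bag, then order by the total, highest first.
totals = [(dept, sum(r[2] for r in inner)) for dept, inner in by_dept]
ranked = pig_order(totals, key=lambda r: r[1], desc=True)

# Filter on a value looked up inside the nested map field.
seniors = [r[0] for r in pig_filter(employees, lambda r: r[3]["level"] == "senior")]

print(ranked)   # [('eng', 215), ('ops', 80)]
print(seniors)  # ['alice', 'carol']
```

In Pig Latin the same pipeline would be written with FILTER, GROUP, FOREACH ... GENERATE, and ORDER statements. The key point the sketch shows is that grouping yields rows that contain whole bags, a nested structure that plain MapReduce key/value pairs do not model directly.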
Features of Pig
Apache Pig comes with the following features −
- Rich set of operators − It provides many built-in operators to perform operations such as join, sort, and filter.
- Ease of programming − Pig Latin is similar to SQL, so it is easy to write Pig scripts if you come from an SQL background.
- Optimization opportunities − Pig optimizes the execution of tasks automatically, so programmers need to focus only on the semantics of the language.
- Extensibility − In addition to the existing operators, users can develop their own functions to read, process, and write data.
- UDFs − Pig lets developers write user-defined functions (UDFs) in other programming languages, such as Java, and invoke them from Pig scripts.
- Handles all kinds of data − Apache Pig analyzes all kinds of data, both structured and unstructured, and stores the results in HDFS.
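Extensibility through UDFs is what allows Pig to handle unstructured input: a user-written function can turn raw text into structured tuples that the built-in operators then process. The following is a minimal plain-Python sketch of that idea, assuming a hypothetical log format `"<ip> <method> <path> <status>"`; the names `parse_log_line` and `foreach_generate` are illustrative only. In an actual Pig deployment, a Python UDF would also be registered from the Pig script (for example with a `REGISTER ... USING jython` statement) and would typically declare its output schema; those Pig-specific details are deliberately omitted so that the sketch runs in ordinary CPython.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical log format assumed for this sketch: "<ip> <method> <path> <status>"
LOG_PATTERN = re.compile(r"^(\S+) (GET|POST|PUT|DELETE) (\S+) (\d{3})$")

Record = Tuple[str, str, str, int]


def parse_log_line(line: str) -> Optional[Record]:
    """Candidate UDF body: turn one unstructured line into a structured tuple.

    Returns None for malformed lines (Pig would represent these as null),
    so a later filter step can drop them.
    """
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    ip, method, path, status = match.groups()
    return ip, method, path, int(status)


def foreach_generate(lines: Iterable[str]) -> List[Record]:
    """Mimics FOREACH raw GENERATE udf(line) followed by FILTER ... IS NOT NULL."""
    parsed = (parse_log_line(line) for line in lines)
    return [row for row in parsed if row is not None]


raw = [
    "10.0.0.1 GET /index.html 200",
    "garbage line",
    "10.0.0.2 POST /login 401",
]

rows = foreach_generate(raw)
# Once the data is structured, ordinary filtering applies to typed fields.
errors = [r for r in rows if r[3] >= 400]

print(rows)
print(errors)  # [('10.0.0.2', 'POST', '/login', 401)]
```

Returning None for malformed lines, rather than raising an exception, keeps a single bad record from failing a whole batch; the cleanup then happens in an explicit filter step, mirroring how a Pig script would drop nulls.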