FIT5148 – Distributed and Big Data Processing
Semester 1, 2016
Big Data Report (40%)
Student ID:
Part 1 (20%)
Tasks | Allocated Marks | Student Marks | Comments
Task 1 (1 mark)
a) Upload the files in Pig and Hive as specified in our tutorials. (1 mark)
(Provide screenshots.)
Task 2 (6 marks)
a) Using Pig and Hive, find the player(s) with the highest GWG (a player may be repeated if the year is different). In the query results, show the following details: year, player first and last names, date of birth (day, month and year), country of birth, team name, league ID, positions, and GWG. (2 marks: 1 mark for Pig and 1 mark for Hive)
b) Among the player(s) with the highest GWG, find the player who won the highest number of awards, using Pig and Hive. In the query results, show the following details: player first and last name, and award count. (2 marks: 1 mark for Pig and 1 mark for Hive)
c) Using the first and last name of the player who won the highest number of awards, find the points that the player earned in each year in which the player received an award, using Pig and Hive. Display the following details: the award names, the award year, and the points (Pts) that the player scored that year. (2 marks: 1 mark for Pig and 1 mark for Hive)
(Provide your Hive queries, Pig scripts, and screenshots of the results and tables.)
Task 3 (6 marks)
a) Using Pig and Hive, find the coach who won the highest number of awards. In the query results, show the following details: coach first and last name, date of birth, birth country, and number of awards. (2 marks: 1 mark for Pig and 1 mark for Hive)
b) Using Pig and Hive, find the coach with the highest number of wins. In the query results, show the following details: coach first and last name, year, games, wins, losses, and ties. (2 marks: 1 mark for Pig and 1 mark for Hive)
c) Perform the same query as in (b), but also display the number of awards won by this coach in the results. (2 marks: 1 mark for Pig and 1 mark for Hive)
(Provide your Hive queries, Pig scripts, and screenshots of the results and tables.)
Task 4 (4 marks)
a) Using Pig and Hive, find the total number of points, goals, and assists for each team. In the query results, show the following details for all the teams: team name, total number of points, total number of goals, and total number of assists. (2 marks: 1 mark for Pig and 1 mark for Hive)
b) Using Pig and Hive, find for each team the player who scored the highest total number of points. In the query results, show the following details: team name, player first and last name, year, number of points, goals, and assists. (2 marks: 1 mark for Pig and 1 mark for Hive)
(Provide your Hive queries, Pig scripts, and screenshots of the results and tables.)
Task 5 (3 marks)
a) In the Hortonworks shell, execute the first query in Task 3 using Pig, and record the time taken with and without Tez enabled. Compare the results. (1 mark)
b) In Ambari, for Hive, run the first query in Task 3 with and without Tez enabled, and compare your results. (1 mark)
c) For the first query in Task 3, for Hive, this time use Cost Based Optimization (CBO) with Tez on. Record the results, such as DAG details, Graphic View, or others. (1 mark)
(Provide the results of your experiments in tables where possible, along with the screenshots.)
Final Mark / 20
Note!
1. For Tasks 2, 3, and 4, you need to record and compare Hive and Pig based on the total time taken as well as other factors such as the number of jobs, maps, and reducers.
2. In the report, you need to ensure that you include all Hive queries, Pig scripts, and screenshots of all the results, logs, etc. that show the completion of each task and its component, and all the comparison discussions that you will write in paragraphs and comparison tables. (Note: when the results of a query do not fit in a screenshot, you have to provide two screenshots, one from the first page of results and one from the last page of results). There will be mark deduction if the report is incomplete.
Submission-related deductions (there is a mark deduction for any missing document, and a 5% penalty per day for late submission):
1. An Assessment Cover Sheet for the group
2. Provide a report that includes all the documentation mentioned for each task (Hive SQL code, Pig scripts, tables, screenshots, etc.) in a Word document, in the order of the tasks. Use headings, subheadings, and good report format.
Part 2 (20%)
Criteria | Not satisfactory | Poor | Average | Good | Very Good | Excellent | Comment
The introduction briefly describes what the paper is about and gives an overview/outline of what will be discussed. (2)
The four selected papers are all closely relevant to one research area. Papers should be completed research papers. DO NOT choose research-in-progress papers, surveys, or review papers. The selection of seminal papers must be based on considering the most influential, well-known and cited papers in that research area, and whether they are full research papers with a proposed approach and its implementation and evaluation. (3)
1) Briefly and clearly describe 4 different approaches under subheadings. (2)
2) Discuss the challenges/issues that these papers focus on. Avoid irrelevant information. (2)
3) Summarise the findings/results of evaluation of their approaches. This should include what improvement or impact each paper had. Provide evidence from their evaluation results. (4)
4) Add your judgment on results. If papers address the same problem, compare them. (3)
5) Avoid irrelevant information. The discussion should be original, not copied from the papers. Use paragraphs rather than bullets or other styles, with a logical and consistent flow.
The conclusion summarises the paper by stating the main findings and making final points. (2)
Correct APA referencing format is used in the References List and in in-text citations. The number of references should be appropriate. (2)
Reductions
• The paper presentation (format, style) is not according to the specified requirements (-1 mark from the total mark)
• Writing skills (i.e. English, grammar, spelling), and flow and link between the paragraphs are not appropriate. (up to -2 marks from the total mark)
• The length of the paper is over or under the word limit. A difference of up to 10% of the total word limit is acceptable; more than that results in a mark reduction (-1 mark from the total mark)
• Submission requirements are not met or files missing in the zip file (up to -2 marks from the total mark)
• Late submissions (5% deducted from the total achieved mark per day)
• A mark of 0 will be applied where a case of plagiarism is detected
