Please use this identifier to cite or link to this item: http://dspace.lib.uom.gr/handle/2159/21727
Author: Χαντζοπλάκη, Άννα
Title: Power modeling of cluster computations: an experiment with HADOOP
Date Issued: 2017
Department: Postgraduate Specialization Program in Applied Informatics
Supervisor: Μαργαρίτης, Κωνσταντίνος
Abstract: Hadoop is currently the most popular framework for storing and processing large-scale datasets, offering a reliable, scalable and fault-tolerant solution on commodity hardware. However, besides performance, energy efficiency and power consumption are a rising concern. The purpose of this dissertation is to evaluate the performance and power implications of a Hadoop ecosystem by running three different kinds of applications. The Pi benchmark is a heavily CPU-bound application that uses only a single reduce task; WordCount is also a CPU-bound application, with high CPU utilization and low disk and network I/O; and the well-known Terasort benchmark stresses Hadoop in every respect, since it is the most resource-intensive of the three, combining high CPU utilization with high disk throughput and moderate network traffic. The benchmarks were run with different input sizes and under different combinations of mappers and reducers, in order to better understand their behavior in terms of both performance and energy efficiency. The testbed was a seven-node cluster. The first important milestone of the project was the establishment of a power estimation model. The formula takes four metrics into account: CPU, RAM, disk and cache utilization. We used Ganglia's log files to calculate these four metrics. We then trained the model using external equipment. The Linear Least Squares method was used to compute the optimal weight for each metric. Several validation tests were performed, and the power estimation model proved to be robust. In addition, several experiments were conducted using the aforementioned benchmarks, with power estimated by the model. Overall, when the computation is complex and the problem size is large enough, adding more computing units yields better results in both performance and energy efficiency.
In the case of the WordCount example, we showed that the number of mappers plays an important role in improving performance and efficiency. Additionally, when measuring performance it is imperative to take the input file size into consideration: small files are a problem in Hadoop and result in additional overhead. Finally, the Terasort results showed that the number of reducers is significant, since the shuffle phase is an intensive stage of the application.
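The abstract describes a power model that weights four utilization metrics (CPU, RAM, disk, cache) and fits the weights by Linear Least Squares. A minimal sketch of that idea follows, using only the Python standard library. The thesis's exact formula is not given in this record, so the constant (idle power) term, the function names, the sample values and the weights below are all hypothetical, and the "measured" readings are synthetic rather than data from the thesis.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: metrics are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_power_model(samples, measured_watts):
    """Return least-squares weights [bias, cpu, ram, disk, cache].

    Solves the normal equations (X^T X) w = X^T y. The leading 1.0
    column models a constant idle-power term, which is an assumption.
    """
    rows = [[1.0] + list(s) for s in samples]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, measured_watts)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def estimate_power(weights, sample):
    """Estimate power (watts) for one (cpu, ram, disk, cache) sample."""
    return weights[0] + sum(w * u for w, u in zip(weights[1:], sample))


# Hypothetical utilization samples in [0, 1]: (cpu, ram, disk, cache).
samples = [
    (0.10, 0.20, 0.05, 0.30),
    (0.50, 0.40, 0.10, 0.20),
    (0.90, 0.60, 0.30, 0.50),
    (0.30, 0.80, 0.70, 0.10),
    (0.70, 0.30, 0.90, 0.60),
    (0.20, 0.50, 0.40, 0.90),
    (0.60, 0.10, 0.20, 0.40),
]

# Hypothetical "true" weights, used only to synthesize meter readings.
true_weights = [50.0, 40.0, 10.0, 8.0, 5.0]
measured = [estimate_power(true_weights, s) for s in samples]

weights = fit_power_model(samples, measured)
print([round(w, 3) for w in weights])
```

With real data, `samples` would hold utilization values taken from monitoring logs and `measured` would hold the readings from the external equipment over the same intervals. Held-out intervals could then be used for validation tests of the kind the abstract mentions.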
Keywords: Hadoop
Hadoop multi-node setup
MapReduce
YARN
Ganglia
Power consumption
Power prediction model
Energy efficiency
Information: Master's thesis--University of Macedonia, Thessaloniki, 2017.
Appears in Collections: Postgraduate Program in Applied Informatics (M)

Files in This Item:
File: ChantzoplakiAnnaMsc2017.pdf
Size: 2.56 MB
Format: Adobe PDF


Items in Psepheda are protected by copyright, with all rights reserved, unless otherwise indicated.