Please use this identifier to cite or link to this item: http://dspace.lib.uom.gr/handle/2159/21727
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Μαργαρίτης, Κωνσταντίνος | el
dc.contributor.author | Χαντζοπλάκη, Άννα | el
dc.date.accessioned | 2018-04-02T20:44:22Z | -
dc.date.available | 2018-04-02T20:44:22Z | -
dc.date.issued | 2017 | el
dc.identifier.uri | http://dspace.lib.uom.gr/handle/2159/21727 | -
dc.description | Master's thesis--University of Macedonia, Thessaloniki, 2017. | el
dc.description.abstract | Hadoop is currently the most popular framework for storing and processing large-scale datasets, offering a reliable, scalable and fault-tolerant solution on commodity hardware. Besides performance, however, energy efficiency and power consumption are a growing concern. The purpose of this dissertation is to evaluate the performance and power implications of a Hadoop ecosystem by running three different kinds of applications. The Pi benchmark is a heavily CPU-bound application that uses only a single reduce task. WordCount is also CPU-bound, with high CPU utilization and low disk and network I/O. The well-known Terasort benchmark stresses Hadoop in every respect, since it is the most resource-intensive of the three, combining high CPU utilization with high disk throughput and moderate network traffic. The benchmarks were run with different input sizes and under different combinations of mappers and reducers, in order to better understand their behavior in terms of both performance and energy efficiency. The testbed was a seven-node cluster. The first important milestone of the project was the establishment of a power estimation model. The formula takes into account four metrics: CPU, RAM, disk and cache utilization. We used Ganglia's log files to calculate these four metrics, and then trained the model using external measurement equipment. The Linear Least Squares method was used to compute the optimal weight for each metric. Several validation tests were performed, and the power estimation model proved to be robust. In addition, several experiments were conducted with the aforementioned benchmarks, with power consumption estimated by the power model. Overall, when the computation is complex and the problem size is big enough, adding more computing units results in better performance and better energy efficiency. In the case of WordCount, we showed that the number of mappers plays an important role in improving performance and efficiency. Additionally, when measuring performance it is imperative to take the input file size into consideration: small files are a problem in Hadoop and result in additional overhead. Finally, the Terasort results showed that the number of reducers is significant, since the shuffle phase is an intensive stage of the application. | en
dc.format.extent | 112 | el
dc.language.iso | en | en
dc.publisher | University of Macedonia | el
dc.subject | Hadoop | en
dc.subject | Hadoop multi-node setup | en
dc.subject | MapReduce | en
dc.subject | YARN | en
dc.subject | Ganglia | en
dc.subject | Power consumption | el
dc.subject | Power prediction model | el
dc.subject | Energy efficiency | el
dc.title | Power modeling of cluster computations: an experiment with HADOOP | el
dc.type | Electronic Thesis or Dissertation | en
dc.type | Text | en
dc.contributor.department | Postgraduate Specialization Program in Applied Informatics | el
Appears in Collections: Postgraduate Program in Applied Informatics (M)
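
The abstract describes a power estimation model that combines CPU, RAM, disk and cache utilization with weights fitted by Linear Least Squares. The following is a minimal sketch of that idea, not the thesis's actual implementation. Everything in it is an assumption made for illustration: the function names, the constant (idle-power) term, the utilization samples and the watt values are all synthetic. It fits the weights by solving the normal equations, which is one textbook way to carry out linear least squares.

```python
# Illustrative sketch only: names, the intercept term and all numbers are
# hypothetical and do not come from the thesis.
# Model: P ~ w0 + w_cpu*cpu + w_ram*ram + w_disk*disk + w_cache*cache


def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_power_model(samples, measured_watts):
    """Return [w0, w_cpu, w_ram, w_disk, w_cache] minimizing squared error."""
    rows = [[1.0] + list(s) for s in samples]  # leading 1.0 = intercept
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, measured_watts)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def estimate_power(weights, sample):
    """Estimate power (watts) for one (cpu, ram, disk, cache) sample."""
    return weights[0] + sum(w * u for w, u in zip(weights[1:], sample))


# Synthetic "ground truth" used only to generate noise-free training data.
TRUE_WEIGHTS = [50.0, 40.0, 10.0, 5.0, 8.0]

# Hypothetical utilization fractions: (cpu, ram, disk, cache)
SAMPLES = [
    (0.1, 0.2, 0.0, 0.1),
    (0.9, 0.4, 0.1, 0.5),
    (0.5, 0.8, 0.3, 0.2),
    (0.3, 0.3, 0.9, 0.4),
    (0.7, 0.6, 0.5, 0.9),
    (0.2, 0.9, 0.6, 0.7),
    (1.0, 0.1, 0.8, 0.3),
]

# Stand-in for readings from an external power meter.
MEASURED_WATTS = [estimate_power(TRUE_WEIGHTS, s) for s in SAMPLES]

weights = fit_power_model(SAMPLES, MEASURED_WATTS)
print([round(w, 3) for w in weights])
```

In a real setup the samples would come from monitoring logs and the watt readings from a physical meter, so the fit would be approximate rather than exact. A library least-squares routine would also be numerically safer than hand-rolled normal equations.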

Files in This Item:
File | Description | Size | Format
ChantzoplakiAnnaMsc2017.pdf |  | 2.56 MB | Adobe PDF


Items in Psepheda are protected by copyright, with all rights reserved, unless otherwise indicated.