Power modeling of cluster computations: an experiment with HADOOP

Χαντζοπλάκη, Άννα

Please use this identifier to cite or link to this item: http://dspace.lib.uom.gr/handle/2159/21727

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Μαργαρίτης, Κωνσταντίνος	el
dc.contributor.author	Χαντζοπλάκη, Άννα	el
dc.date.accessioned	2018-04-02T20:44:22Z	-
dc.date.available	2018-04-02T20:44:22Z	-
dc.date.issued	2017	el
dc.identifier.uri	http://dspace.lib.uom.gr/handle/2159/21727	-
dc.description	Master's thesis--University of Macedonia, Thessaloniki, 2017.	el
dc.description.abstract	Hadoop is currently the most popular framework for storing and processing large-scale datasets, offering a reliable, scalable and fault-tolerant solution on commodity hardware. Besides performance, however, energy efficiency and power consumption are a growing concern. The purpose of this dissertation is to evaluate the performance and power implications of a Hadoop ecosystem by running three different kinds of applications. The Pi benchmark is a heavily CPU-bound application that uses only a single reduce task; WordCount is also CPU-bound, with high CPU utilization and low disk and network I/O; and the well-known Terasort benchmark stresses Hadoop from every aspect, since it is the most resource-intensive, combining high CPU utilization with high disk throughput and moderate network traffic. The benchmarks were run with different input sizes and under different combinations of mappers and reducers, in order to better understand their behavior in terms of both performance and energy efficiency. The testbed was a seven-node cluster. The first important milestone of the project was the establishment of a power estimation model. The formula takes into account four metrics: CPU, RAM, disk and cache utilization. We used Ganglia’s log files to calculate these four metrics. We then trained the model with the help of external equipment. The Linear Least Squares method was used to compute the optimal weight for each metric. Several validation tests were performed, and the power estimation model proved to be robust. In addition, several experiments were conducted with the aforementioned benchmarks, with power estimated by the power model. Overall, when the computation is complex and the problem size is large enough, adding more computing units yields better results in both performance and energy efficiency. In the case of WordCount, we showed that the number of mappers plays an important role in improving performance and efficiency. Additionally, when measuring performance it is imperative to take the input file size into consideration: small files are a problem in Hadoop and result in additional overhead. Finally, the Terasort results showed that the number of reducers is significant, since the shuffling phase is an intensive stage of the application.	en
dc.format.extent	112	el
dc.language.iso	en	en
dc.publisher	University of Macedonia	el
dc.subject	Hadoop	en
dc.subject	Hadoop multi-node setup	en
dc.subject	MapReduce	en
dc.subject	YARN	en
dc.subject	Ganglia	en
dc.subject	Power consumption	el
dc.subject	Power prediction model	el
dc.subject	Energy efficiency	el
dc.title	Power modeling of cluster computations: an experiment with HADOOP	el
dc.type	Electronic Thesis or Dissertation	en
dc.type	Text	en
dc.contributor.department	Postgraduate Specialization Program in Applied Informatics	el
Appears in Collections:	Postgraduate Program in Applied Informatics (M)
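
The abstract describes a power estimation model that combines four utilization metrics (CPU, RAM, disk, cache) with weights computed by the Linear Least Squares method. The following is a minimal Python sketch of that kind of fit, not the thesis's actual formula or code. The intercept term (standing in for idle power), the function names, the 0-to-1 utilization scale and every number below are assumptions made for illustration; the data are synthetic.

```python
# Hypothetical sketch: fit
#   P ~ w0 + w_cpu*cpu + w_ram*ram + w_disk*disk + w_cache*cache
# by ordinary linear least squares, solving the normal equations in pure Python.
# The intercept w0 is an assumption; the source only names the four metrics.


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is not mutated.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: samples do not determine the weights")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_power_model(samples, measured_watts):
    """Return least-squares weights [intercept, cpu, ram, disk, cache]."""
    rows = [[1.0] + list(s) for s in samples]  # leading 1.0 carries the intercept
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, measured_watts)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def estimate_power(weights, sample):
    """Estimate power in watts for one (cpu, ram, disk, cache) sample."""
    return weights[0] + sum(w * u for w, u in zip(weights[1:], sample))


# Synthetic training data: utilizations in [0, 1], generated from made-up weights.
assumed_weights = [40.0, 30.0, 10.0, 5.0, 8.0]
training_samples = [
    (0.1, 0.2, 0.0, 0.3),
    (0.9, 0.4, 0.1, 0.5),
    (0.5, 0.8, 0.6, 0.2),
    (0.3, 0.3, 0.9, 0.7),
    (0.7, 0.6, 0.4, 0.9),
    (0.2, 0.9, 0.3, 0.1),
]
training_watts = [estimate_power(assumed_weights, s) for s in training_samples]

fitted_weights = fit_power_model(training_samples, training_watts)
print([round(w, 3) for w in fitted_weights])
```

In practice the watt readings would come from an external meter and the utilization samples from monitoring logs, so the fit would not be exact and the residuals on held-out samples would serve as the validation test.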

Files in This Item:

File	Description	Size	Format
ChantzoplakiAnnaMsc2017.pdf		2.56 MB	Adobe PDF

PSEPHEDA

Digital Library and Institutional Repository