Please use this identifier to cite or link to this item: http://dspace.lib.uom.gr/handle/2159/21727
Author: Chantzoplaki, Anna
Title: Power modeling of cluster computations: an experiment with HADOOP
Issue Date: 2017
Department: Postgraduate Studies Programme of Specialization in Applied Informatics
Supervisor: Margaritis, Konstantinos
Abstract: Hadoop is currently the most popular framework for storing and processing large-scale datasets, offering a reliable, scalable and fault-tolerant solution on commodity hardware. However, besides performance, energy efficiency and power consumption are a rising concern. The purpose of this dissertation is to evaluate the performance and power implications of a Hadoop ecosystem by running three different kinds of applications. The Pi benchmark is a heavily CPU-bound application that uses only a single reduce task; WordCount is also a CPU-bound application, with high CPU utilization and low disk and network I/O; the well-known Terasort benchmark stresses Hadoop from every aspect, since it is the most resource-intensive, combining high CPU utilization with high disk throughput and moderate network traffic. The benchmarks were tested for different input sizes and under different combinations of mappers and reducers, in order to better understand their behavior in terms of both performance and energy efficiency. The testbed was a seven-node cluster. The first important milestone of this project was the establishment of a power estimation model. The formula used takes into account four metrics: CPU, RAM, disk and cache utilization. We used Ganglia's log files to calculate these four metrics. We then trained the model with the use of external equipment. The Linear Least Squares method was used to compute the optimal weights for each metric. Several validation tests were performed and the power estimation model proved to be robust. In addition, several experiments were conducted using the aforementioned benchmarks, with power estimated by the model. Overall, when the computation is complex and the problem size is big enough, adding more computing units yields better results in both performance and energy efficiency.
In the case of the WordCount example, we showed that the number of mappers plays an important role in improving performance and efficiency. Additionally, when measuring performance it is imperative to take the input file size into consideration: small files are a problem in Hadoop and result in additional overhead. Finally, the Terasort results showed that the number of reducers is significant, since the shuffle phase is an intensive stage of the application.
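The abstract does not give the power formula itself, so the following is only a minimal sketch of how weights for a four-metric linear model could be fitted with linear least squares. The intercept term (idle power), the scaling of each metric to the range 0..1, the function names, and all numbers are assumptions made for illustration; the data are synthetic and are not measurements from the dissertation.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: samples do not determine the weights")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_power_model(samples, measured_watts):
    """Least-squares fit via the normal equations (A^T A) w = A^T y.

    Returns [intercept, w_cpu, w_ram, w_disk, w_cache]. The intercept is
    an assumption of this sketch, not something stated in the abstract.
    """
    rows = [[1.0] + list(s) for s in samples]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, measured_watts)) for i in range(k)]
    return solve_linear_system(ata, aty)


def estimate_power(weights, sample):
    """Estimated power for one (cpu, ram, disk, cache) utilization sample."""
    return weights[0] + sum(w * u for w, u in zip(weights[1:], sample))


# Synthetic training data: hypothetical weights and utilization samples.
TRUE_WEIGHTS = [60.0, 45.0, 10.0, 8.0, 5.0]
SAMPLES = [
    (0.0, 0.0, 0.0, 0.0),  # idle
    (1.0, 0.0, 0.0, 0.0),  # CPU saturated
    (0.0, 1.0, 0.0, 0.0),  # RAM saturated
    (0.0, 0.0, 1.0, 0.0),  # disk saturated
    (0.0, 0.0, 0.0, 1.0),  # cache saturated
    (0.5, 0.5, 0.5, 0.5),
    (0.9, 0.4, 0.1, 0.6),
]
# Stand-in for readings from an external power meter (noise-free here).
MEASURED = [estimate_power(TRUE_WEIGHTS, s) for s in SAMPLES]

weights = fit_power_model(SAMPLES, MEASURED)
```

Because the synthetic readings are noise-free, the fitted weights match the hypothetical ones up to rounding error. With real meter readings the fit would only approximate the measurements, and the normal equations used here could be swapped for a numerically sturdier solver.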
Keywords: Hadoop
Hadoop multi-node setup
MapReduce
YARN
Ganglia
Power consumption
Power prediction model
Energy efficiency
Description: Master's thesis--University of Macedonia, Thessaloniki, 2017.
Appears in Collections: Postgraduate Programme in Applied Informatics (M)

Files in This Item:
File: ChantzoplakiAnnaMsc2017.pdf
Size: 2.56 MB
Format: Adobe PDF


Items in ΨΗΦΙΔΑ are protected by copyright, unless otherwise indicated.