Paralelni algoritmi za problem grupiranja podataka

Čabraja, Anto (2014) Paralelni algoritmi za problem grupiranja podataka. Diploma thesis, Faculty of Science > Department of Mathematics.

[img]
Preview
PDF
Language: Croatian

Download (2MB) | Preview
[img] Archive (popratni materijali ("algoritmi i podaci"))
Language: English

Download (19MB)

Abstract

Data clustering is one of the most important methods for data analysis and in most cases the first step in further analysis. Nowadays big services like Google and Facebook store terabytes of data. Data arrives in natural order, usually without a defined sequence and they need to be grouped into meaningful clusters. The problem of data clustering is trivial if we use thorough examination. However, we need a result in real time, which thorough examination can’t offer since data clustering is an NP-hard problem. This thesis offers basic definitions for data clustering and introduces mathematical modeling of data clustering problems. After definitions we describe basic methods for clustering: hierarchical and partitional clustering. In addition, we introduce the concept of evolutionary programming which represents the most popular method in solving NP-hard problems efficiently. In section two particular attention is dedicated to genetic algorithms as a representative of the evolutionary algorithms class. Evolutionary method is a common name for methods inspired by natural processes. A description of heuristic algorithms concept and a construction of sequential algorithms for data clustering problem is given in section three. It is shown that the implemented methods are good in solving problems, but the big time complexity of sequential algorithms leads to parallel processes. Therefore, after the sequential solution, we introduce the concept of parallel programming via MPI technologies and the benefits it offers. These technologies allow many types of network topologies for processors. This thesis describes three topological concepts: connected network, network with a master and group network. Also, we explain the synchronization among processors and demonstrate some parallel algorithm solutions for real-life problems. Synchronization is a hard problem because of a restricted set of functions (send, receive and broadcast). Using those functions we implemented all parallel algorithms in section five. After the implementation and testing we conducted a thorough analysis of results.

Item Type: Thesis (Diploma thesis)
Supervisor: Nogo, Goranka
Date: 2014
Number of Pages: 61
Subjects: NATURAL SCIENCES > Mathematics
Divisions: Faculty of Science > Department of Mathematics
Depositing User: Iva Prah
Date Deposited: 02 Jun 2015 08:46
Last Modified: 02 Jun 2015 08:46
URI: http://digre.pmf.unizg.hr/id/eprint/3913

Actions (login required)

View Item View Item