Data Organization and Data Mining
General
- Course Code: 1841
- Semester: 8th
- Course Type: Scientific Area (SA)
- Course Category: Optional (OP)
- Scientific Field: Data Management - Artificial Intelligence (DMAI)
- Lectures: 4 hours/week
- ECTS units: 6
- Teaching and exams language: Greek, English
- The course is offered to Erasmus students
- Recommended prerequisite courses: (1741) Introduction to Data Analytics
- Coordinator: Dervos Dimitrios
- Instructors: Dervos Dimitrios
Educational goals
The course aims to introduce students to state-of-the-art practices and techniques for preparing and organizing data for analytical processing and mining, in order to extract useful information. With regard to data organization, the topics covered are OnLine Analytical Processing (OLAP) and Data Warehousing. With regard to data mining, the emphasis is on descriptive algorithms (e.g. Clustering, Association Rule Mining), but predictive algorithms such as the Nearest Neighbour and Naive Bayes classifiers are also covered. Finally, the course includes case studies on association rule mining and on distance-based recommender system implementations. The software used includes R/RStudio and WEKA. Upon successful completion of the course, the student is able to:
- Describe the basics of the OLAP and Data Mining technologies
- Differentiate between data and information
- Analyze and tackle problems relating to the storage, organization, and analytical processing of data in order to extract information
- Develop and carry out analytical data processing tasks on the R/RStudio and WEKA platforms
- Homogenize data from disparate sources and transform them to facilitate analytical data processing and data mining operations
- Produce recommendations by coding in SQL and/or R
- Apply exploratory data analysis and descriptive data analytics techniques and methods
General Skills
- Upon successful completion of the course, the student will have developed analytical and problem-solving skills related to the organization, transformation, and analytical processing of data for extracting information from large datasets. In particular, the student will have learned to use state-of-the-art data processing and analytics software, and to work both autonomously and as a member of a team.
Course Contents
Data Warehousing, basic concepts: ETL, data transformation, dimensions, Star and Snowflake schemas
Data Cubes, OLAP queries: roll-up, drill-down, slice & dice
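The OLAP operations listed above can be illustrated on a toy data cube. The following is a minimal Python sketch, not the course's SQL Server Analysis Services workflow: the fact table, the city-to-country hierarchy and the function names are all hypothetical. Drill-down is simply the inverse of the roll-up shown here, i.e. going back from country-level to city-level totals.

```python
from collections import defaultdict

# Hypothetical fact table: (city, month, product) -> sales amount.
FACTS = {
    ("Thessaloniki", "Jan", "Laptop"): 10,
    ("Thessaloniki", "Feb", "Laptop"): 7,
    ("Athens", "Jan", "Laptop"): 12,
    ("Athens", "Jan", "Phone"): 5,
    ("Lyon", "Feb", "Phone"): 8,
}

# Hypothetical location dimension hierarchy: city -> country.
CITY_TO_COUNTRY = {"Thessaloniki": "Greece", "Athens": "Greece", "Lyon": "France"}


def roll_up(facts):
    """Roll up the location dimension from city to country, summing sales."""
    cube = defaultdict(int)
    for (city, month, product), amount in facts.items():
        cube[(CITY_TO_COUNTRY[city], month, product)] += amount
    return dict(cube)


def slice_month(facts, month):
    """Slice: fix the month dimension to a single value."""
    return {key: v for key, v in facts.items() if key[1] == month}


def dice(facts, cities, products):
    """Dice: keep the sub-cube obtained by restricting several dimensions."""
    return {k: v for k, v in facts.items() if k[0] in cities and k[2] in products}


if __name__ == "__main__":
    print(roll_up(FACTS))
    print(slice_month(FACTS, "Jan"))
    print(dice(FACTS, {"Athens", "Lyon"}, {"Phone"}))
```

Rolling up preserves the grand total (here 42), which is a quick sanity check when aggregating along a dimension hierarchy.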
The MS SQL Server Analysis Services environment
The R/RStudio IDE
Data Mining
— Basic concepts
— Data mining techniques and categories
— Information objects/representation
— Assessing the quality of the data mining output
Clustering
— Distance/similarity measures
— Clustering algorithms: k-Means, k-Medians, k-Medoids
— Hierarchical clustering: Agglomerative, Divisive
— The DBSCAN algorithm
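As an illustration of the Clustering topics, the sketch below implements the basic (Lloyd-style) k-Means iteration with Euclidean distance in plain Python. It is a teaching sketch under stated assumptions: the initial centroids are k randomly sampled input points, an empty cluster keeps its previous centroid, and the function names are hypothetical; the course itself works in R/RStudio and WEKA.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Lloyd-style k-Means: alternate the assignment and update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: k input points as initial centroids
    clusters = []
    for _ in range(max_iter):
        # Assignment step: every point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: recompute each centroid; an empty cluster keeps its old one.
        new_centroids = [centroid(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # converged: the assignment will not change any more
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centres, groups = k_means(data, k=2)
    print(centres)
    print(groups)
```

Replacing the mean in `centroid` with a component-wise median (usually paired with Manhattan distance) gives the k-Medians variant; k-Medoids instead restricts each centre to be an actual data point.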
Market Basket Analysis
— Association rules, measures, noise detection and filtering
— Algorithms: Apriori, Sampling
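The Apriori algorithm listed under Market Basket Analysis can be sketched in a few dozen lines of Python. This is a minimal, unoptimized sketch with some assumptions: support is given as an absolute count (`min_count`) to avoid floating-point threshold comparisons, candidate k-itemsets are generated by joining frequent (k-1)-itemsets and pruned with the Apriori property, and all function names are hypothetical. Rule confidence is computed as support(X u Y) / support(X).

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, min_count):
    """Level-wise search for all itemsets with support count >= min_count."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    counts = {frozenset([i]): support_count(frozenset([i]), transactions) for i in items}
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori property: drop candidates having an infrequent (k-1)-subset.
        candidates = [c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))]
        # Count support only for the surviving candidates.
        counted = {c: support_count(c, transactions) for c in candidates}
        level = {c: n for c, n in counted.items() if n >= min_count}
    return frequent


def association_rules(frequent, n_transactions, min_confidence):
    """Rules X -> Y from frequent itemsets, as (X, Y, support, confidence)."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = count / frequent[lhs]  # support(X u Y) / support(X)
                if confidence >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs),
                                  count / n_transactions, confidence))
    return rules
```

For example, on the five baskets {bread, milk}, {bread, diaper, beer, eggs}, {milk, diaper, beer, cola}, {bread, milk, diaper, beer} and {bread, milk, diaper, cola}, with `min_count=3` the sketch finds 8 frequent itemsets, and with `min_confidence=0.8` the only rule returned is {beer} -> {diaper}, with support 0.6 and confidence 1.0.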
Classification
— Basic concepts, information and entropy, decision trees
— Nearest neighbors, and data reduction techniques
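For the Classification topics, entropy and information gain (the splitting criterion of ID3-style decision tree induction) can be computed as follows, with H(S) = -sum_i p_i log2(p_i) and Gain(S, A) = H(S) - sum_v |S_v|/|S| H(S_v). This is a minimal Python sketch; rows are assumed to be dicts mapping attribute names to values, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on `attribute`."""
    n = len(labels)
    # Partition the class labels by the value each row has for `attribute`.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the partitions after the split.
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder
```

A decision tree learner of this kind greedily picks, at each node, the attribute with the highest gain; an attribute whose values leave the class mix unchanged in every partition has gain 0.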
Recommender systems
— Basic concepts
— Case study: similarity (nearest neighbor) based recommendations (R/RStudio)
— Case study: association rule based recommendations (SQL & R/RStudio)
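The similarity-based recommender case study is carried out in R/RStudio; the Python sketch below only illustrates the general idea of user-based nearest-neighbour recommendation. Assumptions: ratings are stored as a dict of dicts (user -> item -> rating), similarity is cosine similarity over the rating vectors (missing ratings treated as zeros), and a predicted rating is the similarity-weighted average of the k nearest neighbours' ratings. The function names and data are hypothetical and need not match the course's implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two sparse rating vectors (dicts item -> rating)."""
    dot = sum(u[i] * v[i] for i in set(u) & set(v))
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(ratings, user, k=2, n=3):
    """Recommend up to `n` unseen items for `user` using its `k` nearest users."""
    others = [(cosine_similarity(ratings[user], ratings[o]), o)
              for o in ratings if o != user]
    neighbours = sorted(others, reverse=True)[:k]
    scores, weights = {}, {}
    for sim, other in neighbours:
        if sim <= 0:
            continue  # ignore neighbours with no positive similarity
        for item, rating in ratings[other].items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    # Predicted rating: similarity-weighted average of the neighbours' ratings.
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    ratings = {
        "alice": {"A": 5, "B": 3},
        "bob": {"A": 5, "B": 3, "C": 4},
        "carol": {"A": 1, "D": 5},
        "dave": {"E": 2},
    }
    print(recommend(ratings, "alice", k=1))
```

With `k=1`, only the user most similar to alice (bob) is consulted, so the single recommendation is item C with predicted rating 4.0.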
Teaching Methods - Evaluation
Teaching Method
- Face-to-Face Teaching
- Case Studies: Data Preparation, Data Transformation, and Data Processing
- Hands-On Laboratory/Computer Practice
Use of ICT means
- ICT based teaching
- Virtual machine with preinstalled course software
- Video recordings of current and past course lectures, available online
- Educational content available through the CMS (Moodle)
Teaching Organization
Activity | Semester workload (hours) |
--- | --- |
Lectures | 52 |
Preparation for laboratory exercises and projects | 20 |
Projects | 48 |
Individual study and analysis of literature | 60 |
Total | 180 |
Students evaluation
Languages: Greek, English
Class project (optional) and hands-on laboratory practice
Open-book written final exam with multiple-choice questions and problem solving
Recommended Bibliography
Recommended Bibliography through "Eudoxus"
- P. Tan, M. Steinbach, A. Karpatne, V. Kumar, "Εισαγωγή στην Εξόρυξη Δεδομένων" (Introduction to Data Mining), Εκδόσεις Α. Τζιόλα & Υιοί Α.Ε., 2nd edition, 2018, ISBN: 978-960-418-813-0, Eudoxus code: 77107675
- M.J. Zaki, W. Meira Jr., "Εξόρυξη και Ανάλυση Δεδομένων: Βασικές Έννοιες και Αλγόριθμοι" (Data Mining and Analysis: Basic Concepts and Algorithms), Εκδόσεις Κλειδάριθμος ΕΠΕ, 1st edition, 2017, ISBN: 978-960-461-770-8, Eudoxus code: 68386089
- Αλ. Νανόπουλος, Γ. Μανωλόπουλος, "Εισαγωγή στην Εξόρυξη Δεδομένων και τις Αποθήκες Δεδομένων" (Introduction to Data Mining and Data Warehouses), Εκδόσεις Νέων Τεχνολογιών, 1st edition, 2008, ISBN: 978-960-6759-17-8, Eudoxus code: 9457
Complementary Greek bibliography
- Β.Σ. Βερύκιος, Β. Καγκλής, Η.Κ. Σταυρόπουλος, "Η Επιστήμη των Δεδομένων μέσα από τη Γλώσσα R" (Data Science through the R Language), Εκδόσεις ΣΕΑΒ: Ελληνικά Ακαδημαϊκά Συγγράμματα και Βοηθήματα, 1st edition, 2015, ISBN: 978-960-603-394-0, available at: https://repository.kallipos.gr/bitstream/11419/2965/1/00_master_document.pdf
- A. Rajaraman, J.D. Ullman, "Εξόρυξη από Μεγάλα Σύνολα Δεδομένων" (Mining of Large Datasets), Εκδόσεις Νέων Τεχνολογιών, 1st edition, 2014, ISBN: 978-960-6759-83-3
- Μ. Βαζιργιάννης, Μ. Χαλκίδη, "Εξόρυξη Γνώσης από Βάσεις Δεδομένων και τον Παγκόσμιο Ιστό" (Knowledge Mining from Databases and the World Wide Web), publisher: Γ. Δαρδανός - Κ. Δαρδανός Ο.Ε., 2nd edition, 2005, ISBN: 978-960-402-116-8
- R.J. Roiger, M.W. Geatz, "Εξόρυξη Πληροφορίας: Ένας Εισαγωγικός Οδηγός με Παραδείγματα" (Information Mining: An Introductory Guide with Examples), Εκδόσεις Κλειδάριθμος Ε.Π.Ε., 1st edition, 2008, ISBN: 978-960-461-206-2
Complementary international bibliography
- Jiawei Han, Micheline Kamber, Jian Pei, "Data Mining: Concepts and Techniques", 3rd edition, The Morgan Kaufmann Series in Data Management Systems, Morgan Kaufmann Publishers, July 2011, ISBN: 978-0123814791