Data Mining 402207
Data Mining: Concepts and Techniques
Slides for Textbook Chapter 4
Zhou Hongfang
Department of Computer Science and Engineering
Xi'an University of Technology
2024/11/18

Chapter 4: Data Mining Primitives, Languages, and System Architectures
  - Data mining primitives: What defines a data mining task?
  - A data mining query language
  - Design graphical user interfaces based on a data mining query language
  - Architecture of data mining systems
  - Summary

Why Data Mining Primitives and Languages?
  - Finding all the patterns autonomously in a database? Unrealistic, because the patterns could be too many but uninteresting
  - Data mining should be an interactive process
      - The user directs what is to be mined
  - Users must be provided with a set of primitives to be used to communicate with the data mining system
  - Incorporating these primitives in a data mining query language
      - More flexible user interaction
      - Foundation for the design of graphical user interfaces
      - Standardization of data mining industry and practice

What Defines a Data Mining Task?
  - Task-relevant data
  - Type of knowledge to be mined
  - Background knowledge
  - Pattern interestingness measurements
  - Visualization of discovered patterns

Task-Relevant Data
  - Database or data warehouse name
  - Database tables or data warehouse cubes
  - Condition for data selection
  - Relevant attributes or dimensions
  - Data grouping criteria

Types of Knowledge to Be Mined
  - Characterization
  - Discrimination
  - Association
  - Classification/prediction
  - Clustering
  - Outlier analysis
  - Other data mining tasks

Background Knowledge: Concept Hierarchies
  - Schema hierarchy
      - E.g., street < city < province_or_state < country
  - Set-grouping hierarchy
      - E.g., {20-39} = young, {40-59} = middle_aged
  - Operation-derived hierarchy
      - Email address: dmbook@cs.sfu.ca
      - login-name < department < university < country
  - Rule-based hierarchy
      - low_profit_margin(X) <= price(X, P1) and cost(X, P2) and (P1 - P2) < $50

Measurements of Pattern Interestingness
  - Simplicity
      - E.g., (association) rule length, (decision) tree size
  - Certainty
      - E.g., confidence, P(A|B) = #(A and B) / #(B), classification reliability or accuracy, certainty factor, rule strength, rule quality, discriminating weight, etc.
  - Utility
      - Potential usefulness, e.g., support (association), noise threshold (description)
  - Novelty
      - Not previously known, surprising (used to remove redundant rules)

Visualization of Discovered Patterns
  - Different backgrounds/usages may require different forms of representation
      - E.g., rules, tables, crosstabs, pie/bar charts, etc.
  - Concept hierarchy is also important
      - Discovered knowledge might be more understandable when represented at a high level of abstraction
      - Interactive drill up/down, pivoting, slicing and dicing provide different perspectives on the data
  - Different kinds of knowledge require different representations: association, classification, clustering, etc.

A Data Mining Query Language (DMQL)
  - Motivation
      - A DMQL can provide the ability to support ad hoc and interactive data mining
      - By providing a standardized language like SQL
          - The hope is to achieve an effect similar to the one SQL has had on relational databases
          - Foundation for system development and evolution
          - Facilitate information exchange, technology transfer, commercialization, and wide acceptance
  - Design
      - DMQL is designed with the primitives described earlier

Designing Graphical User Interfaces Based on a Data Mining Query Language
  - What tasks should be considered in the design of GUIs based on a data mining query language?
      - Data collection and data mining query composition
      - Presentation of discovered patterns
      - Hierarchy specification and manipulation
      - Manipulation of data mining primitives
      - Interactive multilevel mining
      - Other miscellaneous information

Chapter 4: Data Mining Primitives, Languages, and System Architectures
  - Data mining primitives: What defines