DK7732 CNC High-Speed Wire-Cut Electrical Discharge Machine and Control System [Manual + CAD]
Data & Knowledge Engineering 49 (2004) 311-323

Conceptual construction on incomplete survey data

Shouhong Wang a,*, Hai Wang b

a Department of Marketing/Business Information Systems, Charlton College of Business, University of Massachusetts Dartmouth, 285 Old Westport Road, North Dartmouth, MA 02747-2300, USA
b Department of Computer Science, University of Toronto, Toronto, ON, Canada M5S 3G4

Received 22 March 2003; received in revised form 9 September 2003; accepted 20 October 2003
Available online 26 November 2003

Abstract

The raw survey data for data mining are often incomplete. The issues of missing data in knowledge discovery are often ignored in data mining. This article presents the conceptual foundations of data mining with incomplete survey data, and proposes query processing for knowledge discovery and a set of query functions for the conceptual construction in survey data mining. Through a case, this paper demonstrates that conceptual construction on incomplete data can be accomplished by using artificial intelligence tools such as self-organizing maps.

© 2003 Elsevier B.V. All rights reserved.

Keywords: Incomplete survey data; Survey data mining; Conceptual construction; Self-organizing maps; Cluster analysis; Knowledge discovery; Query processing

1. Introduction

Data mining is the process of trawling through data in the hope of identifying interpretable patterns. Data mining is different from traditional statistical analysis in that it is aimed at finding unsuspected relationships which are of interest or value to the database owners, or data miners [9]. Due to the high dimensionality and the huge volume of data, traditional statistical methods have their limitations in data mining. To meet the challenge of data mining, artificial intelligence based human-computer interactive techniques have been widely used in data mining [3,16].

[...] when and why certain types of values are often missing; what variables are correlated in terms of having missing values at the same time.
These valuable pieces of knowledge can be discovered only after the missing part of the data set is fully explored.

This paper discusses the issue of missing data in mining survey databases for knowledge discovery, presents the conceptual foundations of conceptual construction, and proposes a set of query functions for conceptual construction in SOM-based data mining. The rest of the paper is organized as follows. Section 2 discusses the issues of missing data related to data mining. Section 3 introduces SOM for conceptual construction on incomplete data. Section 4 suggests four concepts as knowledge discovery in data mining with incomplete data. It provides a scheme of conceptual construction on incomplete data using SOM. Section 5 proposes a query tool that is used to manipulate SOM for conceptual construction. Section 6 presents a case study that applies the query tool to manipulate the SOM for the conceptual construction on a student opinion survey data set. Finally, Section 7 offers concluding remarks.

2. Issues of missing data

Incomplete data sets are ubiquitous in data mining. There have been many treatments of missing data. One of the convenient solutions to incomplete data is to eliminate from the data set those records that have missing values. This, however, ignores potentially useful information in those records. In cases where the proportion of missing data is large, the conclusions drawn from the screened data set are more likely to be biased or misleading.

[...]

There have been many non-statistical techniques for data mining. The self-organizing maps (SOM) method based on the Kohonen neural network [12] is one of the promising techniques. SOM-based cluster techniques have advantages over other methods for data mining. Data mining typically deals with very high-dimensional data. That is, an observation in the database for data mining is typically described by a large number of variables.
The curse of dimensionality renders statistical correlations of data insignificant, and thus makes statistical methods powerless. The SOM method, however, does not rely on any assumptions of statistical tests, and is considered as [...]

[...] however, not all missing data are relevant to the data miner's interest. Hence, any simple brute-force search method for missing data is not only infeasible for a huge amount of data, but also unhelpful when the data miner is to identify problems, or develop concepts, through data mining. To identify problems or develop concepts, the data miner needs a tool to observe unsuspected patterns of the available data and the missing parts.

Self-organizing maps (SOM) [12] have been widely used for clustering, since SOM are more computationally efficient than the popular k-means clustering algorithm. More importantly, SOM provide data visualization for the data miner to view high-dimensional data [11]. Research [14,16] [...]

[...]

C3: $V_M(i) \mid x_j \in [a, b]$

where $V_M(i)$ is the number of missing values in variable $i$, $x_j$ is the value of variable $j$, and $[a, b]$ is the range of values of $x_j$. This index discloses the degree of uncertainty of answers to the survey question, such as "do not know" and "neutral", or the intention of systematically missing data, such as "do not want to tell".

4.3. Complementing

The concept of complementing reveals what variables are more likely to have missing values at the same time.

C4: $V_M(i, j) / V_M(i)$

where $V_M(i, j)$ is the number of missing values in both variables $i$ and $j$, and $V_M(i)$ is the number of missing values in variable $i$. This concept discloses the correlation of two variables in terms of missing values. The higher the value of $V_M(i, j) / V_M(i)$, the stronger the correlation of missing values.

4.4. Conditional effects

The concept of conditional effects reveals the potential changes of the clusters identified if the missing values had been filled in.

C5: $\Delta P \mid \forall z_i = k$

where $\Delta P$ is the change of the clusters perceived by the data miner, $\forall z_i$ represents all missing values of variable $i$, and $k$ is the possible value variable $i$ might have for the survey. Typically, $k \in \{\max, \min, p\}$, where $\max$ is the maximal value of the scale, $\min$ is the minimal value of the scale, and $p$ is a random variable with the same distribution function as the values in the complete data. By setting different possible values of $k$ for the missing values, the data miner is able to observe the changes of clusters and redefine the problem.

In summary, conceptual construction on incomplete data is a knowledge development process. To construct new concepts on incomplete data, the data miner needs to identify a particular problem as a base for the construction. Four concepts on missing data are reliability, hiding, complementing, and conditional effects. Next, we develop a set of queries for conceptual construction on incomplete data. The objective of these queries is to allow the data miner to conduct the experimental process through the use of SOM in order to construct new concepts related to the problem.

5. Query processing for conceptual construction

Query tools are characterized by structured query language (SQL), the standard query language for relational database management systems. For data mining, as the ultimate objective of information retrieval from the database is the formulation of knowledge through the use of a variety of techniques, it is unlikely that a single standard query language can be created for all purposes of data mining. Nevertheless, to support human-computer collaboration effectively, visualized query processing is necessary in data mining [5]. This study develops a set of query functions that assist the data miner to construct concepts related to the missing data through the [...]

[...]

5.2. Save and retrieve the SOM

This query is a general operation for saving and retrieving the SOM along with all the settings for parameters and variables of the data samples. The data miner is allowed to compare a number of SOM results to construct concepts on incomplete data.

5.3. Find C1: $S_M / S_C$

After choosing variables and identifying clusters through SOM, the data miner would like to know how reliable the observed clusters are. This query allows the data miner to find $S_M / S_C$ in the variables used for the SOM training. If the value of $S_M / S_C$ is high, the data miner can find the reliability for individual variables, as described next.

5.4. Find C2: $V_M(i) / V_C(i)$

The data miner might be interested in a particular variable. Using this query, the data miner is allowed to check whether the clusters observed are reliable in terms of the particular variable.

5.5. Find C3: $V_M(i) \mid x_j \in [a, b]$

This query function allows the data miner to find the correlation of the missing values in one variable and the value range in another variable. This correlation provides knowledge [...] the two variables that might have correlation of missing values and the range of the known values in one variable, and receives the number of observations with missing values in the other variable.

5.6. Find C4: $V_M(i, j) / V_M(i)$

This query function allows the data miner to find the correlation of the missing values in two variables. Using this query, the data miner first chooses two variables to be investigated that are relevant to the problem, and then finds out how often the two variables might have missing values together.

5.7. Find C5: $\Delta P \mid \forall z_i = k$

This query function allows the data miner to replace the missing values with hypothetical values, and observe the changes of the clusters. The hypothetical values could be any value between the possible maximal and minimal values. Instead of returning a specific number, this query function returns various maps for the data miner to compare the clusters using different $k$ values. Based on what-if trials, the data miner is able to perceive the impacts of the missing values on the [...]
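The counting side of the query functions described in this section can be illustrated with a short Python sketch. It is not the authors' query tool: the function names, the use of `None` as the missing-value marker, the 1 to 5 scale, and the toy `survey` data are all assumptions made for illustration. The text shown here does not define the denominators $S_C$ and $V_C(i)$, so the sketch assumes both are the total number of observations. SOM training and map display are not shown; `what_if` only prepares the trial data that a what-if run would feed to the SOM.

```python
import random

MISSING = None  # assumption: an unanswered survey item is stored as None


def count_missing(rows, i):
    """V_M(i): number of missing values in variable (column) i."""
    return sum(1 for r in rows if r[i] is MISSING)


def find_c1(rows, variables):
    """C1 sketch: share of observations missing any of the chosen variables.

    Assumption: S_M counts such observations and S_C counts all observations.
    """
    s_m = sum(1 for r in rows if any(r[v] is MISSING for v in variables))
    return s_m / len(rows)


def find_c2(rows, i):
    """C2 sketch: V_M(i) over the number of observations (assumed V_C(i))."""
    return count_missing(rows, i) / len(rows)


def find_c3(rows, i, j, a, b):
    """C3: observations missing variable i whose known x_j lies in [a, b]."""
    return sum(
        1 for r in rows
        if r[i] is MISSING and r[j] is not MISSING and a <= r[j] <= b
    )


def find_c4(rows, i, j):
    """C4: V_M(i, j) / V_M(i), how often j is missing when i is missing."""
    v_m_i = count_missing(rows, i)
    if v_m_i == 0:
        return 0.0
    v_m_ij = sum(1 for r in rows if r[i] is MISSING and r[j] is MISSING)
    return v_m_ij / v_m_i


def what_if(rows, i, k, scale=(1, 5), seed=0):
    """C5 trial data: copy rows with every missing value of variable i replaced.

    k is "max" or "min" (ends of the assumed scale) or "p" (a draw from the
    observed values of variable i, i.e. their empirical distribution).
    """
    rng = random.Random(seed)
    observed = [r[i] for r in rows if r[i] is not MISSING]

    def fill():
        if k == "max":
            return scale[1]
        if k == "min":
            return scale[0]
        return rng.choice(observed)

    return [
        [fill() if (c == i and v is MISSING) else v for c, v in enumerate(r)]
        for r in rows
    ]


# Hypothetical survey: four respondents, three questions.
survey = [
    [None, 2, 4],
    [None, None, 5],
    [3, 1, None],
    [4, 5, 2],
]
print(find_c1(survey, [0, 1]))
print(find_c2(survey, 0))
print(find_c3(survey, 0, 1, 1, 3))
print(find_c4(survey, 0, 1))
print(what_if(survey, 0, "max"))
```

Returning a new list from `what_if` rather than modifying the data in place keeps the original observations available, so several trials with different values of `k` can be compared against the same baseline.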
[...] specifically, the textbooks do not give much help [...] Although v20 has the highest rate of missing values, missing values do not have a significant impact on the problem identified.

[...] variables v1, v14, v16, v18, and v20 that received low values were found to be particularly relevant to the problem of ineffective teaching, as summarized in the first three columns of Table 1. Incomplete data were then applied to construct new concepts of the problem.

6.1. C1: $S_M / S_C$

Although the rate of incomplete observations for the entire survey was as high as 37%, $S_M / S_C$ in terms of the very relevant variables (v1, v14, v16, v18, and v20) was 5.2%, indicating that the problem initially identified was generally valid.

6.2. C2: $V_M(i) / V_C(i)$

Among the five variables, the rate of missing values in v20 was the highest at 8.6%, indicating that the dependability of ineffective teaching on this variable (i.e., the usefulness of the textbook and teaching material) might not be as reliable as other very relevant variables.

6.3. C3: $V_M(i) \mid x_j \in [a, b]$

The rate of missing values in v16 was 2.2%. However, 52.1% of the missing values came from the observations with $x(v_{15}) \in [1, 3]$. This indicated that students who were not satisfied with prompt grading often disregarded whether they received helpful comments for their work.

6.4. C4: $V_M(i, j) / V_M(i)$

The rate of missing values of v14 was 3.7%. However, the missing values of v14 and v10 were highly correlated, with $V_M(v_{14}, v_{10}) / V_M(v_{14}) = 33.6\%$. This indicated that students who omitted the opinion on the learning experience from the course often disregarded whether the tests or assignments were appropriately designed.

6.5. C5: $\Delta P \mid \forall z_i = k$

In this case, v20 had the highest rate of missing values (8.6%). The data miner would like to see [...] problem identified. [...] $0.16 \mid v_{20} = 2;\ 0.21 \mid v_{20} = 3;\ 0.28 \mid v_{20} = 4;\ 0.24 \mid v_{20} = 5\}$ for the complete data. After setting these values for the missing values of v20, new trial data were used for SOM to generate maps. Using the same topology of the SOM for the complete data, the what-if trials were conducted. As shown in Fig. 3, the overall conclusion in this case was that the missing values in v20 did not have a significant impact on the analysis, [...]

Fig. 3. Examples of SOM for what-if trials.

S. Wang, H. Wang / Data & Knowledge Engineering 49 (2004) 311-323

[...] an experimental test of the proposed method. Clearly, the scale of data mining in this case study is rather small. In general, data mining is applied on much larger data sets than this example in the aspects of sample size and dimensionality.

7. Conclusions

In the data mining field, incomplete data are often mistreated. This study proposes conceptual construction on incomplete data. Four categories of concepts on missing data are proposed. They are reliability of the problem identified, intention of data hiding, complementing of missing values in two variables, and conditional effects of the missing data. SOM are selected as the tool for conceptual construction due to their advantages in clustering and data visualization. Building on SOM clustering analysis, this paper then suggests seven categories of query functions for conceptual construction on incomplete data. Using these query functions, the data miner is allowed to construct new concepts related to the problem identified for data mining. Through the real-world case, it has been demonstrated that the model of conceptual construction can better be used for knowledge discovery.

Knowledge discovery in databases is a growing field. Generally, knowledge discovery starts with the original problem identification. Yet the validation of the problem identified is typically beyond the database and generic statistical algorithms themselves. During the knowledge discovery process, new concepts must be constructed through demystifying the data.
In conclusion, conceptual construction on incomplete data provides effective techniques for knowledge development so that the data miner is allowed to interpret the data mining results based on the particular problem domain and his/her perception of the missing data. Future work includes implementing the software system on a mainframe-based database system and further evaluating the proposed approach on data sets with a much larger scale.

Acknowledgements

The authors are indebted to two anonymous referees for their valuable comments for the revision of this paper.

Appendix A. The questionnaire of student opinion of teachers survey

v1: The instructor explains difficult concepts clearly and understandably.
v2: Class sessions appear to be carefully planned.
v3: The instructor conveys strong interest and enthusiasm.
v4: Students are encouraged to express their views and participate in class.
v5: The instructor shows a genuine concern for student progress.
v6: The instructor stimulates students to think for themselves.
v7: Effective use is made of examples and illustrations.
v8: The instructor speaks in a way which can be clearly understood.
v9: The instructor makes it clear how each topic fits into the course.
v10: This course was a positive learning experience.
v11: Classes are held regularly to an agreed schedule.
v12: The various parts of the course are effectively co-ordinated.
v13: Course requirements are communicated clearly and explicitly.
v14: Tests and assignments are reasonable measures of student learning.
v15: Where appropriate, student work is graded promptly.
v16: Where appropriate, helpful comments are provided when student work is graded.
v17: There is close agreement between stated course objectives and what is taught.
v18: Tests and assignments provide adequate feedback on student progress.
v19: The instructor is willing to schedule consultation time with the students.
v20: The text book(s) and course material are useful.
v21: In comparison to other instructors, this instructor is an effective teacher.

References

[1] C.C. Aggarwal, S. Parthasarathy, Mining massively incomplete data sets by conceptual reconstruction, in: Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, New York, 2001, pp. 227-232.
[2] G. Batista, M. Monard, An analysis of four missing data treatment methods for supervised learning, Applied Artificial Intelligence 17 (5/6) (2003) 519-533.
[3] R.J. Brachman, T. Khabaza, W. Kloesgen, G. Piatetsky-Shapiro, G.E. Simoudis, Mining business databases, Communications of the ACM 39 (11) (1996) 42-48.
[4] S. Brin, R. Rastogi, K. Shim, Mining optimized gain rules for numeric attributes, IEEE Transactions on Knowledge and Data Engineering 15 (2) (2003) 324-338.
[5] L. Chittaro, C. Combi, Visualizing queries on database of temporal histories: new metaphors and their evaluation, Data and Knowledge Engineering 44 (20) (2003) 239-264.
[6] G. Deboeck, T. Kohonen, Visual Explorations in Finance with Self-Organizing Maps, Springer-Verlag, London, UK, 1998.
[7] A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B (Methodological) 39 (1) (1977) 1-38.
[8] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, The KDD process for extracting useful knowledge from volumes of data, Communications of the ACM 39 (11) (1996) 27-34.
[9] D.J. Hand, Data mining: statistics and more?, The American Statistician 52 (2) (1998) 112-118.
[10] I.T. Jolliffe, Principal Component Analysis, Springer-Verlag, New York, 1986.
[11] D.A. Keim, H.P. Kriegel, Visualization techniques for mining large databases: a comparison, IEEE Transactions on Knowledge and Data Engineering 8 (6) (1996) 923-938.
[12] T. Kohonen, Self-Organization and Associative Memory, Third ed., Springer-Verlag, Berlin, 1989.
[13] P. Langley, H. Simon, G. Bradshaw, J. Zytkow, Scientific Discovery: Computational Explorations of the Creative Processes, MIT Press, Cambridge, MA, 1987.
[14] A. Silberschatz, A. Tuzhilin, What makes patterns interesting in knowledge discovery systems, IEEE Transactions on Knowledge and Data Engineering 5 (6) (1996) 970-974.
[15] S. Tseng, K. Wang, C. Lee, A pre-processing method to deal with missing values by integrating clustering and regression techniques, Applied Artificial Intelligence 17 (5/6) (2003) 535-544.
[16] S. Wang, Nonlinear pattern hypothesis generation for data mining, Data and Knowledge Engineering 40 (3) (2002) 273-283.

Shouhong Wang is a Professor of Business Information Systems at University of Massachusetts Dartmouth. He received his Ph.D. (1990) in Information Systems from McMaster University, Canada. His research interests include data mining and artificial intelligence in business. He has published over 60 papers in academic journals, including Data and Knowledge Engineering, IEEE Transactions on Systems, Man, and Cybernetics, IEEE Transactions on Pattern Analysis and Machine Intelligence, INFORMS Journal on Computing, Knowledge and Information Systems, International Journal of Intelligent Systems in Accounting, Finance, and Management, Neural Computing and Applications, and others. He has been a biographee of Canadian Who's Who since 1995, a biographee of Marquis Who's Who in America since 2000, and a biographee of Marquis Who's Who in American Education 2004.

Hai Wang is a Ph.D. candidate in the Department of Computer Science at the University of Toronto, Canada. He received his B.Sc. and M.Sc. in Computer Science from the University of New Brunswick (1995) and the University of Toronto (1997), respectively. His research interests are in the areas of database and performance modeling. His papers have been published in ACM SIGMETRICS Performance Evaluation Review, Performance Evaluation, Knowledge and Information Systems, and several conference proceedings.