Microsoft SQL Data Mining / Data Warehouse Technology Seminar Materials (.ppt)
Welcome to the Microsoft SQL Data Mining / Data Warehouse Technology Seminar

Today's Agenda
- Overview of Microsoft SQL data mining technology: 左洪, Microsoft
- Data warehousing in telecom: 貝志城, 明天高科
- Data mining in CRM: 王立軍, 中圣公司
- 靈通 IT Service maintenance and management service system: 鄒雄文, 廣州靈通

Introduction to Data Mining with SQL Server 2000
左洪, Senior Product Marketing Manager, Microsoft (China) Co., Ltd.

Agenda
- What is Data Mining
- The Data Mining Market
- OLE DB for Data Mining
- Overview of the Data Mining Features in SQL Server 2000
- Q&A

What Is Data Mining?

What is DM?
- A process of data exploration and analysis using automatic or semi-automatic means
  - "Exploring data": scanning samples of known facts about "cases"
  - "Knowledge": clusters, rules, decision trees, equations, association rules
- Once the "knowledge" is extracted, it:
  - Can be browsed
  - Provides very useful insight into the cases' behavior
  - Can be used to predict values of other cases
  - Can serve as a key element in closed-loop analysis

What drives high school students to attend college?

The deciding factors for high school students to attend college are:
  All Students: Attend College 55% Yes, 45% No
    IQ?
    IQ = High: Attend College 79% Yes, 11% No
      Wealth?
      Wealth = True: Attend College 94% Yes, 6% No
      Wealth = False: Attend College 69% Yes, 21% No
    IQ = Low: Attend College 45% Yes, 55% No
      Parents Encourage?
      Parents Encourage = Yes: Attend College 70% Yes, 30% No
      Parents Encourage = No: Attend College 31% Yes, 69% No

Business Oriented DM Problems
- Targeted ads: "What banner should I display to this visitor?"
- Cross sells: "What other products is this customer likely to buy?"
- Fraud detection: "Is this insurance claim a fraud?"
- Churn analysis: "Which customers are likely to churn?"
- Risk management: "Should I approve the loan to this customer?"

Mining Process - Illustrated
[Diagram: Training Data is fed to the DM Engine, which produces a Mining Model; Data To Predict and the Mining Model are fed to the DM Engine, which produces Predicted Data]

The Data Mining Market

The $$$: Y2000 Market Size
- DM Tools Market: $250M
  - 40% license fees
  - 60% consulting
- Source: Gartner

The Players
- Leading vendors: SAS, SPSS, IBM
- Hundreds of smaller vendors offering DM algorithms
- Oracle: Thinking Machines acquisition

The Products
- End-to-end Data Mining tools
  - Extraction, Cleansing, Loading, Modeling, Algorithms (dozens), Analyst's workbench, Reporting, Charting
- The customer is the power analyst
  - A PhD in statistics is usually required
- Closed tools, no standard API
  - Total vendor lock-in
  - Limited integration with applications
- DM is an "outsider" in the Data Warehouse
- Extensive consulting required
- Skyrocketing prices: $60K+ for a single-user license

What the analysts say
- "Stand-alone Data Mining Is Dead" - Forrester
- "The demise of stand alone data mining" - Gartner

The Microsoft Approach

DataPro Users Survey 1999-2001
- "Data mining will be the fastest-growing BI technology"

The $$$: 2000 Market Size
- DM Applications Market Size: $1.5B
- Source: IDC

SQL Server 2000 - The Analysis Platform
- SQL 2000 provides a complete Analysis Platform
  - Not an isolated, stand-alone DM product
- Platform means:
  - The infrastructure for applications
  - Not an application by itself
  - Integrated vision for all technologies and tools
  - Standards-based APIs (OLE DB for DM)
  - Extensible
  - Scalable

Data Flow
[Diagram labels: OLTP, DW, OLAP, DM, Apps, Reports & Analysis]

Analysis Services 2000 - Architecture
[Diagram labels: Manager UI, DSO, Analysis Server, Client, OLE DB (OLAP, DM), OLAP Engine, OLAP Engine (local), DM Engine, DM Engine (local), DMM, DM Wizards, DM DTS Task, Ext.]

OLE DB for Data Mining

Why OLE DB for DM?
Make DM a mass-market technology by:
- Leveraging existing technologies and knowledge
  - SQL and OLE DB
- Using common industry-wide concepts and data presentation
- Changing DM market perception from "proprietary" to "open"
- Increasing the number of players:
  - Reduce the cost and risk of becoming a consumer: one tool works with multiple providers
  - Reduce the cost and risk of becoming a provider: focus on expertise and find many partners to complement the offering
- Dramatically increasing the number of DM developers

Integration With RDBMS
Customers would like to:
- Build DM models from within their RDBMS
- Train the models directly off their relational tables
- Perform predictions as relational queries (tables in, tables out)
- Feel that DM is a native part of their database
Therefore:
- Data mining models are relational objects
- All operations on the models are relational
- The language used is SQL (with extensions)
The effect: every DBA and VB developer can become a DM developer

Creating a Data Mining Model (DMM)

Identifying the "Cases"
- DM algorithms analyze "cases"
- The "case" is the entity being categorized and classified
- Examples
  - Customer credit risk analysis: Case = Customer
  - Product profitability analysis: Case = Product
  - Promotion success analysis: Case = Promotion
- Each case encapsulates all we know about the entity

A Simple Set of Cases
  StudentID  Gender  Parent Income  IQ   Encouragement   College Plans
  1          Male    23400          120  Not Encouraged  No
  2          Female  79200          90   Encouraged      Yes
  3          Male    42000          105  Not Encouraged  Yes

More Complicated Cases
  Cust ID  Age  Marital Status  IQ  Favorite Movies (Title, Score)
  1        35   M               2   Star Wars 8; Toy Story 9; Terminator 7
  2        20   S               3   Star Wars 7; Braveheart 7; The Matrix 10
  3        57   M               2   Sixth Sense 9; Casablanca 10

A DMM is a Table!
- A DMM structure is defined as a table
  - Training a DMM means inserting data into the table
  - Predicting from a DMM means querying the table
- All information describing the case is contained in columns

Creating a Mining Model
    CREATE MINING MODEL [Plans Prediction]
    (
        StudentID     LONG   KEY,
        Gender        TEXT   DISCRETE,
        ParentIncome  LONG   CONTINUOUS,
        IQ            DOUBLE CONTINUOUS,
        Encouragement TEXT   DISCRETE,
        CollegePlans  TEXT   DISCRETE PREDICT
    )
    USING Microsoft_Decision_Trees

Creating a mining model with nested table
    CREATE MINING MODEL [Movie Prediction]
    (
        CustomerId LONG KEY,
        Age        LONG CONTINUOUS,
        Gender     TEXT DISCRETE,
        Education  TEXT DISCRETE,
        MovieList  TABLE PREDICT
        (
            MovieName TEXT KEY
        )
    )
    USING Microsoft_Decision_Trees
(The slide omits the data types for Gender and Education; TEXT is assumed here.)

Training a DMM
Training a DMM means passing it data for which the attributes to be predicted are known.
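The "A DMM is a table" idea (define the model, train it by inserting cases, predict by querying it) can be illustrated with a small Python sketch. This is not how Microsoft_Decision_Trees works: the class name `TinyMiningModel`, its methods, and the simple frequency-count learner are all hypothetical stand-ins chosen to keep the example short. The first three training cases echo the simple set of cases shown earlier; the fourth is invented so that the example has an unambiguous majority.

```python
from collections import Counter, defaultdict


class TinyMiningModel:
    """Toy stand-in for a DMM: trained by inserting cases, queried to predict."""

    def __init__(self, input_columns, predict_column):
        self.input_columns = input_columns
        self.predict_column = predict_column
        # Model "content": outcome counts per combination of input values.
        self.content = defaultdict(Counter)

    def insert_into(self, cases):
        """Train: analyze the cases and update the content; the cases are not stored."""
        for case in cases:
            key = tuple(case[c] for c in self.input_columns)
            self.content[key][case[self.predict_column]] += 1

    def predict(self, case):
        """Query: return (most likely value, probability), or (None, 0.0) if unseen."""
        key = tuple(case[c] for c in self.input_columns)
        counts = self.content.get(key)
        if not counts:
            return None, 0.0
        value, n = counts.most_common(1)[0]
        return value, n / sum(counts.values())


training_cases = [
    {"Gender": "Male", "Encouragement": "Not Encouraged", "CollegePlans": "No"},
    {"Gender": "Female", "Encouragement": "Encouraged", "CollegePlans": "Yes"},
    {"Gender": "Male", "Encouragement": "Not Encouraged", "CollegePlans": "Yes"},
    # Hypothetical extra case, not from the slides.
    {"Gender": "Male", "Encouragement": "Not Encouraged", "CollegePlans": "No"},
]

model = TinyMiningModel(["Gender", "Encouragement"], "CollegePlans")
model.insert_into(training_cases)
print(model.predict({"Gender": "Male", "Encouragement": "Not Encouraged"}))
```

Note that the model keeps only aggregated counts, not the inserted rows, which mirrors the point that training builds model content rather than storing the cases.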
- Multiple passes are handled internally by the provider!
- Use an INSERT INTO statement
- The DMM will not persist the inserted data
- Instead, it will analyze the given cases and build the DMM content (decision tree, segmentation model, association rules)

    INSERT INTO <mining model> (columns list) <source data query>

INSERT INTO
    INSERT INTO [Plans Prediction]
    ( StudentID, Gender, ParentIncome, IQ, Encouragement, CollegePlans )
    SELECT StudentID, Gender, ParentIncome, IQ, Encouragement, CollegePlans
    FROM CollegePlans

When Insert Into Is Done
- The DMM is trained
- The model can be retrained
- Content (rules, trees, formulas) can be explored
  - OLE DB Schema rowset
  - SELECT * FROM <mining model>.CONTENT
  - XML string (PMML)
- Prediction queries can be executed

Predictions

What are Predictions?
- Predictions apply the rules of a trained model to a new set of data in order to estimate missing attributes or values
- Predictions = queries
  - The syntax is SQL-like
  - The output is a rowset
- In order to predict you need:
  - Input data set
  - A trained DMM
  - Binding (mapping) information between the input data and the DMM
  - Specification of what to predict

The Truth Table Concept
  Gender  Parent Income  IQ  Encouragement   College Plans  Probability
  Male    20000          85  Not Encouraged  No             85%
  Male    20000          85  Not Encouraged  Yes            15%
  Male    20000          85  Encouraged      No             60%
  Male    20000          85  Encouraged      Yes            40%
  Male    20000          90  Not Encouraged  No             80%
  Male    20000          90  Not Encouraged  Yes            20%
  Male    20000          90  Encouraged      No             58%
  Male    20000          90  Encouraged      Yes            42%
  Male    20000          95  Not Encouraged  No             78%
  Male    20000          95  Not Encouraged  Yes            22%
  Male    20000          95  Encouraged      No             45%

Prediction
The same truth table is used for prediction.

It's a JOIN!
  Student ID  Gender  Parent Income  IQ   Encouragement
  1           Male    43000          85   Not Encouraged
  2           Male    20000          135  Not Encouraged
  3           Female  25000          105  Encouraged
  4           Male    96000          100  Encouraged
  5           Female  56000          125  Not Encouraged
  6           Female  46000          90   Not Encouraged

The Prediction Query Syntax
    SELECT <columns>
    FROM <mining model>
    PREDICTION JOIN <source data>
    ON <join condition>

Example
    SELECT [New Students].StudentID,
           [Plans Prediction].CollegePlans,
           PredictProbability(CollegePlans)
    FROM [Plans Prediction]
    PREDICTION JOIN [New Students]
    ON [Plans Prediction].Gender = [New Students].Gender
       AND [Plans Prediction].IQ = [New Students].IQ
       AND ...

OLE DB DM Sample Provider with Source
- All required OLE DB objects, such as session, command, and rowset
- The OLE DB for Data Mining syntax parser
- Tokenization of input data
- Query processing engine
- A sample Naive Bayes algorithm
- Model persistence in XML and binary formats
- Available at

Integrated OLAP and DM Analysis

Why Use DM with OLAP
- Relational DM is designed for:
  - Reports of patterns
  - Batch predictions fed into an OLTP system
  - Real-time singleton prediction in an operational environment
- OLAP is designed for interactive analysis by a knowledge worker
  - Consistent and convenient navigational model
  - Pre-aggregations of OLAP allow faster performance

Understanding DM Content: Decision Trees
  All Customers: Credit Risk 65% Good, 35% Bad
    Debt?
    Debt = Low: Credit Risk 89% Good, 11% Bad
      Employment Type?
      ET = Salaried: Credit Risk 94% Good, 6% Bad
      ET = Self Employed: Credit Risk 79% Good, 21% Bad
    Debt = High: Credit Risk 45% Good, 55% Bad
      Education?
      Education = College: Credit Risk 70% Good, 30% Bad
      Education = High School: Credit Risk 31% Good, 69% Bad

Customers having high debt and college education:
    Filter([Individual Customers].Members,
           Customers.CurrentMember.Properties("Debt") = "High" And
           Customers.CurrentMember.Properties("Education") = "College")

Customers having low debt and are self employed:
    Filter([Individual Customers].Members,
           Customers.CurrentMember.Properties("Debt") = "Low" And
           Customers.CurrentMember.Properties("Employment Type") = "Self Employed")

Equivalent DM Dimension
  All Customers
    Customers with low debt
      Customers with low debt and self employed
      Customers with low debt and salaried
    Customers with high debt
      Customers with high debt and college education
      Customers with high debt and high school education

Custom Roll-up
  Member                                              Credit Risk
  All Customers                                       Good = 65%, Bad = 35%
  Customers with low debt                             Good = 89%, Bad = 11%
  Customers with low debt and self employed           Good = 79%, Bad = 21%
  Customers with low debt and salaried                Good = 94%, Bad = 6%
  Customers with high debt                            Good = 45%, Bad = 55%
  Customers with high debt and college education      Good = 70%, Bad = 30%
  Customers with high debt and high school education  Good = 31%, Bad = 69%
Each member below All Customers has a custom roll-up formula that begins with Aggregate(Filter( (truncated on the slide).

Tree = Dimension
- Every node on the tree is a dimension member
  - The node statistics are the member properties
- All members are calculated
  - The formula aggregates the case dimension members that apply to this node
  - The MDX is generated by the DM algorithm
- Analysis Services will automatically generate the calculated dimension based on the DM content, and also a virtual cube
- Applies to:
  - Classification (decision trees)
  - Segmentation (clusters)

Browsing the Virtual Cube
Pivot the DM dimension:
                            WA    OR    CA
  All Customers             3200  2500  8000
  Customers with low debt   2320  1503  4300
  Customers with high debt  880   997   4700
  Customers college         320   450   2310
  Customers high school     560   547   2390
Credit Risk: 70% Good, 30% Bad

Predictions
- You might want to view predictions for each case
- For example:
  - What is the expected profitability of a product?
  - What is the credit risk of a specific customer?
  - What are the products this customer is likely to buy?
- All of those predictions are available through MDX calculated members
  - A singleton query is created automatically

Prediction Calculated Member
Measures.[Probability of High Credit Risk]:
    PREDICT(Customers.CurrentMember, "Credit Risk Model",
            "PredictionProbability(PredictionHistogram("Credit Risk"), High)")

Predictions Example
  Customer          Probability of High Credit Risk  Probability of Low Credit Risk
  Joe Smith         73%                              27%
  John Dow          68%                              32%
  William Clington  45%                              55%
  Robert Maxwell    98%                              2%
  Denis Rodman      81%                              19%

Questions?
E-Mail:
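
Supplementary sketch: the "It's a JOIN!" view of prediction from the Predictions slides can be illustrated in Python. The truth-table rows below are copied from the truth table slide; the function `prediction_join`, the tuple types, and the three new students (IDs 101 to 103) are hypothetical and exist only for illustration. A real provider generalizes to unseen input values, whereas this sketch matches input columns exactly and returns no prediction when nothing matches.

```python
from typing import NamedTuple, Optional


class TruthRow(NamedTuple):
    gender: str
    parent_income: int
    iq: int
    encouragement: str
    college_plans: str
    probability: float


class Student(NamedTuple):
    student_id: int
    gender: str
    parent_income: int
    iq: int
    encouragement: str


# Rows taken from the truth table slide (IQ 85 and IQ 90 only).
TRUTH_TABLE = [
    TruthRow("Male", 20000, 85, "Not Encouraged", "No", 0.85),
    TruthRow("Male", 20000, 85, "Not Encouraged", "Yes", 0.15),
    TruthRow("Male", 20000, 85, "Encouraged", "No", 0.60),
    TruthRow("Male", 20000, 85, "Encouraged", "Yes", 0.40),
    TruthRow("Male", 20000, 90, "Not Encouraged", "No", 0.80),
    TruthRow("Male", 20000, 90, "Not Encouraged", "Yes", 0.20),
    TruthRow("Male", 20000, 90, "Encouraged", "No", 0.58),
    TruthRow("Male", 20000, 90, "Encouraged", "Yes", 0.42),
]

# Hypothetical input cases, chosen so that two of them match truth-table rows.
NEW_STUDENTS = [
    Student(101, "Male", 20000, 85, "Encouraged"),
    Student(102, "Male", 20000, 90, "Not Encouraged"),
    Student(103, "Male", 20000, 135, "Not Encouraged"),
]


def prediction_join(truth_table, students):
    """Join each student to matching truth-table rows; keep the most probable outcome."""
    results = []
    for s in students:
        matches = [
            r for r in truth_table
            if (r.gender, r.parent_income, r.iq, r.encouragement)
            == (s.gender, s.parent_income, s.iq, s.encouragement)
        ]
        best: Optional[TruthRow] = max(matches, key=lambda r: r.probability) if matches else None
        if best is None:
            results.append((s.student_id, None, None))
        else:
            results.append((s.student_id, best.college_plans, best.probability))
    return results


results = prediction_join(TRUTH_TABLE, NEW_STUDENTS)
for row in results:
    print(row)
```

Each result tuple plays the role of one output row of a prediction query: the case key, the predicted CollegePlans value, and its probability.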