Microsoft SQL Data Mining - Data Warehouse Technology Seminar Materials.ppt

Uploaded by: za****8 | Document ID: 16991277 | Uploaded: 2020-11-06 | Format: PPT | Pages: 54 | Size: 2.05 MB


Welcome to the Microsoft SQL Data Mining / Data Warehouse Technology Seminar

Today's Schedule
- Overview of Microsoft SQL Data Mining Technology: Zuo Hong (左洪), Microsoft
- Data Warehousing in Telecom: Bei Zhicheng (貝志城), 明天高科
- Data Mining in CRM: Wang Lijun (王立軍), 中圣公司
- 靈通 IT Service Maintenance Management Service System: Zou Xiongwen (鄒雄文), 廣州靈通

Introduction to Data Mining with SQL Server 2000
Zuo Hong (左洪), Senior Product Marketing Manager, Microsoft (China) Co., Ltd.

Agenda
- What is Data Mining
- The Data Mining Market
- OLE DB for Data Mining
- Overview of the Data Mining Features in SQL Server 2000
- Q&A

What Is Data Mining?

What is DM?
- A process of data exploration and analysis using automatic or semi-automatic means
  - "Exploring data": scanning samples of known facts about "cases"
  - "Knowledge": clusters, rules, decision trees, equations, association rules
- Once the "knowledge" is extracted, it:
  - Can be browsed
  - Provides very useful insight into the cases' behavior
  - Can be used to predict values of other cases
  - Can serve as a key element in closed-loop analysis

What drives high school students to attend college?
The deciding factors for high school students to attend college are:
[Figure: decision tree with the following nodes]
  All Students: Attend College 55% Yes, 45% No
  Split on IQ:
    IQ = High: Attend College 79% Yes, 11% No
    IQ = Low: Attend College 45% Yes, 55% No
  Split on Wealth:
    Wealth = True: Attend College 94% Yes, 6% No
    Wealth = False: Attend College 69% Yes, 21% No
  Split on Parents Encourage (branches Yes and No):
    Attend College 70% Yes, 30% No
    Attend College 31% Yes, 69% No

Business Oriented DM Problems
- Targeted ads: "What banner should I display to this visitor?"
- Cross-sells: "What other products is this customer likely to buy?"
- Fraud detection: "Is this insurance claim a fraud?"
- Churn analysis: "Which customers are likely to churn?"
- Risk management: "Should I approve the loan to this customer?"

Mining Process - Illustrated
[Figure labels: Training Data, DM Engine, Mining Model, Data To Predict, DM Engine, Mining Model, Predicted Data]

The Data Mining Market

The $$$: Y2000 Market Size
- DM Tools Market: $250M *
  - 40% license fees
  - 60% consulting
* Gartner

The Players
- Leading vendors: SAS, SPSS, IBM
- Hundreds of smaller vendors offering DM algorithms
- Oracle: Thinking Machines acquisition

The Products
- End-to-end data mining tools: extraction, cleansing, loading, modeling, algorithms (dozens), analyst's workbench, reporting, charting
- The customer is the power analyst: a PhD in statistics is usually required
- Closed tools with no standard API
  - Total vendor lock-in
  - Limited integration with applications
- DM is an "outsider" in the data warehouse
- Extensive consulting required
- Skyrocketing prices: $60K+ for a single-user license

What the analysts say
- "Stand-alone Data Mining Is Dead" - Forrester
- "The demise of stand-alone data mining" - Gartner

The Microsoft Approach
- DataPro Users Survey 1999-2001: "Data mining will be the fastest-growing BI technology"

The $$$: 2000 Market Size
- DM Applications Market Size: $1.5B *
* IDC

SQL Server 2000 - The Analysis Platform
- SQL 2000 provides a complete analysis platform
  - Not an isolated, stand-alone DM product
- Platform means:
  - The infrastructure for applications
  - Not an application by itself
  - Integrated vision for all technologies and tools
  - Standards-based APIs (OLE DB for DM)
  - Extensible
  - Scalable

Data Flow
[Figure labels: DW, OLTP, OLAP, DM, Apps, Reports & Analysis, DM]

Analysis Services 2000 - Architecture
[Figure labels: Manager UI, DSO, Analysis Server, Client, OLE DB, OLAP, OLAP Engine (local), OLAP Engine, DM Engine, DM Engine (local), DM, DMM, DM Wizards, DM DTS Task, Ext., Ext.]

OLE DB for Data Mining

Why OLE DB for DM?
Make DM a mass-market technology by:
- Leveraging existing technologies and knowledge
  - SQL and OLE DB
  - Common industry-wide concepts and data presentation
- Changing DM market perception from "proprietary" to "open"
- Increasing the number of players:
  - Reduce the cost and risk of becoming a consumer: one tool works with multiple providers
  - Reduce the cost and risk of becoming a provider: focus on expertise and find many partners to complement the offering
- Dramatically increasing the number of DM developers

Integration With RDBMS
- Customers would like to:
  - Build DM models from within their RDBMS
  - Train the models directly off their relational tables
  - Perform predictions as relational queries (tables in, tables out)
  - Feel that DM is a native part of their database
- Therefore:
  - Data mining models are relational objects
  - All operations on the models are relational
  - The language used is SQL (with extensions)
- The effect: every DBA and VB developer can become a DM developer

Creating a Data Mining Model (DMM)

Identifying the "Cases"
- DM algorithms analyze "cases"
- The "case" is the entity being categorized and classified
- Examples:
  - Customer credit risk analysis: Case = Customer
  - Product profitability analysis: Case = Product
  - Promotion success analysis: Case = Promotion
- Each case encapsulates all we know about the entity

A Simple Set of Cases

  Student ID  Gender  Parent Income  IQ   Encouragement   College Plans
  1           Male    23400          120  Not Encouraged  No
  2           Female  79200          90   Encouraged      Yes
  3           Male    42000          105  Not Encouraged  Yes

More Complicated Cases

  Cust ID  Age  Marital Status  IQ  Favorite Movies (Title, Score)
  1        35   M               2   Star Wars 8; Toy Story 9; Terminator 7
  2        20   S               3   Star Wars 7; Braveheart 7; The Matrix 10
  3        57   M               2   Sixth Sense 9; Casablanca 10

A DMM is a Table!
- A DMM structure is defined as a table
  - Training a DMM means inserting data into the table
  - Predicting from a DMM means querying the table
- All information describing the case is contained in columns

Creating a Mining Model

    CREATE MINING MODEL [Plans Prediction]
    (
        StudentID      LONG    KEY,
        Gender         TEXT    DISCRETE,
        ParentIncome   LONG    CONTINUOUS,
        IQ             DOUBLE  CONTINUOUS,
        Encouragement  TEXT    DISCRETE,
        CollegePlans   TEXT    DISCRETE PREDICT
    )
    USING Microsoft_Decision_Trees

Creating a mining model with nested table

    CREATE MINING MODEL [Movie Prediction]
    (
        CustomerId  LONG  KEY,
        Age         LONG  CONTINUOUS,
        Gender      TEXT  DISCRETE,
        Education   TEXT  DISCRETE,
        MovieList   TABLE PREDICT
        (
            MovieName  TEXT  KEY
        )
    )
    USING Microsoft_Decision_Trees

Training a DMM

Training a DMM
- Training a DMM means passing it data for which the attributes to be predicted are known

- Multiple passes are handled internally by the provider!
- Use an INSERT INTO statement
- The DMM will not persist the inserted data
- Instead, it will analyze the given cases and build the DMM content (decision tree, segmentation model, association rules)

    INSERT INTO (columns list)

INSERT INTO

    INSERT INTO [Plans Prediction]
    (
        StudentID, Gender, ParentIncome, IQ,
        Encouragement, CollegePlans
    )
    SELECT StudentID, Gender, ParentIncome, IQ,
           Encouragement, CollegePlans
    FROM CollegePlans

When Insert Into Is Done
- The DMM is trained
- The model can be retrained
- Content (rules, trees, formulas) can be explored
  - OLE DB schema rowset
  - SELECT * FROM .CONTENT
  - XML string (PMML)
- Prediction queries can be executed

Predictions

What are Predictions?
- Predictions apply the rules of a trained model to a new set of data in order to estimate missing attributes or values
- Predictions = queries
  - The syntax is SQL-like
  - The output is a rowset
- In order to predict you need:
  - Input data set
  - A trained DMM
  - Binding (mapping) information between the input data and the DMM
  - Specification of what to predict

The Truth Table Concept

  Gender  Parent Income  IQ  Encouragement   College Plans  Probability
  Male    20000          85  Not Encouraged  No             85%
  Male    20000          85  Not Encouraged  Yes            15%
  Male    20000          85  Encouraged      No             60%
  Male    20000          85  Encouraged      Yes            40%
  Male    20000          90  Not Encouraged  No             80%
  Male    20000          90  Not Encouraged  Yes            20%
  Male    20000          90  Encouraged      No             58%

Prediction

  Gender  Parent Income  IQ  Encouragement   College Plans  Probability
  Male    20000          85  Not Encouraged  No             85%
  Male    20000          85  Not Encouraged  Yes            15%
  Male    20000          85  Encouraged      No             60%
  Male    20000          85  Encouraged      Yes            40%
  Male    20000          90  Not Encouraged  No             80%
  Male    20000          90  Not Encouraged  Yes            20%
  Male    20000          90  Encouraged      No             58%
  Male    20000          90  Encouraged      Yes            42%
  Male    20000          95  Not Encouraged  No             78%
  Male    20000          95  Not Encouraged  Yes            22%
  Male    20000          95  Encouraged      No             45%

It's a JOIN!

  Student ID  Gender  Parent Income  IQ   Encouragement
  1           Male    43000          85   Not Encouraged
  2           Male    20000          135  Not Encouraged
  3           Female  25000          105  Encouraged
  4           Male    96000          100  Encouraged
  5           Female  56000          125  Not Encouraged
  6           Female  46000          90   Not Encouraged

The Prediction Query Syntax

    SELECT FROM PREDICTION JOIN ON =

Example

    SELECT [New Students].StudentID,
           [Plans Prediction].CollegePlans,
           PredictProbability(CollegePlans)
    FROM [Plans Prediction] PREDICTION JOIN [New Students]
    ON [Plans Prediction].Gender = [New Students].Gender AND
       [Plans Prediction].IQ = [New Students].IQ AND ...

OLE DB DM Sample Provider with Source
- All required OLE DB objects, such as session, command, and rowset
- The OLE DB for Data Mining syntax parser
- Tokenization of input data
- Query processing engine
- A sample Naïve Bayes algorithm
- Model persistence in XML and binary formats
- Available at

Integrated OLAP and DM Analysis

Why Use DM with OLAP
- Relational DM is designed for:
  - Reports of patterns
  - Batch predictions fed into an OLTP system
  - Real-time singleton prediction in an operational environment
- OLAP is designed for interactive analysis by a knowledge worker
  - Consistent and convenient navigational model
  - Pre-aggregations of OLAP allow faster performance

Understanding DM Content

Decision Trees
[Figure: decision tree with the following nodes]
  All Customers: Credit Risk 65% Good, 35% Bad
  Split on Debt:
    Debt = Low: Credit Risk 89% Good, 11% Bad
    Debt = High: Credit Risk 45% Good, 55% Bad
  Split on Employment Type (ET):
    ET = Salaried: Credit Risk 94% Good, 6% Bad
    ET = Self Employed
  Split on Education:
    Education = High School: Credit Risk 31% Good, 69% Bad
    Education = College
  Other leaf statistics shown: Credit Risk 79% Good, 21% Bad; Credit Risk 70% Good, 30% Bad

Customers having high debt and college education:

    Filter([Individual Customers].Members,
           Customers.CurrentMember.Properties("Debt") = "High" And
           Customers.CurrentMember.Properties("Education") = "College")

Customers having low debt who are self employed:

    Filter([Individual Customers].Members,
           Customers.CurrentMember.Properties("Debt") = "Low" And
           Customers.CurrentMember.Properties("Employment Type") = "Self Employed")

Equivalent DM Dimension
- All Customers
  - Customers with high debt
    - Customers with high debt and college education
    - Customers with high debt and high school education
  - Customers with low debt
    - Customers with low debt and self employed
    - Customers with low debt and salaried

Custom Roll-up
[Figure: each dimension member shows its Credit Risk statistics next to a truncated roll-up formula beginning "Aggregate(Filter("]
- Good = 65%, Bad = 35%
- Good = 89%, Bad = 11%
- Good = 79%, Bad = 21%
- Good = 94%, Bad = 6%
- Good = 45%, Bad = 55%
- Good = 70%, Bad = 30%
- Good = 31%, Bad = 69%

Tree = Dimension
- Every node on the tree is a dimension member
- The node statistics are the member properties
- All members are calculated
  - The formula aggregates the case dimension members that apply to this node
  - The MDX is generated by the DM algorithm
- Analysis Services will automatically generate the calculated dimension based on the DM content, and also a virtual cube
- Applies to:
  - Classification (decision trees)
  - Segmentation (clusters)

Browsing the Virtual Cube
Pivot the DM dimension:

                            WA     OR     CA
  All Customers             3200   2500   8000
  Customers with low debt   2320   1503   4300
  Customers with high debt  880    997    4700
  Customers college         320    450    2310
  Customers high school     560    547    2390

Credit Risk: 70% Good, 30% Bad

Predictions
- You might want to view predictions for each case
- For example:
  - What is the expected profitability of a product?
  - What is the credit risk of a specific customer?
  - What are the products this customer is likely to buy?
- All of those predictions are available through MDX calculated members
- The singleton query is created automatically

Prediction Calculated Member
Measures.[Probability of High Credit Risk]:

    PREDICT(Customers.CurrentMember, "Credit Risk Model",
            "PredictionProbability(PredictionHistogram("Credit Risk"), High)")

Predictions Example

  Customer          Probability of High Credit Risk  Probability of Low Credit Risk
  Joe Smith         73%                              27%
  John Dow          68%                              32%
  William Clington  45%                              55%
  Robert Maxwell    98%                              2%
  Denis Rodman      81%                              19%

Questions?
E-Mail:
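
The "Truth Table Concept" and "It's a JOIN!" slides above can be illustrated outside SQL Server with a minimal Python sketch. This is not how the Microsoft provider works internally. It only assumes that a trained model can be viewed as a lookup table from input attribute values to a probability distribution over College Plans. The table rows are copied from the truth table in the slides; the names TRUTH_TABLE and prediction_join and the student IDs 101 to 103 are hypothetical and exist only for this illustration.

```python
# Hypothetical stand-in for a trained model: the complete row pairs of the
# truth table from the slides, keyed by the input attributes.
# (gender, parent_income, iq, encouragement) -> {college_plans: probability}
TRUTH_TABLE = {
    ("Male", 20000, 85, "Not Encouraged"): {"No": 0.85, "Yes": 0.15},
    ("Male", 20000, 85, "Encouraged"): {"No": 0.60, "Yes": 0.40},
    ("Male", 20000, 90, "Not Encouraged"): {"No": 0.80, "Yes": 0.20},
    ("Male", 20000, 90, "Encouraged"): {"No": 0.58, "Yes": 0.42},
    ("Male", 20000, 95, "Not Encouraged"): {"No": 0.78, "Yes": 0.22},
}


def prediction_join(cases, model=TRUTH_TABLE):
    """Join input cases to the model on their attribute values.

    Returns one (student_id, predicted_plans, probability) tuple per case.
    Cases with no matching row in the table yield (student_id, None, None).
    """
    results = []
    for case in cases:
        key = (
            case["Gender"],
            case["ParentIncome"],
            case["IQ"],
            case["Encouragement"],
        )
        distribution = model.get(key)
        if distribution is None:
            results.append((case["StudentID"], None, None))
            continue
        # Pick the most probable value of the predicted attribute.
        best = max(distribution, key=distribution.get)
        results.append((case["StudentID"], best, distribution[best]))
    return results


# Hypothetical input cases: the attribute to predict (College Plans) is unknown.
new_students = [
    {"StudentID": 101, "Gender": "Male", "ParentIncome": 20000,
     "IQ": 85, "Encouragement": "Encouraged"},
    {"StudentID": 102, "Gender": "Male", "ParentIncome": 20000,
     "IQ": 90, "Encouragement": "Not Encouraged"},
    {"StudentID": 103, "Gender": "Female", "ParentIncome": 25000,
     "IQ": 105, "Encouragement": "Encouraged"},
]

for row in prediction_join(new_students):
    print(row)
```

A real decision-tree model generalizes to attribute combinations it has not seen, whereas this exact-match lookup returns no prediction for student 103. The sketch is only meant to show why the slides describe prediction as a join between input rows and the model.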
