Parallel Computing Overview (courseware, 62 pages)
Parallel Computing: Basic Concepts
Parallel Computing: Hardware Foundations and Performance Evaluation, 2019/10/27
Slides dated 2024/9/16

How can the ever-growing demand for computing power be met?
  Use faster hardware, i.e. reduce the time each instruction takes.
  Optimize the algorithm (or the compilation).
  Use multiple processors to solve one problem at the same time: parallel computing.

Serial computing vs. parallel computing
  (figure not reproduced)

Levels of parallelism
  Program level
  Subprogram level
  Statement level
  Operation level
  Micro-operation level
  Granularity runs from coarse (program level) to fine (micro-operation level).

FLOPS
  Floating-point Operations Per Second: the number of floating-point operations executed per second.
  Theoretical peak = CPU clock frequency × floating-point operations per clock cycle × number of CPUs
  Floating-point operations per clock cycle for some processors: (table not reproduced)

www.top500.org

Top500, November 2007
  IBM's Blue Gene/L still tops the list. It has led by a wide margin for three consecutive years, since November 2004, and its capacity has kept growing: its Linpack benchmark performance is now 478.2 TFlop/s (478.2 trillion operations per second), up from 280.6 TFlop/s six months earlier.
  Second place also goes to IBM, this time with a recently completed Blue Gene/P. This new system at Forschungszentrum Jülich in Germany delivers 167.3 TFlop/s, and under IBM's design plan Blue Gene/P is expected to break the 1 PFlop/s mark, i.e. one quadrillion operations per second.

Top500, November 2007 (continued)
  Third place is another newcomer and the first supercomputer of the New Mexico Computing Applications Center (NMCAC): a system built by SGI on the Altix ICE 8200, with 126.9 TFlop/s.
  India entered the TOP10 for the first time: the HP Cluster Platform 3000 BL460c at India's Computational Research Laboratories took fourth place with 117.9 TFlop/s.

Top500 charts, each shown by system count and by computing capacity (figures not reproduced): vendor, country, architecture, application area, operating system.
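The theoretical-peak formula from the FLOPS slide can be worked through in a few lines. This is only a minimal sketch: the function name `theoretical_peak_flops` and the machine parameters are assumptions made for illustration and do not come from the slides.

```python
def theoretical_peak_flops(clock_hz: float, flops_per_cycle: int, num_cpus: int) -> float:
    """Theoretical peak = clock frequency x FLOPs per clock cycle x number of CPUs."""
    return clock_hz * flops_per_cycle * num_cpus


# Hypothetical machine (not from the slides):
# 1,000 CPUs at 2.0 GHz, each completing 4 floating-point operations per cycle.
peak = theoretical_peak_flops(clock_hz=2.0e9, flops_per_cycle=4, num_cpus=1000)
print(f"{peak / 1e12:.1f} TFlop/s")  # prints "8.0 TFlop/s"
```

The result is an upper bound. Measured Linpack figures such as those in the Top500 list are lower than the theoretical peak of the same machine.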
Further Top500 charts (figures not reproduced): processor family, by system count and by computing capacity; system count; computing capacity.

2007 China High-Performance Computer Performance TOP100
  (table not reproduced)

Parallelization methods
  Domain decomposition
  Task decomposition
  Pipelining

Domain decomposition
  First, decide how data elements should be divided among processors.
  Second, decide which tasks each processor should be doing.
  Example: vector addition

Domain decomposition: find the largest element of an array
  (animation frames, figures not reproduced: the array is divided among CPU 0, CPU 1, CPU 2 and CPU 3)
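The "find the largest element of an array" example can be sketched as follows: the data is split into one chunk per worker, each worker finds the maximum of its own chunk, and the partial maxima are combined at the end. The helper names `split_into_chunks` and `parallel_max` and the sample values are assumptions for illustration. A thread pool is used only to keep the sketch short; in CPython, threads will not actually speed up this kind of CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, num_workers):
    """Divide data into at most num_workers contiguous, nearly equal chunks."""
    size, extra = divmod(len(data), num_workers)
    chunks, start = [], 0
    for i in range(num_workers):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when len(data) < num_workers
            chunks.append(data[start:end])
        start = end
    return chunks


def parallel_max(data, num_workers=4):
    """Each worker finds the max of its chunk; the partial maxima are then combined."""
    if not data:
        raise ValueError("data must not be empty")
    chunks = split_into_chunks(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_maxima = list(pool.map(max, chunks))
    return max(partial_maxima)


# Hypothetical sample data, split across 4 workers (CPU 0 to CPU 3 on the slides).
values = [7, 42, 3, 19, 88, 5, 61, 23, 14, 70, 2, 36]
print(parallel_max(values))  # prints 88
```

Every worker runs the same operation (`max`) on a different part of the data, which is what distinguishes domain decomposition from task decomposition.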
Task (functional) decomposition
  First, divide tasks among processors.
  Second, decide which data elements are going to be accessed (read and/or written) by which processors.
  Example: event handler for a GUI

Task decomposition
  (animation frames, figures not reproduced: the functions f(), g(), h(), q(), r() and s() are divided among CPU 0, CPU 1 and CPU 2)
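A rough sketch of task decomposition, using the function names from the slide figure: different functions are handed to a pool of three workers, standing in for CPU 0 to CPU 2. The function bodies are hypothetical placeholders, and the functions are assumed to be independent of one another, because the dependency graph in the original figure is not recoverable. `run_tasks` is likewise an illustrative helper, not something from the slides.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical placeholder bodies: the slides name these functions
# but do not say what they compute or how they depend on each other.
def f(): return sum(range(10))
def g(): return max([3, 1, 2])
def h(): return len("parallel")
def q(): return sorted([3, 1, 2])
def r(): return "r".upper()
def s(): return 2 ** 10


def run_tasks(tasks, num_workers=3):
    """Submit each (different) task to the pool and collect results by task name."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        return {name: future.result() for name, future in futures.items()}


results = run_tasks({"f": f, "g": g, "h": h, "q": q, "r": r, "s": s})
print(results)
```

Here the work is divided by function, not by data. If some functions needed the results of others, they would have to be scheduled in dependency order, which this sketch does not attempt.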
Pipelining
  A special kind of task decomposition
  "Assembly line" parallelism
  Example: 3D rendering in computer graphics
  Input -> Model -> Project -> Clip -> Rasterize -> Output

Processing one data set (steps 1 to 4)
  (animation frames, figures not reproduced: the data set moves through the stages Model, Project, Clip and Rasterize)
  The pipeline processes 1 data set in 4 steps.

Processing two data sets (steps 1 to 5)
  (animation frames, figures not reproduced: two data sets move through the stages Model, Project, Clip and Rasterize)
  The pipeline processes 2 data sets in 5 steps.

Pipelining five data sets (steps 1 to 8)
  (animation frames, figures not reproduced: data sets 0 to 4 move through CPU 0, CPU 1, CPU 2 and CPU 3)
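The step counts on the pipelining slides (1 data set in 4 steps, 2 in 5, and 8 steps for five data sets) can be reproduced with a small schedule simulation. The sketch assumes that every stage takes exactly one step, that each stage runs on its own CPU, and that a new data set enters the first stage on every step. `pipeline_schedule` is an illustrative name, not something from the slides.

```python
STAGES = ["Model", "Project", "Clip", "Rasterize"]


def pipeline_schedule(num_data_sets, stages=STAGES):
    """Return one dict per step mapping stage -> data set index (None when idle)."""
    num_steps = len(stages) + num_data_sets - 1
    schedule = []
    for step in range(num_steps):
        row = {}
        for stage_index, stage in enumerate(stages):
            # Data set d reaches stage k at step d + k (all counted from 0).
            data_set = step - stage_index
            row[stage] = data_set if 0 <= data_set < num_data_sets else None
        schedule.append(row)
    return schedule


for n in (1, 2, 5):
    print(n, "data set(s):", len(pipeline_schedule(n)), "steps")
# prints:
# 1 data set(s): 4 steps
# 2 data set(s): 5 steps
# 5 data set(s): 8 steps
```

Under these assumptions, a pipeline with k stages finishes n data sets in k + n - 1 steps, whereas running each data set through all stages one after another would take k × n steps.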