A metric towards evaluating understandability of state machines: An empirical study (translated foreign-language paper, Chinese and English versions)
Appendix 1:
A metric towards evaluating understandability of state machines: An empirical study
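Understandability (also called comprehensibility or appropriateness recognizability) is regarded as one of the important quality factors for models. In ISO 25010, understandability is classified as an attribute of usability. It is also regarded as a crucial factor in maintenance: understandable models support maintenance activities that analyze, modify, and extend a system in order to correct, adapt, and perfect it. It is well known that, in software development, 50% of maintenance tasks and 60% of the overall process are spent trying to understand the software. This is because misunderstanding is a major cause of errors. Misunderstood interface specifications lead to communication errors and, in turn, to implementation errors such as missing functions and malfunctions. Endres found that 46% of the errors he analyzed involved a misunderstanding.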
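A state machine (SM) is a popular behavioral model used to describe the dynamic behavior of a system, a component, or an object. SMs serve as a communication tool among developers such as designers, programmers, managers, and testers. The behavior of a SM is represented by the acceptable event sequences, the actions corresponding to the events, and the state changes caused by the events. The model is also used in various software engineering fields such as formal verification, automatic test data generation, and reverse engineering. The behavior of a single system can be described by multiple forms of SMs. In other words, states and transitions can be configured differently even when the SMs specify the same behavior. For example, the behavior of a bounded stack container can be described with two states, empty and nonempty. Alternatively, splitting the nonempty state into partially empty and full yields a SM with three states: empty, partially empty, and full.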
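Differently configured SMs can differ in understandability even when the behaviors they describe are equivalent. For example, the two SMs for the bounded stack container above may not be equally understandable. If the former SM, which has two states, is used for development, the constraint on the full state may be overlooked because the full state is not explicitly presented in the SM. To improve the understandability of models, a large number of design refactoring rules and design patterns have been proposed and developed.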
A quantitative measure of understandability is the first step toward constructing and using highly understandable SMs. To the best of our knowledge, no study of metrics for evaluating the understandability of SMs has been reported. Despite the importance of SM quality, in-depth studies of SM metrics for quality evaluation or fault prediction have not been conducted; elemental metrics, such as counts of states or transitions, have been presented instead. Many SM generation approaches have been proposed, but the quality of the generated SMs has not been validated. One reason is the absence of quantitative measures of SM quality.
Several elemental metrics of size and complexity have been proposed. However, using these metrics to evaluate understandability is problematic, because an understandable SM need not be small or simple. The behavior of any system can be described by a SM with a single state. Such a SM is extremely small and simple, yet it can be harder to understand because the single state captures every situation of the system. For example, the behavior of the bounded stack container can be represented as a SM with only one state. That SM may have as many transitions as the stack has methods, e.g., two transitions for push() and pop(). Such a SM, however, is no different from a structural model such as a class diagram.
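Cohesion and coupling are known to influence understandability significantly. Cohesion measures the degree to which the elements of a module belong together. In a strongly cohesive module, all elements relate to a single function, and such a module is easier to understand. Coupling, on the other hand, indicates the relationship between the measured module and other modules. High coupling between two modules tends to make either of them harder to understand. By contrast, low coupling means that a module is self-contained, so the module is easier to maintain and to understand (the so-called KISS principle: "keep it simple, stupid!").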
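Our hypothesis is that, in an understandable SM, each state should distinctly represent a single situation. In other words, the important factor for understandability is the simplicity of the states rather than of the SM. We therefore believe that states should be highly cohesive and as loosely coupled as possible. We also believe that a SM can be understood if its states can be understood.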
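Based on this hypothesis, we propose an understandability metric called the State machine Understandability Metric (SUM). We first define cohesion and coupling metrics. The cohesion metric measures consistency between a state and its transitions; specifically, it calculates the degree of relationship between the condition of the state and the constraints of the associated transitions. The coupling metric measures independence between states; specifically, it analyzes the similarity of the situations captured by states and counts the number of behaviorally dependent states. Finally, SUM is defined by combining the cohesion and coupling metrics.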
To validate the proposed SUM, we conducted an experiment. We prepared five differently configured SMs for each of five systems, yielding 25 SMs in total. Ten question templates based on major attributes of SMs were then prepared to measure the actual understandability of the SMs objectively. For each of the five systems (note: not for each SM), a set of ten concrete questions was constructed from the question templates. To measure actual understandability, 25 senior undergraduate students and 15 graduate students participated in the experiment. The understandability of each SM was measured with these questions by a group of eight participants, with 40 participants in total. Two understandability indicators, understandability efficiency (UEff) and understandability correctness (UCor), were adopted for quantitative measurement.
To validate the efficacy of SUM, we analyzed the correlation between the SUM values and the UEff and UCor values measured for the 25 SMs, using Pearson's correlation coefficient. The results showed meaningful relationships between SUM and UEff and between SUM and UCor (p-value = 0.003 and 0.027, respectively).
For a metric to be useful, it should also be able to evaluate the understandability of different SMs for the same system. We therefore performed a consistency analysis to check that the metric consistently evaluates the UEff or UCor of the SMs for the same system. The results indicated that SUM consistently has a positive relationship with UEff in four systems and with UCor in five systems.
The precondition of an event is written to the left of the slash "/", and the postcondition to the right. The same notation applies to the transitions of SMs. The method entertime(T: int) executes correctly when Cook = off and the parameter T satisfies T > 0. After entertime completes, the value of lefttime must equal T. In other words, once entertime has executed, lefttime is greater than zero. Variables that are not mentioned in the postcondition keep their values.
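The precondition/postcondition reading of entertime can be sketched as a runtime contract check. This is only an illustration: the class name OvenSketch, the string encoding of Cook = off, and the choice of exceptions are assumptions, and only the conditions on Cook, T, and lefttime come from the text.

```python
class OvenSketch:
    """Hypothetical holder for the two variables named in the text."""

    def __init__(self):
        self.cook = "off"   # assumed encoding of Cook = off
        self.lefttime = 0

    def entertime(self, t):
        # Precondition (left of the slash): Cook = off and T > 0.
        if self.cook != "off" or t <= 0:
            raise ValueError("precondition violated: need Cook = off and T > 0")
        cook_before = self.cook
        self.lefttime = t
        # Postcondition (right of the slash): lefttime = T.
        assert self.lefttime == t
        # Variables not named in the postcondition keep their values.
        assert self.cook == cook_before


oven = OvenSketch()
oven.entertime(30)
```

Calling entertime with a non-positive argument, or while cooking, is rejected before any variable changes, which mirrors an event that is not accepted in the current state.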
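Understandability is defined as the degree to which the purpose of an object, component, or system is clear to the evaluator. Likewise, the understandability of a SM can be defined as a quality factor describing how quickly and correctly users can understand the behavior of the system from the SM. The dynamic behavior of a system is a set of executable event sequences. If the understandability of a SM is high, we can easily and accurately determine not only the event sequences that have occurred but also the state that follows a particular event sequence.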
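It is easy to assume that smaller and less complex SMs are more understandable. However, this study argues that the cohesion and coupling of SMs, rather than size or complexity alone, must be considered to obtain highly understandable SMs. For example, a highly cohesive and loosely coupled SM can be more understandable than a SM that is merely smaller and less complex. Moreover, SUM, which considers both cohesion and coupling, can be a major indicator of SM understandability because cohesion and coupling in SMs are complementary. In the next two sections, we give experimental results for this contention. We defined two hypotheses with regard to SUM. The first concerns the relationship between understandability efficiency (UEff) and SUM, and rests on the assumption that more highly cohesive and less coupled SMs can be understood faster. Suppose that two or more subject groups are assigned different SMs with different SUM values and that they use the given SMs to understand the behavior of the system, i.e., to answer the given questions. We investigate whether subjects given SMs with a higher SUM answer the questions correctly faster than subjects given SMs with a lower SUM. In the second hypothesis, we investigate whether SUM can also represent UCor. In the assignment example above, if our assumption is correct, subjects assigned SMs with a higher SUM will also give more correct answers than subjects assigned SMs with a lower SUM. That is, if SMs are more highly cohesive and less coupled, subjects should understand the systems more correctly from the SMs.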
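The participants were 25 fourth-year undergraduate students and 15 graduate students in the Department of Computer Science Engineering at Pusan National University. The experiment was performed in 2010. All 40 participants had completed more than one software engineering course. The courses covered the basic knowledge needed to understand and analyze SMs, such as state-based models, UML, and formal languages such as OCL. To strengthen their understanding of SMs, the participants performed exercises in constructing SMs from contract-based specifications and in generating test cases from SMs. In addition, to minimize deviation between participants, they were given an exercise in solving sample SM understandability questions, with guidelines for answering them. For each system, the experiment was designed as one factor with more than two treatments using a completely randomized design; that is, each subject received only one SM per system and was assigned to it at random. We had five SMs for each of five different systems, so 25 SMs were evaluated by 40 subjects. Ten questions were prepared for each SM and answered by the participants. For each system, eight subjects were assigned to each SM. In other words, each SM was evaluated eight times, giving 200 evaluations in total. Each subject was given five SMs, one selected at random for each of the five systems, in random order. In other words, the same SM was examined by eight different subjects in different iterations. The solving time was limited to 15 minutes per SM. Participants who answered all the questions for a SM within the 15 minutes were asked to record their time and submit. The experiment took 90 minutes in total: 15 minutes for preparation and 75 minutes for the five iterations.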
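The method used is one widely applied to analyze the relationship between continuous variables. To test the linear relationship between two continuous variables, Pearson's correlation analysis can be used as a parametric analysis under the assumption that at least one of the variables is normally distributed. Pearson's correlation coefficient r takes values between -1 and 1; a positive value signifies a positive correlation and a negative value a negative correlation between the variables. In addition, we analyzed partial correlations to examine the genuine relationship between SUM and understandability once the effects of the systems have been removed. If r is close to 1 or -1, the relationship between the two variables is close to a straight line. If r is close to zero, the relationship between the two variables is not linear. Significance (expressed as a p-value) can be derived from the coefficient. For example, with 25 data points a coefficient of 0.396 corresponds to a p-value of 0.05. The higher the coefficient, the lower the p-value. In this research we used a significance level of 0.05, the level generally used; a result is treated as significant only when the probability of observing it under the null hypothesis of no correlation is below 5%. That is, we can statistically confirm the metric as a meaningful indicator of SM understandability if p is less than 0.05. As with SUM and UEff, the analysis showed that the relationship between SUM and UCor is positive and significant. For the 25 SMs, the linear relationship between the SUM values and the measured UCor values was statistically significant (p = 0.027) at the 0.05 level (p ≤ 0.05). Pearson's correlation analysis does not take into account the effect of the systems to which the SMs belong. An appropriate test of the bivariate correlation requires that the contextual factor, i.e., the system, be partialled out of the analysis. Here the independent variable is SUM, the dependent variables are UEff and UCor, and pr denotes the partial correlation coefficient. The partial correlation between SUM and UEff was positive and significant (pr = 0.833, p = 0). The partial correlation between SUM and UCor was also positive and significant (pr = 0.566, p = 0.004). In summary, all Pearson correlation coefficients and partial correlation coefficients were significant. We can therefore reject the two null hypotheses, H1,0 and H2,0. The results further indicate that SUM is a significant factor positively correlated with understandability. We therefore accept the alternative hypothesis (H1,1): using SMs with higher SUM values significantly improves the efficiency, and in turn the correctness, of the subjects' understanding.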
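A state machine (SM) is a dynamic behavioral model widely used in various fields. Understandability is an important quality factor. For behavioral models, understandability is crucial to effective and correct communication. However, research on evaluating the understandability of SMs has received little attention and has not produced convincing results. In this paper, we therefore proposed an understandability metric based on cohesion and coupling, called the State machine Understandability Metric (SUM), and conducted an experiment to confirm its usefulness.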
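To validate the efficacy and generality of SUM, we performed correlation and consistency analyses. For these analyses, we prepared five different SMs from each of five formal specifications, yielding 25 SMs in total. Their understandability (understandability efficiency, UEff, and understandability correctness, UCor) was measured with 40 participants. In the analysis results, the p-values between SUM and the two understandability measures were 0.003 and 0.027, respectively. In addition, SUM-UEff and SUM-UCor showed positive relationships in four and five systems, respectively. These results confirm that SUM can be a useful indicator of SM understandability. We also found that simplicity alone does not positively affect the understandability of a SM. The experimental results indicate that SMs with higher SUM values are understood more efficiently and more correctly.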
We acknowledge that the results of the experiment are not yet conclusive. We therefore plan to continue this research with more systems, SMs, and participants. Future work also includes further experiments toward constructing highly understandable SMs, and a refactoring method that transforms a poorly understandable SM into a highly understandable one is under study.
Appendix 2:
A metric towards evaluating understandability of state machines: An empirical study
Understandability (also called comprehensibility or appropriateness recognizability) is considered one of the important quality factors for models. In ISO 25010, understandability is classified as an attribute of usability. It is also considered a crucial factor in the maintenance aspect. That is, understandable models can support maintenance activities to analyze, modify, and extend a system for correction, adaptation, and perfection. It is well known that in the software development process, 50% of maintenance tasks and 60% of the overall process are consumed trying to understand the software. This is because misunderstanding can be a major cause of implementation errors. The misunderstanding of interface specifications leads to communication errors, and results in implementation errors such as missing functions and malfunctions. Endres found that of the errors he analyzed, 46% involved a misunderstanding.
A state machine (SM) is a popular behavioral model used to describe the dynamic behavior of a system, a component, or an object. SMs are utilized as a communication tool between developers such as designers, coders, managers, and testers. The behavior of a SM is represented by acceptable event sequences, actions corresponding to the events, and state changes according to the events. The model is also utilized in various software engineering fields such as formal validation, automatic test data generation, and reverse engineering.
The behavior of a single system can be described by multiple forms of SMs. In other words, states and transitions can be configured differently even if the SMs specify the same behavior. For example, the behavior of a bounded stack container can be described with two states (empty and nonempty). On the other hand, a SM with three states (empty, partially empty, and full) can be used by splitting the nonempty state into partially empty and full states.
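The two configurations of the bounded stack can be made concrete with a small sketch. The class and method names below are hypothetical and chosen only for illustration; the point is that the same object behavior can be read through a two-state view or a three-state view, and only the latter names the full state explicitly.

```python
class BoundedStack:
    """Bounded stack whose abstract state can be read with two or three states."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items = []

    def push(self, item):
        # Constraint of the full state: push is not accepted when full.
        if len(self.items) == self.capacity:
            raise OverflowError("push is not accepted in the full state")
        self.items.append(item)

    def pop(self):
        # Constraint of the empty state: pop is not accepted when empty.
        if not self.items:
            raise IndexError("pop is not accepted in the empty state")
        return self.items.pop()

    def state_two(self):
        """Two-state configuration: empty / nonempty."""
        return "empty" if not self.items else "nonempty"

    def state_three(self):
        """Three-state configuration: empty / partially empty / full."""
        if not self.items:
            return "empty"
        if len(self.items) == self.capacity:
            return "full"
        return "partially empty"


stack = BoundedStack(capacity=2)
trace = []
for event in ("push", "push", "pop", "pop"):
    if event == "push":
        stack.push(0)
    else:
        stack.pop()
    # Record how each configuration labels the state reached by the event.
    trace.append((event, stack.state_two(), stack.state_three()))
```

In the recorded trace the two-state view labels both states reached by push as nonempty, so the point at which a further push becomes unacceptable is visible only in the three-state view.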
Differently configured SMs can have different understandabilities, even if the behaviors they describe are equivalent. For example, the two SMs for the bounded stack container above may have different understandabilities. If the former SM that has two states is used for development, the constraint for the full state can be overlooked because the full state is not explicitly presented in the SM. To improve the understandability of models, a large number of design refactoring rules and design patterns have been proposed and developed.
A quantitative measure for understandability is the first step in the construction and use of highly understandable SMs. To the best of our knowledge, no studies on metrics aimed at evaluating the understandability of SMs have ever been reported. Despite the importance of SM quality, in-depth studies of SM metrics for quality evaluation or fault prediction have not been conducted; elemental metrics such as counting the number of states or transitions are presented instead. Many SM generation approaches have in fact been proposed, but the quality of the generated SMs is not validated. One of the reasons can be attributed to the absence of quantitative measures for SM qualities.
Several elemental metrics of size and complexity have been proposed for SMs. However, using these metrics to evaluate understandability is problematic because smallness or simplicity is not necessary for understandability of SMs. The behavior of any system can be described by a SM with a single state. That SM is extremely small and simple. However, it can be more difficult to understand, because the single state captures all situations associated with the system. For example, the behavior of the bounded stack container can be represented as a SM with only one state. The SM may have as many transitions as the number of methods of the stack, e.g., two transitions for push() and pop(). However, such a SM does not differ from structural models such as class diagrams.
Cohesion and coupling are known as significantly influential factors in understandability. Cohesion is a measure of the degree to which the elements of a module belong together. In a strongly cohesive module, all elements are related to a single function. Such cohesive modules can be easier to understand. Coupling, on the other hand, is an indicator of the relationship between the measured module and others. High coupling between two modules can make it harder to understand either of them. By contrast, low coupling means that the module is self-contained. Thus, the module can be maintainable as well as understandable (the so-called KISS principle: "keep it simple, stupid!").
Our hypothesis is that states should distinctly show single situations for understandable SMs. In other words, the important factor for understandability is the simplicity of the states, not of the SM. We therefore believe that states should be highly cohesive and less coupled to be understandable. We also believe that SMs can be understandable if their states are understandable.
We propose an understandability metric, called the State machine Understandability Metric (SUM), which is based on the above hypothesis. First, let us define cohesion and coupling metrics. The cohesion metric is used to measure consistency among the state and transitions. Specifically, it calculates the degree of relationship between the condition of the state and the constraints of the associated transitions. The coupling metric is used to measure independence between states. Specifically, our coupling metric analyzes the similarity of situations captured by states, and counts the number of behaviorally dependent states. Finally, SUM is defined by mixing the cohesion and the coupling metrics.
To validate the efficacy of the proposed SUM, we conducted an experiment. In the experiment, we prepared five differently configured SMs for each of five systems, yielding 25 SMs in total. Ten question templates using major attributes of SMs were then prepared to objectively measure the actual understandabilities of the SMs. For the five systems (note: not for the SMs), individual sets of ten concrete questions were constructed from the question templates. To measure the actual understandability, 25 senior and 15 graduate students participated in the experiment. The understandabilities of each SM were then measured using the questions and groups of eight participants, yielding 40 participants in total. Two understandability indicators, understandability efficiency (UEff) and understandability correctness (UCor), were adopted for quantitative measurements.
To validate the efficacy of SUM, correlation analysis between the measured SUM values and the measured UEff and UCor of the 25 SMs was performed. We used Pearson's correlation coefficient as the analysis tool. The results showed that there are meaningful relationships between SUM and UEff/UCor (p-value = 0.003 and 0.027, respectively).
For metrics to be useful, they should also be able to evaluate the understandability of SMs for the same system. Therefore, we performed a consistency analysis to validate that the metrics can consistently evaluate the UEff or UCor of the SMs for the same system. The results indicated that SUM consistently has positive relationships with the UEff and UCor of four and five systems, respectively.
The remainder of this paper is organized as follows: Section 2 presents the definition of SM, a conceptual overview of understandability, and an outline of existing metrics. Section 3 defines SUM after defining the metrics of state cohesion and state coupling. Section 4 gives details of our experimental settings. Section 5 describes the results of our correlation and consistency analyses. Finally, Section 6 discusses threats to validity and Section 7 draws conclusions and proposes future work.
Similar to cohesion, the coupling of a SM can be determined from the couplings between its states. The coupling of a state can be determined from the number of other states sharing the same preceding and following states. If there are many other states sharing the same preceding and following states, the coupling of the state is strong. A state that is tightly coupled with other states is difficult to understand because additional understanding of behaviors in similar states is involved. Consequently, tightly coupled states make a SM hard to understand.
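The counting idea behind state coupling can be sketched as follows. This is not the paper's formula, which is not reproduced in this excerpt. It is one possible reading in which two states are counted as sharing context when their sets of preceding states and of following states are identical. The function name, the transition encoding, the example SM, and the decision to ignore self-loops are all assumptions.

```python
from collections import defaultdict


def state_coupling_counts(states, transitions):
    """For each state, count the other states that have exactly the same
    sets of preceding and following states (an assumed reading)."""
    preceding = defaultdict(set)
    following = defaultdict(set)
    for source, _event, target in transitions:
        if source == target:
            continue  # assumption: self-loops do not contribute
        following[source].add(target)
        preceding[target].add(source)

    def context(state):
        return (frozenset(preceding[state]), frozenset(following[state]))

    return {
        state: sum(
            1 for other in states
            if other != state and context(other) == context(state)
        )
        for state in states
    }


# Hypothetical SM: A and B are both entered from Idle and both lead to Done.
states = ["Idle", "A", "B", "Done"]
transitions = [
    ("Idle", "startA", "A"),
    ("Idle", "startB", "B"),
    ("A", "finish", "Done"),
    ("B", "finish", "Done"),
]
counts = state_coupling_counts(states, transitions)
```

Here A and B each receive a count of one, which reflects the intuition in the text: to understand A, a reader also has to understand how it differs from the similarly placed state B.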
It shows a comparison of the proposed metrics with existing size and complexity metrics on the SMs for Oven. It is easy to think that smaller and less complex SMs such as Oven3 are more understandable. However, this study argues that the cohesion and coupling of SMs, rather than size or complexity, have to be considered to obtain highly understandable SMs. For example, highly cohesive (high SMCOH) and loosely coupled (low SMCOUP) SMs such as Oven1 can be more understandable than SMs that are merely smaller and less complex, such as Oven3. Moreover, SUM, which considers both cohesion and coupling, can be a major indicator of SM understandability because cohesion and coupling in SMs are complementary. In the next two sections, we give experimental results for this contention.
We defined two hypotheses with regard to SUM. The first hypothesis concerns the relationship between UEff and SUM. This hypothesis is based on the assumption that the more highly cohesive and lower coupled SMs can be understood faster. Consider that two or more subject groups are assigned different SMs that have different SUM values and they understand the behavior of the system using the exposed SMs, i.e., they answer the given questions. We investigate whether the subjects in the groups exposed to the SMs having the higher SUM can correctly answer the question faster than those exposed to the SMs having the lower SUM. In the second hypothesis, we investigate whether the SUM can also represent UCor. Using the SM assigning example presented above, if our assumption is correct, the subjects assigned to the SMs with the higher SUM will also get more correct answers than those assigned to the SMs with the lower SUM. That is, if SMs are more highly cohesive and lower coupled, subjects should more correctly understand the systems using the SMs.
The human subjects who participated were 25 fourth-year and 15 graduate students in the Department of Computer Science Engineering at Pusan National University. The experiment was performed in May 2010. The 40 participants had already completed more than one software engineering course. The courses taught them base knowledge such as state-based models, UML, and formal languages such as OCL to understand and analyze SMs. To help them strengthen their understanding of SMs, they performed exercises to construct SMs from contract-based specifications and to generate test cases using SMs. Additionally, to minimize deviation between the participants, an exercise was given to solve sample SM understandability questions with guidelines for answering the questions. For each system, the experiment was designed as one factor with more than two treatments using a completely randomized design [48], i.e., each subject received only one SM per system and was assigned randomly to that SM. In this experiment, we had five SMs for each of five different systems; therefore, 25 SMs were evaluated with 40 subjects. For each SM, 10 questions were prepared and participants answered the questions. For each system, eight subjects were assigned to each SM. In other words, each SM was evaluated eight times, and in total we had 200 evaluations. Five SMs randomly selected for the five different systems were given to each subject. The SMs were given to subjects in random order. In other words, the same SM was examined by eight different subjects in different iterations.
The solving time was limited to 15 min for each SM. We asked that participants who solved all the questions pertaining to a SM within the 15 min record the time and submit. The experiment took a total of 90 min including 15 min for preparation and 75 min for the five iterations.
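The assignment arithmetic of this design (40 subjects, five systems, five SMs per system, eight subjects per SM, 200 evaluations) can be checked with a short sketch. The function name, the subject and system labels, and the seed are hypothetical; the authors' actual randomization procedure is not described beyond being completely randomized, and the random presentation order per subject is not modeled here.

```python
import random


def assign_subjects(subjects, systems, sms_per_system, seed=None):
    """For every system, split a fresh random shuffle of the subjects into
    equally sized groups, one group per SM of that system."""
    rng = random.Random(seed)
    group_size, remainder = divmod(len(subjects), sms_per_system)
    if remainder:
        raise ValueError("subjects cannot be split evenly across the SMs")
    assignment = {}  # (system, SM number) -> list of subjects
    for system in systems:
        shuffled = list(subjects)
        rng.shuffle(shuffled)
        for sm_index in range(sms_per_system):
            start = sm_index * group_size
            group = shuffled[start:start + group_size]
            assignment[(system, sm_index + 1)] = group
    return assignment


subjects = [f"S{i:02d}" for i in range(1, 41)]      # 40 hypothetical subject IDs
systems = [f"system-{i}" for i in range(1, 6)]      # 5 hypothetical system labels
assignment = assign_subjects(subjects, systems, sms_per_system=5, seed=2010)
total_evaluations = sum(len(group) for group in assignment.values())
```

Because each system gets its own shuffle, every subject sees exactly one SM per system, and a subject's group mates differ from system to system.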
The method utilized is one that is widely used to analyze the relationship between continuous variables [50]. To test the linear relationship between two continuous variables, Pearson's correlation analysis can be considered for parametric analysis under the assumption that at least one of the variables is normally distributed. Pearson's coefficient of correlation r can take any value between -1 and 1; a positive value signifies a positive correlation, while a negative value signifies a negative correlation of the variables. In addition, we analyzed the partial correlations to examine genuine relationships between SUM and understandability when the effects of systems have been removed. The relationship between the two variables is close to a straight line if r is close to 1 or -1. On the other hand, if r has a value close to zero, the relationship between the two variables is not linear.
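Pearson's r and the two-sided p-value derived from it can be sketched with the standard library alone. The sketch assumes the usual test statistic t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom and integrates the Student t density numerically with Simpson's rule; the function names are illustrative and this is not the statistics package the authors used. The pair r = 0.396, n = 25 is the example quoted in the text.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two samples of equal length >= 3")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(r, n, steps=2000):
    """Two-sided p-value for H0: no linear correlation, given r and n."""
    if n < 3:
        raise ValueError("need at least 3 data points")
    if not -1.0 < r < 1.0:
        return 0.0
    df = n - 2
    t = abs(r) * math.sqrt(df / (1.0 - r * r))
    # Simpson's rule for the area under the density between 0 and t
    # (steps must be even).
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3.0
    return max(0.0, 1.0 - 2.0 * area)


p_example = two_sided_p_value(0.396, 25)
```

For r = 0.396 and n = 25 the result lies very close to 0.05, and larger coefficients give smaller p-values for the same n.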
Significance (expressed as p-value) can be derived from a coefficient. For example, the p-value is 0.05 if the coefficient value is 0.396 when the number of data points is 25. The p-value is low if the coefficient value is high. In this research, we used significance level α = 0.05, which is the level generally used; a result is treated as significant only when the probability of observing it under the null hypothesis of no correlation is below 5%. That is, we can statistically confirm the metric as a meaningful indicator to evaluate the understandability of SMs if p is less than 0.05. Similar to the relationship between SUM and UEff, the relationship between SUM and UCor was also found to be positive and significant. The second row in Table 10 shows the results of Pearson's correlation analysis between SUM and UCor. As shown in the table, in the 25 SMs the linear relationship between SUM values and the UCor values measured with the subjects was statistically significant (p = 0.027) at a level of 0.05 (p ≤ 0.05). The Pearson's correlation analysis results do not take into account the effects of the SMs' systems. Appropriate tests of the bivariate correlation require that the contextual factor, i.e., system, be partialled from the analysis. Table 11 shows the partial correlations when