壓縮包內(nèi)含有CAD圖紙和說明書,均可直接下載獲得文件,所見所得,電腦查看更方便。Q 197216396 或 11970985
一個(gè)有關(guān)移動(dòng)機(jī)器人定位的視覺傳感器模型
Matthias Fichtner Axel Gro_mann
Arti_cial Intelligence Institute
Department of Computer Science
Technische Universitat Dresden
Technical Report WV-03-03/CL-2003-02
摘要
我們提出一個(gè)在走廊和傳感器模型凌亂的奧西環(huán)境下的概率估計(jì)。該模型是基于在相機(jī)中所獲得的圖像特性與本發(fā)明的特征的環(huán)境中的一個(gè)給定的三維幾何模型的比較。所涉及的技術(shù)比最先進(jìn)的攝影測(cè)量方法還簡(jiǎn)單。這使得模型用于概率機(jī)器人的定位方法中。此外,他非常適合傳感器融合。傳感器模型已經(jīng)用于蒙特卡羅定位跟蹤的移動(dòng)機(jī)器人位置導(dǎo)航任務(wù)里。實(shí)證結(jié)果提出了這個(gè)應(yīng)用程序。
1 介紹
準(zhǔn)確的本地化問題是移動(dòng)機(jī)器人技術(shù)的基礎(chǔ)。為了成功的解決復(fù)雜的任務(wù),自主機(jī)器人必須可以正確的可靠地估計(jì)其目前的狀態(tài)。該定位方法的選擇通常取決于傳感器的種類和號(hào)碼,有關(guān)操作環(huán)境的先驗(yàn)知識(shí),和可用的計(jì)算資源。最近,應(yīng)用導(dǎo)航技術(shù)已經(jīng)越來越受歡迎。在室內(nèi)機(jī)器人的技術(shù)中,我們可以區(qū)分的方法是古人的攝影測(cè)量和計(jì)算機(jī)視覺,該方法起源于人工智能機(jī)器人。以視覺為基礎(chǔ)的導(dǎo)航技術(shù)的發(fā)展的一個(gè)重要的技術(shù)貢獻(xiàn)是通過識(shí)別來自未知的觀點(diǎn)在使用特征不變的特征單幅圖像的3D對(duì)象的工作。后來,這種技術(shù)拓展到全球定位和同步地圖建設(shè)。
最后系統(tǒng)進(jìn)行位置跟蹤通過使用一個(gè)環(huán)境的幾何模型和統(tǒng)計(jì)模型的機(jī)器人的不確定性運(yùn)動(dòng)姿勢(shì)和吩咐。機(jī)器人的位置是由一個(gè)高斯分布表示,并通過卡爾曼濾波更新。尋找相應(yīng)的攝像機(jī)圖像和模型的特點(diǎn)進(jìn)行了優(yōu)化,由此突出姿勢(shì)的不確定性為攝像機(jī)圖像。
在蒙特卡洛定位的基礎(chǔ)上,凝結(jié)算法已經(jīng)成功的應(yīng)用于導(dǎo)游機(jī)器人。這種基于視覺的貝葉斯過濾技術(shù)利用一個(gè)基于采樣的密度表示。在結(jié)論的對(duì)比中,他可以代表多峰概率分布。由于天花板的視覺地圖,它能定位全球范圍內(nèi)使用標(biāo)量的亮度測(cè)量機(jī)器人。結(jié)合了視覺距離特性和視覺地標(biāo)在機(jī)器人中的應(yīng)用程序提出了一個(gè)建立產(chǎn)線的方法。由于他們的方法依賴于專業(yè)的標(biāo)志性建筑,它并不適用于一次環(huán)境。
我們工作的目的是開發(fā)一種概率傳感器模式的攝像機(jī)位置估計(jì)。給定一個(gè)3D幾何環(huán)境地圖,我們想近似測(cè)量當(dāng)前相機(jī)圖像在某個(gè)地方獲得的機(jī)器人的工作環(huán)境。我們使用這個(gè)傳感器模型與MCL跟蹤在走廊移動(dòng)機(jī)器人導(dǎo)航的位置??赡艿脑?,它也可以用于定位在雜亂的辦公環(huán)境和基于形狀的物體的檢測(cè)。
一方面,我們結(jié)合攝影技術(shù)基于地圖的特征攝影與韌帶的靈活性和魯棒性,如處理本地化歧義的能力。另一方面,特征匹配操作應(yīng)該足夠快,以允許傳感器融合。除了視覺輸入,我們想使用距離讀數(shù)從聲波和激光來提高定位精度。
本文組織如下。在第二節(jié)中,我們討論之前的工作。在第三節(jié)中,我們描述了視覺傳感器模型的組件。在第四節(jié)中,我們目前的實(shí)驗(yàn)結(jié)果通過MCL位置跟蹤。我們的結(jié)論在第五節(jié)。
2 相關(guān)工作
在經(jīng)典的基于模型的構(gòu)成方法測(cè)定中,我們可以區(qū)分兩個(gè)相互關(guān)聯(lián)的問題。對(duì)應(yīng)的問題是涉及到求相應(yīng)的模型和圖像特征。這種映射發(fā)生之前,位置模型的模型特性生成使用給定的相機(jī)姿勢(shì)。據(jù)說特性可以匹配如果他們相互靠近。而造成的問題由3D相機(jī)坐標(biāo)對(duì)世界起源的模型考慮到相對(duì)應(yīng)的功能結(jié)束。顯然,有一個(gè)問題需要對(duì)方事先解決,這使得解決任何耦合問題非常困難。
上面的經(jīng)典問題的解決方案如下假設(shè)—測(cè)試途徑:
(1) 給定一個(gè)攝像機(jī)位置估計(jì),最佳匹配特征對(duì)群體提供初始猜測(cè)。
(2) 對(duì)于每一個(gè)假說,估計(jì)的相對(duì)相機(jī)的姿勢(shì)被給定計(jì)算誤差的函數(shù)最小化。
(3) 現(xiàn)在有一個(gè)更準(zhǔn)確的姿勢(shì)估計(jì)用于每個(gè)假說,其余模型特性投射到的圖像與使用相機(jī)的姿勢(shì)有關(guān)。匹配的質(zhì)量評(píng)估使用合適的誤差函數(shù),在所有的假設(shè)收益率中排名。
(4) 排名最高的假設(shè)被選中。
注意,對(duì)應(yīng)的問題解決步驟(1)和(3),構(gòu)成問題(2)和(4)。
該算法的性能將取決于所使用的特征的類型,例如邊緣,線段,或色彩,以及圖像和模型之間的相似性度量的選擇,在這里被稱為誤差函數(shù)。線段是我們的選擇,因?yàn)樗麄兛梢栽诟淖児庹諚l件下被相對(duì)可靠地檢測(cè)到。作為世界模型,我們使用的操作環(huán)境是一個(gè)線框模型,在VRML中表示。設(shè)計(jì)一個(gè)合適的相似性度量是更加困難的。
原則上,誤差函數(shù)是基于取向差異相應(yīng)行段在圖像和模型中,他們的距離和長度的差異,為了減少重要性,考慮到目前多有的功能對(duì)。這個(gè)已經(jīng)建立在以下三種常見的措施中。3D被定義為模型線端點(diǎn)之間距離的總和和相應(yīng)的平面。這一措施很大程度上依賴于與相機(jī)相反投影的距離。e2D稱為無限圖像行,是預(yù)測(cè)模型線的垂直距離對(duì)應(yīng)的端點(diǎn)與無限延伸線在平面上的圖像。雙測(cè)度,e2D2,簡(jiǎn)稱為在有限模型的線條,是總和超過所有的圖像線條端點(diǎn)對(duì)應(yīng)在圖像平面奈特雷擴(kuò)展模型的行。
圖1:視覺傳感器模型的處理步驟
限制搜索空間的匹配步驟,提出了限制可能的對(duì)應(yīng)數(shù)量,對(duì)于一個(gè)給定的姿勢(shì)估計(jì)通過結(jié)合線特性引入感性結(jié)構(gòu)。因?yàn)檫@些初始的對(duì)應(yīng)是通過e2D1和e2D2評(píng)價(jià)很高的要求強(qiáng)加在初始姿態(tài)估計(jì)與圖像處理作業(yè),包括去除失真和噪聲和特征提取的精度。假設(shè)在完整的長度下獲得所有可見模型。已經(jīng)證明了一些異常值會(huì)嚴(yán)重影響最初的通訊,在老的原始方法中由于頻繁截?cái)嘁鸬木€路壞下,阻塞和雜物。
3 傳感器模型
我們的方法是出于這個(gè)問題是否解決通信問題的估計(jì)可以避免相機(jī)姿勢(shì)。相反,我們建議進(jìn)行的圖像和模型的功能相對(duì)簡(jiǎn)單,直接匹配。我們想調(diào)查的準(zhǔn)確性和魯棒性是這種方式所能達(dá)到的水平。
參與本方法的處理步驟示于圖1.從相機(jī)中取出圖像失真后,我們用算子提取邊緣。這個(gè)操作符合在改變光照條件下提取。從邊緣線段確定結(jié)束點(diǎn)的坐標(biāo)被忽略。在這種表示中,截?cái)嗷蚍指罹€將有類似的坐標(biāo)在霍夫空間。同樣,行3D地圖投影到圖像平面使用攝像機(jī)姿態(tài)估計(jì),并考慮到可見性約束,被表示為坐標(biāo)空間。我們已經(jīng)在匹配步驟中設(shè)計(jì)了幾種誤差函數(shù)被用來作為相似性度量。他們?cè)谙旅婷枋觥?
中心匹配計(jì)數(shù)
該RST相似性度量是基于線段在霍夫空間的距離。我們認(rèn)為只能在一個(gè)長方形細(xì)胞內(nèi)的霍夫空間模型周圍為中心的功能,這些圖像特征作為可能匹配,對(duì)這些匹配結(jié)果進(jìn)行計(jì)數(shù),并由此對(duì)產(chǎn)生的總和歸一化。從模型特征來測(cè)量措施應(yīng)該是不變的,就在3D地圖或是經(jīng)營環(huán)境的變化不是模仿的對(duì)象。通過歸一化得到明顯特點(diǎn)的人數(shù)的不變性。具體而言,集中匹配計(jì)數(shù)測(cè)量SCMC被定義為:
其中未命名的使用距離參數(shù)的有效匹配。一般來說,這種相似性度量計(jì)算預(yù)期模型的比例至少由一個(gè)測(cè)量圖像點(diǎn)組成。注意,這里端點(diǎn)坐標(biāo)和長度都不考慮。
網(wǎng)格長度匹配
所述第二相似性度量是基于行組的總長度值的比較。圖像中分割線條組合在一起使用一個(gè)統(tǒng)一的離散空間。這種方法類似于霍夫變換擬合的直線。執(zhí)行相同的線路段的三維模型。讓測(cè)量線的長度落入網(wǎng)格單元,同樣根據(jù)模型,然后網(wǎng)格長度匹配測(cè)量是:
對(duì)所有包含模型的所有特性的網(wǎng)格單元,這種方法測(cè)量的長度與預(yù)期的來衡量。再次,該映射是有方向性的,也就是說,該模型被用作參考,以獲得噪聲,雜波和動(dòng)態(tài)對(duì)象的不確定性。
近鄰和豪斯多夫距離
此外,我們嘗試了兩個(gè)通用的方法對(duì)兩組幾何實(shí)體的比較:近鄰和豪斯多夫距離。詳細(xì)信息請(qǐng)參閱【7】。都依賴于距離函數(shù),我們基于離散坐標(biāo)空間,即線路參數(shù)和可選的長度,線性和指數(shù)的方式。查看完整描述【5】。
常見的錯(cuò)誤功能
為了比較,我們還實(shí)施了常用的誤差函數(shù)e3D,e2D1和e2D2,在笛卡爾空間中定義,我們代表行黑森符號(hào)xsinφ-ycosφ=d。通用誤差函數(shù)f,我們定義了相似度測(cè)量:
其中M是一組測(cè)量線,E是一組預(yù)期的行。在e2D1的情況下,f是由兩個(gè)模型之間的垂直距離線端點(diǎn)e1,e2,和無限擴(kuò)展的圖像行定義為:
同樣,雙重相似性度量,通過使用誤差函數(shù)e2D2,基于圖像之間的垂直距離線端點(diǎn)和無限擴(kuò)展行模式。
回憶的誤差函數(shù)e3D模型線端點(diǎn)的距離成正比,視圖平面通過一個(gè)圖像行和相機(jī),我們可以使用f3D實(shí)例化方程定義為:
獲取概率
理想情況下,我們希望返回單遞減值的相似性度量方法用于預(yù)測(cè)模型的姿勢(shì)估計(jì)和實(shí)際偏離特性的姿勢(shì)估計(jì)。一般,我們的目標(biāo)是在一個(gè)有效而簡(jiǎn)單的視覺傳感器模型中,抽象的想法是特定的姿勢(shì)和環(huán)境條件平均了大量的,獨(dú)立的情況。對(duì)于公度,我們想表達(dá)模型機(jī)器人的相對(duì)坐標(biāo)而不是絕對(duì)坐標(biāo)。換句話說,我們假設(shè)測(cè)量m的概率,鑒于姿勢(shì)lm的這張照片姿勢(shì)估計(jì)W的世界模型,等于這測(cè)量三維姿勢(shì)偏差的概率和世界模型。
由視覺傳感器模型返回的概率是通過簡(jiǎn)單的比例獲得:
4 實(shí)驗(yàn)結(jié)果
我們已經(jīng)在一系列實(shí)驗(yàn)中提出了評(píng)估傳感器模型的相似措施。使用理想化條件下人為創(chuàng)建圖像,然后我們將扭曲和噪聲添加到測(cè)試圖像中。隨后,我們使用在走廊已經(jīng)獲得的機(jī)器人真實(shí)圖像。最后,我們使用了傳感器模型穿越走廊跟蹤機(jī)器人的位置。在所有這些情況下,得到的三維可視化模型,他被用來評(píng)估解決方案。
模擬使用人為創(chuàng)造的圖像
作為第一種評(píng)價(jià),我們生成的合成圖像特性通過從某個(gè)相機(jī)姿勢(shì)中生成一個(gè)視圖的模型。一般來說,我們從右分支到左邊重復(fù)圖1。通過引入帶來Δl,我們可以直接顯示其影響相似度值。對(duì)可視化的目的,平移偏差Δx和Δy組成一個(gè)空間偏差Δt。初步實(shí)驗(yàn)顯示只有微不足道的差異時(shí),他們被認(rèn)為是獨(dú)立的。
圖2 :在人為創(chuàng)造的圖像中CMC的性能
對(duì)于每一個(gè)上面給出的相似性度量,至少1500萬個(gè)隨機(jī)相機(jī)姿勢(shì)加上一個(gè)隨機(jī)造成偏差的范圍構(gòu)成收益率模型。中央軍委測(cè)量的結(jié)果如圖2所示。使用GNUPLOT的平滑算子得到表面的3D圖。我們注意到一個(gè)獨(dú)特的峰值在零偏差處理單遞減相似性值誤差增加。請(qǐng)注意,這個(gè)簡(jiǎn)單的測(cè)量認(rèn)為端點(diǎn)坐標(biāo)和長度的行。然而,我們已經(jīng)取得了良好的結(jié)果。
而由此產(chǎn)生的GLM,CMC類似于一個(gè)標(biāo)準(zhǔn)曲線,峰值則更加與眾不同。這符合我們的預(yù)期,因?yàn)樵谶@里考慮到圖像和模型線非常的重要。與中央的測(cè)量相比,在這種方法中會(huì)有偶然的錯(cuò)誤匹配,由于不同的長度。
緊鄰的衡量是不使用的。雖然線性和指數(shù)加權(quán)方案嘗試,即使考慮到線段的長度,沒有獲得獨(dú)特的高峰,在進(jìn)一步考慮中導(dǎo)致其被排除。
基于豪斯多夫距離測(cè)量的表現(xiàn)不如前兩個(gè),CMC和GLM,但是它的方式是所需要的。但其溫和的表現(xiàn)并不支付其最長的計(jì)算時(shí)間消耗在所有提出的措施中,并隨后被忽視。到目前為止,我們已經(jīng)表明我們的相似性措施可執(zhí)行。接下來,我們將使用誤差函數(shù)在這個(gè)框架中。在我們的設(shè)置函數(shù)e2D1表現(xiàn)很好。由此產(chǎn)生的曲線非常相似措施GLM中的曲線。這兩種方法都表現(xiàn)出一種獨(dú)特的峰值在正確的位置姿態(tài)中偏差為零。注意,線段的長度直接影響測(cè)量返回的相似度值,雖然兩個(gè)線性屬性有助于衡量e2D1。令人驚訝的是,其他兩個(gè)誤差函數(shù)e2D2和e3D表現(xiàn)不佳。
更現(xiàn)實(shí)的條件
在我們的傳感器模型中為了排除扭曲和嘈雜的圖像數(shù)據(jù),我們描述進(jìn)行另一組實(shí)驗(yàn)。為此,我們應(yīng)用一下所有綜合誤差模型生成的圖像特征匹配模型特性。每個(gè)原始行被復(fù)制的小概率(p值=0.2),并移入空間。任何線路長度超過30 像素被分成以概率p=0.3.此外,設(shè)有不存在于模型和噪聲通過添加隨機(jī)線條圖像中的均勻分布進(jìn)行了模擬。本方向是根據(jù)電流分布的角度產(chǎn)生相當(dāng)?shù)牡湫吞卣鳌?
從第一組實(shí)驗(yàn)中,這些模擬的結(jié)果沒有顯著地差異。相似度在零偏差的最大值降低,所有的形狀和特征相似性措施仍在考慮保持不變。
使用真實(shí)的走廊照片
對(duì)現(xiàn)實(shí)世界的情況自上述模擬結(jié)果可能是有問題的,我們進(jìn)行了另一組實(shí)驗(yàn)取代合成特性測(cè)量的實(shí)際相機(jī)圖像。比較結(jié)果為各種參數(shù)設(shè)置,我們收集圖片2在先驅(qū)機(jī)器人在走廊上離線和記錄功能。對(duì)于走廊示例典型的觀點(diǎn),機(jī)器人姿態(tài)在三維空間中的兩個(gè)不同的位置幾乎被離散化。后將手動(dòng)機(jī)器人在每個(gè)頂點(diǎn)歸零,它執(zhí)行一個(gè)完整的當(dāng)場(chǎng)將逐步記錄圖像。這樣可以確保最高的準(zhǔn)確性造成坐標(biāo)和圖像想關(guān)聯(lián)。這樣,已經(jīng)有超過3200張圖片收集到64個(gè)不同的位置(x,y)。同樣以上面的模擬,對(duì)姿勢(shì)進(jìn)行了系統(tǒng)選擇從所涵蓋的測(cè)量范圍內(nèi)。通過傳感器模型指的姿態(tài)偏差Δl中的相同的離散值計(jì)算出的值,根據(jù)公式2中的假設(shè)進(jìn)行平均。
圖3:GLM對(duì)從走廊真實(shí)圖像中表現(xiàn)
結(jié)果可視化的相似性度量空間(x,y)和旋轉(zhuǎn)偏離正確的相機(jī)姿態(tài)CMC衡量展覽一個(gè)獨(dú)特的峰值約零點(diǎn)誤差。當(dāng)然,由于數(shù)量小得多的數(shù)據(jù)樣本相對(duì)于模擬使用合成數(shù)據(jù),曲線的形狀更崎嶇不平,但這是符合我們的期望。
采用此設(shè)置中的GLM措施的結(jié)果示于圖3。因?yàn)樗沂玖艘粋€(gè)更獨(dú)特的峰值相比,CMC的曲線,他或多或少的演示了特征圖譜之間的比較,考慮到了線路的長度。
蒙特卡洛本地化使用了視覺傳感器模型
我們的目標(biāo)是設(shè)計(jì)一個(gè)概率傳感器相機(jī)安裝在移動(dòng)機(jī)器人的模型上,我們繼續(xù)呈現(xiàn)的結(jié)果應(yīng)用到移動(dòng)機(jī)器人的本地化上。
傳感器的通用接口模型允許它用于貝葉斯本地化方法的校正步驟,例如,標(biāo)準(zhǔn)版的蒙特卡洛本地化算法。統(tǒng)計(jì)獨(dú)立以來,傳感器讀書中呈現(xiàn)了一個(gè)基本假設(shè)線,我們希望的是使用相機(jī)來代替獲得更高的準(zhǔn)確度和魯棒性除了常用的距離傳感器還有聲吶獲激光。
圖4 :本地化圖像和預(yù)測(cè)模型
在走廊實(shí)驗(yàn)中,移動(dòng)機(jī)器人配備了固定安裝的CCD相機(jī)必須按照預(yù)定的路線形狀的雙重循環(huán)。途中,他必須停在八個(gè)預(yù)定義的位置,或者打開視圖在附近的一個(gè)角落里。每個(gè)圖像捕獲啟動(dòng)MCL的所謂校正步驟和所有樣本的權(quán)重,根據(jù)該傳感器模型重新計(jì)算,收益率密度最高的樣品可能在重新采樣正確的姿勢(shì)坐標(biāo)。在上述預(yù)測(cè)步驟中,將整個(gè)樣本集根據(jù)機(jī)器人的運(yùn)動(dòng)模型和當(dāng)前測(cè)距傳感器的讀數(shù)移入空間。
我們的初步結(jié)果看起來很有希望。在位置跟蹤實(shí)驗(yàn)中,機(jī)器人被賦予了其開始位置的估計(jì)值,大部分時(shí)間里,機(jī)器人的位置假設(shè)在最好的姿勢(shì)。在這個(gè)實(shí)驗(yàn)中,我們已經(jīng)使用了CMC措施,在圖4中,一個(gè)典型的相機(jī)如圖所示,而機(jī)器人如下所示請(qǐng)求路徑。灰度級(jí)圖像描繪了視覺輸入的失真排除和預(yù)處理后的特征提取,也顯示提取線特征。此外,世界模型是根據(jù)兩個(gè)伸出姿勢(shì),該測(cè)距跟蹤的構(gòu)成和通過MCL計(jì)算出的估計(jì)值,他大約對(duì)應(yīng)于正確的姿勢(shì),在他們之間,我們觀察和旋轉(zhuǎn)誤差。
畫面還顯示,旋轉(zhuǎn)誤差對(duì)巧合特性有很強(qiáng)的影響程度。對(duì)應(yīng)于上述給出的結(jié)果,數(shù)據(jù)表現(xiàn)出旋轉(zhuǎn)偏差比平移偏差表現(xiàn)出更高的梯度。這一發(fā)現(xiàn)可以解釋出運(yùn)動(dòng)裸關(guān)節(jié)的功能空間,因此我們的相機(jī)傳感器模型將在檢測(cè)旋轉(zhuǎn)中個(gè)出現(xiàn)分歧。像我們的先鋒軸承旋轉(zhuǎn)測(cè)程法遠(yuǎn)高于平移誤差,這個(gè)屬性使它特別適用于雙輪驅(qū)動(dòng)的機(jī)器人。
5 結(jié)論和未來的工作
我們已經(jīng)提出了一個(gè)概率傳感器模型攝像機(jī)位置估計(jì)。通用的設(shè)計(jì)使得它適合與距離傳感器融合來感知來自其他的傳感器。在理想和現(xiàn)實(shí)的條件下,我們展示了廣泛的模擬和確定了合適的相似性度量。傳感器模型的本地化任務(wù)的移動(dòng)機(jī)器人的申請(qǐng)符合我們的預(yù)測(cè)偏差。在本文中我們強(qiáng)調(diào)了很多可以改進(jìn)的空間。
我們正在研究合適的技術(shù)來定量地評(píng)價(jià)定位算法的設(shè)計(jì)傳感器模型的移動(dòng)機(jī)器人的性能。這將使我們能夠嘗試雜亂的環(huán)境和動(dòng)態(tài)對(duì)象。結(jié)合相機(jī)傳感器模型使用的距離傳感器融合信息呈現(xiàn)強(qiáng)勁的下一個(gè)步驟和導(dǎo)航。因?yàn)橛杏玫墓δ軘?shù)量顯著變化作為機(jī)器人穿越室內(nèi)環(huán)境,這個(gè)想法駕馭相機(jī)對(duì)準(zhǔn)更豐富的視圖(主動(dòng)視覺)提供了一個(gè)很有前途的研究路徑,以及強(qiáng)大的導(dǎo)航功能。
參考文獻(xiàn)
[1] F. Dellaert, W. Burgard, D. Fox, and S. Thrun. Using the condensation algorithm for robust, vision-based mobile robot localisation. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 1999.
[2] D. DeMenthon, P. David, and H. Samet. SoftPOSIT: An algorithm for registration of 3D models to noisy perspective images combining Softassign and POSIT. Technical report, University of Maryland, MD, 2001.
[3] G. N. DeSouza and A. C. Kak. Vision for mobile robot navigation: A survey. IEEE Trans. on Pattern Analysis and Machine Intelligence, 24(2):237{267, 2002.
[4] S. Enderle, M. Ritter, D. Fox, S. Sablatnog, G. Kraetzschmar, and G. Palm. Soccer-robot localisation using sporadic visual features. In Intelligent Autonomous Systems 6, pages 959{966. IOS, 2000.
[5] M. Fichtner. A camera sensor model for sensor fusion. Master's thesis, Dept. of Computer Science, TU Dresden, Germany, 2002.
[6] S. A. Hutchinson, G. D. Hager, and P. I. Corke. A tutorial on visual servo control. IEEE Trans. on Robotics and Automation, 12(5):651{ 670, 1996.
[7] D. P. Huttenlocher, G. A. Klanderman, and W. J. Rucklidge. Comparing images using the Hausdor_ distance. IEEE Trans. on Pattern Analysis and Machine Intelligence, 15(9):850{863, 1993.
[8] A. Kosaka and A. C. Kak. Fast vision-guided mobile robot navigation using model-based reasoning and prediction of uncertainties. Com- puter Vision, Graphics, and Image Processing { Image Understanding, 56(3):271{329, 1992.
[9] R. Kumar and A. R. Hanson. Robust methods for estimating pose and a sensitivity analysis. Computer Vision, Graphics, and Image Processing { Image Understanding, 60(3):313{342, 1994.
[10] D. G. Lowe. Three-dimensional object recognition from single twodimensional images. Arti_cial Intelligence, 31(3):355{395, 1987.
[11] S. Se, D. G. Lowe, and J. Little. Vision-based mobile robot localization and mapping using scale-invariant features. In Proc. of the IEEE Int. Conf. on Robotics and Automation, pages 2051{2058, 2001.
[12] G. D. Sullivan, L. Du, and K. D. Baker. Quantitative analysis of the viewpoint consistency constraint in model-based vision. In Proc. of the 4th Int. IEEE Conf. on Computer Vision, pages 632{639, 1993.
- 12 -
任務(wù)書
題目
多用途輪式自動(dòng)行走小車
論文時(shí)間
20**年 2月 24日 至 20**年6 月14 日
課題的
主要內(nèi)容及要求
(含技術(shù)要求、圖標(biāo)要求等)
內(nèi)容:設(shè)計(jì)一輪式行走方式的自動(dòng)行走小車,作為多用途的實(shí)驗(yàn)用載體平臺(tái),可在其上進(jìn)一步安裝多種作業(yè)機(jī)械,如收獲機(jī)械手等其他機(jī)構(gòu),滿足在田間和硬質(zhì)土地路面行走的要求,在此機(jī)械產(chǎn)品基礎(chǔ)上,加裝電氣控制機(jī)構(gòu),可達(dá)到自動(dòng)行走進(jìn)和遙控行走的功能。
要求:1行走速度:≤1m/s
2負(fù)載能力:≤100kg
3行走方式:輪式
4防碰撞探測(cè):機(jī)械方式
5工作平臺(tái)提升方式:半閉環(huán)控制機(jī)械方式提升
6工作平臺(tái)提升高度:500mm
7外形尺寸:寬≤600mm
課題實(shí)施的方法、步驟及工作量要求
1. 查閱有關(guān)資料,了解國家或行業(yè)對(duì)自動(dòng)行走小車的要求等;
2. 確定試驗(yàn)方案,擬定輪式自動(dòng)行走小車的機(jī)械部分結(jié)構(gòu)設(shè)計(jì);
3. 確定工程圖樣,完成零件圖,裝配圖和總裝圖,完成圖紙工作量累計(jì)3張A0圖紙以上;
4. 完成外文翻譯漢子3000以上;
5. 完成設(shè)計(jì)說明書1萬字以上;
指定參考
文獻(xiàn)
[1] 廖權(quán)來等.《電動(dòng)汽車的關(guān)鍵技術(shù)及發(fā)展對(duì)策》. 中國機(jī)械工程[M] ,1998
[2] 劉麗.《世界電動(dòng)汽車最新發(fā)展綜述》[J].汽車電器.1993
[3] 時(shí)瑞祥.《氫氣發(fā)動(dòng)機(jī)小型內(nèi)燃機(jī)》[M].1994
[4] 王文清.《世界汽車工業(yè)發(fā)展趨勢(shì)》[J].汽車電器 1991
[5] 張準(zhǔn)、汪鳳泉編著,《振動(dòng)分析》[M].東南大學(xué)出版社 1991.
[6] 崔存.《現(xiàn)代汽車新技術(shù)》[M].人民交通出版社
[7] 廖大方、王志金譯.《未來汽車技術(shù)》[J].武漢汽車工業(yè)大學(xué)科技情報(bào)所
[8] 劉巖、丁玉蘭、林逸.《汽車高速振動(dòng)仿真與試驗(yàn)研究》[M].2000.17
[9] 魯麥特《車輛動(dòng)力學(xué)—模擬及其方法》[J]北京理工大學(xué)出版社 2002
[10] 龐勇、賀益康.《基于專用控制芯片的永磁無刷直流電機(jī)控制器》[J]1999
[11] 侯志祥、李小穎.《我國21世紀(jì)電動(dòng)汽車發(fā)展的對(duì)策交道科技》2002
[12]《牽引用鉛酸蓄電池產(chǎn)品品種和規(guī)格標(biāo)準(zhǔn)》[J ] GB 7403.2 1987
畢業(yè)設(shè)計(jì)(論文)進(jìn)度計(jì)劃(以周為單位)
第1周(20**年2月24日----20**年2月28日):
下達(dá)設(shè)計(jì)任務(wù)書,明確任務(wù),熟悉課題,收集資料,上交外文翻譯、參考文獻(xiàn)和開題報(bào)告;
第2周--第3周(20**年3月3日--20**年3月14日):
根據(jù)本課題的要求,制定總體方案,繪制草圖;
第4周--第5周(20**年3月17日--20**年3月28日):
完成設(shè)計(jì)方案,擬定自動(dòng)行走小車的設(shè)計(jì)草圖;
第6周--第7周(20**年3月31日--20**年4月11日):
完成自動(dòng)小車的設(shè)計(jì)總圖及有關(guān)零件設(shè)計(jì)圖;
第8周(20**年4月14日--20**年4月18日):
提交第1-8周的《指導(dǎo)記錄表》和已做的畢業(yè)設(shè)計(jì)內(nèi)容,由指導(dǎo)老師初審后上交學(xué)院
第9周--第13周(20**年4月21日--20**年5月23日):
在指導(dǎo)老師的指導(dǎo)下修改并完成設(shè)計(jì),完成相關(guān)設(shè)計(jì)圖紙,同時(shí)撰寫畢業(yè)設(shè)計(jì)說明書,并提交指導(dǎo)老師初審;
第14周--第16周(20**年5月26日--20**年6月14日):
修改畢業(yè)設(shè)計(jì)圖紙及說明書,完成后參加畢業(yè)答辯;
備注
注:表格欄高不夠可自行參加。此表由指導(dǎo)老師在畢業(yè)設(shè)計(jì)(論文)工作開始前填寫,每位畢業(yè)生兩份,一份發(fā)給學(xué)生,一份交院(系)留存。
A Visual-Sensor Model for Mobile Robot Localisation
Matthias Fichtner Axel Gro_mann
Arti_cial Intelligence Institute
Department of Computer Science
Technische Universitat Dresden
Technical Report WV-03-03/CL-2003-02
Abstract
We present a probabilistic sensor model for camera-pose estimation in hallways and cluttered o_ce environments. The model is based on the comparison of features obtained from a given 3D geometrical model of the environment with features present
in the camera image. The techniques involved are simpler than state-of-the-art photogrammetric approaches. This allows the model to be used in probabilistic robot localisation methods. Moreover, it is very well suited for sensor fusion. The sensor model has been used with Monte-Carlo localisation to track the position of a mobile robot in a hallway navigation task. Empirical results are presented for this application.
1 Introduction
The problem of accurate localisation is fundamental to mobile robotics. To solve complex tasks successfully, an autonomous mobile robot has to estimate its current pose correctly and reliably. The choice of the localization method generally depends on the kind and number of sensors, the prior knowledge about the operating environment, and the computing resources available. Recently, vision-based navigation techniques have become increasingly popular [3]. Among the techniques for indoor robots, we can distinguish methods that were developed in the _eld of photogrammetry and computer vision, and methods that have their origin in AI robotics.
An important technical contribution to the development of vision-based navigation
techniques was the work by [10] on the recognition of 3D-objects from unknown viewpoints in single images using scale-invariant features. Later, this technique was extended to global localisation and simultaneous map building [11].
The FINALE system [8] performed position tracking by using a geometrical model of the environment and a statistical model of uncertainty in the robot's pose given the commanded motion. The robot's position is represented by a Gaussian distribution and updated by Kalman _ltering. The search for corresponding features in camera image and world model is optimized by projecting the pose uncertainty into the camera image.
Monte Carlo localisation (MCL) based on the condensation algorithm has been applied successfully to tour-guide robots [1]. This vision-based Bayesian _ltering technique uses a sampling-based density representation. In contrast to FINALE, it can represent multi-modal probability distributions. Given a visual map of the ceiling, it localises the robot globally using a scalar brightness measure. [4] presented a vision-based MCL approach that combines visual distance features and visual landmarks in a RoboCup application. As their approach depends on arti_cial landmarks, it is not applicable in o_ce environments.
The aim of our work is to develop a probabilistic sensor model for camerapose estimation. Given a 3D geometrical map of the environment, we want to find an approximate measure of the probability that the current camera image has been obtained at a certain place in the robot's operating environment. We use this sensor model with MCL to track the position of a mobile robot navigating in a hallway. Possibly, it can be used also for localization in cluttered o_ce environments and for shape-based object detection.
On the one hand, we combine photogrammetric techniques for map-based feature projection with the exibility and robustness of MCL, such as the capability to deal with localisation ambiguities. On the other hand, the feature matching operation should be su_ciently fast to allow sensor fusion. In addition to the visual input, we want to use the distance readings obtained from sonars and laser to improve localisation accuracy.
The paper is organised as follows. In Section 2, we discuss previous work. In Section 3, we describe the components of the visual sensor model. In Section 4, we present experimental results for position tracking using MCL. We conclude in Section 5.
2 Related Work
In classical approaches to model-based pose determination, we can distinguish two interrelated problems. The correspondence problem is concerned with _nding pairs of corresponding model and image features. Before this mapping takes place, the model features are generated from the world model using a given camera pose. Features are said to match if they are located close to each other. Whereas the pose problem consists of _nding the 3D camera coordinates with respect to the origin of the world model given the pairs of corresponding features [2]. Apparently, the one problem requires the other to be solved beforehand, which renders any solution to the coupled
problem very di_cult [6].
The classical solution to the problem above follows a hypothesise-and-test approach:
(1) Given a camera pose estimate, groups of best matching feature pairs provide initial guesses (hypotheses).
(2) For each hypothesis, an estimate of the relative camera pose is computed by minimising a given error function de_ned over the associated feature pairs.
(3) Now as there is a more accurate pose estimate available for each hypothesis, the remaining model features are projected onto the image using the associated camera pose. The quality of the match is evaluated using a suitable error function, yielding a ranking among all hypotheses.
(4) The highest-ranking hypothesis is selected.
Note that the correspondence problem is addressed by steps (1) and (3), and the pose problem by (2) and (4).
The performance of the algorithm will depend on the type of features used, e.g., edges, line segments, or colour, and the choice of the similarity measure between image and model, here referred to as error function. Line segments is the feature type of our choice as they can be detected comparatively reliably under changing illumination conditions. As world model, we use a wire-frame model of the operating environment, represented in VRML. The design of a suitable similarity measure is far more difficult.
In principle, the error function is based on the di_erences in orientation between corresponding line segments in image and model, their distance and difference in length, in order of decreasing importance, in consideration of all feature pairs present. This has been established in the following three common measures [10]. e3D is defined by the sum of distances between model line endpoints and the corresponding plane given by camera origin and image line. This measure strongly depends on the distance to the camera due to back-projection. e2D;1, referred to as in_nite image lines, is the sum over the perpendicular distances of projected model line endpoints to
corresponding, in_nitely extended lines in the image plane. The dual measure, e2D;2, referred to as in_nite model lines, is the sum over all distances of image line endpoints to corresponding, in_nitely extended model lines in the image plane.
To restrict the search space in the matching step, [10] proposed to constrain the number of possible correspondences for a given pose estimate by combining line features into perceptual, quasi-invariant structures beforehand. Since these initial correspondences are evaluated by e2D;1 and e2D;2, high demands are imposed on the accuracy of the initial pose estimate and the image processing operations, including the removal of distortions and noise and the feature extraction. It is assumed to obtain all visible model lines at full length. [12, 9] demonstrated that a few outliers already can severely affect the initial correspondences in Lowe's original approach due to frequent truncation of lines caused by bad contrast, occlusion, or clutter.
3 Sensor Model
Our approach was motivated by the question whether a solution to the correspondence problem can be avoided in the estimation of the camera pose. Instead, we propose to perform a relatively simple, direct matching of image and model features. We want to investigate the level of accuracy and robustness one can achieve this way.
The processing steps involved in our approach are depicted in Figure 1. After removing the distortion from the camera image, we use the Canny operator to extract edges. This operator is relatively tolerant to changing illumination conditions. From the edges, line segments are identi_ed. Each line is represented as a single point (_; _) in the 2D Hough space given by _ = x cos _ + y sin _. The coordinates of the end points are neglected. In this representation, truncated or split lines will have similar coordinates in the Hough space. Likewise, the lines in the 3D map are projected onto the image plane using an estimate of the camera pose and taking into account the
visibility constraints, and are represented as coordinates in the Hough space as well. We have designed several error functions to be used as similarity measure in the matching step. They are described in the following.
Centred match count (CMC)
The first similarity measure is based on the distance of line segments in the Hough space. We consider only those image features as possible matches that lie within a rectangular cell in the Hough space centred around the model feature. The matches are counted and the resulting sum is normalised. The mapping from the expectation (model features) to the measurement (image features) accounts for the fact that the measure should be invariant with respect to objects not modelled in the 3D map or unexpected changes in the operating environment. Invariance of the number of visible features is obtained by normalisation. Speci_cally, the centred match count measure
sCMC is defined by:
where the predicate p de_nes a valid match using the distance parameters (t_; t_) and the operator # counts the number of matches. Generally speaking, this similarity measure computes the proportion of expected model Hough points hei 2 He that are con_rmed by at least one measured image Hough point hmj 2 Hm falling within tolerance (t_; t_). Note that neither endpoint coordinates nor lengths are considered here.
Grid length match (GLM)
The second similarity measure is based on a comparison of the total length values of groupes of lines. Split lines in the image are grouped together using a uniform discretisation of the Hough space. This method is similar to the Hough transform for straight lines. The same is performed for line segments obtained from the 3D model. Let lmi;j be the sum of lengths of measured lines in the image falling into grid cell (i; j), likewise lei;j for expected lines according to the model, then the grid length match measure sGLM is de_ned as:
For all grid cells containing model features, this measure computes the ratio of the total line length of measured and expected lines. Again, the mapping is directional, i.e., the model is used as reference, to obtain invariance of noise, clutter, and dynamic objects.
Nearest neighbour and Hausdorf distance
In addition, we experimented with two generic methods for the comparison of two sets of geometric entities: the nearest neighbour and the Hausdor_ distance. For details see [7]. Both rely on the de_nition of a distance function, which we based on the coordinates in the Hough space, i.e., the line parameter _ and _, and optionally the length, in a linear and exponential manner. See [5] for a complete description.
Common error functions
For comparisons, we also implemented the commonly used error functions e3D, e2D;1, and e2D;2. As they are de_ned in the Cartesian space, we represent lines in the Hessian notation, x sin _ ?? y cos _ = d. Using the generic error function f, we de_ned the similarity measure as:
where M is the set of measured lines and E is the set of expected lines. In case of e2D;1, f is de_ned by the perpendicular distances between both model line endpoints, e1, e2, and the in_nitely extended image line m:
Likewise, the dual similarity measure, using e2D;2, is based on the perpendicular
distances between the image line endpoints and the in_nitely extended model line.
Recalling that the error function e3D is proportional to the distances of model line endpoints to the view plane through an image line and the camera origin, we can instantiate Equation 1 using f3D(m; e) de_ned as:
where ~nm denotes the normal vector of the view plane given by the image endpoints ~mi = [mx;my;w]T in camera coordinates.
Obtaining probabilities
Ideally, we want the similarity measure to return monotonically decreasing values as the pose estimate used for projecting the model features departs from the actual camera pose. As we aim at a generally valid yet simple visual-sensor model, the idea is to abstract from speci_c poses and environmental conditions by averaging over a large number of di_erent, independent situations. For commensurability, we want to express the model in terms of relative robot coordinates instead of absolute world coordinates. In other words, we assume
to hold, i.e., the probability for the measurement m, given the pose lm this image has been taken at, the pose estimate le, and the world model w, is equal to the probability of this measurement given a three-dimensional pose deviation 4l and the world model w.
The probability returned by the visual-sensor model is obtained by simple scaling:
4 Experimental Results
We have evaluated the proposed sensor model and similarity measures in a series of experiments. Starting with arti_cially created images using idealized conditions, we have then added distortions and noise to the tested images. Subsequently, we have used real images from the robot's camera obtained in a hallway. Finally, we have used the sensor model to track the position of the robot while it was travelling through the hallway. In all these cases, a three-dimensional visualisation of the model was obtained, which was then used to assess the solutions.
Simulations using arti_cially created images
As a first kind of evaluation, we generated synthetic image features by generating a view at the model from a certain camera pose. Generally speaking, we duplicated the right-hand branch of Figure 1 onto the left-hand side. By introducing a pose deviation 4l, we can directly demonstrate its inuence on the similarity values. For visualisation purposes, the translational deviations 4x and 4y are combined into a single spatial deviation 4t. Initial experiments have shown only insigni_cant di_erences when they were considered independently.
Fig. 2: Performance of CMC on arti_cially created images.
For each similarity measure given above, at least 15 million random camera poses were coupled with a random pose deviation within the range of 4t < 440cm and 4_ < 90_ yielding a model pose.
The results obtained for the CMC measure are depicted in Figure 2. The surface of the 3D plot was obtained using GNUPLOT's smoothing operator dgrid3d. We notice a unique, distinctive peak at zero deviation with monotonically decreasing similarity values as the error increases. Please note that this simple measure considers neither endpoint coordinates nor lengths of lines. Nevertheless, we obtain already a decent result.
While the resulting curve for the GLM measure resembles that of CMC, the peak is considerably more distinctive. This conforms to our anticipation since taking the length of image and model lines into account is very signi_cant here. In contrast to the CMC measure, incidental false matches are penalised in this method, due to the differing lengths.
The nearest neighbour measure turned out to be not of use. Although linear and exponential weighting schemes were tried, even taking the length of line segments into account, no distinctive peak was obtained, which caused its exclusion from further considerations.
The measure based on the Hausdor_ distance performed not as good as the first two, CMC and GLM, though it behaved in the desired manner. But its moderate performance does not pay off the longest computation time consumed among all presented measures and is subsequently disregarded.
So far, we have shown how our own similarity measures perform. Next, we demonstrate how the commonly used error functions behave in this framework.
The function e2D;1 performed very well in our setting. The resulting curve closely resembles that of the GLM measure. Both methods exhibit a unique, distinctive peak at the correct location of zero pose deviation. Note that the length of line segments has a direct e_ect on the similarity value returned by measure GLM, while this attribute implicitly contributes to the measure e2D;1, though both linearly. Surprisingly, the other two error functions e2D;2 and e3D performed poorly.
Toward more realistic conditions
In order to learn the e_ect of distorted and noisy image data on our sensor model, we conducted another set of experiments described here. To this end, we applied the following error model to all synthetically generated image features before they are matched against model features. Each original line is duplicated with a small probability (p = 0:2) and shifted in space. Any line longer than 30 pixel is split with probability p=0:3. A small distortion is applied to the parameters (_; _; l) of each line according to a random, zeromean Gaussian. Furthermore, features not present in the model and noise are simulated by adding random lines uniformly distributed in the image. Hereof, the orientation is drawn according to the current distribution of angles to yield fairly `typical' features.
The results obtained in these simulations do not di_er significantly from the first set of experiments. While the maximum similarity value at zero deviation decreased, the shape and characteristics of all similarity measures still under consideration remained the same.
Using real images from the hallway
Since the results obtained in the simulations above might be questionable with respect to real-world conditions, we conducted another set of experiments replacing the synthetic feature measurements by real camera images.
To compare the results for various parameter settings, we gathered images with a Pioneer 2 robot in the hallway o_-line and recorded the line features. For two di_erent locations in the hallway exemplifying typical views, the three-dimensional space of the robot poses (x; y; _) was virtually discretized. After placing the robot manually at each vertex (x; y; 0), it performed a full turn on the spot stepwise recording images. This ensures a maximum accuracy of pose coordinates associated with each image. That way, more than 3200 images have been collected from 64 di_erent (x; y) locations. Similarly to the simulations above, pairs of poses (le; lm) were systematically chosen
Fig. 3: Performance of GLM on real images from the hallway.
from with the range covered by the measurements. The values computed by the sensor model referring to the same discretized value of pose deviation 4l were averaged according to the assumption in Equation 2.
The resulting visualisation of the similarity measure over spatial (x and y combined) and rotational deviations from the correct camera pose for the CMC measure exhibits a unique peak at approximately zero deviation. Of course, due to a much smaller number of data samples compared to the simulations using synthetic data, the shape of the curve is much more bumpy, but this is in accordance with our expectation.
The result of employing the GLM measure in this setting is shown in Figure 3. As it reveals a more distinctive peak compared to the curve for the CMC measure, it demonstrates the increased discrimination between more and less similar feature maps when taking the lengths of lines into account.
Monte Carlo Localisation using the visual-sensor model
Recalling that our aim is to devise a probabilistic sensor model