Tracking
localizing the camera and deciding when to insert a keyframe
Local Mapping
Loop Closing
MapPoint pi
KeyFrame Ki
Covisibility Graph: Undirected Weighted Graph
Essential Graph
GOAL
compute the relative pose between two frames & triangulate an initial set of MapPoints.
STEPS
Parallel computation of two models: a homography Hcr and a fundamental matrix Fcr
each iteration, compute a score Sm for each model
Model selection (see the sketch below): planar / low-parallax scenes favour the homography, otherwise the fundamental matrix
Motion recovery and SfM from the selected model
Full BA
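For reference, the ORB-SLAM paper picks between the two models with a simple score ratio; a minimal C++ sketch (SH, SF and the 0.45 threshold come from the paper, the variable names are mine):

// Heuristic model selection: planar / low-parallax scenes favour the
// homography, general scenes favour the fundamental matrix.
float RH = SH / (SH + SF);        // SH, SF: final RANSAC scores of H and F
bool useHomography = (RH > 0.45f);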
ORB Extraction
Initial Pose Estimation from Previous Frame
Initial Pose Estimation via Global Relocalization
Track Local Map
once we have an initial estimate of the camera pose & feature matches, we can project the map into the frame and search for more map-point correspondences
The local map also has a reference keyframe Kref, which shares the most map points with the current frame
New KeyFrame Decision
To insert a keyframe: more than 20 frames have passed since the last relocalization; local mapping is idle, or more than 20 frames have passed since the last keyframe insertion; the current frame tracks at least 50 points, but fewer than 90% of the points of Kref.
KeyFrame insertion
Recent MapPoints culling
in order to be retained in the map, a new point must pass a restrictive test during the first 3 keyframes after creation:
it must be found in more than 25% of the frames in which it is predicted to be visible
New MapPoint Creation
Created by triangulating ORB features from connected keyframes Kc
Local BA
Local KF culling
GOAL: detect redundant keyframes & delete them (see the sketch below)
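A hedged sketch of that redundancy rule (the paper discards a keyframe when about 90% of its map points are seen in at least three other keyframes; names are illustrative, and the real code also compares scale levels):

// Count map points of pKF that are observed by at least 3 other keyframes.
int nRedundant = 0, nTotal = 0;
for (MapPoint* pMP : pKF->GetMapPointMatches()) {
    if (!pMP || pMP->isBad()) continue;
    ++nTotal;
    if (pMP->Observations() >= 4)   // pKF itself plus at least 3 others
        ++nRedundant;
}
if (nTotal > 0 && nRedundant > 0.9 * nTotal)
    pKF->SetBadFlag();              // mark the keyframe redundant and remove it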
Essential Graph Optimization
effectively corrects accumulated drift
The instructions are written in detail. Additionally, I want to address some points.
Related Publication: ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras [pdf]
I suggest you DO NOT try OpenCV versions > 3.0; I have failed multiple times.
Eventually, I successfully ran it with OpenCV 2.4.11.
To compile OpenCV, check here.
If you want to uninstall OpenCV, see here.
In ./build.sh, remove the -j flag from make -j
to make sure the build runs smoothly (parallel compilation can exhaust memory).
IF your PC is powerful enough, IGNORE this!!!
A good tutorial can be found here.
OpenGL may be needed for Pangolin; see here.
Tracking.preprocess: ORBExtractor (left & right rectified images) > stereo matching > stereo & mono keypoints
Notice: place recognition module based on DBoW2 for relocalization, reinitialization and loop detection.
use the same ORB features for tracking, mapping and place recognition tasks.
system handles mono & stereo keypoints (close OR far)
A. stereo point
close: associated depth < 40 times the stereo/RGB-D baseline
far: depth > 40 times the baseline (gives accurate rotation, but weak scale & translation); triangulated only when supported by multiple views
monocular point
B. bootstrapping:
C. BA with mono & stereo constraints
D. loop closing and full BA
stereo/depth info makes scale observable, so geometric validation and pose-graph optimization are based on rigid-body transformations (SE(3)) instead of similarity transformations (Sim(3))
IF a new loop is detected while the full BA is running, the optimization is aborted and the new loop is closed, which launches the full BA again
corrections are propagated FROM updated keyframes TO non-updated keyframes through the spanning tree
E. keyframe insertion
F. localization mode
mapping and loop closing are deactivated; the camera is continuously localized by tracking, using VO matches between ORB features & 3D map points
Tracking.cpp
LocalMapping.cpp
LoopClosing.cpp
Viewer.cpp
System::System()
- load ORB vocabulary (ORBVocabulary class, ORBVoc.txt)
- create keyframe database (KeyFrameDatabase class, initialized with *mpVocabulary*)
- create map
- create drawers (used by map)
- initialize tracking thread
- initialize local mapping thread & launch
- initialize loop closing thread & launch
- initialize Viewer thread & launch
- set pointers between threads
some important names:
//---ORB
System::TrackStereo
- check GUI options
- mpTracker->GrabImageStereo
System::SaveTrajectoryTUM
- mpMap->GetAllKeyFrames()
- transformation (the 1st keyframe is the origin) *GetPoseInverse()*
- framepose stored relative to its reference keyframe (lRit), the timestamp (lT), tracking state (lbL)
- if reference keyframe was culled, traverse the spanning tree to get a suitable keyframe
Tracking::GrabImageStereo
- RGB to Gray
- mCurrentFrame
Frame(mImGray,imGrayRight,timestamp,mpORBExtractorLeft,mpORBExtractorRight,mpORBVocabulary,mK,mDistCoef,mbf,mThDepth);
- Track()
- return mTcw (camera pose W2C)
Frame::Frame
// stereo initialization
- Frame ID
- get scale level info (ORBextractor class)
- ORB Extraction
- threadLeft (Frame::ExtractORB > .join())
- threadRight (Frame::ExtractORB > .join())
- UndistortKeyPoints()
- ComputeStereoMatches(): compute depths if matches
- depth info: mvuRight & mvDepth
- mvpMapPoints, mvbOutlier
Frame::UndistortKeyPoints
- *N* feature points
- cv::undistortPoints()
- mvKeysUn: corrected // mvKeys, mvKeysRight
- **redundant** in the stereo case (input images are already rectified)
Frame::ComputeStereoMatches
- assign keypoints to row table // vRowIndices
- compute range of rows
- set limits for search //minD, maxD, minZ
- for each left keypoint search a match in the right
- SAD (subpixel match by correlation; IF |deltaR| > 1, continue — see the sketch after this list)
- matched points culling // mvuRight, mvDepth
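The subpixel step fits a parabola through the SAD costs of the best column and its two neighbours; a sketch following ORB-SLAM2's logic (distL/distC/distR are the SAD costs at the best disparity minus one, at the best disparity, and plus one; the surrounding variables are assumed from context):

// Vertex of the parabola through (-1, distL), (0, distC), (+1, distR)
// gives the subpixel correction deltaR around the best column.
float deltaR = (distL - distR) / (2.0f * (distL + distR - 2.0f * distC));
if (deltaR < -1.0f || deltaR > 1.0f)
    continue;                        // vertex outside the window: reject match
float bestuR = scaleFactor * ((float)scaleduR0 + (float)bestincR + deltaR);
mvuRight[iL] = bestuR;               // store the right-image coordinate
mvDepth[iL]  = mbf / (uL - bestuR);  // depth from disparity: z = f*b/d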
Frame::UnprojectStereo
// backproject a keypoint (if stereo/depth info available) into 3D world coordinates.
// Rotation, translation & camera center
mRcw; //Rotation from world to camera
mtcw; //Translation from world to camera
mRwc; //Rotation from camera to world
mOw;  //Camera center (translation from camera to world)
//     |fx  0 cx|
// K = | 0 fy cy|
//     | 0  0  1|
// distortion coefficients: [k1 k2 p1 p2 k3]
// mThDepth: depth threshold separating close/far points
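Putting the members above together, a minimal sketch of the backprojection (this mirrors the actual ORB-SLAM2 routine; invfx = 1/fx and invfy = 1/fy):

// Backproject keypoint i with stereo depth z into world coordinates:
// X_cam = z * K^-1 * [u, v, 1]^T,   X_world = mRwc * X_cam + mOw
cv::Mat Frame::UnprojectStereo(const int& i)
{
    const float z = mvDepth[i];
    if (z <= 0)
        return cv::Mat();                  // no valid stereo depth
    const float u = mvKeysUn[i].pt.x;
    const float v = mvKeysUn[i].pt.y;
    const float x = (u - cx) * z * invfx;
    const float y = (v - cy) * z * invfy;
    cv::Mat x3Dc = (cv::Mat_<float>(3, 1) << x, y, z);
    return mRwc * x3Dc + mOw;              // camera -> world
}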
Tracking::Track()
IF NOT_INITIALIZED
- StereoInitialization()
ELSE
- CheckReplacedInLastFrame()
- IF mVelocity.empty()
TrackReferenceKeyFrame()
- ELSE
TrackWithMotionModel()
Tracking::StereoInitialization
IF N > 500
- set pose to the origin
- create keyframe
- insert keyframe in the map
- create mappoints and associate to keyframe
// Frame::UnprojectStereo()
// MapPoint::ComputeDistinctiveDescriptors() > find best descriptors for MapPoint; using median of dists
// MapPoint::UpdateNormalAndDepth() > update observations: mNormalVector & mfMaxDistance, mfMinDistance
Tracking::TrackReferenceKeyFrame()
- mCurrentFrame.ComputeBoW();
- ORBmatcher.SearchByBoW()
- initialize pose by *mLastFrame*
- Optimizer::PoseOptimization(&mCurrentFrame)
- discard outliers
Tracking::TrackWithMotionModel
- Tracking::UpdateLastFrame
- Constant Velocity Model: estimate the current pose (see the sketch after this list)
- project points seen in previous frame
- Based on CVM, tracking MapPoints in the last frame
- IF nmatches < 20, use a wider search window (th → 2*th)
//---ORBmatcher.SearchByProjection
- optimize frame pose with all matches
// Optimizer::PoseOptimization(&mCurrentFrame)
- discard outliers among mvpMapPoints (feature → MapPoint associations)
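The constant velocity model itself is just two pose products; a sketch using the member names above (LastTwc denotes the inverse of the last frame's pose):

// Velocity = motion between the last two frames: Tcl = Tcw_curr * Twc_last
mVelocity = mCurrentFrame.mTcw * LastTwc;
// Prediction for the new frame: Tcw_pred = Tcl * Tcw_last
mCurrentFrame.SetPose(mVelocity * mLastFrame.mTcw);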
Tracking::UpdateLastFrame()
- update pose according to reference keyframe
// mlRelativeFramePoses: store the reference keyframe for each frame and its relative transformation
- IF stereo OR RGBD
- sort points in ascending order of measured depth
- IF nPoints > 100, break
Tracking::Relocalization()
Relocalization is performed when tracking is lost
- compute the BoW vector
- mpKeyFrameDB->DetectRelocalizationCandidates(&mCurrentFrame)
- ORB matching with each candidate
- IF enough matches, set up PnP solver
- perform iterations of P4P RANSAC until a camera pose supported by enough inliers is found
- Optimizer::PoseOptimization(&mCurrentFrame)
- IF few inliers, search by projection & optimize again
ORBmatcher::SearchByProjection
SearchByProjection(currentFrame, lastFrame, th, bMono)
1. project MapPoints in the last frame
2. match & culling
KeyFrameDatabase::DetectRelocalizationCandidates
find keyframes similar to the current frame for relocalization
- search all keyframes that share a word with current frame
- find keyframes that share enough words & apply a threshold
Th: minCommonWords = maxCommonWords*0.8f
- compute similarity score
- accumulate scores by covisibility
One Group: Keyframe + GetBestCovisibilityKeyFrames(10)
>> bestAccScore & minScoreToRetain = 0.75f*bestAccScore
return, from each group scoring above minScoreToRetain, the member with the highest score (see the sketch below)
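A compressed sketch of the two thresholds in that selection (variable names are illustrative; the real code iterates over the DBoW2 inverted file):

// 1) keep keyframes sharing enough words with the current frame
int minCommonWords = (int)(maxCommonWords * 0.8f);
// 2) accumulate each candidate's BoW score over its 10 best covisible
//    keyframes, then keep the best member of every strong group
float minScoreToRetain = 0.75f * bestAccScore;
for (const auto& g : groups)        // g: (accumulated score, best keyframe)
    if (g.accScore > minScoreToRetain)
        candidates.push_back(g.pBestKF);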
SLAM (Simultaneous Localization and Mapping)
sensor data → front-end visual odometry → back-end nonlinear optimization → mapping
→ loop closure detection →
Motion equation: $x_k = f(x_{k-1}, u_k) + w_k$, where $u_k$ is the motion-sensor input and $w_k$ is noise
Observation equation: $z_{k,j} = h(y_j, x_k) + v_{k,j}$, where $y_j$ is a landmark and $z_{k,j}$ is the observation data
Special orthogonal group SO(3)
Special Euclidean group SE(3)
Rodrigues' formula: $R = \cos\theta\, I + (1-\cos\theta)\, nn^T + \sin\theta\, n^{\wedge}$
Similarity transformation
Affine transformation
Projective transformation
Lie algebra
SO(3) exponential map: $\exp(\phi^{\wedge}) = \exp(\theta a^{\wedge}) = \cos\theta\, I + (1-\cos\theta)\, aa^T + \sin\theta\, a^{\wedge}$
SE(3) exponential map: $\exp(\xi^{\wedge}) = \begin{bmatrix} \exp(\phi^{\wedge}) & J\rho \\ 0^T & 1 \end{bmatrix}$,
where $J = \frac{\sin\theta}{\theta} I + \left(1-\frac{\sin\theta}{\theta}\right) aa^T + \frac{1-\cos\theta}{\theta}\, a^{\wedge}$
Lie-algebra derivatives & the perturbation model
Derivative via the Lie algebra: $\frac{\partial (Rp)}{\partial \phi} = (-Rp)^{\wedge} J_l$
Perturbation model (left): $\frac{\partial (Rp)}{\partial \phi} = -(Rp)^{\wedge}$
Camera model (pinhole): $Z \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K(RP_w + t) = KTP_w$
Tangential & radial distortion
Stereo camera model
which yields the depth $z = \frac{fb}{d}$, with disparity $d = u_L - u_R$
Gauss-Newton method
Linearize $f(x+\Delta x) \approx f(x) + J(x)\Delta x$; let the coefficient on the left be $H$ and the right-hand side be $g$: $H\Delta x = g$, with $H = J^T J$ and $g = -J^T f$
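Expanding the linearized cost and setting its derivative with respect to $\Delta x$ to zero makes the normal equations explicit:

$\min_{\Delta x}\ \frac{1}{2}\left\| f(x) + J(x)\Delta x \right\|^2 \;\Rightarrow\; J(x)^T J(x)\,\Delta x = -J(x)^T f(x), \quad \text{i.e. } H\Delta x = g$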
Levenberg-Marquardt method
Define $\rho = \frac{f(x+\Delta x) - f(x)}{J(x)\Delta x}$ to determine the trust-region radius
At the k-th iteration, solve: $\min_{\Delta x_k} \frac{1}{2}\left\| f(x_k) + J(x_k)\Delta x_k \right\|^2, \quad \text{s.t. } \|D\Delta x_k\|^2 \le \mu$
Front-end → back-end: the front end provides a good initial value for the back end; features serve as landmarks, each consisting of a keypoint & a descriptor.
ORB
ICP (stereo OR RGB-D)
PnP (3D-2D)
Reprojection error: $\xi^{*} = \arg\min_{\xi}\ \frac{1}{2}\sum_{i}\left\| u_i - \frac{1}{s_i} K \exp(\xi^{\wedge}) P_i \right\|^2$
First-order variation with respect to the camera pose: $\frac{\partial e}{\partial \delta\xi} = -\frac{\partial u}{\partial P'}\frac{\partial P'}{\partial \delta\xi}$
Similarly, the first-order variation with respect to the point position follows by the chain rule (both Jacobians are written out below)
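For completeness, the standard Jacobians, with $P' = (X', Y', Z')$ the point in camera coordinates:

$\frac{\partial u}{\partial P'} = \begin{bmatrix} \frac{f_x}{Z'} & 0 & -\frac{f_x X'}{Z'^2} \\ 0 & \frac{f_y}{Z'} & -\frac{f_y Y'}{Z'^2} \end{bmatrix}, \qquad \frac{\partial e}{\partial \delta\xi} = -\frac{\partial u}{\partial P'} \begin{bmatrix} I & -P'^{\wedge} \end{bmatrix}, \qquad \frac{\partial e}{\partial P} = -\frac{\partial u}{\partial P'}\, R$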
Real-Time Visual Odometry from Dense RGB-D Images
Semi-direct Tracking and Mapping with RGB-D Camera for MAV
SSD: Single Shot MultiBox Detector
Download the project source from GitHub: caffe-SSD
Enter the caffe-ssd root directory: cd /home/xxx/…/caffe-ssd/
cp Makefile.config.example Makefile.config
Build the project (from the caffe-ssd root directory):
make -j8
If your CUDA version is relatively new, you need to comment out the last line in Makefile.config:
Data preparation
Pretrained model (VGG): VGG_ILSVRC_16_layers_fc_reduced.caffemodel
(download link: password: t9ub)
After downloading, put the VGG model under models/VGGNet in the caffe root directory (if models/VGGNet does not exist, mkdir VGGNet first).
VOC2007 and VOC2012 datasets
Enter the data directory under the caffe root:
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
If the download fails, go to: (VOC 2007 & 2012 Dataset, password: j3in)
Then extract:
tar -xvf VOCtrainval_11-May-2012.tar
Convert the data to the type caffe consumes (LMDB):
cd into the caffe root directory and run:
./data/VOC0712/create_list.sh
Note: when running create_data.sh, if you get "no module named caffe", use: export PYTHONPATH=$PYTHONPATH:/home/xxx/.../caffe-root/python
(adjust the middle of the path yourself)
python examples/ssd/ssd_pascal.py
python examples/ssd/score_ssd_pascal.py
Note: specify the path of the snapshot model here, & run the script from the caffe root directory.
Training preparation
Create your own data directory myData:
cd data
mkdir myData
Copy the three files create_list.sh, create_data.sh, and labelmap_voc.prototxt from data/VOC0712 into data/myData:
cp VOC0712/create_list.sh VOC0712/create_data.sh VOC0712/labelmap_voc.prototxt myData/
Under data/VOCdevkit, create myData following the VOC directory layout, to store your own dataset:
cd data/VOCdevkit
mkdir myData
Generally, we only need to care about:
Annotations: the XML description files
ImageSets: the Main directory holds train.txt, val.txt, trainval.txt, test.txt
JPEGImages: all the images
Building the VOC dataset
After organizing your dataset in the VOC format, convert it into caffe's input data. First, modify labelmap_voc.prototxt according to your dataset; be sure to keep the background item, and add your own classes following the same pattern. A simple example is given below.
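A minimal sketch of the labelmap format (the class name mytarget is a placeholder; keep label 0 for background):

item {
  name: "none_of_the_above"
  label: 0
  display_name: "background"
}
item {
  name: "mytarget"
  label: 1
  display_name: "mytarget"
}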
Then run create_list.sh and create_data.sh in order. Be sure to change the paths inside the scripts to your own dataset paths.
Parameters that need attention:
create_data.sh: data_root_dir, data_name, mapfile
create_list.sh: root_dir
# some lines in create_list.sh should be commented out
Training
From the caffe root directory, run: python examples/ssd/ssd_pascal.py
# paths & parameters that need to be specified
Note: when specifying GPUs in the solver parameters, do not exceed the number of GPUs available; you can check with nvidia-smi. You can also adjust the solver_param values, e.g. iter_size, max_iter, etc.
score_ssd_pascal.py
Note: make its parameters match the paths used in ssd_pascal.py
ssd_detect.py
Note: detects a single image; specify --gpu_id, --model_def, --model_weights, --image_file.
Batch visualization of test images
build/examples/ssd/ssd_detect.bin writes the test results to a text file, one detection per line in the format (path, label, confidence, xmin, ymin, xmax, ymax). From the caffe root, run:
build/examples/ssd/ssd_detect.bin models/VGGNet/mydataset/SSD_300x300/deploy.prototxt models/VGGNet/mydataset/SSD_300x300/mydataset_SSD_300x300_iter_100236.caffemodel data/VOCdevkit/mydataset/test_img_path.txt --confidence_threshold 0.5 --out_file output.txt
output.txt is the text file of detection results generated by ssd_detect.bin
python examples/ssd/plot_detections.py output.txt /home/wxb/caffe-ssd --labelmap-file data/mydataset/labelmap_voc.prototxt --save-dir results/bbox_results/SSD_300x300/Main/img/
You can check the results of the following code here.
<!DOCTYPE HTML>
<html>
<head>
<meta name="Xingbo WANG"
content="andywangxb.github.io">
<meta http-quiv="Content-Type" content="text/html"; charset=gb2312" />
<meta http-equiv="Refresh" content="5;url=https://andywangxb.github.io" />
<!-- title of web -->
<title>html experiments</title>
<!-- style of web -->
<style type="text/css">
h1 {color: green}
p {color: black}
span.red {color:red;}
#header {
background-color:black;
color:white;
text-align:center;
padding:5px;
}
#nav{
line-height:30px;
background-color:#eeeeee;
height:300px;
width:100px;
float:left;
padding:5px;
}
#section{
width:350px;
float:left;
padding:10px;
}
#footer{
background-color:black;
color:white;
clear:both;
text-align:center;
padding:5px;
}
</style>
<!-- outer style -->
<link rel ="stylesheet" type="text/css" href="/html/csstest1.css">
</head>
<!-- visible part -->
<body bgcolor="lightgrey">
<!-- this is experiment -->
<!-- heading-->
<h1 align="center">h1 heading</h1>
<h2 style="background-color:red">h2 heading</h2>
<h3 style="text-align:right">h3 heading</h3>
<h4>h4 heading</h4>
<h5>h5 heading</h5>
<h6>h6 heading</h6>
<!-- paragraph -->
<p>one paragraph</p>
<p>another paragraph</p>
<hr /><!-- split -->
<!-- link -->
<a href="https://andywangxb.github.io"> my personal homepage</a>
<p>
<a href ="/index.html">This </a >is directed to a link of this website.</p>
<p>
<a href ="http://www.qq.com">This </a>is directed to a link outside this website</p>
<a href ="http://www.qq.com" target="_blank">This</a> will open a new page directed to <i>qq.com</i>
<p> you can mail me at <a href="mailto:wangxbzb@hotmail.com?subject=Hello%20again">subject: hello again</a>
</p>
<hr />
<!-- insert image -->
<p>Image
<img src="/images/photo.png" align="center" alt="photo.png" width="100" height="100"/>
among the texts</p>
<hr />
<!-- word style -->
<b> this text is bold </b>
<br />
<strong> this text is strong </strong>
<br />
<big> this text is big</big>
<br />
<em> this text is emphasized</em>
<br />
<i> this text is italic</i>
<br/>
<small>this text is small</small>
this text contains <sub>subscript</sub>
<br />
this text contains <sup>superscript</sup>
<hr />
<pre>
This is pre tag.
it can be used to demonstrate code:
for i in range(0,10):
print(i)
</pre>
<code> Computer Code </code>
<br />
<kbd>keyboard input</kbd>
<br />
<tt>teletype text</tt>
<br />
<samp>sample text</samp>
<br />
<var>computer variable</var>
<br />
<hr />
<address>
written by <a href="mailto:wangxbzb@hotmail.com">Xingbo WANG</a>.<br>
Visit us at:<br>
andywangxb.github.io<br>
Wuhan, China<br>
</address>
<hr />
<abbr title="etcetera">etc.</abbr>
<br />
<acronym title="world wide web">www</acronym>
<hr />
<bdo dir="rtl">
here is some Hebrew text
</bdo>
<hr />
<blockquote>
this is for long quotes.this is for long quotes.this is for long quotes.
</blockquote>
<br />
<q>this is for short quotes</q>
<br />
For example:
<q> this is an example from WWF website</q>
<blockquote cite="https://www.worldwildlife.org/who/index.html">
For 50 years, WWF has been protecting the future of nature. The world's leading conservation organization, WWF works in 100 countries and is supported by 1.2 million members in the United States and close to 5 million members globally.
</blockquote>
<hr />
<p> a "dozen" is not<del> twenty </del> <ins> twelve </ins>
</p>
<hr />
<p><cite>The Scream</cite> by Edvard Munch. Painted in 1893.</p>
<hr />
<!-- table -->
<h4 align="center">table example</h4>
<table border="1">
<caption>name</caption>
<tr>
<th>heading A</th>
<th>heading B</th>
</tr>
<tr>
<td> row 1 , cell 1 </td>
<td> row 1 , cell 2 </td>
</tr>
<tr>
<td> row 2 , cell 1 </td>
<td> row 2 , cell 2 </td>
</tr>
</table>
<hr />
<p>unordered list</p>
<ul>
<li>coffee
<ul>
<li>black coffee</li>
<li>latte</li>
</ul>
</li>
<li>milk</li>
</ul>
<p>ordered list</p>
<ol>
<li>coffee</li>
<li>milk</li>
</ol>
<p>definition list</p>
<dl>
<dt>computer</dt>
<dd>device</dd>
<dt>monitor</dt>
<dd>device</dd>
</dl>
<hr />
<!-- span -->
<h1>My <span class="red">Important</span> Heading</h1>
<hr />
<!-- div style -->
<div id="header">
<h1>city gallery</h1>
</div>
<div id="nav">
London<br>
Paris<br>
Tokyo<br>
</div>
<div id="section">
<h1>London</h1>
<p>London is the capital city of England.
</p>
</div>
<div id="footer">
CopyRight
</div>
<hr />
<!-- frame -->
<!-- src could be set, e.g. src="http://www.qq.com" -->
<iframe name="tencent" width="100%" height="200" frameborder="0"></iframe>
<p>
<a href="http://www.baidu.com" target="tencent">baidu.com</a>
</p>
<hr />
<!-- insert script-->
<script type="text/javascript">
document.write("hello world!")
</script>
<hr />
<!-- special symbols -->
<p>
&nbsp; space<q> </q><br>
5 &lt; 10<br>
5 &amp; 10<br>
&pound; 5<br>
&yen; 10<br>
&cent; 10<br>
&reg;<br>
&trade;<br>
5 &times; 10<br>
10 &divide; 5<br>
<br>
</p>
<hr />
<!-- form -->
<form>
First name:<br>
<input type="text" name="firstname">
<br>
Last name:<br>
<input type="text" name="lastname">
<br>
<input type="radio" name="sex" value="male" checked>Male
<br>
<input type="radio" name="sex" value="female">Female
<br>
</form>
<hr />
</body>
</html>
Anaconda is a very convenient tool for managing multiple Python virtual environments on your PC,
so you can enjoy different versions of Python and avoid conflicts between projects.
In this post I will introduce how to install Anaconda & set up envs for TensorFlow (CPU) and Scrapy.
common commands:
# check current envs
conda env list
# install new pkgs
conda install <package_name>
To make installing and upgrading fast in China, you should point conda at domestic mirrors.
Edit .condarc, which lives at C:\Users\user_name\.condarc by default:
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/msys2/
To ensure the packages you use do not interfere with each other, it is better to create a new env for each project:
conda create -n tensorflow python=3.5
conda create -n py27 python=2.7
Wait a moment & have a cup of tea~
username.github.io
Env: Ubuntu 16.04
Methods:
sudo apt-get install git
NodeJs: NodeJs+NPM
sudo npm install -g n
Hexo: sudo npm install hexo-cli -g
hexo init username.github.io
configuration
cd username.github.io
git clone https://github.com/iissnan/hexo-theme-next themes/next
_config.yml
title: [blog name]
write: hexo new [layout] "essay name"
hexo s
npm install hexo-deployer-git --save
hexo clean
hexo g
hexo d
Hexo: https://hexo.io/
NexT: http://theme-next.iissnan.com/
Individual Settings: 1, 2