How to build a holodeck, week 39 - Camera calibration and registration
Texturing a model from two different camera views is this week’s challenge. And it’s almost working.
Two texture feeds
While my network code isn’t up to the task for a decent video feed just yet, it’s enough to be sending textures from my second Kinect over to the server for texture application. Plenty of work to be done to get the network up to a decent speed - I’m currently using Unity’s LLAPI which “works” (in that I can send 50K+ packets) but has some serious issues both with usability and the API, which seems to have some bugs relating to things like counting the number of packets in the send buffer. All fun and games, but there’s alternatives I can look at when performance becomes the bottleneck. Currently, the main issue is ensuring that the cameras are aligned well.
Image registration, camera alignment
Wiki has a great article on Image Registration which is the domain covering this particular problem.
So, I’ve got two kinect cameras in my garage, at different positions and looking at different angles. As you can see from the video, they don’t quite agree on exactly where they are and where they are pointing (if they did agree, the projection mapping would be pretty close to perfect, and it’s not).
The tolerance here for the cameras being correct is pretty damn tight. If I’m projecting one pixel to be 2mm (which is close to my worst-case on the back wall with various other settings) then I need to be accurate within 2mm at 5000mm distance. Give or take.
in position terms, assuming I have perfect parallel projection, that means I need to be within 2mm to be correct. A pixel either side isn’t the end of the world, so realistically being within 1 cm will give “ok” results. Still, 10mm accuracy is a big ask, when I’m trying to figure out the exact position of the Kinect’s colour camera sensor. It’s not in the centre of the black bar - rather, it’s 26mm in from the right, about 15mm back from the front surface, and about at the midline of the bar vertically.
When I rotate or move the camera tripod, the rotation pivot point isn’t in the centre of the Kinect bar (it’s about 80mm below the bar midline). The pivot axis doesn’t go through the centre axis of the kinect, either. Effectively, moving / rotating / adjusting the tripod gives me pretty arbitrary position for the camera itself, depending on how it’s tilted and turned.
So, measuring the kinect using a tape measure isn’t going to get me close, unless I’m incredibly precise with my measurements.
And, that’s without even taking rotation accuracy into account. a 2mm pixel rise at 5000mm distance is an angle of about 0.02 degrees. 1 degree of inaccuracy is 88mm error, give or take. To be accurate to within 10mm, I need to have my angles correct to within 0.1 degree.
Measuring this manually just isn’t going to cut it.
Using the Tango as a measuring device gives me accuracy to within a degree or two, and a position accurate within a few 10mm or so. The results I’ve achieved from using the Tango have given such bad registration that I spent a good half day trying to figure out why my projection code wasn’t working properly. The result was that I’m confident that it does work properly now, if I can ever manage to get good enough registration!
This still isn’t taking into account any variance in the cameras themselves. The intrinsics params for my two Kinects are not identical, so the centre of the sensor, the distortion params, etc are both different. I have a process for calibrating this now, but it’s not an automatic task yet (the manual steps take about 5 minutes on a good pass). Still, it’s a start, and it’s good to confirm the numbers I expect match the values on t’internets.
Is it good enough?
Well, manually aligning with the Tango really isn’t good enough. Fortunately, there’s lots of work been done on these problems by very smart people, so I started reading up on how someone who had a clue what they were doing would approach this.
The first obvious step is to find some code that does exactly what I want (theoretically). And that’s basically SolvePnP/SolvePNPRansac in OpenCV. (I might be able to write something much simpler given a set of tight constraints and a nice calibration widget, but that’s work for the future).
Unfortunately, I don’t have access to OpenCV directly from Unity - the best Asset store package I can find is 100 bucks, and that’s a lot to spend on testing something that might not even work.
Fortunately, I do know Python quite well, and OpenCV for Python is free, so …
OpenCV and Python
In we dive! I spent a day getting Python back up to speed again (Thanks, windows registry, for providing Python26 paths for all my commandline invocations). A few hours of mashing, reinstalling, and running pip got me to the point of having latest Python (2.7.11), Numpy (1.11) and OpenCV set up on my local machine.
Camera calibration and registration, 101
OpenCV have some tutorial code on exactly what I’m hoping to achieve - calibrating the camera, and estimating the camera pose from an image.
I spent a day following along at home, printing out chessboards from GIMP and sticking things to the floor and wooden boards. Because this is the first time I’ve used Numpy and OpenCV, I had a bit of getting up to speed. Running through the basic tutorials and figuring out OpenCV and Numpy conventions for things like array indices, coordinate spaces, UV mapping access conventions, all that sort of thing, took most of the day Wednesday, but by Thurday morning I’d managed to get the tutorials running on my local machine with my own sets of data.
The calibration process requires a chessboard pattern for OpenCV to recognise in the image. For the calibration to work, this chessboard pattern needs to be big and close to the camera. I generated about 40 test images like this: I ran the calibration, and got somewhat sensible results for the camera intrinsics. (This was my third attempt at various sizes of chessboard).
When I try to estimate the pose, this works pretty well if the chessboard is covering half the screen, but when it’s a tiny little spec in the image - the results are not so great.
This image shows the largest chessboard I could find in the house:
And this image shows that I’ve managed to align the world origin space against OpenCV’s space with a pretty decent pose estimation.
(ignore the axis directions, I was fiddling with the math at that point. As a brief digression, OpenCV’s coordinate system is right handed (like OpenGL) but has the Y and Z axes pointing the opposite way - so positive x is right, positive y is up, and positive Z is backwards.)
Mapping between OpenCV’s coordinate system and Unity’s is fun - I’ve got the positional result correct in Unity space. The rotation returned from SolvePnP is a Rodrigues vector which is convertable into a Matrix or Quaternion, but again, requires jumping coordinate systems - I think I’ve got a handle on it, but more testing is required next week.
The accuracy I’m seeing using this method seems better than the tango, at least for the positional component. I’m hoping the rotational component works as well, but I’ll need a lot more test images before I know how accurate this is.
Next week …
And that’s pretty much this week done. Lots more investigations next week, and hopefully some accurate-enough camera registration. If I can’t get good enough results this way, I’ll have to resort to some other method (ICP on pointclouds or something) so my fingers are definitely crossed. I might investigate the projector calibration method a few folks are using (for example see this great paper) but it’s a lot of learning for my tiny brain.
If I stall out on that, I’ll probably get the Tango back into the mix. Fun times!