On the importance of proper data handling (part 1)

A 208×208 tile size works well for small objects, 416×416 for medium objects, and 832×832 for large ones.

Large tile for the large boat, medium for the medium-sized boat, small for the small boat.

You may note that we will run into the issue of small batch sizes again with a larger tile size. Our solution is to scale the 832×832 extent down to a 416×416 input, on the assumption that the detection of larger objects will not suffer as much from the information lost during downscaling. For consistency we scale the 208×208 tile up to 416×416 (in reality this was not necessary, but it simplifies our setup slightly).

This means we have three input types at three different fixed scales (because of the up- and downsampling). There's no reason to make a single network learn all these scales at the same time if we know ahead of time what they are. Let's simplify the problem and split it up into three separate YOLO networks. We call this ensemble MultiYOLO. With MultiYOLO, one network focuses on small objects via the upscaled 208×208 tiles, another on medium objects using 416×416 tiles, and the third on large objects using the downscaled 832×832 tiles.

Some results…

Here is a sample of the output from the small, medium and large object networks. Note how each network appropriately focuses on a different object size range.

From left to right: the small, medium and large predictions. As we can see, each network focuses on different object sizes.

However, there is a missing piece of the puzzle we have yet to explain. We only told you the tile sizes we use for training, not how we actually sample the tiles from each of our XView images. This sampling plays a crucial role in training the MultiYOLO ensemble. We will discuss this in our next post!

More details: https://picterra.ch/news/
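To make the scaling between tile sizes and the fixed 416×416 network input concrete, here is a minimal sketch in Python. The tile sizes and the 416×416 input come from the post. The names `Box`, `scale_factor`, `tile_box_to_input` and `input_box_to_tile` are hypothetical helpers introduced for illustration, and we assume bounding boxes are simply rescaled by the same factor as the tile. This is not the actual training code.

```python
from dataclasses import dataclass

# Fixed input side shared by all three networks (from the post).
NETWORK_INPUT = 416

# Tile side in source-image pixels for each network (from the post).
TILE_SIZES = {"small": 208, "medium": 416, "large": 832}


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates (hypothetical helper type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def scale_factor(network: str) -> float:
    """Resampling factor from tile pixels to network-input pixels."""
    return NETWORK_INPUT / TILE_SIZES[network]


def tile_box_to_input(box: Box, network: str) -> Box:
    """Map a label box from tile coordinates into the resampled input."""
    s = scale_factor(network)
    return Box(box.x_min * s, box.y_min * s, box.x_max * s, box.y_max * s)


def input_box_to_tile(box: Box, network: str) -> Box:
    """Map a predicted box from the network input back to tile coordinates."""
    s = scale_factor(network)
    return Box(box.x_min / s, box.y_min / s, box.x_max / s, box.y_max / s)


if __name__ == "__main__":
    for name in TILE_SIZES:
        print(name, TILE_SIZES[name], "->", NETWORK_INPUT, "factor", scale_factor(name))
```

Under these assumptions, an object that spans 200 pixels in an 832×832 tile covers 100 pixels in the large-object network's input, while an object that spans 20 pixels in a 208×208 tile covers 40 pixels in the small-object network's input. Predictions have to be mapped back with the inverse factor before the outputs of the three networks can be compared in image coordinates.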
