Michael J. Black: Yosemite Sequence FAQ

Frequently Asked Questions

The Yosemite sequence has been used extensively for experimentation and quantitative evaluation of optical flow methods, camera motion estimation, and structure from motion algorithms.

The data was originally generated by Lynn Quam at SRI, and David Heeger was the first to use it for optical flow experimentation. The sequence is generated by taking an aerial image of Yosemite Valley and texture mapping it onto a depth map of the valley. The synthetic sequence is then produced by simulating a flight through the valley.

The files here contain the ground-truth optical flow fields. Unfortunately, these are quantized to 8 bits, and this quantization imposes a fundamental limit on the accuracy that can be measured using this data. I believe current methods are approaching this limit, and I have been thinking about generating floating-point ground truth. If you think you need it, pester me about actually generating it.
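As a rough worked example of that limit, assume each ground-truth component was rounded to the nearest of the 256 representable levels (how the values were actually quantized is not stated here) and that the flow is measured in pixels per frame. Using the scale value from the example descriptor file shown further down, the quantization step and the resulting worst-case errors for a single component and for the full flow vector are:

$$
\Delta = \text{scale} \approx 0.0315, \qquad
|u - \hat{u}| \le \frac{\Delta}{2} \approx 0.0158, \qquad
\left\| (u, v) - (\hat{u}, \hat{v}) \right\| \le \frac{\sqrt{2}\,\Delta}{2} \approx 0.0223
$$

where $(u, v)$ is the true flow and $(\hat{u}, \hat{v})$ its 8-bit quantized version. Under these assumptions, differences between methods that are smaller than about a hundredth of a pixel cannot be resolved with this ground truth.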

There is often a great deal of confusion about how to report errors for this sequence. There are actually two common versions of the sequence: one with clouds and one without. Some methods report errors on the sequence with clouds and assume that the motion of the clouds is translational. I want to emphasize that there is no ground truth for the cloud motion. The cloud pattern is fractal and undergoes Brownian motion, so the assumption of brightness constancy does not hold for the clouds. Reporting errors in the cloud region is therefore completely meaningless; one should report errors only for the rigid region of the scene.

The data for each frame is stored in three files: descriptor, data0, and data1. The information in the descriptor file is used to re-scale the 8-bit binary data to floating-point values. A descriptor file looks like:

(_data_files 2)
(_data_sets 1)
(_channels 1)
(_dimensions 252 316)
(_data_type "unsigned_1")
(scale 0.031549662234736424)
(pedestal -3.9788706302642823)

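The images are 316 pixels wide and 252 pixels high (the _dimensions entry lists the height first). The "scale" and "pedestal" values are used to re-scale the binary data: each byte b becomes the flow value b * scale + pedestal.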
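A minimal sketch of pulling the three values the re-scaling needs out of a descriptor file, assuming each entry sits on its own line as in the example above. The struct and function names are hypothetical, and this is a line-oriented `sscanf` scan, not a general parser for the parenthesized format.

```c
#include <stdio.h>

/* Hypothetical container for the descriptor entries used when reading. */
struct yos_descriptor {
    int height, width;   /* from (_dimensions height width) */
    double scale;        /* from (scale ...) */
    double pedestal;     /* from (pedestal ...) */
};

/* Parse one descriptor line; returns 1, 2, or 4 for a dimensions, scale,
   or pedestal entry respectively, and 0 for any other line. */
int descriptor_parse_line(const char *line, struct yos_descriptor *d)
{
    if (sscanf(line, " (_dimensions %d %d", &d->height, &d->width) == 2)
        return 1;
    if (sscanf(line, " (scale %lf", &d->scale) == 1)
        return 2;
    if (sscanf(line, " (pedestal %lf", &d->pedestal) == 1)
        return 4;
    return 0;
}

/* Read a whole descriptor file; returns 0 if all three entries were found. */
int read_descriptor(FILE *f, struct yos_descriptor *d)
{
    char line[256];
    int found = 0;   /* bit flags accumulated from descriptor_parse_line */
    while (fgets(line, sizeof line, f) != NULL)
        found |= descriptor_parse_line(line, d);
    return found == 7 ? 0 : -1;
}
```

For the example descriptor this yields height 252 and width 316, i.e. nx = 316 columns and ny = 252 rows for the reading code.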

The file data0 contains the vertical motion and data1 contains the horizontal motion, each stored as raw bytes.
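To make the storage layout concrete, here is a sketch of converting one raw component buffer to floats and looking up the flow vector at a pixel. It assumes row-major order with nx columns, as the index i*nx+j in the reading code implies; the function names are hypothetical. Note the `unsigned char` buffer: plain `char` is signed on many compilers, so bytes above 127 would otherwise come out negative.

```c
#include <stddef.h>

/* Convert n raw bytes of one component to floats:
   value = byte * scale + pedestal. */
void rescale_component(const unsigned char *raw, size_t n,
                       double scale, double pedestal, float *out)
{
    size_t k;
    for (k = 0; k < n; k++)
        out[k] = (float) (raw[k] * scale + pedestal);
}

/* Flow vector at column x, row y of a row-major field with nx columns;
   true_v is the re-scaled data0 and true_u the re-scaled data1. */
void flow_at(const float *true_u, const float *true_v, int nx,
             int x, int y, float *u, float *v)
{
    size_t index = (size_t) y * (size_t) nx + (size_t) x;
    *u = true_u[index];
    *v = true_v[index];
}
```

With the example descriptor, byte 0 maps to the pedestal (about -3.979) and byte 255 to about 4.066, which bounds the range of flow values this encoding can represent.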

Here is some simple C code for reading the files. Note that to read these binary files on a PC, you must open them with mode "rb" (read binary).

I read the raw bytes into "temp" and then re-scale them:

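#include <stdio.h>
#include <stdlib.h>

/* Read one nx-by-ny component (e.g. data0 into true_v) and re-scale it
   with the descriptor's scale and pedestal; returns 0 on success, -1 on error. */
int read_flow_component(const char *fnIn, int nx, int ny,
                        double scale, double pedestal, float *flow)
{
    FILE *infile;
    unsigned char *temp;   /* raw bytes: must be unsigned (0..255) */
    size_t sizeInput = (size_t) nx * ny;
    int i, j, index;
    infile = fopen(fnIn, "rb");   /* "rb" = read binary */
    if (infile == NULL) {
        fprintf(stderr, "Unable to open %s\n", fnIn);
        return -1;
    }
    if ((temp = (unsigned char *) malloc(sizeInput)) == NULL) {
        fprintf(stderr, "Unable to allocate memory for temp.\n");
        fclose(infile);
        return -1;
    }
    /* read binary file into temp, checking that every byte arrived */
    if (fread(temp, 1, sizeInput, infile) != sizeInput) {
        fprintf(stderr, "Short read from %s\n", fnIn);
        free(temp); fclose(infile);
        return -1;
    }
    /* convert binary values to floats using scale and pedestal */
    for (i = 0; i < ny; i++) {
        for (j = 0; j < nx; j++) {
            index = i * nx + j;
            flow[index] = (float) (temp[index] * scale + pedestal);
        }
    }
    free(temp);
    fclose(infile);
    return 0;
}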

Do the same for the horizontal motion true_u.
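If a loaded field looks corrupted, one quick check, sketched here as a hypothetical helper, is to count the bytes actually readable from a data file and compare the result with nx * ny (79,632 for 316 × 252). On Windows, opening the file with "r" instead of "rb" puts the stream in text mode, which translates CR-LF pairs and can treat byte 0x1A as end of file, so the count typically comes up short.

```c
#include <stdio.h>

/* Count the bytes remaining in an open stream by reading it to the end.
   For a data file opened with "rb" this should equal nx * ny. */
long count_bytes(FILE *f)
{
    char buf[4096];
    long total = 0;
    size_t got;
    while ((got = fread(buf, 1, sizeof buf, f)) > 0)
        total += (long) got;
    return total;
}
```

Reading to the end avoids relying on `fseek`/`ftell` with `SEEK_END`, which the C standard does not guarantee to be meaningful for binary streams.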
