By default, when the Cognachrome Vision System starts operation, it goes into object tracking mode. In this mode, the incoming video stream is digitized to a resolution of 200 by 250, and each pixel is checked for membership in the three independent color look-up tables ("LUTs" for short). Connected regions of recognized pixels are merged, and various statistics about the regions can be calculated.
When running the Pioneer software, Saphira configures the tracking mode for each channel by setting persistent variables (see section Saphira Interaction Mode), and the data is uploaded in the form of Saphira packets.
When running the stand-alone software, the protocol string provides a very flexible way to configure what data is calculated and transmitted by the board (see section Protocol String).
The protocol string, the color lookup tables, and other configuration information are all stored in the configuration EEPROM.
The following sections offer suggestions for choosing targets so as to optimize tracking. Don't take these suggestions as hard-and-fast rules: use them as guidelines to help you optimize your tracking setup, and experiment to find out which suggestions you can safely ignore in any given setup.
When deciding what colors to use when marking objects for tracking, the main issue is finding colors which are different from those typically in your environment. We find that highly saturated, "neon" colors often work well. When we set up for tracking in a new environment, we will try several different fluorescent colors and see which one works best for the background and lighting conditions.
When possible, use targets with matte rather than glossy surfaces. Glossy materials are much more affected by the angle the target's surface presents relative to the light source and camera. For example, if you look at a ball that's glossy, you will see reflections of lights, and in bad cases, even be able to see reflections of the surroundings. The big problem is that when glossy materials reflect, they usually reflect the whole spectrum--a reflection of a white light, for example, looks white. This degrades the ability of the color thresholding to cleanly detect such a target.
Most materials are between matte and glossy. This means that the reflection of a white light on most surfaces looks whiter than the material itself, not just brighter--the color saturation goes down. The ideal matte surface does not change color saturation when lighting sources are presented at different angles; it just changes brightness.
We first suggest using lots of light. Most cameras start giving grainy pictures as light levels fall.
Because surfaces are never perfectly matte, color saturation will always depend somewhat on the angle of the target's surface relative to the light source and camera. This causes the worst problems when your lighting is a point source. To minimize the problems of glossiness, ideally, you want to use indirect lighting from many directions.
To avoid shadows across the target, you may want to use lights on the same side of the object as the camera. It is bad to light your target only from behind.
Finally, for some colors and some cameras, we have had slightly better performance illuminating targets with fluorescent light rather than with incandescent light. This appears to be because the spectrum of some incandescent lights contains very little green or blue component, thus reducing the effective range of colors in the image.
For other pointers on getting a high-quality video input, see section Video Input.
Once you have chosen a target object, you will want to train the vision system to threshold for only objects of that color. The vision system has three color channels: a, b, and c. You can set each of these channels to threshold for a different color.
Each pixel of the video input is digitized into a 24-bit RGB value. This value consists of 8 bits each of the red, green, and blue components of the light which reached the camera at that point in the image. The goal of the thresholding stage of the vision system is to classify each point in the image as being interesting, meaning that it may be part of a target you want to track, or uninteresting, meaning that it probably is not part of a target. By looking at the debugging video output from the vision system, you can see how various regions of the image are classified at any given time. The interesting parts of the image are shown as being white, and the rest is left black.
Pixels are classified as interesting or uninteresting in hardware by comparing the RGB value of the pixel to a set of values which were defined as interesting when the system was trained. This is done by using the value as an address into a lookup table which maps RGB values into a single bit for each channel: 1 if it is interesting, and 0 if it is not.
The space of possible RGB values contains 16,777,216 distinct points. If one bit were stored for every possible value which could come from the video digitizer, this would require over 16 megabits of thresholding information per channel, times 3 channels. Building an inexpensive system with a lookup table this large is not feasible, so a folding scheme is used to reduce this number to a reasonable amount. Instead of defining an arbitrary volume of three-dimensional RGB space, two two-dimensional regions of the RG and BG planes are defined to be interesting. The projection of an incoming RGB value onto each of these planes must fall within the corresponding interesting region in order for that value to be classified as interesting.
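To make the folding concrete, here is a minimal sketch of how a pixel might be classified against the two folded tables. It assumes one bit per entry in a red-green plane and a blue-green plane; the actual plane pair, memory layout, and bit packing on the board may differ.

    /* Sketch of folded-LUT classification.  Assumes two 64K-entry bit
     * tables per channel (a red-green plane and a blue-green plane);
     * the board's actual plane pair and memory layout may differ. */
    #include <stdint.h>

    #define PLANE_ENTRIES (256 * 256)

    typedef struct {
        uint8_t rg[PLANE_ENTRIES / 8];   /* red-green plane, 1 bit per entry  */
        uint8_t bg[PLANE_ENTRIES / 8];   /* blue-green plane, 1 bit per entry */
    } channel_lut;

    static int lut_bit(const uint8_t *plane, unsigned idx)
    {
        return (plane[idx >> 3] >> (idx & 7)) & 1;
    }

    /* A pixel is interesting only if both 2-D projections of its RGB
     * value fall inside the trained region for this channel. */
    int pixel_interesting(const channel_lut *lut,
                          uint8_t r, uint8_t g, uint8_t b)
    {
        unsigned rg_idx = ((unsigned)r << 8) | g;
        unsigned bg_idx = ((unsigned)b << 8) | g;
        return lut_bit(lut->rg, rg_idx) && lut_bit(lut->bg, bg_idx);
    }

With this folding, each channel needs only two 64K-entry bit tables (128 kilobits, or 16 kilobytes) instead of over 16 megabits.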
The set of RGB values in an image of a given object depends on many factors, including characteristics of the particular camera, lighting, and material in any given situation. Human eyes and brains are very good at adapting to changing circumstances. We interpret a given object as appearing constant and distinct from the background in widely varying conditions. However, the actual RGB values which various cameras would provide for the same object under the same varying conditions would vary considerably. Therefore, we find that it is best to train the vision system to recognize certain targets in certain situations from empirical measurements. Methods for doing this are described in the next section.
It is possible to train the system to detect an object in a wide range of lighting conditions. In order to do this, you should follow the directions for training on the color of your target, then incrementally train the system through a range of different conditions. However, the range of conditions in which a given target can be accurately distinguished from the background is limited. It must be the case that the range of values acquired for the target does not intersect with the range of values for the background. Otherwise, you may detect background objects when they enter the range of trained target values. This is not necessarily a fatal problem--so long as you can always distinguish the actual target from false positives in the background you should be fine. However, such false positives will make the recognition problem more difficult, and may adversely affect performance.
For example, say that you want to track a light blue target in a dark blue room. For a limited range of lighting levels, this may work fine because the set of values the target and walls occupy may be disjoint. However, when the lights dim, the values for the target may overlap with the values the walls had when the scene was brighter. This problem also arises between more and less saturated colors. For example, an orange object in dim lighting becomes indistinguishable from brown.
There is therefore a tradeoff between the range of lighting conditions and the properties of the background. If everything in the background is sufficiently different from the target, you can train for a wide range of lighting conditions with no loss of performance. However, if aspects of the background are close to the color of the target, you may only be able to get good performance in a small range of lighting conditions.
See section Choosing Tracking Targets, for guidelines on how to choose a target so as to minimize this problem. Then, if it is possible, the best solution is to modify the background to get rid of troublesome elements. Otherwise, you can try to control the lighting. For example, put up curtains if the sun makes the lighting conditions (both in terms of intensity and spectral characteristics) change according to time of day. The final option is to determine a way to recognize which objects are the ones you are interested in, and accept that false positives may occur.
It is possible to train the vision system for a certain color either from TPUVIS mode or, if you are running the Pioneer software, from Saphira interaction mode. The training procedure and commands are the same in either case, but the method of entering them and specifying which channel to affect is different.
In TPUVIS mode, single character commands are typed in the window displaying the TPUVIS prompt (see section TPUVIS Prompt Mode). Which channel is affected by a given command is modal and depends on what the current channel is set to. You can tell what the current channel is by looking at the prompt; the current channel is shown in brackets:
tpuvis[a]>
In Saphira mode, commands are issued by typing at the command line with the saphira> prompt, then hitting RETURN. Most commands consist of multiple characters, including a channel specification. See section Saphira Interaction Mode.
The first step in training the system to detect a certain color is to place an appropriately colored training target in the center of the camera's viewing area and sample that color by using the lower case `s' command.
This command will grab a single frame of video, find the color at the exact center of the image, and use that color as the seed color for the training. It will then move out from that point in the image and try to find the outline of the object. If you are using this command from the TPUVIS prompt, a window containing this outline should appear. Once it has found the outline of the object, it will take all of the RGB values from the inside of that area of the image, and classify all of those values as interesting. This set of values will usually be sparse. As the final step, it will fill in the gaps to leave a smooth region.
This step works best when the training target is large in the image, solid colored, convex, and contrasts well with the background. If the training target does not have these characteristics, the outlining step may not work well.
The criterion used for detecting the outline of the training target is
controlled by the diffthresh persistent parameter. This specifies
how different a color can be from the center color and still be considered
to be part of the same object. The default value is 5000. Increase this
value if the system is not classifying a large enough region as belonging
to the training target. Decrease this value if the system is classifying
too large a region as belonging to the object. See section Setting and Storing Persistents, for information on how to change this value.
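As a rough illustration of the role diffthresh plays, the following sketch tests whether a pixel is close enough to the seed color to be counted as part of the training target. The squared-RGB-distance metric shown here is only an assumption for illustration; the firmware's actual metric may differ.

    /* Hypothetical color-difference test used while outlining the
     * training target.  Squared RGB distance is an assumed metric;
     * only the role of diffthresh (default 5000) is the point here. */
    #include <stdint.h>

    typedef struct { uint8_t r, g, b; } rgb;

    static long color_diff(rgb a, rgb b)
    {
        long dr = (long)a.r - b.r;
        long dg = (long)a.g - b.g;
        long db = (long)a.b - b.b;
        return dr * dr + dg * dg + db * db;
    }

    /* Nonzero if pixel p is considered part of the object seeded at
     * the image center.  Raising diffthresh accepts more pixels;
     * lowering it accepts fewer. */
    int part_of_training_target(rgb p, rgb seed, long diffthresh)
    {
        return color_diff(p, seed) < diffthresh;
    }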
You may want to try this several times with different values of
diffthresh until you get a good initial training. Don't worry about
getting it perfect--there are more steps to go. However, you should be
sure that the largest block of thresholded pixels in the image corresponds
to the object you are interested in.
In the case that the sample step described above resulted in some of the background being incorrectly thresholded high, you can shrink the edges of the training set by using the `<' (less than sign) command. This will nibble away the edges of the thresholding region, hopefully leaving the training target strong while removing any false positives.
If instead the sample step resulted in no false positives, but only ended up accepting a small part of the target object, you can use the `>' (greater than sign) command. This will increase the size of the thresholding region, while leaving the center the same.
How much is removed from or added to the edges of the thresholding set is
controlled by the color_grow
persistent. Setting this value is
somewhat analogous to choosing what grit of sandpaper to use. Increasing
this number makes each command have a larger effect, and decreasing this
number makes each command have a smaller effect. The default value is 5.
See section Setting and Storing Persistents, for information on how to change
this value.
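One plausible way to picture the `>' and `<' commands is as dilation and erosion of the trained region in each 2-D lookup plane, applied color_grow steps at a time. This is an illustrative guess at the mechanism, not the board's actual code.

    /* Sketch of growing the training set: dilate the interesting
     * region of one 2-D plane by color_grow steps (default 5).  The
     * `<' command would perform the analogous erosion. */
    #include <stdint.h>
    #include <string.h>

    #define DIM 256

    static void dilate_once(uint8_t plane[DIM][DIM])
    {
        static uint8_t out[DIM][DIM];
        for (int x = 0; x < DIM; x++)
            for (int y = 0; y < DIM; y++) {
                uint8_t v = plane[x][y];
                if (x > 0)       v |= plane[x - 1][y];
                if (x < DIM - 1) v |= plane[x + 1][y];
                if (y > 0)       v |= plane[x][y - 1];
                if (y < DIM - 1) v |= plane[x][y + 1];
                out[x][y] = v;
            }
        memcpy(plane, out, sizeof out);
    }

    void grow_training_set(uint8_t plane[DIM][DIM], int color_grow)
    {
        while (color_grow-- > 0)
            dilate_once(plane);
    }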
For each frame of the incremental tracking, a small box centered around the RGB value of the center of the tracked target is added to the training volume. The size of this box is set by the boxsize persistent value. The larger this size is, the greater the effect of each incremental training frame. The default value is 20. If you set this number to be too large, you may start to get false positives in the background. Use the shrink command to correct this, then come back to do more incremental training.
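A sketch of what one incremental training step might look like follows. It assumes the boxsize persistent defines the side length of a square marked around the tracked color in each folded plane; the exact shape and plane handling are assumptions for illustration.

    /* Hypothetical incremental-training step: mark a boxsize-wide
     * square (default 20) around the tracked blob's center color in
     * both folded planes of the channel's lookup table. */
    #include <stdint.h>

    #define DIM 256

    static void mark_box(uint8_t plane[DIM][DIM], int cx, int cy, int boxsize)
    {
        int half = boxsize / 2;
        for (int x = cx - half; x <= cx + half; x++)
            for (int y = cy - half; y <= cy + half; y++)
                if (x >= 0 && x < DIM && y >= 0 && y < DIM)
                    plane[x][y] = 1;
    }

    void incremental_train(uint8_t rg[DIM][DIM], uint8_t bg[DIM][DIM],
                           uint8_t r, uint8_t g, uint8_t b, int boxsize)
    {
        mark_box(rg, r, g, boxsize);   /* red-green plane  */
        mark_box(bg, b, g, boxsize);   /* blue-green plane */
    }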
IMPORTANT: The stored training values are reloaded from the serial EEPROM on every reset. If you are happy with new training values, be sure to store them to the EEPROM before resetting the board. The persistent values are handled slightly differently. On reset, the system checks whether the persistent values stored in RAM are still valid. If they are not, it uses the values in EEPROM to reinitialize the table. If they are still valid, it does not overwrite the values in RAM with the values stored in EEPROM.
The protocol string is a way of customizing how much processing is done on
each channel and what format the data is uploaded over serial while the
system is in tracking upload mode (see section Tracking Upload Mode). It
allows customization without recompilation, so long as the data is sent in
a format supported by the board's version of printf
(i.e., ASCII
encoded decimal or hex, or a one- or two-byte value encoded in base 254).
The protocol string is actually made up of four substrings, the header,
channel A, channel B, and channel C strings. These substrings are set by
using the TPUVIS `p' command (see section TPUVIS Prompt Mode). When
running the default tracking algorithm which uploads data according to the
protocol string, data is uploaded by calling printf
with format and
arguments calculated from the protocol strings.
By default, data is sent only for the largest blob in each channel. However, the maximum and minimum number of blobs per channel can be set with the `n' TPUVIS command (see section TPUVIS Prompt Mode). Tracking data will be sent for the N largest blobs, where N will be no fewer than the set minimum, and no greater than the set maximum.
In the case where the maximum and minimum are set to be different values, there will have to be some marker in the protocol strings to let the application determine how many blobs were sent for each channel.
Certain capital letters have special meaning in a protocol string and are used to determine what the arguments to printf will be, and how much processing needs to be done on the given channel. These special characters are removed from the string in order to create the format argument to printf. The following is a list of all the special characters defined so far and the values they generate. The other capital letters are reserved for future use, so you should avoid the use of capital letters for the formatting part of the protocol string.
Note: Currently the blob aspect ratio is not compensated for the aspect ratio of the vision system. This adjustment cannot be done easily (in the general case) as a post-processing step. We plan to allow users to input calibration data and have the system aspect ratio automatically compensated in a future release.
Note: Currently the blob angle is not compensated for the aspect ratio of the vision system. This adjustment cannot be done easily (in the general case) as a post-processing step, so we will soon release a new version of the software that will allow users to input calibration data and have the aspect ratio automatically compensated.
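To illustrate how a protocol substring is turned into a printf call, here is a sketch of the splitting step: capital letters are collected as protocol codes and removed from the format text. The function name and buffer sizes are made up for illustration.

    /* Sketch: split a protocol substring into a printf format and a
     * list of protocol codes.  For example, "#%02xN:%02xC:%02xR"
     * yields format "#%02x:%02x:%02x" and codes "NCR"; the tracker
     * would then supply the blob's diameter, column, and row as the
     * printf arguments. */
    #include <ctype.h>
    #include <stdio.h>

    #define MAX_CODES 16

    int split_protocol(const char *proto, char *fmt, char *codes)
    {
        int nfmt = 0, ncodes = 0;
        for (const char *p = proto; *p; p++) {
            if (isupper((unsigned char)*p)) {
                if (ncodes < MAX_CODES)
                    codes[ncodes++] = *p;   /* protocol code, e.g. 'N' */
            } else {
                fmt[nfmt++] = *p;           /* literal format text     */
            }
        }
        fmt[nfmt] = '\0';
        codes[ncodes] = '\0';
        return ncodes;
    }

    int main(void)
    {
        char fmt[64], codes[MAX_CODES + 1];
        split_protocol("#%02xN:%02xC:%02xR", fmt, codes);
        printf("format \"%s\", codes \"%s\"\n", fmt, codes);
        return 0;
    }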
The board's version of printf
has an additional nonstandard feature.
You can use the `%w' printf code to print a value in base 254, using
either one or two bytes. This is useful because it will have a fixed
number of bytes regardless of ARC's stream protocol. See section RS-232 Port Protocol, for more information.
Like standard printf formatting codes, you can specify a field width by placing a number between the `%' and the `w'. For example, %1w specifies a field width of 1 character. Unlike standard printf formatting codes, the %w code treats the field width as an exact number of bytes to send, rather than just a minimum. This is good because it means that the %w code will produce a fixed number of characters regardless of the value. If the field width is unspecified, the default width of 2 characters is used.
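On the host side, decoding a %w value is simple. The sketch below assumes the two-byte form sends the more significant base-254 digit first; check section RS-232 Port Protocol for the actual byte order.

    /* Sketch of host-side decoding for the %w base-254 code.  Digit
     * order is an assumption.  Because digits run from 0 to 253, the
     * byte values 254 and 255 never appear in the data and remain
     * available as framing or protocol characters. */
    #include <stdint.h>
    #include <stdio.h>

    /* One-byte form (%1w): the byte is the value itself, 0..253. */
    unsigned decode_w1(uint8_t b0)
    {
        return b0;
    }

    /* Two-byte form (%w or %2w): value = high * 254 + low, 0..64515. */
    unsigned decode_w2(uint8_t high, uint8_t low)
    {
        return (unsigned)high * 254u + low;
    }

    int main(void)
    {
        printf("%u\n", decode_w2(1, 130));   /* prints 384 */
        return 0;
    }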
The following table summarizes the strengths and weaknesses of using various encoding methods for transmitting serial information for machine consumption.
See section The %w printf Code, for more information on the %w formatting code. Refer to any C reference manual for more information on the other formatting codes.
Flags, field width, precision, and size modifiers can be specified between
the `%' and the formatting code. These are supported by the version
of printf
on the processor board. You should not need to use the
size modifier flags (`l', `L', or `h') as all of the
arguments to printf
generated by protocol codes are cast to be 4
byte integers (which is the default if no flag is specified).
Note that in accordance with the ANSI specification, the field width
parameter only specifies a minimum and not a maximum field width. For
example, if a number larger than 255 were formatted using the %02x
code, it would contain more than 2 characters. See section Protocol Codes Summary for information on the range of values each protocol code
generates.
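The difference is easy to demonstrate with ordinary C printf; the values below are arbitrary.

    /* A field width is only a minimum: %02x pads small values to two
     * characters but lets larger values spill over. */
    #include <stdio.h>

    int main(void)
    {
        printf("[%02x]\n", 0x0a);   /* prints [0a]  -- padded to 2 chars   */
        printf("[%02x]\n", 0x1bc);  /* prints [1bc] -- 3 chars, not capped */
        return 0;
    }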
As well as specifying the upload format, the protocol string is also used to determine how much processing should be done on each channel. Only the minimum amount of processing required to calculate the information requested in the protocol string is done. The processing level is determined by which formatting code in the current protocol string appears lowest on the following list.
In this example, we want to track the positions of each of two orange balls. Channel A is trained to detect orange, and the other channels are not used. The maximum number of blobs is set to be 2 using the `n' TPUVIS command (see section TPUVIS Prompt Mode). The tracking data shown is for the following static sample image.
If you do not see this image in color, you can see it at:
`http://www.newtonlabs.com/cognachrome/cognachrome_5.html#protocol_example1'.
For nice, human readable output, you could use the TPUVIS `p' command (see section TPUVIS Prompt Mode) to set the protocol string to the following:
Header: "frame %3dG:\n" Channel A: "\ta: diameter %3dN, cg (%dC,%dR)\n" Channel B: "" Channel C: ""
Using the `r' TPUVIS command (see section TPUVIS Prompt Mode) to test the new protocol string on the sample image yields:
frame 1:
        a: diameter 41, cg (189,115)
        a: diameter 31, cg (105,82)
frame 3:
        a: diameter 41, cg (189,115)
        a: diameter 31, cg (106,82)
frame 5:
        a: diameter 41, cg (189,115)
        a: diameter 31, cg (105,82)
frame 7:
        a: diameter 41, cg (189,115)
        a: diameter 31, cg (106,82)
frame 9:
        a: diameter 41, cg (189,115)
        a: diameter 31, cg (106,82)
Note that the frame numbers show that the data was not sent for every frame. Video contains 60 frames per second, and the frame numbers reported by the `F' and `G' codes count every frame. Using this protocol string yields around 80 characters of data uploaded per frame.
At a baud rate of 38400 baud, you can send data at a rate of 3840 characters per second (baud measures bits per second, and each character takes 10 bits). This works out to 64 characters taking a full sixtieth of a second--the same amount of time it takes to receive a full frame of video. Therefore it is not possible to send the data for every frame using this protocol string. In this situation, the system skips sending data for frames until the serial buffer is clear. The alternative would have been to send every frame with increasing delays until the serial output buffer filled, then throw out the overflow characters. This decision was made in order to minimize latency, which is a very important factor in stably controlling dynamic systems.
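A quick back-of-the-envelope check of this budget, assuming 10 bits per character and 60 frames per second:

    /* Serial budget: characters that fit in one frame time at a given
     * baud rate, and how many frame times an 80-character report needs. */
    #include <stdio.h>

    int main(void)
    {
        int baud = 38400;
        int chars_per_second = baud / 10;             /* 3840 */
        int chars_per_frame  = chars_per_second / 60; /* 64   */
        printf("budget: %d characters per frame\n", chars_per_frame);
        printf("an 80-character report needs %.2f frame times\n",
               80.0 / chars_per_frame);
        return 0;
    }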
If you have a program parsing the data stream, you do not have to format the data in a way which a human would find easy to read. Instead, you want to make the information easy for a program to interpret, and you want to pack a large amount of information into a small number of bytes of transmission data. In this example, the same information is sent in the same order as in the human readable example, but it is intended to be read by a machine.
The highest information density can be achieved by using the %c or %w binary formatting codes (see section The %w printf Code). However, the %x hexadecimal formatting code has some advantages, even though it requires approximately twice as many characters: because hexadecimal output contains only the digits 0 through 9 and the letters a to f, any other character can be used for framing and never be confused with the numbers.
This problem is much more difficult when using the %c binary formatting code because there are no characters which are guaranteed not to appear as part of a number. Also, while it is possible to use 255 as a unique framing character when using the %w format code, the processing of the protocol string does not yet support numeric literals (i.e. \377 for 255). Therefore there is no convenient way for 255 to be encoded into the protocol string. This will be fixed in future versions.
In this example, hexadecimal encoding is used. Also, you will notice that the full formatting specification is %02x. The `0' (zero) specifies that zeroes should be used for padding instead of spaces. The `2' specifies that the minimum field width is 2 characters. See section Summary of printf Formatting Codes, for more information.
Only the `G', `N', `C', and `R' protocol codes are used in this example--all of which are specified not to exceed 255 (see section Protocol Codes Summary). Therefore the data for each object will contain a constant number of characters.
This example uses the TPUVIS `p' command (see section TPUVIS Prompt Mode) to set the protocol string to the following:
Header: "$%02xG" Channel A: "#%02xN:%02xC:%02xR" Channel B: "" Channel C: ""
Using the `r' TPUVIS command to start tracking with this protocol string yields the following results. Carriage returns have been added here to make it more readable. The actual data sent contains no carriage returns.
$01#29:bd:73#1f:6a:52
$02#29:bd:74#1f:6a:53
$03#29:be:74#1f:6a:52
$04#29:bd:74#1f:6a:53
$05#29:be:74#1f:6a:52
$06#28:be:74#1e:6a:52
$07#29:bd:73#1f:6a:52
$08#28:bd:74#1f:6a:53
$09#29:bd:73#1f:6a:52
$0a#29:bd:74#1f:6a:53
Note that while frames were skipped in the human readable example, all frames are present in this example. This is because only 21 characters are being sent per frame. It takes only about 5 milliseconds to send 21 characters at 38400 baud. Each frame takes about 17 milliseconds, so it is possible to send all of the data without skipping any frames.
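If you are writing host software to consume this stream, a parser can key off the `$' and `#' framing characters, which never occur inside the hexadecimal numbers. The following sketch parses the format used above; the program structure and names are only an illustration, not supplied software.

    /* Hypothetical host-side parser for the stream produced by the
     * protocol strings above: "$%02xG" header, "#%02xN:%02xC:%02xR"
     * per channel-A blob. */
    #include <stdio.h>

    int main(void)
    {
        const char *stream = "$01#29:bd:73#1f:6a:52$02#29:bd:74#1f:6a:53";
        const char *p = stream;
        unsigned frame, diam, col, row;
        int used;

        while (*p) {
            if (sscanf(p, "$%2x%n", &frame, &used) == 1) {
                printf("frame %u\n", frame);
                p += used;
            } else if (sscanf(p, "#%2x:%2x:%2x%n",
                              &diam, &col, &row, &used) == 3) {
                printf("  blob: diameter %u, cg (%u,%u)\n", diam, col, row);
                p += used;
            } else {
                p++;   /* resynchronize on anything unexpected */
            }
        }
        return 0;
    }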
In this example, we want to track the positions of each of two orange balls; the position, aspect ratio, and angle of a blue piece of cardboard; and the position and bounding box dimensions of a dark green serial adapter. Channel A is trained to detect orange; B is trained to detect blue; and C is trained to detect dark green. The maximum number of blobs is set to be 2 for channel A, and 1 for channels B and C. The tracking data shown is for the following static sample image.
If you do not see this image in color, you can see it at:
`http://www.newtonlabs.com/cognachrome/cognachrome_5.html#protocol_example2'.
The protocol string is set up to be the following:
Header: "$%02xG" Channel A: "#%02xN:%02xC:%02xR" Channel B: "*%02xN:%02xC:%02xR:%02xS:%02xA" Channel C: "&%02xN:%02xC:%02xR:%02xW:%02xH"
Using the `r' TPUVIS command to start tracking with this protocol string yields the following results. Carriage returns have been added here to make it more readable. The actual data sent contains no carriage returns.
$01#21:50:a8#1d:86:a2*23:7b:74:64:47&1d:c5:bd:22:25
$03#21:50:a7#1d:86:a2*23:7b:73:64:47&1c:c6:bc:21:26
$05#22:50:a8#1d:86:a2*23:7b:74:64:47&1c:c6:bd:21:25
$08#22:50:a7#1d:86:a1*23:7b:74:64:47&1c:c6:bb:1f:24
$0a#22:50:a7#1d:86:a1*23:7b:75:64:47&1c:c6:bc:21:26
You can see that frames were skipped in this example. Here 51 characters are sent per frame, so it takes about 13 milliseconds per frame at 38400 baud to upload the information. Since each frame takes about 17 milliseconds, it would be possible to send this much serial data without missing any frames. However, processing three channels with angle information takes long enough that some frames must be skipped. See section Multiple Channel Tracking Plus Angle Performance Example, for a more in-depth analysis of the calculation time used by this example.
You can use the `m' TPUVIS command to obtain rate and latency information when no serial data is sent (see section Debugging and Miscellaneous Detail). Which channels are acquired and how much processing is done for each is set by the protocol string. This is so that the information can easily be compared against tracking performance in normal operation.
Each frame for which the data is available is processed as usual, but no data is sent over the serial port. Instead, the amount of time which passed between when the data started to become available and the time when the processing was completed is recorded. A fixed number of frames are processed, and the number of frames which had to be skipped is calculated. The output of this command is formatted in the following manner.
tpuvis [a]> m
Missed M over N frames: avg latency L ms
The system has two edge acquisition buffers. Generally one is being filled while the other is being processed. When a new frame starts, whichever buffer is not being processed is filled with the incoming data. If the most recently processed buffer has not been released yet when it would be time to refill it, the buffer which was most recently filled will be refilled, and data for the previous frame will be lost.
The camera provides 60 frames per second, so the total time during which frames were acquired is T = (N+M)/60 seconds. During this time N frames were processed. This leads to an output frame rate of R = N/T, or R = 60*N/(N+M) Hz (frames per second).
From this equation, you can see that if zero frames are missed, R = 60 Hz. As more frames are missed, the frame rate degrades smoothly.
Another way to look at the data uses the latency figures. It takes 17 ms from start to finish for a single frame to arrive from the camera. It then takes some additional time for the system to finish processing the information, and release the buffer to be refilled. This extra time is E = L-17 ms. If E > 17ms, then frames will definitely be missed. It is difficult to say more than that about the relationship between latency and frame rate.
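If you log the `m' output, the relations above are easy to apply mechanically; here is a trivial helper (the numbers are placeholders to be replaced with the M, N, and L values the command reports):

    /* Turn an `m' report ("Missed M over N frames: avg latency L ms")
     * into an output frame rate and the extra per-frame processing
     * time, using R = 60*N/(N+M) and E = L - 17 ms. */
    #include <stdio.h>

    int main(void)
    {
        double M = 249.0, N = 500.0, L = 29.0;   /* placeholder values  */
        double rate  = 60.0 * N / (N + M);       /* frames per second   */
        double extra = L - 17.0;                 /* ms beyond one frame */
        printf("frame rate %.1f Hz, extra processing %.1f ms\n", rate, extra);
        return 0;
    }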
The following examples give performance measurements for the same scene as the multiple colors protocol string example (see section Multiple Colors Protocol String Example) at different levels of processing.
Processing the scene and calculating the largest blobs on the first channel only yields the following:
tpuvis [a]> m
Missed 0 over 500 frames: avg latency 20 ms
Here M = 0, so the frame rate is a full 60 Hz. It was possible for the acquisition buffer to be switched every time, so no data was lost.
Processing the same scene and calculating the center of the largest blobs on all three channels without calculating angle provides the following results:
tpuvis [a]> m
Missed 0 over 500 frames: avg latency 23 ms
Again M = 0, so the frame rate is a full 60 Hz. The last two examples result in the same frame rate, but this one has slightly more latency than the previous one.
Here the protocol string from the protocol string example (see section Multiple Colors Protocol String Example) was used. The center of the largest blobs on all three channels, plus the aspect ratio and angle on channel B were processed:
tpuvis [a]> m
Missed 249 over 500 frames: avg latency 29 ms
Here M = 249 and N = 500. The frame rate R = (60*500)/(500 + 249) = 40 Hz. This tells us that the processing buffer was not ready to be refilled about a third of the time. This is because the angle and aspect ratio calculations are quite expensive compared to just tracking the centers of the objects. In this case, the added time for computing angle on one channel was 6 ms--a 26% increase.