This paper introduces a robust visual tracking method for objects in complex environments with blocking obstacles and light-reflection noise. The method uses a transformation matrix to project image pixels back to real-world coordinates. During image processing, a color and shape test is used to recognize the object, and the object is represented by a vector that contains its orientation and body length. If the object is partially blocked by obstacles or by reflections from the water surface, the vector is used to predict the position of the object. During real-time tracking, a Kalman filter is used to refine the result. To validate the method, the algorithm was used to track a submarine and a fish on the water surface of a tank, with three pieces of blurred glass placed above the tank as obstacles between the camera and the object. The method also avoids interference from reflections off the side glass and from fluctuations of the water surface.

## Introduction

Research on unmanned mobile systems (UMS) is increasingly important with the development of new vehicle systems. The position and velocity of most UMS are normally obtained through visual tracking with on-board cameras. Based on the color, brightness, and position of features in the image, the processor calculates the shape, size, distance, and velocity, which serve as feedback information for the control system. Many types of control, such as collision avoidance control, flight formation control, and automated ground-vehicle systems, have been developed for UMS. For example, Shladover et al. summarized the development of automatic vehicle control algorithms in the Program on Advanced Technology for the Highway [1]. Seiler et al. discussed the development of a collision avoidance system by comparing the algorithms of different vehicle companies [2]. Other researchers working on image-based systems have used their own algorithms to control different objects in specific environments. Bourquardez et al. investigated image-based visual servo control for regulating the position of an object [3]. Fukao et al. used a camera as the only sensor to find and control the trajectory of a blimp [4].

The first step of UMS control is to find and locate the object. Commonly used methods include satellite-based tracking, such as using the global positioning system (GPS) [5] to track boats or ground vehicles, and echo-based tracking, such as using radar to locate aircraft [6] or sonar for underwater detection [7]. However, these processes may suffer from signal loss, such as an unclear signal or noise interference. For instance, a vehicle passing through a tunnel may lose the GPS signal for a while. When a ship is tracked on the water surface, obstacles above it, such as bridges, or water reflections also interfere with tracking. To improve tracking in environments with noise and obstacles, researchers have put great effort into robust tracking algorithms. For example, Belagiannis et al. tracked a moving car on the road using segmentation and color gradient orientation histograms [8]. Hua et al. calculated the orientation and position of autonomous underwater vehicles using a nonlinear visual servoing approach [9]. Prats et al. used two-dimensional (2D) visual servoing techniques based on template tracking to analyze the alignment of underwater vehicles with respect to underwater structures [10].

This paper focuses on prediction during tracking when the object is partially or completely blocked. The goal of this work is to develop a robust visual tracking method that provides reliable feedback information about a vehicle, such as its position and velocity, when the system is subject to visual noise and blocking obstacles. The robust visual tracking process consists of four steps. First, a localization algorithm locates the object in clear-view circumstances using a view domain transformation. Second, a vector representing the object is identified using the minimum volume ellipse. Third, the identified vector is used as shape and location information to classify the pixels into four classes, and a different localization method is used to resolve or predict the position of the object in blocked-view circumstances. Fourth, a Kalman filter [11] is applied to the position and velocity data to reduce the effect of random noise and of sudden changes in velocity and position. To validate the proposed method, a submarine in a water tank was tracked using a camera above the tank. The tracking results and the errors of the path and velocity in the *x* and *y* directions verify that the proposed tracking algorithm is robust to blocking obstacles and water-surface reflection noise.

The rest of the paper is organized as follows: The visual tracking method including view domain transformation, preparation, object tracking, and optimization is introduced in Sec. 2. The experimental results and discussion are presented in Sec. 3. Conclusions and future work are discussed in Sec. 4.

## Method

In this section, the main algorithms for tracking a vehicle on a 2D surface are introduced. The algorithms are separated into four parts: view domain transformation, preparation, object tracking, and optimization. The view domain transformation part calculates the matrix that transforms position data from image pixels to real-world Cartesian coordinates. The preparation part initializes tracking by recognizing the object and localizing its position. The object tracking part resolves the pixels and then identifies the pixels that represent the tracked object. The optimization part applies a Kalman filter to the velocity result to reduce experimental errors.

### Camera Calibration.

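The view domain transformation maps a point on the 2D tracking surface to a pixel through a planar projective transformation,

$$
w\begin{bmatrix} c \\ r \\ 1 \end{bmatrix} = h \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}, \qquad
h = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix},
$$

where *x* and *y* are the coordinates of the point in the real world, *c* and *r* represent the column and the row of the pixel in the image, and *w* is a scale factor. To simplify the expression, "**R**" is used to represent the real-world domain and "**I**" is used to represent the image pixel domain. The transformation matrix *h* is the mapping matrix from **R** to **I**. Solving for the nine unknown entries of *h* requires nine equations. Each reference point with known **R** and **I** coordinates provides two equations, so either four reference points are picked in the image together with one extra condition on the matrix, or more than four reference points are used with least-squares regression to obtain an approximation. In this algorithm, four reference points are picked, and the supplemental equation sets the second-order norm of the matrix to 1, $\Vert h\Vert =1$. The equations between the **R** coordinates and the **I** coordinates are given as

$$
c_i = \frac{h_{11}x_i + h_{12}y_i + h_{13}}{h_{31}x_i + h_{32}y_i + h_{33}}, \qquad
r_i = \frac{h_{21}x_i + h_{22}y_i + h_{23}}{h_{31}x_i + h_{32}y_i + h_{33}}, \qquad i = 1,\dots,4.
$$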
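A minimal sketch of the four-point solution follows, assuming NumPy and the standard direct linear transformation (DLT): cross-multiplying the equations for $c_i$ and $r_i$ gives two linear equations per reference point in the nine entries of *h*, and the unit-norm solution is the right singular vector with the smallest singular value. The function names `solve_h`, `apply_h`, and `_normalizer` are hypothetical, and the coordinate normalization (Hartley-style) is an added numerical-conditioning step that the text does not mention.

```python
import numpy as np

def apply_h(h, pts):
    """Map 2D points through a 3x3 matrix using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.column_stack([pts, np.ones(len(pts))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]

def _normalizer(pts):
    """Similarity transform moving points to zero mean and mean distance sqrt(2)."""
    pts = np.asarray(pts, dtype=float)
    mean = pts.mean(axis=0)
    scale = np.sqrt(2) / np.mean(np.linalg.norm(pts - mean, axis=1))
    return np.array([[scale, 0.0, -scale * mean[0]],
                     [0.0, scale, -scale * mean[1]],
                     [0.0, 0.0, 1.0]])

def solve_h(real_pts, pixel_pts):
    """Estimate h mapping R (x, y) to I (c, r), scaled so that ||h|| = 1."""
    T_r, T_i = _normalizer(real_pts), _normalizer(pixel_pts)
    rows = []
    for (x, y), (c, r) in zip(apply_h(T_r, real_pts), apply_h(T_i, pixel_pts)):
        # c * (h31 x + h32 y + h33) = h11 x + h12 y + h13, rearranged to A h = 0
        rows.append([x, y, 1, 0, 0, 0, -c * x, -c * y, -c])
        rows.append([0, 0, 0, x, y, 1, -r * x, -r * y, -r])
    # Unit-norm least-squares solution of A h = 0: last right singular vector.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = np.linalg.inv(T_i) @ vt[-1].reshape(3, 3) @ T_r   # undo the normalization
    return h / np.linalg.norm(h)                          # Frobenius norm = 1
```

With the reference points of Fig. 1, `solve_h([(0, 120), (0, 240), (90, 240), (90, 120)], pixels)` returns *h*, and image pixels are mapped back to **R** with `apply_h(np.linalg.inv(h), pixels)`.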

### Preparation.

In the preparation part, four reference points in **I** whose **R** coordinates are known are picked first, as shown in Fig. 1. The water surface shown in Fig. 1 is considered the two-dimensional tracking domain, where the lower left corner is the origin and the upper right corner has the **R** coordinates (90, 240). Since the lower corners are blocked in the camera view, the points are chosen at the real-world positions (0, 120), (0, 240), (90, 240), and (90, 120), which are shown as red crosses in Fig. 1. In **I**, after the tracking domain is fixed, all pixels outside the domain are set to zero to rule out possible noise. A fifth point is then placed inside the object pixels. Since it is used only as an initial point to generate the first search window, this point does not have to be the center of the object pixels. To choose the size of the search window, a square is constructed in **R** and its four corners are transformed into **I**. The minimum and maximum column and row values of the four transformed corners define the new search window. This initial search window is shown as a black box around the object in Fig. 1. The size of the square in **R** is proportional to the current velocity of the object: the faster the object moves, the larger the window. The pixels are resolved by their color features. To differentiate the object from noise, separate thresholds are set for the red, green, and blue values and for the ratios among them.
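The search window and color test can be sketched as follows, assuming *h* maps **R** to **I** in homogeneous coordinates. The side of the square is modeled as a base size plus a term proportional to speed; `base_size`, `gain`, and the color bounds are hypothetical parameters, since the text states only that the size grows with velocity and that thresholds are applied to the channels and their ratios.

```python
import numpy as np

def search_window(h, center_xy, speed, base_size=10.0, gain=0.5):
    """Pixel window (c_min, c_max, r_min, r_max) around a point given in R."""
    half = 0.5 * (base_size + gain * speed)     # square side grows with speed
    x, y = center_xy
    corners = np.array([[x - half, y - half], [x + half, y - half],
                        [x + half, y + half], [x - half, y + half]])
    homog = np.column_stack([corners, np.ones(4)]) @ h.T
    pix = homog[:, :2] / homog[:, 2:3]          # corners in I as (c, r)
    c_min, r_min = np.floor(pix.min(axis=0)).astype(int)
    c_max, r_max = np.ceil(pix.max(axis=0)).astype(int)
    return tuple(int(v) for v in (c_min, c_max, r_min, r_max))

def color_mask(rgb, lower, upper, channel=2, min_ratio=1.2):
    """Hypothetical color test: per-channel bounds plus a channel-dominance ratio."""
    rgb = rgb.astype(float)
    in_range = np.all((rgb >= lower) & (rgb <= upper), axis=-1)
    others = np.delete(rgb, channel, axis=-1).max(axis=-1)
    ratio = rgb[..., channel] / (others + 1.0)  # +1 avoids division by zero
    return in_range & (ratio >= min_ratio)
```

Only pixels inside the returned window need to be passed to `color_mask`, which keeps the per-frame cost proportional to the window area rather than the full image.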

### Object Tracking.

The object tracking part includes the key algorithms used in each time step. During tracking, the object may disappear from the camera view due to interference from noise and obstacles. Two situations are analyzed in this research: one in which part of the object body is blocked and another in which the whole body is blocked. To solve the partial blocking problem, the minimum-area ellipse enclosing the object [13] is found first, and the part of its major axis inside the object is used as a vector to represent the object. The direction of the vector satisfies $v\cdot p>0$, where $v$ is the current velocity of the object and $p$ is the vector representing the object.

#### Minimum Volume Ellipse.

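An ellipse with center $c=(c_x, c_y)$, semi-axes *a* and *b*, and rotation angle *α* can be written as

$$
\frac{(x_0\cos\alpha + y_0\sin\alpha)^2}{a^2} + \frac{(-x_0\sin\alpha + y_0\cos\alpha)^2}{b^2} = 1,
$$

where $x_0 = x - c_x$, $y_0 = y - c_y$, and (*x*, *y*) is the position of each point. The ellipse is also given in matrix form as Eq. (5),

$$
(p-c)^{T}E\,(p-c) = 1, \tag{5}
$$

where *p* are the points on the ellipse, *c* is the center of the ellipse, *α* represents the rotation angle of the ellipse, and *E* is given as

$$
E = \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix}
\begin{bmatrix} 1/a^2 & 0 \\ 0 & 1/b^2 \end{bmatrix}
\begin{bmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{bmatrix}.
$$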

The area of the ellipse is $\pi/\sqrt{\det E}$. Nima Moshtagh's 2005 work [13] gave a solution for finding the ellipse that minimizes $\det(E^{-1})$, and hence its area, subject to all given points lying inside the ellipse, $(x-c)^{T}E(x-c)\le 1$.
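Moshtagh's solution [13] is based on Khachiyan's algorithm, which iteratively reweights the points until the weighted moment matrix defines the minimum-volume enclosing ellipse. A minimal NumPy sketch of that iteration follows, offered as an illustration of the technique rather than the authors' code. It assumes the points are not all collinear (otherwise the moment matrix is singular), returns *E* and *c* in the form of Eq. (5), and uses a hypothetical stopping tolerance `tol`.

```python
import numpy as np

def min_volume_ellipse(points, tol=1e-4, max_iter=10_000):
    """Khachiyan's algorithm: (E, c) with (p - c)^T E (p - c) <= 1, approximately."""
    P = np.asarray(points, dtype=float).T          # shape (d, N)
    d, N = P.shape
    Q = np.vstack([P, np.ones(N)])                 # lifted points, shape (d+1, N)
    u = np.full(N, 1.0 / N)                        # weight on each point
    for _ in range(max_iter):
        X = (Q * u) @ Q.T                          # Q diag(u) Q^T
        M = np.sum(Q * np.linalg.solve(X, Q), axis=0)   # q_i^T X^{-1} q_i
        j = int(np.argmax(M))                      # point farthest outside
        step = (M[j] - d - 1.0) / ((d + 1.0) * (M[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step
        converged = np.linalg.norm(new_u - u) < tol
        u = new_u
        if converged:
            break
    c = P @ u                                      # ellipse center
    E = np.linalg.inv((P * u) @ P.T - np.outer(c, c)) / d
    return E, c

def ellipse_area(E):
    """Area of the ellipse (p - c)^T E (p - c) = 1."""
    return np.pi / np.sqrt(np.linalg.det(E))
```

For the four points (±2, 0) and (0, ±1), the weights stay uniform and the result is $E=\mathrm{diag}(1/4, 1)$ with area $2\pi$, the ellipse $x^2/4 + y^2 = 1$.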

#### Vector Prediction.

In this method, a vector, called the body vector, is used to represent the object. As shown in Fig. 2, a color threshold is used to find the edge of the object, which is then transferred into **R** space. After the minimum volume ellipse enclosing the pixels [13] is found, the major axis of the ellipse has at least two intersections with the boundary of the pixels (the boundary computed with a shrink factor of 0.5). The two intersections farthest apart define the body vector, and their midpoint is the location of the object. Introducing the body vector has three advantages. (1) The midpoint of the vector is used directly as the location of the object to calculate velocity and acceleration. (2) The vector in the previous frame is used as a reference to find the object in the new frame: when a group of pixels is resolved, it passes the vector check if the difference between the current vector and the reference is less than the tolerance in both length and direction. (3) When part of the object is blocked, the vector is used to predict the location of the object. When the blocked part is small, the direction of the current vector and the average length of all previous vectors are used to represent the object. As shown in Fig. 3, although part of the object is already blocked by the obstacle, the red vector shows how much of the object is blocked. The location of the object, the midpoint of the vector, is marked by the black cross. If the blocked part is large compared with the visible part, the information available to the camera is limited. Since the controller of the object makes no change when it receives no new input, the object keeps the same state as in the previous frame, and the direction of the previous vector and the average length of all previous vectors are used as the reference vector.
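The body-vector steps can be sketched as below, assuming boundary points already in **R** and the (*E*, *c*) pair of the minimum-volume ellipse. Instead of intersecting the major axis with the boundary curve analytically, this sketch keeps boundary points within a hypothetical distance `band` of the axis and takes the two extremes along it, then enforces $v\cdot p>0$. The tolerances in `vector_check` and the assumption in `predict_midpoint` that the rear end of the object is the visible end are illustrative choices, not values from the experiments.

```python
import numpy as np

def body_vector(boundary, E, c, v, band=0.5):
    """Return (tail, head) of the body vector, oriented so that v . p > 0."""
    boundary = np.asarray(boundary, dtype=float)
    _, vecs = np.linalg.eigh(E)                  # eigenvalues in ascending order
    axis = vecs[:, 0]                            # smallest eigenvalue -> major axis
    normal = np.array([-axis[1], axis[0]])
    near = boundary[np.abs((boundary - c) @ normal) <= band]  # must be non-empty
    s = (near - c) @ axis                        # signed position along the axis
    tail, head = near[np.argmin(s)], near[np.argmax(s)]
    if np.dot(v, head - tail) < 0:               # orientation rule v . p > 0
        tail, head = head, tail
    return tail, head

def vector_check(p, p_ref, len_tol=2.0, ang_tol=np.radians(15)):
    """Pass if both length and direction stay within tolerance of the reference."""
    lp, lr = np.linalg.norm(p), np.linalg.norm(p_ref)
    cos_ang = np.clip(np.dot(p, p_ref) / (lp * lr), -1.0, 1.0)
    return abs(lp - lr) < len_tol and np.arccos(cos_ang) < ang_tol

def predict_midpoint(tail, head, mean_length):
    """Small-block prediction: current direction, average previous length.
    Assumes the tail (rear end) is the visible end of the object."""
    direction = (head - tail) / np.linalg.norm(head - tail)
    return tail + 0.5 * mean_length * direction
```

In normal tracking the location is simply `(tail + head) / 2`; `predict_midpoint` replaces it only when the vector is shorter than the running average.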

#### Shape Recognition.

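For a grayscale image $I(x, y)$, all pixel values are real numbers. The gradient of the image is given as

$$
\nabla I(x,y) = \left(\frac{\partial I}{\partial x},\ \frac{\partial I}{\partial y}\right).
$$

A pixel is an edge if

$$
\Vert \nabla I(x,y)\Vert > \text{Threshold} \;\wedge\; \Vert \nabla I(x,y)\Vert > \Vert \nabla I\big((x,y)+\widehat{\nabla I}(x,y)\big)\Vert \;\wedge\; \Vert \nabla I(x,y)\Vert > \Vert \nabla I\big((x,y)-\widehat{\nabla I}(x,y)\big)\Vert ,
$$

where $\widehat{\nabla I}(x,y)$ is the unit vector in the gradient direction; that is, the gradient magnitude exceeds the threshold and is a local maximum along the gradient direction. In Canny's method [14], there are two thresholds: a pixel is an edge if $\Vert \nabla I(x,y)\Vert > \text{High threshold}$, or if $\Vert \nabla I(x,y)\Vert > \text{Low threshold}$ and the pixel is next to an edge. If the overlap between the current edge and the reference edge is larger than the threshold percentage, the pixel group is regarded as the object.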
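The double-threshold rule and the overlap test can be sketched in NumPy as below. This is a simplified illustration assuming a grayscale image: it uses central differences (`np.gradient`) for the gradient, omits the non-maximum suppression given by the single-threshold condition, and defines overlap as the fraction of reference edge pixels that are also current edge pixels, which is one plausible reading of "overlap percentage" rather than a formula stated in the text.

```python
import numpy as np

def gradient_magnitude(img):
    """||grad I|| using central differences; axis 0 is rows (y), axis 1 is columns (x)."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

def hysteresis(mag, low, high):
    """Edges: above `high`, or above `low` and 8-connected to an edge pixel."""
    strong, weak = mag > high, mag > low
    edges = strong.copy()
    rows, cols = edges.shape
    while True:
        padded = np.pad(edges, 1)                  # pads with False
        neigh = np.zeros_like(edges)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                neigh |= padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
        grown = (weak & neigh) | edges             # grow edges into weak pixels
        if np.array_equal(grown, edges):
            return edges
        edges = grown

def overlap_ratio(edge, ref_edge):
    """Fraction of reference edge pixels that are also in the current edge."""
    ref_count = ref_edge.sum()
    return (edge & ref_edge).sum() / ref_count if ref_count else 0.0
```

A pixel group would then be accepted as the object when `overlap_ratio(edge, ref_edge)` exceeds the chosen threshold percentage.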

#### Tracking Steps.

If the object is resolved, its positions in **R** and **I**, the body vector in **R**, and the edge are stored and passed to the next iteration. If not, the situation is treated as a special case, and the starting time is taken from the last resolved frame. In the special tracking method, the time interval is shortened to increase the accuracy of the prediction. As shown in Fig. 4, the length of the vector is tested first. As long as the object is not blocked, its length in **R** is always the same; thus, when the length difference is less than the tolerance, this group of pixels is the object. When a small part of the object is blocked, the angle of the vector does not change much but its length is shorter; in that case, the current direction and the average length of previous vectors are used to predict the location of the object. When a large part of the object is blocked, the current vector differs greatly from the reference in both length and angle, and the shape and location are used to test whether the group of pixels is part of the object.

In the location prediction, the velocity during the last time interval is used to predict the location in the next frame. If the distance between the group of pixels and the predicted location is less than the tolerance, those pixels pass the location check. In the shape test, the previous boundary is used as a template, named $B_1$, and the current boundary is named $B_2$. The two boundaries are compared by the test condition of Eq. (9), where Tol is the tolerance and (*i*, *j*) are the row and column of each pixel in the image matrix, respectively. When Eq. (9) is satisfied, the boundary test passes. If both the shape and location checks pass, the pixels are regarded as part of the object; otherwise, the resolved pixels are noise. If all pixels are noise, the whole object has been blocked, and the result of the location prediction is used to represent the object.
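The decision logic of Fig. 4 can be summarized as a small classifier, sketched here under stated assumptions: every tolerance name (`len_tol`, `ang_tol`, `loc_tol`) and its default value is hypothetical, the shape test is passed in as a Boolean standing for the result of the Eq. (9) boundary test, and the location prediction uses the constant-velocity rule described above.

```python
from enum import Enum
import numpy as np

class Case(Enum):
    NORMAL = "normal tracking"
    SMALL_BLOCK = "vector prediction, small part blocked"
    LARGE_BLOCK = "vector prediction, large part blocked"
    FULL_BLOCK = "location prediction, whole object blocked"

def predict_location(prev_pos, prev_vel, dt):
    """Constant-velocity prediction: same velocity as in the last interval."""
    return np.asarray(prev_pos, dtype=float) + dt * np.asarray(prev_vel, dtype=float)

def classify_group(length, angle_diff, ref_length, shape_ok, centroid, predicted,
                   len_tol=2.0, ang_tol=0.26, loc_tol=5.0):
    """Return the case for one pixel group, or None if the group is noise."""
    if abs(length - ref_length) < len_tol:
        return Case.NORMAL                       # full body length visible
    if length < ref_length and angle_diff < ang_tol:
        return Case.SMALL_BLOCK                  # shorter, same direction
    near = np.linalg.norm(np.asarray(centroid, dtype=float) - predicted) < loc_tol
    if shape_ok and near:
        return Case.LARGE_BLOCK                  # passes shape and location checks
    return None                                  # noise

def classify_frame(cases):
    """First non-noise case wins; if every group is noise, the object is hidden."""
    for case in cases:
        if case is not None:
            return case
    return Case.FULL_BLOCK
```

When `classify_frame` returns `Case.FULL_BLOCK`, the output of `predict_location` is used directly as the object position for that frame.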

### Optimization.

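The predicted state at time *t* is derived from the estimate at time *t* − 1 by Eq. (10),

$$
\hat{x}_{t|t-1} = F_t\,\hat{x}_{t-1} + B_t\,u_t + w_t, \tag{10}
$$

where $F_t$ is the state transition model, $B_t$ is the control-input model applied to the control input $u_t$, and $w_t$ is the process noise. The predicted error covariance $P_{t|t-1}$ is calculated in Eq. (11),

$$
P_{t|t-1} = F_t\,P_{t-1}\,F_t^{T} + Q_t, \tag{11}
$$

where $Q_t$ is the covariance of the prediction model. The prediction is then corrected with the observation,

$$
K_t = P_{t|t-1}H^{T}\left(H P_{t|t-1} H^{T} + R\right)^{-1}, \qquad
\hat{x}_{t} = \hat{x}_{t|t-1} + K_t\left(z_t - H\hat{x}_{t|t-1}\right), \qquad
P_{t} = \left(\mathbb{I} - K_t H\right)P_{t|t-1},
$$

where $K_t$ is the optimal Kalman gain at state *t*, $H$ is the state observation model mapping the actual state space into the observation space, $z_t$ is the observation value at state *t*, $R$ is the variance of the observation data, and $\mathbb{I}$ is the identity matrix.

According to our assumption, $u_t = 0$ and $w_t = 0$, and $R$ is set to 0.01 so that the filter changes the measured position little while smoothing the velocity result the most.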
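A minimal constant-velocity Kalman filter for one coordinate is sketched below, following Eqs. (10) and (11) with $u_t = 0$, $w_t = 0$, and $R = 0.01$. The state [position, velocity], the transition matrix *F*, the observation model $H = [1, 0]$, the white-acceleration form of $Q_t$ with hypothetical scale `q`, and the initial state and covariance are assumptions for illustration, since the text does not list them. Setting $w_t = 0$ in the state prediction is consistent with a nonzero $Q_t$, because the noise has zero mean but still adds uncertainty.

```python
import numpy as np

def kalman_cv(z, dt, R=0.01, q=1e-3):
    """Filter 1D positions z; returns an array of [position, velocity] estimates."""
    F = np.array([[1.0, dt], [0.0, 1.0]])          # constant-velocity transition
    H = np.array([[1.0, 0.0]])                     # only position is observed
    Q = q * np.array([[dt**3 / 3, dt**2 / 2],      # white-acceleration process noise
                      [dt**2 / 2, dt]])
    x = np.array([z[0], 0.0])                      # initial state (assumed)
    P = np.eye(2)                                  # initial covariance (assumed)
    estimates = []
    for zt in z:
        x = F @ x                                  # Eq. (10) with u_t = 0, w_t = 0
        P = F @ P @ F.T + Q                        # Eq. (11)
        S = H @ P @ H.T + R                        # innovation variance, shape (1, 1)
        K = P @ H.T / S                            # Kalman gain, shape (2, 1)
        x = x + (K * (zt - H @ x)).ravel()         # correct with the observation
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x.copy())
    return np.array(estimates)
```

Running `kalman_cv` separately on the *x* and *y* positions gives smoothed paths and velocities; a larger `q` lets the velocity follow sudden changes such as a collision faster, at the cost of less smoothing.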

## Experimental Results and Discussion

This section presents the tracking results. In the experiments, the proposed algorithm was used to track a remote-controlled submarine and a robotic fish in a 90 cm by 240 cm water tank. The experimental setup is shown in Fig. 5. Three pieces of blurred glass above the tank blocked the view of the camera when the submarine passed beneath them. The Kalman filter was used to refine the position and velocity results.

### Visual Tracking Results.

The area behind the blurred glass was the blind area. As shown in Fig. 6, the submarine was outside the blind area in the first frame. In the next three frames, as the submarine began to enter the blind area, part of it was blocked. In the last two frames, the submarine was completely blocked. In all six frames, the black cross marks the result of the method selected for each condition by the decision tree shown in Fig. 4.

Figure 6 shows that when part of the submarine was blocked by the glass, the cross was not at the center of the blue area (the visible part of the submarine outside the blind area). Since nothing was changed on the remote controller, the submarine maintained the same velocity while passing through the blind area. When this velocity was used to predict the position of the submarine, the black cross remained near the center of the submarine. The resulting path and velocity in the *x* and *y* directions are presented in Figs. 7 and 8.

As shown in Fig. 7, the submarine first stayed at the same position. Because of errors in the image and in the calculation process, it appeared to float around a certain point. Several seconds after it started moving, it hit the glass, bounced back, and moved into the first blind zone. Since the collision occurred just after the submarine disappeared, it caused a sudden velocity change in the blind zone, which violated the assumption of constant velocity in the blind zone. As a result, the location prediction was inaccurate in the blind area, which led to a large position change in a short time. After the submarine left the blind area, normal tracking resumed until it entered the next blind area. In the second blind area, there was no velocity change, which gave a good location prediction. The tracking process in the second blind area is also shown in Fig. 6. Based on calculations from eight data sets, the variance of the position at the first time state was set to 0.01. The tracking result with Kalman filtering is shown in blue in Fig. 7.

The velocity is plotted in Fig. 8. After 4 s, the velocity began to increase, while the acceleration decreased because of the drag force of the water. When the submarine hit the boundary glass on the right side of the tank, the velocity in the *x* direction changed sharply, while the change in the *y* direction was small. After the vector prediction frames, the velocity became constant because of the constant-velocity location prediction in the blind area. When the submarine moved out of the blind area, there was a large change of location in a short time. When it moved into the second blind area, the velocity change was also large compared with the constant velocity inside the second blind area. After the Kalman filter was applied, the velocity was smoother than the original result: the fluctuation was reduced, and the sudden drop in velocity became a gradual change.

In another experiment, a black robotic fish was tracked. In Fig. 9, the red cross represents the position of the fish. The path and the velocity of the fish in the water tank are shown in Figs. 10 and 11, respectively. Black represents the normal tracking result, brown the vector prediction result when a small part of the object was blocked, green the vector prediction result when a large part of the object was blocked, and red the location prediction when the whole object was blocked; the blue line is the result after applying the Kalman filter.

### Discussion.

In Figs. 7 and 10, normal tracking gave an accurate result, but prediction introduced more error. When a small part of the submarine was blocked, the direction of the current vector varied slightly because of the change in shape; when a large part was blocked, the vector was the same as the one in the last frame. This prediction was valid only when the time between two frames was short and the change in velocity was small. The result of this prediction carried its own error plus the error from the previous frame. The location prediction assumed a constant velocity, calculated from the last vector prediction result. Its error was large because it accumulated all the errors from the previous frames and from the prediction process, which caused a large location change when tracking switched back to the normal method.
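The accumulation can be made explicit with a short derivation for one coordinate, assuming a frame interval $\Delta t$, a last resolved position estimate $\hat p_0$ with error $e_0 = \hat p_0 - p_0$, a velocity estimate $\hat v = v_0 + \delta v$ where $v_0$ is the true velocity at that frame, and a true velocity $v_k$ during the *k*-th interval in the blind area:

$$
\hat p_n = \hat p_0 + n\,\Delta t\,\hat v, \qquad p_n = p_0 + \Delta t\sum_{k=1}^{n} v_k,
$$

$$
e_n = \hat p_n - p_n = e_0 + n\,\Delta t\,\delta v + \Delta t\sum_{k=1}^{n}\left(v_0 - v_k\right).
$$

The second term grows linearly with the number of frames spent in the blind area, and the third term vanishes only if the velocity stays constant, so a collision inside the blind area adds error that is not corrected until normal tracking resumes.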

When the pixels representing the object were calculated, the brightness of the object changed because of changes in ambient light and voltage fluctuations in the camera. When the brightness of the edge pixels fell below the resolution threshold, the object shape appeared smaller. This shape variation caused fluctuations in the length of the vector and in the position of the object. The velocity fluctuation was amplified before the blind area because of the smaller time steps combined with the same standard deviation of the position noise. Although the Kalman filter did not completely remove the experimental error, it suppressed vibration and sudden changes in the feedback signal and provided a more robust response for the control system.
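The amplification follows from the finite-difference velocity, assuming the velocity is computed from consecutive positions and that the position noise in consecutive frames is independent with standard deviation $\sigma$:

$$
v_t = \frac{p_t - p_{t-1}}{\Delta t}, \qquad
\operatorname{Var}(v_t) = \frac{2\sigma^2}{\Delta t^2}, \qquad
\sigma_v = \frac{\sqrt{2}\,\sigma}{\Delta t}.
$$

Halving $\Delta t$ at the same $\sigma$ therefore doubles the standard deviation of the velocity.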

Comparing the two tracking results, the prediction in Fig. 6 was better than that in Fig. 9. In Fig. 9, when the fish was moving into the blind area, the prediction did not mark its position precisely because the fish shook its body to swim forward. The vector representing the fish kept changing direction, which caused errors in the vector prediction as the fish moved into the blind area. This phenomenon is also reflected in Fig. 11: in the normal tracking result, the path and the velocity in both the *x* and *y* directions fluctuated strongly. In Fig. 10, the vector prediction result, shown as the red line, deviated from the original direction, which also made the location prediction less precise. Within a single time step, the fish did not have a stable moving direction; however, over long-time tracking, it followed the commands from the controller and moved in the given direction. The Kalman filter was applied to reduce the fluctuation of the velocity and the path.

## Conclusion and Future Work

In this paper, a robust visual tracking method is developed to track an object on a specific surface and provide an accurate position even when the object is blocked by an obstacle. The vector prediction and the constant-velocity location prediction provide the position of the object as long as there is no sudden velocity change in the blind area. This tracking algorithm provides position and velocity information for intelligent feedback control even when the visual feedback is weak. However, errors accumulate in the predicted position and velocity in the blind area. The problem is more complex when the obstacle is not stationary.

Future work will focus on optimizing the algorithm to reduce error and increase computing speed, and on applying it to real-time control. The algorithm will also be tested with multiple objects. By applying the shape recognition and minimum enclosing ellipse techniques, collision avoidance controls will be developed for autonomous mobile vehicles.

## Funding Data

National Science Foundation (Grant No. CNS #1446557; Funder ID: 10.13039/501100008982).

Division of Computer and Network Systems (Grant No. 1446557; Funder ID: 10.13039/100000144).