During the past summer months, a friend and I have been developing the hardware for the physical model of the Cart-Pole system. The Cart is mounted on an aluminium profile track and is driven by a timing belt. A main microcontroller (ESP32) controls the system and talks to the user's computer. There are also some worker ESP32s scattered around the assembly.

There is an inverted pendulum (Pole) that rotates freely around the Cart with a bearing. The rotation/angle of the Pole is measured using an incremental encoder (AMT102-V). The data pins of the encoder are read using interrupt pins on a worker ESP32. The ESP32 sends the current angle of the Pole to the main ESP32 wirelessly using ESP-NOW. Both the encoder and the ESP32 are powered by a 3.7 V battery. The reason for using wireless communication and battery power is to make the design of the Poles less complex. For example, using wired encoders would require using slip rings and increasing the number of poles would mean changing the design. Using "smart" Poles instead makes the system modular as you can have a single design of a Pole but use any number of Poles in the system.

The timing belt is driven by a stepper motor (CS-M22430) with an inbuilt encoder. However, the encoder is not working properly so we are planning on replacing it with an incremental encoder of the same model used in the Pole. There are also end stops on each end of the track. These are not yet connected but will be used for homing the Cart and for stopping the motor if it spins out. The end stops and motor encoder are all connected to another worker ESP32 that, in the future, will send the rotation of the motor and the end stop signals to the main ESP32.

The stepper motor is driven by a stepper motor driver (CS1-D507S) that is hooked up to a 48 VDC power supply and is connected to the main ESP32. The ESP32 tells the motor which speed it should have using the AccelStepper library. The user's computer communicates to the ESP32 using serial and sends the desired acceleration of the motor. The ESP32 in turn integrates the acceleration to a velocity that it sends to the motor driver.

The control system builds on the technologies that have been used previously in the simulations. The control scheme uses direct collocation for generating trajectories, finite-horizon LQR for trajectory stabilization, and infinite-horizon LQR for balancing. The controller sends control signals, i.e. the desired acceleration of the stepper motor, and receives the current state of the system from the main ESP32 via serial communication. The controller runs a CartPoleEnv that stores the data and shows a virtual version of the Cart-Pole system.

The user can choose among several commands, such as sending a cosine signal or simple position tracking, but we'll stick to the trajectory command for simplicity's sake. The user inputs the desired state (position of the cart and if the pendulum should be up/down) and how long the trajectory will take. The desired state and time are used to generate an optimal trajectory using CasADi. This is done in a separate process using Python's multiprocessing library so that the main thread can keep sending control signals to the ESP32. The optimal states are then linearized and discretized and used to calculate the finite-horizon LQR gains, as well as to calculate the error during the trajectory. The optimal feedforward control plus the feedback control, calculated using the error and LQR gains, is sent to the main ESP32 to perform the trajectory. The trajectory controller is shown below.

As usual, there are some revisions to the mathematical model. Using the control scheme with the physical model worked well when balancing the inverted pendulum. This suggests that the dynamics were quite accurate at the equilibrium points. However, the generated trajectories and feedforward control made the system unstable. This made us doubt the accuracy of our current model, so we went looking for a better one.

Luckily for us, Tobias Glck et al. [1] had already managed to balance a triple pendulum on a cart with their mathematical model. Their equations did however not provide explicit solutions for the pendulums' angular accelerations, which had to be solved using Lagrangian mechanics. To solve for the angular accelerations, as well as the control force which can be used as a constraint, we used SymPy. For faster calculations, you generally want to use numerical expressions instead of symbolic ones. To achieve this, we converted the SymPy expressions to CasADi expressions using a `sympy2casadi`

function by Joris Gillis [2]. Using our CasADi expressions, we can perform calculations easily and quickly in our controller.

We already know a lot about the parameters of our system. The mass of the Cart and Pole, as well as the inertia, length and distance to the centre of mass of the Pole can all be measured. The only unknown parameter is the viscous friction in the joint of the Pole. This could be estimated using for instance linear regression, but at the moment the friction has been estimated empirically.

The physical model was connected to the control scheme and the Pole was set to do some swing up/down maneuvers and position following along the track. The result can be seen in the video below.

And it works super well!

Next up, I will work on system identification, which will be important when we add another pendulum. My friend is working on making a modular Pole design. Hopefully, the end stops and the motor encoder will all be up and running as well. But that's all for now, thanks for reading!

-SL

[1] T. Glck, A. Eder, and A. Kugi, "Swing-up control of a triple pendulum on a cart with experimental validation," Automatica, vol. 49, no. 3, pp. 801-808, Elsevier, 2013, doi:10.1016/j.automatica.2012.12.006.

[2] J. Gillis, "sympy2casadi," [Gist], GitHub, August 13, 2023. URL: https://gist.github.com/jgillis/80bb594a6c8fcf55891d1d88b12b68b8

]]>At the moment, me and a friend (mostly my friend) are assembling the hardware for the cart-pole system. After some thinking and testing, the DC motor we had planned to use is not up to par. It can provide enough angular velocity and torque for the swing-up maneuver, but it lacks the precision needed to accurately balance the inverted pendulums and provide accurate feedback control. Therefore, we decided to switch to a stepper motor instead. Stepper motors have the complete opposite strengths and weaknesses. They have high precision control, but they can lack speed and torque. However, the one that we have should be sufficient for our needs.

Introducing a stepper motor simplifies the dynamics of the system. Similar to the triple pendulum system of Glck et al. [1], we have assumed that there is an ideal velocity and acceleration control of the motor. Using this property, we can choose the angular acceleration of the cart as the control variable *u*. Therefore, as long as the stepper motor can provide enough torque, the dynamics simplify to:

The states of the system remain the same, but the symbols of the position, velocity and acceleration of the cart have changed to *s*. The previous equation that described the cart's acceleration has also been rewritten to describe the output torque of the stepper motor. This equation is used as a constraint in the trajectory solver.

These changes are reflected in the code. There is a new class with a way too long name called `CartPoleStepperMotorSystem`

with updated methods, including a new method called `constrain_states`

which returns the output torque of the motor given a state and control. As always, the code is available on GitHub.

```
...
def constraint_states(self, state: np.ndarray, control: np.ndarray) -> np.ndarray:
d_s = state[1]
dd_s = control[0]
thetas = state[2::2]
d_thetas = state[3::2]
sum1 = sum(self.ms*sin(thetas)*cos(thetas))
sum2 = sum(self.ms*(self.ls/2)*d_thetas**2*sin(thetas))
sum3 = sum((self.u_ps*d_thetas*cos(thetas))/(self.ls/2))
sum4 = sum(self.ms*cos(thetas)**2)
f = 3/7*(self.g*sum1-7/3*(sum2-self.cart.u_c*d_s)-sum3-(sum4-7/3*self.M)*dd_s)
torque = f*self.motor.r
return np.array([torque])
...
```

Using the updated dynamics, we are ready to upgrade our solver using an external library that is already fast and optimized. A popular choice in the industry is CasADi, which is "an open-source software framework for numerical optimization" [2]. CasADi can be written in C++, Matlab/Octave and Python and is quite complex. To address this issue, CasADi released Opti which simplifies the syntax of creating and solving nonlinear programs, which is what we'll use since it provides a gentler learning curve.

The code we'll write is specifically made for the `CartPoleStepperMotorSystem`

and is not abstracted for any dynamical system, unlike our previous `DirectCollocation`

class. The aptly named `CartPoleStepperMotorDirectCollocation`

class has a constructor that takes in the number of discrete points of the trajectory to output (`N`

), the number of collocation points that will be used for the spline (`N_collocation`

), the system and the tolerance of the optimizer. The constructor then calls the method `calculate_jacobian()`

, which constructs the jacobian of the dynamics, which will be discussed later in the part about linearization. The class also has its own methods for calculating the dynamics and constraints.

```
import numpy as np
from scipy.interpolate import CubicSpline #type: ignore
from .cartpolesystem import CartPoleStepperMotorSystem
import casadi as ca
class CartPoleStepperMotorDirectCollocation():
def __init__(self, N: int, N_collocation: int, system: CartPoleStepperMotorSystem, tolerance=1e-6):
self.N = N
self.N_collocation = N_collocation
self.N_states = system.num_states
self.N_controls = system.num_controls
self.tolerance = tolerance
self.system = system
self.calculate_jacobian()
def differentiate(self, x, u):
d_s = x[1]
dd_s = u[0]
next_x = [0 for _ in range(self.system.num_states)]
next_x[0] = d_s
next_x[1] = dd_s
for i in range(self.system.num_poles):
theta = x[2+2*i]
d_theta = x[3+2*i]
u_p = self.system.u_ps[i]
l = self.system.ls[i]
m = self.system.ms[i]
dd_theta = 3/(7*l/2)*(self.system.g*ca.sin(theta)-dd_s*ca.cos(theta)-u_p*d_theta/(m*l/2))
next_x[2+2*i] = d_theta
next_x[3+2*i] = dd_theta
return next_x
def constraint_states(self, x, u):
d_s = x[1]
dd_s = u[0]
sum1 = 0
sum2 = 0
sum3 = 0
sum4 = 0
for j in range(self.system.num_poles):
theta = x[2+2*j]
d_theta = x[3+2*j]
l = self.system.ls[j]
m = self.system.ms[j]
u_p = self.system.u_ps[j]
sum1 += m*ca.sin(theta)*ca.cos(theta)
sum2 += m*l/2*d_theta**2*ca.sin(theta)
sum3 += u_p*d_theta*ca.cos(theta)/(l/2)
sum4 += m*ca.cos(theta)**2
f = 3/7*(self.system.g*sum1-7/3*(sum2-self.system.cart.u_c*d_s)-sum3-(sum4-7/3*self.system.M)*dd_s)
torque = f*self.system.motor.r
return np.array([torque])
```

In the method `make_solver`

, the first thing we do is to create an opti object using `ca.Opti()`

. We then create the state and control variables in each collocation point using `opti.variable()`

whose values will be optimized. Using the initial guess `x0`

and final state `r`

, an initial guess is made, the objective function is set, and equality and inequality constraints are added using methods that will be described later. Using `opti.solver("ipopt")`

, the solver is activated using the algorithm IPOPT, a popular nonlinear optimization algorithm. The final optimized values are then extracted and splines of the state and control trajectories are returned.

```
def make_solver(self, end_time: float, x0, r, x_guess = np.array([]), u_guess = np.array([])):
self.opti = ca.Opti()
self.xs = self.opti.variable(self.N_collocation,self.system.num_states)
self.us = self.opti.variable(self.N_collocation,self.system.num_controls)
self.h = end_time/self.N_collocation
self.end_time = end_time
self.x0 = x0
self.r = r
self.make_guess(x0, r, x_guess, u_guess)
self.set_objective_function()
self.set_eq_constraints()
self.set_ineq_constraints()
self.opti.solver("ipopt")
sol = self.opti.solve()
x_optimal_raw = sol.value(self.xs)
u_optimal_raw = sol.value(self.us)
time_collocation = np.linspace(0, self.end_time, self.N_collocation)
time = np.linspace(0, self.end_time, self.N)
states = np.vstack([
CubicSpline(time_collocation, s_row)(time) for s_row in x_optimal_raw.T
]).T
controls = np.vstack(np.interp(time, time_collocation, u_optimal_raw))
return states, controls
```

The `make_guess`

method works very similarly to the one in the `DirectCollocation`

class. The only difference is that the guesses are automatically set in the method to be the initial values of the variables using `opti.set_initial()`

. In the `set_objective_function`

, the sums of squares objective function of the control signal is added. Here, since this function isn't called for every iteration, we can afford to use for-loops without slowing down the code, since this method is only called once. The objective function is added using `opti.minimize(obj)`

where `obj`

is the objective to minimize.

```
def make_guess(self, x0: np.ndarray, r: np.ndarray, x_guess: np.ndarray, u_guess: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
self.x0 = x0
self.r = r
if x_guess.size == 0 or u_guess.size == 0:
x_guess = np.linspace(x0, r, self.N_collocation)
u_guess = np.zeros((self.N_collocation, self.system.num_controls))
self.x_guess = x_guess
self.u_guess = u_guess
self.opti.set_initial(self.xs, x_guess)
self.opti.set_initial(self.us, u_guess)
return x_guess, u_guess
def set_objective_function(self):
obj = 0
for i in range(self.N_collocation-1):
obj += (self.us[i,0]**2+self.us[i+1,0]**2)*self.h/2
self.opti.minimize(obj)
```

Constraints, both equality and inequality, are added using `opti.subject_to()`

. In the `set_eq_constraints`

, the initial and final state constraints are added. The equality constraints on the angles of the pendulums look a bit strange. The equations make sure that if the difference between the angle and the goal angle is a multiple of 2, it is allowed. Similarly, the `set_ineq_constraints`

adds the dynamic constraints, the lower and upper bounds of the variables, as well as the torque constraint.

```
def set_eq_constraints(self):
self.opti.subject_to(self.xs[0,0] == self.x0[0])
self.opti.subject_to(self.xs[-1,0] == self.r[0])
self.opti.subject_to(self.xs[0,1] == self.x0[1])
self.opti.subject_to(self.xs[-1,1] == self.r[1])
for i in range(self.system.num_poles*2):
if i % 2 == 0:
self.opti.subject_to(ca.arctan2(ca.sin(self.x0[i+2]-self.xs[0,i+2]),ca.cos(self.x0[i+2]-self.xs[0,i+2])) == 0)
self.opti.subject_to(ca.arctan2(ca.sin(self.r[i+2]-self.xs[-1,i+2]),ca.cos(self.r[i+2]-self.xs[-1,i+2])) == 0)
else:
self.opti.subject_to(self.xs[0,i+2] == self.x0[i+2])
self.opti.subject_to(self.xs[-1,i+2] == self.r[i+2])
def set_ineq_constraints(self):
for i in range(self.N_collocation):
s = self.xs[i,0]
last_s = self.xs[i-1,0]
d_s = self.xs[i,1]
last_d_s = self.xs[i-1,1]
dd_s = self.us[i,0]
last_dd_s = self.us[i-1,0]
self.opti.subject_to(self.opti.bounded(self.system.state_lower_bound[0]+self.system.state_margin[0],s,self.system.state_upper_bound[0]-self.system.state_margin[0]))
self.opti.subject_to(self.opti.bounded(self.system.state_lower_bound[1]+self.system.state_margin[1],d_s,self.system.state_upper_bound[1]-self.system.state_margin[1]))
d_x = self.differentiate(self.xs[i,:],self.us[i,:])
last_d_x = self.differentiate(self.xs[i-1,:],self.us[i-1,:])
if i > 0:
self.opti.subject_to((d_s+last_d_s)*self.h/2 == (s-last_s))
self.opti.subject_to((dd_s+last_dd_s)*self.h/2 == (d_s-last_d_s))
for j in range(self.system.num_poles):
theta = self.xs[i,2+2*j]
last_theta = self.xs[i-1,2+2*j]
d_theta = self.xs[i,3+2*j]
last_d_theta = self.xs[i-1,3+2*j]
dd_theta = d_x[3+2*j]
last_dd_theta = last_d_x[3+2*j]
self.opti.subject_to((d_theta+last_d_theta)*self.h/2 == (theta-last_theta))
self.opti.subject_to((dd_theta+last_dd_theta)*self.h/2 == (d_theta-last_d_theta))
constraint_state = self.constraint_states(self.xs[i,:],self.us[i,:])
f = constraint_state[0]
torque = f*self.system.motor.r
self.opti.subject_to(self.opti.bounded(self.system.motor.torque_bounds[0]*(1-self.system.motor.torque_margin),torque,self.system.motor.torque_bounds[1]*(1-self.system.motor.torque_margin))) #type: ignore
```

The previous linearization algorithm used SymPy which is written in pure Python. Since Python is pretty slow, linearizing the dynamics along an entire trajectory can take up to 10 seconds, which is not desirable. To linearize the dynamics, we will use the standard CasADi syntax. We first create symbolic state and control variables (`x_vars`

and `u_vars`

respectively) using `ca.SX.sum(name_of_var)`

. Using `differentiate()`

we can get the dynamic equations (`eqs)`

. We then vertically concrete `x_vars`

and `u_vars`

, as well as the dynamic equations, using `ca.vertcat()`

, which places the elements in a row. The Jacobian (`J`

) can then be calculated using `ca.jacobian(eqs, vars)`

.

```
import casadi as ca
...
def calculate_jacobian(self):
x_vars = []
x_vars.append(ca.SX.sym("s"))
x_vars.append(ca.SX.sym("d_s"))
for i in range(self.system.num_poles):
x_vars.append(ca.SX.sym(f"theta_{i}"))
x_vars.append(ca.SX.sym(f"d_theta_{i}"))
u_vars = [ca.SX.sym("u")]
eqs = self.differentiate(x_vars, u_vars)
vars = x_vars + u_vars
vars = ca.vertcat(*vars)
eqs = ca.vertcat(*eqs)
J = ca.jacobian(eqs, vars)
self.J = J
self.vars = vars
self.eqs = eqs
```

Finally, the `linearize`

method takes in the operating state and control values and substitutes these values in the Jacobian. The substituted Jacobian is then converted to a numeric matrix using `ca.DM()`

and then to NumPy array. The A and B matrices of the linearized system can then be extracted.

```
def linearize(self, x_vals: np.ndarray, u_vals: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
vals = ca.vertcat(*x_vals, *u_vals)
J_eval = ca.substitute(self.J, self.vars, vals)
J_eval_np = np.array(ca.DM(J_eval))
A = np.array(J_eval_np[:,:-1]).astype(np.float64)
B = np.vstack(np.array(J_eval_np[:,-1]).astype(np.float64))
return A, B
```

Using two poles, a simulation time step of 0.005 seconds, 100 collocation points and a swing-up time of 5 seconds, the algorithm took 1.8 seconds to calculate the trajectory and 0.2 seconds to linearize and calculate the feedback gains along the trajectory. The resulting swing-up can be seen below. I'm really satisfied with how fast the algorithm is now. It's not fast enough for model-predictive control, but it is fast enough to calculate trajectories on the go.

Next up, I'll focus on developing a nicer UI and a controller that can be used to send commands to the pendulum live. The goal is for the controller to be able to swing the pendulums to different equilibrium points on the go. This could example be to swing from the up-up position, where the first pendulum is pointing up and the second pendulum is pointing up, to the up-down position. For example, the cover image shows a triple pendulum going from the down-down-down position to the up-up-down position. That's all for now, I hope you enjoyed reading.

-SL

[1] T. Glck, A. Eder, and A. Kugi, "Swing-up control of a triple pendulum on a cart with experimental validation," Automatica, vol. 49, no. 3, pp. 801-808, Elsevier, 2013, doi:10.1016/j.automatica.2012.12.006.

[2] J.A.E. Andersson, J. Gillis, G. Horn, J.B. Rawlings, and M. Diehl, "CasADi - A software framework for nonlinear optimization and optimal control," Mathematical Programming Computation, vol. 11, no. 1, pp. 1-36, Springer, 2019, doi:10.1007/s12532-018-0139-4.

]]>The old code consisted of a single `CartPoleSystem`

class that simulated the system. On top of that, there was a `CartPoleEnv`

class that added rendering and rewards to make it compatible with reinforcement learning. The `differentiate`

method that was used in the explicit integration was messy and did not use NumPy and was therefore quite slow. To fix this, the code has been rewritten using NumPy and the structure has been overhauled. Now, there's a class called `CartPoleDCMotorSystem`

that is instantiated with a `Cart`

, `DCMotor`

and a list of `Pole`

objects. This system class doesn't contain any internal state and is not used for simulation. That is where the `CartPoleEnv`

class comes in which has been overhauled to be used for simulating the system. The differentiate method has been changed to use NumPy which can be seen below. As always, the full code is available in the repository on GitHub.

```
import numpy as np
...
def differentiate(self, state: np.ndarray, control: np.ndarray) -> np.ndarray:
# x = state[0]
d_x = state[1]
Va = control[-1]
thetas = state[2::2]
d_thetas = state[3::2]
sum1 = (self.ms*sin(thetas)*cos(thetas)).sum()
sum2 = (self.ms*(self.ls/2)*d_thetas**2*sin(thetas)).sum()
sum3 = ((self.u_ps*d_thetas*cos(thetas))/(self.ls/2)).sum()
sum4 = (self.ms*cos(thetas)**2).sum()
Ia = (Va-self.motor.Ke*d_x/self.motor.r)/self.motor.Ra
h2 = (self.g*sum1-7/3*(1/self.motor.r**2*(self.motor.Kt*self.motor.r*Ia-self.motor.Bm*d_x)-sum2-self.cart.u_c*d_x)-sum3)
g2 = (sum4-7/3*(self.M+self.motor.Jm/self.motor.r**2))
dd_x = h2/g2
dd_thetas = 3/(7*self.ls/2)*(self.g*sin(thetas)-dd_x*cos(thetas)-self.u_ps*d_thetas/(self.ms*self.ls/2))
d_state = np.concatenate((np.array([d_x, dd_x]), np.column_stack((d_thetas, dd_thetas)).flatten()))
return d_state
...
```

In a previous article on balancing the Cart-Pole using LQR, I omitted to show the linearization code as it's very long and not very good. I simply used the state space equations and calculated the Jacobian by hand and put that into code. The problem is that if the kinematic equations were to change, then I would have to remake those equations. Instead of doing that, there are ways of linearizing the dynamics using code.

Two different methods are either using symbolic mathematics or numerical computing. In numerical computing, the answer is numerically approximated whilst symbolic mathematics can find the precise equation and the variables can later be substituted in, giving you an answer. I chose to use symbolic mathematics using SymPy, which gives a highly precise answer, but it would later turn out that this method is quite slow which numerical computing fixes, but for now we will go forward using Sympy.

To linearize the system using SymPy, we will first have to define the state and control variables using SymPy syntax. This is done using the `symbols`

function. In the system class, a method `get_sp_vars`

is called in the constructor to generate the system variables. Afterward, a method called `get_sp_equations`

is used to calculate the state space equations of the system using the same logic as the `differentiate`

method, only using SymPy variables instead of NumPy arrays.

```
import sympy as sp
...
def get_sp_vars(self):
variables = "x dx " + " ".join([f"theta{i} d_theta{i}" for i in range(self.num_poles)]) + " Va"
self.vars = sp.symbols(variables)
return self.vars
def get_sp_equations(self):
# x = self.vars[0]
d_x = self.vars[1]
Va = self.vars[-1]
thetas = self.vars[2::2]
d_thetas = self.vars[3::2]
f1 = d_x
sp_sum1 = sum(m*sp.sin(theta)*sp.cos(theta) for m, theta in zip(self.ms, thetas))
sp_sum2 = sum(m*(l/2)*d_theta**2*sp.sin(theta) for m, l, d_theta, theta in zip(self.ms, self.ls, d_thetas, thetas))
sp_sum3 = sum(u_p*d_theta*sp.cos(theta)/(l/2) for u_p, d_theta, theta, l in zip(self.u_ps, d_thetas, thetas, self.ls))
sp_sum4 = sum(m*sp.cos(theta)**2 for m, theta in zip(self.ms, thetas))
h2 = (self.g*sp_sum1-7/3*(1/self.motor.r**2*(self.motor.Kt/self.motor.Ra*(Va*self.motor.r-self.motor.Ke*d_x)-self.motor.Bm*d_x)-sp_sum2-self.cart.u_c*d_x)-sp_sum3)
g2 = (sp_sum4-7/3*(self.M+self.motor.Jm/self.motor.r**2))
dd_x = h2/g2
f2 = dd_x
f_d_thetas = d_thetas
dd_thetas = [3/(7*l/2)*(self.g*sp.sin(theta)-dd_x*sp.cos(theta)-u_p*d_theta/(m*l/2)) for l, theta, u_p, m, d_theta in zip(self.ls, thetas, self.u_ps, self.ms, d_thetas)]
f_dd_thetas = dd_thetas
self.eqs = [f1, f2] + [val for pair in zip(f_d_thetas, f_dd_thetas) for val in pair] #type: ignore
return self.eqs
...
```

To linearize the dynamics at an operating point, the Jacobian matrix of the dynamics needs to be calculated. This is easily done by converting the dynamic equations into a SymPy matrix and by using the `jacobian`

function. The Jacobian is calculated in the `get_sp_jacobian`

method. In the `linearize`

method, the Jacobian matrix `F`

is substituted with an operating point which results in an A and B matrix.

```
...
def get_sp_jacobian(self):
self.F = sp.Matrix(self.eqs).jacobian(self.vars)
return self.F
def linearize(self, state0: np.ndarray, control0: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
operating_point = {var: value for var, value in zip(self.vars, np.concatenate((state0, control0)))}
F_evaluated = self.F.subs(operating_point)
A = np.array(F_evaluated[:,:-1]).astype(np.float64)
B = np.array(F_evaluated[:,-1]).astype(np.float64)
return A, B
...
```

In the previous article on trajectory optimization, the feedforward control is calculated using direct collocation. The feedforward signal however uses another integration method that is not explicit RK4 which is used in the simulation, which leads to integration errors. The integration errors, as well as process and measurement noise, make it impossible for the feedforward control to swing up the pendulums alone. To account for noise along the trajectory, an additional feedback LQR controller is added that tries to minimize the trajectory error. The measured state is assumed to be the full state in this setup, but this will hopefully be expanded upon later using a Kalman filter.

The LQR scheme that was developed in the last article was a discrete infinite-horizon LQR. This will however not work for trajectory stabilization. Instead, what we need now is a discrete finite-horizon LQR, which can be calculated similarly. Firstly, the system needs to be linearized along the desired trajectory that was calculated by the direct collocation algorithm. By using NumPy's `vectorize`

, the dynamics are linearized using each row in the desired states and control as the operating points, resulting in a 3-dimensional matrix of A and B matrices for each time step in the trajectory.

```
dt = 0.01 # Simulation time step
end_time = 4 # Duration of the trajectory
dt_collocation = 0.05 # Time step of the collocation
N_collocation = int(end_time/dt_collocation)+1 # Number of collocation points
N = int(end_time/dt) # Number of simulation points
# Instantiate collocation solver
direct_collocation = DirectCollocation(
N_collocation,
system.differentiate,
system.state_lower_bound.shape[0],
system.control_lower_bound.shape[0],
system.constraints,
system.calculate_error,
0.001
)
initial_state = np.array([0, 0] + [radians(180), 0]*n)
reference_state = np.array([0, 0] + [0, 0]*n)
# Find a trajectory
states, controls = direct_collocation.make_controller(end_time, initial_state, reference_state, N)
# Linearize dynamics along the trajectory
As, Bs = np.vectorize(system.linearize, signature='(n),(m)->(n,n),(n,m)')(states, controls)
```

To perform trajectory stabilization using LQR, one first needs to be familiar with the Riccati equations. In our case, this is the Discrete Algebraic Riccati Equation (DARE). I have split this into two equations (1.1 and 1.2), where *P* is the unique positive definite solution to the DARE, and *K* is the discrete infinite-horizon LQR gain which is used for balancing the poles.

To calculate the LQR gains along the trajectory, the infinite-horizon solution won't do. Instead, we will calculate the discrete finite-horizon gains for every A and B matrix. To do this we use the dynamic Riccati equation. The LQR gains are calculated backward in time using 1.3 and 1.4.

To implement this, the `FSFB`

and `LQR`

classes have been rewritten to be more flexible. The class no longer has any members and the A and B matrices are provided in the methods instead. The method `calculate_finite_K_ds`

takes in the A and B matrices along with the Q and R matrices. It first calculates the last *P* and *K* using DARE using the last state when the system should be at an equilibrium point. The gains are later calculated backward using the dynamic Riccati equation.

```
import numpy as np
from scipy.signal import cont2discrete
from scipy.linalg import solve_continuous_are, solve_discrete_are
class FSFB:
def __init__(self, dt: float):
self.dt = dt
def discretize(self, A: np.ndarray, B: np.ndarray, C: np.ndarray, D: np.ndarray):
dlti = cont2discrete((A,B,C,D),self.dt)
A_d = np.array(dlti[0])
B_d = np.array(dlti[1])
return A_d, B_d
class LQR(FSFB):
# Calculates the continuous infinite-horizon LQR gain
def calculate_K(self, A: np.ndarray, B: np.ndarray, Q: np.ndarray, R: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.inv(R) @ B.T @ P
return P, K
# Calculates the discrete infinite-horizon LQR gain
def calculate_K_d(self, A_d: np.ndarray, B_d: np.ndarray, Q: np.ndarray, R: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
P_d = solve_discrete_are(A_d, B_d, Q, R)
K_d = np.linalg.inv(R + B_d.T @ P_d @ B_d) @ (B_d.T @ P_d @ A_d)
return P_d, K_d
# Calculates the discrete finite-horizon LQR gains along a trajectory
def calculate_finite_K_ds(self, A_ds: np.ndarray, B_ds: np.ndarray, Q: np.ndarray, R: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
P_d_last, K_d_last = self.calculate_K_d(A_ds[-1], B_ds[-1], Q, R)
K_ds = np.zeros((A_ds.shape[0], K_d_last.shape[0], K_d_last.shape[1]))
K_ds[-1] = K_d_last
P_ds = np.zeros((A_ds.shape[0], P_d_last.shape[0], P_d_last.shape[1]))
P_ds[-1] = P_d_last
for k in range(A_ds.shape[0]-2, -1, -1):
K_ds[k] = np.linalg.inv(R + B_ds[k].T @ P_ds[k+1] @ B_ds[k]) @ (B_ds[k].T @ P_ds[k+1] @ A_ds[k])
P_ds[k] = A_ds[k].T @ P_ds[k+1] @ A_ds[k] - (A_ds[k].T @ P_ds[k+1] @ B_ds[k]) @ K_ds[k] + Q
return P_ds, K_ds
# Calculates the feedback control
def feedback(self, K: np.ndarray, error: np.ndarray) -> np.ndarray:
u = error @ K.T
return u
```

Using the aforementioned control scheme, a trajectory was generated and the feedback gains were calculated. The system was simulated with a time step of 0.01 seconds and two poles.

And it works very well! Granted, this is simulated without any process or measurement noise, but it is looking very promising so far! Below the position of the cart and the input voltage are graphed throughout the swing-up maneuver.

Next up is to optimize the speed of the code as the trajectory solver and the linearization steps are pretty slow at the moment. For the trajectory solver, SciPy will be switched out to GEKKO. For the linearization, a numerical solution will hopefully be used instead.

I am currently developing the physical system with a friend and we are switching out the DC motor for a stepper motor as the balancing act needs high precision. Later on, process and measurement noise will be added to the simulation along with an observer using a Kalman filter. But that's it for now, thank you for reading!

Heads up, this blog post will be less professional and a bit more personal. For the past two months, I've been neck-deep in working on my Bachelor's thesis (which I will hopefully get to write about soon). At the same time, I've been working on a small electric vehicle, or EV for short. It is the first vehicle I've ever built, and it has been as fun as it has been stressful. Keep in mind, the pictures that I took for the documentation were lost to my old phone. The pictures in this article were all taken after the EV was completed.

Besides being a fun project, it also has a fun purpose. I'm a student at Chalmers University of Technology in Gothenburg, and every year on the 30th of April we celebrate Valborg in Sweden. On this day, since 1909, Chalmers students have been putting on a parade in Gothenburg called the Chalmers Cortge. Students have 10 days to build theatrical displays called equipages (Swedish: *ekipage*) that are put on top of large trucks and driven through the city. Besides equipages, students also build theatrical numbers that are not put on trucks. These vehicles are called single numbers (Swedish: *lsnummer*). Most of them are gas driven but an increasing number are becoming electric.

I am a board member of Chalmers Autonomous Systems and Electronics Association (CASE Association for short) and we love building EVs. The association is quite young so last year's Cortge was the first we could participate in. Last year we drove three single numbers, one of which broke down. This year, we wanted to go BIG. So, four of the board members (including me) plus two previous board members decided to build a vehicle each.

Back in September of 2022, my friend and fellow CASE board member suggested the idea of buying a pedal car for kids. We ended up buying one each and started to make plans for electrification. In my possession, I had a brushed DC motor for electric bicycles, a mount for the motor and two wheels mounted on an axle that I had received from my friend. The axle and DC motor had gears in place for a chain drive. This sealed the fate of my EV to use a chain drive, which would come to haunt me for the rest of the month.

The pedal car was small and in rough shape when I bought it. Noticeably, the seat was quite low and uncomfortable to sit in. I also needed a space to place the electronics and batteries in, as well as a platform to attach the motor mount. Looking through the metal scraps in the CASE lab like the little rat I am, I found a rusty and poorly welded steel box that was perfectly suited for my needs. Using a bench drill press, I drilled six holes into the bottom plate of the box, through which I could fit M6 screws. These screws, along with M6 nyloc nuts, were used to fasten the steel box onto the pedal car. I also drilled a couple of holes on the top side of the box to fasten the seat.

Another big flaw with the original pedal car was the wheels. The small plastic wheels did not provide enough grip force, couldn't handle off-road travel, and were super ugly. Luckily, I received a wheel axle along with two wheels from my friend. The axle included a chain drive gear and was mounted on the back of the car. To replace the steering wheels, I bought two rubber wheels with an inner diameter of 200 mm. However, the old steering wheels had an inner diameter of 100 mm. To attach the new wheels, I procured an aluminium rod with a diameter of 200 mm to create a spacer. Using a bench drill press, I drilled a hole along the axis of the rod's length with a diameter of 100 mm and a depth of 60 mm, which was deep enough to fit the steering axle. On the other end of the hole, across the rod, I drilled a 5.5 mm hole to fit an M5 screw to fasten the wheel onto the spacer. Two of these spacers were constructed, one for each wheel.

Finally, the motor mount made of a steel plate needed to be attached. The goal with the motor mount was to be able to tension the chain and to have it as close to the wheel axle as possible, to minimize slackness in the chain. To accomplish this, an aluminium profile was attached using M4 screws underneath the steel box. Holes with a diameter of 4.5 mm were drilled into the motor mount so that it could be fastened to the profile using M4 screws and T-nuts. By doing this, the motor mount can be slid across the length of the profile to tension the chain. When the chain is sufficiently tensioned, the T-nuts can be fastened.

The motor that was used was a brushed DC motor made for electric bicycles (MY1016Z2). The DC motor has a max voltage of 24 V and a max power of 250 W. To power the motor, I used a 7S (25.9 V) LiPo battery. The battery was quite small and had a low charge, but it was convenient and I didn't bother buying a new one. The motor and the battery both used XT90 connectors which I soldered on. For safety reasons, I added a toggle switch that could break the electrical connection to the battery.

To control the speed and direction of the motor, a simple H bridge that can handle the high power requirement will suffice. The H bridge came in the form of a brushed electric speed controller (ESC) with an input voltage of 12-35 V and a max current of 120 A. The ESC has a cable that can power electronics using 5 V. The speed and direction of the motor can be controlled using a PWM signal. Sending a signal with a pulse width of 1500 s makes the motor stay still. Increasing the pulse width will increase the angular velocity until the maximum pulse width of 2200 s. Decreasing the pulse width will instead increase the angular velocity in the reverse direction until reaching the minimum pulse width of 800 s.

Interestingly, when the ESC is in neutral mode at 1500 s the controller actively tries to decelerate the motor to a complete standstill. This causes the motor to brake way too fast which ends up making the chain pop off. To decelerate without destroying the chain, instead of going directly to 1500 s, the ESC is given a signal of 1600 s. This makes the car decelerate normally. Naturally, if the car decelerates when going in reverse a signal of 1400 s is sent, using the same offset of 100 s.

To drive the car, the driver needs a way to control the EV that is robust and reliable. For accelerating, I took inspiration from real cars and bought a cheap gas pedal from Amazon. The gas pedal uses a Hall effect sensor that can be read as an analog signal by a microcontroller. The analog signal is supposed to be in the range of 0-5 V, but after testing with a multimeter, the real range is around 0.9-4.2 V.

To steer the EV, I found a race car steering wheel (bremme, https://www.thingiverse.com/thing:4058270) that suited well for my purposes. I designed a connector for the steering wheel that fits the previous steering wheel's shaft. The steering wheel has several installation holes for push buttons. The EV ended up having two push buttons, one for the parking brake and one for driving in reverse.

To get user input and control the ESC, I used an Arduino Nano microcontroller which was connected to the other electronic components with a breadboard. The microcontroller receives power from the ESC and has connections for the gas pedal, the two push buttons and to control the motor.

When driving the EV, two states can be toggled on and off: the parking brake and the reverse gear. If the parking brake is engaged, the ESC is put in brake mode and engaging the gas pedal won't have any effect. Naturally, if the reverse gear is engaged, accelerating makes the car go backward instead. To toggle these states, the associated push button for each respective state can be pressed. In the code, this is implemented using boolean variables for the states. A state is toggled when its associated button is pressed and it wasn't pressed in the last loop iteration.

To drive the EV forward and backward, the strength of the signal from the gas pedal is read using Arduino's `analogRead()`

, which is a 10-bit value between 0 and 1023. This value is then mapped from the voltage range that was found earlier (converted to 10-bit numbers) to the pulse width of the signal that is sent to the ESC. Since it is assumed that the car is accelerating, the signal is mapped to a pulse width between 1600 s (neutral pulse width plus the deceleration offset) and 2200 s. Afterward, the algorithm checks if the EV is in reverse gear mode. If so, the pulse width is reversed to the range of 800-1400 s.

To accelerate and decelerate the EV without popping off the chain, the desired pulse width, or throttle, is not sent directly to the ESC. Before the car accelerates, the state of the parking brake is checked. If the parking brake is enabled, the neutral signal of 1500 s is sent to the ESC. Otherwise, if the current throttle of the ESC is lower than the desired throttle, the current throttle is increased by a constant value of 10 s. If the current throttle is higher than the desired, the current throttle is decreased by 10 s. If the absolute difference between the current and the desired throttles is less than 10 s, then the current throttle is set to be equal to the desired throttle. After updating, the new current throttle is sent to the ESC using Arduino's Servo library. This is done for every loop cycle and ensures that the EV doesn't accelerate or decelerate too quickly.

```
#include <Arduino.h>
#include <Servo.h>
#define ESC_PIN 9 // ESC signal pin
#define PEDAL_PIN A0 // potentiometer signal pin
#define BRAKE_PIN 3 // parking brake signal pin
#define REVERSE_PIN 4 // reverse gear signal pin
const int MIN_PEDAL = 190; // set minimum pedal value
const int MAX_PEDAL = 860; // set maximum pedal value
Servo esc; // create a Servo object for the ESC
const int PWM_DUTY_MIN = 800; // set PWM duty cycle for full reverse (in microseconds)
const int PWM_DUTY_MAX = 2200; // set PWM duty cycle for full forward (in microseconds)
const int PWM_DUTY_NEUTRAL = 1500; // set PWM duty cycle for neutral/stop (in microseconds)
const int PWM_DUTY_NEUTRAL_OFFSET = 100; // offset for neutral position when parking brake is off
void setup() {
// set pin modes
pinMode(PEDAL_PIN, INPUT);
pinMode(BRAKE_PIN, INPUT);
pinMode(REVERSE_PIN, INPUT);
// attach ESC and put it in neutral position
esc.attach(ESC_PIN); // attach ESC signal to pin 9
esc.writeMicroseconds(PWM_DUTY_NEUTRAL); // set to neutral/stop position
}
// Throttle
int pedal_value;
int desired_throttle = PWM_DUTY_NEUTRAL;
int current_throttle = PWM_DUTY_NEUTRAL;
const int THROTTLE_STEP = 10;
// Reverse gear state
bool last_pressing_reverse = false;
bool is_pressing_reverse = false;
bool reverse = true;
// Parking brake state
bool last_pressing_brake = false;
bool is_pressing_brake = false;
bool brake = true;
void loop() {
// Read the pedal and push buttons
pedal_value = analogRead(PEDAL_PIN);
is_pressing_reverse = digitalRead(REVERSE_PIN);
is_pressing_brake = digitalRead(BRAKE_PIN);
// Update the reverse gear state
if (is_pressing_reverse && !last_pressing_reverse) {
reverse = !reverse;
}
last_pressing_reverse = is_pressing_reverse;
// Update the parking brake state
if (is_pressing_brake && !last_pressing_brake) {
brake = !brake;
}
last_pressing_brake = is_pressing_brake;
// Map the pedal value to the desired throttle and constrain it
desired_throttle = map(pedal_value, MIN_PEDAL, MAX_PEDAL, PWM_DUTY_NEUTRAL+PWM_DUTY_NEUTRAL_OFFSET, PWM_DUTY_MAX);
desired_throttle = constrain(desired_throttle, PWM_DUTY_NEUTRAL+PWM_DUTY_NEUTRAL_OFFSET, PWM_DUTY_MAX);
// Reverse throttle if reverse enabled
if (reverse) {
desired_throttle = PWM_DUTY_NEUTRAL - (desired_throttle - PWM_DUTY_NEUTRAL);
}
// Set current throttle
if (brake) {
// If the parking brake is enabled, brake the motor
current_throttle = PWM_DUTY_NEUTRAL;
} else {
// Else, increase/decrease the throttle to the desired throttle
if (abs(current_throttle-desired_throttle) < THROTTLE_STEP) {
current_throttle = desired_throttle;
} else if (current_throttle < desired_throttle) {
current_throttle += THROTTLE_STEP;
} else if (current_throttle > desired_throttle) {
current_throttle -= THROTTLE_STEP;
}
}
// Write throttle to ESC
esc.writeMicroseconds(current_throttle);
delay(50);
}
```

On the same day as the parade, some last-minute changes were added. This was done as I felt like the EV wasn't cool or unique enough in comparison with the other vehicles. Therefore, I added two big 32x8 RGB LED pixel matrices to the front of the vehicle. This was done hastily by connecting the matrices and an ESP32 microcontroller to the 5 V power on the breadboard. One of the digital pins of the ESP32 was connected to the data in-pins on one of the matrices. This matrix then connected its data out-pin to the data in-pin on the other matrix. To program the matrices, the ESP32 was booted with a WLED web server (Schwinne, https://kno.wled.ge/), which a friend recommended to me. This is a super easy-to-use library that can be used to control LED strips and matrices from your phone or computer.

Finally, the time of reckoning arrived and issues started appearing. The chain loosened and popped off as usual. Luckily, a friend spotted a loose screw that was the root of the problem. The 2-hour drive went surprisingly well. The chain stayed on, the mechanical brake which I had never used worked flawlessly and I didn't hit any kids! Unfortunately, the LED matrix on the front of the car had trouble connecting to my phone, so the only color that worked was the default orange. The small LiPo battery that I used eventually died. This was expected however since the battery had a low charge so my friend had brought a rope along. The rope was used to tow me for the trip back to base. Overall, the Cortge was a success.

(The money shot. Photo: Mns Lundberg, zfoto.ztek.se)

As I mentioned earlier, the chain drive wasn't the most optimal choice. Throughout the whole building process, the chain kept popping off the gears. I also had trouble joining chain links together as this caused them to be stiffer which affects the overall performance of the chain. Tensioning the chain also proved to be difficult. The way the chain could be tensioned using the motor mount did work, but it required 1-2 people to hold the parts while another fastened the screws. The steel box and wheel axle tended to flex while driving which caused the chain to loosen during extended drives.

If there's anything I've learned it is to use good motors when designing vehicles. Brushed DC motors are fine, but there are better alternatives. A popular choice is to use hub motors that are already integrated into the wheels. These have high torque and don't require a belt or chain to be driven. Steering a vehicle with hub motors is a bit different as differential drive is often used, which can require more skillful turning. They are also usually not brushed DC motors and therefore require a more complex and expensive motor controller. However, the benefits are that the development time is significantly decreased since you don't have to worry about developing a chain or belt drive.

Lastly, next time I will focus on building a vehicle with a fun and wacky concept. The car I chose to modify was a simple pedal car but I think it would have been way more fun to create, for instance, a drivable toilet or couch or something. Unfortunately, that will have to wait until next year.

]]>We've discovered the surprising power of the Linear Quadratic Regulator (LQR), but it also has its limits. LQR has a basin of attraction around the point we've linearized the dynamics around. If the cart-pole is in a state outside of this basin, the system becomes unstable and the controller goes mad. To save our pendulum from these horrible nonlinear dynamics, we need to swing it back to our stable basin using some sort of swing-up algorithm. That's where trajectory optimization comes into the picture.

I came across trajectory optimization after reading Daniel Piedrahita's article about cart-pole control. To learn more about this topic and more I've been watching Dr. Russ Tedrake's lectures from the course MIT 6.832 - Underactuated Robotics. I've also watched Dr. Matthew Kelly's introductory video to trajectory optimization and read his paper on how to implement direct collocation. All of these sources are fantastic and I highly recommend you take a look at them!

LQR is a closed-loop control algorithm that spans the entirety of state space. Put simply, anywhere you are, the controller can manage to get you back to a stable fixed point. However, this only works for linear dynamics which require linearization. Trajectory optimization is an open-loop solution to control which can be useful for specific tasks, such as creating a swing-up trajectory. Instead of having a solution for the entire state space, trajectory optimization only creates a trajectory that we need to follow. After we're back in our basin of attraction, we can switch on LQR again!

Trajectory optimization, as the name suggests, uses optimization to minimize the cost of control for a specific trajectory. An optimization problem works by trying to minimize the value of a function called the objective function. This function is dependent on decision variables that need to fulfill some constraints. The decision variables for these types of problems are often the state and control signals, *x* and *u*, and how long the trajectory takes to finish. Using notation from Matthew Kelly, trajectory optimization can be described in what's known as the Lagrange form:

The objective function (1.1) tries to minimize *w* which is the integrand of the trajectory. This problem is subject to the system dynamics (1.2) but also constraints along the path (1.3). This could for example be that a car avoids pedestrians on its way to the goal. There are also boundary constraints (1.4) that put constraints on the initial and final states. The state and control signal also have bounds (1.5)(1.6), for instance, max speeds, voltages, etc. Finally, the initial and final states and time have some limits (1.7)(1.8)(1.9).

One trajectory optimization algorithm is called direct collocation. The direct part means that the problem is discretized, often into what's called a nonlinear program, or NLP, which is what we will do. Converting a trajectory optimization problem into an NLP problem is known as transcription, and direct collocation falls under the umbrella of so-called direct transcription methods.

Nonlinear programs can be generalized as (1.10) an objective function *J* of some decision variables *z* with some equality constraints *f*, inequality constraints *g* and bounds on *z*. The dynamics constraint (1.2) is an equality constraint while the others are inequality constraints. An easy objective function to implement is the integral of the control signal squared (1.11). We can do this if we regard the control signal as a scalar function. This makes sense for the cart-pole where the control is the motor voltage. This objective function, therefore, tries to create a smooth trajectory with a minimized control input.

We still haven't discretized our NLP. To do this, there are again several options to choose from. An easy method to implement is called trapezoidal collocation. We begin by discretizing the decision variables into values at discrete points in time. These are called collocation points and there can be an arbitrary *N* of them which are separated in time by the time step *h*. Here *f* denotes the dynamics.

I won't delve into the proof that trapezoidal collocation accurately approximates the continuous dynamics. Instead, the discretized objective function has been provided below (1.12) as well as the discretized equality constraint on the dynamics (1.13).

If we add in the constraints on the initial and final state as well as their bounds and put this into a solver, this should produce a solution! All that's missing is an initial guess of the trajectory for the solver to optimize. An easy guess to make is just to interpolate from the initial state to the final state with the control signal being zero, and this dumb guess is often enough for the solver to come up with an actual solution!

After your solver has come up with a solution for the optimization problem, it spits out a sequence of length *N* of the optimal state and control signal. These *N* collocation points need to be converted back into a continuous function. The control signal can be easily converted using piecewise linear interpolation. Mathematically, the functions between the points can be constructed using the equation below (1.14).

The state is a bit harder as it can be converted using quadratic interpolation. Instead of connecting the collocation points with lines, points are connected by quadratic functions. Interpolation using low-degree polynomial functions is called spline interpolation. These functions can be constructed using the equation below (1.15).

Now we have all the tools we need to start coding! We will create a class that takes in the dynamics, constraints and the number of collocation points `N`

. The libraries used are `numpy`

and parts of `scipy`

for the solver `minimize`

and the cubic interpolator `CubicSpline`

.

```
import numpy as np
from scipy.optimize import minimize
from scipy.interpolate import CubicSpline
class DirectCollocation():
def __init__(self, N: int, dynamics, N_states: int, N_controls: int, state_lower_bound, state_upper_bound, control_lower_bound, control_upper_bound, tolerance=1e-6):
self.N = N
self.dynamics = dynamics
self.N_states = N_states
self.N_controls = N_controls
self.state_lower_bound = state_lower_bound.T[0]
self.state_upper_bound = state_upper_bound.T[0]
self.control_lower_bound = control_lower_bound.T[0]
self.control_upper_bound = control_upper_bound.T[0]
self.tolerance = tolerance
```

The constraints consist of the lower and upper bounds of the state and control variables. These are flipped to row vectors for easy manipulation. Besides those parameters, there's also the number of state variables `N_states`

, the number of control variables `N_controls`

and the `tolerance`

. `N_states`

and `N_controls`

are useful for calculations. The `tolerance`

specifies how accurate the solver should be; A lower `tolerance`

means a high accuracy.

Next up is the method `make_guess`

which makes the initial guess. This takes in the `initial_state`

and the desired `final_state`

and makes a linear interpolation between those two. The initial guess of the control is zero through the entire trajectory.

```
def make_guess(self, initial_state, final_state):
self.initial_state = initial_state.T[0]
self.final_state = final_state.T[0]
self.initial_variables = np.hstack([
np.hstack([np.linspace(initial, final, self.N) for initial, final in zip(self.initial_state, self.final_state)]),
np.zeros(self.N_controls*self.N),
])
return self.initial_variables
def variables_to_state_control(self, variables):
state = variables[:self.N_states*self.N]
control = variables[self.N_states*self.N:(self.N_states+self.N_controls)*self.N]
return state, control
```

I also threw in the method `variables_to_state_control`

since this is important when it comes to using a solver. The solver can only use data in the form of a row vector, so all of the decision variables (state and control) need to be packed into this single vector. The initial guess, `self.initial_variables`

, is created in this way and `variables_to_state_controls`

takes in the decision variables from the solver and unpacks them into the state and control.

Up next is the `objective_function`

. As the math (1.11) implies, this spits out the quadratic cost of the control signal which the solver tries to minimize.

```
def objective_function(self, variables):
state, control = self.variables_to_state_control(variables)
result = 0
for i in range(self.N_controls):
for k in range(self.N-1):
result += (self.h/2)*(control[self.N*i+k]**2+control[self.N*i+k+1]**2)
return result
```

The constraints can be categorized as either equality constraints or inequality constraints. In code, these are packed up similarly to the decision variables as row vectors. For the equality constraints, the solver ensures that the elements of this vector are always zero. The equality constraints are created in the method `eq_constraints`

. These consist of the constraints on the initial and final state and the dynamic constraints along the trajectory.

```
def eq_constraints(self, variables):
state, control = self.variables_to_state_control(variables)
constraints = []
for k in range(self.N_states):
constraints.append(state[self.N*k]-self.initial_state[k])
constraints.append(state[self.N*k+self.N-1]-self.final_state[k])
state_current = state[0::self.N]
control_current = control[0::self.N]
dynamics_current = self.dynamics(np.vstack(state_current), np.vstack(control_current)).T[0]
for k in range(self.N-1):
state_next = state[k+1::self.N]
control_next = control[k+1::self.N]
dynamics_next = self.dynamics(np.vstack(state_next), np.vstack(control_next)).T[0]
dynamic_constraint = state_next-state_current-self.h/2*(dynamics_next+dynamics_current)
constraints.extend(list(dynamic_constraint))
state_current = state_next
control_current = control_next
dynamics_current = dynamics_next
return constraints
```

The inequality constraints are created in the method `ineq_constraints`

. The solver ensures that the elements of this vector are never zero. For now, these are the upper and lower bounds of the variables.

```
def ineq_constraints(self, variables):
state, control = self.variables_to_state_control(variables)
constraints = []
for k in range(self.N):
current_state = state[k::self.N]
current_control = control[k::self.N]
constraints.extend(list(np.array(current_state)-self.state_lower_bound))
constraints.extend(list(self.state_upper_bound-np.array(current_state)))
constraints.extend(list(np.array(current_control)-self.control_lower_bound))
constraints.extend(list(self.control_upper_bound-np.array(current_control)))
return constraints
```

Finally, the optimal control is solved using `make_controller`

. This method takes in the duration of the trajectory, the number of points to interpolate `N_spline`

, the `initial_state`

and the `desired final_state`

. It first creates an initial guess of the trajectory. The constraints are packed into a dictionary that the solver can process. The solver itself uses Sequential Least Squares Programming (SLSQP) and takes in the objective function, initial guess, constraints and tolerance and spits out a solution of both the state and control signals. If `N_spline`

is defined, then the state and control are splined and interpolated respectively.

```
def make_controller(self, time, initial_state, final_state, N_spline=None):
self.time = time
self.N_spline = N_spline
self.h = (self.time)/self.N
self.make_guess(initial_state, final_state)
constraints = [
{"type": "eq", "fun": self.eq_constraints},
{"type": "ineq", "fun": self.ineq_constraints},
]
sol = minimize(
fun=self.objective_function,
x0=self.initial_variables,
method="SLSQP",
constraints=constraints,
tol=self.tolerance
)
state, control = self.variables_to_state_control(sol.x)
N_time = np.linspace(0, time, self.N)
if N_spline:
N_spline_time = np.linspace(0, time, N_spline)
state_spline = np.vstack([
CubicSpline(N_time, s_row)(N_spline_time) for s_row in state.reshape((self.N_states, self.N))
])
control_spline = np.vstack([
np.interp(N_spline_time, N_time, c_row) for c_row in control.reshape((self.N_controls, self.N))
])
else:
state_spline, control_spline = np.vstack(state), np.vstack(control)
return state_spline, control_spline
```

Using our code for direct collocation, we can now plan a swing-up trajectory for a single pendulum. We then use this control signal and when the pendulum is close to the upper fixed point, we switch on our LQR controller. This was simulated using 41 collocation points and a trajectory duration of 1 second. And... go

Let's try it with two pendulums! To make our LQR way more stable, we'll make the outermost pendulum way lighter than the other one. Let's set the duration to 2 seconds and use 61 collocation points.

So, as you can see above, the trajectory planner doesn't always work out, but why is that? Well, the devil's in the details.

The integrator of the direct collocation algorithm is slightly different from the Rk4. The small differences in the dynamics, and that there isn't a collocation point for every time step, paired with the chaotic nature of the double pendulum make the trajectory unstable.

Another issue is the calculation time. The single pendulum took 17 seconds to calculate. The double pendulum however took a whopping 7.5 minutes!

There are ways of fixing this. For the stability issues, we can use a time-varying LQR controller that stabilizes the dynamics along the trajectory. We can also use Model-Predictive Control, or MPC, which calculates a new trajectory for every time step to compensate for the inaccuracies. This however requires a fast calculation time which can be accomplished by using a faster solver and/or programming language. But for now, this is promising!

I'll try to look into Kalman filters again. It would also be interesting to find a better solver and make a faster trajectory planner so that MPC can be implemented. Making a time-varying LQR controller would also be cool!

Regarding the physical cart-pole, the motor driver I have wasn't powerful enough so I've ordered a new one from AliExpress which should arrive shortly.

By the way, if you want to have a deeper look into the code, visit the GitHub repository! https://github.com/sackariaslunman/inverted-pendulum

Controlling the cart-pole will not be easy; the system has non-linear dynamics. Worse yet, when you add more poles to the cart, the dynamics also become chaotic. Most basic controllers (that I have a sliver of knowledge in designing) operate with linear dynamics. To use a linear controller on a non-linear system you need to approximate dynamics in a way that makes them linear.

To do this we can make some assumptions about our system, starting with that when we start our controller, we let go of the pole in the upright position, i.e. the angle is 0. We also assume then assume that our controller will be able to control this well so that the cart's velocity *dx* and the angular velocity *d* are 0. We also assume that the cart starts at some arbitrary position *x* which we also set to 0.

Using our new initial conditions it is now possible to approximate the linear dynamics at this specific point, where *x*, *dx*, and *d* are all 0, using a method called linearization. Since we used *x* to denote the cart's position I will denote the full state as a bold ** x**. In its non-linear form

Using linearization, the approximated linear dynamics can be described using a matrix *A* that describes how each variable changes based on itself and the other variables in the full state. Each variable also changes based on ** u** which is described in the

As you can see, it's quite a handful to do by hand, especially when you have many states; the number of elements has complexity O(*n*^2) where *n* is the number of states. Basically, the *A* and *B* matrices together describe the "tangent" of how the state variables change at a specific state ** x0** and control state

In our case, ** x0** should be chosen to be where the pole is balanced, i.e. the upright position where all variables are 0. Likewise, when the pole is balanced the control signal shouldn't need to fire so

And this is just the case with one pole! Luckily, I've already calculated these matrices for an arbitrary number of poles. I'm not going to show the final equations since they are disgustingly huge.

So, how do we control a system that is described using a linear state space model? That's where the Full-State Feedback, or FSFB, regulator comes in. It is ridiculously simple; way simpler than PID controllers if you've played around with those. I watched Christoper Lum's lecture on FSFB control and I can happily recommend the entire series of lectures.

To make the most basic type of FSFB regulator, we need to assume that every variable in our state is observable. The observable variables can be described as a subset of ** x** called

In our case, C becomes the identity matrix of size *n* by *n* where *n* is the number of variables in our state. For our system, the observable state does not depend on the control so *D* becomes 0, or more specifically, a column vector of size *n* by 1 where all elements are 0.

Finally, using this knowledge we can design our regulator. Since we can observe every variable at each time step, if we can set a proportional gain for each variable, and the system is controllable, then we can control the entire state. Yes, that's it. Just a proportional gain that we will call *K*. We will design our control ** u** to be equal to -

Optimal control relies heavily on the concepts of system controllability and observability. It's not the goal of this article to explain those concepts but I've provided some links. You can also watch Christopher Lum's lectures.

After modeling our FSFB regulator we are left with one question: how do we optimize the gain *K*? This is where LQR, or the Linear-quadratic regulator, comes in. LQR tries to optimize *K* based on how *"costly"* the different variables in our state are. For our cart-pole, a costly variable might be . We might want to make this 0 as fast as possible and worry about the other variables later. Using LQR, we can specify that is more costly than the others and then optimize our controller based on these criteria.

*K* is optimized using the above equation. The costs of the different state variables are specified on the diagonal of the diagonal matrix *Q*. The costs of the control signals are likewise specified in the *R* matrix in the same manner.

As you can see, *K* is not included in the equation. To solve for *K* you use something called the continuous-time algebraic Ricatti equation or CARE. You can look it up if you want but in Python, it can be solved by using SciPy and NumPy.

```
import numpy as np
from scipy.linalg import solve_continuous_are
S = solve_continuous_are(A, B, Q, R) # S is the solution for CARE
K = np.linalg.inv(R) @ B.T @ S # how to calculate K using S
```

And that's it, two rows of code! Pretty simple if you ask me.

To implement controllers in Python there's already a library that you can use called python-control which I highly recommend. However, since I want to learn all of this from scratch, I've opted for making my own Full-state Feedback regulator class called `FSFB`

. You can have a look at the code below.

```
from scipy.signal import cont2discrete
import numpy as np
class FSFB:
def __init__(self, A, B, C, D, dt: float):
self.A = A
self.B = B
self.C = C
self.D = D
self.dt = dt
self.discretize()
def discretize(self):
dlti = cont2discrete((self.A,self.B,self.C,self.D),self.dt)
self.A_d = np.array(dlti[0])
self.B_d = np.array(dlti[1])
```

What I haven't yet explained is discretization and the `discretize`

method. Since our computers work with zeros and ones, we have to adapt our linear model to work in discrete time rather than continuous time. The function `cont2discrete`

takes in `A`

and `B`

(as well as `C`

and `D`

, but they stay the same after discretization) and calculates their discrete counterparts `A_d`

and `B_d`

based on the time step `dt`

.

I've also made a subclass of `FSFB`

called `LQR`

that implements a method that calculates the gain *K* called `K_lqr`

using *CARE* and its discrete counterpart using *DARE*. There's also the method `calculate_K_r`

which calculates the gain `K_r`

which lets the controller follow a reference signal `r`

, which I won't go into detail about in this article. Finally, the feedback methods take in the current state and reference signal and calculate the next control signal.

```
from scipy.linalg import solve_continuous_are, solve_discrete_are
class LQR(FSFB):
def calculate_K_lqr(self, Q, R):
self.Q = Q
self.R = R
S = solve_continuous_are(self.A, self.B, Q, R)
S_d = solve_discrete_are(self.A_d, self.B_d, Q, R)
self.K_lqr = np.linalg.inv(R) @ self.B.T @ S
self.K_lqr_d = np.linalg.inv(R + self.B_d.T @ S_d @ self.B_d) @ (self.B_d.T @ S_d @ self.A_d)
def calculate_K_r(self):
K_r = np.true_divide(1, self.D + self.C @ np.linalg.inv(-self.A + self.B @ self.K_lqr) @ self.B)
K_r[K_r == np.inf] = 0
K_r = np.nan_to_num(K_r)
self.K_r = K_r.T
K_r_d = np.true_divide(1, self.D + self.C @ np.linalg.inv(np.eye(self.A_d.shape[0]) - self.A_d + self.B_d @ self.K_lqr_d) @ self.B_d)
K_r_d[K_r_d == np.inf] = 0
K_r_d = np.nan_to_num(K_r_d)
self.K_r_d = K_r_d.T
def feedback(self, state, r):
u = self.K_r @ r - self.K_lqr @ state
return u
def feedback_d(self, state, r):
u_d = self.K_r_d @ r - self.K_lqr_d @ state
return u_d
```

Finally! Using the environment that we set up in the last article and our new `LQR`

class we can now regulate the cart-pole. I'm using a *Q* with ones on the diagonal and *R* containing a single element 0.1 for *Va.* The initial conditions are *x* = -0.4, *dx* = 0, = 45 and *d* = 0. The reference signal, or goal, is *x* = 0.1, *dx* = 0, = 0 and *d* = 0. And... go!

Woohoo! Let's add another pole. Adding another pole makes the system even more unstable so the starting angles for both poles will be 30 to ensure that our controller can regulate the system. And... go!

LQR works great if you can measure all state variables accurately. In real life, this is almost certainly impossible. There's always some noise in the system dynamics and measurements, and often you can't measure the entire state. That's where the Kalman filter and LQG control come in which we will have a look at next time.

We are also regulating an unrealistic system. Right now, the motor draws upwards of 2400 Watts: not ideal. To simulate a more realistic system, a good idea could be to do some system identification on the physical model when that has been built. But for now, we'll have to stick with the current system parameters.

I'll look into implementing a Kalman filter and combining it with LQR control to create an LQG controller. I've also read Daniel Piedrahita's article where he talks about regulating a cart-pole using exciting stuff like LQR-Trees and trajectory planning which I will have a look at later on!

Also, the motor driver for the physical model has arrived, so I'll get busy building that too!

So, the old model with a `Cart`

, `Pole`

and `Motor`

class worked well with the forward Euler method where integrating is easy. However, for higher-quality integration schemes, it helps to have the full state of your system and just one function that differentiates it, otherwise, it gets messy quickly.

The new model is just a single class called `CartPoleSystem`

which represents the entire system: cart, poles and motor. The constructor takes in a cart and motor, represented as tuples, and a list of poles, represented as a list of tuples. It also takes in the gravitational acceleration `g`

and the integration scheme represented as a string. We will return to the different kinds of integrators and how we implement them.

```
from __future__ import annotations
import numpy as np
from numpy import sin, cos, pi
color = tuple[int,int,int]
class CartPoleSystem:
def __init__(
self,
cart: tuple[float, float, float, float, float, color],
motor: tuple[float, float, float, float, float, float, float, color],
poles: list[tuple[float, float, float, float, color]],
g: float,
integrator: str = "rk4"
):
self.cart = np.array(cart[:-1], dtype=np.float32)
x0, m, u_c, min_x, max_x, cart_color = cart
self.m = m
self.u_c = u_c
self.min_x = min_x
self.max_x = max_x
self.cart_color = cart_color
self.motor = np.array(motor[:-1], dtype=np.float32)
Ra, Jm, Bm, K, r, min_Va, max_Va, motor_color = motor
self.Ra = Ra
self.Jm = Jm
self.Bm = Bm
self.K = K
self.r = r
self.min_Va = min_Va
self.max_Va = max_Va
self.motor_color = motor_color
self.num_poles = len(poles)
self.poles = np.array([pole[:-1] for pole in poles], dtype=np.float32)
self.pole_colors = [pole[-1] for pole in poles]
self.M = self.m + sum(mp for (_, mp, _, _) in self.poles)
self.g = g
self.integrator = integrator
self.reset(self.get_initial_state())
def reset(self, initial_state):
self.state = np.array([
initial_state
])
self.d_state = np.array([
np.zeros(len(initial_state))
])
def get_initial_state(self):
return np.hstack([np.array([self.cart[0], 0, 0]), np.hstack([[angle0, 0] for angle0 in self.poles.T[0]])])
```

The constructor saves all of the named parameters and the cart, motor and poles as NumPy arrays. The `get_initial_state`

method gets the initial state based on the passed parameters to the constructor. This state is then passed to the `reset`

method which adds it to the first row of the `state`

matrix. We also have the `d_state`

matrix which stores the calculated derivative of the state at each time step. The full state with *n* poles is as follows:

To get the next state, we need to differentiate the current state. The `differentiate`

method does pretty much what the `update`

methods of the previous codebase did, except for the full state in just one function.

```
def differentiate(self, state, u):
Va = u[0]
sum1 = sum([m*sin(state[3+k*2])*cos(state[3+k*2]) for k,(_,m,_,_) in enumerate(self.poles)])
sum2 = sum([m*(l/2)*state[3+k*2+1]**2*sin(state[3+k*2]) for k,(_,m,l,_) in enumerate(self.poles)])
sum3 = sum([(u_p*state[3+k*2+1]*cos(state[3+k*2]))/(l/2) for k,(_,_,l,u_p) in enumerate(self.poles)])
sum4 = sum([m*cos(state[3+k*2])**2 for k,(m,_,_,_) in enumerate(self.poles)])
d_x = state[1]
dd_x = (self.g*sum1-(7/3)*((1/self.r**2)*((self.K/self.Ra)*(Va*self.r-self.K*d_x)-self.Bm*d_x)+sum2-self.u_c*d_x)-sum3)/(sum4-(7/3)*(self.M+self.Jm/(self.r**2)))
d_theta_m = d_x / self.r
dd_thetas = np.hstack([[state[3+k*2+1],(3/(7*l/2)*(self.g*sin(state[3+k*2])-dd_x*cos(state[3+k*2])-u_p*state[3+k*2+1]/(m*l/2)))] for k,(_,m,l,u_p) in enumerate(self.poles)])
return np.hstack([d_x, dd_x, d_theta_m, dd_thetas])
```

The code is messy and could be refactored, but for now, it works. The differentiated state is returned in the following format:

The system is brought forward in time with the `step`

method. This method gets the current state and then differentiates and integrates it either using the forward Euler method or RK4 (I promise I will explain what this is later) which spits out the next state. It then constrains/clamps the next state to within its maximum/minimum boundaries and appends it and the current state's derivative to the state matrices.

```
def get_state(self, t=-1):
return self.state[t]
def step(self, dt: float, Va: float):
state = self.get_state()
next_state, d_state = None, None
if self.integrator == "fe":
next_state, d_state = fe_step(dt, self.differentiate, state, [Va])
else:
next_state, d_state = rk4_step(dt, self.differentiate, state, [Va])
next_state = self.clamp(next_state)
self.update(next_state, d_state)
return next_state
def clamp(self, state):
x = state[0]
if x > self.max_x:
x = self.max_x
elif x < self.min_x:
x = self.min_x
state[0] = x
for k in range(self.num_poles):
state[3+k*2] %= 2*pi
return state
def update(self, next_state, d_state):
self.state = np.vstack([self.state, next_state])
self.d_state = np.vstack([self.d_state, d_state])
```

Besides these important methods, we also have two helper methods that will be used later in the environment class.

```
def max_height(self) -> float:
return sum(l for _,_,l,_ in self.poles)
def end_height(self) -> float:
state = self.get_state()
return sum(l*cos(state[3+k*2]) for k, (_,_,l,_) in enumerate(self.poles))
```

What exactly is numerical integration? Without going too heavy into math lingo, a numerical integrator is a function that receives a state and how to calculate its derivative and then tries to predict what the next state will be. We already touched on this with the forward Euler method which is a numerical integrator, just not a very good one. To brush up on the topic I can recommend watching a series of lectures by professor Steve Brunton on Differential Equations and Dynamical Systems that delves into numerical integration.

The problem with the forward Euler method, or FE, is that it tends to overshoot, and the overshooting only increases with a bigger time step. This can even lead to the simulated system becoming unstable when the time step increases over a certain threshold. This threshold depends on the system and how "dynamic" it is.

The Runge-Kutta methods are methods for numerical integration that try to be more accurate than FE using the same time step. Brunton has a great explanation of the 2nd-order and 4th-order Runge-Kutta methods which you can find here. In my simulation I'm using the 4th-order Runge-Kutta method or RK4. I won't go into details about the algorithm and why it works; it has to do with trying to eliminate the errors of the derivative's Taylor series expansion.

My implementation of FE and RK4 both receive a time step, the differentiation function, the state and the control state. They then calculate the next state using the derivative of the current state.

```
def fe_step(dt: float, differentiate, x0, u):
d_x = differentiate(x0, u)
x = x0 + dt * d_x
return x, d_x
def rk4_step(dt: float, differentiate, x0, u):
f1 = differentiate(x0, u)
f2 = differentiate(x0 + (dt/2) * f1, u)
f3 = differentiate(x0 + (dt/2) * f2, u)
f4 = differentiate(x0 + dt * f3, u)
d_x = f1 + 2*f2 + 2*f3 + f4
x = x0 + (dt/6) * d_x
return x, d_x
```

RK4 is great, but it's not enough for chaotic systems like our cart-pole system. It let me increase the time step from 0.02 to 0.025 without the system going unstable, but that was it. Chaotic systems typically need more powerful and tailor-made integration schemes. I might come back and find a better integrator than RK4.

OpenAI has an open-source package called Gym that is used for testing one's machine learning models on some predefined dynamic models. Gym already has a cart-pole environment, but this is an idealized version with only a single pole and no friction or motor. However, Gym also lets you define your own environment which is what we will do.

The environment is defined as a class called `CartPolesEnv`

and is a subclass of Gym's `Env`

class. The constructor takes in a `CartPoleSystem`

, the time step `dt`

and `g`

. It initializes the screen, its size and the font that will be used for the rendering. Then, more importantly, it initializes the action space or control space. This object defines the minimum and maximum values, as well as the data type and shape, of our control signals, which in our case is the input voltage `Va`

. We do the same for our observation space which is our full state. The class `Box`

is used for creating spaces of continuous signals. If discrete signals are preferred then there's also the `Discrete`

class.

```
from numpy import radians, pi, sin, cos
from random import uniform
import gym
from gym import spaces
import numpy as np
from cartpolesystem import CartPoleSystem
import pygame
from time import perf_counter
class CartPolesEnv(gym.Env):
def __init__(
self,
system: CartPoleSystem,
dt: float,
g: float,
):
super(CartPolesEnv, self).__init__()
self.system = system
self.max_height = system.max_height()
self.dt = dt
self.g = g
self.screen: pygame.surface.Surface | None = None
self.font: pygame.font.Font
self.i : int
self.width = 1200
self.height = 700
self.size = self.width, self.height
self.action_space = spaces.Box(
low=system.min_Va,
high=system.max_Va,
dtype=np.float32
)
num_of_poles = system.num_poles
obs_lower_bound = np.hstack([np.array([system.min_x, np.finfo(np.float32).min]), np.tile(np.array([0.0, np.finfo(np.float32).min]), num_of_poles)])
obs_upper_bound = np.hstack([np.array([system.max_x, np.finfo(np.float32).max]), np.tile(np.array([2*pi, np.finfo(np.float32).max]), num_of_poles)])
self.observation_space = spaces.Box(
low=obs_lower_bound,
high=obs_upper_bound,
shape=(2+2*num_of_poles,),
dtype=np.float32
)
```

An environment needs a `reset`

method that starts a new episode. To train general control models it helps if the new initial state is kind of random. Therefore, the cart will start in a random position on the track and every pole can assume an angle between -15 and 15. It also closes the current rendered window and returns the initial state. The observed state excludes the third element the motor angle since it's not important for balancing the poles.

```
def get_state(self):
state = np.delete(self.system.get_state(), 2, axis=0)
return state
def reset(self):
self.close()
initial_state = [uniform(self.system.min_x, self.system.max_x)*0.8, 0, 0]
for _ in range(self.system.num_poles):
initial_state.extend([radians(uniform(-15, 15)), 0])
self.system.reset(initial_state)
return self.get_state(), {"Msg": "Reset env"}
```

An environment also needs a `step`

method. This method should reward the model if it's doing good and penalize the model if it's doing badly. It should also terminate the episode if the model is doing something unwanted. Since I haven't started doing reinforcement learning or RL on the model, I haven't fleshed out the rewarding and termination conditions. Right now, the episode ends if the cart goes beyond the track. There are no rewards, only a penalization if the episode is terminated. How to effectively reward an RL model is really tricky; we will definitely return to this method and do some improvements.

```
def step(self, action: float):
reward = 0
Va = action
state = self.system.step(self.dt, Va)
x = state[0]
y = self.system.end_height()
terminated = bool(
x >= self.system.max_x or
x <= self.system.min_x
)
if terminated:
reward -= 10
return self.get_state(), reward, terminated, {}, False
```

Last up we have the `render`

method. This is the same PyGame code from last time plus some optional text lines that can be rendered. There's also a `close`

method which closes the PyGame window.

```
def si_to_pixels(self, x: float):
return int(x * 500)
def render(self, optional_text: list[str] = []):
if not self.screen:
pygame.init()
self.screen = pygame.display.set_mode(self.size)
self.font = pygame.font.Font('freesansbold.ttf', 20)
self.start_time = perf_counter()
self.i = 0
self.screen.fill(Colors.gray)
state = self.system.get_state()
x0 = self.si_to_pixels(state[0]) + self.width//2
y0 = self.height//2
pygame.draw.rect(self.screen, self.system.cart_color, (x0, y0, 20, 10))
max_x = self.width//2 + self.si_to_pixels(self.system.max_x)
min_x = self.width//2 + self.si_to_pixels(self.system.min_x)
pygame.draw.rect(self.screen, self.system.cart_color, (min_x-10, y0, 10, 10))
pygame.draw.rect(self.screen, self.system.cart_color, (max_x+20, y0, 10, 10))
motor_x0 = min_x-100
theta_m = state[2]
motor_sin = self.si_to_pixels(sin(-theta_m)*0.05)
motor_cos = self.si_to_pixels(cos(-theta_m)*0.05)
pygame.draw.polygon(self.screen, self.system.motor_color, [
(motor_x0+motor_sin, y0+motor_cos),
(motor_x0+motor_cos, y0-motor_sin),
(motor_x0-motor_sin, y0-motor_cos),
(motor_x0-motor_cos, y0+motor_sin),
])
x0 += 10
for k, ((_,_,l,_),color) in enumerate(zip(self.system.poles,self.system.pole_colors)):
x1 = x0 + self.si_to_pixels(l * sin(state[3+k*2]))
y1 = y0 + self.si_to_pixels(-l * cos(state[3+k*2]))
pygame.draw.line(self.screen, color, (x0, y0), (x1, y1), 10)
x0 = x1
y0 = y1
texts = [
f"Time: {round(self.i*self.dt,2)} s",
"",
"Cart:",
f"Position: {round(state[0],2)} m",
f"Velocity: {round(state[1],2)} m/s",
"",
"Motor:",
f"Angle: {round(theta_m,2)} rad",
]
texts.extend(optional_text)
for k, text_k in enumerate(texts):
text = self.font.render(text_k, True, Colors.black, Colors.gray)
text_rect = text.get_rect()
self.screen.blit(text,(0,20*k,text_rect.width,text_rect.height))
pygame.display.flip()
self.i += 1
def close(self):
pygame.quit()
self.screen = None
color = tuple[int,int,int]
class Colors(Dict):
red = (255,0,0)
green = (0,255,0)
blue = (0,0,255)
purple = (255,0,255)
cyan = (0,255,255)
yellow = (255,255,0)
black = (0,0,0)
white = (255,255,255)
gray = (100,100,100)
```

Since RL isn't my strong suit I will begin balancing the pole using classic control theory. Specifically, I will try out full-state feedback control or FSFB using LQR. I have already linearized the nonlinear model for *n* poles so the rest should be easy (hopefully).

Also, most of the physical parts have arrived. The track is assembled so right now I'm waiting on the driver for the DC motor. I also have to decide what sensors to use. I was thinking about only using a camera, but it might be too inconvenient. Encoders on the motor and pendulum joints could be the way forward. We'll have to see...

I'm currently trying to build a physical model of this system and balance it using different control methods. To do this the *proper* way, I'll first need to model the system to optimize my controller and simulate the controlled system to verify that it works as intended, so let's do that.

Forget about the coding; first, we need to model the system. The Cart-Pole system is what's known as a nonlinear dynamic system. To begin visualizing the system in our heads, let's imagine a cart and just one pole.

The cart can move along the horizontal axis or the x-axis. It can not move along the vertical axis or the y-axis. The pendulum is attached to the cart along the middle and can rotate freely in the xy-plane. The angle between the pendulum and the y-axis is a function of time and will be denoted as . Likewise, the cart's position on the x-axis is *x*. The pole has a length of *l* and a mass of *m* that exerts a gravitational force. There's also an input force *f* which our controller uses to balance the pole.

All forces aren't written out. Two of the missing forces are the friction between the cart and the track which is proportional to the cart's velocity and the friction in the pole's rotation joint.

The system can be mathematically described by the use of several different methods such as Newton's laws of motion or the Lagrangian. Since this is a popular and well-documented model in mechanics I will save myself some work by using Colin D. Green's [1] solution for the equations which can be found here. The equations describe a generalized cart-pole system with multiple poles indexed by *i*.

Now we can add as many poles as we please, but the model is not complete yet. The physical cart will be fastened to a timing belt that will be driven by a DC motor. The DC motor sits on the x-axis and can be described by two equations.

Since the inductance of small DC motors is very small it has been excluded from our equations.

Now, our DC motor's output shaft is coupled with a timing belt on which our cart sits. Therefore, the velocity and acceleration of the cart are proportional to the angular velocity and angular acceleration at the output shaft respectively by the radius of the output shaft, *r*. Likewise, the output torque *TL* is proportional to the input force *f* at the cart by *r*. Using this information we can reformulate equations (3) and (4) in terms of derivations of *x*, *f* and *r*.

Now we can get the final equation for the cart's acceleration by inserting equations (5) and (6) into (1).

With the power of these two equations, we can now begin coding the system in Python. Heads up, the coding was quick and dirty and needs to be refactored; I just wanted to get a simulator out quickly.

Let's begin with the pole. Let's make it a class whose constructor takes in the mass `m`

, initial angle `angle0`

, length `l`

, coefficient of friction `u_p`

and an optional pendulum `pole`

which is attached to the end of this one. It also takes in a color that it will display in the visual part of the simulator. Then we initialize the states of the pole as a dictionary `state`

where the keys are the named variables and the values are lists of the variable values through time. We also have some imports for typing and not having to type `math.sin`

and `math.cos`

.

```
from __future__ import annotations
from math import sin, cos
class Pole:
def __init__(
self,
mass: float,
angle0: float,
length: float,
pole_friction: float,
color: tuple[int,int,int],
child: Pole | None
):
self.m = mass
self.l = length
self.u_p = pole_friction
self.color = color
self.state = {
"theta": [angle0],
"d_theta": [0],
}
self.child = child
```

Next up is an update method that updates the states of the system. The arguments are the time step `dt`

, the acceleration of the cart `dd_x`

and the gravitational acceleration `g`

. The algorithm calculates the angular acceleration of the pole `dd_theta`

by use of equation (2) and uses the forward Euler method to calculate the angular velocity `d_theta`

and finally the angle `theta`

. We then save the new states and append them to the state lists.

```
def update(self, dt: float, dd_x: float, g: float):
theta = self.angle()
d_theta = self.angular_velocity()
dd_theta = 3/(7*self.lh())*(g*sin(theta)-dd_x*cos(theta)-self.u_p*d_theta/(self.m*self.lh()))
d_theta = d_theta + dd_theta * dt
theta = theta + d_theta * dt
self.state["d_theta"].append(d_theta)
self.state["theta"].append(theta)
```

We also make use of some helper methods. One of these is the `__iter__()`

method which puts the pole and all of its attached poles into a list that can be iterated through. We also have two methods that return the angle or angular acceleration at a specific time step, with the default being the last value. Finally the `lh()`

method just returns the length of the pole.

```
def lh(self):
return self.l/2
def __iter__(self):
poles = [self]
pole = self.child
while pole:
poles.append(pole)
pole = pole.child
return iter(poles)
def angle(self, t=None):
if not t:
t = len(self.state["theta"])-1
return self.state["theta"][t]
def angular_velocity(self, t=None):
if not t:
t = len(self.state["d_theta"])-1
return self.state["d_theta"][t]
def __str__(self):
return f"Pole:\n Mass: {self.m}\n Length: {self.l}\n Friction: {self.u_p}\n Child: {self.child is not None}"
```

Moving on we have the cart. Similarly, it has a constructor that takes in all parameters and initializes the state. Two of these parameters are a minimum position `min_x`

and a maximum position `max_x`

which are the boundaries of the track. It also takes in a pole, or rather a chain of poles, and a DC motor which we haven't written yet.

```
class Cart:
def __init__(
self,
mass: float,
cart_friction: float,
x0: float, min_x: float,
max_x: float,
color: tuple[int,int,int],
motor: DCMotor,
pole: Pole
):
self.m = mass
self.u_c = cart_friction
self.min_x = min_x
self.max_x = max_x
self.color = color
self.motor = motor
self.state = {
"x": [x0],
"d_x": [0],
"dd_x": [0]
}
self.pole = pole
def __iter__(self):
return iter(self.pole)
def total_mass(self):
M = self.m
for pole in self:
M += pole.m
return M
def x(self, t=None):
if not t:
t = len(self.state["x"])-1
return self.state["x"][t]
def velocity(self, t=None):
if not t:
t = len(self.state["d_x"])-1
return self.state["d_x"][t]
def acceleration(self, t=None):
if not t:
t = len(self.state["dd_x"])-1
return self.state["dd_x"][t]
```

The update method of the cart calculates `dd_x`

by use of equation (7) and has `dt`

, input voltage `Va`

and `g`

as arguments. With the forward Euler method, we calculate the velocity `d_x`

and position `x`

, but if the position is beyond the boundaries `min_x`

or `max_x`

, we clamp it to the nearest boundary. It then iterates through the chain of poles and updates every one of them. Finally, we also update the state of the motor.

```
def update(self, dt: float, Va: float, g: float):
x = self.x()
d_x = self.velocity()
M = self.total_mass()
sum1 = sum([pole.m*sin(pole.angle())*cos(pole.angle())
for pole in self])
sum2 = sum([pole.m*pole.lh()*pole.angular_velocity()**2
*sin(pole.angle()) for pole in self])
sum3 = sum([pole.u_p*pole.angular_velocity()
*cos(pole.angle())/pole.lh() for pole in self])
sum4 = sum([pole.m*cos(pole.angle())**2 for pole in self])
dd_x = (g*sum1-7/3*((1/self.motor.r**2)*(self.motor.K/self.motor.Ra
*(Va*self.motor.r-self.motor.K*d_x)-self.motor.Bm*d_x)+sum2
-self.u_c*d_x)-sum3)/(sum4-7/3*(M+self.motor.Jm/self.motor.r**2))
d_x = d_x + dd_x * dt
x = self.clamp(x + d_x * dt)
self.state["dd_x"].append(dd_x)
self.state["d_x"].append(d_x)
self.state["x"].append(x)
for pole in self:
pole.update(dt, dd_x, g)
self.motor.update(dt, d_x)
def clamp(self, x: float):
if x > self.max_x:
return self.max_x
elif x < self.min_x:
return self.min_x
return x
```

Lastly, the DC motor. I won't go into detail more than that it has the simplest code. Since we are only interested in its angular velocity `omega`

and angular acceleration `d_omega`

at the moment, most of the state updating happens in the cart and therefore the update method for the motor is very concise.

```
class DCMotor:
def __init__(
self,
resistance: float,
K: float,
inertia: float,
friction: float,
radius: float,
color: tuple[int,int,int]
):
self.Ra = resistance
self.K = K
self.Jm = inertia
self.Bm = friction
self.color = color
self.r = radius
self.state = {
"omega": [0],
"d_omega": [0]
}
def angle(self, t=None):
if not t:
t = len(self.state["omega"])-1
return self.state["omega"][t]
def angular_velocity(self, t=None):
if not t:
t = len(self.state["d_omega"])-1
return self.state["d_omega"][t]
def update(self, dt, d_x):
d_omega = d_x / self.r
omega = self.angle() + d_omega * dt
self.state["d_omega"].append(d_omega)
self.state["omega"].append(omega)
```

All we have to do now is initialize the parameters of our system, i.e. the cart, motor and poles, and give them some initial states.

```
cart = Cart(0.5, 0.05, 0, -0.8, 0.8, colors.red,
DCMotor(0.05, 0.5, 0.05, 0.01, 0.05, colors.black),
Pole(0.2, 10/180*pi, 0.2, 0.005, colors.green,
Pole(0.2, 5/180*pi, 0.15, 0.005, colors.blue,
Pole(0.2, 15/180*pi, 0.15, 0.005, colors.purple, None
)
)
)
)
```

Here we've created a triple-pole system. Each pole has a mass of 0.2 kg and a coefficient of friction at the joint of 0.005. The poles decrease in length from 0.2 m to 0.15 m to 0.1 m and have colors green, blue and purple. They have initial angles of 10, 5 and 15, written in radians.

The cart is red and has a mass of 0.5 kg and a coefficient of friction between it and the track of 0.05. It starts at x=0 on the track and can travel from x=-0.8 m to x=0.8 m. The track, therefore, has a length of 1.6 m.

To simulate the system, we simply have to run a loop where we update the cart. To visualize this, I've made a bad stupid ugly simulator in PyGame. I'm not going to go over this code but you are free to try it out.

```
import sys, pygame
from time import perf_counter
from math import pi, sin ,cos
from cartpole import Cart, Pole, DCMotor
pygame.init()
size = width, height = 1200, 600
screen = pygame.display.set_mode(size)
class Colors:
red = (255,0,0)
green = (0,255,0)
blue = (0,0,255)
purple = (255,0,255)
cyan = (0,255,255)
yellow = (255,255,0)
black = (0,0,0)
white = (255,255,255)
gray = (100,100,100)
colors = Colors
dt = 0.001
g = 9.81
cart = Cart(0.5, 0.05, 0, -0.8, 0.8, colors.red,
DCMotor(0.05, 0.5, 0.05, 0.01, 0.05, colors.black),
Pole(0.2, 10/180*pi, 0.2, 0.005, colors.green,
Pole(0.2, 5/180*pi, 0.15, 0.005, colors.blue,
Pole(0.2, 15/180*pi, 0.10, 0.005, colors.purple, None
)
)
)
)
def si_to_pixels(x: float):
return int(x * 500)
last_update = 0
start_time = perf_counter()
i = 0
font = pygame.font.Font('freesansbold.ttf', 20)
while True:
current_time = perf_counter()-start_time
if current_time > dt + last_update:
last_update = current_time
else:
continue
for event in pygame.event.get():
if event.type == pygame.QUIT:
sys.exit()
screen.fill(colors.gray)
cart.update(dt, 12*cos(i*dt*2), g)
x0 = si_to_pixels(cart.x()) + width//2
y0 = height//2
pygame.draw.rect(screen, cart.color, (x0, y0, 20, 10))
max_x = width//2 + si_to_pixels(cart.max_x)
min_x = width//2 + si_to_pixels(cart.min_x)
pygame.draw.rect(screen, cart.color, (min_x-10, y0, 10, 10))
pygame.draw.rect(screen, cart.color, (max_x+20, y0, 10, 10))
motor_x0 = min_x-100
motor_sin = si_to_pixels(sin(-cart.motor.angle())*0.05)
motor_cos = si_to_pixels(cos(-cart.motor.angle())*0.05)
pygame.draw.polygon(screen, cart.motor.color, [
(motor_x0+motor_sin, y0+motor_cos),
(motor_x0+motor_cos, y0-motor_sin),
(motor_x0-motor_sin, y0-motor_cos),
(motor_x0-motor_cos, y0+motor_sin),
])
x0 += 10
for pole in cart:
x1 = x0 + si_to_pixels(pole.l * sin(pole.angle()))
y1 = y0 + si_to_pixels(-pole.l * cos(pole.angle()))
pygame.draw.line(screen, pole.color, (x0, y0), (x1, y1), 10)
x0 = x1
y0 = y1
texts = [
f"Time: {round(i*dt,2)} s",
f"",
f"Cart:",
f"Position: {round(cart.x(),2)} m",
f"Velocity: {round(cart.velocity(),2)} m/s",
f"Acceleration: {round(cart.acceleration(),2)} m/s^2",
f"",
f"Motor:",
f"Angle: {round(cart.motor.angle(),2)} rad",
f"Angular velocity: {round(cart.motor.angular_velocity(),2)} rad/s",
]
for k, text_k in enumerate(texts):
text = font.render(text_k, True, colors.black, colors.gray)
text_rect = text.get_rect()
screen.blit(text,(0,(text_rect.height)*k,text_rect.width,text_rect.height))
pygame.display.flip()
i += 1
```

Concisely, we update the state of the system every 1 ms and then draw out the different subsystems. The boundaries are marked by red squares and the motor is a square that spins in place. We also write out some data in the top-left corner. The input voltage `12*cos(i*dt*2)`

is a cosine function of time that makes the cart move backward and forwards repeatedly.

And there we go, all done!

A big flaw in the system is that we are using the forward Euler method for numerical integration. This isn't as accurate as other methods, especially with a larger time step. In the future, I'll probably use some Runge-Kutta method instead, but I need to do some research first. Using a time step that is too big also makes the system go unstable. For my simulator, the system goes unstable with a time step greater than around 0.03.

I've ordered all of the parts for the real physical model that I'm building. Until they arrive, I will continue developing the model and simulator. The model is incomplete as it does not include any gear ratio between the motor and timing belt.

The simulator is still very ugly and badly written. I plan to remake the simulator and convert it into an OpenAI gym environment so that it is easy to use for training a reinforcement learning model. Next up is doing just that and perhaps writing a simple controller for balancing a single pole.

[1] Green, Colin D. (2020) Equations of Motion for the Cart and Pole Control Task.

[link1]