Note (November 3, 2005): There are many things I didn't understand very well when I wrote this article below back in 2003. What I am describing here can be much more succinctly explained this way: The motivation is to seek a Lorentz-invariant wave equation of a free particle. For this reason, we start with the (Lorentz-invariant) relativistic equation of motion of a free particle,
$p^2 = m^2$, where $p$ is the four-momentum. We quantize this equation by formally replacing the momentum with the momentum operator, and appending the wave function thus: $\hat{p}^2\psi = m^2\psi$. We then formally take the "square root" of this equation (this is Dirac's magnificent "trick"), taking into account that the momentum is a vectorial quantity: $\alpha^i\hat{p}_i\psi = m\psi$. One way to make this work is to have $\alpha^i$ that satisfy the equation $\alpha^i\alpha^j + \alpha^j\alpha^i = \eta^{ij} + \eta^{ji}$, where $\eta^{ij}$ is the Minkowski metric. (Implicit in the argument is that the $\alpha^i$ and the momentum operator commute, but that would indeed be the case when the $\alpha^i$ are constant and the momentum operator is a derivative operator.) This in particular means that $\alpha^i\alpha^j + \alpha^j\alpha^i = 0$ when $i \ne j$, meaning that the $\alpha^i$ cannot be ordinary numbers that commute under multiplication. Such noncommutative algebras can be represented using appropriately chosen matrices, but in four dimensions (where this algebra is the Clifford algebra $Cl_{1,3}$), a conceptually simpler(?) representation can be achieved using biquaternions (quaternions over the complex numbers), which is what I am exploring below. Now without further ado, here are my thoughts from two years ago.
Klein-Gordon equation
In order to satisfy the conditions of special relativity, the Schrödinger equation needs to be modified. It was originally derived from the (non-relativistic) relationship between momentum and energy:
$$E = \frac{p^2}{2m} + V$$
The relativistic version should instead be based on
E² = p² + m²
It is, in fact, possible to derive just such a wave function equation, but it isn't without problems. The equation is called the Klein-Gordon (KG) equation (in the following discussion, ħ is assumed to be 1):
$$-\frac{\partial^2\phi}{\partial t^2} = (-\nabla^2 + m^2)\phi$$
Just like in the case of the Schrödinger equation, it is possible to derive a continuity equation from the KG equation, by multiplying the equation on the left by φ*, multiplying its complex conjugate on the left by φ, and subtracting one from the other:
[KG]
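$$-\nabla^2\phi + m^2\phi + \frac{\partial^2\phi}{\partial t^2} = 0$$
$$\begin{aligned}
0 &= \phi^*\nabla^2\phi - \phi^* m^2\phi - \phi^*\frac{\partial^2\phi}{\partial t^2} - \phi\nabla^2\phi^* + \phi m^2\phi^* + \phi\frac{\partial^2\phi^*}{\partial t^2} \\
&= \nabla\cdot(\phi^*\nabla\phi - \phi\nabla\phi^*) - \phi^*\frac{\partial^2\phi}{\partial t^2} + \frac{\partial\phi^*}{\partial t}\frac{\partial\phi}{\partial t} - \frac{\partial\phi}{\partial t}\frac{\partial\phi^*}{\partial t} + \phi\frac{\partial^2\phi^*}{\partial t^2} \\
&= \nabla\cdot(\phi^*\nabla\phi - \phi\nabla\phi^*) - \frac{\partial}{\partial t}\left(\phi^*\frac{\partial\phi}{\partial t} - \phi\frac{\partial\phi^*}{\partial t}\right)
\end{aligned}$$
Substituting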
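$$\mathbf{j} = -i(\phi^*\nabla\phi - \phi\nabla\phi^*), \qquad \rho = i\left(\phi^*\frac{\partial\phi}{\partial t} - \phi\frac{\partial\phi^*}{\partial t}\right)$$
we get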
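$$i\left(\nabla\cdot\mathbf{j} + \frac{\partial\rho}{\partial t}\right) = 0, \qquad \nabla\cdot\mathbf{j} + \frac{\partial\rho}{\partial t} = 0$$
But herein lies the problem. The expression we get for the probability density ρ in this equation is not positive definite!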
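The sign problem can be made concrete with a small numerical sketch. It assumes a one-dimensional plane wave exp(–i(Et – px)) as the test solution; the function names and the values of m and p are illustrative only and do not come from the article. For such a wave the density works out to 2E, so it takes the sign of the energy.

```python
import cmath
import math

def plane_wave(E, p, t, x):
    """Plane wave exp(-i(E t - p x)) in one space dimension (hbar = 1)."""
    return cmath.exp(-1j * (E * t - p * x))

def kg_density(E, p, t, x):
    """rho = i (phi* dphi/dt - phi dphi*/dt), with the time derivative taken analytically."""
    phi = plane_wave(E, p, t, x)
    dphi_dt = -1j * E * phi  # d/dt of exp(-i(E t - p x))
    rho = 1j * (phi.conjugate() * dphi_dt - phi * dphi_dt.conjugate())
    return rho.real

m, p = 1.0, 0.75                 # illustrative values (assumptions)
E = math.sqrt(p * p + m * m)     # from E^2 = p^2 + m^2
rho_pos = kg_density(+E, p, t=0.3, x=1.2)   # positive-energy solution
rho_neg = kg_density(-E, p, t=0.3, x=1.2)   # negative-energy solution
print(rho_pos, rho_neg)
```

Both waves satisfy the KG equation, yet the second one yields a negative ρ, which is why ρ cannot be read as a probability density.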
Dirac equation
The solution to this dilemma was first proposed by Dirac. First of all, he postulated another equation, linear in ∂/∂t to ensure that the probability density will be positive definite, and also linear in ∇ for purposes of relativistic covariance:
<assumption>
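$$i\frac{\partial\phi}{\partial t} = (-i\alpha\cdot\nabla + \beta m)\phi$$
From this and the KG equation, we can derive some conditions for α and β. Squaring the operator on both sides of Dirac's equation we get:
$$\begin{aligned}
\left(i\frac{\partial}{\partial t}\right)^2\phi &= (-i\alpha\cdot\nabla + \beta m)^2\phi \\
&= -\sum_{i=1}^{3}\alpha_i^2\frac{\partial^2\phi}{(\partial x_i)^2} - \sum_{i<j}(\alpha_i\alpha_j + \alpha_j\alpha_i)\frac{\partial^2\phi}{\partial x_i\,\partial x_j} - im\sum_{i=1}^{3}(\alpha_i\beta + \beta\alpha_i)\frac{\partial\phi}{\partial x_i} + \beta^2m^2\phi
\end{aligned}$$
But we assumed that φ also satisfies the KG equation:
$$\left(i\frac{\partial}{\partial t}\right)^2\phi = -\sum_{i=1}^{3}\frac{\partial^2\phi}{(\partial x_i)^2} + m^2\phi$$
This leads us to the following conditions: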
$$\alpha_i\beta + \beta\alpha_i = 0$$
$$\alpha_i\alpha_j + \alpha_j\alpha_i = 0 \quad (i \ne j)$$
$$\alpha_i^2 = 1, \qquad \beta^2 = 1$$
A contradiction? Perhaps, if you assume that αi, β, and φ are ordinary (real or complex) scalar quantities. But what if they aren't?
Quaternions
Most textbooks at this point proceed to introduce the Pauli matrices. But, in my opinion, those matrices are an unnecessary distraction when a much more powerful concept is available to us: Quaternions to the rescue!
Quaternions go where no complex number has gone before: instead of one, we have three imaginary units, all obeying the equation
$$i^2 = j^2 = k^2 = -1.$$
Quaternion multiplication is non-commutative:
$$ij = -ji = k, \qquad jk = -kj = i, \qquad ki = -ik = j.$$
The multiplication rules should make it evident that just as a complex number can be expressed using two real numbers, a quaternion can be expressed using two complex numbers:
$$a + bi + cj + dk = (a + bi) + (c + di)j$$
Quaternions have quaternionic conjugates:
$$(a + bi + cj + dk)^* = a - bi - cj - dk$$
Like real and complex numbers, quaternions form a division algebra: simply put, for two quaternions a and b, their product is zero iff at least one of a or b is zero.
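The multiplication rules can be checked mechanically. The following is a minimal sketch that stores a quaternion a + bi + cj + dk as the tuple (a, b, c, d); the function names `qmul` and `qconj` are my own illustrative choices, not part of any library.

```python
def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

def qconj(q):
    """Quaternionic conjugate: flip the sign of the three imaginary parts."""
    a, b, c, d = q
    return (a, -b, -c, -d)

I, J, K = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

print(qmul(I, J), qmul(J, I))   # ij and ji differ by a sign
print(qmul(I, I))               # i squared
q = (1, 2, 3, 4)
print(qmul(q, qconj(q)))        # real part is the squared norm, imaginary parts vanish
```

The last line illustrates why the division property holds: q multiplied by its conjugate is a non-negative real number that vanishes only for q = 0, so every nonzero quaternion has an inverse.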
Before proceeding any further, it is worthwhile to examine whether the quaternionic unit i is the same as the imaginary unit i that appears in the Dirac equation. Sadly, the answer is no: if you try to use the quaternionic unit, it is not possible to simplify the equation as elegantly as it was done here, because i and αi or β do not commute. In other words, we require both complex and quaternionic quantities: the algebra we use is the product of C and H, also referred to as quaternions over the complex numbers.
Dirac matrices
Even with quaternions, all the conditions for the coefficients αi and β are difficult to satisfy at the same time. That is because quaternions provide only three quantities that satisfy an anticommutation relation, whereas we need four. Because of this, we need to take the next best thing, which is a 2×2 quaternionic matrix (which, as per the above, is the same as a 4×4 complex matrix). The choice is not unique; one specific set of matrices looks like this:
$$\alpha_1 = \begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}, \quad \alpha_2 = \begin{pmatrix} 0 & j \\ -j & 0 \end{pmatrix}, \quad \alpha_3 = \begin{pmatrix} 0 & k \\ -k & 0 \end{pmatrix}, \quad \beta = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$
The use of quaternionic matrices as coefficients in the Dirac equation implies that the wave function is no longer a (real or complex) scalar function either: instead, it itself becomes a quaternionic 2-vector.
This is a crucial thought: two seemingly contradictory conditions (the KG equation and the Dirac equation) are reconciled when we switch from using real or complex numbers to some other kind of quantity.
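That the matrices with off-diagonal entries ±i, ±j, ±k and the off-diagonal β really satisfy the required conditions can be verified by brute force. The sketch below rests on an assumption not made in the article: it represents the quaternion units by one common choice of 2×2 complex matrices (`QI`, `QJ`, `QK`), so that each 2×2 quaternionic matrix becomes a 4×4 complex matrix. All names are illustrative.

```python
# Quaternion units as 2x2 complex matrices (an assumed representation
# chosen so that QI*QJ = QK, QJ*QK = QI, QK*QI = QJ).
I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
QI = [[1j, 0], [0, -1j]]
QJ = [[0, 1], [-1, 0]]
QK = [[0, 1j], [1j, 0]]

def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[r][n] * B[n][c] for n in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]

def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def neg(A):
    return [[-x for x in row] for row in A]

def block(A, B, C, D):
    """Assemble the 4x4 matrix [[A, B], [C, D]] from 2x2 blocks."""
    return [ra + rb for ra, rb in zip(A, B)] + [rc + rd for rc, rd in zip(C, D)]

def anticommutator(A, B):
    return add(matmul(A, B), matmul(B, A))

# alpha_n = [[0, q_n], [-q_n, 0]] and beta = [[0, 1], [1, 0]]
ALPHA = [block(Z2, Q, neg(Q), Z2) for Q in (QI, QJ, QK)]
BETA = block(Z2, I2, I2, Z2)
TWO_I4 = [[2 if r == c else 0 for c in range(4)] for r in range(4)]
ZERO4 = [[0] * 4 for _ in range(4)]

def check_conditions():
    """Return True if the alpha_n and beta satisfy all the required conditions."""
    ok = anticommutator(BETA, BETA) == TWO_I4              # beta^2 = 1
    for a in range(3):
        ok = ok and anticommutator(ALPHA[a], BETA) == ZERO4
        for b in range(3):
            expected = TWO_I4 if a == b else ZERO4         # alpha_n^2 = 1, else 0
            ok = ok and anticommutator(ALPHA[a], ALPHA[b]) == expected
    return ok

print(check_conditions())
```

Checking anticommutators of a matrix with itself against twice the identity is just a compact way of testing that its square is 1.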
Probability current
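The Dirac equation solves the problem with probability density. Multiplying the Dirac equation by the conjugate transpose of φ on the left, multiplying the conjugate transpose of the Dirac equation by φ on the right (the αi and β are equal to their own conjugate transposes), and subtracting the second result from the first, we get:
$$\phi^\dagger\, i\frac{\partial\phi}{\partial t} = \phi^\dagger(-i\alpha\cdot\nabla + \beta m)\phi$$
$$-i\frac{\partial\phi^\dagger}{\partial t}\phi = \phi^\dagger(i\alpha\cdot\overleftarrow{\nabla} + \beta m)\phi$$
$$i\left(\phi^\dagger\frac{\partial\phi}{\partial t} + \frac{\partial\phi^\dagger}{\partial t}\phi\right) = -i\phi^\dagger\alpha\cdot\nabla\phi + \phi^\dagger\beta m\phi - i\phi^\dagger\alpha\cdot\overleftarrow{\nabla}\phi - \phi^\dagger\beta m\phi$$
$$\frac{\partial(\phi^\dagger\phi)}{\partial t} = -\nabla\cdot(\phi^\dagger\alpha\phi)$$
The symbol $\overleftarrow{\nabla}$ is borrowed from Aitchison (1996): it denotes the gradient acting on the quantity to its left, so $\phi^\dagger\overleftarrow{\nabla}$ is a row vector (recall that φ† is a quaternionic row vector that is the conjugate transpose of the column vector φ).
Substituting
$$\rho = \phi^\dagger\phi, \qquad \mathbf{j} = \phi^\dagger\alpha\phi$$
we get
$$\frac{\partial\rho}{\partial t} + \nabla\cdot\mathbf{j} = 0$$
The expression for the probability density ρ is positive definite: for a two-component quaternionic vector φ = (φ1, φ2), it is just |φ1|² + |φ2|². Therefore, the continuity equation we just derived from the Dirac equation is acceptable.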
Negative energy
But what does a wave function that is a quaternionic 2-vector physically represent? Let's examine the energies associated with the two elements of this quaternionic wave function.
From the KG equation (which our wave function was postulated to satisfy) we know that energy can be positive or negative:
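$$E = \pm\sqrt{p^2 + m^2}$$
The component form of the quaternionic 2-vector version of the Dirac equation is simplest in a basis in which β is diagonal. Starting from our choice of values for α and β, such a basis is obtained by replacing the components φ1 and φ2 with (φ1 + φ2)/√2 and (φ1 – φ2)/√2. Relabeling the new components as φ1 and φ2, β becomes the diagonal matrix with entries 1 and –1, and the equation looks like this:
$$\frac{\partial}{\partial t}\begin{pmatrix}\phi_1 \\ \phi_2\end{pmatrix} = \begin{pmatrix} -im & (i, j, k)\cdot\nabla \\ -(i, j, k)\cdot\nabla & im \end{pmatrix}\begin{pmatrix}\phi_1 \\ \phi_2\end{pmatrix}$$
Or, multiplied by i on both sides: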
$$i\frac{\partial}{\partial t}\begin{pmatrix}\phi_1 \\ \phi_2\end{pmatrix} = \begin{pmatrix} m & i(i, j, k)\cdot\nabla \\ -i(i, j, k)\cdot\nabla & -m \end{pmatrix}\begin{pmatrix}\phi_1 \\ \phi_2\end{pmatrix}$$
Now is the time to recognize that i∂/∂t is none other than the energy operator, and –i∇ is none other than the momentum operator. For a particle at rest, its momentum is zero, therefore its energy will be:
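$$E = \pm\sqrt{p^2 + m^2} = \pm m$$
Choosing E to be positive, we get
$$m\begin{pmatrix}\phi_1 \\ \phi_2\end{pmatrix} = \begin{pmatrix} m & 0 \\ 0 & -m \end{pmatrix}\begin{pmatrix}\phi_1 \\ \phi_2\end{pmatrix}$$
which implies
$$\phi_2 = 0, \qquad \phi = \begin{pmatrix}\phi_1 \\ 0\end{pmatrix}$$
Similarly, if we choose E to be negative, we get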
$$\phi_1 = 0, \qquad \phi = \begin{pmatrix}0 \\ \phi_2\end{pmatrix}$$
What this means is that we have decomposed our quaternionic two-vector wave function into two quaternionic scalar wave functions: one corresponding to a positive energy solution, the other corresponding to a negative energy solution. Dirac's interpretation is that these two solutions represent a particle and an antiparticle; the fact that they show up inseparably together in the Dirac equation implies that particles and antiparticles can only be created and annihilated together.
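The pairing of positive and negative energies can also be seen without solving anything. The sketch below makes two assumptions beyond the article: the Hamiltonian is written as H = α·p + βm (that is, –i∇ replaced by a numerical momentum p), and the quaternion units are represented by 2×2 complex matrices so that α and β become 4×4 complex matrices. It checks that H² is (p² + m²) times the identity and that H has zero trace, which for a Hermitian H means that the eigenvalues +√(p² + m²) and –√(p² + m²) occur twice each. The names and the sample values of p and m are illustrative.

```python
I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
# Quaternion units as 2x2 complex matrices (an assumed representation).
QI = [[1j, 0], [0, -1j]]
QJ = [[0, 1], [-1, 0]]
QK = [[0, 1j], [1j, 0]]

def matmul(A, B):
    return [[sum(A[r][n] * B[n][c] for n in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]

def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(s, A):
    return [[s * x for x in row] for row in A]

def block(A, B, C, D):
    """Assemble the 4x4 matrix [[A, B], [C, D]] from 2x2 blocks."""
    return [ra + rb for ra, rb in zip(A, B)] + [rc + rd for rc, rd in zip(C, D)]

ALPHA = [block(Z2, Q, scale(-1, Q), Z2) for Q in (QI, QJ, QK)]
BETA = block(Z2, I2, I2, Z2)

def hamiltonian(p, m):
    """H = alpha . p + beta m for momentum components p = (p1, p2, p3)."""
    H = scale(m, BETA)
    for p_n, alpha_n in zip(p, ALPHA):
        H = add(H, scale(p_n, alpha_n))
    return H

p, m = (1, 2, 2), 4                      # illustrative values: p^2 + m^2 = 25
H = hamiltonian(p, m)
H_squared = matmul(H, H)                 # expected: 25 times the identity
trace_H = sum(H[n][n] for n in range(4)) # expected: 0

# Particle at rest: combinations with phi1 = phi2 and phi1 = -phi2
H_rest = hamiltonian((0, 0, 0), m)
v_plus = [[1], [0], [1], [0]]
v_minus = [[1], [0], [-1], [0]]
print(matmul(H_rest, v_plus), matmul(H_rest, v_minus))
```

With the off-diagonal β used here, the rest-frame solutions of energy +m and –m are the combinations with φ1 = φ2 and φ1 = –φ2; in a basis where β is diagonal they become the single-component solutions described above.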
Spin
But why is the wave function of a particle itself a quaternionic function as opposed to an "ordinary" complex-valued function? If we map the quaternion to a pair of complex numbers, we have the answer: what it really tells us is that a particle represented by this wave function has an extra internal degree of freedom with two discrete values. This, in fact, is precisely what a spin-1/2 particle is. Significantly, the Dirac equation does not admit a solution describing a particle with no internal degrees of freedom; in other words, it implies that nature does not like spin-0 particles.
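The mapping from a quaternion to a pair of complex numbers is the decomposition a + bi + cj + dk = (a + bi) + (c + di)j given earlier. As a minimal sketch (the function names and the sample value are illustrative assumptions), it can be written as follows, with the squared norm of the quaternion shared between the two complex components:

```python
def to_complex_pair(q):
    """Split q = a + bi + cj + dk into (z, w) such that q = z + w j."""
    a, b, c, d = q
    return complex(a, b), complex(c, d)

def from_complex_pair(z, w):
    """Rebuild the quaternion (a, b, c, d) from its two complex components."""
    return (z.real, z.imag, w.real, w.imag)

def norm_squared(q):
    return sum(x * x for x in q)

q = (1.0, -2.0, 0.5, 3.0)       # an arbitrary sample value of the wave function
z, w = to_complex_pair(q)
print(z, w)
print(abs(z) ** 2 + abs(w) ** 2, norm_squared(q))
```

Read this way, a single quaternion-valued wave function carries two complex amplitudes, one for each value of the internal two-valued degree of freedom.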
Aitchison, I. J. R. & Hey, A. J. G., Gauge Theories in Particle Physics, Institute of Physics Publishing, 1996
and, assuming it is not overly pretentious to reference one's own unpublished article from within another unpublished article:
Toth, Viktor T., Principles of Elementary Quantum Mechanics, http://www.vttoth.com/q.htm, 2003