Okay, so let me try and make sense of the theoretical logic behind thermodynamics and statistical physics. My intent is to develop the basic equations from first principles without resorting to either quantum mechanics (which is what Landau and Lifshitz are doing) or weird Carnot-machines. Let's see how far I can go.
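Equilibrium thermodynamics is based on the principle that the state of a system is completely determined by its volume V and pressure p. Furthermore, there is the concept of equilibrium (the zeroth law). Two systems are in equilibrium when a function F (unknown for the moment) of their states is zero:
F(p1, V1, p2, V2) = 0.
The zeroth law says that equilibrium is transitive: if systems 1 and 2 are each in equilibrium with system 3, then they are also in equilibrium with each other. Solving the two equilibrium conditions with system 3 for p3, we get
p3 = f(p1, V1, V3) = f(p2, V2, V3).
Since the equality persists regardless of the value of V3, we conclude that there exists a function of state φ that has the same value for systems that are in equilibrium: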
φ(p1, V1) = φ(p2, V2).
We define the empirical temperature θ as
θ = φ(p, V).
From the basic definition of work,
W = – ∫ p dV,
where the negative sign is used conventionally to indicate that work is invested into the system when it is compressed (i.e., its volume decreases.)
In an isolated system, energy is conserved. Any work done on the system must be accompanied by a change in the system's internal energy U:
ΔU = W.
For a system not in isolation, its interaction with its surroundings can be described by the difference,
Q = ΔU – W,
or, in infinitesimal notation:
dQ = dU – dW,
where dQ is the infinitesimal quantity of heat exchanged between the system and its environment.
Now I must stop for a moment and remark on something. Many textbooks use some
notation to distinguish between dQ and dW on the one hand, and
dU on the other. To dU, there corresponds a quantity U, the
internal energy, that changes at the rate dU, but the same is not true
for Q or W! There is no "reservoir of work" from which we take, or
to which we put back some infinitesimal amount of work dW. Similarly,
there is no "reservoir of heat" (indeed, this would be the 19th century concept
of caloric). Some authors use a crossed đ symbol to denote đQ
and đW, while other authors avoid using a "d-notation" altogether,
and use the language of differential forms instead. In this language, we would
say that neither the heat form ψ nor the work form ω is exact, meaning that there are not
necessarily quantities Q or W such that
ψ = dQ, ω = dW.
I will be pragmatic (if it quacks like a duck...) and I'll continue to use the d-notation simply because it is practical. Just keep in mind that writing dX doesn't always imply that there exists an X.
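To make the point about dW concrete, here is a minimal Python sketch. It integrates dW = –p dV along two different paths between the same pair of (p, V) states and gets two different answers, so there cannot be a function W(p, V) whose differential is dW. The end states, the two paths, and the helper names are illustrative choices of mine, not part of the argument above.

```python
def work_along(path, steps=10_000):
    """Integrate dW = -p dV along a parametrized path t -> (p, V), t in [0, 1]."""
    total = 0.0
    for i in range(steps):
        p0, v0 = path(i / steps)
        p1, v1 = path((i + 1) / steps)
        total += -0.5 * (p0 + p1) * (v1 - v0)  # trapezoid rule on each segment
    return total

# Hypothetical end states (arbitrary units): A = (p=1, V=1), B = (p=2, V=2).
def path_pressure_first(t):
    # Raise p from 1 to 2 at V = 1, then expand V from 1 to 2 at p = 2.
    return (1 + 2 * t, 1.0) if t < 0.5 else (2.0, 1 + 2 * (t - 0.5))

def path_volume_first(t):
    # Expand V from 1 to 2 at p = 1, then raise p from 1 to 2 at V = 2.
    return (1.0, 1 + 2 * t) if t < 0.5 else (1 + 2 * (t - 0.5), 2.0)

w_a = work_along(path_pressure_first)
w_b = work_along(path_volume_first)
print(w_a, w_b)  # same end states, different work
```

In these units the first path gives a work of about –2 and the second about –1, even though both start at A and end at B.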
We can rewrite our previous equation using the definition of work as
dQ = dU + p dV.
This equation is the first law of thermodynamics, expressing the idea of energy conservation.
The second law of thermodynamics states, simply put, that there are irreversible processes. From this, it follows (as the argument below shows) that we can express dQ in the form
dQ = T dS,
where T and S are functions of state.
What is an irreversible process? Here's one way to think about it. The state of a system is fully determined by p and V, as per our definition. In other words, we don't consider a change in the internal energy U a change of state, if p and V remain constant. So here's an excellent question, then. Suppose you burn some fuel. The internal energy U changes as chemical energy is converted into, what? A change in pressure and/or volume, which means the ability to do mechanical work? Or a change in heat?
A process is called adiabatic if dQ = 0, i.e., the system does not exchange heat with its environment. The second law states, simply put, that there are states of a system (represented by different values of p and V) that cannot be reached by an adiabatic process, i.e., a curve along which dQ = 0.
That this simple statement has far-reaching consequences is due to Carathéodory's theorem, which states that in this case, there exist functions λ and φ such that
dQ = λ dφ.
Of course this alone does not determine the functions λ and φ. To take that step, it is necessary to make the assumption that heat is additive. In particular, if we have two systems (ignoring interaction energies between the two), then the total change in energy is the sum of individual changes, i.e., dQ = dQ1 + dQ2, and hence
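λ dφ = λ1dφ1 + λ2dφ2,
or
dφ = (λ1/λ) dφ1 + (λ2/λ) dφ2.
But this means that φ depends on the two subsystems only through φ1 and φ2, so there exist functions f1 and f2 such that
λ1/λ = f1(φ1, φ2), λ2/λ = f2(φ1, φ2),
and therefore
log λ1 – log λ2 = log(f1/f2).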
On the right-hand side, f1/f2 is not a function of the empirical temperature θ, so if we take the partial derivative with respect to θ, we get
∂(log λ1)/∂θ = ∂(log λ2)/∂θ.
This can only happen if both sides of this equation depend only on θ, i.e., there exists a function
g(θ) = ∂(log λ)/∂θ.
What if we replace θ with another function T(θ)? Not just any T, but one that satisfies the equation,
dT/dθ = Tg(θ).
In this case, we have
∂(log λ)/∂T = g(θ)[∂T/∂θ]⁻¹ = 1/T = ∂(log T)/∂T.
The equation dT/dθ = Tg(θ) determines T up to a multiplicative constant C:
T = C exp(∫ g(θ) dθ).
Integrating ∂(log λ)/∂T = ∂(log T)/∂T, in turn, gives
log λ = log T + log K,
where K is independent of T. Or,
dQ = TK(φ)dφ.
Solving the equation
dS = K(φ) dφ
for S, we get
dQ = T dS, or
dQ/dS = T.
In this equation, T is called the absolute temperature (remember, our reasoning determined T up to a multiplicative constant, and if we make this constant positive, T will always be positive as well) and S is called the entropy. We can rearrange our equation to read
T dS – dU – p dV = 0, or
dU = T dS – p dV.
This is a fundamental equation of thermodynamics that combines the first and second laws.
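As a sanity check on the idea that 1/T turns dQ into the differential of a genuine function S, here is a small Python sketch. It assumes a hypothetical ideal gas in units where pV = T and U = CT (neither the gas nor the constant C = 1.5 appears in the argument above), and integrates both dQ and dQ/T along two different paths between the same two states.

```python
import math

C = 1.5  # assumed heat capacity, in units where p V = T and U = C T

def integrate(form, path, steps=20_000):
    """Line integral of a dT + b dV along path(t), where form(T, V) returns (a, b)."""
    total = 0.0
    for i in range(steps):
        t_a, v_a = path(i / steps)
        t_b, v_b = path((i + 1) / steps)
        a, b = form(0.5 * (t_a + t_b), 0.5 * (v_a + v_b))  # midpoint rule
        total += a * (t_b - t_a) + b * (v_b - v_a)
    return total

def dq(T, V):
    return (C, T / V)          # dQ = dU + p dV = C dT + (T/V) dV

def dq_over_t(T, V):
    return (C / T, 1.0 / V)    # dQ / T

# Two paths from (T=1, V=1) to (T=2, V=2), each returning (T, V).
def heat_then_expand(t):
    return (1 + 2 * t, 1.0) if t < 0.5 else (2.0, 1 + 2 * (t - 0.5))

def expand_then_heat(t):
    return (1.0, 1 + 2 * t) if t < 0.5 else (1 + 2 * (t - 0.5), 2.0)

q1 = integrate(dq, heat_then_expand)
q2 = integrate(dq, expand_then_heat)
s1 = integrate(dq_over_t, heat_then_expand)
s2 = integrate(dq_over_t, expand_then_heat)
print(q1, q2)  # heat depends on the path
print(s1, s2)  # the integral of dQ/T does not
```

For this model gas the two heat integrals differ by log 2, while both integrals of dQ/T come out as (C + 1) log 2, which is what one expects if S = C log T + log V.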
Remember our earlier discussion about U being a "proper" function? The existence of U implies that, if we write it as a function of S and V, we have
dU = (∂U/∂S) dS + (∂U/∂V) dV.
From this, we obtain
T = ∂U/∂S, and
p = –∂U/∂V.
* * *
Here are two questions to which I failed to provide an answer up to this point. That is to say that I know these statements to be true, I just don't know how to derive them rigorously, using only the axioms of thermodynamics and nothing else (nothing from statistical physics, in particular):
- Why is U expressible as U(S, V)? (I.e., why is the state describable by these two coordinates? Is it simply that the system is two-dimensional by definition, and S and V are independent, or is there something else needed to formally prove this bit?)
- Why is S additive? Does it follow from the fundamental equation when we divide a system into two parts, and from the fact that the temperatures in the two parts are equal, and the energies and volumes are additive?
* * *
Since the order of partial differentiation doesn't matter, we also have
∂T/∂V = ∂[∂U/∂S]/∂V = ∂[∂U/∂V]/∂S = –∂p/∂S.
This and other similar relations, called Maxwell's relations, are most compactly expressed by the Jacobian determinant:
∂(T, S)/∂(p, V) = 1.
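The relations T = ∂U/∂S, p = –∂U/∂V and ∂T/∂V = –∂p/∂S can be checked numerically for a concrete U(S, V). The sketch below assumes a hypothetical ideal gas in units where pV = T and U = CT, for which U(S, V) = C exp((S – log V)/C); this model, the constant C, and the test state are my assumptions, chosen only for illustration.

```python
import math

C = 1.5  # assumed heat capacity of a hypothetical ideal gas (units with p V = T)

def U(S, V):
    """Internal energy U(S, V) = C * T, with T = exp((S - log V) / C)."""
    return C * math.exp((S - math.log(V)) / C)

def d_dS(f, S, V, h=1e-5):
    return (f(S + h, V) - f(S - h, V)) / (2 * h)   # central difference in S

def d_dV(f, S, V, h=1e-5):
    return (f(S, V + h) - f(S, V - h)) / (2 * h)   # central difference in V

def T(S, V):
    return d_dS(U, S, V)       # T = dU/dS

def p(S, V):
    return -d_dV(U, S, V)      # p = -dU/dV

S0, V0 = 0.7, 1.3                    # arbitrary test state
lhs = d_dV(T, S0, V0, h=1e-3)        # dT/dV at constant S
rhs = -d_dS(p, S0, V0, h=1e-3)       # -dp/dS at constant V
print(T(S0, V0), p(S0, V0) * V0)     # these agree if p V = T
print(lhs, rhs)                      # these agree if the Maxwell relation holds
```

The nested finite differences limit the agreement to a few significant digits, which is enough for a sanity check.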
The results of axiomatic thermodynamics are derived from the basic postulates, or axioms, that are the "Laws of Thermodynamics", without making any assumptions about the properties of the underlying matter. So here's a good question: is it possible to derive at least some of those postulates if we make assumptions about the nature of matter?
A good starting point is to take a collection of N identical particles of mass m. (A similar reasoning can be developed for particles of varying mass, it just gets more complicated.) We assume that the particles behave in accordance with the laws of classical mechanics. What this means is that the particles' motion is described by second-order differential equations that contain no explicit dependence on the first derivative, i.e., x'' = f(x, t). For bodies whose motion is governed by such equations, there are seven additive constants of motion: the energy E, the three components of the momentum vector P, and the angular momentum bivector M.
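Now we divide the system of N particles into two parts, with particle counts N1 and N2 (N1 + N2 = N). Let ρ1 and ρ2 denote the statistical distribution functions of the two parts, i.e., the probability densities of finding each part at a given point of its own phase space, and assume that the two parts are statistically independent.
The combined probability, then, that the system as a whole is in a particular state is the product of individual probabilities: ρ = ρ1ρ2, or log ρ = log ρ1 + log ρ2.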
Next, we invoke Liouville's theorem which states that, since ρ is not explicitly a function of time, it'll stay constant along the path of the system in phase space. In other words, it is a constant of motion. Moreover, log ρ is additive, so it must be a linear combination of the additive constants of motion:
log ρ = α + βE + γ · P + δ · M.
In other words, the statistical distribution of a system is completely determined by its macroscopic properties, namely its total energy E, momentum P, and angular momentum M.
Furthermore, by choosing a "comoving" coordinate system, we can eliminate P and M, so we are left with
log ρ = α + βE.
One observation at this point is that we need not concern ourselves with the specific nature of the statistical distribution function ρ. As a matter of fact, any function will do, so long as it produces the appropriate macroscopic properties. In particular, we may choose
ρ = Cδ(E – E0),
where C is some constant, and δ is the Dirac delta function.
What this expression is saying is that a probability distribution that gives a probability of 1 for the system to be in a state with energy E0 (and 0 for any other energy) reproduces the same macroscopic properties.
Entropy is defined as the negative average of log ρ:
S = –<log ρ> = –∫ ρ log ρ dp dq,
where the integration is meant to be across all possible values of all p and q components. (The total number of these components, i.e., the dimension of the phase space, will usually be 2DN where D is the dimensionality of the system, and N is the number of particles involved.)
Wherever ρ is between 0 and 1, the expression under the integral sign is negative, so it contributes a positive amount to the RHS. (Since ρ is a probability density rather than a probability, it can exceed 1 over a small region, and S itself can then turn negative.)
Furthermore, while ∫ ρ dp dq = 1 by definition since ρ is a probability distribution function, there are macroscopic states of the system where ρ is nearly flat, and there are cases where it produces a sharp peak. The expression for S will be highest in the former case. I have no idea how to prove this in the general case (indeed, how to measure "peakiness" in the general case) but if ρ(x) is a normal distribution with unit variance, λρ(λx) forms a sharper peak if λ is greater. In this case, the definite integral ∫ λρ(λx) log(λρ(λx)) dx will be log λ – (1 + log 2π)/2, so S = (1 + log 2π)/2 – log λ, which is indeed smaller if λ is bigger.
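Here is a quick numerical check of how S behaves for the rescaled normal distribution λρ(λx). The Python sketch below evaluates –∫ f log f dx by the midpoint rule and compares it with (1 + log 2π)/2 – log λ, which is the standard differential entropy of a normal distribution with standard deviation 1/λ. The function names, the integration range, and the list of λ values are arbitrary choices of mine.

```python
import math

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def entropy_of_scaled_normal(lam, half_width=12.0, steps=20_000):
    """Numerically evaluate S = -integral of f log f, with f(x) = lam * normal_pdf(lam * x)."""
    a = -half_width / lam                 # integrate over 12 standard deviations each way
    h = 2 * half_width / lam / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h             # midpoint rule
        f = lam * normal_pdf(lam * x)
        if f > 0.0:
            total -= f * math.log(f) * h
    return total

def closed_form(lam):
    return 0.5 * (1 + math.log(2 * math.pi)) - math.log(lam)

for lam in (0.5, 1.0, 2.0, 4.0):
    print(lam, entropy_of_scaled_normal(lam), closed_form(lam))
```

The printed values decrease as λ grows: the sharper the peak, the lower the entropy.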
So now take a system and divide it into two subsystems. For both subsystems, ∫ ρi dpi dqi = 1 (i = 1, 2). The combined entropy of this system is
S = –<log ρ> = –∫ ρ1ρ2 log ρ1ρ2 dp1 dq1 dp2 dq2.
Since ρ1 is a function of only p1 and q1, and ρ2 is a function of only p2 and q2, the integral can be expanded and we get
S = –∫ ρ1ρ2 log(ρ1ρ2) dp1 dq1 dp2 dq2 =
–∫ (ρ1ρ2 log ρ1 + ρ1ρ2 log ρ2) dp1 dq1 dp2 dq2 =
–∫ ρ2( ∫ ρ1 log ρ1 dp1 dq1) dp2 dq2 – ∫ ρ1( ∫ ρ2 log ρ2 dp2 dq2) dp1 dq1 =
∫ ρ2 S1 dp2 dq2 + ∫ ρ1 S2 dp1 dq1 =
S1∫ ρ2 dp2 dq2 + S2∫ ρ1 dp1 dq1 = S1 + S2.
Hence, S is additive.
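The same additivity is easy to see numerically in a discrete analogue, where the integrals become sums. In this minimal Python sketch, the two probability tables are made-up examples and the joint distribution is taken to be their product, i.e., the subsystems are assumed independent.

```python
import math

def entropy(probs):
    """S = -sum(p log p), skipping zero-probability states."""
    return -sum(p * math.log(p) for p in probs if p > 0)

rho1 = [0.5, 0.3, 0.2]          # hypothetical distribution of subsystem 1
rho2 = [0.7, 0.1, 0.1, 0.1]     # hypothetical distribution of subsystem 2

# Independent subsystems: the joint probability of a pair of states is the product.
joint = [a * b for a in rho1 for b in rho2]

s1, s2, s12 = entropy(rho1), entropy(rho2), entropy(joint)
print(s1 + s2, s12)             # equal up to rounding
```

If the joint distribution were not a product (correlated subsystems), the two numbers would no longer agree.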
* * *
The second law of thermodynamics states that in an equilibrium system, S is maximal. I am looking for a derivation of the second law from the idea that the system is in a maximum probability state. This means that ρ = ρ1ρ2 is maximal, which presupposes that the two subsystems and their respective probability distributions ρ1 and ρ2 are respectively in their "peakiest" states to allow ρ to be maximal. But, I do not yet know how to express this mathematically.
* * *
So for now, let's just accept that S is maximal. (Landau and Lifshitz do pretty much the same thing, so I am in respectable company.)
If S is maximal in the equilibrium state, then in a non-equilibrium system, as it tends towards equilibrium, dS/dt > 0. (This, really, is the second law.)
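So now let's say that a system is in equilibrium with maximal S. Divide the system into two parts. Remember, both S and E are additive constants of motion, so
S = S1(E1) + S2(E2), E = E1 + E2.
With the total energy E fixed, dE2 = –dE1, and the condition that S is maximal reads
dS/dE1 = dS1/dE1 – dS2/dE2 = 0.
The quantity dS/dE is therefore the same for all subsystems of an equilibrium system. By definition,
dS/dE = 1/T.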
If the entropy is a function of not just the energy but other variables, we'll need to use partial derivatives: ∂S/∂E = 1/T, or
∂E/∂S = T.
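To illustrate that maximal total entropy forces dS/dE to be the same in both parts, here is a small Python sketch. It assumes, purely for illustration, that each subsystem has an entropy of the form S_i(E_i) = C_i log E_i; the constants and the total energy are made up. It searches for the energy split that maximizes S1 + S2 and then compares the two temperatures.

```python
import math

C1, C2 = 1.5, 3.0      # hypothetical constants: S_i(E_i) = C_i * log(E_i)
E_TOTAL = 10.0         # fixed total energy (arbitrary units)

def total_entropy(e1):
    return C1 * math.log(e1) + C2 * math.log(E_TOTAL - e1)

# Ternary search for the maximum of the concave function total_entropy on (0, E_TOTAL).
lo, hi = 1e-9, E_TOTAL - 1e-9
for _ in range(200):
    m1 = lo + (hi - lo) / 3
    m2 = hi - (hi - lo) / 3
    if total_entropy(m1) < total_entropy(m2):
        lo = m1
    else:
        hi = m2
e1 = 0.5 * (lo + hi)
e2 = E_TOTAL - e1

# For this choice of S_i, dS_i/dE_i = C_i / E_i, so T_i = E_i / C_i.
t1, t2 = e1 / C1, e2 / C2
print(e1, e2, t1, t2)
```

The energy does not split evenly between the two parts, but the two temperatures agree at the maximum.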
There is another important relationship involving the energy and partial derivatives. By definition, pressure is the force perpendicular to a surface divided by area. Force is the negative change in energy over distance, F = –∂E/∂x, and displacing a surface of area A by dx changes the volume by dV = A dx, i.e.,
∂E/∂V = –p.
Combining these two equations, we obtain the fundamental equation of thermodynamics:
dE = T dS – p dV.
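Remember that earlier, we determined that
log ρ = α + βE.
As we discussed earlier, the macroscopic properties of the system do not change if we use
ρ = Cδ(E – E0)
instead, so we are free to work with whichever form is more convenient. First, denoting the volume element in phase space by dΓ = dp dq, normalization requires ∫ ρ dΓ = 1, which ties α to β: varying β at fixed volume, dα = –<E> dβ. Second, the entropy being defined as S = –<log ρ>, averaging log ρ = α + βE gives S = –α – β<E>, and therefore dS = –β d<E>, i.e., ∂S/∂E = –β. Comparing this with ∂S/∂E = 1/T, we get β = –1/T, and with A = exp(α),
ρ = A exp(–E/T).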
This is the Gibbs distribution. It has quite universal validity; in fact, I think it applies to all systems that are in thermodynamic equilibrium.
For a system in which the potential energy is a function only of the positions q, and the kinetic energy is a function only of the momenta p, this splits into two parts:
ρ = A exp(–U(q)/T – K(p)/T) .
This is the Maxwell distribution. A lengthy, but not particularly difficult derivation (see Landau & Lifshitz) can be used to determine that the average kinetic energy of particles will be DT/2, where D is the number of spatial dimensions.
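The DT/2 result can also be checked by brute force. The Python sketch below assumes the usual kinetic energy K(p) = |p|²/(2m), so that exp(–K(p)/T) makes each momentum component normally distributed with variance mT; it draws random momenta and averages K. The function name, the sample count, and the values of T and m are arbitrary choices for illustration.

```python
import random

def mean_kinetic_energy(temperature, mass=1.0, dims=3, samples=100_000, seed=0):
    """Monte Carlo estimate of <K> for momenta distributed as exp(-K(p)/T), K = |p|^2 / (2m)."""
    rng = random.Random(seed)
    sigma = (mass * temperature) ** 0.5   # each momentum component has variance m * T
    total = 0.0
    for _ in range(samples):
        p_squared = sum(rng.gauss(0.0, sigma) ** 2 for _ in range(dims))
        total += p_squared / (2 * mass)
    return total / samples

T = 2.0
for D in (1, 2, 3):
    # Compare the sampled average with D * T / 2.
    print(D, mean_kinetic_energy(T, dims=D), D * T / 2)
```

The sampled averages fluctuate around DT/2, with a statistical error that shrinks like one over the square root of the sample count.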
The Maxwell distribution can be used, among other things, to describe collisionless systems, be they stars in a galaxy, or molecules in a gas.
1. Paul Bamberg and Shlomo Sternberg: "A course in mathematics for students of physics 2", Cambridge University Press, 1990
2. L. D. Landau and E. M. Lifshitz: "Theoretical Physics V: Statistical Physics I", Nauka, 1976
3. M. S. Longair: "Theoretical concepts in physics", Cambridge University Press, 1987