$\renewcommand{\vec}[1]{{\bf{#1}}}$Quantum mechanics suggests that a physical property of a particle doesn't exist until it is measured. For instance, an electron has a property called spin that, unlike the angular momentum of a classical object, can only have two values, "up" and "down"; but the orientation of this "up" and "down" is determined by the measuring apparatus.

The spin of an electron may not exist until it is measured but conservation laws still apply. There are experiments that produce pairs of electrons whose combined spin must be zero. Once the spin of the electron on one end is measured, the spin of the other electron will become known as long as the measuring devices are aligned in parallel on the two ends. This applies even if the measuring devices are put in place after the two electrons are generated! And if the two measuring devices aren't aligned, a correlation will still exist between the two electrons that is difficult to explain in classical terms.

Or is it? No less a scientist than Einstein proposed that it isn't: that the problem is really that our knowledge of the system is incomplete and that there are hidden variables that determine the system's behavior fully.

Okay, so imagine some kind of an experiment that creates a pair of correlated electrons that fly off towards the $A$ and $B$ ends of the experimental setup. Let's assume that the outcome of the experiment at the $A$ end depends on two factors: the orientation of the experimental device at $A$ which we represent with $\vec{a}$, and some hidden parameter(s) represented by the Greek letter $\lambda$. In other words, $A(\vec{a},\lambda)=\pm 1$. Similarly at the $B$ end, the outcome of the experiment is a function of $\vec{b}$ (the orientation of the apparatus) and $\lambda$: $B(\vec{b},\lambda)=\pm 1$.

What is really important to notice is that we specifically assume that $A$ does not depend on $\vec{b}$, and $B$ does not depend on $\vec{a}$. In other words, the experiment on one end only depends on the setup of the experimental device on that end and the properties of the electron, but does not depend on the configuration of the experimental device at the other end.

$\lambda$ can have many values; however, what we do know is that it has a probability distribution function $\rho(\lambda)$ (representing, for each $\lambda$, the relative likelihood of that value occurring) and that it is normalized:

$\int\limits_{-\infty}^{+\infty}\rho(\lambda)d\lambda=1.$

The expectation value of a quantum mechanical experiment is, essentially, the average of the values measured over many repetitions of the experiment. From quantum mechanics, we can compute the expectation value of the product of $A$ and $B$: in certain experiments (for instance, when the electron spin is measured with so-called Stern-Gerlach magnets, where $\vec{a}$ and $\vec{b}$ are the magnets' orientations) this expectation value will be the dot product of the two vectors $\vec{a}$ and $\vec{b}$ times $-1$. But we can also compute the expectation value, $P(\vec{a},\vec{b})$, using what we know from probability calculus:

$P(\vec{a},\vec{b})=\int\limits_{-\infty}^{+\infty}\rho(\lambda)\cdot A(\vec{a},\lambda)\cdot B(\vec{b},\lambda) d\lambda.$

What Bell did was to prove that no matter how you choose $A$ and $B$, in general $P(\vec{a},\vec{b})$ cannot be the same as $-\vec{a}\cdot\vec{b}$. This implies that our assumption, namely that the outcome at $A$ depends only on $\vec{a}$ and $\lambda$, and the outcome at $B$ depends only on $\vec{b}$ and $\lambda$, must be false; the outcome at one end depends on the experimental setup at the other and vice versa.
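To get a feel for this, here is a minimal sketch of one particular hidden variable model. The model is purely an illustrative assumption and is not taken from Bell's work: $\lambda$ is taken to be an angle distributed uniformly around a circle, the detector orientations are angles in the same plane, and the function names `outcome_a`, `outcome_b` and `correlation` are made up for this example. The rule at each end uses only the local detector angle and $\lambda$, as the assumption above requires.

```python
import math
import random

def outcome_a(angle_a, lam):
    """Hypothetical local rule at A: +1 if lam lies within 90 degrees of the detector axis."""
    return 1 if math.cos(angle_a - lam) >= 0 else -1

def outcome_b(angle_b, lam):
    """Opposite rule at B, so that parallel detectors always give opposite results."""
    return -1 if math.cos(angle_b - lam) >= 0 else 1

def correlation(angle_a, angle_b, trials=100_000, seed=1):
    """Monte Carlo estimate of P(a, b): the average of A*B over rho(lambda)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        lam = rng.uniform(0.0, 2.0 * math.pi)  # assumed rho: uniform on the circle
        total += outcome_a(angle_a, lam) * outcome_b(angle_b, lam)
    return total / trials

if __name__ == "__main__":
    # Columns: angle in degrees, hidden variable model, quantum prediction -cos(phi)
    for degrees in (0, 45, 90, 135, 180):
        phi = math.radians(degrees)
        print(degrees, round(correlation(0.0, phi), 3), round(-math.cos(phi), 3))
```

For this particular model the correlation varies linearly with the angle between the detectors (it works out to $-1+2\phi/\pi$ for $0\le\phi\le\pi$), so it agrees with $-\cos\phi$ at $0^\circ$, $90^\circ$ and $180^\circ$ but not in between. One model failing proves nothing in general, of course; Bell's result is that every model of this kind must fail somewhere.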

To prove this, let's introduce another symbol: $P^{xy}(\vec{a},\vec{b})$ represents the probability that the outcome of the experiment will be $x$ at $A$ and $y$ at $B$. So for instance, $P^{++}(\vec{a},\vec{b})$ will represent the probability that we measure a $+1$ spin at both $A$ and $B$. We can then construct the following:

$E=P^{++}+P^{--}-P^{+-}-P^{-+}.$

We can prove that $|E(\vec{a},\vec{b})+E(\vec{a}',\vec{b})+E(\vec{a},\vec{b}')-E(\vec{a}',\vec{b}')|\le 2$. But this inequality is not true in the quantum mechanical case. In the case of the Stern-Gerlach magnets, the probabilities can be computed as follows: $P^{++}=P^{--}=\frac{1}{2}\sin^2\frac{\phi}{2}$, and $P^{+-}=P^{-+}=\frac{1}{2}\cos^2\frac{\phi}{2}$ (where $\phi$ is the angle between the orientations of the two magnets), so $E=-\cos\phi$. For instance, if $\vec{a}$, $\vec{b}$, $\vec{a}'$ and $\vec{b}'$ are vectors pointing respectively at $0^\circ$, $45^\circ$, $90^\circ$ and $-45^\circ$, then $E(\vec{a},\vec{b})=E(\vec{a}',\vec{b})=E(\vec{a},\vec{b}')=-\cos 45^\circ$, and $E(\vec{a}',\vec{b}')=-\cos 135^\circ$, so $|E(\vec{a},\vec{b})+E(\vec{a}',\vec{b})+E(\vec{a},\vec{b}')-E(\vec{a}',\vec{b}')|=|-3\cos 45^\circ+\cos 135^\circ|=2\sqrt{2}$, which is decidedly greater than 2.
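The arithmetic in this example is easy to check numerically. The sketch below builds $E$ from the four Stern-Gerlach probabilities quoted above and evaluates the combination for detector angles of $0^\circ$, $45^\circ$, $90^\circ$ and $-45^\circ$. The function names are made up for this illustration, and the detector orientations are assumed to lie in a single plane so that each can be described by one angle.

```python
import math

def joint_probabilities(phi):
    """Quantum probabilities for the four outcome pairs at relative angle phi."""
    same = 0.5 * math.sin(phi / 2.0) ** 2   # P++ and P--
    diff = 0.5 * math.cos(phi / 2.0) ** 2   # P+- and P-+
    return {"++": same, "--": same, "+-": diff, "-+": diff}

def E(angle_a, angle_b):
    """E = P++ + P-- - P+- - P-+, which reduces to -cos(phi)."""
    p = joint_probabilities(angle_a - angle_b)
    return p["++"] + p["--"] - p["+-"] - p["-+"]

def chsh(a, b, a2, b2):
    """|E(a,b) + E(a',b) + E(a,b') - E(a',b')| with a2, b2 standing for a', b'."""
    return abs(E(a, b) + E(a2, b) + E(a, b2) - E(a2, b2))

# Detector angles from the text: a = 0, b = 45, a' = 90, b' = -45 degrees
a, b, a2, b2 = (math.radians(d) for d in (0, 45, 90, -45))
value = chsh(a, b, a2, b2)

if __name__ == "__main__":
    print(value, 2 * math.sqrt(2))
```

Both printed numbers should agree to within rounding, confirming that the quantum mechanical value exceeds the bound of 2.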

To prove that $|E(\vec{a},\vec{b})+E(\vec{a}',\vec{b})+E(\vec{a},\vec{b}')-E(\vec{a}',\vec{b}')|\le 2$, we first spell out $E$:

$E=\int d\lambda\rho(\lambda)\left\{P_A^+(\vec{a},\lambda)-P_A^-(\vec{a},\lambda)\right\}\left\{P_B^+(\vec{b},\lambda)-P_B^-(\vec{b},\lambda)\right\},$

where $P_A$ and $P_B$ represent the experimental probabilities at the two ends of the apparatus. Since these are probabilities, it follows that $0\le P_A\le 1$ and $0\le P_B\le 1$; therefore, $|P^+-P^-|\le 1$. As a shorthand, we can write $\bar{A}$ and $\bar{B}$ for the two subexpressions in the curly braces:

$E=\int d\lambda\rho(\lambda)\bar{A}(\vec{a},\lambda)\bar{B}(\vec{b},\lambda).$

As before, $|\bar{A}|\le 1$ and $|\bar{B}|\le 1$.

We can then write:

$E(\vec{a},\vec{b})+E(\vec{a},\vec{b}')=\int d\lambda\rho(\lambda)\bar{A}(\vec{a},\lambda)\left[\bar{B}(\vec{b},\lambda)+\bar{B}(\vec{b}',\lambda)\right],$

from which:

$|E(\vec{a},\vec{b})+E(\vec{a},\vec{b}')|\le\int d\lambda\rho(\lambda)|\bar{B}(\vec{b},\lambda)+\bar{B}(\vec{b}',\lambda)|.$

Likewise:

$|E(\vec{a}',\vec{b})-E(\vec{a}',\vec{b}')|\le\int d\lambda\rho(\lambda)|\bar{B}(\vec{b},\lambda)-\bar{B}(\vec{b}',\lambda)|.$

But since $|x+y|+|x-y|=2\max(|x|,|y|)$ for any real $x$ and $y$, from $|\bar{B}|\le 1$ it follows that:

$|\bar{B}(\vec{b},\lambda)+\bar{B}(\vec{b}',\lambda)|+|\bar{B}(\vec{b},\lambda)-\bar{B}(\vec{b}',\lambda)|\le 2.$

And since $\int\rho(\lambda)d\lambda=1$, we can assert that $|E(\vec{a},\vec{b})+E(\vec{a},\vec{b}')|+|E(\vec{a}',\vec{b})-E(\vec{a}',\vec{b}')|\le 2$, from which, by the triangle inequality, $|E(\vec{a},\vec{b})+E(\vec{a}',\vec{b})+E(\vec{a},\vec{b}')-E(\vec{a}',\vec{b}')|\le 2$, which is just what we set out to prove.
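The bound can also be checked by brute force. For any fixed $\lambda$, the combination involves just four numbers, $\bar{A}(\vec{a},\lambda)$, $\bar{A}(\vec{a}',\lambda)$, $\bar{B}(\vec{b},\lambda)$ and $\bar{B}(\vec{b}',\lambda)$, each between $-1$ and $+1$, and averaging over $\rho(\lambda)$ cannot produce a magnitude larger than the largest one found for any single $\lambda$. The sketch below (with made-up function names, offered as a numerical illustration rather than a substitute for the proof) tries all 16 deterministic assignments of $\pm 1$ and then a batch of random values in between.

```python
import random
from itertools import product

def chsh_combination(A_a, A_a2, B_b, B_b2):
    """A(a)B(b) + A(a')B(b) + A(a)B(b') - A(a')B(b') for one fixed lambda."""
    return A_a * B_b + A_a2 * B_b + A_a * B_b2 - A_a2 * B_b2

def max_over_deterministic():
    """Largest magnitude over all 16 assignments of +1 or -1 to the four outcomes."""
    return max(abs(chsh_combination(*v)) for v in product((-1, 1), repeat=4))

def max_over_random(samples=100_000, seed=0):
    """Largest magnitude found among random values drawn from [-1, 1]."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(samples):
        v = [rng.uniform(-1.0, 1.0) for _ in range(4)]
        best = max(best, abs(chsh_combination(*v)))
    return best

if __name__ == "__main__":
    print(max_over_deterministic())
    print(max_over_random())
```

Neither search should ever exceed 2, whereas the quantum mechanical prediction reaches $2\sqrt{2}$.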

. . .

What this all means is that the outcome of the experiment at $A$ can only be explained as a function of $\vec{a}$ (the setup of the measuring apparatus at $A$), $\lambda$ (whatever parameters are "internal" to the electron) and $\vec{b}$! I.e., some information about the setup of the measuring apparatus at $B$ will be "known" to the electron at $A$ even if there is no conventional means of transmitting this information from $B$ to $A$; in fact, even if $B$ is an apparatus operated by little green men at Alpha Centauri, who will only perform the measurement some four and a half years from now, and haven't even built their measuring apparatus yet!

References

Bell, J. S., Speakable and unspeakable in quantum mechanics, Cambridge University Press, 1989