A handful of the comments are skeptical of the utility of this method. I can tell you as a physical scientist, it is common to make the same measurement with a number of measuring devices of differing precision. (e.g. developing a consensus standard using a round-robin.) The technique Cook suggests can be a reasonable way to combine the results to produce the optimal measured value.
I'm not a physical scientist, but I spend a lot of time assessing the performance of numerical algorithms, which is maybe not totally dissimilar to measuring a physical process with a device. I've gotten good results applying Simple and Stupid statistical methods. I haven't tried the method described in this article, but I'm definitely on the lookout for an application of it now.
I wonder if this minimum variance approach of averaging the measurements agrees with the estimate of the expected value we'd get from a Bayesian approach, at least in a simple scenario, say a uniform prior over the thing we're measuring and assume that our two measuring devices have unbiased errors described by normal distributions.
At least in the mathematically simpler scenario of a Gaussian prior and Gaussian observations, the posterior mean is computed by weighting by the inverses of the variances (aka precisions), just like this.
https://en.wikipedia.org/wiki/Conjugate_prior
To add, for anyone who's followed the link: that's entries 1 and 2, "Normal with known variance σ²" and "Normal with known precision τ", under "When likelihood function is a continuous distribution".
Also, note that the "precision" τ is defined as 1/σ².
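A tiny numeric sketch of that conjugate update (Python, made-up numbers): with a normal prior and a normal observation of known variance, the posterior mean is exactly the precision-weighted average.

```python
# Normal prior N(mu0, var0) on the quantity; one observation x with known noise variance var_x.
mu0, var0 = 0.0, 4.0      # prior mean and variance (made-up)
x, var_x = 2.0, 1.0       # observation and its known variance (made-up)

tau0, tau_x = 1.0 / var0, 1.0 / var_x               # precisions
post_mean = (tau0 * mu0 + tau_x * x) / (tau0 + tau_x)
post_var = 1.0 / (tau0 + tau_x)

print(post_mean, post_var)   # 1.6 0.8 -- pulled toward the more precise source, as expected
```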
This seems to be incorrect. The correct way to combine measurements with various degrees of precision is to use the inverse-variance weighting law.
Unless I'm missing something, that's exactly what is proposed:
t_i Var[X_i] = t_j Var[X_j]
Like a Kalman filter?
This is equivalent to inverse-variance weighting. For independent random variables, this is the optimal method to combine multiple measurements. He just used a different way to write the formula and connected it to other kinds of functions.
He also frames it with a different goal: normally when we (as physicists) talk about random variables to combine, we think of them as different measurements of the same thing. But he didn't even assume that: he's saying that if you want a weighted sum of random variables, not necessarily measurements of the same thing (e.g. sharing the same mean), this is still the optimal solution if all you care about is minimal variance. His example is stocks, where if all you care about is your "index" being less volatile, inverse-variance weighting is also optimal.
As I'm not a finance person, this is new to me (the math is exactly the same, just different conceptually in what you think the X_i are).
I wish he had mentioned inverse-variance weighting just to draw the connection, though. Many comments here would be unnecessary if he had.
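For anyone who wants to see it numerically, here's a quick Monte Carlo sanity check (Python/NumPy, illustrative standard deviations only) that inverse-variance weights beat equal weights when the X_i are independent:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmas = np.array([1.0, 2.0, 5.0])                  # std devs of three independent X_i
X = rng.normal(0.0, sigmas, size=(1_000_000, 3))    # one column per variable

w_inv = (1 / sigmas**2) / np.sum(1 / sigmas**2)     # inverse-variance weights, sum to 1
w_eq = np.full(3, 1 / 3)                            # equal weights for comparison

print("inverse-variance weights:", (X @ w_inv).var())   # ~0.78
print("equal weights:           ", (X @ w_eq).var())    # ~3.33
print("theoretical minimum:     ", 1 / np.sum(1 / sigmas**2))
```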
What a weird way to write the harmonic average.
----
Write v_i = Var[X_i]. John writes
  t_i = (\prod_{j \ne i} v_j) / (\sum_{k=1}^n \prod_{j \ne k} v_j).
But if you multiply top and bottom by (1 / \prod_{m=1}^n v_m), you just get
  t_i = (1 / v_i) / (\sum_{k=1}^n 1 / v_k).
No need to compute elementary symmetric polynomials. If you plug those optimal t_i back into the variance, you get
  Var[\sum_i t_i X_i] = 1 / (\sum_{k=1}^n 1 / v_k) = H / n,
where `H = n / (\sum_{k=1}^n 1/v_k)` is the harmonic mean of the variances.
It would be much more readable if AsciiMath[0] were used, and it would still give you the benefit of rendering with MathJax if required.
[0] https://asciimath.org/
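A quick numerical check of the simplification above (Python, arbitrary variances): the simplified weights reproduce the minimum variance 1/\sum_k(1/v_k) = H/n, and any other weights summing to one do worse.

```python
import numpy as np

v = np.array([1.0, 4.0, 25.0])                 # arbitrary variances v_i
t = (1 / v) / np.sum(1 / v)                    # t_i = (1/v_i) / sum_k (1/v_k)

min_var = np.sum(t**2 * v)                     # Var[sum_i t_i X_i] for independent X_i
H = len(v) / np.sum(1 / v)                     # harmonic mean of the variances

print(min_var, 1 / np.sum(1 / v), H / len(v))  # all ~0.7752

# Any other weights that sum to 1 give a larger variance:
rng = np.random.default_rng(1)
u = rng.dirichlet(np.ones(len(v)))
print(np.sum(u**2 * v) >= min_var)             # True
```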
There are also ASCII-art ways of writing formulas. An LLM should be able to produce these.
Yeah and this is a much more intuitive way of generalising from the n = 2 case. Weights are proportional to inverse variance even for n > 2. Importantly this assumes independence so it doesn’t translate to portfolio optimisation very easily.
Right, this is known as inverse-variance weighting: https://en.wikipedia.org/wiki/Inverse-variance_weighting
Please will the mods implement maths rendering?? If the source were made available we could do it ourselves.
Once you implement that, we're stuck with it forever. One could just write sum(dy/dx) and be understood in context by one who is knowledgeable enough.
Being 'stuck' with maths rendering is like being 'stuck' with good health. Bring it on?
What else? Grammar checking? XML? Just approximate with ASCII, please.
Your slippery slope makes no sense to me. What do we need XML for here? Is anybody asking for it? You can use your own grammar checker but you can't render your own equations and submit them.
It's a pretty raw website. You're better served with an extension. A friend of mine made a Chrome extension that we use for block/favorite lists, for example.
Even if you personally had a mathjax extension, you would still be prevented from explaining math to others, unless you could convince everyone to install it.
But you successfully did!
I hope this site does not.
ADDED. Because the new functionality will be used to create cutesy effects for reasons that have nothing to do with communicating math, increasing the demand for moderation work.
Why? LaTeX is not how maths is supposed to be read, else we'd all be doing that. It's how it might be written.
edit: Nobody is going to use maths for cutesy effects. Where have you ever seen that happen? Downvote them if they do. It is not going to be a big deal.
It’s much clearer when you write these problems in terms of matrix math. The minimum variance portfolio is very important in finance.
How would you write this with matrices? It seems like there are many ways you could generalize.
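For what it's worth, here is a minimal sketch of the standard matrix form (Python/NumPy, made-up covariance matrix): minimizing w'Σw subject to the weights summing to 1 gives w ∝ Σ⁻¹1, which reduces to inverse-variance weighting when Σ is diagonal, i.e. the independent case in the post.

```python
import numpy as np

# Made-up covariance matrix for three assets (not from the article).
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.25]])

ones = np.ones(Sigma.shape[0])
w = np.linalg.solve(Sigma, ones)   # Sigma^{-1} 1
w /= w.sum()                       # normalize so the weights sum to 1

print("min-variance weights:", w)
print("portfolio variance:  ", w @ Sigma @ w)

# With a diagonal Sigma this is exactly inverse-variance weighting.
d = np.diag(Sigma)
wd = np.linalg.solve(np.diag(d), ones)
wd /= wd.sum()
print(np.allclose(wd, (1 / d) / np.sum(1 / d)))   # True
```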
I realize that this is meant as an exercise to demonstrate a property of variance. But most investors are risk-averse when it comes to their portfolio - for the example given, a more practical target to optimize would be the worst-case or near-worst-case return (e.g. p99). Calculating that requires more than a summary measure like variance or mean - you need the full distribution of the RoR of assets A and B, and then you find the value of t that optimizes the p99 of tA + (1-t)B.
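If it helps, here's a rough sketch of doing exactly that by grid search over t (Python/NumPy, with made-up return samples standing in for the RoR distributions); "p99" is read here as the 1st percentile of return, i.e. the 99th-percentile loss:

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder return samples -- in practice these would be empirical or simulated RoR draws.
A = rng.standard_t(df=4, size=100_000) * 0.02 + 0.005   # fat-tailed, higher-return asset
B = rng.normal(0.004, 0.01, size=100_000)               # calmer asset

ts = np.linspace(0.0, 1.0, 101)
# Near-worst-case return of the mix tA + (1-t)B: its 1st percentile, for each t.
p01 = [np.percentile(t * A + (1 - t) * B, 1) for t in ts]

best_t = ts[int(np.argmax(p01))]
print("t that maximizes the 1st-percentile return:", best_t)
```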
It's hard enough to get a reliable variance-covariance estimate.
If A and B have different volatilities, it's rather counter-intuitive to allocate proportionally rather than just all to the one with the lower volatility... :-/
I agree, and I had to think about it for a second, but now it seems obvious. It works for the exact same reason that averaging multiple independent measurements can give a more accurate result. The key fact is that the different random variables are all independent, so it's unlikely that the various deviations from the means will line up in the same direction.
Yes, I think that's part of the point of the post. One intuition is that allocating only a little bit to a highly volatile asset creates a not-very volatile asset. Investing a little bit is the same as scaling the asset down until it's not very volatile.
The independence assumption means there's value in allocating to the more volatile one, due to diversification.
I wish there was a Strunk and White for mathematics.
While by no means logically incorrect, it feels inelegant to set up a problem using variables A and B in the first paragraph and solve for X and Y in the second (compounded by the implicit X == B and Y == A).
There are lots of good books on writing mathematics:
1. How to Write Mathematics — Paul Halmos
2. Mathematical Writing — Donald Knuth, Tracy Larrabee, and Paul Roberts
3. Handbook of Writing for the Mathematical Sciences — Nicholas J. Higham
4. Writing Mathematics Well — Steven Gill Williamson
Thanks. Higham explicitly addresses the author's substitution crime in section 2.5. Wonderful resource.
My complaint stems more from the general observation that readability is prized in math and programming but not emphasized in traditional education curricula to the degree it is in writing.
Bad style is seldom commented on in our profession.
What's the goal of this article?
There exists a problem in real life that you can solve in the simple case, and invoke a theorem in the general case.
Sure, it's unintuitive that I shouldn't go all in on the smallest variance choice. That's a great start. But, learning the formula and a proof doesn't update that bad intuition. How can I get a generalizable feel for these types of problems? Is there a more satisfying "why" than "because the math works out"? Does anyone else find it much easier to criticize others than themselves and wants to proofread my next blog post?
Here's my intuition: you can reduce the variance of a measurement by averaging multiple independent measurements. That's because when they're independent, the worst-case scenario of the errors all lining up is pretty unlikely. This is a slightly different situation, because the random variables aren't necessarily measurements of a single quantity, but otherwise it's pretty similar, and the intuition about multiple independent errors being unlikely to all line up still applies.
Once you have that intuition, the math just tells you what the optimal mix is, if you want to minimize the variance.
This all hinges on the fact that variance scales like X^2, not X. If we look at the standard deviation instead, we have the expected homogeneity: stddev(tX) = abs(t) stddev(X). However, it is *not additive*; rather stddev(sum t_i X_i) = sqrt(sum t_i^2 stddev(X_i)^2), assuming independent variables.
Quantitatively speaking, t^2 and (1-t)^2 are both < 1 whenever 0 < t < 1. As such, the standard deviation of a convex combination of variables is *always strictly smaller* than the convex combination of the standard deviations of the variables. In other words, stddev(sum_i t_i X_i) < sum_i t_i stddev(X_i) whenever all the t_i are strictly between 0 and 1.
What this means in practice is that the standard deviation of a convex combination (that is, with positive coefficients < 1 summing to 1) of any number of independent random variables is always strictly smaller than the same convex combination of their standard deviations; with the optimal weights it is in fact smaller than the standard deviation of every individual variable.
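A two-line numeric check of that inequality (Python, made-up standard deviations and weights), using the independence formula stddev = sqrt(sum t_i^2 stddev_i^2):

```python
import numpy as np

s = np.array([1.0, 2.0, 5.0])        # std devs of independent X_i (made up)
t = np.array([0.2, 0.3, 0.5])        # convex weights, all strictly between 0 and 1

lhs = np.sqrt(np.sum(t**2 * s**2))   # stddev of sum_i t_i X_i under independence
rhs = np.sum(t * s)                  # convex combination of the stddevs
print(lhs, rhs, lhs < rhs)           # ~2.58  3.3  True
```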
> Sure, it's unintuitive that I shouldn't go all in on the smallest variance choice.
Is it?
You have ten estimates of some distance, with similar accuracy on the order of 10 m: you take the average (and reduce the error by more than half).
If you increase the precision of one measurement by 1%, will you disregard all the others?
The primary problem with this method is that while it correctly assumes one cannot forecast future returns, it incorrectly assumes one can correctly forecast the future volatility of those returns. To be sure, vol/variance of returns is more predictable than returns, but it's not perfectly predictable.
This is just the observed variance, which means you are assuming it will also be the variance in the future.
Don’t make decisions for evolving systems based on statistics.
Insider info on the other hand works much better.
This is why Markowitz isn't used much in the industry, at least not in a plug-and-play fashion. Empirical volatility, and more generally the variance-covariance matrix, is a useful descriptive statistic, but the matrix has high sampling variance, which means Markowitz is garbage in, garbage out. Unlike in other fields, you can't just make/collect more data to reduce the sampling variance of the inputs. So you want to regularize the inputs or have some kind of hybrid approach with a discretionary overlay.
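As one concrete example of regularizing the inputs (just a sketch, and not necessarily what the parent has in mind), scikit-learn's Ledoit-Wolf estimator shrinks a noisy sample covariance toward a structured target; the return data below is simulated:

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(0)
# Simulated daily returns: 50 assets, 250 days -- few observations per covariance entry,
# which is exactly the regime where the sample covariance is dominated by noise.
returns = rng.normal(0.0, 0.01, size=(250, 50))

sample_cov = np.cov(returns, rowvar=False)   # raw, high-sampling-variance estimate
lw = LedoitWolf().fit(returns)               # shrinks toward a scaled identity

print("shrinkage intensity:", lw.shrinkage_)
shrunk_cov = lw.covariance_                  # use this instead of sample_cov downstream
```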
I have some familiarity with the Markowitz model, but certainly not as much as you do about the practical use — could you share notes/articles/talks on the practical use? I’m super interested to learn more.
Read "Advanced portfolio management" by Paleologo (ironically it's actually the introductory one of his two books), or "Active portfolio management" for a more thorough, older, longer book on the topic.
Markowitz isn't really used at all, but Markowitz-like reasoning is used extremely heavily in finance, by which I basically mean factor modelling of various kinds: effectively the result of taking mean-variance as a concept and applying some fairly aggressive dimensionality reduction to cope with the problems of financial data, and with the fact that one has proprietary views about things ("alpha" and so on).
Thanks, just bought the book! Some nice reading for the holidays!
Black-Litterman model is an example of how to address the shortcoming of unreliable empirical inputs.
You'll also see more ad hoc approaches, such as simulating hypothetical scenarios to determine worst-case outcomes.
It's not math heavy. Math heavy is a smell. Expect to see fairly simple Monte Carlo simulations, but with significant thought put into the assumptions.
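In that spirit, a toy Monte Carlo sketch (Python/NumPy, every parameter invented) that stresses a two-asset portfolio under a hypothetical scenario where volatilities and correlation both jump, and reports a near-worst-case daily return:

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([0.6, 0.4])   # hypothetical portfolio weights

def simulate(corr, vols, n=100_000):
    # Daily portfolio returns under jointly normal asset returns with the given vols/correlation.
    cov = np.outer(vols, vols) * np.array([[1.0, corr], [corr, 1.0]])
    r = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    return r @ w

normal = simulate(corr=0.2, vols=np.array([0.01, 0.02]))
stress = simulate(corr=0.9, vols=np.array([0.03, 0.06]))   # vols and correlation jump

for name, pnl in (("normal", normal), ("stress", stress)):
    print(name, "1% worst daily return:", np.percentile(pnl, 1))
```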
Oooooh this is cool, thanks!
> This is why Markowitz isn't used much in the industry
This may be one reason, but the return part is much more problematic than the risk part.
Very true, although the off-diagonal terms in the variance-covariance matrix are also hard to estimate, which is a problem especially when simulating worst-case scenarios, which is exactly when correlations tend to break down.
That's the first thing I thought of. I read the opening of this article and thought "oh, this could be applied to a load-balancing problem", but it immediately becomes obvious that you can't assume the variance is going to be uniform over time.
Upvoting b/c this comment is true, obviously I disapprove of insider trading.
Doesn't it make more sense to measure and minimize the variance of the underlying cash flows of the companies one is investing in, rather than the prices?
Price variance is a noisy statistic not based on any underlying data about a company, especially if we believe that stock prices are truly random.
Volatility is fairly predictable, or at least much more predictable than returns.
This is also a nice way to combine the ratings of a number of noisy annotators with variable annotation noise.
In computer graphics we call this multiple importance sampling, and it's critical for making robust estimators.