In plain English, the word “random” describes anything that happens without method or conscious decision. In the world of social science, however, random is anything but. At least, not in the way you might think. More often than not, social scientists (economists, political scientists, and so on) use “random” together with another word, “variable”, as in “random variable”. What, then, are variables and random variables? In a series of posts, we not only explain these terms but also share three things we think you should know about random variables in your journey to becoming a data-driven organization. In this post, we demonstrate that random variables are not as random as you may think. Read on!

A Variable Is An Aspect Of A Concept That Can Be Measured

At the beginning of any research project, scientists spend some time clarifying their conception of what they want to study. (A concept is simply an idea held in the mind.) For example, suppose a political scientist wants to understand dictatorial regimes around the world. In this case, she has to specify exactly what about dictatorial regimes she wants to understand. She may say that she wants to understand their durability or staying power. Thus, her concept of interest is durability.

Now that we’ve clarified the concept she wants to study, the next step is to operationalize it. To operationalize a concept is to convert it from an abstract idea into something measurable. For example, to operationalize durability, the scientist could measure how long (in years) a regime is able to hold on to power, how long the ramifications of a regime’s policies last, or even how many public institutions the regime is able to co-opt within its lifetime. Of these three, let’s assume she uses the first.

The number of years a regime is able to hold on to power is an example of what social scientists call a variable. A variable is any aspect of a concept that can be measured. It is so called because the measurement can vary from one instance to the next. For example, the 4-year term (1976-1979) of the Pol Pot regime in Cambodia and the 11-year term (1981-1992) of the Rawlings regime in Ghana are both instances of the concept of durability (of dictatorial regimes). However, the value of the variable (“number of years in power”), which measures one aspect (or dimension) of durability, is not the same in each. Whether a variable adequately reflects a concept is a discussion for another day.

A Random Variable Is A Variable Whose Values Occur According To A Frequency Distribution

A frequency distribution describes a random variable

Now that we know what a variable is, what makes a variable random? Suppose it is late 2021, a dictatorship has just sprung up in the West African country of Guinea (Conakry), and you are asked how long you think the regime will last. You may respond with “I don’t know. I don’t know the factors that influence the lifespan of dictatorial regimes, so I cannot tell you.” And your answer would be reasonable. That said, a seasoned scientist may make an educated guess by looking up all the dictatorships that have ruled that country in the past and noting how long each of them held on to power. If the country doesn’t have enough past dictatorial regimes, the scientist may look up past dictatorships in the surrounding region or even around the world. He may tabulate the data like so:

Number of Regimes | Years in Office
5 | 4
2 | 11
1 | 30+

The table above is an example of what statisticians call a frequency distribution. Dividing each count by the total number of regimes turns it into an (empirical) probability distribution. With this distribution, the scientist can make a more informed guess of how long a dictatorial regime will last. For example, he could guess the mode (the most frequently occurring number of years in power), the mean (the average of all the numbers of years in power), or even the median (the middle number of years in power once the values are arranged from least to most). Indeed, scientists often speak of the underlying probability distribution as the data generating process (DGP), because to know it is to get a sense of how the concept materializes.
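To make the three guesses concrete, here is a minimal sketch in Python that computes them from the small table above. It is only an illustration: the variable names are our own, and because the “30+” row is open-ended, we assume for the sake of the example that it means exactly 30 years.

```python
from statistics import mean, median, mode

# Frequency table from the post: years in office -> number of regimes.
# Assumption: the open-ended "30+" row is treated as exactly 30 years.
frequency = {4: 5, 11: 2, 30: 1}

# Expand the table into one observation per regime,
# e.g. five regimes that each lasted 4 years become [4, 4, 4, 4, 4].
observations = [years for years, count in frequency.items() for _ in range(count)]

guess_mode = mode(observations)      # most frequently occurring value
guess_median = median(observations)  # middle value of the sorted observations
guess_mean = mean(observations)      # arithmetic average

print("mode:", guess_mode)
print("median:", guess_median)
print("mean:", guess_mean)
```

Under that assumption, the mode and the median both come out at 4 years while the mean is 9 years. The single long-lived regime pulls the mean upward, which is one reason a scientist might prefer the median or the mode for a skewed variable like this one.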

Frequency distributions are helpful for predicting random variables

What makes the number of years a dictatorship stays in power a random variable is that its values occur according to a frequency distribution, and it is this distribution that lets us make a good guess. Indeed, with the help of a frequency distribution, we can make pronouncements like “95% of the time, a dictatorship will hold on to power for between 5 and 10 years” or “25% of the time, a dictatorship will stay in power for at least 3 years” (the figures here are purely illustrative). Such pronouncements are not at all random, or at least far less so, when compared to responses such as “I don’t know”. We can speak about the variable with considerably more confidence in the former case than in the latter.
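As a rough sketch of how such pronouncements can be read off a frequency distribution, the Python snippet below uses a small table of 5 regimes lasting 4 years, 2 lasting 11 years, and 1 lasting 30 or more. The function names are hypothetical, the open-ended bucket is assumed to be exactly 30 years, and with only eight regimes the resulting shares are illustrative rather than reliable estimates.

```python
# Frequency table: years in office -> number of regimes.
# Assumption: the open-ended "30+" bucket is treated as exactly 30 years.
frequency = {4: 5, 11: 2, 30: 1}
total_regimes = sum(frequency.values())


def share_lasting_at_least(min_years):
    """Share of regimes that stayed in power for min_years or longer."""
    matching = sum(count for years, count in frequency.items() if years >= min_years)
    return matching / total_regimes


def share_lasting_between(low, high):
    """Share of regimes whose time in power falls in [low, high], inclusive."""
    matching = sum(count for years, count in frequency.items() if low <= years <= high)
    return matching / total_regimes


print(f"{share_lasting_at_least(11):.1%} of regimes lasted at least 11 years")
print(f"{share_lasting_between(4, 11):.1%} of regimes lasted between 4 and 11 years")
```

Each share is simply a count divided by the total, which is all an empirical probability is. A larger sample of regimes would make statements of this kind more trustworthy.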

Having described how we use the terms “random” and “variable” in our line of work, we hope you can now see why random variables are, more often than not, anything but random.