#### Plain-language band-aids to fix gaps in your statistics knowledge

Got some gaps in your statistics knowledge? Or perhaps you’re here for tips on how to communicate data science concepts to beginners? Let me do my best to point you towards some to-the-point explanations!

***Note:** Whenever there’s a link, it usually takes you to another article where I’ve explained a foundational concept instead of repeating it here. If there’s no link, I haven’t written the article yet — let me know in the comments if you’re curious about one of them in particular or if you’d like to see a missing term added in. The list is arranged alphabetically. Terms with entries are **bolded**. Enjoy!*

**Alternative Hypothesis (H1)**

A description of all possible states of the world where you would not want to be taking your **default action**. (Example here.)

**Analytics**

Analytics is a subdiscipline of **data science** that is often confused with **statistics**.

Analytics is all about finding good questions, while statistics is all about finding good answers.

The key difference is that analytics is concerned primarily with what’s *in* your **data** while **statistics** is concerned with what’s *beyond* your **data**. Learn more here.

**Artificial Intelligence (AI)**

A term that used to mean something else, but these days it’s often used as a casual synonym for machine learning (ML).

“If it’s written in Python, it’s probably machine learning. If it’s written in PowerPoint, it’s probably AI.” — Mat Velloso

For the difference between ML/AI and **statistics**, see the entry on **machine learning.**

**Assumptions**

Assumptions are ugly band-aids we put over the parts where information is missing. If we knew *all* the facts (and we *knew* that our facts were actually true facts), we wouldn’t need assumptions (or statisticians). Unfortunately, when we have partial information — when we want to know about a whole **population** but we only observe data from a **sample** — the only way we can make the leap beyond our paltry information is to make assumptions.

STATISTICAL INFERENCE = DATA + ASSUMPTIONS

Yup, **statistics** isn’t magic, it’s the art of making assumptions veeeeeery carefully (and mathematically) to make conclusions beyond our **data**.

**Bar Chart**

A way to visualize counts. Like **distributions**, you can think of **bar charts** (used for **categorical data**) and **histograms** (used for **continuous data**) in terms of popularity contests. Or tip jars. That works too.

**Bayesian Statistics**

A statistical school of thought that deals with mathematical models of belief. You start with a mathematical description of what you believe and then (via **Bayes’ rule**) discover what you reasonably ought to believe after adding some **data**. The results are highly personal because they’re about *reasonably* updating subjective models of belief — different starting beliefs (called “**priors**”) should give different results (called “**posteriors**”). In Bayesian **statistics**, **parameters** have **probabilities** attached to them, which is heinous sacrilege as far as your typical (**frequentist**) STAT101 class is concerned. The question of whether or not a **parameter** should have a **probability distribution** is what the Bayesian vs Frequentist controversy is all about.

**Bayes’ Rule**

A formula that helps you go from the **probability** of checking Twitter when your code is compiling to the **probability** that your code is compiling when you are checking Twitter. Bayes’ Rule is the mathematical underpinning of **Bayesian statistics**.
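To make the Twitter example concrete, here is a minimal sketch; every probability below is an invented number for illustration, not a claim from this article:

```python
# Bayes' Rule: P(compiling | twitter) = P(twitter | compiling) * P(compiling) / P(twitter)
# All numbers below are made up for illustration.
p_compiling = 0.2                 # prior: your code is compiling 20% of the time
p_twitter_given_compiling = 0.9   # you check Twitter 90% of the time it's compiling
p_twitter_given_idle = 0.3        # ...and 30% of the time it isn't

# Total probability of checking Twitter (the denominator in Bayes' Rule):
p_twitter = (p_twitter_given_compiling * p_compiling
             + p_twitter_given_idle * (1 - p_compiling))

p_compiling_given_twitter = p_twitter_given_compiling * p_compiling / p_twitter
print(round(p_compiling_given_twitter, 3))  # 0.429
```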

**Bias**

Statistical bias occurs when results are consistently off the mark. But that’s not the only definition of bias, so dive deeper here: **selection bias**, algorithmic bias, and other kinds of bias.

**Binary Data**

**Data** that can take values in two categories, e.g. (Yes, No). When you’re dealing with more than two categories, it’s called **multiclass data**.

**Binomial Distribution**

The **distribution** that describes the **probability** of a particular number of successes out of a bunch of attempts. Found in the context of **binary data**.
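As a sketch, the binomial probability formula fits in a few lines of standard-library Python (`math.comb` counts the ways to choose which attempts succeed):

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent attempts, each with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 7 heads in 10 fair coin flips:
print(round(binomial_pmf(7, 10, 0.5), 4))  # 0.1172
```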

**Captured Data**

Captured data are intentionally created for a specific analytical purpose, while **exhaust data** are byproducts of digital/online activity.

**Causation**

The propensity of one variable to *cause* another variable to change.

**Categorical Data**

**Data** that takes category values, e.g. Orange, Apple, Mango. **Binary data** is a type of categorical data.

**CDF**

See **cumulative distribution function**.

**Central Limit Theorem (CLT)**

The CLT is a handy rule that says that if we compute averages or sums from lots of data, those sums/averages will be **normally distributed**. Learn more here.
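You can watch the CLT at work in a few lines of Python. The raw draws below come from a flat (uniform) distribution, yet their averages pile up in a bell shape around 0.5:

```python
import random
import statistics

random.seed(0)
# 10,000 averages, each computed from 50 draws of a decidedly non-normal (uniform) distribution:
averages = [statistics.mean(random.random() for _ in range(50)) for _ in range(10_000)]

# The CLT predicts the averages are roughly normal, with mean 0.5
# and standard deviation sqrt(1/12) / sqrt(50), about 0.041.
print(round(statistics.mean(averages), 3))
print(round(statistics.stdev(averages), 3))
```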

**Chi-squared Test**

Something you may want to learn more about if you work a lot with **categorical data**. It’s the classic quick check to see if two categorical **variables** are independent, for example to ask a question like, *“Is the distribution of favorite musical genres the same across all college majors?”* (Or, more morbidly, many STAT101 professors introduce it as, *“Did surviving the Titanic depend on how fancy your ticket was?”* Because we statisticians are a grim bunch.)

**Classical Statistics**

Synonym for **Frequentist Statistics.**

**Cluster Sampling**

An approach to collecting **data** that involves deciding on sections/clusters (e.g. schools), randomly selecting several clusters from the collection, then collecting **observations** on all of the units (e.g. students) in those clusters.

**Combinatorics**

The branch of mathematics for counting things, like the number of ways you can seat your wedding guests without offending anyone (0). Here’s my primer on the subject, which will help you understand, among other things, why your combination lock is actually a permutation lock.

**Confidence Interval**

A concept from **frequentist statistics** with a tricky definition. There’s no way around the fact that it’s tricky, so be careful when you interpret confidence intervals (and don’t confuse them with **credible intervals**). A 95% confidence interval means: *“Were this procedure to be repeated on infinite samples, the calculated confidence interval (which would differ for each sample) would encompass the true population **parameter** 95% of the time.”*
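That definition is easiest to believe after simulating it. This sketch (invented population, z ≈ 1.96 for 95%) repeats the procedure many times and counts how often the interval captures the true mean:

```python
import random
import statistics

random.seed(1)
TRUE_MEAN, SD, N, Z = 10.0, 2.0, 40, 1.96  # z is approximately 1.96 for a 95% interval
trials, hits = 2000, 0
for _ in range(trials):
    sample = [random.gauss(TRUE_MEAN, SD) for _ in range(N)]
    center = statistics.mean(sample)
    half_width = Z * statistics.stdev(sample) / N ** 0.5
    if center - half_width <= TRUE_MEAN <= center + half_width:
        hits += 1
print(hits / trials)  # close to 0.95, as the definition promises
```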

**Continuous Data**

**Data** that is obtained by measuring, not counting. Examples: 176.5 cm (my height), 12% (free space on my phone), 3.141592… (pi), -40.00 (where Celsius meets Fahrenheit), etc.

**Convenience Sample**

A **sample** that is nonrandom but the **observations** were convenient to make, e.g. when you put a booth in the airport terminal and ask people walking by to take a survey about air travel… what could possibly go wrong?

**Correlation**

The propensity of two **variables** to look like they’re moving together. Learn more here.

**Count Data**

Data that can only take non-negative integer values. Obtained by counting things.

**Credible Interval**

The Bayesian cousin of the **confidence interval**. It has the easy interpretation you *wish* a **confidence interval** had. A 95% credible interval is interpreted as *“I believe that the **parameter** lives between here and here with 95% **probability**.”*

**Cumulative Distribution Function (CDF)**

A mathematical formula describing the **probability** of observing a value at or below a given value of a **random variable**. See **distribution**.

**Data**

Stuff someone recorded in electronic form. Or, for the slightly more reverent explanation, read this.

**Data Mining**

A synonym for **analytics**. The act of finding patterns in your **data** in order to form **hypotheses** or generate ideas.

**Data Science**

Data science is the discipline of making **data** useful. Its three subdisciplines are called **statistics**, **machine learning**, and **analytics**. To learn more about the differences between these three areas and how they fit into data science, read this.

**Data Point**

Synonym for **observation** and instance.

**Dataset**

Synonym for **sample** (collection of **data**).

**Data Types**

Exactly what it sounds like — convenient descriptions of various kinds of data you’d encounter in the wild. Many STAT101 classes kick off their first lesson with data types, so if you’re keen to recreate that experience, head over to this article for a guided tour.

**Default Action**

A physical action/decision that you commit to doing if you don’t gather any (more) evidence. This is a frequently-overlooked yet super-important concept; you can’t get started with **classical statistics** without it! (Example here.)

**Dependent Variable**

The **variable** (usually Y in our models) we want to predict using some other ones (usually Xs in our **models**, which are our **independent variables**).

**Discrete Data**

**Data** that is obtained by counting, not measuring. Examples: 1 short story, 6 words, 2 baby shoes, 0 times worn, etc.

**Distribution**

Think of this as a “**histogram**” of your **population** **data**. The concept is abstract, since we usually can’t observe the **population**.

**Dummy Variable**

Synonym for **indicator variable**.

**Empirical Rule**

If your data are **normally distributed**, (68%)-(95%)-(virtually all) will be found within 1–2–3 **standard deviations** of the **mean**.
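A quick simulation check of that 68–95–99.7 shorthand, using standard-normal draws:

```python
import random

random.seed(2)
data = [random.gauss(0, 1) for _ in range(100_000)]  # mean 0, standard deviation 1

for k in (1, 2, 3):
    share = sum(abs(x) <= k for x in data) / len(data)
    print(f"within {k} sd: {share:.3f}")  # roughly 0.683, 0.954, 0.997
```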

**Exhaust Data**

**Captured data** are intentionally created for a specific analytical purpose, while exhaust data are byproducts of digital/online activity. **Exhaust data** usually come about when websites store activity logs for purposes — such as debugging or data hoarding — other than specific analyses.

**Error**

The difference between what we observed in our **data** and what we predicted with our **model**. Another term for error is “residual.” In **simple linear regression**, we assume the errors are **normally distributed**. In a method called GLM, we’re allowed to have more creative assumptions about the errors.

**Estimate, Estimator, and Estimand**

An *estimate* is just a fancy word for a **best guess** about the true value of a **parameter** (the *estimand*). It’s the value your guess takes, while an *estimator* is the formula you use for arriving at that number.

**Expected Value**

A (**probability**-weighted) average spoken about in the context of **distributions**. Synonyms include: expectation, expected value, mean.

**Experiment**

A scientific procedure undertaken to test a **hypothesis** involving **causal** relationships. Core characteristics are randomization into groups and manipulation of those groups by the experimenter (different groups are assigned different “treatments”). Experiments allow you to make causal statements about how two things are related (i.e. in order to be able to say that a medication *causes* an improvement in disease progression, you need to run a well-designed randomized experiment). Until you have clear, measurable, quantitative null and alternative hypothesis statements, as well as a plan for how you will do different things to different parts of the universe at random, what you are about to do is *not* an experiment.

**Exploratory Data Analysis (EDA)**

The act of using some fraction of your **data** for the purpose of generating ideas, forming **hypotheses**, and discovering potentially useful inputs for **machine learning**. All of these inspiring nuggets must be tested on separate data before they can be taken seriously (otherwise you’re cheating).

**Frequentist Statistics**

The kind of approach you tend to see in STAT 101, based on the long-run frequencies you’d see if procedures were to be repeated infinitely many times. Unlike **Bayesian statistics**, you’ll never see the words “belief” or “**prior**” when using these methods. In frequentist statistics, **parameters** never have **probabilities** attached to them — this video will help you make sense of this with a coin toss and personality test.

**Gauss-Markov Assumptions**

Technical **assumptions** you must make in order to use standard **linear regression**. They translate roughly as, *“Assume it’s **normal** and well-behaved.”* (Now you get the joke on my shirt below.) See **normal distribution** and **scedasticity** to learn more.

**Generalized Linear Model (GLM)**

Generalized linear models (GLMs) extend **regression** to situations where the distribution of the **errors** is not **normal**. You may want to learn more about this if you are trying to predict a **categorical response** (e.g. click/no click).

**Histogram**

A plot describing the frequency with which things occur in your **data**. The **categories** (or intervals of values) are on the horizontal axis and the height of the bars gives the relative number of times a particular category has occurred in your data. See also: **bar chart** and **distribution**.

**Hypothesis**

A description of how reality might work. H0 stands for **null hypothesis** (all the worlds in which you’d want to take your **default action**), H1 stands for **alternative hypothesis** (all the worlds in which you wouldn’t). (Example here.)

**Hypothesis Testing**

The game of trying to see if your **data** convinces you that your **null hypothesis** is ridiculous and thus that you should stop doing your **default action**. (Example here.)

**Independent Variable**

The **variable** (usually X in our **models**) we want to use to predict another one (usually Y in our **models**).

**Indicator Variable**

A **variable** that takes the value 1 if a condition is met, 0 otherwise. For example, I might record your pet ownership as Cat=1 if you’re owned by a cat and Cat=0 if no cat has claimed you yet. Devs and **ML** folk call the use of indicator variables **one-hot encoding**.
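Here is a minimal sketch of one-hot encoding done by hand (`one_hot` is a hypothetical helper written for this example, not a library function):

```python
def one_hot(values, categories):
    """Turn a categorical column into indicator (0/1) columns, one per category."""
    return [[1 if value == category else 0 for category in categories] for value in values]

pets = ["Cat", "Dog", "Cat"]
print(one_hot(pets, ["Cat", "Dog"]))  # [[1, 0], [0, 1], [1, 0]]
```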

**Inherited Data**

Inherited (secondary) data are those you obtain from someone else, so you had no control over how the measurements were recorded and stored. The opposite is **primary data**. Here’s my guide to working with inherited data.

**Kurtosis**

Kurtosis is a way to describe the chubbiness of a **distribution’s** tail.

**Logistic Regression**

Often a good **model** to use when the **response variable** (Y) is **binary**.

**Long-Tailed Distribution**

An asymmetric (skewed) **distribution** that features extremely large or extremely small values which are relatively infrequent.

**Machine Learning (ML)**

A discipline that’s related to **statistics**, but has a different focus: automation. **Statistics** cares about rigor, inference, and coming to the right conclusion, whereas **machine learning** cares about performance and turning patterns in **data** into recipes that get the job done. Statistics is extremely important in the testing step of an applied machine learning project, since that’s when it’s time to find out whether the prototype actually works.

**Mean**

An average. Used in the context of talking about **samples** (“sample mean”) or **populations** (“population mean”). This is one of the important **moments** (descriptors of the shape of a **distribution**), which is why you see it talked about so often.

**Median**

The middle thing. Arrange your data from smallest to largest and grab the one in the middle — that’s the median. The median is robust to **outliers** while the **mean** is not.

**Mode**

Mode is pronounced *“the most common value.”* The mode corresponds to the spot where a **distribution/histogram** has its peak. When you hear that a distribution is *multimodal*, it means there’s more than one peak. When a distribution is *symmetric and unimodal*, like the pretty little **bell-shaped curve**, the mode also happens to be the **mean**. If you want to be technically correct, you’d stop saying *“the average Joe”* when you actually mean *“the modal Joe.”*
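The mean/median/mode distinctions, including the median's robustness to **outliers**, fit in a few lines of the Python standard library (the salary numbers are invented):

```python
import statistics

salaries = [40, 45, 50, 50, 55, 1000]  # made-up salaries in $k, with one extreme outlier

print(statistics.mean(salaries))    # 206.67ish: dragged way up by the outlier
print(statistics.median(salaries))  # 50.0: barely notices the outlier
print(statistics.mode(salaries))    # 50: the most common value
```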

**Model**

Depending on context, either a fancy word for recipe or a description of how a system might work. For example, here’s a straight line model: Salary = intercept + slope × Years_Of_Experience + error

**Moments**

Refers to certain numerical summaries of the shape of a distribution. The average of your data points is called the first moment, the average of squares of your data points is the second moment and so on.

**Multiclass Data**

**Categorical data** that can take values in more than two categories, e.g. (Cat, Dog, Parrot, Goldfish, Anteater). When you’re dealing with only two categories, it’s called **binary data**.

**Multiple Comparisons**

The act of testing many **hypotheses.** If you don’t make any special corrections to your methods (**multiple testing correction**) and you claim to be making statistical conclusions, you’re going on a fishing expedition in your data: you will find “something interesting” by random chance even though the results will not be real.

If you’re not doing **statistics** but instead running an **exploratory data analysis** (EDA), you’re in the clear because you know you’re not supposed to take those findings seriously. But if you’re claiming to draw statistical conclusions from this process, what you’re doing is not statistics — it’s a pantomime that violently misses the point. Just. Don’t.

Always remember to split your **data** so that you’re able to run a lean and controlled test in an unmolested dataset after you’ve fished around in a *different one* for inspiration.

**Multiple Testing Correction**

That said, if you wish to test multiple **hypotheses** the valid statistical way, you can… but there’s a price. You must make adjustments (start by reading up on *Bonferroni correction* (the simplest, strictest, and most data-expensive option) and then progress to its cousins), otherwise the results that you think are “**statistically significant**” will turn out to be embarrassingly fake.

And that price is pretty steep — you’ll need much more **data** to get the same quality of result. Don’t test multiple **hypotheses** unless you’ve got a really good reason for it.
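A sketch of the Bonferroni correction, the simplest adjustment mentioned above (the p-values are invented):

```python
def bonferroni_threshold(alpha, num_tests):
    """Test each hypothesis at alpha / m so the overall chance of any false positive stays <= alpha."""
    return alpha / num_tests

p_values = [0.003, 0.02, 0.04]  # made-up results from three hypothesis tests
threshold = bonferroni_threshold(0.05, len(p_values))
print([p <= threshold for p in p_values])  # [True, False, False]
```

Note how the middle p-value (0.02) would have cleared an uncorrected 0.05 bar but fails the corrected one. That stricter bar is the "price" in data terms.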

**Multiple Regression**

Like **simple linear regression**, except now you’re allowed to use more than one **predictor**, e.g. using both Experience and Education to predict Salary.

**Multivariate Data**

When your measurements are too complicated for a single number, e.g. the set of measurements a tailor needs in order to create a good custom suit for you. Data that’s not multivariate is called **univariate** (fits in a single number, e.g. your height).

**Multivariate Regression**

Like **multiple regression**, except now you’re predicting a **response** that’s **multivariate** — your Y is a vector now, not a scalar.

**Nonparametric Statistics**

Methods which do not require you to make **assumptions** about which **distribution** your data come from.

**Nonresponse Bias**

Happens when your targeted participants omit their responses. For example, you set up a booth to ask people how their day is going and many of the people who are having a rubbish day scowl at you instead of taking your survey. As a result, the **data** you record are too cheerful thanks to nonresponse **bias**.

**Normal Distribution**

The symmetric, bell-shaped distribution found often in nature and wherever we see sums/averages. See **Central Limit Theorem**.

**Null Hypothesis (H0)**

All possible worlds in which you’re happy to take your **default action**. (Example here.)

**Observation**

A single item in the **sample**.

**One-Hot Encoding**

The use of **indicator variables**.

**Outliers**

Unusual data points or data points which are unlikely to have been generated by the process responsible for the bulk of the data. What should you do with them? It depends…

**P-value**

The probability of obtaining a **sample** at least as extreme as the one we just observed when assuming the **null hypothesis** is actually true. That’s a mouthful, so I made the video below to help you wrap your head around it.

To calculate a p-value, you need to know what the **CDF** looks like under the **null hypothesis**. The smaller your p-value, the more ridiculous your **null hypothesis** looks.
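As a concrete sketch: suppose the null hypothesis is "this coin is fair" and you observe 8 heads in 10 tosses. The (one-sided) p-value is the probability of a result at least that extreme under the null:

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(at least k successes in n attempts), assuming success probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# One-sided p-value for 8 or more heads in 10 tosses of a supposedly fair coin:
print(round(prob_at_least(8, 10), 4))  # 0.0547
```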

**Parameter**

A summary measure of a **population**.

**PDF**

See **probability density function**.

**Poisson Regression**

Often a good model to use when the **response variable** (Y) can only be a nonnegative integer. See also: **count data, response, GLM**.

**Population**

The collection of all items we are interested in.

**Posterior**

The belief you end up having when you add data to your **prior** (starting belief). If you see this word, you’re in **Bayesian statistics** land.

**Power**

Power is the probability of rejecting the **null hypothesis** if it is false (i.e. of changing your mind if that’s the right thing to do). Along with **significance level**, it determines the quality of your **hypothesis test**.

**Prediction Interval**

An interval giving a plausible range for the next value we might observe. A common statistics gotcha is that people think this is what the **confidence interval** does — nope, that’s the prediction interval: it’s wider than the **confidence interval**, meaning that you should be a lot less sure about where the next **observation** will land than where the **population parameter** will land. Which makes sense intuitively too — I’d be a lot more surprised to discover that the average height of all males in a city is 6’4” than to discover that the height of some random next dude I see in the grocery store is 6’4”. One of these would make a headline, the other would make a shrug.

**Predictor**

Another word for an independent (X) **variable**. Predictors are observed **data**.

**Primary Data**

You’re using **primary data** if you (or the team you’re part of) collected **observations** directly from the real world. In other words, you had control over how those measurements were recorded and stored. If you didn’t, we call that secondary **(inherited) data**.

**Prior**

A starting belief written as a **distribution**. If you see this word, you’re in **Bayesian statistics** land.

**Probability**

*P(X=4)* would be read in English as *“The probability that my die lands with the 4 facing up.”* If I’ve got a fair six-sided die, *P(X=4)*=1/6. But… but… but… what is probability and where does that 1/6 come from? Glad you asked! I’ve covered some probability basics for you here, with **combinatorics** thrown in as a bonus.

**Probability Density Function (PDF)**

A mathematical formula describing the relative **probability** of observing a particular value of a **random variable**. If the **random variable** is not **continuous**, technically this should be called a probability mass function and it should not exist at values the **random variable** can’t take.

**Q-Q Plot**

A visual testing tool for checking **distribution** **assumptions**, which compares the **data** you got with the **data** you’d tend to get from the **distribution** you’re interested in. Also, it often makes you cry, but that’s not why it’s named that.

**R-Squared**

Proportion of the variability in Y explained by X. In **simple linear regression**, this is the square of the **correlation** between X and Y. Here’s a fun trick: chances are you’re pretty bad at intuiting **correlations**, but you’re pretty good at intuiting R-squared as a performance grade… so you can use this like a magic trick in front of your friends. Try it out on guessthecorrelation.com — instead of guessing the **correlation** itself, guess R-squared by assigning a “percentage grade” (like a teacher grading a student) to how well you’d say X “captures” Y, then take the square root. (Don’t forget to insert a minus if the cloud of points is moving down and to the right.)

**Random Variable**

A random variable (R.V.) is a mathematical function that turns reality into numbers. Think of it as a rule to decide what number you should record in your dataset after a real-world event happens.

Many students confuse random variables with random variates. If you’re a casual reader, skip this, but enthusiasts take note: random variates are outcome **values** like {1, 2, 3, 4, 5, 6} while random variables are **functions** that map reality onto numbers. Little *x* versus big *X* in your textbook’s formulas.

**Raw Data**

A **dataset** that’s exactly in the form it was collected in — no cleaning or **transformation** has been done to it.

**Regression**

**Statistical** methods involving fitting linear **models** to **data**. Usually, the goal is prediction or **hypothesis testing** about **correlations**/relationships. See **simple linear regression**.

**Representative Sample**

A **sample** which accurately reflects the characteristics of the population.

**Residual**

Synonym for **error**.

**Response**

Another word for the **dependent (Y) variable**.

**Response Bias**

Happens when your targeted participants lie in their responses. To enjoy some instant response **bias**, put on an ugly hat and ask your coworkers whether you look good in it.

**Sample**

A subgroup of the **population** of interest.

**Sample Size**

The number of **observations** (**data points**) in your **sample**.

**Sampling**

The act of drawing **observations** from a **population**.

**Sampling Frame**

The list of all items from which we can draw our **sample** **observations**.

**Scedasticity** (sometimes also spelled **skedasticity**)

The ugliest possible word we could have picked for a concept that asks, “Is the **distribution** of **errors** the same everywhere?”

You’ll see this word in the context of **linear regression** 101, where we’re asking one of the diagnostic questions about whether the **Gauss-Markov assumptions** are satisfied (if they are, we can proceed with **simple linear regression**, and if they aren’t, *sad trombone*). If the scatter of **errors** around the line looks like a sausage (same width of scatter everywhere), you can say, “Whew, the errors are **homoscedastic**.” If the scatter looks more like a fan or an orchestra of trumpets or a python that has swallowed an antelope, we declare the **errors** to be **heteroscedastic** and turn to **GLMs** instead of **simple linear regression** or find a clever way to **transform** your **data**.

**Selection Bias**

Happens when your **sampling** method prefers some participants to others. For other definitions of **bias**, see this list.

**Significance Level**

The largest **probability** of **Type I error** you’re willing to tolerate. Along with **power** and **sample size**, this is a massively important knob you use to control the quality of your test.

**Simple Linear Regression**

**Regression analysis** with just one **response variable** and one **predictor**, meaning that you’re just fitting a straight line through your data.

**Simple Random Sample**

A completely random draw. In this **sampling** scheme, drawing any permutation of items from the **population** is equally likely.
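With the standard library, drawing a simple random sample is one call; `random.sample` makes every subset of the requested size equally likely:

```python
import random

random.seed(3)
population = list(range(1, 101))        # a population of 100 numbered items
sample = random.sample(population, 10)  # a simple random sample of size 10
print(sorted(sample))
```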

**Simpson’s Paradox**

An aggregation paradox where disaggregated **data** looked at separately for each group points towards conclusions diametrically opposed to what the aggregated **data** would show.
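Here is the paradox in numbers, using the classic kidney-stone treatment counts (a textbook example, not from this article): treatment A wins within each group, yet B wins in the aggregate.

```python
# (recovered, total) counts for two treatments, split by case severity.
groups = {
    "mild":   {"A": (81, 87),   "B": (234, 270)},
    "severe": {"A": (192, 263), "B": (55, 80)},
}

for severity, counts in groups.items():
    rates = {t: recovered / total for t, (recovered, total) in counts.items()}
    print(severity, {t: round(rate, 2) for t, rate in rates.items()})  # A beats B in both groups

overall = {
    t: sum(groups[g][t][0] for g in groups) / sum(groups[g][t][1] for g in groups)
    for t in ("A", "B")
}
print("overall", {t: round(rate, 2) for t, rate in overall.items()})  # ...yet B beats A overall
```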

**Skewness**

Exactly what it sounds like: a measure of the asymmetry of a **distribution**. For a handy mnemonic to help you remember which is which on positive and negative skew, read this.

**Standard Deviation**

A measure of dispersion. Tells you how far your **data** points are from their **mean**. For more info, see this article. This term is used in the context of talking about **samples** (“**sample** **standard deviation**”) or **populations** (“**population standard deviation**”). **Standard deviation** is one of the important descriptors of the shape of a **distribution**, which is why you see it talked about so often. I cover it in more detail here.

**Statistic**

A summary measure computed from a **sample**. In other words, any way of mushing up your **data**. The way we use the terms these days, **analytics** is the discipline that’s about calculating **statistics**, but **statistics** is all about going *beyond* those **data** mushups — an Icarus-like leap into the unknown (expect a big splat if you’re not careful). Learn more here about the subdisciplines of **data science**.

**Statistically Significant**

Doesn’t mean what just happened is “significant” in the eyes of the universe. This is a technical term that simply means that a **null hypothesis** was rejected.

**Statistics**

The science of changing your mind under uncertainty. For an 8min intro to the discipline, read this.

**Stratified Sampling**

Dividing your **population** into categories, e.g. statisticians and non-statisticians, and then taking a **simple random sample** of a desired size from each category (e.g. 100 statisticians and 100 non-statisticians at random).

**Structured Data**

Structured data are neatly formatted for analysis. Most of the **datasets** you’d work with in a **statistics** class are structured, whereas in the wild, there’s plenty of **unstructured data** (**data** that needs *you* to put structure on it).

**Unstructured Data**

If we’re pedantic about it, there’s no such thing as unstructured data (since by being stored, they’re necessarily forced to have some kind of structure), but let me be generous. Here’s what the definition intends to convey: **structured data** are neatly formatted for analysis, while unstructured data are not in the format you want and they force you to put your own structure onto them. (Think of images, emails, music, videos, text comments left by trolls.)

That thing you call unstructured data is just data that needs *you* to put structure on it.

**Systematic Sampling**

A systematic selection process, e.g. constructing your **sample** out of all of the IDs which are divisible by 99.

**Time Series**

A time-indexed **dataset** where sequential order matters. For example, if we’re recording your hours of sleep today and my hours of sleep today, we can write those in any order — it doesn’t matter if we write yours first or mine first. But if we’re recording your sleep over a few days, we’d be losing something if we shuffled those.

**Transformation**

Taking a **variable** (column of your dataset) and applying a function (e.g. the logarithm) to all the values in that column to make a new **variable**.

**Type I Error**

Falsely rejecting the **null hypothesis**. (Equivalent to “convicting an innocent person.”) Related to the concept of false positive, but it’s not quite the same thing (in the same way that a **prediction interval** is different from a **confidence interval**).

**Type II Error**

Incorrectly failing to reject the **null hypothesis**. (Equivalent to “failing to convict a guilty person.”) The probability of this is equal to one minus **power**. Related to false negative, but not quite the same thing (in the same way that a **prediction interval** is different from a **confidence interval**).

**Type III Error**

Correctly rejecting the wrong **null hypothesis**. In other words, using all the right math to solve the *wrong* problem!

**Undercoverage Bias**

Happens when you can’t reach your whole **population** (e.g. when your survey can only be seen by people who have computers, but you want to make a statement about the **population** of all human adults).

**Uniform Distribution**

The **probability distribution** that is shaped like a brick: all outcomes have equal probability associated with them. Someone asked me to include a photo of myself playing this distribution the way I did the **normal distribution** above, and this is the best thing I could come up with.

**Univariate Data**

When your measurements fit in a single number, e.g. your height. If it’s not univariate, it’s **multivariate, **e.g. the set of measurements a tailor needs in order to create a good custom suit for you.

**Variance**

This is the square of the **standard deviation**. Used in the context of talking about **samples** (“sample variance”) or **populations** (“population variance”). This is one of the important **moments** (descriptors of the shape of a **distribution**), which is why you see it talked about so often.

**Variable**

Casual usage: a column of your **dataset** if your **dataset** is formatted the polite way. Formal usage: see **random variable**.

**Volunteer Sample**

A **sample** where the respondents are different from the population because they opted-in (e.g. if we want to measure willingness to lend a hand and we ask people to help out by taking our survey, then we are collecting a volunteer sample and we’d expect that the respondents are more willing to help than the nonrespondents). A recipe for **nonresponse bias**.

**Z-Score**

Allows you to compare quantities measured in different scales, for example *‘Among long jumpers, Melanie’s best jump is twice as “exceptional” as Yufeng’s marathon time is among marathoners.’*

z-score = (thing – thing’s mean) / (thing’s standard deviation)
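A sketch of that formula in code, with invented jump and marathon numbers (note the sign flip for marathons, where smaller times are better):

```python
import statistics

def z_score(value, values):
    """How many standard deviations the value sits from the mean of its own scale."""
    return (value - statistics.mean(values)) / statistics.stdev(values)

long_jumps_m = [5.9, 6.1, 6.0, 6.2, 5.8, 7.0]   # made-up jumps; Melanie's best is 7.0 m
marathons_min = [240, 235, 250, 245, 238, 225]  # made-up times; Yufeng's best is 225 min

print(round(z_score(7.0, long_jumps_m), 2))    # 1.93
print(round(-z_score(225, marathons_min), 2))  # 1.61 (negated: faster is better)
```

Because both results are on the same "standard deviations from the mean" scale, the two performances become directly comparable even though meters and minutes are not.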

https://towardsdatascience.com/stats-gist-list-an-irreverent-statisticians-guide-to-jargon-be8173df090d