**Association Rules and the buying of socks**

*when Data Mining is used for Target Marketing*

*Reading Time: 5 minutes*

*Post published on 02/12/2020 by **Donata Petrelli** and released under the CC BY-NC-ND 3.0 IT license (Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Italy)*

*Title image by Luis Otto on Unsplash*

It’s no longer a secret that I love high-heeled shoes and, since it’s winter, I often need to buy socks! “Yes, okay… but what does this have to do with the article?” you might ask…

Because whenever I buy socks online, the same thing happens: I see ads for slippers… Since you too may have come across such ‘special’ suggestions while shopping, I wanted to explain in this article, for anyone curious, why this happens 🙂

Choosing which products to advertise to customers while they are buying is in fact a science, not a simple marketing operation driven by the seller's taste, as you might initially think. It is called **Market Basket Analysis** and is based on **Data Mining** techniques.

Whether you use or run e-commerce sites, this article illustrates the logic behind well-designed marketing, which starts from analyzing customers' buying habits to find associations between the products they purchase.

Enjoy reading!

**What Data Mining is for**

In many situations, the correct **analysis of data (Big Data)** through **Data Mining** techniques and methodologies yields the information needed to make optimal decisions, decisions that are therefore strategic for the success of an initiative or activity.

Some examples: a company that sells its products online and wants to know the buying behavior of its customers in order to carry out personalized marketing based on each user's profile; or a financial company that manages investment portfolios and wants to compare the performance of a security with that of other companies.

In these cases, and in many others, we need to analyze buying behavior and find the rules by which certain products are often bought together. This is the task of market basket analysis, which makes it possible to run targeted marketing campaigns in the first case and to select optimal financial assets (stock picking) in the second.

In technical terms, given a very large data set (Big Data) of purchased items, such as shoes bought online or stocks, we have to find the patterns that reveal an affinity between different products, patterns that would otherwise remain hidden and, in many cases, would even be unimaginable.

Among Data Mining techniques, **Association Rules** are the algorithms used to discover relationships or patterns hidden in very large data sets (Big Data). Let’s see how they work.

**Association Rule**

When we talk about **Association Rules**, we mean a relationship in which the purchase of **product X** is followed by the purchase of **product Y**.

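In technical language, **an association rule has the form**:

$$X \rightarrow Y$$

where

- X and Y are items or sets of items (itemsets) with no item in common
- X implies Y: a transaction that contains X tends to contain Y as well.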
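A rule can be represented directly as a pair of itemsets. This minimal Python sketch assumes items are plain strings and uses a hypothetical `Rule` class (not from any library) that enforces the usual convention that both sides are non-empty and share no items:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical representation of an association rule X -> Y."""
    antecedent: frozenset  # X: the items already in the basket
    consequent: frozenset  # Y: the items the rule predicts

    def __post_init__(self):
        # Convention: both sides are non-empty and have no item in common.
        if not self.antecedent or not self.consequent:
            raise ValueError("both sides of a rule must be non-empty")
        if self.antecedent & self.consequent:
            raise ValueError("X and Y must be disjoint")

    def __str__(self):
        left = ", ".join(sorted(self.antecedent))
        right = ", ".join(sorted(self.consequent))
        return f"{{{left}}} -> {{{right}}}"


rule = Rule(frozenset({"socks"}), frozenset({"slippers"}))
print(rule)  # {socks} -> {slippers}
```

Using `frozenset` makes the itemsets hashable, so rules can be stored in sets or used as dictionary keys.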

An association rule **algorithm identifies these rules within very large databases** in two steps:

- It finds all the sets of items purchased together (itemsets) that recur with significant frequency. Think of each set as the products inside a shopping cart. This first step finds the frequent itemsets.
- From these frequent itemsets, it generates rules: it finds the relationships between the purchased items and keeps only the most relevant ones.
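The two steps can be sketched in Python as follows. This is a deliberately naive, brute-force illustration for small data sets, not the method any particular e-commerce site uses: it enumerates every itemset that occurs in some transaction, keeps the frequent ones, then splits each frequent itemset into candidate rules. Real systems use more efficient algorithms such as Apriori or FP-Growth. The thresholds `minsup` and `minconf` refer to the support and confidence measures the article defines next; the function names are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, minsup):
    """Step 1: map every itemset with support >= minsup to its support.

    Brute force: it enumerates all non-empty subsets of each transaction,
    so it is only practical for small baskets.
    """
    n = len(transactions)
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            for subset in combinations(items, k):
                key = frozenset(subset)
                counts[key] = counts.get(key, 0) + 1
    return {s: c / n for s, c in counts.items() if c / n >= minsup}


def generate_rules(frequent, minconf):
    """Step 2: split each frequent itemset Z into rules X -> Z - X.

    Returns (X, Y, support, confidence) tuples with confidence >= minconf.
    """
    rules = []
    for itemset, sup in frequent.items():
        for k in range(1, len(itemset)):
            for left in combinations(sorted(itemset), k):
                x = frozenset(left)
                # Any subset of a frequent itemset is itself frequent, so x is a key.
                conf = sup / frequent[x]
                if conf >= minconf:
                    rules.append((x, itemset - x, sup, conf))
    return rules
```

Called as `generate_rules(frequent_itemsets(baskets, 0.5), 0.5)` on a list of sets of item names, it returns every rule that passes both thresholds.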

To better understand how the algorithm works, we need to define the two measures it uses:

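**Support**. It measures the **statistical relevance of the rule** and is defined as:

$$\text{support}(X \rightarrow Y) = \frac{\text{number of transactions containing both } X \text{ and } Y}{\text{total number of transactions}}$$

From the formula we therefore obtain:

$$\text{support}(X \rightarrow Y) = P(X \cup Y)$$

that is, the joint probability of events X and Y. Here $X \cup Y$ is the union of the two itemsets, so $P(X \cup Y)$ is the probability that a transaction contains all the items of both X and Y.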
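As a minimal Python sketch (the function name `support` and the sample baskets are hypothetical), support can be computed by counting the transactions that contain every item of X ∪ Y:

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`.

    For a rule X -> Y, pass X | Y (the union of the two itemsets).
    """
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


# Hypothetical baskets, for illustration only.
baskets = [{"socks", "slippers"}, {"socks", "shoes"}, {"shoes"}, {"socks", "slippers", "scarf"}]
print(support({"socks"} | {"slippers"}, baskets))  # 0.5
```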

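**Confidence**. It measures the **significance of the rule** and is defined as:

$$\text{confidence}(X \rightarrow Y) = \frac{\text{number of transactions containing both } X \text{ and } Y}{\text{number of transactions containing } X}$$

From the formula we therefore get:

$$\text{confidence}(X \rightarrow Y) = P(Y \mid X) = \frac{P(X \cup Y)}{P(X)}$$

i.e. the conditional probability of Y given X: the fraction of transactions containing X that also contain Y.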
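A matching sketch for confidence, on the same hypothetical baskets, also shows that confidence is not symmetric: X -> Y and Y -> X share the same support but can have different confidences. Returning 0.0 when X never occurs is a convention chosen here, not part of the definition.

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(x, y, transactions):
    """P(Y | X) = support(X ∪ Y) / support(X) for the rule X -> Y."""
    sup_x = support(x, transactions)
    if sup_x == 0:
        return 0.0  # X never occurs: no evidence for the rule (convention chosen here)
    return support(set(x) | set(y), transactions) / sup_x


# Hypothetical baskets, for illustration only.
baskets = [{"socks", "slippers"}, {"socks", "shoes"}, {"shoes"}, {"socks", "slippers", "scarf"}]
print(confidence({"socks"}, {"slippers"}, baskets))  # about 0.667
print(confidence({"slippers"}, {"socks"}, baskets))  # 1.0
```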

With these definitions, we can say that the association rule (AR) algorithm finds all the rules for which:

- the support is at least a minimum value: **support >= minsup**
- the confidence is at least a minimum value: **confidence >= minconf**
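Combining the two thresholds, a hypothetical helper `passes_thresholds` checks whether a single candidate rule is kept. The baskets and threshold values in the example are illustrative assumptions.

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def passes_thresholds(x, y, transactions, minsup, minconf):
    """True if the rule X -> Y has support >= minsup and confidence >= minconf."""
    sup_xy = support(set(x) | set(y), transactions)
    sup_x = support(x, transactions)
    if sup_x == 0:
        return False  # X never occurs, so the rule cannot be evaluated
    return sup_xy >= minsup and sup_xy / sup_x >= minconf


# Hypothetical baskets, for illustration only.
baskets = [{"socks", "slippers"}, {"socks", "shoes"}, {"shoes"}, {"socks", "slippers", "scarf"}]
print(passes_thresholds({"socks"}, {"slippers"}, baskets, minsup=0.5, minconf=0.5))  # True
print(passes_thresholds({"shoes"}, {"socks"}, baskets, minsup=0.5, minconf=0.5))  # False: support 25%
```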

**Association rules in practice**

We can now study why, when I go to buy socks, I get suggestions for slippers. Behind the e-commerce site there is surely a market basket analysis based on an **Association Rules** algorithm of this kind.

The first step is to **extract the frequent itemsets** from online purchases by periodically analyzing the transactions.

Suppose that in a given period we have the following transactions:

Next, we calculate the support of each itemset as the fraction of transactions that contain it.

With a minimum support threshold of 50%, the algorithm keeps only the itemsets with support >= 50%. Therefore:

The others are excluded because they do not reach the minimum support: each appears in only one of the four transactions (25%).
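Step 1 can be reproduced in a few lines of Python. The sketch uses four hypothetical transactions (not the article's own table), invented only so that the supports agree with the percentages quoted in the text: A appears in three baskets, C and the pair {A, C} in two, and every other itemset in just one. It counts single items and pairs and keeps those with support >= 50%.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, NOT the article's original table: chosen only so that
# the supports match the percentages quoted in the text.
transactions = [{"A", "C"}, {"A", "C", "D"}, {"A", "E"}, {"B", "F"}]
MINSUP = 0.5

counts = Counter()
for t in transactions:
    for k in (1, 2):  # single items and pairs are enough for this small example
        for combo in combinations(sorted(t), k):
            counts[frozenset(combo)] += 1

n = len(transactions)
frequent = {s: c / n for s, c in counts.items() if c / n >= MINSUP}
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), f"{sup:.0%}")
```

With these data the frequent itemsets are {A} (75%), {C} (50%) and {A, C} (50%); every other item or pair occurs once (25%) and is discarded.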

We then proceed to extract the rules: we set a minimum confidence value, minconf, and find the rules with confidence >= minconf.

Setting minconf to 50%:

- the confidence of rule A -> C is Support{A,C} / Support{A} = 50% / 75% ≈ 66.7%

- the confidence of rule C -> A is Support{A,C} / Support{C} = 50% / 50% = 100%

and both rules are generated:

**A -> C: support 50%, confidence 66.7%**

**C -> A: support 50%, confidence 100%**
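The confidence arithmetic of this step can be checked with a few lines of Python. The supports of {A} (75%) and {C} (50%) are not stated explicitly in the text; they are the values implied by the quoted confidences and are assumptions of this sketch, as is the helper name `rule_confidence`.

```python
# Supports of the frequent itemsets; {A} and {C} are implied by the quoted confidences.
support = {
    frozenset({"A"}): 0.75,
    frozenset({"C"}): 0.50,
    frozenset({"A", "C"}): 0.50,
}
MINCONF = 0.5


def rule_confidence(x, y):
    """confidence(X -> Y) = support(X ∪ Y) / support(X)."""
    return support[x | y] / support[x]


a, c = frozenset({"A"}), frozenset({"C"})
for x, y in [(a, c), (c, a)]:
    conf = rule_confidence(x, y)
    if conf >= MINCONF:
        print(f"{set(x)} -> {set(y)}: support {support[x | y]:.0%}, confidence {conf:.1%}")
```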

The result of the algorithm is that the purchase of item C is followed by the purchase of item A: every transaction containing C also contains A. Think of C as socks and A as slippers: that’s why, when I buy socks, I also find slippers!
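How might a shop turn such rules into suggestions? A minimal sketch, assuming a hypothetical `recommend` helper and treating C as socks and A as slippers: for the current basket, it fires every rule whose antecedent is already in the basket and suggests the consequent items not yet bought, ranked by confidence. Real recommender systems are considerably more elaborate.

```python
def recommend(basket, rules):
    """Suggest items from rules (X, Y, confidence) whose antecedent X is in the basket.

    Items already in the basket are skipped; each item keeps its best confidence.
    """
    basket = set(basket)
    best = {}
    for x, y, conf in rules:
        if x <= basket:
            for item in y - basket:
                best[item] = max(best.get(item, 0.0), conf)
    return sorted(best, key=lambda item: -best[item])


# The two rules of the worked example, with C = socks and A = slippers (assumed mapping).
rules = [
    (frozenset({"slippers"}), frozenset({"socks"}), 2 / 3),  # A -> C
    (frozenset({"socks"}), frozenset({"slippers"}), 1.0),  # C -> A
]
print(recommend({"socks"}, rules))  # ['slippers']
```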

**Conclusion**

Association Rules are used for:

- Identification of dependencies
- Identification of classes
- Description of classes
- Identification of outliers/exceptions

In addition to retail, Association Rules are applied in many other fields such as:

- Text mining and web analysis
- Documents, newsgroups, …
- Intelligent query answering

In my work, association rules are used within many models for data analysis and decision support, in particular for market analysis and management, and for risk and fraud detection and management.

Do you use them? How?