Important Notice: Our web hosting provider recently started charging us for additional visits, which was unexpected. In response, we're seeking donations. Depending on the situation, we may explore different monetization options for our Community and Expert Contributors. It's crucial to provide more returns for their expertise and offer more Expert Validated Answers or AI Validated Answers. Learn more about our hosting issue here.

What is the jaccard similarity measure?

Data Miningdata mining
0
Posted

What is the jaccard similarity measure?

0
Chirag Patel

The Jaccard similarity (Jaccard 1902, Jaccard 1912) is a common index for binary variables. It is defined as the quotient between the intersection and the union of the pairwise compared variables among two objects.

Equation of the Jaccard similarity

In the equation dJAD is the Jaccard distance between the objects i and j. For two data records with n binary variables y the variable index k ranges from 0 to n-1. Four different combinations between yi,k and yj,kcan be distinguished when comparing binary variables. These combinations are (0/0), (0/1), (1/0) and (1/1). The sums of these combinations can be grouped by:

  • J01: the total number of variables being 0 in yi and 1 in yj.
  • J10: the total number of variables being 1 in yi and 0 in yj.
  • J11: the total number of variables being 1 in both yi and yj.
  • J00: the total number of variables being 0 in both yi and yj.

As each paired variable belongs to one of these groups it can be easily seen that:

J00 + J01 + J10 + J11 = n

As the Jaccard similarity is based on joint presence, J00 is discarded.

The Jaccard dissimilarity is defined as dJAD = 1- dJAS.

In some cases the Jaccard similarity is computed as dJAS=2dBCD/(1+dBCD), where dBCD is the Bray–Curtis dissimilarity. This equation does not reduce values to binary states. Thus, results are different when using on the one hand a presence/absence matrix and on the other hand a count matrix. The results are the same, when the count matrix is converted to a binary matrix beforehand.

 

What is your question?

*Sadly, we had to bring back ads too. Hopefully more targeted.