In the previous article I wrote about storage of Identity Attributes. We saw that we’ll allow Identity Attributes to be stored on multiple storage nodes but that they should not be stored on all storage nodes. Let’s look at this in more detail.
For this article let us assume:
- We store a total of P identities on M storage nodes. Every identity $I_p$ has a variable number $n$ of identity attributes, where $p$ is the unique entity number.
In previous articles on the subject I’ve been using an informal math notation that most people would intuitively understand. It is now time to get more mathematically correct in our notation, as we will start to dive into some actual calculus.
Vector Notation
While the earlier notation of $I = \sum_{i} IA(i)$ is fairly intuitive, it is not a formally correct notation. It implies that I is a scalar value formed by summing the function values IA(i). Of course we cannot add the function values IA(i) into one scalar value, as we would really be trying to ‘add’ values like “blue” (i=’eyecolor’), “weakpassword” (i=’facebookpassword’) etc., which obviously isn’t very meaningful. Instead, we’ll switch from the summation notation to a vector notation:
$$\vec{I}_p = \begin{pmatrix} IA[1] \\ IA[2] \\ \vdots \\ IA[n] \end{pmatrix}$$
In which we define the identity of entity p to be an n-dimensional vector consisting of the n elements $IA[1]$ through $IA[n]$.
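Similarly, we can define our storage network as a vector:
$$\vec{S} = \begin{pmatrix} S[1] & S[2] & \cdots & S[m] \end{pmatrix}$$
in which m is the number of storage nodes. A matrix with n rows and m columns will be notated as $A_{n \times m}$.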
Projection of $\vec{I}$ on $\vec{S}$
We can project our identity on our storage network by means of a projection matrix $A_{n \times m}$. We first transpose $\vec{I}_p$, making it a 1×n vector, and then multiply it with $A_{n \times m}$. Multiplying a 1×n vector such as $\vec{I}_p^T$ with an n×m matrix like $A_{n \times m}$ results in a 1×m vector such as $\vec{S}$. Let’s demonstrate how this works by defining a simple identity first:
$$\vec{I}_1 = \begin{pmatrix} \text{firstname} \\ \text{lastname} \\ \text{email} \end{pmatrix}$$
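This is identity 1 holding three attributes: firstname, lastname and email. Let’s now assume our storage network consists of 10 nodes. To project the three IA’s onto our 10 nodes we need a 3×10 matrix; let’s take the following as an example:
$$A = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$
Multiplying $\vec{I}_1^T$ with A results in:
$$\vec{I}_1^T A = \begin{pmatrix} IA[1] & IA[2] & IA[3] & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$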
Which is our 10 node storage vector indicating that attribute 1 gets stored on node 1, attribute 2 gets stored on node 2 and attribute 3 gets stored on node 3 while the other nodes contain no IA’s.
So with this particular projection matrix we see that there is no redundancy at all.
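The projection above can be sketched in a few lines of Python. This is only an illustration, not part of the design: the function name `project` and the use of plain lists are assumptions of mine, and because attribute values are not numbers, the "multiplication" is modelled as selecting, for each node, the attributes whose matrix entry is 1.

```python
def project(identity, matrix):
    """Project an identity onto the storage nodes.

    identity: list of n attribute values (the vector I).
    matrix:   n rows by m columns of 0/1 entries (the projection matrix A).
    Returns a list of m entries; entry j lists the attributes stored on node j.
    """
    n_nodes = len(matrix[0])
    return [
        [identity[i] for i in range(len(identity)) if matrix[i][j] == 1]
        for j in range(n_nodes)
    ]


# Identity 1 with its three attributes (placeholder values).
identity = ["firstname", "lastname", "email"]

# The 3x10 example matrix: attribute k goes to node k, nothing else is used.
A = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
]

storage = project(identity, A)
for node, attributes in enumerate(storage, start=1):
    print(f"node {node}: {attributes}")
```

Nodes 1 to 3 each receive one attribute and nodes 4 to 10 stay empty, which matches the storage vector described above.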
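Should we choose for A:
$$A = \begin{pmatrix} 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \end{pmatrix}$$
and multiply $\vec{I}_1^T$ with it, we would get:
$$\vec{I}_1^T A = \begin{pmatrix} IA[1] & IA[2] & IA[3] & IA[1] & IA[2] & IA[3] & 0 & 0 & 0 & 0 \end{pmatrix}$$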
which means to store attribute IA[1] on nodes 1 and 4, attribute IA[2] on nodes 2 and 5 and attribute IA[3] on nodes 3 and 6. We see that we have now introduced a redundancy of 2, meaning that every IA[n] is stored twice on the storage network.
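As a rough illustration of how the redundancy can be read off a projection matrix: with 0/1 entries, the sum of a row is the number of nodes that hold that attribute. The helper names below are hypothetical and only serve this sketch.

```python
def copies_per_attribute(matrix):
    """Number of nodes holding each attribute: the row sums of the matrix."""
    return [sum(row) for row in matrix]


def nodes_per_attribute(matrix):
    """1-based node numbers on which each attribute is stored."""
    return [[j + 1 for j, value in enumerate(row) if value == 1] for row in matrix]


# The second example matrix: every attribute is placed on two nodes.
A = [
    [1, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
]

copies = copies_per_attribute(A)
placement = nodes_per_attribute(A)
print(copies)     # [2, 2, 2]
print(placement)  # [[1, 4], [2, 5], [3, 6]]
```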
What we see from this example with two different projection matrices is that we can vary the redundancy by choosing a different projection matrix. It is easy to see that the redundancy varies between 0 and m, where m is the number of storage nodes. With a redundancy of m, however, all the IA[n] would be stored on every storage node. This means that every node would have access to all IA[n] as well! While we intend to apply a one-way encryption to the IA’s to prevent node owners from deanonymizing them, we’re fully aware that some data will be deanonymized regardless. Encryption is a good way to throw up a barrier against deanonymizing data, but it should never be seen as complete prevention. We therefore set two rules for the redundancy:
- Redundancy should be at least 2
- Redundancy should be < m/2
By the above rules we set a lower limit for the redundancy, guaranteeing that when one storage node drops out for any reason the storage network would still contain a copy of the IA. We also set an upper limit that guarantees that no single storage node will ever have all IA’s of an identity.
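A minimal sketch of how a candidate projection matrix could be checked against these two rules, assuming 0/1 entries and taking the number of columns as the number of storage nodes. All function names are hypothetical. The last helper is an extra check of mine, not one of the rules: it tests directly whether any single node ends up holding every attribute of the identity.

```python
def copies_per_attribute(matrix):
    """Row sums: how many nodes hold each attribute."""
    return [sum(row) for row in matrix]


def satisfies_redundancy_rules(matrix):
    """True if every attribute has at least 2 and fewer than m/2 copies."""
    m = len(matrix[0])  # number of storage nodes
    return all(2 <= c < m / 2 for c in copies_per_attribute(matrix))


def survives_single_node_failure(matrix):
    """True if every attribute still has a copy after any one node drops out."""
    m = len(matrix[0])
    for failed in range(m):
        for row in matrix:
            remaining = sum(v for j, v in enumerate(row) if j != failed)
            if remaining == 0:
                return False
    return True


def some_node_holds_everything(matrix):
    """True if at least one node stores all attributes of the identity."""
    m = len(matrix[0])
    return any(all(row[j] == 1 for row in matrix) for j in range(m))


single = [[1, 0, 0] + [0] * 7, [0, 1, 0] + [0] * 7, [0, 0, 1] + [0] * 7]
double = [[1, 0, 0, 1] + [0] * 6, [0, 1, 0, 0, 1] + [0] * 5, [0, 0, 1, 0, 0, 1] + [0] * 4]
full = [[1] * 10 for _ in range(3)]

for name, matrix in [("single", single), ("double", double), ("full", full)]:
    print(name,
          satisfies_redundancy_rules(matrix),
          survives_single_node_failure(matrix),
          some_node_holds_everything(matrix))
```

Of the three matrices, only the one that stores every attribute twice passes the redundancy rules. The matrix with a single copy per attribute fails the lower limit, and the all-ones matrix fails the upper limit.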