Sunday, August 19, 2007

It all has to start with I, doesn't it?

It always has to start with the self. The self is the center of the world in the brand new avatar of the Internet. While it feels gratifying to be acknowledged as The Master of the world, I would perhaps have been more comfortable just having the royal seal at my disposal. However, idempotent as we might be, we have to realize that in an increasingly fragmented world we need better techniques for establishing ourselves. The self needs better means of self-expression and self-authority. And so my first post on this new technical blog starts with a discussion of identity management systems on the Internet.

A discussion of identity management systems has to start with the Laws of Identity, penned by Kim Cameron, the granddaddy of all things identity at Microsoft. Contrary to what one might expect, the laws are not written in technical language made esoteric by complex cryptographic equations, but in very accessible language, because they deal more with the philosophical aspects of identity than with the technical ones, a very important consideration in the design of a mature technical system. The seven laws (over-simplified) are:

  1. User Control and Consent: The user is the King, the Queen and the Jack. The identity meta-system must recognize the user as the final authority on whether his or her information is disclosed, and ask for consent every time. It should also protect the user against phishing and other attacks.
  2. Minimum Disclosure for a Constrained Use: The information disclosed should be the minimum required to complete the current task. There should be no need to disclose credit card information to comment on this blog. Likewise, if a site only needs the single bit of information of whether a person is over 18 (as many do!), it should not ask for the date of birth, since that divulges more information.
  3. Justifiable Parties: This law draws on the failure of the over-arching vision of Microsoft's Passport identity management system. It states that identifying information should be disclosed only to parties that have a necessary and justifiable place in a given identity relationship. There is no need to tie my Social Security Number or Tax Identification Number to my MySpace account. Users may not be comfortable having one identity system for all uses; I may not want to divulge my company identity when surfing objectionable material online.
  4. Directed Identity: This, to me, seems like a corollary of laws 2 and 3 above. It says that there should be unidirectional identity handles that reveal no more about the identity than is required. For instance, if my employer gives me ex-officio access to IEEE journals, IEEE should not get my identity handle, only the information that I work for a company that allows me access. Public entities such as websites can be like 'beacons', broadcasting an omnidirectional identifier that anyone can see, but the relationship a user establishes with each of them should be unidirectional. This essentially prevents the correlation of identity handles. Cookies are an example: a cookie can authenticate a user to one site, but browsers do not share it with other sites, which prevents correlation. Of course, there are ways to defeat this, and those are precisely the undesirable cases.
  5. Pluralism of Operators and Technologies: Cameron states that no single monolithic system can ever be enough for all our identity needs. A person may well want separate providers (Windows domain authentication, OpenID, PayPal) and technologies (Kerberos, Web Services) for different usage scenarios, and may not want to correlate them, for obvious reasons.
  6. Human Integration: Cameron makes the point that we need better UI design to prevent identity theft and ensure privacy during the interaction between the human and the terminal on which they authenticate. There can be many a slip between the cup and the lip, and this is becoming all the more apparent thanks to phishing and other kinds of attacks. We need better methods to prevent one identity system from masquerading as another, and more secure means of communication between users and their terminals for exchanging identity information (biometrics?).
  7. Consistent Experience across Contexts: Cameron argues for a consistent interface for presenting identity information across the various kinds of identities we might maintain (professional, personal, financial), though the point seems aimed mostly at Windows InfoCard (I'll talk about that later). It seems inspired by the different identity cards we carry in our wallets, such as a driving license or an employer ID card, each of which offers the same experience: show the card and gain access.
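Laws 2 and 4 are the most mechanical of the seven, so here is a small sketch of what they might look like from an identity provider's side. It assumes (my assumption, not anything Cameron prescribes) that the provider derives a per-site handle as an HMAC of the site's name under a secret held for the user, and answers an "over 18?" question with a single boolean instead of releasing the date of birth. All names (`USER_SECRETS`, `pairwise_id`, `assertion_for`, the `.example` sites) are hypothetical.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical identity-provider records. In a real system these would live
# in the provider's protected storage and never be sent to relying parties.
USER_SECRETS = {"alice": b"alice-master-secret"}
USER_BIRTH_DATES = {"alice": date(1980, 5, 17)}


def pairwise_id(user: str, relying_party: str) -> str:
    """Per-site handle: stable for one relying party, unrelated across parties."""
    return hmac.new(USER_SECRETS[user], relying_party.encode(), hashlib.sha256).hexdigest()


def is_at_least(user: str, years: int, today: date) -> bool:
    """Answer a yes/no age question without exposing the birth date."""
    born = USER_BIRTH_DATES[user]
    # Subtract one year if this year's birthday has not arrived yet.
    age = today.year - born.year - ((today.month, today.day) < (born.month, born.day))
    return age >= years


def assertion_for(user: str, relying_party: str, today: date) -> dict:
    """The only claims released to a site that just needs to know 'over 18?'."""
    return {
        "subject": pairwise_id(user, relying_party),  # directed identity (law 4)
        "over_18": is_at_least(user, 18, today),      # minimum disclosure (law 2)
    }


if __name__ == "__main__":
    today = date(2007, 8, 19)
    a = assertion_for("alice", "ieee.example", today)
    b = assertion_for("alice", "myspace.example", today)
    print(a["over_18"], a["subject"] != b["subject"])  # prints: True True
```

The same site sees the same handle on every visit, so it can still recognize a returning user, but two sites comparing notes see unrelated strings they cannot correlate, and neither ever learns the birth date.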

It is great to have somebody's wisdom and experience captured so concisely in a set of seven rules. That is what lets us stand on the shoulders of giants and build bigger and better technologies.

The laws seem simple, intuitive and practical, and are extremely general. I think that is also their biggest undoing: since the laws are not given formal semantics in a mathematical language, their interpretation is open to ambiguity and doubt. (A mathematical formulation of something as general as identity is not easy either.) Also, since they are written in such general language, there can be many loopholes, and an actual identity system would need a lot of careful thought to be robust, secure and private. I would only ask Cameron to explore more formal ways of expressing these laws, to provide extensive case studies (I may not have looked very carefully for them), and to discuss at greater length privacy, security and related concepts, which are becoming more pertinent by the day. I would also like to see more discussion from the perspective of the identity system: things such as identifying bots, using CAPTCHAs, and establishing the authenticity of information a user enters (is the user really over 18?). He should perhaps consider writing a book!

A theoretical discussion of identity systems is not of much use on its own, so I will try to discuss some systems in use today. The simplest by far is the plain login/password form backed by a text file or database, which you can implement in under an hour. My guess is that it is a pretty robust solution for most simple sites. The downsides are a registration process and one more username and password to remember. The fact that most of us use the same username and password on practically every site is a matter of convenience as well as a significant security threat. If any one of those sites is compromised (which is quite possible, because such under-an-hour hacks cannot possibly maintain the highest standards of software quality), the risk of all your accounts being compromised is quite high. It is also very difficult to ensure consistent interfaces and secure transactions, and varying privacy policies mean that the user's control over information divulged to one party is rather suspect. However, such forms serve their purpose: the method is quick and dirty, and works well in a rather large number of scenarios.
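Since the under-an-hour login form is where most of us start, here is a minimal sketch, using only Python's standard library, of the one thing such a hack should not skip: storing salted, deliberately slow password hashes instead of the passwords themselves. The in-memory `users` dictionary is a hypothetical stand-in for the text file or database, and the iteration count is an assumed value rather than a recommendation.

```python
import hashlib
import hmac
import os
from typing import Dict, Tuple

ITERATIONS = 100_000  # assumed work factor; tune for your hardware

# Hypothetical stand-in for the text file/database: username -> (salt, derived key)
users: Dict[str, Tuple[bytes, bytes]] = {}


def register(username: str, password: str) -> None:
    salt = os.urandom(16)  # fresh random salt per user, so equal passwords hash differently
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    users[username] = (salt, key)


def login(username: str, password: str) -> bool:
    record = users.get(username)
    if record is None:
        return False
    salt, stored = record
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, stored)


if __name__ == "__main__":
    register("alice", "s3cret")
    print(login("alice", "s3cret"), login("alice", "wrong"))  # prints: True False
```

This does nothing about password reuse, but it makes a leaked file much less useful for breaking into your other accounts, although weak passwords can still be guessed offline.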

Of course, identity is very well understood in an enterprise setting. Kerberos and the Lightweight Directory Access Protocol (LDAP) have been around for ages and have been the subject of a lot of research. There are standard implementations that can be used as black boxes, and single sign-on within a single enterprise is probably a well-solved problem (a rather speculative statement). It is also a much easier problem, because the scope of privacy, security and so on is a single enterprise intranet, and both the problems and their solutions are primarily technical. If, however, we consider a federated identity management system for the whole Internet, the scope is much larger, and the deliberations are not just technical but philosophical as well, since it involves trust between parties who don't trust each other :)

Another concept that aims at convenience is OpenID, a federated identity management system. The aim is simple: use identification established on one site to automatically establish it on other sites. For instance, if you have a WordPress blog and want to leave a comment on LiveJournal, you can provide your WordPress blog URL, and LJ contacts the identity provider behind that URL (through browser redirects) to establish your identity. There is a user-consent step, and since OpenID is not controlled by a single party (unlike Passport), many prefer it. The scheme works well for simple, public-facing single sign-on. It has recently been backed by AOL and Microsoft, which has lent a lot of weight to OpenID. However, OpenID only establishes a basic protocol. The OpenID site unequivocally states that it is not a trust system and does not try to control spam. I would also be worried about using it in a general setting, because if one site gets compromised the taint could spread across the federated system (this probably needs more study). Another problem is that, since OpenID itself is rather vague about security and a number of other points, I can easily envisage individual corporations coming up with their own standards (much like JavaScript), yielding a number of child protocols that may not be interoperable.
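To make the OpenID flow a little more concrete, here is a rough sketch of the relying party's (LJ's) first two steps: find the user's provider by looking for a `<link rel="openid2.provider">` tag in the page at the URL they typed, then redirect the browser there with a `checkid_setup` request. The parameter names follow OpenID Authentication 2.0 (the 1.x versions use slightly different ones, e.g. `openid.trust_root` instead of `openid.realm`). XRDS/Yadis discovery, associations and verification of the signed response are all left out, the helper names are my own, and every URL is hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urlencode

OPENID2_NS = "http://specs.openid.net/auth/2.0"


class _ProviderLinkFinder(HTMLParser):
    """Find <link rel="openid2.provider" href="..."> in the user's page (HTML-based discovery)."""

    def __init__(self) -> None:
        super().__init__()
        self.endpoint: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.endpoint is not None:
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").split()
        if "openid2.provider" in rels:
            self.endpoint = attributes.get("href")


def discover_endpoint(page_html: str) -> Optional[str]:
    finder = _ProviderLinkFinder()
    finder.feed(page_html)
    finder.close()
    return finder.endpoint


def checkid_setup_url(op_endpoint: str, claimed_id: str, return_to: str, realm: str) -> str:
    """Build the URL the relying party redirects the browser to."""
    params = {
        "openid.ns": OPENID2_NS,
        "openid.mode": "checkid_setup",  # interactive: the provider may prompt the user
        "openid.claimed_id": claimed_id,
        "openid.identity": claimed_id,   # assumes no delegation to another identifier
        "openid.return_to": return_to,   # where the provider sends the browser back
        "openid.realm": realm,           # the site the user is asked to trust
    }
    separator = "&" if "?" in op_endpoint else "?"
    return op_endpoint + separator + urlencode(params)


if __name__ == "__main__":
    page = '<html><head><link rel="openid2.provider" href="https://op.example.com/auth"></head></html>'
    endpoint = discover_endpoint(page)
    print(checkid_setup_url(endpoint, "https://myblog.example.com/",
                            "https://lj.example.com/openid/return", "https://lj.example.com/"))
```

After the user signs in and consents at the provider, the browser comes back to `return_to` carrying a signed assertion, which the relying party must still verify before trusting it.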

Microsoft is promoting Windows CardSpace (née InfoCard, among many other names). This follows the common practice of lifting paradigms from the real world into the virtual one. A user can hold a number of cards issued by various Identity Providers, which Windows stores securely. When a website (a Relying Party) wishes to establish the identity of a user, the user is presented with a secure dialog in which to choose which identity information to transmit, much like looking into your wallet and taking out either your business card or your driving license as required. Microsoft provides a number of cryptographic protocols that form the bedrock of secure transmission, but the initiative cannot succeed without the participation of the other parties involved (one of the biggest problems, given the intense competition). I am sure it satisfies Cameron's laws, since Cameron was obviously involved in its development. However, I can very easily foresee the problems of the real world being lifted along with the paradigm: what happens when my wallet gets lost (laptop stolen, or even infected by a virus), when people cheat about credentials, or when Relying Parties pass information around (which could compromise the whole system!)?
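The wallet metaphor maps onto a very small model. The sketch below is purely illustrative and does not reflect CardSpace's actual APIs or token formats: each hypothetical `Card` lists the claims its issuer can vouch for, the relying party states which claims it needs, only cards that can satisfy the request are offered, the user (represented by the hypothetical `choose` callback) picks one or declines, and only the requested claims leave the wallet.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Card:
    """A toy 'information card': who issued it and which claims it can vouch for."""
    issuer: str
    claims: Dict[str, str] = field(default_factory=dict)


def matching_cards(wallet: List[Card], required: Set[str]) -> List[Card]:
    """Cards able to satisfy every claim the relying party asks for."""
    return [card for card in wallet if required.issubset(card.claims)]


def present(
    wallet: List[Card],
    required: Set[str],
    choose: Callable[[List[Card]], Optional[Card]],
) -> Optional[Dict[str, str]]:
    """Let the user pick a card, then release only the requested claims."""
    candidates = matching_cards(wallet, required)
    if not candidates:
        return None
    card = choose(candidates)  # the user, not the site, makes this decision
    if card is None:
        return None            # the user declined; nothing is sent
    return {name: card.claims[name] for name in sorted(required)}


if __name__ == "__main__":
    wallet = [
        Card("employer.example", {"name": "Alice", "employer": "Acme"}),
        Card("dmv.example", {"name": "Alice", "birth_date": "1980-05-17", "license_no": "X123"}),
    ]
    # A site that needs name and employer is only offered the work card.
    print(present(wallet, {"name", "employer"}, lambda cards: cards[0]))
    # prints: {'employer': 'Acme', 'name': 'Alice'}
```

Even in this toy, the weak spots listed above are visible: anyone holding `wallet` holds every card, and nothing stops a relying party from forwarding the dictionary it receives.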

On the Internet itself, identity for very specific applications has been worked out to some extent. PayPal and Google Checkout establish your identity with respect to financial transactions, and have become hugely popular. Yet one of the oldest technologies on the Internet, email, still remains the most popular means of establishing your identity online. How much progress have we really made in the last decade or two?
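Email works as an identity mechanism mostly through the familiar confirmation link: the site mails a hard-to-forge token to the address, and clicking it proves control of the inbox. Here is a minimal sketch, assuming the token is an HMAC over the address and an expiry time under a server-side secret; `SERVER_KEY`, the 24-hour lifetime and the function names are all hypothetical choices.

```python
import hashlib
import hmac
import time
from typing import Optional

# Assumption: a secret known only to the site; it never appears in the email.
SERVER_KEY = b"hypothetical-server-secret"
TOKEN_LIFETIME = 24 * 3600  # seconds; an arbitrary choice for the sketch


def _sign(email: str, expires: int) -> str:
    message = f"{email}|{expires}".encode()
    return hmac.new(SERVER_KEY, message, hashlib.sha256).hexdigest()


def make_token(email: str, now: Optional[int] = None) -> str:
    """Token to embed in a 'click here to confirm' link mailed to `email`."""
    now = int(time.time()) if now is None else now
    expires = now + TOKEN_LIFETIME
    return f"{expires}.{_sign(email, expires)}"


def verify_token(email: str, token: str, now: Optional[int] = None) -> bool:
    """True only if the token was issued for this address and has not expired."""
    now = int(time.time()) if now is None else now
    try:
        expires_text, signature = token.split(".", 1)
        expires = int(expires_text)
    except ValueError:
        return False  # malformed token
    if now > expires:
        return False  # the link is too old
    expected = _sign(email, expires)
    # Constant-time comparison; encoding first means odd input cannot raise.
    return hmac.compare_digest(signature.encode(), expected.encode())
```

Note that this proves only control of the mailbox, which is exactly why a compromised email account tends to unlock everything else.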

Considering that identity is a problem not completely solved even in the real world, my guess is that the virtual world will only lag behind. There are a lot of new technologies and ideas, and we have to wait and see which ones click. My humble guess, as Cameron himself proffers, is that there should be a pluralism of operators and technologies. The application and the usage scenario should be clearly delineated before starting to design any system (which is so true!), and it is easier and more viable to solve specific needs (financial identity, the enterprise setting). Scoping the usage always makes the problem tractable and leads to success (perhaps after a few iterations). My biggest gripe is that none of the current technologies clearly scopes its work.

[Another review of identity-related technologies is at Read Write Web. There is also a conference, the Internet Identity Workshop. If you want a fleeting identity for logging in to sites that needlessly require a login, check out BugMeNot. Thanks to Mohit for some initial pointers.]