Wednesday, December 26, 2007

Honey Bee Algorithm for Allocating Servers

I came across an interesting article in The Hindu today (see the story from GaTech news; I couldn't find the link on The Hindu website) describing work by Sunil Nakrani and Craig Tovey, researchers at GaTech, on a decentralized, profit-aware load-balancing algorithm for allocating servers to serve HTTP requests for multiple hosted services on the Web. The interesting thing is that the algorithm is based on how honey bees in a hive decide where to collect nectar. I decided to take a look at the paper.

Essentially, forager bees gather information about how profitable a particular nectar source is and what it costs to collect from it (the round-trip time). Based on a composite score, they perform a waggle dance that signals the value of foraging where they have been. Inactive foragers can then figure out where to go looking for nectar.

The researchers modeled this in the server world with an advert board, where servers post the profit from serving a request and the time required to serve it. Other servers can then choose which colony (that is, which hosted service) they wish to join. Existing servers can also move to another colony with a probability looked up in a table indexed by the ratio of their own profit to their colony's profit.
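To make the mechanics concrete, here is a minimal sketch (in Python) of one time step of such a scheme. This is my own illustrative reading of the idea, not code from the paper: the class names, the probability-table values and the profit-weighted advert selection are all assumptions.

```python
import random
from dataclasses import dataclass

@dataclass
class Server:
    colony_id: int            # which hosted service this server currently serves
    profit: float = 0.0       # profit earned in the last time step
    service_time: float = 0.0 # time taken to serve the last request

@dataclass
class Colony:
    mean_profit: float = 0.0  # average profit across the colony's servers

def switch_probability(ratio):
    """Illustrative look-up table, indexed by (own profit / colony profit)."""
    if ratio < 0.5:
        return 0.6   # earning far less than the colony average: likely to leave
    if ratio < 1.0:
        return 0.2
    return 0.02      # doing better than average: almost always stay

def step(servers, colonies):
    # Every server posts an advert: its colony, its profit, its service time.
    advert_board = [(s.colony_id, s.profit, s.service_time) for s in servers]
    for s in servers:
        ratio = s.profit / max(colonies[s.colony_id].mean_profit, 1e-9)
        if random.random() < switch_probability(ratio):
            # Follow an advert, weighted by advertised profit -- the way an
            # idle forager follows a vigorous waggle dance.
            total = sum(p for _, p, _ in advert_board) or 1.0
            r = random.uniform(0, total)
            for colony_id, p, _ in advert_board:
                r -= p
                if r <= 0:
                    s.colony_id = colony_id
                    break

servers = [Server(colony_id=0, profit=1.0), Server(colony_id=1, profit=0.1)]
colonies = {0: Colony(mean_profit=1.0), 1: Colony(mean_profit=0.8)}
step(servers, colonies)
```

The appealing property is that there is no central scheduler: each server decides whether to stay or move using only its own profit, its colony's average and the public advert board.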

Their results indicate that the algorithm does quite well compared to an omniscient-optimal strategy (one that knows the pattern of all future web requests) and better than existing greedy and static-assignment strategies. It shows that we still have a lot to learn from nature!

One thing that flummoxed me, though, was that the original paper seems to have been published way back in 2003 (see Tovey's publication page). I wonder why it got press coverage only now.

[The paper also cites a Harvard Business Review article titled "Swarm Intelligence: A Whole New Way to Think About Business".]

Wednesday, December 19, 2007

Fran Allen and the evolution of HPC

I had the good fortune of listening to Fran Allen (IBM profile and Wikipedia entry) today. Fran pioneered much of the work in compiler optimization and received the Turing Award, the highest honour in computer science, in 2007. That makes her the first and, to date, only woman to have won it.

It was inspiring listening to her talk about her adventures. She practically traced the evolution of high-performance computing, starting with the earliest IBM systems such as Stretch, which was supposed to be 100x faster than existing machines (but turned out to be only 50x faster) and was delivered to the National Security Agency. She also described some of the failures she had been involved in (Stretch, since it was 2x slower than intended, and then the ACS project). What was interesting was that most of the pioneering work she described had its basis in those failed projects. That failures are the foundations of mammoth successes is one message she clearly drove home with her optimistic outlook. She also described her work on the System/360 and PowerPC projects.

Her appeal to computer science researchers and students centred on the programming models and architecture decisions revolving around multi-core, a buzzword most of us have been left confused and wondering about. This new revolution, which promises to change the way we write software and exploit parallelism in our programs, is the biggest opportunity, as Fran put it!

What was also interesting was how we got to the lecture hall, wading through mud in a construction site during the rain. Apparently, there are two conference halls at IISc -- one called JRD Tata and the other JN Tata. Naturally, we got to the wrong one and found that it was hosting a conference on power electronics. Note to self: always check the location properly before setting forth.

Other than that, I have lately been busy hacking Python for S60 (a brilliant idea -- a platform-agnostic scripting language!) to work on my phone, and a Python-based remote administration toolkit. Will post more about them soon!


Wednesday, October 10, 2007

Some Observations on Social Networking

I took a rather long break from reading technology news while I was in London and Abu Dhabi, and have just got back to some of the things I used to follow closely. While it seems I didn't miss much (judging by TechCrunch), I did find it extremely interesting to read the analysis on Chris Anderson's blog.

In one of his posts, he states in no uncertain terms that social networking ought to be leveraged in innovative ways everywhere, rather than having soc-nets that exist for their own sake. The post also captures what is wrong with a new soc-net coming up every day. There is great scope for using social networking as a feature in your product, making it viral, authentic and useful. Unfortunately, a host of internet startups seem to want only to create social networks without a thought to how they can be useful. It reminds me of the heyday of the internet age, when it had become fashionable to start a new business with a .com suffix even if all you did was register a company. Something very similar seems to be happening to social nets now -- people want a social networking component in everything without considering whether it makes sense in the context of the product. That is the point: if you want your product to be useful in the new world, you need a clearly thought-out social component, because otherwise somebody else will add one and peddle your own product better.

Another observation I found interesting was that Facebook applications (all the rage nowadays) don't seem to exhibit the Pareto-distribution characteristics Anderson uses to explain the Long Tail. He attributes this mainly to the effects of viral social networking, to most apps being pretty much useless, and possibly to "limited" shelf real estate. I agree with the first two points, but not so much with the third, because that is also true of the books on my personal bookshelf. The Long Tail is typically defined at the end of the producer/seller, not the consumer. Facebook as a marketplace is governed by the same arguments as Amazon, because there are good search tools and strong recommendation channels. Facebook's collaborative filtering engine is, if anything, stronger than Amazon's, primarily because it is more explicit. Personally, I believe that as a marketplace of apps Facebook is fairly new to the game and may not have reached a 'quiescent' state yet, so modelling based on the premature data we already have might not give correct indicators. Currently it is seeing massive growth characterized by almost factory-produced similarity: the apps on everybody's profile are very similar. My guess, however, is that as things settle down we will start seeing a lot more differentiation (and usefulness!).
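As an aside, checking whether popularity counts look Pareto-like is easy to sketch. Here is a crude diagnostic in Python on made-up numbers: fit a straight line to the log-log rank-size plot (a serious analysis would use maximum-likelihood estimation, but the idea is the same).

```python
import math

# Hypothetical install counts for ten apps, most popular first.
installs = [120000, 95000, 40000, 8000, 3000, 900, 400, 120, 40, 10]

xs = [math.log(rank) for rank in range(1, len(installs) + 1)]
ys = [math.log(count) for count in installs]

# Ordinary least-squares slope: a power law gives a straight line,
# whose slope approximates the (negative) Pareto exponent.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
print("fitted exponent ~", -slope)
```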

To use a metaphor: currently Facebook is signing on children who are just discovering the ABCs of its apps, and hence the apps that have become very popular are like children's books, characterized by sameness. As the readership matures, people will gravitate toward things tuned to their personal interests, and will demand books that are useful in content and perhaps design. That is when we will start seeing the Long Tail effect kick in. In a similar fashion, the early days of the internet were dominated by technology-centric pages, and it was only much later in its evolution that the internet became the downtown for all kinds of information.

However, the different dynamics of the Facebook social network mean that the marketplace characteristics are never going to be identical to those of (say) Amazon. Since the viral network is far stronger, if we view the network as a graph we will see many more cliques: users are far more tempted to install apps that are common in their friends' networks. If we look at the graph at a coarser granularity, though, there might be less interaction across cliques than for a traditional retailer like Amazon -- seen from far away, we would see more isolated vertices. The fact that installing apps is (for now) free will also alter the dynamics considerably.
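The cliquishness claim is actually measurable. The standard yardstick is the average local clustering coefficient; here is a minimal sketch on a toy friendship graph (the data, and the premise that app installs track friendships, are my assumptions).

```python
from itertools import combinations

# Toy undirected friendship graph: an edge means two users are friends.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}

def clustering(node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return 2 * links / (k * (k - 1))

avg = sum(clustering(n) for n in graph) / len(graph)
print("average clustering coefficient:", avg)  # closer to 1 => more cliquish
```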

It will be interesting to see how things pan out.

Sunday, August 19, 2007

It all has to start with I, doesn't it?

It always has to start with the self. The self is the center of the world in the brand-new avatar of the Internet. While it feels gratifying to be acknowledged as The Master of the world, I would perhaps have been more comfortable just having the royal seal at my disposal. However, omnipotent as we might feel, we have to realize that in an increasingly fragmented world we need better techniques for establishing ourselves. The self needs better means of self-expression and self-authority. And thus my first post on my new technical blog starts with a discussion of identity management systems on the Internet.

A discussion of identity management systems has to start with the Laws of Identity, penned by the granddaddy of all-things-identity at Microsoft, Kim Cameron. Contrary to what people might expect, the laws are not written in technical language with complex cryptographic equations making them esoteric, but in very accessible prose, because they address the philosophical aspects of identity more than the technical ones -- an important consideration in the design of a mature technical system. The seven laws (over-simplifying them) are:

  1. User Control and Consent: The user is the King, the Queen and the Jack. The identity meta-system must recognize the user as the final authority on whether information is disclosed, and must ask him/her at every instance. It should also provide protection against phishing and other attacks.
  2. Minimum Disclosure for a Constrained Use: The information disclosed should be the minimum required to complete the current task. Essentially, there should be no need to disclose credit card information to comment on this blog. Likewise, if a site needs only a single bit of information -- whether a person is over 18 (as many do!) -- it should not ask for the date of birth, since that divulges more information. (A small sketch of this idea follows the list.)
  3. Justifiable Parties: This comes from the failure of the over-arching vision of the Microsoft Passport identity management system. The law states that an identity provider, and each of its interactions, should have a justifiable need for the identity information involved. Essentially, there is no need to unify my Social Security Number or Tax Identification Number with my MySpace account. Users may not be comfortable having one identity system for all uses; I may not want to divulge my company identity when surfing objectionable material online.
  4. Directed Identity: This, to me, seems like a corollary of laws 2 and 3. It says there should be unidirectional identity handles that don't reveal more about an identity than required. For instance, if my employer allows me to access IEEE journals ex officio, IEEE should not be able to get my identity handle beyond the fact that I work for a company that grants me access. Identity providers can act as 'beacons' emitting identity information as allowed by the users, but establishing an identity relationship with one should be uni-directional. This is essentially to prevent correlation of identity handles. Cookies are an example: while a cookie might authenticate a user to a widget, cookies cannot be shared across sites, which avoids correlation. Of course, there are ways to defeat this, and those are exactly the undesirable instances.
  5. Pluralism of Operators and Technologies: Cameron states that a single monolithic system can never cover all our identity needs. A person might well want separate providers (Windows domain authentication, OpenID, PayPal) and technologies (Kerberos, Web Services) for different use-scenarios, and may not want them correlated, for obvious reasons.
  6. Human Integration: Cameron makes the point that we need better UI design to prevent identity theft and to ensure privacy during the interaction between the human and the terminal on which they authenticate. There can be many a slip between the cup and the lip, and this is becoming all the more apparent thanks to phishing and other attacks. We need better methods to prevent identity systems masquerading as others, and more secure means of communication between users and their terminals for exchanging identity information (biometrics?).
  7. Consistent Experience across Contexts: Cameron argues for a universal interface for entering identity information across the various kinds of identities we might maintain (professional, personal, financial), though the point seems tailored to Windows InfoCard (I'll talk about that later). It seems inspired by the different identity cards we carry in our wallets -- driving license, employer ID and so on -- each of which offers the same experience: show the card and gain access.
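
Law 2 in particular lends itself to a concrete illustration. Instead of shipping a date of birth, an identity provider can release a single derived claim. The sketch below is a toy example of my own, not any real identity-provider API:

```python
from datetime import date

def over_18_claim(date_of_birth: date, today: date) -> dict:
    """Derive the one bit the relying party needs; the birth date never leaves."""
    age = today.year - date_of_birth.year - (
        (today.month, today.day) < (date_of_birth.month, date_of_birth.day))
    return {"claim": "over_18", "value": age >= 18}

print(over_18_claim(date(1985, 3, 14), date(2007, 8, 19)))
# -> {'claim': 'over_18', 'value': True}
```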

It is great to have somebody's wisdom and experience captured so concisely in a set of seven rules. That is what lets us stand on the shoulders of giants and build bigger and better technologies.

The laws are simple, intuitive, practical and extremely general. I think that generality is also their biggest undoing: since the laws are not given formal semantics in a mathematical language, it is very easy to read ambiguity and doubt into their interpretation. (A mathematical formulation of something as general as identity is not easy either.) And because they are written in such general language, loopholes abound, and an actual identity system would have to do a lot of thinking to be robust, secure and private. I would request Cameron to explore more formal means of expressing the laws, to publish extensive case studies (I may not have looked very carefully for them), and to discuss privacy and security -- concepts becoming more pertinent by the day -- at greater length. I would also like to see more discussion from the perspective of the identity system itself: identifying bots, using CAPTCHAs, and establishing the authenticity of information a user enters (is the user really over 18?). He should perhaps consider writing a book!

A purely theoretical discussion of identity systems is not of much use, so let me discuss some systems in use today. The simplest by far is the login-and-password form backed by a text file or database, which you can implement in under an hour. My guess is that it is a fairly robust solution for most simple sites. The downside is a registration process and yet another set of usernames and passwords to remember. That most of us use the same usernames and passwords on every site is a matter of convenience as well as a significant security threat: if any one site is compromised (quite possible, since such under-an-hour hacks cannot plausibly maintain the highest standards of software quality), the risk of all your accounts being compromised is high. It is also very difficult to ensure consistent interfaces and secure transactions, and varying privacy policies mean the user's control over information divulged to one party is rather suspect. Still, these systems serve their purpose: quick and dirty, and they work well in a rather large number of scenarios.
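Even the under-an-hour version can avoid the worst pitfall, which is storing passwords in the clear. A minimal sketch (standard-library Python, illustrative only -- the in-memory dict stands in for the text file or database):

```python
import hashlib
import hmac
import os

users = {}  # username -> (salt, hex digest); stands in for the text file/DB

def register(username, password):
    salt = os.urandom(16)  # per-user salt defeats precomputed tables
    digest = hashlib.sha256(salt + password.encode()).hexdigest()
    users[username] = (salt, digest)

def login(username, password):
    if username not in users:
        return False
    salt, digest = users[username]
    candidate = hashlib.sha256(salt + password.encode()).hexdigest()
    # Constant-time comparison avoids leaking information via timing.
    return hmac.compare_digest(candidate, digest)

register("alice", "s3cret")
print(login("alice", "s3cret"))  # True
print(login("alice", "wrong"))   # False
```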

Of course, identity is very well understood in the enterprise setting. Kerberos and the Lightweight Directory Access Protocol (LDAP) have been around for ages and have been the subject of a lot of research. There are standard implementations that can be used as black boxes, and single sign-on within a single enterprise is probably a well-solved problem (a rather speculative statement, admittedly). It is also a much easier problem, because the scope of privacy and security is a single enterprise intranet, and both the problems and their solutions are primarily technical. If, however, we consider a federated identity management system for the whole of the Internet, the scope is much larger, and the deliberations are not just technical but philosophical as well, since it involves trust between parties who don't trust each other :)
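For completeness, here is what the black box usually looks like from the application side: a simple LDAP bind, using the third-party python-ldap package. The server URI and the DN layout below are placeholders; real directories differ.

```python
import ldap  # third-party python-ldap package

def ldap_authenticate(username, password):
    conn = ldap.initialize("ldap://ldap.example.com")  # placeholder URI
    # Placeholder DN layout; the directory schema dictates the real one.
    dn = "uid=%s,ou=people,dc=example,dc=com" % username
    try:
        # A successful simple bind with the user's DN and password is the
        # usual "is this password correct?" check in an enterprise intranet.
        conn.simple_bind_s(dn, password)
        return True
    except ldap.INVALID_CREDENTIALS:
        return False
    finally:
        conn.unbind_s()
```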

Another concept that aims at convenience is OpenID, a federated identity management system. The aim is simple: use identification information established with one site to automatically establish it with others. For instance, if you have a WordPress blog and want to leave a comment on LiveJournal, you can provide your WordPress blog URL and LiveJournal automatically uses web services to establish your identity. There is a user-consent phase, and since the system is not controlled by a single party, many prefer it (unlike Passport). The scheme works well for simple, public-facing single sign-on. It has recently been backed by AOL and Microsoft, which has lent a lot of weight to OpenID. However, the system only establishes a basic protocol: the OpenID site unequivocally states that it is not a trust system and does not try to control spam. I would also worry about using it in a general setting, because if one site is compromised the taint can spread across the federated system (this probably needs more study). Another problem is that, since OpenID itself is rather vague about security and a number of other points, I can very much envisage individual corporations coming up with their own standards (much as happened with JavaScript), yielding a number of child-protocols that are perhaps not interoperable.
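To give a flavour of the protocol, here is the first leg of an OpenID 1.x login: after discovering the user's provider from their claimed URL, the relying party redirects the browser to the provider with a checkid_setup request. The parameter names are from the OpenID 1.1 specification; all the URLs below are placeholders, and the discovery step itself is elided.

```python
from urllib.parse import urlencode

def openid_redirect_url(provider_endpoint, claimed_identity,
                        return_to, trust_root):
    params = {
        "openid.mode": "checkid_setup",
        "openid.identity": claimed_identity,  # the URL the user typed in
        "openid.return_to": return_to,        # where the provider sends the user back
        "openid.trust_root": trust_root,      # the site asking for authentication
    }
    return provider_endpoint + "?" + urlencode(params)

print(openid_redirect_url("https://provider.example.com/openid/server",
                          "http://example.wordpress.com/",
                          "http://consumer.example.com/openid/return",
                          "http://consumer.example.com/"))
```

The provider authenticates the user, obtains consent, and redirects the browser back to the return_to URL with a signed assertion that the relying party then verifies.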

Microsoft is promoting Windows CardSpace (née Information Card, among many other names). This follows the common practice of lifting paradigms from the real world into the virtual. A user holds a number of cards provided by various identity providers, which Windows stores securely. When a website (the relying party) wishes to establish the user's identity, the user is presented with a secure dialog where he or she chooses which identity information to transmit -- much like looking into your wallet and taking out either your business card or your driving license as required. Microsoft provides a number of cryptographic protocols as the bedrock of secure transmission, but the initiative cannot succeed without the participation of the other parties involved (one of its biggest problems, given the intense competition). I am sure it satisfies Cameron's laws, since Cameron was obviously involved in its development. However, I can easily foresee the real-world problems getting lifted across as well: what happens when my wallet is lost (laptop stolen, or even virus-infected), when people cheat about credentials, or when relying parties pass information around (which could compromise the whole system!)?

On the Internet itself, identity for very specific applications has been worked out to some extent. PayPal and Google Checkout establish your identity for financial transactions and have become hugely popular. And one of the oldest technologies on the internet -- email -- still remains the most popular means of establishing identity in the online realm. How much progress have we really made in the last decade or two?

Considering that identity is not a completely solved problem even in the real world, my guess is that the virtual world will only lag behind. There are a lot of new technologies and ideas, and we will have to wait and see which ones click. My humble guess, though, is that, as Cameron himself proffers, there will be a pluralism of operators and technologies. The application and the usage scenario should be clearly delineated before any system is designed, and it is easier and more viable to solve specific needs (financial identity, the enterprise setting): scoping the usage makes the problem tractable and leads to success (perhaps after a few iterations). My biggest gripe is that none of the current technologies clearly scopes its work.

[Another review of identity-related technologies is at Read Write Web. There is also a conference, the Internet Identity Workshop. If you want a fleeting identity to log in to sites that unnecessarily demand login, check out Bug Me Not. Thanks to Mohit for some initial pointers.]