I gave a talk recently at the Groupon office in Palo Alto, on “Retail from the Other Side: Learning from Working with POS Systems”, and wanted to share the slides for the same.
Retail systems are a complex bunch. If retail is all about detail, its systems are all about variety, and the variety that I have seen in retail systems over the last few years is mind-boggling. If you want to build anything for these systems, you have to take into account the store layout and architecture, and whether the stores run on a POS system, a regular PC, a thin client, or even a virtualized environment like Citrix. You have to deal with antiquated system configurations, low memory, and challenging connectivity issues.
Perhaps some of the biggest challenges are human-related: adoption and training. Because retail is a very geographically distributed operation, it becomes very difficult to retrain associates and ensure they are positioned to use complex systems in a simple and efficient manner. You have to deal with language issues, remote connectivity, and busy store hours.
No wonder building store systems, and building for store systems, is not for the faint-hearted.
I was wondering what means can be employed to control this — and there are a few strategies that can be used here:
- Input Cleanup – Always clean up inputs when you are accepting anything from a foreign agent (user, website, API). This should include stripping script tags, guarding against SQL injection, etc.
- Secure Account Settings – Ensure that before changing account settings, users are at least made to enter their password once again, or that the settings live at a separate location (https) that prevents the same authentication tokens from being used. Yahoo and Google do this for all important account settings.
- Sandbox External Code – If you do have to run any custom code that the user sends your way, run it in a sandbox. Rather than giving it access to all your data structures, create a new data structure, populate it appropriately, and let the code emit its results in some predefined format (say, XML). You can then parse the results and display them. Showing users’ code directly can be quite dangerous.
- Extra authentication for APIs – Give the API an extra authentication token, say an API key, so that users cannot access your APIs without it. The challenge here is distributing this extra information. This can be done either by asking users to enter an extra API key when they give API access to somebody, or by making the software pass through an API validation step (a la OpenID, or Vista UAC) that hands out the API key only after correctly informing the user.
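To make the input cleanup tip a bit more concrete, here is a minimal Python sketch using only the standard library. The `comments` table and the function names `render_comment`, `save_comment` and `demo` are hypothetical and exist only for illustration; a real application would likely lean on its framework's templating and database layers rather than hand-rolled helpers.

```python
import html
import sqlite3


def render_comment(raw_text: str) -> str:
    """Escape user text before placing it in an HTML page."""
    # html.escape turns <, >, & and quotes into entities, so a
    # <script> tag is displayed as text instead of being executed.
    return "<p>" + html.escape(raw_text, quote=True) + "</p>"


def save_comment(conn: sqlite3.Connection, author: str, body: str) -> None:
    """Store a comment using a parameterized query."""
    # The ? placeholders keep user input out of the SQL text itself,
    # so hostile input is stored as data and never run as SQL.
    conn.execute(
        "INSERT INTO comments (author, body) VALUES (?, ?)",
        (author, body),
    )
    conn.commit()


def demo() -> tuple:
    """Run both helpers against an in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (author TEXT, body TEXT)")
    hostile = "x'); DROP TABLE comments; --"
    save_comment(conn, "mallory", hostile)
    stored = conn.execute("SELECT body FROM comments").fetchone()[0]
    return stored, render_comment("<script>alert(1)</script>")
```

The hostile string ends up stored verbatim as data, and the script tag is rendered as harmless text. Escaping at output time and parameterizing at query time are separate defences, and both are needed.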
Do you have any other tips?
I still remember Bitwise 2006 very fondly: all the last-minute action, with the teams participating and the running around to arrange problems and solutions and make sure everything ran properly. And it’s been 2 years since then. I am very proud to see Bitwise 2008 progressing so well, with teams from almost 43 countries and clicks from 75! It can’t get any bigger than this. It all started in 2001, and has come so far since then!
If you fancy yourself an ace programmer, and you think you can unravel the double helix in your sleep, if you live and dream algorithms and crunch numbers when nobody’s looking at you, you gotta participate in Bitwise, the real test of your abilities. It’s the largest algorithm-intensive online programming contest in India, organized completely by students from one of the best engineering institutions in the country. You compete with the best brains in the area worldwide, and there is a sweet USD 2500 on offer as prizes.
Do you have it in you? Visit the Bitwise 2008 site and register NOW.
In a previous post on the honey-bee algorithm for allocating servers, I had pointed out that the paper referred to an article on Swarm Intelligence by Eric Bonabeau and Christopher Meyer, published in Harvard Business Review. I finally got time to go back and read it, and I found it quite fascinating! The paper describes case studies where people have used algorithms inspired by nature (ants, bees) which use a decentralized model of computation and optimization.
The paper points out that the main advantages of using algorithms like these are flexibility, robustness and self-organization. The algorithms work in a completely decentralized manner, on the principle that the wisdom of all the ants (or the small agents) can be harnessed in such a manner that the whole is far greater than the sum of its parts. Also, the algorithms are invariably robust to failure and adaptive, since they don’t rely on a central decision-making body and there is a lot of experimentation with new sources of food (or results, in the case of algorithms).
The paper also points out that there are several cases where these concepts have been used successfully (both in business and academia):
- Optimally routing telephone calls and Internet data packets is a tough problem because a centralized algorithm will be neither robust nor adaptive. Algorithms based on swarm intelligence come to the rescue since they are not based on a central decision-making body, but rather work on the principle that scouts recruit other agents to follow promising new paths.
- Fleet management and cargo management also suffer from similar problems. The paper points out that Southwest Airlines found out that in some cases, letting cargo go to wrong destinations and recovering is faster and more robust than always making sure that all cargo is handled correctly.
- Small, simple rules that let people take decisions for themselves usually work best. This has since been shown to work very well for companies such as Google as well.
There are more case studies in the paper, but what’s fascinating is that these techniques have become even more popular nowadays because companies have realized that it is easier to tolerate failure than to eradicate it. This is more so in the context of the Internet, where there is a race to build systems that are self-correcting (such as MapReduce, Hadoop and Dryad). Also, the new realities of the emerging architectures (multi-core, highly parallel, massive clusters and grids) are going to mean that we have more parallel horsepower to run our applications, and such self-organizing algorithms are going to become even more popular in the computing community.
However, one concern would be programming models for such computing bedrocks. We still don’t understand how to manage parallel computation very well, so implementing such algorithms in code is going to remain difficult for the average programmer for quite some time.
I came across this article in Linux Today which describes Project Fortress, an open-source effort from Sun to provide a language, intended as a successor to Fortran, for easily writing parallel programs. The project seems to be built on top of Java. Some salient features seem to be:
- Implicit parallelism: Loops run in parallel by default; if you want to execute a loop sequentially, you have to write that explicitly. The big claim, of course, is that this makes efficient use of multi-core machines.
- Support for Unicode: As a result, the scientific research community can make use of Greek letters in their code, and even use things like superscripts, subscripts, and hats and bars! This means that your code is going to look a lot more like your algorithm.
- Automated type inference: The system has extensive type inference (the kind that functional languages and C# 3.0 have), which means that your code is far more readable.
- Extensive library support: In fact, even some parts of the main system are implemented as libraries. They expose the parsed AST to programmers, and give them extensive control.
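To get a feel for the "parallel unless you say otherwise" idea, here is a rough analogy in Python rather than Fortress itself. The helper `for_each` and its `sequential` flag are hypothetical names invented for this sketch, and CPython threads share a global interpreter lock, so this only illustrates the programming model, not real multi-core speedups.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def for_each(items: Iterable[T], body: Callable[[T], R],
             sequential: bool = False) -> List[R]:
    """Apply body to every item; iterations may overlap unless sequential=True."""
    if sequential:
        # The caller explicitly asked for one-at-a-time, in-order execution.
        return [body(item) for item in items]
    # Default: hand the iterations to a thread pool. map() still returns the
    # results in input order, but the bodies themselves may run concurrently.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(body, items))


def square(x: int) -> int:
    """A loop body with no shared state, so running it concurrently is safe."""
    return x * x
```

Calling `for_each(range(5), square)` and `for_each(range(5), square, sequential=True)` gives the same list; the difference is only in how the iterations are scheduled. The catch, as with any implicitly parallel loop, is that the loop body must not depend on the order in which iterations run.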
These sound quite interesting, and it seems that the scientific computing language of the future is going to look a lot like Fortress, if they are successful with this effort.
I came across an interesting article in The Hindu (see the story from GaTech news; I couldn’t find the link on The Hindu website) today which described work done by Sunil Nakrani and Craig Tovey, researchers at GaTech, on using a decentralized, profit-aware load balancing algorithm for allocating servers to serve HTTP requests for multiple hosted services on the Web. The interesting thing is that the algorithm is based on how honey bees in a beehive decide where to collect nectar from. I decided to take a look at the paper.
Essentially, the forager bees collect information about how profitable a particular nectar source is and how much it costs to collect from that source (round trip time). Based on a composite score, they perform a waggle dance which essentially indicates the value of foraging where they have been. The inactive foragers can thereafter figure out where to go look for nectar.
The researchers modeled it in the server space by having an advert board, where servers post the profit from serving a request and the time required to serve it. Thereafter, the other servers can choose which colony (or service) they wish to be a part of. Existing servers can also move to another colony based on a probability determined from a look-up table indexed by the ratio of their profit to the profit of their colony.
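Here is a small Python sketch of how I picture the advert board and the colony-switching rule described above. It is not the authors' implementation: the numbers in `SWITCH_TABLE` are placeholders I made up, weighting adverts by profit per unit time is my own assumption, and all names (`Advert`, `switch_probability`, `pick_colony`, `next_colony`) are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Advert:
    service: str       # the "colony" the posting server belongs to
    profit: float      # profit earned from serving a request
    serve_time: float  # time required to serve it (assumed > 0)


# Hypothetical look-up table: (upper bound on profit ratio, probability of
# consulting the advert board). The values are illustrative placeholders.
SWITCH_TABLE: List[Tuple[float, float]] = [
    (0.5, 0.60),
    (0.85, 0.20),
    (1.15, 0.02),
    (float("inf"), 0.00),
]


def switch_probability(own_profit: float, colony_profit: float) -> float:
    """Look up how likely a server is to consider leaving its colony."""
    ratio = own_profit / colony_profit if colony_profit > 0 else 0.0
    for upper_bound, probability in SWITCH_TABLE:
        if ratio <= upper_bound:
            return probability
    return 0.0


def pick_colony(board: List[Advert], rng: random.Random) -> str:
    """Pick a service from the board, weighted by profit per unit time (assumption)."""
    weights: Dict[str, float] = {}
    for ad in board:
        weights[ad.service] = weights.get(ad.service, 0.0) + ad.profit / ad.serve_time
    services = list(weights)
    if sum(weights.values()) <= 0:
        # No advert is profitable, so fall back to a uniform choice.
        return rng.choice(services)
    return rng.choices(services, weights=[weights[s] for s in services], k=1)[0]


def next_colony(current: str, own_profit: float, colony_profit: float,
                board: List[Advert], rng: random.Random) -> str:
    """Stay with the current service, or follow an advert to another one."""
    if board and rng.random() < switch_probability(own_profit, colony_profit):
        return pick_colony(board, rng)
    return current
```

A server doing worse than its colony's average is more likely to read the board and move, while one doing better than average stays put, which is the decentralized behaviour the bees exhibit: no server needs a global view of the load.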
Their results indicate that the algorithm does quite well compared to an optimal omniscient strategy (which knows the pattern of all future web requests) and better than existing greedy and static assignment strategies. It shows that we still have a lot to learn from nature!
One thing that flummoxed me though was that the original paper seems to have been published way back in 2003 (see Tovey’s publication page). I wonder why it got press publicity only now.
[The paper also cites a Harvard Business Review paper titled Swarm Intelligence: A Whole New Way to Think About Business]
I had the good fortune of being able to listen to Fran Allen (IBM profile and Wikipedia entry) today. Fran pioneered a lot of work in compiler optimization and was awarded the Turing Award for her contribution to Computer Science in 2007. That makes her the first and only (till date) woman to have won the Turing Award, the highest honour in Computer Science.
It was inspiring listening to her talk about her adventures. She described almost the whole evolution of high performance computing, starting from the earliest IBM systems such as Stretch, which was supposed to be 100X faster than the existing machines (but turned out to be only 50X faster) and was delivered to the National Security Agency. She also described some of the failures she had been involved in (Stretch, since it was 2X slower than intended, and then the ACS project). What was interesting was that most of the pioneering work she described had its basis in the work she had done on these failed projects. The fact that failures are the foundations of mammoth successes is one message she clearly drove home with her optimistic outlook. She also described her work during the System/360 and the PowerPC projects.
Her appeal to computer science researchers and students was mainly about the programming models and architecture decisions revolving around multi-core, a buzzword most of us have been left confused and wondering about. This new revolution, which promises to change the way we write software and exploit parallelism in our programs, is the biggest opportunity, as Fran put it!
What was also interesting was how we got to the lecture hall, wading through mud in a construction site during the rain. Apparently, there are two conference halls at IISc: one called JRD Tata and the other JN Tata. No wonder we got to the wrong one and found that it was hosting a conference on Power Electronics. Note to self: make sure you always check the location properly before setting forth.
Other than that, I have lately been busy hacking on Python for S60 (this is a brilliant idea: having a platform-agnostic scripting language!) to get it working on my phone, and on a Python-based remote administration toolkit. Will post more about them soon!