• Supercomputer applications for the developing world.

    by  • November 4, 2007 • Hexayurt • 1 Comment

    http://news.ycombinator.com/item?id=76007

    Occam.

    Specifically, http://transterpreter.org

    Yes, the language it runs (Occam) is 20 years old. But the language was designed for programs running on dozens to thousands of nodes, and in the transterpreter implementation, there’s the possibility of doing this on heterogeneous hardware, where the fast nodes do things like splitting and merging the data set, and the smaller “grunt compute” nodes do the actual work.
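    To make the split / grunt / merge shape concrete, here is a minimal sketch in Go, whose channels are also CSP-derived. It is not Occam and not transterpreter code; the names (split, grunt, merge, SumOfSquares) and the sum-of-squares workload are assumptions made up for illustration, and goroutines on one machine stand in for separate nodes.

```go
package main

import (
	"fmt"
	"sync"
)

// chunk is one slice of the data set handed to a worker.
type chunk []int

// split plays the "fast node" role: it cuts data into chunks of at most
// size elements and sends them down the returned channel.
func split(data []int, size int) <-chan chunk {
	out := make(chan chunk)
	go func() {
		defer close(out)
		for start := 0; start < len(data); start += size {
			end := start + size
			if end > len(data) {
				end = len(data)
			}
			out <- chunk(data[start:end])
		}
	}()
	return out
}

// grunt plays a "grunt compute" node: it reads chunks, does the actual work
// (here a sum of squares, purely as a stand-in) and sends one partial
// result per chunk.
func grunt(in <-chan chunk, out chan<- int, wg *sync.WaitGroup) {
	defer wg.Done()
	for c := range in {
		sum := 0
		for _, v := range c {
			sum += v * v
		}
		out <- sum
	}
}

// merge plays the other "fast node" role: it folds partial results into
// a single total.
func merge(in <-chan int) int {
	total := 0
	for partial := range in {
		total += partial
	}
	return total
}

// SumOfSquares wires one splitter, several grunt workers and one merger
// together using unbuffered channels only.
func SumOfSquares(data []int, chunkSize, workers int) (int, error) {
	if chunkSize <= 0 || workers <= 0 {
		return 0, fmt.Errorf("chunkSize and workers must be positive, got %d and %d", chunkSize, workers)
	}
	chunks := split(data, chunkSize)
	partials := make(chan int)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go grunt(chunks, partials, &wg)
	}
	// Close partials once every worker has finished, so merge can return.
	go func() {
		wg.Wait()
		close(partials)
	}()
	return merge(partials), nil
}

func main() {
	data := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	total, err := SumOfSquares(data, 3, 4)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(total) // 385
}
```

    The processes share no memory and talk only over channels, which is the property that would let the same structure be spread across unequal machines.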

    Parallel programming is hard, but that’s inherent hardness. You can’t get around things like memory bandwidth and latency at a programming language level, no matter how much you try. You can only get away from those things by dealing with the fact you have thousands of machines, or tens of thousands.

    It’s only going to get worse from here on in, as “faster” comes to mean more processors, not higher clock rates. You’ll see this: 2 core! 3 core! 4 core! 8 core! and pretty soon (within 10 years) we’ll see 64 and 128 core desktop machines, maybe even a revival of unusual architectures like wafer-scale integration with 3D optical interconnects (i.e. upward pointing tiny lasers and photocells fabricated on the chip) to handle getting data on and off the processors.

    We’ve seen unambiguously that **GIGANTIC** data sets have their own value. Google’s optimization of their algorithms clearly uses enormous amounts of observed user behavior. Translation efforts draw on terabyte-scale source corpora. Image integration algorithms like that thing that Microsoft were demonstrating recently… gigantic data sets have power because statistics draw relationships out of the real world, rather than having programmers guess at what the relationships are.

    I strongly suspect that 20 years from now, there are going to be three kinds of application programming:

    1> Interface programming

    2> Desktop programming (in the sense of programming things which operate on *your personal objects* – these things are like *pens and paper* and you have your own.)

    3> Infrastructure programming – supercomputer cluster programming (Amazon and Google are *supercomputer* *applications* *companies*) – which will provide yer basic services.

    One of the concepts I’m pitching to the military right now is using the massive data sets they have from satellite sources to provide “precision agriculture” support for the developing world. Precision Agriculture in America is tractors with GPS units that vary their fertilizer and pesticide distribution on a meter-by-meter basis (robotic valves consult the dataset as you drive around the land.)

    In a developing world context, your farmers get the GPS coordinates for their land tied to their cell phone numbers either by an aid worker, or by their own cell phone company.

    Then the USG runs code over their sat data, and comes up with farming recommendations for that plot of land. If the plots are small enough (and they often are) the entire plot is a single precision agriculture cell.
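    As a rough sketch of that lookup (again in Go, and entirely hypothetical): a plot registered against a phone number is mapped to one grid cell, and a recommendation is read off a per-cell value derived from the satellite data. The grid resolution, the "moisture" index, the 0.3 threshold, the message text and the phone number are all placeholders invented for illustration, not a real data product or agronomic advice.

```go
package main

import (
	"fmt"
	"math"
)

// Plot ties a farmer's phone number to the GPS position of their land.
type Plot struct {
	Phone    string
	Lat, Lon float64
}

// Cell identifies one grid cell of the (hypothetical) satellite-derived
// data set.
type Cell struct{ Row, Col int }

// cellSizeDeg is an assumed grid resolution in degrees, not a real
// product specification.
const cellSizeDeg = 0.001

// CellFor maps a position to the grid cell that contains it. A small
// plot is treated as falling entirely inside one cell.
func CellFor(lat, lon float64) Cell {
	return Cell{
		Row: int(math.Floor(lat / cellSizeDeg)),
		Col: int(math.Floor(lon / cellSizeDeg)),
	}
}

// Recommend looks up the plot's cell in a per-cell moisture index
// (a made-up 0..1 scale) and returns a text-message-sized recommendation.
func Recommend(p Plot, moisture map[Cell]float64) (string, error) {
	m, ok := moisture[CellFor(p.Lat, p.Lon)]
	if !ok {
		return "", fmt.Errorf("no satellite data for the plot registered to %s", p.Phone)
	}
	// Placeholder threshold: a real system would run a crop model here.
	if m < 0.3 {
		return "Soil looks dry: consider irrigating.", nil
	}
	return "No action needed this week.", nil
}

func main() {
	// Registry keyed by phone number, as set up by an aid worker or the
	// phone company. The number and coordinates are fictional.
	registry := map[string]Plot{
		"+000-555-0100": {Phone: "+000-555-0100", Lat: 1.2345, Lon: 36.7891},
	}
	plot := registry["+000-555-0100"]

	// Stand-in for the output of the heavy image-processing stage.
	moisture := map[Cell]float64{CellFor(plot.Lat, plot.Lon): 0.12}

	msg, err := Recommend(plot, moisture)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(plot.Phone, "->", msg)
}
```

    The lookup itself is cheap; the expensive part is whatever fills in the per-cell values, and that is where the cluster comes in.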

    But if you think about the size of the datasets – we’re talking about doing this for maybe 20 – 30% of the planet’s landmass – and the software to interpret the images is non-trivial and only going to get more complex as modeling of crops and farming practices improves…

    Real applications – change the world applications – need parallel supercomputer programming. Occam was *right* in the same way that Lisp is *right* but for a different class of problems. That’s because Occam is built on CSP (Communicating Sequential Processes), and that is a Good Thing. There may need to be refinements to handle the fact we have much faster nodes, but much slower networks, than Occam was originally designed for – but that may also turn out to be a non-issue.

    I’m also working on similar stuff around expert systems for primary health care – medical expert systems are already pretty well understood – so the notion is to develop an integrated set of medical practices (these 24 drugs which don’t require refrigeration, don’t produce overdose easily, and are less than $10 per course) with an expert system which can be accessed both by patients themselves to figure out if their symptoms are problematic or not, and by slightly trained health care workers who would use the systems to figure out what to prescribe from their standard pharmacopoeia.

    It’s not much, but for the poorest two or three billion, this could be the only health care service they ever see. None of the problems are particularly intractable, but you better bet there’s a VAST – and I mean VAST – distributed call center application at the core of this.

    Of course, the Right Way to do this is Folding@home or SETI@home – we’ve already proven that public interest supercomputing on a heterogeneous distributed network works.

    Now we just need to turn it to something directly lifesaving, rather than indirectly important for broader reasons.

    Remember that the richest 50% of the human race have cell phones already, and rumor has it (i.e. I read it on the internet) that phone numbers and internet users in Africa have doubled every year for the past seven years. 10 years from now the network is going to be ubiquitous, even among many of the very, very poorest.

    We get a do-over here in our relationship with the developing world. We can’t fix farm subsidies, but we can ensure that when they plug into the network for the first time, there is something useful there.

    About

    Vinay Gupta is a consultant on disaster relief and risk management.

    http://hexayurt.com/plan

    One Response to Supercomputer applications for the developing world.

    1. November 27, 2007 at 9:22 am

      There’s an issue here with energy efficiency, in that datacentres and even work/home PCs are already a measurable proportion of US energy consumption. The trouble with massively-distributed systems is that having a node ‘ready for work’ but not actually *doing* anything is tremendously wasteful – you want the overall system load average to be rather high.

      Also, you don’t necessarily want to do the computation on a mesh of OLPC XOs, since that would require them to be in use for a large proportion of the day, which shifts the energy burden from the developed world to somewhere that’s already got supply issues.

      Tricky balancing? Interesting.
