On Monday I wrote an article about the miserable state of software engineering and it seems to have struck quite a nerve, judging by the comments and discussion on Hacker News. The gist of my argument was that designing software is very much a hands-on process right now. The computer gives you no leniency at any step of the operation, which means that it takes developers years to perfect their craft. Time that I feel could be better spent improving the conditions under which we work and the assumptions we use to frame our approach (which I will shed some positive light on later in this post).
So it’s ironic that the very next day, Apple announced the iPhone 4S with speech recognition technology that is a solid attempt to move computing away from the sort of rigid orthodoxy that’s defined it up till now. Especially since I said:
“…today’s computing can’t take us into the future. It can’t provide true artificial intelligence or bring the kind of multiplication of effort that hackers take for granted to the masses.”
Hmmm. What the heck did I mean by that? Let me explain…
I’ve been meditating on why I’m in such a conflicted state about technology. On the one hand, it’s getting faster every year and will inevitably improve by brute force. But on the other hand, it’s based on many archaic assumptions that made sense thirty years ago but are increasingly hindering our ability to move forward. It’s like it’s getting worse as it gets better. It’s taking herculean efforts now to squeeze out incremental improvements.
For example, Siri, the company that developed Apple’s speech recognition technology, invested $24 million to do it, and Apple spent $200 million purchasing it. Really? $200 MILLION?
Someone could hire 200 John Carmacks for a year for that. Or 4,000 run-of-the-mill developers in America. Or 20,000 in India. Even using the $24 million development figure, the numbers are only about eight times smaller, and still staggering. It simply costs a tremendous amount of money to advance the state of the art. And the risk of failure is too high for all but the largest companies. So odds are, you won’t be doing it. And neither will I. Which makes me question why I ever got into this business in the first place.
Computer science has priced itself out of existence.
They say there is no silver bullet to fix software engineering. This is self-evident to most people working in the field, and there is plenty of supporting evidence in how much time, money, manpower or any other resource gets thrown at programming problems with little benefit (see: The Mythical Man-Month).
But it can actually be proven mathematically. The halting problem asks whether it’s possible to tell if a program, given an arbitrary input, will finish executing (without actually running the program). This is a fancy way of asking if you can tell whether a program will ever give you an answer before you run it. Alan Turing proved that the answer is no: the halting problem is undecidable over Turing machines, meaning no single procedure can answer the question for every program and input. And since computers can be emulated on Turing machines, that means that no computer scientist has a general method for saying with certainty whether a program will provide an answer in all cases, or if it will just lock up on him/her, without just running the dang thing.
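To make the argument a little more concrete, here is a minimal Python sketch of the contradiction at the heart of Turing's proof. The names `make_contrarian`, `pessimist` and `optimist` are made up for illustration; the idea is that any function claiming to predict halting can be handed a program built to do the opposite of the prediction.

```python
def make_contrarian(halts):
    """Build a program that does the opposite of whatever `halts` predicts for it."""
    def contrarian():
        if halts(contrarian):
            while True:      # predicted to halt, so loop forever instead
                pass
        return "halted"      # predicted to loop, so halt right away
    return contrarian


def pessimist(program):
    """Hypothetical halting checker that claims nothing ever halts."""
    return False


def optimist(program):
    """Hypothetical halting checker that claims everything halts."""
    return True


# The pessimist says the contrarian loops forever, yet it returns immediately.
contrarian = make_contrarian(pessimist)
verdict = pessimist(contrarian)
outcome = contrarian()
print(verdict, outcome)

# The optimist fails the other way: its contrarian would spin forever,
# so we only build it here and deliberately never call it.
never_call_me = make_contrarian(optimist)
```

These two checkers are deliberately dumb, but a smarter one falls into the same trap: whatever it answers about its own contrarian, the contrarian does the other thing.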
Since writing the last post, I realized that a ramification of this is that, since lambda calculus (the basis of Lisp and other functional languages) has been shown to be equivalent to Turing machines, all of the “modern” languages like Erlang, OCaml, Haskell, Clojure and Scala face the same undecidability. We can throw all kinds of static analysis at them, force them to use an explicit type system, force all data to be immutable and use every best practice we’ve ever come up with.
It will do little to alleviate the fundamental problem that we just don’t know what a program is going to do in the real world until we run it.
This is that sense of dread you feel when you first look at someone else’s code and can’t decipher it. It’s that nagging suspicion in the back of your mind that maybe you forgot something and your code might break someday. No matter how prepared you are, or how much work you put into your code, you can’t prove that it’s bulletproof in the real world. You have to run it somewhere to find out, probably in the computer of your own mind, over and over again. It’s complicated and exhausting. It’s so hard in fact that when I’m in the zone, I can’t even talk to someone or my program crashes and I have to start from the beginning so that 15 minutes later I can get back where I was. I would say that of the 10% of my time left over after I work around all of the other headaches (compiler issues, administrative tasks, failures of communication), 90% of THAT time goes to thinking about what my program is going to do. I actually write maybe a few dozen lines of code per day.
So simulation is not just a good idea, it’s the bulk of what we do. And computers today are terrible at it.
I’ve sort of come to embrace the fact that I don’t have some kind of magic bullet I can use to program my computer for me. It’s liberating to know that other people don’t have a secret formula that makes them more effective than me. But they do have better tools and methodologies. I think the best example of that right now is how Notch wrote Minecraft in Java of all things. He’s able to step back and see the big picture, that anything a language does to help visualization, exploration of problems/solutions and simulation is a win.
I guess that’s why I like PHP. I don’t have to do a bunch of setup, I don’t have to wait for it to compile, and I can rapidly refine the results. I wonder if people would be so put off by PHP if I compiled Scala to it and ran that. Or if I wrote PHP-like macros in Lisp. It’s nice to think that with the right language, we avoid issues down the road. I hate to break it to you though: if it runs on Unix, it’s going to have issues anyway, probably sooner than you realize. And even if it was a Lisp machine all the way down, it wouldn’t avoid the decidability problem. I’m starting to wonder if all that matters is my own leverage, using whatever tool helps me work the fastest. As someone mentioned in the Hacker News discussion, perhaps Worse is Better.
The problems we face in life are akin to ones inside computers, since life could very well be a simulation. When entrepreneurs mention iterating and pivoting faster than their competition, they are talking about simulation. Up till now, I’ve done a poor job experimenting in my own life. While I tend to daydream about other ways of living, I haven’t put enough effort into pushing the boundaries of my existence. I have a profound disappointment in my progress. That’s probably why my last post came across as cynical, because I was talking about everything, not just programming. It’s naive of me to think I can fix any part of the problem, much less the rest of it. But maybe I write because I’m hopeful that in some way, I can do exactly that, by throwing a little fuel on the fire.
The problem as I see it is that the industry has backed itself into a corner and can’t even see that the way forward requires thinking outside the box.
I’m going to get a little abstract now because what I propose is not really a well-formed concept. Arthur C. Clarke said that “any sufficiently advanced technology is indistinguishable from magic.” Well, we know there is no magic bullet. But that’s no excuse to just throw our hands in the air and give up. Maybe the problem only seems intractable. Our brains obviously work. Somehow nature iterated over billions of years and came up with us. We shouldn’t have to wait that long. We can learn from ourselves and go beyond our original programming. For all of our technical prowess and left-brain success mastering technology, it lacks feeling. There is no place for introspection in computing, and no spark, no meaning. Technology is the antithesis of zen. Maybe we should recognize technology’s limitations, come to peace with that, and just do our best. There may never be a magic bullet, but maybe with tools already at our disposal, we can design a sufficiently advanced one that it won’t matter.
I can hear the groans now: “you can’t fix technology with less technology!” or “how can we trust computers if we don’t understand how they work?!” Or something perhaps more relevant: “if computers start programming themselves, we’ll be out of a job!”
I sympathize with all of these arguments. But do we really completely understand how computers work even now? We may know some of the theory but we are fooling ourselves if we think we can watch the hardware and software and not learn something. It’s like life before having children. We can’t picture a world right now where computers are our peers, or at the very least, able to do many of the things that we do now. But once it happens, we won’t be able to picture what life was like before. That’s a frightening thought, but people have been having children for millennia. And whether we like it or not, we’re pregnant. Nine years from now (or nine decades, whichever timespan you use), we’re going to create artificial life. And it’s going to be ok.
It’s hard to imagine what form this life will take. For some reason, we have a hard time understanding how we break down a problem into a series of simpler steps. Existence is all encompassing for us. We don’t tend to separate our conscious mind from something like image recognition when we read. We forget the years we spent listening to the mutterings of people around us to learn language. We are made up of a hierarchy of networks utilizing other networks, built on yet other networks. All evolving in real time, hungry for knowledge and doing their darnedest to do a good job and be part of the conversation. But we don’t even know they exist, because we are all of them simultaneously. For a little background on what I’m talking about, take a look at this presentation by Ben Goertzel on OpenCog. It’s an open source project to simulate intelligence as hundreds of software agents working in parallel.
One of the most promising learning systems right now is probably MOSES, which from what I understand works like genetic algorithms but instead of a fixed fitness function, uses probability to direct evolution. This is part of a family of similar algorithms like simulated annealing and ant colony optimization. I have a hunch (though I can’t prove it) that we’ll eventually end up with a graph of graphs made up of hyperlinks (honestly not much different from the internet), with something akin to virtual ants walking the links at say 100 Hz initially. There will be something like neurotransmitters that feed or penalize portions of the graph depending on current conditions in the real world. You can think of it like thousands of people pushing the Like button on something interesting and triggering a flood of others to the food. There will be other processes at work to simulate cell death or damage to the network and the random formation of connections. But only a finite number of processes, each of which is easily understood. And this will all be simulated in something kind of like Apple’s Time Machine, where the current state of the network is smeared backwards and forwards in time (really along other dimensions because the connections would be based on time), with something like MOSES evolving these networks and subnetworks and rejoining them to the current state as they perform better, much like how quantum and chemical reactions let our 100 billion neurons compete and cooperate in parallel.
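Here is a toy Python sketch of just the ant-walking part of that hunch. It is not MOSES and not a real ant colony optimizer; the graph, the `walk_ants` function and all of its parameters (deposit size, evaporation rate, tick count) are assumptions picked for illustration. Links that lead somewhere useful get fed, and every link fades a little on each tick.

```python
import random


def walk_ants(links, food, ticks=200, deposit=1.0, evaporation=0.9, seed=0):
    """Reinforce links that lead to food; let every link fade a little each tick.

    links: dict mapping (source, target) -> starting weight.
    All links share a single source node in this toy version.
    """
    rng = random.Random(seed)
    weights = dict(links)
    edges = list(weights)
    for _ in range(ticks):
        # An ant picks an outgoing link with probability proportional to its weight.
        edge = rng.choices(edges, weights=[weights[e] for e in edges])[0]
        if edge[1] == food:
            weights[edge] += deposit      # "feed" the link that paid off
        for e in edges:
            weights[e] *= evaporation     # every link decays over time
    return weights


trails = walk_ants({("nest", "food"): 1.0, ("nest", "dead end"): 1.0}, food="food")
print(trails)
```

At 100 Hz, the 200 ticks here would be about two seconds of walking. The useful link ends up carrying nearly all of the weight while the dead end fades toward zero, which is the "flood of others to the food" effect in miniature.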
You can make a strong argument right now that the internet is already acting much like a living thing, except that its lowest level neuron is a human. That will begin to change as more of the functions are carried out by artificial intelligences like Siri. Eventually, a few decades from now, we’ll probably see intelligence evolve from the web as an emergent phenomenon that no longer needs human interaction to function.
Which is why I feel that we don’t need to come up with an overarching theory of intelligence. In fact, by pouring research into ever more sophisticated mathematical models of learning, we may actually be hindering our progress. Because that stuff is exclusionary. So few people understand it, and so many people feel like they are on the outside looking in, that those efforts will almost certainly fail. Not to be hard on Watson, but who cares if we write a program that can play Jeopardy if that’s all it can do? The interesting problem, and the one that nobody has solved yet, is how do we write a program that can write Watson?
Imagine what an intelligence like that would be asked to do. It’s akin to learning the rules of chess by watching a series of moves. We can write programs that can do this now, but moves like castling throw a wrench into the logic and force the computer to reevaluate its assumptions. THAT is the important part. And it needs to be able to reliably do that without becoming unstable, if it’s ever going to scale up to tackling harder problems.
A true artificial intelligence will learn the same way we do, from nothing, and will be rapidly simulating and refining its notion of reality. It will have the same properties as us. It will be able to suffer damage to half its network and still function at close to its original level. It will be able to rewind in time to previous states without risk. It will be able to play out its best guess numerous times in its simulator, the same way our imagination does.
What we really need is to create a cradle of civilization for artificial life. We need to set up the initial conditions that favor its evolution. Right now sequential computers are nowhere near up to the task. They are kind of a joke actually. What we really need are systems more like FPGAs, with thousands of cores running algorithms like MapReduce, where the data is processed in-place. If computers have a hard time simulating more than a few hundred neurons, let’s just get rid of that hurdle altogether. Simulating a large number of neurons (say a million) is not a technical problem; anyone could make a chip to do it right now with a few geeks, a fab, and a million dollars.
There is a valid argument that we don’t know how to program large numbers of neurons and that intelligence has so far never evolved from a system like that. I’m going to skirt that issue for a moment though and imagine an existing processor like Intel’s i7 arranged in an intelligent way, with its billion transistors being used for say 100x100 or 10,000 cores, each with the same 3,500 transistors the MOS 6502 had, plus some overhead for interconnect, with no cache memory or pipelining or any other nonsense, running at say 1 GHz because the clock speed isn’t that important. Then let’s take 10 GB of RAM and split it up into 10,000 pieces so each core has 1 MB of RAM.
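As a quick sanity check on those numbers, here is the back-of-the-envelope arithmetic in Python. It uses only the figures from the paragraph above, and assumes decimal gigabytes and megabytes.

```python
TRANSISTOR_BUDGET = 1_000_000_000   # the "billion transistors" in the text
CORES = 100 * 100                   # 100x100 grid of cores
TRANSISTORS_PER_CORE = 3_500        # MOS 6502 figure from the text
RAM_BYTES = 10 * 10**9              # 10 GB, assuming decimal gigabytes

core_transistors = CORES * TRANSISTORS_PER_CORE
share_of_budget = core_transistors / TRANSISTOR_BUDGET
ram_per_core = RAM_BYTES // CORES

print(core_transistors)   # 35000000
print(share_of_budget)    # 0.035
print(ram_per_core)       # 1000000
```

So the cores themselves would use only about 3.5% of the transistor budget, which leaves a lot of room for the interconnect, and each core gets 1 MB of RAM.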
Now if it was 1980 and you gave Steve Wozniak 10,000 cores running at 1 GHz, each with more RAM than 90% of computers of the time, can you imagine what he’d do with it? Would text-to-speech technology seem like anything special? No, of course not; a child could program it. Image recognition would be a joke; any algorithm would work well with that kind of horsepower. In fact, I think most of the things we are just seeing now, like speech recognition and face recognition and handwriting recognition, could have happened 20 or 30 years ago if there had been an economic incentive for them. That’s why I’m underwhelmed to say the least when I see them today. It’s one small step for man, one giant leap backwards for technology.
That’s why Apple had to make a special image signal processor for the face recognition in the iPhone 4S. That’s why today’s computing can’t take us into the future. But something made of the same stuff in a different arrangement will.
I need a coprocessor in my computer like this where I can run 10,000 virtual machines and simulate the programs I write so I don’t have to do it in my mind. The cores should be looking ahead and suggesting where my function will fail, like a preemptive debugger. They should be trying every possible input for the type of data I’m using. They may not be able to simulate every possible scenario (because there is no silver bullet), but they could give me a confidence level. They could be translating the functions to Lisp and evolving them with genetic algorithms and then translating the corrected version back to show me a better way. The possibilities are endless, but I want to stress that they are not technically complicated. They are just resource intensive. But our processors are already sitting around wasting 99.999% of their computing power, as I explained in my last post. We have more resources than we realize, if they were only allocated in a way that we could use them.
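A rough Python sketch of the "try every possible input and give me a confidence level" idea, on a single core. The function under test, `buggy_midpoint`, and the helper `confidence` are made up for this example; the bug is a sum that wraps around at 256, the way it would in a one-byte register.

```python
from itertools import product


def buggy_midpoint(lo, hi):
    """Midpoint of two 8-bit values, with the sum wrapping at 256 like a one-byte register."""
    return ((lo + hi) & 0xFF) // 2


def confidence(func, spec, inputs):
    """Run func on every input; return the passing fraction and the first failing input."""
    passed, total, first_failure = 0, 0, None
    for args in inputs:
        total += 1
        if func(*args) == spec(*args):
            passed += 1
        elif first_failure is None:
            first_failure = args
    return passed / total, first_failure


# Every possible pair of bytes: 256 * 256 = 65,536 cases.
every_byte_pair = product(range(256), repeat=2)
score, counterexample = confidence(
    buggy_midpoint,
    lambda lo, hi: (lo + hi) // 2,   # what the function is supposed to compute
    every_byte_pair,
)
print(score, counterexample)
```

The checker comes back with a confidence of roughly 50% and points at `(1, 255)` as the first input that breaks. Since each case is independent, this is exactly the kind of work that could be parcelled out across thousands of small cores.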
I need to be able to run Xcode’s distributed compiling on the cores, so all programs compile instantly. I need a good communication layer like Erlang to handle parcelling work units out to the cores, my neighbors and beyond. I need very basic infrastructure changes to make this stuff possible, but I just never hear about any. The closest thing we have right now is probably CUDA running on GPUs (yet another abstraction layer). The way I see it, it’s going to be a long time before this stuff happens naturally.
But if we had it right now, we’d have all you smart people making the most of it. We’d have geeky friends bragging about their million cores. Recruiting that much horsepower will happen naturally, and very quickly. We’ll be able to observe artificial neurons tackling real-world problems and begin to develop theories about their functions. It’s much easier to recognize solutions than to construct them.
And if artificial intelligence is no closer at that point than it is now, then at least we’ll have framed the problem in a new light and can do the next big thing. Or maybe the bullet will be sufficiently advanced at that point that we won’t need to.
I believe in my heart that when we discover the algorithm for learning, it will fit on a napkin. Kids will wear the image of the basic artificial neuron on their T-shirts. This stuff could literally happen before the decade is out. I don’t know about anyone else, but for me, the biggest hurdle to working on stuff like this is just making rent each month. My short game stinks.
I guess I will talk about fixing my real life simulation in my next post, but this got way too long so I’ll leave it at that.