In my last post I gave my take on why every API is flawed. I realize that sounds defeatist but consider that it’s only the tip of the iceberg. There are a multitude of problems with programming today, and even if we set out to solve each one, we have to consider that the rest of the world isn’t likely to follow our example. So instead of changing the world, I want to talk about how I would like to change my own approach. If it’s good enough, who knows, maybe it will catch on someday.
Think back on your life. When was your period of highest productivity? Mine, without hesitation, was when I was using HyperCard at 12 years old (its language, HyperTalk, was very similar to AppleScript but not crappy). Here is a snippet of HyperTalk:
on mousedown
  domenu "New Button"
  set the name of button 1 to "OK"
  show button 1 at 100, 200
end mousedown
Notice that there is no real punctuation besides the occasional quotation mark or comma. This was the closest I ever came to natural language programming. It’s really pretty astonishing how quickly people grokked this language. My dad programmed a bit in it and I don’t think he had ever studied any other programming language before.
I’m not going to break down the code very deeply, because it’s obvious what it does if you have even the slightest bit of context. Most of the functions in HyperTalk are written as handlers for messages, so they look like: on mousedown, on keydown, etc.
I’d say the only complicated piece of code here is domenu, because it’s calling HyperCard’s human interface to create objects. Think about that for a moment. You can tell HyperCard to do the same thing a person would do by hand to perform the task. That’s unheard of today. We’ve lost the ability to write macros in most languages, and I find that very sad.
Since it’s the only button on the page, we can refer to it as button 1. We could also say:
set myButton to it
Which means to create a variable referring to the result of the last command performed. Then we’d say:
set the name of myButton to "OK"
The show button line is fairly obvious. We’re in a Cartesian coordinate system, so we show the button at a 2D location as if it were on graph paper. In HyperCard the origin is at the top left, with coordinates increasing to the right and down, but it could be arbitrary if we were generalizing the language for OpenGL or the web.
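The handler idea above can be sketched in a few lines of Python. This is my own toy illustration, not HyperCard's actual machinery: the names `on`, `send`, `domenu`, and `Button` are invented stand-ins for the message dispatch that HyperTalk's `on mousedown ... end mousedown` relies on.

```python
# A minimal sketch of HyperTalk-style message handlers in Python.
# All names here (on, send, domenu, Button) are illustrative inventions.

handlers = {}

def on(message):
    """Register a handler for a message, like 'on mousedown ... end mousedown'."""
    def register(fn):
        handlers[message] = fn
        return fn
    return register

class Button:
    def __init__(self, name="New Button"):
        self.name = name
        self.position = None
    def show_at(self, x, y):
        self.position = (x, y)

buttons = []

def domenu(item):
    # Stand-in for HyperCard's domenu: drive the UI the way a person would.
    if item == "New Button":
        buttons.append(Button())

@on("mousedown")
def mousedown():
    domenu("New Button")
    buttons[0].name = "OK"          # set the name of button 1 to "OK"
    buttons[0].show_at(100, 200)    # show button 1 at 100, 200

def send(message):
    """Deliver a message to its registered handler, as HyperCard's runtime does."""
    if message in handlers:
        handlers[message]()

send("mousedown")
print(buttons[0].name, buttons[0].position)  # OK (100, 200)
```

The point of the sketch is that the program never calls `mousedown()` directly; it only answers messages as they arrive, which is what made HyperTalk feel like describing behavior rather than writing a program.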
I had a whole rant here about how I dislike Objective-C and other trendy languages/APIs, but I decided to leave it out since I’m talking about methodologies, not languages. And I don’t want to leave anybody out: I tried learning JavaBeans and was never able to. For some reason they are just too dang complicated. Same thing with Microsoft Access. It’s a cryptic environment where everything feels hidden, and I constantly have to hop on the web to do even the simplest thing. Content managers on the web generally suck, XML files generally suck; really, 90% of everything sucks.
So let’s get to the meat of this post. All APIs are flawed because you have to rely on some authoritarian decider to provide you an interface to the code they are proud of. This is a level above where we want to be: full control. The best way to provide full control was developed half a century ago: the UNIX console, with things like streams. Even those are a level above what’s really going on, which is black boxes that take some input and provide some output. Functional programming is the syntax above this base level.
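The black-box view can be made concrete in a few lines: each box is just a function from input to output, and a pipeline composes them, much like a shell chains commands with pipes. The boxes and the `pipe` helper below are my own illustration, not any particular library's API.

```python
from functools import reduce

# Each "black box" is just a function: some input in, some output out.
def tokenize(text):
    return text.split()

def lowercase(words):
    return [w.lower() for w in words]

def unique(words):
    return sorted(set(words))

def pipe(*boxes):
    """Compose black boxes left to right, like 'cat | sort | uniq' in a shell."""
    return lambda x: reduce(lambda acc, box: box(acc), boxes, x)

words = pipe(tokenize, lowercase, unique)
print(words("To be or not to BE"))  # ['be', 'not', 'or', 'to']
```

Nothing in the pipeline cares what the boxes are named or who wrote them; only the input and output shapes matter, which is the "base level" the post is pointing at.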
Let’s think forward some arbitrary period of time and imagine how programming works in Star Trek or Star Wars or your favorite sci fi universe. We can think of this as manipulating state by controlling aspects of a block of data. All we are trying to do is get the computer to do what we want. Today we have a world where Scotty talks into a mouse and nothing happens.
But we know that what really happens is programmers spend a couple of hours on Google and Stack Overflow until they find the snippet they need to do whatever trivial task the API makes difficult. Let’s start with this post, which I saw the other day on Hacker News. In the future, programming will be collaborative, whether between users or artificially intelligent computer agents. Right now we spend 90% of our time formatting our answer to fit what the computer can understand. In the future, we’ll spend 90% of our time reviewing the options available and then choosing the one that best furthers our goals.
Every metaphor will be a static process. We’ll feed it some data and in an imperceptibly small period of time we’ll get the result. We can think of each box as a leaf in a big tree that, under the hood, is represented as a hash distinguishing it from other objects, based on the inputs and outputs it’s defined for. Let’s talk to this futuristic computer in our text-based dialect:
make a screen
make: 1) verb. instantiate something. Associated with imperative programming.
a: 1) superfluous word. 2) Sometimes used to name a variable.
screen: 1) noun. usually a graphical user interface to view state. 2) verb. filter results.

please choose one of the following examples or write something similar:
1) screen1 = screen
2) set myScreen to screen
3) (screen (dimensions 0 0 1 1))
...
So the user tries one of the suggestions:
screen1 = screen
A new screen pops up with default dimensions. From here, maybe the user tries to make a button:
make button in the center of screen
make: 1) verb. instantiate something...
button: 1) noun. human interface element to trigger an action from a click.
in: 1) preposition...
the: 1) superfluous word...
center: 1) noun. indicates the spatial center of an object.
of: 1) preposition...
screen: 1) possible reference to screen1. 2) noun. usually a graphical user interface to view state. 3) verb. filter results.

please choose one of the following examples or write something similar:
1) button1 = button( screen )
2) set myButton to make button in screen1
3) (button (dimensions 100 200) screen)
...
So the user tries one of the suggestions:
set myButton to make button at center of screen1
A button appears in the center of the screen.
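A toy version of this suggestion console is easy to sketch: gloss each word against a small lexicon, then offer concrete rewrites for the nouns it recognizes. The lexicon entries and templates below are invented for illustration; a real system would draw them from the shared wiki the post describes later.

```python
# A toy suggestion console: gloss each word, then offer candidate rewrites.
# The lexicon and templates are invented for illustration.

LEXICON = {
    "make":   ["verb. instantiate something."],
    "a":      ["superfluous word.", "sometimes used to name a variable."],
    "screen": ["noun. a graphical user interface to view state.",
               "verb. filter results."],
    "button": ["noun. human interface element that triggers an action."],
}

TEMPLATES = {
    "screen": ["screen1 = screen",
               "set myScreen to screen",
               "(screen (dimensions 0 0 1 1))"],
    "button": ["button1 = button( screen )",
               "set myButton to make button in screen1",
               "(button (dimensions 100 200) screen)"],
}

def respond(utterance):
    """Gloss each word, then suggest concrete examples for known nouns."""
    words = utterance.lower().split()
    glosses = {w: LEXICON.get(w, ["unknown word."]) for w in words}
    suggestions = []
    for w in words:
        suggestions.extend(TEMPLATES.get(w, []))
    return glosses, suggestions

glosses, suggestions = respond("make a screen")
print(suggestions[0])  # screen1 = screen
```

Even this crude version captures the interaction loop: the user types something close to natural language, and the computer answers with multiple choice instead of an error message.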
And so on and so forth. Now the remarkable thing about all of this (I know this sounds weird) is that none of these types are named in this futuristic OS. Everything is a concept, based on the hash of the inputs and outputs that describe what it does. The computer infers the types, names and actions that the user is trying to perform. The questions the user asks will be stored as part of the program to explain the syntax that evolves, which itself will have a large degree of uniqueness but will still share metaphors common to other programs.
It also infers the methods of programming the user is comfortable with. Some users like algebra; some don’t know what it is and so prefer natural language; some prefer C-like or Lisp-like syntax. It’s fine to mix any of these, because everything can be represented as nouns and verbs under the hood, or matter and energy, or however else we like to visualize it.
There is no API to speak of, other than the console. The API is actually a kind of wiki/chatterbot on the order of say kilobytes to petabytes that represents the relationships represented by the current programs in existence at the time the snapshot was taken.
Also, since all of this is black boxes with some input and output, everything is actually a noun. Like I said, this program is represented as a tree of hashes under the hood, with hash values drawn from the current wiki snapshot. The wiki itself even has a hash. I say hash, but what I really mean is a relationship graph, which is Lisp.
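Naming a box by the hash of what it consumes and produces, rather than by a human-chosen identifier, can be sketched directly. The `concept_hash` function and its JSON canonicalization are my own assumptions about how such a scheme might look; only `hashlib.sha256` is a real API.

```python
import hashlib
import json

def concept_hash(inputs, outputs, body=""):
    """Name a black box by hashing its inputs, outputs, and body,
    rather than by a human-chosen identifier. Illustrative sketch."""
    description = json.dumps(
        {"inputs": sorted(inputs), "outputs": sorted(outputs), "body": body},
        sort_keys=True,
    )
    return hashlib.sha256(description.encode()).hexdigest()[:16]

# Two boxes with the same interface and body get the same name...
a = concept_hash(["text"], ["words"], body="split on whitespace")
b = concept_hash(["text"], ["words"], body="split on whitespace")
assert a == b

# ...and changing one bit anywhere yields a new hash, hence a new concept.
c = concept_hash(["text"], ["words"], body="split on commas")
assert a != c
```

This is the property the post leans on later: identical behavior collapses to one name automatically, and any change at all produces a distinct, shareable identity.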
Notice some interesting features of all of this. It loosely represents the point and click metaphor of the internet. The user can always ask the computer for more elaboration or more suggestions. Programming will become an interactive exercise where the user is creating code with multiple choice instead of paragraph form.
The user will also be able to save new code as a hash based on the original hash and the new relationships, and then share it with the world, to be integrated into other APIs. The hash will also act as a repository, and since the relationships are spelled out explicitly, APIs will be able to share code with each other (that replicates other inputs and outputs but with smaller code or faster speed) creating new APIs. This will very quickly simulate many aspects of sexual reproduction.
Humans will be able to act as APIs so multiple humans can share connections to an API that is evolving. It will be valid to ask another API (possibly a human) to suggest code that performs a function. Those new inputs and outputs can then be integrated into the API.
It will not take long at all for APIs to become sophisticated. Constant evolution will generate APIs that are so sharp that it feels like talking to a human. The suggestions will be relevant and nearly always correct. Notice the line where I said “1) possible reference to screen1”. The API itself will evolve to suit the user, as he or she uses it.
From here on it’s all just speculation, but I think that what will happen is programmer ability will not be as important as it once was. The “best” programmers will be able to just jump in and tell the computer what to do with very few questions, but with so many algorithms cached in the API, they won’t tend to come up with much novel code. The computer will notice what they are doing and suggest the prior art. Software patents won’t exist, unless you want to talk about patenting a hash representing the whole context of a program and its root API. Change one bit anywhere though, and you have a new hash, so new patent by definition. By this point we’ll have algorithms building our houses and growing our food, so without a money system, maybe patents won’t make any sense anyway. They don’t use money on Star Trek. But I digress.
So how does this help us today? Well, IMHO we have a lot of good metaphors but crappy implementations. I’m basically talking about BASH files with agents that help the user explore the problem space, but approachable by mere mortals. UNIX has a bunch of holdovers from when memory and disk space were scarce, like abbreviations and cryptic operators that make it difficult to convey ideas. That will largely go away when someone can read a program in natural language from the beginning.
Programs of the future will just be language-agnostic text files where users embed whatever languages best solve the problem, as lines similar to BASH files. They will probably end up looking like a transcript from the programming session. Or in this case, just the summary:
ABCD1234ABCD1234

make a screen
...
1) screen1 = screen
2) set myScreen to screen
3) (screen (dimensions 0 0 1 1))
1

make button in the center of screen
...
1) button1 = button( screen )
2) set myButton to make button in screen1
3) (button (dimensions 100 200) screen)
2
This code would show a new screen with a button in the middle. The first line is the checksum encoding the API and all of the context needed for this code to be parsed and executed. It holds the summary of the conversation that created the code we are looking at. This is just one level of detail. The user could zoom in further and see full descriptions of all the words, or zoom out and see just the pure code. Each line is whatever the user would have executed on the console. The point is, what we think of as the end result code, language or syntax isn’t the important part. The logic and context are. And they can be presented any number of ways, generated from the user’s chain of thought.
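Replaying such a transcript is mechanical: the first line is the context checksum, and each exchange is a prompt, its numbered candidates, and the number the user chose. The exact format below (and the `replay` function) is invented for illustration, assuming the layout shown in the summary above.

```python
# A sketch of replaying a transcript-as-program. The format is invented:
# line 1 is the context checksum, then each exchange is a prompt,
# numbered candidate rewrites, and the number the user chose.

TRANSCRIPT = """\
ABCD1234ABCD1234
make a screen
1) screen1 = screen
2) set myScreen to screen
3) (screen (dimensions 0 0 1 1))
1
make button in the center of screen
1) button1 = button( screen )
2) set myButton to make button in screen1
3) (button (dimensions 100 200) screen)
2
"""

def replay(transcript):
    """Return (checksum, chosen code lines): the 'pure code' zoom level."""
    lines = transcript.strip().splitlines()
    checksum, rest = lines[0], lines[1:]
    code = []
    i = 0
    while i < len(rest):
        i += 1                       # skip the natural-language prompt
        options = []
        while i < len(rest) and rest[i][:1].isdigit() and ")" in rest[i]:
            options.append(rest[i].split(") ", 1)[1])
            i += 1
        choice = int(rest[i]); i += 1
        code.append(options[choice - 1])
    return checksum, code

checksum, code = replay(TRANSCRIPT)
print(code)  # ['screen1 = screen', 'set myButton to make button in screen1']
```

Zooming between levels of detail is then just a choice of what to print: the full glosses, the prompts and options, or only the chosen lines, as the paragraph above describes.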
I’m glossing over a ton of details obviously like how the runtime will work. But remember that that part will be provided by the API. It has enough context to understand what the user is trying to do from just this small bit of code. If users want to get explicit and create a process with some amount of guaranteed memory and access to various external devices, they could certainly do that. But there won’t be much reason to, because that infrastructure is explicitly provided by the API.
So it won’t make any sense to write API descriptions in archaic formats like Objective-C or .NET. It will be better to use flat, simple descriptions in human readable formats more like HyperTalk or JSON or even SQL. We’re going for syntax-less, stateless descriptions like static websites so users can always understand what they are seeing and dive down further when they are ready.
Here’s one concrete example of what I’m talking about. Right now, to talk to a scanner, you have to use some archaic driver and a bunch of weird commands to control it at too low a level to be useful to most people. In the future, you’ll just type something like “scanner” on the console, the API will infer that you are asking about the scanner and tell you all the inputs and outputs, and then you’ll type something like “scan A4 to myfile.jpg at 300 dpi grayscale” and the API will figure out what you want and make the file. You can always try again in color or with a different colorspace or whatever else you want, as you dive further into the available options. No drivers needed, because your code interfaces with the API in the scanner directly, just as if it were a website you were pointing and clicking on. Cool!
But remember that there are truths to programming that will never go away. People will always do things their own way. We’ll probably never get past the strict/unforgiving APIs that operating systems and embedded devices or peripherals use today. But with a pure middle layer like this, we’ll be able to treat the relationships as the real OS and their crappy APIs as any other black box. If a process like BIND or Apache crashes all the time or has to be sandboxed, then that description will be encoded in its logic hash. Think of it this way. If the user has to check for an exception or failure code from a process, then that bit of crapulence will get encoded just like any other logic.
But many of the smaller algorithms will be provably error-free. As they evolve, seemingly intractable tasks like writing web servers will become commonplace. By that I mean, kids who can read will be able to write web servers. And they will.
It won’t seem unusual to write a program that does 90% of what commercial programs like Photoshop do in a few minutes. Just create a screen, create brushes, create layers, create special FX to act on each pixel, and before you know it you have a functionally-written app with no real main loop, that can’t crash, and which can be trivially extended.
This seems impossible until you consider leverage. You don’t just go and write something like Photoshop as a pure console app. You write it in something like Qt, or on top of the web browser that comes as part of your API snapshot. Only in the future, the APIs will be much higher level than even web browsers today. They’ll work more like Apple’s Siri. They will always be bending over backwards to do exactly what you want. Someday they’ll say computers are man’s best friend, not dogs.
This all goes beyond what I could ever do right now, but I don’t think a bare bones implementation of this is actually that complicated. It will honestly look like a wiki with a text prompt. I’m not really sure what I would do with it, because I can already program. But I’m very curious to see what children come up with. They will want to write their own versions of anything that costs money. Maybe that’s a hint right there.