The problem with one-time pads is that the pads must be kept secret by both parties for at least the full lifetime of any ciphertext they were used to encode, and, if they are to be genuinely one-time, they allow only a finite number of bits to be securely transmitted.
We can solve the first problem by keeping the one-time pads secret forever. And we can solve the second problem once we recognise what we really mean by the term random: that there is no knowable reason why any particular bit of the pad should be either 0 or 1. What counts as random information therefore depends on what we know, not on the information itself. This is the case with the numbers chosen for one-time pads: from the point of view of the communicating agents the pads are not random, but from the point of view of someone not in possession of the information, they are.
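The arithmetic of a one-time pad itself is nothing more than a bitwise XOR of message and pad. A minimal sketch in Python (the function names here are ours, not part of any protocol):

```python
import secrets

def make_pad(n):
    # Draw n pad bytes from the OS's entropy source.
    return secrets.token_bytes(n)

def otp(data, pad):
    # XOR byte-for-byte; the same operation both encrypts and decrypts.
    assert len(pad) >= len(data), "a pad may never be shorter than the message"
    return bytes(d ^ p for d, p in zip(data, pad))

plaintext = b"meet at dawn"
pad = make_pad(len(plaintext))
ciphertext = otp(plaintext, pad)
assert otp(ciphertext, pad) == plaintext  # applying the pad twice recovers the message
```

Note that the security rests entirely on the pad bytes being unknowable to the attacker and never reused; the XOR itself carries no secrecy.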
It follows that, provided we assume the system is secure to start with, we can use the initial one-time pad to encrypt the exchange of further pad data for subsequent messages, as long as we are careful to ensure that the attacker never has a crib: a piece of data which he knows is probably part of the plaintext. The possibility of a crib arises because of the symmetry, from the point of view of the external observer, between a random pad and plaintext. We can avoid cribs fairly straightforwardly by using a channel protocol which allows the sender to insert messages into the stream at random, instructing the receiver to alter the encryption algorithm used for subsequent messages. If the alterations are in the form of fragments of executable program code, describing mathematical functions which can be automatically composed with the encryption currently in effect, then the attacker has no plausible cribs, because he has no reason to suppose any one program fragment more likely than any other. We may then reason, circularly, that our initial assumption of security holds good, and consequently that we can continue to use the channel to securely transmit all further one-time pads, ad infinitum.
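Concretely, extending the channel means spending some of the current pad to carry the next one. A sketch of the idea, under the assumption that both parties track exactly which pad bytes have been consumed:

```python
def send_fresh_pad(current_pad, fresh_pad):
    # Encrypt the fresh pad under an equal-length unused prefix of the
    # current pad, then discard that prefix so it is never reused.
    n = len(fresh_pad)
    key, remainder = current_pad[:n], current_pad[n:]
    wire = bytes(a ^ b for a, b in zip(fresh_pad, key))
    return wire, remainder

def receive_fresh_pad(wire, current_pad):
    # The receiver consumes the same prefix and recovers the fresh pad.
    n = len(wire)
    key, remainder = current_pad[:n], current_pad[n:]
    fresh = bytes(a ^ b for a, b in zip(wire, key))
    return fresh, remainder

shared = bytes(range(32))  # toy initial pad, known to both ends
wire, sender_rest = send_fresh_pad(shared, b"new pad bytes!!!")
fresh, receiver_rest = receive_fresh_pad(wire, shared)
assert fresh == b"new pad bytes!!!" and sender_rest == receiver_rest
```

Of course, in a real exchange the fresh pad would itself be freshly random, so that spending n pad bytes buys n new ones and the channel never runs dry.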
The problem is thus reduced to just that of securing the initial exchange of one-time pads. It may at first seem that secure pad exchange could never be achieved without some direct communication between the two parties, so that they can be certain they are not communicating through an unknown man-in-the-middle who has access to one or both of the initial one-time pads, and who can therefore read all the messages they exchange on that channel. The problem of pad exchange is not a problem of security however, but rather one of identity. And again, we can solve it by thinking carefully about what we really mean by the term identity.
Contrary to popular opinion, someone's identity is not "who they really are," nor even "who they know they are." Such definitions of identity are practically unverifiable, because they don't tell us anything at all about the individual concerned. If you are in La Paz, Bolivia, for example, you probably cannot get a new UK passport by sending a letter to the British Embassy in Washington D.C. and declaring "I'm me!" They need to know who you are before they can identify you as the one-time holder of British Passport No. 307906487. So identity is not a matter of who someone really is; it is a matter of what everyone else knows about that person: in other words, it is common knowledge. In order to establish the identity of someone with whom we wish to communicate, all we need is to establish sufficient common knowledge to convince ourselves that the person with whom we are in communication is indeed the person of whom we share some of that common knowledge. You may well wonder, then, why we need identity documents to serve as so-called credentials, and the answer is simply that we don't: identity documents can be, and often are, forged, but not even the CIA can forge common knowledge. If this sounds puzzling, don't worry, you're in good company: Bertrand Russell didn't seem to understand it either. But you, dear reader, can do better, just by thinking the problem over for a while and discussing it with your friends in a cooperative and inquiring manner. Thus, as we found with the notion of randomness, the notion of identity too is a matter not of fact, but of the knowledge others may or may not have.
Since we are intending to use our knowledge of the identity of the person with whom we are communicating to establish a secure communications channel, we can once again assume that the channel is indeed secure, and on the basis of this assumption, we can use that same channel to establish beyond any reasonable doubt that the person with whom we are communicating is indeed the person of whom we share some of the common knowledge. Then we proceed on the basis of directly shared common knowledge, in the form of biometric information, such as photographs, fingerprints, iris patterns, spoken voice recordings and DNA samples, combined with shared personal knowledge of life history, which may include such details as "one-time holder of British Passport No. 307906487", or it may not.
Having thus established sufficient direct, i.e. biometric, knowledge of one another's identity, we can then fairly straightforwardly extend this throughout the whole network, using shared multi-party communications. The principle is that we pass the biometric information -- information representing the direct knowledge people have of other people's bodies and brains -- around the network, so that any path on the resulting graph is a chain of trust extending from the direct knowledge individuals have of one another, and passing via other chains of trust, to form what mathematicians call the transitive closure of the trust relation, which is ultimately founded on the direct knowledge particular people have of each other.
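The transitive closure mentioned here is easy to state as code. A toy sketch in Python, treating trust as a directed relation (the names are illustrative only):

```python
def transitive_closure(direct):
    # direct: dict mapping each person to the set of people they know
    # first-hand.  Returns who each person can reach via chains of trust.
    reach = {p: set(known) for p, known in direct.items()}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for p in reach:
            for q in list(reach[p]):
                new = reach.get(q, set()) - reach[p] - {p}
                if new:
                    reach[p] |= new
                    changed = True
    return reach

knows = {"alice": {"bob"}, "bob": {"carol"}, "carol": set()}
# alice has never met carol, but a chain of trust connects them.
assert "carol" in transitive_closure(knows)["alice"]
```

In practice each edge would carry the biometric evidence itself, and the closure would be computed incrementally as that evidence floods through the network, rather than globally as here.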
Then what we mean by Identity-with-a-capital-I amounts to the common knowledge which the whole of Humanity has as to the essential identity of those individuals. This essential identity that individuals have in the common knowledge as a whole is strictly more than just their physical or bodily identity, or indeed their biographical and genealogical identities -- all of which are accidents of their being, and inevitably change with the passage of time: their Identity within the whole of common knowledge is their higher, eternal identity of mind.
During this process of sharing the information we have about people, we hope not to have to rely on any single "authoritative" source of information as to an individual's identity, because that would be no better than a passport. We need to find an algorithm which can somehow quantify the trustworthiness of any statement as to the identity of any individual, from the point of view of any other, in terms of the number of independent sources available and the degree to which those sources corroborate each other. One possible method might be to include only the sources amongst which there is no disagreement whatsoever, and assign a score of one to each of them; but that might not be sufficient to establish the web initially, and it may be necessary to allow for degrees of disagreement by adjusting the weights of evidence arriving via the different chains. Whatever particular algorithm is chosen, it should be computed in a distributed manner, so that as any individual's identity information floods through the network, the calculation of the quantitative degree of trust is carried out at each node. Perhaps we could ask the people at Google how to do this sort of thing?
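The two scoring rules suggested above, the strict unanimous rule and the weighted relaxation, might be sketched like this (the discount formula is just one arbitrary choice among many):

```python
from collections import Counter

def trust_score(claims):
    # claims: list of (source, asserted_identity) pairs.
    counts = Counter(ident for _, ident in claims)
    if len(counts) == 1:
        # Strict rule: unanimity scores one point per independent source.
        return float(len(claims))
    # Relaxed rule: the support for the best-attested identity,
    # discounted by the fraction of sources that dissent from it.
    best = counts.most_common(1)[0][1]
    return best * best / len(claims)

assert trust_score([("a", "X"), ("b", "X"), ("c", "X")]) == 3.0
assert trust_score([("a", "X"), ("b", "X"), ("c", "Y")]) == 4 / 3
```

Whatever formula is used, the important property is that it can be evaluated locally at each node from the claims arriving along its own chains, without consulting any central authority.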
An indicator of the feasibility of establishing a transitive web of trust is the notion sometimes called "six degrees of separation". A similar notion is evident in "The Kevin Bacon Game", in which people are given the name of an actor or actress, and have to find the shortest chain of actors who have appeared together in films connecting that person with Kevin Bacon. But there are a great many ways, apart from acting together in movies, in which different people may be connected to one another in common knowledge. So when we combine all possible connections, we may find that the "degree of separation" is in fact closer to one than to six. In other words, as we include more and more genealogical and biographical information about ourselves in the system, we will surely find that we share many, many more connections with one another (not to mention Kevin Bacon) than were ever immediately apparent.
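Measuring a "degree of separation" is just a shortest-path search over whatever connection graph we happen to have. A small sketch:

```python
from collections import deque

def degrees_of_separation(graph, a, b):
    # Breadth-first search: returns the length of the shortest chain of
    # acquaintance between a and b, or None if they are unconnected.
    seen, frontier = {a}, deque([(a, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if node == b:
            return hops
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return None

film_together = {"alice": {"bob"}, "bob": {"alice", "kevin"}, "kevin": {"bob"}}
assert degrees_of_separation(film_together, "alice", "kevin") == 2
```

Merging several such graphs (films, genealogy, biography) simply unions the edge sets, which is why adding more kinds of connection can only shrink the degree of separation.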
The final flourish will be to provide a "launchpad" whereby individuals can prime certain "addresses" to accept one or more programs from one or more different types of communications media, such as SMS messages, Web Server URLs, IP Datagrams, USB flash memory devices, MicroSD cards attached to the legs of migrating Canada Geese, or scanned pieces of paper recovered from floating bottles, and run those programs, passing the output of one to the input of another. As well as exchanging the initial one-time pad and identity credentials, each pair of people authenticating would exchange, let's say, six primes, each of which is a set of two 10-digit random addresses and a 10-digit random pad. If each person then authenticated with six others, each would have six possible nodes to which they could send programs, and those programs could trigger the sending of further messages, either directly, or indirectly via other primes, depending on the particular programs actually sent. These prime sheets would not be transmitted electronically, in the first-stage bootstrap at least: they would be little pieces of hand-written paper, and the bearer would record separately which physical address and protocol each sheet of primes corresponds to.
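A prime sheet as described could be generated like this (the field names and dictionary layout are our own invention; the format in the text is hand-written paper):

```python
import secrets

def ten_digits():
    # A 10-digit random decimal string.
    return "".join(secrets.choice("0123456789") for _ in range(10))

def make_prime_sheet(n_primes=6):
    # One prime = two 10-digit random addresses plus a 10-digit random pad.
    return [{"addr_a": ten_digits(),
             "addr_b": ten_digits(),
             "pad": ten_digits()} for _ in range(n_primes)]

sheet = make_prime_sheet()
assert len(sheet) == 6
assert all(len(p["pad"]) == 10 and p["pad"].isdigit() for p in sheet)
```

The mapping from these random addresses to physical transports (an SMS number, a URL, a dead drop) is deliberately kept off the sheet, recorded separately by the bearer.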
The purpose of the primes is to provide an asynchronous boot process, so that the initial message exchange that triggers, say, the reading of a one-time pad by the receiver is not correlated, in any externally observable way, with the actions of the sender of that message, to whom that pad in fact corresponds. The same mechanism also allows us to keep the initial pads secret forever, because we can arrange for the pads actually used to be hybrids of the pads we know, based on the order in which the initial messages happen to arrive at their destinations; then we ourselves could not at any point know that order any better than a would-be attacker could. And if we don't actually know the secret, then we can't "spill the beans," even under extreme duress.
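One way of making the working pad depend on an arrival order that nobody records is to chain the initial pads through a hash in that order. This is only a sketch: a hash gives computational rather than information-theoretic secrecy, so a real design would want a different combiner.

```python
import hashlib

def hybrid_pad(pads_in_arrival_order):
    # Fold the pads into a digest one by one; because each step hashes
    # the running state together with the next pad, the result depends
    # on the order of arrival, not merely on the set of pads.
    state = b""
    for pad in pads_in_arrival_order:
        state = hashlib.sha256(state + pad).digest()
    return state

# A different arrival order yields a different working pad.
assert hybrid_pad([b"pad-1", b"pad-2"]) != hybrid_pad([b"pad-2", b"pad-1"])
```

Since neither party controls or observes the arrival order at the far end, neither ever holds the final secret in a form they could be compelled to reveal.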
The reason we suggest n primes shared between each of n peers is to make possible the random exchange of primes to defeat an attacker who can observe all the parties engaging in pad-exchanges and thereby identify all the potential participants. Provided he cannot observe which primes are swapped during authentication, he will not be able to correlate the transmission and reception of the first step of each initial message transmission.
It should be clear that what we have described above is just one possible way to start a network. One would not expect to be able to have complete confidence in the result, especially given the impossibility of actually knowing that the network has not actually been compromised. So one should expect to have to iterate the process, using several independent first-stage networks as transport media to bootstrap a second-stage. Then proceed to a third-stage from several independent second-stage networks, etc. etc.
I hope this is not too hairy. I think it is just about possible for someone to get the gist of it from prose alone, without using diagrams or notes. Edsger Dijkstra recommended this as a good way of making sure an algorithm isn't too complicated. I hope he's right!
So much for the theory, then. That's just to get the academics off our backs and buy ourselves a bit of breathing space. Don't wait for them to unanimously agree that it's a Good Thing, they never will. Make the most of it by just getting on with building it. Actual security is not theoretical, it's practical, and it's not a matter of fact, it's a matter of knowledge. I say "it", but I mean them. Don't just write millions of lines of technical bit-twiddling code implementing one huge rat's nest of networking wire, like ARPANET, then wait for it to be hacked, because it will be hacked. In fact, try not to write any code at all. Certainly, you should not be exchanging code. You should exchange only formal definitions, and you should independently develop the absolute minimum of manually-written code you need to automatically translate those formal definitions into actual working, interpretable or compilable computer languages.
All we need are a few neat and simple bytecode language definitions, and a few platforms like node.js for writing low-level interpreters.
For the languages, look at Reynolds's "Definitional Interpreters". Write translators which parse those interpreters and JIT-compile GNU lightning or LLVM IR implementations: both support tail recursion, and will easily interpret Reynolds's Interpreter IV. Then write a term-rewriting system in that; look at PLT Redex for ideas on how to make a system with completely configurable semantics. If you've got the guts, look at System F and Proofs and Types, but as soon as you think you're about to go nuts, put it down and write some concrete code. But not too much, just for therapy, you know! Try writing a Reynolds-style interpreter for System F, and compile it into lightning or LLVM, or something else. If that works, try defining lightning as System F primitives of "atomic" type. For parsers, look at Tom Ridge's P3 and P4 parsers: they will be easy to write in System F and in Reynolds's Interpreters III, IV and V.
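To fix ideas, here is the flavour of a Reynolds-style definitional interpreter, reduced to the bare lambda calculus and written in Python rather than the paper's defining language. As in Reynolds's Interpreter I, the meta-language supplies the closures and the evaluation order:

```python
# Terms: ("var", name) | ("lam", name, body) | ("app", fn, arg)
def evaluate(term, env):
    tag = term[0]
    if tag == "var":
        return env[term[1]]
    if tag == "lam":
        _, name, body = term
        # Represent the object-language function by a meta-language closure.
        return lambda arg: evaluate(body, {**env, name: arg})
    if tag == "app":
        _, fn, arg = term
        return evaluate(fn, env)(evaluate(arg, env))
    raise ValueError("unknown term: %r" % (term,))

# (lambda x. x) applied to the value bound to n
identity = ("lam", "x", ("var", "x"))
assert evaluate(("app", identity, ("var", "n")), {"n": 42}) == 42
```

Reynolds's later interpreters progressively remove this dependence on the meta-language (defunctionalisation, explicit continuations), which is exactly what makes them good targets for mechanical translation to low-level code.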
For the interpreters, look at the ML Kit, which, in the guise of smltojs, lets you write browser JavaScript in a really nice typed functional language, and will be fairly easy to coerce into writing neat little JavaScript server apps in ML. I wouldn't bother with SMLServer though, at least while it needs Apache to run.
And while you do that, exchange primes, under many and various schemes, at every available opportunity. Make friends with really weird people and set up networks with them. Don't think of networks as precious. Play around with them, try to break them deliberately, and report only the negative results publicly. We don't need another OpenBSD honeypot. Don't be afraid to set up "spare" networks, "spoof" networks and redundant duplicates. The more smoke around, the sooner the little girls at the NSA and GCHQ will say "Fuck it, let's just join in." And don't turn them away. You need secure communications with your enemy much more than you need them with your friends. You are not going to start a war with your friends, and create a new extremist religious state, over a silly little misunderstanding, are you?
Read about the X.500 directory service. Directories are vital, and having a good formal description from which you can automatically generate implementations in any language will make the higher-level functions, like mail, instant messaging, and distributed JavaScript RAID disk block loops running on Chrome browsers, easier to configure remotely.
The three laws of metarobotics are
- Don't steal.
- Don't lie.
- Don't be lazy.
That's all there is to it.
And it's fun!