Conceptual Underpinnings
Generated Secrets
Account secrets can be saved in encrypted form, as with password vaults, or generated from a root secret. Generated secrets have several important benefits. First, they are produced from a random seed, and so are quite unpredictable. This is important, because predictability can be exploited when cracking passwords. Second, if the root secret is shared with another trusted party, then you both can generate new shared secrets without passing any further secrets. Furthermore, if you keep a copy of the root secret, say in a safe deposit box, then there is a good chance you can resurrect your secrets if you happen to lose your accounts files. Finally, with generated secrets, it is possible to have stealth secrets, which are secrets for which there is absolutely no evidence.
Secrets are generated from a collection of seeds, one of which must be random with a very high degree of entropy. The random seed is referred to as the ‘master seed’. It is extremely important that the master seed remain completely secure. Never disclose a master seed to anyone except for a person you wish to collaborate with, and then only use the shared master seed for shared secrets. Each file that contains accounts will contain a master seed for the accounts it holds. Typically, you would have one file to hold your private accounts, and then one file for every group of people you collaborate with.
A secret is generated by combining a master seed with several other seeds, such as the account name, the secret name, and perhaps a version name. The combination is then hashed to form a long binary number that is unique to your secret. From there the number is transformed into a usable form by one of the Secrets classes. PIN() converts it to a sequence of digits, Password() converts it to a sequence of characters, Passphrase converts it to a sequence of words, etc.
For example, consider the following rather abbreviated accounts file:
from avendesora import Account, Passphrase
master_seed = 'c2VjcmV0IG1lc3NhZ2UsIHN1Y2Nlc3NmdWxseSBkZWNvZGVkIQ'
class Login(Account):
username = 'rand'
passcode = Passphrase()
This file contains one secret, the login passphrase for Rand. In this case, the master seed is combined with the words ‘login’ (the account name, down cased) and ‘passcode’ (the attribute name); the combination is hashed, and the hash is used to generate the passphrase. The words in the passphrase are chosen at random from a dictionary of roughly 10,000 words. The first word is chosen by taking the first 14 bits from the hash and using that number to select a word. The second word is chosen using the next 14 bits, and so on. The hash is constructed such that even the smallest changes in any seed results in a completely different hash. As such, the resulting passphrase is quite unpredictable:
> avendesora login
username: rand
passcode: dither estate cockroach flavoring
The passcode itself is not stored, rather it is the seeds that are stored and the passcode is regenerated when needed. Notice that all the seeds except the master seed need not be kept secure. Thus, once you have shared a master seed with a collaborator, all you need to do is share the remaining seeds and your collaborator can generate exactly the same passcode.
Another important thing to notice is that the generated passcode is dependent on the account and secret names. Thus if you rename your account or your secret, the passcode will change. So you should be careful when you first create your account to choose names appropriately so you don’t feel the need to change them later.
Entropy
A 4 word Avendesora password provides 53 bits of entropy, which seems like a lot, but NIST is recommending 80 bits for your most secure passwords. So, how much is actually required? It is worth exploring this question.
Entropy is a measure of how hard the password is to guess. Specifically, it is the base two logarithm of the likelihood of guessing the password in a single guess. Every increase by one in the entropy represents a doubling in the difficulty of guessing your password. The actual entropy is hard to pin down, so generally we talk about the minimum entropy, which is the likelihood of an adversary guessing the password if he or she knows everything about the scheme used to generate the password but does not know the password itself. So in this case the minimum entropy is the likelihood of guessing the password if it is known that we are using 4 space separated words as our passphrase where the words are selected at random with a uniform distribution from a known list. This is very easy to compute. There are roughly 10,000 words in our dictionary, so if there was only one word in our passphrase, the chance of guessing it would be one in 10,000 or 13 bits of entropy. If we used a two word passphrase the chance of guessing it in a single guess is one in 10,000*10,000 or one in 100,000,000 or 26 bits of entropy.
The probability of guessing our passphrase in one guess is not our primary concern. Really what we need to worry about is given a determined attack, how long would it take to guess the password. To calculate that, we need to know how fast our adversary could try guesses. If they are trying guesses by typing them in by hand, their rate is so low, say one every 10 seconds, that even a one word passphrase may be enough to deter them. This is why bank PINs can be so short. Our one word passphrase provides roughly the same security as a four digit PIN. Alternatively, they may have a script that automatically tries passphrases through a login interface. Again, generally the rate is relatively slow. Perhaps at most they can get is 1000 tries per second. In this case they would be able to guess a one word passphrase in 10 seconds and a two word passphrase in a day, but a 4 word passphrase would require 300,000 years to guess in this way.
The next important thing to think about is how your password is stored by the machine or service you are logging into. The worst case situation is if they save the passwords in plain text. In this case if someone were able to break in to the machine or service, they could steal the passwords. Saving passwords in plain text is an extremely poor practice that was surprisingly common, but is becoming less common as companies start to realize their liability when their password files get stolen. Instead, they are moving to saving passwords as hashes. A hash is a transformation that is very difficult to reverse, meaning that if you have the password it is easy to compute its hash, but given the hash it is extremely difficult to compute the original password. Thus, they save the hashes (the transformed passwords) rather than the passwords. When you log in and provide your password, it is transformed with the hash and the result is compared against the saved hash. If they are the same, you are allowed in. In that way, your password is not stored and so is no longer available to thieves that break in. However, they can still steal the file of hashed passwords, which is not as good as getting the plain text passwords, but it is still valuable because it allows thieves to greatly increase the rate that they can try passwords. If a poor hash was used to hash the passwords, then passwords can be tried at a very high rate. For example, it was recently reported that password crackers were able to try 8 billion passwords per second when passwords were hashed with the MD5 algorithm. This would allow a 4 word passphrase to be broken in 14 days, whereas a 6 word password would still require 4,000,000 years to break. The rate for the more computational intensive sha512 hash was only 2,000 passwords per second. In this case, a 4 word passphrase would require 160,000 years to break.
In most cases you have no control over how your passwords are stored on the machines or services that you log into. Your best defense against the notoriously poor security practices of most sites is to always use a unique password for sites where you are not in control of the secrets. That way the poor security practices of one site would not compromise your other accounts. For example, you might consider using the same passphrase for your login password and the passphrase for an ssh key on a machine that you administer, but never use the same password for two different websites unless you do not care if the content of those sites become public.
So, if we return to the question of how much entropy is enough, you can say that for important passwords where you are in control of the password database and it is extremely unlikely to get stolen, then four randomly chosen words from a reasonably large dictionary is plenty. If what the passphrase is trying to protect is very valuable and you do not control the password database (ex., your brokerage account) you might want to follow the NIST recommendation and use 6 words to get 80 bits of entropy. If you are typing passwords on your work machine, many of which employ keyloggers to record your every keystroke, then no amount of entropy will protect you from anyone that has or gains access to the output of the keylogger. In this case, you should consider things like one-time passwords or two-factor authentication. Or better yet, only access sensitive accounts from your home machine and not from any machine that you do not control.