The following is an excerpt from the book Cryptography in the Database: The Last Line of Defense (2006) by Kevin Kenan, published by Addison-Wesley. Kenan presents a start-to-finish blueprint and execution plan for designing and building, or selecting and integrating, a complete database cryptosystem. He systematically shows how to eliminate weaknesses, overcome pitfalls, and defend against attacks that can compromise data even if it's been protected by strong encryption. For more information or to purchase the book, click here.
What Is Cryptography?
Cryptography is the art of "extreme information security." It is extreme in the
sense that once treated with a cryptographic algorithm, a message (or a database
field) is expected to remain secure even if the adversary has full access to the
treated message. The adversary may even know which algorithm was used. If the
cryptography is good, the message will remain secure.
This is in contrast to most information security techniques, which are designed
to keep adversaries away from the information. Most security mechanisms prevent
access and often have complicated procedures to allow access to only authorized
users. Cryptography assumes that the adversary has full access to the message and
still provides unbroken security. That is extreme security.
A more popular conception of cryptography characterizes it as the science of
"scrambling" data. Cryptographers invent algorithms that take input data, called
plaintext, and produce scrambled output. Scrambling, used in this sense, is much
more than just moving letters around or exchanging some letters for others. After a
proper cryptographic scrambling, the output is typically indistinguishable from a
random string of data. For instance, a cryptographic function might turn "Hello,
whirled!" into 0x397B3AF517B6892C.
While simply turning a message into a random sequence of bits may not seem
useful, you'll soon see that cryptographic hashes, as such functions are known, are
very important to modern computer security. Cryptography, though, offers
much more.
Many cryptographic algorithms, but not all, are easily reversible if you
know a particular secret. Armed with that secret, a recipient could turn
0x397B3AF517B6892C back into "Hello, whirled!" Anyone who did not know the
secret would not be able to recover the original data. Such reversible algorithms are
known as ciphers, and the scrambled output of a cipher is ciphertext. The secret
used to unscramble ciphertext is called a key. Generally, the key is used for both
scrambling, called encryption, and unscrambling, called decryption.
A fundamental principle in cryptography, Kerckhoffs' Principle, states that
the security of a cipher should depend only on keeping the key secret. Even if
everything else about the cipher is known, so long as the key remains secret,
the plaintext should not be recoverable from the ciphertext.
The opposite of Kerckhoffs' Principle is security through obscurity. Any
cryptographic system where the cipher is kept secret depends on security through
obscurity. Given the difficulty that even professional cryptographers have in
designing robust and efficient encryption systems, the likelihood of a secret cipher
providing better security than any of the well-known and tested ciphers is
vanishingly small. Plus, modern decompilers, disassemblers, debuggers, and other
reverse-engineering tools ensure that any secret cipher likely won't remain secret
for long.
Cryptographic algorithms can be broadly grouped into three categories:
symmetric cryptography, asymmetric (or public-key) cryptography, and
cryptographic hashing. Each of these types has a part to play in most cryptographic
systems, and we next consider each of them in turn.
Symmetric Cryptography
Symmetric key cryptography is so named because the cipher uses the same key for
both encryption and decryption. Two famous ciphers, Data Encryption Standard
(DES) and Advanced Encryption Standard (AES), both use symmetric keys.
Because symmetric key ciphers are generally much faster than public-key ciphers,
they are suitable for encrypting both small and large data items.
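To make the "same key in both directions" idea concrete, here is a minimal sketch that uses a one-time pad (XOR with a random key as long as the message) rather than DES or AES, which are not in the Python standard library. The helper name xor_bytes is ours, chosen for illustration, and the key must never be reused for a second message.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte (a one-time pad)."""
    if len(key) != len(data):
        raise ValueError("a one-time pad key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = b"Hello, whirled!"

# The shared secret: random, as long as the message, and used only once.
key = secrets.token_bytes(len(plaintext))

ciphertext = xor_bytes(plaintext, key)   # encryption
recovered = xor_bytes(ciphertext, key)   # decryption uses the very same key

print(ciphertext.hex())
print(recovered.decode())
```

Whoever holds the key can both read and produce ciphertext, which is exactly the property that makes a symmetric key so sensitive.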
Modern symmetric ciphers come in two flavors. Block ciphers encrypt a chunk of
several bits all at once, while stream ciphers generally encrypt one bit at a time as the
data stream flows past. When a block cipher must encrypt data longer than the
block size, the data is first broken into blocks of the appropriate size, and then the
encryption algorithm is applied to each. Several modes exist that specify how each
block is handled. The modes enable an algorithm to be used securely in a variety of
situations. By selecting an appropriate mode, for instance, a block cipher can even
be used as a stream cipher.
The chief advantage of a stream cipher for database cryptography is that the
need for padding is avoided. Given that block ciphers operate on a fixed block size,
any blocks of data smaller than that size must be padded. Stream ciphers avoid
this, and when the data stream ends, the encryption ends. We'll return to block and
stream ciphers in the algorithm discussion in Chapter 4, "Cryptographic Engines
and Algorithms."
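The padding requirement can be sketched as follows. This example assumes a 16-byte block (the AES block size) and the widely used PKCS#7 padding convention, neither of which the text above prescribes. The names pad, unpad, and split_blocks are illustrative.

```python
BLOCK_SIZE = 16  # bytes; assumed here because it is the AES block size


def pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append n bytes, each of value n, so the length is a block multiple."""
    n = block_size - (len(data) % block_size)  # always 1..block_size
    return data + bytes([n]) * n


def unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Strip PKCS#7 padding, rejecting malformed input."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("padded data must be a non-empty block multiple")
    n = padded[-1]
    if n < 1 or n > block_size or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Break padded data into the fixed-size blocks a block cipher expects."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


field = b"Hello, whirled!"           # 15 bytes: one byte short of a block
padded = pad(field)                  # 16 bytes after padding
blocks = split_blocks(pad(b"x" * 16))  # a full block still gains a padding block

print(len(padded), len(blocks))
```

Note that data already a multiple of the block size grows by a whole block. In a database, this means an encrypted column may need more room than the plaintext column it replaces.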
The primary drawback of symmetric key ciphers is key management. Because
the same key is used for both encryption and decryption, the key must be
distributed to every entity that needs to work with the data. Should an adversary
obtain the key, not only is the confidentiality of the data compromised, but integrity
is also threatened given that the key can be used to encrypt as well as decrypt.
The risks posed by losing control of the key make distributing and storing the
key difficult. How can the key be moved securely to all the entities that need to
decrypt the data? Encrypting the key for transmission would make sense, but what
key would be used to encrypt the key, and how would you get the key-encrypting
key to the destination?
Once the key is at the decryption location, how should it be secured so that an
attacker can't steal it? Again, encryption offers a tempting solution, but then you
face the problem of securing the key used to encrypt the original key.
We'll look at these problems in more detail in Chapter 5, "Keys: Vaults,
Manifests, and Managers." In terms of the key distribution problem,
cryptographers have devised an elegant solution using public-key cryptography,
which we examine next.
Public-Key Cryptography
Public-key cryptography, also known as asymmetric cryptography, is a relatively
recent invention. As you might guess from the name, the decryption key is
different from the encryption key. Together, the two keys are called a key pair and
consist of a public key, which can be distributed to the public, and a private key,
which must remain a secret. Typically the public key is the encryption key and the
private key is the decryption key, but this is not always the case. Well-known
asymmetric algorithms include RSA, ElGamal, and Diffie-Hellman. Elliptic curve
cryptography provides a different mathematical basis for implementing existing
public-key algorithms.
Public-key ciphers are much slower than symmetric-key ciphers and so are
typically used to encrypt smaller data items. One common use is to securely
distribute a symmetric key. A sender first encrypts a message with a symmetric key
and then encrypts that symmetric key with the intended receiver's public key. He
then sends both to the receiver. The receiver uses her private key to decrypt the
symmetric key and then uses the recovered symmetric key to decrypt the message.
In this manner the speed of the symmetric cipher is still a benefit, and the problem
of distributing the symmetric key is removed. Such systems are known as
hybrid cryptosystems.
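The hybrid flow just described can be sketched in a few lines. Everything here is a toy under loudly stated assumptions: the RSA key pair is built from two publicly known Mersenne primes and uses no padding, so it offers no security, and the symmetric cipher is a stand-in (XOR with a SHA-256 counter keystream) for a real cipher such as AES. The names keystream_xor, send, and receive are ours.

```python
import hashlib
import secrets

# Toy RSA key pair for the receiver. Both primes are well-known Mersenne
# primes, so this modulus is trivially factorable: illustration only.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Stand-in symmetric cipher: XOR data with a SHA-256 counter keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        stream = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ s for b, s in zip(chunk, stream))
    return bytes(out)


def send(message: bytes, public_key: tuple) -> tuple:
    """Encrypt the message symmetrically, then wrap the symmetric key."""
    n, e = public_key
    sym_key = secrets.token_bytes(16)             # fresh key per message
    ciphertext = keystream_xor(sym_key, message)  # fast symmetric step
    wrapped_key = pow(int.from_bytes(sym_key, "big"), e, n)  # slow public-key step
    return wrapped_key, ciphertext


def receive(wrapped_key: int, ciphertext: bytes, private_key: tuple) -> bytes:
    """Unwrap the symmetric key with the private key, then decrypt."""
    n, d = private_key
    sym_key = pow(wrapped_key, d, n).to_bytes(16, "big")
    return keystream_xor(sym_key, ciphertext)


message = b"Hello, whirled! " * 5
wrapped, ct = send(message, (N, E))
print(receive(wrapped, ct, (N, D)) == message)
```

Only the 16-byte symmetric key passes through the slow public-key operation. The message itself, however large, is handled by the symmetric step.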
Another important use for public-key cryptography is to create digital
signatures. Digital signatures are used much like real signatures to verify who sent
a message. The private key is used to sign the message, and the public key is used
to verify the signature.
A common, easily understood digital signature scheme is as follows. To sign a
message, the sender encrypts the message with the private key. Anyone with the
corresponding public key can decrypt the message and know that it could only
have been encrypted with the private key, which presumably only the sender
possesses. Note that this does not protect the confidentiality of the message,
considering anyone could have the sender's public key. The goal of a digital
signature is simply to verify the sender.
Because the public key can be distributed to anyone, we don't have the same
problem as we do with symmetric cryptography. However, we do have a problem
of unambiguously matching the public key with the right person. How do we know
that a particular public key truly belongs to the person or entity we think it does?
This is the problem that public key infrastructure (PKI) has tried to solve.
Unfortunately, PKI hasn't lived up to its promise, and the jury is still out on what
the long-term accepted solution will be.
Public-key cryptography is mentioned here to help readers new to cryptography
understand how it is different from symmetric algorithms. We do not use
public-key cryptography in this book, and we do not cover particular algorithms
or implementation details. As is discussed in section 2.3, "Applying Cryptography,"
public-key schemes aren't necessary for solving the problems in which
we're interested.
Cryptographic Hashing
The last type of cryptographic algorithm we'll look at is cryptographic hashing.
A cryptographic hash, also known as a message digest, is like the fingerprint of
some data. A cryptographic hash algorithm reduces even very large data to a small,
fixed-size value that is, for practical purposes, unique. What separates
cryptographic hashes from other hashes is that it is computationally infeasible
either to compute the original data from the hash value or to find other data
that hashes to the same value.
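As a brief illustration, the sketch below uses SHA-256 from Python's standard hashlib module (our choice of algorithm for the example) to show that inputs of very different sizes produce digests of the same length, and that changing a single character yields a different digest.

```python
import hashlib

small = b"Hello, whirled!"
large = b"x" * 1_000_000          # a million bytes of input
changed = b"Hello, whirled?"      # differs from `small` in one character

digest_small = hashlib.sha256(small).hexdigest()
digest_large = hashlib.sha256(large).hexdigest()
digest_changed = hashlib.sha256(changed).hexdigest()

# Every SHA-256 digest is 32 bytes, shown here as 64 hex characters.
print(len(digest_small), len(digest_large))

# A one-character change in the input produces an unrelated-looking digest.
print(digest_small)
print(digest_changed)
```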
A common role played by hashing in modern cryptosystems is improving the
efficiency of digital signatures. Because public-key ciphers are much slower than
symmetric ciphers, signing large blocks of data is very time-consuming. Instead,
most digital signature protocols specify that the signature is applied
to a hash of the data. Given that computing a hash is generally fast and the
resulting value is typically much smaller than the data, the signing time is
drastically reduced.
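A hash-then-sign scheme along these lines might look like the following sketch. It is a toy under stated assumptions: the RSA key is built from two publicly known Mersenne primes, there is no signature padding, and the SHA-256 digest is reduced modulo the small modulus, so it illustrates the flow only and provides no real security. The function names are ours.

```python
import hashlib

# Toy RSA key pair for the signer. Both primes are well-known Mersenne
# primes, so this modulus is trivially factorable: illustration only.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                           # public (verification) exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private (signing) exponent, Python 3.8+


def digest_as_int(message: bytes) -> int:
    """Hash the message and map the digest into the range of the toy modulus."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") % N


def sign(message: bytes) -> int:
    """Apply the private key to the hash, not to the whole message."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Apply the public key and compare against a freshly computed hash."""
    return pow(signature, E, N) == digest_as_int(message)


document = b"Pay the bearer one hundred dollars. " * 1000
signature = sign(document)

print(verify(document, signature))               # signature matches
print(verify(document + b"tampered", signature)) # altered data fails to verify
```

However long the document is, the public-key operation runs once on a short digest, which is where the time savings come from.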
Other common uses of cryptographic hashes include protecting passwords,
time-stamping data to securely track creation and modification dates and times,
and assuring data integrity. The well-known Secure Hash Algorithm family
includes SHA-224, SHA-256, SHA-384, and SHA-512. The older SHA-1 and MD5
algorithms are currently in wider use, but flaws in both have been identified, and
both should be retired in favor of a more secure hash.
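As one example of password protection, the sketch below stores a salted, iterated hash instead of the password itself. It uses PBKDF2 with SHA-256 from Python's standard library, which is our choice for illustration and not a scheme named in the text. The iteration count and the function names are assumptions to be tuned and renamed for a real system.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; tune for your own hardware


def new_password_record(password: str) -> tuple:
    """Return (salt, derived_hash) to store in place of the password."""
    salt = secrets.token_bytes(16)  # unique per user, stored alongside the hash
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, derived


def check_password(candidate: str, salt: bytes, stored: bytes) -> bool:
    """Re-derive the hash from the candidate and compare in constant time."""
    derived = hashlib.pbkdf2_hmac(
        "sha256", candidate.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(derived, stored)


salt, stored = new_password_record("correct horse battery staple")

print(check_password("correct horse battery staple", salt, stored))
print(check_password("wrong guess", salt, stored))
```

Because the hash cannot feasibly be reversed, someone who reads the stored record still has to guess passwords one at a time, and the iteration count makes each guess more expensive.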