Add explanations of encrypted passwords, and fork vs vfork.

Rob Landley 2006-01-29 06:29:01 +00:00
parent 08a1b5095d
commit b1b3cee831


@ -12,6 +12,11 @@
</ul> </ul>
<li><a href="#adding">Adding an applet to busybox</a></li>
<li><a href="#standards">What standards does busybox adhere to?</a></li>
<li><a href="#tips">Tips and tricks.</a></li>
<ul>
<li><a href="#tips_encrypted_passwords">Encrypted Passwords</a></li>
<li><a href="#tips_vfork">Fork and vfork</a></li>
</ul>
</ul> </ul>
<h2><b><a name="goals" />What are the goals of busybox?</b></h2>
@ -172,6 +177,116 @@ applet is otherwise finished. When polishing and testing a busybox applet,
we ensure we have at least the option of full standards compliance, or else
document where we (intentionally) fall short.</p>
<h2><a name="tips">Programming tips and tricks.</a></h2>
<p>Various things busybox uses that aren't particularly well documented
elsewhere.</p>
<h2><a name="tips_encrypted_passwords">Encrypted Passwords</a></h2>
<p>Password fields in /etc/passwd and /etc/shadow are in a special format.
If the first character isn't '$', then it's an old DES style password. If
the first character is '$' then the password is actually three fields
separated by '$' characters:</p>
<pre>
<b>$type$salt$encrypted_password</b>
</pre>
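<p>The "type" indicates which encryption algorithm to use: 1 for MD5. Other
crypt() implementations define further values, for example 2a for
Blowfish-based bcrypt and, in glibc, 5 for SHA-256 and 6 for SHA-512.</p>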
<p>The "salt" is a bunch of random characters (generally 8) the encryption
algorithm uses to perturb the password in a known and reproducible way (such
as by appending the random data to the unencrypted password, or combining
them with exclusive or). Salt is randomly generated when setting a password,
and then the same salt value is re-used when checking the password. (Salt is
thus stored unencrypted.)</p>
<p>The advantage of using salt is that the same cleartext password encrypted
with a different salt value produces a different encrypted value.
If each encrypted password uses a different salt value, an attacker is forced
to do the cryptographic math all over again for each password they want to
check. Without salt, they could simply produce a big dictionary of commonly
used passwords ahead of time, and look up each password in a stolen password
file to see if it's a known value. (Even if there are billions of possible
passwords in the dictionary, checking each one is just a binary search against
a file only a few gigabytes long.) With salt they can't even tell if two
different users share the same password without guessing what that password
is and encrypting it with each user's salt. They also can't precompute the
attack dictionary for a specific password until they know what the salt
value is.</p>
<p>The third field is the encrypted password, computed from the cleartext
password plus the salt. For MD5 this is 22 characters.</p>
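<p>As an illustration of the format described above, here is a minimal sketch
that splits a password field into its three parts. The struct pw_fields and
split_pw_field() names, and the buffer sizes, are made up for this example and
are not part of busybox.</p>

```c
#include <assert.h>
#include <string.h>

/* Hypothetical container for the parts of a password field (not busybox API). */
struct pw_fields {
    char type[8];    /* e.g. "1" for MD5; empty for an old DES style password */
    char salt[32];   /* random characters, stored unencrypted */
    char hash[128];  /* the encrypted password */
};

/* Split "$type$salt$encrypted_password" into its parts.
 * Returns 0 on success, -1 if the field is malformed or a part does not fit. */
int split_pw_field(const char *field, struct pw_fields *out)
{
    const char *type, *salt, *hash;
    size_t type_len, salt_len;

    assert(field && out);
    memset(out, 0, sizeof(*out));   /* also guarantees NUL termination below */

    if (field[0] != '$') {
        /* Old DES style: there are no '$' fields, so keep the whole string. */
        if (strlen(field) >= sizeof(out->hash))
            return -1;
        strcpy(out->hash, field);
        return 0;
    }

    type = field + 1;
    salt = strchr(type, '$');
    if (!salt)
        return -1;
    salt++;
    hash = strchr(salt, '$');
    if (!hash)
        return -1;
    hash++;

    /* Each length excludes the '$' that ends the field. */
    type_len = (size_t)(salt - 1 - type);
    salt_len = (size_t)(hash - 1 - salt);
    if (type_len >= sizeof(out->type) || salt_len >= sizeof(out->salt)
            || strlen(hash) >= sizeof(out->hash))
        return -1;

    memcpy(out->type, type, type_len);
    memcpy(out->salt, salt, salt_len);
    strcpy(out->hash, hash);
    return 0;
}
```

<p>A caller would pass the second field of an /etc/shadow line (or the
password field of /etc/passwd) and then look at out->type to see which
algorithm applies.</p>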
<p>The busybox function to handle all this is pw_encrypt(clear, salt) in
"libbb/pw_encrypt.c". The first argument is the clear text password to be
encrypted, and the second is a string in "$type$salt$password" format, from
which the "type" and "salt" fields will be extracted to produce an encrypted
value. (Only the first two fields are needed; a third $ is treated the same
as the end of the string.) The return value is an encrypted password in
/etc/passwd format, with all three $ separated fields. It's stored in
a static buffer, 128 bytes long, so the next call overwrites it.</p>
<p>So when checking an existing password, if pw_encrypt(text,
old_encrypted_password) returns a string that compares identical to
old_encrypted_password, you've got the right password. When setting a new
password, generate a random 8 character salt string, put it in the right
format with sprintf(buffer, "$%c$%s", type, salt), and feed buffer as the
second argument to pw_encrypt(text,buffer).</p>
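<p>The two cases above might be sketched as follows. This is an illustration
under assumptions, not busybox code: make_salt(), password_matches() and
encrypt_new_password() are hypothetical helpers, the encrypt_fn type assumes
pw_encrypt() takes two strings and returns a string, and reading /dev/urandom
is just one possible source of random salt.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The 64 characters conventionally used in salt strings. */
static const char salt_chars[] =
    "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

/* Assumed shape of pw_encrypt(clear, salt); POSIX crypt() has the same shape. */
typedef char *(*encrypt_fn)(const char *clear, const char *salt);

/* Fill salt with len random salt characters plus a terminating NUL, so the
 * buffer must hold len + 1 bytes. Returns 0 on success, -1 on failure. */
int make_salt(char *salt, size_t len)
{
    FILE *fp;
    size_t i;

    assert(salt);
    fp = fopen("/dev/urandom", "rb");   /* assumption: this device exists */
    if (!fp)
        return -1;
    if (fread(salt, 1, len, fp) != len) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    /* Map each random byte onto the salt alphabet (256 is a multiple of 64). */
    for (i = 0; i < len; i++)
        salt[i] = salt_chars[(unsigned char)salt[i] % 64];
    salt[len] = '\0';
    return 0;
}

/* Checking: re-encrypt using the type and salt taken from the stored field,
 * then compare the result with the stored field. Nonzero means a match. */
int password_matches(const char *clear, const char *stored, encrypt_fn encrypt)
{
    const char *result = encrypt(clear, stored);

    return result && strcmp(result, stored) == 0;
}

/* Setting: build "$type$salt" from a fresh 8 character salt and encrypt.
 * Returns whatever encrypt returns (a static buffer for pw_encrypt), or NULL. */
char *encrypt_new_password(const char *clear, char type, encrypt_fn encrypt)
{
    char salt[9];
    char buffer[16];   /* '$' + type + '$' + 8 salt characters + NUL = 12 */

    if (make_salt(salt, 8) != 0)
        return NULL;
    snprintf(buffer, sizeof(buffer), "$%c$%s", type, salt);
    return encrypt(clear, buffer);
}
```

<p>Inside busybox the third argument would be pw_encrypt itself, for example
password_matches(text, old_encrypted_password, pw_encrypt). Because the result
lives in a static buffer, copy it before encrypting anything else.</p>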
<h2><a name="tips_vfork">Fork and vfork</a></h2>
<p>On systems that haven't got a Memory Management Unit, fork() is unreasonably
expensive to implement, so a less capable function called vfork() is used
instead.</p>
<p>The reason vfork() exists is that if you haven't got an MMU then you can't
simply set up a second set of page tables and share the physical memory via
copy-on-write, which is what fork() normally does. This means that actually
forking has to copy all the parent's memory (which could easily be tens of
megabytes). And you have to do this even though that memory gets freed again
as soon as the exec happens, so it's probably all a big waste of time.</p>
<p>This is not only slow and a waste of space, it also causes totally
unnecessary memory usage spikes based on how big the _parent_ process is (not
the child), and these spikes are quite likely to trigger an out of memory
condition on small systems (which is where nommu is common anyway). So
although you _can_ emulate a real fork on a nommu system, you really don't
want to.</p>
<p>In theory, vfork() is just a fork() that writeably shares the heap and stack
rather than copying it (so what one process writes the other one sees). In
practice, vfork() has to suspend the parent process until the child does exec,
at which point the parent wakes up and resumes by returning from the call to
vfork(). All modern kernel/libc combinations implement vfork() to put the
parent to sleep until the child does its exec. There's just no other way to
make it work: they're sharing the same stack, so if either one returns from its
function it stomps on the callstack so that when the other process returns,
hilarity ensues. In fact without suspending the parent there's no way to even
store separate copies of the return value (the pid) from the vfork() call
itself: both assignments write into the same memory location.</p>
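<p>A minimal sketch of the usual vfork() pattern, following the rules
described above: the child does nothing except exec (or _exit if the exec
fails), and the parent carries on only once that has happened. The
run_command() helper is hypothetical, written for this example.</p>

```c
#define _DEFAULT_SOURCE   /* ask glibc to declare vfork() in strict modes */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program named by argv[0], wait for it, and
 * return its exit status. Returns -1 if the child could not be created or
 * did not exit normally. */
int run_command(char *const argv[])
{
    pid_t pid;
    int status;

    assert(argv && argv[0]);

    pid = vfork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: we are running on the parent's stack and heap, so do not
         * return, and do not touch anything the parent will look at later.
         * Just exec, and use _exit() (not exit() or return) if that fails. */
        execvp(argv[0], argv);
        _exit(127);
    }

    /* Parent: we only get here after the child has exec'd or exited. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

<p>For example, with char *argv[] = { "sh", "-c", "exit 3", NULL },
run_command(argv) returns 3. The value 127 for a failed exec follows the
shell convention and is a choice made for this sketch.</p>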
<p>One way to understand (and in fact implement) vfork() is this: imagine
the parent does a setjmp and then continues on (pretending to be the child)
until the exec() comes around, then the _exec_ does the actual fork, and the
parent does a longjmp back to the original vfork call and continues on from
there. (It thus becomes obvious why the child can't return, or modify
local variables it doesn't want the parent to see changed when it resumes.)</p>
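<p>The setjmp/longjmp picture can be acted out in a single process. The
sketch below is only a model of the control flow, not how any libc implements
vfork(): pretend_exec() and vfork_model() are made-up names, and no second
process is ever created.</p>

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf vfork_point;   /* where the "parent" resumes */
static int shared_value;      /* memory that parent and "child" both see */

/* Stand-in for exec(): in the model, this is the moment the real fork would
 * happen and the parent would jump back to the vfork call. */
static void pretend_exec(void)
{
    longjmp(vfork_point, 1);
}

/* Returns the value the "parent" sees after the "child" has run. */
int vfork_model(void)
{
    shared_value = 1;

    if (setjmp(vfork_point) == 0) {
        /* "Child": really the same process, running on the same stack. */
        shared_value = 2;          /* the parent will see this change */
        pretend_exec();
        assert(!"not reached");    /* like exec, pretend_exec never returns */
    }

    /* "Parent": back at the vfork point, after the child's "exec". */
    return shared_value;
}
```

<p>Calling vfork_model() returns 2, not 1: whatever the "child" wrote before
its exec is still there when the "parent" resumes. The shared variable is
static here because setjmp() makes no promises about ordinary local variables
changed between the setjmp and the longjmp.</p>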
<p>Note a common mistake: the need for vfork doesn't mean you can't have two
processes running at the same time. It means you can't have two processes
sharing the same memory without stomping all over each other. As soon as
the child calls exec(), the parent resumes.</p>
<p>(Now in theory, a nommu system could just copy the _stack_ when it forks
(which presumably is much smaller than the heap), and leave the heap shared.
In practice, you've just wound up in a multi-threaded situation: you can't
do a malloc() or free() on your heap without changing the other process's
heap too, and if you don't have the proper locking for being threaded, you
corrupt the heap when both of you try to do it at the same time and wind up
stomping on each other while traversing the free memory lists. The thing
about vfork is that it's a big red flag warning "there be dragons here"
rather than something subtle and thus even more dangerous.)</p>
<br> <br>
<br> <br>
<br> <br>