Add explanations of encrypted passwords, and fork vs vfork.

2024-11-09 02:09:01 +00:00 · 2006-01-29 06:29:01 +00:00 · b1b3cee831
commit b1b3cee831
parent 08a1b5095d
1 changed files with 115 additions and 0 deletions
--- a/docs/busybox.net/programming.html
+++ b/docs/busybox.net/programming.html
@@ -12,6 +12,11 @@
  </ul>
  <li><a href="#adding">Adding an applet to busybox</a></li>
  <li><a href="#standards">What standards does busybox adhere to?</a></li>
  <li><a href="#tips">Tips and tricks.</a></li>
  <ul>
    <li><a href="#tips_encrypted_passwords">Encrypted Passwords</a></li>
    <li><a href="#tips_vfork">Fork and vfork</a></li>
  </ul>
 </ul>
 <h2><b><a name="goals" />What are the goals of busybox?</b></h2>
@@ -172,6 +177,116 @@ applet is otherwise finished.  When polishing and testing a busybox applet,
 we ensure we have at least the option of full standards compliance, or else
 document where we (intentionally) fall short.</p>
 <h2><a name="tips" />Programming tips and tricks.</a></h2>
 <p>Various things busybox uses that aren't particularly well documented
 elsewhere.</p>
 <h2><a name="tips_encrypted_passwords">Encrypted Passwords</a></h2>
 <p>Password fields in /etc/passwd and /etc/shadow are in a special format.
 If the first character isn't '$', then it's an old DES style password.  If
 the first character is '$' then the password is actually three fields
 separated by '$' characters:</p>
 <pre>
  <b>$type$salt$encrypted_password</b>
 </pre>
 <p>The "type" indicates which encryption algorithm to use: 1 for MD5 and 2 for SHA1.</p>
 <p>The "salt" is a bunch of random characters (generally 8) the encryption
 algorithm uses to perturb the password in a known and reproducible way (such
 as by appending the random data to the unencrypted password, or combining
 them with exclusive or).  Salt is randomly generated when setting a password,
 and then the same salt value is re-used when checking the password.  (Salt is
 thus stored unencrypted.)</p>
 <p>The advantage of using salt is that the same cleartext password encrypted
 with a different salt value produces a different encrypted value.
 If each encrypted password uses a different salt value, an attacker is forced
 to do the cryptographic math all over again for each password they want to
 check.  Without salt, they could simply produce a big dictionary of commonly
 used passwords ahead of time, and look up each password in a stolen password
 file to see if it's a known value.  (Even if there are billions of possible
 passwords in the dictionary, checking each one is just a binary search against
 a file only a few gigabytes long.)  With salt they can't even tell if two
 different users share the same password without guessing what that password
 is and encrypting it with each user's salt.  They also can't precompute the attack dictionary for
 a specific password until they know what the salt value is.</p>
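 <p>A minimal sketch of the effect of salt.  The toy_hash() and looks_shared()
 names are made up for this illustration, and the hash is a plain 64-bit
 FNV-1a checksum rather than a real password algorithm, so it only shows how
 the stored value changes with the salt, not how busybox computes it.</p>

```c
#include <stdint.h>

/* Toy stand-in for a password hash: 64-bit FNV-1a over the salt followed
 * by the password.  NOT cryptographically secure; illustration only. */
uint64_t toy_hash(const char *salt, const char *password)
{
    const char *parts[2];
    uint64_t h = 14695981039346656037ULL;   /* FNV-1a offset basis */
    int i;

    parts[0] = salt;
    parts[1] = password;
    for (i = 0; i < 2; i++) {
        const unsigned char *p = (const unsigned char *)parts[i];
        while (*p) {
            h ^= *p++;
            h *= 1099511628211ULL;          /* FNV-1a prime */
        }
    }
    return h;
}

/* Returns 1 if two stored values reveal that two users share a password. */
int looks_shared(uint64_t stored_a, uint64_t stored_b)
{
    return stored_a == stored_b;
}
```

 <p>With an empty salt, two users who both pick "secret" get the same stored
 value and looks_shared() reports it.  With salts "abcdefg1" and "abcdefg2"
 the stored values differ, so an attacker has to redo the work per salt.</p>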
 <p>The third field is the encrypted password (computed from the password plus the salt).  For MD5 this
 is 22 characters.</p>
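 <p>The field layout described above can be picked apart with a few strchr()
 calls.  This is a sketch only: struct pw_field, copy_part() and
 parse_pw_field() are hypothetical names, not busybox functions, and the
 buffer sizes are arbitrary choices for the example.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the three '$' separated parts. */
struct pw_field {
    char type[8];
    char salt[32];
    char hash[128];
};

/* Copy n bytes of src into dst and terminate; fail if it does not fit. */
static int copy_part(char *dst, size_t dstlen, const char *src, size_t n)
{
    if (n >= dstlen)
        return -1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return 0;
}

/* Split "$type$salt$encrypted_password".  Returns 0 on success, -1 if the
 * field does not start with '$' (for example an old DES style entry), a
 * separator is missing, or a part is too long for its buffer. */
int parse_pw_field(const char *field, struct pw_field *out)
{
    const char *type, *salt, *hash;

    if (field[0] != '$')
        return -1;
    type = field + 1;
    salt = strchr(type, '$');
    if (!salt)
        return -1;
    salt++;
    hash = strchr(salt, '$');
    if (!hash)
        return -1;
    hash++;

    if (copy_part(out->type, sizeof(out->type), type, (size_t)(salt - 1 - type)))
        return -1;
    if (copy_part(out->salt, sizeof(out->salt), salt, (size_t)(hash - 1 - salt)))
        return -1;
    return copy_part(out->hash, sizeof(out->hash), hash, strlen(hash));
}
```

 <p>For a field such as "$1$abcdefgh$" followed by 22 characters, this yields
 type "1", salt "abcdefgh" and a 22 character third part.</p>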
 <p>The busybox function to handle all this is pw_encrypt(clear, salt) in
 "libbb/pw_encrypt.c".  The first argument is the clear text password to be
 encrypted, and the second is a string in "$type$salt$password" format, from
 which the "type" and "salt" fields will be extracted to produce an encrypted
 value.  (Only the first two fields are needed, the third $ is equivalent to
 the end of the string.)  The return value is an encrypted password in
 /etc/passwd format, with all three $ separated fields.  It's stored in
 a static buffer, 128 bytes long.</p>
 <p>So when checking an existing password, if pw_encrypt(text,
 old_encrypted_password) returns a string that compares identical to
 old_encrypted_password, you've got the right password.  When setting a new
 password, generate a random 8 character salt string, put it in the right
 format with sprintf(buffer, "$%c$%s", type, salt), and feed buffer as the
 second argument to pw_encrypt(text,buffer).</p>
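 <p>The check and set steps above can be sketched as follows.  Because
 pw_encrypt() lives inside busybox, this example uses demo_pw_encrypt(), a
 made-up stand-in with the same calling convention that hashes with a toy
 FNV-1a checksum instead of MD5.  check_password() and set_password() are
 also hypothetical names, and the caller is assumed to supply a randomly
 generated salt.</p>

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for pw_encrypt(clear, salt): NOT the busybox function and NOT
 * secure.  Keeps "$type$salt" from the second argument, appends a toy
 * hash, and returns a pointer to a static 128 byte buffer. */
char *demo_pw_encrypt(const char *clear, const char *salt_field)
{
    static char out[128];
    char prefix[64];
    const char *end;
    const unsigned char *p;
    uint64_t h = 14695981039346656037ULL;
    size_t n;

    /* The third '$' is equivalent to the end of the string. */
    end = salt_field[0] ? strchr(salt_field + 1, '$') : NULL;
    if (end)
        end = strchr(end + 1, '$');
    n = end ? (size_t)(end - salt_field) : strlen(salt_field);
    if (n >= sizeof(prefix))
        n = sizeof(prefix) - 1;
    memcpy(prefix, salt_field, n);
    prefix[n] = '\0';

    for (p = (const unsigned char *)prefix; *p; p++) {
        h ^= *p;
        h *= 1099511628211ULL;
    }
    for (p = (const unsigned char *)clear; *p; p++) {
        h ^= *p;
        h *= 1099511628211ULL;
    }
    snprintf(out, sizeof(out), "%s$%016llx", prefix, (unsigned long long)h);
    return out;
}

/* Checking: re-encrypt with the stored field as salt source and compare. */
int check_password(const char *typed, const char *stored)
{
    return strcmp(demo_pw_encrypt(typed, stored), stored) == 0;
}

/* Setting: build "$type$salt", encrypt, and copy the result into dst. */
int set_password(char *dst, size_t dstlen, char type,
                 const char *salt, const char *clear)
{
    char buffer[64];
    const char *enc;
    int n;

    n = snprintf(buffer, sizeof(buffer), "$%c$%s", type, salt);
    if (n < 0 || (size_t)n >= sizeof(buffer))
        return -1;
    enc = demo_pw_encrypt(clear, buffer);
    if (strlen(enc) >= dstlen)
        return -1;
    strcpy(dst, enc);
    return 0;
}
```

 <p>The result is copied out of the static buffer straight away, since the
 next call overwrites it.</p>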
 <h2><a name="tips_vfork">Fork and vfork</a></h2>
 <p>On systems that haven't got a Memory Management Unit, fork() is unreasonably
 expensive to implement, so a less capable function called vfork() is used
 instead.</p>
 <p>The reason vfork() exists is that if you haven't got an MMU then you can't
 simply set up a second set of page tables and share the physical memory via
 copy-on-write, which is what fork() normally does.  This means that actually
 forking has to copy all the parent's memory (which could easily be tens of
 megabytes).  And you have to do this even though that memory gets freed again
 as soon as the exec happens, so it's probably all a big waste of time.</p>
 <p>This is not only slow and a waste of space, it also causes totally
 unnecessary memory usage spikes based on how big the _parent_ process is (not
 the child), and these spikes are quite likely to trigger an out of memory
 condition on small systems (which is where nommu is common anyway).  So
 although you _can_ emulate a real fork on a nommu system, you really don't
 want to.</p>
 <p>In theory, vfork() is just a fork() that writeably shares the heap and stack
 rather than copying it (so what one process writes the other one sees).  In
 practice, vfork() has to suspend the parent process until the child does exec,
 at which point the parent wakes up and resumes by returning from the call to
 vfork().  All modern kernel/libc combinations implement vfork() to put the
 parent to sleep until the child does its exec.  There's just no other way to
 make it work: they're sharing the same stack, so if either one returns from its
 function it stomps on the callstack so that when the other process returns,
 hilarity ensues.  In fact without suspending the parent there's no way to even
 store separate copies of the return value (the pid) from the vfork() call
 itself: both assignments write into the same memory location.</p>
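 <p>A minimal sketch of the usual vfork() pattern on a POSIX system.  The
 helper name run_with_vfork() is made up for this example.  The child does
 nothing except exec (or _exit() if the exec fails), because everything it
 touches is the parent's memory.</p>

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] via vfork() and exec, wait for it, and
 * return its exit status (or -1 on failure or abnormal termination). */
int run_with_vfork(char *const argv[])
{
    pid_t pid;
    int status;

    pid = vfork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: runs on the parent's stack, so just exec. */
        execvp(argv[0], argv);
        _exit(127);     /* exec failed: _exit(), never return or exit() */
    }
    /* Parent: only gets here once the child has exec'd or exited. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

 <p>The 127 exit code for a failed exec is a shell convention chosen for the
 example, not something vfork() requires.</p>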
 <p>One way to understand (and in fact implement) vfork() is this: imagine
 the parent does a setjmp and then continues on (pretending to be the child)
 until the exec() comes around, then the _exec_ does the actual fork, and the
 parent does a longjmp back to the original vfork call and continues on from
 there.  (It thus becomes obvious why the child can't return, or modify
 local variables it doesn't want the parent to see changed when it resumes.)</p>
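 <p>The setjmp/longjmp picture can be modelled in user space.  This is only
 a model of the control flow: model_exec() and model_vfork_run() are made-up
 names, and fork() is used here purely to stand in for the process creation
 that would happen at exec time, which a nommu system would have to do
 some other way.</p>

```c
#include <setjmp.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static jmp_buf vfork_point;     /* where the parent will resume */
static pid_t child_pid;         /* filled in when the child really exists */

/* Model of exec() under this scheme: the new process is only created
 * here, and the parent jumps back to where it "called vfork". */
static void model_exec(char *const argv[])
{
    pid_t pid = fork();         /* stand-in for creating the child */

    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);
    }
    child_pid = pid;
    longjmp(vfork_point, 1);    /* parent returns to the vfork point */
}

/* Run argv[0] and return its exit status, or -1 on failure. */
int model_vfork_run(char *const argv[])
{
    int status;

    if (setjmp(vfork_point) == 0) {
        /* "Child" phase: still the parent, on the parent's own stack. */
        model_exec(argv);       /* does not return here */
    }
    /* Parent phase: resumed by the longjmp in model_exec(). */
    if (child_pid < 0 || waitpid(child_pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

 <p>If the "child" phase returned from model_vfork_run() before reaching
 model_exec(), the later longjmp would land in a stack frame that no longer
 exists, which is the same reason a real vfork() child must not return.</p>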
 <p>Note a common mistake: the need for vfork doesn't mean you can't have two
 processes running at the same time.  It means you can't have two processes
 sharing the same memory without stomping all over each other.  As soon as
 the child calls exec(), the parent resumes.</p>
 <p>(Now in theory, a nommu system could just copy the _stack_ when it forks
 (which presumably is much shorter than the heap), and leave the heap shared.
 In practice, you've just wound up in a multi-threaded situation and you can't
 do a malloc() or free() on your heap without freeing the other process's memory
 (and if you don't have the proper locking for being threaded, corrupting the
 heap if both of you try to do it at the same time and wind up stomping on
 each other while traversing the free memory lists).  The thing about vfork is
 that it's a big red flag warning "there be dragons here" rather than
 something subtle and thus even more dangerous.)</p>
 <br>
 <br>
 <br>