I'm sure it's slightly more efficient as a static inline function, but
it results in less flash usage as a separate function. This is important
for the bootloader where every byte matters.
Nuvoton's sample startup_M251.S file handles enough initialization for
my purposes, so I can completely bypass _start and jump directly to
main. Note that I also had to add a define to enable clearing of BSS.
The default values for SystemCoreClock, CyclesPerUs, and PllClock work
fine for my purposes of running from the 48 MHz HIRC. Remove unnecessary
initialization code. This is especially useful for the bootloader where
flash space is at a premium.
Also strip out unneeded UART setup code.
This will be used during firmware updates so that the main firmware can
communicate to the bootloader that it should stay in the bootloader for
a firmware update rather than run the main firmware again.
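One common way to pass a request like this across a reset is a magic
value in a RAM location that the startup code leaves alone. The sketch
below only illustrates that idea and is not necessarily how this project
does it. The names boot_request, BOOT_REQUEST_MAGIC,
request_stay_in_bootloader and bootloader_update_requested, and the magic
value itself, are all hypothetical. On real hardware the variable would
have to live in a no-init section so that BSS clearing does not wipe it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical magic value; any pattern unlikely to appear by chance works. */
#define BOOT_REQUEST_MAGIC 0xB007104DUL

/* On a real target this would be placed in a no-init RAM section so that
 * startup code (including BSS clearing) leaves it alone across a reset.
 * Here it is an ordinary variable so the sketch can run on a host. */
static volatile uint32_t boot_request;

/* Called by the main firmware just before resetting into the bootloader. */
void request_stay_in_bootloader(void)
{
    boot_request = BOOT_REQUEST_MAGIC;
}

/* Called early in the bootloader. Returns true if the main firmware asked
 * for an update, and consumes the request so it only applies once. */
bool bootloader_update_requested(void)
{
    bool requested = (boot_request == BOOT_REQUEST_MAGIC);
    boot_request = 0;
    return requested;
}
```

Consuming the request on read means a failed or aborted update does not
leave the device stuck in the bootloader after the next reset.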
The problem is that the meaning of curChipType has changed over time.
It was originally meant to map exactly to chips (four SST39SF040 chips
or four M29F160FB5AN6E2 chips), but its meaning has shifted to simply
indicating whether the unlock address needs to be shifted or not.
When curChipType is ParallelFlash_SST39SF040_x4, sometimes the
programming size is 4 MB or 8 MB. So don't restrict it to 2 MB.
Note that the erase sector sizes are just plain wrong in this case. In
the future I should read the chip ID and keep a table of sector sizes
for each known chip ID.
This README contains an overall summary of the entire project, but it
was missing build instructions for the firmware. Add them now that CMake
is all set up.
This allows building with CMake instead of Eclipse. The reasoning behind
this is to make the code more easily portable to other architectures,
and to move away from being dependent on Eclipse.
The 646 and 1286 have different required USB PLL values when you have a
16 MHz crystal. Detect the chip at runtime to set up the PLL correctly.
This requires taking over control of the PLL from LUFA.
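A rough sketch of the runtime selection, written so it compiles on a host
without <avr/io.h>. The function name pll_prescaler_16mhz is hypothetical.
The signature byte values and the PLLCSR bit positions are my
understanding of the AT90USB64x/128x parts and should be checked against
the datasheet before use.

```c
#include <stdint.h>

/* Bit positions of the PLL prescaler field in PLLCSR, defined locally so
 * the sketch compiles without <avr/io.h>. Assumed values; check the
 * datasheet. */
#define PLLP0_BIT 2
#define PLLP1_BIT 3
#define PLLP2_BIT 4

/* Second device signature byte, which encodes the flash size.
 * Assumed values; check the datasheet. */
#define SIG1_AT90USB64X  0x96
#define SIG1_AT90USB128X 0x97

/* Returns the PLLCSR prescaler bits for a 16 MHz crystal, or 0 if the
 * chip is not recognized. */
uint8_t pll_prescaler_16mhz(uint8_t signature_byte1)
{
    switch (signature_byte1) {
    case SIG1_AT90USB64X:
        return (uint8_t)((1u << PLLP2_BIT) | (1u << PLLP1_BIT));
    case SIG1_AT90USB128X:
        return (uint8_t)((1u << PLLP2_BIT) | (1u << PLLP0_BIT));
    default:
        return 0;
    }
}
```

On the AVR itself, the signature byte could be read with avr-libc's
boot_signature_byte_get() from <avr/boot.h> and the result written to
PLLCSR before enabling the PLL.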
A whole bunch of files in this project had DOS line endings. This is due
to how I started working on it on a Windows machine with little Git
experience. Now it's inconsistent so I'm fixing it.
I noticed that after I implemented the SPI optimization of cycle
counting instead of polling on SPIF, the first "normal" SPI transaction
I tried would fail. This is because nothing was clearing the SPIF flag
anymore, and the normal SPI driver still looks at it. So it was thinking
that the latest transaction was already completed (it wasn't). Worked
around this by making sure we clear the flag in SPI_Assert. I'm not
concerned about the performance impact here because the normal SPI
driver is not used in performance-critical situations.
Fixed an issue that identified the wrong pins as shorted to ground in
the electrical test functionality. Whoops!
If multiple read or write cycles are done in sequence, we'll no longer
needlessly update the data direction registers (which is a slow SPI
transaction). We can also skip updating the pullups on the AVR if
multiple read cycles occur in sequence.
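The idea can be sketched as caching the last bus direction and skipping
the slow update when nothing changes. This is a simplified host-side
illustration, not the actual driver code; bus_dir_t, set_bus_direction,
write_direction_registers and slow_updates are hypothetical names.

```c
#include <stdint.h>

typedef enum { BUS_DIR_UNKNOWN, BUS_DIR_READ, BUS_DIR_WRITE } bus_dir_t;

/* Last direction actually written to the hardware. */
static bus_dir_t last_dir = BUS_DIR_UNKNOWN;

/* Counts simulated slow SPI transactions, for illustration only. */
unsigned slow_updates;

/* Stand-in for the slow SPI transaction that rewrites the data
 * direction registers. */
static void write_direction_registers(bus_dir_t dir)
{
    (void)dir;
    slow_updates++;
}

/* Switch the data bus direction only if it differs from the cached state. */
void set_bus_direction(bus_dir_t dir)
{
    if (dir == last_dir) {
        return; /* same kind of cycle as last time: nothing to do */
    }
    write_direction_registers(dir);
    last_dir = dir;
}
```

With this in place, a run of back-to-back read cycles pays for the
direction change once instead of on every cycle.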
This makes the code fairly easy to port to other architectures if someone
wants to make a more modern SIMM programmer. I was also careful to split up the
responsibilities of the different components and give the existing components
better names. I'm pretty happy with the organization of the code now.
As part of this change I have also heavily optimized the code. In particular,
the read and write cycle routines are very important to the overall performance
of the programmer. In these routines I had to make some tradeoffs of code
performance versus prettiness, but the overall result is much faster
programming.
Some of these performance changes are the result of what I discovered when
I upgraded my AVR compiler: it generates better code for accessing 32-bit
variables when I use a union instead of bitwise operations.
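For illustration, here are the two styles side by side. This is a generic
sketch rather than the project's actual code; word32_t, byte_via_shift
and byte_via_union are hypothetical names. The union version relies on
the target's byte order (the AVR is little-endian), whereas the shift
version does not.

```c
#include <stdint.h>

/* A 32-bit value that can also be addressed one byte at a time. */
typedef union {
    uint32_t word;
    uint8_t bytes[4]; /* bytes[0] is the low byte on little-endian targets */
} word32_t;

/* Extract byte `index` (0 = least significant) with shifts. */
uint8_t byte_via_shift(uint32_t value, unsigned index)
{
    return (uint8_t)(value >> (8u * index));
}

/* Extract byte `index` through the union. On a little-endian target this
 * matches byte_via_shift and lets the compiler address the byte directly. */
uint8_t byte_via_union(uint32_t value, unsigned index)
{
    word32_t u;
    u.word = value;
    return u.bytes[index];
}
```

The tradeoff is portability: the union form bakes in the byte order, so
it only makes sense in code that is already specific to one target.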
I also shaved off more CPU cycles by carefully making a few small tweaks. I
added a bypass for the "program only some chips" mask, because it was adding
unnecessary CPU cycles for a feature that is rarely used. I removed the
verification feature from the write routine, because we can always verify the
data after the write chunk is complete, which is more efficient. I also added
assumptions about the initial/final state of the CS/OE/WE pins, which allowed me
to remove more valuable CPU cycles from the read/write cycle routines.
There are also a few enormous performance optimizations I should have done a
long time ago:
1) The code was only handling one received byte per main loop iteration. Reading
every byte available cut nearly a minute off of the 8 MB programming time.
2) The code wasn't taking advantage of the faster programming command available
in the chips used on the 8 MB SIMM.
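The first item can be sketched as the difference between an `if` and a
`while` in the main loop. The receive buffer here is simulated so the
example runs on a host; rx_load, rx_byte_available, rx_read_byte,
handle_byte and bytes_handled are hypothetical names, not the project's
real functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated receive buffer standing in for the real receive endpoint. */
static const uint8_t *rx_data;
static size_t rx_len;
static size_t rx_pos;

/* Counts processed bytes, for illustration only. */
size_t bytes_handled;

void rx_load(const uint8_t *data, size_t len)
{
    rx_data = data;
    rx_len = len;
    rx_pos = 0;
}

static bool rx_byte_available(void) { return rx_pos < rx_len; }
static uint8_t rx_read_byte(void) { return rx_data[rx_pos++]; }
static void handle_byte(uint8_t b) { (void)b; bytes_handled++; }

/* Old behavior: at most one received byte per main loop iteration. */
void main_loop_iteration_single(void)
{
    if (rx_byte_available()) {
        handle_byte(rx_read_byte());
    }
}

/* New behavior: drain everything that has already arrived. */
void main_loop_iteration_drain(void)
{
    while (rx_byte_available()) {
        handle_byte(rx_read_byte());
    }
}
```

Draining the buffer avoids paying the per-iteration overhead of the main
loop once for every single byte of a large transfer.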
The end result of all of these optimizations is that the programming time for
the 8 MB SIMM is down to 3:31 (it used to be 8:43).
Another minor issue I fixed: the Micron SIMM chip identification wasn't working
properly. It was outputting the manufacturer ID again instead of the device ID.