Well, this week so far has been very productive with regards to electronics! This is a project I’ve been wanting to do for ages but just haven’t got round to. I’ve previously designed VGA controllers on both my CPLD and my FPGA board, but I’ve not actually done anything useful with them other than a bouncing box and a few colour bars, so I finally thought: why not actually try developing a full graphics controller, how hard can it be?!
The answer to that is actually “not very hard”: I managed to get it working relatively well in one evening of coding, displaying text, images and (unsynchronised) video!
While I love my STM32F0, and wouldn’t swap it for any other platform, it is limited when it comes to precision timing events (or, more to the point, I’m limited in my assembly knowledge!), so all I’m essentially doing is outsourcing the precision timing to the FPGA and shifting data to the FPGA from my STM32F0.
The FPGA side of things is essentially just a super dumb, unidirectional (write-only) memory interface. The FPGA expects a 24bit SPI transfer consisting of the memory address and pixel data.
Yes, I know that the block diagram method of Altera is for lazy people but I am lazy and don’t see the point in port mapping (for this application) when everything is so much easier to visualise in the block diagram format! Regardless, each section has a different use and I’ll go through what those uses are.
“SR” (Inst4, bottom left):
This is the main interface to the world! It’s pretty much a 24bit SIPO shift register: it accepts 24 bits in serial form and outputs 24 bits in parallel form (to the busses PDO and MADDR), along with controlling the WR input of the dual port SRAM. To ensure the system is kept stable, the incoming clock, data and latch inputs are synchronised to the c1 clock, running at 192MHz. The reason this clock is so fast is to ensure the FPGA can capture every edge of the SPI clock/latch from the STM32F0. I’m clocking the STM32F0 SPI at 24MHz and, without a drastic amount of oversampling, data from the STM32F0 to the FPGA was getting corrupted.
Upon the latch input going from high to low (falling edge), the WR pin is set low and the two outputs PDO and MADDR are both set to zero. The system then clocks in 24 bits on the rising edge of the clock input (from the STM32F0). If more than 24 bits are clocked in, like a FIFO, the oldest bits will get shifted out of the top and will be lost. On the next rising edge of the latch input (low to high), the WR pin is set high and the shifted data is written to the two output ports, PDO (Port data output) and MADDR (Memory address). On the next rising edge of c1, this data will be written to the dual port SRAM.
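To make the latch/clock behaviour above a bit more concrete, here is a rough software model of the shift register in C. This is only a sketch and not the actual VHDL: the names (sr_model, sr_tick, sr_send) are made up for illustration, the two output busses are lumped into one 24bit value, and the MSB-first bit order is an assumption. Each call to sr_tick stands for one tick of the fast sampling clock, with edges found by comparing against the previous sample.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of the "SR" block. All names are hypothetical. */
typedef struct {
    uint32_t shift;      /* 24-bit shift register contents                 */
    bool     prev_clk;   /* SPI clock level seen on the previous tick      */
    bool     prev_latch; /* latch level seen on the previous tick          */
    bool     wr;         /* write strobe towards the dual port SRAM        */
    uint32_t out;        /* latched parallel output (MADDR and PDO lumped) */
} sr_model;

/* One tick of the fast sampling clock, given the current pin levels. */
void sr_tick(sr_model *sr, bool clk, bool data, bool latch)
{
    if (sr->prev_latch && !latch) {         /* falling edge of latch */
        sr->wr  = false;
        sr->out = 0;
    } else if (!sr->prev_latch && latch) {  /* rising edge of latch  */
        sr->out = sr->shift;
        sr->wr  = true;
    }
    if (!sr->prev_clk && clk) {             /* rising edge of SPI clock */
        /* Shift left by one; anything beyond 24 bits falls off the top. */
        sr->shift = ((sr->shift << 1) | (data ? 1u : 0u)) & 0xFFFFFFu;
    }
    sr->prev_clk   = clk;
    sr->prev_latch = latch;
}

/* Drive the model the way an SPI master might: latch low, nbits clocked
   in MSB first (assumed order), then latch high. nbits must be 1..32. */
void sr_send(sr_model *sr, uint32_t word, int nbits)
{
    sr_tick(sr, false, false, false);       /* latch goes low           */
    for (int i = nbits - 1; i >= 0; i--) {
        bool bit = (word >> i) & 1u;
        sr_tick(sr, false, bit, false);     /* data valid, clock low    */
        sr_tick(sr, true,  bit, false);     /* rising edge shifts it in */
    }
    sr_tick(sr, false, false, false);       /* clock returns low        */
    sr_tick(sr, false, false, true);        /* latch goes high          */
}
```

Starting from a model with prev_latch set to true, sending 24 bits leaves exactly those bits on the output with wr high. Sending 32 bits leaves only the last 24, which is the “shifted out of the top” behaviour described above.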
PLL (Inst8, farthest left):
This block does pretty much what it says in the name: it PLLs! For those who don’t know, a PLL is an effective way of generating new clocks from an input clock, at integer or fractional multiples of its frequency and, if needed, with a different duty cycle or phase shift. The reason I’m using a PLL is to provide two system clocks: one for the VGA controller and memory interface and the other for the shift register. The two output clocks, c0 and c1, have output frequencies of 48MHz and 144MHz, respectively.
VGA Controller (Inst, top right):
This is where all the timing magic happens! This block generates the VSync, HSync and colour output signals, along with the memory address of the current pixel. The memory address is generated from the horizontal and vertical counters. The timings used give a resolution of 800×600 pixels which, if you divide both dimensions by 8, gives the chosen resolution of 100×75, nifty huh! Fortunately, division by 8 is as simple as shifting down by 3 bits using the srl operator. The VGA controller section actually outputs a 16bit word per pixel as this is what the DACs on my FPGA board support. My controller however only has an 8bit video interface (256 colours). To essentially “upscale” from 8bit colour to 16bit colour, the most significant bits of each channel are repeated in the lower bits of the output. On the red channel, for example:
Red(4 downto 2) <= MemDI(2 downto 0);
Red(1 downto 0) <= MemDI(2 downto 1);
Where MemDI is the data into the controller from the memory module. This upscaling allows the controller to display full red, green and blue, as opposed to only displaying the top few bits. The controller also ensures that the colour outputs are zero during synchronisation phases (required for some monitors).
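The same bit replication can be written out in C to see what it does to the numbers. This is just a sketch mirroring the two red channel assignments above; the function name expand3to5 is made up for illustration.

```c
#include <stdint.h>

/* Expand a 3-bit channel value (0..7) to 5 bits (0..31) by bit replication:
   the three source bits land in output bits 4..2, and the top two source
   bits are repeated in output bits 1..0. */
uint8_t expand3to5(uint8_t c3)
{
    c3 &= 0x07u;                              /* keep only the 3 channel bits */
    return (uint8_t)((c3 << 2) | (c3 >> 1));
}
```

With this, a full-scale input of 7 becomes 31 (all five bits set), whereas a plain shift left by two would only reach 28 and never quite hit full brightness. Zero still maps to zero.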
The VGA controller outputs a memory address to one side of the dual port SRAM and reads the data from the same side on the next clock edge, displaying it pixel by pixel to the screen.
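As a rough illustration of how the memory address can be derived from the horizontal and vertical counters, here is a small C sketch. The divide-by-8 via a 3 bit shift comes from the description above, but the row-major layout (address = row × 100 + column) and the names are assumptions for the sake of the example.

```c
#include <stdint.h>

enum { FB_WIDTH = 100, FB_HEIGHT = 75 };  /* 800x600 divided by 8 */

/* Map a position in the visible 800x600 area to a frame buffer address.
   Assumes pixels are stored row by row (row-major). */
uint16_t pixel_address(uint16_t hcount, uint16_t vcount)
{
    uint16_t x = hcount >> 3;  /* 0..99: each buffer pixel is 8 screen pixels wide */
    uint16_t y = vcount >> 3;  /* 0..74: and 8 screen lines tall                   */
    return (uint16_t)(y * FB_WIDTH + x);
}
```

Every 8×8 block of screen pixels maps to the same address, so the top-left block reads address 0 and the bottom-right block reads address 7499.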
Dual port SRAM memory (Inst7, bottom right):
Dual port SRAM is by far the easiest method of allowing both read and write operations at the same time, giving a layer of separation between the shift register interface and the VGA controller. This drastically simplifies the synchronisation problems that can crop up when trying to read and write a single port SRAM at the same time. The downside of dual port memory is that it occupies more space and is more complex to manufacture. Fortunately for me, the FPGA I’m using (Altera Cyclone IV EP4CE6E22C8) features onboard memory blocks, allowing me to store the whole frame buffer for a 100×75, 8bit pixel screen on the FPGA. If more memory were available, I would be able to store the whole 800×600 pixel frame buffer, but the amount of space needed increases dramatically with increasing screen sizes! I need 7,500 bytes to store a 100×75 8bit frame buffer whereas I need 480,000 bytes to store an 800×600 8bit frame buffer! In the future, I might replace this with an SDRAM interface for my onboard SDRAM, allowing me to store the full frame buffer.
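The frame buffer numbers above are easy to check with a couple of lines of C. This is only a sketch of the arithmetic; the helper names are made up.

```c
#include <stdint.h>

/* Bytes needed for a frame buffer, rounding up to a whole byte. */
uint32_t framebuffer_bytes(uint32_t width, uint32_t height, uint32_t bits_per_pixel)
{
    return (width * height * bits_per_pixel + 7u) / 8u;
}

/* Smallest number of address bits that can index `count` locations. */
unsigned address_bits(uint32_t count)
{
    unsigned bits = 0;
    while ((1ull << bits) < count) {
        bits++;
    }
    return bits;
}
```

For 100×75 at 8 bits per pixel this gives 7,500 bytes and a 13 bit address; for 800×600 it gives 480,000 bytes and a 19 bit address, i.e. 64 times the memory.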
As the interface is so simple, I can shift data from the STM32F0 to the FPGA pretty fast. As the STM32F0 series features a variable length SPI interface, I can send a pixel over in two 11bit SPI writes (22 bits in total, which covers the 8 bits for the pixel colour and the 13 bits for the address). I have however made the shift register 24 bits long for systems that don’t support a variable length SPI interface and instead need 3x 8bit SPI writes. As it doesn’t take that long for the STM32F0 to send a pixel over, relatively complex tasks such as video playback are achievable at a relatively good frame rate. Reading my proprietary video format off a bog standard SD card, through the SPI protocol on the STM32F0, gave an unsynchronised frame rate of ~17fps. By proprietary video format, I merely mean packing every pixel, one after another, into a data file and streaming that off the SD card and into the graphics controller. No decoding/decompression takes place on the STM32F0.
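Here is a minimal sketch of how a pixel write could be packed for the two transfer styles mentioned above. The bit layout is an assumption made for the example (address in the upper bits, pixel colour in the lowest 8 bits, sent most significant bit first); the real layout depends on how the shift register outputs are wired to MADDR and PDO. All function names are hypothetical, and the actual SPI peripheral calls are left out.

```c
#include <stdint.h>

/* Build one transfer word: address above the pixel byte (assumed layout). */
uint32_t pack_frame(uint16_t addr, uint8_t pixel)
{
    return ((uint32_t)addr << 8) | pixel;
}

/* Split into three bytes, most significant first, for 8-bit-only SPI. */
void frame_to_bytes(uint32_t frame, uint8_t out[3])
{
    out[0] = (uint8_t)(frame >> 16);
    out[1] = (uint8_t)(frame >> 8);
    out[2] = (uint8_t)frame;
}

/* Split the low 22 bits into two 11-bit words for variable length SPI.
   A 13-bit address plus an 8-bit pixel needs 21 bits, so this is enough. */
void frame_to_words11(uint32_t frame, uint16_t out[2])
{
    out[0] = (uint16_t)((frame >> 11) & 0x7FFu);
    out[1] = (uint16_t)(frame & 0x7FFu);
}
```

For the last pixel in the buffer (address 7499) with colour 0xE3, pack_frame gives 0x1D4BE3, which goes out either as the bytes 0x1D, 0x4B, 0xE3 or as two 11bit words that reassemble to the same value.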
The video conversion is done in Matlab, where the scaling factors from the original video to the 100×75 pixel screen are first calculated; the video is then scaled and converted to RGB332 format (SLOW SLOW SLOWWWWWW) and finally written to the output data file. This process is unbelievably slow, most probably because Matlab isn’t really meant to be used for relatively heavy video processing. I’ve been meaning to create a C/C++ program to handle my video processing needs but haven’t got round to it just yet… This same process is used for displaying images on the screen. The image is first scaled, then packed into RGB332 format and stored in a data file, which is then read on my STM32F0 with each pixel pushed into the screen buffer. One very fortunate thing, however, is that as the screen is interfaced with a single function, WritePix(X, Y, Col), my previously written “GFXC” library can be used to write data to the screen! This allows me to do things like write text, draw circles, ellipses, squares and so on.
I will at some point in the future be uploading code for this project, but I wouldn’t as of yet consider it complete enough to release to the general public as there are still a few niggles that I need to iron out. Until then, stay tuned for more updates!
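For reference, the RGB332 packing step itself is tiny when written in C. This is a sketch rather than the Matlab code actually used: red is placed in bits 2..0 to match the red channel assignment shown in the VGA controller section, while the green (bits 5..3) and blue (bits 7..6) positions are guesses. The function names are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack 8-bit-per-channel RGB into one byte: 3 bits red, 3 bits green,
   2 bits blue. Assumed positions: red [2:0], green [5:3], blue [7:6]. */
uint8_t pack_rgb332(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)(((b >> 6) << 6) | ((g >> 5) << 3) | (r >> 5));
}

/* Convert a whole frame of interleaved R,G,B bytes, one pixel after another,
   into packed bytes ready to be written to the data file. */
void pack_frame_rgb332(const uint8_t *rgb, uint8_t *out, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++) {
        out[i] = pack_rgb332(rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]);
    }
}
```

Only the top bits of each channel survive, so pure red (255, 0, 0) packs to 0x07 and white packs to 0xFF under this layout.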
Oh and also, here is a quick vlog demonstrating the video playback capabilities: