A few FPGA development boards now have HDMI outputs, and it's quite easy to use these to display an image. Here's one way you could display a static image on a Digilent Atlys!
This design is based on XAPP495, but you may want to use my 'fixed' version, which works on ISE 14.
The vtc_demo basically has two parts:
We will assume that the HDMI transmitter logic already does everything we want and I have not modified it any further (other than fixing the clock generation).
The image generator only displays coloured bars, which are not very interesting! We will replace it with a new module called image_gen, which displays an image stored in the FPGA’s Block RAM.
module image_gen( input i_clk_74M, //74.25 MHZ pixel clock input i_rst, // not used input baronly, // not used input [1:0] i_format, // not used input [11:0] i_vcnt, //vertical counter from video timing generator input [11:0] i_hcnt, //horizontal counter from video timing generator output [7:0] o_r, output [7:0] o_g, output [7:0] o_b );
If you examine how hdclrbar works, it really just has two inputs, the (x, y) coordinates that dvi_encoder wants to draw to the screen next. These signals are called i_hcnt and i_vcnt. The dvi_encoder supplies these coordinates and expects an RGB pixel value one clock cycle later at o_r / o_g / o_b. For this exercise we'll emulate the hdclrbar interface exactly to make it easier to change from one to the other - it also includes some other input signals that we will not use and just leave unconnected.
We will use a BRAM block memory to hold the pixel data for an image. Because the Spartan-6 does not have enough BRAM to fit a high resolution image, I have scaled it down to 256x192 with 6-bit colour depth. The Spartan-6 BRAMs are exactly 18 bits wide. If we use 6 bits for each of red, green and blue this is convenient because it means we can store all colours for a single pixel in one BRAM memory location. This is important because it takes one clock cycle to fetch a single 18-bit word from BRAM. If a pixel was stored in multiple memory locations, we would need multiple clock cycles to retrieve a single pixel.. so the memory would need to be run much faster than dvi_encoder. This is possible, but it would be more complicated to design. Alternatively you could use two BRAMs in parallel to create a 36-bit word.

Anyway, 884736 bits (256x192x18) is well within the Atlys' Spartan-6 LX45 BRAM capacity, so this is a good size to use. It is also exactly one quarter of 1024x768 (a standard screen resolution) so we can make the image nearly fill the screen by making it four times larger in both directions - a very simple operation to do in digital logic.
Here's how to create a BRAM using the CORE Generator

Once created, you can copy the HDL Instantiation Template for the new BRAM and paste it into your image_gen module (or read on to see mine).

The design for my image_gen module is very simple: it just converts the incoming i_hcnt/i_vcnt values into an address in the BRAM. A BRAM memory is one dimensional: the 256x192 pixels are stored as 49152 consecutive 18-bit values, so we just need to convert the x,y coordinates to a single value using the operation address = (x + y*256) - remember that image is 256 pixels wide. Because the screen resolution might be even larger than this, we need to define some signals that will tell us if the requested pixel location is outside of our stored image.
 wire h_too_big, v_too_big;
 assign h_too_big = i_hcnt[11:10] > 2'b00; // > 1024
 assign v_too_big = i_vcnt[11:8] > 4'b0011; // > 768
 always @*
 begin
   if (h_too_big | v_too_big)
     pixel_address = 16'd0;
   else
     pixel_address = (i_vcnt[9:2] * 256) + i_hcnt[9:2];
   end
If the x,y coordinate is larger than the image (h_too_big | v_too_big), we instruct the BRAM to return the pixel value in memory location zero, which is just the top-left pixel of our image. If we don’t do this, the picture will be repeated horizontally in an unusual way.
If not, we perform the calculation to map the coordinates to the linear memory address. Note that we also shift the incoming coordinates right by two bits, which makes the image four times larger (2^2) in both directions. This means that the 256x192 image is actually displayed on the screen as 1024x768. Each pixel in the source image is displayed as a square of 4x4 pixels on the monitor.
Here is the instantiation of our BRAM. First of all, the BRAM's output enable is always high - we always want it to be returning pixel data.
wire [17:0] doutb; image_bram image_bram ( // Port B (read) .clkb(i_clk_74M), // input clkb .addrb(pixel_address), // input [15 : 0] addrb .doutb(doutb), // output [17 : 0] doutb // Port A (read). // None of these signals are used at the moment - in the future they could be used to // upload a new image to BRAM without regenerating the bit file. .clka(i_clk_74M), // input clka .ena(), // input ena .wea(), // input [0 : 0] wea .addra(), // input [15 : 0] addra .dina(), // input [17 : 0] dina );
Here the output signals are connected to doutb, which is the BRAM’s data output register (on port B - we are not using port A for anything yet). Because the colour values are 6-bit, convert each of them back to 8-bit by adding two zero bits to the right hand side (the least significant bits). This just means that the image doesn’t have as many colours as the original, but it is not too noticeable because the image is very low resolution and looks terrible anyway!
 assign o_b = {doutb[17:12], 2'b00};
 assign o_g = {doutb[11:6], 2'b00};
 assign o_r = {doutb[5:0], 2'b00};
We have not implemented a way for the FPGA to decode a compressed image format such as JPEG, so we first need to convert an image to the correct size and our uncompressed 'raw' 18-bit colour format. You can write a simple program to do this. Here's an example using Ruby:
 #!/usr/bin/env ruby
 require 'rubygems'
 require 'rmagick'
 img = Magick::Image.read('image.jpg').first
 thumb = img.thumbnail(256, 192)
 puts "memory_initialization_radix=16;"
 puts "memory_initialization_vector="
 0.upto(thumb.rows-1).each do |r|
   scanline = thumb.export_pixels_to_str(0, r, thumb.columns, 1, "RGB");
   i = 0
   while i <= scanline.length-2 do
     puts ((scanline.getbyte(i) >> 2) + ((scanline.getbyte(i+1) >> 2) << 6)  + ((scanline.getbyte(i+2) >> 2) << 12)).to_s(16) + ","
     i = i+3
   end
 end
Here's an example using Matlab (which I have not tested). It is incredibly slow, probably due to my inefficient Matlab coding style.
 im = imread('image.jpg');
 im = uint32(imresize(im, [256 192]));
 im_out = zeros(1, 256*192, 'uint32');
 for J = 1:192,
   J
   for I = 1:256,
     red = bitsrl(im(I, J, 1), 2);
     green = bitsll( bitsrl(im(I, J, 2), 2), 6);
     blue = bitsll( bitsrl(im(I, J, 3), 2), 12);
     im_out( 1 + (I-1) + (J-1)*256) = red+green+blue;
   end
 end
 outfile = fopen('coefile.coe', 'w');
 fprintf(outfile, 'memory_initialization_radix=16;\r\n');
 fprintf(outfile, 'memory_initialization_vector=');
 for I = 1:49152,
   fprintf(outfile, '%x,\r\n', im_out(I));
 end
 fclose(outfile);
Please note that these write the image into BGR format (within the 18-bit BRAM word) rather than RGB. It doesn't really matter which order you use, as long as the output of your BRAM is hooked up to the appropriate signals, unless you want the colours to be swapped.
Re-open the BRAM core you generated and selected the new .coe file, then re-generate the core.
In vtc_demo.v you can now replace the hdcolorbar instantiation with image_gen directly. I have just commented the hdcolorbar out so that I can swap back quickly.
   //hdcolorbar clrbar(
   image_gen clrbar(
     .i_clk_74M(pclk),
     .i_rst(reset),

This design currently fails to meet timing constraints for some reason, but it does work!
Here are some suggestions for things you could do using this basic design: