EE265 Lab 1: TI TMS320C55x DSP Tutorial

Winter 2007-2008
Instructor: Teresa Meng

I. Background and Motivation:

You will hear the term DSP often throughout this course. What is DSP? Why is it important? Why would you process signals digitally instead of leaving them in the analog domain? When should you use DSP? What are the advantages, disadvantages, and tradeoffs? What are DSP processors, and how do they differ from microprocessors? This section briefly discusses these questions; you should understand the issues in greater detail by the end of the course.

 

DSP vs. Analog:

DSP stands for digital signal processing and it often involves taking an analog signal, converting it to samples of digital values, processing the digital data, and converting it back to analog for output. We often consider DSP along with the analog front and back-ends since that is what we ultimately hear and see. Why do we process signals digitally rather than working with the original signal in the analog domain? The answer is it depends on the system and its requirements. For some systems, working with the analog signals gives a better solution. For others, DSP is better. It is up to you, the designer, to make the decision based on materials you learn through this and other courses. Here are some things to consider in making the tradeoffs:

  • Considerations for Analog systems
    • Operating conditions - temperature, frequency, supply voltage
    • Radiation/interference
    • Out of band interference
    • Aliasing
  • Considerations for Digital systems
    • ADC (analog to digital converter) precision and filter noise
    • DAC (digital to analog converter) filter order
    • More complex algorithms may be used. Some may be unattainable with analog systems (examples: compression, data analysis, and data synthesis).
    • Flexibility in choice of algorithms.
    • Reprogrammable.
    • Fixed point - loss of precision due to limited datapath
    • Floating point uses more power
    • Reliability not dependent on operating conditions
    • Reusability of hardware
    • Throughput and latency - some systems require that DSP be performed in real time. This is a throughput constraint: because the system has finite storage, it must sustain equivalent input and output data rates. This does not mean that an input signal is processed immediately to generate an output signal. On the contrary, most DSP systems generate the output corresponding to an input after some delay. The maximum latency tolerated by a system is part of the specification.
    • Storage of intermediate results - on-chip memory needed (may be a limiting factor)
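To make the fixed-point item above concrete, here is a minimal sketch of the precision lost when a value is stored in a 16-bit fixed-point word. Python is used only for illustration (it is not one of the lab tools), the Q15 format is an assumption chosen because it is a common 16-bit convention, and the function names are hypothetical.

```python
Q15_SCALE = 1 << 15          # 32768 steps per unit in Q15 (assumed format)
Q15_MAX = Q15_SCALE - 1      # largest representable code, 0x7FFF
Q15_MIN = -Q15_SCALE         # smallest representable code, -0x8000


def to_q15(x):
    """Round a real value to the nearest Q15 code, saturating at the limits."""
    code = int(round(x * Q15_SCALE))
    return max(Q15_MIN, min(Q15_MAX, code))


def from_q15(code):
    """Convert a Q15 code back to a real value."""
    return code / Q15_SCALE


def quantization_error(x):
    """Absolute error introduced by storing x as a Q15 value."""
    return abs(x - from_q15(to_q15(x)))
```

For values inside the representable range the error is at most half a step (0.5/32768); values at or above 1.0 saturate to the largest code.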

 

DSP Processors vs. General-purpose Microprocessors:

How do DSP processors differ from microprocessors? Both types of processors perform computations on digital signals. The main difference is that DSP processors are tailored to process data signals, whereas microprocessors are designed to reduce the amount of computation in a general computing environment, where most of the signals being processed pertain to some form of program control. DSP processors are also designed to consume less power in their target applications. Products that use DSPs include DVD players, camcorders, cell phones, wireless base stations, modems, karaoke players, etc.

The separation between DSP processors and microprocessors is narrowing, since microprocessors are approaching, if not exceeding, the throughput capabilities of DSP processors. In the personal computer industry, microprocessors are taking over several computational tasks that used to be cost effective only when implemented with DSPs. This results in a narrowing market segment for DSP processors in the PC industry. However, DSP processors remain highly competitive in non-PC areas where high-throughput, low-power computing is required.

 

This course integrates a number of different topics in digital signal processing (DSP): DSP processor programming, DSP algorithm implementation, and performance considerations. Understanding the programming methodology of DSP processors is relatively simple, but writing optimal code requires an understanding of the DSP algorithms as well as the capabilities of the DSP processor. Although this course covers the programming methodology for TI's TMS320C55x DSP processor (one of the most popular DSP processors currently in use), the fundamentals of DSP programming extend to other DSP processors as well. Furthermore, the design principles you will learn in coding a DSP processor will be applicable to DSP designs in application-specific hardware.

II. Purpose:

The purpose of this lab is to introduce the design flow and basic programming methodology for working with DSP processors (in particular, TI's TMS320C55x DSP processor). This lab consists of an introduction, a couple of DSP demos, and a brief programming exercise. The introduction is not intended to cover everything you need to know about DSP programming, but to give you a working knowledge of TI's C55x DSP and its tool environment, along with enough material to understand TI's C55x reference materials.

III. Introduction to the C55x DSP Processor

This section provides an overview of the C55x processor. The architecture and the instruction set of the C55x will be discussed first, followed by an introduction to the programming tools used in the EE265 lab. As an overview, the following discussion does not provide detailed explanations of the topics discussed. For detailed descriptions of both the architecture and the instruction set you will need to refer to the C55x manuals. For now, you should read the following section before reading the manuals. Section IV of this lab lists assigned reading, which should help fill in many of the details.

 

As a first introduction, the information you are exposed to in this lab can be rather overwhelming. However, since much of this material will not be discussed again in later labs, you are highly encouraged to read the material now and then reread it later in the course, after you become more comfortable with the C55x processor. It may help to make note of which parts are confusing so that you can clear up the confusion later, when understanding is required by the lab exercises.

A. Architecture

The most pressing issue in the design of modern high speed digital logic is the high power consumption of large memories. Thus, the architecture of the DSP is designed around efficient memory access and utilization. Unlike other DSP processors, the C55x processor architecture is based on a unified memory system, although the program memory and the data memory are logically separated. That is, the two spaces are addressed separately (the program memory space is byte addressable with 24-bit addresses, and the data memory space is word addressable with 23-bit addresses), but both map onto the same physical memory. This memory may include both on-chip and off-chip memories. Take a minute to examine the C55x architecture in Figure 1. Notice that there are four read buses, three for data and one for program code. Specialized instructions allow data access using all three data read buses in one clock cycle. Part of the on-chip data memory is dual-ported and is referred to as DARAM (Dual-Access RAM) in the manuals. This means that this part of the data memory can be accessed twice in the same clock cycle, which is useful for implementing DSP algorithms where multi-operand instructions are predominant.

Take note of the four basic blocks of the C55x CPU: the Instruction Buffer Unit (I Unit), the Program Flow Unit (P Unit), the Address-Data Flow Unit (A Unit), and the Data Computation Unit (D Unit). You should read more about the first three units in the TMS320C55x Technical Overview and the TMS320C55x DSP CPU Reference Guide; the D Unit is discussed below.

 

Datapath

Notice that there are six pairs of buses interconnecting the core and the memory units. Five of these buses are for the data memory and the sixth one is for the program memory. Though there are five data buses, three read and two write, this does not mean that the processor uses all buses every clock cycle. Rather, there are a limited number of instructions that can make use of the full memory bandwidth. The CB and DB read buses and the EB and FB write buses can be used jointly to access single 32-bit values or individually to access two 16-bit values. The BB read bus is used primarily by the special dual-MAC instructions, and can access a single 16-bit value stored in internal memory (only in internal memory!).

Data Computation Unit

Take a minute to study the architecture of the C55x Data Computation Unit (D Unit) in Figure 2. This figure can tell you a lot about the capabilities and limitations of the C55x processor. First, notice the computational building blocks of the C55x processor. These include the shifter, a 40-bit arithmetic logic unit (ALU), two 17x17-bit multiply-accumulate units (MACs), and four accumulator registers. Next, note the interconnections between these blocks. They tell us how data can be transferred between the computing units and how the processor core can be configured in each cycle. Of special note is the fact that while there are two MAC units, there are only three data read buses. Thus, while the TI C55x can perform two multiply-accumulate operations in a single clock cycle, at least one of the input operands must be shared between the multipliers. Furthermore, this shared operand must be stored in internal memory, as it is transferred on the BB data bus.
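One way to see why a shared operand is not a serious restriction is to look at FIR filtering, where two adjacent outputs use the same coefficient at every step. The sketch below is plain Python for illustration only, not C55x code; the function name and argument layout are hypothetical. Each loop iteration reads exactly three operands: one shared coefficient and two input samples.

```python
def dual_mac_fir(x, h, n):
    """Return (y[n], y[n+1]) of an FIR filter using two MACs per step.

    Each step reads three operands: the shared coefficient h[k] and the
    two samples x[n-k] and x[n+1-k]. Assumes n >= len(h) - 1 and
    n + 1 < len(x) so that every index is valid.
    """
    acc0 = 0  # accumulator for y[n]
    acc1 = 0  # accumulator for y[n+1]
    for k, coeff in enumerate(h):
        acc0 += coeff * x[n - k]        # first MAC
        acc1 += coeff * x[n + 1 - k]    # second MAC, same coefficient
    return acc0, acc1
```

With x = [1, 2, 3, 4, 5] and h = [1, 10, 100], calling dual_mac_fir(x, h, 2) produces the two outputs 123 and 234 in three steps instead of six.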

Functional Units

Each of the two MACs uses a 17x17-bit multiplier, so that signed values can be multiplied without additional bit level operations. The output of the multiplier is fed into a 40-bit adder to generate a 40-bit result, which can then be optionally saturated (to 32 bits or 40 bits). The combination of multiplier and adder enables a single cycle multiply-accumulate (MAC) operation and is very useful for filtering operations. The ALU is independent of the MAC units. It is capable of performing logic operations as well as additions. Note that one of the inputs to the ALU is taken from the barrel shifter. This means that a data word can be shifted prior to an addition or logic operation. Additionally, notice that the ALU is 40 bits wide. It can perform a single arithmetic operation on a 40-bit value (e.g., from an accumulator) or two arithmetic operations on 16-bit values. As you explore the TI C55x instruction set, you'll note that the size of the operands is typically specified for each instruction.
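The following sketch models one multiply-accumulate step with optional saturation, and checks one plausible reading of why the multiplier is 17 bits wide: a 17-bit signed operand can hold either a signed or an unsigned 16-bit value. This is illustrative Python with hypothetical function names, not a description of the actual hardware.

```python
def saturate(value, bits):
    """Clamp a signed integer to the range of a two's-complement word."""
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    return max(lo, min(hi, value))


def mac(acc, a, b, sat_bits=40):
    """One multiply-accumulate step: acc + a*b, saturated to sat_bits bits."""
    return saturate(acc + a * b, sat_bits)


def fits_signed(value, bits):
    """True if value is representable as a signed word of the given width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1
```

The extra 8 bits of a 40-bit accumulator over a 32-bit product act as guard bits: many products can be summed before the total has to saturate, whereas with 32-bit saturation the sum clamps as soon as it leaves the 32-bit range.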

You will find instructions that utilize multiple accumulators, the ALU, the barrel shifter, and one or both MACs. Such instructions are very useful for certain applications like Viterbi decoding. The ability to perform these types of instructions makes TI's C55x a very powerful DSP, since they utilize most of the processor's core resources. However, it is up to the DSP engineer to choose algorithms that are tailored to the DSP architecture; almost any algorithm can be implemented with simple add and multiply instructions, but not efficiently. Optimal DSP code is code that results in the lowest power consumption. This is equivalent to code that executes in the minimum number of cycles and uses instructions with high processor utilization.

Control Circuitry

In the C55x, the program control and addressing circuitry is split into 3 units: the Instruction Buffer Unit (I Unit), the Program Flow Unit (P Unit), and the Address-Data Flow Unit (A Unit). C55x instructions can be up to 6 bytes long. However, examining the architectural diagram in Figure 1, you'll notice that the program data bus is only 32 bits wide! The instruction buffer queue stores up to 64 bytes of program code. Because many instructions are shorter than 4 bytes, the queue usually runs ahead of execution, so instructions longer than 4 bytes can often be executed without the extra cycle of latency that would be required in the absence of the instruction buffer. However, if the instruction buffer has been emptied by a sequence of long instructions or by a branch or call, and the next instruction to be executed is longer than 4 bytes, a one cycle delay will ensue. Furthermore, code loops that fit entirely within the instruction buffer queue can be executed efficiently, not only by eliminating potential delays in fetching long instructions but also by forgoing the energy required for code memory accesses.

Many algorithms access data by way of address pointers (much like C/C++-style pointers). The A Unit contains a 16 bit ALU which gives the C55x the ability to dynamically update address pointers without taking any additional cycles to perform pointer arithmetic such as adding constants to a pointer or incrementing modulo some value.

Memory Organization

The C55x has 3 memory spaces: program (page 0), data (page 1), and I/O (page 2). Make note of the page numbers associated with each memory space because you will need to know them for coding. As instructions are specified in 8-bit (1 byte) chunks, the program memory space is addressed at the byte level, with 24-bit addressing. As data is processed in 16-bit words, the data memory space is addressed at the word level, and hence data memory addresses are 23 bits wide. The actual amount of internal memory available on a C55x processor depends on the particular model; the C5509A, the processor used in this class, has 8 blocks of 8KB of DARAM (64KB in total) and 24 blocks of 8KB of SARAM (192KB in total). Note that for full data bandwidth, some instructions require different operands to be stored in different blocks of memory.
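The address widths above can be checked with a few lines of arithmetic. The sketch below (illustrative Python, hypothetical function names) shows that a 24-bit byte address and a 23-bit word address span the same number of bytes, which is consistent with both spaces sharing one physical memory. The word-to-byte mapping shown, with word address W covering byte addresses 2W and 2W+1, is an assumption made for the example; check the CPU Reference Guide for the exact convention.

```python
PROGRAM_ADDRESS_BITS = 24   # byte addresses (from the text)
DATA_ADDRESS_BITS = 23      # 16-bit word addresses (from the text)
BYTES_PER_WORD = 2


def program_space_bytes():
    """Size of the program space in bytes."""
    return 1 << PROGRAM_ADDRESS_BITS


def data_space_bytes():
    """Size of the data space in bytes (word count times 2 bytes)."""
    return (1 << DATA_ADDRESS_BITS) * BYTES_PER_WORD


def word_to_byte_address(word_address):
    """Assumed mapping: word W starts at byte address 2W."""
    return word_address * BYTES_PER_WORD


def c5509a_internal_ram_bytes():
    """DARAM and SARAM sizes from the block counts given in the text."""
    daram = 8 * 8 * 1024
    saram = 24 * 8 * 1024
    return daram, saram
```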

A more detailed description of the C55x memory space is available in Chapter 3 - Memory of the CPU Reference Guide and in the TMS320C5509A Data Manual.

The program memory is pointed to by the Program Counter (PC) which references the next instruction to be executed. There are also other auxiliary and status registers that are associated with the program memory space. They are primarily used for program flow control such as branching or conditional execution.

The data memory space is associated with eight 16-bit auxiliary registers, AR0, AR1, ..., AR7. These registers are used primarily as pointers, just like the pointers found in high-level languages such as C/C++. By functioning as pointers (a form of addressing called “indirect addressing”), the address registers enable faster instructions to be implemented (more on this later). The A Unit data address generator, DAGEN, is dedicated to operating on the address registers. Automatic increments, decrements, and modulo updates, as well as indexed increments, bit-reversal, etc., are supported by DAGEN. These all make data manipulation even more flexible on the C55x. Recall that the data space uses 23-bit addressing. When the address registers are used for indirect addressing, the high 7 bits of the address are taken from the high 7 bits of the extended auxiliary registers XAR0, XAR1, ..., XAR7, which are set using a 23-bit constant (k23).
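The pointer updates described above can be sketched in a few lines. The functions below are illustrative Python with hypothetical names. They model a 23-bit address formed from a 7-bit high part and a 16-bit pointer, a circular (modulo) pointer update expressed as an index relative to the start of the buffer, and bit-reversed indexing of the kind used by FFT algorithms. The real hardware configures circular buffers through dedicated registers, which this sketch does not model.

```python
def full_address(high7, low16):
    """Combine a 7-bit high part and a 16-bit pointer into a 23-bit address."""
    return ((high7 & 0x7F) << 16) | (low16 & 0xFFFF)


def post_increment_circular(index, step, size):
    """Advance an index within a circular buffer of `size` words."""
    return (index + step) % size


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result
```

For an 8-point FFT, bit_reverse(i, 3) for i = 0..7 gives the familiar order 0, 4, 2, 6, 1, 5, 3, 7.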

Tables 6-4 and 6-5 in the TMS320C55x DSP CPU Reference Guide list the possible variations of indirect addressing using address registers. Notice that there are restrictions on the set of address registers that can be used with two-operand instructions (such as instructions that multiply two numbers that are both stored in memory) and with parallel instructions; the “Dual AR Indirect Addressing Modes” are shown in Table 6-7. Instructions that use three operands from memory require special addressing, using the “coefficient data pointer” (CDP). Before using instructions that use the CDP (denoted by the Cmem notation in the reference guide), you should read and understand the details given in section 6.4.3.

For those who are curious, here is a brief explanation of why address registers enable faster instructions. The C55x processor does not have a fixed instruction length; instructions can be up to 6 bytes long. However, there are two reasons why longer instructions are undesirable. First, memory access is a major component of energy consumption; the more times memory is accessed in a battery powered device, the shorter the battery lifetime will be. Second, since the processor can only fetch two words (32 bits) of program memory each clock cycle, an instruction that is more than two words long is more likely not to execute in a single clock cycle. This means that instructions can be both more energy efficient and faster if they can be packed into fewer bits (and therefore fewer words). Some of the bits that make up an instruction tell the instruction decoder what type of operation is to be performed, while the remaining bits typically tell the processor what the operands are. For example, in a load instruction, the operand to be loaded can be specified by encoding its address directly in the instruction. On the C55x this takes at least 16 address bits, so encoding the address directly in the instruction (so-called “absolute addressing”) results in an instruction that is at least 2 words long. Alternatively, if the operands are addressed by way of pointers (“indirect addressing”), then the instruction only needs to encode enough bits to indicate which pointer to use. Since there are only 8 address registers available for use as pointers, this takes at most 3 bits. Even more detail: in the case of dual-operand addressing, the instruction needs to specify two operands by way of pointers, so more bits are needed, and thus there are more restrictions on the potential pointer manipulations.
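The bit counts in this argument are easy to verify. The sketch below (illustrative Python, hypothetical function names) counts only the bits needed to name the operand; it deliberately ignores the extra bits a real encoding spends on the opcode and on the pointer modification, such as post-increment.

```python
def bits_to_select(choices):
    """Minimum number of bits needed to pick one item out of `choices`."""
    return max(1, (choices - 1).bit_length())


def operand_field_bits(mode, address_bits=16, registers=8):
    """Bits needed to name one operand under a given addressing mode."""
    if mode == "absolute":
        return address_bits               # the address itself is encoded
    if mode == "indirect":
        return bits_to_select(registers)  # only the pointer is named
    raise ValueError("unknown addressing mode: " + mode)
```

Under these assumptions an absolute operand costs 16 bits, an indirect operand costs 3 bits, and two indirect operands together still cost far less than one absolute address.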

Recommended: For more information on addressing, read all of Chapter 6 - Addressing Modes of the TMS320C55x DSP CPU Reference Guide.

Processor Configuration (Recommended)

Inside the C55x processor core are a number of registers pertaining to the control and configuration of the DSP processor as well as communication with peripheral devices. These registers can be used to monitor the status of the processor and to configure it. To simplify programming, these registers are mapped to the data memory, which is why they are called memory mapped registers (MMRs). This means that instructions that work with the data memory can access and operate on the information contained in the MMRs. The instruction set also has several instructions that can only be used to operate on MMRs. C55x MMRs are listed in Table 2-1 of the CPU Reference Guide. Of particular note are the 4 status registers, ST0_55 through ST3_55. These registers control and report many basic operations. Addressing, conditional flags, overflow mode, sign extension, saturation, rounding, circular addressing, fraction mode, global interrupt enable, shifting, and much more are all accessible through these registers.

 

B. Instruction Set

Overview:

The C55x instruction set can be summarized as: too many choices. There are many different types of instructions available, and for any given instruction there may be many different variations to choose from. For example, a look at the instruction set documentation will reveal that there are more than 20 variants of the addition operation (including multiple versions of ADD, ADDV, ADD::MOV, ADDSUB, ADDSUBCC, and ADDSUB2CC). The number of choices may appear daunting at first, but the availability of the many variations means that there is usually an instruction available that will do exactly what you need. For example, if you need to add two numbers, then there is probably a specific instruction available to do this for you, regardless of where the numbers are stored. Some examples: 1) adding two numbers that are both stored in memory, 2) adding a single number stored in memory to a number in an accumulator, 3) adding a constant to an accumulator, or 4) adding a value to an address pointer. The significance of all of this is that if you know you need to perform a certain type of operation, you need only find the proper version (the proper syntax) to use for a particular situation. Even more importantly, the availability of many different instructions results in more efficient code, since many operations can be done in a single cycle instead of several. For example, if you were not allowed to add two numbers that were both stored in memory, then you would first need to load one of the numbers into an accumulator and then perform the add. This would take a minimum of two cycles for the load and add, whereas the dual-operand version that adds two numbers directly from memory could be done in a single cycle.

 

DSP-specific and Application-Specific Instructions:

Much of the C55x instruction set consists of common instructions such as Load/Store, ADD, Multiply, etc., but there are also many instructions that are available specifically for DSP operations. These DSP-specific instructions are the reason why DSP processors can be more efficient than general-purpose processors. They allow certain DSP operations to be performed using fewer instructions (and fewer clock cycles) than would be required with general-purpose instructions. There are also instructions that are application-specific in that they are available primarily to speed up certain specific DSP algorithms (such as FIR filtering with symmetric coefficients or Viterbi decoding). Although these instructions may have been targeted at specific algorithms, they can sometimes be exploited for other uses as well.

 

Optimization, Options and Pitfalls:

The abundance of available instructions means that there is a lot of room for optimization depending on which instruction is chosen for a particular task. (This also means that it is very difficult to design a C-to-assembly compiler for DSPs.) Being familiar with the available instructions can make programming easier and more efficient. It is essential for the DSP programmer to have a good understanding of all the options and pitfalls.

 

Instruction Types:

The C55x instruction set can be broken down into the following categories:

  • Data transfer
    • These instructions transfer data within and between the program and data memory spaces. These are primarily the various Load, Store and Move instructions.
    • DSP programs mostly work on data stored in the data memory. Data memory is composed entirely of RAM or processor-specific ROM. Many programs operate on data that is generated externally, such as an audio stream that is sampled with an external Analog-to-Digital (A/D) converter. The C55x itself does not have an A/D converter, but (as you'll see later) data can be obtained from an external A/D converter by way of a serial port on the C55x.
    • The DSP program will be physically stored in some type of ROM addressable in the program memory space. Program memory contains not only the instructions for a program but can also include constant data such as filter coefficients for use in your DSP algorithm. Most instructions operate on data in the data space, so data (such as filter coefficients) must sometimes be moved from program memory to data memory.
  • Computational
    • General
      • Arithmetic or Logical Instructions: Instructions primarily handled by the ALU such as AND, OR, XOR, ADD, SUB, etc.
      • Shift: Most C55x ALU instructions have shifting embedded since the Barrel Shifter is in series with the ALU. Arithmetic shift, logical shift, and rotate instructions are also available.
      • Long Word Instructions: As discussed above, each C55x memory data bus is 16 bits wide and the processor core datapaths are 40 bits wide, so an ordinary load or store moves a single 16-bit word. The C55x has provisions (i.e. instructions and processor modes) for working with 32-bit and 40-bit data, which are available for special circumstances. (Most DSP algorithms do not require more than 16-bit precision.)
      • Exponent encoder: The C55x is a fixed point DSP. However, it is also capable of performing limited floating point operations using the exponent encoder.
    • DSP oriented
      • MAC: Multiply-accumulate instructions allow the result of a multiplication to be added to a previously computed sum. This can often be done in a single cycle, which makes for very efficient DSP operations. MAC is one of the most used instructions, since many DSP algorithms (such as FIR filtering) involve adding the products of multiplications. There are several variations of MAC instructions, but some of these may not be available in all cases. In planning to use the MAC, you need to consider where the operands are stored and in which order you need to do the multiply and accumulate. Multiply-subtract (MAS) instructions are also available.
      • Application-Specific Instructions: The C55x has several application-specific instructions. One such example is MAXDIFF - Compare and Select Accumulator Content Maximum, specifically designed to facilitate Viterbi decoding. (Application notes are available on-line that discuss how Viterbi decoding is done on the C55x.)
  • Program flow control
    • Condition - These instructions enable you to test a variable (stored in the accumulator, address registers, the memory-mapped registers, etc.) against a certain condition. If the condition is met, the appropriate status flags are set. These status flags are used by instructions such as branch or conditional execution (XCC) to determine what instruction to execute next.
    • Branch - There are a number of different branching instructions available. These include basic branching (B), conditional branching (BCC), and subroutine calling (CALL).
    • Repeat - Repeat instructions are used for looping over a single instruction or a block of instructions. Looping can also be done using conditional branching, but repeat instructions can be used when the number of iterations is known a priori. Looping with repeat instructions is equivalent to a “for” loop, whereas branching is equivalent to a “while” loop. The looping is controlled by way of local registers that store information such as the loop iteration count, the address of the first instruction in the loop, and the address of the last instruction to be executed in the loop. The program decoder processes the information stored in these registers to decide which instruction to execute next (whether it needs to exit the loop or branch back to the beginning, etc.). The repeat loop has much less cycle overhead than branching because the processor knows exactly how many times the loop needs to execute. Also, when the loop is done, program execution resumes at the instruction following the loop. Therefore the pipeline does not need to be flushed (unlike with branching), and no cycles are wasted other than the cycles needed to set up the repeat operation. This type of instruction is characteristic of DSP processors, since the number of iterations needed by many DSP algorithms is fixed.
  • Parallel Instructions
    • Many instructions in the C55x instruction set can be executed in parallel, so that two instructions are executed in the same cycle. This is possible when the two instructions use completely different resources (functional units and buses), so they can be performed at the same time without causing resource conflicts. These are a perfect example of special instructions that are included to optimize performance as much as possible. Parallelism in the C55x can be built into a single instruction (for example, MAC :: MAC is a specific instruction) or user defined. Read Ch. 2 of the Mnemonic Instruction Set Reference Guide for more information.
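As an illustration of the exponent encoder item in the list above, the sketch below shows the kind of computation involved in limited floating point on a fixed-point machine: counting how far a signed value can be shifted left without overflow, then normalizing it into a mantissa and a shift count. This is plain Python with hypothetical function names and a hypothetical 16-bit default width; it is not a model of the actual C55x instructions.

```python
def redundant_sign_bits(value, width=16):
    """Number of left shifts a signed `width`-bit value tolerates without overflow."""
    if value < 0:
        value = ~value  # for negatives, count leading ones instead of zeros
    return width - 1 - value.bit_length()


def normalize(value, width=16):
    """Return (mantissa, shift) with value == mantissa / 2**shift.

    The mantissa uses the full `width`-bit signed range, so as few
    significant bits as possible are lost in later fixed-point steps.
    """
    shift = redundant_sign_bits(value, width)
    return value << shift, shift
```

For example, normalize(3) returns the mantissa 24576 and the shift 13, and dividing 24576 by 2**13 gives back 3.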

 

Pipelining

The pipeline of the C55x is discussed in Section 4.4 of the TMS320C55x DSP Programmer's Guide. The C55x makes programming quite easy by protecting against almost all potential pipeline conflicts. Thus, under normal circumstances, the pipeline will not introduce any problems. However, if your code takes longer than expected to run and you are not aware of how the C55x handles pipelining, or even of what a pipeline is, debugging will be quite difficult. The rule of thumb for debugging assembly code on a pipelined processor is this: when you run into an instruction whose behavior does not make any sense, add several NOP (no-operation) instructions before the instruction, or check the conditional flags. (If you don't understand pipelining, this sentence will probably not make any sense. Keep it in mind and you will understand later.)

Interrupts

A majority of the labs in this course involve the use of interrupts. We will be discussing interrupts in greater detail in Lab 2. Essentially, interrupts are used as signals to the processor to do things other than what the processor is currently doing. The sequence of events for processing an interrupt is as follows.

 

  1. First, an external (or internal) device generates an interrupt to the C55x processor. The C55x processor decides whether to accept the interrupt. If the interrupt is accepted, the current program flow is interrupted.
  2. If the interrupt is accepted, all interrupts are henceforth disabled so that the current interrupt that is being responded to cannot be interrupted.
  3. The current PC (program counter) value is pushed onto the stack. (This is called the “context-save” and it is analogous to setting a bookmark which allows the program execution to return to what it was doing before the interrupt occurred).
  4. The PC is then loaded with the address of the interrupt vector (which is a small set of instructions that are executed each time an interrupt occurs).
  5. The interrupt vector corresponding to the incoming interrupt is then executed. The general term for code that is executed when an interrupt occurs is “interrupt service routine” or ISR. Often, the ISR won't fit in the space allotted for the interrupt vector, because all interrupt vectors for the C55x are exactly 4 words long, so the interrupt vector contains a branch instruction to the larger interrupt service routine. If the vector includes a branch to an ISR, the ISR is executed following the interrupt vector. If the ISR is no more than 4 words long, then it may be coded into the interrupt vector without using a branch.
  6. At the end of the ISR, a return from interrupt (RETI) is executed. This restores the saved PC from the stack and re-enables interrupts, so that normal program flow resumes.
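The sequence above can be sketched as a toy model. The class below is illustrative Python with hypothetical names; it tracks only the program counter, the stack, and a single interrupt-enable flag, and it is not a C55x simulator.

```python
class TinyCpu:
    """Toy model of the interrupt entry and return sequence."""

    def __init__(self, vector_table):
        self.pc = 0
        self.stack = []
        self.interrupts_enabled = True
        self.vector_table = vector_table  # interrupt number -> vector address

    def interrupt(self, number):
        """Steps 1-4: accept, disable interrupts, save the PC, load the vector."""
        if not self.interrupts_enabled:
            return False                    # request is not accepted
        self.interrupts_enabled = False     # step 2
        self.stack.append(self.pc)          # step 3: the "bookmark"
        self.pc = self.vector_table[number] # step 4
        return True

    def reti(self):
        """Step 6: restore the saved PC and re-enable interrupts."""
        self.pc = self.stack.pop()
        self.interrupts_enabled = True
```

A second request arriving while the first is being serviced is rejected by this model; on real hardware it would typically stay pending until interrupts are enabled again.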

 

A number of things need to be configured for interrupts to work properly. As implied in the previous paragraph, you need to set up the interrupt vector and the stack, and you need to code the ISR itself. You will also need to set up the interrupt mask register (IMR) and the interrupt flag register (IFR). Also, before the main program starts, you will need to enable interrupts globally through the global interrupt mask (INTM). You will learn how to do this by going through the exercises. Sometimes this setup can be done automatically for you by the programming tools, as will be seen in later labs.

The IMR is used to selectively enable and disable interrupts. In other words, the IMR configures the C55x processor to listen to certain interrupts during normal processing. The IFR indicates which interrupts have active requests so that you can find out if an interrupt occurs while another interrupt is being serviced. The IFR can also be used in another method of handling external signaling called polling. Polling is different from interrupts in that a new interrupt sets the appropriate IFR bit, but it does not stop the current program flow. Instead, the program checks the IFR to see if an interrupt request is active. If so, an equivalent “ISR” is executed. This results in more predictable program flow, since the program is never actually interrupted. Finally, the global interrupt mask (INTM) is a convenient way to enable (or disable) all interrupts that are selected with IMR.
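The roles of the three controls can be sketched with simple bit operations. In the illustrative Python below, the function names are hypothetical, a set IMR bit is assumed to mean "enabled", the global enable is modeled as a plain boolean rather than as the INTM bit itself, and the way a flag is cleared after polling is also an assumption; consult the CPU Reference Guide for the actual register behavior.

```python
def pending_enabled(ifr, imr):
    """Interrupts that are both requested (IFR) and individually enabled (IMR)."""
    return ifr & imr


def should_interrupt(ifr, imr, globally_enabled):
    """True if the program flow would be interrupted."""
    return globally_enabled and pending_enabled(ifr, imr) != 0


def poll(ifr, bit, handler):
    """Polling: test one flag bit, run the handler if it is set.

    Returns the flag register value with that request cleared.
    """
    mask = 1 << bit
    if ifr & mask:
        handler()      # the equivalent "ISR", run from normal program flow
        ifr &= ~mask
    return ifr
```

With polling, the program decides when to call poll, which is why the program flow is more predictable than with true interrupts.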

Different C55x models have different numbers of interrupts. The maximum number of interrupts supported on the C55x is 32, most of which are inactive. There are two types of interrupts: software and hardware. Software interrupts may be used to indicate that an event occurred in the program. Hardware interrupts are generated by physical devices. There are different types of hardware interrupts. Three hardware interrupts are user specified: the user may connect a signaling wire to the DSP hardware to control the DSP through these interrupts. Other hardware interrupts are used for device-to-device communication, such as serial ports and buffered serial ports. We will focus on the hardware interrupts in this course.

 

C. Working with Assembly Language

Assembly is a low-level programming language that is tied to a particular type of hardware. For example, DSP assembly code cannot run unmodified on a Sun workstation. Assembly coding does not have the conveniences of a high-level programming language, which provides highly abstracted constructs like objects or even certain simple functions. Like any programming language, it needs tools for development, such as assemblers, simulators, and debuggers. Often, the actual DSP hardware is required to verify the real-time capabilities of the assembly code.

In this course, you are provided with the following:

  • Hardware
    • PC Computer
      • Used for writing, simulating and debugging DSP programs.
      • Generates audio input to the DSP for processing.
      • Used for running Matlab to verify your algorithm and to validate the DSP processed signals.
    • TI C5509 Evaluation Board (EVM)
      • Verifies that the DSP program runs under real-time constraints.
      • Generates audio frequency output for listening to the signal that is actually produced by your DSP algorithms.
  • Software
    • Code Composer Studio 3.1 (CCS) - This is TI's DSP development environment that includes:
      • Assembler - translates the hand-written assembly code into object code. Object code is machine code that has not yet been assigned a location in program memory. Relative addresses are resolved, but absolute addresses in instructions such as branches are still symbolic.
      • Linker - links all necessary object files, assigns each portion of the code to a proper memory location, and generates a DSP hardware compatible binary (the so-called executable). All symbolic addresses are resolved to physical addresses.
      • Debugger - loads and simulates the executable. Without the EVM board, the term “debugger” usually refers to a software simulator. However, when using an EVM board, the debugger essentially serves as an interface to the EVM board. It allows you to upload your code onto the EVM board and execute the code on the EVM. In addition, it provides features for observing the state of the board, such as accumulator values, status registers, the program counter, etc. (which can be observed after stopping the EVM from running).
  • Documentation
    • Manuals - a majority of TI's user's guides and reference manuals are available in electronic format (linked to the course webpage) and through the CCS Help menu. They have been downloaded from TI's main website and stored locally for faster access. (A hard copy of some of the manuals will be available in the lab. Please do not take them out of the lab.)
    • The course webpage has links to many related documents, some of which you will need to refer to (and some which you will never need to look at). There is a wealth of information available here, but some of it will not be useful in the labs. We will try to direct you toward the essential documents, but it is a good idea to take a look at the other documents that are available.
    • You may choose to print the documents, but be mindful of the print quota where you are printing.

 

The code for a typical assembly-based project usually consists of the following:

  • Assembly codes
    • Main assembly code
    • Hardware configuration files
      • Interrupt vector setup
      • Serial port setup
      • AIC (analog interface chip) setup
    • Macros

 

  • Command files - command files are like configuration scripts that configure how a tool works. There are various command files that configure both the linker and the debugger. We will only describe the linker command file, since this is the only command file that you will need to modify for each project.
    • Linker command file - configures the linker. It contains three parts, all of which are placed in a single ASCII text file:
      • The first part is the file I/O section. This specifies which files the linker uses for input and output of the linking process. Command line options can also be placed in this section.
      • The second part is the memory definition. This tells the linker about the memory sections that are available on the processor, since the linker needs to decide where to place both code and data. The sizes and locations of the processor's memory sections are specified and assigned names here. The names are arbitrary, but are usually chosen to indicate the type of memory section that is referred to. An example of this is the name IDATA, which can indicate that it refers to internal data memory.
      • The third part is the mapping section. This tells the linker where to place the different parts of the entire DSP code. Remember that a complete DSP code consists of not only the main program instructions but also the interrupt vector definition, the stack definition, space for intermediate variables, etc. In general, these can be stored in either the program memory or the data memory, and they all need to be mapped into some memory section (as defined in the previous part of the linker command file).
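
To make the memory definition and mapping parts more concrete, here is a toy model of what the linker does with them, written as a host-side Python sketch. It is not linker command file syntax, and it is not how the TI linker is implemented: the region names other than IDATA, all origins, lengths, and section sizes, and the simple "pack sections one after another" placement rule are assumptions chosen for illustration.

```python
# Toy model of the linker's memory-definition and mapping steps.
# Region names, origins, lengths, and section sizes are made up for illustration.

# Memory definition: named regions with a start address and a length.
MEMORY = {
    "VECT":  {"origin": 0x0100, "length": 0x0100},
    "IPROG": {"origin": 0x0200, "length": 0x1000},
    "IDATA": {"origin": 0x2000, "length": 0x0800},
}

# Mapping: which named region each part of the program is placed in.
SECTIONS = {
    "vectors": "VECT",    # interrupt vector definition
    ".text":   "IPROG",   # main program instructions
    ".data":   "IDATA",   # intermediate variables
    ".stack":  "IDATA",   # stack
}

def place_sections(memory, mapping, sizes):
    """Assign each section a start address inside its region, in mapping order."""
    next_free = {name: region["origin"] for name, region in memory.items()}
    placed = {}
    for section, region_name in mapping.items():
        region = memory[region_name]
        start = next_free[region_name]
        end = start + sizes[section]
        if end > region["origin"] + region["length"]:
            raise MemoryError(f"{section} does not fit in {region_name}")
        placed[section] = start          # symbolic name now has a fixed address
        next_free[region_name] = end     # the next section starts after this one
    return placed
```

For example, with hypothetical sizes of 0x100 for `.data` and 0x200 for `.stack`, this model places `.data` at the IDATA origin and `.stack` immediately after it; a section that is too large for its region raises an error instead of silently overlapping another region.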

 

  • General Extension Language (GEL) file - As far as this course is concerned, the GEL file is used to initialize target memory locations, such as the DSP configuration registers, to known values. This can be very important for restoring the DSP to a working state. For example, if the configuration registers were accidentally overwritten and the processor was therefore not working properly, the GEL file could be used to restore these registers. Code Composer Studio provides a GEL file named dsk5509a.gel that has appropriate settings for the C5509 processor. If you are curious to know more, the General Extension Language is an interpreted language similar to C that lets you create functions to extend Code Composer Studio's usefulness. It is particularly useful for automated testing and user workspace customization. However, you will not need to modify any GEL file in this course.

 

  • Compilation scripts/Project files

IV. Readings:

Chapter 1: CPU Architecture

Chapter 6: Addressing Modes

Chapter 2: Tutorial (optional)

Chapter 4: Optimizing Assembly Code

 

V. Lab Environment:

This section takes you through the lab computers and discusses where to find things and what to look for.

We highly recommend that you read through the lab and do much of the lab work outside of the lab, as both the computer resources and the TA resources are limited. You may find that some labs are rather lengthy. It is important that you work efficiently inside the lab since there are a limited number of lab stations. Try to use the lab for debugging your code and executing the experiment. If this is not possible, try at least to have as much of your algorithm roughly coded as possible before coming into the lab. This is especially important in later labs, where most of the labs are project-oriented and require extensive coding.

We ask that you do not lock the lab PCs in order to reserve a lab machine. In fact, the TAs and the system administrators are responsible for logging off any locked lab PCs if there are none available. If such a situation arises, the user will be logged off, and any programs that are running will be terminated. The TA will not be able to tell if there are important programs running.

A. Working Directory

You will be assigned an account on the EE network. This will allow you to log in on any of the lab computers in Packard 001 to access your ee265 account. You are responsible for keeping your files in your account folder and not leaving random files on the local machine, especially on the C:/ drive. You may use the local C:/ drive for TEMPORARY storage since, on occasion, the EE network may be down, preventing you from working on the labs. Files left on the C:/ drive will be removed by the system administrator on a regular basis.

Each user will have a personal network directory. This directory is already mapped to the Z:/ drive for you and will be available to you whichever lab machine you are on. You may access this working directory through the "My Computer" icon. We will refer to this working directory as Z:/ in this and subsequent labs. If you don't know how to manipulate files under the Windows operating system, please ask the TAs. For the sake of system administration, please keep ALL your lab files under your working directory. All user files outside of your working directory may be deleted at any time.

The user privileges are set up such that the different ee265 accounts are private, so you may not be able to install certain programs. If you find something that you would like to install but cannot, or you would like a program to be accessible on the EE domain, let us know and we will install it on the server.

B. Setting up the Lab

All the lab materials are stored on this website. Download these files to your personal directory before doing lab assignments. Files for lab1 are

C. Code Submission

You are to demonstrate your code to the TA on the due date (usually on Thursdays). Demonstrations must be done during the TA office hours. The TAs have allotted office hours on Thursdays for this, and we expect that you will be able to find a time slot to meet with them. During the meeting, you are asked to set up and demonstrate your lab. Upon completion, you will need to provide a brief project report (templates will be provided each week in the handouts section of the webpage). You will also be expected to attach a hardcopy of your code (and linker command file) to the lab submission for grading purposes.

Your code will be graded based on a number of criteria: performance in terms of cycle time and code size, coding style, and lab write-ups. We are not asking you to spend a lot of time on comments and style, but enough for you and the TA to understand.

 

VI. Lab Exercises

In this lab, you will first be introduced to the CCS tool environment from the perspective of an experienced user. From it, you will get exposed to the do's and don'ts of lab equipment and software. Then, you will go through two sample programs for a demonstration of the C5509 DSP board. Finally, you will be asked to write and test code for a simple FIR filter. You will also need to answer many questions at the end of this lab.

A. Tool Environment

In this section, we will take you through the tool environment, how to run CCS, what to look for, and precautions to take when working with the hardware. There is a CCS tutorial available. However, it is time consuming to go through. Besides, exact instructions on what to do in CCS are provided in the lab exercises.

Code Composer Studio is an integrated development environment specifically designed for TI DSP processors. It also has extended features for hardware debugging: it facilitates communication of data between the host and the EVM board and enables monitoring of the board status information. Like all development environments, the combination of CCS and the 5509a DSK board is sometimes buggy, and it will take time to get used to the development flow. HINT: Often, repeating the same steps may actually make something work.

1. On-line References

You can find all the important documentation you will need on-line through the TI homepage, http://www.ti.com/. To simplify the search for information, we have downloaded the on-line references (most of them in the Adobe Acrobat PDF format) and made them available on the EE network server through the class homepage. In addition, the CCS Help menu is VERY USEFUL and contains searchable information. With a few clicks, you can locate the description of the ST0_55 register or the MPY::MPY instruction.

2. Hardware setup

Before you run CCS, you need to power up the EVM board and check its connections. Previous EVM boards proved rather fragile. We hope that, with care and the lessons learned from those boards, fewer boards will be broken in the early weeks of this class.

Look around the workbench for the EVM and its power supply. The EVM is marked TMS320C5509A DSK. It has a power socket, USB connectors, and stereo audio jacks.

In order to use the EVM with CCS, you will first need to:

 

  • Connect the mini-USB connector to the DSP. Oddly, it sometimes appears to be important to plug the USB cable first into the EVM before plugging it into the computer. The mini-USB port, which is adjacent to the power socket, connects to a small JTAG interface which enables the debugging features we will take advantage of in this course. One of the other two peripheral-size USB ports is connected directly to a USB output port of the DSP, and the other is connected to an independent power measurement system which allows developers to measure the instantaneous level of the DSP's power consumption.
  • Connect the USB cable to the computer.
  • (Optionally) Connect the audio input and output cables. The stereo 3.5 mm jumper cable should connect on one side to an audio source, such as the computer's line out. On the other end, it connects to the audio jack marked “Line In” on the EVM. Your headphones should connect to the audio jack marked “Headphone” on the EVM.
  • Power the board by plugging the power supply into the power socket at the corner of the board. You should notice the 4 LEDs light up in a sequential pattern that concludes with all 4 illuminated. While one of the buttons is marked “Reset”, we have found that unplugging the board is the most reliable way to power cycle the DSP.

 

3. Software Tool Environment

This section provides you with the REALLY useful knowledge that is entirely experience based and that you will not find in any of TI's documentation. It is important that you go through this section in detail even if you don't understand some of the terminology. This section will save you time later on when you encounter strange anomalies in CCS that you cannot comprehend. It is highly recommended that you refer to this section when you encounter problems in this lab or any future labs.

CCS startup procedure:

  • You must first connect power to the EVM board before you start CCS. Otherwise, you will get an error dialog box telling you that CCS is having a problem communicating with the EVM.
  • You can start CCS by double clicking the 5509A DSK CCStudio v3.1 icon. 

Do not poke around and start the CCS setup utility. If you do, CCS will still run but the device driver for RTDX (you will learn about this later) may break. The staff will have to reinstall CCS to make it work again. If RTDX does not work for you, check to see if the board is broken first and report the problem to the TA.

  • After you have started CCS, the development environment window should come up. The next thing that you should do is to connect to the EVM. To do this, choose the Connect option from the Debug menu; Debug->Connect. If this is successful, a window labeled “Disassembly” will pop up in the code view area of the development environment.
  • To the left is the Project View window. At this point, you can either open a project or load an executable onto the EVM board. A project is like a container that groups your source codes to help you keep track of them. Executables are the compiled version of your source codes that are ready to be loaded onto the EVM board. The typical development flow is as follows:
    1. Start a new project (Project->New...) or open a project template (Project->Open...).
    2. Complete/edit the source codes.
    3. Compile the source codes ( Project->Build or use a keyboard or toolbar shortcut) to build the executable and verify that there are no syntax errors.
    4. Load the executable ( File->Load Program ) onto the EVM board.
    5. Run (Debug->Run, or keyboard/toolbar shortcuts) and debug the program. In general, debugging would likely involve repeating steps 2-5 until the program runs as expected.
  • Working with projects:
    • The Project View window contains a listing of the source codes. Only the source codes listed here will be compiled and linked. You can add source codes to the project either using the pull down menu Project->Add Files to Project... or by dragging the source code from a file manager to the Project View window.
    • Make note of the path name of the directory where the source codes are stored. It cannot contain spaces. For example, the path name C:\Program Files\ti\my project is not supported by CCS. If the path to your source codes contains spaces, the spaces will be incorrectly encoded into the .pjt file of your project. This causes a problem when you reload the project: CCS will not complain, but none of the source codes that were originally included in the project will appear in the Project View window. (NOTE: This was the case for earlier versions of CCS, but it may not be true in CCS 3.1.)