www.mycpu.eu (C) 2023 by Dennis Kuschel 


MyCA - The Cross Assembler for MyCPU




About Cross Assemblers

You may ask what a cross assembler is. Maybe you have heard of assemblers. "Assembly" or "assembler" is not only a synonym for the machine language of microprocessors; it also stands for a program or tool that translates readable programs (text files) into machine code that microprocessors can understand. These tools often run on the machine you are writing programs for. For example, you can write an assembly program for your PC and compile (or better: assemble) it on your PC. That would be the easy way. But what if your target machine does not have enough memory for the tool or, even worse, has neither a keyboard nor a display? Then you have to use a "cross assembler", that is, an assembler that runs on one machine but produces code for another machine. This is how I started working on MyCPU.

The Old Cross Assembler for MyCPU

When I started the MyCPU project in 2001, I needed a simple cross assembler to compile my first test programs for MyCPU. At that time I had already written programs for the 8051 microprocessor, and for that purpose I used a table-oriented macro assembler called "hasm". Table-oriented? Yes, that means that the assembler could easily be adapted to many target platforms, simply by writing a translation table ("replace text command X by binary code Y" and so on). But I quickly hit the limits of that tool. The limited space for labels was especially annoying. (The program was an MS-DOS program from the 1990s and used only the lower 640 KB of system memory.) Because of this limitation I was forced to split MyCPU's kernel ROM into two halves, meaning two different software projects that needed to be managed. In the end it was no longer possible to add even one more source code line to the ROM, because the assembler immediately raised an out-of-memory error. Even worse, the old hasm executable no longer runs in the command shell of modern 64-bit operating systems. And because of its architecture, hasm.exe executes very slowly in DOS-box emulators, which makes it nearly unusable on Windows 7 and later. Because I changed my host operating system from Windows XP to Windows 7 in April 2014, I was ultimately forced to write a new cross assembler for MyCPU. The neat side effect is that the new cross assembler is open source, and it is also available for the Linux platform!

MyCA was born

Over the Christmas holidays of 2014 I had some time to start writing the new cross assembler. I wrote "myca" completely from scratch, which took about 100 hours. My goal was for "myca" to be 100% compatible with hasm.exe, so that both programs generate identical .hex files. This made it easier for me to track down errors in myca. But myca is still not a full replacement for hasm.exe, because it supports only those functions of hasm.exe that I used in the MyCPU source code.

Features of MyCA

  • MyCA is a macro cross assembler that runs on a Windows- or Linux-based host.

  • MyCA has a built-in C/C++ style preprocessor. All well-known preprocessor commands are supported, like #ifdef, #ifndef, #if, #else, #elif, #endif, #include, #error and #warning. However, the preprocessor does not support macros with a parameter list like #define MIN(x, y) ((x)<(y)?(x):(y)). This is because myca itself is a macro assembler and supports the MACRO command for this purpose.

  • MyCA supports user-defined memory segments for code and data. If the user does not bind a segment to a specific base address, myca will place the segment at the best-suited start address at compile time.

  • MyCA supports macros with local labels. I am using lots of macros in the source code of the MyCPU kernel ROM.

  • MyCA has no hard-coded limit on the number of labels, variables and defines. Because it is still a 32-bit program, the heap cannot grow beyond 4GB, but you should never hit this limit (not when assembling code for MyCPU...)

  • MyCA is a multi-pass assembler. Code is built incrementally, thus making the generated program as small as possible. For example, if a memory cell in the zero page is addressed, you do not need the OP-codes for addressing 16-bit addresses, but you can take the smaller OP-codes that are especially available for zero page addressing. The assembler needs several passes through the code to decide which OP-code would be the best.

  • MyCA can be used to replace the myasm assembler on MyCPU. If you have written programs on the MyCPU itself, you can now also cross-assemble these programs by using myca. MyCA will auto-detect if your program was originally written for myasm.

  • MyCA can generate binary files, Intel hex files and listfiles.

  • MyCA is really fast: it is at least 20 times faster than hasm.exe.

  • You can use MyCA for your own 8-bit homebrew computer project. You only need to rewrite the two source files assembler.c and opcodes.h to support your own instruction set.

New Features since 2016:
  • MyCA can now also assemble code for the new virtual 16-bit CPU emulation "vCPU".
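
The multi-pass idea from the feature list can be illustrated with a small toy model. The following Python sketch is not MyCA's implementation and does not use the MyCPU instruction set: it simply assumes that an instruction with an operand needs 3 bytes in its long form and 2 bytes when the operand lies in the zero page (below $0100), and that instructions without an operand need 1 byte. The names resolve_sizes, ZERO_PAGE_END and the sample program are made up for this illustration.

```python
ZERO_PAGE_END = 0x100  # addresses below this fit into a single operand byte


def resolve_sizes(items, fixed_symbols, origin):
    """items: list of (label, operand) pairs; either entry may be None.

    Returns (sizes, symbols, passes) once no instruction can shrink further.
    """
    # Start pessimistic: every instruction with an operand uses the long form.
    sizes = [3 if operand is not None else 1 for _, operand in items]
    passes = 0
    while True:
        passes += 1
        # Lay out the code with the current size guesses to get label addresses.
        symbols = dict(fixed_symbols)
        address = origin
        for (label, _), size in zip(items, sizes):
            if label is not None:
                symbols[label] = address
            address += size
        # Switch to the short form wherever the operand now fits the zero page.
        changed = False
        for i, (_, operand) in enumerate(items):
            if (operand is not None and sizes[i] == 3
                    and symbols[operand] < ZERO_PAGE_END):
                sizes[i] = 2
                changed = True
        if not changed:
            return sizes, symbols, passes


# Hypothetical program: two accesses to a zero-page variable, then a
# reference to a label whose address depends on the sizes chosen before it.
program = [
    (None, "counter"),
    (None, "counter"),
    (None, "target"),
    ("target", None),
]
sizes, symbols, passes = resolve_sizes(program, {"counter": 0x10}, origin=0xF8)
```

In this toy program the label "target" only drops below $0100 after the first two instructions have been shortened, so a single pass would not be enough to find the smallest encoding.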


MyCA Command Line Arguments

When you run myca without any arguments, myca will print out this help text:

[MyCA] Macro Cross Assembler V1.07 for MyCPU, (C) 2023 by Dennis Kuschel

Command line arguments:

  myca source.asm [-o binfile][-h[hexfile]][-l[listfile]][-r binrange][-m][-n]

Flags:
  -o   Set name for output binary file
  -h   Generate a hex file
  -l   Generate a list file
  -lnv Do not show vCPU2 instructions in list file
  -r   Set binary output address range, e.g. -r 0x0000-0xFFFF for full 64kb
  -m   Make myca behave like the myasm program on MyCPU
  -n   Forbit myca to behave like the myasm program on MyCPU
  -s   Be silent and print only error messages
  -D   Set preprocessor define: '-D string' or '-D string=value'
  -I   Set include path: '-I /path/to/include/files'
  -O   Set optimization level (only valid for vCPU2): -O0 =none, -O1 =optimize
       -O2 =optimize more, -Os =optimize for size, -Ofast =optimize for speed
  -t   Set target. Allowed targets are: mycpu, vcpu, vcpu2. Example: -t vcpu2

Examples:
  myca rom.asm -o rom.bin -r 0x0000-0xFFFF
  -> Builds the ROM memory image 'rom.bin' with the address range 0x0000-0xFFFF

  myca program.asm -o programname
  -> Builds a user program (the parameter -o is optional), works like myasm

Usually you would call MyCA with at least one parameter: the source file you want to compile. If MyCA gets only this one parameter, it automatically generates a binary output file that has the same name as the source file but with the ending ".o". This is the same behaviour as myasm on MyCPU. Also like with myasm, you can tell myca the name of the output file by using the parameter "-o". Please note that it does not matter whether you write myca sourcefile.asm -o testprogram  or   myca sourcefile.asm -otestprogram, the result is the same: MyCA will generate the binary called "testprogram".

You can also instruct MyCA to generate a hexfile instead of a binary. This is done by setting the "-h" flag: myca testprogram.asm -h will then generate the hexfile testprogram.hex. Note that, as with the -o parameter, you can choose the name of the hexfile.
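
For readers who have not worked with Intel hex files before, the following Python sketch shows how a single record of that format is built: a colon, the byte count, the 16-bit address, the record type, the data bytes and a checksum (the two's complement of the sum of all preceding bytes). This is only an illustration of the general file format. How MyCA splits the program into records (for example the number of data bytes per line) is not described here, and the function name hex_record is made up.

```python
def hex_record(address, data, record_type=0x00):
    """Build one Intel HEX record (type 0x00 = data, 0x01 = end of file)."""
    body = bytes([len(data), (address >> 8) & 0xFF, address & 0xFF,
                  record_type]) + bytes(data)
    checksum = (-sum(body)) & 0xFF  # two's complement of the byte sum
    return ":" + (body + bytes([checksum])).hex().upper()


# The eight program header bytes of the example program further below,
# packed into one data record at address 8000h, followed by the EOF record.
header_record = hex_record(0x8000, bytes.fromhex("02804D8000004580"))
eof_record = hex_record(0x0000, b"", record_type=0x01)
```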

A very useful feature is the list file that is generated when you append the flag "-l" to the command. The listfile shows all source code lines and the generated program code bytes. In the first column you will find the target address, the second column contains up to three code bytes, and the last column contains the source code line that was translated into these bytes. At the bottom of the listfile you will find a table that lists all memory segments used by your program, including the base address and the size of each segment. Below the table is a status line that tells you whether the compilation was successful.
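
If you want to post-process a listfile with your own tools, the three columns described above can be split with a few lines of Python. This is a sketch that assumes the layout of the example listfile shown further below (a four-digit hexadecimal address, one space, then one to three code bytes written without separators); the function name parse_list_line is made up, and lines that carry no code bytes are simply reported as None.

```python
import re

# address (4 hex digits), one space, 1..3 code bytes, then the source text
LIST_LINE = re.compile(r"\s*([0-9A-F]{4}) ((?:[0-9A-F]{2}){1,3})\s*(.*)$")


def parse_list_line(line):
    """Return (address, code_bytes, source_text) or None for non-code lines."""
    match = LIST_LINE.match(line.rstrip())
    if match is None:
        return None
    address, code, source = match.groups()
    return int(address, 16), bytes.fromhex(code), source


entry = parse_list_line("    804D 6C0880           LPT   #text1")
```

Continuation lines of a long DB statement contain only an address and code bytes, so they come back with an empty source text.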

The special flags "-m" and "-n" tell MyCA whether or not to behave like myasm. Usually MyCA detects the file format of your source file automatically, but sometimes the detection may fail. If you observe such a problem, try setting one of these flags. The main difference between the two formats is the default string format: myasm automatically translates ASCII strings into the PETSCII code that MyCPU uses to display text. MyCA does not do this translation by default, in order to stay compatible with the old hasm.exe program. Note that you can also define the string format within your source file: use either ".mode ascii" or ".mode petscii" to switch between the two formats.
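
What the PETSCII translation does to a string can be read off the example listfile further below: there, lowercase letters end up as the codes $41-$5A and uppercase letters as $C1-$DA. The following Python sketch reproduces exactly this mapping. It is inferred only from the bytes in that listfile and is not taken from the MyCA sources; in particular, passing all other characters through unchanged is an assumption, and the function name ascii_to_petscii is made up.

```python
def ascii_to_petscii(text):
    """Translate an ASCII string the way the example listfile suggests."""
    out = bytearray()
    for ch in text.encode("ascii"):
        if 0x61 <= ch <= 0x7A:    # 'a'..'z' -> $41..$5A
            out.append(ch - 0x20)
        elif 0x41 <= ch <= 0x5A:  # 'A'..'Z' -> $C1..$DA
            out.append(ch + 0x80)
        else:                     # assumption: everything else stays as it is
            out.append(ch)
    return bytes(out)


# The first string of the example program, including its terminating zero.
text1 = ascii_to_petscii("This is a text.\r") + b"\x00"
```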

Some words about the source file encoding: If you started writing your assembly program on MyCPU with the text editor "edit", you may have noticed that your program is unreadable in an external text editor. This is because MyCPU stores all text files in PETSCII format by default. MyCA detects the PETSCII format automatically and translates it on the fly into ASCII. You can also convert your source code files into ASCII format by using the copy command on MyCPU with the parameter "/a": copy source-pet.asm source-asc.asm /a.


Assembler Programming with MyCA

Some short facts:
  • The usual preprocessor directives are supported
  • Comments are started with a semi-colon (;)
  • Labels always start in the first column of a line. They may end with a colon (:)
  • Mnemonics must not start in the first column of a line. At least one space is required.
  • There are only a few commands that MyCA understands:
    • ORG : set origin of a program or data segment to a fixed memory address
    • SEGMENT : open a new or switch to an existing data or code segment
    • SET and EQU : assign a value to a variable
    • DB and DW : Insert bytes or words into the program. DB can also be used to put text strings into a program.
    • DS : reserve some bytes within a segment
    • MACRO and ENDMACRO: used to define a macro
    • MODE : switch between ASCII and PETSCII mode for characters and strings

Some words about segments:

Usually when you are programming in assembly, you would use only one memory segment: the segment in which all your program code is stored. But using multiple segments can make things easier. For example, when your program uses the paged data memory area $4000-$7FFF, you can define a data segment within this memory. All variables and buffers you are using will be placed in this segment. You only need to tell MyCA the size of the variables and buffers with the DS command, and MyCA will automatically assign memory addresses to your variables.

Another advantage of segments is that you can switch between segments at compile time. For example, the source files of the MyCPU kernel ROM generally use two segments: a program code segment and an initialization code segment. All initialization code (such as memory and register initialization) goes into the init code segment, and all other code goes into the program code segment. When assembling these sources, MyCA first collects the initialization code snippets of all source files and puts them together; all other program code is placed after this init code segment. So the user does not need to call the initialization routines of the modules explicitly.

Segments can be bound to a memory start address or they can be left floating. Binding a memory segment is done by the ORG command. When the ORG command follows immediately after the first segment command, this segment is bound to the memory address given by ORG. If a segment is not bound it remains floating. That means that MyCA will try to automatically assign a memory base address to this segment. But this is only possible if there is at least one bound segment of the same type in the program. Floating segments are then placed behind the already bound segments. Segments can be of type code or type data.
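
The placement rule for bound and floating segments can be modelled in a few lines of Python. This is a simplified sketch, not MyCA's real algorithm: it assumes that every floating segment is appended directly behind the highest end address of the bound segments of the same type, and it ignores overlap checks and gaps between bound segments. The function name place_segments is made up; the segment names and sizes are those of the example program further below.

```python
def place_segments(segments):
    """segments: list of (name, kind, size, base); base is None when floating.

    Returns a dict that maps each segment name to (start, end) addresses.
    """
    placed = {}
    next_free = {}  # per segment type: first address behind the bound segments
    for name, kind, size, base in segments:
        if base is not None:
            placed[name] = (base, base + size)
            next_free[kind] = max(next_free.get(kind, 0), base + size)
    for name, kind, size, base in segments:
        if base is None:
            if kind not in next_free:
                # A floating segment needs a bound segment of the same type.
                raise ValueError(f"no bound {kind} segment to anchor '{name}'")
            start = next_free[kind]
            placed[name] = (start, start + size)
            next_free[kind] = start + size
    return placed


layout = place_segments([
    ("programheader", "CODE", 0x08, 0x8000),
    ("initdata",      "CODE", 0x3D, 0x8008),
    ("zeropage",      "DATA", 0x04, 0x0000),
    ("programcode",   "CODE", 0x33, None),   # floating
])
```

With these inputs the floating segment "programcode" lands at 8045h, directly behind "initdata", which matches the segment table at the end of the example listfile.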


Example Program

This short example program shows the most important features of MyCA: macros with parameters and local labels, bound and floating segments, and PETSCII strings. Below the program you will find the listfile that was generated by MyCA and the hexdump of the program. The command was "myca example.asm -l".


Program example.asm:
; Set mode to PETSCII. Strings are now automatically converted to PETSCII code.
.mode petscii

; Assign a value to a variable. This can be done with EQU or SET.
KERN_GETCH      EQU  023Ch
KERN_PRINTSTR   EQU  0244h

; Define a macro that does not take any parameter. 
ret0      MACRO
            CLA
            RTS
          ENDMACRO

; Define a macro that takes one parameter, that is str
print     MACRO  str
            LPT  # str
            JSR  (KERN_PRINTSTR)
          ENDMACRO

; Define a macro that takes three parameters
bcopy     MACRO  srcptr, dstptr, cnt
            LDX  # cnt
            CLY
cploop      LDA  ( srcptr ),Y
            STA  ( dstptr ),Y
            INY
            DXJP cploop
          ENDMACRO


; Build the program header that is required for all MyCPU programs
programheader SEGMENT CODE
ORG 8000h

          DW staddr     ; ptr to memory start address of the program
staddr    DW main       ; program entry point (ptr to main function)
          DW 0          ; optional: main program termination function
          DW codestart  ; this pointer points behind the initdata segment


; Define a data segment in the zero page.
; You are allowed to use the addresses $00-$0F.
zeropage SEGMENT DATA
ORG 00h

; Declare two 16-bit pointers within the zero-page
ptr1      DS 2
ptr2      DS 2

; Open a new segment that contains initialized data, like strings and tables
; Also some small amount of buffer memory can be defined here.
initdata SEGMENT CODE
ORG 8008h

text1     DB "This is a text.\r",0
buffer    DS 20  ; reserve 20 bytes buffer memory for later use


; Now follows the code segment. Since it is not bound to any start address
; the initdata segment can still grow. See the example at the end of this file.
programcode SEGMENT CODE

codestart:

          ;Four dummy instructions: They are required directly after "codestart"
          ;to reference all zeropage addresses we use. This is required for the
          ;dynamic program loader and the automatic memory allocation routine.
          FLG   ptr1
          FLG   ptr1+1
          FLG   ptr2
          FLG   ptr2+1

main:     ;program entry point: main program (must be somewhere behind codestart)

          ;example: use a macro to copy some bytes
          LPT   #text1
          SPT   ptr1
          LPT   #buffer
          SPT   ptr2
          bcopy ptr1, ptr2, 17

          ;print the string that was copied to the buffer
          LPT   #buffer
          JSR   (KERN_PRINTSTR)

          ;print a string using a macro
          print text2

          ;this demonstrates how a label works
waitkey:  JSR   (KERN_GETCH)
          CMP   #0
          JPZ   waitkey

          ;return with accu=0 (exit code), use macro ret0
          ret0


; Switch back to the initdata segment and add one further text line.
; MyCA will place this data to the correct place between
; the program header and the start of the program code.
initdata SEGMENT CODE

text2   DB "Press any key to quit.\r",0


Listfile example.lst:
                ;~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                ;~~  [MyCA] Macro Cross Assembler V1.0 for MyCPU, (c) 2015 by Dennis Kuschel  ~~
                ;~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

                ;[File: example.asm]

                ; Set mode to PETSCII. Strings are now automatically converted to PETSCII code.
                .mode petscii

                ; Assign a value to a variable. This can be done with EQU or SET.
                KERN_GETCH      EQU  023Ch
                KERN_PRINTSTR   EQU  0244h

                ; Define a macro that does not take any parameter.
                ret0      MACRO
                            CLA
                            RTS
                          ENDMACRO

                ; Define a macro that takes one parameter, that is str
                print     MACRO  str
                            LPT  # str
                            JSR  (KERN_PRINTSTR)
                          ENDMACRO

                ; Define a macro that takes three parameters
                bcopy     MACRO  srcptr, dstptr, cnt
                            LDX  # cnt
                            CLY
                cploop      LDA  ( srcptr ),Y
                            STA  ( dstptr ),Y
                            INY
                            DXJP cploop
                          ENDMACRO

                ; Build the program header that is required for all MyCPU programs
                programheader SEGMENT CODE
                ORG 8000h

    8000 0280             DW staddr     ; ptr to memory start address of the program
    8002 4D80   staddr    DW main       ; program entry point (ptr to main function)
    8004 0000             DW 0          ; optional: main program termination function
    8006 4580             DW codestart  ; this pointer points behind the initdata segment

                ; Define a data segment in the zero page.
                ; You are allowed to use the addresses $00-$0F.
                zeropage SEGMENT DATA
                ORG 00h

                ; Declare two 16-bit pointers within the zero-page
                ptr1      DS 2
                ptr2      DS 2

                ; Open a new segment that contains initialized data, like strings and tables
                ; Also some small amount of buffer memory can be defined here.
                initdata SEGMENT CODE
                ORG 8008h

    8008 D44849 text1     DB "This is a text.\r",0
    800B 532049
    800E 532041
    8011 205445
    8014 58542E
    8017 0D00 
                buffer    DS 20  ; reserve 20 bytes buffer memory for later use

                ; Now follows the code segment. Since it is not bound to any start address
                ; the initdata segment can still grow. See the example at the end of this file.
                programcode SEGMENT CODE

                codestart:

                          ;Four dummy instructions: They are required directly after "codestart"
                          ;to reference all zeropage addresses we use. This is required for the
                          ;dynamic program loader and the automatic memory allocation routine.
    8045 3C00             FLG   ptr1
    8047 3C01             FLG   ptr1+1
    8049 3C02             FLG   ptr2
    804B 3C03             FLG   ptr2+1

                main:     ;program entry point: main program (must be somewhere behind codestart)

                          ;example: use a macro to copy some bytes
    804D 6C0880           LPT   #text1
    8050 6F00             SPT   ptr1
    8052 6C1980           LPT   #buffer
    8055 6F02             SPT   ptr2
                          bcopy ptr1, ptr2, 17
    8057 5011               LDX  # 17
    8059 2E                 CLY
    805A 3400   __m1_cploop      LDA  ( ptr1 ),Y
    805C 4402               STA  ( ptr2 ),Y
    805E 8B                 INY
    805F 495A80             DXJP __m1_cploop

                          ;print the string that was copied to the buffer
    8062 6C1980           LPT   #buffer
    8065 1B4402           JSR   (KERN_PRINTSTR)

                          ;print a string using a macro
                          print text2
    8068 6C2D80             LPT  # text2
    806B 1B4402             JSR  (KERN_PRINTSTR)

                          ;this demonstrates how a label works
    806E 1B3C02 waitkey:  JSR   (KERN_GETCH)
    8071 7000             CMP   #0
    8073 196E80           JPZ   waitkey

                          ;return with accu=0 (exit code), use macro ret0
                          ret0
    8076 2C                 CLA
    8077 1F                 RTS

                ; Switch back to the initdata segment and add one further text line.
                ; MyCA will place this data to the correct place between
                ; the program header and the start of the program code.
                initdata SEGMENT CODE

    802D D05245 text2   DB "Press any key to quit.\r",0
    8030 535320 
    8033 414E59 
    8036 204B45 
    8039 592054 
    803C 4F2051 
    803F 554954 
    8042 2E0D00 



Segment Table:
**************
Segment Name                 Startaddr  Endaddr     Size  Type
=========================================================================
programheader                     8000     8008        8  CODE  fixed
initdata                          8008     8045       3D  CODE  fixed
programcode                       8045     8078       33  CODE  floating
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
zeropage                             0        4        4  DATA  fixed
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

No errors found.


Hexdump:



