Skip to content

orion-kernel/assembly-tute

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

Hello, World! in Assembly

https://notebooklm.google.com/notebook/508f5265-b7ac-4d56-9345-03ae517eaa1c

This document explains how to write, assemble, and run a "Hello, World!" program in x86-64 assembly on a Linux system.

The Assembly Code (hello.asm)

section .data
    hello db 'Hello, World!', 0ah

section .text
    global _start

_start:
    ; write(1, hello, 13)
    mov rax, 1
    mov rdi, 1
    mov rsi, hello
    mov rdx, 13
    syscall

    ; exit(0)
    mov rax, 60
    xor rdi, rdi
    syscall

Code Explanation

  • section .data: This section is for declaring initialized data.
    • hello db 'Hello, World!', 0ah: This defines a byte string named hello containing "Hello, World!" and a newline character (0ah).
  • section .text: This section contains the program's code.
    • global _start: This makes the _start label visible to the linker, marking it as the program's entry point.
    • _start:: This is where the program begins execution.
  • Printing "Hello, World!": This part of the code uses a system call to write the string to the console.
    • mov rax, 1: Sets rax to 1, the syscall number for write.
    • mov rdi, 1: Sets rdi to 1, the file descriptor for stdout.
    • mov rsi, hello: Sets rsi to the memory address of our hello string.
    • mov rdx, 13: Sets rdx to 13, the length of the string.
    • syscall: Executes the write system call.
  • Exiting the program: This part of the code uses a system call to terminate the program.
    • mov rax, 60: Sets rax to 60, the syscall number for exit.
    • xor rdi, rdi: Sets rdi to 0 (the exit code) by XORing it with itself.
    • syscall: Executes the exit system call.

Assembling and Running

To run the assembly program, you need to assemble it into an object file and then link it to create an executable.

Commands

  1. Assemble with nasm:
    nasm -f elf64 hello.asm -o hello.o
  2. Link with ld:
    ld hello.o -o hello
  3. Run the executable:
    ./hello

Command Explanations

  • nasm -f elf64 hello.asm -o hello.o: This command assembles the hello.asm file.
    • nasm: The Netwide Assembler.
    • -f elf64: Specifies the output format as ELF64, the standard for 64-bit Linux executables.
    • hello.asm: The input assembly file.
    • -o hello.o: Specifies the output object file name.
  • ld hello.o -o hello: This command links the hello.o object file.
    • ld: The GNU linker.
    • hello.o: The input object file.
    • -o hello: Specifies the output executable file name.
  • ./hello: This command executes the hello program.

Understanding Data Declaration Order in Assembly

Hey! Here is a simple explanation for how the order of lines in the .data section of an assembly file can change the program's output. It all comes down to how the assembler reads the code.

The Core Concept: The Assembler Reads Top-to-Bottom

Think of the assembler as a program that reads your .asm file one line at a time, from top to bottom. As it reads, it calculates memory addresses and defines values.

The most important symbol for this explanation is the dollar sign: $

  • $: This special symbol represents the current memory address where the assembler is working. Its value changes as the assembler moves down the file and allocates space for data.

The Base Code

Here is the simple program we are looking at. It's designed to print a message to the screen.

section .text
  global _start

    _start:
      mov edx, len    ; The length of the message
      mov ecx, msg    ; The message to write
      mov ebx, 1      ; File descriptor (1 for stdout)
      mov eax, 4      ; System call number (sys_write)
      int 0x80        ; Call the kernel

      mov eax, 1      ; System call number (sys_exit)
      int 0x80        ; Call the kernel

section .data
  msg db "12345", 0xa
  ; The next lines are what we will change

The key instruction is len equ $ - msg. This tells the assembler: "Create a constant called len and set its value to the current address ($) minus the starting address of msg."


Case 1: Printing Only the First Message

In this case, we calculate the length immediately after the first message.

The Code

section .data
  msg db "12345", 0xa

  ; --- The important part ---
  len equ $ - msg
  msg2 times 5 db "6", 0xa

The Explanation

  1. The assembler reads msg db "12345", 0xa and allocates 6 bytes of memory.
  2. On the very next line, it sees len equ $ - msg. At this exact moment, the $ symbol points to the memory address right after the msg string.
  3. The assembler calculates len as (address after msg) - (address of msg), which equals 6.
  4. Even though msg2 is defined on the next line, it doesn't matter because the value of len has already been set in stone.
  5. The program runs and prints 6 bytes starting from msg.

Expected Output

12345

Case 2: Printing Both Messages

In this case, we move the length calculation to the very end.

The Code

section .data
  msg db "12345", 0xa
  msg2 times 5 db "6", 0xa

  ; --- The important part ---
  len equ $ - msg

The Explanation

  1. The assembler reads msg db "12345", 0xa (6 bytes).
  2. It then continues and reads msg2 times 5 db "6", 0xa (10 bytes).
  3. Only after processing both msg and msg2 does it see len equ $ - msg. Now, the $ symbol points to the memory address after all the msg2 data.
  4. The assembler calculates len as (address after msg2) - (address of msg), which is the total combined length: 16 bytes.
  5. The program runs and prints 16 bytes starting from msg. Since msg2 is located right after msg in memory, it prints all of msg and then continues, printing all of msg2 as well.

Expected Output

12345
6
6
6
6
6

Summary of Our Discussion

You understood this perfectly. The key takeaway is that the assembler is not magic; it's a sequential program. The value of $ is dynamic during assembly time, and the position of the len equ $ - msg line is what determines the final, constant value of len that gets baked into your executable. Your analysis of both cases was spot on.

That's a great question. You've touched on the other half of data definition.

Here’s the key difference:

  • DB, DW, DD, etc. Define Data. They allocate space and put a specific value into it.
  • RESB, RESW, RESD, etc. Reserve Space. They only allocate empty space. They are for creating uninitialized variables that your program will fill in later.

Think of it like this:

  • DB: my_message: db 'Hello!' ➡ Puts the bytes for 'H', 'e', 'l', 'l', 'o', '!' into the file.
  • RESB: user_input: resb 100 ➡ Skips 100 bytes of space, leaving an empty buffer. Your code can later read keyboard input and store it at the user_input address.

How to Use Them (The .bss Section)

The correct way to use these RESB directives is in a special section called .bss.

Your binary file has different sections:

  • .text: For your code (like mov, jmp).
  • .data: For initialized data (like db 'Hello'). This data is saved inside your executable file.
  • .bss: For uninitialized data (like resb 100). This data is not saved in the file. It's just a note that says, "When you load this program, please set aside 100 bytes of empty memory for it." This keeps your executable file small.

Example

Here is how you would use it in a program.

section .data
    ; --- Initialized data ---
    prompt_msg: db 'What is your name? ', 0


section .bss
    ; --- Reserved (uninitialized) space ---

    ; Reserve 100 bytes of space to store the user's name
    user_name: resb 100

    ; Reserve 1 word (2 bytes) to store the user's age
    user_age: resw 1

    ; Reserve 1 doubleword (4 bytes) for a counter
    login_attempts: resd 1


section .text
global _start

_start:
    ; ... your code would go here ...

    ; For example, you might:
    ; 1. Call a function to print 'prompt_msg'
    ; 2. Call a function to read keyboard input
    ; 3. Store the result at the 'user_name' address

    ; ... more code ...

In this example:

  • prompt_msg is created using db because we know its value ("What is your name?") from the start.
  • user_name is created using resb 100 because we don't know the user's name yet. We are just saving an empty 100-byte buffer for it.
  • user_age is created using resw 1 to hold a 2-byte number that we'll get from the user later.
  • login_attempts is created using resd 1 to hold a 4-byte number that our code will increment or reset.

Summary

Directive Purpose Analogy
DB, DW, DD Define Data Puts a pre-filled box on the shelf.
RESB, RESW, RESD Reserve Space Puts an empty box on the shelf (labeled).

About

assembly-learning

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors