This program tries to recreate bash --posix behaviour in some ways.
Execution : • Informations Briefly • What you should know (Summary) • I- Parsing • 1. Separation commande_line • 1.1 Quote close, Quoting rules • 2. Separation of words in commande_line
Ever wondered how your terminal shell actually works? How does it parse commands, execute programs, handle pipes, and manage processes? Minishell is a 42 school project that challenges you to build your own mini version of Bash from scratch in C. It's an incredible journey into the depths of Unix systems programming.
This project implements a functional shell program that handles command parsing and execution, pipes and redirections, environment variable expansion, built-in commands (cd, echo, pwd, export, unset, env, exit), signal handling (Ctrl-C, Ctrl-D, Ctrl-), and process management with proper cleanup.
In this article, I'll walk through the key concepts and implementation challenges of building a shell, from understanding processes and pipes to handling signals and building command pipelines.
Minishell is a miniature shell program based on Bash that supports:
Core Features:
The shell provides an interactive prompt with command history (up and down arrows), executes both system executables from the environment (ls, cat, grep, etc.) and local executables (./minishell), and includes builtin commands with their essential options. It supports pipes (|) for chaining commands, redirections (>, >>, <, <<), environment variable expansion ($USER, $VAR), exit status tracking ($?), and signal handling for user interrupts.
Limitations:
The project intentionally doesn't support backslashes, semicolons, logical operators (&&, ||), or wildcards to keep the scope manageable while still covering the fundamental concepts.
Before diving into implementation, let's understand what processes are and how they work.
A process is a program in execution. When you run a program, the system loads its instructions into RAM and executes them. The operating system manages all processes and allocates memory to each one independently—each has its own stack, heap, and instruction pointer.
You can view active processes with:
ps auxEach process has a PID (Process Identifier), which is a unique non-negative integer, and a PPID (Parent Process Identifier) that references the parent process.
Processes are organized hierarchically. At startup, Unix has just one process called init (PID 1), which is the direct or indirect ancestor of all other processes.
The fork() system call creates a new process by cloning the current one:
#include <unistd.h>
pid_t fork(void);The return value is crucial: in the parent process, it returns the child's PID; in the child process, it returns 0; and on error, it returns -1.
Here's a basic example:
#include <unistd.h>
#include <stdio.h>
int main(void)
{
pid_t pid;
printf("Before fork\n");
pid = fork();
if (pid == -1)
{
perror("fork failed");
return 1;
}
else if (pid == 0)
{
// Child process
printf("I'm the child, PID: %d\n", getpid());
}
else
{
// Parent process
printf("I'm the parent, child PID: %d\n", pid);
}
return 0;
}Important: The child inherits the parent's instruction pointer, so it doesn't start from the beginning—it continues from where fork() was called!
When you fork, the child gets a copy of the parent's memory, not a shared reference. Changes made in one process don't affect the other:
int main(void)
{
int value = 42;
pid_t pid;
pid = fork();
if (pid == 0)
{
value = 100; // Child changes value
printf("Child: value = %d\n", value);
}
else
{
sleep(1); // Wait for child to finish
printf("Parent: value = %d\n", value); // Still 42!
}
return 0;
}This isolation is why we need inter-process communication mechanisms like pipes.
After creating a child process, the parent should wait for it to finish. Otherwise, you can get zombie processes—terminated children whose exit status hasn't been collected.
When a parent doesn't wait for its children, they become zombies:
Conversely, if a parent exits before waiting, children become orphans and are adopted by init:
#include <sys/wait.h>
pid_t wait(int *status);
pid_t waitpid(pid_t pid, int *status, int options);wait() waits for any child process to terminate.
waitpid() offers more control with three parameters: pid (specific child to wait for, or -1 for any child), status (pointer to store the exit status), and options (flags like WNOHANG to return immediately if child hasn't exited).
Use these macros to examine the status:
if (WIFEXITED(status))
{
// Child exited normally
int exit_code = WEXITSTATUS(status);
printf("Exit code: %d\n", exit_code);
}
if (WIFSIGNALED(status))
{
// Child was terminated by a signal
int signal = WTERMSIG(status);
printf("Terminated by signal: %d\n", signal);
}#include <unistd.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
pid_t pid;
int status;
pid = fork();
if (pid == 0)
{
// Child process
printf("Child: Working...\n");
sleep(2);
exit(42);
}
else
{
// Parent process
printf("Parent: Waiting for child...\n");
waitpid(pid, &status, 0);
if (WIFEXITED(status))
{
printf("Parent: Child exited with code %d\n",
WEXITSTATUS(status));
}
}
return 0;
}Here's what the output looks like when analyzing exit status:
And here's a comparison with different exit codes:
Pipes are the foundation of shell command chaining. They allow one process's output to become another's input.
A pipe is a unidirectional communication channel with a read end (file descriptor) and a write end (file descriptor).
Data written to the write end is buffered until read from the read end.
#include <unistd.h>
int pipe(int pipefd[2]);The pipefd array will contain pipefd[0] (the read end) and pipefd[1] (the write end).
#include <unistd.h>
#include <stdio.h>
#include <string.h>
int main(void)
{
int pipefd[2];
pid_t pid;
char buffer[100];
if (pipe(pipefd) == -1)
{
perror("pipe");
return 1;
}
pid = fork();
if (pid == 0)
{
// Child reads from pipe
close(pipefd[1]); // Close write end
read(pipefd[0], buffer, sizeof(buffer));
printf("Child received: %s\n", buffer);
close(pipefd[0]);
}
else
{
// Parent writes to pipe
close(pipefd[0]); // Close read end
char *msg = "Hello from parent!";
write(pipefd[1], msg, strlen(msg) + 1);
close(pipefd[1]);
wait(NULL);
}
return 0;
}This is crucial and often the source of bugs. Each process must close the pipe ends it doesn't use: if all write ends aren't closed, read() will wait indefinitely instead of returning EOF, and if all read ends aren't closed, write() will block when the pipe is full.
// Parent writes to pipe
close(pipefd[0]); // MUST close unused read end
write(pipefd[1], data, size);
close(pipefd[1]); // MUST close when done
// Child reads from pipe
close(pipefd[1]); // MUST close unused write end
read(pipefd[0], buffer, size);
close(pipefd[0]); // MUST close when doneIf you forget to close unused file descriptors, you'll see output like this where the process hangs indefinitely:
When you run cat file.txt | wc -l, the shell:
- Creates a pipe
- Forks two child processes
- Redirects the first command's stdout to the pipe's write end
- Redirects the second command's stdin to the pipe's read end
Using dup2() to redirect file descriptors:
int pipefd[2];
pipe(pipefd);
// First command: cat file.txt
if (fork() == 0)
{
close(pipefd[0]); // Close read end
dup2(pipefd[1], STDOUT_FILENO); // Redirect stdout to pipe
close(pipefd[1]);
execlp("cat", "cat", "file.txt", NULL);
}
// Second command: wc -l
if (fork() == 0)
{
close(pipefd[1]); // Close write end
dup2(pipefd[0], STDIN_FILENO); // Redirect stdin from pipe
close(pipefd[0]);
execlp("wc", "wc", "-l", NULL);
}
// Parent closes all pipe ends and waits
close(pipefd[0]);
close(pipefd[1]);
wait(NULL);
wait(NULL);Here's a visual representation of how the shell's pipe operator works:
A pipeline like cmd1 | cmd2 | cmd3 requires multiple pipes. The pattern is that N commands require N-1 pipes, and each middle command reads from one pipe and writes to the next.
cmd1 --> pipe1 --> cmd2 --> pipe2 --> cmd3
Key implementation points:
- Create all pipes before forking
- Each child closes all pipe ends it doesn't use
- First command only writes, last command only reads
- Middle commands both read and write
// For 3 commands, need 2 pipes
int pipe1[2], pipe2[2];
pipe(pipe1);
pipe(pipe2);
// Command 1: only writes to pipe1
if (fork() == 0)
{
dup2(pipe1[1], STDOUT_FILENO);
close(pipe1[0]);
close(pipe1[1]);
close(pipe2[0]);
close(pipe2[1]);
// Execute command 1
}
// Command 2: reads from pipe1, writes to pipe2
if (fork() == 0)
{
dup2(pipe1[0], STDIN_FILENO);
dup2(pipe2[1], STDOUT_FILENO);
close(pipe1[0]);
close(pipe1[1]);
close(pipe2[0]);
close(pipe2[1]);
// Execute command 2
}
// Command 3: only reads from pipe2
if (fork() == 0)
{
dup2(pipe2[0], STDIN_FILENO);
close(pipe1[0]);
close(pipe1[1]);
close(pipe2[0]);
close(pipe2[1]);
// Execute command 3
}
// Parent closes all pipes and waits
close(pipe1[0]);
close(pipe1[1]);
close(pipe2[0]);
close(pipe2[1]);
// Wait for all childrenShells need to handle user interrupts gracefully. When you press Ctrl-C, you don't want to exit the shell—just the current command.
A signal is an asynchronous notification sent to a process. Common signals:
SIGINT(2): Interrupt (Ctrl-C)SIGQUIT(3): Quit (Ctrl-)SIGTERM(15): Termination requestSIGKILL(9): Force kill (cannot be caught!)SIGSTOP(19): Stop process (cannot be caught!)
Here's a visual representation of how signals work at the system level:
It's important to understand that signals can be pending. When a signal is blocked, it becomes pending until it's unblocked:
Important: There can only be one pending signal of any particular type. If multiple signals of the same type are sent while blocked, only one will be delivered when unblocked.
The modern way to handle signals is with sigaction():
#include <signal.h>
struct sigaction {
void (*sa_handler)(int); // Handler function
sigset_t sa_mask; // Signals to block during handler
int sa_flags; // Flags (e.g., SA_RESTART)
};
int sigaction(int signum, const struct sigaction *act,
struct sigaction *oldact);#include <signal.h>
#include <stdio.h>
#include <unistd.h>
void handle_sigint(int sig)
{
write(STDOUT_FILENO, "\nCaught SIGINT!\n", 16);
}
int main(void)
{
struct sigaction sa;
sa.sa_handler = handle_sigint;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_RESTART;
if (sigaction(SIGINT, &sa, NULL) == -1)
{
perror("sigaction");
return 1;
}
printf("Press Ctrl-C to test (Ctrl-\\ to quit)...\n");
while (1)
sleep(1);
return 0;
}Example output showing signal handling:
Signal handlers are tricky! Follow these rules:
- Keep handlers short and simple - Just set a flag if possible
- Use only async-signal-safe functions - No
printf(),malloc(), etc.! - Save and restore errno - Handlers can interfere with error handling
- Block signals when accessing shared data
- Use
volatile sig_atomic_tfor flag variables
Safe functions include: write(), _exit(), signal(), kill(), and a few others. Check the signal-safety(7) man page.
In Minishell, the expected behavior is:
- Ctrl-C (SIGINT): Display a new prompt line (interrupt current command)
- Ctrl-D: Exit the shell (EOF)
- Ctrl-\ (SIGQUIT): Do nothing (ignore)
In the parent shell process:
void setup_signals(void)
{
struct sigaction sa;
// Handle SIGINT
sa.sa_handler = handle_sigint;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_RESTART;
sigaction(SIGINT, &sa, NULL);
// Ignore SIGQUIT
sa.sa_handler = SIG_IGN;
sigaction(SIGQUIT, &sa, NULL);
}
void handle_sigint(int sig)
{
(void)sig;
write(STDOUT_FILENO, "\n", 1);
rl_on_new_line(); // Readline function
rl_replace_line("", 0); // Clear current line
rl_redisplay(); // Redisplay prompt
}In child processes executing commands, restore default signal handling:
signal(SIGINT, SIG_DFL);
signal(SIGQUIT, SIG_DFL);Sometimes you need to block signals temporarily:
sigset_t set;
sigemptyset(&set);
sigaddset(&set, SIGINT);
// Block SIGINT
sigprocmask(SIG_BLOCK, &set, NULL);
// Critical section here
// Unblock SIGINT
sigprocmask(SIG_UNBLOCK, &set, NULL);Here's an example demonstrating signal blocking in action:
And what happens when a blocked signal is unblocked (notice how the pending signal is delivered immediately):
Shells support redirecting input and output:
>: Redirect output (overwrite)>>: Redirect output (append)<: Redirect input<<: Here-document (read until delimiter)
int fd = open("output.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
if (fd == -1)
{
perror("open");
return 1;
}
dup2(fd, STDOUT_FILENO); // Redirect stdout to file
close(fd);
// Now all output goes to output.txt
printf("This goes to the file\n");int fd = open("output.txt", O_WRONLY | O_CREAT | O_APPEND, 0644);
dup2(fd, STDOUT_FILENO);
close(fd);int fd = open("input.txt", O_RDONLY);
if (fd == -1)
{
perror("open");
return 1;
}
dup2(fd, STDIN_FILENO); // Redirect stdin from file
close(fd);
// Now reads come from input.txt
char buffer[100];
read(STDIN_FILENO, buffer, sizeof(buffer));A here-document reads input until a delimiter is reached:
cat << EOF
Line 1
Line 2
EOFImplementation approach:
- Display a prompt for each line
- Read user input
- Stop when delimiter is encountered
- Store all lines in a temporary file or pipe
- Redirect stdin from that source
// Simplified here-doc implementation
int pipefd[2];
pipe(pipefd);
// Read until delimiter
char *line;
while ((line = readline("> ")))
{
if (strcmp(line, delimiter) == 0)
break;
write(pipefd[1], line, strlen(line));
write(pipefd[1], "\n", 1);
free(line);
}
close(pipefd[1]);
dup2(pipefd[0], STDIN_FILENO);
close(pipefd[0]);Shells maintain environment variables and expand them in commands.
The environment is available through:
// Global variable
extern char **environ;
// Or passed to main
int main(int argc, char **argv, char **envp)Each environment entry is a string: "NAME=value"
When you see $USER in a command, the shell should:
- Extract the variable name
- Look it up in the environment
- Replace
$USERwith its value
char *get_env_value(char *name, char **env)
{
int i = 0;
size_t len = strlen(name);
while (env[i])
{
if (strncmp(env[i], name, len) == 0 && env[i][len] == '=')
return &env[i][len + 1];
i++;
}
return NULL;
}$?: Exit status of last command$$: Current shell's PID
These require special handling during parsing.
Some commands must be executed by the shell itself (not in a child process) because they affect the shell's state.
int builtin_cd(char **args)
{
char *path = args[1];
if (!path)
path = getenv("HOME");
if (chdir(path) != 0)
{
perror("cd");
return 1;
}
return 0;
}int builtin_export(char **args, char ***env)
{
if (!args[1])
{
// Print all environment variables
print_env(*env);
return 0;
}
// Add/update variable
char *name = args[1];
// Parse NAME=value format
// Update environment
return 0;
}int builtin_exit(char **args)
{
int exit_code = 0;
if (args[1])
exit_code = atoi(args[1]);
exit(exit_code);
}Other builtins: pwd, env, unset, echo -n
Building a shell requires parsing user input into a command structure. This typically involves:
Break input into tokens:
Input: ls -la | grep txt > output.txt
Tokens: [ls] [-la] [|] [grep] [txt] [>] [output.txt]
Build a command structure:
typedef struct s_redir
{
int type; // <, >, <<, >>
char *file;
struct s_redir *next;
} t_redir;
typedef struct s_cmd
{
char **args; // Command and arguments
t_redir *redirs; // List of redirections
struct s_cmd *next; // Next command in pipeline
} t_cmd;Parse the tokens into this structure, handling:
- Quotes (single and double)
- Variable expansion
- Whitespace
- Special characters
Putting it all together:
while (1)
{
// 1. Display prompt
char *line = readline("minishell$ ");
if (!line)
break; // Ctrl-D
// 2. Add to history
add_history(line);
// 3. Parse input
t_cmd *cmd = parse_line(line);
// 4. Execute command
if (is_builtin(cmd))
execute_builtin(cmd);
else
execute_pipeline(cmd);
// 5. Cleanup
free_cmd(cmd);
free(line);
}The execute_pipeline() function:
void execute_pipeline(t_cmd *cmds)
{
int num_cmds = count_commands(cmds);
int pipes[num_cmds - 1][2];
// Create all pipes
for (int i = 0; i < num_cmds - 1; i++)
pipe(pipes[i]);
// Fork and execute each command
t_cmd *current = cmds;
for (int i = 0; i < num_cmds; i++)
{
if (fork() == 0)
{
// Setup redirections for this command
if (i > 0) // Not first command
dup2(pipes[i-1][0], STDIN_FILENO);
if (i < num_cmds - 1) // Not last command
dup2(pipes[i][1], STDOUT_FILENO);
// Close all pipe fds
close_all_pipes(pipes, num_cmds - 1);
// Apply redirections from command
apply_redirections(current->redirs);
// Execute
execve(current->args[0], current->args, environ);
exit(1);
}
current = current->next;
}
// Parent closes all pipes and waits
close_all_pipes(pipes, num_cmds - 1);
for (int i = 0; i < num_cmds; i++)
wait(NULL);
}With all the forking and string manipulation, leaks are easy:
- Use Valgrind religiously
- Free everything in both parent and child paths
- Be especially careful with readline's returned strings
Unclosed file descriptors accumulate and cause mysterious bugs:
- Track all opens/pipes with a list
- Close in both parent and child
- Use
lsof -p <pid>to debug
Children not properly waited for become zombies:
- Always
wait()orwaitpid()for children - Use
WNOHANGif you need non-blocking checks
With multiple processes and signals:
- Block signals during critical sections
- Use proper signal-safe functions
- Be careful with shared resources
Quotes are surprisingly complex:
echo "Hello $USER" # Expands variables
echo 'Hello $USER' # Literal string
echo "He said 'hi'" # Nested quotesImplement a state machine to track quote context.
Several community testers exist:
- Start simple: Get basic command execution working first
- Test incrementally: Add one feature at a time
- Compare with bash: Run the same command in bash and your shell
- Use strace: See all system calls:
strace -f ./minishell - Check with valgrind:
valgrind --leak-check=full --track-fds=yes ./minishell
- Empty input
- Commands with only whitespace
- Unclosed quotes
- Invalid redirections
- Permission errors
- Non-existent commands
- Signal delivery during system calls
Building Minishell taught me:
- Systems programming is hard but rewarding - You gain deep appreciation for shells
- Error handling is crucial - Every system call can fail
- Resource management matters - File descriptors and memory are precious
- RTFM - Man pages become your best friend
- Testing is essential - Edge cases will break your shell
The project forces you to understand:
- How processes actually work
- The beauty and complexity of Unix pipes
- Why signal handling is so tricky
- What shells do when you type commands
Minishell is more than just a project—it's a deep dive into Unix fundamentals. By building a shell from scratch, you gain intimate knowledge of:
- Process creation and management
- Inter-process communication
- Signal handling
- File descriptors and I/O redirection
- Command parsing and execution
Every time you open a terminal now, you'll understand what's happening under the hood. You'll appreciate the elegance of Unix's pipe philosophy and the complexity involved in making it all work seamlessly.
The skills learned here transfer directly to systems programming, understanding how tools like Docker and systemd work, and building more complex concurrent applications.
If you're working on Minishell or a similar project, embrace the challenge. Debug patiently, test thoroughly, and don't be afraid to dive deep into man pages. The frustration is temporary, but the knowledge is permanent.
Happy shell building! 🐚
- GNU Bash Manual
- Man pages:
man bash,man fork,man pipe,man signal - The Linux Programming Interface by Michael Kerrisk
- Advanced Programming in the UNIX Environment by Stevens & Rago
- Minishell Tutorial Series
Thanks goes to these wonderful people (emoji key):
John Decorte 💻 |
XU WU Lei Jie 💻 |
LucasYaiche 💻 |
||||
|
|
||||||
This project follows the all-contributors specification. Contributions of any kind welcome!
















