Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
93 changes: 93 additions & 0 deletions SOLUTIONS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,93 @@
## Solution notes


### Task 01 – Run‑Length Encoder
[x] Done
- Language: Go
- Approach: I used a single-pass, rune-based iteration to count consecutive characters. The string is converted to a []rune to support UTF-8 characters. A strings.Builder accumulates the result by appending each character followed by its count.

- Why: UTF-8 safe: Using []rune ensures multibyte characters (e.g. emojis or Thai letters)
are handled properly.
Efficient: strings.Builder is used to avoid repeated string concatenation (which would
be inefficient).
Linear time complexity: The algorithm scans the string once (O(n)).

- Time spent: ~10 min
- AI tools used: ChatGPT (for validation and write-up support)

### Task 02 – Fix‑the‑Bug
[x] Done
- Language: Go
- Approach: The original code used a global `current` variable without synchronization, which caused data races when accessed from multiple goroutines. I fixed this by introducing a `sync.Mutex` to protect access to the shared variable. The ` NextID()` function now uses `mu.Lock()` and `mu.Unlock()` to ensure only one goroutine can read and update `current` at a time.

- Why: Using `sync.Mutex` guarantees thread safety and prevents race conditions by serializing access to the critical section. While it's not as fast as lock-free approaches like sync/atomic, it's simple, easy to understand, and sufficient for cases where performance is acceptable and clarity is preferred.
- validation: go run -race tasks/02-fix-the-bug/go/buggy_counter.go
- Time spent: ~15 min
- AI tools used: ChatGPT (write-up support)

### Task 03 – Sync-aggregator
[x] Done
- Language: Go
- Approach: I implemented a concurrent file processing system using a fixed-size worker pool (with `sync.WaitGroup`) and Go channels. Each worker processes a file by counting lines and words, while respecting a per‑file timeout using `context.WithTimeout`. File paths are resolved relative to the working directory using `filepath.Abs`. To maintain the correct order of results, each task is indexed and results are collected into a slice in input order.

I also added logic to:
Skip any file that starts with `#sleep=N` where `N >= 5`, returning a `timeout` status.

Ignore metadata lines starting with `#` for line/word counting.

- Why: This approach ensures:

Concurrency control (limits goroutines using a worker pool)

Safe timeout enforcement (to prevent hanging or long-running file reads)

Ordered results (matching the order of paths in `filelist.txt`)

Compatibility with test runner environments (by resolving relative paths dynamically)

Using goroutines and channels allows for high throughput without sacrificing correctness. Applying file-level timeout ensures slow files don’t block the entire operation.

- Time spent: ~70 min
- AI tools used: ChatGPT [test troubleshooting, and edge-case handling, write-up support]


### Task 04 – SQL-resoning
[x] Done
- Language: Go (SQL)
- Approach: For Task A, I computed the total pledged amount per campaign and calculated each campaign's percentage of its funding target using `SUM()` and `GROUP BY`. The result was ordered by `pct_of_target` descending.

For Task B, I calculated the 90th percentile (`P90`) of pledge amounts both globally and for donors from Thailand.

I used window functions (`ROW_NUMBER`, `COUNT`, `OVER`) to rank and compute each pledge's position.

Then applied linear interpolation to calculate the percentile accurately using a subquery join on rank.

Final result was rounded using `ROUND(..., 0)` to ensure integer output as expected in the test.

I added relevant indexes to optimize query performance, especially on `donor.country`, `donor.id`, `pledge.donor_id`, and `pledge.amount_thb`.

- Why: Using SQL window functions and common table expressions (CTEs) makes the logic clear, maintainable, and performant even on large datasets.

Interpolation ensures accurate percentile computation instead of relying on simple LIMIT or approximation.

Indexes improve JOIN and filter performance significantly, especially for `country = 'Thailand' `and pledge amount ranking.


- Time spent: ~30 min
- AI tools used: ChatGPT [index strategy and write-up support]


### Task 02 – Fix‑the‑Bug
[x] Done
- Language: Python
- Time spent: ~50 min
- AI tools used: ChatGPT
### SUMMARY

Completed all 4 tasks in Go (plus SQL for Task 4) with focus on correctness, concurrency safety, and clean logic.
Each task was implemented efficiently and verified with provided tests. Used Go’s standard library features like goroutines, channels, context timeouts, mutex locks, and SQL window functions. Edge cases (e.g. UTF-8 strings, file timeouts, percentile interpolation) were handled carefully to match expected outputs.

Used ChatGPT for validation, write-up clarity, and troubleshooting during complex logic (especially for Task 03).
Total time spent: ~125 min.

Tried Task 02 in Python just for fun, exploring how concurrency safety works differently in another language.
30 changes: 28 additions & 2 deletions tasks/01-run-length/go/rle.go
Original file line number Diff line number Diff line change
@@ -1,9 +1,35 @@
package rle

import (
"strconv"
"strings"
)

// Encode returns the run‑length encoding of UTF‑8 string s.
//
// "AAB" → "A2B1"
func Encode(s string) string {
// TODO: implement
panic("implement me")

if len(s) == 0 {
return ""
}

var builder strings.Builder
count := 1
runes := []rune(s)

for i := 1; i < len(runes); i++ {
if runes[i] == runes[i-1] {
count++
} else {
builder.WriteRune(runes[i-1])
builder.WriteString(strconv.Itoa(count))
count = 1
}
}

builder.WriteRune(runes[len(runes)-1])
builder.WriteString(strconv.Itoa(count))

return builder.String()
}
12 changes: 9 additions & 3 deletions tasks/02-fix-the-bug/go/buggy_counter.go
Original file line number Diff line number Diff line change
@@ -1,12 +1,18 @@
package counter

import "time"
import (
"sync"
)

var current int64
var (
current int64
mu sync.Mutex
)

func NextID() int64 {
mu.Lock()
defer mu.Unlock()
id := current
time.Sleep(0)
current++
return id
}
17 changes: 13 additions & 4 deletions tasks/02-fix-the-bug/python/buggy_counter.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,11 +4,20 @@
import time

_current = 0
_lock = threading.Lock() # Protects access to _current

def next_id():
"""Returns a unique ID, incrementing the global counter."""
"""Returns a unique ID, incrementing the global counter safely."""
global _current
value = _current
time.sleep(0)
_current += 1
with _lock:
print(f"Current ID: {_current}")
value = _current
time.sleep(0) # Optional; simulates work
_current += 1
Copy link

Copilot AI Jun 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The increment of _current occurs outside the lock. Move _current += 1 inside the with _lock: block to ensure the update is thread-safe.

Suggested change
_current += 1
_current += 1 # Increment inside the lock to ensure thread safety

Copilot uses AI. Check for mistakes.
return value

def main():
next_id()

if __name__ == "__main__":
main()
12 changes: 12 additions & 0 deletions tasks/02-fix-the-bug/python/test2_counter.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
# test2_counter.py
import concurrent.futures, buggy_counter as bc

def test_no_duplicates():
with concurrent.futures.ThreadPoolExecutor(max_workers=200) as ex:
ids = list(ex.map(lambda _: bc.next_id(), range(10_000)))
assert len(ids) == len(set(ids)), "Duplicate IDs found!"
print("✅ Test passed! No duplicate IDs.")

if __name__ == "__main__":
test_no_duplicates()

157 changes: 153 additions & 4 deletions tasks/03-sync-aggregator/go/aggregator.go
Original file line number Diff line number Diff line change
@@ -1,7 +1,16 @@
// Package aggregator – stub for Concurrent File Stats Processor.
package aggregator

import "errors"
import (
"bufio"
"context"
"os"
"path/filepath"
"strconv"
"strings"
"sync"
"time"
)

// Result mirrors one JSON object in the final array.
type Result struct {
Expand All @@ -11,10 +20,150 @@ type Result struct {
Status string `json:"status"` // "ok" or "timeout"
}

type Task struct {
Index int
Path string
}

type ResultWithIndex struct {
Index int
Result Result
}

func readLines(filelistPath string) ([]string, error) {
file, err := os.Open(filelistPath)
if err != nil {
return nil, err
}
defer file.Close()

var lines []string
scanner := bufio.NewScanner(file)
for scanner.Scan() {
line := strings.TrimSpace(scanner.Text())
if line != "" {
lines = append(lines, line)
}
}
return lines, scanner.Err()
}

func processFileWithTimeout(displayPath, fullPath string, timeoutSec int) Result {
ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutSec)*time.Second)
defer cancel()

resultChan := make(chan Result, 1)

go func() {
lines, words := 0, 0

file, err := os.Open(fullPath)
if err != nil {
resultChan <- Result{Path: displayPath, Status: "timeout"}
return
}
defer file.Close()

scanner := bufio.NewScanner(file)

firstLine := true

for scanner.Scan() {
select {
case <-ctx.Done():
resultChan <- Result{Path: displayPath, Status: "timeout"}
return
default:
line := scanner.Text()

// Handle #sleep=N on first line
if firstLine {
firstLine = false
if strings.HasPrefix(line, "#sleep=") {
nStr := strings.TrimPrefix(line, "#sleep=")
if n, err := strconv.Atoi(nStr); err == nil && n >= 5 {
resultChan <- Result{Path: displayPath, Status: "timeout"}
return
}
continue // skip first line even if sleep < 5
}
}

// Skip other metadata lines starting with #
if strings.HasPrefix(line, "#") {
continue
}

lines++
words += len(strings.Fields(line))
}
}

resultChan <- Result{
Path: displayPath,
Lines: lines,
Words: words,
Status: "ok",
}
}()

select {
case <-ctx.Done():
return Result{Path: displayPath, Status: "timeout"}
case res := <-resultChan:
return res
}
}

// Aggregate must read filelistPath, spin up *workers* goroutines,
// apply a per‑file timeout, and return results in **input order**.
func Aggregate(filelistPath string, workers, timeout int) ([]Result, error) {
// ── TODO: IMPLEMENT ────────────────────────────────────────────────────────
return nil, errors.New("implement Aggregate()")
// ───────────────────────────────────────────────────────────────────────────
// for debugging
// absBase, _ := filepath.Abs("tasks/03-sync-aggregator/data") // get absolute path to data dir

// for testing
absBase, err := filepath.Abs("../data")
if err != nil {
return nil, err
}
Comment on lines +125 to +128
Copy link

Copilot AI Jun 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Hardcoding "../data" for the base directory is brittle; consider deriving the data directory relative to filelistPath or passing it in as a parameter.

Suggested change
absBase, err := filepath.Abs("../data")
if err != nil {
return nil, err
}
baseDir := filepath.Dir(filelistPath)
absBase := filepath.Join(baseDir, "data")

Copilot uses AI. Check for mistakes.

paths, err := readLines(filelistPath)
if err != nil {
return nil, err
}

taskChan := make(chan Task)
resultChan := make(chan ResultWithIndex, len(paths))
var wg sync.WaitGroup

for i := 0; i < workers; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for task := range taskChan {
fullPath := filepath.Join(absBase, task.Path)
// fmt.Println("DEBUG read path:", fullPath)

res := processFileWithTimeout(task.Path, fullPath, timeout)
resultChan <- ResultWithIndex{Index: task.Index, Result: res}
}
}()
}

go func() {
for i, path := range paths {
taskChan <- Task{Index: i, Path: path}
}
close(taskChan)
}()

wg.Wait()
close(resultChan)

results := make([]Result, len(paths))
for r := range resultChan {
results[r.Index] = r.Result
}

return results, nil
}
Loading
Loading