This document will walk you through the sheet_watch codebase to help you understand Rust concepts through real, working code. We'll start with basic concepts and build up to understanding the entire application.
- Rust Basics Overview
- Project Structure
- Dependencies and Cargo.toml
- Module System
- Walking Through Each File
- Key Rust Concepts in Action
- How Everything Fits Together
Before diving into the code, let's understand some key Rust concepts you'll see throughout:
// Ownership: Each value has one owner
let data = String::from("hello"); // data owns the string
// Borrowing: References let you use without taking ownership
fn process_data(text: &str) { // &str is a borrowed string slice
println!("{}", text);
}
process_data(&data); // &data borrows the string

// Result<T, E> represents success (Ok) or failure (Err)
fn might_fail() -> Result<String, &'static str> {
Ok("success".to_string()) // Success case
// Err("something went wrong") // Error case
}
// The ? operator propagates errors up the call stack
// (it is only valid inside a function that itself returns Result or Option)
let result = might_fail()?; // If Err, return early; if Ok, unwrap value

match some_result {
Ok(value) => println!("Got: {}", value),
Err(error) => println!("Error: {}", error),
}

// Calling an async fn returns a value that implements Future<Output = T>
async fn fetch_data() -> Result<String, Error> {
// await waits for the async operation to complete
let response = http_client.get("url").await?;
Ok(response.text().await?)
}

Our project follows Rust's standard layout:
sheet_watch/
├── Cargo.toml # Dependencies and project metadata
├── Cargo.lock # Exact dependency versions (auto-generated)
├── src/ # Source code
│ ├── main.rs # Entry point (contains main() function)
│ ├── lib.rs # Library root (absent here: sheet_watch is a binary-only crate)
│ ├── args.rs # CLI argument parsing
│ ├── auth.rs # Google authentication
│ ├── cfg.rs # Configuration management
│ ├── csv_sink.rs # CSV writing functionality
│ ├── job.rs # Main business logic
│ ├── sheets.rs # Google Sheets API
│ ├── state.rs # Persistent state management
│ └── transform.rs # Data transformation
├── config/ # Configuration files
└── target/ # Build output (generated by cargo)
[package]
name = "sheet_watch" # Our binary name
version = "0.1.0" # Semantic versioning
edition = "2021" # Rust edition (language version)
[[bin]] # Defines a binary target
name = "sheet_watch" # Binary name
path = "src/main.rs" # Entry point file
[dependencies]
clap = { version = "4", features = ["derive"] } # Command-line argument parsing
tokio = { version = "1", features = ["rt-multi-thread", "macros", "fs"] } # Async runtime
google-sheets4 = "5.0" # Google Sheets API client
# ... more dependencies

- Crates: Rust's compilation units (like libraries/packages in other languages)
- Features: Optional functionality you can enable or disable per dependency
- Versions: Semantic versioning (major.minor.patch); a requirement such as "4" accepts any compatible 4.x.y release
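To make the version bullet a little more concrete, here is a rough sketch of the caret-style compatibility rule that Cargo applies by default to a plain version requirement. It only handles fully specified major.minor.patch triples; the names parse_version and caret_compatible are invented for this example, and Cargo's real resolver also deals with partial versions, pre-release tags, and other operators that are ignored here.

```rust
/// A (major, minor, patch) triple.
type Version = (u64, u64, u64);

/// Parses a fully specified "major.minor.patch" string.
/// Anything else (too few parts, too many parts, non-numeric parts) yields None.
fn parse_version(text: &str) -> Option<Version> {
    let mut parts = text.trim().split('.');
    let major: u64 = parts.next()?.parse().ok()?;
    let minor: u64 = parts.next()?.parse().ok()?;
    let patch: u64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some((major, minor, patch))
}

/// Simplified caret rule: the candidate must be at least the requirement,
/// and the leftmost non-zero component of the requirement must not change.
fn caret_compatible(requirement: Version, candidate: Version) -> bool {
    if candidate < requirement {
        return false; // tuples compare lexicographically: major, then minor, then patch
    }
    match requirement {
        (0, 0, _) => candidate == requirement,
        (0, minor, _) => candidate.0 == 0 && candidate.1 == minor,
        (major, _, _) => candidate.0 == major,
    }
}
```

Under this rule a requirement of 1.0.0 accepts 1.35.1 but rejects 2.0.0, while a requirement of 0.2.3 accepts 0.2.9 but rejects 0.3.0, because before 1.0 the minor number is treated as the breaking one.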
Rust organizes code into modules. Here's how our project is structured:
// In main.rs, we declare modules
mod args; // Tells Rust to look for src/args.rs
mod auth; // Tells Rust to look for src/auth.rs
mod cfg; // etc...
// We can then use items from those modules
use args::Args;
use cfg::Cfg;

- pub makes items public (usable from other modules)
- Without pub, items are private to their module
- pub(crate) makes items public within the current crate only
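You can try these visibility rules in a single file by using inline modules instead of separate files. This is only a sketch: the module and function names (config, defaults, describe, normalize, and so on) are made up for illustration and are not part of sheet_watch.

```rust
mod config {
    /// Public: callable from any module that can see `config`.
    pub fn describe() -> String {
        format!("{} ({})", defaults::default_sheet_id(), normalize("  Range A  "))
    }

    /// Private: visible only inside `config` and its child modules.
    fn normalize(text: &str) -> String {
        text.trim().to_lowercase()
    }

    pub mod defaults {
        /// pub(crate): visible anywhere in this crate, but not to other crates.
        pub(crate) fn default_sheet_id() -> &'static str {
            "YOUR_SHEET_ID"
        }

        /// A child module can reach private items of its parent through `super`.
        pub fn normalized_default() -> String {
            super::normalize(default_sheet_id())
        }
    }
}

/// Code outside `config` can only name the items that were made visible to it.
fn summary() -> String {
    // config::normalize("x"); // would not compile: `normalize` is private to `config`
    format!("{} / {}", config::describe(), config::defaults::normalized_default())
}
```

Here summary() returns "YOUR_SHEET_ID (range a) / your_sheet_id". Uncommenting the call to config::normalize produces a privacy error at compile time, which is a quick way to see the rules being enforced.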
// src/main.rs
// Import external crates
use anyhow::Result; // Better error handling
use tracing::{info, debug}; // Structured logging
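use clap::Parser; // Trait that provides Args::parse()
// tokio (the async runtime) is referenced by path in #[tokio::main], so it needs no use statement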
// Import our modules
mod args; // Declare modules
mod auth;
mod cfg;
// ... etc
use args::Args; // Use specific items
use cfg::Cfg;
use job::run_job;
#[tokio::main] // Macro that sets up async runtime
async fn main() -> Result<()> { // main returns Result for error handling
// Parse command line arguments
let args = Args::parse(); // clap derives this for us
// Set up logging based on user's choice
setup_logging(&args.log_level)?;
// Load configuration from file + CLI overrides
let cfg = Cfg::load(args)?;
// Initialize Google Sheets authentication
let hub = auth::initialize().await?;
// Run the main job
if cfg.once {
run_job(cfg, hub).await?; // Run once and exit
} else {
// Schedule mode (not implemented yet)
todo!("Implement scheduling");
}
Ok(()) // Return success
}

Key Concepts:
- #[tokio::main]: Macro that transforms async fn main() into a regular main that sets up the async runtime
- Result<()>: () is the unit type (like void in other languages)
- ? operator: If the expression evaluates to Err, return that error from the current function; if Ok, unwrap the value
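To see the ? operator at work without pulling in tokio or anyhow, here is a small standard-library sketch. The helpers parse_port, load_port, and check are hypothetical, and Box<dyn Error> stands in for the error type that anyhow::Result provides in the real main.rs.

```rust
use std::error::Error;
use std::num::ParseIntError;

/// Parses a port number; the error type is the standard library's ParseIntError.
fn parse_port(text: &str) -> Result<u16, ParseIntError> {
    text.trim().parse::<u16>()
}

/// `?` unwraps the Ok value, or returns the Err early after converting it
/// into Box<dyn Error> through the From trait.
fn load_port(raw: &str) -> Result<u16, Box<dyn Error>> {
    let port = parse_port(raw)?;
    if port == 0 {
        return Err("port must be non-zero".into());
    }
    Ok(port)
}

/// Returning Ok(()) on success mirrors the Result<()> signature of main.
fn check(raw: &str) -> Result<(), Box<dyn Error>> {
    let port = load_port(raw)?;
    println!("listening on port {port}");
    Ok(())
}
```

Calling load_port(" 8080 ") yields Ok(8080), while load_port("abc") returns the parse error untouched: the function never reaches the zero check because ? has already returned.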
// src/args.rs
use clap::Parser; // Derive macro for CLI parsing
#[derive(Parser, Debug)] // Auto-generate CLI parser
#[command(author, version, about)] // Metadata from Cargo.toml
pub struct Args { // Public struct other modules can use
#[arg(long)] // --sheet-id flag
pub sheet_id: Option<String>, // Optional value
#[arg(long)]
pub raw_range: Option<String>,
#[arg(long)]
pub csv_path: Option<String>,
#[arg(long)] // Boolean flag
pub once: bool, // Defaults to false
#[arg(long, default_value = "info")]
pub log_level: String, // Has default value
#[arg(long, default_value = "config/config.toml")]
pub config: String,
}

Key Concepts:
- #[derive(...)]: Automatically implements traits (like interfaces)
- Option<T>: Represents a value that might be present (Some(T)) or absent (None)
- pub: Makes the struct and its fields public
- Attributes (#[...]): Metadata that affects compilation
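The effect of #[derive(...)] and Option<T> is easiest to see on a tiny struct. The sketch below uses only standard-library derives (clap's Parser derive is left out so that it runs without dependencies); CliArgs and describe are illustrative names, not the real Args type.

```rust
/// Each derive generates a trait implementation for us:
/// Debug -> {:?} formatting, Clone -> .clone(), PartialEq -> ==, Default -> ::default().
#[derive(Debug, Clone, PartialEq, Default)]
struct CliArgs {
    sheet_id: Option<String>, // None when the flag was not given
    once: bool,               // Default::default() for bool is false
}

/// `match` forces us to handle both the Some and the None case.
fn describe(args: &CliArgs) -> String {
    match &args.sheet_id {
        Some(id) => format!("sheet {id} (once: {})", args.once),
        None => format!("no sheet given (once: {})", args.once),
    }
}
```

CliArgs::default() has sheet_id set to None and once set to false, and formatting it with {:?} prints CliArgs { sheet_id: None, once: false }, all without writing any of those implementations by hand.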
// src/cfg.rs
use anyhow::Result;
use serde::{Deserialize, Serialize}; // Serialization traits (format-agnostic: JSON, TOML, ...)
use config::{Config, File}; // Configuration file loading
use crate::args::Args; // CLI arguments consumed by Cfg::load
#[derive(Debug, Deserialize, Serialize, Clone)]
pub struct Cfg {
pub sheet_id: String,
pub block_range_template: String,
// Optional fields
pub specific_blocks: Option<Vec<u32>>,
// ... more fields
}
impl Cfg { // Implementation block - methods for Cfg
pub fn load(args: Args) -> Result<Self> { // Associated function (like static method)
let mut cfg = Cfg::default(); // Start with defaults
// Try to load from file
if std::path::Path::new(&args.config).exists() {
let config = Config::builder()
.add_source(File::with_name(&args.config))
.build()?; // ? propagates any errors
// Override defaults with file values
if let Ok(sheet_id) = config.get_string("sheet_id") {
cfg.sheet_id = sheet_id;
}
}
// Override with command line arguments
if let Some(sheet_id) = args.sheet_id {
cfg.sheet_id = sheet_id;
}
Ok(cfg) // Return success
}
pub fn validate(&self) -> Result<()> { // Method (takes &self reference)
if self.sheet_id.is_empty() {
anyhow::bail!("sheet_id cannot be empty"); // Early return with error
}
Ok(())
}
}
impl Default for Cfg { // Trait implementation
fn default() -> Self {
Self { // Self refers to the current type (Cfg)
sheet_id: "YOUR_SHEET_ID".to_string(),
block_range_template: "Block {}!A1:BZ".to_string(),
// ... other defaults
}
}
}

Key Concepts:
- impl Type: Implementation block for adding methods to a type
- &self: Immutable reference to the instance (like this but explicit)
- Self: Type alias for the current type
- if let: Pattern matching with conditional binding
- anyhow::bail!: Macro for early return with error
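The "defaults, then file, then CLI" layering in Cfg::load can be reduced to a standard-library sketch. Settings, Overrides, and layered are hypothetical names chosen for this example, and a plain String replaces anyhow's error type so that nothing external is needed.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Settings {
    sheet_id: String,
    csv_path: String,
}

impl Default for Settings {
    // Trait implementation: Self is shorthand for Settings.
    fn default() -> Self {
        Self { sheet_id: String::new(), csv_path: "out.csv".to_string() }
    }
}

/// One layer of optional overrides (for example from a file or from the CLI).
#[derive(Debug, Default)]
struct Overrides {
    sheet_id: Option<String>,
    csv_path: Option<String>,
}

impl Settings {
    /// Associated function (no self): later layers win over earlier ones.
    fn layered(file: Overrides, cli: Overrides) -> Self {
        let mut settings = Self::default();
        for layer in [file, cli] {
            // if let only runs the body when the Option holds a value.
            if let Some(id) = layer.sheet_id {
                settings.sheet_id = id;
            }
            if let Some(path) = layer.csv_path {
                settings.csv_path = path;
            }
        }
        settings
    }

    /// Method (takes &self): reads the instance without modifying it.
    fn validate(&self) -> Result<(), String> {
        if self.sheet_id.is_empty() {
            return Err("sheet_id cannot be empty".to_string());
        }
        Ok(())
    }
}
```

With no overrides at all, validate() fails because the default sheet_id is empty. If the file layer sets both fields and the CLI layer sets only sheet_id, the result takes sheet_id from the CLI and csv_path from the file.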
// src/auth.rs
use anyhow::Result;
use google_sheets4::{Sheets, hyper_rustls};
use yup_oauth2::{ServiceAccountAuthenticator, ServiceAccountKey};
pub async fn initialize() -> Result<Sheets<hyper_rustls::HttpsConnector<hyper::client::HttpConnector>>> {
// This return type is complex! Let's break it down:
// Sheets<...> - Google Sheets client
// hyper_rustls::HttpsConnector<...> - HTTPS connection type
// hyper::client::HttpConnector - HTTP connection type
// Try to load service account key
let key = load_service_account_key().await?;
// Create authenticator
let auth = ServiceAccountAuthenticator::builder(key)
.build()
.await?;
// Create HTTP client with HTTPS support
let connector = hyper_rustls::HttpsConnectorBuilder::new()
.with_native_roots() // Use system certificate store
.https_or_http() // Support both HTTP and HTTPS
.enable_http1() // Enable HTTP/1.1
.build();
// Create Sheets client
let hub = Sheets::new(
hyper::Client::builder().build(connector),
auth,
);
Ok(hub)
}
async fn load_service_account_key() -> Result<ServiceAccountKey> {
// Try environment variable first
if let Ok(path) = std::env::var("GOOGLE_APPLICATION_CREDENTIALS") {
return load_key_from_file(&path).await;
}
// Auto-detect JSON files in current directory
for entry in std::fs::read_dir(".")? { // ? propagates IO errors
let entry = entry?;
let path = entry.path();
if path.extension() == Some(std::ffi::OsStr::new("json")) {
if let Ok(key) = load_key_from_file(&path.to_string_lossy()).await {
return Ok(key);
}
}
}
anyhow::bail!("No service account key found")
}

Key Concepts:
- Complex generic types: Sheets<HttpsConnector<HttpConnector>>
- async fn: Functions that can be awaited
- std::env::var(): Access environment variables
- std::fs::read_dir(): Read directory contents
- Method chaining: builder().build()
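The lookup order used by load_service_account_key (environment variable first, then a scan for JSON files) can be practised in isolation with the standard library alone. In this sketch the function names and the sorted, first-match policy are assumptions made for illustration; nothing is parsed or authenticated, the code only locates a candidate path.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns the explicitly configured path if the variable is set.
fn credentials_from_env(var_name: &str) -> Option<PathBuf> {
    env::var(var_name).ok().map(PathBuf::from)
}

/// Lists the *.json files directly inside `dir`, sorted so the order is deterministic.
fn find_json_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path(); // each entry can fail on its own, hence the second ?
        let is_json = path.extension().and_then(|ext| ext.to_str()) == Some("json");
        if path.is_file() && is_json {
            found.push(path);
        }
    }
    found.sort();
    Ok(found)
}

/// The environment variable wins; otherwise fall back to the first JSON file found.
fn locate_credentials(var_name: &str, dir: &Path) -> io::Result<Option<PathBuf>> {
    if let Some(path) = credentials_from_env(var_name) {
        return Ok(Some(path));
    }
    Ok(find_json_files(dir)?.into_iter().next())
}
```

Returning io::Result<Option<PathBuf>> keeps two situations apart: Err means the directory could not be read, while Ok(None) means it was read but contained no candidate.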
// src/sheets.rs
use anyhow::Result;
use google_sheets4::{Sheets, hyper_rustls, hyper, api::ValueRange};
use regex::Regex;
// Custom types for our domain
#[derive(Debug, Clone)]
pub struct BlockInfo {
pub name: String,
pub block_number: u32,
}
// Auto-discover all block tabs
pub async fn discover_block_tabs(
hub: &Sheets<hyper_rustls::HttpsConnector<hyper::client::HttpConnector>>,
sheet_id: &str,
) -> Result<Vec<BlockInfo>> {
// Get spreadsheet metadata
let (_, spreadsheet) = hub
.spreadsheets()
.get(sheet_id)
.doit() // Execute the request
.await?; // Await the async result
let mut blocks = Vec::new(); // Mutable vector
// Regex to match "Block X" patterns
let block_regex = Regex::new(r"(?i)^block\s+(\d+)$")?;
if let Some(sheets) = spreadsheet.sheets {
for sheet in sheets { // Iterate over sheets
if let Some(properties) = sheet.properties {
if let Some(title) = properties.title {
// Pattern matching with regex
if let Some(captures) = block_regex.captures(&title) {
if let Some(number_match) = captures.get(1) {
if let Ok(block_number) = number_match.as_str().parse::<u32>() {
blocks.push(BlockInfo {
name: title.clone(), // Clone the string
block_number,
});
}
}
}
}
}
}
}
// Sort by block number
blocks.sort_by_key(|b| b.block_number); // Closure (anonymous function)
Ok(blocks)
}
// Detect optimal column range for a block
pub async fn detect_block_extent(
hub: &Sheets<hyper_rustls::HttpsConnector<hyper::client::HttpConnector>>,
sheet_id: &str,
block_name: &str,
) -> Result<String> {
// Sample the first few rows
let sample_range = format!("{}!A1:ZZ10", block_name); // String interpolation
let (_, value_range) = hub
.spreadsheets()
.values_get(sheet_id, &sample_range)
.doit()
.await?;
let sample_rows = extract_rows_from_response(value_range)?;
let max_column = find_rightmost_week_column(&sample_rows)?;
// Convert column number to letter
let end_column = column_number_to_letter(max_column + 5);
let optimized_range = format!("{}!A1:{}", block_name, end_column);
Ok(optimized_range)
}

Key Concepts:
- &str vs String: String slices vs owned strings
- Vec<T>: Dynamic arrays
- Closures: |b| b.block_number is an anonymous function
- format!(): String interpolation macro
- Tuple destructuring: let (_, value_range) = ...
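detect_block_extent calls a column_number_to_letter helper that the walkthrough does not show. One possible implementation is sketched below, assuming 1-based column numbers (1 is column A); the inverse function is added purely so the conversion can be checked in both directions and is not claimed to exist in sheets.rs.

```rust
/// Converts a 1-based column number to A1-style letters (1 -> "A", 27 -> "AA").
/// Returns an empty string for 0, which is not a valid column.
fn column_number_to_letter(mut column: u32) -> String {
    let mut letters = Vec::new();
    while column > 0 {
        // Shift to 0-based before taking the remainder: there is no "zero" letter.
        let remainder = (column - 1) % 26;
        letters.push(char::from(b'A' + remainder as u8));
        column = (column - 1) / 26;
    }
    // Letters were produced least-significant first, so reverse them.
    letters.iter().rev().collect()
}

/// Inverse conversion; None for empty input, non-uppercase input, or overflow.
fn column_letter_to_number(letters: &str) -> Option<u32> {
    if letters.is_empty() {
        return None;
    }
    let mut column: u32 = 0;
    for ch in letters.chars() {
        if !ch.is_ascii_uppercase() {
            return None;
        }
        let digit = ch as u32 - 'A' as u32 + 1;
        column = column.checked_mul(26)?.checked_add(digit)?;
    }
    Some(column)
}
```

With this numbering, column 78 maps to "BZ" (the end column in the default block_range_template) and column 702 maps to "ZZ" (the right edge of the sampling range).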
// src/transform.rs
use anyhow::Result; // Result<T> alias used by the functions below
use chrono::{DateTime, Utc, NaiveDate, Duration, Datelike};
use serde::{Deserialize, Serialize};
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct WorkoutRecord {
pub id: String,
pub block_name: String,
pub week_start_date: String,
pub week_number: u32,
pub day_number: u32,
pub workout_date: String,
pub exercise_name: String,
pub record_type: String,
pub sets: Option<u32>, // Option because it might be missing
pub reps: Option<String>,
pub load: Option<f64>,
pub rpe: Option<f64>,
pub notes: Option<String>,
pub processed_at: DateTime<Utc>, // Timestamp
}
// Main function to normalize block data
pub fn normalize_block_data(raw_rows: Vec<Vec<String>>, block_name: &str) -> Result<Vec<WorkoutRecord>> {
let mut records = Vec::new();
// Find week structure
let weeks = parse_week_structure(&raw_rows)?;
if weeks.is_empty() {
return Ok(records);
}
// Find day markers and exercise rows
let day_markers = find_day_markers(&raw_rows);
let exercise_rows = find_exercise_rows(&raw_rows);
// Process each week
for week in &weeks {
for day_marker in &day_markers {
// Process exercises for this day
for exercise_row_idx in &exercise_rows {
if let Some(exercise_data) = extract_exercise_data(&raw_rows, *exercise_row_idx, week, day_marker) {
// Create prescribed record
if let Some(prescribed) = create_prescribed_record(&exercise_data, block_name, week, day_marker) {
records.push(prescribed);
}
// Create actual record
if let Some(actual) = create_actual_record(&exercise_data, block_name, week, day_marker) {
records.push(actual);
}
}
}
}
}
Ok(records)
}
// Helper function to parse dates
fn parse_date(date_str: &str) -> Result<NaiveDate> {
// Try different date formats
let formats = ["%-m/%-d/%Y", "%m/%d/%Y", "%Y-%m-%d"];
for format in &formats {
if let Ok(date) = NaiveDate::parse_from_str(date_str, format) {
return Ok(date);
}
}
anyhow::bail!("Could not parse date: {}", date_str)
}
// Calculate workout date from week start + day offset
fn calculate_workout_date(week_start: &str, day_number: u32) -> Result<String> {
let start_date = parse_date(week_start)?;
// Subtract in i64 so that day_number == 0 cannot underflow the unsigned value
let workout_date = start_date + Duration::days(i64::from(day_number) - 1);
Ok(format!("{}/{}/{}", workout_date.month(), workout_date.day(), workout_date.year()))
}

Key Concepts:
- Structs with different field types
- Option<T>: Handling missing data
- DateTime<Utc>: Timezone-aware timestamps
- Error propagation with ?
- Array iteration with references: &weeks
- Pattern matching with if let
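Fields such as load and rpe are Option<f64> because a spreadsheet cell may be empty, missing, or not numeric. The sketch below shows one way such cells could be turned into Option values; parse_optional_f64, cell_at, and number_from_row are hypothetical helpers, not functions taken from transform.rs.

```rust
/// Parses an optional numeric cell: blank, non-numeric, or non-finite text becomes None.
fn parse_optional_f64(cell: &str) -> Option<f64> {
    cell.trim()
        .parse::<f64>()
        .ok() // Result -> Option, discarding the parse error
        .filter(|value| value.is_finite()) // "NaN" and "inf" parse successfully, so reject them here
}

/// Bounds-checked cell access: a column missing from a short row becomes None instead of a panic.
fn cell_at(row: &[String], index: usize) -> Option<&str> {
    row.get(index).map(String::as_str)
}

/// Chains the two steps; and_then skips the parse when the cell does not exist.
fn number_from_row(row: &[String], index: usize) -> Option<f64> {
    cell_at(row, index).and_then(parse_optional_f64)
}
```

For a row containing "Squat", "102.5", and a blank cell, number_from_row returns Some(102.5) for index 1 and None for indexes 0, 2, and any index past the end of the row.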
// src/state.rs
use anyhow::Result; // Result<T> alias used by load_state and save_state
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::fs;
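#[derive(Debug, Default, Serialize, Deserialize, Clone)] // Default is needed by State::new() (assumed derive; the timestamp then defaults to the Unix epoch)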
pub struct State {
pub last_processed_row: usize,
pub last_updated: chrono::DateTime<chrono::Utc>,
pub total_processed: usize,
pub block_states: HashMap<String, BlockState>, // Per-block state
}
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct BlockState {
pub last_processed_row: usize,
pub total_processed: usize,
pub last_updated: chrono::DateTime<chrono::Utc>,
}
impl State {
pub fn new() -> Self {
Self::default()
}
pub fn get_block_state(&self, block_range: &str) -> BlockState {
// Get block state or return default
self.block_states
.get(block_range) // Returns Option<&BlockState>
.cloned() // Clone the value (Option<BlockState>)
.unwrap_or_else(|| BlockState { // Provide default if None
last_processed_row: 0,
total_processed: 0,
last_updated: chrono::Utc::now(),
})
}
pub fn update_block_state(&mut self, block_range: &str, new_rows: usize) {
let block_state = self.block_states
.entry(block_range.to_string()) // Get or create entry
.or_insert_with(|| BlockState { // Insert default if missing
last_processed_row: 0,
total_processed: 0,
last_updated: chrono::Utc::now(),
});
block_state.last_processed_row += new_rows;
block_state.total_processed += new_rows;
block_state.last_updated = chrono::Utc::now();
self.total_processed += new_rows;
self.last_updated = chrono::Utc::now();
}
}
pub fn load_state(path: &str) -> Result<State> {
if std::path::Path::new(path).exists() {
let content = fs::read_to_string(path)?; // Read file to string
let state = serde_json::from_str(&content)?; // Parse JSON
Ok(state)
} else {
Ok(State::new()) // Default state
}
}
pub fn save_state(path: &str, state: &State) -> Result<()> {
let content = serde_json::to_string_pretty(state)?; // Serialize to JSON
fs::write(path, content)?; // Write to file
Ok(())
}

Key Concepts:
- HashMap<K, V>: Key-value storage
- &mut self: Mutable reference to self
- unwrap_or_else(): Provide default value if None
- entry() API: Ergonomic way to work with HashMap entries
- File I/O with error handling
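The difference between a read-only lookup and the entry() API is worth trying on its own. This is a trimmed-down sketch of the per-block bookkeeping with the timestamps left out; Progress and Tracker are stand-in names for BlockState and State.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Default, PartialEq)]
struct Progress {
    last_processed_row: usize,
    total_processed: usize,
}

#[derive(Debug, Default)]
struct Tracker {
    blocks: HashMap<String, Progress>,
}

impl Tracker {
    /// Read-only lookup (&self): clone the stored value, or fall back to a default.
    /// Nothing is inserted into the map.
    fn progress_for(&self, block: &str) -> Progress {
        self.blocks.get(block).cloned().unwrap_or_default()
    }

    /// Mutating update (&mut self): entry() finds or creates the slot with a single lookup.
    fn record(&mut self, block: &str, new_rows: usize) {
        let progress = self.blocks.entry(block.to_string()).or_default();
        progress.last_processed_row += new_rows;
        progress.total_processed += new_rows;
    }
}
```

Calling progress_for on an unknown block returns a zeroed Progress and leaves the map empty, whereas record creates the entry on first use and accumulates into it on later calls.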
// src/job.rs
use anyhow::Result;
use tracing::{info, warn};
pub async fn run_job(
cfg: Cfg,
hub: Sheets<hyper_rustls::HttpsConnector<hyper::client::HttpConnector>>,
) -> Result<()> {
info!("Starting job execution");
cfg.validate()?;
let mut state = load_state(&cfg.state_path)?;
// Get ranges to process
let ranges = if let Some(legacy_ranges) = cfg.get_legacy_block_ranges() {
legacy_ranges
} else {
// Auto-discover blocks
let discovered_blocks = discover_block_tabs(&hub, &cfg.sheet_id).await?;
if discovered_blocks.is_empty() {
anyhow::bail!("No block tabs found");
}
// Detect optimal ranges for each block
let mut optimized_ranges = Vec::new();
for block in discovered_blocks.iter() {
match detect_block_extent(&hub, &cfg.sheet_id, &block.name).await {
Ok(optimized_range) => {
optimized_ranges.push(optimized_range);
}
Err(e) => {
warn!("Failed to detect extent for {}: {}", block.name, e);
let fallback = cfg.block_range_template
.replace("{}", &block.block_number.to_string());
optimized_ranges.push(fallback);
}
}
}
optimized_ranges
};
let mut all_normalized_rows = Vec::new();
// Process each range
for (range_index, range) in ranges.iter().enumerate() { // Iterator with index
info!("Processing range {}/{}: {}", range_index + 1, ranges.len(), range);
let start_row = state.get_next_row_for_block(range);
let raw_rows = fetch_rows(&hub, &cfg.sheet_id, range, start_row).await?;
if raw_rows.is_empty() {
continue;
}
let block_name = range.split('!').next().unwrap_or(range);
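let row_count = raw_rows.len(); // Record the length before raw_rows is moved
let normalized_rows = normalize_block_data(raw_rows, block_name)?; // Pass ownership; no clone needed
all_normalized_rows.extend(normalized_rows); // Move data into vec
state.update_block_state(range, row_count);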
}
// Write to CSV
if !all_normalized_rows.is_empty() {
append(&cfg.output_csv.path, &all_normalized_rows, cfg.output_csv.ensure)?;
}
save_state(&cfg.state_path, &state)?;
info!("Job completed successfully");
Ok(())
}

Key Concepts:
- enumerate(): Iterator that yields (index, item) tuples
- extend(): Move all items from one collection into another
- Control flow with if let, match, and early returns
- Error handling throughout the async call chain
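The "try to detect, fall back to the template on error" loop in run_job is a pattern you will reuse often. Here is a synchronous sketch of it using only the standard library; detect_range is a made-up stand-in for detect_block_extent that fails on blank names, and the tuple slice replaces the BlockInfo structs.

```rust
/// Hypothetical stand-in for detect_block_extent: fails when the name is blank.
fn detect_range(block_name: &str) -> Result<String, String> {
    if block_name.trim().is_empty() {
        Err("empty block name".to_string())
    } else {
        Ok(format!("{block_name}!A1:AZ"))
    }
}

/// For each (number, name) pair, use the detected range or fall back to the template.
fn ranges_with_fallback(blocks: &[(u32, &str)], template: &str) -> Vec<String> {
    let mut ranges = Vec::new();
    // enumerate() pairs every item with its index.
    for (index, (number, name)) in blocks.iter().enumerate() {
        match detect_range(name) {
            Ok(range) => ranges.push(range),
            Err(error) => {
                eprintln!("block {}/{}: {error}; using template", index + 1, blocks.len());
                ranges.push(template.replace("{}", &number.to_string()));
            }
        }
    }
    ranges
}

/// extend() moves every item of each batch into the accumulated vector.
fn collect_all(batches: Vec<Vec<String>>) -> Vec<String> {
    let mut all = Vec::new();
    for batch in batches {
        all.extend(batch);
    }
    all
}
```

Because the failure is handled with match rather than ?, a single block that cannot be analysed is logged and replaced by its template range, and the loop carries on with the remaining blocks.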
// Moving a value: the commented-out println! below would NOT compile if enabled
let data = String::from("hello");
let moved_data = data; // data is moved to moved_data
// println!("{}", data); // ERROR: data no longer owns the string
// Instead, we borrow:
let data = String::from("hello");
let borrowed = &data; // borrow a reference
println!("{}", data); // OK: data still owns the string
println!("{}", borrowed); // OK: borrowed is just a reference

// Each ? can fail and propagate the error upward
pub async fn complex_operation() -> Result<String> {
let config = load_config()?; // Could fail
let client = create_client(&config)?; // Could fail
let data = fetch_data(&client).await?; // Could fail (async)
let processed = process_data(data)?; // Could fail
Ok(processed) // Success
}

// Exhaustive matching ensures all cases are handled.
// Binding the match to a variable makes the value of the Ok arm usable afterwards.
let client = match auth_result {
Ok(client) => {
info!("Authentication successful");
client
}
Err(AuthError::InvalidCredentials) => {
anyhow::bail!("Invalid credentials");
}
Err(AuthError::NetworkError(e)) => {
anyhow::bail!("Network error: {}", e);
}
Err(e) => {
anyhow::bail!("Unknown auth error: {}", e);
}
};

let numbers = vec![1, 2, 3, 4, 5];
let doubled: Vec<i32> = numbers
.iter() // Create iterator
.filter(|&&x| x > 2) // Filter elements
.map(|&x| x * 2) // Transform elements
.collect(); // Collect into Vec
// In our code:
let block_numbers: Vec<u32> = discovered_blocks
.iter()
.map(|block| block.block_number)
.filter(|&num| num <= 25)
.collect();

Here's the complete flow of our application:
1. main.rs
├── Parse CLI arguments (args.rs)
├── Load configuration (cfg.rs)
├── Initialize auth (auth.rs)
└── Run job (job.rs)
2. job.rs
├── Load previous state (state.rs)
├── Discover blocks (sheets.rs)
├── For each block:
│ ├── Detect optimal range (sheets.rs)
│ ├── Fetch raw data (sheets.rs)
│ └── Transform data (transform.rs)
├── Write to CSV (csv_sink.rs)
└── Save new state (state.rs)
Google Sheets
↓ (sheets.rs)
Raw Data: Vec<Vec<String>>
↓ (transform.rs)
WorkoutRecord structs
↓ (csv_sink.rs)
CSV File
// Errors bubble up through the ? operator
main() -> Result<()>
└── run_job() -> Result<()>
└── discover_block_tabs() -> Result<Vec<BlockInfo>>
└── hub.get().await -> Result<Spreadsheet>
└── HTTP request (could fail)
// If any step fails, the error propagates to main()
// which prints it and exits with a non-zero exit code

let client = HttpClient::builder()
.timeout(Duration::from_secs(30))
.user_agent("sheet_watch/1.0")
.build()?;

// File is automatically closed when it goes out of scope
{
let file = File::open("data.txt")?;
// ... use file
} // File automatically closed here

// Chain operations that might fail
let result = config
.get_string("sheet_id") // Result<String, Error>
.ok() // Option<String>
.filter(|s| !s.is_empty()) // Option<String>
.ok_or_else(|| anyhow!("sheet_id is required"))?; // Result<String, Error>, unwrapped to String by ?

To continue learning Rust:
- The Rust Book: https://doc.rust-lang.org/book/ - Official comprehensive guide
- Rust by Example: https://doc.rust-lang.org/rust-by-example/ - Learning through examples
- Rustlings: https://github.com/rust-lang/rustlings - Interactive exercises
- Rust API Documentation: https://doc.rust-lang.org/std/ - Standard library docs
- Add a new CLI flag: Try adding a --dry-run flag that shows what would be processed without actually doing it
- Add validation: Add validation to ensure block numbers are reasonable (e.g., 1-100)
- Add filtering: Allow processing only specific exercise types
- Improve error messages: Add more context to error messages when things fail
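If you want a starting point for the validation exercise, here is one possible shape for it. The 1 to 100 bounds come from the exercise text; the constant and function names, and the choice of String as the error type, are assumptions you are free to change (in the real code you would likely return an anyhow error instead).

```rust
use std::ops::RangeInclusive;

/// Accepted block numbers, taken from the "e.g., 1-100" suggestion in the exercise.
const VALID_BLOCKS: RangeInclusive<u32> = 1..=100;

/// Returns the number unchanged when it is in range, otherwise a descriptive error.
fn validate_block_number(number: u32) -> Result<u32, String> {
    if VALID_BLOCKS.contains(&number) {
        Ok(number)
    } else {
        Err(format!(
            "block number {number} is outside {}..={}",
            VALID_BLOCKS.start(),
            VALID_BLOCKS.end()
        ))
    }
}

/// Validates a whole list; collecting into Result stops at the first failure.
fn validate_all(numbers: &[u32]) -> Result<Vec<u32>, String> {
    numbers.iter().map(|&n| validate_block_number(n)).collect()
}
```

Collecting an iterator of Result values into Result<Vec<u32>, String> is a handy idiom: you get either every validated number or the first error, without writing the loop yourself.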
This codebase is a practical example of real-world Rust: it combines async programming, error handling, external APIs, file I/O, and clear code organization. Keep exploring and experimenting!