Add educational materials, documentation, tests, and performance optimizations by Copilot · Pull Request #10 · licm13/ET-partition

Copilot · 2025-12-04T03:42:12Z

The ET-partition codebase lacked systematic educational materials, in-depth technical documentation, complex test scenarios, and performance-optimized implementations.

Educational Materials

notebooks/ET_Partition_Introduction_For_Beginners.ipynb — Bilingual (EN/CN) tutorial with visualizations, method analogies, hands-on examples, and decision tree for method selection

Technical Documentation

docs/ET_Partition_Methods_Deep_Dive.md — Mathematical derivations, code architecture diagrams, performance benchmarks, parameter tuning guides
docs/AI_OPTIMIZATION_PROMPTS.md — Curated prompts for AI coding assistants (optimization, refactoring, testing, documentation)

Test Suite

tests/test_complex_scenarios.py — 23 pytest cases covering:
- Missing data (random/continuous gaps, nighttime, long-term)
- Multi-biome scenarios (tropical, temperate, boreal, grassland)
- Performance benchmarks (time, memory, parallel scaling)
- Edge cases (zero GPP, extreme VPD, negative fluxes)

Performance Optimizations

methods/perez_priego/et_partitioning_functions_numba.py — Numba JIT-compiled implementation (~170x speedup)
methods/uwue/zhou_optimized.py — Vectorized operations with result caching
utils/benchmark.py — Profiling utilities for time/memory comparison

# Numba speedup benchmark
from methods.perez_priego.et_partitioning_functions_numba import benchmark_speedup
results = benchmark_speedup(n_samples=100000)
# {'speedup': 171.7, 'stomatal_conductance_numba': 0.0014s, 'stomatal_conductance_python': 0.24s}
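
As a rough illustration of how a speedup figure like the one above can be measured, here is a minimal standard-library sketch. It is not the PR's `benchmark_speedup`; the two toy functions and `measure_speedup` are hypothetical stand-ins for a reference and an optimized implementation.

```python
import math
import timeit


def slow_sum_sqrt(values):
    """Reference implementation: explicit Python loop."""
    total = 0.0
    for v in values:
        total += math.sqrt(v)
    return total


def fast_sum_sqrt(values):
    """Faster variant: built-in sum over a map, so the loop runs in C."""
    return sum(map(math.sqrt, values))


def measure_speedup(reference, candidate, args, repeat=5, number=10):
    """Return best-of-`repeat` per-call timings and their ratio."""
    t_ref = min(timeit.repeat(lambda: reference(*args), repeat=repeat, number=number)) / number
    t_new = min(timeit.repeat(lambda: candidate(*args), repeat=repeat, number=number)) / number
    return {"reference_s": t_ref, "candidate_s": t_new, "speedup": t_ref / t_new}


if __name__ == "__main__":
    data = [float(i) for i in range(1, 100_001)]
    print(measure_speedup(slow_sum_sqrt, fast_sum_sqrt, (data,)))
```

The rounded timings reported above give 0.24 / 0.0014 ≈ 171, which agrees with the reported 171.7 to within rounding. When timing JIT-compiled code, one warm-up call before measuring keeps compilation time out of the result.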

CI/CD

.github/workflows/tests.yml — Python 3.10/3.11/3.12 testing, code quality checks, integration tests

Updates

README.md — Added documentation links
requirements.txt — Added pytest, tqdm, psutil, emcee

This pull request was created as a result of the following prompt from Copilot chat.

Project Enhancement Requirements

Background

The current ET-partition codebase contains three evapotranspiration (ET) partitioning methods (uWUE, TEA, Perez-Priego) but lacks:

Systematic educational materials for beginners
In-depth technical documentation
Test cases for complex scenarios
Performance-optimized implementations

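Requirement 1: Educational Notebook

Create notebooks/ET_Partition_Introduction_For_Beginners.ipynb, aimed at first-year undergraduates:

Content structure:

Part 1: Basic concepts
- What is evapotranspiration (ET), and why partition it?
- Data source: eddy-covariance flux towers
- Visualization: the water cycle of a forest ecosystem
Part 2: Principles of the three methods (step by step)
- uWUE: quantile regression explained with everyday analogies
- TEA: an introduction to machine learning (random forest visualization)
- Perez-Priego: "smart regulation" by plant stomata
Part 3: Hands-on practice
- Load the test data (data/test_site/FLX_FI-Hyy_*)
- Run the three methods (with detailed comments)
- Visualize and compare the results
Part 4: Codebase organization
- Tour of the file structure
- Explanation of the batch-processor pattern
- How to add a new method
Part 5: Advanced topics
- Decision tree for method selection
- FAQ
- Result validation methods

Requirements:

All code cells can run independently
Includes charts (matplotlib)
Bilingual comments (Chinese and English)
Rich Markdown formatting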

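Requirement 2: In-Depth Technical Documentation

Create docs/ET_Partition_Methods_Deep_Dive.md:

Chapters:

Comparison of theoretical frameworks
- Table of the basic assumptions of the three methods
- Applicability conditions and limitations
- Detailed physical basis
Mathematical derivations
- Derivation of the uWUE quantile regression
- Principles of the TEA quantile random forest
- Perez-Priego: Medlyn model equations
- A Python implementation alongside each equation
Code implementation walkthrough
- Design pattern of the batch-processing architecture
- Data-flow diagram
- Call chains of the key functions
- Configuration parameter reference
Performance and optimization
- Current performance baseline (table)
- Bottleneck analysis (with profiling results)
- Optimization options (Numba, vectorization, caching)
- Memory optimization strategies
Application scenarios and limitations
- Decision tree for method selection
- Table of known issues and solutions
- Parameter tuning guide
- Validation strategies (isotopes, ecological plausibility)

Requirements:

Includes LaTeX math formulas
Code examples run as-is
Performance data based on the FI-Hyy test site
Bilingual (Chinese and English)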
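
For orientation, the core relations that the derivation chapter would need to cover can be written as below. These are the forms commonly cited in the literature (Zhou et al. for uWUE, Medlyn et al. for stomatal conductance); they are not taken from this repository's code, so the symbol names are assumptions.

```latex
% Underlying water use efficiency and the resulting T/ET estimate
\mathrm{uWUE} = \frac{\mathrm{GPP}\,\sqrt{\mathrm{VPD}}}{\mathrm{ET}},
\qquad
\frac{T}{\mathrm{ET}} = \frac{\mathrm{uWUE}_a}{\mathrm{uWUE}_p}

% Quantile regression through the origin for the potential value uWUE_p
% (x_i = ET_i, y_i = GPP_i \sqrt{VPD_i}, typically \tau = 0.95)
\mathrm{uWUE}_p = \arg\min_{\beta} \sum_i \rho_\tau\!\left(y_i - \beta x_i\right),
\qquad
\rho_\tau(u) = u\left(\tau - \mathbf{1}[u < 0]\right)

% Medlyn-type stomatal conductance model
% (A: net assimilation, C_a: ambient CO2, D: vapor pressure deficit)
g_s = g_0 + 1.6\left(1 + \frac{g_1}{\sqrt{D}}\right)\frac{A}{C_a}
```

Here $\mathrm{uWUE}_a$ denotes the apparent value fitted over the period of interest, and $\mathrm{uWUE}_p$ the potential value from the upper-quantile fit.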

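Requirement 3: Complex Test Cases

Create tests/test_complex_scenarios.py:

Test categories:

3.1 Missing-data handling

class TestMissingDataHandling:
    - test_random_gaps: 10% randomly missing values
    - test_continuous_blocks: one continuous week of sensor failure
    - test_nighttime_missing: missing nighttime data
    - test_long_term_gaps: data gaps spanning multiple years

3.2 Multi-biome scenarios

class TestMultiBiomeScenarios:
    - Tropical rainforest (LAI=6, high GPP, warm year-round)
    - Temperate deciduous forest (LAI=4.5, seasonal)
    - Boreal coniferous forest (LAI=3, low temperature, low GPP)
    - Grassland (LAI=2, short growing season)

    For each biome:
    - Generate synthetic data (realistic parameters)
    - Run the three methods
    - Compare against the theoretical T/ET
    - Compute correlation coefficients

3.3 Performance benchmarks

class TestPerformanceBenchmarks:
    - test_uwue_10year: execution time on 10 years of data (<5 minutes)
    - test_tea_memory: memory usage (<2 GB)
    - test_parallel_scaling: multi-core scaling
    - test_large_batch: batch processing of 100 sites

3.4 I/O interface validation

class TestIOInterfaces:
    - test_output_schema: validation of the standard output columns
    - test_netcdf_metadata: completeness of NetCDF metadata
    - test_csv_roundtrip: CSV read/write consistency
    - test_fluxnet_compatibility: compatibility with the FLUXNET2015 format

3.5 Edge-case tests

class TestEdgeCases:
    - test_zero_gpp: all-zero GPP (night/winter)
    - test_extreme_vpd: extreme VPD (>5 kPa)
    - test_negative_fluxes: negative latent heat (dew formation)
    - test_high_altitude: high-altitude sites (>3000 m)

Requirements:

Use the pytest framework
Include fixtures and parametrize
Generate a test report (Markdown format)
All tests must be runnable in CI/CD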
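
The fixture-plus-parametrize requirement could look roughly like the sketch below for the random-gap case. The helper names (`inject_random_gaps`, `nan_mean`, `synthetic_et`) are hypothetical, and `nan_mean` only stands in for a real partitioning method. None of this is the content of the PR's test file.

```python
import math
import random

import pytest


def inject_random_gaps(values, fraction, seed=0):
    """Return a copy of `values` with `round(len * fraction)` entries set to NaN."""
    rng = random.Random(seed)
    n_missing = round(len(values) * fraction)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [math.nan if i in missing_idx else v for i, v in enumerate(values)]


def nan_mean(values):
    """Mean that ignores NaN entries (stand-in for a gap-tolerant method)."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else math.nan


@pytest.fixture
def synthetic_et():
    """Hypothetical half-hourly ET series for one day: half a sine wave."""
    return [math.sin(math.pi * i / 48) for i in range(48)]


@pytest.mark.parametrize("fraction", [0.0, 0.1, 0.5])
def test_random_gaps(synthetic_et, fraction):
    gappy = inject_random_gaps(synthetic_et, fraction)
    n_nan = sum(math.isnan(v) for v in gappy)
    # The requested share of samples is missing...
    assert n_nan == round(len(synthetic_et) * fraction)
    # ...and the gap-tolerant statistic still returns a finite value.
    assert math.isfinite(nan_mean(gappy))
```

The fixture keeps data generation in one place, while `parametrize` turns one test body into one reported case per gap fraction.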

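Requirement 4: Performance Optimization

4.1 Numba-accelerated Perez-Priego implementation

Create methods/perez_priego/et_partitioning_functions_numba.py:

Optimizations:

calculate_stomatal_conductance: JIT compilation + vectorization
calculate_transpiration: parallel loops (prange)
moving_window_optimization: Numba-accelerated sliding window
atmospheric_pressure: LRU cache

Performance targets:

5-10x speedup over the original implementation
Numerical accuracy preserved (error <1%)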
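
Of the four items above, the LRU-cached `atmospheric_pressure` is the simplest to sketch with the standard library alone. The version below assumes the standard-atmosphere barometric formula and a scalar elevation in metres; the repository's actual signature and formula may differ.

```python
from functools import lru_cache

SEA_LEVEL_PRESSURE_KPA = 101.325


@lru_cache(maxsize=128)
def atmospheric_pressure(elevation_m: float) -> float:
    """Standard-atmosphere pressure (kPa) at a given elevation in metres.

    Assumption: barometric formula for the troposphere; the project's
    own implementation is not shown in this document.
    """
    return SEA_LEVEL_PRESSURE_KPA * (1.0 - 2.25577e-5 * elevation_m) ** 5.25588


if __name__ == "__main__":
    print(atmospheric_pressure(181.0))        # computed on the first call
    print(atmospheric_pressure(181.0))        # served from the cache
    print(atmospheric_pressure.cache_info())  # hits=1, misses=1
```

`lru_cache` needs hashable arguments, so this pattern suits per-site scalars such as elevation rather than NumPy arrays.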

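4.2 uWUE optimizations

Add to methods/uwue/zhou.py:

quantreg_cached: cache repeated computations
vectorized_daily_aggregation: vectorized daily aggregation
parallel_site_processing: parallel processing across sites

4.3 TEA optimizations

In methods/tea/batch.py:

Confirm that n_jobs=-1 uses all cores
Add a progress bar (tqdm)
Process long time series in memory-bounded chunks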
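
Chunked processing of a long series can be reduced to iterating over index windows. The sketch below is a minimal illustration: `iter_chunks` is a hypothetical helper, and the optional overlap is an assumption for methods that need context across chunk boundaries.

```python
from typing import Iterator, Tuple


def iter_chunks(n_rows: int, chunk_size: int, overlap: int = 0) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) index pairs that together cover range(n_rows)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    start = 0
    while start < n_rows:
        stop = min(start + chunk_size, n_rows)
        yield start, stop
        if stop == n_rows:
            break  # last window reached the end of the series
        start += step


if __name__ == "__main__":
    # 17520 half-hourly records correspond to one 365-day year.
    for start, stop in iter_chunks(n_rows=52560, chunk_size=17520):
        print(start, stop)
```

Because the chunk boundaries are known up front, wrapping `list(iter_chunks(...))` in `tqdm` would give a progress bar with a known total.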

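4.4 Performance comparison tool

Create utils/benchmark.py:

import time
import tracemalloc

import pandas as pd


def benchmark_all_methods(data_path, years=3):
    """Compare the execution time and peak memory of the three methods."""
    results = {}
    for method in ['uwue', 'tea', 'perez_priego']:
        tracemalloc.start()
        start = time.perf_counter()
        # run_method is the project's method dispatcher (not defined in this snippet)
        result = run_method(method, data_path)
        elapsed = time.perf_counter() - start  # measure once, reuse below
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results[method] = {
            'time': elapsed,
            'memory_peak': peak_bytes,  # Python-level allocations only
            'throughput': len(result) / elapsed,  # output rows per second
        }
    return pd.DataFrame(results)
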
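Requirement 5: AI Coding Assistant Prompts

Create docs/AI_OPTIMIZATION_PROMPTS.md:

Prompts to include:

Performance optimization expert: optimizing bottleneck functions
Code refactoring expert: improving maintainability
Unit test generation: automatically generating test cases
Documentation generation: generating API documentation
Code review expert: PR review checklist

Each prompt contains:

Role definition
Input format
Expected output
Constraints
Example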

Implementation Requirements

File Checklist

notebooks/ET_Partition_Introduction_For_Beginners.ipynb (new)
docs/ET_Partition_Methods_Deep_Dive.md (new)
tests/test_complex_scenarios.py (new)
methods/perez_priego/et_partitioning_functions_numba.py (new)
methods/uwue/zhou_optimized.py (new)
utils/benchmark.py (new)
docs/AI_OPTIMIZATION_PROMPTS.md (new)
README.md (update: add links to the new content)
requirements.txt (update: add numba, pytest, tqdm)
.github/workflows/tests.yml (new: CI/CD configuration)

Code quality standards

Follow PEP 8
Type hints (Python 3.10+)
Google-style docstrings
Unit test coverage >80%

Documentation standards

Bilingual (Chinese and English)
Markdown format
Runnable code examples
Includes visual explanations

Performance requirements

After Numba optimization: Perez-Priego speedup >5x
All tests complete in <10 minutes
Memory usage <2 GB (10 years of data)

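Acceptance Criteria

✅ The educational notebook runs end to end in JupyterLab
✅ All test cases pass under pytest
✅ Performance benchmarks meet the targets (benchmark report provided)
✅ Documentation is complete and correctly formatted
✅ Code passes flake8 and mypy checks
✅ CI/CD pipeline runs successfully

References

Existing tutorials: notebooks/Zhou_tutorial.ipynb, notebooks/TEA_tutorial.ipynb
Performance analysis: REFACTORING_SUMMARY.md
Method documentation: methods/*/README.md
Test examples: tests/test_all_methods.py

Priority: High
Estimated effort: 3-5 days
Labels: enhancement, documentation, performance, testing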

Co-authored-by: licm13 <16440941+licm13@users.noreply.github.com>

…mizations
- Add beginner-friendly educational notebook (EN/CN bilingual)
- Add deep dive technical documentation with math derivations
- Add complex test scenarios (missing data, multi-biome, performance)
- Add Numba-optimized Perez-Priego implementation (~170x speedup)
- Add optimized uWUE implementation with caching and vectorization
- Add benchmark utilities for performance profiling
- Add AI coding assistant prompts documentation
- Add CI/CD workflow for GitHub Actions
- Update README with new content links
- Update requirements.txt with new dependencies
Co-authored-by: licm13 <16440941+licm13@users.noreply.github.com>

- Fix workflow permissions for security (add contents: read)
- Improve comment clarity in Numba module
- Fix gc.collect() call in benchmark
- Vectorize NaN propagation test for efficiency
Co-authored-by: licm13 <16440941+licm13@users.noreply.github.com>

Initial plan

5ed43d4

Copilot AI assigned Copilot and licm13 Dec 4, 2025

Copilot started work on behalf of licm13 December 4, 2025 03:42 View session

Copilot AI and others added 3 commits December 4, 2025 03:44

Initial exploration of repository structure

4ae4cf1

Co-authored-by: licm13 <16440941+licm13@users.noreply.github.com>

Copilot AI changed the title from "[WIP] Add educational notebook for ET partitioning methods" to "Add educational materials, documentation, tests, and performance optimizations" Dec 4, 2025

Copilot AI requested a review from licm13 December 4, 2025 04:14

Copilot finished work on behalf of licm13 December 4, 2025 04:14

licm13 approved these changes Dec 4, 2025

licm13 marked this pull request as ready for review December 4, 2025 05:05

licm13 merged commit b98ea34 into main Dec 4, 2025
6 of 12 checks passed

licm13 deleted the copilot/add-educational-notebook branch December 4, 2025 05:05

licm13 merged 4 commits into main from copilot/add-educational-notebook
