Add Noosphere Engine foundation (Phase 0 + Phase 1) by user1303836 · Pull Request #187 · user1303836/intelstream

user1303836 · 2026-02-06T19:54:05Z

Summary

Implements the foundational infrastructure for the Noosphere Engine feature, covering Phase 0 (core package structure) and Phase 1 (metrics computation layer).

Phase 0 - Core Infrastructure:

SQLAlchemy models (10 tables): NoosphereGuildState, MessageEmbedding, EgregoreSnapshot, UserBioelectricState, SoundscapeSnapshot, AttractorSnapshot, ArchiveEntry, ArchiveLink, CrystalRoom, GuildMetricsBaseline
Shared dataclasses: ProcessedMessage, CommunityStateVector
Constants and enums: PHI, Fibonacci sequence, ComputationMode (10 modes), PathologyType (10 types), MessageClassification
EmbeddingService: async wrapper around sentence-transformers (paraphrase-multilingual-MiniLM-L12-v2, 384-dim)
OutputGovernor: O(1) sidechain gain computation with token bucket and cooldown
SoundscapeMonitor: anthrophony/biophony/geophony message classification

Phase 1 - Metrics and Computation:

MetricsComputer: three-tier scheduling (hourly/daily/weekly) producing CommunityStateVector
WelfordAccumulator: online mean/variance with z-score normalization and sigmoid mapping
Egregore Index: weighted composite of coherence (0.4) + convergence (0.3) + concentration (0.3)
Archive decay: stigmergic storage with reference-extended half-life (base 168h, 1.5x per reference)
FibonacciScheduler: golden-angle jitter for quasiperiodic scheduling
PhiParameter: golden ratio mode weight oscillator
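
To make the archive decay rule above concrete (base half-life of 168 hours, extended by 1.5x per reference), here is a minimal sketch. The function names and the exponential form `0.5 ** (age / half_life)` are assumptions for illustration only; the code in this PR may compute fidelity differently.

```python
BASE_HALF_LIFE_HOURS = 168.0  # one week, per the summary above
REFERENCE_MULTIPLIER = 1.5    # each reference extends the half-life by 1.5x


def effective_half_life(reference_count: int) -> float:
    """Half-life in hours after extending it once per reference."""
    return BASE_HALF_LIFE_HOURS * (REFERENCE_MULTIPLIER ** reference_count)


def fidelity(age_hours: float, reference_count: int = 0) -> float:
    """Assumed exponential decay: fidelity halves every effective half-life."""
    return 0.5 ** (age_hours / effective_half_life(reference_count))
```

Under these assumptions an unreferenced entry is at half fidelity after 168 hours, while an entry with one reference takes 252 hours to reach the same point.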

Dependencies added: sentence-transformers, chromadb, numpy, scipy, networkx

Test plan

90 unit tests covering all modules (all passing)
ruff check passes
ruff format passes
mypy passes with no issues (14 source files checked)
SQLAlchemy models tested with in-memory SQLite
Numerical algorithms verified (Welford convergence, coherence bounds, entropy normalization)

Phase 0 - Core package structure:

- SQLAlchemy models (10 tables): guild state, embeddings, egregore snapshots, soundscape, attractors, archive, crystal rooms, baselines
- Shared dataclasses: ProcessedMessage, CommunityStateVector
- Constants: PHI, Fibonacci, enums (ComputationMode, PathologyType, MessageClassification)
- EmbeddingService: async sentence-transformers wrapper (384-dim)
- OutputGovernor: O(1) sidechain gain, token bucket, cooldown
- SoundscapeMonitor: anthrophony/biophony/geophony classification

Phase 1 - Metrics and computation:

- MetricsComputer: three-tier scheduling (hourly/daily/weekly)
- WelfordAccumulator: online mean/variance with z-score normalization
- Egregore Index: weighted coherence + convergence + concentration
- Archive decay: reference-extended half-life (base 168h)
- FibonacciScheduler: golden-angle jitter for quasiperiodic scheduling
- PhiParameter: golden ratio mode weight oscillator

90 tests covering all modules.


user1303836 · 2026-02-06T19:57:59Z

Code Review: PR #187 -- Noosphere Engine Foundation (Phase 0 + Phase 1)

Verdict: Changes Requested -- 7 issues to address before merge, 5 positive observations.

Issues Requiring Changes

1. Enums use enum.Enum instead of str, Enum -- constants.py

Per our alignment agreement, all enums should use str, Enum as base classes for JSON serialization compatibility. Currently ComputationMode, PathologyType, and MessageClassification all inherit from bare enum.Enum.

# Current
class MessageClassification(enum.Enum):
    ANTHROPHONY = "anthrophony"

# Expected
class MessageClassification(str, enum.Enum):
    ANTHROPHONY = "anthrophony"
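
The serialization difference behind this request can be reproduced with a small sketch. The class names `BareKind` and `StrKind` and the helper `try_dump` are hypothetical and exist only for this illustration.

```python
import enum
import json


class BareKind(enum.Enum):
    ANTHROPHONY = "anthrophony"


class StrKind(str, enum.Enum):
    ANTHROPHONY = "anthrophony"


def try_dump(value: object) -> str:
    """Return the JSON encoding, or the name of the error that was raised."""
    try:
        return json.dumps(value)
    except TypeError as exc:
        return type(exc).__name__


bare_result = try_dump({"kind": BareKind.ANTHROPHONY})  # json cannot encode a bare Enum
str_result = try_dump({"kind": StrKind.ANTHROPHONY})    # str subclass is encoded as a string
```

With the standard-library `json` module, the bare enum member is rejected while the `str`-based member is written out as its value.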

2. CommunityStateVector missing agreed fields -- shared/data_models.py

The dataclass is missing fields that dev-analytics and dev-features depend on:

sentiment_alignment: float = math.nan
interaction_modularity: float = math.nan

These were agreed upon during interface alignment. Use math.nan defaults so Phase 1 code doesn't need to compute them yet, but downstream consumers can check for them.
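
A downstream consumer might check for "not yet computed" roughly as follows. `StateVectorSketch` is a hypothetical stand-in with only the two fields named above, and `is_computed` is an illustrative helper, not part of the agreed interface.

```python
import math
from dataclasses import dataclass


@dataclass
class StateVectorSketch:
    # Hypothetical stand-in for CommunityStateVector; only two fields shown.
    sentiment_alignment: float = math.nan
    interaction_modularity: float = math.nan


def is_computed(value: float) -> bool:
    """True once a metric holds a real number instead of the NaN placeholder."""
    return not math.isnan(value)


placeholder = StateVectorSketch()
filled = StateVectorSketch(sentiment_alignment=0.3)
```

Note that consumers need `math.isnan` here: NaN never compares equal to itself, so a check like `value == math.nan` is always false.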

3. datetime.utcnow() is deprecated -- metrics_computer.py:58

datetime.utcnow() is deprecated since Python 3.12. The existing codebase uses datetime.now(UTC) (see database/models.py). Change to match:

from datetime import UTC, datetime
# ...
now = datetime.now(UTC)

4. TYPE_CHECKING imports used for runtime types -- shared/data_models.py

numpy and MessageClassification are imported under TYPE_CHECKING, but they're used as dataclass field types. This works today because of from __future__ import annotations, which keeps annotations as unevaluated strings, but it's fragile -- any code that resolves field types at runtime (typing.get_type_hints, serialization libraries that evaluate annotations, isinstance checks against the imported names) will fail with a NameError. Move these to unconditional imports.
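
The failure mode can be sketched with standard-library types standing in for numpy and MessageClassification. The class names and the helper `hints_resolve` are hypothetical; quoted annotations are used to mimic the effect of `from __future__ import annotations`.

```python
from dataclasses import dataclass
from decimal import Decimal  # runtime import: the name exists when hints are resolved
from typing import TYPE_CHECKING, get_type_hints

if TYPE_CHECKING:
    from fractions import Fraction  # visible to type checkers only


@dataclass
class AnnotationOnly:
    # Quoted annotation mimics `from __future__ import annotations`.
    ratio: "Fraction"


@dataclass
class RuntimeImported:
    amount: "Decimal"


def hints_resolve(cls: type) -> bool:
    """Report whether the class annotations can be evaluated at runtime."""
    try:
        get_type_hints(cls)
    except NameError:
        return False
    return True


annotation_only_ok = hints_resolve(AnnotationOnly)
runtime_imported_ok = hints_resolve(RuntimeImported)
```

Defining `AnnotationOnly` succeeds, which is why the problem stays hidden until something tries to evaluate the annotation.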

5. chromadb listed as dependency but unused -- pyproject.toml

chromadb appears in the dependency list but no code imports it. If it's planned for a future phase, remove it now and add it when actually needed. It's a heavy dependency.

6. Duplicate [dependency-groups] section -- pyproject.toml

There are two [dependency-groups] sections in pyproject.toml. The TOML spec does not allow a table to be defined more than once. Strict parsers (including Python's tomllib) reject the file, while lenient ones may silently merge the tables. Consolidate into one section.

7. MetricsComputer unbounded memory growth -- metrics_computer.py

The per-guild lists held in the _messages and _embeddings dicts grow indefinitely with no pruning. For a long-running bot, this will eventually consume all available memory. Add a rolling window or clear after each computation cycle:

def compute_hourly(self, guild_id: int) -> CommunityStateVector:
    messages = self._messages.pop(guild_id, [])
    embeddings = self._embeddings.pop(guild_id, [])
    # ... compute ...

Or cap at a configurable max size (e.g., last 1000 messages per guild).
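
One possible shape for the capped variant uses `collections.deque` with `maxlen`, which drops the oldest item automatically. `BoundedGuildBuffer` and its methods are hypothetical names for this sketch, not part of MetricsComputer.

```python
from collections import defaultdict, deque

MAX_ITEMS_PER_GUILD = 1000  # hypothetical cap, from the "last 1000 messages" example


class BoundedGuildBuffer:
    """Hypothetical per-guild buffer that keeps only the newest items."""

    def __init__(self, max_items: int = MAX_ITEMS_PER_GUILD) -> None:
        # Each guild gets its own deque; maxlen evicts the oldest entry on overflow.
        self._items: defaultdict[str, deque[str]] = defaultdict(
            lambda: deque(maxlen=max_items)
        )

    def add(self, guild_id: str, item: str) -> None:
        self._items[guild_id].append(item)

    def snapshot(self, guild_id: str) -> list[str]:
        """Copy of the retained items, oldest first."""
        return list(self._items[guild_id])
```

Compared with `.pop()` after each cycle, this keeps a rolling window available between computations at the cost of holding up to `max_items` entries per guild.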

Positive Observations

OutputGovernor (shared/output_governor.py): Clean O(1) gain computation with token bucket pattern. Per-channel tracking, gain floor of 0.1, stochastic send decision -- all match the self-quieting design principle well.
WelfordAccumulator (shared/baseline.py): Numerically stable online mean/variance. Correct use of population variance (N not N-1 for normalization purposes). Sigmoid clamped at +/-10 to avoid overflow.
EgregoreIndex default (shared/egregore.py): Returns 0.5 when baseline count < 2, avoiding division-by-zero and giving a neutral starting point. Weighted composite (0.4/0.3/0.3) matches spec.
Archive decay math (shared/archive_decay.py): effective_half_life = base * (1.5^references) correctly extends relevance for frequently-referenced entries. Fidelity floor of 0.01 prevents complete erasure. Interaction boost with exponential decay is well-designed.
FibonacciScheduler (shared/fibonacci.py): Golden-angle jitter with max(0.5, ...) floor prevents degenerate intervals. Clean cycle-through-sequence design.
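
For readers unfamiliar with the accumulator pattern praised above, here is a minimal sketch of Welford's online algorithm with population variance and a sigmoid clamped at +/-10. `WelfordSketch` and its method names are assumptions for illustration and may not match shared/baseline.py.

```python
import math


class WelfordSketch:
    """Hypothetical stand-in for WelfordAccumulator."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        # Welford's update: one pass, no stored samples.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        # Population variance: divide by N, not N - 1.
        return self._m2 / self.count if self.count else 0.0

    def z_score(self, value: float) -> float:
        std = math.sqrt(self.variance)
        return (value - self.mean) / std if std > 0 else 0.0

    def normalized(self, value: float) -> float:
        # Clamp z to [-10, 10] so math.exp cannot overflow.
        z = max(-10.0, min(10.0, self.z_score(value)))
        return 1.0 / (1.0 + math.exp(-z))
```

Values near the running mean map to roughly 0.5, and extreme outliers saturate near 0 or 1 instead of raising an overflow error.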

@greptile

greptile-apps · 2026-02-06T19:58:49Z

Thanks for the detailed review! All 7 issues are clear and actionable. I'll address them before requesting re-review:

✅ Change enums to str, Enum base classes
✅ Add sentiment_alignment and interaction_modularity fields with math.nan defaults
✅ Replace datetime.utcnow() with datetime.now(UTC)
✅ Move numpy and MessageClassification out of TYPE_CHECKING block
✅ Remove chromadb dependency
✅ Consolidate duplicate [dependency-groups] sections
✅ Fix memory leak by adding .pop() to clear messages/embeddings after computation

Appreciate the positive feedback on the implementation patterns! Will push fixes shortly.

- Change guild_id, channel_id, user_id, message_id from str to int in ProcessedMessage and CommunityStateVector (Discord IDs are integers)
- Add missing Phase 4 fields to CommunityStateVector: sentiment_alignment, interaction_modularity, fractal_dimension, lyapunov_exponent, gromov_curvature (default to math.nan)
- Update all dict key types in MetricsComputer, SoundscapeMonitor, OutputGovernor to match
- Update all tests to use int IDs

Per team-lead correction: existing codebase uses str for Discord IDs throughout (database models, cogs, repository layer). Reverting the int change to maintain consistency.

- Revert guild_id/channel_id/user_id/message_id back to str in ProcessedMessage, CommunityStateVector, and all dependent modules
- Change enum base from enum.Enum to (str, enum.Enum) for ComputationMode, PathologyType, MessageClassification to enable direct JSON serialization
- Keep Phase 4 fields (sentiment_alignment, interaction_modularity, fractal_dimension, lyapunov_exponent, gromov_curvature) with math.nan defaults
- Add test verifying enum JSON serialization

- constants.py: Match foundation's enum.Enum base (not str, Enum), add MessageClassification and Phase 0 constants (embedding, archive, output governor), keep Crystal Room enums as additive-only additions
- shared/phi_parameter.py: Replace with foundation's canonical version (no tick_count or set_phase -- engine tracks its own tick count)
- test_constants.py: Keep only Crystal Room enum tests, defer shared constant/enum tests to foundation
- test_phi_parameter.py: Remove tick_count and set_phase tests
- test_engine.py: Use engine._tick_count instead of phi.tick_count
- test_serendipity.py: Fix flaky range test by using zero-noise injector

- Replace datetime.utcnow() with datetime.now(UTC) (item 3)
- Move TYPE_CHECKING imports to runtime with noqa comments (item 4)
- Remove unused chromadb dependency and mypy override (item 5)
- Remove duplicate [dependency-groups] section (item 6)
- Add MAX_MESSAGES/EMBEDDINGS_PER_GUILD bounds to MetricsComputer (item 7)

user1303836 · 2026-02-06T20:12:34Z

Re-review: All 7 Items Addressed

I've verified every fix against the original review. All items are resolved.

Original Issues -- Status

| # | Issue | Status |
|---|-------|--------|
| 1 | Enums use bare `enum.Enum` | Fixed -- `ComputationMode`, `PathologyType`, `MessageClassification` all use `(str, enum.Enum)` base (constants.py lines 24, 37, 50) |
| 2 | `CommunityStateVector` missing Phase 4 fields | Fixed -- `sentiment_alignment`, `interaction_modularity`, `fractal_dimension`, `lyapunov_exponent`, `gromov_curvature` added with `math.nan` defaults (data_models.py lines 42-46) |
| 3 | `datetime.utcnow()` deprecated | Fixed -- replaced with `datetime.now(UTC)` in metrics_computer.py (line 68). No remaining `utcnow` references in source or tests. |
| 4 | `TYPE_CHECKING` imports for runtime types | Fixed -- `datetime`, `numpy`, `MessageClassification` now imported at runtime with `# noqa: TC001/TC002/TC003` comments (data_models.py lines 5, 7, 9) |
| 5 | Unused `chromadb` dependency | Fixed -- removed from both `[project.dependencies]` and `[[tool.mypy.overrides]]`. Zero references to chromadb in the codebase. |
| 6 | Duplicate `[dependency-groups]` section | Fixed -- zero occurrences of `[dependency-groups]` in pyproject.toml |
| 7 | `MetricsComputer` unbounded memory | Fixed -- `MAX_MESSAGES_PER_GUILD = 5000` and `MAX_EMBEDDINGS_PER_GUILD = 5000` constants added (metrics_computer.py lines 18-19). `ingest_message()` trims both lists when exceeded (lines 42-49). Two new tests verify bounds. |

Verification Checklist

data_models.py imports are now runtime (not TYPE_CHECKING), preventing fragile annotation-only behavior
All IDs remain str throughout
pyproject.toml is clean: no duplicate sections, no phantom dependencies, mypy overrides list is accurate
94 noosphere tests (including 2 new bounds tests)

Minor Note (non-blocking)

Foundation uses (str, enum.Enum) tuple base class while analytics (PR #188) uses enum.StrEnum. Both serialize identically, but if you want consistency across branches, pick one style. StrEnum is the more modern Python 3.11+ approach. Not blocking since both work correctly.

Verdict: Approve

This PR is clean and ready to merge.

Merge Crystal Room enums from dev-features into the canonical constants.py so PR #189 can import them after rebase. Both use str, enum.Enum base class for JSON serialization consistency.

- Engine now dispatches CommunityStateVector and ProcessedMessage objects instead of kwargs, matching Phase 2 cog listener signatures
- Add data_models.py with CommunityStateVector and ProcessedMessage (matching dev-foundation PR #187 canonical definitions)
- Cap ModeManager._history at 100 entries to prevent unbounded growth
- Split noosphere loading exception handling: ImportError -> debug, other exceptions -> warning with traceback


This was referenced Feb 6, 2026

Phase 2: Noosphere Engine analytics modules #188

Open

Add Noosphere Engine: Phase 3 features and orchestrator #189

Open

user1303836 mentioned this pull request Feb 6, 2026

Noosphere Engine: Cross-PR Integration Concerns #190

Open

Add CrystalRoomMode and CrystalRoomState enums to canonical constants

29afbc3


user1303836 wants to merge 5 commits into main from feature/noosphere-foundation
