
Add Noosphere Engine foundation (Phase 0 + Phase 1) #187

Open
user1303836 wants to merge 5 commits into main from feature/noosphere-foundation

Conversation

@user1303836
Owner

Summary

Implements the foundational infrastructure for the Noosphere Engine feature, covering Phase 0 (core package structure) and Phase 1 (metrics computation layer).

Phase 0 - Core Infrastructure:

  • SQLAlchemy models (10 tables): NoosphereGuildState, MessageEmbedding, EgregoreSnapshot, UserBioelectricState, SoundscapeSnapshot, AttractorSnapshot, ArchiveEntry, ArchiveLink, CrystalRoom, GuildMetricsBaseline
  • Shared dataclasses: ProcessedMessage, CommunityStateVector
  • Constants and enums: PHI, Fibonacci sequence, ComputationMode (10 modes), PathologyType (10 types), MessageClassification
  • EmbeddingService: async wrapper around sentence-transformers (paraphrase-multilingual-MiniLM-L12-v2, 384-dim)
  • OutputGovernor: O(1) sidechain gain computation with token bucket and cooldown
  • SoundscapeMonitor: anthrophony/biophony/geophony message classification
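The token-bucket part of the OutputGovernor bullet can be illustrated with a minimal sketch. This is not the PR's implementation: the class name, the capacity, and the refill rate are all assumptions chosen for illustration, and the sidechain gain and cooldown logic are left out.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical token bucket; not the PR's OutputGovernor."""

    capacity: float = 5.0        # assumed maximum burst size
    refill_per_sec: float = 0.1  # assumed refill rate
    tokens: float = 5.0          # current balance, starts full
    last_refill: float = 0.0     # timestamp of the previous call, in seconds

    def try_consume(self, now: float) -> bool:
        """Refill from elapsed time, then spend one token if available (O(1))."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Passing `now` explicitly keeps the sketch deterministic in tests; a real governor would more likely read a monotonic clock itself.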

Phase 1 - Metrics and Computation:

  • MetricsComputer: three-tier scheduling (hourly/daily/weekly) producing CommunityStateVector
  • WelfordAccumulator: online mean/variance with z-score normalization and sigmoid mapping
  • Egregore Index: weighted composite of coherence (0.4) + convergence (0.3) + concentration (0.3)
  • Archive decay: stigmergic storage with reference-extended half-life (base 168h, 1.5x per reference)
  • FibonacciScheduler: golden-angle jitter for quasiperiodic scheduling
  • PhiParameter: golden ratio mode weight oscillator
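As a rough illustration of the WelfordAccumulator bullet, the sketch below keeps a running mean and population variance, then maps a value through a z-score and a sigmoid. The class and method names are hypothetical, and the neutral 0.5 returned before a baseline exists is an assumption rather than something taken from the PR's code.

```python
import math


class Welford:
    """Hypothetical online mean/variance accumulator (Welford's algorithm)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Population variance (divides by N, not N - 1)."""
        return self._m2 / self.count if self.count else 0.0

    def normalized(self, x: float) -> float:
        """Map x into (0, 1) via a z-score and a clamped sigmoid."""
        std = math.sqrt(self.variance)
        if self.count < 2 or std == 0.0:
            return 0.5  # assumed neutral value when no usable baseline exists
        z = max(-10.0, min(10.0, (x - self.mean) / std))
        return 1.0 / (1.0 + math.exp(-z))
```

Clamping the z-score before calling `math.exp` keeps the exponent small, so extreme outliers cannot overflow.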

Dependencies added: sentence-transformers, chromadb, numpy, scipy, networkx

Test plan

  • 90 unit tests covering all modules (all passing)
  • ruff check passes
  • ruff format passes
  • mypy passes with no issues (14 source files checked)
  • SQLAlchemy models tested with in-memory SQLite
  • Numerical algorithms verified (Welford convergence, coherence bounds, entropy normalization)

Phase 0 - Core package structure:
- SQLAlchemy models (10 tables): guild state, embeddings, egregore
  snapshots, soundscape, attractors, archive, crystal rooms, baselines
- Shared dataclasses: ProcessedMessage, CommunityStateVector
- Constants: PHI, Fibonacci, enums (ComputationMode, PathologyType,
  MessageClassification)
- EmbeddingService: async sentence-transformers wrapper (384-dim)
- OutputGovernor: O(1) sidechain gain, token bucket, cooldown
- SoundscapeMonitor: anthrophony/biophony/geophony classification

Phase 1 - Metrics and computation:
- MetricsComputer: three-tier scheduling (hourly/daily/weekly)
- WelfordAccumulator: online mean/variance with z-score normalization
- Egregore Index: weighted coherence + convergence + concentration
- Archive decay: reference-extended half-life (base 168h)
- FibonacciScheduler: golden-angle jitter for quasiperiodic scheduling
- PhiParameter: golden ratio mode weight oscillator

90 tests covering all modules.

@user1303836
Owner Author

Code Review: PR #187 -- Noosphere Engine Foundation (Phase 0 + Phase 1)

Verdict: Changes Requested -- 7 issues to address before merge, 5 positive observations.


Issues Requiring Changes

1. Enums use enum.Enum instead of str, Enum -- constants.py

Per our alignment agreement, all enums should use str, Enum as base classes for JSON serialization compatibility. Currently ComputationMode, PathologyType, and MessageClassification all inherit from bare enum.Enum.

import enum

# Current
class MessageClassification(enum.Enum):
    ANTHROPHONY = "anthrophony"

# Expected
class MessageClassification(str, enum.Enum):
    ANTHROPHONY = "anthrophony"
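The practical difference between the two bases shows up when a member is passed to `json.dumps`. A small sketch, using stand-in class names (`BareKind`, `StrKind`) and a hypothetical helper rather than the PR's actual enums:

```python
import enum
import json


class BareKind(enum.Enum):
    ANTHROPHONY = "anthrophony"


class StrKind(str, enum.Enum):
    ANTHROPHONY = "anthrophony"


def dumps_or_error(value: object) -> str:
    """Return the JSON encoding, or the TypeError message if encoding fails."""
    try:
        return json.dumps(value)
    except TypeError as exc:
        return f"TypeError: {exc}"


# The str-based member is encoded as its value; the bare member is rejected.
print(dumps_or_error(StrKind.ANTHROPHONY))
print(dumps_or_error(BareKind.ANTHROPHONY))
```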

2. CommunityStateVector missing agreed fields -- shared/data_models.py

The dataclass is missing fields that dev-analytics and dev-features depend on:

  • sentiment_alignment: float = math.nan
  • interaction_modularity: float = math.nan

These were agreed upon during interface alignment. Use math.nan defaults so Phase 1 code doesn't need to compute them yet, but downstream consumers can check for them.
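A sketch of how a downstream consumer might check for a not-yet-computed field. `StateVectorSketch` is a hypothetical stand-in showing only two of the agreed fields, and `is_computed` is an illustrative helper, not part of the PR.

```python
import math
from dataclasses import dataclass


@dataclass
class StateVectorSketch:
    """Hypothetical stand-in for CommunityStateVector; most fields omitted."""

    guild_id: str
    sentiment_alignment: float = math.nan
    interaction_modularity: float = math.nan


def is_computed(value: float) -> bool:
    # NaN never compares equal to anything, including itself, so
    # `value == math.nan` is always False; math.isnan is the reliable check.
    return not math.isnan(value)
```

The check has to go through `math.isnan`, since an equality comparison against `math.nan` would report every value as "computed".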

3. datetime.utcnow() is deprecated -- metrics_computer.py:58

datetime.utcnow() is deprecated since Python 3.12. The existing codebase uses datetime.now(UTC) (see database/models.py). Change to match:

from datetime import UTC, datetime
# ...
now = datetime.now(UTC)
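The difference that matters here is that `datetime.utcnow()` returns a naive datetime while `datetime.now(UTC)` returns a timezone-aware one. The sketch below uses `timezone.utc`, of which `datetime.UTC` is an alias on Python 3.11+, so that it also runs on older interpreters; `is_aware` is a hypothetical helper.

```python
from datetime import datetime, timezone


def is_aware(dt: datetime) -> bool:
    """True when the datetime carries usable timezone information."""
    return dt.tzinfo is not None and dt.utcoffset() is not None


aware_now = datetime.now(timezone.utc)  # same value as datetime.now(UTC)
naive_example = datetime(2026, 2, 6)    # no tzinfo, like the result of utcnow()
```

Mixing the two kinds in a comparison or subtraction raises `TypeError`, which is one reason to standardize on the aware form across the codebase.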

4. TYPE_CHECKING imports used for runtime types -- shared/data_models.py

numpy and MessageClassification are imported under TYPE_CHECKING, but they appear in dataclass field annotations. This works today because from __future__ import annotations keeps every annotation as an unevaluated string, but it's fragile -- any code that resolves those annotations at runtime (typing.get_type_hints, or serialization and validation libraries built on it) will raise NameError, and dataclasses.fields can only report each type as a string. Move these to unconditional imports.
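The failure mode can be reproduced in isolation. The sketch below uses `decimal.Decimal` as a stand-in for the guarded imports and a quoted annotation in place of `from __future__ import annotations` (both leave the annotation as a string); `Priced` and `resolve_hints` are hypothetical names.

```python
import dataclasses
import typing
from dataclasses import dataclass

if typing.TYPE_CHECKING:
    # Visible to the type checker only; never imported when the program runs.
    from decimal import Decimal


@dataclass
class Priced:
    # Stored as the string "Decimal" and not evaluated at class creation.
    amount: "Decimal"


def resolve_hints() -> str:
    """Try to turn the string annotations into real types."""
    try:
        typing.get_type_hints(Priced)
    except NameError as exc:
        return f"NameError: {exc}"
    return "resolved"


# dataclasses.fields still works, but only reports the unresolved string.
field_type = dataclasses.fields(Priced)[0].type
```

Defining the class succeeds, so the problem stays hidden until something calls `typing.get_type_hints` on it.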

5. chromadb listed as dependency but unused -- pyproject.toml

chromadb appears in the dependency list but no code imports it. If it's planned for a future phase, remove it now and add it when actually needed. It's a heavy dependency.

6. Duplicate [dependency-groups] section -- pyproject.toml

There are two [dependency-groups] sections in pyproject.toml. The TOML spec forbids defining the same table more than once, so the file is invalid. Some parsers silently merge the duplicates, while strict ones reject the file outright. Consolidate into one section.

7. MetricsComputer unbounded memory growth -- metrics_computer.py

The per-guild lists in the _messages and _embeddings dicts grow indefinitely with no pruning. For a long-running bot, this will eventually consume all available memory. Add a rolling window or clear after each computation cycle:

def compute_hourly(self, guild_id: int) -> CommunityStateVector:
    messages = self._messages.pop(guild_id, [])
    embeddings = self._embeddings.pop(guild_id, [])
    # ... compute ...

Or cap at a configurable max size (e.g., last 1000 messages per guild).
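One way to sketch the capped variant is a `collections.deque` with `maxlen`, which drops the oldest entry on overflow in O(1). The class name, the string payloads, and the default cap are assumptions for illustration, not the PR's code.

```python
from collections import defaultdict, deque

MAX_MESSAGES_PER_GUILD = 1000  # assumed cap, from the "last 1000" suggestion


class BoundedBuffer:
    """Hypothetical per-guild buffer; the oldest items are dropped first."""

    def __init__(self, max_items: int = MAX_MESSAGES_PER_GUILD) -> None:
        self._items: defaultdict[str, deque[str]] = defaultdict(
            lambda: deque(maxlen=max_items)
        )

    def add(self, guild_id: str, message: str) -> None:
        self._items[guild_id].append(message)

    def drain(self, guild_id: str) -> list[str]:
        """Return and clear everything buffered for one guild."""
        return list(self._items.pop(guild_id, ()))
```

`drain` combines both suggestions: the buffer is bounded while it fills and is cleared once a computation cycle has consumed it.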


Positive Observations

  • OutputGovernor (shared/output_governor.py): Clean O(1) gain computation with token bucket pattern. Per-channel tracking, gain floor of 0.1, stochastic send decision -- all match the self-quieting design principle well.

  • WelfordAccumulator (shared/baseline.py): Numerically stable online mean/variance. Correct use of population variance (N not N-1 for normalization purposes). Sigmoid clamped at +/-10 to avoid overflow.

  • EgregoreIndex default (shared/egregore.py): Returns 0.5 when baseline count < 2, avoiding division-by-zero and giving a neutral starting point. Weighted composite (0.4/0.3/0.3) matches spec.

  • Archive decay math (shared/archive_decay.py): effective_half_life = base * (1.5^references) correctly extends relevance for frequently-referenced entries. Fidelity floor of 0.01 prevents complete erasure. Interaction boost with exponential decay is well-designed.

  • FibonacciScheduler (shared/fibonacci.py): Golden-angle jitter with max(0.5, ...) floor prevents degenerate intervals. Clean cycle-through-sequence design.
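The archive decay observation can be made concrete with a small sketch. The half-life formula, the 168h base, the 1.5x multiplier, and the 0.01 floor come from this PR; the exponential form `0.5 ** (age / half_life)` and the function name are assumptions, and the interaction boost is omitted.

```python
BASE_HALF_LIFE_HOURS = 168.0  # from the PR description
REFERENCE_MULTIPLIER = 1.5    # from the PR description
FIDELITY_FLOOR = 0.01         # from the review comment


def fidelity(age_hours: float, references: int) -> float:
    """Assumed half-life decay with a reference-extended half-life and a floor."""
    half_life = BASE_HALF_LIFE_HOURS * REFERENCE_MULTIPLIER ** references
    return max(FIDELITY_FLOOR, 0.5 ** (age_hours / half_life))
```

Under these assumptions, an unreferenced entry falls to 0.5 after 168 hours, while an entry with one reference takes 252 hours to reach the same point.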


@greptile

@greptile-apps

greptile-apps bot commented Feb 6, 2026

Thanks for the detailed review! All 7 issues are clear and actionable. I'll address them before requesting re-review:

  1. ✅ Change enums to str, Enum base classes
  2. ✅ Add sentiment_alignment and interaction_modularity fields with math.nan defaults
  3. ✅ Replace datetime.utcnow() with datetime.now(UTC)
  4. ✅ Move numpy and MessageClassification out of TYPE_CHECKING block
  5. ✅ Remove chromadb dependency
  6. ✅ Consolidate duplicate [dependency-groups] sections
  7. ✅ Fix memory leak by adding .pop() to clear messages/embeddings after computation

Appreciate the positive feedback on the implementation patterns! Will push fixes shortly.

- Change guild_id, channel_id, user_id, message_id from str to int in
  ProcessedMessage and CommunityStateVector (Discord IDs are integers)
- Add missing Phase 4 fields to CommunityStateVector: sentiment_alignment,
  interaction_modularity, fractal_dimension, lyapunov_exponent,
  gromov_curvature (default to math.nan)
- Update all dict key types in MetricsComputer, SoundscapeMonitor,
  OutputGovernor to match
- Update all tests to use int IDs

Per team-lead correction: existing codebase uses str for Discord IDs
throughout (database models, cogs, repository layer). Reverting the
int change to maintain consistency.

- Revert guild_id/channel_id/user_id/message_id back to str in
  ProcessedMessage, CommunityStateVector, and all dependent modules
- Change enum base from enum.Enum to (str, enum.Enum) for
  ComputationMode, PathologyType, MessageClassification to enable
  direct JSON serialization
- Keep Phase 4 fields (sentiment_alignment, interaction_modularity,
  fractal_dimension, lyapunov_exponent, gromov_curvature) with
  math.nan defaults
- Add test verifying enum JSON serialization

user1303836 added a commit that referenced this pull request Feb 6, 2026
- constants.py: Match foundation's enum.Enum base (not str, Enum),
  add MessageClassification and Phase 0 constants (embedding, archive,
  output governor), keep Crystal Room enums as additive-only additions
- shared/phi_parameter.py: Replace with foundation's canonical version
  (no tick_count or set_phase -- engine tracks its own tick count)
- test_constants.py: Keep only Crystal Room enum tests, defer shared
  constant/enum tests to foundation
- test_phi_parameter.py: Remove tick_count and set_phase tests
- test_engine.py: Use engine._tick_count instead of phi.tick_count
- test_serendipity.py: Fix flaky range test by using zero-noise injector

- Replace datetime.utcnow() with datetime.now(UTC) (item 3)
- Move TYPE_CHECKING imports to runtime with noqa comments (item 4)
- Remove unused chromadb dependency and mypy override (item 5)
- Remove duplicate [dependency-groups] section (item 6)
- Add MAX_MESSAGES/EMBEDDINGS_PER_GUILD bounds to MetricsComputer (item 7)

@user1303836
Owner Author

Re-review: All 7 Items Addressed

I've verified every fix against the original review. All items are resolved.

Original Issues -- Status

| # | Issue | Status |
|---|-------|--------|
| 1 | Enums use bare enum.Enum | Fixed -- ComputationMode, PathologyType, MessageClassification all use (str, enum.Enum) base (constants.py lines 24, 37, 50) |
| 2 | CommunityStateVector missing Phase 4 fields | Fixed -- sentiment_alignment, interaction_modularity, fractal_dimension, lyapunov_exponent, gromov_curvature added with math.nan defaults (data_models.py lines 42-46) |
| 3 | datetime.utcnow() deprecated | Fixed -- replaced with datetime.now(UTC) in metrics_computer.py (line 68). No remaining utcnow references in source or tests. |
| 4 | TYPE_CHECKING imports for runtime types | Fixed -- datetime, numpy, MessageClassification now imported at runtime with # noqa: TC001/TC002/TC003 comments (data_models.py lines 5, 7, 9) |
| 5 | Unused chromadb dependency | Fixed -- removed from both [project.dependencies] and [[tool.mypy.overrides]]. Zero references to chromadb in the codebase. |
| 6 | Duplicate [dependency-groups] section | Fixed -- zero occurrences of [dependency-groups] in pyproject.toml |
| 7 | MetricsComputer unbounded memory | Fixed -- MAX_MESSAGES_PER_GUILD = 5000 and MAX_EMBEDDINGS_PER_GUILD = 5000 constants added (metrics_computer.py lines 18-19). ingest_message() trims both lists when exceeded (lines 42-49). Two new tests verify bounds. |

Verification Checklist

  • data_models.py imports are now runtime (not TYPE_CHECKING), preventing fragile annotation-only behavior
  • All IDs remain str throughout
  • pyproject.toml is clean: no duplicate sections, no phantom dependencies, mypy overrides list is accurate
  • 94 noosphere tests (including 2 new bounds tests)

Minor Note (non-blocking)

Foundation uses (str, enum.Enum) tuple base class while analytics (PR #188) uses enum.StrEnum. Both serialize identically, but if you want consistency across branches, pick one style. StrEnum is the more modern Python 3.11+ approach. Not blocking since both work correctly.

Verdict: Approve

This PR is clean and ready to merge.

Merge Crystal Room enums from dev-features into the canonical
constants.py so PR #189 can import them after rebase. Both use
str, enum.Enum base class for JSON serialization consistency.

user1303836 added a commit that referenced this pull request Feb 6, 2026
- Engine now dispatches CommunityStateVector and ProcessedMessage objects
  instead of kwargs, matching Phase 2 cog listener signatures
- Add data_models.py with CommunityStateVector and ProcessedMessage
  (matching dev-foundation PR #187 canonical definitions)
- Cap ModeManager._history at 100 entries to prevent unbounded growth
- Split noosphere loading exception handling: ImportError -> debug,
  other exceptions -> warning with traceback