Sera

Sera is an AI system that connects real-time vision with natural language understanding. It observes your surroundings, builds a memory of the environment, and lets you interact with it through voice or text.

Video

(Note: the video and images are for reference only and do not reflect the project's actual output.)

Click Here

Expected Demo Outputs

sr-mixedworld-140429-8pm-00068-1000px

Visual Reference: Click Here

Features

  • Real-time video processing from smartphone or webcam
  • Object tracking and spatial memory across time
  • Natural language interface to query stored visual context
  • 3D visualization of the room or environment
  • Lightweight models using transfer learning for vision and speech
  • Modular architecture for easy extension and experimentation
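The object-tracking and spatial-memory feature above can be sketched as a small append-only log. This is a hypothetical `SpatialMemory` class, not the project's actual module; a real implementation would also store tracker IDs and full 3D pose:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Observation:
    """One sighting of an object: label, 3D position, and timestamp."""
    label: str
    position: tuple   # assumed (x, y, z) in room coordinates
    seen_at: datetime

class SpatialMemory:
    """Append-only log of observations, queryable by object label."""
    def __init__(self):
        self._log = []

    def record(self, label, position, seen_at=None):
        self._log.append(Observation(label, position, seen_at or datetime.now()))

    def last_seen(self, label):
        """Return the most recent observation of `label`, or None."""
        hits = [o for o in self._log if o.label == label]
        return max(hits, key=lambda o: o.seen_at) if hits else None
```

Keeping memory append-only preserves the time dimension, which is what makes "yesterday" and "last night" queries answerable later.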

Example Use Cases

Ask questions like:

  • "Where did I place my glasses yesterday?"
  • "How many books are on the table right now?"
  • "What was on the desk last night?"

Or walk through a room and view its 3D memory map on your laptop.
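A count query like "How many books are on the table right now?" could be answered by filtering recent observations. This is a minimal sketch under stated assumptions: `answer_count` is a hypothetical helper, observations are plain dicts, and "right now" is approximated by a time window (a real system would deduplicate by tracker ID rather than count raw sightings):

```python
from datetime import datetime, timedelta

def answer_count(observations, label, now, window=timedelta(minutes=5)):
    """Count sightings of `label` within `window` of `now`.

    Hypothetical helper; `observations` is assumed to be a list of
    dicts with "label" and "seen_at" keys from the memory module.
    """
    recent = [
        o for o in observations
        if o["label"] == label and now - o["seen_at"] <= window
    ]
    return len(recent)
```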

Tech Stack

  • Python, PyTorch or TensorFlow
  • OpenCV for video input and object tracking
  • YOLO/Segment Anything for perception (fine-tuned)
  • Whisper or SpeechT5 for voice input
  • Flask/FastAPI backend + WebSocket for real-time streaming
  • Three.js/Blender/Unity for 3D scene visualization

Project Structure

Sera/
│
├── models/          # Vision, language and memory modules
├── data/            # Sample datasets and recorded sessions
├── src/
│   ├── vision/      # Object detection, tracking, scene mapping
│   ├── memory/      # Spatial and temporal memory system
│   ├── interface/   # Voice and text query processing
│   ├── server/      # API and real-time streaming logic
│   └── ui/          # 3D visualization frontend
│
├── requirements.txt
├── README.md
└── LICENSE

How It Works

  1. A video stream from a phone or webcam is sent to the system
  2. Objects are detected, tracked, and positioned in 3D space
  3. A memory module stores each object's location and time context
  4. Users ask questions by voice or text
  5. The natural-language query is matched against visual-memory data and answered
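The five steps above can be sketched end to end. Everything here is illustrative: `detect_objects` is a stub standing in for a real YOLO/SAM detector, positions are assumed already triangulated into room coordinates, and query matching is a naive keyword lookup rather than a language model:

```python
from datetime import datetime

def detect_objects(frame):
    # Stub detector: a real system would run YOLO/SAM on the frame
    # and triangulate each detection into room coordinates.
    return [("mug", (0.4, 1.1, 0.8))]

def process_frame(frame, memory, timestamp):
    # Steps 1-3: detect objects and append each sighting to memory.
    for label, pos in detect_objects(frame):
        memory.setdefault(label, []).append((timestamp, pos))

def answer(query, memory):
    # Steps 4-5: naive keyword match of the query against stored labels.
    for label, sightings in memory.items():
        if label in query.lower():
            ts, pos = sightings[-1]
            return f"{label} last seen at {pos} ({ts:%H:%M})"
    return "I haven't seen that object yet."

memory = {}
process_frame(frame=None, memory=memory, timestamp=datetime(2024, 1, 1, 20, 15))
print(answer("Where is my mug?", memory))
```

The key design point is the split between perception (steps 1-3), which runs continuously, and querying (steps 4-5), which only reads the accumulated memory.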

Future Enhancements

  • Personal object recognition and face identification
  • Multi-room or outdoor mapping
  • Voice assistant integration with smart devices
  • On-device processing for privacy-focused use

Contribution

Contributions, ideas, and research suggestions are welcome. Fork the repo, open an issue, or submit a pull request.

License

This project is licensed under the MIT License.

