
luohongk/some-stars

 
 


Some Stars

LuoHongkun's star list, automatically updated every 6 hours. Reference link -> automatic GitHub star list updates


Table of Contents

JavaScript

C++

Python

  • NVIDIA/soma-retargeter - SOMA BVH to humanoid robot motion retargeting library built with Newton and NVIDIA Warp

  • TeleHuman/HumanoidSoccer - Learning Soccer Skills for Humanoid Robots: A Progressive Perception-Action Framework

  • robodhruv/visualnav-transformer - Official code and checkpoint release for mobile robot foundation models: GNM, ViNT, and NoMaD.

  • csiro-robotics/WildCross - [IEEE ICRA 2026] The official repository for paper WildCross: A Cross-Modal Large Scale Benchmark for Place Recognition and Metric Depth Estimation in Natural Environments at IEEE ICRA 2026

  • dimensionalOS/dimos - Dimensional is the agentic operating system for physical space. Vibecode humanoids, quadrupeds, drones, and other hardware platforms in natural language and build multi-agent systems that work seamlessly with physical input (cameras, lidar, actuators).

  • Perkins729/OmniXtreme -

  • zhutengjie/CLOT - official code for paper CLOT: Closed-Loop Global Motion Tracking for Whole-Body Humanoid Teleoperation

  • HKUDS/nanobot - "🐈 nanobot: The Ultra-Lightweight OpenClaw"

  • UniflexAI/tinynav - TinyNav: A lightweight, hackable system to guide your robots anywhere.

  • Humanoid-SkillBlender/SkillBlender - Official implementation of SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending

  • K-Dense-AI/claude-scientific-skills - A set of ready to use Agent Skills for research, science, engineering, analysis, finance and writing.

  • leggedrobotics/rsl_rl - A fast and simple implementation of learning algorithms for robotics.

  • AgibotTech/genie_sim - Simulation Platform from AgiBot

  • hwjiang1510/RayZer - Code for ICCV'2025 (Best student paper honorable mention) "RayZer: A Self-supervised Large View Synthesis Model"

  • Spirit-AI-Team/spirit-v1.5 - Spirit-v1.5: A Robotic Foundation Model by Spirit AI

  • lukasmolnar/wb-mpc-locoman - A flexible optimization framework for whole-body loco-manipulation, built with Pinocchio and CasADi. Supports multiple dynamics formulations and solver backends.

  • nvidia-cosmos/cosmos-predict2.5 - Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video.

  • be2rlab/km-vipe - Online Tightly Coupled Vision-Language-Geometry Fusion for Open-Vocabulary Semantic SLAM

  • marmotlab/ORION-multi-agent-navigation - ORION: Option-Regularized Deep Reinforcement Learning for Cooperative Multi-Agent Online Navigation

  • ManifoldTechLtd/Odin-Nav-Stack - An open-source navigation stack based on Odin1.

  • R-C-Group/Odin-Navigation-Stack - A walkthrough of Odin-Navigation-Stack

  • Wenyueh/MinivLLM - Based on Nano-vLLM, a simple replication of vLLM with self-contained paged attention and flash attention implementation

  • RPL-CS-UCL/litevloc_code - LiteVLoc: Map-Lite Visual Localization for Image Goal Navigation

  • william13077/IAmGoodNavigator -

  • facebookresearch/home-robot - Mobile manipulation research tools for roboticists

  • MaureenZOU/m3-spatial - [ICLR 2025] Official Implementation of M3: 3D-Spatial Multimodal Memory

  • ika-rwth-aachen/ros2_unbag - A ROS 2 tool for exporting bags to human readable files. Supports pluggable export routines to handle any message type.

  • NVlabs/vla0 - VLA-0: Building State-of-the-Art VLAs with Zero Modification

  • realsee-developer/RealSee3D - RealSee3D: A multi-view RGB-D dataset combining real-world captures and procedurally generated scenes, with extensible annotations for diverse 3D vision research.

  • Galery23/SAGE-3D_Official - This is the official repository of the paper "Towards Physically Executable 3D Gaussian for Embodied Navigation".

  • ZHUANGHP/Any-SSR - This is the official code for Any-SSR "Analytic Subspace Routing: How Recursive Least Squares Works in Continual Learning of Large Language Model"

  • Ericonaldo/visual_wholebody - Train a loco-manipulation dog with RL

  • facebookresearch/spider - A general physics-based retargeting framework.

  • WzcTHU/SeeNav-Agent -

  • hanruihua/ir-sim - A Python-based lightweight robot simulator designed for navigation, control, and learning

  • Any-4D/Any4D - Any4D: Unified Feed-Forward Metric 4D Reconstruction

  • cmjang/InternNav-deploy - Edge deployment guide for InternNav-based perception and navigation on Unitree Go2 / Go2W / B2 robots (ROS 2, RealSense, Python).

  • nvidia-isaac/WBC-AGILE - Whole Body Control for humanoids: AGILE

  • Xian-Bei/TALO - Pushing 3D Vision Foundation Models Towards Globally Consistent Online Reconstruction

  • ContinualAI/avalanche - Avalanche: an End-to-End Library for Continual Learning based on PyTorch.

  • arclab-hku/P2M - [RA-L'25] A Simple LiDAR-centric End-to-end Navigation Framework in Dynamic Environments

  • 3DAgentWorld/VGGT4D - The official implementation of the paper “VGGT4D: Mining Motion Cues in Visual Geometry Transformers for 4D Scene Reconstruction.”

  • Mayankm96/isaac-spinning-up - Educational Resource for Isaac Lab

  • open-gigaai/giga-brain-0 - GigaBrain-0: A World Model-Powered Vision-Language-Action Model

  • wang-kevin3290/scaling-crl -

  • fanegg/Human3R - A unified model for 4D human-scene reconstruction

  • co-me-tokens/CoMe - [CVPR 26] Release repo of our work "Co-Me: Confidence-Guided Token Merging for Visual Geometric Transformers"

  • BIT-DYN/OpenGraph - [RAL 2024] OpenGraphs: Open-Vocabulary Hierarchical 3D Scene Graphs in Large-Scale Outdoor Environments

  • amazon-far/TWIST2 - [arXiv 2025] TWIST2: Scalable, Portable, and Holistic Humanoid Data Collection System

  • Hilti-Research/hilti-trimble-slam-challenge-2026 - 360 Visual-Inertial Benchmark with Floor Plan Priors for SLAM and Localization

  • leggedrobotics/pace-sim2real - PACE: A systematic approach for sim-to-real transfer of legged robots, identifying actuator and joint dynamics with standard joint encoders.

  • xuxw98/Online3D - [CVPR 2024] Memory-based Adapters for Online 3D Scene Perception

  • LeCAR-Lab/HDMI -

  • EGalahad/sim2real -

  • KumarRobotics/RT-GuIDE - [RA-L 2025] RT-GuIDE: Real-Time Gaussian Splatting for Information-Driven Exploration

  • dcharatan/pixelsplat - [CVPR 2024 Oral, Best Paper Runner-Up] Code for "pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction" by David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann

  • Motphys/MotrixLab - A general-purpose machine learning architecture designed for robot training

  • concept-graphs/concept-graphs - Official code release for ConceptGraphs

  • UnrealZoo/unrealzoo-gym - [ICCV 2025 Highlights] Large-scale photo-realistic virtual worlds for embodied AI

  • rossning92/helicopter-rl - Train a reinforcement learning agent (PPO) to play a retro helicopter arcade game using Stable-Baselines3 and a custom Gymnasium environment.

  • AMAP-EAI/SocialNav - Official implementation for "SocialNav: Training Human-Inspired Foundation Model for Socially-Aware Embodied Navigation"

  • Jonnyffeler/OutdoorSceneGraph -

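  • WEIFENG2333/VideoCaptioner - 🎬 Kaka Subtitle Assistant | VideoCaptioner - An LLM-based intelligent subtitle assistant covering the full workflow of subtitle generation, sentence segmentation, correction, and translation. An LLM-powered tool for easy and efficient video subtitling.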

  • ymy-k/Hi-SAM - [IEEE TPAMI] Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation

  • FlagOpen/RoboCOIN - RoboCoin + LeRobot integration

  • facebookresearch/sam-3d-body - The repository provides code for running inference with the SAM 3D Body Model (3DB), links for downloading the trained model checkpoints and datasets, and example notebooks that show how to use the model.

  • lovelyyoshino/TradingAgentForMarket -

  • facebookresearch/sam-3d-objects - SAM 3D Objects

  • AIR-DISCOVER/FreeAskWorld - [AAAI 2026 Oral] FreeAskWorld is an interactive simulation framework that integrates large language models (LLMs) for high-level planning and socially grounded interaction in embodied AI.

  • linglingxiansen/SocialNav-Map -

  • linglingxiansen/MapNav -

  • Maxwell-Zhao/RoboSimGS - Code for [RA-L] High-Fidelity Simulated Data Generation for Real-World Zero-Shot Robotic Manipulation Learning with Gaussian Splatting

  • Chenkehan21/svm-nav -

  • fastgs/FastGS - [CVPR 2026] Official code for "FastGS: Training 3D Gaussian Splatting in 100 Seconds"

  • agrimgupta92/derl - Code for "Embodied Intelligence via Learning and Evolution", Gupta et al, Nature Communications

  • GREAT-WHU/MASt3R-Fusion - Integrating Feed-Forward Visual Model with IMU, GNSS for High-Functionality SLAM.

  • mrwangyou/SCOPE - Official repository of "Expand Your SCOPE, Semantic Cognition Over Potential-based Exploration for Embodied Visual Navigation"

  • Livioni/OmniVGGT-official - [CVPR 2026 MAIN] OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Transformer

  • sair-lab/AirRoom - [CVPR 2025] AirRoom: Objects Matter in Room Reidentification

  • ByteDance-Seed/Depth-Anything-3 - Depth Anything 3

  • JIEKE66633/One-click-cleaning-of-C-drive - Safely and efficiently clean leftover files and junk from the C drive with a single click, with no risk to your computer

  • LeapLabTHU/AdaptiveNN - [Nature Machine Intelligence 2025] Emulating Human-like Adaptive Vision for Efficient and Flexible Machine Visual Perception

  • zhoubohan0/NOLO - [IROS 2025 oral] Official implementation of NOLO: Navigate Only Look Once

  • zhaozijie2022/m3w-marl - Official implementation of the paper "Learning and Planning Multi-Agent Tasks via a MoE-based World Model"

  • HybridRobotics/whole_body_tracking -

  • MrZihan/Dynam3D - Official implementation of "Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation" (NeurIPS'25 Oral)

  • facebookresearch/Online-3DGS-Monocular - Code repo for the SIGGRAPH paper "Monocular Online Reconstruction with Enhanced Detail Preservation". Project page: https://poiw.github.io/MODP/index.html

  • wsakobe/TrackVLA - [CoRL 2025] Repository relating to "TrackVLA: Embodied Visual Tracking in the Wild"

  • unified-force/UniFP - CoRL2025 UniFP: Learning a Unified Policy for Position and Force Control in Legged Loco-Manipulation

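  • 666ghj/BettaFish - Weiyu (微舆): a multi-agent public opinion analysis assistant that anyone can use. It breaks through information cocoons, restores the true picture of public opinion, predicts future trends, and supports decision-making. Implemented from scratch with no framework dependencies.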

  • worldbench/3EED - [NeurIPS 2025 DB Track] 3EED: Ground Everything Everywhere in 3D

  • DAVIAN-Robotics/ACG - Code for "ACG: Action Coherence Guidance for Flow-based Vision-Language-Action Models" (ICRA 2026)

  • newton-physics/newton - An open-source, GPU-accelerated physics simulation engine built upon NVIDIA Warp, specifically targeting roboticists and simulation researchers.

  • Ma-Zhuang/OmniNWM - OmniNWM: Omniscient Navigation World Models for Autonomous Driving

  • cshizhe/VLN-DUET - Official implementation of Think Global, Act Local: Dual-scale GraphTransformer for Vision-and-Language Navigation (CVPR'22 Oral).

  • lovelyyoshino/VLFM-Commit - VLFM adapted to CUDA 11.8 and habitat-sim 0.2.4, with detailed explanatory code comments

  • Fudan-MAGIC-Lab/VINGS-Mono - Source code for [TRO2025] VINGS-Mono: Visual Inertial Gaussian Splatting Monocular SLAM in Large Scenes.

  • deepseek-ai/DeepSeek-OCR - Contexts Optical Compression

  • aubingazhib/LightGlueStick - a Fast and Robust Glue for Joint Point-Line Matching

  • ReinFlow/ReinFlow - [NeurIPS 2025] Flow x RL. "ReinFlow: Fine-tuning Flow Policy with Online Reinforcement Learning". Support VLAs e.g., pi0, pi0.5. Fully open-sourced.

  • woyut/NavQ_ICCV25 - Implementation of "NavQ: Learning a Q-Model for Foresighted Vision-and-Language Navigation" (ICCV 2025)

  • physical-superintelligence-lab/Humanoid-Everyday - Humanoid dataset for learning

  • NHirose/OmniVLA - Official repository for OmniVLA training and inference code

  • IRMVLab/I2PNet - [TRO 2025] Codes for "End-to-end 2D-3D Registration between Image and LiDAR Point Cloud for Vehicle Localization"

  • MobiusLqm/MoDGS - Official Implementation of paper accepted by ICLR2025-MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors

  • xieyuser/UniGS - Unified Geometry-Aware Gaussian Splatting for Multimodal Rendering

  • imlixinyang/FlashWorld - Code for "FlashWorld: High-quality 3D Scene Generation within Seconds" (ICLR 2026 Oral)

  • OpenHelix-Team/Spatial-Forcing - Official implementation of Spatial-Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model [ICLR2026]

  • starVLA/starVLA - StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing

  • jzhzhang/Uni-NaVid - [RSS 2025] Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks.

  • YanjieZe/awesome-humanoid-robot-learning - A Paper List for Humanoid Robot Learning.

  • karpathy/nanochat - The best ChatGPT that $100 can buy.

  • Inception3D/TTT3R - A simple state update rule to enhance length generalization for CUT3R

  • Zxy-MLlab/LIBERO-PRO - The official repository of LIBERO-PRO, an evaluation extension of the original LIBERO benchmark

  • Eku127/habitat-data-collector - Habitat-based tools for dynamic arrangement and data recording

  • geyan21/ManiFlow_Policy - [CoRL 2025] ManiFlow: A General Robot Manipulation Policy via Consistency Flow Training

  • MIV-XJTU/JanusVLN - [ICLR2026] Official implementation for "JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation"

  • WECENG/ticket-purchase - Automated ticket grabbing for Damai (大麦), with support for selecting attendees, city, date and session, and price

  • fscdc/RewardMap - [ICLR 2026] RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning

  • OpenHelix-Team/VLA-RFT - VLA-RFT: Vision-Language-Action Models with Reinforcement Fine-Tuning

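  • luohongk/Embodied-AI-Daily - 📚 This repository collects arXiv papers on VLN, VLA, World Model, SLAM, Gaussian Splatting, nonlinear optimization, and related topics. It is updated automatically every day! The issues section lists the 10 latest papers.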

  • Tsinghua-MARS-Lab/SLAM-Former - SLAM-Former: Putting SLAM into One Transformer

  • jmanhype/vggt-mps - VGGT 3D Vision Agent optimized for Apple Silicon with Metal Performance Shaders

  • AIGeeksGroup/Nav-R1 - Nav-R1: Reasoning and Navigation in Embodied Scenes

  • Alibaba-NLP/DeepResearch - Tongyi Deep Research, the Leading Open-source Deep Research Agent

  • InternRobotics/InternVLA-A1 - InternVLA-A1: Unifying Understanding, Generation, and Action for Robotic Manipulation

  • BIT-DYN/omnimap - [TRO 2025] OmniMap: A General Mapping Framework Integrating Optics, Geometry, and Semantics

  • InternRobotics/InternVLA-M1 - InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy

  • facebookresearch/map-anything - MapAnything: Universal Feed-Forward Metric 3D Reconstruction

  • RUC-NLPIR/FlashRAG - ⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)

  • Vid2Sim/Vid2Sim - [CVPR 25] Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation

  • manycore-research/SpatialGen - [3DV 2026] SpatialGen: Layout-guided 3D Indoor Scene Generation

  • PRIME-RL/SimpleVLA-RL - [ICLR 2026] SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

  • NJU-3DV/SpatialVID - [CVPR 2026] SpatialVID: A Large-Scale Video Dataset with Spatial Annotations

  • AIGeeksGroup/3D-R1 - 3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding

  • wenhuiwei-ustc/BotVIO -

  • mystorm16/FastVGGT - [ICLR 2026] FastVGGT: Fast Visual Geometry Transformer

  • vllm-project/vllm - A high-throughput and memory-efficient inference and serving engine for LLMs

  • zhangganlin/vista-slam - [3DV 2026] ViSTA-SLAM: Visual SLAM with Symmetric Two-view Association

  • stepfun-ai/Step-Audio2 - Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

  • OpenHelix-Team/LLaVA-VLA - LLaVA-VLA: A Simple Yet Powerful Vision-Language-Action Model [ICRA 2026]

  • JiuTian-VL/CogVLA - [NeurIPS 2025] CogVLA: Cognition-Aligned Vision-Language-Action Models via Instruction-Driven Routing & Sparsification

  • Heathcliff-saku/BSC-Nav - This repository is the official implementation of our paper (From reactive to cognitive: brain-inspired spatial intelligence for embodied agents)

  • LetheSec/HuggingFace-Download-Accelerator - High-speed downloads from a mirror site using HuggingFace's official download tool.

  • vuer-ai/vuer - Vuer is a 3D visualization tool for robotics and VR applications.

  • Tencent-Hunyuan/Hunyuan-GameCraft-1.0 - Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition

  • Tokishx/DifNav - This is the source code for the paper “DAgger Diffusion Navigation: DAgger Boosted Diffusion Policy for Vision-Language Navigation”.

  • cvg/FrontierNet - [RA-L 2025] FrontierNet: Learning Visual Cues to Explore

  • CrystalSixone/VLN_CLASH - This is the official repository for VLN-CLASH.

  • sgl-project/sglang - SGLang is a high-performance serving framework for large language models and multimodal models.

  • OpenGalaxea/GalaxeaVLA - Galaxea's open-source VLA repository

  • ziyan-xiaoyu/SpatialMQA -

  • haotian-liu/LLaVA - [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

  • Stanford-TML/HEAD_rl_deploy - Official implementation of HEAD CoRL 2025

  • Zhoues/RoboRefer - [NeurIPS 2025] Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics"

  • openai/gpt-oss - gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

  • GuHuangAI/LaDiWM - code for CoRL2025 "LaDiWM: A Latent Diffusion-based World Model for Predictive Manipulation"

  • unique1i/SceneSplat - [ICCV 2025 Oral] SceneSplat - Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining

  • wangyr22/DepthGS - Official implementation of IROS 2025 paper Pseudo Depth Meets Gaussian: A Feed-forward RGB SLAM Baseline

  • sapientinc/HRM - Hierarchical Reasoning Model Official Release

  • dfki-ric/better_launch - A better replacement for the ROS2 launch system: intuitive, simple, memorable.

  • chengine/splatnav -

  • maturk/dn-splatter - DN-Splatter + AGS-Mesh: Depth and Normal Priors for Gaussian Splatting

  • Tencent-Hunyuan/HunyuanWorld-1.0 - Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels with Hunyuan3D World Model

  • Feliciaxyao/NavMorph - Official implementation of NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments (ICCV'25).

  • NVlabs/Long-RL - Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)

  • RayFronts/RayFronts - [IROS'25] Source code for "RayFronts: Open-Set Semantic Ray Frontiers for Online Scene Understanding and Exploration"

  • ShaohonChen/Qwen3-SmVL - Splices the SmolVLM2 vision head onto the Qwen3-0.6B model and fine-tunes the combination

  • yyfz/Pi3 - [ICLR 2026] π^3: Permutation-Equivariant Visual Geometry Learning

  • wzzheng/StreamVGGT - [ICLR 2026] Streaming 4D Visual Geometry Transformer

  • leandro-svg/HybridTrack - [RA-L25/ICRA26] HybridTrack: A Hybrid Approach for Robust Multi-Object Tracking

  • wencan25/Fast3D - [ACM MM 2025] Fast3D: Accelerating 3D Multi-modal Large Language Models for Efficient 3D Scene Understanding

  • Selen-Suyue/MBA - [RA-L 2025 & ICRA 2026] 😽 Motion Before Action: Diffusing Object Motion as Manipulation Condition

  • lisj575/GaussianUDF - Code Release for CVPR (2025), "GaussianUDF: Inferring Unsigned Distance Functions through 3D Gaussian Splatting"

  • ColinQiyangLi/qc -

  • facebookresearch/hydra - Hydra is a framework for elegantly configuring complex applications

  • DengKaiCQ/VGGT-Long - Official implementation of VGGT-Long

  • Zhangwenyao1/DreamVLA - [NeurIPS 2025] DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge

  • Sirui-Xu/InterMimic - [CVPR 2025 Highlight] InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions

  • HorizonRobotics/EmbodiedGen - Towards a Generative 3D World Engine for Embodied Intelligence

  • yang-zj1026/legged-loco - Low-level locomotion policy training in Isaac Lab

  • NVlabs/VILA - VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

  • bytedance/F-16 - F-16 is a powerful video large language model (LLM) that perceives high-frame-rate videos, which is developed by the Department of Electronic Engineering at Tsinghua University and ByteDance.

  • InternRobotics/StreamVLN - [ICRA 2026] Official implementation of the paper: "StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling"

  • AnjieCheng/NaVILA - [RSS'25] This repository is the implementation of "NaVILA: Legged Robot Vision-Language-Action Model for Navigation"

  • LiteReality/LiteReality - [NeurIPS 2025] LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans

  • WHU-USI3DV/PatchAugNet - PatchAugNet: Patch feature augmentation-based heterogeneous point cloud place recognition in large-scale street scenes

  • InternRobotics/AnySplat - [SIGGRAPH Asia 2025 (ACM TOG)] AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views

  • OpenGVLab/InternVL - [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model with performance approaching GPT-4o.

  • FlagOpen/RoboBrain2.5 - RoboBrain 2.5: Advanced version of RoboBrain. Depth in Sight, Time in Mind. 🎉🎉🎉

  • unitreerobotics/unitree_rl_lab - This is a repository for reinforcement learning implementation for Unitree robots, based on IsaacLab.

  • THU-SI/LangScene-X - [ICCV 2025] LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion

  • OpenDriveLab/DetAny3D - [ICCV 2025] Detect Anything 3D in the Wild

  • avlmaps/AVLMaps - [ISER 2023] The official implementation of Audio Visual Language Maps for Robot Navigation

  • youjie-zhou/FMF-SLAM -

  • google-research/valan - Vision and Language Agent Navigation

  • InternRobotics/CronusVLA - [AAAI26 oral] CronusVLA: Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling

  • zst1406217/VR-Robo - [RA-L 2025] VR-Robo: A Real-to-Sim-to-Real Framework for Visual Robot Navigation and Locomotion

  • ML-GSAI/LLaDA-V -

  • MIT-SPARK/Clio -

  • hovsg/HOV-SG - [RSS2024] Official implementation of "Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation"

  • aau-cns/radar_transformer - Transformer-based deep learning architecture for 3D point matching in sparse radar point clouds

  • iMoonLab/yolov13 - Implementation of "YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception".

  • nianticlabs/marepo - [CVPR 2024 Highlight] Map-Relative Pose Regression for Visual Re-Localization

  • ahydchh/Impromptu-VLA -

  • ut-amrl/creste_public - [RSS 2025] CREStE: Scalable Mapless Navigation with Internet Scale Priors and Counterfactual Guidance

  • JohannaXie/GauSS-MI - [RSS 2025] GauSS-MI: Gaussian Splatting Shannon Mutual Information for Active 3D Reconstruction

  • hnuzhy/YOTO - [RSS2025] Code for my paper "You Only Teach Once: Learn One-Shot Bimanual Robotic Manipulation from Video Demonstrations"

  • Qi-Zhangyang/GPT4Scene-and-VLN-R1 - GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models

  • tsinghua-fib-lab/Mem4Nav - Mem4Nav: Boosting Vision-and-Language Navigation in Urban Environments with a Hierarchical Spatial-Cognition Long-Short Memory System

  • PRBonn/PINGS - 📌 PINGS: Gaussian Splatting Meets Distance Fields within a Point-Based Implicit Neural Map [RSS' 25]

  • Tencent/DepthCrafter - [CVPR 2025 Highlight] DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

  • siyuhsu/vla-cache - [NeurIPS 2025] VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching

  • LMCache/LMCache - Supercharge Your LLM with the Fastest KV Cache Layer

  • openai/openai-cs-agents-demo - Demo of a customer service use case implemented with the OpenAI Agents SDK

  • 3DTopia/MaterialAnything - [CVPR 2025 Highlight] Material Anything: Generating Materials for Any 3D Object via Diffusion

  • LeCAR-Lab/ASAP - [RSS 2025] "ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills"

  • zhaihongjia/PanoGS - [CVPR 2025] PanoGS: Gaussian-based Panoptic Segmentation for 3D Open Vocabulary Scene Understanding

  • GeeeekExplorer/nano-vllm - Nano vLLM

  • Fediory/HVI-CIDNet - [CVPR2025 && NTIRE2025] HVI: A New Color Space for Low-light Image Enhancement (Official Implementation)

  • isaac-sim/IsaacSim - NVIDIA Isaac Sim™ is an open-source application on NVIDIA Omniverse for developing, simulating, and testing AI-driven robots in realistic virtual environments.

  • jzhzhang/3DAwareNav - [CVPR 2023] We propose a framework for the challenging 3D-aware ObjectNav based on two straightforward sub-policies. The two sub-policies, namely the corner-guided exploration policy and the category-aware identification policy, run simultaneously, using online fused 3D points as observations.

  • buaa-colalab/OctoNav-R1 - Code for OctoNav-Bench and OctoNav-R1

  • Fanqi-Lin/OneTwoVLA - Official implementation of "OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning"

  • realcrane/3D-student-splatting-and-scooping - This is the source code of our CVPR 2025 Best Paper Honourable Mention paper: 3D Student Splatting and Scooping

  • microsoft/qlib - Qlib is an AI-oriented Quant investment platform that aims to use AI tech to empower Quant Research, from exploring ideas to implementing productions. Qlib supports diverse ML modeling paradigms, including supervised learning, market dynamics modeling, and RL, and is now equipped with https://github.com/microsoft/RD-Agent to automate R&D process.

  • agent0ai/agent-zero - Agent Zero AI framework

  • Shubhamsaboo/awesome-llm-apps - Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and opensource models.

  • facebookresearch/habitat-lab - A modular high-level library to train embodied AI agents across a variety of tasks and environments.

  • karpathy/minGPT - A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

  • allenzren/open-pi-zero - Re-implementation of pi0 vision-language-action (VLA) model from Physical Intelligence

  • Physical-Intelligence/real-time-chunking-kinetix - Simulated experiments for "Real-Time Execution of Action Chunking Flow Policies".

  • B0B8K1ng/WMNavigation - [IROS'25 Oral] WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation

  • Physical-Intelligence/openpi -

  • nunchaku-ai/nunchaku - [ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

  • InternRobotics/NavDP - Official implementation of the paper: "NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance"

  • GeWu-Lab/AnyTouch - The repo for "AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors", ICLR 2025

  • JIA-Lab-research/LISA - Project Page for "LISA: Reasoning Segmentation via Large Language Model"

  • InternRobotics/InternUtopia - A simulation platform for versatile Embodied AI research and developments.

  • JunweiLiang/awesome_lists - Awesome Lists for Tenure-Track Assistant Professors and PhD students. (A survival guide for assistant professors and PhD students)

  • Zeying-Gong/Falcon - Official Code for "From Cognition to Precognition: A Future-Aware Framework for Social Navigation" (ICRA 2025)

  • Zeying-Gong/ascent - [RAL'26] Stairway to Success: An Online Floor-Aware Zero-Shot Object-Goal Navigation Framework via LLM-Driven Coarse-to-Fine Exploration

  • GradientSpaces/Rectified-Point-Flow - [NeurIPS 2025, Spotlight] Rectified Point Flow: Generic Point Cloud Pose Estimation

  • THU-SI/Spatial-MLLM - [NeurIPS 2025] Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

  • openvla/openvla - OpenVLA: An open-source vision-language-action model for robotic manipulation.

  • Eku127/DualMap - [RAL-25] An online open-vocabulary mapping system that enables natural language querying to navigate dynamic scenes, with ROS support.

  • MIT-SPARK/VGGT-SLAM - VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold

  • lichunshang/deep_ekf_vio -

  • SunYangtian/UniGeo - UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation

  • Zie619/n8n-workflows - All of the n8n workflows I could find (including those from the site itself)

  • Paper2Poster/Paper2Poster - [NeurIPS 2025 D&B] Open-source Multi-agent Poster Generation from Papers

  • resemble-ai/chatterbox - SoTA open-source TTS

  • VITA-Group/VLM-3R - [CVPR 2026] VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction

  • NMS05/DinoV2-SigLIP-Phi3-LoRA-VLM -

  • Fosowl/agenticSeek - Fully Local Manus AI. No APIs, No $200 monthly bills. Enjoy an autonomous agent that thinks, browses the web, and codes for the sole cost of electricity. 🔔 Official updates only via twitter @Martin993886460 (Beware of fake accounts)

  • YanyuanQiao/Open-Nav - [ICRA 2025] Official implementation of Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs

  • facebookresearch/habitat-challenge - Code for the habitat challenge

  • DreamTechAI/Direct3D-S2 - [NeurIPS 2025] Direct3D‑S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention

  • AILab-CVC/YOLO-World - [CVPR 2024] Real-Time Open-Vocabulary Object Detection

  • GengzeZhou/NavGPT-2 - [ECCV 2024] Official implementation of NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models

  • hojonathanho/diffusion - Denoising Diffusion Probabilistic Models

  • AUTOMATIC1111/stable-diffusion-webui - Stable Diffusion web UI

  • hanruihua/neupan_ros - ROS Wrapper of NeuPAN planner

  • AgibotTech/agibot_x1_train - The reinforcement learning training code for AgiBot X1.

  • unitreerobotics/unitree_rl_gym -

  • siyuanliii/masa - Official Implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything

  • OpenDriveLab/UniVLA - [RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions

  • JiuhaiChen/BLIP3o - Official implementation of BLIP3o-Series

  • apple/ml-fastvlm - This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025

  • xming521/WeClone - 🚀 One-stop solution for creating your AI twin from chat history 💡 Fine-tune LLMs with your chat logs to capture your unique style, then bind to a chatbot to bring your digital self to life.

  • bagh2178/UniGoal - [CVPR 2025] UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

  • IDEA-Research/GroundingDINO - [ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

  • harry0703/MoneyPrinterTurbo - Generate high-definition short videos with one click using large AI models.

  • QitaoZhao/DiffusionSfM - [CVPR 2025] "DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion" official implementation.

  • Brummi/anycam - Official repository for "AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos" (CVPR 2025)

  • lyp-deeplearning/LiftFeat - Code for "LiftFeat: 3D Geometry-Aware Local Feature Matching", ICRA2025

  • huggingface/nanoVLM - The simplest, fastest repository for training/finetuning small-sized VLMs.

  • gradslam/gradslam - gradslam is an open source differentiable dense SLAM library for PyTorch

  • MrZihan/GridMM - Official implementation of GridMM: Grid Memory Map for Vision-and-Language Navigation (ICCV'23).

  • liangpan99/TokenHSI - [CVPR 2025 Oral] TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization

  • MrZihan/HNR-VLN - Official implementation of Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation (CVPR'24 Highlight).

  • natolambert/rlhf-book - Textbook on reinforcement learning from human feedback

  • lllyasviel/FramePack - Lets make video diffusion practical!

  • cshizhe/onav_rim -

  • DefaultRui/BEV-Scene-Graph - [ICCV23] Bird’s-Eye-View Scene Graph for Vision-Language Navigation

  • chen-judge/MapGPT - [ACL 24] The official implementation of MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation.

  • FarInHeight/To-Match-or-Not-to-Match - Official code for "To Match or Not to Match: Revisiting Image Matching for Reliable Visual Place Recognition" CVPR IMW 2025

  • amaralibey/MixVPR - MixVPR: Feature Mixing for Visual Place Recognition (WACV 2023)

  • amaralibey/gsv-cities - GSV-Cities: a large-scale dataset for visual place recognition

  • jiangxinke/Agentic-RAG-R1 - Agentic RAG R1 Framework via Reinforcement Learning

  • NVIDIA-AI-IOT/ros2_nanollm - ROS2 nodes for LLM, VLM, VLA

  • ZiYang-xie/WorldGen - 🌍 WorldGen - Generate Any 3D Scene in Seconds

  • facebookresearch/flow_matching - A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.

  • ybgdgh/L3MVN - Leveraging Large Language Models for Visual Target Navigation

  • ibaiGorordo/vggt-pytorch-inference - Repository for running the VGGT model in PyTorch

  • facebookresearch/nwm - Official code for the CVPR 2025 paper "Navigation World Models".

  • DefaultRui/VLN-VER - [CVPR24] Volumetric Environment Representation for Vision-Language Navigation

  • pablovela5620/mini-dpvo -

  • EricTan7/RAM - [CVPR2025] Official implementation of RAM

  • Jirl-upenn/VLMnav - End-to-End Navigation with VLMs

  • lllyasviel/ControlNet - Let us control diffusion models!

  • naokiyokoyama/ovon - Open Vocabulary Object Navigation

  • bdaiinstitute/vlfm - The repository provides code associated with the paper VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation (ICRA 2024)

  • isaac-sim/IsaacLab - Unified framework for robot learning built on NVIDIA Isaac Sim

  • NVlabs/HOVER - HOVER

  • cvlab-kaist/ZeroCo - CVPR 2025 (Highlight) : Official implementation of "Cross-View Completion Models are Zero-shot Correspondence Estimators"

  • SWE-agent/SWE-agent - SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]

  • KTH-RPL/OneMap - [ICRA'25] One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation

  • isri-aist/RoboManipBaselines - A software framework integrating various imitation learning methods and benchmark environments for robotic manipulation

  • AlbertoJaenal/MapAbstractionVPR - Implementation for Image database abstraction

  • THU-SI/VideoScene - [CVPR 2025 Highlight] VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step

  • subframe7536/maple-font - Maple Mono: Open source monospace font with round corners, ligatures and Nerd-Font icons for IDE and terminal, with a perfect 2:1 Chinese-to-English character width ratio and fine-grained customization options.

  • sintel-dev/Orion - Unsupervised time series anomaly detection library

  • lus6-Jenny/RINGSharp - [IEEE T-RO 2025] RING#: PR-by-PE Global Localization with Roto-translation Equivariant Gram Learning.

  • yuliangguo/depth_any_camera - [CVPR 2025] Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera

  • SpatialVLA/SpatialVLA - 🔥 SpatialVLA: a spatial-enhanced vision-language-action model that is trained on 1.1 Million real robot episodes. Accepted at RSS 2025.

  • apple/ml-matrix3d - [CVPR 2025 Highlight] Matrix3D: Large Photogrammetry Model All-in-One

  • FlagOpen/RoboBrain - [CVPR 2025] RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete. Official Repository.

  • arajv/SayNav - Grounding Large Language Models for Dynamic Planning to Navigation in New Environments

  • BAAI-DCAI/SpatialBot - The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models".

  • facebookresearch/RAM - A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).

  • honghd16/GSA-VLN - Official repository of General Scene Adaptation for Vision-and-Language Navigation (ICLR'2025)

  • MarSaKi/ETPNav - [TPAMI 2024] Official repo of "ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments"

  • Chenkehan21/CA-Nav-code -

  • CrystalSixone/VLN-GOAT - Repository for Vision-and-Language Navigation via Causal Learning (Accepted by CVPR 2024)

  • GAMMA-UMD-Outdoor-Navigation/BehAV - BehAV: Behavioral Rule Guided Autonomy Using VLM for Robot Navigation in Outdoor Scenes (ICRA'25)

  • dillonloh/AdaVLN - IsaacSim Extension for Dynamic Objects in Matterport3D Environments for AdaVLN research

  • vlmaps/vlmaps - [ICRA2023] Implementation of Visual Language Maps for Robot Navigation

  • GradientSpaces/WildGS-SLAM - [CVPR 2025] WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments

  • SakanaAI/AI-Scientist-v2 - The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

  • zd11024/NaviLLM - [CVPR 2024] The code for paper 'Towards Learning a Generalist Model for Embodied Navigation'

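  • LlamaFamily/Llama-Chinese - Llama Chinese community: a real-time roundup of the latest Llama learning resources, building the best open-source ecosystem for Chinese Llama large models; fully open source and available for commercial use.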

  • RoboVerseOrg/RoboVerse - RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning

  • yuancaimaiyi/collaborationSfM - Crowdsourced SfM

  • hanruihua/NeuPAN - [TRO 2025] NeuPAN: Direct Point Robot Navigation with End-to-End Model-based Learning.

  • NVIDIAGameWorks/kaolin - A PyTorch Library for Accelerating 3D Deep Learning Research

  • wyf3/llm_related - Reproductions of LLM-related algorithms, plus some study notes

  • rvp-group/Splat-LOAM - [ICCV 25] 2D Gaussian Splatting based LiDAR Odometry And Mapping

  • MAC-VO/MAC-VO - [ICRA 2025 Best Paper] MAC-VO: Metrics-aware Covariance for Learning-based Stereo Visual Odometry

  • ffrivera0/reloc3r - [CVPR 2025] Relative camera pose estimation and visual localization with Reloc3r

  • lpiccinelli-eth/UniK3D - [CVPR 2025] UniK3D: Universal Camera Monocular 3D Estimation

  • rerun-io/pi0-lerobot -

  • yzqin/dexmv-sim - DexMV: Imitation Learning for Dexterous Manipulation from Human Videos, ECCV 2022

  • LSXI7/MINIMA - [CVPR 2025] MINIMA: Modality Invariant Image Matching

  • om-ai-lab/VLM-R1 - Solve Visual Understanding with Reinforced VLMs

  • shengjun-zhang/GGN - [NeurIPS 2024] Gaussian Graph Network: Learning Efficient and Generalizable Gaussian Representations from Multi-view Images

  • mindverse/Second-Me - Train your AI self, amplify you, bridge the world

  • yanyan-li/4DGS-SLAM - Instead of removing dynamic objects as distractors and reconstructing only static environments, this paper proposes an efficient architecture that incrementally tracks camera poses and establishes the 4D Gaussian radiance fields in unknown scenarios by using a sequence of RGB-D images.

  • manycore-research/SpatialLM - [NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling

  • nv-tlabs/3dgrut - Ray tracing and hybrid rasterization of Gaussian particles

  • nianticlabs/ace - [CVPR 2023 - Highlight] Accelerated Coordinate Encoding (ACE): Learning to Relocalize in Minutes using RGB and Poses

  • sunfanyunn/LayoutVLM - Official code for "LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models" (CVPR 2025)

  • roomtour3d/roomtour3d-NaviLLM - [CVPR 2025] RoomTour3D - Geometry-aware, cheap and automatic data from web videos for embodied navigation

  • open-mmlab/OpenPCDet - OpenPCDet Toolbox for LiDAR-based 3D Object Detection.

  • PengYu-Team/GEODE_dataset - Extending the Robustness of LiDAR SLAM to Geometrically Degenerate Scenarios

  • PRBonn/kiss-slam - A LiDAR SLAM system that just works

  • VSLAM-LAB/VSLAM-LAB - A Comprehensive Framework for Visual SLAM Systems and Datasets

  • HCI-LMC/VLN-SUSA - [AAAI 2026] Official code for "Agent Journey Beyond RGB: Unveiling Hybrid Semantic-Spatial Environmental Representations for Vision-and-Language Navigation"

  • QVPR/Patch-NetVLAD - Code for the CVPR2021 paper "Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition"

  • Xiaoming-Zhao/PointNav-VO - [ICCV 2021] Official implementation of "The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation"

  • jiachenzhu/DyT - Code release for DynamicTanh (DyT)

  • HKUDS/AI-Researcher - [NeurIPS2025] "AI-Researcher: Autonomous Scientific Innovation" -- A production-ready version: https://novix.science/chat

  • facebookresearch/vggt - [CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer

  • LeeBY68/Hier-SLAM - 🌳 [ICRA'25] Hier-SLAM: Semantic Gaussian Splatting SLAM with Hierarchical Categorical Representation

  • graphdeco-inria/hierarchical-3d-gaussians - Official implementation of the SIGGRAPH 2024 paper "A Hierarchical 3D Gaussian Representation for Real-Time Rendering of Very Large Datasets"

  • ali-vilab/MangaNinjia - [CVPR 2025 Highlight] Official implementation of "MangaNinja: Line Art Colorization with Precise Reference Following"

  • FoundationVision/GLEE - [CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale

  • InternRobotics/EmbodiedScan - [CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

  • ikaijua/Awesome-AITools - Collection of AI-related utilities. Pull requests are welcome.

  • NVlabs/FoundationStereo - [CVPR 2025 Best Paper Nomination] FoundationStereo: Zero-Shot Stereo Matching

  • THU-MIG/yoloe - YOLOE: Real-Time Seeing Anything [ICCV 2025]

  • NVlabs/curobo - CUDA Accelerated Robot Library

  • whu-lyh/SaliencyI2PLoc - Official code of SaliencyI2PLoc

  • robot-learning-freiburg/LCDNet - PyTorch code for training LCDNet for loop closure detection in LiDAR SLAM. http://rl.uni-freiburg.de/research/lidar-slam-lc

  • MarSaKi/VLN-BEVBert - [ICCV 2023] Official repo of "BEVBert: Multimodal Map Pre-training for Language-guided Navigation"

  • crepuscularlight/SemanticLoopClosure - Master thesis regarding semantic loop closure

  • Ghiara/LEGION - Official implementation of paper on Nature Machine Intelligence: "Preserving and Combining Knowledge in Robotic Lifelong Reinforcement Learning"

  • OpenHands/OpenHands - 🙌 OpenHands: AI-Driven Development

  • Zhefan-Xu/isaac-go2-ros2 - Unitree Go2 simulation platform for testing navigation, decision-making and autonomous tasks. (NVIDIA Isaac/ROS2)

  • YicongHong/Recurrent-VLN-BERT - Code of the CVPR 2021 Oral paper: A Recurrent Vision-and-Language BERT for Navigation

  • gaoxiangjun/Mani-GS - [CVPR 2025] Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh

  • convexsplatting/convex-splatting - [CVPR 2025 - Highlight] Original implementation of "3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes"

  • jzhzhang/NaVid-VLN-CE - [RSS 2024 & RSS 2025] VLN-CE evaluation code of NaVid and Uni-NaVid

  • jacobkrantz/VLN-CE - Vision-and-Language Navigation in Continuous Environments using Habitat

  • JeffLIrion/python-graphslam - Graph SLAM solver in Python

  • Stability-AI/generative-models - Generative Models by Stability AI

  • vdorbala/LGX - Code for LGX (Language Guided Exploration). We use LLMs to perform embodied robot navigation in a zero-shot manner.

  • PKU-VCL-3DV/SLAM3R - [CVPR 2025 Highlight] Real-time dense scene reconstruction with SLAM3R

  • fanegg/Feat2GS - [CVPR2025] Feat2GS: Probing Visual Foundation Models with Gaussian Splatting

  • MrZihan/Sim2Real-VLN-3DFF - Official implementation of Sim-to-Real Transfer via 3D Feature Fields for Vision-and-Language Navigation (CoRL'24).

  • csiro-robotics/Pair-VPR - [IEEE RA-L 2025] The official repository for Pair-VPR: Place-Aware Pre-training and Contrastive Pair Classification for Visual Place Recognition with Vision Transformers

  • facebookresearch/fast3r - [CVPR 2025] Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass

  • XiaohanLei/GaussNav - PyTorch implementation of paper: GaussNav: Gaussian Splatting for Visual Navigation

  • dmar-bonn/active-gs - [RA-L2025] ActiveGS: Active Scene Reconstruction Using Gaussian Splatting

  • WU-CVGL/Omni-Scene - [CVPR2025] Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene Reconstruction

  • liw95/LightLoc - [CVPR2025] LightLoc: Learning Outdoor LiDAR Localization at Light Speed

  • hzxie/GaussianCity - The official implementation of "GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation". (CVPR 2025)

  • showlab/ShowUI - [CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.

  • iris0329/SeeGround - [CVPR'25] SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding

  • pengwangucla/DeLS-3D - The code for DeLS-3D of CVPR 2018

  • rpng/calc - Convolutional Autoencoder for Loop Closure

  • CASIA-LMC-Lab/FastSAM - Fast Segment Anything

  • HKUST-Aerial-Robotics/SG-Reg - [T-RO 2025] SG-Reg: Generalizable and Efficient Scene Graph Registration

  • rmurai0610/MASt3R-SLAM - [CVPR 2025] MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors

  • AirVLN/AirVLN -

  • xuxw98/ESAM - [ICLR 2025, Oral] EmbodiedSAM: Online Segment Any 3D Thing in Real Time

  • GeLuzhou/Dynamic-GSG - [IROS 25] Dynamic 3D Gaussian Scene Graphs for Environment Adaptation

  • fishmarch/ROSTools -

  • sair-lab/AirCode - [RA-L 2022] AirCode: A Robust Object Encoding Method

  • jingyaogong/minimind - 🚀🚀 Train a small 26M-parameter GPT entirely from scratch in just 2 hours! 🌏

  • url-kaist/MambaGlue - MambaGlue: Fast and Robust Local Feature Matching With Mamba @ ICRA'25

  • fraunhoferhhi/AT-GS - Adaptive and Temporally Consistent Gaussian Surfels for Multi-view Dynamic Reconstruction

  • sunsmarterjie/yolov12 - [NeurIPS 2025] YOLOv12: Attention-Centric Real-Time Object Detectors

  • HKUDS/GraphGPT - [SIGIR'2024] "GraphGPT: Graph Instruction Tuning for Large Language Models"

  • luigifreda/pyslam - pySLAM is a hybrid Python/C++ Visual SLAM pipeline supporting monocular, stereo, and RGB-D cameras. It provides a broad set of modern local and global feature extractors, multiple loop-closure strategies, a volumetric reconstruction module, integrated depth-prediction models, and semantic segmentation capabilities for enhanced scene understanding.

  • wangyizhao/PRIOR-SLAM - PRIOR-SLAM: Enabling Visual SLAM for Loop Closure under Large Viewpoint Variations

  • Vision-CAIR/MiniGPT-4 - Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)

  • yuanzhoulvpi2017/vscode_debug_transformers -

  • DavideCatto/XFeat-ONNX -

  • cvg/limap - A toolbox for mapping and localization with line features.

  • BJHYZJ/DovSG - [RA-L 2025] Dynamic Open-Vocabulary 3D Scene Graphs for Long-term Language-Guided Mobile Manipulation

  • yang-zj1026/NaVILA-Bench - Vision-Language Navigation Benchmark in Isaac Lab

  • CUT3R/CUT3R - Official implementation of Continuous 3D Perception Model with Persistent State

  • QwenLM/Qwen3 - Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

  • LYX0501/InstructNav -

  • huggingface/open-r1 - Fully open reproduction of DeepSeek-R1

  • fudan-zvg/DG-SLAM - [NeurIPS 2024] DG-SLAM: Robust Dynamic Gaussian Splatting SLAM with Hybrid Pose Optimization

  • open-webui/open-webui - User-friendly AI Interface (Supports Ollama, OpenAI API, ...)

  • deepseek-ai/DeepSeek-Coder - DeepSeek Coder: Let the Code Write Itself

  • Irvingao/Point-DETR3D - [AAAI 2024] Point-DETR3D: Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-Supervised 3D Object Detection

  • HaoyiZhu/SPA - [ICLR 2025] SPA: 3D Spatial-Awareness Enables Effective Embodied Representation

  • zezhishao/DailyArXiv - Daily ArXiv Papers.

  • microsoft/MoGe - [CVPR'25 Oral] MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision

  • NVlabs/InstantSplat - InstantSplat: Sparse-view SfM-free Gaussian Splatting in Seconds

  • Nanne/pytorch-NetVlad - PyTorch implementation of NetVLAD including training on Pittsburgh.

  • VITA-Group/MM3DGS-SLAM - [IROS 2024] MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements

  • hmz-15/Interactive-Predicate-Learning - InterPreT: Interactive Predicate Learning from Language Feedback for Generalizable Task Planning (RSS 2024)

  • google-deepmind/mujoco_menagerie - A collection of high-quality models for the MuJoCo physics engine, curated by Google DeepMind.

  • GarlanLou/LF-GNSS - LF-GNSS: A Fundamental Framework for Exploring Learning and Filtering Integration in GNSS

  • Aceinna/gnss-ins-sim - Open-source GNSS + inertial navigation, sensor fusion simulator. Motion trajectory generator, sensor models, and navigation

  • modelscope/FunClip - Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.

  • naver/mast3r - Grounding Image Matching in 3D with MASt3R

  • opendilab/LightZero - [NeurIPS 2023 Spotlight] LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios (awesome MCTS)

  • naver/dust3r - DUSt3R: Geometric 3D Vision Made Easy

  • OpenDriveLab/AgiBot-World - [IROS 2025 Best Paper Award Finalist & IEEE TRO 2026] The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems

  • facebookresearch/DiT - Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

  • RoboTwin-Platform/RoboTwin - RoboTwin 2.0 Official Repo

  • deepseek-ai/DeepSeek-V3 -

  • MachinePerceptionLab/QQ-SLAM -

  • YvanYin/Metric3D - The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."

  • modelscope/ms-agent - MS-Agent: a lightweight framework to empower agentic execution of complex tasks

  • Genesis-Embodied-AI/Genesis - A generative world for general-purpose robotics & embodied AI learning.

  • ZexinHe/Neural-LightRig - [CVPR2025] Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion

  • gramuah/ros4vsn - Evaluation of Visual Semantic Navigation Models in Real Robots

  • noodle-lab/GaussianSpa - Project website: https://noodle-lab.github.io/gaussianspa/

  • PDFMathTranslate/PDFMathTranslate - [EMNLP 2025 Demo] AI-based full-text bilingual translation of PDF scientific papers with the layout fully preserved; supports Google/DeepL/Ollama/OpenAI and other services; provides CLI/GUI/MCP/Docker/Zotero

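  • myhhub/stock - Stock toolkit: fetches stock data, computes stock indicators and chip distribution, recognizes stock patterns, and covers comprehensive stock screening, stock-picking strategies, strategy backtesting, and automated trading. Supports PC and mobile devices.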

  • ayoussf/SuperPoint-PrP -

  • PKU-YuanGroup/Open-Sora-Plan - This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.

  • hustvl/DiffusionDrive - [CVPR 2025 Highlight] Truncated Diffusion Model for Real-Time End-to-End Autonomous Driving

  • ispc-lab/HRegNet - [ICCV 2021] HRegNet: A Hierarchical Network for Large-scale Outdoor LiDAR Point Cloud Registration

  • nerfstudio-project/gsplat - CUDA accelerated rasterization of gaussian splatting

  • ranahanocka/point2mesh - Reconstruct Watertight Meshes from Point Clouds [SIGGRAPH 2020]

  • TianxingChen/G3Flow - [CVPR 25] G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation

  • TheBlewish/Automated-AI-Web-Researcher-Ollama - A Python program that turns an LLM running on Ollama into an automated researcher: from a single query it determines focus areas to investigate, runs web searches, scrapes content from relevant websites, and conducts the research on its own, including saving the findings for you.

  • facebookresearch/neuralfeels - Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation

  • ori-drs/oxford_spires_dataset - [IJRR 2025] Lidar-visual dataset with ground truth 3D map for SLAM/NeRF

  • blazzbyte/OpenInterpreterUI - Simplify code execution with Open Interpreter UI Project with Streamlit. A user-friendly GUI for Python, JavaScript, and more. Pay-as-you-go, no subscriptions. Ideal for beginners.

  • fudan-zvg/gaussian-raytracing -

  • ChenYutongTHU/SplatFormer - [ICLR' 25] SplatFormer: Point Transformer for Robust 3D Gaussian Splatting

  • akawincent/ZED-data-collector - In this project, ZED camera is used to extract image, IMU, pose data and convert them into a dataset format as ground truth for evaluation of other SLAM systems

  • Parskatt/RoMa - [CVPR 2024] RoMa: Robust Dense Feature Matching; RoMa is the robust dense feature matcher capable of estimating pixel-dense warps and reliable certainties for almost any image pair.

  • microsoft/autogen - A programming framework for agentic AI

  • KwanWaiPang/Gaussian-SLAM_comment - Gaussian-SLAM with Chinese-language code comments

  • open-mmlab/mmdetection - OpenMMLab Detection Toolbox and Benchmark

  • InternRobotics/VLM-Grounder - [CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding

  • GREAT-WHU/GREAT-Dataset -

  • VladimirYugay/Gaussian-SLAM - Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting

  • MisEty/RTG-SLAM - RTG-SLAM: Real-time 3D Reconstruction at Scale Using Gaussian Splatting (ACM SIGGRAPH 2024)

  • Tencent-Hunyuan/Tencent-Hunyuan-Large -

  • cvg/NoPoSplat - [ICLR'25 Oral] No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images

  • megvii-research/MCTrack - [IROS 2025] This is the official implementation of the paper "MCTrack: A Unified 3D Multi-Object Tracking Framework for Autonomous Driving"

  • nv-tlabs/SCube - [NeurIPS 2024] SCube: Instant Large-Scale Scene Reconstruction using VoxSplats

  • princeton-vl/RAFT-Stereo -

  • ChenHoy/DROID-Splat - End-to-End SLAM with camera calibration, monocular prior integration and dense Rendering

  • openinterpreter/open-interpreter - A natural language interface for computers

  • robot-learning-freiburg/CL-SLAM - Continual SLAM: Beyond Lifelong Simultaneous Localization and Mapping through Continual Learning. http://continual-slam.cs.uni-freiburg.de

  • ywyeli/Place3D - [NeurIPS'24 Spotlight] Is Your LiDAR Placement Optimized for 3D Scene Understanding?

  • TommyZihao/openvino_tonypi - A locally deployed LLM agent, based on OpenVINO, that controls the TonyPi humanoid robot

  • donydchen/mvsplat - 🌊 [ECCV'24 Oral] MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

  • songw-zju/LiDAR2Map - The official implementation of "LiDAR2Map: In Defense of LiDAR-Based Semantic Map Construction Using Online Camera Distillation" (CVPR 2023)

  • zhaihongjia/SplatLoc - [TVCG 2025] SplatLoc: 3D Gaussian Splatting-based Visual Localization for Augmented Reality

  • NVIDIA/TensorRT-LLM - TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

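  • NanmiCoder/MediaCrawler - Crawlers for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, Weibo posts and comments, Baidu Tieba posts and comment replies, and Zhihu Q&A articles and comments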

  • microsoft/BitNet - Official inference framework for 1-bit LLMs

  • hkchengrex/Cutie - [CVPR 2024 Highlight] Putting the Object Back Into Video Object Segmentation

  • 520xyxyzq/3DGS-CD - 3DGS-based change detection for physical object rearrangement

  • facebookresearch/lingua - Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.

  • google/nerfies - This is the code for Deformable Neural Radiance Fields, a.k.a. Nerfies.

  • Owen718/LongPrompt-LLamaGen - This repository provides an improved LLamaGen model, fine-tuned on 500,000 high-quality images, each accompanied by a prompt of over 300 tokens. It also includes additional prompt-refining features for improved performance.

  • openai/improved-diffusion - Release for Improved Denoising Diffusion Probabilistic Models

  • RuijieZhu94/MotionGS - [NeurIPS 2024] MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting

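  • hzy46/Deep-Learning-21-Examples - Companion code for the book 《21个项目玩转深度学习———基于TensorFlow的实践详解》 ("Mastering Deep Learning with 21 Projects: A Detailed Hands-on Guide Based on TensorFlow")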

  • StanfordVL/3DSceneGraph - The data skeleton from "3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera" http://3dscenegraph.stanford.edu

  • cvg/depthsplat - [CVPR'25] DepthSplat: Connecting Gaussian Splatting and Depth

  • HKUDS/LightRAG - [EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"

  • princeton-vl/DROID-SLAM -

  • Nightmare-n/DepthAnyVideo - Depth Any Video with Scalable Synthetic Data (ICLR 2025)

  • linyicheng1/EdgePoint - EdgePoint: Learning Efficient Keypoint Extraction and Description for Edge Devices

  • minwoo0611/HeLiOS - [ICRA2025] HeLiOS: Heterogeneous LiDAR Place Recognition

  • uzh-rpg/bflow - Official implementation of "Dense Continuous-Time Optical Flow from Event Cameras"

  • VITA-Group/LightGaussian - [NeurIPS 2024 Spotlight] "LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS", Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang

  • uzh-rpg/deep_ev_tracker - Repository relating to "Data-driven Feature Tracking for Event Cameras" (CVPR, 2023, Award Candidate) and "Data-driven Feature Tracking for Event Cameras with and without Frames" (T-PAMI 2025)

  • IRMVLab/DVLO - [ECCV 2024 Oral] DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment

  • QiZS-BIT/GSPR - [IEEE IROS'25] GSPR: Multimodal Place Recognition using 3D Gaussian Splatting for Autonomous Driving

  • hustvl/osp - [ECCV 2024] Occupancy as Set of Points

  • HuangJunJie2017/BEVDet - Code base of the BEVDet series.

  • city-super/Octree-AnyGS - Octree-GS

  • buaacyw/MeshAnythingV2 - [ICCV 2025] From anything to mesh like human artists. Official impl. of "MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization"

  • roboflow/supervision - We write your reusable computer vision tools. 💜

  • yifanlu0227/ChatSim - [CVPR2024 Highlight] Editable Scene Simulation for Autonomous Driving via LLM-Agent Collaboration

  • cjy1992/interp-e2e-driving - Interpretable End-to-end Urban Autonomous Driving with Latent Deep Reinforcement Learning

  • Robertwyq/PanoOcc - [CVPR 2024] PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation

  • TheAlgorithms/Python - All Algorithms implemented in Python

  • morrisfl/UniFEx - Framework for computationally efficient training of universal image feature extraction models.

  • PeidongLi/SSR - [ICLR 2025] The official implementation of SSR

  • qintonguav/ParkingE2E -

  • hanyangyu1021/LMGaussian - official implementation of LM-Gaussian

  • pytorch/pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration

  • eth-ait/GaussianHaircut - Gaussian Haircut: Human Hair Reconstruction with Strand-Aligned 3D Gaussians

  • pyg-team/pytorch-frame - Tabular Deep Learning Library for PyTorch

  • jkulhanek/wild-gaussians - [NeurIPS'24] WildGaussians: 3D Gaussian Splatting In the Wild

  • bassamlab/SigmaRL - SigmaRL: A Sample-Efficient and Generalizable Multi-Agent Reinforcement Learning Framework for Motion Planning

  • DLR-MI/UTrack - Multi-Object Tracking with Uncertain Detections [ECCV 2024 UnCV]

  • stanfordnlp/dspy - DSPy: The framework for programming—not prompting—language models

  • TempleRAIL/drl_vo_nav - [T-RO 2023] DRL-VO: Learning to Navigate Through Crowded Dynamic Scenes Using Velocity Obstacles

  • lucasbrynte/gasfm - Implementation of the CVPR 2024 paper "Learning Structure-from-Motion with Graph Attention Networks".

  • SPengLiang/OccupancyM3D - [CVPR 2024] Learning Occupancy for Monocular 3D Object Detection

  • zhangganlin/GlORIE-SLAM - GlORIE-SLAM: Globally Optimized RGB-only Implicit Encoding Point Cloud SLAM

  • jbriales/rgbd_benchmark_tools - Tools for TUM RGBD Dataset Benchmark

  • yastrebksv/TennisProject - Tennis analysis using deep learning and machine learning

  • cvg/GeoCalib - GeoCalib: Learning Single-image Calibration with Geometric Optimization (ECCV 2024)

  • NVIDIA/TransformerEngine - A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.

  • lus6-Jenny/RING - [IEEE T-RO 2023] Source code of RING and RING++ for loop closure detection in LiDAR SLAM.

  • hacksider/Deep-Live-Cam - real time face swap and one-click video deepfake with only a single image

  • GANWANSHUI/GaussianOcc - (ICCV 2025) GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting

  • GradientSpaces/LoopSplat - [3DV 2025, Oral] LoopSplat: Loop Closure by Registering 3D Gaussian Splats

  • zhaofuq/LOD-3DGS - LetsGo: Large-Scale Garage Modeling and Rendering via LiDAR-Assisted Gaussian (Published in SIGGRAPH Asia 2024)

  • hjr37/CP-SLAM - CP-SLAM: Collaborative Neural Point-based SLAM

  • cvg/nicer-slam - [3DV'24 Best Paper Honorable Mention] NICER-SLAM: Neural Implicit Scene Encoding for RGB SLAM

  • lpiccinelli-eth/UniDepth - Universal Monocular Metric Depth Estimation

  • JeongminB/E-D3DGS - [ECCV 2024] Official repository for "Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting"

  • spla-tam/SplaTAM - SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM (CVPR 2024)

  • sparolab/SOLiD - SOTA LiDAR Global Descriptor in LiDAR Place Recognition (accepted in RA-L'24 w/ ICRA'25)

  • IPNL-POLYU/UrbanNavDataset - UrbanNav: An Open-sourced Multisensory Dataset for Benchmarking Positioning Algorithms Designed for Urban Areas

  • YuxueYang1204/TrimGS - Trim 3D Gaussian Splatting for Accurate Geometry Representation

  • open-mmlab/mmtracking - OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), Video Instance Segmentation (VIS) with a unified framework.

  • huggingface/transformers - 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

  • openai/openai-python - The official Python library for the OpenAI API

  • llmbev/talk2bev - Talk2BEV: Language-Enhanced Bird's Eye View Maps (ICRA'24)

  • liuyuan-pal/SyncDreamer - [ICLR 2024 Spotlight] SyncDreamer: Generating Multiview-consistent Images from a Single-view Image

  • fudan-zvg/4d-gaussian-splatting - [ICLR 2024] Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting

  • eriksandstroem/Loopy-SLAM -

  • qinzheng93/GeoTransformer - [CVPR2022] Geometric Transformer for Fast and Robust Point Cloud Registration

  • yanyan-li/GeoGaussian - GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering

  • Parskatt/DeDoDe - [3DV 2024 Oral] DeDoDe 🎶 Detect, Don't Describe --- Describe, Don't Detect, for Local Feature Matching

  • ericzzj1989/BALF - [WACV 2024] BALF: Simple and Efficient Blur Aware Local Feature Detector

  • lyakaap/NetVLAD-pytorch - PyTorch implementation of NetVLAD & Online Hardest Triplet Loss.

  • xiaobiaodu/DreamCar - [RA-L 2024] DreamCar: Leveraging Car-specific Prior for in-the-wild 3D Car Reconstruction

  • nianticlabs/acezero - [ECCV 2024 - Oral] ACE0 is a learning-based structure-from-motion approach that estimates camera parameters of sets of images by learning a multi-view consistent, implicit scene representation.

  • meta-llama/llama-models - Utilities intended for use with Llama models.

  • cs230-stanford/cs230-code-examples - Code examples in PyTorch and TensorFlow for CS230

  • ddbourgin/numpy-ml - Machine learning, in numpy

  • tarashakhurana/4d-occ-forecasting - CVPR 2023: Official code for "Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting"

  • uoip/stereo_msckf - Python implementation of Multi-State Constraint Kalman Filter (MSCKF) for Vision-aided Inertial Navigation.

  • fundamentalvision/BEVFormer - [ECCV 2022] This is the official implementation of BEVFormer, a camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.

  • NVlabs/FB-BEV - Official PyTorch implementation of FB-BEV & FB-OCC - Forward-backward view transformation for vision-centric autonomous driving perception

  • OpenDriveLab/OccNet - [ICCV 2023] OccNet: Scene as Occupancy

  • Tsinghua-MARS-Lab/Occ3D -

  • ViewFormerOcc/ViewFormer-Occ - [ECCV 2024] ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers

  • MCG-NJU/SparseOcc - [ECCV 2024] Fully Sparse 3D Occupancy Prediction & RayIoU Evaluation Metric

  • VISION-SJTU/SparseOcc - Official implementation for 'SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction' (CVPR 2024)

  • weiyithu/SurroundOcc - [ICCV 2023] SurroundOcc: Multi-camera 3D Occupancy Prediction for Autonomous Driving

  • autonomousvision/occupancy_networks - This repository contains the code for the paper "Occupancy Networks - Learning 3D Reconstruction in Function Space"

  • SY-007-Research/3dgs_render_python -

  • Ferry-Li/SI-SOD - ICML2024: Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection

  • Ferry-Li/SI_Metric - A portable computation of Size-Invariant Metrics for ICML2024: Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection

  • autonomousvision/mip-splatting - [CVPR'24 Best Student Paper] Mip-Splatting: Alias-free 3D Gaussian Splatting

  • Vincentqyw/image-matching-webui - 🤗 image matching webui

  • LiheYoung/Depth-Anything - [CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation

  • microsoft/graphrag - A modular graph-based Retrieval-Augmented Generation (RAG) system

  • rvp-group/vbr-devkit - Vision Benchmark in Rome Development Kit

  • utiasSTARS/pykitti - Python tools for working with KITTI data.

  • huang-yh/GaussianFormer - [ECCV 2024] Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

  • TQTQliu/MVSGaussian - [ECCV 2024] MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo

  • buaacyw/MeshAnything - [ICLR 2025] From anything to mesh like human artists. Official impl. of "MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers"

  • swc-17/SparseDrive - SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation

  • minghanqin/LangSplat - Official implementation of the paper "LangSplat: 3D Language Gaussian Splatting" [CVPR2024 Highlight]

  • Xinyu-Yi/TransPose - A real-time motion capture system that estimates poses and global translations using only 6 inertial measurement units

  • Awesome3DGS/3D-Gaussian-Splatting-Papers - 3D Gaussian Splatting papers, continuously updated; discussion welcome.

  • JonathonLuiten/Dynamic3DGaussians -

  • cvg/glue-factory - Training library for local feature detection and matching

  • cvg/LightGlue - LightGlue: Local Feature Matching at Light Speed (ICCV 2023)

  • muskie82/MonoGS - [CVPR'24 Highlight & Best Demo Award] Gaussian Splatting SLAM

  • lukas-blecher/LaTeX-OCR - pix2tex: Using a ViT to convert images of equations into LaTeX code.

  • tjiiv-cprg/EPro-PnP - [CVPR 2022 Best Student Paper] EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation

  • ChiWeiHsiao/DeepVO-pytorch - PyTorch Implementation of DeepVO

  • cvg/nice-slam - [CVPR'22] NICE-SLAM: Neural Implicit Scalable Encoding for SLAM

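  • yanyan-li/SLAM-BOOK - A book draft about SLAM that aims to clearly introduce the geometric and deep learning methods used in SLAM systems. The finished draft should reach about 200 pages, and the code for each chapter will also be organized and released.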

  • Shiaoming/Python-VO - A simple frame-by-frame visual odometry implemented in Python, with a SuperPoint feature detector and a SuperGlue feature matcher.

  • openxrlab/xrdslam - Platform for Deep Learning based SLAM

miscellaneous

Jupyter Notebook

  • duoan/TorchCode - 🔥 LeetCode for PyTorch — practice implementing softmax, attention, GPT-2 and more from scratch with instant auto-grading. Jupyter-based, self-hosted or try online.

  • Infrasys-AI/AISystem - AISystem covers AI systems, including full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks

  • leggedrobotics/navitrace_evaluation -

  • qiuzh20/gated_attention - The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

  • QwenLM/Qwen3-Omni - Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

  • QwenLM/Qwen2.5-Omni - Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

  • IDEA-Research/Grounded-SAM-2 - Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2

  • nv-tlabs/GEN3C - [CVPR 2025 Highlight] GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control

  • LaVi-Lab/VG-LLM - The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors'

  • facebookresearch/dinov3 - Reference PyTorch implementation and models for DINOv3

  • facebookresearch/dinov2 - PyTorch code and models for the DINOv2 self-supervised learning method.

  • InternRobotics/InternNav - InternRobotics' open platform for building generalized navigation foundation models.

  • HeegerGao/FLIP - Code for FLIP: Flow-Centric Generative Planning for General-Purpose Manipulation Tasks

  • datawhalechina/easy-rl - Reinforcement learning tutorial in Chinese (the "Mushroom Book" 🍄); read online at https://datawhalechina.github.io/easy-rl/

  • RL4VLM/RL4VLM - Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

  • ByteDance-Seed/Seed1.5-VL - Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

  • Robotics-STAR-Lab/DynamicPose - [IROS 2025] DynamicPose: Real-time and Robust 6D Object Pose Tracking for Fast-Moving Cameras and Objects

  • microsoft/ai-agents-for-beginners - 12 Lessons to Get Started Building AI Agents

  • NVIDIA/Isaac-GR00T - NVIDIA Isaac GR00T N1.6 - A Foundation Model for Generalist Robots.

  • google-gemini/gemini-fullstack-langgraph-quickstart - Get started with building Fullstack Agents using Gemini 2.5 and LangGraph

  • datawhalechina/happy-llm - 📚 Building large language models from scratch

  • Liuziyu77/Visual-RFT - Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'

  • bagh2178/SG-Nav - [NeurIPS 2024] SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation

  • facebookresearch/EdgeTAM - [CVPR 2025] Official PyTorch implementation of "EdgeTAM: On-Device Track Anything Model"

  • zhanshijinwat/Steel-LLM - Train a 1B LLM on 1T tokens from scratch as an individual

  • facebookresearch/co-tracker - CoTracker is a model for tracking any point (pixel) on a video.

  • arclab-hku/DEIO - (ICCV2025) Learning-based Event-Inertial Odometry

  • mit-acl/dynus -

  • IDEA-Research/Grounded-Segment-Anything - Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

  • facebookresearch/segment-anything - The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

  • xinyu1205/recognize-anything - Open-source and strong foundation image recognition models.

  • CompVis/depth-fm - [AAAI 2025, Oral] DepthFM: Fast Monocular Depth Estimation with Flow Matching

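  • luhengshiwo/LLMForEverybody - LLM knowledge sharing that everyone can understand; a must-read before LLM interviews in the spring and autumn campus recruitment seasons, so you can talk confidently with interviewers

  • luohongk/Embodied-Navigation - A repository about Embodied-Navigation, mainly for organizing some of my knowledge of localization, perception, planning and control, 3D Vision, and VLN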

  • GAP-LAB-CUHK-SZ/gaustudio - A Modular Framework for 3D Gaussian Splatting and Beyond

  • HCPLab-SYSU/LH-VLN - Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method (CVPR-25)

  • HandsOnLLM/Hands-On-Large-Language-Models - Official code repo for the O'Reilly Book - "Hands-On Large Language Models"

  • CurryYuan/ZSVG3D - [CVPR 2024] Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding

  • datawhalechina/tiny-universe - "A White-Box Guide to Building Large Models": a fully hand-built Tiny-Universe

  • robot-pesg/BotanicGarden - BotanicGarden: A high-quality dataset for robot navigation in unstructured natural environments

  • QwenLM/Qwen3-VL - Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

  • DjangoPeng/LLM-quickstart - Quick Start for Large Language Models (Theoretical Learning and Practical Fine-tuning)

  • zju3dv/LoFTR - Code for "LoFTR: Detector-Free Local Feature Matching with Transformers", CVPR 2021, T-PAMI 2022

  • hesamsheikh/ml-retreat - Machine Learning Journal for Intermediate to Advanced Topics.

  • DataExpert-io/data-engineer-handbook - This is a repo with links to everything you'd ever want to learn about data engineering

  • florinshen/FlashSplat - [ECCV2024] [3DV Nectar 2025] FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally

  • yzslab/gaussian-splatting-lightning - A 3D Gaussian Splatting framework with various derived algorithms and an interactive web viewer

  • CyberOrigin2077/Cyber - This repo is designed for General Robotic Operation System

  • Tencent-Hunyuan/HunyuanDiT - Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

  • microsoft/OmniParser - A simple screen parsing tool towards pure vision based GUI agent

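  • TommyZihao/vlm_arm - Robotic arm + large model + multimodality = a human-robot collaborative embodied agent

  • cumtcssuld/RSP_of_CUMTCS - [Resource Sharing Plan of CUMTCS, the School of Computer Science at China University of Mining and Technology] This repository is maintained under the lead of the Study Department of the school's Student Union and is built and shared by all students of the school. Everyone is welcome to contribute to this resource library! (After each major update we clone the whole repository to Gitee; follow the link below to our Gitee repository for a better download experience.)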

  • ut-amrl/ObVi-SLAM - Long-Term Object Visual SLAM

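  • Infrasys-AI/AIInfra - AIInfra (AI infrastructure) refers to the AI system stack, from low-level hardware such as chips up to the upper software stack, that supports training and inference of large AI models.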

  • CompVis/stable-diffusion - A latent text-to-image diffusion model

  • AnyLoc/Revisit-Anything - Code release for Revisit Anything: Visual Place Recognition via Image Segment Retrieval (ECCV 2024)

  • be2rlab/gsplatloc - [IROS 2025] GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization

  • TommyZihao/Train_Custom_Dataset - Label your own dataset, then train, evaluate, test, and deploy your own AI algorithms

  • isl-org/ZoeDepth - Metric depth estimation from a single image

  • datawhalechina/leedl-tutorial - "Hung-yi Lee Deep Learning Tutorial" (recommended by Prof. Hung-yi Lee 👍, the "Apple Book" 🍎); PDF download: https://github.com/datawhalechina/leedl-tutorial/releases

  • Fafa-DL/Lhy_Machine_Learning - Slides and assignments for Hung-yi Lee's Spring 2021/2022/2023 machine learning courses

  • yubaoliu/RDS-SLAM - RDS-SLAM: Real-Time Dynamic SLAM Using Semantic Segmentation Methods

  • SakanaAI/AI-Scientist - The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬

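  • datawhalechina/self-llm - "Open-Source LLM Usage Guide": a tutorial tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying open-source large language models (LLMs) and multimodal large language models (MLLMs) from China and abroad in a Linux environment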

  • facebookresearch/sam2 - The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

  • hustvl/4DGaussians - [CVPR 2024] 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

  • heucoder/ML-DL_book - Some machine learning and deep learning books that I personally consider good.

  • verlab/accelerated_features - Implementation of XFeat (CVPR 2024). Do you need robust and fast local feature extraction? You are in the right place!

TypeScript

  • openclaw/openclaw - Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

  • Molunerfinn/PicGo - 🚀 The Ultimate Image Uploader for Efficient Creators. Supports Obsidian, Typora, VS Code etc. and 60+ image hosting services (S3, GitHub, Cloudflare R2, Imgur, Aliyun OSS...). Paste, upload, done.

  • zimya/zhihu_obsidian - Zhihu on Obsidian: a Zhihu plugin for Obsidian

  • OpenCut-app/OpenCut - The open-source CapCut alternative

  • chanhx/crabviz - Generate interactive call graphs for various languages

  • plait-board/drawnix - Open-source whiteboard tool (SaaS): an all-in-one whiteboard with mind maps, flowcharts, freehand drawing, and more.

  • shareAI-lab/learn-claude-code - Bash is all you need - A nano Claude Code-like "agent harness", built from 0 to 1

  • google-gemini/gemini-cli - An open-source AI agent that brings the power of Gemini directly into your terminal.

  • n8n-io/n8n - Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.

  • 123xiao/sex-agreement-app - Sexual activity consent agreement system

  • microsoft/vscode - Visual Studio Code

  • mastra-ai/mastra - From the team behind Gatsby, Mastra is a framework for building AI-powered applications and agents with a modern TypeScript stack.

  • Binaryify/OneDark-Pro - Atom's iconic One Dark theme for Visual Studio Code

  • MegaScenes/web-viewer - web viewer for 3d reconstructions

  • coaidev/coai - 🚀 Next Generation Multi-tenant AI One-Stop Solution. Builtin Admin & Billing System. Enterprise-Grade Unified LLM Gateway Support for 200+ Models And 35+ Providers, Load Balancing w/ Priority-based Routing, Cost Management, Chat Share, Cloud Sync, Credit/Subscription Billing, All File Parsing, Web Search, Built-in Model Cache.

  • Eugeny/tabby - A terminal for a more modern age

  • amir9480/vscode-cpp-helper - vscode extension to create implementation for c++ function prototypes.

  • hcengineering/platform - Huly — All-in-One Project Management Platform (alternative to Linear, Jira, Slack, Notion, Motion)

  • conwnet/github1s - One second to read GitHub code with VS Code.

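  • ocsjs/ocsjs - OCS online course assistant: a course-automation script that helps university students with online courses. Supports platforms such as Chaoxing Xuexitong, Zhihuishu, Zhijiaoyun, Smart Vocational Education, and China University MOOC, and runs under open-source script managers such as ScriptCat and Tampermonkey.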

  • immich-app/immich - High performance self-hosted photo and video management solution.

  • clash-verge-rev/clash-verge-rev - A modern GUI client based on Tauri, designed to run on Windows, macOS and Linux for a tailored proxy experience

CSS

CMake

Swift

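  • caol64/wenyan - Wenyan: a Markdown article typesetting and beautification tool that supports platforms such as WeChat Official Accounts, Toutiao, and Zhihu.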

  • jordanbaird/Ice - Powerful menu bar manager for macOS

  • lwouis/alt-tab-macos - Windows alt-tab on macOS

  • exelban/stats - macOS system monitor in your menu bar

  • ejbills/DockDoor - Window peeking, alt-tab and other enhancements for macOS

  • Caldis/Mos - A lightweight tool to smooth mouse scrolling and set the scroll direction independently on macOS, making your scroll wheel feel as smooth as a trackpad

  • gao-sun/eul - 🖥️ macOS status monitoring app written in SwiftUI.

Kotlin

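  • pppscn/SmsForwarder - SMS forwarder: monitors SMS messages, incoming calls, and app notifications on an Android phone and forwards them to other devices according to specified rules: DingTalk group custom bots, DingTalk internal enterprise bots, WeCom group bots, Feishu bots, WeCom application messages, email, Bark, webhook, Telegram bots, ServerChan, PushPlus, SMS, and more. Includes active control between server and client, so you can remotely send SMS and check SMS, call logs, contacts, and battery level (new in V3.0). P.S. This APK is mainly for learning and personal use; if you find a bug, please open an issue, and PRs are welcome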

BibTeX Style

Shell

Astro

HTML

C

  • JunweiLiang/xf_mic_asr_offline_junwei -

  • Robotics-STAR-Lab/ApexNav - [RA-L'25] A Reliable and Efficient Framework for Zero-Shot Object Navigation

  • PrideLab/PRIDE-PPPAR - Open-source software for Multi-GNSS PPP ambiguity resolution

  • 0voice/algorithm-structure - Latest 2021 summary: 500 commonly used data structures and algorithms, Introduction to Algorithms, and common interview topics, compiled by senior engineers at major tech companies

  • kevin2431/Traj-LO - [RA-L 2024] In Defense of LiDAR-Only Odometry Using an Effective Continuous-Time Trajectory

  • rtklibexplorer/RTKLIB - A version of RTKLIB optimized for low cost GNSS receivers, especially u-blox receivers. It is based on RTKLIB 2.4.3. This software is provided “AS IS” without any warranties of any kind so please be careful, especially if using it in any kind of real-time application. Click on the "Releases" label below to see the latest Windows pre-release.

  • tomojitakasu/RTKLIB -

  • MichaelBeechan/PPP-RTK - SPP, RTD, PPP, RTK, PPP-RTK, RAIM, ARAIM, etc.

TeX

Rust

  • openai/harmony - Renderer for the harmony response format to be used with gpt-oss

  • prefix-dev/pixi - Powerful system-level package manager for Linux, macOS and Windows written in Rust – building on top of the Conda ecosystem.

  • rustdesk/rustdesk - An open-source remote desktop application designed for self-hosting, as an alternative to TeamViewer.

  • lapce/lapce - Lightning-fast and Powerful Code Editor written in Rust

  • typst/typst - A markup-based typesetting system that is powerful and easy to learn.

  • makeecat/Peng - A minimal quadrotor autonomy framework in Rust (Mac, Linux, Windows)

Go

Lua

  • ayamir/nvimdots - A well configured and structured Neovim.

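  • gaboolic/rime-shuangpin-fuzhuma - Moqi Yinxing: building the strongest Shuangpin-with-auxiliary-code input scheme for Rime, so that every Shuangpin user can use auxiliary codes. Based on the Wusong and Baishuang dictionaries. Supports multiple Shuangpin layouts, including Xiaohe, Ziranma, Sogou, and Microsoft. Auxiliary codes include Moqi code (an original, open-source character decomposition covering 40,000 characters), Ziranma radical auxiliary codes, and Xiaohe Yinxing (Xiaohe shape auxiliary codes). Shuangpin layouts and auxiliary codes can be freely combined, with both whole-sentence and word/character input. Unknown characters can be looked up by stroke, by component decomposition, or by Cangjie code. Supports English and Japanese input via the aw and aj modes, plus advanced features such as chorded Shuangpin input, emoji, quick symbols, dates, uppercase numerals, and a calculator. Wusong-Xiaohe | Wusong-Ziranma | Moqi code | Moqi Yinxing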

Vim Script

Cuda

Roff

Dockerfile

  • Anduin2017/HowToCook - Programmer's guide about how to cook at home (Simplified Chinese only).

  • linyicheng1/Dockers - A collection of commonly used Dockerfiles for quickly deploying and running common algorithms without repeatedly configuring environments

  • jaeseok4104/slam-docker - SLAM Docker for research

C#

  • Achuan-2/SlideSCI - PPT plugin: one-click image captions, copy and paste of positions, one-click image alignment, one-click Markdown insertion (including bold, hyperlinks, and other inline styles, as well as code blocks, LaTeX, and other block-level styles), and convenient image export!

  • 2dust/v2rayN - A GUI client for Windows, Linux and macOS, supporting Xray, sing-box, and others

  • mahoshojo0805/ContestPrograms - Programs for the surveying and mapping skills competition

MATLAB

Vue

Makefile

LLVM

  • llvm/llvm-project - The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

Matlab

  • Relja/netvlad - NetVLAD: CNN architecture for weakly supervised place recognition

Cython

SCSS

Dart

  • localsend/localsend - An open-source cross-platform alternative to AirDrop

  • chen08209/FlClash - A multi-platform proxy client based on ClashMeta, simple and easy to use, open-source and ad-free.

Markdown

Clojure

Java

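  • krahets/hello-algo - "Hello Algo": a data structures and algorithms tutorial with animated illustrations and one-click runnable code. Available in Simplified Chinese, Traditional Chinese, English, and Japanese, with code implementations in Python, Java, C++, C, C#, JS, Go, Swift, Rust, Ruby, Kotlin, TS, Dart, and more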

About

My star list, updated automatically every day
