A powerful, modern, browser-based data viewer that supports multiple formats (Parquet, Arrow, Avro, JSONL, ORC, Delta Lake, Iceberg) and connects directly to cloud storage (Azure, S3, GCS). It runs entirely in your browser with zero uploads, so you can analyze and explore your data files privately and securely through a polished, feature-rich UI.
- 100% Private: All processing happens in your browser - no server uploads
- No Installation Required: Just open the web app and start analyzing
- Local Processing: All parsing and analysis happens on your device
- No Data Storage: Files are not saved or cached anywhere
- Direct Cloud Access: Connect to your cloud storage without proxies or backends
- Azure Data Lake Storage Gen2: SAS token authentication, multiple URL formats
- Amazon S3: Access key authentication, supports S3-compatible services
- Google Cloud Storage: OAuth token authentication, public bucket support
- Enterprise Ready: Temporary credentials, CORS-aware, secure browser access
- Advanced Schema Inspection: View column types, encodings, compression, and metadata
- Smart Data Preview: Browse through your data with intelligent pagination
- In-Place Data Editing: Click any cell to edit values with real-time modification tracking
- Real-time Search: Search across all data with instant filtering
- Column Sorting: Click any column header to sort data
- Data Statistics: Automatic calculation of null counts, unique values, and data types
- Performance Metrics: Track processing speed and memory usage
- View/Edit Mode Toggle: Switch between viewing and editing modes
- Drag & Drop Support: Simply drag data files onto the interface
- VSCode-Like Themes: Beautiful dark and light themes with proper contrast
- Responsive Design: An interface that adapts to desktop, tablet, and mobile screens
- Collapsible Tree Views: Organized metadata display with expandable sections
- Enhanced Data Tables: Sticky headers, row numbers, and type-specific cell styling
- In-Place Editing: Click-to-edit data cells with modification tracking
- Progress Indicators: Real-time feedback during file processing
- Keyboard Shortcuts: Efficient navigation with keyboard commands
- Export Modified CSV: Save your data modifications as CSV files
- Export Original Data: Export as CSV or JSON formats
- Schema Export: Download schema definitions
- Smart Filename Generation: Automatic naming based on source file
- Large File Support: Handle files up to 500MB efficiently
- Lightning Fast: Powered by the lightweight hyparquet library
- Memory Optimized: Efficient handling of large datasets
- Streaming Processing: Progressive loading with status updates
- Browser Optimized: Tested across all modern browsers
Try it now: https://mjtpena.github.io/parquet-viewer
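The CSV export and smart filename features above can be sketched as follows. This is an illustrative example, not the app's actual code; `toCsv` and `exportFilename` are hypothetical names, and the quoting rules follow common CSV conventions.

```javascript
// Hypothetical sketch of CSV export with field escaping and
// filename generation based on the source file name.
function toCsv(columns, rows) {
  const escape = (v) => {
    if (v === null || v === undefined) return "";
    const s = String(v);
    // Quote fields containing delimiters, quotes, or newlines
    return /[",\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
  };
  const lines = [columns.map(escape).join(",")];
  for (const row of rows) {
    lines.push(columns.map((c) => escape(row[c])).join(","));
  }
  return lines.join("\n");
}

function exportFilename(sourceName, suffix = "modified") {
  // e.g. "sales.parquet" -> "sales_modified.csv"
  const base = sourceName.replace(/\.[^.]+$/, "");
  return `${base}_${suffix}.csv`;
}
```

In the browser, the resulting string would typically be wrapped in a `Blob` and downloaded via a temporary object URL.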
- File name, size, and format version
- Total rows and columns
- Row groups and compression info
- Processing performance metrics
- Column names, data types, and nullability
- Parquet-specific encodings and compression
- Repetition types and converted types
- Storage size analysis with compression ratios
- Paginated Data View: Navigate through large datasets efficiently
- Smart Search: Find data across all columns instantly
- Column Sorting: Sort by any column in ascending/descending order
- Type-Aware Display: Different styling for strings, numbers, booleans, nulls
- Row-by-Row Navigation: Jump to specific pages or use keyboard navigation
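Search and pagination as described above can compose over an in-memory row set. A minimal sketch (function names are illustrative, not the viewer's actual API):

```javascript
// Filter rows by matching the query against every column's
// stringified value, case-insensitively.
function filterRows(rows, query) {
  if (!query) return rows;
  const q = query.toLowerCase();
  return rows.filter((row) =>
    Object.values(row).some((v) => String(v).toLowerCase().includes(q))
  );
}

// Return one page of rows, clamping the requested page into range.
function getPage(rows, page, pageSize = 100) {
  const pageCount = Math.max(1, Math.ceil(rows.length / pageSize));
  const current = Math.min(Math.max(page, 1), pageCount);
  const start = (current - 1) * pageSize;
  return { pageCount, current, rows: rows.slice(start, start + pageSize) };
}
```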
- Null value counts and percentages
- Data type distribution analysis
- Unique value counting
- Column-specific compression statistics
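Per-column statistics like the null counts and unique-value counts above can be computed in a single pass. A hedged sketch, similar in spirit to what the viewer reports:

```javascript
// Compute basic statistics for one column of an in-memory row set.
function columnStats(rows, column) {
  let nulls = 0;
  const unique = new Set();
  for (const row of rows) {
    const v = row[column];
    if (v === null || v === undefined) nulls += 1;
    else unique.add(v);
  }
  const total = rows.length;
  return {
    total,
    nulls,
    nullPct: total ? (100 * nulls) / total : 0,
    uniqueCount: unique.size,
  };
}
```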
- 📂 Select or Drop a data file in any supported format (up to 500MB)
- ⚡ Automatic Processing with real-time progress updates
- 🔍 Explore Schema - inspect column types, encodings, and metadata in collapsible tree views
- 📊 Browse & Edit Data - search, sort, navigate, and edit data in-place
- 💾 Export Modified Data - download your modifications as CSV files
- 🔗 Connect to Cloud Storage - click the cloud button in the interface
- 🎯 Select Provider - choose Azure, S3, or Google Cloud Storage
- 📋 Enter Details - paste your storage URL and add credentials
- 🌐 Browse Files - navigate your cloud storage like a desktop app
- 📊 Analyze Data - click any data file to load and analyze instantly
- Pure Client-Side: Built with vanilla JavaScript (ES6 modules)
- Zero Dependencies: No frameworks or build processes required
- Single File: Everything in one HTML file for easy deployment
- Web Standards: Uses modern browser APIs for optimal performance
- Hyparquet v1.16.0: Fast, lightweight Parquet parser
- Apache Arrow: In-browser Arrow file processing
- AVSC: Avro schema and data processing
- Cloud APIs: Direct REST API integration (Azure, S3, GCS)
- No heavy frameworks: Keeps the application fast and secure
- Chrome 80+ ✅
- Firefox 80+ ✅
- Safari 14+ ✅
- Edge 80+ ✅
- File Size Limit: 500MB (browser memory dependent)
- Processing Speed: ~50,000-100,000 rows/second
- Memory Usage: ~2-3x file size during processing
- Supported Encodings: All standard Parquet encodings
- Compression Support: GZIP, Snappy, LZ4, ZSTD
URL formats supported:
• abfss://container@account.dfs.core.windows.net/path
• https://account.dfs.core.windows.net/container/path
• https://account.blob.core.windows.net/container/path
Authentication:
• SAS Token (recommended for browser use)
• Anonymous/Public (for public containers)
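The `abfss://` form above can be rewritten to the HTTPS Data Lake endpoint with the SAS token appended as a query string. A hypothetical helper (not the app's actual code):

```javascript
// Rewrite abfss://container@account.dfs.core.windows.net/path
// to https://account.dfs.core.windows.net/container/path?<sas>
function azureHttpsUrl(abfssUrl, sasToken) {
  const m = abfssUrl.match(/^abfss:\/\/([^@]+)@([^/]+)\/(.*)$/);
  if (!m) throw new Error("Not an abfss:// URL");
  const [, container, host, path] = m;
  const base = `https://${host}/${container}/${path}`;
  // SAS tokens are query strings; strip a leading "?" if present
  return sasToken ? `${base}?${sasToken.replace(/^\?/, "")}` : base;
}
```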
URL formats supported:
• s3://bucket-name/path
• https://bucket-name.s3.region.amazonaws.com/path
• https://s3.region.amazonaws.com/bucket-name/path
Authentication:
• Access Key ID + Secret Access Key
• Session Token (for temporary credentials)
• Anonymous/Public (for public buckets)
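For anonymous access to a public bucket, an `s3://` URL can be rewritten to the virtual-hosted HTTPS form. Authenticated requests additionally require AWS Signature Version 4 headers, which are out of scope for this sketch; the function name is illustrative:

```javascript
// Rewrite s3://bucket-name/path to
// https://bucket-name.s3.<region>.amazonaws.com/path
function s3HttpsUrl(s3Url, region = "us-east-1") {
  const m = s3Url.match(/^s3:\/\/([^/]+)\/(.*)$/);
  if (!m) throw new Error("Not an s3:// URL");
  const [, bucket, key] = m;
  return `https://${bucket}.s3.${region}.amazonaws.com/${key}`;
}
```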
URL formats supported:
• gs://bucket-name/path
• https://storage.googleapis.com/bucket-name/path
• https://bucket-name.storage.googleapis.com/path
Authentication:
• OAuth Access Token (get via: gcloud auth print-access-token)
• Anonymous/Public (for public buckets)
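A `gs://` URL resolves to the `storage.googleapis.com` HTTPS form, with the OAuth token passed as a Bearer header. A minimal sketch under those assumptions (`gcsRequest` is a hypothetical name):

```javascript
// Build the HTTPS URL and Authorization header for a GCS object,
// given a gs:// URL and an OAuth access token
// (e.g. from `gcloud auth print-access-token`).
function gcsRequest(gsUrl, accessToken) {
  const m = gsUrl.match(/^gs:\/\/([^/]+)\/(.*)$/);
  if (!m) throw new Error("Not a gs:// URL");
  const [, bucket, object] = m;
  return {
    url: `https://storage.googleapis.com/${bucket}/${object}`,
    headers: accessToken ? { Authorization: `Bearer ${accessToken}` } : {},
  };
}
```

The returned `url` and `headers` could then be passed straight to `fetch`.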
- API Testing: Quickly inspect Parquet responses
- Data Pipeline Debugging: Verify intermediate file formats
- Schema Validation: Ensure data types match expectations
- Performance Analysis: Check compression and encoding efficiency
- Quick Data Inspection: View file contents without heavy tools
- Data Quality Assessment: Check for nulls, duplicates, and anomalies
- Schema Documentation: Understand data structure and types
- Sample Data Extraction: Export subsets for further analysis
- Report Verification: Confirm data exports are correct
- Data Sharing: Convert Parquet to accessible formats
- File Validation: Ensure data integrity before processing
- Quick Previews: Get instant insights without technical setup
- Ctrl+F: Focus search box
- ←/→: Navigate between pages
- Ctrl+S: Export modified data as CSV
- Ctrl+E: Export original data as CSV
- Esc: Reset view/clear search
- ?: Toggle keyboard shortcuts help
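Shortcuts like these are commonly wired up through a dispatch table keyed on the keyboard event. An illustrative sketch (handler names are hypothetical, not the app's actual functions):

```javascript
// Map a normalized key string to a handler name.
const shortcuts = {
  "ctrl+f": "focusSearch",
  "ctrl+s": "exportModifiedCsv",
  "ctrl+e": "exportOriginalCsv",
  ArrowLeft: "previousPage",
  ArrowRight: "nextPage",
  Escape: "resetView",
  "?": "toggleHelp",
};

// event has the shape of a browser KeyboardEvent: { key, ctrlKey }
function shortcutFor(event) {
  const key = event.ctrlKey ? `ctrl+${event.key.toLowerCase()}` : event.key;
  return shortcuts[key] ?? null;
}
```

In the browser this would be attached via `document.addEventListener("keydown", ...)`, calling `event.preventDefault()` for handled combinations so Ctrl+S does not trigger the browser's save dialog.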
# Fork the repository and enable GitHub Pages
git clone https://github.com/yourusername/parquet-viewer.git
cd parquet-viewer
# Enable GitHub Pages in repository settings

# Clone and serve locally
git clone https://github.com/mjtpena/parquet-viewer.git
cd parquet-viewer
# Open index.html in your browser or serve with any web server
python -m http.server 8000 # Python 3
# or
npx serve . # Node.js

Simply download index.html and serve it from any web server. No build process or dependencies required.
- No Network Requests: After initial page load, everything runs offline
- No Telemetry: No analytics, tracking, or data collection
- No External Dependencies: All code is self-contained
- No Server Storage: Files never leave your device
- Memory Management: Automatic cleanup after processing
- Secure Processing: Files are processed in isolated browser context
Q: File won't load or shows error
- Ensure the file is valid and in a supported format
- Check file size is under 500MB
- Try with a different browser
- Verify file isn't corrupted
Q: Browser runs out of memory
- Use a smaller file or close other browser tabs
- Try increasing browser memory limits
- Consider using a 64-bit browser
Q: Performance is slow
- Close unnecessary browser tabs
- Disable browser extensions temporarily
- Use a modern browser version
- Check available system memory
Q: Features not working
- Enable JavaScript in your browser
- Update to a supported browser version
- Clear browser cache and reload
- Primitive types (INT32, INT64, FLOAT, DOUBLE, BOOLEAN, BYTE_ARRAY)
- Logical types (STRING, TIMESTAMP, DECIMAL, etc.)
- Complex types (basic support for nested structures)
- GZIP, Snappy, LZ4, ZSTD
- Uncompressed files
- Compression ratio analysis
- Plain, Dictionary, RLE
- Delta encoding variants
- All standard Parquet encodings
- Complex nested schemas (deep nesting)
- Map and List types (full support)
- Advanced filtering predicates
- Multi-file datasets
MIT License - Free for personal and commercial use. See LICENSE file for details.
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
- Fork the repository
- Make your changes to index.html
- Test across different browsers and file types
- Submit a pull request with clear description
- Performance optimizations for large files
- Support for complex nested types
- Advanced filtering and search
- Additional export formats
- Accessibility improvements
If you find Multi-Format Data Viewer useful:
- ⭐ Star the repository on GitHub
- 🐛 Report bugs and request features
- 🔄 Share with colleagues who work with data files
- 💡 Contribute improvements via pull requests
- 📢 Spread the word on social media
- Apache Parquet: The Parquet format specification
- Hyparquet: The JavaScript Parquet parser we use
- Apache Arrow: Columnar data format and processing libraries
- DuckDB: Fast analytical database with Parquet support
Made with ❤️ for the data community