Instagram Posts Scraper collects public post content from profiles, hashtags, and locations into clean, analysis-ready data. It helps marketers, researchers, and growth teams turn scattered Instagram posts into structured insights for monitoring, reporting, and trend discovery.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
Instagram Posts Scraper extracts key post-level information (captions, media links, engagement, and URLs) so you can analyze content performance at scale without manual browsing. It’s designed for anyone who needs repeatable datasets for reporting, competitive research, or content intelligence.
- Pulls posts from public profiles, hashtag pages, and location feeds.
- Captures engagement metrics (likes, comments) alongside post identifiers and URLs.
- Extracts media assets (images/videos) with direct URLs for downstream processing.
- Preserves publishing context with timestamps and extracted hashtags/mentions.
- Outputs structured datasets for dashboards, spreadsheets, or data pipelines.
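To illustrate the hashtag and mention extraction mentioned in the list above, here is a minimal TypeScript sketch. The function name `extractEntities`, the regular expressions, and the choice to lowercase and de-duplicate results are assumptions made for illustration, not the project's actual `normalizers/entities.ts`. The hashtag pattern is simplified to ASCII word characters, so hashtags in other scripts would need a broader pattern.

```typescript
// Hypothetical sketch: parse hashtags and mentions out of a caption string.
interface Entities {
  hashtags: string[];
  mentions: string[];
}

/** Collects capture group 1 of every match, lowercased, without duplicates. */
function collect(pattern: RegExp, text: string): string[] {
  const found: string[] = [];
  let match: RegExpExecArray | null;
  // `pattern` must carry the global flag so exec() advances through the text.
  while ((match = pattern.exec(text)) !== null) {
    const value = match[1].toLowerCase();
    if (found.indexOf(value) === -1) {
      found.push(value);
    }
  }
  return found;
}

function extractEntities(caption: string): Entities {
  return {
    // Simplification: ASCII letters, digits, and underscore only.
    hashtags: collect(/#(\w+)/g, caption),
    // Usernames may contain dots, but a trailing dot is treated as punctuation.
    mentions: collect(/@([A-Za-z0-9_](?:[A-Za-z0-9_.]*[A-Za-z0-9_])?)/g, caption),
  };
}

const example = extractEntities("Match day! #HalaMadrid #football with @alnassr.");
console.log(example.hashtags); // [ 'halamadrid', 'football' ]
console.log(example.mentions); // [ 'alnassr' ]
```

Note that this naive mention pattern would also match the domain part of an email address in a caption; a production parser would likely need extra guards.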
| Feature | Description |
|---|---|
| Profile scraping | Collects posts from public Instagram profiles with stable post URLs and identifiers. |
| Hashtag scraping | Extracts posts from hashtag feeds to analyze trends and discover viral content. |
| Location scraping | Pulls posts tied to specific locations for regional insights and local research. |
| Captions + entities | Extracts full captions plus hashtags and mentions for NLP and SEO analysis. |
| Media extraction | Retrieves image/video URLs for archiving, review workflows, or ML pipelines. |
| Engagement metrics | Captures likes and comments counts to support performance benchmarking. |
| Structured exports | Produces JSON/CSV-ready records that plug into analytics stacks easily. |
| Deduplication support | Prevents repeat rows by tracking post IDs and source context. |
| Resilient runs | Includes retries, backoff, and safe request pacing for reliable collection. |
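The "Resilient runs" row refers to retries and backoff. One common way to implement this is exponential backoff, sketched below in TypeScript. The names `backoffDelayMs` and `withRetry` and the default values (500 ms base, 30 s cap, 3 retries) are illustrative assumptions and are not taken from the project's `core/retry.ts`.

```typescript
// Hypothetical sketch of retry with exponential backoff; defaults are assumed.

/** Delay before retry number `attempt` (0-based): doubles each time, up to a cap. */
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/** Runs `task`; on failure waits and retries, up to `maxRetries` extra attempts. */
async function withRetry<T>(
  task: () => Promise<T>,
  maxRetries = 3,
  baseMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) {
        await sleep(backoffDelayMs(attempt, baseMs));
      }
    }
  }
  // All attempts failed: surface the most recent error to the caller.
  throw lastError;
}
```

A caller would wrap a single page request, for example `withRetry(() => fetchPage(url))`, where `fetchPage` stands in for whatever request function the run uses.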
| Field Name | Field Description |
|---|---|
| post_id | Unique identifier for the post. |
| shortcode | Short post code used in URLs (e.g., DO8fSwLiNU-). |
| post_url | Canonical URL to the post. |
| profile_username | Public username the post belongs to. |
| profile_full_name | Display name of the profile when available. |
| profile_url | URL of the source profile. |
| caption_text | Full caption text, including hashtags and mentions. |
| hashtags | Hashtags parsed from the caption. |
| mentions | Tagged usernames parsed from the caption. |
| taken_at | Original publish time as a timestamp or epoch value. |
| scraped_at | Time when the record was collected. |
| media_type | Indicates whether the post is image, video, or carousel. |
| media_urls | List of extracted media URLs (images/videos). |
| carousel_count | Number of items if the post is a carousel. |
| like_count | Total likes (when visible). |
| comment_count | Total comments (when visible). |
| comments_preview | Optional preview/sample of comments if enabled. |
| accessibility_caption | Alternative text / accessibility caption when present. |
| location_name | Location label when scraping location feeds. |
| source_type | Where it was collected from: profile, hashtag, or location. |
| source_url | The source page URL used for collection. |
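For consumers working in TypeScript, the field table above could be expressed as a type along these lines. This is a sketch inferred from the table and the sample output, not the project's actual `types/output.ts`: which fields are optional, the epoch-seconds assumption for timestamps, and the helper `hasVisibleEngagement` are all assumptions.

```typescript
// Hypothetical typing of one output record, inferred from the field table.
type SourceType = "profile" | "hashtag" | "location";
type MediaType = "image" | "video" | "carousel";

interface PostRecord {
  post_id: string;
  shortcode: string;
  post_url: string;
  profile_username: string;
  profile_full_name?: string; // "when available"
  profile_url?: string;
  caption_text: string;
  hashtags: string[];
  mentions: string[];
  taken_at: number; // assumed epoch seconds, as in the sample output
  scraped_at: number;
  media_type: MediaType;
  media_urls: string[];
  carousel_count?: number; // only meaningful for carousels
  like_count?: number; // "when visible"
  comment_count?: number; // "when visible"
  comments_preview?: unknown[]; // shape is not documented, so left opaque
  accessibility_caption?: string;
  location_name?: string;
  source_type: SourceType;
  source_url: string;
}

/** True when both engagement counts are present on the record. */
function hasVisibleEngagement(record: PostRecord): boolean {
  return (
    typeof record.like_count === "number" &&
    typeof record.comment_count === "number"
  );
}
```

Marking the engagement fields optional forces downstream code to handle posts where likes or comments are hidden instead of silently treating them as zero.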
[
  {
    "post_id": "3727992219681477950_173560420",
    "shortcode": "DO8fSwLiNU-",
    "post_url": "https://www.instagram.com/p/DO8fSwLiNU-/",
    "profile_username": "cristiano",
    "profile_full_name": "Cristiano Ronaldo",
    "caption_text": "Happy Saudi National Day to everyone in Saudi Arabia! 🇸🇦 Wishing you a day filled with pride, unity, and celebration with your loved ones.",
    "hashtags": [],
    "mentions": [
      "alnassr"
    ],
    "taken_at": 1758631325,
    "scraped_at": 1758728197,
    "media_type": "carousel",
    "carousel_count": 3,
    "media_urls": [
      "https://scontent-iad3-1.cdninstagram.com/v/t51.2885-15/552825156_18648550693056421_6760424445129157822_n.jpg",
      "https://scontent-iad3-1.cdninstagram.com/v/t51.2885-15/552103283_18648550702056421_7155034309683400047_n.jpg",
      "https://scontent-iad3-1.cdninstagram.com/v/t51.2885-15/552717801_18648550711056421_5296052388327427597_n.jpg"
    ],
    "like_count": 7141379,
    "comment_count": 72516,
    "accessibility_caption": "Photo shared by Cristiano Ronaldo on September 23, 2025 tagging @alnassr.",
    "source_type": "profile",
    "source_url": "https://www.instagram.com/cristiano/"
  }
]
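The `taken_at` and `scraped_at` values in the sample look like Unix epoch seconds. Assuming that is the case, they can be converted to readable UTC timestamps as sketched below; the helper name `epochSecondsToIso` is illustrative.

```typescript
// Sketch: convert the sample's epoch-seconds values to ISO 8601 (UTC).
// Assumes the values are seconds, not milliseconds.
function epochSecondsToIso(seconds: number): string {
  // JavaScript Date works in milliseconds, hence the factor of 1000.
  return new Date(seconds * 1000).toISOString();
}

console.log(epochSecondsToIso(1758631325)); // 2025-09-23T12:42:05.000Z (taken_at)
console.log(epochSecondsToIso(1758728197)); // 2025-09-24T15:36:37.000Z (scraped_at)
```

The first result agrees with the "September 23, 2025" date in the sample's `accessibility_caption`, which supports the epoch-seconds reading.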
Instagram Posts Scraper/
├── src/
│   ├── main.ts
│   ├── runner.ts
│   ├── config/
│   │   ├── defaults.ts
│   │   └── schema.json
│   ├── core/
│   │   ├── client.ts
│   │   ├── rateLimiter.ts
│   │   ├── retry.ts
│   │   └── logger.ts
│   ├── extractors/
│   │   ├── profileExtractor.ts
│   │   ├── hashtagExtractor.ts
│   │   ├── locationExtractor.ts
│   │   └── postParser.ts
│   ├── normalizers/
│   │   ├── text.ts
│   │   ├── entities.ts
│   │   └── media.ts
│   ├── outputs/
│   │   ├── exporters.ts
│   │   ├── toJson.ts
│   │   └── toCsv.ts
│   ├── types/
│   │   ├── input.ts
│   │   └── output.ts
│   └── utils/
│       ├── time.ts
│       ├── url.ts
│       └── dedupe.ts
├── data/
│   ├── input.sample.json
│   └── output.sample.json
├── tests/
│   ├── parser.test.ts
│   └── entities.test.ts
├── .env.example
├── package.json
├── tsconfig.json
├── README.md
└── LICENSE
- Growth marketers use it to track competitor Instagram posts, so they can spot winning creatives and improve campaign performance.
- E-commerce teams use it to collect influencer posts by hashtag, so they can identify creators and validate engagement before outreach.
- Researchers use it to build datasets from public profiles, so they can study content patterns and engagement behavior.
- Local businesses use it to scrape location-based posts, so they can understand regional trends and optimize local promotions.
- Content creators use it to analyze trending hashtags, so they can plan posts that align with current audience interest.
Q1: Can it scrape private accounts or saved posts? No. It only collects content that is publicly accessible. Private accounts and saved/private collections are not supported.
Q2: What inputs are supported (profile, hashtag, location)? You can provide public profile URLs, hashtag feed URLs, or location URLs. The scraper detects the source type and uses the relevant extraction path to return normalized post records.
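As a rough illustration of the source-type detection described in Q2, the sketch below classifies a URL by its path. The function name `detectSourceType`, the reserved-path list, and the URL shapes (`/explore/tags/<tag>/` for hashtags, `/explore/locations/<id>/...` for locations, a single path segment for profiles) are assumptions about how such detection might work, not the scraper's actual logic.

```typescript
// Hypothetical sketch: infer the source type from an Instagram URL's path.
type SourceType = "profile" | "hashtag" | "location";

// First path segments that are assumed never to be usernames.
const RESERVED_SEGMENTS = ["explore", "p", "reel", "reels", "stories"];

function detectSourceType(url: string): SourceType | null {
  const match = /^https?:\/\/(?:www\.)?instagram\.com\/([^?#]*)/i.exec(url);
  if (match === null) {
    return null; // not an Instagram URL
  }
  const segments = match[1].split("/").filter((s) => s.length > 0);
  if (segments[0] === "explore" && segments.length >= 3) {
    if (segments[1] === "tags") return "hashtag";
    if (segments[1] === "locations") return "location";
  }
  if (segments.length === 1 && RESERVED_SEGMENTS.indexOf(segments[0]) === -1) {
    return "profile";
  }
  return null; // post URLs and anything unrecognized
}

console.log(detectSourceType("https://www.instagram.com/cristiano/")); // profile
console.log(detectSourceType("https://www.instagram.com/explore/tags/football/")); // hashtag
```

Returning `null` for unrecognized URLs lets the caller reject bad input early instead of guessing an extraction path.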
Q3: Why do like/comment counts sometimes look missing or different? Engagement visibility can vary by post and region, and some posts may hide likes or return partial values. The output preserves what is available and flags records where engagement is not visible.
Q4: How do I avoid duplicates across runs? Enable deduplication using post_id (and optionally source_type + source_url) so re-runs don’t re-add the same post records to your dataset.
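The deduplication approach from Q4 can be sketched as follows. The names `dedupeKey` and `dedupe`, the `|` key separator, and the idea of passing in a `seen` set that the caller persists between runs are assumptions for illustration, not the project's `utils/dedupe.ts`.

```typescript
// Hypothetical sketch: drop records whose key has already been seen.
interface DedupeFields {
  post_id: string;
  source_type: string;
  source_url: string;
}

/** Key on post_id alone, or on post_id plus source context when requested. */
function dedupeKey(record: DedupeFields, includeSource = false): string {
  return includeSource
    ? [record.post_id, record.source_type, record.source_url].join("|")
    : record.post_id;
}

/**
 * Returns only records not yet in `seen`, and adds their keys to `seen`.
 * Persisting `seen` between runs (for example in a file) is left to the caller.
 */
function dedupe<T extends DedupeFields>(
  records: T[],
  seen: Set<string> = new Set<string>(),
  includeSource = false,
): T[] {
  const fresh: T[] = [];
  for (const record of records) {
    const key = dedupeKey(record, includeSource);
    if (!seen.has(key)) {
      seen.add(key);
      fresh.push(record);
    }
  }
  return fresh;
}
```

Keying on `post_id` alone keeps one row per post across all sources. Including `source_type` and `source_url` keeps one row per post per source, which is useful when the same post should be counted once under each hashtag it was found in.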
Primary Metric: Average collection speed of 800–1,500 posts per hour per worker on public sources, depending on media density and carousel frequency.
Reliability Metric: 95–98% run success rate across mixed inputs (profiles + hashtags + locations) when using safe pacing and retries.
Efficiency Metric: Typical throughput of 10–25 post records per minute with moderate resource usage, with higher throughput on single-profile runs.
Quality Metric: 90–99% field completeness for core fields (post_url, caption_text, media_urls, timestamps), with engagement fields varying based on visibility.
