
web-to-markdown

by bear2u · GitHub

Converts web pages to Markdown with three modes: standard, AI-optimized for context use, and dual mode for both. Handles dynamic content via Playwright fallback and includes structured prompts for consistent output. Useful for archiving articles, documentation, and research materials.


Target Audience

Developers, researchers, and content creators who need to archive web content for reference, documentation, or AI context preparation.

Security: 7/10 (low security risk, safe to use)

  • Clarity: 9
  • Practicality: 9
  • Quality: 8
  • Maintainability: 8
  • Innovation: 8
  • Productivity
Tags: web-scraping, markdown-conversion, content-archiving, ai-optimization, playwright
Compatible Agents
  • Claude Code: ~/.claude/skills/
  • Codex CLI: ~/.codex/skills/
  • Gemini CLI: ~/.gemini/skills/
  • OpenCode: ~/.opencode/skills/
  • OpenClaw: ~/.openclaw/skills/
  • GitHub Copilot: ~/.copilot/skills/
  • Cursor: ~/.cursor/skills/
  • Windsurf: ~/.codeium/windsurf/skills/
  • Cline: ~/.cline/skills/
  • Roo Code: ~/.roo/skills/
  • Kiro: ~/.kiro/skills/
  • Junie: ~/.junie/skills/
  • Augment Code: ~/.augment/skills/
  • Warp: ~/.warp/skills/
  • Goose: ~/.config/goose/skills/
SKILL.md

Web to Markdown

Overview

A general-purpose web scraping tool that supports:

  • Converting web content to clean Markdown
  • Extracting image URLs from any website
  • Batch downloading images from web pages

Useful for reading articles, collecting images, and organizing research material.

Features

1. Web to Markdown

Converts a web page URL into clean Markdown text, stripping ads, navigation bars, and other irrelevant content.

URL Prefix Services:

Service        Prefix                  Notes
markdown.new   https://markdown.new/   Preferred, fast
defuddle       https://defuddle.md/    Fallback
r.jina.ai      https://r.jina.ai/      Good for dynamic content

Usage:

curl -s "https://markdown.new/https://example.com/article"
curl -s "https://r.jina.ai/https://example.com/article"
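Because each service takes the target URL appended directly to its prefix, fetching from Python needs only string concatenation and urllib. A minimal standard-library sketch; the helper names `prefixed_url` and `fetch_markdown`, the 30-second timeout, and the User-Agent string are illustrative assumptions, not part of the skill's scripts:

```python
import urllib.request

# Prefixes from the service table; the target URL is appended unchanged.
SERVICES = {
    "markdown.new": "https://markdown.new/",
    "defuddle": "https://defuddle.md/",
    "r.jina.ai": "https://r.jina.ai/",
}


def prefixed_url(service: str, target: str) -> str:
    """Build the request URL for a prefix service (hypothetical helper)."""
    return SERVICES[service] + target


def fetch_markdown(service: str, target: str, timeout: float = 30.0) -> str:
    """Fetch the Markdown rendering of `target`; raises on HTTP or network errors."""
    req = urllib.request.Request(
        prefixed_url(service, target),
        headers={"User-Agent": "web-to-markdown-sketch/0.1"},  # assumed UA string
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

For example, `fetch_markdown("markdown.new", "https://example.com/article")` performs the same request as the first curl command.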

2. Extract Images from Web Pages

Extracts all image URLs from any web page.

General Extraction:

# Extract all image URLs
curl -s "https://r.jina.ai/<url>" | grep -oE 'https://[^)[:space:]"]+\.(jpg|jpeg|png|gif|webp|avif)'
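The same extraction can be done in Python with the `re` module, which is convenient inside a script. A minimal sketch; the name `extract_image_urls` is hypothetical and not necessarily what scripts/extract_images.py uses. Unlike POSIX grep, Python's `re` treats `\s` inside a character class as whitespace, so the pattern below excludes `)`, `"` and whitespace as intended:

```python
import re

# An https URL running up to a ")", a double quote or whitespace,
# and ending in a known image extension (case-sensitive, like grep).
IMAGE_URL_RE = re.compile(r'https://[^)\s"]+\.(?:jpg|jpeg|png|gif|webp|avif)')


def extract_image_urls(markdown: str) -> list[str]:
    """Return unique image URLs in order of first appearance."""
    seen: dict[str, None] = {}
    for url in IMAGE_URL_RE.findall(markdown):
        seen.setdefault(url, None)  # dict keeps insertion order, so this dedupes stably
    return list(seen)
```

A query string after the extension (as in `c.gif?w=100`) is cut off, matching what the grep pipeline returns.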

Using the Script:

python scripts/extract_images.py <url> [--output urls.txt]

3. Batch Download Images

Extracts images from a web page and downloads them in bulk to a local directory.

Using the Script:

python scripts/download_images.py <url> [--output <dir>] [--limit <n>] [--min-size <bytes>]

ๅ‚ๆ•ฐ / Parameters๏ผš

  • url: ็ฝ‘้กต URL / Web page URL
  • --output: ่พ“ๅ‡บ็›ฎๅฝ•๏ผˆ้ป˜่ฎค ~/.openclaw/images๏ผ‰/ Output directory (default: ~/.openclaw/images)
  • --limit: ๆœ€ๅคงไธ‹่ฝฝๆ•ฐ๏ผˆ้ป˜่ฎค 50๏ผ‰/ Max downloads (default: 50)
  • --min-size: ๆœ€ๅฐๆ–‡ไปถๅคงๅฐ๏ผŒ่ฟ‡ๆปคๅฐๅ›พๆ ‡๏ผˆ้ป˜่ฎค 10KB๏ผ‰/ Min file size to filter out small icons (default: 10KB)
  • --ext: ๅชไธ‹่ฝฝๆŒ‡ๅฎšๆ ผๅผ๏ผˆjpg/png/gif/webp๏ผ‰/ Only download specific formats (jpg/png/gif/webp)
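For anyone reproducing this interface in their own tooling, here is a hypothetical argparse definition mirroring the documented options. It is not taken from download_images.py; reading "10KB" as 10 × 1024 bytes and accepting a single value for `--ext` are assumptions:

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented download_images.py options."""
    p = argparse.ArgumentParser(prog="download_images.py")
    p.add_argument("url", help="web page URL")
    p.add_argument("--output", type=Path,
                   default=Path.home() / ".openclaw" / "images",
                   help="output directory")
    p.add_argument("--limit", type=int, default=50, help="maximum number of downloads")
    # Assumption: the documented 10KB default means 10 * 1024 bytes.
    p.add_argument("--min-size", type=int, default=10 * 1024,
                   help="skip files smaller than this many bytes")
    # Assumption: one format per run.
    p.add_argument("--ext", choices=["jpg", "png", "gif", "webp"],
                   help="only download this format")
    return p
```

argparse maps `--min-size` to the attribute `min_size`, so `build_parser().parse_args([...]).min_size` holds the threshold.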

Examples:

# Download all large images from a page
python scripts/download_images.py "https://example.com/gallery" --output ~/Downloads/images

# Download only PNGs, max 20
python scripts/download_images.py "https://example.com" --ext png --limit 20

# Pinterest (auto-converts to original size)
python scripts/download_images.py "https://www.pinterest.com/search/pins/?q=architecture"

Workflow

Web Content Scraping

  1. Try markdown.new/ first
  2. If that fails, fall back to defuddle.md/
  3. If that also fails, try r.jina.ai/
  4. As a last resort, run the local Scrapling script (scripts/scrape.py)
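The scraping fallback order amounts to trying each prefix in turn and keeping the first usable result. A minimal sketch, assuming a caller-supplied `fetch` function (for example one built on urllib) and an optional `local_fallback` that could wrap `python scripts/scrape.py <url>`; both parameter names are hypothetical, and treating an empty body as a failure is an assumption rather than documented behavior:

```python
from typing import Callable, Optional, Tuple

# Services in the documented order of preference.
PREFIXES = ["https://markdown.new/", "https://defuddle.md/", "https://r.jina.ai/"]


def scrape_with_fallback(
    target: str,
    fetch: Callable[[str], str],
    local_fallback: Optional[Callable[[str], str]] = None,
) -> Tuple[str, str]:
    """Return (source, markdown) from the first service that succeeds."""
    errors = []
    for prefix in PREFIXES:
        try:
            text = fetch(prefix + target)
        except Exception as exc:  # HTTP/network error: move on to the next service
            errors.append(f"{prefix}: {exc}")
            continue
        if text.strip():  # assumption: an empty body counts as a failure
            return prefix, text
        errors.append(f"{prefix}: empty response")
    if local_fallback is not None:
        return "local", local_fallback(target)
    raise RuntimeError("all services failed:\n" + "\n".join(errors))
```

Passing `fetch` in as a parameter keeps the ordering logic testable without network access.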

Image Extraction & Download

  1. Use r.jina.ai to fetch the page content
  2. Extract all image URLs with a regex
  3. Filter out small images (icons, emojis, etc.)
  4. Name the files automatically and download them
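The filtering and naming steps of the image workflow can be sketched with the standard library alone. This is an illustration, not the code of download_images.py: the naming rule (zero-padded index plus the last path segment), the `.jpg` fallback extension, the User-Agent value, and the optional Referer header are assumptions. Small files are filtered after download by comparing the byte count with `min_size`, using the documented 10KB default read as 10 × 1024 bytes:

```python
import os
import urllib.request
from pathlib import Path
from typing import Iterable, List, Optional
from urllib.parse import unquote, urlsplit


def filename_for(url: str, index: int) -> str:
    """Hypothetical naming rule: zero-padded index + last path segment, query dropped."""
    name = os.path.basename(unquote(urlsplit(url).path)) or "image"
    if "." not in name:
        name += ".jpg"  # assumed default when the URL has no extension
    return f"{index:03d}_{name}"


def download_all(
    urls: Iterable[str],
    out_dir: Path,
    limit: int = 50,
    min_size: int = 10 * 1024,
    referer: Optional[str] = None,
) -> List[Path]:
    """Download up to `limit` images, skipping failures and files under `min_size` bytes."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved: List[Path] = []
    for url in urls:
        if len(saved) >= limit:
            break
        headers = {"User-Agent": "Mozilla/5.0"}
        if referer:
            # Sending the page URL as Referer sometimes gets past hotlink checks.
            headers["Referer"] = referer
        try:
            req = urllib.request.Request(url, headers=headers)
            with urllib.request.urlopen(req, timeout=30) as resp:
                data = resp.read()
        except (OSError, ValueError):
            continue  # network/HTTP error or malformed URL: skip it
        if len(data) < min_size:
            continue  # most likely an icon, emoji, or tracking pixel
        path = out_dir / filename_for(url, len(saved) + 1)
        path.write_bytes(data)
        saved.append(path)
    return saved
```

Prefixing the index keeps files in page order and avoids collisions when two URLs end in the same file name.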

Special Website Support

Pinterest

Automatically detects Pinterest URLs and converts thumbnails to original size:

  • 236x → originals
  • 564x → originals
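The Pinterest thumbnail rewrite comes down to a single regex substitution. This sketch assumes thumbnails are served from i.pinimg.com with the size as the first path segment (for example https://i.pinimg.com/236x/...), which is a common form for Pinterest image URLs but is not stated in the source. The rewritten URL is not guaranteed to exist for every pin, so a caller may want to fall back to the thumbnail on an HTTP 404:

```python
import re

# Assumed URL shape: https://i.pinimg.com/<size>/<path...>
_PIN_SIZE_RE = re.compile(r"(https?://i\.pinimg\.com/)(?:236x|564x)(/)")


def pinterest_original(url: str) -> str:
    """Swap a 236x/564x size segment for 'originals'; other URLs pass through unchanged."""
    return _PIN_SIZE_RE.sub(r"\1originals\2", url, count=1)
```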

Other Common Websites

The scripts automatically handle various image URL formats, including:

  • CDN links
  • URLs with query parameters
  • Lazy-loaded images
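When working from raw HTML rather than the r.jina.ai Markdown, the three URL cases for other sites can be handled with the standard library's html.parser. This is a swapped-in technique for illustration, not a description of how the skill's scripts work: the lazy-load attribute names are common conventions chosen as assumptions, and taking the last srcset candidate (often the largest) is a heuristic:

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlsplit

IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif", ".webp", ".avif")
# Attributes often used by lazy-loading libraries (assumed, not exhaustive).
LAZY_ATTRS = ("data-src", "data-original", "data-lazy-src")


class ImgCollector(HTMLParser):
    """Collect absolute image URLs from <img> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        # Prefer a lazy-load attribute: src often holds a placeholder.
        candidate = next((a[k] for k in LAZY_ATTRS if a.get(k)), None) or a.get("src")
        if not candidate and a.get("srcset"):
            candidate = a["srcset"].split(",")[-1].strip().split()[0]
        if candidate:
            # urljoin resolves relative paths and protocol-relative CDN links.
            self.urls.append(urljoin(self.base_url, candidate))


def looks_like_image(url: str) -> bool:
    """Check the extension on the path only, so '?w=800' style queries don't hide it."""
    return urlsplit(url).path.lower().endswith(IMAGE_EXTS)


def collect_images(html: str, base_url: str) -> List[str]:
    parser = ImgCollector(base_url)
    parser.feed(html)
    return [u for u in parser.urls if looks_like_image(u)]
```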

Script Reference

scripts/scrape.py

Local web scraping script, used as a fallback when the online services fail.

python scripts/scrape.py <url>

scripts/extract_images.py

Extracts image URLs from a web page and outputs them as a list.

python scripts/extract_images.py <url> [--output urls.txt]

scripts/download_images.py

Batch downloads images from a web page.

python scripts/download_images.py <url> [options]

ไพ่ต– / Dependencies

extract_images.py ๅ’Œ download_images.py ไป…ไฝฟ็”จ Python ๆ ‡ๅ‡†ๅบ“๏ผŒๆ— ้œ€้ขๅค–ๅฎ‰่ฃ…ใ€‚ extract_images.py and download_images.py only use the Python standard library โ€” no extra installation needed.

scrape.py ้œ€่ฆๅฎ‰่ฃ… scrapling๏ผˆๆœฌๅœฐๆŠ“ๅ–้™็บงๆ–นๆกˆ๏ผ‰๏ผš scrape.py requires scrapling (local scraping fallback):

pip install scrapling

Notes

  • Respect the website's robots.txt and terms of use
  • Consider the load on the site's servers before downloading in bulk
  • Some sites use hotlink protection and may block direct downloads
  • Dynamically loaded images may require r.jina.ai
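The first two notes can be enforced in code with Python's standard urllib.robotparser module. The sketch below filters URLs through robots.txt and spaces out requests; the User-Agent identifier, the one-second default delay, and honoring Crawl-delay are assumptions, not behavior of the skill's scripts. robots.txt is per host, so images served from a CDN fall under the CDN's own file:

```python
import time
import urllib.robotparser
from typing import Iterable, Iterator
from urllib.parse import urlsplit

USER_AGENT = "web-to-markdown-sketch"  # assumed identifier


def robots_for(page_url: str) -> urllib.robotparser.RobotFileParser:
    """Fetch and parse robots.txt for the page's host (performs a network request)."""
    parts = urlsplit(page_url)
    rp = urllib.robotparser.RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()
    return rp


def polite_urls(
    urls: Iterable[str],
    rp: urllib.robotparser.RobotFileParser,
    delay_s: float = 1.0,
) -> Iterator[str]:
    """Yield only URLs that robots.txt allows, pausing between them to limit server load."""
    # Use a Crawl-delay directive if it asks for a longer pause than delay_s.
    wait = max(delay_s, float(rp.crawl_delay(USER_AGENT) or 0))
    first = True
    for url in urls:
        if not rp.can_fetch(USER_AGENT, url):
            continue
        if not first:
            time.sleep(wait)
        first = False
        yield url
```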

Source: https://github.com/bear2u/my-skills#skills~web-to-markdown

Content curated from original sources, copyright belongs to authors

Grade A
AI Score: 8.3