Strip HTML Tags in JavaScript

Updated: May 2026

Stripping HTML tags in JavaScript is a deceptively common task with several distinct approaches — each with different trade-offs in accuracy, security, and environment support. This guide covers the three main methods: DOMParser (most accurate), innerHTML on a detached node (short but risky), and regex (simple but fragile).

Method 1: DOMParser — Most Accurate

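The most reliable approach uses the browser's native HTML5 parser. It handles malformed markup, multi-line tags, and attribute values containing >, and it decodes entities automatically. The resulting document is also inert: scripts in it never run and images are not fetched, so parsing untrusted input does not execute any of it.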

function stripHtml(html) {
  const doc = new DOMParser().parseFromString(html, 'text/html');

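  // Remove subtrees whose text is not readable content. Scripting is disabled
  // in DOMParser documents, so <noscript> content is parsed as ordinary markup.
  doc.querySelectorAll('script, style, noscript').forEach(el => el.remove());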

  return doc.body.textContent || '';
}

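This returns the text of every text node in the body. That includes text inside elements hidden with CSS or the hidden attribute, because textContent ignores rendering. It also does not insert line breaks at block boundaries, so <p>a</p><p>b</p> comes out as ab. For readable multiline output, walk the tree manually: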

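function stripHtmlStructured(html) {
  // Elements whose text should sit on its own line(s)
  const BLOCK = new Set(['P','DIV','H1','H2','H3','H4','H5','H6',
    'LI','BLOCKQUOTE','PRE','TR']);
  // Elements whose content is never part of the readable text
  const SKIP = new Set(['SCRIPT','STYLE','HEAD','NOSCRIPT']);

  function walk(node) {
    let text = '';
    for (const child of node.childNodes) {
      if (child.nodeType === Node.TEXT_NODE) {
        text += child.textContent;
      } else if (child.nodeType === Node.ELEMENT_NODE) {
        if (SKIP.has(child.tagName)) continue;
        if (child.tagName === 'BR') {
          text += '\n';  // a <br> is a single line break, not a paragraph break
        } else if (BLOCK.has(child.tagName)) {
          text += '\n' + walk(child).trim() + '\n';
        } else {
          text += walk(child);  // inline element: keep its text in the flow
        }
      }
    }
    return text;
  }

  const doc = new DOMParser().parseFromString(html, 'text/html');
  return walk(doc.body)
    .replace(/[ \t]+\n/g, '\n')   // drop trailing spaces left by source indentation
    .replace(/\n{3,}/g, '\n\n')   // collapse runs of blank lines to one
    .trim();
}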

Environment: Browser only. Not available in Node.js without a DOM polyfill (use jsdom or linkedom in Node).

Method 2: innerHTML on a Detached Element

A shorter pattern creates a throwaway DOM element, sets its innerHTML to your string, then reads textContent:

function stripHtml(html) {
  const el = document.createElement('div');
  el.innerHTML = html;
  return el.textContent; // never null on an element, so no fallback is needed
}

Warning (XSS risk): <script> elements inserted via innerHTML never execute, whether or not the element is in the document, but inline event handlers do. The detached div still belongs to the live document, so an input such as <img src=x onerror=alert(1)> starts loading the image immediately and its onerror handler runs in your page, even though the element is never appended. Never use this pattern with untrusted HTML. DOMParser (Method 1) creates an inert document in which scripts do not run and images are not fetched.

Entities: entities are decoded automatically since innerHTML triggers the HTML parser. Script and style content is included in textContent — you must remove those elements first:

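function stripHtmlSafe(html) {
  const el = document.createElement('div');
  el.innerHTML = html;
  // Only the text of script/style is excluded here; image loads (and any
  // onerror handlers) were already triggered by the assignment above.
  el.querySelectorAll('script, style').forEach(e => e.remove());
  return el.textContent;
}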

Environment: Browser only. Requires an active document context.

Method 3: Regex — Simple but Fragile

The regex approach is the most commonly googled, and the most error-prone:

function stripHtml(html) {
  return html.replace(/<[^>]*>/g, '');
}

This fails in multiple real-world cases:

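Attribute values containing > end the match early: <div data-x="a>b">text</div> becomes b">text.
HTML entities remain encoded: &amp; is left as-is instead of becoming &.
Script and style content appears in the output.
A bare < in text is taken for the start of a tag, so in 1 < 2 and 3 > 2 everything between the two angle brackets is deleted.

Multi-line tags, on the other hand, are not a problem for this pattern. A negated class like [^>] already matches newlines; the dotAll (/s) flag only matters for variants written with a dot, such as /<.*?>/g.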
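The failures are easy to reproduce. This sketch runs the naive pattern over a few problem inputs and prints what it returns. The case list is illustrative, not exhaustive. It needs no DOM, so it runs in a browser console or in Node.js.

```javascript
// The naive pattern from Method 3
const stripHtml = (html) => html.replace(/<[^>]*>/g, '');

// Each pair: [input, what the naive regex actually returns]
const cases = [
  ['<div data-x="a>b">text</div>', 'b">text'],        // '>' in an attribute ends the match early
  ['<p>Fish &amp; chips</p>', 'Fish &amp; chips'],     // entities are left encoded
  ['<p>Hi</p><script>track()</script>', 'Hitrack()'],  // script body leaks into the text
  ['<p>1 < 2 and 3 > 2</p>', '1  2'],                  // bare '<' is mistaken for a tag
];

for (const [input, expected] of cases) {
  const output = stripHtml(input);
  console.log(JSON.stringify(input), '->', JSON.stringify(output),
    output === expected ? '(as listed)' : '(unexpected)');
}
```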

Use regex only for tightly controlled HTML you generate yourself (e.g., a template with known, simple markup). Never use it on user-supplied or scraped HTML.

Environment: Works in browser and Node.js. No DOM dependency.
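If you do apply regex to markup you generate yourself, a few extra passes deal with some of the failures above. The sketch below is hypothetical: stripTrustedHtml and its small entity table are not from any library. It drops comments and script/style bodies, strips the remaining tags, and then decodes numeric references plus a handful of named entities in a single pass. It still breaks on > inside attribute values and on stray < characters, so the same "trusted markup only" rule applies.

```javascript
// Hypothetical helper for trusted, simple markup only. Still regex-based: a '>'
// inside an attribute value or a stray '<' in text will still break it.
const NAMED_ENTITIES = new Map([
  ['amp', '&'], ['lt', '<'], ['gt', '>'], ['quot', '"'], ['apos', "'"], ['nbsp', '\u00A0'],
]);

function stripTrustedHtml(html) {
  return html
    // 1. Drop comments, then script/style elements together with their contents
    .replace(/<!--[\s\S]*?-->/g, '')
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, '')
    // 2. Remove every remaining tag
    .replace(/<[^>]*>/g, '')
    // 3. Decode entities in one pass, so '&amp;lt;' becomes '&lt;' rather than '<'
    .replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi, (match, ref) => {
      if (ref[0] === '#') {
        const isHex = ref[1] === 'x' || ref[1] === 'X';
        const code = parseInt(ref.slice(isHex ? 2 : 1), isHex ? 16 : 10);
        // Out-of-range code points would make fromCodePoint throw; keep them as-is
        return code <= 0x10ffff ? String.fromCodePoint(code) : match;
      }
      // Unknown names are left untouched (a Map avoids inherited keys like 'constructor')
      return NAMED_ENTITIES.get(ref) ?? match;
    });
}

console.log(stripTrustedHtml('<p>Fish &amp; chips &#169; Acme</p><script>track()</script>'));
```

Decoding in a single pass matters. Running separate replace calls for each entity would turn &amp;lt; into <, which double-decodes text that was meant to show a literal &lt;.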

Node.js: Stripping HTML Without a Browser

In a Node.js environment, DOMParser is not available natively. Options:

jsdom: const { JSDOM } = require('jsdom'); then read new JSDOM(html).window.document.body.textContent. Full DOM support with a spec-compliant parser; scripts are not executed unless you opt in with the runScripts option. Heavy dependency.
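linkedom: a lighter alternative to jsdom with faster startup. It covers the commonly used DOM APIs (including DOMParser) but not the full standard, so test anything beyond basic traversal and textContent.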
html-to-text (npm): a dedicated library with options for word wrapping, list bullets, link rendering, and table layout in plain text.
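sanitize-html: primarily a sanitiser. When configured with allowedTags: [] and allowedAttributes: {}, it drops every tag. The output is still HTML-escaped text (& comes out as &amp;), so decode entities afterwards if you need raw text.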

Frequently Asked Questions

Is DOMParser available in Web Workers?

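No. DOMParser, like the rest of the DOM, is exposed only on Window, not in Worker contexts. To strip HTML off the main thread, send the markup to a worker that bundles a pure-JavaScript parser (such as linkedom or html-to-text), or use the regex approach there with its usual caveats.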

Does textContent decode HTML entities?

Yes, when read from a parsed DOM node. The parser decodes entities during tree construction; textContent returns the decoded string. If you set textContent directly on a node, the string is treated as a literal (no parsing).

What is the fastest method for high-throughput Node.js processing?

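Regex on known-good HTML is fastest but fragile. For production use, linkedom is a reasonable middle ground: it parses markup into a DOM rather than pattern-matching, yet is much lighter than jsdom. Relative speed depends heavily on the markup, so benchmark with your own pages before committing. For very high throughput (millions of pages), consider a Rust-based HTML parser compiled to WASM, e.g., html5ever bindings.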
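Because relative speed depends on the engine, the input size and the markup itself, it is worth timing candidates on a sample of your own pages. The sketch below is a hypothetical harness; benchmark is not a library function. It relies only on performance.now(), which is a global in browsers and in recent Node.js versions. The numbers it reports are machine-specific and only meaningful relative to each other.

```javascript
// Hypothetical harness: times each strip function on the same inputs.
// Numbers are machine- and engine-specific; compare them relative to each other.
function benchmark(strippers, inputs, iterations = 100) {
  const results = {};
  for (const [name, strip] of Object.entries(strippers)) {
    // One untimed pass so the JIT can warm up before measuring
    for (const html of inputs) strip(html);

    let chars = 0; // accumulate output length so the calls are less likely to be optimised away
    const start = performance.now();
    for (let i = 0; i < iterations; i++) {
      for (const html of inputs) chars += strip(html).length;
    }
    const ms = performance.now() - start;
    results[name] = {
      totalMs: ms,
      perDocMicros: (ms * 1000) / (iterations * inputs.length),
      chars,
    };
  }
  return results;
}

// Example: replace the sample with real pages and add the parsers you are evaluating.
const sample = ['<p>Fish &amp; chips</p>'.repeat(1000)];
console.table(benchmark({ regex: (html) => html.replace(/<[^>]*>/g, '') }, sample));
```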