
Microsoft Word · CMS · Content Editors

Clean HTML Pasted From Word

Updated: May 2026

Microsoft Word has generated notoriously messy HTML for decades. When you copy content from Word and paste it into a CMS, email editor, or web form that accepts HTML, the markup that comes along with it is bloated, non-standard, and full of proprietary Microsoft namespaces. The fastest remedy is to strip it down to plain text, paste that, and re-apply formatting in the target editor.


What Word HTML Actually Looks Like

When you copy a sentence from Word and paste it into an HTML editor, or use a CMS's "Paste as HTML" option, the clipboard contains markup like this:

<p class="MsoNormal"><span lang="EN-GB" style='font-size:12.0pt;
font-family:"Times New Roman",serif;mso-fareast-font-family:
"Times New Roman";mso-ansi-language:EN-GB;mso-fareast-language:
EN-US;mso-bidi-language:AR-SA'>Hello, world.<o:p></o:p></span></p>

That is one sentence, "Hello, world.", wrapped in roughly 240 characters of proprietary markup. A full Word document pasted as HTML can easily contain ten times as many bytes of markup as actual content, including:

  • Microsoft Office XML namespace tags: <o:p>, <w:WordDocument>, <m:math>
  • Inline style attributes encoding every font, size, spacing, and colour setting from the Word stylesheet
  • Conditional comments (such as <!--[if gte mso 9]> ... <![endif]-->) aimed at Office applications and legacy Internet Explorer
  • Embedded base64-encoded images
  • MsoNormal, MsoListParagraph class names that reference Word's internal stylesheet
  • Lang attributes on every span element
  • XML processing instructions (<?xml ...?>)
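
To get a rough sense of that overhead for a given fragment, one can compare the length of the visible text with the length of the whole markup. The sketch below uses Python's standard html.parser module; the names TextCollector and markup_overhead are illustrative and not part of any tool mentioned here. The sample repeats the paragraph shown earlier, with the style attribute in single quotes so that it parses as a single attribute.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def markup_overhead(source):
    """Return (visible_chars, total_chars) for an HTML fragment.

    Rough measure only: text inside <style> or <script> blocks would be
    counted as visible text by this simple collector.
    """
    collector = TextCollector()
    collector.feed(source)
    collector.close()
    text = "".join(collector.parts).strip()
    return len(text), len(source)


# The sample paragraph from the article, joined into one string.
sample = (
    '<p class="MsoNormal"><span lang="EN-GB" style=\'font-size:12.0pt;'
    'font-family:"Times New Roman",serif;mso-fareast-font-family:'
    '"Times New Roman";mso-ansi-language:EN-GB;mso-fareast-language:'
    'EN-US;mso-bidi-language:AR-SA\'>Hello, world.<o:p></o:p></span></p>'
)

visible, total = markup_overhead(sample)
print(f"{visible} visible characters out of {total} total")
```

Running the same function over a whole exported .htm file gives a quick indication of how much of the file is actual content.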

Why This Causes Problems in CMS and Email Editors

Many WYSIWYG editors — WordPress's TinyMCE, Drupal's CKEditor, Squarespace, Webflow — try to import the Word HTML and render it. The result is unpredictable:

  • Font size overrides: inline font-size styles in Word HTML override the CMS theme's stylesheet, making the pasted text larger, smaller, or a different font family than the rest of the page.
  • Broken layouts: Word table markup conflicts with the CMS's responsive grid, causing horizontal overflow on mobile.
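  • Invisible characters: non-breaking spaces (&nbsp;) copied from Word look like ordinary spaces but are a different character, so they can prevent line wrapping in the layout and cause missed matches in search and text-processing tools.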
  • Search index pollution: if the CMS stores the raw HTML in its full-text search index, the index contains Microsoft namespace fragments instead of real keywords.
  • Email rendering failures: Word HTML pasted into an email template causes rendering chaos in Outlook (ironically), Apple Mail, and Gmail, each of which interprets the proprietary styles differently.
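
The non-breaking space problem in particular is easy to demonstrate and to fix. The following is a minimal sketch using only Python's standard library; normalise_spaces is a hypothetical helper name chosen for illustration.

```python
import html

NBSP = "\u00a0"  # the character that &nbsp; decodes to


def normalise_spaces(text):
    """Decode HTML entities, then replace non-breaking spaces with ordinary spaces."""
    decoded = html.unescape(text)
    return decoded.replace(NBSP, " ")


pasted = "Quarterly&nbsp;report for&nbsp;2024"
cleaned = normalise_spaces(pasted)

# A plain-space search misses the phrase until the text is normalised.
print("Quarterly report" in html.unescape(pasted))  # False
print("Quarterly report" in cleaned)                # True
```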

The Cleanest Workflow: Strip to Plain Text, Then Re-Format

The most reliable way to move content from Word to a CMS is a two-step process:

  • Step 1: Strip to plain text. Paste the Word content into the Flowfiles HTML tag stripper, or use "Paste as plain text" if your browser or CMS provides it (usually Ctrl+Shift+V, or Cmd+Shift+V on a Mac). All Word markup is discarded; you are left with the raw words.
  • Step 2 — Re-apply formatting in the CMS editor. Use the CMS's own toolbar to apply headings, bold, italics, and lists. This ensures the formatting uses the CMS's CSS classes, not Word's inline styles.

This approach takes slightly longer than a single paste, but the result is clean, maintainable HTML that respects your site's design system.

Using the Flowfiles Stripper on Word HTML

If you already have the HTML source of a Word-generated page (e.g., a .htm file exported from Word via File → Save As → Web Page), you can paste it directly into the Flowfiles tool:

  • Open the .htm file in a text editor and copy all content, or drag the file onto the input area.
  • Enable "Preserve line breaks" so paragraphs stay separated.
  • Enable "List items as bullets" to recover Word's bulleted and numbered lists.
  • Click "Strip tags" — all Microsoft namespaces, inline styles, and proprietary tags are removed.
  • The plain-text output contains only your content, ready to paste into any editor.
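
For readers who want to script the same kind of clean-up, the steps above can be approximated with Python's standard html.parser. This is only a sketch of the general technique, not the Flowfiles implementation: the function strip_word_html and its two options are hypothetical names that mirror the settings described above, and the set of block-level tags is an assumption.

```python
import re
from html.parser import HTMLParser

# Tags that start a new line in the output (an assumed, non-exhaustive set).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
# Elements whose text content is never part of the document body.
SKIPPED_TAGS = {"style", "script", "title"}


class WordStripper(HTMLParser):
    def __init__(self, list_items_as_bullets=True):
        super().__init__(convert_charrefs=True)
        self.list_items_as_bullets = list_items_as_bullets
        self.parts = []
        self.skip_depth = 0
        self.pending_bullet = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")
            if tag == "li" and self.list_items_as_bullets:
                self.pending_bullet = True

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")
            if tag == "li":
                self.pending_bullet = False

    def handle_data(self, data):
        if self.skip_depth:
            return
        # Collapse source line wraps and whitespace runs (including
        # non-breaking spaces) to a single ordinary space.
        text = re.sub(r"\s+", " ", data)
        if self.pending_bullet and text.strip():
            text = "• " + text.lstrip()
            self.pending_bullet = False
        self.parts.append(text)


def strip_word_html(source, preserve_line_breaks=True, list_items_as_bullets=True):
    """Return the plain text of Word-generated HTML."""
    parser = WordStripper(list_items_as_bullets)
    parser.feed(source)
    parser.close()
    lines = [line.strip() for line in "".join(parser.parts).split("\n")]
    lines = [line for line in lines if line]
    return ("\n" if preserve_line_breaks else " ").join(lines)


sample = """<html><head><style>p.MsoNormal {margin:0cm;}</style></head><body>
<p class="MsoNormal"><span lang="EN-GB">Hello,
world.<o:p></o:p></span></p>
<ul><li class="MsoNormal">First item</li><li class="MsoNormal">Second item</li></ul>
</body></html>"""

print(strip_word_html(sample))
```

Unknown tags such as <o:p> need no special handling here, because the parser only keeps text nodes and ignores every tag it is not told about.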

Alternative: Paste via Notepad or TextEdit

A classic workaround that requires no tool: paste your Word content into Notepad (Windows) or TextEdit in plain-text mode (Mac), copy it again from there, then paste into your CMS. The plain-text editor discards all formatting and HTML, leaving pure text. Paragraph breaks survive as line breaks, but headings, bold, italics, and list formatting are flattened to plain lines and must be re-applied in the CMS editor. The Flowfiles stripper does the same job for HTML source, and with "Preserve line breaks" and "List items as bullets" enabled it keeps paragraphs and list items visibly separated.

Frequently Asked Questions

Can the tool handle a full .htm file exported from Word?

Yes. Drag and drop the .htm file exported from Word directly onto the input area. The file is read locally and the full Word HTML is stripped to clean plain text.

Why does pasting from Word into Gmail look fine but break in other clients?

Gmail applies aggressive sanitisation to pasted content, stripping most Word HTML before displaying it. Other clients like Outlook or Apple Mail apply less sanitisation and display the raw Word styles, causing visual inconsistencies.

Will bullet points from Word survive the stripping?

If the "List items as bullets" option is enabled, <li> elements are converted to • item lines. Word's MsoListParagraph paragraphs that do not use real <li> tags may appear as regular paragraphs — review the output and adjust manually if needed.
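
If you need to recover such list paragraphs in a script, one possible approach is to treat any <p> whose class starts with MsoListParagraph as a list item. The sketch below assumes that Word wraps its own bullet glyph in a span styled mso-list:Ignore, as is commonly seen in Word output, and skips that span; the sample markup is heavily simplified, and the names ListParagraphFinder and list_paragraphs_to_bullets are hypothetical.

```python
from html.parser import HTMLParser


class ListParagraphFinder(HTMLParser):
    """Turns <p class="MsoListParagraph..."> paragraphs into bullet lines."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []
        self.current = None    # text parts of the paragraph being read
        self.is_list = False
        self.ignore_depth = 0  # > 0 while inside a span marked mso-list:Ignore

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "p":
            self.current = []
            self.is_list = (attr.get("class") or "").startswith("MsoListParagraph")
        elif tag == "span":
            style = (attr.get("style") or "").replace(" ", "").lower()
            # Nested spans inside an ignored span are ignored as well.
            if self.ignore_depth or "mso-list:ignore" in style:
                self.ignore_depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.ignore_depth:
            self.ignore_depth -= 1
        elif tag == "p" and self.current is not None:
            text = " ".join("".join(self.current).split())
            if text:
                self.lines.append(("• " if self.is_list else "") + text)
            self.current = None

    def handle_data(self, data):
        if self.current is not None and not self.ignore_depth:
            self.current.append(data)


def list_paragraphs_to_bullets(source):
    finder = ListParagraphFinder()
    finder.feed(source)
    finder.close()
    return finder.lines


# Simplified stand-in for Word's list markup (real output has more wrappers).
sample = (
    '<p class="MsoNormal">Shopping list:</p>'
    '<p class="MsoListParagraphCxSpFirst" style="mso-list:l0 level1 lfo1">'
    '<span style="mso-list:Ignore">·<span>&nbsp;&nbsp;</span></span>Apples</p>'
    '<p class="MsoListParagraphCxSpLast" style="mso-list:l0 level1 lfo1">'
    '<span style="mso-list:Ignore">·<span>&nbsp;&nbsp;</span></span>Pears</p>'
)

for line in list_paragraphs_to_bullets(sample):
    print(line)
```

This sketch does not distinguish numbered from bulleted lists or track nesting levels, so the output still deserves a manual review.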