Remove Extra Spaces Between Words
Updated: May 2026
Extra spaces between words, meaning two, three, or more consecutive space characters where only one belongs, are one of the most common whitespace problems in text assembled from multiple sources. They are easy to miss on screen (web pages collapse them when rendering, for example), but they cause real problems in search, data processing, and text comparison.
Free · Instant · No upload · Preserves line breaks
Why extra spaces appear between words
The most widespread cause is the two-space-after-period convention that many people learned when touch-typing was taught on typewriters. Monospaced typewriter fonts needed the extra spacing to separate sentences visually. Proportional digital fonts don't — but the typing habit persists. A document typed this way will have a double space before every sentence-starting word.
PDF-to-text conversion is the second major cause. A PDF positions text by numeric coordinates and often does not store the gaps between words as space characters, so the extraction tool has to infer word boundaries from the distances between glyphs. When a gap is wider than usual, which is common in justified text or in documents exported with optical kerning, the extractor may emit two or more spaces instead of one. Long documents can carry thousands of these artifacts.
In programming, text.split(' ') on a string with double spaces produces empty strings in the resulting array: ['hello', '', 'world'] instead of ['hello', 'world']. Cleaning spaces before tokenizing eliminates an entire class of edge-case bugs.
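The pitfall is easy to reproduce. Python is used here for illustration, and `str.split(" ")` behaves the same way as the call above. This short sketch also shows that collapsing the spaces first removes the empty token:

```python
import re

text = "hello  world"  # two spaces between the words

# Splitting on one literal space leaves an empty string where the extra space was.
print(text.split(" "))          # ['hello', '', 'world']

# Collapsing runs of two or more spaces first removes the empty token.
cleaned = re.sub(r" {2,}", " ", text)
print(cleaned.split(" "))       # ['hello', 'world']

# Python's no-argument split() also treats any whitespace run as one separator.
print(text.split())             # ['hello', 'world']
```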
How the collapse works technically
The space-collapse operation uses a regular expression that matches any run of two or more consecutive horizontal whitespace characters and replaces the entire run with a single space. The key design detail is that "horizontal whitespace" excludes newline characters (LF and CR). The regex only operates within each line, so paragraph breaks and blank lines are completely unaffected.
This means you can safely run the collapse on multi-paragraph text without worrying about the structure being flattened. Every line's internal spacing is normalized; the line boundaries themselves are preserved exactly.
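Here is a minimal Python sketch of that pattern. It assumes that "horizontal whitespace" means the tab character plus the Unicode space separators (category Zs, which includes U+00A0). The tool's exact character set is not stated here, and `collapse_spaces` is a hypothetical name.

```python
import re

# Assumption: horizontal whitespace = tab + Unicode space separators (Zs).
# CR and LF are deliberately absent, so a match can never span a line break.
HORIZONTAL_WS_RUN = re.compile(
    r"[ \t\u00a0\u1680\u2000-\u200a\u202f\u205f\u3000]{2,}"
)

def collapse_spaces(text: str) -> str:
    """Replace every run of 2+ horizontal whitespace characters with one space."""
    return HORIZONTAL_WS_RUN.sub(" ", text)

sample = "First  line   here.\n\nSecond\u00a0\u00a0paragraph,\t\tafter a blank line."
print(collapse_spaces(sample))
# First line here.
#
# Second paragraph, after a blank line.
```

A lone non-breaking space or tab is left as is, because only runs of two or more characters are replaced. Changing `{2,}` to `+` would also turn single tabs and NBSPs into ordinary spaces.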
- Handles 2, 3, 4 or any number of consecutive spaces — one pass fixes all
- Processes Unicode space variants including non-breaking spaces (U+00A0)
- Works on any text: prose, HTML, Markdown, CSV, code, configuration files
- Newlines are never affected by the space collapse operation
Use cases where this operation is essential
- CMS and blog publishing: rich-text editors in WordPress, Ghost, and similar CMSs often preserve the extra space by inserting a non-breaking space (&nbsp;), so word spacing looks uneven and can vary with the theme's fonts and CSS.
- AI writing assistants: pasting text with double spaces into ChatGPT, Claude, or other LLM interfaces works, but the extra spaces change how the surrounding text is split into tokens. This adds tokens and may subtly affect the output.
- Full-text search indexes: Elasticsearch, Meilisearch, and similar engines split text into tokens during analysis. Standard analyzers usually treat any whitespace run as one boundary, but exact-match fields (such as Elasticsearch keyword fields) store "hello  world" and "hello world" as different values, and custom tokenizers can behave differently depending on configuration. Normalizing input avoids the ambiguity.
- PDF content extraction: any pipeline that extracts text from PDFs for NLP should include a space-collapse step as standard preprocessing.
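For exact-match lookups, deduplication, or text comparison, normalizing both sides before comparing removes that ambiguity. This is a minimal sketch. `normalize` is a hypothetical helper, not part of any search engine's API, and the character set is the same assumed horizontal-whitespace set as above.

```python
import re

# Hypothetical helper: collapse runs of 2+ spaces/tabs/Unicode space separators.
_RUN = re.compile(r"[ \t\u00a0\u1680\u2000-\u200a\u202f\u205f\u3000]{2,}")

def normalize(text: str) -> str:
    return _RUN.sub(" ", text)

# Two strings that look identical on screen but differ character-for-character,
# e.g. a stored exact-match value and a user's query.
stored = "hello  world"
query = "hello world"

print(stored == query)                        # False
print(normalize(stored) == normalize(query))  # True
```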
Frequently asked questions
Does collapsing spaces affect punctuation spacing?
No. The tool collapses horizontal space characters. Punctuation marks (periods, commas, colons) are not whitespace and are not changed. The space before or after a punctuation mark is treated the same as any other space.
Can I collapse spaces without removing trailing spaces?
Yes. Enable only "Collapse multiple spaces" and leave "Trim each line" unchecked. Each option is independent.
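The independence of the two options can be modeled as two separate steps. In the sketch below, `clean`, `collapse`, and `trim_lines` are hypothetical names standing in for "Collapse multiple spaces" and "Trim each line". The sketch assumes LF line endings. In this model, collapsing alone does not delete trailing spaces; it reduces a trailing run to a single space.

```python
import re

# Assumption: horizontal whitespace = tab + Unicode space separators (Zs).
_H = r"[ \t\u00a0\u1680\u2000-\u200a\u202f\u205f\u3000]"
_RUN = re.compile(_H + "{2,}")                      # runs of 2+ inside a line
_EDGES = re.compile("^" + _H + "+|" + _H + "+$")    # leading/trailing run of one line

def clean(text: str, collapse: bool = True, trim_lines: bool = False) -> str:
    """Hypothetical model of two independent options (assumes LF line endings)."""
    if collapse:
        text = _RUN.sub(" ", text)
    if trim_lines:
        # Trim each line separately so the line breaks themselves are kept.
        text = "\n".join(_EDGES.sub("", line) for line in text.split("\n"))
    return text

line = "  indented   text  "
print(repr(clean(line)))                   # ' indented text '
print(repr(clean(line, trim_lines=True)))  # 'indented text'
```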
Will this fix spaces in URLs or code?
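For code, yes, between tokens. A valid URL should not contain literal spaces at all (they are encoded as %20), so collapsing only shrinks a gap inside a broken URL to one space; remove it by hand. Be careful with string literals, indentation, and other intentional multi-space sequences in code, because collapsing changes them. Use the tool with judgment on source code files.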
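The sketch below shows why source files need care. It applies the same assumed collapse pattern to a small snippet of Python source held in a string. Indentation shrinks to one space, and a two-space string literal silently becomes one space, which changes what the program does.

```python
import re

# Same assumed pattern as a plain collapse step: runs of 2+ spaces/tabs/Zs.
RUN = re.compile(r"[ \t\u00a0\u1680\u2000-\u200a\u202f\u205f\u3000]{2,}")

code = (
    'SEP = "  "  # two-space separator\n'
    "def pad(s):\n"
    "    return s + SEP\n"
)

result = RUN.sub(" ", code)
print(result)
# SEP = " " # two-space separator
# def pad(s):
#  return s + SEP
```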