Text processing utilities

coaster.utils.text.deobfuscate_email(text)

Deobfuscate email addresses in the provided text.
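
The idea can be illustrated with a minimal standard-library sketch. It is not the library's implementation: the function name `deobfuscate_sketch` and both patterns are assumptions made for illustration, and the real function may recognise different spellings.

```python
import re

# Hypothetical patterns chosen for illustration only: a bracketed or
# parenthesised "at" or "dot", with optional surrounding whitespace.
_AT = re.compile(r"\s*[\[\(]\s*at\s*[\]\)]\s*", re.IGNORECASE)
_DOT = re.compile(r"\s*[\[\(]\s*dot\s*[\]\)]\s*", re.IGNORECASE)


def deobfuscate_sketch(text: str) -> str:
    """Rewrite bracketed 'at' and 'dot' markers as '@' and '.'."""
    text = _AT.sub("@", text)
    return _DOT.sub(".", text)


print(deobfuscate_sketch("Write to test [at] example (dot) com today"))
```

Because the patterns swallow the whitespace around each marker, this sketch can also join unrelated words that happen to sit next to a literal "(dot)"; a production version would need stricter context checks.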

coaster.utils.text.normalize_spaces(text)

Replace whitespace characters with regular spaces.
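
A rough equivalent can be sketched with the standard `re` module. This is an approximation under the assumption that "whitespace" means whatever Python's `\s` matches for `str` patterns; the library may use a different character set, and `normalize_spaces_sketch` is a hypothetical name.

```python
import re

# For str patterns, \s matches Unicode whitespace, including the
# no-break space and the various typographic spaces.
_WHITESPACE = re.compile(r"\s")


def normalize_spaces_sketch(text: str) -> str:
    """Replace every whitespace character with a single regular space."""
    return _WHITESPACE.sub(" ", text)


print(repr(normalize_spaces_sketch("a\u00a0b\tc\nd")))
```

Each character is replaced one for one, so runs of whitespace are not collapsed in this sketch.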

coaster.utils.text.normalize_spaces_multiline(text)

Replace whitespace characters with regular spaces, but leave characters that are relevant to multiline text, such as tabs and newlines, untouched.
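
One way to express "all whitespace except tabs and newlines" is a negated character class. The sketch below assumes that tab, line feed and carriage return are the characters to preserve; the library's exact list may differ, and the function name is hypothetical.

```python
import re

# [^\S\t\n\r] reads as: not a non-whitespace character, and not a tab,
# line feed or carriage return. In other words, any other whitespace.
_INLINE_WHITESPACE = re.compile(r"[^\S\t\n\r]")


def normalize_spaces_multiline_sketch(text: str) -> str:
    """Replace whitespace with spaces, keeping tabs and line breaks."""
    return _INLINE_WHITESPACE.sub(" ", text)


print(repr(normalize_spaces_multiline_sketch("a\u00a0b\n\tc\u2003d")))
```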

coaster.utils.text.sanitize_html(value, valid_tags=None, strip=True, linkify=False)

Strips unwanted markup out of HTML.
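
To show what tag filtering looks like in principle, here is a small sketch built on the standard `html.parser` module. It is not the library's implementation and is not a substitute for a vetted sanitizer: the class `_TagFilter`, the function `sanitize_sketch` and the default allow-list are all assumptions, and attributes are simply discarded.

```python
from html import escape
from html.parser import HTMLParser


class _TagFilter(HTMLParser):
    """Re-emit only allowed tags; keep the text of everything else."""

    def __init__(self, allowed):
        super().__init__(convert_charrefs=True)
        self.allowed = allowed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.allowed:
            self.parts.append(f"<{tag}>")  # all attributes are dropped

    def handle_endtag(self, tag):
        if tag in self.allowed:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives unescaped, so escape it again on the way out.
        self.parts.append(escape(data, quote=False))


def sanitize_sketch(value, allowed=frozenset({"p", "b", "i", "em", "strong"})):
    parser = _TagFilter(allowed)
    parser.feed(value)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


print(sanitize_sketch('<p>Hi <span onclick="x()">there</span> <b>you</b></p>'))
```

Note that the text inside a disallowed element, including a `script` element, is kept as escaped text in this sketch rather than removed.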

coaster.utils.text.simplify_text(text)

Simplify text to allow comparison.

>>> simplify_text("Awesome Coder wanted at Awesome Company")
'awesome coder wanted at awesome company'
>>> simplify_text("Awesome Coder, wanted  at Awesome Company! ")
'awesome coder wanted at awesome company'
>>> simplify_text(u"Awesome Coder, wanted  at Awesome Company! ") == (
...   'awesome coder wanted at awesome company')
True

coaster.utils.text.text_blocks(html_text, skip_pre=True)

Extracts a list of paragraphs from a given HTML string.
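
The following sketch shows one plausible way to split HTML into text blocks using only the standard library. It assumes that `skip_pre` means "ignore text inside `pre` elements" and that a fixed set of block-level tags delimits paragraphs; both are guesses based on the signature, and `BLOCK_TAGS`, `_BlockCollector` and `text_blocks_sketch` are hypothetical names.

```python
from html.parser import HTMLParser

# Assumed set of tags that start or end a block of text.
BLOCK_TAGS = {"p", "div", "li", "blockquote", "pre",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class _BlockCollector(HTMLParser):
    def __init__(self, skip_pre):
        super().__init__(convert_charrefs=True)
        self.skip_pre = skip_pre
        self.pre_depth = 0
        self.buffer = []
        self.blocks = []

    def flush(self):
        # Collapse internal whitespace and store non-empty blocks.
        text = " ".join("".join(self.buffer).split())
        if text:
            self.blocks.append(text)
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()
        if tag == "pre":
            self.pre_depth += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()
        if tag == "pre" and self.pre_depth:
            self.pre_depth -= 1

    def handle_data(self, data):
        if not (self.skip_pre and self.pre_depth):
            self.buffer.append(data)


def text_blocks_sketch(html_text, skip_pre=True):
    collector = _BlockCollector(skip_pre)
    collector.feed(html_text)
    collector.close()
    collector.flush()  # pick up text after the last block tag
    return collector.blocks


print(text_blocks_sketch("<p>One <b>bold</b> line</p><pre>code</pre><p>Two</p>"))
```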

coaster.utils.text.ulstrip(text)

Strip Unicode extended whitespace from the left side of a string.

coaster.utils.text.urstrip(text)

Strip Unicode extended whitespace from the right side of a string.

coaster.utils.text.ustrip(text)

Strip Unicode extended whitespace from both sides of a string.
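
Python's built-in `str.strip()` already removes Unicode whitespace, but it leaves zero-width characters such as U+200B in place. The sketch below shows how the three functions could behave with a wider character set. `EXTENDED_WS` is a deliberately small, hypothetical set chosen for illustration, and the library's own definition of extended whitespace may include more characters.

```python
# Hypothetical character set for illustration only.
EXTENDED_WS = (
    " \t\n\r\x0b\x0c"                  # ASCII whitespace
    "\u00a0\u2002\u2003\u2009\u3000"   # no-break, en, em, thin, ideographic
    "\u200b\u2060\ufeff"               # zero-width space, word joiner, BOM
)


def ulstrip_sketch(text: str) -> str:
    """Strip the assumed extended whitespace from the left side."""
    return text.lstrip(EXTENDED_WS)


def urstrip_sketch(text: str) -> str:
    """Strip the assumed extended whitespace from the right side."""
    return text.rstrip(EXTENDED_WS)


def ustrip_sketch(text: str) -> str:
    """Strip the assumed extended whitespace from both sides."""
    return text.strip(EXTENDED_WS)


sample = "\u200b\u00a0 hi\u3000\ufeff"
print(repr(sample.strip()))         # zero-width characters survive
print(repr(ustrip_sketch(sample)))  # only the visible text remains
```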

coaster.utils.text.word_count(text, html=True)

Return the number of words in the given text. If html is True (the default), tags are stripped before counting. Punctuation and bad formatting like.this are handled when counting words, but the function assumes the conventions of Latin-script languages and may not be reliable for other languages.
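
A minimal sketch of this kind of counting, using only the standard library, might look like the following. It is an approximation rather than the library's algorithm: tags are removed with a crude regular expression, a word is assumed to be a run of word characters with optional internal apostrophes, and `word_count_sketch` is a hypothetical name.

```python
import re
from html import unescape

_TAG = re.compile(r"<[^>]+>")             # crude tag matcher, sketch only
_WORD = re.compile(r"\w+(?:['’]\w+)*")    # keeps "don't" as one word


def word_count_sketch(text: str, html: bool = True) -> int:
    """Count words, optionally stripping HTML tags first."""
    if html:
        # Replace tags with spaces so adjacent blocks do not merge,
        # then decode entities such as &amp;.
        text = unescape(_TAG.sub(" ", text))
    return len(_WORD.findall(text))


print(word_count_sketch("<p>Hello, world like.this</p>"))
```

Treating the full stop in like.this as a separator is what makes the missing space harmless here. The same rule would split scripts without spaces between words poorly, which matches the caveat about non-Latin languages.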