Text processing utilities

coaster.utils.text.sanitize_html(value, valid_tags={'em': [], 'pre': [], 'code': [], 'h3': [], 'h6': [], 'h4': [], 'h5': [], 'mark': [], 'strong': [], 'sub': [], 'img': ['src', 'width', 'height', 'align', 'alt'], 'ul': [], 'li': ['start'], 'sup': [], 'cite': [], 'dl': [], 'blockquote': [], 'hr': [], 'dd': [], 'ol': [], 'abbr': ['title'], 'br': [], 'dt': [], 'ins': [], 'a': ['href', 'title', 'target', 'rel'], 'b': [], 'i': [], 'p': [], 'del': []}, strip=True)

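Strips unwanted markup out of HTML. valid_tags maps each permitted tag name to the list of attributes allowed on that tag; tags and attributes that are not listed are treated as unwanted.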
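
The whitelist idea behind valid_tags can be illustrated with a small standard-library sketch. This is not coaster's implementation: the names ALLOWED, WhitelistSanitizer and sanitize_sketch are hypothetical, the whitelist is deliberately reduced, and the sketch does not validate URL schemes, so it should not be used on untrusted input.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical, reduced whitelist for illustration: tag name -> allowed attributes
ALLOWED = {'em': [], 'strong': [], 'p': [], 'a': ['href', 'title']}


class WhitelistSanitizer(HTMLParser):
    """Illustrative only: keep whitelisted tags and attributes, drop the rest."""

    def __init__(self, valid_tags):
        super().__init__(convert_charrefs=True)
        self.valid_tags = valid_tags
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.valid_tags:
            return  # Drop the tag itself; any text inside it is still emitted
        kept = ''.join(
            ' {}="{}"'.format(name, escape(value or '', quote=True))
            for name, value in attrs
            if name in self.valid_tags[tag]
        )
        self.parts.append('<{}{}>'.format(tag, kept))

    def handle_endtag(self, tag):
        if tag in self.valid_tags:
            self.parts.append('</{}>'.format(tag))

    def handle_data(self, data):
        # Re-escape text so it cannot be re-interpreted as markup
        self.parts.append(escape(data, quote=False))


def sanitize_sketch(value, valid_tags=ALLOWED):
    """Return value with non-whitelisted tags and attributes removed."""
    parser = WhitelistSanitizer(valid_tags)
    parser.feed(value)
    parser.close()
    return ''.join(parser.parts)


cleaned = sanitize_sketch(
    '<p onclick="x()">Hi <span>there</span> <a href="/x" onclick="y()">link</a></p>'
)
print(cleaned)
```

In this sketch the span tag and both onclick attributes are dropped, while the text inside the span is kept.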

coaster.utils.text.word_count(text, html=True)

Return the count of words in the given text. If the text is HTML (the default, html=True), tags are stripped before counting. Punctuation and bad formatting like.this are handled when counting words, but the function assumes the conventions of Latin-script languages and may not be reliable for other languages.
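
The behaviour described above can be approximated with the standard library alone. The following is a rough sketch, not coaster's code: word_count_sketch and _TextExtractor are hypothetical names, and the rule that a word is a run of letters or digits (optionally joined by apostrophes) is an assumption made for the example.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def word_count_sketch(text, html=True):
    """Count words, stripping tags first when html is true."""
    if html:
        extractor = _TextExtractor()
        extractor.feed(text)
        extractor.close()
        # Joining with a space keeps adjacent blocks apart, at the cost of
        # splitting a word that has inline markup in the middle of it
        text = ' '.join(extractor.chunks)
    # Assumed word rule: letters/digits, with internal apostrophes allowed,
    # so "like.this" counts as two words and "don't" as one
    return len(re.findall(r"[^\W_]+(?:'[^\W_]+)*", text))


print(word_count_sketch('<p>Hello, <b>world</b>! This is bad formatting like.this</p>'))
```

Because the pattern only knows about word characters and apostrophes, it shares the limitation noted above for scripts that do not separate words with spaces or punctuation.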

coaster.utils.text.deobfuscate_email(text)

Deobfuscate email addresses in the provided text.
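
The patterns this function recognises are not listed here, so the sketch below only illustrates the general idea under an assumption: that obfuscation takes the bracketed forms "[at]" and "(dot)". The name deobfuscate_sketch and both regular expressions are hypothetical, not coaster's.

```python
import re

# Assumed obfuscation patterns, for illustration only: the words "at" and
# "dot" wrapped in (), [] or {}, with optional surrounding whitespace
_AT = re.compile(r'\s*[\[\(\{]\s*at\s*[\]\)\}]\s*', re.IGNORECASE)
_DOT = re.compile(r'\s*[\[\(\{]\s*dot\s*[\]\)\}]\s*', re.IGNORECASE)


def deobfuscate_sketch(text):
    """Replace bracketed "at" and "dot" markers with "@" and "."."""
    text = _AT.sub('@', text)
    text = _DOT.sub('.', text)
    return text


print(deobfuscate_sketch('Write to jane [at] example (dot) com today'))
```

Restricting the match to bracketed markers avoids rewriting ordinary prose such as "meet at noon", which a bare " at " pattern would risk.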

coaster.utils.text.simplify_text(text)

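Simplify text to allow comparison. As the examples show, the result is lowercased, punctuation is removed, and runs of whitespace are collapsed to single spaces.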

>>> simplify_text("Awesome Coder wanted at Awesome Company")
'awesome coder wanted at awesome company'
>>> simplify_text("Awesome Coder, wanted  at Awesome Company! ")
'awesome coder wanted at awesome company'
>>> simplify_text(u"Awesome Coder, wanted  at Awesome Company! ") == 'awesome coder wanted at awesome company'
True
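
A minimal stand-alone sketch that reproduces the outputs of the examples above is given below. It is an approximation, not coaster's implementation: simplify_sketch is a hypothetical name, and treating only ASCII punctuation as removable is an assumption.

```python
import string

# Assumption: only ASCII punctuation is replaced; each mark becomes a space
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, ' ' * len(string.punctuation))


def simplify_sketch(text):
    """Lowercase text, drop punctuation and collapse whitespace."""
    text = text.translate(_PUNCT_TO_SPACE)
    # split() with no arguments discards leading, trailing and repeated whitespace
    return ' '.join(text.lower().split())


print(simplify_sketch("Awesome Coder, wanted  at Awesome Company! "))
```

Replacing punctuation with a space rather than deleting it keeps words such as "Coder,wanted" from being fused together.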