Text processing utilities
-
coaster.utils.text.normalize_spaces(text)
Replace whitespace characters with regular spaces.
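To illustrate the idea, here is a minimal standard-library sketch. It is not coaster's implementation: the name `normalize_spaces_sketch` is hypothetical, and collapsing runs of whitespace and trimming the ends are assumptions made for this example.

```python
import re

# For str patterns, \s matches Unicode whitespace (no-break space,
# em space, ideographic space, ...), not only ASCII whitespace.
_WHITESPACE_RE = re.compile(r"\s+")


def normalize_spaces_sketch(text):
    """Hypothetical stand-in: turn each whitespace run into one ASCII space.

    Assumption: runs are collapsed and the result is trimmed. The real
    function may treat runs and string ends differently.
    """
    return _WHITESPACE_RE.sub(" ", text).strip()


print(normalize_spaces_sketch("a\u00a0b\t\tc\n d "))
```

Tabs and newlines are also replaced here, which is the behaviour the multiline variant is meant to avoid.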
-
coaster.utils.text.normalize_spaces_multiline(text)
Replace whitespace characters with regular spaces, while leaving characters that matter in multiline text, such as tabs and newlines, untouched.
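One way to approximate this with the standard library is a character class that matches whitespace except the characters to preserve. This is a sketch only. The name `normalize_spaces_multiline_sketch` is hypothetical, and the preserved set (tab, newline, carriage return) and the collapsing of runs are assumptions, not documented behaviour.

```python
import re

# [^\S\t\n\r] reads as "not non-whitespace, and not tab/newline/CR",
# that is, any whitespace character other than those three.
_INLINE_WHITESPACE_RE = re.compile(r"[^\S\t\n\r]+")


def normalize_spaces_multiline_sketch(text):
    """Hypothetical stand-in: normalise spaces but keep line structure.

    Assumption: tab, newline and carriage return are the characters that
    count as relevant to multiline text.
    """
    return _INLINE_WHITESPACE_RE.sub(" ", text)


print(repr(normalize_spaces_multiline_sketch("a\u00a0\u2003b\n\tc  d")))
```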
-
coaster.utils.text.sanitize_html(value, valid_tags=None, strip=True, linkify=False)
Strips unwanted markup out of HTML.
-
coaster.utils.text.simplify_text(text)
Simplify text to allow comparison.
>>> simplify_text("Awesome Coder wanted at Awesome Company")
'awesome coder wanted at awesome company'
>>> simplify_text("Awesome Coder, wanted at Awesome Company! ")
'awesome coder wanted at awesome company'
>>> simplify_text(u"Awesome Coder, wanted at Awesome Company! ") == (
...     'awesome coder wanted at awesome company')
True
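The doctest outputs are consistent with lowercasing, dropping punctuation and collapsing whitespace. The sketch below reproduces those outputs with the standard library. It is an assumption about the approach, not the library's code, and `simplify_text_sketch` is a hypothetical name.

```python
import re

# Anything that is neither a word character nor whitespace is treated as
# punctuation. Note that \w includes the underscore, so "_" is kept.
_PUNCTUATION_RE = re.compile(r"[^\w\s]")


def simplify_text_sketch(text):
    """Hypothetical stand-in: lowercase, drop punctuation, collapse spaces.

    Assumption: punctuation is replaced with a space rather than deleted,
    so "a,b" becomes "a b" instead of "ab".
    """
    without_punctuation = _PUNCTUATION_RE.sub(" ", text.lower())
    # split() with no arguments discards leading, trailing and repeated
    # whitespace, so joining on a single space collapses every run.
    return " ".join(without_punctuation.split())


print(simplify_text_sketch("Awesome Coder, wanted at Awesome Company! "))
```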
-
coaster.utils.text.text_blocks(html_text, skip_pre=True)
Extracts a list of paragraphs from a given HTML string.
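A rough picture of paragraph extraction, using only `html.parser` from the standard library. Everything here is an assumption made for illustration: the name `text_blocks_sketch`, the `_BLOCK_TAGS` set, the reading of `skip_pre` as "ignore text inside `<pre>`", and the whitespace collapsing. The real function may split blocks differently.

```python
from html.parser import HTMLParser

# Assumption: tags whose start or end closes the current block of text.
_BLOCK_TAGS = {
    "p", "div", "li", "blockquote", "pre",
    "h1", "h2", "h3", "h4", "h5", "h6",
}


class _BlockCollector(HTMLParser):
    """Collects text between block-level tags as separate strings."""

    def __init__(self, skip_pre=True):
        super().__init__()
        self.skip_pre = skip_pre
        self.blocks = []
        self._current = []
        self._pre_depth = 0

    def flush(self):
        # Join the pieces of the current block and collapse whitespace.
        text = " ".join("".join(self._current).split())
        if text:
            self.blocks.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in _BLOCK_TAGS:
            self.flush()
        if tag == "pre":
            self._pre_depth += 1

    def handle_endtag(self, tag):
        if tag in _BLOCK_TAGS:
            self.flush()
        if tag == "pre" and self._pre_depth:
            self._pre_depth -= 1

    def handle_data(self, data):
        # Inline tags such as <b> do not interrupt the current block.
        if self.skip_pre and self._pre_depth:
            return
        self._current.append(data)


def text_blocks_sketch(html_text, skip_pre=True):
    """Hypothetical stand-in: return the text of each block as a list."""
    collector = _BlockCollector(skip_pre=skip_pre)
    collector.feed(html_text)
    collector.close()
    collector.flush()  # pick up trailing text outside any block tag
    return collector.blocks


print(text_blocks_sketch("<p>One <b>two</b></p><pre>code</pre><p>Three</p>"))
```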
-
coaster.utils.text.ulstrip(text)
Strip Unicode extended whitespace from the left side of a string.
-
coaster.utils.text.urstrip(text)
Strip Unicode extended whitespace from the right side of a string.
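The built-in `str.lstrip()` and `str.rstrip()` already remove Unicode whitespace, but not invisible characters such as the zero-width space. The sketch below shows one way a one-sided "extended" strip could work. The names `ulstrip_sketch` and `urstrip_sketch` are hypothetical, and the extra characters chosen (U+200B, U+2060, U+FEFF) are an assumption: the set the library treats as extended whitespace is not listed here.

```python
import re

# Assumption: \s (Unicode whitespace) plus three invisible characters:
# zero-width space, word joiner and zero-width no-break space (BOM).
_EXTENDED = r"[\s\u200b\u2060\ufeff]+"
_LEFT_RE = re.compile(r"\A" + _EXTENDED)
_RIGHT_RE = re.compile(_EXTENDED + r"\Z")


def ulstrip_sketch(text):
    """Hypothetical stand-in: strip extended whitespace on the left only."""
    return _LEFT_RE.sub("", text)


def urstrip_sketch(text):
    """Hypothetical stand-in: strip extended whitespace on the right only."""
    return _RIGHT_RE.sub("", text)


print(repr(ulstrip_sketch("\u200b\u00a0 abc ")))
print(repr(urstrip_sketch(" abc\u3000\u200b")))
```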
-
coaster.utils.text.word_count(text, html=True)
Return the count of words in the given text. If html is True (the default), the text is treated as HTML and tags are stripped before counting. Handles punctuation and bad formatting like.this when counting words, but assumes conventions for Latin script languages. May not be reliable for other languages.
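As a rough illustration of the counting rule described, the following sketch strips tags with `html.parser` and then counts runs of word characters, so a missing space after a full stop still yields two words. It is not the library's implementation. The name `word_count_sketch`, the tokenising regex and the handling of apostrophes are all assumptions.

```python
import re
from html.parser import HTMLParser

# A word is a run of word characters, optionally joined by apostrophes
# ("it's" is one word). Any other punctuation acts as a separator.
_WORD_RE = re.compile(r"\w+(?:'\w+)*")


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment, discarding tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def word_count_sketch(text, html=True):
    """Hypothetical stand-in: count words, stripping tags if html is True."""
    if html:
        extractor = _TextExtractor()
        extractor.feed(text)
        extractor.close()
        # Joining with a space keeps "<p>a</p><p>b</p>" as two words, at
        # the cost of splitting a word that has a tag in the middle of it.
        text = " ".join(extractor.parts)
    return len(_WORD_RE.findall(text))


print(word_count_sketch("<p>Hello, <b>world</b>! It's like.this</p>"))
```

A tokeniser built on `\w` assumes words are separated by spaces or punctuation, which matches the caveat about scripts that do not follow Latin conventions.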