<h1 id="dev_page_title">Styled text with message entities</h1>
<p>Telegram supports styled text using <a href="/type/MessageEntity">message entities</a>.</p>
<p>A client that wants to send styled messages only needs to integrate a <a href="https://en.wikipedia.org/wiki/Markdown">Markdown</a>/<a href="https://en.wikipedia.org/wiki/HTML">HTML</a> parser and generate an array of message entities by iterating over the parsed tags. </p>
<p>Take special care when generating message entities: entity offsets and lengths are measured in <a href="https://en.wikipedia.org/wiki/UTF-16">UTF-16</a> code units, even though the message itself must be encoded using UTF-8. </p>
<h4><a class="anchor" href="#unicode-codepoints-and-encoding" id="unicode-codepoints-and-encoding" name="unicode-codepoints-and-encoding"><i class="anchor-icon"></i></a>Unicode codepoints and encoding</h4>
<p>A <a href="https://en.wikipedia.org/wiki/Unicode">Unicode</a> <a href="https://en.wikipedia.org/wiki/Code_point">code point</a> is a number ranging from <code>0x0</code> to <code>0x10FFFF</code>, usually represented using <code>U+0000</code> to <code>U+10FFFF</code> syntax.<br>
Unicode defines a codespace of 1,112,064 assignable code points within the <code>U+0000</code> to <code>U+10FFFF</code> range.<br>
Each of the assignable codepoints, once assigned by the Unicode consortium, maps to a specific character, emoji or control symbol. </p>
<p>The Unicode codespace is further subdivided into 17 planes:</p>
<ul>
<li>Plane 0: <code>U+0000</code> to <code>U+FFFF</code>: Basic Multilingual Plane (BMP)</li>
<li>Planes 1-16: <code>U+10000</code> to <code>U+10FFFF</code>: Multiple supplementary planes as specified <a href="https://en.wikipedia.org/wiki/Plane_(Unicode)">by the Unicode standard</a></li>
</ul>
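<p>As a minimal sketch of the plane layout, the snippet below maps a code point to its plane using the Unicode standard's zero-based numbering (0 for the BMP, 1 to 16 for the supplementary planes). The helper name <code>plane_of</code> is hypothetical and only used for illustration.</p>

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane number (0 to 16) of a code point."""
    if code_point not in range(0x110000):
        raise ValueError("not a Unicode code point")
    # Each plane holds 0x10000 code points, so the plane number
    # is whatever remains after dropping the low 16 bits.
    return code_point >> 16


print(plane_of(ord("A")))   # 0: Basic Multilingual Plane
print(plane_of(0x1F600))    # 1: a supplementary plane
```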
<p>Since storing a 21-bit number for each letter would result in a waste of space, the Unicode consortium defines multiple encodings that allow storing a code point into a smaller <em>code unit</em>: </p>
<p><a href="https://en.wikipedia.org/wiki/UTF-8">UTF-8 »</a> is a Unicode encoding that stores each 21-bit Unicode code point as one to four 8-bit <em>code units</em>.<br>
UTF-8 is used by the MTProto and Bot API when transmitting and receiving fields of type <a href="/type/string">string</a>. </p>
<p><a href="https://en.wikipedia.org/wiki/UTF-16">UTF-16 »</a> is a Unicode encoding that stores each 21-bit Unicode code point as one or two 16-bit <em>code units</em>. </p>
<p>UTF-16 is used when computing the length and offsets of entities in the MTProto and bot APIs, by counting the number of UTF-16 code units (<strong>not</strong> code points).</p>
<ul>
<li>Code points in the BMP (<code>U+0000</code> to <code>U+FFFF</code>) count as 1, because they are encoded into a single UTF-16 code unit</li>
<li>Code points in all other planes count as 2, because they are encoded into two UTF-16 code units (together called a surrogate pair)</li>
</ul>
<p>A simple, but not very efficient way of computing the entity length is converting the text to UTF-16, and then taking the byte length divided by 2 (=number of UTF-16 code units).</p>
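<p>A minimal Python sketch of this simple approach follows. The function name <code>utf16_len_simple</code> is hypothetical; the <code>utf-16-le</code> codec is used because Python's plain <code>utf-16</code> codec prepends a 2-byte byte order mark that would inflate the count by one.</p>

```python
def utf16_len_simple(text: str) -> int:
    """Count UTF-16 code units by encoding the text and halving the byte length."""
    # "utf-16-le" emits no byte order mark, so every 2 bytes is one code unit.
    return len(text.encode("utf-16-le")) // 2


print(utf16_len_simple("abc"))          # 3: BMP code points count as 1 each
print(utf16_len_simple("\U0001F600"))   # 2: a non-BMP code point is a surrogate pair
```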
<p>However, since UTF-8 encodes code points in non-BMP planes as a 4-byte (32-bit) sequence whose first byte starts with <code>0b11110</code>, a more efficient way to compute the entity length without converting the message to UTF-16 is to iterate over the bytes of the UTF-8 text: </p>
<ul>
<li>If the byte marks the beginning of a 4-byte UTF-8 sequence (all bytes starting with <code>0b11110</code>), increment the count by 2; otherwise</li>
<li>If the byte marks the beginning of any other UTF-8 sequence (all bytes not starting with <code>0b10</code>), increment the count by 1.</li>
</ul>
<p>Example: </p>
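<pre><code>length := 0
for byte in text {
    // Skip continuation bytes (0b10xxxxxx): they do not start a code point
    if (byte & 0xc0) != 0x80 {
        // Lead bytes >= 0xf0 (0b11110xxx) start a 4-byte sequence: count 2, otherwise 1
        length += 1 + (byte >= 0xf0)
    }
}</code></pre>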
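<p>The same loop can be sketched in runnable Python, assuming the input is valid UTF-8 held in a <code>bytes</code> object. The function name <code>utf16_len_from_utf8</code> is hypothetical, and the final line cross-checks the result against a full UTF-16 conversion.</p>

```python
def utf16_len_from_utf8(data: bytes) -> int:
    """Count UTF-16 code units by scanning UTF-8 bytes, without re-encoding."""
    length = 0
    for byte in data:
        # Continuation bytes look like 0b10xxxxxx and never start a code point.
        if (byte & 0xC0) != 0x80:
            # A lead byte of 0xF0 or above starts a 4-byte sequence, i.e. a
            # non-BMP code point that needs two UTF-16 code units.
            length += 2 if byte >= 0xF0 else 1
    return length


sample = "a\u00e9\u20ac\U0001F600"  # 1-, 2-, 3- and 4-byte UTF-8 sequences
print(utf16_len_from_utf8(sample.encode("utf-8")))  # 5
# Cross-check against the conversion-based count.
print(len(sample.encode("utf-16-le")) // 2)         # 5
```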
<p><strong>Note</strong>: the <em>length</em> of an entity <strong>must not</strong> include the length of trailing newlines or whitespace, so <code>rtrim</code> entities before computing their length. However, the next <em>offset</em> <strong>must</strong> include the length of any newlines or whitespace that precede it. </p>
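<p>A rough sketch of this rule, under stated assumptions: the helpers <code>utf16_len</code> and <code>entity_span</code> are hypothetical, and Python's <code>str.rstrip()</code> stands in for <code>rtrim</code> (it may strip a slightly different set of whitespace characters than a given client's <code>rtrim</code>).</p>

```python
def utf16_len(text: str) -> int:
    """Number of UTF-16 code units in text."""
    return len(text.encode("utf-16-le")) // 2


def entity_span(prefix: str, styled: str):
    """Return (offset, length) in UTF-16 code units for styled text following prefix."""
    offset = utf16_len(prefix)            # everything before the entity counts
    length = utf16_len(styled.rstrip())   # trailing whitespace is excluded
    return offset, length


prefix = "Hi \U0001F600 "   # the emoji counts as 2 code units
styled = "bold text \n"     # trailing space and newline are trimmed from the length
offset, length = entity_span(prefix, styled)
print(offset, length)                     # 6 9
# The next offset still includes the trimmed whitespace.
print(offset + utf16_len(styled))         # 17
```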
<p>The following entities can also be used to <a href="/api/mentions">mention</a> users:</p>
<ul>
<li><a href="/constructor/inputMessageEntityMentionName">inputMessageEntityMentionName</a> => <a href="https://t.me/botfather">Mention a user</a></li>
<li><a href="/constructor/messageEntityMention">messageEntityMention</a> => <a href="https://t.me/botfather">@botfather</a> (this mention is generated automatically server-side for @usernames in messages)</li>
</ul>
<p>Also, <a href="/constructor/messageEntityCustomEmoji">messageEntityCustomEmoji</a> entities are used for <a href="/api/custom-emoji">custom emojis »</a>.</p>