From f05fa347071356978c58d900e020ead2b134a0e2 Mon Sep 17 00:00:00 2001
From: GitHub Action
Date: Mon, 8 Aug 2022 08:15:11 +0000
Subject: [PATCH] Update content of files

---
 data/web/corefork.telegram.org/api/entities.html | 14 +++++++-------
 data/web/corefork.telegram.org/api/errors.html | 2 +-
 .../corefork.telegram.org/api/file_reference.html | 6 +++---
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/data/web/corefork.telegram.org/api/entities.html b/data/web/corefork.telegram.org/api/entities.html
index 31a43eefa5..50c3a4a4a4 100644
--- a/data/web/corefork.telegram.org/api/entities.html
+++ b/data/web/corefork.telegram.org/api/entities.html
@@ -47,7 +47,7 @@

Nested entities are supported.

Entity length

When generating message entities, string lengths must be computed as the number of UTF-16 code units, even though the message itself must be encoded using UTF-8.

-

Example implementation: tdlib.

+

Example implementations: tdlib, MadelineProto.

Unicode codepoints and encoding

A Unicode code point is a number ranging from 0x0 to 0x10FFFF, usually represented using U+0000 to U+10FFFF syntax.
Unicode defines a codespace of 1,112,064 assignable code points within the U+0000 to U+10FFFF range.
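As a quick check of the figure above, the following sketch recomputes the number of assignable code points. It assumes the count is the full U+0000 to U+10FFFF range minus the 2,048 surrogate code points (U+D800 to U+DFFF), which are reserved for UTF-16; the constant names are illustrative only.

```python
# Total code points in the U+0000..U+10FFFF range.
TOTAL_CODE_POINTS = 0x10FFFF + 1

# Surrogates U+D800..U+DFFF are reserved for UTF-16 and never assigned.
SURROGATE_COUNT = 0xDFFF - 0xD800 + 1

ASSIGNABLE_CODE_POINTS = TOTAL_CODE_POINTS - SURROGATE_COUNT
print(ASSIGNABLE_CODE_POINTS)
```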
@@ -63,12 +63,12 @@ Each of the assignable codepoints, once assigned by the Unicode consortium, maps

UTF-8 is used by the MTProto and Bot API when transmitting and receiving fields of type string.

UTF-16

UTF-16 » is a Unicode encoding that allows storing a 21-bit Unicode code point into one or two 16-bit code units.
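To illustrate the "one or two 16-bit code units" statement, here is a minimal sketch of the standard UTF-16 encoding rule: code points below U+10000 map to a single code unit, and all others are split into a surrogate pair. The function name `to_utf16_units` is hypothetical and not part of any Telegram API.

```python
from typing import List


def to_utf16_units(code_point: int) -> List[int]:
    """Return the UTF-16 code units for one Unicode code point.

    Assumes the input is a valid scalar value (not itself a surrogate).
    """
    if code_point < 0x10000:
        # BMP code points fit in a single 16-bit code unit.
        return [code_point]
    # Non-BMP code points: subtract 0x10000, then split the remaining
    # 20 bits into a high (lead) and a low (trail) surrogate.
    value = code_point - 0x10000
    return [0xD800 | (value >> 10), 0xDC00 | (value & 0x3FF)]
```

For example, `to_utf16_units(ord("a"))` yields one code unit, while `to_utf16_units(0x1F600)` yields two, so that emoji counts as length 2 in an entity.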

+

UTF-16 is used when computing the length and offsets of entities in the MTProto and bot APIs, by counting the number of UTF-16 code units (not code points).

+

Computing entity length

-

UTF-16 is used when computing the length and offsets of entities in the MTProto and bot APIs, by counting the number of UTF-16 code units (not code points).

-

Computing entity length

A simple, but not very efficient way of computing the entity length is converting the text to UTF-16, and then taking the byte length divided by 2 (=number of UTF-16 code units).
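The simple approach described above could look like the following sketch. It assumes the text is available as a Python `str`; the helper name `utf16_length_simple` is illustrative. The `utf-16-le` codec is used so that no byte order mark is prepended, which would otherwise add one extra code unit.

```python
def utf16_length_simple(text: str) -> int:
    """Length of `text` in UTF-16 code units, via a full re-encode."""
    # Each UTF-16 code unit occupies exactly 2 bytes.
    return len(text.encode("utf-16-le")) // 2
```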

However, since UTF-8 encodes code points outside the BMP as a 4-byte sequence whose first byte starts with 0b11110, a more efficient way to compute the entity length without converting the message to UTF-16 is the following:

Example:

length := 0
-for char in text {
-    if (char & 0xc0) != 0x80 {
-        length += 1 + (char >= 0xf0)
+for byte in text {
+    if (byte & 0xc0) != 0x80 {
+        length += 1 + (byte >= 0xf0)
     }
 }
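The loop above can be sketched in Python as follows, assuming the input is valid UTF-8 held in a `bytes` object; the function name `utf16_length` is hypothetical. Continuation bytes (`0b10xxxxxx`) are skipped, every lead byte counts as one code unit, and lead bytes of 4-byte sequences (`>= 0xF0`) count as two, because those code points need a surrogate pair in UTF-16.

```python
def utf16_length(data: bytes) -> int:
    """Count UTF-16 code units in valid UTF-8 `data` without re-encoding."""
    length = 0
    for byte in data:
        # Skip continuation bytes (0b10xxxxxx); count only lead bytes.
        if (byte & 0xC0) != 0x80:
            # A lead byte >= 0xF0 starts a 4-byte sequence, i.e. a
            # non-BMP code point that takes two UTF-16 code units.
            length += 2 if byte >= 0xF0 else 1
    return length
```

Usage: `utf16_length("a😀".encode("utf-8"))` counts one unit for `a` and two for the emoji.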

Note: the length of an entity must not include trailing newlines or whitespace; rtrim entities before computing their length. However, the offset of the next entity must include the length of any newlines or whitespace that precede it.
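The note above can be illustrated with a small sketch. It assumes entity boundaries are initially known as Python `str` indices (code points), and that Python's `str.rstrip()` is an acceptable stand-in for "rtrim"; the exact set of characters Telegram treats as whitespace is not stated here. The names `utf16_len` and `entity_span` are hypothetical.

```python
from typing import Tuple


def utf16_len(text: str) -> int:
    """Length of `text` in UTF-16 code units."""
    return len(text.encode("utf-16-le")) // 2


def entity_span(text: str, start: int, end: int) -> Tuple[int, int]:
    """Return (offset, length) in UTF-16 code units for text[start:end].

    Trailing whitespace is excluded from the length, but everything
    before `start` (including whitespace) counts toward the offset.
    """
    fragment = text[start:end].rstrip()
    offset = utf16_len(text[:start])
    return offset, utf16_len(fragment)


text = "\U0001F600 bold \n next"
first = entity_span(text, 2, 8)    # covers "bold \n", trimmed to "bold"
second = entity_span(text, 9, 13)  # covers "next"
```

Here the first entity's length drops the trailing space and newline, while the second entity's offset still accounts for them.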

-

Example implementation: tdlib.

+

Example implementations: tdlib, MadelineProto.

Allowed entities

For example the following HTML/Markdown aliases for message entities can be used: