diff --git a/data/web/corefork.telegram.org/api/entities.html b/data/web/corefork.telegram.org/api/entities.html
index 31a43eefa5..50c3a4a4a4 100644
--- a/data/web/corefork.telegram.org/api/entities.html
+++ b/data/web/corefork.telegram.org/api/entities.html
@@ -47,7 +47,7 @@
Nested entities are supported.
Special care must be taken to consider the length of strings when generating message entities as the number of UTF-16 code units, even if the message itself must be encoded using UTF-8.
-Example implementation: tdlib.
+Example implementations: tdlib, MadelineProto.
A Unicode code point is a number ranging from 0x0 to 0x10FFFF, usually represented using U+0000 to U+10FFFF syntax.
Unicode defines a codespace of 1,112,064 assignable code points within the U+0000 to U+10FFFF range.
@@ -63,12 +63,12 @@ Each of the assignable codepoints, once assigned by the Unicode consortium, maps
UTF-8 is used by the MTProto and Bot API when transmitting and receiving fields of type string.
UTF-16 » is a Unicode encoding that allows storing a 21-bit Unicode code point into one or two 16-bit code units.
+UTF-16 is used when computing the length and offsets of entities in the MTProto and bot APIs, by counting the number of UTF-16 code units (not code points).
+Code points in the BMP (U+0000 to U+FFFF) count as 1, because they are encoded into a single UTF-16 code unit.
-UTF-16 is used when computing the length and offsets of entities in the MTProto and bot APIs, by counting the number of UTF-16 code units (not code points).
-A simple, but not very efficient way of computing the entity length is converting the text to UTF-16, and then taking the byte length divided by 2 (=number of UTF-16 code units).
However, since UTF-8 encodes code points in non-BMP planes as a four-byte sequence whose first byte starts with 0b11110, a more efficient way to compute the entity length without converting the message to UTF-16 is the following:
Example:
length := 0
-for char in text {
- if (char & 0xc0) != 0x80 {
- length += 1 + (char >= 0xf0)
+for byte in text {
+ if (byte & 0xc0) != 0x80 {
+ length += 1 + (byte >= 0xf0)
}
}
Note: the length of an entity must not include the length of trailing newlines or whitespace; rtrim entities before computing their length. However, the next offset must include the length of the newlines or whitespace that precede it.
-Example implementation: tdlib.
+Example implementations: tdlib, MadelineProto.
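The counting rule above can be sketched in Python as follows. This is a minimal illustration, not the tdlib or MadelineProto implementation: the function names `utf16_length`, `utf16_length_reference` and `entity_length` are hypothetical, and trailing whitespace is trimmed with Python's `str.rstrip()`, which is assumed here to be an acceptable stand-in for the rtrim step.

```python
def utf16_length(text: str) -> int:
    """Count UTF-16 code units by scanning the UTF-8 bytes of text."""
    length = 0
    for byte in text.encode("utf-8"):
        # Continuation bytes look like 0b10xxxxxx; skip them so that
        # every code point is counted once, at its lead byte.
        if (byte & 0xC0) != 0x80:
            # A lead byte >= 0xF0 starts a 4-byte sequence, i.e. a
            # non-BMP code point, which needs two UTF-16 code units.
            length += 2 if byte >= 0xF0 else 1
    return length


def utf16_length_reference(text: str) -> int:
    """Slower cross-check: encode to UTF-16 and halve the byte length."""
    return len(text.encode("utf-16-le")) // 2


def entity_length(entity_text: str) -> int:
    """Entity length with trailing whitespace and newlines excluded."""
    return utf16_length(entity_text.rstrip())
```

Both counting functions should agree on any well-formed string; the second one is only useful as a check, since it allocates a full UTF-16 copy of the text.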
For example the following HTML/Markdown aliases for message entities can be used:
A string literal in the form of /[A-Z_0-9]+/, which summarizes the problem. For example, AUTH_KEY_UNREGISTERED. This is an optional parameter.
-A full machine-readable JSON list of RPC errors that can be returned by all methods in the API can be found here », what follows is a description of its fields:
+A full machine-readable JSON list of RPC errors that can be returned by all methods in the API can be found here », what follows is a description of its fields:
errors - All error messages and codes for each method (object).
File references are strings of bytes that can be encountered in the file_reference fields of document and photo objects.
They must be cached by the client, along with the origin context where the document/photo object was found, in order to be refetched when the file reference expires.
-Example implementation of a reference database: MadelineProto, android, telegram desktop, tdlib.
+Example implementation of a reference database: MadelineProto, android, telegram desktop, tdlib.
Assume you receive a message from your friend: that message contains a messageMediaPhoto with a photo.
Your client has to cache not only the file_reference field of the photo, but also the context in which the file reference was seen (in this case, a message coming from a specific user).
-The context info is in this case, an origin context of type message, containing the message ID and the peer ID of the chat/channel/user where the message was seen.
+The context info is in this case, an origin context of type message, containing the message ID and the peer ID of the chat/channel/user where the message was seen.
The context info has to be associated with the file reference: when downloading a file using upload.getFile, a FILE_REFERENCE_EXPIRED error (or another error starting with FILE_REFERENCE_) may be returned.
-If this happens, the context info must be used to refetch the object that contained the file reference: in this example, the peer info and the message ID have to be used with channels.getMessages or messages.getMessages to refetch the message, recache the file reference and use it in a new file download request.
More than one origin context can be associated to one file reference, for greater resilience (in the case of a message that was deleted in one chat but was also forwarded in another chat, the file reference can be refetched from the second chat, instead).
Origin contexts for objects returned by method calls with certain parameters can be considered, too (for example, in the case of favorited sticker sets returned by messages.getFavedStickers).
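The caching scheme described above could be sketched as below. Everything in this block is hypothetical and for illustration only: `MessageOrigin`, `CachedReference`, `FileReferenceCache` and `FileReferenceExpired` are made-up names, and the `fetch` and `refetch` callables stand in for whatever the client uses to download a file and to refetch the object that contained the file reference. None of them is part of the Telegram API or of any existing client library.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MessageOrigin:
    """Hypothetical origin context of type message."""
    peer_id: int
    message_id: int


@dataclass
class CachedReference:
    file_reference: bytes
    origins: list = field(default_factory=list)


class FileReferenceExpired(Exception):
    """Stand-in for an RPC error whose name starts with FILE_REFERENCE_."""


class FileReferenceCache:
    def __init__(self):
        # Maps a file ID to its latest reference and known origin contexts.
        self._entries = {}

    def remember(self, file_id, file_reference, origin):
        """Store the reference together with the context it was seen in."""
        entry = self._entries.setdefault(file_id, CachedReference(file_reference))
        entry.file_reference = file_reference
        if origin not in entry.origins:
            entry.origins.append(origin)

    def download(self, file_id, fetch, refetch):
        """Download with the cached reference, refreshing it if it expired."""
        entry = self._entries[file_id]
        try:
            return fetch(entry.file_reference)
        except FileReferenceExpired:
            pass
        # Try each origin context in turn; one may be gone (deleted message).
        for origin in entry.origins:
            fresh = refetch(origin)
            if fresh is None:
                continue
            entry.file_reference = fresh  # recache before retrying
            return fetch(fresh)
        raise FileReferenceExpired("no origin context yielded a fresh reference")
```

Keeping a list of origins per file is what allows the fallback to a second chat when the message was deleted in the first one. A real client would also need to persist this mapping, which the sketch leaves out.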