mirror of
https://github.com/MarshalX/telegram-crawler.git
synced 2025-01-04 02:11:40 +01:00
Update content of files
This commit is contained in:
parent
88106cdd4b
commit
f05fa34707
3 changed files with 11 additions and 11 deletions
|
@ -47,7 +47,7 @@
|
|||
<p>Nested entities are supported.</p>
|
||||
<h2><a class="anchor" href="#entity-length" id="entity-length" name="entity-length"><i class="anchor-icon"></i></a>Entity length</h2>
|
||||
<p>When generating message entities, string lengths and offsets must be computed as the number of <a href="https://en.wikipedia.org/wiki/UTF-16">UTF-16</a> code units, even though the message itself must be encoded using UTF-8. </p>
|
||||
<p>Example implementation: <a href="https://github.com/tdlib/td/tree/master/td/telegram/MessageEntity.cpp">tdlib</a>.</p>
|
||||
<p>Example implementations: <a href="https://github.com/tdlib/td/tree/master/td/telegram/MessageEntity.cpp">tdlib</a>, <a href="https://github.com/danog/MadelineProto/blob/stable/src/danog/MadelineProto/TL/Conversion/DOMEntities.php">MadelineProto</a>.</p>
|
||||
<h3><a class="anchor" href="#unicode-codepoints-and-encoding" id="unicode-codepoints-and-encoding" name="unicode-codepoints-and-encoding"><i class="anchor-icon"></i></a>Unicode codepoints and encoding</h3>
|
||||
<p>A <a href="https://en.wikipedia.org/wiki/Unicode">Unicode</a> <a href="https://en.wikipedia.org/wiki/Code_point">code point</a> is a number ranging from <code>0x0</code> to <code>0x10FFFF</code>, usually represented using <code>U+0000</code> to <code>U+10FFFF</code> syntax.<br>
|
||||
Unicode defines a codespace of 1,112,064 assignable code points within the <code>U+0000</code> to <code>U+10FFFF</code> range.<br>
|
||||
|
@ -63,12 +63,12 @@ Each of the assignable codepoints, once assigned by the Unicode consortium, maps
|
|||
UTF-8 is used by the MTProto and Bot API when transmitting and receiving fields of type <a href="/type/string">string</a>. </p>
|
||||
<h4><a class="anchor" href="#utf-16" id="utf-16" name="utf-16"><i class="anchor-icon"></i></a>UTF-16</h4>
|
||||
<p><a href="https://en.wikipedia.org/wiki/UTF-16">UTF-16 »</a> is a Unicode encoding that stores each 21-bit Unicode code point as one or two 16-bit <em>code units</em>. </p>
|
||||
<p>UTF-16 is used when computing the length and offsets of entities in the MTProto and bot APIs, by counting the number of UTF-16 code units (<strong>not</strong> code points).</p>
|
||||
<h3><a class="anchor" href="#computing-entity-length" id="computing-entity-length" name="computing-entity-length"><i class="anchor-icon"></i></a>Computing entity length</h3>
|
||||
<ul>
|
||||
<li>Code points in the BMP (<code>U+0000</code> to <code>U+FFFF</code>) count as 1, because they are encoded into a single UTF-16 code unit</li>
|
||||
<li>Code points in all other planes count as 2, because they are encoded into two UTF-16 code units (also called surrogate pairs)</li>
|
||||
</ul>
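<p>The two counting rules above can be sketched in Python as follows. This is a minimal illustration, not part of the API: the function names <code>utf16_units</code> and <code>utf16_length</code> are hypothetical, and the sketch relies on Python strings being iterated one code point at a time.</p>

```python
def utf16_units(code_point: int) -> int:
    """Return how many UTF-16 code units one Unicode code point needs."""
    # BMP code points (U+0000..U+FFFF) fit in a single 16-bit code unit;
    # every other plane needs a surrogate pair, i.e. two code units.
    return 1 if code_point <= 0xFFFF else 2


def utf16_length(text: str) -> int:
    """Sum the UTF-16 code units of every code point in the text."""
    return sum(utf16_units(ord(ch)) for ch in text)


if __name__ == "__main__":
    for sample in ["a", "\u20ac", "\U0001F600", "a\U0001F600b"]:
        print(repr(sample), utf16_length(sample))
```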
|
||||
<p>UTF-16 is used when computing the length and offsets of entities in the MTProto and bot APIs, by counting the number of UTF-16 code units (<strong>not</strong> code points).</p>
|
||||
<h3><a class="anchor" href="#computing-entity-length" id="computing-entity-length" name="computing-entity-length"><i class="anchor-icon"></i></a>Computing entity length</h3>
|
||||
<p>A simple, but not very efficient, way of computing the entity length is to convert the text to UTF-16 and divide its byte length by 2, which gives the number of UTF-16 code units.</p>
|
||||
<p>However, since UTF-8 encodes codepoints in non-BMP planes as a four-byte sequence whose first byte starts with the bits <code>0b11110</code>, a more efficient way to compute the entity length without converting the message to UTF-16 is the following: </p>
|
||||
<ul>
|
||||
|
@ -77,13 +77,13 @@ UTF-8 is used by the MTProto and Bot API when transmitting and receiving fields
|
|||
</ul>
|
||||
<p>Example: </p>
|
||||
<pre><code>length := 0
|
||||
for char in text {
|
||||
if (char & 0xc0) != 0x80 {
|
||||
length += 1 + (char >= 0xf0)
|
||||
for byte in text {
|
||||
if (byte & 0xc0) != 0x80 {
|
||||
length += 1 + (byte >= 0xf0)
|
||||
}
|
||||
}</code></pre>
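<p>As a runnable counterpart to the pseudocode above, here is a minimal Python sketch of both approaches: scanning the UTF-8 bytes directly, and converting to UTF-16 and halving the byte length. The function names are hypothetical, and the input is assumed to be valid UTF-8.</p>

```python
def utf16_length_from_utf8(data: bytes) -> int:
    """Count UTF-16 code units by scanning UTF-8 bytes, without converting."""
    length = 0
    for byte in data:
        # Continuation bytes look like 0b10xxxxxx and never start a code point.
        if (byte & 0xC0) != 0x80:
            # A lead byte >= 0xF0 starts a four-byte sequence (non-BMP),
            # which takes two UTF-16 code units; anything else takes one.
            length += 2 if byte >= 0xF0 else 1
    return length


def utf16_length_by_conversion(text: str) -> int:
    """Simple but slower approach: encode to UTF-16 and halve the byte length."""
    # "utf-16-le" is used so that no byte order mark is prepended.
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    for sample in ["Hello", "h\u00e9llo \U0001F600", "\u20ac"]:
        direct = utf16_length_from_utf8(sample.encode("utf-8"))
        converted = utf16_length_by_conversion(sample)
        print(repr(sample), direct, converted)
```

<p>Both functions should agree on any valid input; the byte-scanning version simply avoids allocating a second copy of the text.</p>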
|
||||
<p><strong>Note</strong>: the <em>length</em> of an entity <strong>must not</strong> include trailing newlines or whitespace, so <code>rtrim</code> entities before computing their length. However, the next <em>offset</em> <strong>must</strong> include the length of any newlines or whitespace that precede it. </p>
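<p>The interaction between trimmed lengths and untrimmed offsets can be illustrated with a short Python sketch. The segment format, the <code>build_entities</code> helper and the dictionary shape of the entities are assumptions made for this example only; real clients would emit the corresponding message entity constructors.</p>

```python
def utf16_len(text: str) -> int:
    """Number of UTF-16 code units in the text."""
    return len(text.encode("utf-16-le")) // 2


def build_entities(segments):
    """Build a message and its entities from (text, entity_type or None) pairs.

    Hypothetical helper: entity lengths exclude trailing whitespace,
    while offsets still advance over that whitespace.
    """
    message = ""
    entities = []
    offset = 0
    for text, kind in segments:
        if kind is not None:
            trimmed = text.rstrip()  # drop trailing spaces and newlines
            if trimmed:
                entities.append(
                    {"type": kind, "offset": offset, "length": utf16_len(trimmed)}
                )
        message += text
        # The next offset counts the full segment, whitespace included.
        offset += utf16_len(text)
    return message, entities


if __name__ == "__main__":
    msg, ents = build_entities([("bold \U0001F600 \n", "bold"), ("next", "italic")])
    print(repr(msg))
    for entity in ents:
        print(entity)
```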
|
||||
<p>Example implementation: <a href="https://github.com/tdlib/td/tree/master/td/telegram/MessageEntity.cpp">tdlib</a>.</p>
|
||||
<p>Example implementations: <a href="https://github.com/tdlib/td/tree/master/td/telegram/MessageEntity.cpp">tdlib</a>, <a href="https://github.com/danog/MadelineProto/blob/stable/src/danog/MadelineProto/TL/Conversion/DOMEntities.php">MadelineProto</a>.</p>
|
||||
<h2><a class="anchor" href="#allowed-entities" id="allowed-entities" name="allowed-entities"><i class="anchor-icon"></i></a>Allowed entities</h2>
|
||||
<p>For example, the following HTML/Markdown aliases for message entities can be used:</p>
|
||||
<ul>
|
||||
|
|
|
@ -49,7 +49,7 @@
|
|||
<h4><a class="anchor" href="#error-type" id="error-type" name="error-type"><i class="anchor-icon"></i></a>Error Type</h4>
|
||||
<p>A string literal in the form of <code>/[A-Z_0-9]+/</code>, which summarizes the problem. For example, <code>AUTH_KEY_UNREGISTERED</code>. This is an optional parameter.</p>
|
||||
<h4><a class="anchor" href="#error-database" id="error-database" name="error-database"><i class="anchor-icon"></i></a>Error Database</h4>
|
||||
<p>A full machine-readable JSON list of RPC errors that can be returned by all methods in the API can be found <a href="/file/464001369/11a13/YluC1AKcb9I.83769.json/d4636d863f53e8e461">here »</a>, what follows is a description of its fields: </p>
|
||||
<p>A full machine-readable JSON list of RPC errors that can be returned by all methods in the API can be found <a href="/file/464001615/10635/9GTEXCYSRss.84110.json/6c3b6a35149e591c95">here »</a>, what follows is a description of its fields: </p>
|
||||
<ul>
|
||||
<li><code>errors</code> - All error messages and codes for each method (object).<ul>
|
||||
<li>Keys: Error codes as strings (numeric strings)</li>
|
||||
|
|
|
@ -44,13 +44,13 @@
|
|||
|
||||
<div id="dev_page_content"><p>File references are strings of bytes that can be encountered in the <code>file_reference</code> fields of <a href="/constructor/document">document</a> and <a href="/constructor/photo">photo</a> objects.</p>
|
||||
<p>They must be cached by the client, along with the <strong>origin context</strong> where the document/photo object was found, in order to be refetched when the file reference expires.</p>
|
||||
<p>Example implementation of a reference database: <a href="https://github.com/danog/MadelineProto/blob/master/src/danog/MadelineProto/MTProtoTools/ReferenceDatabase.php">MadelineProto</a>, <a href="https://github.com/DrKLO/Telegram/blob/master/TMessagesProj/src/main/java/org/telegram/messenger/FileRefController.java">android</a>, <a href="https://github.com/telegramdesktop/tdesktop/blob/bec39d89e19670eb436dc794a8f20b657cb87c71/Telegram/SourceFiles/data/data_file_origin.cpp">telegram desktop</a>, <a href="https://github.com/tdlib/td/blob/56163c2460a65afc4db2c57ece576b8c38ea194b/td/telegram/FileReferenceManager.cpp">tdlib</a>.</p>
|
||||
<p>Example implementation of a reference database: <a href="https://github.com/danog/MadelineProto/blob/stable/src/danog/MadelineProto/MTProtoTools/ReferenceDatabase.php">MadelineProto</a>, <a href="https://github.com/DrKLO/Telegram/blob/master/TMessagesProj/src/main/java/org/telegram/messenger/FileRefController.java">android</a>, <a href="https://github.com/telegramdesktop/tdesktop/blob/bec39d89e19670eb436dc794a8f20b657cb87c71/Telegram/SourceFiles/data/data_file_origin.cpp">telegram desktop</a>, <a href="https://github.com/tdlib/td/blob/56163c2460a65afc4db2c57ece576b8c38ea194b/td/telegram/FileReferenceManager.cpp">tdlib</a>.</p>
|
||||
<h4><a class="anchor" href="#another-example" id="another-example" name="another-example"><i class="anchor-icon"></i></a>Another example:</h4>
|
||||
<p>Assume you receive a <a href="/constructor/message">message</a> from your friend: that message contains a <a href="/constructor/messageMediaPhoto">messageMediaPhoto</a> with a <a href="/constructor/photo">photo</a>.</p>
|
||||
<p>Your client has to cache not only the <code>file_reference</code> field of the photo, but also the context in which the file reference was seen (in this case, a message coming from a specific user).</p>
|
||||
<p>The context info is in this case, <a href="https://github.com/danog/MadelineProto/blob/master/src/danog/MadelineProto/MTProtoTools/ReferenceDatabase.php#L74">an origin context of type message</a>, containing the message ID and the peer ID of the chat/channel/user where the message was seen.</p>
|
||||
<p>The context info is in this case, <a href="https://github.com/danog/MadelineProto/blob/stable/src/danog/MadelineProto/MTProtoTools/ReferenceDatabase.php#L74">an origin context of type message</a>, containing the message ID and the peer ID of the chat/channel/user where the message was seen.</p>
|
||||
<p>The context info has to be associated with the file reference: when downloading a file using <a href="/method/upload.getFile">upload.getFile</a>, a <code>FILE_REFERENCE_EXPIRED</code> error (or another error starting with <code>FILE_REFERENCE_</code>) may be returned.<br>
|
||||
If this happens, the context info must be used to refetch the object that contained the file reference: in this example, the peer info and the message ID have to be used with <a href="/method/channels.getMessages">channels.getMessages</a> or <a href="/method/messages.getMessages">messages.getMessages</a> to <a href="https://github.com/danog/MadelineProto/blob/master/src/danog/MadelineProto/MTProtoTools/ReferenceDatabase.php#L481">refetch the message</a>, recache the file reference and use it in a new file download request.</p>
|
||||
If this happens, the context info must be used to refetch the object that contained the file reference: in this example, the peer info and the message ID have to be used with <a href="/method/channels.getMessages">channels.getMessages</a> or <a href="/method/messages.getMessages">messages.getMessages</a> to <a href="https://github.com/danog/MadelineProto/blob/stable/src/danog/MadelineProto/MTProtoTools/ReferenceDatabase.php#L481">refetch the message</a>, recache the file reference and use it in a new file download request.</p>
|
||||
<p>More than one origin context can be associated with one file reference, for greater resilience (if a message was deleted in one chat but was also forwarded to another chat, the file reference can be refetched from the second chat instead).</p>
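<p>The caching and refetching flow described above could be sketched in Python as follows. This is a simplified illustration, not how any of the linked clients implement it: <code>ReferenceStore</code>, <code>MessageOrigin</code>, <code>FileReferenceError</code> and the <code>get_file</code> and <code>refetch</code> callbacks are all hypothetical stand-ins for the real API calls (<code>upload.getFile</code>, <code>messages.getMessages</code> and so on).</p>

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


class FileReferenceError(Exception):
    """Stands in for any RPC error whose type starts with FILE_REFERENCE_."""


@dataclass(frozen=True)
class MessageOrigin:
    """Origin context of type message: where the file reference was seen."""
    peer_id: int
    message_id: int


class ReferenceStore:
    """Caches file references together with their origin contexts."""

    def __init__(self) -> None:
        self._refs: Dict[int, bytes] = {}
        self._origins: Dict[int, List[MessageOrigin]] = {}

    def remember(self, file_id: int, file_reference: bytes, origin: MessageOrigin) -> None:
        self._refs[file_id] = file_reference
        origins = self._origins.setdefault(file_id, [])
        if origin not in origins:  # several origins may point at one file
            origins.append(origin)

    def get(self, file_id: int) -> bytes:
        return self._refs[file_id]

    def refresh(self, file_id: int,
                refetch: Callable[[MessageOrigin], Optional[bytes]]) -> bytes:
        # Try each known origin until one still yields a fresh reference.
        for origin in self._origins.get(file_id, []):
            fresh = refetch(origin)
            if fresh is not None:
                self._refs[file_id] = fresh
                return fresh
        raise LookupError(f"no usable origin context for file {file_id}")


def download(store: ReferenceStore, file_id: int,
             get_file: Callable[[int, bytes], bytes],
             refetch: Callable[[MessageOrigin], Optional[bytes]]) -> bytes:
    """Download a file, refreshing the file reference once if it has expired."""
    try:
        return get_file(file_id, store.get(file_id))
    except FileReferenceError:
        return get_file(file_id, store.refresh(file_id, refetch))
```

<p>In this sketch, <code>refetch</code> returns <code>None</code> when the origin is no longer available (for instance, because the message was deleted), which lets the store fall back to the next origin context.</p>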
|
||||
<p>Origin contexts for objects returned by method calls with certain parameters can be considered, too (for example, in the case of favorited sticker sets returned by <a href="/method/messages.getFavedStickers">messages.getFavedStickers</a>).</p></div>