telegram-crawler/data/web/blogfork.telegram.org/mtproto/serialize.html
2024-02-14 15:36:17 +00:00

243 lines
22 KiB
HTML
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<!DOCTYPE html>
<html class="">
<head>
<meta charset="utf-8">
<title>Binary Data Serialization</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta property="description" content="MTProto operation requires that elementary and composite data types as well as queries to which such data types are passed…">
<meta property="og:title" content="Binary Data Serialization">
<meta property="og:image" content="">
<meta property="og:description" content="MTProto operation requires that elementary and composite data types as well as queries to which such data types are passed…">
<link rel="icon" type="image/svg+xml" href="/img/website_icon.svg?4">
<link rel="apple-touch-icon" sizes="180x180" href="/img/apple-touch-icon.png">
<link rel="icon" type="image/png" sizes="32x32" href="/img/favicon-32x32.png">
<link rel="icon" type="image/png" sizes="16x16" href="/img/favicon-16x16.png">
<link rel="alternate icon" href="/img/favicon.ico" type="image/x-icon" />
<link href="/css/bootstrap.min.css?3" rel="stylesheet">
<link href="/css/telegram.css?236" rel="stylesheet" media="screen">
<style>
</style>
</head>
<body class="preload">
<div class="dev_page_wrap">
<div class="dev_page_head navbar navbar-static-top navbar-tg">
<div class="navbar-inner">
<div class="container clearfix">
<ul class="nav navbar-nav navbar-right hidden-xs"><li class="navbar-twitter"><a href="https://twitter.com/telegram" target="_blank" data-track="Follow/Twitter" onclick="trackDlClick(this, event)"><i class="icon icon-twitter"></i><span> Twitter</span></a></li></ul>
<ul class="nav navbar-nav">
<li><a href="//telegram.org/">Home</a></li>
<li class="hidden-xs"><a href="//telegram.org/faq">FAQ</a></li>
<li class="hidden-xs"><a href="//telegram.org/apps">Apps</a></li>
<li class=""><a href="/api">API</a></li>
<li class="active"><a href="/mtproto">Protocol</a></li>
<li class=""><a href="/schema">Schema</a></li>
</ul>
</div>
</div>
</div>
<div class="container clearfix">
<div class="dev_page">
<div id="dev_page_content_wrap" class=" ">
<div class="dev_page_bread_crumbs"><ul class="breadcrumb clearfix"><li><a href="/mtproto" >Mobile Protocol</a></li><i class="icon icon-breadcrumb-divider"></i><li><a href="/mtproto/serialize" >Binary Data Serialization</a></li></ul></div>
<h1 id="dev_page_title">Binary Data Serialization</h1>
<div id="dev_page_content"><p>MTProto operation requires that elementary and composite data types as well as queries to which such data types are passed as arguments or by which they are returned, be transmitted in binary format (i. e. <em>serialized</em>) .
The <a href="/mtproto/TL">TL language</a> is used to describe the data types to be serialized.</p>
<h3><a class="anchor" href="#general-definitions" id="general-definitions" name="general-definitions"><i class="anchor-icon"></i></a>General Definitions</h3>
<p>For our purposes, we can identify a <em>type</em> with the set of its <em>(serialized) values</em> understood as strings (finite sequences) of 32-bit numbers (transmitted in little endian order). </p>
<p>Therefore:</p>
<ul>
<li><em>Alphabet</em> (A), in this case, is a set of 32-bit numbers (normally, signed, i. e. between -2^31 and 2^31 - 1). </li>
<li><em>Value</em>, in this case, is the same as a <em>string in Alphabet A</em>, i. e. a finite (possibly, empty) sequence of 32-bit numbers. The set of all such sequences is designated as <em>A*</em>. </li>
<li><em>Type</em>, for our purposes, is the same as the set of legal values of a type, i. e. some set T which is a subset of A* and is a prefix code (i. e. no element of T may be a prefix for any other element). Therefore, any sequence from A* can contain no more than one prefix that is a member of T. </li>
<li><em>Value of Type T</em> is any sequence (value) which is a member of T as a subset of A*. </li>
<li><em>Compatible Types</em> are the types T and T' not intersecting as subsets of A*, such that the union of T and T' is a prefix code. </li>
<li><em>Coordinated System of Types</em> is a finite or infinite set of types T_1, ..., T_n, ..., such that any two types from this set are compatible. </li>
<li><em>Data Type</em> is the same as <em>type</em> in the sense of the definition above. </li>
<li><em>Functional Type</em> is a type describing a function; it is not a type in the sense of the definition above. Initially, we ignore the existence of functional types and describe only the data types; however, in reality, functional types will later be implemented in some extension of this system using the so-called <em>temporary combinators</em>. </li>
</ul>
<h4><a class="anchor" href="#combinators-constructors-composite-data-types" id="combinators-constructors-composite-data-types" name="combinators-constructors-composite-data-types"><i class="anchor-icon"></i></a>Combinators, Constructors, Composite Data Types</h4>
<ul>
<li>
<p><em>Combinator</em> is a function that takes arguments of certain types and returns a value of some other type. We normally look at combinators whose argument and result types are data types (rather than functional types). </p>
</li>
<li>
<p><em>Arity (of combinator)</em> is a non-negative integer, the number of combinator arguments. </p>
</li>
<li>
<p><em>Combinator identifier</em> is an identifier beginning with a lowercase Roman letter that uniquely identifies a combinator. </p>
</li>
<li>
<p><em>Combinator number</em> or <em>combinator name</em> is a 32-bit number (i.e., an element of A) that uniquely identifies a combinator. Most often, it is CRC32 of the string containing the combinator description without the final semicolon, and with one space between contiguous lexemes. This always falls in the range from 0x01000000 to 0xffffff00. The highest 256 values are reserved for the so-called <em>temporal-logic combinators</em> used to transmit functions. We frequently denote as <em>combinator</em> the combinator name with single quotes: '<em>combinator</em>'.</p>
</li>
<li>
<p><em>Combinator description</em> is a string of format <code>combinator_name type_arg_1 ... type_arg_N = type_res;</code> where <code>N</code> stands for the arity of the combinator, <code>type_arg_i</code> is the type of the i-th argument (or rather, a string with the combinator name), and <code>type_res</code> is the combinator value type.</p>
</li>
<li>
<p><em>Constructor</em> is a combinator that cannot be computed (reduced). This is used to represent composite data types. For example, combinator 'int_tree' with description <code>int_tree IntTree int IntTree = IntTree</code>, alongside combinator <code>empty_tree = IntTree</code>, may be used to define a composite data type called “IntTree” that takes on values in the form of binary trees with integers as nodes. </p>
</li>
<li>
<p><em>Function (functional combinator)</em> is a combinator which may be computed (reduced) on condition that the requisite number of arguments of requisite types are provided. The result of the computation is an expression consisting of constructors and base type values only. </p>
</li>
<li>
<p><em>Normal form</em> is an expression consisting only of constructors and base type values; that which is normally the result of computing a function. </p>
</li>
<li>
<p><em>Type identifier</em> is an identifier that normally starts with a capital letter in Roman script and uniquely identifies the type.</p>
</li>
<li>
<p><em>Type number</em> or <em>type name</em> is a 32-bit number that uniquely identifies a type; it normally is the sum of the CRC32 values of the descriptions of the type constructors.</p>
</li>
<li>
<p><em>Description of (composite) Type T</em> is a collection of the descriptions of all constructors that take on Type <em>T</em> values. This is normally written as text with each string containing the description of a single constructor. Here is a description of Type 'IntTree', for example:</p>
<p>int_tree IntTree int IntTree = IntTree;
empty_tree = IntTree;</p>
</li>
<li>
<p><em>Polymorphic type</em> is a type whose description contains parameters (<em>type variables</em>) in lieu of actual types; approximately, what would be a template in C++. Here is a description of Type <code>List alpha</code> where <code>List</code> is a polymorphic type of arity 1 (i. e., dependent on a single argument), and <code>alpha</code> is a type variable which appears as the constructor's optional parameter (in curly braces): </p>
<p>cons {alpha:Type} alpha (List alpha) = List alpha;
nil {alpha:Type} = List alpha;</p>
</li>
<li>
<p><em>Value of (composite) Type T</em> is any sequence from A* in the format <code>constr_num arg1 ... argN</code>, where constr_num is the index number of some Constructor <em>C</em> which takes on values of Type <em>T</em>, and arg_i is a value of Type <em>T_i</em> which is the type of the i-th argument to Constructor <em>C</em>. For example, let Combinator int_tree have the index number 17, whereas Combinator empty_tree has the index number 239. Then, the value of Type <code>IntTree</code> is, for example, <code>17 17 239 1 239 2 239</code> which is more conveniently written as <code>'int_tree' 'int_tree' 'empty_tree' 1 'empty_tree' 2 'empty_tree'</code>. From the standpoint of a high-level language, this is <code>int_tree (int_tree (empty_tree) 1 (empty_tree)) 2 (empty_tree): IntTree</code>.</p>
</li>
<li>
<p><em>Schema</em> is a collection of all the (composite) data type descriptions. This is used to define some agreed-to system of types.</p>
</li>
</ul>
<h4><a class="anchor" href="#boxed-and-bare-types" id="boxed-and-bare-types" name="boxed-and-bare-types"><i class="anchor-icon"></i></a>Boxed and Bare Types</h4>
<ul>
<li><em>Boxed type</em> is a type any value of which starts with the constructor number. Since every constructor has a uniquely determined value type, the first number in any boxed type value uniquely defines its type. This guarantees that the various boxed types in totality make up a coordinated system of types. A boxed type identifier is always capitalized.</li>
<li><em>Bare type</em> is a type whose values do not contain a constructor number, which is implied instead. A bare type identifier always coincides with the name of the implied constructor (and therefore, begins with a lowercase letter) which may be padded at the front by the percentage sign (%). In addition, if <code>X</code> is a boxed type with no more than a single constructor, then <code>%X</code> refers to the corresponding bare type. The values of a bare type are identical with the set of number sequences obtained by dropping the first number (i. e., the external constructor index number) from the set of values of the corresponding boxed type (which is the result type of the selected constructor), starting with the selected constructor index number. For example, <code>3 4</code> is a value of the <code>int_couple</code> bare type, defined using <code>int_couple int int = IntCouple</code>. The corresponding boxed type is <code>IntCouple</code>; if 404 is the constructor index number for <code>int_couple</code>, then <code>404 3 4</code> is the value for the <code>IntCouple</code> boxed type which corresponds to the value of the bare type <code>int_couple</code> (also known as <code>%int_couple</code> and <code>%IntCouple</code>; the latter form is conceptually preferable but longer).</li>
</ul>
<p>Conceptually, only boxed types should be used everywhere. However, for speed and compactness, bare types have to be used (for instance, an array of 10,000 bare int values is 40,000 bytes long, whereas boxed Int values take up twice as much space; therefore, when transmitting a large array of integer identifiers, say, it is more efficient to use the <code>Vector int</code> type rather than <code>Vector Int</code>). In addition, all base types (int, long, double, string) are bare.</p>
<p>If a boxed type is polymorphic of type arity r, this is also true of any derived bare type. In other words, if one were to define <code>intCouple {alpha:Type} int alpha = IntCouple alpha</code>, then, thereafter, intCouple as an identifier would also be a polymorphic type of arity 1 in combinator (and consequently, in constructor and type) descriptions. The notations <code>intCouple X</code>, <code>%(IntCouple X)</code>, and <code>%IntCouple X</code> are equivalent.</p>
<h4><a class="anchor" href="#base-types" id="base-types" name="base-types"><i class="anchor-icon"></i></a>Base Types</h4>
<p>Base types exist both as bare (int, long, double, string) and as boxed (Int, Long, Double, String) versions. Their <em>constructor</em> identifiers coincide with the names of the relevant bare types. Their pseudodescriptions have the following appearance:</p>
<pre><code>int ? = Int;
long ? = Long;
double ? = Double;
string ? = String;</code></pre>
<p>Consequently, the <code>int</code> constructor index number, for example, is the CRC32 of the string <code>"int ? = Int"</code>.</p>
<p>The values of bare type <code>int</code> are exactly all the single-element <em>sequences</em>, i. e. numbers between -2^31 and 2^31-1 represent themselves in this case. Values of type <code>long</code> are two-element sequences that are 64-bit signed numbers (little endian again). Values of type <code>double</code>, again, are two-element sequences containing 64-bit real numbers in a standard double format. And finally, the values of type <code>string</code> look differently depending on the length L of the string being serialized:</p>
<ul>
<li>If L &lt;= 253, the serialization contains one byte with the value of L, then L bytes of the string followed by 0 to 3 characters containing 0, such that the overall length of the value be divisible by 4, whereupon all of this is interpreted as a sequence of int(L/4)+1 32-bit numbers. </li>
<li>If L &gt;= 254, the serialization contains byte 254, followed by 3 bytes with the string length L, followed by L bytes of the string, further followed by 0 to 3 null padding bytes.</li>
</ul>
<h4><a class="anchor" href="#object-pseudotype" id="object-pseudotype" name="object-pseudotype"><i class="anchor-icon"></i></a>Object Pseudotype</h4>
<p>The <code>Object</code> pseudotype is a “type” which can take on values that belong to any boxed type in the schema. This helps quickly define such types as <em>list of random items</em> without using polymorphic types. It is best not to abuse this capability since it results in the use of dynamic typing. Nonetheless, it is hard to imagine the data structures that we know from PHP and JSON without using the Object pseudotype.</p>
<p>It is recommended to use <code>TypedObject</code> instead whenever possible:</p>
<pre><code>object X:Type value:X = TypedObject;</code></pre>
<h4><a class="anchor" href="#built-in-composite-types-vectors-and-associative-arrays" id="built-in-composite-types-vectors-and-associative-arrays" name="built-in-composite-types-vectors-and-associative-arrays"><i class="anchor-icon"></i></a>Built-In Composite Types: Vectors and Associative Arrays</h4>
<p>The Vector t polymorphic pseudotype is a “type” whose value is a sequence of values of any type t, either boxed or bare.</p>
<pre><code>vector {t:Type} # [ t ] = Vector t;</code></pre>
<p>Serialization always uses the same constructor “vector” (const 0x1cb5c415 = crc32("vector t:Type # [ t ] = Vector t”) that is not dependent on the specific value of the variable of type t. The value of the Vector t type is the index number of the relevant constructor number followed by N, the number of elements in the vector, and then by N values of type t. The value of the optional parameter t is not involved in the serialization since it is derived from the result type (always known prior to deserialization).</p>
<p>Polymorphic pseudotypes IntHash t and StrHash t are associative arrays mapping integer and string keys to values of type t. They are, in fact, vectors containing bare 2-tuples (int, t) or (string, t):</p>
<pre><code>coupleInt {t:Type} int t = CoupleInt t;
intHash {t:Type} (vector %(CoupleInt t)) = IntHash t;
coupleStr {t:Type} string t = CoupleStr t;
strHash {t:Type} (vector %(CoupleStr t)) = StrHash t;</code></pre>
<p>The percentage sign, in this case, means that a bare type that corresponds to the boxed type in parentheses is taken; the boxed type in question must have no more than a single constructor, whatever the values of the parameters.</p>
<p>The keys may be sorted or be in some other order (as in PHP arrays). For associative arrays with sorted keys, the IntSortedHash or StrSortedHash alias is used:</p>
<pre><code>intSortedHash {t:Type} (intHash t) = IntSortedHash t;
strSortedHash {t:Type} (strHash t) = StrSortedHash t;</code></pre>
<h4><a class="anchor" href="#polymorphic-type-constructors" id="polymorphic-type-constructors" name="polymorphic-type-constructors"><i class="anchor-icon"></i></a>Polymorphic Type Constructors</h4>
<p>The constructor of a polymorphic type does not depend on the specific types to which the polymorphic type is applied. When it is computed, optional parameters (normally containing type variables and placed in curly braces) cease to be optional (the curly braces are removed), and, in addition to that, all parenthesis are also removed. Therefore, </p>
<pre><code>vector {t:Type} # [ t ] = Vector t;</code></pre>
<p>corresponds to the constructor number crc32("vector t:Type # [ t ] = Vector t") = 0x1cb5c415. During (de)serialization, the specific values of the optional variable t are derived from the result type (i. e. the object being serialized or deserialized) that is always known, and are never serialized explicitly.</p>
<p>Previously, it had to be known which specific variable types each polymorphic type will apply to. To accomplish this, the type system used strings of the form</p>
<pre><code>polymorphic_type_name type_1 ... type_N;</code></pre>
<p>For example,</p>
<pre><code>Vector int;
Vector string;
Vector Object;</code></pre>
<p>Now they are ignored.</p>
<p>See also <a href="/mtproto/TL-polymorph">polymorphism in TL</a>.</p>
<p>In this case, the Object pseudotype permits using Vector Object to store lists of anything (the values of any boxed types). Since bare types are efficient when short, in practice it is unlikely that cases more complex than the ones cited above will be required. </p>
<h4><a class="anchor" href="#field-names" id="field-names" name="field-names"><i class="anchor-icon"></i></a>Field Names</h4>
<p>Let us say that we need to represent <em>users</em> as triplets containing one integer (user ID) and two strings (first and last names). The requisite data structure is the triplet int, string, string which may be declared as follows:</p>
<pre><code>user int string string = User;</code></pre>
<p>On the other hand, a group may be described by a similar triplet consisting of a group ID, its name, and description:</p>
<pre><code>group int string string = Group;</code></pre>
<p>For the difference between User and Group to be clear, it is convenient to assign names to some or all of the fields:</p>
<pre><code>user id:int first_name:string last_name:string = User;
group id:int title:string description:string = Group;</code></pre>
<p>If the User type needs to be extended at a later time by having records with some additional field added to it, it could be accomplished as follows:</p>
<pre><code>userv2 id:int unread_messages:int first_name:string last_name:string in_groups:vector int = User;</code></pre>
<p>Aside from other things, this approach helps define correct mappings between fields that belong to different constructors of the same type, convert between them as well as convert type values into an associative array with string keys (field names, if defined, are natural choices for such keys).</p>
<h3><a class="anchor" href="#tl-language" id="tl-language" name="tl-language"><i class="anchor-icon"></i></a>TL Language</h3>
<p>See <a href="/mtproto/TL">TL Language</a></p></div>
</div>
</div>
</div>
<div class="footer_wrap">
<div class="footer_columns_wrap footer_desktop">
<div class="footer_column footer_column_telegram">
<h5>Telegram</h5>
<div class="footer_telegram_description"></div>
Telegram is a cloud-based mobile and desktop messaging app with a focus on security and speed.
</div>
<div class="footer_column">
<h5><a href="//telegram.org/faq">About</a></h5>
<ul>
<li><a href="//telegram.org/faq">FAQ</a></li>
<li><a href="//telegram.org/privacy">Privacy</a></li>
<li><a href="//telegram.org/press">Press</a></li>
</ul>
</div>
<div class="footer_column">
<h5><a href="//telegram.org/apps#mobile-apps">Mobile Apps</a></h5>
<ul>
<li><a href="//telegram.org/dl/ios">iPhone/iPad</a></li>
<li><a href="//telegram.org/android">Android</a></li>
<li><a href="//telegram.org/dl/web">Mobile Web</a></li>
</ul>
</div>
<div class="footer_column">
<h5><a href="//telegram.org/apps#desktop-apps">Desktop Apps</a></h5>
<ul>
<li><a href="//desktop.telegram.org/">PC/Mac/Linux</a></li>
<li><a href="//macos.telegram.org/">macOS</a></li>
<li><a href="//telegram.org/dl/web">Web-browser</a></li>
</ul>
</div>
<div class="footer_column footer_column_platform">
<h5><a href="/">Platform</a></h5>
<ul>
<li><a href="/api">API</a></li>
<li><a href="//translations.telegram.org/">Translations</a></li>
<li><a href="//instantview.telegram.org/">Instant View</a></li>
</ul>
</div>
</div>
<div class="footer_columns_wrap footer_mobile">
<div class="footer_column">
<h5><a href="//telegram.org/faq">About</a></h5>
</div>
<div class="footer_column">
<h5><a href="//telegram.org/blog">Blog</a></h5>
</div>
<div class="footer_column">
<h5><a href="//telegram.org/apps">Apps</a></h5>
</div>
<div class="footer_column">
<h5><a href="/">Platform</a></h5>
</div>
<div class="footer_column">
<h5><a href="https://twitter.com/telegram" target="_blank" data-track="Follow/Twitter" onclick="trackDlClick(this, event)">Twitter</a></h5>
</div>
</div>
</div>
</div>
<script src="/js/main.js?47"></script>
<script>backToTopInit("Go up");
removePreloadInit();
</script>
</body>
</html>