telegram-crawler/data/web/blogfork.telegram.org/api/end-to-end/video-calls.html
2022-12-10 22:50:15 +00:00

<!DOCTYPE html>
<html class="">
<head>
<meta charset="utf-8">
<title>End-to-End Encrypted Voice and Video Calls</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta property="description" content="This article describes the end-to-end encryption used for Telegram voice and video calls.
Related Articles
End-to-End Encryption…">
<meta property="og:title" content="End-to-End Encrypted Voice and Video Calls">
<meta property="og:image" content="">
<meta property="og:description" content="This article describes the end-to-end encryption used for Telegram voice and video calls.
Related Articles
End-to-End Encryption…">
<link rel="icon" type="image/svg+xml" href="/img/website_icon.svg?4">
<link rel="apple-touch-icon" sizes="180x180" href="/img/apple-touch-icon.png">
<link rel="icon" type="image/png" sizes="32x32" href="/img/favicon-32x32.png">
<link rel="icon" type="image/png" sizes="16x16" href="/img/favicon-16x16.png">
<link rel="alternate icon" href="/img/favicon.ico" type="image/x-icon" />
<link href="/css/bootstrap.min.css?3" rel="stylesheet">
<link href="/css/telegram.css?233" rel="stylesheet" media="screen">
<style>
</style>
</head>
<body class="preload">
<div class="dev_page_wrap">
<div class="dev_page_head navbar navbar-static-top navbar-tg">
<div class="navbar-inner">
<div class="container clearfix">
<ul class="nav navbar-nav navbar-right hidden-xs"><li class="navbar-twitter"><a href="https://twitter.com/telegram" target="_blank" data-track="Follow/Twitter" onclick="trackDlClick(this, event)"><i class="icon icon-twitter"></i><span> Twitter</span></a></li></ul>
<ul class="nav navbar-nav">
<li><a href="//telegram.org/">Home</a></li>
<li class="hidden-xs"><a href="//telegram.org/faq">FAQ</a></li>
<li class="hidden-xs"><a href="//telegram.org/apps">Apps</a></li>
<li class="active"><a href="/api">API</a></li>
<li class=""><a href="/mtproto">Protocol</a></li>
<li class=""><a href="/schema">Schema</a></li>
</ul>
</div>
</div>
</div>
<div class="container clearfix">
<div class="dev_page">
<div id="dev_page_content_wrap" class=" ">
<div class="dev_page_bread_crumbs"><ul class="breadcrumb clearfix"><li><a href="/api" >API</a></li><i class="icon icon-breadcrumb-divider"></i><li><a href="/api/end-to-end%2Fvideo-calls" >End-to-End Encrypted Voice and Video Calls</a></li></ul></div>
<h1 id="dev_page_title">End-to-End Encrypted Voice and Video Calls</h1>
<div id="dev_page_content"><p>This article describes the end-to-end encryption used for Telegram <strong>voice</strong> and <strong>video calls</strong>.</p>
<h5><a class="anchor" href="#related-articles" id="related-articles" name="related-articles"><i class="anchor-icon"></i></a>Related Articles</h5>
<div class="dev_page_nav_wrap">
<ul>
<li><a href="/api/end-to-end">End-to-End Encryption in Secret Chats</a></li>
<li><a href="/mtproto/security_guidelines">Security Guidelines for Client Developers</a></li>
</ul>
</div>
<hr>
<h2><a class="anchor" href="#establishing-calls" id="establishing-calls" name="establishing-calls"><i class="anchor-icon"></i></a>Establishing Calls</h2>
<p>Before a call is ready, some preliminary actions have to be performed. The calling party needs to contact the party to be called and check whether it is ready to accept the call. Besides that, the parties have to negotiate the protocols to be used, learn each other's IP addresses or those of the Telegram relay servers to be used (so-called <em>reflectors</em>), and generate a one-time encryption key for this call with the aid of a <em>Diffie-Hellman key exchange</em>. All of this is accomplished in parallel with the aid of several Telegram API methods and related notifications. This document covers details related to key generation, encryption and security.</p>
<h2><a class="anchor" href="#key-generation" id="key-generation" name="key-generation"><i class="anchor-icon"></i></a>Key Generation</h2>
<p>The Diffie-Hellman key exchange, as well as the whole protocol used to create a new voice call, is quite similar to the one used for <a href="/api/end-to-end#key-generation">Secret Chats</a>. We recommend studying the linked article before proceeding.</p>
<p>However, we have introduced some important changes to facilitate the <a href="#key-verification">key verification process</a>. Below is the entire exchange between the two communicating parties, the Caller (A) and the Callee (B), through the Telegram servers (S).</p>
<ul>
<li><em>A</em> executes <a href="/method/messages.getDhConfig">messages.getDhConfig</a> to find out the 2048-bit Diffie-Hellman prime <em>p</em> and generator <em>g</em>. The client is expected to check whether <em>p</em> is a safe prime and perform all the <a href="/api/end-to-end#sending-a-request">security checks</a> necessary for secret chats.</li>
<li><em>A</em> chooses a random value of <em>a</em>, 1 &lt; a &lt; p-1, and computes <em>g_a:=power(g,a) mod p</em> (a 256-byte number) and <em>g_a_hash:=SHA256(g_a)</em> (32 bytes long).</li>
<li><em>A</em> invokes (sends to server <em>S</em>) <a href="/method/phone.requestCall">phone.requestCall</a>, which has the field <code>g_a_hash:bytes</code>, among others. For this call, this field is to be filled with <em>g_a_hash</em>, <strong>not</strong> <em>g_a</em> itself.</li>
<li>The Server <em>S</em> performs privacy checks and sends an <a href="/constructor/updatePhoneCall">updatePhoneCall</a> update with a <a href="/constructor/phoneCallRequested">phoneCallRequested</a> constructor to all of <em>B</em>'s active devices. This update, apart from the identity of <em>A</em> and other relevant parameters, contains the <em>g_a_hash</em> field, filled with the value obtained from <em>A</em>.</li>
<li><em>B</em> accepts the call on one of their devices, stores the received value of <em>g_a_hash</em> for this instance of the voice call creation protocol, chooses a random value of <em>b</em>, 1 &lt; b &lt; p-1, computes <em>g_b:=power(g,b) mod p</em>, performs all the required security checks, and invokes the <a href="/method/phone.acceptCall">phone.acceptCall</a> method, which has a <em>g_b:bytes</em> field (among others), to be filled with the value of <em>g_b</em> itself (not its hash).</li>
<li>The Server <em>S</em> sends an <a href="/constructor/updatePhoneCall">updatePhoneCall</a> with the <a href="/constructor/phoneCallDiscarded">phoneCallDiscarded</a> constructor to all other devices <em>B</em> has authorized, to prevent accepting the same call on any of the other devices. From this point on, the server <em>S</em> works only with the device of <em>B</em> that invoked <a href="/method/phone.acceptCall">phone.acceptCall</a> first.</li>
<li>The Server <em>S</em> sends to <em>A</em> an <a href="/constructor/updatePhoneCall">updatePhoneCall</a> update with <a href="/constructor/phoneCallAccepted">phoneCallAccepted</a> constructor, containing the value of <em>g_b</em> received from <em>B</em>.</li>
<li><em>A</em> performs all the usual security checks on <em>g_b</em> and <em>a</em>, computes the Diffie-Hellman key <em>key:=power(g_b,a) mod p</em> and its fingerprint <em>key_fingerprint:long</em>, equal to the lower 64 bits of <em>SHA1(key)</em>, the same as with secret chats. Then <em>A</em> invokes the <a href="/method/phone.confirmCall">phone.confirmCall</a> method, containing <code>g_a:bytes</code> and <code>key_fingerprint:long</code>.</li>
<li>The Server <em>S</em> sends to <em>B</em> an <a href="/constructor/updatePhoneCall">updatePhoneCall</a> update with the <a href="/constructor/phoneCall">phoneCall</a> constructor, containing the value of <em>g_a</em> in the <em>g_a_or_b:bytes</em> field, and <em>key_fingerprint:long</em>.</li>
<li>At this point <em>B</em> receives the value of <em>g_a</em>. It checks that <em>SHA256(g_a)</em> is indeed equal to the previously received value of <em>g_a_hash</em>, performs all the <a href="/mtproto/security_guidelines">usual Diffie-Hellman security checks</a>, and computes the key <em>key:=power(g_a,b) mod p</em> and its fingerprint, equal to the lower 64 bits of <em>SHA1(key)</em>. Then it checks that this fingerprint equals the value of <code>key_fingerprint:long</code> received from the other side, as an implementation sanity check.</li>
</ul>
<p>At this point, the Diffie-Hellman key exchange is complete, and both parties have a 256-byte shared secret key <em>key</em> which is used to encrypt all further exchanges between <em>A</em> and <em>B</em>.</p>
<p>It is of paramount importance to accept each update only once for each instance of the key generation protocol, discarding any duplicates or alternative versions of already received and processed messages (updates).</p>
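<p>The exchange described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the prime <code>P</code> and generator <code>G</code> below are toy values (a real client obtains a 2048-bit safe prime from messages.getDhConfig), the helper names are hypothetical, the byte order used for the fingerprint is an assumption, and the usual Diffie-Hellman range checks are omitted for brevity.</p>

```python
import hashlib
import secrets

# Toy parameters (assumption, for illustration only). Real clients receive a
# 2048-bit safe prime p and a generator g from messages.getDhConfig.
P = 2**127 - 1  # a Mersenne prime: NOT a safe prime and far too small
G = 3
N_BYTES = (P.bit_length() + 7) // 8  # would be 256 for a 2048-bit prime


def to_bytes(n: int) -> bytes:
    """Fixed-length big-endian encoding of a group element."""
    return n.to_bytes(N_BYTES, "big")


def random_exponent() -> int:
    """Uniform random exponent e with 1 < e < p-1."""
    return 2 + secrets.randbelow(P - 3)


def key_fingerprint(key: bytes) -> int:
    # Assumption: "lower 64 bits of SHA1(key)" is taken here as the last
    # 8 bytes of the digest, read as a little-endian signed integer.
    return int.from_bytes(hashlib.sha1(key).digest()[-8:], "little", signed=True)


def caller_commit():
    """A picks a, computes g_a, and commits to it by hashing (phone.requestCall)."""
    a = random_exponent()
    g_a = pow(G, a, P)
    return a, g_a, hashlib.sha256(to_bytes(g_a)).digest()


def callee_finish(g_a: int, g_a_hash: bytes, b: int, fingerprint: int) -> bytes:
    """B checks the commitment and the fingerprint, then returns the shared key."""
    if hashlib.sha256(to_bytes(g_a)).digest() != g_a_hash:
        raise ValueError("g_a does not match the committed g_a_hash")
    key = to_bytes(pow(g_a, b, P))
    if key_fingerprint(key) != fingerprint:
        raise ValueError("key fingerprint mismatch")
    return key


# One full run of the three-message exchange.
a, g_a, g_a_hash = caller_commit()          # A -> B: g_a_hash
b = random_exponent()
g_b = pow(G, b, P)                          # B -> A: g_b
key_a = to_bytes(pow(g_b, a, P))            # A -> B: g_a, key_fingerprint
key_b = callee_finish(g_a, g_a_hash, b, key_fingerprint(key_a))
```

<p>In this sketch <code>key_a</code> and <code>key_b</code> end up identical, and a <code>g_a</code> that differs from the committed value is rejected before any key is derived.</p>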
<h2><a class="anchor" href="#encryption" id="encryption" name="encryption"><i class="anchor-icon"></i></a>Encryption</h2>
<blockquote>
<p>This document describes encryption in <strong>voice and video calls</strong> as implemented in Telegram apps with versions <strong>7.0</strong> and above. See <a href="https://core.telegram.org/api/end-to-end/voice-calls">this document</a> for details on encryption used in <strong>voice calls</strong> in app versions released before <strong>August 14, 2020</strong>.</p>
</blockquote>
<p>The <a href="https://github.com/TelegramMessenger/tgcalls">Telegram Voice and Video Call Library</a> uses an optimized version of <a href="/">MTProto 2.0</a> to send and receive <strong>packets</strong>, consisting of one or more end-to-end encrypted <strong>messages</strong> of various types (<a href="https://webrtcglossary.com/ice/"><em>ice</em></a> <em>candidates list, video formats, remote video status, audio stream data, video stream data, message ack</em> or <em>empty</em>).</p>
<p>This document describes only the encryption process, leaving out encoding and network-dependent parts.</p>
<p>The library starts working with:</p>
<ul>
<li>An <a href="#key-generation">encryption key</a> <code>key</code> shared between the parties, as generated above.</li>
<li>Information whether the call is <strong>outgoing</strong> or <strong>incoming</strong>.</li>
<li>Two data transfer channels: <strong>signaling</strong>, offered by the Telegram API, and <strong>transport</strong> based on WebRTC.</li>
</ul>
<p>Both data transfer channels are unreliable (messages may get lost), but <strong>signaling</strong> is slower and more reliable.</p>
<h3><a class="anchor" href="#encrypting-call-data" id="encrypting-call-data" name="encrypting-call-data"><i class="anchor-icon"></i></a>Encrypting Call Data</h3>
<p>The body of a packet (<code>decrypted_body</code>) consists of several messages and their respective <code>seq</code> numbers concatenated together.</p>
<ul>
<li>decrypted_body = message_seq1 + message_body1 + message_seq2 + message_body2</li>
</ul>
<p>Each <code>decrypted_body</code> is unique because no two packets can have the same <code>seq</code> number for their first message. If only old messages need to be re-sent, an <em>empty</em> message with a new unique <code>seq</code> is added to the packet first.</p>
<p>The <a href="#key-generation">encryption key</a> <code>key</code> is used to compute a 128-bit <code>msg_key</code> and then a 256-bit <code>aes_key</code> and a 128-bit <code>aes_iv</code>:</p>
<ul>
<li>msg_key_large = SHA256 (substr(key, 88+x, 32) + decrypted_body);</li>
<li>msg_key = substr (msg_key_large, 8, 16);</li>
<li>sha256_a = SHA256 (msg_key + substr (key, x, 36));</li>
<li>sha256_b = SHA256 (substr (key, 40+x, 36) + msg_key);</li>
<li>aes_key = substr (sha256_a, 0, 8) + substr (sha256_b, 8, 16) + substr (sha256_a, 24, 8);</li>
<li>aes_iv = substr (sha256_b, 0, 4) + substr (sha256_a, 8, 8) + substr (sha256_b, 24, 4);</li>
</ul>
<p><code>x</code> depends on whether the call is <strong>outgoing</strong> or <strong>incoming</strong> and on the connection type:</p>
<ul>
<li>x = 0 for <strong>outgoing</strong> + <strong>transport</strong></li>
<li>x = 8 for <strong>incoming</strong> + <strong>transport</strong></li>
<li>x = 128 for <strong>outgoing</strong> + <strong>signaling</strong></li>
<li>x = 136 for <strong>incoming</strong> + <strong>signaling</strong></li>
</ul>
<p>This allows apps to decide which packet types will be sent to which connections and work in these connections independently (with each having its own <code>seq</code> counter).</p>
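<p>As a rough illustration, the derivation of <code>msg_key</code>, <code>aes_key</code> and <code>aes_iv</code> listed above maps directly onto byte slicing with the Python standard library. The function name <code>derive_keys</code> and the lookup table <code>X_OFFSETS</code> are hypothetical names introduced for this sketch; the AES-CTR step itself is left out because it needs a third-party cipher implementation.</p>

```python
import hashlib

# Values of x from the list above, keyed by (call direction, channel).
X_OFFSETS = {
    ("outgoing", "transport"): 0,
    ("incoming", "transport"): 8,
    ("outgoing", "signaling"): 128,
    ("incoming", "signaling"): 136,
}


def derive_keys(key: bytes, decrypted_body: bytes, x: int):
    """Return (msg_key, aes_key, aes_iv) for one packet body.

    substr(s, offset, length) in the text corresponds to s[offset:offset+length].
    """
    if len(key) != 256:
        raise ValueError("expected the 256-byte shared key")
    msg_key_large = hashlib.sha256(key[88 + x:88 + x + 32] + decrypted_body).digest()
    msg_key = msg_key_large[8:24]                                 # 128 bits
    sha256_a = hashlib.sha256(msg_key + key[x:x + 36]).digest()
    sha256_b = hashlib.sha256(key[40 + x:40 + x + 36] + msg_key).digest()
    aes_key = sha256_a[0:8] + sha256_b[8:24] + sha256_a[24:32]    # 256 bits
    aes_iv = sha256_b[0:4] + sha256_a[8:16] + sha256_b[24:28]     # 128 bits
    return msg_key, aes_key, aes_iv


# Example with a dummy key and body (illustrative values only).
demo_key = bytes(range(256))
demo_body = b"example decrypted_body"
msg_key, aes_key, aes_iv = derive_keys(
    demo_key, demo_body, X_OFFSETS[("outgoing", "transport")]
)
```

<p>A receiver would run the same derivation with the <code>msg_key</code> taken from the packet, decrypt the body, then recompute <code>msg_key</code> from the decrypted body and compare the two.</p>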
<p>The resulting <code>aes_key</code> and <code>aes_iv</code> are used to encrypt <code>decrypted_body</code>:</p>
<ul>
<li>encrypted_body = AES_CTR (decrypted_body, aes_key, aes_iv)</li>
</ul>
<p>The packet that gets sent consists of <code>msg_key</code> and <code>encrypted_body</code>:</p>
<ul>
<li>packet_bytes = msg_key + encrypted_body</li>
</ul>
<p>When received, the packet gets decrypted using <code>key</code> and <code>msg_key</code>, after which <code>msg_key</code> is checked against the relevant <code>SHA256</code> substring. If the check fails, the packet <strong>must</strong> be discarded.</p>
<h3><a class="anchor" href="#protecting-against-replay-attacks" id="protecting-against-replay-attacks" name="protecting-against-replay-attacks"><i class="anchor-icon"></i></a>Protecting Against Replay Attacks</h3>
<p>Each of the peers maintains its own 32-bit monotonically increasing counter for outgoing messages, <code>seq</code>, starting with <code>1</code>. This <code>seq</code> counter is prepended to each sent message and increased by <code>1</code> for each new message. No two <code>seq</code> numbers of the first message in a packet can be the same. If only old messages need to be re-sent, an <em>empty</em> message with a new unique <code>seq</code> is added to the packet first. When the <code>seq</code> counter reaches <code>2^30</code>, the call must be aborted. Each peer stores <code>seq</code> values of all the messages it has received (and processed) which are larger than <code>max_received_seq - 64</code>, where <code>max_received_seq</code> is the largest <code>seq</code> number received so far.</p>
<p>If a packet is received whose first message has a <code>seq</code> that is smaller than or equal to <code>max_received_seq - 64</code>, or whose <code>seq</code> has already been received, the packet is discarded. Otherwise, the <code>seq</code> values of all incoming messages are memorized and <code>max_received_seq</code> is adjusted. This guarantees that no packet will be processed twice.</p>
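<p>The receiving side of this scheme can be sketched as a small sliding window. The class name <code>ReplayWindow</code> and its method are hypothetical; the sketch only models the bookkeeping described above (a window of 64 below <code>max_received_seq</code>), not packet parsing or decryption.</p>

```python
from typing import Sequence


class ReplayWindow:
    """Tracks received seq values larger than max_received_seq - 64."""

    WINDOW = 64

    def __init__(self) -> None:
        self.max_received_seq = 0
        self.seen = set()

    def accept_packet(self, seqs: Sequence[int]) -> bool:
        """seqs holds the seq of every message in the packet, first one first.

        Returns False if the packet must be discarded as too old or replayed.
        """
        first = seqs[0]
        if first <= self.max_received_seq - self.WINDOW or first in self.seen:
            return False
        # Memorize all seq values carried by the packet.
        self.seen.update(seqs)
        self.max_received_seq = max(self.max_received_seq, max(seqs))
        # Forget everything that has fallen out of the window.
        floor = self.max_received_seq - self.WINDOW
        self.seen = {s for s in self.seen if s > floor}
        return True


window = ReplayWindow()
results = [
    window.accept_packet([1]),    # new packet
    window.accept_packet([1]),    # exact replay
    window.accept_packet([100]),  # moves the window forward
    window.accept_packet([36]),   # 36 <= 100 - 64, too old
    window.accept_packet([37]),   # still inside the window
]
```

<p>Keeping only the values inside the window bounds the memory needed per connection, at the cost of rejecting packets that arrive more than 64 sequence numbers late.</p>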
<h2><a class="anchor" href="#key-verification" id="key-verification" name="key-verification"><i class="anchor-icon"></i></a>Key Verification</h2>
<p>To verify the key, and ensure that no MITM attack is taking place, both parties concatenate the secret key <em>key</em> with the value <em>g_a</em> of the Caller (<em>A</em>), compute SHA256 and use it to generate a sequence of emoticons. More precisely, the SHA256 hash is split into four 64-bit integers; each of them is divided by the total number of emoticons used (currently 333), and the remainder is used to select specific emoticons. The specifics of the protocol guarantee that comparing four emoticons out of a set of 333 is sufficient to prevent eavesdropping (MITM attack on DH) with a probability of <strong>0.9999999999</strong>.</p>
<p>This is because instead of the standard Diffie-Hellman key exchange which requires only two messages between the parties:</p>
<ul>
<li>A-&gt;B : (generates a and) sends g_a := g^a</li>
<li>B-&gt;A : (generates b and true key (g_a)^b, then) sends g_b := g^b</li>
<li>A : computes key (g_b)^a</li>
</ul>
<p>we use a <strong>three-message modification</strong> thereof that works well when both parties are online (which also happens to be a requirement for voice calls):</p>
<ul>
<li>A-&gt;B : (generates a and) sends g_a_hash := hash(g^a)</li>
<li>B-&gt;A : (stores g_a_hash, generates b and) sends g_b := g^b</li>
<li>A-&gt;B : (computes key (g_b)^a, then) sends g_a := g^a</li>
<li>B : checks hash(g_a) == g_a_hash, then computes key (g_a)^b</li>
</ul>
<p>The idea here is that <em>A</em> commits to a specific value of <em>a</em> (and of <em>g_a</em>) without disclosing it to <em>B</em>. <em>B</em> has to choose its value of <em>b</em> and <em>g_b</em> without knowing the true value of <em>g_a</em>, so that it cannot try different values of <em>b</em> to force the final key <em>(g_a)^b</em> to have any specific properties (such as fixed lower 32 bits of SHA256(key)). At this point, <em>B</em> commits to a specific value of <em>g_b</em> without knowing <em>g_a</em>. Then <em>A</em> has to send its value <em>g_a</em>; it cannot change it even though it knows <em>g_b</em> now, because the other party <em>B</em> would accept only a value of <em>g_a</em> that has a hash specified in the very first message of the exchange.</p>
<p>If some impostor is pretending to be either <em>A</em> or <em>B</em> and tries to perform a Man-in-the-Middle attack on this Diffie-Hellman key exchange, the above still holds. Party <em>A</em> will generate a shared key with <em>B</em> (or whoever pretends to be <em>B</em>) without having a second chance to change its exponent <em>a</em> depending on the value <em>g_b</em> received from the other side; and the impostor will not have a chance to adapt its value of <em>b</em> depending on <em>g_a</em>, because it has to commit to a value of <em>g_b</em> before learning <em>g_a</em>. The same is valid for the key generation between the impostor and the party <em>B</em>.</p>
<p>The use of hash commitment in the DH exchange constrains the attacker to only <strong>one guess</strong> to generate the correct visualization in their attack, which means that using just over 33 bits of entropy represented by four emoji in the visualization is enough to make a successful attack highly improbable.</p>
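<p>The emoji selection described at the start of this section can be sketched as follows. The function returns indices into a 333-entry emoji table rather than the emoji themselves, since the table is not part of this document. Reading each 64-bit chunk as a big-endian unsigned integer is an assumption of this sketch, and the name <code>emoji_indices</code> is hypothetical.</p>

```python
import hashlib
import math

EMOJI_COUNT = 333  # total number of emoticons, as stated above


def emoji_indices(key: bytes, g_a: bytes) -> list:
    """Map SHA256(key + g_a) to four indices in the range 0..332."""
    digest = hashlib.sha256(key + g_a).digest()
    # Split the 32-byte hash into four 64-bit integers.
    # Assumption: big-endian, unsigned.
    chunks = [int.from_bytes(digest[i:i + 8], "big") for i in range(0, 32, 8)]
    return [c % EMOJI_COUNT for c in chunks]


# Number of distinct four-emoji sequences and the matching entropy in bits.
combinations = EMOJI_COUNT ** 4
entropy_bits = math.log2(combinations)

# Illustrative inputs only; real values are the 256-byte key and g_a.
indices = emoji_indices(b"\x01" * 256, b"\x02" * 256)
```

<p>With 333<sup>4</sup> possible sequences, <code>entropy_bits</code> comes out slightly above 33, which is the "just over 33 bits" figure quoted above; a single guess therefore succeeds with probability 1/333<sup>4</sup>.</p>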
<blockquote>
<p>For a slightly more user-friendly explanation of the above see: <a href="https://core.telegram.org/techfaq#q-how-are-voice-calls-authenticated">How are calls authenticated?</a></p>
</blockquote></div>
</div>
</div>
</div>
<div class="footer_wrap">
<div class="footer_columns_wrap footer_desktop">
<div class="footer_column footer_column_telegram">
<h5>Telegram</h5>
<div class="footer_telegram_description"></div>
Telegram is a cloud-based mobile and desktop messaging app with a focus on security and speed.
</div>
<div class="footer_column">
<h5><a href="//telegram.org/faq">About</a></h5>
<ul>
<li><a href="//telegram.org/faq">FAQ</a></li>
<li><a href="//telegram.org/privacy">Privacy</a></li>
<li><a href="//telegram.org/press">Press</a></li>
</ul>
</div>
<div class="footer_column">
<h5><a href="//telegram.org/apps#mobile-apps">Mobile Apps</a></h5>
<ul>
<li><a href="//telegram.org/dl/ios">iPhone/iPad</a></li>
<li><a href="//telegram.org/android">Android</a></li>
<li><a href="//telegram.org/dl/web">Mobile Web</a></li>
</ul>
</div>
<div class="footer_column">
<h5><a href="//telegram.org/apps#desktop-apps">Desktop Apps</a></h5>
<ul>
<li><a href="//desktop.telegram.org/">PC/Mac/Linux</a></li>
<li><a href="//macos.telegram.org/">macOS</a></li>
<li><a href="//telegram.org/dl/web">Web-browser</a></li>
</ul>
</div>
<div class="footer_column footer_column_platform">
<h5><a href="/">Platform</a></h5>
<ul>
<li><a href="/api">API</a></li>
<li><a href="//translations.telegram.org/">Translations</a></li>
<li><a href="//instantview.telegram.org/">Instant View</a></li>
</ul>
</div>
</div>
<div class="footer_columns_wrap footer_mobile">
<div class="footer_column">
<h5><a href="//telegram.org/faq">About</a></h5>
</div>
<div class="footer_column">
<h5><a href="//telegram.org/blog">Blog</a></h5>
</div>
<div class="footer_column">
<h5><a href="//telegram.org/apps">Apps</a></h5>
</div>
<div class="footer_column">
<h5><a href="/">Platform</a></h5>
</div>
<div class="footer_column">
<h5><a href="https://twitter.com/telegram" target="_blank" data-track="Follow/Twitter" onclick="trackDlClick(this, event)">Twitter</a></h5>
</div>
</div>
</div>
</div>
<script src="/js/main.js?47"></script>
<script>backToTopInit("Go up");
removePreloadInit();
</script>
</body>
</html>