1. Background
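As of this writing, IETF QUIC has reached draft-29, and the Chromium team has announced that it is ramping up its deployment of IETF QUIC (IQUIC). With IQUIC on track to become the QUIC standard, its features deserve a close look. 0-RTT speeds up connection establishment. In our tests, however, GQUIC could use 0-RTT normally while IQUIC never managed to.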
2. Analysis
Client-side test tools:
1) quic_client from Google quiche
2) Chrome browser: Stable and Canary
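0-RTT connection setup was failing, and our task was to find out why. Our development environment was Chromium 87.0.4259.0, an intermediate upstream build released around September 1, 2020. We found that the quiche in this version did not yet implement the early_data_reason interface. In other words, there was no way to ask why 0-RTT had been rejected. The interface was added in a build about a week later (around September 7). It is quite useful because it lets the stack tell us why 0-RTT failed. Upgrading Chromium would have been costly, so we copied the relevant code and implemented the interface ourselves: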
--- a/quic/core/crypto/crypto_utils.cc
+++ b/quic/core/crypto/crypto_utils.cc
@@ -12,6 +12,7 @@
 #include "third_party/boringssl/src/include/openssl/hkdf.h"
 #include "third_party/boringssl/src/include/openssl/mem.h"
 #include "third_party/boringssl/src/include/openssl/sha.h"
+#include "third_party/boringssl/src/include/openssl/ssl.h"
 #include "net/third_party/quiche/src/quic/core/crypto/aes_128_gcm_12_decrypter.h"
 #include "net/third_party/quiche/src/quic/core/crypto/aes_128_gcm_12_encrypter.h"
 #include "net/third_party/quiche/src/quic/core/crypto/aes_128_gcm_decrypter.h"
@@ -690,6 +691,29 @@ const char* CryptoUtils::HandshakeFailureReasonToString(
   return "INVALID_HANDSHAKE_FAILURE_REASON";
 }
 
+const char* CryptoUtils::EarlyDataReasonToString(
+    ssl_early_data_reason_t reason) {
+  switch (reason) {
+    RETURN_STRING_LITERAL(ssl_early_data_unknown);
+    RETURN_STRING_LITERAL(ssl_early_data_disabled);
+    RETURN_STRING_LITERAL(ssl_early_data_accepted);
+    RETURN_STRING_LITERAL(ssl_early_data_protocol_version);
+    RETURN_STRING_LITERAL(ssl_early_data_peer_declined);
+    RETURN_STRING_LITERAL(ssl_early_data_no_session_offered);
+    RETURN_STRING_LITERAL(ssl_early_data_session_not_resumed);
+    RETURN_STRING_LITERAL(ssl_early_data_unsupported_for_session);
+    RETURN_STRING_LITERAL(ssl_early_data_hello_retry_request);
+    RETURN_STRING_LITERAL(ssl_early_data_alpn_mismatch);
+    RETURN_STRING_LITERAL(ssl_early_data_channel_id);
+    RETURN_STRING_LITERAL(ssl_early_data_token_binding);
+    RETURN_STRING_LITERAL(ssl_early_data_ticket_age_skew);
+    RETURN_STRING_LITERAL(ssl_early_data_quic_parameter_mismatch);
+  }
+  QUIC_BUG_IF(reason < 0 || reason > ssl_early_data_reason_max_value)
+      << "Unknown ssl_early_data_reason_t " << reason;
+  return "unknown ssl_early_data_reason_t";
+}
+
 // static
 std::string CryptoUtils::HashHandshakeMessage(
     const CryptoHandshakeMessage& message,
diff --git a/quic/core/crypto/crypto_utils.h b/quic/core/crypto/crypto_utils.h
index 6c167f8..ce3e1d8 100644
--- a/quic/core/crypto/crypto_utils.h
+++ b/quic/core/crypto/crypto_utils.h
@@ -210,7 +210,7 @@ class QUIC_EXPORT_PRIVATE CryptoUtils {
   // Returns the name of the HandshakeFailureReason as a char*
   static const char* HandshakeFailureReasonToString(
       HandshakeFailureReason reason);
-
+  static const char* EarlyDataReasonToString(ssl_early_data_reason_t reason);
   // Returns a hash of the serialized |message|.
   static std::string HashHandshakeMessage(const CryptoHandshakeMessage& message,
                                           Perspective perspective);
diff --git a/quic/core/quic_crypto_server_stream.cc b/quic/core/quic_crypto_server_stream.cc
index 665cae1..5fc6839 100644
--- a/quic/core/quic_crypto_server_stream.cc
+++ b/quic/core/quic_crypto_server_stream.cc
@@ -360,6 +360,16 @@ bool QuicCryptoServerStream::GetBase64SHA256ClientChannelID(
   return true;
 }
 
+ssl_early_data_reason_t QuicCryptoServerStream::EarlyDataReason() const {
+  if (IsZeroRtt()) {
+    return ssl_early_data_accepted;
+  }
+  if (zero_rtt_attempted_) {
+    return ssl_early_data_session_not_resumed;
+  }
+  return ssl_early_data_no_session_offered;
+}
+
 bool QuicCryptoServerStream::encryption_established() const {
   return encryption_established_;
 }
diff --git a/quic/core/quic_crypto_server_stream.h b/quic/core/quic_crypto_server_stream.h
index 5a4b9b1..14a4fd9 100644
--- a/quic/core/quic_crypto_server_stream.h
+++ b/quic/core/quic_crypto_server_stream.h
@@ -13,7 +13,7 @@
 #include "net/third_party/quiche/src/quic/core/quic_crypto_server_stream_base.h"
 #include "net/third_party/quiche/src/quic/core/quic_session.h"
 #include "net/third_party/quiche/src/quic/platform/api/quic_export.h"
-
+#include "third_party/boringssl/src/include/openssl/ssl.h"
 namespace quic {
 
 namespace test {
@@ -49,6 +49,7 @@ class QUIC_EXPORT_PRIVATE QuicCryptoServerStream
   void OnHandshakeDoneReceived() override;
   bool ShouldSendExpectCTHeader() const override;
   const ProofSource::Details* ProofSourceDetails() const override;
+  ssl_early_data_reason_t EarlyDataReason() const override;
 
   // From QuicCryptoStream
   bool encryption_established() const override;
diff --git a/quic/core/quic_crypto_server_stream_base.h b/quic/core/quic_crypto_server_stream_base.h
index bea998d..5427830 100644
--- a/quic/core/quic_crypto_server_stream_base.h
+++ b/quic/core/quic_crypto_server_stream_base.h
@@ -90,6 +90,7 @@ class QUIC_EXPORT_PRIVATE QuicCryptoServerStreamBase : public QuicCryptoStream {
   // made. The Details are owned by the QuicCryptoServerStreamBase and the
   // pointer is only valid while the owning object is still valid.
   virtual const ProofSource::Details* ProofSourceDetails() const = 0;
+  virtual ssl_early_data_reason_t EarlyDataReason() const = 0;
 };
2.1 quic_client cannot use 0-RTT
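When we tested with quic_client, the reported reason was always ssl_early_data_no_session_offered. BoringSSL documents this as "The client did not offer a session", meaning the client never offered a session to resume. We compared how GQUIC and IQUIC support 0-RTT on the client side. It appeared that the client needs to hold a SessionCache object, and quic_client initializes its SessionCache to nullptr. We therefore changed the code to initialize the SessionCache with test_tools/simple_session_cache: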
--- a/net/tools/quic/quic_simple_client.cc
+++ b/net/tools/quic/quic_simple_client.cc
@@ -28,7 +28,7 @@
 #include "net/third_party/quiche/src/quic/platform/api/quic_ptr_util.h"
 #include "net/third_party/quiche/src/quic/tools/quic_simple_client_session.h"
 #include "net/third_party/quiche/src/spdy/core/spdy_header_block.h"
-
+#include "net/third_party/quiche/src/quic/test_tools/simple_session_cache.h"
 using std::string;
 
 namespace net {
@@ -48,7 +48,8 @@ QuicSimpleClient::QuicSimpleClient(
           quic::QuicWrapUnique(
               new QuicClientMessageLooplNetworkHelper(&clock_, this)),
           std::move(proof_verifier),
-          nullptr),
+          //nullptr),
+          std::make_unique<quic::test::SimpleSessionCache>()),
       initialized_(false) {
   set_server_address(server_address);
 }
diff --git a/net/BUILD.gn b/net/BUILD.gn
index c46b8c6..84a1e7e 100644
--- a/net/BUILD.gn
+++ b/net/BUILD.gn
@@ -2698,6 +2698,8 @@ source_set("simple_quic_tools") {
     "tools/quic/quic_transport_simple_server.h",
     "tools/quic/synchronous_host_resolver.cc",
     "tools/quic/synchronous_host_resolver.h",
+    "//net/third_party/quiche/src/quic/test_tools/simple_session_cache.h",
+    "//net/third_party/quiche/src/quic/test_tools/simple_session_cache.cc",
   ]
   deps = [
     ":net",
2.2 The server does not support 0-RTT
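On the next test, the reason changed to ssl_early_data_disabled, meaning early data was not enabled for this connection. We suspected a problem on the server side. While going through quiche's commit history, we noticed an interesting quic flag, quic_enable_zero_rtt_for_tls. That flag is enabled by default, though, so it was not the cause. Further investigation showed that the call that actually enables early data is: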
SSL_CTX_set_early_data_enabled(SSL_CTX *ctx, int enabled)
It is called here:
// static
bssl::UniquePtr<SSL_CTX> TlsServerConnection::CreateSslCtx(
    ProofSource* proof_source) {
  bssl::UniquePtr<SSL_CTX> ssl_ctx = TlsConnection::CreateSslCtx();
  SSL_CTX_set_tlsext_servername_callback(ssl_ctx.get(),
                                         &SelectCertificateCallback);
  SSL_CTX_set_alpn_select_cb(ssl_ctx.get(), &SelectAlpnCallback, nullptr);
  // We don't actually need the TicketCrypter here, but we need to know
  // whether it's set.
  if (GetQuicRestartFlag(quic_enable_tls_resumption_v4) &&
      proof_source->GetTicketCrypter()) {
    SSL_CTX_set_ticket_aead_method(ssl_ctx.get(),
                                   &TlsServerConnection::kSessionTicketMethod);
    QUIC_CODE_COUNT_N(quic_tls_resumption_ticket_method, 1, 2);
    if (GetQuicRestartFlag(quic_enable_zero_rtt_for_tls_v2)) {
      SSL_CTX_set_early_data_enabled(ssl_ctx.get(), 1);
    }
  } else {
    QUIC_CODE_COUNT_N(quic_tls_resumption_ticket_method, 2, 2);
    SSL_CTX_set_options(ssl_ctx.get(), SSL_OP_NO_TICKET);
  }
  return ssl_ctx;
}
Debugging showed that our proof_source->GetTicketCrypter() returned null, which is why early data was never enabled. When we inspected our proof_source, we confirmed that no TicketCrypter had been set on it, so we applied the following fix:
std::unique_ptr<ProofSourceChromium> proof_source(new ProofSourceChromium());
CHECK(proof_source->Initialize(cert_path, key_path, base::FilePath()));
proof_source->SetTicketCrypter(
    std::make_unique<SimpleTicketCrypter>(QuicChromiumClock::GetInstance()));
The TicketCrypter encrypts and decrypts session tickets. Session tickets are central to TLS 1.3 0-RTT: in this setup, the client can only send early data when it resumes a session from a ticket the server issued earlier.
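The dependency can be summarized with a small, self-contained model of the branch in TlsServerConnection::CreateSslCtx() quoted above. FakeProofSource, TicketCrypter, ServerTlsSettings and ConfigureServer are hypothetical stand-ins. The sketch makes no BoringSSL calls; it only records which settings the real function would apply.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for ProofSource::TicketCrypter; its contents do not
// matter here, only whether one is present.
struct TicketCrypter {};

// Hypothetical stand-in for a ProofSource that may or may not own a crypter.
struct FakeProofSource {
  std::unique_ptr<TicketCrypter> crypter;
  TicketCrypter* GetTicketCrypter() const { return crypter.get(); }
};

// What CreateSslCtx() would end up configuring on the SSL_CTX.
struct ServerTlsSettings {
  bool session_tickets = false;  // models SSL_CTX_set_ticket_aead_method()
  bool early_data = false;       // models SSL_CTX_set_early_data_enabled(ctx, 1)
};

// Mirrors the branch structure of CreateSslCtx(): early data can only be
// enabled inside the "resumption flag on AND TicketCrypter present" branch.
ServerTlsSettings ConfigureServer(const FakeProofSource& proof_source,
                                  bool tls_resumption_flag,
                                  bool zero_rtt_flag) {
  ServerTlsSettings settings;
  if (tls_resumption_flag && proof_source.GetTicketCrypter() != nullptr) {
    settings.session_tickets = true;
    if (zero_rtt_flag) {
      settings.early_data = true;
    }
  }
  // Otherwise the real code sets SSL_OP_NO_TICKET, so neither tickets nor
  // early data are available.
  return settings;
}
```

With both flags on but no TicketCrypter, early_data stays false, which matches the ssl_early_data_disabled result above. Once a crypter is attached, both session tickets and early data are enabled.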
2.3 0-RTT works only once
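After we fixed the problems above, the reason finally became ssl_early_data_accepted, meaning 0-RTT worked. On the third connection, however, 0-RTT failed again with ssl_early_data_quic_parameter_mismatch. In other words, the QUIC early data context carried in the client's ticket did not match the one on the server. We found that BoringSSL raises this error in: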
// file: tls13_server.cc
static bool quic_ticket_compatible(const SSL_SESSION *session,
                                   const SSL_CONFIG *config) {
  if (!session->is_quic) {
    return true;
  }
  // session->quic_early_data_context is empty, so this returns false.
  // The question is why it is empty here.
  if (session->quic_early_data_context.empty() ||
      config->quic_early_data_context.size() !=
          session->quic_early_data_context.size() ||
      CRYPTO_memcmp(config->quic_early_data_context.data(),
                    session->quic_early_data_context.data(),
                    session->quic_early_data_context.size()) != 0) {
    return false;
  }
  return true;
}
session->quic_early_data_context is data the server stores in the session. The server uses it to decide whether it must reject early data. On a resumption attempt, this session is recovered from the ticket the client sends in the ClientHello's pre_shared_key (PSK) extension. Consider the normal case, the first connection on which 0-RTT works. There, the session decoded from the ticket passes validation, so its quic_early_data_context must already hold valid, non-empty data; otherwise quic_ticket_compatible would fail and 0-RTT would not take effect. On the second 0-RTT attempt, however, quic_early_data_context is empty. Why?
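A standalone version of the same check shows the failure mode directly. SessionModel, ConfigModel and QuicTicketCompatible are hypothetical names that mirror only the fields quic_ticket_compatible() reads. std::memcmp replaces BoringSSL's constant-time CRYPTO_memcmp, which is fine for an illustration but not for production code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for the SSL_SESSION and SSL_CONFIG fields involved.
struct SessionModel {
  bool is_quic = true;
  std::vector<uint8_t> quic_early_data_context;  // decoded from the ticket
};

struct ConfigModel {
  std::vector<uint8_t> quic_early_data_context;  // server's current context
};

// Same decision logic as quic_ticket_compatible(): a QUIC session is only
// compatible if it carries a non-empty context identical to the config's.
bool QuicTicketCompatible(const SessionModel& session,
                          const ConfigModel& config) {
  if (!session.is_quic) {
    return true;
  }
  const std::vector<uint8_t>& s = session.quic_early_data_context;
  const std::vector<uint8_t>& c = config.quic_early_data_context;
  // Short-circuiting guarantees memcmp only runs on two non-empty buffers of
  // equal size.
  if (s.empty() || c.size() != s.size() ||
      std::memcmp(c.data(), s.data(), s.size()) != 0) {
    return false;
  }
  return true;
}
```

An empty context in the session is rejected even when the server's own context is set. That is the state in which the third connection fails.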
After studying BoringSSL's handshake logic, we can outline roughly what happens on each connection:
1) The initial handshake is 1-RTT. The client sends no PSK extension, because the server has never sent it a session ticket;
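2) During the 1-RTT handshake, the server calls ssl_get_new_session. This function copies quic_early_data_context from hs->config into a new session and finally stores that session in hs->new_session: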
int ssl_get_new_session(SSL_HANDSHAKE *hs, int is_server) {
  SSL *const ssl = hs->ssl;
  if (ssl->mode & SSL_MODE_NO_SESSION_CREATION) {
    OPENSSL_PUT_ERROR(SSL, SSL_R_SESSION_MAY_NOT_BE_CREATED);
    return 0;
  }

  UniquePtr<SSL_SESSION> session = ssl_session_new(ssl->ctx->x509_method);
  if (session == NULL) {
    return 0;
  }

  session->is_server = is_server;
  session->ssl_version = ssl->version;
  session->is_quic = ssl->quic_method != nullptr;
  if (is_server && ssl->enable_early_data && session->is_quic) {
    // quic_early_data_context is set successfully here.
    if (!session->quic_early_data_context.CopyFrom(
            hs->config->quic_early_data_context)) {
      return 0;
    }
  }

  /* ... some code omitted ... */

  hs->new_session = std::move(session);
  ssl_set_session(ssl, NULL);
  return 1;
}
hs->new_session is a key object: every session created later in the handshake is copied from it.
Still in step 2), the server uses hs->new_session to send two NewSessionTicket messages to the client during the 1-RTT handshake:
static bool add_new_session_tickets(SSL_HANDSHAKE *hs, bool *out_sent_tickets) {
  SSL *const ssl = hs->ssl;
  // ...
  // kNumTickets is 2.
  for (int i = 0; i < kNumTickets; i++) {
    // Note the SSL_SESSION_INCLUDE_NONAUTH flag: with this flag,
    // SSL_SESSION_dup also copies quic_early_data_context.
    UniquePtr<SSL_SESSION> session(
        SSL_SESSION_dup(hs->new_session.get(), SSL_SESSION_INCLUDE_NONAUTH));
    if (!session) {
      return false;
    }

    if (!RAND_bytes((uint8_t *)&session->ticket_age_add, 4)) {
      return false;
    }
    session->ticket_age_add_valid = true;
    bool enable_early_data =
        ssl->enable_early_data &&
        (!ssl->quic_method || !ssl->config->quic_early_data_context.empty());
    // Normally true.
    if (enable_early_data) {
      // QUIC does not use the max_early_data_size parameter and always sets it
      // to a fixed value. See draft-ietf-quic-tls-22, section 4.5.
      session->ticket_max_early_data =
          ssl->quic_method != nullptr ? 0xffffffff : kMaxEarlyDataAccepted;
    }
    // ...
}
When the 1-RTT handshake completes, the server sends these NewSessionTickets to the client. The client stores them and presents one in the PSK extension on a later 0-RTT connection.
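3) On a 0-RTT connection, the server parses the ClientHello and its PSK extension, extracts the ticket, and recovers the session from it (see BoringSSL's select_session). This session serves two purposes: it is used to check whether the 0-RTT parameters match (see quic_ticket_compatible), and it is used to build hs->new_session: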
switch (select_session(hs, &alert, &session, &ssl->s3->ticket_age_skew,
                       &offered_ticket, msg, &client_hello)) {
  case ssl_ticket_aead_ignore_ticket:
    // ...
    break;

  case ssl_ticket_aead_success:
    // Carry over authentication information from the previous handshake into
    // a fresh session.
    // hs->new_session is copied from session with the flag
    // SSL_SESSION_DUP_AUTH_ONLY.
    hs->new_session =
        SSL_SESSION_dup(session.get(), SSL_SESSION_DUP_AUTH_ONLY);
    if (hs->new_session == nullptr) {
      ssl_send_alert(ssl, SSL3_AL_FATAL, SSL_AD_INTERNAL_ERROR);
      return ssl_hs_error;
    }
    // ...
}
4) The server then adds NewSessionTickets again (as in step 2)), this time using the new_session built in 3) as the base. It duplicates a fresh session from this new_session for each ticket, so the client keeps receiving new tickets.
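Steps 3) and 4) chain two SSL_SESSION_dup calls, so the outcome depends on which fields each flag copies. The sketch below models that chain. SessionModel, DupFlag, Dup and NextTicket are hypothetical names, and auth_info stands in for all authentication fields. The assumed flag behavior follows this article's reading of BoringSSL c947efa: SSL_SESSION_DUP_AUTH_ONLY drops quic_early_data_context, and SSL_SESSION_INCLUDE_NONAUTH keeps it. Later BoringSSL versions may behave differently.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model: one authentication field plus the QUIC early data
// context, which (per this article) is not treated as authentication data.
struct SessionModel {
  std::string auth_info;
  std::vector<uint8_t> quic_early_data_context;
};

enum class DupFlag { kAuthOnly, kIncludeNonAuth };

// Models SSL_SESSION_dup under the assumption stated above.
SessionModel Dup(const SessionModel& in, DupFlag flag) {
  SessionModel out;
  out.auth_info = in.auth_info;  // always carried over
  if (flag == DupFlag::kIncludeNonAuth) {
    out.quic_early_data_context = in.quic_early_data_context;
  }
  return out;
}

// Step 3): resume from the ticket's session into hs->new_session.
// Step 4): mint the next ticket from hs->new_session with INCLUDE_NONAUTH.
SessionModel NextTicket(const SessionModel& from_ticket, DupFlag resume_flag) {
  SessionModel new_session = Dup(from_ticket, resume_flag);
  return Dup(new_session, DupFlag::kIncludeNonAuth);
}
```

Step 4) can only copy what step 3) kept. As a result, a ticket minted after a resumption loses any field that the AUTH_ONLY copy discarded.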
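So where does this process go wrong? The problem lies in step 3). The session parsed from the ClientHello does contain quic_early_data_context, but this field is not copied when the session is duplicated into hs->new_session. The reason is that the dup flag is SSL_SESSION_DUP_AUTH_ONLY, and in that mode quic_early_data_context is not copied. The parsed session itself still carries the context, and the first 0-RTT check is made against it, so the first 0-RTT attempt succeeds. The hs->new_session without quic_early_data_context, however, is then used to mint new session tickets. The tickets the server sends after the first 0-RTT connection therefore lack quic_early_data_context. On the next 0-RTT attempt, the ticket in the PSK extension lacks it too, and so does the session parsed from that ticket. The parameter check in quic_ticket_compatible then fails.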
As a temporary workaround, change SSL_SESSION_DUP_AUTH_ONLY to SSL_SESSION_INCLUDE_NONAUTH in the SSL_SESSION_dup call from step 3). That change alone fixes the problem.
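The whole sequence, including the workaround, can be simulated end to end. This is a toy model built on the assumptions described in this section: the server's context never changes, and step 4) always copies with SSL_SESSION_INCLUDE_NONAUTH. Context, Ticket, DupFlag, Server and Run are hypothetical names, and the real BoringSSL handshake involves far more state.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

using Context = std::vector<uint8_t>;

// What the client stores and later echoes back in the PSK extension.
struct Ticket {
  Context quic_early_data_context;
};

enum class DupFlag { kAuthOnly, kIncludeNonAuth };

struct Server {
  Context config_context{0xAB, 0xCD};            // hs->config's context
  DupFlag resume_dup_flag = DupFlag::kAuthOnly;  // flag used in step 3)

  // Handles one connection: returns whether 0-RTT was accepted and writes
  // the NewSessionTicket the client should store for next time.
  bool Handshake(const std::optional<Ticket>& offered, Ticket* new_ticket) const {
    Context new_session_context;
    bool zero_rtt_accepted = false;
    if (!offered) {
      // Step 2): a fresh session copies the context from the config.
      new_session_context = config_context;
    } else {
      // Step 3): check the session decoded from the ticket ...
      zero_rtt_accepted = !offered->quic_early_data_context.empty() &&
                          offered->quic_early_data_context == config_context;
      // ... then duplicate it into hs->new_session with resume_dup_flag.
      if (resume_dup_flag == DupFlag::kIncludeNonAuth) {
        new_session_context = offered->quic_early_data_context;
      }
    }
    // Step 4): the next ticket is minted from hs->new_session.
    new_ticket->quic_early_data_context = new_session_context;
    return zero_rtt_accepted;
  }
};

// Runs consecutive connections, each offering the ticket from the previous
// one, and records whether 0-RTT was accepted on each.
std::vector<bool> Run(DupFlag flag, int connections) {
  Server server;
  server.resume_dup_flag = flag;
  std::optional<Ticket> stored;
  std::vector<bool> accepted;
  for (int i = 0; i < connections; ++i) {
    Ticket next;
    accepted.push_back(server.Handshake(stored, &next));
    stored = next;
  }
  return accepted;
}
```

With kAuthOnly, three consecutive connections yield 1-RTT, accepted, rejected, which reproduces the "0-RTT works only once" symptom. With kIncludeNonAuth, the third connection is accepted as well.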
3. Summary and open questions
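We are currently on BoringSSL version c947efa, and the fix above is a forced local patch. Why does BoringSSL restrict this copy to SSL_SESSION_DUP_AUTH_ONLY, and what was the reasoning? Is it simply an oversight, or is it deliberate, for example for security reasons, with 0-RTT on repeated resumptions intentionally left unsupported for now? These questions need follow-up.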